Mailing List Archive

Back Compatibility
I have been thinking for a while that it is time we revisit our back-
compatibility "policy" (http://wiki.apache.org/lucene-java/BackwardsCompatibility)
in terms of maybe becoming a little leaner and also in terms of
addressing some issues that come up from time to time in relation to
bug fixes that affect how Tokens are produced. As examples of the
latter, see https://issues.apache.org/jira/browse/LUCENE-1084 and
https://issues.apache.org/jira/browse/LUCENE-1100.
Examples of the former issue include things like removing
deprecations sooner and the ability to add new methods to interfaces
(both of these are not to be done ad-hoc)

In the case of bugs, the main issue is that people may be expecting
the "incorrect" behavior (admittedly, the maxFieldLength is not
incorrect), so the question becomes: should we be in the business of
preserving incorrect values for a full version?

In the case of being "leaner", there are times when it would be useful
to be able to add new methods to interfaces w/o waiting for a full
major release (other projects do this), and also to pare down the
deprecated methods sooner.

I propose a couple of solutions to the leaner issue, but I am not sure
how to handle the incorrectness issue, although I suppose it could be
similar. With all of this, I really think the issue comes down to how
we communicate current and future changes to our users.

1. We add a new section to CHANGES for each release, at the top where
we can declare what deprecations will be removed in the _next_ release
(major or minor) and also any interface API changes
2. When deprecating, the @deprecated tag should declare what version it
will be removed in, and that version must be one greater than the next
targeted release. That is, if the next release is 2.4, then anything
deprecated in 2.3 is game to be removed in 2.9.
3. Other ways of communicating changes????
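For illustration, proposal 2 might look like the following in Javadoc. This is a sketch only: TokenFactory and its methods are invented for the example, not real Lucene APIs.

```java
// Hypothetical sketch of proposal 2: the @deprecated Javadoc tag states
// the release in which the method was deprecated and the release targeted
// for removal. Class and method names are made up for illustration.
public class TokenFactory {

    /**
     * @deprecated Deprecated in 2.3; scheduled for removal in 2.9.
     *             Use {@link #createReusableToken()} instead.
     */
    @Deprecated
    public String createToken() {
        // The old entry point simply delegates during the deprecation
        // window, so a drop-in upgrade keeps working until the announced
        // removal release.
        return createReusableToken();
    }

    public String createReusableToken() {
        return "token";
    }
}
```

Since both methods remain callable until the declared release, the CHANGES entry can point users at the replacement well before anything actually breaks.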

My reasoning for this solution: Our minor release cycles are
currently in the 3-6 months range and our major release cycles are in
the 1-1.5 year range. I think giving someone 4-8 (or whatever) months
is more than enough time to prepare for API changes. I am not sure
how this would affect Index changes, but I do think we should KEEP our
current index reading policy where possible. This may mean that some
deprecated items cannot be removed until a major release and I think
that is fine.

Do people think that the bug issue also fits into this way of doing
things? Or do we need another way to think about those?

These are just suggestions and I am interested in hearing more about
what people think. I know, in some sense, it may make us less stable,
but I doubt it given the time frame of our releases. I also know a
perfectly valid response is "If it ain't broke, don't fix it" and to
great extent, I know it ain't broke. And believe me, I am fine with
that. I am just wondering if there is an opportunity to make Lucene
better.

Cheers,
Grant


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Back Compatibility
> Examples of the former issue include things like removing
> deprecations sooner and the ability to add new methods to interfaces
> (both of these are not to be done ad-hoc)

What would be the difference between ad-hoc and non-ad-hoc?

Bill

Re: Back Compatibility
On Jan 17, 2008, at 2:42 PM, Bill Janssen wrote:

>> Examples of the former issue include things like removing
>> deprecations sooner and the ability to add new methods to interfaces
>> (both of these are not to be done ad-hoc)
>
> What would be the difference between ad-hoc and non-ad-hoc?
>

Maybe a bad choice of words, but I meant to say that no interface/
deprecation changes would be done without announcing them and there
being at least one release in the meantime. Thus, if we wanted to add
isFancySchmancy() onto Fieldable today, it would have to be announced,
a patch provided and referenced, a release made without it (i.e. 2.3),
and then it would be available in 2.4. By ad-hoc, I meant that we
wouldn't just announce it and then have it show up in 2.3 and not give
people time to digest it.
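To make the breakage concrete, here is a sketch; Fieldable and isFancySchmancy() follow the hypothetical naming above, not the real Lucene interface:

```java
// Why adding a method to a published interface is back-incompatible:
// every implementation outside the project must be updated. Names here
// follow the hypothetical example, not the actual Lucene Fieldable.
interface Fieldable {
    String name();
    // boolean isFancySchmancy();
    // ^ Announced in 2.3, added in 2.4. Uncommenting it today would break
    //   compilation of every user-written implementation like the one below.
}

// A user's implementation, compiled against the 2.3 interface.
class UserField implements Fieldable {
    public String name() {
        return "body";
    }
}

public class InterfaceEvolutionSketch {
    public static void main(String[] args) {
        Fieldable f = new UserField();
        System.out.println(f.name()); // prints "body"
    }
}
```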

HTH,
Grant


Re: Back Compatibility
Grant Ingersoll wrote:
> 1. We add a new section to CHANGES for each release, at the top where we
> can declare what deprecations will be removed in the _next_ release
> (major or minor) and also any interface API changes
> 2. When deprecating, the @deprecated tag should declare what version it
> will be removed in and that version must be one greater than the next
> targeted release. That is, if the next release is 2.4, then anything
> deprecated in 2.3 is game to be removed in 2.9.

This would mean that one could never simply drop in the new jar and
expect things to still work, which is something that we currently try to
guarantee. That's a significant thing to give up, in terms of
usability. In my experience, folks hate incompatible changes, since
they're frequently no longer actively developing the portion of the code
that's broken, aren't seeking the new feature, etc. This is why lots of
folks stay back on old versions.

In terms of benefits, this would permit us to evolve APIs more rapidly.
So it pits external usability against API evolution speed, with no
clear winner. +0

Doug

Re: Back Compatibility
Grant Ingersoll wrote:
>
> My reasoning for this solution: Our minor release cycles are
> currently in the 3-6 months range and our major release cycles are in
> the 1-1.5 year range. I think giving someone 4-8 (or whatever) months
> is more than enough time to prepare for API changes. I am not sure
> how this would affect Index changes, but I do think we should KEEP our
> current index reading policy where possible. This may mean that some
> deprecated items cannot be removed until a major release and I think
> that is fine.

Personally, I like the stability of Lucene.

I don't see any problems with deprecations being done earlier, but
actual removal still at the major release.

Is there a roadmap of changes from 3.0 to 4.0 that would warrant such a
procedural change? What is Lucene missing that would require such a change?

Re: Back Compatibility
On Jan 17, 2008, at 4:14 PM, Doug Cutting wrote:

> Grant Ingersoll wrote:
>> 1. We add a new section to CHANGES for each release, at the top
>> where we can declare what deprecations will be removed in the
>> _next_ release (major or minor) and also any interface API changes
> 2. When deprecating, the @deprecated tag should declare what version
>> it will be removed in and that version must be one greater than the
>> next targeted release. That is, if the next release is 2.4, then
>> anything deprecated in 2.3 is game to be removed in 2.9.
>
> This would mean that one could never simply drop in the new jar and
> expect things to still work, which is something that we currently
> try to guarantee. That's a significant thing to give up, in terms
> of usability. In my experience, folks hate incompatible changes,
> since they're frequently no longer actively developing the portion
> of the code that's broken, aren't seeking the new feature, etc.
> This is why lots of folks stay back on old versions.
>

Yep. I agree. I don't make the suggestion lightly and a perfectly
valid answer is let's not bother. By the same token, do people
really just drop in a new release in these days of continuous
integration and short release cycles? At a minimum, it has to go
through a fair amount of testing, right?

An alternative is to do major releases more often, but, to some
extent, it's all just semantics. The key is communicating what the
exact changes are, regardless of what you call the version number.
This does bring an extra burden in that we would need to be better
about communicating upcoming changes. That I am not exactly thrilled
about, either.


> In terms of benefits, this would permit us to evolve APIs more
> rapidly. So it pits external usability against API evolution speed,
> with no clear winner. +0

Yep, I am still on the fence, but wanted to revisit the discussion in
light of some recent bugs and comments about using Interfaces more.

Perhaps more important is how to handle fixing issues that change how
a document is indexed. Do we preserve the incorrectness for the sake
of back-compatibility? Or do we tell them this way is flat out wrong
and it won't be supported anymore?

-Grant

RE: Back Compatibility
Hi Grant,

On 01/17/2008 at 7:51 AM, Grant Ingersoll wrote:
> Our minor release cycles are currently in the 3-6 months range
> and our major release cycles are in the 1-1.5 year range.

Since 2.0.0, including 2.3.0 - assuming it will be released in the next week or so - the minor release intervals will have averaged about 6.5 months, over three releases.

Historically, the major release cycle intervals have roughly been:

1.0 - 6 months (March 2000 - October 2000)
2.0.0 - 6 years (October 2000 - May 2006)

Six years is an incredibly long time to maintain backward compatibility.

Assuming there will be a 2.4 release, and then 3.0 following it, it's pretty optimistic (IMHO) to think that it will be released before June 2008, so for 3.0, that would be:

3.0.0 - 2 years (May 2006 - May 2008)

Two years doesn't seem so long in comparison :).

> I think giving someone 4-8 (or whatever) months is more than
> enough time to prepare for API changes. I am not sure how
> this would affect Index changes, but I do think we should
> KEEP our current index reading policy where possible. This
> may mean that some deprecated items cannot be removed until
> a major release and I think that is fine.

Given the 6.5 month average minor release interval for the most recent major release, and the relatively low probability that this will shrink appreciably, you seem in essence to be advocating altogether abandoning backward API compatibility from one (minor) release to the next.

However, below you are advocating a minimum of one "test balloon" release between incompatible changes:

On 01/17/2008 at 3:41 PM, Grant Ingersoll wrote:
> [N]o interface/deprecation changes would be done without announcing it
> and there being at least one release in the meantime. Thus, if we
> wanted to add isFancySchmancy() onto Fieldable today, it would have to
> be announced, patch provided and referenced, a release without it (i.e.
> 2.3) and then it would be available in 2.4. By ad-hoc, I meant that we
> wouldn't just announce it and then have it show up in 2.3 and not give
> people time to digest it.

If I understand you correctly, a major release series could contain a whole series of non-aligned overlapping back-incompatible changes, since you are allowing individual features to break backward compatibility independently of other features. I think this is actually worse than just abandoning back-compatibility, since users would have to look up information on each individual feature to be able to figure out whether they can do a drop-in upgrade.

Steve

Re: Back Compatibility
If they are " no longer actively developing the portion of the code
that's broken, aren't seeking the new feature, etc", and they stay
back on old versions... isn't that exactly what we want? They can
stay on the old version, and new application development uses the
newer version.

It would be different if it was a core JRE interface or similar -
this is an optional jar.

Part of what always made Windows so fragile is that as it evolved
they tried to maintain backward compatibility - making working with
the old/new code and fixing bugs almost impossible. The bloat became
impossible to deal with.

I bet, if you did a poll of all Lucene users, you would find a
majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,
2.3, or 3.0, that is still going to be the case.

As always, JMO.

On Jan 17, 2008, at 3:14 PM, Doug Cutting wrote:

> Grant Ingersoll wrote:
>> 1. We add a new section to CHANGES for each release, at the top
>> where we can declare what deprecations will be removed in the
>> _next_ release (major or minor) and also any interface API changes
> 2. When deprecating, the @deprecated tag should declare what
>> version it will be removed in and that version must be one greater
>> than the next targeted release. That is, if the next release is
>> 2.4, then anything deprecated in 2.3 is game to be removed in 2.9.
>
> This would mean that one could never simply drop in the new jar and
> expect things to still work, which is something that we currently
> try to guarantee. That's a significant thing to give up, in terms
> of usability. In my experience, folks hate incompatible changes,
> since they're frequently no longer actively developing the portion
> of the code that's broken, aren't seeking the new feature, etc.
> This is why lots of folks stay back on old versions.
>
> In terms of benefits, this would permit us to evolve APIs more
> rapidly. So it pits external usability against API evolution
> speed, with no clear winner. +0
>
> Doug
>


Re: Back Compatibility
On Jan 17, 2008, at 7:57 PM, robert engels wrote:

> If they are " no longer actively developing the portion of the code
> that's broken, aren't seeking the new feature, etc", and they stay
> back on old versions... isn't that exactly what we want? They can
> stay on the old version, and new application development uses the
> newer version.
>
> It would be different if it was a core JRE interface or similar -
> this is an optional jar.
>
> Part of what always made Windows so fragile is that as it evolved
> they tried to maintain backward compatibility - making working with
> the old/new code and fixing bugs almost impossible. The bloat became
> impossible to deal with.
>
> I bet, if you did a poll of all Lucene users, you would find a
> majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,
> 2.3, or 3.0, that is still going to be the case.

I found that upgrading from 1.4.3 (our first version) to 1.9, to
2.0, ... 2.2 and even 2.3 rc was painless.

The deprecations in 1.9 gave clear guidance on how to do the upgrade.
Very easy to do. And with Lucene's robust test suite, I had great
confidence that it would work without much testing.

Going forward was simply a matter of dropping in the new jar and
enjoying the improved performance.

The forward compatibility of the actual index was a great boon.

So, while one may not be actively developing code, dropping in a new
jar and getting huge performance gains is a great plus.

Many thanks to you all for such a stable product.

-- DM Smith


Re: Back Compatibility
I guess I am suggesting that, instead of maintaining the whole major/
minor thing (not including file format), we relax a bit and say that
any given feature we choose to remove or add has to go through two
release cycles, which, according to your averages, would equal just
over 1 year's time. If people can't adapt to a coming change that was
announced a year ago, then I have to wonder why they need to upgrade
at all. (OK, that is a little strong, but...)

And mind you, I am not set in stone about this; I'm just starting the
conversation, being a bit of a devil's advocate. I would like to know
if there is a way we can retain Lucene's solid stability and maturity,
encourage new and better features, and not have to maintain so much
deprecated code.

I especially think this shorter cycle is useful when it comes to
deprecated features (not that there is much difference between
removing methods and adding methods). Basically, we have code in
Lucene that we all agree should not be used, yet we leave it sitting
in there for, on average, 2 years or more. Dropping that to 1 year,
on average, is not going to suddenly break everyone and make
Lucene unstable. Besides, if people like the deprecated methods so
much, why upgrade in the first place? No one is forcing them.
Usually, the answer is that there is some new feature somewhere else
that is needed, but that just shows that people are willing to invest
the time to get better features in the first place. Besides, just
because we can remove things doesn't mean we have to remove them. For
instance, some bigger features that we improve we may want to
deprecate for more than one full release cycle.

Fieldable is, in my mind, a prime example of needing the ability to
announce additions. Let's say we come up with some new-fangled Field
type called Magic. This thing is so beautiful we all wonder how we
ever lived without it. Great. Now all we need to do is add an
isMagic() declaration onto Fieldable and we're good. Oops, can't do
that. Gotta wait 2 more years for 4.0. Seriously, we have to be
locked into an interface for 2 years or more? And oh, by the way, in
that 2 years, Lucene has been left in the dust because every other open
source search engine out there already has Magic capabilities.
Furthermore, that gives us one 6 month window (+/-) to get it right
for the next 2 years. I know, it's a bit over the top, but I think it
demonstrates the point. I also don't see what is unstable about
telling the community, well in advance, that the following API changes
are coming, please plan accordingly. Most projects out there don't
even do that. Seriously, in Maven you get updates to functionality
without even knowing you are getting them, and I am nowhere near
advocating for that.

And again, I still think the more pertinent issue that needs to be
addressed is how to better handle bugs in things like Tokenization,
etc. where people may have dependencies on broken functionality, but
haven't fully comprehended that they have such a dependency. I don't
think those fixes/deprecations should have to wait for a major release.

Does anyone have experience w/ how other open source projects deal
with this? Do they have explicit policies? Is it ad hoc? I know,
for instance, that in a certain library I was using, what was
announced as a beta and what shipped as 1.0 were quite different.
Granted, one could expect that a bit more out of something that was
going from 0.X to 1.0, but it was still a pretty significant, more
or less unannounced change (unless you count following all the commit
messages as announcement), and it will require a decent amount of work
to upgrade.


-Grant


On Jan 17, 2008, at 6:58 PM, Steven A Rowe wrote:

> Hi Grant,
>
> On 01/17/2008 at 7:51 AM, Grant Ingersoll wrote:
>> Our minor release cycles are currently in the 3-6 months range
>> and our major release cycles are in the 1-1.5 year range.
>
> Since 2.0.0, including 2.3.0 - assuming it will be released in the
> next week or so - the minor release intervals will have averaged
> about 6.5 months, over three releases.
>
> Historically, the major release cycle intervals have roughly been:
>
> 1.0 - 6 months (March 2000 - October 2000)
> 2.0.0 - 6 years (October 2000 - May 2006)
>
> Six years is an incredibly long time to maintain backward
> compatibility.
>
> Assuming there will be a 2.4 release, and then 3.0 following it,
> it's pretty optimistic (IMHO) to think that it will be released
> before June 2008, so for 3.0, that would be:
>
> 3.0.0 - 2 years (May 2006 - May 2008)
>
> Two years doesn't seem so long in comparison :).
>
>> I think giving someone 4-8 (or whatever) months is more than
>> enough time to prepare for API changes. I am not sure how
> this would affect Index changes, but I do think we should
>> KEEP our current index reading policy where possible. This
>> may mean that some deprecated items cannot be removed until
>> a major release and I think that is fine.
>
> Given the 6.5 month average minor release interval for the most
> recent major release, and the relatively low probability that this
> will shrink appreciably, you seem in essence to be advocating
> altogether abandoning backward API compatibility from one (minor)
> release to the next.
>
> However, below you are advocating a minimum of one "test balloon"
> release between incompatible changes:
>
> On 01/17/2008 at 3:41 PM, Grant Ingersoll wrote:
>> [N]o interface/deprecation changes would be done without announcing
>> it
>> and there being at least one release in the meantime. Thus, if we
>> wanted to add isFancySchmancy() onto Fieldable today, it would have
>> to
>> be announced, patch provided and referenced, a release without it
>> (i.e.
>> 2.3) and then it would be available in 2.4. By ad-hoc, I meant
>> that we
>> wouldn't just announce it and then have it show up in 2.3 and not
>> give
>> people time to digest it.
>
> If I understand you correctly, a major release series could contain
> a whole series of non-aligned overlapping back-incompatible changes,
> since you are allowing individual features to break backward
> compatibility independently of other features. I think this is
> actually worse than just abandoning back-compatibility, since users
> would have to look up information on each individual feature to be
> able to figure out whether they can do a drop-in upgrade.
>
> Steve
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: Back Compatibility
On Jan 17, 2008, at 9:30 PM, DM Smith wrote:

>
> On Jan 17, 2008, at 7:57 PM, robert engels wrote:
>
>> If they are " no longer actively developing the portion of the code
>> that's broken, aren't seeking the new feature, etc", and they stay
>> back on old versions... isn't that exactly what we want? They can
>> stay on the old version, and new application development uses the
>> newer version.
>>
>> It would be different if it was a core JRE interface or similar -
>> this is an optional jar.
>>
>> Part of what always made Windows so fragile is that as it evolved
>> they tried to maintain backward compatibility - making working with
>> the old/new code and fixing bugs almost impossible. The bloat
>> became impossible to deal with.
>>
>> I bet, if you did a poll of all Lucene users, you would find a
>> majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,
>> 2.3, or 3.0, that is still going to be the case.
>
> I found that upgrading from 1.4.3 (our first version) to 1.9, to
> 2.0, ... 2.2 and even 2.3 rc was painless.
>
> The deprecations in 1.9 gave clear guidance on how to do the
> upgrade. Very easy to do. And with Lucene's robust test suite, I had
> great confidence that it would work without much testing.

And I don't think this would change at all with what I am proposing.
We would still be giving clear guidance; we would just be saying it's
going to happen in 1 year, not 2 to 6.

>
>
> Going forward was simply a matter of dropping in the new jar and
> enjoying the improved performance.
>
> The forward compatibility of the actual index was a great boon.
>
> So, while one may not be actively developing code, dropping in a new
> jar and getting huge performance gains is a great plus.
>

But even going from 2.2 to 2.3, you get even bigger gains by doing
some work and actually taking the time to update your Analysis
process, reuse tokens, etc.


-Grant

Re: Back Compatibility
18 jan 2008 kl. 03.39 skrev Grant Ingersoll:

> Does anyone have experience w/ how other open source projects deal
> with this?

Would be a pain to implement, but it could be done as libcompat.

lucene-2.4-compat-core-3.0.jar


--
karl

Re: Back Compatibility
That brings us back to an earlier discussion: "if the majority want to
break compatibility, then we should do so, and the minority can back-
port the changes to a previous release if they feel it is warranted."

I don't understand why that isn't a viable approach.

I agree that maintaining interface compatibility through versions is
a great ideal, but when the API becomes so bloated (deprecated
methods, and even usage patterns), it is much harder to learn and
use properly.

Look at similar problems and how they are handled in the JDK. The Date
class has been notorious since its inception. The Calendar class is
almost no better, and now they are developing JSR-310 to replace both.

Existing code can still use the Date or Calendar classes. But they
don't get any "newer" features. This would be similar to using the old
Lucene jar.
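The JDK analogy can be sketched with java.util alone; the deprecated Date constructor is used deliberately here to show old code still running alongside the newer API:

```java
import java.util.Calendar;
import java.util.Date;

// Old and new APIs coexisting in one runtime: code written against the
// notorious Date keeps compiling and running (with deprecation warnings),
// while newer code uses Calendar -- the parallel to keeping the old
// Lucene jar on the classpath.
public class DateCalendarCoexistence {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        Date old = new Date(108, Calendar.JANUARY, 18); // year is 1900-based: 2008
        Calendar cal = Calendar.getInstance();
        cal.setTime(old); // the two interoperate, but Date gets no new features
        System.out.println(cal.get(Calendar.YEAR)); // prints 2008
    }
}
```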


On Jan 18, 2008, at 12:31 AM, Karl Wettin wrote:

>
> 18 jan 2008 kl. 03.39 skrev Grant Ingersoll:
>
>> Does anyone have experience w/ how other open source projects deal
>> with this?
>
> Would be a pain to implement, but it could be done as libcompat.
>
> lucene-2.4-compat-core-3.0.jar
>
>
> --
> karl
>


Re: Back Compatibility
18 jan 2008 kl. 07.41 skrev robert engels:

> Look at similar problems and how they are handled in the JDK. The Date
> class has been notorious since its inception. The Calendar class is
> almost no better, and now they are developing JSR-310 to replace both.
>
> Existing code can still use the Date or Calendar classes. But they
> don't get any "newer" features. This would be similar to using the old
> Lucene jar.

Sort of keeping all versions in the trunk at once? IndexWriter2 is
IndexWriter with some features replaced with something better?
And then IndexWriter3...? That's a bit messy if you ask me. But it
would work. But terribly messy.

--
karl

Re: Back Compatibility
That wasn't what I was thinking. They would use lucene23.jar if they
wanted the 2.3 API. Newer code uses the lucene30.jar for the 3.0 API.

The others could continue to back-port 3.0 features to 2.3.X if they
wished (and could do so without changing the API - private changes
only).

I think you can look at the Oracle JDBC drivers as an example. They
warn that an API is going away in the next release, then it is gone.
Yet the new drivers may perform much better (not to mention fix a lot
of bugs). If you weren't using the old features you can easily move
to the new jar, if not, you need to change your code. Granted, it is
much better now, since they no longer have many needed proprietary
features, and rely mostly on the JDBC specification. They often
release newer versions of earlier releases when critical bugs have
been fixed.

JDBC is another good example. JDBC 3.0 requires Java 5. You cannot
use JDBC 3.0 features without it. Because of this, many of the db
vendors latest drivers are Java 5 only, and you need to use a
previous release if running under 1.4.

On Jan 18, 2008, at 1:04 AM, Karl Wettin wrote:

>
> 18 jan 2008 kl. 07.41 skrev robert engels:
>
>> Look at similar problems and how they handled in the JDK. The Date
>> class has been notorious since its inception. The Calendar class
>> is almost no better, now they are developing JSR-310 to replace both.
>>
>> Existing code can still use the Date or Calendar classes. Both
>> they don't get any "newer" features. This would be similar to use
>> the old lucene jar.
>
> Sort of keeping all versions in the trunk at once? IndexWriter2 is
> IndexWriter with some features replaced with something better?
> And then IndexWriter3...? That's a bit messy if you ask me. But it
> would work. But terribly messy.
>
> --
> karl
>


RE: Back Compatibility
> Sort of keeping all versions in the trunk at once? IndexWriter2 is
> IndexWriter with some features replaced with something better?
> And then IndexWriter3...? That's a bit messy if you ask me. But it
> would work. But terribly messy.

Brrr, I hate this. Microsoft always does this when they update their COM
interfaces...

Uwe


Re: Back Compatibility
Me too...

On Jan 18, 2008, at 4:33 AM, Uwe Schindler wrote:

>> Sort of keeping all versions in the trunk at once? IndexWriter2 is
>> IndexWriter with some features replaced with something better?
>> And then IndexWriter3...? That's a bit messy if you ask me. But it
>> would work. But terribly messy.
>
> Brrr, I hate this. Microsoft does this always when they update their
> COM
> interfaces...
>
> Uwe
>
>


Re: Back Compatibility [ In reply to ]
: If they are " no longer actively developing the portion of the code that's
: broken, aren't seeking the new feature, etc", and they stay back on old
: versions... isn't that exactly what we want? They can stay on the old version,
: and new application development uses the newer version.

This basically mirrors a philosophy that is rising in the Perl
community, evangelized by a really smart dude named chromatic:
"why are we worrying about the effect of upgrades on users who don't upgrade?"

The problem is that not all users are created equal, and not all users upgrade
for the same reasons or at the same time...

Group A: If someone is paranoid about upgrading, and is still running
Lucene 1.4.3 because they are afraid that if they upgrade their app will break
and they don't want to deal with it, they don't care about known bugs in
1.4.3, as long as those bugs haven't impacted them yet -- these
people aren't going to care whether we add a bunch of new methods to
interfaces, or remove a bunch of public methods in arbitrary releases,
because they are never going to see them. They might do a total rewrite
of their project later, and they'll worry about it then (when they have
lots of time and QA resources).

Group B: At the other extreme are the "free-spirited" developers (god I
hate that the word "agile" has been co-opted) who are always eager to
upgrade to get the latest bells and whistles, and don't mind making
changes to code and recompiling every time they upgrade -- just as long as
there are some decent docs on what to change.

Group C: In the middle is a large group of people who are interested in
upgrading, who want bug fixes, are willing to write new code to take
advantage of new features, and in some cases are even willing to make
small or medium changes to their code to get really good performance
improvements ... but they don't have a lot of time or energy to constantly
rewrite big chunks of their app. For these people, knowing that they can
"drop in" the new version and it will work is a big reason why they are
willing to upgrade, and why they are willing to spend some time
tweaking code to take advantage of the new features and the new
performance-enhanced APIs -- because they don't have to spend a lot of time
just to get the app working as well as it was before.

To draw an analogy...

Group A will stand in one place for a really long time no matter how easy
the path is. Once in a great while they will decide to march forward
dozens of miles in one big push, but only once they feel they have
adequate resources to make the entire trip at once.

Group B likes to frolic, and will happily take two steps backward and
then three steps forward every day.

Group C will walk forward with you at a steady pace, and occasionally even
take a step back before moving forward, but only if the path is clear and
not very steep.

: I bet, if you did a poll of all Lucene users, you would find a majority of
: them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 3.0, that is
: still going to be the case.

That's probably true, but a nice perk of our current backwards
compatibility commitments is that when people pop up asking questions
about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
problem" and that advice isn't a death sentence -- the steps to move
forward are small and easy.

I look at the way things like Maven v1 vs. v2 worked out, and how
that fractured the community for a long time (as far as I can tell it's
still pretty fractured) because the path from v1 to v2 was so steep and
involved so much backtracking, and I worry that if we make changes to our
"compatibility pledge" that don't allow for an even forward walk, we'll
wind up with a heavily fractured community.



-Hoss


Re: Back Compatibility [ In reply to ]
: I guess I am suggesting that instead of maintaining the whole major/minor
: thing (not including file format) that we relax a bit and say that any given
: feature we choose to remove or add has to go through two release cycles, which
: according to your averages, would equal just over 1 year's time. If people
: can't adapt to a coming change that was announced a year ago, then I have to
: wonder why they need to upgrade at all. (OK, that is a little strong, but...)

As someone else pointed out somewhere in this thread (reading it all at
once, I'm losing track), this will make it harder for people to understand
how much effort it will be to go from version 3.4 to 3.5 ... is that a
drop-in replacement?

Perhaps the crux of the issue is that we as a community need to become
more willing to crank out "major" releases ... if we just released 3.0 and
now someone came up with the "Magic" field type and it's really magical
and we want to start using it but it's not backwards compatible -- well, I
guess our next release just needs to be called 4.0 then ... it's clear
from the version number that this is a significant change, even if it does
wind up getting released 3 months after v3.0.

: And again, I still think the more pertinent issue that needs to be addressed
: is how to better handle bugs in things like Tokenization, etc. where people
: may have dependencies on broken functionality, but haven't fully comprehended
: that they have such a dependency. I don't think those fixes/deprecations
: should have to wait for a major release.

I think situations like this are the one place where using system
properties to force broken/legacy behavior would really make sense ... we
fix the code so all "new" users get the correct/better behavior, and we
document in CHANGES.txt that the behavior has changed. The code is
drop-in compatible for anyone who isn't relying on broken behavior, and if
you are, you can set a system property to force the old behavior.
(Caveat: to support the few cases people have mentioned where you can't
set system properties easily (applets, I think?), a static method should be
provided as well, so if you need old broken behavior *AND* you can't use
system properties, you just have to add one line of code to your app.)
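That escape hatch could be sketched roughly like this (everything here is hypothetical, not real Lucene API -- the property name, class, and methods are invented for illustration): the fixed behavior is the default, a system property forces the legacy behavior at startup, and a static setter covers environments like applets where system properties can't be set.

```java
// Hypothetical sketch of the system-property-plus-static-method escape hatch.
// Nothing here is real Lucene API; the property and class names are invented.
public class LegacyBehavior {

    // Read once at startup: -Dlucene.legacyTokenization=true forces old behavior.
    private static volatile boolean legacyTokenization =
        Boolean.getBoolean("lucene.legacyTokenization");

    /** The one-line programmatic switch for apps (e.g. applets) that
     *  cannot set system properties. */
    public static void setLegacyTokenization(boolean on) {
        legacyTokenization = on;
    }

    /** Consulted internally wherever the fixed code path diverges from the old one. */
    public static String tokenizationMode() {
        return legacyTokenization ? "legacy" : "fixed";
    }

    public static void main(String[] args) {
        System.out.println(tokenizationMode()); // "fixed" unless the property was set
        setLegacyTokenization(true);            // the one line of code mentioned above
        System.out.println(tokenizationMode()); // "legacy"
    }
}
```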


: Does anyone have experience w/ how other open source projects deal with this?

Poorly.

The best solution I've seen is to support multiple "stable" branches.
We've talked about doing that before, but there haven't been any features
anyone has stepped up to backport to an older version since that
discussion. (Probably because we've done such a good job of making it easy
for people to upgrade.)

As I mentioned elsewhere in this thread: I worry about the community
fragmenting if we raise the bar on upgrading in order to lower the bar on
development ... having multiple "stable" branches seems like it could also
fragment the community very easily... people using 3.2.X releases not
being able to interact/help with people using 2.4.Y on the user list
because certain things work drastically differently.

Backporting bug fixes is one thing, but I'm leery of backporting new
features and performance improvements (not that I would object to anyone
doing so ... I'm just scared of where it might lead).


-Hoss


Re: Back Compatibility [ In reply to ]
I don't think Group C is interested in bug fixes. I just don't see
how Lucene is at all useful if the users are encountering any bug --
so they either don't use that feature, or they have already developed
a work-around (or they have patched the code in a way that avoids the
bug, yet is specific to their environment).

For example, I think the NFS work (bugs, fixes, etc.) was quite
substantial. I think the actual number of people trying to use NFS is
probably very low - as the initial implementation had so many
problems (and IMO is not a very good solution for distributed indexes
anyway). So all the work in trying to make NFS work "correctly"
behind the scenes may have been inefficient, since a more direct, yet
major fix may have solved the problem better (like distributed server
support, not shared index access).

I just think that trying to maintain API compatibility through major
releases is a bad idea. It leads to bloat and complex code -- both
internal and external. Achieving great gains in usability
and/or performance in a mature product like Lucene almost certainly
requires massive changes to the processes, algorithms, and structures,
and the API should change as well to reflect this.

On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:

> [...]


Re: Back Compatibility [ In reply to ]
One more example on this. A lot of work was done on transaction
support. I would argue that this falls way short of what is needed,
since there is no XA transaction support. Since the lucene index
(unless stored in an XA db) is a separate resource, it really needs
XA support in order to be consistent with the other resources.

All of the transaction work that has been performed only guarantees
that, barring a physical hardware failure, the Lucene index can be
opened and used in a known state. That index, though, is probably not
consistent with the other resources.

All that was done is that we can now guarantee that the index is
consistent at SOME point in time.

Given the work that was done, we are probably closer to adding XA
support, but I think this would be much easier if the concept of a
transaction was made first class through the API (and then XA
transactions need to be supported).

On Jan 22, 2008, at 2:49 PM, robert engels wrote:

> [...]


Re: Back Compatibility [ In reply to ]
I humbly disagree about NFS. Arguing about where free time was invested,
or wasted, or inefficient, in an open source project just seems silly.
One of the great benefits is esoteric work that would normally not be
allowed for. NFS is easy. A lot of Lucene users don't care about Lucene.
They just want something easy to set up. It especially doesn't make sense
when talking about Michael. He seems to spit out Lucene code in his
sleep. I doubt the NFS stuff did anything but make him more brilliant at
manipulating Lucene. It certainly hasn't made him any less prolific.

I am very in favor of your talk about transactional support. Man do I
want Lucene to have that. But the fact that we are getting to where the
index cannot be corrupted is still a great step forward. Knowing that my
indexes will not be corrupted while running at a place that needs access
24/7 is just wonderful. I can get something working for them quickly,
whether it's lost a bit of data or not. Now, full support to guarantee
that my Lucene index is consistent with my Database? Even better. I
wish. But I am still very thankful for the first step of a guaranteed
consistent index.

Your glass is always half full ;) I aspire to your crankiness when I get
older.

- Mark


robert engels wrote:
> [...]

Re: Back Compatibility [ In reply to ]
On Jan 22, 2008, at 3:45 PM, Chris Hostetter wrote:
>
> Perhaps the crux of the issue is that we as a community need to become
> more willing to crank out "major" releases ... if we just released
> 3.0 and
> now someone came up with the "Magic" field type and it's really magical
> and we want to start using it but it's not backwards compatible -- well, I
> guess our next release just needs to be called 4.0 then ... it's clear
> from the version number that this is a significant change, even if it does
> wind up getting released 3 months after v3.0

To paraphrase a dead English guy: A rose by any other name is still
the same, right?

Basically, all the version number tick saves them from is having to
read the CHANGES file, right?

To some extent, I am proposing that we clean out the cruft once a
year. Consider it spring cleaning. If we want to mark it as a major
version, I am fine with that. Basically, if history is any
indication, this would mean our releases will look like (given our
avg. 6 mos release cycle):
3.0
3.1
4.0
4.1
5.0
5.1

Thus, the version numbers become meaningless; the question is what do
we see as best for Lucene? We could just as easily call it Lucene
Summer '08 and Lucene Winter '08. Heck, we could pull the old MS Word
2.0 to MS Word 6.0 and jump to Lucene 6.0 next, too, for all I care.
I think one year is plenty long to keep both Group B and Group C happy (A
will be oblivious). A once-a-year cleanup of code, in my mind, is not
too burdensome for those in Group C. I consider myself in Group C
for most tools I use (Lucene is probably the exception), and I can't
recall the last time I had an application that uses Lucene-like stuff
that I hadn't touched in over a year. But even for those other
tools, I expect that I am going to have a major upgrade once a year.
In fact, it is often part of the license agreement from commercial
companies and I would feel cheated if I didn't get it.

I think one could even argue that Group C would be happier w/ more
frequent removals of cruft, since it can be handled in a more
incremental way, versus an all at once upgrade every 2 years. They
have the option of how big of a chunk to bite off.


-Grant




Re: Back Compatibility [ In reply to ]
Grant Ingersoll wrote:
> Does anyone have experience w/ how other open source projects deal with
> this?

Use abstract base classes instead of interfaces: they're much easier to
evolve back-compatibly. In Hadoop, for example, we really wish that
Mapper and Reducer were not interfaces and are very happy that
FileSystem is an abstract class.
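Doug's point can be illustrated with a small sketch (the class names below are invented for the example, not real Hadoop or Lucene types): a later release can add a method to an abstract base class with a default implementation, and subclasses compiled against the old version keep working, whereas adding the same method to an interface (in the Java of this era) breaks every implementor.

```java
// Illustrative sketch only -- these types are invented for the example.
abstract class TokenProcessor {
    abstract String process(String token);

    // Added in a later release. Old subclasses inherit this default and
    // still compile; had TokenProcessor been an interface, every existing
    // implementor would break when this method was added.
    public boolean acceptsEmptyTokens() {
        return false;
    }
}

// Written against the *old* API, before acceptsEmptyTokens() existed.
class LowerCaser extends TokenProcessor {
    @Override
    String process(String token) {
        return token.toLowerCase();
    }
}

public class AbstractVsInterface {
    public static void main(String[] args) {
        TokenProcessor p = new LowerCaser();
        System.out.println(p.process("HELLO"));      // hello
        System.out.println(p.acceptsEmptyTokens());  // false -- inherited default
    }
}
```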

Doug

Re: Back Compatibility [ In reply to ]
: To paraphrase a dead English guy: A rose by any other name is still the same,
: right?
:
: Basically, all the version number tick saves them from is having to read the
: CHANGES file, right?

Correct: i'm not disagreeing with your basic premise, just pointing out
that it can be done with the current model, and that predictable "version
identifiers" are a good idea when dealing with backwards compatibility.

: Thus, the version numbers become meaningless; the question is what do we see
: as best for Lucene? We could just as easily call it Lucene Summer '08 and
: Lucene Winter '08. Heck, we could pull the old MS Word 2.0 to MS Word 6.0 and

well .. i would argue that with what you hypothesized *then* version numbers
would become meaningless ... having 3.0, 3.1, 3.2, 4.0 would be no
different than having 3, 4, 5, 6 -- our version numbers would be
identifiers with no other context ... i'm just saying we should keep the
context in so that you know whether or not version X is backwards
compatible with version Y.

Which is not to say that we shouldn't change our version number format...

Ie: we could start using quad-tuple version numbers: 3.2.5.0 instead of 3.5.0

3: major version #
   identifies file format back compatibility (as today)
2: API compat version #
   classes/methods may be removed when this changes
5: minor version #
   new methods may be added when this changes (as today)
0: patch version #
   changes only when there are serious bug fixes

...that might mean that our version numbers go...

3.0.0.0
3.0.1.0
3.1.0.0
3.1.1.0
3.1.2.0
3.2.0.0

...where most numbers never get above "2" but at least the version number
conveys useful compatibility information (at no added developer "cost")
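
Hoss's quad-tuple scheme could be modeled directly in code; a rough sketch (the `QuadVersion` class and its method names are invented here, purely to illustrate the compatibility rules above):

```java
// Sketch of the proposed 4-part scheme: major.apiCompat.minor.patch
final class QuadVersion {
    final int major, apiCompat, minor, patch;

    QuadVersion(int major, int apiCompat, int minor, int patch) {
        this.major = major;
        this.apiCompat = apiCompat;
        this.minor = minor;
        this.patch = patch;
    }

    static QuadVersion parse(String s) {
        String[] p = s.split("\\.");
        return new QuadVersion(Integer.parseInt(p[0]), Integer.parseInt(p[1]),
                Integer.parseInt(p[2]), Integer.parseInt(p[3]));
    }

    // Same index file format: the major number matches (as today).
    boolean fileFormatCompatibleWith(QuadVersion other) {
        return major == other.major;
    }

    // Drop-in API upgrade from an older release: nothing was removed
    // (same major + apiCompat) and this version is at least as new.
    boolean dropInUpgradeFrom(QuadVersion older) {
        return major == older.major && apiCompat == older.apiCompat
                && (minor > older.minor
                    || (minor == older.minor && patch >= older.patch));
    }
}
```

With this, a user can answer "can I drop in the new JAR?" by comparing version strings alone, which is exactly the point of keeping the context in the number.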



-Hoss


Re: Back Compatibility [ In reply to ]
I think there are a lot of applications using Lucene where "whether
it's lost a bit of data or not" is not acceptable.

However, it is probably fine for a web search, or intranet search.

As to your first point, that is why the really great open-source
projects (Eclipse, OpenOffice) have a financial backer that provides
significant direction and contributions. They wouldn't waste their
resources developing esoteric features with little appeal, and instead
direct their resources to broader features that others can then develop
finer features on top of.

I don't question the abilities of Michael whatsoever - I just wish
they were directed at broader features. The review by the voters (and
this list) allows development to be focused.

Frequently, perfectly correct patches are rejected by the voters. Why?
Because SOMEONE needs to keep the development focused - if not, there
will be chaos.

On Jan 22, 2008, at 4:19 PM, Mark Miller wrote:

> I humbly disagree about NFS. Arguing about where free time was
> invested, or wasted, or inefficient, in an open source project just
> seems silly. One of the great benefits is esoteric work that would
> normally not be allowed for. NFS is easy. A lot of Lucene users
> don't care about Lucene. They just want something easy to set up. It
> especially doesn't make sense when talking about Michael. He seems
> to spit out Lucene code in his sleep. I doubt the NFS stuff did
> anything but make him more brilliant at manipulating Lucene. It
> certainly hasn't made him any less prolific.
>
> I am very in favor of your talk about transactional support. Man do
> I want Lucene to have that. But the fact that we are getting to
> where the index cannot be corrupted is still a great step forward.
> Knowing that my indexes will not be corrupted while running at a
> place that needs access 24/7 is just wonderful. I can get something
> working for them quickly, whether it's lost a bit of data or not. Now
> full support to guarantee that my Lucene index is consistent with
> my Database? Even better. I wish. But I am still very thankful for
> the first step of a guaranteed consistent index.
>
> Your glass is always half full ;) I aspire to your crankiness when
> I get older.
>
> - Mark
>
>
> robert engels wrote:
>> One more example on this. A lot of work was done on transaction
>> support. I would argue that this falls way short of what is
>> needed, since there is no XA transaction support. Since the lucene
>> index (unless stored in an XA db) is a separate resource, it
>> really needs XA support in order to be consistent with the other
>> resources.
>>
>> All of the transaction work that has been performed only
>> guarantees that barring a physical hardware failure the lucene
>> index can be opened and used at a known state. This index though
>> is probably not consistent with the other resources.
>>
>> All that was done is that we can now guarantee that the index is
>> consistent at SOME point in time.
>>
>> Given the work that was done, we are probably closer to adding XA
>> support, but I think this would be much easier if the concept of a
>> transaction was made first class through the API (and then XA
>> transactions need to be supported).
>>
>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>
>>> I don't think group C is interested in bug fixes. I just don't
>>> see how Lucene is at all useful if the users are encountering any
>>> bug - so they either don't use that feature, or they have already
>>> developed a work-around (or they have patched the code in a way
>>> that avoids the bug, yet is specific to their environment).
>>>
>>> For example, I think the NFS work (bugs, fixes, etc.) was quite
>>> substantial. I think the actual number of people trying to use
>>> NFS is probably very low - as the initial implementation had so
>>> many problems (and IMO is not a very good solution for
>>> distributed indexes anyway). So all the work in trying to make
>>> NFS work "correctly" behind the scenes may have been inefficient,
>>> since a more direct, yet major fix may have solved the problem
>>> better (like distributed server support, not shared index access).
>>>
>>> I just think that trying to maintain API compatibility through
>>> major releases is a bad idea. Leads to bloat, and complex code -
>>> both internal and external. In order to achieve great gains in
>>> usability and/or performance in a mature product like Lucene
>>> almost certainly requires massive changes to the processes,
>>> algorithms and structures, and the API should change as well to
>>> reflect this.
>>>
>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If they are " no longer actively developing the portion of the
>>>> code that's
>>>> : broken, aren't seeking the new feature, etc", and they stay
>>>> back on old
>>>> : versions... isn't that exactly what we want? They can stay on
>>>> the old version,
>>>> : and new application development uses the newer version.
>>>>
>>>> This basically mirrors a philosophy that is rising in the Perl
>>>> community evangelized by (a really smart dude named chromatic) ...
>>>> "why are we worrying about the effect of upgrades on users who
>>>> don't upgrade?"
>>>>
>>>> The problem is not all users are created equal and not all users
>>>> upgrade
>>>> for the same reasons or at the same time...
>>>>
>>>> Group A: If someone is paranoid about upgrading, and is still
>>>> running
>>>> lucene1.4.3 because they are afraid if they upgrade their app
>>>> will break
>>>> and they don't want to deal with it; they don't care about known
>>>> bugs in
>>>> lucene1.4.3, as long as those bugs haven't impacted them yet --
>>>> these
>>>> people aren't going to care whether we add a bunch of new methods to
>>>> interfaces, or remove a bunch of public methods from arbitrary
>>>> releases,
>>>> because they are never going to see them. They might do a total
>>>> rewrite
>>>> of their project later, and they'll worry about it then (when
>>>> they have
>>>> lots of time and QA resources)
>>>>
>>>> Group B: At the other extreme are the "free-spirited"
>>>> developers (god i
>>>> hate that the word "agile" has been co-opted) who are
>>>> always eager to
>>>> upgrade to get the latest bells and whistles, and don't mind making
>>>> changes to code and recompiling every time they upgrade -- just
>>>> as long as
>>>> there are some decent docs on what to change.
>>>>
>>>> Group C: In the middle is a large group of people who are
>>>> interested in
>>>> upgrading, who want bug fixes, are willing to write new code to
>>>> take
>>>> advantage of new features, in some cases are even willing to make
>>>> small or medium changes to their code to get really good performance
>>>> improvements ... but they don't have a lot of time or energy to
>>>> constantly
>>>> rewrite big chunks of their app. For these people, knowing that
>>>> they can
>>>> "drop in" the new version and it will work is a big reason why
>>>> they are
>>>> willing to upgrade, and why they are willing to spend some time
>>>> tweaking code to take advantage of the new features and the new
>>>> performance-enhanced APIs -- because they don't have to spend a
>>>> lot of time
>>>> just to get the app working as well as it was before.
>>>>
>>>> To draw an analogy...
>>>>
>>>> Group A will stand in one place for a really long time no matter
>>>> how easy
>>>> the path is. Once in a great while they will decide to march
>>>> forward
>>>> dozens of miles in one big push, but only once they feel they have
>>>> adequate resources to make the entire trip at once.
>>>>
>>>> Group B likes to frolic, and will happily take two steps
>>>> backward and
>>>> then 3 steps forward every day.
>>>>
>>>> Group C will walk forward with you at a steady pace, and
>>>> occasionally even
>>>> take a step back before moving forward, but only if the path is
>>>> clear and
>>>> not very steep.
>>>>
>>>> : I bet, if you did a poll of all Lucene users, you would find a
>>>> majority of
>>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3,
>>>> or 3.0, that is
>>>> : still going to be the case.
>>>>
>>>> That's probably true, but a nice perk of our current backwards
>>>> compatibility commitments is that when people pop up asking
>>>> questions
>>>> about 1.4.3, we can tell them "upgrading to 2.0.0 solves your
>>>> problem" and that advice isn't a death sentence -- the steps to
>>>> move
>>>> forward are small and easy.
>>>>
>>>> I look at the way things like Maven v1 vs. v2 worked out,
>>>> and how
>>>> that fractured the community for a long time (as far as i can
>>>> tell it's
>>>> still pretty fractured) because the path from v1 to v2 was so
>>>> steep and
>>>> involved backtracking so much and i worry that if we make
>>>> changes to our
>>>> "compatibility pledge" that don't allow for an even forward walk,
>>>> we'll
>>>> wind up with a heavily fractured community.
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>>


Re: Back Compatibility [ In reply to ]
robert engels wrote:
> I think there are a lot of applications using Lucene where "whether
> its lost a bit of data or not" is not acceptable.
Yeah, and I have one of them. Which is why I would love the support you're
talking about. But it's not there yet, and I am just grateful that I can
get my customers back up and searching as quickly as possible rather than
experience an index corruption. Access to the data is more important
than complete access to the data for my customers (though they'd say they
certainly want both). After such an experience I have to run through the
database and check if anything from the index is missing, and if it is,
re-index. Not ideal, but what can you do? I find it odd that you don't
think non-corruption is better than nothing. It's a big feature for me.
If the server reboots at night and causes a corruption, I have customers
that will be SOL for some time... I'd prefer that when the server reboots,
my index (whatever is left) is searchable. My customers need to work.
Can't get behind on a daily product :)

I'd prefer what you're talking about, but there are tons of other things
I'd love to see in Lucene as well. It just seems odd to complain about
them. I'd think that instead, I might spearhead the development. I'm just
not experienced enough myself to do a lot of the deeper work. You don't
appear so limited. How about helping out with some transactional support :)


Re: Back Compatibility [ In reply to ]
A specific example:

You have a criminal justice system that indexes past court cases.

You do a search for cases involving Joe Smith because you are a judge
and you want to review priors before sentencing. Similar issues with
related cases, case history, etc.

Is it better to return something that may not be correct, or return
an error saying the index is offline and is being rebuilt - please
perform your search later? In this case old false positives are just
as bad as missing new records. I hope that demonstrates the position
clearly.

As I stated, there are several classes of applications where "any
data", whether it is current or valid, is acceptable, but I would argue
that in MOST cases it is not, and if the interested
parties fully reviewed their requirements, they would not accept that
solution. It is easily summarized by the old adage "garbage in,
garbage out".

The only reason that corruption is OK is that you need to reindex
anyway, and rebuilding from scratch is often faster than determining
the affected documents and updating them (especially if corruption is
a possibility).

It was in fact I who raised the issue that none of the
"lockless commits" code fixed anything related to corruption. The
only way to ensure non-corruption is to sync all data files, then
write and sync the segments file. I think this change could have
been accomplished in about 10 lines of code; it is completely
independent of lockless commits, and in most cases makes lockless
commits obsolete. But to be honest, I am not really certain how
lockless commits can actually work in an environment that allows
updates to the documents (and/or related resources), so I am sure
there are aspects I am just ignorant of.
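
The ordering Robert describes (make the segment data durable first, then write and sync the segments file that points at it) is the classic write-ahead pattern. A rough sketch in plain java.io, where the `DurableCommit` class and file names are illustrative, not Lucene's actual commit code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

final class DurableCommit {
    // Write bytes to a file and force them to stable storage before returning.
    static void writeAndSync(File f, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
            out.getFD().sync(); // fsync: durable before we proceed
        }
    }

    // Robert's ordering: segment data must be durable *before* the
    // segments file that references it is written. If we crash between
    // the two steps, the old segments file still describes a valid index.
    static void commit(File dir, byte[] segmentData, byte[] segmentsFile)
            throws IOException {
        writeAndSync(new File(dir, "_1.dat"), segmentData);    // 1. data first
        writeAndSync(new File(dir, "segments"), segmentsFile); // 2. then pointer
    }
}
```

Because readers only ever follow the segments file, a crash at any point leaves them looking at either the old, fully-synced commit or the new one, never a half-written mix.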

As an aside, we engineered our software years ago to work around
these issues, which is why we still use a 1.9 derivative and monitor
the trunk for important fixes and enhancements.



Re: Back Compatibility [ In reply to ]
Catching up here...

Re the fracturing when Maven went from v1 -> v2: I think Lucene is a
totally different animal. Maven is an immense framework; Lucene is a
fairly small "core" set of APIs. I think for these "core" type
packages it's very important to keep drop-in compatibility as long as
possible.

I think we _really_ want our users to upgrade. Yes, there are a lot of
Group A people who will forever be stuck in the past, but let's not create
barriers for them to switch to Group C, or for Group C to upgrade.
When someone is running old versions of Lucene, it only hurts their (&
their friends' & their users') perception of Lucene.

I think we've done a good job keeping backwards compatibility despite
some rather major recent changes:

* We now do segment merging in a BG thread

* We now flush by RAM (16 MB default) not at 10 buffered docs

* Merge selection is based on size of segment in bytes not doc count

* We will (in 2.4) "autoCommit" far less often (LUCENE-1044)

Now, we could have forced these into a major release instead, but I
don't think we should have. As much as possible I think we should
stick to minor releases (keeping backwards compatibility) so people
can always upgrade more easily.

As far as I know, the only solid reason for 3.0 is the
non-backwards-compatible switch to Java 1.5?

I do like the idea of a static/system property to match legacy
behavior. For example, for the bugs around how StandardTokenizer
mislabels tokens (e.g. LUCENE-1100), this would be the perfect solution.
Clearly those are silly bugs that should be fixed, quickly, with this
back-compatible mode to keep the old behavior in place.

We might want to, instead, have ctors for many classes take a required
arg which states the version of Lucene you are using? So if you are
writing a new app you would pass in the current version. Then, on
dropping in a future Lucene JAR, we could use that arg to enforce the
right backwards compatibility. This would save users from having to
realize they are hitting one of these situations and then know to go
set the right static/property to retain the buggy behavior.
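
Mike's required-version-argument idea might look something like the following. This is a hypothetical sketch, not a shipped Lucene API at the time of this thread; the `LuceneVersion` enum, `SketchTokenizer` class, and the specific token-type behavior are all invented to illustrate the mechanism:

```java
// Sketch of the proposal: a required "version you wrote against"
// argument selects legacy vs. fixed behavior. All names are invented.
enum LuceneVersion { V_2_3, V_2_4 }

final class SketchTokenizer {
    private final LuceneVersion matchVersion;

    SketchTokenizer(LuceneVersion matchVersion) {
        this.matchVersion = matchVersion;
    }

    // Pretend LUCENE-1100-style fix: suppose 2.3 mislabeled dotted tokens
    // as ACRONYM and 2.4 labels them HOST. An app that passed V_2_3 at
    // construction keeps getting the buggy-but-expected labels even after
    // dropping in the newer JAR; new apps pass V_2_4 and get the fix.
    String tokenType(String token) {
        boolean fixed = matchVersion.compareTo(LuceneVersion.V_2_4) >= 0;
        if (token.contains(".") && !token.endsWith(".")) {
            return fixed ? "HOST" : "ACRONYM";
        }
        return "WORD";
    }
}
```

The appeal is exactly what Mike describes: the compatibility decision is made once, in the constructor call the app already has, rather than via a static/property the user must discover after being bitten.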

Also, backporting is extremely costly over time. I'd much rather keep
compatibility for longer on our forward releases, than spend our
scarce resources moving changes back.

So to summarize ... I think we should have (keep) a high tolerance for
cruft to maintain API compatibility. I think our current approach
(try hard to keep compatibility during "minor" releases, then
deprecate, then remove APIs on a major release; do major releases only
when truly required) is a good one.

Mike



Re: Back Compatibility [ In reply to ]
Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
core missing in order for you (or, someone) to build XA compliance on
top of it?

Ie, you can open a writer with autoCommit=false and no changes are
committed until you close it. You can abort the session by calling
writer.abort(). What's still missing, besides LUCENE-1044?

Mike



Re: Back Compatibility [ In reply to ]
That's where Robert is confusing me as well. To have XA support you just
need to be able to define a transaction, atomically commit, or roll back.
You also need a consistent state after any of these operations.
LUCENE-1044 seems to guarantee that, so isn't it more like finishing
up needed work than going down the wrong path? It seems to me (and
obviously I know a lot less about this than either of you) that you have
just gotten Lucene ready to add XA support. Lucene now fulfills all of
the requirements. No? Someone just needs to write a boatload of JTA code :)

It would seem the next step would be, as Robert suggests, to make a
transaction a first class citizen. The XA protocol will require Lucene
to communicate with the TM about what transactions it has completed to
help in failure recovery and transaction management. I can certainly see
the need for a better transaction abstraction to help with this.
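
A first-class transaction abstraction of the kind being discussed might, very roughly, expose the two-phase prepare/commit/rollback contract an XA adapter would drive. Everything below is a hypothetical sketch with an in-memory stand-in for the index, not Lucene code:

```java
// Hypothetical two-phase transaction facade over an index session.
interface IndexTransaction {
    void add(String doc);  // buffer a change
    boolean prepare();     // phase 1: make changes durable, vote yes/no
    void commit();         // phase 2: publish the new commit point
    void rollback();       // discard buffered changes
}

// In-memory stand-in: the "index" is just a list of committed docs.
final class InMemoryIndexTransaction implements IndexTransaction {
    private final java.util.List<String> committed;
    private final java.util.List<String> pending = new java.util.ArrayList<>();
    private boolean prepared = false;

    InMemoryIndexTransaction(java.util.List<String> committed) {
        this.committed = committed;
    }

    public void add(String doc) { pending.add(doc); }

    public boolean prepare() {
        prepared = true; // a real impl would flush + fsync segment files here
        return true;     // vote "yes" to the transaction manager
    }

    public void commit() {
        if (!prepared) throw new IllegalStateException("commit before prepare");
        committed.addAll(pending); // real impl: write + sync the segments file
        pending.clear();
    }

    public void rollback() { pending.clear(); prepared = false; }
}
```

With the transaction reified like this, a JTA `XAResource` adapter could map its prepare/commit/rollback callbacks onto these methods, which is the "first-class citizen" step Robert argues should come before full XA support.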

A little enlightenment on this would be great, Robert. I am very
interested in it for future projects.

And I have to point out... it just seems logical that we would make
things so that the index is consistent at some point before taking the
next step of making it consistent with other resources... no? I am just
still confused about Robert's objections to what is going on here. I
think that it would be a real leap forward to get it done, though.

Also, as he mentioned, we really need a good distributed system that
allows for index partitioning. That's the ticket to more enterprise
adoption. Could be Solr's work, though...

Michael McCandless wrote:
>
> Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
> core missing in order for you (or, someone) to build XA compliance on
> top of it?
>
> Ie, you can open a writer with autoCommit=false and no changes are
> committed until you close it. You can abort the session by calling
> writer.abort(). What's still missing, besides LUCENE-1044?
>
> Mike
>
> robert engels wrote:
>
>> One more example on this. A lot of work was done on transaction
>> support. I would argue that this falls way short of what is needed,
>> since there is no XA transaction support. Since the lucene index
>> (unless stored in an XA db) is a separate resource, it really needs
>> XA support in order to be consistent with the other resources.
>>
>> All of the transaction work that has been performed only guarantees
>> that barring a physical hardware failure the lucene index can be
>> opened and used at a known state. This index though is probably not
>> consistent with the other resources.
>>
>> All that was done is that we can now guarantee that the index is
>> consistent at SOME point in time.
>>
>> Given the work that was done, we are probably closer to adding XA
>> support, but I think this would be much easier if the concept of a
>> transaction was made first class through the API (and then XA
>> transactions need to be supported).
>>
>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>
>>> I don't think group C is interested in bug fixes. I just don't see
>>> how Lucene is at all useful if the users are encountering any bug -
>>> so they either don't use that feature, or they have already
>>> developed a work-around (or they have patched the code in a way that
>>> avoids the bug, yet is specific to their environment).
>>>
>>> For example, I think the NFS work (bugs, fixes, etc.) was quite
>>> substantial. I think the actual number of people trying to use NFS
>>> is probably very low - as the initial implementation had so many
>>> problems (and IMO is not a very good solution for distributed
>>> indexes anyway). So all the work in trying to make NFS work
>>> "correctly" behind the scenes may have been inefficient, since a
>>> more direct, yet major fix may have solved the problem better (like
>>> distributed server support, not shared index access).
>>>
>>> I just think that trying to maintain API compatibility through major
>>> releases is a bad idea. Leads to bloat, and complex code - both
>>> internal and external. In order to achieve great gains in usability
>>> and/or performance in a mature product like Lucene almost certainly
>>> requires massive changes to the processes, algorithms and
>>> structures, and the API should change as well to reflect this.
>>>
>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If they are "no longer actively developing the portion of the code
>>>> : that's broken, aren't seeking the new feature, etc", and they stay
>>>> : back on old versions... isn't that exactly what we want? They can
>>>> : stay on the old version, and new application development uses the
>>>> : newer version.
>>>>
>>>> This basically mirrors a philosophy that is rising in the Perl
>>>> community, evangelized by a really smart dude named chromatic:
>>>> "why are we worrying about the effect of upgrades on users who don't
>>>> upgrade?"
>>>>
>>>> The problem is that not all users are created equal, and not all users
>>>> upgrade for the same reasons or at the same time...
>>>>
>>>> Group A: If someone is paranoid about upgrading, and is still running
>>>> lucene1.4.3 because they are afraid if they upgrade their app will
>>>> break and they don't want to deal with it; they don't care about known
>>>> bugs in lucene1.4.3, as long as those bugs haven't impacted them yet --
>>>> these people aren't going to care whether we add a bunch of new methods
>>>> to interfaces, or remove a bunch of public methods from arbitrary
>>>> releases, because they are never going to see them. They might do a
>>>> total rewrite of their project later, and they'll worry about it then
>>>> (when they have lots of time and QA resources).
>>>>
>>>> Group B: At the other extreme are the "free-spirited" developers (god i
>>>> hate that the word "agile" has been co-opted) who are always eager to
>>>> upgrade to get the latest bells and whistles, and don't mind making
>>>> changes to code and recompiling every time they upgrade -- just as long
>>>> as there are some decent docs on what to change.
>>>>
>>>> Group C: In the middle is a large group of people who are interested in
>>>> upgrading, who want bug fixes, are willing to write new code to take
>>>> advantage of new features, and in some cases are even willing to make
>>>> small or medium changes to their code to get really good performance
>>>> improvements ... but they don't have a lot of time or energy to
>>>> constantly rewrite big chunks of their app. For these people, knowing
>>>> that they can "drop in" the new version and it will work is a big
>>>> reason why they are willing to upgrade, and why they are willing to
>>>> spend some time tweaking code to take advantage of the new features and
>>>> the new performance-enhanced APIs -- because they don't have to spend a
>>>> lot of time just to get the app working as well as it was before.
>>>>
>>>> To draw an analogy...
>>>>
>>>> Group A will stand in one place for a really long time no matter how
>>>> easy the path is. Once in a great while they will decide to march
>>>> forward dozens of miles in one big push, but only once they feel they
>>>> have adequate resources to make the entire trip at once.
>>>>
>>>> Group B likes to frolic, and will happily take two steps backward and
>>>> then three steps forward every day.
>>>>
>>>> Group C will walk forward with you at a steady pace, and occasionally
>>>> even take a step back before moving forward, but only if the path is
>>>> clear and not very steep.
>>>>
>>>> : I bet, if you did a poll of all Lucene users, you would find a
>>>> : majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,
>>>> : 2.3, or 3.0, that is still going to be the case.
>>>>
>>>> That's probably true, but a nice perk of our current backwards
>>>> compatibility commitments is that when people pop up asking questions
>>>> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves
>>>> your problem" and that advice isn't a death sentence -- the steps to
>>>> move forward are small and easy.
>>>>
>>>> I look at the way things like Maven v1 vs v2 worked out, and how that
>>>> fractured the community for a long time (as far as i can tell it's
>>>> still pretty fractured) because the path from v1 to v2 was so steep and
>>>> involved so much backtracking, and i worry that if we make changes to
>>>> our "compatibility pledge" that don't allow for an even forward walk,
>>>> we'll wind up with a heavily fractured community.
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: Back Compatibility [ In reply to ]
On Jan 23, 2008 9:53 AM, Mark Miller <markrmiller@gmail.com> wrote:
> Also, as he mentioned, we really need a good distributed system that
> allows for index partitioning. That's the ticket to more enterprise
> adoption. Could be Solr's work though...

Yes, we're working on that :-)

-Yonik

Re: Back Compatibility [ In reply to ]
Maybe I don't understand lockless commits then.

I just don't think you can enforce transactional consistency without
either 1) locking, or 2) optimistic collision detection. I could be
wrong here, but this has been my experience.

By effectively removing the locking requirement, I think you are
going to have users developing code without thought as to what is
going to happen when locking is added. This is going to break the
backwards compatibility that people are striving for.

The lucene "writer" structure needs to be something like:

start tx for update
do work
commit

where commit is composed of prepare and commit phases, but the commit
may fail.

It is unknown if this can actually happen though, since there is no
unique ID that could cause collisions, but there is the internal id
(which would need to remain constant throughout the tx in order for
queries and delete operations to work).

I am sure the problem is that I don't understand lockless commits, so I
will give a scenario.

client A issues query looking for documents with OID (a field) =
"some field";
client B issues same query
both queries return nothing found
client A inserts document with OID = "some field"
client B inserts document with OID = "some field"

client A commits and client B commits

unless B is blocked once A issues the query, the index is going to
end up with two different copies of the document.
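The scenario above is a check-then-insert race: the query and the insert are separate steps, so two clients can interleave between them. A minimal single-threaded sketch of the idea (OidIndex and its methods are hypothetical illustrations, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Lucene API): an "index" with no
// unique-key support, where query and insert are separate steps.
class OidIndex {
    private final List<String> docs = new ArrayList<String>();

    boolean contains(String oid) { return docs.contains(oid); }
    void insert(String oid) { docs.add(oid); }
    int size() { return docs.size(); }

    // The interleaving above: both clients query before either inserts,
    // so both see "not found" and both insert -- a duplicate results.
    static int raceScenario() {
        OidIndex index = new OidIndex();
        boolean aFound = index.contains("some field"); // client A queries
        boolean bFound = index.contains("some field"); // client B queries
        if (!aFound) index.insert("some field");       // client A inserts
        if (!bFound) index.insert("some field");       // client B inserts
        return index.size();                           // duplicate: size is 2
    }

    // With one lock held across both the check and the insert, the second
    // client sees the first client's document and does not duplicate it.
    synchronized void insertIfAbsent(String oid) {
        if (!contains(oid)) insert(oid);
    }
}
```

Holding a single write lock across both the check and the insert (as in insertIfAbsent) is exactly the "sequential access to the index when writing" idea.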

I understand that Lucene is not a database, and has no concept of
unique constraints. It is my understanding that this has been overcome
using locks and sequential access to the index when writing.

In a simple XA implementation, client A would open a SERIALIZABLE
transaction, which would block B from even reading the index. Most
simple XA implementations only support READ_COMMITTED, SERIALIZABLE,
and NONE.

There are other ways of offering finer grained locking (based on
internal id and timestamps), but most are going to need a "server
based" implementation of lucene to pull off.

To summarize, I think the "shared filestore (NFS)" and "lockless
commits" make implementing transactions very difficult. I am sure I
am missing something here; I just don't see what.



Re: Back Compatibility [ In reply to ]
robert engels wrote:

> Maybe I don't understand lockless commits then.
>
> I just don't think you can enforce transactional consistency
> without either 1) locking, or 2) optimistic collision detection. I
> could be wrong here, but this has been my experience.
> By effectively removing the locking requirement, I think you are
> going to have users developing code without thought as to what is
> going to happen when locking is added. This is going to break the
> backwards compatibility that people are striving for.

Lucene still has locking (write.lock), to only allow one writer at a
time to make changes to the index (ie, it serializes writer
sessions). Lock-less commits just replaced the old "commit.lock".
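What write.lock amounts to can be sketched with a plain file lock (an illustration only; Lucene's actual LockFactory machinery differs in detail):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Sketch only: serialize writer sessions by taking an exclusive lock on
// a lock file, in the spirit of Lucene's write.lock.
class WriteLockDemo {
    static boolean withWriteLock(File lockFile, Runnable writerSession) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        try {
            FileChannel channel = raf.getChannel();
            FileLock lock = channel.tryLock(); // null if another process holds it
            if (lock == null) return false;    // a second writer is refused
            try {
                writerSession.run();           // only one writer runs at a time
            } finally {
                lock.release();
            }
            return true;
        } finally {
            raf.close();
        }
    }
}
```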

> The lucene "writer" structure needs to be something like:
>
> start tx for update
> do work
> commit
>
> where commit is composed of (prepare and commit phases), but commit
> may fail.

Right, this is what IndexWriter does now. It's just that with
autoCommit=false you have total control over when that commit takes
place (only on closing the writer).
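The quoted start-tx / do-work / commit shape, with a prepare step that can fail and roll the session back, can be sketched generically (an illustration of the pattern only, not IndexWriter's actual code):

```java
// Generic illustration (not IndexWriter's actual code): buffered changes
// become visible only if commit succeeds; a failed commit rolls back to
// the state at the start of the session.
class TxWriter {
    private final StringBuilder committed = new StringBuilder();
    private StringBuilder pending;

    void begin() { pending = new StringBuilder(); }                    // start tx
    void addDocument(String doc) { pending.append(doc).append('\n'); } // do work

    // Commit is two phases: prepare (which may fail, e.g. disk full)
    // followed by the actual commit.
    boolean commit(boolean prepareFails) {
        if (prepareFails) {        // e.g. disk full while writing new files
            pending = null;        // rollback: committed state untouched
            return false;
        }
        committed.append(pending); // point of no return
        pending = null;
        return true;
    }

    String committedState() { return committed.toString(); }
}
```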

> It is unknown if this can actually happen though, since there is no
> unique ID that could cause collisions, but there is the internal id
> (which would need to remain constant throughout the tx in order for
> queries and delete operations to work).

Yes, but there are other errors that Lucene may hit, like disk full,
which must (and do) roll back the commit to the start of the
transaction (ie, index state when writer was first opened).

> I am sure it is that I don't understand lockless commits, so I will
> give a scenario.
>
> client A issues query looking for documents with OID (a field) =
> "some field";
> client B issues same query
> both queries return nothing found
> client A inserts document with OID = "some filed"
> client B inserts document with OID = "some field"
>
> client A commits and client B commits
>
> unless B is blocked, once A issues the query, the index is going to
> end up with 2 different copies of the document.
>
> I understand that Lucene is not a database, and has no concept of
> unique constraints. It is my understand that this has been overcome
> using locks and sequential access to the index when writing.
>
> In a simple XA implementation, client A would open a SERIALIZABLE
> transaction, which would block B from even reading the index. Most
> simple XA implementation only support READ_COMMITTED, SERIALIZABLE,
> and NONE.
>
> There are other ways of offering finer grained locking (based on
> internal id and timestamps), but most are going to need a "server
> based" implementation of lucene to pull off.
>
> To summarize, I think the "shared filestore (NFS)" and "lockless
> commits" make implementing transactions very difficult. I am sure I
> am missing something here, I just don't see what.

Lucene hasn't ever supported that case above: it never blocks a
reader from opening the index. But, you could easily build that on
top of Lucene, right?

I'm still trying to understand what you feel is missing in the core
that prevents you from building XA (or, your own transactions
handling that involves another resource like a DB) on top of Lucene...

Mike

Re: Back Compatibility [ In reply to ]
I guess I don't understand what a commit lock is, or what its
purpose is. It seems the write lock is all that is needed.

If you still need a write lock, then what is the purpose of
"lockless" commits?

You can get consistency if all writers get the write lock before
performing any read. It would seem this should be the requirement?

Is there a Wiki or some such thing that discusses the "lockless
commits", their purpose and their implementation? I find the email
thread a bit cumbersome to review.




Re: Back Compatibility [ In reply to ]
robert engels wrote:

> I guess I don't understand what a commit lock is, or what its
> purpose is. It seems the write lock is all that is needed.

The commit.lock was used to guard access to the "segments" file. A
reader would acquire the lock (blocking out other readers and
writers) when reading the file. And a writer would acquire the lock
when writing it.

> If you still need a write lock, then what is the purpose of
> "lockless" commits?

Lockless commits got rid of one lock (commit.lock), not write.lock.

> You can get consistency if all writers get the write lock before
> performing any read. It would seem this should be the requirement?

In Lucene, you use an IndexReader to do reads (not a writer), which
does not block other readers.

> Is there a Wiki or some such thing that discusses the "lockless
> commits", their purpose and their implementation? I find the email
> thread a bit cumbersome to review.

No, but really the concept is very simple: instead of writing to
segments, we write to segments_1, then segments_2, etc.
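The scheme can be sketched as follows; a reader just takes the highest generation it can see in a directory listing (an idea-only sketch -- the real file format and generation encoding differ):

```java
// Idea only: each commit writes segments_N for increasing N; the current
// index state is whichever segments_N has the highest generation.
class SegmentsDemo {
    static String latestSegmentsFile(String[] directoryListing) {
        String latest = null;
        long bestGen = -1;
        for (String name : directoryListing) {
            if (!name.startsWith("segments_")) continue;
            long gen = Long.parseLong(name.substring("segments_".length()));
            if (gen > bestGen) { bestGen = gen; latest = name; }
        }
        return latest; // null if no commit has completed yet
    }
}
```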

Mike

Re: Back Compatibility [ In reply to ]
: I do like the idea of a static/system property to match legacy
: behavior. For example, the bugs around how StandardTokenizer
: mislabels tokens (eg LUCENE-1100), this would be the perfect solution.
: Clearly those are silly bugs that should be fixed, quickly, with this
: back-compatible mode to keep the bug in place.
:
: We might want to, instead, have ctors for many classes take a required
: arg which states the version of Lucene you are using? So if you are
: writing a new app you would pass in the current version. Then, on
: dropping in a future Lucene JAR, we could use that arg to enforce the
: right backwards compatibility. This would save users from having to
: realize they are hitting one of these situations and then know to go
: set the right static/property to retain the buggy behavior.

I'm not sure that this would be better though ... when i write my code, i
pass "2.3" to all these constructors (or factory methods) and then later i
want to upgrade to 2.4 to get all the new performance goodness ... i
shouldn't have to change all those constructor calls to get all the 2.4
goodness, i should be able to leave my code as is -- but if i do that,
then i might not get all the 2.4 goodness (like improved tokenization, or
more precise segment merging) because some of that goodness violates
previous assumptions that some code might have had ... my code doesn't
have those assumptions, i know nothing about them, i'll take whatever
behavior the Lucene Developers recommend (unless i see evidence that it
breaks something, in which case i'll happily set a system property or
something that the release notes say will force the old behavior).

The basic principle being: by default, give users the behavior that is
generally viewed as "correct" -- but give them the option to force
"incorrect" legacy behavior.
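The ctor-version idea quoted above, combined with a system-property escape hatch for forcing legacy behavior, might look roughly like this (all names and the token-type example are hypothetical, not Lucene's API):

```java
// All names hypothetical -- a sketch of version-gated behavior, not
// Lucene's API. Callers declare the version they built against; buggy
// legacy behavior is preserved only for callers who ask for it.
class VersionedTokenizer {
    enum Version { V2_3, V2_4 }

    private final Version matchVersion;

    VersionedTokenizer(Version matchVersion) { this.matchVersion = matchVersion; }

    // Pretend, for illustration, that 2.4 fixed a token-type bug: 2.3
    // callers keep the old (incorrect) label unless they update the
    // version they pass in, or flip the system property back.
    String tokenType(String token) {
        boolean legacy = matchVersion == Version.V2_3
            || Boolean.getBoolean("lucene.legacyTokenTypes"); // escape hatch
        if (token.indexOf('.') >= 0) return legacy ? "ACRONYM" : "HOST";
        return "ALPHANUM";
    }
}
```

New apps pass the current version and get correct behavior by default; old apps keep their old labels until they opt in, matching the principle stated above.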

: Also, backporting is extremely costly over time. I'd much rather keep
: compatibility for longer on our forward releases, than spend our
: scarce resources moving changes back.

+1

: So to summarize ... I think we should have (keep) a high tolerance for
: cruft to maintain API compatibility. I think our current approach
: (try hard to keep compatibility during "minor" releases, then
: deprecate, then remove APIs on a major release; do major releases only
: when truly required) is a good one.

i'm with you for the most part, it's just the definition of "when truly
required" that tends to hang people up ... there's a chicken vs egg
problem of deciding whether the code should drive what the next release
number is: "i've added a bitch'n feature but it requires adding a method
to an interface, therefore the next release must be called 4.0" ... vs the
mindset that "we just had a 3.0 release, it's too soon for another major
release, the next release should be called 3.1, so we need to hold off on
committing non backwards compatible changes for a while."

I'm in the first camp: version numbers should be descriptive, information
carrying, labels for releases -- but the version number of a release
should be dictated by the code contained in that release. (if that means
the next version after 3.0.0 is 4.0.0, then so be it.)


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Back Compatibility [ In reply to ]
Thanks.

So all writers still need to get the write lock before opening the
reader, in order to maintain transactional consistency.

Was there performance testing done on the lockless commits with heavy
contention? I would think that reading the directory to find the
latest segments file would be slower. Is there a 'latest segments'
file to avoid this? If not, there probably should be. As long as the
data fits in a single disk block (which it should), I don't think you
will have a consistency problem.

On Jan 23, 2008, at 1:40 PM, Michael McCandless wrote:

> robert engels wrote:
>
>> I guess I don't understand what a commit lock is, or what its
>> purpose is. It seems the write lock is all that is needed.
>
> The commit.lock was used to guard access to the "segments" file. A
> reader would acquire the lock (blocking out other readers and
> writers) when reading the file. And a writer would acquire the
> lock when writing it.
>
>> If you still need a write lock, then what is the purpose of
>> "lockless" commits.
>
> Lockless commits got rid of one lock (commit.lock), not write.lock.
>
>> You can get consistency if all writers get the write lock before
>> performing any read. It would seem this should be the requirement???
>
> In Lucene, you use an IndexReader to do reads (not a writer), which
> does not block other readers.
>
>> Is there a Wiki or some such thing that discusses the "lockless
>> commits", their purpose and their implementation? I find the email
>> thread a bit cumbersome to review.
>
> No, but really the concept is very simple: instead of writing to
> segments, we write to segments_1, then segments_2, etc.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: Back Compatibility [ In reply to ]
I guess I don't see the back-porting as an issue. Only those that
want to, need to do the back-porting. Head moves on...


On Jan 23, 2008, at 2:00 PM, Chris Hostetter wrote:

>
> : I do like the idea of a static/system property to match legacy
> : behavior. For example, the bugs around how StandardTokenizer
> : mislabels tokens (eg LUCENE-1100), this would be the perfect
> solution.
> : Clearly those are silly bugs that should be fixed, quickly, with
> this
> : back-compatible mode to keep the bug in place.
> :
> : We might want to, instead, have ctors for many classes take a
> required
> : arg which states the version of Lucene you are using? So if you are
> : writing a new app you would pass in the current version. Then, on
> : dropping in a future Lucene JAR, we could use that arg to enforce
> the
> : right backwards compatibility. This would save users from having to
> : realize they are hitting one of these situations and then know to go
> : set the right static/property to retain the buggy behavior.
>
> I'm not sure that this would be better though ... when i write my
> code, i
> pass "2.3" to all these constructors (or factory methods) and then
> later i
> want to upgrade to 2.4 to get all the new performance goodness ... i
> shouldn't have to change all those constructor calls to get all the
> 2.4
> goodness, i should be able to leave my code as is -- but if i do that,
> then i might not get all the 2.4 goodness (like improved
> tokenization, or more precise segment merging) because some of that
> goodness violates previous assumptions that some code might have
> had ...
> my code doesn't have those assumptions, i know nothing about them,
> i'll
> take whatever behavior the Lucene Developers recommend (unless i see
> evidence that it breaks something, in which case i'll happily set a
> system property or something that the release notes say will force the
> old behavior).
>
> The basic principle being: by default, give users the behavior that is
> generally viewed as "correct" -- but give them the option to force
> "incorrect" legacy behavior.
>
> : Also, backporting is extremely costly over time. I'd much rather
> keep
> : compatibility for longer on our forward releases, than spend our
> : scarce resources moving changes back.
>
> +1
>
> : So to summarize ... I think we should have (keep) a high
> tolerance for
> : cruft to maintain API compatibility. I think our current approach
> : (try hard to keep compatibility during "minor" releases, then
> : deprecate, then remove APIs on a major release; do major releases
> only
> : when truly required) is a good one.
>
> i'm with you for the most part, it's just the definition of "when truly
> required" that tends to hang people up ... there's a chicken vs egg
> problem of deciding whether the code should drive what the next release
> number is: "i've added a bitch'n feature but it requires adding a
> method
> to an interface, therefore the next release must be called 4.0" ...
> vs the
> mindset that "we just had a 3.0 release, it's too soon for another
> major
> release, the next release should be called 3.1, so we need to hold
> off on
> committing non backwards compatible changes for a while."
>
> I'm in the first camp: version numbers should be descriptive,
> information
> carrying, labels for releases -- but the version number of a release
> should be dictated by the code contained in that release. (if that
> means
> the next version after 3.0.0 is 4.0.0, then so be it.)
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: Back Compatibility [ In reply to ]
: I guess I don't see the back-porting as an issue. Only those that want to,
: need to do the back-porting. Head moves on...

I view it as a potential risk to the overall productivity of the community.

If upgrading from A to B is easy, people (in general) won't spend a lot of
time/effort backporting features from B to A -- this time/effort savings
benefits the community because (depending on the person):
1) that time/effort saved can be spent contributing even more features
to Lucene
2) that time/effort saved improves the impressions people have of Lucene.

If on the other hand upgrading from X to Y is "hard", that encourages
people to backport features ... in some cases this backporting may be done
"in the open" with people contributing these backports as patches, which
can then be committed/released by developers ... but there is still a
time/effort cost there ... a bigger time/effort cost is the cumulative
time/effort cost of all the people that backport some set of features just
enough to get things working for themselves on their local copy, and don't
contribute those changes back ... that cost gets paid by the community as a
whole over and over again.

I certainly don't want to discourage anyone who *wants* to backport
features, and I would never suggest that Lucene should make it a policy to
not accept patches to previous releases that backport functionality -- i
just think we should do our best to minimize the need/motivation to spend
time/effort on backporting.



-Hoss


Re: Back Compatibility [ In reply to ]
I think you are incorrect.

I would guess the number of people/organizations using Lucene is much
greater than the number contributing to Lucene.

The contributors work in head (and should, IMO). The users can select a
particular version of Lucene and code their apps accordingly. They
can also back-port features from a later to an earlier release. If
they have limited development resources, they are probably not
working on Lucene (they are working on their apps), but they can
update their own code to work with later versions - which they would
probably rather do than learn the internals and contribute to
Lucene.

If the users are "just dropping in a new version" they are not
contributing to the community... I think just the opposite: they are
parasites. I think a way to gauge this would be the number of
questions/people on the user list versus the development list.

Lucene is a library, and I believe what I stated earlier is true -
in order to continue to advance it, the API needs to be permitted to
change to allow for better functionality and performance. If Lucene
is hand-tied by earlier APIs then this work is either not going to
happen, or be very messy (inefficient).

On Jan 23, 2008, at 3:40 PM, Chris Hostetter wrote:

>
> : I guess I don't see the back-porting as an issue. Only those that
> : want to, need to do the back-porting. Head moves on...
>
> I view it as a potential risk to the overall productivity of the
> community.
>
> If upgrading from A to B is easy, people (in general) won't spend a
> lot of
> time/effort backporting features from B to A -- this time/effort
> savings
> benefits the community because (depending on the person):
> 1) that time/effort saved can be spent contributing even more
> features
> to Lucene
> 2) that time/effort saved improves the impressions people have of
> Lucene.
>
> If on the other hand upgrading from X to Y is "hard", that encourages
> people to backport features ... in some cases this backporting may
> be done
> "in the open" with people contributing these backports as patches,
> which
> can then be committed/released by developers ... but there is still a
> time/effort cost there ... a bigger time/effort cost is the
> cumulative
> time/effort cost of all the people that backport some set of
> features just
> enough to get things working for themselves on their local copy,
> and don't
> contribute those changes back ... that cost gets paid by the
> community as a
> whole over and over again.
>
> I certainly don't want to discourage anyone who *wants* to backport
> features, and I would never suggest that Lucene should make it a
> policy to
> not accept patches to previous releases that backport functionality
> -- i
> just think we should do our best to minimize the need/motivation to
> spend
> time/effort on backporting.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: Back Compatibility [ In reply to ]
robert engels wrote:

> Thanks.
>
> So all writers still need to get the write lock, before opening the
> reader in order to maintain transactional consistency.

I don't understand what you mean by "before opening the reader"? A
writer acquires the write.lock before opening. Readers do not,
unless/until they do their first write operation (deleteDocument/
setNorm).

> Was there performance testing done on the lockless commits with
> heavy contention? I would think that reading the directory to find
> the latest segments file would be slower. Is there a 'latest
> segments' file to avoid this? If not, there probably should be. As
> long as the data fits in a single disk block (which it should), I
> don't think you will have a consistency problem.

Performance tests were done (see LUCENE-710).

And, yes, there is a file segments.gen that records the latest
segment, but it is used along with the directory listing to find the
current segments file.
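
The directory-listing step Mike describes can be sketched as follows (a simplified illustration, not Lucene's actual SegmentInfos code, which also consults segments.gen and handles retries; Lucene encodes the generation suffix in base 36):

```java
public class FindLatestSegments {
    // Return the highest generation among files named "segments_N",
    // or -1 if none is present. The suffix N is parsed in base 36.
    public static long latestGeneration(String[] fileNames) {
        long max = -1;
        for (String name : fileNames) {
            if (name.startsWith("segments_")) {
                try {
                    long gen = Long.parseLong(
                        name.substring("segments_".length()), 36);
                    if (gen > max) {
                        max = gen;
                    }
                } catch (NumberFormatException e) {
                    // not a generation suffix; ignore this file
                }
            }
        }
        return max;
    }

    public static void main(String[] args) {
        String[] files = { "_0.fnm", "segments_1", "segments_2", "segments_a" };
        System.out.println(latestGeneration(files)); // "a" is 10 in base 36
    }
}
```

Robert's concern above is visible here: this scan touches every directory entry, which is why a hint file recording the latest generation is attractive.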

Mike

Re: Back Compatibility [ In reply to ]
Chris Hostetter wrote:

>
> : I do like the idea of a static/system property to match legacy
> : behavior. For example, the bugs around how StandardTokenizer
> : mislabels tokens (eg LUCENE-1100), this would be the perfect
> solution.
> : Clearly those are silly bugs that should be fixed, quickly, with
> this
> : back-compatible mode to keep the bug in place.
> :
> : We might want to, instead, have ctors for many classes take a
> required
> : arg which states the version of Lucene you are using? So if you are
> : writing a new app you would pass in the current version. Then, on
> : dropping in a future Lucene JAR, we could use that arg to enforce
> the
> : right backwards compatibility. This would save users from having to
> : realize they are hitting one of these situations and then know to go
> : set the right static/property to retain the buggy behavior.
>
> I'm not sure that this would be better though ... when i write my
> code, i
> pass "2.3" to all these constructors (or factory methods) and then
> later i
> want to upgrade to 2.4 to get all the new performance goodness ... i
> shouldn't have to change all those constructor calls to get all the
> 2.4
> goodness, i should be able to leave my code as is -- but if i do that,
> then i might not get all the 2.4 goodness (like improved
> tokenization, or more precise segment merging) because some of that
> goodness violates previous assumptions that some code might have
> had ...
> my code doesn't have those assumptions, i know nothing about them,
> i'll
> take whatever behavior the Lucene Developers recommend (unless i see
> evidence that it breaks something, in which case i'll happily set a
> system property or something that the release notes say will force the
> old behavior).
>
> The basic principle being: by default, give users the behavior that is
> generally viewed as "correct" -- but give them the option to force
> "incorrect" legacy behavior.

OK, I agree: the vast majority of users upgrading would in fact want
all of the changes in the new release. And then the rare user who is
affected by that bug fix to StandardTokenizer would have to set the
compatibility mode. So it makes sense for you to get all changes on
upgrading (and NOT specify the legacy version in all ctors).
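
For contrast, the constructor-argument alternative being set aside here would have looked something like this (a hypothetical sketch -- neither this class nor such a version gate existed in Lucene at the time of this thread):

```java
// Hypothetical analyzer that gates behavior changes on the Lucene
// version the application declares it was written against.
public class VersionedAnalyzer {
    private final int major;
    private final int minor;

    public VersionedAnalyzer(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Suppose the LUCENE-1100 token-type fix were to ship in 2.4:
    // apps declaring an older version silently keep the old behavior.
    public boolean usesFixedTokenTypes() {
        return major > 2 || (major == 2 && minor >= 4);
    }
}
```

The downside Hoss points out is visible in the sketch: an app constructed with (2, 3) never picks up later fixes unless every call site is edited.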

> : Also, backporting is extremely costly over time. I'd much rather
> keep
> : compatibility for longer on our forward releases, than spend our
> : scarce resources moving changes back.
>
> +1
>
> : So to summarize ... I think we should have (keep) a high
> tolerance for
> : cruft to maintain API compatibility. I think our current approach
> : (try hard to keep compatibility during "minor" releases, then
> : deprecate, then remove APIs on a major release; do major releases
> only
> : when truly required) is a good one.
>
> i'm with you for the most part, it's just the definition of "when truly
> required" that tends to hang people up ... there's a chicken vs egg
> problem of deciding whether the code should drive what the next release
> number is: "i've added a bitch'n feature but it requires adding a
> method
> to an interface, therefore the next release must be called 4.0" ...
> vs the
> mindset that "we just had a 3.0 release, it's too soon for another
> major
> release, the next release should be called 3.1, so we need to hold
> off on
> committing non backwards compatible changes for a while."
>
> I'm in the first camp: version numbers should be descriptive,
> information
> carrying, labels for releases -- but the version number of a release
> should be dictated by the code contained in that release. (if that
> means
> the next version after 3.0.0 is 4.0.0, then so be it.)

Well, I am wary of doing major releases too often. Though I do
agree that the version number should be a "fastmatch" for reading
through CHANGES.txt.

Say we do this, and zoom forward 2 years to when we're up to 6.0: the
poor users stuck on 1.9 will dread upgrading, but probably shouldn't.

One of the amazing things about Lucene, to me, is how many really
major changes we have been able to make while not in fact breaking
backwards compatibility (too much). Being very careful not to make
things public, intentionally not committing to details like exactly
when a flush or commit or merge actually happens, marking new
APIs as experimental and freely subject to change, and using abstract
classes rather than interfaces are all wonderful tools that Lucene
employs (and should continue to employ) to enable sizable changes in
the future while keeping backwards compatibility.

Allowing for future backwards compatibility is one of the most
important things we all do when we make changes to Lucene!

Mike

Re: Back Compatibility [ In reply to ]
robert engels wrote:

> I think you are incorrect.
>
> I would guess the number of people/organizations using Lucene is much
> greater than the number contributing to Lucene.
>
> The contributors work in head (and should, IMO). The users can select a
> particular version of Lucene and code their apps accordingly. They
> can also back-port features from a later to an earlier release. If
> they have limited development resources, they are probably not
> working on Lucene (they are working on their apps), but they can
> update their own code to work with later versions - which they
> would probably rather do than learning the internals and
> contributing to Lucene.
>
> If the users are "just dropping in a new version" they are not
> contributing to the community... I think just the opposite, they
> are parasites. I think a way to gauge this would be the number of
> questions/people on the user list versus the development list.

I don't think they are parasites at all. They are users that place
a lot of trust in us and will come to the users list with interesting
issues. Many of the improvements to Lucene are sourced from the
users list. Even if that user doesn't do the actual work to fix the
issue, their innocent question and prodding can inspire someone else
to take the idea forward, make a patch, etc. This is the normal and
healthy way that open source works....

> Lucene is a library, and I believe what I stated earlier is true
> - in order to continue to advance it the API needs to be permitted
> to change to allow for better functionality and performance. If
> Lucene is hand-tied by earlier APIs then this work is either not
> going to happen, or be very messy (inefficient).

The thing is, we have been able to advance lately, sizably, without
breaking APIs, thanks to the "future backwards compatibility
proofing" that Lucene does.

I do agree that if it got to the point where we were forced to make a
hard choice between stunting Lucene's growth to keep backwards
compatibility and letting Lucene grow with a new major release, we
should definitely make the new major release. Search is still young
and if we stunt Lucene now it will slowly die.

It's just that I haven't seen any recent change, except for allowing
JVM 1.5 source, that actually requires a major release, I think.

Mike

> On Jan 23, 2008, at 3:40 PM, Chris Hostetter wrote:
>
>>
>> : I guess I don't see the back-porting as an issue. Only those
>> : that want to, need to do the back-porting. Head moves on...
>>
>> I view it as a potential risk to the overall productivity of the
>> community.
>>
>> If upgrading from A to B is easy, people (in general) won't spend a
>> lot of
>> time/effort backporting features from B to A -- this time/effort
>> savings
>> benefits the community because (depending on the person):
>> 1) that time/effort saved can be spent contributing even more
>> features
>> to Lucene
>> 2) that time/effort saved improves the impressions people have of
>> Lucene.
>>
>> If on the other hand upgrading from X to Y is "hard", that encourages
>> people to backport features ... in some cases this backporting may
>> be done
>> "in the open" with people contributing these backports as patches,
>> which
>> can then be committed/released by developers ... but there is still a
>> time/effort cost there ... a bigger time/effort cost is the
>> cumulative
>> time/effort cost of all the people that backport some set of
>> features just
>> enough to get things working for themselves on their local copy,
>> and don't
>> contribute those changes back ... that cost gets paid by the
>> community as a
>> whole over and over again.
>>
>> I certainly don't want to discourage anyone who *wants* to backport
>> features, and I would never suggest that Lucene should make it a
>> policy to
>> not accept patches to previous releases that backport
>> functionality -- i
>> just think we should do our best to minimize the need/motivation
>> to spend
>> time/effort on backporting.
>>
>>
>>
>> -Hoss
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


RE: Back Compatibility [ In reply to ]
Hi robert,

On 01/23/2008 at 4:55 PM, robert engels wrote:
> If the users are "just dropping in a new version" they are not
> contributing to the community... I think just the opposite, they are
> parasites.

I reject your characterization of passive users as "parasites"; I suspect that you intend your casual use of this highly prejudicial term to license wholesale abandonment of them as a valid constituency.

In my estimation, nearly every active contributor to open source projects, including Lucene, was once a passive user. If you discourage that pipeline, you cut off the supply of fresh perspectives and future contributions. Please, let's not do that.

Steve

Re: Back Compatibility [ In reply to ]
You must get the write lock before opening the reader if you want
transactional consistency and are performing updates.

There is no other way to do it.

Otherwise:

A opens reader.
B opens reader.
A performs query, decides an update is needed based on results.
B performs query, decides an update is needed based on results.
B gets write lock.
B updates.
B releases.
A gets write lock.
A performs update - ERROR. A is performing an update based on stale data.

If A & B want to update an index, it must work as:

A gets lock.
A opens reader.
A updates.
A releases lock.
B gets lock.
B opens reader.
B updates.
B releases lock.

The only way you can avoid this is if the system can determine that A's
query results in the first case would not change based on B's updates.
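
The correct ordering above - take the lock, then read, then decide, then write - can be sketched with a plain Java lock (an illustration of the principle only, not Lucene's actual IndexWriter/IndexReader code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class TransactionalUpdate {
    private final ReentrantLock writeLock = new ReentrantLock();
    private int value = 0; // stands in for index state

    // Acquire the lock BEFORE reading, so the decision to update is
    // never based on stale data seen by a reader opened earlier.
    public void incrementIfEven() {
        writeLock.lock();
        try {
            if (value % 2 == 0) { // "open reader" and query under the lock
                value++;          // update, based on data known to be fresh
            }
        } finally {
            writeLock.unlock();
        }
    }

    public int get() {
        return value;
    }

    public static void main(String[] args) {
        TransactionalUpdate t = new TransactionalUpdate();
        t.incrementIfEven(); // 0 is even -> value becomes 1
        t.incrementIfEven(); // 1 is odd  -> unchanged
        System.out.println(t.get()); // 1
    }
}
```

If the read happened outside the lock, two threads could both see 0 and both decide to update - exactly the A/B race described above.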

On Jan 23, 2008, at 4:03 PM, Michael McCandless wrote:

>
> robert engels wrote:
>
>> Thanks.
>>
>> So all writers still need to get the write lock, before opening
>> the reader in order to maintain transactional consistency.
>
> I don't understand what you mean by "before opening the reader"? A
> writer acquires the write.lock before opening. Readers do not,
> unless/until they do their first write operation (deleteDocument/
> setNorm).
>
>> Was there performance testing done on the lockless commits with
>> heavy contention? I would think that reading the directory to find
>> the latest segments file would be slower. Is there a 'latest
>> segments' file to avoid this? If not, there probably should be. As
>> long as the data fits in a single disk block (which it should), I
>> don't think you will have a consistency problem.
>
> Performance tests were done (see LUCENE-710).
>
> And, yes, there is a file segments.gen that records the latest
> segment, but it is used along with the directory listing to find
> the current segments file.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: Back Compatibility [ In reply to ]
I don't think I can say that this needs to happen now either. :)

An interesting question to answer would be:

If Lucene did not exist, and given all of the knowledge we have, we
decided to create a Java based search engine, would the API look like
it does today?

The answer may be yes. I doubt it would be in many areas though.

The major releases are where you get to rethink the API and the
approach.

If you don't do this, Lucene will slowly die (as you stated). What
happens is that the developers get tired of the harness and start a
new project. If the API could be changed more easily, this would
not happen.


On Jan 23, 2008, at 4:27 PM, Michael McCandless wrote:

>
> robert engels wrote:
>
>> I think you are incorrect.
>>
>> I would guess the number of people/organizations using Lucene is much
>> greater than the number contributing to Lucene.
>>
>> The contributors work in head (and should, IMO). The users can select a
>> particular version of Lucene and code their apps accordingly. They
>> can also back-port features from a later to an earlier release. If
>> they have limited development resources, they are probably not
>> working on Lucene (they are working on their apps), but they can
>> update their own code to work with later versions - which they
>> would probably rather do than learning the internals and
>> contributing to Lucene.
>>
>> If the users are "just dropping in a new version" they are not
>> contributing to the community... I think just the opposite, they
>> are parasites. I think a way to gauge this would be the number of
>> questions/people on the user list versus the development list.
>
> I don't think they are parasites at all. They are users that place
> a lot of trust in us and will come to the users list with
> interesting issues. Many of the improvements to Lucene are sourced
> from the users list. Even if that user doesn't do the actual work
> to fix the issue, their innocent question and prodding can inspire
> someone else to take the idea forward, make a patch, etc. This is
> the normal and healthy way that open source works....
>
>> Lucene is a library, and I believe what I stated earlier is
>> true - in order to continue to advance it the API needs to be
>> permitted to change to allow for better functionality and
>> performance. If Lucene is hand-tied by earlier APIs then this work
>> is either not going to happen, or be very messy (inefficient).
>
> The thing is, we have been able to advance lately, sizably, without
> breaking APIs, thanks to the "future backwards compatibility
> proofing" that Lucene does.
>
> I do agree that if it got to the point where we were forced to make
> a hard choice of stunt Lucene's growth so as to keep backwards
> compatibility vs let Lucene grow and make a new major release, we
> should definitely make a new major release. Search is still young
> and if we stunt Lucene now it will slowly die.
>
> It's just that I haven't seen any recent change, except for
> allowing JVM 1.5 source, that actually requires a major release, I
> think.
>
> Mike
>
>> On Jan 23, 2008, at 3:40 PM, Chris Hostetter wrote:
>>
>>>
>>> : I guess I don't see the back-porting as an issue. Only those
>>> : that want to, need to do the back-porting. Head moves on...
>>>
>>> I view it as a potential risk to the overall productivity of the
>>> community.
>>>
>>> If upgrading from A to B is easy, people (in general) won't spend
>>> a lot of
>>> time/effort backporting features from B to A -- this time/effort
>>> savings
>>> benefits the community because (depending on the person):
>>> 1) that time/effort saved can be spent contributing even more
>>> features
>>> to Lucene
>>> 2) that time/effort saved improves the impressions people have
>>> of Lucene.
>>>
>>> If on the other hand upgrading from X to Y is "hard", that
>>> encourages
>>> people to backport features ... in some cases this backporting
>>> may be done
>>> "in the open" with people contributing these backports as
>>> patches, which
>>> can then be committed/released by developers ... but there is
>>> still a
>>> time/effort cost there ... a bigger time/effort cost is the
>>> cumulative
>>> time/effort cost of all the people that backport some set of
>>> features just
>>> enough to get things working for themselves on their local copy,
>>> and don't
>>> contribute those changes back ... that cost gets paid by the
>>> community as a
>>> whole over and over again.
>>>
>>> I certainly don't want to discourage anyone who *wants* to backport
>>> features, and I would never suggest that Lucene should make it a
>>> policy to
>>> not accept patches to previous releases that backport
>>> functionality -- i
>>> just think we should do our best to minimize the need/motivation
>>> to spend
>>> time/effort on backporting.
>>>
>>>
>>>
>>> -Hoss
>>>
>>>
>>> --------------------------------------------------------------------
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: Back Compatibility [ In reply to ]
The statement upon rereading seems much stronger than intended. You
are correct, but I think the number of users that become contributors
is still far less than the number of users.

The only abandonment of the users was from the standpoint of
maintaining a legacy API. The users are free to update their code to
move with Lucene. They are the ones choosing to stay behind.

Even though I have contributed very little to Lucene, I still fight
for the developers' ability to move it forward -- since I do
contribute so little! It is up to me to update my code, or stay
where I am. Now, if Lucene created a release every week that completely
changed the API and broke everything I wrote, while the old release
still had numerous serious bugs, I would quickly grow frustrated and
find a new library. That is not the case, and I don't think anyone
(especially me) is arguing for that.

On Jan 23, 2008, at 4:29 PM, Steven A Rowe wrote:

> Hi robert,
>
> On 01/23/2008 at 4:55 PM, robert engels wrote:
>> If the users are "just dropping in a new version" they are not
>> contributing to the community... I think just the opposite, they are
>> parasites.
>
> I reject your characterization of passive users as "parasites"; I
> suspect that you intend your casual use of this highly prejudicial
> term to license wholesale abandonment of them as a valid constituency.
>
> In my estimation, nearly every active contributor to open source
> projects, including Lucene, was once a passive user. If you
> discourage that pipeline, you cut off the supply of fresh
> perspectives and future contributions. Please, let's not do that.
>
> Steve
>
>


Re: Back Compatibility [ In reply to ]
Right.

But, that can, and should, be done outside of the Lucene core.

Mike

robert engels wrote:

> You must get the write lock before opening the reader if you want
> transactional consistency and are performing updates.
>
> No other way to do it.
>
> Otherwise.
>
> A opens reader.
> B opens reader.
> A performs query decides an update is needed based on results
> B performs query decides an update is needed based on results
> B gets write lock
> B updates
> B releases
> A gets write lock
> A performs update - ERROR. A is performing an update based on stale
> data
>
> If A & B want to update an index, it must work as:
>
> A gets lock
> A opens reader
> A updates
> A releases lock
> B gets lock
> B opens reader
> B updates
> B releases lock
>
> The only way you can avoid this is if system can determine that B's
> query results in the first case would not change based on A's updates.
>
> On Jan 23, 2008, at 4:03 PM, Michael McCandless wrote:
>
>>
>> robert engels wrote:
>>
>>> Thanks.
>>>
>>> So all writers still need to get the write lock, before opening
>>> the reader in order to maintain transactional consistency.
>>
>> I don't understand what you mean by "before opening the reader"?
>> A writer acquires the write.lock before opening. Readers do not,
>> unless/until they do their first write operation (deleteDocument/
>> setNorm).
>>
>>> Was there performance testing done on the lockless commits with
>>> heavy contention? I would think that reading the directory to
>>> find the latest segments file would be slower. Is there a 'latest
>>> segments' file to avoid this? If not, there probably should be.
>>> As long as the data fits in a single disk block (which it
>>> should), I don't think you will have a consistency problem.
>>
>> Performance tests were done (see LUCENE-710).
>>
>> And, yes, there is a file segments.gen that records the latest
>> segment, but it is used along with the directory listing to find
>> the current segments file.
>>
>> Mike
>>
>>
>
>
>
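Mike's description of the lockless-commit lookup above (a segments.gen file recording the latest generation, cross-checked against a directory listing) can be sketched, under heavy simplification, like this. Everything here -- the class name, the base-36 generation encoding, the max-of-both rule -- is an illustrative guess at the idea, not Lucene's actual implementation:

```java
import java.util.List;

// Illustrative sketch only: combine a recorded generation
// (segments.gen) with a directory listing to pick the current
// segments file, trusting whichever source advertises more.
class SegmentsLookup {

    // Generation encoded in a file name like "segments_7"; a plain
    // "segments" file is treated as generation 0.
    static long generationOf(String fileName) {
        if (fileName.equals("segments")) return 0;
        if (fileName.startsWith("segments_")) {
            return Long.parseLong(
                fileName.substring("segments_".length()),
                Character.MAX_RADIX); // assume base-36 suffix
        }
        return -1; // not a segments file
    }

    // Trust whichever is newer: the recorded generation or the
    // highest generation visible in the directory listing.
    static long currentGeneration(long recordedGen, List<String> dirListing) {
        long maxListed = -1;
        for (String name : dirListing) {
            maxListed = Math.max(maxListed, generationOf(name));
        }
        return Math.max(recordedGen, maxListed);
    }

    public static void main(String[] args) {
        List<String> listing = List.of("segments_2", "segments_3", "_0.cfs");
        // Recorded gen 3 agrees with the listing's newest file.
        System.out.println(currentGeneration(3, listing));
    }
}
```

The point of consulting both sources is robustness: if segments.gen is stale or unreadable, the directory listing still yields an answer, and vice versa.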


Re: Back Compatibility [ In reply to ]
Top posting because this is a response to the thread as a whole.

It appears that this thread has identified some different reasons for
"needing" to break compatibility:
1) A current behavior is now deemed bad or wrong. Examples: the silent
truncation of large documents or an analyzer that works incorrectly.
2) Performance tuning such as seen in Token, allowing reuse.
3) Support of a new language feature, e.g. generics, that makes the
code "better".
4) A new feature requires a change to the existing API.

Perhaps there were others? Maybe specifics are in Jira.

It seems to me that the Lucene developers have done an excellent job
at figuring out how to maintain compatibility. This is a testament to
how well grounded the design of the API actually is, from early on and
even now. And changes seem to be well thought out, well designed and
carefully implemented.

I think that when it really gets down to it, the Lucene API will stay
very stable because of this.

On a side note, the cLucene project seems to be languishing (still
trying to get to 2.0) and any stability of the API is a good thing for
it. And perhaps for the other "ports" as well.

Again many thanks for all your hard work,
DM Smith, a thankful "parasite" :)

On Jan 23, 2008, at 5:16 PM, Michael McCandless wrote:

>
> chris Hostetter wrote:
>
>>
>> : I do like the idea of a static/system property to match legacy
>> : behavior. For example, the bugs around how StandardTokenizer
>> : mislabels tokens (eg LUCENE-1100), this would be the perfect
>> solution.
>> : Clearly those are silly bugs that should be fixed, quickly, with
>> this
>> : back-compatible mode to keep the bug in place.
>> :
>> : We might want to, instead, have ctors for many classes take a
>> required
>> : arg which states the version of Lucene you are using? So if you
>> are
>> : writing a new app you would pass in the current version. Then, on
>> : dropping in a future Lucene JAR, we could use that arg to enforce
>> the
>> : right backwards compatibility. This would save users from having
>> to
>> : realize they are hitting one of these situations and then know to
>> go
>> : set the right static/property to retain the buggy behavior.
>>
>> I'm not sure that this would be better though ... when i write my
>> code, i
>> pass "2.3" to all these constructors (or factory methods) and then
>> later i
>> want to upgrade to 2.4 to get all the new performance goodness ... i
>> shouldn't have to change all those constructor calls to get all the
>> 2.4
>> goodness, i should be able to leave my code as is -- but if i do
>> that,
>> then i might not get all the 2.4 goodness, (like improved
>> tokenization, or more precise segment merging) because some of that
>> goodness violates previous assumptions that some code might have
>> had ...
>> my code doesn't have those assumptions, i know nothing about them,
>> i'll
>> take whatever behavior the Lucene Developers recommend (unless i see
>> evidence that it breaks something, in which case i'll happily set a
>> system property or something that the release notes say will force
>> the
>> old behavior.
>>
>> The basic principle being: by default, give users the behavior that
>> is
>> generally viewed as "correct" -- but give them the option to force
>> "uncorrect" legacy behavior.
>
> OK, I agree: the vast majority of users upgrading would in fact want
> all of the changes in the new release. And then the rare user who
> is affected by that bug fix to StandardTokenizer would have to set
> the compatibility mode. So it makes sense for you to get all
> changes on upgrading (and NOT specify the legacy version in all
> ctors).
>
>> : Also, backporting is extremely costly over time. I'd much rather
>> keep
>> : compatibility for longer on our forward releases, than spend our
>> : scarce resources moving changes back.
>>
>> +1
>>
>> : So to summarize ... I think we should have (keep) a high
>> tolerance for
>> : cruft to maintain API compatibility. I think our current approach
>> : (try hard to keep compatibility during "minor" releases, then
>> : deprecate, then remove APIs on a major release; do major releases
>> only
>> : when truly required) is a good one.
>>
>> i'm with you for the most part, it's just the definition of "when
>> truly
>> required" that tends to hang people up ... there's a chicken vs egg
>> problem of deciding whether the code should drive what the next
>> release
>> number is: "i've added a bitch'n feature but it requires adding a
>> method
>> to an interface, therefore the next release must be called 4.0" ...
>> vs the
>> mindset that "we just had a 3.0 release, it's too soon for another
>> major
>> release, the next release should be called 3.1, so we need to hold
>> off on
>> committing non backwards compatible changes for a while."
>>
>> I'm in the first camp: version numbers should be descriptive,
>> information
>> carrying, labels for releases -- but the version number of a release
>> should be dictated by the code contained in that release. (if that
>> means
>> the next version after 3.0.0 is 4.0.0, then so be it.)
>
> Well, I am wary of doing major releases too often. Though I do
> agree that the version number should be a "fastmatch" for reading
> through CHANGES.txt.
>
> Say we do this, and zoom forward 2 years when we're up to 6.0, then
> poor users stuck on 1.9 will dread upgrading, but probably shouldn't.
>
> One of the amazing things about Lucene, to me, is how many really
> major changes we have been able to make while not in fact breaking
> backwards compatibility (too much). Being very careful not to make
> things public, intentionally not committing to things like exactly
> when does a flush or commit or merge actually happen, marking new
> APIs as experimental and freely subject to change, using abstract
> classes not interfaces, are all wonderful tools that Lucene employs
> (and should continue to do so), to enable sizable changes in the
> future while keeping backwards compatibility.
>
> Allowing for future backwards compatibility is one of the most
> important things we all do when we make changes to Lucene!
>
> Mike
>
>
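The "ctors take a version" idea Mike floats in the message above could look roughly like this. The Version enum, the class, and the 2.4 fix it pretends to gate are all hypothetical, invented purely to illustrate the mechanism:

```java
// Hypothetical sketch of "pass the Lucene version you wrote against
// to the ctor": behavior switches on that declared version, so a
// drop-in JAR upgrade preserves the old behavior for old apps.
class VersionedTokenizer {
    enum Version { LUCENE_23, LUCENE_24 }

    private final Version matchVersion;

    VersionedTokenizer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    // Pretend a LUCENE-1100-style mislabeling was fixed in 2.4:
    // apps declaring 2.3 keep the buggy <ACRONYM> label, while
    // apps declaring 2.4 get the corrected <HOST> label.
    String tokenType(String token) {
        boolean legacy = matchVersion.compareTo(Version.LUCENE_24) < 0;
        if (token.contains(".") && !token.endsWith(".")) {
            return legacy ? "<ACRONYM>" : "<HOST>";
        }
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        // Old app keeps the bug; new app gets the fix.
        System.out.println(new VersionedTokenizer(Version.LUCENE_23).tokenType("www.apache.org")); // <ACRONYM>
        System.out.println(new VersionedTokenizer(Version.LUCENE_24).tokenType("www.apache.org")); // <HOST>
    }
}
```

Hoss's objection is visible in the sketch too: code that hard-wires LUCENE_23 never receives the 2.4 behavior until every call site is edited.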


Re: Back Compatibility [ In reply to ]
Yes, I agree these are what it is about (despite the divergence into
locking).

As I see it, the question is about whether we should try to do major
releases on the order of a year, rather than the current 2+ year
schedule, and also how best to handle bad behavior when producing
tokens that previous applications rely on.

On the first case, we said we would try to do minor releases more
frequently (on the order of once a quarter) in the past, but this, so
far, hasn't happened. However, it has only been one release, and it
did have a lot of big changes that warranted longer testing. I do
agree with Michael M. that we have done a good job of keeping back
compatibility. I still don't know if trying to clean out deprecations
once a year puts some onerous task on people when it comes to
upgrading, as opposed to doing it every two years. Do people really
have code that they never compile or work on in over a year? If they
do, do they care about upgrading? It clearly means they are happy w/
Lucene and don't need any bug fixes. I can understand this being a
bigger issue if it were on the order of every 6 months or less, but
that isn't what I am proposing. I guess my suggestion would be that
we try to get back onto the once-a-quarter release goal, which will
more than likely lead to a major release in the 1-1.5 year time
frame. That being said, I am fine with maintaining the status quo
concerning back compatibility, as I think those arguments are
compelling. On the interface thing, I wish there were an @introducing
annotation that could announce the presence of a new method and would
give a warning up until the version specified is met, at which point
it would break the compile, but I realize the semantics of that are
pretty weird, so...

As for the other issue concerning things like token issues, I think it
is reasonable to fix the bug and just let people know it will change
indexing, but try to allow for the old way if it is not too onerous.
Chances are most people aren't even aware of it, and thus telling them
about it may actually cause them to consider it. For things like
maxFieldLength, etc., back compat. is a reasonable thing to
preserve.

Cheers,
Grant
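The @introducing annotation Grant wishes for above is easy enough to declare; the warn-then-break-the-compile behavior would need a custom annotation processor (not shown). This is a hypothetical sketch only, and it leans on a default method -- a later Java feature -- to give implementors the grace period:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical @Introducing annotation: announces a new interface
// method and the release by which implementors must provide it.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Introducing {
    String version(); // release in which the method becomes mandatory
}

interface TokenStream {
    String next();

    // Announced now; implementations must supply it by 2.9.
    // The default body keeps old implementors compiling meanwhile.
    @Introducing(version = "2.9")
    default String next(String reusableToken) { return next(); }
}

class IntroducingDemo {
    // Tooling (or a build check) could read the annotation back:
    static String introducedVersion() {
        try {
            return TokenStream.class.getMethod("next", String.class)
                    .getAnnotation(Introducing.class).version();
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("mandatory as of " + introducedVersion());
    }
}
```

The "pretty weird" semantics Grant mentions are real: the annotation alone carries only the announcement; enforcing the warning and the eventual compile break is entirely up to external tooling.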


On Jan 23, 2008, at 6:24 PM, DM Smith wrote:

> [...]

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: Back Compatibility [ In reply to ]
On Jan 24, 2008 12:31 AM, robert engels <rengels@ix.netcom.com> wrote:

> You must get the write lock before opening the reader if you want
> transactional consistency and are performing updates.
>
> No other way to do it.
>
> Otherwise.
>
> A opens reader.
> B opens reader.
> A performs query decides an update is needed based on results
> B performs query decides an update is needed based on results
> B gets write lock
> B updates
> B releases
> A gets write lock


Lucene actually protects from this - 'A' would fail to acquire the write
lock, with a stale-index-exception (this is tested in TestIndexReader -
testDeleteReaderReaderConflict).


> A performs update - ERROR. A is performing an update based on stale data
>
> If A & B want to update an index, it must work as:
>
> A gets lock
> A opens reader
> A updates
> A releases lock
> B gets lock
> B opens reader
> B updates
> B releases lock
>
> The only way you can avoid this is if system can determine that B's
> query results in the first case would not change based on A's updates.
>
Re: Back Compatibility [ In reply to ]
Doron Cohen wrote:

>
> On Jan 24, 2008 12:31 AM, robert engels <rengels@ix.netcom.com> wrote:
>
>> You must get the write lock before opening the reader if you want
>> transactional consistency and are performing updates.
>>
>> No other way to do it.
>>
>> Otherwise.
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides an update is needed based on results
>> B performs query decides an update is needed based on results
>> B gets write lock
>> B updates
>> B releases
>> A gets write lock
>
>
> Lucene actually protects from this - 'A' would fail to acquire the
> write
> lock, with a stale-index-exception (this is tested in TestIndexReader -
> testDeleteReaderReaderConflict).

Aha, you are right Doron! Indeed Lucene effectively serializes this
case, using the write.lock.

>
>> A performs update - ERROR. A is performing an update based on
>> stale data
>>
>> If A & B want to update an index, it must work as:
>>
>> A gets lock
>> A opens reader
>> A updates
>> A releases lock
>> B gets lock
>> B opens reader
>> B updates
>> B releases lock
>>
>> The only way you can avoid this is if system can determine that B's
>> query results in the first case would not change based on A's
>> updates.

And, in this case, B will fail when it tries to get the lock. It
must be re-opened so it first sees the changes committed by A.

So, Lucene is transactional, but forces clients to serialize their
write operations (ie, one cannot have multiple transactions open at
once).

Mike
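The stale-reader behavior Doron and Mike describe -- reader A fails to acquire the write lock once B has committed, and must reopen first -- can be simulated in a few lines. The classes and the exception below are stand-ins for illustration, not Lucene's actual API:

```java
// Toy simulation of Lucene's serialization of write operations:
// an index carries a version counter; a reader snapshots it on
// open, and acquiring the write lock fails if the index has
// moved on since (the "stale reader" case).
class StaleWriteDemo {
    static class Index {
        long version = 0;
        boolean writeLocked = false;
    }

    static class Reader {
        final Index index;
        final long snapshotVersion;

        Reader(Index index) {
            this.index = index;
            this.snapshotVersion = index.version; // snapshot on open
        }

        // Acquire the write lock, or fail if another writer
        // committed after this reader was opened.
        void acquireWriteLock() {
            if (index.writeLocked)
                throw new IllegalStateException("write.lock held");
            if (index.version != snapshotVersion)
                throw new IllegalStateException(
                    "stale reader: reopen to see committed changes");
            index.writeLocked = true;
        }

        void commitAndRelease() {
            index.version++;        // commit bumps the version
            index.writeLocked = false;
        }
    }

    public static void main(String[] args) {
        Index idx = new Index();
        Reader a = new Reader(idx);
        Reader b = new Reader(idx);
        b.acquireWriteLock();
        b.commitAndRelease();
        try {
            a.acquireWriteLock(); // A is stale: B committed first
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is exactly "transactional, but serialized": only one open transaction at a time, and a late writer is told to reopen rather than allowed to write against stale data.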

Re: Back Compatibility [ In reply to ]
Grant Ingersoll wrote:

> Yes, I agree these are what it is about (despite the divergence into
> locking).
>
> As I see it, the question is about whether we should try to do
> major releases on the order of a year, rather than the current 2+
> year schedule and also how to best handle bad behavior when
> producing tokens that previous applications rely on.
>
> On the first case, we said we would try to do minor releases more
> frequently (on the order of once a quarter) in the past, but this,
> so far hasn't happened. However, it has only been one release,
> and it did have a lot of big changes that warranted longer
> testing. I do agree with Michael M. that we have done a good job
> of keeping back compatibility. I still don't know if trying to
> clean out deprecations once a year puts some onerous task on people
> when it comes to upgrading as opposed to doing every two years. Do
> people really have code that they never compile or work on in over
> a year? If they do, do they care about upgrading? It clearly
> means they are happy w/ Lucene and don't need any bug fixes. I can
> understand this being a bigger issue if it were on the order of
> every 6 months or less, but that isn't what I am proposing. I
> guess my suggestion would be that we try to get back onto the once
> a quarter release goal, which will more than likely lead to a major
> release in the 1-1.5 year time frame. That being said, I am fine
> with maintaining the status quo concerning back compatibility as I
> think those arguments are compelling. On the interface thing, I
> wish there was a @introducing annotation that could announce the
> presence of a new method and would give a warning up until the
> version specified is met, at which point it would break the
> compile, but I realize the semantics of that are pretty weird, so...

I do think we should try for minor releases more frequently,
independent of the backwards compatibility question (how often to do
major releases) :)

I think major releases should be done only when a major feature truly
"forces" us to (which Java 1.5 has) and not because we want to clean
out the accumulated cruft we are carrying forward to preserve
backwards compatibility.

> As for the other issue concerning things like token issues, I think
> it is reasonable to fix the bug and just let people know it will
> change indexing, but try to allow for the old way if it is not too
> onerous. Chances are most people aren't even aware of it, and thus
> telling them about may actually cause them to consider it. For
> things like maxFieldLength, etc. then back compat. is a reasonable
> thing to preserve.

So, in hindsight, the acronym/host setting for StandardAnalyzer
really should have defaulted to "true", meaning the bug is fixed, but
users who somehow depend on the bug (which should be a tiny minority)
have an avenue (setReplaceInvalidAcronym) to keep back compatibility
if needed even on a minor release, right? I agree. (And so in 2.4
we should fix the default to true?).

I think for such issues where it's a very minor break in backwards
compatibility, we should make the break, and very carefully document
this in the "Changes in runtime behavior" section, even within a
minor release. I don't think such changes should drive us to a major
release.

Mike
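The escape-hatch pattern Mike describes -- the fix on by default, with a setter (and/or the system property Hoss mentioned earlier) letting the rare affected user force the old, buggy behavior -- might look like this. The class, setter, and property key are invented for the sketch, not Lucene's actual API:

```java
// Sketch of a back-compat escape hatch: the *fixed* behavior is
// the default, per Mike's "should have defaulted to true" point,
// and the legacy behavior is opt-in via setter or system property.
class AcronymFixDemo {
    // Default true (fixed); honor an opt-out property if present.
    // "demo.replaceInvalidAcronym" is a made-up key for this sketch.
    private static boolean replaceInvalidAcronym =
            !"false".equals(System.getProperty("demo.replaceInvalidAcronym"));

    static void setReplaceInvalidAcronym(boolean v) {
        replaceInvalidAcronym = v;
    }

    // Toy stand-in for the StandardAnalyzer labeling decision.
    static String tokenType(String token) {
        boolean hostLike = token.contains(".") && !token.endsWith(".");
        if (hostLike) return replaceInvalidAcronym ? "<HOST>" : "<ACRONYM>";
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        System.out.println(tokenType("www.apache.org")); // fixed default
        setReplaceInvalidAcronym(false);                 // legacy opt-out
        System.out.println(tokenType("www.apache.org"));
    }
}
```

Defaulting to the fix keeps the "Changes in runtime behavior" note honest: everybody gets correct behavior on upgrade, and only the tiny minority depending on the bug has to act.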

Re: Back Compatibility [ In reply to ]
On Jan 24, 2008, at 4:27 AM, Michael McCandless wrote:

>
> Grant Ingersoll wrote:
>
>> [...]
>
> I do think we should try for minor releases more frequently,
> independent of the backwards compatibility question (how often to do
> major releases) :)
>

+1

The question then becomes what can we do to improve our development
process?

> I think major releases should be done only when a major feature
> truly "forces" us to (which Java 1.5 has) and not because we want to
> clean out the accumulated cruft we are carrying forward to preserve
> backwards compatibility.
>
>> As for the other issue concerning things like token issues, I think
>> it is reasonable to fix the bug and just let people know it will
>> change indexing, but try to allow for the old way if it is not to
>> onerous. Chances are most people aren't even aware of it, and thus
>> telling them about may actually cause them to consider it. For
>> things like maxFieldLength, etc. then back compat. is a reasonable
>> thing to preserve.
>
> So, in hindsight, the acronym/host setting for StandardAnalyzer
> really should have defaulted to "true", meaning the bug is fixed,
> but users who somehow depend on the bug (which should be a tiny
> minority) have an avenue (setReplaceInvalidAcronym) to keep back
> compatibility if needed even on a minor release, right? I agree.
> (And so in 2.4 we should fix the default to true?).


>
>
> I think for such issues where it's a very minor break in backwards
> compatibility, we should make the break, and very carefully document
> this in the "Changes in runtime behavior" section, even within a
> minor release. I don't think such changes should drive us to a
> major release.


+1

-Grant

Re: Back Compatibility [ In reply to ]
Sorry, I am using "gets lock" to mean 'opening the index'. I was
simplifying the procedure.

I think your comment is not correct in this context.

On Jan 24, 2008, at 3:16 AM, Michael McCandless wrote:

> Doron Cohen wrote:
>
>> ------=_Part_11325_2615585.1201162438596
>> Content-Type: text/plain; charset=ISO-8859-1
>> Content-Transfer-Encoding: 7bit
>> Content-Disposition: inline
>>
>> On Jan 24, 2008 12:31 AM, robert engels <rengels@ix.netcom.com>
>> wrote:
>>
>>> You must get the write lock before opening the reader if you want
>>> transactional consistency and are performing updates.
>>>
>>> No other way to do it.
>>>
>>> Otherwise.
>>>
>>> A opens reader.
>>> B opens reader.
>>> A performs query decides an update is needed based on results
>>> B performs query decides an update is needed based on results
>>> B gets write lock
>>> B updates
>>> B releases
>>> A gets write lock
>>
>>
>> Lucene actually protects from this - 'A' would fail to acquire the
>> write
>> lock, with a stale-index-exception (this is tested in
>> TestIndexReader -
>> testDeleteReaderReaderConflict).
>
> Aha, you are right Doron! Indeed Lucene effectively serializes
> this case, using the write.lock.
>
>>
>>> A performs update - ERROR. A is performing an update based on
>>> stale data
>>>
>>> If A & B want to update an index, it must work as:
>>>
>>> A gets lock
>>> A opens reader
>>> A updates
>>> A releases lock
>>> B gets lock
>>> B opens reader
>>> B updates
>>> B releases lock
>>>
>>> The only way you can avoid this is if system can determine that B's
>>> query results in the first case would not change based on A's
>>> updates.
>
> And, in this case, B will fail when it tries to get the lock. It
> must be re-opened so it first sees the changes committed by A.
>
> So, Lucene is transactional, but forces clients to serialize their
> write operations (ie, one cannot have multiple transactions open at
> once).
>
> Mike
>
>


Re: Back Compatibility [ In reply to ]
Thanks, you are correct, but I am not sure it covers the complete case.

Change it a bit to be:

A opens reader.
B opens reader.
A performs query decides a new document is needed
B performs query decides a new document is needed
B gets writer, adds document, closes
A gets writer, adds document, closes

There needs to be a way to manually serialize these operations. I
assume I should just do this:

A gets writer
B gets writer - can't so blocked
A opens reader
A performs query decides a new document is needed
A adds document
A closes reader
A closes writer
B now gets writer
B opens reader
B performs query sees a new document is not needed
B closes reader
B closes writer
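
The "get the writer first, then re-check" serialization above can be sketched as a small self-contained Java program. The ReentrantLock stands in for Lucene's write.lock and the Set stands in for the index; the class and method names here are hypothetical illustrations, not Lucene API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Sketch of the "get the writer first" serialization described above.
 * The ReentrantLock plays the role of Lucene's write.lock and the Set
 * plays the role of the index; all names are hypothetical.
 */
public class SerializedCheckThenAdd {
    private static final ReentrantLock writeLock = new ReentrantLock();
    private static final Set<String> index = new HashSet<String>();

    /** Returns true only if this caller actually added the document. */
    public static boolean addIfMissing(String docId) {
        writeLock.lock();                 // "gets writer" - B blocks here
        try {
            if (index.contains(docId)) {  // open reader + query AFTER locking
                return false;             // B's case: sees A's add, skips
            }
            index.add(docId);             // A's case: adds the document
            return true;
        } finally {
            writeLock.unlock();           // "closes writer" - releases lock
        }
    }

    public static void main(String[] args) {
        System.out.println(addIfMissing("doc-1")); // true: A adds the document
        System.out.println(addIfMissing("doc-1")); // false: B sees A's add
    }
}
```

Because the query happens only after the lock is held, B's check always observes A's committed add, which is exactly the manual serialization asked about.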

Previously, with the read locks, I did not think you could open the
reader after you had the write lock.

Am I correct here?

On Jan 24, 2008, at 2:13 AM, Doron Cohen wrote:

> On Jan 24, 2008 12:31 AM, robert engels <rengels@ix.netcom.com> wrote:
>
>> You must get the write lock before opening the reader if you want
>> transactional consistency and are performing updates.
>>
>> No other way to do it.
>>
>> Otherwise.
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides an update is needed based on results
>> B performs query decides an update is needed based on results
>> B gets write lock
>> B updates
>> B releases
>> A gets write lock
>
>
> Lucene actually protects from this - 'A' would fail to acquire the
> write
> lock, with a stale-index-exception (this is tested in TestIndexReader -
> testDeleteReaderReaderConflict).
>
>
>> A performs update - ERROR. A is performing an update based on
>> stale data
>>
>> If A & B want to update an index, it must work as:
>>
>> A gets lock
>> A opens reader
>> A updates
>> A releases lock
>> B gets lock
>> B opens reader
>> B updates
>> B releases lock
>>
>> The only way you can avoid this is if system can determine that B's
>> query results in the first case would not change based on A's
>> updates.
>>


Re: Back Compatibility [ In reply to ]
On Jan 24, 2008 6:55 PM, robert engels <rengels@ix.netcom.com> wrote:

> Thanks, you are correct, but I am not sure it covers the complete case.
>
> Change it a bit to be:
>
> A opens reader.
> B opens reader.
> A performs query decides a new document is needed
> B performs query decides a new document is needed
> B gets writer, adds document, closes
> A gets writer, adds document, closes
>
> There needs to be a way to manually serialize these operations. I
> assume I should just do this:
>
> A gets writer
> B gets writer - can't so blocked
> A opens reader
> A performs query decides a new document is needed
> A adds document
> A closes reader
> A closes writer
> B now gets writer
> B opens reader
> B performs query sees a new document is not needed
> B closes reader
> B closes writer
>
> Previously, with the read locks, I did not think you could open the
> reader after you had the write lock.
>
> Am I correct here?


If I understand you correctly then yes and no :-)

"Yes" in the sense that this would work and achieve the
required serialization, and "no" in that you could always open
readers whether there was an open writer or not.

The current locking logic with readers is that opening a reader does
not require acquiring any lock. Only when attempting to use the reader
for a write operation (e.g. delete) the reader becomes a writer, and
for that it (1) acquires a write lock and (2) verifies that the
index was not modified by any writer since the reader was
first opened (or else it throws that stale exception).
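
The two-step check described above - (1) acquire the write lock, (2) verify the index is unchanged since the reader was opened - can be modeled with a version stamp. This is a toy illustration of the behavior, not Lucene's actual classes; all names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the stale-reader check; all names are hypothetical. */
public class StaleReaderSketch {
    static class TinyIndex {
        final ReentrantLock writeLock = new ReentrantLock();
        volatile long version = 0;           // bumped on every write
    }

    static class TinyReader {
        private final TinyIndex index;
        private final long versionAtOpen;

        TinyReader(TinyIndex index) {
            this.index = index;
            this.versionAtOpen = index.version;  // opening takes NO lock
        }

        /** First write op: (1) acquire write lock, (2) stale check. */
        void delete() {
            index.writeLock.lock();
            try {
                if (index.version != versionAtOpen)
                    throw new IllegalStateException("stale reader");
                index.version++;                 // perform the write
            } finally {
                index.writeLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        TinyReader a = new TinyReader(idx);
        TinyReader b = new TinyReader(idx);
        b.delete();                              // B writes first: OK
        try {
            a.delete();                          // A is now stale
        } catch (IllegalStateException e) {
            System.out.println("A got: " + e.getMessage());
        }
    }
}
```

As in the thread's A/B example, whichever reader writes second fails the version check, which is how write operations end up serialized without any lock being taken at reader-open time.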

Prior to lockless commits there were two lock types - the write-lock and
the commit-lock. The commit-lock was used only briefly, while opening files
during reader-opening, to guarantee that no writer modified the files that
the reader was reading (especially the segments file). Lockless commits got
rid of the commit lock (mainly by changing to never modify a file once it
was written). Write locks are still in use, but only for writers, as
described above.
(Mike feel free to correct me here...)
Re: Back Compatibility [ In reply to ]
This is now a hijacked thread. It is very interesting, but it may be
hard to find again. Wouldn't it be better to record this thread
differently, perhaps opening a Jira issue to add XA to Lucene?

-- DM

Doron Cohen wrote:
> On Jan 24, 2008 6:55 PM, robert engels <rengels@ix.netcom.com> wrote:
>
>
>> Thanks, you are correct, but I am not sure it covers the complete case.
>>
>> Change it a bit to be:
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides a new document is needed
>> B performs query decides a new document is needed
>> B gets writer, adds document, closes
>> A gets writer, adds document, closes
>>
>> There needs to be a way to manually serialize these operations. I
>> assume I should just do this:
>>
>> A gets writer
>> B gets writer - can't so blocked
>> A opens reader
>> A performs query decides a new document is needed
>> A adds document
>> A closes reader
>> A closes writer
>> B now gets writer
>> B opens reader
>> B performs query sees a new document is not needed
>> B closes reader
>> B closes writer
>>
>> Previously, with the read locks, I did not think you could open the
>> reader after you had the write lock.
>>
>> Am I correct here?
>>
>
>
> If I understand you correctly then yes and no :-)
>
> "Yes" in the sense that this would work and achieve the
> required serialization, and "no" in that you could always open
> readers whether there was an open writer or not.
>
> The current locking logic with readers is that opening a reader does
> not require acquiring any lock. Only when attempting to use the reader
> for a write operation (e.g. delete) the reader becomes a writer, and
> for that it (1) acquires a write lock and (2) verifies that the
> index was not modified by any writer since the reader was
> first opened (or else it throws that stale exception).
>
> Prior to lockless-commit there were two lock types - write-lock and
> commit-lock. The commit-lock was used only briefly - during file opening
> during reader-opening, to guarantee that no writer modifies the files that
> the
> reader is reading (especially the segments file). Lockles-commits got rid
> of the commit lock (mainly by changing to never modify a file once it was
> written.) Write locks are still in use, but only for writers, as described
> above.
> (Mike feel free to correct me here...)
>
>


Re: Back Compatibility [ In reply to ]
I will do so.

On Jan 24, 2008, at 12:44 PM, DM Smith wrote:

> This is now a hijacked thread. It is very interesting, but it may
> be hard to find again. Wouldn't it be better to record this thread
> differently, perhaps opening a Jira issue to add XA to Lucene?
>
> -- DM
>
> Doron Cohen wrote:
>> On Jan 24, 2008 6:55 PM, robert engels <rengels@ix.netcom.com> wrote:
>>
>>
>>> Thanks, you are correct, but I am not sure it covers the complete
>>> case.
>>>
>>> Change it a bit to be:
>>>
>>> A opens reader.
>>> B opens reader.
>>> A performs query decides a new document is needed
>>> B performs query decides a new document is needed
>>> B gets writer, adds document, closes
>>> A gets writer, adds document, closes
>>>
>>> There needs to be a way to manually serialize these operations. I
>>> assume I should just do this:
>>>
>>> A gets writer
>>> B gets writer - can't so blocked
>>> A opens reader
>>> A performs query decides a new document is needed
>>> A adds document
>>> A closes reader
>>> A closes writer
>>> B now gets writer
>>> B opens reader
>>> B performs query sees a new document is not needed
>>> B closes reader
>>> B closes writer
>>>
>>> Previously, with the read locks, I did not think you could open the
>>> reader after you had the write lock.
>>>
>>> Am I correct here?
>>>
>>
>>
>> If I understand you correctly then yes and no :-)
>>
>> "Yes" in the sense that this would work and achieve the
>> required serialization, and "no" in that you could always open
>> readers whether there was an open writer or not.
>>
>> The current locking logic with readers is that opening a reader does
>> not require acquiring any lock. Only when attempting to use the
>> reader
>> for a write operation (e.g. delete) the reader becomes a writer, and
>> for that it (1) acquires a write lock and (2) verifies that the
>> index was not modified by any writer since the reader was
>> first opened (or else it throws that stale exception).
>>
>> Prior to lockless-commit there were two lock types - write-lock and
>> commit-lock. The commit-lock was used only briefly - during file
>> opening
>> during reader-opening, to guarantee that no writer modifies the
>> files that
>> the
>> reader is reading (especially the segments file). Lockles-commits
>> got rid
>> of the commit lock (mainly by changing to never modify a file once
>> it was
>> written.) Write locks are still in use, but only for writers, as
>> described
>> above.
>> (Mike feel free to correct me here...)
>>
>>
>
>
>


Re: Back Compatibility [ In reply to ]
One more thought on back compatibility:

Do we have the same requirements for any and all contrib modules? I
am especially thinking about the benchmark contrib, but it probably
applies to others as well.

-Grant


On Jan 24, 2008, at 8:42 AM, Grant Ingersoll wrote:

>
> On Jan 24, 2008, at 4:27 AM, Michael McCandless wrote:
>
>>
>> Grant Ingersoll wrote:
>>
>>> Yes, I agree these are what is about (despite the divergence into
>>> locking).
>>>
>>> As I see it, the question is about whether we should try to do
>>> major releases on the order of a year, rather than the current 2+
>>> year schedule and also how to best handle bad behavior when
>>> producing tokens that previous applications rely on.
>>>
>>> On the first case, we said we would try to do minor releases more
>>> frequently (on the order of once a quarter) in the past, but this,
>>> so far hasn't happened. However, it has only been one release,
>>> and it did have a lot of big changes that warranted longer
>>> testing. I do agree with Michael M. that we have done a good job
>>> of keeping back compatibility. I still don't know if trying to
>>> clean out deprecations once a year puts some onerous task on
>>> people when it comes to upgrading as opposed to doing every two
>>> years. Do people really have code that they never compile or work
>>> on in over a year? If they do, do they care about upgrading? It
>>> clearly means they are happy w/ Lucene and don't need any bug
>>> fixes. I can understand this being a bigger issue if it were on
>>> the order of every 6 months or less, but that isn't what I am
>>> proposing. I guess my suggestion would be that we try to get back
>>> onto the once a quarter release goal, which will more than likely
>>> lead to a major release in the 1-1.5 year time frame. That being
>>> said, I am fine with maintaining the status quo concerning back
>>> compatibility as I think those arguments are compelling. On the
>>> interface thing, I wish there was a @introducing annotation that
>>> could announce the presence of a new method and would give a
>>> warning up until the version specified is met, at which point it
>>> would break the compile, but I realize the semantics of that are
>>> pretty weird, so...
>>
>> I do think we should try for minor releases more frequently,
>> independent of the backwards compatibility question (how often to
>> do major releases) :)
>>
>
> +1
>
> The question then becomes what can we do to improve our development
> process?
>
>> I think major releases should be done only when a major feature
>> truly "forces" us to (which Java 1.5 has) and not because we want
>> to clean out the accumulated cruft we are carrying forward to
>> preserve backwards compatibility.
>>
>>> As for the other issue concerning things like token issues, I
>>> think it is reasonable to fix the bug and just let people know it
>>> will change indexing, but try to allow for the old way if it is
>>> not too onerous. Chances are most people aren't even aware of it,
>>> and thus telling them about it may actually cause them to consider
>>> it. For things like maxFieldLength, etc. then back compat. is a
>>> reasonable thing to preserve.
>>
>> So, in hindsight, the acronym/host setting for StandardAnalyzer
>> really should have defaulted to "true", meaning the bug is fixed,
>> but users who somehow depend on the bug (which should be a tiny
>> minority) have an avenue (setReplaceInvalidAcronym) to keep back
>> compatibility if needed even on a minor release, right? I agree.
>> (And so in 2.4 we should fix the default to true?).
>
>
>>
>>
>> I think for such issues where it's a very minor break in backwards
>> compatibility, we should make the break, and very carefully
>> document this in the "Changes in runtime behavior" section, even
>> within a minor release. I don't think such changes should drive us
>> to a major release.
>
>
> +1
>
> -Grant
>
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: Back Compatibility [ In reply to ]
On Jan 25, 2008 8:04 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> One more thought on back compatibility:
>
> Do we have the same requirements for any and all contrib modules? I
> am especially thinking about the benchmark contrib, but it probably
> applies to others as well.
>
> -Grant
>

In general I think that contrib should have the same requirements, because
there may be applications out there depending on it - e.g. highlighting,
spell-correction - and here too, unstable packages can be marked with
a temporary warning such as those we currently have for search.function.

benchmark is different in that - I think - there are no applications that
depend on it, so perhaps we can have more flexibility with it?

Doron
Re: Back Compatibility [ In reply to ]
Well, contrib/Wikipedia has a dependency on it, but at least it is
self-contained. I would love to see the Wikipedia stuff extracted out
of benchmark and be in contrib/wikipedia (thus flipping the
dependency), but the effort isn't particularly high on my list.

But I do agree, benchmark doesn't have the same litmus test.

-Grant

On Jan 25, 2008, at 4:01 PM, Doron Cohen wrote:

> On Jan 25, 2008 8:04 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
>> One more thought on back compatibility:
>>
>> Do we have the same requirements for any and all contrib modules? I
>> am especially thinking about the benchmark contrib, but it probably
>> applies to others as well.
>>
>> -Grant
>>
>
> In general I think that contrib should have same requirements, because
> there may be applications out there depending on it - e.g.
> highlighting,
> spell-correction - and here too, unstable packages can be marked with
> the temporary warning such those we currently have for
> search.function.
>
> benchmark is different in that - I think - there are no applications
> that
> depend on it, so perhaps we can have more flexibility in it?
>
> Doron



Re: Back Compatibility [ In reply to ]
: I would guess the number of people/organizations using Lucene vs. contributing
: to Lucene is much greater.
:
: The contributers work in head (should IMO). The users can select a particular
: version of Lucene and code their apps accordingly. They can also back-port
: features from a later to an earlier release. If they have limited development
: resources, they are probably not working on Lucene (they are working on their
: apps), but they can update their own code to work with later versions - which
: they would probably rather do than learning the internals and contributing to
: Lucene.

I think we have a semantic disconnect on the definition of "community".

I am including any and all people/projects that use Lucene in any way --
whether or not they contribute back. If there are 1000 projects
using Lucene as a library, and each project requires 5 man-hours of work
to upgrade from version X to version Y because of a non-backwards-
compatible change, but it would only take 2 man-hours of work for those
projects to backport / rip out the one or two features of version Y they
really want and cram them into their code base, then the community as a
whole is paying a really heavy cost for version Y ... regardless of whether
each of those 1000 projects invests the 5 hours or the 2 hours. In the
first extreme we're all spending a cumulative total of 5000 man-hours; in
the second case we're spending 2000 man-hours, and now we've got 1000 apps
that are running hacked-up unofficial offshoots of version X that will
never be able to upgrade to version Z when it comes out -- the community
not only becomes very fractured but Lucene as a whole gets a bad rap,
because everybody talks about how they still run version X with local
patches instead of using version Y -- it makes new users wonder "what's
wrong with version Y?" ... "if upgrading is so hard that no one does it, do
I really want to use this library?"

It may seem like a socialist or a communist or a free-love hippy attitude,
but if contributors and committers take extra time to develop more
incremental releases and backwards-compatible API transitions it may cost
them more time upfront, but it saves the community as a whole a *lot* of
time in the long run.

By all means: we should move forward anytime really great improvements can
be made through new APIs and new features -- but we need to keep in mind
that if those new APIs and features are hard for our current user base to
adapt to, then we aren't doing the community as a whole any favors by
throwing the baby out with the bath water and prematurely throwing away
an old API in order to support the new one.

Trade-offs must be made. Sometimes that may mean sacrificing committer
man-hours; or performance; or API cleanliness; in order to reap the
benefit of a strong, happy, healthy community.



-Hoss


Re: Back Compatibility [ In reply to ]
: > So, in hindsight, the acronym/host setting for StandardAnalyzer really
: > should have defaulted to "true", meaning the bug is fixed, but users who
: > somehow depend on the bug (which should be a tiny minority) have an avenue
: > (setReplaceInvalidAcronym) to keep back compatibility if needed even on a
: > minor release, right? I agree. (And so in 2.4 we should fix the default to
: > true?).

: > I think for such issues where it's a very minor break in backwards
: > compatibility, we should make the break, and very carefully document this in
: > the "Changes in runtime behavior" section, even within a minor release. I
: > don't think such changes should drive us to a major release.

: +1

I've made some verbiage changes to BackwardsCompatibility to document that
we may in fact make runtime behavior changes which are not strictly
"backwards compatible", and what commitments we have to letting users
force the old behavior if we make a change like this in a minor release.

Most of this verbiage is just me making stuff up based on this thread ...
it is absolutely open for discussion (and editing by people with more
grammar sense than me) ...

http://wiki.apache.org/lucene-java/BackwardsCompatibility



-Hoss


Re: Back Compatibility [ In reply to ]
: But I do agree, benchmark doesn't have the same litmus test.

the generalization of that statement probably being "all contribs are not
created equal."

I propose making some comments in the BackwardsCompatibility wiki page
noting that the compatibility commitments of contribs depend largely on
their maturity and intended usage, and that the README.txt file for each
contrib will identify its approach to compatibility.

We can put some boilerplate in the README for most of the contribs, and
special verbiage in the README for the special contribs.


-Hoss


Re: Back Compatibility [ In reply to ]
And then you can end up like the Soviet Union...

The basic problems of communism: those that don't contribute their
fair share but suck out the minimum resources (though the maximum in
totality), those that want to lead (their contribution) and suck
the minimum, and then those that contribute the most to make up for
everyone else - who quickly say this SUCKS....


On Jan 27, 2008, at 7:05 PM, Chris Hostetter wrote:

> : I would guess the number of people/organizations using Lucene vs.
> contributing
> : to Lucene is much greater.
> :
> : The contributers work in head (should IMO). The users can select
> a particular
> : version of Lucene and code their apps accordingly. They can also
> back-port
> : features from a later to an earlier release. If they have limited
> development
> : resources, they are probably not working on Lucene (they are
> working on their
> : apps), but they can update their own code to work with later
> versions - which
> : they would probably rather do than learning the internals and
> contributing to
> : Lucene.
>
> i think we have a semantic disconnect on the definition of "community"
>
> I am including any and all people/projects that use Lucene in
> anyway --
> wether or not they contribute back or not. If there are 1000 projects
> using lucene as a library, and each project requires 5 man hours of
> work
> to upgrade from version X to version Y becuse of a non-backwards
> compatible change, but it would only take 2 man hours of work for
> those
> projects to backport / rip out the one or two features of version Y
> they
> really want to cram them into their code base then the community as a
> whole is paying a really heavy cost for version Y ... regardless of
> wether
> each of those 1000 projects invest the 5 hours or the 2 hours ...
> in the
> first extreme we're all spending a cumulative total of 5000 man
> hours. in
> the second case we're spending 2000 man hours, and now we've got
> 1000 apps
> that are runing hacked up unofficial offshoots of version X that will
> never be able to upgrade to version Z when it comes out -- the
> community
> not only becomes very fractured but lucene as a whole gets a bad wrap,
> because everybody talks about how they still run version X with local
> patches instead of using version Y -- it makes new users wonder
> "what's
> wrong with version Y?" ... "if upgrading is so hard that no one
> does it do
> i really wnat to use this library?"
>
> It may seem like a socialist or a communist or a free love hippy
> attitude,
> but if contributors and committers take extra time to develop more
> incrimental releases and backwards compatible API transitions it
> may cost
> them more time upfront, but it saves the community as a whole a
> *lot* of
> time in the long run.
>
> By all means: we should move forward anytime really great
> improvements can
> be made through new APIs and new features -- but we need to keep in
> mind
> that if those new APIs and features are hard for our current user
> base to
> adapt to, then we aren't doing the community as a whole any favors by
> throwing the baby out with the bath water and prematurely throwing
> away
> an old API in order to support the new one.
>
> Trade offs must be made. Sometimes that may mean sacrificing
> committer
> man hours; or performance; or API cleanliness; in order to reap the
> benefit of a strong, happy, healthy, community.
>
>
>
> -Hoss
>
>
>


Re: Back Compatibility [ In reply to ]
+1
On Jan 27, 2008, at 8:34 PM, Chris Hostetter wrote:

>
> : But I do agree, benchmark doesn't have the same litmus test.
>
> the generalization of that statement probably being "all contribs
> are not
> created equal."
>
> I propose making some comments in the BackwardsCompatibility wiki page
> about the compatibility commitments of contribs depends largely on
> their
> maturity and intended usage and that the README.txt file for each
> contrib
> will identify it's approach to compatibility.
>
> we can put some boler plate in the README for most of the contribs,
> and
> special verbage in the README for the special contribs.
>
>
> -Hoss
>
>
>



Re: Back Compatibility [ In reply to ]
+1. And, we always have the major version release at our disposal if
need be.

At any rate, I think we have beaten this one to death. I think it is
useful to look back every now and then on the major things that
guide us and make sure we all still agree, at least for the most
part. For now, I think our plan is pretty straightforward: 2.4
pretty quickly (3 months?) and then 2.9, all of which will be
back-compat. Then on to 3.0, which will be a full upgrade to Java 1.5,
thus dropping support for 1.4.

-Grant


On Jan 27, 2008, at 8:05 PM, Chris Hostetter wrote:

> : I would guess the number of people/organizations using Lucene vs.
> contributing
> : to Lucene is much greater.
> :
> : The contributers work in head (should IMO). The users can select a
> particular
> : version of Lucene and code their apps accordingly. They can also
> back-port
> : features from a later to an earlier release. If they have limited
> development
> : resources, they are probably not working on Lucene (they are
> working on their
> : apps), but they can update their own code to work with later
> versions - which
> : they would probably rather do than learning the internals and
> contributing to
> : Lucene.
>
> i think we have a semantic disconnect on the definition of "community"
>
> I am including any and all people/projects that use Lucene in anyway
> --
> wether or not they contribute back or not. If there are 1000 projects
> using lucene as a library, and each project requires 5 man hours of
> work
> to upgrade from version X to version Y becuse of a non-backwards
> compatible change, but it would only take 2 man hours of work for
> those
> projects to backport / rip out the one or two features of version Y
> they
> really want to cram them into their code base then the community as a
> whole is paying a really heavy cost for version Y ... regardless of
> wether
> each of those 1000 projects invest the 5 hours or the 2 hours ... in
> the
> first extreme we're all spending a cumulative total of 5000 man
> hours. in
> the second case we're spending 2000 man hours, and now we've got
> 1000 apps
> that are runing hacked up unofficial offshoots of version X that will
> never be able to upgrade to version Z when it comes out -- the
> community
> not only becomes very fractured but lucene as a whole gets a bad wrap,
> because everybody talks about how they still run version X with local
> patches instead of using version Y -- it makes new users wonder
> "what's
> wrong with version Y?" ... "if upgrading is so hard that no one does
> it do
> i really wnat to use this library?"
>
> It may seem like a socialist or a communist or a free love hippy
> attitude,
> but if contributors and committers take extra time to develop more
> incrimental releases and backwards compatible API transitions it may
> cost
> them more time upfront, but it saves the community as a whole a
> *lot* of
> time in the long run.
>
> By all means: we should move forward anytime really great
> improvements can
> be made through new APIs and new features -- but we need to keep in
> mind
> that if those new APIs and features are hard for our current user
> base to
> adapt to, then we aren't doing the community as a whole any favors by
> throwing the baby out with the bath water and prematurely throwing
> away
> an old API in order to support the new one.
>
> Trade offs must be made. Sometimes that may mean sacrificing
> committer
> man hours; or performance; or API cleanliness; in order to reap the
> benefit of a strong, happy, healthy, community.
>
>
>
> -Hoss
>
>
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: Back Compatibility [ In reply to ]
> It may seem like a socialist or a communist or a free love hippy attitude,

It sounds like a perfect attitude.

(In particular the "free love hippie" part - does it come with LSD and
tie-dyed/batik clothes too?)

Kind regards,
Endre.
