Mailing List Archive

[DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
hey folks,

I have spent a hell of a lot of time on the positions branch to make
positions and offsets work on all queries if needed. The one thing
that bugged me the most is the distinction between DocsEnum and
DocsAndPositionsEnum. Really, when you look at it closer, DocsEnum is a
DocsAndFreqsEnum, and if we omit freqs we should return a
DocIdSetIterator. The same is true for
DocsAndPositionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I
don't really see the benefit of this. We should rather make the
interface simple and call it something like PostingsEnum, where you
have to specify flags on the TermsEnum, and if we can't provide a
sufficient enum we throw an exception?
I just want to bring up the idea here since it might simplify a lot
for users as well as for us when improving our positions / offsets etc.
support.

Thoughts? Ideas?

simon
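
A minimal sketch of the proposal above, under stated assumptions. It is not
Lucene code: Feature, TermPostings, postings() and the PostingsEnum interface
shown here are hypothetical stand-ins, and the choice of
IllegalArgumentException is only one possible reading of "we throw an
exception". The caller names the features it needs, and the request fails up
front if the term was not indexed with them.

```java
import java.util.EnumSet;

public class PostingsEnumSketch {

    /** Hypothetical per-term features a caller can ask for. */
    enum Feature { FREQS, POSITIONS, OFFSETS, PAYLOADS }

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical merged enum: one type instead of DocsEnum + DocsAndPositionsEnum. */
    interface PostingsEnum {
        int nextDoc();      // next doc id, or NO_MORE_DOCS when exhausted
        int freq();         // term frequency in the current doc
        int nextPosition(); // next position in the current doc
    }

    /** Toy in-memory postings for one term; `indexed` says what the "index" stored. */
    static final class TermPostings {
        private final EnumSet<Feature> indexed;
        private final int[] docs;
        private final int[][] positions; // positions[i] belongs to docs[i]

        TermPostings(EnumSet<Feature> indexed, int[] docs, int[][] positions) {
            this.indexed = indexed;
            this.docs = docs;
            this.positions = positions;
        }

        /** Returns an enum for the requested features, or throws if any was not indexed. */
        PostingsEnum postings(EnumSet<Feature> wanted) {
            EnumSet<Feature> missing = EnumSet.copyOf(wanted);
            missing.removeAll(indexed);
            if (!missing.isEmpty()) {
                throw new IllegalArgumentException("not indexed for this term: " + missing);
            }
            final boolean withPositions = wanted.contains(Feature.POSITIONS);
            return new PostingsEnum() {
                private int docIdx = -1;
                private int posIdx = 0;

                @Override public int nextDoc() {
                    posIdx = 0; // positions restart with every new document
                    docIdx++;
                    return docIdx < docs.length ? docs[docIdx] : NO_MORE_DOCS;
                }

                @Override public int freq() {
                    return positions[docIdx].length;
                }

                @Override public int nextPosition() {
                    if (!withPositions) {
                        throw new UnsupportedOperationException("positions were not requested");
                    }
                    return positions[docIdx][posIdx++];
                }
            };
        }
    }

    public static void main(String[] args) {
        TermPostings term = new TermPostings(
            EnumSet.of(Feature.FREQS, Feature.POSITIONS),
            new int[] {1, 5}, new int[][] {{0, 3}, {7}});

        PostingsEnum pe = term.postings(EnumSet.of(Feature.FREQS, Feature.POSITIONS));
        for (int doc = pe.nextDoc(); doc != NO_MORE_DOCS; doc = pe.nextDoc()) {
            StringBuilder sb = new StringBuilder("doc=" + doc + " positions=");
            for (int i = 0, n = pe.freq(); i < n; i++) {
                sb.append(pe.nextPosition()).append(' ');
            }
            System.out.println(sb.toString().trim());
        }

        try {
            term.postings(EnumSet.of(Feature.OFFSETS)); // offsets were never indexed
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at request time, rather than handing back a partially usable enum,
keeps the capability check in one place; whether that is preferable to the
existing null return is exactly what the thread is asking.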

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
On Thu, Nov 1, 2012 at 4:26 PM, Simon Willnauer
<simon.willnauer@gmail.com> wrote:
> hey folks,
>
> I have spent a hell of a lot of time on the positions branch to make
> positions and offsets work on all queries if needed. The one thing
> that bugged me the most is the distinction between DocsEnum and
> DocsAndPositionsEnum. Really, when you look at it closer, DocsEnum is a
> DocsAndFreqsEnum, and if we omit freqs we should return a
> DocIdSetIterator. The same is true for
> DocsAndPositionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I
> don't really see the benefit of this. We should rather make the
> interface simple and call it something like PostingsEnum, where you
> have to specify flags on the TermsEnum, and if we can't provide a
> sufficient enum we throw an exception?
> I just want to bring up the idea here since it might simplify a lot
> for users as well as for us when improving our positions / offsets etc.
> support.
>

This is an interesting idea. If we forget about TermDocs/TermPositions
and were doing it from scratch, would we have two separate classes?
And what's the advantage? (You already get null if you ask for
positions and they aren't there, and queries throw an exception on that;
it's unrelated to the enum classes themselves.)

RE: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
+1, I think PostingsEnum is the much better idea! I have thought about that several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I never understood the difference in the early Lucene 4 days. Now we have some extra methods, but most of them are optional, and a PostingsEnum that extends DocIdSetIterator (I would like *implements* more...) is perfectly fine for all those use cases. And as both Scorer and PostingsEnum extend the same base class, this makes code reusable and makes it look identical in lots of cases (like simple Filters). A filter for a Term could directly return the PostingsEnum for this term in getDocIdSet()...

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
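
The reuse argument above can be sketched roughly as follows. This is an
illustration only, not Lucene's API: DocIdIterator is a simplified stand-in
for DocIdSetIterator, and PostingsEnum, TermScorer, collect() and
termFilter() are hypothetical names with toy behaviour. The point is that
code written once against the shared base type works for postings and
scorers alike, and a term filter can hand back the postings directly.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedIteratorSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Simplified stand-in for DocIdSetIterator. */
    static abstract class DocIdIterator {
        abstract int nextDoc(); // next doc id, or NO_MORE_DOCS when exhausted
    }

    /** Hypothetical merged postings type: walks docs and exposes freq(). */
    static class PostingsEnum extends DocIdIterator {
        private final int[] docs;
        private final int[] freqs;
        private int idx = -1;

        PostingsEnum(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int freq() {
            return freqs[idx];
        }
    }

    /** Toy scorer on top of postings; it shares the same base class. */
    static class TermScorer extends DocIdIterator {
        private final PostingsEnum postings;

        TermScorer(PostingsEnum postings) {
            this.postings = postings;
        }

        @Override int nextDoc() {
            return postings.nextDoc();
        }

        float score() {
            return postings.freq(); // toy score: the raw term frequency
        }
    }

    /** Written once against the base type; works for postings and scorers. */
    static List<Integer> collect(DocIdIterator it) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    /** A "term filter" that simply returns the term's postings as its iterator. */
    static DocIdIterator termFilter(int[] docs, int[] freqs) {
        return new PostingsEnum(docs, freqs);
    }

    public static void main(String[] args) {
        int[] docs = {2, 4, 9};
        int[] freqs = {1, 3, 1};
        System.out.println(collect(termFilter(docs, freqs)));
        System.out.println(collect(new TermScorer(new PostingsEnum(docs, freqs))));
    }
}
```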


Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
On Thu, Nov 1, 2012 at 4:55 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> +1, I think PostingsEnum is the much better idea! I have thought about that several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I never understood the difference in the early Lucene 4 days. Now we have some extra methods, but most of them are optional, and a PostingsEnum that extends DocIdSetIterator (I would like *implements* more...) is perfectly fine for all those use cases. And as both Scorer and PostingsEnum extend the same base class, this makes code reusable and makes it look identical in lots of cases (like simple Filters). A filter for a Term could directly return the PostingsEnum for this term in getDocIdSet()...
>

I was frustrated with some of the same things as Simon, and thought
about the 'implements' too. (I actually went so far as to make a quick
prototype patch to see what it would look like:
http://pastebin.com/vum1mx9H). I don't like that if you write a codec,
you must write duplicate enums and cannot have e.g. your positional
enum extend your docs one and so forth.

I also think it limits us for the Scorer case (it extends DocsEnum
now, but what if you wanted a Scorer where you could walk its
positions...)

But anyway I think I like Simon's idea (we can deal with the interface
idea separately).
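
To illustrate the codec point above, here is a rough sketch of what a single
base class would allow. Nothing here is taken from Lucene or from the
pastebin prototype: PostingsEnum, MyDocsEnum and MyPositionsEnum are
hypothetical names with toy in-memory data. With one base type, the
positional enum can extend the docs-only one and reuse its iteration logic
instead of duplicating it.

```java
public class CodecEnumSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical single base class; positions are an optional capability. */
    static abstract class PostingsEnum {
        abstract int nextDoc(); // next doc id, or NO_MORE_DOCS when exhausted
        abstract int freq();    // term frequency in the current doc

        int nextPosition() {
            throw new UnsupportedOperationException("positions not available");
        }
    }

    /** A codec's docs-and-freqs enum. */
    static class MyDocsEnum extends PostingsEnum {
        protected final int[] docs;
        protected final int[] freqs;
        protected int idx = -1;

        MyDocsEnum(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override int freq() {
            return freqs[idx];
        }
    }

    /** The positional enum extends the docs one, so doc iteration is written once. */
    static class MyPositionsEnum extends MyDocsEnum {
        private final int[][] positions; // positions[i] belongs to docs[i]
        private int posIdx;

        MyPositionsEnum(int[] docs, int[][] positions) {
            super(docs, freqsOf(positions));
            this.positions = positions;
        }

        @Override int nextDoc() {
            posIdx = 0; // positions restart with every new document
            return super.nextDoc();
        }

        @Override int nextPosition() {
            return positions[idx][posIdx++];
        }

        /** In this toy model the frequency is the number of stored positions. */
        private static int[] freqsOf(int[][] positions) {
            int[] freqs = new int[positions.length];
            for (int i = 0; i < positions.length; i++) {
                freqs[i] = positions[i].length;
            }
            return freqs;
        }
    }

    public static void main(String[] args) {
        PostingsEnum pe = new MyPositionsEnum(new int[] {3, 8}, new int[][] {{1, 4}, {0}});
        for (int doc = pe.nextDoc(); doc != NO_MORE_DOCS; doc = pe.nextDoc()) {
            System.out.println("doc=" + doc + " freq=" + pe.freq()
                + " firstPosition=" + pe.nextPosition());
        }
    }
}
```

The same shape would in principle let a Scorer that extends the base class
expose positions, which is the second limitation raised above.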

Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
+1, this makes total sense!

Mike McCandless

http://blog.mikemccandless.com

On Thu, Nov 1, 2012 at 5:04 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
> I was frustrated with some of the same things as Simon, and thought
> about the 'implements' too. (I actually went so far as to make a quick
> prototype patch to see what it would look like:
> http://pastebin.com/vum1mx9H). I don't like that if you write a codec,
> you must write duplicate enums and cannot have e.g. your positional
> enum extend your docs one and so forth.
>
> I also think it limits us for the Scorer case (it extends DocsEnum
> now, but what if you wanted a Scorer where you could walk its
> positions...)
>
> But anyway I think I like Simon's idea (we can deal with the interface
> idea separately).