Mailing List Archive

Is Fair Similarity working with lucene 2.2 ?
Hi,

I've tried the "fair" similarity described here
(http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739) with
Lucene 2.2, but it does not seem to work.

I've attached the custom "MyFair" similarity to both IndexWriter and
IndexSearcher.

Do you have any idea why?

Thanks a lot,

Fabrice

--
View this message in context: http://www.nabble.com/Is-Fair-Similarity-working-with-lucene-2.2---tp15001250p15001250.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Is Fair Similarity working with lucene 2.2 ?
On Monday, 21 January 2008, Fabrice Robini wrote:

> I've tried the "fair" similarity described here
> (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739)
> with lucene 2.2 but it does not seems to work.

What exactly doesn't work? Don't you see any effect? At least the scores
should change if you try it with an artificially small document. Maybe you can
strip down your code to a small self-contained example and post it.

Regards
Daniel

--
http://www.danielnaber.de

Re: Is Fair Similarity working with lucene 2.2 ?
Yes, I do not see an effect...

Here is my unit test that tests it:

public void testFairSimilarity() throws CorruptIndexException,
        IOException, ParseException
{
    Directory theDirectory = new RAMDirectory();
    Analyzer theAnalyzer = new FrenchAnalyzer();

    IndexWriter theIndexWriter = new IndexWriter(theDirectory, theAnalyzer);
    theIndexWriter.setSimilarity(new FairSimilarity());

    Document doc1 = new Document();
    Field name1 = new Field(" NAME", "SHORT_SUITE", Field.Store.YES,
            Field.Index.UN_TOKENIZED);
    Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10",
            Field.Store.NO, Field.Index.TOKENIZED);
    doc1.add(name1);
    doc1.add(content1);
    theIndexWriter.addDocument(doc1);

    Document doc2 = new Document();
    Field name2 = new Field(" NAME", "BIG_SUITE", Field.Store.YES,
            Field.Index.UN_TOKENIZED);
    Field content2 = new Field("CONTENT",
            "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20",
            Field.Store.NO, Field.Index.TOKENIZED);
    doc1.add(name2);
    doc1.add(content2);
    theIndexWriter.addDocument(doc2);

    Searcher searcher = new IndexSearcher(theDirectory);
    searcher.setSimilarity(new FairSimilarity());

    QueryParser queryParser = new QueryParser("CONTENT", theAnalyzer);

    Hits hits = searcher.search(queryParser.parse("x"));

    assertEquals(2, hits.length());
    assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
    assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
}

Is there anything wrong ?
Thanks a lot,

Fabrice


Re: Is Fair Similarity working with lucene 2.2 ?
Well, I can't even get past the assertions in this code.

The first assertion fails because I get 0 hits. I am using
SimpleAnalyzer since I do not have a FrenchAnalyzer.

Any thoughts?
Srikant

Fabrice Robini wrote:
> Yes, I do not see an effect...
>
> Here is my unit test that test it:
>

Re: Is Fair Similarity working with lucene 2.2 ?
Oooops sorry, bad cut/paste...

Here is the right one :-)

public void testFairSimilarity() throws CorruptIndexException,
        IOException, ParseException
{
    Directory theDirectory = new RAMDirectory();
    Analyzer theAnalyzer = new StandardAnalyzer();

    IndexWriter theIndexWriter = new IndexWriter(theDirectory, theAnalyzer);
    theIndexWriter.setSimilarity(new FairSimilarity());

    // Short document: "x" appears once among 10 whitespace-separated tokens.
    Document doc1 = new Document();
    Field name1 = new Field("NAME", "SHORT_SUITE", Field.Store.YES,
            Field.Index.UN_TOKENIZED);
    Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10",
            Field.Store.NO, Field.Index.TOKENIZED);
    doc1.add(name1);
    doc1.add(content1);
    theIndexWriter.addDocument(doc1);

    // Long document: "x" appears twice among 20 whitespace-separated tokens.
    Document doc2 = new Document();
    Field name2 = new Field("NAME", "BIG_SUITE", Field.Store.YES,
            Field.Index.UN_TOKENIZED);
    Field content2 = new Field("CONTENT",
            "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20",
            Field.Store.NO, Field.Index.TOKENIZED);
    doc2.add(name2);
    doc2.add(content2);
    theIndexWriter.addDocument(doc2);

    // Close the writer before opening the searcher.
    theIndexWriter.close();

    Searcher searcher = new IndexSearcher(theDirectory);
    searcher.setSimilarity(new FairSimilarity());

    QueryParser queryParser = new QueryParser("CONTENT", theAnalyzer);

    Hits hits = searcher.search(queryParser.parse("x"));

    // Expectation under test: the longer document ranks first.
    assertEquals(2, hits.length());
    assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
    assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
}




Re: Is Fair Similarity working with lucene 2.2 ?
OK, got it to work. Thanks.

From a quick scoring comparison, I got the same score for both hits.
Maybe there is a loss of precision somewhere. Or, when scores are equal,
Lucene is doing something unintended/overlooked and thus putting shorter
documents higher. The experiment is a special case in which the relative
TF of the queried term is equal in both documents (for both suites, the
TF of x = 10%), which is very rare. Or maybe the IDF factor is kicking in
in some strange way, although it shouldn't. There could be any number of
reasons, but nothing is obvious to the naked eye.
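
The equal scores are what one would expect if FairSimilarity uses tf(freq) = freq and lengthNorm = 1 / numTerms. That definition is an assumption here, since the linked post is not reproduced in this thread. The sketch below works through the arithmetic for the two test documents only; it ignores IDF, query normalization, and the lossy one-byte norm storage, and the class and method names are made up for illustration.

```java
public class FairScoreSketch {
    // Assumed "fair" term-frequency factor: the raw count, with no damping.
    public static double tf(int freq) {
        return freq;
    }

    // Assumed "fair" length norm: 1 / number of terms in the field.
    public static double lengthNorm(int numTerms) {
        return 1.0 / numTerms;
    }

    // Document-dependent part of the score for a single-term query.
    public static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double shortSuite = score(1, 10); // "x" once in 10 terms
        double bigSuite = score(2, 20);   // "x" twice in 20 terms
        System.out.println("SHORT_SUITE: " + shortSuite);
        System.out.println("BIG_SUITE:   " + bigSuite);
        System.out.println("tie: " + (Math.abs(shortSuite - bigSuite) < 1e-12));
    }
}
```

Under these assumptions both documents get the same value, because doubling the occurrences and doubling the length cancel out exactly.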

However, that said, length normalization is not a science but an art, and
the simple scheme we have here in FairSimilarity will not always perform
as expected in real-world scenarios. Maybe I am missing something
or have forgotten my basics, but that is not to say your observation is trivial.

Rather, the contrary. I hope there will be more activity on this topic,
because it is an issue of computing relevance, which is the core of search.

Cheers,
Srikant

Re: Is Fair Similarity working with lucene 2.2 ?
Hi Srikant,

Thank you very much for your reply, it's very interesting.
I have to say I am confused by this now...
I do not know what I can do to make this unit test pass...

I agree with you, it may be an issue of computing relevance.

Fabrice


Re: Is Fair Similarity working with lucene 2.2 ?
On Tuesday, 22 January 2008, Fabrice Robini wrote:

> Oooops sorry, bad cut/paste...
>
> Here is the right one :-)

The score is the same, so documents with a lower id (inserted earlier) will
be returned first. So everything looks okay to me, or am I missing
something?
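
The ordering described here can be mimicked without Lucene: sort by score descending and, on a tie, by ascending document id. This is only an illustration of that rule, not Lucene's actual hit-collection code; the Hit class, the rank method, and the score value 0.1 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {
    // Hypothetical stand-in for a search hit: internal doc id, score, stored name.
    static final class Hit {
        final int docId;
        final double score;
        final String name;

        Hit(int docId, double score, String name) {
            this.docId = docId;
            this.score = score;
            this.name = name;
        }
    }

    // Higher score first; on equal scores the lower (earlier-inserted) doc id wins.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> -h.score)
                .thenComparingInt(h -> h.docId));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.1, "BIG_SUITE"));   // indexed second
        hits.add(new Hit(0, 0.1, "SHORT_SUITE")); // indexed first
        for (Hit h : rank(hits)) {
            System.out.println(h.docId + " " + h.name + " " + h.score);
        }
    }
}
```

With equal scores, SHORT_SUITE (id 0) comes out first, which is the opposite of what the test asserts.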

regards
Daniel

--
http://www.danielnaber.de

Re: Is Fair Similarity working with lucene 2.2 ?
Is there anything I can do to make my unit test pass?
Or is it impossible?

Thanks a lot,

Fabrice


