Mailing List Archive

Similarity coefficient for more exact matching
Hi guys,
I have a field, Analyzed, Store.NO.
Suppose one Document has the value "Hello" in this field.
Another one has "Hello world, one, two, three, four".
Since the field is Analyzed (with norms), the "one, two, three, four" will definitely affect the resulting score when we search for the query "Hello world". Does anyone know whether I can control some coefficients to set the weight of exact matching vs. the number of words (the norm factor)?
Thanks,
 

Maxim
Re: Similarity coefficient for more exact matching
You can override org.apache.lucene.search.Similarity/DefaultSimilarity
to tweak quite a lot of stuff.

computeNorm() may be the method you are interested in. It is called at
indexing time, so be sure to use the same implementation at index and
query time, via IndexWriterConfig.setSimilarity() and
IndexSearcher.setSimilarity(), unless you are clever or like being
confused.
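
To make the length effect concrete, here is a minimal standalone sketch of the arithmetic. It does not use Lucene at all. defaultLengthNorm() mirrors the classic 1/sqrt(numTerms) factor that DefaultSimilarity applies (ignoring field boost and the lossy one-byte norm encoding); tunableLengthNorm() and its exponent parameter are hypothetical names introduced here only to show how the penalty for extra words could be softened. In Lucene 3.x the equivalent logic would presumably go into a computeNorm() override in your own DefaultSimilarity subclass.

```java
public class LengthNormSketch {

    /** Classic length norm: 1 / sqrt(numTerms). Longer fields get a smaller factor. */
    public static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Hypothetical tunable variant: 1 / numTerms^exponent.
     * exponent = 0.5 reproduces the classic curve; exponent = 0 ignores length entirely.
     */
    public static float tunableLengthNorm(int numTerms, double exponent) {
        return (float) (1.0 / Math.pow(numTerms, exponent));
    }

    public static void main(String[] args) {
        int shortField = 1; // "Hello"
        int longField = 6;  // "Hello world, one, two, three, four", assuming six tokens

        // Print the norm factor for both fields under a few exponents.
        for (double exponent : new double[] {0.5, 0.25, 0.0}) {
            System.out.printf("exponent=%.2f short=%.4f long=%.4f%n",
                    exponent,
                    tunableLengthNorm(shortField, exponent),
                    tunableLengthNorm(longField, exponent));
        }
    }
}
```

The smaller the exponent, the less the four extra words pull the longer document's score down; at 0 the field length no longer matters, which has a similar effect to omitting norms.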

SweetSpotSimilarity might also be worth a look.
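
As I understand it, SweetSpotSimilarity (in the contrib misc module) replaces the steadily falling length norm with a plateau: every field whose length falls inside a configured range gets the same norm, and only lengths outside the range are penalised. The standalone sketch below shows that idea in plain Java. The class name, method name and parameters are illustrative assumptions, and the formula is my recollection of the approach rather than a copy of the library code, so check the SweetSpotSimilarity Javadoc before relying on it.

```java
public class PlateauLengthNormSketch {

    /**
     * Plateau-shaped length norm: 1.0 for any length in [min, max],
     * falling off as 1 / sqrt(steepness * outside + 1) beyond that range.
     */
    public static float plateauLengthNorm(int numTerms, int min, int max, float steepness) {
        // Twice the distance outside [min, max]; zero anywhere inside the range.
        int outside = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * outside + 1.0));
    }

    public static void main(String[] args) {
        int min = 1;            // hypothetical lower edge of the sweet spot
        int max = 10;           // hypothetical upper edge of the sweet spot
        float steepness = 0.5f; // hypothetical fall-off rate outside the range

        // Lengths 1 and 6 both sit on the plateau; length 20 is penalised.
        for (int numTerms : new int[] {1, 6, 20}) {
            System.out.printf("numTerms=%d norm=%.4f%n",
                    numTerms, plateauLengthNorm(numTerms, min, max, steepness));
        }
    }
}
```

With a range of 1 to 10 terms, both example documents ("Hello" and the six-token one) would receive the same norm, so the ranking between them would be decided by the other scoring factors such as how many query terms match.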

--
Ian.


On Fri, Apr 27, 2012 at 1:18 PM, Maxim Terletsky <sxamt@yahoo.com> wrote:
> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Similarity coefficient for more exact matching
> [use] IndexWriterConfig.setSimilarity() and
> IndexSearcher.setSimilarity(), unless you are clever or like being confused.
>
> SweetSpotSimilarity might also be worth a look.
>
> --
> Ian.

Being even less clever, I just make sure I set:

Similarity.setDefault(new MySimilarity())

when crawling and searching, so everything uses the same similarity strategies.

Checking the 3.4 code, IndexWriterConfig and IndexSearcher both default to Similarity.getDefault().

Any thoughts on scenarios where you'd not push a custom similarity into the default position?

-Paul


Re: Similarity coefficient for more exact matching
Similarity.setDefault(new MySimilarity()) is certainly better than the
two calls I recommended. Thanks.

I find it hard to see why one might not want to do this in normal
usage, but I have a vague recollection of someone once outlining some
obscure scenarios where different similarities at index and search
time made sense.


--
Ian.


On Fri, May 4, 2012 at 5:32 PM, Paul Hill <paul@metajure.com> wrote:
> [...]
