Mailing List Archive

Can I just add ShingleFilter to my analyzer used for indexing and searching
I'm trying out ShingleFilter, and the way it is documented implies that you
can just add it to your analyzer and that's it, with no side effects
except a larger index. But I have read others implying that you have to
modify the way you parse user queries. Could anyone confirm or deny?

Also, is there an easy way to use a ShingleFilter only for common stop
words, or is that pointless?

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Can I just add ShingleFilter to my analyzer used for indexing and searching
Hi Paul,

Lucene QueryParser splits on whitespace and then sends individual words one-by-one to be analyzed. All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this. (There is a JIRA issue open for the QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>).
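
To make the effect concrete, here is a minimal plain-Java simulation. It uses no Lucene classes; `ShingleDemo`, `bigrams`, and `bigramsPerWord` are hypothetical names standing in for what a two-word ShingleFilter would produce, and unigrams are left out for brevity. It contrasts analyzing the whole text in one pass, as happens at index time, with analyzing each whitespace-separated word on its own, as described above for QueryParser:

```java
import java.util.ArrayList;
import java.util.List;

final class ShingleDemo {
    /** Two-word shingles of a whitespace-delimited text (unigrams omitted). */
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    /** Mimics the query side: split on whitespace first, then analyze each word alone. */
    static List<String> bigramsPerWord(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            out.addAll(bigrams(word)); // a lone word can never form a shingle
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("please divide this"));        // whole-text analysis
        System.out.println(bigramsPerWord("please divide this")); // word-by-word analysis
    }
}
```

The indexed side ends up with shingles that the query side never generates, so shingle terms in the index go unmatched unless the query reaches the analyzer as one piece.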

There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>. Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending them to QueryParser.
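
The query-side half of that workaround could look roughly like the sketch below. `PhraseWrap` and `asPhrase` are hypothetical names, not Lucene API. The helper backslash-escapes any backslashes and double quotes in the user's input and then wraps the result in quotes, on the assumption that a quoted string is passed to the analyzer as a single unit:

```java
final class PhraseWrap {
    /** Wraps raw user input in double quotes, escaping characters that would end the phrase early. */
    static String asPhrase(String userInput) {
        String escaped = userInput
                .replace("\\", "\\\\")   // escape backslashes first
                .replace("\"", "\\\"");  // then escape embedded double quotes
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asPhrase("please divide this"));
    }
}
```

Note that this treats the entire input as one phrase, so it only suits applications that do not expose the rest of the query syntax to users.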

CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr 3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application by including the solr-core jar as a dependency. In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the analyzers-common module.
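
As a rough illustration of the idea only, here is a plain-Java sketch. It is not the CommonGramsFilter implementation; `CommonGramsDemo`, `commonGrams`, and the underscore separator are assumptions made for this example. It keeps every single word and adds a two-word gram only when at least one of the two adjacent words is in the common-word set:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommonGramsDemo {
    /** Emits each word, plus a bigram wherever either neighbour is a common word. */
    static List<String> commonGrams(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i));
            boolean hasNext = i + 1 < words.size();
            if (hasNext && (common.contains(words.get(i)) || common.contains(words.get(i + 1)))) {
                out.add(words.get(i) + "_" + words.get(i + 1)); // separator is an assumption
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the"));
        System.out.println(commonGrams(Arrays.asList("the", "quick", "brown", "fox"), common));
        // prints [the, the_quick, quick, brown, fox]
    }
}
```

Compared with shingling every adjacent pair, this adds far fewer extra terms to the index, since pairs of two uncommon words produce no gram.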

Steve

> -----Original Message-----
> From: Paul Taylor [mailto:paul_t100@fastmail.fm]
> Sent: Tuesday, February 21, 2012 8:07 AM
> To: java-user@lucene.apache.org
> Subject: Can I just add ShingleFilter to my analyzer used for indexing and
> searching
>
> I'm trying out ShingleFilter, and the way it is documented implies that you
> can just add it to your analyzer and that's it, with no side effects
> except a larger index. But I have read others implying that you have to
> modify the way you parse user queries. Could anyone confirm or deny?
>
> Also, is there an easy way to use a ShingleFilter only for common stop
> words, or is that pointless?
>
> Paul
Re: Can I just add ShingleFilter to my analyzer used for indexing and searching
On 21/02/2012 14:37, Steven A Rowe wrote:
> Hi Paul,
>
> Lucene QueryParser splits on whitespace and then sends individual words one-by-one to be analyzed. All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this. (There is a JIRA issue open for the QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>).
>
> There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>. Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending them to QueryParser.
>
> CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr 3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application by including the solr-core jar as a dependency. In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the analyzers-common module.
>
> Steve
>
>
Thanks Steve. As our user interface allows access to the full Lucene
query syntax, I'll hold off on this for now.

Paul
