Hi,
I indexed the term '?e???????' (aeroplane), but it was indexed as "er l n"; some characters were dropped during indexing.
Here is my code
protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
{
    final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
    src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);

    TokenStream tok = new ClassicFilter(src);
    tok = new LowerCaseFilter(getVersion(), tok);
    tok = new StopFilter(getVersion(), tok, stopwords);
    tok = new ASCIIFoldingFilter(tok); // to enable accent-insensitive search

    return new Analyzer.TokenStreamComponents(src, tok)
    {
        @Override
        protected void setReader(final Reader reader) throws IOException
        {
            src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
            super.setReader(reader);
        }
    };
}
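For context on the ASCIIFoldingFilter step above: accent folding can be sketched with the JDK alone via Unicode normalization (this is just the general idea, not Lucene's actual ASCIIFoldingFilter implementation). NFD decomposition splits an accented letter into its base letter plus a combining mark, and stripping the combining marks leaves the base letter. Note that non-Latin base letters such as Greek 'λ' have no ASCII equivalent and survive this folding, so folding alone would not explain letters disappearing:

```java
import java.text.Normalizer;

public class FoldDemo {
    // Fold accents by NFD-decomposing and stripping combining marks
    // (Unicode category Mn). Base letters, Latin or not, are kept.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("café"));      // cafe
        System.out.println(fold("αεροπλάνο")); // αεροπλανο — Greek letters remain
    }
}
```

If the Greek letters survive a fold like this, the trimming is more likely happening earlier in the chain, at the tokenizer.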
Am I missing anything? Is this expected behavior for my input, or is there some reason for this abnormal behavior?
--
Regards,
Chitra