separately.

When I calculate the document similarity, I want to give higher weights to
those Taxonomy terms and Ontology terms.

When I index the document, I have defined the Document content, Taxonomy
and Ontology terms as Fields for each document like this in my program.

Field ontologyTerm = new Field("fiboterms", fiboTermList[curDocNo],
        Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);

Field taxonomyTerm = new Field("taxoterms", taxoTermList[curDocNo],
        Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);

Field document = new Field(docNames[curDocNo], strRdElt,
        Field.TermVector.YES);

I'm using the Lucene index's TermFreqVector functions to calculate TFIDF
values, and then I calculate the cosine similarity between two documents
using those TFIDF values.
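
A minimal sketch of this calculation, assuming the term frequencies of a document and the document frequencies of the collection have already been read from the index into plain maps. The class and method names are hypothetical and not part of Lucene, and the idf formula used here, ln(numDocs / docFreq), is only one common variant.

```java
import java.util.HashMap;
import java.util.Map;

final class TfIdfCosine {

    // Builds a sparse TF-IDF vector: weight(term) = tf * ln(numDocs / docFreq).
    static Map<String, Double> tfIdf(Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null || df == 0) {
                continue; // term has no document frequency, skip it
            }
            double idf = Math.log((double) numDocs / df);
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    // Cosine similarity between two sparse vectors; 0.0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The vectors are kept as maps because most terms do not occur in a given document, so only the shared terms contribute to the dot product.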

To give weights to Ontology and Taxonomy terms when calculating the cosine
similarity, what I can do is programmatically multiply the Taxonomy and
Ontology term frequencies by a defined weight factor before calculating the
TFIDF scores. Will this give higher weight to Taxonomy and Ontology terms in
the document similarity calculation?
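
The weighting step described above could be sketched as follows, assuming the term frequencies of each field are already available as plain maps. The class name, method name and the weight values in the usage note are hypothetical illustrations, not Lucene API and not tuned values.

```java
import java.util.HashMap;
import java.util.Map;

final class WeightedTermFreqs {

    // Merges the per-field term frequencies of one document into a single
    // vector, multiplying taxonomy and ontology frequencies by their weights.
    // Content terms keep their raw frequency (weight 1.0).
    static Map<String, Double> combine(Map<String, Integer> contentFreqs,
                                       Map<String, Integer> taxonomyFreqs,
                                       Map<String, Integer> ontologyFreqs,
                                       double taxonomyWeight,
                                       double ontologyWeight) {
        Map<String, Double> combined = new HashMap<>();
        addField(combined, contentFreqs, 1.0);
        addField(combined, taxonomyFreqs, taxonomyWeight);
        addField(combined, ontologyFreqs, ontologyWeight);
        return combined;
    }

    // Adds weight * tf for every term of one field into the combined vector.
    private static void addField(Map<String, Double> combined,
                                 Map<String, Integer> fieldFreqs,
                                 double weight) {
        for (Map.Entry<String, Integer> e : fieldFreqs.entrySet()) {
            combined.merge(e.getKey(), weight * e.getValue(), Double::sum);
        }
    }
}
```

For example, with a taxonomy weight of 2.0 and an ontology weight of 3.0, a term that occurs once in the content and once in the taxonomy field ends up with a combined frequency of 3.0. The combined frequencies would then be used in place of the raw ones when calculating the TFIDF scores.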

Are there Lucene functions that can be used to give higher weights to
certain fields when calculating TFIDF values using TermFreqVector? Can I
just use the setBoost() function for this purpose, and if so, how?

--

Regards

Kasun Perera