Mailing List Archive

Getting DF & IDF
Hi, I am new to Lucene. I have a query string with multiple terms.
i) I want to get back the query string with stop words removed and the remaining terms stemmed.
ii) I also want to get the tf and idf of each term in the query. How can I get them?

Asif


_________________________________________________________________
Hotmail: Trusted email with powerful SPAM protection.
https://signup.live.com/signup.aspx?id=60969
Re: Getting DF & IDF
In my opinion, you can do everything with BooleanQuery.


On Mon, Feb 1, 2010 at 10:44 PM, Asif Nawaz <asifnawaz@hotmail.com> wrote:

>
> Hi, I am new to use lucene, I have a query string of multiple terms. i) i
> want to return query string by removing stop words and stemmed version of
> the query.
> ii) second i want to get tf and idf of each term in a query, how to get it?
> Asif
RE: Getting DF & IDF
In the HotelDatabase project of Lucene, the following code is written in the performSearch method of the SearchEngine class.

Let queryString = "Located in the heart of paris"

Analyzer analyzer = new StandardAnalyzer();
IndexSearcher is = new IndexSearcher("index");
QueryParser parser = new QueryParser("content", analyzer);
Query query = parser.parse(queryString);
Hits hits = is.search(query);

To be specific, this is what I want:
i) Remove stop words from the query string and apply stemming, so the new query string becomes "Locate heart paris".
ii) How do I get the term frequency (tf) of each word in the query?
iii) How do I get the document frequency (df) of each word in the query?
iv) How do I get the inverse document frequency (idf) of each word in the query?


Can you please let me know a solution that answers all four of my questions, or refer me to some sample code? I have tried BooleanQuery but was unable to do this.
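
A minimal sketch of points i) and ii) in plain Java, with no Lucene dependency. In Lucene itself this work is normally done by the Analyzer's token filters (for example StopFilter and PorterStemFilter). Everything here is an assumption made for illustration: the class name QueryTerms, the short stop list, and the toy stemmer, which only strips a trailing "ed" or "ing" and is not the Porter algorithm. The tf computed is the count of each term within the query string itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class QueryTerms {
    // Tiny illustrative stop list (assumption); real analyzers ship longer ones.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    // Toy stemmer (assumption): strips a trailing "ing" or "ed" only.
    // It is NOT the Porter algorithm.
    static String toyStem(String word) {
        if (word.length() > 4 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("ed")) {
            return word.substring(0, word.length() - 2);
        }
        return word;
    }

    // Lowercase, split on anything that is not a letter or digit,
    // drop stop words, and stem what remains.
    static List<String> analyze(String queryString) {
        List<String> terms = new ArrayList<>();
        String lower = queryString.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(toyStem(token));
            }
        }
        return terms;
    }

    // Term frequency within the query: occurrences of each analyzed term.
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String term : terms) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> terms = analyze("Located in the heart of paris");
        System.out.println(String.join(" ", terms));
        System.out.println(termFrequencies(terms));
    }
}
```

Note that a real stemmer produces stems rather than dictionary words, so the result for "Located" will generally not be "Locate".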




> From: thienthanhomenh@gmail.com
> Date: Wed, 3 Feb 2010 04:59:49 +0900
> Subject: Re: Getting DF & IDF
> To: java-user@lucene.apache.org
>
> with my idea,
> using BooleanQuery, you can make every thing.
Re: Getting DF & IDF
Hi,
I am not sure if you are still searching for an answer to your question. If
so, then please read on...

You can get the DF and IDF for each of the terms in the query as shown below.

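import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

// indexDir, searchField, queryWords and logger come from the surrounding code.
// Open the index read-only.
IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexDir)), true);

// Wrap the reader in a FilterIndexReader. This is optional: numDocs() and
// docFreq() can also be called on the reader returned by open().
FilterIndexReader filterIndexReader = new FilterIndexReader(reader);

// Number of documents in the index
int numDocs = filterIndexReader.numDocs();

// Iterate over each of the query words
for (String queryWord : queryWords) {
    Term term = new Term(searchField, queryWord.toLowerCase());

    // Number of documents that contain the term
    int docFreq = 0;
    try {
        docFreq = filterIndexReader.docFreq(term);
    } catch (IOException e) {
        logger.log(Level.SEVERE, null, e);
    }

    // Calculate IDF as log10(numDocs / docFreq); stays 0 if no document has the term
    double idf = 0.0;
    if (docFreq > 0) {
        idf = Math.log10((double) numDocs / docFreq);
    }

    System.out.println(queryWord + "\tDF -" + docFreq + "\tIDF -" + idf);
}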
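
To try the same arithmetic without an index, here is a small self-contained sketch that computes DF and IDF over an in-memory list of documents. It is only an illustration under stated assumptions: the class name DfIdfSketch, the whitespace tokenizer, and the four sample documents are all made up, and the IDF formula is the same log10(numDocs / docFreq) used in the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class DfIdfSketch {
    // Lowercase and split on whitespace; a stand-in for a real analyzer.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Document frequency: number of documents whose token set contains the term.
    static int docFreq(List<Set<String>> docs, String term) {
        int df = 0;
        for (Set<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    // IDF as log10(numDocs / docFreq); 0 when no document contains the term.
    static double idf(int numDocs, int docFreq) {
        return docFreq > 0 ? Math.log10((double) numDocs / docFreq) : 0.0;
    }

    public static void main(String[] args) {
        // Hypothetical sample corpus.
        List<Set<String>> docs = Arrays.asList(
                tokens("hotel in the heart of paris"),
                tokens("paris city guide"),
                tokens("quiet hotel near the beach"),
                tokens("mountain cabin"));
        for (String word : new String[] {"paris", "hotel", "london"}) {
            int df = docFreq(docs, word);
            System.out.println(word + "\tDF -" + df + "\tIDF -" + idf(docs.size(), df));
        }
    }
}
```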

--
View this message in context: http://lucene.472066.n3.nabble.com/Getting-DF-IDF-tp547386p844962.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Getting DF & IDF
Sethu_424 wrote
> int numDocs = filterIndexReader.numDocs();
> ...
> idf = Math.log10((double) numDocs / docFreq);

Wrong formula. numDocs should not be the count of documents in the index, but the count of
documents containing the search term.
We need something like IndexReader.docFreq(term);
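
For reference, the usual textbook definition of inverse document frequency puts the total number of documents in the numerator and the document frequency of the term in the denominator. In the quoted snippet, numDocs plays the role of N and docFreq plays the role of df(t). The second formula is the smoothed variant that Lucene's DefaultSimilarity applies, written from memory of the 3.x source, so treat it as an assumption and check it against your version.

```latex
\mathrm{idf}(t) = \log_{10}\frac{N}{\mathrm{df}(t)}
\qquad\text{and}\qquad
\mathrm{idf}_{\text{Lucene}}(t) = 1 + \ln\frac{N}{\mathrm{df}(t) + 1}
```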

--
View this message in context: http://lucene.472066.n3.nabble.com/Getting-DF-IDF-tp547386p3984938.html