Mailing List Archive

Find documents contained in search term
Hi,

I have a situation in which I have many short documents (30-400 chars).
My goal: given a phrase, find an indexed document that is a prefix of the
phrase.
Is there a way to achieve this in Lucene with a single query?

Thanks,
David.



--
View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-contained-in-search-term-tp4001663.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Find documents contained in search term [ In reply to ]
Can't see how you could do it with standard queries, but you could
reverse the process and use a MemoryIndex.

Add the single target phrase to the memory index then loop round all
docs executing a search for each one. Maybe use PrefixQuery although
I'd worry about performance. Try it and see.

But if you're just doing string comparison

for (String docText : docTexts) {
    if (target.startsWith(docText)) {
        // match: this document is a prefix of the target phrase
    }
}

might be easier.
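
For reference, a self-contained Java version of the loop above might look like the following. This is only a minimal sketch: the class name PrefixScan, the method findPrefixDocs, and the in-memory list of document texts are assumptions made for illustration, and nothing here touches Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a linear scan over document texts held in memory.
final class PrefixScan {
    // Returns every document text that the target phrase starts with.
    static List<String> findPrefixDocs(String target, List<String> docTexts) {
        List<String> matches = new ArrayList<>();
        for (String docText : docTexts) {
            if (target.startsWith(docText)) {
                matches.add(docText); // docText is a prefix of target
            }
        }
        return matches;
    }
}
```

Calling PrefixScan.findPrefixDocs(phrase, docTexts) visits every document once, so the cost grows with the number of documents.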


--
Ian.

On Thu, Aug 16, 2012 at 6:38 PM, davidbrai <davidbrai@gmail.com> wrote:
> Hi,
>
> I have a situation in which I have many short documents (30-400 chars).
> My goal: given a phrase, find an indexed document that is a prefix of the
> phrase.
> Is there a way to achieve this in Lucene with a single query?
>
> Thanks,
> David.
Re: Find documents contained in search term [ In reply to ]
I was hoping I wouldn't have to iterate through the short documents.
I currently have about 1M of them, and this process needs to be very fast.
So I understand there is no such functionality available in Lucene.



--
View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-contained-in-search-term-tp4001663p4001867.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Find documents contained in search term [ In reply to ]
Hi

You need to use a prefix query for your requirement. Below are my thoughts;
I hope they help.

Say "Hello World" is your phrase.

1. Do a phrase query with your phrase ("Hello World").
2. If nothing is found, strip the last character and do a prefix query
("Hello Worl").
3. Repeat step 2 until you get a result or the phrase is empty.
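
The shrinking loop in the steps above can be sketched without any Lucene calls. In this sketch an in-memory HashSet of document texts stands in for the index, and each step is an exact-match lookup of the shortened phrase rather than a PrefixQuery. That substitution is an assumption, made because an exact hit on a shortened phrase means the matching document is a prefix of the original phrase. The names ShrinkingLookup and longestPrefixDoc are hypothetical.

```java
import java.util.Set;

// Hypothetical helper: shrink the phrase one character at a time and
// look each shortened form up as a whole document text.
final class ShrinkingLookup {
    // Returns the longest document text that is a prefix of phrase,
    // or null if no stored document is a prefix of it.
    static String longestPrefixDoc(String phrase, Set<String> docTexts) {
        for (int end = phrase.length(); end > 0; end--) {
            String candidate = phrase.substring(0, end);
            if (docTexts.contains(candidate)) {
                return candidate; // exact hit: this document is a prefix of phrase
            }
        }
        return null;
    }
}
```

For a phrase of n characters this performs at most n set lookups, regardless of how many documents are stored.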

If you give some sample documents from the index and a sample search
phrase, it will help others give a better response.

Regards
Aditya


On Fri, Aug 17, 2012 at 9:25 PM, davidbrai <davidbrai@gmail.com> wrote:

> I was hoping I wouldn't have to iterate through the short documents.
> I currently have about 1M of them, and this process needs to be very fast.
> So I understand there is no such functionality available in Lucene.