Search Lucene by URL

I am storing a document that has a URL:

Document doc = new Document();
doc.add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("html", CompressionTools.compressString(html), Field.Store.YES));

I want to find the document by its URL, but I get 0 results:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Query query = new QueryParser(LUCENE_VERSION, "url", analyzer).parse(url);
IndexSearcher searcher = new IndexSearcher(index, true);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// Display results
for (ScoreDoc hit : hits) {
  System.out.println("FOUND A MATCH");
}
searcher.close();

What can I do differently so that I can store the HTML document and find it by its URL?

+3
2 answers

You can rewrite your query as something like this:

Query query = new TermQuery(new Term("url", url));

Suggestion:

I suggest you use BooleanQuery as it provides good performance and is internally optimized.

TermQuery tq = new TermQuery(new Term("url", url));
// BooleanClause.Occur.SHOULD: use this operator for clauses that should appear in the matching documents.
BooleanQuery bq = new BooleanQuery();
bq.add(tq, BooleanClause.Occur.SHOULD); // add() returns void, so it cannot be chained onto the constructor
IndexSearcher searcher = new IndexSearcher(index, true);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
searcher.search(bq, collector); // search with the BooleanQuery built above

I can see that you are indexing the url field as NOT_ANALYZED, which IMO is a good choice for a field you want to search by exact value. Since no analyzer is applied, the whole value is indexed as a single term.

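So, at search time, the URL has to be looked up as an EXACT term in Lucene as well (with a KeywordAnalyzer or similar) rather than being split into tokens.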
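To illustrate the single-term point, here is a minimal sketch in plain Java. It is not Lucene code: the class `SingleTermLookupSketch`, its methods, and the example URL are all made up for illustration, and `roughTokens` is only an assumed stand-in for a tokenizing analyzer (the tokens a real analyzer produces will differ). It just shows why a value indexed as one term is found by an exact lookup but not by a query that has been broken into pieces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a map from exact term to document ids. It is NOT Lucene
// code; it imitates "one term per unanalyzed field value".
class SingleTermLookupSketch {
    private final Map<String, List<Integer>> terms = new HashMap<>();

    // Like a NOT_ANALYZED field: the whole value becomes one term.
    void indexUnanalyzed(String value, int docId) {
        terms.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
    }

    // Like a TermQuery: look the term up exactly as given.
    List<Integer> exactLookup(String term) {
        return terms.getOrDefault(term, new ArrayList<>());
    }

    // Rough stand-in for a tokenizing analyzer (assumption: lower-case the
    // text and split on anything that is not a letter or digit).
    static List<String> roughTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Like an analyzed query: look up every token and collect the hits.
    List<Integer> tokenizedLookup(String text) {
        List<Integer> hits = new ArrayList<>();
        for (String token : roughTokens(text)) {
            hits.addAll(exactLookup(token));
        }
        return hits;
    }

    public static void main(String[] args) {
        SingleTermLookupSketch index = new SingleTermLookupSketch();
        String url = "http://example.com/page.html"; // hypothetical URL
        index.indexUnanalyzed(url, 1);
        System.out.println("exact:     " + index.exactLookup(url));
        System.out.println("tokenized: " + index.tokenizedLookup(url));
    }
}
```

In this toy model the exact lookup finds document 1, while the tokenized lookup finds nothing, because none of the pieces equals the full URL that was stored as the term.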

+4

The Lucene QueryParser interprets the url as query parser syntax. Use a TermQuery instead, like this:

TermQuery query = new TermQuery(new Term("url", url));
+2