java - Lucene WildcardQuery does not work properly
I am trying to use WildcardQuery:
    IndexSearcher indexSearcher = new IndexSearcher(ireader);
    Term term = new Term("phrase", QueryParser.escape(partOfPhrase) + "*");
    WildcardQuery wildcardQuery = new WildcardQuery(term);
    log.debug(partOfPhrase);
    Sort sort = new Sort(new SortField("freq", SortField.Type.LONG, true));
    ScoreDoc[] hits = indexSearcher.search(wildcardQuery, null, 10, sort).scoreDocs;
But when I enter "san " (without the quotes), I want results like "san diego", "san antonio", etc. Instead I am getting not only these results but also "sandals" (there must be a space after "san") or "juelz santana" (I only want to find phrases that start with "san"). How can I fix this issue?
Edit: also, if I enter "san d", I get no results.
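One possible way to solve the problem is to use an analyzer that does not split the query and the document text on spaces. A WildcardQuery term is not analyzed and is matched against individual indexed terms, so if the analyzer splits "san diego" into "san" and "diego" at index time, no indexed term contains a space and a pattern such as "san d*" cannot match.
One such analyzer is KeywordAnalyzer, which treats the whole field value as a single token.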
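To see why this helps, here is a rough plain-Java sketch that does not use Lucene at all. It only imitates the idea that a prefix pattern is compared with each indexed term rather than with the original text. The class name TermMatchSketch and its helper methods are made up for illustration, and whitespace splitting merely stands in for whatever tokenizing analyzer an index might have been built with.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: a simplified model of term-level prefix matching.
// This is NOT Lucene code; all names here are hypothetical.
class TermMatchSketch {

    // Stand-in for a tokenizing analyzer: one term per whitespace-separated word.
    static List<String> whitespaceTerms(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Stand-in for KeywordAnalyzer: the whole field value becomes a single term.
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // Models a "prefix*" wildcard: true if any single term starts with the prefix.
    static boolean matchesPrefix(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tokenized: the terms are "san" and "diego", neither starts with "san d".
        System.out.println(matchesPrefix(whitespaceTerms("san diego"), "san d")); // false
        // Single keyword term "san diego" starts with "san d".
        System.out.println(matchesPrefix(keywordTerms("san diego"), "san d"));    // true
        // "sandals" has no space after "san", so "san " does not match it.
        System.out.println(matchesPrefix(keywordTerms("sandals"), "san "));       // false
    }
}
```

With the whole value kept as one term, the trailing space in "san " becomes significant, which is what excludes "sandals" and "juelz santana".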
The essential part of the test:
    Directory dir = new RAMDirectory();
    Analyzer analyzer = new KeywordAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(dir, iwc);
Later on, add the needed documents:
    Document doc = new Document();
    doc.add(new TextField("text", "san diego", Field.Store.YES));
    writer.addDocument(doc);
And finally, search for what you want:
    // Assumes the writer has been committed or closed, so the reader sees the documents.
    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    Term term = new Term("text", QueryParser.escape("san ") + "*");
    WildcardQuery wildcardQuery = new WildcardQuery(term);
My test works properly: it retrieves "san diego" and "san antonio" but not "sandals". Take a look at the full test here - https://github.com/mysterionrise/information-retrieval-adventure/blob/master/src/main/java/org/mystic/lucene/wildcardquerywithspace.java
For more information about the analyzer - http://lucene.apache.org/core/4_10_2/analyzers-common/org/apache/lucene/analysis/core/keywordanalyzer.html