java - Separate numbers from letters in Lucene
In many of the documents I'm indexing with Lucene, people accidentally concatenate words and numbers. For instance, one might say "I was born in2000" instead of "I was born in 2000".
Is there a Lucene tokenizer that can separate words from numbers, i.e. split a token such as "in2000and" into several words ("in 2000 and")?
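You can use WordDelimiterFilterFactory and add the splitOnNumerics="1" parameter to the filter's entry in your schema. Note that it is a token filter applied after the tokenizer, not a tokenizer itself.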
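To make the effect of splitting on numerics concrete, here is a minimal plain-Java sketch of the letter/digit boundary split. It is only an illustration of the idea, not Lucene's implementation: the class name LetterDigitSplitter and its split method are hypothetical, it uses only java.util.regex, and it ignores everything else the real filter does (case changes, punctuation, position handling).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: imitates only the
// letter/digit split that splitOnNumerics is meant to provide.
public class LetterDigitSplitter {
    // Zero-width boundary between a letter and a decimal digit, in either order.
    private static final Pattern BOUNDARY =
        Pattern.compile("(?<=\\p{L})(?=\\p{Nd})|(?<=\\p{Nd})(?=\\p{L})");

    /** Splits text on whitespace, then splits each word at letter/digit transitions. */
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // input was empty or whitespace only
            }
            for (String part : BOUNDARY.split(word)) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [I, was, born, in, 2000, and, moved]
        System.out.println(split("I was born in2000and moved"));
    }
}
```

In a real index the splitting should be left to the filter in the analysis chain, so that the same rule is applied at both index time and query time.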