
Lucene tokenization


Background

Understand how Lucene's tokenization process works.

Overview

Lucene's query parsing step has this shape:

(String query, String field) -> Query

The whole process splits the string "how old" into individual TermQuery objects, one per token.

These are finally assembled into a syntax tree:

should:[how,old]
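
To make the (String query, String field) -> Query step concrete, here is a minimal sketch in plain Java. It does not use Lucene: SketchTermQuery, SketchShouldQuery and SketchParser are hypothetical names invented for illustration, and a whitespace split plus lowercasing stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a term query: one field, one term.
final class SketchTermQuery {
    final String field;
    final String term;

    SketchTermQuery(String field, String term) {
        this.field = field;
        this.term = term;
    }
}

// Hypothetical stand-in for a boolean query whose clauses are all SHOULD.
final class SketchShouldQuery {
    final List<SketchTermQuery> clauses;

    SketchShouldQuery(List<SketchTermQuery> clauses) {
        this.clauses = clauses;
    }

    @Override
    public String toString() {
        List<String> terms = new ArrayList<>();
        for (SketchTermQuery q : clauses) {
            terms.add(q.term);
        }
        return "should:[" + String.join(",", terms) + "]";
    }
}

final class SketchParser {
    // (String query, String field) -> query tree.
    // Assumption: splitting on whitespace and lowercasing replaces the analyzer.
    static SketchShouldQuery parse(String query, String field) {
        List<SketchTermQuery> clauses = new ArrayList<>();
        for (String token : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                clauses.add(new SketchTermQuery(field, token));
            }
        }
        return new SketchShouldQuery(clauses);
    }
}
```

Calling SketchParser.parse("how old", "contents") yields a tree with two term clauses; the field name "contents" is only an example value.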


Background

Tokenization is a basic topic in Lucene. It is built mainly on the abstract method incrementToken and on extending the AttributeSource class:

public abstract class TokenStream extends AttributeSource implements Closeable {
    // Advances the stream to the next token; returns false at end of stream.
    public abstract boolean incrementToken() throws IOException;
}
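
The pull-style contract behind incrementToken can be sketched without Lucene. The class below is a simplified, hypothetical stand-in (SketchWhitespaceStream is not a Lucene class): each call advances to the next token and overwrites a single reusable field, which plays the role that attributes play on an AttributeSource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a token stream; not Lucene's TokenStream.
final class SketchWhitespaceStream {
    private final String input;
    private int pos = 0;
    // Plays the role of a term attribute: overwritten in place on every advance.
    private String term = "";

    SketchWhitespaceStream(String input) {
        this.input = input;
    }

    // Advances to the next token; returns false once the input is exhausted.
    boolean incrementToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return false;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        term = input.substring(start, pos);
        return true;
    }

    // Current token text; only meaningful after incrementToken() returned true.
    String term() {
        return term;
    }

    // Typical consumer loop: keep pulling until incrementToken() returns false.
    static List<String> collect(String text) {
        SketchWhitespaceStream stream = new SketchWhitespaceStream(text);
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }
}
```

The consumer never receives a new token object; it reads the current state after each successful advance, which is the design the incrementToken signature suggests.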

Lucene boolean clauses


Lucene has four kinds of boolean clauses:

  • MUST
  • FILTER
  • SHOULD
  • MUST_NOT
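
The difference between the four clause types is easiest to see in a toy matcher. The sketch below is plain Java with hypothetical names (SketchOccur, SketchClause, SketchBooleanMatcher), not Lucene code. It assumes the usual semantics: MUST and FILTER are required, only MUST and SHOULD contribute to the score, MUST_NOT excludes the document, and when no required clause exists at least one SHOULD clause has to match. The "score" is just a count of matching scoring clauses.

```java
import java.util.List;
import java.util.Set;

enum SketchOccur { MUST, FILTER, SHOULD, MUST_NOT }

// Hypothetical clause: an occurrence type plus the term it tests.
final class SketchClause {
    final SketchOccur occur;
    final String term;

    SketchClause(SketchOccur occur, String term) {
        this.occur = occur;
        this.term = term;
    }
}

final class SketchBooleanMatcher {
    // Returns -1 if the document does not match, otherwise the number of
    // MUST and SHOULD clauses that matched (a stand-in for a real score).
    static int score(List<SketchClause> clauses, Set<String> docTerms) {
        boolean hasRequired = false;
        boolean anyShouldMatched = false;
        int score = 0;
        for (SketchClause c : clauses) {
            boolean hit = docTerms.contains(c.term);
            switch (c.occur) {
                case MUST:
                    hasRequired = true;
                    if (!hit) return -1;
                    score++;
                    break;
                case FILTER:
                    hasRequired = true;
                    if (!hit) return -1;
                    break; // required, but adds nothing to the score
                case SHOULD:
                    if (hit) {
                        anyShouldMatched = true;
                        score++;
                    }
                    break;
                case MUST_NOT:
                    if (hit) return -1;
                    break;
            }
        }
        // Assumption: with no MUST or FILTER clause, one SHOULD must match.
        if (!hasRequired && !anyShouldMatched) return -1;
        return score;
    }
}
```

For the two SHOULD clauses how and old, a document containing only "how" still matches but ranks below one containing both terms.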

Stack trace

<init>:202, TermQuery (org.apache.lucene.search)
newTermQuery:640, QueryBuilder (org.apache.lucene.util)
add:408, QueryBuilder (org.apache.lucene.util)
analyzeMultiBoolean:427, QueryBuilder (org.apache.lucene.util)
createFieldQuery:364, QueryBuilder (org.apache.lucene.util)
createFieldQuery:257, QueryBuilder (org.apache.lucene.util)
newFieldQuery:468, QueryParserBase (org.apache.lucene.queryparser.classic)
getFieldQuery:457, QueryParserBase (org.apache.lucene.queryparser.classic)
MultiTerm:680, QueryParser (org.apache.lucene.queryparser.classic)
Query:233, QueryParser (org.apache.lucene.queryparser.classic)
TopLevelQuery:223, QueryParser (org.apache.lucene.queryparser.classic)
parse:136, QueryParserBase (org.apache.lucene.queryparser.classic)
testParse:20, ParseTest (com.dinosaur.lucene.demo)

Ranking and scoring

BlockMaxMaxscoreScorer's matches method scores each of the query's terms and then sums those scores:

score:250, BM25Similarity$BM25Scorer (org.apache.lucene.search.similarities)
score:60, LeafSimScorer (org.apache.lucene.search)
score:75, TermScorer (org.apache.lucene.search)
matches:240, BlockMaxMaxscoreScorer$2 (org.apache.lucene.search)
doNext:85, TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator (org.apache.lucene.search)
advance:78, TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator (org.apache.lucene.search)
score:232, BooleanWeight$2 (org.apache.lucene.search)
score:38, BulkScorer (org.apache.lucene.search)
search:776, IndexSearcher (org.apache.lucene.search)
search:694, IndexSearcher (org.apache.lucene.search)
search:688, IndexSearcher (org.apache.lucene.search)
searchAfter:523, IndexSearcher (org.apache.lucene.search)
search:538, IndexSearcher (org.apache.lucene.search)
doPagingSearch:161, SearchFiles (com.dinosaur.lucene.skiptest)
testSearch:131, SearchFiles (com.dinosaur.lucene.skiptest)
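
The "score each term, then sum" idea in the stack above can be sketched as follows. This is plain Java with a hypothetical class name (SketchBm25), and the formula is a common textbook form of BM25 with k1 = 1.2 and b = 0.75; it is an assumption for illustration and may differ in detail from what BM25Similarity actually computes.

```java
// Hypothetical sketch of per-term BM25 scoring and summation; not Lucene code.
final class SketchBm25 {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // document-length normalization strength

    // Inverse document frequency: rarer terms get a larger weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of one term in one document; freq == 0 yields 0.
    static double termScore(double freq, double docLen, double avgDocLen,
                            long docCount, long docFreq) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * freq / (freq + norm);
    }

    // Document score for a multi-term query: the sum of the per-term scores.
    static double sum(double[] freqs, long[] docFreqs, double docLen,
                      double avgDocLen, long docCount) {
        double total = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            total += termScore(freqs[i], docLen, avgDocLen, docCount, docFreqs[i]);
        }
        return total;
    }
}
```

For a query such as "how old", freqs would hold the frequency of each term in the document and docFreqs the number of documents containing each term; a term missing from the document simply contributes zero.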
