Skip to main content

lucene 编译安装

· 5 min read

需要编译和了解lucene代码

编译

因为lucene锁死了版本,所以要切换成jdk17,我本地是jdk18

clone代码

## clone 代码
git clone https://github.com/apache/lucene.git

### 切换目录
cd lucene

### 编译
./gradlew

## 如果是翻墙,可以使用代理,这样会快一点
## 指定域名和端口
./gradlew -DsocksProxyHost=192.168.1.102 -DsocksProxyPort=1081

启动和测试

### 打包demo
./gradlew lucene:demo:jar

### 执行demo
java -cp /home/ubuntu/lucene-9.1.0/lucene/demo/build/classes/java/main:/home/ubuntu/lucene-9.1.0/lucene/core/build/classes/java/main/ org.apache.lucene.demo.IndexFiles -

操作系统是ubuntu切换jdk17命令如下:

### 安装jdk17
sudo apt install openjdk-17-jdk
# Configure Java 切换java
sudo update-alternatives --config java

# Configure Java Compiler 切换javac
sudo update-alternatives --config javac


### 查看切换之后的命令,java 已经是17了
java --version
openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)

遇到的错误

gradle-wrapper.jar 下载不下来,跳过证书:

wget --no-check-certificate  https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar

然后放到{$luceneGitDir}/gradle/wrapper/ 下面 , 这里luceneGitDir 是你的git clone 下来的lucuene 目录

相关代码

      IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setUseCompoundFile(false); // 生成多个文件

写入header

对应的jdb调试

main[1] stop in  org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136
Deferring breakpoint org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136.
It will be set after the class is loaded.
main[1] cont
> Set deferred breakpoint org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136

Breakpoint hit: "thread=main", org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(), line=136 bci=180
136 CodecUtil.writeIndexHeader(

main[1] list
132
133 fieldsStream =
134 directory.createOutput(
135 IndexFileNames.segmentFileName(segment, segmentSuffix, FIELDS_EXTENSION), context);
136 => CodecUtil.writeIndexHeader(
137 fieldsStream, formatName, VERSION_CURRENT, si.getId(), segmentSuffix);
138 assert CodecUtil.indexHeaderLength(formatName, segmentSuffix)
139 == fieldsStream.getFilePointer();
140
141 indexWriter =
main[1] print formatName
formatName = "Lucene90StoredFieldsFastData"

对应堆栈

  [1] org.apache.lucene.store.OutputStreamIndexOutput.writeByte (OutputStreamIndexOutput.java:54)
[2] org.apache.lucene.codecs.CodecUtil.writeBEInt (CodecUtil.java:653)
[3] org.apache.lucene.codecs.CodecUtil.writeHeader (CodecUtil.java:82)
[4] org.apache.lucene.codecs.CodecUtil.writeIndexHeader (CodecUtil.java:125)
[5] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init> (Lucene90CompressingStoredFieldsWriter.java:128)
[6] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter (Lucene90CompressingStoredFieldsFormat.java:140)
[7] org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsWriter (Lucene90StoredFieldsFormat.java:154)
[8] org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter (StoredFieldsConsumer.java:49)
[9] org.apache.lucene.index.StoredFieldsConsumer.startDocument (StoredFieldsConsumer.java:56)
[10] org.apache.lucene.index.IndexingChain.startStoredFields (IndexingChain.java:556)
[11] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:587)
[12] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[13] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[14] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[15] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[16] org.apache.lucene.index.IndexWriter.addDocument (IndexWriter.java:1,469)
[17] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:271)
[18] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[19] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[20] java.nio.file.Files.walkFileTree (Files.java:2,725)
[21] java.nio.file.Files.walkFileTree (Files.java:2,797)
[22] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[23] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

倒排索引

main[1] where
[1] org.apache.lucene.index.TermsHashPerField.initStreamSlices (TermsHashPerField.java:150)
[2] org.apache.lucene.index.TermsHashPerField.add (TermsHashPerField.java:198)
[3] org.apache.lucene.index.IndexingChain$PerField.invert (IndexingChain.java:1,224)
[4] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:729)
[5] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
[6] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[7] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[8] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[9] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:277)
[11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[13] java.nio.file.Files.walkFileTree (Files.java:2,725)
[14] java.nio.file.Files.walkFileTree (Files.java:2,797)
[15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

写入内容

main[1] where
[1] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.writeField (Lucene90CompressingStoredFieldsWriter.java:276)
[2] org.apache.lucene.index.StoredFieldsConsumer.writeField (StoredFieldsConsumer.java:65)
[3] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:749)
[4] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
[5] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[6] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[7] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[8] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[9] org.apache.lucene.index.IndexWriter.addDocument (IndexWriter.java:1,469)
[10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:271)
[11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[13] java.nio.file.Files.walkFileTree (Files.java:2,725)
[14] java.nio.file.Files.walkFileTree (Files.java:2,797)
[15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

查看fdt文件

hexdump -C _0.fdt
00000000 3f d7 6c 17 1c 4c 75 63 65 6e 65 39 30 53 74 6f |?.l..Lucene90Sto|
00000010 72 65 64 46 69 65 6c 64 73 46 61 73 74 44 61 74 |redFieldsFastDat|
00000020 61 00 00 00 01 85 88 12 2b 0c 73 6b 95 30 38 76 |a.......+.sk.08v|
00000030 c9 0a 2a 52 29 00 00 0a 00 01 00 1c 02 06 03 07 |..*R)...........|
00000040 07 07 07 07 07 07 07 07 20 00 1a 60 2f 68 6f 6d |........ ..`/hom|
00000050 65 2f 60 75 62 75 6e 74 75 60 2f 64 6f 63 2f 6d |e/`ubuntu`/doc/m|
00000060 60 6f 6e 67 6f 2e 74 60 78 74 00 1a 2f 68 60 6f |`ongo.t`xt../h`o|
00000070 6d 65 2f 75 62 60 75 6e 74 75 2f 64 60 6f 63 2f |me/ub`untu/d`oc/|
00000080 68 65 6c 60 6c 6f 2e 74 78 74 c0 28 93 e8 00 00 |hel`lo.txt.(....|
00000090 00 00 00 00 00 00 c8 75 0a 41 |.......u.A|
0000009a

fdt描述

然后分析fdt格式: [1-4]代表第一个字节到第四个字节

[1-4]前四位字节是大端的magic number CODEC_MAGIC = 0x3fd76c17 [5-33] 第五个字节描述字符串长度,后面的[6-33]是具体的字符串,也就是16进制1c也就是10进制的28 , 因为字符串长度是28的字符串Lucene90StoredFieldsFastData [34-37]字符串后面是写死的版本大端的1 [38-53] 16字节用唯一id描述这个文件

缓冲池

TermsHashPerField持有三个缓冲池intPool,bytePool,termBytePool

  TermsHashPerField(
int streamCount,
IntBlockPool intPool,
ByteBlockPool bytePool,
ByteBlockPool termBytePool,
Counter bytesUsed,
TermsHashPerField nextPerField,
String fieldName,
IndexOptions indexOptions) {
this.intPool = intPool;
this.bytePool = bytePool;
this.streamCount = streamCount;
this.fieldName = fieldName;
this.nextPerField = nextPerField;
assert indexOptions != IndexOptions.NONE;
this.indexOptions = indexOptions;
PostingsBytesStartArray byteStarts = new PostingsBytesStartArray(this, bytesUsed);
bytesHash = new BytesRefHash(termBytePool, HASH_INIT_SIZE, byteStarts);
}

生成term

main[1] where
[1] org.apache.lucene.util.BytesRefHash.add (BytesRefHash.java:247)
[2] org.apache.lucene.index.TermsHashPerField.add (TermsHashPerField.java:193)
[3] org.apache.lucene.index.IndexingChain$PerField.invert (IndexingChain.java:1,224)
[4] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:729)
[5] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
[6] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[7] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[8] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[9] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:277)
[11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[13] java.nio.file.Files.walkFileTree (Files.java:2,725)
[14] java.nio.file.Files.walkFileTree (Files.java:2,797)
[15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

arch 查询

相关阅读

main[1] where
[1] org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact (SegmentTermsEnum.java:476)
[2] org.apache.lucene.index.TermStates.loadTermsEnum (TermStates.java:117)
[3] org.apache.lucene.index.TermStates.build (TermStates.java:102)
[4] org.apache.lucene.search.TermQuery.createWeight (TermQuery.java:227)
[5] org.apache.lucene.search.IndexSearcher.createWeight (IndexSearcher.java:885)
[6] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:686)
[7] org.apache.lucene.search.IndexSearcher.searchAfter (IndexSearcher.java:532)
[8] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:542)
[9] org.apache.lucene.demo.SearchFiles.doPagingSearch (SearchFiles.java:180)
[10] org.apache.lucene.demo.SearchFiles.main (SearchFiles.java:150)

相关阅读