3.2 Chinese Word Segmenter - PaodingAnalyzer
2016-02-17 21:50:50
Add the dependencies to pom.xml:
<dependency>
    <groupId>com.thihy</groupId>
    <artifactId>elasticsearch-analysis-paoding</artifactId>
    <version>1.4.2.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.5.2</version>
</dependency>
Download the Paoding analyzer from the web.
Create the file paoding-analysis.properties under src/main/resources/paoding with the following content:
paoding.analyzer.mode=most-words
paoding.analyzer.dictionaries.compiler=net.paoding.analysis.analyzer.impl.MostWordsModeDictionariesCompiler
paoding.dic.home=classpath:paoding/dic
paoding.dic.detector.interval=60
paoding.knife.class.letterKnife=net.paoding.analysis.knife.LetterKnife
paoding.knife.class.numberKnife=net.paoding.analysis.knife.NumberKnife
paoding.knife.class.cjkKnife=net.paoding.analysis.knife.CJKKnife
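Paoding also supports a coarser-grained segmentation mode besides most-words. As a hedged sketch (the exact property value and matching compiler class are assumptions based on Paoding's conventions, not confirmed by this post), switching modes would look like:

```properties
# Assumed alternative: longest-match segmentation instead of most-words.
# The dictionaries.compiler setting may also need to change to match the mode.
paoding.analyzer.mode=max-word-length
```

In most-words mode (used in this tutorial) the analyzer emits every dictionary word it finds, which favors search recall.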
Copy the dic folder to src/main/resources/paoding.
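After these steps, the resource layout should look like the following (a sketch assembled from the paths used above; only the top of the dic folder is shown):

```text
src/main/resources/
└── paoding/
    ├── paoding-analysis.properties   (created above)
    └── dic/                          (dictionary folder from the Paoding download)
```

Both paths must stay in sync with paoding.dic.home=classpath:paoding/dic and the properties path passed to the PaodingAnalyzer constructor below.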
Testing
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.junit.Test;

import net.paoding.analysis.analyzer.PaodingAnalyzer;

@Test
public void test() throws IOException {
    Analyzer analyzer = new PaodingAnalyzer("classpath:paoding/paoding-analysis.properties");
    String text = "我爱北京天安门";
    TokenStream tokenStream = analyzer.tokenStream("", text);
    // The attribute instance is reused for every token, so fetch it once up front.
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        System.out.println(charTermAttribute);
    }
    tokenStream.end();
    tokenStream.close();
}
Run the unit test; the console prints:

我爱 北京 天安 天安门

Note the overlapping tokens 天安 and 天安门: in most-words mode, Paoding emits every dictionary word it recognizes rather than a single longest match.