3.2 Chinese Analyzer - PaodingAnalyzer


Add the following dependencies to pom.xml:

		<dependency>
			<groupId>com.thihy</groupId>
			<artifactId>elasticsearch-analysis-paoding</artifactId>
			<version>1.4.2.1</version>
		</dependency>
		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch</artifactId>
			<version>1.5.2</version>
		</dependency>

Download the paoding analyzer from the web; the Maven dependency provides the code, but the dic dictionary folder from the download is still needed in the steps below.

Create the file paoding-analysis.properties under src/main/resources/paoding with the following content:

paoding.analyzer.mode=most-words
paoding.analyzer.dictionaries.compiler=net.paoding.analysis.analyzer.impl.MostWordsModeDictionariesCompiler
paoding.dic.home=classpath:paoding/dic
paoding.dic.detector.interval=60
paoding.knife.class.letterKnife=net.paoding.analysis.knife.LetterKnife
paoding.knife.class.numberKnife=net.paoding.analysis.knife.NumberKnife
paoding.knife.class.cjkKnife=net.paoding.analysis.knife.CJKKnife

Copy the dic dictionary folder into src/main/resources/paoding, so that its location matches the paoding.dic.home=classpath:paoding/dic setting above.
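With both files in place, it can be worth a quick check that they are actually visible on the runtime classpath before involving paoding at all. Below is a minimal sketch using only JDK resource lookups; the class name PaodingClasspathCheck is just an illustrative placeholder:

import java.net.URL;

public class PaodingClasspathCheck {
	public static void main(String[] args) {
		ClassLoader cl = Thread.currentThread().getContextClassLoader();
		// Both lookups should print non-null URLs if the files were copied to src/main/resources/paoding.
		URL props = cl.getResource("paoding/paoding-analysis.properties");
		URL dic = cl.getResource("paoding/dic");
		System.out.println("paoding-analysis.properties: " + props);
		System.out.println("dic: " + dic);
	}
}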

Test:

import java.io.IOException;
import net.paoding.analysis.analyzer.PaodingAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.junit.Test;

	@Test
	public void test() throws IOException {
		// Load paoding with the properties file created above.
		Analyzer analyzer = new PaodingAnalyzer("classpath:paoding/paoding-analysis.properties");
		String text = "我爱北京天安门";
		TokenStream tokenStream = analyzer.tokenStream("", text);
		// The attribute instance is reused across tokens, so it only needs to be fetched once.
		CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
		tokenStream.reset();
		while (tokenStream.incrementToken()) {
			System.out.println(charTermAttribute);
		}
		tokenStream.end();
		tokenStream.close();
	}

Run the unit test; the console output is:

我爱
北京
天安
天安门
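Since the elasticsearch-analysis-paoding dependency is also on the classpath, the analyzer can be exercised through Elasticsearch itself rather than through Lucene directly. The sketch below is only an illustration under assumptions: it assumes the plugin registers an analyzer type named "paoding" (check your plugin version for the exact name) and that ES 1.x picks up the classpath plugin; the index name paoding_test and analyzer name my_paoding are arbitrary. It starts an embedded local node, creates an index that defines the analyzer, and runs the Analyze API against it.

import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

	@Test
	public void testWithElasticsearch() {
		// Start an embedded local node; ES 1.x loads analysis plugins found on the classpath.
		Node node = NodeBuilder.nodeBuilder().local(true).node();
		Client client = node.client();
		// Assumption: the plugin registers an analyzer type named "paoding".
		client.admin().indices().prepareCreate("paoding_test")
				.setSettings(ImmutableSettings.settingsBuilder()
						.put("index.analysis.analyzer.my_paoding.type", "paoding")
						.build())
				.execute().actionGet();
		// Wait until the index shards are allocated before analyzing.
		client.admin().cluster().prepareHealth("paoding_test")
				.setWaitForYellowStatus().execute().actionGet();
		AnalyzeResponse response = client.admin().indices()
				.prepareAnalyze("paoding_test", "我爱北京天安门")
				.setAnalyzer("my_paoding")
				.execute().actionGet();
		for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
			System.out.println(token.getTerm());
		}
		node.close();
	}

If the plugin is wired up correctly, the printed terms should match the output of the Lucene-level test above.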