Integrating the IK Analyzer with Spring Boot
POM dependencies
```xml
<!-- ikanalyzer: Chinese word segmenter -->
<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- lucene-queryparser: query parser module -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>7.3.0</version>
</dependency>
```
Sample code
```java
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public static List<String> iKSegmenterToList(String rawText) throws Exception {
    List<String> resultList = new ArrayList<>();
    StringReader sr = new StringReader(rawText);
    // Second argument: whether to enable smart (coarse-grained) segmentation
    IKSegmenter ik = new IKSegmenter(sr, true);
    Lexeme lex;
    while ((lex = ik.next()) != null) {
        resultList.add(lex.getLexemeText());
    }
    return resultList;
}
```
Although the code above enables smart segmentation, two cases still come up that it cannot handle on its own, and they require extending the IK analyzer's configuration separately:
- Special terms (such as company names) that must be kept as a single token
- Unimportant words (such as prepositions), or any words you simply want filtered out
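The second case can be illustrated in plain Java, without IK, to show what stopword filtering does to a segmenter's output (the token list and stopword set below are made-up examples):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StopwordFilterSketch {
    // Drop every token that appears in the stopword set
    static List<String> filterStopwords(List<String> tokens, Set<String> stopwords) {
        return tokens.stream()
                .filter(t -> !stopwords.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical tokens, as a segmenter might emit them
        List<String> tokens = Arrays.asList("我", "的", "公司", "在", "北京");
        // A tiny stopword set; real lists (see below) contain hundreds of entries
        Set<String> stopwords = Set.of("的", "在");
        System.out.println(filterStopwords(tokens, stopwords)); // [我, 公司, 北京]
    }
}
```

With the configuration described next, IK performs this filtering internally, so your code never sees the stopwords at all.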
To extend the IK analyzer's configuration, create three files under the project's resources folder: IKAnalyzer.cfg.xml, ext_dict.dic, and ext_stopwords.dic.
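Assuming a standard Maven layout, the resulting structure looks like this:

```
src/main/resources/
├── IKAnalyzer.cfg.xml     <- points IK at the two dictionaries below
├── ext_dict.dic           <- extension dictionary (custom words)
└── ext_stopwords.dic      <- stopword dictionary (words to filter out)
```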
IKAnalyzer.cfg.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">ext_dict.dic</entry>
    <entry key="ext_stopwords">ext_stopwords.dic</entry>
</properties>
```
- ext_dict: extension dictionary; adds words that are missing from the system's default dictionary (company names, proper nouns, and so on)
- ext_stopwords: stopword dictionary; specifies which words are stopwords, to be filtered out during segmentation (such as 的, 了, 和)
ext_dict.dic: define its contents yourself; if you have no custom words, it can be left unconfigured.
ext_stopwords.dic: stopword lists collected from GitHub
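Both .dic files are plain UTF-8 text with one entry per line. A hypothetical ext_dict.dic might look like this (the entries are made-up examples):

```
阿里巴巴
机器智能实验室
大语言模型
```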
| Stopword list | Description | Local path |
| --- | --- | --- |
| cn_stopwords.txt | Chinese stopword list | https://docs.qnmdmyy.top/resources/后端/spring全家桶/springboot整合ik分词器/cn_stopwords.txt |
| hit_stopwords.txt | Harbin Institute of Technology stopword list | https://docs.qnmdmyy.top/resources/后端/spring全家桶/springboot整合ik分词器/hit_stopwords.txt |
| baidu_stopwords.txt | Baidu stopword list | https://docs.qnmdmyy.top/resources/后端/spring全家桶/springboot整合ik分词器/baidu_stopwords.txt |
| scu_stopwords.txt | Machine Intelligence Laboratory (SCU) stopword list | https://docs.qnmdmyy.top/resources/后端/spring全家桶/springboot整合ik分词器/scu_stopwords.txt |
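If you download more than one of these lists, they can be merged and de-duplicated into a single ext_stopwords.dic with a small stdlib sketch (the file names and paths below are assumptions; adjust them to wherever you saved the lists):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeStopwords {
    // Merge several one-word-per-line stopword files, dropping blanks and duplicates
    static Set<String> merge(List<Path> sources) throws IOException {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order
        for (Path p : sources) {
            for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                String word = line.strip();
                if (!word.isEmpty()) {
                    merged.add(word);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        // Assumed download locations; change to your own paths
        Set<String> merged = merge(List.of(
                Path.of("cn_stopwords.txt"),
                Path.of("hit_stopwords.txt")));
        Files.write(Path.of("src/main/resources/ext_stopwords.dic"),
                merged, StandardCharsets.UTF_8);
    }
}
```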