
Integrating the IK Analyzer with Spring Boot

  • POM dependencies

    xml
    <!-- ikanalyzer: Chinese word segmenter -->
    <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
        <version>2012_u6</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-core</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-queryparser</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-analyzers-common</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    
    <!-- lucene-queryparser: query parser module -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>7.3.0</version>
    </dependency>
  • Example code

    java
    public static List<String> iKSegmenterToList(String rawText) throws IOException {
        List<String> resultList = new ArrayList<>();
        // Second argument: whether to enable smart segmentation
        try (StringReader sr = new StringReader(rawText)) {
            IKSegmenter ik = new IKSegmenter(sr, true);
            Lexeme lex;
            while ((lex = ik.next()) != null) {
                resultList.add(lex.getLexemeText());
            }
        }
        return resultList;
    }
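The tokens returned by iKSegmenterToList are plain strings, so downstream processing is ordinary Java. As a minimal sketch of one common follow-up step, counting term frequencies (it uses a hard-coded stand-in token list, since actually running IKSegmenter requires the ikanalyzer dependency above on the classpath):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermFrequency {
    // Count how often each token appears, preserving first-seen order.
    public static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String t : tokens) {
            freq.merge(t, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Stand-in token list; in practice this would come from iKSegmenterToList(...)
        List<String> tokens = Arrays.asList("spring", "boot", "ik", "spring");
        System.out.println(countTerms(tokens)); // prints {spring=2, boot=1, ik=1}
    }
}
```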
  • Although the code above enables smart segmentation, two cases still need extra handling, which requires extending the IK analyzer's configuration separately:

    1. Special vocabulary (e.g., company names) that must be kept as whole terms
    2. Insignificant words (e.g., prepositions), or any words you simply want removed, that need to be filtered out
  • To configure the IK analyzer extensions, create three files in the project's resources folder: IKAnalyzer.cfg.xml, ext_dict.dic, and ext_stopwords.dic
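As a sketch of what IKAnalyzer.cfg.xml typically looks like (the file names in the entry values are the two dictionaries created above; verify the exact keys against the ikanalyzer version in use):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Extension dictionary: terms to keep as whole tokens -->
    <entry key="ext_dict">ext_dict.dic</entry>
    <!-- Extension stop-word dictionary: terms to filter out -->
    <entry key="ext_stopwords">ext_stopwords.dic</entry>
</properties>
```

ext_dict.dic and ext_stopwords.dic are plain text files with one term per line, typically saved as UTF-8 (without a BOM).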

MIT license; reproduction in any form without permission is prohibited.