public class SegTokenFilter

Filters a SegToken by converting full-width Latin characters to half-width and then lowercasing all Latin characters. In addition, all punctuation is converted into Utility#COMMON_DELIMITER.

WARNING: The analyzers/smartcn analysis.cn.smart package is experimental. The APIs and file formats introduced here may change in the future, in which case the current ones will no longer be supported.

Method from org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter Summary:
Method from org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter Detail:
 public SegToken filter(SegToken token) 
    Filters an input SegToken.

    Full-width Latin characters are converted to half-width, then all Latin characters are lowercased. All punctuation is converted into Utility#COMMON_DELIMITER.
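
    The normalization described above can be illustrated with a minimal standalone sketch. It does not use the Lucene classes: SegTokenFilterSketch, Kind, and normalize are hypothetical names, the single-comma delimiter is an assumed stand-in for Utility#COMMON_DELIMITER, and the U+FF01..U+FF5E range check is a simplification that may differ from what the real filter does.

```java
// Standalone sketch of the described normalization; not the Lucene implementation.
// All names below are hypothetical and exist only for illustration.
final class SegTokenFilterSketch {

    // Simplified stand-in for the token's word type.
    enum Kind { FULLWIDTH_STRING, STRING, DELIMITER, OTHER }

    // Assumption: stand-in for Utility#COMMON_DELIMITER.
    static final char[] COMMON_DELIMITER = { ',' };

    // Returns a normalized copy of chars; the input array is left unchanged.
    static char[] normalize(char[] chars, Kind kind) {
        if (kind == Kind.DELIMITER) {
            // Every punctuation token collapses to the common delimiter.
            return COMMON_DELIMITER.clone();
        }
        if (kind != Kind.FULLWIDTH_STRING && kind != Kind.STRING) {
            // Other token kinds pass through untouched.
            return chars.clone();
        }
        char[] out = chars.clone();
        for (int i = 0; i < out.length; i++) {
            char c = out[i];
            // Full-width forms U+FF01..U+FF5E sit 0xFEE0 above ASCII U+0021..U+007E.
            if (kind == Kind.FULLWIDTH_STRING && c >= 0xFF01 && c <= 0xFF5E) {
                c -= 0xFEE0;
            }
            // Lowercase ASCII Latin letters.
            if (c >= 'A' && c <= 'Z') {
                c += 0x20;
            }
            out[i] = c;
        }
        return out;
    }
}
```

    Under these assumptions, a full-width token is first mapped to its half-width form and then lowercased, a half-width Latin token is only lowercased, and a punctuation token is replaced by the delimiter regardless of its original characters.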