org.apache.lucene.analysis.cn (lucene-3.0.1-src)
public final class: ChineseTokenizer

All Implemented Interfaces:

Tokenizes Chinese text into individual Chinese characters.

ChineseTokenizer and CJKTokenizer differ in their token parsing logic.

For example, if the Chinese text "C1C2C3C4" is to be indexed:

The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
The tokens returned from CJKTokenizer are C1C2, C2C3, C3C4.

Therefore the index created by CJKTokenizer is much larger.

However, when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not.
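
The trade-off described above can be illustrated with a small, self-contained sketch. It does not use Lucene: the class `TokenizationSketch` and its methods are hypothetical stand-ins that mimic single-character tokens (ChineseTokenizer style) and overlapping two-character tokens (CJKTokenizer style), and the `matches` helper is a simplified "all query tokens are indexed" check rather than Lucene's real query evaluation. Each list element such as "C1" stands for one Chinese character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class TokenizationSketch {

    /** One token per character, in the style of ChineseTokenizer. */
    static List<String> unigrams(List<String> chars) {
        return new ArrayList<String>(chars);
    }

    /** Overlapping two-character tokens, in the style of CJKTokenizer. */
    static List<String> bigrams(List<String> chars) {
        List<String> tokens = new ArrayList<String>();
        if (chars.size() == 1) {
            // Assumption for this sketch: a lone character becomes a single token.
            tokens.add(chars.get(0));
            return tokens;
        }
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    /** Simplified match: every query token must occur among the indexed tokens. */
    static boolean matches(List<String> indexed, List<String> query) {
        return new HashSet<String>(indexed).containsAll(query);
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("C1", "C2", "C3", "C4");
        List<String> query = Arrays.asList("C1", "C3"); // the search "C1C3"

        System.out.println("unigram index: " + unigrams(text));
        System.out.println("bigram index:  " + bigrams(text));
        System.out.println("unigram match: " + matches(unigrams(text), unigrams(query)));
        System.out.println("bigram match:  " + matches(bigrams(text), bigrams(query)));
    }
}
```

In this sketch the query C1C3 matches the single-character index because both C1 and C3 are indexed terms, while the two-character index contains no C1C3 term.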

Fields inherited from org.apache.lucene.analysis.Tokenizer:
Constructor:
 public ChineseTokenizer(Reader in)
 public ChineseTokenizer(AttributeSource source, Reader in)
 public ChineseTokenizer(AttributeFactory factory, Reader in)
Method from org.apache.lucene.analysis.cn.ChineseTokenizer Summary:
end,   incrementToken,   reset,   reset
Methods from org.apache.lucene.analysis.Tokenizer:
close,   correctOffset,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   end,   incrementToken,   reset
Methods from org.apache.lucene.util.AttributeSource:
addAttribute,   addAttributeImpl,   captureState,   clearAttributes,   cloneAttributes,   equals,   getAttribute,   getAttributeClassesIterator,   getAttributeFactory,   getAttributeImplsIterator,   hasAttribute,   hasAttributes,   hashCode,   restoreState,   toString
Methods from java.lang.Object:
clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.cn.ChineseTokenizer Detail:
 public final void end()
 public boolean incrementToken() throws IOException
 public void reset() throws IOException
 public void reset(Reader input) throws IOException
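
A typical way to consume the methods listed above is sketched below. It assumes the Lucene 3.0.1 core and contrib analyzers jars are on the classpath, and it reads token text through `TermAttribute` from `org.apache.lucene.analysis.tokenattributes`, which is not listed on this page. The class name `ChineseTokenizerUsage` and the helper `tokenize` are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.cn.ChineseTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class ChineseTokenizerUsage {

    /** Collects the term text of every token produced for the given input. */
    static List<String> tokenize(String text) throws IOException {
        ChineseTokenizer tokenizer = new ChineseTokenizer(new StringReader(text));
        // Attributes are registered once; their values change on each incrementToken().
        TermAttribute term = tokenizer.addAttribute(TermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        try {
            while (tokenizer.incrementToken()) {
                tokens.add(term.term());
            }
            tokenizer.end();   // signal end of stream
        } finally {
            tokenizer.close(); // closes the underlying Reader
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // "\u4e2d\u6587" is a two-character Chinese string; one token per character is expected.
        for (String token : tokenize("\u4e2d\u6587")) {
            System.out.println(token);
        }
    }
}
```

To reuse a single tokenizer instance for several inputs, `reset(Reader input)` can be called with a fresh Reader before the next `incrementToken()` loop, instead of constructing a new tokenizer each time.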