org.apache.lucene.analysis.standard

Interfaces:

CharStream   This interface describes a character stream that maintains line and column number positions of the characters.
StandardTokenizerConstants

Classes:

FastCharStream   An efficient implementation of JavaCC's CharStream interface.
ParseException   This exception is thrown when parse errors are encountered.
StandardAnalyzer   Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
StandardAnalyzer.SavedStreams
StandardFilter   Normalizes tokens extracted with StandardTokenizer.
StandardTokenizer   A grammar-based tokenizer constructed with JFlex. This should be a good tokenizer for most European-language documents:

  • Splits words at punctuation characters, removing punctuation.

StandardTokenizerImpl   This class is a scanner generated by JFlex 1.4.1 on 9/4/08 6:49 PM from the specification file /tango/mike/src/lucene.standarddigit/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex
StandardTokenizerTokenManager
Token   Describes the input token stream.
TokenMgrError
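
Example:

The StandardAnalyzer summary above describes a chain: tokenize, lowercase, then drop stop words. The following is a minimal plain-Java sketch of that idea, not Lucene code. The class name AnalyzerChainSketch, its methods, and the abbreviated stop list are all hypothetical, and the regular expression only approximates "split at punctuation"; the real StandardTokenizer follows a JFlex grammar with rules this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: imitates a tokenize -> lowercase -> stop-filter chain.
// It does not use Lucene and does not reproduce the JFlex grammar.
class AnalyzerChainSketch {

    // Hypothetical, abbreviated stop list (an assumption for this sketch).
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of", "is"));

    // Step 1: split at runs of characters that are neither letters nor digits,
    // discarding those characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {   // split() can yield a leading empty string
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Steps 2 and 3: lowercase each token, then drop it if it is a stop word.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }
}
```

Calling AnalyzerChainSketch.analyze("The Quick, brown fox!") yields the tokens quick, brown and fox: the punctuation is removed, the case is folded, and "the" is filtered out as a stop word.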