org.apache.lucene.analysis.compound
public class: HyphenationCompoundWordTokenFilter
java.lang.Object
   org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
         org.apache.lucene.analysis.TokenFilter
            org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
               org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter

All Implemented Interfaces:
    Closeable

A TokenFilter that decomposes compound words found in many Germanic languages.

For example, "Donaudampfschiff" is split into "Donau", "dampf", and "schiff", so that a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a hyphenation grammar and a word dictionary to achieve this.
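
The idea described above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the filter's actual implementation: the class `CompoundSketch`, its `decompose` method, and the hand-written break points are hypothetical stand-ins for what the hyphenation grammar would supply. It checks every span between two candidate break points against a lowercase dictionary and keeps the spans that match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; class and method names are hypothetical, not Lucene API.
final class CompoundSketch {

    /**
     * Returns every span between two break points whose lowercased text is in the dictionary.
     * breakPoints must be ascending character offsets into word (here supplied by hand;
     * in the real filter they would come from the hyphenation grammar).
     */
    static List<String> decompose(String word, int[] breakPoints, Set<String> lowerCaseDictionary) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < breakPoints.length; i++) {
            for (int j = i + 1; j < breakPoints.length; j++) {
                // Look up the lowercased span, since the dictionary holds lowercase entries.
                String candidate = lower.substring(breakPoints[i], breakPoints[j]);
                if (lowerCaseDictionary.contains(candidate)) {
                    // Emit the span with the casing of the original word.
                    parts.add(word.substring(breakPoints[i], breakPoints[j]));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        int[] breakPoints = {0, 2, 5, 10, 16}; // Do|nau|dampf|schiff, assumed for this example
        System.out.println(decompose("Donaudampfschiff", breakPoints, dictionary));
        // prints [Donau, dampf, schiff]
    }
}
```

Spans such as "Do" or "nau" fall between break points but are dropped because they are not dictionary words, which is why both a grammar and a dictionary are needed.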

Fields inherited from org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
DEFAULT_MIN_WORD_SIZE,  DEFAULT_MIN_SUBWORD_SIZE,  DEFAULT_MAX_SUBWORD_SIZE,  dictionary,  tokens,  minWordSize,  minSubwordSize,  maxSubwordSize,  onlyLongestMatch
Fields inherited from org.apache.lucene.analysis.TokenFilter:
input
Constructor:
 public HyphenationCompoundWordTokenFilter(TokenStream input,
    HyphenationTree hyphenator,
    String[] dictionary) 
    Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against
 public HyphenationCompoundWordTokenFilter(TokenStream input,
    HyphenationTree hyphenator,
    Set dictionary) 
    Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and must contain only lowercase strings.
 public HyphenationCompoundWordTokenFilter(TokenStream input,
    HyphenationTree hyphenator,
    String[] dictionary,
    int minWordSize,
    int minSubwordSize,
    int maxSubwordSize,
    boolean onlyLongestMatch) 
    Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against
    minWordSize - only words longer than this get processed
    minSubwordSize - only subwords longer than this get to the output stream
    maxSubwordSize - only subwords shorter than this get to the output stream
    onlyLongestMatch - if true, add only the longest matching subword to the stream
 public HyphenationCompoundWordTokenFilter(TokenStream input,
    HyphenationTree hyphenator,
    Set dictionary,
    int minWordSize,
    int minSubwordSize,
    int maxSubwordSize,
    boolean onlyLongestMatch) 
    Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and must contain only lowercase strings.
    minWordSize - only words longer than this get processed
    minSubwordSize - only subwords longer than this get to the output stream
    maxSubwordSize - only subwords shorter than this get to the output stream
    onlyLongestMatch - if true, add only the longest matching subword to the stream
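
How the subword size limits and onlyLongestMatch interact can be sketched in plain Java. This is a hedged illustration, not the filter's code: `SubwordSelection` and `select` are hypothetical names, the candidates are assumed to be dictionary matches that start at the same offset, and both size bounds are treated as inclusive, which is an assumption because the parameter descriptions do not say what happens to a subword of exactly the limit length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not Lucene API.
final class SubwordSelection {

    /**
     * Filters dictionary matches by length, then optionally keeps only the longest one.
     * Bounds are inclusive here (an assumption made for this sketch).
     */
    static List<String> select(List<String> matches, int minSubwordSize,
                               int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> kept = new ArrayList<>();
        String longest = null;
        for (String match : matches) {
            int length = match.length();
            if (length < minSubwordSize || length > maxSubwordSize) {
                continue; // outside the configured size window
            }
            if (onlyLongestMatch) {
                if (longest == null || length > longest.length()) {
                    longest = match; // remember the longest match seen so far
                }
            } else {
                kept.add(match); // every match in the window is emitted
            }
        }
        if (onlyLongestMatch && longest != null) {
            kept.add(longest);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two matches assumed to start at the same offset; size values are arbitrary examples.
        List<String> matches = Arrays.asList("dampf", "dampfschiff");
        System.out.println(select(matches, 2, 15, false)); // prints [dampf, dampfschiff]
        System.out.println(select(matches, 2, 15, true));  // prints [dampfschiff]
        System.out.println(select(matches, 2, 8, true));   // prints [dampf]
    }
}
```

The last call shows that the size window is applied before the longest match is chosen: with maxSubwordSize set to 8, "dampfschiff" (11 characters) is excluded, so "dampf" becomes the longest remaining match.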
Methods from org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter Summary:
decomposeInternal,   getHyphenationTree,   getHyphenationTree,   getHyphenationTree
Methods from org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
addAllLowerCase,   createToken,   decompose,   decomposeInternal,   incrementToken,   makeDictionary,   makeLowerCaseCopy,   reset
Methods from org.apache.lucene.analysis.TokenFilter:
close,   end,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   end,   incrementToken,   reset
Methods from org.apache.lucene.util.AttributeSource:
addAttribute,   addAttributeImpl,   captureState,   clearAttributes,   cloneAttributes,   equals,   getAttribute,   getAttributeClassesIterator,   getAttributeFactory,   getAttributeImplsIterator,   hasAttribute,   hasAttributes,   hashCode,   restoreState,   toString
Methods from java.lang.Object:
clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Methods from org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter Detail:
 protected  void decomposeInternal(Token token) 
 public static HyphenationTree getHyphenationTree(String hyphenationFilename) throws Exception 
    Creates a hyphenation tree from the hyphenation grammar file with the given filename.
 public static HyphenationTree getHyphenationTree(File hyphenationFile) throws Exception 
    Creates a hyphenation tree from the given hyphenation grammar file.
 public static HyphenationTree getHyphenationTree(Reader hyphenationReader) throws Exception 
    Creates a hyphenation tree from a hyphenation grammar supplied by the given Reader.