org.apache.lucene.analysis.compound (Lucene 3.0.1)
public class: DictionaryCompoundWordTokenFilter
java.lang.Object
   org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
         org.apache.lucene.analysis.TokenFilter
            org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
               org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter

All Implemented Interfaces:
    Closeable

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" is decomposed into "Donau", "dampf", and "schiff", so that a query for just "schiff" also finds documents containing "Donaudampfschiff". The filter uses a brute-force algorithm: every substring of the token within the configured subword-size bounds is checked against the dictionary.

Fields inherited from org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
DEFAULT_MIN_WORD_SIZE,  DEFAULT_MIN_SUBWORD_SIZE,  DEFAULT_MAX_SUBWORD_SIZE,  dictionary,  tokens,  minWordSize,  minSubwordSize,  maxSubwordSize,  onlyLongestMatch
Fields inherited from org.apache.lucene.analysis.TokenFilter:
input
Constructor:
 public DictionaryCompoundWordTokenFilter(TokenStream input,
    String[] dictionary) 
    Parameters:
    input - the TokenStream to process
    dictionary - the word dictionary to match against
 public DictionaryCompoundWordTokenFilter(TokenStream input,
    Set dictionary) 
    Parameters:
    input - the TokenStream to process
    dictionary - the word dictionary to match against. If this is a CharArraySet it must have set ignoreCase=false and only contain lower case strings.
 public DictionaryCompoundWordTokenFilter(TokenStream input,
    String[] dictionary,
    int minWordSize,
    int minSubwordSize,
    int maxSubwordSize,
    boolean onlyLongestMatch) 
    Parameters:
    input - the TokenStream to process
    dictionary - the word dictionary to match against
    minWordSize - only tokens longer than this are decomposed
    minSubwordSize - only subwords longer than this are emitted to the output stream
    maxSubwordSize - only subwords shorter than this are emitted to the output stream
    onlyLongestMatch - add only the longest matching subword to the stream
 public DictionaryCompoundWordTokenFilter(TokenStream input,
    Set dictionary,
    int minWordSize,
    int minSubwordSize,
    int maxSubwordSize,
    boolean onlyLongestMatch) 
    Parameters:
    input - the TokenStream to process
    dictionary - the word dictionary to match against. If this is a CharArraySet it must have set ignoreCase=false and only contain lower case strings.
    minWordSize - only tokens longer than this are decomposed
    minSubwordSize - only subwords longer than this are emitted to the output stream
    maxSubwordSize - only subwords shorter than this are emitted to the output stream
    onlyLongestMatch - add only the longest matching subword to the stream
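The size bounds and matching behavior can be tuned via the full constructor. A sketch using the Set-based variant (the numeric values shown are the defaults inherited from CompoundWordTokenFilterBase: DEFAULT_MIN_WORD_SIZE=5, DEFAULT_MIN_SUBWORD_SIZE=2, DEFAULT_MAX_SUBWORD_SIZE=15; lucene-core 3.0.1 is assumed on the classpath):

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;

public class CompoundConfigDemo {
    public static void main(String[] args) throws Exception {
        // Dictionary entries must be lowercase (see the constructor note above).
        Set dict = new HashSet(Arrays.asList("donau", "dampf", "schiff"));

        TokenStream stream = new DictionaryCompoundWordTokenFilter(
            new WhitespaceTokenizer(new StringReader("Donaudampfschiff")),
            dict,
            5,    // minWordSize: shorter tokens pass through undecomposed
            2,    // minSubwordSize: shorter dictionary hits are ignored
            15,   // maxSubwordSize: longer dictionary hits are ignored
            true  // onlyLongestMatch: keep only the longest hit per position
        );
        stream.close();
    }
}
```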
Method from org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter Summary:
decomposeInternal
Methods from org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
addAllLowerCase,   createToken,   decompose,   decomposeInternal,   incrementToken,   makeDictionary,   makeLowerCaseCopy,   reset
Methods from org.apache.lucene.analysis.TokenFilter:
close,   end,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   end,   incrementToken,   reset
Methods from org.apache.lucene.util.AttributeSource:
addAttribute,   addAttributeImpl,   captureState,   clearAttributes,   cloneAttributes,   equals,   getAttribute,   getAttributeClassesIterator,   getAttributeFactory,   getAttributeImplsIterator,   hasAttribute,   hasAttributes,   hashCode,   restoreState,   toString
Methods from java.lang.Object:
clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter Detail:
 protected  void decomposeInternal(Token token)
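decomposeInternal implements the brute-force search: for each start offset inside the token it tries every candidate length between minSubwordSize and maxSubwordSize, and emits a subword for each dictionary hit (or, with onlyLongestMatch, only the longest hit per offset). The following is a simplified standalone sketch of that loop, not the actual Lucene source; the real method operates on Token objects and a shared lowercased dictionary:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DecomposeSketch {
    // Simplified model of the brute-force decomposition: check every
    // substring within the subword-size bounds against the dictionary.
    static List<String> decompose(String token, Set<String> dict,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<String>();
        String lower = token.toLowerCase(); // matching is done in lower case
        for (int i = 0; i <= lower.length() - minSubwordSize; i++) {
            String longest = null;
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && i + len <= lower.length(); len++) {
                String candidate = lower.substring(i, i + len);
                if (dict.contains(candidate)) {
                    if (onlyLongestMatch) {
                        longest = candidate; // keep only the longest hit here
                    } else {
                        subwords.add(candidate);
                    }
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<String>(
            Arrays.asList("donau", "dampf", "schiff"));
        System.out.println(decompose("Donaudampfschiff", dict, 2, 15, false));
        // → [donau, dampf, schiff]
    }
}
```

This quadratic scan over offsets and lengths is what the class description calls the "brute-force algorithm"; the Ngram-based sibling filter in the same package exists to avoid it for large dictionaries.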