org.apache.lucene.analysis.nl
public class: DutchAnalyzer
java.lang.Object
   org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.nl.DutchAnalyzer

All Implemented Interfaces:
    Closeable

Analyzer for the Dutch language.

Supports an external list of stopwords (words that will not be indexed at all), an external list of exclusions (words that will be indexed but not stemmed), and an external list of word-stem pairs that overrule the stemming algorithm (dictionary stemming). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.
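
The interplay of the three lists can be sketched in plain Java. This is a minimal illustration and not Lucene code: the class DutchPipelineSketch, its toyStem method (a stand-in for the real stemming algorithm) and the precedence order (stopwords first, then exclusions, then the dictionary, then the algorithm) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this class is not part of Lucene.
class DutchPipelineSketch {
    private final Set<String> stopwords;        // dropped, never indexed
    private final Set<String> exclusions;       // indexed as-is, never stemmed
    private final Map<String, String> stemDict; // word -> stem, overrules the algorithm

    DutchPipelineSketch(Set<String> stopwords, Set<String> exclusions,
                        Map<String, String> stemDict) {
        this.stopwords = stopwords;
        this.exclusions = exclusions;
        this.stemDict = stemDict;
    }

    // Toy stand-in for the real stemmer: strips a trailing "en" or "s".
    static String toyStem(String term) {
        if (term.length() > 4 && term.endsWith("en")) {
            return term.substring(0, term.length() - 2);
        }
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (term.isEmpty() || stopwords.contains(term)) {
                continue;                        // stopword: not indexed at all
            }
            if (exclusions.contains(term)) {
                out.add(term);                   // excluded: indexed unstemmed
            } else if (stemDict.containsKey(term)) {
                out.add(stemDict.get(term));     // dictionary overrules the algorithm
            } else {
                out.add(toyStem(term));          // fall back to the algorithm
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DutchPipelineSketch p = new DutchPipelineSketch(
                new HashSet<>(Arrays.asList("de", "het", "en")),
                new HashSet<>(Arrays.asList("fiets")),
                Collections.singletonMap("kind", "kinder"));
        System.out.println(p.analyze("De fiets en het kind lopen"));
    }
}
```

In this sketch "fiets" survives unchanged because it is excluded, "kind" becomes "kinder" because the dictionary says so, and "lopen" goes through the toy algorithm.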

NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.

Field Summary
public static final  String[] DUTCH_STOP_WORDS    List of typical Dutch stopwords.
     
    Fields inherited from org.apache.lucene.analysis.Analyzer:
    overridesTokenStreamMethod
    Constructor:
     public DutchAnalyzer(Version matchVersion) 
      Builds an analyzer with the default stop words (DUTCH_STOP_WORDS) and a few default entries for the stem exclusion table.
     public DutchAnalyzer(Version matchVersion,
        Set<?> stopwords) 
     public DutchAnalyzer(Version matchVersion,
        String... stopwords) 
      Builds an analyzer with the given stop words.
      Parameters:
      matchVersion -
      stopwords -
     public DutchAnalyzer(Version matchVersion,
        HashSet<?> stopwords) 
      Builds an analyzer with the given stop words.
      Parameters:
      stopwords -
     public DutchAnalyzer(Version matchVersion,
        File stopwords) 
      Builds an analyzer with the given stop words.
      Parameters:
      stopwords -
     public DutchAnalyzer(Version matchVersion,
        Set<?> stopwords,
        Set<?> stemExclusionTable) 
    Method from org.apache.lucene.analysis.nl.DutchAnalyzer Summary:
    getDefaultStopSet,   reusableTokenStream,   setStemDictionary,   setStemExclusionTable,   setStemExclusionTable,   setStemExclusionTable,   tokenStream
    Methods from org.apache.lucene.analysis.Analyzer:
    close,   getOffsetGap,   getPositionIncrementGap,   getPreviousTokenStream,   reusableTokenStream,   setOverridesTokenStreamMethod,   setPreviousTokenStream,   tokenStream
    Methods from java.lang.Object:
    clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
    Method from org.apache.lucene.analysis.nl.DutchAnalyzer Detail:
     public static Set<?> getDefaultStopSet() 
      Returns an unmodifiable instance of the default stop-words set.
     public TokenStream reusableTokenStream(String fieldName,
        Reader reader) throws IOException 
      Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader.
     public  void setStemDictionary(File stemdictFile) 
      Reads a stem dictionary file that overrules the stemming algorithm. This is a text file that contains one word\tstem pair per line, i.e. two tab-separated words.
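
As a rough illustration of the file format described above, the following plain-Java sketch reads word\tstem lines into a map. The class StemDictSketch and its parse method are hypothetical and do not show how Lucene loads the file; silently skipping lines without a tab is also an assumption of this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this class is not part of Lucene.
class StemDictSketch {
    // Parses lines of the form "word<TAB>stem" into a word -> stem map.
    static Map<String, String> parse(Reader in) {
        Map<String, String> dict = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    dict.put(parts[0], parts[1]);
                }
                // Assumption: lines without a tab are ignored.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dict;
    }

    public static void main(String[] args) {
        Map<String, String> dict = parse(new StringReader("fiets\tfiets\nei\teier\n"));
        System.out.println(dict.get("ei"));
    }
}
```

For a real file, a java.io.FileReader (or an InputStreamReader with an explicit charset) could be passed in place of the StringReader.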
     public  void setStemExclusionTable(String... exclusionlist) 
    Deprecated! Use DutchAnalyzer(Version, Set, Set) instead.

      Builds an exclusion list from an array of Strings.
     public  void setStemExclusionTable(HashSet<?> exclusionlist) 
    Deprecated! Use DutchAnalyzer(Version, Set, Set) instead.

      Builds an exclusion list from a HashSet.
     public  void setStemExclusionTable(File exclusionlist) 
    Deprecated! Use DutchAnalyzer(Version, Set, Set) instead.

      Builds an exclusion list from the words contained in the given file.
     public TokenStream tokenStream(String fieldName,
        Reader reader)