org.apache.lucene.analysis (Lucene 3.0.1)

Sub Packages:

org.apache.lucene.analysis.ar   Analyzer for Arabic.  
org.apache.lucene.analysis.br   Analyzer for Brazilian Portuguese.  
org.apache.lucene.analysis.cjk   Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).  
org.apache.lucene.analysis.cn   Analyzer for Chinese, which indexes unigrams (individual Chinese characters).  
org.apache.lucene.analysis.cn.smart   Analyzer for Simplified Chinese, which indexes words.  
org.apache.lucene.analysis.cn.smart.hhmm   SmartChineseAnalyzer Hidden Markov Model package.  
org.apache.lucene.analysis.compound   A filter that decomposes compound words found in many Germanic languages into their word parts.  
org.apache.lucene.analysis.compound.hyphenation   The code for the compound-word hyphenation is taken from the Apache FOP project.  
org.apache.lucene.analysis.cz   Analyzer for Czech.  
org.apache.lucene.analysis.de   Analyzer for German.  
org.apache.lucene.analysis.el   Analyzer for Greek.  
org.apache.lucene.analysis.fa   Analyzer for Persian.  
org.apache.lucene.analysis.fr   Analyzer for French.  
org.apache.lucene.analysis.miscellaneous   Miscellaneous TokenStreams.  
org.apache.lucene.analysis.ngram   Character n-gram tokenizers and filters.  
org.apache.lucene.analysis.nl   Analyzer for Dutch.  
org.apache.lucene.analysis.payloads   Provides various convenience classes for creating payloads on Tokens.  
org.apache.lucene.analysis.position   Filter for assigning position increments.  
org.apache.lucene.analysis.query   Automatically filter high-frequency stopwords.  
org.apache.lucene.analysis.reverse   Filter to reverse token text.  
org.apache.lucene.analysis.ru   Analyzer for Russian.  
org.apache.lucene.analysis.shingle   Word n-gram filters.  
org.apache.lucene.analysis.sinks   Implementations of the SinkTokenizer that might be useful.  
org.apache.lucene.analysis.snowball   org.apache.lucene.analysis.TokenFilter and org.apache.lucene.analysis.Analyzer implementations that use Snowball stemmers.  
org.apache.lucene.analysis.standard   A fast grammar-based tokenizer constructed with JFlex.  
org.apache.lucene.analysis.th   Analyzer for Thai.  
org.apache.lucene.analysis.tokenattributes   Token attribute interfaces and implementations (TermAttribute, OffsetAttribute, etc.) used by the attribute-based TokenStream API.  
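To illustrate one of the sub-packages above: the cjk package indexes overlapping bigrams over runs of Han characters. A minimal plain-Java sketch of that windowing (class name hypothetical, no Lucene dependency):

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Produces overlapping two-character windows over a run of characters,
    // the grouping the cjk analyzer indexes for Han text.
    // e.g. "中文分词" -> [中文, 文分, 分词]
    public static List<String> bigrams(String hanRun) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 1 < hanRun.length(); i++) {
            out.add(hanRun.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("中文分词")); // prints [中文, 文分, 分词]
    }
}
```

Indexing bigrams instead of unigrams (the cn package's approach) trades a larger term dictionary for more selective terms, since individual Han characters are too common to discriminate well.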

Abstract Classes:

Analyzer   An Analyzer builds TokenStreams, which analyze text.  
BaseCharFilter   Base utility class for implementing a CharFilter.  
CharFilter   Subclasses of CharFilter can be chained to filter a CharStream.  
CharStream   CharStream adds correctOffset(int) functionality over Reader.  
CharTokenizer   An abstract base class for simple, character-oriented tokenizers.  
TeeSinkTokenFilter.SinkFilter   A filter that decides which AttributeSource states to store in the sink.  
TokenFilter   A TokenFilter is a TokenStream whose input is another TokenStream.  
TokenStream   A TokenStream enumerates the sequence of tokens, either from the Fields of a Document or from query text.  
Tokenizer   A Tokenizer is a TokenStream whose input is a Reader.  
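The abstract classes above compose as decorators: a Tokenizer reads characters from a Reader and emits tokens, and each TokenFilter wraps another TokenStream to transform its output. A stripped-down plain-Java sketch of that chaining (class names hypothetical, not the Lucene API):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Analogue of TokenStream: next() returns the next token, or null when exhausted.
abstract class MiniTokenStream {
    abstract String next();
}

// Tokenizer analogue: pulls characters from a Reader, splitting on whitespace.
class MiniWhitespaceTokenizer extends MiniTokenStream {
    private final Reader input;
    MiniWhitespaceTokenizer(Reader input) { this.input = input; }
    String next() {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) return sb.toString();
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

// TokenFilter analogue: wraps another stream and lower-cases each token.
class MiniLowerCaseFilter extends MiniTokenStream {
    private final MiniTokenStream in;
    MiniLowerCaseFilter(MiniTokenStream in) { this.in = in; }
    String next() {
        String t = in.next();
        return t == null ? null : t.toLowerCase();
    }
}

public class ChainDemo {
    // Drains a stream into a list, the way an index writer would consume tokens.
    static List<String> drain(MiniTokenStream ts) {
        List<String> out = new ArrayList<String>();
        for (String t; (t = ts.next()) != null; ) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        MiniTokenStream ts = new MiniLowerCaseFilter(
            new MiniWhitespaceTokenizer(new StringReader("Hello Lucene World")));
        System.out.println(drain(ts)); // prints [hello, lucene, world]
    }
}
```

An Analyzer's job is simply to build such a chain per field; note that the real Lucene 3.0 API is attribute-based (incrementToken() plus TermAttribute and friends) rather than returning token strings as this sketch does.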

Classes:

ASCIIFoldingFilter   Converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.  
BaseCharFilter.OffCorrectMap     
CachingTokenFilter   Can be used if the token attributes of a TokenStream are intended to be consumed more than once.  
CharArraySet   A simple class that stores Strings as char[]'s in a hash table.  
CharArraySet.CharArraySetIterator   The Iterator for this set.  
CharArraySet.UnmodifiableCharArraySet   Efficient unmodifiable CharArraySet.  
CharReader   CharReader is a Reader wrapper.  
ISOLatin1AccentFilter   A filter that replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents.  
KeywordAnalyzer   "Tokenizes" the entire stream as a single token.  
KeywordTokenizer   Emits the entire input as a single token.  
LengthFilter   Removes words that are too long or too short from the stream.  
LetterTokenizer   A LetterTokenizer is a tokenizer that divides text at non-letters.  
LowerCaseFilter   Normalizes token text to lower case.  
LowerCaseTokenizer   LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.  
MappingCharFilter   A simplistic CharFilter that applies the mappings contained in a NormalizeCharMap to the character stream and corrects the resulting changes to the offsets.  
NormalizeCharMap   Holds a map of String input to String output, to be used with MappingCharFilter.  
NumericTokenStream   Expert: provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter.  
PerFieldAnalyzerWrapper   This analyzer is used to facilitate scenarios where different fields require different analysis techniques.  
PorterStemFilter   Transforms the token stream as per the Porter stemming algorithm.  
PorterStemmer   Implements the Porter stemming algorithm, transforming a word into its root form.  
SimpleAnalyzer   An Analyzer that filters LetterTokenizer with LowerCaseFilter.  
SinkTokenizer   A SinkTokenizer can be used to cache Tokens for use in an Analyzer.  
StopAnalyzer   Filters LetterTokenizer with LowerCaseFilter and StopFilter.  
StopAnalyzer.SavedStreams   Filters LowerCaseTokenizer with StopFilter.  
StopFilter   Removes stop words from a token stream.  
TeeSinkTokenFilter   This TokenFilter provides the ability to set aside attribute states that have already been analyzed.  
TeeSinkTokenFilter.SinkTokenStream     
TeeTokenFilter   Works in conjunction with the SinkTokenizer to provide the ability to set aside tokens that have already been analyzed.  
TestAnalyzers   Unit tests for the analyzers and tokenizers in this package.  
Token   A Token is an occurrence of a term from the text of a field.  
Token.TokenAttributeFactory   Expert: creates a TokenAttributeFactory that returns Token as the instance for the basic attributes and calls the given delegate factory for all other attributes.  
WhitespaceAnalyzer   An Analyzer that uses WhitespaceTokenizer.  
WhitespaceTokenizer   A WhitespaceTokenizer is a tokenizer that divides text at whitespace.  
WordlistLoader   Loader for text files that represent a list of stopwords.  
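StopFilter's core behavior, dropping tokens found in a stop set after lower-casing, can be sketched in plain Java. A HashSet stands in for the allocation-avoiding CharArraySet that Lucene uses; the class and method names here are illustrative, not Lucene API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopFilterSketch {
    // A few entries from StopAnalyzer's default English stop-word set.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    // Lower-cases each token and drops stop words, mirroring the
    // LowerCaseFilter + StopFilter chain that StopAnalyzer builds.
    public static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            String lower = t.toLowerCase();
            if (!STOP_WORDS.contains(lower)) out.add(lower);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("The", "Art", "of", "Search")));
        // prints [art, search]
    }
}
```

CharArraySet exists precisely to avoid the per-token String allocation this sketch incurs: it hashes char[] slices directly, so the filter can test each token's term buffer without materializing a String.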

All Test Cases:

TestPerFieldAnalzyerWrapper   Unit tests for PerFieldAnalyzerWrapper.  
TestStopAnalyzer   Unit tests for StopAnalyzer.  