java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
All Implemented Interfaces:
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
This is an abstract class; concrete subclasses are:
Tokenizer, a TokenStream whose input is a Reader; and
TokenFilter, a TokenStream whose input is another TokenStream.
A new TokenStream API was introduced with Lucene 2.9. This API has moved from being Token-based to Attribute-based. While Token still exists in 2.9 as a convenience class, the preferred way to store the information of a Token is to use AttributeImpls.
TokenStream now extends AttributeSource, which provides
access to all of the token Attributes for the TokenStream.
Note that only one instance per AttributeImpl is created and reused
for every token. This approach reduces object creation and allows local
caching of references to the AttributeImpls. See
#incrementToken() for further details.
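The reuse described above has a practical consequence for consumers: the attribute reference can be fetched once and cached, but any value that must outlive the current token has to be copied out. The following is a library-free sketch of that pattern, not Lucene code; `SketchTermAttribute`, `SketchStream`, and `AttributeReuseSketch` are hypothetical stand-ins for an AttributeImpl, a TokenStream, and a consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AttributeImpl: one mutable holder per stream.
class SketchTermAttribute {
    private String term = "";
    void setTerm(String t) { term = t; }
    String term() { return term; }
}

// Hypothetical stand-in for a TokenStream over a fixed list of words.
class SketchStream {
    private final String[] words;
    private int pos = 0;
    // Created once and reused for every token.
    private final SketchTermAttribute termAtt = new SketchTermAttribute();

    SketchStream(String... words) { this.words = words; }

    SketchTermAttribute termAttribute() { return termAtt; }

    // Overwrites the shared attribute; returns false at end of input.
    boolean incrementToken() {
        if (pos >= words.length) return false;
        termAtt.setTerm(words[pos++]);
        return true;
    }
}

class AttributeReuseSketch {
    // Collects terms by copying each value out of the shared attribute.
    static List<String> collectTerms(SketchStream stream) {
        SketchTermAttribute termAtt = stream.termAttribute(); // cached once
        List<String> terms = new ArrayList<String>();
        while (stream.incrementToken()) {
            terms.add(termAtt.term()); // keep the value, not the holder
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collectTerms(new SketchStream("quick", "brown", "fox")));
    }
}
```

Storing the `SketchTermAttribute` reference itself in the list would leave every entry pointing at the same object, which holds only the last token.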
The workflow of the new TokenStream API begins with the instantiation of the
TokenStream/TokenFilters, which add/get attributes to/from the AttributeSource.
You can find some example code for the new API in the analysis package level Javadoc.
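The four methods in the method summary (reset, incrementToken, end, close) are typically driven by the consumer in that order. The sketch below shows such a loop against a library-free stand-in rather than the real classes; `WorkflowStream`, `WhitespaceWorkflowStream`, `ConsumerWorkflowSketch`, and the `calls` log are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the four TokenStream methods; not Lucene.
abstract class WorkflowStream {
    final StringBuilder term = new StringBuilder(); // shared, reused per token
    abstract boolean incrementToken() throws IOException;
    void reset() throws IOException {}
    void end() throws IOException {}
    void close() throws IOException {}
}

// Hypothetical producer that splits its input on whitespace.
class WhitespaceWorkflowStream extends WorkflowStream {
    private final String[] parts;
    private int pos = 0;
    final List<String> calls = new ArrayList<String>(); // call log, illustration only

    WhitespaceWorkflowStream(String text) {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override void reset() { pos = 0; calls.add("reset"); }

    @Override boolean incrementToken() {
        if (pos >= parts.length) return false;
        term.setLength(0);          // overwrite the shared value
        term.append(parts[pos++]);
        return true;
    }

    @Override void end() { calls.add("end"); }
    @Override void close() { calls.add("close"); }
}

class ConsumerWorkflowSketch {
    static List<String> consume(WorkflowStream stream) throws IOException {
        List<String> terms = new ArrayList<String>();
        try {
            stream.reset();                        // prepare the stream
            while (stream.incrementToken()) {      // advance until exhausted
                terms.add(stream.term.toString()); // copy the current value
            }
            stream.end();                          // end-of-stream operations
        } finally {
            stream.close();                        // always release resources
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(consume(new WhitespaceWorkflowStream("to be or not")));
    }
}
```

Closing in a `finally` block is a design choice of this sketch, so that resources are released even if `incrementToken()` throws.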
Sometimes it is desirable to capture the current state of a TokenStream,
e.g., for buffering purposes (see CachingTokenFilter,
TeeSinkTokenFilter). For this use case,
AttributeSource#captureState and AttributeSource#restoreState
can be used.
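The idea behind capturing and restoring state can be sketched without the library: because attribute values are overwritten for every token, buffering means taking a snapshot per token and copying it back later. In the sketch below, `SketchAttributeSource`, its nested `State`, and `StateBufferSketch` are hypothetical stand-ins; only the method names `captureState` and `restoreState` are borrowed from the text above, and the real AttributeSource may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an attribute source holding two attribute values.
class SketchAttributeSource {
    String term = "";
    int startOffset = 0;

    // Immutable snapshot of the attribute values at one point in time.
    static final class State {
        final String term;
        final int startOffset;
        State(String term, int startOffset) {
            this.term = term;
            this.startOffset = startOffset;
        }
    }

    // Copies the current values into a new snapshot.
    State captureState() { return new State(term, startOffset); }

    // Copies a snapshot back into the shared values.
    void restoreState(State s) {
        term = s.term;
        startOffset = s.startOffset;
    }
}

class StateBufferSketch {
    // Buffers one snapshot per token, then replays them through the same source.
    static List<String> bufferAndReplay(String[] words) {
        SketchAttributeSource source = new SketchAttributeSource();
        List<SketchAttributeSource.State> buffer =
                new ArrayList<SketchAttributeSource.State>();
        int offset = 0;
        for (String w : words) {
            source.term = w;               // producer overwrites shared values
            source.startOffset = offset;
            buffer.add(source.captureState());
            offset += w.length() + 1;      // assumes single-space separators
        }
        List<String> replayed = new ArrayList<String>();
        for (SketchAttributeSource.State s : buffer) {
            source.restoreState(s);
            replayed.add(source.term + "@" + source.startOffset);
        }
        return replayed;
    }

    public static void main(String[] args) {
        System.out.println(bufferAndReplay(new String[] {"to", "be"}));
    }
}
```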
|Method from org.apache.lucene.analysis.TokenStream Summary:|
|close, end, incrementToken, reset|
|Methods from org.apache.lucene.util.AttributeSource:|
|addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString|
|Methods from java.lang.Object:|
|clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait|
|Method from org.apache.lucene.analysis.TokenStream Detail:|
public void close() throws IOException
public void end() throws IOException
abstract public boolean incrementToken() throws IOException
The producer must make no assumptions about the attributes after the method has returned: the caller may arbitrarily change them. If the producer needs to preserve the state for subsequent calls, it can use #captureState to create a copy of the current attribute state.
This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to #addAttribute(Class) and #getAttribute(Class), references to all AttributeImpls that this stream uses should be retrieved during instantiation.
To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in #incrementToken().
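The advice to obtain attribute references during instantiation can be illustrated with a small filter chain. This is a library-free sketch under stated assumptions: `SharedTerm`, `ChainStream`, `ArrayProducer`, `LowerCaseSketchFilter`, and `FilterChainSketch` are hypothetical stand-ins, and the shared attribute is passed through the constructor instead of being looked up by class as in the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical attribute shared by every stream in one chain.
class SharedTerm {
    String value = "";
}

// Hypothetical stand-in for a stream whose attribute is fixed at construction.
abstract class ChainStream {
    final SharedTerm term;
    ChainStream(SharedTerm term) { this.term = term; }
    abstract boolean incrementToken();
}

// Producer: creates the attribute once, during instantiation.
class ArrayProducer extends ChainStream {
    private final String[] words;
    private int pos = 0;

    ArrayProducer(String... words) {
        super(new SharedTerm());
        this.words = words;
    }

    @Override boolean incrementToken() {
        if (pos >= words.length) return false;
        term.value = words[pos++];
        return true;
    }
}

// Filter: takes the reference from its input once, then only reads and writes it.
class LowerCaseSketchFilter extends ChainStream {
    private final ChainStream input;

    LowerCaseSketchFilter(ChainStream input) {
        super(input.term); // reference obtained during instantiation
        this.input = input;
    }

    @Override boolean incrementToken() {
        if (!input.incrementToken()) return false;
        // No lookup and no availability check on the per-token path.
        term.value = term.value.toLowerCase(Locale.ROOT);
        return true;
    }
}

class FilterChainSketch {
    static List<String> collect(ChainStream stream) {
        SharedTerm term = stream.term; // consumer caches the reference too
        List<String> out = new ArrayList<String>();
        while (stream.incrementToken()) {
            out.add(term.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new LowerCaseSketchFilter(new ArrayProducer("Quick", "FOX"))));
    }
}
```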
public void reset() throws IOException