org.apache.xerces.utils.regex
Class RegularExpression

java.lang.Object
  org.apache.xerces.utils.regex.RegularExpression
- All Implemented Interfaces:
- java.io.Serializable
- public class RegularExpression
- extends java.lang.Object
- implements java.io.Serializable
A regular expression matching engine based on a nondeterministic finite automaton (NFA). This engine does not conform to POSIX regular expressions.
How to use
- A. Standard way
-
    RegularExpression re = new RegularExpression(regex);
    if (re.matches(text)) { ... }
- B. Capturing groups
-
    RegularExpression re = new RegularExpression(regex);
    Match match = new Match();
    if (re.matches(text, match)) {
        ... // You can refer to the captured text with the methods of the Match class.
    }
Case-insensitive matching
    RegularExpression re = new RegularExpression(regex, "i");
    if (re.matches(text)) { ... }
Options
You can specify options to RegularExpression(regex, options) or setPattern(regex, options).
The options parameter consists of the following characters.
"i"
- This option indicates case-insensitive matching.
"m"
- ^ and $ consider the EOL characters within the text.
"s"
- . matches any one character, including the EOL characters.
"u"
- Redefines \d \D \w \W \s \S \b \B \< \> according to Unicode.
"w"
- With this option, \b \B \< \> are processed by the method of 'Unicode Regular Expression Guidelines' Revision 4. When "w" and "u" are specified at the same time, \b \B \< \> are processed according to the "w" option.
","
- The parser treats a comma in a character class as a range separator. [a,b] matches a or , or b without this option. [a,b] matches a or b with this option.
"X"
-
With this option, the engine conforms to XML Schema: Regular Expression.
The matches() method does not do substring matching but entire-string matching.
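As a rough illustration of what the "i", "m", and "s" options mean, the sketch below uses the standard java.util.regex package as a stand-in; it does not call this class. Mapping "i", "m", and "s" to Pattern.CASE_INSENSITIVE, Pattern.MULTILINE, and Pattern.DOTALL is an assumption based on the descriptions above, and OptionFlagsDemo and its find helper are hypothetical names.

```java
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not org.apache.xerces.utils.regex.
final class OptionFlagsDemo {
    /** Returns true if regex, compiled with the given flags, matches somewhere in text. */
    static boolean find(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // Analog of "i": case-insensitive matching.
        System.out.println(find("hello", Pattern.CASE_INSENSITIVE, "HELLO"));
        // Analog of "m": ^ and $ also match at EOL characters inside the text.
        System.out.println(find("^b$", Pattern.MULTILINE, "a\nb\nc"));
        // Analog of "s": . also matches EOL characters.
        System.out.println(find("a.b", Pattern.DOTALL, "a\nb"));
    }
}
```

Passing 0 as the flags argument gives the default behavior, so the same three inputs can be compared with and without each option.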
Syntax
Differences from the Perl 5 regular expression
Meta characters are `. * + ? { [ ( ) | \ ^ $'.
- Character
- . (A period)
- Matches any one character except the following characters.
- LINE FEED (U+000A), CARRIAGE RETURN (U+000D), PARAGRAPH SEPARATOR (U+2029), LINE SEPARATOR (U+2028)
- This expression matches one code point in Unicode. It can match a pair of surrogates.
- When the "s" option is specified, it matches any character including the above four characters.
- \e \f \n \r \t
- Matches ESCAPE (U+001B), FORM FEED (U+000C), LINE FEED (U+000A), CARRIAGE RETURN (U+000D), HORIZONTAL TABULATION (U+0009)
- \cC
- Matches a control character.
The C must be one of '@', 'A'-'Z', '[', '\', ']', '^', '_'.
It matches the control character whose character code is 0x0040 less than the character code of C.
- For example, \cJ matches LINE FEED (U+000A), and \c[ matches ESCAPE (U+001B).
- a non-meta character
- Matches the character.
- \ + a meta character
- Matches the meta character.
- \xHH \x{HHHH}
- Matches the character whose Unicode code point is HH (hexadecimal). \xHH takes exactly 2 digits; \x{HHHH} takes a variable number of digits.
- \vHHHHHH
- Matches the character whose Unicode code point is HHHHHH (hexadecimal).
- \g
- Matches a grapheme.
- It is equivalent to (?[\p{ASSIGNED}]-[\p{M}\p{C}])?(?:\p{M}|[\x{094D}\x{09CD}\x{0A4D}\x{0ACD}\x{0B3D}\x{0BCD}\x{0C4D}\x{0CCD}\x{0D4D}\x{0E3A}\x{0F84}]\p{L}|[\x{1160}-\x{11A7}]|[\x{11A8}-\x{11FF}]|[\x{FF9E}\x{FF9F}])*
- \X
- Matches a combining character sequence. It is equivalent to (?:\PM\pM*)
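The \cC rule above is simple arithmetic: the matched code is the code of C minus 0x0040. A minimal sketch of that calculation follows; ControlEscapeDemo and controlCodePoint are hypothetical names used only for illustration and are not part of this class.

```java
// Hypothetical helper illustrating the \cC arithmetic; not part of the documented API.
final class ControlEscapeDemo {
    /** Returns the code point matched by \cC, that is, the code of c minus 0x40. */
    static int controlCodePoint(char c) {
        // '@' (0x40) through '_' (0x5F) covers '@', 'A'-'Z', '[', '\', ']', '^', '_'.
        if (c < '@' || c > '_') {
            throw new IllegalArgumentException("not a valid \\c character: " + c);
        }
        return c - 0x40;
    }

    public static void main(String[] args) {
        System.out.printf("\\cJ -> U+%04X%n", controlCodePoint('J')); // LINE FEED
        System.out.printf("\\c[ -> U+%04X%n", controlCodePoint('[')); // ESCAPE
    }
}
```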
- Character class
- [R1R2...Rn] (without "," option)
- [R1,R2,...,Rn] (with "," option)
- Positive character class. It matches a character in ranges.
- Rn:
- A character (including \e \f \n \r \t \xHH \x{HHHH} \vHHHHHH)
This range matches the character.
- C1-C2
This range matches a character whose code point is >= C1's code point and <= C2's code point.
- A POSIX character class: [:alpha:] [:alnum:] [:ascii:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:], and negative POSIX character classes in the Perl style, like [:^alpha:]
...
- \d \D \s \S \w \W \p{name} \P{name}
These expressions specify the same ranges as the corresponding expressions described below.
Enumerated ranges are merged (union operation). [a-ec-z] is equivalent to [a-z].
- [^R1R2...Rn] (without a "," option)
- [^R1,R2,...,Rn] (with a "," option)
- Negative character class. It matches a character not in ranges.
- (?[ranges]op[ranges]op[ranges] ... ) (op is - or + or &.)
- Subtraction or union or intersection for character classes.
- For example, (?[A-Z]-[CF]) is equivalent to [A-BD-EG-Z], and (?[0x00-0x7f]-[K]&[\p{Lu}]) is equivalent to [A-JL-Z].
- The result of these operations is a positive character class even if the expression includes negative character classes. Take care with this in case-insensitive matching. For instance, (?[^b]) is equivalent to [\x00-ac-\x{10ffff}], which is equivalent to [^b] in case-sensitive matching. In case-insensitive matching, however, (?[^b]) matches any character, because the class includes 'B' and 'B' matches 'b', whereas [^b] is processed as [^Bb].
- [R1R2...-[RnRn+1...]] (with an "X" option)
- Character class subtraction for the XML Schema. You can use this syntax when you specify an "X" option.
- \d
- Equivalent to [0-9].
- When a "u" option is set, it is equivalent to \p{Nd}.
- \D
- Equivalent to [^0-9]
- When a "u" option is set, it is equivalent to \P{Nd}.
- \s
- Equivalent to [ \f\n\r\t]
- When a "u" option is set, it is equivalent to [ \f\n\r\t\p{Z}].
- \S
- Equivalent to [^ \f\n\r\t]
- When a "u" option is set, it is equivalent to [^ \f\n\r\t\p{Z}].
- \w
- Equivalent to [a-zA-Z0-9_]
- When a "u" option is set, it is equivalent to [\p{Lu}\p{Ll}\p{Lo}\p{Nd}_].
- \W
- Equivalent to [^a-zA-Z0-9_]
- When a "u" option is set, it is equivalent to [^\p{Lu}\p{Ll}\p{Lo}\p{Nd}_].
- \p{name}
- Matches one character in the specified General Category (the second field in UnicodeData.txt) or the specified Block.
The following names are available:
- Unicode General Categories:
-
L, M, N, Z, C, P, S, Lu, Ll, Lt, Lm, Lo, Mn, Me, Mc, Nd, Nl, No, Zs, Zl, Zp,
Cc, Cf, Cn, Co, Cs, Pd, Ps, Pe, Pc, Po, Sm, Sc, Sk, So,
- (Currently the Cn category includes U+10000-U+10FFFF characters)
- Unicode Blocks:
- Basic Latin, Latin-1 Supplement, Latin Extended-A, Latin Extended-B, IPA Extensions, Spacing Modifier Letters, Combining Diacritical Marks, Greek, Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, Georgian, Hangul Jamo, Latin Extended Additional, Greek Extended, General Punctuation, Superscripts and Subscripts, Currency Symbols, Combining Marks for Symbols, Letterlike Symbols, Number Forms, Arrows, Mathematical Operators, Miscellaneous Technical, Control Pictures, Optical Character Recognition, Enclosed Alphanumerics, Box Drawing, Block Elements, Geometric Shapes, Miscellaneous Symbols, Dingbats, CJK Symbols and Punctuation, Hiragana, Katakana, Bopomofo, Hangul Compatibility Jamo, Kanbun, Enclosed CJK Letters and Months, CJK Compatibility, CJK Unified Ideographs, Hangul Syllables, High Surrogates, High Private Use Surrogates, Low Surrogates, Private Use, CJK Compatibility Ideographs, Alphabetic Presentation Forms, Arabic Presentation Forms-A, Combining Half Marks, CJK Compatibility Forms, Small Form Variants, Arabic Presentation Forms-B, Specials, Halfwidth and Fullwidth Forms
- Others:
- ALL (Equivalent to [\u0000-\v10FFFF])
- ASSIGNED (\p{ASSIGNED} is equivalent to \P{Cn})
- UNASSIGNED (\p{UNASSIGNED} is equivalent to \p{Cn})
- \P{name}
- Matches one character not in the specified General Category or the specified Block.
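The two equivalences given for the (?[ranges]op[ranges]...) syntax can be checked by modelling each character class as a set of code points. The sketch below does this with java.util.BitSet; it is an illustration of the set arithmetic only, not how this engine represents classes, and CharClassOpsDemo and all of its methods are hypothetical names. The operators are applied left to right, which is an assumption consistent with the second example.

```java
import java.util.BitSet;

// Hypothetical helper that models character classes as sets of code points.
final class CharClassOpsDemo {
    /** Code points from..to, inclusive. */
    static BitSet range(int from, int to) {
        BitSet set = new BitSet();
        set.set(from, to + 1);
        return set;
    }

    /** The characters listed in chars. */
    static BitSet of(String chars) {
        BitSet set = new BitSet();
        for (int i = 0; i < chars.length(); i++) {
            set.set(chars.charAt(i));
        }
        return set;
    }

    /** ASCII code points whose general category is Lu (enough for the example). */
    static BitSet asciiUppercase() {
        BitSet set = new BitSet();
        for (int cp = 0; cp <= 0x7f; cp++) {
            if (Character.getType(cp) == Character.UPPERCASE_LETTER) {
                set.set(cp);
            }
        }
        return set;
    }

    /** a - b */
    static BitSet subtract(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    /** a & b */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Renders the set as a positive character class with merged ranges. */
    static String render(BitSet set) {
        StringBuilder out = new StringBuilder("[");
        int start = set.nextSetBit(0);
        while (start >= 0) {
            int end = set.nextClearBit(start) - 1;
            out.appendCodePoint(start);
            if (end > start) {
                out.append('-').appendCodePoint(end);
            }
            start = set.nextSetBit(end + 1);
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        // (?[A-Z]-[CF])
        System.out.println(render(subtract(range('A', 'Z'), of("CF"))));
        // (?[0x00-0x7f]-[K]&[\p{Lu}])
        System.out.println(render(intersect(subtract(range(0x00, 0x7f), of("K")), asciiUppercase())));
    }
}
```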
- Selection and Quantifier
- X|Y
- ...
- X*
- Matches 0 or more X.
- X+
- Matches 1 or more X.
- X?
- Matches 0 or 1 X.
- X{number}
- Matches X exactly number times.
- X{min,}
- ...
- X{min,max}
- ...
- X*?
- X+?
- X??
- X{min,}?
- X{min,max}?
- Non-greedy matching.
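To show the difference between the greedy quantifiers and their non-greedy forms, here is a small sketch that uses java.util.regex as a stand-in, on the assumption that it treats these quantifiers the way the list above describes. It does not exercise this class, and GreedyDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not org.apache.xerces.utils.regex.
final class GreedyDemo {
    /** Returns the text of the first match of regex in text, or null if there is none. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy X* takes as much as it can; non-greedy X*? takes as little as it can.
        System.out.println(firstMatch("a.*b", "aXbXb"));
        System.out.println(firstMatch("a.*?b", "aXbXb"));
        // The same holds for X{min,} and X{min,}?.
        System.out.println(firstMatch("fo{2,}", "fooooo"));
        System.out.println(firstMatch("fo{2,}?", "fooooo"));
    }
}
```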
- Grouping, Capturing, and Back-reference
- (?:X)
- Grouping. "foo+" matches "foo" or "foooo". If you want to match "foofoo" or "foofoofoo", write "(?:foo)+".
- (X)
- Grouping with capturing.
It makes a group, and applications can find out where in the target text the group matched by using the methods of a Match instance after matches(String,Match). The 0th group means the whole of this regular expression. The Nth group is the inside of the Nth left parenthesis.
For instance, if the regular expression is " *([^<:]*) +<([^>]*)> *" and the target text is "From: TAMURA Kent <kent@trl.ibm.co.jp>":
Match.getCapturedText(0): " TAMURA Kent <kent@trl.ibm.co.jp>"
Match.getCapturedText(1): "TAMURA Kent"
Match.getCapturedText(2): "kent@trl.ibm.co.jp"
- \1 \2 \3 \4 \5 \6 \7 \8 \9
- (?>X)
- Independent expression group. ................
- (?options:X)
- (?options-options2:X)
- ............................
- The options and options2 consist of 'i', 'm', 's', and 'w'. Note that they cannot contain 'u'.
- (?options)
- (?options-options2)
- ......
- These expressions must be at the beginning of a group.
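The capturing example given for (X) can be reproduced with java.util.regex, used here as a stand-in on the assumption that it numbers groups the same way (group 0 is the whole match, group N is the Nth left parenthesis). The sketch does not call Match.getCapturedText; CaptureDemo and capture are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not org.apache.xerces.utils.regex.
final class CaptureDemo {
    /** Returns groups 0..groupCount of the first match, or null if nothing matches. */
    static String[] capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = capture(" *([^<:]*) +<([^>]*)> *", "From: TAMURA Kent <kent@trl.ibm.co.jp>");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": \"" + g[i] + "\"");
        }
        // (?:X) groups without capturing, so only group 0 exists.
        System.out.println(capture("(?:foo)+", "foofoo").length);
        // \1 refers back to the text captured by group 1.
        System.out.println(capture("(\\w)\\1", "hello")[0]);
    }
}
```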
- Anchor
- \A
- Matches the beginning of the text.
- \Z
- Matches the end of the text, or before an EOL character at the end of the text, or CARRIAGE RETURN + LINE FEED at the end of the text.
- \z
- Matches the end of the text.
- ^
- Matches the beginning of the text. It is equivalent to \A.
- When a "m" option is set, it matches the beginning of the text, or after one of the EOL characters (LINE FEED (U+000A), CARRIAGE RETURN (U+000D), LINE SEPARATOR (U+2028), PARAGRAPH SEPARATOR (U+2029)).
- $
- Matches the end of the text, or before an EOL character at the end of the text, or CARRIAGE RETURN + LINE FEED at the end of the text.
- When a "m" option is set, it matches the end of the text, or before an EOL character.
- \b
- Matches a word boundary. (See the "w" option.)
- \B
- Matches a non-word boundary. (See the "w" option.)
- \<
- Matches the beginning of a word. (See the "w" option.)
- \>
- Matches the end of a word. (See the "w" option.)
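The difference between \Z, \z, and $ is easiest to see on text that ends with an EOL character. The sketch below uses java.util.regex as a stand-in, on the assumption that its \A, \Z, \z, ^, and $ behave as described above for a trailing LINE FEED. It does not exercise this class, and \< and \> are left out because java.util.regex does not define them. AnchorDemo and find are hypothetical names.

```java
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not org.apache.xerces.utils.regex.
final class AnchorDemo {
    /** Returns true if regex matches somewhere in text. */
    static boolean find(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "abc\n";
        // \Z and $ accept an EOL character at the end of the text.
        System.out.println(find("abc\\Z", text));
        System.out.println(find("abc$", text));
        // \z only matches at the very end of the text.
        System.out.println(find("abc\\z", text));
        // \A only matches at the beginning of the text.
        System.out.println(find("\\Aabc", "xabc"));
    }
}
```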
- Lookahead and lookbehind
- (?=X)
- Lookahead.
- (?!X)
- Negative lookahead.
- (?<=X)
- Lookbehind.
- (Note for text capturing......)
- (?<!X)
- Negative lookbehind.
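Lookahead and lookbehind check the surrounding text without including it in the match. A minimal sketch follows, again with java.util.regex as a stand-in on the assumption that the four constructs carry the same meaning there. LookaroundDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not org.apache.xerces.utils.regex.
final class LookaroundDemo {
    /** Returns the text of the first match of regex in text, or null if there is none. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // (?=X): "foo" only when followed by "bar"; "bar" is not part of the match.
        System.out.println(firstMatch("foo(?=bar)", "foobar"));
        // (?!X): "foo" only when NOT followed by "bar".
        System.out.println(firstMatch("foo(?!bar)", "foobaz"));
        // (?<=X): digits only when preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $42"));
        // (?<!X): "b" only when NOT preceded by "a"; there is no such "b" here.
        System.out.println(firstMatch("(?<!a)b", "ab"));
    }
}
```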
- Misc.
- (?(condition)yes-pattern|no-pattern),
- (?(condition)yes-pattern)
- ......
- (?#comment)
- Comment. A comment string consists of characters except ')'. You cannot write comments in character classes or before quantifiers.
BNF for the regular expression
regex ::= ('(?' options ')')? term ('|' term)*
term ::= factor+
factor ::= anchors | atom (('*' | '+' | '?' | minmax ) '?'? )?
           | '(?#' [^)]* ')'
minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
atom ::= char | '.' | char-class | '(' regex ')' | '(?:' regex ')' | '\' [0-9]
         | '\w' | '\W' | '\d' | '\D' | '\s' | '\S' | category-block | '\X'
         | '(?>' regex ')' | '(?' options ':' regex ')'
         | '(?' ('(' [0-9] ')' | '(' anchors ')' | looks) term ('|' term)? ')'
options ::= [imsw]* ('-' [imsw]+)?
anchors ::= '^' | '$' | '\A' | '\Z' | '\z' | '\b' | '\B' | '\<' | '\>'
looks ::= '(?=' regex ')' | '(?!' regex ')'
          | '(?<=' regex ')' | '(?<!' regex ')'
char ::= '\\' | '\' [efnrtv] | '\c' [@-_] | code-point | character-1
category-block ::= '\' [pP] category-symbol-1
                   | ('\p{' | '\P{') (category-symbol | block-name | other-properties) '}'
category-symbol-1 ::= 'L' | 'M' | 'N' | 'Z' | 'C' | 'P' | 'S'
category-symbol ::= category-symbol-1 | 'Lu' | 'Ll' | 'Lt' | 'Lm' | 'Lo'
                    | 'Mn' | 'Me' | 'Mc' | 'Nd' | 'Nl' | 'No'
                    | 'Zs' | 'Zl' | 'Zp' | 'Cc' | 'Cf' | 'Cn' | 'Co' | 'Cs'
                    | 'Pd' | 'Ps' | 'Pe' | 'Pc' | 'Po'
                    | 'Sm' | 'Sc' | 'Sk' | 'So'
block-name ::= (See above)
other-properties ::= 'ALL' | 'ASSIGNED' | 'UNASSIGNED'
character-1 ::= (any character except meta-characters)
char-class ::= '[' ranges ']'
               | '(?[' ranges ']' ([-+&] '[' ranges ']')? ')'
ranges ::= '^'? (range ','?)+
range ::= '\d' | '\w' | '\s' | '\D' | '\W' | '\S' | category-block
          | range-char | range-char '-' range-char
range-char ::= '\[' | '\]' | '\\' | '\' [,-efnrtv] | code-point | character-2
code-point ::= '\x' hex-char hex-char
               | '\x{' hex-char+ '}'
               | '\v' hex-char hex-char hex-char hex-char hex-char hex-char
hex-char ::= [0-9a-fA-F]
character-2 ::= (any character except \[]-,)
TODO
- Unicode Regular Expression Guidelines
- 2.4 Canonical Equivalents
- Level 3
- Parsing performance
Nested Class Summary
(package private) static class | RegularExpression.Context |
Field Summary
(package private) static int | CARRIAGE_RETURN |
(package private) RegularExpression.Context | context |
(package private) static boolean | DEBUG |
(package private) static int | EXTENDED_COMMENT | "x"
(package private) RangeToken | firstChar |
(package private) java.lang.String | fixedString |
(package private) boolean | fixedStringOnly |
(package private) int | fixedStringOptions |
(package private) BMPattern | fixedStringTable |
(package private) boolean | hasBackReferences |
(package private) static int | IGNORE_CASE | "i"
(package private) static int | LINE_FEED |
(package private) static int | LINE_SEPARATOR |
(package private) int | minlength |
(package private) static int | MULTIPLE_LINES | "m"
(package private) int | nofparen | The number of parentheses in the regular expression.
(package private) int | numberOfClosures |
(package private) Op | operations |
(package private) int | options |
(package private) static int | PARAGRAPH_SEPARATOR |
(package private) static int | PROHIBIT_FIXED_STRING_OPTIMIZATION | "F"
(package private) static int | PROHIBIT_HEAD_CHARACTER_OPTIMIZATION | "H"
(package private) java.lang.String | regex | A regular expression.
(package private) static int | SINGLE_LINE | "s"
(package private) static int | SPECIAL_COMMA | ",".
(package private) Token | tokentree | Internal representation of the regular expression.
(package private) static int | UNICODE_WORD_BOUNDARY | An option.
(package private) static int | USE_UNICODE_CATEGORY | This option redefines \d \D \w \W \s \S.
(package private) static Token | wordchar |
private static int | WT_IGNORE |
private static int | WT_LETTER |
private static int | WT_OTHER |
(package private) static int | XMLSCHEMA_MODE | "X".
Constructor Summary
public | RegularExpression(java.lang.String regex) | Creates a new RegularExpression instance.
public | RegularExpression(java.lang.String regex, java.lang.String options) | Creates a new RegularExpression instance with options.
(package private) | RegularExpression(java.lang.String regex, Token tok, int parens, boolean hasBackReferences, int options) |
Method Summary
private void | compile(Token tok) | Compiles a token tree into an operation flow.
private Op | compile(Token tok, Op next, boolean reverse) | Converts a token to an operation.
boolean | equals(java.lang.Object obj) | Returns true if the patterns are the same and the options are equivalent.
(package private) boolean | equals(java.lang.String pattern, int options) |
int | getNumberOfGroups() | Returns the number of regular expression groups.
java.lang.String | getOptions() | Returns an option string.
java.lang.String | getPattern() |
private static int | getPreviousWordType(char[] target, int begin, int end, int offset, int opts) |
private static int | getPreviousWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts) |
private static int | getPreviousWordType(java.lang.String target, int begin, int end, int offset, int opts) |
private static int | getWordType(char[] target, int begin, int end, int offset, int opts) |
private static int | getWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts) |
private static int | getWordType(java.lang.String target, int begin, int end, int offset, int opts) |
private static int | getWordType0(char ch, int opts) |
int | hashCode() | Get a value that represents this Object, as uniquely as possible within the confines of an int.
private static boolean | isEOLChar(int ch) |
private static boolean | isSet(int options, int flag) |
private static boolean | isWordChar(int ch) |
private int | matchCharacterIterator(RegularExpression.Context con, Op op, int offset, int dx, int opts) |
private int | matchCharArray(RegularExpression.Context con, Op op, int offset, int dx, int opts) |
boolean | matches(char[] target) | Checks whether the target text contains this pattern or not.
boolean | matches(char[] target, int start, int end) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(char[] target, int start, int end, Match match) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(char[] target, Match match) | Checks whether the target text contains this pattern or not.
boolean | matches(java.text.CharacterIterator target) | Checks whether the target text contains this pattern or not.
boolean | matches(java.text.CharacterIterator target, Match match) | Checks whether the target text contains this pattern or not.
boolean | matches(java.lang.String target) | Checks whether the target text contains this pattern or not.
boolean | matches(java.lang.String target, int start, int end) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(java.lang.String target, int start, int end, Match match) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(java.lang.String target, Match match) | Checks whether the target text contains this pattern or not.
private static boolean | matchIgnoreCase(int chardata, int ch) |
private int | matchString(RegularExpression.Context con, Op op, int offset, int dx, int opts) |
(package private) void | prepare() | Prepares for matching.
private static boolean | regionMatches(char[] target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatches(char[] target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatches(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatches(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatches(java.lang.String text, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatches(java.lang.String text, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatchesIgnoreCase(char[] target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatchesIgnoreCase(char[] target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, java.lang.String part, int partlen) |
void | setPattern(java.lang.String newPattern) |
private void | setPattern(java.lang.String newPattern, int options) |
void | setPattern(java.lang.String newPattern, java.lang.String options) |
java.lang.String | toString() | Returns a string representation of this instance.
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
DEBUG
static final boolean DEBUG
- See Also:
- Constant Field Values
regex
java.lang.String regex
- A regular expression.
options
int options
nofparen
int nofparen
- The number of parentheses in the regular expression.
tokentree
Token tokentree
- Internal representation of the regular expression.
hasBackReferences
boolean hasBackReferences
minlength
transient int minlength
operations
transient Op operations
numberOfClosures
transient int numberOfClosures
context
transient RegularExpression.Context context
firstChar
transient RangeToken firstChar
fixedString
transient java.lang.String fixedString
fixedStringOptions
transient int fixedStringOptions
fixedStringTable
transient BMPattern fixedStringTable
fixedStringOnly
transient boolean fixedStringOnly
IGNORE_CASE
static final int IGNORE_CASE
- "i"
- See Also:
- Constant Field Values
SINGLE_LINE
static final int SINGLE_LINE
- "s"
- See Also:
- Constant Field Values
MULTIPLE_LINES
static final int MULTIPLE_LINES
- "m"
- See Also:
- Constant Field Values
EXTENDED_COMMENT
static final int EXTENDED_COMMENT
- "x"
- See Also:
- Constant Field Values
USE_UNICODE_CATEGORY
static final int USE_UNICODE_CATEGORY
- This option redefines \d \D \w \W \s \S.
- See Also:
- RegularExpression(java.lang.String,int), setPattern(java.lang.String,int), UNICODE_WORD_BOUNDARY, Constant Field Values
UNICODE_WORD_BOUNDARY
static final int UNICODE_WORD_BOUNDARY
- An option.
This enables locale-independent word boundary processing for \b \B \< \>.
By default, the engine considers a position between a word character (\w) and a non-word character to be a word boundary.
With this option, the engine checks word boundaries by the method of 'Unicode Regular Expression Guidelines' Revision 4.
- See Also:
- RegularExpression(java.lang.String,int), setPattern(java.lang.String,int), Constant Field Values
PROHIBIT_HEAD_CHARACTER_OPTIMIZATION
static final int PROHIBIT_HEAD_CHARACTER_OPTIMIZATION
- "H"
- See Also:
- Constant Field Values
PROHIBIT_FIXED_STRING_OPTIMIZATION
static final int PROHIBIT_FIXED_STRING_OPTIMIZATION
- "F"
- See Also:
- Constant Field Values
XMLSCHEMA_MODE
static final int XMLSCHEMA_MODE
- "X". XML Schema mode.
- See Also:
- Constant Field Values
SPECIAL_COMMA
static final int SPECIAL_COMMA
- ",".
- See Also:
- Constant Field Values
WT_IGNORE
private static final int WT_IGNORE
- See Also:
- Constant Field Values
WT_LETTER
private static final int WT_LETTER
- See Also:
- Constant Field Values
WT_OTHER
private static final int WT_OTHER
- See Also:
- Constant Field Values
wordchar
static transient Token wordchar
LINE_FEED
static final int LINE_FEED
- See Also:
- Constant Field Values
CARRIAGE_RETURN
static final int CARRIAGE_RETURN
- See Also:
- Constant Field Values
LINE_SEPARATOR
static final int LINE_SEPARATOR
- See Also:
- Constant Field Values
PARAGRAPH_SEPARATOR
static final int PARAGRAPH_SEPARATOR
- See Also:
- Constant Field Values
Constructor Detail |
RegularExpression
public RegularExpression(java.lang.String regex) throws ParseException
- Creates a new RegularExpression instance.
RegularExpression
public RegularExpression(java.lang.String regex, java.lang.String options) throws ParseException
- Creates a new RegularExpression instance with options.
RegularExpression
RegularExpression(java.lang.String regex, Token tok, int parens, boolean hasBackReferences, int options)
Method Detail |
compile
private void compile(Token tok)
- Compiles a token tree into an operation flow.
compile
private Op compile(Token tok, Op next, boolean reverse)
- Converts a token to an operation.
matches
public boolean matches(char[] target)
- Checks whether the target text contains this pattern or not.
matches
public boolean matches(char[] target, int start, int end)
- Checks whether the target text contains this pattern in the specified range or not.
matches
public boolean matches(char[] target, Match match)
- Checks whether the target text contains this pattern or not.
matches
public boolean matches(char[] target, int start, int end, Match match)
- Checks whether the target text contains this pattern in the specified range or not.
matchCharArray
private int matchCharArray(RegularExpression.Context con, Op op, int offset, int dx, int opts)
getPreviousWordType
private static final int getPreviousWordType(char[] target, int begin, int end, int offset, int opts)
getWordType
private static final int getWordType(char[] target, int begin, int end, int offset, int opts)
regionMatches
private static final boolean regionMatches(char[] target, int offset, int limit, java.lang.String part, int partlen)
regionMatches
private static final boolean regionMatches(char[] target, int offset, int limit, int offset2, int partlen)
regionMatchesIgnoreCase
private static final boolean regionMatchesIgnoreCase(char[] target, int offset, int limit, java.lang.String part, int partlen)
regionMatchesIgnoreCase
private static final boolean regionMatchesIgnoreCase(char[] target, int offset, int limit, int offset2, int partlen)
matches
public boolean matches(java.lang.String target)
- Checks whether the target text contains this pattern or not.
matches
public boolean matches(java.lang.String target, int start, int end)
- Checks whether the target text contains this pattern in the specified range or not.
matches
public boolean matches(java.lang.String target, Match match)
- Checks whether the target text contains this pattern or not.
matches
public boolean matches(java.lang.String target, int start, int end, Match match)
- Checks whether the target text contains this pattern in the specified range or not.
matchString
private int matchString(RegularExpression.Context con, Op op, int offset, int dx, int opts)
getPreviousWordType
private static final int getPreviousWordType(java.lang.String target, int begin, int end, int offset, int opts)
getWordType
private static final int getWordType(java.lang.String target, int begin, int end, int offset, int opts)
regionMatches
private static final boolean regionMatches(java.lang.String text, int offset, int limit, java.lang.String part, int partlen)
regionMatches
private static final boolean regionMatches(java.lang.String text, int offset, int limit, int offset2, int partlen)
regionMatchesIgnoreCase
private static final boolean regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, java.lang.String part, int partlen)
regionMatchesIgnoreCase
private static final boolean regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, int offset2, int partlen)
matches
public boolean matches(java.text.CharacterIterator target)
- Checks whether the target text contains this pattern or not.
matches
public boolean matches(java.text.CharacterIterator target, Match match)
- Checks whether the target text contains this pattern or not.
matchCharacterIterator
private int matchCharacterIterator(RegularExpression.Context con, Op op, int offset, int dx, int opts)
getPreviousWordType
private static final int getPreviousWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts)
getWordType
private static final int getWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts)
regionMatches
private static final boolean regionMatches(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen)
regionMatches
private static final boolean regionMatches(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen)
regionMatchesIgnoreCase
private static final boolean regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen)
regionMatchesIgnoreCase
private static final boolean regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen)
prepare
void prepare()
- Prepares for matching. This method is called just before starting matching.
isSet
private static final boolean isSet(int options, int flag)
setPattern
public void setPattern(java.lang.String newPattern) throws ParseException
setPattern
private void setPattern(java.lang.String newPattern, int options) throws ParseException
setPattern
public void setPattern(java.lang.String newPattern, java.lang.String options) throws ParseException
getPattern
public java.lang.String getPattern()
toString
public java.lang.String toString()
- Returns a string representation of this instance.
getOptions
public java.lang.String getOptions()
- Returns an option string.
The order of the letters in it may differ from the string specified in a constructor or setPattern().
equals
public boolean equals(java.lang.Object obj)
- Returns true if the patterns are the same and the options are equivalent.
equals
boolean equals(java.lang.String pattern, int options)
hashCode
public int hashCode()
- Description copied from class: java.lang.Object
- Get a value that represents this Object, as uniquely as possible within the confines of an int.
There are some requirements on this method which subclasses must follow:
- Semantic equality implies identical hashcodes. In other words, if a.equals(b) is true, then a.hashCode() == b.hashCode() must be as well. However, the reverse is not necessarily true, and two objects may have the same hashcode without being equal.
- It must be consistent. Whichever value o.hashCode() returns on the first invocation must be the value returned on all later invocations as long as the object exists. Notice, however, that the result of hashCode may change between separate executions of a Virtual Machine, because it is not invoked on the same object.
Notice that since hashCode is used in java.util.Hashtable and other hashing classes, a poor implementation will degrade the performance of hashing (so don't blindly implement it as returning a constant!). Also, if calculating the hash is time-consuming, a class may consider caching the results.
The default implementation returns System.identityHashCode(this)
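The equals and hashCode requirements above can be met together by deriving both from the same normalized state. The sketch below is a hypothetical value class, not this class's implementation: PatternKey, its fields, and the choice to sort the option letters (so that options given in a different order still count as equivalent) are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical value class; it does NOT reproduce the RegularExpression implementation.
final class PatternKey {
    private final String pattern;
    private final String options; // option letters, sorted so that "im" and "mi" compare equal

    PatternKey(String pattern, String options) {
        this.pattern = pattern;
        char[] letters = options.toCharArray();
        Arrays.sort(letters);
        this.options = new String(letters);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PatternKey)) {
            return false;
        }
        PatternKey other = (PatternKey) obj;
        return pattern.equals(other.pattern) && options.equals(other.options);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields equals() compares, so equal keys share a hash code.
        return 31 * pattern.hashCode() + options.hashCode();
    }
}
```

This sketch does not collapse repeated letters, so "ii" and "i" would compare as different; a fuller version would need to decide how to treat that case.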
getNumberOfGroups
public int getNumberOfGroups()
- Return the number of regular expression groups.
This method returns 1 when the regular expression has no capturing-parenthesis.
getWordType0
private static final int getWordType0(char ch, int opts)
isEOLChar
private static final boolean isEOLChar(int ch)
isWordChar
private static final boolean isWordChar(int ch)
matchIgnoreCase
private static final boolean matchIgnoreCase(int chardata, int ch)