x* | Zero or more x's (greedy, takes as many as possible) |
x*? | Zero or more x's (stingy, takes as few as possible) |
x+ | One or more x's (greedy, takes as many as possible) |
x+? | One or more x's (stingy, takes as few as possible) |
x? | Zero or one x (greedy, tries one first) |
x?? | Zero or one x (stingy, tries zero first) |
x{n} | Exactly n x's |
x{m,n} | At least m and at most n x's (greedy, takes as many as possible) |
x{m,n}? | At least m and at most n x's (stingy, takes as few as possible) |
x{n,} | At least n x's (greedy, takes as many as possible) |
x{n,}? | At least n x's (stingy, takes as few as possible) |
See more about greedy and stingy matching.
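To make the greedy/stingy distinction concrete, here is a minimal sketch using Python's re module as a stand-in for RegExpr. Python accepts the same quantifier syntax listed above, but this is an illustration of the general behavior, not RegExpr itself; the sample strings are arbitrary.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

text = "<b>bold</b>"

# Greedy .* takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.*>", text).group()

# Stingy .*? takes as little as possible, stopping at the first ">".
stingy = re.search(r"<.*?>", text).group()

# {2,4} takes four digits when it can; {2,4}? settles for two.
bounded_greedy = re.match(r"\d{2,4}", "123456").group()
bounded_stingy = re.match(r"\d{2,4}?", "123456").group()

# x? tries one x first; x?? tries zero first.
optional_greedy = re.match(r"ab?", "abc").group()
optional_stingy = re.match(r"ab??", "abc").group()

print(greedy, stingy, bounded_greedy, bounded_stingy, optional_greedy, optional_stingy)
# prints: <b>bold</b> <b> 1234 12 ab a
```

Both forms find a match at the same starting position; they differ only in how much text the quantifier consumes.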
(expression)
Use parentheses to group things together for use with the operators * + ? and |, and to remember matched patterns (see Subexpressions).
(?:expression)
Groups like (expression), but does not remember the matched pattern, so no backreference is created.
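The difference between a capturing and a non-capturing group can be sketched with Python's re module, which supports both (expression) and (?:expression). This is an illustration of the concept under that assumption, not RegExpr's own API.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# (ab) is a capturing group: it is numbered and its last match is remembered.
captured = re.match(r"(ab)+(\d+)", "abab42").groups()

# (?:ab) still groups for the + operator but is not numbered,
# so the digits become group 1.
uncaptured = re.match(r"(?:ab)+(\d+)", "abab42").groups()

print(captured)    # ('ab', '42')
print(uncaptured)  # ('42',)
```

Note that a repeated capturing group remembers only its last repetition, which is why group 1 is "ab" rather than "abab".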
. | Any single character except a newline. Exception: When reSingleLine is set, accepts a newline as well. |
\n | A newline. Depending on the setting of Const NewLine, one of: ASCII 13 followed by ASCII 10 (the default), ASCII 13, or ASCII 10 |
\r | A return (ASCII 13) |
\f | A line feed (ASCII 10) |
\t | A tab (ASCII 9) |
\a | Alarm bell (ASCII 7) |
\b | Backspace character (ASCII 8, inside [ ] only) |
\e | Escape character (ASCII 27) |
\cA | Control character. Examples: \cA = Ctrl-A (ASCII 1), \cZ = Ctrl-Z (ASCII 26) |
\w | Any alphanumeric (word) character. By default, the same as [a-zA-Z0-9_¡-ÿ], that is, the letters a-z and A-Z, the digits 0-9, the underscore, and the Unicode characters 00A1-00FF. Note that the range 00A1-00FF includes some punctuation characters as well as extended Latin letters. If you want the Perl default [a-zA-Z0-9_] without Unicode 00A1-00FF, set #Const ExtendedCharacters = False in the (declarations) section of RegExpr.Bas. |
\W | Any non-word character. The same as [^\w] |
\d | Any digit. The same as [0-9] |
\D | Any non-digit. The same as [^0-9] |
\s | Any whitespace character: space, tab, form feed, return, or newline. |
\S | Any non-whitespace character |
\x## | The character whose Unicode (ASCII) code is ## in hexadecimal. Example: \x40 matches @ |
\0### | The character whose Unicode (ASCII) code is ### in octal. Example: \0100 matches @, \0 matches the null character (ASCII 0) |
\ | Escape character, used to match special characters. Because the characters + * ? . $ ^ | \ [ ] ( ) { } have a special meaning in regular expressions, you must precede them with a backslash \ to match them literally. Examples: \$ matches $, \( matches (, \\ matches \ etc. |
\Q | Quote. Disable special characters until \E. Example: \Q*.*\E matches "*.*" but nothing else. |
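Several of the escapes above can be tried out in Python's re module, used here only as a stand-in for RegExpr. Two assumptions to flag: Python's re has no \Q...\E, so re.escape() is shown as the closest Python equivalent, and RegExpr's default \w is written out as an explicit character class because Python's own \w follows different (full Unicode) rules.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# Backslash escapes make special characters literal: \$ and \. here.
price = re.search(r"\$\d+\.\d\d", "Total: $19.99 (incl. tax)").group()

# \x40 is the character with hexadecimal code 40, i.e. "@".
at_index = re.search(r"\x40", "user@example.com").start()

# RegExpr's default \w, written out as an explicit class with \u escapes.
extended_words = re.findall(r"[a-zA-Z0-9_\u00A1-\u00FF]+", "café, naïve!")

# Perl-style ASCII-only \w (re.ASCII restricts \w to [a-zA-Z0-9_]).
ascii_words = re.findall(r"\w+", "café, naïve!", re.ASCII)

# Python's re has no \Q...\E; re.escape() instead backslash-escapes
# every special character in a literal string.
quoted = re.escape("*.*")
matches_literal = re.fullmatch(quoted, "*.*") is not None
matches_other = re.fullmatch(quoted, "a.b") is not None

print(price)           # $19.99
print(at_index)        # 4
print(extended_words)  # ['café', 'naïve']
print(ascii_words)     # ['caf', 'na', 've']
print(matches_literal, matches_other)  # True False
```

The two \w variants show why the ExtendedCharacters setting matters: with the ASCII-only class, accented letters split words apart.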
Zero-width assertions don't consume the text they match.
\A | Beginning of string |
\Z | End of string, or before newline at end-of-string (newline at end-of-string remains unmatched) |
\b | A word boundary, outside [] only. A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order). Start and end of string count as a \W. |
\b matches any lower-case word. | |
\B | No word boundary, outside [] only. |
matches any word beginning The or the that is at least 4 characters long. |
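Here is a small sketch of \b and \B using Python's re module as a stand-in for RegExpr, with re.ASCII so that \w (and therefore the word boundaries) uses the Perl-style class [a-zA-Z0-9_]. The patterns and sample sentence are illustrative choices, not taken from RegExpr's documentation.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

text = "The cat met Thelma at the theatre"

# re.ASCII makes \w (and so \b, \B) use the Perl-style [a-zA-Z0-9_].
# \b[a-z]+\b: whole words made only of lower-case letters.
lower_words = re.findall(r"\b[a-z]+\b", text, re.ASCII)

# \B after "[Tt]he" demands another word character, so the bare words
# "The" and "the" are skipped; \w* then takes the rest of the word.
the_words = re.findall(r"\b[Tt]he\B\w*", text, re.ASCII)

print(lower_words)  # ['cat', 'met', 'at', 'the', 'theatre']
print(the_words)    # ['Thelma', 'theatre']
```

Because \B requires a word character on both sides, it rules out "The" and "the" as whole words and so enforces the four-character minimum.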
The assertions ^ and $ depend on the reMultiline flag. The default is:
^ | Beginning of string |
$ | End of string, or before newline at end-of-string (newline at end-of-string remains unmatched) |
If flag reMultiline is set:
^ | Beginning of string or line |
$ | End of string or line |
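The effect of the multiline flag can be sketched in Python, assuming re.MULTILINE as a rough counterpart of RegExpr's reMultiline. Python's default ^ and $ behave as described above. One caveat: Python's \Z means only the very end of the string, unlike the \Z described earlier, so the sketch uses only ^ and $.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

text = "first line\nsecond line\n"

# Default: ^ only at the start of the string; $ at the end or just
# before a final newline.
default_starts = re.findall(r"^\w+", text)
default_ends = re.findall(r"\w+$", text)

# re.MULTILINE plays the role of reMultiline here:
# ^ and $ also match at the start and end of every line.
multi_starts = re.findall(r"^\w+", text, re.MULTILINE)
multi_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(default_starts, default_ends)  # ['first'] ['line']
print(multi_starts, multi_ends)      # ['first', 'second'] ['line', 'line']
```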
Lookaheads are zero-width assertions that ensure that what follows must or must not match a given regular expression. Lookaheads don't consume the input.
(?=expression)
What follows must match expression.
Example: /\w+(?=\t)/ matches a word followed by a tab.
(?!expression)
What follows must not match expression.
Example: /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".
Note that RegExpr has no lookbehinds. /(?!foo)bar/ will not find an occurrence of "bar" that is preceded by something other than "foo". The (?!foo) only says that the text starting at that point cannot be "foo", and it isn't: it is "bar". So "foobar" still matches.
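The lookahead examples above can be checked with Python's re module, used here as a stand-in. Python does support lookbehinds, but since RegExpr does not, the sketch sticks to (?=...) and (?!...) and reproduces the "(?!foo)bar" pitfall.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# Positive lookahead: words followed by a tab; the tab is not consumed.
before_tab = re.findall(r"\w+(?=\t)", "name\tvalue\tend")

# Negative lookahead: "foo" not followed by "bar" (only the second one).
foo_positions = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo) checks only the text starting at "bar", so it cannot reject
# a preceding "foo": "foobar" still matches.
still_matches = re.search(r"(?!foo)bar", "foobar") is not None

print(before_tab)     # ['name', 'value']
print(foo_positions)  # [7]
print(still_matches)  # True
```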
\1 | Match the first subexpression. For example, "(\w+) \1" matches a word that is repeated with a space between the two copies, such as "the the". |
\2...\9 | Match the second through ninth subexpressions. At most 9 subexpressions can be referenced this way. |
Backreferences work only after the subexpression has been matched; that is, a backreference must be located after the corresponding ( ).
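Backreferences behave the same way in Python's re module, which this sketch uses as a stand-in for RegExpr. The palindrome pattern is an illustrative addition showing \2 alongside \1.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# (\w+) captures a word; \1 must match exactly the same text again.
m = re.search(r"(\w+) \1", "Paris in the the spring")
repeated_word = m.group(1)
repeated_span = m.group()

# \2 refers to the second group; here it finds a four-letter palindrome.
palindrome = re.search(r"(\w)(\w)\2\1", "xabbay").group()

print(repeated_word)  # the
print(repeated_span)  # the the
print(palindrome)     # abba
```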
Square brackets are used to match any one of the characters inside them. [abc] matches any of a, b, or c.
You can also use + ? * after square brackets:
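[abc]+ matches any nonempty sequence of a, b, and c, such as "cab" or "aaa".
A hyphen indicates a range ("between") in character-code order: [a-c] matches a, b, or c.
Don't use \n in ranges: by default it stands for two characters (ASCII 13 and ASCII 10). Use \r and \f instead.
A caret at the beginning means "not": [^d-z] matches any single character except d through z.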
You can also use most special expressions inside []. However, the expressions + * ? . $ \B | ( ) \1 don't have any special meaning inside [ ]. In addition, \b is ASCII 8, not a word boundary.
If you want to match ], ^ or - inside square brackets, use the escape character: \], \^ or \-.
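The bracket rules above can be sketched with Python's re module as a stand-in for RegExpr; the sample strings are arbitrary.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# [abc]+ : one or more characters, each of which is a, b, or c.
runs = re.findall(r"[abc]+", "aabxcab cz")

# [^d-z] : any single character that is NOT in the range d..z.
not_d_to_z = re.findall(r"[^d-z]", "dog cab")

# Escaped ], ^ and - are matched literally inside brackets.
specials = re.findall(r"[\]\^\-]", "a-b^c]d")

print(runs)        # ['aab', 'cab', 'c']
print(not_d_to_z)  # [' ', 'c', 'a', 'b']
print(specials)    # ['-', '^', ']']
```

Note that the negated class also matches the space, since it accepts every character outside d..z.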
A vertical bar | represents an "or" operator. Parentheses (...) can be used to group things together:
Jesse|Peter|Samuel | Any of Jesse, Peter, or Samuel. |
(0|1)+ | Any nonempty string of 0's and 1's. |
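Both alternation examples from the table can be run in Python's re module, used here as a stand-in for RegExpr.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# Each occurrence of any of the three names, in order of appearance.
names = re.findall(r"Jesse|Peter|Samuel", "Peter met Samuel and Jesse.")

# (0|1)+ must cover the whole string, so any other digit rejects it.
is_binary = re.fullmatch(r"(0|1)+", "101100") is not None
is_not_binary = re.fullmatch(r"(0|1)+", "10201") is not None

print(names)                     # ['Peter', 'Samuel', 'Jesse']
print(is_binary, is_not_binary)  # True False
```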
(?#Text)
A comment that is ignored.
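Python's re module also accepts (?#...) comments, so it can serve as a stand-in to show that a comment does not change what a pattern matches. The date pattern is an illustrative choice.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

text = "2023-07 and 1999-12"

# The (?#...) parts are skipped by the matcher entirely.
with_comment = re.findall(r"\d{4}(?#year)-\d\d(?#month)", text)
without_comment = re.findall(r"\d{4}-\d\d", text)

print(with_comment)                     # ['2023-07', '1999-12']
print(with_comment == without_comment)  # True
```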
Left to right.
Regular expressions always take the first string that matches, starting from the left. Among alternatives joined with |, the leftmost one is tried first.
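The left-to-right rule can be sketched with Python's re module, which, like the behavior described here, picks the earliest match position first and then tries alternatives in order. It is a stand-in for illustration, not RegExpr itself.

```python
# Illustration with Python's re module (a stand-in, not RegExpr itself).
import re

# Leftmost position wins: "dog" starts at index 3, before "cat" at index 7.
earliest = re.search(r"cat|dog", "hotdog catalog").group()

# At the same position, the leftmost alternative is tried first,
# so "a" wins over the longer "ab"...
short_first = re.match(r"a|ab", "abc").group()

# ...and reordering the alternatives changes the result.
long_first = re.match(r"ab|a", "abc").group()

print(earliest, short_first, long_first)  # dog a ab
```

Because the first successful alternative wins, put longer alternatives before their prefixes when you want the longer match.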
RegExpr uses Unicode strings. Character values and ranges are expressed in Unicode. In VB, the functions ChrW and AscW are compatible with RegExpr, while Chr and Asc are not.