RegExpr Help › Syntax
x* | Zero or more x's (greedy, take as many as possible) |
x*? | Zero or more x's (stingy, take as few as possible) |
x+ | One or more x's (greedy, take as many as possible) |
x+? | One or more x's (stingy, take as few as possible) |
x? | One or zero x (greedy, try one first) |
x?? | Zero or one x (stingy, try zero first) |
x{n} | Exactly n x's |
x{m,n} | At least m and at most n x's (greedy, take as many as possible) |
x{m,n}? | At least m and at most n x's (stingy, take as few as possible) |
x{,n} | At most n x's (greedy, take as many as possible) |
x{,n}? | At most n x's (stingy, take as few as possible) |
x{n,} | At least n x's (greedy, take as many as possible) |
x{n,}? | At least n x's (stingy, take as few as possible) |
m and n should be integers in the range 0..999.
See more about greedy and stingy matching.
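The difference between greedy and stingy quantifiers is easiest to see on a concrete string. The following is a minimal sketch that uses Python's `re` module as a stand-in, on the assumption that it treats these quantifiers the same way as RegExpr; it does not call RegExpr itself, and the variable names are illustrative only.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as many characters as possible, so the match
# runs from the first "<" to the last ">".
greedy = re.search(r"<.*>", text).group(0)

# Stingy: .*? takes as few characters as possible, so the match
# stops at the first ">".
stingy = re.search(r"<.*?>", text).group(0)

# Bounded repetition behaves the same way: x{m,n} vs. x{m,n}?
bounded_greedy = re.search(r"a{2,4}", "aaaaaa").group(0)
bounded_stingy = re.search(r"a{2,4}?", "aaaaaa").group(0)

print(greedy)          # <b>bold</b> and <i>italic</i>
print(stingy)          # <b>
print(bounded_greedy)  # aaaa
print(bounded_stingy)  # aa
```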
(expression)
Use parentheses to group things together for use with operators * + ? | and to remember matched patterns (see Subexpressions).
(?:expression)
Same as (expression), but doesn't create a backreference (matched pattern) like (expression) does.
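As an illustration of the two kinds of parentheses, here is a small sketch in Python's `re` module (a stand-in, not RegExpr itself; its grouping syntax is assumed to match). Both patterns match the same text, but only the capturing form remembers what `ab` matched.

```python
import re

# (expression) groups and remembers; (?:expression) only groups.
capturing = re.search(r"(ab)+(c)", "ababc")
non_capturing = re.search(r"(?:ab)+(c)", "ababc")

# Both match the whole string "ababc".
whole_capturing = capturing.group(0)
whole_non_capturing = non_capturing.group(0)

# Only the capturing version has a subexpression for "ab".
capturing_groups = capturing.groups()          # ('ab', 'c')
non_capturing_groups = non_capturing.groups()  # ('c',)

print(capturing_groups, non_capturing_groups)
```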
The vertical bar | is an or operator. Use parentheses (..|..) to group things together.
Jesse|Peter|Samuel | Any of Jesse, Peter, and Samuel. |
(0|1)+ | Any string of 0's and 1's. |
The leftmost alternative is tried first. If it doesn't match, the second one is tried and so on.
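The leftmost-first rule matters when one alternative is a prefix of another. The sketch below uses Python's `re` module as a stand-in, assuming it orders alternatives the same way as described above; the sample strings are made up for illustration.

```python
import re

# Any of the three names.
names = re.findall(r"Jesse|Peter|Samuel", "Peter met Samuel and Jesse")

# (0|1)+ accepts strings made only of 0's and 1's.
is_binary = re.fullmatch(r"(0|1)+", "010011") is not None
is_not_binary = re.fullmatch(r"(0|1)+", "0102") is not None

# The leftmost alternative is tried first, so the order of the
# alternatives changes the result when both could match.
short_first = re.search(r"a|ab", "ab").group(0)  # 'a'
long_first = re.search(r"ab|a", "ab").group(0)   # 'ab'

print(names, is_binary, is_not_binary, short_first, long_first)
```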
. | Any single character except a newline. Exception: When flag reSingleLine is set, accepts a newline as well. |
\n | A newline. Depending on the setting of Const NewLine, one of: CR LF pair (ASCII 13 & 10), the Windows-style line ending, which is the default; CR (ASCII 13) when Const NewLine = vbCr; LF (ASCII 10) when Const NewLine = vbLf. |
\a | Alarm bell (ASCII 7) |
\b | Backspace character (ASCII 8, inside [ ] only) |
\cX | Control character Ctrl+X. Allowed control characters: \c@ = ASCII 0, \cA .. \cZ = ASCII 1..26, \ca .. \cz = ASCII 1..26, \c[ = ASCII 27, \c\ = ASCII 28, \c] = ASCII 29, \c^ = ASCII 30, \c_ = ASCII 31, \c? = ASCII 127 |
\e | Escape character (ASCII 27) |
\f | A form feed (ASCII 12) |
\r | A return (ASCII 13) |
\t | A tab (ASCII 9) |
\d | Any digit. The same as [0-9] |
\D | Any non-digit. The same as [^0-9] |
\s | Any whitespace character: space, tab, return (CR), line feed (LF) or \n (newline). Note 1: When \n is the pair CR LF, \s matches the pair CR LF in preference to matching just CR. Note 2: \s doesn't match other whitespace characters such as form feed (\f) or vertical tab. |
\S | Any non-whitespace character. The same as [^\s] except \n is not special. |
\w | Any alphanumeric (word) character. The default is [a-zA-Z0-9_], that is one of: ASCII letter, digit or underscore.
Set #Const ExtendedCharacters = True in the (declarations) section of RegExpr.Bas for an extended set of word characters: [a-zA-Z0-9_¡-ÿ]. This adds Unicode range 00A1-00FF, which contains many national letters. |
\W | Any non-word character. The same as [^\w] |
\x## | Unicode (ASCII) ## in hexadecimal. 1 or 2 hex digits in range \x0 .. \xFF. Example: \xA matches ASCII 10 (LF) and \x40 matches @. Uppercase \xFF and lowercase \xff are both accepted. |
\0### | Unicode (ASCII) ### in octal. The first digit is always zero. 1 to 4 digits in range \0 .. \0377. Example: \012 matches ASCII 10 (LF) and \0100 matches @. |
\0 | The null character, octal 0 (ASCII 0) |
\ | Escape character to quote special characters. Because the characters + * ? . $ ^ | \ [ ] ( ) { } have a special meaning in regular expressions, you must precede them with a backslash \ to match the characters themselves. Example: \$ matches $, \( matches (, \\ matches \ etc. |
\Q | Quote. Disable special characters until \E. Example: \Q.+\E matches literal text .+ and not just any character sequence. |
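To show what quoting changes, here is a hedged sketch in Python's `re` module. Python is only a stand-in here: it supports backslash escapes but has no \Q...\E, so `re.escape` is used as the nearest equivalent and is an assumption of this example, not part of RegExpr.

```python
import re

# Backslash-escaped $ and . match the literal characters.
price = re.search(r"\$\d+\.\d\d", "Total: $12.50 due").group(0)

# Unquoted, .+ means "one or more of any character".
unquoted = re.search(r".+", "a.+b").group(0)

# Quoted, the same two characters match only the literal text ".+".
# re.escape plays the role that \Q.+\E plays in the table above.
quoted = re.search(re.escape(".+"), "a.+b").group(0)

print(price)     # $12.50
print(unquoted)  # a.+b
print(quoted)    # .+
```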
Zero-width assertions don't consume the text they match.
\A | Beginning of string |
\Z | Either 1) end of string or 2) before newline at end-of-string; in the latter case the newline remains unmatched |
\z | End of string |
\b | A word boundary, outside [] only. A word boundary (\b) is defined as a spot between two characters that has a \w on one side and a \W on the other side of it (in either order). Start and end of string count as a \W. |
\B | Not word boundary, outside [] only. |
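A short sketch of \b and \B, written with Python's `re` module as a stand-in (its word-boundary rules are assumed to agree with the definition above for ASCII text). The sample string and variable names are illustrative.

```python
import re

text = "cat concat catalog"

# \bcat\b: "cat" as a whole word only.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \bcat: "cat" at the start of a word ("cat" and "catalog").
word_start = [m.start() for m in re.finditer(r"\bcat", text)]

# \Bcat: "cat" NOT at the start of a word (the one inside "concat").
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_word)   # [0]
print(word_start)   # [0, 11]
print(inside_word)  # [7]
```

The assertions match at a position between characters, so the reported offsets are those of the text that follows the boundary.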
The assertions ^ and $ depend on the reMultiline flag. The default is:
^ | Beginning of string |
$ | Either 1) end of string or 2) before newline at end-of-string; in the latter case the newline remains unmatched |
If flag reMultiline is set:
^ | Beginning of string or line |
$ | End of string or line |
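The effect of the multiline setting can be sketched as follows. This uses Python's `re` module and its `re.MULTILINE` flag as a stand-in for reMultiline, on the assumption that the two behave alike for LF line endings; it does not call RegExpr.

```python
import re

text = "first\nsecond\n"

# Default: ^ matches only at the beginning of the string.
default_starts = re.findall(r"^\w+", text)

# Multiline: ^ also matches at the beginning of each line.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Default: $ matches before the newline at end-of-string ...
end_default = re.search(r"second$", text) is not None
# ... but not at the end of an earlier line,
first_default = re.search(r"first$", text) is not None
# unless the multiline flag is set.
first_multiline = re.search(r"first$", text, re.MULTILINE) is not None

print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']
print(end_default, first_default, first_multiline)  # True False True
```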
Lookaheads are zero-width assertions that ensure that what follows must or must not match a given regular expression. Lookaheads don't consume the input.
(?=expression)
What follows must match expression.
Example: \w+(?=\t) matches a word followed by a tab. The tab is not included in the match.
(?!expression)
What follows must not match expression.
Example: foo(?!bar) matches any occurrence of foo that isn't followed by bar.
Note that there are no lookbehinds. (?!foo)bar will not find an occurrence of bar that is preceded by something which is not foo. That's because the (?!foo) is just saying that the next thing can't be foo. And it's not, it's bar, so foobar will match.
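The lookahead examples above, including the (?!foo)bar pitfall, can be tried out with this sketch. It uses Python's `re` module as a stand-in, assuming its lookahead syntax matches the one described here; the sample strings are made up.

```python
import re

# Positive lookahead: the tab must follow but is not part of the match.
word = re.search(r"\w+(?=\t)", "name\tvalue").group(0)

# Negative lookahead: foo not followed by bar.
sample = "foobar foobaz foo"
foo_positions = [m.start() for m in re.finditer(r"foo(?!bar)", sample)]

# Pitfall: (?!foo)bar is not a lookbehind. At the position where
# "bar" starts, the next characters are "bar", not "foo", so the
# assertion succeeds and "bar" inside "foobar" still matches.
pitfall = re.search(r"(?!foo)bar", "foobar")

print(repr(word))       # 'name'
print(foo_positions)    # [7, 14]
print(pitfall.start())  # 3
```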
\1 | Match the first subexpression. Expression (\w+) \1 matches any word followed by a space and the same word again. |
\2...\9 | Match the second etc. subexpression. At most 9 subexpressions can be matched like this. |
Backreferences work only after the subexpression has been matched. That is, a backreference must be located after the corresponding (expression).
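As a sketch of the (\w+) \1 example, the following finds a doubled word. It is written with Python's `re` module as a stand-in, assuming backreferences behave the same way; the sample sentence is illustrative.

```python
import re

# (\w+) remembers a word; \1 then requires the same text again.
doubled = re.search(r"(\w+) \1", "Paris in the the spring")

matched_text = doubled.group(0)   # the whole match
repeated_word = doubled.group(1)  # what subexpression 1 remembered

# No match when no word is immediately repeated.
no_repeat = re.search(r"(\w+) \1", "Paris in the spring")

print(matched_text)   # the the
print(repeated_word)  # the
print(no_repeat)      # None
```

Because \w+ can start in the middle of a word, an expression such as \b(\w+) \1\b is a safer choice when only whole repeated words should match.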
Square brackets are used to match any one of the characters inside them.
[abc] | Match any of a, b or c. |
[a-c] | Match any character between a and c, that is, [abc]. The characters are in Unicode (ASCII) order. |
[^abc] | A caret at the beginning means "not". Match anything not in the group: any character other than a, b or c. |
[^d-z] | Match any character other than [d-z]. |
[abc]+ | A quantifier after square brackets repeats the whole set, and each repetition can match a different character from the set. So, [abc]+ matches any combination of a, b and c one or more times. |
You can use most special expressions inside [ ]. The expressions + * ? . $ \B | ( ) \1 don't have any special meaning inside [ ]. As an example, [.+] matches a dot or a plus sign and [()] matches an opening or a closing parenthesis. In addition, \b is ASCII 8, not a word boundary.
The characters ^ - ] are special and need to be escaped inside square brackets. You can also use the following techniques:
^ | Put it anywhere except at the beginning, or escape it: [a^c], [\^ac] will match a, c or ^ |
- | Put at the beginning or end or escape it: [ac-], [-ac], [a\-c] will match a, c or - |
] | Put at the beginning or escape it: []ac], [ac\]] will match a, c or ] |
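The placement and escaping techniques for ^ - ] can be checked with the sketch below. It uses Python's `re` module as a stand-in, assuming it accepts the same bracket forms; the CR LF behavior of \n in RegExpr is specific to RegExpr and is not reproduced here.

```python
import re

# Most special characters are literal inside [ ].
dot_or_plus = re.findall(r"[.+]", "1+2.5")

# ^ is literal when it is not first, or when escaped.
caret_placed = re.findall(r"[a^c]", "a^b^c")
caret_escaped = re.findall(r"[\^ac]", "a^b^c")

# - is literal at the beginning or end, or when escaped.
dash_placed = re.findall(r"[ac-]", "a-b-c")
dash_escaped = re.findall(r"[a\-c]", "a-b-c")

# ] is literal at the beginning, or when escaped.
bracket_placed = re.findall(r"[]ac]", "a]b]c")
bracket_escaped = re.findall(r"[ac\]]", "a]b]c")

print(dot_or_plus)     # ['+', '.']
print(caret_placed)    # ['a', '^', '^', 'c']
print(dash_placed)     # ['a', '-', '-', 'c']
print(bracket_placed)  # ['a', ']', ']', 'c']
```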
\n behaves exceptionally in square brackets when the newline sequence is set to CR LF, which is the default. [\n] matches the pair CR LF in one go. On the other hand, [^\n] matches anything but the pair CR LF, which means it matches just the LF of this pair. The use of [^\n] can thus cause unexpected results. [\n-..] and [..-\n] are not supported when \n is CR LF.
[\s] is exceptional the same way as [\n]: it also matches the pair CR LF in one go. On the other hand, [^\s] is not exceptional. It will not match either CR or LF whether alone or as a pair.
Ranges defined with multi-character classes \s\d\w or \S\D\W are discouraged. They can cause unexpected results. As an example [\d-A] expands to [0-9-A], which matches the digits 0-9, the dash - and the letter A.
(?#Text)
A comment that is ignored.
Left to right. Regular expressions always take the first string that matches, starting from the left.
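A brief sketch of the left-to-right rule, using Python's `re` module as a stand-in (assumed to search the same way): the earliest match wins even when a longer match exists further to the right.

```python
import re

text = "ab 12 cd 34567"

# \d+ could match "34567", but "12" starts further to the left,
# so "12" is the match that is returned.
first = re.search(r"\d+", text)

first_text = first.group(0)
first_position = first.start()

print(first_text, first_position)  # 12 3
```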
RegExpr uses Unicode strings. Character values and ranges are expressed in Unicode. VB/VBA functions ChrW() and AscW() are compatible with RegExpr. Functions Chr() and Asc() are not entirely compatible. They are compatible in the ASCII 0..127 range, however.
©Aivosto Oy · RegExpr Help