Each character matches itself, unless it is one of the special characters +?.*^$()[{|\. The special meaning of these characters can be escaped using a backslash (\).
The multiline and single-line modes are discussed in Chapter 23.
.
Matches any character, but not a newline. In single-line mode, matches newlines as well.
( . . . )
Groups a series of pattern elements into a single element. The text the group matches is captured for later use. It is also assigned immediately to $^N to be used during the match, e.g., in a (?{ ... }).
^
Matches the beginning of the target. In multiline mode, also matches after every newline character.
$
Matches the end of the line, or before a final newline character. In multiline mode, also matches before every newline character.
[ . . . ]
Denotes a class of characters to match. [^ . . . ] negates the class.
... | ... | ...
Matches the alternatives from left to right, until one succeeds.
(?#text)
Comment.
(?[modifier]:pattern)
Acts like (pattern) but does not capture the text it matches. modifier can be one or more of i, m, s, or x. Modifiers can be switched off by preceding the letter(s) with a minus sign, e.g., si-xm. See page 37 for the meaning of the modifiers.
(?=pattern)
Zero-width positive look-ahead assertion.
(?!pattern)
Zero-width negative look-ahead assertion.
(?<=pattern)
Zero-width positive look-behind assertion.
(?<!pattern)
Zero-width negative look-behind assertion.
(?{ code })
Executes Perl code while matching. Always succeeds with zero width. Can be used as the condition in a conditional pattern selection. Otherwise, the result of executing code is stored in $^R.
(??{ code })
Executes Perl code while matching. Interprets the result as a pattern.
(?>pattern)
Like (?:pattern), but prevents backtracking inside.
(?(cond)ptrue[|pfalse])
Selects a pattern depending on the condition. cond should be the number of a parenthesized subpattern, or one of the zero-width look-ahead, look-behind, and evaluate assertions.
(?modifier)
Embedded pattern-match modifier. modifier can be one or more of i, m, s, or x. Modifiers can be switched off by preceding the letter(s) with a minus sign, e.g., (?si-xm).
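A few of the constructs above can be tried out in a short script. This is only an illustrative sketch: the sample text and the variable names ($text, $amount, @eur, @after_colon, @starts, $ci) are hypothetical and not part of the reference.

```perl
use strict;
use warnings;

# Hypothetical sample text, used only for illustration.
my $text = "price: 100 USD, 250 EUR\ncost: 75 USD";

# (?:...) groups the alternatives without capturing,
# so the only capture group is the number.
my ($amount) = $text =~ /(?:price|cost): (\d+)/;

# (?=...) look-ahead: numbers that are followed by " EUR".
my @eur = $text =~ /(\d+)(?= EUR)/g;

# (?<=...) look-behind: numbers that are preceded by ": ".
my @after_colon = $text =~ /(?<=: )(\d+)/g;

# With the m modifier, ^ also matches after every newline.
my @starts = $text =~ /^(\w+)/mg;

# (?i) switches on case-insensitive matching inside the pattern.
my $ci = "Hello" =~ /(?i)hello/ ? 1 : 0;

print "amount=$amount eur=@eur after_colon=@after_colon starts=@starts ci=$ci\n";
```

The look-around assertions are zero-width, so the text they test (" EUR" or ": ") is not part of what the capture group returns.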
Quantified subpatterns match as many times as possible. When followed by a ?, they match the minimum number of times. These are the quantifiers:
| + | Matches the preceding pattern element one or more times. |
| ? | Matches zero or one times. |
| * | Matches zero or more times. |
| {n,m} | Denotes the minimum n and maximum m match count. |
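The difference between maximal and minimal matching is easiest to see side by side. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $html = "<b>bold</b> and <i>italic</i>";

# .+ is greedy: it runs to the last > in the string.
my ($greedy)  = $html =~ /<(.+)>/;

# .+? is minimal: it stops at the first > it can.
my ($minimal) = $html =~ /<(.+?)>/;

# {2,3} takes as many as allowed; {2,3}? takes as few as allowed.
my ($most)  = "aaaaa" =~ /(a{2,3})/;
my ($least) = "aaaaa" =~ /(a{2,3}?)/;

print "greedy=[$greedy]\nminimal=[$minimal]\nmost=[$most] least=[$least]\n";
```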
Patterns are processed as double-quoted strings, so standard string escapes have their usual meaning (see Chapter 6). An exception is \b, which matches word boundaries, except in a character class, where it denotes a backspace again.
A \ escapes any special meaning of nonalphanumeric characters, but it turns most alphanumeric characters into something special:
| \1 ... \9 | Refer to matched subexpressions, grouped with ( ). |
| \w | Matches alphanumeric plus _. |
| \s | Matches whitespace. |
| \d | Matches numeric. |
| \A | Matches the beginning of the string. |
| \Z | Matches the end of the string or before a newline at the end of the string. |
| \z | Matches the physical end of the string. |
| \b | Matches word boundaries. |
| \G | Matches where the previous search with a g modifier left off. |
| \p{prop} | Matches a named property. |
| \X | Matches extended Unicode combining character sequence. |
| \C | Matches a single 8-bit byte. |
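The escapes in the table can be combined as in the following sketch. The strings, the tiny tokenizer loop, and all variable names are hypothetical and serve only to illustrate \1, \b, \Z, \z, and \G.

```perl
use strict;
use warnings;

# \1 refers back to the first group; \b keeps the match on word boundaries.
my $line = "the the quick brown fox";
my ($dup) = $line =~ /\b(\w+)\s+\1\b/;

# \Z tolerates a trailing newline, \z does not.
my $loose  = "abc\n" =~ /c\Z/ ? 1 : 0;
my $strict = "abc\n" =~ /c\z/ ? 1 : 0;

# \G continues where the previous match with the g modifier left off.
# The c modifier keeps the position when an alternative fails.
my $input = "12+34";
my @tokens;
while (1) {
    if    ($input =~ /\G(\d+)/gc)   { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([+*-])/gc) { push @tokens, "OP:$1" }
    else                            { last }
}

print "dup=$dup loose=$loose strict=$strict tokens=@tokens\n";
```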
POSIX classes are used inside character classes, like [[:alpha:]]. These are the POSIX classes and their Unicode property names:
[:alpha:] \p{IsAlpha}
Matches one alphabetic character.
[:alnum:] \p{IsAlnum}
Matches one alphanumeric character.
[:ascii:] \p{IsASCII}
Matches one ASCII character.
[:blank:] \p{IsBlank}
Matches one horizontal whitespace character, such as a space or tab.
[:cntrl:] \p{IsCntrl}
Matches one control character.
[:digit:] \p{IsDigit}
Matches one numeric character, like \d.
[:graph:] \p{IsGraph}
Matches one alphanumeric or punctuation character.
[:lower:] \p{IsLower}
Matches one lowercase character.
[:print:] \p{IsPrint}
Matches one alphanumeric or punctuation character or space character.
[:punct:] \p{IsPunct}
Matches one punctuation character.
[:space:] \p{IsSpace}
Matches one whitespace character, almost like \s.
[:upper:] \p{IsUpper}
Matches one uppercase character.
[:word:] \p{IsWord}
Matches one word character, like \w.
[:xdigit:] \p{IsXDigit}
Matches one hexadecimal digit.
In general, the "Is" prefix may be omitted for property names.
The equivalent of \s is \p{IsSpacePerl}.
The POSIX classes can be negated with a ^, e.g., [:^print:]; the named properties can be negated by using \P, e.g., \P{IsPrint}.
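As a rough illustration of the POSIX classes, the Unicode properties, and their negated forms, consider the sketch below. The sample string and variable names are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $s = "Hello, World 42!";

# A POSIX class must appear inside a bracketed character class.
my @upper_posix = $s =~ /([[:upper:]])/g;

# The corresponding Unicode property form.
my @upper_prop = $s =~ /(\p{IsUpper})/g;

# Remove punctuation from a copy of the string.
(my $no_punct = $s) =~ s/[[:punct:]]//g;

# Negated forms: [:^digit:] and \P{IsDigit} both match non-digits.
(my $digits_posix = $s) =~ s/[[:^digit:]]//g;
(my $digits_prop  = $s) =~ s/\P{IsDigit}//g;

print "upper=@upper_posix/@upper_prop no_punct=[$no_punct] ",
      "digits=$digits_posix/$digits_prop\n";
```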
See also $1 ... $9, $+, $`, $&, $', $^R, $^N, @-, and @+.
With modifier x, whitespace and comments can be embedded in the patterns.
Regular expression patterns can be compiled and used as values with the qr quoting operator: qr/string/modifiers compiles string as a pattern according to the (optional) modifiers, and returns the compiled pattern as a scalar value.
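The following sketch shows one way qr and the x modifier might be combined. The date format, the sample strings, and the names $date_re and $line_re are hypothetical.

```perl
use strict;
use warnings;

# Compile a pattern once; x allows whitespace and comments inside it.
my $date_re = qr/
    (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
/x;

# A compiled pattern can be used directly as the right side of =~ ...
my ($y, $m, $d) = "released 2024-03-15" =~ $date_re;

# ... or interpolated into a larger pattern. The i modifier here
# applies to the outer pattern; $date_re keeps its own x modifier.
my $line_re = qr/\Areleased\s+$date_re\z/i;
my $ok = "Released 2024-03-15" =~ $line_re ? 1 : 0;

print "y=$y m=$m d=$d ok=$ok\n";
```

Keeping the pattern in a scalar makes it reusable in several matches without repeating the pattern text.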