Match any contiguous sequence of 10 digits, such as 1234567890.
Convert the sequence into a nicely formatted phone number, for example (123) 456-7890.
Recipe 2.10 explains how you can use capturing groups in your regular expression to match the same text more than once. The text matched by each capturing group in your regex is also available after each successful match. You can insert the text of some or all capturing groups—in any order, or even more than once—into the replacement text.
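As a minimal sketch of the idea in Python, the snippet below splits the ten digits into three numbered capturing groups and reinserts them through the replacement text. The word boundaries in the pattern and the helper names `format_phone` and `swap_and_repeat` are assumptions made for this illustration, not part of the recipe.

```python
import re

# Three capturing groups: area code, exchange, and line number.
# The \b word boundaries are an assumption added for this sketch.
PHONE = re.compile(r"\b(\d{3})(\d{3})(\d{4})\b")

def format_phone(text):
    # \1, \2, and \3 insert the text matched by each capturing group.
    return PHONE.sub(r"(\1) \2-\3", text)

def swap_and_repeat(text):
    # Groups may be inserted in any order, and more than once.
    return PHONE.sub(r"\3-\2-\1-\1", text)

formatted = format_phone("Call 1234567890 today")  # "Call (123) 456-7890 today"
reordered = swap_and_repeat("1234567890")          # "7890-456-123-123"
```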
Some flavors, such as Python and Ruby, use the same «\1» syntax for backreferences in both the regular expression and the replacement text. Other flavors use Perl’s «$1» syntax, using a dollar sign instead of a backslash. PHP supports both.
In Perl, «$1» and above are actually variables that are set after each successful regex match. You can use them anywhere in your code until the next regex match. .NET, Java, JavaScript, and PHP support «$1» only in the replacement syntax. These programming languages do offer other ways to access capturing groups in code. Chapter 3 explains that in detail.
All regex flavors in this book support up to 99 capturing groups in a regular expression. In the replacement text, ambiguity can occur with «$10» or «\10» and above. These can be interpreted as either the 10th capturing group, or the first capturing group followed by a literal zero.
.NET, XRegExp, PHP, and Perl allow you to put curly braces around the number to make your intention clear. «${10}» is always the 10th capturing group, and «${1}0» is always the first followed by a literal zero.
Java and JavaScript try to be clever with «$10». If a capturing group with the specified two-digit number exists in your regular expression, both digits are used for the capturing group. If fewer capturing groups exist, only the first digit is used to reference the group, leaving the second as a literal. Thus «$23» is the 23rd capturing group, if it exists. Otherwise, it is the second capturing group followed by a literal «3».
.NET, XRegExp, PHP, Perl, Python, and Ruby always treat «$10» and «\10» as the 10th capturing group, regardless of whether it exists. If it doesn’t, the behavior for nonexistent groups comes into play.
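The two readings of a two-digit reference can be illustrated in Python. This is a sketch under assumptions: the ten-group pattern is contrived for the demonstration, and the unambiguous `\g<1>0` form is Python's own replacement syntax rather than one of the curly-brace forms listed above.

```python
import re

# Ten single-digit capturing groups, so that group 10 actually exists.
TEN_GROUPS = re.compile(r"(\d)" * 10)
subject = "1234567890"

# \10 is read as a reference to the 10th group, which matched "0".
tenth_group = TEN_GROUPS.sub(r"\10", subject)

# \g<1>0 is the first group, which matched "1", followed by a literal zero.
first_then_zero = TEN_GROUPS.sub(r"\g<1>0", subject)

print(tenth_group)      # 0
print(first_then_zero)  # 10
```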
The regular expression in the solution for this recipe has three capturing groups. If you type «$4» or «\4» into the replacement text, you’re adding a reference to a capturing group that does not exist. This triggers one of three different behaviors.
Java, XRegExp, and Python will cry foul by raising an exception or returning an error message. Do not use invalid backreferences with these flavors. (Actually, you shouldn’t use invalid backreferences with any flavor.) If you want to insert «$4» or «\4» literally, escape the dollar sign or backslash. Recipe 2.19 explains this in detail.
PHP, Perl, and Ruby substitute all backreferences in the replacement text, including those that point to groups that don’t exist. Groups that don’t exist did not capture any text and therefore references to these groups are simply replaced with nothing.
Finally, .NET and JavaScript (without XRegExp) leave backreferences to groups that don’t exist as literal text in the replacement.
All flavors do replace groups that do exist in the regular expression but did not capture anything. Those are replaced with nothing.
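Python's side of these behaviors can be checked with a short sketch. The helper `try_replace` is a hypothetical name introduced only for this example, and the last case relies on Python 3.5 or later, where groups that did not take part in the match are replaced with an empty string.

```python
import re

PHONE = re.compile(r"(\d{3})(\d{3})(\d{4})")  # three capturing groups only

def try_replace(replacement, subject="1234567890"):
    """Return the substitution result, or the error text if Python rejects it."""
    try:
        return PHONE.sub(replacement, subject)
    except re.error as exc:
        return "error: " + str(exc)

# Group 4 does not exist, so Python rejects the replacement text.
invalid = try_replace(r"\4")

# An escaped backslash yields a literal backslash followed by "4".
literal = try_replace(r"\\4")

# In each match only one of the two groups participates; the other
# exists but captured nothing, and is replaced with nothing.
unmatched = re.sub(r"(a)|(b)", r"[\1\2]", "ab")

print(invalid.startswith("error: "))  # True
print(literal)                        # \4
print(unmatched)                      # [a][b]
```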
(?<area>\d{3})(?<exchange>\d{3})(?<number>\d{4})
Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9
(?'area'\d{3})(?'exchange'\d{3})(?'number'\d{4})
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9
(?P<area>\d{3})(?P<exchange>\d{3})(?P<number>\d{4})
Regex options: None
Regex flavors: PCRE, Perl 5.10, Python
(${area})●${exchange}-${number}
Replacement text flavors: .NET, Java 7, XRegExp
(\g<area>)●\g<exchange>-\g<number>
Replacement text flavor: Python
(\k<area>)●\k<exchange>-\k<number>
Replacement text flavor: Ruby 1.9
(\k'area')●\k'exchange'-\k'number'
Replacement text flavor: Ruby 1.9
($+{area})●$+{exchange}-$+{number}
Replacement text flavor: Perl 5.10
($1)●$2-$3
Replacement text flavor: PHP
.NET, Java 7, XRegExp, Python, and Ruby 1.9 allow you to use named backreferences in the replacement text if you used named capturing groups in your regular expression. The syntax for named backreferences in the replacement text differs from that in the regular expression.
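For the Python flavor, the named variant can be sketched as follows. The group names come from the regular expressions shown above; the variable names are chosen only for this example.

```python
import re

# Named groups use (?P<name>...) in the regex and \g<name> in the replacement.
PHONE = re.compile(r"(?P<area>\d{3})(?P<exchange>\d{3})(?P<number>\d{4})")

formatted = PHONE.sub(r"(\g<area>) \g<exchange>-\g<number>", "1234567890")
print(formatted)  # (123) 456-7890
```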
Ruby uses the same syntax for backreferences in the replacement text as it does in the regular expression. For named capturing groups in Ruby 1.9, this syntax is «\k<group>» or «\k'group'». The choice between angle brackets and single quotes is merely a notational convenience.
Perl 5.10 and later store the text matched by named capturing groups into the hash %+. You can get the text matched by the group “name” with $+{name}. Perl interpolates variables in the replacement text, so you can treat «$+{name}» as a named backreference in the replacement text.
PHP (using PCRE) supports named capturing groups in regular expressions, but not in the replacement text. You can use numbered backreferences in the replacement text to named capturing groups in the regular expression. PCRE assigns numbers to both named and unnamed groups, from left to right.
.NET, Java 7, XRegExp, Python, and Ruby 1.9 also allow numbered references to named groups. However, .NET uses a different numbering scheme for named groups, as Recipe 2.11 explains. Mixing names and numbers with .NET, Java 7, XRegExp, Python, or Ruby is not recommended. Either give all your capturing groups names or don’t name any groups at all. Always use named backreferences for named groups.
Recipe 2.9 explains the capturing groups that backreferences refer to.
Recipe 2.11 explains named capturing groups. Naming the groups in your regex and the backreferences in your replacement text makes them easier to read and maintain.
Search and Replace with Regular Expressions in Chapter 1 describes the various replacement text flavors.
Recipe 2.10 shows how to use backreferences in the regular expression itself. The syntax is different from that for backreferences in the replacement text.
Recipe 3.15 explains how to use replacement text in source code.