$%\*$$1\1
Replacement text flavors: .NET, JavaScript
\$%\\*\$1\\1
Replacement text flavor: Java
$%\*\$1\\1
Replacement text flavor: PHP
\$%\\*\$1\\1
Replacement text flavor: Perl
$%\*$1\\1
Replacement text flavors: Python, Ruby
This recipe shows you the different escape rules used by the various replacement text flavors. The only two characters you may ever need to escape in the replacement text are the dollar sign and the backslash. The escape characters are also the dollar sign and the backslash.
The percentage sign and asterisk in this example are always literal characters, though a preceding backslash may be treated as an escape instead of a literal backslash. Depending on the flavor, «$1», «\1», or both are backreferences to a capturing group. Recipe 2.21 tells you which flavors use which syntax for backreferences.
The fact that this problem has five different solutions for seven replacement text flavors demonstrates that there really is no standard for replacement text syntax.
.NET and JavaScript always treat a backslash as a literal character. Do not escape it with another backslash, or you’ll end up with two backslashes in the replacement.
A lone dollar sign is a literal character. Dollar signs need to be escaped only when they are followed by a digit, ampersand, backtick, straight quote, underscore, plus sign, or another dollar sign. To escape a dollar sign, precede it with another dollar sign. You can double up all dollar signs if you feel that makes your replacement text more readable. This solution is equally valid:
$$%\*$$1\1
Replacement text flavors: .NET, JavaScript
.NET and XRegExp also require dollar signs followed by an opening curly brace to be escaped. «${group}» is a named backreference in .NET and XRegExp. Standard JavaScript without the XRegExp library does not support this syntax; since ECMAScript 2018, it supports named backreferences written as «$<group>» instead.
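To make the .NET and JavaScript rule concrete, here is a minimal sketch written in Python so that all examples in this recipe share one language. The function name `escape_dotnet_js` is hypothetical. It takes the shortcut described above and doubles every dollar sign, which also covers «${» in .NET and XRegExp; backslashes are copied unchanged because these flavors always treat them as literals.

```python
def escape_dotnet_js(literal: str) -> str:
    """Escape literal text for a .NET or JavaScript replacement string.

    Doubling every dollar sign is always safe in these flavors, because
    "$$" inserts one literal dollar sign. Backslashes are never special
    in these flavors, so they are left as they are.
    """
    return literal.replace("$", "$$")


print(escape_dotnet_js(r"$%\*$1\1"))  # prints $$%\*$$1\1
```

The result is the "equally valid" variant with every dollar sign doubled, rather than the minimal form in the solution list.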
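In Java, the backslash is used to escape backslashes and dollar signs in the replacement text. All literal backslashes and all literal dollar signs must be escaped. If you do not escape them, Java either throws an exception (an IllegalArgumentException for a dollar sign that does not start a valid group reference, or for a trailing backslash) or silently treats the backslash as an escape for the character that follows it.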
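Java's standard library already performs this escaping in `java.util.regex.Matcher.quoteReplacement(String)`. To keep one language across these examples, the sketch below mirrors the same rule in Python; the name `escape_java` is hypothetical. Every backslash and every dollar sign gets a backslash in front of it.

```python
def escape_java(literal: str) -> str:
    """Escape literal text for a Java replacement string.

    Java requires every literal backslash and every literal dollar sign
    to be preceded by a backslash. str.translate() works in a single
    pass, so the backslashes it inserts are never escaped again.
    """
    return literal.translate({ord("\\"): "\\\\", ord("$"): "\\$"})


print(escape_java(r"$%\*$1\1"))  # prints \$%\\*\$1\\1
```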
PHP requires backslashes followed by a digit, and dollar signs followed by a digit or opening curly brace, to be escaped with a backslash.
A backslash also escapes another backslash. Thus, you need to write «\\\\» to replace with two literal backslashes. All other backslashes are treated as literal backslashes.
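The PHP rules can be sketched as a single regex pass, again in Python and assuming `preg_replace()` as the PHP function that will receive the result. The name `escape_php` is hypothetical. Escaping a backslash that precedes another backslash or a dollar sign follows from the rule above that a backslash escapes those characters; every other backslash and dollar sign is left alone, which reproduces the minimal PHP solution.

```python
import re

# A backslash is special in front of a digit, and it also escapes another
# backslash or a dollar sign. A dollar sign is special in front of a digit
# or an opening curly brace. Lookaheads keep each match one character long.
_PHP_SPECIAL = re.compile(r"\\(?=[0-9\\$])|\$(?=[0-9{])")


def escape_php(literal: str) -> str:
    """Escape literal text for a PHP preg_replace() replacement string.

    Only backslashes and dollar signs that PHP would otherwise treat as
    special get a backslash in front of them.
    """
    return _PHP_SPECIAL.sub(lambda m: "\\" + m.group(0), literal)


print(escape_php(r"$%\*$1\1"))  # prints $%\*\$1\\1
```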
Perl is a bit different from the other replacement text flavors: it does not really have a replacement text flavor. Whereas the other programming languages have special logic in their search-and-replace routines to substitute things such as «$1», in Perl that’s just normal variable interpolation. In the replacement text, you need to escape all literal dollar signs with a backslash, just as you would in any double-quoted string.
One exception is that Perl does support the «\1» syntax for backreferences. Thus, you need to escape a backslash followed by a digit if you want the backslash to be a literal. A backslash followed by a dollar sign also needs to be escaped, to prevent the backslash from escaping the dollar sign.
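A backslash also escapes another backslash. Thus, you need to write «\\\\» to replace with two literal backslashes. As in any double-quoted Perl string, a backslash in front of any other character is also treated as an escape rather than as a literal: «\n» inserts a newline, and «\*» inserts just an asterisk. To keep a literal backslash in front of the asterisk, write «\\*».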
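Because Perl's replacement is an interpolated double-quoted string, a simple safe approach is to put a backslash in front of every backslash and every dollar sign; «\\» always yields one literal backslash. The Python sketch below uses the hypothetical name `escape_perl` and also escapes «@», which goes beyond this recipe's example: it assumes you want to stop Perl from interpolating arrays such as «@name» as well.

```python
def escape_perl(literal: str) -> str:
    """Escape literal text for the replacement part of a Perl s/// operator.

    Every backslash is doubled, and every dollar sign and at sign gets a
    backslash in front of it so that Perl does not interpolate a variable.
    Escaping "@" is an extra assumption beyond this recipe's example.
    """
    return literal.translate(
        {ord("\\"): "\\\\", ord("$"): "\\$", ord("@"): "\\@"}
    )


print(escape_perl(r"$%\*$1\1"))  # prints \$%\\*\$1\\1
```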
The dollar sign has no special meaning in the replacement text in Python and Ruby. Backslashes need to be escaped with another backslash when followed by a character that gives the backslash a special meaning.
With Python, «\1» through «\9» and «\g<» create backreferences. These backslashes need to be escaped.
For Ruby, you need to escape a backslash followed by a digit, ampersand, backtick, straight quote, or plus sign.
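In both languages, a backslash also escapes another backslash. Thus, you need to write «\\\\» to include two literal backslashes in replacement text. All other backslashes are treated as literal backslashes, with one exception in Python: a backslash followed by an ASCII letter is either a character escape such as «\n» or, since Python 3.7, an error.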
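The Python half of this rule can be checked directly with `re.sub()`. The sketch passes the Python/Ruby solution as a raw string literal, so the string holds exactly the replacement text shown in the solution, and confirms that the result is the intended literal. It also shows a conservative alternative that doubles every backslash; the helper name `escape_python` is hypothetical.

```python
import re

# The target literal text, written as a raw string so that Python's own
# string-literal escapes do not interfere.
LITERAL = r"$%\*$1\1"


def escape_python(literal: str) -> str:
    """Escape literal text for an re.sub() replacement template.

    Dollar signs are not special in Python, so only backslashes need
    attention; doubling all of them is the simplest safe rule.
    """
    return literal.replace("\\", "\\\\")


# The minimal solution: only the backslash before the digit is escaped.
print(re.sub("x", r"$%\*$1\\1", "x"))            # prints $%\*$1\1
# The conservative version, with every backslash doubled.
print(re.sub("x", escape_python(LITERAL), "x"))  # prints $%\*$1\1
```

Doubling every backslash also sidesteps the Python letter-escape caveat, because «\\n» yields a literal backslash followed by «n».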
Remember that in this chapter, we deal only with the regular expressions and replacement text themselves. The next chapter covers programming languages and string literals.
The replacement texts shown earlier will work when the actual
string variable you’re passing to the replace()
function
holds this text. In other words, if your application provides a text
box for the user to type in the replacement text, these solutions show
what the user would have to type in order for the search-and-replace
to work as intended. If you test your search-and-replace commands with
RegexBuddy or another regex tester, the replacement texts included in
this recipe will show the expected results.
But these same replacement texts will not work if you paste them directly into your source code and put quote characters around them. String literals in programming languages have their own escape rules, and you need to follow those rules on top of the replacement text escape rules. You may indeed end up with a mess of backslashes.
Recipe 3.14 shows how to add a search-and-replace to source code.