You want to replace all matches of a regular expression with a new string that you build up in procedural code. You want to be able to replace each match with a different string, based on the text that was actually matched.
For example, suppose you want to replace all numbers in a string with the number multiplied by two.
In C#, you can use the static call when you process only a small number of strings with the same regular expression:
string resultString = Regex.Replace(subjectString, @"\d+", new MatchEvaluator(ComputeReplacement));
Construct a Regex object if you want to use the same regular expression with a large number of strings:
Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Replace(subjectString, new MatchEvaluator(ComputeReplacement));
Both code snippets call the function ComputeReplacement. You should add this method to the class in which you’re implementing this solution:
public String ComputeReplacement(Match matchResult) {
    int twiceasmuch = int.Parse(matchResult.Value) * 2;
    return twiceasmuch.ToString();
}
In VB.NET, you can use the static call when you process only a small number of strings with the same regular expression:
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = Regex.Replace(SubjectString, "\d+", MyMatchEvaluator)
Construct a Regex object if you want to use the same regular expression with a large number of strings:
Dim RegexObj As New Regex("\d+")
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = RegexObj.Replace(SubjectString, MyMatchEvaluator)
Both code snippets call the function ComputeReplacement. You should add this method to the class in which you’re implementing this solution:
Public Function ComputeReplacement(ByVal MatchResult As Match) As String
    Dim TwiceAsMuch As Integer = Integer.Parse(MatchResult.Value) * 2
    Return TwiceAsMuch.ToString()
End Function
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    Integer twiceasmuch = Integer.parseInt(regexMatcher.group()) * 2;
    regexMatcher.appendReplacement(resultString, twiceasmuch.toString());
}
regexMatcher.appendTail(resultString);
Using a declared callback function:
$result = preg_replace_callback('/\d+/', 'compute_replacement', $subject);

function compute_replacement($groups) {
    return $groups[0] * 2;
}
Using an anonymous callback function:
$result = preg_replace_callback(
    '/\d+/',
    create_function(
        '$groups',
        'return $groups[0] * 2;'
    ),
    $subject
);
If you have only a few strings to process, you can use the global function:
result = re.sub(r"\d+", computereplacement, subject)
To use the same regex repeatedly, use a compiled object:
reobj = re.compile(r"\d+")
result = reobj.sub(computereplacement, subject)
Both code snippets call the function computereplacement. This function needs to be declared before you can pass it to sub().
def computereplacement(matchobj):
    return str(int(matchobj.group()) * 2)
When using a string as the replacement text, you can do only basic text substitution. To replace each match with something totally different that varies along with the match being replaced, you need to create the replacement text in your own code.
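The difference between the two kinds of replacement can be seen side by side in a short Python sketch. The subject string and the variable names here are made up for illustration; only re.sub() itself comes from the solution above.

```python
import re

subject = "3 cats and 14 dogs"  # hypothetical sample text

# A replacement string can only reinsert matched text, here via group 1.
wrapped = re.sub(r"(\d+)", r"[\1]", subject)

# A replacement function can compute something new for each match.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), subject)

print(wrapped)  # [3] cats and [14] dogs
print(doubled)  # 6 cats and 28 dogs
```

The string form has no way to express "multiply by two", which is why the solutions in this recipe hand the work to code.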
Recipe 3.14 discusses the various ways in which you can call the Regex.Replace() method, passing a string as the replacement text. When using a static call, the replacement is the third parameter, after the subject and the regular expression. If you passed the regular expression to the Regex() constructor, you can call Replace() on that object with the replacement as the second parameter.
Instead of passing a string as the second or third parameter, you can pass a MatchEvaluator delegate. This delegate is a reference to a member function that you add to the class where you’re doing the search-and-replace. To create the delegate, use the new keyword to call the MatchEvaluator() constructor. Pass your member function as the only parameter to MatchEvaluator().
The function you want to use for the delegate should return a string and take one parameter of class System.Text.RegularExpressions.Match. This is the same Match class returned by the Regex.Match() member used in nearly all the previous recipes in this chapter.
When you call Replace() with a MatchEvaluator as the replacement, your function will be called for each regular expression match that needs to be replaced. Your function needs to return the replacement text. You can use any of the properties of the Match object to build your replacement text. The example shown earlier uses matchResult.Value to retrieve the string with the whole regex match. Often, you’ll use matchResult.Groups[] to build up your replacement text from the capturing groups in your regular expression.
If you do not want to replace certain regex matches, your function should return matchResult.Value. If you return null or an empty string, the regex match is replaced with nothing (i.e., deleted).
Recipe 3.14 discusses the various ways in which you can call the Regex.Replace() method, passing a string as the replacement text. When using a static call, the replacement text is the third parameter, after the subject and the regular expression. If you used the Dim keyword to create a variable with your regular expression, you can call Replace() on that object with the replacement as the second parameter.
Instead of passing a string as the second or third parameter, you can pass a MatchEvaluator object. This object holds a reference to a function that you add to the class where you’re doing the search-and-replace. Use the Dim keyword to create a new variable of type MatchEvaluator. Pass one parameter with the AddressOf keyword followed by the name of your member function. The AddressOf operator returns a reference to your function, without actually calling the function at that point.
The function you want to use for MatchEvaluator should return a string and should take one parameter of class System.Text.RegularExpressions.Match. This is the same Match class returned by the Regex.Match() member used in nearly all the previous recipes in this chapter. The parameter will be passed by value, so you have to declare it with ByVal.
When you call Replace() with a MatchEvaluator as the replacement, your function will be called for each regular expression match that needs to be replaced. Your function needs to return the replacement text. You can use any of the properties of the Match object to build your replacement text. The example uses MatchResult.Value to retrieve the string with the whole regex match. Often, you’ll use MatchResult.Groups() to build up your replacement text from the capturing groups in your regular expression.
If you do not want to replace certain regex matches, your function should return MatchResult.Value. If you return Nothing or an empty string, the regex match is replaced with nothing (i.e., deleted).
The Java solution is very straightforward. We iterate over all the regex matches as explained in Recipe 3.11. Inside the loop, we call appendReplacement() on our Matcher object. When find() fails to find any further matches, we call appendTail(). The two methods appendReplacement() and appendTail() make it very easy to use a different replacement text for each regex match.
appendReplacement() takes two parameters. The first is the StringBuffer where you’re (temporarily) storing the result of the search-and-replace in progress. The second is the replacement text to be used for the last match found by find(). This replacement text can include references to capturing groups, such as "$1". If there is a syntax error in your replacement text, an IllegalArgumentException is thrown. If the replacement text references a capturing group that does not exist, an IndexOutOfBoundsException is thrown instead. If you call appendReplacement() without a prior successful call to find(), it throws an IllegalStateException.
If you call appendReplacement() correctly, it does two things. First, it copies the text located between the previous and current regex match to the string buffer, without making any modifications to the text. If the current match is the first one, it copies all the text before that match. After that, it appends your replacement text, substituting any backreferences in it with the text matched by the referenced capturing groups.
If you want to delete a particular match, simply replace it with an empty string. If you want to leave a match in the string unchanged, you can omit the call to appendReplacement() for that match. By “previous regex match,” we mean the previous match for which you called appendReplacement(). If you don’t call appendReplacement() for certain matches, those become part of the text between the matches that you do replace, which is copied unchanged into the target string buffer.
When you’re done replacing matches, call appendTail(). That copies the text at the end of the string after the last regex match for which you called appendReplacement().
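The bookkeeping described above can be mimicked in a few lines of Python, which may make the copy-then-replace sequence easier to follow. This is only an illustrative sketch, not how Java implements it: replace_selected, compute, double_large, and last_end are names invented here, backreferences such as "$1" are left out, and returning None from the callback stands in for omitting the appendReplacement() call.

```python
import re

def replace_selected(pattern, subject, compute):
    """Rebuild subject, replacing only the matches for which compute()
    returns a string. Hypothetical helper, for illustration only."""
    pieces = []      # plays the role of the StringBuffer
    last_end = 0     # end of the last match that was actually replaced
    for match in re.finditer(pattern, subject):
        replacement = compute(match)
        if replacement is None:
            # Skipped match: it stays part of the text copied unchanged later.
            continue
        # Like appendReplacement(): copy the text since the previous
        # replaced match, then append the replacement.
        pieces.append(subject[last_end:match.start()])
        pieces.append(replacement)
        last_end = match.end()
    # Like appendTail(): copy whatever follows the last replaced match.
    pieces.append(subject[last_end:])
    return "".join(pieces)

def double_large(match):
    # Double numbers of two or more digits; skip single digits.
    if len(match.group()) < 2:
        return None
    return str(int(match.group()) * 2)

result = replace_selected(r"\d+", "7 apples, 12 pears, 150 plums", double_large)
print(result)  # 7 apples, 24 pears, 300 plums
```

The skipped "7" is never copied on its own. It travels along with the text in front of the first replaced match, which is the behavior described for matches where appendReplacement() is not called.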
In JavaScript, a function is really just another object that can be assigned to a variable. Instead of passing a literal string or a variable that holds a string to the string.replace() function, we can pass a function that returns a string. This function is then called each time a replacement needs to be made.
You can make your replacement function accept one or more parameters. If you do, the first parameter will be set to the text matched by the regular expression. If your regular expression has capturing groups, the second parameter will hold the text matched by the first capturing group, the third parameter gives you the text of the second capturing group, and so on. You can set these parameters to use bits of the regular expression match to compose the replacement.
The replacement function in the JavaScript solution for this recipe simply takes the text matched by the regular expression, and returns it multiplied by two. JavaScript handles the string-to-number and number-to-string conversions implicitly.
The preg_replace_callback() function works just like the preg_replace() function described in Recipe 3.14. It takes a regular expression, replacement, subject string, optional replacement limit, and optional replacement count. The regular expression and subject string can be single strings or arrays.
The difference is that preg_replace_callback() expects the second parameter to be a function rather than the actual replacement text. If you declare the function in your code, then the name of the function must be passed as a string. Alternatively, you can pass the result of create_function() to create an anonymous function. (create_function() was deprecated in PHP 7.2 and removed in PHP 8.0; newer code passes a closure instead.) Either way, your replacement function should take one parameter and return a string (or something that can be coerced into a string).
Each time preg_replace_callback() finds a regex match, it will call your callback function. The parameter will be filled with an array of strings. Element zero holds the overall regex match, and elements one and beyond hold the text matched by capturing groups one and beyond. You can use this array to build up your replacement text using the text matched by the regular expression or one or more capturing groups.
The s/// operator supports one extra modifier that is ignored by the m// operator: /e. The /e, or “execute,” modifier tells the substitution operator to execute the replacement part as Perl code, instead of interpreting it as the contents of a double-quoted string. Using this modifier, we can easily retrieve the matched text with the $& variable, and then multiply it by two. The result of the code is used as the replacement string.
Python’s sub() function allows you to pass the name of a function instead of a string as the replacement text. This function is then called for each regex match to be replaced.
You need to declare this function before you can reference it. It should take one parameter to receive a MatchObject instance, which is the same object returned by the search() function. You can use it to retrieve (part of) the regex match to build your replacement. See Recipe 3.7 and Recipe 3.9 for details.
Your function should return a string with the replacement text.
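As a small sketch of building the replacement from capturing groups, the function below doubles only values with a "px" unit and returns the whole match for anything else, which leaves that match unchanged. The regex, the sample text, and the name double_pixels are assumptions made up for this illustration.

```python
import re

def double_pixels(matchobj):
    # group(1) is the number, group(2) the unit (hypothetical regex below).
    number, unit = matchobj.group(1), matchobj.group(2)
    if unit != "px":
        # Returning the overall match leaves this occurrence as it was.
        return matchobj.group()
    return str(int(number) * 2) + unit

subject = "width: 10px; margin: 2em; height: 35px"
result = re.sub(r"(\d+)(px|em)", double_pixels, subject)
print(result)  # width: 20px; margin: 2em; height: 70px
```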
The previous two recipes called the gsub() method of the String class with two parameters: the regex and the replacement text. This method also exists in block form.
In block form, gsub() takes your regular expression as its only parameter. It fills one iterator variable with a string that holds the text matched by the regular expression. If you supply additional iterator variables, they are set to nil, even if your regular expression has capturing groups.
Inside the block, place an expression that evaluates to the string that you want to use as the replacement text. You can use the special regex match variables, such as $~, $&, and $1, inside the block. Their values change each time the block is evaluated to make another replacement. See Recipes 3.7, 3.8, and 3.9 for details.
You cannot use replacement text tokens such as «\1». Those remain as literal text.
Recipe 3.9 shows code to get the text matched by a particular part (capturing group) of a regex.
Recipe 3.15 shows code to make a search-and-replace reinsert parts of the text matched by the regular expression.