3.16. Replace Matches with Replacements Generated in Code

Problem

You want to replace all matches of a regular expression with a new string that you build up in procedural code. You want to be able to replace each match with a different string, based on the text that was actually matched.

For example, suppose you want to replace all numbers in a string with the number multiplied by two.

Solution

C#

You can use the static call when you process only a small number of strings with the same regular expression:

string resultString = Regex.Replace(subjectString, @"\d+",
                      new MatchEvaluator(ComputeReplacement));

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Replace(subjectString,
                      new MatchEvaluator(ComputeReplacement));

Both code snippets call the function ComputeReplacement. You should add this method to the class in which you’re implementing this solution:

public String ComputeReplacement(Match matchResult) {
    int twiceasmuch = int.Parse(matchResult.Value) * 2;
    return twiceasmuch.ToString();
}

VB.NET

You can use the static call when you process only a small number of strings with the same regular expression:

Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = Regex.Replace(SubjectString, "\d+", MyMatchEvaluator)

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Dim RegexObj As New Regex("\d+")
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = RegexObj.Replace(SubjectString, MyMatchEvaluator)

Both code snippets call the function ComputeReplacement. You should add this method to the class in which you’re implementing this solution:

Public Function ComputeReplacement(ByVal MatchResult As Match) As String
    Dim TwiceAsMuch = Integer.Parse(MatchResult.Value) * 2
    Return TwiceAsMuch.ToString()
End Function

Java

StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    Integer twiceasmuch = Integer.parseInt(regexMatcher.group()) * 2;
    regexMatcher.appendReplacement(resultString, twiceasmuch.toString());
}
regexMatcher.appendTail(resultString);

JavaScript

var result = subject.replace(/\d+/g, function(match) {
    return match * 2;
});

PHP

Using a declared callback function:

$result = preg_replace_callback('/\d+/', 'compute_replacement', $subject);

function compute_replacement($groups) {
    return $groups[0] * 2;
}

Using an anonymous callback function:

$result = preg_replace_callback(
    '/\d+/',
    create_function(
        '$groups',
        'return $groups[0] * 2;'
    ),
    $subject
);

Perl

$subject =~ s/\d+/$& * 2/eg;

Python

If you have only a few strings to process, you can use the global function:

result = re.sub(r"\d+", computereplacement, subject)

To use the same regex repeatedly, use a compiled object:

reobj = re.compile(r"\d+")
result = reobj.sub(computereplacement, subject)

Both code snippets call the function computereplacement. This function needs to be declared before you can pass it to sub().

def computereplacement(matchobj):
    return str(int(matchobj.group()) * 2)

Ruby

result = subject.gsub(/\d+/) {|match|
    Integer(match) * 2
}

Discussion

When using a string as the replacement text, you can do only basic text substitution. To replace each match with something totally different that varies along with the match being replaced, you need to create the replacement text in your own code.
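To make the contrast concrete, here is a minimal Python sketch that runs the same regex twice: once with a fixed replacement string and once with a function that computes the replacement from each match. The sample subject string and the names double_number, fixed, and computed are assumptions made up for illustration; they do not appear elsewhere in this recipe.

```python
import re

def double_number(matchobj):
    # matchobj.group() returns the text of the whole regex match.
    return str(int(matchobj.group()) * 2)

subject = "3 apples and 12 pears"  # hypothetical sample text

# A replacement string substitutes the same text for every match.
fixed = re.sub(r"\d+", "N", subject)

# A replacement function is called once per match and can return
# different text each time.
computed = re.sub(r"\d+", double_number, subject)

print(fixed)
print(computed)
```

With a string, every number becomes the same text; with a function, each number is replaced by a value derived from that particular match.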

C#

Recipe 3.14 discusses the various ways in which you can call the Regex.Replace() method, passing a string as the replacement text. When using a static call, the replacement is the third parameter, after the subject and the regular expression. If you passed the regular expression to the Regex() constructor, you can call Replace() on that object with the replacement as the second parameter.

Instead of passing a string as the second or third parameter, you can pass a MatchEvaluator delegate. This delegate is a reference to a member function that you add to the class where you’re doing the search-and-replace. To create the delegate, use the new keyword to call the MatchEvaluator() constructor. Pass your member function as the only parameter to MatchEvaluator().

The function you want to use for the delegate should return a string and take one parameter of class System.Text.RegularExpressions.Match. This is the same Match class returned by the Regex.Match() member used in nearly all the previous recipes in this chapter.

When you call Replace() with a MatchEvaluator as the replacement, your function will be called for each regular expression match that needs to be replaced. Your function needs to return the replacement text. You can use any of the properties of the Match object to build your replacement text. The example shown earlier uses matchResult.Value to retrieve the string with the whole regex match. Often, you’ll use matchResult.Groups[] to build up your replacement text from the capturing groups in your regular expression.

If you do not want to replace certain regex matches, your function should return matchResult.Value. If you return null or an empty string, the regex match is replaced with nothing (i.e., deleted).

VB.NET

Recipe 3.14 discusses the various ways in which you can call the Regex.Replace() method, passing a string as the replacement text. When using a static call, the replacement text is the third parameter, after the subject and the regular expression. If you used the Dim keyword to create a variable with your regular expression, you can call Replace() on that object with the replacement as the second parameter.

Instead of passing a string as the second or third parameter, you can pass a MatchEvaluator object. This object holds a reference to a function that you add to the class where you’re doing the search-and-replace. Use the Dim keyword to create a new variable of type MatchEvaluator. Pass one parameter with the AddressOf keyword followed by the name of your member function. The AddressOf operator returns a reference to your function, without actually calling the function at that point.

The function you want to use for MatchEvaluator should return a string and should take one parameter of class System.Text.RegularExpressions.Match. This is the same Match class returned by the Regex.Match() member used in nearly all the previous recipes in this chapter. The parameter will be passed by value, so you have to declare it with ByVal.

When you call Replace() with a MatchEvaluator as the replacement, your function will be called for each regular expression match that needs to be replaced. Your function needs to return the replacement text. You can use any of the properties of the Match object to build your replacement text. The example uses MatchResult.Value to retrieve the string with the whole regex match. Often, you’ll use MatchResult.Groups() to build up your replacement text from the capturing groups in your regular expression.

If you do not want to replace certain regex matches, your function should return MatchResult.Value. If you return Nothing or an empty string, the regex match is replaced with nothing (i.e., deleted).

Java

The Java solution is very straightforward. We iterate over all the regex matches as explained in Recipe 3.11. Inside the loop, we call appendReplacement() on our Matcher object. When find() fails to find any further matches, we call appendTail(). The two methods appendReplacement() and appendTail() make it very easy to use a different replacement text for each regex match.

appendReplacement() takes two parameters. The first is the StringBuffer where you’re (temporarily) storing the result of the search-and-replace in progress. The second is the replacement text to be used for the last match found by find(). This replacement text can include references to capturing groups, such as "$1". If there is a syntax error in your replacement text, an IllegalArgumentException is thrown. If the replacement text references a capturing group that does not exist, an IndexOutOfBoundsException is thrown instead. If you call appendReplacement() without a prior successful call to find(), it throws an IllegalStateException.

If you call appendReplacement() correctly, it does two things. First, it copies the text located between the previous and current regex match to the string buffer, without making any modifications to the text. If the current match is the first one, it copies all the text before that match. After that, it appends your replacement text, substituting any backreferences in it with the text matched by the referenced capturing groups.

If you want to delete a particular match, simply replace it with an empty string. If you want to leave a match in the string unchanged, you can omit the call to appendReplacement() for that match. By “previous regex match,” we mean the previous match for which you called appendReplacement(). If you don’t call appendReplacement() for certain matches, those become part of the text between the matches that you do replace, which is copied unchanged into the target string buffer.

When you’re done replacing matches, call appendTail(). That copies the text at the end of the string after the last regex match for which you called appendReplacement().

JavaScript

In JavaScript, a function is really just another object that can be assigned to a variable. Instead of passing a literal string or a variable that holds a string to the string.replace() function, we can pass a function that returns a string. This function is then called each time a replacement needs to be made.

You can make your replacement function accept one or more parameters. If you do, the first parameter will be set to the text matched by the regular expression. If your regular expression has capturing groups, the second parameter will hold the text matched by the first capturing group, the third parameter gives you the text of the second capturing group, and so on. You can use these parameters to compose the replacement from parts of the regular expression match.

The replacement function in the JavaScript solution for this recipe simply takes the text matched by the regular expression, and returns it multiplied by two. JavaScript handles the string-to-number and number-to-string conversions implicitly.

PHP

The preg_replace_callback() function works just like the preg_replace() function described in Recipe 3.14. It takes a regular expression, replacement, subject string, optional replacement limit, and optional replacement count. The regular expression and subject string can be single strings or arrays.

The difference is that preg_replace_callback() expects the second parameter to be a function rather than the actual replacement text. If you declare the function in your code, then the name of the function must be passed as a string. Alternatively, you can pass the result of create_function() to create an anonymous function. Note that create_function() was deprecated in PHP 7.2 and removed in PHP 8.0; in PHP 5.3 and later, you can pass a closure of the form function ($groups) { ... } instead. Either way, your replacement function should take one parameter and return a string (or something that can be coerced into a string).

Each time preg_replace_callback() finds a regex match, it will call your callback function. The parameter will be filled with an array of strings. Element zero holds the overall regex match, and elements one and beyond hold the text matched by capturing groups one and beyond. You can use this array to build up your replacement text using the text matched by the regular expression or one or more capturing groups.

Perl

The s/// operator supports one extra modifier that is ignored by the m// operator: /e. The /e, or “execute,” modifier tells the substitution operator to execute the replacement part as Perl code, instead of interpreting it as the contents of a double-quoted string. Using this modifier, we can easily retrieve the matched text with the $& variable, and then multiply it by two. The result of the code is used as the replacement string.

Python

Python’s sub() function allows you to pass the name of a function instead of a string as the replacement text. This function is then called for each regex match to be replaced.

You need to declare this function before you can reference it. It should take one parameter to receive a MatchObject instance, which is the same object returned by the search() function. You can use it to retrieve (part of) the regex match to build your replacement. See Recipe 3.7 and Recipe 3.9 for details.

Your function should return a string with the replacement text.
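As a sketch of how the match object can drive the replacement, the following example builds the replacement from two capturing groups and returns the whole match unchanged when it should not be replaced. The regex, the sample subject, and the names UNIT_RE and double_pixels are hypothetical and chosen only for this illustration.

```python
import re

# Group 1 captures the number, group 2 captures the unit.
UNIT_RE = re.compile(r"(\d+)(px|em)")

def double_pixels(matchobj):
    number = matchobj.group(1)
    unit = matchobj.group(2)
    if unit != "px":
        # Returning the overall match leaves this match unchanged.
        return matchobj.group()
    # The replacement must be a string, so convert the result back.
    return str(int(number) * 2) + unit

result = UNIT_RE.sub(double_pixels, "margin: 4px 2em 10px")
print(result)
```

Returning matchobj.group() is the Python counterpart of returning matchResult.Value in the .NET solutions: the match is "replaced" with itself.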

Ruby

The previous two recipes called the gsub() method of the String class with two parameters: the regex and the replacement text. This method also exists in block form.

In block form, gsub() takes your regular expression as its only parameter. It fills one iterator variable with a string that holds the text matched by the regular expression. If you supply additional iterator variables, they are set to nil, even if your regular expression has capturing groups.

Inside the block, place an expression that evaluates to the string that you want to use as the replacement text. You can use the special regex match variables, such as $~, $&, and $1, inside the block. Their values change each time the block is evaluated to make another replacement. See Recipes 3.7, 3.8, and 3.9 for details.

You cannot use replacement text tokens such as «\1». Those remain as literal text.

See Also

Recipe 3.9 shows code to get the text matched by a particular part (capturing group) of a regex.

Recipe 3.15 shows code to make a search-and-replace reinsert parts of the text matched by the regular expression.
