3.15. Replace Matches Reusing Parts of the Match

Problem

You want to run a search-and-replace that reinserts parts of the regex match back into the replacement. The parts you want to reinsert have been isolated in your regular expression using capturing groups, as described in Recipe 2.9.

For example, you want to match pairs of words delimited by an equals sign, and swap those words in the replacement.

Solution

C#

You can use the static call when you process only a small number of strings with the same regular expression:

string resultString = Regex.Replace(subjectString, @"(\w+)=(\w+)",
                                                   "$2=$1");

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Regex regexObj = new Regex(@"(\w+)=(\w+)");
string resultString = regexObj.Replace(subjectString, "$2=$1");

VB.NET

You can use the static call when you process only a small number of strings with the same regular expression:

Dim ResultString = Regex.Replace(SubjectString, "(\w+)=(\w+)", "$2=$1")

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Dim RegexObj As New Regex("(\w+)=(\w+)")
Dim ResultString = RegexObj.Replace(SubjectString, "$2=$1")

Java

You can call String.replaceAll() when you process only one string with the same regular expression:

String resultString = subjectString.replaceAll("(\\w+)=(\\w+)", "$2=$1");

Construct a Matcher object if you want to use the same regular expression with a large number of strings:

Pattern regex = Pattern.compile("(\\w+)=(\\w+)");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("$2=$1");

JavaScript

result = subject.replace(/(\w+)=(\w+)/g, "$2=$1");

PHP

$result = preg_replace('/(\w+)=(\w+)/', '$2=$1', $subject);

Perl

$subject =~ s/(\w+)=(\w+)/$2=$1/g;

Python

If you have only a few strings to process, you can use the global function:

result = re.sub(r"(\w+)=(\w+)", r"\2=\1", subject)

To use the same regex repeatedly, use a compiled object:

reobj = re.compile(r"(\w+)=(\w+)")
result = reobj.sub(r"\2=\1", subject)

Ruby

result = subject.gsub(/(\w+)=(\w+)/, '\2=\1')

Discussion

The regular expression (\w+)=(\w+) matches the pair of words and captures each word into its own capturing group. The word before the equals sign is captured by the first group, and the word after the sign by the second group.

For the replacement, you need to specify that you want to use the text matched by the second capturing group, followed by an equals sign, followed by the text matched by the first capturing group. You can do this with special placeholders in the replacement text. The replacement text syntax varies widely between different programming languages. Search and Replace with Regular Expressions in Chapter 1 describes the replacement text flavors, and Recipe 2.21 explains how to reference capturing groups in the replacement text.
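To make the placeholders concrete, here is a minimal Python sketch that applies the regex and replacement text from this recipe to a sample string. The function name swap_pairs and the sample input are assumptions chosen for illustration only.

```python
import re

def swap_pairs(subject):
    # Group 1 captures the word before "=", group 2 the word after it.
    # The replacement text reinserts the two groups in the opposite order.
    return re.sub(r"(\w+)=(\w+)", r"\2=\1", subject)

print(swap_pairs("width=100 height=200"))  # prints: 100=width 200=height
```

Text outside the matches, such as the space between the two pairs, is copied to the result unchanged.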

.NET

In .NET, you can use the same Regex.Replace() method described in the previous recipe, using a string as the replacement. The syntax for adding backreferences to the replacement text follows the .NET replacement text flavor described in Recipe 2.21.

Java

In Java, you can use the same replaceFirst() and replaceAll() methods described in the previous recipe. The syntax for adding backreferences to the replacement text follows the Java replacement text flavor described in this book.

JavaScript

In JavaScript, you can use the same string.replace() method described in the previous recipe. The syntax for adding backreferences to the replacement text follows the JavaScript replacement text flavor described in this book.

PHP

In PHP, you can use the same preg_replace() function described in the previous recipe. The syntax for adding backreferences to the replacement text follows the PHP replacement text flavor described in this book.

Perl

In Perl, the replace part in s/regex/replace/ is simply interpreted as a double-quoted string. You can use the special variables $&, $1, $2, etc., explained in Recipe 3.7 and Recipe 3.9 in the replacement string. The variables are set right after the regex match is found, before it is replaced. You can also use these variables in all other Perl code. Their values persist until you tell Perl to find another regex match.

All the other programming languages in this book provide a function call that takes the replacement text as a string. The function call parses the string to process backreferences such as $1 or \1. But outside the replacement text string, $1 has no meaning with these languages.

Python

In Python, you can use the same sub() function described in the previous recipe. The syntax for adding backreferences to the replacement text follows the Python replacement text flavor described in this book.
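One detail of the Python flavor that may matter in practice: when a backreference is directly followed by a literal digit, the \g<1> form keeps the group number unambiguous. The sketch below is only an illustration; the function name swap_and_append_zero and the sample input are hypothetical.

```python
import re

def swap_and_append_zero(subject):
    # r"\g<2>0" means "group 2 followed by a literal 0".
    # Writing r"\20" instead would be read as a reference to group 20,
    # which does not exist in this regex and raises re.error.
    return re.sub(r"(\w+)=(\w+)", r"\g<2>0=\g<1>", subject)

print(swap_and_append_zero("a=1"))  # prints: 10=a
```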

Ruby

In Ruby, you can use the same String.gsub() method described in the previous recipe. The syntax for adding backreferences to the replacement text follows the Ruby replacement text flavor described in this book.

You cannot interpolate variables such as $1 in the replacement text. That’s because Ruby does variable interpolation before the gsub() call is executed. Before the call, gsub() hasn’t found any matches yet, so backreferences can’t be substituted. If you try to interpolate $1, you’ll get the text matched by the first capturing group in the last regex match before the call to gsub().

Instead, use replacement text tokens such as «\1». The gsub() function substitutes those tokens in the replacement text for each regex match. We recommend that you use single-quoted strings for the replacement text. In double-quoted strings, the backslash is used as an escape, and escaped digits are octal escapes. '\1' and "\\1" use the text matched by the first capturing group as the replacement, whereas "\1" substitutes the single literal character 0x01.

Named Capture

If you use named capturing groups in your regular expression, you can reference the groups by their names in your replacement string.

C#

You can use the static call when you process only a small number of strings with the same regular expression:

string resultString = Regex.Replace(subjectString,
                      @"(?<left>\w+)=(?<right>\w+)", "${right}=${left}");

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Regex regexObj = new Regex(@"(?<left>\w+)=(?<right>\w+)");
string resultString = regexObj.Replace(subjectString, "${right}=${left}");

VB.NET

You can use the static call when you process only a small number of strings with the same regular expression:

Dim ResultString = Regex.Replace(SubjectString,
                   "(?<left>\w+)=(?<right>\w+)", "${right}=${left}")

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Dim RegexObj As New Regex("(?<left>\w+)=(?<right>\w+)")
Dim ResultString = RegexObj.Replace(SubjectString, "${right}=${left}")

Java 7

Java 7 adds support for named capture to the regular expression syntax and for named backreferences to the replacement text syntax.

You can call String.replaceAll() when you process only one string with the same regular expression:

String resultString = subjectString.replaceAll(
                      "(?<left>\\w+)=(?<right>\\w+)", "${right}=${left}");

Construct a Matcher object if you want to use the same regular expression with a large number of strings:

Pattern regex = Pattern.compile("(?<left>\\w+)=(?<right>\\w+)");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("${right}=${left}");

XRegExp

The XRegExp.replace() method extends JavaScript’s replacement text syntax with named backreferences.

var re = XRegExp("(?<left>\\w+)=(?<right>\\w+)", "g");
var result = XRegExp.replace(subject, re, "${right}=${left}");

PHP

$result = preg_replace('/(?P<left>\w+)=(?P<right>\w+)/',
                       '$2=$1', $subject);

PHP’s preg functions use the PCRE library, which supports named capture. The preg_match() and preg_match_all() functions add named capturing groups to the array with match results. Unfortunately, preg_replace() does not provide a way to use named backreferences in the replacement text. If your regex has named capturing groups, count both the named and numbered capturing groups from left to right to determine the backreference number of each group. Use those numbers in the replacement text.
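The counting rule can be illustrated with a short Python sketch. Python's re module numbers named and unnamed groups together from left to right, the same way PCRE does, so it serves here as a stand-in. The regex with an extra unnamed group in the middle and the function name group_numbers are assumptions made up for this example.

```python
import re

# Hypothetical regex mixing named and numbered groups:
# "left" is group 1, the unnamed group is group 2, "right" is group 3.
MIXED = re.compile(r"(?P<left>\w+)=(\d+)-(?P<right>\w+)")

def group_numbers():
    # groupindex maps each group name to its backreference number.
    return dict(MIXED.groupindex)

print(group_numbers())
```

With this regex, a numbered replacement text would refer to the named group right as group 3, not group 2.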

Perl

$subject =~ s/(?<left>\w+)=(?<right>\w+)/$+{right}=$+{left}/g;

Perl supports named capturing groups starting with version 5.10. The %+ hash stores the text matched by all named capturing groups in the regular expression last used. You can use this hash in the replacement text string, as well as anywhere else.

Python

If you have only a few strings to process, you can use the global function:

result = re.sub(r"(?P<left>\w+)=(?P<right>\w+)", r"\g<right>=\g<left>",
                subject)

To use the same regex repeatedly, use a compiled object:

reobj = re.compile(r"(?P<left>\w+)=(?P<right>\w+)")
result = reobj.sub(r"\g<right>=\g<left>", subject)
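As a quick check of the named form, the following sketch wraps the compiled regex in a function and runs it on a sample string. The names PAIR and swap_named and the sample input are assumptions for illustration.

```python
import re

# Compiled once so it can be reused across many subject strings.
PAIR = re.compile(r"(?P<left>\w+)=(?P<right>\w+)")

def swap_named(subject):
    # \g<name> inserts the text matched by the named group.
    return PAIR.sub(r"\g<right>=\g<left>", subject)

print(swap_named("key=value"))  # prints: value=key
```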

Ruby

result = subject.gsub(/(?<left>\w+)=(?<right>\w+)/, '\k<right>=\k<left>')

See Also

Search and Replace with Regular Expressions in Chapter 1 describes the replacement text flavors.

Recipe 2.21 explains how to reference capturing groups in the replacement text.
