3.7. Retrieve the Matched Text

Problem

You have a regular expression that matches a part of the subject text, and you want to extract the text that was matched. If the regular expression can match the string more than once, you want only the first match. For example, when applying the regex \d+ to the string Do you like 13 or 42?, 13 should be returned.

Solution

C#

For quick one-off matches, you can use the static call:

string resultString = Regex.Match(subjectString, @"\d+").Value;

If the regex is provided by the end user, you should use the static call with full exception handling:

string resultString = null;
try {
    resultString = Regex.Match(subjectString, @"\d+").Value;
} catch (ArgumentNullException ex) {
    // Cannot pass null as the regular expression or subject string
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

To use the same regex repeatedly, construct a Regex object:

Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Match(subjectString).Value;

If the regex is provided by the end user, you should use the Regex object with full exception handling:

string resultString = null;
try {
    Regex regexObj = new Regex(@"\d+");
    try {
        resultString = regexObj.Match(subjectString).Value;
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

For quick one-off matches, you can use the static call:

Dim ResultString = Regex.Match(SubjectString, "\d+").Value

If the regex is provided by the end user, you should use the static call with full exception handling:

Dim ResultString As String = Nothing
Try
    ResultString = Regex.Match(SubjectString, "\d+").Value
Catch ex As ArgumentNullException
    'Cannot pass Nothing as the regular expression or subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

To use the same regex repeatedly, construct a Regex object:

Dim RegexObj As New Regex("\d+")
Dim ResultString = RegexObj.Match(SubjectString).Value

If the regex is provided by the end user, you should use the Regex object with full exception handling:

Dim ResultString As String = Nothing
Try
    Dim RegexObj As New Regex("\d+")
    Try
        ResultString = RegexObj.Match(SubjectString).Value
    Catch ex As ArgumentNullException
        'Cannot pass Nothing as the subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

Create a Matcher to run the search and store the result:

String resultString = null;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    resultString = regexMatcher.group();
}

If the regex is provided by the end user, you should use full exception handling:

String resultString = null;
try {
    Pattern regex = Pattern.compile("\\d+");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        resultString = regexMatcher.group();
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

JavaScript

var result = subject.match(/\d+/);
if (result) {
    result = result[0];
} else {
    result = '';
}

PHP

if (preg_match('/\d+/', $subject, $groups)) {
    $result = $groups[0];
} else {
    $result = '';
}

Perl

if ($subject =~ m/\d+/) {
    $result = $&;
} else {
    $result = '';
}

Python

For quick one-off matches, you can use the global function:

matchobj = re.search("regex pattern", subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""

To use the same regex repeatedly, use a compiled object:

reobj = re.compile("regex pattern")
matchobj = reobj.search(subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""

Ruby

You can use the =~ operator and its magic $& variable:

if subject =~ /regex pattern/
    result = $&
else
    result = ""
end

Alternatively, you can call the match method on a Regexp object:

matchobj = /regex pattern/.match(subject)
if matchobj
    result = matchobj[0]
else
    result = ""
end

Discussion

Extracting the part of a longer string that fits the pattern is another prime job for regular expressions. All programming languages discussed in this book provide an easy way to get the first regular expression match from a string. The function will attempt the regular expression at the start of the string and continue scanning through the string until the regular expression matches.
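
As a quick illustration of that left-to-right scan, here is a minimal Python sketch that applies \d+ to the subject string from the Problem statement. The variable names are chosen for illustration only.

```python
import re

subject = "Do you like 13 or 42?"

# search() starts at the beginning of the string and stops at the first match,
# so the later run of digits ("42") is never reached.
matchobj = re.search(r"\d+", subject)
first_match = matchobj.group() if matchobj else ""

print(first_match)
```

Only the first run of digits is returned. Retrieving every match is a separate task, covered by Recipes 3.10 and 3.11.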

.NET

The .NET Regex class does not have a member that returns the string matched by the regular expression. But it does have a Match() method that returns an instance of the Match class. This Match object has a property called Value, which holds the text matched by the regular expression. If the regular expression fails to match, it still returns a Match object, but the Value property holds an empty string.

A total of five overloads allows you to call the Match() method in various ways. The first parameter is always the string that holds the subject text in which you want the regular expression to find a match. This parameter should not be null. Otherwise, Match() will throw an ArgumentNullException.

If you want to use the regular expression only a few times, you can use a static call. The second parameter is then the regular expression you want to use. You can pass regex options as an optional third parameter. If your regular expression has a syntax error, an ArgumentException will be thrown.

If you want to use the same regular expression on many strings, you can make your code more efficient by constructing a Regex object first and then calling Match() on that object. The first parameter with the subject string is then the only required parameter. You can specify an optional second parameter to indicate the character index at which the regular expression should begin to search. Essentially, the number you pass as the second parameter is the number of characters at the start of your subject string that the regular expression should ignore. This can be useful when you’ve already processed the string up to a point and want to search the remainder of the string. If you specify this number, it must be in the range from zero to the length of the subject string. Otherwise, Match() throws an ArgumentOutOfRangeException.

If you specify the second parameter with the starting position, you can specify a third parameter that indicates the length of the substring the regular expression is allowed to search through. This number must be greater than or equal to zero and must not exceed the length of the subject string (first parameter) minus the starting offset (second parameter). For instance, regexObj.Match("123456", 3, 2) tries to find a match in "45". If the third parameter is greater than the length of the subject string, Match() throws an ArgumentOutOfRangeException. If the third parameter is not greater than the length of the subject string, but the sum of the second and third parameters is greater than the length of the string, then another IndexOutOfRangeException is thrown. If you allow the user to specify starting and ending positions, either check them before calling Match() or make sure to catch both out-of-range exceptions.

The static overloads do not allow for the parameters that specify which part of the string the regular expression can search through.

Java

To get the part of a string matched by a regular expression, you need to create a Matcher, as explained in Recipe 3.3. Then call the find() method on your matcher, without any parameters. If find() returns true, call group() without any parameters to retrieve the text matched by your regular expression. If find() returns false, you should not call group(), as all you’ll get is an IllegalStateException.

Matcher.find() takes one optional parameter with the starting position in the subject string. You can use this to begin the search at a certain position in the string. Specify zero to begin the match attempt at the start of the string. An IndexOutOfBoundsException is thrown if you set the starting position to a negative number, or to a number greater than the length of the subject string.

If you omit the parameter, find() starts at the character after the previous match found by find(). If you’re calling find() for the first time after Pattern.matcher() or Matcher.reset(), then find() begins searching at the start of the string.

JavaScript

The string.match() method takes a regular expression as its only parameter. You can pass the regular expression as a literal regex, a regular expression object, or as a string. If you pass a string, string.match() creates a temporary regexp object.

When the match attempt fails, string.match() returns null. This allows you to differentiate between a regex that finds no matches, and a regex that finds a zero-length match. It does mean that you cannot directly display the result, as “null” or an error about a null object may appear.

When the match attempt succeeds, string.match() returns an array with the details of the match. Element zero in the array is a string that holds the text matched by the regular expression.

Make sure that you do not add the /g flag to your regular expression. If you do, string.match() behaves differently, as Recipe 3.10 explains.

PHP

The preg_match() function discussed in the previous two recipes takes an optional third parameter to store the text matched by the regular expression and its capturing groups. When preg_match() returns 1, the variable holds an array of strings. Element zero in the array holds the overall regular expression match. The other elements are explained in Recipe 3.9.

Perl

When the pattern-matching operator m// finds a match, it sets several special variables. One of those is the $& variable, which holds the part of the string matched by the regular expression. The other special variables are explained in later recipes.

Python

Recipe 3.5 explains the search() function. This time, we store the MatchObject instance returned by search() into a variable. To get the part of the string matched by the regular expression, we call the group() method on the match object without any parameters.
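
The sketch below wraps these steps in a small helper. The name first_match is hypothetical and not part of the re module. As an assumption that differs from the Solution code, the helper returns None instead of an empty string when nothing matches, so that a failed match can be told apart from a zero-length match.

```python
import re

def first_match(pattern, subject):
    """Return the text of the first match, or None if there is no match.

    first_match is a hypothetical helper, used here for illustration only.
    """
    matchobj = re.search(pattern, subject)
    if matchobj is None:
        return None
    # group() without parameters is the same as group(0): the overall match
    return matchobj.group()

found = first_match(r"\d+", "Do you like 13 or 42?")  # first run of digits
missing = first_match(r"\d+", "no digits here")       # the regex cannot match
empty = first_match(r"\d*", "no digits here")         # zero-length match at the start
```

Here missing is None, whereas empty is an empty string, because \d* is allowed to match zero digits at the very start of the subject.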

Ruby

Recipe 3.8 explains the $~ variable and the MatchData object. In a string context, this object evaluates to the text matched by the regular expression. In an array context, this object evaluates to an array with element number zero holding the overall regular expression match.

$& is a special read-only variable. It is an alias for $~[0], which holds a string with the text matched by the regular expression.

See Also

Recipe 3.5 shows code to test whether a regex matches a subject string, without retrieving the actual match.

Recipe 3.8 shows code to determine the position and length of the match.

Recipe 3.9 shows code to get the text matched by a particular part (capturing group) of a regex.

Recipe 3.10 shows code to get a list of all the matches a regex can find in a string.

Recipe 3.11 shows code to iterate over all the matches a regex can find in a string.
