You have a regular expression that matches a part of the subject text, and you want to extract the text that was matched. If the regular expression can match the string more than once, you want only the first match. For example, when applying the regex ‹\d+› to the string "Do you like 13 or 42?", the text 13 should be returned.
For quick one-off matches, you can use the static call:
string resultString = Regex.Match(subjectString, @"\d+").Value;
If the regex is provided by the end user, you should use the static call with full exception handling:
string resultString = null;
try {
    resultString = Regex.Match(subjectString, @"\d+").Value;
} catch (ArgumentNullException ex) {
    // Cannot pass null as the regular expression or subject string
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
To use the same regex repeatedly, construct a Regex object:
Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Match(subjectString).Value;
If the regex is provided by the end user, you should use the Regex object with full exception handling:
string resultString = null;
try {
    Regex regexObj = new Regex(@"\d+");
    try {
        resultString = regexObj.Match(subjectString).Value;
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
For quick one-off matches, you can use the static call:
Dim ResultString = Regex.Match(SubjectString, "\d+").Value
If the regex is provided by the end user, you should use the static call with full exception handling:
Dim ResultString As String = Nothing
Try
    ResultString = Regex.Match(SubjectString, "\d+").Value
Catch ex As ArgumentNullException
    'Cannot pass Nothing as the regular expression or subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
To use the same regex repeatedly, construct a Regex object:
Dim RegexObj As New Regex("\d+")
Dim ResultString = RegexObj.Match(SubjectString).Value
If the regex is provided by the end user, you should use the Regex object with full exception handling:
Dim ResultString As String = Nothing
Try
    Dim RegexObj As New Regex("\d+")
    Try
        ResultString = RegexObj.Match(SubjectString).Value
    Catch ex As ArgumentNullException
        'Cannot pass Nothing as the subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
Create a Matcher to run the search and store the result:
String resultString = null;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    resultString = regexMatcher.group();
}
If the regex is provided by the end user, you should use full exception handling:
String resultString = null;
try {
    Pattern regex = Pattern.compile("\\d+");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        resultString = regexMatcher.group();
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
For quick one-off matches, you can use the global function:
matchobj = re.search("regex pattern", subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""
To use the same regex repeatedly, use a compiled object:
reobj = re.compile("regex pattern")
matchobj = reobj.search(subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""
Extracting the part of a longer string that fits the pattern is another prime job for regular expressions. All programming languages discussed in this book provide an easy way to get the first regular expression match from a string. The function will attempt the regular expression at the start of the string and continue scanning through the string until the regular expression matches.
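The first-match behavior described above can be illustrated with a minimal Python sketch. The helper name first_match is hypothetical and exists only for this illustration; it assumes that returning None is an acceptable way to signal "no match".

```python
import re

def first_match(pattern, subject):
    """Return the text of the first match of pattern in subject, or None."""
    # re.search() starts at the beginning of the subject and scans forward
    # until the pattern matches somewhere, or the end of the string is reached.
    matchobj = re.search(pattern, subject)
    return matchobj.group() if matchobj else None

print(first_match(r"\d+", "Do you like 13 or 42?"))  # prints 13
print(first_match(r"\d+", "No digits here"))         # prints None
```

Only 13 is returned in the first call because the scan stops at the first successful match; 42 is never examined.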
The .NET Regex class does not have a member that returns the string matched by the regular expression. But it does have a Match() method that returns an instance of the Match class. This Match object has a property called Value, which holds the text matched by the regular expression. If the regular expression fails to match, Match() still returns a Match object, but the Value property holds an empty string.
A total of five overloads allows you to call the Match() method in various ways.
The first parameter is always the string that holds the subject text in which you want the regular expression to find a match. This parameter should not be null. Otherwise, Match() will throw an ArgumentNullException.
If you want to use the regular expression only a few times, you can use a static call. The second parameter is then the regular expression you want to use. You can pass regex options as an optional third parameter. If your regular expression has a syntax error, an ArgumentException will be thrown.
If you want to use the same regular expression on many strings, you can make your code more efficient by constructing a Regex object first and then calling Match() on that object. The first parameter with the subject string is then the only required parameter. You can specify an optional second parameter to indicate the character index at which the regular expression should begin to search. Essentially, the number you pass as the second parameter is the number of characters at the start of your subject string that the regular expression should ignore. This can be useful when you’ve already processed the string up to a point and want to search the remainder of the string. If you specify this number, it must be in the range from zero to the length of the subject string. Otherwise, Match() throws an ArgumentOutOfRangeException.
If you specify the second parameter with the starting position, you can specify a third parameter that indicates the length of the substring the regular expression is allowed to search through. This number must be greater than or equal to zero and must not exceed the length of the subject string (first parameter) minus the starting offset (second parameter). For instance, regexObj.Match("123456", 3, 2) tries to find a match in "45". If the third parameter is greater than the length of the subject string, Match() throws an ArgumentOutOfRangeException. If the third parameter is not greater than the length of the subject string, but the sum of the second and third parameters is greater than the length of the string, then an IndexOutOfRangeException is thrown. If you allow the user to specify starting and ending positions, either check them before calling Match() or make sure to catch both out-of-range exceptions.
The static overloads do not allow for the parameters that specify which part of the string the regular expression can search through.
To get the part of a string matched by a regular expression, you need to create a Matcher, as explained in Recipe 3.3. Then call the find() method on your matcher, without any parameters. If find() returns true, call group() without any parameters to retrieve the text matched by your regular expression. If find() returns false, you should not call group(), as all you’ll get is an IllegalStateException.
Matcher.find() takes one optional parameter with the starting position in the subject string. You can use this to begin the search at a certain position in the string. Specify zero to begin the match attempt at the start of the string. An IndexOutOfBoundsException is thrown if you set the starting position to a negative number, or to a number greater than the length of the subject string.
If you omit the parameter, find() starts at the character after the previous match found by find(). If you’re calling find() for the first time after Pattern.matcher() or Matcher.reset(), then find() begins searching at the start of the string.
The string.match() method takes a regular expression as its only parameter. You can pass the regular expression as a literal regex, a regular expression object, or as a string. If you pass a string, string.match() creates a temporary regexp object.
When the match attempt fails, string.match() returns null. This allows you to differentiate between a regex that finds no matches and a regex that finds a zero-length match. It does mean that you cannot directly display the result, as the text “null” or an error about a null object may appear.
When the match attempt succeeds, string.match() returns an array with the details of the match. Element zero in the array is a string that holds the text matched by the regular expression.
Make sure that you do not add the /g flag to your regular expression. If you do, string.match() behaves differently, as Recipe 3.10 explains.
The preg_match() function discussed in the previous two recipes takes an optional third parameter to store the text matched by the regular expression and its capturing groups. When preg_match() returns 1, the variable holds an array of strings. Element zero in the array holds the overall regular expression match. The other elements are explained in Recipe 3.9.
When the pattern-matching operator m// finds a match, it sets several special variables. One of those is the $& variable, which holds the part of the string matched by the regular expression. The other special variables are explained in later recipes.
Recipe 3.5 explains the search() function. This time, we store the MatchObject instance returned by search() into a variable. To get the part of the string matched by the regular expression, we call the group() method on the match object without any parameters.
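As a rough sketch of why the result of search() should be tested before calling group(), the snippet below contrasts a successful match, a failed match, and a zero-length match. The variable names are illustrative assumptions, not part of the recipe.

```python
import re

subject = "Do you like 13 or 42?"

# search() returns a match object on success and None on failure.
matchobj = re.search(r"\d+", subject)
result = matchobj.group() if matchobj else ""
print(result)  # prints 13

# A failed match yields None, so calling group() on it directly
# would raise an AttributeError.
no_match = re.search(r"\d+", "no digits")
print(no_match is None)  # prints True

# A zero-length match still yields a match object; group() returns "".
empty = re.search(r"\d*", "no digits")
print(empty is not None and empty.group() == "")  # prints True
```

Falling back to an empty string, as the recipe's code does, hides the difference between the last two cases. Keep the match object around if that difference matters.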
Recipe 3.8 explains the $~ variable and the MatchData object. In a string context, this object evaluates to the text matched by the regular expression. In an array context, this object evaluates to an array with element number zero holding the overall regular expression match.
$& is a special read-only variable. It is an alias for $~[0], which holds a string with the text matched by the regular expression.
Recipe 3.5 shows code to test whether a regex matches a subject string, without retrieving the actual match.
Recipe 3.8 shows code to determine the position and length of the match.
Recipe 3.9 shows code to get the text matched by a particular part (capturing group) of a regex.
Recipe 3.10 shows code to get a list of all the matches a regex can find in a string.
Recipe 3.11 shows code to iterate over all the matches a regex can find in a string.