3.12. Validate Matches in Procedural Code

Problem

Recipe 3.10 shows how you can retrieve a list of all matches a regular expression can find in a string when it is applied repeatedly to the remainder of the string after each match. Now you want to get a list of matches that meet certain extra criteria that you cannot (easily) express in a regular expression. For example, when retrieving a list of lucky numbers, you only want to retain those that are an integer multiple of 13.

Solution

C#

You can use the static call when you process only a small number of strings with the same regular expression:

StringCollection resultList = new StringCollection();
Match matchResult = Regex.Match(subjectString, @"\d+");
while (matchResult.Success) {
    if (int.Parse(matchResult.Value) % 13 == 0) {
        resultList.Add(matchResult.Value);
    }
    matchResult = matchResult.NextMatch();
}

Construct a Regex object if you want to use the same regular expression with a large number of strings:

StringCollection resultList = new StringCollection();
Regex regexObj = new Regex(@"\d+");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    if (int.Parse(matchResult.Value) % 13 == 0) {
        resultList.Add(matchResult.Value);
    }
    matchResult = matchResult.NextMatch();
}

VB.NET

You can use the static call when you process only a small number of strings with the same regular expression:

Dim ResultList = New StringCollection
Dim MatchResult = Regex.Match(SubjectString, "\d+")
While MatchResult.Success
    If Integer.Parse(MatchResult.Value) Mod 13 = 0 Then
        ResultList.Add(MatchResult.Value)
    End If
    MatchResult = MatchResult.NextMatch
End While

Construct a Regex object if you want to use the same regular expression with a large number of strings:

Dim ResultList = New StringCollection
Dim RegexObj As New Regex("\d+")
Dim MatchResult = RegexObj.Match(SubjectString)
While MatchResult.Success
    If Integer.Parse(MatchResult.Value) Mod 13 = 0 Then
        ResultList.Add(MatchResult.Value)
    End If
    MatchResult = MatchResult.NextMatch
End While

Java

List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (Integer.parseInt(regexMatcher.group()) % 13 == 0) {
        resultList.add(regexMatcher.group());
    }
}

JavaScript

var list = [];
var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
    // Don't let browsers get stuck in an infinite loop
    if (match.index == regex.lastIndex) regex.lastIndex++;
    // Here you can process the match stored in the match variable
    if (match[0] % 13 == 0) {
        list.push(match[0]);
    }
}

XRegExp

var list = [];
XRegExp.forEach(subject, /\d+/, function(match) {
   if (match[0] % 13 == 0) {
       list.push(match[0]);
   }
});

PHP

$list = array();
preg_match_all('/\d+/', $subject, $matchdata, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matchdata[0]); $i++) {
    if ($matchdata[0][$i] % 13 == 0) {
        $list[] = $matchdata[0][$i];
    }
}

Perl

while ($subject =~ m/\d+/g) {
    if ($& % 13 == 0) {
        push(@list, $&);
    }
}

Python

If you process only a small number of strings with the same regular expression, you can use the global function:

list = []
for matchobj in re.finditer(r"\d+", subject):
    if int(matchobj.group()) % 13 == 0:
        list.append(matchobj.group())

To use the same regex repeatedly, use a compiled object:

list = []
reobj = re.compile(r"\d+")
for matchobj in reobj.finditer(subject):
    if int(matchobj.group()) % 13 == 0:
        list.append(matchobj.group())

Ruby

list = []
subject.scan(/\d+/) {|match|
    list << match if (Integer(match) % 13 == 0)
}

Discussion

Regular expressions deal with text. Though the regular expression \d+ matches what we call a number, to the regular expression engine it’s just a string of one or more digits.
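To illustrate, here is a minimal Python sketch showing that matches come back as strings and only acquire a numeric meaning after an explicit conversion. The sample subject string is made up for illustration and is not part of the recipe.

```python
import re

# Hypothetical sample text, chosen only for illustration
subject = "Tickets 0026, 7 and 39"

# The regex engine returns each run of digits as a plain string
matches = re.findall(r"\d+", subject)
all_strings = all(isinstance(m, str) for m in matches)

# Explicit conversion turns the digit strings into integers;
# leading zeros carry no numeric meaning and disappear
numbers = [int(m) for m in matches]
```

Note that "0026" and "26" are different strings but the same integer, which is one reason the numeric check belongs in procedural code rather than in the regex.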

If you want to find specific numbers, such as those divisible by 13, it is much easier to write a general regex that matches all numbers, and then use a bit of procedural code to skip the regex matches you’re not interested in.

The solutions for this recipe are all based on the solutions for the previous recipe, which shows how to iterate over all matches. Inside the loop, we convert the regular expression match into a number.

Some languages do this automatically; other languages require an explicit function call to convert the string into an integer. We then check whether the integer is divisible by 13. If it is, the regex match is added to the list. If it is not, the regex match is skipped.
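The convert-then-check step can be separated from the iteration so that the extra criterion is easy to swap. The following is only a sketch under that assumption; the helper names filter_matches and is_multiple_of_13 and the sample string are hypothetical and do not appear in the recipe.

```python
import re

def filter_matches(pattern, subject, keep):
    """Return the text of every regex match for which keep(text) is true."""
    return [m.group() for m in re.finditer(pattern, subject) if keep(m.group())]

def is_multiple_of_13(text):
    # Explicit conversion: Python does not coerce strings to numbers
    return int(text) % 13 == 0

# Hypothetical sample input, for illustration only
lucky = filter_matches(r"\d+", "13 14 26 100 169", is_multiple_of_13)
```

Passing the test as a function keeps the regex general (all numbers) while the procedural code decides which matches to retain.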

See Also

Recipe 3.11 was used as a basis for this recipe. It explains how iterating over regex matches works.

Recipe 3.7 shows code to get only the first regex match.

Recipe 3.8 shows code to determine the position and length of the match.

Recipe 3.10 shows code to get a list of all the matches a regex can find in a string.
