Using iterators

The library also provides an iterator class for regular expressions, which offers a different way to parse strings. Since the class compares strings, it is templated on the element type and traits. The class needs to iterate through strings, so the first template parameter is the string iterator type, and by default the element and traits types are deduced from it. The regex_iterator class is a forward iterator, so it has a ++ operator, and its * operator gives access to a match_results object. In the previous code, you saw that a match_results object is passed to the regex_match and regex_search functions, which use it to hold their results. This raises the question of what code fills the match_results object accessed through the regex_iterator. The answer lies in the iterator's ++ operator:

    string str = "the cat sat on the mat in the bathroom";
    regex rx("\\b(.at)([^ ]*)");
    regex_iterator<string::iterator> next(str.begin(), str.end(), rx);
    regex_iterator<string::iterator> end; // end of sequence iterator

    for (; next != end; ++next)
    {
        cout << next->position() << " " << next->str() << ", ";
    }
    cout << "\n";
    // 4 cat, 8 sat, 19 mat, 30 bathroom,

In this code, a string is searched for words where the second and third letters are at. The \b says that the pattern must start at a word boundary, which here anchors each match to the start of a word (the . means that the word can start with any character). There is a capture group around these three characters and a second capture group for zero or more characters other than spaces.

The iterator object next is constructed with iterators to the string to search and the regex object. The constructor performs the first search, and the ++ operator essentially calls the regex_search function again while maintaining the position at which to start the next search. If the search fails to find the pattern, the iterator becomes equal to the end of sequence iterator, which is the iterator that is created by the default constructor (the end object in this code). This code prints out the full match because we use the default parameter for the str method (0). If you want only the first capture group (the three characters matched by .at), use str(1) and the result will be:

    4 cat, 8 sat, 19 mat, 30 bat
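
The effect of the str parameter can be sketched with a small helper. This is a minimal illustration rather than code from the text: the function name collect_group is hypothetical, and it uses sregex_iterator, the library's alias for regex_iterator<string::const_iterator>, so that it can accept a const string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the text): returns submatch `group`
// of every match of rx in text, in the order the matches are found.
std::vector<std::string> collect_group(const std::string& text,
                                       const std::regex& rx,
                                       std::size_t group)
{
    std::vector<std::string> out;
    std::sregex_iterator next(text.begin(), text.end(), rx);
    std::sregex_iterator end; // default-constructed: end of sequence

    for (; next != end; ++next)
    {
        // 0 is the whole match, 1 and above are the capture groups
        out.push_back(next->str(group));
    }
    return out;
}
```

Called with the string and regex used above, a group of 0 should give the whole words and a group of 1 should give the three characters captured by (.at).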

Since the * (and the ->) operator gives access to a match_results object, you can also call the prefix method to get the text that precedes the match and the suffix method to get the text that follows it.
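
As a rough sketch of prefix and suffix in use, the following helper records the text around each match. The Context struct and the match_contexts function are hypothetical names introduced only for this illustration. Note that when iterating, the prefix of the second and later matches starts where the previous match ended, not at the start of the string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record type (not from the text) holding a match
// together with the text on either side of it.
struct Context
{
    std::string before; // from prefix()
    std::string match;  // the whole match
    std::string after;  // from suffix()
};

// Hypothetical helper: gathers the prefix, match and suffix
// for every match of rx in text.
std::vector<Context> match_contexts(const std::string& text,
                                    const std::regex& rx)
{
    std::vector<Context> out;
    std::sregex_iterator next(text.begin(), text.end(), rx);
    std::sregex_iterator end;

    for (; next != end; ++next)
    {
        // prefix() and suffix() return sub_match objects; str() copies the text
        out.push_back({ next->prefix().str(), next->str(), next->suffix().str() });
    }
    return out;
}
```

For the example string, the first entry should have "the " as its prefix and the remainder of the sentence after cat as its suffix.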

The regex_iterator class allows you to iterate over the matches, whereas regex_token_iterator iterates over selected submatches of each match, so its * operator gives access to a sub_match object rather than a match_results object. In use, this class is the same as regex_iterator, except in construction. The regex_token_iterator constructor has a parameter to indicate which submatch you wish to access through the * operator. A value of -1 means you want the text that was not matched (the text between the matches), a value of 0 means you want the whole match, and a value of 1 or above means you want the numbered submatch. If you wish, you can pass an int vector or C array with the submatch indices that you want:

    using iter = regex_token_iterator<string::iterator>;
    string str = "the cat sat on the mat in the bathroom";
    regex rx("\\b(.at)([^ ]*)");
    iter next, end;

    // get the text between the matches
    next = iter(str.begin(), str.end(), rx, -1);
    for (; next != end; ++next) cout << next->str() << ", ";
    cout << "\n";
    // the , , on the , in the ,

    // get the complete match
    next = iter(str.begin(), str.end(), rx, 0);
    for (; next != end; ++next) cout << next->str() << ", ";
    cout << "\n";
    // cat, sat, mat, bathroom,

    // get the sub match 1
    next = iter(str.begin(), str.end(), rx, 1);
    for (; next != end; ++next) cout << next->str() << ", ";
    cout << "\n";
    // cat, sat, mat, bat,

    // get the sub match 2
    next = iter(str.begin(), str.end(), rx, 2);
    for (; next != end; ++next) cout << next->str() << ", ";
    cout << "\n";
    // , , , hroom,
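
The listing above passes a single index each time. Passing several indices at once, as mentioned earlier, might look like the following sketch. The tokens function is a hypothetical helper written for this illustration, and sregex_token_iterator is the library's alias for regex_token_iterator<string::const_iterator>. When more than one index is given, the iterator visits the requested submatches of the first match in order, then those of the second match, and so on.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the text): returns every token produced by
// a regex_token_iterator that was given a list of submatch indices.
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& rx,
                                const std::vector<int>& groups)
{
    std::vector<std::string> out;
    std::sregex_token_iterator next(text.begin(), text.end(), rx, groups);
    std::sregex_token_iterator end; // default-constructed: end of sequence

    for (; next != end; ++next)
    {
        out.push_back(next->str()); // *next is a sub_match
    }
    return out;
}
```

With the example string and regex, the index list {1, 2} should yield the two capture groups of each match in turn: cat and an empty string, sat and an empty string, mat and an empty string, then bat and hroom.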