How it works...

A simple regular expression that can parse the input file shown earlier may look like this:

    ^(?!#)(\w+)\s*=\s*([\w\d]+[\w\d._,\-:]*)$

This regular expression is supposed to ignore all lines that start with a #; for those that do not start with #, match a name followed by the equal sign and then a value that can be composed of alphanumeric characters and several other characters (underscore, dot, comma, and so on). The exact meaning of this regular expression is explained as follows:

Part	Description
`^`	Start of line
`(?!#)`	A negative lookahead that makes sure the next character is not #
`(\w+)`	A capturing group representing an identifier of at least one word character
`\s*`	Zero or more whitespace characters
`=`	Equal sign
`\s*`	Zero or more whitespace characters
`([\w\d]+[\w\d._,\-:]*)`	A capturing group representing a value that starts with an alphanumeric character, but can also contain a dot, comma, hyphen (written with an escaping backslash), colon, or an underscore.
`$`	End of line

We can use std::regex_search() to search for a match anywhere in the input text. This algorithm has several overloads, but in general they work in the same way. You must specify the range of characters to work through, an output std::match_results object that will contain the result of the match, and a std::basic_regex object representing the regular expression and matching flags (that define the way the search is done). The function returns true if a match was found or false otherwise.

In the first example from the previous section (see the 4th list item), match is an instance of std::smatch, which is a typedef of std::match_results with std::string::const_iterator as the template type. If a match is found, this object contains the matching information as a sequence of values, one for each matched subexpression. The submatch at index 0 is always the entire match. The submatch at index 1 is the first subexpression that was matched, the submatch at index 2 is the second, and so on. Since our regular expression has two capturing groups (which are subexpressions), the std::match_results object holds three submatches on success. The identifier representing the name is at index 1, and the value after the equal sign is at index 2. Because std::regex_search() stops at the first match it finds, this code prints only the following:

    timeout=120

A single call to std::regex_search() cannot iterate through all the possible matches in a text. To do that, we need to use an iterator, and std::regex_iterator is intended for this purpose. It allows not only iterating through all the matches, but also accessing all the submatches of a match. The iterator calls std::regex_search() upon construction and on each increment, and it stores the resulting std::match_results from the call. The default constructor creates an iterator that represents the end of the sequence and can be used to test when the loop through the matches should stop.

In the second example from the previous section (see the 5th list item), we first create an end-of-sequence iterator, and then we start iterating through all the possible matches. When constructed, the iterator calls std::regex_search(), and if a match is found, we can access its results through the current iterator. This continues until no match is found (the end of the sequence). This code will print the following output:

    'timeout'='120' 
    'server'='127.0.0.1'

An alternative to std::regex_iterator is std::regex_token_iterator. It works similarly to std::regex_iterator and, in fact, contains such an iterator internally, except that it enables us to access a particular subexpression from a match. This is shown in the third example in the How to do it... section (see the 6th list item). We start by creating an end-of-sequence iterator and then loop through the matches until the end of the sequence is reached. In the constructor we used, we did not specify the index of the subexpression to access through the iterator; therefore, the default value of 0 is used. That means this program will print the entire matches:

    timeout=120 
    server = 127.0.0.1

If we want to access only the first subexpression (the names, in our case), all we have to do is specify the index of the subexpression in the constructor of the token iterator, as in the following code. This time, the output contains only the names, timeout and server:

    auto end = std::sregex_token_iterator{}; 
    for (auto it = std::sregex_token_iterator{ std::begin(text),  
                   std::end(text), rx, 1 }; 
         it != end; ++it) 
    { 
      std::cout << *it << std::endl; 
    }

An interesting thing about the token iterator is that it can return the unmatched parts of the string if the index of the subexpression is -1. In that case, each element is a std::sub_match object that corresponds to the sequence of characters between the previous match (or the start of the sequence) and the current match, and the final element covers the characters between the last match and the end of the sequence:

    auto end = std::sregex_token_iterator{}; 
    for (auto it = std::sregex_token_iterator{ std::begin(text),  
                   std::end(text), rx, -1 }; 
         it != end; ++it) 
    { 
      std::cout << *it << std::endl; 
    }

This program will output the following (note that the empty lines are actually part of the output):

 

    #remove # to uncomment the following lines 


    #retrycount=3
