15 STRINGS


If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.
—Nelson Mandela

The STL provides a special string container for human-language data, such as words, sentences, and markup languages. Available in the <string> header, the std::basic_string is a class template that you can specialize on a string’s underlying character type. As a sequential container, basic_string is essentially similar to a vector but with some special facilities for manipulating language.

STL basic_string provides major safety and feature improvements over C-style or null-terminated strings, and because human-language data inundates most modern programs, you’ll probably find basic_string indispensable.

std::string

The STL provides four basic_string specializations in the <string> header. Each specialization implements a string using one of the fundamental character types that you learned about in Chapter 2:

std::string for char is used for character sets like ASCII.
std::wstring for wchar_t is large enough to contain the largest character of the implementation’s locale.
std::u16string for char16_t is used for character sets like UTF-16.
std::u32string for char32_t is used for character sets like UTF-32.

You’ll use the specialization with the appropriate underlying type. Because these specializations have the same interface, all the examples in this chapter will use std::string.

Constructing

The basic_string container takes three template parameters:

The underlying character type, T
The underlying type’s traits, Traits
An allocator, Alloc

Of these, only T is required. The STL’s std::char_traits template class in the <string> header abstracts character and string operations from the underlying character type. Also, unless you plan on supporting a custom character type, you won’t need to implement your own type traits, because char_traits has specializations available for char, wchar_t, char16_t, and char32_t. When the stdlib provides specializations for a type, you won’t need to provide it yourself unless you require some kind of exotic behavior.

Together, a basic_string specialization looks like this, where T is a character type:

std::basic_string<T, Traits=std::char_traits<T>, Alloc=std::allocator<T>>

NOTE

In most cases, you’ll be dealing with one of the predefined specializations, especially string or wstring. However, if you need a custom allocator, you’ll need to specialize basic_string appropriately.
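As a quick check of the template signature above, the following sketch uses static_assert to confirm that the predefined names are just aliases for basic_string specializations with defaulted traits and allocator. The alias explicit_string and the function count_chars are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical alias that spells out every template argument explicitly.
using explicit_string =
    std::basic_string<char, std::char_traits<char>, std::allocator<char>>;

// std::string is the same type as the fully spelled-out specialization.
static_assert(std::is_same<std::string, explicit_string>::value,
              "string uses the default traits and allocator");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "wstring is basic_string<wchar_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");

// Generic over the character type, so it accepts any specialization.
template <typename T, typename Traits, typename Alloc>
std::size_t count_chars(const std::basic_string<T, Traits, Alloc>& s) {
  return s.size();
}
```

Because the interface is shared, a function template such as count_chars works unchanged for string, wstring, u16string, and u32string.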

The basic_string<T> container supports the same constructors as vector<T>, plus additional convenience constructors for converting a C-style string. In other words, a string supports the constructors of vector<char>, a wstring supports the constructors of vector<wchar_t>, and so on. As with vector, use parentheses for all basic_string constructors except when you actually want an initializer list.

You can default construct an empty string, or if you want to fill a string with a repeating character, you can use the fill constructor by passing a size_t and a char, as Listing 15-1 illustrates.

#include <string>
TEST_CASE("std::string supports constructing") {
  SECTION("empty strings") {
    std::string cheese; ➊
    REQUIRE(cheese.empty()); ➋
  }
  SECTION("repeated characters") {
    std::string roadside_assistance(3, 'A'); ➌
    REQUIRE(roadside_assistance == "AAA"); ➍
  }
}

Listing 15-1: The default and fill constructors of string

After you default construct a string ➊, it contains no elements ➋. If you want to fill the string with repeating characters, you can use the fill constructor by passing in the number of elements you want to fill and their value ➌. The example fills a string with three A characters ➍.
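The advice to prefer parentheses matters most for the fill constructor. The sketch below, using hypothetical helper names, contrasts the two spellings and also shows the iterator-range constructor that string shares with vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Parentheses select the fill constructor: three copies of 'A'.
std::string fill_with_parens() { return std::string(3, 'A'); }

// Braces select the initializer-list constructor instead: the two
// characters char(3) and 'A', which is rarely what you want.
std::string fill_with_braces() { return std::string{3, 'A'}; }

// The iterator-range constructor copies any sequence of chars.
std::string from_range(const std::vector<char>& chars) {
  return std::string(chars.begin(), chars.end());
}
```

The braced version compiles without complaint, so this mistake tends to surface only when a test inspects the resulting length.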

NOTE

You’ll learn about std::string comparisons with operator== later in the chapter. Because you generally handle C-style strings with raw pointers or raw arrays, operator== on them compares addresses and returns true only when given the same object. However, for std::string, operator== returns true if the contents are equivalent. As you can see in Listing 15-1, the comparison works even when one of the operands is a C-style string literal.

The string constructor also offers two const char*-based constructors. If the argument points to a null-terminated string, the string constructor can determine the input’s length on its own. If the pointer does not point to a null-terminated string or if you only want to use the first part of a string, you can pass a length argument that informs the string constructor of how many elements to copy, as Listing 15-2 illustrates.

TEST_CASE("std::string supports constructing substrings ") {
  auto word = "gobbledygook"; ➊
  REQUIRE(std::string(word) == "gobbledygook"); ➋
  REQUIRE(std::string(word, 6) == "gobble"); ➌
}

Listing 15-2: Constructing a string from C-style strings

You create a const char* called word pointing to the C-style string literal gobbledygook ➊. Next, you construct a string by passing word. As expected, the resulting string contains gobbledygook ➋. In the next test, you pass the number 6 as a second argument. This causes string to only take the first six characters of word, resulting in the string containing gobble ➌.
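The length argument is what makes it safe to read from a buffer that has no null terminator, such as a fixed-width field in a binary record. The following is a minimal sketch; from_fixed_field is a hypothetical helper, not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a string from a fixed-size field that is NOT null terminated.
// The explicit length stops the constructor from reading past the buffer.
std::string from_fixed_field(const char* field, std::size_t field_size) {
  assert(field != nullptr);  // a null pointer here is undefined behavior
  return std::string(field, field_size);
}
```

Calling std::string(field) on such a buffer would instead keep reading until it happened to find a zero byte, which is undefined behavior.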

Additionally, you can construct strings from other strings. As an STL container, string fully supports copy and move semantics. You can also construct a string from a substring—a contiguous subset of another string. Listing 15-3 illustrates these three constructors.

TEST_CASE("std::string supports") {
  std::string word("catawampus"); ➊
  SECTION("copy constructing") {
    REQUIRE(std::string(word) == "catawampus"); ➋
  }
  SECTION("move constructing") {
    REQUIRE(std::string(move(word)) == "catawampus"); ➌
  }
  SECTION("constructing from substrings") {
    REQUIRE(std::string(word, 0, 3) == "cat"); ➍
    REQUIRE(std::string(word, 4) == "wampus"); ➎
  }
}

Listing 15-3: Copy, move, and substring construction of string objects

NOTE

In Listing 15-3, word is in a moved-from state, which, you’ll recall from “Move Semantics” on page 122, means it can only be reassigned or destructed.

Here, you construct a string called word containing the characters catawampus ➊. Copy construction yields another string containing a copy of the characters of word ➋. Move construction steals the characters of word, resulting in a new string containing catawampus ➌. Finally, you can construct a new string based on substrings. By passing word, a starting position of 0, and a length of 3, you construct a new string containing the characters cat ➍. If you instead pass word and a starting position of 4 (without a length), you get all the characters from index 4 (the fifth character) to the end of the original string, resulting in wampus ➎.

The string class also supports literal construction with std::string_literals::operator""s. The major benefit is notational convenience, but you can also use operator""s to embed null characters within a string easily, as Listing 15-4 illustrates.

TEST_CASE("constructing a string with") {
  SECTION("std::string(char*) stops at embedded nulls") {
    std::string str("idioglossia\0ellohay!"); ➊
    REQUIRE(str.length() == 11); ➋
  }
  SECTION("operator\"\"s incorporates embedded nulls") {
    using namespace std::string_literals; ➌
    auto str_lit = "idioglossia\0ellohay!"s; ➍
    REQUIRE(str_lit.length() == 20); ➎
  }
}

Listing 15-4: Constructing a string

In the first test, you construct a string using the literal idioglossia\0ellohay! ➊, which results in a string containing idioglossia ➋. The remainder of the literal didn’t get copied into the string because the const char* constructor stops at the embedded null. In the second test, you bring in the std::string_literals namespace ➌ so you can use operator""s to construct a string from a literal directly ➍. Unlike the std::string constructor ➊, operator""s yields a string containing the entire literal, embedded null byte and all ➎.

Table 15-1 summarizes the options for constructing a string. In this table, c is a char, n and pos are size_t, str is a string or a C-style string, c_str is a C-style string, and beg and end are input iterators.

Table 15-1: Supported std::string Constructors

Constructor	Produces a string containing
string()	No characters.
string(n, c)	c repeated n times.
string(str, pos, [n])	The half-open range pos to pos+n of str. Substring extends from pos to str’s end if n is omitted.
string(c_str, [n])	A copy of c_str, which has length n. If c_str is null terminated, n defaults to the null-terminated string’s length.
string(beg, end)	A copy of the elements in the half-open range from beg to end.
string(str)	A copy of str.
string(move(str))	The contents of str, which is in a moved-from state after construction.
string{ c1, c2, c3 }	The characters c1, c2, and c3.
"my string literal"s	A string containing the characters my string literal.

String Storage and Small String Optimizations

Exactly like vector, string uses dynamic storage to store its constituent elements contiguously. Accordingly, vector and string have very similar copy/move-construction/assignment semantics. For example, copy operations are potentially more expensive than move operations because the contained elements reside in dynamic memory.

The most popular STL implementations have small string optimizations (SSO). The SSO places the contents of a string within the object’s storage (rather than dynamic storage) if the contents are small enough. As a general rule, a string with fewer than 24 bytes is an SSO candidate. Implementers make this optimization because in many modern programs, most strings are short. (A vector doesn’t have any small optimizations.)

NOTE

Practically, SSO affects moves in two ways. First, any references to the elements of a string will invalidate if the string moves. Second, moves are potentially slower for strings than vectors because strings need to check for SSO.
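If you are curious whether your implementation applies SSO to a particular string, you can check whether the character data lives inside the string object itself. The sketch below is implementation specific by nature: stored_inline is a hypothetical helper, and its result for short strings depends on your standard library.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Reports whether the characters of s live inside the string object itself
// (an SSO hit) rather than in a separate dynamic allocation.
bool stored_inline(const std::string& s) {
  const char* object_begin = reinterpret_cast<const char*>(&s);
  const char* object_end = object_begin + sizeof(std::string);
  const char* chars = s.data();
  // std::less and friends give a total order even over unrelated pointers.
  return std::greater_equal<const char*>{}(chars, object_begin) &&
         std::less<const char*>{}(chars, object_end);
}
```

A string far longer than sizeof(std::string) can never be stored inline, so stored_inline should return false for it on any implementation. For a short string such as "hi", expect true on implementations that use SSO.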

A string has a size (or length) and a capacity. The size is the number of characters contained in the string, and the capacity is the number of characters that the string can hold before needing to resize.

Table 15-2 contains methods for reading and manipulating the size and capacity of a string. In this table, n is a size_t. An asterisk (*) indicates that this operation invalidates raw pointers and iterators to the elements of s in at least some circumstances.

Table 15-2: Supported std::string Storage and Length Methods

Method	Returns
s.empty()	true if s contains no characters; otherwise false.
s.size()	The number of characters in s.
s.length()	Identical to s.size()
s.max_size()	The maximum possible size of s (due to system/runtime limitations).
s.capacity()	The number of characters s can hold before needing to resize.
s.shrink_to_fit()	void; issues a non-binding request to reduce s.capacity() to s.size().*
s.reserve([n])	void; if n > s.capacity(), resizes so s can hold at least n elements; otherwise, issues a non-binding request* to reduce s.capacity() to n or s.size(), whichever is greater.

NOTE

At press time, the draft C++20 standard changes the behavior of the reserve method when its argument is less than the size of the string. This will match the behavior of vector, where there is no effect rather than being equivalent to invoking shrink_to_fit.

Note that the size and capacity methods of string match those of vector very closely. This is a direct result of the closeness of their storage models.
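To see why reserve is useful, you can count how often the capacity changes while appending characters one at a time. This is a small sketch with a hypothetical helper, count_capacity_changes; the exact growth pattern of an unreserved string is implementation defined, so the example makes no claim about it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends n copies of c one at a time and counts how often capacity changes,
// which corresponds to a reallocation of the underlying storage.
std::size_t count_capacity_changes(std::string& s, std::size_t n, char c) {
  std::size_t changes = 0;
  std::size_t last = s.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    s.push_back(c);
    if (s.capacity() != last) {
      ++changes;
      last = s.capacity();
    }
  }
  return changes;
}
```

If you call reserve(1000) on an empty string first, appending 1,000 characters causes no capacity changes, because the string already has room for them.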

Element and Iterator Access

Because string offers random-access iterators to contiguous elements, it accordingly exposes similar element- and iterator-access methods to vector.

For interoperation with C-style APIs, string also exposes a c_str method, which returns a non-modifiable, null-terminated version of the string as a const char*, as Listing 15-5 illustrates.

TEST_CASE("string's c_str method makes null-terminated strings") {
  std::string word("horripilation"); ➊
  auto as_cstr = word.c_str(); ➋
  REQUIRE(as_cstr[0] ==  'h'); ➌
  REQUIRE(as_cstr[1] ==  'o');
  REQUIRE(as_cstr[11] == 'o');
  REQUIRE(as_cstr[12] == 'n');
  REQUIRE(as_cstr[13] == '\0'); ➍
}

Listing 15-5: Extracting a null-terminated string from a string

You construct a string called word containing the characters horripilation ➊ and use its c_str method to extract a null-terminated string called as_cstr ➋. Because as_cstr is a const char*, you can use operator[] to illustrate that it contains the same characters as word ➌ and that it is null terminated ➍.

NOTE

The std::string class also supports operator[], which has the same behavior as with a C-style string.

Generally, c_str and data produce identical results, except that the pointer returned by data can be non-const (since C++17, when you call it on a non-const string). Since C++11, both must point to contiguous storage that ends with a null terminator, so you can pass either to an API that expects a C-style string. The program in Listing 15-6 illustrates this behavior by printing the results of calling data and c_str alongside their addresses.

#include <string>
#include <cstdio>

int main() {
  std::string word("pulchritudinous");
  printf("c_str: %s at 0x%p\n", word.c_str(), word.c_str()); ➊
  printf("data:  %s at 0x%p\n", word.data(), word.data()); ➋
}
--------------------------------------------------------------------------
c_str: pulchritudinous at 0x0000002FAE6FF8D0 ➊
data:  pulchritudinous at 0x0000002FAE6FF8D0 ➋

Listing 15-6: Illustrating that c_str and data return equivalent addresses

Both c_str and data produce identical results because they point to the same addresses ➊ ➋. Because the address is the beginning of a null-terminated string, printf yields identical output for both invocations.
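One consequence of handing c_str to a C-style API is that the API stops at the first null character, even though a string can hold nulls in the middle. The following sketch uses std::strlen to make the difference visible; c_api_length is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C API sees it: std::strlen stops at the first null character,
// whereas size() reports every character the string actually holds.
std::size_t c_api_length(const std::string& s) {
  return std::strlen(s.c_str());
}
```

For a string without embedded nulls, c_api_length(s) equals s.size(). For std::string("ab\0cd", 5), size() is 5 but c_api_length reports 2.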

Table 15-3 lists the access methods of string. Note that n is a size_t in the table.

Table 15-3: Supported std::string Element and Iterator Access Methods

Method	Returns
s.begin()	An iterator pointing to the first element.
s.cbegin()	A const iterator pointing to the first element.
s.end()	An iterator pointing to one past the last element.
s.cend()	A const iterator pointing to one past the last element.
s.at(n)	A reference to element n of s. Throws std::out_of_range if out of bounds.
s[n]	A reference to element n of s. Undefined behavior if n > s.size(). Also s[s.size()] must be 0, so writing a non-zero value into this character is undefined behavior.
s.front()	A reference to first element.
s.back()	A reference to last element.
s.data()	A raw pointer to the first element if string is non-empty. For an empty string, returns a pointer to a null character.
s.c_str()	Returns a non-modifiable, null-terminated version of the contents of s.
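The practical difference between at and operator[] is the bounds check. Here is a brief sketch with two hypothetical helpers: char_or relies on the exception thrown by at, and capitalize_first modifies a string through the reference returned by front.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the character at index n, or fallback when n is out of bounds.
// at() performs the bounds check and throws; operator[] would not.
char char_or(const std::string& s, std::size_t n, char fallback) {
  try {
    return s.at(n);
  } catch (const std::out_of_range&) {
    return fallback;
  }
}

// Uppercases the first character in place via the reference from front().
void capitalize_first(std::string& s) {
  if (s.empty()) return;  // front() on an empty string is undefined behavior
  char& first = s.front();
  first = static_cast<char>(std::toupper(static_cast<unsigned char>(first)));
}
```

In performance-sensitive loops where the index is already known to be valid, operator[] avoids the cost of the check.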

String Comparisons

Note that string supports comparisons with other strings and with raw C-style strings using the usual comparison operators. For example, the equality operator== returns true if the size and contents of the left and right sides are equal, whereas the inequality operator!= returns the opposite. The remaining comparison operators perform lexicographical comparison, meaning they sort alphabetically where A < Z < a < z and where, if all else is equal, shorter words are less than longer words (for example, pal < palindrome). Listing 15-7 illustrates comparisons.

NOTE

Technically, lexicographical comparison depends on the encoding of the string. It’s theoretically possible that a system could use a default encoding where the alphabet is in some completely jumbled order (such as the nearly obsolete EBCDIC encoding, which put lowercase letters before uppercase letters), which would affect string comparison. For ASCII-compatible encodings, you don’t need to worry since they imply the expected lexicographical behavior.

TEST_CASE("std::string supports comparison with") {
  using namespace std::literals::string_literals; ➊
  std::string word("allusion"); ➋
  SECTION("operator== and !=") {
    REQUIRE(word == "allusion"); ➌
    REQUIRE(word == "allusion"s); ➍
    REQUIRE(word != "Allusion"s); ➎
    REQUIRE(word != "illusion"s); ➏
    REQUIRE_FALSE(word == "illusion"s); ➐
  }
  SECTION("operator<") {
    REQUIRE(word < "illusion"); ➑
    REQUIRE(word < "illusion"s); ➒
    REQUIRE(word > "Illusion"s); ➓
  }
}

Listing 15-7: The string class supports comparison

Here, you bring in the std::literals::string_literals namespace so you can easily construct a string with operator""s ➊. You also construct a string called word containing the characters allusion ➋. In the first set of tests, you examine operator== and operator!=.

You can see that word equals (==) allusion as both a C-style string ➌ and a string ➍, but it doesn’t equal (!=) strings containing Allusion ➎ or illusion ➏. As usual, operator== and operator!= always return opposite results ➐.

The next set of tests uses operator< to show that allusion is less than illusion ➑, because a is lexicographically less than i. Comparisons work with C-style strings and strings ➒. Listing 15-7 also shows that Illusion is less than allusion ➓ because uppercase I is lexicographically less than lowercase a.

Table 15-4 lists the comparison methods of string. Note that other is a string or char* C-style string in the table.

Table 15-4: Supported std::string Comparison Operators

Method	Returns
s == other	true if s and other have identical characters and lengths; otherwise false
s != other	The opposite of operator==
s.compare(other)	Returns 0 if s == other, a negative number if s < other, and a positive number if s > other
s < other s > other s <= other s >= other	The result of the corresponding comparison operation, according to lexicographical sort
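The compare method is handy when you need a three-way result in a single call, and the relational operators are what std::sort uses by default. The sketch below assumes an ASCII-compatible encoding; sign_of_compare and sort_words are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Collapses the result of compare to -1, 0, or 1. Only the sign of the
// value returned by compare is specified, not its magnitude.
int sign_of_compare(const std::string& a, const std::string& b) {
  const int result = a.compare(b);
  return (result > 0) - (result < 0);
}

// Sorts in place using operator<, which std::sort applies by default.
void sort_words(std::vector<std::string>& words) {
  std::sort(words.begin(), words.end());
}
```

Sorting pal, allusion, Zebra, and palindrome this way puts Zebra first, because every uppercase ASCII letter is less than every lowercase one.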

Manipulating Elements

For manipulating elements, string has a lot of methods. It supports all the methods of vector<char> plus many others useful to manipulating human-language data.

Adding Elements

To add elements to a string, you can use push_back, which inserts a single character at the end. To append more than one character at a time, you can use operator+=, which accepts a character, a null-terminated char* string, or a string. You can also use the append method, which comes in three main forms. First, you can pass a string or a null-terminated char* string, an optional offset into that string, and an optional number of characters to append. Second, you can pass a length and a char, which will append that number of chars to the string. Third, you can append a half-open range. Listing 15-8 illustrates all of these operations.

TEST_CASE("std::string supports appending with") {
  std::string word("butt"); ➊
  SECTION("push_back") {
    word.push_back('e'); ➋
    REQUIRE(word == "butte");
  }
  SECTION("operator+=") {
    word += "erfinger"; ➌
    REQUIRE(word == "butterfinger");
  }
  SECTION("append char") {
    word.append(1, 's'); ➍
    REQUIRE(word == "butts");
  }
  SECTION("append char*") {
    word.append("stockings", 5); ➎
    REQUIRE(word == "buttstock");
  }
  SECTION("append (half-open range)") {
    std::string other("onomatopoeia"); ➏
    word.append(other.begin(), other.begin()+2); ➐
    REQUIRE(word == "button");
  }
}

Listing 15-8: Appending to a string

To begin, you initialize a string called word containing the characters butt ➊. In the first test, you invoke push_back with the letter e ➋, which yields butte. Next, you add erfinger to word using operator+= ➌, yielding butterfinger. In the first invocation of append, you append a single s ➍ to yield butts. (This setup works just like push_back.) A second overload of append allows you to provide a char* and a length. By providing stockings and length 5, you add stock to word to yield buttstock ➎. Because append works with half-open ranges, you can also construct a string called other containing the characters onomatopoeia ➏ and append the first two characters via a half-open range to yield button ➐.

NOTE

Recall from “Test Cases and Sections” on page 308 that each SECTION of a Catch unit test runs independently, so modifications to word are independent of each other: the setup code resets word for each test.
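A common use of these appending operations is joining several pieces into one string. The following is one possible sketch, with a hypothetical function name join; it reserves the total length first so the appends don't trigger repeated reallocations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins parts with sep between them, e.g. {"a", "b"} and ", " -> "a, b".
std::string join(const std::vector<std::string>& parts,
                 const std::string& sep) {
  std::size_t total = 0;
  for (const auto& part : parts) total += part.size() + sep.size();
  std::string result;
  result.reserve(total);  // room for everything, allocated once up front
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i != 0) result += sep;  // operator+= appends a whole string
    result.append(parts[i]);    // append does the same for a string argument
  }
  return result;
}
```

For example, join({"butt", "er", "finger"}, "") produces butterfinger.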

Removing Elements

To remove elements from a string, you have several options. The simplest method is to use pop_back, which follows vector in removing the last character from a string. If you want to instead remove all the characters (to yield an empty string), use the clear method. When you need more precision in removing elements, use the erase method, which provides several overloads. You can provide an index and a length, which removes the corresponding characters. You can also provide an iterator to remove a single element or a half-open range to remove many. Listing 15-9 illustrates removing elements from a string.

TEST_CASE("std::string supports removal with") {
  std::string word("therein"); ➊
  SECTION("pop_back") {
    word.pop_back();
    word.pop_back(); ➋
    REQUIRE(word == "there");
  }
  SECTION("clear") {
    word.clear(); ➌
    REQUIRE(word.empty());
  }
  SECTION("erase using half-open range") {
    word.erase(word.begin(), word.begin()+3); ➍
    REQUIRE(word == "rein");
  }
  SECTION("erase using an index and length") {
    word.erase(5, 2);
    REQUIRE(word == "there"); ➎
  }
}

Listing 15-9: Removing elements from a string

You construct a string called word containing the characters therein ➊. In the first test, you call pop_back twice to first remove the letter n followed by the letter i so word contains the characters there ➋. Next, you invoke clear, which removes all the characters from word so it’s empty ➌. The last two tests use erase to remove some subset of the characters in word. In the first usage, you remove the first three characters with a half-open range so word contains rein ➍. In the second, you remove the characters starting at index 5 (i in therein) and extending two characters ➎. Like the first test, this yields the characters there.
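The half-open range overload of erase combines naturally with std::remove from the <algorithm> header to delete every occurrence of a character, a pattern usually called the erase-remove idiom. A minimal sketch follows; remove_all is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Removes every occurrence of unwanted from s using the erase-remove idiom:
// std::remove shifts the kept characters forward and returns the new logical
// end; the range overload of erase then trims the leftover tail.
void remove_all(std::string& s, char unwanted) {
  s.erase(std::remove(s.begin(), s.end(), unwanted), s.end());
}
```

Applied to therein with the character e, this leaves thrin. If the character doesn't occur, the string is unchanged.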

Replacing Elements

To insert and remove elements simultaneously, string exposes the replace method, which has many overloads.

First, you can provide a half-open range and a null-terminated char* or a string, and replace will perform a simultaneous erase of all the elements within the half-open range and an insert of the provided string where the range used to be. Second, you can provide two half-open ranges, and replace will insert the second range instead of a string.

Instead of replacing a range, you can use either an index or a single iterator and a length. You can supply a new half-open range, a character and a size, or a string, and replace will substitute new elements over the implied range. Listing 15-10 demonstrates some of these possibilities.

TEST_CASE("std::string replace works with") {
  std::string word("substitution"); ➊
  SECTION("a range and a char*") {
    word.replace(word.begin()+9, word.end(), "e"); ➋
    REQUIRE(word == "substitute");
  }
  SECTION("two ranges") {
    std::string other("innuendo");
    word.replace(word.begin(), word.begin()+3,
                 other.begin(), other.begin()+2); ➌
    REQUIRE(word == "institution");
  }
  SECTION("an index/length and a string") {
    std::string other("vers");
    word.replace(3, 6, other); ➍
    REQUIRE(word == "subversion");
  }
}

Listing 15-10: Replacing elements of a string

Here, you construct a string called word containing substitution ➊. In the first test, you replace all the characters from index 9 to the end with the letter e, resulting in the word substitute ➋. Next, you replace the first three letters of word with the first two letters of a string containing innuendo ➌, resulting in institution. Finally, you use an alternate way of specifying the target sequence with an index and a length to replace the characters stitut with the characters vers, yielding subversion ➍.
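The overload that takes an index, a length, a count, and a character isn't shown in the listing, but it is useful for masking part of a string. Here is a short sketch using a hypothetical helper called mask_prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces all but the last `visible` characters with '*'.
std::string mask_prefix(std::string s, std::size_t visible) {
  if (s.size() <= visible) return s;  // nothing to hide
  const std::size_t hidden = s.size() - visible;
  // Overload taking (index, length, count, char): swap `hidden` characters
  // starting at index 0 for `hidden` copies of '*'.
  s.replace(0, hidden, hidden, '*');
  return s;
}
```

Because the replaced length and the inserted count are equal here, the string keeps its original size.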

The string class offers a resize method to manually set the length of string. The resize method takes two arguments: a new length and an optional char. If the new length of string is smaller, resize ignores the char. If the new length of string is larger, resize appends the char the implied number of times to achieve the desired length. Listing 15-11 illustrates the resize method.

TEST_CASE("std::string resize") {
  std::string word("shamp"); ➊
  SECTION("can remove elements") {
    word.resize(4); ➋
    REQUIRE(word == "sham");
  }
  SECTION("can add elements") {
    word.resize(7, 'o'); ➌
    REQUIRE(word == "shampoo");
  }
}

Listing 15-11: Resizing a string

You construct a string called word containing the characters shamp ➊. In the first test, you resize word to length 4 so it contains sham ➋. In the second, you resize to a length of 7 and provide the optional character o as the value to extend word with ➌. This results in word containing shampoo.
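Padding and truncating text to a fixed width are two everyday uses of resize. The helpers below, pad_right and truncate_to, are hypothetical names used for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pads s on the right with fill up to width; longer strings are left alone.
std::string pad_right(std::string s, std::size_t width, char fill = ' ') {
  if (s.size() < width) s.resize(width, fill);
  return s;
}

// Cuts s down to at most max_length characters.
std::string truncate_to(std::string s, std::size_t max_length) {
  if (s.size() > max_length) s.resize(max_length);
  return s;
}
```

The explicit size checks matter: calling resize unconditionally in pad_right would shorten strings that are already wider than the requested width.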

The “Constructing” section on page 482 explained a substring constructor that can extract contiguous sequences of characters to create a new string. You can also generate substrings using the substr method, which takes two optional arguments: a position argument and a length. The position defaults to 0 (the beginning of the string), and the length defaults to the remainder of the string. Listing 15-12 illustrates how to use substr.

TEST_CASE("std::string substr with") {
  std::string word("hobbits"); ➊
  SECTION("no arguments copies the string") {
    REQUIRE(word.substr() == "hobbits"); ➋
  }
  SECTION("position takes the remainder") {
    REQUIRE(word.substr(3) == "bits"); ➌
  }
  SECTION("position/length takes a substring") {
    REQUIRE(word.substr(3, 3) == "bit"); ➍
  }
}

Listing 15-12: Extracting substrings from a string

You declare a string called word containing hobbits ➊. If you invoke substr with no arguments, you simply copy the string ➋. When you provide the position argument 3, substr extracts the substring beginning at element 3 and extending to the end of the string, yielding bits ➌. Finally, when you provide a position (3) and a length (3), you instead get bit ➍.

Summary of string Manipulation Methods

Table 15-5 lists many of the insertion and deletion methods of string. In this table, str is a string or a C-style char* string; p, n, and i are size_t; ind is a size_t index or an iterator into s; c is a char; and beg and end are iterators. An asterisk (*) indicates that this operation invalidates raw pointers and iterators to the elements of s in at least some circumstances.

Table 15-5: Supported std::string Element Manipulation Methods

Method	Description
s.insert(ind, str, [p], [n])	Inserts the n elements of str, starting at p, into s just before ind. If no n is supplied, inserts the entire string or up to the first null of a char*; p defaults to 0.*
s.insert(ind, n, c)	Inserts n copies of c just before ind.*
s.insert(ind, beg, end)	Inserts the half-open range from beg to end just before ind. *
s.append(str, [p], [n])	Equivalent to s.insert(s.end(), str, [p], [n]).*
s.append(n, c)	Equivalent to s.insert(s.end(), n, c).*
s.append(beg, end)	Appends the half-open range from beg to end to the end of s.*
s += c s += str	Appends c or str to the end of s.*
s.push_back(c)	Appends c to the end of s.*
s.clear()	Removes all characters from s.*
s.erase([i], [n])	Removes n characters starting at position i; i defaults to 0, and n defaults to the remainder of s.*
s.erase(itr)	Erases the element pointed to by itr.*
s.erase(beg, end)	Erases the elements on the half-open range from beg to end.*
s.pop_back()	Removes the last element of s.*
s.resize(n,[c])	Resizes the string so it contains n characters. If this operation increases the string’s length, it adds copies of c, which defaults to 0.*
s.replace(i, n1, str, [p], [n2])	Replaces the n1 characters starting at index i with the n2 elements in str starting at p. By default, p is 0 and n2 is str.length().*
s.replace(beg, end, str)	Replaces the half-open range beg to end with str.*
s.replace(p, n, str)	Replaces from index p to p+n with str.*
s.replace(beg1, end1, beg2, end2)	Replaces the half-open range beg1 to end1 with the half-open range beg2 to end2.*
s.replace(ind, c, [n])	Replaces n elements starting at ind with cs.*
s.replace(ind, beg, end)	Replaces elements starting at ind with the half-open range beg to end.*
s.substr([p], [c])	Returns the substring starting at p with length c. By default, p is 0 and c is the remainder of the string.
s1.swap(s2) swap(s1, s2)	Exchanges the contents of s1 and s2.*

Search

In addition to the preceding methods, string offers several search methods, which enable you to locate substrings and characters that you’re interested in. Each method performs a particular kind of search, so which you choose depends on the particulars of the application.

find

The first method string offers is find, which accepts a string, a C-style string, or a char as its first argument. This argument is an element that you want to locate within this. Optionally, you can provide a second size_t position argument that tells find where to start looking. If find fails to locate the substring, it returns the special size_t-valued, constant, static member std::string::npos. Listing 15-13 illustrates the find method.

TEST_CASE("std::string find") {
  using namespace std::literals::string_literals;
  std::string word("pizzazz"); ➊
  SECTION("locates substrings from strings") {
    REQUIRE(word.find("zz"s) == 2); // pi(z)zazz ➋
  }
  SECTION("accepts a position argument") {
    REQUIRE(word.find("zz"s, 3) == 5); // pizza(z)z ➌
  }
  SECTION("locates substrings from char*") {
    REQUIRE(word.find("zaz") == 3); // piz(z)azz ➍
  }
  SECTION("returns npos when not found") {
    REQUIRE(word.find('x') == std::string::npos); ➎
  }
}

Listing 15-13: Finding substrings within a string

Here, you construct the string called word containing pizzazz ➊. In the first test, you invoke find with a string containing zz, which returns 2 ➋, the index of the first z in pizzazz. When you provide a position argument of 3 corresponding to the second z in pizzazz, find locates the second zz beginning at 5 ➌. In the third test, you use the C-style string zaz, and find returns 3, again corresponding to the second z in pizzazz ➍. Finally, you attempt to find the character x, which doesn’t appear in pizzazz, so find returns std::string::npos ➎.
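The position argument makes it straightforward to collect every occurrence of a substring by restarting the search just past the previous hit. Here is one way this might look; find_all is a hypothetical helper, and this version reports overlapping matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the starting index of every (possibly overlapping) occurrence
// of needle in haystack.
std::vector<std::size_t> find_all(const std::string& haystack,
                                  const std::string& needle) {
  std::vector<std::size_t> positions;
  if (needle.empty()) return positions;  // an empty needle matches everywhere
  std::size_t pos = haystack.find(needle);
  while (pos != std::string::npos) {
    positions.push_back(pos);
    pos = haystack.find(needle, pos + 1);  // resume just past the last hit
  }
  return positions;
}
```

For pizzazz and zz, this yields the indices 2 and 5, matching the two results in Listing 15-13.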

rfind

The rfind method is an alternative to find that takes the same arguments but searches in reverse. You might want to use this functionality if, for example, you were looking for particular punctuation at the end of a string, as Listing 15-14 illustrates.

TEST_CASE("std::string rfind") {
  using namespace std::literals::string_literals;
  std::string word("pizzazz"); ➊
  SECTION("locates substrings from strings") {
    REQUIRE(word.rfind("zz"s) == 5); // pizza(z)z ➋
  }
  SECTION("accepts a position argument") {
    REQUIRE(word.rfind("zz"s, 3) == 2); // pi(z)zazz ➌
  }
  SECTION("locates substrings from char*") {
    REQUIRE(word.rfind("zaz") == 3); // piz(z)azz ➍
  }
  SECTION("returns npos when not found") {
    REQUIRE(word.rfind('x') == std::string::npos); ➎
  }
}

Listing 15-14: Finding substrings in reverse within a string

Using the same word ➊, you use the same arguments as in Listing 15-13 to test rfind. Given zz, rfind returns 5, the second to last z in pizzazz ➋. When you provide the positional argument 3, rfind instead returns the first z in pizzazz ➌. Because there’s only one occurrence of the substring zaz, rfind returns the same position as find ➍. Also like find, rfind returns std::string::npos when given x ➎.
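A typical job for rfind is extracting a file extension, where only the last dot matters. The sketch below pairs rfind with substr; extension_of is a hypothetical helper that treats its argument as a bare filename rather than a full path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the text after the last '.', or an empty string if there is none.
std::string extension_of(const std::string& filename) {
  const std::size_t dot = filename.rfind('.');
  if (dot == std::string::npos) return std::string();
  return filename.substr(dot + 1);
}
```

Using find instead would stop at the first dot, so archive.tar.gz would yield tar.gz rather than gz.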

find_*_of

Whereas find and rfind locate exact subsequences in a string, a family of related functions finds the first character contained in a given argument.

The find_first_of function accepts a string and locates the first character in this contained in the argument. Optionally, you can provide a size_t position argument to indicate to find_first_of where to start in the string. If find_first_of cannot find a matching character, it will return std::string::npos. Listing 15-15 illustrates the find_first_of function.

TEST_CASE("std::string find_first_of") {
  using namespace std::literals::string_literals;
  std::string sentence("I am a Zizzer-Zazzer-Zuzz as you can plainly see."); ➊
  SECTION("locates characters within another string") {
    REQUIRE(sentence.find_first_of("Zz"s) == 7); // (Z)izzer ➋
  }
  SECTION("accepts a position argument") {
    REQUIRE(sentence.find_first_of("Zz"s, 11) == 14); // (Z)azzer ➌
  }
  SECTION("returns npos when not found") {
    REQUIRE(sentence.find_first_of("Xx"s) == std::string::npos); ➍
  }
}

Listing 15-15: Finding the first element from a set within a string

The string called sentence contains I am a Zizzer-Zazzer-Zuzz as you can plainly see. ➊. Here, you invoke find_first_of with the string Zz, which matches both lowercase and uppercase z. This returns 7, which corresponds to the first Z in sentence, Zizzer ➋. In the second test, you again provide the string Zz but also pass the position argument 11, which corresponds to the e in Zizzer. This results in 14, which corresponds to the Z in Zazzer ➌. Finally, you invoke find_first_of with Xx, which results in std::string::npos because sentence doesn’t contain an x (or an X) ➍.

A string offers three find_first_of variations:

find_first_not_of returns the first character not contained in the string argument. Rather than providing a string containing the elements you want to find, you provide a string of characters you don’t want to find.
find_last_of performs matching in reverse; rather than searching from the beginning of the string or from the position argument and proceeding to the end, find_last_of begins at the end of the string or from the position argument and proceeds to the beginning.
find_last_not_of combines the two prior variations: you pass a string containing elements you don’t want to find, and find_last_not_of searches in reverse.

Your choice of find function boils down to what your algorithmic requirements are. Do you need to search from the back of a string, say for a punctuation mark? If so, use find_last_of. Are you looking for the first space in a string? If so, use find_first_of. Do you want to invert your search and look for the first element that is not a member of some set? Then use the alternatives find_first_not_of and find_last_not_of, depending on whether you want to start from the beginning or end of the string.

Listing 15-16 illustrates these three find_first_of variations.

TEST_CASE("std::string") {
  using namespace std::literals::string_literals;
  std::string sentence("I am a Zizzer-Zazzer-Zuzz as you can plainly see."); ➊
  SECTION("find_last_of finds last element within another string") {
    REQUIRE(sentence.find_last_of("Zz"s) == 24); // Zuz(z) ➋
  }
  SECTION("find_first_not_of finds first element not within another string") {
    REQUIRE(sentence.find_first_not_of(" -IZaeimrz"s) == 22); // Z(u)zz ➌
  }
  SECTION("find_last_not_of finds last element not within another string") {
    REQUIRE(sentence.find_last_not_of(" .es"s) == 43); // plainl(y) ➍
  }
}

Listing 15-16: Alternatives to the find_first_of method of string

Here, you initialize the same sentence as in Listing 15-15 ➊. In the first test, you use find_last_of on Zz, which searches in reverse for any z or Z and returns 24, the last z in the sentence Zuzz ➋. Next, you use find_first_not_of and pass a farrago of characters (not including the letter u), which results in 22, the position of the first u in Zuzz ➌. Finally, you use find_last_not_of to find the last character not equal to space, period, e, or s. This results in 43, the position of y in plainly ➍.
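A common use of the inverted searches is trimming. The sketch below is one possible approach, not a standard function: trim is a hypothetical helper, and the set of characters treated as whitespace is an assumption you'd adjust to your needs.

```cpp
#include <string>

// Hypothetical helper: returns a copy of text without leading or trailing
// spaces, tabs, carriage returns, or newlines.
std::string trim(const std::string& text) {
  const char* whitespace = " \t\r\n";
  const auto first = text.find_first_not_of(whitespace);
  if (first == std::string::npos) return ""; // empty or all whitespace
  const auto last = text.find_last_not_of(whitespace);
  return text.substr(first, last - first + 1);
}
```

Here find_first_not_of locates the first character worth keeping, and find_last_not_of locates the last one. Interior whitespace is left alone.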

Summary of string Search Methods

Table 15-6 lists many of the search methods for string. Note that s2 is a string; cstr is a C-style char* string; c is a char; and p and l are size_t in the table.

Table 15-6: Supported std::string Search Algorithms

Method	Searches s starting at p and returns the position of the . . .
s.find(s2, [p])	First substring equal to s2; p defaults to 0.
s.find(cstr, [p], [l])	First substring equal to the first l characters of cstr; p defaults to 0; l defaults to cstr’s length per null termination.
s.find(c, [p])	First character equal to c; p defaults to 0.
s.rfind(s2, [p])	Last substring equal to s2; p defaults to npos.
s.rfind(cstr, [p], [l])	Last substring equal to the first l characters of cstr; p defaults to npos; l defaults to cstr’s length per null termination.
s.rfind(c, [p])	Last character equal to c; p defaults to npos.
s.find_first_of(s2, [p])	First character contained in s2; p defaults to 0.
s.find_first_of(cstr, [p], [l])	First character contained in the first l characters of cstr; p defaults to 0; l defaults to cstr’s length per null termination.
s.find_first_of(c, [p])	First character equal to c; p defaults to 0.
s.find_last_of(s2, [p])	Last character contained in s2; p defaults to npos.
s.find_last_of(cstr, [p], [l])	Last character contained in the first l characters of cstr; p defaults to npos; l defaults to cstr’s length per null termination.
s.find_last_of(c, [p])	Last character equal to c; p defaults to npos.
s.find_first_not_of(s2, [p])	First character not contained in s2; p defaults to 0.
s.find_first_not_of(cstr, [p], [l])	First character not contained in the first l characters of cstr; p defaults to 0; l defaults to cstr’s length per null termination.
s.find_first_not_of(c, [p])	First character not equal to c; p defaults to 0.
s.find_last_not_of(s2, [p])	Last character not contained in s2; p defaults to npos.
s.find_last_not_of(cstr, [p], [l])	Last character not contained in the first l characters of cstr; p defaults to npos; l defaults to cstr’s length per null termination.
s.find_last_not_of(c, [p])	Last character not equal to c; p defaults to npos.

Numeric Conversions

The STL provides functions for converting between string or wstring and the fundamental numeric types. Given a numeric type, you can use the std::to_string and std::to_wstring functions to generate its string or wstring representation. Both functions have overloads for all the numeric types. Listing 15-17 illustrates to_string and to_wstring.

TEST_CASE("STL string conversion function") {
  using namespace std::literals::string_literals;
  SECTION("to_string") {
    REQUIRE("8675309"s == std::to_string(8675309)); ➊
  }
  SECTION("to_wstring") {
    REQUIRE(L"109951.1627776"s == std::to_wstring(109951.1627776)); ➋
  }
}

Listing 15-17: Numeric conversion functions of string

NOTE

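The second unit test ➋ might fail on your system. Implementations that predate C++26 format floating-point values as if by printf’s %f, which keeps six digits after the decimal point, so to_wstring typically yields 109951.162778 rather than 109951.1627776.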

The first example uses to_string to convert the int 8675309 into a string ➊; the second example uses to_wstring to convert the double 109951.1627776 into a wstring ➋.

You can also convert the other way, going from a string or wstring to a numeric type. Each numeric conversion function accepts a string or wstring containing a string-encoded number as its first argument. Next, you can provide an optional pointer to a size_t. If provided, the conversion function will write the index of the first character it could not convert (or the length of the input string if it decoded all characters). By default, this index argument is nullptr, in which case the conversion function doesn’t write the index. When the target type is integral, you can provide a third argument: an int corresponding to the base of the encoded string. This base argument is optional and defaults to 10.

Each conversion function throws std::invalid_argument if no conversion could be performed and throws std::out_of_range if the converted value is out of range for the corresponding type.

Table 15-7 lists each of these conversion functions along with its target type. In this table, s is a string. If p is not nullptr, the conversion function writes the position of the first unconverted character in s to the memory pointed to by p; if all characters are converted, it writes the length of s. Here, b is the number’s base representation in s. Note that p defaults to nullptr, and b defaults to 10.

Table 15-7: Supported Numeric Conversion Functions for std::string and std::wstring

Function	Converts s to
stoi(s, [p], [b])	An int
stol(s, [p], [b])	A long
stoll(s, [p], [b])	A long long
stoul(s, [p], [b])	An unsigned long
stoull(s, [p], [b])	An unsigned long long
stof(s, [p])	A float
stod(s, [p])	A double
stold(s, [p])	A long double
to_string(n)	A string
to_wstring(n)	A wstring

Listing 15-18 illustrates several numeric conversion functions.

TEST_CASE("STL string conversion function") {
  using namespace std::literals::string_literals;
  SECTION("stoi") {
    REQUIRE(std::stoi("8675309"s) == 8675309); ➊
  }
  SECTION("stoi") {
    REQUIRE_THROWS_AS(std::stoi("1099511627776"s), std::out_of_range); ➋
  }
  SECTION("stoul with all valid characters") {
    size_t last_character{};
    const auto result = std::stoul("0xD3C34C3D"s, &last_character, 16); ➌
    REQUIRE(result == 0xD3C34C3D);
    REQUIRE(last_character == 10);
  }
  SECTION("stoul") {
    size_t last_character{};
    const auto result = std::stoul("42six"s, &last_character); ➍
    REQUIRE(result == 42);
    REQUIRE(last_character == 2);
  }
  SECTION("stod") {
    REQUIRE(std::stod("2.7182818"s) == Approx(2.7182818)); ➎
  }
}

Listing 15-18: String conversion functions of string

First, you use stoi to convert 8675309 to an integer ➊. In the second test, you attempt to use stoi to convert the string 1099511627776 into an integer. Because this value is too large for an int, stoi throws std::out_of_range ➋. Next, you convert 0xD3C34C3D with stoul, but you provide the two optional arguments: a pointer to a size_t called last_character and a hexadecimal base ➌. The last_character object is 10, the length of 0xD3C34C3D, because stoul can parse every character. The string in the next test, 42six, contains the unparsable characters six. When you invoke stoul this time, the result is 42 and last_character equals 2, the position of s in six ➍. Finally, you use stod to convert the string 2.7182818 to a double ➎.
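When input comes from a user or a file, you often want a conversion that rejects bad input without letting exceptions escape. The sketch below wraps stoi under that assumption. The name try_parse_int is hypothetical, and the policy of rejecting trailing characters (such as the six in 42six) is a design choice for this example, not something stoi does on its own.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the parsed value only if stoi consumes all of
// text and the result fits in an int. (stoi itself skips leading whitespace.)
std::optional<int> try_parse_int(const std::string& text) {
  try {
    std::size_t first_unconverted{};
    const int value = std::stoi(text, &first_unconverted);
    // Reject inputs with trailing characters, such as "42six".
    if (first_unconverted != text.size()) return std::nullopt;
    return value;
  } catch (const std::invalid_argument&) {
    return std::nullopt; // no conversion could be performed
  } catch (const std::out_of_range&) {
    return std::nullopt; // the value doesn't fit in an int
  }
}
```

A caller can then test the returned optional instead of writing a try-catch block at every call site.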

NOTE

Boost’s Lexical Cast provides an alternative, template-based approach to numeric conversions. Refer to the documentation for boost::lexical_cast available in the <boost/lexical_cast.hpp> header.

String View

A string view is an object that represents a constant, contiguous sequence of characters. It’s very similar to a const string reference. In fact, string view classes are often implemented as a pointer to a character sequence and a length.

The STL offers the class template std::basic_string_view in the <string_view> header, which is analogous to std::basic_string. The template std::basic_string_view has a specialization for each of the four commonly used character types:

char has string_view
wchar_t has wstring_view
char16_t has u16string_view
char32_t has u32string_view

This section discusses the string_view specialization for demonstration purposes, but the discussion generalizes to the other three specializations.

The string_view class supports most of the same methods as string; in fact, it’s designed to be a drop-in replacement for a const string&.

Constructing

A default-constructed string_view has zero length and points to nullptr. Importantly, string_view supports implicit construction from a const string& or a C-style string. You can construct string_view from a char* and a size_t, so you can manually specify the desired length in case you want a substring or you have embedded nulls. Listing 15-19 illustrates the use of string_view.

TEST_CASE("std::string_view supports") {
  SECTION("default construction") {
    std::string_view view; ➊
    REQUIRE(view.data() == nullptr);
    REQUIRE(view.size() == 0);
    REQUIRE(view.empty());
  }
  SECTION("construction from string") {
    std::string word("sacrosanct");
    std::string_view view(word); ➋
    REQUIRE(view == "sacrosanct");
  }
  SECTION("construction from C-string") {
    auto word = "viewership";
    std::string_view view(word); ➌
    REQUIRE(view == "viewership");
  }
  SECTION("construction from C-string and length") {
    auto word = "viewership";
    std::string_view view(word, 4); ➍
    REQUIRE(view == "view");
  }
}

Listing 15-19: The constructors of string_view

The default-constructed string_view points to nullptr and is empty ➊. When you construct a string_view from a string ➋ or a C-style string ➌, it points to the original’s contents. The final test provides the optional length argument 4, which means the string_view refers to only the first four characters instead ➍.

The string_view class supports copy construction and assignment, but it defines no distinct move operations: moving a string_view simply copies its pointer and length. This design makes sense when you consider that string_view doesn’t own the sequence to which it points.

Supported string_view Operations

The string_view class supports many of the same operations as a const string& with identical semantics. The following lists all the shared methods between string and string_view:

Iterators: begin, end, rbegin, rend, cbegin, cend, crbegin, crend

Element Access: operator[], at, front, back, data

Capacity: size, length, max_size, empty

Search: find, rfind, find_first_of, find_last_of, find_first_not_of, find_last_not_of

Extraction: copy, substr

Comparison: compare, operator==, operator!=, operator<, operator>, operator<=, operator>=

In addition to these shared methods, string_view supports the remove_prefix method, which removes the given number of characters from the beginning of the string_view, and the remove_suffix method, which instead removes characters from the end. Listing 15-20 illustrates both methods.

TEST_CASE("std::string_view is modifiable with") {
  std::string_view view("previewing"); ➊
  SECTION("remove_prefix") {
    view.remove_prefix(3); ➋
    REQUIRE(view == "viewing");
  }
  SECTION("remove_suffix") {
    view.remove_suffix(3); ➌
    REQUIRE(view == "preview");
  }
}

Listing 15-20: Modifying a string_view with remove_prefix and remove_suffix

Here, you declare a string_view referring to the string literal previewing ➊. The first test invokes remove_prefix with 3 ➋, which removes three characters from the front of string_view so it now refers to viewing. The second test instead invokes remove_suffix with 3 ➌, which removes three characters from the back of the string_view and results in preview.
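You can combine remove_prefix with the shared comparison and extraction methods to strip a known prefix. The following is a sketch under that assumption; without_prefix is a hypothetical name, and returning the view unchanged when the prefix is absent is one of several reasonable policies.

```cpp
#include <string_view>

// Hypothetical helper: returns view without a leading prefix, or view
// unchanged if it doesn't begin with prefix.
std::string_view without_prefix(std::string_view view,
                                std::string_view prefix) {
  // substr clamps its count, so this is safe when view is shorter than prefix.
  if (view.substr(0, prefix.size()) == prefix) {
    view.remove_prefix(prefix.size()); // modifies only this local copy
  }
  return view;
}
```

Because the function takes view by value, the caller's string_view is untouched, and no characters are copied at any point.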

Ownership, Usage, and Efficiency

Because string_view doesn’t own the sequence to which it refers, it’s up to you to ensure that the lifetime of the string_view is a subset of the referred-to sequence’s lifetime.
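To make the lifetime rule concrete, consider the sketch below. Both function names are hypothetical. The first returns a view into whatever sequence it was given, so its result is only as durable as that sequence. The second shows one safe pattern: copy the characters into a string before the owner goes away.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns a view of the characters before the first
// space. The result refers to the same characters as text, so it stays valid
// only as long as the sequence that text refers to.
std::string_view first_word(std::string_view text) {
  return text.substr(0, text.find(' ')); // npos clamps to the end of text
}

// Hypothetical helper showing a safe pattern: the owning string outlives
// the view, and the characters are copied out before the owner is destroyed.
std::string owner_outlives_view() {
  const std::string sentence("sacrosanct viewership");
  const std::string_view word = first_word(sentence); // sentence is alive
  return std::string(word); // copy before sentence is destroyed
}

// By contrast, the following would leave a dangling string_view, because the
// temporary string is destroyed at the end of the full expression:
//   std::string_view dangling = first_word(std::string("sacrosanct"));
```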

Perhaps the most common usage of string_view is as a function parameter. When you need to interact with an immutable sequence of characters, it’s the first port of call. Consider the count_vees function in Listing 15-21, which counts the frequency of the letter v in a sequence of characters.

#include <string_view>

size_t count_vees(std::string_view my_view➊) {
  size_t result{};
  for(auto letter : my_view) ➋
    if (letter == 'v') result++; ➌
  return result; ➍
}

Listing 15-21: The count_vees function

The count_vees function takes a string_view called my_view ➊, which you iterate over using a range-based for loop ➋. Each time a character in my_view equals v, you increment a result variable ➌, which you return after exhausting the sequence ➍.

You could reimplement Listing 15-21 by simply replacing string_view with const string&, as demonstrated in Listing 15-22.

#include <string>

size_t count_vees(const std::string& my_view) {
--snip--
}

Listing 15-22: The count_vees function reimplemented to use a const string& instead of a string_view

If string_view is just a drop-in replacement for a const string&, why bother having it? Well, if you invoke count_vees with a std::string, there’s no difference: modern compilers will emit the same code.

If you instead invoke count_vees with a string literal, there’s a big difference: when you pass a string literal for a const string&, you construct a string. When you pass a string literal for a string_view, you construct a string_view. Constructing a string is probably more expensive, because it might have to allocate dynamic memory and it definitely has to copy characters. A string_view is just a pointer and a length (no copying or allocating is required).
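The sketch below repeats count_vees from Listing 15-21 and adds one hypothetical helper, vees_in_prefix, to show a case that a const string& parameter would handle only by copying: counting within part of an existing character buffer.

```cpp
#include <cstddef>
#include <string_view>

// count_vees as in Listing 15-21.
std::size_t count_vees(std::string_view my_view) {
  std::size_t result{};
  for (auto letter : my_view)
    if (letter == 'v') result++;
  return result;
}

// Hypothetical helper: counts the v's in the first length characters of
// buffer. Constructing the string_view copies no characters.
std::size_t vees_in_prefix(const char* buffer, std::size_t length) {
  return count_vees(std::string_view{ buffer, length });
}
```

A call such as count_vees("savvy") constructs only a pointer-and-length pair, and vees_in_prefix("vivid vivacious", 5) examines just vivid without building a substring.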

Regular Expressions

A regular expression, also called a regex, is a string that defines a search pattern. Regexes have a long history in computer science and form a sort of mini-language for searching, replacing, and extracting language data. The STL offers regular expression support in the <regex> header.

When used judiciously, regular expressions can be tremendously powerful, declarative, and concise; however, it’s also easy to write regexes that are totally inscrutable. Use regexes deliberately.

Patterns

You build regular expressions using strings called patterns. Patterns represent a desired set of strings using a particular regular expression grammar that sets the syntax for building patterns. In other words, a pattern defines the subset of all possible strings that you’re interested in. The STL supports a handful of grammars, but the focus here will be on the very basics of the default grammar, the modified ECMAScript regular expression grammar (see [re.grammar] for details).

Character Classes

In the ECMAScript grammar, you intermix literal characters with special markup to describe your desired strings. Perhaps the most common markup is a character class, which stands in for a set of possible characters: \d matches any digit, \s matches any whitespace, and \w matches any alphanumeric (“word”) character.

Table 15-8 lists a few example regular expressions and possible interpretations.

Table 15-8: Regular Expression Patterns Using Only Character Classes and Literals

Regex pattern	Possibly describes
\d\d\d-\d\d\d-\d\d\d\d	An American phone number, such as 202-456-1414
\d\d:\d\d \wM	A time in HH:MM AM/PM format, such as 08:49 PM
\w\w\d\d\d\d\d	An American ZIP code including a prepended state code, such as NJ07932
\w\d-\w\d	An astromech droid identifier, such as R2-D2
c\wt	A three-letter word starting with c and ending with t, such as cat or cot

You can also invert a character class by capitalizing the d, s, or w to give the opposite: \D matches any non-digit, \S matches any non-whitespace, and \W matches any non-word character.

In addition, you can build your own character classes by explicitly enumerating them between square brackets []. For example, the character class [02468] includes even digits. You can also use hyphens as shortcuts to include implied ranges, so the character class [0-9a-fA-F] includes any hexadecimal digit whether the letter is capitalized or not. Finally, you can invert a custom character class by prepending the list with a caret ^. For example, the character class [^aeiou] includes all non-vowel characters.

Quantifiers

You can save some typing by using quantifiers, which specify that the character directly to the left should be repeated some number of times. Table 15-9 lists the regex quantifiers.

Table 15-9: Regular Expression Quantifiers

Regex quantifier	Specifies a quantity of
*	0 or more
+	1 or more
?	0 or 1
{n}	Exactly n
{n,m}	Between n and m, inclusive
{n,}	At least n

Using quantifiers, you can specify all words beginning with c and ending with t using the pattern c\w*t, because \w* matches any number of word characters.
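To see quantifiers at work, the sketch below tries two patterns against whole strings using std::regex and std::regex_match, both of which are covered later in this chapter. The function names are hypothetical, and the phone pattern is simply the Table 15-8 entry rewritten with {n} quantifiers.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if word is a c, any number of word characters,
// and a final t.
bool is_c_to_t_word(const std::string& word) {
  static const std::regex pattern{ R"(c\w*t)" };
  return std::regex_match(word, pattern);
}

// Hypothetical helper: true if text has the shape of 202-456-1414.
bool is_phone_number(const std::string& text) {
  static const std::regex pattern{ R"(\d{3}-\d{3}-\d{4})" };
  return std::regex_match(text, pattern);
}
```

Note that c\w*t accepts ct, because * permits zero repetitions.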

Groups

A group is a collection of characters. You can specify a group by placing it within parentheses. Groups are useful in several ways, including specifying a particular collection for eventual extraction and for quantification.

For example, you could improve the ZIP pattern in Table 15-8 to use quantifiers and groups, like this:

(\w{2})?➊(\d{5})➋(-\d{4})?➌

Now you have three groups: the optional state ➊, the ZIP code ➋, and an optional four-digit suffix ➌. As you’ll see later on, these groups make parsing from regexes much easier.

Other Special Characters

Table 15-10 lists several other special characters available for use in regex patterns.

Table 15-10: Example Special Characters

Character	Specifies
X|Y	Character X or Y
\Y	The special character Y as a literal (in other words, escape it)
\n	Newline
\r	Carriage return
\t	Tab
\0	Null
\xYY	The hexadecimal character corresponding to YY

basic_regex

The STL’s std::basic_regex class template in the <regex> header represents a regular expression constructed from a pattern. The basic_regex class accepts two template parameters, a character type and an optional traits class. You’ll almost always want to use one of the convenience specializations: std::regex for std::basic_regex<char> or std::wregex for std::basic_regex<wchar_t>.

The primary means of constructing a regex is by passing a string literal containing your regex pattern. Because patterns require a lot of escaped characters, especially the backslash (\), it’s a good idea to use raw string literals, such as R"()". The constructor accepts a second, optional parameter for specifying syntax flags like the regex grammar.

Although regex is used primarily as input into regular expression algorithms, it does offer a few methods that users can interact with. It supports the usual copy and move construction and assignment suite and swap, plus the following:

assign(s) reassigns the pattern to s
mark_count() returns the number of groups in the pattern
flags() returns the syntax flags issued at construction

Listing 15-23 illustrates how you could construct a ZIP code regex and inspect its subgroups.

#include <regex>

TEST_CASE("std::basic_regex constructs from a string literal") {
  std::regex zip_regex{ R"((\w{2})?(\d{5})(-\d{4})?)" }; ➊
  REQUIRE(zip_regex.mark_count() == 3); ➋
}

Listing 15-23: Constructing a regex using a raw string literal and extracting its group count

Here, you construct a regex called zip_regex using the pattern (\w{2})?(\d{5})(-\d{4})? ➊. Using the mark_count method, you see that zip_regex contains three groups ➋.
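Two details of construction are worth a quick sketch: the constructor throws std::regex_error when given a malformed pattern, and the optional second argument accepts syntax flags such as std::regex::icase. Both function names below are hypothetical, and spelling out std::regex::ECMAScript alongside icase is a choice made here for explicitness.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether pattern is well formed in the
// default (ECMAScript) grammar.
bool is_valid_pattern(const std::string& pattern) {
  try {
    std::regex candidate(pattern); // throws std::regex_error if malformed
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}

// Hypothetical helper: builds a regex that ignores case when matching.
std::regex make_case_insensitive(const std::string& pattern) {
  return std::regex(pattern, std::regex::ECMAScript | std::regex::icase);
}
```

Checking a pattern this way can be handy when the pattern comes from configuration or user input rather than from a literal in your source code.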

Algorithms

The <regex> header contains three algorithms for applying std::basic_regex to a target string: matching, searching, or replacing. Which you choose depends on the task at hand.

Matching

Matching attempts to marry a regular expression to an entire string. The STL provides the std::regex_match function for matching, which has several overloads.

First, you can provide regex_match a string, a C-string, or a begin and end iterator forming a half-open range. The next parameter is an optional reference to a std::match_results object that receives details about the match. The next parameter is a std::basic_regex that defines the matching, and the final parameter is an optional std::regex_constants::match_flag_type that specifies additional matching options for advanced use cases. The regex_match function returns a bool, which is true if it found a match; otherwise, it’s false.

To summarize, you can invoke regex_match in the following ways:

regex_match(beg, end, [mr], rgx, [flg])
regex_match(str, [mr], rgx, [flg])

Either provide a half-open range from beg to end or a string/C-string str to search. Optionally, you can provide a match_results called mr to store all the details of any matches found. You obviously have to provide a regex rgx. Finally, the flags flg are seldom used.

NOTE

For details on match flags flg, refer to [re.alg.match].

A submatch is a subsequence of the matched string that corresponds to a group. The ZIP code–matching regular expression (\w{2})(\d{5})(-\d{4})? can produce two or three submatches depending on the string. For example, TX78209 contains the two submatches TX and 78209, and NJ07936-3173 contains the three submatches NJ, 07936, and -3173.

The match_results class stores zero or more std::sub_match instances. A sub_match is a simple class template that exposes a length method to return the length of a submatch and a str method to build a string from the sub_match.

Somewhat confusingly, if regex_match successfully matches a string, match_results stores the entire matched string as its first element and then stores any submatches as subsequent elements.

The match_results class provides the operations listed in Table 15-11.

Table 15-11: Supported Operations of match_results

Operation	Description
mr.empty()	Returns true if the match was unsuccessful.
mr.size()	Returns the number of submatches.
mr.max_size()	Returns the maximum number of submatches.
mr.length([i])	Returns the length of the submatch i, which defaults to 0.
mr.position([i])	Returns the position of the first character of submatch i, which defaults to 0.
mr.str([i])	Returns the string representing submatch i, which defaults to 0.
mr[i]	Returns a reference to a std::sub_match class corresponding to submatch i.
mr.prefix()	Returns a reference to a std::sub_match class corresponding to the sequence before the match.
mr.suffix()	Returns a reference to a std::sub_match class corresponding to the sequence after the match.
mr.format(str)	Returns a string with contents according to the format string str. There are three special sequences: $` for the characters before a match, $' for the characters after the match, and $& for the matched characters.
mr.begin() mr.end() mr.cbegin() mr.cend()	Returns the corresponding iterator to the sequence of submatches.

The std::sub_match class template has predefined specializations to work with common string types:

std::csub_match for a const char*
std::wcsub_match for a const wchar_t*
std::ssub_match for a std::string
std::wssub_match for a std::wstring

Unfortunately, you’ll have to keep track of all these specializations manually due to the design of std::regex_match. This design generally befuddles newcomers, so let’s look at an example. Listing 15-24 uses the ZIP code regular expression (\w{2})(\d{5})(-\d{4})? to match against the strings NJ07936-3173 and Iomega Zip 100.

#include <regex>
#include <string>

TEST_CASE("std::sub_match") {
  std::regex regex{ R"((\w{2})(\d{5})(-\d{4})?)" }; ➊
  std::smatch results; ➋
  SECTION("returns true given matching string") {
    std::string zip("NJ07936-3173");
    const auto matched = std::regex_match(zip, results, regex); ➌
    REQUIRE(matched); ➍
    REQUIRE(results[0] == "NJ07936-3173"); ➎
    REQUIRE(results[1] == "NJ"); ➏
    REQUIRE(results[2] == "07936");
    REQUIRE(results[3] == "-3173");
  }
  SECTION("returns false given non-matching string") {
    std::string zip("Iomega Zip 100");
    const auto matched = std::regex_match(zip, results, regex); ➐
    REQUIRE_FALSE(matched); ➑
  }
}

Listing 15-24: A regex_match attempts to match a regex to a string.

You construct a regex with the raw literal R"((\w{2})(\d{5})(-\d{4})?)" ➊ and default construct an smatch ➋. In the first test, you regex_match the valid ZIP code NJ07936-3173 ➌, which returns the true value matched to indicate success ➍. Because you provide an smatch to regex_match, it contains the valid ZIP code as the first element ➎, followed by each of the three subgroups ➏.

In the second test, you regex_match the invalid ZIP code Iomega Zip 100 ➐, which fails to match and returns false ➑.
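Groups pay off when you want to pull a matched string apart. The sketch below is one way to do so, assuming the same ZIP pattern as Listing 15-24. The ZipCode struct and parse_zip function are hypothetical names invented for this example. It relies on the sub_match member matched, which is false when an optional group didn't participate in the match.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical aggregate for the pieces of a ZIP code.
struct ZipCode {
  std::string state;
  std::string zip;
  std::string suffix; // empty when the optional group didn't participate
};

// Hypothetical helper: splits text into its groups, or returns nullopt if
// text doesn't match the pattern as a whole.
std::optional<ZipCode> parse_zip(const std::string& text) {
  static const std::regex pattern{ R"((\w{2})(\d{5})(-\d{4})?)" };
  std::smatch results;
  if (!std::regex_match(text, results, pattern)) return std::nullopt;
  ZipCode parsed{ results[1].str(), results[2].str(), "" };
  // results[0] is the whole match, so the groups start at index 1.
  if (results[3].matched) parsed.suffix = results[3].str();
  return parsed;
}
```

Given TX78209, the third group doesn't participate, so suffix stays empty. Given NJ07936-3173, suffix holds -3173.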

Searching

Searching attempts to match a regular expression to a part of a string. The STL provides the std::regex_search function for searching, which is essentially a replacement for regex_match that succeeds even when only a part of a string matches a regex.

For example, the sentence "The string NJ07936-3173 is a ZIP Code." contains a ZIP code. But applying the ZIP regular expression to it using std::regex_match will return false because the regex doesn’t match the entire string. However, applying std::regex_search instead would yield true because the string embeds a valid ZIP code. Listing 15-25 illustrates regex_match and regex_search.

TEST_CASE("when only part of a string matches a regex, std::regex_ ") {
  std::regex regex{ R"((\w{2})(\d{5})(-\d{4})?)" }; ➊
  std::string sentence("The string NJ07936-3173 is a ZIP Code."); ➋
  SECTION("match returns false") {
    REQUIRE_FALSE(std::regex_match(sentence, regex)); ➌
  }
  SECTION("search returns true") {
    REQUIRE(std::regex_search(sentence, regex)); ➍
  }
}

Listing 15-25: Comparing regex_match and regex_search

As before, you construct the ZIP regex ➊. You also construct the example string sentence, which embeds a valid ZIP code ➋. The first test calls regex_match with sentence and regex, which returns false ➌. The second test instead calls regex_search with the same arguments and returns true ➍.
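Like regex_match, regex_search accepts an optional match_results, which lets you retrieve the portion of the string that matched. The following sketch assumes the same ZIP pattern. The name first_zip_in is hypothetical, as is the convention of returning an empty string when nothing matches.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first ZIP code embedded in text, or an
// empty string if there is none.
std::string first_zip_in(const std::string& text) {
  static const std::regex pattern{ R"((\w{2})(\d{5})(-\d{4})?)" };
  std::smatch results;
  if (!std::regex_search(text, results, pattern)) return "";
  return results.str(); // element 0 is the matched portion of text
}
```

After a successful search, results.prefix() and results.suffix() from Table 15-11 describe the text on either side of the match.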

Replacing

Replacing substitutes regular expression occurrences with replacement text. The STL provides the std::regex_replace function for replacing.

In its most basic usage, you pass regex_replace three arguments:

A source string/C-string/half-open range to search
A regular expression
A replacement string

As an example, Listing 15-26 replaces all the vowels in the phrase queueing and cooeeing in eutopia with underscores (_).

TEST_CASE("std::regex_replace") {
  std::regex regex{ "[aeoiu]" }; ➊
  std::string phrase("queueing and cooeeing in eutopia"); ➋
  const auto result = std::regex_replace(phrase, regex, "_"); ➌
  REQUIRE(result == "q_____ng _nd c_____ng _n __t_p__"); ➍
}

Listing 15-26: Using std::regex_replace to substitute underscores for vowels in a string

You construct a std::regex that contains the set of all vowels ➊ and a string called phrase containing the vowel-rich contents queueing and cooeeing in eutopia ➋. Next, you invoke std::regex_replace with phrase, the regex, and the string literal _ ➌, which replaces all vowels with underscores ➍.
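The replacement string can also refer back to groups in the pattern: in the default format rules, $1 stands for the first group, $2 for the second, and so on. The sketch below uses this to reorder a name. The function name and the Last, First input shape are assumptions made for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites "Last, First" as "First Last".
std::string first_name_first(const std::string& name) {
  static const std::regex pattern{ R"((\w+), (\w+))" };
  // In the replacement, $1 and $2 stand for the first and second groups.
  return std::regex_replace(name, pattern, "$2 $1");
}
```

When the pattern doesn't occur in the input, regex_replace returns the text unchanged.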

NOTE

Boost Regex provides regular expression support mirroring the STL’s in the <boost/regex.hpp> header. Another Boost library, Xpressive, offers an alternative approach with regular expressions that you can express directly in C++ code. It has some major advantages, such as expressiveness and compile-time syntax checking, but the syntax necessarily diverges from standard regular expression syntaxes like POSIX, Perl, and ECMAScript.

Boost String Algorithms

Boost’s String Algorithms library offers a bounty of string manipulation functions. It contains functions for common tasks related to string, such as trimming, case conversion, finding/replacing, and evaluating characteristics. You can access all the Boost String Algorithms functions in the boost::algorithm namespace and in the <boost/algorithm/string.hpp> convenience header.

Boost Range

Range is a concept (in the Chapter 6 compile-time polymorphism sense of the word) that has a beginning and an end that allow you to iterate over constituent elements. The range aims to improve the practice of passing a half-open range as a pair of iterators. By replacing the pair with a single object, you can compose algorithms together by using the range result of one algorithm as the input to another. For example, if you wanted to transform a range of strings to all uppercase and sort them, you could pass the results of one operation directly into the other. This is not generally possible to do with iterators alone.

Ranges joined the C++ standard only in C++20, and several earlier implementations exist. One such implementation is Boost Range, and because Boost String Algorithms uses Boost Range extensively, let’s look at it now.

The Boost Range concept is like the STL container concept. It provides the usual complement of begin/end methods to expose iterators over the elements in the range. Each range has a traversal category, which indicates the range’s supported operations:

A single-pass range allows one-time, forward iteration.
A forward range allows (unlimited) forward iteration and satisfies single-pass range.
A bidirectional range allows forward and backward iteration and satisfies forward range.
A random-access range allows arbitrary element access and satisfies bidirectional range.

Boost String Algorithms is designed for std::string, which satisfies the random-access range concept. For the most part, the fact that Boost String Algorithms accepts Boost Range rather than std::string is a totally transparent abstraction to users. When reading the documentation, you can mentally substitute Range with string.

Predicates

Boost String Algorithms incorporates predicates extensively. You can use them directly by bringing in the <boost/algorithm/string/predicate.hpp> header. Most of the predicates contained in this header accept two ranges, r1 and r2, and return a bool based on their relationship. The predicate starts_with, for example, returns true if r1 begins with r2.

Each predicate has a case-insensitive version, which you can use by prepending the letter i to the method name, such as istarts_with. Listing 15-27 illustrates starts_with and istarts_with.

#include <string>
#include <boost/algorithm/string/predicate.hpp>

TEST_CASE("boost::algorithm") {
  using namespace boost::algorithm;
  using namespace std::literals::string_literals;
  std::string word("cymotrichous"); ➊
  SECTION("starts_with tests a string's beginning") {
    REQUIRE(starts_with(word, "cymo"s)); ➋
  }
  SECTION("istarts_with is case insensitive") {
    REQUIRE(istarts_with(word, "cYmO"s)); ➌
  }
}

Listing 15-27: Both starts_with and istarts_with check a range’s beginning characters.

You initialize a string containing cymotrichous ➊. The first test shows that starts_with returns true when invoked with word and cymo ➋. The case-insensitive version istarts_with also returns true given word and cYmO ➌.

Note that <boost/algorithm/string/predicate.hpp> also contains an all predicate, which accepts a single range r and a predicate p. It returns true if p evaluates to true for all elements of r, as Listing 15-28 illustrates.

TEST_CASE("boost::algorithm::all evaluates a predicate for all elements") {
  using namespace boost::algorithm;
  std::string word("juju"); ➊
  REQUIRE(all(word➋, [](auto c) { return c == 'j' || c =='u'; }➌));
}

Listing 15-28: The all predicate evaluates if all elements in a range satisfy a predicate.

You initialize a string containing juju ➊, which you pass to all as the range ➋. You pass a lambda predicate, which returns true for the letters j and u ➌. Because juju contains only these letters, all returns true.

Table 15-12 lists the predicates available in <boost/algorithm/string/predicate.hpp>. In this table, r, r1, and r2 are string ranges, and p is an element comparison predicate.

Table 15-12: Predicates in the Boost String Algorithms Library

Predicate	Returns true if
starts_with(r1, r2, [p]) istarts_with(r1, r2)	r1 starts with r2; p used for character-wise comparison.
ends_with(r1, r2, [p]) iends_with(r1, r2)	r1 ends with r2; p used for character-wise comparison.
contains(r1, r2, [p]) icontains(r1, r2)	r1 contains r2; p used for character-wise comparison.
equals(r1, r2, [p]) iequals(r1, r2)	r1 equals r2; p used for character-wise comparison.
lexicographical_compare(r1, r2, [p]) ilexicographical_compare(r1, r2)	r1 lexicographically less than r2; p used for character-wise comparison.
all(r, p)	p returns true for every element of r.

Function permutations beginning with i are case-insensitive.

Classifiers

Classifiers are predicates that evaluate some characteristics about a character. The <boost/algorithm/string/classification.hpp> header offers generators for creating classifiers. A generator is a non-member function that acts like a constructor. Some generators accept arguments for customizing the classifier.

NOTE

Of course, you can create your own predicates just as easily with your own function objects, like lambdas, but Boost provides a menu of premade classifiers for convenience.

The is_alnum generator, for example, creates a classifier that determines whether a character is alphanumeric. Listing 15-29 illustrates how to use this classifier independently or in conjunction with all.

#include <boost/algorithm/string/classification.hpp>

TEST_CASE("boost::algorithm::is_alnum") {
  using namespace boost::algorithm;
  const auto classifier = is_alnum(); ➊
  SECTION("evaluates alphanumeric characters") {
    REQUIRE(classifier('a')); ➋
    REQUIRE_FALSE(classifier('$')); ➌
  }
  SECTION("works with all") {
    REQUIRE(all("nostarch", classifier)); ➍
    REQUIRE_FALSE(all("@nostarch", classifier)); ➎
  }
}

Listing 15-29: The is_alnum generator determines whether a character is alphanumeric.

Here, you construct a classifier from the is_alnum generator ➊. The first test uses the classifier to evaluate that a is alphanumeric ➋ and $ is not ➌. Because all classifiers are predicates that operate on characters, you can use them in conjunction with the all predicate discussed in the previous section to determine that nostarch contains all alphanumeric characters ➍ and @nostarch doesn’t ➎.

Table 15-13 lists the character classifications available in <boost/algorithm/string/classification.hpp>. In this table, r is a string range, and beg and end are characters that bound a range of character values.

Table 15-13: Character Predicates in the Boost String Algorithms Library

Predicate	Returns true if element is . . .
is_space	A space
is_alnum	An alphanumeric character
is_alpha	An alphabetical character
is_cntrl	A control character
is_digit	A decimal digit
is_graph	A graphical character
is_lower	A lowercase character
is_print	A printable character
is_punct	A punctuation character
is_upper	An uppercase character
is_xdigit	A hexadecimal digit
is_any_of(r)	Contained in r
is_from_range(beg, end)	Contained in the closed range from beg to end (both endpoints included)

Finders

A finder is a concept that determines a position in a range corresponding to some specified criteria, usually a predicate or a regular expression. Boost String Algorithms provides some generators for producing finders in the <boost/algorithm/string/finder.hpp> header.

For example, the nth_finder generator accepts a range r and an index n, and it creates a finder that will search a range (taken as a begin and an end iterator) for the nth occurrence of r, as Listing 15-30 illustrates.

#include <boost/algorithm/string/finder.hpp>

TEST_CASE("boost::algorithm::nth_finder finds the nth occurrence") {
  const auto finder = boost::algorithm::nth_finder("na", 1); ➊
  std::string name("Carl Brutananadilewski"); ➋
  const auto result = finder(name.begin(), name.end()); ➌
  REQUIRE(result.begin() == name.begin() + 12); ➍ // Brutana(n)adilewski
  REQUIRE(result.end() == name.begin() + 14); ➎ // Brutanana(d)ilewski
}

Listing 15-30: The nth_finder generator creates a finder that locates the nth occurrence of a sequence.

You use the nth_finder generator to create finder, which will locate the second instance of na in a range (n is zero based) ➊. Next, you construct name containing Carl Brutananadilewski ➋ and invoke finder with the begin and end iterators of name ➌. The result is a range whose begin points to the second n in Brutananadilewski ➍ and whose end points to the first d in Brutananadilewski ➎.

Table 15-14 lists the finders available in <boost/algorithm/string/finder.hpp>. In this table, s is a string, p is an element comparison predicate, n is an integral value, beg and end are iterators, rgx is a regular expression, and r is a string range.

Table 15-14: Finders in the Boost String Algorithms Library

Generator	Creates a finder that, when invoked, returns . . .
first_finder(s, p)	The first element matching s using p
last_finder(s, p)	The last element matching s using p
nth_finder(s, n, p)	The nth element matching s using p
head_finder(n)	The first n elements
tail_finder(n)	The last n elements
token_finder(p)	The character matching p
range_finder(r) range_finder(beg, end)	r regardless of input
regex_finder(rgx)	The first substring matching rgx

NOTE

Boost String Algorithms specifies a formatter concept, which presents the results of a finder to a replace algorithm. Only an advanced user will need these algorithms. Refer to the documentation for the find_format algorithms in the <boost/algorithm/string/find_format.hpp> header for more information.

Modifying Algorithms

Boost contains a lot of algorithms for modifying a string (range). Between the <boost/algorithm/string/case_conv.hpp>, <boost/algorithm/string/trim.hpp>, and <boost/algorithm/string/replace.hpp> headers, algorithms exist to convert case, trim, replace, and erase in many different ways.

For example, the to_upper function will convert all of a string’s letters to uppercase. If you want to keep the original unmodified, you can use the to_upper_copy function, which will return a new object. Listing 15-31 illustrates to_upper and to_upper_copy.

#include <boost/algorithm/string/case_conv.hpp>

TEST_CASE("boost::algorithm::to_upper") {
  std::string powers("difficulty controlling the volume of my voice"); ➊
  SECTION("upper-cases a string") {
    boost::algorithm::to_upper(powers); ➋
    REQUIRE(powers == "DIFFICULTY CONTROLLING THE VOLUME OF MY VOICE"); ➌
  }
  SECTION("_copy leaves the original unmodified") {
    auto result = boost::algorithm::to_upper_copy(powers); ➍
    REQUIRE(powers == "difficulty controlling the volume of my voice"); ➎
    REQUIRE(result == "DIFFICULTY CONTROLLING THE VOLUME OF MY VOICE"); ➏
  }
}

Listing 15-31: Both to_upper and to_upper_copy convert the case of a string.

You create a string called powers ➊. The first test invokes to_upper on powers ➋, which modifies it in place to contain all uppercase letters ➌. The second test uses the _copy variant to create a new string called result ➍. The powers string is unaffected ➎, whereas result contains an all uppercase version ➏.

Some Boost String Algorithms, such as replace_first, also have case-insensitive versions. Just prepend an i, and matching will proceed regardless of case. For algorithms like replace_first that also have _copy variants, any permutation will work (replace_first, ireplace_first, replace_first_copy, and ireplace_first_copy).

The replace_first algorithm and its variants accept an input range s, a match range m, and a replace range r, and they replace the first instance of m in s with r. Listing 15-32 illustrates replace_first and ireplace_first_copy.

#include <boost/algorithm/string/replace.hpp>

TEST_CASE("boost::algorithm::replace_first") {
  using namespace boost::algorithm;
  std::string publisher("No Starch Press"); ➊
  SECTION("replaces the first occurrence of a string") {
    replace_first(publisher, "No", "Medium"); ➋
    REQUIRE(publisher == "Medium Starch Press"); ➌
  }
  SECTION("has a case-insensitive variant") {
    auto result = ireplace_first_copy(publisher, "NO", "MEDIUM"); ➍
    REQUIRE(publisher == "No Starch Press"); ➎
    REQUIRE(result == "MEDIUM Starch Press"); ➏
  }
}

Listing 15-32: Both replace_first and ireplace_first_copy replace matching string sequences.

Here, you construct a string called publisher containing No Starch Press ➊. The first test invokes replace_first with publisher as the input string, No as the match string, and Medium as the replacement string ➋. Afterward, publisher contains Medium Starch Press ➌. The second test uses the ireplace_first_copy variant, which is case insensitive and performs a copy. You pass NO and MEDIUM as the match and replace strings ➍, respectively, and the result contains MEDIUM Starch Press ➏, whereas publisher is unaffected ➎.

Table 15-15 lists many of the modifying algorithms available in Boost String Algorithms. In this table, r, s, s1, and s2 are strings; p is an element comparison predicate; n is an integral value; and rgx is a regular expression.

Table 15-15: Modifying Algorithms in the Boost String Algorithms Library

Algorithm	Description
to_upper(s) to_upper_copy(s)	Converts s to all uppercase
to_lower(s) to_lower_copy(s)	Converts s to all lowercase
trim_left_copy_if(s, [p]) trim_left_if(s, [p]) trim_left_copy(s) trim_left(s)	Removes leading spaces from s
trim_right_copy_if(s, [p]) trim_right_if(s, [p]) trim_right_copy(s) trim_right(s)	Removes trailing spaces from s
trim_copy_if(s, [p]) trim_if(s, [p]) trim_copy(s) trim(s)	Removes leading and trailing spaces from s
replace_first(s1, s2, r) replace_first_copy(s1, s2, r) ireplace_first(s1, s2, r) ireplace_first_copy(s1, s2, r)	Replaces the first occurrence of s2 in s1 with r
erase_first(s1, s2) erase_first_copy(s1, s2) ierase_first(s1, s2) ierase_first_copy(s1, s2)	Erases the first occurrence of s2 in s1
replace_last(s1, s2, r) replace_last_copy(s1, s2, r) ireplace_last(s1, s2, r) ireplace_last_copy(s1, s2, r)	Replaces the last occurrence of s2 in s1 with r
erase_last(s1, s2) erase_last_copy(s1, s2) ierase_last(s1, s2) ierase_last_copy(s1, s2)	Erases the last occurrence of s2 in s1
replace_nth(s1, s2, n, r) replace_nth_copy(s1, s2, n, r) ireplace_nth(s1, s2, n, r) ireplace_nth_copy(s1, s2, n, r)	Replaces the nth occurrence of s2 in s1 with r
erase_nth(s1, s2, n) erase_nth_copy(s1, s2, n) ierase_nth(s1, s2, n) ierase_nth_copy(s1, s2, n)	Erases the nth occurrence of s2 in s1
replace_all(s1, s2, r) replace_all_copy(s1, s2, r) ireplace_all(s1, s2, r) ireplace_all_copy(s1, s2, r)	Replaces all occurrences of s2 in s1 with r
erase_all(s1, s2) erase_all_copy(s1, s2) ierase_all(s1, s2) ierase_all_copy(s1, s2)	Erases all occurrences of s2 in s1
replace_head(s, n, r) replace_head_copy(s, n, r)	Replaces the first n characters of s with r
erase_head(s, n) erase_head_copy(s, n)	Erases the first n characters of s
replace_tail(s, n, r) replace_tail_copy(s, n, r)	Replaces the last n characters of s with r
erase_tail(s, n) erase_tail_copy(s, n)	Erases the last n characters of s
replace_regex(s, rgx, r) replace_regex_copy(s, rgx, r)	Replaces the first instance of rgx in s with r
erase_regex(s, rgx) erase_regex_copy(s, rgx)	Erases the first instance of rgx in s
replace_all_regex(s, rgx, r) replace_all_regex_copy(s, rgx, r)	Replaces all instances of rgx in s with r
erase_all_regex(s, rgx) erase_all_regex_copy(s, rgx)	Erases all instances of rgx in s

Splitting and Joining

Boost String Algorithms contains functions for splitting and joining strings in the <boost/algorithm/string/split.hpp> and <boost/algorithm/string/join.hpp> headers.

To split a string, you provide the split function with an STL container res, a range s, and a predicate p. It will tokenize the range s using the predicate p to determine delimiters and insert the results into res. Listing 15-33 illustrates the split function.

#include <vector>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>

TEST_CASE("boost::algorithm::split splits a range based on a predicate") {
  using namespace boost::algorithm;
  std::string publisher("No Starch Press"); ➊
  std::vector<std::string> tokens; ➋
  split(tokens, publisher, is_space()); ➌
  REQUIRE(tokens[0] == "No"); ➍
  REQUIRE(tokens[1] == "Starch");
  REQUIRE(tokens[2] == "Press");
}

Listing 15-33: The split function tokenizes a string.

Armed again with publisher ➊, you create a vector called tokens to contain the results ➋. You invoke split with tokens as the results container, publisher as the range, and an is_space classifier as your predicate ➌. This splits publisher into pieces at each space. Afterward, tokens contains No, Starch, and Press as expected ➍.

You can perform the inverse operation with join, which accepts an STL container seq and a separator string sep. The join function will bind each element of seq together with sep between each.

Listing 15-34 illustrates the utility of join and the indispensability of the Oxford comma.

#include <vector>
#include <boost/algorithm/string/join.hpp>

TEST_CASE("boost::algorithm::join staples tokens together") {
  std::vector<std::string> tokens{ "We invited the strippers",
                                   "JFK", "and Stalin." }; ➊
  auto result = boost::algorithm::join(tokens, ", "); ➋
  REQUIRE(result == "We invited the strippers, JFK, and Stalin."); ➌
}

Listing 15-34: The join function attaches string tokens together with a separator.

You instantiate a vector called tokens with three string objects ➊. Next, you use join to bind the constituent elements of tokens together with a comma followed by a space ➋. The result is a single string containing the constituent elements bound together with commas and spaces ➌.

Table 15-16 lists many of the split/join algorithms available in <boost/algorithm/string/split.hpp> and <boost/algorithm/string/join.hpp>. In this table, res is a container that receives the results; s, s1, s2, and sep are strings; seq is a range of strings; p is a predicate; and rgx is a regular expression.

Table 15-16: split and join Algorithms in the Boost String Algorithms Library

Function	Description
find_all(res, s1, s2) ifind_all(res, s1, s2) find_all_regex(res, s1, rgx) iter_find(res, s1, s2)	Finds all instances of s2 or rgx in s1, writing each into res
split(res, s, p) split_regex(res, s, rgx) iter_split(res, s, s2)	Splits s using p, rgx, or s2, writing tokens into res
join(seq, sep)	Returns a string joining seq using sep as a separator
join_if(seq, sep, p)	Returns a string joining all elements of seq matching p using sep as a separator

Searching

Boost String Algorithms offers a handful of functions for searching ranges in the <boost/algorithm/string/find.hpp> header. These are essentially convenient wrappers around the finders in Table 15-14.

For example, the find_head function accepts a range s and a length n, and it returns a range containing the first n elements of s. Listing 15-35 illustrates the find_head function.

#include <boost/algorithm/string/find.hpp>

TEST_CASE("boost::algorithm::find_head computes the head") {
  std::string word("blandishment"); ➊
  const auto result = boost::algorithm::find_head(word, 5); ➋
  REQUIRE(result.begin() == word.begin()); ➌ // (b)landishment
  REQUIRE(result.end() == word.begin()+5); ➍ // bland(i)shment
}

Listing 15-35: The find_head function creates a range from the beginning of a string.

You construct a string called word containing blandishment ➊. You pass it into find_head along with the length argument 5 ➋. The begin of result points to the beginning of word ➌, and its end points to 1 past the fifth element ➍.

Table 15-17 lists many of the find algorithms available in <boost/algorithm/string/find.hpp>. In this table, s, s1, and s2 are strings; p is an element comparison predicate; rgx is a regular expression; n is an integral value; and fnd is a finder.

Table 15-17: Find Algorithms in the Boost String Algorithms Library

Function	Finds the . . .
find_first(s1, s2) ifind_first(s1, s2)	First instance of s2 in s1
find_last(s1, s2) ifind_last(s1, s2)	Last instance of s2 in s1
find_nth(s1, s2, n) ifind_nth(s1, s2, n)	nth instance of s2 in s1
find_head(s, n)	First n characters of s
find_tail(s, n)	Last n characters of s
find_token(s, p)	First character matching p in s
find_regex(s, rgx)	First substring matching rgx in s
find(s, fnd)	Result of applying fnd to s

Boost Tokenizer

Boost Tokenizer’s boost::tokenizer is a class template that provides a view of a series of tokens contained in a string. A tokenizer takes three optional template parameters: a tokenizer function, an iterator type, and a string type.

The tokenizer function is a predicate that determines whether a character is a delimiter (returns true) or not (returns false). The default tokenizer function interprets spaces and punctuation marks as separators. If you want to specify the delimiters explicitly, you can use the boost::char_separator<char> class, which accepts a C-string containing all the delimiting characters. For example, a boost::char_separator<char>(";|,") would separate on semicolons (;), pipes (|), and commas (,).

The iterator type and string type correspond with the type of string you want to split. By default, these are std::string::const_iterator and std::string, respectively.

Because tokenizer doesn’t allocate memory and boost::algorithm::split does, you should strongly consider using the former whenever you only need to iterate over the tokens of a string once.

A tokenizer exposes begin and end methods that return input iterators, so you can treat it as a range of values corresponding to the underlying token sequence.

Listing 15-36 tokenizes the iconic palindrome A man, a plan, a canal, Panama! by comma.

#include <boost/tokenizer.hpp>
#include <string>

TEST_CASE("boost::tokenizer splits token-delimited strings") {
  std::string palindrome("A man, a plan, a canal, Panama!"); ➊
  boost::char_separator<char> comma{ "," }; ➋
  boost::tokenizer<boost::char_separator<char>> tokens{ palindrome, comma }; ➌
  auto itr = tokens.begin(); ➍
  REQUIRE(*itr == "A man"); ➎
  itr++; ➏
  REQUIRE(*itr == " a plan");
  itr++;
  REQUIRE(*itr == " a canal");
  itr++;
  REQUIRE(*itr == " Panama!");
}

Listing 15-36: The boost::tokenizer splits strings by specified delimiters.

Here, you construct palindrome ➊, char_separator ➋, and the corresponding tokenizer ➌. Next, you extract an iterator from the tokenizer using its begin method ➍. You can treat the resulting iterator as usual, dereferencing its value ➎ and incrementing to the next element ➏.

Localizations

A locale is a class for encoding cultural preferences. The locale concept is typically encoded in whatever operating environment your application runs within. It also controls many preferences, such as string comparison; date and time, money, and numeric formatting; postal and ZIP codes; and phone numbers.

The STL offers the std::locale class and many helper functions and classes in the <locale> header.

Mainly for brevity (and partially because English speakers are the primary intended audience for this book), this chapter won’t explore locales any further.

Summary

This chapter covered std::string and its ecosystem in detail. After exploring its similarities to std::vector, you learned about its built-in methods for handling human-language data, such as comparing, adding, removing, replacing, and searching. You looked at how the numeric conversion functions allow you to convert between numbers and strings, and you examined the role that std::string_view plays in passing strings around your programs. You also learned how to employ regular expressions to perform intricate match, search, and replacement based on potentially complicated patterns. Finally, you trekked through the Boost String Algorithms library, which complements and extends the built-in methods of std::string with additional methods for searching, replacing, trimming, erasing, splitting, and joining.

EXERCISES

15-1. Refactor the histogram calculator in Listings 9-30 and 9-31 to use std::string. Construct a string from the program’s input and modify AlphaHistogram to accept a string_view or a const string& in its ingest method. Use a range-based for loop to iterate over the ingested elements of string. Replace the counts field’s type with an associative container.

15-2. Implement a program that determines whether the user’s input is a palindrome.

15-3. Implement a program that counts the number of vowels in the user’s input.

15-4. Implement a calculator program that supports addition, subtraction, multiplication, and division of any two numbers. Consider using the find method of std::string and the numeric conversion functions.

15-5. Extend your calculator program in some of the following ways: permit multiple operations or the modulo operator and accept floating-point numbers or parentheses.

15-6. Optional: Read more about locales in [localization].

FURTHER READING

ISO International Standard ISO/IEC (2017) — Programming Language C++ (International Organization for Standardization; Geneva, Switzerland; https://isocpp.org/std/the-standard/)
The C++ Programming Language, 4th Edition, by Bjarne Stroustrup (Pearson Education, 2013)
The Boost C++ Libraries, 2nd Edition, by Boris Schäling (XML Press, 2014)
The C++ Standard Library: A Tutorial and Reference, 2nd Edition, by Nicolai M. Josuttis (Addison-Wesley Professional, 2012)
