set
A logical extension to our program is to ignore common words like “the,” “and,” “or,” and so on. We’ll use a set
to hold the words we want to ignore and count only those words that are not in this set:
// count the number of times each word occurs in the input
map<string, size_t> word_count; // empty map from string to size_t
set<string> exclude = {"The", "But", "And", "Or", "An", "A",
                       "the", "but", "and", "or", "an", "a"};
string word;
while (cin >> word)
    // count only words that are not in exclude
    if (exclude.find(word) == exclude.end())
        ++word_count[word]; // fetch and increment the counter for word
Like the other containers, set is a template. To define a set, we specify the type of its elements, which in this case are strings. As with the sequential containers, we can list initialize (§ 9.2.4, p. 336) the elements of an associative container. Our exclude set holds the 12 words we want to ignore.
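As a minimal sketch of the list initialization described above, the following wraps the same initializer in a small function. The names make_exclude and size_with_duplicate are hypothetical helpers introduced only for illustration, and the code spells out the std:: qualifiers that the text's fragments omit.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not from the text): builds the exclusion set
// with the same list initializer the text uses.
std::set<std::string> make_exclude()
{
    return {"The", "But", "And", "Or", "An", "A",
            "the", "but", "and", "or", "an", "a"};
}

// Hypothetical helper (not from the text): a set stores each key only once,
// so a repeated initializer does not add a second element.
std::size_t size_with_duplicate()
{
    std::set<std::string> s = {"the", "the", "and"};
    return s.size(); // number of distinct keys
}
```

Calling make_exclude().size() yields 12, matching the count given in the text, while size_with_duplicate() yields 2 because the repeated "the" is stored only once.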
The important difference between this program and the previous program is that before counting each word, we check whether the word is in the exclusion set. We do this check in the if:
// count only words that are not in exclude
if (exclude.find(word) == exclude.end())
The call to find returns an iterator. If the given key is in the set, the iterator refers to that key. If the element is not found, find returns the off-the-end iterator. In this version, we update the counter for word only if word is not in exclude.
If we run this version on the same input as before, our output would be:
Although occurs 1 time
Before occurs 1 time
are occurs 1 time
as occurs 1 time
...
Exercise 11.1: Describe the differences between a map and a vector.
Exercise 11.2: Give an example of when each of list, vector, deque, map, and set might be most useful.
Exercise 11.3: Write your own version of the word-counting program.
Exercise 11.4: Extend your program to ignore case and punctuation. For example, “example.”, “example,”, and “Example” should all increment the same counter.