Using a set

A logical extension to our program is to ignore common words like “the,” “and,” “or,” and so on. We’ll use a set to hold the words we want to ignore and count only those words that are not in this set:

// count the number of times each word occurs in the input
map<string, size_t> word_count; // empty map from string to size_t
set<string> exclude = {"The", "But", "And", "Or", "An", "A",
                       "the", "but", "and", "or", "an", "a"};
string word;
while (cin >> word)
    // count only words that are not in exclude
    if (exclude.find(word) == exclude.end())
        ++word_count[word];   // fetch and increment the counter for word

Like the other containers, set is a template. To define a set, we specify the type of its elements, which in this case are strings. As with the sequential containers, we can list initialize (§ 9.2.4, p. 336) the elements of an associative container. Our exclude set holds the 12 words we want to ignore.

The important difference between this program and the previous program is that before counting each word, we check whether the word is in the exclusion set. We do this check in the if:

// count only words that are not in exclude
if (exclude.find(word) == exclude.end())

The call to find returns an iterator. If the given key is in the set, the iterator refers to that key. If the element is not found, find returns the off-the-end iterator. In this version, we update the counter for word only if word is not in exclude.
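The test against the off-the-end iterator can be isolated in a small function. The sketch below is illustrative only: the name is_excluded is hypothetical and does not appear in the text. It also shows that the lookup compares strings exactly, which is why the exclusion set lists both capitalized and lowercase forms.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not from the text): reports whether `word` is a key
// in `exclude` by comparing the result of find with the off-the-end iterator.
bool is_excluded(const std::set<std::string> &exclude, const std::string &word)
{
    auto it = exclude.find(word); // iterator to the key, or exclude.end() if absent
    return it != exclude.end();
}
```

With a set holding "The" and "the", is_excluded returns true for "the" and false for "THE", because the comparison is case sensitive. For a set, exclude.count(word) would give the same answer as 1 or 0, since a set holds each key at most once.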

If we run this version on the same input as before, our output is

Although occurs 1 time
Before occurs 1 time
are occurs 1 time
as occurs 1 time
...


Exercises Section 11.1

Exercise 11.1: Describe the differences between a map and a vector.

Exercise 11.2: Give an example of when each of list, vector, deque, map, and set might be most useful.

Exercise 11.3: Write your own version of the word-counting program.

Exercise 11.4: Extend your program to ignore case and punctuation. For example, “example.”, “example,” and “Example” should all increment the same counter.

