EXPLORATION 22

image

Using Algorithms

So far, your use of the standard algorithms has been limited to a few calls to copy, the occasional use of sort, and so on. The main limitation has been that many of the more interesting algorithms require you to supply a function in order to do anything useful. This Exploration takes a look at these more advanced algorithms. In addition, we’ll revisit some of the algorithms you already know, to show you how they, too, can be used in a more advanced manner.

Transforming Data

Several programs that you’ve read and written have a common theme: copying a sequence of data, such as a vector or string, and applying some kind of transformation to each element (converting to lowercase, doubling the values in an array, and so on). The standard algorithm, transform, is ideal for applying an arbitrarily complex transformation to the elements of a sequence.

For example, recall Listing 10-5, which doubled all the values in an array. Listing 22-1 presents a new way to write this same program, but using transform.

Listing 22-1.  Calling transform to Apply a Function to Each Element of an Array

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
 
int times_two(int i)
{
  return i * 2;
}
 
int main()
{
   std::vector<int> data{};
 
   std::copy(std::istream_iterator<int>(std::cin),
             std::istream_iterator<int>(),
             std::back_inserter(data));
 
   std::transform(data.begin(), data.end(), data.begin(), times_two);
 
   std::copy(data.begin(), data.end(),
             std::ostream_iterator<int>(std::cout, " "));
}

The transform function takes four arguments: the first two specify the input range (as start and one-past-the-end iterators), the third argument is a write iterator, and the final argument is the name of a function. Like other algorithms, transform is declared in the <algorithm> header.

Regarding the third argument, as usual, it is your responsibility to ensure the output sequence has enough room to accommodate the transformed data. In this case, the transformed data overwrite the original data, so the start of the output range is the same as the start of the input range. The fourth argument is just the name of a function that you must have declared or defined earlier in the source file. In this example, the function takes one int parameter and returns an int. The general rule for a transform function is that its parameter type must match the input type, which is the type of the element to which the read iterators refer. The return value must match the output type, which is the type to which the result iterator refers. The transform algorithm calls this function once for each element in the input range. It copies to the output range the value returned by the function.

Rewriting the word-counting program is a little harder. Recall from Listing 20-3 that the sanitize function transforms a string by removing non-letters and converting all uppercase letters to lowercase. The purpose of the C++ standard library is not to provide a zillion functions that cover all possible programming scenarios but, rather, to provide the tools you need to build your own functions with which you can solve your problems. Thus, you would search the standard library in vain for a single algorithm that copies, transforms, and filters. Instead, you can combine two standard functions: one that transforms and one that filters.

A further complication, however, is that you know that the filtering and transforming functions will rely on a locale. Solve the problem for now by setting your chosen locale as the global locale. Do this by calling std::local::global, and passing a locale object as the sole argument. An std::locale object created with the default constructor uses the global locale, so after your program sets your chosen locale as the global locale, you can easily imbue a stream or otherwise access the chosen locale by means of std::locale{}. Any function can use the global locale without having to pass locale objects around. Listing 22-2 demonstrates how to rewrite Listing 20-3 to set the global locale to the native locale and then how to use the global locale in the rest of the program.

Listing 22-2.  New main Function That Sets the Global Locale

#include <iomanip>
#include <iostream>
#include <locale>
#include <map>
#include <string>
 
typedef std::map<std::string, int> count_map;  ///< Map words to counts
typedef count_map::value_type      count_pair; ///< pair of a word and a count
typedef std::string::size_type     str_size;   ///< String size type
 
/** Initialize the I/O streams by imbuing them with
 * the global locale. Use this function to imbue the streams
 * with the native locale. C++ initially imbues streams with
 * the classic locale.
 */
void initialize_streams()
{
  std::cin.imbue(std::locale{});
  std::cout.imbue(std::locale{});
}
 
/** Find the longest key in a map.
 * @param map the map to search
 * @returns the size of the longest key in @p map
 */
str_size get_longest_key(count_map const& map)
{
  str_size result{0};
  for (auto pair : map)
    if (pair.first.size() > result)
      result = pair.first.size();
  return result;
}
 
/** Print the word, count, newline. Keep the columns neatly aligned.
 * Rather than the tedious operation of measuring the magnitude of all
 * the counts and then determining the necessary number of columns, just
 * use a sufficiently large value for the counts column.
 * @param pair a word/count pair
 * @param longest the size of the longest key; pad all keys to this size
 */
void print_pair(count_pair const& pair, str_size longest)
{
  int const count_size{10}; // Number of places for printing the count
  std::cout << std::setw(longest)    << std::left  << pair.first <<
               std::setw(count_size) << std::right << pair.second << ' ';
}
 
/** Print the results in neat columns.
 * @param counts the map of all the counts
 */
void print_counts(count_map counts)
{
  str_size longest{get_longest_key(counts)};
  
  // For each word/count pair...
  for (count_pair pair: counts)
    print_pair(pair, longest);
}
 
/** Sanitize a string by keeping only alphabetic characters.
 * @param str the original string
 * @return a santized copy of the string
 */
std::string sanitize(std::string const& str)
{
  std::string result{};
  for (char c : str)
    if (std::isalnum(c, std::locale{}))
      result.push_back(std::tolower(c, std::locale{}));
  return result;
}
 
/** Main program to count unique words in the standard input. */
int main()
{
  // Set the global locale to the native locale.
  std::locale::global(std::locale{""});
  initialize_streams();
 
  count_map counts{};
 
  // Read words from the standard input and count the number of times
  // each word occurs.
  std::string word{};
  while (std::cin >> word)
  {
    std::string copy{sanitize(word)};
 
    // The "word" might be all punctuation, so the copy would be empty.
    // Don't count empty strings.
    if (not copy.empty())
      ++counts[copy];
  }
 
  print_counts(counts);
}

Now it’s time to rewrite the sanitize function to take advantage of algorithms. Use transform to convert characters to lowercase. Use remove_if to get rid of nonalphabetic characters from the string. The remove_if algorithm calls a function for each element of a sequence. If the function returns true, remove_if eliminates that element from the sequence—well, it kind of does.

One curious side effect of how iterators work in C++ is that the remove_if function does not actually erase anything from the sequence. Instead, it rearranges the elements and returns an iterator that points to the position one past the end of the elements to be preserved. You can then call the erase member function to delete the elements that were removed, or you can make sure your function keeps track of the new logical end of the sequence.

Figure 22-1 illustrates how remove_if works with a before and after view of a string. Notice how the remove_if function does not alter the size of the string, but the characters after the new end are not meaningful. They are junk.

9781430261933_Fig22-01.jpg

Figure 22-1. Removing elements from a sequence

Take a look at Listing 22-3 to see how remove_if works in code.

Listing 22-3.  Sanitizing a String by Transforming It

/** Test for non-letter.
 * @param ch the character to test
 * @return true if @p ch is not a character that makes up a word
 */
bool non_letter(char ch)
{
  return not std::isalnum(ch, std::locale());
}
 
/** Convert to lowercase.
 * Use a canonical form by converting to uppercase first,
 * and then to lowercase.
 * @param ch the character to test
 * @return the character converted to lowercase
 */
char lowercase(char ch)
{
  return std::tolower(ch, std::locale());
}
 
/** Sanitize a string by keeping only alphabetic characters.
 * @param str the original string
 * @return a santized copy of the string
 */
std::string sanitize(std::string str)
{
  // Remove all non-letters from the string, and then erase them.
  str.erase(std::remove_if(str.begin(), str.end(), non_letter),
            str.end());
 
  // Convert the remnants of the string to lowercase.
  std::transform(str.begin(), str.end(), str.begin(), lowercase);
 
  return str;
}

The erase member function takes two iterators as arguments and erases all the elements within that range. The remove_if function returns an iterator that points to one-past-the-end of the new string, which means it also points to the first position of the elements to be erased. Passing str.end() as the end of the range instructs erase to get rid of all the removed elements.

The remove/erase idiom is common in C++, so you should get used to seeing it. The standard library has several remove-like functions, all of which work the same way. It takes a little while to get used to this approach, but once you do, you will find it quite easy to use.

Predicates

The non_letter function is an example of a predicate. A predicate is a function that returns a bool result. These functions have many uses in the standard library.

For example, the sort function sorts values in ascending order. What if you wanted to sort data in descending order? The sort function lets you provide a predicate to compare items. The ordering predicate (call it pred) must meet the following qualifications:

  • pred(a, a) must be false (a common error is to implement <= instead of <, which violates this requirement).
  • If pred(a, b) is true, and pred(b, c) is true, then pred(a, c) must also be true.
  • The parameter types must match the element type to be sorted.
  • The return type must be bool or something that C++ can convert automatically to bool.

If you don’t provide a predicate, sort uses the < operator as the default.

Write a predicate to compare two integers for sorting in descending order.

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

Write a program to test your function. Did it work? ________________

Compare your solution with Listing 22-4.

Listing 22-4.  Sorting into Descending Order

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
 
/** Predicate for sorting into descending order. */
int descending(int a, int b)
{
  return a > b;
}
 
int main()
{
  std::vector<int> data{};
  data.insert(data.begin(), std::istream_iterator<int>(std::cin),
                            std::istream_iterator<int>());
 
  std::sort(data.begin(), data.end(), descending);
 
  std::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
}

The default comparison that sort uses (the < operator) is the standard for comparison throughout the standard library. The standard library uses < as the ordering function for anything and everything that can be ordered. For example, map uses < to compare keys. The lower_bound functions (which you used in Exploration 13) use the < operator to perform a binary search.

The standard library even uses < to compare objects for equality when dealing with ordered values, such as a map or a binary search. (Algorithms and containers that are not inherently ordered use == to determine when two objects are equal.) To test if two items, a and b, are the same, these library functions use a < b and b < a. If both comparisons are false, then a and b must be the same, or in C++ terms, equivalent. If you supply a comparison predicate (pred), the library considers a and b to be equivalent if pred(a, b) is false and pred(b, a) is false.

Modify your descending-sort program (or Listing 22-4) to use== as the comparison operator. What do you expect to happen?

_____________________________________________________________

_____________________________________________________________

Run the new program with a variety of inputs. What actually happens?

_____________________________________________________________

_____________________________________________________________

Were you correct? ________________

The equivalence test is broken, because descending(a,a) is true, not false. Because the predicate does not work properly, sort is not guaranteed to work properly, or at all. The results are undefined. Whenever you write a predicate, be sure the comparison is strict (that is, you can write a valid equivalence test) and the transitive property holds (if a < b and b < c, then a < c is also true).

Other Algorithms

The standard library contains too many useful algorithms to cover in this book, but I’ll take a few moments in this section to introduce you to at least some of them. Refer to a comprehensive language reference to learn about the other algorithms.

Let’s explore algorithms by looking for palindromes. A palindrome is a word or phrase that reads the same forward and backward, ignoring punctuation, such as, for example:

Madam, I’m Adam.

The program reads one line of text at a time by calling the getline function. This function reads from an input stream into a string, stopping when it reads a delimiter character. The default delimiter is ' ', so it reads one line of text. It does not skip over initial or trailing white space.

The first step is to remove non-letter characters, but you already know how to do that.

The next step is to test whether the resulting string is a palindrome. The reverse function transposes the order of elements in a range, such as characters in a string.

The equal function compares two sequences to determine whether they are the same. It takes two iterators for the first range and a starting iterator for the second range. It assumes the two ranges are the same size, and it compares one element at a time and works for any kind of sequence. In this case, the comparison must be case-insensitive, so provide a predicate that converts all text to a canonical case and then compares them.

Go ahead. Write the program. A simple web search should deliver up some juicy palindromes with which to test your program. If you don’t have access to the Web, try the following:

Eve

Deed

Hannah

Leon saw I was Noel

If you need some hints, here are my recommendations:

  • Write a function called is_palindrome that takes a string as a parameter and returns a bool.
  • This function uses the remove_if function to clean up the string.
  • There’s no need to erase the removed elements, however. Instead, save the end iterator returned by remove_if and use that instead of str.end().
  • Define a new string, rev, which is a copy of the sanitized string. Copy only the desired part of the test string by taking only up to the iterator that was returned from remove_if.
  • Next, call reverse.
  • Write a predicate, is_same_char, to compare two characters after converting them both to uppercase and then to lowercase.
  • Call the equal function with same_char as its predicate, to compare the original string with the reversed copy.
  • The main program sets the global locale to the native locale and imbues the input and output streams with the new global locale.
  • The main program calls getline(std::cin, line) until the function returns false (meaning error or end-of-file), then calls is_palindrome for each line.

Listing 22-5 shows my version of the completed program.

Listing 22-5.  Testing for Palindromes

#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <string>
 
/** Test for non-letter.
 * @param ch the character to test
 * @return true if @p ch is not a letter
 */
bool non_letter(char ch)
{
  return not std::isalpha(ch, std::locale());
}
 
/** Convert to lowercase.
 * Use a canonical form by converting to uppercase first,
 * and then to lowercase.
 * @param ch the character to test
 * @return the character converted to lowercase
 */
char lowercase(char ch)
{
  return std::tolower(ch, std::locale());
}
 
/** Compare two characters without regard to case. */
bool is_same_char(char a, char b)
{
  return lowercase(a) == lowercase(b);
}
 
/** Determine whether @p str is a palindrome.
 * Only letter characters are tested. Spaces and punctuation don't count.
 * Empty strings are not palindromes because that's just too easy.
 * @param str the string to test
 * @return true if @p str is the same forward and backward
 */
bool is_palindrome(std::string str)
 
{
  std::string::iterator end{std::remove_if(str.begin(), str.end(), non_letter)};
  std::string rev{str.begin(), end};
  std::reverse(rev.begin(), rev.end());
  return not rev.empty() and std::equal(str.begin(), end, rev.begin(), is_same_char);
}
 
int main()
{
  std::locale::global(std::locale{""});
  std::cin.imbue(std::locale{});
  std::cout.imbue(std::locale{});
 
  std::string line{};
  while (std::getline(std::cin, line))
    if (is_palindrome(line))
      std::cout << line << ' ';
}

In a large program, the predicate or transformation function might be declared far from where it is used. Often, a predicate is used only once. By defining a function just for that predicate, it makes your program harder to understand. The human reader must read all the code to ensure that the predicate truly is called in only one place. It would be nice if C++ offered a way to write the predicate in the place where it is used, and thereby avoid these problems. Read the next Exploration to learn how to achieve this in C++ 11.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset