EXPLORATION 58

image

Pointers

Few topics cause more confusion, especially for programmers new to C++, than pointers. Necessary, powerful, and versatile, pointers can also be dangerous and the underlying cause of so many bugs that they are both bane and blessing. Pointers are hard at work behind many of the standard library’s features, and any serious application or library inevitably uses pointers in some fashion. When used with care and caution, pointers will become an indispensable tool in your C++ programmer’s toolkit.

A Programming Problem

Before diving into syntax and semantics, consider the following problem. Real-life C++ projects typically contain multiple source files, and each source file includes multiple headers. While you are working, you will compile and recompile the project many times. Each time, it’s preferable to recompile only those files that have changed or include a header file that has changed. Different development environments have different tools to decide which files to recompile. An IDE typically makes these decisions itself; in other environments, a separate tool, such as make, jam, or scons, examines the files in your project and decides which ones to recompile.

The problem to tackle in this and following Explorations is to write a simple tool that decides which files to compile and pretends to compile them. (Actually invoking an external program is beyond the scope of this book, so you won’t learn how to write an entire build tool.)

The essential idea is simple: to make an executable program, you must compile source files into object files and link the object files to form the program. The executable program is said to depend on the object files, which in turn, depend on the source files. Other terminology has the program as the target, with the object files as its dependencies. An object file, in turn, can be a target, with a source file and the header files that it includes as dependencies.

As you know, to compile a single source file into a single object file, the compiler may be required to read many additional header files. Each of these header files is a dependency of the object file. Thus, one header file can be a dependency of many object files. In more technical terms, targets and dependencies form a directed acyclic graph (DAG), which I will call the dependency graph.

image Note  A cyclic graph, such that A depends on B, and B depends on A, is a bad idea in the real world and often indicates a faulty or poorly considered design. For the sake of simplicity, I will ignore this error condition in this and subsequent Explorations.

Anyone who’s been around large projects knows that dependency graphs can become extremely complex. Some header files may be generated by other programs, so the header files are targets, with the generating programs as dependencies, and the generating programs are targets with their own dependencies.

IDEs and programs, such as make, analyze the dependency graph and determine which targets must be built first to ensure that every target’s dependencies are fulfilled. Thus, if A depends on B, and B depends on C, make must build C first (if it is a target), then B, and finally A. The key algorithm that make employs to find the correct order in whichto build targets is a topological sort.

Topological sorts are not included in the typical algorithms coursework of many computer science majors. Nor does the algorithm appear in many textbooks. However, any comprehensive algorithms book includes topological sort.

image Note  A good text on topological sort is Introduction to Algorithms, 3rd ed., by T. H. Cormen, C. E. Leiserson, and R. L. Rivest (MIT Press, 2009). My solution implements exercise 22.4-5.

The C++ standard library does not include a topological sort algorithm, because it is not a sequential algorithm. It operates on a graph, and the C++ library has no standard graph classes. (Boost has a graph library that includes topological sort, but to ensure everyone can use this Exploration, we will write our own topological sort function.)

We’ll begin this Exploration by writing a pseudo-make program—that is, a program that reads a makefile: a file that describes a set of targets and their dependencies, performs a topological sort to find the order for building targets, and prints the targets in proper build order. In order to simplify the program somewhat, restrict the input to a text file that declares dependencies as pairs of strings, one pair on a line of text. The first string is the name of a target, and the second string is the name of a dependency. If a target has multiple dependencies, the input file must list the target on multiple lines, one per dependency. A target can be a dependency of another target. The order of lines within the input file is not important. The goal is to write a program that will print the targets in order, so that a make-like program can build the first target first, and proceed in order, such that all targets are built before they are needed as dependencies.

To help clarify terminology, I use the term artifact for a string that can be a target, a dependency, or both. If you already know an algorithm for topological sort, go ahead and implement the program now. Otherwise, keep reading to see one implementation of topological_sort. To represent the dependency graph, use a map of sets. The map key is a dependency, and the value is the set of targets that list the key as a dependency. This seems inside out from the way you may usually think about organizing targets and dependencies, but as you can see in Listing 58-1, it makes the topological sort quite easy to implement. Because the topological_sort function is reusable, it is a template function and works with nodes instead of artifacts, targets, and dependencies.

Listing 58-1.  Topological Sort of a Directed Acyclic Graph

#ifndef TOPOLOGICAL_SORT_HPP_
#define TOPOLOGICAL_SORT_HPP_
 
#include <deque>
#include <stdexcept>
 
/// Helper function for topological_sort().
/// Finds all the nodes in the graph with no incoming edges,
/// that is, with empty values. Removes each one from the graph
/// and adds it to the set @p nodes.
/// @param[in,out] graph A map of node/set pairs
/// @param[in,out] nodes A queue of nodes with no incoming edges
template<class Graph, class Nodes>
void topsort_clean_graph(Graph& graph, Nodes& nodes)
{
  for (auto iter(graph.begin()), end(graph.end()); iter != end;)
  {
    if (iter->second.empty())
    {
      nodes.push_back(iter->first);
      graph.erase(iter++);  // advance iterator before erase invalidates it
    }
    else
      ++iter;
  }
}
 
/// Topological sort of a directed acyclic graph.
/// A graph is a map keyed by nodes, with sets of nodes as values.
/// Edges run from values to keys. The sorted list of nodes
/// is copied to an output iterator in reverse order.
/// @param graph The graph
/// @param sorted The output iterator
/// @throws std::runtime_error if the graph contains a cycle
/// @pre Graph::key_type == Graph::mapped_type::key_type
template<class Graph, class OutIterator>
void topological_sort(Graph graph, OutIterator sorted)
{
  std::deque<typename Graph::key_type> nodes{};
  // Start with the set of nodes with no incoming edges.
  topsort_clean_graph(graph, nodes);
 
  while (not nodes.empty())
  {
    // Grab the first node to process, output it to sorted,
    // and remove it from the graph.
    typename Graph::key_type n{nodes.front()};
    nodes.pop_front();
    *sorted = n;
    ++sorted;
 
    // Erase n from the graph
    for (auto& node : graph)
    {
      node.second.erase(n);
    }
    // After removing n, find any nodes that no longer
    // have any incoming edges.
    topsort_clean_graph(graph, nodes);
  }
  if (not graph.empty())
    throw std::invalid_argument("Dependency graph contains cycles");
}
 
#endif // TOPOLOGICAL_SORT_HPP_

Now that you have the topological_sort function, implement the pseudo-makeprogram to read and parse the input, build the dependency graph, call topological_sort, and print the sorted result. Keep things simple and treat artifacts (targets and dependencies) as strings. Thus, the dependency graph is a map with std::string as the key type and std::set<std::string> as the value type. Compare your solution with Listing 58-2.

Listing 58-2.  First Draft of the Pseudo-make Program

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
 
#include "topological_sort.hpp"
 
typedef std::string artifact; ///< A target, dependency, or both
 
class dependency_graph
{
public:
  typedef std::unordered_map<artifact, std::unordered_set<artifact>> graph_type;
 
  void store_dependency(artifact const& target, artifact const& dependency)
  {
    graph_[dependency].insert(target);
    graph_[target]; // ensures that target is in the graph
  }
 
  graph_type const& graph() const { return graph_; }
 
  template<class OutIter>
  void sort(OutIter sorted)
  const
  {
    topological_sort(graph_, sorted);
  }
 
private:
  graph_type graph_;
};
 
int main()
{
  dependency_graph graph{};
 
  std::string line{};
  while (std::getline(std::cin, line))
  {
    std::string target{}, dependency{};
    std::istringstream stream{line};
    if (stream >> target >> dependency)
      graph.store_dependency(target, dependency);
    else if (not target.empty())
      // Input line has a target with no dependency,
      // so report an error.
      std::cerr << "malformed input: target, " << target <<
                   ", must be followed by a dependency name ";
    // else ignore blank lines
  }
 
  try {
    // Get the artifacts. The sort stores them in reverse order.
    std::vector<artifact> sorted{};
    graph.sort(std::back_inserter(sorted));
    // Then print them in the correct order.
    std::copy(sorted.rbegin(), sorted.rend(),
              std::ostream_iterator<artifact>(std::cout, " "));
  } catch (std::runtime_error const& ex) {
    std::cerr << ex.what() << ' ';
    return EXIT_FAILURE;
  }
}

So what do DAGs and topological sorts have to do with the topic of this Exploration? I thought you’d never ask. Let’s construct a slightly more complicated problem by making it a little more realistic.

A real make program has to keep track of more information about an artifact, especially the time when it was last modified. A target also has a list of actions to perform, if any dependency is newer than the target. Thus, a class makes more sense than a string for representing an artifact. You can add to the class whatever functionality you need for your make program.

Standard C++ does not have any functions for querying a file’s modification time. For now, we’ll just sidestep the issue and make up a time for every artifact. The important task at hand is to associate additional information with an artifact. The <chrono> header declares a time type that goes by the rather unwieldy qualified name of

std::chrono::system_clock::time_point

Ignoring actions, you might define the artifact type as shown in Listing 58-3.

Listing 58-3.  New Definition of an Artifact

#ifndef ARTIFACT_HPP_
#define ARTIFACT_HPP_
 
#include <chrono>
#include <string>
 
class artifact
{
public:
  typedef  std::chrono::system_clock clock;
  artifact() : name_{}, mod_time_{clock::time_point{}} {}
  artifact(std::string const& name)
  : name_{name}, mod_time_{get_mod_time()}
  {}
 
  std::string const& name()     const { return name_; }
  clock::time_point  mod_time() const { return mod_time_; }
 
  /// Build a target.
  /// After completing the actions (not yet implemented),
  /// update the modification time.
  void build();
 
  /// Look up the modification time of the artifact.
  /// Return time_point{} if the artifact does not
  /// exist (and therefore must be built) or if the time cannot
  /// be obtained for any other reason.
  /// Also see boost::file_system::last_write_time.
  clock::time_point get_mod_time()
  {
    // Real programs should get this information from the
    // operating system. This program returns the current time.
    return clock::now();
  }
private:
  std::string name_;
  clock::time_point mod_time_;
};
 
#endif // ARTIFACT_HPP_

Now we run into a problem. In the first draft of this program, what made two strings refer to the same artifact is that the strings had the same content. The target named “program” is the same artifact as the dependency named “program,” because they are spelled the same. That scheme falls down now that an artifact is more than just a string. When you build a target and update its modification time, you want all uses of that artifact to be updated. Somehow, every use of an artifact name must be associated with a single artifact object for that name.

Got any ideas? It can be done with your current understanding of C++, but you may have to stop and think about it.

Need a hint? How about storing all artifacts in one big vector? Then make a dependency graph that contains indices into the vector, instead of artifact names. Try it. Rewrite the program in Listing 58-2 to use the new artifact.hpp header from Listing 58-3. When an artifact name is read from the input file, look up that name ina vector of all artifacts. If the artifact is new, add it to the end. Store vector indices in the dependency graph. Print the final list by looking up the numbers in the vector. Compare your solution with Listing 58-4.

Listing 58-4.  Second Draft, Adding Modification Times to Artifacts

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
 
#include "artifact.hpp"
#include "topological_sort.hpp"
 
typedef std::size_t artifact_index;
 
class dependency_graph
{
public:
  typedef std::unordered_map<artifact_index, std::unordered_set<artifact_index>> graph_type;
 
  void store_dependency(artifact_index target, artifact_index dependency)
  {
    graph_[dependency].insert(target);
    graph_[target]; // ensures that target is in the graph
  }
 
  graph_type const& graph() const { return graph_; }
 
  template<class OutIter>
  void sort(OutIter sorted)
  const
  {
    topological_sort(graph_, sorted);
  }
 
private:
  graph_type graph_;
};
 
std::vector<artifact> artifacts{};
artifact_index lookup_artifact(std::string const&name)
{
  auto iter( std::find_if(artifacts.begin(), artifacts.end(),
    [&name](artifact const&a) { return a.name() == name; })
  );
  if (iter != artifacts.end())
    return iter - artifacts.begin();
  // Artifact not found, so add it to the end.
  artifacts.push_back(artifact{name});
  return artifacts.size() - 1;
}
 
int main()
{
  dependency_graph graph{};
 
  std::string line{};
  while (std::getline(std::cin, line))
  {
    std::string target_name{}, dependency_name{};
    std::istringstream stream{line};
    if (stream >> target_name >> dependency_name)
    {
      artifact_index target{lookup_artifact(target_name)};
      artifact_index dependency{lookup_artifact(dependency_name)};
      graph.store_dependency(target, dependency);
    }
    else if (not target_name.empty())
      // Input line has a target with no dependency,
      // so report an error.
      std::cerr << "malformed input: target, " << target_name <<
                   ", must be followed by a dependency name ";
    // else ignore blank lines
  }
 
  try {
    // Get the sorted artifact indices in reverse order.
    std::vector<artifact_index> sorted{};
    graph.sort(std::back_inserter(sorted));
    // Then print the artifacts in the correct order.
    for (auto it(sorted.rbegin()), end(sorted.rend());
         it != end;
         ++it)
    {
      std::cout << artifacts.at(*it).name() << ' ';
    }
  } catch (std::runtime_error const& ex) {
    std::cerr << ex.what() << ' ';
    return EXIT_FAILURE;
  }
}

image Note  If the performance of linear lookups concerns you, congratulations for sharp thinking. Not to worry, however, because the program will continue to grow and evolve throughout this Exploration, and we will eliminate the performance issue before we finish.

Well, that works, but it’s ugly. Looking up indices is sloppy programming. Much better would be to store references to the artifact objects directly in the graph. Ah, there’s the rub. You can’t store a reference in a standard container. Containers are for storing objects—real objects. The container must be able to copy and assign the elements in the container, but it can’t do that with references. Copying a reference actually copies the object to which it refers. A reference is not a first-class entity that a program can manipulate.

Wouldn’t it be nice if C++ had a language feature that acts like a reference but lets you copy and assign the reference itself (not the referred-to object)? Let’s pretend we are inventing the C++ language, and we have to add this language feature.

The Solution

Let’s devise a new language feature to solve this programming problem. This new feature is similar to references but permits use with standard containers. Let’s call this feature a flex-ref, short for “flexible reference.”

If a and b are both flex-refs that refer to type int, the statement

a = b;

means the value of a is changed so that a now refers to the same int object to which b refers. Passing a as an argument to a function passes the value of a, so if the function assigns a new value to a, that change is local to the function (just as with any other function argument). Using a suitable operator, however, the function can obtain the int object to which a refers, and read or modify that int.

You need a way to obtain the referred-to value, so we have to invent a new operator. Look at iterators for inspiration: given an iterator, the unary * operator returns the item to which the iterator refers. Let’s use the same operator for flex-refs. Thus, the following prints the int value to which a refers:

std::cout << *a;

In the spirit of the * operator, declare a flex-ref by using * in the same manner that you use & for references.

int *a, *b;

Use the same syntax when declaring a container. For example, declare a vector of flex-refs that refer to type int.

std::vector<int*> vec;
vec.push_back(a);
b = vec.front();

All that’s left is to provide a way to make a flex-ref refer to an object. For that, let’s turn to ordinary references for inspiration and use the & operator. Supposing that c is of type int, the following makes a refer to c:

a = &c;

As you’ve guessed by now, flex-refs are pointers. The variables a and b are called “pointers to int.” A pointer is an honest-to-goodness lvalue. It occupies memory. The values that are stored in that memory are addresses of other lvalues. You can freely change the value stored in that memory, which has the effect of making the pointer refer to a different object.

A pointer can point to a const object, or it can be a const pointer, or both. The following shows pointer toconst int:

int const* p;
p = &c;

Define a const pointer—that is, a pointer that is itself const and therefore cannot be the target of an assignment, but the dereferenced object can be the target.

int * const q{&c};
*q = 42;

Like any const object, you must supply an initializer, and you cannot modify the pointer. However, you can modify the object to which the pointer points.

You can define a reference to a pointer, just as you can define a reference to anything (except another reference).

int const*&r{p};

Read this declaration the way you would read any other declaration: start by finding the declarator, r. Then read the declaration from the inside, working your way outward. To the left, see &, telling you that r is a reference. To the right is the initializer, {p}; r is a reference to p (r is another name for the object p). Continuing to the left, you see *, so r is a reference to a pointer. Then you see const, so you know r is a reference to a pointer to a const something. Finally, int tells you that r is a reference to a pointer to const int. Thus, the initializer is valid, because its type is pointer to int.

What about the other way around? Can you define a pointer to a reference? The short answer is that you can’t. A pointer to a reference makes as little sense as a reference to a reference. References and pointers must refer or point to a real object. When you use the & operator on a reference, you get the address of the referenced object.

You can define a pointer to a pointer. Or a pointer to a pointer to a pointer to a pointer. . . Just keep track of the exact type of your pointer. The compiler ensures that you assign only expressions of the correct type, as shown in the following:

int x;
int *y;
int **z;
y = &x;
z = &y;

Try z = &x and y = z. What happens?

_____________________________________________________________

Because x has type int, &x has type int*; y has type int*, too, so you can assign &x to y but not to z, which has type int**. The types must match, so you can’t assign z to y either.

It took me long enough to get to the point, but now you can see how pointers help solve the problem of writing the dependency graph. Before we dive into the code, however, let’s take a moment to clarify some terminology.

Addresses vs. Pointers

Programmers are sticklers for details. The compilers and other tools we use daily force us to be. So let’s be absolutely clear about addresses and pointers.

An address is a memory location. In C++ parlance, it is an rvalue, so you cannot modify or assign to an address. When a program takes the address of an object (with the & operator), the result is a constant for the lifetime of that object. Like every other rvalue, an address in C++ has a type, which must be a pointer type.

A pointer type is more properly called an address type, because the range of values represented by the type are addresses. Nonetheless, the term pointer type is more common, because a pointer object has a pointer type.

A pointer type can denote multiple levels of indirection—it can denote a pointer to a pointer, or a pointer to a pointer to a pointer, etc. You must declare each level of pointer indirection with an asterisk. In other words, int* is the type “pointer to int” and int** is “pointer to pointer to int.”

A pointer is an lvalue that has a pointer type. A pointer object, like any object, has a location in memory in which the program can store a value. The value must have a type that is compatible with the pointer’s type; the value must be an address of the correct type.

Dependency Graphs

Now let’s get back to the dependency graph. The graph can store pointers to artifacts. Each external file corresponds to a single artifact object in the program. That artifact can have many nodes in the graph pointing to it. If you update that artifact, all nodes that point to the artifact see the update. Thus, when a build rule updates an artifact, the file modification time may change. All nodes for that artifact in the graph immediately see the new time, because they all point to a single object.

All that’s left to figure out is where these artifacts reside. For the sake of simplicity, I recommend a map, keyed by artifact name. The mapped values are artifact objects (not pointers). Take the address of an artifact in the map to obtain pointers to store in the graph. Go ahead; don’t wait for me. Using the topological_sort.hpp andartifact.hpp headers, rewrite Listing 58-4 to store artifact objects in a map and artifact pointers in the graph. Compare your solution with Listing 58-5.

Listing 58-5.  Storing Pointers in the Dependency Graph

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
 
#include "artifact.hpp"
#include "topological_sort.hpp"
 
class dependency_graph
{
public:
  typedef std::unordered_map<artifact*, std::unordered_set<artifact*>> graph_type;
 
  void store_dependency(artifact* target, artifact* dependency)
  {
    graph_[dependency].insert(target);
    graph_[target]; // ensures that target is in the graph
  }
 
  graph_type const& graph() const { return graph_; }
 
  template<class OutIter>
  void sort(OutIter sorted)
  const
  {
    topological_sort(graph_, sorted);
  }
 
private:
  graph_type graph_;
};
 
std::map<std::string, artifact> artifacts{};
 
artifact* lookup_artifact(std::string const&name)
{
  auto a( artifacts.find(name) );
  if (a == artifacts.end())
    artifacts.emplace(name, artifact{name});
  return&artifacts[name];
}
 
int main()
{
  dependency_graph graph{};
 
  std::string line{};
  while (std::getline(std::cin, line))
  {
    std::string target_name{}, dependency_name{};
    std::istringstream stream{line};
    if (stream >> target_name >> dependency_name)
    {
      artifact* target{lookup_artifact(target_name)};
      artifact* dependency{lookup_artifact(dependency_name)};
      graph.store_dependency(target, dependency);
    }
    else if (not target_name.empty())
      // Input line has a target with no dependency, so report an error.
      std::cerr << "malformed input: target, " << target_name <<
                   ", must be followed by a dependency name ";
    // else ignore blank lines
  }
 
  try {
    // Get the sorted artifacts in reverse order.
    std::vector<artifact*> sorted{};
    graph.sort(std::back_inserter(sorted));
    // Then print the artifacts in the correct order.
    for (auto it(sorted.rbegin()), end(sorted.rend());
         it != end;
         ++it)
    {
      std::cout << (*it)->name() << ' ';
    }
  } catch (std::runtime_error const& ex) {
    std::cerr << ex.what() << ' ';
    return EXIT_FAILURE;
  }
}

Dereferencing the iterator yields a pointer. To call the name() member function from a pointer, use the arrow (->) operator. The trick is that -> has higher precedence than *, so you must put *it in parentheses in order to dereference the iterator first. Overall, the program requires minimal changes, and the changes are mostly simplifications. As the program grows more complicated (as real programs inevitably do), the simplicity and elegance of pointers become more and more evident.

Standard containers are extremely helpful, but sometimes a program needs greater control over its objects.It must create and destroy objects on its own schedule, not when functions start and end or when control entersor leaves a block. The next Exploration tackles this problem by introducing dynamic memory.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset