EXPLORATION 59

image

Dynamic Memory

Declaring pointers is all well and good, but real programs have to do more. The next step is to learn how to create new objects on the fly, at runtime. Your program takes full control over the lifetime of these objects, destroying the objects only when the program is done using them. This Exploration details how to allocate and free memory dynamically. It also continues to develop artifact and related classes from Exploration 58.

I want to warn you, however, not to run off immediately and start using dynamic memory in your programs. We still have several more Explorations to go, each one building on its predecessors. You need the full picture, which will include safer ways to manage pointers and dynamic memory.

Allocating Memory

A new expression allocates memory for a new object, calls a constructor to initialize that object, and returns the address of the newly allocated and constructed object. The syntax is the new keyword, followed by the base type, followed by an optional initializer.

int *pointer{ new int{42}};
std::cout << *pointer << ' '; // prints 42
*pointer = 10;                 // changes the int to 10

The new expression returns the address of an lvalue. The type of the lvalue is the type you provide after the new keyword. The normal rules for initialization apply. If the initializer is a set of empty curly braces, the newly allocated object is zero-initialized (see Exploration 32), meaning all members are initialized to known, zero values. If you omit the curly braces entirely, the new object is default-initialized, which means objects of built-in type are left uninitialized. This is usually a bad thing, so I recommend providing an initializer with all new expressions, such as shown in the following:

int *good{new int{}}; // *good == 0
int *bad{new int};    // undefined behavior if you read *bad

As with any other initializer, if you are initializing a class-type object, and the constructor takes multiple arguments, separate the arguments with commas.

rational* piptr{new rational{355, 113}};

If the C++ environment runs out of memory and cannot fulfill a new request, it throws the std::bad_alloc exception (declared in <new>).

Once allocated and initialized, a dynamically allocated object lives until you get rid of it with a delete expression (as covered in the next section).

Freeing Memory

When your program no longer requires an object that it had allocated with a new expression, it must free that object with a delete expression. The syntax is the delete keyword, followed by a pointer to the object you want to delete.

int *tmp{new int{42}};
// use *tmp
delete tmp;

After you delete an object, all pointers to it become invalid. It is your responsibility to ensure that you never dereference, copy, or otherwise use these pointers. The only thing you can do with a pointer variable after you delete its object is to assign a new value to the variable.

Dynamic objects are more difficult to work with than other objects, because they impose a greater burden on you, the programmer, to manage their lifetimes. Almost any mistake results in undefined behavior. A few of the most common mistakes are

  • Using a pointer after a delete
  • Invoking delete more than once on the same object
  • Failing to delete an object before the program terminates

Most programmers are familiar with segmentation faults, access violations, and the like. These are the most benign of the actual behavior you might encounter if you fail to follow these rules.

It sure would be nice to be able to test whether a pointer is valid before dereferencing it. Too bad you can’t. The only way to ensure that your program never dereferences an invalid pointer or tries to otherwise misuse a pointer is to be very, very careful when you write programs that use pointers and dynamic memory. That’s why I waited until late in the book to introduce these topics.

Fortunately, C++ offers some tools to help you: one is a special pointer value that you can assign to any pointer variable to represent a “pointer to nothing.” You cannot dereference a pointer to nothing, but you can copy and assign these pointers, and most important, compare a pointer variable with the “pointer to nothing” value. In other words, when your program deletes an object, it should assign a “pointer to nothing” value to the pointer variable. By ensuring that every pointer variable stores a valid pointer or a “pointer to nothing,” you can safely test whether a pointer is valid before dereferencing it.

Pointer to Nothing

A “pointer to nothing” is called a null pointer, and you write a null pointer with the nullptr keyword. Use nullptr to initialize a pointer variable, as the source of an assignment or as a function argument, as shown in the following:

int* ptr{nullptr};
ptr = nullptr;

If you initialize a pointer with empty curly braces, you also get a null pointer. That is, the following declarations are equivalent:

int* ptr1{};
int* ptr2{nullptr};

I prefer being explicit, so I always use nullptr.

OLD-FASHIONED Null Pointers

In C++ 03, a null pointer was written with an integer literal 0. Before that, C programmers used NULL for null pointers. Unfortunately, a few errant C and C++ programmers also used NULL to mean zero in non-pointer expressions. Some C++ programmers continued to use NULL instead of 0. As you might imagine, this led to all kinds of problems, leading to the invention of the nullptr keyword.

All new code should use nullptr. The addition of the nullptr keyword is such an important leap forward in clarity that I go out of my way to fix old code to use nullptr. Ordinarily, I don’t take existing, working code and go in to add C++ 11 features, but I make an exception for nullptr.

The bits actually stored in a pointer variable are implementation-defined. In particular, you cannot make any assumption about the bits used to represent nullptr. But whatever those bits are, C++ will use the correct address for your environment when you zero-initialize a pointer or use nullptr.

You have one other guarantee with null pointers: whatever value the C++ environment uses for them, no real object will ever have that value as its address. Thus, you can assign a null pointer to a variable after you delete it, as a way to mark the pointer object as no longer pointing to anything.

int* ptr{new int{42}};
delete ptr;
ptr = nullptr;

A good programming practice is to ensure that no pointer variable ever retains an invalid value. Assigning nullptr to a pointer variable is one way to accomplish this.

A common idiom is to take advantage of short-circuiting (Exploration 12) to test for a null pointer, and if the pointer is not null, use it.

if (ptr != nullptr and ptr->needs_work( ))
    do_work(ptr);

A frequent question among newcomers to C++ is why the compiler does not generate code that automatically assigns a null pointer as part of the actions of the delete expression. There are two key reasons.

  • The delete expression takes an rvalue as an argument, not necessarily a modifiable pointer. Thus, it may not have anything to modify.
  • A program may have multiple pointers that all store the same address. Deleting the object invalidates all of these pointers. The delete expression cannot modify all of these pointers, because it doesn’t know about them. It knows only about the one address that is the argument to the delete expression. Thus, modifying the argument to delete does not solve this problem. You can minimize the extent of this problem by not copying pointers.

C++ has some of its own uses for null pointers. You can supply a null pointer value to the delete expression, and it safely ignores the delete request. If you wish, you can ask that the new expression return a null pointer instead of throwing an exception when it cannot allocate enough memory for the new object. Just add (std::nothrow) after the new keyword and be sure to check the pointer that new returns. The parentheses are required, and std::nothrow is declared in <new>.

int *pointer{ new (std::nothrow) int{42}};
if (pointer != nullptr)
  // do something with pointer...

Most programs don’t require #include <new>. You need that header only if you use std::nothrow or catch std::bad_alloc.

If you define a pointer variable without initializing it, the variable’s initial value is garbage. This is bad, and you are courting disaster—don’t do it. If you don’t have a valid address to use as the pointer’s initial value, use nullptr.

Implementing Standard Containers

Have you ever wondered how the standard containers actually work? If, for instance, I were to ask you to implement std::map or std::set, could you do it? A full, high-quality implementation of any standard container is surprisingly difficult, but it isn’t hard to grasp the basic principles.

The standard mandates logarithmic complexity for the associative containers. That is, lookups and insertion must have logarithmic performance, which pretty much forces a tree implementation. Most standard C++ libraries use red-black trees. A quick trip to the Internet will provide the algorithms—even working code—to implement red-black trees. The details of balancing the trees obscure the interesting use of pointers, so let’s pick a simpler container: list.

The list class template implements a common doubly linked list. Start with a simplified definition of the list class template itself, as shown in Listing 59-1.

Listing 59-1.  Defining the list Class Template

template<class T>
class list
{
public:
   list( )
  : head_{nullptr}, tail_{nullptr}, size_{0}
  {}
  ~list( ) { clear( ); }
 
  void clear( );               ///< Erase all nodes in the list. Reset size to 0.
  void push_back(T const& x); ///< Add to tail.
  void pop_back( );            ///< Erase from tail.
  // Other useful functions omitted for brevity...
private:
  class node
  {
  public:
    node(T const& key)
    : next_{nullptr}, prev_{nullptr}, value_{key}
    {}
    node* next_;
    node* prev_;
    T     value_;
  };
 
  node* head_;       ///< Start of the list
  node* tail_;       ///< End of the list
  std::size_t size_; ///< Number of nodes in the list
};

A list has a number of insert and erase functions. For the sake of simplicity, Listing 59-2 implements only push_back and pop_back.

Listing 59-2.  Implementing list::push_back and list::pop_back

template<class T>
void list<T>::push_back(T const& x)
{
    node* n{new node{x}};
    if (tail_ == nullptr)
        head_ = tail_ = n;
    else
    {
        n->prev_ = tail_;
        tail_->next = n;
    }
    ++size_;
}
 
template<class T>
void list<T>::pop_back( )
{
    node* n{tail_};
    if (head_ == tail_)
        head_ = tail_ = nullptr;
    else
    {
        tail_ = tail_->prev_;
        tail_->next_ = nullptr;
    }
    --size_;
    delete n;
}

Notice how pop_back removes the node from the list, ensures that the list is in a valid state, and then deletes the memory. Also notice how pop_back does not have to set n to a null pointer. Instead, the function simply returns; it has no opportunity to refer to the address of the deleted memory. These two techniques are ways that help ensure that your program handles dynamic memory correctly.

Adding Variables

Now return to the dependencies example that you started in Exploration 58. Let’s add a new feature: variables. If an input line contains only one string, and the string contains an equal sign, it is a variable assignment. The variable name is to the left of the equal sign; the value is to the right. In this oversimplified example, no spaces are allowed around the equal sign or in the variable’s value.

An artifact can contain a variable reference, which is a dollar sign, followed by the variable name in parentheses. The variable’s value replaces the reference in its containing string. If a variable is not defined, automatically define it as an empty string.

NUM=1
target$(NUM) source$(NUM)
NUM=2
target$(NUM) source$(NUM)
source1 hdrX
source2 hdrY
all target1
all target2

The target all depends on target1 and target2. In turn, target1 depends on source1, and target2 depends on source2. Finally, source1 depends on hdrX, and source2 depends on hdrY.

Add an expand function that takes a string, expands variables, and returns the result. What if the expansion of a variable contains further variable references? I suggest expanding all variables and re-expanding the string until no more variables remain to be expanded. The std::string class has a number of helpful member functions: the find( ) function searches for the first occurrence of a substring or a character and returns the index of the substring or the constant std::string::npos to mean “not found.” The index type is std::string::size_type. Pass an optional second argument to specify the position at which the search should begin.

std::string::size_type pos{str.find('$', 4)}; // start searching at index 4

The substr( ) member function takes two arguments, a starting position and a length, and returns a substring. The second argument is optional; omit it to mean “the rest of the string.” The replace( ) function has several forms. It replaces a substring with a new string. Pass the starting index and length of the substring to replace, followed by the replacement string.

str.replace(start, length, replacement);

For now, use a global map to store variables. Call it variables. Listing 59-3 presents my implementation of the expand function.

Listing 59-3.  Expanding Variables

typedef std::map<std::string, std::string> variable_map;
variable_map variables{};
 
std::string expand(std::string str)
{
   std::string::size_type start{0}; // start searching here
   while (true)
   {
      // Find a dollar sign.
      std::string::size_type pos{str.find('$', start)};
      if (pos == std::string::npos)
         // No more dollar signs.
         return str;
      if (pos == str.size( ) - 1 or str[pos + 1] != '(')
         // Not a variable reference. Skip the dollar sign,
         // and keep searching.
         start = pos + 1;
      else
      {
         std::string::size_type end{str.find(')', pos)};
         if (end == std::string::npos)
            // No closing parenthesis.
            return str;
 
         // Get the variable name.
         std::string varname{str.substr(pos + 2, end - pos - 2)};
         // Replace the entire variable reference.
         str.replace(pos, end - pos + 1, variables[varname]);
         // Scan the replacement text for more variables.
         start = pos;
      }
   }
}

Now modify the input function to recognize a variable definition, parse the definition to extract the variable name and value, and store the variable and its value in the variables map. Listing 59-4 illustrates how I rewrote the main function to parse variable definitions.

Listing 59-4.  Parsing Variable Definitions

void parse_graph(std::istream& in, dependency_graph& graph)
{
  std::string line{};
  while (std::getline(in, line))
  {
    std::string target_name{}, dependency_name{};
    std::istringstream stream{line};
    if (stream >> target_name >> dependency_name)
    {
      artifact* target{lookup_artifact(expand(target_name))};
      artifact* dependency{lookup_artifact(expand(dependency_name))};
      graph.store_dependency(target, dependency);
    }
    else if (not target_name.empty( ))
    {
      std::string::size_type equal{target_name.find('=')};
      if (equal == std::string::npos)
        // Input line has a target with no dependency,
        // so report an error.
        std::cerr << "malformed input: target, " << target_name <<
                     ", must be followed by a dependency name ";
      else
      {
        std::string variable_name{target_name.substr(0, equal)};
        std::string variable_value{target_name.substr(equal+1)};
        variables[variable_name] = variable_value;
      }
    }
    // else ignore blank lines
  }
}
 
int main( )
{
  dependency_graph graph{};
 
  parse_graph(std::cin, graph);
 
  try {
    // Get the sorted artifacts in reverse order.
    std::vector<artifact*> sorted;
    graph.sort(std::back_inserter(sorted));
 
    // Then print the artifacts in the correct order.
    for (auto it(sorted.rbegin( )), end(sorted.rend( ));
         it != end;
         ++it)
    {
      std::cout << (*it)->name( ) << ' ';
    }
  } catch (std::runtime_error const& ex) {
    std::cerr << ex.what( ) << ' ';
    return EXIT_FAILURE;
  }
}

This seemed like a good time to factor the input to its own function, parse_graph. So far, the modifications have not used dynamic memory, at least not explicitly. (Do you think the std::string class uses dynamic memory in its implementation?) The next step is to permit per-target variables. That is, if the input starts with a target name, and instead of a dependency name, the second string is a variable definition, that definition applies only to that target.

NUM=1
target$(NUM) SRC=1
target$(NUM) source$(SRC)
target2      source$(NUM)
target2      source$(SRC)

The NUM variable is global, so target2 depends on source1. The SRC variable applies only to target1, so target1 depends on source1. On the other hand, target2 depends on source, not source2, because target2 does not have a SRC variable, and unknown variables expand to an empty string.

One implementation is to add a map object to every artifact. Most artifacts do not have variables, however, so that can become wasteful. An alternative implementation uses dynamic memory and allocates a map only if a target has at least one variable. To look up a variable for a target, look only in the global map. To look up a variable for a dependency, first check the target’s map then check the global map.

As the program evolves, the difference between a target and a dependency grows. This is to be expected, because in real life, they are quite different. Targets get actions, for example, so you can build them. You may well argue that now is a good time for refactoring, to create two derived classes: target and dependency. Only the target class could have a map for variables. I grant you extra credit if you decide to undertake this refactoring now. To keep the solution simple, however, I will make the fewest modifications possible as we go along.

Start with the new artifact class. In addition to the new map, add an expand member function, which hides the details of the two-level lookup. Compare your solution with mine, which is shown in Listing 59-5.

Listing 59-5.  Adding Variable Storage and Lookup to the artifact Class

#ifndef ARTIFACT_HPP_
#define ARTIFACT_HPP_
 
#include <chrono>
#include <string>
 
#include "variables.hpp"
 
class artifact
{
public:
  typedef  std::chrono::system_clock clock;
  artifact( )
  : name_{}, mod_time_{clock::time_point{}}, variables_{nullptr}
  {}
  artifact(std::string const& name)
  : name_{name}, mod_time_{get_mod_time( )}, variables_{nullptr}
  {}
  ~artifact( )
  {
    delete variables_;
  }
 
  std::string const& name( )     const { return name_; }
  clock::time_point  mod_time( ) const { return mod_time_; }
  std::string        expand(std::string str) const
  {
    return ::expand(str, variables_);
  }
 
  /// Build a target.
  /// After completing the actions (not yet implemented),
  /// update the modification time.
  void build( );
 
  /// Look up the modification time of the artifact.
  /// Return time_point{} if the artifact does not
  /// exist (and therefore must be built) or if the time cannot
  /// be obtained for any other reason.
  clock::time_point get_mod_time( )
  {
    // Real programs should get this information from the
    // operating system. This program returns the current time.
    return clock::now( );
  }
 
  /// Store a per-target variable.
  void store_variable(std::string const& name, std::string const& value)
  {
    if (variables_ == nullptr)
      variables_ = new variable_map;
    (*variables_)[name] = value;
  }
private:
  std::string name_;
  clock::time_point mod_time_;
  variable_map* variables_;
};
 
#endif // ARTIFACT_HPP_

Note that the new destructor does not have to test whether the variables_ variable is set to anything. The value will be either a null pointer or the address of a map. The delete expression handles both and does the right thing: nothing for a null pointer or deletes the memory for an address. Thus, the destructor is easy to write and understand.

I hid a number of details in the new header, variables.hpp, which I present in Listing 59-6.

Listing 59-6.  The variables.hpp File

#ifndef VARIABLES_HPP_
#define VARIABLES_HPP_
 
#include <map>
#include <string>
 
typedef std::map<std::string, std::string> variable_map;
extern variable_map global_variables;
 
/// Expand variables in a string using a local map
/// and the global map.
/// @param str The string to expand
/// @param local_variables The optional, local map; can be null
/// @return The expanded string
std::string expand(std::string str, variable_map const* local_variables);
 
#endif // VARIABLES_HPP_

The implementation of the new expand function is in variables.cpp, shown in Listing 59-7.

Listing 59-7.  The variables.cpp File Implements the expand Function

#include "variables.hpp"
 
variable_map global_variables{};
 
// Get a variable's value. Try the local variables first; if not found
// try the global variables. If still not found, define the name with
// an empty string and return an empty string. Subsequent lookups of
// the same name will find the empty string. Exercise for reader:
// print a message the first time the undefined variable's name
// is used.
std::string get_value(std::string const& name, variable_map const* local_variables)
{
   if (local_variables != nullptr)
   {
      variable_map::const_iterator iter{local_variables->find(name)};
      if (iter != local_variables->end( ))
         return iter->second;
   }
   return global_variables[name];
}
 
std::string expand(std::string str, variable_map const* local_variables)
{
   std::string::size_type start{0}; // start searching here
   while (true)
   {
      // Find a dollar sign.
      std::string::size_type pos{str.find('$', start)};
      if (pos == std::string::npos)
         // No more dollar signs.
         return str;
      if (pos == str.size( ) - 1 or str[pos + 1] != '(')
         // Not a variable reference.
         // Skip the dollar sign, and keep searching.
         start = pos + 1;
      else
      {
         std::string::size_type end{str.find(')', pos)};
         if (end == std::string::npos)
            // No closing parenthesis.
            return str;
 
         // Get the variable name.
         std::string varname{str.substr(pos + 2, end - pos - 2)};
         // Replace the entire variable reference.
         std::string value{get_value(varname, local_variables)};
         str.replace(pos, end - pos + 1, value);
         // Scan the replacement text for more variables.
         start = pos;
      }
   }
}

The only task that remains is to update the parser. Modify theparse_graph function to parse target-specific variables. Compare your solution with mine in Listing 59-8.

Listing 59-8.  Adding per-Target Variables to parse_graph

void parse_graph(std::istream& in, dependency_graph& graph)
{
  std::string line{};
  while (std::getline(in, line))
  {
    std::string target_name{}, dependency_name{};
    std::istringstream stream{line};
    if (stream >> target_name >> dependency_name)
    {
      artifact* target{lookup_artifact(expand(target_name, 0))};
      std::string::size_type equal{dependency_name.find('=')};
      if (equal == std::string::npos)
      {
        // It's a dependency specification
        artifact* dependency{lookup_artifact(target->expand(dependency_name))};
        graph.store_dependency(target, dependency);
      }
      else
        // It's a target-specific variable
        target->store_variable(dependency_name.substr(0, equal-1),
                               dependency_name.substr(equal+1));
    }
    else if (not target_name.empty( ))
    {
      std::string::size_type equal{target_name.find('=')};
      if (equal == std::string::npos)
        // Input line has a target with no dependency,
        // so report an error.
        std::cerr << "malformed input: target, " << target_name <<
                     ", must be followed by a dependency name ";
      else
        global_variables[target_name.substr(0, equal)] =
                                          target_name.substr(equal+1);
    }
    // else ignore blank lines
  }
}

Special Member Functions

The program seems to function, but it still needs work. As written, it triggers undefined behavior if the artifact class is copied. To understand what’s going on, consider the program in Listing 59-9.

Listing 59-9.  Simple Wrapper for Dynamic Memory

#include <iostream>
 
class wrapper
{
public:
   wrapper(int x) : p_{new int{x}} {}
   ~wrapper( )                      { delete p_; }
   int value( ) const               { return *p_; }
private:
   int* p_;
};
 
void print(wrapper w)
{
   std::cout << w.value( ) << ' ';
}
 
wrapper increment(wrapper w)
{
   return wrapper(w.value( ) + 1);
}
 
int main( )
{
  wrapper w{42};
  print(increment(w));
}

Predict the output from this program.

_____________________________________________________________

The behavior is undefined, so I cannot predict exactly how it will work on your system. Most likely, the program will fail, with some kind of memory violation. It is possible that the program will run without any observable sign of failure. It’s also possible that it will print a seemingly random value instead of 43.

So what’s going on? The constructor allocates an int; the destructor deletes an int; it all seems correct. The problem is the copy constructor.

“What copy constructor?” you ask.

“The copy constructor that the compiler creates for you,” I answer.

“Oh, that copy constructor,” you reply comprehendingly.

The compiler implicitly writes a copy constructor, assignment operator, destructor, and more for you, performing member-wise copying, assignment, and destruction (and more). In this case, the copy constructor dutifully copies the p_ data member. Thus, the original and the copy both contain identical pointers. The first one to be destroyed deletes the memory, leaving the other with a dangling pointer—an invalid pointer. When that other object is destroyed, it tries to delete p_ again, but it had already been deleted. Deleting the same address more than once is undefined behavior (unless a new expression has subsequently returned that same address).

One solution to this kind of problem is to disallow copying, as you learned in Exploration 33:

class nocopy {
public:
   nocopy(int x) : p_{new int{x}} {}
   ~nocopy( )                      { delete p_; }
   nocopy(nocopy const&) = delete;
   void operator=(nocopy const&) = delete;
private:
   int* p_;
};

On the other hand, this means you can’t copy or assign the objects, which is kind of limiting. Another solution is to make a deep copy—that is, allocate and copy the dynamic memory. Listing 59-10 shows this solution applied to Listing 59-9.

Listing 59-10.  Making a Deep Copy

#include <iostream>
 
class wrapper
{
public:
   wrapper(int x)            : p_{new int{x}}         {}
   wrapper(wrapper const& w) : p_{new int{w.value( )}} {}
   ~wrapper( )                        { delete p_; }
   wrapper& operator=(wrapper w)
   {
      swap(w);
      return *this;
   }
   void swap(wrapper& w)
   {
      int* tmp{w.p_};
      w.p_ = p_;
      p_ = tmp;
   }
   int value( ) const                 { return *p_; }
private:
   int* p_;
};
 
void print(wrapper w)
{
   std::cout << w.value( ) << ' ';
}
 
wrapper increment(wrapper w)
{
   return wrapper{w.value( ) + 1};
}
 
int main( )
{
  wrapper w{42};
  print(increment(w));
}

The implementation of the assignment operator is interesting, isn’t it? You have a choice of many ways to implement this operator. The swap idiom has the advantage of simplicity. The standard library defines swap functions for the standard containers, and any decent library implements swap as a lightweight, fast function (if the container permits it). Typically, these swap functions work similar to wrapper::swap: they copy a few pointers. The trick of the assignment operator is that it takes its argument by value, not by reference. The compiler uses the copy constructor to make a copy of the source of the assignment; the swap function then swaps the current p_ value with the copy. The copy will be freed after the assignment operator returns, thereby cleaning up the original pointer, leaving the object with a deep copy of the assignment source, which is exactly what you want to have happen. It is not an optimal implementation, but it’s easy and adequate, until you learn more in coming Explorations.

Deep copies are fine in some cases, but in others, they are wasteful. The pseudo-make program should never have to copy artifact objects. Each artifact object in the program represents a single artifact in the real world, and the real world doesn’t magically make copies willy-nilly, so the program shouldn’t either. Instead, you can move real objects from place to place. So why not move artifact objects in the program? That is the topic for the next Exploration.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset