EXPLORATION 60

Moving Data with Rvalue References

In Exploration 39, I introduced std::move() without explaining what it really does or how it works. Somehow it moves data, such as a string, from one variable to another, instead of copying the string contents, but you must be wondering how it works its wonders. The function itself is surprisingly simple. The concepts that underlie that simple implementation are more complicated, which is why I have waited until now to introduce the complexities and subtleties involved.

Temporary Objects

Historically, temporary objects could be the source of much unnecessary copying of data. For example, suppose you have a vector of strings, and you want to append a string to the end of the vector. A vector allocates an array in which to store its data (details forthcoming in Exploration 62). In C++ 03 and in a number of other languages, the vector class would ensure it has enough room and then copy the string into the vector. If the vector does not have enough room for another object, it must make room by allocating a new, larger array. It then copies all the strings out of the old array into the new.

One way to reduce the copying is to store only pointers in the vector. Copying a pointer is fast. But that requires a dynamic memory allocation for every object. Creating objects on the stack is much faster than using dynamic memory.

The goal, therefore, is to retain the benefit of creating objects on the stack, while eliminating unnecessary copying of data. When a vector has to grow, for example, the strings are moved into the new array without making any copies of the string content. When adding a string to the end of a vector, the string contents are moved into the vector without copying. Thus, it doesn’t matter how large the strings are. Moving a big string is just as fast as moving a small string.

The trick is for the compiler to know when it is safe to move data and when it must make a copy. Ordinary assignment, for example, requires making a copy of the assignment source, so the target of the assignment ends up with an accurate copy of the source. Passing an argument to a function also requires making a copy of the argument, unless the parameter is declared as a reference type.

But other objects are temporary, and the compiler knows they are temporary. The compiler knows that it doesn’t have to copy data out of the temporary object. The temporary is about to be destroyed; it doesn’t need to retain its data, so the data can be moved to the destination without copying.

To help you visualize what happens when strings are copied, create a new class that wraps std::string and shows you when its copy constructor and assignment operator are called. Then create a trivial program that reads strings from std::cin and adds them to a vector. Write the program and compare your program with mine in Listing 60-1.

Listing 60-1.  Exposing How Strings Are Copied

#include <iostream>
#include <string>
#include <vector>
 
class mystring : public std::string
{
public:
   mystring() : std::string() { std::cout << "mystring()\n"; }
   mystring(mystring const& copy) : std::string(copy) {
      std::cout << "mystring copy(\"" << *this << "\")\n";
   }
};
 
std::vector<mystring> read_data()
{
   std::vector<mystring> strings;
   mystring line;
   while (std::getline(std::cin, line))
      strings.push_back(line);
   return strings;
}
 
int main()
{
   std::vector<mystring> strings;
   strings = read_data();
}

(I changed the initialization style back to C++ 03, so you can try compiling and running the program as both C++ 03 and C++ 11. If you can’t put your compiler into old-fashioned mode, that’s okay. This Exploration is about C++ 11, and the historical excursion is just for fun.)

Try running the program with a few lines of input. How many times is each string copied? ______. The program copies line into the vector in push_back(). When read_data() returns the strings variable to the caller, the compiler knows that it doesn’t have to copy the vector. It can move it instead. Thus, you don’t get any extra copies there (unless you tried the C++ 03 experiment, and you saw that the entire vector is copied).

How can we reduce the number of times the strings are copied? The line variable stores temporary data. The program has no reason to retain the value in line after calling push_back(). So we know it is safe to move the string contents out of line and into the vector. Call std::move() to tell the compiler that it can move the string into the vector. You also have to add a move constructor to mystring. See the new program in Listing 60-2. Now how many times is each string copied? ______.

Listing 60-2.  Moving Strings Instead of Copying Them

#include <iostream>
#include <string>
#include <utility>
#include <vector>
 
class mystring : public std::string
{
public:
   mystring() : std::string() { std::cout << "mystring()\n"; }
   mystring(mystring const& copy) : std::string(copy) {
      std::cout << "mystring copy(\"" << *this << "\")\n";
   }
   mystring(mystring&& move) noexcept
   : std::string(std::move(move)) {
      std::cout << "mystring move(\"" << *this << "\")\n";
   }
};
 
std::vector<mystring> read_data()
{
   std::vector<mystring> strings;
   mystring line;
   while (std::getline(std::cin, line))
      strings.push_back(std::move(line));
   return strings;
}
 
int main()
{
   std::vector<mystring> strings;
   strings = read_data();
}

The new constructor declares its parameter with a double ampersand (&&). It looks sort of like a reference. Notice that the parameter is not const. That’s because moving data from an object must necessarily modify that object. Recall from Exploration 45 that the noexcept specifier tells the compiler that the constructor cannot throw an exception. Finally, notice that the mystring constructor calls std::move() to move its parameter into the std::string constructor. You must call std::move() for any named object, even if that object is a special && reference.

The exact output depends on your library’s implementation, but most start with a small amount of memory for the vector and grow slowly at first, to avoid wasting memory. Thus, adding just a few strings should show the vector reallocating its array. Table 60-1 shows the output from Listing 60-1, compiled in C++ 03 and C++ 11, as well as Listing 60-2, when I supply three lines of input.

Table 60-1. Comparing Output of Listing 60-1 and Listing 60-2

Listing 60-1 C++ 03      Listing 60-1 C++ 11      Listing 60-2 C++ 11
mystring()               mystring()               mystring()
mystring copy("one")     mystring copy("one")     mystring move("one")
mystring copy("two")     mystring copy("two")     mystring move("two")
mystring copy("one")     mystring copy("one")     mystring move("one")
mystring copy("three")   mystring copy("three")   mystring move("three")
mystring copy("one")     mystring copy("two")     mystring move("two")
mystring copy("two")     mystring copy("one")     mystring move("one")
mystring copy("one")
mystring copy("two")
mystring copy("three")

The rest of this Exploration explains how C++ implements this move functionality.

Lvalues, Rvalues, and More

Recall from Exploration 21 that an expression falls into one of two categories: lvalue or rvalue. Informally, lvalues can appear on the left-hand side of an assignment, and rvalues appear on the right-hand side. Passing arguments to functions is similar to assignment: the function parameter takes on the role of lvalue, and the argument is an rvalue.

One key way to tell the difference between an lvalue and an rvalue is that you can take the address of an lvalue (using operator &). The compiler does not let you take the address of an rvalue, which makes sense. What is the address of 42?

The compiler automatically converts an lvalue to an rvalue whenever it has to, say, when passing an lvalue as an argument to a function or using an lvalue as the right-hand side of an assignment. The only situation in which the compiler turns an rvalue into an lvalue is when the lvalue’s type is reference to const. For example, a function that declares its parameter as std::string const& can take an rvalue std::string as an argument, and the compiler turns that rvalue into an lvalue. But except for that one case, you cannot turn an rvalue into an lvalue.

The distinction between an lvalue and an rvalue is important when you consider the lifetime of an object. You know that the scope of a variable determines its lifetime, so any lvalue with a name (e.g., a variable or function parameter) has a lifetime determined by the name’s scope. An rvalue, on the other hand, is temporary. Unless you bind a name to that rvalue (remember the name’s type must be a reference to const), the compiler will destroy the temporary object as soon as it can.

For example, in the following expression, two temporary std::string objects are created and then passed to operator+ to concatenate the strings. The operator+ function binds its std::string const& parameters to the corresponding arguments, thereby guaranteeing that the arguments will live at least until the function returns. The operator+ function returns a new temporary std::string, which is then printed to std::cout:

std::cout << std::string("concat") + std::string("enate");

Once operator+ returns, the temporary std::string objects can be destroyed. The temporary std::string result from operator+ is then passed to operator<<, which binds its parameters to its arguments, ensuring that the string argument will not be destroyed until the function returns. After operator<< returns, the compiler is free to destroy the temporary std::string result of the concatenation.

The std::move() function lets you distinguish between the lifetime of an object and the data it contains, such as the characters that make up a string or the elements of a vector. The function takes an lvalue, whose lifetime is dictated by the scope, and turns it into an rvalue, so the contents can be treated as temporary. Thus, in Listing 60-1, the lifetime of line is determined by the scope. But in Listing 60-2, by calling std::move(), you are saying that it is safe to treat the string contents of line as temporary.

Because std::move() turns an lvalue into an rvalue, the return type (using the double ampersand) is called an rvalue reference. The parameter to the mystring move constructor also uses a double ampersand, so its type is an rvalue reference. A single-ampersand reference type is called an lvalue reference, to clearly distinguish it from rvalue references.

In a somewhat confusing turn of terminology, the result of calling a function that returns an rvalue reference, such as std::move(), has traits of both lvalues and rvalues. To reduce the confusion somewhat, a new name is given to this kind of expression: xvalue, for “eXpiring value.” That is, the expression still refers to an identifiable object, as an lvalue does, but it is also an rvalue, because the object is near the end of its lifetime, so you are free to steal its contents.

A new name is given to rvalues that are not also xvalues: pure rvalue, or prvalue. Pure rvalues are expressions such as numeric literals, arithmetic expressions, function calls (if the return type is not a reference type), and so on. With a complete lack of symmetry, there are no pure lvalues. Instead, the term lvalue is used for the class of lvalues that are not also xvalues. The new term generalized lvalue, or glvalue, applies to all lvalues and xvalues. Figure 60-1 depicts the new expression categories.

Figure 60-1. Expression categories

So it turns out std::move() is actually a trivial function. It takes an lvalue reference as an argument and turns it into an rvalue reference. The difference is important to the compiler in how it treats the expression, but std::move() does not generate any code and has no impact on performance.

To summarize:

  • Calling a function that returns a type of lvalue reference returns an expression of category lvalue.
  • Calling a function that returns a type of rvalue reference returns an expression of category xvalue.
  • Calling a function that returns a non-reference type returns an expression of category prvalue.
  • The compiler matches an rvalue (xvalue or prvalue) argument with a function parameter of type rvalue reference. It matches an lvalue argument with an lvalue reference.
  • A named object has category lvalue, even if the object’s type is rvalue reference.
  • Declare a parameter with an rvalue reference type (using a double ampersand) to move data from the argument.
  • Call std::move() as the source of an assignment or to pass an argument to a function when you want to move the data from an lvalue. This transforms an lvalue reference into an rvalue reference.

Implementing Move

Implementing a move constructor for the mystring class is easy, because it simply moves its argument to its base class’s move constructor. But how does a class such as std::string or std::vector implement move functionality? Go back to the artifact class in Listing 59-5. As you learned at the end of Exploration 59, copying an artifact leads to undefined behavior, due to the variables_ pointer. But it should be possible to move an artifact. Think about how you might go about moving an artifact. What, exactly, needs to be moved? How?

Moving the name is easy, because std::string already knows how to move itself. The tricky part is the variables_ member, which is a pointer. Okay, it’s not tricky: a pointer has only one owner, so after moving it, set variables_ to nullptr in the move source. Try writing a move constructor for the artifact class. Then compare your solution with mine in Listing 60-3.

Listing 60-3.  Adding a Move Constructor to the artifact Class

class artifact {
public:
   typedef std::chrono::system_clock clock;
   artifact(artifact&& source) noexcept
   : name_{std::move(source.name_)},
     mod_time_{std::move(source.mod_time_)},
     variables_{source.variables_}
   {
      source.variables_ = nullptr;
   }
 
   artifact& operator=(artifact&& source) noexcept
   {
      delete variables_;
      variables_ = source.variables_;
      source.variables_ = nullptr;
      name_ = std::move(source.name_);
      mod_time_ = std::move(source.mod_time_);
      return *this;
   }
   // ... rest of artifact ...
};

Copy the variables_ pointer from the source to the destination, and then set the source pointer to nullptr. Thus, when the source is destroyed (which will happen soon), it will try to delete its variable map, but its pointer is now null. The constructor is a little easier, because you know you are starting from scratch. The assignment operator must deal with the possibility that variables_ already has a value. So delete the old variable map before moving the pointer from the source.

When you implement the move constructor and move assignment operator, be sure to supply the noexcept specifier. This tells the compiler that the function does not throw an exception. If your move functions are normal, all they do is copy some pointers, so the noexcept specifier is correct. The standard library needs to know that a move constructor does not throw an exception, so it can call that constructor from a container’s own move constructor (which is also noexcept). The next Exploration will delve deeper into the subject of exceptions.

Rvalue or Lvalue?

All these xyzvalues can get confusing. To help you understand what’s going on, Listing 60-4 shows a number of different expressions passed as arguments to overloaded print() functions.

Listing 60-4.  Examining Expression Categories

#include <iostream>
#include <string>
#include <utility>
 
void print(std::string&& move)
{
   std::cout << "move: " << std::move(move) << '\n';
}
 
void print(std::string const& copy)
{
   std::cout << "copy: " << copy << '\n';
}
 
int main()
{
   std::string a{"a"}, b{"b"}, c{"c"};
 
   print(a);
   print(a + b);
   print(std::move(a));
   print(std::move(a + b));
   print(a + std::move(b));
   print(a + b + c);
}

Predict the output.

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

When I run the program, I get the following:

copy: a
move: ab
move: a
move: ab
move: ab
move: ac

The precise output depends on your standard library. After data is moved out of a string, the string remains valid, but its value is unspecified. Moved-from strings are often empty, but not necessarily so. Thus, your output may vary.

Special Member Functions

When you write a move constructor, you should also write a move assignment operator, and vice versa. You must also consider whether and how to write a copy constructor and copy assignment operator. The compiler will help you by implicitly writing default implementations or deleting the special member functions. This section takes a closer look at the compiler’s implicit behavior and guidelines for writing your own special member functions (constructors, assignment operators, and destructor).

The compiler can implicitly create any or all of the following:

  • default constructor, e.g., name()
  • copy constructor, e.g., name(name const&)
  • move constructor, e.g., name(name&&)
  • copy assignment operator, e.g., name& operator=(name const&)
  • move assignment operator, e.g., name& operator=(name&&)
  • destructor, e.g., ~name()

A good guideline is that if you write any one of these special functions, you should write them all. You might decide that the compiler’s implicit function is exactly what you want, in which case, you should say so explicitly with = default. That helps the human who maintains the code to know your intentions. If you know that the compiler would suppress a special member, note that explicitly with = delete.

As you know, the compiler deletes its implicit default constructor if you explicitly write any constructor. The implicit default constructor leaves pointers uninitialized, so if you have any pointer-type data member, you should write your own default constructor to initialize pointers to nullptr or delete the default constructor.

The compiler deletes its copy constructor and copy assignment operator if you explicitly provide a move constructor or move assignment operator. Recall that copying an artifact object is unsafe. After implementing a move constructor, you don’t want a copy constructor, and the compiler automatically obliges and suppresses the copy constructor and copy assignment operator.

The compiler deletes its move constructor if you explicitly provide any of the following special member functions: move assignment, copy constructor, copy assignment, or destructor.

The compiler deletes the move assignment operator if you explicitly provide any of the following special member functions: move constructor, copy constructor, copy assignment, or destructor.

The compiler’s default behavior is to ensure safety. It will implicitly create copy and move functions if all the data members and base classes permit it. But if you start to write your own special members, the compiler assumes that you know best and suppresses anything that may be unsafe. It is then up to you to add back the special members that make sense for your class. When the compiler implicitly supplies any special member function, it also supplies the noexcept specifier when it is safe and correct to do so, that is, when all the data members and base classes also declare that function noexcept (or are built-in types).

Whenever a class allocates dynamic memory, you must consider all the special member functions. Ensure that pointer-type data members are always initialized. Ensure that move constructors and assignment operators correctly set the source pointers to nullptr. If you must implement a copy constructor or copy assignment operator, implement the appropriate deep copy or otherwise ensure that two objects are not both holding the same pointer. Make sure everything the class allocates gets deleted.

The requirements of data members apply to the class. Thus, if a data member lacks a copy constructor, then the compiler suppresses the implicit copy constructor for the containing class. After all, how could it copy the class if it can’t copy all the members? Ditto for move functions.

As you can see, dynamic memory involves a number of complications. The next Exploration takes a look at more complications—namely, exceptions. Exceptions greatly complicate proper handling of dynamic memory, so pay close attention. (But not to worry, by Exploration 63, you will learn how to handle all these complications so that pointers will be easy to use.)
