The purpose of this chapter is to show you how to eliminate a nasty category of bugs from your software. The bugs discussed in this chapter are quite subtle—the compiler normally does not give any warning or error messages—and disastrous, often causing the application to crash or behave chaotically.
The specific details involve three infrastructure routines that the C++ compiler automatically defines when the developer leaves them undefined. A guideline is provided so readers can tell when those automatic definitions will cause problems and when they won't cause problems.
It is essential that every C++ programmer understand the material in this chapter.
The Big Three are the destructor, the copy constructor, and the assignment operator.
These infrastructure routines provide the death and copy semantics for objects of the class. Here is some sample syntax:
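As a minimal sketch, the declarations might look like this for a hypothetical class Fred. The empty definitions at the bottom are placeholders added only so the fragment compiles and links; a real class would put its own logic there.

```cpp
class Fred {
public:
  Fred();                            // ordinary (default) constructor
  ~Fred();                           // destructor
  Fred(const Fred& other);           // copy constructor
  Fred& operator=(const Fred& rhs);  // assignment operator
};

// Placeholder definitions (illustrative only).
Fred::Fred() {}
Fred::~Fred() {}
Fred::Fred(const Fred&) {}
Fred& Fred::operator=(const Fred&) { return *this; }
```

The destructor supplies the death semantics; the copy constructor and the assignment operator supply the copy semantics.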
The compiler synthesizes a destructor for the object's class.
For example, if an object of class Fred is destroyed and class Fred doesn't provide an explicit destructor, the compiler synthesizes a destructor that destroys all the Fred object's member objects and base class subobjects. This is called memberwise destruction. Thus, if class Fred doesn't have an explicit destructor and an object of class Fred contains an object of class Member that has an explicit destructor, then the compiler's synthesized Fred::~Fred() invokes Member's destructor.
The built-in types (int, float, void*, and so on) can be regarded as having destructors that do nothing.
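A minimal sketch of memberwise destruction follows. The classes Fred and Member come from the text; the global string trace and the function demo() are assumptions of this sketch, used to collect the messages instead of printing them directly.

```cpp
#include <string>

std::string trace;  // hypothetical message log for this sketch

class Member {
public:
  ~Member() { trace += "destructing a Member object\n"; }
};

class Fred {
  // No explicit destructor: the compiler synthesizes Fred::~Fred().
  Member member_;
};

std::string demo()
{
  trace.clear();
  {
    Fred a;
    trace += "before destructing a Fred\n";
  }  // a dies here; the synthesized destructor runs Member::~Member()
  trace += "after destructing a Fred\n";
  return trace;
}
```

Writing the result of demo() to std::cout shows the order in which the messages were recorded.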
The compiler's synthesized Fred::~Fred() calls Member::~Member() automatically, so the output is:
before destructing a Fred
destructing a Member object
after destructing a Fred
The compiler synthesizes a copy constructor for the object's class.
For example, if an object of class Fred is copied and class Fred doesn't provide an explicit copy constructor, the compiler synthesizes a copy constructor that copy constructs all the Fred object's member objects and base class subobjects. This is called memberwise copy construction. Thus, if class Fred doesn't have an explicit copy constructor, and an object of class Fred contains an object of class Member that has an explicit copy constructor, then the compiler's synthesized Fred::Fred(const Fred&) invokes Member's copy constructor.
Built-in types (int, float, void*, and so on) can be viewed as having copy constructors that do a bitwise copy.
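A minimal sketch of memberwise copy construction follows. As before, the global string trace and the function demo() are assumptions of this sketch rather than part of the chapter's code.

```cpp
#include <string>

std::string trace;  // hypothetical message log for this sketch

class Member {
public:
  Member() { trace += "constructing a Member\n"; }
  Member(const Member&) { trace += "copying a Member\n"; }
};

class Fred {
  // No explicit copy constructor: the compiler synthesizes
  // Fred::Fred(const Fred&), which copy constructs member_.
  Member member_;
};

std::string demo()
{
  trace.clear();
  Fred a;      // default constructs a.member_
  Fred b = a;  // synthesized copy constructor copies the Member
  (void)b;     // silence "unused variable" warnings
  return trace;
}
```

Writing the result of demo() to std::cout shows one construction followed by one copy.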
The compiler's synthesized Fred::Fred(const Fred&) calls Member::Member(const Member&) automatically, so the output is:
constructing a Member
copying a Member
The compiler synthesizes an assignment operator for the object's class.
For example, if an object of class Fred is assigned and class Fred doesn't provide an explicit assignment operator, the compiler synthesizes an assignment operator that assigns all the Fred object's member objects and base class subobjects. This is called memberwise assignment. Thus, if class Fred doesn't have an explicit assignment operator and an object of class Fred contains an object of class Member that has an explicit assignment operator, then the compiler's synthesized Fred::operator=(const Fred&) invokes Member's assignment operator.
Built-in types (int, float, void*, and so on) can be viewed as having assignment operators that do a bitwise copy.
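A minimal sketch of memberwise assignment follows. The global string trace and the function demo() are again assumptions of this sketch.

```cpp
#include <string>

std::string trace;  // hypothetical message log for this sketch

class Member {
public:
  Member() { trace += "constructing a Member\n"; }
  Member& operator=(const Member&)
  {
    trace += "assigning a Member\n";
    return *this;
  }
};

class Fred {
  // No explicit assignment operator: the compiler synthesizes
  // Fred::operator=(const Fred&), which assigns member_.
  Member member_;
};

std::string demo()
{
  trace.clear();
  Fred a;
  Fred b;
  b = a;  // synthesized assignment operator assigns the Member
  return trace;
}
```

Writing the result of demo() to std::cout shows two constructions followed by one assignment.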
The compiler's synthesized Fred::operator=(const Fred&) calls Member::operator=(const Member&), so the output is:
constructing a Member
constructing a Member
assigning a Member
The Law of the Big Three: if a class needs any of the Big Three, it needs them all.
This doesn't mean that every class should have all three of the Big Three. On the contrary, the Big Three are needed only in a relatively small percentage of classes. That is one of the reasons this is such an insidious error. Programmers see these infrastructure routines in only some of their classes, so they don't remember the critical Law of the Big Three.
This law first appeared in 1991 in the comp.lang.c++ FAQ, and it seems to be rediscovered every six months or so. Violations almost always lead to incorrect behavior and often lead to disasters.
In particular, violations of the Law of the Big Three often corrupt the heap. This usually means that the program does not crash until much later in the program's execution (and simple test programs may not crash at all). By the time the programmer goes in with a debugger, the root cause is hard to identify and the heap has so many things wrong with it that it's difficult to trace what's going wrong.
An explicit destructor is usually the first sign that a class needs the Big Three.
Developers typically discover the need to do something special during a normal constructor, which frequently necessitates undoing the special action in the destructor. In almost all cases, the class needs a copy constructor so that the special thing will be done during copying, and the class also needs an assignment operator so that the special thing will be done during assignment.
The destructor is the signal for applying the Law. Pretend that the keyboard's ~ (tilde) key is painted bright red and is wired up to a siren.
In the following example, the constructor of class MyString allocates memory, so its destructor deletes the memory. Typing the ~ of ~MyString() should sound a siren for the Law of the Big Three.
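A minimal sketch of such a MyString, with all of the Big Three in place, might look like this. The helper clone() and the accessor c_str() are assumptions of this sketch, not names taken from the chapter.

```cpp
#include <cstddef>
#include <cstring>

class MyString {
public:
  MyString(const char* s = "");
  ~MyString();                               // the siren goes off here
  MyString(const MyString& other);           // so this is needed
  MyString& operator=(const MyString& rhs);  // and so is this
  const char* c_str() const { return data_; }
private:
  static char* clone(const char* s);  // hypothetical helper: heap copy of s
  char* data_;                        // owned array of char
};

char* MyString::clone(const char* s)
{
  std::size_t n = std::strlen(s) + 1;
  char* p = new char[n];
  std::memcpy(p, s, n);
  return p;
}

MyString::MyString(const char* s) : data_(clone(s)) {}

MyString::~MyString() { delete[] data_; }

MyString::MyString(const MyString& other) : data_(clone(other.data_)) {}

MyString& MyString::operator=(const MyString& rhs)
{
  if (this != &rhs) {
    char* fresh = clone(rhs.data_);  // allocate first: a throw leaves *this intact
    delete[] data_;
    data_ = fresh;
  }
  return *this;
}
```

Each copy owns its own array, so destroying one MyString never affects another.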
Classes that own allocated memory (hash tables, linked lists, and so forth) generally need the Big Three (see FAQ 30.08).
Remote ownership is the responsibility that comes with being the owner of something allocated from the heap.
When an object is the logical owner of something allocated from the heap (known as the referent), the object is said to have remote ownership. That is, the object owns the referent. When an object has remote ownership, it usually means that the object is responsible for deleting the referent.
Any time a pointer is added to an object's member data, the class's author should immediately determine whether the object owns the referent (that is, whether the object has remote ownership). If this determination is delayed, the class's implementation can become schizophrenic—some of the object's member functions assume that the object owns the referent, others assume that someone else owns the referent. This is usually a mess and sometimes a disaster.
Remote ownership requires a deep copy, not a shallow copy.
When an object has remote ownership, the object needs the Big Three (destructor, copy constructor, and assignment operator). These routines are responsible for destroying the referent, creating a copy of the referent, and assigning the referent, respectively.
The copy semantics for remote ownership require the referent to be copied (a.k.a. deep copy) rather than just the pointer (a.k.a. shallow copy). For example, if class MyString has a pointer to an array of characters, copying the MyString object should copy the array. It is not sufficient to simply copy the pointer to the array, since that would result in two objects that both think they are responsible for deleting the same array.
When an object contains pointers for which it does not have remote ownership, the copy semantics are usually straightforward: the copy operation merely copies the pointer. For example, an iterator object might have a pointer to a node of a linked list, but the node is owned by the list rather than by the iterator, so copying an iterator merely needs to copy the pointer; the data in the node is not copied to the new iterator.
When an object doesn't contain pointers, the copy semantics are usually straightforward: the corresponding copy operation is called on each member object. This is what the compiler does automatically if the class doesn't have any copy operations (see FAQs 30.04, 30.05), which is why the Big Three are not usually needed in these cases.
When a class owns a referent but lacks one of the Big Three, trouble is brewing.
The following EvilString class doesn't have an explicit copy constructor or assignment operator, so the compiler synthesizes a copy constructor and/or assignment operator when it sees an EvilString being copy initialized and/or assigned, respectively. Unfortunately, the compiler-synthesized copy constructor and assignment operator copy only the pointer (shallow copy) rather than the referent (deep copy).
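A minimal sketch of such a class is shown here. The accessor c_str() and the function copySharesStorage() are assumptions of this sketch. The copy is built in a local buffer and its destructor is deliberately never run, so that the demonstration itself avoids the double delete a real program would hit.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

class EvilString {
public:
  EvilString(const char* s)
  {
    std::size_t n = std::strlen(s) + 1;
    data_ = new char[n];
    std::memcpy(data_, s, n);
  }
  ~EvilString() { delete[] data_; }
  // No explicit copy constructor or assignment operator:
  // the synthesized versions copy data_ (the pointer) only.
  const char* c_str() const { return data_; }
private:
  char* data_;
};

// Returns true if a copy points at the same array as the original.
bool copySharesStorage()
{
  EvilString original("hello");
  alignas(EvilString) unsigned char raw[sizeof(EvilString)];
  // Placement new into raw; the copy's destructor is never called,
  // so only original deletes the shared array in this demonstration.
  EvilString* copy = new (raw) EvilString(original);
  return copy->c_str() == original.c_str();
}
```

In ordinary code both objects would run their destructors, and the second delete[] on the shared array is where the damage starts.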
If an EvilString is copied (passed by value, for example; see FAQ 20.07), then the copy points to the same string data as the original. When the copy dies, the data they are sharing is deleted, leaving the original EvilString object with a dangling reference. Any use of the original object, including the implicit destruction when the original dies, will probably corrupt the heap, which will eventually crash the program.
Note that the problem is not with pass-by-value. The problem is that the copy constructor for class EvilString is broken. Similar comments can be made regarding the assignment operator.
Yes, auto_ptr.
The standard template class auto_ptr<T> is a partial solution to managing remote ownership. auto_ptr<Fred> acts like a Fred*, except the referent (the Fred object) is automatically deleted when the auto_ptr dies. auto_ptr<T> is known as a managed pointer. (auto_ptr<T> was deprecated in C++11 and removed in C++17; std::unique_ptr<T> is its replacement in current C++.)
Managed pointers are useful whenever a referent is allocated by new and when the owner of the pointer owns the referent. In other words, auto_ptr<T> is useful for managing remote ownership.
The most important issue isn't that auto_ptr<T> saves the one line of delete code. The most important issue is that auto_ptr<T> handles exceptions properly: the referent is automagically deleted when an exception causes the auto_ptr<T> object to be destructed. In the following example, class Noisy throws exceptions randomly to simulate the fact that we can't always predict when an exception is going to be thrown (hopefully your classes don't have this property).
Here is a function that randomly returns true and false with 50-50 probability:
Here is a class that prints messages and possibly throws exceptions in its functions.
Here is a function that wisely chooses to use the managed pointer auto_ptr<Noisy>.
Here is the same function, but this time using a raw Noisy* pointer. Note how much more complex this code is, even though it is doing the same thing. A significant portion of this code exists solely to ensure that the referent is deleted properly, whereas in the previous example the managed pointer enabled most of this scaffolding to disappear.
Here is main() that repeatedly calls the foregoing routines.
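The contrast between the managed and the raw pointer can be sketched as follows. This is a simplified stand-in, not the chapter's listing: std::unique_ptr<Noisy> is swapped in for auto_ptr<Noisy> so the code compiles under current standards, a bool parameter replaces the random throwing so the behavior is repeatable, and the names poke(), live, useManaged(), and useRaw() are assumptions of this sketch.

```cpp
#include <memory>
#include <stdexcept>

// Simplified stand-in for Noisy: counts live objects and throws on demand.
class Noisy {
public:
  Noisy()  { ++live; }
  ~Noisy() { --live; }
  void poke(bool fail)
  {
    if (fail) throw std::runtime_error("Noisy::poke failed");
  }
  static int live;  // number of Noisy objects currently alive
};
int Noisy::live = 0;

// Managed pointer: no scaffolding is needed to avoid a leak.
void useManaged(bool fail)
{
  std::unique_ptr<Noisy> p(new Noisy);
  p->poke(fail);
}  // the referent is deleted here whether or not poke() threw

// Raw pointer: the same guarantee needs explicit try/catch scaffolding.
void useRaw(bool fail)
{
  Noisy* p = new Noisy;
  try {
    p->poke(fail);
  } catch (...) {
    delete p;  // clean up on the exceptional path
    throw;
  }
  delete p;    // clean up on the normal path
}
```

After either function returns or throws, Noisy::live is back to zero; the difference is how much code it took to get there.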
Does auto_ptr enforce the Law of the Big Three and solve the problems associated with remote ownership?
No. auto_ptr<T> plugs leaks, but it doesn't enforce the Law of the Big Three.
When a class uses a plain T* to implement remote ownership, forgetting any of the Big Three causes the compiler to silently generate wrong code. The result is often a disaster at runtime.
Unfortunately, replacing the T* with a managed pointer such as auto_ptr<T> does not correct the problem. The root of the problem is that when an auto_ptr<T> is copied, ownership of the referent is transferred to the copy and the original object's auto_ptr<T> becomes NULL. This is often undesirable. What is needed instead is for the referent to be copied or for a compile-time error to be generated that flags the problem.
The safest solution is to define and use a strict auto_ptr<T>. For example, the following could go into file strict_auto_ptr.h and could be reused whenever anyone wanted a strict auto_ptr<T>. Note that the copy constructor and assignment operator are private: and are undefined, thus making it impossible to copy a strict_auto_ptr<T> object.
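A minimal sketch of such a template is shown here. It is written as a standalone class rather than in terms of auto_ptr<T>, so that it compiles under current standards; the member set (operator*, operator->, get()) is an assumption of this sketch. The static_assert lines use C++11 type traits only to confirm that copying is rejected at compile time.

```cpp
#include <type_traits>

template <class T>
class strict_auto_ptr {
public:
  explicit strict_auto_ptr(T* p = 0) : p_(p) {}
  ~strict_auto_ptr() { delete p_; }  // deletes the referent automatically
  T& operator*() const { return *p_; }
  T* operator->() const { return p_; }
  T* get() const { return p_; }
private:
  strict_auto_ptr(const strict_auto_ptr&);             // declared, never defined
  strict_auto_ptr& operator=(const strict_auto_ptr&);  // declared, never defined
  T* p_;  // the owned referent
};

static_assert(!std::is_copy_constructible<strict_auto_ptr<int>>::value,
              "copy construction is rejected at compile time");
static_assert(!std::is_copy_assignable<strict_auto_ptr<int>>::value,
              "assignment is rejected at compile time");
```

A strict_auto_ptr<int> p(new int(42)) can be dereferenced like an int*, and the int is deleted when p goes out of scope.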
When strict_auto_ptr<T> is used, the compiler either synthesizes the Big Three correctly or causes specific, compile-time errors; it does not allow run-time disasters.
The following example shows a class that implements remote ownership by the managed pointer strict_auto_ptr<Noisy> rather than the plain pointer Noisy*.
Because strict_auto_ptr<Noisy>'s destructor deletes the referent, Fred doesn't need an explicit destructor. The Fred::~Fred() synthesized by the compiler is correct.
Because strict_auto_ptr<Noisy>'s copy constructor and assignment operator are private:, the compiler is prevented from synthesizing either the copy constructor or the assignment operator for class Fred. Copying or assigning a Fred produces a specific, compile-time error message. Compare this to using a Noisy*, in which case the compiler silently synthesizes the wrong code, producing disastrous results.
For example, when the GENERATE_ERROR symbol is #defined in the following function, the compiler gives an error message rather than silently doing the wrong thing.
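The effect can be sketched as follows. To keep the fragment self-contained, std::unique_ptr<Noisy> stands in for strict_auto_ptr<Noisy> (it is likewise non-copyable); the trivial Noisy, its value() function, and the function sample() are assumptions of this sketch.

```cpp
#include <memory>
#include <type_traits>

class Noisy {
public:
  int value() const { return 42; }
};

class Fred {
public:
  Fred() : ptr_(new Noisy) {}
  int value() const { return ptr_->value(); }
  // No explicit Big Three: the synthesized destructor deletes the Noisy,
  // and the non-copyable member blocks synthesized copy operations.
private:
  std::unique_ptr<Noisy> ptr_;  // stand-in for strict_auto_ptr<Noisy>
};

static_assert(!std::is_copy_constructible<Fred>::value,
              "copying a Fred is a compile-time error");
static_assert(!std::is_copy_assignable<Fred>::value,
              "assigning a Fred is a compile-time error");

int sample()
{
  Fred a;
#ifdef GENERATE_ERROR
  Fred b(a);  // compile-time error: Fred cannot be copied
  a = b;      // compile-time error: Fred cannot be assigned
#endif
  return a.value();
}
```

With GENERATE_ERROR undefined the function compiles and runs normally; defining it turns the would-be run-time disaster into a diagnostic.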
strict_auto_ptr<T> effectively automates the proper delete and prevents the compiler from synthesizing improper copy operations. It plugs leaks and enforces the Law of the Big Three.
Yes, but define them all anyway.
There are cases where one or two of the Big Three may be needed but not all three. All three should usually be defined anyway so that people don't have to think so hard during code reviews and maintenance activities. Here are four common times when this happens: virtual destructors, protected: assignment operators, recording creation or destruction, and unnecessary or illogical copy operations.
Virtual destructors: A base class often has a virtual destructor to ensure that the right destructor is called during delete basePointer (see FAQ 21.05). If this explicit destructor exists solely to be made virtual (for example, if it does what the synthesized destructor would have done, namely { }), the class may not need an explicit copy constructor or assignment operator.
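A minimal sketch of the virtual destructor case follows. The classes Shape and Triangle and the function deleteThroughBase() are hypothetical names chosen for illustration. The derived class records its own destruction through a pointer it does not own, so no remote ownership is involved.

```cpp
class Shape {
public:
  virtual ~Shape() {}  // explicit only so that it can be virtual
  virtual int sides() const = 0;
  // The synthesized copy constructor and assignment operator are fine here.
};

class Triangle : public Shape {
public:
  explicit Triangle(bool* flag) : destroyed_(flag) {}
  ~Triangle() { *destroyed_ = true; }  // records destruction only
  int sides() const { return 3; }
private:
  bool* destroyed_;  // not owned: points at the caller's flag
};

// Returns true if deleting through a Shape* ran Triangle's destructor.
bool deleteThroughBase()
{
  bool destroyed = false;
  Shape* basePointer = new Triangle(&destroyed);
  delete basePointer;  // virtual dispatch reaches Triangle::~Triangle()
  return destroyed;
}
```

Without the virtual keyword on Shape's destructor, the delete through basePointer would have undefined behavior.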
Protected assignment operators: An ABC often has a protected: assignment operator to prevent users from performing assignment using a reference to an ABC (see FAQ 24.05). If this explicit assignment operator exists solely to be made protected: (for example, if it does what the synthesized assignment operator would have done, namely memberwise assignment), the class may not need an explicit copy constructor or destructor.
Recording creation or destruction: A class sometimes has an explicit destructor and copy constructor solely to record the birth and death of its objects. For example, the class might print a message to a log file or count the number of existing objects. If the explicit destructor or copy constructor exists solely to perform this information recording (for example, if these operations do what the synthesized versions would have done), the class may not need an explicit assignment operator, since assignment doesn't change the number of instances of a class.
Unnecessary or illogical copy operations: There are cases where a class simply doesn't need one or both copy operations. Sometimes the copy operations don't even make logical sense. For example, the semantics of class File may mean that it is nonsensical to copy File objects; similarly for objects of class Semaphore. In these cases, the unnecessary copy operations are normally declared in the private: section of the class and are never defined. This prevents the compiler from synthesizing these operations in the class's public: section and causes compile-time error messages whenever a user accidentally calls one of these member functions. In this case, it is not strictly necessary to define the other members of the Big Three just because one or both copy operations are declared in the private: section of the class.
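A minimal sketch of a non-copyable class in this style follows. The constructor argument and the descriptor() accessor are assumptions of this sketch; the static_assert lines use C++11 type traits only to confirm that copying is rejected.

```cpp
#include <type_traits>

class File {
public:
  explicit File(int descriptor) : descriptor_(descriptor) {}
  int descriptor() const { return descriptor_; }
private:
  File(const File&);             // declared private:, never defined
  File& operator=(const File&);  // declared private:, never defined
  int descriptor_;
};

static_assert(!std::is_copy_constructible<File>::value,
              "copying a File is a compile-time error");
static_assert(!std::is_copy_assignable<File>::value,
              "assigning a File is a compile-time error");
```

Since C++11 the same intent can also be expressed by declaring the two copy operations with = delete.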
Yes, when the Big Three need to be non-inline.
When the compiler synthesizes the Big Three, it makes them inline. If the application's classes are exposed to customers (for example, if customers #include the application's header files rather than merely using an executable), the application's inline code is copied into their executables. If you want to maintain binary compatibility between releases of your library, you must not change any visible inline functions, including the versions of the Big Three that are synthesized by the compiler. Therefore, explicit, non-inline versions of the Big Three should be used.
Why does copying an object using memcpy() cause a program crash?
Because bitwise copying is evil.
A class's copy operations (copy constructor and assignment operator) are supposed to copy the logical state of an object. In some cases, the logical state of an object can be copied using a bitwise copy (e.g., using memcpy()). However, a bitwise copy doesn't make sense for a lot of objects; it may even put the copy in an incoherent state.
If a class X has a nontrivial copy constructor or assignment operator, bitwise copying an X object often creates wild pointers. One common case where bitwise copying of an object creates wild pointers is when the object owns a referent (that is, it has remote ownership). The wild pointers are a result of the bitwise copy operation, not some failure on the part of the class designer.
For example, consider a class that has remote ownership, such as a string class that allocates an array of char from the heap. If string object a is bitwise copied into string b, then the two objects both point to the same allocated array. One of these strings will die first, which will delete the allocated array owned by both of them. BOOM!
Note that a bitwise copy is safe if the object's exact class is known and the object is (and will always remain!) bitwise copyable. For example, class string might use memcpy() to copy its string data because char is and will always remain bitwise copyable (assuming that the string data is a simple array of char).
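One way to guard such a memcpy() is sketched below. It assumes C++11, where the trait std::is_trivially_copyable is the standard library's closest notion to "bitwise copyable"; the struct Point and the function bitwiseCopy() are hypothetical names for illustration.

```cpp
#include <cstring>
#include <string>
#include <type_traits>

struct Point {  // hypothetical plain struct with no remote ownership
  int x;
  int y;
};

// The compiler confirms which types may be copied bit by bit.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point may be copied with memcpy()");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must not be copied with memcpy()");

Point bitwiseCopy(const Point& src)
{
  Point dst;
  std::memcpy(&dst, &src, sizeof dst);  // safe: Point is trivially copyable
  return dst;
}
```

If Point later gains a member with remote ownership, the first static_assert fails and flags the memcpy() before it can do any damage.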
Because variable-length argument lists use bitwise copy, which is dangerous in many cases. There are times where variable-length argument lists don't cause a problem (printf comes to mind). But it is wise to avoid using them unless there is some compelling reason.
Objects passed into ellipses (...) are passed via bitwise copy. The parameter objects are bitwise copied onto the stack, but the va_arg macro uses the copy constructor to copy the pile of bits from the stack. The technical term for this asymmetry is ouch.
“Ladies and gentlemen, this is your pilot speaking; please fasten your seat belts in preparation for the air turbulence ahead.”
main()'s three Fred objects are constructed via Fred::Fred(). The call to f(int,Fred...) passes these Freds using bitwise copy. The bitwise copies may not be properly initialized Fred objects and are not logical copies of a, b, and c. Inside f(int,Fred...), the va_arg macro uses a pointer cast (shudder) to create a Fred*, but this Fred* doesn't point to a valid Fred object because it points to a bitwise copy of a Fred object. The va_arg macro then dereferences this (invalid) pointer and copies the pile of incoherent bits (via Fred's copy constructor) into the local variable, x.
If Fred has nontrivial copy semantics, the chance that the bitwise copy is the same as a logical copy is remote at best.
Variable-length argument lists are evil.
Why does a program that uses realloc() to reallocate an array of objects crash?
When realloc() needs to move the storage that is being reallocated, it uses bitwise copy rather than invoking the appropriate constructor for the newly allocated objects.
Use realloc() only for objects guaranteed to be bitwise copyable.
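A minimal sketch of a safe use follows, with an array of int, which is bitwise copyable. The function name grow() is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdlib>

// Resizes an int array obtained from malloc()/realloc().
// Returns the (possibly moved) array, or a null pointer on failure,
// in which case the original array is left untouched.
int* grow(int* data, std::size_t newCount)
{
  void* p = std::realloc(data, newCount * sizeof(int));
  return static_cast<int*>(p);
}
```

The existing elements survive the move because copying the bits of an int is the same as copying its value; the same call on an array of objects with remote ownership would bypass their copy constructors.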