Chapter 31. Using Objects to Prevent Memory Leaks

FAQ 31.01 When are memory leaks important?

When the application is important and its lifetime has some duration.

A memory leak occurs when a program allocates memory off the heap and does not return the memory when it is no longer needed. As a result, the system eventually runs out of heap memory and crashes or hangs up. In general, memory leaks cannot be tolerated, particularly for long-running applications. “Reboot every few hours” is not a practical solution to the problem, so it is important to understand how leaks occur and what can be done to prevent them. It is very, very difficult to cure these problems after the fact, but a modicum of solid engineering applied in the early stages of development can eliminate almost all the grief.

Note that there are cases when memory leaks can be ignored. Applications that are extremely short-lived don't need to worry about memory leaks. For example, they might run for only a fraction of a second and allocate less memory than the target machine has. When the application terminates, all the memory that was allocated is automatically returned to the operating system, so the only thing to worry about is whether destructors have other side effects. In cases like this it might make sense to use new but simply never use delete. However, remember: if the requirements ever change so that the leaks must be plugged, doing so after the application has been written is very, very difficult.

FAQ 31.02 What is the easiest way to avoid memory leaks?

Place pointers inside objects and have the objects manage the pointers.

The pointer returned from new should always be stored in a member variable of an object. That object's destructor should delete the allocation.

The beauty of this approach is that objects have comprehensive creation and destruction semantics (including constructors and destructors), whereas pointers have extremely rudimentary creation and destruction semantics. By putting pointers in objects it is possible to guarantee that the destructor will always be executed and the memory will be properly deallocated.

In the example that follows, whitespace-terminated words are read from the standard input stream cin (see FAQ 2.16), and the unique words are printed out in sorted order. For example, if the standard input stream contains the words “on and on and on he went,” the output will contain the unique words “and he on went.”

The preferred way to produce a sorted list of unique words is to use the string class (see FAQ 2.16) and a container class (in this case, the standard set<T> template; see FAQ 28.14):

[Code listing: theRightWay()]
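As an illustration only (this is not the original listing), a minimal sketch of what theRightWay() might look like is shown here. The stream parameters and the helper uniqueSortedWords() are assumptions added so the routine can be exercised without reading from cin.

```cpp
#include <iostream>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Reads whitespace-terminated words from `in` and writes the unique words
// to `out` in sorted order, one per line. Taking the streams as parameters
// is an assumption of this sketch.
void theRightWay(std::istream& in, std::ostream& out)
{
    std::set<std::string> words;    // owns every string it stores
    std::string word;               // owns its own character buffer
    while (in >> word)
        words.insert(word);         // a duplicate word is simply not inserted
    for (const std::string& w : words)
        out << w << '\n';           // std::set iterates in sorted order
}   // `words` and `word` release their memory here, even if an exception
    // propagates out of the function

// Hypothetical convenience wrapper: feed in a string, get the output back.
std::string uniqueSortedWords(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    theRightWay(in, out);
    return out.str();
}
```

Calling theRightWay(std::cin, std::cout) gives the behavior described in the text; there is no new, no delete, and no fixed-size buffer anywhere in the routine.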

Note that there are no explicit pointers in this code, and there are no chances for memory leaks. For example the string object contains a char*, but there is no possibility of a leak since the string object is local and it has a proper destructor—it handles its own memory management. Similarly for the set<string>: this object may contain many pointers and may use many memory allocations, but since the object is local and since it has a proper destructor, it manages its own memory without any possibility of memory leaks.

In contrast, the undesirable approach would be to use explicit pointers to explicitly allocated memory. The following example is more or less equivalent to the example shown above, but it is rife with opportunities for both wild pointers and memory leaks. Making this code safe would be quite a bit more difficult than the relatively simple solution shown above (the problems cited are described after the code):

[Code listing: theWrongWay()]

Most programmers will notice these major problems with this code.

  1. If it runs out of memory in the malloc() step, it should probably throw an exception rather than silently returning; see FAQ 12.06.
  2. If the length of a word exceeds maxWordLen, the program overwrites memory and probably crashes.
  3. If the file contains null bytes ('\0'), strcmp() and strdup() give the wrong answers.
  4. If it runs out of memory in the realloc() step, all the previously allocated memory is lost—a leak. Plus the routine should probably throw an exception rather than silently returning; see FAQ 12.06.

All these problems can be fixed, but fixing them makes the code even more complex. Note that theRightWay() doesn't have any of these problems and is much simpler. It properly handles running out of memory, and it can handle arbitrarily long words, arbitrarily many characters within long words (including null bytes), and arbitrarily many words.

The more subtle problem with theWrongWay() is its use of explicit pointers. If a maintenance programmer changes the code so that it exits before the last for loop (for example, because an exception is thrown, perhaps from another routine), this code will leak memory. The code could protect against exceptions with a large try block around the whole routine except the last for loop, but it would be much harder to protect against an early return. Note that theRightWay() doesn't have this problem: it won't leak when an exception is thrown or an early return is executed.

But even if theWrongWay() is made safe, it will still be much slower than theRightWay(), especially with large numbers of unique words. This is because theWrongWay() uses a linear search (doubling the number of unique words typically quadruples the time); theRightWay() uses a much more efficient algorithm.

FAQ 31.03 What are the most important principles for resource management?

Ownership, responsibility, and focus.

Ownership: Every allocated resource is owned by exactly one resource manager object, which must be a local (auto) variable in some scope (or a member object of some local).

Responsibility: The resource manager object is charged with the responsibility of releasing the allocated resource. This is the only place the resource is released.

Focus: The resource manager object does nothing other than manage the individual resource.

A leak is simply a new that lacks a corresponding delete. Either the delete isn't physically in the source code or it is in the source code but is bypassed due to runtime control flow. Both situations are handled by the resource management discipline since the destructor for a local object always runs when control leaves the scope where the local object was created. In other words, the resource management discipline relies on the guarantees provided by the language rather than the good intentions of programmer self-discipline.

This resource management discipline can be applied to the management of all kinds of resources (e.g., files, semaphores, memory, database connections, and so on). “Memory” is used here only as a concrete example of a manageable resource.
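The three principles can be sketched with a non-memory resource. Everything named below (openConnection(), closeConnection(), class Connection, and the counter) is hypothetical and stands in for whatever acquire/release pair a real resource provides.

```cpp
#include <stdexcept>

// Hypothetical C-style API standing in for any resource (file, semaphore,
// database connection). The counter exists only to make leaks observable.
static int openConnections = 0;
int  openConnection()     { return ++openConnections; }
void closeConnection(int) { --openConnections; }

// Ownership:      one Connection object owns exactly one handle.
// Responsibility: the destructor is the only place the handle is released.
// Focus:          the class does nothing except manage the handle.
class Connection {
public:
    Connection() : handle_(openConnection()) { }
    ~Connection() { closeConnection(handle_); }
    Connection(const Connection&) = delete;             // exactly one owner
    Connection& operator=(const Connection&) = delete;
    int handle() const { return handle_; }
private:
    int handle_;
};

// Leaves a scope in three different ways and reports how many connections
// are still open afterwards.
int connectionsLeftOpen()
{
    { Connection c; }                           // normal exit from the scope
    for (int i = 0; i < 3; ++i) {
        Connection c;
        if (i == 1) break;                      // early exit from the scope
    }
    try {
        Connection c;
        throw std::runtime_error("simulated failure");  // exceptional exit
    } catch (const std::runtime_error&) { }
    return openConnections;
}
```

In each of the three exits the release happens because the language runs the destructor of the local object, not because the caller remembered to do it.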

FAQ 31.04 Should the object that manages a resource also perform operations that may throw exceptions?

Not usually.

In the following example, class Fred both manages a resource (an X allocation) and performs some operations that may throw exceptions (it calls the function mayThrow()). In other words, Fred violates the guideline. When Fred's constructor throws an exception (as a result of calling mayThrow()), there is a resource leak.

[Code listing: class Fred, whose constructor allocates an X and calls mayThrow()]

Because the guideline is violated, the X object leaks: the delete p_ instruction in Fred::~Fred() is never executed. Either Fred should focus on being a resource manager—and nothing but a resource manager—or Fred should delegate the resource management responsibility to some other class. In other words, either get rid of the code that calls mayThrow() from Fred, or change the X* to an auto_ptr<X>.
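The second remedy (delegating ownership to a managed pointer) might look like the following sketch. It is not the original listing: std::unique_ptr is used in place of auto_ptr, which was removed from the language in C++17, and the instance counter in X, the always-throwing mayThrow(), and the helper liveXAfterFailedFred() are assumptions made so the effect can be observed.

```cpp
#include <memory>
#include <stdexcept>

// X counts its live instances so that a leak would be visible.
struct X {
    static int live;
    X()  { ++live; }
    ~X() { --live; }
};
int X::live = 0;

// Stand-in for the FAQ's mayThrow(); in this sketch it always throws.
void mayThrow() { throw std::runtime_error("simulated failure"); }

// With a raw X* member, an exception in the constructor body would skip
// ~Fred() and the X would leak. With a managed pointer, p_ is a fully
// constructed member by the time the body runs, so the language destroys
// it (and therefore the X) when the body throws.
class Fred {
public:
    Fred() : p_(new X()) { mayThrow(); }
private:
    std::unique_ptr<X> p_;
};

// Tries to construct a Fred and reports how many X objects survive.
int liveXAfterFailedFred()
{
    try {
        Fred f;
    } catch (const std::runtime_error&) { }
    return X::live;
}
```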

In those cases where it is not possible to abide by the discipline, a try block can be put in the constructor initialization list. Use this only as a last resort.

FAQ 31.05 Should an object manage two or more resources?

Not usually.

An object that manages a resource should manage exactly one resource. Use composition to combine multiple “pure” resource manager objects (for example, multiple auto_ptr<T> objects, File objects, and so forth). This guideline is a corollary of the guideline presented in the previous FAQ.

If an object manages two or more resources, the first resource may leak if the second allocation throws an exception. In particular, when an exception occurs during the execution of a constructor, the object's destructor is not executed, so the destructor won't release the resource that was successfully allocated. In the following example, class Fred manages two resources: an X allocation and a Y allocation. When Fred's constructor throws an exception as a result of allocating a Y, the X resource leaks.

[Code listing: class Fred managing an X allocation and a Y allocation]

Because the guideline is violated, the X object leaks: the delete x_ instruction is never executed. Either Fred should focus on being a manager of a resource—not two or more resources—or Fred should delegate the resource management responsibility to some other class. In other words, either get rid of the Y resource from Fred or change the X* to an auto_ptr<X>.
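A sketch of the composition approach follows, again with std::unique_ptr standing in for auto_ptr. The counter in X, the always-throwing Y constructor, and liveXAfterFailedFred() are assumptions for illustration, not part of the original example.

```cpp
#include <memory>
#include <stdexcept>

// X counts its live instances so that a leak would be visible.
struct X {
    static int live;
    X()  { ++live; }
    ~X() { --live; }
};
int X::live = 0;

// Y's constructor always throws, simulating a failed second allocation.
struct Y {
    Y() { throw std::runtime_error("cannot create Y"); }
};

// Fred owns nothing directly; it is composed of two single-resource
// managers. Members are initialized in declaration order, so x_ is fully
// constructed before the Y allocation is attempted.
class Fred {
public:
    Fred() : x_(new X()), y_(new Y()) { }
private:
    std::unique_ptr<X> x_;
    std::unique_ptr<Y> y_;
};

// Tries to construct a Fred and reports how many X objects survive.
int liveXAfterFailedFred()
{
    try {
        Fred f;
    } catch (const std::runtime_error&) { }
    return X::live;
}
```

When new Y() throws, the already constructed member x_ is destroyed automatically, which deletes the X even though Fred's own destructor never runs.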

In those cases where it is not possible to abide by the discipline, a try block can be put in the constructor initialization list. Use this only as a last resort.

FAQ 31.06 What if an object has a pointer to an allocation and one of the object's member functions deletes the allocation?

That member function must immediately restore the integrity of the object holding the pointer.

If some member function (other than the destructor) deletes memory allocated from the heap, then the member function must either reassign the pointer with a previously allocated (new) object or set a flag that tells the destructor to skip the delete. Setting the pointer to NULL can be used as such a flag.

For example, some assignment operators need to delete an old allocation as well as allocate a new one. In such cases, the new allocation should be performed before the old is deleted in case the allocation throws an exception. The goal is for the assignment operator to be atomic: either it should succeed completely (no exceptions, and all states successfully copied from the source object), or it should fail (throw an exception) without changing the state of the this object. It is not always possible to meet the goal of atomicity, but the assignment operator must never leave the this object in an incoherent state.

A related guideline is to use a local auto_ptr<T> to point to the new allocation. This will ensure that the new allocation is deleted if an exception is thrown by some subsequent operation in the member function. The ownership of the allocated object can be transferred to the this object by assigning from the local auto_ptr into the auto_ptr<T> in the this object.
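The assignment-operator advice might be sketched as follows. The class below is hypothetical (a Fred that owns a single std::string), and std::unique_ptr with std::move replaces the auto_ptr assignment described in the text.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical Fred that owns one heap-allocated string.
class Fred {
public:
    explicit Fred(const std::string& s) : p_(new std::string(s)) { }
    Fred(const Fred& f) : p_(new std::string(*f.p_)) { }

    Fred& operator=(const Fred& f)
    {
        // Step 1: allocate the new value first, held by a local managed
        // pointer. If this throws, *this has not been touched.
        std::unique_ptr<std::string> fresh(new std::string(*f.p_));

        // Step 2: transfer ownership. This cannot throw; it deletes the
        // old string and adopts the new one.
        p_ = std::move(fresh);
        return *this;
    }

    const std::string& value() const { return *p_; }
private:
    std::unique_ptr<std::string> p_;
};
```

Because the copy is made before the old allocation is released, the operator either succeeds completely or leaves the this object unchanged, and self-assignment needs no special case.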

FAQ 31.07 How should a pointer variable be handled after being passed to delete?

The pointer variable should immediately be put into a safe state.

After calling delete p, immediately set p = NULL or p = anotherAutoPtr (unless the pointer p is just about to go out of scope). The goal is to prevent a subsequent operation from following the pointer p (which now points at garbage) or calling delete p a second time.

Note that setting p = new Fred() is not acceptable, since the Fred allocation may throw an exception before p is changed to a safe state.

We recommend setting p = NULL immediately, in case an exception subverts the normal flow of control.
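A small sketch of why the intermediate null assignment matters. The class Holder, the failNext switch, and the counter are hypothetical, and nullptr is used as the modern spelling of NULL.

```cpp
#include <stdexcept>

// Fred counts live instances and can be told to fail on construction.
struct Fred {
    static int  live;
    static bool failNext;
    Fred()  { if (failNext) throw std::runtime_error("simulated failure"); ++live; }
    ~Fred() { --live; }
};
int  Fred::live = 0;
bool Fred::failNext = false;

// Hypothetical owner of a single Fred allocation.
class Holder {
public:
    Holder() : p_(new Fred()) { }
    ~Holder() { delete p_; }            // deleting a null pointer does nothing
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;

    void replace()
    {
        delete p_;
        p_ = nullptr;       // safe state first
        p_ = new Fred();    // if this throws, the destructor sees nullptr
    }
    bool empty() const { return p_ == nullptr; }
private:
    Fred* p_;
};

// Forces the allocation inside replace() to fail, then lets the Holder be
// destroyed; returns the number of Fred objects still alive.
int liveAfterFailedReplace()
{
    {
        Holder h;
        Fred::failNext = true;
        try { h.replace(); } catch (const std::runtime_error&) { }
        Fred::failNext = false;
    }   // ~Holder() runs here
    return Fred::live;
}
```

Without the p_ = nullptr line, the failed allocation would leave p_ pointing at the deleted object and the destructor would delete it a second time.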

FAQ 31.08 What should be done with a pointer to an object that is allocated and deallocated in the same scope?

It should be placed in a managed pointer object that is local to the scope.

The goal is to make the code exception safe, that is, safe in the presence of exceptions (see FAQ 9.03). As a pleasant side effect, it becomes unnecessary to remember (and therefore, in a sense, impossible to forget) to make sure that the temporary object is deleted. Using a managed pointer (for example, an auto_ptr<T>) meets these goals since the managed pointer's destructor automatically deletes the temporary object. Here's an example.

[Code listing: a temporary object held by a local managed pointer]
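One way this might look, as a sketch rather than the original listing: std::unique_ptr replaces auto_ptr, and the Fred class, its counter, and the fail flag are assumptions added so both exit paths can be exercised.

```cpp
#include <memory>
#include <stdexcept>

// Fred counts live instances; doSomething() can be told to throw.
struct Fred {
    static int live;
    Fred()  { ++live; }
    ~Fred() { --live; }
    void doSomething(bool fail)
    {
        if (fail) throw std::runtime_error("simulated failure");
    }
};
int Fred::live = 0;

// The temporary Fred is allocated and released in the same scope.
void sample(bool fail)
{
    std::unique_ptr<Fred> p(new Fred());
    p->doSomething(fail);       // may throw
}   // *p is deleted here on both the normal and the exceptional path

// Runs sample() and reports how many Fred objects are still alive.
int liveAfterSample(bool fail)
{
    try {
        sample(fail);
    } catch (const std::runtime_error&) { }
    return Fred::live;
}
```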

FAQ 31.09 How easy is it to implement reference counting with pointer semantics?

It is relatively easy, and the result is worthwhile.

If the application tends to pass around pointers to dynamically allocated objects and possibly store some of the pointers in containers, it is quite possible that there will be either memory leaks or dangling references. Often a simple reference-counting scheme suffices in these circumstances.

Reference counting means that each object keeps track of how many pointers are pointing at it, and when the object no longer has any pointers pointing at it, the object deletes itself. With a little discipline, this means that the object dies when it becomes unreachable, which is precisely what is desired. A very simple implementation of this technique follows.

[Code listing: classes Fred and FredPtr]
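A minimal sketch of such a scheme is given here as an illustration; it is not the original listing. The live counter is an assumption added to make object lifetime observable, and the sketch is not thread safe.

```cpp
class FredPtr;

class Fred {
public:
    Fred() : count_(0) { ++live; }
    Fred(const Fred&) : count_(0) { ++live; }   // a copy starts with no pointers
    ~Fred() { --live; }
    static int live;        // assumption: only here to observe lifetimes
private:
    friend class FredPtr;
    unsigned count_;        // number of FredPtr objects pointing at this Fred
};
int Fred::live = 0;

class FredPtr {
public:
    FredPtr(Fred* p) : p_(p) { ++p_->count_; }          // p must not be null
    FredPtr(const FredPtr& other) : p_(other.p_) { ++p_->count_; }
    ~FredPtr() { if (--p_->count_ == 0) delete p_; }

    FredPtr& operator=(const FredPtr& other)
    {
        // Increment the new target before decrementing the old one so
        // that self-assignment cannot delete the object prematurely.
        Fred* const old = p_;
        p_ = other.p_;
        ++p_->count_;
        if (--old->count_ == 0) delete old;
        return *this;
    }

    Fred* operator->() const { return p_; }
    Fred& operator*()  const { return *p_; }
private:
    Fred* p_;
};
```

A user writes FredPtr p = new Fred(); and then copies p freely; the Fred is deleted when the last FredPtr that refers to it is destroyed or reassigned.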

This simple reference-counting mechanism provides users with a pointer-oriented view of the Fred objects. In other words, users always allocate their Fred objects via new and point to the Fred objects via FredPtr “smart pointers.” Users can make as many copies of their FredPtr pointers as they wish, including storing some FredPtrs in containers, and the pointed-to Fred objects are automatically deleted when the last such FredPtr object vanishes.

To hide the pointers from users so that users see objects rather than pointers to objects, use reference counting with copy-on-write semantics (see FAQ 31.10).

Note that the constraint that all Fred objects be allocated via new can be enforced using the named constructor idiom (see FAQ 16.08). In this case, it means making all Fred constructors private: and defining each named constructor as a public: static create() member function. The public: static create() member function would allocate a new Fred object and would return the resulting pointer as a FredPtr (not a Fred*). Users would then use FredPtr p = Fred::create() rather than FredPtr p = new Fred().
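The named constructor arrangement might be sketched like this. The int payload and the value() accessor are hypothetical, and FredPtr is a compact version of a reference-counting smart pointer written out so the sketch stands alone.

```cpp
class FredPtr;

class Fred {
public:
    static FredPtr create();            // named constructors: the only way
    static FredPtr create(int value);   // for users to obtain a Fred
    int value() const { return value_; }
private:
    friend class FredPtr;
    Fred() : value_(0), count_(0) { }               // constructors are private
    explicit Fred(int v) : value_(v), count_(0) { }
    Fred(const Fred&) = delete;
    Fred& operator=(const Fred&) = delete;
    int value_;         // hypothetical payload
    unsigned count_;    // number of FredPtr objects pointing at this Fred
};

class FredPtr {
public:
    FredPtr(Fred* p) : p_(p) { ++p_->count_; }      // p must not be null
    FredPtr(const FredPtr& o) : p_(o.p_) { ++p_->count_; }
    ~FredPtr() { if (--p_->count_ == 0) delete p_; }
    FredPtr& operator=(const FredPtr& o)
    {
        Fred* const old = p_;
        p_ = o.p_;
        ++p_->count_;
        if (--old->count_ == 0) delete old;
        return *this;
    }
    Fred* operator->() const { return p_; }
    Fred& operator*()  const { return *p_; }
private:
    Fred* p_;
};

// Defined after FredPtr is complete. The Fred* returned by new is
// converted to a FredPtr before it ever reaches the caller.
inline FredPtr Fred::create()          { return new Fred(); }
inline FredPtr Fred::create(int value) { return new Fred(value); }
```

With the constructors private, new Fred() and a local Fred variable both fail to compile in user code, so every Fred is guaranteed to be heap-allocated and reference counted.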

FAQ 31.10 Is reference counting with copy-on-write semantics hard to implement?

It's a bit involved, but it's manageable.

Copy-on-write semantics allows users to think they're copying Fred objects, but in reality the underlying implementation doesn't actually do any copying unless and until some user actually tries to modify the copied Fred object. This approach provides users with reference semantics; the previous FAQ used reference counting to provide users with pointer semantics.

Nested class Fred::Data houses all the data that would normally go into a Fred object. Fred::Data also has an extra data member, count_, to manage the reference counting. Class Fred ends up being a smart reference: internally it points to the Fred::Data, but externally it acts as if it has the Fred::Data data within itself.

[Code listing: class Fred with nested class Fred::Data]
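The structure just described could be sketched as follows. This is an illustration, not the original listing: the two int data members, the sampleInspector() and sampleMutator() bodies, and the sharesDataWith() probe are assumptions, and the sketch is not thread safe.

```cpp
class Fred {
public:
    Fred() : data_(new Data()) { }
    Fred(int i, int j) : data_(new Data(i, j)) { }
    Fred(const Fred& f) : data_(f.data_) { ++data_->count_; }   // share, don't copy
    Fred& operator=(const Fred& f)
    {
        Data* const old = data_;
        data_ = f.data_;
        ++data_->count_;                    // increment first: safe for a = a
        if (--old->count_ == 0) delete old;
        return *this;
    }
    ~Fred() { if (--data_->count_ == 0) delete data_; }

    // Inspectors read the shared data directly; no copy is needed.
    int sampleInspector() const { return data_->i_ + data_->j_; }

    // Mutators make sure this Fred has a private copy before changing it.
    void sampleMutator(int i)
    {
        if (data_->count_ > 1) {
            Data* d = new Data(*data_);     // may throw; *this is still intact
            --data_->count_;
            data_ = d;
        }
        data_->i_ = i;
    }

    // Assumption: a probe so the sharing can be observed from outside.
    bool sharesDataWith(const Fred& f) const { return data_ == f.data_; }

private:
    struct Data {
        Data() : i_(0), j_(0), count_(1) { }
        Data(int i, int j) : i_(i), j_(j), count_(1) { }
        Data(const Data& d) : i_(d.i_), j_(d.j_), count_(1) { }  // fresh count
        int i_, j_;         // everything a Fred would normally hold
        unsigned count_;    // number of Fred objects sharing this Data
    };
    Data* data_;
};
```

Copying a Fred only bumps a counter; the Data is duplicated the first time one of the sharers calls a mutator.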

If it is fairly common to call Fred's default constructor, those new calls can be avoided by sharing a common Fred::Data object for all Freds that are constructed via Fred::Fred(). To avoid static initialization order problems (see FAQ 2.10), this shared Fred::Data object is created “on first use” inside a function. Here are the changes that need to be made (note that the shared Fred::Data object's destructor is never invoked; if that is a problem, either hope that there are no static initialization order problems, drop back to the approach described above, or use the nifty counter idiom (see FAQ 16.17)).

[Code listing: changes for a shared default Fred::Data object]
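A sketch of the "on first use" variation, under the same assumptions as before (a single hypothetical int member, a sharesDataWith() probe, no thread-safety claims beyond what the language gives function-local statics). The name defaultData() is also an assumption.

```cpp
class Fred {
private:
    struct Data {
        Data() : i_(0), count_(1) { }
        Data(const Data& d) : i_(d.i_), count_(1) { }
        int i_;
        unsigned count_;
    };

    // Created the first time it is needed and intentionally never deleted.
    // Its count_ starts at 1 on behalf of this function, so it can never
    // drop to zero no matter how many Fred objects come and go.
    static Data* defaultData()
    {
        static Data* const shared = new Data();
        return shared;
    }

    Data* data_;

public:
    Fred() : data_(defaultData()) { ++data_->count_; }      // no new here
    Fred(const Fred& f) : data_(f.data_) { ++data_->count_; }
    Fred& operator=(const Fred& f)
    {
        Data* const old = data_;
        data_ = f.data_;
        ++data_->count_;
        if (--old->count_ == 0) delete old;
        return *this;
    }
    ~Fred() { if (--data_->count_ == 0) delete data_; }

    int sampleInspector() const { return data_->i_; }
    void sampleMutator(int i)
    {
        if (data_->count_ > 1) {            // always true for the shared default
            Data* d = new Data(*data_);
            --data_->count_;
            data_ = d;
        }
        data_->i_ = i;
    }
    bool sharesDataWith(const Fred& f) const { return data_ == f.data_; }
};
```

Default-constructed Fred objects now cost one increment instead of one allocation, and the first mutator call on any of them makes a private copy as usual.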

The point of all this is that users can freely copy Fred objects, but the actual data isn't copied unless and until a copy is actually needed. This can help improve performance in some cases.

To provide reference counting for a hierarchy of classes, see FAQ 31.11.

FAQ 31.11 How can reference counting be implemented with copy-on-write semantics for a hierarchy of classes?

Through an extension of the technique for a single class.

The previous FAQ presented a reference-counting scheme that provided users with reference semantics but did so for a single class rather than for a hierarchy of classes. This FAQ extends the technique to allow for a hierarchy of classes. The basic difference is that Fred::Data is now the root of a hierarchy of classes, which probably means that it has some virtual functions. Note that class Fred itself still does not have any virtual functions.

The virtual constructor idiom (FAQ 21.07) is used to make copies of the Fred::Data objects. To select which derived class to create, the sample code uses the named constructor idiom (FAQ 16.08), but other techniques are possible (a switch statement in the constructor, for example). The sample code assumes two derived classes, Der1 and Der2. Member functions in the derived classes are unaware of the reference counting.

[Code listing: class Fred with Fred::Data, Fred::Der1, and Fred::Der2]
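A compact sketch of the hierarchy version follows; it is an illustration, not the original listing. The int members of Der1 and Der2, the bodies of the sampleXXX functions, the create1()/create2() names, and the sharesDataWith() probe are all assumptions.

```cpp
class Fred {
public:
    // Named constructors select which derived Data class to create.
    static Fred create1(int i)        { return Fred(new Der1(i)); }
    static Fred create2(int i, int j) { return Fred(new Der2(i, j)); }

    Fred(const Fred& f) : data_(f.data_) { ++data_->count_; }
    Fred& operator=(const Fred& f)
    {
        Data* const old = data_;
        data_ = f.data_;
        ++data_->count_;
        if (--old->count_ == 0) delete old;
        return *this;
    }
    ~Fred() { if (--data_->count_ == 0) delete data_; }

    int sampleInspector() const { return data_->sampleInspector(); }
    void sampleMutator(int i)
    {
        if (data_->count_ > 1) {
            Data* d = data_->clone();       // virtual constructor idiom
            --data_->count_;
            data_ = d;
        }
        data_->sampleMutator(i);
    }
    bool sharesDataWith(const Fred& f) const { return data_ == f.data_; }

private:
    class Data {                            // root of the hierarchy
    public:
        Data() : count_(1) { }
        Data(const Data&) : count_(1) { }   // a clone starts with a fresh count
        virtual ~Data() { }
        virtual Data* clone() const = 0;
        virtual int  sampleInspector() const = 0;
        virtual void sampleMutator(int i) = 0;
        unsigned count_;
    };
    class Der1 : public Data {              // knows nothing about counting
    public:
        explicit Der1(int i) : i_(i) { }
        Data* clone() const override { return new Der1(*this); }
        int  sampleInspector() const override { return i_; }
        void sampleMutator(int i) override { i_ = i; }
    private:
        int i_;
    };
    class Der2 : public Data {
    public:
        Der2(int i, int j) : i_(i), j_(j) { }
        Data* clone() const override { return new Der2(*this); }
        int  sampleInspector() const override { return i_ + j_; }
        void sampleMutator(int i) override { i_ = i; }
    private:
        int i_, j_;
    };

    explicit Fred(Data* d) : data_(d) { }   // takes over a count of 1
    Data* data_;
};
```

Class Fred itself has no virtual functions; it forwards to the Data hierarchy and calls clone() only when a shared Data object is about to be modified.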

Naturally the constructors and sampleXXX member functions for Fred::Der1 and Fred::Der2 should be implemented in whatever way is appropriate. The point is that users can copy Fred objects (pass them by value, assign them, and so on) even though they really represent a hierarchy of objects, yet the underlying data isn't actually copied unless and until a copy object is changed—that is, unless and until the copy is necessary to maintain the desired observable semantics. This can improve performance in some situations.
