Chapter 32. Wild Pointers and Other Devilish Errors

FAQ 32.01 What is a wild pointer?

A wild pointer is a pointer that refers to garbage.

There are three ways to get a wild pointer.

  1. An uninitialized pointer that contains garbage bits
  2. A pointer that gets inadvertently scribbled on (for example, by another wild pointer; this is the domino effect)
  3. A pointer that refers to something that is no longer there (a dangling reference)

In C, the classic example of a dangling reference (3) occurs when a function returns a pointer to a local variable or when someone uses a pointer that has already been passed to free. Both situations can occur in C++, too.
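The second situation above (using a pointer that has already been passed to free) can be sketched as follows. This is a minimal illustration rather than code from the original text; the function name releaseAndReset is hypothetical, and the offending statement is left as a comment so that the sketch itself stays well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocates a buffer, frees it, and marks the spot
// where the dangling pointer appears. Returns true if the pointer ends up NULL.
bool releaseAndReset()
{
    char* buf = static_cast<char*>(std::malloc(16));
    if (buf == NULL)
        return false;              // allocation failed; nothing to demonstrate

    std::strcpy(buf, "fred");      // fine: buf refers to live storage
    std::free(buf);                // buf is now a dangling (wild) pointer

    // buf[0] = 'F';               // WRONG: writes through a dangling pointer

    buf = NULL;                    // a NULL pointer can at least be tested for
    return buf == NULL;
}
```

Resetting the pointer does not repair other copies of the same address; it only makes this one pointer detectable.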

Wild pointers are bad news no matter how they are created. Bad enough that we devote this entire chapter to the subject.

FAQ 32.02 What happens to a program that has even one wild pointer?

A wild pointer is to software what a car bomb is to a busy street: both cause indiscriminate pain and suffering.

After a program spawns its first wild pointer, an awesome chain reaction begins. The first wild pointer scribbles on a random memory location, which probably corrupts the object at that location, creating other wild pointers. Eventually—almost mercifully—one of these wild pointers attempts to scribble on something protected by the operating system or the hardware, and the program crashes.

By the time that happens, finding the root cause of the error with a debugger is nearly hopeless; what was once a cohesive system of objects is now a pile of rubble. The system has literally blown itself to bits.

Wild pointers create unstable systems. Arbitrarily small changes, such as inserting an extra semicolon, running the program on a different day of the week, or changing the way you smile as you press the Enter key, can cause arbitrarily large changes in how the system behaves (or misbehaves). Sometimes the program deletes user files, sometimes it just gives the wrong answer, sometimes it actually works!

Wild pointers are a problem worth avoiding.

FAQ 32.03 What does the compiler mean by the warning “Returning a reference to a local object”?

It means “Pay attention to me or you'll regret it!”

A local (auto) object is an object local to a routine (and it is usually allocated on the stack). Never return a reference or a pointer to a local (auto) object. As soon as the function returns, the local object is destructed, and the reference or pointer refers to garbage. A program working with garbage eventually gets very, very sick.

Note that returning a copy of a local object (returning “by value”) is fine.
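The difference between the two forms can be sketched as follows. The function names badName and goodName are hypothetical, and the wrong version is kept as a comment so that the sketch compiles cleanly and never touches garbage.

```cpp
#include <cassert>
#include <string>

// WRONG (kept as a comment): the local object s is destructed when the
// function returns, so the caller would receive a reference to garbage.
//
//   const std::string& badName() { std::string s = "fred"; return s; }

// Fine: the caller gets its own string object, returned by value.
std::string goodName()
{
    std::string s = "fred";
    return s;
}

// Usage: the returned object is independent of the destructed local.
void useGoodName()
{
    std::string name = goodName();
    assert(name == "fred");
}
```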

FAQ 32.04 How should pointers across block boundaries be controlled?

Avoid storing the address of a local object created in an inner scope in a pointer in an outer scope. Here's an example.

[Code listing not available in this copy.]
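A minimal sketch of the situation, assuming a trivial class Fred with a hypothetical value() member; the names a and p follow the discussion in this FAQ, and the function name sample is an assumption. The dangerous dereference is left as a comment.

```cpp
#include <cassert>
#include <cstddef>

class Fred {
public:
    int value() const { return 42; }
};

int sample()
{
    Fred* p = NULL;          // pointer in the outer scope
    int seen = 0;
    {
        Fred a;              // object in the inner scope
        p = &a;              // outer pointer now refers to the inner object
        seen = p->value();   // fine: a is still alive here
    }                        // a is destroyed; p now points at garbage

    // p->value();           // WRONG: p is a wild pointer at this point
    return seen;
}
```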

When control flow leaves the inner block, a will be destroyed and p will be pointing at garbage. Because control can leave the inner scope a number of different ways (including an uncaught exception), setting the outer scope's pointer to point to an inner scope's object can lead to subtle errors and should be avoided on principle.

If the address of an inner scope's object has to be stored in an outer scope's pointer, then the outer scope's pointer should be changed to NULL (or some other safe value) before leaving the inner scope. Generally speaking, you should guarantee that the pointer is set to NULL by creating a pointer-like class whose destructor sets the pointer to NULL, then replace the Fred* local variable with a local object of that class.
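One possible reading of this advice is a small guard object that binds the outer pointer for the duration of the inner scope and resets it on the way out. The class name ScopedBinding and its interface are assumptions made for this sketch, not something the text prescribes.

```cpp
#include <cassert>
#include <cstddef>

class Fred {
public:
    int value() const { return 42; }
};

// Hypothetical guard: points the outer pointer at an inner object while the
// guard lives, and sets the pointer back to NULL when the guard is destroyed.
class ScopedBinding {
public:
    ScopedBinding(Fred*& outer, Fred& target) : outer_(outer) { outer_ = &target; }
    ~ScopedBinding() { outer_ = NULL; }
private:
    Fred*& outer_;
    ScopedBinding(const ScopedBinding&);             // declared, not defined:
    ScopedBinding& operator=(const ScopedBinding&);  // copying is not allowed
};

bool pointerIsNullAfterBlock()
{
    Fred* p = NULL;
    {
        Fred a;
        ScopedBinding bind(p, a);               // p refers to a from here on
        assert(p != NULL && p->value() == 42);
    }   // bind is destroyed first (resetting p), then a
    return p == NULL;
}
```

Because the reset happens in a destructor, it also runs when the inner scope is left by an exception or an early return.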

Note that the problem addressed by this FAQ can occur only with pointers, not with references. This is because a reference is permanently bound to its referent at the moment it is initialized. This is yet another reason to prefer references to pointers (see FAQ 11.09).

FAQ 32.05 Is the reference-versus-pointer issue influenced by whether or not the object is allocated from the heap?

No, there is very little relationship between these issues.

Occasionally, the claim is made that if an object is allocated via new then it should be passed via pointer; otherwise it should be passed by reference. This is not correct. There are two separate questions: when to delete the object and how to pass it.

First consider the issue of deleting the object. If an object is allocated from the heap (e.g., p = new Fred();), then some routine has to be responsible for deleting it (e.g., delete p;), and the routine must have a pointer (e.g., p) to it. There are three common situations.

  1. The routine responsible for deleting the object is the same routine that created it, in which case a local auto_ptr is the easy solution: e.g., auto_ptr<Fred> p(new Fred());
  2. The routine responsible for deleting the object is the destructor of the same object that created the object. In this case put an auto_ptr in the this object and define a copy constructor and assignment operator that allocate a copy of the object from the heap (see FAQ 30.12).
  3. There is no clear responsibility for the delete, but the newed object should be deleted when there are no pointers to it. In this case, use reference counting and avoid passing raw pointers to the object. (See FAQ 31.09.)
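Situations 1 and 3 can be sketched with the standard smart pointers. Note the substitution: auto_ptr was deprecated in C++11 and removed in C++17, so this sketch assumes a C++11 or later compiler and uses std::unique_ptr and std::shared_ptr in its place. The class Fred and the function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

class Fred {
public:
    int value() const { return 42; }
};

// Situation 1: the creator also deletes. The unique_ptr deletes the Fred
// automatically when p goes out of scope.
int createAndUse()
{
    std::unique_ptr<Fred> p(new Fred());
    return p->value();
}

// Situation 3: no single owner. The Fred is deleted when the last
// shared_ptr referring to it goes away (reference counting).
long sharedOwners()
{
    std::shared_ptr<Fred> first(new Fred());
    std::shared_ptr<Fred> second = first;   // both refer to the same Fred
    return first.use_count();               // two owners at this point
}
```

Situation 2 is omitted here because it also needs the copy constructor and assignment operator discussed in the FAQ it cites.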

Now consider how the object should be passed. Assume that the routine f() takes a Fred object. Which is better, f(Fred* p) or f(Fred& r)? The key criterion is this: Does f() want to handle the case when it gets passed a nonobject (that is, the NULL pointer)? If it does, then the pointer form is indicated because it can use NULL to indicate the nonobject case. If f() always needs an actual Fred object, then the best way to signal this is to use a reference, which guarantees that it can't be passed a NULL since a reference can't legally be NULL.

Notice that the issues of deletion and passing are almost completely independent. Obviously, if reference counting is used to handle the deletion problem, then pointer-like objects are typical. But otherwise the questions aren't related. References can be used even if the object was allocated off the heap, and pointers can be used even if the object was not allocated from the heap, since it is always possible to have a pointer to a local or global object (so long as the object outlives the pointer to it).
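The independence of the two questions can be sketched as follows: a pointer to a local object, a reference to a heap object, and the nonobject case. The class Fred, its constructor, and the function names valueOrDefault and valueOf are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>

class Fred {
public:
    explicit Fred(int v) : v_(v) { }
    int value() const { return v_; }
private:
    int v_;
};

// Pointer form: the function is prepared to receive "no object".
int valueOrDefault(const Fred* p, int fallback)
{
    return p == NULL ? fallback : p->value();
}

// Reference form: the function always needs an actual Fred.
int valueOf(const Fred& r)
{
    return r.value();
}

int sample()
{
    Fred local(1);                      // not allocated from the heap
    Fred* heap = new Fred(2);           // allocated from the heap

    int sum = 0;
    sum += valueOrDefault(&local, 0);   // pointer to a local object
    sum += valueOf(*heap);              // reference to a heap object
    sum += valueOrDefault(NULL, 10);    // the nonobject case

    delete heap;                        // deletion is a separate concern
    return sum;
}
```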

FAQ 32.06 When should C-style pointer casts be used?

Rarely, probably only when interfacing with other languages. Any casting that must be done should use the C++ facilities for type-safe casting.

C-style pointer casts are the goto of OO programming. A goto complicates the control flow, making it difficult to statically reason about the flow of control. To determine the code's behavior, the dynamic flow of control has to be simulated. A pointer cast complicates the type flow, making it difficult to statically reason about the type of an object. To determine the code's behavior, the dynamic flow of types must be simulated. Use a C-style pointer cast as often as you would use a goto.

C-style pointer casts are also error prone. The basic problem is that the compiler meekly accepts C-style pointer casts without using runtime checks to see if they are correct. This can create wild pointers. Shudder.
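By contrast, a checked cast reports a mismatch instead of silently accepting it. The sketch below assumes a small hypothetical hierarchy (Shape, Circle, Square) and uses dynamic_cast, which yields a NULL pointer when the object is not of the requested type.

```cpp
#include <cassert>
#include <cstddef>

class Shape  { public: virtual ~Shape() { } };
class Circle : public Shape { public: int radius() const { return 5; } };
class Square : public Shape { public: int side() const { return 3; } };

// Returns the radius if s really refers to a Circle, otherwise -1.
int radiusOf(Shape* s)
{
    // A C-style cast, (Circle*)s, would compile and be accepted even when
    // s points at a Square, leaving c pointing at something that is not a Circle.
    Circle* c = dynamic_cast<Circle*>(s);   // checked at run time
    return c == NULL ? -1 : c->radius();
}
```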

Developers with a background in untyped (a.k.a. dynamically typed) languages tend to produce designs whose implementations employ an excessive number of pointer casts. These old habits must be terminated without prejudice. The lowest levels of memory management are among the few places where pointer casts are necessary.

Reference casts are just like pointer casts and are equally dangerous.

FAQ 32.07 Is it safe to bind a reference variable to a temporary object?

Yes, as long as that reference isn't copied into another reference or pointer.

In the following example, an unnamed temporary string object is created at line 1. A (const) reference (main()'s x) is bound to this temporary. The language guarantees that the unnamed temporary will live until the reference x dies, which in this case is at the end of main(). Therefore, line 2 is safe: the compiler isn't allowed to destruct the unnamed temporary string object until line 3.

[Code listing not available in this copy.]
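A minimal sketch of the situation described above, with the three lines marked in comments. The helper createTemp is hypothetical, and the function sample stands in for main() so that the sketch can be called from other code.

```cpp
#include <cassert>
#include <string>

std::string createTemp()          // hypothetical helper returning a temporary
{
    return "fred";
}

std::string::size_type sample()   // stands in for main()
{
    const std::string& x = createTemp();   // line 1: x is bound to the unnamed temporary
    std::string::size_type n = x.size();   // line 2: safe, the temporary is still alive
    return n;
}                                          // line 3: the temporary is destructed here
```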

There is a caveat: do not copy reference x into a pointer or reference that outlives the scope in which the temporary was created. For a subtle example of this, see the next FAQ.

FAQ 32.08 Should a parameter passed by const reference be returned by const reference?

No; it might create a dangling reference, which could destroy the world.

Returning an object by reference is not dangerous in and of itself, provided that the lifetime of the referent exceeds the lifetime of the returned reference. This cannot be guaranteed when a const reference parameter is returned by const reference, because the original argument might have been an unnamed temporary.

In the following example, an unnamed temporary string object is created at line 1. Parameter x from function unsafe() is bound to this temporary, but that is not an explicit, local reference in the scope of main(), so the temporary's lifetime is governed by the usual rules—the temporary dies at the ; of line 1. Unfortunately, function unsafe() returns the reference x, which means main()'s y ends up referring to the temporary, even though the temporary is now dead. This means that line 2 is unsafe: it uses y, which refers to an object that has already been destructed—a dangling reference.

[Code listing not available in this copy.]
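A minimal sketch of the problem and one way around it. The function unsafe() follows the description above; createTemp, safe, and sample are hypothetical names, and sample stands in for main(). The two dangerous lines are kept as comments so that the sketch never uses a dangling reference.

```cpp
#include <cassert>
#include <string>

std::string createTemp() { return "fred"; }   // hypothetical helper

// Returns its const reference parameter by const reference.
const std::string& unsafe(const std::string& x) { return x; }

// Returns a copy instead, so the caller never receives the parameter itself.
std::string safe(const std::string& x) { return x; }

std::string sample()                          // stands in for main()
{
    // const std::string& y = unsafe(createTemp());   // line 1: temporary dies at the ;
    // std::string z = y;                             // line 2: WRONG, y is dangling

    const std::string& y = safe(createTemp());  // y is bound to the returned copy
    return y;                                   // fine: that copy lives as long as y
}
```

Calling unsafe() with a named object that outlives the result is still fine; the trouble arises only when the argument is an unnamed temporary.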

Note that if a function accepts a parameter by non-const reference (for example, f(string& s)), returning that parameter by reference cannot dangle in this way, because a temporary cannot be bound to a non-const reference.

FAQ 32.09 Should template functions for things like min(x,y) or abs(x) return a const reference?

No!

When the following example is compiled and the symbol UNSAFE is defined, min(x,y) avoids an extra copy operation by returning a const reference parameter by const reference. As discussed in the previous FAQ, this can create a dangling reference, which can destroy the world.

[Code listing not available in this copy.]
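A minimal sketch of such a template, switched by the symbol UNSAFE as described above. The namespace sketch and the function sample are assumptions; the namespace is there only to keep this min from clashing with std::min.

```cpp
#include <cassert>
#include <string>

namespace sketch {   // hypothetical namespace, avoids clashing with std::min

#ifdef UNSAFE
// Saves a copy but may hand back a reference to a dead temporary.
template<class T>
inline const T& min(const T& x, const T& y) { return x < y ? x : y; }
#else
// Returns by value: one extra copy, no dangling reference.
template<class T>
inline T min(const T& x, const T& y) { return x < y ? x : y; }
#endif

}

std::string sample()
{
    // With UNSAFE defined, r would dangle after this statement because both
    // arguments are unnamed temporaries. Without it, r is bound to the
    // returned copy, which lives as long as r does.
    const std::string& r = sketch::min(std::string("fred"), std::string("barney"));
    return r;
}
```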

Returning a const reference to a const reference parameter is normally done as an optimization to avoid an extra copy operation. If you're willing to sacrifice correctness, you can make your software very fast!

FAQ 32.10 When is zero not necessarily zero?

When dealing with pointers.

When the token 0 appears in the source code at a place where a pointer should be, the compiler interprets the token 0 as the NULL pointer. However, the bit pattern for the NULL pointer is not guaranteed to be all zeros. More specifically, setting a pointer to NULL may set some of the bits of that pointer to 1.

Depending on the hardware, the operating system, or the compiler, a pointer whose bits are all zeros may not be the same as the NULL pointer. For example, using memset() to set all the bits of a pointer to zero may not make that pointer equal to NULL.

In the following program, all conforming C++ compilers produce code that prints 0 is NULL, then NULL is NULL, but some may produce code that prints memsetPtr is not NULL.

[Code listing not available in this copy.]
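A minimal sketch of the two cases, written as functions rather than as a printing program. The function names are hypothetical; the variable name memsetPtr follows the output quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The token 0 in a pointer context is the NULL pointer on every
// conforming compiler, whatever bits that pointer happens to contain.
bool zeroIsNull()
{
    char* zeroPtr = 0;
    return zeroPtr == NULL;
}

// All bits zero is not guaranteed to be the representation of the NULL
// pointer, so the result of this function depends on the platform.
bool memsetPtrIsNull()
{
    char* memsetPtr;
    std::memset(&memsetPtr, 0, sizeof memsetPtr);
    return memsetPtr == NULL;
}
```

On most current desktop platforms both functions return true, which is exactly why the assumption goes unnoticed until the code is ported.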

Unions are another common, and equally dangerous, way to generate a pointer whose bits are all zero. For example, the following is wrong on two levels. First, it accesses the char* member of the union even though it was the unsigned long member that was set most recently. Second, it assumes that a pointer whose bits are all zero is the same as a NULL pointer; the output may be unionPtr is not NULL on some machines.

[Code listing not available in this copy.]
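A minimal sketch of the mistake, next to the correct way to obtain a NULL pointer. The union name Bits and the function names are hypothetical; unionPtr follows the output quoted above. The wrong function is defined only for illustration and should not be relied on.

```cpp
#include <cassert>
#include <cstddef>

union Bits {                 // hypothetical union, named for this sketch only
    unsigned long n;
    char*         unionPtr;
};

// WRONG on both counts described above; shown for illustration only.
bool unionPtrIsNull()
{
    Bits b;
    b.n = 0;                       // sets the unsigned long member...
    return b.unionPtr == NULL;     // ...then reads the char* member
}

// Right: assign the NULL pointer itself and let the compiler pick the bits.
bool assignedPtrIsNull()
{
    char* p = NULL;
    return p == NULL;
}
```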
