Chapter 28. Containers

FAQ 28.01 What are container classes and what are the most common mistakes made with container classes?

image

Containers are objects that hold other objects (see FAQ 2.15). For example, a linked list of string objects is a container—users can add numerous elements (or element objects) to the list, and in this case all the elements are of type string. The closest that the core C++ language comes to container classes is the array—but see FAQ 28.02 to find out why C++ arrays should not be used.

Unfortunately programmers' instincts tend to lead them in the wrong direction with containers. This may be because of past experience with other programming languages or it may be for some other reason, but whatever the reason the result is the same: programmers tend to make common and serious mistakes with containers that lead to bugs. Here are a few common mistakes.

• Using arrays rather than safe container classes (the “Grandma used arrays” fiasco; see FAQ 28.02)

• Rolling your own container classes from scratch (the “not invented here” fiasco; see FAQ 28.03)

Containers of pointers (the “random object ownership” fiasco; see FAQ 28.04)

• Containers of char* (the “string contents vs. string address” fiasco; see FAQ 28.06)

• Containers of auto_ptr<T> (the “transfer of object ownership” fiasco; see FAQ 28.07)

• Inheriting everything from Object (the “based object” fiasco; see FAQ 28.08)

• Selecting incompatible vendors for container classes (the “best of breed” fiasco; see FAQ 28.10)

FAQ 28.02 Are arrays good or evil?

image

Arrays are evil. They have their place in some specialized applications, but they should be avoided.

The most common mistake with container classes is failing to use them and substituting arrays instead. Unfortunately, this is not only a fairly common mistake, it's also a very dangerous mistake. Arrays simply aren't safe. They're pointers in disguise, and pointers aren't safe. Yes, Grandma used pointers and survived, but given the relative size and complexity of today's software systems and given all the other things that need to be handled, raw pointers or arrays generally aren't worth the risk. Instead pointers and arrays should be wrapped inside a safe interface—a container class.

Avoid using arrays and pointers. Use container classes instead.

FAQ 28.03 Should application development organizations create their own container classes?

image

No. Too many application development organizations roll their own container classes from scratch. This is often motivated by the “not invented here” (NIH) syndrome, which is lethal to any kind of reuse.

The problems with application development organizations rolling their own container classes include the following.

Increased development costs: It's normally far cheaper to pay for a class library than to pay yourself to write your own.

Increased long-term costs: Building your own means increasing your maintenance costs, which is usually the last thing you want to do.

Degraded quality: Application developers are not usually experts in the intricacies of data structures, so they often do a mediocre job with container classes—reinventing the wheel is bad enough, but usually they reinvent a flat tire.

Lack of flexibility and versatility: It is hard for an in-house development team to compete with the well-funded software houses in developing software that is flexible enough to meet the needs of the whole industry. This flexibility may be very important later in the life cycle.

Loss of focus: Application developers should focus on getting the application done on time and within budget, not on plumbing.

Missed opportunities for standardization: It makes sense for the containers to be as standardized as the basic language types for both training and maintenance considerations.

FAQ 28.04 What are some common mistakes with containers of pointers?

image

Lifetime errors caused by ownership confusion.

Be careful about containers of pointers. A container of pointers contains pointers to the objects that are inserted into the container, whereas a container of values contains copies of the objects that are inserted into the container. The main purpose of a container of pointers is to hold on to the identities of certain objects, but the main purpose of a container of values is to hold on to the state of certain objects.

Containers of pointers allow objects to be inserted without copying, allow the contained objects to be distinguished by identity rather than merely by state or value, and allow objects of classes within an inheritance lattice to be inserted without slicing (a.k.a. chopped copies). However, containers of pointers can create a difficult relationship between the user of the container and the container. For example, if a pointer to an object is in some container and someone else deletes the object, the container ends up with a dangling reference (see FAQ 24.04). In some cases this means that an object needs to know all the containers that point to it, which sometimes requires each object to maintain a container of container pointers. Sometimes users can't even change an object without informing all the containers that point to the object. For example, if a user changes an object's state and a container's order semantics depend on the object's state, the innocent change might subtly break the container's invariant. This becomes very messy and complex.

FAQ 28.05 Does this mean that containers of pointers should be avoided?

image

The rule of thumb is to use containers of values when you can, use containers of pointers when you must.

Containers of pointers must be used when the identity of the contained objects matters, in addition to their state. For example, containers of pointers must be used when the same instance must be “in” several containers at the same time.

Another reason containers of pointers are used is when the cost to copy an object into and out of a container is too large, but in this case judicious use of reference counting (see FAQ 31.09) and copy on write (see FAQ 31.10) can reduce the copying cost significantly.

FAQ 28.06 Surely good old-fashioned char* is an exception, right?

image

Wrong!

For example, if a container of char* is used when the goal is to have a container of strings, lookup requests may fail since the container will be looking for a particular pointer rather than for the contents of the string. To make matters worse, if a string buffer is used to build the string foo and that is inserted into the container and the string buffer is later changed to junk, the container will appear to contain junk rather than foo.

If a string is desired, a string class should be used and char* should be avoided. Every compiler vendor has a string class, and there is a standard C++ string class called string.

FAQ 28.07 Can auto_ptr<T> simplify ownership problems with containers of pointers?

image

No, auto_ptr<T> usually makes things worse!

Generally speaking, auto_ptr<T> should not be used inside container classes. For example, if a List of Fred pointers is desired, it is usually bad to use a List<auto_ptr<Fred> >. The reason is that copying an auto_ptr<T> makes changes to the original, in addition to the obvious changes to the copy. In particular, the ownership of the referent is transferred to the copy, so the original will be NULL. For example, if someone retrieved a copy of an element in the List, that would copy the auto_ptr<T>, which would transfer ownership of the referent to the copy and the auto_ptr<T> in the List would be NULL.

FAQ 28.08 Can a Java-like Object class simplify containers in C++?

image

No, that usually makes things worse!

Java is not C++. C++ is not Java. Just because something works well in Java doesn't mean it will work well in C++ or vice versa. Java has automatic garbage collection, C++ has destructors (FAQ 2.20); Java has a ubiquitous base class Object, C++ has templates (FAQ 2.15). For a lot of reasons, Java's container classes all take Object pointers, and they work well. It's possible to do the same thing in C++, but in C++ this technique tends to be more complex and expensive than using templates. For example, when Fred objects are inserted into a list of Object*, the only thing that is known about them is that they are some kind of Object. When these objects are accessed from the list, the programmer has to carefully find out what the objects really are and typically has to downcast the Object* before anything useful can be done with the objects. This is another form of dynamic type checking, and it can be expensive to write, test, and maintain (see FAQ 27.03).

Forcing things to inherit from a common base class such as Object is called the based object approach. See the next FAQ for more details on these heterogeneous containers.

FAQ 28.09 What's the difference between a homogeneous and a heterogeneous container?

The elements of a homogeneous container are all of the same type; the elements of a heterogeneous container can be of different types.

Containers come in many shades of gray. Generally speaking, the more heterogeneous the element types are, the less type safe the container is. For example, the ultimate heterogeneous container is a container of void* in which the various elements could point to objects of any type. Even though this seems to optimize flexibility, in practice such containers are nearly worthless. In particular, putting things into such a container is easy (any object of any type can be inserted), but when an element is removed, the user of the container knows nothing about the element's type.

A more useful form of heterogeneous container is one that requires its elements to point to objects derived from some specific base class, often an ABC. This base class typically has all or nearly all the member functions provided by the actual objects referenced by the container. For example, one might have a list of Shape* where the actual referents might be objects of class Square, Circle, Hexagon, and so on.

Note that the based object approach (see the previous FAQ) is another form of heterogeneous container.

FAQ 28.10 Is it a good idea to use a “best of breed” approach when selecting container classes?

image

Not usually.

For small projects, it's often desirable to select each component using a best of breed approach. That is, select the best of breed in category X, the best of breed in category Y, and so on.

But for large, mission-critical projects, it is very important to reduce the overall system complexity, and selecting the very best container classes could actually make things worse (see also FAQ 39.08). Most vendors' GUI frameworks are designed to work very well with the vendor's container classes. In some cases, this increases the integration costs of mixing container classes from one vendor and GUI classes from another. Database, network, and other infrastructure frameworks are similar: it is often difficult and/or risky to mix libraries and frameworks classes from different vendors.

When this mix-and-match problem occurs, the low-cost, low-risk approach is often to select the container classes that “go with” the rest of the infrastructure. Usually that means that the container classes are less than ideal, often frustrating programmers who don't see the big picture. But remember: the goal is to reduce the overall system complexity, and container classes are only one piece.

FAQ 28.11 Should all projects use C++'s standardized containers?

image

Not always, because of big picture issues. We hope that someday the software industry will fully embrace standardized containers, but until then, be prepared to make decisions that will be uncomfortable.

When compared to other container classes, C++'s standardized containers have several benefits.

• They are standardized and are therefore portable.

• They are fast—very, very fast.

However in large and/or mission-critical applications, integration issues often dominate. When this happens, it may be better in the overall scheme of things to use a vendor's proprietary container classes (see FAQ 28.10). Programmers who have not worked on large and/or mission-critical projects where integration issues became a problem do not understand this and usually argue in favor of the portability or performance of the standardized container classes.

For example, in some cases a particular vendor's library is considered essential to the project's success, and the library might be (shouldn't be, but in the real world it might be) tightly integrated with the container classes from the same vendor. In such a case, the cost and risk of integrating standard containers might not be worth the benefit. It's a dilemma of local versus global optimization.

Of course there are some cases where performance or portability is more important than integration. In these cases the standardized C++ container classes should be considered, even if integrating them with the other third-party libraries requires some invention.

FAQ 28.12 What are the C++ standardized container classes?

image

The C++ container classes are a part of what was formerly known as the standard template library (STL). They include the following container classes.

vector<Value> (see FAQ 28.13).

list<Value> (see FAQ 28.13).

deque<Value> (see FAQ 28.13).

set<Value> (see FAQ 28.14).

multiset<Value> (see FAQ 28.14).

map<Key,Value> (see FAQ 28.14).

multimap<Key,Value> (see FAQ 28.14).

FAQ 28.13 What are the best applications for the standardized C++ sequence container classes?

image

Operations that make sense for a linear sequence of objects.

The sequence container classes store their objects in a linear sequence. They include the following container classes.

vector<T>

list<T>

deque<T>

They all expand themselves to allow an arbitrary number of elements to be inserted, and all provide numerous operations. The biggest difference between them is related to performance. For example, vector<T> has very fast array-like access to random elements and very fast insertions and removals at the end; list<T> is not as fast to access a random element, but it is faster when inserting in the middle or at the beginning; deque<T> is like vector<T> except that it also has very fast insertion and removal of elements at the beginning.

Some of the operations that can be performed on a sequence follow. In the following descriptions, first and toofar represent iterators within a container, and “the range [first, toofar)” means all the elements starting at first up to but not including the element at toofar.

Selected nonmutating sequence operations:

for_each(first, toofar, function) applies the function to each element in the range [first, toofar).

find(first, toofar, const T& value) finds the first element that is equal to value in the range [first, toofar), returning the corresponding iterator (or toofar if nothing matched). find_if() is similar, but it takes a predicate to determine if the element matches.

adjacent_find(first, toofar) finds the first adjacent pair of elements that are equal in the range [first, toofar), returning the corresponding iterator (or toofar if nothing matched). An optional binary predicate can be given to determine if an adjacent pair matches.

count(first, toofar, const T& value, result) counts the number of elements in the range [first, toofar) that are equal to value and stores that number in the by-reference parameter result.count_if() is similar, but it takes a predicate to determine if the element matches.

mismatch(first, toofar, first2) finds the first mismatch between the two sequences. The elements from sequence 1 are in the range [first, toofar), and the corresponding elements from sequence 2 start at first2. An optional binary predicate can be used to determine if elements match.

equal(first, toofar, first2) checks two sequences for element-by-element equality. It returns true if and only if the elements of sequence 1 in the range [first, toofar) match the corresponding elements of sequence 2 starting at first2. An optional binary predicate can be used to determine if elements match.

search(first, toofar, first2, toofar2) finds a subsequence within a sequence. An optional binary predicate can be given to determine if an adjacent pair matches.

Selected mutating sequence operations:

copy(first, toofar, dest) copies elements from the range [first, toofar) into the sequence starting at dest. copy_backward() is similar, but it copies elements from right to left, which is especially useful when sliding a sequence a few slots to the right (that is, when the source and destination ranges overlap).

swap_ranges(first, toofar, first2) swaps the contents of the elements from range [first, toofar) with the corresponding elements from sequence 2 starting at first2.

swap(a, b) swaps the values of the elements or the entire sequences.

transform(first, toofar, dest, unaryOp) calls unaryOp() on each element in the range [first, toofar) and copies the results into sequence 2 starting at dest.

transform(first, toofar, first2, dest, binaryOp) calls binaryOp(), passing an element from sequence 1 in the range [first, toofar) and the corresponding element from sequence 2 starting at first2, and copies the results into sequence 3 starting at dest.

replace(first, toofar, const T& oldValue, const T& newValue) replaces all elements equal to oldValue in the range [first, toofar) with newValue. replace_if() is simliar, but it takes a predicate to determine if the element matches. replace_copy() and replace_copy_if() are similar except that the original sequence is unchanged, and the results are copied to a second sequence.

fill(first, toofar, const T& value) fills the range [first, toofar) with copies of value. fill_n() is similar, but it takes a starting iterator and a number.

generate(first, toofar, generator) fills the range [first, toofar) with the results of successively calling generator(). generate_n() is similar, but it takes a starting iterator and a number.

remove(first, toofar, const T& value) removes those elements from the range [first, toofar) that are equal to value. remove_if() is similar, but it takes a predicate to determine if the element matches. remove_copy() and remove_copy_if() are similar except that the original sequence is unchanged, and the results are copied to a second sequence.

unique(first, toofar) removes successive duplicates from the range [first, toofar). An optional binary predicate can be given to determine if an adjacent pair matches. unique_copy() is similar except that the original sequence is unchanged, and the results are copied to a second sequence.

reverse(first, toofar) reverses the elements of the range [first, toofar). reverse_copy() is similar except that the original sequence is unchanged, and the results are copied to a second sequence.

rotate(first, middle, toofar) rotates the range [first, toofar) around the iterator middle. rotate_copy() is similar except that the original sequence is unchanged, and the results are copied to a second sequence.

random_shuffle(first, toofar) randomly shuffles the elements in the range [first, toofar).

partition(first, toofar, unaryOp) partitions the range [first, toofar) by moving elements where unaryOp() returns true to the left, the others to the right. stable_partition() is similar, but it preserves the relative order of the elements within each group.

Selected sorting and related operations:

• All the operations in this section take an optional binary predicate used to compare two elements.

sort(first, toofar) sorts the elements in the range [first, toofar).

stable_sort(first, toofar) sorts the elements in the range [first, toofar) and never reverses two elements that compare as equal.

binary_search(first, toofar, const T& value) looks for value in the sorted sequence range [first, toofar) and returns true or false based on whether value was found.

includes(first, toofar, first2, toofar2) checks if the first multiset [first, toofar) is a subset of the second multiset [first2, toofar2).

set_union(first, toofar, first2, toofar2, dest) copies into dest the union of the first set [first, toofar) and the second set [first2, toofar2).

set_intersection(first, toofar, first2, toofar2, dest) copies into dest the intersection of the first set [first, toofar) and the second set [first2, toofar2).

set_difference(first, toofar, first2, toofar2, dest) copies into dest the difference between the first set [first, toofar) and the second set [first2, toofar2).

min_element(first, toofar) finds the minimum element within the range [first, toofar). max_element() is analogous.

FAQ 28.14 What are the best situations for the standardized C++ associative container classes?

image

Operations that make sense for keyed containers of objects.

The associative container classes store their objects so that the keys are in a sorted order based on a comparison object or function. They include the following container classes.

set<T>

multiset<T>

map<Key,Value>

multimap<Key,Value>

They all expand to allow an arbitrary number of elements to be inserted, and all provide numerous operations. set<T> and multiset<T> allow the insertion, lookup, and removal of keys, with multiset<T> allowing multiple copies of the same key value (some libraries use the term “bag” for this). map<Key,Value> and multimap<Key,Value> allow values to be inserted, looked up, and removed by their key, with multiset<T> allowing multiple values associated with the same key value.

Here are some selected ways to use the standard map<Key,Value> class.

image

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset