9Collections and Their Organization

We have introduced basic data types and user-defined types (array, structures, etc.) in Chapters 2 and 6. Basic data types are predefined by the C++ compiler system, and user-defined data types consist of elements of base types and user-defined types. We call the data collections. In many cases, system-defined operations aren’t enough and user-defined operations are required to deal with specific problems. A collection class can be defined by encapsulating data and operations with the object-oriented method.

There are two types of collections: linear collections and nonlinear collections. A node in a linear collection can be specified by its position. It is possible to determine the order of the elements in a linear collection. An array is a good example of a linear collection (as shown in Figure 9.1). In a nonlinear collection, a node cannot be specified by its position. For example, a tree is a nonlinear collection (as shown in Figure 9.2). This chapter will introduce several commonly used collections and their related operations.

Organizing data in a collection is of interest in the study of data structure, which is described briefly in this chapter. Two types of commonly used algorithms are introduced: sorting and searching.

Sorting is the process of ordering all elements in a collection. There are two basic operations in sorting algorithm: comparing two elements and adjusting the position of an element. In this chapter, we introduce three sorting methods: direct insertion sorting, direct selection sorting, and bubble sorting.

Searching is the process of locating a specific element in an array by following a certain procedure. Looking up a word in a dictionary is a good example of searching.

Fig. 9.1: A linear collection.
Fig. 9.2: A nonlinear collection.

In this chapter, we only introduce the two most basic searching algorithms: sequential searching and binary searching.

In this chapter, the concept of templates will be introduced, and this concept will be used to describe linear collections and the organization of collections in later chapters.

9.1Function Templates and Class Templates

Code reuse is the most important feature in C++. Code has to be written in the most generic form in order for it to be reusable. A generic code can adapt to various data types automatically. This type of programming is called parameterized programming. C++ provides templates as a tool to support parameterized programming. By declaring the object types as parameters, a code segment can be used for various object types.

9.1.1Function Template

Function overloading was introduced in Chapter 3. Function overloading can be used to perform similar actions on different data types. In many cases, the algorithm is the same but the operations are performed on different data types. Even with the help of function overloading, identical codes are often repeated many times. The following program is an example of such cases.

These two functions are identical except that the parameter types differ. In this case, a generic code can efficiently deal with both types of parameters. In C++, programmers only need to write a function template once, and then the compiler will generate corresponding functions for executing the algorithm on specific data types when the function template is called.

The definition of a function template is:

The definition of function templates begins with the keyword template and is followed by a type parameter list embraced by angular brackets. Each parameter (e.g., T in the above example) specifies a data type with a leading keyword, class or typename. The parameter can be referred to as the type of function formal parameter, the type of return value, or the type of local variable. The body of a function template is similar to an ordinary function. The following is an example that calculates absolute value:

The compiler infers the types of parameters in the template when it finds an invocation of abs(). For example, the compiler infers the type T as int when it tries to compile the expression abs(n), in which n is an int variable.

The compiler instantiates the template after determining the types of parameters in the template and generates a new function:

Similarly, the compiler infers T as double when it tries to compile abs(d), in which d is a double variable. Then it generates the following function based on the template:

Therefore, the main function actually calls the following function when the function abs(n) is executed

and the main function calls the function

when the function abs(d) is executed.

Example 9.1: An example of a function template.

The result from executing this program is as follows:

The function template declares the type parameter T an abstract type. When the compiler compiles the call outputArray, it replaces T as the type of the first parameter in the call, then it instantiates the template and compiles the result of instantiation.

The main program defines three types of array, aArray(an int array), bArray(a double array), and cArray(a char array), whose lengths are 8, 8, and 20 respectively. The main function invokes the function template outputArray to print out the elements in the array. Below are the functions generated during the compilation:

As shown above, function template and overloading are closely related. Functions generated from a specific template have the same function name, since the compiler invokes the corresponding function in the form of overloading. Note that there are many other ways to overload a function template.

9.1.2Class Template

Class template allows programmers to parameterize the class and set the type of the members of the class, the parameters in class methods, and the return value of the methods to any types (including both primitive and user-defined).

A class template is a higher level of abstraction. A class is an abstraction of common properties of objects, and a class template is an abstraction of common properties of classes; thus a class template is also called a parameterized class because it needs one or more type parameters.

The form of a class template is:

The declaration of template classes is nearly identical to that of ordinary classes except that the type parameter T is often used in members of template classes.

When defining class methods outside the class template, the following form should be used:

“TypeParameterList” is a list of type identifiers or constant expressions separated by commas, including:

  1. Class (or typename) Identifier, indicating the template is able to accept a type parameter
  2. TypeName Identifier, indicating the template is able to accept a constant with the type TypeName

Parameters should be separated by commas if there is more than one parameter in the list. Moreover, member functions of a template class have to be function templates.

The compiler does not generate codes immediately when it sees declarations of class templates, which only declare a series of classes. The compiler generates codes on demand when it figures out a template is referenced.

The form for using a template class is as follows:

Figure 9.3 demonstrates how to represent template classes in a UML graph, in which T is a formal parameter indicating the type information in the class.

The compiler generates concrete classes by binding actual types to formal type parameters, which is often called “instantiation”. The UML represents the dependency of instantiation by the <<bind>> operation. There are two forms of showing binding elements, as in Figure 9.4:

Each parameter in the ParameterList corresponds to the parameter in the class template declaration. The compiler generates a class based on the constants and the types of parameter, then instantiates the class to generate an instance. That is to say, the compiler instantiates the class template to generate a concrete class and instantiates the concrete class to produce an object.

Fig. 9.3: Parameterized class with explicit attributes and methods.
Fig. 9.4: Forms of binding elements in parameterized class.
Fig. 9.5: UML of template class Store and concrete classes.

Example 9.2: Example of class template.

In this example, the program declares a class template Store to implement a wrapper to store / load data. The template is instantiated as concrete classes based on the given parameters, which are instantiated as objects S1, S2, S3, and D. However, the instantiation of the class template (as shown in Figure 9.5) is hidden. We call these concrete classes intStore / doubleStore / StudentStore (which do not occur in the program) in order to represent them in the UML graph.

Running result:

9.2Linear Collection

9.2.1Definition of Linear Collection

The order and the position of an element are directly related in a linear collection. There are three access methods in linear collection: direct access, sequential access, and indexed access. In this section, we only introduce direct access and sequential access.

Programmers can directly access any element in a linear collection that supports the direct access method. For instance, programmers can access any element in an array through the subscripts. But programmers can only traverse the collection to access an element in a linear collection that only supports sequential accesses.

There are two special linear collections – stacks and queues.

To introduce stacks, let’s consider a pile of dishes in a restaurant: we can only grab the dish on the top, and only put a dish on the top of the pile. This is exactly what a stack does as shown in Figure 9.6. A stack is a linear collection that only can be accessed from one side. The accessible side is called the top of the stack, while the other side is called the bottom of the stack. Adding / removing elements to the top of the stack is called pushing and popping, respectively. As we can see, elements in stacks are “Last In, First Out” (LIFO).

Fig. 9.6: A typical stack.
Fig. 9.7: How a stack works in a function call.

The stack is a basic and widely used linear collection in computer programs. For example, the compilation system uses stacks to implement function calls.

Programmers only need to write a simple statement to call a function in C++ and the compiler does the rest. Figure 9.7 shows how function calls are implemented. Before a function call, the compiler generates codes to push the return address, calling contexts and actual parameters into the stack. The function pops necessary information out from the stack to initialize formal parameters. Return addresses and contexts are restored from the stack when the function returns.

The compiler also parses expressions in the programs using stacks. We can see expressions such as the following one in almost all programs: a/b+c*d.

Let’s examine how the compiler parses these expressions. The compiler establishes two stacks: an operand stack (ODS) and an operator stack (OPS) to parse them. An operand or an operator is represented by a token. Figure 9.8 shows the algorithm for parsing expressions. Figure 9.9 shows the process of parsing the expression a/b+c*d.

The queue is also a basic and widely used linear collection in computer programs. Elements are only added to one side (which is called the tail), and are only removed at the other side (which is called the head). Elements in the queue are served in a “First In, First Out” (FIFO) manner. Figure 9.10 shows the logical structure of a queue.

Fig. 9.8: Algorithm of parsing expressions.
Fig. 9.9: Using a stack to parse the expression a/b+c*d.
Fig. 9.10: A typical queue.

Queues are widely used in request scheduling, like processing client requests in the bank, or processing printing jobs in a computer system.

9.2.2Direct Accessible Linear Collection – Array

Static arrays were discussed in Chapter 6. The static array has a fixed number of elements, which can be accessed directly with subscripts. It is impossible to modify the size of a static array and there is no built-in mechanism to deal with the out-of-bound problem.

Now we propose a class template Array to deal with these problems. Array contains a number of consecutive elements of the same type, and its size can be changed on demand.

Example 9.3: Class Array.

The definition and implementation of template class Array is in the file below:

Array does an out-of-bound check before accessing an element, and its size can be changed during the runtime. Programmers can use Array as a normal static array, since Array overloads the subscript operator [] and dereference operator *. Programmers can replace static arrays with Array to perform bounds checking during the runtime.

There are three design issues in the Array template: 1) Why is the copy constructor so complicated, rather than just a simple assignment? 2) Why do some functions return a reference of an object, rather than the value of the object? 3) Why does Array overload the type-cast pointer? The following section addresses these problems.

1.Shallow Copy vs Deep Copy

This problem was discussed in Section 6.4. Shallow copy only copies the references of elements, and not the contents of elements in Array. Here is an example of shallow copy:

Consider the following main function:

Figure 9.11 shows the effects of shallow copy.

As the figure shows, the copy constructor does not copy the elements in the array, and instead, A and B share the same array. The program will generate a serious error during the execution of the destructors. Here is the destructor function in Array:

At the end of the program, the destructor will be invoked twice and free the same memory twice, once for A and once for B. This, of course, is not allowed and will certainly generate a runtime error.

The correct way of doing it is ‘deep copy’ as shown in the following scenario:

Fig. 9.11: Shallow copy.

Figure 9.12 shows the effects of deep copy. That is what we want to do.

2.Special Operators

In this example, the overloaded operator = and [] return the reference of the object instead of the value of it. Readers might wonder why it is done this way.

Notice that in a program, an arithmetic expression can be used in further computation or as the value of a variable (on the right side of an assignment operator), but it cannot be placed on the left of an assignment operator. C++ does not allow expressions like “a+b=c”.

Fig. 9.12: Deep copy.

Programmers often write statements like: “a[3] = 5”, in which the result of operator [] is placed on the left of the assignment operator. We call it a left value (lvalue) by convention.

As discussed above, the result of arithmetic operators should be a constant so that it cannot be used as a lvalue, but the situation differs in operator [] – it is desirable that we can write statements like “a[3]=5”. That is why the operator [] returns a reference rather than a constant value.

The operator = is just a convention for compatibility. As a convention, C++ requires that the result of an assignment operator can be used as an lvalue. The following statements are valid in C++:

The value of A is 6 after running these statements.

Moreover, the syntax of C++ requires operators = / [] / () / −> to be member functions of a class, and the assignment operator is not inheritable.

3.The Type-Cast Operator

Let’s consider the following program:

There is no problem in this program. Now consider replacing a with the Array class:

When the compiler detected the mismatch between the type of formal parameter and actual parameter, it will try to perform a type cast, i.e., converting a into a variable with type int*. However, we need to override the type-cast operator to help the compiler to cast a into an int* variable.

Some reader might find that the type-cast operator does not declare the type of return value. It is in fact a C++ syntax, which requires that the type of return value should not be declared (including void) for overloading a type-cast operator.

Example 9.4: An example of an array.

The problem is to find all prime numbers between 2 and N where N is provided by users during runtime.

Primes are integers greater than or equal to 2 that can only be divided by itself and 1. We have to use a dynamic array since N is provided at runtime. Here is the program:

Running result:

9.2.3Sequential Access Collection – Linked List

A linked list is another linear collection and it is used as an example of a sequential access collection. A linked list consists of a number of nodes, which are generated at runtime. There are two fields in each node: a payload and a pointer pointing to the next node. If a node has one field pointing to the next node, the list is called a single linked list. If a node has two fields connecting to other nodes, one pointing to its previous node and the other pointing to its next node, then the list is called a double linked list. Figure 9.13 is an example of a single linked list.

1.Node Class

The node, the basic component of a linked list, consists of a data member and a pointer pointing to the next node. It should also contain methods to initialize the node, to remove the node, or to add a new node into the list. Figure 9.14 and Figure 9.15 show the procedure of adding and removing nodes. Example 9.5 is the node template class.

Example 9.5: Node template class.

Source code:

Fig. 9.13: A typical single linked list.
Fig. 9.14: Insert a node after p.
Fig. 9.15: Delete a node after p.

2.Linked List Class Template

Many applications use linked lists. It is beneficial to abstract common linked list operations, like creating or removing nodes, and traversing through a linked list in an object-oriented manner.

a)Data Member of Linked List

A sequence of nodes linking sequentially makes a linked list. All operations of a linked list start from its head node. It is essential to have a pointer pointing to the head node in the linked list class, and a pointer for the last node is also valuable information to the program.

The linked list class needs to manage the nodes at runtime since it is a dynamic data structure.

The traversal operation requires two pointers, one pointing to the current visiting node, and the other pointing to the predecessor. Both inserting and deleting nodes use the pointer of predecessor.

Therefore, the linked list class needs to manage these data:

Pointer to the head node

Pointer to the last node

The number of nodes

Current position of traversal

b)Methods of a Linked List Class

The linked list class should provide these primitives:

Creatinganode

Inserting a node into the list

Removinganode

Access and modification of the data of a node

Traversal

All of them should be wrapped into the methods of the class. The linked list class also overloads the “=” operator to behave correctly in assignments.

The declaration of the linked list class is shown in Example 9.6. The implementation is left as an exercise.

Example 9.6: The declaration of the linked list class.

3.Application

Example 9.7: Application of a linked list.

This program is to perform the following operations: get 10 integers from the keyboard; create a linked list whose nodes hold these integers, and output the numbers in order. Then get another number from the keyboard, delete all nodes containing the number and output the data in the linked list, lastly clearing the linked list before exiting.

Running result:

9.2.4Stack

As mentioned in 9.2.1, a stack is a special linear collection with restricted access. It is possible to implement stacks in arrays or linked lists. However, the access pattern of a stack is a restricted one, so we need a different interface for the stack class.

In order to represent the states of a stack, the stack class should contain data elements and the pointer to the top of the stack. There are two types of implementation: one is based on arrays (either static or dynamic) and the other is based on linked lists.

There are three states of a stack: normal, empty and full. A stack is full when the number of elements reaches the number in the declaration. In the implementation based on dynamic arrays or linked lists, it is possible to set the number as unlimited.

A stack should provide interfaces of these operations:

Initialization

Push data into the stack

Pop data out of the stack

Clear the stack

Access the top of the stack

State detection

Figure 9.16 shows a stack implemented via arrays.

Fig. 9.16: Stack implemented via array.

Example 9.8: Implementation of stack.

The example shows how to implement a stack in arrays. It is also possible to implement a stack in a linked list, which is left as an exercise.

Example 9.9: Application of stacks – a simple calculator.

In this example, we implement a simple calculator that is capable of addition, subtraction, multiplication, and powers. The input expressions are in suffix form, with each operand and operator separated by a space. For example, the user should input “3 5 +” in order to calculate the value of “3+5”. The power operator is “^”. Each calculation is based on the previous result. “c” means clear the result and “q” means exit.

In 9.2.1, we introduced how to use stacks to evaluate an expression. A calculator needs two stacks, one for operands and one for operators to calculate complex expressions. However, the simple calculator can only use one stack to store the operands since it does not need to deal with the precedence of operators.

The main operation of the program is:

When the input is an operand, push it into the stack.

When the input is an operator, pop two operands out of the stack, calculate the result and push the result back to the stack

When the input is “c”, clear the stack

When the input is “q”, clean up and exit the program.

We implement a calculator class to model all operations stated above.

Running result:

9.2.5Queues

Similar to stacks, queues are also linear collections with a restricted access pattern. We can only add elements on one side (head), and only remove elements on the other side (tail). It is also possible to implement queues in both arrays and linked lists. The definition of the status of a queue is similar to the one for stacks.

A queue should provide a queue element, head pointer, and tail pointer as data members, and it also needs to provide the interfaces of these operations:

Initialization

Enqueue / dequeue

Clear the stack

Access the head of the queue

State detection

Figure 9.17 shows a trivial implementation in static arrays. We can see all elements must be moved towards the head of the queue when dequeuing an element, which is very inefficient.

A clever way to design a queue is to make it a circular one. The idea is shown in Figure 9.18. The queue maintains two pointers pointing to the head and the tail of the queue respectively. The queue changes these pointers when enqueuing and dequeuing elements. The important part is to use a modular operation to wrap the queue, ensuring the pointers are reset to zero when they cross the boundary. The queue also needs to count the elements. Example 9.10 is an implementation using this method.

Example 9.10: Implementation of queues.

Fig. 9.17: States of a queue.
Fig. 9.18: States of a circular queue.

Implementing queues via linked list is also left as an exercise.

9.3Organizing Data in Linear Collections

In this section, we discuss how to organize data in linear collections, introducing how to sort collections and find a particular element in a collection.

9.3.1Insertion Sort

The basic idea of insertion sort is to insert one unsorted element at a time into the correct position (in the sorted subcollection) until the collection is completely sorted. For example, if we need to sort an n-element array a, we can consider a[0] a sorted subcollection and a[1:n–1] as the subcollection to be sorted. The procedure is shown in Figure 9.19. There are some variants of the algorithm using different ways to find the position in which to insert the element. Here we only introduce the simplest one, the direct insertion sort.

Example 9.11: Implementation of direct insertion sort.

Fig. 9.19: Procedure of insertion sort.

9.3.2Selection Sort

The basic idea of the selection sort is to select the minimum element in the subcollection to be sorted and put it at the end of the sorted subcollection. There are also some variants of it based on different ways to find the minimum element. The simplest one is to scan through the collection, which is the so-called direct selection sort as shown in Figure 9.20.

Example 9.12: Implementation of direct selection sort.

Fig. 9.20: Procedure of direct selection sort.

9.3.3Exchange Sort

The idea of the exchange sort is to pair up elements in the collections and exchange them if they are disordered. Bubble sort is a good example.

The following is the algorithm of bubble sort for operating on an array with n elements:

  1. Check the first and the second element, and exchange them if they are not in the right order. Then do this similarly for the second and the third one, and so on until the last one is reached. The whole procedure is called the first stage of bubble sort. After the procedure, the maximum is at the n-th position.
  2. Repeat the first step with the first n−1 elements. After the procedure, the maximum of these n − 1 elements is at n − 1-th position.
  3. Repeat the second step until no exchange appears in a stage of bubbling sort. For an n-element collection, bubbling sort requires n − 1 stages at most.

Figure 9.21 shows the procedure of the bubble sort.

Example 9.13: Implementation of bubble sort.

Fig. 9.21: Procedure of bubble sort.

9.3.4Sequential Search

Sequential search is a basic search method. It scans through the array, and compares each element with the desired one until it finds the match. The search is unsuccessful if there is no element that matches the desired one.

Example 9.14: Sequential search.

9.3.5Binary Search

We can use binary search if we search a sorted collection. The idea of binary search is that a comparison splits the sequence into two parts of equal size and only one of them can contain the desired one. Figure 9.22 shows the procedure of a binary search.

Example 9.15: Binary search.

Fig. 9.22: Procedure of a binary search.

9.4Application – Improving the HR Management Program of a Small Company

We demonstrated how to use polymorphism in Chapter 8. Now we use the dynamic array to implement the same functionality.

Example 9.16: Improving the HR management program.

There are four files in the program: employee.h, employee.cpp in Chapter 8, 9_3.h in this chapter, and a modified 9_16.cpp derived from 8_8.cpp, with slight modifications.

Running result:

9.5Summary

This chapter introduces the concept of templates, and several commonly used linear collections: arrays, linked lists, stacks, and queues. Methods for sorting and searching a linear collection are also discussed.

The template is a powerful tool for describing parameterized polymorphism. Templates allow programmers to write features (like searching), without considering the underlying implementation (like, searching an array or a linked list).

The access order of an element and the position of it are directly related in a linear collection. There are three ways to access a linear collection: direct access, sequential access, and indexed access. In this chapter, we introduce the linear collections that support direct access and sequential access.

The basic idea of insertion sort is to repeatedly insert an unsorted element into the correct position in the sorted subcollection until the complete collection is sorted.

The basic idea of selection sort is to select the minimum element in the unsorted subcollection and put it to the end of the sorted subcollection.

The idea of exchange sort is to pair up elements in the collections and exchange them if they are disordered. Bubble sort is a good example.

The idea of sequential search is to scan through the array and compare each element with the desired one until finds the matched one. The search is unsuccessful if there are no element matches with the desired one.

The idea of binary search is that a comparison can split the sequence into two parts of equal size and only one of them can possibly contain the desired one.

There are many algorithms for sorting and searching linear collections. Readers can refer to books of data structure for advanced topics in this area.

Exercises

9.1 There are N students in a class. Write a program to calculate the average score of course A of the class. Please use the Array template to define a float type array to store the scores.

9.2 What elements at least should be in the Node class in a linked list? What are the differences between a single-linked list and a double-linked list?

9.3 What is the maximum number of elements in a linked list?

9.4 What are the differences of the Node class used in a double-linked list and single-linked list? Try to declare the Node class DNODE used in a double-linked list.

9.5 Declare two linked lists A and B with type int using the template provided in this chapter. Insert five elements respectively and then append all elements in B to the tail of A.

9.6 Derive an ordered linked list class OrderList from the class LinkedList, and add a new method InsertOrder to insert elements in order. Declare two ordered linked lists A and B with type int using the template provided in this chapter. Insert five elements in order respectively, and then add all elements to B, keeping A ordered.

9.7 What is a stack? What is the access pattern of a stack?

9.8 What is a queue? What is the access pattern of a stack?

9.9 Describe the idea of insertion sort.

9.10 Initialize an int-type array

Sort it with the direct insertion sort algorithm. Modify the implementation slightly to show the whole array after each insertion, and observe the changes of the data.

9.11 Describe the idea of selection sort.

9.12 Initialize an int-type array

Sort it with the direct selection sort algorithm. Modify the implementation slightly to show the whole array after each insertion, and observe the changes of the data.

9.13 Describe the idea of bubble sort.

9.14 Initialize an int-type array

Sort it with the bubbling sort algorithm. Modify the implementation slightly to show the whole array after each insertion, and observe the changes of the data.

9.15 All sorting algorithms described in this chapter are sorting in ascending order. Modify the bubbling sort algorithm slightly to perform a descending sort. Initialize an int-type array

Sort it with the descending bubble sort algorithm. Modify the implementation slightly to show the whole array after each insertion, observe the changes of the data.

9.16 Describe the idea of sequential search.

9.17 Initialize an int-type array

Let the user input a number and search it in the array, using the sequential search algorithm.

9.18 Describe the idea of binary search.

9.19 Initialize an int-type array

Let the user input a number and search it in the array, using the binary search algorithm.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset