Sorting data (i.e., placing the data in some particular order, such as ascending or descending) is one of the most important computing applications. A bank sorts all checks by account number so that it can prepare individual bank statements at the end of each month. Telephone companies sort their lists of accounts by last name and, further, by first name to make it easy to find phone numbers. Virtually every organization must sort some data—often, massive amounts of it. Sorting data is an intriguing, compute-intensive problem that has attracted substantial research efforts.
It’s important to understand about sorting that the end result—the sorted array—will be the same no matter which (correct) algorithm you use to sort the array. The choice of algorithm affects only the runtime and memory use of the app. The rest of the chapter introduces three common sorting algorithms. The first two—selection sort and insertion sort—are simple to program, but inefficient. The last—merge sort—is much faster than selection sort and insertion sort but more difficult to program. We focus on sorting arrays of simple-type data, namely ints. It’s possible to sort arrays of objects as well—we discuss this in Chapter 21, Generic Collections; Functional Programming with LINQ/PLINQ.
In the text, we add output statements to help you visualize how each sort works. The web is loaded with visual animations that show these sorts in action. Simply search for the algorithm name and “animation.”
Selection sort is a simple, but inefficient, sorting algorithm. The first iteration of the algorithm selects the smallest element in the array and swaps it with the first element. The second iteration selects the second-smallest element (which is the smallest remaining element) and swaps it with the second element. The algorithm continues until the last iteration selects the second-largest element and, if necessary, swaps it with the second-to-last element, leaving the largest element in the last position. After the ith iteration, the smallest i elements of the array will be sorted in increasing order in the first i positions of the array.
For the following discussion, we highlight the smallest element in bold and italic, and the element with which it will be swapped in bold. Consider the array
34 56 4 10 77 51 93 30 5 52
A selection sort begins from index 0 and determines the smallest element (4). The app then swaps 4 with the element at index 0 (34), resulting in
4 56 34 10 77 51 93 30 5 52
The second iteration begins from index 1 and determines the smallest value of the remaining elements (5). The app then swaps 5 with the element at index 1 (56), resulting in
4 5 34 10 77 51 93 30 56 52
The third iteration begins from index 2 and determines the next smallest value (10). The app then swaps 10 with the element at index 2 (34), resulting in
4 5 10 34 77 51 93 30 56 52
The process continues until the array is fully sorted:
4 5 10 30 34 51 52 56 77 93
After the first iteration, the smallest element is in the first position. After the second iteration, the two smallest elements are in order in the first two positions. After the third iteration, the three smallest elements are in order in the first three positions.
In Fig. 18.4, Main declares as local variables a Random object named generator (line 10) to produce random int values and the int array data (line 11) to store randomly generated ints. Lines 14–17 generate random ints in the range 10–99 and store them in data. Line 20 outputs the array’s unsorted contents. Line 22 calls method SelectionSort to sort array data’s contents in ascending order, then lines 24–25 display the sorted array. Method SelectionSort calls method PrintPass (defined in lines 59–86) to help visualize the sorting algorithm. The output uses dashes to indicate the portion of the array that is sorted after each pass of the algorithm. An asterisk is placed next to the position of the element that was swapped with the smallest element on that pass. On each pass, the element next to the asterisk and the element above the rightmost set of dashes are the two values that were swapped.
SelectionSort
Lines 29–48 declare method SelectionSort. Lines 32–47 loop data.Length - 1 times. Each iteration of the outer loop first initializes smallest to the index of the current item. Lines 37–43 loop over the remaining elements in the array. For each of these elements, line 39 compares its value to the value of the smallest element. If the current element is smaller than the element at index smallest, line 41 assigns the current element’s index to smallest. When this loop finishes, smallest contains the index of the smallest element in the remaining array. Line 45 calls method Swap (lines 51–56) to place the smallest remaining element in the next spot in the array.
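Since the figure’s C# listing is not reproduced here, the following is a rough sketch of the selection sort just described, written in Java (the class, method, and variable names are ours, not the book’s, and the PrintPass visualization is omitted):

```java
import java.util.Arrays;

public class SelectionSortSketch {
    // Sort data in ascending order using selection sort.
    public static void selectionSort(int[] data) {
        // The outer loop runs data.length - 1 times; after pass i,
        // the smallest i + 1 elements occupy the first i + 1 positions.
        for (int i = 0; i < data.length - 1; i++) {
            int smallest = i; // index of the smallest remaining element
            // Scan the unsorted remainder for a smaller element.
            for (int j = i + 1; j < data.length; j++) {
                if (data[j] < data[smallest]) {
                    smallest = j;
                }
            }
            // Swap the smallest remaining element into position i.
            int temp = data[i];
            data[i] = data[smallest];
            data[smallest] = temp;
        }
    }

    public static void main(String[] args) {
        int[] data = {34, 56, 4, 10, 77, 51, 93, 30, 5, 52};
        selectionSort(data);
        System.out.println(Arrays.toString(data));
        // prints [4, 5, 10, 30, 34, 51, 52, 56, 77, 93]
    }
}
```

Running the sketch on the chapter’s example array reproduces the sorted result shown earlier.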
The selection sort algorithm runs in O(n²) time. Method SelectionSort, which implements the selection sort algorithm, contains nested for loops. The outer for loop (lines 32–47) iterates over the first n − 1 elements in the array, swapping the smallest remaining element into its sorted position. The inner for loop (lines 37–43) iterates over each element in the remaining array, searching for the smallest. This loop executes n − 1 times during the first iteration of the outer loop, n − 2 times during the second iteration, then n − 3, …, 3, 2, 1. This inner loop will iterate a total of n(n − 1)/2 or (n² − n)/2 times. In Big O notation, smaller terms drop out and constants are ignored, leaving a final Big O of O(n²).
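As a quick check of the n(n − 1)/2 count, this sketch (ours, not the book’s) instruments selection sort to tally inner-loop comparisons; for a 10-element array that is 9 + 8 + … + 1 = 45:

```java
public class SelectionSortComparisons {
    // Selection sort that returns the number of inner-loop comparisons made.
    public static long sortCountingComparisons(int[] data) {
        long comparisons = 0;
        for (int i = 0; i < data.length - 1; i++) {
            int smallest = i;
            for (int j = i + 1; j < data.length; j++) {
                comparisons++; // exactly one comparison per inner-loop iteration
                if (data[j] < data[smallest]) {
                    smallest = j;
                }
            }
            int temp = data[i];
            data[i] = data[smallest];
            data[smallest] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {34, 56, 4, 10, 77, 51, 93, 30, 5, 52};
        // For n = 10: 9 + 8 + ... + 1 = 10 * 9 / 2 = 45 comparisons
        System.out.println(sortCountingComparisons(data)); // prints 45
    }
}
```

Note that the count is the same regardless of the array’s initial order—selection sort always performs n(n − 1)/2 comparisons.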
Insertion sort is another simple, but inefficient, sorting algorithm. Its first iteration takes the second element in the array and, if it’s less than the first, swaps them. The second iteration looks at the third element and inserts it in the correct position with respect to the first two elements (moving them as necessary), so all three elements are in order. At the ith iteration of this algorithm, the first i elements in the original array will be sorted.
Consider as an example the following array:
34 56 4 10 77 51 93 30 5 52
An app that implements the insertion sort algorithm first looks at the first two elements of the array, 34 and 56. These are already in order, so the app continues (if they were out of order, it would swap them).
In the next iteration, the app looks at the third value, 4. This value is less than 56, so the app stores 4 in a temporary variable and moves 56 one element to the right. The app then checks and determines that 4 is less than 34, so it moves 34 one element to the right. The app has now reached the beginning of the array, so it places 4 in the zeroth position. The array now is
4 34 56 10 77 51 93 30 5 52
In the next iteration, the app stores the value 10 in a temporary variable. Then the app compares 10 to 56 and moves 56 one element to the right because it’s larger than 10. The app then compares 10 to 34, moving 34 one element to the right. When the app compares 10 to 4, it observes that 10 is larger than 4 and places 10 in element 1. The array now is
4 10 34 56 77 51 93 30 5 52
Using this algorithm, at the ith iteration, the original array’s first i elements are sorted, but they may not be in their final locations—smaller values may be located later in the array.
Figure 18.5 declares the InsertionSortTest class. Lines 29–51 declare method InsertionSort. Lines 32–50 loop through data.Length - 1 items in the array. In each iteration, line 35 stores in variable insert the value of the element that will be inserted into the sorted portion of the array. Line 38 declares and initializes variable moveItem, which keeps track of where to insert the element. Lines 41–46 loop to locate the correct position at which to insert the element. The loop terminates either when the app reaches the front of the array or when it reaches an element that is less than the value to be inserted. Line 44 moves an element to the right, and line 45 decrements the position at which to insert the next element. After the loop ends, line 48 inserts the element in place—it’s possible that no movements need to be made. Method InsertionSort calls method PrintPass (defined in lines 54–81) to help visualize the sorting algorithm. The output uses dashes to indicate the portion of the array that is sorted after each pass of the algorithm. An asterisk is placed next to the element that was inserted in place on that pass.
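The insert-and-shift logic just described can be sketched as follows, again in Java with our own names (the book’s C# listing also prints each pass, which we omit):

```java
import java.util.Arrays;

public class InsertionSortSketch {
    // Sort data in ascending order using insertion sort.
    public static void insertionSort(int[] data) {
        // Start from the second element; elements to its left are sorted.
        for (int next = 1; next < data.length; next++) {
            int insert = data[next]; // value to insert into the sorted portion
            int moveItem = next;     // position at which to insert the value
            // Shift larger sorted elements one position to the right.
            while (moveItem > 0 && data[moveItem - 1] > insert) {
                data[moveItem] = data[moveItem - 1];
                moveItem--;
            }
            data[moveItem] = insert; // may be a no-op if nothing moved
        }
    }

    public static void main(String[] args) {
        int[] data = {34, 56, 4, 10, 77, 51, 93, 30, 5, 52};
        insertionSort(data);
        System.out.println(Arrays.toString(data));
        // prints [4, 5, 10, 30, 34, 51, 52, 56, 77, 93]
    }
}
```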
The insertion sort algorithm also runs in O(n²) time. Like selection sort, the implementation of insertion sort (lines 29–51 of Fig. 18.5) contains nested loops. The for loop (lines 32–50) iterates data.Length - 1 times, inserting an element into the appropriate position among the elements sorted so far. For the purposes of this app, data.Length - 1 is equivalent to n − 1 (as data.Length is the size of the array). The while loop (lines 41–46) iterates over the elements that precede the one that will be inserted. In the worst case, this while loop requires n − 1 comparisons. Each individual loop runs in O(n) time. In Big O notation, nested loops mean that you must multiply the number of iterations of each loop. For each iteration of an outer loop, there will be a certain number of iterations of the inner loop. In this algorithm, for each of the O(n) iterations of the outer loop, there will be O(n) iterations of the inner loop. Multiplying these values results in a Big O of O(n²).
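To see the worst case concretely, this sketch (ours) feeds a reverse-sorted 10-element array to insertion sort and counts element moves; each pass must shift every previously sorted element, giving 1 + 2 + … + 9 = 45 moves, matching the n(n − 1)/2 analysis:

```java
public class InsertionSortWorstCase {
    // Insertion sort that returns the number of element moves (right shifts).
    public static long sortCountingMoves(int[] data) {
        long moves = 0;
        for (int next = 1; next < data.length; next++) {
            int insert = data[next];
            int moveItem = next;
            while (moveItem > 0 && data[moveItem - 1] > insert) {
                data[moveItem] = data[moveItem - 1];
                moveItem--;
                moves++; // every sorted element larger than insert moves right
            }
            data[moveItem] = insert;
        }
        return moves;
    }

    public static void main(String[] args) {
        int n = 10;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = n - i; // 10, 9, ..., 1: reverse sorted (worst case)
        }
        // Worst case for n = 10: 1 + 2 + ... + 9 = 45 moves
        System.out.println(sortCountingMoves(data)); // prints 45
    }
}
```

By contrast, an already-sorted array requires zero moves—unlike selection sort, insertion sort’s work depends heavily on the input’s initial order.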
Merge sort is an efficient sorting algorithm but is conceptually more complex than selection sort and insertion sort. The merge sort algorithm sorts an array by splitting it into two equal-sized subarrays, sorting each subarray and merging them in one larger array. With an odd number of elements, the algorithm creates the two subarrays such that one has one more element than the other.
The implementation of merge sort in this example is recursive. The base case is an array with one element. A one-element array is, of course, sorted, so merge sort immediately returns when it’s called with a one-element array. The recursion step splits an array in two approximately equal-length pieces, recursively sorts them and merges the two sorted arrays in one larger, sorted array.
Suppose the algorithm has already merged smaller arrays to create sorted arrays A:
4 10 34 56 77
and B:
5 30 51 52 93
Merge sort combines these two arrays in one larger, sorted array. The smallest element in A is 4 (located in element 0 of A). The smallest element in B is 5 (located in element 0 of B). To determine the smallest element in the merged array, the algorithm compares 4 and 5. The value from A is smaller, so 4 becomes the merged array’s first element. The algorithm continues by comparing 10 (located in element 1 in A) to 5 (located in element 0 in B). The value from B is smaller, so 5 becomes the second element in the larger array. The algorithm continues by comparing 10 to 30, with 10 becoming the third element in the array, and so on.
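The merge step just traced can be sketched as a standalone Java method (our own names; the book’s Merge instead works in place on index ranges of a single array):

```java
import java.util.Arrays;

public class MergeSketch {
    // Merge two sorted arrays into one larger sorted array.
    public static int[] merge(int[] a, int[] b) {
        int[] combined = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        // Repeatedly take the smaller front element until one array is exhausted.
        while (i < a.length && j < b.length) {
            if (a[i] <= b[j]) {
                combined[k++] = a[i++];
            } else {
                combined[k++] = b[j++];
            }
        }
        // Copy whatever remains of the unfinished array.
        while (i < a.length) combined[k++] = a[i++];
        while (j < b.length) combined[k++] = b[j++];
        return combined;
    }

    public static void main(String[] args) {
        int[] a = {4, 10, 34, 56, 77};
        int[] b = {5, 30, 51, 52, 93};
        System.out.println(Arrays.toString(merge(a, b)));
        // prints [4, 5, 10, 30, 34, 51, 52, 56, 77, 93]
    }
}
```

The first two comparisons in main’s run are exactly the 4-versus-5 and 10-versus-5 comparisons described above.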
Lines 29–32 of Fig. 18.6 declare the MergeSort method. Line 31 calls private method SortArray with 0 and data.Length - 1 as the arguments—these are the beginning and ending indices of the array to be sorted. These values tell SortArray to operate on the entire array. Method SubArray (lines 116–127)—which is called from methods SortArray (lines 35–56) and Merge (lines 59–113)—produces a string representation of a portion of the array. The program displays these strings to help visualize the sort algorithm—the output shows the splits and merges performed by MergeSort as the algorithm progresses.
SortArray
Method SortArray is declared in lines 35–56. Line 38 tests the base case. When the array’s size is 1, the array is already sorted, so the method simply returns immediately. If the array’s size is greater than 1, the method splits the array in two, recursively calls method SortArray to sort the two subarrays, then merges them.
Lines 50–51 recursively call method SortArray on the array’s first half and second half, respectively. When these two method calls return, the array’s two halves have been sorted. Line 54 calls method Merge (lines 59–113) to combine the two sorted halves of the array into one larger sorted array.
Merge
Lines 72–84 in method Merge loop until the app reaches the end of either subarray. Line 76 tests which element at the front of the subarrays is smaller. If the element in the left array is smaller or equal, line 78 places it in the next position in the combined array. If the element in the right array is smaller, line 82 places it in the next position in the combined array. When the loop completes, one entire subarray has been placed in the combined array, but the other subarray still contains data. Line 87 tests whether the left array has reached its end. If so, lines 90–93 fill the combined array with the remaining elements of the right array. If the left array has not reached its end, then the right array has, and lines 98–101 fill the combined array with the remaining elements of the left array. Finally, lines 105–108 copy the combined array into the original array.
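Putting the pieces together, here is a compact recursive sketch in Java (our names; the book’s C# version also builds strings via SubArray to visualize each split and merge, which we omit):

```java
import java.util.Arrays;

public class MergeSortSketch {
    public static void mergeSort(int[] data) {
        sortArray(data, 0, data.length - 1);
    }

    // Recursively sort data[low..high] (inclusive).
    private static void sortArray(int[] data, int low, int high) {
        if (high - low < 1) {
            return; // base case: zero or one element is already sorted
        }
        int middle = (low + high) / 2;     // split point
        sortArray(data, low, middle);      // sort first half
        sortArray(data, middle + 1, high); // sort second half
        merge(data, low, middle, middle + 1, high);
    }

    // Merge the sorted ranges data[left..middle1] and data[middle2..right].
    private static void merge(int[] data, int left, int middle1,
                              int middle2, int right) {
        int[] combined = new int[right - left + 1];
        int leftIndex = left, rightIndex = middle2, combinedIndex = 0;

        // Take the smaller front element until one range is exhausted.
        while (leftIndex <= middle1 && rightIndex <= right) {
            if (data[leftIndex] <= data[rightIndex]) {
                combined[combinedIndex++] = data[leftIndex++];
            } else {
                combined[combinedIndex++] = data[rightIndex++];
            }
        }
        while (leftIndex <= middle1) combined[combinedIndex++] = data[leftIndex++];
        while (rightIndex <= right) combined[combinedIndex++] = data[rightIndex++];

        // Copy the merged result back into the original array.
        System.arraycopy(combined, 0, data, left, combined.length);
    }

    public static void main(String[] args) {
        int[] data = {34, 56, 4, 10, 77, 51, 93, 30, 5, 52};
        mergeSort(data);
        System.out.println(Arrays.toString(data));
        // prints [4, 5, 10, 30, 34, 51, 52, 56, 77, 93]
    }
}
```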
Merge sort is a far more efficient algorithm than either insertion sort or selection sort when sorting large sets of data. Consider the first (nonrecursive) call to method SortArray. This results in two recursive calls to method SortArray with subarrays each approximately half the size of the original array, and a single call to method Merge. This call to method Merge requires, at worst, n − 1 comparisons to fill the original array, which is O(n). (Recall that each element in the array is chosen by comparing one element from each of the subarrays.) The two calls to method SortArray result in four more recursive calls to SortArray, each with a subarray approximately a quarter the size of the original array, along with two calls to method Merge. These two calls to method Merge each require, at worst, n/2 − 1 comparisons, for a total number of comparisons of n − 2, which is O(n). This process continues—each call to SortArray generating two additional calls to method SortArray and a call to Merge—until the algorithm has split the array into one-element subarrays. At each level, O(n) comparisons are required to merge the subarrays. Each level splits the size of the arrays in half, so doubling the size of the array requires only one more level. Quadrupling the size of the array requires only two more levels. This pattern is logarithmic and results in log₂ n levels. This results in a total efficiency of O(n log n).
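To get a feel for the difference, this small sketch (ours) compares n² with n log₂ n for two array sizes; for a million elements, the O(n²) sorts need on the order of 10¹² operations, while merge sort needs roughly 2 × 10⁷:

```java
public class GrowthComparison {
    public static void main(String[] args) {
        for (int n : new int[]{1_000, 1_000_000}) {
            double nSquared = (double) n * n;
            // Math.log is natural log; divide by ln(2) to get log base 2.
            double nLogN = n * (Math.log(n) / Math.log(2));
            System.out.printf("n = %,d: n^2 = %.0f, n log2 n = %.0f%n",
                n, nSquared, nLogN);
        }
    }
}
```

For n = 1,000,000 that is a factor of roughly 50,000 fewer operations—the practical difference between seconds and hours.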
As you may have noticed, the key to the high efficiency of merge sort is the clever use of additional memory. This is another example of the space/time trade-off: with more space, an algorithm can often run in less time; with less space, it may require more time. In industry, you might create apps for memory-constrained devices, which could prevent you from using the merge sort algorithm.