Looking up a phone number, accessing a website and checking a word’s definition in a dictionary all involve searching through large amounts of data. A searching algorithm finds an element that matches a given search key, if such an element does, in fact, exist. There are, however, a number of things that differentiate search algorithms from one another. The major difference is the amount of effort they require to complete the search. One way to describe this effort is with Big O notation. For searching and sorting algorithms, this is particularly dependent on the number of data elements.
In Section 20.2.1, we present the linear search algorithm then discuss the algorithm’s efficiency as measured by Big O notation. In Section 20.2.2, we introduce the binary search algorithm, which is much more efficient but more complex to implement.
In this section, we discuss the simple linear search for determining whether an unsorted array
(i.e., an array
with element values that are in no particular order) contains a specified search key. Exercise 20.8 at the end of this chapter asks you to implement a recursive version of the linear search.
linearSearch
Function template linearSearch
(Fig. 20.2, lines 10–19) compares each element of an array
with a search key (line 13). Because the array is not in any particular order, it’s just as likely that the search key will be found in the first element as the last. On average, therefore, the program must compare the search key with half of the array
’s elements. To determine that a value is not in the array
, the program must compare the search key to every array
element. Linear search works well for small or unsorted arrays. However, for large arrays, linear searching is inefficient. If the array is sorted (e.g., its elements are in ascending order), you can use the high-speed binary search technique (Section 20.2.2).
Suppose an algorithm simply tests whether the first element of an array
is equal to the second element. If the array
has 10 elements, this algorithm requires only one comparison. If the array
has 1000 elements, the algorithm still requires only one comparison. In fact, the algorithm is independent of the number of array
elements. This algorithm is said to have a constant runtime, which is represented in Big O notation as O(1). An algorithm that’s O(1) does not necessarily require only one comparison. O(1) just means that the number of comparisons is constant—it does not grow as the size of the array
increases. An algorithm that tests whether the first element of an array
is equal to any of the next three elements will always require three comparisons, but in Big O notation it’s still considered O(1). O(1) is often pronounced “on the order of 1” or more simply “order 1.”
An algorithm that tests whether the first element of an array
is equal to any of the other elements of the array
requires at most
comparisons, where n is the number of elements in the array
. If the array
has 10 elements, the algorithm requires up to nine comparisons. If the array
has 1000 elements, the algorithm requires up to 999 comparisons. As n grows larger, the n part of the expression “dominates,” and subtracting one becomes inconsequential. Big O is designed to highlight these dominant terms and ignore terms that become unimportant as n grows. For this reason, an algorithm that requires a total of comparisons (such as the one we described in this paragraph) is said to be O(n) and is referred to as having a linear runtime. O(n) is often pronounced “on the order of n” or more simply “order n.”
Now suppose you have an algorithm that tests whether any element of an array
is duplicated elsewhere in the array
. The first element must be compared with all the other elements. The second element must be compared with all the other elements except the first (it was already compared to the first). The third element then must be compared with all the other elements except the first two. In the end, this algorithm will end up making or comparisons. As n increases, the term dominates and the n term becomes inconsequential. Again, Big O notation highlights the term, leaving . As we’ll soon see, even constant factors, such as the 1/2 here, are omitted in Big O notation.
Big O is concerned with how an algorithm’s runtime grows in relation to the number of items processed. Suppose an algorithm requires comparisons. With four elements, the algorithm will require 16 comparisons; with eight elements, 64 comparisons. With this algorithm, doubling the number of elements quadruples the number of comparisons. Consider a similar algorithm requiring comparisons. With four elements, the algorithm will require eight comparisons; with eight elements, 32 comparisons. Again, doubling the number of elements quadruples the number of comparisons. Both of these algorithms grow as the square of n, so Big O ignores the constant, and both algorithms are considered to be , which is referred to as quadratic runtime and pronounced “on the order of n-squared” or more simply “order n-squared.”
When n is small, algorithms (running on today’s billions-of-operations-per-second personal computers) will not noticeably affect performance. But as n grows, you’ll start to notice the performance degradation. An algorithm running on a million-element array
would require a trillion “operations” (where each could actually require several machine instructions to execute). This could require hours to execute. A billion-element array
would require a quintillion operations, a number so large that the algorithm could take decades! Unfortunately, algorithms tend to be easy to write. In this chapter, you’ll see algorithms with more favorable Big O measures. Such efficient algorithms often take a bit more cleverness and effort to create, but their superior performance can be worth the extra effort, especially as n gets large.
The linear search algorithm runs in O(n) time. The worst case in this algorithm is that every element must be checked to determine whether the search key is in the array
. If the array
’s size doubles, the number of comparisons that the algorithm must perform also doubles. Linear search can provide outstanding performance if the element matching the search key happens to be at or near the front of the array
. But we seek algorithms that perform well, on average, across all searches, including those where the element matching the search key is near the end of the array
. If a program needs to perform many searches on large array
s, it may be better to implement a different, more efficient algorithm, such as the binary search which we consider in the next section.
Sometimes the simplest algorithms perform poorly. Their virtue is that they’re easy to program, test and debug. Sometimes more complex algorithms are required to maximize performance.
The binary search algorithm is more efficient than the linear search algorithm, but it requires that the array
first be sorted. This is only worthwhile when the array
, once sorted, will be searched a great many times—or when the searching application has stringent performance requirements. The first iteration of this algorithm tests the middle array
element. If this matches the search key, the algorithm ends. Assuming the array
is sorted in ascending order, then if the search key is less than the middle element, the search key cannot match any element in the array
’s second half so the algorithm continues with only the first half (i.e., the first element up to, but not including, the middle element). If the search key is greater than the middle element, the search key cannot match any element in the array
’s first half so the algorithm continues with only the second half (i.e., the element after the middle element through the last element). Each iteration tests the middle value of the array
’s remaining elements. If the element does not match the search key, the algorithm eliminates half of the remaining elements. The algorithm ends either by finding an element that matches the search key or by reducing the sub-array
to zero size.
As an example, consider the sorted 15-element array
2 3 5 10 27 30 34 51 56 65 77 81 82 93 99
and the search key 65. A binary search first checks whether the middle element (51) is the search key. The search key (65) is larger than 51, so 51 is eliminated from consideration along with the first half of the array
(all elements smaller than 51.) Next, the algorithm checks whether 81 (the middle element of the remaining elements) matches the search key. The search key (65) is smaller than 81, so 81 is eliminated from consideration along with the elements larger than 81. After just two tests, the algorithm has narrowed the number of elements to check to three (56, 65 and 77). The algorithm then checks 65 (which matches the search key), and returns the element’s index (9). In this case, the algorithm required just three comparisons to determine whether the array
contained the search key. Using a linear search algorithm would have required 10 comparisons. [Note: In this example, we’ve chosen to use an array
with 15 elements, so that there will always be an obvious middle element in the array
. With an even number of elements, the middle of the array
lies between two elements. We implement the algorithm to choose the element with the higher index number.]
Figure 20.3 implements and demonstrates the binary-search algorithm. Throughout the program’s execution, we use function template displayElements
(lines 11–23) to display the portion of the array
that’s currently being searched.
binarySearch
Lines 26–59 define function template binarySearch
, which has two parameters—a reference to the array
to search and a reference to the search key. Lines 28–30 calculate the low
end index, high
end index and middle
index of the portion of the array
that the algorithm is currently searching. When binarySearch
is first called, low
is 0
, high
is the array
’s size minus 1
and middle
is the average of these two values. Line 31 initializes location
to -1
—the value that binarySearch
returns if the search key is not found. Lines 33–56 loop until low
is greater than high
(indicating that the element was not found) or location
does not equal -1
(indicating that the search key was found). Line 45 tests whether the value in the middle
element is equal to key
. If so, line 46 assigns the middle
index to location
. Then the loop terminates and location
is returned to the caller. Each iteration of the loop that does not find the search key tests a single value (line 48) and eliminates half of the remaining values in the array
(line 49 or 51).
main
Lines 64–66 set up a random-number generator for int
values from 10
–99
. Lines 68–74 create an array
and fill it with random int
s. Recall that the binary search algorithm requires a sorted array
, so line 76 calls the Standard Library function sort
to sort arrayToSearch
’s elements into ascending order. Line 78 displays arrayToSearch
’s sorted contents.
Lines 88–105 loop until the user enters the value -1
. For each search key the user enters, the program performs a binary search of arrayToSearch
to determine whether it contains the search key. The first line of output from this program shows arrayToSearch
’s contents in ascending order. When the user instructs the program to search for 48
, the program first tests the middle element, which is 60
(as indicated by *
). The search key is less than 60
, so the program eliminates the second half of the array
and tests the middle element from the first half of the array
. The search key equals 48
, so the program returns the index 3
after performing just two comparisons. The output also shows the results of searching for the values 92
and 22
.
In the worst-case scenario, searching a sorted array
of 1023 elements will take only 10 comparisons when using a binary search. Repeatedly dividing 1023 by 2 (because, after each comparison, we can eliminate from consideration half of the remaining elements) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number 1023 is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary search algorithm. Thus, an array
of 1,048,575 elements takes a maximum of 20 comparisons to find the key, and an array
of approximately one billion elements takes a maximum of 30 comparisons to find the key. This is a tremendous performance improvement over the linear search. For a one-billion-element array
, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array
is the exponent of the first power of 2 greater than the number of elements in the array
, which is represented as . All logarithms grow at roughly the same rate, so in Big O notation the base can be omitted. This results in a Big O of O(log n) for a binary search, which is also known as logarithmic runtime and pronounced “on the order of log n” or more simply “order log n.”