Looking up a phone number, accessing a website and checking the definition of a word in a dictionary all involve searching large amounts of data. The next two sections discuss two common search algorithms—one that is easy to program yet relatively inefficient and one that is relatively efficient but more complex to program.
The linear search algorithm searches each element in an array sequentially. If the search key does not match an element in the array, the algorithm tests each element and, when the end of the array is reached, informs the user that the search key is not present. If the search key is in the array, the algorithm tests each element until it finds one that matches the search key and returns the index of that element.
As an example, consider an array containing the following values:
34 56 2 10 77 51 93 30 5 52
and a method that is searching for 51. Using the linear search algorithm, the method first checks whether 34 (at index 0) matches the search key. It does not, so the algorithm checks whether 56 (at index 1) matches the search key. The method continues moving through the array sequentially, testing 2 (at index 2), then 10 (at index 3), then 77 (at index 4). When the method tests 51 (at index 5), which matches the search key, the method returns 51's location (5) in the array. If, after checking every array element, the method determines that the search key does not match any element in the array, the method returns -1. If there are duplicate values in the array, linear search returns the index of the first element in the array that matches the search key.
Figure 18.2 implements the linear search algorithm. Method Main declares as local variables a Random object (line 10) to produce random int values and the int array data (line 11) to store randomly generated ints. Lines 14–17 store in data random ints in the range 10–99. Line 19 displays the array using string method Join to create a string with the array's elements separated by spaces. Join's first argument is a delimiter that's placed between each pair of elements, and the second argument is the array. Lines 22–23 prompt for and input the search key. Lines 26–45 loop until the user enters -1. Line 29 calls method LinearSearch to determine whether searchInt is in the array. If so, LinearSearch returns the element's index, which is displayed in lines 33–34. If not, LinearSearch returns -1, and lines 38–39 notify the user. Lines 43–44 prompt for and input the next search key.
LinearSearch
Lines 49–61 perform the linear search. The search key is passed to parameter searchKey. Lines 52–58 loop through the elements in the array. Line 54 compares the current element in the array with searchKey. If the values are equal, line 56 returns the index of the element. If the loop ends without finding the value, line 60 returns −1.
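The linear search just described can be sketched compactly. The following is a Java approximation of the chapter's C# LinearSearch method (Fig. 18.2 itself is C#; the class name LinearSearchSketch is invented here for illustration, and the interactive loop is replaced by two direct calls):

```java
// Sketch of the linear search described above, in Java rather than the
// chapter's C#. Class name LinearSearchSketch is invented for illustration.
public class LinearSearchSketch {
    // Returns the index of the first element equal to searchKey, or -1
    // if the search key is not present in the array.
    public static int linearSearch(int[] data, int searchKey) {
        for (int index = 0; index < data.length; index++) {
            if (data[index] == searchKey) {
                return index; // found: report the first matching index
            }
        }
        return -1; // searched every element without finding a match
    }

    public static void main(String[] args) {
        int[] data = {34, 56, 2, 10, 77, 51, 93, 30, 5, 52};
        System.out.println(linearSearch(data, 51)); // prints 5
        System.out.println(linearSearch(data, 22)); // prints -1
    }
}
```

Note that the method returns as soon as a match is found, so duplicates never affect the result: only the first matching index is reported, as the text states.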
Searching algorithms all accomplish the same goal—finding an element that matches a given search key, if such an element exists. Many things, however, differentiate search algorithms from one another. The major difference is the amount of effort required to complete the search. One way to describe this effort is with Big O notation, which is a measure of the worst-case runtime for an algorithm—that is, how hard an algorithm may have to work to solve a problem. For searching and sorting algorithms, this is particularly dependent on how many elements there are in the data set and the algorithm used.
Suppose an algorithm is designed to test whether the first element of an array is equal to the second element. If the array has 10 elements, this algorithm requires one comparison. If the array has 1,000 elements, this algorithm still requires one comparison. In fact, this algorithm is completely independent of the number of elements in the array, and is thus said to have a constant runtime, which is represented in Big O notation as O(1). An algorithm that is O(1) does not necessarily require only one comparison. O(1) just means that the number of comparisons is constant—it does not grow as the size of the array increases. An algorithm that tests whether the first element of an array is equal to any of the next three elements is still O(1), even though it requires three comparisons.
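To make the constant-runtime idea concrete, here is a small Java illustration (the class and method names are ours, not from the chapter). It performs at most three comparisons no matter how long the array is, so it is O(1):

```java
// Constant-time check: compares the first element with at most the next
// three elements, regardless of the array's length -> O(1).
public class ConstantTimeSketch {
    public static boolean firstEqualsAnyOfNextThree(int[] data) {
        // At most three comparisons, even for a million-element array.
        for (int i = 1; i <= 3 && i < data.length; i++) {
            if (data[0] == data[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(firstEqualsAnyOfNextThree(new int[]{7, 3, 7, 9, 1})); // true
        // false: the 7 at index 4 is beyond the three elements examined
        System.out.println(firstEqualsAnyOfNextThree(new int[]{7, 3, 5, 9, 7}));
    }
}
```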
An algorithm that tests whether the first element of an array is equal to any of the other elements of the array will require at most n − 1 comparisons, where n is the number of elements in the array. If the array has 10 elements, this algorithm requires up to nine comparisons. If the array has 1,000 elements, this algorithm requires up to 999 comparisons. As n grows larger, the n part of the expression n − 1 dominates, and subtracting one becomes inconsequential. Big O is designed to highlight these dominant terms and ignore terms that become unimportant as n grows. For this reason, an algorithm that requires a total of n − 1 comparisons (such as the one we described earlier) is said to be O(n). An O(n) algorithm is referred to as having a linear runtime. O(n) is often pronounced “on the order of n” or more simply “order n.”
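The n − 1 comparison bound is visible in a short sketch (a Java illustration with invented names; the chapter's figures are in C#):

```java
// Linear-time check: compares the first element with each of the other
// n - 1 elements -> O(n).
public class LinearTimeSketch {
    public static boolean firstEqualsAnyOther(int[] data) {
        // Up to data.length - 1 comparisons, growing linearly with n.
        for (int i = 1; i < data.length; i++) {
            if (data[0] == data[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(firstEqualsAnyOther(new int[]{34, 56, 2, 10, 34})); // true
        System.out.println(firstEqualsAnyOther(new int[]{34, 56, 2, 10, 77})); // false
    }
}
```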
Now suppose you have an algorithm that tests whether any element of an array is duplicated elsewhere in the array. The first element must be compared with every other element in the array. The second element must be compared with every other element except the first (it was already compared to the first). The third element must be compared with every other element except the first two. In the end, this algorithm will end up making (n − 1) + (n − 2) + … + 2 + 1, or n²/2 − n/2, comparisons. As n increases, the n² term dominates and the n term becomes inconsequential. Again, Big O notation highlights the n² term, leaving n²/2. But as we’ll soon see, constant factors are omitted in Big O notation.
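The duplicate-detection algorithm just described can be sketched as follows (a Java illustration with invented names; the chapter's figures are in C#). The nested loops perform (n − 1) + (n − 2) + … + 1 comparisons in the worst case:

```java
// Quadratic-time check: each element is compared with every later element,
// (n - 1) + (n - 2) + ... + 1 = n^2/2 - n/2 comparisons -> O(n^2).
public class QuadraticTimeSketch {
    public static boolean containsDuplicate(int[] data) {
        for (int i = 0; i < data.length; i++) {
            // Start at i + 1: earlier pairs were already compared.
            for (int j = i + 1; j < data.length; j++) {
                if (data[i] == data[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsDuplicate(new int[]{34, 56, 2, 56, 77})); // true
        System.out.println(containsDuplicate(new int[]{34, 56, 2, 10, 77})); // false
    }
}
```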
Big O is concerned with how an algorithm’s runtime grows in relation to the number of items processed. Suppose an algorithm requires n² comparisons. With four elements, the algorithm will require 16 comparisons; with eight elements, the algorithm will require 64 comparisons. With this algorithm, doubling the number of elements quadruples the number of comparisons. Consider a similar algorithm requiring n²/2 comparisons. With four elements, the algorithm will require eight comparisons; with eight elements, 32 comparisons. Again, doubling the number of elements quadruples the number of comparisons. Both of these algorithms grow as the square of n, so Big O ignores the constant, and both algorithms are considered to be O(n²), referred to as quadratic runtime and pronounced “on the order of n-squared” or more simply “order n-squared.”
When n is small, O(n²) algorithms (running on today’s billions-of-operations-per-second personal computers) will not noticeably affect performance. But as n grows, you’ll start to notice the performance degradation. An O(n²) algorithm running on a million-element array would require a trillion “operations” (where each could actually require several machine instructions to execute). This could require many minutes to execute. A billion-element array would require a quintillion operations, a number so large that the algorithm could take decades! O(n²) algorithms are easy to write, as you’ll see shortly. You’ll also see algorithms with more favorable Big O measures. These efficient algorithms often take more cleverness and effort to create, but their superior performance can be well worth the extra effort, especially as n gets large.
The linear search algorithm runs in O(n) time. The worst case in this algorithm is that every element must be checked to determine whether the search item exists in the array. If the size of the array is doubled, the number of comparisons that the algorithm must perform is also doubled. Linear search can provide outstanding performance if the element matching the search key happens to be at or near the front of the array. But we seek algorithms that perform well, on average, across all searches, including those where the element matching the search key is near the end of the array.
Linear search is the easiest search algorithm to program, but it can be slow compared to other search algorithms. If an app needs to perform many searches on large arrays, it may be better to implement a different, more efficient algorithm, such as the binary search, which we present in the next section.
Sometimes the simplest algorithms perform poorly. Their virtue is that they’re easy to program, test and debug. Sometimes more complex algorithms are required to realize maximum performance.
The binary search algorithm is more efficient than the linear search algorithm, but it requires that the array first be sorted. The first iteration of this algorithm tests the middle element in the array. If this matches the search key, the algorithm ends. Assuming the array is sorted in ascending order, if the search key is less than the middle element, the search key cannot match any element in the second half of the array and the algorithm continues with only the first half of the array (i.e., the first element up to, but not including, the middle element). If the search key is greater than the middle element, the search key cannot match any element in the first half of the array, and the algorithm continues with only the second half of the array (i.e., the element after the middle element through the last element). Each iteration tests the middle value of the remaining portion of the array, called a subarray. A subarray can have no elements, or it can encompass the entire array. If the search key does not match the element, the algorithm eliminates half of the remaining elements. The algorithm ends by either finding an element that matches the search key or reducing the subarray to zero size.
As an example, consider the sorted 15-element array
2 3 5 10 27 30 34 51 56 65 77 81 82 93 99
and a search key of 65. The binary search algorithm would first check whether 51 is the search key, because 51 is the array’s middle element. The search key 65 is larger than 51, so 51 is “discarded” (i.e., eliminated from consideration) along with the first half of the array (all elements smaller than 51), leaving the following subarray to be searched:
56 65 77 81 82 93 99
Next, the algorithm checks whether the subarray’s middle element (81) matches the search key. It does not: 65 is smaller than 81, so 81 is discarded along with the elements larger than 81, leaving the following subarray to be searched:
56 65 77
After just two tests, the algorithm has narrowed the search from 15 elements to three. The algorithm then checks 65 (which indeed matches the search key) and returns that array element’s index. This algorithm required just three comparisons to determine whether the search key matched an element in the array. Using a linear search algorithm would have required 10 comparisons. [Note: In this example, we have chosen to use an array with 15 elements so that there will always be an obvious middle element in the array. With an even number of elements, the middle of the array lies between two elements. We implement the algorithm to choose the higher of the two elements.]
Figure 18.3 declares class BinarySearchTest, which contains static methods Main, BinarySearch and DisplayElements (which helps visualize the binary search algorithm). Method Main has as local variables a Random object named generator (line 10) to produce random int values and the int array data (line 11) to store randomly generated ints. Lines 14–17 generate random ints in the range 10–99 and store them in data. Recall that a binary search requires an array’s elements to be sorted. For this reason, line 19 calls class Array’s static method Sort for the array data, which sorts the elements of the array in ascending order. Line 20 outputs the array’s contents by calling method DisplayElements (lines 86–98). Lines 23–24 prompt the user for the search key and store it in searchInt. Lines 27–46 loop until the user enters −1. Line 30 calls method BinarySearch to determine whether searchInt is in the array. If searchInt is found, BinarySearch returns the element’s index, which is displayed in lines 34–35. If searchInt is not in the array, BinarySearch returns −1, and lines 39–40 notify the user. Lines 44–45 prompt for and retrieve the next integer from the user.
The first line of output from this app is the array of ints, in increasing order. When the user instructs the app to search for 72, the app first tests the middle element (indicated by -- below the element in the output), which is 52. The search key is greater than 52, so the app eliminates from consideration the first half of the array and tests the middle element from the second half. The search key is smaller than 82, so the app eliminates from consideration the second half of the subarray, leaving only three elements. Finally, the app checks 72 (which matches the search key) and returns the index 9. string method PadLeft (lines 62 and 89) right-aligns a string by inserting spaces at its beginning until the string reaches the specified total width, and returns a new string containing the results.
BinarySearch
Lines 50–83 declare method BinarySearch, which receives as parameters the array to search and the search key. Lines 52–54 calculate the low end index, high end index and middle index of the portion of the array being searched. When the method is first called, the low index is 0, the high index is the length of the array minus 1 and the middle is the average of these two values. Lines 56–80 loop until low is greater than high (this occurs when the element is not found). Line 65 tests whether the value in the middle element is equal to searchElement. If so, line 67 returns middle, the search key’s location, to the caller. Each iteration of the loop tests a single value (lines 65 and 70) and eliminates half of the remaining values in the array (line 72 or 76), then adjusts the middle index (line 79) for the next iteration of the loop. Lines 59 and 62 in this program, and similar statements in the chapter’s remaining programs, help visualize the algorithm’s operation. Actual searching and sorting implementations (and other time-critical algorithms) should not include output statements.
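As a summary of the method just described, here is a sketch of the same iterative binary search in Java (the chapter's Fig. 18.3 is C#; the class name here is ours, and the visualization output statements are omitted, as the text recommends for time-critical code). The middle calculation rounds up, matching the earlier note that we choose the higher of two middle elements:

```java
// Sketch of the binary search described above, in Java rather than the
// chapter's C#. The variable names low, high and middle mirror the text.
public class BinarySearchSketch {
    // data must be sorted in ascending order; returns the index of
    // searchKey in data, or -1 if searchKey is not present.
    public static int binarySearch(int[] data, int searchKey) {
        int low = 0;                       // first index of the subarray
        int high = data.length - 1;        // last index of the subarray
        int middle = (low + high + 1) / 2; // choose the higher middle

        while (low <= high) {              // subarray still has elements
            if (data[middle] == searchKey) {
                return middle;             // found the search key
            } else if (searchKey < data[middle]) {
                high = middle - 1;         // discard the upper half
            } else {
                low = middle + 1;          // discard the lower half
            }
            middle = (low + high + 1) / 2; // middle of the new subarray
        }
        return -1; // subarray reduced to zero size: key not found
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 5, 10, 27, 30, 34, 51, 56, 65, 77, 81, 82, 93, 99};
        System.out.println(binarySearch(data, 65)); // prints 9
        System.out.println(binarySearch(data, 4));  // prints -1
    }
}
```

Tracing this sketch on the 15-element array from the earlier example reproduces the walkthrough exactly: it tests 51 (index 7), then 81 (index 11), then 65 (index 9).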
In the worst-case scenario, searching a sorted array of 1,023 elements causes BinarySearch’s loop to iterate only 10 times. Repeatedly dividing 1,023 by 2 (because after each comparison, we are able to eliminate half of the array) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary search algorithm. Thus, an array of one million elements (which is less than 2²⁰) takes a maximum of 20 comparisons to find the key, and an array of one billion elements (which is less than 2³⁰) takes a maximum of 30 comparisons to find the key. This is a tremendous improvement in performance over the linear search. For a one-billion-element array, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array is the exponent of the first power of 2 greater than the number of elements in the array, which is represented as log₂ n. All logarithms grow at roughly the same rate, so in Big O notation the base can be omitted. This results in a Big O of O(log n) for a binary search, which is also known as logarithmic runtime.
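The repeated-halving argument can be checked directly with a short Java sketch (the method name maxComparisons is ours, for illustration):

```java
// Counts binary search's worst-case comparisons: repeatedly halve the
// element count (rounding down, since the middle element is also
// removed) until no elements remain.
public class HalvingSketch {
    public static int maxComparisons(int n) {
        int comparisons = 0;
        while (n > 0) {
            n = n / 2;     // discard the middle element and one half
            comparisons++; // one comparison per halving
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(maxComparisons(1023));          // prints 10
        System.out.println(maxComparisons(1_000_000));     // prints 20
        System.out.println(maxComparisons(1_000_000_000)); // prints 30
    }
}
```

The results match the text: 10 comparisons for 1,023 elements, 20 for one million and 30 for one billion.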
’s loop to iterate only 10 times. Repeatedly dividing 1,023 by 2 (because after each comparison, we are able to eliminate half of the array) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary search algorithm. Thus, an array of elements takes a maximum of 20 comparisons to find the key, and an array of one billion elements (which is less than takes a maximum of 30 comparisons to find the key. This is a tremendous improvement in performance over the linear search. For a one-billion-element array, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array is the exponent of the first power of 2 greater than the number of elements in the array, which is represented as . All logarithms grow at roughly the same rate, so in Big O notation the base can be omitted. This results in a big O of O(log n) for a binary search, which is also known as logarithmic runtime.