20.2 Searching Algorithms

Looking up a phone number, accessing a website and checking a word’s definition in a dictionary all involve searching through large amounts of data. A searching algorithm finds an element that matches a given search key, if such an element does, in fact, exist. There are, however, a number of things that differentiate search algorithms from one another. The major difference is the amount of effort they require to complete the search. One way to describe this effort is with Big O notation. For searching and sorting algorithms, this is particularly dependent on the number of data elements.

In Section 20.2.1, we present the linear search algorithm then discuss the algorithm’s efficiency as measured by Big O notation. In Section 20.2.2, we introduce the binary search algorithm, which is much more efficient but more complex to implement.

20.2.1 Linear Search

In this section, we discuss the simple linear search for determining whether an unsorted array (i.e., an array with element values that are in no particular order) contains a specified search key. Exercise 20.8 at the end of this chapter asks you to implement a recursive version of the linear search.

Function Template linearSearch

Function template linearSearch (Fig. 20.2, lines 10–19) compares each element of an array with a search key (line 13). Because the array is not in any particular order, it’s just as likely that the search key will be found in the first element as the last. On average, therefore, the program must compare the search key with half of the array’s elements. To determine that a value is not in the array, the program must compare the search key to every array element. Linear search works well for small or unsorted arrays. However, for large arrays, linear searching is inefficient. If the array is sorted (e.g., its elements are in ascending order), you can use the high-speed binary search technique (Section 20.2.2).


Fig. 20.2 Linear search of an array.

Alternate View

 1   // Fig. 20.2: LinearSearch.cpp
 2   // Linear search of an array.
 3   #include <iostream>
 4   #include <array>
 5   using namespace std;
 6
 7   // compare key to every element of array until location is
 8   // found or until end of array is reached; return location of
 9   // element if key is found or -1 if key is not found
10   template <typename T, size_t size>                           
11   int linearSearch(const array<T, size>& items, const T& key) {
12      for (size_t i{0}; i < items.size(); ++i) {                
13         if (key == items[i]) { // if found,                    
14            return i; // return location of key                 
15         }                                                      
16      }                                                         
17                                                                
18      return -1; // key not found                               
19   }                                                            

20
21   int main() {
22      const size_t arraySize{100}; // size of array
23      array<int, arraySize> arrayToSearch; // create array
24
25      for (size_t i{0}; i < arrayToSearch.size(); ++i) {
26         arrayToSearch[i] = 2 * i; // create some data
27      }
28
29      cout << "Enter integer search key: ";
30      int searchKey; // value to locate
31      cin >> searchKey;
32
33      // attempt to locate searchKey in arrayToSearch
34      int element{linearSearch(arrayToSearch, searchKey)};
35
36      // display results
37      if (element != -1) {
38         cout << "Found value in element " << element << endl;
39      }
40      else {
41         cout << "Value not found" << endl;
42      }
43   }

Enter integer search key: 36
Found value in element 18

Enter integer search key: 37
Value not found

Big O: Constant Runtime

Suppose an algorithm simply tests whether the first element of an array is equal to the second element. If the array has 10 elements, this algorithm requires only one comparison. If the array has 1000 elements, the algorithm still requires only one comparison. In fact, the algorithm is independent of the number of array elements. This algorithm is said to have a constant runtime, which is represented in Big O notation as O(1). An algorithm that’s O(1) does not necessarily require only one comparison. O(1) just means that the number of comparisons is constant—it does not grow as the size of the array increases. An algorithm that tests whether the first element of an array is equal to any of the next three elements will always require three comparisons, but in Big O notation it’s still considered O(1). O(1) is often pronounced “on the order of 1” or more simply “order 1.”

Big O: Linear Runtime

An algorithm that tests whether the first element of an array is equal to any of the other elements of the array requires at most n1 comparisons, where n is the number of elements in the array. If the array has 10 elements, the algorithm requires up to nine comparisons. If the array has 1000 elements, the algorithm requires up to 999 comparisons. As n grows larger, the n part of the expression n1 “dominates,” and subtracting one becomes inconsequential. Big O is designed to highlight these dominant terms and ignore terms that become unimportant as n grows. For this reason, an algorithm that requires a total of n1 comparisons (such as the one we described in this paragraph) is said to be O(n) and is referred to as having a linear runtime. O(n) is often pronounced “on the order of n” or more simply “order n.”

Big O: Quadratic Runtime

Now suppose you have an algorithm that tests whether any element of an array is duplicated elsewhere in the array. The first element must be compared with all the other elements. The second element must be compared with all the other elements except the first (it was already compared to the first). The third element then must be compared with all the other elements except the first two. In the end, this algorithm will end up making (n1)+(n2)++2+1 or n2/2n/2 comparisons. As n increases, the n2 term dominates and the n term becomes inconsequential. Again, Big O notation highlights the n2 term, leaving n2/2. As we’ll soon see, even constant factors, such as the 1/2 here, are omitted in Big O notation.

Big O is concerned with how an algorithm’s runtime grows in relation to the number of items processed. Suppose an algorithm requires n2 comparisons. With four elements, the algorithm will require 16 comparisons; with eight elements, 64 comparisons. With this algorithm, doubling the number of elements quadruples the number of comparisons. Consider a similar algorithm requiring n2/2 comparisons. With four elements, the algorithm will require eight comparisons; with eight elements, 32 comparisons. Again, doubling the number of elements quadruples the number of comparisons. Both of these algorithms grow as the square of n, so Big O ignores the constant, and both algorithms are considered to be O(n2), which is referred to as quadratic runtime and pronounced “on the order of n-squared” or more simply “order n-squared.”

O(n2) Performance

When n is small, O(n2) algorithms (running on today’s billions-of-operations-per-second personal computers) will not noticeably affect performance. But as n grows, you’ll start to notice the performance degradation. An O(n2) algorithm running on a million-element array would require a trillion “operations” (where each could actually require several machine instructions to execute). This could require hours to execute. A billion-element array would require a quintillion operations, a number so large that the algorithm could take decades! Unfortunately, O(n2) algorithms tend to be easy to write. In this chapter, you’ll see algorithms with more favorable Big O measures. Such efficient algorithms often take a bit more cleverness and effort to create, but their superior performance can be worth the extra effort, especially as n gets large.

Linear Search’s Runtime

The linear search algorithm runs in O(n) time. The worst case in this algorithm is that every element must be checked to determine whether the search key is in the array. If the array’s size doubles, the number of comparisons that the algorithm must perform also doubles. Linear search can provide outstanding performance if the element matching the search key happens to be at or near the front of the array. But we seek algorithms that perform well, on average, across all searches, including those where the element matching the search key is near the end of the array. If a program needs to perform many searches on large arrays, it may be better to implement a different, more efficient algorithm, such as the binary search which we consider in the next section.

Performance Tip 20.1

Sometimes the simplest algorithms perform poorly. Their virtue is that they’re easy to program, test and debug. Sometimes more complex algorithms are required to maximize performance.

20.2.2 Binary Search

The binary search algorithm is more efficient than the linear search algorithm, but it requires that the array first be sorted. This is only worthwhile when the array, once sorted, will be searched a great many times—or when the searching application has stringent performance requirements. The first iteration of this algorithm tests the middle array element. If this matches the search key, the algorithm ends. Assuming the array is sorted in ascending order, then if the search key is less than the middle element, the search key cannot match any element in the array’s second half so the algorithm continues with only the first half (i.e., the first element up to, but not including, the middle element). If the search key is greater than the middle element, the search key cannot match any element in the array’s first half so the algorithm continues with only the second half (i.e., the element after the middle element through the last element). Each iteration tests the middle value of the array’s remaining elements. If the element does not match the search key, the algorithm eliminates half of the remaining elements. The algorithm ends either by finding an element that matches the search key or by reducing the sub-array to zero size.

Binary Search of 15 Integer Values

As an example, consider the sorted 15-element array


2  3  5  10  27  30  34  51  56  65  77  81  82  93  99

and the search key 65. A binary search first checks whether the middle element (51) is the search key. The search key (65) is larger than 51, so 51 is eliminated from consideration along with the first half of the array (all elements smaller than 51.) Next, the algorithm checks whether 81 (the middle element of the remaining elements) matches the search key. The search key (65) is smaller than 81, so 81 is eliminated from consideration along with the elements larger than 81. After just two tests, the algorithm has narrowed the number of elements to check to three (56, 65 and 77). The algorithm then checks 65 (which matches the search key), and returns the element’s index (9). In this case, the algorithm required just three comparisons to determine whether the array contained the search key. Using a linear search algorithm would have required 10 comparisons. [Note: In this example, we’ve chosen to use an array with 15 elements, so that there will always be an obvious middle element in the array. With an even number of elements, the middle of the array lies between two elements. We implement the algorithm to choose the element with the higher index number.]

Binary Search Example

Figure 20.3 implements and demonstrates the binary-search algorithm. Throughout the program’s execution, we use function template displayElements (lines 11–23) to display the portion of the array that’s currently being searched.




Fig. 20.3 Binary search of an array.

Alternate View

 1   // Fig 20.3: BinarySearch.cpp
 2   // Binary search of an array.
 3   #include <algorithm>
 4   #include <array>
 5   #include <ctime>
 6   #include <iostream>
 7   #include <random>
 8   using namespace std;
 9
10   // display array elements from index low through index high
11   template <typename T, size_t size>
12   void displayElements(const array<T, size>& items,
13      size_t low, size_t high) {
14      for (size_t i{0}; i < items.size() && i < low; ++i) {
15         cout << "   "; // display spaces for alignment
16      }
17
18      for (size_t i{low}; i < items.size() && i <= high; ++i) {
19         cout << items[i] << " "; // display element
20      }
21
22      cout << endl;
23   }
24
25   // perform a binary search on the data
26   template <typename T, size_t size>                               
27   int binarySearch(const array<T, size>& items, const T& key) {    
28      int low{0}; // low index of elements to search                
29      int high{static_cast<int>(items.size()) - 1}; // high index   
30      int middle{(low + high + 1) / 2}; // middle element           
31      int location{-1}; // key's index; -1 if not found             
32
33      do { // loop to search for element                            
34         // display remaining elements of array to be searched
35         displayElements(items, low, high);
36
37         // output spaces for alignment
38         for (int i{0}; i < middle; ++i) {
39            cout << "   ";
40         }
41
42         cout << " * " << endl; // indicate current middle
43
44         // if the element is found at the middle                
45         if (key == items[middle]) {                             
46            location = middle; // location is the current middle 
47         }                                                       
48         else if (key < items[middle]) { // middle is too high   
49            high = middle - 1; // eliminate the higher half      
50         }                                                       
51         else { // middle element is too low                     
52            low = middle + 1; // eliminate the lower half        
53         }                                                       
54                                                                 
55         middle = (low + high + 1) / 2; // recalculate the middle
56      } while ((low <= high) && (location == -1));               
57
58      return location; // return location of key
59   }
60
61   int main() {
62      // use the default random-number generation engine to produce
63      // uniformly distributed pseudorandom int values from 10 to 99
64      default_random_engine engine{
65         static_cast<unsigned int>(time(nullptr))};
66      uniform_int_distribution<unsigned int> randomInt{10, 99};
67
68      const size_t arraySize{15}; // size of array
69      array<int, arraySize> arrayToSearch; // create array
70
71      // fill arrayToSearch with random values
72      for (int &item : arrayToSearch) {
73         item = randomInt(engine);
74      }
75
76      sort(arrayToSearch.begin(), arrayToSearch.end()); // sort the array
77
78      // display arrayToSearch's values
79      displayElements(arrayToSearch, 0, arrayToSearch.size() - 1);
80
81      // get input from user
82      cout << "
Please enter an integer value (-1 to quit): ";
83      int searchKey; // value to locate
84      cin >> searchKey; // read an int from user
85      cout << endl;
86
87      // repeatedly input an integer; -1 terminates the program
88      while (searchKey != -1) {
89         // use binary search to try to find integer
90         int position{binarySearch(arrayToSearch, searchKey)};
91
92         // return value of -1 indicates integer was not found
93         if (position == -1) {
94            cout << "The integer " << searchKey << " was not found.
";
95         }
96         else {
97            cout << "The integer " << searchKey
98               << " was found in position " << position << ".
";
99         }
100
101         // get input from user
102         cout << "

Please enter an integer value (-1 to quit): ";
103         cin >> searchKey; // read an int from user
104         cout << endl;
105      }
106   }

10 23 27 48 52 55 58 60 62 63 68 72 75 92 97
Please enter an integer value (-1 to quit):  48
10 23 27 48 52 55 58 60 62 63 68 72 75 92 97
                      *
10 23 27 48 52 55 58
           *
The integer 48 was found in position 3.

Please enter an integer value (-1 to quit):  92
10 23 27 48 52 55 58 60 62 63 68 72 75 92 97
                      *
                        62 63 68 72 75 92 97
                                  *
                                    75 92 97
                                        *
The integer 92 was found in position 13.

Please enter an integer value (-1 to quit):  22
10 23 27 48 52 55 58 60 62 63 68 72 75 92 97
                      *
10 23 27 48 52 55 58
          *
10 23 27
   *
10
 *
The integer 22 was not found.

Please enter an integer value (-1 to quit):  -1

Function Template binarySearch

Lines 26–59 define function template binarySearch, which has two parameters—a reference to the array to search and a reference to the search key. Lines 28–30 calculate the low end index, high end index and middle index of the portion of the array that the algorithm is currently searching. When binarySearch is first called, low is 0, high is the array’s size minus 1 and middle is the average of these two values. Line 31 initializes location to -1—the value that binarySearch returns if the search key is not found. Lines 33–56 loop until low is greater than high (indicating that the element was not found) or location does not equal -1 (indicating that the search key was found). Line 45 tests whether the value in the middle element is equal to key. If so, line 46 assigns the middle index to location. Then the loop terminates and location is returned to the caller. Each iteration of the loop that does not find the search key tests a single value (line 48) and eliminates half of the remaining values in the array (line 49 or 51).

Function main

Lines 64–66 set up a random-number generator for int values from 1099. Lines 68–74 create an array and fill it with random ints. Recall that the binary search algorithm requires a sorted array, so line 76 calls the Standard Library function sort to sort arrayToSearch’s elements into ascending order. Line 78 displays arrayToSearch’s sorted contents.

Lines 88–105 loop until the user enters the value -1. For each search key the user enters, the program performs a binary search of arrayToSearch to determine whether it contains the search key. The first line of output from this program shows arrayToSearch’s contents in ascending order. When the user instructs the program to search for 48, the program first tests the middle element, which is 60 (as indicated by *). The search key is less than 60, so the program eliminates the second half of the array and tests the middle element from the first half of the array. The search key equals 48, so the program returns the index 3 after performing just two comparisons. The output also shows the results of searching for the values 92 and 22.

Efficiency of Binary Search

In the worst-case scenario, searching a sorted array of 1023 elements will take only 10 comparisons when using a binary search. Repeatedly dividing 1023 by 2 (because, after each comparison, we can eliminate from consideration half of the remaining elements) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number 1023 (2101) is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary search algorithm. Thus, an array of 1,048,575 (2201) elements takes a maximum of 20 comparisons to find the key, and an array of approximately one billion elements takes a maximum of 30 comparisons to find the key. This is a tremendous performance improvement over the linear search. For a one-billion-element array, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array is the exponent of the first power of 2 greater than the number of elements in the array, which is represented as log2 n. All logarithms grow at roughly the same rate, so in Big O notation the base can be omitted. This results in a Big O of O(log n) for a binary search, which is also known as logarithmic runtime and pronounced “on the order of log n” or more simply “order log n.”

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset