18.2 Searching Algorithms

Looking up a phone number, accessing a website and checking the definition of a word in a dictionary all involve searching large amounts of data. The next two sections discuss two common search algorithms—one that is easy to program yet relatively inefficient and one that is relatively efficient but more complex to program.

18.2.1 Linear Search

The linear search algorithm searches each element in an array sequentially. If the search key does not match an element in the array, the algorithm tests each element and, when the end of the array is reached, informs the user that the search key is not present. If the search key is in the array, the algorithm tests each element until it finds one that matches the search key and returns the index of that element.

As an example, consider an array containing the following values


34   56   2   10   77   51   93   30   5   52

and a method that is searching for 51. Using the linear search algorithm, the method first checks whether 34 (at index 0) matches the search key. It does not, so the algorithm checks whether 56 (at index 1) matches the search key. The method continues moving through the array sequentially, testing 2 (at index 2), then 10 (at index 3), then 77 (at index 4). When the method tests 51 (at index 5), which matches the search key, the method returns 51’s location (5) in the array. If, after checking every array element, the method determines that the search key does not match any element in the array, the method returns -1. If there are duplicate values in the array, linear search returns the index of the first element in the array that matches the search key.

Performing a Linear Search

Figure 18.2 implements the linear search algorithm. Method Main declares as local variables a Random object (line 10) to produce random int values and the int array data (line 11) to store randomly generated ints. Lines 14–17 store in data random ints in the range 1099. Line 19 displays the array using string method Join to create a string with the array’s elements separated by spaces. Join’s first argument is a delimiter that’s placed between each pair of elements and the second argument is the array. Lines 22–23 prompt for and input the search key. Lines 26–45 loop until the user enters -1. Line 29 calls method LinearSearch to determine whether searchInt is in the array. If so, LinearSearch returns the element’s index, which is displayed in lines 33–34. If not, LinearSearch returns -1, and lines 38–39 notify the user. Lines 43–44 prompt for and input the next search key.

Fig. 18.2 Class that contains an array of random integers and a method that searches that array sequentially.

Alternate View

  1   // Fig. 18.2: LinearSearchTest.cs
  2   // Class that contains an array of random integers and a method
  3   // that searches that array sequentially.
  4   using System;
  5
  6   public class LinearSearchTest
  7   {
  8      static void Main()
  9      {
 10         var generator = new Random();
 11         var data = new int[10]; // create space for array
 12
 13         // fill array with random ints in range 10–99
 14         for (var i = 0; i < data.Length; ++i)
 15         {
 16            data[i] = generator.Next(10, 100);
 17         }
 18
 19         Console.WriteLine(string.Join(" ", data) + "
"); // display array
 20
 21         // input first int from user
 22         Console.Write("Enter an integer value (−1 to quit): ");
 23         var searchInt = int.Parse(Console.ReadLine());
 24
 25         // repeatedly input an integer; −1 terminates the app
 26         while (searchInt != −1)
 27         {
 28            // perform linear search
 29            int position = LinearSearch(data, searchInt);
 30
 31            if (position != −1) // integer was found
 32            {
 33               Console.WriteLine($"The integer {searchInt} was found in " +
 34                  $"position {position}.
");
 35            }
 36            else // integer was not found 
 37            {
 38               Console.WriteLine(
 39                  $"The integer {searchInt} was not found.
");
 40            }
 41
 42            // input next int from user 
 43            Console.Write("Enter an integer value (−1 to quit): ");
 44            searchInt = int.Parse(Console.ReadLine());
 45         }
 46      }
 47
 48      // perform a linear search on the data
 49      public static int LinearSearch(int[] values, int searchKey)
 50      { 
 51         // loop through array sequentially  
 52         for (var index = 0; index < values.Length; ++index)
 53         { 
 54            if (values[index] == searchKey)
 55            {
 56               return index; // return the element's index
 57            } 
 58         }
 59  
 60         return −1; // integer was not found
 61      } 
 62   }

64 90 84 62 28 68 55 27 78 73

Enter an integer value (−1 to quit): 78
The integer 78 was found in position 8.

Enter an integer value (−1 to quit): 64
The integer 64 was found in position 0.

Enter an integer value (−1 to quit): 65
The integer 65 was not found.

Enter an integer value (−1 to quit): −1

Method LinearSearch

Lines 49–61 perform the linear search. The search key is passed to parameter searchKey. Lines 52–58 loop through the elements in the array. Line 54 compares the current element in the array with searchKey. If the values are equal, line 56 returns the index of the element. If the loop ends without finding the value, line 60 returns −1.

Efficiency of Linear Search

Searching algorithms all accomplish the same goal—finding an element that matches a given search key, if such an element exists. Many things, however, differentiate search algorithms from one another. The major difference is the amount of effort required to complete the search. One way to describe this effort is with Big O notation, which is a measure of the worst-case runtime for an algorithm—that is, how hard an algorithm may have to work to solve a problem. For searching and sorting algorithms, this is particularly dependent on how many elements there are in the data set and the algorithm used.

Constant Runtime

Suppose an algorithm is designed to test whether the first element of an array is equal to the second element. If the array has 10 elements, this algorithm requires one comparison. If the array has 1,000 elements, this algorithm still requires one comparison. In fact, this algorithm is completely independent of the number of elements in the array, and is thus said to have a constant runtime, which is represented in Big O notation as O(1). An algorithm that is O(1) does not necessarily require only one comparison. O(1) just means that the number of comparisons is constant—it does not grow as the size of the array increases. An algorithm that tests whether the first element of an array is equal to any of the next three elements is still O(1), even though it requires three comparisons.

Linear Runtime

An algorithm that tests whether the first element of an array is equal to any of the other elements of the array will require at most n1 comparisons, where n is the number of elements in the array. If the array has 10 elements, this algorithm requires up to nine comparisons. If the array has 1,000 elements, this algorithm requires up to 999 comparisons. As n grows larger, the n part of the expression dominates, and subtracting one becomes inconsequential. Big O is designed to highlight these dominant terms and ignore terms that become unimportant as n grows. For this reason, an algorithm that requires a total of n1 comparisons (such as the one we described earlier) is said to be O(n). An O(n) algorithm is referred to as having a linear runtime. O(n) is often pronounced “on the order of n” or more simply “order n.”

Quadratic Runtime

Now suppose you have an algorithm that tests whether any element of an array is duplicated elsewhere in the array. The first element must be compared with every other element in the array. The second element must be compared with every other element except the first (it was already compared to the first). The third element must be compared with every other element except the first two. In the end, this algorithm will end up making (n1)+(n2)++2+1 or n2/2n/2 comparisons. As n increases, the n2 term dominates and the n term becomes inconsequential. Again, Big O notation highlights the n2 term, leaving n2/2. But as we’ll soon see, constant factors are omitted in Big O notation.

Big O is concerned with how an algorithm’s runtime grows in relation to the number of items processed. Suppose an algorithm requires n2 comparisons. With four elements, the algorithm will require 16 comparisons; with eight elements, the algorithm will require 64 comparisons. With this algorithm, doubling the number of elements quadruples the number of comparisons. Consider a similar algorithm requiring n2/2 comparisons. With four elements, the algorithm will require eight comparisons; with eight elements, 32 comparisons. Again, doubling the number of elements quadruples the number of comparisons. Both of these algorithms grow as the square of n, so Big O ignores the constant, and both algorithms are considered to be O(n2), referred to as quadratic runtime and pronounced “on the order of n-squared” or more simply “order n-squared.”

When n is small, O(n2) algorithms (running on today’s billions-of-operations-per-second personal computers) will not noticeably affect performance. But as n grows, you’ll start to notice the performance degradation. An O(n2) algorithm running on a million-element array would require a trillion “operations” (where each could actually require several machine instructions to execute). This could require many minutes to execute. A billion-element array would require a quintillion operations, a number so large that the algorithm could take decades! O(n2) algorithms are easy to write, as you’ll see shortly. You’ll also see algorithms with more favorable Big O measures. These efficient algorithms often take more cleverness and effort to create, but their superior performance can be well worth the extra effort, especially as n gets large.

Linear Search Runtime

The linear search algorithm runs in O(n) time. The worst case in this algorithm is that every element must be checked to determine whether the search item exists in the array. If the size of the array is doubled, the number of comparisons that the algorithm must perform is also doubled. Linear search can provide outstanding performance if the element matching the search key happens to be at or near the front of the array. But we seek algorithms that perform well, on average, across all searches, including those where the element matching the search key is near the end of the array.

Linear search is the easiest search algorithm to program, but it can be slow compared to other search algorithms. If an app needs to perform many searches on large arrays, it may be better to implement a different, more efficient algorithm, such as the binary search, which we present in the next section.

Performance Tip 18.1

Sometimes the simplest algorithms perform poorly. Their virtue is that they’re easy to program, test and debug. Sometimes more complex algorithms are required to realize maximum performance.

18.2.2 Binary Search

The binary search algorithm is more efficient than the linear search algorithm, but it requires that the array first be sorted. The first iteration of this algorithm tests the middle element in the array. If this matches the search key, the algorithm ends. Assuming the array is sorted in ascending order, if the search key is less than the middle element, the search key cannot match any element in the second half of the array and the algorithm continues with only the first half of the array (i.e., the first element up to, but not including, the middle element). If the search key is greater than the middle element, the search key cannot match any element in the first half of the array, and the algorithm continues with only the second half of the array (i.e., the element after the middle element through the last element). Each iteration tests the middle value of the remaining portion of the array, called a subarray. A subarray can have no elements, or it can encompass the entire array. If the search key does not match the element, the algorithm eliminates half of the remaining elements. The algorithm ends by either finding an element that matches the search key or reducing the subarray to zero size.

As an example, consider the sorted 15-element array


2  3  5  10  27  30  34  51  56  65  77  81  82  93  99

and a search key of 65. The binary search algorithm would first check whether 51 is the search key, because 51 is the array’s middle element. The search key 65 is larger than 51, so 51 is “discarded” (i.e., eliminated from consideration) along with the first half of the array (all elements smaller than 51), leaving the following subarray to be searched:


56  65  77  81  82  93  99

Next, the algorithm checks whether the subarray’s middle element (81) matches the search key—65 is smaller than 81, so 81 is discarded along with the elements larger than 81, leaving the following subarray to be searched:


56  65  77

After just two tests, the algorithm has narrowed the search from 15 elements to three. The algorithm then checks 65 (which indeed matches the search key) and returns that array element’s index. This algorithm required just three comparisons to determine whether the search key matched an element in the array. Using a linear search algorithm would have required 10 comparisons. [Note: In this example, we have chosen to use an array with 15 elements so that there will always be an obvious middle element in the array. With an even number of elements, the middle of the array lies between two elements. We implement the algorithm to choose the higher of the two elements.]

Implementing Binary Search

Figure 18.3 declares class BinarySearchTest, which contains static methods Main, BinarySearch and DisplayElements (which helps visualize the binary search algorithm). Method Main has as local variables a Random object named generator (line 10) to produce random int values and the int array data (line 11) to store randomly generated ints. Lines 14–17 generate random ints in the range 1099 and store them in data. Recall that a binary search requires an array’s elements to be sorted. For this reason, line 19 calls class Array’s static method Sort for the array data—this sorts the elements in an array in ascending order. Line 20 outputs the array’s contents by calling method DisplayElements (lines 86–98). Lines 23–24 prompt the user for the search key and store it in searchInt. Lines 27–46 loop until the user enters −1. Line 30 calls method BinarySearch to determine whether searchInt is in the array. If searchInt is found, BinarySearch returns the element’s index, which is displayed in lines 34–35. If searchInt is not in the array, BinarySearch returns −1, and lines 39–40 notify the user. Lines 44–45 prompt for and retrieve the next integer from the user.

Fig. 18.3 Class that contains an array of random integers and a method that uses binary search to find an integer.

Alternate View

 1   // Fig. 18.3: BinarySearchTest.cs 
 2   // Class that contains an array of random integers and a method 
 3   // that uses binary search to find an integer. 
 4   using System;
 5
 6   public class BinarySearchTest
 7   {
 8      static void Main()
 9      {
10         var generator = new Random();
11         var data = new int[15]; // create space for array 
12
13         // fill array with random ints in range 10–99 
14         for (var i = 0; i < data.Length; ++i)
15         {
16            data[i] = generator.Next(10, 100);
17         }
18
19         Array.Sort(data); // elements must be sorted in ascending order
20         DisplayElements(data, 0, data.Length − 1); // display array 
21
22         // input first int from user 
23         Console.Write("
Please enter an integer value (−1 to quit): ");
24         int searchInt = int.Parse(Console.ReadLine());
25
26         // repeatedly input an integer; −1 terminates the app 
27         while (searchInt != −1)
28         {
29            // perform binary search 
30            int position = BinarySearch(data, searchInt);
31
32            if (position != −1) // integer was found
33            {
34               Console.WriteLine($"The integer {searchInt} was found in " +
35                  $"position {position}.
");
36            }
37            else // integer was not found 
38            {
39               Console.WriteLine(
40                  $"The integer {searchInt} was not found.
");
41            }
42
43            // input next int from user 
44            Console.Write("Please enter an integer value (−1 to quit): ");
45            searchInt = int.Parse(Console.ReadLine());
46         }
47      }
48
49      // perform a binary search on the data  
50      public static int BinarySearch(int[] values, int searchElement)     
51      {                                                                   
52         var low = 0; // low end of the search area                       
53         var high = values.Length − 1; // high end of the search area     
54         var middle = (low + high + 1) / 2; // middle element             
55                                                                          
56         do // loop to search for element                                 
57         {                                                                
58            // display remaining elements of array                        
59            DisplayElements(values, low, high);                           
60                                                                          
61            // indicate current middle; pad left with spaces for alignment
62            Console.WriteLine("-- ".PadLeft((middle + 1) * 3));           
63                                                                          
64            // if the element is found at the middle                      
65            if (searchElement == values[middle])                          
66            {                                                             
67               return middle; // search key found, so return its index    
68            }                                                             
69            // middle element is too high                                 
70            else if (searchElement < values[middle])                      
71            {                                                             
72               high = middle − 1; // eliminate the higher half            
73            }                                                             
74            else // middle element is too low                             
75            {                                                             
76               low = middle + 1; // eliminate the lower half              
77            }                                                             
78                                                                          
79            middle = (low + high + 1) / 2; // recalculate the middle      
80         } while (low <= high);                                           
81                                                                          
82         return −1; // search key was not found                           
83      }                                                                   
84
85      // method to output certain values in array 
86      public static void DisplayElements(int[] values, int low, int high)
87      {
88         // output three spaces for each element up to low for alignment 
89         Console.Write(string.Empty.PadLeft(low * 3));
90
91         // output elements left in array 
92         for (var i = low; i <= high; ++i)
93         {
94            Console.Write($"{values[i]} ");
95         }
96
97         Console.WriteLine();
98       }
99   }

12 17 22 25 30 39 40 52 56 72 76 82 84 91 93

Please enter an integer value (−1 to quit):  72
12 17 22 25 30 39 40 52 56 72 76 82 84 91 93
                      *
                        56 72 76 82 84 91 93
                                  *
                        56 72 76
                            *
The integer 72 was found in position 9.

Please enter an integer value (−1 to quit):  13
12 17 22 25 30 39 40 52 56 72 76 82 84 91 93
                       *
12 17 22 25 30 39 40
          *
12 17 22
    *
12
 *
The integer 13 was not found.

Please enter an integer value (−1 to quit):  −1

The first line of output from this app is the array of ints, in increasing order. When the user instructs the app to search for 72, the app first tests the middle element (indicated by -- below the element in the output), which is 52. The search key is greater than 52, so the app eliminates from consideration the first half of the array and tests the middle element from the second half. The search key is smaller than 82, so the app eliminates from consideration the second half of the subarray, leaving only three elements. Finally, the app checks 72 (which matches the search key) and returns the index 9. string method PadLeft (lines 62 and 89) adds the specified number of spaces to the beginning of the string on which the method is called and returns a new string containing the results.

Method BinarySearch

Lines 50–83 declare method BinarySearch, which receives as parameters the array to search and the search key. Lines 52–54 calculate the low end index, high end index and middle index of the portion of the array being searched. When the method is first called, the low index is 0, the high index is the length of the array minus 1 and the middle is the average of these two values. Lines 56–80 loop until low is greater than high (this occurs when the element is not found). Line 65 tests whether the value in the middle element is equal to searchElement. If this is true, line 67 returns middle—the search key’s location—to the caller. Each iteration of the loop tests a single value (line 65 and 70) and eliminates half of the remaining values in the array (line 72 or 76), then adjusts the middle index (line 79) for the next iteration of the loop. Lines 59 and 62 in this program—and similar statements in the chapters’ remaining programs—help visualize the algorithm’s operation. Actual searching and sorting implementations (and other time-critical algorithms) should not include output statements.

Efficiency of Binary Search

In the worst-case scenario, searching a sorted array of 1,023 elements causes Binary-Search’s loop to iterate only 10 times. Repeatedly dividing 1,023 by 2 (because after each comparison, we are able to eliminate half of the array) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number 1023(2101) is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary search algorithm. Thus, an array of 1,048,575(2201) elements takes a maximum of 20 comparisons to find the key, and an array of one billion elements (which is less than 2301 takes a maximum of 30 comparisons to find the key. This is a tremendous improvement in performance over the linear search. For a one-billion-element array, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array is the exponent of the first power of 2 greater than the number of elements in the array, which is represented as log2 n. All logarithms grow at roughly the same rate, so in Big O notation the base can be omitted. This results in a big O of O(log n) for a binary search, which is also known as logarithmic runtime.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset