21.4 Dictionary Collections

A dictionary is the general term for a collection of key–value pairs. Section 17.9.2 introduced the generic Dictionary collection. In this section, we discuss fundamentals of how a Dictionary works, then demonstrate the related SortedDictionary collection.

21.4.1 Dictionary Fundamentals

When an app creates objects of new or existing types, it needs to manage those objects efficiently. This includes sorting and retrieving objects. Sorting and retrieving information with arrays is efficient if some aspect of your data directly matches the key value and if those keys are unique and tightly packed. If you have 100 employees with nine-digit social security numbers and you want to store and retrieve employee data by using the social security number as a key, it would nominally require an array with 1,000,000,000 elements, because there are 1,000,000,000 unique nine-digit numbers. If you have an array that large, you could get high performance storing and retrieving employee records by simply using the social security number as the array index, but it would be a huge waste of memory. Many apps have this problem—either the keys are of the wrong type (i.e., not nonnegative integers), or they’re of the right type but are sparsely spread over a large range.

Hashing

What’s needed is a high-speed scheme for converting keys such as social security numbers and inventory part numbers to unique array indices. Then, when an app needs to store something, the scheme could convert the key rapidly to an index and the record of information could be stored at that location in the array. Retrieval occurs the same way—once the app has a key for which it wants to retrieve the data record, the app simply applies the conversion to the key, which produces the array index where the data resides in the array and retrieves the data.

The scheme we describe here is the basis of a technique called hashing. Why the name? Because, when we convert a key into an array index, we literally scramble the bits, making a “hash” of the number. The number actually has no real significance beyond its usefulness in storing and retrieving this particular data record. Data structures that use hashing are commonly called hash tables (like class Hashtable in the System.Collections namespace). A hash table is one way to implement a dictionary—class Dictionary<K,V> in the System.Collections.Generic namespace is implemented as a hash table.

Collisions

A glitch in the scheme occurs when there are collisions (i.e., two different keys “hash into” the same cell, or element, in the array). Since we cannot sort two different data records to the same space, we need to find an alternative home for all records beyond the first that hash to a particular array index. One scheme for doing this is to “hash again” (i.e., to re-apply the hashing transformation to the key to provide a next candidate cell in the array). The hashing process is designed so that with just a few hashes, an available cell will be found.

Another scheme uses one hash to locate the first candidate cell. If the cell is occupied, successive cells are searched linearly until an available cell is found. Retrieval works the same way—the key is hashed once, the resulting cell is checked to determine whether it contains the desired data. If it does, the search is complete. If it does not, successive cells are searched linearly until the desired data is found.

The most popular solution to hash-table collisions is to have each cell of the table be a hash “bucket”—typically, a linked list of all the key–value pairs that hash to that cell. This is the solution that the .NET Framework’s Dictionary class implements.

Load Factor

The load factor affects the performance of hashing schemes. The load factor is the ratio of the number of objects stored in the hash table to the total number of cells of the hash table. As this ratio gets higher, the chance of collisions tends to increase.

Performance Tip 21.1

The load factor in a hash table is a classic example of a space/time trade-off: By increasing the load factor, we get better memory utilization, but the app runs slower due to increased hashing collisions. By decreasing the load factor, we get better speed because of reduced hashing collisions, but we get poorer memory utilization because a larger portion of the hash table remains empty.

Hash Function

A hash function performs a calculation that determines where to place data in the hash table. The hash function is applied to the key in a key–value pair of objects. Any object can be used as a key. For this reason, class object defines method GetHashCode, which all objects inherit. Most classes that are candidates to be used as keys in a hash table, such as string, override this method to provide one that performs efficient hash-code calculations for a specific type.

21.4.2 Using the SortedDictionary Collection

The .NET Framework provides several implementations of dictionaries that implement the IDictionary<K,V> interface (described in Fig. 21.1). The app in Fig. 21.4 demonstrates the generic class SortedDictionary. Unlike class Dictionary—which is implemented as a hash table—class SortedDictionary stores its key–value pairs in a binary search tree (introduced in Section 19.7). As the class name suggests, the entries in Sort-edDictionary are sorted by key. For key types that implement IComparable<T>, the SortedDictionary uses the results of IComparable<T> method CompareTo to sort the keys. Despite these implementation details, we use the same public methods, properties and indexers with classes Dictionary and SortedDictionary. In many scenarios, these classes are interchangeable—this is the beauty of object-oriented programming.

Performance Tip 21.2

Because class SortedDictionary keeps its elements sorted in a binary tree, obtaining or inserting a key–value pair takes O(log n) time, which is fast compared to linear searching, then inserting.

Fig. 21.4 App counts the number of occurrences of each word in a string and stores them in a generic sorted dictionary.

Alternate View

 1   // Fig. 21.4: SortedDictionaryTest.cs
 2   // App counts the number of occurrences of each word in a string
 3   // and stores them in a generic sorted dictionary.
 4   using System;
 5   using System.Text.RegularExpressions;
 6   using System.Collections.Generic;
 7
  8   class SortedDictionaryTest
 9   {
10      static void Main()
11      {
12         // create sorted dictionary based on user input
13         SortedDictionary<string, int> dictionary = CollectWords();
14
15         DisplayDictionary(dictionary); // display sorted dictionary content
16      }
17
18      // create sorted dictionary from user input
19      private static SortedDictionary<string, int> CollectWords()
20      {
21         // create a new sorted dictionary
22         var dictionary = new SortedDictionary<string, int>();
23
24         Console.WriteLine("Enter a string: "); // prompt for user input
25         string input = Console.ReadLine(); // get input
26
27         // split input text into tokens
28         string[] words = Regex.Split(input, @"s+");
29
30         // processing input words
31         foreach (var word in words)
32         {
33            var key = word.ToLower(); // get word in lowercase
34
35            // if the dictionary contains the word
36            if (dictionary.ContainsKey(key))
37            {
38               ++dictionary[key];
39            }
40            else
41            {
42               // add new word with a count of 1 to the dictionary
43               dictionary.Add(key, 1);
44            }
45         }
46
47         return dictionary;
48      }
49
50      // display dictionary content
51      private static void DisplayDictionary<K, V>(
52         SortedDictionary<K, V> dictionary)       
53      {
54         Console.WriteLine(
55            $"
Sorted dictionary contains:
{"Key",-12}{"Value",-12}");
56
57         // generate output for each key in the sorted dictionary
58         // by iterating through the Keys property with a foreach statement
59         foreach (var key in dictionary.Keys)
60         {
61            Console.WriteLine($"{key,-12}{dictionary[key] ,-12}");
62         }
63
64         Console.WriteLine($"
size: {dictionary.Count}");
65      }
66   }

Enter a string:
We few, we happy few, we band of brothers

Sorted dictionary contains:
Key         Value
band        1
brothers    1
few,        2
happy       1
of          1
we          3

size: 6

Lines 4–6 contain using directives for namespaces System (for class Console), System.Text.RegularExpressions (for class Regex) and System.Collections.Generic (for class SortedDictionary). The generic class SortedDictionary takes two type arguments:

  • the first specifies the type of key (i.e., string) and

  • the second the type of value (i.e., int).

Class SortedDictionaryTest declares three static methods:

  • Method CollectWords (lines 19–48) inputs a sentence and returns a Sorted-Dictionary<string, int> in which the keys are the words in the sentence and the values are the number of times each word appears in the sentence.

  • Method DisplayDictionary (lines 51–65) displays the SortedDictionary passed to it in column format.

  • Method Main (lines 10–16) simply invokes CollectWords (line 13), then passes the SortedDictionary<string, int> returned by CollectWords to Display-Dictionary in line 15.

Method CollectWords

Method CollectWords (lines 19–48) begins by initializing local variable dictionary with a new SortedDictionary<string, int> (line 22). Lines 24–25 prompt the user and input a sentence as a string. We use static method Split of class Regex (introduced in the online section of Chapter 16) in line 28 to divide the string by its whitespace characters. In the regular expression s+, s means whitespace and + means one or more of the expression to its left—so the words are separated by one or more whitespace characters, which are discarded. This creates an array of “words,” which we then store in local variable words.

SortedDictionary Methods ContainsKey and Add

Lines 31–45 iterate through the array words. Each word is converted to lowercase with string method ToLower, then stored in variable key (line 33). Next, line 36 calls Sorted-Dictionary method ContainsKey to determine whether the word is in the dictionary. If so, that word occurred previously in the sentence. If the SortedDictionary does not contain an entry for the word, line 43 uses SortedDictionary method Add to create a new entry in the dictionary, with the lowercase word as the key and 1 as the value.

Common Programming Error 21.2

Using the Add method to add a key that already exists in the hash table causes an ArgumentException.

SortedDictionary Indexer

If the word is already a key in the hash table, line 38 uses the SortedDictionary’s indexer to obtain and set the key’s associated value (the word count) in the dictionary. Using the set accessor with a key that does not exist in the hash table creates a new entry, as if you had used the Add method, so line 43 could have been written as


dictionary[key] = 1;

Common Programming Error 21.3

Invoking the get accessor of a SortedDictionary indexer with a key that does not exist in the collection causes a KeyNotFoundException.

Method DisplayDictionary

Line 47 returns the dictionary to the Main method, which then passes it to method DisplayDictionary (lines 51–65), which displays all the key–value pairs. This method uses read-only property Keys (line 59) to get an ICollection<T> that contains all the keys. Because the interface ICollection<T> extends IEnumerable<T>, we can use this collection in the foreach statement in lines 59–62 to iterate over the keys. This loop accesses and outputs each key and its corresponding value using the iteration variable and the Sorted-Dictionary indexer’s get accessor. Each key and value is displayed left aligned in a field width of 12 positions. Because a SortedDictionary stores its key–value pairs in a binary search tree, the key–value pairs are displayed with the keys in sorted order. Line 64 uses SortedDictionary property Count to get the number of key–value pairs in the dictionary.

Iterating Over a SortedDictionary’s KeyValuePairs

Lines 59–62 could have also used the foreach statement with the SortedDictionary object itself, instead of using the Keys property. If you use a foreach statement with a SortedDictionary object, the iteration variable will be of type KeyValuePair<K,V>. The enumerator of a SortedDictionary uses the KeyValuePair<K,V> struct value type to store key–value pairs. KeyValuePair<K,V> provides properties Key and Value for retrieving the key and value of the current element.

SortedDictionary’s Values Property

If you do not need the keys, class SortedDictionary also provides a read-only Values property that gets an ICollection<T> of all the values stored in the SortedDictionary. You could use this property to iterate through the values stored in the SortedDictionary without regard for their corresponding keys.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset