13
Hash Tables

13.1. Introduction

The data structures of binary search trees, AVL trees, B trees, tries, red-black trees and splay trees discussed so far in the book (Volume 2) are tree-based data structures. These are nonlinear data structures and serve to capture the hierarchical relationship that exists between the elements forming the data structure. However, there are applications that deal with linear or tabular forms of data, devoid of any superior-subordinate relationship. In such cases, employing these data structures would be superfluous. Hash tables are one among such data structures which favor efficient storage and retrieval of data elements that are linear in nature.

13.1.1. Dictionaries

A dictionary is a collection of data elements, each uniquely identified by a field called a key. The ADT of a dictionary is defined as a set of elements with distinct keys supporting the operations of search, insert, delete and create (which creates an empty dictionary). While most dictionaries deal with distinct keyed elements, it is not uncommon to find applications calling for dictionaries with duplicate or repeated keys. In this case, it is essential that the dictionary adopts rules to resolve the ambiguity that may arise while searching for or deleting data elements with duplicate keys.

A dictionary supports both sequential and random access. Sequential access is one in which the data elements of the dictionary are ordered and accessed according to the order of the keys (ascending or descending, for example). Random access is one in which the data elements of the dictionary are accessed according to no particular order.

Hash tables are ideal data structures for dictionaries. In this chapter, we introduce the concept of hashing and hash functions. The structure and operations of hash tables are also discussed. The various methods of collision resolution, for example, linear open addressing and chaining, and their performance analyses are detailed. Finally, the applications of hash tables in the fields of compiler design, relational database query processing and file organization are discussed.

13.2. Hash table structure

A hash function H(X) is a mathematical function which, when given a key X of the dictionary D, maps it to a position P in a storage table termed a hash table. The process of mapping the keys to their respective positions in the hash table is called hashing. Figure 13.1 illustrates a hash function.

When the data elements of the dictionary are to be stored in the hash table, each key X_i is mapped to a position P_i in the hash table as determined by the value of H(X_i), that is, P_i = H(X_i). To search for a key X in the hash table all that one does is determine the position P by computing P = H(X) and accessing the appropriate data element. In the case of insertion of a key X or its deletion, the position P in the hash table where the data element needs to be inserted or from where it is to be deleted respectively, is determined by computing P = H(X).

If the hash table is implemented using a sequential data structure, for example, arrays, then the hash function H(X) may be so chosen to yield a value that corresponds to the index of the array. In such a case the hash function is a mere mapping of the keys to the array indices.

The computations of the positions of the keys in the hash table are shown below:

Key XYmn	*H(XYmn)*	Position of the key in the hash table
`AB12`	ord(A)	1
`VP99`	ord(V)	22
`RK32`	ord(R)	18
`CG45`	ord(C)	3
`KL78`	ord(K)	11
`OW31`	ord(O)	15
`ST65`	ord(S)	19
`EX44`	ord(E)	5

The hash table accommodating the data elements appears as shown below:

1	`AB12`	……….
2	`….`
3	`CG45`
4	`….`
5	`EX44`	………
….	`…..`
11	`KL78`
…
15	`OW31`	………….
…
18	`RK32`
19	`ST65`	……………
….
22	`VP99`	….
…	`…`	…
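The mapping tabulated above can be sketched in Python. The table suggests that the hash function takes the alphabetical position of the first letter of the key (A = 1, ..., Z = 26); that reading, and the names `ord_alpha` and `hash_first_letter`, are assumptions made for illustration.

```python
def ord_alpha(ch: str) -> int:
    """Position of an uppercase letter in the alphabet (A = 1, ..., Z = 26)."""
    return ord(ch) - ord("A") + 1


def hash_first_letter(key: str) -> int:
    """Hash a key of the form XYmn to the alphabet position of X."""
    return ord_alpha(key[0])


keys = ["AB12", "VP99", "RK32", "CG45", "KL78", "OW31", "ST65", "EX44"]

# Positions 1..26 are used; index 0 stays unused so that indices match the table.
hash_table = [None] * 27
for key in keys:
    hash_table[hash_first_letter(key)] = key
```

Because each key here starts with a different letter, no two keys compete for the same position.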

In Example 13.1, it was assumed that the hash function yields distinct values for the individual keys. If this were to be followed as a criterion, the situation may get out of hand since, in the case of dictionaries with a very large set of data elements, the hash table can become too large to be handled efficiently. Therefore, it is convenient to choose hash functions that yield values lying within a limited range so as to restrict the length of the table. This would consequently imply that the hash functions may yield identical values for a set of keys. In other words, a set of keys could be mapped to the same position in the hash table.

Let X₁, X₂, …, X_n be the n keys that are mapped to the same position P in the hash table. Then, H(X₁) = H(X₂) = … = H(X_n) = P. In such a case, X₁, X₂, …, X_n are called synonyms. The act of two or more synonyms vying for the same position in the hash table is known as a collision. Naturally, this entails a modification in the structure of the hash table to accommodate the synonyms. The two important methods of linear open addressing and chaining to handle synonyms are presented in sections 13.4 and 13.5, respectively.

13.3. Hash functions

The choice of the hash function plays a significant role in the structure and performance of the hash table. It is therefore essential that a hash function satisfies the following characteristics:

easy and quick to compute;
even distribution of keys across the hash table. In other words, a hash function must minimize collisions.

13.3.1. Building hash functions

The following are some of the methods of obtaining hash functions:

Folding: The key is first partitioned into two or three or more parts. Each of the individual parts is combined using any of the basic arithmetic operations such as addition or multiplication. The resultant number could be conveniently manipulated, for example, truncated, to finally arrive at the index where the key is to be stored. Folding assures a better spread of keys across the hash table.
Example: Consider a six-digit numerical key: 719532. We choose to partition the key into three parts of two digits each, that is, 71 | 95 | 32, and merely add the numerical equivalent of each of the parts, that is, 71 + 95 + 32 = 198. Truncating the result yields 98 which is chosen as the index of the hash table where the key 719532 is to be accommodated.
Truncation: In this method, the selective digits of the key are extracted to determine the index of the hash table where the key needs to be accommodated. In the case of alphabetical keys, their numerical equivalents may be considered. Truncation though quick to compute does not ensure even distribution of keys.
Example: Consider a group of six-digit numerical keys that need to be accommodated in a hash table with 100 locations. We choose to select digits in positions 3 and 6 to determine the index where the key is to be stored. Thus, key 719532 would be stored in location 92 of the hash table.
Modular arithmetic: This is a popular method in which the size of the hash table L is involved in the computation of the hash function. The function makes use of modulo arithmetic. Let k be the numerical key or the numerical equivalent if it is an alphabetical key. The hash function is given by H(k) = k mod L.

The hash function evidently returns a value that lies between 0 and L-1. Choosing L to be a prime number generally gives a more even distribution of keys.

Example: Consider a group of six-digit numerical keys that need to be stored in a hash table of size 111. For a key 145682, H(k) = 145682 mod 111 = 50. Hence, the key is stored in location 50 of the hash table.
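The three methods can be sketched as small Python functions that reproduce the worked examples above. The function names and parameters are illustrative, and the sketch assumes non-negative integer keys without leading zeros.

```python
def fold_hash(key: int, part_digits: int = 2, keep_digits: int = 2) -> int:
    """Split the decimal key into parts, add them, and keep the low-order digits."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    return sum(parts) % 10 ** keep_digits    # truncation of the sum


def truncate_hash(key: int, positions=(3, 6)) -> int:
    """Build the index from the digits found at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


def mod_hash(key: int, table_size: int) -> int:
    """H(k) = k mod L, which always lies between 0 and L-1."""
    return key % table_size
```

With these definitions, `fold_hash(719532)` gives 98, `truncate_hash(719532)` gives 92 and `mod_hash(145682, 111)` gives 50, in agreement with the examples.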

13.4. Linear open addressing

Let us suppose a group of keys is to be inserted into a hash table HT of size L, making use of the modulo arithmetic function H(k) = k mod L. Since the range of the hash table index is limited to lie between 0 and L-1, for a population of N (N > L) keys, collisions are bound to occur. Hence, a provision needs to be made in the hash table to accommodate the data elements that are synonyms.

We choose to adopt a sequential data structure to accommodate the hash table. Let HT[0: L-1] be the hash table. Here, the L locations of the hash table are termed buckets. Every bucket provides accommodation for the data elements. However, to accommodate synonyms, that is, keys that map to the same bucket, it is essential that a provision be made in the buckets. We, therefore, partition buckets into what are called slots to accommodate synonyms. Thus, if bucket b has s slots, then s synonyms can be accommodated in bucket b. In the case of an array implementation of a hash table, the rows of the array indicate buckets and the columns the slots. In such a case, the hash table is represented as HT[0:L-1, 0:s-1]. The number of slots in a bucket needs to be decided based on the application. Figure 13.2 illustrates a general hash table implemented using a sequential data structure.

**Figure 13.2** *Hash table implemented using a sequential data structure*

EXAMPLE 13.2.–

Let us consider a set of keys {45, 98, 12, 55, 46, 89, 65, 88, 36, 21} to be represented as a hash table as shown in Figure 13.2. Let us suppose the hash function H is defined as H(X) = X mod 11. The hash table, therefore, has 11 buckets. We propose three slots per bucket. Table 13.1 shows the hash function values of the keys and Figure 13.3 shows the structure of the hash table.

Table 13.1 Hash function values of the keys (Example 13.2)

Key X	45	98	12	55	46	89	65	88	36	21
H(X)	1	10	1	0	2	1	10	0	3	10

Observe how the keys {45, 12, 89}, {98, 65, 21} and {55, 88} are synonyms mapping to buckets 1, 10 and 0, respectively. The provision of three slots per bucket makes it possible to accommodate the synonyms.

Figure 13.3 Hash table (Example 13.2)

Now, what happens if a synonym is unable to find a slot in the bucket? In other words, if the bucket is full, then where do we find a place for the synonyms? In such a case an overflow is said to have occurred. All collisions need not result in overflows. But in the case of a hash table with single slot buckets, collisions mean overflows.

The bucket to which the key is mapped by the hash function is known as the home bucket. To tackle overflows we move further down, beginning from the home bucket, look for the closest slot that is empty and place the key in it. Such a method of handling overflows is known as linear probing, linear open addressing or closed hashing.

EXAMPLE 13.3.–

Let us proceed to insert the keys {77, 34, 43} in the hash table discussed in Example 13.2. The hash function values of the keys are {0, 1, 10}. When we proceed to insert 77 in its home bucket 0, we find a slot is available and hence the insertion is done. In the case of 34, its home bucket 1 is full and hence there is an overflow. By linear probing, we look for the closest slot that is vacant and find one in the second slot of bucket 2. While inserting 43, we find bucket 10 to be full. The search for the closest empty slot proceeds by moving downward in a circular fashion until it finds a vacant place in slot 3 of bucket 2. Note the circular movement of searching the hash table while looking for an empty slot. Figure 13.4 illustrates the linear probing method undertaken for the listed keys. The keys which have been accommodated in places other than their home buckets are shown over a grey background.

Figure 13.4 Linear open addressing (Example 13.3)
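Examples 13.2 and 13.3 can be traced with a minimal Python sketch of a bucket-and-slot table. The helper names `make_table` and `insert` are illustrative; the hash function is taken to be the key modulo the number of buckets, and slots are numbered from 0 in the code, whereas the text counts them from 1.

```python
def make_table(buckets: int, slots: int):
    """A table HT[0:buckets-1, 0:slots-1]; None marks an empty slot."""
    return [[None] * slots for _ in range(buckets)]


def insert(table, key: int):
    """Insert key by linear probing; return its (bucket, slot) position."""
    buckets = len(table)
    home = key % buckets
    for step in range(buckets):
        b = (home + step) % buckets          # wrap around in a circular fashion
        for s, occupant in enumerate(table[b]):
            if occupant is None:
                table[b][s] = key
                return b, s
    raise OverflowError("hash table is full")


table = make_table(buckets=11, slots=3)
for k in [45, 98, 12, 55, 46, 89, 65, 88, 36, 21]:    # Example 13.2
    insert(table, k)

placed = {k: insert(table, k) for k in [77, 34, 43]}  # Example 13.3
```

Here `placed` records that 77 stays in its home bucket 0, while 34 and 43 both end up in bucket 2 (slot indices 1 and 2), as described in Example 13.3.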

13.4.1. Operations on linear open addressed hash tables

Search: Searching for a key in a linear open addressed hash table proceeds on lines similar to that of insertion. However, if the searched key is available in the home bucket then the search is done. The time complexity in such a case is O(1). However, if there had been overflows while inserting the key, then a sequential search has to be called which searches through each slot of the buckets following the home bucket until either (i) the key is found or (ii) an empty slot is encountered in which case the search terminates or (iii) the search path has curled back to the home bucket. In the case of (i), the search is said to be successful. In the cases of (ii) and (iii), it is said to be unsuccessful.

EXAMPLE 13.4.–

Consider the snapshot of the hash table shown in Figure 13.5, which represents keys whose first character lies between ‘A’ and ‘I’, both inclusive. The hash function used is H(X) = ord(C) mod 10 where C is the first character of the alphabetical key X.

Figure 13.5 Illustration of search in a hash table

The search for keys F18 and G64 is straightforward since they are present in their home buckets, which are 6 and 7, respectively. The search for keys such as A91 and F78 is slightly more involved: though they are available in their respective home buckets, they are accessed only after a sequential search through the slots of those buckets. On the other hand, the search for I99 fails to find it in its home bucket, which is 9. This triggers a sequential search of every slot following the home bucket until the key is found, in which case the search is successful, or until an empty slot is encountered, in which case the search is a failure. I99 is indeed found in slot 2 of bucket 2! Observe how the search path curls back to the top of the hash table from the home bucket of key I99. Let us now search for the key G93. The search looks into its home bucket (7) before a sequential search is undertaken in the slots following the home bucket. The search stops on encountering an empty slot and is therefore deemed unsuccessful.

Algorithm 13.1 illustrates the search algorithm for a linear open addressed hash table.
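The three termination conditions of the search can be sketched in Python as follows. This is an illustrative sketch, not a transcription of Algorithm 13.1; it assumes integer keys hashed by the key modulo the number of buckets, slots filled from left to right, and no deletions. The sample table is the one obtained at the end of Example 13.3.

```python
def search(table, key: int):
    """Return the (bucket, slot) of key, or None if the search is unsuccessful."""
    buckets = len(table)
    home = key % buckets
    for step in range(buckets):              # at most one full circuit
        b = (home + step) % buckets
        for s, occupant in enumerate(table[b]):
            if occupant == key:
                return b, s                  # (i) key found
            if occupant is None:
                return None                  # (ii) an empty slot ends the search
    return None                              # (iii) back at the home bucket


# Hash table after Examples 13.2 and 13.3 (11 buckets, 3 slots each).
table = [[None] * 3 for _ in range(11)]
table[0] = [55, 88, 77]
table[1] = [45, 12, 89]
table[2] = [46, 34, 43]
table[3] = [36, None, None]
table[10] = [98, 65, 21]
```

Searching for 43 starts at its home bucket 10, wraps around past buckets 0 and 1 and succeeds in bucket 2. Searching for 54, which also hashes to bucket 10, follows the same path and stops at the first empty slot in bucket 3.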

Insert: the insertion of data elements in a linear open addressed hash table is executed as explained in the previous section. The hash function, which is quite often modulo arithmetic based, determines the bucket b and thereafter slot s in which the data element is to be inserted. In the case of overflow, we search for the closest empty slot beginning from the home bucket and accommodate the key in the slot. Algorithm 13.1 could be modified to execute the insert operation. The line

in the algorithm is replaced by

Delete: the delete operation on a hash table can be clumsy. When a key is deleted it cannot be merely wiped off from its bucket (slot). A deletion leaves the slot vacant and if an empty slot is chosen as a signal to terminate a search then many of the elements following the empty slot and displaced from their home buckets may go unnoticed. To tackle this it is essential that the keys following the empty slot be moved up. This can make the whole operation clumsy.

An alternative could be to write a special element in the slot every time a delete operation is done. This special element not only serves to camouflage the empty space ‘available’ in the deleted slot when a search is in progress but also serves to accommodate an insertion when an appropriate element assigned to the slot turns up.
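The special-element idea can be sketched as below. For brevity the sketch assumes single-slot buckets, integer keys and the hash function key mod table size; the class name `ProbedTable` and the marker `DELETED` are hypothetical, and `insert` does not check for duplicate keys.

```python
EMPTY = None
DELETED = object()      # the "special element" written on deletion


class ProbedTable:
    def __init__(self, size: int):
        self.cells = [EMPTY] * size

    def _probe(self, key: int):
        home = key % len(self.cells)
        for step in range(len(self.cells)):
            yield (home + step) % len(self.cells)

    def search(self, key: int):
        for i in self._probe(key):
            if self.cells[i] is EMPTY:       # only a truly empty cell ends the search
                return None
            if self.cells[i] is not DELETED and self.cells[i] == key:
                return i
        return None

    def insert(self, key: int) -> int:
        for i in self._probe(key):
            if self.cells[i] is EMPTY or self.cells[i] is DELETED:
                self.cells[i] = key          # a deleted cell may be reused
                return i
        raise OverflowError("hash table is full")

    def delete(self, key: int) -> bool:
        i = self.search(key)
        if i is None:
            return False
        self.cells[i] = DELETED              # mark the cell instead of emptying it
        return True


t = ProbedTable(11)
for k in (12, 23, 34):                       # all three hash to bucket 1
    t.insert(k)
t.delete(23)
```

After the deletion, a search for 34 still passes over the marked cell and succeeds, and a later insertion of another synonym such as 45 reuses that cell.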

However, it is generally recommended that deletions in a hash table be avoided as much as possible due to their clumsy implementation.

13.4.2. Performance analysis

The complexity of the linear open addressed hash table is dependent on the number of buckets. In the case of hash functions that follow modular arithmetic, the number of buckets is given by the divisor L.

The best case time complexity of searching for a key in a hash table is given by O(1) and the worst case time complexity is given by O(n), where n is the number of data elements stored in the hash table. A worst case occurs when all the n data elements map to the same bucket.

These best and worst case complexities are no better than those of a linear list: searching for an element in a linear list of n elements also takes O(1) in the best case and O(n) in the worst case. However, on average, the hash table performs far better than a linear list. It has been shown that the average case performance of a linear open addressed hash table for an unsuccessful and successful search is given by

U_n ≈ ½ (1 + 1/(1 − α)²) and S_n ≈ ½ (1 + 1/(1 − α)),

where U_n and S_n are the number of buckets examined on an average during an unsuccessful and successful search respectively. The average is considered over all possible sequences of the n keys X₁, X₂, …, X_n. α is the loading factor of the hash table and is given by α = n/b, where b is the number of buckets. The smaller the loading factor, the better the average case performance of the hash table in comparison to that of linear lists.
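To get a feel for the effect of the loading factor, the commonly quoted approximations for linear probing, U_n ≈ ½(1 + 1/(1 − α)²) and S_n ≈ ½(1 + 1/(1 − α)), can be evaluated for a few values of α. The function name `linear_probing_cost` is illustrative, and the approximations are assumed to hold only for α < 1.

```python
def linear_probing_cost(alpha: float):
    """Approximate buckets examined (unsuccessful, successful) at loading factor alpha."""
    if not 0 <= alpha < 1:
        raise ValueError("loading factor must satisfy 0 <= alpha < 1")
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    successful = 0.5 * (1 + 1 / (1 - alpha))
    return unsuccessful, successful


for alpha in (0.5, 0.75, 0.9):
    u, s = linear_probing_cost(alpha)
    print(f"alpha={alpha:.2f}  U_n~{u:.1f}  S_n~{s:.1f}")
```

At α = 0.5 the estimates are 2.5 and 1.5 buckets, whereas at α = 0.9 an unsuccessful search is estimated to examine about 50 buckets, which is why the table is best kept well below full.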

13.4.3. Other collision resolution techniques with open addressing

The drawbacks of linear probing or linear open addressing could be overcome to an extent by employing one or more of the following strategies:

Rehashing

A major drawback of linear probing is clustering or primary clustering wherein the hash table gives rise to long sequences of records with gaps in between the sequences. This leads to longer sequential searches especially when an empty slot needs to be found. The problem could be resolved to an extent by resorting to what is known as rehashing. In this, a second hash function is used to determine the slot where the key is to be accommodated. If the slot is not empty, then another function is called for, and so on.

Thus, rehashing makes use of at least two functions H and H’, where H(X) and H’(X) map keys X to any one of the b buckets. To insert a key, H(X) is computed and the key X is accommodated in the bucket if it is empty. In the case of a collision, the second hash function H’(X) is computed and the search sequence for the empty slot proceeds by computing h_i = (H(X) + i · H’(X)) mod b, i = 1, 2, …

**Algorithm 13.1** *Procedure to search for a key X in a linear open addressed hash table*

Here, h₁, h₂, ... is the search sequence followed before an empty slot is found to accommodate the key. It needs to be ensured that H’(X) does not evaluate to 0, since a zero increment would probe the home bucket over and over. A good choice for H’(X) is M − (X mod M), where M is a prime smaller than the hash table size (see illustrative problem 13.6).

Quadratic probing

This is another method that can substantially reduce clustering. In this method, when a collision occurs at address h, unlike linear probing which probes buckets in locations h+1, h+2, … and so forth, the technique probes buckets at locations h+1, h+4, h+9, … and so forth. In other words, the method probes buckets at locations (h + i²) mod b, i = 1, 2, …, where h is the home bucket and b is the number of buckets. However, there is no guarantee that the method gives a fair chance to probe all locations in the hash table. Though quadratic probing reduces primary clustering, keys that share a home bucket follow the same probe sequence and so keep probing the same set of alternate cells. Such a case, known as secondary clustering, occurs especially when the hash table size is not prime.

If b is a prime number, then quadratic probing probes about half the locations in the hash table ((b + 1)/2 distinct buckets, counting the home bucket). In this case, the method is guaranteed to find an empty slot if the hash table is at least half empty (see illustrative problems 13.4 and 13.5).
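How many buckets quadratic probing can actually reach is easy to check with a short Python sketch. The function name is illustrative; the home bucket is included as the case i = 0.

```python
def quadratic_probe_buckets(home: int, buckets: int):
    """Distinct buckets reached by (home + i*i) mod buckets, i = 0, 1, 2, ..."""
    # i*i mod buckets repeats with period buckets, so one period is enough.
    return sorted({(home + i * i) % buckets for i in range(buckets)})


reach_9 = quadratic_probe_buckets(home=1, buckets=9)     # table size not prime
reach_11 = quadratic_probe_buckets(home=1, buckets=11)   # prime table size
```

With 9 buckets and home bucket 1 only buckets 1, 2, 5 and 8 are ever probed, whereas with 11 buckets six of the eleven buckets are reachable.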

Random probing

Unlike quadratic probing, where the increment during probing is fixed in advance, random probing makes use of a random number generator to obtain the increment and hence the next bucket to be probed. However, it is essential that the random number generator produces the same sequence every time it is invoked for a given key, so that a later search retraces the probes made during insertion. Though this method reduces clustering, it can be a little slow when compared to the others.
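One way to obtain a repeatable sequence, shown here only as an assumption-laden sketch, is to seed a pseudo-random generator with the key itself and let it order the remaining buckets. The function name is illustrative, and integer keys hashed by key mod b are assumed.

```python
import random


def random_probe_sequence(key: int, buckets: int):
    """Home bucket, then the other buckets in a key-determined pseudo-random order."""
    home = key % buckets
    others = [b for b in range(buckets) if b != home]
    random.Random(key).shuffle(others)       # seeding with the key makes the order repeatable
    return [home] + others
```

Calling the function twice with the same key yields the same order, which is what allows a search to follow the path taken during insertion, and every bucket appears exactly once in the sequence.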

13.5. Chaining

In the case of linear open addressing, the solution of accommodating synonyms in the closest empty slot may contribute to a deterioration in performance. For example, the search for a synonym key may involve sequentially going through every slot occurring after its home bucket before the key is either found or declared absent. Also, the implementation of the hash table using a sequential data structure such as arrays limits its capacity (b × s slots). While increasing the number of slots to minimize overflows may lead to wastage of memory, restricting the number of slots to the bare minimum may lead to severe overflows, hampering the performance of the hash table.

An alternative to overcome this malady is to keep all synonyms that are mapped to the same bucket chained to it. In other words, every bucket is maintained as a singly linked list with synonyms represented as nodes. The buckets continue to be represented as a sequential data structure as before, to favor the hash function computation. Such a method of handling overflows is called chaining or open hashing or separate chaining. Figure 13.6 illustrates a chained hash table.

In the figure, observe how the buckets have been represented sequentially and each of the buckets is linked to a chain of nodes which are synonyms mapping to the same bucket.

Chained hash tables only acknowledge collisions. There are no overflows per se since any number of collisions can be handled provided there is enough memory to handle them!

13.5.1. Operations on chained hash tables

Search: The search for a key X in a chained hash table proceeds by computing the hash function value H(X). The bucket corresponding to the value H(X) is accessed and a sequential search along the chain of nodes is undertaken. If the key is found, the search is termed successful; otherwise, it is unsuccessful. If the chains tend to be long, maintaining each chain in order (ascending or descending) helps in rendering the search efficient.

Algorithm 13.2 illustrates the procedure to undertake a search in a chained hash table.

Insert: To insert a key X into a hash table, we compute the hash function H(X) to determine the bucket. If the key is the first node to be linked to the bucket then all that it calls for is a mere execution of a function to insert a node in an empty singly linked list. In the case of keys that are synonyms, the new key could be inserted either at the beginning or the end of the chain leaving the list unordered. However, it would be prudent and less expensive too, to maintain each of the chains in the ascending or descending order of the keys. This would also render the search for a specific key among its synonyms to be efficiently carried out.

**Algorithm 13.2** *Procedure to search for a key X in a chained hash table*

Algorithm 13.2 could be modified to insert a key. It merely calls for the insertion of a node in a singly linked list that is unordered or ordered.

Delete: Unlike that of linear open addressed hash tables, the deletion of a key X in a chained hash table is elegantly done. All that it calls for is a search for X in the corresponding chain and deletion of the respective node.
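The three operations can be sketched in Python with each bucket heading a singly linked list kept in ascending key order. This is an illustrative sketch rather than Algorithm 13.2; the class and method names are hypothetical, and integer keys hashed by key mod b are assumed.

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, buckets: int):
        self.heads = [None] * buckets        # one chain head per bucket

    def _locate(self, key: int):
        """Return (bucket, previous node, first node with node.key >= key)."""
        b = key % len(self.heads)
        prev, node = None, self.heads[b]
        while node is not None and node.key < key:
            prev, node = node, node.next
        return b, prev, node

    def search(self, key: int) -> bool:
        _, _, node = self._locate(key)
        return node is not None and node.key == key

    def insert(self, key: int) -> bool:
        b, prev, node = self._locate(key)
        if node is not None and node.key == key:
            return False                     # distinct keys only
        new = Node(key, node)
        if prev is None:
            self.heads[b] = new
        else:
            prev.next = new
        return True

    def delete(self, key: int) -> bool:
        b, prev, node = self._locate(key)
        if node is None or node.key != key:
            return False
        if prev is None:
            self.heads[b] = node.next
        else:
            prev.next = node.next
        return True

    def chain(self, b: int):
        """Keys chained to bucket b, in order."""
        out, node = [], self.heads[b]
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


ht = ChainedHashTable(11)
for k in [45, 98, 12, 55, 46, 89, 65, 88, 36, 21]:
    ht.insert(k)
```

With the keys of Example 13.2, bucket 1 holds the chain 12, 45, 89 and bucket 10 the chain 21, 65, 98. Deleting a key simply unlinks its node, with no markers or shifting of other keys.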

13.5.2. Performance analysis

The complexity of the chained hash table is dependent on the length of the chain of nodes corresponding to the buckets. The best case complexity of a search is O(1). A worst case occurs when all the n elements map to the same bucket and the length of the chain corresponding to that bucket is n, with the searched key turning out to be the last in the chain. The worst-case complexity of the search in such a case is O(n).

On an average, the complexity of the search operation on a chained hash table is given by

where U_n and S_n are the number of nodes examined on an average during an unsuccessful and successful search respectively. α is the loading factor of the hash table and is given by α = n/b, where n is the number of data elements and b is the number of buckets.

The average case performance of the chained hash table is superior to that of the linear open addressed hash table.

13.6. Applications

In this section, we discuss the application of hash tables in the fields of compiler design, relational database query processing and file organization.

13.6.1. Representation of a keyword table in a compiler

In section 10.4.1 (Volume 2), the application of binary search trees and AVL trees for the representation of symbol tables in a compiler was discussed. Hash tables find applications in the same problem as well.

A keyword table, which is a static symbol table, is best represented by means of a hash table. Each time a compiler needs to decide whether a string is a keyword or a user-id, the string is searched against the keyword table. An appropriate hash function could be designed to minimize collisions among the keywords and yield the bucket where the keyword could be found. A successful search indicates that the string encountered is a keyword and an unsuccessful search indicates that it is a user-id. Since only retrievals, and no insertions or deletions, are permissible on a keyword table, hash tables turn out to be one of the best propositions for its representation.
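As a rough illustration, Python's built-in `frozenset`, which is itself implemented as a hash table, can stand in for a static keyword table. The keyword list below is a small illustrative subset and not taken from any particular language definition.

```python
# Static keyword table: built once, only ever searched.
KEYWORDS = frozenset({"if", "else", "while", "for", "return"})


def classify(token: str) -> str:
    """A successful search means keyword; an unsuccessful one means user-id."""
    return "keyword" if token in KEYWORDS else "user-id"
```

A call such as `classify("while")` reports a keyword, while `classify("total")` falls through to user-id.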

13.6.2. Hash tables in the evaluation of a join operation on relational databases

Relational databases support a selective set of operations, for example, selection, projection, join (natural join, equi-join) and so on, which aid query processing. Of these, the natural join operation is most commonly used in relational database management systems. As indicated by the notation ⋈, the operation works on two relations (databases) to combine them into a single relation. Given two relations R and S, a natural join operation of the two databases is indicated as R ⋈ S. The resulting relation is a combination of the two relations based on attributes common to the two relations.

EXAMPLE 13.8.–

Consider the two relations ITEM_DESCRIPTION and VENDOR shown in Figure 13.10(a). The ITEM_DESCRIPTION relation describes the items and the VENDOR relation contains details about the vendors supplying the items. The relation ITEM_DESCRIPTION contains the attributes ITEM_CODE and ITEM_NAME. The VENDOR relation contains the attributes ITEM_CODE, VENDOR_NAME and ADDRESS (city).

A query pertaining to who the vendors are for a given item code calls for joining the two relations. The join of the two relations yields the relation shown in Figure 13.10(b). Observe how the natural join operation combines the two relations on the basis of their common attribute ITEM_CODE. Those tuples (rows) of the two relations having a common attribute value in the ITEM_CODE field are “joined” together to form the output relation.

Figure 13.10 Natural join of two relations

One method of evaluating a join is to use the hash method. Let H(X) be the hash function where X is the attribute value of the relations. Here H(X) is the address of the bucket which contains the attribute value and a pointer to the appropriate tuple corresponding to the attribute value. The pointer to the tuple is known as Tuple Identifier (TID). TIDs in general, besides containing the physical address of the tuple of the relation also hold identifiers unique to the relation. The hash tables are referred to as hash indexes in relational database terminology.

A natural join of the two relations R and S over a common attribute ATTRIB, results in each bucket of the hash indexes recording the attribute values of ATTRIB along with the TIDs of the tuples in relations R and S whose R.ATTRIB = S.ATTRIB.

When a query associated with the natural join is to be answered all that it calls for is to access the hash indexes to retrieve the appropriate TIDs associated with the query. Retrieving the tuples using the TIDs satisfies the query.

Assume that a query “List the vendor(s) supplying the item P402” is to be processed. To process this request, we first compute H(“P402”) which as shown in Figure 13.11(b) yields the bucket address 16. Accessing bucket 16 we find the TID corresponding to the relation VENDOR is 7001. To answer the query all that needs to be done is to retrieve the tuple whose TID is 7001.

A general query such as “List the vendors supplying each of the items” may call for sequentially searching each of the hash indexes corresponding to each attribute value of ITEM_CODE.

**Figure 13.11** *Evaluation of natural join operation using hash indexes*
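The idea of answering join queries through hash indexes can be sketched in Python, with a `dict` standing in for the hash index and integer TIDs for tuple pointers. Apart from the item code P402 and TID 7001, which appear in the text, all rows, names and TIDs below are hypothetical, and the bucket addresses of Figure 13.11 are not reproduced.

```python
from collections import defaultdict

# Hypothetical relations stored as TID -> tuple.
item_description = {
    5001: ("P402", "Bolt"),
    5002: ("P115", "Washer"),
}
vendor = {
    7001: ("P402", "Acme Supplies", "Pune"),
    7002: ("P115", "Metro Traders", "Delhi"),
    7003: ("P115", "Orbit Hardware", "Chennai"),
}


def build_hash_index(relation):
    """Map each ITEM_CODE value to the TIDs of the tuples that carry it."""
    index = defaultdict(list)
    for tid, row in relation.items():
        index[row[0]].append(tid)            # row[0] is ITEM_CODE
    return index


def natural_join(left, right):
    """Join two relations on ITEM_CODE by probing a hash index built on right."""
    right_index = build_hash_index(right)
    joined = []
    for row in left.values():
        for tid in right_index.get(row[0], []):
            joined.append(row + right[tid][1:])   # drop the duplicate ITEM_CODE
    return joined


vendor_index = build_hash_index(vendor)
vendors_of_p402 = [vendor[tid] for tid in vendor_index["P402"]]
```

The query for item P402 is answered by one index lookup followed by retrieval of the tuple with TID 7001, while `natural_join` walks every tuple of the left relation, which corresponds to the general query over all item codes.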

13.6.3. Hash tables in a direct file organization

File organization deals with methods and techniques to structure data in external or auxiliary storage devices such as tapes, disks, drums and so forth. A file is a collection of related data termed records. Each record is uniquely identified by what is known as a key, which is a datum or a portion of data in the record. The major concern in all these methods is regarding the access time when records pertaining to the keys (primary or secondary) are to be retrieved from the storage devices to be updated or inserted or deleted. Some of the commonly used file organization schemes are sequential file organization, serial file organization, indexed sequential access file organization and direct file organization. Chapter 14 elaborately details files and their methods of organization.

The direct file organization (see section 14.8) employs hash tables for the efficient storage and retrieval of records from the storage devices. Given a file of records {f₁, f₂, f₃, …, f_N} with keys {k₁, k₂, k₃, …, k_N}, a hash function H(k), where k is the record key, determines the storage address of each of the records in the storage device. Thus, direct files undertake direct mapping of the keys to the storage locations of the records, with the records of the file organized as a hash table.

Summary

Hash tables are ideal data structures for dictionaries. They favor efficient storage and retrieval of data lists which are linear in nature.
A hash function is a mathematical function which maps keys to positions in the hash table known as buckets. The process of mapping is called hashing. Keys which map to the same bucket are called synonyms. In such a case a collision is said to have occurred. A bucket may be divided into slots to accommodate synonyms. When a bucket is full and a synonym is unable to find space in the bucket, an overflow is said to have occurred.
The characteristics of a hash function are that it must be easy to compute and at the same time minimize collisions. Folding, truncation and modular arithmetic are some of the commonly used hash functions.
A hash table could be implemented using a sequential data structure such as arrays. In such a case, the method of handling overflows where the closest slot that is vacant is utilized to accommodate the synonym key is called linear open addressing or linear probing. However, over the course of time, linear probing can lead to the problem of clustering thereby deteriorating the performance of the hash table to a mere sequential search!
The other alternative methods of handling overflows are rehashing, quadratic probing and random probing.
A linked implementation of a hash table is known as chaining. In this all the synonyms are chained to their respective buckets as a singly linked list. On an average, a chained hash table is superior in performance when compared to that of a linear probed hash table.
Hash tables have found applications in the design of symbol tables in compiler design, query processing in relational database management systems and direct file organizations.

13.7. Illustrative problems

PROBLEM 13.4.–

For the set of keys {17, 9, 34, 56, 11, 4, 71, 86, 55, 10, 39, 49, 52, 82, 31, 13, 22, 35, 44, 20, 60, 28} obtain a hash table HT[0..8, 0..2] following quadratic probing. Make use of the hash function H(X) = X mod 9. What are your observations?

Solution:

Quadratic probing employs the function (h + i²) mod n i = 1, 2, …, where n = 9, to determine the empty slot during collisions. Here h is the address of the home bucket given by the hash function H(X), where X is the key. The quadratic probed hash table is shown in Figure P13.4.

Figure P13.4 The hash table following quadratic probing and using the hash function h(X) = X mod 9, for the keys listed in illustrative problem 13.4

Note how, during the insertion of keys 13 and 22, their home bucket 4 is already full. To handle this collision, quadratic probing searches buckets (4 + 1²) mod 9, (4 + 2²) mod 9, … in turn. Since the first bucket searched, bucket 5, has empty slots, both keys find accommodation there. In the case of key 44, which collides in bucket 8, quadratic probing searches for an empty slot in the order (8 + 1²) mod 9, (8 + 2²) mod 9, …. The search succeeds at the very first probe, (8 + 1²) mod 9 = 0, and 44 is accommodated in slot 2 of bucket 0.

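The case of inserting key 28 is interesting because, although the hash table still contains free slots, quadratic probing is unable to find an empty slot to accommodate the key. The sequence searched is (1 + 1²) mod 9, (1 + 2²) mod 9, (1 + 3²) mod 9, …. Since i² mod 9 takes only the values 0, 1, 4 and 7, the probes visit only buckets 1, 2, 5 and 8, all of which are full.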

An important observation regarding quadratic probing is that there is no guarantee of finding an empty slot in a quadratic probed hash table if the hash table size is not prime. In this example, the hash table size is not prime.
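The insertions described in this problem can be replayed with a short Python sketch. It assumes that keys are inserted in the order listed, that each bucket is a list holding at most three keys, and that probing may stop after n probes because i² mod n repeats with period n. The function name build_quadratic is hypothetical.

```python
KEYS = [17, 9, 34, 56, 11, 4, 71, 86, 55, 10, 39, 49, 52, 82,
        31, 13, 22, 35, 44, 20, 60, 28]


def build_quadratic(keys, buckets=9, slots=3):
    """Insert keys using H(X) = X mod buckets and probes (h + i*i) mod buckets."""
    table = [[] for _ in range(buckets)]
    unplaced = []
    for key in keys:
        home = key % buckets
        # i = 0 is the home bucket; i = 1, 2, ... are the quadratic probes.
        for i in range(buckets):
            pos = (home + i * i) % buckets
            if len(table[pos]) < slots:
                table[pos].append(key)
                break
        else:
            unplaced.append(key)      # no reachable bucket had a free slot
    return table, unplaced


table, unplaced = build_quadratic(KEYS)
for number, bucket in enumerate(table):
    print(number, bucket)
print("unplaced:", unplaced)          # unplaced: [28]
```

Under these assumptions, buckets 0, 3, 6 and 7 still have free slots when key 28 arrives, yet the key cannot be placed because its probe sequence never reaches them.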

PROBLEM 13.6.–

For the set of keys {11, 55, 13, 35, 71, 52, 61, 9, 86, 31, 49, 85, 70} obtain the hash table HT[0..8, 0..1] which employs rehashing for collision resolution. Assume the hash function to be H(X) = X mod 9 and the rehashing function to be H’(X) = 7 - (X mod 7). The collision resolution function is given by h_i = (H(X) + i · H’(X)) mod b, i = 1, 2, …, where b is the number of buckets.

Solution:

The hash table for the problem is shown in Figure P13.6.

Figure P13.6 The hash table employing rehashing for collision resolution, for the set of keys listed in illustrative problem 13.6

Observe how, during the insertion of key 49, a collision occurs and its home bucket 4 (H(49) = 49 mod 9 = 4) is found to be full. Rehashing turns to the second hash function, H’(49) = 7 - (49 mod 7) = 7, to help find an empty slot for the key. The bucket searched is h₁ = (H(49) + 1 · H’(49)) mod 9 = 2. Since this bucket contains a vacant slot, key 49 is accommodated there.

In the case of key 85, which once again collides with the keys in bucket 4, rehashing computes H’(85) = 6 and h₁ = (H(85) + 1 · H’(85)) mod 9 = 1. Key 85 is accommodated in bucket 1, slot 2. Finally, along similar lines, key 70 finds its home bucket 7 full, and with H’(70) = 7 and h₁ = (7 + 7) mod 9 = 5 it is accommodated in bucket 5.
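The rehashing steps worked out above can be checked with the following sketch, which uses the two functions stated in the problem. It assumes that keys are inserted in the order listed and that each bucket is a list holding at most two keys. The function name build_rehash is hypothetical.

```python
KEYS = [11, 55, 13, 35, 71, 52, 61, 9, 86, 31, 49, 85, 70]


def build_rehash(keys, buckets=9, slots=2):
    """Insert keys with H(X) = X mod buckets and H'(X) = 7 - (X mod 7)."""
    table = [[] for _ in range(buckets)]
    for key in keys:
        home = key % buckets
        step = 7 - (key % 7)
        # i = 0 is the home bucket; i >= 1 gives h_i = (H + i * H') mod buckets.
        for i in range(buckets):
            pos = (home + i * step) % buckets
            if len(table[pos]) < slots:
                table[pos].append(key)
                break
        else:
            raise RuntimeError(f"no free slot reachable for key {key}")
    return table


table = build_rehash(KEYS)
for number, bucket in enumerate(table):
    print(number, bucket)
# Buckets 1, 2 and 5 end up holding [55, 85], [11, 49] and [86, 70].
```

Unlike linear probing, the step size here depends on the key, so two synonyms such as 49 and 85 follow different probe sequences.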

PROBLEM 13.8.–

Fill in Table P13.8(a) with the number of comparisons made when the elements shown in row 1 of the table ({66, 100, 55, 3, 99, 144}) are searched for, successfully or unsuccessfully, in the list of elements {66, 42, 96, 100, 3, 55, 99}, when the latter is represented as (i) a sequential list, (ii) a binary search tree and (iii) a linear probing based hash table with single slot buckets, using the hash function h(X) = X mod 7.

Table P13.8(a). Search comparison table for illustrative problem 13.8

Representation of data elements	Number of comparisons for each search key
	66	100	55	3	99	144
Sequential list
Binary search tree
Hash table

Solution:

Representing the elements of the list to be searched as a sequential list, here kept in ascending order, yields {3, 42, 55, 66, 96, 99, 100}. The number of comparisons made for searching 66 is 4 and that for 144, which is an unsuccessful search, is 7.

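Inserting the elements into a binary search tree in the order listed gives a tree with root 66: its left child 42 has the children 3 and 55, and its right child 96 has the right child 100, whose left child is 99. The number of comparisons made for element 66 is 1 and that for 144 is 3.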

Representation of the elements as a linear probed hash table with single slot buckets is shown below. The hash function used is h(X) = X mod 7. The only data element displaced from its home bucket is 3, which is stored in bucket 4 because its home bucket 3 is already occupied by 66. The number of comparisons made for element 66 is 1 and that for 144 is 7.

[0]	42
[1]	99
[2]	100
[3]	66
[4]	3
[5]	96
[6]	55

The comparisons for the rest of the elements are shown in Table P13.8(b).

Table P13.8(b). The completed search comparison table for the data listed in illustrative problem 13.8

Representation of data elements	Number of comparisons for each search key
	66	100	55	3	99	144
Sequential list	4	7	3	1	6	7
Binary search tree	1	3	3	3	4	3
Hash table	1	1	1	2	1	7
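The counts in Table P13.8(b) can be reproduced with the sketch below. It assumes that one comparison is counted for every stored element examined, that the sequential list is kept in ascending order, and that the binary search tree and the hash table are built by inserting the elements in the order listed. All function names are hypothetical.

```python
ITEMS = [66, 42, 96, 100, 3, 55, 99]
TARGETS = [66, 100, 55, 3, 99, 144]


def sequential_comparisons(items, target):
    count = 0
    for item in items:                    # scan from the left
        count += 1
        if item == target:
            break
    return count


def bst_insert(node, key):
    """A node is a list [key, left, right]; None is the empty tree."""
    if node is None:
        return [key, None, None]
    side = 1 if key < node[0] else 2
    node[side] = bst_insert(node[side], key)
    return node


def bst_comparisons(node, target):
    count = 0
    while node is not None:
        count += 1
        if target == node[0]:
            break
        node = node[1] if target < node[0] else node[2]
    return count


def build_linear_probe_table(items, buckets):
    table = [None] * buckets
    for key in items:                     # assumes len(items) <= buckets
        pos = key % buckets
        while table[pos] is not None:
            pos = (pos + 1) % buckets
        table[pos] = key
    return table


def probe_comparisons(table, target):
    count = 0
    for step in range(len(table)):
        slot = table[(target % len(table) + step) % len(table)]
        if slot is None:                  # a vacant bucket ends the search
            break
        count += 1
        if slot == target:
            break
    return count


root = None
for key in ITEMS:
    root = bst_insert(root, key)
hash_table = build_linear_probe_table(ITEMS, 7)

seq = [sequential_comparisons(sorted(ITEMS), t) for t in TARGETS]
bst = [bst_comparisons(root, t) for t in TARGETS]
hsh = [probe_comparisons(hash_table, t) for t in TARGETS]
print(seq)   # [4, 7, 3, 1, 6, 7]
print(bst)   # [1, 3, 3, 3, 4, 3]
print(hsh)   # [1, 1, 1, 2, 1, 7]
```

The unsuccessful search for 144 costs 7 comparisons in the hash table only because the table is completely full; with a vacant bucket in the probe path, the search would stop earlier.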

Review questions

1. Hash tables are ideal data structures for ________.
   a) dictionaries
   b) graphs
   c) trees
   d) none of these
2. State whether true or false:
   In the case of a linear open addressed hash table with multiple slots in a bucket,
   i) overflows always mean collisions, and
   ii) collisions always mean overflows
   a) i) true ii) true
   b) i) true ii) false
   c) i) false ii) true
   d) i) false ii) false
3. In the context of building hash functions, find the odd term out in the following list:
   Folding, modular arithmetic, truncation, random probing
   a) folding
   b) modular arithmetic
   c) truncation
   d) random probing
4. In the case of a chained hash table of n elements with b buckets, assuming that, in the worst case, all n elements are mapped to the same bucket, the worst-case time complexity of a search on the hash table would be given by
   a) O(1)
   b) O(n/b)
   c) O(n)
   d) O(b)
5. Match the following:
   A. rehashing	i) collision resolution
   B. folding	ii) hash function
   C. linear probing
   a) (A, i) (B, ii) (C, ii)
   b) (A, ii) (B, ii) (C, i)
   c) (A, ii) (B, i) (C, i)
   d) (A, i) (B, ii) (C, i)
6. What are the advantages of using modulo arithmetic for building hash functions?
7. How are collisions handled in linear probing?
8. How are insertions and deletions handled in a chained hash table?
9. Comment on the search operation for a key K in a list L represented as a
   a) sequential list,
   b) chained hash table, and
   c) linear probed hash table
10. What is rehashing? How does it serve to overcome the drawbacks of linear probing?
11. The following is a list of keys. Making use of a hash function h(k) = k mod 11, represent the keys in a linear open addressed hash table with buckets containing (i) three slots and (ii) four slots.
12. For review problem 11, resolve collisions by means of i) rehashing that makes use of an appropriate rehashing function and ii) quadratic probing.
13. For review problem 11, implement a chained hash table.

Programming assignments

Implement a hash table using an array data structure. Design functions to handle overflows using i) linear probing, ii) quadratic probing and iii) rehashing. For a set of keys observe the performance when the methods listed above are executed.
Implement a hash table for a given set of keys using the chaining method of handling overflows. Maintain the chains in the ascending order of the keys. Design a menu-driven front-end interface, to perform the insert, delete and search operations on the hash table.
The following is a list of binary keys:

Design a hash function and an appropriate hash table to store and retrieve the keys efficiently. Compare the performance when the set is stored as a sequential list.
Store a dictionary of a limited set of words as a hash table. Implement a spell check program that, given an input text file, will check for the spelling using the hash table-based dictionary and in the case of misspelled words will correct the same.
Let TABLE_A and TABLE_B be two files implemented as a table. Design and implement a function JOIN (TABLE_A, TABLE_B) which will “natural join” the two files as discussed in section 13.6.2. Make use of an appropriate hash function.
