In the next few exercises, I present several implementations of the Map
interface. One of them is based on a hash table, which is arguably the most magical data structure ever invented. Another, which is similar to TreeMap
, is not quite as magical, but it has the added capability that it can iterate the elements in order.
You will have a chance to implement these data structures, and then we will analyze their performance.
But before I can explain hash tables, I’ll start with a simple implementation of a Map
using a List
of key-value pairs.
As usual, I provide starter code and you will fill in the missing methods. Here’s the beginning of the MyLinearMap
class definition:
public class MyLinearMap<K, V> implements Map<K, V> { private List<Entry> entries = new ArrayList<Entry>();
This class uses two type parameters, K
, which is the type of the keys, and V
, which is the type of the values. MyLinearMap
implements Map
, which means it has to provide the methods in the Map
interface.
A MyLinearMap
object has a single instance variable, entries
, which is an ArrayList
of Entry
objects. Each Entry
contains a key-value pair. Here is the definition:
public class Entry implements Map.Entry<K, V> { private K key; private V value; public Entry(K key, V value) { this.key = key; this.value = value; } @Override public K getKey() { return key; } @Override public V getValue() { return value; } }
There’s not much to it; an Entry
is just a container for a key and a value. This definition is nested inside MyLinearList
, so it uses the same type parameters, K
and V
.
That’s all you need to do the exercise, so let’s get started.
In the repository for this book, you’ll find the source files for this exercise:
MyLinearMap.java
contains starter code for the first part of the exercise.
MyLinearMapTest.java
contains the unit tests for MyLinearMap
.
You’ll also find the Ant build file build.xml
.
Run ant build
to compile the source files. Then run ant
MyLinearMapTest
; several tests should fail, because you have some work to do!
First, fill in the body of findEntry
. This is a helper method that is not part of the Map
interface, but once you get it working, you can use it for several methods. Given a target key, it should search through the entries and return the entry that contains the target (as a key, not a value) or null
if it’s not there. Notice that I have provided an equals
method that compares two keys and handles null
correctly.
You can run ant MyLinearMapTest
again, but even if your findEntry
is correct, the tests won’t pass because put
is not complete.
Fill in put
. You should read the documentation of Map.put
at http://thinkdast.com/listput so you know what it is supposed to do. You might want to start with a version of put
that always adds a new entry and does not modify an existing entry; that way you can test the simple case first. Or if you feel more confident, you can write the whole thing at once.
Once you’ve got put
working, the test for containsKey
should pass.
Read the documentation of Map.get
at http://thinkdast.com/listget and then fill in the method. Run the tests again.
Finally, read the documentation of Map.remove
at http://thinkdast.com/maprem and fill in the method.
At this point, all tests should pass. Congratulations!
In this section I present a solution to the previous exercise and analyze the performance of the core methods. Here are findEntry
and equals
:
private Entry findEntry(Object target) { for (Entry entry: entries) { if (equals(target, entry.getKey())) { return entry; } } return null; } private boolean equals(Object target, Object obj) { if (target == null) { return obj == null; } return target.equals(obj); }
The runtime of equals
might depend on the size of the target
and the keys, but does not generally depend on the number of entries, n. So equals
is constant time.
In findEntry
, we might get lucky and find the key we’re looking for at the beginning, but we can’t count on it. In general, the number of entries we have to search is proportional to n, so findEntry
is linear.
Most of the core methods in MyLinearMap
use findEntry
, including put
, get
, and remove
. Here’s what they look like:
public V put(K key, V value) { Entry entry = findEntry(key); if (entry == null) { entries.add(new Entry(key, value)); return null; } else { V oldValue = entry.getValue(); entry.setValue(value); return oldValue; } }
public V get(Object key) { Entry entry = findEntry(key); if (entry == null) { return null; } return entry.getValue(); }
public V remove(Object key) { Entry entry = findEntry(key); if (entry == null) { return null; } else { V value = entry.getValue(); entries.remove(entry); return value; } }
After put
calls findEntry
, everything else is constant time. Remember that entries
is an ArrayList
, so adding an element at the end is constant time, on average. If the key is already in the map, we don’t have to add an entry, but we have to call entry.getValue
and entry.setValue
, and those are both constant time. Putting it all together, put
is linear.
By the same reasoning, get
is also linear.
remove
is slightly more complicated because entries.remove
might have to remove an element from the beginning or middle of the ArrayList
, and that takes linear time. But that’s OK: two linear operations are still linear.
In summary, the core methods are all linear, which is why we called this implementation MyLinearMap
(ta-da!).
If we know that the number of entries will be small, this implementation might be good enough, but we can do better. In fact, there is an implementation of Map
where all of the core methods are constant time. When you first hear that, it might not seem possible. What I’m saying, in effect, is that you can find a needle in a haystack in constant time, regardless of how big the haystack is. It’s magic.
I’ll explain how it works in two steps:
Instead of storing entries in one big List
, we’ll break them up into lots of short lists. For each key, we’ll use a hash code (explained in the next section) to determine which list to use.
Using lots of short lists is faster than using just one, but as I’ll explain, it doesn’t change the order of growth; the core operations are still linear. But there is one more trick: if we increase the number of lists to limit the number of entries per list, the result is a constant-time map. You’ll see the details in the next exercise, but first: hashing!
In the next chapter, I’ll present a solution, analyze the performance of the core Map
methods, and introduce a more efficient implementation.