Deciding what to hash

Not every object should provide a hash value. Specifically, if we're creating a class of stateful, mutable objects, the class should never return a hash value. There should not be an implementation of the __hash__() method.

Immutable objects, on the other hand, might sensibly return a hash value so that the object can be used as the key in a dictionary, or as a member of a set. In this case, the hash value needs to parallel the way the test for equality works. It's bad to have objects that claim to be equal and have different hash values. The reverse—objects with the same hash that are actually not equal – is acceptable and expected. Several distinct objects may happen to have overlapping hash values.

There are three tiers of equality comparison:

  • The same hash value: This means that two objects could be equal. The hash value provides us with a quick check for likely equality. If the hash value is different, the two objects cannot possibly be equal, nor can they be the same object.
  • Compare as equal: This means that the hash values must also have been equal. This is the definition of the == operator. The objects may be the same object.
  • Same ID value: This means that they are the same object. They also compare as equal and will also have the same hash value. This is the definition of the is operator.

The Fundamental Law of Hash (FLH) has two parts:

  • Objects that compare as equal have the same hash value.
  • Objects with the same hash value may actually be distinct and not compare as equal.

We can think of a hash comparison as being the first step in an equality test. This is a very fast comparison to determine whether subsequent equality checks are necessary. 

The __eq__() method, which we'll also look at in the section on comparison operators, is intimately tied up with hashing. This provides a potentially slow field-by-field comparison between objects.

Here's a contrived example of two distinct numeric values with the same hash value:

>>> v1 = 123_456_789
>>> v2 = 2_305_843_009_337_150_740
>>> hash(v1)
123456789
>>> hash(v2)
123456789
>>> v2 == v1
False

Notice that a v1 integer, equal to 123,456,789, has a hash value equal to itself. This is typical of integers up to the hash modulus. The v2 integer has the same hash, but a different actual value.

This hash collision is expected. It's part of the known processing overhead when creating sets or dictionaries. There will be unequal objects that are reduced to coincidentally equal hash values. 

There are three use cases for defining equality tests and hash values via the __eq__() and __hash__() method functions:

  • Immutable objects: These are stateless objects of types such as tuples, namedtuples, and frozensets that cannot be updated. We have two choices:
    • Define neither __hash__() nor __eq__(). This means doing nothing and using the inherited definitions. In this case, __hash__() returns a trivial function of the ID value for the object, and __eq__() compares the ID values. 
    • Define both __hash__() and __eq__(). Note that we're expected to define both for an immutable object.
  • Mutable objects: These are stateful objects that can be modified internally. We have one design choice:
    • Define __eq__() but set __hash__ to None. These cannot be used as dict keys or items in sets.

The default behavior for immutable objects will be undesirable when an application requires two distinct objects to compare as equal. For example we might want two instances of Card(1, Clubs) to test as equal and compute the same hash; this will not happen by default. For this to work, we'll need to override the __hash__() and __eq__() methods.

Note that there's an additional possible combination: defining __hash__() but using a default definition for __eq__(). This is simply a waste of code, as the default __eq__() method is the same as the is operator. Using the default __hash__() method would have involved writing less code for the same behavior.

We'll look at each of these three situations in detail.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset