The counter collection

One of the most common uses for a defaultdict class is accumulating a count for each distinct key. A simple way to count keys looks like this:

from collections import defaultdict

frequency = defaultdict(int)  # a missing key starts at int(), which is 0
for k in some_iterator():
    frequency[k] += 1

This example counts the number of times each key value, k, appears in the sequence of values from some_iterator().
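Here is a runnable version of that loop. The book leaves some_iterator() unspecified, so this sketch assumes a hypothetical generator that yields words from a fixed sentence.

```python
from collections import defaultdict

def some_iterator():
    # Hypothetical stand-in for the data source: yields one word at a time.
    yield from "the cat and the hat and the bat".split()

frequency = defaultdict(int)  # missing keys start at int() == 0
for k in some_iterator():
    frequency[k] += 1

print(dict(frequency))
# {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```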

This use case is so common that there's a variation on the defaultdict theme that performs the same operation shown in the preceding code; it's called Counter. A Counter collection, however, is considerably more sophisticated than a simple defaultdict class.
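As a quick comparison, the sketch below (using a hypothetical list of words as sample data) counts the same items both ways and confirms that the results hold the same key/count pairs. It also shows one small difference: looking up a missing key in a Counter reports zero without adding that key.

```python
from collections import Counter, defaultdict

words = "the cat and the hat and the bat".split()  # hypothetical sample data

# The defaultdict idiom...
by_hand = defaultdict(int)
for w in words:
    by_hand[w] += 1

# ...and the one-line Counter equivalent.
counted = Counter(words)

print(dict(counted) == dict(by_hand))  # True: same key/count pairs
print(counted["dog"])                  # 0: a missing key reports zero
print("dog" in counted)                # False: the lookup did not insert it
```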

Here's an example that creates a frequency histogram from some source of data and displays the values in descending order of frequency:

from collections import Counter
frequency = Counter(some_iterator()) 
for k, freq in frequency.most_common(): 
    print(k, freq) 

This example shows how easily we can gather statistical data by providing any iterable to Counter; it gathers frequency data on the values that the iterable produces. In this case, we provided some_iterator(), a function that returns an iterable. We might instead have provided a sequence or some other collection.
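To make the histogram example concrete, this sketch assumes a hypothetical some_iterator() that yields a few die rolls. It also shows that most_common() accepts an optional count, so most_common(1) returns only the top entry, and that a plain list produces the same Counter as the generator.

```python
from collections import Counter

def some_iterator():
    # Hypothetical data source: a generator of die rolls.
    yield from [3, 1, 3, 6, 3, 1, 2]

frequency = Counter(some_iterator())
for k, freq in frequency.most_common():
    print(k, freq)
# 3 3
# 1 2
# 6 1
# 2 1

print(frequency.most_common(1))  # [(3, 3)]: just the most frequent value

# A sequence works just as well as a generator.
print(Counter([3, 1, 3, 6, 3, 1, 2]) == frequency)  # True
```

Ties (6 and 2 here) keep the order in which the values were first seen.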

We can then display the results in descending order of popularity. But wait! That's not all.

The Counter collection is not merely a simplistic variation of the defaultdict collection. The name is misleading. A Counter object is actually a multiset, sometimes called a bag.

It's a set-like collection that allows repeated values in the bag. It is not a sequence with items identified by an index or position; order doesn't matter. Conceptually, it is not a mapping from keys to values, even though Counter is implemented as a subclass of dict. It is like a set in which items stand for themselves and order doesn't matter. But, unlike a set, its elements can repeat.
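The bag semantics are easy to check directly. In this sketch, two anagrams produce equal bags because order is irrelevant, while an extra repeated letter produces a different bag. The elements() method expands a bag back into its individual members, repeating each one as many times as its count.

```python
from collections import Counter

# Order is irrelevant: these two bags hold the same elements.
print(Counter("listen") == Counter("silent"))  # True

# Repeats matter: an extra 'a' makes a different bag.
print(Counter("aab") == Counter("ab"))  # False

# elements() yields each element as many times as its count.
print(sorted(Counter("banana").elements()))  # ['a', 'a', 'a', 'b', 'n', 'n']
```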

As elements can repeat, the Counter object represents multiple occurrences with an integer count. Hence, it's often used as a frequency table. However, it does more than this. As a bag is like a set, we can combine two bags to create a union or an intersection.
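Counter provides the | operator for union and the & operator for intersection. For each element, union keeps the larger of the two counts and intersection keeps the smaller. The two tiny bags below are illustrative only. Note that union is not the same as +, which adds the counts together.

```python
from collections import Counter

a = Counter("aab")  # {'a': 2, 'b': 1}
b = Counter("abb")  # {'a': 1, 'b': 2}

union = a | b         # max of each count
intersection = a & b  # min of each count

print(union)         # Counter({'a': 2, 'b': 2})
print(intersection)  # Counter({'a': 1, 'b': 1})
```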

Let's create two bags:

>>> bag1 = Counter("aardwolves")
>>> bag2 = Counter("zymologies")
>>> bag1
Counter({'a': 2, 'o': 1, 'l': 1, 'w': 1, 'v': 1, 'e': 1,
'd': 1, 's': 1, 'r': 1})
>>> bag2
Counter({'o': 2, 'm': 1, 'l': 1, 'z': 1, 'y': 1, 'g': 1,
'i': 1, 'e': 1, 's': 1})

We built each bag by examining a sequence of letters. For characters that occur more than once, there's a count that is more than one.
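Indexing a bag returns the count for an element. An element that is absent simply has a count of zero, and the counts add up to the length of the original word.

```python
from collections import Counter

bag1 = Counter("aardwolves")
print(bag1["a"])           # 2: 'a' occurs twice in "aardwolves"
print(bag1["o"])           # 1
print(bag1["z"])           # 0: absent letters count as zero, no KeyError
print(sum(bag1.values()))  # 10: one per letter of the ten-letter word
```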

We can easily combine the two bags with +, which adds together the counts for each element:

>>> bag1+bag2 
Counter({'o': 3, 's': 2, 'l': 2, 'e': 2, 'a': 2, 'z': 1, 
'y': 1, 'w': 1, 'v': 1, 'r': 1, 'm': 1, 'i': 1, 'g': 1,
'd': 1})

This shows the complete collection of letters from both strings. The letter o occurs three times in total; every other letter occurs at most twice.
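A few checks confirm that + is a per-element sum. The count for o is the sum of the two individual counts, the total size is the combined length of both ten-letter words, and the result matches counting the concatenated strings directly.

```python
from collections import Counter

bag1 = Counter("aardwolves")
bag2 = Counter("zymologies")
combined = bag1 + bag2

# + adds counts element by element: 1 + 2 == 3 for 'o'.
print(combined["o"] == bag1["o"] + bag2["o"])  # True
# Every letter from both ten-letter words is kept.
print(sum(combined.values()))  # 20
# Same bag as counting the concatenated strings.
print(combined == Counter("aardwolves" + "zymologies"))  # True
```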

We can just as easily compute the difference between the bags:

>>> bag1-bag2 
Counter({'a': 2, 'w': 1, 'v': 1, 'd': 1, 'r': 1}) 
>>> bag2-bag1 
Counter({'o': 1, 'm': 1, 'z': 1, 'y': 1, 'g': 1, 'i': 1}) 

The first expression shows the characters of bag1 that remain after removing those that also occur in bag2.

The second expression shows the characters of bag2 that remain after removing those that also occur in bag1. Note that the letter o occurred twice in bag2 and once in bag1, so the difference removed only one of the two o characters from bag2.
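The - operator drops any element whose count falls to zero or below, which is why letters such as l vanish from the result. When zero and negative counts are wanted, Counter's subtract() method updates a bag in place and keeps them. This sketch copies bag2 first so the original bag is left unchanged.

```python
from collections import Counter

bag1 = Counter("aardwolves")
bag2 = Counter("zymologies")

# The - operator discards counts that drop to zero or below.
print((bag2 - bag1)["o"])    # 1: two o's minus one o
print("l" in (bag2 - bag1))  # False: 1 - 1 == 0, so 'l' is dropped

# subtract() works in place and keeps zero and negative counts.
diff = Counter(bag2)  # copy, so bag2 itself is unchanged
diff.subtract(bag1)
print(diff["l"])  # 0
print(diff["a"])  # -2: bag1 has two a's, bag2 has none
```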

In the next section, we'll see how to create new kinds of collections.
