Reducing sets of data with the reduce() function

The sum(), len(), max(), and min() functions are, in a way, all specializations of a more general algorithm expressed by the reduce() function. The reduce() function is a higher-order function that folds a two-argument function between successive items of an iterable, carrying the result forward as it goes. In Python 3, reduce() is imported from the functools module.

A sequence object is given as follows:

d = [2, 4, 4, 4, 5, 5, 7, 9]

The expression reduce(lambda x, y: x+y, d) folds + operators into the list as follows:

2+4+4+4+5+5+7+9

It can help to include () to show the effective left-to-right grouping as follows:

((((((2+4)+4)+4)+5)+5)+7)+9

Python's standard interpretation of expressions involves a left-to-right evaluation of operators. Consequently, a fold left isn't a change in meaning. Some functional programming languages offer a fold-right alternative. When used in conjunction with recursion, a compiler for another language can do a number of clever optimizations. This isn't available in Python: a reduction is always left to right.
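The left-to-right grouping can be made visible with a small sketch. Here the folded function builds a string instead of adding numbers, so the result shows the order in which reduce() pairs the items. The names total and grouping are illustrative only.

```python
from functools import reduce

d = [2, 4, 4, 4, 5, 5, 7, 9]

# The ordinary numeric fold: equivalent to 2+4+4+4+5+5+7+9.
total = reduce(lambda x, y: x + y, d)

# The same fold, but recording each step as text.
# Each step wraps the accumulated result and the next item in parentheses,
# so the nesting reflects the left-to-right evaluation order.
grouping = reduce(lambda x, y: f"({x}+{y})", d)

print(total)
print(grouping)
```

The printed grouping nests toward the left, with one extra outer pair of parentheses around the whole expression.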

We can also provide an initial value as follows:

reduce(lambda x, y: x+y**2, iterable, 0)  

If we don't, the first value from the sequence is used as the initialization. Providing an initial value is essential when the reduction function also transforms each item, in effect combining a map() with a reduce(). The following is how the right answer is computed with an explicit 0 initializer:

0 + 2**2 + 4**2 + 4**2 + 4**2 + 5**2 + 5**2 + 7**2 + 9**2

If we omit the initialization of 0, the reduce() function uses the first item as an initial value. This value does not have the transformation function applied, which leads to the wrong answer. In effect, the reduce() without a proper initial value is computing this:

2 + 4**2 + 4**2 + 4**2 + 5**2 + 5**2 + 7**2 + 9**2

This kind of mistake is part of the reason why reduce() must be used carefully.
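A minimal sketch of the two cases, using the list d from earlier in this section. The names add_square, with_init, without_init, and separated are introduced here for illustration. The last line shows one way to avoid the problem, assuming the transformation can be separated from the reduction with map().

```python
from functools import reduce

d = [2, 4, 4, 4, 5, 5, 7, 9]

# Reduction that squares each incoming item before adding it.
add_square = lambda x, y: x + y**2

# With an explicit initial value, every item of d is squared.
with_init = reduce(add_square, d, 0)

# Without one, the first item (2) seeds the accumulator unsquared.
without_init = reduce(add_square, d)

# Separating the transformation from the reduction avoids the issue:
# map() squares every item, and reduce() only adds.
separated = reduce(lambda x, y: x + y, map(lambda v: v**2, d))

print(with_init, without_init, separated)
```

The two results differ by exactly 2**2 - 2, the difference between the squared and unsquared first item.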

We can define a number of built-in reductions using the reduce() higher-order function as follows:

from functools import reduce

sum2 = lambda data: reduce(lambda x, y: x+y**2, data, 0)
sum = lambda data: reduce(lambda x, y: x+y, data, 0)
count = lambda data: reduce(lambda x, y: x+1, data, 0)
min = lambda data: reduce(lambda x, y: x if x < y else y, data)
max = lambda data: reduce(lambda x, y: x if x > y else y, data)

The sum2() reduction function is the sum of squares, which is useful for computing the standard deviation of a set of samples. The sum() reduction function mimics the built-in sum() function. The count() reduction function is similar to the len() function, but it can work on any iterable, whereas the len() function can only work on a materialized collection object. Note that these definitions of sum(), min(), and max() shadow the built-in functions of the same names.
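The difference between count() and len() can be sketched with a generator expression. The generator evens and the flag len_works are hypothetical names used only for this illustration.

```python
from functools import reduce

def count(data):
    """Count the items of any iterable by adding 1 per item."""
    return reduce(lambda x, y: x + 1, data, 0)

# A generator expression is an iterable with no length.
evens = (n for n in range(10) if n % 2 == 0)

try:
    len(evens)
except TypeError:
    len_works = False  # generators do not support len()
else:
    len_works = True

n = count(evens)        # consumes the generator
leftover = count(evens) # the generator is now exhausted

print(len_works, n, leftover)
```

Counting this way consumes the iterable, so a generator can only be counted once.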

The min() and max() functions mimic the built-in reductions. Because the first item of the iterable is used for initialization, these two functions will work properly. If we provided any initial value to these reduce() functions, we might incorrectly use a value that never occurred in the original iterable.
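The following sketch illustrates both points for the minimum. The helper name smaller is an assumption made for this example. It also shows a consequence of omitting the initial value: with an empty iterable, reduce() has nothing to start from and raises TypeError.

```python
from functools import reduce

d = [2, 4, 4, 4, 5, 5, 7, 9]

# Keep the smaller of the accumulated value and the next item.
smaller = lambda x, y: x if x < y else y

# Seeded by the first item, the result is a value from the data.
good_min = reduce(smaller, d)

# Seeded by 0, the result is the seed, which never occurred in d.
bad_min = reduce(smaller, d, 0)

# With no initial value, an empty iterable cannot be reduced.
try:
    reduce(smaller, [])
except TypeError:
    empty_raises = True
else:
    empty_raises = False

print(good_min, bad_min, empty_raises)
```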
