Using the filter() function to pass or reject data

The job of the filter() function is to apply a decision function, called a predicate, to each value in a collection. A decision of True means that the value is passed; otherwise, the value is rejected. The itertools module includes filterfalse() as a variation on this theme. Refer to Chapter 8, The Itertools Module, to understand the usage of the itertools module's filterfalse() function.
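As a small illustration of how the two functions complement each other, the following sketch applies the same predicate with filter() and with itertools.filterfalse(). The is_even() predicate and the sample range are made up for this example:

```python
from itertools import filterfalse


def is_even(n: int) -> bool:
    """Hypothetical predicate used only for this illustration."""
    return n % 2 == 0


data = range(10)

# filter() passes the values for which the predicate is True.
passed = list(filter(is_even, data))

# filterfalse() passes the values for which the predicate is False.
rejected = list(filterfalse(is_even, data))
```

Between them, the two results account for every value in the source collection exactly once.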

We might apply this to our trip data to create a subset of legs that are at least 50 nautical miles long, as follows:

long = list(
    filter(lambda leg: dist(leg) >= 50, trip)
)

The predicate lambda will be True for long legs, which will be passed. Short legs will be rejected. The output is the 14 legs that pass this distance test.

This kind of processing clearly segregates the filter rule (lambda leg: dist(leg) >= 50) from any other processing that creates the trip object or analyzes the long legs.
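To make that separation concrete, here is a minimal, self-contained sketch. The Leg type, the sample trip values, and this version of dist() are assumptions invented for the illustration; they are not the trip data or the dist() function used elsewhere in the book:

```python
from typing import List, NamedTuple


class Leg(NamedTuple):
    """Hypothetical leg record: two waypoint names and a distance."""
    start: str
    end: str
    distance: float  # nautical miles


def dist(leg: Leg) -> float:
    """Assumed accessor: return the distance of a leg."""
    return leg.distance


def is_long(leg: Leg) -> bool:
    """The filter rule, kept apart from building or analyzing the trip."""
    return dist(leg) >= 50


# Made-up sample data, for illustration only.
trip: List[Leg] = [
    Leg("A", "B", 17.7),
    Leg("B", "C", 50.0),
    Leg("C", "D", 62.3),
    Leg("D", "E", 12.1),
]

long_legs = list(filter(is_long, trip))
```

Changing the rule, for example to a different threshold, means touching only is_long(); the code that builds trip and the code that consumes long_legs stay the same.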

For another simple example, look at the following code snippet:

>>> filter(lambda x: x%3==0 or x%5==0, range(10))
<filter object at 0x101d5de50>
>>> sum(_)
23  

We've defined a simple lambda to check whether a number is a multiple of three or a multiple of five. We've applied that function to an iterable, range(10). The result is an iterable sequence of numbers that are passed by the decision rule.

The numbers for which the lambda is True are [0, 3, 5, 6, 9], so these values are passed. As the lambda is False for all other numbers, they are rejected.

This can also be done with a generator expression by executing the following code:

>>> list(x for x in range(10) if x%3==0 or x%5==0)
[0, 3, 5, 6, 9]

We can formalize this using the following set comprehension notation:

{x | x ∈ range(10) ∧ (x mod 3 = 0 ∨ x mod 5 = 0)}

This says that we're building a collection of x values such that x is in range(10) and either x%3==0 or x%5==0. There's a very elegant symmetry between the filter() function and formal mathematical set comprehensions.
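Python's own set comprehension syntax follows the same shape. The sketch below, which uses a hypothetical multiple_of_3_or_5() helper named only for this example, builds the collection both ways so the two can be compared:

```python
def multiple_of_3_or_5(x: int) -> bool:
    """Hypothetical named version of the lambda used in the example."""
    return x % 3 == 0 or x % 5 == 0


# Set comprehension: reads almost like the mathematical notation.
as_comprehension = {x for x in range(10) if multiple_of_3_or_5(x)}

# filter(): the same rule applied as a predicate, collected into a set.
as_filter = set(filter(multiple_of_3_or_5, range(10)))
```

Both expressions produce the same set of values; the choice between them is largely a matter of readability.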

We often want to use the filter() function with defined functions instead of lambda forms. The following is an example of reusing a predicate defined earlier:

>>> from ch01_ex1 import isprimeg
>>> list(filter(isprimeg, range(100)))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]  

In this example, we imported a function called isprimeg() from another module. We then applied this function to a collection of values to pass the prime numbers and reject any non-prime numbers from the collection.

This can be a remarkably inefficient way to generate a table of prime numbers. The superficial simplicity of this is the kind of thing lawyers call an attractive nuisance. It looks like it might be fun, but it doesn't scale well at all. The isprimeg() function duplicates all of the testing effort for each new value. Some kind of cache is essential to avoid redoing the testing of primality. A better algorithm is the Sieve of Eratosthenes; this algorithm retains the previously located prime numbers and uses them to prevent recalculation.
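For comparison, the following is one possible sketch of the Sieve of Eratosthenes. The primes_below() name and its list-of-flags design are choices made for this illustration and are not taken from the book's code:

```python
from typing import List


def primes_below(limit: int) -> List[int]:
    """Return all primes less than limit using a simple sieve."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is found to be a multiple of a prime.
    is_prime = [True] * limit
    is_prime[0] = False
    if limit > 1:
        is_prime[1] = False
    p = 2
    while p * p < limit:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p; smaller
            # multiples were already crossed out by smaller primes.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]
```

Each composite is eliminated using primes that were already found, so no value is re-tested from scratch. The trade-off is memory: the sieve holds a flag for every candidate below the limit.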
