Memoization and caching

As we saw in Chapter 10, The Functools Module, many algorithms can benefit from memoization. We'll start with a review of some previous examples to characterize the kinds of functions that can be helped with memoization.

In Chapter 6, Recursions and Reductions, we looked at a few common kinds of recursions. The simplest kind of recursion is a tail recursion with arguments that can be easily matched to values in a cache. If the arguments are integers, strings, or materialized collections, then we can compare arguments quickly to determine if the cache has a previously computed result.

We can see from these examples that integer numeric calculations, such as computing factorial or locating a Fibonacci number, will be obviously improved. Locating prime factors and raising integers to powers are more examples of numeric algorithms that apply to integer values.

When we looked at the recursive version of a Fibonacci number, , calculator, we saw that it contained two tail-call recursions. Here's the definition:

This can be turned into a loop, but any design change requires some thinking. The memoized version of a recursive definition can be quite fast and doesn't require quite so much thinking to design.

The Syracuse function is an example of the kind of function used to compute fractal values. Here's the Syracuse function, .

Applied recursively, there's a chain of values, C, from a given starting value, n.

The Collatz conjecture is the Syracuse function always leads to 1. The values starting with form a loop of 1, 4, 2, 1, ... Exploring the behavior of this function requires memoized intermediate results. An interesting question is locating the extremely long sequences. See https://projecteuler.net/problem=14 for an interesting problem that requires careful use of caching.

The recursive application of the Syracuse function is an example of a function with an "attractor," where the value is attracted to 1. In some higher dimensional functions, the attractor can be a line or perhaps a fractal curve. When the attractor is a point, memoization can help; otherwise, memoization may actually be a hindrance, since each fractal value is unique.

When working with collections, the benefits of caching may vanish. If the collection happens to have the same number of integer values, strings, or tuples, then there's a chance that the collection is a duplicate and time can be saved. However, if a calculation on a collection will be needed more than once, manual optimization is best: do the calculation once and assign the results to a variable.

When working with iterables, generator functions, and other lazy objects, caching or memoization is not going to help at all. Lazy functions already do the least amount of work to provide the next value from a source sequence.

Raw data that includes measurements often use floating point values. Since an exact equality comparison between floating point values may not work out well, memoizing intermediate results may not work out well, either.

Raw data that includes counts, however, may benefit from memoization. These are integers, and we can trust exact integer comparisons to (potentially) save recalculating a previous value. Some statistical functions, when applied to counts, can benefit from using the fractions module instead of floating point values. When we replace x/y with the Fraction(x,y) method, we've preserved the ability to do exact value matching. We can produce the final result using the float(some_fraction) method.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset