Memoizing previous results with lru_cache

The lru_cache decorator transforms a given function into one that may perform more quickly. LRU stands for Least Recently Used: a finite pool of recently used items is retained, and items that haven't been used recently are discarded to keep the pool to a bounded size.

Since this is a decorator, we can apply it to any function that might benefit from caching previous results. We can use it as follows:

from functools import lru_cache
@lru_cache(128)
def fibc(n: int) -> int:
    if n == 0: return 0
    if n == 1: return 1
    return fibc(n-1) + fibc(n-2)

This is an example based on Chapter 6, Recursions and Reductions. We've applied the @lru_cache decorator to the naive Fibonacci number calculation. Because of this decoration, each call to the fibc(n) function will now be checked against a cache maintained by the decorator. If the argument n is in the cache, the previously computed result is used instead of doing a potentially expensive re-calculation. Each return value is added to the cache.
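We can check the decorator's bookkeeping directly. Repeating the definition from above so the snippet stands alone, a single call to fibc(20) records exactly one cache miss per distinct argument value:

```python
from functools import lru_cache

@lru_cache(128)
def fibc(n: int) -> int:
    if n == 0: return 0
    if n == 1: return 1
    return fibc(n-1) + fibc(n-2)

fibc(20)                  # fills the cache with results for n = 0..20
info = fibc.cache_info()  # hit/miss statistics maintained by the decorator
# info.misses == 21: each value of n from 0 to 20 is computed exactly once
```

Every recursive call with an argument already in the cache is counted as a hit and returns immediately, which is precisely why the decorated recursion is no longer exponential.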

We highlight this example because the naive recursion is quite expensive in this case. The complexity of computing any given Fibonacci number, F(n), involves not merely computing F(n-1) but also F(n-2). This tree of values leads to a complexity in the order of O(2^n).

The argument value 128 is the size of the cache. This is used to limit the amount of memory used for the cache. When the cache is full, the least recently used item is replaced.

We can try to confirm the benefits empirically using the timeit module. We can execute the two implementations a thousand times each to see how the timing compares. Using the fib(20) and fibc(20) functions shows just how costly this calculation is without the benefit of caching. Because the naive version is so slow, the number of timeit repetitions was reduced to only 1,000. Here are the results:

  • Naive: 3.23 seconds
  • Cached: 0.0779 seconds

Note that we can't trivially use the timeit module on the fibc() function. The cached values will remain in place: we'll compute fibc(20) only once, which populates this value in the cache. Each of the remaining 999 iterations will simply fetch the value from the cache. We need to actually clear the cache between uses of the fibc() function, or the time drops to almost zero. This is done with the fibc.cache_clear() method built by the decorator.
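A comparison along these lines might be sketched as follows. The cache_clear() call is folded into the timed statement so that each of the 1,000 iterations starts with a cold cache; the exact timings will, of course, vary by machine:

```python
import timeit
from functools import lru_cache

def fib(n: int) -> int:
    # naive, uncached recursion: O(2**n) calls
    if n == 0: return 0
    if n == 1: return 1
    return fib(n-1) + fib(n-2)

@lru_cache(128)
def fibc(n: int) -> int:
    # same recursion, but memoized by the decorator
    if n == 0: return 0
    if n == 1: return 1
    return fibc(n-1) + fibc(n-2)

naive = timeit.timeit("fib(20)", globals=globals(), number=1000)
cached = timeit.timeit(
    "fibc(20); fibc.cache_clear()",  # clear so each iteration recomputes
    globals=globals(), number=1000)
```

Even with the cache cleared on every iteration, the memoized version does only O(n) work per call instead of O(2^n), so it remains dramatically faster.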

The concept of memoization is powerful. There are many algorithms that can benefit from memoization of results.

The number of combinations of p things taken in groups of r is often stated as follows:

  C(p, r) = p! / (r! × (p-r)!)
This binomial function involves computing three factorial values. It might make sense to use an @lru_cache decorator on a factorial function. A program that calculates a number of binomial values will not need to re-compute all of those factorials. For cases where similar values are computed repeatedly, the speedup can be impressive. For situations where the cached values are rarely reused, the overheads of maintaining the cached values outweigh any speedups.
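A sketch of this idea follows; the names fact() and binom() are illustrative, not from the original:

```python
from functools import lru_cache

@lru_cache(128)
def fact(n: int) -> int:
    # naive recursive factorial; each result is memoized by the decorator
    if n == 0:
        return 1
    return n * fact(n-1)

def binom(p: int, r: int) -> int:
    # C(p, r) = p! / (r! * (p-r)!); integer division is exact here
    return fact(p) // (fact(r) * fact(p-r))
```

A program computing many binomial values, such as binom(52, 5) for card hands, will find most of the needed factorials already in the cache after the first few calls.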

When computing similar values repeatedly, we see the following:

  • Naive factorial: 0.174 seconds
  • Cached factorial: 0.046 seconds

It's important to recognize that the cache is a stateful object. This design pushes the edge of the envelope of purely functional programming. One functional ideal is to avoid changes of state. This concept of avoiding stateful variables is exemplified by a recursive function: the current state is contained in the argument values, not in the changing values of variables. We've seen how tail-call optimization is an essential performance improvement to assure that this idealized recursion actually works nicely with the available processor hardware and limited memory budgets. In Python, we do this tail-call optimization manually by replacing the tail recursion with a for loop. Caching is a similar kind of optimization: we implement it manually as needed, knowing that it isn't purely functional programming.
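As a reminder, the manual tail-call optimization mentioned here replaces the recursion with a for loop; the name fibi() is illustrative:

```python
def fibi(n: int) -> int:
    # iterative Fibonacci: the "state" that a recursion would carry in
    # its argument values lives in two local variables instead
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

This version needs no cache at all: it does O(n) work in constant space, at the cost of abandoning the recursive formulation.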

In principle, each call to a function with an LRU cache has two results: the expected result and a new cache object available for future evaluations of the function. Pragmatically, the cache object is encapsulated inside the decorated version of the fibc() function, and apart from the cache_info() and cache_clear() methods added by the decorator, it isn't available for inspection or manipulation.

Caching is not a panacea. Applications that work with float values might not benefit much from memoization because float values are often approximations. The least-significant bits of a float value should be seen as random noise that can prevent the exact equality test in the lru_cache decorator from working.
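The float problem can be seen directly. In the following sketch (scale() is a hypothetical function, not from the original), two arguments that are "equal" for practical purposes are distinct cache keys:

```python
from functools import lru_cache

@lru_cache(128)
def scale(x: float) -> float:
    return x * 2.0

scale(0.1 + 0.2)  # cached under the key 0.30000000000000004
scale(0.3)        # a second *miss*: 0.3 != 0.1 + 0.2 exactly
```

Both calls are misses because the cache lookup uses exact equality on the argument values; rounding the arguments to a fixed precision before the call is one way to make such near-duplicates share a cache entry.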

We'll revisit this in Chapter 16, Optimizations and Improvements. We'll look at some additional ways to implement this.
