Caching with decorators

As you can see, geocoding takes time: talking to a server takes time, as does being nice and waiting between requests. Thus, we probably don't want to waste time asking the same questions over and over again. For example, if many records within the same session share the same address, it makes sense to pull that data once and then reuse it. The specifics depend on the nature of the data: if we're checking air ticket availability, we shouldn't cache the results, as the data might change any second. But for geolocation, we don't anticipate any changes any time soon.

The process of storing data we've pulled locally and then using it instead of getting the same data again is called caching. For example, all modern browsers do this: they cache secondary elements of a web page and keep them for a certain period of time. Caching can take different forms. We can store information in memory for the current session, or write it to disk so it can be retrieved in other sessions (or by other processes).

Here, we'll go with the first option, especially as everything we need is built into Python itself. All hail the lru_cache function, part of the functools standard library module. The name LRU stands for Least Recently Used, the eviction algorithm this cache applies: lru_cache stores the results of up to N distinct calls, and once that limit is surpassed, the values that haven't been used for the longest time are thrown out.
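To see the eviction behavior in action, here is a small sketch with a toy square function (not from our geocoding code) and a deliberately tiny maxsize of 2; the @ syntax used to attach lru_cache is explained in the next section:

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # keep only the two most recently used results
def square(x):
    print(f'computing {x}')
    return x * x

square(1)   # miss: computed
square(2)   # miss: computed
square(1)   # hit: served from the cache, nothing is printed
square(3)   # miss: computed; 2 is now the least recently used value
square(2)   # miss: 2 was evicted to make room for 3, so it runs again
print(square.cache_info())  # hits=1, misses=4, maxsize=2, currsize=2
```

The cache_info method, attached by lru_cache to the decorated function, is handy for checking whether your maxsize is large enough in practice.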

But how can we neatly intervene in the request process to pull local data or cache the new result? Here, we'll use one more trick up Python's sleeve—decorators. Consider the following example:

def title(f):
    def _title(*args, **kwargs):
        return f'<h1>{f(*args, **kwargs)}</h1>'
    return _title

Here, title is a decorator function that wraps a given function, f, and returns another function that executes f from inside. Here is how it can be used:

>>> def mytext(x):
...     return str(x)

>>> MyTitle = title(mytext)
>>> MyTitle('hello')
'<h1>hello</h1>'

In other words, we inject our function inside another one that can run something else before and/or after running it! The preceding operation is a little clumsy (and long)—that's why Python has decorators, which are merely syntactic sugar to make this pattern shorter. Here is exactly the same code, using a decorator:

>>> @title
... def MyTitle(x):
...     return str(x)

>>> MyTitle('hello')
'<h1>hello</h1>'

As you can see, the actual function we end up calling is _title, which runs our original function inside—and we don't need to create and wrap an "initial" function by hand. Neat! But when is it useful?

Actually, quite often! Decorators are usually nice when you need some sort of framework to take your code and run it within a certain context. We'll see this pattern quite often in Chapter 17, Let's Build a Dashboard, and Chapter 18, Serving Models with a RESTful API—in many web-related frameworks, it is easy to decorate your code with the application, which will then route and execute a given function when needed.
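One practical detail worth knowing: because the wrapper replaces the original function, the decorated name loses its original __name__ and docstring. The standard library's functools.wraps decorator copies them back. Here is a sketch, reusing the title example from above:

```python
from functools import wraps

def title(f):
    @wraps(f)  # copy f's name and docstring onto the wrapper
    def _title(*args, **kwargs):
        return f'<h1>{f(*args, **kwargs)}</h1>'
    return _title

@title
def mytext(x):
    return str(x)

print(mytext.__name__)   # 'mytext'; without @wraps it would be '_title'
print(mytext('hello'))   # <h1>hello</h1>
```

This matters for debugging and introspection: tracebacks and help() will show the original function's name instead of the wrapper's.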

Now, how is that connected to caching? Simple: caching follows the same pattern. For a given function, we can initiate a caching data store, and then on any invocation of the function, check whether the corresponding value is already stored and use it if it is. If it is not, we can run the function, store the result in the cache, and return it. See? It is exactly the decorator pattern. And, indeed, here is how it might look (we show only the first lines of the function to keep it short):

from functools import lru_cache

@lru_cache(maxsize=2000)  # lru decorator added
def nominatim_geocode(address, format='json', limit=1, **kwargs):
    '''thin wrapper around nominatim API.
    ...

As you can see, using the cache required just two lines here: one to import the function, and another right before the function declaration. Here, maxsize means the maximum number of values to store before the oldest ones start being dropped. The great part is that we—or anyone using the code—don't need to change anything in the external code; everything looks as if it were an ordinary function with no caching.

lru_cache will only store results for the duration of the session. If your script times out or exits with an exception, everything is lost. If you want to store cached data on disk, consider using third-party tools, such as joblib or python-diskcache. Both can store information on disk and retrieve it in any session, as long as the files are intact.
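To illustrate the idea (not the actual API of either library), here is a bare-bones disk-backed cache built with the standard library's pickle; the decorator name, the file name, and the toy shout function are all hypothetical. joblib and python-diskcache do the same thing far more robustly, with proper locking and invalidation:

```python
import pickle
from pathlib import Path

def disk_cached(path):
    # hypothetical helper: persists the cache dictionary to a pickle file
    def decorator(f):
        def _cached(*args):
            p = Path(path)
            cache = pickle.loads(p.read_bytes()) if p.exists() else {}
            if args not in cache:
                cache[args] = f(*args)              # compute once...
                p.write_bytes(pickle.dumps(cache))  # ...and persist to disk
            return cache[args]
        return _cached
    return decorator

@disk_cached('demo_cache.pkl')  # hypothetical file name
def shout(word):
    return word.upper()

print(shout('berlin'))  # computed now, reusable in any later session
```

Because the cache lives in a file, a second run of the script (or another process reading the same file) gets the stored values without recomputing them.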