© J. Burton Browning and Marty Alchin 2019
J. Burton Browning and Marty Alchin, Pro Python 3, https://doi.org/10.1007/978-1-4842-4385-5_3

3. Functions

J. Burton Browning1  and Marty Alchin2
(1)
Oak Island, NC, USA
(2)
Agoura Hills, CA, USA
 

At the core of any programming language is the notion of functions, but we tend to take them for granted. Sure, there’s the obvious fact that functions allow code to be encapsulated into individual units, which can be reused rather than being duplicated all over the place. But Python takes this beyond just the notion of what some languages allow, with functions being full-fledged objects that can be passed around in data structures, wrapped in other functions, or replaced entirely by new implementations.

In fact, Python provides enough flexibility with functions that there are actually several different types of functions, reflecting the various forms of declaration and purposes. Understanding each of these types of functions will help you decide which is appropriate for each situation you encounter while working with your own code. This chapter explains each of them in turn, as well as a variety of features you can take advantage of to extend the value of each function you create, regardless of its type.

At their core, all functions are essentially equal, regardless of which of the following sections they fit into. The built-in function type forms their basis, containing all the attributes necessary for Python to understand how to use them:

>>> def example():
...     pass
...
>>> type(example)
<class 'function'>
>>> example
<function example at 0x...>

Of course, there are still a number of different types of functions and as many different ways of declaring them. First off, let’s examine one of the most universal aspects of functions.

Arguments

Most functions need to take some arguments in order to do anything useful. Normally, that means defining them in order in the function's signature and then supplying them in the same order when calling that function later. Python supports that model, but it also supports passing keyword arguments and even arguments that won't be known until the function is called.

One of the most common advantages of Python’s keyword arguments is that you can pass arguments in a different order than the way they were defined in the function. You can even skip arguments entirely, as long as they have a default value defined. This flexibility helps encourage the use of functions that support lots of arguments with default values.
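As a quick illustration (using a hypothetical greet() function, not anything from the standard library), keyword arguments can be reordered or skipped entirely when defaults exist:

```python
def greet(name, greeting='Hello', punctuation='!'):
    # One required argument, two optional ones with defaults.
    return f'{greeting}, {name}{punctuation}'

# Positional call: values bind in the order the parameters were defined.
print(greet('Guido'))                        # Hello, Guido!
# Keywords may appear in any order, skipping over defaults entirely.
print(greet(punctuation='?', name='Guido'))  # Hello, Guido?
```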

Explicit is Better than Implicit

One way that Python’s keyword arguments encourage being explicit is to only allow arguments to be passed out of order if they’re passed by keyword. Without keywords, Python needs to use the position of the argument to know which parameter name to bind to it when the function runs. Because keywords are just as explicit as positions, the ordering requirement can be lifted without introducing ambiguity.

In fact, keywords are even more explicit than positions when working with arguments, because the function call documents the purpose of each argument. Otherwise, you’d have to look up the function definition in order to understand its arguments. Some arguments may be understandable in context, but most optional arguments aren’t obvious at a glance, so passing them with keywords makes for more readable code.

Planning for Flexibility

Planning parameter names, order, and default values is particularly important for functions intended to be called by someone who didn’t write them, such as those in distributed applications. If you don’t know the exact needs of the users who will eventually be using your code, it’s best to move any assumptions you may have into arguments that can be overridden later.

As an extremely simple example, consider a function that appends a prefix to a string:

def add_prefix(my_string):
    """Adds a 'pro_' prefix before the string is returned."""
    return 'pro_' + my_string

final_string = input('Enter a string so we can put pro_ in front of it!: ')
print(add_prefix(final_string))

The 'pro_' prefix here may make sense for the application it was written for, but what happens when anything else wants to use it? Right now, the prefix is hard-coded into the body of the function itself, so there’s no available alternative. Moving that assumption into an argument makes for an easy way to customize the function later:

def add_prefix(my_string, prefix='pro_'):
    """Adds a prefix (defaulting to 'pro_') before the string provided."""
    return prefix + my_string

final_string = input('Enter a string so we can put pro_ in front of it!: ')
print(add_prefix(final_string))

The function call without the prefix argument doesn’t need to change, so existing code works just fine. The section on preloading arguments later in this chapter shows how even the prefix can be changed and still be used by code that doesn’t know about it.
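To make that concrete, here's a sketch of how new callers can override the argument while old calls continue to work (the example strings are arbitrary):

```python
def add_prefix(my_string, prefix='pro_'):
    """Adds a prefix (defaulting to 'pro_') before the string provided."""
    return prefix + my_string

# Existing calls keep working with the default prefix...
print(add_prefix('gram'))                # pro_gram
# ...while new callers can substitute their own.
print(add_prefix('gram', prefix='ana'))  # anagram
```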

Of course, this example is far too simple to provide much real-world value, but the functions illustrated throughout the rest of this book will take advantage of plenty of optional arguments, showing their value in each situation.

Variable Positional Arguments

Most functions are designed to work on a specific set of arguments, but some can handle any number of arguments, acting on each in turn. These may be passed into a single argument as a tuple, list, or other iterable.

Take a typical shopping cart, for example. Adding items to the cart could be done one at a time or in batches. Using a definition of a class, with a function inside, here’s how it could be done, using a standard argument:

class ShoppingCart:
    def __init__(self):
        self.items = []

    def add_to_cart(self, items):
        self.items.extend(items)

That would certainly do the trick, but now consider what that means for all the code that has to call it. The common case would be to add just a single item, but as the function always accepts a list, it would end up looking something like this:

cart.add_to_cart([item])

So we’ve basically sabotaged the majority case in order to support the minority. Worse yet, if add_to_cart() originally supported just one item and was changed to support multiples, this syntax would break any existing calls, requiring you to rewrite them just to avoid a TypeError.

Ideally, the method should support the standard syntax for single arguments, while still supporting multiple arguments. By adding an asterisk before an argument name, you can specify that all remaining positional arguments (those not matched by any parameter before it) are collected into a single tuple bound to that name. In this case there are no other arguments, so variable positional arguments can make up the entire argument list:

    def add_to_cart(self, *items):
        self.items.extend(items)

Now, the method can be called with any number of positional arguments rather than having to group those arguments first into a tuple or list. The extra arguments are bundled in a tuple automatically before the function starts executing. This cleans up the common case, while still enabling more arguments as needs require. Here are a few examples of how the method could be called:

cart.add_to_cart(item)
cart.add_to_cart(item1, item2)
cart.add_to_cart(item1, item2, item3, item4, item5)

There is still one more way to call this function that allows the calling code to support any number of items as well, but it’s not specific to functions that are designed to accept variable arguments. See the section on invoking functions with variable arguments for all the details.

Variable Keyword Arguments

Functions may need to take extra configuration options, particularly if passing those options to some other library further down the line. The obvious approach would be to accept a dictionary, which can map configuration names to their values:

class ShoppingCart:
    def __init__(self, options):
        self.options = options

Unfortunately, that ends up with a problem similar to the one we encountered with positional arguments described in the previous section. The simple case in which you only override one or two values gets fairly complicated. Here are two ways the function call could look, depending on preference:

options = {'currency': 'USD'}
cart = ShoppingCart(options)
cart = ShoppingCart({'currency': 'USD'})

Of course, this approach doesn't scale any more gracefully than the list in the positional argument problem from the previous section. Also, like that problem, this can be problematic: if the function you're working with were previously set up to accept some explicit keyword arguments, the new dictionary argument would break compatibility.

Instead, Python offers the ability to pass a variable number of keyword arguments by adding two asterisks before the name of the argument that will accept them. This allows for the much friendlier keyword argument syntax, while still allowing for a fully dynamic function call. Examine the following stub:
    def __init__(self, **options):
        self.options = options
Now consider what the same function call from earlier would look like, given that the function now takes arbitrary keyword arguments:
cart = ShoppingCart(currency='USD')

Caution

When working with variable arguments, there’s one difference between positional and keyword arguments that can cause problems. Positional arguments are grouped into a tuple, which is immutable, while keyword arguments are placed into a dictionary, which is mutable (changeable).
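A small sketch (with an illustrative function name of my own) shows the difference in practice:

```python
def show_mutability(*args, **kwargs):
    # Keyword arguments arrive in a plain dict, which can be modified freely.
    kwargs['added'] = True
    # Positional arguments arrive in a tuple, so item assignment fails.
    try:
        args[0] = 'changed'
        mutated = True
    except TypeError:
        mutated = False
    return args, kwargs, mutated

print(show_mutability(1, 2, color='red'))
# ((1, 2), {'color': 'red', 'added': True}, False)
```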

Beautiful is Better than Ugly

The second function call example here is a classic case of code that many Python programmers would consider ugly. The sheer volume of punctuation (quotation marks around both the key and the value, a colon between them, and curly braces around the whole thing) inside the already necessary parentheses makes it cluttered and difficult to process at a glance. For example: cart = ShoppingCart({'currency': 'USD'})

By switching to keyword arguments, as shown in this section, the appearance of the code is considerably better aligned with Python's core values and philosophy. Beauty may be subjective in its very nature, but certain subjective decisions are praised by the vast majority of programmers.

Combining Different Kinds of Arguments

These options for variable arguments combine with the standard options, such as required and optional arguments. In order to make sure everything meshes nicely, Python has some very specific rules for defining parameters in a function signature. There are only four types of arguments, listed here in the order they generally appear in functions:
  • Required arguments

  • Optional arguments

  • Variable number of positional arguments

  • Variable keyword arguments

Putting the required arguments first in the list ensures that positional arguments satisfy the required arguments prior to getting into the optional arguments. Variable arguments can only pick up values that didn’t fit into anything else, so they naturally get defined at the end. Here’s how this stub would look in a typical function definition:
def create_element(name, editable=True, *children, **attributes):
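To see how a call distributes values across those four kinds of arguments, here's the same stub given a hypothetical body that simply returns its bindings:

```python
def create_element(name, editable=True, *children, **attributes):
    # Hypothetical body, added only so the bindings are visible.
    return name, editable, children, attributes

# 'span' and False satisfy the explicit arguments, the remaining positional
# values land in children, and the leftover keyword goes to attributes.
print(create_element('span', False, 'child1', 'child2', style='bold'))
# ('span', False, ('child1', 'child2'), {'style': 'bold'})
```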

This same ordering can be used when calling functions, but it has one shortcoming. In this example, you’d have to supply a value for editable as a positional argument in order to pass in any children at all. It’d be better to be able to supply them right after the name, avoiding the optional editable argument entirely most of the time.

To support this, Python also allows variable positional arguments to be placed among standard arguments. Both required and optional arguments can be positioned after the variable argument, but now they must be passed by keyword. All the arguments are still available, but the less common ones stay out of the way when they're not needed and become more explicit when they are used.

In the Face of Ambiguity, Refuse the Temptation to Guess

By allowing positional arguments in the middle of a list of explicit arguments, Python could have introduced considerable ambiguity. Consider a function defined to pass commands through to some other function: perform_action(action, *args, log_output=False). Ordinarily, you can supply enough positional arguments to reach even the optional arguments, but in this case, what would happen if you supplied three or more values?

One possible interpretation is to give the first value to the first argument, the last value to the last argument, and everything else to the variable argument. That could work, but then it comes down to a guess as to the intent of the programmer making the call. Once you consider a function with even more arguments behind the variable argument, the possible interpretations become quite numerous.

Instead, Python strictly enforces that everything after the variable argument becomes accessible by keyword only. Positional argument values beyond those explicitly defined in the function go straight into the variable argument, whether just one or dozens were provided. The implementation becomes easy to explain by having just one way to do it, and it’s even more explicit by enforcing the use of keywords.

An added feature of this behavior is that explicit arguments placed after variable positional arguments can still be required. The only real difference between the two types of placement is the requirement of using keywords; whether the argument requires a value still depends on whether you define a default:

>>> def join_with_prefix(prefix, *segments, delimiter):
...     return delimiter.join(prefix + segment for segment in segments)
...
>>> join_with_prefix('P', 'ro', 'ython')
Traceback (most recent call last):
  ...
TypeError: join_with_prefix() missing 1 required keyword-only argument: 'delimiter'
>>> join_with_prefix('P', 'ro', 'ython', ' ')
Traceback (most recent call last):
  ...
TypeError: join_with_prefix() missing 1 required keyword-only argument: 'delimiter'
>>> join_with_prefix('P', 'ro', 'ython', delimiter=' ')
'Pro Python'

Note

If you want to accept keyword-only arguments but you don’t have a good use for variable positional arguments, simply specify a single asterisk without an argument name. This tells Python that everything after the asterisk is keyword-only, without also accepting potentially long sets of positional arguments. One caveat is that if you also accept variable keyword arguments, you must supply at least one explicit keyword argument. Otherwise, there’s really no point in using the bare asterisk notation, and Python will raise a SyntaxError.
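For example, here's a sketch of a function (of my own invention) that uses a bare asterisk to make a single option keyword-only:

```python
def compare(a, b, *, tolerance=1e-6):
    # The bare asterisk collects nothing; it only marks tolerance
    # as keyword-only.
    return abs(a - b) <= tolerance

print(compare(1.0, 1.0000001))              # True
print(compare(1.0, 1.001, tolerance=1e-2))  # True
# compare(1.0, 1.001, 1e-2) would raise a TypeError, because
# tolerance can only be passed by keyword.
```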

In fact, remember that the ordering requirements of required and optional arguments are solely intended for the case of positional arguments. With the ability to define arguments as being keyword-only, you’re now free to define them as required and optional in any order, without any complaints from Python. Ordering isn’t important when calling the function, so it’s also not important when defining the function. Consider rewriting the previous example to require the prefix as a keyword argument, while also making the delimiter optional:

>>> def join_with_prefix(*segments, delimiter=' ', prefix):
...     return delimiter.join(prefix + segment for segment in segments)
...
>>> join_with_prefix('ro', 'ython', prefix='P')
'Pro Python'

Caution

Be careful taking advantage of this level of flexibility, because it’s not very straightforward compared to how Python code is typically written. It’s certainly possible, but it runs contrary to what most Python programmers will expect, which can make it difficult to maintain in the long run.

In all cases, however, variable keyword arguments must be positioned at the end of the list, after all other types of arguments.

Invoking Functions with Variable Arguments

In addition to being able to define arguments that can accept any number of values, the same syntax can be used to pass values into a function call. The big advantage to this is that it’s not restricted to arguments that were defined to be variable in nature. Instead, you can pass variable arguments into any function, regardless of how it was defined. The * unpacks an iterable and passes its contents as separate arguments.

The same asterisk ( * ) notation is used to specify variable arguments, which are then expanded into a function call as if all the arguments were specified directly. A single asterisk specifies positional arguments, while two asterisks specify keyword arguments. This is especially useful when passing in the return value of a function call directly as an argument, without assigning it to individual variables first:

>>> value = 'ro ython'
>>> join_with_prefix(*value.split(' '), prefix='P')
'Pro Python'

This example seems obvious on its own, because it’s a variable argument being passed in to a variable argument, but the same process works just fine on other types of functions as well. Because the arguments get expanded before getting passed to the function, it can be used with any function, regardless of how its arguments were specified. It can even be used with built-in functions and those defined by extensions written in C.
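For instance, unpacking works even with built-ins that were never defined with variable arguments in mind:

```python
# A single asterisk expands a sequence into positional arguments...
values = [3, 17, 8]
print(max(*values))  # equivalent to max(3, 17, 8)

# ...and two asterisks expand a dictionary into keyword arguments,
# here feeding print()'s own sep and end options.
options = {'sep': ' | ', 'end': '!\n'}
print('a', 'b', **options)
```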

Note

Prior to Python 3.5, you could pass only one set of variable positional arguments and one set of variable keyword arguments in a function call, so two lists of positional arguments had to be joined together manually before being unpacked. Since Python 3.5 (PEP 448), a single call may contain multiple * and ** unpackings, though combining the lists yourself remains the portable option.
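A quick sketch of both options (the total() function is hypothetical; multiple unpackings in one call require Python 3.5 or later):

```python
def total(*values):
    return sum(values)

first = [1, 2]
second = [3, 4]

# Portable approach: join the sequences yourself, then unpack once.
print(total(*(first + second)))  # 10
# Python 3.5+ (PEP 448) also allows multiple unpackings in a single call.
print(total(*first, *second))    # 10
```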

Preloading Arguments

When you start adding a number of arguments to function calls, many of which are optional, it becomes fairly common to know some of the argument values that will need to be passed, even if it’s still long before the function will actually be called. Rather than having to pass in all the arguments at the time the call is made, it can be quite useful to apply some of those arguments in advance, so fewer can be applied later.

This concept is officially called partial application of the function, but the function doesn’t get called at all yet, so it’s really more a matter of preloading some of the arguments in advance. When the preloaded function is called later, any arguments passed along are added to those that were provided earlier.

What About Currying?

If you’re familiar with other forms of functional programming, you may have heard of currying, which may look very similar to preloading arguments. Some frameworks have even provided functions named curry() that can preload arguments on a function, which leads to even more confusion. The difference between the two is subtle but important.

With a truly curried function, you must call it as many times as necessary to fill up all of the arguments. If a function accepts three arguments and you call it with just one argument, you’d get back a function that accepts two more arguments. If you call that new function, it still won’t execute your code but will instead load the next argument and return another function that takes the last remaining argument. Calling that function will finally satisfy all the arguments, so the actual code will be executed and return a useful value.

Partial application returns a function which, when called later, will at least try to execute code, no matter how many arguments may remain. If there are required arguments that haven’t gotten a value yet, Python will raise a TypeError just like it would if you had called it with missing arguments any other time. So even though there are certainly similarities between the two techniques, it’s important to understand the difference.
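The contrast can be sketched with a hypothetical volume() function, using functools.partial() for partial application and nested functions for a hand-rolled curry:

```python
import functools

def volume(length, width, height):
    return length * width * height

# Partial application: preload one argument now, supply the rest later.
# Calling with arguments still missing raises TypeError immediately.
doubled = functools.partial(volume, 2)
print(doubled(3, 4))            # 24

# A hand-rolled curried version: each call consumes exactly one
# argument and returns another function until all three are satisfied.
def curried_volume(length):
    def with_width(width):
        def with_height(height):
            return length * width * height
        return with_height
    return with_width

print(curried_volume(2)(3)(4))  # 24
```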

This behavior is provided as part of the built-in functools module, by way of its partial() function. By passing in a callable and any number of positional and keyword arguments, it will return a new callable that can be used later to apply those arguments:

>>> import os
>>> def load_file(file, base_path='/', mode="rb"):
...     return open(os.path.join(base_path, file), mode)
...
>>> f = load_file('example.txt')
>>> f.mode
'rb'
>>> f.close()
>>> import functools
>>> load_writable = functools.partial(load_file, mode="w")
>>> f = load_writable('example.txt')
>>> f.mode
'w'
>>> f.close()

Note

The technique of preloading arguments is what partial() provides, but the broader pattern of passing one function into another to get a new function back is generally known as a higher-order function; decorators are a common example. Decorators, as you'll see later in this chapter, can perform any number of tasks when called; preloading arguments is just one possibility.

This is commonly used to customize a more flexible function into something simpler, so it can be passed into an API that doesn’t know how to access that flexibility. By preloading the custom arguments beforehand, the code behind the API can call your function with the arguments it knows how to use, but all the arguments will still come into play.

Caution

When using functools.partial(), arguments that were preloaded positionally can't be given new values later; supplying them again raises the standard TypeError for providing multiple values for a single argument. Arguments preloaded by keyword, however, can be overridden by passing that keyword again when the preloaded function is finally called. For an alternative approach that addresses positional preloading as well, see the "Decorators" section of this chapter.
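A short demonstration, using a hypothetical power() function:

```python
import functools

def power(base, exponent=2):
    return base ** exponent

# Positionally preloaded arguments are locked in: naming base again
# would raise "got multiple values for argument 'base'".
square_of_three = functools.partial(power, 3)
print(square_of_three())     # 9

# Keyword-preloaded arguments, by contrast, can be overridden later.
cubed = functools.partial(power, exponent=3)
print(cubed(2))              # 8
print(cubed(2, exponent=4))  # 16
```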

Introspection

Python is very transparent, allowing code to inspect many aspects of objects at runtime. Because functions are objects like any others, there are several things that your code can glean from them, including the function signature, which specifies parameters. Obtaining a function’s arguments directly requires going through a fairly complicated set of attributes that describe Python’s bytecode structures, but thankfully Python also provides some functions to make it easier.

Many of Python’s introspection features are available as part of the standard inspect module, with its getfullargspec() function being of use for function arguments. It accepts the function to be inspected and returns a named tuple of information about that function’s arguments. The returned tuple contains values for every aspect of an argument specification:
  • args: A list of explicit argument names

  • varargs: The name of the variable positional argument

  • varkw: The name of the variable keyword argument

  • defaults: A tuple of default values for explicit arguments

  • kwonlyargs: A list of keyword-only argument names

  • kwonlydefaults: A dictionary of default values for keyword-only arguments

  • annotations: A dictionary of argument annotations, which will be explained later in this chapter

To better illustrate what values are present in each part of the tuple, here’s how it maps out to a basic function declaration:

>>> def example(a: int, b=1, *c, d, e=2, **f) -> str:
...     pass
...
>>> import inspect
>>> inspect.getfullargspec(example)
FullArgSpec(args=['a', 'b'], varargs='c', varkw='f', defaults=(1,), kwonlyargs=['d', 'e'], kwonlydefaults={'e': 2}, annotations={'a': <class 'int'>, 'return': <class 'str'>})

Example: Identifying Argument Values

Sometimes it can be useful to log what arguments a function will receive, regardless of which function it is or what its arguments look like. This behavior often comes into play in systems that generate argument lists based on something other than a Python function call. Some examples include instructions from a template language and regular expressions that parse text input.

Unfortunately, positional arguments present a bit of a problem because their values don’t include the name of the argument they’ll be sent to. Default values also pose a problem because the function call doesn’t need to include any values at all. Because the log should include all the values that will be given to the function, both of these problems will need to be addressed.

First, the easy part. Any argument values passed by keyword don’t need to be matched up with anything manually, as the argument names are provided right with the values. Rather than concerning ourselves with logging at the outset, let’s start with a function to get all the arguments in a dictionary that can be logged. The function accepts a function, a tuple of positional arguments, and a dictionary of keyword arguments:

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = kwargs.copy()
    return arguments

print(get_arguments(example, (1,), {'f': 4}))  # will output {'f': 4}

That really was easy. The function makes a copy of the keyword arguments instead of just returning it directly because we’ll be adding entries to that dictionary soon enough. Next, we have to deal with positional arguments. The trick is to identify which argument names map to the positional argument values, so that those values can be added to the dictionary with the appropriate names. This is where inspect.getfullargspec() comes into play, using zip() to do the heavy lifting:

import inspect

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = kwargs.copy()
    spec = inspect.getfullargspec(func)
    arguments.update(zip(spec.args, args))
    return arguments

print(get_arguments(example, (1,), {'f': 4}))  # will output {'f': 4, 'a': 1}

Now that the positional arguments have been dealt with, let’s move on to figuring out default values. If there are any default values that weren’t overridden by the arguments provided, the defaults should be added to the argument dictionary, as they will be sent to the function:

import inspect

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = kwargs.copy()
    spec = inspect.getfullargspec(func)
    arguments.update(zip(spec.args, args))
    if spec.defaults:
        for i, name in enumerate(spec.args[-len(spec.defaults):]):
            if name not in arguments:
                arguments[name] = spec.defaults[i]
    return arguments

print(get_arguments(example, (1,), {'f': 4}))  # will output {'f': 4, 'a': 1, 'b': 1}

Because optional arguments must come after required arguments, this addition uses the size of the defaults tuple to determine the names of the optional arguments. Looping over them, it then assigns only those values that weren't already provided. Unfortunately, this is only half of the default value situation. Because keyword-only arguments can take default values as well, getfullargspec() returns a separate dictionary for those values:

import inspect

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = kwargs.copy()
    spec = inspect.getfullargspec(func)
    arguments.update(zip(spec.args, args))
    if spec.defaults:
        for i, name in enumerate(spec.args[-len(spec.defaults):]):
            if name not in arguments:
                arguments[name] = spec.defaults[i]
    if spec.kwonlydefaults:
        for name, value in spec.kwonlydefaults.items():
            if name not in arguments:
                arguments[name] = value
    return arguments

print(get_arguments(example, (1,), {'f': 4}))  # will output {'f': 4, 'a': 1, 'b': 1, 'e': 2}

Because default values for keyword-only arguments also come in dictionary form, it’s much easier to apply those because the argument names are known in advance. With that in place, get_arguments() can produce a more complete dictionary of arguments that will be passed to the function. Unfortunately, because this returns a dictionary and variable positional arguments have no names, there’s no way to add them to the dictionary. This limits its usefulness a bit, but it’s still valid for a great many function definitions.
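As an aside, the standard library's inspect.signature() offers a bind() method that performs a similar job and, unlike the dictionary approach here, does capture variable positional arguments under their own name. A brief sketch:

```python
import inspect

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

# bind() matches supplied values to parameters the way a real call would,
# and apply_defaults() fills in anything still missing, including *c.
sig = inspect.signature(example)
bound = sig.bind(1, 2, 3, 4, d=5, x=6)
bound.apply_defaults()
print(dict(bound.arguments))
# {'a': 1, 'b': 2, 'c': (3, 4), 'd': 5, 'e': 2, 'f': {'x': 6}}
```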

Example: A More Concise Version

The previous example is certainly functional, but it’s a bit more code than is really necessary. In particular, it takes a fair amount of work supplying default values when explicit values aren’t provided. That’s not very intuitive, however, because we usually think about default values the other way around: they’re provided first, then overridden by explicit arguments.

The get_arguments() function can be rewritten with that in mind by bringing the default values out of the function declaration first, before replacing them with any values passed in as actual arguments. This avoids a lot of the checks that have to be made to make sure nothing gets overwritten accidentally.

The first step is to get the default values out. Because the defaults and kwonlydefaults attributes of the argument specification are set to None if no default values were specified, we actually have to start by setting up an empty dictionary to update. Then the default values for positional arguments can be added in.

Because this only needs to update a dictionary this time, without regard for what might be in it already, it’s a bit easier to use a different technique to get the positional defaults. Rather than using a complex slice that’s fairly difficult to read, we can use a similar zip() to what was used to get the explicit argument values. By first reversing the argument list and the default values, they still match up starting at the end:

import inspect

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = {}
    spec = inspect.getfullargspec(func)
    if spec.defaults:
        arguments.update(zip(reversed(spec.args), reversed(spec.defaults)))
    return arguments

print(get_arguments(example, (1,), {'f': 4}))  # will output {'b': 1}

Adding default values for keyword arguments is much easier because the argument specification already supplies them as a dictionary. We can just pass that straight into an update() of the argument dictionary and move on:

import inspect

def example(a: int, b=1, *c, d, e=2, **f) -> str:
    pass

def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = {}
    spec = inspect.getfullargspec(func)
    if spec.defaults:
        arguments.update(zip(reversed(spec.args), reversed(spec.defaults)))
    if spec.kwonlydefaults:
        arguments.update(spec.kwonlydefaults)
    return arguments

print(get_arguments(example, (1,), {'f': 4}))  # will output {'b': 1, 'e': 2}

Now all that’s left is to add the explicit argument values that were passed in. The same techniques used in the earlier version of this function will work here, with the only exception being that keyword arguments are passed in an update() function instead of being copied to form the argument dictionary in the first place:

import inspect
def example(a=1, b=1, *c, d, e=2, **f) -> str:
    pass
def get_arguments(func, args, kwargs):
    """
    Given a function and a set of arguments, return a dictionary
    of argument values that will be sent to the function.
    """
    arguments = {}
    spec = inspect.getfullargspec(func)
    if spec.defaults:
        arguments.update(zip(reversed(spec.args), reversed(spec.defaults)))
    if spec.kwonlydefaults:
        arguments.update(spec.kwonlydefaults)
    arguments.update(zip(spec.args, args))
    arguments.update(kwargs)
    return arguments
print(get_arguments(example, (1,), {'f': 4}))  # will output {'b': 1, 'a': 1, 'e': 2, 'f': 4}

With that, we have a much more concise function that works the way we normally think of default argument values. This type of refactoring is fairly common after you get more familiar with the advanced techniques available to you. It’s always useful to look over old code to see if there’s an easier, more straightforward way to go about the task at hand. This will often make your code faster as well as more readable and maintainable going forward. Now we'll extend our solution to also validate arguments.

Example: Validating Arguments

Unfortunately, that doesn’t mean that the arguments returned by get_arguments() are capable of being passed into the function without errors. As it stands, get_arguments() assumes that any keyword arguments supplied are in fact valid arguments for the function, but that isn’t always the case. In addition, any required arguments that didn’t get a value will cause an error when the function is called. Ideally, we should be able to validate the arguments as well.

We can start with get_arguments(), so that we have a dictionary of all the values that will be passed to the function. Then we have two validation tasks: make sure all arguments have values and make sure no arguments were provided that the function doesn’t know about. The function itself may impose additional requirements on the argument values, but as a generic utility, we can’t make any assumptions about the content of any of the provided values.

Let’s start off with making sure all the necessary values were provided. We don’t have to worry as much about required or optional arguments this time around, since get_arguments() already makes sure optional arguments have their default values. Any argument left without a value is therefore required:

import inspect
def validate_arguments(func, args, kwargs):
    """
    Given a function and its arguments, return a dictionary
    with any errors that are posed by the given arguments.
    """
    arguments = get_arguments(func, args, kwargs)
    spec = inspect.getfullargspec(func)
    declared_args = spec.args[:]
    declared_args.extend(spec.kwonlyargs)
    errors = {}
    for name in declared_args:
        if name not in arguments:
            errors[name] = "Required argument not provided."
    return errors

With the basics in place to validate that all required arguments have values, the next step is to make sure the function knows how to deal with all the arguments that were provided. Any arguments passed in that aren’t defined in the function should be considered an error:

import inspect
def validate_arguments(func, args, kwargs):
    """
    Given a function and its arguments, return a dictionary
    with any errors that are posed by the given arguments.
    """
    arguments = get_arguments(func, args, kwargs)
    spec = inspect.getfullargspec(func)
    declared_args = spec.args[:]
    declared_args.extend(spec.kwonlyargs)
    errors = {}
    for name in declared_args:
        if name not in arguments:
            errors[name] = "Required argument not provided."
    for name in arguments:
        if name not in declared_args:
            errors[name] = "Unknown argument provided."
    return errors

Of course, because this relies on get_arguments(), it inherits the same limitation of variable positional arguments. This means validate_arguments() may sometimes return an incomplete dictionary of errors. Variable positional arguments present an additional challenge that can’t be addressed with this function. A more comprehensive solution is provided in the section on function annotations.
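To see both checks in action, here’s a quick sketch that bundles the finished get_arguments() with validate_arguments() and applies them to a hypothetical example() function; the missing keyword-only argument d and the undeclared argument x each produce an error:

```python
import inspect

def get_arguments(func, args, kwargs):
    """Return the dictionary of argument values the call would supply."""
    arguments = {}
    spec = inspect.getfullargspec(func)
    if spec.defaults:
        arguments.update(zip(reversed(spec.args), reversed(spec.defaults)))
    if spec.kwonlydefaults:
        arguments.update(spec.kwonlydefaults)
    arguments.update(zip(spec.args, args))
    arguments.update(kwargs)
    return arguments

def validate_arguments(func, args, kwargs):
    """Return a dictionary of errors posed by the given arguments."""
    arguments = get_arguments(func, args, kwargs)
    spec = inspect.getfullargspec(func)
    declared_args = spec.args[:]
    declared_args.extend(spec.kwonlyargs)
    errors = {}
    for name in declared_args:
        if name not in arguments:
            errors[name] = "Required argument not provided."
    for name in arguments:
        if name not in declared_args:
            errors[name] = "Unknown argument provided."
    return errors

def example(a, b=1, *, d, e=2):
    pass

# 'd' is required but missing; 'x' isn't a declared argument
print(validate_arguments(example, (1,), {'x': 4}))
```

With a valid call, such as validate_arguments(example, (1,), {'d': 2}), the returned dictionary is empty, signaling that no errors were found.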

Decorators

When dealing with a large codebase, it’s very common to have a set of tasks that need to be performed by many different functions, usually before or after doing something more specific to the function at hand. The nature of these tasks is as varied as the projects that use them, but here are some of the more common examples of where decorators are used:
  • Access control

  • Cleanup of temporary objects

  • Error handling

  • Caching

  • Logging

In all of these cases, there’s some boilerplate code that needs to be executed before or after what the function’s really trying to do. Rather than copying that code into each function, it would be better if it could be written once and simply applied to each function that needs it. This is where decorators come in.

Technically, decorators are just simple functions designed with one purpose: accept a function and return a function. The function returned can be the same as the one passed in, or it could be completely replaced by something else along the way. The most common way to apply a decorator is using a special syntax designed just for this purpose. Here’s how you could apply a decorator designed to suppress any errors during the execution of a function:

import datetime
from myapp import suppress_errors
@suppress_errors
def log_error(message, log_file='errors.log'):
    """Log an error message to a file."""
    with open(log_file, 'a') as log:
        log.write('%s %s\n' % (datetime.datetime.now(), message))

This syntax tells Python to pass the log_error() function as an argument to the suppress_errors() function, which then returns a replacement to use instead. It’s easier to understand what happens behind the scenes by examining the process used in older versions of Python, before the @ syntax was introduced in Python 2.4:

# Python 2.x example
import datetime
from myapp import suppress_errors
def log_error(message, log_file='errors.log'):
    """Log an error message to a file."""
    with open(log_file, 'a') as log:
        log.write('%s %s\n' % (datetime.datetime.now(), message))
log_error = suppress_errors(log_error)

Don’t Repeat Yourself/Readability Counts

When using the older decoration approach, notice that the name of the function is written three different times. Not only is this some extra typing that seems unnecessary; it complicates matters if you ever need to change the function name, and it only gets worse the more decorators you add. The newer syntax can apply a decorator without repeating the function name, no matter how many decorators you use.

Of course, the @ syntax does have one other benefit, which greatly helps its introduction: it keeps decorators right near the function’s signature. This makes it easy to see at a glance which decorators are applied, which more directly conveys the total behavior of the function. Having them at the bottom of the function requires more effort to understand the complete behavior, so by moving decorators up to the top, readability is greatly enhanced.

The older option is still available and behaves identically to the @ syntax. The only real difference is that the @ syntax is only available when defining the function in the source file. If you want to decorate a function that was imported from elsewhere, you’ll have to pass it into the decorator manually, so it’s important to remember both ways it can work:

from myapp import log_error, suppress_errors
log_error = suppress_errors(log_error)

To understand what commonly goes on inside decorators like suppress_errors(), it’s necessary to first examine one of the most misunderstood and underutilized features of Python, and many other languages as well: closures.

Closures

Despite their usefulness, closures can seem to be an intimidating topic. Most explanations assume prior knowledge of things such as lexical scope, free variables, upvalues, and variable extent. Also, because so much can be done without ever learning about closures, the topic often seems mysterious and magical, as if it’s the domain of experts, unsuitable for the rest of us. Thankfully, closures really aren’t as difficult to understand as the terminology may suggest.

In a nutshell, a closure is a function that’s defined inside another function but is then passed outside that function where it can be used by other code. There are some other details to learn as well, but it’s still fairly abstract at this point, so here’s a simple example of a closure:

def multiply_by(factor):
    """Return a function that multiplies values by the given factor"""
    def multiply(value):
        """Multiply the given value by the factor already provided"""
        return value * factor
    return multiply
times2 = multiply_by(2)
print(times2(2))  # will output 4

As you can see, when you call multiply_by() with a value to use as a multiplication factor, the inner multiply() gets returned to be used later on. Here’s how it would actually be used, which may help explain why this is useful. If you key in the previous code line by line from a Python prompt, the following would give you an idea about how this works:

>>> times2 = multiply_by(2)
>>> times2(5)
10
>>> times2(10)
20
>>> times3 = multiply_by(3)
>>> times3(5)
15
>>> times2(times3(5))
30

This behavior looks a bit like the argument preloading feature of functools.partial(), but you don’t need to have a function that takes both arguments at once. The interesting part about how this works, however, is that the inner function doesn’t need to accept a factor argument of its own; it essentially inherits that argument from the outer function.
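For comparison, here’s a minimal sketch of the functools.partial() approach mentioned here; note that this version of multiply() must be written to accept both arguments at once, with partial() preloading one of them:

```python
import functools

def multiply(value, factor):
    # Unlike the closure version, this function takes both arguments
    # at once, so partial() can bind the factor ahead of time.
    return value * factor

times2 = functools.partial(multiply, factor=2)
print(times2(5))   # 10
```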

The fact that an inner function can reference the values of an outer function often seems perfectly normal when looking at the code, but there are a couple of rules about how it works that might not be as obvious. First, the inner function must be defined within the outer function; simply passing in a function as an argument won’t work:

import functools
def multiply(value):
    return value * factor
def custom_operator(func, factor):
    return func
multiply_by = functools.partial(custom_operator, multiply)

On the surface, this looks mostly equivalent to the working example shown previously, but with the added benefit of being able to provide a callable at runtime. After all, the inner function gets placed inside the outer function and gets returned for use by other code. The problem is that closures only work if the inner function is actually defined inside the outer function, not just anything that gets passed in:

>>> times2 = multiply_by(2)
>>> times2(5)
Traceback (most recent call last):
  ...
NameError: global name 'factor' is not defined

This almost contradicts the functionality of functools.partial(), which works much like the custom_operator() function described here, but remember that partial() accepts all of the arguments at the same time as it accepts the callable to be bundled with them. It doesn’t try to pull in any arguments from anywhere else.

Wrappers

Closures come into play heavily in the construction of wrappers, the most common use of decorators. Wrappers are functions designed to contain another function, adding some extra behavior before or after the wrapped function executes. In the context of the closure discussion, a wrapper is the inner function, while the wrapped function is passed in as an argument to the outer function. Here’s the code behind the suppress_errors() decorator shown in the previous section:

def suppress_errors(func):
    """Automatically silence any errors that occur within a function"""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            pass
    return wrapper

A few things are going on here, but most of them have already been covered. The decorator takes a function as its only argument, which isn’t executed until the inner wrapper function executes. By returning the wrapper instead of the original function, we form a closure, which allows the same function name to be used even after suppress_errors() is done.

Because the wrapper has to be called as if it were the original function, regardless of how that function was defined, it must accept all possible argument combinations. This is achieved by using variable positional and keyword arguments together and passing them straight into the original function internally. This is a very common practice with wrappers because it allows maximum flexibility, without caring what type of function it’s applied to.

The actual work in the wrapper is quite simple: just execute the original function inside a try/except block to catch any exceptions that are raised. In the event of any errors it just continues merrily along, implicitly returning None instead of doing anything interesting. It also makes sure to return any value returned by the original function, so that everything meaningful about the wrapped function is maintained.

In this case the wrapper function is fairly simple, but the basic idea works for many more complex situations as well. There could be several lines of code both before and after the original function is called, perhaps with some decisions about whether it is called at all. Authorization wrappers, for instance, will typically return or raise an exception without ever calling the wrapped function, if the authorization failed for any reason.
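As a sketch of that pattern, here’s a hypothetical require_admin() decorator (the user dictionary and its is_admin flag are purely illustrative) that refuses to call the wrapped function when authorization fails:

```python
import functools

def require_admin(func):
    """Hypothetical authorization wrapper: the wrapped function is
    never called unless the user dictionary grants admin access."""
    @functools.wraps(func)
    def wrapper(user, *args, **kwargs):
        if not user.get('is_admin'):
            # Bail out before the wrapped function ever runs
            raise PermissionError('admin access required')
        return func(user, *args, **kwargs)
    return wrapper

@require_admin
def delete_records(user):
    return 'records deleted'

print(delete_records({'is_admin': True}))   # records deleted
```

Calling delete_records({'is_admin': False}) raises PermissionError without the wrapped function ever executing.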

Unfortunately, wrapping a function means some potentially useful information is lost. Chapter 5 shows how Python has access to certain attributes of a function, such as its name, docstring, and argument list. By replacing the original function with a wrapper, we’ve actually replaced all of that other information as well. In order to bring some of it back, we turn to a decorator in the functools module called wraps.

It may seem odd to use a decorator inside a decorator, but it really just solves the same problem as anything else: there’s a common need that shouldn’t require duplicate code everywhere it takes place. The functools.wraps() decorator copies the name, docstring, and some other information from the original function over to the wrapper, so at least some of it gets retained. It does not copy over the argument list, but it’s better than nothing:

import functools
def suppress_errors(func):
    """Automatically silence any errors that occur within a function"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            pass
    return wrapper

What may seem most odd about this construction is that functools.wraps() takes an argument in addition to the function to which it’s applied. In this case, that argument is the function to copy attributes from, which is specified on the line with the decorator itself. This is often useful for customizing decorators for specific tasks, so next we’ll examine how to take advantage of custom arguments in your own decorators.
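A quick comparison makes the difference visible; the decorator names plain() and preserving() here are purely illustrative:

```python
import functools

def plain(func):
    # No functools.wraps(): metadata is lost
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def preserving(func):
    # functools.wraps() copies the name and docstring over
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def first():
    """First docstring"""

@preserving
def second():
    """Second docstring"""

print(first.__name__, first.__doc__)     # wrapper None
print(second.__name__, second.__doc__)   # second Second docstring
```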

Decorators with Arguments

Ordinarily decorators only take a single argument, the function to be decorated. Behind the scenes, though, Python evaluates the @ line as an expression before applying it as a decorator. The result of the expression is what’s actually used as a decorator. In the simple case, the decorator expression is just a single function, so it evaluates easily. Adding arguments in the form used by functools.wraps() makes the whole statement evaluate like this:

wrapper = functools.wraps(func)(wrapper)

Looking at it this way, the solution becomes clear: one function returns another. The first function accepts the extra arguments and returns another function, which is used as the decorator. This makes implementing arguments on a decorator more complex because it adds another layer to the whole process, but it’s easy to deal with once you see it in context. Here’s how everything works together in the longest chain you’re likely to see:
  • A function to accept and validate arguments, and also return a function that decorates the original

  • A decorator to accept a user-defined function

  • A wrapper to add extra behavior

  • The original function that was decorated

Not all of that will happen for every decorator, but that’s the general approach of the most complex scenarios. Anything more complicated is simply an expansion of one of those four steps. As you’ll notice, three of the four have already been covered, so the extra layer imposed by decorator arguments is really the only thing left to discuss.

This new outermost function accepts all the arguments for the decorator, optionally validates them, and returns a new function as a closure over the argument variables. That new function must take a single argument, functioning as the decorator. Here’s how the suppress_errors() decorator might look if it instead accepted a logger function to report the errors to, rather than completely silencing them:

import functools
def suppress_errors(log_func=None):
    """Automatically silence any errors that occur within a function"""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                if log_func is not None:
                    log_func(str(e))
        return wrapper
    return decorator

This layering allows suppress_errors() to accept arguments prior to being used as a decorator, but it removes the ability to call it without any arguments. Because that was the previous behavior, we’ve now introduced a backward incompatibility. The closest we can get to the original syntax is to actually call suppress_errors() first, but without any arguments.

Here’s an example function that processes updated files in a given directory. This is a task that’s often performed on an automated schedule, so that if something goes wrong, it can just stop running and try again at the next appointed time:

import datetime
import os
import time
from myapp import suppress_errors
@suppress_errors()
def process_updated_files(directory, process, since=None):
    """
    Processes any new files in a `directory` using the `process` function.
    If provided, `since` is a date after which files are considered updated.
    The process function passed in must accept a single argument: the absolute
    path to the file that needs to be processed.
    """
    if since is not None:
        # Get a threshold that we can compare to the modification time later
        threshold = time.mktime(since.timetuple()) + since.microsecond / 1000000
    else:
        threshold = 0
    for filename in os.listdir(directory):
        path = os.path.abspath(os.path.join(directory, filename))
        if os.stat(path).st_mtime > threshold:
            process(path)

Unfortunately, this is still a strange situation to end up with, and it really doesn’t look like anything that Python programmers are used to. Clearly, we need a better solution.

Decorators with—or without—Arguments

Ideally, a decorator with optional arguments would be able to be called without parentheses if no arguments are provided, while still being able to provide the arguments when necessary. This means supporting two different flows in a single decorator, which can get tricky if you’re not careful. The main problem is that the outermost function must be able to accept arbitrary arguments or a single function, and it must be able to tell the difference between the two and behave accordingly.

That brings us to the first task: determining which flow to use when the outer function is called. One option would be to inspect the first positional argument and check to see if it’s a function, since decorators always receive the function as a positional argument.

Interestingly, a pretty good distinction can be made based on something mentioned briefly in the previous paragraph: decorators always receive the decorated function as a positional argument, so we can use that as the distinguishing factor. For all other arguments we can instead rely on keyword arguments, which are generally more explicit anyway, thus making the code more readable as well.

We could do this by way of using *args and **kwargs, but because we know the positional argument list is just a fixed single argument, it’s easier to just make that the first argument and make it optional. Then, any additional keyword arguments can be placed after it. They’ll all need default values, of course, but the whole point here is that all arguments are optional, so that’s not a problem.

With the argument distinction squared away, all that’s left is to branch into a different code block if arguments are provided, rather than a function to be decorated. By having an optional first positional argument, we can simply test for its presence to determine which branch to go through:

import functools
def suppress_errors(func=None, log_func=None):
    """Automatically silence any errors that occur within a function"""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                if log_func is not None:
                    log_func(str(e))
        return wrapper
    if func is None:
        return decorator
    else:
        return decorator(func)

This now allows suppress_errors() to be called with or without arguments, but it’s still important to remember that arguments must be passed with keywords. This is an example in which an argument looks identical to the function being decorated. There’s no way to tell the difference by examining them, even if we tried.

If a logger function is provided as a positional argument, the decorator will assume it’s the function to be decorated, so it’ll actually execute the logger immediately, with the function to be decorated as its argument. In essence, you’ll end up logging the function you wanted to decorate. Worse yet, the value you’re left with after decorating the function is actually the return value from the logger, not the decorator. Because most loggers don’t return anything, it’ll probably be None—that’s right, your function has vanished. Given that you keyed in the aforementioned functions, you can try the following from a prompt:

>>> def print_logger(message):
...     print(message)
...
>>> @suppress_errors(print_logger)
... def example():
...     return variable_which_does_not_exist
...
<function example at 0x...>
>>> example
>>>

This is a side effect of the way the decorator works, and there’s little to be done other than documenting it and making sure you always specify keywords when applying arguments.

Example: Memoization

To demonstrate how decorators can copy out common behavior into any function you like, consider what could be done to improve the efficiency of deterministic functions. Deterministic functions always return the same result given the same set of arguments, no matter how many times they’re called. Given such a function, it should be possible to cache the results of a given function call so if it’s called with the same arguments again, the result can be looked up without having to call the function again.

Using a cache, a decorator can store the result of a function using the argument list as its key. Dictionaries can’t be used as keys in a dictionary, so only positional arguments can be taken into account when populating the cache. Thankfully, most functions that would take advantage of memoization are simple mathematical operations, which are typically called with positional arguments anyway:

import functools
def memoize(func):
    """
    Cache the results of the function so it doesn't need to be called
    again, if the same arguments are provided a second time.
    """
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args in cache:
            return cache[args]
        # This line is for demonstration only.
        # Remove it before using it for real.
        print('Calling %s()' % func.__name__)
        result = func(*args)
        cache[args] = result
        return result
    return wrapper

Now, whenever you define a deterministic function, you can use the memoize() decorator to automatically cache its result for future use. Here’s how it would work for some simple mathematical operations. Again, given you keyed in the aforelisted stub, try the following:

>>> @memoize
... def multiply(x, y):
...     return x * y
...
>>> multiply(6, 7)
Calling multiply()
42
>>> multiply(6, 7)
42
>>> multiply(4, 3)
Calling multiply()
12
>>> @memoize
... def factorial(x):
...    result = 1
...    for i in range(x):
...        result *= i + 1
...    return result
...
>>> factorial(5)
Calling factorial()
120
>>> factorial(5)
120
>>> factorial(7)
Calling factorial()
5040

Caution

Memoization is best suited for functions with a few arguments, which are called with relatively few variations in the argument values. Functions that are called with a large number of arguments or have a lot of variety in the argument values that are used will quickly fill up a lot of memory with the cache. This can slow down the entire system, with the only benefit being the minority of cases where arguments are reused. Also, functions that aren’t truly deterministic will actually cause problems because the function won’t be called every time.
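When a bounded cache is needed, the standard library already provides one: functools.lru_cache() memoizes a function while evicting the least recently used entries once maxsize is reached, avoiding the unbounded memory growth described above:

```python
import functools

@functools.lru_cache(maxsize=128)
def factorial(x):
    result = 1
    for i in range(x):
        result *= i + 1
    return result

print(factorial(5))            # 120
print(factorial.cache_info())  # hit/miss counts and current cache size
```

The cache_info() method reports hits, misses, and the current cache size, which is useful for checking whether memoization is actually paying off.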

Example: A Decorator to Create Decorators

Astute readers will have noticed something of a contradiction in the descriptions of the more complex decorator constructs. The purpose of decorators is to avoid a lot of boilerplate code and simplify functions, but the decorators themselves end up getting quite complicated just to support features such as optional arguments. Ideally, we could put that boilerplate into a decorator as well, simplifying the process for new decorators.

Because decorators are Python functions, just like those they decorate, this is quite possible. As with the other situations, however, there’s something that needs to be taken into account. In this case, the function you define as a decorator will need to distinguish between the arguments meant for the decorator and those meant for the function it decorates:

def decorator(declared_decorator):
    """Create a decorator out of a function, which will be used as a wrapper."""
    @functools.wraps(declared_decorator)
    def final_decorator(func=None, **kwargs):
        # This will be exposed to the rest
        # of your application as a decorator
        def decorated(func):
            # This will be exposed to the rest
            # of your application as a decorated
            # function, regardless how it was called
            @functools.wraps(func)
            def wrapper(*a, **kw):
                # This is used when actually executing
                # the function that was decorated
                return declared_decorator(func, a, kw, **kwargs)
            return wrapper
        if func is None:
            # The decorator was called with arguments,
            # rather than a function to decorate
            return decorated
        else:
            # The decorator was called without arguments,
            # so the function should be decorated immediately
            return decorated(func)
    return final_decorator

With this in place, you can define your decorators in terms of the wrapper function directly; then, just apply this decorator to manage the overhead behind the scenes. Your declared functions must always accept three arguments now, with any additional arguments added on beyond that. The three required arguments are shown in the following list:
  • The function that will be decorated, which should be called if appropriate

  • A tuple of positional arguments that were supplied to the decorated function

  • A dictionary of keyword arguments that were supplied to the decorated function

With these arguments in mind, here’s how you might define the suppress_errors() decorator described previously in this chapter:

>>> @decorator
... def suppress_errors(func, args, kwargs, log_func=None):
...     try:
...         return func(*args, **kwargs)
...     except Exception as e:
...         if log_func is not None:
...             log_func(str(e))
...
>>> @suppress_errors
... def example():
...     return variable_which_does_not_exist
...
>>> example() # Doesn't raise any errors
>>> def print_logger(message):
...     print(message)
...
>>> @suppress_errors(log_func=print_logger)
... def example():
...     return variable_which_does_not_exist
...
>>> example()
name 'variable_which_does_not_exist' is not defined

Function Annotations

There are typically three aspects of a function that don’t deal with the code within it: a name, a set of arguments, and an optional docstring. Sometimes, however, that’s not quite enough to fully describe how the function works or how it should be used. Static-typed languages—such as Java, for example—also include details about what type of values are allowed for each of the arguments, as well as what type can be expected for the return value.

Python’s response to this need is the concept of function annotations. Each argument, as well as the return value, can have an expression attached to it, which describes a detail that can’t be conveyed otherwise. This could be as simple as a type, such as int or str, which is analogous to static-typed languages, as shown in the following example stub:

def prepend_rows(rows: list, prefix: str) -> list:
    return [prefix + row for row in rows]

The biggest difference between this example and traditional static-typed languages isn’t a matter of syntax; it’s that in Python annotations can be any expression, not just a type or a class. You could annotate your arguments with descriptive strings, calculated values, or even inline functions—see this chapter’s section on lambdas for details. Here’s what the previous example might look like if annotated with strings as additional documentation:

def prepend_rows(rows:"a list of strings to add to the prefix",
                 prefix:"a string to prepend to each row provided",
                 ) -> "a new list of strings prepended with the prefix":
    return [prefix + row for row in rows]

Of course, this flexibility might make you wonder about the intended use for function annotations, but there isn’t one, and that’s deliberate. Officially, the intent behind annotations is to encourage experimentation in frameworks and other third-party libraries. The two examples shown here could be valid for use with type checking and documentation libraries, respectively.

Example: Type Safety

To illustrate how annotations can be used by a library, consider a basic implementation of a type safety library that can understand and utilize the function described previously. It would expect argument annotations to specify a valid type for any incoming arguments, while the return annotation would be able to validate the value returned by the function.

Because type safety involves verifying values before and after the function is executed, a decorator is the most suitable option for the implementation. Also, because all of the type-hinting information is provided in the function declaration, we don’t need to worry about any additional arguments, so a simple decorator will suffice. The first task, however, is to validate the annotations themselves, as they must be valid Python types in order for the rest of the decorator to work properly:

import inspect
def typesafe(func):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    spec = inspect.getfullargspec(func)
    for name, annotation in spec.annotations.items():
        if not isinstance(annotation, type):
            raise TypeError("The annotation for '%s' is not a type." % name)
    return func

So far this doesn’t do anything to the function, but it does check to see that each annotation provided is a valid type, which can then be used to verify the type of the arguments referenced by the annotations. This uses isinstance() , which compares an object to the type it’s expected to be. More information on isinstance() and on types and classes in general can be found in Chapter 4.

Now that we can be sure all the annotations are valid, it's time to start validating some arguments. Given how many types of arguments there are, let's take them one at a time. Keyword arguments are the easiest to start out with, since they already come with their name and value tied together, so that's one less thing to worry about. With a name, we can get the associated annotation and validate the value against that. This would also be a good time to start factoring some pieces out, as we'll end up reusing them several times before we're done. Here's how the wrapper would look to begin with:

import functools
import inspect
def typesafe(func):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    spec = inspect.getfullargspec(func)
    annotations = spec.annotations
    for name, annotation in annotations.items():
        if not isinstance(annotation, type):
            raise TypeError("The annotation for '%s' is not a type." % name)
    error = "Wrong type for %s: expected %s, got %s."
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Deal with keyword arguments
        for name, arg in kwargs.items():
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        return func(*args, **kwargs)
    return wrapper

By now, this should be fairly self-explanatory. Any keyword arguments provided will be checked to see if there’s an associated annotation. If there is, the provided value is checked to make sure it’s an instance of the type found in the annotation. The error message is factored out because it’ll get reused a few more times before we’re done.

Next up is dealing with positional arguments. Once again, we can rely on zip() to line up the positional argument names with the values that were provided. Because the result of zip() is compatible with the items() method of dictionaries, we can actually use chain() from the itertools module to link them together into the same loop:

Part one: add part two to the end of this listing to see it in action as a script:
import functools
import inspect
from itertools import chain
def typesafe(func):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    spec = inspect.getfullargspec(func)
    annotations = spec.annotations
    for name, annotation in annotations.items():
        if not isinstance(annotation, type):
            raise TypeError("The annotation for '%s' is not a type." % name)
    error = "Wrong type for %s: expected %s, got %s."
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Deal with keyword arguments
        for name, arg in chain(zip(spec.args, args), kwargs.items()):
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        return func(*args, **kwargs)
    return wrapper
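The zip()/chain() combination driving that loop is easy to verify on its own; zip() pairs argument names with positional values, and chain() appends the keyword pairs to the same iteration:

```python
from itertools import chain

arg_names = ['a', 'b', 'c']
args = (1, 2)        # values supplied positionally
kwargs = {'c': 3}    # values supplied by keyword

# zip() stops at the shorter sequence, so unfilled names are simply skipped
pairs = list(chain(zip(arg_names, args), kwargs.items()))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```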

Although that takes care of both positional and keyword arguments, it’s not everything. Because variable arguments can also accept annotations, we have to account for argument values that don’t line up as nicely with defined argument names. Unfortunately, there’s something else that must be dealt with before we can do much of anything on that front.

If you’re paying really close attention, you might notice a very subtle bug in the code as it stands. In order to make the code a bit easier to follow and to account for any arguments that are passed by keywords, the wrapper iterates over the kwargs dictionary in its entirely, checking for associated annotations. Unfortunately, that leaves us with the possibility of an unintentional name conflict.

To illustrate how the bug could be triggered, first consider what would be expected when dealing with variable arguments. Because we can only apply a single annotation to the variable argument name itself, that annotation must be assumed to apply to all arguments that fall under that variable argument, whether passed positionally or by keyword. Without explicit support for that behavior yet, variable arguments should just be ignored, but here’s what happens with the code as it stands:

Part two: put this at the end of the script you just keyed in:
@typesafe
def example(*args:int, **kwargs:str):
    pass
print(example(spam='eggs'))  #fine
print(example(kwargs='spam'))  #fine
print(example(args='spam'))  # not fine!
# the first two calls print None; the third fails with:
#Traceback (most recent call last):
#TypeError: Wrong type for args: expected int, got str.

Interestingly, everything works fine unless the function call includes a keyword argument with the same name as the variable positional argument. Although it may not seem obvious at first, the problem is actually in the set of values to iterate over in the wrapper’s only loop. It assumes that the names of all the keyword arguments line up nicely with annotations.

Basically, the problem is that keyword arguments that are meant for the variable argument end up getting matched up with annotations from other arguments. For the most part, this is acceptable because two of the three types of arguments won’t ever cause problems. Matching it with an explicit argument name simply duplicates what Python already does, so using the associated annotation is fine, and matching the variable keyword argument name ends up using the same annotation that we were planning on using anyway.

So the problem only crops up when a keyword argument matches the variable positional argument name, because that association never makes sense. Sometimes if the annotation is the same as that of the variable keyword argument, the problem might never show up, but it’s still there, regardless. Because the code for the wrapper function is still fairly minimal, it’s not too difficult to see where the problem is occurring.

In the main loop, the second part of the iteration chain is the list of items in the kwargs dictionary. That means everything passed in by keyword is checked against named annotations, which clearly isn't always what we want. Instead, we only want to loop over the explicit arguments at this point, while still supporting both positional and keyword passing. That means we'll have to construct a new dictionary based on the function definition, rather than taking the easy way out and relying on kwargs, as we are now. The outer typesafe() function has been removed from the listing here to make the code easier to digest in print:

    def wrapper(*args, **kwargs):
        # Populate a dictionary of explicit arguments passed positionally
        explicit_args = dict(zip(spec.args, args))
        # Add all explicit arguments passed by keyword
        for name in chain(spec.args, spec.kwonlyargs):
            if name in kwargs:
                explicit_args[name] = kwargs[name]
        # Deal with explicit arguments
        for name, arg in explicit_args.items():
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        return func(*args, **kwargs)

With that bug out of the way, we can focus on properly supporting variable arguments. Because keyword arguments have names but positional arguments don’t, we can’t manage both types in one pass like we could with the explicit arguments. The processes are fairly similar to the explicit arguments, but the values to iterate over are different in each case. The biggest difference, however, is that the annotations aren’t referenced by the name of the arguments.

In order to loop over just the truly variable positional arguments, we can simply use the number of explicit arguments as the beginning of a slice on the positional arguments tuple. This gets us all positional arguments provided after the explicit arguments or an empty list if only explicit arguments were provided.
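For example, with two explicit arguments declared:

```python
spec_args = ['a', 'b']           # explicitly declared argument names
args = ('x', 'y', 'z', 'w')      # everything passed positionally

# Slicing past the explicit arguments leaves just the *args portion
print(args[len(spec_args):])     # ('z', 'w')

# With no extra values, the same slice is simply empty
print(('x', 'y')[len(spec_args):])  # ()
```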

For keyword arguments, we have to be a bit more creative. Because the function already loops over all the explicitly declared arguments at the beginning, we can use that same loop to exclude any matching items from a copy of the kwargs dictionary. Then we can iterate over what’s left over to account for all the variable keyword arguments:

    def wrapper(*args, **kwargs):
        # Populate a dictionary of explicit arguments passed positionally
        explicit_args = dict(zip(spec.args, args))
        keyword_args = kwargs.copy()
        # Add all explicit arguments passed by keyword
        for name in chain(spec.args, spec.kwonlyargs):
            if name in kwargs:
                explicit_args[name] = keyword_args.pop(name)
        # Deal with explicit arguments
        for name, arg in explicit_args.items():
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        # Deal with variable positional arguments
        if spec.varargs and spec.varargs in annotations:
            annotation = annotations[spec.varargs]
            for i, arg in enumerate(args[len(spec.args):]):
                if not isinstance(arg, annotation):
                    raise TypeError(error % ('variable argument %s' % (i + 1),
                                             annotation.__name__,
                                             type(arg).__name__))
        # Deal with variable keyword arguments
        if spec.varkw and spec.varkw in annotations:
            annotation = annotations[spec.varkw]
            for name, arg in keyword_args.items():
                if not isinstance(arg, annotation):
                    raise TypeError(error % (name,
                                             annotation.__name__,
                                             type(arg).__name__))
        return func(*args, **kwargs)
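The copy-and-pop bookkeeping at the top of that wrapper can be checked in isolation; popping from a copy leaves the original kwargs intact for the final function call:

```python
spec_args = ['a', 'b']           # explicitly declared argument names
kwargs = {'b': 2, 'extra': 3}    # as received by the wrapper

keyword_args = kwargs.copy()
explicit_args = {}
for name in spec_args:
    if name in keyword_args:
        explicit_args[name] = keyword_args.pop(name)

print(explicit_args)  # {'b': 2}
print(keyword_args)   # {'extra': 3} -- only variable keyword arguments remain
print(kwargs)         # {'b': 2, 'extra': 3} -- the original is untouched
```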

This covers all explicit arguments as well as variable arguments passed in by position and keyword. The only thing left is to validate the value returned by the target function. Thus far the wrapper just calls the original function directly without regard for what it returns, but by now, it should be easy to see what needs to be done:

    def wrapper(*args, **kwargs):
        # Populate a dictionary of explicit arguments passed positionally
        explicit_args = dict(zip(spec.args, args))
        keyword_args = kwargs.copy()
        # Add all explicit arguments passed by keyword
        for name in chain(spec.args, spec.kwonlyargs):
            if name in kwargs:
                explicit_args[name] = keyword_args.pop(name)
        # Deal with explicit arguments
        for name, arg in explicit_args.items():
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        # Deal with variable positional arguments
        if spec.varargs and spec.varargs in annotations:
            annotation = annotations[spec.varargs]
            for i, arg in enumerate(args[len(spec.args):]):
                if not isinstance(arg, annotation):
                    raise TypeError(error % ('variable argument %s' % (i + 1),
                                             annotation.__name__,
                                             type(arg).__name__))
        # Deal with variable keyword arguments
        if spec.varkw and spec.varkw in annotations:
            annotation = annotations[spec.varkw]
            for name, arg in keyword_args.items():
                if not isinstance(arg, annotation):
                    raise TypeError(error % (name,
                                             annotation.__name__,
                                             type(arg).__name__))
        r = func(*args, **kwargs)
        if 'return' in annotations and not isinstance(r, annotations['return']):
            raise TypeError(error % ('the return value',
                                     annotations['return'].__name__,
                                     type(r).__name__))
        return r

With that, we have a fully functional type safety decorator, which can validate all arguments to a function as well as its return value. There’s one additional safeguard we can include to find errors even more quickly, however. In the same way as the outer typesafe() function already validates that the annotations are types, that part of the function is also capable of validating the default values for all provided arguments. Because variable arguments can’t have default values, this is much simpler than dealing with the function call itself:

import functools
import inspect
from itertools import chain
def typesafe(func):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    spec = inspect.getfullargspec(func)
    annotations = spec.annotations
    for name, annotation in annotations.items():
        if not isinstance(annotation, type):
            raise TypeError("The annotation for '%s' is not a type." % name)
    error = "Wrong type for %s: expected %s, got %s."
    defaults = spec.defaults or ()
    defaults_zip = zip(spec.args[-len(defaults):], defaults)
    kwonlydefaults = spec.kwonlydefaults or {}
    for name, value in chain(defaults_zip, kwonlydefaults.items()):
        if name in annotations and not isinstance(value, annotations[name]):
            raise TypeError(error % ('default value of %s' % name,
                                     annotations[name].__name__,
                                     type(value).__name__))
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Populate a dictionary of explicit arguments passed positionally
        explicit_args = dict(zip(spec.args, args))
        keyword_args = kwargs.copy()
        # Add all explicit arguments passed by keyword
        for name in chain(spec.args, spec.kwonlyargs):
            if name in kwargs:
                explicit_args[name] = keyword_args.pop(name)
        # Deal with explicit arguments
        for name, arg in explicit_args.items():
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        # Deal with variable positional arguments
        if spec.varargs and spec.varargs in annotations:
            annotation = annotations[spec.varargs]
            for i, arg in enumerate(args[len(spec.args):]):
                if not isinstance(arg, annotation):
                    raise TypeError(error % ('variable argument %s' % (i + 1),
                                             annotation.__name__,
                                             type(arg).__name__))
        # Deal with variable keyword arguments
        if spec.varkw and spec.varkw in annotations:
            annotation = annotations[spec.varkw]
            for name, arg in keyword_args.items():
                if not isinstance(arg, annotation):
                    raise TypeError(error % (name,
                                             annotation.__name__,
                                             type(arg).__name__))
        r = func(*args, **kwargs)
        if 'return' in annotations and not isinstance(r, annotations['return']):
            raise TypeError(error % ('the return value',
                                     annotations['return'].__name__,
                                     type(r).__name__))
        return r
    return wrapper
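To confirm that the finished decorator behaves as described, it can be exercised with a short script (the full listing is repeated here so the sketch runs on its own; combine and bad_default are just illustrative names):

```python
import functools
import inspect
from itertools import chain

def typesafe(func):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    spec = inspect.getfullargspec(func)
    annotations = spec.annotations
    for name, annotation in annotations.items():
        if not isinstance(annotation, type):
            raise TypeError("The annotation for '%s' is not a type." % name)
    error = "Wrong type for %s: expected %s, got %s."
    defaults = spec.defaults or ()
    defaults_zip = zip(spec.args[-len(defaults):], defaults)
    kwonlydefaults = spec.kwonlydefaults or {}
    for name, value in chain(defaults_zip, kwonlydefaults.items()):
        if name in annotations and not isinstance(value, annotations[name]):
            raise TypeError(error % ('default value of %s' % name,
                                     annotations[name].__name__,
                                     type(value).__name__))
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        explicit_args = dict(zip(spec.args, args))
        keyword_args = kwargs.copy()
        for name in chain(spec.args, spec.kwonlyargs):
            if name in kwargs:
                explicit_args[name] = keyword_args.pop(name)
        for name, arg in explicit_args.items():
            if name in annotations and not isinstance(arg, annotations[name]):
                raise TypeError(error % (name,
                                         annotations[name].__name__,
                                         type(arg).__name__))
        if spec.varargs and spec.varargs in annotations:
            annotation = annotations[spec.varargs]
            for i, arg in enumerate(args[len(spec.args):]):
                if not isinstance(arg, annotation):
                    raise TypeError(error % ('variable argument %s' % (i + 1),
                                             annotation.__name__,
                                             type(arg).__name__))
        if spec.varkw and spec.varkw in annotations:
            annotation = annotations[spec.varkw]
            for name, arg in keyword_args.items():
                if not isinstance(arg, annotation):
                    raise TypeError(error % (name,
                                             annotation.__name__,
                                             type(arg).__name__))
        r = func(*args, **kwargs)
        if 'return' in annotations and not isinstance(r, annotations['return']):
            raise TypeError(error % ('the return value',
                                     annotations['return'].__name__,
                                     type(r).__name__))
        return r
    return wrapper

# Exercise each check in turn
@typesafe
def combine(a: int, b: int = 0) -> str:
    return str(a + b)

print(combine(1, 2))  # 3

try:
    combine(1, b='x')
except TypeError as e:
    print(e)  # Wrong type for b: expected int, got str.

try:
    @typesafe
    def bad_default(x: int = 'oops'):
        pass
except TypeError as e:
    print(e)  # Wrong type for default value of x: expected int, got str.
```

Note that the default-value check fires at decoration time, before the function is ever called, which is exactly the early failure we were aiming for.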

Factoring Out the Boilerplate

Looking over the code as it stands, you’ll notice a lot of repetition. Each form of annotation ends up doing the same things: checking to see if the value is appropriate and raising an exception if it’s not. Ideally, we’d be able to factor that into a separate function that can focus solely on the actual task of validation. The rest of the code is really just boilerplate, managing the details of finding the different types of annotations.

Because the common code will be going into a new function, the obvious way to tie it into the rest of the code is to create a new decorator. This new decorator will be placed on a function that will process the annotation for each value, so we'll call it annotation_decorator. The function passed into annotation_decorator will then be used for each of the annotation types throughout the existing code:

import functools
import inspect
from itertools import chain
def annotation_decorator(process):
    """
    Creates a decorator that processes annotations for each argument passed
    into its target function, raising an exception if there's a problem.
    """
    @functools.wraps(process)
    def decorator(func):
        spec = inspect.getfullargspec(func)
        annotations = spec.annotations
        defaults = spec.defaults or ()
        defaults_zip = zip(spec.args[-len(defaults):], defaults)
        kwonlydefaults = spec.kwonlydefaults or {}
        for name, value in chain(defaults_zip, kwonlydefaults.items()):
            if name in annotations:
                process(value, annotations[name])
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Populate a dictionary of explicit arguments passed positionally
            explicit_args = dict(zip(spec.args, args))
            keyword_args = kwargs.copy()
            # Add all explicit arguments passed by keyword
            for name in chain(spec.args, spec.kwonlyargs):
                if name in kwargs:
                    explicit_args[name] = keyword_args.pop(name)
            # Deal with explicit arguments
            for name, arg in explicit_args.items():
                if name in annotations:
                    process(arg, annotations[name])
            # Deal with variable positional arguments
            if spec.varargs and spec.varargs in annotations:
                annotation = annotations[spec.varargs]
                for arg in args[len(spec.args):]:
                    process(arg, annotation)
            # Deal with variable keyword arguments
            if spec.varkw and spec.varkw in annotations:
                annotation = annotations[spec.varkw]
                for name, arg in keyword_args.items():
                    process(arg, annotation)
            r = func(*args, **kwargs)
            if 'return' in annotations:
                process(r, annotations['return'])
            return r
        return wrapper
    return decorator

Note

Because we’re making it a bit more generic, you’ll notice that the initial portion of the decorator no longer checks that the annotations are valid types. The decorator itself no longer cares what logic you apply to the argument values, as that’s all done in the decorated function.

Now we can apply this new decorator to a much simpler function to provide a new typesafe() decorator, which functions just like the one in the previous section:

@annotation_decorator
def typesafe(value, annotation):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    if not isinstance(value, annotation):
        raise TypeError("Expected %s, got %s." % (annotation.__name__,
                                                  type(value).__name__))

The benefit of doing this is that it's much easier to modify the behavior of the decorator in the future. In addition, you can now use annotation_decorator() to create new types of decorators that use annotations for different purposes, such as type coercion.

Example: Type Coercion

Rather than strictly requiring that the arguments all be the types specified when they're passed into the function, another approach is to coerce them to the required types inside the function itself. Many of the same types that are used to validate values can also be used to coerce values directly into the types themselves. In addition, if a value can't be coerced, the type it's passed into raises an exception of its own (usually a TypeError or a ValueError), much like our validation function.

Robustness Principle

This is one of the more obvious applications of the robustness principle. Your function requires an argument be of a specific type, but it’s much nicer to accept some variations, knowing that they can be converted to the right type before your function needs to deal with them. Likewise, coercion also helps ensure that the return value is always of a consistent type that the external code knows how to deal with.

The decorator presented in the previous section provides a good starting point for adding this behavior to a new decorator, and we can use it to modify the incoming value according to the annotation that was provided along with it. Because we’re relying on a type constructor to do all the necessary type checking and raise exceptions appropriately, this new decorator can be much simpler. In fact, it can be expressed in just one actual instruction:

@annotation_decorator
def coerce_arguments(value, annotation):
    return annotation(value)

This is so simple that it doesn't even require that the annotation be a type at all. Any function or class that returns an object will work just fine, and the value returned will be passed into the function decorated by coerce_arguments(). Or will it? If you look back at the annotation_decorator() function as it stands, there's a minor problem that prevents it from working the way this new decorator would need it to.

The problem is that the lines that call the process() function passed into the outer decorator throw away its return value. If you try to use coerce_arguments() with the existing decorator, all you'll get is the exception-raising aspect of the code, not the value coercion aspect. So, in order to work properly, we'll need to go back and add that feature to annotation_decorator().

There are a few things that need to be done overall, however. Because the annotation processor will be modifying the arguments that will be eventually sent to the decorated function, we’ll need to set up a new list for positional arguments and a new dictionary for keyword arguments. Then we have to split up the explicit argument handling, so that we can distinguish between positional and keyword arguments. Without that, the function wouldn’t be able to apply variable positional arguments correctly:

        def wrapper(*args, **kwargs):
            new_args = []
            new_kwargs = {}
            keyword_args = kwargs.copy()
            # Deal with explicit arguments passed positionally
            for name, arg in zip(spec.args, args):
                if name in annotations:
                    arg = process(arg, annotations[name])
                new_args.append(arg)
            # Deal with explicit arguments passed by keyword
            for name in chain(spec.args, spec.kwonlyargs):
                if name in keyword_args:
                    arg = keyword_args.pop(name)
                    if name in annotations:
                        arg = process(arg, annotations[name])
                    new_kwargs[name] = arg
            # Deal with variable positional arguments
            annotation = annotations.get(spec.varargs) if spec.varargs else None
            for arg in args[len(spec.args):]:
                new_args.append(process(arg, annotation) if annotation else arg)
            # Deal with variable keyword arguments
            annotation = annotations.get(spec.varkw) if spec.varkw else None
            for name, arg in keyword_args.items():
                new_kwargs[name] = process(arg, annotation) if annotation else arg
            r = func(*new_args, **new_kwargs)
            if 'return' in annotations:
                r = process(r, annotations['return'])
            return r

With those changes in place, the new coerce_arguments() decorator will be able to replace the arguments on the fly, passing the replacements into the original function. Unfortunately, if you’re still using typesafe() from before, this new behavior causes problems because typesafe() doesn’t return a value. Fixing that is a simple matter of returning the original value, unchanged, if the type check was satisfactory:

@annotation_decorator
def typesafe(value, annotation):
    """
    Verify that the function is called with the right argument types and
    that it returns a value of the right type, according to its annotations
    """
    if not isinstance(value, annotation):
        raise TypeError("Expected %s, got %s." % (annotation.__name__,
                                                  type(value).__name__))
    return value
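Assembled from the listings above (the outer annotation_decorator() shell, the revised wrapper, and both processors), a short session shows coercion and validation working side by side; tag and total are illustrative functions, not part of the original text:

```python
import functools
import inspect
from itertools import chain

def annotation_decorator(process):
    """
    Creates a decorator that processes annotations for each argument passed
    into its target function, raising an exception if there's a problem.
    """
    @functools.wraps(process)
    def decorator(func):
        spec = inspect.getfullargspec(func)
        annotations = spec.annotations
        defaults = spec.defaults or ()
        defaults_zip = zip(spec.args[-len(defaults):], defaults)
        kwonlydefaults = spec.kwonlydefaults or {}
        for name, value in chain(defaults_zip, kwonlydefaults.items()):
            if name in annotations:
                process(value, annotations[name])
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            new_args = []
            new_kwargs = {}
            keyword_args = kwargs.copy()
            # Explicit arguments passed positionally
            for name, arg in zip(spec.args, args):
                if name in annotations:
                    arg = process(arg, annotations[name])
                new_args.append(arg)
            # Explicit arguments passed by keyword
            for name in chain(spec.args, spec.kwonlyargs):
                if name in keyword_args:
                    arg = keyword_args.pop(name)
                    if name in annotations:
                        arg = process(arg, annotations[name])
                    new_kwargs[name] = arg
            # Variable positional arguments
            annotation = annotations.get(spec.varargs) if spec.varargs else None
            for arg in args[len(spec.args):]:
                new_args.append(process(arg, annotation) if annotation else arg)
            # Variable keyword arguments
            annotation = annotations.get(spec.varkw) if spec.varkw else None
            for name, arg in keyword_args.items():
                new_kwargs[name] = process(arg, annotation) if annotation else arg
            r = func(*new_args, **new_kwargs)
            if 'return' in annotations:
                r = process(r, annotations['return'])
            return r
        return wrapper
    return decorator

@annotation_decorator
def coerce_arguments(value, annotation):
    return annotation(value)

@annotation_decorator
def typesafe(value, annotation):
    if not isinstance(value, annotation):
        raise TypeError("Expected %s, got %s." % (annotation.__name__,
                                                  type(value).__name__))
    return value

# Coercion: the ints and floats are converted to strings on the way in
@coerce_arguments
def tag(label: str, *values: str) -> str:
    return '<%s>%s</%s>' % (label, ', '.join(values), label)

print(tag('b', 1, 2.5))  # <b>1, 2.5</b>

# Validation: the same machinery, but values pass through unchanged
@typesafe
def total(*amounts: int) -> int:
    return sum(amounts)

print(total(1, 2, 3))  # 6
```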

Annotating with Decorators

The natural question to ask is: what happens if you want to use two libraries together? One might expect you to supply valid types, whereas the other expects a string to use for documentation. They’re completely incompatible with each other, which forces you to use one or the other, rather than both. Furthermore, any attempt to merge the two, using a dictionary or some other combined data type, would have to be agreed on by both libraries, as each would need to know how to get at the information it cares about.

Once you consider how many other frameworks and libraries might take advantage of these annotations, you can see how quickly the official function annotations fall apart. It’s still too early to see which applications will actually use it or how they’ll work together, but it’s certainly worth considering other options that can bypass the problems completely.

Because decorators can take arguments of their own, it’s possible to use them to provide annotations for the arguments of the functions they decorate. This way, the annotations are separate from the function itself and provided directly to the code that makes sense of them. And because multiple decorators can be stacked together on a single function, it’s already got a built-in way of managing multiple frameworks.

Example: Type Safety as a Decorator

To illustrate the decorator-based approach to function annotations, let’s consider the type safety example from earlier. It already relied on a decorator, so we can extend that to take arguments, using the same types that the annotations provided previously. Essentially, it’ll look something like this:

>>> @typesafe(str, str)
... def combine(a, b):
...     return a + b
...
>>> combine('spam', 'alot')
'spamalot'
>>> combine('fail', 1)
Traceback (most recent call last):
  ...
TypeError: Expected str, got int.

It works almost exactly like the true annotated version, except that the annotations are supplied to the decorator directly. In order to accept arguments, we’re going to just change the first portion of the code a bit so that we can get the annotations from the arguments instead of inspecting the function itself.

Because annotations come in through arguments to the decorator, we have a new outer wrapper for receiving them. When the next layer receives the function to be decorated it can match up the annotations with the function’s signature, providing names for any annotations passed positionally. Once all the available annotations have been given the right names, they can be used by the rest of the inner decorator without any further modifications:

import functools
import inspect
from itertools import chain
def annotation_decorator(process):
    """
    Creates a decorator that processes annotations for each argument passed
    into its target function, raising an exception if there's a problem.
    """
    def annotator(*args, **kwargs):
        annotations = kwargs.copy()
        @functools.wraps(process)
        def decorator(func):
            spec = inspect.getfullargspec(func)
            annotations.update(zip(spec.args, args))
            defaults = spec.defaults or ()
            defaults_zip = zip(spec.args[-len(defaults):], defaults)
            kwonlydefaults = spec.kwonlydefaults or {}
            for name, value in chain(defaults_zip, kwonlydefaults.items()):
                if name in annotations:
                    process(value, annotations[name])
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                new_args = []
                new_kwargs = {}
                keyword_args = kwargs.copy()
                # Deal with explicit arguments passed positionally
                for name, arg in zip(spec.args, args):
                    if name in annotations:
                        arg = process(arg, annotations[name])
                    new_args.append(arg)
                # Deal with explicit arguments passed by keyword
                for name in chain(spec.args, spec.kwonlyargs):
                    if name in keyword_args:
                        arg = keyword_args.pop(name)
                        if name in annotations:
                            arg = process(arg, annotations[name])
                        new_kwargs[name] = arg
                # Deal with variable positional arguments
                if spec.varargs and spec.varargs in annotations:
                    annotation = annotations[spec.varargs]
                    for arg in args[len(spec.args):]:
                        new_args.append(process(arg, annotation))
                else:
                    new_args.extend(args[len(spec.args):])
                # Deal with variable keyword arguments
                if spec.varkw and spec.varkw in annotations:
                    annotation = annotations[spec.varkw]
                    for name, arg in keyword_args.items():
                        new_kwargs[name] = process(arg, annotation)
                else:
                    new_kwargs.update(keyword_args)
                r = func(*new_args, **new_kwargs)
                if 'return' in annotations:
                    r = process(r, annotations['return'])
                return r
            return wrapper
        return decorator
    return annotator
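
To see how a processing function plugs into this machinery, here's a simplified, self-contained sketch. It rebuilds a positional-only version of typesafe() on top of a compressed annotation_decorator(); the full listing above additionally handles keywords, defaults, and variable arguments:

```python
import functools
import inspect

def annotation_decorator(process):
    # Compressed, positional-only restatement of the decorator above
    def annotator(*annotation_args):
        def decorator(func):
            spec = inspect.getfullargspec(func)
            annotations = dict(zip(spec.args, annotation_args))
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Run the processing function on each annotated argument
                for name, arg in zip(spec.args, args):
                    if name in annotations:
                        process(arg, annotations[name])
                return func(*args, **kwargs)
            return wrapper
        return decorator
    return annotator

@annotation_decorator
def typesafe(value, annotation):
    """Raise TypeError if value isn't an instance of annotation."""
    if not isinstance(value, annotation):
        raise TypeError('Expected %s, got %s.'
                        % (annotation.__name__, type(value).__name__))

@typesafe(str, str)
def combine(a, b):
    return a + b

print(combine('spam', 'alot'))  # spamalot
```

Notice that typesafe() is now just the two-argument processing function; annotation_decorator() does all the work of turning it into a decorator factory.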

That handles most of the situation, but it doesn’t handle return values yet. If you try to supply a return value using the right name, return, you’ll get a syntax error because it’s a reserved Python keyword. Trying to provide it alongside the other annotations would require each call to pass annotations using an actual dictionary, where you can provide the return annotation without upsetting Python’s syntax.

Instead, you’ll need to provide the return value annotation in a separate function call, where it can be the sole argument without any reserved name issues. When working with most types of decorators, this would be easy to do: just create a new decorator that checks the return value and be done with it. Unfortunately, as the eventual decorator you’re working with is created outside the control of our code, it’s not so easy.

If you completely detached the return value processing from the argument processing, the programmer who's actually writing something like the typesafe() decorator would have to write it twice: once to create the argument-processing decorator and again to create the return-value-processing decorator. Because that's a clear violation of DRY, let's reuse as much of that work as possible.

Here’s where some design comes into play. We’re looking at going beyond just a simple decorator, so let's figure out how to best approach it so that it makes sense to those who have to use it. Thinking about the available options, one solution springs to mind fairly quickly. If we can add the extra annotation function as an attribute of the final decorator, you’d be able to write the return value annotator on the same line as the other decorator, but right afterward, in its own function call. Here’s what it might look like, if you went that route:
@typesafe(int, int).returns(int)
def add(a, b):
    return a + b

Unfortunately, this isn't an option, for reasons that can be demonstrated without even adding the necessary code to support it. The trouble is that this form isn't allowed by Python's decorator syntax; the grammar was only relaxed to accept arbitrary expressions like this in Python 3.9. If typesafe() hadn't taken any arguments, @typesafe.returns(int) would work, but there's no support for calling two separate functions as part of a single decorator. Instead of supplying the return value annotation in the decorator itself, let's look somewhere else.

Another option is to use the generated typesafe() decorator to add a function as an attribute to the wrapper around the add() function. This places the return value annotation at the end of the function definition, closer to where the return value is specified. In addition, it helps clarify the fact that you can use typesafe() to supply argument decorators without bothering to check the return value, if you want to. Here’s how it would look:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbe_HTML.jpg
@typesafe(int, int)
def add(a, b):
    return a + b
add.returns(int)

It’s still very clear and perhaps even more explicit than the syntax that doesn’t work anyway. As an added bonus, the code to support it is very simple, requiring just a few lines be added to the end of the inner decorator() function:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbf_HTML.jpg
        def decorator(func):
            from itertools import chain
            spec = inspect.getfullargspec(func)
            annotations.update(zip(spec.args, args))
            defaults = spec.defaults or ()
            defaults_zip = zip(spec.args[-len(defaults):], defaults)
            kwonlydefaults = spec.kwonlydefaults or {}
            for name, value in chain(defaults_zip, kwonlydefaults.items()):
                if name in annotations:
                    process(value, annotations[name])
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                new_args = []
                new_kwargs = {}
                keyword_args = kwargs.copy()
                # Deal with explicit arguments passed positionally
                for name, arg in zip(spec.args, args):
                    if name in annotations:
                        arg = process(arg, annotations[name])
                    new_args.append(arg)
                # Deal with explicit arguments passed by keyword
                for name in chain(spec.args, spec.kwonlyargs):
                    if name in keyword_args:
                        arg = keyword_args.pop(name)
                        if name in annotations:
                            arg = process(arg, annotations[name])
                        new_kwargs[name] = arg
                # Deal with variable positional arguments
                if spec.varargs and spec.varargs in annotations:
                    annotation = annotations[spec.varargs]
                    for arg in args[len(spec.args):]:
                        new_args.append(process(arg, annotation))
                else:
                    new_args.extend(args[len(spec.args):])
                # Deal with variable keyword arguments
                if spec.varkw and spec.varkw in annotations:
                    annotation = annotations[spec.varkw]
                    for name, arg in keyword_args.items():
                        new_kwargs[name] = process(arg, annotation)
                else:
                    new_kwargs.update(keyword_args)
                r = func(*new_args, **new_kwargs)
                if 'return' in annotations:
                    r = process(r, annotations['return'])
                return r
            def return_annotator(annotation):
                annotations['return'] = annotation
            wrapper.returns = return_annotator
            return wrapper

Because this new returns() function will be called before the decorated function itself ever will, it can simply add a new annotation to the existing dictionary. Then, when the function does get called later, the internal wrapper can just continue working like it always did. This only changes the way the return value annotation is supplied, which is all that was necessary.

Because all of this behavior was refactored into a separate decorator, you can apply this decorator to coerce_arguments() or any other similarly purposed function. The resulting function will work the same way as typesafe(), only swapping out the argument handling with whatever the new decorator needs to do.
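
The attribute-function pattern itself is worth seeing in isolation. Here's a self-contained toy that attaches a returns() function to its wrapper the same way; the checked() decorator and its error message are hypothetical stand-ins, not the book's typesafe():

```python
import functools

def checked(func):
    annotations = {}
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # The return annotation may have been supplied after definition
        if 'return' in annotations and not isinstance(result, annotations['return']):
            raise TypeError('Wrong return type: expected %s, got %s.'
                            % (annotations['return'].__name__,
                               type(result).__name__))
        return result
    def return_annotator(annotation):
        annotations['return'] = annotation
    wrapper.returns = return_annotator
    return wrapper

@checked
def add(a, b):
    return a + b
add.returns(int)

print(add(2, 3))  # 5
```

Because return_annotator() closes over the same annotations dictionary the wrapper reads, calling add.returns(int) after the definition is all it takes to activate the check.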

Generators

Chapter 2 introduced the concept of generator expressions and stressed the importance of iteration. Whereas generator expressions are useful for simple situations, you’ll often need more sophisticated logic to determine how the iteration should work. You may need finer-grained control over the duration of the loop, the items getting returned, possible side effects that get triggered along the way, or any number of other concerns you may have.

Essentially, you need a real function, but with the benefits of a proper iterator and without the cognitive overhead of creating the iterator yourself. This is where generators come in. By allowing you to define a function that can produce individual values one at a time, rather than just a single return value, you have the added flexibility of a function and the performance of an iterator.

Generators are set apart from other functions by their use of the yield statement. This is somewhat of an analog to the typical return statement, except that yield doesn't cause the function to stop executing completely. It pushes one value out of the function, which gets consumed by the loop that called the generator; then, when that loop starts over, the generator starts back up again. It picks up right where it left off, running until it finds another yield statement or the function simply finishes executing.
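
Before moving to the Fibonacci example, the pause-and-resume behavior is easiest to see in a minimal countdown generator:

```python
def countdown(n):
    while n > 0:
        yield n   # execution pauses here until the caller asks for more
        n -= 1    # and resumes here on the next iteration

print(list(countdown(3)))  # [3, 2, 1]
```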

The basics are best illustrated by an example, so consider a simple generator that returns the values in the classic Fibonacci sequence. The sequence begins with 0 and 1; each following number is produced by adding up the two numbers before it in the sequence. Therefore, the function only ever needs to keep two numbers in memory at a time, no matter how high the sequence goes. In order to keep it from continuing on forever, however, it’s best to require a maximum number of values it should return, making a total of three values to keep track of.

It’s tempting to set up the first two values as special cases, yielding them one at a time before even starting into the main loop that would return the rest of the sequence. That adds some extra complexity, however, which can make it pretty easy to accidentally introduce an infinite loop. Instead, we’ll use a couple other seed values, –1 and 1, which can be fed right into the main loop directly. They’ll generate 0 and 1 correctly when the loop’s logic is applied.

Next, we can add a loop that produces all the values in the sequence, decrementing count on each pass and stopping when it reaches zero. Because the seed values feed directly into the loop, nothing needs to be yielded before it begins, so the generator yields exactly as many values as were requested:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbg_HTML.jpg
Part one: add part two to see it in action:
def fibonacci(count):
    # These seed values generate 0 and 1 when fed into the loop
    a, b = -1, 1
    while count > 0:
        # Yield the value for this iteration
        c = a + b
        yield c
        # Update values for the next iteration
        a, b = b, c
        count -= 1

With the generator in place, you can iterate over the values it produces, simply by treating it like you would any other sequence. Generators are iterable automatically, so a standard for loop already knows how to activate one and retrieve its values. Before you add part two, do a hand trace of –1 and 1 through the structure and you can see exactly how it operates.

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbh_HTML.jpg
Part two: add to end of previous code and run:
for x in fibonacci(3):
    print(x)
# output is
# 0
# 1
# 1
for x in fibonacci(7):
    print(x)
# output is
# 0
# 1
# 1
# 2
# 3
# 5
# 8

Unfortunately, the main benefit of generators can also, at times, be somewhat of a burden. Because there’s no complete sequence in memory at any given time, generators always have to pick up where they left off. Most of the time, however, you’ll completely exhaust the generator when you iterate over it the first time, so when you try to put it into another loop, you won’t get anything back at all.

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbi_HTML.jpg
Add this to the end after part two and run:
fib = fibonacci(7)
print(list(fib)) # output [0, 1, 1, 2, 3, 5, 8]
print(list(fib)) # output []

This behavior can seem a bit misleading at first, but most of the time, it’s the only behavior that makes sense. Generators are often used in places where the entire sequence isn’t even known in advance or it may change after you iterate over it. For example, you might use a generator to iterate over the users currently accessing a system. Once you’ve identified all the users, the generator automatically becomes stale and you need to create a new one, which refreshes the list of users.
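
As a sketch of that scenario, with a hypothetical hard-coded user list standing in for a real lookup:

```python
def active_users():
    snapshot = ['alice', 'bob', 'carol']  # stand-in for a live query
    for user in snapshot:
        yield user

users = active_users()
print(list(users))           # ['alice', 'bob', 'carol']
print(list(users))           # [] -- the generator is now stale
print(list(active_users()))  # a fresh generator refreshes the list
```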

Note

If you’ve used the built-in range() function (or xrange() prior to Python 3.0) often enough, you may have noticed that it does restart itself if accessed multiple times. That behavior is provided by moving one level lower in the iteration process, by implementing the iterator protocol explicitly. It can’t be achieved with simple generators, but Chapter 5 shows that you can have greater control over iteration of the objects you create.
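
As a quick preview of that technique, an iterable class can hand out a fresh generator from __iter__() each time, so iteration restarts instead of coming up empty:

```python
class Fibonacci:
    def __init__(self, count):
        self.count = count
    def __iter__(self):
        # Each call starts over with fresh state, unlike a plain generator
        a, b = -1, 1
        for _ in range(self.count):
            c = a + b
            yield c
            a, b = b, c

fib = Fibonacci(3)
print(list(fib))  # [0, 1, 1]
print(list(fib))  # [0, 1, 1] -- restarts, unlike a plain generator
```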

Lambdas

In addition to providing features on their own, functions are often called on to provide some extra minor bit of functionality to some other feature. For example, when sorting a list, you can configure Python’s behavior by supplying a function that accepts a list item and returns a value that should be used for comparison. This way, given a list of House objects, for instance, you can sort by price:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbj_HTML.jpg
>>> def get_price(house):
...     return house.price
...
>>> houses.sort(key=get_price)

Unfortunately, this seems like a bit of a waste of the function’s abilities, plus it requires a couple of extra lines of code and a name that never gets used outside of the sort() method call. A better approach would be if you could specify the key function directly in line with the method call. This not only makes it more concise, it also places the body of the function right where it will be used, so it’s a lot more readable for these types of simple behaviors.

In these situations, Python's lambda form is extremely valuable: a separate syntax, identified by the keyword lambda, that allows you to define a function without a name, as a single expression, with a much simpler feature set. Think of it as a one-line minifunction. Before applying it to the house-sorting example, try the following:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbk_HTML.jpg
>>> g = lambda x: x*x
>>> g(8)  # returns 8 * 8
64

As you can see, this is a considerably compressed form of a function definition. Following the lambda keyword is a list of arguments, separated by commas. In the sort example only one argument is needed, and it can be named anything you like, just as in any other function. Arguments can even have default values if necessary, using the same syntax as regular functions. They are followed by a colon, which marks the beginning of the lambda's body. If no arguments are involved, the colon can be placed immediately after the lambda keyword:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbl_HTML.jpg
>>> a = lambda: 'example'
>>> a
<function <lambda> at 0x...>
>>> a()
'example'
>>> b = lambda x, y=3: x + y
>>> b()
Traceback (most recent call last):
  ...
TypeError: <lambda>() missing 1 required positional argument: 'x'
>>> b(5)
8
>>> b(5, 1)
6

As you’ll have likely discovered by now, the body of the lambda is really just its return value. There’s no explicit return statement, so the entire body of the function is really just a single expression used to return a value. That’s a big part of what makes the lambda form so concise, yet easily readable, but it comes at a price: only a single expression is allowed. You can’t use any control structures, such as try, with, or while blocks; you can’t assign variables inside the function body; and you can’t perform multiple operations without them also being tied to the same overall expression.

This may seem extremely limiting, but in order to still be readable, the function body must be kept as simple as possible. In situations in which you need the additional control flow features, you’ll find it much more readable to specify it in a standard function, anyway. Then you can pass that function in where you might otherwise use the lambda. Alternatively, if you have a portion of the behavior that’s provided by some other function, but not all of it, you’re free to call out to other functions as part of the expression.
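
Tying this back to the sorting example, get_price() collapses to an inline lambda; House here is a hypothetical stand-in class:

```python
class House:
    def __init__(self, price):
        self.price = price

houses = [House(300000), House(150000), House(220000)]
# The lambda's body is just the expression get_price() returned
houses.sort(key=lambda house: house.price)
print([house.price for house in houses])  # [150000, 220000, 300000]
```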

Introspection

One of the primary advantages of Python is that nearly everything can be examined at runtime, from object attributes and module contents to documentation and even generated bytecode. Peeking at this information is called introspection, and it permeates nearly every aspect of Python. The following sections define some of the more general introspection features that are available, while more specific details are given in the remaining chapters.

The most obvious attribute of a function that can be inspected is its name. It's also one of the simplest, made available as the __name__ attribute; its value is the name the function was defined with, as a string. In the case of lambdas, which have no names, the __name__ attribute is populated with the standard string '<lambda>':

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbm_HTML.jpg
>>> def example():
...     pass
...
>>> example.__name__
'example'
>>> (lambda: None).__name__
'<lambda>'

Identifying Object Types

Python’s dynamic nature can sometimes make it seem difficult to ensure you’re getting the right type of value or to even know what type of value it is. Python does provide some options for accessing that information, but it’s necessary to realize those are two separate tasks, so Python uses two different approaches.

The most obvious requirement is to identify what type of object your code was given. For this, Python supplies its built-in type() function, which accepts an object to identify. The return value is the Python class that was used to create the given object, even if that creation was done implicitly, by way of a literal value:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbn_HTML.jpg
>>> type('example')
<class 'str'>
>>> class Test:
...     pass
...
>>> type(Test)
<class 'type'>
>>> type(Test())
<class '__main__.Test'>

Chapter 4 explains in detail what you can do with that class object once you have it, but the more common case is to compare an object against a particular type you expect to receive. This is a different situation because it doesn’t really matter exactly what type the object is. As long as the value is an instance of the right type, you can make correct assumptions about how it behaves.

There are a number of different utility functions available for this purpose, most of which are covered in Chapter 4. This section and the next chapter will make use of one of them fairly frequently, so it merits some explanation here. The isinstance() function accepts two arguments: the object to check and the type you’re expecting it to be. The result is a simple True or False, making it suitable for if blocks:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbo_HTML.jpg
>>> def test(value):
...     if isinstance(value, int):
...         print('Found an integer!')
...
>>> test('0')
>>> test(0)
Found an integer!

Modules and Packages

Functions and classes that are defined in Python are placed inside of modules, which in turn are often part of a package structure. Accessing this structure when importing code is easy enough, using documentation or even just peeking at the source files on disk. Given a piece of code, however, it’s often useful to identify where it was defined in the source code.

For this reason, all functions and classes have a __module__ attribute, which contains the import location of the module where the code was defined. Rather than just the module's bare name, __module__ holds its full dotted import path; for example, math.sin.__module__ is 'math'. Essentially, it's enough information for you to pass it straight into any of the dynamic importing features shown in Chapter 2.
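
For example, using the standard math module:

```python
import importlib
import math

print(math.sin.__module__)  # math
# The dotted path is enough to re-import the module dynamically
module = importlib.import_module(math.sin.__module__)
print(module is math)  # True
```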

Working with the interactive interpreter is something of a special case because there’s no named source file to work with. Any functions or classes defined there will have the special name '__main__' returned from the __module__ attribute:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbp_HTML.jpg
>>> def example():
...     pass
...
>>> example
<function example at 0x...>
>>> example.__module__
'__main__'

Docstrings

Because you can document your functions with docstrings included right alongside the code, Python also stores those strings as part of the function object. By accessing the __doc__ attribute of a function, you can read a docstring into code, which can be useful for generating a library’s documentation on the fly. Consider the following example, showing simple docstring access on a simple function:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbq_HTML.jpg
def example():
    """This is just an example to illustrate docstring access."""
    pass
print(example.__doc__)  # which outputs This is just an example to illustrate docstring access.
Next, try the following from a prompt:
../images/330715_3_En_3_Chapter/330715_3_En_3_Figbr_HTML.jpg
>>> def divide(x, y):
...     """
...     divide(integer, integer) -> floating point
...
...     This is a more complex example, with more comprehensive documentation.
...     """
...     return float(x) / y  # Use float() for compatibility prior to 3.0
...
>>> divide.__doc__
'\n    divide(integer, integer) -> floating point\n\n    This is a more complex example, with more comprehensive documentation.\n    '
>>> print(divide.__doc__)
    divide(integer, integer) -> floating point

This is a more complex example, with more comprehensive documentation.

As you can see, simple docstrings are easy to handle just by reading in __doc__ and using it however you need to. Unfortunately, more complex docstrings will retain all whitespace, including newlines, making them more challenging to work with. Worse yet, your code can’t know which type of docstring you’re looking at without scanning it for certain characters. Even if you’re just printing it out to the interactive prompt, you still have an extra line before and after the real documentation, as well as the same indentation as was present in the file.

To handle complex docstrings more gracefully, like the one shown in the example, the inspect module mentioned previously also has a getdoc() function, designed to retrieve and format docstrings. It strips out whitespace both before and after the documentation, as well as any indentation that was used to line up the docstring with the code around it. Here's that same docstring again, but formatted with inspect.getdoc():

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbs_HTML.jpg
>>> import inspect
>>> print(inspect.getdoc(divide))
divide(integer, integer) -> floating point
This is a more complex example, with more comprehensive documentation.

We still have to use print() at the interactive prompt because the newline character is still retained in the result string. All inspect.getdoc() strips out is the whitespace that was used to make the docstring look right alongside the code for the function. In addition to trimming the space at the beginning and end of the docstring, getdoc() uses a simple technique to identify and remove whitespace used for indentation.

Essentially, getdoc() counts the number of spaces at the beginning of each line of the docstring, even if the answer is 0. Then it determines the lowest of those counts and removes that many characters from each line that remains after the leading and trailing whitespace has been removed. This allows you to keep other indentation in the docstring intact, as long as it's greater than what you need to align the text with the surrounding code. Here's an example of an even more complex docstring, so you can see how inspect.getdoc() handles it:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbt_HTML.jpg
>>> def clone(obj, count=1):
...     """
...    clone(obj, count=1) -> list of cloned objects
...
...    Clone an object a specified number of times, returning the cloned
...    objects as a list. This is just a shallow copy only.
...
...    obj
...        Any Python object
...    count
...        Number of times the object will be cloned
...
...      >>> clone(object(), 2)
...      [<object object at 0x12345678>, <object object at 0x87654321>]
...    """
...    import copy
...    return [copy.copy(obj) for x in range(count)]
...
>>> print(inspect.getdoc(clone))
clone(obj, count=1) -> list of cloned objects

Clone an object a specified number of times, returning the cloned
objects as a list. This is just a shallow copy only.

obj
    Any Python object
count
    Number of times the object will be cloned

  >>> clone(object(), 2)
  [<object object at 0x12345678>, <object object at 0x87654321>]

Notice how the descriptions of each argument are still indented four spaces, just as they appeared in the function definition. The shortest lines had just four spaces at the beginning, while others had eight, so Python stripped the first four from each line, leaving the rest intact. Likewise, the example interpreter session was indented by two extra spaces, so the resulting string maintains a two-space indentation.
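
The counting-and-trimming strategy described above can be approximated in a few lines. This is only a rough sketch; the real inspect.getdoc() also expands tabs and handles more edge cases:

```python
import inspect

def simple_getdoc(func):
    doc = (func.__doc__ or '').strip()
    lines = doc.split('\n')
    # Find the smallest indentation among the remaining non-blank lines
    indents = [len(line) - len(line.lstrip())
               for line in lines[1:] if line.strip()]
    margin = min(indents, default=0)
    # Strip that margin from every line after the first
    return '\n'.join([lines[0]] + [line[margin:] for line in lines[1:]])

def divide(x, y):
    """
    divide(integer, integer) -> floating point

    This is a more complex example, with more comprehensive documentation.
    """
    return x / y

print(simple_getdoc(divide) == inspect.getdoc(divide))  # True
```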

Oh, and don’t worry too much about the copy function just yet. Chapter 6 describes in detail how to make and manage copies of objects when necessary.

Exciting Python Extensions: Statistics

Most people working with statistical analysis might not consider Python as a first choice. Since Python is a general-purpose language, and languages such as R, SAS, or SPSS are aimed at statistics directly, this makes sense. However, through its rich set of libraries, Python can be a good choice: it is user-friendly, handles data acquisition with ease, and integrates well with other languages. Let's see how easy it is to perform statistical analysis with Python. One library to use is Pandas (Python Data Analysis Library).

Install Pandas and Matplotlib

Use PIP to install Pandas.

1) From an elevated command prompt, type: pip install pandas (enter)

   This will also install NumPy and python-dateutil, which will be needed. Assuming you had no errors, make a file and try a test read to make sure it works.

2) Type: pip install matplotlib (enter)

Make a Text File of Data

First, we will make a CSV (comma separated values) text file with some hypothetical data. This could be data from the Internet, or a database, and so on. You might well have a spreadsheet (e.g., Excel or OpenOffice) of data you want to work with. These packages make it easy to “save as” CSV format. For now, use your favorite text editor.
1) Start Notepad (Windows) and enter the following, saving it as a text file in the same folder where you are going to save the Python file that will read it. Make sure the text file and Python file are in the same folder!

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbu_HTML.jpg

2) Save the file as “students.csv” and make sure a txt extension is not appended to the file name; the complete file name should be exactly “students.csv”.

Use Pandas to Display Data

Now, let’s test and see if we can read our CSV data and display it to the screen. Once this works, we can work with the data a bit. Create a Python script and run the following, giving the Python file a valid name of your own choice:
import pandas
data = pandas.read_csv('students.csv', sep=',', na_values=".")
print(data)

Your output should be similar to the following:

../images/330715_3_En_3_Chapter/330715_3_En_3_Figbv_HTML.jpg

Output from reading students.csv data file using Pandas.

Running Some Data Analysis

In this next example, let's look at the average age of the students in each major. Pandas makes this easy; in this case the relevant functions are groupby() and mean():
import pandas
data = pandas.read_csv('students.csv', sep=',', na_values=".")
print(data)
groupby_major = data.groupby('Major')
for major, student_age in groupby_major['Age']:
    print('The average age for', major, 'majors is:', student_age.mean())
../images/330715_3_En_3_Chapter/330715_3_En_3_Figbw_HTML.jpg

Average student age output for various majors.

The unique() function will show you only the unique values for a given column of data. For example, using our students.csv file, we can list only the majors that appear in the dataset. Note that column names are case-sensitive, so you'll want to display or view the original CSV file to make sure your case is correct. In this case a capital M is needed in Major, or it would not function properly:
import pandas
data = pandas.read_csv('students.csv', sep=',', na_values=".")
dif_majors = data.Major.unique()
print(dif_majors)
Next, you might want to only access certain columns of data. Consider the following, where only the Major and GPA columns of data will be extracted and displayed:
import pandas
data = pandas.read_csv('students.csv', sep=',', na_values=".")
major_gpa = data[['Major','GPA']].head(10)
print(major_gpa)
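
If you'd like to experiment without creating the CSV file, you can build a similar DataFrame directly in code; the values here are hypothetical, not the contents of students.csv:

```python
import pandas

data = pandas.DataFrame({
    'Major': ['Biology', 'History', 'Biology'],
    'Age': [19, 22, 20],
    'GPA': [3.5, 2.9, 3.1],
})
# Same grouping and column-selection operations as the file-based examples
print(data.groupby('Major')['Age'].mean())
print(data[['Major', 'GPA']].head(10))
```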

Plotting with Matplotlib

The Matplotlib library will allow you to visualize your numeric data, which is very important when trying to convey information to a general audience. In fact, visualizing data can help even data experts find hidden meaning in the information. Try the following example to see how easy it is to visualize a series of data values graphically:
import matplotlib.pyplot as plt
plt.plot([1,8,2,9,6])  # y values; x defaults to 0, 1, 2, ...
plt.ylabel('Data readings for five hours')  # y-axis label
plt.show()

Types of Charts

There are many types of charts available. A quick visit to Matplotlib.org will show new additions and features of the library for pyplot, which are evolving at a rapid pace. Consider the following to see just a few of the many types of charts available to you from this library:
#Pie chart example
import matplotlib.pyplot as plt
#Data sets 1 - 5
sets = 'D 1', 'D 2', 'D 3', 'D 4', 'D 5'
data = [5, 10, 15, 20, 50]
plt.pie(data, labels=sets)
plt.show()

There are many others such as bar, hist (histogram), box, density, area, scatter, and XKCD-style charts (comic web site with Pythonish humor). The format is similar to pie.

Combine Matplotlib with Pandas

Now that we have the basics down for visualizing data, let's visualize a larger data set, which is a bit more practical: you would not normally type every value into your code, but would read from a CSV file or similar, perhaps obtained from an Internet site. We will combine data visualization with Pandas. In the following example we add a few functions, such as xticks() and title(), and make a histogram of students by age range from the students.csv data set. Pandas and Matplotlib with pyplot are good tools to use in combination:
import pandas
import matplotlib.pyplot as plt
data = pandas.read_csv('students.csv', sep=',', na_values=".")
age = data[['Age']]
print(age)
plt.hist(age)
plt.xticks(range(18,33))
plt.title('Ages of students')
plt.show()

The Pandas and Matplotlib documentation and main web sites describe many other functions, of course, but this will get you started with Pandas so that you can easily integrate the features you need into your own applications.

Taking It with You

Although Python functions may seem to be quite simple on the surface, you now know how to define and manage them in ways that really fit your needs. Of course, you’re probably looking to incorporate functions into a more comprehensive object-oriented program, and for that, we’ll need to look at how Python’s classes work.
