Chapter 5. Local and Recursive Functions

It is often useful to have one or more small helper functions inside another function. Python allows this without formality—we simply define the functions we need inside the definition of an existing function. Such functions are often called nested functions or local functions.

One common use case for local functions is when we want to use recursion. In these cases, the enclosing function is called, sets things up, and then makes the first call to a local recursive function. Recursive functions (or methods) are ones that call themselves. Structurally, all directly recursive functions can be seen as having two cases: the base case and the recursive case. The base case is used to stop the recursion.

Recursive functions can be computationally expensive because for every recursive call another stack frame is used; however, some algorithms are most naturally expressed using recursion. Most Python implementations have a fixed limit to how many recursive calls can be made. The limit is returned by sys.getrecursionlimit() and can be changed by sys.setrecursionlimit(), although increasing the limit is most often a sign that the algorithm being used is inappropriate or that the implementation has a bug.

The classic example of a recursive function is one that is used to calculate factorials.[*] For example, factorial(5) will calculate 5! and return 120, that is, 1 × 2 × 3 × 4 × 5:

[*] Python’s math module provides a much more efficient math.factorial() function.

def factorial(x):
    if x <= 1:
        return 1
    return x * factorial(x - 1)


This is not an efficient solution, but it does show the two fundamental features of recursive functions. If the given number, x, is 1 or less, 1 is returned and no recursion occurs—this is the base case. But if x is greater than 1 the value returned is x * factorial(x - 1), and this is the recursive case because here the factorial function calls itself. The function is guaranteed to terminate because if the initial x is less than or equal to 1 the base case will be used and the function will finish immediately, and if x is greater than 1, each recursive call will be on a number one less than before and so will eventually be 1.

To see both local functions and recursive functions in a meaningful context we will study the indented_list_sort() function from module file IndentedList.py. This function takes a list of strings that use indentation to create a hierarchy, and a string that holds one level of indent, and returns a list with the same strings but where all the strings are sorted in case-insensitive alphabetical order, with indented items sorted under their parent item, recursively, as the before and after lists shown in Figure 1 illustrate.

Figure 1. Before and after sorting an indented list

image

Given the before list, the after list is produced by this call: after = Indent-edList.indented_list_sort(before). The default indent value is four spaces, the same as the indent used in the before list, so we did not need to set it explicitly.

We will begin by looking at the indented_list_sort() function as a whole, and then we will look at its two local functions.

def indented_list_sort(indented_list, indent="    "):
    KEY, ITEM, CHILDREN = range(3)

    def add_entry(level, key, item, children):
        ...

    def update_indented_list(entry):
        ...

    entries = []
    for item in indented_list:
        level = 0
        i = 0
        while item.startswith(indent, i):
            i += len(indent)
            level += 1
        key = item.strip().lower()
        add_entry(level, key, item, entries)

    indented_list = []
    for entry in sorted(entries):
        update_indented_list(entry)
    return indented_list


The code begins by creating three constants that are used to provide names for index positions used by the local functions. Then we define the two local functions which we will review in a moment. The sorting algorithm works in two stages. In the first stage we create a list of entries, each a 3-tuple consisting of a “key” that will be used for sorting, the original string, and a list of the string’s child entries. The key is just a lowercased copy of the string with whitespace stripped from both ends. The level is the indentation level, 0 for top-level items, 1 for children of top-level items, and so on. In the second stage we create a new indented list and add each string from the sorted entries list, and each string’s child strings, and so on, to produce a sorted indented list.

    def add_entry(level, key, item, children):
        if level == 0:
            children.append((key, item, []))
        else:
            add_entry(level - 1, key, item, children[-1][CHILDREN])


This function is called for each string in the list. The children argument is the list to which new entries must be added. When called from the outer function (indented_list_sort()), this is the entries list. This has the effect of turning a list of strings into a list of entries, each of which has a top-level (unindented) string and a (possibly empty) list of child entries.

If the level is 0 (top-level), we add a new 3-tuple to the entries list. This holds the key (for sorting), the original item (which will go into the resultant sorted list), and an empty children list. This is the base case since no recursion takes place. If the level is greater than 0, the item is a child (or descendant) of the last item in the children list. In this case we recursively call add_entry() again, reducing the level by 1 and passing the children list’s last item’s children list as the list to add to. If the level is 2 or more, more recursive calls will take place, until eventually the level is 0 and the children list is the right one for the entry to be added to.

For example, when the “Inner Transitionals” string is reached, the outer function calls add_entry() with a level of 0, a key of “inner transitionals”, an item of “Inner Transitionals”, and the entries list as the children list. Since the level is 0, a new item will be appended to the children list (entries), with the key, item, and an empty children list. The next string is “Lanthanides”—this is indented, so it is a child of the “Inner Transitionals” string. The add_entry() call this time has a level of 1, a key of “lanthanides”, an item of “Lanthanides”, and the entries list as the children list. Since the level is 1, the add_entry() function calls itself recursively, this time with level 0 (1 - 1), the same key and item, but with the children list being the children list of the last item, that is, the “Inner Transitionals” item’s children list.

Here is what the entries list looks like once all the strings have been added, but before the sorting has been done:

[('nonmetals',
  'Nonmetals',
  [('hydrogen', '     Hydrogen', []),
   ('carbon', '     Carbon', []),
   ('nitrogen', '     Nitrogen', []),
   ('oxygen', '     Oxygen', [])]),
 ('inner transitionals',
  'Inner Transitionals',
  [('lanthanides',
    '    Lanthanides',
    [('cerium', '         Cerium', []),
     ('europium', '         Europium', [])]),
   ('actinides',
    '    Actinides',
    [('uranium', '         Uranium', []),
     ('curium', '         Curium', []),
     ('plutonium', '         Plutonium', [])])]),
 ('alkali metals',
  'Alkali Metals',
  [('lithium', '     Lithium', []),
   ('sodium', '     Sodium', []),
   ('potassium', '     Potassium', [])])]


The output was produced using the pprint (“pretty print”) module’s pprint. pprint() function. Notice that the entries list has only three items (all of which are 3-tuples), and that each 3-tuple’s last element is a list of child 3-tuples (or is an empty list).

The add_entry() function is both a local function and a recursive function. Like all recursive functions, it has a base case (in this function, when the level is 0) that ends the recursion, and a recursive case.

The function could be written in a slightly different way:

def add_entry(key, item, children):
    nonlocal level
    if level == 0:
        children.append((key, item, []))
    else:
        level -= 1
        add_entry(key, item, children[-1][CHILDREN])


Here, instead of passing level as a parameter, we use a nonlocal statement to access a variable in an outer enclosing scope. If we did not change level inside the function we would not need the nonlocal statement—in such a situation, Python would not find it in the local (inner function) scope, and would look at the enclosing scope and find it there. But in this version of add_entry() we need to change level’s value, and just as we need to tell Python that we want to change global variables using the global statement (to prevent a new local variable from being created rather than the global variable updated), the same applies to variables that we want to change but which belong to an outer scope. Although it is often best to avoid using global altogether, it is also best to use nonlocal with care.

    def update_indented_list(entry):
        indented_list.append(entry[ITEM])
        for subentry in sorted(entry[CHILDREN]):
            update_indented_list(subentry)


In the algorithm’s first stage we build up a list of entries, each a (key, item, children) 3-tuple, in the same order as they are in the original list. In the algorithm’s second stage we begin with a new empty indented list and iterate over the sorted entries, calling update_indented_list() for each one to build up the new indented list. The update_indented_list() function is recursive. For each top-level entry it adds an item to the indented_list, and then calls itself for each of the item’s child entries. Each child is added to the indented_list, and then the function calls itself for each child’s children—and so on. The base case (when the recursion stops) is when an item, or child, or child of a child, and so on has no children of its own.

Python looks for indented_list in the local (inner function) scope and doesn’t find it, so it then looks in the enclosing scope and finds it there. But notice that inside the function we append items to the indented_list even though we have not used nonlocal. This works because nonlocal (and global) are concerned with object references, not with the objects they refer to. In the second version of add_entry() we had to use nonlocal for level because the += operator applied to a number rebinds the object reference to a new object—what really happens is level = level + 1, so level is set to refer to a new integer object. But when we call list.append() on the indented_list, it modifies the list itself and no rebinding takes place, and therefore nonlocal is not necessary. (For the same reason, if we have a dictionary, list, or other global collection, we can add or remove items from it without using a global statement.)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset