Chapter 4. Dynamic Code Execution

There are some occasions when it is easier to write a piece of code that generates the code we need than to write the needed code directly. And in some contexts it is useful to let users enter code (e.g., functions in a spreadsheet), and to let Python execute the entered code for us rather than to write a parser and handle it ourselves—although executing arbitrary code like this is a potential security risk, of course. Another use case for dynamic code execution is to provide plug-ins to extend a program’s functionality. Using plug-ins has the disadvantage that all the necessary functionality is not built into the program (which can make the program more difficult to deploy and runs the risk of plug-ins getting lost), but has the advantages that plug-ins can be upgraded individually and can be provided separately, perhaps to provide enhancements that were not originally envisaged.

4.1. Dynamic Code Execution

The easiest way to execute an expression is to use the built-in eval() function. For example:

x = eval("(2 ** 31) - 1")    # x == 2147483647


This is fine for user-entered expressions, but what if we need to create a function dynamically? For that we can use the built-in exec() function. For example, the user might give us a formula such as 4πr² and the name “area of sphere”, which they want turned into a function. Assuming that we replace π with math.pi, the function they want can be created like this:

import math
code = '''
def area_of_sphere(r):
    return 4 * math.pi * r ** 2
'''
context = {}
context["math"] = math
exec(code, context)


We must use proper indentation—after all, the quoted code is standard Python. (Although in this case we could have written it all on a single line because the suite is just one line.)

If exec() is called with some code as its only argument, the code is executed in the current scope. At module level this means that any functions or variables it creates are added directly to the module’s global namespace, and inside a function there is no reliable way to access the names it creates once the call has finished. Both of these problems can be solved by passing a dictionary as the second argument. The code then sees only the objects that the dictionary contains (plus the built-ins), and the dictionary provides a place where object references can be kept for accessing after the exec() call has finished. For example, the use of the context dictionary means that after the exec() call, the dictionary has an object reference to the area_of_sphere() function that was created by exec(). Because the code cannot see anything outside the dictionary, we needed to give exec() access to the math module explicitly, so we inserted an item into the context dictionary whose key is the module’s name and whose value is an object reference to the corresponding module object. This ensures that inside the exec() call, math.pi is accessible.
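The two roles of the dictionary can be seen in a minimal sketch. The function name double_pi() and the dictionary names are hypothetical and are used only for illustration:

```python
import math

code = "def double_pi():\n    return 2 * math.pi\n"

# Without "math" in the context the function is still created, but
# calling it fails because the name math cannot be resolved.
bare_context = {}
exec(code, bare_context)
try:
    bare_context["double_pi"]()
    lookup_failed = False
except NameError:
    lookup_failed = True

# With the module supplied in the context, the call succeeds.
full_context = {"math": math}
exec(code, full_context)
result = full_context["double_pi"]()
```

In both cases the new function is reachable only through the dictionary that was passed to exec().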

In some cases it is convenient to provide the entire global context to exec(). This can be done by passing the dictionary returned by the globals() function. One disadvantage of this approach is that any objects created in the exec() call would be added to the global dictionary. A solution is to copy the global context into a dictionary, for example, context = globals().copy(). This still gives exec() access to imported modules and the variables and other objects that are in scope, and because we have copied, any changes to the context made inside the exec() call are kept in the context dictionary and are not propagated to the global environment. (It would appear to be more secure to use copy.deepcopy(), but if security is a concern it is best to avoid exec() altogether.) We can also pass the local context, for example, by passing locals() as a third argument—this makes objects in the local scope accessible to the code executed by exec().
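As a rough sketch of the copying approach, assuming the code runs at module level and using a hypothetical function called circumference():

```python
import math

# Shallow copy: the same objects are visible, but new names stay in the copy.
context = globals().copy()
exec("def circumference(r):\n    return 2 * math.pi * r\n", context)

created_in_copy = "circumference" in context       # the copy has the function
leaked_to_globals = "circumference" in globals()   # the real globals do not
length = context["circumference"](1)
```

Because the copy is shallow, mutable objects such as lists are still shared with the global environment, so executed code can change their contents even though it cannot rebind the global names.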

After the exec() call the context dictionary contains a key called "area_of_sphere" whose value is the area_of_sphere() function. Here is how we can access and call the function:

area_of_sphere = context["area_of_sphere"]
area = area_of_sphere(5)            # area == 314.15926535897933


The area_of_sphere object is an object reference to the function we have dynamically created and can be used just like any other function. And although we created only a single function in the exec() call, unlike eval(), which can operate on only a single expression, exec() can handle as many Python statements as we like, including entire modules, as we will see in the next section.
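The difference between the two functions can be illustrated with a short sketch; the expressions and variable names are illustrative only:

```python
# eval() evaluates a single expression and returns its value.
value = eval("sum(n * n for n in range(4))")

# An assignment is a statement, not an expression, so eval() rejects it.
try:
    eval("total = 1 + 2")
    rejected = False
except SyntaxError:
    rejected = True

# exec() accepts any number of statements; the results live in the context.
context = {}
exec("total = 1 + 2\nsquares = [n * n for n in range(4)]", context)
```

After this runs, context["total"] and context["squares"] hold the values created by the executed statements.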

4.2. Dynamically Importing Modules

Python provides three straightforward mechanisms that can be used to create plug-ins, all of which involve importing modules by name at runtime. And once we have dynamically imported additional modules, we can use Python’s introspection functions to check the availability of the functionality we want, and to access it as required.

In this section we will review the magic-numbers.py program. This program reads the first 1000 bytes of each file given on the command line and for each one outputs the file’s type (or the text “Unknown”), and the filename. Here is an example command line and an extract from its output:

C:\Python30\python.exe magic-numbers.py c:\windows\*.*
...
XML.................c:\windows\WindowsShell.Manifest
Unknown.............c:\windows\WindowsUpdate.log
Windows Executable..c:\windows\winhelp.exe
Windows Executable..c:\windows\winhlp32.exe
Windows BMP Image...c:\windows\winnt.bmp
...


The program tries to load in any module that is in the same directory as the program and whose name contains the text “magic”. Such modules are expected to provide a single public function, get_file_type(). Two very simple example modules, StandardMagicNumbers.py and WindowsMagicNumbers.py, that each have a get_file_type() function are provided with the book’s examples.
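To give an idea of what such a plug-in might look like, here is a hypothetical module. It is not the content of either example module; the two-argument signature is inferred from the way main() calls the function, and the signature table and the module name in the comment are assumptions made for illustration:

```python
# Hypothetical plug-in module, for example saved as ExampleMagicNumbers.py.

_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG Image"),
    (b"GIF87a", "GIF Image"),
    (b"GIF89a", "GIF Image"),
    (b"BM", "Windows BMP Image"),
    (b"MZ", "Windows Executable"),
)


def get_file_type(magic, fileextension):
    """Return a description of the file type, or None if unrecognized."""
    for signature, description in _SIGNATURES:
        if magic.startswith(signature):
            return description
    # Fall back on the extension plus a quick look at the content.
    if fileextension.lower() == ".xml" and magic.lstrip().startswith(b"<?xml"):
        return "XML"
    return None
```

Returning None for an unrecognized file lets the main program move on to the next plug-in's function.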

We will review the program’s main() function in two parts.

def main():
    modules = load_modules()
    get_file_type_functions = []
    for module in modules:
        get_file_type = get_function(module, "get_file_type")
        if get_file_type is not None:
            get_file_type_functions.append(get_file_type)


In a moment, we will look at three different implementations of the load_modules() function which returns a (possibly empty) list of module objects, and we will look at the get_function() function further on. For each module found we try to retrieve a get_file_type() function, and add any we get to a list of such functions.

    for file in get_files(sys.argv[1:]):
        fh = None
        try:
            fh = open(file, "rb")
            magic = fh.read(1000)
            for get_file_type in get_file_type_functions:
                filetype = get_file_type(magic,
                                         os.path.splitext(file)[1])
                if filetype is not None:
                    print("{0:.<20}{1}".format(filetype, file))
                    break
            else:
                print("{0:.<20}{1}".format("Unknown", file))
        except EnvironmentError as err:
            print(err)
        finally:
            if fh is not None:
                fh.close()


This loop iterates over every file listed on the command line and for each one reads its first 1000 bytes. It then tries each get_file_type() function in turn to see whether it can determine the current file’s type. If the file type is determined, the details are printed and the inner loop is broken out of, with processing continuing with the next file. If no function can determine the file type, or if no get_file_type() functions were found, an “Unknown” line is printed.
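The “Unknown” case relies on the else clause of the inner for loop, which runs only when the loop finishes without executing break. A stripped-down sketch of the same pattern, using a hypothetical helper called first_match():

```python
def first_match(checkers, data):
    """Return the first non-None result from checkers, or "Unknown"."""
    for check in checkers:
        result = check(data)
        if result is not None:
            break               # leaving the loop this way skips the else
    else:
        result = "Unknown"      # reached only if break was never executed
    return result


def is_gif(data):
    return "GIF Image" if data.startswith(b"GIF") else None
```

With an empty list of checkers the loop body never runs, so the else clause executes and "Unknown" is returned, just as in the main program when no plug-ins were found.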

We will now review three different (but equivalent) ways of dynamically importing modules, starting with the longest and most difficult approach, since it shows every step explicitly:

def load_modules():
    modules = []
    path = os.path.dirname(__file__) or "."
    for name in os.listdir(path):
        if name.endswith(".py") and "magic" in name.lower():
            # Open the file in the program's directory, which is not
            # necessarily the current working directory
            filename = os.path.join(path, name)
            name = os.path.splitext(name)[0]
            if name.isidentifier() and name not in sys.modules:
                fh = None
                try:
                    fh = open(filename, "r", encoding="utf8")
                    code = fh.read()
                    module = type(sys)(name)
                    sys.modules[name] = module
                    exec(code, module.__dict__)
                    modules.append(module)
                except (EnvironmentError, SyntaxError) as err:
                    sys.modules.pop(name, None)
                    print(err)
                finally:
                    if fh is not None:
                        fh.close()
    return modules


We begin by iterating over all the files in the program’s directory. If this is the current directory, os.path.dirname(__file__) will return an empty string which would cause os.listdir() to raise an exception, so we pass "." if necessary. For each candidate file (ends with .py and contains the text “magic”), we get the module name by chopping off the file extension. If the name is a valid identifier it is a viable module name, and if it isn’t already in the global list of modules maintained in the sys.modules dictionary we can try to import it.

We read the text of the file into the code string. The next line, module = type(sys)(name), is quite subtle. When we call type() it returns the type object of the object it is given. So if we called type(1) we would get int back. If we print the type object we just get something human readable like “int”, but if we call the type object as a function, we get an object of that type back. For example, we can get the integer 5 in variable x by writing x = 5, or x = int(5), or x = type(0)(5), or int_type = type(0); x = int_type(5). In this case we’ve used type(sys) and sys is a module, so we get back the module type object (essentially the same as a class object), and can use it to create a new module with the given name. Just as with the int example where it didn’t matter what integer we used to get the int type object, it doesn’t matter what module we use (as long as it is one that exists, that is, has been imported) to get the module type object.
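The same steps can be tried out interactively. In this small sketch the module name "scratch" is hypothetical; types.ModuleType is the standard library's name for the same type object that type(sys) returns:

```python
import sys
import types

int_type = type(0)
x = int_type(5)                      # equivalent to int(5)

module_type = type(sys)              # the module type object
same_type = module_type is types.ModuleType

scratch = module_type("scratch")     # a new, empty module object
scratch.answer = 42                  # modules accept arbitrary attributes
```

Creating the module object does not register it anywhere; it becomes importable by name only once it is added to sys.modules.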

Once we have a new (empty) module, we add it to the global list of modules to prevent the module from being accidentally reimported. This is done before calling exec() to more closely mimic the behavior of the import statement. Then we call exec() to execute the code we have read—and we use the module’s dictionary as the code’s context. At the end we add the module to the list of modules we will pass back. And if a problem arises, we delete the module from the global modules dictionary if it has been added—it will not have been added to the list of modules if an error occurred. Notice that exec() can handle any amount of code (whereas eval() evaluates a single expression—see Table 1), and raises a SyntaxError exception if there’s a syntax error.
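The sequence of steps can be sketched without any files by using source code held in a string. The module name in_memory_demo and its contents are assumptions made purely for illustration:

```python
import sys
import types

source = (
    "GREETING = 'hello'\n"
    "\n"
    "def greet(name):\n"
    "    return GREETING + ', ' + name\n"
)
name = "in_memory_demo"              # hypothetical module name

module = types.ModuleType(name)
sys.modules[name] = module           # register before executing the code
try:
    exec(source, module.__dict__)    # the module's dictionary is the context
except Exception:
    sys.modules.pop(name, None)      # undo the registration on failure
    raise

import in_memory_demo                # found in sys.modules, so no file is needed
```

Because the module is already registered, the final import statement simply binds the existing module object rather than searching for a file.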

Table 1. Dynamic Programming and Introspection Functions

Here’s the second way to dynamically load a module at runtime—the code shown here replaces the first approach’s try ... except block:

try:
    exec("import " + name)
    modules.append(sys.modules[name])
except (ImportError, SyntaxError) as err:
    print(err)


One theoretical problem with this approach is that it is potentially insecure. The name variable could begin with sys; and be followed by some destructive code.
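A harmless sketch shows how this could happen. The “module name” used here is hypothetical, and the extra statement merely sets a variable rather than doing any damage:

```python
name = "sys; injected = True"        # hypothetical hostile "module name"

context = {}
exec("import " + name, context)      # runs the import AND the extra statement
was_injected = context.get("injected", False)

# str.isidentifier() rejects anything that is not a plain name.
is_safe_name = name.isidentifier()
```

In load_modules() the name has already passed the name.isidentifier() check before the import is attempted, which is why the problem is only theoretical in this program.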

And here is the third approach, again just showing the replacement for the first approach’s try ... except block:

try:
    module = __import__(name)
    modules.append(module)
except (ImportError, SyntaxError) as err:
    print(err)


This is the easiest way to dynamically import modules and is slightly safer than using exec(), although like any dynamic import, it is by no means secure because we don’t know what is being executed when the module is imported.

None of the techniques shown here handles packages or modules in different paths, but it is not difficult to extend the code to accommodate these—although it is worth reading the online documentation, especially for __import__(), if more sophistication is required.
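One detail that matters as soon as packages are involved is what __import__() returns for a dotted name. The sketch below uses the standard library's xml.dom package as an example, and also shows importlib.import_module(), which the program itself does not use but which is the standard library's convenience wrapper for this task:

```python
import importlib

top = __import__("xml.dom")                   # returns the top-level package
sub = importlib.import_module("xml.dom")      # returns the submodule itself

top_name = top.__name__
sub_name = sub.__name__
```

Here top_name is "xml" while sub_name is "xml.dom", so code that passes dotted names to __import__() must walk down from the top-level package to reach the module it asked for.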

Having imported the module we need to be able to access the functionality it provides. This can be achieved using Python’s built-in introspection functions, getattr() and hasattr(). Here’s how we have used them to implement the get_function() function:

def get_function(module, function_name):
    function = get_function.cache.get((module, function_name), None)
    if function is None:
        try:
            function = getattr(module, function_name)
            if not hasattr(function, "__call__"):
                raise AttributeError()
            get_function.cache[module, function_name] = function
        except AttributeError:
            function = None
    return function
get_function.cache = {}


Ignoring the cache-related code for a moment, what the function does is call getattr() on the module object with the name of the function we want. If there is no such attribute an AttributeError exception is raised, but if there is such an attribute we use hasattr() to check that the attribute itself has the __call__ attribute—something that all callables (functions and methods) have. (Further on we will see a nicer way of checking whether an attribute is callable.) If the attribute exists and is callable we can return it to the caller; otherwise, we return None to signify that the function isn’t available.
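Leaving the cache aside, the lookup can be condensed considerably. The following sketch is not the version used in the program: the helper name get_callable() is hypothetical, and it assumes a Python version that provides the built-in callable() function (Python 3.2 or later; the function is absent from Python 3.0 and 3.1):

```python
import math


def get_callable(module, name):
    """Return module.name if it exists and can be called, else None."""
    attribute = getattr(module, name, None)   # None instead of AttributeError
    return attribute if callable(attribute) else None


sqrt = get_callable(math, "sqrt")             # a function, so it is returned
pi = get_callable(math, "pi")                 # a float, so None is returned
missing = get_callable(math, "no_such_name")  # no such attribute, so None
```

Passing a default as the third argument to getattr() avoids the need for a try ... except block.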

If hundreds of files were being processed (e.g., due to using *.* in the C:\windows directory), we don’t want to go through the lookup process for every module for every file. So immediately after defining the get_function() function, we add an attribute to the function, a dictionary called cache. (In general, Python allows us to add arbitrary attributes to arbitrary objects.) The first time that get_function() is called the cache dictionary is empty, so the dict.get() call will return None. But each time a suitable function is found it is put in the dictionary with a 2-tuple of the module and function name used as the key and the function itself as the value. So the second and all subsequent times a particular function is requested the function is immediately returned from the cache and no attribute lookup takes place at all.[*]

[*] A slightly more sophisticated get_function() that has better handling of modules without the required functionality is in the magic-numbers.py program alongside the version shown here.

The technique used for caching the get_function()’s return value for a given set of arguments is called memoizing. It can be used for any function that has no side effects (does not change any global variables), and that always returns the same result for the same (immutable) arguments. Since the code required to create and manage a cache for each memoized function is the same, it is an ideal candidate for a function decorator, and several @memoize decorator recipes are given in the Python Cookbook, in code.activestate.com/recipes/langs/python/. However, module objects are mutable, so some off-the-shelf memoizer decorators wouldn’t work with our get_function() function as it stands. An easy solution would be to use each module’s __name__ string rather than the module itself as the first part of the key tuple.
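A minimal sketch of such a decorator follows. It is not one of the Cookbook recipes; the names memoize() and lookup() are hypothetical, and lookup() takes the module's name rather than the module object, as suggested above:

```python
import functools
import math
import sys


def memoize(function):
    """Cache results keyed on the positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(function)
    def wrapper(*args):
        if args not in cache:
            cache[args] = function(*args)
        return cache[args]

    wrapper.cache = cache        # exposed for inspection, as in get_function()
    return wrapper


@memoize
def lookup(module_name, function_name):
    """Return the named callable from an already imported module, or None."""
    module = sys.modules.get(module_name)
    attribute = getattr(module, function_name, None)
    return attribute if callable(attribute) else None
```

Unlike get_function(), this sketch also caches None results, so a function added to a module after a failed lookup would not be found. Later versions of the standard library provide functools.lru_cache, which offers similar caching without any hand-written code.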

Doing dynamic module imports is easy, and so is executing arbitrary Python code using the exec() function. This can be very convenient, for example, allowing us to store code in a database. However, we have no control over what imported or exec()uted code will do. Recall that in addition to variables, functions, and classes, modules can also contain code that is executed when it is imported—if the code came from an untrusted source it might do something unpleasant. How to address this depends on circumstances, although it may not be an issue at all in some environments, or for personal projects.
