Chapter 12. Modules

Chapter Topics

This chapter focuses on Python modules and how data are imported from modules into your programming environment. We will also take a look at packages. Modules are a means to organize Python code, and packages help you organize modules. We conclude this chapter with a look at other related aspects of modules.

12.1 What Are Modules?

A module allows you to logically organize your Python code. When code gets to be large enough, the tendency is to break it up into organized pieces that can still interact with one another at a functioning level. These pieces generally have attributes that have some relation to one another, perhaps a single class with its member data variables and methods, or maybe a group of related, yet independently operating functions. These pieces should be shared, so Python allows a module the ability to “bring in” and use attributes from other modules to take advantage of work that has been done, maximizing code reusability. This process of associating attributes from other modules with your module is called importing. In a nutshell, modules are self-contained and organized pieces of Python code that can be shared.

12.2 Modules and Files

If modules represent a logical way to organize your Python code, then files are a way to physically organize modules. To that end, each file is considered an individual module, and vice versa. The filename of a module is the module name appended with the .py file extension. There are several aspects we need to discuss with regard to what the file structure means to modules. Unlike other languages in which you import classes, in Python you import modules or module attributes.

12.2.1 Module Namespaces

We will discuss namespaces in detail later in this chapter, but the basic concept of a namespace is an individual set of mappings from names to objects. As you are no doubt aware, module names play an important part in the naming of their attributes. The name of the attribute is always prepended with the module name. For example, the atoi() function in the string module is called string.atoi(). Because only one module with a given name can be loaded into the Python interpreter, there is no intersection of names from different modules; hence, each module defines its own unique namespace. If I created a function called atoi() in my own module, perhaps mymodule, its name would be mymodule.atoi(). So even if there is a name conflict for an attribute, the fully qualified name—referring to an object via dotted attribute notation—prevents an exact and conflicting match.
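As a small illustration of qualified names keeping same-named attributes apart, the sketch below uses two standard library modules, math and cmath, which both define sqrt(). The choice of these two modules is ours and is not part of the discussion above; the code is written for a current Python 3 interpreter.

```python
import math
import cmath

# Both modules define an attribute named sqrt, but the fully
# qualified names refer to two different function objects.
real_root = math.sqrt(4)         # real-valued square root
complex_root = cmath.sqrt(-1)    # complex-valued square root

print(real_root)                 # 2.0
print(complex_root)              # 1j
print(math.sqrt is cmath.sqrt)   # False: separate namespaces
```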

12.2.2 Search Path and Path Search

The process of importing a module requires a process called a path search. This is the procedure of checking “predefined areas” of the file system to look for your mymodule.py file in order to load the mymodule module. These predefined areas are no more than a set of directories that are part of your Python search path. To avoid the confusion between the two, think of a path search as the pursuit of a file through a set of directories, the search path.

There may be times where importing a module fails:

When this error occurs, the interpreter is telling you it cannot access the requested module, and the likely reason is that the module you desire is not in the search path, leading to a path search failure.
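The failure described above can be reproduced with a short sketch. The module name used here is hypothetical and is assumed not to exist anywhere on your search path; the code is written for a current Python 3 interpreter.

```python
try:
    import no_such_module_for_demo   # hypothetical name, not installed
    failure = None
except ImportError as exc:
    # The path search went through every entry in sys.path and failed.
    failure = exc
    print("import failed:", exc)
```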

A default search path is automatically defined either in the compilation or installation process. This search path may be modified in one of two places.

One is the PYTHONPATH environment variable set in the shell or command-line interpreter that invokes Python. The contents of this variable consist of a colon-delimited set of directory paths (semicolon-delimited on Windows). If you want the interpreter to use the contents of this variable, make sure you set or update it before you start the interpreter or run a Python script.

Once the interpreter has started, you can access the path itself, which is stored in the sys module as the sys.path variable. Rather than a single string that is colon-delimited, the path has been “split” into a list of individual directory strings. Below is an example search path for a Unix machine. Your mileage will definitely vary as you go from system to system.
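A minimal sketch for inspecting the search path on your own system follows. The output depends entirely on the installation, so none is shown; the handling of PYTHONPATH assumes the variable may be unset.

```python
import os
import sys

# sys.path is an ordinary list of directory (or archive) names.
print(type(sys.path))
for directory in sys.path:
    print(repr(directory))

# PYTHONPATH, if set, is a single string; os.pathsep is the delimiter
# used to split it (':' on Unix, ';' on Windows).
raw = os.environ.get('PYTHONPATH', '')
entries = [p for p in raw.split(os.pathsep) if p]
print(entries)
```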

Bearing in mind that this is just a list, we can definitely take liberty with it and modify it at our leisure. If you know of a module you want to import, yet its directory is not in the search path, by all means use the list’s append() method to add it to the path, like so:

sys.path.append('/home/wesc/py/lib')

Once this is accomplished, you can then load your module. As long as one of the directories in the search path contains the file, then it will be imported. Of course, this adds the directory only to the end of your search path. If you want to add it elsewhere, such as in the beginning or middle, then you have to use the insert() list method for those. In our examples above, we are updating the sys.path attribute interactively, but it will work the same way if run as a script.

Here is what it would look like if we ran into this problem interactively:
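A minimal sketch of such a session, written as a script for a current Python 3 interpreter. The temporary directory and the module name demo_pathmod are hypothetical stand-ins for a directory and module of your own.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module (hypothetical names).
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, 'demo_pathmod.py'), 'w') as f:
    f.write("answer = 42\n")

try:
    import demo_pathmod          # fails: lib_dir is not on sys.path yet
    first_try_failed = False
except ImportError:
    first_try_failed = True

sys.path.append(lib_dir)         # extend the search path at the end
importlib.invalidate_caches()    # Python 3: let the finders notice new files
import demo_pathmod              # the path search now succeeds
print(first_try_failed, demo_pathmod.answer)
```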

On the flip side, you may have too many copies of a module. In the case of duplicates, the interpreter will load the first module it finds with the given name while rummaging through the search path in sequential order.
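The "first one found wins" rule can be sketched with two throwaway directories that each hold a module of the same name. All names here are hypothetical, and insert() is used to control the order of the two directories at the front of the search path.

```python
import importlib
import os
import sys
import tempfile

# Two directories, each with its own copy of the same module name.
first_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
for directory, label in ((first_dir, 'first'), (second_dir, 'second')):
    with open(os.path.join(directory, 'demo_dupmod.py'), 'w') as f:
        f.write("origin = %r\n" % label)

# After these two calls, first_dir precedes second_dir in sys.path.
sys.path.insert(0, second_dir)
sys.path.insert(0, first_dir)
importlib.invalidate_caches()

import demo_dupmod
print(demo_dupmod.origin)        # the copy from first_dir is loaded
```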

To find out what modules have been successfully imported (and loaded) as well as from where, take a look at sys.modules. Unlike sys.path, which is a list of directories, sys.modules is a dictionary where the keys are the module names and the values are the loaded module objects, each of which normally records the physical location it was loaded from. Finally, site-packages is where third-party or external modules or packages are installed (for all users). Per-user site-packages was added in 2.6 (see PEP 370).
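A short sketch of looking inside sys.modules, using the os module as the example entry. The printed location varies from system to system.

```python
import os
import sys

print(type(sys.modules))           # a dictionary
print('os' in sys.modules)         # True once os has been imported
print(sys.modules['os'] is os)     # the value is the module object itself

# Modules loaded from files record their location in __file__;
# built-in modules such as sys may not have that attribute.
print(getattr(sys.modules['os'], '__file__', '(built-in)'))
```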

12.3 Namespaces

A namespace is a mapping of names (identifiers) to objects. The process of adding a name to a namespace consists of binding the identifier to the object (and increasing the reference count to the object by one). The Python Language Reference also includes the following definitions: “changing the mapping of a name is called rebinding [, and] removing a name is unbinding.”

As briefly introduced in Chapter 11, there are either two or three active namespaces at any given time during execution. These three namespaces are the local, global, and built-ins namespaces, but local namespaces come and go during execution, hence the “two or three” we just alluded to. The names accessible from these namespaces are dependent on their loading order, or the order in which the namespaces are brought into the system.

The Python interpreter loads the built-ins namespace first. This consists of the names in the __builtins__ module. Then the global namespace for the executing module is loaded, which then becomes the active namespace when the module begins execution. Thus we have our two active namespaces.

Core Note: __builtins__ versus __builtin__

The __builtins__ module should not be confused with the __builtin__ module. The names, of course, are so similar that it tends to lead to some confusion among new Python programmers who have gotten this far. The __builtins__ module consists of a set of built-in names for the built-ins namespace. Most, if not all, of these names come from the __builtin__ module, which is a module of the built-in functions, exceptions, and other attributes. In standard Python execution, __builtins__ contains all the names from __builtin__. Python used to have a restricted execution model that allowed modification of __builtins__ where key pieces from __builtin__ were left out to create a sandbox environment. However, due to its security flaws and the difficulty involved with repairing it, restricted execution is no longer supported in Python (as of 2.3).

When a function call is made during execution, the third, a local, namespace is created. We can use the globals() and locals() built-in functions to tell us which names are in which namespaces. We will discuss both functions in more detail later on in this chapter.

12.3.1 Namespaces versus Variable Scope

Okay, now that we know what namespaces are, how do they relate to variable scope again? They seem extremely similar. The truth is, you are quite correct.

Namespaces are purely mappings between names and objects, but scope dictates how, or rather where, one can access these names based on the physical location from within your code. We illustrate the relationship between namespaces and variable scope in Figure 12-1.

Figure 12-1. Namespaces versus variable scope

Notice that each of the namespaces is a self-contained unit. But looking at the namespaces from the scoping point of view, things appear different. All names within the local namespace are within my local scope. Any name outside my local scope is in my global scope.

Also keep in mind that during the execution of the program, the local namespaces and scope are transient because function calls come and go, but the global and built-ins namespaces remain.

Our final thought to you in this section is, when it comes to namespaces, ask yourself the question, “Does it have it?” And for variable scope, ask, “Can I see it?”

12.3.2 Name Lookup, Scoping, and Overriding

So how do scoping rules work in relationship to namespaces? It all has to do with name lookup. When accessing an attribute, the interpreter must find it in one of the three namespaces. The search begins with the local namespace. If the attribute is not found there, then the global namespace is searched. If that is also unsuccessful, the final frontier is the built-ins namespace. If the exhaustive search fails, you get the familiar:
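The lookup order can be sketched as follows, assuming the code runs as an ordinary script on a current Python 3 interpreter. The variable and function names are hypothetical; the second function deliberately refers to a name that exists in none of the three namespaces.

```python
label = 'global'          # lives in the global namespace

def lookup_demo():
    size = 'local'        # lives in the function's local namespace
    # size is found locally, label globally, len in the built-ins.
    return size, label, len('abc')

def missing_demo():
    try:
        return undefined_name_for_demo   # in none of the three namespaces
    except NameError as exc:
        return exc

print(lookup_demo())
print(repr(missing_demo()))
```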

Notice how the figure features the foremost-searched namespaces “shadowing” the namespaces that are searched afterward. This is to try to convey the effect of overriding. This shadowing effect is illustrated by the gray boxes in Figure 12-1. For example, names found in the local namespace will hide access to objects in the global or built-ins namespaces. This is the process whereby names may be taken out of scope because a more local namespace contains a name. Take a look at the following piece of code that was introduced in the previous chapter:

When we execute this code, we get the following output:

The bar variable in the local namespace of foo() overrode the global bar variable. Although bar exists in the global namespace, the lookup found the one in the local namespace first, hence “overriding” the global one. For more information regarding scope, see Section 11.8 of Chapter 11.
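A minimal sketch of this kind of shadowing follows. The values 100 and 200 are arbitrary choices for illustration, and the code assumes it runs as an ordinary script.

```python
def foo():
    bar = 200                 # binds a new, local bar
    return bar

bar = 100                     # global bar
before = bar
inside = foo()                # the local bar hides the global one
after = bar                   # the global bar is unchanged by the call
print(before, inside, after)
```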

12.3.3 Namespaces for Free!

One of Python’s most useful features is the ability to get a namespace almost anywhere you need a place to put things. We have seen in the previous chapter how you can just add attributes to functions at whim (using the familiar dotted-attribute notation):
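A brief sketch of a function used as a namespace. The attribute names version and author are hypothetical and chosen only for illustration.

```python
def foo():
    'foo() - properly created doc string'
    return True

# Functions carry their own namespace; attributes can be added freely.
foo.version = 0.2
foo.author = 'demo'       # hypothetical attribute, for illustration only

print(foo.__doc__)
print(foo.version, foo.author)
```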

In this chapter, we have shown how modules themselves make namespaces and how you access them in the same way:

mymodule.foo()
mymodule.version

Although we will discuss object-oriented programming (OOP) in Chapter 13, how about an example even simpler than a “Hello World!” to introduce you to Python classes?

You can throw just about anything you want in a namespace. This use of a class (instance) is perfectly fine, and you don’t even have to know much about OOP to be able to use a class! (Note: These guys are called instance attributes.) Fancy names aside, the instance is just used as a namespace.
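One way such a class-as-namespace might look is sketched below. The class name Bag and the attribute names are hypothetical.

```python
class Bag(object):
    pass                  # no methods, no data: just a container

bag = Bag()
bag.x = 100               # instance attributes are created on assignment
bag.y = 200
bag.version = 0.1
bag.completed = True

print(sorted(vars(bag).items()))
```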

You will see just how useful they are as you delve deeper into OOP and discover what a convenience it is during runtime just to be able to store temporary (but important) values! As stated in the final tenet of the Zen of Python:

“Namespaces are one honking great idea—let’s do more of those!”

(To see the complete Zen, just import the this module within the interactive interpreter.)

12.4 Importing Modules

12.4.1 The import Statement

Importing a module requires the use of the import statement, whose syntax is:

import module1
import module2
  :
import moduleN

It is also possible to import multiple modules on the same line like this ...

import module1[, module2[,... moduleN]]

... but the resulting code is not as readable as having multiple import statements. Also, there is no performance hit and no change in the way that the Python bytecode is generated, so by all means, use the first form, which is the preferred form.

Core Style: Module ordering for import statements

It is recommended that all module imports happen at the top of Python modules. Furthermore, imports should follow this ordering:

Python Standard Library modules

Python third party modules

Application-specific modules

Separate these three groups of imports with an empty line. This helps ensure that modules are imported in a consistent manner and helps minimize the number of import statements required in each of the modules. You can read more about this and other import tips in Python’s Style Guide, written up as PEP 8.

When this statement is encountered by the interpreter, the module is imported if found in the search path. Scoping rules apply, so if imported from the top level of a module, it has global scope; if imported from a function, it has local scope.

When a module is imported the first time, it is loaded and executed.
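The scoping of the imported name can be sketched as follows. The standard library module colorsys is an arbitrary choice, picked because it is unlikely to be imported already.

```python
import sys

def hue_of_red():
    import colorsys                     # name bound only inside this function
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)[0]

hue = hue_of_red()
print(hue)
print('colorsys' in globals())          # the name is not global ...
print('colorsys' in sys.modules)        # ... yet the module has been loaded
```

The module itself is loaded into the interpreter either way; only the binding of its name is local to the function.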

12.4.2 The from-import Statement

It is possible to import specific module elements into your own module. By this, we really mean importing specific names from the module into the current namespace. For this purpose, we can use the from-import statement, whose syntax is:

from module import name1[, name2[,... nameN]]
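For instance, a sketch using two functions from the standard os.path module (the file path is made up for the example):

```python
from os.path import basename, splitext

# The imported names are used directly, without the os.path prefix.
name = basename('/tmp/report.txt')
stem, extension = splitext(name)
print(name, stem, extension)
```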

12.4.3 Multi-Line Import

The multi-line import feature was added in Python 2.4 specifically for long from-import statements. When importing many attributes from the same module, import lines of code tend to get long and wrap, requiring a NEWLINE-escaping backslash. Here is the example imported (pun intended) directly from PEP 328:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

Your other option is to have multiple from-import statements:

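from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END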

We are also trying to stem the use of the disfavored from Tkinter import * (see the Core Style sidebar in Section 12.5.3). Instead, programmers should be free to use Python’s standard grouping mechanism (parentheses) to create a more reasonable multi-line import statement:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

You can find out more about multi-line imports in the documentation or in PEP 328.

12.4.4 Extended Import Statement (as)

There are times when you are importing either a module or module attribute with a name that you are already using in your application, or perhaps it is a name that you do not want to use. Maybe the name is too long to type everywhere, or more subjectively, perhaps it is a name that you just plain do not like.

This had been a fairly common request from Python programmers: the ability to import modules and module attributes into a program using names other than their original given names. One common workaround is to assign the module name to a variable:

import longmodulename
short = longmodulename

In the example above, rather than using longmodulename.attribute, you would use the short.attribute to access the same object. (A similar analogy can be made with importing module attributes using from-import, see below.) However, to do this over and over again and in multiple modules can be annoying and seem wasteful. Using extended import, you can change the locally bound name for what you are importing. Statements like ...

import Tkinter
from cgi import FieldStorage

. . . can be replaced by . . .

import Tkinter as tk
from cgi import FieldStorage as form

This feature was added in Python 2.0. At that time, “as” was not implemented as a keyword; it finally became one in Python 2.6. For more information on extended import, see the Python Language Reference Manual and PEP 221.
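A runnable sketch of extended import follows. It uses the standard collections and os.path modules rather than Tkinter and cgi, which may not be available on every installation; the aliases coll and pjoin are arbitrary.

```python
import collections as coll
from os.path import join as pjoin

counts = coll.Counter('abracadabra')
path = pjoin('usr', 'lib')

print(counts['a'])
print(path)
print('collections' in globals())   # only the alias coll was bound
```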

12.5 Features of Module Import

12.5.1 Module “Executed” When Loaded

One effect of loading a module is that the imported module is “executed,” that is, the top-level portion of the imported module is directly executed. This usually includes setting up of global variables as well as performing the class and function declarations. If there is a check for __name__ to do more on direct script invocation, that is executed, too.

Of course, this type of execution may or may not be the desired effect. If not, you will have to put as much code as possible into functions. Suffice it to say that good module programming style dictates that only function and/or class definitions should be at the top level of a module.

For more information see Section 14.1.1 and the Core Note contained therein.

A new feature was added to Python which allows you to execute an installed module as a script. (Sure, running your own script is easy [$ foo.py], but executing a module in the standard library or third party package is trickier.) You can read more about how to do this in Section 14.4.3.

12.5.2 Importing versus Loading

A module is loaded only once, regardless of the number of times it is imported. This prevents the module “execution” from happening over and over again if multiple imports occur. If your module imports the sys module, and so do five of the other modules you import, it would not be wise to load sys (or any other module) each time! So rest assured, loading happens only once, on first import.
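The difference between importing and loading can be sketched with a throwaway module whose top-level code counts how often it runs. The module name demo_loadonce and the counter attribute stored on sys are hypothetical devices for this demonstration; the code targets a current Python 3 interpreter.

```python
import importlib
import os
import sys
import tempfile

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'demo_loadonce.py'), 'w') as f:
    # Top-level code: runs when the module is loaded.
    f.write("import sys\n"
            "sys.demo_load_count = getattr(sys, 'demo_load_count', 0) + 1\n"
            "def helper():\n"
            "    return 'defined at load time'\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import demo_loadonce            # first import: loads and executes
import demo_loadonce            # second import: only rebinds the name
load_count = sys.demo_load_count
print(load_count)
print(demo_loadonce.helper())
```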

12.5.3 Names Imported into Current Namespace

Calling from-import brings the name into the current namespace, meaning that you do not use the attribute/dotted notation to access the module identifier. For example, to access a variable named var in module module that was imported with:

from module import var

we would use “var” by itself. There is no need to reference the module since you imported var into your namespace. It is also possible to import all the names from the module into the current namespace using the following from-import statement:

from module import *

Core Style: Restrict your use of “from module import *”

In practice, using from module import * is considered poor style because it “pollutes” the current namespace and has the potential of overriding names in the current namespace; however, it is extremely convenient if a module has many variables that are often accessed, or if the module has a very long name.

We recommend using this form in only two situations. The first is where the target module has many attributes that would make it inconvenient to type in the module name over and over again. Two prime examples of this are the Tkinter (Python/Tk) and NumPy (Numeric Python) modules, and perhaps the socket module. The other place where it is acceptable to use from module import * is within the interactive interpreter, to save on the amount of typing.

12.5.4 Names Imported into Importer’s Scope

Another side effect of importing just names from modules is that those names are now part of the local namespace. One consequence is the possibility of hiding or overriding an existing object or built-in with the same name. Also, changes to the variable affect only the local copy and not the original in the imported module’s namespace. In other words, the binding is now local rather than across namespaces.

Here we present the code to two modules: an importer, impter.py, and an importee, imptee.py. Currently, impter.py uses the from-import statement, which creates only local bindings.

Upon running the importer, we discover that the importee’s view of its foo variable has not changed even though we modified it in the importer.

foo from imptee: abc
foo from impter: 123
foo from imptee: abc

The only solution is to use import and fully qualified identifier names using the attribute/dotted notation.

Once we make the update and change our references accordingly, we now have achieved the desired effect.

foo from imptee: abc
foo from impter: 123
foo from imptee: 123
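Both cases can be sketched in a single script. As an assumption made to keep the example self-contained, the importee is built in memory and registered in sys.modules instead of living in a separate imptee.py file, and show() returns its message rather than printing it.

```python
import sys
import types

# Build the importee in memory and register it so import can find it.
module = types.ModuleType('imptee')
exec("foo = 'abc'\n"
     "def show():\n"
     "    return 'foo from imptee: %s' % foo\n", module.__dict__)
sys.modules['imptee'] = module

# Case 1: from-import creates a separate, local binding.
from imptee import foo, show
foo = 123                          # rebinds only the importer's name
after_local_rebind = show()

# Case 2: import plus the qualified name reaches the module's own binding.
import imptee
imptee.foo = 123
after_qualified_rebind = show()

print(after_local_rebind)
print(after_qualified_rebind)
```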

12.5.5 Back to the __future__

Back in the days of Python 2.0, it was recognized that due to improvements, new features, and current feature enhancements, certain significant changes could not be implemented without affecting some existing functionality. To better prepare Python programmers for what was coming down the line, the __future__ directives were implemented.

By using the from-import statement and “importing” future functionality, users can get a taste of new features or feature changes enabling them to port their applications correctly by the time the feature becomes permanent. The syntax is:

from __future__ import new_feature

A plain import __future__ is allowed, but it does not do what you might want it to do, which is to enable all future features; it merely imports the module. You have to import specific features explicitly. You can read more about __future__ directives in PEP 236.

12.5.6 Warning Framework

Similar to the __future__ directive, it is also necessary to warn users when a feature is about to be changed or deprecated so that they can take action based on the notice received. There are multiple pieces to this feature, so we will break it down into components.

The first piece is the application programmer’s interface (API). Programmers have the ability to issue warnings from both Python programs (via the warnings module) as well as from C [via a call to PyErr_Warn()].

Another part of the framework is a new set of warning exception classes. Warning is subclassed directly from Exception and serves as the root of all warnings: UserWarning, DeprecationWarning, SyntaxWarning, and RuntimeWarning. These are described in further detail in Chapter 10.

The next component is the warnings filter. There are different warnings of different levels and severities, and somehow the number and type of warnings should be controllable. The warnings filter not only collects information about the warning, such as line number, cause of the warning, etc., but it also controls whether warnings are ignored, displayed—they can be custom-formatted—or turned into errors (generating an exception).

Warnings have a default output to sys.stderr, but there are hooks to be able to change that, for example, to log it instead of displaying it to the end-user while running Python scripts subject to issued warnings. There is also an API to manipulate warning filters.

Finally, there are the command-line arguments that control the warning filters. These come in the form of options to the Python interpreter upon startup via the -W option. See the Python documentation or PEP 230 for the specific switches for your version of Python. The warning framework first appeared in Python 2.1.
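The pieces described above can be sketched with the warnings module. The function old_api() is a hypothetical deprecated function; catch_warnings() and simplefilter() are the filter-manipulation calls used here, and the code targets a current Python 3 interpreter.

```python
import warnings

def old_api():
    # Hypothetical deprecated function used only for this demonstration.
    warnings.warn('old_api() is deprecated', DeprecationWarning, stacklevel=2)
    return 'result'

# Record warnings instead of letting them go to sys.stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')        # filter action: always report
    old_api()
recorded = [(w.category.__name__, str(w.message)) for w in caught]

# Turn the same warning into an exception with the 'error' action.
with warnings.catch_warnings():
    warnings.simplefilter('error', DeprecationWarning)
    try:
        old_api()
        raised = False
    except DeprecationWarning:
        raised = True

print(recorded, raised)
```

The 'error' action corresponds to what the -W option can request from the command line at interpreter startup.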

12.5.7 Importing Modules from ZIP Files

In version 2.3, the feature that allows the import of modules contained inside ZIP archives was added to Python. If you add a .zip file containing Python modules (.py, .pyc, or .pyo files) to your search path, i.e., PYTHONPATH or sys.path, the importer will search that archive for the module as if the ZIP file was a directory.

If a ZIP file contains just a .py for any imported module, Python will not attempt to modify the archive by adding the corresponding .pyc file. Consequently, if a ZIP archive does not contain matching .pyc files, imports should be expected to be slower than if they were present.

You are also allowed to add specific (sub)directories “under” a .zip file, i.e., /tmp/yolk.zip/lib/ would only import from the lib/ subdirectory within the yolk archive. Although this feature is specified in PEP 273, the actual implementation uses the import hooks provided by PEP 302.
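Importing from an archive can be sketched by building a small ZIP file on the fly. The archive name demo_bundle.zip and the module name demo_zipped are hypothetical; the code targets a current Python 3 interpreter.

```python
import importlib
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), 'demo_bundle.zip')
with zipfile.ZipFile(archive, 'w') as bundle:
    # Store one source module at the top level of the archive.
    bundle.writestr('demo_zipped.py', "def where():\n    return __file__\n")

sys.path.insert(0, archive)       # the archive acts like a directory
importlib.invalidate_caches()

import demo_zipped
location = demo_zipped.where()
print(location)                   # a path that runs through the .zip file
```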

12.5.8 “New” Import Hooks

The import of modules inside ZIP archives was “the first customer” of the new import hooks specified by PEP 302. Although we use the word “new,” that is relative considering that it has been difficult to create custom importers because the only way to accomplish this before was to use other modules that were either really old or didn’t simplify writing importers. Another solution is to override __import__(), but that is not an easy thing to do because you have to pretty much (re)implement the entire import mechanism.

The new import hooks, introduced in Python 2.3, simplify it down to writing callable import classes, and getting them “registered” (or rather, “installed”) with the Python interpreter via the sys module.

There are two classes that you need: a finder and a loader. An instance of these classes takes an argument—the full name of any module or package. A finder instance will look for your module, and if it finds it, return a loader object. The finder can also take a path for finding subpackages. The loader is what eventually brings the module into memory, doing whatever it needs to do to make a real Python module object, which is eventually returned by the loader.

These instances are added to sys.path_hooks. The sys.path_importer_cache just holds the instances so that path_hooks is traversed only once. Finally, sys.meta_path is a list of instances that should be traversed before looking at sys.path, for modules whose location you know and do not need to find. The meta-path already has the loader objects ready to execute for specific modules or packages.
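A finder and loader pair registered on sys.meta_path can be sketched as follows. Note the substitution: this sketch uses the find_spec()/exec_module() protocol of the importlib module in current Python 3, which superseded the PEP 302 method names, so it illustrates the same division of labor rather than the 2.3-era API itself. The class names and the module name demo_hooked are hypothetical.

```python
import importlib.abc
import importlib.util
import sys

class DemoLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None                       # use the default module object

    def exec_module(self, module):
        # "Load" the module by filling in its namespace by hand.
        module.greeting = 'hello from a custom loader'

class DemoFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == 'demo_hooked':     # hypothetical module name
            return importlib.util.spec_from_loader(fullname, DemoLoader())
        return None                       # let the other finders try

finder = DemoFinder()
sys.meta_path.insert(0, finder)           # consulted before sys.path
try:
    import demo_hooked
finally:
    sys.meta_path.remove(finder)          # leave the interpreter as we found it

print(demo_hooked.greeting)
```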

12.6 Module Built-in Functions

The importation of modules has some functional support from the system. We will look at those now.

12.6.1 __import__()

The __import__() function is new as of Python 1.5, and it is the function that actually does the importing, meaning that the import statement invokes the __import__() function to do its work. The purpose of making this a function is to allow for overriding it if the user is inclined to develop his or her own importation algorithm.

The syntax of __import__() is:

__import__(module_name[, globals[, locals[, fromlist]]])

The module_name variable is the name of the module to import, globals is the dictionary of current names in the global symbol table, locals is the dictionary of current names in the local symbol table, and fromlist is a list of symbols to import the way they would be imported using the from-import statement.

The globals, locals, and fromlist arguments are optional, and if not provided, default to globals(), locals(), and [], respectively.

Calling import sys can be accomplished with

sys = __import__('sys')
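A short sketch of calling __import__() directly, including the effect of the fromlist argument on a dotted name. The standard os.path module is used as the example.

```python
import os
import sys

sys_again = __import__('sys')
print(sys_again is sys)                     # same module object

# For a dotted name, the top-level package is returned ...
top = __import__('os.path')
# ... unless fromlist is non-empty, which yields the submodule itself.
leaf = __import__('os.path', globals(), locals(), ['join'])
print(top is os, leaf is os.path)
```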

12.6.2 globals() and locals()

The globals() and locals() built-in functions return dictionaries of the global and local namespaces, respectively, of the caller. From within a function, the local namespace represents all names defined for execution of that function, which is what locals() will return. globals(), of course, will return those names globally accessible to that function.

From the global namespace, however, globals() and locals() return the same dictionary because the global namespace is as local as you can get while executing there. Here is a little snippet of code that calls both functions from both namespaces:

We are going to ask for the dictionary keys only because the values are of no consequence here (plus they make the lines wrap even more in this text). Executing this script, we get the following output:
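A minimal sketch of such a script, with hypothetical variable names. The module-level output is not shown because it depends on how and where the script is run.

```python
def foo():
    a_string = 'bar'
    an_int = 42
    # Inside the function, locals() holds only the function's own names.
    return sorted(locals().keys())

local_names = foo()
print("foo()'s locals:", local_names)
print("module-level globals:", sorted(globals().keys()))
print("module-level locals:", sorted(locals().keys()))
```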

12.6.3 reload()

The reload() built-in function performs another import on a previously imported module. The syntax of reload() is:

reload(module)

module is the actual module you want to reload. There are some criteria for using the reload() function. The first is that the module must have been imported in full (not by using from-import), and it must have loaded successfully. The second rule follows from the first, and that is that the argument to reload() must be the module itself and not a string containing the module name, i.e., it must be something like reload(sys) instead of reload('sys').

Also, code in a module is executed when it is imported, but only once. A second import does not re-execute the code; it just binds the module name. Thus reload() makes sense, as it overrides this default behavior.
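The contrast between a repeated import and a reload can be sketched as follows. In current Python 3 the function lives in the importlib module as importlib.reload(), which is what this sketch calls in place of the built-in; the module name demo_reloadable is hypothetical, and bytecode writing is switched off temporarily so that the edited source is always the one read.

```python
import importlib
import os
import sys
import tempfile

previous_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True      # keep the demo free of stale .pyc files
module_dir = tempfile.mkdtemp()
source = os.path.join(module_dir, 'demo_reloadable.py')

with open(source, 'w') as f:
    f.write("version = 1\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import demo_reloadable
first = demo_reloadable.version

with open(source, 'w') as f:
    f.write("version = 2   # edited on disk\n")

import demo_reloadable              # no effect: already loaded
unchanged = demo_reloadable.version
importlib.reload(demo_reloadable)   # re-executes the module's code
reloaded = demo_reloadable.version

sys.dont_write_bytecode = previous_flag
print(first, unchanged, reloaded)
```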

12.7 Packages

A package is a hierarchical file directory structure that defines a single Python application environment that consists of modules and subpackages. Packages were added to Python 1.5 to aid with a variety of problems including:

• Adding hierarchical organization to a flat namespace

• Allowing developers to group related modules

• Allowing distributors to ship directories vs. a bunch of files

• Helping resolve conflicting module names

Along with classes and modules, packages use the familiar attribute/dotted attribute notation to access their elements. Importing modules within packages uses the standard import and from-import statements.

12.7.1 Directory Structure

For our package examples, we will assume the directory structure below:

Phone is a top-level package and Voicedta, etc., are subpackages. Import subpackages by using import like this:

import Phone.Mobile.Analog
Phone.Mobile.Analog.dial()

Alternatively, you can use from-import in a variety of ways:

The first way is importing just the top-level subpackage and referencing down the subpackage tree using the attribute/dotted notation:

from Phone import Mobile
Mobile.Analog.dial('555-1212')

Furthermore, we can go down one more subpackage for referencing:

from Phone.Mobile import Analog
Analog.dial('555-1212')

In fact, you can go all the way down in the subpackage tree structure:

from Phone.Mobile.Analog import dial
dial('555-1212')

In our above directory structure hierarchy, we observe a number of __init__.py files. These are initializer modules that are required when using from-import to import subpackages but they can be empty if not used. Quite often, developers forget to add __init__.py files to their package directories, so starting in Python 2.5, this triggers an ImportWarning message.

However, it is silently ignored unless the -Wd option is given when launching the interpreter.
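The import forms shown in this section can be tried against a throwaway copy of part of the package tree. Only the Phone and Mobile packages and the Analog module are created here, and the body of dial() is an assumption made for the demonstration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
mobile_dir = os.path.join(root, 'Phone', 'Mobile')
os.makedirs(mobile_dir)

# Empty __init__.py files mark Phone and Phone/Mobile as packages.
for package_dir in (os.path.join(root, 'Phone'), mobile_dir):
    open(os.path.join(package_dir, '__init__.py'), 'w').close()

with open(os.path.join(mobile_dir, 'Analog.py'), 'w') as f:
    # dial() is a stand-in; its body is an assumption.
    f.write("def dial(number='555-1212'):\n"
            "    return 'dialing ' + number\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import Phone.Mobile.Analog
from Phone import Mobile
from Phone.Mobile import Analog
from Phone.Mobile.Analog import dial

results = [Phone.Mobile.Analog.dial(), Mobile.Analog.dial('555-1212'),
           Analog.dial('555-1212'), dial('555-1212')]
print(results)
```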

12.7.2 Using from-import with Packages

Packages also support the from-import all statement:

from package.module import *

However, such a statement is dependent on the operating system’s filesystem for Python to determine which files to import. Thus the __all__ variable in __init__.py is required. If defined, this variable lists all the module names that should be imported when the above statement is invoked. It consists of a list of module names as strings.
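The effect of __all__ can be sketched for the package-level form, from package import *. The package demo_allpkg and its three modules are hypothetical, and the star import is run through exec() into a separate dictionary so that the imported names are easy to inspect.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_allpkg')
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write("__all__ = ['alpha', 'beta']\n")     # gamma is left out
for name in ('alpha', 'beta', 'gamma'):
    with open(os.path.join(pkg_dir, name + '.py'), 'w') as f:
        f.write("name = %r\n" % name)

sys.path.insert(0, root)
importlib.invalidate_caches()

namespace = {}
exec("from demo_allpkg import *", namespace)
imported = sorted(k for k in namespace if not k.startswith('__'))
print(imported)                                  # only the names in __all__
```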

12.7.3 Absolute Import

As the use of packages becomes more pervasive, there have been more cases of the import of sub-packages that end up clashing with (and hiding or shadowing) “real” or standard library modules (actually their names). Package modules will hide any equivalently-named standard library module because it will look inside the package first to perform a relative import, thus hiding access to the standard library module.

Because of this, all imports are now classified as absolute, meaning that names must be packages or modules accessible via the Python path (sys.path or PYTHONPATH).

The rationale behind this decision is that subpackages can still be accessed via sys.path, i.e., import Phone.Mobile.Analog. Prior to this change, it was legal to have just import Analog from modules inside the Mobile subpackage.

As a compromise, Python allows relative importing where programmers can indicate the location of a subpackage to be imported by using leader dots in front of the module or package name. For more information, please see Section 12.7.4.

The absolute import feature was slated to become the default in Python 2.7, but in practice it became the default only in Python 3.0. (This feature, absolute_import, can be imported from __future__ starting in version 2.5.) You can read more about absolute import in PEP 328.

12.7.4 Relative Import

As described previously, the absolute import feature takes away certain privileges of the module writer of packages. With this loss of freedom in import statements, something must be made available to proxy for that loss. This is where a relative import comes in. The relative import feature alters the import syntax slightly to let programmers tell the importer where to find a module in a subpackage. Because the import statements are always absolute, relative imports only apply to from-import statements.

The first part of the syntax is a leading dot, which indicates a relative import. Each additional dot represents one level above the current package from which to start looking for the modules being imported.

Let us look at our example above again. From within Phone.Mobile.Digital, i.e., the Digital.py module, we cannot simply use the old syntax anymore. Depending on the version of Python, the following will still work (older versions), generate a warning, or fail (more contemporary versions):

import Analog
from Analog import dial

This is due to the absolute import limitation. You have to use either absolute or relative imports, that is, either the fully qualified package path or the leading-dot syntax.
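A few valid forms can be exercised against a scratch copy of the package. This is a sketch under stated assumptions rather than the original listing: it targets Python 3, the function bodies are invented, and Legacy.py is a hypothetical module added only to show the old-style import failing.

```python
import importlib
import os
import sys
import tempfile

FILES = {
    "Phone/__init__.py": "",
    "Phone/common_util.py": "def setup():\n    return 'setup done'\n",
    "Phone/Mobile/__init__.py": "",
    "Phone/Mobile/Analog.py": "def dial():\n    return 'dialing'\n",
    "Phone/Mobile/Digital.py": (
        "from Phone.Mobile.Analog import dial as abs_dial  # absolute\n"
        "from .Analog import dial as rel_dial              # current package\n"
        "from ..common_util import setup                   # one level up\n"
    ),
    # Hypothetical module using the old implicit relative form.
    "Phone/Mobile/Legacy.py": "import Analog\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for relpath, source in FILES.items():
        path = os.path.join(tmp, relpath)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(source)
    # Forget any previously cached Phone package so the new files are used.
    for name in [n for n in sys.modules if n.split(".")[0] == "Phone"]:
        del sys.modules[name]
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        Digital = importlib.import_module("Phone.Mobile.Digital")
        try:
            importlib.import_module("Phone.Mobile.Legacy")
            legacy_error = None
        except ImportError as exc:
            legacy_error = exc
    finally:
        sys.path.remove(tmp)

print(Digital.abs_dial is Digital.rel_dial)  # True: both forms reach the same function
print(Digital.setup())                       # setup done
print(type(legacy_error).__name__)           # an ImportError subclass
```

The absolute and relative forms bind the very same function object. Only the bare import Analog fails, because Analog is not reachable from sys.path on its own.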

Relative imports can be used starting in Python 2.5. PEP 328 also called for a deprecation warning in Python 2.6 for all intra-package imports not using the relative import syntax. You can read more about relative import in the Python documentation and in PEP 328.

12.8 Other Features of Modules

12.8.1 Auto-Loaded Modules

When the Python interpreter starts up in standard mode, some modules are loaded by the interpreter for system use. The only one that affects you is the __builtin__ module, which normally gets loaded in as the __builtins__ module.

The sys.modules variable consists of a dictionary of modules that the interpreter has currently loaded (in full and successfully) into the interpreter. The module names are the keys, and the module objects themselves are the values; the printed form of each module object shows the location from which it was imported.

For example, in Windows, the sys.modules variable contains a large number of loaded modules, so we will shorten the list by requesting only the module names. This is accomplished by using the dictionary’s keys() method.
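A short sketch of inspecting sys.modules follows. The exact set of names depends on the platform and the interpreter version, so no particular output is shown for the list of names.

```python
import os
import sys

# The keys are the names of the modules loaded so far.
loaded_names = sorted(sys.modules.keys())
print(loaded_names[:10])

# The values are the module objects themselves.
print(sys.modules["os"] is os)

# A module loaded from a file records its location in __file__;
# built-in modules such as sys have no such attribute.
print(getattr(sys.modules["os"], "__file__", "(built-in)"))
print(getattr(sys.modules["sys"], "__file__", "(built-in)"))
```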

The loaded modules on Unix are quite similar.

12.8.2 Preventing Attribute Import

If you do not want module attributes imported when a module is imported with “from module import *”, prepend an underscore ( _ ) to the names of those attributes. This minimal level of data hiding does not apply if the entire module is imported or if you explicitly import a “hidden” attribute, e.g., from foo import _bar.
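The behavior is easy to check with a two-line module. In this sketch, foo.py and its attributes bar and _bar are hypothetical and are written to a temporary directory; the code targets Python 3.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "foo.py"), "w") as f:
        f.write("bar = 'public'\n_bar = 'hidden'\n")
    sys.modules.pop("foo", None)  # ignore any module named foo loaded earlier
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        star_ns = {}
        exec("from foo import *", star_ns)         # skips underscore names
        explicit_ns = {}
        exec("from foo import _bar", explicit_ns)  # explicit import still works
        import foo                                 # whole-module import
    finally:
        sys.path.remove(tmp)

print("bar" in star_ns, "_bar" in star_ns)  # True False
print(explicit_ns["_bar"])                  # hidden
print(foo._bar)                             # hidden
```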

12.8.3 Case-Insensitive Import

There are various operating systems with case-insensitive file systems. Prior to version 2.1, Python attempted to “do the right thing” when importing modules on the various supported platforms, but with the growing popularity of the MacOS X and Cygwin platforms, certain deficiencies could no longer be ignored, and support needed to be cleaned up.

The world was pretty clean-cut when it was just Unix (case-sensitive) and Win32 (case-insensitive), but these new case-insensitive systems coming online were not ported with the case-insensitive features. PEP 235, which specifies this feature, attempts to address this weakness as well as taking away some “hacks” that had existed for other systems to make importing modules more consistent.

The bottom line is that for case-insensitive imports to work properly, an environment variable named PYTHONCASEOK must be defined. Python will then import the first module name that is found (in a case-insensitive manner) that matches. Otherwise Python will perform its native case-sensitive module name matching and import the first matching one it finds.

12.8.4 Source Code Encoding

Starting in Python 2.3, it is possible to create your Python module file in a native encoding other than 7-bit ASCII. ASCII is still the default, but an encoding directive at the top of a module enables the importer to parse the module using the specified encoding and to interpret natively encoded Unicode strings correctly. You therefore do not have to edit your source files in a plain ASCII text editor and individually “Unicode-tag” each string literal.

An example directive specifying a UTF-8 file can be declared like this:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

If you execute or import modules that contain non-ASCII Unicode string literals and do not have an encoding directive at the top, this will result in a DeprecationWarning in Python 2.3 and a syntax error starting in 2.5. You can read more about source code encoding in PEP 263.
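The role of the directive can be demonstrated by writing the same non-ASCII source bytes with and without it. The sketch below assumes Python 3, where the default source encoding is UTF-8 rather than ASCII, so it uses a Latin-1 byte that is not valid UTF-8 on its own. The module names with_directive and without_directive are made up for the example.

```python
import importlib
import os
import sys
import tempfile

# Byte 0xE9 is the letter e with an acute accent in Latin-1,
# but it is not valid UTF-8 by itself.
BODY = b"greeting = u'caf\xe9'\n"
DIRECTIVE = b"# -*- coding: latin-1 -*-\n"


def try_import(name, source_bytes, root):
    """Write raw source bytes to root/name.py and import the result."""
    with open(os.path.join(root, name + ".py"), "wb") as f:
        f.write(source_bytes)
    importlib.invalidate_caches()
    sys.modules.pop(name, None)
    return importlib.import_module(name)


with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    try:
        declared = try_import("with_directive", DIRECTIVE + BODY, tmp)
        try:
            try_import("without_directive", BODY, tmp)
            undeclared_error = None
        except SyntaxError as exc:
            undeclared_error = exc
    finally:
        sys.path.remove(tmp)

print(len(declared.greeting))           # 4: the byte was decoded as one character
print(type(undeclared_error).__name__)  # SyntaxError
```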

12.8.5 Import Cycles

Working with Python in real-life situations, you discover that it is possible to have import loops. If you have ever worked on any large Python project, you are likely to have run into this situation.

Let us take a look at an example. Assume we have a very large product with a very complex command-line interface (CLI). There are a million commands for your product, and as a result, you have an overly massive handler (OMH) set. Every time a new feature is added, from one to three new commands must be added to support the new feature. All of this lives in our omh4cli.py script.

You can pretend that the (empty) utility function is a very popular piece of code that most handlers must use. The overly massive handlers for the command-line interface are all in the omh4cli() function. If we have to add a new command, it would be called from here.

Now, as this module grows in a boundless fashion, certain smarter engineers decide to split off their new commands into a separate module and just provide hooks in the original module to access the new stuff. Therefore, the code is easier to maintain, and if bugs were found in the new stuff, one would not have to search through a one-megabyte-plus-sized Python file.

In our case, we have an excited product manager asking us to add a “very outstanding feature” (VOF). Instead of integrating our stuff into omh4cli.py, we create a new script, cli4vof.py.

As mentioned before, the utility function is a must for every command, and because we do not want to cut and paste its code from the main handler, we import the main module and call it that way. To finish off our integration, we add a call to our handler into the main overly massive handler, omh4cli().

The problem occurs when the main handler omh4cli imports our new little module cli4vof (to get the new command function) because cli4vof imports omh4cli (to get the utility function). Our module import fails because Python is asked to import a name from a module whose own import has not yet finished.

Notice the circular import of cli4vof in the traceback. The problem is that in order to call the utility function, cli4vof has to import omh4cli. If it did not have to do that, then omh4cli would have completed its import of cli4vof successfully and there would be no problem. The issue is that when omh4cli is attempting to import cli4vof, cli4vof is trying to import omh4cli. No one finishes an import, hence the error. This is just one example of an import cycle. There are much more complicated ones out in the real world.

The workaround for this problem is almost always to move one of the import statements, e.g., the offending one. You will commonly see import statements at the bottom of modules. As a beginning Python programmer, you are used to seeing them at the beginning, but if you ever run across import statements at the end of modules, you will now know why. In our case, we cannot move the import of omh4cli to the end, because if cli4vof() is called, it will not have the omh4cli name loaded yet.

Instead, our solution here is to move the import statement into the body of the cli4vof() function.

This way, the import of the cli4vof module from omh4cli completes successfully, and on the tail end, calling the utility function is successful because the omh4cli name is imported before it is called. As far as execution goes, the only difference is that from cli4vof, the import of omh4cli is performed when cli4vof.cli4vof() is called and not when the cli4vof module is imported.
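The three placements of the import discussed in this section can be compared side by side. The following sketch is a reconstruction from the description, not the original listings: the function bodies are assumptions, the code targets Python 3, and each variant is written to a temporary directory and run as a script in a child interpreter. Running omh4cli.py as a script matters here, because the file is then loaded a second time under the module name omh4cli when cli4vof imports it.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical reconstruction of the main script described in the text.
OMH4CLI = '''\
from cli4vof import cli4vof

def cli_util():
    pass                      # the (empty) utility function

def omh4cli():
    cli4vof()                 # hook for the new command

omh4cli()
'''

# Three placements of the omh4cli import inside cli4vof.py.
CLI4VOF = {
    "top": '''\
import omh4cli

def cli4vof():
    omh4cli.cli_util()
''',
    "bottom": '''\
def cli4vof():
    omh4cli.cli_util()

import omh4cli
''',
    "in function": '''\
def cli4vof():
    import omh4cli
    omh4cli.cli_util()
''',
}


def run_variant(cli4vof_source):
    """Write both modules to a scratch directory and run omh4cli.py as a script."""
    with tempfile.TemporaryDirectory() as tmp:
        for filename, source in (("omh4cli.py", OMH4CLI),
                                 ("cli4vof.py", cli4vof_source)):
            with open(os.path.join(tmp, filename), "w") as f:
                f.write(source)
        return subprocess.run([sys.executable, "omh4cli.py"], cwd=tmp,
                              capture_output=True, text=True)


results = {}
for placement, source in CLI4VOF.items():
    proc = run_variant(source)
    results[placement] = (proc.returncode, proc.stderr)
    # Summarize each run by the last line of its traceback, if any.
    summary = "ok" if proc.returncode == 0 else proc.stderr.strip().splitlines()[-1]
    print(placement, "->", summary)
```

With the import at the top, the run ends in an ImportError. With the import at the bottom, it ends in a NameError because cli4vof() is called before the omh4cli name is bound. With the import inside the function, the script completes normally.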

12.8.6 Module Execution

There are many ways to execute a Python module: script invocation via the command line or shell, execfile(), module import, the interpreter's -m option, etc. These are out of the scope of this chapter. We refer you to Chapter 14, “Execution Environment,” which covers all of these features in full detail.

12.9 Related Modules

The following are auxiliary modules that you may use when dealing with the import of Python modules. Of these listed below, modulefinder, pkgutil, and zipimport are new as of Python 2.3, and the distutils package was introduced back in version 2.0.

imp— this module gives you access to some lower-level importer functionality.

modulefinder— this is a module that lets you find all the modules that are used by a Python script. You can either use the ModuleFinder class or just run it as a script, giving it the filename of a(nother) Python module to analyze.

pkgutil— this module gives those putting together Python packages for distribution a way to place package files in various places yet maintain the abstraction of a single “package” file hierarchy. It uses *.pkg files in a manner similar to the way the site module uses *.pth files to help define the package path.

site— using this module along with *.pth files gives you the ability to specify the order in which packages are added to your Python path, i.e., sys.path, PYTHONPATH. You do not have to import it explicitly as the importer already uses it by default—you need to use the -S switch when starting up Python to turn it off. Also, you can perform further arbitrary site-specific customizations by adding a sitecustomize module whose import is attempted after the path manipulations have been completed.

zipimport— this module allows you to import Python modules that are archived in ZIP files. Note that the functionality in this module is “automagically” called by the importer, so there is no need to import it for use in any application. We mention it here solely as a reference.

distutils— this package provides support for building, installing, and distributing Python modules and packages. It also aids in building Python extensions written in C/C++. More information on distutils can be found in the Python documentation available at these links:

http://docs.python.org/dist/dist.html

http://docs.python.org/inst/inst.html
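As an illustration of the zipimport entry above, the following sketch places a ZIP archive on sys.path and imports a module from it without ever naming zipimport. The archive bundle.zip and the module zipped_mod are hypothetical, and the code targets Python 3.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "bundle.zip")
    # Store a one-line module inside the archive.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_mod.py", "answer = 42\n")
    # Putting the archive itself on sys.path is all the importer needs.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        zipped_mod = importlib.import_module("zipped_mod")
    finally:
        sys.path.remove(archive)

print(zipped_mod.answer)    # 42
print(zipped_mod.__file__)  # a path that runs through bundle.zip
```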

12.10 Exercises

12-1. PathSearch versus SearchPath. What is the difference between a path search and a search path?

12-2. Importing Attributes. Assume you have a function called foo() in your module mymodule.

(a) What are the two ways of importing this function into your namespace for invocation?

(b) What are the namespace implications when choosing one over the other?

12-3. Importing. What are the differences between using “import module” and “from module import *”?

12-4. Namespaces versus Variable Scope. How are namespaces and variable scopes different from each other?

12-5. Using __import__().

(a) Use __import__() to import a module into your namespace. What is the correct syntax you finally used to get it working?

(b) Same as above, but use __import__() to import only specific names from modules.

12-6. Extended Import. Create a new function called importAs(). This function will import a module into your namespace, but with a name you specify, not its original name. For example, calling newname=importAs('mymodule') will import the module mymodule, but the module and all its elements are accessible only as newname or newname.attr. This is the exact functionality provided by the new extended import syntax introduced in Python 2.0.

12-7. Import Hooks. Study the import hooks mechanism provided for by the implementation of PEP 302. Implement your own import mechanism, which allows you to obfuscate your Python modules (encryption, bzip2, rot13, etc.) so that the interpreter can decode and import them properly. You may wish to look at how it works with importing zip files (see Section 12.5.7).
