Chapter 2. System Tools

“The os.path to Knowledge”

This chapter begins our in-depth look at ways to apply Python to real programming tasks. In this and the following chapters, you’ll see how to use Python to write system tools, GUIs, database applications, Internet scripts, websites, and more. Along the way, we’ll also study larger Python programming concepts in action: code reuse, maintainability, object-oriented programming (OOP), and so on.

In this first part of the book, we begin our Python programming tour by exploring the systems application domain—scripts that deal with files, programs, and the general environment surrounding a program. Although the examples in this domain focus on particular kinds of tasks, the techniques they employ will prove to be useful in later parts of the book as well. In other words, you should begin your journey here, unless you are already a Python systems programming wizard.

Why Python Here?

Python’s system interfaces span application domains, but for the next five chapters, most of our examples fall into the category of system tools—programs sometimes called command-line utilities, shell scripts, system administration, systems programming, and other permutations of such words. Regardless of their title, you are probably already familiar with this sort of script; these scripts accomplish such tasks as processing files in a directory, launching test programs, and so on. Such programs historically have been written in nonportable and syntactically obscure shell languages such as DOS batch files, csh, and awk.

Even in this relatively simple domain, though, some of Python’s better attributes shine brightly. For instance, Python’s ease of use and extensive built-in library make it simple (and even fun) to use advanced system tools such as threads, signals, forks, sockets, and their kin; such tools are much less accessible under the obscure syntax of shell languages and the slow development cycles of compiled languages. Python’s support for concepts like code clarity and OOP also helps us write shell tools that can be read, maintained, and reused. When using Python, there is no need to start every new script from scratch.

Moreover, we’ll find that Python not only includes all the interfaces we need in order to write system tools, but it also fosters script portability. By employing Python’s standard library, most system scripts written in Python are automatically portable to all major platforms. For instance, you can usually run in Linux a Python directory-processing script written in Windows without changing its source code at all—simply copy over the source code. Though writing scripts that achieve such portability utopia requires some extra effort and practice, if used well, Python could be the only system scripting tool you need to use.

The Next Five Chapters

To make this part of the book easier to study, I have broken it down into five chapters:

In this chapter, I’ll introduce the main system-related modules in overview fashion. We’ll meet some of the most commonly used system tools here for the first time.
In Chapter 3, we continue exploring the basic system interfaces by studying their role in core system programming concepts: streams, command-line arguments, environment variables, and so on.
Chapter 4 focuses on the tools Python provides for processing files, directories, and directory trees.
In Chapter 5, we’ll move on to cover Python’s standard tools for parallel processing—processes, threads, queues, pipes, signals, and more.
Chapter 6 wraps up by presenting a collection of complete system-oriented programs. The examples here are larger and more realistic, and they use the tools introduced in the prior four chapters to perform real, practical tasks. This collection includes both general system scripts, as well as scripts for processing directories of files.

Especially in the examples chapter at the end of this part, we will be concerned as much with system interfaces as with general Python development concepts. We’ll see non-object-oriented and object-oriented versions of some examples along the way, for instance, to help illustrate the benefits of thinking in more strategic ways.

This chapter, and those that follow, deal with both the Python language and its standard library—a collection of precoded modules written in Python and C that are automatically installed with the Python interpreter. Although Python itself provides an easy-to-use scripting language, much of the real action in Python development involves this vast library of programming tools (a few hundred modules at last count) that ship with the Python package.

In fact, the standard library is so powerful that it is not uncommon to hear Python described as batteries included—a phrase generally credited to Frank Stajano meaning that most of what you need for real day-to-day work is already there for importing. Python’s standard library, while not part of the core language per se, is a standard part of the Python system and you can expect it to be available wherever your scripts run. Indeed, this is a noteworthy difference between Python and some other scripting languages—because Python comes with so many library tools “out of the box,” supplemental sites like Perl’s CPAN are not as important.

As we’ll see, the standard library forms much of the challenge in Python programming. Once you’ve mastered the core language, you’ll find that you’ll spend most of your time applying the built-in functions and modules that come with the system. On the other hand, libraries are where most of the fun happens. In practice, programs become most interesting when they start using services external to the language interpreter: networks, files, GUIs, XML, databases, and so on. All of these are supported in the Python standard library.

Beyond the standard library, there is an additional collection of third-party packages for Python that must be fetched and installed separately. As of this writing, you can find most of these third-party extensions via general web searches, and using the links at http://www.python.org and at the PyPI website (accessible from http://www.python.org). Some third-party extensions are large systems in their own right; NumPy, Django, and VPython, for instance, add vector processing, website construction, and visualization, respectively.

If you have to do something special with Python, chances are good that either its support is part of the standard Python install package or you can find a free and open source module that will help. Most of the tools we’ll employ in this text are a standard part of Python, but I’ll be careful to point out things that must be installed separately. Of course, Python’s extreme code reuse idiom also makes your programs dependent on the code you reuse; in practice, though, and as we’ll see repeatedly in this book, powerful libraries coupled with open source access speed development without locking you into an existing set of features or limitations.

System Scripting Overview

To begin our exploration of the systems domain, we will take a quick tour through the standard library sys and os modules in this chapter, before moving on to larger system programming concepts. As you can tell from the length of their attribute lists, both of these are large modules—the following reflects Python 3.1 running on Windows 7 outside IDLE:

C:\...\PP4E\System> python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (...)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> len(dir(sys))          # 65 attributes
65
>>> len(dir(os))           # 122 on Windows, more on Unix
122
>>> len(dir(os.path))      # a nested module within os
52

The content of these two modules may vary per Python version and platform. For example, os is much larger under Cygwin after building Python 3.1 from its source code there (Cygwin is a system that provides Unix-like functionality on Windows; it is discussed further in More on Cygwin Python for Windows):

$ ./python.exe
Python 3.1.1 (r311:74480, Feb 20 2010, 10:16:52)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> len(dir(sys))
64
>>> len(dir(os))
217
>>> len(dir(os.path))
51
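Because attribute lists differ per platform, a script can probe for a name before relying on it. The following is only a minimal sketch: the helper name available is hypothetical, and the three names probed are illustrations rather than a complete inventory of platform-specific tools.

```python
import os

def available(module, names):
    """Map each name to True if the module exports it on this platform."""
    return {name: hasattr(module, name) for name in names}

# os.getcwd exists everywhere; os.fork is Unix-only; os.startfile is Windows-only
report = available(os, ['getcwd', 'fork', 'startfile'])
for name, present in sorted(report.items()):
    print(name, '=>', present)
```

On any one machine, at least one of the last two names should come back False, which is the same variation the differing dir lengths above reflect.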

As I’m not going to demonstrate every item in every built-in module, the first thing I want to do is show you how to get more details on your own. Officially, this task also serves as an excuse for introducing a few core system scripting concepts; along the way, we’ll code a first script to format documentation.

Python System Modules

Most system-level interfaces in Python are shipped in just two modules: sys and os. That’s somewhat oversimplified; other standard modules belong to this domain too. Among them are the following:

glob: For filename expansion
socket: For network connections and Inter-Process Communication (IPC)
threading, _thread, queue: For running and synchronizing concurrent threads
time, timeit: For accessing system time details
subprocess, multiprocessing: For launching and controlling parallel processes
signal, select, shutil, tempfile, and others: For various other system-related tasks

Third-party extensions such as pySerial (a serial port interface), Pexpect (an Expect work-alike for controlling cross-program dialogs), and even Twisted (a networking framework) can arguably be lumped into the systems domain as well. In addition, some built-in functions are actually system interfaces too; the open function, for example, interfaces with the file system. But by and large, sys and os together form the core of Python’s built-in system tools arsenal.

In principle at least, sys exports components related to the Python interpreter itself (e.g., the module search path), and os contains variables and functions that map to the operating system on which Python is run. In practice, this distinction may not always seem clear-cut (e.g., the standard input and output streams show up in sys, but they are arguably tied to operating system paradigms). The good news is that you’ll soon use the tools in these modules so often that their locations will be permanently stamped on your memory.^[3]

The os module also attempts to provide a portable programming interface to the underlying operating system; its functions may be implemented differently on different platforms, but to Python scripts, they look the same everywhere. And if that’s still not enough, the os module also exports a nested submodule, os.path, which provides a portable interface to file and directory processing tools.
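As a quick illustration of that portability, here is a small sketch that builds and splits a pathname without hardcoding a separator character. The directory and file names are just placeholders borrowed from this chapter’s examples; nothing here requires the file to exist.

```python
import os

# build a path with the separator of the platform running this code
path = os.path.join('PP4E', 'System', 'more.py')
print(path)

# take it apart again without hardcoding '/' or '\\'
dirname, filename = os.path.split(path)
base, ext = os.path.splitext(filename)
print(dirname, filename, base, ext)
```

The first line printed uses backslashes on Windows and forward slashes on Unix-like systems, but the script itself is the same on both.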

Module Documentation Sources

As you can probably deduce from the preceding paragraphs, learning to write system scripts in Python is mostly a matter of learning about Python’s system modules. Luckily, there are a variety of information sources to make this task easier—from module attributes to published references and books.

For instance, if you want to know everything that a built-in module exports, you can read its library manual entry; study its source code (Python is open source software, after all); or fetch its attribute list and documentation string interactively. Let’s import sys in Python 3.1 and see what it has to offer:

C:\...\PP4E\System> python
>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__',
'__stderr__', '__stdin__', '__stdout__', '_clear_type_cache', '_current_frames',
'_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder',
'call_tracing', 'callstats', 'copyright', 'displayhook', 'dllhandle',
'dont_write_bytecode', 'exc_info', 'excepthook', 'exec_prefix', 'executable',
'exit', 'flags', 'float_info', 'float_repr_style', 'getcheckinterval',
'getdefaultencoding', 'getfilesystemencoding', 'getprofile', 'getrecursionlimit',
'getrefcount', 'getsizeof', 'gettrace', 'getwindowsversion', 'hexversion',
'int_info', 'intern', 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path',
'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2',
'setcheckinterval', 'setfilesystemencoding', 'setprofile', 'setrecursionlimit',
'settrace', 'stderr', 'stdin', 'stdout', 'subversion', 'version', 'version_info',
'warnoptions', 'winver']

The dir function simply returns a list containing the string names of all the attributes in any object with attributes; it’s a handy memory jogger for modules at the interactive prompt. For example, we know there is something called sys.version, because the name version came back in the dir result. If that’s not enough, we can always consult the __doc__ string of built-in modules:
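Since dir returns an ordinary list of strings, it can be filtered and processed like any other list. As a minimal sketch (the variable name public is my own, and treating a leading underscore as “private” is only a naming convention), the following trims the result to names meant for general use:

```python
import sys

# keep only the "public" names: those without a leading underscore
public = [name for name in dir(sys) if not name.startswith('_')]
print(len(public), 'public names, for example:', public[:5])

# a name reported by dir can be fetched by string with getattr
print(getattr(sys, 'version'))
```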

>>> sys.__doc__
"This module provides access to some objects used or maintained by the\ninterpre
ter and to functions that interact strongly with the interpreter.\n\nDynamic obj
ects:\n\nargv -- command line arguments; argv[0] is the script pathname if known
\npath -- module search path; path[0] is the script directory, else ''\nmodules
-- dictionary of loaded modules\n\ndisplayhook -- called to show results in an i
...lots of text deleted here..."

Paging Documentation Strings

The __doc__ built-in attribute just shown usually contains a string of documentation, but it may look a bit weird when displayed this way: it’s one long string with embedded end-line characters that print as \n, not as a nice list of lines. To format these strings for a more humane display, you can simply pass them to the print built-in function:

>>> print(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules

...lots of lines deleted here...

The print built-in function, unlike interactive displays, interprets end-line characters correctly. Unfortunately, print doesn’t, by itself, do anything about scrolling or paging and so can still be unwieldy on some platforms. Tools such as the built-in help function can do better:

>>> help(sys)
Help on built-in module sys:

NAME
    sys

FILE
    (built-in)

MODULE DOCS
    http://docs.python.org/library/sys

DESCRIPTION
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.

    Dynamic objects:

    argv -- command line arguments; argv[0] is the script pathname if known
    path -- module search path; path[0] is the script directory, else ''
    modules -- dictionary of loaded modules

...lots of lines deleted here...

The help function is one interface provided by the PyDoc system—standard library code that ships with Python and renders documentation (documentation strings, as well as structural details) related to an object in a formatted way. The format is either like a Unix manpage, which we get for help, or an HTML page, which is more grandiose. It’s a handy way to get basic information when working interactively, and it’s a last resort before falling back on manuals and books.
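PyDoc can also be used as a library when you want its text as a string instead of a paged display. The sketch below assumes the render_doc and plain functions of the standard pydoc module; they are less prominently documented than help itself, so treat this as an illustration rather than a recommended interface.

```python
import pydoc, sys

# render_doc builds the same text help() shows; plain() strips the
# overstrike characters PyDoc uses for bold text on terminals
text = pydoc.plain(pydoc.render_doc(sys))
lines = text.splitlines()
print(len(lines), 'lines of help text')
for line in lines[:5]:
    print(line)
```

Once the help text is an ordinary string, it can be searched, saved to a file, or handed to a pager of your own.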

A Custom Paging Script

The help function we just met is also fairly fixed in the way it displays information; although it attempts to page the display in some contexts, its page size isn’t quite right on some of the machines I use. Moreover, it doesn’t page at all in the IDLE GUI, instead relying on manual use of the scrollbar—potentially painful for large displays. When I want more control over the way help text is printed, I usually use a utility script of my own, like the one in Example 2-1.

Example 2-1. PP4E\System\more.py

"""
split and interactively page a string or file of text
"""

def more(text, numlines=15):
    lines = text.splitlines()                # like split('\n') but no '' at end
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print(line)
        if lines and input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                               # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline

The meat of this file is its more function, and if you know enough Python to be qualified to read this book, it should be fairly straightforward. It simply splits up a string around end-line characters, and then slices off and displays a few lines at a time (15 by default) to avoid scrolling off the screen. A slice expression, lines[:15], gets the first 15 items in a list, and lines[15:] gets the rest; to show a different number of lines each time, pass a number to the numlines argument (e.g., the last line in Example 2-1 passes 10 to the numlines argument of the more function).
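The slicing logic can be seen in isolation by pulling it out of the paging loop. This is only a sketch of the same idea, not code from Example 2-1; the generator name chunks is hypothetical.

```python
def chunks(items, size):
    """Yield successive slices of at most size items (hypothetical helper)."""
    while items:
        yield items[:size]          # first size items (fewer at the end)
        items = items[size:]        # the rest; an empty list ends the loop

lines = ['line%d' % i for i in range(7)]
for chunk in chunks(lines, 3):
    print(chunk)
```

With seven items and a size of 3, the loop produces two full chunks and a final chunk of one item, because slices past the end of a list are simply shorter rather than errors.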

The splitlines string object method call that this script employs returns a list of substrings split at line ends (e.g., ["line", "line",...]). The alternative split method does similar work when passed '\n', but retains an empty string at the end of the result if the last line is \n terminated:

>>> line = 'aaa\nbbb\nccc\n'

>>> line.split('\n')
['aaa', 'bbb', 'ccc', '']

>>> line.splitlines()
['aaa', 'bbb', 'ccc']

As we’ll see more formally in Chapter 4, the end-of-line character is normally \n (which stands for a byte usually having a binary value of 10) within a Python script, no matter what platform it is run upon. (If you don’t already know why this matters, DOS \r characters in text are dropped by default when read.)
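To see why the line-end distinction matters, consider text whose \r characters were not dropped. The sketch below assumes such a string was obtained some other way (for instance, from data read in binary mode and decoded); it is coded as a literal here to keep the example self-contained.

```python
text = 'aaa\r\nbbb\r\nccc\r\n'        # DOS-style line ends, as raw text

print(text.split('\n'))             # leaves '\r' on each item, '' at end
print(text.splitlines())            # recognizes '\r\n' as one line end
```

The split call yields ['aaa\r', 'bbb\r', 'ccc\r', ''], while splitlines yields the cleaner ['aaa', 'bbb', 'ccc'].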

String Method Basics

Now, Example 2-1 is a simple Python program, but it already brings up three important topics that merit quick detours here: it uses string methods, reads from a file, and is set up to be run or imported. Python string methods are not a system-related tool per se, but they see action in most Python programs. In fact, they are going to show up throughout this chapter as well as those that follow, so here is a quick review of some of the more useful tools in this set. String methods include calls for searching and replacing:

>>> mystr = 'xxxSPAMxxx'
>>> mystr.find('SPAM')                           # return first offset
3
>>> mystr = 'xxaaxxaa'
>>> mystr.replace('aa', 'SPAM')                  # global replacement
'xxSPAMxxSPAM'

The find call returns the offset of the first occurrence of a substring, and replace does global search and replacement. Like all string operations, replace returns a new string instead of changing its subject in-place (recall that strings are immutable). With these methods, substrings are just strings; in Chapter 19, we’ll also meet a module called re that allows regular expression patterns to show up in searches and replacements.

In more recent Pythons, the in membership operator can often be used as an alternative to find if all we need is a yes/no answer (it tests for a substring’s presence). There are also a handful of methods for removing whitespace on the ends of strings—especially useful for lines of text read from a file:

>>> mystr = 'xxxSPAMxxx'
>>> 'SPAM' in mystr                              # substring search/test
True
>>> 'Ni' in mystr                                # when not found
False
>>> mystr.find('Ni')
-1

>>> mystr = '\t  Ni\n'
>>> mystr.strip()                                # remove whitespace
'Ni'
>>> mystr.rstrip()                               # same, but just on right side
'\t  Ni'

String methods also provide functions that are useful for things such as case conversions, and a standard library module named string defines some useful preset variables, among other things:

>>> mystr = 'SHRUBBERY'
>>> mystr.lower()                           # case converters
'shrubbery'

>>> mystr.isalpha()                         # content tests
True
>>> mystr.isdigit()
False

>>> import string                           # case presets: for 'in', etc.
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

>>> string.whitespace                       # whitespace characters
' \t\n\r\x0b\x0c'

There are also methods for splitting up strings around a substring delimiter and putting them back together with a substring in between. We’ll explore these tools later in this book, but as an introduction, here they are at work:

>>> mystr = 'aaa,bbb,ccc'
>>> mystr.split(',')                        # split into substrings list
['aaa', 'bbb', 'ccc']

>>> mystr = 'a  b\nc\nd'
>>> mystr.split()                           # default delimiter: whitespace
['a', 'b', 'c', 'd']

>>> delim = 'NI'
>>> delim.join(['aaa', 'bbb', 'ccc'])       # join substrings list
'aaaNIbbbNIccc'

>>> ' '.join(['A', 'dead', 'parrot'])       # add a space between
'A dead parrot'

>>> chars = list('Lorreta')                 # convert to characters list
>>> chars
['L', 'o', 'r', 'r', 'e', 't', 'a']
>>> chars.append('!')
>>> ''.join(chars)                          # to string: empty delimiter
'Lorreta!'

These calls turn out to be surprisingly powerful. For example, a line of data columns separated by tabs can be parsed into its columns with a single split call; the more.py script uses the splitlines variant shown earlier to split a string into a list of line strings. In fact, we can emulate the replace call we saw earlier in this section with a split/join combination:

>>> mystr = 'xxaaxxaa'
>>> 'SPAM'.join(mystr.split('aa'))          # str.replace, the hard way!
'xxSPAMxxSPAM'
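The tab-separated case mentioned above can be sketched in a few lines. The record here is made up for illustration, as is its name, age, and job column layout.

```python
line = 'bob\t42\tdev\n'                      # made-up record: name, age, job

fields = line.rstrip('\n').split('\t')       # drop line end, split on tabs
print(fields)

name, age, job = fields
print('\t'.join([name, str(int(age) + 1), job]))   # rebuild with a new age
```

Note that the columns come back as strings; the age must be converted with int before arithmetic and back with str before joining.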

For future reference, also keep in mind that Python doesn’t automatically convert strings to numbers, or vice versa; if you want to use one as you would use the other, you must say so with manual conversions:

>>> int("42"), eval("42")                   # string to int conversions
(42, 42)

>>> str(42), repr(42)                       # int to string conversions
('42', '42')

>>> ("%d" % 42), '{:d}'.format(42)          # via formatting expression, method
('42', '42')

>>> "42" + str(1), int("42") + 1            # concatenation, addition
('421', 43)

In the last command here, the first expression triggers string concatenation (since both sides are strings), and the second invokes integer addition (because both objects are numbers). Python doesn’t assume you meant one or the other and convert automatically; as a rule of thumb, Python tries to avoid magic—and the temptation to guess—whenever possible. String tools will be covered in more detail later in this book (in fact, they get a full chapter in Part V), but be sure to also see the library manual for additional string method tools.
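Manual conversions can also fail: int raises a ValueError when its string does not look like an integer. In scripts that process text typed by users or read from files, one possible approach is to wrap the conversion, as in the following sketch; the to_int name and its default argument are hypothetical, not a standard tool.

```python
def to_int(text, default=None):
    """Convert text to an int, or return default if it is not a number."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int('42'), to_int(' 42\n'), to_int('spam'), to_int('4.2', 0))
```

This prints 42 42 None 0: int tolerates surrounding whitespace, but rejects both non-numeric text and floating-point syntax.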

Other String Concepts in Python 3.X: Unicode and bytes

Technically speaking, the Python 3.X string story is a bit richer than I’ve implied here. What I’ve shown so far is the str object type: a sequence of characters (technically, Unicode “code points” represented as Unicode “code units”) which represents both ASCII and wider Unicode text, and handles encoding and decoding both manually on request and automatically on file transfers. Strings are coded in quotes (e.g., 'abc'), along with various syntax for coding non-ASCII text (e.g., '\xc4\xe8', '\u00c4\u00e8').

Really, though, 3.X has two additional string types that support most str string operations: bytes, a sequence of short integers for representing 8-bit binary data, and bytearray, a mutable variant of bytes. You generally know you are dealing with bytes if strings display or are coded with a leading “b” character before the opening quote (e.g., b'abc', b'\xc4\xe8'). As we’ll see in Chapter 4, files in 3.X follow a similar dichotomy, using str in text mode (which also handles Unicode encodings and line-end conversions) and bytes in binary mode (which transfers bytes to and from files unchanged). And in Chapter 5, we’ll see the same distinction for tools like sockets, which deal in byte strings today.
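The relationship between the two types shows up when text is encoded manually. The following sketch uses the two non-ASCII characters from the examples above; the choice of the latin-1 and utf-8 encodings is mine, for illustration only.

```python
text = '\u00c4\u00e8'                  # a two-character str (non-ASCII)

latin = text.encode('latin-1')         # str -> bytes, one byte per character
utf8 = text.encode('utf-8')            # str -> bytes, two bytes per character
print(latin, utf8)

print(latin.decode('latin-1') == text) # bytes -> str reverses the encoding
print(len(text), len(latin), len(utf8))
```

The same two characters become b'\xc4\xe8' in one encoding and b'\xc3\x84\xc3\xa8' in the other, which is why bytes alone do not identify text until an encoding is named.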

Unicode text is used in internationalized applications, and many of Python’s binary-oriented tools deal in byte strings today. This includes some file tools we’ll meet along the way, such as the open call, and the os.listdir and os.walk tools we’ll study in upcoming chapters. As we’ll see, even simple directory tools sometimes have to be aware of Unicode in file content and names. Moreover, tools such as object pickling and binary data parsing are byte-oriented today.

Later in the book, we’ll find that Unicode also pops up today in the text displayed in GUIs; the bytes shipped over networks; Internet standards such as email; and even some persistence topics such as DBM files and shelves. Any interface that deals in text necessarily deals in Unicode today, because str is Unicode, whether ASCII or wider. Once we reach the realm of the applications programming presented in this book, Unicode is no longer an optional topic for most Python 3.X programmers.

In this book, we’ll defer further coverage of Unicode until we can see it in the context of application topics and practical programs. For more fundamental details on how 3.X’s Unicode text and binary data support impact both string and file usage in some roles, please see Learning Python, Fourth Edition; since this is officially a core language topic, it enjoys in-depth coverage and a full 45-page dedicated chapter in that book.

File Operation Basics

Besides processing strings, the more.py script also uses files—it opens the external file whose name is listed on the command line using the built-in open function, and it reads that file’s text into memory all at once with the file object read method. Since file objects returned by open are part of the core Python language itself, I assume that you have at least a passing familiarity with them at this point in the text. But just in case you’ve flipped to this chapter early on in your Pythonhood, the following calls load a file’s contents into a string, load a fixed-size set of bytes into a string, load a file’s contents into a list of line strings, and load the next line in the file into a string, respectively:

open('file').read()            # read entire file into string
open('file').read(N)           # read next N bytes into string
open('file').readlines()       # read entire file into line strings list
open('file').readline()        # read next line, through '\n'

As we’ll see in a moment, these calls can also be applied to shell commands in Python to read their output. File objects also have write methods for sending strings to the associated file. File-related topics are covered in depth in Chapter 4, but making an output file and reading it back is easy in Python:

>>> file = open('spam.txt', 'w')        # create file spam.txt
>>> file.write(('spam' * 5) + '\n')     # write text: returns #characters written
21
>>> file.close()

>>> file = open('spam.txt')             # or open('spam.txt').read()
>>> text = file.read()                  # read into a string
>>> text
'spamspamspamspamspam\n'
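In script files, as opposed to the interactive prompt, the same steps are often coded with the with statement so that files are closed automatically. The sketch below is one way to do so; the use of a temporary directory is my own addition, to keep the example from leaving files behind.

```python
import os, tempfile

# a scratch directory keeps this sketch from touching real files
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'spam.txt')

    with open(path, 'w') as file:           # closed automatically on block exit
        count = file.write('spam' * 5 + '\n')

    with open(path) as file:
        lines = file.readlines()            # list of line strings

print(count, lines)
```

As in the interactive session, write reports 21 characters, and readlines returns a one-item list with the line end intact.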

Using Programs in Two Ways

Also by way of review, the last few lines in the more.py file in Example 2-1 introduce one of the first big concepts in shell tool programming. They instrument the file to be used in either of two ways—as a script or as a library.

Recall that every Python module has a built-in __name__ variable that Python sets to the __main__ string only when the file is run as a program, not when it’s imported as a library. Because of that, the more function in this file is executed automatically by the last line in the file when this script is run as a top-level program, but not when it is imported elsewhere. This simple trick turns out to be one key to writing reusable script code: by coding program logic as functions rather than as top-level code, you can also import and reuse it in other scripts.
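Stripped of paging details, the pattern looks like the following sketch. The module and its count_lines function are hypothetical stand-ins; the test string in the last line is only a self-test, where a real script would typically process sys.argv instead.

```python
"""
count the lines in a string (hypothetical module, e.g. linecount.py)
"""

def count_lines(text):
    return len(text.splitlines())

if __name__ == '__main__':
    # runs only when this file is started as a program, not on import
    print(count_lines('aaa\nbbb\nccc\n'))
```

Run as a program, this file prints 3; imported by another file, it prints nothing and simply makes count_lines available for reuse.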

The upshot is that we can run more.py by itself or import and call its more function elsewhere. When running the file as a top-level program, we list on the command line the name of a file to be read and paged: as I’ll describe in more depth in the next chapter, words typed in the command that is used to start a program show up in the built-in sys.argv list in Python. For example, here is the script file in action, paging itself (be sure to type this command line in your PP4E\System directory, or it won’t find the input file; more on command lines later):

C:\...\PP4E\System> python more.py more.py
"""
split and interactively page a string or file of text
"""

def more(text, numlines=15):
    lines = text.splitlines()                # like split('\n') but no '' at end
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print(line)
More?y
        if lines and input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                               # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline

When the more.py file is imported, we pass an explicit string to its more function, and this is exactly the sort of utility we need for documentation text. Running this utility on the sys module’s documentation string gives us a bit more information in human-readable form about what’s available to scripts:

C:\...\PP4E\System> python
>>> from more import more
>>> import sys
>>> more(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules

displayhook -- called to show results in an interactive session
excepthook -- called to handle any uncaught exception other than SystemExit
  To customize printing in an interactive session or to install a custom
  top-level exception handler, assign other functions to replace these.

stdin -- standard input file object; used by input()
More?

Pressing “y” or “Y” here makes the function display the next few lines of documentation, and then prompt again, unless you’ve run past the end of the lines list. Try this on your own machine to see what the rest of the module’s documentation string looks like. Also try experimenting by passing a different window size in the second argument—more(sys.__doc__, 5) shows just 5 lines at a time.

Python Library Manuals

If that still isn’t enough detail, your next step is to read the Python library manual’s entry for sys to get the full story. All of Python’s standard manuals are available online, and they often install alongside Python itself. On Windows, the standard manuals are installed automatically, but here are a few simple pointers:

On Windows, click the Start button, pick All Programs, select the Python entry there, and then choose the Python Manuals item. The manuals should magically appear on your display; as of Python 2.4, the manuals are provided as a Windows help file and so support searching and navigation.
On Linux or Mac OS X, you may be able to click on the manuals’ entries in a file explorer or start your browser from a shell command line and navigate to the library manual’s HTML files on your machine.
If you can’t find the manuals on your computer, you can always read them online. Go to Python’s website at http://www.python.org and follow the documentation links there. This website also has a simple searching utility for the manuals.

However you get started, be sure to pick the Library manual for things such as sys; this manual documents all of the standard library, built-in types and functions, and more. Python’s standard manual set also includes a short tutorial, a language reference, extending references, and more.

Commercially Published References

At the risk of sounding like a marketing droid, I should mention that you can also purchase the Python manual set, printed and bound; see the book information page at http://www.python.org for details and links. Commercially published Python reference books are also available today, including Python Essential Reference, Python in a Nutshell, Python Standard Library, and Python Pocket Reference. Some of these books are more complete and come with examples, but the last one serves as a convenient memory jogger once you’ve taken a library tour or two.^[4]

Introducing the sys Module

But enough about documentation sources (and scripting basics)—let’s move on to system module details. As mentioned earlier, the sys and os modules form the core of much of Python’s system-related tool set. To see how, we’ll turn to a quick, interactive tour through some of the tools in these two modules before applying them in bigger examples. We’ll start with sys, the smaller of the two; remember that to see a full list of all the attributes in sys, you need to pass it to the dir function (or see where we did so earlier in this chapter).

Platforms and Versions

Like most modules, sys includes both informational names and functions that take action. For instance, its attributes give us the name of the underlying operating system on which the platform code is running, the largest possible “natively sized” integer on this machine (though integers can be arbitrarily long in Python 3.X), and the version number of the Python interpreter running our code:

C:\...\PP4E\System> python
>>> import sys
>>> sys.platform, sys.maxsize, sys.version
('win32', 2147483647, '3.1.1 (r311:74483, Aug 17 2009, 17:02:12) ...more deleted...')

>>> if sys.platform[:3] == 'win': print('hello windows')
...
hello windows

If you have code that must act differently on different machines, simply test the sys.platform string as done here; although most of Python is cross-platform, nonportable tools are usually wrapped in if tests like the one here. For instance, we’ll see later that some program launch and low-level console interaction tools may vary per platform—simply test sys.platform to pick the right tool for the machine on which your script is running.
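As a minimal sketch of this kind of test, the helper below maps sys.platform prefixes to a coarse platform family that later code could branch on. The function name and its three labels are hypothetical, chosen only for illustration; they are not part of the standard library.

```python
import sys

def platform_family(name=sys.platform):
    """Map a sys.platform string to a coarse family label (labels are illustrative)."""
    if name.startswith('win'):
        return 'windows'
    elif name.startswith('darwin'):
        return 'mac'
    else:
        return 'unix-like'      # assumption: everything else is treated as Unix-like

print(sys.platform, '=>', platform_family())
```

Using startswith is equivalent to the slice comparison shown above, but it avoids hardcoding the prefix length.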

The Module Search Path

The sys module also lets us inspect the module search path both interactively and within a Python program. sys.path is a list of directory name strings representing the true search path in a running Python interpreter. When a module is imported, Python scans this list from left to right, searching for the module’s file on each directory named in the list. Because of that, this is the place to look to verify that your search path is really set as intended.^[5]

The sys.path list is simply initialized from your PYTHONPATH setting—the content of any .pth path files located in Python’s directories on your machine plus system defaults—when the interpreter is first started up. In fact, if you inspect sys.path interactively, you’ll notice quite a few directories that are not on your PYTHONPATH: sys.path also includes an indicator for the script’s home directory (an empty string—something I’ll explain in more detail after we meet os.getcwd) and a set of standard library directories that may vary per installation:

>>> sys.path
['', 'C:\\PP4thEd\\Examples', ...plus standard library paths deleted... ]

Surprisingly, sys.path can actually be changed by a program, too. A script can use list operations such as append, extend, insert, pop, and remove, as well as the del statement to configure the search path at runtime to include all the source directories to which it needs access. Python always uses the current sys.path setting to import, no matter what you’ve changed it to:

>>> sys.path.append(r'C:\mydir')
>>> sys.path
['', 'C:\\PP4thEd\\Examples', ...more deleted..., 'C:\\mydir']

Changing sys.path directly like this is an alternative to setting your PYTHONPATH shell variable, but not a very permanent one. Changes to sys.path are retained only until the Python process ends, and they must be remade every time you start a new Python program or session. However, some types of programs (e.g., scripts that run on a web server) may not be able to depend on PYTHONPATH settings; such scripts can instead configure sys.path on startup to include all the directories from which they will need to import modules. For a more concrete use case, see Example 1-34 in the prior chapter—there we had to tweak the search path dynamically this way, because the web server violated our import path assumptions.

Notice the use of a raw string literal in the sys.path configuration code: because backslashes normally introduce escape code sequences in Python strings, Windows users should be sure to either double up on backslashes when using them in DOS directory path strings (e.g., in "C:\\dir", \\ is an escape sequence that really means \), or use raw string constants to retain backslashes literally (e.g., r"C:\dir").

If you inspect directory paths on Windows (as in the sys.path interaction listing), Python prints double \\ to mean a single \. Technically, you can get away with a single \ in a string if it is followed by a character Python does not recognize as the rest of an escape sequence, but doubles and raw strings are usually easier than memorizing escape code tables.
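To make the difference concrete, here is a small sketch comparing the three spellings of a Windows path string. The path itself is made up for the demonstration and does not need to exist.

```python
doubled = 'C:\\temp\\new'      # each \\ is one backslash character
raw     = r'C:\temp\new'       # raw string: backslashes are kept literally
risky   = 'C:\temp\new'        # \t and \n silently become a tab and a newline

print(doubled == raw)          # True
print(len(raw), len(risky))    # 11 9
print(repr(raw))               # 'C:\\temp\\new'
```

The risky form is two characters shorter because each two-character escape collapses into a single control character, which is exactly the kind of damage that is hard to spot in a filename.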

Also note that most Python library calls accept either forward (/) or backward (\) slashes as directory path separators, regardless of the underlying platform. That is, / usually works on Windows too and aids in making scripts portable to Unix. Tools in the os and os.path modules, described later in this chapter, further aid in script path portability.
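Returning to runtime path configuration, the following sketch shows a script extending sys.path so that it can import a module from a directory that is not on PYTHONPATH. The temporary directory and the module name pathdemo_mod are hypothetical, created here only so that the example is self-contained.

```python
import os
import sys
import tempfile
import importlib

# Create a throwaway directory holding a tiny module (names are hypothetical)
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, 'pathdemo_mod.py'), 'w') as f:
    f.write('greeting = "found via sys.path"\n')

sys.path.append(moddir)             # extend the search path for this process only
importlib.invalidate_caches()       # make sure the import system notices the new file
import pathdemo_mod

print(pathdemo_mod.greeting)
sys.path.remove(moddir)             # undo the change; the module itself stays loaded
```

Because the change lives only in this process, a new Python session would have to repeat the append before the import could succeed again.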

The Loaded Modules Table

The sys module also contains hooks into the interpreter; sys.modules, for example, is a dictionary containing one name:module entry for every module imported in your Python session or program (really, in the calling Python process):

>>> sys.modules
{'reprlib': <module 'reprlib' from 'c:\python31\lib\reprlib.py'>, ...more deleted...

>>> list(sys.modules.keys())
 ['reprlib', 'heapq', '__future__', 'sre_compile', '_collections', 'locale', '_sre',
'functools', 'encodings', 'site', 'operator', 'io', '__main__', ...more deleted... ]

>>> sys
<module 'sys' (built-in)>
>>> sys.modules['sys']
<module 'sys' (built-in)>

We might use such a hook to write programs that display or otherwise process all the modules loaded by a program (just iterate over the keys of sys.modules).
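A minimal sketch of such a program follows. The helper name loaded_modules and its prefix filter are my own additions for illustration, not something the sys module provides.

```python
import sys

def loaded_modules(prefix=''):
    """Return a sorted list of loaded module names that start with prefix."""
    # copy the keys first: the table can change if an import happens while we iterate
    return sorted(name for name in list(sys.modules) if name.startswith(prefix))

names = loaded_modules()
print(len(names), 'modules loaded')
print(loaded_modules('os'))         # the exact list varies per platform and Python version
```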

Also in the interpreter hooks category, an object’s reference count is available via sys.getrefcount, and the names of modules built into the Python executable are listed in sys.builtin_module_names. See Python’s library manual for details; these are mostly Python internals information, but such hooks can sometimes become important to programmers writing tools for other programmers to use.

Exception Details

Other attributes in the sys module allow us to fetch all the information related to the most recently raised Python exception. This is handy if we want to process exceptions in a more generic fashion. For instance, the sys.exc_info function returns a tuple with the latest exception’s type, value, and traceback object. In the fully class-based exception model that Python 3 uses, the first two of these correspond to the most recently raised exception’s class and the instance of it that was raised:

>>> try:
...     raise IndexError
... except:
...     print(sys.exc_info())
...
(<class 'IndexError'>, IndexError(), <traceback object at 0x019B8288>)

We might use such information to format our own error message to display in a GUI pop-up window or HTML web page (recall that by default, uncaught exceptions terminate programs with a Python error display). The first two items returned by this call have reasonable string displays when printed directly, and the third is a traceback object that can be processed with the standard traceback module:

>>> import traceback, sys
>>> def grail(x):
...     raise TypeError('already got one')
...
>>> try:
...     grail('arthur')
... except:
...     exc_info = sys.exc_info()
...     print(exc_info[0])
...     print(exc_info[1])
...     traceback.print_tb(exc_info[2])
...
<class 'TypeError'>
already got one
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in grail

The traceback module can also format messages as strings and route them to specific file objects; see the Python library manual for more details.
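For instance, the sketch below collects the full error report as a single string, which a program could then show in a GUI pop-up or write to a log file. The helper name describe_error is hypothetical; traceback.format_exception is the standard library call doing the work.

```python
import sys
import traceback

def describe_error():
    """Return a report string for the exception currently being handled."""
    exc_type, exc_value, exc_tb = sys.exc_info()
    lines = traceback.format_exception(exc_type, exc_value, exc_tb)
    return ''.join(lines)           # each line already ends with a newline

try:
    raise TypeError('already got one')
except TypeError:
    report = describe_error()

print(report)
```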

Other sys Module Exports

The sys module exports additional commonly-used tools that we will meet in the context of larger topics and examples introduced later in this part of the book. For instance:

Command-line arguments show up as a list of strings called sys.argv.
Standard streams are available as sys.stdin, sys.stdout, and sys.stderr.
Program exit can be forced with sys.exit calls.

Since these lead us to bigger topics, though, we will cover them in sections of their own.

Introducing the os Module

As mentioned, os is the larger of the two core system modules. It contains all of the usual operating-system calls you use in C programs and shell scripts. Its calls deal with directories, processes, shell variables, and the like. Technically, this module provides POSIX tools—a portable standard for operating-system calls—along with platform-independent directory processing tools as the nested module os.path. Operationally, os serves as a largely portable interface to your computer’s system calls: scripts written with os and os.path can usually be run unchanged on any platform. On some platforms, os includes extra tools available just for that platform (e.g., low-level process calls on Unix); by and large, though, it is as cross-platform as is technically feasible.

Tools in the os Module

Let’s take a quick look at the basic interfaces in os. As a preview, Table 2-1 summarizes some of the most commonly used tools in the os module, organized by functional area.

Table 2-1. Commonly used os module tools

Tasks	Tools
Shell variables	`os.environ`
Running programs	`os.system`, `os.popen`, `os.execv`, `os.spawnv`
Spawning processes	`os.fork`, `os.pipe`, `os.waitpid`, `os.kill`
Descriptor files, locks	`os.open`, `os.read`, `os.write`
File processing	`os.remove`, `os.rename`, `os.mkfifo`, `os.mkdir`, `os.rmdir`
Administrative tools	`os.getcwd`, `os.chdir`, `os.chmod`, `os.getpid`, `os.listdir`, `os.access`
Portability tools	`os.sep`, `os.pathsep`, `os.curdir`, `os.path.split`, `os.path.join`
Pathname tools	`os.path.exists('path')`, `os.path.isdir('path')`, `os.path.getsize('path')`

If you inspect this module’s attributes interactively, you get a huge list of names that will vary per Python release, will likely vary per platform, and isn’t incredibly useful until you’ve learned what each name means (I’ve let this line-wrap and removed most of this list to save space—run the command on your own):

>>> import os
>>> dir(os)
['F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINH
ERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEM
PORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', '
P_OVERLAY', 'P_WAIT', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX',
...9 lines removed here...
'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'remove', 'rem
ovedirs', 'rename', 'renames', 'rmdir', 'sep', 'spawnl', 'spawnle', 'spawnv', 's
pawnve', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result
', 'strerror', 'sys', 'system', 'times', 'umask', 'unlink', 'urandom', 'utime',
'waitpid', 'walk', 'write']

Besides all of these, the nested os.path module exports even more tools, most of which are related to processing file and directory names portably:

>>> dir(os.path)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
'_get_altsep', '_get_bothseps', '_get_colon', '_get_dot', '_get_empty',
'_get_sep', '_getfullpathname', 'abspath', 'altsep', 'basename', 'commonprefix',
'curdir', 'defpath', 'devnull', 'dirname', 'exists', 'expanduser', 'expandvars',
'extsep', 'genericpath', 'getatime', 'getctime', 'getmtime', 'getsize', 'isabs',
'isdir', 'isfile', 'islink', 'ismount', 'join', 'lexists', 'normcase', 'normpath',
'os', 'pardir', 'pathsep', 'realpath', 'relpath', 'sep', 'split', 'splitdrive',
'splitext', 'splitunc', 'stat', 'supports_unicode_filenames', 'sys']

Administrative Tools

Just in case those massive listings aren’t quite enough to go on, let’s experiment interactively with some of the more commonly used os tools. Like sys, the os module comes with a collection of informational and administrative tools:

>>> os.getpid()
7980
>>> os.getcwd()
'C:\\PP4thEd\\Examples\\PP4E\\System'

>>> os.chdir(r'C:\Users')
>>> os.getcwd()
'C:\\Users'

As shown here, the os.getpid function gives the calling process’s process ID (a unique system-defined identifier for a running program, useful for process control and unique name creation), and os.getcwd returns the current working directory. The current working directory is where files opened by your script are assumed to live, unless their names include explicit directory paths. That’s why earlier I told you to run the following command in the directory where more.py lives:

C:\...\PP4E\System> python more.py more.py

The input filename argument here is given without an explicit directory path (though you could add one to page files in another directory). If you need to run in a different working directory, call the os.chdir function to change to a new directory; your code will run relative to the new directory for the rest of the program (or until the next os.chdir call). The next chapter will have more to say about the notion of a current working directory, and its relation to module imports when it explores script execution context.
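Here is a small sketch of that behavior: a file opened without a directory path lands in whatever directory is current at the time. The temporary directory and the filename note.txt are made up for the demonstration.

```python
import os
import tempfile

start = os.getcwd()
workdir = tempfile.mkdtemp()             # throwaway directory for the demo

os.chdir(workdir)
try:
    with open('note.txt', 'w') as f:     # no directory path: created in the new cwd
        f.write('hello')
    created_in = os.path.realpath(os.getcwd())
finally:
    os.chdir(start)                      # always restore the original directory

print(os.path.exists(os.path.join(workdir, 'note.txt')))    # True
```

Restoring the original directory in a finally clause is a defensive habit rather than a requirement; it keeps later relative filenames in the same program from being resolved against the wrong place.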

Portability Constants

The os module also exports a set of names designed to make cross-platform programming simpler. The set includes platform-specific settings for path and directory separator characters, parent and current directory indicators, and the characters used to terminate lines on the underlying computer.

>>> os.pathsep, os.sep, os.pardir, os.curdir, os.linesep
(';', '\\', '..', '.', '\r\n')

os.sep is whatever character is used to separate directory components on the platform on which Python is running; it is automatically preset to \ on Windows, / for POSIX machines, and : on some Macs. Similarly, os.pathsep provides the character that separates directories on directory lists, : for POSIX and ; for DOS and Windows.

By using such attributes when composing and decomposing system-related strings in our scripts, we make the scripts fully portable. For instance, a call of the form dirpath.split(os.sep) will correctly split platform-specific directory names into components, though dirpath may look like dir\dir on Windows, dir/dir on Linux, and dir:dir on some Macs. As mentioned, on Windows you can usually use forward slashes rather than backward slashes when giving filenames to be opened, but these portability constants allow scripts to be platform neutral in directory processing code.

Notice also how os.linesep comes back as \r\n here: the symbolic escape code which reflects the carriage-return + line-feed line terminator convention on Windows, which you don’t normally notice when processing text files in Python. We’ll learn more about end-of-line translations in Chapter 4.
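As a quick sketch of these constants at work, the code below builds and then takes apart a search-path string and a directory path without naming any separator character literally. The function name and the sample directory names are hypothetical.

```python
import os

def split_search_path(value):
    """Split a PATH-style string into directory names using the platform's separator."""
    return [d for d in value.split(os.pathsep) if d]

sample = os.pathsep.join(['first', 'second', 'third'])    # 'first;second;third' on Windows
print(split_search_path(sample))

relpath = os.sep.join(['dir', 'subdir', 'file.txt'])      # dir\subdir\file.txt on Windows
print(relpath.split(os.sep))
```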

Common os.path Tools

The nested module os.path provides a large set of directory-related tools of its own. For example, it includes portable functions for tasks such as checking a file’s type (isdir, isfile, and others); testing file existence (exists); and fetching the size of a file by name (getsize):

>>> os.path.isdir(r'C:\Users'), os.path.isfile(r'C:\Users')
(True, False)
>>> os.path.isdir(r'C:\config.sys'), os.path.isfile(r'C:\config.sys')
(False, True)
>>> os.path.isdir('nonesuch'), os.path.isfile('nonesuch')
(False, False)

>>> os.path.exists(r'c:\Users\Brian')
False
>>> os.path.exists(r'c:\Users\Default')
True
>>> os.path.getsize(r'C:\autoexec.bat')
24

The os.path.isdir and os.path.isfile calls tell us whether a filename is a directory or a simple file; both return False if the named file does not exist (that is, nonexistence implies negation). We also get calls for splitting and joining directory path strings, which automatically use the directory name conventions on the platform on which Python is running:

>>> os.path.split(r'C:\temp\data.txt')
('C:\\temp', 'data.txt')

>>> os.path.join(r'C:\temp', 'output.txt')
'C:\\temp\\output.txt'

>>> name = r'C:\temp\data.txt'                            # Windows paths
>>> os.path.dirname(name), os.path.basename(name)
('C:\\temp', 'data.txt')

>>> name = '/home/lutz/temp/data.txt'                     # Unix-style paths
>>> os.path.dirname(name), os.path.basename(name)
('/home/lutz/temp', 'data.txt')

>>> os.path.splitext(r'C:\PP4thEd\Examples\PP4E\PyDemos.pyw')
('C:\\PP4thEd\\Examples\\PP4E\\PyDemos', '.pyw')

os.path.split separates a filename from its directory path, and os.path.join puts them back together—all in entirely portable fashion using the path conventions of the machine on which they are called. The dirname and basename calls here return the first and second items returned by a split simply as a convenience, and splitext strips the file extension (after the last .). Subtle point: it’s almost equivalent to use string split and join method calls with the portable os.sep string, but not exactly:

>>> os.sep
'\\'
>>> pathname = r'C:\PP4thEd\Examples\PP4E\PyDemos.pyw'

>>> os.path.split(pathname)                                # split file from dir
('C:\\PP4thEd\\Examples\\PP4E', 'PyDemos.pyw')

>>> pathname.split(os.sep)                                 # split on every slash
['C:', 'PP4thEd', 'Examples', 'PP4E', 'PyDemos.pyw']

>>> os.sep.join(pathname.split(os.sep))
'C:\\PP4thEd\\Examples\\PP4E\\PyDemos.pyw'

>>> os.path.join(*pathname.split(os.sep))
'C:PP4thEd\\Examples\\PP4E\\PyDemos.pyw'

The last join call requires individual arguments (hence the *) but doesn’t insert a first slash because of the Windows drive syntax; use the preceding str.join method instead if the difference matters. The normpath call comes in handy if your paths become a jumble of Unix and Windows separators:

>>> mixed
'C:\\temp\\public/files/index.html'
>>> os.path.normpath(mixed)
'C:\\temp\\public\\files\\index.html'
>>> print(os.path.normpath(r'C:\temp\\sub\.\file.ext'))
C:\temp\sub\file.ext
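If you are not on Windows, you can still experiment with the splitting, joining, and normalizing calls shown so far. The sketch below rests on the fact that os.path is just an alias for a platform-specific module: ntpath on Windows and posixpath on Unix-like systems. Importing those modules by name is my own device for making the results the same on every machine; normal scripts should keep using os.path.

```python
import ntpath        # the implementation behind os.path on Windows
import posixpath     # the implementation behind os.path on Unix-like systems

win = r'C:\temp\data.txt'
nix = '/home/lutz/temp/data.txt'

print(ntpath.split(win))                        # ('C:\\temp', 'data.txt')
print(posixpath.split(nix))                     # ('/home/lutz/temp', 'data.txt')
print(ntpath.join(r'C:\temp', 'output.txt'))    # C:\temp\output.txt
print(posixpath.join('/tmp', 'output.txt'))     # /tmp/output.txt
print(ntpath.normpath('C:\\temp\\public/files/index.html'))
```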

This module also has an abspath call that portably returns the full directory pathname of a file; it accounts for adding the current directory as a path prefix, .. parent syntax, and more:

>>> os.chdir(r'C:\Users')
>>> os.getcwd()
'C:\\Users'
>>> os.path.abspath('')                        # empty string means the cwd
'C:\\Users'

>>> os.path.abspath('temp')                    # expand to full pathname in cwd
'C:\\Users\\temp'
>>> os.path.abspath(r'PP4E\dev')               # partial paths relative to cwd
'C:\\Users\\PP4E\\dev'

>>> os.path.abspath('.')                       # relative path syntax expanded
'C:\\Users'
>>> os.path.abspath('..')
'C:\\'
>>> os.path.abspath(r'..\examples')
'C:\\examples'

>>> os.path.abspath(r'C:\PP4thEd\chapters')    # absolute paths unchanged
'C:\\PP4thEd\\chapters'
>>> os.path.abspath(r'C:\temp\spam.txt')
'C:\\temp\\spam.txt'

Because filenames are relative to the current working directory when they aren’t fully specified paths, the os.path.abspath function helps if you want to show users what directory is truly being used to store a file. On Windows, for example, when GUI-based programs are launched by clicking on file explorer icons and desktop shortcuts, the execution directory of the program is the clicked file’s home directory, but that is not always obvious to the person doing the clicking; printing a file’s abspath can help.
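One way to apply this is sketched below: a small reporting function that expands a name to its absolute path and adds what the earlier os.path tests say about it. The function name describe and its message format are hypothetical.

```python
import os

def describe(filename):
    """Report where a possibly relative filename really lives, and what it is."""
    full = os.path.abspath(filename)
    if os.path.isfile(full):
        kind = 'file, %d bytes' % os.path.getsize(full)
    elif os.path.isdir(full):
        kind = 'directory'
    else:
        kind = 'missing'
    return '%s (%s)' % (full, kind)

print(describe('.'))                 # the current working directory
print(describe('nonesuch.txt'))      # reported as missing unless such a file exists
```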

Running Shell Commands from Scripts

The os module is also the place where we run shell commands from within Python scripts. This concept is intertwined with others, such as streams, which we won’t cover fully until the next chapter, but since this is a key concept employed throughout this part of the book, let’s take a quick first look at the basics here. Two os functions allow scripts to run any command line that you can type in a console window:

os.system: Runs a shell command from a Python script
os.popen: Runs a shell command and connects to its input or output streams

In addition, the relatively new subprocess module provides finer-grained control over streams of spawned shell commands and can be used as an alternative to, and even for the implementation of, the two calls above (albeit with some cost in extra code complexity).

What’s a shell command?

To understand the scope of the calls listed above, we first need to define a few terms. In this text, the term shell means the system that reads and runs command-line strings on your computer, and shell command means a command-line string that you would normally enter at your computer’s shell prompt.

For example, on Windows, you can start an MS-DOS console window (a.k.a. “Command Prompt”) and type DOS commands there—commands such as dir to get a directory listing, type to view a file, names of programs you wish to start, and so on. DOS is the system shell, and commands such as dir and type are shell commands. On Linux and Mac OS X, you can start a new shell session by opening an xterm or terminal window and typing shell commands there too—ls to list directories, cat to view files, and so on. A variety of shells are available on Unix (e.g., csh, ksh), but they all read and run command lines. Here are two shell commands typed and run in an MS-DOS console box on Windows:

C:\...\PP4E\System> dir /B         ...type a shell command line
helloshell.py                      ...its output shows up here
more.py                            ...DOS is the shell on Windows
more.pyc
spam.txt
__init__.py

C:\...\PP4E\System> type helloshell.py
# a Python program
print('The Meaning of Life')

Running shell commands

None of this is directly related to Python, of course (despite the fact that Python command-line scripts are sometimes confusingly called “shell tools”). But because the os module’s system and popen calls let Python scripts run any sort of command that the underlying system shell understands, our scripts can make use of every command-line tool available on the computer, whether it’s coded in Python or not. For example, here is some Python code that runs the two DOS shell commands typed at the shell prompt shown previously:

C:\...\PP4E\System> python
>>> import os
>>> os.system('dir /B')
helloshell.py
more.py
more.pyc
spam.txt
__init__.py
0
>>> os.system('type helloshell.py')
# a Python program
print('The Meaning of Life')
0

>>> os.system('type hellshell.py')
The system cannot find the file specified.
1

The 0s at the end of the first two commands here are just the return values of the system call itself (its exit status; zero generally means success). The system call can be used to run any command line that we could type at the shell’s prompt (here, C:\...\PP4E\System>). The command’s output normally shows up in the Python session’s or program’s standard output stream.
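Scripts usually test that return value rather than just displaying it. The following sketch does so in a way that runs on any platform by launching the current Python interpreter (sys.executable) instead of a DOS command; it assumes the interpreter’s path contains no spaces, and the two one-line child programs are made up for the demonstration.

```python
import os
import sys

# Build commands around the running interpreter so the demo does not depend on PATH
# (assumption: the interpreter's path contains no spaces)
ok_cmd   = '%s -c "pass"' % sys.executable
fail_cmd = '%s -c "import sys; sys.exit(3)"' % sys.executable

ok_status = os.system(ok_cmd)
fail_status = os.system(fail_cmd)

print('succeeded:', ok_status == 0)      # True
print('failed:', fail_status != 0)       # True; the exact nonzero value is platform-dependent
```

Comparing against zero is the portable test: on Unix-like systems the value returned by os.system encodes the exit status in a different form than on Windows, so the nonzero number itself should not be relied on.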

Communicating with shell commands

But what if we want to grab a command’s output within a script? The os.system call simply runs a shell command line, but os.popen also connects to the standard input or output streams of the command; we get back a file-like object connected to the command’s output by default (if we pass a w mode flag to popen, we connect to the command’s input stream instead). By using this object to read the output of a command spawned with popen, we can intercept the text that would normally appear in the console window where a command line is typed:

>>> open('helloshell.py').read()
"# a Python program\nprint('The Meaning of Life')\n"

>>> text = os.popen('type helloshell.py').read()
>>> text
"# a Python program\nprint('The Meaning of Life')\n"

>>> listing = os.popen('dir /B').readlines()
>>> listing
['helloshell.py\n', 'more.py\n', 'more.pyc\n', 'spam.txt\n', '__init__.py\n']

Here, we first fetch a file’s content the usual way (using Python files), then as the output of a shell type command. Reading the output of a dir command lets us get a listing of files in a directory that we can then process in a loop. We’ll learn other ways to obtain such a list in Chapter 4; there we’ll also learn how file iterators make the readlines call in the os.popen example above unnecessary in most programs, except to display the list interactively as we did here (see also subprocess, os.popen, and Iterators for more on the subject).

So far, we’ve run basic DOS commands; because these calls can run any command line that we can type at a shell prompt, they can also be used to launch other Python scripts. Assuming your system search path is set to locate your Python (so that you can use the shorter “python” in the following instead of the longer “C:\Python31\python”):

>>> os.system('python helloshell.py')       # run a Python program
The Meaning of Life
0
>>> output = os.popen('python helloshell.py').read()
>>> output
'The Meaning of Life\n'

In all of these examples, the command-line strings sent to system and popen are hardcoded, but there’s no reason Python programs could not construct such strings at runtime using normal string operations (+, %, etc.). Given that commands can be dynamically built and run this way, system and popen turn Python scripts into flexible and portable tools for launching and orchestrating other programs. For example, a Python test “driver” script can be used to run programs coded in any language (e.g., C++, Java, Python) and analyze their output. We’ll explore such a script in Chapter 6. We’ll also revisit os.popen in the next chapter in conjunction with stream redirection; as we’ll find, this call can also send input to programs.
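To illustrate command lines built at runtime, here is a minimal sketch that formats a different command string on each pass through a loop and collects each command’s output with os.popen. The run_snippet helper is hypothetical, and it assumes that the code passed in contains no double quotes and that the interpreter’s path contains no spaces.

```python
import os
import sys

def run_snippet(code):
    """Run a one-line Python statement in a new process and return its output text."""
    # assumption: code contains no double quotes, and the interpreter path has no spaces
    command = '%s -c "%s"' % (sys.executable, code)
    pipe = os.popen(command)
    output = pipe.read()
    pipe.close()
    return output

results = [run_snippet("print(%d * 2)" % n) for n in (1, 2, 3)]
print(results)                  # ['2\n', '4\n', '6\n']
```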

The subprocess module alternative

As mentioned, in recent releases of Python the subprocess module can achieve the same effect as os.system and os.popen; it generally requires extra code but gives more control over how streams are connected and used. This becomes especially useful when streams are tied in more complex ways.

For example, to run a simple shell command like we did with os.system earlier, this new module’s call function works roughly the same (running commands like “type” that are built into the shell on Windows requires extra protocol, though normal executables like “python” do not):

>>> import subprocess
>>> subprocess.call('python helloshell.py')              # roughly like os.system()
The Meaning of Life
0
>>> subprocess.call('cmd /C "type helloshell.py"')       # built-in shell cmd
# a Python program
print('The Meaning of Life')
0
>>> subprocess.call('type helloshell.py', shell=True)    # alternative for built-ins
# a Python program
print('The Meaning of Life')
0

Notice the shell=True in the last command here. This is a subtle and platform-dependent requirement:

On Windows, we need to pass a shell=True argument to subprocess tools like call and Popen (shown ahead) in order to run commands built into the shell. Windows commands like “type” require this extra protocol, but normal executables like “python” do not.
On Unix-like platforms, when shell is False (its default), the program command line is run directly by os.execvp, a call we’ll meet in Chapter 5. If this argument is True, the command-line string is run through a shell instead, and you can specify the shell to use with additional arguments.

More on some of this later; for now, it’s enough to note that you may need to pass shell=True to run some of the examples in this section and book in Unix-like environments, if they rely on shell features like program path lookup. Since I’m running code on Windows, this argument will often be omitted here.

Besides imitating os.system, we can similarly use this module to emulate the os.popen call used earlier, to run a shell command and obtain its standard output text in our script:

>>> pipe = subprocess.Popen('python helloshell.py', stdout=subprocess.PIPE)
>>> pipe.communicate()
(b'The Meaning of Life\r\n', None)
>>> pipe.returncode
0

Here, we connect the stdout stream to a pipe, and communicate to run the command to completion and receive its standard output and error streams’ text; the command’s exit status is available in an attribute after it completes. Alternatively, we can use other interfaces to read the command’s standard output directly and wait for it to exit (which returns the exit status):

>>> pipe = subprocess.Popen('python helloshell.py', stdout=subprocess.PIPE)
>>> pipe.stdout.read()
b'The Meaning of Life\r\n'
>>> pipe.wait()
0

In fact, there are direct mappings from os.popen calls to subprocess.Popen objects:

>>> from subprocess import Popen, PIPE
>>> Popen('python helloshell.py', stdout=PIPE).communicate()[0]
b'The Meaning of Life\r\n'
>>>
>>> import os
>>> os.popen('python helloshell.py').read()
'The Meaning of Life\n'

As you can probably tell, subprocess is extra work in these relatively simple cases. It starts to look better, though, when we need to control additional streams in flexible ways. In fact, because it also allows us to process a command’s error and input streams in similar ways, in Python 3.X subprocess replaces the original os.popen2, os.popen3, and os.popen4 calls which were available in Python 2.X; these are now just use cases for subprocess object interfaces. Because more advanced use cases for this module deal with standard streams, we’ll postpone additional details about this module until we study stream redirection in the next chapter.
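As a small preview of that extra control, the sketch below captures a command’s standard output and standard error separately, along with its exit status. The child program is a made-up one-liner, and passing the command as a list of strings rather than a single string is my choice here, to sidestep shell quoting.

```python
import subprocess
import sys

# A child that writes to both streams; passing a list avoids shell quoting issues
child = [sys.executable, '-c',
         "import sys; print('to stdout'); sys.stderr.write('to stderr\\n'); sys.exit(2)"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()        # run to completion, collecting both streams as bytes

print(out.decode().strip())          # to stdout
print(err.decode().strip())          # to stderr
print(proc.returncode)               # 2
```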

Shell command limitations

Before we move on, you should keep in mind two limitations of system and popen. First, although these two functions themselves are fairly portable, their use is really only as portable as the commands that they run. The preceding examples that run DOS dir and type shell commands, for instance, work only on Windows, and would have to be changed in order to run ls and cat commands on Unix-like platforms.

Second, it is important to remember that running Python files as programs this way is very different and generally much slower than importing program files and calling functions they define. When os.system and os.popen are called, they must start a brand-new, independent program running on your operating system (they generally run the command in a new process). When importing a program file as a module, the Python interpreter simply loads and runs the file’s code in the same process in order to generate a module object. No other program is spawned along the way.^[6]

There are good reasons to build systems as separate programs, too, and in the next chapter we’ll explore things such as command-line arguments and streams that allow programs to pass information back and forth. But in many cases, imported modules are a faster and more direct way to compose systems.
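On the first limitation, one common workaround is to pick the command string per platform, as in the sketch below. The helper name listing_command is hypothetical, and the non-Windows branch assumes a Unix-like shell with an ls command available.

```python
import os
import sys

def listing_command():
    """Pick a directory-listing shell command for the current platform."""
    if sys.platform.startswith('win'):
        return 'dir /B'         # the DOS command used in this section
    return 'ls'                 # assumption: a Unix-like shell with ls available

names = os.popen(listing_command()).read().splitlines()
print(names)
```

This keeps the script running on both kinds of systems, but only for the platforms it explicitly anticipates.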

If you plan to use these calls in earnest, you should also know that the os.system call normally blocks—that is, pauses—its caller until the spawned command line exits. On Linux and Unix-like platforms, the spawned command can generally be made to run independently and in parallel with the caller by adding an & shell background operator at the end of the command line:

os.system("python program.py arg arg &")

On Windows, spawning with a DOS start command will usually launch the command in parallel too:

os.system("start program.py arg arg")

In fact, this is so useful that an os.startfile call was added in recent Python releases. This call opens a file with whatever program is listed in the Windows registry for the file’s type—as though its icon has been clicked with the mouse cursor:

os.startfile("webpage.html")    # open file in your web browser
os.startfile("document.doc")    # open file in Microsoft Word
os.startfile("myscript.py")     # run file with Python

The os.popen call does not generally block its caller (by definition, the caller must be able to read or write the file object returned), but callers may still occasionally become blocked under both Windows and Linux if the pipe object is closed (for example, when it is garbage collected) before the spawned program exits or the pipe is read exhaustively (e.g., with its read() method). As we will see later in this part of the book, the Unix os.fork/exec and Windows os.spawnv calls can also be used to run parallel programs without blocking.
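The difference between blocking and nonblocking launches can be sketched portably with the subprocess module, which is a swapped-in technique here rather than one of the os calls just described. In this sketch the child is simply the current interpreter (sys.executable) sleeping for one second; that child program is an assumption chosen so the example runs anywhere.

```python
import subprocess
import sys

child_code = "import time; time.sleep(1)"

# Popen returns as soon as the child has been started; the caller keeps running.
proc = subprocess.Popen([sys.executable, "-c", child_code])
still_running = proc.poll() is None     # None means the child has not exited yet
print("child still running right after launch:", still_running)

# wait() is the blocking step: it pauses the caller until the child exits.
exit_code = proc.wait()
print("child exit code:", exit_code)
```

Calling wait immediately after Popen gives roughly the blocking behavior of os.system; deferring it lets the caller do other work while the child runs.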

Because the os module’s system and popen calls, as well as the subprocess module, also fall under the category of program launchers, stream redirectors, and cross-process communication devices, they will show up again in the following chapters, so we’ll defer further details for the time being. If you’re looking for more details right away, be sure to see the stream redirection section in the next chapter and the directory listings section in Chapter 4.

Other os Module Exports

That’s as much of a tour around os as we have space for here. Since most other os module tools are even more difficult to appreciate outside the context of larger application topics, we’ll postpone a deeper look at them until later chapters. But to let you sample the flavor of this module, here is a quick preview for reference. Among the os module’s other weapons are these:

os.environ: Fetches and sets shell environment variables
os.fork: Spawns a new child process on Unix-like systems
os.pipe: Communicates between programs
os.execlp: Starts new programs
os.spawnv: Starts new programs with lower-level control
os.open: Opens a low-level descriptor-based file
os.mkdir: Creates a new directory
os.mkfifo: Creates a new named pipe
os.stat: Fetches low-level file information
os.remove: Deletes a file by its pathname
os.walk: Applies a function or loop body to all parts of an entire directory tree

And so on. One caution up front: the os module provides a set of file open, read, and write calls, but all of these deal with low-level file access and are entirely distinct from Python’s built-in stdio file objects that we create with the built-in open function. You should normally use the built-in open function, not the os module, for all but very special file-processing needs (e.g., opening with exclusive-access file locking).
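As a rough taste of a few of the tools listed above, the following sketch works in a throwaway directory made with tempfile.mkdtemp. The names demo, note.txt, and DEMO_SETTING are invented for the example. It also shows one of the "very special" cases for os.open: the O_EXCL flag, which makes file creation fail if the file already exists. That is exclusive creation only, not a full locking scheme.

```python
import os
import tempfile

root = tempfile.mkdtemp()                   # scratch directory for the demo
subdir = os.path.join(root, "demo")         # "demo" is a made-up name
os.mkdir(subdir)                            # os.mkdir: create a new directory
path = os.path.join(subdir, "note.txt")

# os.open: low-level, descriptor-based file; O_EXCL refuses existing files.
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
fd = os.open(path, flags)
os.write(fd, b"spam")
os.close(fd)

try:
    os.open(path, flags)                    # file exists now, so this fails
    second_try = "created"
except FileExistsError:
    second_try = "refused"

with open(path) as f:                       # built-in open: the normal route
    text = f.read()

size = os.stat(path).st_size                # os.stat: low-level file information

found = [os.path.join(dirpath, name)        # os.walk: visit the whole tree
         for dirpath, dirnames, filenames in os.walk(root)
         for name in filenames]

os.environ["DEMO_SETTING"] = "on"           # os.environ: shell variables
setting = os.environ["DEMO_SETTING"]

os.remove(path)                             # os.remove: delete a file
os.rmdir(subdir)
os.rmdir(root)
print(second_try, text, size, setting)
```

Notice that the built-in open handles the ordinary read with far less ceremony than the descriptor-based calls, which is why it remains the default choice.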

In the next chapter we will apply sys and os tools such as those we’ve introduced here to implement common system-level tasks, but this book doesn’t have space to provide an exhaustive list of the contents of modules we will meet along the way. Again, if you have not already done so, you should become acquainted with the contents of modules such as os and sys using the resources described earlier. For now, let’s move on to explore additional system tools in the context of broader system programming concepts—the context surrounding a running script.

In Chapter 4, we’ll explore file iterators, but you’ve probably already studied the basics prior to picking up this book. Because os.popen objects have an iterator that reads one line at a time, their readlines method call is usually superfluous. For example, the following steps through lines produced by another program without any explicit reads:

>>> import os
>>> for line in os.popen('dir /B *.py'): print(line, end='')
...
helloshell.py
more.py
__init__.py
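The same idea can be tried on other platforms with a command that most shells share. This sketch assumes an echo command is available in the system shell, and it uses the pipe object as a context manager so that the pipe is closed explicitly instead of being left to garbage collection.

```python
import os

# Iterate over the pipe directly; no read or readlines call is needed.
with os.popen("echo spam") as pipe:
    lines = [line.rstrip() for line in pipe]

print(lines)
```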

Interestingly, Python 3.1 implements os.popen using the subprocess.Popen object that we studied in this chapter. You can see this for yourself in file os.py in the Python standard library on your machine (see C:\Python31\Lib on Windows); the os.popen result is an object that manages the Popen object and its piped stream:

>>> I = os.popen('dir /B *.py')
>>> I
<os._wrap_close object at 0x013BC750>

Because this pipe wrapper object defines an __iter__ method, it supports line iteration, both automatic (e.g., the for loop above) and manual. Curiously, although the pipe wrapper object supports direct __next__ method calls as though it were its own iterator (just like simple files), it does not support the next built-in function, even though the latter is supposed to simply call the former:

>>> I = os.popen('dir /B *.py')
>>> I.__next__()
'helloshell.py\n'

>>> I = os.popen('dir /B *.py')
>>> next(I)
TypeError: _wrap_close object is not an iterator

The reason for this is subtle—direct __next__ calls are intercepted by a __getattr__ defined in the pipe wrapper object, and are properly delegated to the wrapped object; but next function calls invoke Python’s operator overloading machinery, which in 3.X bypasses the wrapper’s __getattr__ for special method names like __next__. Since the pipe wrapper object doesn’t define a __next__ of its own, the call is not caught and delegated, and the next built-in fails. As explained in full in the book Learning Python, the wrapper’s __getattr__ isn’t tried because 3.X begins such searches at the class, not the instance.

This behavior may or may not have been anticipated, and you don’t need to care if you iterate over pipe lines automatically with for loops, comprehensions, and other tools. To code manual iterations robustly, though, be sure to call the iter built-in first—this invokes the __iter__ defined in the pipe wrapper object itself, to correctly support both flavors of advancement:

>>> I = os.popen('dir /B *.py')
>>> I = iter(I)                       # what for loops do
>>> I.__next__()                      # now both forms work
'helloshell.py\n'
>>> next(I)
'more.py\n'
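The delegation behavior described here is easy to reproduce without any pipes at all. The class below is a hypothetical stand-in, not the actual wrapper in os.py: it delegates unknown attributes with __getattr__ and defines __iter__, but defines no __next__ of its own, which is the combination described above.

```python
class Wrapper:
    """Hypothetical delegating wrapper; not the real os.py class."""
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Runs only for attribute names not found by normal lookup.
        return getattr(self._wrapped, name)

    def __iter__(self):
        return iter(self._wrapped)

w = Wrapper(iter(["helloshell.py\n", "more.py\n"]))

first = w.__next__()            # explicit fetch: routed through __getattr__

try:
    next(w)                     # built-in: looks for __next__ on the class only
    builtin_result = "worked"
except TypeError:
    builtin_result = "TypeError"

it = iter(w)                    # what for loops do: returns the wrapped iterator
second = next(it)               # now the built-in works too

print(repr(first), builtin_result, repr(second))
```

Defining a __next__ method in the wrapper class itself would make both forms work directly, since the built-in would then find the method at the class level.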

^[3]They may also work their way into your subconscious. Python newcomers sometimes describe a phenomenon in which they “dream in Python” (insert overly simplistic Freudian analysis here…).

^[4]Full disclosure: I also wrote the last of the books listed as a replacement for the reference appendix that appeared in the first edition of this book; it’s meant to be a supplement to the text you’re reading, and its latest edition also serves as a translation resource for Python 2.X readers. As explained in the Preface, the book you’re holding is meant as tutorial, not reference, so you’ll probably want to find some sort of reference resource eventually (though I’m nearly narcissistic enough to require that it be mine).

^[5]It’s not impossible that Python sees PYTHONPATH differently than you do. A syntax error in your system shell configuration files may botch the setting of PYTHONPATH, even if it looks fine to you. On Windows, for example, if a space appears around the = of a DOS set command in your configuration file (e.g., set NAME = VALUE), you may actually set NAME to an empty string, not to VALUE!

^[6]The Python code exec(open(file).read()) also runs a program file’s code, but within the same process that called it. It’s similar to an import in that regard, but it works more as if the file’s text had been pasted into the calling program at the place where the exec call appears (unless explicit global or local namespace dictionaries are passed). Unlike imports, such an exec unconditionally reads and executes a file’s code (it may be run more than once per process), no module object is generated by the file’s execution, and unless optional namespace dictionaries are passed in, assignments in the file’s code may overwrite variables in the scope where the exec appears; see other resources or the Python library manual for more details.
