3. Script Execution Context

Chapter 3. Script Execution Context

“I’d Like to Have an Argument, Please”

Python scripts don’t run in a vacuum (despite what you may have heard). Depending on platforms and startup procedures, Python programs may have all sorts of enclosing context—information automatically passed in to the program by the operating system when the program starts up. For instance, scripts have access to the following sorts of system-level inputs and interfaces:

Current working directory: os.getcwd gives access to the directory from which a script is started, and many file tools use its value implicitly.
Command-line arguments: sys.argv gives access to words typed on the command line that are used to start the program and that serve as script inputs.
Shell variables: os.environ provides an interface to names assigned in the enclosing shell (or a parent program) and passed in to the script.
Standard streams: sys.stdin, stdout, and stderr export the three input/output streams that are at the heart of command-line shell tools, and can be leveraged by scripts with print options, the os.popen call and subprocess module introduced in Chapter 2, the io.StringIO class, and more.

Such tools can serve as inputs to scripts, configuration parameters, and so on. In this chapter, we will explore all four of these context tools: both their Python interfaces and their typical roles.

Current Working Directory

The notion of the current working directory (CWD) turns out to be a key concept in some scripts’ execution: it’s always the implicit place where files processed by the script are assumed to reside unless their names have absolute directory paths. As we saw earlier, os.getcwd lets a script fetch the CWD name explicitly, and os.chdir allows a script to move to a new CWD.

Keep in mind, though, that filenames without full pathnames map to the CWD and have nothing to do with your PYTHONPATH setting. Technically, a script is always launched from the CWD, not the directory containing the script file. Conversely, imports always first search the directory containing the script, not the CWD (unless the script happens to also be located in the CWD). Since this distinction is subtle and tends to trip up beginners, let’s explore it in a bit more detail.

CWD, Files, and Import Paths

When you run a Python script by typing a shell command line such as python dir1\dir2\file.py, the CWD is the directory you were in when you typed this command, not dir1\dir2. On the other hand, Python automatically adds the identity of the script’s home directory to the front of the module search path such that file.py can always import other files in dir1\dir2 no matter where it is run from. To illustrate, let’s write a simple script to echo both its CWD and its module search path:

C:\...\PP4E\System> type whereami.py
import os, sys
print('my os.getcwd =>', os.getcwd())           # show my cwd execution dir
print('my sys.path  =>', sys.path[:6])          # show first 6 import paths
input()                                         # wait for keypress if clicked

Now, running this script in the directory in which it resides sets the CWD as expected and adds it to the front of the module import search path. We met the sys.path module search path earlier; its first entry might also be the empty string to designate CWD when you’re working interactively, and most of the CWD has been truncated to “...” here for display:

C:\...\PP4E\System> set PYTHONPATH=C:\PP4thEd\Examples
C:\...\PP4E\System> python whereami.py
my os.getcwd => C:\...\PP4E\System
my sys.path  => ['C:\\...\\PP4E\\System', 'C:\\PP4thEd\\Examples', ...more... ]

But if we run this script from other places, the CWD moves with us (it’s the directory where we type commands), and Python adds a directory to the front of the module search path that allows the script to still see files in its own home directory. For instance, when running from one level up (..), the System name added to the front of sys.path will be the first directory that Python searches for imports within whereami.py; it points imports back to the directory containing the script that was run. Filenames without complete paths, though, will be mapped to the CWD (C:\PP4thEd\Examples\PP4E), not the System subdirectory nested there:

C:\...\PP4E\System> cd ..
C:\...\PP4E> python System\whereami.py
my os.getcwd => C:\...\PP4E
my sys.path  => ['C:\\...\\PP4E\\System', 'C:\\PP4thEd\\Examples', ...more... ]

C:\...\PP4E> cd System\temp
C:\...\PP4E\System\temp> python ..\whereami.py
my os.getcwd => C:\...\PP4E\System\temp
my sys.path  => ['C:\\...\\PP4E\\System', 'C:\\PP4thEd\\Examples', ...]

The net effect is that filenames without directory paths in a script will be mapped to the place where the command was typed (os.getcwd), but imports still have access to the directory of the script being run (via the front of sys.path). Finally, when a file is launched by clicking its icon, the CWD is just the directory that contains the clicked file. The following output, for example, appears in a new DOS console box when whereami.py is double-clicked in Windows Explorer:

my os.getcwd => C:\...\PP4E\System
my sys.path  => ['C:\\...\\PP4E\\System', ...more... ]

In this case, both the CWD used for filenames and the first import search directory are the directory containing the script file. This all usually works out just as you expect, but there are two pitfalls to avoid:

Filenames might need to include complete directory paths if scripts cannot be sure from where they will be run.
Command-line scripts cannot always rely on the CWD to gain import visibility to files that are not in their own directories; instead, use PYTHONPATH settings and package import paths to access modules in other directories.

For example, scripts in this book, regardless of how they are run, can always import other files in their own home directories without package path imports (import filehere), but must go through the PP4E package root to find files anywhere else in the examples tree (from PP4E.dir1.dir2 import filethere), even if they are run from the directory containing the desired external module. As usual for modules, the PP4E\dir1\dir2 directory name could also be added to PYTHONPATH to make files there visible everywhere without package path imports (though adding more directories to PYTHONPATH increases the likelihood of name clashes). In either case, though, imports are always resolved to the script’s home directory or other Python search path settings, not to the CWD.
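The first pitfall above can be sidestepped by building filenames from the script's own location instead of trusting the CWD. The following is only a minimal sketch, not code from the book's examples tree: script_home and home_relative are hypothetical helper names, and data.txt is a made-up filename.

```python
import os
import sys

def script_home(script_path):
    """Return the absolute directory that contains script_path."""
    return os.path.dirname(os.path.abspath(script_path))

def home_relative(filename, script_path):
    """Map a bare filename to the script's home directory, not the CWD."""
    return os.path.join(script_home(script_path), filename)

if __name__ == '__main__':
    # sys.argv[0] is the script name as launched; it is '' in an
    # interactive session, so this demo is only meaningful in a script file
    print('CWD       =>', os.getcwd())
    print('data file =>', home_relative('data.txt', sys.argv[0]))
```

Run from different directories, the first line changes with the CWD while the second keeps pointing at the directory that holds the script.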

CWD and Command Lines

This distinction between the CWD and import search paths explains why many scripts in this book designed to operate in the current working directory (instead of one whose name is passed in) are run with command lines such as this one:

C:\temp> python C:\...\PP4E\Tools\cleanpyc.py                   process cwd

In this example, the Python script file itself lives in the directory C:\...\PP4E\Tools, but because it is run from C:\temp, it processes the files located in C:\temp (i.e., in the CWD, not in the script’s home directory). To process files elsewhere with such a script, simply cd to the directory to be processed to change the CWD:

C:\temp> cd C:\PP4thEd\Examples
C:\PP4thEd\Examples> python C:\...\PP4E\Tools\cleanpyc.py       process cwd

Because the CWD is always implied, a cd command tells the script which directory to process in no less certain terms than passing a directory name to the script explicitly, like this (portability note: you may need to add quotes around the *.py in this and other command-line examples to prevent it from being expanded in some Unix shells):

C:\...\PP4E\Tools> python find.py *.py C:\temp                  process named dir

In this command line, the CWD is the directory containing the script to be run (notice that the script filename has no directory path prefix); but since this script processes a directory named explicitly on the command line (C:\temp), the CWD is irrelevant. Finally, if we want to run such a script located in some other directory in order to process files located in yet another directory, we can simply give directory paths to both:

C:\temp> python C:\...\PP4E\Tools\find.py *.cxx C:\PP4thEd\Examples\PP4E

Here, the script has import visibility to files in its PP4E\Tools home directory and processes files in the directory named on the command line, but the CWD is something else entirely (C:\temp). This last form is more to type, of course, but watch for a variety of CWD and explicit script-path command lines like these in this book.
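A script can support both styles of command line shown here by treating a directory argument as optional. The sketch below is an illustration under assumptions, not the logic of the book's cleanpyc.py or find.py: target_dir and names_with_ext are hypothetical names.

```python
import os
import sys

def target_dir(argv):
    """Return the directory to process: argv[1] if present, else the CWD."""
    if len(argv) > 1:
        return argv[1]                 # directory named explicitly
    return os.getcwd()                 # fall back on the implied CWD

def names_with_ext(dirname, ext):
    """List filenames in dirname that end with ext, in sorted order."""
    return sorted(name for name in os.listdir(dirname) if name.endswith(ext))

if __name__ == '__main__':
    print('processing', target_dir(sys.argv))
```

With this arrangement, a cd followed by a bare command and a command that names the directory explicitly both end up processing the same place.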

Command-Line Arguments

The sys module is also where Python makes available the words typed on the command that is used to start a Python script. These words are usually referred to as command-line arguments and show up in sys.argv, a built-in list of strings. C programmers may notice its similarity to the C argv array (an array of C strings). It’s not much to look at interactively, because no command-line arguments are passed to start up Python in this mode:

>>> import sys
>>> sys.argv
['']

To really see what arguments are about, we need to run a script from the shell command line. Example 3-1 shows an unreasonably simple one that just prints the argv list for inspection.

Example 3-1. PP4E\System\testargv.py

import sys
print(sys.argv)

Running this script prints the command-line arguments list; note that the first item is always the name of the executed Python script file itself, no matter how the script was started (see Executable Scripts on Unix).

C:\...\PP4E\System> python testargv.py
['testargv.py']

C:\...\PP4E\System> python testargv.py spam eggs cheese
['testargv.py', 'spam', 'eggs', 'cheese']

C:\...\PP4E\System> python testargv.py -i data.txt -o results.txt
['testargv.py', '-i', 'data.txt', '-o', 'results.txt']

The last command here illustrates a common convention. Much like function arguments, command-line options are sometimes passed by position and sometimes by name using a “-name value” word pair. For instance, the pair -i data.txt means the -i option’s value is data.txt (e.g., an input filename). Any words can be listed, but programs usually impose some sort of structure on them.

Command-line arguments play the same role in programs that function arguments do in functions: they are simply a way to pass information to a program that can vary per program run. Because they don’t have to be hardcoded, they allow scripts to be more generally useful. For example, a file-processing script can use a command-line argument as the name of the file it should process; see Chapter 2’s more.py script (Example 2-1) for a prime example. Other scripts might accept processing mode flags, Internet addresses, and so on.
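To make the function-argument analogy concrete, here is a small sketch of positional command-line arguments with defaults. The function name parse_positional, the default filename, and the repeat-count argument are all assumptions made up for illustration.

```python
import sys

def parse_positional(argv, default_file='data.txt', default_count=1):
    """Treat argv[1] as a filename and argv[2] as a count; both are optional."""
    script, *args = argv                      # argv[0] is the script's name
    filename = args[0] if len(args) > 0 else default_file
    count = int(args[1]) if len(args) > 1 else default_count
    return filename, count

if __name__ == '__main__':
    # a real script would pass sys.argv here; a fixed list keeps the demo stable
    print(parse_positional(['demo.py', 'spam.txt', '3']))
```

Command-line words always arrive as strings, so numeric arguments need an explicit conversion such as the int call above.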

Parsing Command-Line Arguments

Once you start using command-line arguments regularly, though, you’ll probably find it inconvenient to keep writing code that fishes through the list looking for words. More typically, programs translate the arguments list on startup into structures that are more conveniently processed. Here’s one way to do it: the script in Example 3-2 scans the argv list looking for -optionname optionvalue word pairs and stuffs them into a dictionary by option name for easy retrieval.

Example 3-2. PP4E\System\testargv2.py

"collect command-line options in a dictionary"

def getopts(argv):
    opts = {}
    while argv:
        if argv[0][0] == '-':                  # find "-name value" pairs
            opts[argv[0]] = argv[1]            # dict key is "-name" arg
            argv = argv[2:]
        else:
            argv = argv[1:]
    return opts

if __name__ == '__main__':
    from sys import argv                       # example client code
    myargs = getopts(argv)
    if '-i' in myargs:
        print(myargs['-i'])
    print(myargs)

You might import and use such a function in all your command-line tools. When run by itself, this file just prints the formatted argument dictionary:

C:\...\PP4E\System> python testargv2.py
{}

C:\...\PP4E\System> python testargv2.py -i data.txt -o results.txt
data.txt
{'-o': 'results.txt', '-i': 'data.txt'}
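One gap in Example 3-2 is worth noting: an option typed last with no value after it makes the argv[1] index fail. A slightly more defensive variation might look like the following sketch. The name getopts_safe and the choice to record a missing value as None are my assumptions, not part of the book's example.

```python
def getopts_safe(argv):
    """Collect "-name value" pairs; a "-name" with no value maps to None."""
    opts = {}
    args = list(argv[1:])                     # copy, skipping the script name
    while args:
        word = args.pop(0)
        if word.startswith('-'):
            # use the next word as the value unless it is missing
            # or looks like another option
            if args and not args[0].startswith('-'):
                opts[word] = args.pop(0)
            else:
                opts[word] = None
    return opts

if __name__ == '__main__':
    print(getopts_safe(['testargv2.py', '-i', 'data.txt', '-v']))
```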

Naturally, we could get much more sophisticated here in terms of argument patterns, error checking, and the like. For more complex command lines, we could also use command-line processing tools in the Python standard library to parse arguments:

The getopt module, modeled after a Unix/C utility of the same name
The optparse module, a newer alternative, generally considered to be more powerful

Both of these are documented in Python’s library manual, which also provides usage examples which we’ll defer to here in the interest of space. In general, the more configurable your scripts, the more you must invest in command-line processing logic complexity.
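As a rough illustration of library-based parsing, the sketch below handles the same -i and -o options with the standard library's argparse module. Note the assumption: argparse is not mentioned in the text above and was added to the standard library in Python 3.2, after the release used for this book's sessions. The option destinations and help strings are invented for the example.

```python
import argparse

def build_parser():
    """Build a parser for a hypothetical file-processing tool."""
    parser = argparse.ArgumentParser(description='hypothetical file tool')
    parser.add_argument('-i', dest='infile', default=None, help='input filename')
    parser.add_argument('-o', dest='outfile', default=None, help='output filename')
    parser.add_argument('words', nargs='*', help='remaining positional words')
    return parser

def parse(args):
    """Parse a list of words, excluding the script name (as in sys.argv[1:])."""
    return build_parser().parse_args(args)

if __name__ == '__main__':
    opts = parse(['-i', 'data.txt', '-o', 'results.txt', 'spam'])
    print(opts.infile, opts.outfile, opts.words)
```

In exchange for the setup code, a parser like this reports malformed command lines and generates a usage message for -h automatically.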

Executable Scripts on Unix

Unix and Linux users: you can also make text files of Python source code directly executable by adding a special line at the top with the path to the Python interpreter and giving the file executable permission. For instance, type this code into a text file called myscript:

#!/usr/bin/python
print('And nice red uniforms')

The first line is normally taken as a comment by Python (it starts with a #); but when this file is run, the operating system sends lines in this file to the interpreter listed after #! in line 1. If this file is made directly executable with a shell command of the form chmod +x myscript, it can be run directly without typing python in the command, as though it were a binary executable program:

% myscript a b c
And nice red uniforms

When run this way, sys.argv will still have the script’s name as the first word in the list: ["myscript", "a", "b", "c"], exactly as if the script had been run with the more explicit and portable command form python myscript a b c. Making scripts directly executable is actually a Unix trick, not a Python feature, but it’s worth pointing out that it can be made a bit less machine dependent by listing the Unix env command at the top instead of a hardcoded path to the Python executable:

#!/usr/bin/env python
print('Wait for it...')

When coded this way, the operating system will employ your environment variable settings to locate your Python interpreter (your PATH variable, on most platforms). If you run the same script on many machines, you need only change your environment settings on each machine (you don’t need to edit Python script code). Of course, you can always run Python files with a more explicit command line:

% python myscript a b c

This assumes that the python interpreter program is on your system’s search path setting (otherwise, you need to type its full path), but it works on any Python platform with a command line. Since this is more portable, I generally use this convention in the book’s examples, but consult your Unix manpages for more details on any of the topics mentioned here. Even so, these special #! lines will show up in many examples in this book just in case readers want to run them as executables on Unix or Linux; on other platforms, they are simply ignored as Python comments.

Note that on recent flavors of Windows, you can usually also type a script’s filename directly (without the word python) to make it go, and you don’t have to add a #! line at the top. Python uses the Windows registry on this platform to declare itself as the program that opens files with Python extensions (.py and others). This is also why you can launch files on Windows by clicking on them.

Shell Environment Variables

Shell variables, sometimes known as environment variables, are made available to Python scripts as os.environ, a Python dictionary-like object with one entry per variable setting in the shell. Shell variables live outside the Python system; they are often set at your system prompt or within startup files or control-panel GUIs and typically serve as system-wide configuration inputs to programs.

In fact, by now you should be familiar with a prime example: the PYTHONPATH module search path setting is a shell variable used by Python to import modules. By setting it once in your operating system, its value is available every time a Python program is run. Shell variables can also be set by programs to serve as inputs to other programs in an application; because their values are normally inherited by spawned programs, they can be used as a simple form of interprocess communication.

Fetching Shell Variables

In Python, the surrounding shell environment becomes a simple preset object, not special syntax. Indexing os.environ by the desired shell variable’s name string (e.g., os.environ['USER']) is the moral equivalent of adding a dollar sign before a variable name in most Unix shells (e.g., $USER), using surrounding percent signs on DOS (%USER%), and calling getenv("USER") in a C program. Let’s start up an interactive session to experiment (run in Python 3.1 on a Windows 7 laptop):

>>> import os
>>> os.environ.keys()
KeysView(<os._Environ object at 0x013B8C70>)

>>> list(os.environ.keys())
['TMP', 'COMPUTERNAME', 'USERDOMAIN', 'PSMODULEPATH', 'COMMONPROGRAMFILES',
...many more deleted...
'NUMBER_OF_PROCESSORS', 'PROCESSOR_LEVEL', 'USERPROFILE', 'OS', 'PUBLIC', 'QTJAVA']

>>> os.environ['TEMP']
'C:\\Users\\mark\\AppData\\Local\\Temp'

Here, the keys method returns an iterable of assigned variables, and indexing fetches the value of the shell variable TEMP on Windows. This works the same way on Linux, but other variables are generally preset when Python starts up. Since we know about PYTHONPATH, let’s peek at its setting within Python to verify its content (as I wrote this, mine was set to the root of the book examples tree for this fourth edition, as well as a temporary development location):

>>> os.environ['PYTHONPATH']
'C:\\PP4thEd\\Examples;C:\\Users\\Mark\\temp'

>>> for srcdir in os.environ['PYTHONPATH'].split(os.pathsep):
...     print(srcdir)
...
C:\PP4thEd\Examples
C:\Users\Mark\temp

>>> import sys
>>> sys.path[:3]
['', 'C:\\PP4thEd\\Examples', 'C:\\Users\\Mark\\temp']

PYTHONPATH is a string of directory paths separated by whatever character is used to separate items in such paths on your platform (e.g., ; on DOS/Windows, : on Unix and Linux). To split it into its components, we pass to the split string method an os.pathsep delimiter—a portable setting that gives the proper separator for the underlying machine. As usual, sys.path is the actual search path at runtime, and reflects the result of merging in the PYTHONPATH setting after the current directory.
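One practical caveat: indexing os.environ raises KeyError when a variable is not set, so scripts often fetch with a default instead. The sketch below shows both styles. The variable name PP4E_DEMO_UNSET and the helper search_dirs are hypothetical, chosen only for this illustration.

```python
import os

NAME = 'PP4E_DEMO_UNSET'                  # hypothetical variable name
os.environ.pop(NAME, None)                # make sure it is unset for the demo

try:
    os.environ[NAME]                      # plain indexing, as in the text
except KeyError:
    print(NAME, 'is not set')

print(os.environ.get(NAME, 'fallback'))   # dictionary-style default
print(os.getenv(NAME, 'fallback'))        # same fetch through os.getenv

def search_dirs(varname='PYTHONPATH'):
    """Split a path-list variable into directories; empty list if unset."""
    value = os.environ.get(varname, '')
    return [d for d in value.split(os.pathsep) if d]
```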

Changing Shell Variables

Like normal dictionaries, the os.environ object supports both key indexing and assignment. As for dictionaries, assignments change the value of the key:

>>> os.environ['TEMP']
'C:\\Users\\mark\\AppData\\Local\\Temp'
>>> os.environ['TEMP'] = r'c:\temp'
>>> os.environ['TEMP']
'c:\\temp'

But something extra happens here. In all recent Python releases, values assigned to os.environ keys in this fashion are automatically exported to other parts of the application. That is, key assignments change both the os.environ object in the Python program and the associated variable in the enclosing shell environment of the running program’s process. The new value becomes visible to the Python program, all linked-in C modules, and any programs spawned by the Python process.

Internally, key assignments to os.environ call os.putenv—a function that changes the shell variable outside the boundaries of the Python interpreter. To demonstrate how this works, we need a couple of scripts that set and fetch shell variables; the first is shown in Example 3-3.

Example 3-3. PP4E\System\Environment\setenv.py

import os
print('setenv...', end=' ')
print(os.environ['USER'])                # show current shell variable value

os.environ['USER'] = 'Brian'             # runs os.putenv behind the scenes
os.system('python echoenv.py')

os.environ['USER'] = 'Arthur'            # changes passed to spawned programs
os.system('python echoenv.py')           # and linked-in C library modules

os.environ['USER'] = input('?')
print(os.popen('python echoenv.py').read())

This setenv.py script simply changes a shell variable, USER, and spawns another script that echoes this variable’s value, as shown in Example 3-4.

Example 3-4. PP4E\System\Environment\echoenv.py

import os
print('echoenv...', end=' ')
print('Hello,', os.environ['USER'])

No matter how we run echoenv.py, it displays the value of USER in the enclosing shell; when run from the command line, this value comes from whatever we’ve set the variable to in the shell itself:

C:\...\PP4E\System\Environment> set USER=Bob

C:\...\PP4E\System\Environment> python echoenv.py
echoenv... Hello, Bob

When spawned by another script such as setenv.py using the os.system and os.popen tools we met earlier, though, echoenv.py gets whatever USER settings its parent program has made:

C:\...\PP4E\System\Environment> python setenv.py
setenv... Bob
echoenv... Hello, Brian
echoenv... Hello, Arthur
?Gumby
echoenv... Hello, Gumby

C:\...\PP4E\System\Environment> echo %USER%
Bob

This works the same way on Linux. In general terms, a spawned program always inherits environment settings from its parents. Spawned programs are programs started with Python tools such as os.spawnv, the os.fork/exec combination on Unix-like platforms, and os.popen, os.system, and the subprocess module on a variety of platforms. All programs thus launched get the environment variable settings that exist in the parent at launch time.^[7]

From a larger perspective, setting shell variables like this before starting a new program is one way to pass information into the new program. For instance, a Python configuration script might tailor the PYTHONPATH variable to include custom directories just before launching another Python script; the launched script will have the custom search path in its sys.path because shell variables are passed down to children (in fact, watch for such a launcher script to appear at the end of Chapter 6).
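The same inheritance can be watched from a single file by spawning a child interpreter with the subprocess module. This is a sketch under assumptions rather than one of the book's examples: the variable name PP4E_DEMO_USER and the helper run_child are invented, and the child's code is passed inline with -c instead of living in a separate script.

```python
import os
import subprocess
import sys

# code run by the child interpreter: echo one variable from its environment
CHILD = "import os; print(os.environ.get('PP4E_DEMO_USER', 'unset'))"

def run_child(env=None):
    """Spawn a Python child; env=None means inherit this process's settings."""
    out = subprocess.check_output([sys.executable, '-c', CHILD], env=env)
    return out.decode().strip()

os.environ['PP4E_DEMO_USER'] = 'Brian'       # exported to later children
print(run_child())                           # child sees the parent's value

custom = dict(os.environ, PP4E_DEMO_USER='Arthur')
print(run_child(custom))                     # child gets an explicit mapping
print(os.environ['PP4E_DEMO_USER'])          # parent's own setting is unchanged
```

Passing an explicit env mapping tailors the child's settings without touching the parent's own os.environ.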

Shell Variable Fine Points: Parents, putenv, and getenv

Notice the last command in the preceding example—the USER variable is back to its original value after the top-level Python program exits. Assignments to os.environ keys are passed outside the interpreter and down the spawned programs chain, but never back up to parent program processes (including the system shell). This is also true in C programs that use the putenv library call, and it isn’t a Python limitation per se.

It’s also likely to be a nonissue if a Python script is at the top of your application. But keep in mind that shell settings made within a program usually endure only for that program’s run and for the run of its spawned children. If you need to export a shell variable setting so that it lives on after Python exits, you may be able to find platform-specific extensions that do this; search http://www.python.org or the Web at large.

Another subtlety: as implemented today, changes to os.environ automatically call os.putenv, which runs the putenv call in the C library if it is available on your platform to export the setting outside Python to any linked-in C code. However, although os.environ changes call os.putenv, direct calls to os.putenv do not update os.environ to reflect the change. Because of this, the os.environ mapping interface is generally preferred to os.putenv.

Also note that environment settings are loaded into os.environ on startup and not on each fetch; hence, changes made by linked-in C code after startup may not be reflected in os.environ. Python does have a more focused os.getenv call today, but it is simply translated into an os.environ fetch on most platforms (or all, in 3.X), not into a call to getenv in the C library. Most applications won’t need to care, especially if they are pure Python code. On platforms without a putenv call, os.environ can be passed as a parameter to program startup tools to set the spawned program’s environment.
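The asymmetry between os.environ and os.putenv is easy to check for yourself. In the following sketch, the variable names PP4E_DEMO_A and PP4E_DEMO_B are made up for the demonstration, and it assumes a platform where os.putenv is available.

```python
import os

os.environ.pop('PP4E_DEMO_A', None)          # start from a known state

os.putenv('PP4E_DEMO_A', 'spam')             # direct call: bypasses os.environ
print('PP4E_DEMO_A' in os.environ)           # the mapping was not updated
print(os.getenv('PP4E_DEMO_A'))              # getenv reads the same mapping

os.environ['PP4E_DEMO_B'] = 'eggs'           # mapping assignment: updates both
print(os.environ['PP4E_DEMO_B'])
```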

Standard Streams

The sys module is also the place where the standard input, output, and error streams of your Python programs live; these turn out to be another common way for programs to communicate:

>>> import sys
>>> for f in (sys.stdin, sys.stdout, sys.stderr): print(f)
...
<_io.TextIOWrapper name='<stdin>' encoding='cp437'>
<_io.TextIOWrapper name='<stdout>' encoding='cp437'>
<_io.TextIOWrapper name='<stderr>' encoding='cp437'>

The standard streams are simply preopened Python file objects that are automatically connected to your program’s standard streams when Python starts up. By default, all of them are tied to the console window where Python (or a Python program) was started. Because the print and input built-in functions are really nothing more than user-friendly interfaces to the standard output and input streams, they are similar to using stdout and stdin in sys directly:

>>> print('hello stdout world')
hello stdout world

>>> sys.stdout.write('hello stdout world' + '\n')
hello stdout world
19

>>> input('hello stdin world>')
hello stdin world>spam
'spam'

>>> print('hello stdin world>'); sys.stdin.readline()[:-1]
hello stdin world>
eggs
'eggs'
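Because the streams are ordinary file objects, the same write call works on any of them, and print can be pointed at a different stream with its file argument. Here is a minimal sketch, in which say is a hypothetical helper name:

```python
import sys

def say(text, stream=None):
    """Write text plus a newline to a stream; default is standard output."""
    stream = stream or sys.stdout
    return stream.write(text + '\n')         # returns the characters written

say('hello stdout world')                    # like print('hello stdout world')
print('hello stderr world', file=sys.stderr) # print routed to standard error
```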

Windows users: if you click a .py Python program’s filename in a Windows file explorer to start it (or launch it with os.system), a DOS console window automatically pops up to serve as the program’s standard stream. If your program makes windows of its own, you can avoid this console pop-up window by naming your program’s source-code file with a .pyw extension, not with a .py extension. The .pyw extension simply means a .py source file without a DOS pop up on Windows (it uses Windows registry settings to run a custom version of Python). A .pyw file may also be imported as usual.

Also note that because printed output goes to this DOS pop up when a program is clicked, scripts that simply print text and exit will generate an odd “flash”—the DOS console box pops up, output is printed into it, and the pop up goes away immediately (not the most user-friendly of features!). To keep the DOS pop-up box around so that you can read printed output, simply add an input() call at the bottom of your script to pause for an Enter key press before exiting.

Redirecting Streams to Files and Programs

Technically, standard output (and print) text appears in the console window where a program was started, standard input (and input) text comes from the keyboard, and standard error text is used to print Python error messages to the console window. At least that’s the default. It’s also possible to redirect these streams both to files and to other programs at the system shell, as well as to arbitrary objects within a Python script. On most systems, such redirections make it easy to reuse and combine general-purpose command-line utilities.

Redirection is useful for things like canned (precoded) test inputs: we can apply a single test script to any set of inputs by simply redirecting the standard input stream to a different file each time the script is run. Similarly, redirecting the standard output stream lets us save and later analyze a program’s output; for example, testing systems might compare the saved standard output of a script with a file of expected output to detect failures.

Although it’s a powerful paradigm, redirection turns out to be straightforward to use. For instance, consider the simple read-evaluate-print loop program in Example 3-5.

Example 3-5. PP4E\System\Streams\teststreams.py

"read numbers till eof and show squares"

def interact():
    print('Hello stream world')                     # print sends to sys.stdout
    while True:
        try:
            reply = input('Enter a number>')        # input reads sys.stdin
        except EOFError:
            break                                   # raises an except on eof
        else:                                       # input given as a string
            num = int(reply)
            print("%d squared is %d" % (num, num ** 2))
    print('Bye')

if __name__ == '__main__':
    interact()                                      # when run, not imported

As usual, the interact function here is automatically executed when this file is run, not when it is imported. By default, running this file from a system command line makes that standard stream appear where you typed the Python command. The script simply reads numbers until it reaches end-of-file in the standard input stream (on Windows, end-of-file is usually the two-key combination Ctrl-Z; on Unix, type Ctrl-D instead^[8]):

C:\...\PP4E\System\Streams> python teststreams.py
Hello stream world
Enter a number>12
12 squared is 144
Enter a number>10
10 squared is 100
Enter a number>^Z
Bye

But on both Windows and Unix-like platforms, we can redirect the standard input stream to come from a file with the < filename shell syntax. Here is a command session in a DOS console box on Windows that forces the script to read its input from a text file, input.txt. It’s the same on Linux, but replace the DOS type command with a Unix cat command:

C:\...\PP4E\System\Streams> type input.txt
8
6

C:\...\PP4E\System\Streams> python teststreams.py < input.txt
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye

Here, the input.txt file automates the input we would normally type interactively—the script reads from this file rather than from the keyboard. Standard output can be similarly redirected to go to a file with the > filename shell syntax. In fact, we can combine input and output redirection in a single command:

C:\...\PP4E\System\Streams> python teststreams.py < input.txt > output.txt

C:\...\PP4E\System\Streams> type output.txt
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye

This time, the Python script’s input and output are both mapped to text files, not to the interactive console session.
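A similar effect can be had inside a single Python process by temporarily rebinding sys.stdin and sys.stdout to in-memory io.StringIO objects, which is handy for the canned-input testing described earlier. The sketch below repeats the interact function of Example 3-5 so that it is self-contained. The run_redirected helper is a hypothetical name, and this is one possible approach rather than the only one.

```python
import io
import sys

def interact():                              # same logic as Example 3-5
    print('Hello stream world')
    while True:
        try:
            reply = input('Enter a number>')
        except EOFError:
            break
        else:
            num = int(reply)
            print("%d squared is %d" % (num, num ** 2))
    print('Bye')

def run_redirected(func, text):
    """Run func with stdin fed from text; return everything it printed."""
    saved = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = io.StringIO(text), io.StringIO()
    try:
        func()
        return sys.stdout.getvalue()
    finally:
        sys.stdin, sys.stdout = saved        # always restore the real streams

output = run_redirected(interact, '8\n6\n')
print(output, end='')
```

The captured text can then be compared with a string or file of expected output, much as a testing system would compare saved output files.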

Chaining programs with pipes

On Windows and Unix-like platforms, it’s also possible to send the standard output of one program to the standard input of another using the | shell character between two commands. This is usually called a “pipe” operation because the shell creates a pipeline that connects the output and input of two commands. Let’s send the output of the Python script to the standard more command-line program’s input to see how this works:

C:\...\PP4E\System\Streams> python teststreams.py < input.txt | more

Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye

Here, teststreams’s standard input comes from a file again, but its output (written by print calls) is sent to another program, not to a file or window. The receiving program is more, a standard command-line paging program available on Windows and Unix-like platforms. Because Python ties scripts into the standard stream model, though, Python scripts can be used on both ends. One Python script’s output can always be piped into another Python script’s input:

C:\...\PP4E\System\Streams> type writer.py
print("Help! Help! I'm being repressed!")
print(42)

C:\...\PP4E\System\Streams> type reader.py
print('Got this: "%s"' % input())
import sys
data = sys.stdin.readline()[:-1]
print('The meaning of life is', data, int(data) * 2)

C:\...\PP4E\System\Streams> python writer.py
Help! Help! I'm being repressed!
42

C:\...\PP4E\System\Streams> python writer.py | python reader.py
Got this: "Help! Help! I'm being repressed!"
The meaning of life is 42 84

This time, two Python programs are connected. Script reader gets input from script writer; both scripts simply read and write, oblivious to stream mechanics. In practice, such chaining of programs is a simple form of cross-program communications. It makes it easy to reuse utilities written to communicate via stdin and stdout in ways we never anticipated. For instance, a Python program that sorts stdin text could be applied to any data source we like, including the output of other scripts. Consider the Python command-line utility scripts in Examples 3-6 and 3-7 which sort and sum lines in the standard input stream.

Example 3-6. PP4E\System\Streams\sorter.py

import sys                                  # or sorted(sys.stdin)
lines = sys.stdin.readlines()               # sort stdin input lines,
lines.sort()                                # send result to stdout
for line in lines: print(line, end='')      # for further processing

Example 3-7. PP4E\System\Streams\adder.py

import sys
sum = 0
while True:
    try:
        line = input()                     # or call sys.stdin.readlines()
    except EOFError:                       # or for line in sys.stdin:
        break                              # input strips \n at end
    else:
        sum += int(line)                   # was string.atoi() in 2nd ed
print(sum)

We can apply such general-purpose tools in a variety of ways at the shell command line to sort and sum arbitrary files and program outputs (Windows note: on my prior XP machine and Python 2.X, I had to type “python file.py” here, not just “file.py,” or else the input redirection failed; with Python 3.X on Windows 7 today, either form works):

C:...PP4ESystemStreams> type data.txt
123
000
999
042

C:...PP4ESystemStreams> python sorter.py < data.txt            sort a file
000
042
123
999

C:...PP4ESystemStreams> python adder.py < data.txt             sum file
1164

C:...PP4ESystemStreams> type data.txt | python adder.py        sum type output
1164

C:...PP4ESystemStreams> type writer2.py
for data in (123, 0, 999, 42):
    print('%03d' % data)

C:...PP4ESystemStreams> python writer2.py | python sorter.py   sort py output
000
042
123
999

C:...PP4ESystemStreams> writer2.py | sorter.py                 shorter form
...same output as prior command on Windows...

C:...PP4ESystemStreams> python writer2.py | python sorter.py | python adder.py
1164

The last command here connects three Python scripts by standard streams—the output of each prior script is fed to the input of the next via pipeline shell syntax.

Coding alternatives for adders and sorters

A few coding pointers here: if you look closely, you’ll notice that sorter.py reads all of stdin at once with the readlines method, but adder.py reads one line at a time. If the input source is another program, some platforms run programs connected by pipes in parallel. On such systems, reading line by line works better if the data streams being passed are large, because readers don’t have to wait until writers are completely finished to get busy processing data. Because input just reads stdin, the line-by-line scheme used by adder.py can always be coded with manual sys.stdin reads too:

C:...PP4ESystemStreams> type adder2.py
import sys
sum = 0
while True:
    line = sys.stdin.readline()
    if not line: break
    sum += int(line)
print(sum)

This version utilizes the fact that int allows the digits to be surrounded by whitespace (readline returns a line including its \n, but we don’t have to use [:-1] or rstrip() to remove it for int). In fact, we can use Python’s more recent file iterators to achieve the same effect: the for loop, for example, automatically grabs one line each time through when we iterate over a file object directly (more on file iterators in the next chapter):

C:...PP4ESystemStreams> type adder3.py
import sys
sum = 0
for line in sys.stdin: sum += int(line)
print(sum)

Changing sorter to read line by line this way may not be a big performance boost, though, because the list sort method requires that the list already be complete. As we’ll see in Chapter 18, manually coded sort algorithms are generally prone to be much slower than the Python list sorting method.

Interestingly, these two scripts can also be coded in a much more compact fashion in Python 2.4 and later by using the new sorted built-in function, generator expressions, and file iterators. The following work the same way as the originals, with noticeably less source-file real estate:

C:...PP4ESystemStreams> type sorterSmall.py
import sys
for line in sorted(sys.stdin): print(line, end='')

C:...PP4ESystemStreams> type adderSmall.py
import sys
print(sum(int(line) for line in sys.stdin))

In its argument to sum, the latter of these employs a generator expression, which is much like a list comprehension, but results are returned one at a time, not in a physical list. The net effect is space optimization. For more details, see a core language resource, such as the book Learning Python.
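To make the difference concrete, here is a minimal sketch that sums the same lines both ways. It assumes an in-memory io.StringIO object (named fake_stdin here purely for illustration) as a stand-in for sys.stdin, so it can be run without any shell redirection:

```python
import io

fake_stdin = io.StringIO('123\n000\n999\n042\n')   # hypothetical stand-in for sys.stdin

as_list = [int(line) for line in fake_stdin]       # builds the whole list in memory
fake_stdin.seek(0)                                 # rewind the in-memory stream
as_gen = (int(line) for line in fake_stdin)        # produces one value per request

list_total = sum(as_list)
gen_total = sum(as_gen)                            # consumes the generator lazily
```

Both totals agree; the generator form simply never holds all of the converted numbers at once, which matters only when the input is large.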

Redirected Streams and User Interaction

Earlier in this section, we piped teststreams.py output into the standard more command-line program with a command like this:

C:...PP4ESystemStreams> python teststreams.py < input.txt | more

But since we already wrote our own “more” paging utility in Python in the preceding chapter, why not set it up to accept input from stdin too? For example, if we change the last three lines of the more.py file listed as Example 2-1 in the prior chapter…

if __name__ == '__main__':                       # when run, not when imported
    import sys
    if len(sys.argv) == 1:                       # page stdin if no cmd args
        more(sys.stdin.read())
    else:
        more(open(sys.argv[1]).read())

…it almost seems as if we should be able to redirect the standard output of teststreams.py into the standard input of more.py:

C:...PP4ESystemStreams> python teststreams.py < input.txt | python ..\more.py
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye

This technique generally works for Python scripts. Here, teststreams.py takes input from a file again. And, as in the last section, one Python program’s output is piped to another’s input—the more.py script in the parent (..) directory.

But there’s a subtle problem lurking in the preceding more.py command. Really, chaining worked there only by sheer luck: if the first script’s output is long enough that more has to ask the user if it should continue, the script will utterly fail (specifically, when input for user interaction triggers EOFError).

The problem is that the augmented more.py uses stdin for two disjointed purposes. It reads a reply from an interactive user on stdin by calling input, but now it also accepts the main input text on stdin. When the stdin stream is really redirected to an input file or pipe, we can’t use it to input a reply from an interactive user; it contains only the text of the input source. Moreover, because stdin is redirected before the program even starts up, there is no way to know what it meant prior to being redirected in the command line.

If we intend to accept input on stdin and use the console for user interaction, we have to do a bit more: we would also need to use special interfaces to read user replies from a keyboard directly, instead of standard input. On Windows, Python’s standard library msvcrt module provides such tools; on many Unix-like platforms, reading from device file /dev/tty will usually suffice.

Since this is an arguably obscure use case, we’ll delegate a complete solution to a suggested exercise. Example 3-8 shows a Windows-only modified version of the more script that pages the standard input stream if called with no arguments, but also makes use of lower-level and platform-specific tools to converse with a user at a keyboard if needed.

Example 3-8. PP4E\System\Streams\moreplus.py

"""
split and interactively page a string, file, or stream of
text to stdout; when run as a script, page stdin or file
whose name is passed on cmdline; if input is stdin, can't
use it for user reply--use platform-specific tools or GUI;
"""

import sys

def getreply():
    """
    read a reply key from an interactive user
    even if stdin redirected to a file or pipe
    """
    if sys.stdin.isatty():                       # if stdin is console
        return input('?')                        # read reply line from stdin
    else:
        if sys.platform[:3] == 'win':            # if stdin was redirected
            import msvcrt                        # can't use to ask a user
            msvcrt.putch(b'?')
            key = msvcrt.getche()                # use windows console tools
            msvcrt.putch(b'\n')                  # getch() does not echo key
            return key
        else:
            assert False, 'platform not supported'
            #linux?: open('/dev/tty').readline()[:-1]

def more(text, numlines=10):
    """
    page multiline string to stdout
    """
    lines = text.splitlines()
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print(line)
        if lines and getreply() not in [b'y', b'Y', 'y', 'Y']: break

if __name__ == '__main__':                       # when run, not when imported
    if len(sys.argv) == 1:                       # if no command-line arguments
        more(sys.stdin.read())                   # page stdin, no inputs
    else:
        more(open(sys.argv[1]).read())           # else page filename argument

Most of the new code in this version shows up in its getreply function. The file’s isatty method tells us whether stdin is connected to the console; if it is, we simply read replies on stdin as before. Of course, we have to add such extra logic only to scripts that intend to interact with console users and take input on stdin. In a GUI application, for example, we could instead pop up dialogs, bind keyboard-press events to run callbacks, and so on (we’ll meet GUIs in Chapter 7).
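For Unix-like platforms, the branch that Example 3-8 leaves as an assert might be filled in along the following lines. This is only a sketch under stated assumptions: the function name reply_from_device is made up here, the prompt is sent to stderr so that it does not mix with paged text on stdout, and /dev/tty is assumed to name the controlling terminal, which holds on many but not all systems:

```python
import sys

def reply_from_device(prompt='?', devname='/dev/tty'):
    """
    hypothetical helper: prompt on stderr and read one reply line
    from a terminal device file, bypassing a redirected stdin
    """
    sys.stderr.write(prompt)
    sys.stderr.flush()                            # show the prompt before blocking
    with open(devname) as device:                 # assumed: the controlling terminal
        return device.readline().rstrip('\n')     # drop the end-of-line
```

Unlike the msvcrt calls, this reads a full line, so the user must press Enter after the reply key. The devname argument exists mainly so the function can be tried against an ordinary file.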

Armed with the reusable getreply function, though, we can safely run our moreplus utility in a variety of ways. As before, we can import and call this module’s function directly, passing in whatever string we wish to page:

>>> from moreplus import more
>>> more(open('adderSmall.py').read())
import sys
print(sum(int(line) for line in sys.stdin))

Also as before, when run with a command-line argument, this script interactively pages through the named file’s text:

C:...PP4ESystemStreams> python moreplus.py adderSmall.py
import sys
print(sum(int(line) for line in sys.stdin))

C:...PP4ESystemStreams> python moreplus.py moreplus.py
"""
split and interactively page a string, file, or stream of
text to stdout; when run as a script, page stdin or file
whose name is passed on cmdline; if input is stdin, can't
use it for user reply--use platform-specific tools or GUI;
"""

import sys

def getreply():
?n

But now the script also correctly pages text redirected into stdin from either a file or a command pipe, even if that text is too long to fit in a single display chunk. On most shells, we send such input via redirection or pipe operators like these:

C:...PP4ESystemStreams> python moreplus.py < moreplus.py
"""
split and interactively page a string, file, or stream of
text to stdout; when run as a script, page stdin or file
whose name is passed on cmdline; if input is stdin, can't
use it for user reply--use platform-specific tools or GUI;
"""

import sys

def getreply():
?n

C:...PP4ESystemStreams> type moreplus.py | python moreplus.py
"""
split and interactively page a string, file, or stream of
text to stdout; when run as a script, page stdin or file
whose name is passed on cmdline; if input is stdin, can't
use it for user reply--use platform-specific tools or GUI;
"""

import sys

def getreply():
?n

Finally, piping one Python script’s output into this script’s input now works as expected, without botching user interaction (and not just because we got lucky):

......SystemStreams> python teststreams.py < input.txt | python moreplus.py
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye

Here, the standard output of one Python script is fed to the standard input of another Python script located in the same directory: moreplus.py reads the output of teststreams.py.

All of the redirections in such command lines work only because scripts don’t care what standard input and output really are—interactive users, files, or pipes between programs. For example, when run as a script, moreplus.py simply reads stream sys.stdin; the command-line shell (e.g., DOS on Windows, csh on Linux) attaches such streams to the source implied by the command line before the script is started. Scripts use the preopened stdin and stdout file objects to access those sources, regardless of their true nature.

And for readers keeping count, we have just run this single more pager script in four different ways: by importing and calling its function, by passing a filename command-line argument, by redirecting stdin to a file, and by piping a command’s output to stdin. By supporting importable functions, command-line arguments, and standard streams, Python system tools code can be reused in a wide variety of modes.

Redirecting Streams to Python Objects

All of the previous standard stream redirections work for programs written in any language that hook into the standard streams and rely more on the shell’s command-line processor than on Python itself. Command-line redirection syntax like < filename and | program is evaluated by the shell, not by Python. A more Pythonesque form of redirection can be done within scripts themselves by resetting sys.stdin and sys.stdout to file-like objects.

The main trick behind this mode is that anything that looks like a file in terms of methods will work as a standard stream in Python. The object’s interface (sometimes called its protocol), and not the object’s specific datatype, is all that matters. That is:

Any object that provides file-like read methods can be assigned to sys.stdin to make input come from that object’s read methods.
Any object that defines file-like write methods can be assigned to sys.stdout; all standard output will be sent to that object’s methods.

Because print and input simply call the write and readline methods of whatever objects sys.stdout and sys.stdin happen to reference, we can use this technique to both provide and intercept standard stream text with objects implemented as classes.
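As a minimal sketch of this protocol, the following two objects implement nothing but write and readline, yet print and input accept them as streams. The class names Recorder and Canned are invented for this illustration and are not part of the book's examples:

```python
import sys

class Recorder:
    """hypothetical output object: supports only write"""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)          # print may call write more than once

class Canned:
    """hypothetical input object: supports only readline"""
    def __init__(self, lines):
        self.lines = list(lines)
    def readline(self):
        return self.lines.pop(0) if self.lines else ''   # '' means end-of-file

saved = sys.stdin, sys.stdout
sys.stdin, sys.stdout = Canned(['spam\n']), Recorder()
try:
    reply = input('Enter>')               # prompt goes to write, reply comes from readline
    print('Got', reply)
    captured = ''.join(sys.stdout.chunks)
finally:
    sys.stdin, sys.stdout = saved         # always restore the real streams
```

After this runs, reply holds the canned line without its end-of-line, and captured holds both the prompt and the printed text.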

If you’ve already studied Python, you probably know that such plug-and-play compatibility is usually called polymorphism—it doesn’t matter what an object is, and it doesn’t matter what its interface does, as long as it provides the expected interface. This liberal approach to datatypes accounts for much of the conciseness and flexibility of Python code. Here, it provides a way for scripts to reset their own streams. Example 3-9 shows a utility module that demonstrates this concept.

Example 3-9. PP4E\System\Streams\redirect.py

"""
file-like objects that save standard output text in a string and provide
standard input text from a string; redirect runs a passed-in function
with its output and input streams reset to these file-like class objects;
"""

import sys                                      # get built-in modules

class Output:                                   # simulated output file
    def __init__(self):
        self.text = ''                          # empty string when created
    def write(self, string):                    # add a string of bytes
        self.text += string
    def writelines(self, lines):                # add each line in a list
        for line in lines: self.write(line)

class Input:                                    # simulated input file
    def __init__(self, input=''):               # default argument
        self.text = input                       # save string when created
    def read(self, size=None):                  # optional argument
        if size == None:                        # read N bytes, or all
            res, self.text = self.text, ''
        else:
            res, self.text = self.text[:size], self.text[size:]
        return res
    def readline(self):
        eoln = self.text.find('\n')             # find offset of next eoln
        if eoln == -1:                          # slice off through eoln
            res, self.text = self.text, ''
        else:
            res, self.text = self.text[:eoln+1], self.text[eoln+1:]
        return res

def redirect(function, pargs, kargs, input):    # redirect stdin/out
    savestreams = sys.stdin, sys.stdout         # run a function object
    sys.stdin   = Input(input)                  # return stdout text
    sys.stdout  = Output()
    try:
        result = function(*pargs, **kargs)      # run function with args
        output = sys.stdout.text
    finally:
        sys.stdin, sys.stdout = savestreams     # restore if exc or not
    return (result, output)                     # return result if no exc

This module defines two classes that masquerade as real files:

Output: Provides the write method interface (a.k.a. protocol) expected of output files but saves all output in an in-memory string as it is written.
Input: Provides the interface expected of input files, but provides input on demand from an in-memory string passed in at object construction time.

The redirect function at the bottom of this file combines these two objects to run a single function with input and output redirected entirely to Python class objects. The passed-in function to run need not know or care that its print and input function calls and stdin and stdout method calls are talking to a class rather than to a real file, pipe, or user.

To demonstrate, import and run the interact function at the heart of the teststreams script of Example 3-5 that we’ve been running from the shell (to use the redirection utility function, we need to deal in terms of functions, not files). When run directly, the function reads from the keyboard and writes to the screen, just as if it were run as a program without redirection:

C:...PP4ESystemStreams> python
>>> from teststreams import interact
>>> interact()
Hello stream world
Enter a number>2
2 squared is 4
Enter a number>3
3 squared is 9
Enter a number^Z
Bye
>>>

Now, let’s run this function under the control of the redirection function in redirect.py and pass in some canned input text. In this mode, the interact function takes its input from the string we pass in ('4\n5\n6\n', three lines with explicit end-of-line characters), and the result of running the function is a tuple with its return value plus a string containing all the text written to the standard output stream:

>>> from redirect import redirect
>>> (result, output) = redirect(interact, (), {}, '4\n5\n6\n')
>>> print(result)
None
>>> output
'Hello stream world\nEnter a number>4 squared is 16\nEnter a number>5 squared is 25\nEnter a number>6 squared is 36\nEnter a number>Bye\n'

The output is a single, long string containing the concatenation of all text written to standard output. To make this look better, we can pass it to print or split it up with the string object’s splitlines method:

>>> for line in output.splitlines(): print(line)
...
Hello stream world
Enter a number>4 squared is 16
Enter a number>5 squared is 25
Enter a number>6 squared is 36
Enter a number>Bye

Better still, we can reuse the more.py module we wrote in the preceding chapter (Example 2-1); it’s less to type and remember, and it’s already known to work well (the following, like all cross-directory imports in this book’s examples, assumes that the directory containing the PP4E root is on your module search path—change your PYTHONPATH setting as needed):

>>> from PP4E.System.more import more
>>> more(output)
Hello stream world
Enter a number>4 squared is 16
Enter a number>5 squared is 25
Enter a number>6 squared is 36
Enter a number>Bye

This is an artificial example, of course, but the techniques illustrated are widely applicable. For instance, it’s straightforward to add a GUI interface to a program written to interact with a command-line user. Simply intercept standard output with an object such as the Output class instance shown earlier and throw the text string up in a window. Similarly, standard input can be reset to an object that fetches text from a graphical interface (e.g., a popped-up dialog box). Because classes are plug-and-play compatible with real files, we can use them in any tool that expects a file. Watch for a GUI stream-redirection module named guiStreams in Chapter 10 that provides a concrete implementation of some of these ideas.

The io.StringIO and io.BytesIO Utility Classes

The prior section’s technique of redirecting streams to objects proved so handy that now a standard library module automates the task for many use cases (though some use cases, such as GUIs, may still require more custom code). The standard library tool provides an object that maps a file object interface to and from in-memory strings. For example:

>>> from io import StringIO
>>> buff = StringIO()                   # save written text to a string
>>> buff.write('spam\n')
5
>>> buff.write('eggs\n')
5
>>> buff.getvalue()
'spam\neggs\n'

>>> buff = StringIO('ham\nspam\n')      # provide input from a string
>>> buff.readline()
'ham\n'
>>> buff.readline()
'spam\n'
>>> buff.readline()
''

As in the prior section, instances of StringIO objects can be assigned to sys.stdin and sys.stdout to redirect streams for input and print calls and can be passed to any code that was written to expect a real file object. Again, in Python, the object interface, not the concrete datatype, is the name of the game:

>>> from io import StringIO
>>> import sys
>>> buff = StringIO()

>>> temp = sys.stdout
>>> sys.stdout = buff
>>> print(42, 'spam', 3.141)              # or print(..., file=buff)

>>> sys.stdout = temp                     # restore original stream
>>> buff.getvalue()
'42 spam 3.141\n'
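The save-and-restore steps in this interaction can also be packaged up. As a hedged aside, the standard library's contextlib.redirect_stdout, which assumes Python 3.4 or later and so postdates the release these examples were run on, performs the same reassignment and restores the original stream automatically when its block exits:

```python
import sys
from contextlib import redirect_stdout    # assumes Python 3.4 or later
from io import StringIO

buff = StringIO()
original = sys.stdout
with redirect_stdout(buff):               # sys.stdout is buff inside this block
    print(42, 'spam', 3.141)
restored = sys.stdout is original         # the original stream is back here
text = buff.getvalue()
```

The effect is the same as the manual version; the context manager just guarantees the restore step even if the code in the block raises an exception.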

Note that there is also an io.BytesIO class with similar behavior, but which maps file operations to an in-memory bytes buffer, instead of a str string:

>>> from io import BytesIO
>>> stream = BytesIO()
>>> stream.write(b'spam')
4
>>> stream.getvalue()
b'spam'

>>> stream = BytesIO(b'dpam')
>>> stream.read()
b'dpam'

Due to the sharp distinction that Python 3.X draws between text and binary data, this alternative may be better suited for scripts that deal with binary data. We’ll learn more about the text-versus-binary issue in the next chapter when we explore files.
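A brief sketch of what that distinction means in practice: each class accepts only its own string type, so mixing them raises an exception rather than converting silently. The variable names here are illustrative only:

```python
from io import BytesIO, StringIO

text_buff = StringIO()
text_buff.write('spam')                   # str is accepted by StringIO
try:
    text_buff.write(b'spam')              # bytes is not
    text_error = None
except TypeError as exc:
    text_error = exc

byte_buff = BytesIO()
byte_buff.write('spam'.encode('utf8'))    # encode str to bytes before writing
try:
    byte_buff.write('spam')               # str is rejected by BytesIO
    byte_error = None
except TypeError as exc:
    byte_error = exc
```

Both mismatched writes fail with TypeError, which is why a script must pick the class that matches the kind of data it handles, or encode and decode explicitly at the boundary.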

Capturing the stderr Stream

We’ve been focusing on stdin and stdout redirection, but stderr can be similarly reset to files, pipes, and objects. Although some shells support this, it’s also straightforward within a Python script. For instance, assigning sys.stderr to another instance of a class such as Output or a StringIO object in the preceding section’s example allows your script to intercept text written to standard error, too.

Python itself uses standard error for error message text (and the IDLE GUI interface intercepts it and colors it red by default). However, no higher-level tools for standard error do what print and input do for the output and input streams. If you wish to print to the error stream, you’ll want to call sys.stderr.write() explicitly or read the next section for a print call trick that makes this easier.
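As a minimal sketch of intercepting this stream from within a script, the following swaps in a StringIO object for sys.stderr, writes to it explicitly, and restores the original afterward. The message text is made up for the example:

```python
import sys
from io import StringIO

saved = sys.stderr
sys.stderr = StringIO()                   # intercept standard error text
try:
    sys.stderr.write('Bad input\n')       # explicit write calls: no print equivalent
    sys.stderr.write('Giving up\n')
    errors = sys.stderr.getvalue()
finally:
    sys.stderr = saved                    # restore whether an exception occurred or not
```

The try/finally pairing mirrors the redirect function of Example 3-9, and it matters more here than for stdout: if the stream is not restored, later error messages vanish into the buffer.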

Redirecting standard errors from a shell command line is a bit more complex and less portable. On most Unix-like systems, we can usually capture stderr output by using shell-redirection syntax of the form command > output 2>&1. This may not work on some platforms, though, and can even vary per Unix shell; see your shell’s manpages for more details.

Redirection Syntax in Print Calls

Because resetting the stream attributes to new objects was so popular, the Python print built-in is also extended to include an explicit file to which output is to be sent. A statement of this form:

print(stuff, file=afile)            # afile is an object, not a string name

prints stuff to afile instead of to sys.stdout. The net effect is similar to simply assigning sys.stdout to an object, but there is no need to save and restore in order to return to the original output stream (as shown in the section on redirecting streams to objects). For example:

import sys
print('spam' * 2, file=sys.stderr)

will send text to the standard error stream object rather than sys.stdout for the duration of this single print call only. The next normal print call (without file) prints to standard output as usual. Similarly, we can use either our custom class or the standard library’s class as the output file with this hook:

>>> from io import StringIO
>>> buff = StringIO()
>>> print(42, file=buff)
>>> print('spam', file=buff)
>>> print(buff.getvalue())
42
spam

>>> from redirect import Output
>>> buff = Output()
>>> print(43, file=buff)
>>> print('eggs', file=buff)
>>> print(buff.text)
43
eggs

Other Redirection Options: os.popen and subprocess Revisited

Near the end of the preceding chapter, we took a first look at the built-in os.popen function and its subprocess.Popen relative, which provide a way to redirect another command’s streams from within a Python program. As we saw, these tools can be used to run a shell command line (a string we would normally type at a DOS or csh prompt) but also provide a Python file-like object connected to the command’s output stream—reading the file object allows a script to read another program’s output. I suggested that these tools may be used to tap into input streams as well.

Because of that, the os.popen and subprocess tools are another way to redirect streams of spawned programs and are close cousins to some of the techniques we just met. Their effect is much like the shell | command-line pipe syntax for redirecting streams to programs (in fact, their names mean “pipe open”), but they are run within a script and provide a file-like interface to piped streams. They are similar in spirit to the redirect function, but are based on running programs (not calling functions), and the command’s streams are processed in the spawning script as files (not tied to class objects). These tools redirect the streams of a program that a script starts, instead of redirecting the streams of the script itself.

Redirecting input or output with os.popen

In fact, by passing in the desired mode flag, we can redirect either a spawned program’s output or input stream to a file object in the calling script, and we can obtain the spawned program’s exit status code from the close method (None means “no error” here). To illustrate, consider the following two scripts:

C:...PP4ESystemStreams> type hello-out.py
print('Hello shell world')

C:...PP4ESystemStreams> type hello-in.py
inp = input()
open('hello-in.txt', 'w').write('Hello ' + inp + '\n')

These scripts can be run from a system shell window as usual:

C:...PP4ESystemStreams> python hello-out.py
Hello shell world

C:...PP4ESystemStreams> python hello-in.py
Brian

C:...PP4ESystemStreams> type hello-in.txt
Hello Brian

As we saw in the prior chapter, Python scripts can read output from other programs and scripts like these, too, using code like the following:

C:...PP4ESystemStreams> python
>>> import os
>>> pipe = os.popen('python hello-out.py')         # 'r' is default--read stdout
>>> pipe.read()
'Hello shell world\n'
>>> print(pipe.close())                            # exit status: None is good
None

But Python scripts can also provide input to spawned programs’ standard input streams—passing a “w” mode argument, instead of the default “r”, connects the returned object to the spawned program’s input stream. What we write on the spawning end shows up as input in the program started:

>>> pipe = os.popen('python hello-in.py', 'w')     # 'w'--write to program stdin
>>> pipe.write('Gumby\n')
6
>>> pipe.close()                                   # \n at end is optional
>>> open('hello-in.txt').read()                    # output sent to a file
'Hello Gumby\n'

The popen call is also smart enough to run the command string as an independent process on platforms that support such a notion. It accepts an optional third argument that can be used to control buffering of written text, which we’ll finesse here.

Redirecting input and output with subprocess

For even more control over the streams of spawned programs, we can employ the subprocess module we introduced in the preceding chapter. As we learned earlier, this module can emulate os.popen functionality, but it can also achieve feats such as bidirectional stream communication (accessing both a program’s input and output) and tying the output of one program to the input of another.

For instance, this module provides multiple ways to spawn a program and get both its standard output text and exit status. Here are three common ways to leverage this module to start a program and redirect its output stream (recall from Chapter 2 that you may need to pass a shell=True argument to Popen and call to make this section’s examples work on Unix-like platforms as they are coded here):

C:...PP4ESystemStreams> python
>>> from subprocess import Popen, PIPE, call
>>> X = call('python hello-out.py')                            # convenience
Hello shell world
>>> X
0

>>> pipe = Popen('python hello-out.py', stdout=PIPE)
>>> pipe.communicate()[0]                                      # (stdout, stderr)
b'Hello shell world\r\n'
>>> pipe.returncode                                            # exit status
0

>>> pipe = Popen('python hello-out.py', stdout=PIPE)
>>> pipe.stdout.read()
b'Hello shell world\r\n'
>>> pipe.wait()                                                # exit status
0

The call in the first of these three techniques is just a convenience function (there are more of these which you can look up in the Python library manual), and the communicate in the second is roughly a convenience for the third (it sends data to stdin, reads data from stdout until end-of-file, and waits for the process to end).
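On Unix-like platforms, an alternative to shell=True is to pass the command as a list of words, which needs no shell to split it. The following is a sketch rather than a drop-in replacement: it assumes sys.executable names the interpreter to run, and it uses an inline -c program as a stand-in for hello-out.py so that it is self-contained:

```python
import sys
from subprocess import Popen, PIPE

# sys.executable is the running interpreter; the -c text stands in for hello-out.py
command = [sys.executable, '-c', "print('Hello shell world')"]

pipe = Popen(command, stdout=PIPE)        # a list of words needs no shell to split it
output, errors = pipe.communicate()       # errors is None: stderr was not redirected
status = pipe.returncode
text = output.decode().rstrip()           # bytes to str, minus the platform's line end
```

Decoding and stripping the result hides the difference between the \r\n that Windows pipes deliver and the \n seen elsewhere.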

Redirecting and connecting to the spawned program’s input stream is just as simple, though a bit more complex than the os.popen approach with 'w' file mode shown in the preceding section (as mentioned in the last chapter, os.popen is implemented with subprocess, and is thus itself just something of a convenience function today):

>>> pipe = Popen('python hello-in.py', stdin=PIPE)
>>> pipe.stdin.write(b'Pokey\n')
6
>>> pipe.stdin.close()
>>> pipe.wait()
0
>>> open('hello-in.txt').read()                       # output sent to a file
'Hello Pokey\n'

In fact, we can obtain both the input and output streams of a spawned program with this module. Let’s reuse the simple writer and reader scripts we wrote earlier to demonstrate:

C:...PP4ESystemStreams> type writer.py
print("Help! Help! I'm being repressed!")
print(42)

C:...PP4ESystemStreams> type reader.py
print('Got this: "%s"' % input())
import sys
data = sys.stdin.readline()[:-1]
print('The meaning of life is', data, int(data) * 2)

Code like the following can both read from and write to the reader script—the pipe object has two file-like objects available as attached attributes, one connecting to the input stream, and one to the output (Python 2.X users might recognize these as equivalent to the tuple returned by the now-defunct os.popen2):

>>> pipe = Popen('python reader.py', stdin=PIPE, stdout=PIPE)
>>> pipe.stdin.write(b'Lumberjack\n')
11
>>> pipe.stdin.write(b'12\n')
3
>>> pipe.stdin.close()
>>> output = pipe.stdout.read()
>>> pipe.wait()
0
>>> output
b'Got this: "Lumberjack"\r\nThe meaning of life is 12 24\r\n'

As we’ll learn in Chapter 5, we have to be cautious when talking back and forth to a program like this; buffered output streams can lead to deadlock if writes and reads are interleaved, and we may eventually need to consider tools like the Pexpect utility as a workaround (more on this later).
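When all of the input is known up front, one way to sidestep the interleaving problem is to hand the whole input to communicate, which writes it, closes the program's stdin, reads all output, and waits for exit in a single step. A minimal sketch follows; the inline child program is a hypothetical stand-in for reader.py, used only so the example runs on its own:

```python
import sys
from subprocess import Popen, PIPE

# hypothetical stand-in for reader.py, passed inline so the sketch is self-contained
child = "print('Got this: \"%s\"' % input()); print(int(input()) * 2)"

pipe = Popen([sys.executable, '-c', child], stdin=PIPE, stdout=PIPE)
output, _ = pipe.communicate(b'Lumberjack\n12\n')   # write all, close, read all, wait
lines = output.decode().splitlines()                # splitlines handles \n and \r\n
```

This does not help with true back-and-forth conversations, where each write depends on an earlier read; those are the cases the deadlock warning applies to.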

Finally, even more exotic stream control is possible—the following connects two programs, by piping the output of one Python script into another, first with shell syntax, and then with the subprocess module:

C:...PP4ESystemStreams> python writer.py | python reader.py
Got this: "Help! Help! I'm being repressed!"
The meaning of life is 42 84

C:...PP4ESystemStreams> python
>>> from subprocess import Popen, PIPE
>>> p1 = Popen('python writer.py', stdout=PIPE)
>>> p2 = Popen('python reader.py', stdin=p1.stdout, stdout=PIPE)
>>> output = p2.communicate()[0]
>>> output
b'Got this: "Help! Help! I\'m being repressed!"\r\nThe meaning of life is 42 84\r\n'
>>> p2.returncode
0
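
The same two-program connection can also be coded with argument lists instead of command-line strings, which avoids depending on how a given platform parses the command. This is a sketch rather than a replacement for the session above: writer_code and reader_code are hypothetical inline copies of the two scripts, and the extra p1.stdout.close() call is an assumption borrowed from common practice, so that the first program is notified if the second one exits early.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical inline copies of writer.py and reader.py
writer_code = "print(\"Help! Help! I'm being repressed!\"); print(42)"
reader_code = (
    "import sys\n"
    "print('Got this: \"%s\"' % input())\n"
    "data = sys.stdin.readline()[:-1]\n"
    "print('The meaning of life is', data, int(data) * 2)\n"
)

p1 = Popen([sys.executable, '-c', writer_code], stdout=PIPE)
p2 = Popen([sys.executable, '-c', reader_code], stdin=p1.stdout, stdout=PIPE)

p1.stdout.close()                 # parent drops its copy of the pipe's read end
output = p2.communicate()[0]      # read p2's output to end-of-file, then wait
p1.wait()                         # collect the first program's exit status too

pipeline_lines = output.decode().splitlines()
print(pipeline_lines)
```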

We can get close to this with os.popen, but the fact that its pipes are read or write (and not both) prevents us from catching the second script’s output in our code:

>>> import os
>>> p1 = os.popen('python writer.py', 'r')
>>> p2 = os.popen('python reader.py', 'w')
>>> p2.write( p1.read() )
36
>>> X = p2.close()
Got this: "Help! Help! I'm being repressed!"
The meaning of life is 42 84
>>> print(X)
None

From the broader perspective, the os.popen call and subprocess module are Python’s portable equivalents of Unix-like shell syntax for redirecting the streams of spawned programs. The Python versions also work on Windows, though, and are the most platform-neutral way to launch another program from a Python script. The command-line strings you pass to them may vary per platform (e.g., a directory listing requires an ls on Unix but a dir on Windows), but the call itself works on all major Python platforms.
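
As a small illustration of that last point, a script can choose the command string by platform and leave the launching call unchanged. This is only a sketch: the listing_cmd name is made up here, and the test on sys.platform is one simple way to tell Windows from Unix-like systems, not the only one.

```python
import os
import sys

# Only the command string varies per platform; the os.popen call does not
listing_cmd = 'dir' if sys.platform.startswith('win') else 'ls'

with os.popen(listing_cmd) as pipe:     # read-mode pipe to the spawned command
    listing = pipe.read()               # the command's output, as one string

print(listing)
```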

On Unix-like platforms, the combination of the calls os.fork, os.pipe, os.dup, and some os.exec variants can also be used to start a new independent program with streams connected to the parent program’s streams. As such, it’s yet another way to redirect streams and a low-level equivalent to tools such as os.popen (os.fork is available in Cygwin’s Python on Windows).
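
To make that combination a bit more concrete, here is a minimal sketch that captures a child program's output stream by hand. It runs on Unix-like platforms only; the function name run_with_captured_output is hypothetical, and it uses os.dup2 (a close relative of os.dup that copies onto a chosen descriptor number) plus os.execvp as one of the exec variants.

```python
import os
import sys

def run_with_captured_output(argv):
    """Run argv as a new program; return (stdout bytes, exit code). Unix only."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make descriptor 1 (stdout) refer to the pipe, then become argv
        os.close(read_fd)
        os.dup2(write_fd, 1)
        os.close(write_fd)
        try:
            os.execvp(argv[0], argv)    # does not return if it succeeds
        finally:
            os._exit(127)               # reached only if the exec call failed
    # Parent: close the write end so reads hit end-of-file when the child exits
    os.close(write_fd)
    with os.fdopen(read_fd, 'rb') as stream:
        data = stream.read()
    _, status = os.waitpid(pid, 0)
    return data, os.WEXITSTATUS(status)

out, code = run_with_captured_output([sys.executable, '-c', 'print(6 * 7)'])
print(out, code)
```

This is roughly the bookkeeping that higher-level tools take care of for us, which is a good reason to prefer them unless a script needs this level of control.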

Since these are all more advanced parallel processing tools, though, we’ll defer further details on this front until Chapter 5, especially its coverage of pipes and exit status codes. And we’ll resurrect subprocess again in Chapter 6, to code a regression tester that intercepts all three standard streams of spawned test scripts—inputs, outputs, and errors.

But first, Chapter 4 continues our survey of Python system interfaces by exploring the tools available for processing files and directories. Although we’ll be shifting focus somewhat, we’ll find that some of what we’ve learned here will already begin to come in handy as general system-related tools. Spawning shell commands, for instance, provides ways to inspect directories, and the file interface we will expand on in the next chapter is at the heart of the stream processing techniques we have studied here.

If you are familiar with other common shell script languages, it might be useful to see how Python compares. Here is a simple script in a Unix shell language called csh that mails all the files in the current working directory with a suffix of .py (i.e., all Python source files) to a hopefully fictitious address:

#!/bin/csh
foreach x (*.py)
    echo $x
    mail [email protected] -s $x < $x
end

An equivalent Python script looks similar:

#!/usr/bin/python
import os, glob
for x in glob.glob('*.py'):
    print(x)
    os.system('mail [email protected] -s %s < %s' % (x, x))

but is slightly more verbose. Since Python, unlike csh, isn’t meant just for shell scripts, system interfaces must be imported and called explicitly. And since Python isn’t just a string-processing language, character strings must be enclosed in quotes, as in C.

Although this can add a few extra keystrokes in simple scripts like this, being a general-purpose language makes Python a better tool once we leave the realm of trivial programs. We could, for example, extend the preceding script to do things like transfer files by FTP, pop up a GUI message selector and status bar, fetch messages from an SQL database, and employ COM objects on Windows, all using standard Python tools.

Python scripts also tend to be more portable to other platforms than csh. For instance, if we used the Python SMTP interface module to send mail instead of relying on a Unix command-line mail tool, the script would run on any machine with Python and an Internet link (as we’ll see in Chapter 13, SMTP requires only sockets). And like C, we don’t need $ to evaluate variables; what else would you expect in a free language?
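
As a rough sketch of that more portable approach, the following builds one message per matching file with the standard email package and hands them to smtplib. Everything specific here is an assumption made for illustration: the function names, the placeholder example.com addresses (the address in the scripts above is obscured in this copy, so it is not reproduced), and the default 'localhost' server, which must be replaced with a mail server you are allowed to use.

```python
import glob
import smtplib
from email.message import EmailMessage

def build_messages(sender, recipient, pattern='*.py'):
    """Make one message per matching file: subject is the name, body its text."""
    messages = []
    for name in sorted(glob.glob(pattern)):
        msg = EmailMessage()
        msg['From'] = sender
        msg['To'] = recipient
        msg['Subject'] = name
        with open(name, encoding='utf-8', errors='replace') as source:
            msg.set_content(source.read())
        messages.append(msg)
    return messages

def send_all(messages, host='localhost', port=25):
    """Send every message through one SMTP connection (host is an assumption)."""
    with smtplib.SMTP(host, port) as server:
        for msg in messages:
            server.send_message(msg)

if __name__ == '__main__':
    # Placeholder addresses; nothing is sent unless send_all is called
    for msg in build_messages('me@example.com', 'you@example.com'):
        print(msg['Subject'])
```

Building the messages is kept separate from sending them so that the first half can be tried out safely without any network connection.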

^[7]This is by default. Some program-launching tools also let scripts pass environment settings that are different from their own to child programs. For instance, the os.spawnve call is like os.spawnv, but it accepts a dictionary argument representing the shell environment to be passed to the started program. Some os.exec* variants (ones with an “e” at the end of their names) similarly accept explicit environments; see the os.exec* call formats in Chapter 5 for more details.

^[8]Notice that input raises an exception to signal end-of-file, but file read methods simply return an empty string for this condition. Because input also strips the end-of-line character at the end of lines, an empty string result means an empty line, so an exception is necessary to specify the end-of-file condition. File read methods retain the end-of-line character and denote an empty line as "\n" instead of "". This is one way in which reading sys.stdin directly differs from input. The latter also accepts a prompt string that is automatically printed before input is accepted.
