This chapter covers most of the issues related to dealing with files and filesystems in Python. A file is a stream of text or bytes that a program can read and/or write; a filesystem is a hierarchical repository of files on a computer system.
Because files are such a crucial concept in programming, even though this chapter is the largest one in the book, several other chapters also contain material that is relevant when you’re handling specific kinds of files. In particular, Chapter 11 deals with many kinds of files related to persistence and database functionality (JSON files in “The json Module”, pickle files in “The pickle and cPickle Modules”, shelve files in “The shelve Module”, DBM and DBM-like files in “The v3 dbm Package”, and SQLite database files in “SQLite”), Chapter 22 deals with files and other streams in HTML format, and Chapter 23 deals with files and other streams in XML format.
Files and streams come in many flavors: their contents can be arbitrary bytes or text (with various encodings, if the underlying storage or channel deals only with bytes, as most do); they may be suitable for reading, writing, or both; they may or may not be buffered; they may or may not allow “random access,” going back and forth in the file (a stream whose underlying channel is an Internet socket, for example, only allows sequential, “going forward” access—there’s no “going back and forth”).
Traditionally, old-style Python coalesced most of this diverse functionality into the built-in file object, working in different ways depending on how the built-in function open created it. These built-ins are still in v2, for backward compatibility.
In both v2 and v3, however, input/output (I/O) is more logically structured, within the standard library’s io module. In v3, the built-in function open is, in fact, simply an alias for the function io.open. In v2, the built-in open still works the old-fashioned way, creating and returning an old-fashioned built-in file object (a type that does not exist anymore in v3). However, you can from io import open to use, instead, the new and better structured io.open: the file-like objects it returns operate quite similarly to the old-fashioned built-in file objects in most simple cases, and you can call the new function in a similar way to the old-fashioned built-in function open, too. (Alternatively, of course, in both v2 and v3, you can practice the excellent principle “explicit is better than implicit” by doing import io and then explicitly using io.open.)
If you have to maintain old code using built-in file objects in complicated ways, and don’t want to port that code to the newer approach, use the online docs as a reference to the details of old-fashioned built-in file
objects. This book (and, specifically, the start of this chapter) does not cover old-fashioned built-in file
objects, just io.open
and the various classes from the io
module.
Immediately after that, this chapter covers the polymorphic concept of file-like objects (objects that are not actually files but behave to some extent like files) in “File-Like Objects and Polymorphism”.
The chapter next covers modules that deal with temporary files and file-like objects (tempfile
in “The tempfile Module”, and io.StringIO
and io.BytesIO
in “In-Memory “Files”: io.StringIO and io.BytesIO”).
Next comes the coverage of modules that help you access the contents of text and binary files (fileinput
in “The fileinput Module”, linecache
in “The linecache Module”, and struct
in “The struct Module”) and support compressed files and other data archives (gzip
in “The gzip Module”, bz2
in “The bz2 Module”, tarfile
in “The tarfile Module”, zipfile
in “The zipfile Module”, and zlib
in “The zlib Module”). v3 also supports LZMA compression, as used, for example, by the xz
program: we don’t cover that issue in this book, but see the online docs and PyPI for a backport to v2.
In Python, the os
module supplies many of the functions that operate on the filesystem, so this chapter continues by introducing the os
module in “The os Module”. The chapter then covers, in “Filesystem Operations”, operations on the filesystem (comparing, copying, and deleting directories and files; working with file paths; and accessing low-level file descriptors) offered by os
(in “File and Directory Functions of the os Module”), os.path
(in “The os.path Module”), and other modules (dircache
under listdir
in Table 10-3, stat
in “The stat Module”, filecmp
in “The filecmp Module”, fnmatch
in “The fnmatch Module”, glob
in “The glob Module”, and shutil
in “The shutil Module”). We do not cover the module pathlib
, supplying an object-oriented approach to filesystem paths, since, as of this writing, it has been included in the standard library only on a provisional basis, meaning it can undergo backward-incompatible changes, up to and including removal of the module; if you nevertheless want to try it out, see the online docs, and PyPI for a v2 backport.
While most modern programs rely on a graphical user interface (GUI), often via a browser or a smartphone app, text-based, nongraphical “command-line” user interfaces are still useful, since they’re simple, fast to program, and lightweight. This chapter concludes with material about text input and output in Python in “Text Input and Output”, richer text I/O in “Richer-Text I/O”, interactive command-line sessions in “Interactive Command Sessions”, and, finally, a subject generally known as internationalization (often abbreviated i18n). Building software that processes text understandable to different users, across languages and cultures, is described in “Internationalization”.
As mentioned in “Organization of This Chapter”, io
is a standard library module in Python and provides the most common ways for your Python programs to read or write files. Use io.open
to make a Python “file” object—which, depending on what parameters you pass to io.open
, can in fact be an instance of io.TextIOWrapper
if textual, or, if binary, io.BufferedReader
, io.BufferedWriter
, or io.BufferedRandom
, depending on whether it’s read-only, write-only, or read-write—to read and/or write data to a file as seen by the underlying operating system. We refer to these as “file” objects, in quotes, to distinguish them from the old-fashioned built-in file
type still present in v2.
In v3, the built-in function open
is a synonym for io.open
. In v2, use from io import open
to get the same effect. We use io.open
explicitly (assuming a previous import io
has executed, of course), for clarity and to avoid ambiguity.
This section covers such “file” objects, as well as the important issue of making and using temporary files (on disk, or even in memory).
Python reacts to any I/O error related to a “file” object by raising an instance of built-in exception class IOError
(in v3, that’s a synonym for OSError
, but many useful subclasses exist, and are covered in “OSError and subclasses (v3 only)”). Errors that cause this exception include open
failing to open a file, calls to a method on a “file” to which that method doesn’t apply (e.g., calling write
on a read-only “file,” or calling seek
on a nonseekable file)—which could also cause ValueError
or AttributeError
—and I/O errors diagnosed by a “file” object’s methods.
The io
module also provides the underlying web of classes, both abstract and concrete, that, by inheritance and by composition (also known as wrapping), make up the “file” objects (instances of classes mentioned in the first paragraph of this section) that your program generally uses. We do not cover these advanced topics in this book. If you have access to unusual channels for data, or nonfilesystem data storage, and want to provide a “file” interface to those channels or storage, you can ease your task, by appropriate subclassing and wrapping, using other classes in the module io
. For such advanced tasks, consult the online docs.
To create a Python “file” object, call io.open
with the following syntax:
    open(file, mode='r', buffering=-1, encoding=None, errors='strict', newline=None, closefd=True, opener=os.open)
file
can be a string, in which case it’s any path to a file as seen by the underlying OS, or it can be an integer, in which case it’s an OS-level file descriptor as returned by os.open
(or, in v3 only, whatever function you pass as the opener
argument—opener
is not supported in v2). When file
is a string, open
opens the file thus named (possibly creating it, depending on mode
—despite its name, open
is not just for opening existing files: it can also create new ones); when file
is an integer, the underlying OS file must already be open (via os.open
or whatever).
The “file” object that open returns is a context manager: use with io.open(...) as f:, not f = io.open(...), to ensure the “file” f gets closed as soon as the with statement’s body is done.
open
creates and returns an instance f
of the appropriate class of the module io
, depending on mode and buffering—we refer to all such instances as “file” objects; they all are reasonably polymorphic with respect to each other.
mode
is a string indicating how the file is to be opened (or created). mode
can be:
'r'
The file must already exist, and it is opened in read-only mode.
'w'
The file is opened in write-only mode. The file is truncated to zero length and overwritten if it already exists, or created if it does not exist.
'a'
The file is opened in write-only mode. The file is kept intact if it already exists, and the data you write is appended to what’s already in the file. The file is created if it does not exist. Calling f.seek
on the file changes the result of the method f.tell
, but does not change the write position in the file.
'r+'
The file must already exist and is opened for both reading and writing, so all methods of f
can be called.
'w+'
The file is opened for both reading and writing, so all methods of f
can be called. The file is truncated and overwritten if it already exists, or created if it does not exist.
'a+'
The file is opened for both reading and writing, so all methods of f
can be called. The file is kept intact if it already exists, and the data you write is appended to what’s already in the file. The file is created if it does not exist. Calling f.seek
on the file, depending on the underlying operating system, may have no effect when the next I/O operation on f
writes data, but does work normally when the next I/O operation on f
reads data.
The mode
string may have any of the values just explained, followed by a b
or t
. b
means a binary file, while t
means a text one. When mode
has neither b
nor t
, the default is text (i.e., 'r'
is like 'rt'
, 'w'
is like 'wt'
, and so on).
Binary files let you read and/or write strings of type bytes
; text ones let you read and/or write Unicode text strings (str
in v3, unicode
in v2). For text files, when the underlying channel or storage system deals in bytes (as most do), encoding
(the name of an encoding known to Python) and errors
(an error-handler name such as 'strict'
, 'replace'
, and so on, as covered under decode
in Table 8-6) matter, as they specify how to translate between text and bytes, and what to do on encoding and decoding errors.
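The modes and the encoding argument described above can be sketched roughly as follows. This is only an illustration: the file name demo.txt and the use of a throwaway temporary directory are assumptions made for the example, not part of the original text.

```python
import io
import os
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')  # hypothetical file name

# 'w' creates (or truncates) the file; text mode encodes text to bytes.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\u00e9\n')

# 'a' keeps existing content and appends to it.
with io.open(path, 'a', encoding='utf-8') as f:
    f.write(u'second line\n')

# Text mode decodes the stored bytes back to Unicode text...
with io.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# ...while binary mode returns the raw bytes as stored.
with io.open(path, 'rb') as f:
    raw = f.read()

os.remove(path)
os.rmdir(workdir)
```

Reading the same file both ways shows the difference between the two kinds of “file”: text holds one character for the accented letter, while raw holds the two bytes of its UTF-8 encoding.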
buffering is an integer that denotes the buffering you’re requesting for the file. When buffering is less than 0, a default is used. Normally, this default is line buffering for files that correspond to interactive consoles, and a buffer of io.DEFAULT_BUFFER_SIZE bytes for other files. When buffering is 0, the file (which must be binary mode) is unbuffered; the effect is as if the file’s buffer were flushed every time you write anything to the file. When buffering equals 1, the file (which must be text mode) is line-buffered, which means the file’s buffer is flushed every time you write a newline ('\n') to the file. When buffering is greater than 1, the file uses a buffer of about buffering bytes, rounded up to some reasonable amount.
A “file” object f
is inherently sequential (a stream of bytes or text). When you read, you get bytes or text in the sequential order in which they’re present. When you write, the bytes or text you write are added in the order in which you write them.
To allow nonsequential access (also known as “random access”), a “file” object whose underlying storage allows this keeps track of its current position (the position in the underlying file where the next read or write operation starts transferring data). f.seekable()
returns True
when f
supports nonsequential access.
When you open a file, the initial position is at the start of the file. Any call to f
.write
on a “file” object f
opened with a mode
of 'a'
or 'a+'
always sets f
’s position to the end of the file before writing data to f
. When you write or read n
bytes to/from “file” object f
, f
’s position advances by n
. You can query the current position by calling f
.tell
and change the position by calling f
.seek
, both covered in the next section.
f.tell
and f.seek
also work on a text-mode f
, but in this case the offset you pass to f.seek
must be 0
(to position f
at the start or end, depending on f.seek
’s second parameter), or the opaque result previously returned by a call to f.tell
, to position f
back to a position you had thus “bookmarked” before.
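A small sketch of position handling may help here. It uses in-memory io.BytesIO and io.StringIO objects as stand-ins for seekable files on disk, which is an assumption made to keep the example self-contained.

```python
import io

# A binary in-memory "file" stands in for a seekable file on disk.
f = io.BytesIO(b'0123456789')

first = f.read(4)            # reading 4 bytes advances the position by 4
pos_after_read = f.tell()

f.seek(-3, io.SEEK_END)      # position 3 bytes before the end
tail = f.read()

# In text mode, treat the result of tell() as an opaque bookmark.
t = io.StringIO(u'alpha\nbeta\ngamma\n')
t.readline()
bookmark = t.tell()
second = t.readline()
t.seek(bookmark)             # return to the bookmarked position
again = t.readline()
```

The text-mode half only ever passes f.seek a value obtained from f.tell, which is the usage the paragraph above recommends.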
A “file” object f
supplies the attributes and methods documented in this section.
close |
Closes the file. You can call no other method on |
closed |
|
encoding |
|
flush |
Requests that |
isatty |
|
fileno |
Returns an integer, the file descriptor of |
mode |
|
name |
|
read |
In v2, or in v3 when |
readline |
Reads and returns one line from |
readlines |
Reads and returns a list of all lines in |
seek |
Sets When When |
tell |
Returns |
truncate |
Truncates |
write |
Writes the bytes of string |
writelines |
Like:
It does not matter whether the strings in iterable |
A “file” object f
, open for text-mode reading, is also an iterator whose items are the file’s lines. Thus, the loop:
    for line in f:
iterates on each line of the file. Due to buffering issues, interrupting such a loop prematurely (e.g., with break
), or calling next(f)
instead of f
.readline()
, leaves the file’s position set to an arbitrary value. If you want to switch from using f
as an iterator to calling other reading methods on f
, be sure to set the file’s position to a known value by appropriately calling f
.seek
. On the plus side, a loop directly on f
has very good performance, since these specifications allow the loop to use internal buffering to minimize I/O without taking up excessive amounts of memory even for huge files.
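As a minimal sketch of the iteration pattern just described, the following uses an io.StringIO object as a stand-in for a text file opened for reading (an assumption made so the example needs no file on disk).

```python
import io

# Stand-in for a text-mode "file" opened for reading.
f = io.StringIO(u'one\ntwo\nthree\n')

lengths = []
for line in f:
    # Each item keeps its trailing newline, so strip it before measuring.
    lengths.append(len(line.rstrip(u'\n')))

# Before switching to other reading methods, set a known position.
f.seek(0)
first_line = f.readline()
```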
An object x
is file-like when it behaves polymorphically to a “file” object as returned by io.open
, meaning that we can use x
“as if” x
were a “file.” Code using such an object (known as client code of the object) usually gets the object as an argument, or by calling a factory function that returns the object as the result. For example, if the only method that client code calls on x
is x
.read()
, without arguments, then all x
needs to supply in order to be file-like for that code is a method read
that is callable without arguments and returns a string. Other client code may need x
to implement a larger subset of file methods. File-like objects and polymorphism are not absolute concepts: they are relative to demands placed on an object by some specific client code.
Polymorphism is a powerful aspect of object-oriented programming, and file-like objects are a good example of polymorphism. A client-code module that writes to or reads from files can automatically be reused for data residing elsewhere, as long as the module does not break polymorphism by the dubious practice of type checking. When we discussed built-ins type
and isinstance
in Table 7-1, we mentioned that type checking is often best avoided, as it blocks the normal polymorphism that Python otherwise supplies. Most often, to support polymorphism in your client code, all you have to do is avoid type checking.
You can implement a file-like object by coding your own class (as covered in Chapter 4) and defining the specific methods needed by client code, such as read
. A file-like object fl
need not implement all the attributes and methods of a true “file” object f
. If you can determine which methods the client code calls on fl
, you can choose to implement only that subset. For example, when fl
is only going to be written, fl
doesn’t need “reading” methods, such as read
, readline
, and readlines
.
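To illustrate, here is a sketch of a write-only file-like object. The class UppercaseWriter and the client function report are hypothetical names invented for this example; the point is only that the client code never needs to know it was not handed a real “file.”

```python
class UppercaseWriter(object):
    """Hypothetical write-only file-like object that collects uppercased text."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text.upper())
        return len(text)

    def writelines(self, lines):
        for line in lines:
            self.write(line)


def report(out):
    # Client code: it calls only out.write, so any object with a
    # suitable write method is file-like enough for it.
    out.write(u'status: ok\n')
    out.write(u'items: 3\n')


w = UppercaseWriter()
report(w)
result = u''.join(w.chunks)
```

Because report performs no type checking, the same function would work unchanged with a “file” returned by io.open or with an io.StringIO instance.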
If the main reason you want a file-like object instead of a real file object is to keep the data in memory, use the io
module’s classes StringIO
and BytesIO
, covered in “In-Memory “Files”: io.StringIO and io.BytesIO”. These classes supply “file” objects that hold data in memory and largely behave polymorphically to other “file” objects.
The tempfile
module lets you create temporary files and directories in the most secure manner afforded by your platform. Temporary files are often a good solution when you’re dealing with an amount of data that might not comfortably fit in memory, or when your program must write data that another process later uses.
The order of the parameters for the functions in this module is a bit confusing: to make your code more readable, always call these functions with named-argument syntax. The tempfile
module exposes the functions and classes outlined in Table 10-1.
mkdtemp |
Securely creates a new temporary directory that is readable, writable, and searchable only by the current user, and returns the absolute path to the temporary directory. The optional arguments
|
mkstemp |
Securely creates a new temporary file, readable and writable only by the current user, not executable, not inherited by subprocesses; returns a pair Ensuring that the temporary file is removed when you’re done using it is up to you:
|
SpooledTemporaryFile |
Just like |
TemporaryFile |
Creates a temporary file with |
NamedTemporaryFile |
Like |
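A brief sketch of two of these functions follows, called with named arguments as suggested above. The suffix and prefix values are arbitrary choices for illustration.

```python
import os
import tempfile

# mkstemp returns an OS-level file descriptor and the file's absolute path;
# removing the file afterward is the caller's responsibility.
fd, path = tempfile.mkstemp(suffix='.txt', prefix='demo_', text=True)
try:
    with os.fdopen(fd, 'w') as f:    # wrap the descriptor in a "file" object
        f.write('scratch data\n')
    with open(path) as f:
        content = f.read()
finally:
    os.remove(path)

# TemporaryFile removes its file automatically when it is closed.
with tempfile.TemporaryFile(mode='w+b') as tmp:
    tmp.write(b'abc')
    tmp.seek(0)
    data = tmp.read()
```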
“File” objects supply the minimal functionality needed for file I/O. Some auxiliary Python library modules, however, offer convenient supplementary functionality, making I/O even easier and handier in several important cases.
The fileinput
module lets you loop over all the lines in a list of text files. Performance is good, comparable to the performance of direct iteration on each file, since buffering is used to minimize I/O. You can therefore use module fileinput
for line-oriented file input whenever you find the module’s rich functionality convenient, with no worry about performance. The input
function is the key function of module fileinput
; the module also supplies a FileInput
class whose methods support the same functionality as the module’s functions. The module contents are listed here:
close |
Closes the whole sequence so that iteration stops and no file remains open. |
FileInput |
Creates and returns an instance |
filelineno |
Returns the number of lines read so far from the file now being read. For example, returns |
filename |
Returns the name of the file being read, or |
input |
Returns the sequence of lines in the files, suitable for use in a The sequence object that When
You can optionally pass an |
isfirstline |
Returns |
isstdin |
Returns |
lineno |
Returns the total number of lines read since the call to |
nextfile |
Closes the file being read: the next line to read is the first one of the next file. |
Here’s a typical example of using fileinput
for a “multifile search and replace,” changing one string into another throughout the text files whose names were passed as command-line arguments to the script:
    from __future__ import print_function  # needed only in v2
    import fileinput
    for line in fileinput.input(inplace=True):
        print(line.replace('foo', 'bar'), end='')
In such cases it’s important to have the end='' argument to print, since each line has its line-end character ('\n') at the end, and you need to ensure that print doesn’t add another (or else each file would end up “double-spaced”).
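For read-only use, the module’s bookkeeping functions can be sketched as follows. The two small files a.txt and b.txt are created only for this illustration, and the files argument is passed explicitly rather than taken from the command line.

```python
import fileinput
import os
import tempfile

# Create two small text files to loop over (names are illustrative).
workdir = tempfile.mkdtemp()
paths = []
for name, text in [('a.txt', 'a1\na2\n'), ('b.txt', 'b1\n')]:
    p = os.path.join(workdir, name)
    with open(p, 'w') as f:
        f.write(text)
    paths.append(p)

seen = []
for line in fileinput.input(files=paths):
    # lineno counts across all files; filelineno restarts for each file.
    seen.append((os.path.basename(fileinput.filename()),
                 fileinput.filelineno(),
                 fileinput.lineno(),
                 line.rstrip('\n')))
fileinput.close()

for p in paths:
    os.remove(p)
os.rmdir(workdir)
```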
The linecache
module lets you read a given line (specified by number) from a file with a given name, keeping an internal cache so that, when you read several lines from a file, it’s faster than opening and examining the file each time. The linecache
module exposes the following functions:
checkcache |
Ensures that the module’s cache holds no stale data and reflects what’s on the filesystem. Call |
clearcache |
Drops the module’s cache so that the memory can be reused for other purposes. Call |
getline |
Reads and returns the |
getlines |
Reads and returns all lines from the text file named |
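A short sketch of typical linecache use, assuming a small text file created just for the example:

```python
import linecache
import os
import tempfile

# Create a small text file to read from.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first\nsecond\nthird\n')

line2 = linecache.getline(path, 2)      # line numbers start at 1
missing = linecache.getline(path, 99)   # out-of-range requests give ''

linecache.clearcache()                  # release the cached lines
os.remove(path)
```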
The struct
module lets you pack binary data into a bytestring, and unpack the bytes of such a bytestring back into the data they represent. Such operations are useful for many kinds of low-level programming. Most often, you use struct
to interpret data records from binary files that have some specified format, or to prepare records to write to such binary files. The module’s name comes from C’s keyword struct
, which is usable for related purposes. On any error, functions of the module struct
raise exceptions that are instances of the exception class struct.error
, the only class the module supplies.
The struct
module relies on struct format strings following a specific syntax. The first character of a format string gives byte order, size, and alignment of packed data:
@
Native byte order, native data sizes, and native alignment for the current platform; this is the default if the first character is none of the characters listed here (note that format P
in Table 10-2 is available only for this kind of struct
format string). Look at string sys.byteorder
when you need to check your system’s byte order ('little'
or 'big'
).
=
Native byte order for the current platform, but standard size and alignment.
<
Little-endian byte order (like Intel platforms); standard size and alignment.
>, !
Big-endian byte order (network standard); standard size and alignment.
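Character | C type | Python type | Standard size |
---|---|---|---|
B | unsigned char | int | 1 byte |
b | signed char | int | 1 byte |
c | char | bytes (length 1) | 1 byte |
d | double | float | 8 bytes |
f | float | float | 4 bytes |
H | unsigned short | int | 2 bytes |
h | signed short | int | 2 bytes |
I | unsigned int | int | 4 bytes |
i | signed int | int | 4 bytes |
L | unsigned long | int | 4 bytes |
l | signed long | int | 4 bytes |
P | void* | int | N/A |
p | char[] | bytes | N/A |
s | char[] | bytes | N/A |
x | padding byte | no value | 1 byte |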
Standard sizes are indicated in Table 10-2. Standard alignment means no forced alignment, with explicit padding bytes used if needed. Native sizes and alignment are whatever the platform’s C compiler uses. Native byte order can put the most significant byte at either the lowest (big-endian) or highest (little-endian) address, depending on the platform.
After the optional first character, a format string is made up of one or more format characters, each optionally preceded by a count (an integer represented by decimal digits). (The format characters are shown in Table 10-2.) For most format characters, the count means repetition (e.g., '3h'
is exactly the same as 'hhh'
). When the format character is s
or p
—that is, a bytestring—the count is not a repetition: it’s the total number of bytes in the string. Whitespace can be freely used between formats, but not between a count and its format character.
Format s
means a fixed-length bytestring as long as its count (the Python string is truncated, or padded with copies of the null byte b'\0'
, if needed). The format p
means a “Pascal-like” bytestring: the first byte is the number of significant bytes that follow, and the actual contents start from the second byte. The count is the total number of bytes, including the length byte.
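The format-string rules above can be illustrated with a small sketch. The record layout (a version number, a signed delta, and a 4-byte tag) is invented for the example.

```python
import struct

# '<' selects little-endian order with standard sizes and no alignment padding:
# H = 2-byte unsigned short, i = 4-byte signed int, 4s = 4-byte bytestring.
fmt = '<Hi4s'
record = struct.pack(fmt, 1, -2, b'ab')   # b'ab' is padded with null bytes

size = struct.calcsize(fmt)
version, delta, tag = struct.unpack(fmt, record)

# 'p' stores a length byte first; the count (5) includes that byte.
pascal = struct.pack('5p', b'hey')
```

Note that unpacking an s field returns all the bytes of the field, padding included, so tag comes back as four bytes rather than the two that were passed in.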
The struct
module supplies the following functions:
You can implement file-like objects by writing Python classes that supply the methods you need. If all you want is for data to reside in memory, rather than on a file as seen by the operating system, use the class StringIO
or BytesIO
of the io
module. The difference between them is that instances of StringIO
are text-mode “files,” so reads and writes consume or produce Unicode strings, while instances of BytesIO
are binary “files,” so reads and writes consume or produce bytestrings.
When you instantiate either class you can optionally pass a string argument, respectively Unicode or bytes, to use as the initial content of the “file.” An instance f
of either class, in addition to “file” methods, supplies one extra method:
getvalue |
Returns the current data contents of |
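A minimal sketch of both classes, showing the optional initial content and the getvalue method:

```python
import io

# StringIO is a text-mode "file": it consumes and produces Unicode strings.
s = io.StringIO(u'initial ')
s.seek(0, io.SEEK_END)       # move past the initial content before writing
s.write(u'text')
text_value = s.getvalue()

# BytesIO is a binary "file": it consumes and produces bytestrings.
b = io.BytesIO()
b.write(b'\x00\x01')
b.write(b'\x02')
bytes_value = b.getvalue()   # the whole contents, regardless of position
```

The explicit seek matters: a newly created instance starts at position 0, so writing without it would overwrite the start of the initial content.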
Storage space and transmission bandwidth are increasingly cheap and abundant, but in many cases you can save such resources, at the expense of some extra computational effort, by using compression. Computational power grows cheaper and more abundant even faster than some other resources, such as bandwidth, so compression’s popularity keeps growing. Python makes it easy for your programs to support compression, since the Python standard library contains several modules dedicated to compression.
Since Python offers so many ways to deal with compression, some guidance may be helpful. Files containing data compressed with the zlib
module are not automatically interchangeable with other programs, except for those files built with the zipfile
module, which respects the standard format of ZIP file archives. You can write custom programs, with any language able to use InfoZip’s free zlib compression library, to read files produced by Python programs using the zlib
module. However, if you need to interchange compressed data with programs coded in other languages but have a choice of compression methods, we suggest you use the modules bz2
(best), gzip
, or zipfile
instead. The zlib
module, however, may be useful when you want to compress some parts of datafiles that are in some proprietary format of your own and need not be interchanged with any other program except those that make up your application.
In v3 only, you can also use the newer module lzma
for even (marginally) better compression and compatibility with the newer xz
utility. We do not cover lzma
in this book; see the online docs, and, for use in v2, the v2 backport.
The gzip
module lets you read and write files compatible with those handled by the powerful GNU compression programs gzip
and gunzip
. The GNU programs support many compression formats, but the module gzip
supports only the gzip format, often denoted by appending the extension .gz to a filename. The gzip
module supplies the GzipFile
class and an open
factory function:
GzipFile |
Creates and returns a file-like object
The file-like object |
open |
Like |
Say that you have some function f(x) that writes Unicode text to a text file object x passed in as an argument, by calling x.write and/or x.writelines. To make f write text to a gzip-compressed file instead:
    import gzip, io
    with io.open('x.txt.gz', 'wb') as underlying:
        with gzip.GzipFile(fileobj=underlying, mode='wb') as wrapper:
            f(io.TextIOWrapper(wrapper, 'utf8'))
This example opens the underlying binary file x.txt.gz and explicitly wraps it with gzip.GzipFile
; thus, we need two nested with
statements. This separation is not strictly necessary: we could pass the filename directly to gzip.GzipFile
(or gzip.open
); in v3 only, with gzip.open
, we could even ask for mode='wt'
and have the TextIOWrapper
transparently provided for us. However, the example is coded to be maximally explicit, and portable between v2 and v3.
Reading back a compressed text file—for example, to display it on standard output—uses a pretty similar structure of code:
    from __future__ import print_function  # needed only in v2
    import gzip, io
    with io.open('x.txt.gz', 'rb') as underlying:
        with gzip.GzipFile(fileobj=underlying, mode='rb') as wrapper:
            for line in wrapper:
                print(line.decode('utf8'), end='')
Here, we can’t just use an io.TextIOWrapper
, since, in v2, it would not be iterable by line, given the characteristics of the underlying (decompressing) wrapper. However, the explicit decode
of each line works fine in both v2 and v3.
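For comparison, the shorter v3-only form mentioned earlier, where gzip.open is asked for a text mode, might look like the following sketch. The temporary directory is an assumption made so that the example cleans up after itself.

```python
import gzip
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'x.txt.gz')

# v3 only: text modes make gzip.open supply the TextIOWrapper itself.
with gzip.open(path, mode='wt', encoding='utf8') as out:
    out.write('hello\nworld\n')

with gzip.open(path, mode='rt', encoding='utf8') as inp:
    lines = list(inp)

os.remove(path)
os.rmdir(workdir)
```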
The bz2
module lets you read and write files compatible with those handled by the compression programs bzip2
and bunzip2
, which often achieve even better compression than gzip
and gunzip
. Module bz2
supplies the BZ2File
class, for transparent file compression and decompression, and functions compress
and decompress
to compress and decompress data strings in memory. It also provides objects to compress and decompress data incrementally, enabling you to work with data streams that are too large to comfortably fit in memory at once. For the latter, advanced functionality, consult the Python standard library’s online docs.
For richer functionality in v2, consider the third-party module bz2file, with more complete features, matching v3’s standard library module’s ones as listed here:
BZ2File |
Creates and returns a file-like object
|
compress |
Compresses string |
decompress |
Decompresses the compressed data string |
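A minimal in-memory round trip with the two functions, using made-up sample data:

```python
import bz2

original = b'spam and eggs ' * 100

packed = bz2.compress(original, 9)    # 9 requests the strongest compression
restored = bz2.decompress(packed)
```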
The tarfile
module lets you read and write TAR files (archive files compatible with those handled by popular archiving programs such as tar
), optionally with either gzip
or bzip2
compression (and, in v3 only, lzma
too). In v3 only, python -m tarfile
offers a useful command-line interface to the module’s functionality: run it without further arguments to get a brief help message.
When handling invalid TAR files, functions of tarfile
raise instances of tarfile.TarError
. The tarfile
module supplies the following classes and functions:
is_tarfile |
Returns |
TarInfo |
The methods
To check the type of
|
open |
Creates and returns a
In the mode strings specifying compression, you can use a vertical bar A |
add |
Adds to archive |
addfile |
Adds to archive |
close |
Closes archive |
extract |
Extracts the archive member identified by |
extractfile |
Extracts the archive member identified by |
getmember |
Returns a |
getmembers |
Returns a list of |
getnames |
Returns a list of strings, the names of each member in archive |
gettarinfo |
Returns a |
list |
Outputs a directory of the archive |
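The following sketch writes a gzip-compressed TAR archive containing one member built in memory, then reads it back. The archive name demo.tar.gz and the member name greeting.txt are illustrative assumptions.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'demo.tar.gz')

payload = b'hello from a tar member\n'

# 'w:gz' writes a gzip-compressed archive.
with tarfile.open(archive, 'w:gz') as t:
    info = tarfile.TarInfo(name='greeting.txt')
    info.size = len(payload)              # addfile reads exactly info.size bytes
    t.addfile(info, io.BytesIO(payload))

# 'r:gz' reads it back: getnames lists members, extractfile reads one.
with tarfile.open(archive, 'r:gz') as t:
    names = t.getnames()
    content = t.extractfile('greeting.txt').read()

is_tar = tarfile.is_tarfile(archive)
os.remove(archive)
os.rmdir(workdir)
```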
The zipfile
module can read and write ZIP files (i.e., archive files compatible with those handled by popular compression programs such as zip
and unzip
, pkzip
and pkunzip
, WinZip
, and so on). python -m zipfile
offers a useful command-line interface to the module’s functionality: run it without further arguments to get a brief help message.
Detailed information about ZIP files is at the pkware and info-zip web pages. You need to study that detailed information to perform advanced ZIP file handling with zipfile
. If you do not specifically need to interoperate with other programs using the ZIP file standard, the modules gzip
and bz2
are often better ways to deal with file compression.
The zipfile
module can’t handle multidisk ZIP files, and cannot create encrypted archives (however, it can decrypt them, albeit rather slowly). The module also cannot handle archive members using compression types besides the usual ones, known as stored (a file copied to the archive without compression) and deflated (a file compressed using the ZIP format’s default algorithm). (In v3 only, zipfile
also handles compression types bzip2 and lzma, but beware: not all tools, including v2’s zipfile
, can handle those, so if you use them you’re sacrificing some portability to get better compression.) For errors related to invalid .zip files, functions of zipfile
raise exceptions that are instances of the exception class zipfile.error
.
The zipfile
module supplies the following classes and functions:
is_zipfile |
Returns |
ZipInfo |
The methods
|
ZipFile |
Opens a ZIP file named by string When
When A In addition,
|
A ZipFile
instance z
supplies the following methods:
close |
Closes archive file |
extract |
Extract an archive member to disk, to the directory Returns the path to the file it has created (or overwritten if it already existed), or to the directory it has created (or left alone if it already existed). |
extractall |
Extract archive members to disk (by default, all of them), to directory |
getinfo |
Returns a |
infolist |
Returns a list of |
namelist |
Returns a list of strings, the name of each member in archive |
open |
Extracts and returns the archive member identified by |
printdir |
Outputs a textual directory of the archive |
read |
Extracts the archive member identified by |
setpassword |
Sets string |
testzip |
Reads and checks the files in archive |
write |
Writes the file named by string |
writestr |
Besides being faster and more concise, Here’s how you can print a list of all files contained in the ZIP file archive created by the previous example, followed by each file’s name and contents:
|
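As a small illustration of the ZipFile methods listed above, here is a minimal sketch that builds an archive in memory and reads it back. The member names and contents are made up for the example, and an in-memory io.BytesIO buffer stands in for a file on disk.

```python
import io
import zipfile

# Build a small archive in memory; member names and contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as z:
    z.writestr('hello.txt', 'hello, world\n')
    z.writestr('data/numbers.txt', '1 2 3\n')

# Reopen the same bytes for reading, then collect each member's contents.
buffer.seek(0)
with zipfile.ZipFile(buffer) as z:
    names = z.namelist()
    contents = {name: z.read(name) for name in names}
    bad_member = z.testzip()  # None when every member passes its CRC check

for name in names:
    print(name, contents[name])
```

Passing a filename instead of the buffer works the same way when the archive should live on disk.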
The zlib
module lets Python programs use the free InfoZip zlib compression library version 1.1.4 or later. zlib
is used by the modules gzip
and zipfile
, but is also available directly for any special compression needs. The most commonly used functions supplied by zlib
are:
compress |
Compresses string |
decompress |
Decompresses the compressed bytestring |
The zlib
module also supplies functions to compute Cyclic-Redundancy Check (CRC), which allows detection of damage in compressed data, as well as objects to compress and decompress data incrementally to allow working with data streams too large to fit in memory at once. For such advanced functionality, consult the Python library’s online docs.
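A minimal sketch of direct zlib use follows: it compresses a bytestring, decompresses it again, and computes a CRC. The sample data is invented, and masking the CRC with 0xffffffff is our choice so that the value is the same unsigned number in v2 and v3.

```python
import zlib

original = b'spam and eggs ' * 100      # repetitive data compresses well
packed = zlib.compress(original, 9)     # 9 requests the best compression
restored = zlib.decompress(packed)

# Mask the CRC so that it is a non-negative number in both v2 and v3.
checksum = zlib.crc32(original) & 0xffffffff

print(len(original), 'bytes compressed to', len(packed))
```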
os
is an umbrella module presenting a reasonably uniform cross-platform view of the capabilities of various operating systems. It supplies low-level ways to create and handle files and directories, and to create, manage, and destroy processes. This section covers filesystem-related functions of os
; “Running Other Programs with the os Module” covers process-related functions.
The os
module supplies a name
attribute, a string that identifies the kind of platform on which Python is being run. Common values for name
are 'posix'
(all kinds of Unix-like platforms, including Linux and macOS), 'nt'
(all kinds of Windows platforms), and 'java'
(Jython). You can exploit some unique capabilities of a platform through functions supplied by os
. However, this book deals with cross-platform programming, not with platform-specific functionality, so we do not cover parts of os
that exist only on one platform, nor platform-specific modules. Functionality covered in this book is available at least on 'posix'
and 'nt'
platforms. We do, though, cover some of the differences among the ways in which a given functionality is provided on various platforms.
When a request to the operating system fails, os
raises an exception, an instance of OSError
. os
also exposes built-in exception class OSError
with the synonym os.error
. Instances of OSError
expose three useful attributes:
errno
The numeric error code of the operating system error
strerror
A string that summarily describes the error
filename
The name of the file on which the operation failed (file-related functions only)
In v3 only, OSError
has many subclasses that specify more precisely what problem was encountered, as covered in “OSError and subclasses (v3 only)”.
os
functions can also raise other standard exceptions, such as TypeError
or ValueError
, when the cause of the error is that you have called them with invalid argument types or values, so that the underlying operating system functionality has not even been attempted.
The errno
module supplies dozens of symbolic names for error code numbers. To handle possible system errors selectively, based on error codes, use errno
to enhance your program’s portability and readability. For example, here’s how you might handle “file not found” errors, while propagating all other kinds of errors (when you want your code to work as well in v2 as in v3):
try:
    os.some_os_function_or_other()
except OSError as err:
    import errno
    # check for "file not found" errors, re-raise other cases
    if err.errno != errno.ENOENT:
        raise
    # proceed with the specific case you can handle
    print('Warning: file', err.filename, 'not found—continuing')
If you’re coding for v3 only, however, you can make an equivalent snippet much simpler and clearer, by catching just the applicable OSError
subclass:
try:
    os.some_os_function_or_other()
except FileNotFoundError as err:
    print('Warning: file', err.filename, 'not found—continuing')
errno
also supplies a dictionary named errorcode
: the keys are error code numbers, and the corresponding values are the error names, which are strings such as 'ENOENT'
. Displaying errno.errorcode[err.errno]
, as part of your diagnosis of some OSError
instance err
, can often make the diagnosis clearer and more understandable to readers who specialize in the specific platform.
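The following sketch shows one way to use errno.errorcode in a diagnostic message. It provokes an error on purpose by removing a file that does not exist. The file name is hypothetical, and the temporary directory exists only to make sure the path is really missing.

```python
import errno
import os
import tempfile

parent = tempfile.mkdtemp()
missing = os.path.join(parent, 'no_such_file.txt')  # hypothetical name
try:
    os.remove(missing)
except OSError as err:
    # Translate the numeric code into its symbolic name for the message.
    symbolic_name = errno.errorcode[err.errno]
    print('os.remove failed:', symbolic_name, '-', err.strerror)
finally:
    os.rmdir(parent)
```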
Using the os
module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories; comparing files; and examining filesystem information about files and directories. This section documents the attributes and methods of the os
module that you use for these purposes, and covers some related modules that operate on the filesystem.
A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with a slash (/
) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, in particular, you may use a backslash (\) as the separator. However, you then need to double up each backslash as \\ in string literals, or use raw-string syntax as covered in “Literals”, and you needlessly lose portability. Unix path syntax is handier and usable everywhere, so we strongly recommend that you always use it. In the rest of this chapter, we assume Unix path syntax in both explanations and examples.
The os
module supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in “The os.path Module” rather than lower-level string operations based on these attributes. However, the attributes may be useful at times.
curdir
The string that denotes the current directory ('.'
on Unix and Windows)
defpath
The default search path for programs, used if the environment lacks a PATH
environment variable
linesep
The string that terminates text lines ('\n' on Unix; '\r\n' on Windows)
extsep
The string that separates the extension part of a file’s name from the rest of the name ('.'
on Unix and Windows)
pardir
The string that denotes the parent directory ('..'
on Unix and Windows)
pathsep
The separator between paths in lists of paths, such as those used for the environment variable PATH
(':'
on Unix; ';'
on Windows)
sep
The separator of path components ('/'
on Unix; '\\'
on Windows)
Unix-like platforms associate nine bits with each file or directory: three each for the file’s owner, its group, and anybody else (AKA “the world”), indicating whether the file or directory can be read, written, and executed by the given subject. These nine bits are known as the file’s permission bits, and are part of the file’s mode (a bit string that includes other bits that describe the file). These bits are often displayed in octal notation, since that groups three bits per digit. For example, mode 0o664
indicates a file that can be read and written by its owner and group, and read—but not written—by anybody else. When any process on a Unix-like system creates a file or directory, the operating system applies to the specified mode a bit mask known as the process’s umask, which can remove some of the permission bits.
Non-Unix-like platforms handle file and directory permissions in very different ways. However, the os
functions that deal with file permissions accept a mode
argument according to the Unix-like approach described in the previous paragraph. Each platform maps the nine permission bits in a way appropriate for it. For example, on versions of Windows that distinguish only between read-only and read/write files and do not distinguish file ownership, a file’s permission bits show up as either 0o666
(read/write) or 0o444
(read-only). On such a platform, when creating a file, the implementation looks only at bit 0o200
, making the file read/write when that bit is 1
, read-only when 0
.
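To make the octal notation concrete, here is a small sketch that decodes the nine permission bits into the familiar rwx form and shows the effect of a umask. Both helper functions, describe_permissions and apply_umask, are hypothetical names introduced only for this illustration.

```python
import stat

def describe_permissions(mode):
    """Return a 9-character rwx string for the permission bits of mode."""
    bits = stat.S_IMODE(mode) & 0o777
    flags = 'rwxrwxrwx'
    # Walk from the owner's read bit (0o400) down to the world's execute bit.
    return ''.join(
        flag if bits & (0o400 >> i) else '-'
        for i, flag in enumerate(flags)
    )

def apply_umask(requested_mode, umask):
    """Return the mode left after the umask removes its permission bits."""
    return requested_mode & ~umask

print(describe_permissions(0o664))
print(oct(apply_umask(0o666, 0o022)))
```

With a umask of 0o022, a file requested with mode 0o666 ends up writable only by its owner.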
The os
module supplies several functions to query and set file and directory status. In all versions and platforms, the argument path
to any of these functions can be a string giving the path of the file or directory involved. In v3 only, on some Unix platforms, some of the functions also support as argument path
a file descriptor (AKA fd), an int
denoting a file (as returned, for example, by os.open
). In this case, the module attribute os.supports_fd
is the set of functions of the os
module that do support a file descriptor as argument path
(the module attribute is missing in v2, and in v3 on platforms lacking such support).
In v3 only, on some Unix platforms, some functions support the optional, keyword-only argument follow_symlinks
, defaulting to True
. When true, and always in v2, if path
indicates a symbolic link, the function follows it to reach an actual file or directory; when false, the function operates on the symbolic link itself. The module attribute os.supports_follow_symlinks
, if present, is the set of functions of the os
module that do support this argument.
In v3 only, on some Unix platforms, some functions support the optional, keyword-only argument dir_fd
, defaulting to None
. When present, path
(if relative) is taken as being relative to the directory open at that file descriptor; when missing, and always in v2, path
(if relative) is taken as relative to the current working directory. The module attribute os.supports_dir_fd
, if present, is the set of functions of the os
module that do support this argument.
access |
Returns
In v3 only, |
chdir |
Sets the current working directory of the process to |
chmod |
Changes the permissions of the file |
getcwd |
Returns a |
link |
Create a hard link named |
listdir |
Returns a list whose items are the names of all files and subdirectories in the directory The v2-only |
makedirs, mkdir |
|
remove, unlink |
Removes the file named |
removedirs |
Loops from right to left over the directories that are part of |
rename |
Renames (i.e., moves) the file or directory named |
renames |
Like |
rmdir |
Removes the empty directory named |
scandir |
Returns an iterator over
|
stat |
Returns a value
For example, to print the size in bytes of file
Time values are in seconds since the epoch, as covered in Chapter 12 ( |
tempnam, tmpnam |
Returns an absolute path usable as the name of a new temporary file. Note: |
utime |
Sets the accessed and modified times of file or directory |
walk |
A generator yielding an item for each directory in the tree whose root is directory Each item By default, |
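As an illustration of combining os.walk with os.stat, the following sketch sums the sizes of all files in a directory tree. The helper tree_size is a hypothetical name, and the small tree it measures is built in a temporary directory just for the example.

```python
import os
import shutil
import tempfile

def tree_size(root):
    """Return the total size in bytes of all files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.stat(os.path.join(dirpath, name)).st_size
    return total

root = tempfile.mkdtemp()
try:
    os.makedirs(os.path.join(root, 'sub', 'deeper'))
    with open(os.path.join(root, 'a.txt'), 'wb') as f:
        f.write(b'12345')
    with open(os.path.join(root, 'sub', 'deeper', 'b.txt'), 'wb') as f:
        f.write(b'678')
    size = tree_size(root)
    print(size, 'bytes in total')
finally:
    shutil.rmtree(root)  # clean up the temporary tree
```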
The os.path
module supplies functions to analyze and transform path strings. To use this module, you can import os.path
; however, even if you just import os
, you can also access the os.path
module and all of its attributes. The most commonly useful functions from the module are listed here:
abspath |
Returns a normalized absolute path string equivalent to
For example, |
basename |
Returns the base name part of |
commonprefix |
Accepts a list of strings and returns the longest string that is a prefix of all items in the list. Unlike all other functions in In v3 only, function |
dirname |
Returns the directory part of |
exists, |
Returns |
expandvars, |
Returns a copy of string
emits
|
getatime, getmtime, getctime, getsize |
Each of these functions returns an attribute from the result of |
isabs |
Returns |
isfile |
Returns |
isdir |
Returns |
islink |
Returns |
ismount |
Returns |
join |
Returns a string that joins the argument strings with the appropriate path separator for the current platform. For example, on Unix, exactly one slash character
The second call to |
normcase |
Returns a copy of |
normpath |
Returns a normalized pathname equivalent to |
realpath |
Returns the actual path of the specified file or directory, resolving symlinks along the way. |
relpath |
Returns a relative path to the specified file or directory, relative to directory |
samefile |
Returns |
sameopenfile |
Returns |
samestat |
Returns |
split |
Returns a pair of strings |
splitdrive |
Returns a pair of strings |
splitext |
Returns a pair |
walk |
(v2 only) Calls |
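The sketch below strings together a few of the os.path functions just listed. The path components are invented, and no file needs to exist, since these functions only analyze and transform strings.

```python
import os.path

# Build a path with the platform's separator, then take it apart again.
path = os.path.join('reports', '2020', 'summary.tar.gz')
directory, filename = os.path.split(path)
stem, extension = os.path.splitext(filename)

# normpath collapses redundant separators and '..' components.
tidy = os.path.normpath(os.path.join('a', 'b', os.pardir, 'c'))

print(directory, filename, stem, extension, tidy)
```

Note that splitext splits off only the last extension, so the stem here still ends in '.tar'.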
The function os.stat
(covered in Table 10-4) returns instances of stat_result
, whose item indices, attribute names, and meaning are also covered there. The stat
module supplies attributes with names like those of stat_result
’s attributes, turned into uppercase, and corresponding values that are the corresponding item indices.
The more interesting contents of the stat
module are functions to examine the st_mode
attribute of a stat_result
instance and determine the kind of file. os.path
also supplies functions for such tasks, which operate directly on the file’s path
. The functions supplied by stat
shown in the following list are faster when you perform several tests on the same file: they require only one os.stat
call at the start of a series of tests, while the functions in os.path
implicitly ask the operating system for the same information at each test. Each function returns True
when mode
denotes a file of the given kind; otherwise, False
.
S_ISDIR(
mode
)
Is the file a directory?
S_ISCHR(
mode
)
Is the file a special device-file of the character kind?
S_ISBLK(
mode
)
Is the file a special device-file of the block kind?
S_ISREG(
mode
)
Is the file a normal file (not a directory, special device-file, and so on)?
S_ISFIFO(
mode
)
Is the file a FIFO (also known as a “named pipe”)?
S_ISLNK(
mode
)
Is the file a symbolic link?
S_ISSOCK(
mode
)
Is the file a Unix-domain socket?
Several of these functions are meaningful only on Unix-like systems, since other platforms do not keep special files such as devices and sockets in the namespace for regular files, as Unix-like systems do.
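Here is a minimal sketch of the single-call approach described above: one stat-style call, then several tests on the resulting mode. The function classify is a hypothetical helper. It calls os.lstat rather than os.stat, which is our choice, so that a symbolic link is reported as such instead of being followed.

```python
import os
import stat
import tempfile

def classify(path):
    """Return a short description of the kind of file at path."""
    mode = os.lstat(path).st_mode  # one system call, then several tests
    if stat.S_ISLNK(mode):
        return 'symlink'
    if stat.S_ISDIR(mode):
        return 'directory'
    if stat.S_ISREG(mode):
        return 'regular file'
    if stat.S_ISFIFO(mode):
        return 'FIFO'
    return 'other'

workdir = tempfile.mkdtemp()
filepath = os.path.join(workdir, 'plain.txt')
open(filepath, 'w').close()
kinds = (classify(workdir), classify(filepath))
os.remove(filepath)
os.rmdir(workdir)
print(kinds)
```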
The stat
module supplies two more functions that extract relevant parts of a file’s mode
(x
.st_mode
, for some result x
of function os.stat
):
S_IFMT |
Returns those bits of |
S_IMODE |
Returns those bits of |
The filecmp
module supplies the following functions to compare files and directories:
cmp |
Compares the files named by path strings |
cmpfiles |
Loops on the sequence |
dircmp |
Creates a new directory-comparison instance object, comparing directories named A
In addition, |
A dircmp
instance d
supplies several attributes, computed “just in time” (i.e., only if and when needed, thanks to a __getattr__
special method) so that using a dircmp
instance suffers no unnecessary overhead:
d
.common
Files and subdirectories that are in both dir1
and dir2
d
.common_dirs
Subdirectories that are in both dir1
and dir2
d
.common_files
Files that are in both dir1
and dir2
d
.common_funny
Names that are in both dir1
and dir2
for which os.stat
reports an error or returns different kinds for the versions in the two directories
d
.diff_files
Files that are in both dir1
and dir2
but with different contents
d
.funny_files
Files that are in both dir1
and dir2
but could not be compared
d
.left_list
Files and subdirectories that are in dir1
d
.left_only
Files and subdirectories that are in dir1
and not in dir2
d
.right_list
Files and subdirectories that are in dir2
d
.right_only
Files and subdirectories that are in dir2
and not in dir1
d
.same_files
Files that are in both dir1
and dir2
with the same contents
d
.subdirs
A dictionary whose keys are the strings in common_dirs
; the corresponding values are instances of dircmp
for each subdirectory
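To show a few of these attributes in action, the following sketch compares two small directories created in a temporary location. All file names and contents are invented. The two versions of changed.txt deliberately differ in size, so that the comparison does not depend on file timestamps.

```python
import filecmp
import os
import shutil
import tempfile

def write(path, text):
    with open(path, 'w') as f:
        f.write(text)

base = tempfile.mkdtemp()
try:
    dir1 = os.path.join(base, 'dir1')
    dir2 = os.path.join(base, 'dir2')
    os.mkdir(dir1)
    os.mkdir(dir2)
    write(os.path.join(dir1, 'same.txt'), 'identical\n')
    write(os.path.join(dir2, 'same.txt'), 'identical\n')
    write(os.path.join(dir1, 'changed.txt'), 'old contents\n')
    write(os.path.join(dir2, 'changed.txt'), 'new, longer contents\n')
    write(os.path.join(dir1, 'only_left.txt'), 'x\n')

    d = filecmp.dircmp(dir1, dir2)
    # Attributes are computed on first access, so read them before cleanup.
    summary = {
        'same': sorted(d.same_files),
        'different': sorted(d.diff_files),
        'left_only': sorted(d.left_only),
        'right_only': sorted(d.right_only),
    }
    print(summary)
finally:
    shutil.rmtree(base)
```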
The fnmatch
module (an abbreviation for filename match) matches filename strings with patterns that resemble the ones used by Unix shells:
*
Matches any sequence of characters
?
Matches any single character
[chars]
Matches any one of the characters in chars
[!chars]
Matches any one character not among those in chars
fnmatch
does not follow other conventions of Unix shells’ pattern matching, such as treating a slash /
or a leading dot .
specially. It also does not allow escaping special characters: rather, to literally match a special character, enclose it in brackets. For example, to match a filename that’s a single closed bracket, use the pattern '[]]'
.
The fnmatch
module supplies the following functions:
filter |
Returns the list of items of |
fnmatch |
Returns |
fnmatchcase |
Returns |
translate |
Returns the regular expression pattern (as covered in “Pattern-String Syntax”) equivalent to the
|
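The following sketch exercises the pattern rules on a made-up list of names. It uses fnmatchcase throughout, which is our choice to keep the results identical on every platform, since fnmatch and filter normalize case in a platform-dependent way.

```python
import fnmatch
import re

names = ['notes.txt', 'Notes.TXT', 'data1.csv', 'data22.csv', '.hidden.txt']

# '*' also matches names with a leading dot: fnmatch gives the dot no
# special treatment.
all_txt = [n for n in names if fnmatch.fnmatchcase(n, '*.txt')]

# '?' matches exactly one character.
single_digit = [n for n in names if fnmatch.fnmatchcase(n, 'data?.csv')]

# '[!.]' rejects a leading dot explicitly.
not_dotted = [n for n in names if fnmatch.fnmatchcase(n, '[!.]*.txt')]

# translate produces an equivalent regular expression pattern string.
pattern = re.compile(fnmatch.translate('data[0-9]*.csv'))
matched = [n for n in names if pattern.match(n)]

print(all_txt, single_digit, not_dotted, matched)
```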
The glob
module lists (in arbitrary order) the path names of files that match a path pattern using the same rules as fnmatch
; in addition, it does treat a leading dot .
specially, like Unix shells do.
glob |
Returns the list of path names of files that match pattern |
iglob |
Like |
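A short sketch of glob and iglob follows, run against a temporary directory populated with invented file names. The results are sorted because glob makes no promise about ordering.

```python
import glob
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
try:
    for name in ('a.py', 'b.py', 'c.txt', '.hidden.py'):
        open(os.path.join(root, name), 'w').close()

    # '*.py' skips names with a leading dot, as Unix shells do.
    found = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(root, '*.py')))

    # A pattern that itself starts with a dot does match such names.
    hidden = sorted(os.path.basename(p)
                    for p in glob.iglob(os.path.join(root, '.*.py')))
    print(found, hidden)
finally:
    shutil.rmtree(root)
```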
The shutil
module (an abbreviation for shell utilities) supplies the following functions to copy and move files, and to remove an entire directory tree. In v3 only, on some Unix platforms, most of the functions support optional, keyword-only argument follow_symlinks
, defaulting to True
. When true, and always in v2, if a path indicates a symbolic link, the function follows it to reach an actual file or directory; when false, the function operates on the symbolic link itself.
copy |
Copies the contents of the file |
copy2 |
Like |
copyfile |
Copies just the contents (not permission bits, nor last-access and modification times) of the file |
copyfileobj |
Copies all bytes from the “file” object |
copymode |
Copies permission bits of the file or directory |
copystat |
Copies permission bits and times of last access and modification of the file or directory |
copytree |
Copies the directory tree rooted at When When
copies the tree rooted at directory |
ignore_patterns |
Returns a callable picking out files and subdirectories matching |
move |
Moves the file or directory |
rmtree |
Removes directory tree rooted at |
Beyond offering functions that are directly useful, the source file shutil.py in the standard Python library is an excellent example of how to use many os
functions.
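As an example of how these functions combine, the sketch below copies a directory tree while skipping some files, then removes everything. The directory layout and the patterns passed to ignore_patterns are hypothetical.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
try:
    src = os.path.join(base, 'src')
    dst = os.path.join(base, 'backup')
    os.makedirs(os.path.join(src, 'pkg'))
    for rel in ('pkg/module.py', 'pkg/module.pyc', 'notes.tmp'):
        with open(os.path.join(src, *rel.split('/')), 'w') as f:
            f.write('placeholder\n')

    # Copy the tree, leaving out compiled and temporary files.
    shutil.copytree(src, dst,
                    ignore=shutil.ignore_patterns('*.pyc', '*.tmp'))

    copied = sorted(
        os.path.relpath(os.path.join(dirpath, name), dst).replace(os.sep, '/')
        for dirpath, dirnames, filenames in os.walk(dst)
        for name in filenames
    )
    print(copied)
finally:
    shutil.rmtree(base)  # removes both the source and the copy
```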
The os
module supplies, among many others, many functions to handle file descriptors, integers the operating system uses as opaque handles to refer to open files. Python “file” objects (covered in “The io Module”) are usually better for I/O tasks, but sometimes working at file-descriptor level lets you perform some operation faster, or (sacrificing portability) in ways not directly available with io.open
. “File” objects and file descriptors are not interchangeable.
To get the file descriptor n
of a Python “file” object f
, call n
=
f
.fileno()
. To wrap a new Python “file” object f
around an open file descriptor fd
, call f
=os.fdopen(
fd
)
, or pass fd
as the first argument of io.open
. On Unix-like and Windows platforms, some file descriptors are pre-allocated when a process starts: 0
is the file descriptor for the process’s standard input, 1
for the process’s standard output, and 2
for the process’s standard error.
os
provides many functions dealing with file descriptors; the most often used ones are listed in Table 10-5.
close |
Closes file descriptor |
closerange |
Closes all file descriptors from |
dup |
Returns a file descriptor that duplicates file descriptor |
dup2 |
Duplicates file descriptor |
fdopen |
Like |
fstat |
Returns a |
lseek |
Sets the current position of file descriptor |
open |
Returns a file descriptor, opening or creating a file named by string
|
pipe |
Creates a pipe and returns a pair of file descriptors |
read |
Reads up to |
write |
Writes all bytes from bytestring |
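The following minimal sketch uses a pipe to show file descriptors and “file” objects working together: it writes through a raw descriptor, then wraps the reading end in a “file” object. The data is invented and small enough to fit in the pipe's buffer, so a single process can write first and read afterward.

```python
import os

read_fd, write_fd = os.pipe()

# Write at file-descriptor level, then close the writing end so that
# the reader sees end-of-file.
os.write(write_fd, b'first line\nsecond line\n')
os.close(write_fd)

# Wrap the reading end in a "file" object; closing f also closes read_fd.
with os.fdopen(read_fd, 'rb') as f:
    n = f.fileno()
    lines = f.readlines()

print(n == read_fd, lines)
```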
Python presents non-GUI text input and output channels to your programs as “file” objects, so you can use the methods of “file” objects (covered in “Attributes and Methods of “file” Objects”) to operate on these channels.
The sys
module (covered in “The sys Module”) has the attributes stdout
and stderr
, writeable “file” objects. Unless you are using shell redirection or pipes, these streams connect to the “terminal” running your script. Nowadays, actual terminals are very rare: a so-called “terminal” is generally a screen window that supports text I/O (e.g., a command prompt console on Windows or an xterm
window on Unix).
The distinction between sys.stdout
and sys.stderr
is a matter of convention. sys.stdout
, known as standard output, is where your program emits results. sys.stderr
, known as standard error, is where error messages go. Separating results from error messages helps you use shell redirection effectively. Python respects this convention, using sys.stderr
for errors and warnings.
Programs that output results to standard output often need to write to sys.stdout
. Python’s print
function (covered in Table 7-2) can be a rich, convenient alternative to sys.stdout.write
. (In v2, start your module with from __future__ import print_function
to make print
a function—otherwise, for backward compatibility, it’s a less-convenient statement.)
print
is fine for the informal output used during development to help you debug your code. For production output, you may need more control of formatting than print
affords. You may need to control spacing, field widths, number of decimals for floating-point, and so on. If so, prepare the output as a string with string-formatting method format
(covered in “String Formatting”), then output the string, usually with the write
method of the appropriate “file” object. (You can pass formatted strings to print
, but print
may add spaces and newlines; the write
method adds nothing at all, so it’s easier for you to control what exactly gets output.)
To direct the output from print
calls to a certain file, as an alternative to repeated use of file=
destination
on each print
, you can temporarily change the value of sys.stdout
. The following example is a general-purpose redirection function usable for such a temporary change; in the presence of asynchronous operations, make sure to also add a lock in order to avoid any contention:
def redirect(func, *args, **kwds):
    """redirect(func, ...) -> (string result, func's return value)

    func must be a callable and may emit results to standard output.
    redirect captures those results as a string and returns a pair,
    with the output string as the first item and func's return value
    as the second one.
    """
    import sys, io
    save_out = sys.stdout
    sys.stdout = io.StringIO()
    try:
        retval = func(*args, **kwds)
        return sys.stdout.getvalue(), retval
    finally:
        sys.stdout.close()
        sys.stdout = save_out
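In v3 (3.4 and later), the standard library's contextlib.redirect_stdout context manager performs the same kind of temporary rebinding of sys.stdout. The sketch below is an alternative to the function above, not a replacement for it. The function noisy_add is a hypothetical stand-in for any callable that prints.

```python
import contextlib
import io

def noisy_add(a, b):
    """Hypothetical function that prints while computing its result."""
    print('adding', a, 'and', b)
    return a + b

capture = io.StringIO()
with contextlib.redirect_stdout(capture):
    result = noisy_add(2, 3)   # output goes to capture, not the terminal
output = capture.getvalue()

print(repr(output), result)
```

Like the function above, this changes a process-wide setting, so the same caution about concurrent operations applies.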
To output a few text values to a file object f
that isn’t the current sys.stdout
, avoid such manipulations. For such simple purposes, just calling f
.write
is often best, and print(file=
f
,...)
is, even more often, a handy alternative.
The sys
module provides the stdin
attribute, which is a readable “file” object. When you need a line of text from the user, you can call built-in function input
(covered in Table 7-2; in v2, it’s named raw_input
), optionally with a string argument to use as a prompt.
When the input you need is not a string (for example, when you need a number), use input
to obtain a string from the user, then other built-ins, such as int
, float
, or ast.literal_eval
(covered below), to turn the string into the number you need.
You could, in theory, also use eval
(normally preceded by compile
, for better control of error diagnostics), so as to let the user input any expression, as long as you totally trust the user. A nasty user can exploit eval
to breach security and cause damage (a well-meaning but careless user can also unfortunately cause just about as much damage). There is no effective defense—just avoid eval
(and exec
!) on any input from sources you do not fully trust.
One advanced alternative we do recommend is to use the function literal_eval
from the standard library module ast
(as covered in the online docs). ast.literal_eval(astring)
returns a valid Python value for the given literal astring
when it can, or else raises a SyntaxError
or ValueError
; it never has any side effect. However, to ensure complete safety, astring
in this case cannot use any operator, nor any nonkeyword identifier. For example:
import ast
print(ast.literal_eval('23'))     # prints 23
print(ast.literal_eval('[2,3]'))  # prints [2, 3]
print(ast.literal_eval('2+3'))    # raises ValueError
print(ast.literal_eval('2+'))     # raises SyntaxError
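When the string comes from a user, you usually want to handle both exceptions in one place. The following sketch wraps ast.literal_eval in a helper. The name parse_literal and its default parameter are hypothetical, introduced only for this illustration.

```python
import ast

def parse_literal(text, default=None):
    """Return the Python value for literal text, or default if invalid."""
    try:
        return ast.literal_eval(text.strip())
    except (SyntaxError, ValueError):
        return default

print(parse_literal('23'))
print(parse_literal(' [2,3] '))
print(parse_literal('2+'))
print(parse_literal('os.system("x")'))  # not a literal, so rejected
```

Returning a default keeps the calling code simple, at the cost of hiding which exception occurred. Re-raise instead if the caller needs to tell the two cases apart.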
Very occasionally, you may want the user to input a line of text in such a way that somebody looking at the screen cannot see what the user is typing. This may occur when you’re asking the user for a password. The getpass
module provides the following functions:
The tools covered so far supply the minimal subset of text I/O functionality on all platforms. Most platforms offer richer-text I/O, such as responding to single keypresses (not just entire lines) and showing text in any spot on the terminal.
Python extensions and core Python modules let you access platform-specific functionality. Unfortunately, various platforms expose this functionality in very different ways. To develop cross-platform Python programs with rich-text I/O functionality, you may need to wrap different modules uniformly, importing platform-specific modules conditionally (usually with the try
/except
idiom covered in “try/except”).
The readline
module wraps the GNU Readline Library. Readline lets the user edit text lines during interactive input, and recall previous lines for editing and reentry. Readline comes pre-installed on many Unix-like platforms, and it’s available online. On Windows, you can install and use the third-party module pyreadline.
When readline
is available, Python uses it for all line-oriented input, such as input
. The interactive Python interpreter always tries to load readline
to enable line editing and recall for interactive sessions. Some readline
functions control advanced functionality, particularly history, for recalling lines entered in previous sessions, and completion, for context-sensitive completion of the word being entered. (See the GNU Readline docs for details on configuration commands.) You can access the module’s functionality using the following functions:
add_history |
Adds string |
clear_history |
Clears the history buffer. |
get_completer |
Returns the current completer function (as last set by |
get_history_length |
Returns the number of lines of history to be saved to the history file. When the result is less than |
parse_and_bind |
Gives Readline a configuration command. To let the user hit Tab to request completion, call A good completion function is in the module
For the rest of this interactive session, you can hit Tab during line editing and get completion for global names and object attributes. |
read_history_file |
Loads history lines from the text file at path |
read_init_file |
Makes Readline load a text file: each line is a configuration command. When |
set_completer |
Sets the completion function. When |
set_history_length |
Sets the number of lines of history that are to be saved to the history file. When |
write_history_file |
Saves history lines to the text file whose name or path is |
“Terminals” today are usually text windows on a graphical screen. You may also, in theory, use a true terminal, or (perhaps a tad less theoretical, but, these days, not by much) the console (main screen) of a personal computer in text mode. All such “terminals” in use today offer advanced text I/O functionality, accessed in platform-dependent ways. The curses
package works on Unix-like platforms; for a cross-platform (Windows, Unix, Mac) solution, you may use third-party package UniCurses. The msvcrt
module, on the contrary, exists only on Windows.
The classic Unix approach to advanced terminal I/O is named curses, for obscure historical reasons.1 The Python package curses
affords reasonably simple use, but still lets you exert detailed control if required. We cover a small subset of curses
, just enough to let you write programs with rich-text I/O functionality. (See Eric Raymond’s tutorial Curses Programming with Python for more). Whenever we mention “the screen” in this section, we mean the screen of the terminal (usually, these days, that’s the text window of a terminal-emulator program).
The simplest and most effective way to use curses
is through the curses.wrapper
function:
wrapper |
When you call
|
curses
models text and background colors as character attributes. Colors available on the terminal are numbered from 0
to curses.COLORS - 1
. The function color_content
takes a color number n
as its argument and returns a tuple (
r
,
g
,
b
)
of integers between 0
and 1000
giving the amount of each primary color in n
. The function color_pair
takes a color-pair number n (as previously set up with curses.init_pair)
as its argument and returns an attribute code that you can pass to various methods of a curses.Window
object in order to display text in that color.
curses
lets you create multiple instances of type curses.Window
, each corresponding to a rectangle on the screen. You can also create exotic variants, such as instances of Panel
, polymorphic with Window
but not tied to a fixed screen rectangle. You do not need such advanced functionality in simple curses
programs: just use the Window
object stdscr
that curses.wrapper
gives you. Call w
.refresh()
to ensure that changes made to any Window
instance w
, including stdscr
, show up on screen. curses
can buffer the changes until you call refresh
.
An instance w
of Window
supplies, among many others, the following frequently used methods:
addstr |
Puts the characters in the string |
clrtobot, clrtoeol |
|
delch |
Deletes one character from |
deleteln |
Deletes from |
erase |
Writes spaces to the entire terminal screen. |
getch |
Returns an integer If you have set window |
getyx |
Returns |
insstr |
Inserts the characters in string |
move |
Moves |
nodelay |
Sets |
refresh |
Updates window |
The curses.textpad
module supplies the Textbox
class, which lets you support advanced input and text editing.
The msvcrt
module, available only on Windows, supplies functions that let Python programs access a few proprietary extras supplied by the Microsoft Visual C++’s runtime library msvcrt.dll. Some msvcrt
functions let you read user input character by character rather than reading a full line at a time, as listed here:
The cmd
module offers a simple way to handle interactive sessions of commands. Each command is a line of text. The first word of each command is a verb defining the requested action. The rest of the line is passed as an argument to the method that implements the verb’s action.
The cmd
module supplies the class Cmd
to use as a base class, and you define your own subclass of cmd.Cmd
. Your subclass supplies methods with names starting with do_
and help_
, and may optionally override some of Cmd
’s methods. When the user enters a command line such as verb and the rest
, as long as your subclass defines a method named do_
verb
, Cmd.onecmd
calls:
self.do_verb('and the rest')
Similarly, as long as your subclass defines a method named help_
verb
, Cmd.do_help
calls the method when the command line starts with 'help
verb
'
or '?
verb
'
. Cmd
shows suitable error messages if the user tries to use, or asks for help about, a verb for which the needed method is not defined.
Your subclass of cmd.Cmd
, if it defines its own __init__
special method, must call the base class’s __init__
, whose signature is as follows:
__init__ |
Initializes the instance If your subclass does not define |
An instance c of a subclass of the class Cmd supplies the following methods (many of these methods are “hooks” meant to be optionally overridden by the subclass):
cmdloop |
Performs an interactive session of line-oriented commands.
|
default |
|
do_help |
|
emptyline |
|
onecmd |
|
postcmd |
|
postloop |
|
precmd |
|
preloop |
|
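As an illustration of how some of these hooks fit together, here is a minimal sketch. The subclass Shell and its verbs echo and quit are hypothetical. The loop at the end only approximates what cmdloop does for each line (precmd, then onecmd, then postcmd), so that the example runs without user input:

```python
import cmd
import io

class Shell(cmd.Cmd):
    """Hypothetical subclass that overrides a few Cmd hooks."""

    def precmd(self, line):
        # Normalize each command line before onecmd sees it.
        return line.strip().lower()

    def emptyline(self):
        # Do nothing on an empty line (Cmd's default repeats lastcmd).
        pass

    def default(self, line):
        # Called when no do_<verb> method matches the line.
        self.stdout.write('unknown command: %s\n' % line)

    def do_echo(self, rest):
        self.stdout.write(rest + '\n')

    def do_quit(self, rest):
        return True   # a true result asks cmdloop to stop

    def postcmd(self, stop, line):
        # Runs after every command; its result decides whether to stop.
        return stop

# Drive the hooks by hand, roughly as cmdloop does for each line.
out = io.StringIO()
sh = Shell(stdout=out)
results = []
for raw in ['  ECHO Hi There ', '', 'frobnicate', 'QUIT']:
    line = sh.precmd(raw)
    stop = sh.onecmd(line)
    results.append(sh.postcmd(stop, line))
```

Only the last entry of results is true, which is the point at which a real cmdloop would end the session.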
An instance c of a subclass of the class Cmd supplies the following attributes:
identchars
A string whose characters are all those that can be part of a verb; by default, c.identchars contains letters, digits, and an underscore (_).
intro
The message that cmdloop outputs first, when called with no argument.
lastcmd
The last nonblank command line seen by onecmd.
prompt
The string that cmdloop uses to prompt the user for interactive input. You almost always bind c.prompt explicitly, or override prompt as a class attribute of your subclass; the default Cmd.prompt is just '(Cmd) '.
use_rawinput
When false (default is true), cmdloop prompts and inputs via calls to methods of sys.stdout and sys.stdin, rather than via input.
Other attributes of Cmd instances, which we do not cover here, let you exert fine-grained control on many formatting details of help messages.
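The following sketch shows intro, prompt, use_rawinput, and lastcmd working together. The subclass Calc and its verbs add and bye are hypothetical. The example assumes that in-memory streams are passed to the constructor as stdin and stdout, so the whole "session" is scripted:

```python
import cmd
import io

class Calc(cmd.Cmd):
    """Hypothetical subclass: 'add' sums integers, 'bye' ends the loop."""
    intro = 'Tiny adder; type bye to exit.'
    prompt = 'calc> '          # overrides the default '(Cmd) '
    use_rawinput = False       # read from self.stdin rather than via input

    def do_add(self, rest):
        self.stdout.write('%d\n' % sum(int(x) for x in rest.split()))

    def do_bye(self, rest):
        return True

# Scripted "user input" and an in-memory output stream.
script = io.StringIO('add 1 2 3\nbye\n')
out = io.StringIO()
c = Calc(stdin=script, stdout=out)
c.cmdloop()

session = out.getvalue()   # intro, prompts, and command output
last = c.lastcmd           # the last nonblank line seen by onecmd
```

Because the user's keystrokes are not echoed into out, session contains only the intro, the prompts, and what the commands wrote.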
The following example shows how to use cmd.Cmd to supply the verbs print (to output the rest of the line) and stop (to end the loop):
import cmd

class X(cmd.Cmd):
    def do_print(self, rest):
        # output the rest of the command line
        print(rest)
    def help_print(self):
        print('print (any string): outputs (any string)')
    def do_stop(self, rest):
        # a true result ends the command loop
        return True
    def help_stop(self):
        print('stop: terminates the command loop')

if __name__ == '__main__':
    X().cmdloop()
A session using this example might proceed as follows:
C:> python examples/chapter10/cmdex.py
(Cmd) help
Documented commands (type help <topic>):
========================================
print stop
Undocumented commands:
======================
help
(Cmd) help print
print (any string): outputs (any string)
(Cmd) print hi there
hi there
(Cmd) stop
Most programs present some information to users as text. Such text should be understandable and acceptable to the user. For example, in some countries and cultures, the date “March 7” can be concisely expressed as “3/7.” Elsewhere, “3/7” indicates “July 3,” and the string that means “March 7” is “7/3.” In Python, such cultural conventions are handled with the help of the standard module locale.
Similarly, a greeting can be expressed in one natural language by the string “Benvenuti,” while in another language the string to use is “Welcome.” In Python, such translations are handled with the help of the standard module gettext.
Both kinds of issues are commonly called internationalization (often abbreviated i18n, as there are 18 letters between i and n in the full spelling)—a misnomer, since the same issues apply to different languages or cultures within a single nation.
Python’s support for cultural conventions imitates that of C, slightly simplified. A program operates in an environment of cultural conventions known as a locale. The locale setting permeates the program and is typically set at program startup. The locale is not thread-specific, and the locale module is not thread-safe. In a multithreaded program, set the program’s locale before starting secondary threads.
If your application needs to handle multiple locales at the same time in a single process, whether in threads or asynchronously, locale is not the answer, due to its process-wide nature. Consider alternatives such as PyICU, mentioned in “More Internationalization Resources”.
If a program does not call locale.setlocale, the locale is a neutral one known as the C locale. The C locale is named from this architecture’s origins in the C language; it’s similar, but not identical, to the U.S. English locale. Alternatively, a program can find out and accept the user’s default locale. In this case, the locale module interacts with the operating system (via the environment or in other system-dependent ways) to find the user’s preferred locale. Finally, a program can set a specific locale, presumably determining which locale to set on the basis of user interaction or via persistent configuration settings.
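The three choices just described (leave the locale alone, accept the user’s default, or set a specific locale) can be sketched as follows. This is only an illustration: which locale names are installed depends on the system, so the sketch uses 'C', the one name that is always available, as its "specific" locale, and it guards the call that consults the environment:

```python
import locale

# Query the current setting without changing it.
startup = locale.setlocale(locale.LC_ALL)

# Accept the user's default locale; '' means "consult the environment".
try:
    user_default = locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    # The environment names a locale this system does not provide.
    user_default = None

# Set a specific locale explicitly; 'C' is always available.
explicit = locale.setlocale(locale.LC_ALL, 'C')
```

In a real program you would normally make just one of these calls, once, at startup and before starting any secondary thread.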
Locale setting is normally performed across the board for all relevant categories of cultural conventions. This wide-spectrum setting is denoted by the constant attribute LC_ALL of module locale. However, the cultural conventions handled by locale are grouped into categories, and, in some cases, a program can choose to mix and match categories to build up a synthetic composite locale. The categories are identified by the following constant attributes of the locale module:
LC_COLLATE
String sorting; affects functions strcoll and strxfrm in locale
LC_CTYPE
Character types; affects aspects of the module string (and string methods) that have to do with lowercase and uppercase letters
LC_MESSAGES
Messages; may affect messages displayed by the operating system, for example the function os.strerror and module gettext
LC_MONETARY
Formatting of currency values; affects function locale.localeconv
LC_NUMERIC
Formatting of numbers; affects the functions atoi, atof, format, localeconv, and str in locale
LC_TIME
Formatting of times and dates; affects the function time.strftime
The settings of some categories (denoted by LC_CTYPE, LC_TIME, and LC_MESSAGES) affect behavior in other modules (string, time, os, and gettext, as indicated). Other categories (denoted by LC_COLLATE, LC_MONETARY, and LC_NUMERIC) affect only some functions of locale itself.
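To make the category mechanism concrete, here is a minimal sketch that sets one category at a time and then calls a few locale functions that depend on LC_NUMERIC and LC_COLLATE. It stays in the C locale throughout, because no other locale is guaranteed to be installed; with another locale, the same calls would reflect that locale’s conventions. The sketch uses locale.format_string rather than locale.format, since the latter is deprecated and no longer present in recent Python versions:

```python
import locale

# Start from a known state: the C locale for every category.
locale.setlocale(locale.LC_ALL, 'C')

# Categories can also be set one at a time; here LC_NUMERIC is set
# (to 'C' again, since no other locale is guaranteed to be installed).
locale.setlocale(locale.LC_NUMERIC, 'C')

conv = locale.localeconv()            # dict of formatting conventions
decimal_point = conv['decimal_point']

# LC_NUMERIC drives number parsing and formatting in locale itself.
parsed = locale.atof('1234.5')
shown = locale.format_string('%.2f', 1234.5, grouping=True)

# LC_COLLATE drives string comparison; strxfrm builds a sort key.
names = sorted(['pear', 'Apple', 'fig'], key=locale.strxfrm)
```

In the C locale there is no thousands separator, so grouping=True has no visible effect on shown, and sorting falls back to plain character-code order.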
The locale module supplies the following functions to query, change, and manipulate locales, as well as functions that implement the cultural conventions of locale categories LC_COLLATE, LC_MONETARY, and LC_NUMERIC:
A key issue in internationalization is the ability to use text in different natural languages, a task known as localization. Python supports localization via the module gettext, inspired by GNU gettext. The gettext module is optionally able to use the latter’s infrastructure and APIs, but also offers a simpler, higher-level approach, so you don’t need to install or study GNU gettext to use Python’s gettext effectively.
For full coverage of gettext from a different perspective, see the online docs.
gettext does not deal with automatic translation between natural languages. Rather, it helps you extract, organize, and access the text messages that your program uses. Pass each string literal subject to translation, also known as a message, to a function named _ (underscore) rather than using it directly. gettext normally installs a function named _ in the builtins module (named __builtin__ in v2). To ensure that your program runs with or without gettext, conditionally define a do-nothing function, named _, that just returns its argument unchanged. Then you can safely use _('message') wherever you would normally use a literal 'message' that should be translated if feasible. The following example shows how to start a module for conditional use of gettext:
try:
    _
except NameError:
    # gettext is not installed: fall back to a do-nothing function
    def _(s): return s

def greet():
    print(_('Hello world'))
If some other module has installed gettext before you run this example code, the function greet outputs a properly localized greeting. Otherwise, greet outputs the string 'Hello world' unchanged.
Edit your source, decorating message literals with function _. Then use any of various tools to extract messages into a text file (normally named messages.pot) and distribute the file to the people who translate messages into the various natural languages your application must support. Python supplies a script pygettext.py (in directory Tools/i18n in the Python source distribution) to perform message extraction on your Python sources.
Each translator edits messages.pot to produce a text file of translated messages with extension .po. Compile the .po files into binary files with extension .mo, suitable for fast searching, using any of various tools. Python supplies script Tools/i18n/msgfmt.py for this purpose. Finally, install each .mo file with a suitable name in a suitable directory.
Conventions about which directories and names are suitable differ among platforms and applications. gettext’s default is subdirectory share/locale/<lang>/LC_MESSAGES/ of directory sys.prefix, where <lang> is the language’s code (two letters). Each file is named <name>.mo, where <name> is the name of your application or package.
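The directory layout can be explored with gettext.find, which searches for a .mo file and returns its path, or None when there is none. The following sketch assumes a temporary directory standing in for the locale directory, and an empty placeholder file rather than a real compiled catalog; it demonstrates only where the file is looked for, not how translations are loaded:

```python
import gettext
import os
import tempfile

name, lang = 'your_application_name', 'it'

with tempfile.TemporaryDirectory() as localedir:
    # Layout expected under a locale directory: <lang>/LC_MESSAGES/<name>.mo
    mo_dir = os.path.join(localedir, lang, 'LC_MESSAGES')
    mo_path = os.path.join(mo_dir, name + '.mo')

    # Nothing is installed yet, so the search comes back empty.
    before = gettext.find(name, localedir=localedir, languages=[lang])

    # Create an empty placeholder; find only checks that the path exists.
    os.makedirs(mo_dir)
    open(mo_path, 'wb').close()
    after = gettext.find(name, localedir=localedir, languages=[lang])

    found_expected_file = (after == mo_path)
```

Passing localedir explicitly, as here, is also how an application can keep its .mo files in a directory of its own choosing instead of the default one.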
Once you have prepared and installed your .mo files, you normally execute, at the time your application starts up, some code such as the following:
import os, gettext
os.environ.setdefault('LANG', 'en')  # application-default language
gettext.install('your_application_name')
This ensures that calls such as _('message') return the appropriate translated strings. You can choose different ways to access gettext functionality in your program, for example if you also need to localize C-coded extensions, or to switch between languages during a run. Another important consideration is whether you’re localizing a whole application, or just a package that is distributed separately.
gettext supplies many functions; the most often used ones are:
install |
Installs in Python’s built-in namespace a function named |
translation |
Searches for a .mo file similarly to function
|
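As a sketch of the second function, the following code obtains a translations object instead of installing _ globally, which is one way to localize a separately distributed package or to switch languages during a run. The helper load_translator and the domain name no_such_domain are made up for illustration. With fallback=True, a missing .mo file produces a NullTranslations instance, whose gettext method returns each message unchanged:

```python
import gettext

def load_translator(domain, lang, localedir=None):
    """Return a translations object for lang (illustrative helper).

    With fallback=True, a missing .mo file yields a NullTranslations
    instance instead of raising an exception.
    """
    return gettext.translation(domain, localedir=localedir,
                               languages=[lang], fallback=True)

# 'no_such_domain' is a made-up name, so no catalog will be found.
t = load_translator('no_such_domain', 'it')
_ = t.gettext                  # bind _ locally instead of installing it
message = _('Welcome')
```

With a real catalog installed for the requested language, the same call to _ would return the translated string instead.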
Internationalization is a very large topic. For a general introduction, see Wikipedia. One of the best packages of code and information for internationalization is ICU, which also embeds the Unicode Consortium’s excellent Common Locale Data Repository (CLDR) database of locale conventions, and code to access the CLDR. To use ICU in Python, install the third-party package PyICU.
1 “Curses” does describe well the typical utterances of programmers faced with this rich, complicated approach.