In the prior chapter we met Python’s two looping statements, while and for. Although they can handle most repetitive tasks programs need to perform, the need to iterate over sequences is so common and pervasive that Python provides additional tools to make it simpler and more efficient. This chapter begins our exploration of these tools. Specifically, it presents the related concepts of Python’s iteration protocol—a method-call model used by the for loop—and fills in some details on list comprehensions—a close cousin of the for loop that applies an expression to items in an iterable.

Because both of these tools are related to both the for loop and functions, we’ll take a two-pass approach to covering them in this book: this chapter introduces the basics in the context of looping tools, serving as something of a continuation of the prior chapter, and a later chapter (Chapter 20) revisits them in the context of function-based tools. In this chapter, we’ll also sample additional iteration tools in Python and touch on the new iterators available in Python 3.0.
One note up front: some of the concepts presented in these chapters may seem advanced at first glance. With practice, though, you’ll find that these tools are useful and powerful. Although never strictly required, because they’ve become commonplace in Python code, a basic understanding can also help if you must read programs written by others.
In the preceding chapter, I mentioned that the for loop can work on any sequence type in Python, including lists, tuples, and strings, like this:

>>> for x in [1, 2, 3, 4]: print(x ** 2, end=' ')
...
1 4 9 16
>>> for x in (1, 2, 3, 4): print(x ** 3, end=' ')
...
1 8 27 64
>>> for x in 'spam': print(x * 2, end=' ')
...
ss pp aa mm
Actually, the for loop turns out to be even more generic than this—it works on any iterable object. In fact, this is true of all iteration tools that scan objects from left to right in Python, including for loops, the list comprehensions we’ll study in this chapter, in membership tests, the map built-in function, and more.

The concept of “iterable objects” is relatively recent in Python, but it has come to permeate the language’s design. It’s essentially a generalization of the notion of sequences—an object is considered iterable if it is either a physically stored sequence or an object that produces one result at a time in the context of an iteration tool like a for loop. In a sense, iterable objects include both physical sequences and virtual sequences computed on demand.[33]

One of the easiest ways to understand what this means is to look at how it works with a built-in type such as the file. Recall from Chapter 9 that open file objects have a method called readline, which reads one line of text from a file at a time—each time we call the readline method, we advance to the next line. At the end of the file, an empty string is returned, which we can detect to break out of the loop:
>>> f = open('script1.py')       # Read a 4-line script file in this directory
>>> f.readline()                 # readline loads one line on each call
'import sys\n'
>>> f.readline()
'print(sys.path)\n'
>>> f.readline()
'x = 2\n'
>>> f.readline()
'print(2 ** 33)\n'
>>> f.readline()                 # Returns empty string at end-of-file
''

However, files also have a method named __next__ that has a nearly identical effect—it returns the next line from a file each time it is called. The only noticeable difference is that __next__ raises a built-in StopIteration exception at end-of-file instead of returning an empty string:
>>> f = open('script1.py')       # __next__ loads one line on each call too
>>> f.__next__()                 # But raises an exception at end-of-file
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f.__next__()
'x = 2\n'
>>> f.__next__()
'print(2 ** 33)\n'
>>> f.__next__()
Traceback (most recent call last):
...more exception text omitted...
StopIteration

This interface is exactly what we call the iteration protocol in Python. Any object with a __next__ method that advances to a next result, and that raises StopIteration at the end of the series of results, is considered iterable in Python. Any such object may also be stepped through with a for loop or other iteration tool, because all iteration tools normally work internally by calling __next__ on each iteration and catching the StopIteration exception to determine when to exit.
The net effect of this magic is that, as mentioned in Chapter 9, the best way to read a text file line by line today is to not read it at all—instead, allow the for loop to automatically call __next__ to advance to the next line on each iteration. The file object’s iterator will do the work of automatically loading lines as you go. The following, for example, reads a file line by line, printing the uppercase version of each line along the way, without ever explicitly reading from the file at all:

>>> for line in open('script1.py'):      # Use file iterators to read by lines
...     print(line.upper(), end='')      # Calls __next__, catches StopIteration
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
Notice that the print uses end='' here to suppress adding a \n, because line strings already have one (without this, our output would be double-spaced). This is considered the best way to read text files line by line today, for three reasons: it’s the simplest to code, might be the quickest to run, and is the best in terms of memory usage. The older, original way to achieve the same effect with a for loop is to call the file readlines method to load the file’s content into memory as a list of line strings:

>>> for line in open('script1.py').readlines():
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
This readlines technique still works, but it is not considered the best practice today and performs poorly in terms of memory usage. In fact, because this version really does load the entire file into memory all at once, it will not even work for files too big to fit into the memory space available on your computer. By contrast, because it reads one line at a time, the iterator-based version is immune to such memory-explosion issues. The iterator version might run quicker too, though this can vary per release (Python 3.0 made this advantage less clear-cut by rewriting I/O libraries to support Unicode text and be less system-dependent).

As mentioned in the prior chapter’s sidebar, Why You Will Care: File Scanners, it’s also possible to read a file line by line with a while loop:

>>> f = open('script1.py')
>>> while True:
...     line = f.readline()
...     if not line: break
...     print(line.upper(), end='')
...
...same output...
However, this may run slower than the iterator-based for loop version, because iterators run at C language speed inside Python, whereas the while loop version runs Python byte code through the Python virtual machine. Any time we trade Python code for C code, speed tends to increase. This is not an absolute truth, though, especially in Python 3.0; we’ll see timing techniques later in this book for measuring the relative speed of alternatives like these.
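As a small preview of those timing techniques, the standard library’s timeit module can compare the two styles directly. The following is only a rough sketch (the file name, line count, and repeat count are made up for illustration), and exact numbers vary by platform and Python release:

```python
import os
import tempfile
import timeit

# Create a small test file to scan (a made-up stand-in for script1.py)
path = os.path.join(tempfile.gettempdir(), 'scan_test.txt')
with open(path, 'w') as f:
    f.writelines('line %d\n' % i for i in range(1000))

def with_while():
    f = open(path)
    while True:
        line = f.readline()
        if not line: break          # Empty string means end-of-file
    f.close()

def with_iterator():
    f = open(path)
    for line in f:                  # File iterator calls __next__ internally
        pass
    f.close()

# Time each version over many repetitions; lower is faster
print('while :', timeit.timeit(with_while, number=100))
print('for   :', timeit.timeit(with_iterator, number=100))
```

On most builds the iterator version wins, but treat the result as a measurement to repeat on your own machine rather than a guarantee.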
To support manual iteration code (with less typing), Python 3.0 also provides a built-in function, next, that automatically calls an object’s __next__ method. Given an iterable object X, the call next(X) is the same as X.__next__(), but noticeably simpler. With files, for instance, either form may be used:

>>> f = open('script1.py')
>>> f.__next__()                 # Call iteration method directly
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'

>>> f = open('script1.py')
>>> next(f)                      # next built-in calls __next__
'import sys\n'
>>> next(f)
'print(sys.path)\n'
Technically, there is one more piece to the iteration protocol. When the for loop begins, it obtains an iterator from the iterable object by passing it to the iter built-in function; the object returned by iter has the required __next__ method. This becomes obvious if we look at how for loops internally process built-in sequence types such as lists:

>>> L = [1, 2, 3]
>>> I = iter(L)                  # Obtain an iterator object
>>> I.__next__()                 # Call next to advance to next item
1
>>> I.__next__()
2
>>> I.__next__()
3
>>> I.__next__()
Traceback (most recent call last):
...more omitted...
StopIteration
This initial step is not required for files, because a file object is its own iterator. That is, files have their own __next__ method and so do not need to return a different object that does:

>>> f = open('script1.py')
>>> iter(f) is f
True
>>> f.__next__()
'import sys\n'

Lists, and many other built-in objects, are not their own iterators because they support multiple open iterations. For such objects, we must call iter to start iterating:

>>> L = [1, 2, 3]
>>> iter(L) is L
False
>>> L.__next__()
AttributeError: 'list' object has no attribute '__next__'

>>> I = iter(L)
>>> I.__next__()
1
>>> next(I)                      # Same as I.__next__()
2
Although Python iteration tools call these functions automatically, we can use them to apply the iteration protocol manually, too. The following interaction demonstrates the equivalence between automatic and manual iteration:[34]
>>> L = [1, 2, 3]
>>>
>>> for X in L:                  # Automatic iteration
...     print(X ** 2, end=' ')   # Obtains iter, calls __next__, catches exceptions
...
1 4 9

>>> I = iter(L)                  # Manual iteration: what for loops usually do
>>> while True:
...     try:                     # try statement catches exceptions
...         X = next(I)          # Or call I.__next__()
...     except StopIteration:
...         break
...     print(X ** 2, end=' ')
...
1 4 9
To understand this code, you need to know that try statements run an action and catch exceptions that occur while the action runs (we’ll explore exceptions in depth in Part VII). I should also note that for loops and other iteration contexts can sometimes work differently for user-defined classes, repeatedly indexing an object instead of running the iteration protocol. We’ll defer that story until we study class operator overloading in Chapter 29.
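As a brief preview of that story, here is a minimal sketch of a user-defined class that supports the iteration protocol directly. The class name Squares and its arguments are made up for illustration; Chapter 29 treats this topic properly:

```python
class Squares:
    """A made-up iterable that yields squares from start through stop."""
    def __init__(self, start, stop):
        self.value = start - 1
        self.stop = stop
    def __iter__(self):             # iter() returns the iterator object itself
        return self
    def __next__(self):             # next() advances to the next result
        if self.value == self.stop:
            raise StopIteration     # Signals end-of-results to iteration tools
        self.value += 1
        return self.value ** 2

for sq in Squares(1, 4):            # for calls iter, then __next__ repeatedly
    print(sq, end=' ')              # Prints: 1 4 9 16
```

Because the object is its own iterator (like a file), a single instance supports only one traversal; lists and other multiple-scan objects return a separate iterator object instead.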
Version skew note: In Python 2.6, the iteration method is named X.next() instead of X.__next__(). For portability, the next(X) built-in function is available in Python 2.6 too (but not earlier), and calls 2.6’s X.next() instead of 3.0’s X.__next__(). Iteration works the same in 2.6 in all other ways, though; simply use X.next() or next(X) for manual iterations, instead of 3.0’s X.__next__(). Prior to 2.6, use manual X.next() calls instead of next(X).
Besides files and physical sequences like lists, other types have useful iterators as well. The classic way to step through the keys of a dictionary, for example, is to request its keys list explicitly:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> for key in D.keys():
...     print(key, D[key])
...
a 1
c 3
b 2
In recent versions of Python, though, dictionaries have an iterator that automatically returns one key at a time in an iteration context:
>>> I = iter(D)
>>> next(I)
'a'
>>> next(I)
'c'
>>> next(I)
'b'
>>> next(I)
Traceback (most recent call last):
...more omitted...
StopIteration
The net effect is that we no longer need to call the keys method to step through dictionary keys—the for loop will use the iteration protocol to grab one key each time through:

>>> for key in D:
...     print(key, D[key])
...
a 1
c 3
b 2
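If you need the keys in a predictable order while iterating, the sorted built-in we met in Chapter 4 accepts the dictionary itself. This short sketch assumes a dictionary D like the one in the prior interaction:

```python
D = {'a': 1, 'b': 2, 'c': 3}

# sorted accepts any iterable, so passing the dictionary directly
# iterates over its keys and returns them as a new sorted list
for key in sorted(D):
    print(key, D[key])      # Visits keys in sorted order: a, b, c
```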
We can’t delve into their details here, but other Python object types also support the iterator protocol and thus may be used in for loops too. For instance, shelves (an access-by-key filesystem for Python objects) and the results from os.popen (a tool for reading the output of shell commands) are iterable as well:

>>> import os
>>> P = os.popen('dir')
>>> P.__next__()
' Volume in drive C is SQ004828V03\n'
>>> P.__next__()
' Volume Serial Number is 08BE-3CD4\n'
>>> next(P)
TypeError: _wrap_close object is not an iterator
Notice that popen objects support a P.next() method in Python 2.6. In 3.0, they support the P.__next__() method, but not the next(P) built-in; since the latter is defined to call the former, it’s not clear if this behavior will endure in future releases (as described in an earlier footnote, this appears to be an implementation issue). This is only an issue for manual iteration, though; if you iterate over these objects automatically with for loops and other iteration contexts (described in the next sections), they return successive lines in either Python version.
The iteration protocol also is the reason that we’ve had to wrap some results in a list call to see their values all at once. Objects that are iterable return results one at a time, not in a physical list:

>>> R = range(5)                 # Ranges are iterables in 3.0
>>> R
range(0, 5)
>>> I = iter(R)                  # Use iteration protocol to produce results
>>> next(I)
0
>>> next(I)
1
>>> list(range(5))               # Or use list to collect all results at once
[0, 1, 2, 3, 4]
Now that you have a better understanding of this protocol, you should be able to see how it explains why the enumerate tool introduced in the prior chapter works the way it does:

>>> E = enumerate('spam')        # enumerate is an iterable too
>>> E
<enumerate object at 0x0253F508>
>>> I = iter(E)
>>> next(I)                      # Generate results with iteration protocol
(0, 's')
>>> next(I)                      # Or use list to force generation to run
(1, 'p')
>>> list(enumerate('spam'))
[(0, 's'), (1, 'p'), (2, 'a'), (3, 'm')]
We don’t normally see this machinery because for loops run it for us automatically to step through results. In fact, everything that scans left-to-right in Python employs the iteration protocol in the same way—including the topic of the next section.

Now that we’ve seen how the iteration protocol works, let’s turn to a very common use case. Together with for loops, list comprehensions are one of the most prominent contexts in which the iteration protocol is applied.

In the previous chapter, we learned how to use range to change a list as we step across it:

>>> L = [1, 2, 3, 4, 5]
>>> for i in range(len(L)):
...     L[i] += 10
...
>>> L
[11, 12, 13, 14, 15]
This works, but as I mentioned there, it may not be the optimal “best-practice” approach in Python. Today, the list comprehension expression makes many such prior use cases obsolete. Here, for example, we can replace the loop with a single expression that produces the desired result list:
>>> L = [x + 10 for x in L]
>>> L
[21, 22, 23, 24, 25]
The net result is the same, but it requires less coding on our part and is likely to run substantially faster. The list comprehension isn’t exactly the same as the for loop statement version because it makes a new list object (which might matter if there are multiple references to the original list), but it’s close enough for most applications and is a common and convenient enough approach to merit a closer look here.
We met the list comprehension briefly in Chapter 4. Syntactically, it is derived from a construct in set theory notation that applies an operation to each item in a set, but you don’t have to know set theory to use this tool. In Python, most people find that a list comprehension simply looks like a backward for loop.
To get a handle on the syntax, let’s dissect the prior section’s example in more detail:
>>> L = [x + 10 for x in L]
List comprehensions are written in square brackets because they are ultimately a way to construct a new list. They begin with an arbitrary expression that we make up, which uses a loop variable that we make up (x + 10). That is followed by what you should now recognize as the header of a for loop, which names the loop variable and an iterable object (for x in L).

To run the expression, Python executes an iteration across L inside the interpreter, assigning x to each item in turn, and collects the results of running the items through the expression on the left side. The result list we get back is exactly what the list comprehension says—a new list containing x + 10, for every x in L.
Technically speaking, list comprehensions are never really required, because we can always build up a list of expression results manually with for loops that append results as we go:

>>> res = []
>>> for x in L:
...     res.append(x + 10)
...
>>> res
[21, 22, 23, 24, 25]
In fact, this is exactly what the list comprehension does internally. However, list comprehensions are more concise to write, and because this code pattern of building up result lists is so common in Python work, they turn out to be very handy in many contexts. Moreover, list comprehensions can run much faster than manual for loop statements (often roughly twice as fast) because their iterations are performed at C language speed inside the interpreter, rather than with manual Python code; especially for larger data sets, there is a major performance advantage to using them.
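You can check this speed claim yourself with the standard library’s timeit module. The sketch below is illustrative only (the list size and repeat count are made up); exact ratios vary by Python version and platform:

```python
import timeit

data = list(range(10000))

def with_loop():
    res = []
    for x in data:
        res.append(x + 10)          # Manual append on each iteration
    return res

def with_comprehension():
    return [x + 10 for x in data]   # Iteration runs at C speed internally

# Time each version over many repetitions; lower is faster
print('for loop      :', timeit.timeit(with_loop, number=200))
print('comprehension :', timeit.timeit(with_comprehension, number=200))
```

Both functions build the same result list; only the speed (and the conciseness of the source code) differs.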
Let’s work through another common use case for list comprehensions to explore them in more detail. Recall that the file object has a readlines method that loads the file into a list of line strings all at once:

>>> f = open('script1.py')
>>> lines = f.readlines()
>>> lines
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

This works, but the lines in the result all include the newline character (\n) at the end. For many programs, the newline character gets in the way—we have to be careful to avoid double-spacing when printing, and so on. It would be nice if we could get rid of these newlines all at once, wouldn’t it?
Any time we start thinking about performing an operation on each item in a sequence, we’re in the realm of list comprehensions. For example, assuming the variable lines is as it was in the prior interaction, the following code does the job by running each line in the list through the string rstrip method to remove whitespace on the right side (a line[:-1] slice would work, too, but only if we can be sure all lines are properly terminated):

>>> lines = [line.rstrip() for line in lines]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']
This works as planned. Because list comprehensions are an iteration context just like for loop statements, though, we don’t even have to open the file ahead of time. If we open it inside the expression, the list comprehension will automatically use the iteration protocol we met earlier in this chapter. That is, it will read one line from the file at a time by calling the file’s __next__ method, run the line through the rstrip expression, and add it to the result list. Again, we get what we ask for—the rstrip result of a line, for every line in the file:

>>> lines = [line.rstrip() for line in open('script1.py')]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']
This expression does a lot implicitly, but we’re getting a lot of work for free here—Python scans the file and builds a list of operation results automatically. It’s also an efficient way to code this operation: because most of this work is done inside the Python interpreter, it is likely much faster than an equivalent for statement. Again, especially for large files, the speed advantages of list comprehensions can be significant.
Besides their efficiency, list comprehensions are also remarkably expressive. In our example, we can run any string operation on a file’s lines as we iterate. Here’s the list comprehension equivalent to the file iterator uppercase example we met earlier, along with a few others (the method chaining in the second of these examples works because string methods return a new string, to which we can apply another string method):
>>> [line.upper() for line in open('script1.py')]
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> [line.rstrip().upper() for line in open('script1.py')]
['IMPORT SYS', 'PRINT(SYS.PATH)', 'X = 2', 'PRINT(2 ** 33)']

>>> [line.split() for line in open('script1.py')]
[['import', 'sys'], ['print(sys.path)'], ['x', '=', '2'], ['print(2', '**', '33)']]

>>> [line.replace(' ', '!') for line in open('script1.py')]
['import!sys\n', 'print(sys.path)\n', 'x!=!2\n', 'print(2!**!33)\n']

>>> [('sys' in line, line[0]) for line in open('script1.py')]
[(True, 'i'), (True, 'p'), (False, 'x'), (False, 'p')]
In fact, list comprehensions can be even more advanced in practice. As one particularly useful extension, the for loop nested in the expression can have an associated if clause to filter out of the result any items for which the test is not true.

For example, suppose we want to repeat the prior section’s file-scanning example, but we need to collect only lines that begin with the letter p (perhaps the first character on each line is an action code of some sort). Adding an if filter clause to our expression does the trick:

>>> lines = [line.rstrip() for line in open('script1.py') if line[0] == 'p']
>>> lines
['print(sys.path)', 'print(2 ** 33)']
Here, the if clause checks each line read from the file to see whether its first character is p; if not, the line is omitted from the result list. This is a fairly big expression, but it’s easy to understand if we translate it to its simple for loop statement equivalent. In general, we can always translate a list comprehension to a for statement by appending as we go and further indenting each successive part:

>>> res = []
>>> for line in open('script1.py'):
...     if line[0] == 'p':
...         res.append(line.rstrip())
...
>>> res
['print(sys.path)', 'print(2 ** 33)']
This for statement equivalent works, but it takes up four lines instead of one and probably runs substantially slower.

List comprehensions can become even more complex if we need them to—for instance, they may contain nested loops, coded as a series of for clauses. In fact, their full syntax allows for any number of for clauses, each of which can have an optional associated if clause (we’ll be more formal about their syntax in Chapter 20).
For example, the following builds a list of the concatenation of x + y for every x in one string and every y in another. It effectively collects the permutation of the characters in two strings:
>>> [x + y for x in 'abc' for y in 'lmn']
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
Again, one way to understand this expression is to convert it to statement form by indenting its parts. The following is an equivalent, but likely slower, alternative way to achieve the same effect:
>>> res = []
>>> for x in 'abc':
...     for y in 'lmn':
...         res.append(x + y)
...
>>> res
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
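Each nested for clause can also carry its own if filter, as the full syntax allows. The following sketch (the particular strings and membership tests are made up for illustration) keeps only some characters from each string before pairing them:

```python
# Nested for clauses, each with a filter: keep x only if it appears
# in 'sm', and keep y only if it is 'P' or 'A'
pairs = [x + y for x in 'spam' if x in 'sm'
               for y in 'SPAM' if y in ('P', 'A')]
print(pairs)      # ['sP', 'sA', 'mP', 'mA']
```

The outer for clause runs first, and each inner clause runs in full for every outer item, just as in the equivalent nested statement form.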
Beyond this complexity level, though, list comprehension expressions can often become too compact for their own good. In general, they are intended for simple types of iterations; for more involved work, a simpler for statement structure will probably be easier to understand and modify in the future. As usual in programming, if something is difficult for you to understand, it’s probably not a good idea.
We’ll revisit list comprehensions in Chapter 20, in the context of functional programming tools; as we’ll see, they turn out to be just as related to functions as they are to looping statements.
Later in the book, we’ll see that user-defined classes can implement the iteration protocol too. Because of this, it’s sometimes important to know which built-in tools make use of it—any tool that employs the iteration protocol will automatically work on any built-in type or user-defined class that provides it.
So far, I’ve been demonstrating iterators in the context of the for loop statement, because this part of the book is focused on statements. Keep in mind, though, that every tool that scans from left to right across objects uses the iteration protocol. This includes the for loops we’ve seen:

>>> for line in open('script1.py'):      # Use file iterators
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
However, list comprehensions, the in membership test, the map built-in function, and other built-ins such as the sorted and zip calls also leverage the iteration protocol. When applied to a file, all of these use the file object’s iterator automatically to scan line by line:

>>> uppers = [line.upper() for line in open('script1.py')]
>>> uppers
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> map(str.upper, open('script1.py'))   # map is an iterable in 3.0
<map object at 0x02660710>
>>> list(map(str.upper, open('script1.py')))
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> 'y = 2\n' in open('script1.py')
False
>>> 'x = 2\n' in open('script1.py')
True
We introduced the map call used here in the preceding chapter; it’s a built-in that applies a function call to each item in the passed-in iterable object. map is similar to a list comprehension but is more limited because it requires a function instead of an arbitrary expression. It also returns an iterable object itself in Python 3.0, so we must wrap it in a list call to force it to give us all its values at once; more on this change later in this chapter. Because map, like the list comprehension, is related to both for loops and functions, we’ll also explore both again in Chapters 19 and 20.
Python includes various additional built-ins that process iterables, too: sorted sorts items in an iterable, zip combines items from iterables, enumerate pairs items in an iterable with relative positions, filter selects items for which a function is true, and reduce runs pairs of items in an iterable through a function. All of these accept iterables, and zip, enumerate, and filter also return an iterable in Python 3.0, like map. Here they are in action running the file’s iterator automatically to scan line by line:
>>> sorted(open('script1.py'))
['import sys\n', 'print(2 ** 33)\n', 'print(sys.path)\n', 'x = 2\n']

>>> list(zip(open('script1.py'), open('script1.py')))
[('import sys\n', 'import sys\n'), ('print(sys.path)\n', 'print(sys.path)\n'),
 ('x = 2\n', 'x = 2\n'), ('print(2 ** 33)\n', 'print(2 ** 33)\n')]

>>> list(enumerate(open('script1.py')))
[(0, 'import sys\n'), (1, 'print(sys.path)\n'), (2, 'x = 2\n'),
 (3, 'print(2 ** 33)\n')]

>>> list(filter(bool, open('script1.py')))
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

>>> import functools, operator
>>> functools.reduce(operator.add, open('script1.py'))
'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'
All of these are iteration tools, but they have unique roles. We met zip and enumerate in the prior chapter; filter and reduce are in Chapter 19’s functional programming domain, so we’ll defer details for now.

We first saw the sorted function used here at work in Chapter 4, and we used it for dictionaries in Chapter 8. sorted is a built-in that employs the iteration protocol—it’s like the original list sort method, but it returns the new sorted list as a result and runs on any iterable object. Notice that, unlike map and others, sorted returns an actual list in Python 3.0 instead of an iterable.
Other built-in functions support the iteration protocol as well (but frankly, are harder to cast in interesting examples related to files). For example, the sum call computes the sum of all the numbers in any iterable; the any and all built-ins return True if any or all items in an iterable are True, respectively; and max and min return the largest and smallest item in an iterable, respectively. Like reduce, all of the tools in the following examples accept any iterable as an argument and use the iteration protocol to scan it, but return a single result:
>>> sum([3, 2, 4, 1, 5, 0])      # sum expects numbers only
15
>>> any(['spam', '', 'ni'])
True
>>> all(['spam', '', 'ni'])
False
>>> max([3, 2, 5, 1, 4])
5
>>> min([3, 2, 5, 1, 4])
1
Strictly speaking, the max and min functions can be applied to files as well—they automatically use the iteration protocol to scan the file and pick out the lines with the highest and lowest string values, respectively (though I’ll leave valid use cases to your imagination):

>>> max(open('script1.py'))      # Line with max/min string value
'x = 2\n'
>>> min(open('script1.py'))
'import sys\n'
Interestingly, the iteration protocol is even more pervasive in Python today than the examples so far have demonstrated—everything in Python’s built-in toolset that scans an object from left to right is defined to use the iteration protocol on the subject object. This even includes more esoteric tools such as the list and tuple built-in functions (which build new objects from iterables), the string join method (which puts a substring between strings contained in an iterable), and even sequence assignments. Consequently, all of these will also work on an open file and automatically read one line at a time:
>>> list(open('script1.py'))
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']
>>> tuple(open('script1.py'))
('import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n')
>>> '&&'.join(open('script1.py'))
'import sys\n&&print(sys.path)\n&&x = 2\n&&print(2 ** 33)\n'

>>> a, b, c, d = open('script1.py')
>>> a, d
('import sys\n', 'print(2 ** 33)\n')
>>> a, *b = open('script1.py')   # 3.0 extended form
>>> a, b
('import sys\n', ['print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n'])
Earlier, we saw that the built-in dict call accepts an iterable zip result, too. For that matter, so does the set call, as well as the new set and dictionary comprehension expressions in Python 3.0, which we met in Chapters 4, 5, and 8:

>>> set(open('script1.py'))
{'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n', 'import sys\n'}
>>> {line for line in open('script1.py')}
{'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n', 'import sys\n'}
>>> {ix: line for ix, line in enumerate(open('script1.py'))}
{0: 'import sys\n', 1: 'print(sys.path)\n', 2: 'x = 2\n', 3: 'print(2 ** 33)\n'}
In fact, both set and dictionary comprehensions support the extended syntax of list comprehensions we met earlier in this chapter, including if tests:

>>> {line for line in open('script1.py') if line[0] == 'p'}
{'print(sys.path)\n', 'print(2 ** 33)\n'}
>>> {ix: line for (ix, line) in enumerate(open('script1.py')) if line[0] == 'p'}
{1: 'print(sys.path)\n', 3: 'print(2 ** 33)\n'}
Like the list comprehension, both of these scan the file line by line and pick out lines that begin with the letter “p.” They also happen to build sets and dictionaries in the end, but we get a lot of work “for free” by combining file iteration and comprehension syntax.
There’s one last iteration context that’s worth mentioning, although it’s a bit of a preview: in Chapter 18, we’ll learn that a special *arg form can be used in function calls to unpack a collection of values into individual arguments. As you can probably predict by now, this accepts any iterable, too, including files (see Chapter 18 for more details on the call syntax):

>>> def f(a, b, c, d): print(a, b, c, d, sep='&')
...
>>> f(1, 2, 3, 4)
1&2&3&4
>>> f(*[1, 2, 3, 4])             # Unpacks into arguments
1&2&3&4

>>> f(*open('script1.py'))       # Iterates by lines too!
import sys
&print(sys.path)
&x = 2
&print(2 ** 33)
In fact, because this argument-unpacking syntax in calls accepts iterables, it’s also possible to use the zip built-in to unzip zipped tuples, by making prior or nested zip results arguments for another zip call (warning: you probably shouldn’t read the following example if you plan to operate heavy machinery anytime soon!):

>>> X = (1, 2)
>>> Y = (3, 4)
>>>
>>> list(zip(X, Y))              # Zip tuples: returns an iterable
[(1, 3), (2, 4)]
>>>
>>> A, B = zip(*zip(X, Y))       # Unzip a zip!
>>> A
(1, 2)
>>> B
(3, 4)
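To see why this works, it helps to think of zip as transposing rows and columns: applying it twice restores the original layout. This sketch (the variable names are made up) unpacks the steps:

```python
X = (1, 2)
Y = (3, 4)

# Step 1: pair up corresponding items from X and Y
pairs = list(zip(X, Y))          # [(1, 3), (2, 4)]

# Step 2: *pairs unpacks to zip((1, 3), (2, 4)), which groups the
# first items together and the second items together, transposing back
A, B = zip(*pairs)
print(A, B)                      # (1, 2) (3, 4)
```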
Still other tools in Python, such as the range built-in and dictionary view objects, return iterables instead of processing them. To see how these have been absorbed into the iteration protocol in Python 3.0 as well, we need to move on to the next section.

One of the fundamental changes in Python 3.0 is that it has a stronger emphasis on iterators than 2.X. In addition to the iterators associated with built-in types such as files and dictionaries, the dictionary methods keys, values, and items return iterable objects in Python 3.0, as do the built-in functions range, map, zip, and filter. As shown in the prior section, the last three of these functions both return iterators and process them. All of these tools produce results on demand in Python 3.0, instead of constructing result lists as they do in 2.6.
Although this saves memory space, it can impact your coding
styles in some contexts. In various places in this book so far, for
example, we’ve had to wrap up various function and method call results
in a list(...)
call in order to
force them to produce all their results at once:
>>> zip('abc', 'xyz')               # An iterable in Python 3.0 (a list in 2.6)
<zip object at 0x02E66710>
>>> list(zip('abc', 'xyz'))         # Force a list of results in 3.0 to display
[('a', 'x'), ('b', 'y'), ('c', 'z')]
This isn’t required in 2.6, because functions like zip
return lists of results. In 3.0, though,
they return iterable objects, producing results on demand. This means
extra typing is required to display the results at the interactive
prompt (and possibly in some other contexts), but it’s an asset in
larger programs—delayed evaluation like this conserves memory and
avoids pauses while large result lists are computed. Let’s take a
quick look at some of the new 3.0 iterables in action.
We studied the range
built-in’s basic behavior in the prior
chapter. In 3.0, it returns an iterator that generates numbers in
the range on demand, instead of building the result list in memory.
This subsumes the older 2.X xrange
(see the upcoming version skew
note), and you must use list(range(...))
to force an actual range
list if one is needed (e.g., to display results):
C:\misc> c:\python30\python
>>> R = range(10)                   # range returns an iterator, not a list
>>> R
range(0, 10)
>>> I = iter(R)                     # Make an iterator from the range
>>> next(I)                         # Advance to next result
0                                   # What happens in for loops, comprehensions, etc.
>>> next(I)
1
>>> next(I)
2
>>> list(range(10))                 # To force a list if required
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Unlike the list returned by this call in 2.X, range
objects in 3.0 support only
iteration, indexing, and the len
function. They do not support any other sequence operations (use
list(...)
if you require more
list tools):
>>> len(R)                          # range also does len and indexing, but no others
10
>>> R[0]
0
>>> R[-1]
9
>>> next(I)                         # Continue taking from iterator, where left off
3
>>> I.__next__()                    # .next() becomes .__next__(), but use new next()
4
Version skew note: Python 2.X also has
a built-in called xrange
, which
is like range
but produces
items on demand instead of building a list of results in memory
all at once. Since this is exactly what the new iterator-based
range
does in Python 3.0,
xrange
is no longer available
in 3.0—it has been subsumed. You may still see it in 2.X code,
though, especially since range
builds result lists there and so is not as efficient in its memory
usage. As noted in a sidebar in the prior chapter, the file.xreadlines()
method used to
minimize memory use in 2.X has been dropped in Python 3.0 for
similar reasons, in favor of file iterators.
Like range
, the map
, zip
, and filter
built-ins also become iterators in 3.0 to conserve space, rather than producing a
result list all at once in memory. All three not only process
iterables, as in 2.X, but also return iterable results in 3.0.
Unlike range
, though, they are
their own iterators—after you step through their results once, they
are exhausted. In other words, you can’t have multiple iterators on
their results that maintain different positions in those
results.
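This "own iterator" behavior is easy to verify directly. As a quick sketch, iter returns the map object itself for a map result, while a range object hands back a fresh iterator on every call:

```python
# map/zip/filter results are their own iterators,
# while a range object gets a new iterator from each iter() call
M = map(abs, (-1, 0, 1))
print(iter(M) is M)          # True: iter() returns the map object itself

R = range(3)
print(iter(R) is R)          # False: range makes a separate iterator
print(iter(R) is iter(R))    # False: two calls, two independent iterators
```

The same `iter(X) is X` test distinguishes one-shot iterators from multi-pass iterables for any object you encounter.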
Here is the case for the map
built-in we met in the prior chapter.
As with other iterators, you can force a list with list(...)
if you really need one, but the
default behavior can save substantial space in memory for large
result sets:
>>> M = map(abs, (-1, 0, 1))        # map returns an iterator, not a list
>>> M
<map object at 0x0276B890>
>>> next(M)                         # Use iterator manually: exhausts results
1                                   # These do not support len() or indexing
>>> next(M)
0
>>> next(M)
1
>>> next(M)
StopIteration

>>> for x in M: print(x)            # map iterator is now empty: one pass only
...
>>> M = map(abs, (-1, 0, 1))        # Make a new iterator to scan again
>>> for x in M: print(x)            # Iteration contexts auto call next()
...
1
0
1
>>> list(map(abs, (-1, 0, 1)))      # Can force a real list if needed
[1, 0, 1]
The zip
built-in,
introduced in the prior chapter, returns iterators that work the
same way:
>>> Z = zip((1, 2, 3), (10, 20, 30))    # zip is the same: a one-pass iterator
>>> Z
<zip object at 0x02770EE0>
>>> list(Z)
[(1, 10), (2, 20), (3, 30)]

>>> for pair in Z: print(pair)          # Exhausted after one pass
...
>>> Z = zip((1, 2, 3), (10, 20, 30))
>>> for pair in Z: print(pair)          # Iterator used automatically or manually
...
(1, 10)
(2, 20)
(3, 30)

>>> Z = zip((1, 2, 3), (10, 20, 30))
>>> next(Z)
(1, 10)
>>> next(Z)
(2, 20)
The filter
built-in, which
we’ll study in the next part of this book, is also analogous. It
returns items in an iterable for which a passed-in function returns
True
(as we’ve learned, in Python
True
includes nonempty
objects):
>>> filter(bool, ['spam', '', 'ni'])
<filter object at 0x0269C6D0>
>>> list(filter(bool, ['spam', '', 'ni']))
['spam', 'ni']
Like most of the tools discussed in this section, filter
both accepts an iterable to process
and returns an iterable to generate results in 3.0.
It’s interesting to see how the range
object differs from the built-ins described
in this section—it supports len
and indexing, it is not its own iterator (you make one with iter
when iterating manually), and it
supports multiple iterators over its result that remember their
positions independently:
>>> R = range(3)                    # range allows multiple iterators
>>> next(R)
TypeError: range object is not an iterator
>>> I1 = iter(R)
>>> next(I1)
0
>>> next(I1)
1
>>> I2 = iter(R)                    # Two iterators on one range
>>> next(I2)
0
>>> next(I1)                        # I1 is at a different spot than I2
2
By contrast, zip
, map
, and filter
do not support multiple active
iterators on the same result:
>>> Z = zip((1, 2, 3), (10, 11, 12))
>>> I1 = iter(Z)
>>> I2 = iter(Z)                    # Two iterators on one zip
>>> next(I1)
(1, 10)
>>> next(I1)
(2, 11)
>>> next(I2)                        # I2 is at same spot as I1!
(3, 12)

>>> M = map(abs, (-1, 0, 1))        # Ditto for map (and filter)
>>> I1 = iter(M); I2 = iter(M)
>>> print(next(I1), next(I1), next(I1))
1 0 1
>>> next(I2)
StopIteration

>>> R = range(3)                    # But range allows many iterators
>>> I1, I2 = iter(R), iter(R)
>>> [next(I1), next(I1), next(I1)]
[0, 1, 2]
>>> next(I2)
0
When we code our own iterable objects with classes later in
the book (Chapter 29), we’ll see that
multiple iterators are usually supported by returning new objects
for the iter
call; a single
iterator generally means an object returns itself. In Chapter 20, we’ll also find
that generator functions and expressions behave
like map
and zip
instead of range
in this regard, supporting a single
active iteration. In that chapter, we’ll see some subtle
implications of one-shot iterators in loops that attempt to scan
multiple times.
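As a brief preview of those implications, here is a sketch of the pitfall: a one-shot zip result is consumed by the first pass of a nested scan, while an equivalent list supports as many passes as needed:

```python
# A one-shot iterator supports a single pass: nested scans come up short
Z = zip((1, 2), (3, 4))
pairs = [(x, y) for x in Z for y in Z]      # Inner loop exhausts Z right away
print(len(pairs))                           # 1: far fewer results than expected

L = list(zip((1, 2), (3, 4)))               # A real list supports multiple scans
pairs = [(x, y) for x in L for y in L]
print(len(pairs))                           # 4: the full cross product
```

When an iterable must be traversed more than once, wrapping it in list first sidesteps this trap.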
As we saw briefly in Chapter 8,
in Python 3.0 the dictionary keys
, values
, and items
methods return iterable
view objects that generate result items one at
a time, instead of producing result lists all at once in memory.
View items maintain the same physical ordering as that of the
dictionary and reflect changes made to the underlying dictionary.
Now that we know more about iterators, here’s the rest of the
story:
>>> D = dict(a=1, b=2, c=3)
>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> K = D.keys()                    # A view object in 3.0, not a list
>>> K
<dict_keys object at 0x026D83C0>
>>> next(K)                         # Views are not iterators themselves
TypeError: dict_keys object is not an iterator

>>> I = iter(K)                     # Views have an iterator,
>>> next(I)                         # which can be used manually,
'a'                                 # but does not support len(), index
>>> next(I)
'c'

>>> for k in D.keys(): print(k, end=' ')    # All iteration contexts use it automatically
...
a c b
As for all iterators, you can always force a 3.0 dictionary
view to build a real list by passing it to the list
built-in. However, this usually isn’t
required except to display results interactively or to apply list
operations like indexing:
>>> K = D.keys()
>>> list(K)                         # Can still force a real list if needed
['a', 'c', 'b']

>>> V = D.values()                  # Ditto for values() and items() views
>>> V
<dict_values object at 0x026D8260>
>>> list(V)
[1, 3, 2]
>>> list(D.items())
[('a', 1), ('c', 3), ('b', 2)]
>>> for (k, v) in D.items(): print(k, v, end=' ')
...
a 1 c 3 b 2
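The dynamic nature of views mentioned earlier is also easy to demonstrate. As a sketch, a view fetched before a change still reflects the dictionary's current contents afterward:

```python
# Views reflect later changes to the dictionary they were made from
D = dict(a=1, b=2, c=3)
K = D.keys()                 # Fetch the views first
V = D.values()
D['d'] = 4                   # Then change the dictionary
print('d' in K)              # True: the keys view sees the new key
print(sorted(V))             # [1, 2, 3, 4]: the values view updates too
```

A list built with list(K) before the change, by contrast, is a frozen snapshot and would not include the new key.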
In addition, 3.0 dictionaries still have iterators themselves,
which return successive keys. Thus, it’s not often necessary to call
keys
directly in this
context:
>>> D                               # Dictionaries still have their own iterator,
{'a': 1, 'c': 3, 'b': 2}            # which returns the next key on each iteration
>>> I = iter(D)
>>> next(I)
'a'
>>> next(I)
'c'

>>> for key in D: print(key, end=' ')   # Still no need to call keys() to iterate
...                                     # But keys is an iterable in 3.0 too!
a c b
Finally, remember again that because keys
no longer returns a list, the
traditional coding pattern for scanning a dictionary by sorted keys
won’t work in 3.0. Instead, convert keys views first with a list
call, or use the sorted
call on either a keys view or the
dictionary itself, as follows:
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for k in sorted(D.keys()): print(k, D[k], end=' ')
...
a 1 b 2 c 3

>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for k in sorted(D): print(k, D[k], end=' ')     # "Best practice" key sorting
...
a 1 b 2 c 3
We’ll learn more about both list comprehensions and iterators in Chapter 20, in conjunction with functions, and again in Chapter 29 when we study classes. As you’ll see later:
User-defined functions can be turned into iterable
generator functions, with yield
statements.
List comprehensions morph into iterable generator expressions when coded in parentheses.
User-defined classes are made iterable with __iter__
or __getitem__
operator overloading.
In particular, user-defined iterators defined with classes allow arbitrary objects and operations to be used in any of the iteration contexts we’ve met here.
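As a brief preview of that material, here is a sketch of the last technique: a user-defined class made iterable with __iter__, coded here with a generator method for brevity (the class name and numeric example are illustrative, not the book's own):

```python
# Sketch: a user-defined iterable class (preview of class-based iterators)
class Squares:
    def __init__(self, start, stop):        # Save state when created
        self.start = start
        self.stop = stop
    def __iter__(self):                     # Called by iter() and all iteration tools
        value = self.start
        while value <= self.stop:
            yield value ** 2                # Generator supplies __next__ automatically
            value += 1

for x in Squares(1, 4):                     # Works in any iteration context
    print(x, end=' ')                       # 1 4 9 16
print()
print(list(Squares(1, 3)))                  # [1, 4, 9]
```

Because each call to __iter__ here produces a fresh generator, this class also supports multiple independent scans, like range.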
In this chapter, we explored concepts related to looping in
Python. We took our first substantial look at the iteration
protocol in Python—a way for nonsequence objects to take
part in iteration loops—and at list
comprehensions. As we saw, a list comprehension is an
expression similar to a for
loop
that applies another expression to all the items in any iterable
object. Along the way, we also saw other built-in iteration tools at
work and studied recent iteration additions in Python 3.0.
This wraps up our tour of specific procedural statements and related tools. The next chapter closes out this part of the book by discussing documentation options for Python code; documentation is also part of the general syntax model, and it’s an important component of well-written programs. In the next chapter, we’ll also dig into a set of exercises for this part of the book before we turn our attention to larger structures such as functions. As usual, though, let’s first exercise what we’ve learned here with a quiz.
How are for
loops and
iterators related?
How are for
loops and
list comprehensions related?
Name four iteration contexts in the Python language.
What is the best way to read line by line from a text file today?
What sort of weapons would you expect to see employed by the Spanish Inquisition?
The for
loop uses the
iteration protocol to step through items in
the object across which it is iterating. It calls the object’s
__next__
method (run by the
next
built-in) on each
iteration and catches the StopIteration
exception to determine
when to stop looping. Any object that supports this model works in
a for
loop and in other
iteration contexts.
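Roughly speaking, the for loop's behavior can be emulated by hand with the protocol's calls; a sketch of the equivalence:

```python
# What a for loop does under the hood, via the iteration protocol
L = [1, 2, 3]
I = iter(L)                     # Obtain an iterator from the iterable
while True:
    try:
        x = next(I)             # Runs I.__next__() internally
    except StopIteration:       # Raised when items are exhausted
        break
    print(x ** 2, end=' ')      # The loop body: prints 1 4 9
print()
```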
Both are iteration tools. List comprehensions are a concise
and efficient way to perform a common for
loop task: collecting the results of
applying an expression to all items in an iterable object. It’s
always possible to translate a list comprehension to a for
loop, and part of the list
comprehension expression looks like the header of a for
loop syntactically.
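For instance, the translation runs in both directions; a sketch of a comprehension alongside its for loop equivalent:

```python
# A list comprehension and its equivalent for loop build the same result
squares = [x ** 2 for x in range(5)]        # Expression applied to each item

result = []                                 # The same logic spelled out
for x in range(5):
    result.append(x ** 2)

print(squares)                              # [0, 1, 4, 9, 16]
print(squares == result)                    # True
```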
Iteration contexts in Python include the for
loop; list comprehensions; the
map
built-in function; the
in
membership test expression;
and the built-in functions sorted
, sum
, any
, and all
. This category also includes the
list
and tuple
built-ins, string join
methods, and sequence assignments,
all of which use the iteration protocol (the __next__
method) to step across iterable
objects one item at a time.
The best way to read lines from a text file today is to not
read it explicitly at all: instead, open the file within an
iteration context such as a for
loop or list comprehension, and let the iteration tool
automatically scan one line at a time by running the file’s next
method on each iteration. This
approach is generally best in terms of coding simplicity,
execution speed, and memory space requirements.
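In code, that answer looks like the following sketch (the filename is hypothetical; the script writes its own sample file so the example is self-contained):

```python
# Let the file object's own iterator read lines on demand
import os

with open('data.txt', 'w') as f:            # Make a small sample file first
    f.write('spam\nham\neggs\n')

for line in open('data.txt'):               # File iterator yields one line at a time
    print(line.rstrip())                    # No readlines() call, no full in-memory list

os.remove('data.txt')                       # Clean up the sample file
```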
I’ll accept any of the following as correct answers: fear, intimidation, nice red uniforms, a comfy chair, and soft pillows.
[33] Terminology in this topic tends to be a bit loose. This text
uses the terms “iterable” and “iterator” interchangeably to refer
to an object that supports iteration in general. Sometimes the
term “iterable” refers to an object that supports iter
and “iterator” refers to an object
returned by iter
that supports
next(I)
, but that convention is not universal
in either the Python world or this book.
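The stricter distinction is easy to see in code; a sketch: a list is an iterable but not its own iterator, while the object iter returns is:

```python
# Iterable vs. iterator in the stricter sense of the terms
L = [1, 2, 3]                   # A list is iterable: iter() accepts it
I = iter(L)                     # iter() returns a separate iterator object
print(I is L)                   # False: the list is not its own iterator
print(next(I))                  # 1: the iterator supports next()
print(hasattr(L, '__next__'))   # False: the list itself does not
```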
[34] Technically speaking, the for
loop calls the internal equivalent
of I.__next__
, instead of the
next(I)
used here. There is
rarely any difference between the two, but as we’ll see in the
next section, there are some built-in objects in 3.0 (such as
os.popen
results) that
support the former and not the latter, but may still be
iterated across in for
loops.
Your manual iterations can generally use either call scheme. If
you care for the full story, in 3.0 os.popen
results have been
reimplemented with the subprocess
module and a wrapper class,
whose __getattr__
method is
no longer called in 3.0 for implicit __next__
fetches made by the next
built-in, but is called for
explicit fetches by name—a 3.0 change issue we’ll confront in
Chapters 37 and 38, which
apparently burns some standard library code too! Also in 3.0,
the related 2.6 calls os.popen2/3/4
are no longer available;
use subprocess.Popen
with
appropriate arguments instead (see the Python 3.0 library manual
for the new required code).