Chapter 14. Iterations and Comprehensions, Part 1

In the prior chapter we met Python’s two looping statements, while and for. Although they can handle most repetitive tasks programs need to perform, the need to iterate over sequences is so common and pervasive that Python provides additional tools to make it simpler and more efficient. This chapter begins our exploration of these tools. Specifically, it presents the related concepts of Python’s iteration protocol—a method-call model used by the for loop—and fills in some details on list comprehensions—a close cousin to the for loop that applies an expression to items in an iterable.

Because both of these tools are related to both the for loop and functions, we’ll take a two-pass approach to covering them in this book: this chapter introduces the basics in the context of looping tools, serving as something of a continuation of the prior chapter, and a later chapter (Chapter 20) revisits them in the context of function-based tools. In this chapter, we’ll also sample additional iteration tools in Python and touch on the new iterators available in Python 3.0.

One note up front: some of the concepts presented in these chapters may seem advanced at first glance. With practice, though, you’ll find that these tools are useful and powerful. They are never strictly required, but because they’ve become commonplace in Python code, a basic understanding can also help if you must read programs written by others.

Iterators: A First Look

In the preceding chapter, I mentioned that the for loop can work on any sequence type in Python, including lists, tuples, and strings, like this:

>>> for x in [1, 2, 3, 4]: print(x ** 2, end=' ')
...
1 4 9 16

>>> for x in (1, 2, 3, 4): print(x ** 3, end=' ')
...
1 8 27 64

>>> for x in 'spam': print(x * 2, end=' ')
...
ss pp aa mm

Actually, the for loop turns out to be even more generic than this—it works on any iterable object. In fact, this is true of all iteration tools that scan objects from left to right in Python, including for loops, the list comprehensions we’ll study in this chapter, in membership tests, the map built-in function, and more.

The concept of “iterable objects” is relatively recent in Python, but it has come to permeate the language’s design. It’s essentially a generalization of the notion of sequences—an object is considered iterable if it is either a physically stored sequence or an object that produces one result at a time in the context of an iteration tool like a for loop. In a sense, iterable objects include both physical sequences and virtual sequences computed on demand.[33]

The Iteration Protocol: File Iterators

One of the easiest ways to understand what this means is to look at how it works with a built-in type such as the file. Recall from Chapter 9 that open file objects have a method called readline, which reads one line of text from a file at a time—each time we call the readline method, we advance to the next line. At the end of the file, an empty string is returned, which we can detect to break out of the loop:

>>> f = open('script1.py')     # Read a 4-line script file in this directory
>>> f.readline()               # readline loads one line on each call
'import sys\n'
>>> f.readline()
'print(sys.path)\n'
>>> f.readline()
'x = 2\n'
>>> f.readline()
'print(2 ** 33)\n'
>>> f.readline()               # Returns empty string at end-of-file
''

However, files also have a method named __next__ that has a nearly identical effect—it returns the next line from a file each time it is called. The only noticeable difference is that __next__ raises a built-in StopIteration exception at end-of-file instead of returning an empty string:

>>> f = open('script1.py')     # __next__ loads one line on each call too
>>> f.__next__()               # But raises an exception at end-of-file
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f.__next__()
'x = 2\n'
>>> f.__next__()
'print(2 ** 33)\n'
>>> f.__next__()
Traceback (most recent call last):
...more exception text omitted...
StopIteration

This interface is exactly what we call the iteration protocol in Python. Any object with a __next__ method to advance to a next result, which raises StopIteration at the end of the series of results, is considered iterable in Python. Any such object may also be stepped through with a for loop or other iteration tool, because all iteration tools normally work internally by calling __next__ on each iteration and catching the StopIteration exception to determine when to exit.
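To make the two end-of-file signals concrete, here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for script1.py (so that it runs without a file on disk), and it uses a try statement, which we’ll study later in this book, to catch the exception:

```python
import io

# In-memory stand-in for script1.py (an assumption; same four lines)
text = 'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'

f = io.StringIO(text)
lines = []
while True:
    try:
        lines.append(f.__next__())   # Advance to the next line
    except StopIteration:            # Raised at end-of-file
        break
print(lines)

g = io.StringIO(text)
for _ in range(4):                   # Consume all four lines
    g.readline()
at_eof = g.readline()                # readline returns '' at end-of-file
print(repr(at_eof))
```

Both loops stop at the same place; they differ only in how the end of the data is reported.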

The net effect of this magic is that, as mentioned in Chapter 9, the best way to read a text file line by line today is to not read it at all—instead, allow the for loop to automatically call __next__ to advance to the next line on each iteration. The file object’s iterator will do the work of automatically loading lines as you go. The following, for example, reads a file line by line, printing the uppercase version of each line along the way, without ever explicitly reading from the file at all:

>>> for line in open('script1.py'):       # Use file iterators to read by lines
...     print(line.upper(), end='')       # Calls __next__, catches StopIteration
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

Notice that the print uses end='' here to suppress adding a \n, because line strings already have one (without this, our output would be double-spaced). This is considered the best way to read text files line by line today, for three reasons: it’s the simplest to code, might be the quickest to run, and is the best in terms of memory usage. The older, original way to achieve the same effect with a for loop is to call the file readlines method to load the file’s content into memory as a list of line strings:

>>> for line in open('script1.py').readlines():
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

This readlines technique still works, but it is not considered the best practice today and performs poorly in terms of memory usage. In fact, because this version really does load the entire file into memory all at once, it will not even work for files too big to fit into the memory space available on your computer. By contrast, because it reads one line at a time, the iterator-based version is immune to such memory-explosion issues. The iterator version might run quicker too, though this can vary per release (Python 3.0 made this advantage less clear-cut by rewriting I/O libraries to support Unicode text and be less system-dependent).
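The memory difference can be observed rather than taken on faith. The following is a rough sketch, not a benchmark: it assumes a throwaway temporary file of 50,000 short lines (hypothetical test data) and uses the standard library’s tracemalloc module to record peak memory for each approach. The helper names peak_bytes, count_with_readlines, and count_with_iterator are made up for this illustration, and exact numbers will vary by platform and release:

```python
import os
import tempfile
import tracemalloc

# Build a throwaway file of many short lines (hypothetical test data)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(50000):
        tmp.write('line number %d\n' % i)
    path = tmp.name

def peak_bytes(func):
    """Return (result, peak traced memory in bytes) for one call of func."""
    tracemalloc.start()
    result = func()
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return result, peak

def count_with_readlines():
    with open(path) as f:
        count = 0
        for line in f.readlines():      # Whole file held as a list of strings
            count += 1
        return count

def count_with_iterator():
    with open(path) as f:
        count = 0
        for line in f:                  # One line at a time
            count += 1
        return count

n1, peak_list = peak_bytes(count_with_readlines)
n2, peak_iter = peak_bytes(count_with_iterator)
os.remove(path)                         # Clean up the temporary file

print(n1, n2)
print('readlines peak:', peak_list, 'iterator peak:', peak_iter)
```

Both versions count the same number of lines, but the readlines version’s peak should grow with the size of the file, while the iterator version’s should stay roughly constant.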

As mentioned in the prior chapter’s sidebar, Why You Will Care: File Scanners, it’s also possible to read a file line by line with a while loop:

>>> f = open('script1.py')
>>> while True:
...     line = f.readline()
...     if not line: break
...     print(line.upper(), end='')
...
...same output...

However, this may run slower than the iterator-based for loop version, because iterators run at C language speed inside Python, whereas the while loop version runs Python byte code through the Python virtual machine. Any time we trade Python code for C code, speed tends to increase. This is not an absolute truth, though, especially in Python 3.0; we’ll see timing techniques later in this book for measuring the relative speed of alternatives like these.
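As a preview of those timing techniques, here is a minimal sketch that uses the standard library’s timeit module. It assumes an in-memory io.StringIO object in place of a real file, so it measures only the looping overhead, not disk I/O; the function names scan_for and scan_while are hypothetical. Treat the printed times as relative and machine-dependent:

```python
import io
import timeit

data = 'spam line\n' * 10000             # In-memory stand-in for a text file

def scan_for():
    count = 0
    for line in io.StringIO(data):       # Iterator-based for loop
        count += 1
    return count

def scan_while():
    f = io.StringIO(data)
    count = 0
    while True:
        line = f.readline()              # Explicit readline calls
        if not line: break
        count += 1
    return count

print(scan_for(), scan_while())          # Both scan the same number of lines
print('for:  ', timeit.timeit(scan_for, number=20))
print('while:', timeit.timeit(scan_while, number=20))
```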

Manual Iteration: iter and next

To support manual iteration code (with less typing), Python 3.0 also provides a built-in function, next, that automatically calls an object’s __next__ method. Given an iterable object X, the call next(X) is the same as X.__next__(), but noticeably simpler. With files, for instance, either form may be used:

>>> f = open('script1.py')
>>> f.__next__()                   # Call iteration method directly
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'

>>> f = open('script1.py')
>>> next(f)                        # next built-in calls __next__
'import sys\n'
>>> next(f)
'print(sys.path)\n'

Technically, there is one more piece to the iteration protocol. When the for loop begins, it obtains an iterator from the iterable object by passing it to the iter built-in function; the object returned by iter has the required __next__ method. This becomes obvious if we look at how for loops internally process built-in sequence types such as lists:

>>> L = [1, 2, 3]
>>> I = iter(L)                    # Obtain an iterator object
>>> I.__next__()                   # Call next to advance to next item
1
>>> I.__next__()
2
>>> I.__next__()
3
>>> I.__next__()
Traceback (most recent call last):
...more omitted...
StopIteration

This initial step is not required for files, because a file object is its own iterator. That is, files have their own __next__ method and so do not need to return a different object that does:

>>> f = open('script1.py')
>>> iter(f) is f
True
>>> f.__next__()
'import sys\n'

Lists, and many other built-in objects, are not their own iterators because they support multiple open iterations. For such objects, we must call iter to start iterating:

>>> L = [1, 2, 3]
>>> iter(L) is L
False
>>> L.__next__()
AttributeError: 'list' object has no attribute '__next__'

>>> I = iter(L)
>>> I.__next__()
1
>>> next(I)                # Same as I.__next__()
2
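What “multiple open iterations” means in practice can be sketched as follows. The example assumes an in-memory io.StringIO object as a stand-in for an open file; like a real file, it is its own iterator:

```python
import io

L = [1, 2, 3]
I1 = iter(L)
I2 = iter(L)                    # A second, independent iterator
first = (next(I1), next(I1))    # I1 advances to 1, then 2
second = next(I2)               # I2 still starts at 1
print(first, second)

f = io.StringIO('a\nb\nc\n')    # Stand-in for an open file (assumption)
F1 = iter(f)
F2 = iter(f)                    # Same object: only one position to share
shared = (next(F1), next(F2))   # Both calls advance the same scan
print(F1 is F2, shared)
```

Each iterator obtained from the list remembers its own position, whereas every call to iter on the file-like object returns the object itself, so there is just a single scan in progress.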

Although Python iteration tools call these functions automatically, we can use them to apply the iteration protocol manually, too. The following interaction demonstrates the equivalence between automatic and manual iteration:[34]

>>> L = [1, 2, 3]
>>>
>>> for X in L:                 # Automatic iteration
...     print(X ** 2, end=' ')  # Obtains iter, calls __next__, catches exceptions
...
1 4 9

>>> I = iter(L)                 # Manual iteration: what for loops usually do
>>> while True:
...     try:                    # try statement catches exceptions
...         X = next(I)         # Or call I.__next__
...     except StopIteration:
...         break
...     print(X ** 2, end=' ')
...
1 4 9

To understand this code, you need to know that try statements run an action and catch exceptions that occur while the action runs (we’ll explore exceptions in depth in Part VII). I should also note that for loops and other iteration contexts can sometimes work differently for user-defined classes, repeatedly indexing an object instead of running the iteration protocol. We’ll defer that story until we study class operator overloading in Chapter 29.

Note

Version skew note: In Python 2.6, the iteration method is named X.next() instead of X.__next__(). For portability, the next(X) built-in function is available in Python 2.6 too (but not earlier), and calls 2.6’s X.next() instead of 3.0’s X.__next__(). Iteration works the same in 2.6 in all other ways, though; simply use X.next() or next(X) for manual iterations, instead of 3.0’s X.__next__(). Prior to 2.6, use manual X.next() calls instead of next(X).

Other Built-in Type Iterators

Besides files and physical sequences like lists, other types have useful iterators as well. The classic way to step through the keys of a dictionary, for example, is to request its keys list explicitly:

>>> D = {'a':1, 'b':2, 'c':3}
>>> for key in D.keys():
...     print(key, D[key])
...
a 1
c 3
b 2

In recent versions of Python, though, dictionaries have an iterator that automatically returns one key at a time in an iteration context:

>>> I = iter(D)
>>> next(I)
'a'
>>> next(I)
'c'
>>> next(I)
'b'
>>> next(I)
Traceback (most recent call last):
...more omitted...
StopIteration

The net effect is that we no longer need to call the keys method to step through dictionary keys—the for loop will use the iteration protocol to grab one key each time through:

>>> for key in D:
...     print(key, D[key])
...
a 1
c 3
b 2

We can’t delve into their details here, but other Python object types also support the iterator protocol and thus may be used in for loops too. For instance, shelves (an access-by-key filesystem for Python objects) and the results from os.popen (a tool for reading the output of shell commands) are iterable as well:

>>> import os
>>> P = os.popen('dir')
>>> P.__next__()
' Volume in drive C is SQ004828V03\n'
>>> P.__next__()
' Volume Serial Number is 08BE-3CD4\n'
>>> next(P)
TypeError: _wrap_close object is not an iterator

Notice that popen objects support a P.next() method in Python 2.6. In 3.0, they support the P.__next__() method, but not the next(P) built-in; since the latter is defined to call the former, it’s not clear if this behavior will endure in future releases (as described in an earlier footnote, this appears to be an implementation issue). This is only an issue for manual iteration, though; if you iterate over these objects automatically with for loops and other iteration contexts (described in the next sections), they return successive lines in either Python version.

The iteration protocol also is the reason that we’ve had to wrap some results in a list call to see their values all at once. Objects that are iterable return results one at a time, not in a physical list:

>>> R = range(5)
>>> R                            # Ranges are iterables in 3.0
range(0, 5)
>>> I = iter(R)                  # Use iteration protocol to produce results
>>> next(I)
0
>>> next(I)
1
>>> list(range(5))               # Or use list to collect all results at once
[0, 1, 2, 3, 4]

Now that you have a better understanding of this protocol, you should be able to see how it explains why the enumerate tool introduced in the prior chapter works the way it does:

>>> E = enumerate('spam')        # enumerate is an iterable too
>>> E
<enumerate object at 0x0253F508>
>>> I = iter(E)
>>> next(I)                      # Generate results with iteration protocol
(0, 's')
>>> next(I)                      # Or use list to force generation to run
(1, 'p')
>>> list(enumerate('spam'))
[(0, 's'), (1, 'p'), (2, 'a'), (3, 'm')]

We don’t normally see this machinery because for loops run it for us automatically to step through results. In fact, everything that scans left-to-right in Python employs the iteration protocol in the same way—including the topic of the next section.

List Comprehensions: A First Look

Now that we’ve seen how the iteration protocol works, let’s turn to a very common use case. Together with for loops, list comprehensions are one of the most prominent contexts in which the iteration protocol is applied.

In the previous chapter, we learned how to use range to change a list as we step across it:

>>> L = [1, 2, 3, 4, 5]

>>> for i in range(len(L)):
...     L[i] += 10
...
>>> L
[11, 12, 13, 14, 15]

This works, but as I mentioned there, it may not be the optimal “best-practice” approach in Python. Today, the list comprehension expression makes many such prior use cases obsolete. Here, for example, we can replace the loop with a single expression that produces the desired result list:

>>> L = [x + 10 for x in L]
>>> L
[21, 22, 23, 24, 25]

The net result is the same, but it requires less coding on our part and is likely to run substantially faster. The list comprehension isn’t exactly the same as the for loop statement version because it makes a new list object (which might matter if there are multiple references to the original list), but it’s close enough for most applications and is a common and convenient enough approach to merit a closer look here.
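The point about multiple references is easy to miss, so here is a small sketch of it. The name alias is made up for this illustration:

```python
L = [1, 2, 3]
alias = L                       # Second reference to the same list object

for i in range(len(L)):         # In-place change: visible through alias too
    L[i] += 10
print(L, alias)                 # [11, 12, 13] [11, 12, 13]

L = [x + 10 for x in L]         # New list: alias still refers to the old one
print(L, alias)                 # [21, 22, 23] [11, 12, 13]
print(L is alias)               # False
```

If other parts of a program hold references to the original list, only the loop version updates what they see.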

List Comprehension Basics

We met the list comprehension briefly in Chapter 4. Its syntax is derived from a construct in set theory notation that applies an operation to each item in a set, but you don’t have to know set theory to use this tool. In Python, most people find that a list comprehension simply looks like a backward for loop.

To get a handle on the syntax, let’s dissect the prior section’s example in more detail:

>>> L = [x + 10 for x in L]

List comprehensions are written in square brackets because they are ultimately a way to construct a new list. They begin with an arbitrary expression that we make up, which uses a loop variable that we make up (x + 10). That is followed by what you should now recognize as the header of a for loop, which names the loop variable, and an iterable object (for x in L).

To run the expression, Python executes an iteration across L inside the interpreter, assigning x to each item in turn, and collects the results of running the items through the expression on the left side. The result list we get back is exactly what the list comprehension says—a new list containing x + 10, for every x in L.

Technically speaking, list comprehensions are never really required because we can always build up a list of expression results manually with for loops that append results as we go:

>>> res = []
>>> for x in L:
...     res.append(x + 10)
...
>>> res
[21, 22, 23, 24, 25]

In fact, this is exactly what the list comprehension does internally.

However, list comprehensions are more concise to write, and because this code pattern of building up result lists is so common in Python work, they turn out to be very handy in many contexts. Moreover, list comprehensions can run much faster than manual for loop statements (often roughly twice as fast) because their iterations are performed at C language speed inside the interpreter, rather than with manual Python code; especially for larger data sets, there is a major performance advantage to using them.
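Speed claims like this one are best checked on your own machine and Python release, since the gap varies. The following is a minimal sketch using the standard library’s timeit module; the function names with_loop and with_comprehension and the data size are arbitrary choices for this illustration:

```python
import timeit

data = list(range(100000))

def with_loop():
    res = []
    for x in data:                       # Manual loop with append
        res.append(x + 10)
    return res

def with_comprehension():
    return [x + 10 for x in data]        # Same result as one expression

print(with_loop() == with_comprehension())
print('loop:         ', timeit.timeit(with_loop, number=20))
print('comprehension:', timeit.timeit(with_comprehension, number=20))
```

The two functions build equal lists; only the elapsed times differ, and those depend on the interpreter in use.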

Using List Comprehensions on Files

Let’s work through another common use case for list comprehensions to explore them in more detail. Recall that the file object has a readlines method that loads the file into a list of line strings all at once:

>>> f = open('script1.py')
>>> lines = f.readlines()
>>> lines
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

This works, but the lines in the result all include the newline character (\n) at the end. For many programs, the newline character gets in the way: we have to be careful to avoid double-spacing when printing, and so on. It would be nice if we could get rid of these newlines all at once, wouldn’t it?

Any time we start thinking about performing an operation on each item in a sequence, we’re in the realm of list comprehensions. For example, assuming the variable lines is as it was in the prior interaction, the following code does the job by running each line in the list through the string rstrip method to remove whitespace on the right side (a line[:-1] slice would work, too, but only if we can be sure all lines are properly terminated):

>>> lines = [line.rstrip() for line in lines]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']
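To see why the slice is riskier than rstrip, consider a sketch in which the last line has no terminating newline, as can happen at the end of a file. The sample list here is made-up data:

```python
lines = ['import sys\n', 'x = 2\n', 'print(2 ** 33)']   # Last line lacks '\n'

stripped = [line.rstrip() for line in lines]
sliced = [line[:-1] for line in lines]

print(stripped)    # ['import sys', 'x = 2', 'print(2 ** 33)']
print(sliced)      # ['import sys', 'x = 2', 'print(2 ** 33']
```

The slice blindly drops the final character of every line, so it chops off the closing parenthesis of the unterminated line. Note that rstrip with no arguments also removes trailing spaces and tabs; line.rstrip('\n') removes only newlines.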

This works as planned. Because list comprehensions are an iteration context just like for loop statements, though, we don’t even have to open the file ahead of time. If we open it inside the expression, the list comprehension will automatically use the iteration protocol we met earlier in this chapter. That is, it will read one line from the file at a time by calling the file’s __next__ method, run the line through the rstrip expression, and add it to the result list. Again, we get what we ask for: the rstrip result of a line, for every line in the file:

>>> lines = [line.rstrip() for line in open('script1.py')]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']

This expression does a lot implicitly, but we’re getting a lot of work for free here—Python scans the file and builds a list of operation results automatically. It’s also an efficient way to code this operation: because most of this work is done inside the Python interpreter, it is likely much faster than an equivalent for statement. Again, especially for large files, the speed advantages of list comprehensions can be significant.

Besides their efficiency, list comprehensions are also remarkably expressive. In our example, we can run any string operation on a file’s lines as we iterate. Here’s the list comprehension equivalent to the file iterator uppercase example we met earlier, along with a few others (the method chaining in the second of these examples works because string methods return a new string, to which we can apply another string method):

>>> [line.upper() for line in open('script1.py')]
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> [line.rstrip().upper() for line in open('script1.py')]
['IMPORT SYS', 'PRINT(SYS.PATH)', 'X = 2', 'PRINT(2 ** 33)']

>>> [line.split() for line in open('script1.py')]
[['import', 'sys'], ['print(sys.path)'], ['x', '=', '2'], ['print(2', '**', '33)']]

>>> [line.replace(' ', '!') for line in open('script1.py')]
['import!sys\n', 'print(sys.path)\n', 'x!=!2\n', 'print(2!**!33)\n']

>>> [('sys' in line, line[0]) for line in open('script1.py')]
[(True, 'i'), (True, 'p'), (False, 'x'), (False, 'p')]

Extended List Comprehension Syntax

In fact, list comprehensions can be even more advanced in practice. As one particularly useful extension, the for loop nested in the expression can have an associated if clause to filter out of the result items for which the test is not true.

For example, suppose we want to repeat the prior section’s file-scanning example, but we need to collect only lines that begin with the letter p (perhaps the first character on each line is an action code of some sort). Adding an if filter clause to our expression does the trick:

>>> lines = [line.rstrip() for line in open('script1.py') if line[0] == 'p']
>>> lines
['print(sys.path)', 'print(2 ** 33)']

Here, the if clause checks each line read from the file to see whether its first character is p; if not, the line is omitted from the result list. This is a fairly big expression, but it’s easy to understand if we translate it to its simple for loop statement equivalent. In general, we can always translate a list comprehension to a for statement by appending as we go and further indenting each successive part:

>>> res = []
>>> for line in open('script1.py'):
...     if line[0] == 'p':
...         res.append(line.rstrip())
...
>>> res
['print(sys.path)', 'print(2 ** 33)']

This for statement equivalent works, but it takes up four lines instead of one and probably runs substantially slower.

List comprehensions can become even more complex if we need them to—for instance, they may contain nested loops, coded as a series of for clauses. In fact, their full syntax allows for any number of for clauses, each of which can have an optional associated if clause (we’ll be more formal about their syntax in Chapter 20).

For example, the following builds a list of the concatenation of x + y for every x in one string and every y in another. It effectively collects all pairings of the characters in the two strings (their Cartesian product):

>>> [x + y for x in 'abc' for y in 'lmn']
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']

Again, one way to understand this expression is to convert it to statement form by indenting its parts. The following is an equivalent, but likely slower, alternative way to achieve the same effect:

>>> res = []
>>> for x in 'abc':
...     for y in 'lmn':
...         res.append(x + y)
...
>>> res
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
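The same translation rule applies when each for clause carries its own if clause. As a sketch, with filter conditions chosen arbitrarily for illustration, the comprehension and its statement equivalent below produce the same list:

```python
res1 = [x + y for x in 'abc' if x != 'b' for y in 'lmn' if y != 'm']

res2 = []
for x in 'abc':
    if x != 'b':                 # Filter attached to the first for clause
        for y in 'lmn':
            if y != 'm':         # Filter attached to the second for clause
                res2.append(x + y)

print(res1)                      # ['al', 'an', 'cl', 'cn']
print(res1 == res2)              # True
```

Reading the comprehension’s clauses from left to right gives the nesting order of the statements, from outermost to innermost.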

Beyond this complexity level, though, list comprehension expressions can often become too compact for their own good. In general, they are intended for simple types of iterations; for more involved work, a simpler for statement structure will probably be easier to understand and modify in the future. As usual in programming, if something is difficult for you to understand, it’s probably not a good idea.

We’ll revisit list comprehensions in Chapter 20, in the context of functional programming tools; as we’ll see, they turn out to be just as related to functions as they are to looping statements.

Other Iteration Contexts

Later in the book, we’ll see that user-defined classes can implement the iteration protocol too. Because of this, it’s sometimes important to know which built-in tools make use of it—any tool that employs the iteration protocol will automatically work on any built-in type or user-defined class that provides it.

So far, I’ve been demonstrating iterators in the context of the for loop statement, because this part of the book is focused on statements. Keep in mind, though, that every tool that scans from left to right across objects uses the iteration protocol. This includes the for loops we’ve seen:

>>> for line in open('script1.py'):         # Use file iterators
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

However, list comprehensions, the in membership test, the map built-in function, and other built-ins such as the sorted and zip calls also leverage the iteration protocol. When applied to a file, all of these use the file object’s iterator automatically to scan line by line:

>>> uppers = [line.upper() for line in open('script1.py')]
>>> uppers
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> map(str.upper, open('script1.py'))      # map is an iterable in 3.0
<map object at 0x02660710>

>>> list( map(str.upper, open('script1.py')) )
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> 'y = 2\n' in open('script1.py')
False
>>> 'x = 2\n' in open('script1.py')
True

We introduced the map call used here in the preceding chapter; it’s a built-in that applies a function call to each item in the passed-in iterable object. map is similar to a list comprehension but is more limited because it requires a function instead of an arbitrary expression. It also returns an iterable object itself in Python 3.0, so we must wrap it in a list call to force it to give us all its values at once; more on this change later in this chapter. Because map, like the list comprehension, is related to both for loops and functions, we’ll also explore both again in Chapters 19 and 20.
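The relationship between map and the list comprehension can be sketched side by side. This example assumes an in-memory io.StringIO object as a stand-in for the file, so that it runs on its own:

```python
import io

text = 'import sys\nx = 2\n'     # In-memory stand-in for a file (assumption)

M = map(str.upper, io.StringIO(text))
via_map = list(M)                # list forces map to produce all its results
via_comp = [line.upper() for line in io.StringIO(text)]
print(via_map == via_comp)       # True

# An arbitrary expression such as a slice needs no function in a comprehension
sliced = [line[:3] for line in io.StringIO(text)]
print(sliced)                    # ['imp', 'x =']
```

To get the slicing result from map, we would first have to wrap the expression in a function, which is the limitation described above.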

Python includes various additional built-ins that process iterables, too: sorted sorts items in an iterable, zip combines items from iterables, enumerate pairs items in an iterable with relative positions, filter selects items for which a function is true, and reduce runs pairs of items in an iterable through a function. All of these accept iterables, and zip, enumerate, and filter also return an iterable in Python 3.0, like map. Here they are in action running the file’s iterator automatically to scan line by line:

>>> sorted(open('script1.py'))
['import sys\n', 'print(2 ** 33)\n', 'print(sys.path)\n', 'x = 2\n']

>>> list(zip(open('script1.py'), open('script1.py')))
[('import sys\n', 'import sys\n'), ('print(sys.path)\n', 'print(sys.path)\n'),
('x = 2\n', 'x = 2\n'), ('print(2 ** 33)\n', 'print(2 ** 33)\n')]

>>> list(enumerate(open('script1.py')))
[(0, 'import sys\n'), (1, 'print(sys.path)\n'), (2, 'x = 2\n'),
(3, 'print(2 ** 33)\n')]

>>> list(filter(bool, open('script1.py')))
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

>>> import functools, operator
>>> functools.reduce(operator.add, open('script1.py'))
'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'

All of these are iteration tools, but they have unique roles. We met zip and enumerate in the prior chapter; filter and reduce are in Chapter 19’s functional programming domain, so we’ll defer details for now.

We first saw the sorted function used here at work in Chapter 4, and we used it for dictionaries in Chapter 8. sorted is a built-in that employs the iteration protocol—it’s like the original list sort method, but it returns the new sorted list as a result and runs on any iterable object. Notice that, unlike map and others, sorted returns an actual list in Python 3.0 instead of an iterable.

Other built-in functions support the iteration protocol as well (but frankly, are harder to cast in interesting examples related to files). For example, the sum call computes the sum of all the numbers in any iterable; the any and all built-ins return True if any or all items in an iterable are True, respectively; and max and min return the largest and smallest item in an iterable, respectively. Like reduce, all of the tools in the following examples accept any iterable as an argument and use the iteration protocol to scan it, but return a single result:

>>> sum([3, 2, 4, 1, 5, 0])                  # sum expects numbers only
15
>>> any(['spam', '', 'ni'])
True
>>> all(['spam', '', 'ni'])
False
>>> max([3, 2, 5, 1, 4])
5
>>> min([3, 2, 5, 1, 4])
1

Strictly speaking, the max and min functions can be applied to files as well—they automatically use the iteration protocol to scan the file and pick out the lines with the highest and lowest string values, respectively (though I’ll leave valid use cases to your imagination):

>>> max(open('script1.py'))                  # Line with max/min string value
'x = 2\n'
>>> min(open('script1.py'))
'import sys\n'

Interestingly, the iteration protocol is even more pervasive in Python today than the examples so far have demonstrated—everything in Python’s built-in toolset that scans an object from left to right is defined to use the iteration protocol on the subject object. This even includes more esoteric tools such as the list and tuple built-in functions (which build new objects from iterables), the string join method (which puts a substring between strings contained in an iterable), and even sequence assignments. Consequently, all of these will also work on an open file and automatically read one line at a time:

>>> list(open('script1.py'))
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

>>> tuple(open('script1.py'))
('import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n')

>>> '&&'.join(open('script1.py'))
'import sys\n&&print(sys.path)\n&&x = 2\n&&print(2 ** 33)\n'

>>> a, b, c, d = open('script1.py')
>>> a, d
('import sys\n', 'print(2 ** 33)\n')

>>> a, *b = open('script1.py')                # 3.0 extended form
>>> a, b
('import sys\n', ['print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n'])

Earlier, we saw that the built-in dict call accepts an iterable zip result, too. For that matter, so does the set call, as well as the new set and dictionary comprehension expressions in Python 3.0, which we met in Chapters 4, 5, and 8:

>>> set(open('script1.py'))
{'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n', 'import sys\n'}

>>> {line for line in open('script1.py')}
{'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n', 'import sys\n'}

>>> {ix: line for ix, line in enumerate(open('script1.py'))}
{0: 'import sys\n', 1: 'print(sys.path)\n', 2: 'x = 2\n', 3: 'print(2 ** 33)\n'}

In fact, both set and dictionary comprehensions support the extended syntax of list comprehensions we met earlier in this chapter, including if tests:

>>> {line for line in open('script1.py') if line[0] == 'p'}
{'print(sys.path)\n', 'print(2 ** 33)\n'}

>>> {ix: line for (ix, line) in enumerate(open('script1.py')) if line[0] == 'p'}
{1: 'print(sys.path)\n', 3: 'print(2 ** 33)\n'}

Like the list comprehension, both of these scan the file line by line and pick out lines that begin with the letter “p.” They also happen to build sets and dictionaries in the end, but we get a lot of work “for free” by combining file iteration and comprehension syntax.

There’s one last iteration context that’s worth mentioning, although it’s a bit of a preview: in Chapter 18, we’ll learn that a special *arg form can be used in function calls to unpack a collection of values into individual arguments. As you can probably predict by now, this accepts any iterable, too, including files (see Chapter 18 for more details on the call syntax):

>>> def f(a, b, c, d): print(a, b, c, d, sep='&')
...
>>> f(1, 2, 3, 4)
1&2&3&4
>>> f(*[1, 2, 3, 4])                   # Unpacks into arguments
1&2&3&4

>>> f(*open('script1.py'))             # Iterates by lines too!
import sys
&print(sys.path)
&x = 2
&print(2 ** 33)

In fact, because this argument-unpacking syntax in calls accepts iterables, it’s also possible to use the zip built-in to unzip zipped tuples, by making prior or nested zip results arguments for another zip call (warning: you probably shouldn’t read the following example if you plan to operate heavy machinery anytime soon!):

>>> X = (1, 2)
>>> Y = (3, 4)
>>>
>>> list(zip(X, Y))                    # Zip tuples: returns an iterable
[(1, 3), (2, 4)]
>>>
>>> A, B = zip(*zip(X, Y))             # Unzip a zip!
>>> A
(1, 2)
>>> B
(3, 4)
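
If the unzip step seems mysterious, the following sketch may help. It uses a longer list of pairs, and the names pairs, letters, and numbers are hypothetical, chosen only for illustration. The star unpacks each pair into a separate argument, so zip regroups the first items together and the second items together.

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]       # Hypothetical zipped data

letters, numbers = zip(*pairs)               # Same as zip(('a', 1), ('b', 2), ('c', 3))
print(letters)                               # ('a', 'b', 'c')
print(numbers)                               # (1, 2, 3)

# Zipping the pieces back together restores the original pairs
print(list(zip(letters, numbers)) == pairs)  # True
```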

Still other tools in Python, such as the range built-in and dictionary view objects, return iterables instead of processing them. To see how these have been absorbed into the iteration protocol in Python 3.0 as well, we need to move on to the next section.

New Iterables in Python 3.0

One of the fundamental changes in Python 3.0 is that it has a stronger emphasis on iterators than 2.X. In addition to the iterators associated with built-in types such as files and dictionaries, the dictionary methods keys, values, and items return iterable objects in Python 3.0, as do the built-in functions range, map, zip, and filter. As shown in the prior section, the last three of these functions both return iterators and process them. All of these tools produce results on demand in Python 3.0, instead of constructing result lists as they do in 2.6.

Although this saves memory space, it can impact your coding styles in some contexts. At several points in this book so far, for example, we’ve had to wrap function and method call results in a list(...) call in order to force them to produce all their results at once:

>>> zip('abc', 'xyz')                  # An iterable in Python 3.0 (a list in 2.6)
<zip object at 0x02E66710>

>>> list(zip('abc', 'xyz'))            # Force list of results in 3.0 to display
[('a', 'x'), ('b', 'y'), ('c', 'z')]

This isn’t required in 2.6, because functions like zip return lists of results. In 3.0, though, they return iterable objects, producing results on demand. This means extra typing is required to display the results at the interactive prompt (and possibly in some other contexts), but it’s an asset in larger programs—delayed evaluation like this conserves memory and avoids pauses while large result lists are computed. Let’s take a quick look at some of the new 3.0 iterables in action.
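
One way to watch delayed evaluation happen is sketched below. It uses map with a hypothetical helper, noisy_square, which records each item it is asked to process; neither the helper nor the calls list comes from the original examples.

```python
calls = []                                   # Records which items have been processed

def noisy_square(x):                         # Hypothetical helper, for illustration only
    calls.append(x)
    return x ** 2

M = map(noisy_square, [1, 2, 3])             # Nothing is computed yet in 3.0
print(calls)                                 # []

first = next(M)                              # One result is computed on request
print(first, calls)                          # 1 [1]

rest = list(M)                               # list() forces the remaining results
print(rest, calls)                           # [4, 9] [1, 2, 3]
```

The function runs only as results are requested, which is why a list(...) call is needed to see all the results at once.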

The range Iterator

We studied the range built-in’s basic behavior in the prior chapter. In 3.0, it returns an iterator that generates numbers in the range on demand, instead of building the result list in memory. This subsumes the older 2.X xrange (see the upcoming version skew note), and you must use list(range(...)) to force an actual range list if one is needed (e.g., to display results):

C:\misc> c:\python30\python
>>> R = range(10)                # range returns an iterator, not a list
>>> R
range(0, 10)

>>> I = iter(R)                  # Make an iterator from the range
>>> next(I)                      # Advance to next result
0                                # What happens in for loops, comprehensions, etc.
>>> next(I)
1
>>> next(I)
2

>>> list(range(10))              # To force a list if required
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Unlike the list returned by this call in 2.X, range objects in 3.0 support only iteration, indexing, and the len function. They do not support any other sequence operations (use list(...) if you require more list tools):

>>> len(R)                       # range also does len and indexing, but no others
10
>>> R[0]
0
>>> R[-1]
9

>>> next(I)                      # Continue taking from iterator, where left off
3
>>> I.__next__()                 # .next() becomes .__next__(), but use new next()
4
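
As a rough illustration of why this matters, the sketch below makes a range over a billion numbers (an arbitrary size chosen for the example) and stops after the first few. Because items are generated on demand, no billion-item list is ever built.

```python
R = range(1000000000)          # A billion numbers, but no list of them in memory

print(len(R))                  # 1000000000
print(R[0], R[-1])             # 0 999999999

total = 0
for i in R:                    # Items are produced one at a time as the loop runs
    if i == 3:
        break                  # Stop early: later items are never generated
    total += i
print(total)                   # 3
```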

Note

Version skew note: Python 2.X also has a built-in called xrange, which is like range but produces items on demand instead of building a list of results in memory all at once. Since this is exactly what the new iterator-based range does in Python 3.0, xrange is no longer available in 3.0—it has been subsumed. You may still see it in 2.X code, though, especially since range builds result lists there and so is not as efficient in its memory usage. As noted in a sidebar in the prior chapter, the file.xreadlines() method used to minimize memory use in 2.X has been dropped in Python 3.0 for similar reasons, in favor of file iterators.

The map, zip, and filter Iterators

Like range, the map, zip, and filter built-ins also become iterators in 3.0 to conserve space, rather than producing a result list all at once in memory. All three not only process iterables, as in 2.X, but also return iterable results in 3.0. Unlike range, though, they are their own iterators—after you step through their results once, they are exhausted. In other words, you can’t have multiple iterators on their results that maintain different positions in those results.

Here is the case for the map built-in we met in the prior chapter. As with other iterators, you can force a list with list(...) if you really need one, but the default behavior can save substantial space in memory for large result sets:

>>> M = map(abs, (-1, 0, 1))            # map returns an iterator, not a list
>>> M
<map object at 0x0276B890>
>>> next(M)                             # Use iterator manually: exhausts results
1                                       # These do not support len() or indexing
>>> next(M)
0
>>> next(M)
1
>>> next(M)
StopIteration

>>> for x in M: print(x)                # map iterator is now empty: one pass only
...

>>> M = map(abs, (-1, 0, 1))            # Make a new iterator to scan again
>>> for x in M: print(x)                # Iteration contexts auto call next()
...
1
0
1
>>> list(map(abs, (-1, 0, 1)))          # Can force a real list if needed
[1, 0, 1]

The zip built-in, introduced in the prior chapter, returns iterators that work the same way:

>>> Z = zip((1, 2, 3), (10, 20, 30))    # zip is the same: a one-pass iterator
>>> Z
<zip object at 0x02770EE0>

>>> list(Z)
[(1, 10), (2, 20), (3, 30)]

>>> for pair in Z: print(pair)          # Exhausted after one pass
...

>>> Z = zip((1, 2, 3), (10, 20, 30))
>>> for pair in Z: print(pair)          # Iterator used automatically or manually
...
(1, 10)
(2, 20)
(3, 30)

>>> Z = zip((1, 2, 3), (10, 20, 30))
>>> next(Z)
(1, 10)
>>> next(Z)
(2, 20)

The filter built-in, which we’ll study in the next part of this book, is also analogous. It returns items in an iterable for which a passed-in function returns True (as we’ve learned, in Python True includes nonempty objects):

>>> filter(bool, ['spam', '', 'ni'])
<filter object at 0x0269C6D0>
>>> list(filter(bool, ['spam', '', 'ni']))
['spam', 'ni']

Like most of the tools discussed in this section, filter both accepts an iterable to process and returns an iterable to generate results in 3.0.
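
The following sketch shows that filter results are one-pass iterables, just like map and zip results. It also compares filter with a list comprehension that selects the same items; the comprehension is offered here only as a point of comparison, and the name words is illustrative.

```python
words = ['spam', '', 'ni']

F = filter(bool, words)                      # One-pass iterable of true items
print(list(F))                               # ['spam', 'ni']
print(list(F))                               # []: already exhausted

# A list comprehension selects the same items, but builds the list all at once
print([w for w in words if bool(w)])         # ['spam', 'ni']

# filter(None, ...) also keeps just the items that are true themselves
print(list(filter(None, words)))             # ['spam', 'ni']
```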

Multiple Versus Single Iterators

It’s interesting to see how the range object differs from the built-ins described in this section—it supports len and indexing, it is not its own iterator (you make one with iter when iterating manually), and it supports multiple iterators over its result that remember their positions independently:

>>> R = range(3)                           # range allows multiple iterators
>>> next(R)
TypeError: range object is not an iterator

>>> I1 = iter(R)
>>> next(I1)
0
>>> next(I1)
1
>>> I2 = iter(R)                           # Two iterators on one range
>>> next(I2)
0
>>> next(I1)                               # I1 is at a different spot than I2
2

By contrast, zip, map, and filter do not support multiple active iterators on the same result:

>>> Z = zip((1, 2, 3), (10, 11, 12))
>>> I1 = iter(Z)
>>> I2 = iter(Z)                           # Two iterators on one zip
>>> next(I1)
(1, 10)
>>> next(I1)
(2, 11)
>>> next(I2)                               # I2 is at same spot as I1!
(3, 12)

>>> M = map(abs, (-1, 0, 1))                # Ditto for map (and filter)
>>> I1 = iter(M); I2 = iter(M)
>>> print(next(I1), next(I1), next(I1))
1 0 1
>>> next(I2)
StopIteration

>>> R = range(3)                            # But range allows many iterators
>>> I1, I2 = iter(R), iter(R)
>>> [next(I1), next(I1), next(I1)]
[0, 1, 2]
>>> next(I2)
0

When we code our own iterable objects with classes later in the book (Chapter 29), we’ll see that multiple iterators are usually supported by returning new objects for the iter call; a single iterator generally means an object returns itself. In Chapter 20, we’ll also find that generator functions and expressions behave like map and zip instead of range in this regard, supporting a single active iteration. In that chapter, we’ll see some subtle implications of one-shot iterators in loops that attempt to scan multiple times.
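
A quick way to tell the two kinds of objects apart is to check whether iter returns the object itself. The following is a small sketch of that test, not a formal definition:

```python
R = range(3)
Z = zip((1, 2, 3), (10, 11, 12))
M = map(abs, (-1, 0, 1))

print(iter(Z) is Z, iter(M) is M)    # True True: these are their own iterators
print(iter(R) is R)                  # False: iter makes a separate iterator object
print(iter(R) is iter(R))            # False: and a new one on each call

print(list(Z), list(Z))              # Second scan of the zip comes up empty
print(list(R), list(R))              # A range can be scanned any number of times
```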

Dictionary View Iterators

As we saw briefly in Chapter 8, in Python 3.0 the dictionary keys, values, and items methods return iterable view objects that generate result items one at a time, instead of producing result lists all at once in memory. View items maintain the same physical ordering as that of the dictionary and reflect changes made to the underlying dictionary. Now that we know more about iterators, here’s the rest of the story:

>>> D = dict(a=1, b=2, c=3)
>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> K = D.keys()                              # A view object in 3.0, not a list
>>> K
<dict_keys object at 0x026D83C0>

>>> next(K)                                   # Views are not iterators themselves
TypeError: dict_keys object is not an iterator

>>> I = iter(K)                               # Views have an iterator,
>>> next(I)                                   # which can be used manually
'a'                                           # but does not support len(), index
>>> next(I)
'c'

>>> for k in D.keys(): print(k, end=' ')      # All iteration contexts use auto
...
a c b
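
The claim that views reflect later changes to the dictionary can be checked with a short sketch like this one. The results are sorted here only to make the output independent of key ordering:

```python
D = dict(a=1, b=2, c=3)
K = D.keys()                         # Views are made once...
V = D.values()

D['d'] = 4                           # ...but track later changes to the dictionary
del D['b']

print(sorted(K))                     # ['a', 'c', 'd']
print(sorted(V))                     # [1, 3, 4]
print(len(K), 'd' in K)              # 3 True
```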

As with all iterators, you can always force a 3.0 dictionary view to build a real list by passing it to the list built-in. However, this usually isn’t required except to display results interactively or to apply list operations like indexing:

>>> K = D.keys()
>>> list(K)                              # Can still force a real list if needed
['a', 'c', 'b']

>>> V = D.values()                       # Ditto for values() and items() views
>>> V
<dict_values object at 0x026D8260>
>>> list(V)
[1, 3, 2]

>>> list(D.items())
[('a', 1), ('c', 3), ('b', 2)]

>>> for (k, v) in D.items(): print(k, v, end=' ')
...
a 1 c 3 b 2

In addition, 3.0 dictionaries still have iterators themselves, which return successive keys. Thus, it’s not often necessary to call keys directly in this context:

>>> D                                    # Dictionaries still have own iterator
{'a': 1, 'c': 3, 'b': 2}                 # Returns next key on each iteration
>>> I = iter(D)
>>> next(I)
'a'
>>> next(I)
'c'

>>> for key in D: print(key, end=' ')    # Still no need to call keys() to iterate
...                                      # But keys is an iterator in 3.0 too!
a c b

Finally, remember again that because keys no longer returns a list, the traditional coding pattern for scanning a dictionary by sorted keys won’t work in 3.0. Instead, convert keys views first with a list call, or use the sorted call on either a keys view or the dictionary itself, as follows:

>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for k in sorted(D.keys()): print(k, D[k], end=' ')
...
a 1 b 2 c 3

>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for k in sorted(D): print(k, D[k], end=' ')    # Best practice key sorting
...
a 1 b 2 c 3
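
For comparison, here is a sketch of the other option mentioned above, converting the keys view with a list call first, along with the traditional pattern that no longer works. The name Ks is used only for illustration.

```python
D = dict(a=1, c=3, b=2)

Ks = list(D.keys())                  # Convert the view to a list first...
Ks.sort()                            # ...so that the list sort method is available
for k in Ks: print(k, D[k], end=' ')
print()

# The older pattern fails in 3.0: view objects have no sort method
try:
    D.keys().sort()
except AttributeError as exc:
    print('AttributeError:', exc)
```

The sorted call shown earlier is usually preferable, because it accepts any iterable and needs no separate conversion step.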

Other Iterator Topics

We’ll learn more about both list comprehensions and iterators in Chapter 20, in conjunction with functions, and again in Chapter 29 when we study classes. As you’ll see later:

  • User-defined functions can be turned into iterable generator functions, with yield statements.

  • List comprehensions morph into iterable generator expressions when coded in parentheses.

  • User-defined classes are made iterable with __iter__ or __getitem__ operator overloading.

In particular, user-defined iterators defined with classes allow arbitrary objects and operations to be used in any of the iteration contexts we’ve met here.
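
As a brief preview, the three techniques just listed might look something like the following. This is only a sketch: the names squares and Countdown are hypothetical, and the later chapters cover the details.

```python
def squares(n):                      # Generator function: yield makes results iterable
    for i in range(n):
        yield i ** 2

G = (x ** 2 for x in range(4))       # Generator expression: comprehension in parentheses

class Countdown:                     # Hypothetical class, made iterable with __iter__
    def __init__(self, start):
        self.start = start
    def __iter__(self):              # Returns a new iterator on each iter() call
        return iter(range(self.start, 0, -1))

print(list(squares(4)))              # [0, 1, 4, 9]
print(list(G))                       # [0, 1, 4, 9]
print([x for x in Countdown(3)])     # [3, 2, 1]
print(2 in Countdown(3))             # True: membership tests use iteration too
```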

Chapter Summary

In this chapter, we explored concepts related to looping in Python. We took our first substantial look at the iteration protocol in Python—a way for nonsequence objects to take part in iteration loops—and at list comprehensions. As we saw, a list comprehension is an expression similar to a for loop that applies another expression to all the items in any iterable object. Along the way, we also saw other built-in iteration tools at work and studied recent iteration additions in Python 3.0.

This wraps up our tour of specific procedural statements and related tools. The next chapter closes out this part of the book by discussing documentation options for Python code; documentation is also part of the general syntax model, and it’s an important component of well-written programs. In the next chapter, we’ll also dig into a set of exercises for this part of the book before we turn our attention to larger structures such as functions. As usual, though, let’s first exercise what we’ve learned here with a quiz.

Test Your Knowledge: Quiz

  1. How are for loops and iterators related?

  2. How are for loops and list comprehensions related?

  3. Name four iteration contexts in the Python language.

  4. What is the best way to read line by line from a text file today?

  5. What sort of weapons would you expect to see employed by the Spanish Inquisition?

Test Your Knowledge: Answers

  1. The for loop uses the iteration protocol to step through items in the object across which it is iterating. It calls the object’s __next__ method (run by the next built-in) on each iteration and catches the StopIteration exception to determine when to stop looping. Any object that supports this model works in a for loop and in other iteration contexts.

  2. Both are iteration tools. List comprehensions are a concise and efficient way to perform a common for loop task: collecting the results of applying an expression to all items in an iterable object. It’s always possible to translate a list comprehension to a for loop, and part of the list comprehension expression looks like the header of a for loop syntactically.

  3. Iteration contexts in Python include the for loop; list comprehensions; the map built-in function; the in membership test expression; and the built-in functions sorted, sum, any, and all. This category also includes the list and tuple built-ins, string join methods, and sequence assignments, all of which use the iteration protocol (the __next__ method) to step across iterable objects one item at a time.

  4. The best way to read lines from a text file today is to not read it explicitly at all: instead, open the file within an iteration context such as a for loop or list comprehension, and let the iteration tool automatically scan one line at a time by running the file’s __next__ method on each iteration. This approach is generally best in terms of coding simplicity, execution speed, and memory space requirements.

  5. I’ll accept any of the following as correct answers: fear, intimidation, nice red uniforms, a comfy chair, and soft pillows.



[33] Terminology in this topic tends to be a bit loose. This text uses the terms “iterable” and “iterator” interchangeably to refer to an object that supports iteration in general. Sometimes the term “iterable” refers to an object that supports iter and “iterator” refers to an object returned by iter that supports next(I), but that convention is not universal in either the Python world or this book.

[34] Technically speaking, the for loop calls the internal equivalent of I.__next__, instead of the next(I) used here. There is rarely any difference between the two, but as we’ll see in the next section, there are some built-in objects in 3.0 (such as os.popen results) that support the former and not the latter, but may still be iterated across in for loops. Your manual iterations can generally use either call scheme. If you care for the full story, in 3.0 os.popen results have been reimplemented with the subprocess module and a wrapper class, whose __getattr__ method is no longer called in 3.0 for implicit __next__ fetches made by the next built-in, but is called for explicit fetches by name. This is a 3.0 change issue we’ll confront in Chapters 37 and 38, which apparently burns some standard library code too! Also in 3.0, the related 2.6 calls os.popen2/3/4 are no longer available; use subprocess.Popen with appropriate arguments instead (see the Python 3.0 library manual for the new required code).
