Chapter 8. Conditionals and Loops

The primary focus of this chapter is Python’s conditional and looping statements, and all their related components. We will take a close look at if, while, for, and their friends else, elif, break, continue, and pass.

8.1 if Statement

The if statement for Python will seem amazingly familiar. It is made up of three main components: the keyword itself, an expression that is tested for its truth value, and a code suite to execute if the expression evaluates to nonzero or true. The syntax for an if statement is:

if expression:
    expr_true_suite

The suite of the if clause, expr_true_suite, will be executed only if the above conditional expression results in a Boolean true value. Otherwise, execution resumes at the next statement following the suite.

8.1.1 Multiple Conditional Expressions

The Boolean operators and, or, and not can be used to provide multiple conditional expressions or perform negation of expressions in the same if statement.
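As a small sketch of how these operators combine several tests in one if statement, consider the following. The variable names and values are hypothetical, chosen only for illustration, and print is written with parentheses so the snippet also runs under Python 3.

```python
# Hypothetical inputs, for illustration only.
age = 30
has_ticket = True
is_banned = False

# 'and' needs every operand to be true; 'not' negates the operand after it.
may_enter = age >= 18 and has_ticket and not is_banned

# 'or' needs only one true operand.
gets_discount = age < 12 or age >= 65

if may_enter and not gets_discount:
    message = 'full-price admission'
else:
    message = 'see the ticket desk'

print(message)
```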

image

8.1.2 Single Statement Suites

If the suite of a compound statement, i.e., if clause, while or for loop, consists only of a single line, it may go on the same line as the header statement:

if make_hard_copy: send_data_to_printer()

Single-line statements such as the one above are syntactically valid and may be convenient, but they can make your code more difficult to read, so we recommend that you indent the suite on the next line. Another good reason is that if you ever need to add another line to the suite, you will have to move the first one down anyway.

8.2 else Statement

Like other languages, Python features an else statement that can be paired with an if statement. The else statement identifies a block of code to be executed if the conditional expression of the if statement resolves to a false Boolean value. The syntax is what you expect:

image

Now we have the obligatory usage example:

image

8.2.1 “Dangling else” Avoidance

Python’s design of using indentation rather than braces for code block delimitation not only helps to enforce code correctness, but also implicitly helps avoid potential problems in code that is syntactically correct. One such problem is the (in)famous “dangling else” problem, a semantic optical illusion.

We present some C code here to illustrate our example (which is also illuminated by K&R and other programming texts):

image

The question is, which if does the else belong to? In the C language, the rule is that the else stays with the closest if. In our example above, although indented for the outer if statement, the else statement really belongs to the inner if statement because the C compiler ignores superfluous whitespace. As a result, if you have a positive balance but it is below the minimum, you will get the horrid (and erroneous) message that your balance is either zero or negative.

Although solving this problem may be easy due to the simplistic nature of the example, any larger sections of code embedded within this framework may be a hair-pulling experience to root out. Python puts up guardrails not necessarily to prevent you from driving off the cliff, but to steer you away from danger. The same example in Python will result in one of the following choices (one of which is correct):

image

Python’s use of indentation forces the proper alignment of code, giving the programmer the ability to make a conscious decision as to which if an else statement belongs to. By limiting your choices and thus reducing ambiguities, Python encourages you to develop correct code the first time. It is impossible to create a dangling else problem in Python. Also, since parentheses are not required, Python code is easier to read.
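To make the two choices concrete, here is a minimal sketch in which the only difference between the functions is the indentation of the else. The function names, the min_bal default, and the messages are hypothetical and are not taken from the C example.

```python
def report_inner(balance, min_bal=50.0):
    # else aligned with the INNER if: it pairs with the minimum-balance test.
    if balance > 0.00:
        if balance < min_bal:
            return 'below minimum'
        else:
            return 'balance ok'
    return None


def report_outer(balance, min_bal=50.0):
    # else aligned with the OUTER if: it pairs with the positive-balance test.
    if balance > 0.00:
        if balance < min_bal:
            return 'below minimum'
    else:
        return 'zero or negative'
    return None


print(report_inner(100.0))
print(report_outer(-5.0))
```

Because the indentation is the structure, what the code looks like and what it does cannot drift apart.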

8.3 elif (aka else-if) Statement

elif is the Python else-if statement. It allows one to check multiple expressions for truth value and execute a block of code as soon as one of the conditions evaluates to true. Like the else, the elif statement is optional. However, unlike else, for which there can be at most one statement, there can be an arbitrary number of elif statements following an if.

image

Proxy for switch/case Statement?

At some time in the future, Python may support the switch or case statement, but you can simulate it with various Python constructs. But even a good number of if-elif statements are not that difficult to read in Python:

image

Although the above statements do work, you can simplify them with a sequence and the membership operator:

image

We can create an even more elegant solution using Python dictionaries, which we learned about in Chapter 7, “Mapping and Set Types.”

image

One well-known benefit of using mapping types such as dictionaries is that the searching is very fast compared to a sequential lookup as in the above if-elif-else statements or using a for loop, both of which have to scan the elements one at a time.
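The three approaches can be compared side by side in a short sketch. The command names, result strings, and function names below are hypothetical stand-ins, not the original listings.

```python
def describe_chain(user_cmd):
    # if-elif-else: conditions are tested one at a time, top to bottom.
    if user_cmd == 'create':
        return 'create item'
    elif user_cmd == 'delete':
        return 'delete item'
    elif user_cmd == 'update':
        return 'update item'
    else:
        return 'invalid choice'


def describe_membership(user_cmd):
    # A sequence plus the membership operator collapses the comparisons.
    if user_cmd in ('create', 'delete', 'update'):
        return '%s item' % user_cmd
    return 'invalid choice'


# A dictionary maps each command directly to its result.
ACTIONS = {
    'create': 'create item',
    'delete': 'delete item',
    'update': 'update item',
}


def describe_dict(user_cmd):
    # dict.get() supplies the default that the else clause provided above.
    return ACTIONS.get(user_cmd, 'invalid choice')


print(describe_dict('delete'))
```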

8.4 Conditional Expressions (aka “the Ternary Operator”)

If you are coming from the C/C++ or Java world, it is difficult to ignore or get over the fact that Python has not had a conditional or ternary operator (C ? X : Y) for the longest time. (C is the conditional expression; X represents the resultant expression if C is True and Y if C is False.) Guido has resisted adding such a feature to Python because of his belief in keeping code simple and not giving programmers easy ways to obfuscate their code.

However, after more than a decade, he has given in, mostly because of the error-prone ways in which people have tried to simulate it using and and or, many times incorrectly. According to the FAQ, the one way of getting it right is (C and [X] or [Y])[0]. The only problem was that the community could not agree on the syntax. (You really have to take a look at PEP 308 to see all the different proposals.) This is one of the areas of Python in which people have expressed strong feelings.

The final decision came down to Guido choosing the most favored (and his most favorite) of all the choices, then applying it to various modules in the standard library. According to the PEP, “this review approximates a sampling of real-world use cases, across a variety of applications, written by a number of programmers with diverse backgrounds.” And this is the syntax that was finally chosen for integration into Python 2.5: X if C else Y.

The main motivation for even having a ternary operator is to allow the setting of a value based on a conditional all on a single line, as opposed to the standard way of using an if-else statement, as in this min() example using numbers x and y:

image

In versions prior to 2.5, Python programmers at best could do this:

image

In versions 2.5 and newer, this can be further simplified to:

image
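The following sketch puts the statement form, the expression form, and the older and/or idiom next to each other. The values are hypothetical; the last part shows why the plain C and X or Y simulation goes wrong when X is itself a false value.

```python
x, y = 4, 3

# Statement form: four lines to pick the smaller value.
if x < y:
    smaller = x
else:
    smaller = y

# Expression form (Python 2.5 and newer): X if C else Y.
smaller_expr = x if x < y else y

# The pitfall of 'C and X or Y': it yields Y whenever X is false.
c, falsy_x, fallback = True, 0, 99
broken = c and falsy_x or fallback              # picks fallback although c is true
safe_old = (c and [falsy_x] or [fallback])[0]   # the list-wrapping workaround
modern = falsy_x if c else fallback             # the conditional expression

print(smaller_expr)
```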

8.5 while Statement

Python’s while is the first looping statement we will look at in this chapter. In fact, it is a conditional looping statement. In comparison with an if statement where a true expression will result in a single execution of the if clause suite, the suite in a while clause will be executed continuously in a loop until that condition is no longer satisfied.

8.5.1 General Syntax

Here is the syntax for a while loop:

while expression:
    suite_to_repeat

The suite_to_repeat clause of the while loop will be executed continuously in a loop until expression evaluates to Boolean False. This type of looping mechanism is often used in a counting situation, such as the example in the next subsection.

8.5.2 Counting Loops

image

The suite here, consisting of the print and increment statements, is executed repeatedly until count is no longer less than 9. With each iteration, the current value of the index count is displayed and then bumped up by 1. If we take this snippet of code to the Python interpreter, entering the source and seeing the resulting execution would look something like:

image
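A counting loop of this kind can be sketched as follows. The message text and the seen list are our own additions for illustration, and print is called with parentheses so the snippet also runs under Python 3.

```python
count = 0
seen = []                # kept only so the result can be inspected afterward
while count < 9:
    print('the index is: %d' % count)
    seen.append(count)
    count += 1           # without this increment the loop would never end
```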

8.5.3 Infinite Loops

One must use caution when using while loops because of the possibility that the condition never resolves to a false value. In such cases, we would have a loop that never ends on our hands. These “infinite” loops are not necessarily bad things—many communications “servers” that are part of client/server systems work exactly in that fashion. It all depends on whether or not the loop was meant to run forever, and if not, whether the loop has the possibility of terminating; in other words, will the expression ever be able to evaluate to false?

image

For example, the piece of code above was set deliberately to never end because True is not going to somehow change to False. The main point of this server code is to sit and wait for clients to connect, presumably over a network link. These clients send requests which the server understands and processes.

After the request has been serviced, a return value or data is returned to the client who may either drop the connection altogether or send another request. As far as the server is concerned, it has performed its duty to this one client and returns to the top of the loop to wait for the next client to come along. You will find out more about client/server computing in Chapter 16, “Network Programming” and Chapter 17, “Internet Client Programming.”

8.6 for Statement

The other looping mechanism in Python comes to us in the form of the for statement. It represents the single most powerful looping construct in Python. It can loop over sequence members, it is used in list comprehensions and generator expressions, and it knows how to call an iterator’s next() method and gracefully ends by catching StopIteration exceptions (all under the covers). If you are new to Python, we will tell you now that you will be using for statements a lot.

Unlike the traditional conditional looping for statement found in mainstream languages like C/C++, Fortran, or Java, Python’s for is more akin to a shell or scripting language’s iterative foreach loop.

8.6.1 General Syntax

The for loop traverses through individual elements of an iterable (like a sequence or iterator) and terminates when all the items are exhausted. Here is its syntax:

for iter_var in iterable:
    suite_to_repeat

With each loop, the iter_var iteration variable is set to the current element of the iterable (sequence, iterator, or object that supports iteration), presumably for use in suite_to_repeat.

8.6.2 Used with Sequence Types

In this section, we will see how the for loop works with the different sequence types. The examples will include string, list, and tuple types.

image

When iterating over a string, the iteration variable will always consist of only single characters (strings of length 1). Such constructs may not necessarily be useful. When seeking characters in a string, more often than not, the programmer will either use in to test for membership, or one of the string module functions or string methods to check for substrings.

One place where seeing individual characters does come in handy is during the debugging of sequences in a for loop in an application where you are expecting strings or entire objects to show up in your print statements. If you see individual characters, this is usually a sign that you received a single string rather than a sequence of objects.
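Here is a minimal sketch of for over a string, a list, and a tuple, followed by the debugging symptom just described. All names and values are hypothetical.

```python
# Iterating over a string yields one-character strings.
letters = []
for each_letter in 'Names':
    letters.append(each_letter)

# Lists and tuples yield their elements in order.
name_list = ['Walter', 'Nicole', 'Steven']
name_tuple = ('Henry', 'Claire')
greetings = []
for each_name in name_list:
    greetings.append('Hello, ' + each_name)
for each_name in name_tuple:
    greetings.append('Hello, ' + each_name)


def collect(names):
    # Gathers whatever the loop hands us, one item per iteration.
    result = []
    for item in names:
        result.append(item)
    return result


print(collect(['Bob']))   # a sequence of names: whole strings come back
print(collect('Bob'))     # a lone string by mistake: single characters come back
```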

There are three basic ways of iterating over a sequence:

Iterating by Sequence Item

image

In the above example, a list is iterated over, and for each iteration, the eachName variable contains the list element that we are on for that particular iteration of the loop.

Iterating by Sequence Index

An alternative way of iterating through each item is by index offset into the sequence itself:

image

Rather than iterating through the elements themselves, we are iterating through the indices of the list.

We employ the assistance of the len() built-in function, which provides the total number of elements in the sequence, as well as the range() built-in function (which we will discuss in more detail below), to give us the actual sequence of indices to iterate over.

image

Using range(), we obtain a list of the indexes that nameIndex iterates over; and using the slice/subscript operator ( [ ] ), we can obtain the corresponding sequence element.

Those of you who are performance pundits will no doubt recognize that iteration by sequence item wins over iterating via index. If not, this is something to think about. (See Exercise 8-13.)

Iterating with Item and Index

The best of both worlds comes from using the enumerate() built-in function, which was added to Python in version 2.3. Enough said ... here is some code:

image
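The three ways of iterating can be sketched together as follows. The list contents and variable names are hypothetical.

```python
name_list = ['Cathy', 'Terry', 'Joe', 'Heather', 'Lucy']

# 1. By item: the loop variable is the element itself.
by_item = []
for each_name in name_list:
    by_item.append(each_name)

# 2. By index: range(len(...)) supplies the offsets, [] fetches the element.
by_index = []
for name_index in range(len(name_list)):
    by_index.append(name_list[name_index])

# 3. By both: enumerate() hands back (index, item) pairs.
numbered = []
for i, each_name in enumerate(name_list):
    numbered.append('%d %s' % (i + 1, each_name))

print(numbered)
```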

8.6.3 Used with Iterator Types

Using for loops with iterators is identical to using them with sequences. The only difference is that the for statement must do a little bit of extra work on your behalf. An iterator does not represent a set of items to loop over.

Instead, iterator objects have a next() method, which is called to get subsequent items. When the set of items has been exhausted, the iterator raises the StopIteration exception to signal that it has finished. Calling next() and catching StopIteration is built into the for statement.

When you are using a for loop with an iterator, the code is nearly identical to that of looping over sequence items. In fact, in most cases, you cannot tell whether you are iterating over a sequence or an iterator, which is why you will see us refer to iterating over an iterable, which could mean a sequence, an iterator, or any object that supports iteration, e.g., has a next() method.

8.6.4 range() Built-in Function

We mentioned above during our introduction to Python’s for loop that it is an iterative looping mechanism. Python also provides a tool that will let us use the for statement in a traditional pseudo-conditional setting, i.e., when counting from one number to another and quitting once the final number has been reached or some condition is no longer satisfied.

The built-in function range() can turn your foreach-like for loop back into one that you are more familiar with, i.e., counting from 0 to 10, or counting from 10 to 100 in increments of 5.

range() Full Syntax

Python presents two different ways to use range(). The full syntax requires that two or all three integer arguments are present:

range(start, end, step=1)

range() will then return a list where for any k, start <= k < end and k iterates from start to end in increments of step. step cannot be 0, or an error condition will occur.

>>> range(2, 19, 3)
[2, 5, 8, 11, 14, 17]

If step is omitted and only two arguments given, step takes a default value of 1.

>>> range(3, 7)
[3, 4, 5, 6]

Let’s take a look at an example used in the interpreter environment:

image

Our for loop now “counts” from 2 to 19, incrementing by steps of 3. If you are familiar with C, then you will notice the direct correlation between the arguments of range() and those of the variables in the C for loop:

image

Although it seems like a conditional loop now (checking if eachVal < 19), reality tells us that range() takes our conditions and generates a list that meets our criteria, which in turn is used by the same Python for statement.

range() Abbreviated Syntax

range() also has two abbreviated syntax formats:

range(end)

range(start, end)

We saw the shortest syntax earlier in Chapter 2. Given only a single value, start defaults to 0, step defaults to 1, and range() returns a list of numbers from zero up to the argument end:

>>> range(5)
[0, 1, 2, 3, 4]

Given two values, this midsized version of range() is exactly the same as the long version of range() taking two parameters with step defaulting to 1. We will now take this to the Python interpreter and plug in for and print statements to arrive at:

image

Core Note: Why not just one syntax for range()?

Now that you know both syntaxes for range(), one nagging question you may have is, why not just combine the two into a single one that looks like this?

range(start=0, end, step=1) # invalid

This syntax will work for a single argument or all three, but not two. It is illegal because the presence of step requires start to be given. In other words, you cannot provide end and step in a two-argument version because they will be (mis)interpreted as start and end.
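The argument forms can be checked with a short sketch. Each call is wrapped in list() so that the snippet also runs under Python 3, where range() no longer returns a list; under Python 2 the wrapper is harmless.

```python
one_arg = list(range(5))             # start defaults to 0, step to 1
two_args = list(range(3, 7))         # step defaults to 1
three_args = list(range(2, 19, 3))   # all three given

# Two arguments are always read as (start, end), never as (end, step).
# Here start is already past end, so nothing is produced.
not_end_and_step = list(range(10, 2))

# A step of 0 is rejected.
zero_step_rejected = False
try:
    range(0, 10, 0)
except ValueError:
    zero_step_rejected = True

print(three_args)
```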

8.6.5 xrange() Built-in Function

xrange() is similar to range(), except that for a really large range xrange() may be handier because it does not have to build the complete list in memory. This built-in was made for exclusive use in for loops; it does not make sense outside a for loop. Also, as you can imagine, the performance will not be as good because the entire list is not in memory. In Python 3, xrange() replaces and becomes range(), returning a range object: an iterable that produces its values on demand and can be traversed more than once, rather than a list or a one-shot iterator.
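The following sketch assumes Python 3, where range() has taken over the lazy behavior of xrange(). Do not run it under Python 2, where range(10 ** 9) would try to build a billion-element list.

```python
# Python 3 only: no billion-element list is built here.
big = range(10 ** 9)
size = len(big)       # computed from start, end, and step
last = big[-1]        # indexing is computed on demand as well

# A range object can be traversed again after a full pass.
small = range(3)
first_pass = list(small)
second_pass = list(small)

print(size, last)
```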

image

8.6.6 Sequence-Related Built-in Functions

sorted(), reversed(), enumerate(), zip()

Below are some examples of using these loop-oriented sequence-related functions. The reason why they are “sequence-related” is that half of them (sorted() and zip()) return a real sequence (list), while the other two (reversed() and enumerate()) return iterators (sequence-like).

image
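A brief sketch of the four functions follows. The data is hypothetical, and the results of reversed(), enumerate(), and zip() are passed to list() so that the values can be displayed and so that the snippet behaves the same under Python 3, where zip() also returns an iterator.

```python
albums = ('Poe', 'Gaudi', 'Freud', 'Poe2')
years = (1976, 1987, 1990, 2003)

in_order = sorted(albums)             # a new sorted list; albums is untouched
backward = list(reversed(albums))     # reversed() yields items last to first
numbered = list(enumerate(albums))    # (index, item) pairs
paired = list(zip(albums, years))     # items matched up position by position

for i, album in enumerate(albums):
    print('%d %s' % (i, album))
```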

Now that we have covered all the loops Python has to offer, let us take a look at the peripheral commands that typically go together with loops. These include statements to abandon the loop (break) and to immediately begin the next iteration (continue).

8.7 break Statement

The break statement in Python terminates the current loop and resumes execution at the next statement, just like the traditional break found in C. The most common use for break is when some external condition is triggered (usually by testing with an if statement), requiring a hasty exit from a loop. The break statement can be used in both while and for loops.

image

The task of this piece of code is to find the largest divisor of a given number num. We iterate through all the numbers that could be factors of num, using the count variable and decrementing it for every value that does not divide num. The first number that evenly divides num is the largest factor, and once that number is found, we no longer need to continue, so we use break to terminate the loop.
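The search just described can be sketched as a small function. The function name and the choice to start at num // 2 are our assumptions; the shape of the loop follows the description above.

```python
def largest_divisor(num):
    # No proper divisor can exceed half of num, so start there.
    count = num // 2
    largest = None
    while count > 0:
        if num % count == 0:
            largest = count
            break            # the first hit is the largest; stop looking
        count -= 1
    return largest


print(largest_divisor(21))
```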

image

The break statement here is used to interrupt the iteration of the list. The goal is to find a target element in the list, and, if found, to remove it from the database and break out of the loop.

8.8 continue Statement

Core Note: continue statements

Whether in Python, C, Java, or any other structured language that features the continue statement, there is a misconception among some beginning programmers that the traditional continue statement “immediately starts the next iteration of a loop.” While this may seem to be the apparent action, we would like to clarify this somewhat invalid supposition. Rather than beginning the next iteration of the loop when a continue statement is encountered, a continue statement terminates or discards the remaining statements in the current loop iteration and goes back to the top. If we are in a conditional loop, the conditional expression is checked for validity before beginning the next iteration of the loop. Once confirmed, then the next iteration begins. Likewise, if the loop were iterative, a determination must be made as to whether there are any more arguments to iterate over. Only when that validation has completed successfully can we begin the next iteration.

The continue statement in Python is not unlike the traditional continue found in other high-level languages. The continue statement can be used in both while and for loops. The while loop is conditional, and the for loop is iterative, so using continue is subject to the same requirements (as highlighted in the Core Note above) before the next iteration of the loop can begin. Otherwise, the loop will terminate normally.

image

In this combined example using while, for, if, break, and continue, we are looking at validating user input. The user is given three opportunities to enter the correct password; otherwise, the valid variable remains False, which presumably will result in appropriate action being taken soon after.
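A sketch of such a validation loop is shown below. To keep it runnable without a keyboard, the attempts are passed in as a list instead of being read from the user; the function name, parameter names, and passwords are all hypothetical.

```python
def validate(attempts, passwd_list=('abc', 'xyz')):
    valid = False
    count = 3                        # three opportunities
    tries = iter(attempts)
    while count > 0:
        entered = next(tries, '')    # an empty string once attempts run out
        for each_passwd in passwd_list:
            if entered == each_passwd:
                valid = True
                break                # leaves the for loop only
        if not valid:
            count -= 1
            continue                 # back to the while test for another try
        else:
            break                    # leaves the while loop
    return valid


print(validate(['bad', 'xyz']))
```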

8.9 pass Statement

One Python statement not found in C is the pass statement. Because Python does not use curly braces to delimit blocks of code, there are places where code is syntactically required. We do not have the equivalent empty braces or single semicolon the way C does to indicate “do nothing.” If you use a Python statement that expects a sub-block of code or suite, and one is not present, you will get a syntax error condition. For this reason, we have pass, a statement that does absolutely nothing—it is a true NOP, to steal the “No OPeration” assembly code jargon. Style- and development-wise, pass is also useful in places where your code will eventually go, but has not been written yet (in stubs, for example):

image

This code structure is helpful during the development or debugging stages because you want the structure to be there while the code is being created, but you do not want it to interfere with the other parts of the code that have been completed already. In places where you want nothing to execute, pass is a good tool to have in the box.
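As a minimal sketch, pass can stand in wherever a suite is required but nothing should happen yet. The names below are hypothetical.

```python
def not_written_yet():
    pass                 # stub: the real body will be filled in later


class EmptyPlaceholder(object):
    pass                 # a class needs a suite too


evens = 0
for n in range(5):
    if n % 2:
        pass             # nothing to do for odd numbers, at least for now
    else:
        evens += 1

print(evens)
```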

Another popular place is with exception handling, which we will take a look at in Chapter 10; this is where you can track an error if it occurs, but take no action if it is not fatal (you just want to keep a record of the event or perform an operation under the covers if an error occurs).

8.10 else Statement ... Take Two

In C (as well as in most other languages), you will not find an else statement outside the realm of conditional statements, yet Python bucks the trend again by offering these in while and for loops. How do they work? When used with loops, an else clause will be executed only if the loop finishes to completion, meaning that it was not abandoned by break.

One popular example of else usage in a while statement is in finding the largest factor of a number. We have implemented a function that performs this task, using the else statement with our while loop. The showMaxFactor() function in Example 8.1 (maxFact.py) utilizes the else statement as part of a while loop.

Example 8.1. while-else Loop Example (maxFact.py)

This program displays the largest factors for numbers between 10 and 20. If the number is prime, the script will indicate that as well.

image

The loop beginning on line 3 in showMaxFactor() counts down from half the amount (starts checking if two divides the number, which would give the largest factor). The loop decrements each time (line 10) through until a divisor is found (lines 6-9). If a divisor has not been found by the time the loop decrements to 1, then the original number must be prime. The else clause on lines 11-12 takes care of this case. The main part of the program on lines 14-15 fires off the requests to showMaxFactor() with the numeric argument.

Running our program results in the following output:

image

Likewise, a for loop can have a post-processing else. It operates exactly the same way as for a while loop. As long as the for loop exits normally (not via break), the else clause will be executed.
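Both forms of the loop else can be sketched briefly. These functions are hypothetical illustrations and are not a reproduction of maxFact.py.

```python
def max_factor_message(num):
    count = num // 2
    while count > 1:
        if num % count == 0:
            message = 'largest factor of %d is %d' % (num, count)
            break                      # a divisor was found: skip the else
        count -= 1
    else:
        # Reached only when the loop ran out without hitting break.
        message = '%d is prime' % num
    return message


def first_negative(values):
    for v in values:
        if v < 0:
            found = v
            break
    else:
        found = None                   # no break happened: nothing negative
    return found


print(max_factor_message(10))
print(max_factor_message(11))
```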

Table 8.1 summarizes which auxiliary statements can be used with which conditional or looping statements.

Table 8.1. Auxiliary Statements to Loops and Conditionals

image

8.11 Iterators and the iter() Function

8.11.1 What Are Iterators?

Iterators were added to Python in version 2.2 to give sequence-like objects a sequence-like interface. We formally introduced sequences back in Chapter 6. They are just data structures that you can “iterate” over by using their index starting at 0 and continuing till the final item of the sequence. Because you can do this “counting,” iterating over sequences is trivial. Iteration support in Python works seamlessly with sequences but now also allows programmers to iterate through non-sequence types, including user-defined objects.

Iterators come in handy when you are iterating over something that is not a sequence but exhibits behavior that makes it seem like a sequence, for example, keys of a dictionary, lines of a file, etc. When you use loops to iterate over an object item, you will not be able to easily tell whether it is an iterator or a sequence. The best part is that you do not have to care because Python makes it seem like a sequence.

8.11.2 Why Iterators?

The defining PEP (234) cites that iterators:

• Provide an extensible iterator interface.

• Bring performance enhancements to list iteration.

• Allow for big performance improvements in dictionary iteration.

• Allow for the creation of a true iteration interface as opposed to overriding methods originally meant for random element access.

• Are backward-compatible with all existing user-defined classes and extension objects that emulate sequences and mappings.

• Result in more concise and readable code that iterates over non-sequence collections (mappings and files, for instance).

8.11.3 How Do You Iterate?

Basically, instead of an index to count sequentially, an iterator is any item that has a next() method. When the next item is desired, either you or a looping mechanism like for will call the iterator’s next() method to get the next value. Once the items have been exhausted, a StopIteration exception is raised, not to indicate an error, but to let folks know that we are done.

Iterators do have some restrictions, however. For example, you cannot move backward, go back to the beginning, or copy an iterator. If you want to iterate over the same objects again (or simultaneously), you have to create another iterator object. It isn’t all that bad, however, as there are various tools to help you with using iterators.

There is a reversed() built-in function that returns an iterator that traverses an iterable in reverse order. The enumerate() BIF also returns an iterator; we saw earlier in the chapter how you can use it in a for loop to iterate over both the index and the item of an iterable. Two new BIFs, any() and all(), made their debut in Python 2.5. They return True if any or all items, respectively, traversed across an iterator have a Boolean True value. There is also an entire module called itertools that contains various iterators you may find useful.
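The behavior described in this subsection can be sketched as follows. We use the next() built-in function (available from Python 2.6 and in Python 3) instead of calling the iterator's next() method directly, so that the snippet runs under both; the tuple contents are hypothetical.

```python
my_tuple = (123, 'xyz', 45.67)

it = iter(my_tuple)
first = next(it)            # fetch one item by hand
rest = list(it)             # consume whatever is left

exhausted = False
try:
    next(it)                # nothing remains
except StopIteration:
    exhausted = True

# There is no rewinding: to go through the items again, make a new iterator.
fresh = list(iter(my_tuple))

backward = list(reversed(my_tuple))
some_true = any(iter([0, '', 5]))     # at least one true item
every_true = all(iter([1, 'a', 0]))   # the 0 spoils it

print(first, rest)
```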

8.11.4 Using Iterators with ...

Sequences

As mentioned before, iterating through Python sequence types is as expected:

image

If this had been an actual program, we would have enclosed the code inside a try-except block. Sequences now automatically produce their own iterators, so a for loop:

for i in seq:
    do_something_to(i)

under the covers now really behaves like this:

image

However, your code does not need to change because the for loop itself calls the iterator’s next() method (as well as monitors for StopIteration).
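The equivalence can be spelled out in a short sketch. The body of do_something_to() and the handled list are hypothetical, and the next() built-in function is used in place of the next() method so the snippet also runs under Python 3.

```python
handled = []


def do_something_to(i):
    # Hypothetical work: record twice the value.
    handled.append(i * 2)


seq = [1, 2, 3]

# Roughly what "for i in seq: do_something_to(i)" does on our behalf.
fetch = iter(seq)
while True:
    try:
        i = next(fetch)
    except StopIteration:
        break                # the items are used up: leave the loop quietly
    do_something_to(i)

print(handled)
```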

Dictionaries

Dictionaries and files are two other Python data types that received the iteration makeover. A dictionary’s iterator traverses its keys. The idiom for eachKey in myDict.keys() can be shortened to for eachKey in myDict as shown here:

image

In addition, three new built-in dictionary methods have been introduced to define the iteration: myDict.iterkeys() (iterate through the keys), myDict.itervalues() (iterate through the values), and myDict.iteritems() (iterate through key/value pairs). Note that the in operator has been modified to check a dictionary’s keys. This means the Boolean expression myDict.has_key(anyKey) can be simplified as anyKey in myDict.
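Dictionary iteration and key membership can be sketched as below. Note one assumption: iterkeys(), itervalues(), and iteritems() exist only in Python 2, so this sketch uses values() and items(), which work in both Python 2 and Python 3. The dictionary contents are hypothetical.

```python
legends = {'Poe': 1809, 'Gaudi': 1852, 'Freud': 1856}

# Looping over the dictionary itself traverses its keys.
keys_seen = []
for each_key in legends:
    keys_seen.append(each_key)

values_seen = []
for each_value in legends.values():
    values_seen.append(each_value)

pairs_seen = []
for each_key, each_value in legends.items():
    pairs_seen.append((each_key, each_value))

# The in operator checks keys, replacing has_key().
has_poe = 'Poe' in legends
has_kant = 'Kant' in legends

print(sorted(keys_seen))
```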

Files

File objects produce an iterator that calls the readline() method. Thus, they loop through all lines of a text file, allowing the programmer to replace for eachLine in myFile.readlines() with the simpler for eachLine in myFile:

image

8.11.5 Mutable Objects and Iterators

Remember that interfering with mutable objects while you are iterating over them is not a good idea. This was a problem before iterators appeared. One popular example of this is to loop through a list and remove items from it if certain criteria are met (or not):

image

All sequences are immutable except lists, so the danger occurs only there. A sequence’s iterator only keeps track of the Nth element you are on, so if you change elements around during iteration, those updates will be reflected as you traverse through the items. If you run out, then StopIteration will be raised.

When iterating through keys of a dictionary, you must not modify the dictionary. Using a dictionary’s keys() method is okay because keys() returns a list that is independent of the dictionary. But iterators are tied much more intimately with the actual object and will not let us play that game anymore:

image

This will help prevent buggy code. For full details on iterators, see PEP 234.
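Both hazards, along with the usual way around each, can be sketched as follows. The data is hypothetical.

```python
# Removing from a list while looping over it: the iterator tracks only a
# position, so the element that slides into the freed slot is skipped.
nums = [1, 2, 2, 3]
for n in nums:
    if n == 2:
        nums.remove(n)       # one of the 2s survives

# Looping over a copy (clean[:]) leaves the iteration undisturbed.
clean = [1, 2, 2, 3]
for n in clean[:]:
    if n == 2:
        clean.remove(n)

# A dictionary's iterator refuses outright when the size changes.
my_dict = {'a': 1, 'b': 2, 'c': 3}
refused = False
try:
    for each_key in my_dict:
        del my_dict[each_key]
except RuntimeError:
    refused = True

# Iterating over an independent list of the keys is safe.
for each_key in list(my_dict):
    del my_dict[each_key]

print(nums, clean, refused)
```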

8.11.6 How to Create an Iterator

You can take an item and call iter() on it to turn it into an iterator. Its syntax is one of the following:

iter(obj)
iter(func, sentinel)

If you call iter() with one object, it will check if it is just a sequence, for which the solution is simple: It will just iterate through it by (integer) index from 0 to the end. Another way to create an iterator is with a class. As we will see in Chapter 13, a class that implements the __iter__() and next() methods can be used as an iterator.

If you call iter() with two arguments, it will repeatedly call func to obtain the next value of iteration until that value is equal to sentinel.
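The following sketch shows both call forms of iter() and a small iterator class. The Countdown class and the queue contents are hypothetical. The class defines __next__() and aliases it to next, an assumption made so that the same code serves Python 3 as well as the Python 2 protocol described here.

```python
from collections import deque


class Countdown(object):
    """Hypothetical iterator that counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__              # Python 2 looks for next()


# One argument: wrap an existing iterable.
from_object = list(iter([10, 20, 30]))

# A class-based iterator works anywhere an iterable is expected.
from_class = list(Countdown(3))

# Two arguments: call the function until it returns the sentinel (0 here).
queue = deque([3, 7, 0, 9])
until_zero = list(iter(queue.popleft, 0))

print(from_class, until_zero)
```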

8.12 List Comprehensions

List comprehensions (or “list comps” for short) come to us from the functional programming language Haskell. They are an extremely valuable, simple, and flexible utility tool that helps us create lists on the fly. They were added to Python in version 2.0.

Up ahead in Functions (Chapter 11), we will be discussing long-time Python functional programming features like lambda, map(), and filter(). They have been around in Python for quite a while, but in many cases a list comprehension can now do the same job more simply. map() is a function that applies an operation to list members, and filter() filters out list members based on a conditional expression. Finally, lambda allows you to create one-line function objects on the fly. It is not important that you learn them now, but you will see examples of them in this section because we are discussing the merits of list comps. Let us take a look at the simpler list comprehension syntax first:

[expr for iter_var in iterable]

The core of this statement is the for loop, which iterates over each item of iterable. The prefixed expr is applied for each member of the sequence, and the resulting values comprise the list that the expression yields. The iteration variable need not be part of the expression.

Here is a sneak preview of some code from Chapter 11. It has a lambda function that squares the members of a sequence:

>>> map(lambda x: x ** 2, range(6))
[0, 1, 4, 9, 16, 25]

We can replace this code with the following list comprehension statement:

>>> [x ** 2 for x in range(6)]
[0, 1, 4, 9, 16, 25]

In the new statement, only one function call (range()) is made (as opposed to three—range(), map(), and the lambda function). You may also use parentheses around the expression if [(x ** 2) for x in range(6)] is easier for you to read. This syntax for list comprehensions can be a substitute for and is more efficient than using the map() built-in function along with lambda.

List comprehensions also support an extended syntax with the if statement:

[expr for iter_var in iterable if cond_expr]

This syntax will filter or “capture” sequence members only if they meet the condition provided for in the cond_expr conditional expression during iteration.

Recall the odd() function below, which determines whether a numeric argument is odd or even (returning 1 for odd numbers and 0 for even numbers):

def odd(n):
    return n % 2

We were able to take the core operation from this function, and use it with filter() and lambda to obtain the set of odd numbers from a sequence:

image

As in the previous example, we can bypass the use of filter() and lambda to obtain the desired set of numbers with list comprehensions:

>>> [x for x in seq if x % 2]
[11, 9, 9, 9, 23, 9, 7, 11]
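To try both approaches together, here is a hedged sketch. The contents of seq are an assumption, chosen only so that the result matches the output shown above. Python 3 is also assumed, where filter() returns an iterator and therefore needs list().

```python
# Hypothetical input; the original definition of seq is not shown above.
seq = [11, 10, 9, 9, 10, 10, 9, 8, 23, 9, 7, 18, 12, 11, 12]

# filter() keeps items for which the function returns a true value.
# In Python 3 it returns an iterator, hence the list() call.
odds_filter = list(filter(lambda x: x % 2, seq))

# The list comprehension expresses the same test inline.
odds_comp = [x for x in seq if x % 2]

print(odds_comp)
```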

Let us end this section with a few more practical examples.

Matrix Example

Do you want to iterate through a matrix of three rows and five columns? It is as easy as:

image
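One way such an iteration might look is sketched below. This is an illustration under our own assumptions (1-based coordinates, the name cells), not necessarily the original listing.

```python
# Pair every row index with every column index (1-based for readability).
# The first for clause is the outer loop; the second is the inner loop.
cells = [(row + 1, col + 1) for row in range(3) for col in range(5)]

print(len(cells))
print(cells[:6])
```

The clauses read left to right in the same order as the equivalent nested for statements, so the column index varies fastest.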

Disk File Example

Now let us say we have the following data file and want to count the total number of non-whitespace characters in the file hhga.txt:

And the Lord spake, saying, “First shalt thou take out the
Holy Pin. Then shalt thou count to three, no more, no less.
Three shall be the number thou shalt count, and the number of
the counting shall be three. Four shalt thou not count,
neither count thou two, excepting that thou then proceed to
three. Five is right out. Once the number three, being the
third number, be reached, then lobbest thou thy Holy Hand
Grenade of Antioch towards thy foe, who, being naughty in My
sight, shall snuff it.”

We know that we can iterate through each line with for line in data, but beyond that, we can also split each line into words and count the words to get a total like this:

image

Let us get a quick total file size:

image

Assuming that there is at least one whitespace character in the file, we know that there are fewer than 499 non-whitespace characters in the file. We can sum up the length of each word to arrive at our total:

image

Note that we have to rewind to the beginning of the file (for example, with the file's seek(0) method) before each pass because iterating over the file exhausts it. But wow, a non-obfuscated one-liner now does something that used to take many lines of code to accomplish!
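The steps in this example can be sketched as follows. To keep the sketch self-contained, an io.StringIO object with a short made-up text stands in for the open file; it is not the hhga.txt file above, so the totals differ from those quoted in the text.

```python
import io

# Hypothetical stand-in for an open text file; not the book's hhga.txt.
data = io.StringIO("Holy Hand\nGrenade of Antioch\n")

# Count words: the outer loop walks lines, the inner loop walks words.
word_count = len([word for line in data for word in line.split()])

# The first pass exhausted the file iterator, so rewind before reusing it.
data.seek(0)
non_ws_chars = sum([len(word) for line in data for word in line.split()])

# Rewind once more and measure the total size, whitespace included.
data.seek(0)
total_chars = len(data.read())

print(word_count, non_ws_chars, total_chars)
```

Because split() with no arguments breaks on any run of whitespace, summing the word lengths counts exactly the non-whitespace characters.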

As you can see, list comps support multiple nested for loops and more than one if clause. The full syntax can be found in the official documentation. You can also read more about list comprehensions in PEP 202.
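As a small illustration of combining several clauses (the ranges and conditions here are arbitrary choices of ours), each for clause may carry its own if filter:

```python
# Two for clauses and two if clauses in one comprehension.
pairs = [(x, y)
         for x in range(4) if x % 2      # keep odd x: 1 and 3
         for y in range(3) if y != x]    # skip y equal to x

print(pairs)
```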

8.13 Generator Expressions

Generator expressions extend naturally from list comprehensions (“list comps”). When list comps came into being in Python 2.0, they revolutionized the language by giving users an extremely flexible and expressive way to designate the contents of a list on a single line. Ask any long-time Python user what new features have changed the way they program Python, and list comps should be near the top of the list.

Another significant feature that was added to Python in version 2.2 was the generator. A generator is a specialized function that allows you to return a value and “pause” the execution of that code and resume it at a later time. We will discuss generators in Chapter 11.
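As a brief preview, the pause-and-resume behavior can be sketched with a tiny generator function. The function countdown() is a hypothetical example of ours, and the next() built-in used here assumes Python 2.6 or later.

```python
def countdown(start):
    """Hypothetical generator: yields start, start - 1, ..., 1."""
    while start > 0:
        yield start      # hand back a value and pause here
        start -= 1       # execution resumes here on the next request

gen = countdown(3)
first = next(gen)        # runs the body until the first yield
rest = list(gen)         # resumes and collects the remaining values

print(first, rest)
```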

The one weakness of list comps is that all of the data have to be made available in order to create the entire list. This can have negative consequences if an iterator with a large dataset is involved. Generator expressions resolve this issue by combining the syntax and flexibility of list comps with the power of generators.

Introduced in Python 2.4, generator expressions are similar to list comprehensions in that the basic syntax is nearly identical; however, instead of building a list with values, they return a generator that “yields” after processing each item. Because of this, generator expressions are much more memory efficient by performing “lazy evaluation.” Take a look at how similar they appear to list comps:

LIST COMPREHENSION:

[expr for iter_var in iterable if cond_expr]

GENERATOR EXPRESSION:

(expr for iter_var in iterable if cond_expr)
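The difference in evaluation can be made visible with a helper that records each call. This is only a sketch; traced_square() and log are hypothetical names introduced for the demonstration.

```python
log = []

def traced_square(n):
    """Hypothetical helper that records each call so laziness is visible."""
    log.append(n)
    return n * n

eager = [traced_square(n) for n in range(3)]   # runs all three calls now
calls_after_list = len(log)

lazy = (traced_square(n) for n in range(3))    # no calls happen yet
calls_after_genexp = len(log)

first = next(lazy)                             # runs exactly one more call
calls_after_next = len(log)

print(calls_after_list, calls_after_genexp, calls_after_next)
```

The list comprehension does all of its work up front, while the generator expression computes a value only when one is requested.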

Generator expressions do not make list comps obsolete. They are just a more memory-friendly construct, and on top of that, are a great use case of generators. We now present a set of generator expression examples, including a long-winded one at the end showing you how Python code has changed over the years.

Disk File Example

In the previous section on list comprehensions, we took a look at finding the total number of non-whitespace characters in a text file. In the final snippet of code, we showed you how to perform that in one line of code using a list comprehension. If that file became unwieldy due to size, it would become fairly unfriendly memory-wise because we would have to put together a very long list of word lengths.

Rather than building that long list, a generator expression calculates the individual lengths one at a time and feeds them to the sum() function, which accepts not just lists but any iterable, including generator expressions. We can then shorten our example above to be even more optimal (code- and execution-wise):

>>> sum(len(word) for line in data for word in line.split())
408

All we did was remove the enclosing list comprehension square brackets: Two bytes shorter and it saves memory ... very environmentally friendly!
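The memory difference can be observed roughly with sys.getsizeof(). The sketch below is an assumption-laden illustration: an in-memory io.StringIO object with made-up text stands in for a large file, and getsizeof() reports only the size of the container object itself, not of the items it refers to.

```python
import io
import sys

# Hypothetical in-memory "file" with many short lines.
data = io.StringIO("spam and eggs\n" * 10000)

lengths_list = [len(word) for line in data for word in line.split()]
data.seek(0)  # rewind: the list comprehension consumed the iterator
lengths_gen = (len(word) for line in data for word in line.split())

# The generator object stays small no matter how many items it will yield.
list_bytes = sys.getsizeof(lengths_list)
gen_bytes = sys.getsizeof(lengths_gen)

total = sum(lengths_gen)
print(total == sum(lengths_list), gen_bytes < list_bytes)
```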

Cross-Product Pairs Example

Generator expressions are unlike list comprehensions in that they are lazy, which is their main benefit. They are also great ways of dealing with other lists and generators, like rows and cols here:

image

We do not need to create a new list. We can piece together things on the fly. Let us create a generator expression for rows and cols:

x_product_pairs = ((i, j) for i in rows for j in cols())

Now we can loop through x_product_pairs, and it will loop through rows and cols lazily:

image
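Putting the pieces of this example together gives something like the sketch below. The values in rows and those yielded by cols() are illustrative assumptions; only the x_product_pairs expression is taken from the text.

```python
rows = [1, 2, 3, 17]          # an ordinary list (values are illustrative)

def cols():
    """A generator function yielding illustrative column values."""
    yield 56
    yield 2
    yield 1

# No intermediate list is built; pairs are produced on demand.
x_product_pairs = ((i, j) for i in rows for j in cols())

pairs_seen = []
for pair in x_product_pairs:
    pairs_seen.append(pair)

print(pairs_seen[:4])
```

Note that cols() is called afresh for each value of i, so every row is paired with the full set of columns.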

Refactoring Example

Let us look at some evolutionary code via an example that finds the longest line in a file. In the old days, the following was acceptable for reading a file:

image
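As a rough reconstruction of that older style (an assumption on our part, not the exact original listing), a loop built around readline() might look like the following. The function name and the temporary sample file are hypothetical.

```python
import os
import tempfile

def longest_line_old_style(path):
    """Loop-and-readline sketch; stops at the first empty (or blank) line."""
    f = open(path, 'r')
    longest = 0
    while True:
        linelen = len(f.readline().strip())
        if not linelen:      # readline() returns '' at end of file
            break
        if linelen > longest:
            longest = linelen
    f.close()
    return longest

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("short\na much longer line\nmid size\n")
result = longest_line_old_style(tmp.name)
os.remove(tmp.name)

print(result)
```

One weakness of this sketch is that a blank line in the middle of the file also has a stripped length of zero and would end the loop early.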

Actually, this is not that old. If it were really old Python code, the Boolean constant True would be the integer one, and instead of using the string strip() method, you would be using the string module:

import string
          :
    len(string.strip(f.readline()))

Since that time, we realized that we could release the (file) resource sooner if we read all the lines at once. If this were a log file used by many processes, it would behoove us not to hold onto a (write) file handle for an extended period of time. Yes, our example is for read, but you get the idea. So the preferred way of reading in lines from a file changed slightly to reflect this preference:

image

List comps allow us to simplify our code a little bit more and give us the ability to do more processing before we get our set of lines. In the next snippet, in addition to reading in the lines from the file, we call the string strip() method immediately instead of waiting until later.

image
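A hedged sketch of this intermediate stage follows. The function takes an already opened file object, and io.StringIO stands in for a real file; both choices, like the function name, are our own assumptions.

```python
import io

def longest_line_readlines(f):
    """Read all lines at once, strip them, then scan for the longest."""
    all_lines = [line.strip() for line in f.readlines()]
    f.close()                      # the handle can be released early
    longest = 0
    for line in all_lines:
        if len(line) > longest:
            longest = len(line)
    return longest

# io.StringIO stands in for a real file opened with open().
sample = io.StringIO("short\n\na much longer line\nmid size\n")
print(longest_line_readlines(sample))
```

Because every line is already in memory, the file can be closed before any processing starts, and a blank line in the middle no longer cuts the scan short.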

Still, both examples above have a problem when dealing with a large file as readlines() reads in all its lines. When iterators came around, and files became their own iterators, readlines() no longer needed to be called. While we are at it, why can’t we just make our data set the set of line lengths (instead of lines)? That way, we can use the max() built-in function to get the longest string length:

image

The only problem here is that even though you are iterating over f line by line, the list comprehension itself needs all lines of the file read into memory in order to generate the list. Let us simplify our code even more: we will replace the list comp with a generator expression and move it inside the call to max() so that all of the complexity is on a single line:

image
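In sketch form, with an in-memory io.StringIO object standing in for the open file (an assumption made to keep the example self-contained), the single-line version might read:

```python
import io

def longest_line_genexp(f):
    """Feed line lengths to max() one at a time; no intermediate list."""
    return max(len(line.strip()) for line in f)

# io.StringIO stands in for a real file opened with open().
sample = io.StringIO("short\na much longer line\nmid size\n")
print(longest_line_genexp(sample))
```

When a generator expression is the sole argument of a function call, the call's own parentheses suffice. Be aware that max() raises ValueError if the file is empty.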

One more refactoring, of which we are not as fond, is dropping the file mode (defaulting to read) and letting Python clean up the open file. It is not as bad as it would be for a file opened for write, and it does work:

return max(len(x.strip()) for x in open('/etc/motd'))

We have come a long way, baby. Note that even a one-liner is not obfuscated enough in Python to make it difficult to read. Generator expressions were added in Python 2.4, and you can read more about them in PEP 289.

8.14 Related Modules

Iterators were introduced in Python 2.2, and the itertools module was added in the next release (2.3) for developers who had discovered how useful iterators were and wanted some helper tools to work with them. The interesting thing is that if you read the documentation for the various utilities in itertools, you will discover generators. So there is a relationship between iterators and generators. You can read more about this relationship in Chapter 11, “Functions”.
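To give a flavor of the module, here is a brief sketch using three of its utilities. It assumes a Python version in which itertools.count() accepts a step argument (Python 3, or 2.7); the variable names are ours.

```python
import itertools

# count() is an endless iterator; islice() lazily takes a finite slice.
evens = itertools.islice(itertools.count(0, 2), 5)
first_five_evens = list(evens)

# chain() walks several iterables in sequence without building a new list.
combined = list(itertools.chain([1, 2], (x * 10 for x in range(3))))

print(first_five_evens, combined)
```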

8.15 Exercises

8-1. Conditionals. Study the following code:

image

(a) Which of the statements above (A, B, C, D, E) will be executed if x < 0?

(b) Which of the statements above will be executed if x == 0?

(c) Which of the statements above will be executed if x > 0?

8-2. Loops. Write a program to have the user input three (3) numbers: (f)rom, (t)o, and (i)ncrement. Count from f to t in increments of i, inclusive of f and t. For example, if the input is f == 2, t == 26, and i == 4, the program would output: 2, 6, 10, 14, 18, 22, 26.

8-3. range(). What argument(s) could we give to the range() built-in function if we wanted the following lists to be generated?

(a) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

(b) [3, 6, 9, 12, 15, 18]

(c) [-20, 200, 420, 640, 860]

8-4. Prime Numbers. We presented some code in this chapter to determine a number’s largest factor or if it is prime. Turn this code into a Boolean function called isprime() such that the input is a single value, and the result returned is True if the number is prime and False otherwise.

8-5. Factors. Write a function called getfactors() that takes a single integer as an argument and returns a list of all its factors, including 1 and itself.

8-6. Prime Factorization. Take your solutions for isprime() and getfactors() in the previous problems and create a function that takes an integer as input and returns a list of its prime factors. This process, known as prime factorization, should output a list of factors such that if multiplied together, they will result in the original number. Note that there could be repeats in the list. So if you gave an input of 20, the output would be [2, 2, 5].

8-7. Perfect Numbers. A perfect number is one whose factors (except itself) sum to itself. For example, the factors of 6 are 1, 2, 3, and 6. Since 1 + 2 + 3 is 6, it (6) is considered a perfect number. Write a function called isperfect() which takes a single integer input and outputs 1 if the number is perfect and 0 otherwise.

8-8. Factorial. The factorial of a number is defined as the product of all values from one to that number. A shorthand for N factorial is N! where N! == factorial(N) == 1 * 2 * 3 * ... * (N-2) * (N-1) * N. So 4! == 1 * 2 * 3 * 4. Write a routine such that given N, the value N! is returned.

8-9. Fibonacci Numbers. The Fibonacci number sequence is 1, 1, 2, 3, 5, 8, 13, 21, etc. In other words, the next value of the sequence is the sum of the previous two values in the sequence. Write a routine that, given N, displays the value of the Nth Fibonacci number. For example, the first Fibonacci number is 1, the 6th is 8, and so on.

8-10. Text Processing. Determine the total number of vowels, consonants, and words (separated by spaces) in a text sentence. Ignore special cases for vowels and consonants such as “h,” “y,” “qu,” etc. Extra credit: create code to handle those special cases.

8-11. Text Processing. Write a program to ask the user to input a list of names, in the format “Last Name, First Name,” i.e., last name, comma, first name. Write a function that manages the input so that when/if the user types the names in the wrong order, i.e., “First Name Last Name,” the error is corrected, and the user is notified. This function should also keep track of the number of input mistakes. When the user is done, sort the list, and display the sorted names in “Last Name, First Name” order.

EXAMPLE input and output (you don’t have to do it this way exactly):

image

8-12. (Integer) Bit Operators. Write a program that takes begin and end values and prints out a decimal, binary, octal, hexadecimal chart like the one shown below. If any of the characters are printable ASCII characters, then print those, too. If none is, you may omit the ASCII column header.

image

8-13. Performance. In Section 8.6.2, we examined two basic ways of iterating over a sequence: (1) by sequence item, and (2) via sequence index. We pointed out at the end that the latter does not perform as well over the long haul (on my system here, a test suite shows performance is nearly twice as bad [83% worse]). Why do you think that is?
