7 Functional programming with comprehensions

Programmers are always trying to do more with less code, while simultaneously making that code more reliable and easier to debug. And indeed, computer scientists have developed a number of techniques, each meant to bring us closer to that goal of short, reliable, maintainable, powerful code.

One set of techniques is known as functional programming. It aims to make programs more reliable by keeping functions short and data immutable. I think most developers would agree that short functions are a good idea, in no small part because they’re easier to understand, test, and maintain.

But how can you enforce the writing of short functions? Immutable data. If you can’t modify data from within a function, then the function will (in my experience) end up being shorter, with fewer potential paths to be tested. Functional programs thus end up having many short functions--in contrast with nonfunctional programs, which often have a smaller number of very long functions. Functional programming also assumes that functions can be passed as arguments to other functions, something that we’ve already seen to be the case in Python.

The good news is that functional techniques have the potential to make code short and elegant. The bad news is that for many developers, functional techniques aren’t natural. Not modifying any values, and not keeping track of state, might be great ways to make your software more reliable, but they’re almost guaranteed to confuse and frustrate many developers.

Consider, for example, that you have a Person object in a purely functional language. If the person wants to change their name, you’re out of luck, because all data is immutable. Instead, you’ll have to create a new person object based on the old one, but with the name changed. This isn’t terrible in and of itself, but given that the real world changes, and that we want our programs to model the real world, keeping everything immutable can be frustrating.

Then again, because functional languages can’t modify data, they generally provide mechanisms for taking a sequence of inputs, transforming them in some way, and producing a sequence of outputs. We might not be able to modify one Person object, but we can write a function that takes a list of Person objects, applies a Python expression to each one, and then gets a new list of Person objects back. In such a scenario, we perhaps haven’t modified our original data, but we’ve accomplished the task. And the code needed to do this is generally quite short.
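Here's a minimal sketch of that idea in Python. The Person class, its fields, and the sample data are all hypothetical, invented for illustration; I'm assuming a frozen dataclass as a stand-in for "immutable data."

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)          # frozen=True makes instances immutable
class Person:                    # hypothetical class, for illustration only
    name: str
    age: int


people = [Person('Alice', 30), Person('Bob', 25)]

# We can't modify a Person, but we can build a new one based on the old one
renamed = replace(people[0], name='Alicia')

# Transform the whole list: each output element is a new Person
older = [replace(one_person, age=one_person.age + 1)
         for one_person in people]

print(renamed)
print(older)
print(people)    # the original list and its elements are unchanged
```

The original list is never touched; we describe the new list in terms of the old one and get new objects back.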

Now, Python isn’t a functional language; we have mutable data types and assignment. But some functional techniques have made their way into the language and are considered standard Pythonic ways to solve some problems.

Specifically, Python offers comprehensions, a modern take on classic functions that originated in Lisp, one of the first high-level languages to be invented. Comprehensions make it relatively easy to create lists, sets, and dicts based on other data structures. The fact that Python’s functions are objects, and can thus be passed as arguments or stored in data structures, also comes from the functional world.

Some exercise solutions have already used, or hinted at, comprehensions. In this chapter, we’re going to concentrate on how and when to use these techniques, and expand on the ways we can use them.

In my experience, it’s common to be indifferent to functional techniques, and particularly to comprehensions, when first learning about them. But over time--and yes, it can take years!--developers increasingly understand how, when, and why to apply them. So even if you can solve the problems in this chapter without using functional techniques, the point here is to get your hands dirty, try them, and start to see the logic and elegance behind this way of doing things. The benefits might not be immediately obvious, but they’ll pay off over time.

If this all sounds very theoretical and you’d like to see some concrete examples of comprehensions versus traditional, procedural programming, then check out the “Writing comprehensions” sidebar coming up in this chapter, where I go through the differences more thoroughly.

Table 7.1 What you need to know

| Concept | What is it? | Example | To learn more |
|---|---|---|---|
| List comprehension | Produces a list based on the elements of an iterable | [x*x for x in range(5)] | http://mng.bz/lGpy |
| Dict comprehension | Produces a dict based on the elements of an iterable | {x : 2*x for x in range(5)} | http://mng.bz/Vggy |
| Set comprehension | Produces a set based on the elements of an iterable | {x*x for x in range(5)} | http://mng.bz/GVxO |
| input | Prompts the user to enter a string, and returns a string | input('Name: ') | http://mng.bz/wB27 |
| str.isdigit | Returns True or False, if the string is nonempty and contains only 0-9 | '5'.isdigit()  # returns True | http://mng.bz/oPVN |
| str.split | Breaks strings apart, returning a list | 'ab cd ef'.split()  # returns ['ab', 'cd', 'ef'] | http://mng.bz/aR4z |
| str.join | Combines strings to create a new one | '*'.join(['ab', 'cd', 'ef'])  # returns 'ab*cd*ef' | http://mng.bz/gyYl |
| string.ascii_lowercase | All English lowercase letters | string.ascii_lowercase | http://mng.bz/zjxQ |
| enumerate | Returns an iterator of two-element tuples, with an index | enumerate('abcd') | http://mng.bz/qM1K |
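If you'd like to see what the examples in table 7.1 produce, here's a short sketch you can run. The variable names are my own, and I've left out input, because it waits for the user to type something.

```python
import string

squares = [x*x for x in range(5)]        # list comprehension
doubles = {x: 2*x for x in range(5)}     # dict comprehension
square_set = {x*x for x in range(5)}     # set comprehension (sets are unordered)
pairs = list(enumerate('abcd'))          # list() forces the iterator to produce its tuples

print(squares)
print(doubles)
print(square_set)
print('5'.isdigit(), 'five'.isdigit())
print('ab cd ef'.split())
print('*'.join(['ab', 'cd', 'ef']))
print(string.ascii_lowercase)
print(pairs)
```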

Exercise 28 Join numbers

People often ask me, “When should I use a comprehension, as opposed to a traditional for loop?”

My answer is basically as follows: when you want to transform an iterable into a list, you should use a comprehension. But if you just want to execute something for each element of an iterable, then a traditional for loop is better.

Put another way, is the point of your for loop the creation of a new list? If so, then use a comprehension. But if your goal is to execute something once for each element in an iterable, throwing away or ignoring any return value, then a for loop is preferable.

For example, I want to get the lengths of words in the string s. I can say

[len(one_word)
 for one_word in s.split()]

In this example, I care about the list we’re creating, so I use a comprehension.

But if my string s contains a list of filenames, and I want to create a new file for each of these filenames, then I’m not interested in the return value. Rather, I want to iterate over the filenames and create a file, as follows:

for one_filename in s.split():
    with open(one_filename, 'w') as f:
        f.write(f'{one_filename}\n')

In this example, I open (and thus create) each file, and write to it the name of the file. Using a comprehension in this case would be inappropriate, because I’m not interested in the return value.

Transformations--taking values in a list, string, dict, or other iterable and producing a new list based on it--are common in programming. You might need to transform filenames into file objects, or words into their lengths, or usernames into user IDs. In all of these cases, a comprehension is the most Pythonic solution.

This exercise is meant to get your feet wet with comprehensions, and with implementing this idea. It might seem simple, but the underlying idea is deep and powerful and will help you to see additional opportunities to use comprehensions.

For this exercise, write a function (join_numbers) that takes a range of integers. The function should return those numbers as a string, with commas between the numbers. That is, given range(15) as input, the function should return this string:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14

Hint: if you’re thinking that str.join (http://mng.bz/gyYl) is a good idea here, then you’re mostly right--but remember that str.join won’t work on a list of integers.

Working it out

In this exercise, we want to use str.join on a range, which is similar to a list of integers. If we try to invoke str.join right away, we’ll get an error:

>>> numbers = range(15)
>>> ','.join(numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found

That’s because str.join only works on a sequence of strings. We’ll thus need to convert each of the integers in our range (numbers) into a string. Then, when we have a list of strings based on our range of integers, we can run str.join.

The solution is to use a list comprehension to invoke str on each of the numbers in the range. That will produce a list of strings, which is what str.join expects. How?

Consider this: a list comprehension says that we’re going to create a new list. The elements of the new list are all based on the elements in the source iterable, after an expression is run on them. What we’re doing is describing the new list in terms of the old one.

Here are some examples that can help you to see where and how to use list comprehensions:

  • I want to know the age of each student in a class. So we’re starting with a list of student objects and ending up with a list of integers. You can imagine a student_age function being applied to each student to get their age:

    [student_age(one_student)
     for one_student in all_students]
  • I want to know how many mm of rain fell on each day of the previous month. So we’re starting with a list of days and ending with a list of floats. You can imagine a daily_rain function being applied to each day:

    [daily_rain(one_day)
     for one_day in most_recent_month]
  • I want to know how many vowels were used in a book. So we would apply a number_of_vowels function to each word in the book, and then run the sum function on the resulting list:

    [number_of_vowels(one_word)
     for one_word in open(filename).read().split()]
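The third example mentions running sum on the resulting list, which the snippet itself doesn't show. Here's a minimal sketch of the whole thing. Both number_of_vowels and the sample text are hypothetical; I'm using a short string in place of a book so that the code can run without a file.

```python
def number_of_vowels(one_word):
    """Hypothetical helper: counts the vowels (a, e, i, o, u) in one word."""
    return sum(1
               for one_letter in one_word.lower()
               if one_letter in 'aeiou')


text = 'The quick brown fox jumps over the lazy dog'   # stand-in for a book

vowel_counts = [number_of_vowels(one_word)
                for one_word in text.split()]

print(vowel_counts)         # one integer per word
print(sum(vowel_counts))    # total number of vowels
```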

If these three examples look quite similar, that’s because they are; part of the power of list comprehensions is the simple formula that we repeat. Each list comprehension contains two parts:

  • The source iterable

  • The expression we’ll invoke once for each element

In the case of our exercise here, we had a list of integers. By applying the str function on each int in the list, we got back a list of strings. str.join works fine on lists of strings.
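Spelled out in two steps, that reasoning might look like the following sketch. The variable names strings and result are my own.

```python
numbers = range(15)

# Step 1: describe the new list (of strings) in terms of the old one (of integers)
strings = [str(one_number)
           for one_number in numbers]

# Step 2: str.join now receives only strings, so it works
result = ','.join(strings)

print(strings[:3])
print(result)
```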

Note We’ll get into the specifics of the iterator protocol in chapter 10, which is dedicated to that subject. You don’t need to understand those details to use comprehensions. However, if you’re particularly interested in what counts as an “iterable,” go ahead and read the first part of that chapter before continuing here.

Writing comprehensions

Comprehensions are traditionally written on a single line:

[x*x for x in range(5)]

I find that especially for new Python developers, but even for experienced ones, it’s hard to figure out what’s going on. Things get even worse if you add a condition:

[x*x for x in range(5) if x%2]

For this reason, I strongly suggest that Python developers break up their list comprehensions. Python is forgiving about whitespace inside parentheses, square brackets, and curly braces, and a comprehension is always (by definition) enclosed in one of those. We can break up this comprehension as follows:

[x*x                  # Expression
 for x in range(5)    # Iteration
 if x%2]              # Condition

By separating the expression, iteration, and condition on different lines, the comprehension becomes more ... comprehensible. It’s also easier to experiment with the comprehension in this way. I’ll be writing most of my comprehensions in this book using this two- or three-line format, and I encourage you to do the same.

Note that using this technique, nested list comprehensions also become easier to understand:

[(x,y)               # Expression
 for x in range(5)   # Iteration #1, from 0 through 4
 if x%2              # Condition #1, ignoring even numbers
 for y in range(5)   # Iteration #2, from 0 through 4
 if y%3]             # Condition #2, ignoring multiples of 3

In other words, this list comprehension produces pairs of integers in which the first number must be odd, and the second number can’t be divisible by 3. Nested comprehensions can be hard for anyone to understand, but when each of these sections appears on a line by itself, it’s easier to understand what’s happening.
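If it helps, you can read a nested comprehension as a set of nested for loops, with the clauses appearing in the same order from top to bottom. The following sketch builds the same pairs both ways; the names pairs and loop_pairs are my own.

```python
pairs = [(x, y)
         for x in range(5)
         if x % 2
         for y in range(5)
         if y % 3]

# The same logic, written as traditional nested loops
loop_pairs = []
for x in range(5):
    if x % 2:
        for y in range(5):
            if y % 3:
                loop_pairs.append((x, y))

print(pairs)
print(pairs == loop_pairs)
```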

Nested list comprehensions are great for working through complex data structures, such as lists of lists or lists of tuples. For example, let’s assume that I have a dict describing the countries and cities I’ve visited in the last year:

all_places = {'USA': ['Philadelphia', 'New York', 'Cleveland', 'San Jose', 'San Francisco'],
     'China': ['Beijing', 'Shanghai', 'Guangzhou'],
     'UK': ['London'],
     'India': ['Hyderabad']}

If I want a list of cities I’ve visited, ignoring the countries, I can use a nested list comprehension:

[one_city
 for one_country, all_cities in all_places.items()
 for one_city in all_cities]

I can also create a list of (city, country) tuples:

[(one_city, one_country)
 for one_country, all_cities in all_places.items()
 for one_city in all_cities]

And of course, I can always sort them using sorted:

[(one_city, one_country)
 for one_country, all_cities in sorted(all_places.items())
 for one_city in sorted(all_cities)]

Now, a list comprehension immediately produces a list--which, if you’re dealing with large quantities of data, can result in the use of a great deal of memory. For this reason, many Python developers would argue that we’d be better off using a generator expression (http://mng.bz/K2M0).

Generator expressions look just like list comprehensions, except that instead of using square brackets, they use regular, round parentheses. However, this turns out to make a big difference: a list comprehension has to create and return its output list in one fell swoop, which can potentially use lots of memory. A generator expression, by contrast, returns its output one piece at a time.

For example, consider

sum([x*x for x in range(100000)])

In this code, sum is given one input, a list of integers. It iterates over the list of integers and sums them. But consider that before sum can run, the comprehension needs to finish creating the entire list of integers. This list can potentially be quite large and consume a great deal of memory.

By contrast, consider this code:

sum((x*x for x in range(100000)))

Here, the input to sum isn’t a list; it’s a generator, one that we created via our generator expression. sum will return precisely the same result as it did previously. However, whereas our first example created a list containing 100,000 elements, the latter uses much less memory. The generator returns one element at a time, waiting for sum to request the next item in line. In this way, we’re only consuming one integer’s worth of memory at a time, rather than a huge list of integers’ memory. The bottom line, then, is that you can use generator expressions almost anywhere you can use comprehensions, but you’ll use much less memory.
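One rough way to see the difference is with sys.getsizeof, as in this sketch. The exact numbers depend on your Python version and platform, and getsizeof reports only the size of the container itself, not of the integers it refers to, so treat the output as an illustration rather than a precise measurement.

```python
import sys

squares_list = [x*x for x in range(100000)]    # builds all 100,000 elements now
squares_gen = (x*x for x in range(100000))     # builds nothing until asked

print(sys.getsizeof(squares_list))    # grows with the number of elements
print(sys.getsizeof(squares_gen))     # small, regardless of the range's size

list_total = sum(squares_list)
gen_total = sum(squares_gen)
print(list_total == gen_total)

# One trade-off: a generator can be consumed only once, so it's now exhausted
print(sum(squares_gen))
```

That last line hints at why I say "almost anywhere": if you need to iterate over the results more than once, or index into them, you'll want a list.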

It turns out that when we put a generator expression in a function call, we can remove the inner parentheses:

sum(x*x for x in range(100000))

And thus, here’s the syntax that you saw in the solution to this exercise, but using a generator expression:

numbers = range(15)

print(','.join(str(number)
                for number in numbers))

Solution

def join_numbers(numbers):
    return ','.join(str(number)               # Applies str to each number, producing a string
                    for number in numbers)    # Iterates over the elements of numbers

print(join_numbers(range(15)))

You can work through a version of this code in the Python Tutor at http://mng.bz/zj4w.

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Here are a few ways you might want to go beyond this exercise, and push yourself to use list comprehensions in new ways:

  • As in the exercise, take a list of integers and turn them into strings. However, you’ll only want to produce strings for integers between 0 and 10. Doing this will require understanding the if clause in list comprehensions as well.

  • Given a list of strings containing hexadecimal numbers, sum the numbers together.

  • Use a list comprehension to reverse the word order of lines in a text file. That is, if the first line is abc def and the second line is ghi jkl, then you should return the list ['def abc', 'jkl ghi'].

map, filter, and comprehensions

Comprehensions, at their heart, do two different things. First, they transform one sequence into another, applying an expression on each element of the input sequence. Second, they filter out elements from the output. Here’s an example:

[x*x                   # x squared
 for x in range(10)    # For each number from 0-9
 if x%2 == 0]          # But only if x is even

The first line is where the transformation takes place, and the third line is where the filtering takes place. Before Python’s comprehensions, these features were traditionally implemented using two functions: map and filter. Indeed, these functions continue to exist in Python, even if they’re not used all that often.

map takes two arguments: a function and an iterable. It applies the function to each element of the iterable, returning a new iterable; for example

words = 'this is a bunch of words'.split()    # Creates a list of strings, and assigns it to words
x = map(len, words)                           # Applies len to each word, resulting in an iterable of integers
print(sum(x))                                 # Uses the sum function on x

Notice that map always returns an iterable that has the same length as its input. That’s because it doesn’t have a way to remove elements. It applies its input function once per input element. We can thus say that map transforms but doesn’t filter.

The function passed to map can be any function or method that takes a single argument. You can use built-in functions or write your own. The key thing to remember is that it’s the output from the function that’s placed in the output iterable.
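Here's a small sketch showing map with a built-in function, a method, and a function of our own. The first_and_last helper is hypothetical, written just for this example. I wrap each call in list so that we can see the results all at once.

```python
def first_and_last(one_word):
    """Hypothetical helper: returns a word's first and last letters."""
    return one_word[0] + one_word[-1]


words = 'this is a bunch of words'.split()

lengths = list(map(len, words))              # built-in function
shouted = list(map(str.upper, words))        # method, taking the string as its argument
edges = list(map(first_and_last, words))     # our own function

print(lengths)
print(shouted)
print(edges)
```

In each case the output has six elements, one for each of the six input words.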

filter also takes two arguments, a function and an iterable, and it applies the function to each element. But here, the output of the function determines whether the element will appear in the output--it doesn’t transform the element at all; for example

words = 'this is a bunch of words'.split()     # Creates a list of strings, and assigns it to words

def is_a_long_word(one_word):                  # Returns True or False, based on the word passed to it
    return len(one_word) > 4

x = filter(is_a_long_word, words)              # Applies our function to each word in words
print(' '.join(x))                             # Shows the words that passed through the filter

While the function passed to filter doesn’t have to return a True or False value, its result will be interpreted as a Boolean and used to determine if the element is put into the output sequence. So it’s usually a good idea to pass a function that returns a True or False.
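To see that Boolean interpretation at work, consider this sketch, which passes len (which returns an integer) to filter. It also shows that passing None instead of a function keeps the elements that are themselves truthy. The sample data is made up for illustration.

```python
words = ['this', '', 'is', '', 'a', 'bunch']

# len returns an integer; 0 is treated as False, and any other number as True
nonempty = list(filter(len, words))
print(nonempty)

# With None as the function, each element's own truth value is used
values = [0, 1, '', 'abc', [], [10], None]
truthy = list(filter(None, values))
print(truthy)
```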

The combination of map and filter means that you can take an iterable, filter its elements, then apply a function to each of its elements. This turns out to be extremely useful and explains why map and filter have been around for so long--about 50 years, in fact.

The fact that functions can be passed as arguments is central to the ability of both map and filter to even execute. That’s one reason why these techniques are a core part of functional programming, because they require that functions can be treated as data.

That said, comprehensions are considered to be the modern way to do this kind of thing in Python. Whereas we pass functions to map and filter, we pass expressions to comprehensions.
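As a side-by-side sketch, here's the same transformation and filtering written both ways. The variable names are my own, and I've used lambda to keep the map and filter version short.

```python
words = 'this is a bunch of words'.split()

# With map and filter, we pass functions
with_functions = list(map(len,
                          filter(lambda one_word: len(one_word) > 4,
                                 words)))

# With a comprehension, we pass expressions
with_comprehension = [len(one_word)
                      for one_word in words
                      if len(one_word) > 4]

print(with_functions)
print(with_comprehension)
```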

Why, then, do map and filter continue to exist in the language, if comprehensions are considered to be better? Partly for nostalgic and historical reasons, but also because they can sometimes do things you can’t easily do with comprehensions. For example, map can take multiple iterables in its input and then apply functions that will work with each of them:

import operator                           # We'll use operator.mul as our map function
letters = 'abcd'                          # Sets up a four-element string
numbers = range(1,5)                      # Sets up a four-element integer range

x = map(operator.mul, letters, numbers)   # Applies operator.mul (multiply) to the corresponding elements of letters and numbers
print(' '.join(x))                        # Joins the strings together with spaces and prints the result

This code prints the following:

a bb ccc dddd

Using a comprehension, we could rewrite the code as

import operator
letters = 'abcd'
numbers = range(1,5)
 
print(' '.join(operator.mul(one_letter, one_number)
               for one_letter, one_number in zip(letters, numbers)))

Notice that to iterate over both letters and numbers at the same time, I had to use zip here. By contrast, map can simply take additional iterable arguments.

What is an expression?

An expression is anything in Python that returns a value. If that seems a bit abstract to you, then you can just think of an expression as anything you can assign to a variable, or return from a function. So 5 is an expression, as is 5+3, as is len('abcd').

When I say that comprehensions use expressions, rather than functions, I mean that we don’t pass a function. Rather, we just pass the thing that we want Python to evaluate, akin to passing the body of the function without passing the formal function definition.

Exercise 29 Add numbers

In the previous exercise, we took a sequence of numbers and turned it into a sequence of strings. This time, we’ll do the opposite--take a sequence of strings, turn them into numbers, and then sum them. But we’re going to make it a bit more complicated, because we’re going to filter out those strings that can’t be turned into integers.

Our function (sum_numbers) will take a string as an argument; for example

10 abc 20 de44 30 55fg 40

Given that input, our function should return 100. That’s because the function will ignore any word that contains nondigits.

Ask the user to enter integers, all at once, using input (http://mng.bz/wB27).

Working it out

In this exercise, we’re given a string, which we assume contains integers separated by spaces. We want to grab the individual integers from the string and then sum them together. The easiest way to do this is to invoke str.split on the string, which returns a list of strings. By invoking str.split without any parameters, we tell Python that any combination of whitespace should be used as a delimiter.

Now we have a list of strings, rather than a list of integers. What we need to do is iterate over the strings, turning each one into an integer by invoking int on it. The easiest way to turn a list of strings into a list of integers is to use a list comprehension, as in the solution code. In theory, we could then invoke the built-in sum function on the list of integers, and we would be done.

But there’s a catch. It’s possible that the user’s input includes elements that can’t be turned into integers. We need to get rid of those; if we try to run int on the string abcd, the program will exit with an error.

Fortunately, list comprehensions can help us here too. We can use the third (filtering) line of the comprehension to indicate that only those strings that can be turned into numbers will pass through to the first line. We do this with an if clause, applying the str.isdigit method to find out whether we can successfully turn the word into an integer.
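It can be worth checking what str.isdigit accepts and rejects before relying on it as a filter. The following sketch tries it on a handful of sample strings that I chose for illustration. Note that strings such as '-5' and '3.5' are rejected, so this approach ignores negative numbers and floats as well as words.

```python
candidates = ['10', 'abc', 'de44', '55fg', '-5', '3.5', '']

results = {one_word: one_word.isdigit()
           for one_word in candidates}
print(results)

# Without the filter, int raises an exception on a nonnumeric string
try:
    int('abcd')
except ValueError as e:
    print(f'ValueError: {e}')
```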

We then invoke sum on the generator expression, which returns an integer. Finally, we print that result.

Solution

def sum_numbers(numbers):
    return sum(int(number)                      # Creates an integer based on number
               for number in numbers.split()    # Iterates through each of the words in numbers
               if number.isdigit())             # Ignores words that can't be turned into integers

print(sum_numbers('1 2 3 a b c 4'))

You can work through a version of this code in the Python Tutor at http://mng.bz/046p.

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

One of the most common uses for list comprehensions, at least in my experience, is for doing this combination of transformation and filtering. Here are a few additional exercises you could do to ensure that you understand not just the syntax, but also their potential:

  • Show the lines of a text file that contain at least one vowel and contain more than 20 characters.

  • In the United States, phone numbers have 10 digits--a three-digit area code, followed by a seven-digit number. Several times during my childhood, area codes would run out of phone numbers, forcing half of the population to get a new area code. After such a split, XXX-YYY-ZZZZ might remain XXX-YYY-ZZZZ, or it might become NNN-YYY-ZZZZ, with NNN being the new area code. The decision regarding which numbers remained and which changed was often made based on the phone numbers’ final seven digits. Use a list comprehension to return a new list of strings, in which any phone number whose YYY begins with the digits 0-5 will have its area code changed to XXX+1. For example, given the list of strings ['123-456-7890', '123-333-4444', '123-777-8888'], we want to convert them to ['124-456-7890', '124-333-4444', '123-777-8888']; the last number is unchanged because its YYY begins with 7.

  • Define a list of five dicts. Each dict will have two key-value pairs, name and age, containing a person’s name and age (in years). Use a list comprehension to produce a list of dicts in which each dict contains three key-value pairs: the original name, the original age, and a third age_in_months key, containing the person’s age in months. However, the output should exclude any of the input dicts representing people over 20 years of age.
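If you get stuck on the last of these, here's one possible approach, offered as a sketch rather than as the only answer. The names and ages are invented, and I'm reading "over 20" to mean that a 20-year-old stays in the output.

```python
people = [{'name': 'Ann', 'age': 12},
          {'name': 'Ben', 'age': 35},
          {'name': 'Cal', 'age': 20},
          {'name': 'Dee', 'age': 7},
          {'name': 'Eli', 'age': 41}]

young_people = [{'name': one_person['name'],                # transformation: build a new dict
                 'age': one_person['age'],
                 'age_in_months': one_person['age'] * 12}
                for one_person in people
                if one_person['age'] <= 20]                 # filtering: exclude anyone over 20

print(young_people)
```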

Exercise 30 Flatten a list

It’s pretty common to use complex data structures to store information in Python. Sure, we could create a new class, but why do that when we can just use combinations of lists, tuples, and dicts? This means, though, that you’ll sometimes need to unravel those complex data structures, turning them into simpler ones.

In this exercise, we’ll practice doing such unraveling. Write a function that takes a list of lists (just one element deep) and returns a flat, one-dimensional version of the list. Thus, invoking

flatten([[1,2], [3,4]])

will return

[1,2,3,4]

Note that there are several possible solutions to this problem; I’m asking you to solve it with list comprehensions. Also note that we only need to worry about flattening a two-level list.

Working it out

As we’ve seen, list comprehensions allow us to evaluate an expression on each element of an iterable. But in a normal list comprehension, you can’t return more elements than were in the input iterable. If the input iterable has 10 elements, for example, you can only return 10, or fewer than 10 if you use an if clause.

Nested list comprehensions change this a bit, in that the result may contain as many elements as there are sub-elements of the input iterable. Given a list of lists, the first for loop will iterate over every element in mylist. But the second for loop will iterate over the elements of the inner list. We can produce one output element for each inner input element, and that’s what we do:

def flatten(mylist):
    return [one_element
            for one_sublist in mylist
            for one_element in one_sublist]
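For comparison, here's a sketch of two of the other possible solutions: the same logic written as traditional nested for loops, and a version using itertools.chain.from_iterable from the standard library. The function names are my own, and neither version is what the exercise asks for; they're here only to show what the comprehension is doing.

```python
from itertools import chain


def flatten_with_loops(mylist):
    # The same two for clauses, in the same order, as nested loops
    output = []
    for one_sublist in mylist:
        for one_element in one_sublist:
            output.append(one_element)
    return output


def flatten_with_chain(mylist):
    # chain.from_iterable yields the elements of each sublist in turn
    return list(chain.from_iterable(mylist))


print(flatten_with_loops([[1, 2], [3, 4]]))
print(flatten_with_chain([[1, 2], [3, 4]]))
```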

Solution

def flatten(mylist):
    return [one_element
            for one_sublist in mylist           # Iterates through each element of mylist
            for one_element in one_sublist]     # Iterates through each element of one_sublist

print(flatten([[1,2], [3,4]]))

You can work through a version of this code in the Python Tutor at http://mng.bz/jg4P.

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Nested list comprehensions can be a bit daunting at first, but they can be quite helpful in many circumstances. Here are some exercises you can try to improve your understanding of how to use them:

  • Write a version of the flatten function mentioned earlier called flatten_odd_ints. It’ll do the same thing as flatten, but the output will only contain odd integers. Inputs that are neither odd nor integers should be excluded. Inputs containing strings that could be converted to integers should be converted; other strings should be excluded.

  • Define a dict that represents the children and grandchildren in a family. (See figure 7.1 for a graphic representation.) Each key will be a child’s name, and each value will be a list of strings representing their children (i.e., the family’s grandchildren). Thus the dict {'A':['B', 'C', 'D'], 'E':['F', 'G']} means that A and E are siblings; A’s children are B, C, and D; and E’s children are F and G. Use a list comprehension to create a list of the grandchildren’s names.

  • Redo this exercise, but replace each grandchild’s name (currently a string) with a dict. Each dict will contain two name-value pairs, name and age. Produce a list of the grandchildren’s names, sorted by age, from eldest to youngest.

Figure 7.1 Graph of the family for nested list comprehensions

Exercise 31 Pig Latin translation of a file

List comprehensions are great when you want to transform a list. But they can actually work on any iterable--that is, any Python object on which you can run a for loop. This means that the source data for a list comprehension can be a string, list, tuple, dict, set, or even a file.

In this exercise, I want you to write a function that takes a filename as an argument. It returns a string with the file’s contents, but with each word translated into Pig Latin, as per our plword function in chapter 2 on “strings.” The returned translation can ignore newlines and isn’t required to handle capitalization and punctuation in any specific way.

Working it out

We’ve seen that nested list comprehensions can be used to iterate over complex data structures. In this case, we’re iterating over a file. And indeed, we could iterate over each line of the file.

But we can break the problem down further, using a nested list comprehension to first iterate over each line of the file, and then over each word within the current line. Our plword function can then operate on a single word at a time.

I realize that nested list comprehensions can be hard, at least at first, to read and understand. But as you use them, you’ll likely find that they allow you to elegantly break down a problem into its components.

There is a bit of a problem with what we’ve done here, but it might not seem obvious at first. List comprehensions, by their very nature, produce lists. This means that if we translate a large file into Pig Latin, we might find ourselves with a very long list. It would be better to return an iterator object that would save memory, only calculating the minimum necessary for each iteration.

It turns out that doing so is quite easy. We can use a generator expression (as suggested in this chapter’s first exercise), which looks almost precisely like a list comprehension, but using round parentheses rather than square brackets. We can put a generator expression in a call to str.join, just as we could put in a list comprehension, saving memory in the process.

Here’s how that code would look:

def plfile(filename):
    return ' '.join((plword(one_word)
                     for one_line in open(filename)
                     for one_word in one_line.split()))

But wait--it turns out that if you have a generator expression inside a function call, you don’t actually need both sets of parentheses. You can leave one out, which means the code will look like this:

def plfile(filename):
    return ' '.join(plword(one_word)
                    for one_line in open(filename)
                    for one_word in one_line.split())

We’ve now not only accomplished our original task, we’ve done so using less memory than a list comprehension requires. There might be a slight trade-off in terms of speed, but this is usually considered worthwhile, given the potential problems you’ll encounter reading a huge file into memory all at once.
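To get a rough feel for the memory difference, here is a minimal sketch that compares the size of a list built by a list comprehension with the size of the equivalent generator object. The range of 100,000 integers is only an illustrative stand-in for a large file, and the exact numbers reported by sys.getsizeof vary by Python version and platform.

```python
import sys

# A list comprehension builds every element up front
squares_list = [x * x for x in range(100_000)]

# A generator expression produces its elements on demand
squares_gen = (x * x for x in range(100_000))

list_size = sys.getsizeof(squares_list)   # size of the list object itself
gen_size = sys.getsizeof(squares_gen)     # size of the generator object

print(f'list: {list_size} bytes, generator: {gen_size} bytes')

# Both produce the same values once they are consumed
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```

Note that sys.getsizeof measures only the container, not the integers it refers to, so the real gap is larger than the printed figures suggest.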

Solution

def plword(word):
    if word[0] in 'aeiou':
        return word + 'way'
 
    return word[1:] + word[0] + 'ay'
 
 
def plfile(filename):
    return ' '.join(plword(one_word)
                    for one_line in open(filename)       # Iterates through each line of filename
                    for one_word in one_line.split())    # Iterates through each word in the current line

You can work through a version of this code in the Python Tutor at http://mng.bz/K2xP.

Note that because the Python Tutor doesn’t support working with external files, I used an instance of StringIO to simulate a file.
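If you want to try the same trick yourself, here is a minimal sketch of how StringIO can stand in for a file. The function name plfile_obj and the sample text are my own assumptions for illustration; this variant takes an already open file-like object rather than a filename.

```python
from io import StringIO


def plword(word):
    if word[0] in 'aeiou':
        return word + 'way'

    return word[1:] + word[0] + 'ay'


def plfile_obj(file_obj):
    # Hypothetical variant of plfile: iterates over any file-like object
    return ' '.join(plword(one_word)
                    for one_line in file_obj
                    for one_word in one_line.split())


# StringIO behaves like a file opened for reading, one line per iteration
fake_file = StringIO('this is a test\nof pig latin\n')
result = plfile_obj(fake_file)
print(result)   # histay isway away esttay ofway igpay atinlay
```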

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Whenever you’re transforming and/or filtering complex or nested data structures, or (as in the case of a file) something that can be treated as a nested data structure, it’s often useful to use a nested list comprehension:

  • In this exercise, plfile applied the plword function to every word in a file. Write a new function, funcfile, that will take two arguments--a filename and a function. The output from the function should be a string, the result of invoking the function on each word in the text file. You can think of this as a generic version of plfile, one that can return any string value.

  • Use a nested list comprehension to transform a list of dicts into a list of two-element (name-value) tuples, each of which represents one of the name-value pairs in one of the dicts. If more than one dict has the same name-value pair, then the tuple should appear twice.

  • Assume that you have a list of dicts, in which each dict contains two name-value pairs: name and hobbies, where name is the person’s name and hobbies is a set of strings representing the person’s hobbies. What are the three most popular hobbies among the people listed in the dicts?

Exercise 32 Flip a dict

The combination of comprehensions and dicts can be quite powerful. You might want to modify an existing dict, removing or modifying certain elements. For example, you might want to remove all users whose ID number is lower than 500. Or you might want to find the user IDs of all users whose names begin with the letter “A”.
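Both of those tasks can be sketched with a comprehension over dict.items. The users dict here is invented sample data, with usernames as keys and ID numbers as values.

```python
# Invented sample data: usernames mapped to user ID numbers
users = {'root': 0, 'daemon': 1, 'Alice': 501, 'Bob': 502, 'Anna': 1000}

# Keep only the users whose ID number is 500 or higher
regular_users = {name: uid
                 for name, uid in users.items()
                 if uid >= 500}

# Collect the IDs of users whose names begin with "A"
a_user_ids = [uid
              for name, uid in users.items()
              if name.startswith('A')]

print(regular_users)   # {'Alice': 501, 'Bob': 502, 'Anna': 1000}
print(a_user_ids)      # [501, 1000]
```

Notice that neither comprehension modifies users; each one builds a new object from it.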

It’s also not uncommon to want to flip a dict--that is, to exchange its keys and values. Imagine a dict in which the keys are usernames and the values are user ID numbers; it might be useful to flip that so that you can search by ID number.

For this exercise, first create a dict of any size, in which the keys are unique and the values are also unique. (A key may appear as a value, or vice versa.) Here’s an example:

d = {'a':1, 'b':2, 'c':3}

Turn the dict inside out, such that the keys and the values are reversed.

Working it out

Just as list comprehensions provide an easy way to create lists based on another iterable, dict comprehensions provide an easy way to create a dict based on an iterable. The syntax is as follows:

{ KEY : VALUE
  for ITEM in ITERABLE }

In other words

  • The source for our dict comprehension is an iterable--typically a string, list, tuple, dict, set, or file.

  • We iterate over each such item in a for loop.

  • For each item, we then output a key-value pair.

Notice that a colon (:) separates the key from the value. That colon is part of the syntax, which means that the expressions on either side of the colon are evaluated separately and can’t share data.

In this particular case, we’re looping over the elements of a dict named d. We use the dict.items method to do so, which returns two values--the key and value--with each iteration. These two values are passed by parallel assignment to the variables key and value.

Another way of solving this exercise is to iterate over d, rather than over the output of d.items(). That would provide us with the keys, requiring that we retrieve each value:

{ d[key]:key for key in d }
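A quick sketch, using the example dict from the exercise, confirms that the two approaches produce the same result:

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Iterate over key-value pairs, unpacking each tuple
flipped_via_items = {value: key for key, value in d.items()}

# Iterate over the keys only, looking up each value
flipped_via_keys = {d[key]: key for key in d}

print(flipped_via_items)                       # {1: 'a', 2: 'b', 3: 'c'}
print(flipped_via_items == flipped_via_keys)   # True
```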

In a comprehension, I’m trying to create a new object based on an old one. It’s all about the values that are returned by the expression at the start of the comprehension. By contrast, for loops are about commands, and executing those commands.

Consider what your goal is, and whether you’re better served with a comprehension or a for loop; for example

  • Given a string, you want a list of the ord values for each character. This should be a list comprehension, because you’re creating a list based on a string, which is iterable.

  • You have a list of dicts, in which each dict contains your friends’ first and last names, and you want to insert this data into a database. In this case, you’ll use a regular for loop, because you’re interested in the side effects, not the return value.
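The two cases above might look like this in practice. The friends data and the insert_into_db function are hypothetical stand-ins; a real program would call its database library instead of appending to a list.

```python
# Case 1: we want a new list, so a comprehension fits
word = 'hello'
ord_values = [ord(one_char) for one_char in word]
print(ord_values)   # [104, 101, 108, 108, 111]

# Case 2: we want side effects, so a regular for loop fits
friends = [{'first': 'Ada', 'last': 'Lovelace'},
           {'first': 'Alan', 'last': 'Turing'}]

saved_rows = []   # stand-in for a database table


def insert_into_db(first, last):
    # Hypothetical helper; here it only records the row in memory
    saved_rows.append((first, last))


for one_friend in friends:
    insert_into_db(one_friend['first'], one_friend['last'])

print(saved_rows)   # [('Ada', 'Lovelace'), ('Alan', 'Turing')]
```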

Solution

def flipped_dict(a_dict):
    return {value: key
            for key, value in a_dict.items()}      
 
print(flipped_dict({'a':1, 'b':2, 'c':3}))

All iterables are acceptable in a comprehension, even those that return two-element tuples, such as dict.items.

You can work through this code in the Python Tutor at http://mng.bz/905x.

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Dict comprehensions provide us with a useful way to create new dicts. They’re typically used when you want to create a dict based on an iterable, such as a list or file. I’m especially partial to using them when I want to read from a file and turn the file’s contents into a dict. Here are some additional ideas for ways to practice the use of dict comprehensions:

  • Given a string containing several (space-separated) words, create a dict in which the keys are the words, and the values are the number of vowels in each word. If the string is “this is an easy test,” then the resulting dict would be {'this':1, 'is':1, 'an':1, 'easy':2, 'test':1}.

  • Create a dict whose keys are filenames and whose values are the lengths of the files. The input can be a list of files from os.listdir (http://mng.bz/YreB) or glob.glob (http://mng.bz/044N).

  • Find a configuration file in which the lines look like “name=value.” Use a dict comprehension to read from the file, turning each line into a key-value pair.

Exercise 33 Transform values

This exercise combines showing how you can receive a function as a function argument, and how comprehensions can help us to elegantly solve a wide variety of problems.

The built-in map (http://mng.bz/Ed2O) takes two arguments: (a) a function and (b) an iterable. It returns an iterator that produces the result of applying the function to each element of the input iterable. A full discussion of map is in the earlier sidebar, “map, filter, and comprehensions.”
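As a brief reminder of how map behaves in Python 3, here is a small sketch with a made-up list of words. The object that map returns is lazy and can be consumed only once, which is why it's passed to list here.

```python
words = ['this', 'is', 'a', 'test']

# map returns a lazy map object, not a list
lengths = map(len, words)
print(lengths)                  # something like <map object at 0x...>

first_pass = list(lengths)      # consumes the iterator
second_pass = list(lengths)     # nothing is left the second time

print(first_pass)               # [4, 2, 1, 4]
print(second_pass)              # []

# The equivalent list comprehension
lengths_comp = [len(one_word) for one_word in words]
print(lengths_comp == first_pass)   # True
```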

In this exercise, we’re going to create a slight variation on map, one that applies a function to each of the values of a dict. The result of invoking this function, transform_values, is a new dict whose keys are the same as the input dict, but whose values have been transformed by the function. (The name of the function comes from Ruby on Rails, which provides a function of the same name.) The function passed to transform_values should take a single argument, the dict’s value.

When your transform_values function works, you should be able to invoke it as follows:

d = {'a':1, 'b':2, 'c':3}
transform_values(lambda x: x*x, d)

The result of this call will be the following dict:

{'a': 1, 'b': 4, 'c': 9}

Working it out

The idea of transform_values is a simple one: you want to invoke a function repeatedly on the values of a dict. This means that you must iterate over the dict’s key-value pairs. For each pair, you want to invoke a user-supplied function on the value.

We know that functions can be passed as arguments, just like any other data types. In this case, we’re getting a function from the user so we can apply it. We apply functions with parentheses, so if we want to invoke the function func that the user passed to us, we simply say func(). Or in this case, since the function should take a single argument, we say func(value).

We can iterate over a dict’s key-value pairs with dict.items (http://mng.bz/4AeV), which returns an iterable view that provides, one by one, the dict’s key-value pairs. But that doesn’t solve the problem of how to take these key-value pairs and turn them back into a dict.

The easiest, fastest, and most Pythonic way to create a dict based on an existing iterable is a dict comprehension. The dict we return from transform_values will have the same keys as our input dict. But as we iterate over the key-value pairs, we invoke func(value), applying the user-supplied function to each value we get and using the output from that expression as our value. We don’t even need to worry about what type of value the user-supplied function will return, because dict values can be of any type.

Solution

def transform_values(func, a_dict):
    return {key: func(value)                     # Applies the user-supplied function to each value in the dict
            for key, value in a_dict.items()}    # Iterates through each key-value pair in the dict

d = {'a':1, 'b':2, 'c':3}
print(transform_values(lambda x: x*x, d))

You can work through a version of this code in the Python Tutor at http://mng.bz/jg2z.
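Because dict values can be of any type, the function you pass doesn't have to return numbers. This short sketch repeats the function so that it runs on its own, then passes it two other functions I chose for illustration:

```python
def transform_values(func, a_dict):
    return {key: func(value)
            for key, value in a_dict.items()}


d = {'a': 1, 'b': 2, 'c': 3}

# The built-in str turns each value into a string
as_strings = transform_values(str, d)
print(as_strings)   # {'a': '1', 'b': '2', 'c': '3'}

# A lambda can return a list just as easily
as_lists = transform_values(lambda x: [x] * x, d)
print(as_lists)     # {'a': [1], 'b': [2, 2], 'c': [3, 3, 3]}

# The input dict is left untouched
print(d)            # {'a': 1, 'b': 2, 'c': 3}
```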

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Dict comprehensions are a powerful tool in any Python developer’s arsenal. They allow us to create new dicts based on existing iterables. However, they can take some time to get used to, and to integrate into your development. Here are some additional exercises you can try to improve your understanding and use of dict comprehensions:

  • Expand the transform_values exercise, taking two function arguments, rather than just one. The first function argument will work as before, being applied to the value and producing output. The second function argument takes two arguments, a key and a value, and determines whether there will be any output at all. That is, the second function will return True or False and will allow us to selectively create a key-value pair in the output dict.

  • Use a dict comprehension to create a dict in which the keys are usernames and the values are (integer) user IDs, based on a Unix-style /etc/passwd file. Hint: in a typical /etc/passwd file, the usernames are the first field in a row (i.e., index 0), and the user IDs are the third field in a row (i.e., index 2). If you need to download a sample /etc/passwd file, you can get it from http://mng.bz/2XXg. Note that this sample file contains comment lines, meaning that you’ll need to remove them when creating your dict.

  • Write a function that takes a directory name (i.e., a string) as an argument. The function should return a dict in which the keys are the names of files in that directory, and the values are the file sizes. You can use os.listdir or glob.glob to get the files, but because only regular files have sizes, you’ll want to filter the results using methods from os.path. To determine the file size, you can use os.stat or (if you prefer) just check the length of the string resulting from reading the file.

Exercise 34 (Almost) supervocalic words

Part of the beauty of Python’s basic data structures is that they can be used to solve a wide variety of problems. But it can sometimes be a challenge, especially at first, to decide which of the data structures is appropriate, and which of their methods will help you to solve problems most easily. Often, it’s a combination of techniques that will provide the greatest help.

In this exercise, I want you to write a get_sv function that returns a set of all “supervocalic” words in a dictionary file. If you’ve never heard the term supervocalic before, you’re not alone: I only learned about such words several years ago. Simply put, such words contain all five vowels in English (a, e, i, o, and u), each of them appearing once and in alphabetical order.

For the purposes of this exercise, I’ll loosen the definition, accepting any word that has all five vowels, in any order and any number of times. Your function should find all of the words that match this definition (i.e., contain a, e, i, o, and u) and return a set containing them.

Your function should take a single argument: the name of a text file containing one word per line, as in a Unix/Linux dictionary file. If you don’t have such a “words” file, you can download one from here: http://mng.bz/D2Rw.

Working it out

Before we can create a set of supervocalic words, or read from a file, we need to find a way to determine if a word is supervocalic. (Again, this isn’t the precise, official definition.) One way would be to use in five times, once for each vowel. But this seems a bit extreme and inefficient.

What we can instead do is create a set from our word. After all, a string is a sequence, and we can always create a set from any sequence with the built-in set.

Fine, but how does that help us? If we already have a set of vowels, we can check to see if they’re all in the word with the < operator. Normally, < checks to see if one data point is less than another. But in the case of sets, it returns True if the set on the left is a proper subset of the set on the right--that is, a subset that isn’t equal to it. (Use <= if two equal sets should also count as a match.)

This means that, given the word “superlogical,” I can do the following:

vowels = {'a', 'e', 'i', 'o', 'u'}
word = 'superlogical'
 
if vowels < set(word):
    print('Yes, it is supervocalic!')
else:
    print('Nope, just a regular word')

This is good for one word. But how can we do it for many words in a file? The answer could be a list comprehension. After all, we can think of our file as an iterator, one that returns strings. If the words file contains one word per line, then iterating over the lines of the file really means iterating over the different words. If the set of vowels is a subset of the set based on the current word, then we’ll consider it to be supervocalic and will include the current word in the output list.

But we don’t want a list, we want a set! Fortunately, the difference between creating a list comprehension and a set comprehension is a pair of brackets. We use square brackets ([]) for a list comprehension and curly braces ({}) for a set comprehension. A comprehension with curly braces and a colon is a dict comprehension; without the colon, it’s a set comprehension.
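The three kinds of comprehension can be compared side by side. In this sketch, the words list is invented sample data; notice how the set version keeps only the distinct lengths.

```python
words = ['apple', 'kiwi', 'plum', 'apple']

# Square brackets: a list comprehension (keeps order and duplicates)
as_list = [len(one_word) for one_word in words]

# Curly braces, no colon: a set comprehension (distinct values only)
as_set = {len(one_word) for one_word in words}

# Curly braces with a colon: a dict comprehension
as_dict = {one_word: len(one_word) for one_word in words}

print(as_list)   # [5, 4, 4, 5]
print(as_set)    # {4, 5}
print(as_dict)   # {'apple': 5, 'kiwi': 4, 'plum': 4}

# Careful: empty curly braces create a dict, not a set
print(type({}))      # <class 'dict'>
print(type(set()))   # <class 'set'>
```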

To summarize

  • We iterate over the lines of the file.

  • We turn each word into a set and check that the vowels are a subset of our word’s letters.

  • If the word passes this test, we include it (the word) in the output.

  • The output is all put into a set, thanks to a set comprehension.

Using sets as the basis for textual comparisons might not seem obvious, at least at first. But it’s good to learn to think in these ways, taking advantage of Python’s data structures in ways you never considered before.
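One subtlety of the set comparison is worth seeing in action: on sets, < tests for a proper subset, so it returns False when the two sets are equal, whereas <= and the set.issubset method accept equal sets. The sample strings here are my own. When reading from a file, each line also includes a trailing newline character, so the set built from a line always contains at least one non-vowel.

```python
vowels = {'a', 'e', 'i', 'o', 'u'}

print(vowels < set('superlogical'))   # True: all five vowels, plus other letters
print(vowels < set('python'))         # False: most vowels are missing

# Equal sets are not proper subsets of one another
print(vowels < set('aeiou'))          # False
print(vowels <= set('aeiou'))         # True

# issubset accepts any iterable, so no call to set() is needed
print(vowels.issubset('aeiou'))       # True

# A line read from a file carries its newline along
print(vowels < set('aeiou\n'))        # True
```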

Solution

def get_sv(filename):
    vowels = {'a', 'e', 'i', 'o', 'u'}         # Creates a set of the vowels

    return {word.strip()                       # Returns the word, without any whitespace on either side
            for word in open(filename)         # Iterates through each line in “filename”
            if vowels < set(word.lower())}     # Does this word contain all of the vowels?

You can work through a version of this code in the Python Tutor at http://mng.bz/lG18. Note that because the Python Tutor doesn’t support working with external files, I used an instance of StringIO to simulate a file.
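Here is a minimal sketch of that StringIO approach, in case you want to experiment without a words file. The name get_sv_obj and the five sample words are assumptions of mine; the function takes a file-like object instead of a filename.

```python
from io import StringIO


def get_sv_obj(file_obj):
    # Hypothetical variant of get_sv that accepts any file-like object
    vowels = {'a', 'e', 'i', 'o', 'u'}

    return {word.strip()
            for word in file_obj
            if vowels < set(word.lower())}


fake_words = StringIO('education\npython\nSequoia\nfacetious\nrhythm\n')
result = get_sv_obj(fake_words)
print(sorted(result))   # ['Sequoia', 'education', 'facetious']
```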

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Set comprehensions are great in a variety of circumstances, including when you have inputs and you want to crunch them down to only have the distinct (unique) elements. Here are some additional ways for you to use and practice your set-comprehension chops:

  • In the /etc/passwd file you used earlier, what different shells (i.e., command interpreters, named in the final field on each line) are assigned to users? Use a set comprehension to gather them.

  • Given a text file, what are the lengths of the different words? Return a set of different word lengths in the file.

  • Create a list whose elements are strings--the names of people in your family. Now use a set comprehension (and, better yet, a nested set comprehension) to find which letters are used in your family members’ names.

Exercise 35a Gematria, part 1

In this exercise, we’re going to again try something that sits at the intersection of strings and comprehensions. This time, it’s dict comprehensions.

When you were little, you might have created or used a “secret” code in which a was 1, b was 2, c was 3, and so forth, until z (which was 26). This type of code happens to be quite ancient and was used by a number of different groups more than 2,000 years ago. “Gematria,” (http://mng.bz/B2R8) as it is known in Hebrew, is the way in which biblical verses have long been numbered. And of course, it’s not even worth describing it as a secret code, despite what you might have thought while little.

This exercise, the result of which you’ll use in the next one, asks that you create a dict whose keys are the (lowercase) letters of the English alphabet, and whose values are the numbers ranging from 1 to 26. And yes, you could simply type {'a':1, 'b':2, 'c':3} and so forth, but I’d like you to do this with a dict comprehension.

Working it out

The solution uses a number of different aspects of Python, combining them to create a dict with a minimum of code.

First, we want to create a dict, and thus turn to a dict comprehension. Our keys are going to be the lowercase letters of the English alphabet, and the values are going to be the numbers from 1 to 26.

We could create the string of lowercase letters. But, rather than doing that ourselves, we can rely on the string module, and its string.ascii_lowercase attribute, which comes in handy in such situations.

But how can we number the letters? We can use the enumerate built-in iterator, which will number our characters one at a time. We can then catch the iterated tuples via unpacking, grabbing the index and character separately:

{char:index
 for index, char in enumerate(string.ascii_lowercase)}

The only problem with doing this is that enumerate starts counting at 0, and we want to start counting at 1. We could, of course, just add 1 to the value of index. However, we can do even better than that by asking enumerate to start counting at 1, and we do so by passing 1 to it as the second argument:

{char:index
 for index, char in enumerate(string.ascii_lowercase, 1)}

And, sure enough, this produces the dict that we want. We’ll use it in the next exercise.
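If you'd like to see the pieces separately, this short sketch shows what string.ascii_lowercase contains and how the second argument to enumerate changes the numbering:

```python
import string

print(string.ascii_lowercase)       # abcdefghijklmnopqrstuvwxyz

# By default, enumerate starts counting at 0
print(list(enumerate('abc')))       # [(0, 'a'), (1, 'b'), (2, 'c')]

# Passing 1 as the second argument starts the count at 1
print(list(enumerate('abc', 1)))    # [(1, 'a'), (2, 'b'), (3, 'c')]

gematria = {char: index
            for index, char in enumerate(string.ascii_lowercase, 1)}

print(gematria['a'], gematria['z'], len(gematria))   # 1 26 26
```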

Solution

import string

def gematria_dict():
    return {char: index                                # Returns the key-value pair, with the character and an integer
            for index, char
            in enumerate(string.ascii_lowercase, 1)}   # Iterates over lowercase letters with enumerate

print(gematria_dict())

You can work through a version of this code in the Python Tutor at http://mng.bz/WPx4.

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Dicts are also known as key-value pairs, for the simple reason that they contain keys and values--and because associations between two different types of data are extremely common in programming contexts. Often, if you can get your data into a dict, it becomes easier to work with and manipulate. For that reason, it’s important to know how to get information into a dict from a variety of different formats and sources. Here are some additional exercises to practice doing so:

  • Many programs’ functionality is modified via configuration files, which are often set using name-value pairs. That is, each line of the file contains text in the form of name=value, where the = sign separates the name from the value. I’ve prepared one such sample config file at http://mng.bz/rryD. Download this file, and then use a dict comprehension to read its contents from disk, turning it into a dict describing a user’s preferences. Note that all of the values will be strings.

  • Create a dict based on the config file, as in the previous exercise, but this time, all of the values should be integers. This means that you’ll need to filter out (and ignore) those values that can’t be turned into integers.

  • It’s sometimes useful to transform data from one format into another. Download a JSON-formatted list of the 1,000 largest cities in the United States from http://mng.bz/Vgd0. Using a dict comprehension, turn it into a dict in which the keys are the city names, and the values are the populations of those cities. Why are there only 925 key-value pairs in this dict? Now create a new dict, but set each key to be a tuple containing the state and city. Does that ensure there will be 1,000 key-value pairs?
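As a starting point for the first two config-file exercises, here is one possible sketch. It reads from a StringIO object with invented contents rather than the downloadable sample file, and it uses str.isdigit as a simple (and imperfect) test for integer values; negative numbers, for instance, would be skipped.

```python
from io import StringIO

# Invented config text standing in for a real file on disk
sample_config = StringIO('name=reuven\nlanguage=python\nfont_size=12\n')

# Split each line on the first "=" only, in case a value contains one
config = {one_line.split('=', 1)[0].strip(): one_line.split('=', 1)[1].strip()
          for one_line in sample_config
          if '=' in one_line}

print(config)   # {'name': 'reuven', 'language': 'python', 'font_size': '12'}

# Keep only the values that look like non-negative integers
int_config = {name: int(value)
              for name, value in config.items()
              if value.isdigit()}

print(int_config)   # {'font_size': 12}
```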

Exercise 35b Gematria, part 2

In the previous exercise, you created a dict that allows you to get the numeric value from any lowercase letter. As you can imagine, we can use this dict not only to find the numeric value for a single letter, but to sum the values from the letters in a word, thus getting the word’s “value.” One of the games that Jewish mystics enjoy playing (although they would probably be horrified to hear me describe it as a game) is to find words with the same gematria value. If two words have the same gematria value, then they’re linked in some way.

In this exercise, you’ll write two functions:

  • gematria_for, which takes a single word (string) as an argument and returns the gematria score for that word

  • gematria_equal_words, which takes a single word and returns a list of those dictionary words whose gematria scores match the current word’s score.

For example, if the function is called with the word cat, with a gematria value of 24 (3 + 1 + 20), then the function will return a list of strings, all of whose gematria values are also 24. (This will be a long list!) Any nonlowercase characters in the user’s input should count 0 toward our final score for the word. Your source for the dictionary words will be the Unix file you used earlier in this chapter, which you can load into a list comprehension.

Working it out

This solution combines a large number of techniques that we’ve discussed so far in this book, and that you’re likely to use in your Python programming work. (However, I do hope that you’re not doing too many gematria calculations.)

First, how do we calculate the gematria score for a word, given our gematria dict? We want to iterate through each letter in a word, grabbing the score from the dict. And if the letter isn’t in the dict, we’ll give it a value of 0.

The standard way to do this would be with a for loop, using dict.get:

total = 0
for one_letter in word:
    total += gematria.get(one_letter, 0)

And there’s nothing wrong with this, per se. But comprehensions are usually your best bet when you’re starting with one iterable and trying to produce another iterable. And in this case, we can iterate over the letters in our word in a generator expression (a list comprehension would also work, at the cost of building the list), invoking sum on the integers that result:

def gematria_for(word):
    return sum(gematria.get(one_char,0)
                for one_char in word)
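To check that the loop and the call to sum agree, here is a self-contained sketch. It builds the gematria dict inline, and the name gematria_for_loop is just my label for the loop-based version.

```python
import string

gematria = {char: index
            for index, char in enumerate(string.ascii_lowercase, 1)}


def gematria_for_loop(word):
    # The for-loop version, wrapped in a function for comparison
    total = 0
    for one_letter in word:
        total += gematria.get(one_letter, 0)
    return total


def gematria_for(word):
    return sum(gematria.get(one_char, 0)
               for one_char in word)


print(gematria_for_loop('cat'))   # 24
print(gematria_for('cat'))        # 24

# Characters that aren't lowercase letters count as 0
print(gematria_for('Cat!'))       # 21
```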

Once we can calculate the gematria for one word, we need to find all of the dictionary words that are equivalent to it. We can do that, once again, with a list comprehension--this time, using the if clause to filter out those words whose gematria isn’t equal:

def gematria_equal_words(input_word):
    our_score = gematria_for(input_word.lower())
    return [one_word.strip()
            for one_word in open('/usr/share/dict/words')
            if gematria_for(one_word.lower()) == our_score]

As you can see, we’re forcing the words to be in lowercase. But we do that only for the comparison in the if clause; the output expression on the first line of our comprehension strips whitespace but otherwise returns the word as it appears in the file.

Meanwhile, we’re iterating over each of the words in the dictionary file. Each word in that file ends with a newline, which doesn’t affect our gematria score but isn’t something we want to return to the user in our list comprehension.

Finally, this exercise demonstrates that when you’re using a comprehension, and your output expression is a complex one, it’s often a good idea to create a separate function that you can repeatedly call.

Solution

import string

def gematria_dict():
    return {char: index
            for index, char
            in enumerate(string.ascii_lowercase, 1)}

GEMATRIA = gematria_dict()

def gematria_for(word):
    return sum(GEMATRIA.get(one_char, 0)   # Gets the value for the current character, or 0 if it isn’t in the “GEMATRIA” dict
               for one_char in word)       # Iterates over the characters in “word”


def gematria_equal_words(input_word):
    our_score = gematria_for(input_word.lower())                # Gets the total score for the input word
    return [one_word.strip()                                    # Removes leading and trailing whitespace from “one_word”
            for one_word in open('/usr/share/dict/words')       # Iterates over each word in the English-language dictionary file
            if gematria_for(one_word.lower()) == our_score]     # Only adds the current word if its gematria score matches ours

Note: there is no Python Tutor link for this exercise, because it uses an external file.
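If you don't have a words file handy, you can still try the logic on in-memory data. In this sketch, gematria_equal_words_in is a hypothetical variant that takes an iterable of lines as a second argument, and the five sample lines are invented; each ends with a newline, just as lines read from a file would.

```python
import string

GEMATRIA = {char: index
            for index, char in enumerate(string.ascii_lowercase, 1)}


def gematria_for(word):
    return sum(GEMATRIA.get(one_char, 0)
               for one_char in word)


def gematria_equal_words_in(input_word, lines):
    # Hypothetical variant: "lines" can be a list, a file, or a StringIO
    our_score = gematria_for(input_word.lower())
    return [one_word.strip()
            for one_word in lines
            if gematria_for(one_word.lower()) == our_score]


sample_lines = ['act\n', 'bat\n', 'Cat\n', 'dog\n', 'x\n']
matches = gematria_equal_words_in('cat', sample_lines)
print(matches)   # ['act', 'Cat', 'x']
```

The single letter x matches because it is the 24th letter of the alphabet, the same score as c + a + t.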

Screencast solution

Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.

Beyond the exercise

Once you have data in a dict, you can often use a comprehension to transform it in various ways. Here are some additional exercises you can use to sharpen your skills with dicts and dict comprehensions:

  • Create a dict whose keys are city names, and whose values are temperatures in Fahrenheit. Now use a dict comprehension to transform this dict into a new one, keeping the old keys but turning the values into the temperature in degrees Celsius.

  • Create a list of tuples in which each tuple contains three elements: (1) the author’s first and last names, (2) the book’s title, and (3) the book’s price in U.S. dollars. Use a dict comprehension to turn this into a dict whose keys are the book’s titles, with the values being another (sub-)dict, with keys for (a) the author’s first name, (b) the author’s last name, and (c) the book’s price in U.S. dollars.

  • Create a dict whose keys are currency names and whose values are the price of that currency in U.S. dollars. Write a function that asks the user what currency they use, then returns the dict from the previous exercise as before, but with its prices converted into the requested currency.
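To get you started on the first of these, here is one possible sketch of the temperature conversion. The cities and readings are invented, and the formula is the standard one: subtract 32, then multiply by 5/9.

```python
# Invented sample data: city names mapped to degrees Fahrenheit
temps_f = {'Boston': 32, 'Miami': 86, 'Phoenix': 104}

# Same keys, with each value converted to degrees Celsius
temps_c = {city: (fahrenheit - 32) * 5 / 9
           for city, fahrenheit in temps_f.items()}

print(temps_c)   # {'Boston': 0.0, 'Miami': 30.0, 'Phoenix': 40.0}
```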

Summary

Comprehensions are, without a doubt, one of the most difficult topics for people to learn when they start using Python. The syntax is a bit weird, and it’s not even obvious where and when to use comprehensions. In this chapter, you saw many examples of how and when to use comprehensions, which will hopefully help you not only to use them, but also to see opportunities to do so.
