Programmers are always trying to do more with less code, while simultaneously making that code more reliable and easier to debug. And indeed, computer scientists have developed a number of techniques, each meant to bring us closer to that goal of short, reliable, maintainable, powerful code.
One set of techniques is known as functional programming. It aims to make programs more reliable by keeping functions short and data immutable. I think most developers would agree that short functions are a good idea, in no small part because they’re easier to understand, test, and maintain.
But how can you enforce the writing of short functions? Immutable data. If you can’t modify data from within a function, then the function will (in my experience) end up being shorter, with fewer potential paths to be tested. Functional programs thus end up having many short functions--in contrast with nonfunctional programs, which often have a smaller number of very long functions. Functional programming also assumes that functions can be passed as arguments to other functions, something that we’ve already seen to be the case in Python.
The good news is that functional techniques have the potential to make code short and elegant. The bad news is that for many developers, functional techniques aren’t natural. Not modifying any values, and not keeping track of state, might be great ways to make your software more reliable, but they’re almost guaranteed to confuse and frustrate many developers.
Consider, for example, that you have a Person object in a purely functional language. If the person wants to change their name, you’re out of luck, because all data is immutable. Instead, you’ll have to create a new person object based on the old one, but with the name changed. This isn’t terrible in and of itself, but given that the real world changes, and that we want our programs to model the real world, keeping everything immutable can be frustrating.
Then again, because functional languages can’t modify data, they generally provide mechanisms for taking a sequence of inputs, transforming them in some way, and producing a sequence of outputs. We might not be able to modify one Person object, but we can write a function that takes a list of Person objects, applies a Python expression to each one, and then gets a new list of Person objects back. In such a scenario, we perhaps haven’t modified our original data, but we’ve accomplished the task. And the code needed to do this is generally quite short.
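As a minimal sketch of that idea in Python, assume a hypothetical immutable Person class built as a frozen dataclass (the class, its fields, and the sample names are illustrative and not taken from the text). The comprehension produces a new list of new objects and leaves the originals alone:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)   # frozen=True makes instances immutable
class Person:             # hypothetical class, for illustration only
    name: str
    age: int


people = [Person('alice', 30), Person('bob', 25)]

# Build a new list of new Person objects; the originals are untouched
capitalized = [replace(one_person, name=one_person.name.capitalize())
               for one_person in people]

print(capitalized[0].name)   # Alice
print(people[0].name)        # alice
```

Trying to assign to people[0].name directly would raise an exception, which is the kind of restriction a purely functional language imposes everywhere.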
Now, Python isn’t a functional language; we have mutable data types and assignment. But some functional techniques have made their way into the language and are considered standard Pythonic ways to solve some problems.
Specifically, Python offers comprehensions, a modern take on classic functions that originated in Lisp, one of the first high-level languages to be invented. Comprehensions make it relatively easy to create lists, sets, and dicts based on other data structures. The fact that Python’s functions are objects, and can thus be passed as arguments or stored in data structures, also comes from the functional world.
Some exercise solutions have already used, or hinted at, comprehensions. In this chapter, we’re going to concentrate on how and when to use these techniques, and expand on the ways we can use them.
In my experience, it’s common to be indifferent to functional techniques, and particularly to comprehensions, when first learning about them. But over time--and yes, it can take years!--developers increasingly understand how, when, and why to apply them. So even if you can solve the problems in this chapter without using functional techniques, the point here is to get your hands dirty, try them, and start to see the logic and elegance behind this way of doing things. The benefits might not be immediately obvious, but they’ll pay off over time.
If this all sounds very theoretical and you’d like to see some concrete examples of comprehensions versus traditional, procedural programming, then check out the “Writing comprehensions” sidebar coming up in this chapter, where I go through the differences more thoroughly.
People often ask me, “When should I use a comprehension, as opposed to a traditional for loop?”
My answer is basically as follows: when you want to transform an iterable into a list, you should use a comprehension. But if you just want to execute something for each element of an iterable, then a traditional for loop is better.
Put another way, is the point of your for loop the creation of a new list? If so, then use a comprehension. But if your goal is to execute something once for each element in an iterable, throwing away or ignoring any return value, then a for loop is preferable.
For example, I want to get the lengths of words in the string s. I can say
[len(one_word) for one_word in s.split()]
In this example, I care about the list we’re creating, so I use a comprehension.
But if my string s contains a list of filenames, and I want to create a new file for each of these filenames, then I’m not interested in the return value. Rather, I want to iterate over the filenames and create a file, as follows:
for one_filename in s.split():
    with open(one_filename, 'w') as f:
        f.write(f'{one_filename} ')
In this example, I open (and thus create) each file, and write to it the name of the file. Using a comprehension in this case would be inappropriate, because I’m not interested in the return value.
Transformations--taking values in a list, string, dict, or other iterable and producing a new list based on it--are common in programming. You might need to transform filenames into file objects, or words into their lengths, or usernames into user IDs. In all of these cases, a comprehension is the most Pythonic solution.
This exercise is meant to get your feet wet with comprehensions, and with implementing this idea. It might seem simple, but the underlying idea is deep and powerful and will help you to see additional opportunities to use comprehensions.
For this exercise, write a function (join_numbers) that takes a range of integers. The function should return those numbers as a string, with commas between the numbers. That is, given range(15) as input, the function should return this string:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
Hint: if you’re thinking that str.join (http://mng.bz/gyYl) is a good idea here, then you’re mostly right--but remember that str.join won’t work on a list of integers.
In this exercise, we want to use str.join on a range, which is similar to a list of integers. If we try to invoke str.join right away, we’ll get an error:
>>> numbers = range(15)
>>> ','.join(numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
That’s because str.join only works on a sequence of strings. We’ll thus need to convert each of the integers in our range (numbers) into a string. Then, when we have a list of strings based on our range of integers, we can run str.join.
The solution is to use a list comprehension to invoke str on each of the numbers in the range. That will produce a list of strings, which is what str.join expects. How?
Consider this: a list comprehension says that we’re going to create a new list. The elements of the new list are all based on the elements in the source iterable, after an expression is run on them. What we’re doing is describing the new list in terms of the old one.
Here are some examples that can help you to see where and how to use list comprehensions:
I want to know the age of each student in a class. So we’re starting with a list of student objects and ending up with a list of integers. You can imagine a student_age function being applied to each student to get their age:
[student_age(one_student) for one_student in all_students]
I want to know how many mm of rain fell on each day of the previous month. So we’re starting with a list of days and ending with a list of floats. You can imagine a daily_rain function being applied to each day:
[daily_rain(one_day) for one_day in most_recent_month]
I want to know how many vowels were used in a book. So we would apply a number_of_vowels function to each word in the book, and then run the sum function on the resulting list:
[number_of_vowels(one_word) for one_word in open(filename).read().split()]
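To make the last of these examples concrete, here is a small sketch. The number_of_vowels function is only imagined in the text, so the implementation below is one hypothetical version, and a short string stands in for the contents of a book:

```python
def number_of_vowels(word):
    """Hypothetical helper: count the vowels in a single word."""
    return sum(1 for letter in word.lower() if letter in 'aeiou')


text = 'the quick brown fox'   # stands in for a book's contents

# One vowel count per word
counts = [number_of_vowels(one_word) for one_word in text.split()]
print(counts)        # [1, 2, 1, 1]

# Summing the list gives the total for the whole text
print(sum(counts))   # 5
```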
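If these three examples look quite similar, that’s because they are; part of the power of list comprehensions is the simple formula that we repeat. Each list comprehension contains two parts: the expression that is evaluated once for each element, and the source iterable that supplies those elements.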
In the case of our exercise here, we had a list of integers. By applying the str function on each int in the list, we got back a list of strings. str.join works fine on lists of strings.
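The two steps described above can be sketched as follows, using a shorter range than the exercise does so that the output is easy to check by eye. The as_strings variable name is illustrative:

```python
numbers = range(5)

# Step 1: describe the new list in terms of the old one
as_strings = [str(number) for number in numbers]
print(as_strings)             # ['0', '1', '2', '3', '4']

# Step 2: str.join is happy with a list of strings
print(','.join(as_strings))   # 0,1,2,3,4
```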
Note We’ll get into the specifics of the iterator protocol in chapter 10, which is dedicated to that subject. You don’t need to understand those details to use comprehensions. However, if you’re particularly interested in what counts as an “iterable,” go ahead and read the first part of that chapter before continuing here.
Now, a list comprehension immediately produces a list--which, if you’re dealing with large quantities of data, can result in the use of a great deal of memory. For this reason, many Python developers would argue that we’d be better off using a generator expression (http://mng.bz/K2M0).
Generator expressions look just like list comprehensions, except that instead of using square brackets, they use regular, round parentheses. However, this turns out to make a big difference: a list comprehension has to create and return its output list in one fell swoop, which can potentially use lots of memory. A generator expression, by contrast, returns its output one piece at a time.
sum([x*x for x in range(100000)])
In this code, sum is given one input, a list of integers. It iterates over the list of integers and sums them. But consider that before sum can run, the comprehension needs to finish creating the entire list of integers. This list can potentially be quite large and consume a great deal of memory.
By contrast, consider this code:
sum((x*x for x in range(100000)))
Here, the input to sum isn’t a list; it’s a generator, one that we created via our generator expression. sum will return precisely the same result as it did previously. However, whereas our first example created a list containing 100,000 elements, the latter uses much less memory. The generator returns one element at a time, waiting for sum to request the next item in line. In this way, we’re only consuming one integer’s worth of memory at a time, rather than a huge list of integers’ memory. The bottom line, then, is that you can use generator expressions almost anywhere you can use comprehensions, but you’ll use much less memory.
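One rough way to see the difference for yourself is sys.getsizeof, which reports the size in bytes of an object. This is only a sketch: the exact numbers vary by Python version and platform, and getsizeof measures the container alone, not the integers it refers to, so the list's real footprint is larger still.

```python
import sys

squares_list = [x*x for x in range(100000)]   # built in full, right now
squares_gen = (x*x for x in range(100000))    # nothing computed yet

# The list object is far larger than the generator object
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Both produce the same total
print(sum(squares_list) == sum(squares_gen))  # True
```

Note that a generator can be consumed only once; after sum has run over squares_gen, it is exhausted.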
It turns out that when we put a generator expression in a function call, we can remove the inner parentheses:
sum(x*x for x in range(100000))
And thus, here’s the syntax that you saw in the solution to this exercise, but using a generator expression:
numbers = range(15)
print(','.join(str(number) for number in numbers))
def join_numbers(numbers):
    return ','.join(str(number)              ❶
                    for number in numbers)   ❷

print(join_numbers(range(15)))
❶ Applies str to each number and passes the resulting string along to str.join
❷ Iterates over the elements of numbers
You can work through a version of this code in the Python Tutor at http://mng.bz/zj4w.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Here are a few ways you might want to go beyond this exercise, and push yourself to use list comprehensions in new ways:
As in the exercise, take a list of integers and turn them into strings. However, you’ll only want to produce strings for integers between 0 and 10. Doing this will require understanding the if clause in list comprehensions as well.
Given a list of strings containing hexadecimal numbers, sum the numbers together.
Use a list comprehension to reverse the word order of lines in a text file. That is, if the first line is abc def and the second line is ghi jkl, then you should return the list ['def abc', 'jkl ghi'].
In the previous exercise, we took a sequence of numbers and turned it into a sequence of strings. This time, we’ll do the opposite--take a sequence of strings, turn them into numbers, and then sum them. But we’re going to make it a bit more complicated, because we’re going to filter out those strings that can’t be turned into integers.
Our function (sum_numbers) will take a string as an argument; for example
10 abc 20 de44 30 55fg 40
Given that input, our function should return 100. That’s because the function will ignore any word that contains nondigits.
Ask the user to enter integers, all at once, using input (http://mng.bz/wB27).
In this exercise, we’re given a string, which we assume contains integers separated by spaces. We want to grab the individual integers from the string and then sum them together. The easiest way to do this is to invoke str.split on the string, which returns a list of strings. By invoking str.split without any parameters, we tell Python that any combination of whitespace should be used as a delimiter.
Now we have a list of strings, rather than a list of integers. What we need to do is iterate over the strings, turning each one into an integer by invoking int on it. The easiest way to turn a list of strings into a list of integers is to use a list comprehension, as in the solution code. In theory, we could then invoke the built-in sum function on the list of integers, and we would be done.
But there’s a catch. It’s possible that the user’s input includes elements that can’t be turned into integers. We need to get rid of those; if we try to run int on the string abcd, the program will exit with an error.
Fortunately, list comprehensions can help us here too. We can use the third (filtering) line of the comprehension to indicate that only those strings that can be turned into numbers will pass through to the first line. We do this with an if clause, applying the str.isdigit method to find out whether we can successfully turn the word into an integer.
We then invoke sum on the generator expression, returning an integer. Finally, we print the result.
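To see each stage on its own, the same logic can be sketched with intermediate lists, using the sample input from the exercise description. The variable names here are illustrative; the actual solution does all of this in a single expression:

```python
user_input = '10 abc 20 de44 30 55fg 40'

# Split on any run of whitespace
words = user_input.split()

# Keep only the words made up entirely of digits
digit_words = [one_word for one_word in words if one_word.isdigit()]
print(digit_words)     # ['10', '20', '30', '40']

# Convert the survivors to integers and add them up
integers = [int(one_word) for one_word in digit_words]
print(sum(integers))   # 100
```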
def sum_numbers(numbers):
    return sum(int(number)                     ❶
               for number in numbers.split()   ❷
               if number.isdigit())            ❸

print(sum_numbers('1 2 3 a b c 4'))
❶ Creates an integer based on number
❷ Iterates through each of the words in numbers
❸ Ignores words that can’t be turned into integers
You can work through a version of this code in the Python Tutor at http://mng.bz/046p.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
One of the most common uses for list comprehensions, at least in my experience, is for doing this combination of transformation and filtering. Here are a few additional exercises you could do to ensure that you understand not just the syntax, but also their potential:
Show the lines of a text file that contain at least one vowel and contain more than 20 characters.
In the United States, phone numbers have 10 digits--a three-digit area code, followed by a seven-digit number. Several times during my childhood, area codes would run out of phone numbers, forcing half of the population to get a new area code. After such a split, XXX-YYY-ZZZZ might remain XXX-YYY-ZZZZ, or it might become NNN-YYY-ZZZZ, with NNN being the new area code. The decision regarding which numbers remained and which changed was often made based on the phone numbers’ final seven digits. Use a list comprehension to return a new list of strings, in which any phone number whose YYY begins with a digit from 0 to 5 will have its area code changed to XXX+1. For example, given the list of strings ['123-456-7890', '123-333-4444', '123-777-8888'], we want to convert them to ['124-456-7890', '124-333-4444', '123-777-8888'].
Define a list of five dicts. Each dict will have two key-value pairs, name and age, containing a person’s name and age (in years). Use a list comprehension to produce a list of dicts in which each dict contains three key-value pairs: the original name, the original age, and a third age_in_months key, containing the person’s age in months. However, the output should exclude any of the input dicts representing people over 20 years of age.
It’s pretty common to use complex data structures to store information in Python. Sure, we could create a new class, but why do that when we can just use combinations of lists, tuples, and dicts? This means, though, that you’ll sometimes need to unravel those complex data structures, turning them into simpler ones.
In this exercise, we’ll practice doing such unraveling. Write a function that takes a list of lists (just one element deep) and returns a flat, one-dimensional version of the list. Thus, invoking
flatten([[1,2], [3,4]])
will return
[1,2,3,4]
Note that there are several possible solutions to this problem; I’m asking you to solve it with list comprehensions. Also note that we only need to worry about flattening a two-level list.
As we’ve seen, list comprehensions allow us to evaluate an expression on each element of an iterable. But in a normal list comprehension, you can’t return more elements than were in the input iterable. If the input iterable has 10 elements, for example, you can only return 10, or fewer than 10 if you use an if clause.
Nested list comprehensions change this a bit, in that the result may contain as many elements as there are sub-elements of the input iterable. Given a list of lists, the first for loop will iterate over every element in mylist. But the second for loop will iterate over the elements of the inner list. We can produce one output element for each inner input element, and that’s what we do:
def flatten(mylist):
    return [one_element
            for one_sublist in mylist
            for one_element in one_sublist]

def flatten(mylist):
    return [one_element
            for one_sublist in mylist         ❶
            for one_element in one_sublist]   ❷

print(flatten([[1,2], [3,4]]))
❶ Iterates through each element of mylist
❷ Iterates through each element of one_sublist
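If the order of the two for clauses seems hard to remember, it can help to compare the comprehension with the equivalent nested for loops: the clauses appear in the same order, top to bottom, as the loops would. The function name below is hypothetical and exists only for this comparison:

```python
def flatten_with_loops(mylist):   # hypothetical name, for comparison only
    output = []
    for one_sublist in mylist:            # same as the first for clause
        for one_element in one_sublist:   # same as the second for clause
            output.append(one_element)
    return output


print(flatten_with_loops([[1, 2], [3, 4]]))   # [1, 2, 3, 4]
```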
You can work through a version of this code in the Python Tutor at http://mng.bz/jg4P.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Nested list comprehensions can be a bit daunting at first, but they can be quite helpful in many circumstances. Here are some exercises you can try to improve your understanding of how to use them:
Write a version of the flatten function mentioned earlier called flatten_odd_ints. It’ll do the same thing as flatten, but the output will only contain odd integers. Inputs that are neither odd nor integers should be excluded. Inputs containing strings that could be converted to integers should be converted; other strings should be excluded.
Define a dict that represents the children and grandchildren in a family. (See figure 7.1 for a graphic representation.) Each key will be a child’s name, and each value will be a list of strings representing their children (i.e., the family’s grandchildren). Thus the dict {'A':['B', 'C', 'D'], 'E':['F', 'G']} means that A and E are siblings; A’s children are B, C, and D; and E’s children are F and G. Use a list comprehension to create a list of the grandchildren’s names.
Redo this exercise, but replace each grandchild’s name (currently a string) with a dict. Each dict will contain two name-value pairs, name and age. Produce a list of the grandchildren’s names, sorted by age, from eldest to youngest.
List comprehensions are great when you want to transform a list. But they can actually work on any iterable--that is, any Python object on which you can run a for loop. This means that the source data for a list comprehension can be a string, list, tuple, dict, set, or even a file.
In this exercise, I want you to write a function that takes a filename as an argument. It returns a string with the file’s contents, but with each word translated into Pig Latin, as per our plword function in chapter 2 on “strings.” The returned translation can ignore newlines and isn’t required to handle capitalization and punctuation in any specific way.
We’ve seen that nested list comprehensions can be used to iterate over complex data structures. In this case, we’re iterating over a file. And indeed, we could iterate over each line of the file.
But we can break the problem down further, using a nested list comprehension to first iterate over each line of the file, and then over each word within the current line. Our plword function can then operate on a single word at a time.
I realize that nested list comprehensions can be hard, at least at first, to read and understand. But as you use them, you’ll likely find that they allow you to elegantly break down a problem into its components.
There is a bit of a problem with what we’ve done here, but it might not seem obvious at first. List comprehensions, by their very nature, produce lists. This means that if we translate a large file into Pig Latin, we might find ourselves with a very long list. It would be better to return an iterator object that would save memory, only calculating the minimum necessary for each iteration.
It turns out that doing so is quite easy. We can use a generator expression (as suggested in this chapter’s first exercise), which looks almost precisely like a list comprehension, but using round parentheses rather than square brackets. We can put a generator expression in a call to str.join, just as we could put in a list comprehension, saving memory in the process.
Here’s how that code would look:
def plfile(filename):
    return ' '.join((plword(one_word)
                     for one_line in open(filename)
                     for one_word in one_line.split()))
But wait--it turns out that if you have a generator expression inside a function call, you don’t actually need both sets of parentheses. You can leave one out, which means the code will look like this:
def plfile(filename):
    return ' '.join(plword(one_word)
                    for one_line in open(filename)
                    for one_word in one_line.split())
We’ve now not only accomplished our original task, we’ve done so using less memory than a list comprehension requires. There might be a slight trade-off in terms of speed, but this is usually considered worthwhile, given the potential problems you’ll encounter reading a huge file into memory all at once.
def plword(word):
    if word[0] in 'aeiou':
        return word + 'way'
    return word[1:] + word[0] + 'ay'

def plfile(filename):
    return ' '.join(plword(one_word)
                    for one_line in open(filename)      ❶
                    for one_word in one_line.split())   ❷
❶ Iterates through each line of filename
❷ Iterates through each word in the current line
You can work through a version of this code in the Python Tutor at http://mng.bz/K2xP.
Note that because the Python Tutor doesn’t support working with external files, I used an instance of StringIO to simulate a file.
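If you want to try the same trick locally without creating a file, here is a sketch along those lines. It assumes a hypothetical variant of plfile that accepts an already-open file-like object rather than a filename; the function name and sample text are illustrative:

```python
from io import StringIO


def plword(word):
    if word[0] in 'aeiou':
        return word + 'way'
    return word[1:] + word[0] + 'ay'


def pl_from_file_object(infile):   # hypothetical variant of plfile
    # Iterating over a file-like object yields one line at a time
    return ' '.join(plword(one_word)
                    for one_line in infile
                    for one_word in one_line.split())


fake_file = StringIO('this is\na test\n')
print(pl_from_file_object(fake_file))   # histay isway away esttay
```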
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Whenever you’re transforming and/or filtering complex or nested data structures, or (as in the case of a file) something that can be treated as a nested data structure, it’s often useful to use a nested list comprehension:
In this exercise, plfile applied the plword function to every word in a file. Write a new function, funcfile, that will take two arguments--a filename and a function. The output from the function should be a string, the result of invoking the function on each word in the text file. You can think of this as a generic version of plfile, one that can return any string value.
Use a nested list comprehension to transform a list of dicts into a list of two-element (name-value) tuples, each of which represents one of the name-value pairs in one of the dicts. If more than one dict has the same name-value pair, then the tuple should appear twice.
Assume that you have a list of dicts, in which each dict contains two name-value pairs: name and hobbies, where name is the person’s name and hobbies is a set of strings representing the person’s hobbies. What are the three most popular hobbies among the people listed in the dicts?
The combination of comprehensions and dicts can be quite powerful. You might want to modify an existing dict, removing or modifying certain elements. For example, you might want to remove all users whose ID number is lower than 500. Or you might want to find the user IDs of all users whose names begin with the letter “A”.
It’s also not uncommon to want to flip a dict--that is, to exchange its keys and values. Imagine a dict in which the keys are usernames and the values are user ID numbers; it might be useful to flip that so that you can search by ID number.
For this exercise, first create a dict of any size, in which the keys are unique and the values are also unique. (A key may appear as a value, or vice versa.) Here’s an example:
d = {'a':1, 'b':2, 'c':3}
Turn the dict inside out, such that the keys and the values are reversed.
Just as list comprehensions provide an easy way to create lists based on another iterable, dict comprehensions provide an easy way to create a dict based on an iterable. The syntax is as follows:
{ KEY : VALUE for ITEM in ITERABLE }
The source for our dict comprehension is an iterable--typically a string, list, tuple, dict, set, or file.
Notice that a colon (:) separates the key from the value. That colon is part of the syntax, which means that the expressions on either side of the colon are evaluated separately and can’t share data.
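As a generic illustration of that syntax, separate from this exercise, here is a small sketch that builds a dict from a list of strings. The sample words are arbitrary:

```python
words = ['apple', 'fig', 'banana']

# KEY is the word itself; VALUE is its length
lengths = {one_word: len(one_word) for one_word in words}

print(lengths)   # {'apple': 5, 'fig': 3, 'banana': 6}
```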
In this particular case, we’re looping over the elements of a dict named d. We use the dict.items method to do so, which provides a two-element tuple--the key and value--with each iteration. These two values are unpacked, by parallel assignment, into the variables key and value.
Another way of solving this exercise is to iterate over d, rather than over the output of d.items(). That would provide us with the keys, requiring that we retrieve each value:
{ d[key]:key for key in d }
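A quick sketch, using the example dict from the exercise description, confirms that the two approaches give the same result. The variable names via_items and via_keys are illustrative:

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Iterate over key-value pairs
via_items = {value: key for key, value in d.items()}

# Iterate over keys only, looking up each value
via_keys = {d[key]: key for key in d}

print(via_items)               # {1: 'a', 2: 'b', 3: 'c'}
print(via_items == via_keys)   # True
```

This works cleanly because the exercise requires the values to be unique. If two keys shared a value, the later pair would silently overwrite the earlier one in the flipped dict.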
In a comprehension, I’m trying to create a new object based on an old one. It’s all about the values that are returned by the expression at the start of the comprehension. By contrast, for loops are about commands, and executing those commands.
Consider what your goal is, and whether you’re better served with a comprehension or a for loop; for example:
Given a string, you want a list of the ord values for each character. This should be a list comprehension, because you’re creating a list based on a string, which is iterable.
You have a list of dicts, in which each dict contains your friends’ first and last names, and you want to insert this data into a database. In this case, you’ll use a regular for loop, because you’re interested in the side effects, not the return value.
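The contrast between the two cases can be sketched as follows. The sample data is made up, and print stands in for the database insert, since all that matters here is that the loop body is run for its side effect:

```python
s = 'abc'

# We want the resulting list, so a comprehension fits
codes = [ord(one_char) for one_char in s]
print(codes)   # [97, 98, 99]

friends = [{'first': 'Ada', 'last': 'Lovelace'},
           {'first': 'Alan', 'last': 'Turing'}]

# We want the side effect, not a return value, so a for loop fits
for one_friend in friends:
    print(one_friend['first'], one_friend['last'])   # stand-in for an insert
```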
def flipped_dict(a_dict):
    return {value: key
            for key, value in a_dict.items()}   ❶

print(flipped_dict({'a':1, 'b':2, 'c':3}))
❶ All iterables are acceptable in a comprehension, even those that return two-element tuples, such as dict.items.
You can work through this code in the Python Tutor at http://mng.bz/905x.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Dict comprehensions provide us with a useful way to create new dicts. They’re typically used when you want to create a dict based on an iterable, such as a list or file. I’m especially partial to using them when I want to read from a file and turn the file’s contents into a dict. Here are some additional ideas for ways to practice the use of dict comprehensions:
Given a string containing several (space-separated) words, create a dict in which the keys are the words, and the values are the number of vowels in each word. If the string is “this is an easy test,” then the resulting dict would be {'this':1, 'is':1, 'an':1, 'easy':2, 'test':1}.
Create a dict whose keys are filenames and whose values are the lengths of the files. The input can be a list of files from os.listdir (http://mng.bz/YreB) or glob.glob (http://mng.bz/044N).
Find a configuration file in which the lines look like “name=value.” Use a dict comprehension to read from the file, turning each line into a key-value pair.
This exercise combines showing how you can receive a function as a function argument, and how comprehensions can help us to elegantly solve a wide variety of problems.
The built-in map (http://mng.bz/Ed2O) takes two arguments: (a) a function and (b) an iterable. It returns an iterator that yields the result of applying the function to each element of the input iterable. A full discussion of map is in the earlier sidebar, “map, filter, and comprehensions.”
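As a brief reminder of how map relates to comprehensions, here is a sketch that applies len to each word in a string, both ways. The sample string is arbitrary. In Python 3, map produces its results lazily, which is why the example wraps the call in list to see them all at once:

```python
words = 'this is a bunch of words'.split()

via_map = list(map(len, words))
via_comprehension = [len(one_word) for one_word in words]

print(via_map)                        # [4, 2, 1, 5, 2, 5]
print(via_map == via_comprehension)   # True
```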
In this exercise, we’re going to create a slight variation on map, one that applies a function to each of the values of a dict. The result of invoking this function, transform_values, is a new dict whose keys are the same as the input dict, but whose values have been transformed by the function. (The name of the function comes from Ruby on Rails, which provides a function of the same name.) The function passed to transform_values should take a single argument, the dict’s value.
When your transform_values function works, you should be able to invoke it as follows:
d = {'a':1, 'b':2, 'c':3}
transform_values(lambda x: x*x, d)
The result of this call will be the following dict:
{'a': 1, 'b': 4, 'c': 9}
The idea of transform_values is a simple one: you want to invoke a function repeatedly on the values of a dict. This means that you must iterate over the dict’s key-value pairs. For each pair, you want to invoke a user-supplied function on the value.
We know that functions can be passed as arguments, just like any other data types. In this case, we’re getting a function from the user so we can apply it. We apply functions with parentheses, so if we want to invoke the function func that the user passed to us, we simply say func(). Or in this case, since the function should take a single argument, we say func(value).
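Here is a minimal sketch of that idea on its own, before dicts enter the picture. The apply_to helper is hypothetical; it does nothing but call whatever function it is handed:

```python
def apply_to(func, value):   # hypothetical helper, for illustration only
    # func is just a parameter; the parentheses are what invoke it
    return func(value)


print(apply_to(len, 'hello'))          # 5
print(apply_to(lambda x: x * x, 7))    # 49
print(apply_to(str.upper, 'abc'))      # ABC
```

Built-in functions, lambda expressions, and methods accessed through their class can all be passed in the same way.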
We can iterate over a dict’s key-value pairs with dict.items (http://mng.bz/4AeV), which returns an iterable view that provides, one by one, the dict’s key-value pairs. But that doesn’t solve the problem of how to take these key-value pairs and turn them back into a dict.
The easiest, fastest, and most Pythonic way to create a dict based on an existing iterable is a dict comprehension. The dict we return from transform_values will have the same keys as our input dict. But as we iterate over the key-value pairs, we invoke func(value), applying the user-supplied function to each value we get and using the output from that expression as our value. We don’t even need to worry about what type of value the user-supplied function will return, because dict values can be of any type.
def transform_values(func, a_dict):
    return {key: func(value)                    ❶
            for key, value in a_dict.items()}   ❷

d = {'a':1, 'b':2, 'c':3}
print(transform_values(lambda x: x*x, d))
❶ Applies the user-supplied function to each value in the dict
❷ Iterates through each key-value pair in the dict
You can work through a version of this code in the Python Tutor at http://mng.bz/jg2z.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Dict comprehensions are a powerful tool in any Python developer’s arsenal. They allow us to create new dicts based on existing iterables. However, they can take some time to get used to, and to integrate into your development. Here are some additional exercises you can try to improve your understanding and use of dict comprehensions:
Expand the transform_values
exercise, taking two function arguments, rather than just one. The first function argument will work as before, being applied to the value and producing output. The second function argument takes two arguments, a key and a value, and determines whether there will be any output at all. That is, the second function will return True
or False
and will allow us to selectively create a key-value pair in the output dict.
Use a dict comprehension to create a dict in which the keys are usernames and the values are (integer) user IDs, based on a Unix-style /etc/passwd
file. Hint: in a typical /etc/passwd
file, the usernames are the first field in a row (i.e., index 0), and the user IDs are the third field in a row (i.e., index 2). If you need to download a sample /etc/passwd
file, you can get it from http://mng.bz/2XXg. Note that this sample file contains comment lines, meaning that you’ll need to remove them when creating your dict.
Write a function that takes a directory name (i.e., a string) as an argument. The function should return a dict in which the keys are the names of files in that directory, and the values are the file sizes. You can use os.listdir
or glob.glob
to get the files, but because only regular files have sizes, you’ll want to filter the results using methods from os.path
. To determine the file size, you can use os.stat
or (if you prefer) just check the length of the string resulting from reading the file.
Part of the beauty of Python’s basic data structures is that they can be used to solve a wide variety of problems. But it can sometimes be a challenge, especially at first, to decide which of the data structures is appropriate, and which of their methods will help you to solve problems most easily. Often, it’s a combination of techniques that will provide the greatest help.
In this exercise, I want you to write a get_sv
function that returns a set of all “supervocalic” words in the dictionary file. If you’ve never heard the term supervocalic before, you’re not alone: I only learned about such words several years ago. Simply put, such words contain all five vowels in English (a, e, i, o, and u), each of them appearing once and in alphabetical order.
For the purposes of this exercise, I’ll loosen the definition, accepting any word that has all five vowels, in any order and any number of times. Your function should find all of the words that match this definition (i.e., contain a, e, i, o, and u) and return a set containing them.
Your function should take a single argument: the name of a text file containing one word per line, as in a Unix/Linux dictionary file. If you don’t have such a “words” file, you can download one from here: http://mng.bz/D2Rw.
Before we can create a set of supervocalic words, or read from a file, we need to find a way to determine if a word is supervocalic. (Again, this isn’t the precise, official definition.) One way would be to use in
five times, once for each vowel. But this seems a bit extreme and inefficient.
What we can instead do is create a set from our word. After all, a string is a sequence, and we can always create a set from any sequence with the set
built-in.
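Fine, but how does that help us? If we already have a set of vowels, we can check to see if they’re all in the word with the < operator. Normally, < checks to see if one value is less than another. But in the case of sets, it returns True if the set on the left is a proper subset of the set on the right: every element on the left also appears on the right, and the right has at least one element more. (The <= operator also returns True when the two sets are equal.)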
This means that, given the word “superlogical,” I can do the following:
vowels = {'a', 'e', 'i', 'o', 'u'}
word = 'superlogical'

if vowels < set(word):
    print('Yes, it is supervocalic!')
else:
    print('Nope, just a regular word')
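One subtlety is worth seeing in a short sketch: with sets, < tests for a proper subset, while <= (or the issubset method) also accepts two equal sets. The sample words below are my own, chosen only for illustration.

```python
vowels = {'a', 'e', 'i', 'o', 'u'}

# 'superlogical' has all five vowels plus other letters,
# so the vowels are a proper subset of its letters
proper = vowels < set('superlogical')

# A string made up of exactly the five vowels gives an equal set:
# < is False there, but <= and issubset are True
strict_equal = vowels < set('aeiou')
loose_equal = vowels <= set('aeiou')
method_equal = vowels.issubset('aeiou')

# 'python' is missing most of the vowels
missing = vowels <= set('python')

print(proper, strict_equal, loose_equal, method_equal, missing)
```

When a word is read straight from a file, its trailing newline is part of the string, so < still works for such a line. If the text has already been stripped, <= is the safer choice.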
This is good for one word. But how can we do it for many words in a file? The answer could be a list comprehension. After all, we can think of our file as an iterator, one that returns strings. If the words file contains one word per line, then iterating over the lines of the file really means iterating over the different words. If the set of vowels is a subset of the set based on the current word, then we’ll consider the word to be supervocalic and will include it in the output list.
But we don’t want a list, we want a set! Fortunately, the difference between creating a list comprehension and a set comprehension is a pair of brackets. We use square brackets ([]
) for a list comprehension and curly braces ({}
) for a set comprehension. A comprehension with curly braces and a colon is a dict comprehension; without the colon, it’s a set comprehension.
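To make the difference between the three kinds of comprehension concrete, here is a small sketch. The words list is made-up sample data, not part of the exercise.

```python
words = ['apple', 'banana', 'apple', 'cherry']

# Square brackets: a list comprehension, duplicates and order preserved
lengths_list = [len(word) for word in words]

# Curly braces, no colon: a set comprehension, duplicates removed
lengths_set = {len(word) for word in words}

# Curly braces with a colon: a dict comprehension mapping word to length
lengths_dict = {word: len(word) for word in words}

print(lengths_list)
print(lengths_set)
print(lengths_dict)
```

The list keeps all four lengths, the set keeps only the distinct ones, and the dict ends up with one entry per distinct word.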
We turn each word into a set and check that the vowels are a subset of our word’s letters.
If the word passes this test, we include it (the word) in the output.
The output is all put into a set, thanks to a set comprehension.
Using sets as the basis for textual comparisons might not seem obvious, at least at first. But it’s good to learn to think in these ways, taking advantage of Python’s data structures in ways you never considered before.
def get_sv(filename):
    vowels = {'a', 'e', 'i', 'o', 'u'}       # ❶
    return {word.strip()                      # ❷
            for word in open(filename)        # ❸
            if vowels < set(word.lower())}    # ❹
❶ Defines the set of vowels to look for
❷ Returns the word, without any whitespace on either side
❸ Iterates through each line in “filename”
❹ Does this word contain all of the vowels?
You can work through a version of this code in the Python Tutor at http://mng.bz/lG18. Note that because the Python Tutor doesn’t support working with external files, I used an instance of StringIO
to simulate a file.
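If you’d like to try the same trick yourself, here is a minimal sketch of how StringIO can stand in for a file. The function name get_sv_from_file and the sample words are my own assumptions for illustration; the function applies the same logic as get_sv, but takes an already-open file-like object rather than a filename.

```python
from io import StringIO

def get_sv_from_file(file_obj):
    """Same logic as get_sv, but takes an open file-like object (hypothetical helper)."""
    vowels = {'a', 'e', 'i', 'o', 'u'}
    return {line.strip()
            for line in file_obj
            if vowels < set(line.lower())}

# StringIO wraps a string so that it can be iterated over line by line,
# just like a file opened for reading
fake_file = StringIO('education\npython\nSequoia\nrhythm\n')
result = get_sv_from_file(fake_file)
print(result)
```

Because the function only iterates over its argument, it works equally well with a real file object returned by open.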
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Set comprehensions are great in a variety of circumstances, including when you have inputs and you want to crunch them down to only have the distinct (unique) elements. Here are some additional ways for you to use and practice your set-comprehension chops:
In the /etc/passwd
file you used earlier, what different shells (i.e., command interpreters, named in the final field on each line) are assigned to users? Use a set comprehension to gather them.
Given a text file, what are the lengths of the different words? Return a set of different word lengths in the file.
Create a list whose elements are strings--the names of people in your family. Now use a set comprehension (and, better yet, a nested set comprehension) to find which letters are used in your family members’ names.
In this exercise, we’re going to again try something that sits at the intersection of strings and comprehensions. This time, it’s dict comprehensions.
When you were little, you might have created or used a “secret” code in which a
was 1, b
was 2, c
was 3, and so forth, until z (which was 26). This type of code happens to be quite ancient and was used by a number of different groups more than 2,000 years ago. “Gematria,” (http://mng.bz/B2R8) as it is known in Hebrew, is the way in which biblical verses have long been numbered. And of course, it’s not even worth describing it as a secret code, despite what you might have thought while little.
This exercise, the result of which you’ll use in the next one, asks that you create a dict whose keys are the (lowercase) letters of the English alphabet, and whose values are the numbers ranging from 1 to 26. And yes, you could simply type {'a':1,
'b':2,
'c':3}
and so forth, but I’d like you to do this with a dict comprehension.
The solution uses a number of different aspects of Python, combining them to create a dict with a minimum of code.
First, we want to create a dict, and thus turn to a dict comprehension. Our keys are going to be the lowercase letters of the English alphabet, and the values are going to be the numbers from 1 to 26.
We could create the string of lowercase letters. But, rather than doing that ourselves, we can rely on the string
module, and its string.ascii_lowercase
attribute, which comes in handy in such situations.
But how can we number the letters? We can use the enumerate
built-in iterator, which will number our characters one at a time. We can then catch the iterated tuples via unpacking, grabbing the index and character separately:
{char:index for index, char in enumerate(string.ascii_lowercase)}
The only problem with doing this is that enumerate
starts counting at 0, and we want to start counting at 1. We could, of course, just add 1 to the value of index
. However, we can do even better than that by asking enumerate
to start counting at 1, and we do so by passing 1 to it as the second argument:
{char:index for index, char in enumerate(string.ascii_lowercase, 1)}
And, sure enough, this produces the dict that we want. We’ll use it in the next exercise.
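A quick way to convince yourself is to look at what enumerate yields and to spot-check a few entries. This is only a sketch; the variable names letters and first_three are mine.

```python
import string

letters = {char: index
           for index, char in enumerate(string.ascii_lowercase, 1)}

# enumerate yields (index, item) tuples; starting at 1, the first is (1, 'a')
first_three = list(enumerate('abc', 1))

print(first_three)
print(len(letters), letters['a'], letters['z'])
```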
import string

def gematria_dict():
    return {char: index                                      # ❶
            for index, char
            in enumerate(string.ascii_lowercase, 1)}         # ❷

print(gematria_dict())
❶ Returns the key-value pair, with the character and an integer
❷ Iterates over lowercase letters with enumerate
You can work through a version of this code in the Python Tutor at http://mng.bz/WPx4.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Dicts are also known as key-value stores, for the simple reason that they contain keys and values--and because associations between two different types of data are extremely common in programming contexts. Often, if you can get your data into a dict, it becomes easier to work with and manipulate. For that reason, it’s important to know how to get information into a dict from a variety of different formats and sources. Here are some additional exercises to practice doing so:
Many programs’ functionality is modified via configuration files, which are often set using name-value pairs. That is, each line of the file contains text in the form of name=value
, where the =
sign separates the name from the value. I’ve prepared one such sample config file at http://mng.bz/rryD. Download this file, and then use a dict comprehension to read its contents from disk, turning it into a dict describing a user’s preferences. Note that all of the values will be strings.
Create a dict based on the config file, as in the previous exercise, but this time, all of the values should be integers. This means that you’ll need to filter out (and ignore) those values that can’t be turned into integers.
It’s sometimes useful to transform data from one format into another. Download a JSON-formatted list of the 1,000 largest cities in the United States from http://mng.bz/Vgd0. Using a dict comprehension, turn it into a dict in which the keys are the city names, and the values are the populations of those cities. Why are there only 925 key-value pairs in this dict? Now create a new dict, but set each key to be a tuple containing the state and city. Does that ensure there will be 1,000 key-value pairs?
In the previous exercise, you created a dict that allows you to get the numeric value from any lowercase letter. As you can imagine, we can use this dict not only to find the numeric value for a single letter, but to sum the values from the letters in a word, thus getting the word’s “value.” One of the games that Jewish mystics enjoy playing (although they would probably be horrified to hear me describe it as a game) is to find words with the same gematria value. If two words have the same gematria value, then they’re linked in some way.
In this exercise, you’ll write two functions:
gematria_for
, which takes a single word (string) as an argument and returns the gematria score for that word
gematria_equal_words
, which takes a single word and returns a list of those words from the dictionary file whose gematria scores match the current word’s score.
For example, if the function is called with the word cat
, with a gematria value of 24 (3 + 1 + 20), then the function will return a list of strings, all of whose gematria values are also 24. (This will be a long list!) Any nonlowercase characters in the user’s input should count 0 toward our final score for the word. Your source for the dictionary words will be the Unix words file you used earlier in this chapter, which you can read with a list comprehension.
This solution combines a large number of techniques that we’ve discussed so far in this book, and that you’re likely to use in your Python programming work. (However, I do hope that you’re not doing too many gematria calculations.)
First, how do we calculate the gematria score for a word, given our gematria
dict? We want to iterate through each letter in a word, grabbing the score from the dict. And if the letter isn’t in the dict, we’ll give it a value of 0.
The standard way to do this would be with a for
loop, using dict.get
:
total = 0
for one_letter in word:
    total += gematria.get(one_letter, 0)
And there’s nothing wrong with this, per se. But comprehensions are usually your best bet when you’re starting with one iterable and trying to produce another iterable. And in this case, we can iterate over the letters in our word in a comprehension (strictly speaking a generator expression, since there are no square brackets), invoking sum
on the integers that will result:
def gematria_for(word):
    return sum(gematria.get(one_char,0)
               for one_char in word)
Once we can calculate the gematria for one word, we need to find all of the words in the dictionary file whose scores match it. We can do that, once again, with a list comprehension--this time, using the if
clause to filter out those words whose gematria isn’t equal:
def gematria_equal_words(input_word):
    our_score = gematria_for(input_word.lower())
    return [one_word.strip()
            for one_word in open('/usr/share/dict/words')
            if gematria_for(one_word.lower()) == our_score]
As you can see, we’re forcing the words to be in lowercase, but only where we compare scores. The output expression on the first line of our comprehension returns each word in its original case; the lowercasing is used purely for filtering.
Meanwhile, we’re iterating over each of the words in the dictionary file. Each word in that file ends with a newline, which doesn’t affect our gematria score but isn’t something we want to return to the user in our list comprehension.
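Here is a small sketch that checks both claims: the trailing newline contributes 0 to the score, and lowercasing affects only the comparison, not the returned word. Instead of /usr/share/dict/words, it filters a short in-memory list (sample_lines) that I made up for illustration.

```python
import string

GEMATRIA = {char: index
            for index, char in enumerate(string.ascii_lowercase, 1)}

def gematria_for(word):
    # Characters missing from GEMATRIA (newline, digits, ...) count as 0
    return sum(GEMATRIA.get(one_char, 0) for one_char in word)

with_newline = gematria_for('cat\n')
without_newline = gematria_for('cat')
print(with_newline, without_newline)

# Each sample line ends with a newline, as lines read from a file would
sample_lines = ['cat\n', 'act\n', 'dog\n', 'Tac\n']
matches = [one_word.strip()
           for one_word in sample_lines
           if gematria_for(one_word.lower()) == without_newline]
print(matches)
```

Note that 'Tac' comes back with its capital letter intact, even though its score was computed on the lowercase version.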
Finally, this exercise demonstrates that when you’re using a comprehension, and your output expression is a complex one, it’s often a good idea to create a separate function that you can repeatedly call.
import string

def gematria_dict():
    return {char: index
            for index, char
            in enumerate(string.ascii_lowercase, 1)}

GEMATRIA = gematria_dict()

def gematria_for(word):
    return sum(GEMATRIA.get(one_char, 0)                        # ❶
               for one_char in word)                            # ❷

def gematria_equal_words(input_word):
    our_score = gematria_for(input_word.lower())                # ❸
    return [one_word.strip()                                    # ❹
            for one_word in open('/usr/share/dict/words')       # ❺
            if gematria_for(one_word.lower()) == our_score]     # ❻
❶ Gets the value for the current character, or 0 if the character isn’t in the “GEMATRIA” dict
❷ Iterates over the characters in “word”
❸ Gets the total score for the input word
❹ Removes leading and trailing whitespace from “one_word”
❺ Iterates over each word in the English-language dictionary file
❻ Only adds the current word to our returned list if its gematria score matches ours
Note: there is no Python Tutor link for this exercise, because it uses an external file.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Once you have data in a dict, you can often use a comprehension to transform it in various ways. Here are some additional exercises you can use to sharpen your skills with dicts and dict comprehensions:
Create a dict whose keys are city names, and whose values are temperatures in Fahrenheit. Now use a dict comprehension to transform this dict into a new one, keeping the old keys but turning the values into the temperature in degrees Celsius.
Create a list of tuples in which each tuple contains three elements: (1) the author’s first and last names, (2) the book’s title, and (3) the book’s price in U.S. dollars. Use a dict comprehension to turn this into a dict whose keys are the book’s titles, with the values being another (sub-)dict, with keys for (a) the author’s first name, (b) the author’s last name, and (c) the book’s price in U.S. dollars.
Create a dict whose keys are currency names and whose values are the price of that currency in U.S. dollars. Write a function that asks the user what currency they use, then returns the dict from the previous exercise as before, but with its prices converted into the requested currency.
Comprehensions are, without a doubt, one of the most difficult topics for people to learn when they start using Python. The syntax is a bit weird, and it’s not even obvious where and when to use comprehensions. In this chapter, you saw many examples of how and when to use comprehensions, which will hopefully help you not only to use them, but also to see opportunities to do so.