Chapter 3. The Python Language

This chapter is a guide to the Python language. To learn Python from scratch, we suggest you start with the appropriate links from https://www.python.org/about/gettingstarted/, depending on whether you’re a programming beginner or already have some programming experience. If you already know other programming languages well, and just want to learn specifics about Python, this chapter is for you. However, we’re not trying to teach Python: we cover a lot of ground at a pretty fast pace. We focus on the rules, and only secondarily point out best practices and style; as your Python style guide, use PEP 8 (optionally augmented by extra guidelines such as The Hitchhiker’s Guide’s, CKAN’s, and/or Google’s).

Lexical Structure

The lexical structure of a programming language is the set of basic rules that govern how you write programs in that language. It is the lowest-level syntax of the language, specifying such things as what variable names look like and how to denote comments. Each Python source file, like any other text file, is a sequence of characters. You can also usefully consider it a sequence of lines, tokens, or statements. These different lexical views complement each other. Python is very particular about program layout, especially regarding lines and indentation: pay attention to this information if you are coming to Python from another language.

Lines and Indentation

A Python program is a sequence of logical lines, each made up of one or more physical lines. Each physical line may end with a comment. A hash sign # that is not inside a string literal starts a comment. All characters after the #, up to but excluding the line end, are the comment: Python ignores them. A line containing only whitespace, possibly with a comment, is a blank line: Python ignores it.

In Python, the end of a physical line marks the end of most statements. Unlike in other languages, you don’t normally terminate Python statements with a delimiter, such as a semicolon (;). When a statement is too long to fit on a physical line, you can join two adjacent physical lines into a logical line by ensuring that the first physical line has no comment and ends with a backslash (\). However, Python also automatically joins adjacent physical lines into one logical line if an open parenthesis ((), bracket ([), or brace ({) has not yet been closed: take advantage of this mechanism to produce more readable code than you’d get with backslashes at line ends. Triple-quoted string literals can also span physical lines. Physical lines after the first one in a logical line are known as continuation lines. Indentation issues apply to the first physical line of each logical line, not to continuation lines.
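The three ways of spanning physical lines described above can be sketched as follows. This is a minimal illustration; the variable names are arbitrary and not part of any API.

```python
# Explicit joining: a backslash ends the first physical line,
# which therefore cannot carry a comment.
total_backslash = 1 + 2 + \
                  3 + 4

# Implicit joining: the open parenthesis keeps the logical line going,
# and each physical line may end with a comment.
total_parens = (1 + 2 +   # first physical line
                3 + 4)    # continuation line; its indentation is free

# A triple-quoted string literal may also span physical lines.
text = """two
lines"""
```

Both sums are the same value; the parenthesized form is usually easier to read and edit.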

Python uses indentation to express the block structure of a program. Unlike other languages, Python does not use braces, or other begin/end delimiters, around blocks of statements; indentation is the only way to denote blocks. Each logical line in a Python program is indented by the whitespace on its left. A block is a contiguous sequence of logical lines, all indented by the same amount; a logical line with less indentation ends the block. All statements in a block must have the same indentation, as must all clauses in a compound statement. The first statement in a source file must have no indentation (i.e., must not begin with any whitespace). Statements that you type at the interactive interpreter primary prompt >>> (covered in “Interactive Sessions”) must also have no indentation.
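As a small sketch of how indentation alone delimits blocks, consider the following function. The name classify is hypothetical and chosen only for illustration.

```python
def classify(numbers):
    """Hypothetical helper: split numbers into evens and odds."""
    evens, odds = [], []
    for n in numbers:          # header of a compound statement
        if n % 2 == 0:         # this block is indented one more level
            evens.append(n)
        else:                  # clauses of one statement align with each other
            odds.append(n)
    return evens, odds         # less indentation: the for block has ended

result = classify([1, 2, 3, 4])
```

No braces or begin/end markers appear anywhere: the return statement is outside the loop only because it is indented less than the loop body.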

v2 logically replaces each tab by up to eight spaces, so that the next character after the tab falls into logical column 9, 17, 25, and so on. Standard Python style is to use four spaces (never tabs) per indentation level.

Don’t mix spaces and tabs for indentation, since different tools (e.g., editors, email systems, printers) treat tabs differently. The -t and -tt options to the v2 Python interpreter (covered in “Command-Line Syntax and Options”) ensure against inconsistent tab and space usage in Python source code. In v3, Python does not allow mixing tabs and spaces for indentation.

Use spaces, not tabs

We recommend you configure your favorite editor to expand tabs to four spaces, so that all Python source code you write contains just spaces, not tabs. This way, all tools, including Python itself, are consistent in handling indentation in your Python source files. Optimal Python style is to indent blocks by exactly four spaces, and use no tabs.

Character Sets

A v3 source file can use any Unicode character, encoded as UTF-8. (Characters with codes between 0 and 127, AKA ASCII characters, encode in UTF-8 into the respective single bytes, so an ASCII text file is a fine v3 Python source file, too.)

A v2 source file is usually made up of characters from the ASCII set (character codes between 0 and 127).

In both v2 and v3, you may choose to tell Python that a certain source file is written in a different encoding. In this case, Python uses that encoding to read the file (in v2, you can use non-ASCII characters only in comments and string literals).

To let Python know that a source file is written with a nonstandard encoding, start your source file with a comment whose form must be, for example:

# coding: iso-8859-1

After coding:, write the name of a codec known to Python and ASCII-compatible, such as utf-8 or iso-8859-1. Note that this coding directive comment (also known as an encoding declaration) is taken as such only if it is at the start of a source file (possibly after the “shebang line,” covered in “Running Python Programs”). The only effect of a coding directive in v2 is to let you use non-ASCII characters in string literals and comments. Best practice is to use utf-8 for all of your text files, including Python source files.

Tokens

Python breaks each logical line into a sequence of elementary lexical components known as tokens. Each token corresponds to a substring of the logical line. The normal token types are identifiers, keywords, operators, delimiters, and literals, which we cover in the following sections. You may freely use whitespace between tokens to separate them. Some whitespace separation is necessary between logically adjacent identifiers or keywords; otherwise, Python would parse them as a single, longer identifier. For example, ifx is a single identifier; to write the keyword if followed by the identifier x, you need to insert some whitespace (e.g., if x).
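To see tokens concretely, one option in v3 is the standard library module tokenize. The sketch below assumes v3; the helper name token_strings is hypothetical.

```python
import io
import tokenize

def token_strings(source):
    """Return the text of each name, operator, and number token in source."""
    wanted = (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]

with_space = token_strings('if x: y = 1\n')   # keyword if, then identifier x
without_space = token_strings('ifx = 1\n')    # one longer identifier, ifx
```

Note that tokenize reports keywords and identifiers alike with the token type NAME; the distinction between them is made later, by the parser.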

Identifiers

An identifier is a name used to specify a variable, function, class, module, or other object. An identifier starts with a letter (in v2, A to Z or a to z; in v3, other characters that Unicode classifies as letters are also allowed) or an underscore (_), followed by zero or more letters, underscores, and digits (in v2, 0 to 9; in v3, other characters that Unicode classifies as digits or combining marks are also allowed). See this website for a table identifying which Unicode characters can start or continue a v3 identifier. Case is significant: lowercase and uppercase letters are distinct. Punctuation characters such as @, $, and ! are not allowed in identifiers.
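In v3, the string method isidentifier offers a quick way to check these rules. The following is a sketch under that assumption; the candidate names are arbitrary examples.

```python
candidates = ['total', '_private', 'x1', '1x', 'a$b', 'my-name', 'café']

# Keep only the strings that satisfy the v3 rules for identifiers.
valid = [name for name in candidates if name.isidentifier()]

# Case is significant: these are two distinct identifiers.
Total, total = 1, 2
distinct = Total != total
```

Be aware that isidentifier checks only the lexical form: it also returns True for keywords such as if, which you cannot use as regular identifiers.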

Normal Python style is to start class names with an uppercase letter, and other identifiers with a lowercase letter. Starting an identifier with a single leading underscore indicates by convention that the identifier is meant to be private. Starting an identifier with two leading underscores indicates a strongly private identifier; if the identifier also ends with two trailing underscores, however, this means that the identifier is a language-defined special name.

Single underscore _ in the interactive interpreter

The identifier _ (a single underscore) is special in interactive interpreter sessions: the interpreter binds _ to the result of the last expression statement it has evaluated interactively, if any.

Keywords

Python has keywords (31 of them in v2; 33 in v3), which are identifiers that Python reserves for special syntactic uses. Keywords contain lowercase letters only. You cannot use keywords as regular identifiers (thus, they’re sometimes known as “reserved words”). Some keywords begin simple statements or clauses of compound statements, while other keywords are operators. We cover all the keywords in detail in this book, either in this chapter or in Chapters 4, 5, and 6. The keywords in v2 are:

and       continue  except    global    lambda    raise     yield
as        def       exec      if        not       return
assert    del       finally   import    or        try
break     elif      for       in        pass      while
class     else      from      is        print     with

In v3, exec and print are no longer keywords: they were statements in v2, but they’re now functions in v3. (To use the print function in v2, start your source file with from __future__ import print_function, as mentioned in “Version Conventions”.) False, None, True, and nonlocal are new, additional keywords in v3 (out of them, False, None, and True were already built-in constants in v2, but they were not technically keywords). Special tokens async and await, covered in Chapter 18, are not currently keywords, but they’re scheduled to become keywords in Python 3.7.
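You can query the keywords of the interpreter you are running through the standard library module keyword. The sketch below assumes v3; the exact contents of keyword.kwlist depend on the Python version, so the example does not rely on its length.

```python
import keyword

while_is_keyword = keyword.iskeyword('while')        # a keyword in v2 and v3
print_is_keyword = keyword.iskeyword('print')        # a function in v3
nonlocal_is_keyword = keyword.iskeyword('nonlocal')  # new keyword in v3
all_keywords = keyword.kwlist                        # version-dependent list
```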

Operators

Python uses nonalphanumeric characters and character combinations as operators. Python recognizes the following operators, which are covered in detail in “Expressions and Operators”:

+    -    *    /    %    **   //
<<   >>   &    |    ^    ~
<    <=   >    >=   <>   !=   ==

In v3 only, you can also use @ as an operator (in matrix multiplication, covered in Chapter 15), although the character is technically a delimiter.

Delimiters

Python uses the following characters and combinations as delimiters in expressions, list, dictionary, and set literals, and various statements, among other purposes:

(    )    [    ]    {    }
,    :    .    `    =    ;    @
+=   -=   *=   /=   //=  %=
&=   |=   ^=   >>=  <<=  **=

The period (.) can also appear in floating-point literals (e.g., 2.3) and imaginary literals (e.g., 2.3j). The last two rows are the augmented assignment operators, which are delimiters, but also perform operations. We discuss the syntax for the various delimiters when we introduce the objects or statements using them.

The following characters have special meanings as part of other tokens:

'    "    #    \

' and " surround string literals. # outside of a string starts a comment. \ at the end of a physical line joins the following physical line into one logical line; \ is also an escape character in strings. The characters $ and ?, all control characters except whitespace, and, in v2, all characters with ISO codes above 126 (i.e., non-ASCII characters, such as accented letters) can never be part of the text of a Python program, except in comments or string literals. (To use non-ASCII characters in comments or string literals in v2, you must start your Python source file with a coding directive as covered in “Character Sets”.)

Literals

A literal is the direct denotation in a program of a data value (a number, string, or container). The following are number and string literals in Python:

42                       # Integer literal
3.14                     # Floating-point literal
1.0j                     # Imaginary literal
'hello'                  # String literal
"world"                  # Another string literal
"""Good
night"""                 # Triple-quoted string literal

Combining number and string literals with the appropriate delimiters, you can build literals that directly denote data values of container types:

[42, 3.14, 'hello']    # List
[]                     # Empty list
100, 200, 300          # Tuple
()                     # Empty tuple
{'x':42, 'y':3.14}     # Dictionary
{}                     # Empty dictionary
{1, 2, 4, 8, 'string'} # Set
# There is no literal to denote an empty set; use set() instead

We cover the syntax for literals in detail in “Data Types”, when we discuss the various data types Python supports.

Statements

You can look at a Python source file as a sequence of simple and compound statements. Unlike some other languages, Python has no “declarations” or other top-level syntax elements: just statements.

Simple statements

A simple statement is one that contains no other statements. A simple statement lies entirely within a logical line. As in many other languages, you may place more than one simple statement on a single logical line, with a semicolon (;) as the separator. However, one statement per line is the usual and recommended Python style, and makes programs more readable.

Any expression can stand on its own as a simple statement (we discuss expressions in “Expressions and Operators”). When working interactively, the interpreter shows the result of an expression statement you enter at the prompt (>>>) and binds the result to a global variable named _ (underscore). Apart from interactive sessions, expression statements are useful only to call functions (and other callables) that have side effects (e.g., perform output, change global variables, or raise exceptions).

An assignment is a simple statement that assigns values to variables, as we discuss in “Assignment Statements”. An assignment in Python is a statement and can never be part of an expression.

Compound statements

A compound statement contains one or more other statements and controls their execution. A compound statement has one or more clauses, aligned at the same indentation. Each clause has a header starting with a keyword and ending with a colon (:), followed by a body, which is a sequence of one or more statements. When the body contains multiple statements, also known as a block, these statements are on separate logical lines after the header line, indented four spaces rightward. The block lexically ends when the indentation returns to that of the clause header (or further left from there, to the indentation of some enclosing compound statement). Alternatively, the body can be a single simple statement, following the : on the same logical line as the header. The body may also consist of several simple statements on the same line with semicolons between them, but, as we’ve already mentioned, this is not good Python style.

Data Types

The operation of a Python program hinges on the data it handles. Data values in Python are known as objects; each object, AKA value, has a type. An object’s type determines which operations the object supports (in other words, which operations you can perform on the value). The type also determines the object’s attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. We cover object attributes and items in “Object attributes and items”.

The built-in type(obj) accepts any object as its argument and returns the type object that is the type of obj. The built-in function isinstance(obj, type) returns True when object obj has type type (or any subclass thereof); otherwise, it returns False.
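A brief sketch of type and isinstance in use follows; the variable names are arbitrary.

```python
n = 42

kind = type(n)                        # the type object int
is_int = isinstance(n, int)           # n is an instance of int
is_text = isinstance(n, str)          # n is not a string

# isinstance also accepts instances of subclasses: bool subclasses int.
flag_is_int = isinstance(True, int)
flag_type_is_int = type(True) is int  # but the exact type is bool, not int
```

The last two lines show why isinstance is usually preferable to comparing type(obj) with a type: it respects subclassing.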

Python has built-in types for fundamental data types such as numbers, strings, tuples, lists, dictionaries, and sets, as covered in the following sections. You can also create user-defined types, known as classes, as discussed in “Classes and Instances”.

Numbers

The built-in numeric types in Python include integers (int and long, in v2; in v3, there’s no distinction between kinds of integers), floating-point numbers, and complex numbers. The standard library also offers decimal floating-point numbers, covered in “The decimal Module”, and fractions, covered in “The fractions Module”. All numbers in Python are immutable objects; therefore, when you perform an operation on a number object, you produce a new number object. We cover operations on numbers, also known as arithmetic operations, in “Numeric Operations”.

Numeric literals do not include a sign: a leading + or -, if present, is a separate operator, as discussed in “Arithmetic Operations”.

Integer numbers

Integer literals can be decimal, binary, octal, or hexadecimal. A decimal literal is a sequence of digits in which the first digit is nonzero. A binary literal is 0b followed by a sequence of binary digits (0 or 1). An octal literal, in v2 only, can be 0 followed by a sequence of octal digits (0 to 7). This syntax can be quite misleading for the reader, and we do not recommend it; rather, use 0o followed by a sequence of octal digits, which works in both v2 and v3 and does not risk misleading the reader. A hexadecimal literal is 0x followed by a sequence of hexadecimal digits (0 to 9 and A to F, in either upper- or lowercase). For example:

1, 23, 3493                  # Decimal integer literals
0b010101, 0b110010           # Binary integer literals
0o1, 0o27, 0o6645            # Octal integer literals
0x1, 0x17, 0xDA5             # Hexadecimal integer literals

Integer literals have no defined upper bound (in v2 only, if greater than sys.maxint, integer literals are instances of built-in type long; v3 does not draw that distinction, but rather uses int as the type of all integers).
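The notation affects only how you write a literal, not the resulting object. A short sketch, reusing some of the literals shown above:

```python
# The same integer written in decimal, octal, and hexadecimal.
same_value = (3493 == 0o6645 == 0xDA5)

binary_value = 0b110010       # decimal 50

# Integers have no upper bound: this is far larger than a 64-bit value.
big = 2 ** 100
```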

Floating-point numbers

A floating-point literal is a sequence of decimal digits that includes a decimal point (.), an exponent suffix (an e or E, optionally followed by + or -, followed by one or more digits), or both. The leading character of a floating-point literal cannot be e or E; it may be any digit or a period (.). For example:

0., 0.0, .0, 1., 1.0, 1e0, 1.e0, 1.0e0  # Floating-point literals

A Python floating-point value corresponds to a C double and shares its limits of range and precision, typically 53 bits of precision on modern platforms. (For the exact range and precision of floating-point values on the current platform, see sys.float_info: we do not cover that in this book—see the online docs.)
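The following sketch inspects sys.float_info and shows one consequence of limited precision. It assumes a platform where float is an IEEE 754 double, which is the typical case but is not guaranteed by the language.

```python
import sys

# Every spelling of the literal denotes the same value.
forms_equal = (1. == 1.0 == 1e0 == 1.e0 == 1.0e0)

mantissa_bits = sys.float_info.mant_dig   # typically 53
largest = sys.float_info.max              # largest finite float

# 0.1, 0.2, and 0.3 have no exact binary representation,
# so the sum is close to, but not equal to, 0.3 on such platforms.
sum_is_exact = (0.1 + 0.2 == 0.3)
```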

Complex numbers

A complex number is made up of two floating-point values, one each for the real and imaginary parts. You can access the parts of a complex object z as read-only attributes z.real and z.imag. You can specify an imaginary literal as a floating-point or decimal literal followed by a j or J:

0j, 0.j, 0.0j, .0j, 1j, 1.j, 1.0j, 1e0j, 1.e0j, 1.0e0j

The j at the end of the literal indicates the square root of -1, as commonly used in electrical engineering (some other disciplines use i for this purpose, but Python has chosen j). There are no other complex literals. To denote any constant complex number, add or subtract a floating-point (or integer) literal and an imaginary one. For example, to denote the complex number that equals one, use expressions like 1+0j or 1.0+0.0j. Python performs the addition or subtraction at compile time.
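A minimal sketch of building and inspecting complex numbers; the variable names are arbitrary.

```python
z = 3 + 4j                        # integer literal plus imaginary literal

real_part = z.real                # 3.0, always a float
imag_part = z.imag                # 4.0
magnitude = abs(z)                # distance from the origin

minus_one = 1j * 1j               # j is the square root of -1
one = 1 + 0j                      # the complex number that equals one
```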

New in 3.6: Underscores in numeric literals

To assist visual assessment of the magnitude of a number, from 3.6 onward numeric literals can include single underscore (_) characters between digits or after any base specifier. As this implies, not only decimal numeric constants can benefit from this new notational freedom:

>>> 100_000.000_0001, 0x_FF_FF, 0o7_777, 0b_1010_1010
(100000.0000001, 65535, 4095, 170)

Sequences

A sequence is an ordered container of items, indexed by integers. Python has built-in sequence types known as strings (bytes and Unicode), tuples, and lists. Library and extension modules provide other sequence types, and you can write yet others yourself (as discussed in “Sequences”). You can manipulate sequences in a variety of ways, as discussed in “Sequence Operations”.

Iterables

A Python concept that generalizes the idea of “sequence” is that of iterables, covered in “The for Statement” and “Iterators”. All sequences are iterable: whenever we say you can use an iterable, you can in particular use a sequence (for example, a list).

Also, when we say that you can use an iterable, we mean, usually, a bounded iterable: an iterable that eventually stops yielding items. All sequences are bounded. Iterables, in general, can be unbounded, but if you try to use an unbounded iterable without special precautions, you could produce a program that never terminates, or one that exhausts all available memory.
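One common precaution with an unbounded iterable is to take only a bounded slice of it. The sketch below uses itertools.count and itertools.islice from the standard library; it illustrates the idea and is not the only possible approach.

```python
import itertools

# itertools.count() yields 0, 1, 2, ... without end: an unbounded iterable.
evens = (n for n in itertools.count() if n % 2 == 0)

# islice stops after five items, so building a list is safe here.
first_five = list(itertools.islice(evens, 5))
```

Calling list(evens) directly would never terminate, and would keep consuming memory until none is left.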

Strings

A built-in string object (bytes or Unicode) is a sequence of characters used to store and represent text-based information (byte strings, also known as byte objects, store and represent arbitrary sequences of binary bytes). Strings in Python are immutable: when you perform an operation on strings, you always produce a new string object, rather than mutating an existing string. String objects provide many methods, as discussed in detail in “Methods of String and Bytes Objects”.
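The immutability of strings can be seen in a short sketch:

```python
greeting = 'hello'

shouted = greeting.upper()      # a new string object; greeting is unchanged

# Trying to alter a string in place raises an exception.
try:
    greeting[0] = 'J'
except TypeError:
    mutation_failed = True
else:
    mutation_failed = False
```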

Different string types in v2 and v3

In v2, unadorned string literals denote byte strings; such literals denote Unicode (AKA text) strings in v3.

A string literal can be quoted or triple-quoted. A quoted string is a sequence of 0+ characters within matching quotes, single (') or double ("). For example:

'This is a literal string'
"This is another string"

The two different kinds of quotes function identically; having both lets you include one kind of quote inside of a string specified with the other kind, with no need to escape quote characters with the backslash character (\):

'I\'m a Python fanatic'        # a quote can be escaped
"I'm a Python fanatic"         # this way is more readable

Other things equal, using single quotes to denote string literals is better Python style. To have a string literal span multiple physical lines, you can use a \ as the last character of a line to indicate that the next line is a continuation:

'A not very long string \
that spans two lines'      # comment not allowed on previous line

To make the string contain two lines, you can embed a newline in the string:

'A not very long string\n\
that prints on two lines'  # comment not allowed on previous line

A better approach is to use a triple-quoted string, enclosed by matching triplets of quote characters (''' or, more commonly, """):

"""An even bigger
string that spans
three lines"""           # comments not allowed on previous lines

In a triple-quoted string literal, line breaks in the literal remain as newline characters in the resulting string object. You can start a triple-quoted literal with a backslash immediately followed by a newline, to avoid having the first line of the literal string’s content at a different indentation level from the rest. For example:

the_text = """\
First line
Second line
"""  # like 'First line\nSecond line\n' but more readable

The only character that cannot be part of a triple-quoted string is an unescaped backslash, while a quoted string cannot contain unescaped backslashes, nor line ends, nor the quote character that encloses it. The backslash character starts an escape sequence, which lets you introduce any character in either kind of string. We list Python’s string escape sequences in Table 3-1.

Table 3-1. String escape sequences
Sequence      Meaning                                        ASCII/ISO code
\<newline>    Ignore end of line                             None
\\            Backslash                                      0x5c
\'            Single quote                                   0x27
\"            Double quote                                   0x22
\a            Bell                                           0x07
\b            Backspace                                      0x08
\f            Form feed                                      0x0c
\n            Newline                                        0x0a
\r            Carriage return                                0x0d
\t            Tab                                            0x09
\v            Vertical tab                                   0x0b
\DDD          Octal value DDD                                As given
\xXX          Hexadecimal value XX                           As given
\other        Any other character: a two-character string    0x5c + as given

A variant of a string literal is a raw string. The syntax is the same as for quoted or triple-quoted string literals, except that an r or R immediately precedes the leading quote. In raw strings, escape sequences are not interpreted as in Table 3-1, but are literally copied into the string, including backslashes and newline characters. Raw string syntax is handy for strings that include many backslashes, especially regular expression patterns (see “Pattern-String Syntax”). A raw string cannot end with an odd number of backslashes: the last one would be taken as escaping the terminating quote.
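The difference between ordinary and raw literals shows up clearly in the length of the resulting strings. A short sketch; the pattern is an arbitrary example:

```python
ordinary = '\n'                 # one character: a newline
raw = r'\n'                     # two characters: a backslash and an n

# A regular expression pattern is easier to read as a raw string.
pattern = r'\d+\.\d+'
same_pattern = '\\d+\\.\\d+'    # the equivalent ordinary literal
```

Both pattern literals produce equal string objects of the same type; only the source notation differs.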

In Unicode string literals you can use \u followed by four hex digits, and \U followed by eight hex digits, to denote Unicode characters, and can also include the same escape sequences listed in Table 3-1. Unicode literals can also include the escape sequence \N{name}, where name is a standard Unicode name, as listed at http://www.unicode.org/charts/. For example, \N{Copyright Sign} indicates a Unicode copyright sign character (©).
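The Unicode escapes can be sketched as follows, assuming v3, where unadorned string literals are Unicode strings. The characters chosen are arbitrary examples.

```python
e_acute = '\u00e9'                     # four hex digits
snake = '\U0001F40D'                   # eight hex digits
copyright_sign = '\N{COPYRIGHT SIGN}'  # standard Unicode name

# Each escape sequence denotes exactly one character.
lengths = (len(e_acute), len(snake), len(copyright_sign))
```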

Raw Unicode string literals in v2 start with ur, not ru; raw byte string literals in v2 start with br, not rb (in v3, you can start them with either br or rb).

Raw strings are not a different type from other strings

Raw strings are not a different type from ordinary strings; they are just an alternative syntax for literals of the usual two string types, byte strings and Unicode.

New in 3.6, formatted string literals let you inject formatted expressions into your strings, which are therefore no longer constants but subject to evaluation at execution time. We cover these new literals in “New in 3.6: Formatted String Literals”. From a syntactic point of view, they can be regarded just as another kind of string literal.

Multiple string literals of any kind—quoted, triple-quoted, raw, bytes, formatted, Unicode—can be adjacent, with optional whitespace in between (except that, in v3, you cannot mix bytes and Unicode in this way). The compiler concatenates such adjacent string literals into a single string object. In v2, if any literal in the concatenation is Unicode, the whole result is Unicode. Writing a long string literal in this way lets you present it readably across multiple physical lines and gives you an opportunity to insert comments about parts of the string. For example:

marypop = ('supercalifragilistic' # Open paren->logical line continues
           'expialidocious')      # Indentation ignored in continuation

The string assigned to marypop is a single word of 34 characters.

Tuples

A tuple is an immutable ordered sequence of items. The items of a tuple are arbitrary objects and may be of different types. You can use mutable objects (e.g., lists) as tuple items; however, best practice is to avoid tuples with mutable items.

To denote a tuple, use a series of expressions (the items of the tuple) separated by commas (,); if every item is a literal, the whole assembly is a tuple literal. You may optionally place a redundant comma after the last item. You may group tuple items within parentheses, but the parentheses are necessary only where the commas would otherwise have another meaning (e.g., in function calls), or to denote empty or nested tuples. A tuple with exactly two items is also known as a pair. To create a tuple of one item, add a comma to the end of the expression. To denote an empty tuple, use an empty pair of parentheses. Here are some tuple literals, all in the optional parentheses (the parentheses are not optional in the last case):

(100, 200, 300)            # Tuple with three items
(3.14,)                    # Tuple with 1 item needs trailing comma
()                         # Empty tuple (parentheses NOT optional)

You can also call the built-in type tuple to create a tuple. For example:

tuple('wow')

This builds a tuple equal to that denoted by the tuple literal:

('w', 'o', 'w')

tuple() without arguments creates and returns an empty tuple, like (). When x is iterable, tuple(x) returns a tuple whose items are the same as those in x.
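A few of the tuple forms just described, as a brief sketch:

```python
point = 100, 200, 300          # parentheses are optional here
single = 3.14,                 # the trailing comma makes a one-item tuple
not_a_tuple = (3.14)           # parentheses alone: just a float
empty = ()                     # parentheses are required here
letters = tuple('wow')         # from any iterable
```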

Lists

A list is a mutable ordered sequence of items. The items of a list are arbitrary objects and may be of different types. To denote a list, use a series of expressions (the items of the list) separated by commas (,), within brackets ([]); if every item is a literal, the whole assembly is a list literal. You may optionally place a redundant comma after the last item. To denote an empty list, use an empty pair of brackets. Here are some example list literals:

[42, 3.14, 'hello']        # List with three items
[100]                      # List with one item
[]                         # Empty list

You can also call the built-in type list to create a list. For example:

list('wow')

This builds a list equal to that denoted by the list literal:

['w', 'o', 'w']

list() without arguments creates and returns an empty list, like []. When x is iterable, list(x) creates and returns a new list whose items are the same as those in x. You can also build lists with list comprehensions, covered in “List comprehensions”.
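Unlike tuples, lists can be altered in place. A minimal sketch:

```python
items = [42, 3.14, 'hello']

items.append('world')          # the same list object grows
items[0] = 0                   # and its items can be rebound

letters = list('wow')          # from any iterable
copy = list(items)             # a new list with the same items
```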

Sets

Python has two built-in set types, set and frozenset, to represent arbitrarily ordered collections of unique items. Items in a set may be of different types, but they must be hashable (see hash in Table 7-2). Instances of type set are mutable, and thus, not hashable; instances of type frozenset are immutable and hashable. You can’t have a set whose items are sets, but you can have a set (or frozenset) whose items are frozensets. Sets and frozensets are not ordered.

To create a set, you can call the built-in type set with no argument (this means an empty set) or one argument that is iterable (this means a set whose items are those of the iterable). You can similarly build a frozenset by calling frozenset.

Alternatively, to denote a (nonfrozen, nonempty) set, use a series of expressions (the items of the set) separated by commas (,) and within braces ({}); if every item is a literal, the whole assembly is a set literal. You may optionally place a redundant comma after the last item. Some example sets (two literals, one not):

{42, 3.14, 'hello'}     # Literal for a set with three items
{100}                   # Literal for a set with one item
set()                   # Empty set (can't use {}—empty dict!)

You can also build nonfrozen sets with set comprehensions, as discussed in “List comprehensions”.
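The hashability rules for set items can be sketched as follows; the variable names are arbitrary.

```python
letters = set('hello')         # duplicates collapse: four unique items
empty_set = set()
empty_dict = {}                # braces alone denote a dictionary

# frozensets are hashable, so they can be items of a set.
groups = {frozenset({1, 2}), frozenset({3})}

# Mutable sets are not hashable, so they cannot be items of a set.
try:
    invalid = {set([1, 2])}
except TypeError:
    set_in_set_failed = True
else:
    set_in_set_failed = False
```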

Dictionaries

A mapping is an arbitrary collection of objects indexed by nearly arbitrary values called keys. Mappings are mutable and, like sets but unlike sequences, are not (necessarily) ordered.

Python provides a single built-in mapping type: the dictionary type. Library and extension modules provide other mapping types, and you can write others yourself (as discussed in “Mappings”). Keys in a dictionary may be of different types, but they must be hashable (see hash in Table 7-2). Values in a dictionary are arbitrary objects and may be of any type. An item in a dictionary is a key/value pair. You can think of a dictionary as an associative array (known in some other languages as an “unordered map,” “hash table,” or “hash”).

To denote a dictionary, you can use a series of colon-separated pairs of expressions (the pairs are the items of the dictionary) separated by commas (,) within braces ({}); if every expression is a literal, the whole assembly is a dict literal. You may optionally place a redundant comma after the last item. Each item in a dictionary is written as key:value, where key is an expression giving the item’s key and value is an expression giving the item’s value. If a key appears more than once in a dictionary expression, only the last of the items with that key is kept in the resulting dictionary object, since dictionaries do not allow duplicate keys. To denote an empty dictionary, use an empty pair of braces.

Here are some dictionary literals:

{'x':42, 'y':3.14, 'z':7}    # Dictionary with three items, str keys
{1:2, 3:4}                   # Dictionary with two items, int keys
{1:'za', 'br':23}            # Dictionary with mixed key types
{}                           # Empty dictionary

You can also call the built-in type dict to create a dictionary in a way that, while usually less concise, can sometimes be more readable. For example, the dictionaries in the preceding snippet can equivalently be written as:

dict(x=42, y=3.14, z=7)      # Dictionary with three items, str keys
dict([(1, 2), (3, 4)])       # Dictionary with two items, int keys
dict([(1,'za'), ('br',23)])  # Dictionary with mixed key types
dict()                       # Empty dictionary

dict() without arguments creates and returns an empty dictionary, like {}. When the argument x to dict is a mapping, dict returns a new dictionary object with the same keys and values as x. When x is iterable, the items in x must be pairs, and dict(x) returns a dictionary whose items (key/value pairs) are the same as the items in x. If a key value appears more than once in x, only the last item from x with that key value is kept in the resulting dictionary.

When you call dict, in addition to, or instead of, the positional argument x, you may pass named arguments, each with the syntax name=value, where name is an identifier to use as an item’s key and value is an expression giving the item’s value. When you call dict and pass both a positional argument and one or more named arguments, if a key appears both in the positional argument and as a named argument, Python associates to that key the value given with the named argument (i.e., the named argument “wins”).
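As a minimal sketch of these rules (the sample data and variable names are hypothetical, chosen only for illustration), the following shows a repeated key in an iterable of pairs, copying from a mapping, and a named argument overriding the positional one:

```python
# Hypothetical sample data; key 'a' deliberately appears twice.
pairs = [('a', 1), ('b', 2), ('a', 3)]

from_pairs = dict(pairs)              # the last item with key 'a' is kept
duplicate = dict(from_pairs)          # a new dict built from a mapping
merged = dict(pairs, b=20, c=30)      # named arguments win over the positional one

print(from_pairs)                                       # {'a': 3, 'b': 2}
print(duplicate == from_pairs, duplicate is from_pairs)  # True False
print(merged)                                           # {'a': 3, 'b': 20, 'c': 30}
```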

You can also create a dictionary by calling dict.fromkeys. The first argument is an iterable whose items become the keys of the dictionary; the second argument is the value that corresponds to each and every key (all keys initially map to the same value). If you omit the second argument, it defaults to None. For example:

dict.fromkeys('hello', 2)   # same as {'h':2, 'e':2, 'l':2, 'o':2}
dict.fromkeys([1, 2, 3])    # same as {1:None, 2:None, 3:None}
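Because every key maps to the same value object, a mutable second argument is shared among all keys. The following sketch (variable names are hypothetical) contrasts this with a dict comprehension, which evaluates its value expression once per key:

```python
shared = dict.fromkeys('ab', [])      # both keys refer to one list object
shared['a'].append(1)
print(shared)                         # {'a': [1], 'b': [1]}

separate = {key: [] for key in 'ab'}  # a fresh list for each key
separate['a'].append(1)
print(separate)                       # {'a': [1], 'b': []}
```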

You can also build dicts with dict comprehensions, as discussed in “List comprehensions”.

None

The built-in None denotes a null object. None has no methods or other attributes. You can use None as a placeholder when you need a reference but you don’t care what object you refer to, or when you need to indicate that no object is there. Functions return None as their result unless they have specific return statements coded to return other values.

Callables

In Python, callable types are those whose instances support the function call operation (see “Calling Functions”). Functions are callable. Python provides several built-in functions (see “Built-in Functions”) and supports user-defined functions (see “The def Statement”). Generators are also callable (see “Generators”).

Types are also callable, as we already saw for the dict, list, set, and tuple built-in types. (See “Built-in Types” for a complete list of built-in types.) As we discuss in “Python Classes”, class objects (user-defined types) are also callable. Calling a type normally creates and returns a new instance of that type.

Other callables are methods, which are functions bound to class attributes, and instances of classes that supply a special method named __call__.
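To illustrate, here is a small sketch using the built-in callable function to test several kinds of objects. The class Multiplier is hypothetical, introduced only to show an instance made callable through __call__:

```python
class Multiplier:
    """Hypothetical class whose instances are callable via __call__."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor

double = Multiplier(2)                 # calling a type creates an instance
checks = [callable(len), callable(dict), callable(double),
          callable('abc'.upper), callable(42)]
print(checks)                          # [True, True, True, True, False]
print(double(21))                      # 42
```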

Boolean Values

Any data value in Python can be used as a truth value: true or false. Any nonzero number or nonempty container (e.g., string, tuple, list, set, or dictionary) is true. 0 (of any numeric type), None, and empty containers are false.

Beware using a float as a truth value

Be careful about using a floating-point number as a truth value: that’s like comparing the number for exact equality with zero, and floating-point numbers should almost never be compared for exact equality.

The built-in type bool is a subclass of int. The only two values of type bool are True and False, which have string representations of 'True' and 'False', but also numerical values of 1 and 0, respectively. Several built-in functions return bool results, as do comparison operators.

You can call bool(x) with any x as the argument. The result is True when x is true and False when x is false. Good Python style is not to use such calls when they are redundant, as they most often are: always write if x:, never any of if bool(x):, if x is True:, if x==True:, or if bool(x)==True:. However, you can use bool(x) to count the number of true items in a sequence. For example:

def count_trues(seq): return sum(bool(x) for x in seq)

In this example, the bool call ensures each item of seq is counted as 0 (if false) or 1 (if true), so count_trues is more general than sum(seq) would be.

When we write “expression is true,” we mean that bool(expression) would return True.
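The following sketch applies these truth-value rules to a hypothetical list of mixed items (samples is an illustrative name, not part of any API), and shows why sum(seq) alone would not work on such a list:

```python
def count_trues(seq):
    return sum(bool(x) for x in seq)

samples = [0, 1, '', 'a', None, [0], [], 0.0]   # hypothetical mixed items
truths = [bool(x) for x in samples]
print(truths)    # [False, True, False, True, False, True, False, False]
print(count_trues(samples))                     # 3

try:
    sum(samples)                                # cannot add int and str
except TypeError as exc:
    print('sum(samples) fails:', type(exc).__name__)

print(isinstance(True, int), True + True, int(False))   # True 2 0
```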

Variables and Other References

A Python program accesses data values through references. A reference is a “name” that refers to a value (object). References take the form of variables, attributes, and items. In Python, a variable or other reference has no intrinsic type. The object to which a reference is bound at a given time always has a type, but a given reference may be bound to objects of various types in the course of the program’s execution.

Variables

In Python, there are no “declarations.” The existence of a variable begins with a statement that binds the variable (in other words, sets a name to hold a reference to some object). You can also unbind a variable, resetting the name so it no longer holds a reference. Assignment statements are the most common way to bind variables and other references. The del statement unbinds references.

Binding a reference that was already bound is also known as rebinding it. Whenever we mention binding, we implicitly include rebinding (except where we explicitly exclude it). Rebinding or unbinding a reference has no effect on the object to which the reference was bound, except that an object goes away when nothing refers to it. The cleanup of objects with no references is known as garbage collection.

You can name a variable with any identifier except the 30-plus reserved as Python’s keywords (see “Keywords”). A variable can be global or local. A global variable is an attribute of a module object (see Chapter 6). A local variable lives in a function’s local namespace (see “Namespaces”).

Object attributes and items

The main distinction between the attributes and items of an object is in the syntax you use to access them. To denote an attribute of an object, use a reference to the object, followed by a period (.), followed by an identifier known as the attribute name. For example, x.y refers to one of the attributes of the object bound to name x, specifically that attribute whose name is 'y'.

To denote an item of an object, use a reference to the object, followed by an expression within brackets ([]). The expression in brackets is known as the item’s index or key, and the object is known as the item’s container. For example, x[y] refers to the item at the key or index bound to name y, within the container object bound to name x.

Attributes that are callable are also known as methods. Python draws no strong distinctions between callable and noncallable attributes, as some other languages do. All rules about attributes also apply to callable attributes (methods).

Accessing nonexistent references

A common programming error is trying to access a reference that does not exist. For example, a variable may be unbound, or an attribute name or item index may not be valid for the object to which you apply it. The Python compiler, when it analyzes and compiles source code, diagnoses only syntax errors. Compilation does not diagnose semantic errors, such as trying to access an unbound attribute, item, or variable. Python diagnoses semantic errors only when the errant code executes—that is, at runtime. When an operation is a Python semantic error, attempting it raises an exception (see Chapter 5). Accessing a nonexistent variable, attribute, or item—just like any other semantic error—raises an exception.
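As an illustration, the sketch below compiles without complaint, yet each access fails at runtime with a different exception. The helper try_access and the class Empty are hypothetical names used only for this demonstration:

```python
class Empty:
    """Hypothetical class with no attributes of its own."""

def try_access(thunk):
    """Call thunk() and return the name of the exception raised, if any."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None

results = [
    try_access(lambda: undefined_name),    # unbound variable
    try_access(lambda: Empty().missing),   # nonexistent attribute
    try_access(lambda: [1, 2, 3][10]),     # invalid list index
    try_access(lambda: {'a': 1}['b']),     # nonexistent dict key
]
print(results)   # ['NameError', 'AttributeError', 'IndexError', 'KeyError']
```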

Assignment Statements

Assignment statements can be plain or augmented. Plain assignment to a variable (e.g., name=value) is how you create a new variable or rebind an existing variable to a new value. Plain assignment to an object attribute (e.g., x.attr=value) is a request to object x to create or rebind attribute 'attr'. Plain assignment to an item in a container (e.g., x[k]=value) is a request to container x to create or rebind the item with index or key k.

Augmented assignment (e.g., name+=value) cannot, per se, create new references. Augmented assignment can rebind a variable, ask an object to rebind one of its existing attributes or items, or request the target object to modify itself. When you make a request to an object, it is up to the object to decide whether and how to honor the request, and whether to raise an exception.

Plain assignment

A plain assignment statement in the simplest form has the syntax:

target = expression

The target is known as the lefthand side (LHS), and the expression is the righthand side (RHS). When the assignment executes, Python evaluates the RHS expression, then binds the expression’s value to the LHS target. The binding does not depend on the type of the value. In particular, Python draws no strong distinction between callable and noncallable objects, as some other languages do, so you can bind functions, methods, types, and other callables to variables, just as you can numbers, strings, lists, and so on. This is part of functions and the like being first-class objects.

Details of the binding do depend on the kind of target. The target in an assignment may be an identifier, an attribute reference, an indexing, or a slicing:

An identifier

Is a variable name. Assigning to an identifier binds the variable with this name.

An attribute reference

Has the syntax obj.name, where obj is an arbitrary expression, and name is an identifier, known as an attribute name of the object. Assigning to an attribute reference asks object obj to bind its attribute named 'name'.

An indexing

Has the syntax obj[expr]. obj and expr are arbitrary expressions. Assigning to an indexing asks container obj to bind its item indicated by the value of expr, also known as the index or key of the item in the container.

A slicing

Has the syntax obj[start:stop] or obj[start:stop:stride]. obj, start, stop, and stride are arbitrary expressions. start, stop, and stride are all optional (i.e., obj[:stop:] and obj[:stop] are also syntactically correct slicings, equivalent to obj[None:stop:None]). Assigning to a slicing asks container obj to bind or unbind some of its items. Assigning to a slicing such as obj[start:stop:stride] is equivalent to assigning to the indexing obj[slice(start, stop, stride)]. See Python’s built-in type slice (Table 7-1), whose instances represent slices.

We’ll get back to indexing and slicing targets when we discuss operations on lists, in “Modifying a list”, and on dictionaries, in “Indexing a Dictionary”.
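A short sketch of the equivalence between a slicing target and an indexing with a slice object (the list contents are arbitrary illustrative values):

```python
x = list(range(10))
x[2:8:2] = ['a', 'b', 'c']             # slicing as the assignment target

y = list(range(10))
y[slice(2, 8, 2)] = ['a', 'b', 'c']    # equivalent indexing with a slice object

print(x)                               # [0, 1, 'a', 3, 'b', 5, 'c', 7, 8, 9]
print(x == y)                          # True
print(x[:3] == x[slice(None, 3, None)])   # True
```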

When the target of the assignment is an identifier, the assignment statement specifies the binding of a variable. This is never disallowed: when you request it, it takes place. In all other cases, the assignment statement specifies a request to an object to bind one or more of its attributes or items. An object may refuse to create or rebind some (or all) attributes or items, raising an exception if you attempt a disallowed creation or rebinding (see also __setattr__ in Table 4-1 and __setitem__ in “Container methods”).
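For instance, built-in immutable objects refuse such requests. In this sketch (the list refused is a hypothetical name used to record outcomes), a tuple rejects item rebinding and an int rejects the creation of a new attribute:

```python
refused = []

t = (1, 2, 3)
try:
    t[0] = 99                 # tuples refuse to rebind items
except TypeError as exc:
    refused.append(type(exc).__name__)

n = 42
try:
    n.color = 'red'           # int instances refuse new attributes
except AttributeError as exc:
    refused.append(type(exc).__name__)

t = 'rebound'                 # binding the variable itself always succeeds
print(refused, t)             # ['TypeError', 'AttributeError'] rebound
```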

A plain assignment can use multiple targets and equals signs (=). For example:

a = b = c = 0

binds variables a, b, and c to the same value, 0. Each time the statement executes, the RHS expression evaluates just once, no matter how many targets are part of the statement. Each target, left to right, is bound to the single object returned by the expression, just as if several simple assignments executed one after the other.
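Since the RHS evaluates just once, all targets refer to the very same object, which matters when that object is mutable. A minimal sketch:

```python
a = b = c = []        # the RHS evaluates once: one list, three names
a.append(1)
print(b, c)           # [1] [1]
print(a is b is c)    # True

d, e = [], []         # two distinct list objects
d.append(1)
print(e)              # []
```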

The target in a plain assignment can list two or more references separated by commas, optionally enclosed in parentheses or brackets. For example:

a, b, c = x

This statement requires x to be an iterable with exactly three items, and binds a to the first item, b to the second, and c to the third. This kind of assignment is known as an unpacking assignment. The RHS expression must be an iterable with exactly as many items as there are references in the target; otherwise, Python raises an exception. Each reference in the target gets bound to the corresponding item in the RHS. An unpacking assignment can also be used to swap references:

a, b = b, a

This assignment statement rebinds name a to what name b was bound to, and vice versa. In v3, exactly one of the multiple targets of an unpacking assignment may be preceded by *. That starred target is bound to a list of all items, if any, that were not assigned to other targets. For example, in v3:

first, *middle, last = x

when x is a list, is the same as (but more concise, clearer, more general, and faster than):

first, middle, last = x[0], x[1:-1], x[-1]

Each of these assignments requires x to have at least two items. The second form, compatible with v2, requires the values in x to be a sequence, accessible by numeric index; the first, v3-only, form, is fine with x being any iterable with at least two items. This v3-only feature is known as extended unpacking.
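The sketch below, assuming a v3 interpreter, applies extended unpacking to iterables that are not lists and shows what happens when there are too few items:

```python
first, *middle, last = range(5)     # any iterable works, not just sequences
print(first, middle, last)          # 0 [1, 2, 3] 4

head, *rest = 'ab'
print(head, rest)                   # a ['b']

lo, *between, hi = (1, 2)           # exactly two items: starred target is empty
print(between)                      # []

try:
    p, *q, r = [1]                  # too few items for the unstarred targets
except ValueError as exc:
    error_name = type(exc).__name__
    print('unpacking failed:', error_name)
```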

Augmented assignment

An augmented assignment (sometimes also known as an in-place assignment) differs from a plain assignment in that, instead of an equals sign (=) between the target and the expression, it uses an augmented operator, which is a binary operator followed by =. The augmented operators are +=, -=, *=, /=, //=, %=, **=, |=, >>=, <<=, &=, and ^= (and, in v3 only, @=). An augmented assignment can have only one target on the LHS; augmented assignment doesn’t support multiple targets.

In an augmented assignment, just as in a plain one, Python first evaluates the RHS expression. Then, when the LHS refers to an object that has a special method for the appropriate in-place version of the operator, Python calls the method with the RHS value as its argument. It is up to the method to modify the LHS object appropriately and return the modified object (“Special Methods” covers special methods). When the LHS object has no appropriate in-place special method, Python applies the corresponding binary operator to the LHS and RHS objects, then rebinds the target reference to the operator’s result. For example, x+=y is like x=x.__iadd__(y) when x has special method __iadd__ for in-place addition. Otherwise, x+=y is like x=x+y.

Augmented assignment never creates its target reference; the target must already be bound when augmented assignment executes. Augmented assignment can rebind the target reference to a new object, or modify the same object to which the target reference was already bound. Plain assignment, in contrast, can create or rebind the LHS target reference, but it never modifies the object, if any, to which the target reference was previously bound. The distinction between objects and references to objects is crucial here. For example, x=x+y does not modify the object to which name x was originally bound. Rather, it rebinds the name x to refer to a new object. x+=y, in contrast, modifies the object to which the name x is bound, when that object has special method __iadd__; otherwise, x+=y rebinds the name x to a new object, just like x=x+y.
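To make the distinction concrete, this sketch keeps a second reference (the alias names are hypothetical) to observe whether the original object was modified or the name was rebound:

```python
lst = [1, 2]
alias = lst
lst += [3]                 # list has __iadd__: the object is modified in place
print(alias, lst is alias)            # [1, 2, 3] True

tup = (1, 2)
alias_t = tup
tup += (3,)                # tuple has no __iadd__: tup is rebound to a new object
print(alias_t, tup, tup is alias_t)   # (1, 2) (1, 2, 3) False

lst2 = [1, 2]
alias2 = lst2
lst2 = lst2 + [3]          # plain assignment rebinds; the old object is unchanged
print(alias2, lst2)                   # [1, 2] [1, 2, 3]
```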

del Statements

Despite its name, a del statement unbinds references—it does not, per se, delete objects. Object deletion may automatically follow as a consequence, by garbage collection, when no more references to an object exist.

A del statement consists of the keyword del, followed by one or more target references separated by commas (,). Each target can be a variable, attribute reference, indexing, or slicing, just like for assignment statements, and must be bound at the time del executes. When a del target is an identifier, the del statement means to unbind the variable. If the identifier was bound, unbinding it is never disallowed; when requested, it takes place.

In all other cases, the del statement specifies a request to an object to unbind one or more of its attributes or items. An object may refuse to unbind some (or all) attributes or items, raising an exception if you attempt a disallowed unbinding (see also __delattr__ in “General-Purpose Special Methods” and __delitem__ in “Container methods”). Unbinding a slicing normally has the same effect as assigning an empty sequence to that slicing, but it is up to the container object to implement this equivalence.

Containers are also allowed to have del cause side effects. For example, assuming del C[2] succeeds, when C is a dict, this makes future references to C[2] invalid (raising KeyError) until and unless you assign to C[2] again; but when C is a list, del C[2] implies that every following item of C “shifts left by one.” So, if C is long enough, future references to C[2] are still valid, but denote a different item than they did before the del.
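A brief sketch of these behaviors, with arbitrary illustrative contents:

```python
d = {1: 'a', 2: 'b', 3: 'c'}
del d[2]
print(2 in d)              # False: d[2] would now raise KeyError

lst = ['a', 'b', 'c', 'd']
del lst[2]
print(lst[2])              # d: the following items shifted left by one

x = [1, 2]
y = x
del x                      # unbinds the name x only
print(y)                   # [1, 2]: the object survives, still referenced by y
```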

Expressions and Operators

An expression is a “phrase” of code, which Python evaluates to produce a value. The simplest expressions are literals and identifiers. You build other expressions by joining subexpressions with the operators and/or delimiters listed in Table 3-2. This table lists operators in decreasing order of precedence, higher precedence before lower. Operators listed together have the same precedence. The third column lists the associativity of the operator: L (left-to-right), R (right-to-left), or NA (non-associative).

Table 3-2. Operator precedence in expressions
Operator                                Description                                                 Associativity
{ key : expr ,...}                      Dictionary creation                                         NA
{ expr ,...}                            Set creation                                                NA
[ expr ,...]                            List creation                                               NA
( expr ,...)                            Tuple creation or just parentheses                          NA
f ( expr ,...)                          Function call                                               L
x [ index : index ]                     Slicing                                                     L
x [ index ]                             Indexing                                                    L
x . attr                                Attribute reference                                         L
x ** y                                  Exponentiation (x to the yth power)                         R
~ x                                     Bitwise NOT                                                 NA
+x, -x                                  Unary plus and minus                                        NA
x*y, x/y, x//y, x%y                     Multiplication, division, truncating division, remainder    L
x+y, x-y                                Addition, subtraction                                       L
x<<y, x>>y                              Left-shift, right-shift                                     L
x & y                                   Bitwise AND                                                 L
x ^ y                                   Bitwise XOR                                                 L
x | y                                   Bitwise OR                                                  L
x<y, x<=y, x>y, x>=y, x<>y (v2 only),   Comparisons (less than, less than or equal, greater than,   NA
x!=y, x==y                              greater than or equal, inequality, equality) [a]
x is y, x is not y                      Identity tests                                              NA
x in y, x not in y                      Membership tests                                            NA
not x                                   Boolean NOT                                                 NA
x and y                                 Boolean AND                                                 L
x or y                                  Boolean OR                                                  L
x if expr else y                        Ternary operator                                            NA
lambda arg,...: expr                    Anonymous simple function                                   NA

[a] In v2, <> and != are alternate forms of the same operator. != is the preferred version; <> is obsolete, and not supported in v3.

In Table 3-2, expr, key, f, index, x, and y indicate any expression, while attr and arg indicate any identifier. The notation ,... means commas join zero or more repetitions. In all such cases, a trailing comma is optional and innocuous.

Comparison Chaining

You can chain comparisons, implying a logical and. For example:

a < b <= c < d

has the same meaning as:

a < b and b <= c and c < d

The chained form is more readable, and evaluates each subexpression at most once.4
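The sketch below uses a hypothetical helper, traced, that records each evaluation, to show that every subexpression in a chain evaluates at most once and that evaluation stops at the first false comparison:

```python
calls = []

def traced(value):
    """Hypothetical helper: record each evaluation and return the value."""
    calls.append(value)
    return value

result1 = traced(1) < traced(5) <= traced(7) < traced(9)
calls1 = list(calls)
print(result1, calls1)          # True [1, 5, 7, 9]

calls.clear()
result2 = traced(3) < traced(2) < traced(7)   # stops after 3 < 2 is false
calls2 = list(calls)
print(result2, calls2)          # False [3, 2]
```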

Short-Circuiting Operators

The and and or operators short-circuit their operands’ evaluation: the righthand operand evaluates only when its value is necessary to get the truth value of the entire and or or operation.

In other words, x and y first evaluates x. When x is false, the result is x; otherwise, the result is y. Similarly, x or y first evaluates x. When x is true, the result is x; otherwise, the result is y.

and and or don’t force their results to be True or False, but rather return one or the other of their operands. This lets you use these operators more generally, not just in Boolean contexts. and and or, because of their short-circuiting semantics, differ from other operators, which fully evaluate all operands before performing the operation. and and or let the left operand act as a guard for the right operand.
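A few illustrative cases (the values are arbitrary) showing that and and or return one of their operands, and that the left operand can guard the right one:

```python
r1 = 0 or 'default'            # left operand is false: result is the right one
r2 = 'value' or 'default'      # left operand is true: result is the left one
r3 = [] and 1 / 0              # right operand never evaluates: no exception
print(r1, r2, r3)              # default value []

items = []
first = items and items[0]     # guard: no IndexError on an empty list
print(first)                   # []

items = [10, 20]
first = items and items[0]
print(first)                   # 10
```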

The ternary operator

Another short-circuiting operator is the ternary operator if/else:

whentrue if condition else whenfalse

Each of whentrue, whenfalse, and condition is an arbitrary expression. condition evaluates first. When condition is true, the result is whentrue; otherwise, the result is whenfalse. Only one of the subexpressions whentrue and whenfalse evaluates, depending on the truth value of condition.

The order of the subexpressions in this ternary operator may be a bit confusing. The recommended style is to always place parentheses around the whole expression.
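As a small sketch following the parenthesized style, the two functions here (describe and safe_inverse, both hypothetical names) show the evaluation order and the short-circuiting of the unselected branch:

```python
def describe(n):
    """Hypothetical helper illustrating the ternary operator."""
    return ('even' if n % 2 == 0 else 'odd')

def safe_inverse(x):
    # 1 / x evaluates only when x is true (nonzero), so x == 0 raises nothing
    return (1 / x if x else None)

print(describe(3), describe(4))            # odd even
print(safe_inverse(4), safe_inverse(0))    # 0.25 None
```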

Numeric Operations

Python offers the usual numeric operations, as we’ve just seen in Table 3-2. Numbers are immutable objects: when you perform operations on number objects, you produce a new number object, never modify existing ones. You can access the parts of a complex object z as read-only attributes z.real and z.imag. Trying to rebind these attributes raises an exception.

A number’s optional + or - sign, and the + that joins a floating-point literal to an imaginary one to make a complex number, are not part of the literals’ syntax. They are ordinary operators, subject to normal operator precedence rules (see Table 3-2). For example, -2**2 evaluates to -4: exponentiation has higher precedence than unary minus, so the whole expression parses as -(2**2), not as (-2)**2.
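A quick sketch of these points, run under v3 (variable names are illustrative):

```python
print(-2 ** 2)             # -4: parsed as -(2 ** 2)
print((-2) ** 2)           # 4

z = 3 + 4j                 # the + here is an ordinary operator
print(z.real, z.imag)      # 3.0 4.0

try:
    z.real = 5             # the parts of a complex number are read-only
except AttributeError as exc:
    rebinding_error = type(exc).__name__
    print('cannot rebind z.real:', rebinding_error)
```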

Numeric Conversions

You can perform arithmetic operations and comparisons between any two numbers of Python built-in types. If the operands’ types differ, coercion applies: Python converts the operand with the “smaller” type to the “larger” type. The types, in order from smallest to largest, are integers, floating-point numbers, and complex numbers. You can request an explicit conversion by passing a noncomplex numeric argument to any of the built-in number types: int, float, and complex. int drops its argument’s fractional part, if any (e.g., int(9.8) is 9). You can also call complex with two numeric arguments, giving real and imaginary parts. You cannot convert a complex to another numeric type in this way, because there is no single unambiguous way to convert a complex number into, for example, a float.

You can also call each built-in numeric type with a string argument with the syntax of an appropriate numeric literal, with small extensions: the argument string may have leading and/or trailing whitespace, may start with a sign, and—for complex numbers—may sum or subtract a real part and an imaginary one. int can also be called with two arguments: the first one a string to convert, and the second the radix, an integer between 2 and 36 to use as the base for the conversion (e.g., int('101', 2) returns 5, the value of '101' in base 2). For radices larger than 10, the appropriate subset of letters from the start of the alphabet (in either lower- or uppercase) are the extra needed “digits.”
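The following sketch collects several of these conversions in one place, assuming v3; the sample strings are arbitrary:

```python
print(1 + 2.5, 2.5 + 1j)              # 3.5 (2.5+1j): implicit widening
print(int(9.8), int(-9.8))            # 9 -9: the fractional part is dropped
print(complex(1, 2))                  # (1+2j)

print(int('  -42  '), float(' 3.5 ')) # -42 3.5: whitespace and sign allowed
print(complex('1+2j'))                # (1+2j)
print(int('101', 2), int('ff', 16), int('Z', 36))   # 5 255 35

try:
    float(1 + 2j)                     # no unambiguous complex-to-float conversion
except TypeError as exc:
    conversion_error = type(exc).__name__
    print('float(complex) fails:', conversion_error)
```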

Arithmetic Operations

Python arithmetic operations behave in rather obvious ways, with the possible exception of division and exponentiation.

Division

If the right operand of /, //, or % is 0, Python raises a runtime exception. The // operator performs truncating division, which means it returns an integer result (converted to the same type as the wider operand) and ignores the remainder, if any. More precisely, // rounds toward negative infinity (floor division), so -7//2 is -4, not -3.

When both operands are integers: in v3, the / operator performs true division, returning a floating-point result (or a complex result if either operand is a complex number); in v2, / performs truncating division, like //. To perform truncating division in v3, use //. To have / perform true division on integer operands in v2, use option -Qnew on the Python command line (not recommended), or, better, begin your source file or interactive session with the statement:

from __future__ import division

This statement ensures that the operator / (within the module that starts with this statement) works without truncation on operands of any type.

To ensure that the behavior of division does not depend on the exact version of Python you’re using, always use // when you want truncating division. When you do not want truncation, use /, but also ensure that at least one operand is not an integer. For example, instead of using just a/b, code 1.0*a/b (float(a)/b is marginally slower, and fails when a is complex) to avoid making any assumption on the types of a and b. To check whether your v2 program has version dependencies in its use of division, use the option -Qwarn on the Python command line to get runtime warnings about uses of / on integer operands.

The built-in divmod function takes two numeric arguments and returns a pair whose items are the quotient and remainder, so you don’t have to use both // for the quotient and % for the remainder.
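A short sketch of the division operators and divmod under v3, including negative operands, where // rounds toward negative infinity:

```python
print(7 / 2, 7 // 2, 7 % 2)            # 3.5 3 1
print(-7 // 2, -7 % 2)                 # -4 1
print(7.0 // 2)                        # 3.0: result takes the wider type
print(divmod(7, 2), divmod(-7, 2))     # (3, 1) (-4, 1)

try:
    1 // 0
except ZeroDivisionError as exc:
    division_error = type(exc).__name__
    print('division by zero:', division_error)
```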

Exponentiation

The exponentiation (“raise to power”) operation, a**b, in v2, raises an exception when a is less than zero and b is a floating-point value with a nonzero fractional part, but, in v3, it returns the appropriate complex number in such cases.

The built-in pow(a, b) function returns the same result as a**b. With three arguments, pow(a, b, c) returns the same result as (a**b)%c but is faster.
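For example, in v3 (the operands below are arbitrary):

```python
print(2 ** 10, pow(2, 10))                  # 1024 1024
print(pow(3, 200, 7) == (3 ** 200) % 7)     # True, without building 3 ** 200
root = (-8) ** 0.5                          # negative base, fractional exponent
print(type(root).__name__)                  # complex
```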

Comparisons

All objects, including numbers, can be compared for equality (==) and inequality (!=). Comparisons requiring order (<, <=, >, >=) may be used between any two numbers, unless either operand is complex, in which case they raise runtime exceptions. All these operators return Boolean values (True or False). Beware comparing floating-point numbers for equality, as the online tutorial explains.
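The sketch below illustrates these rules in v3. Using math.isclose for approximate comparison is a suggestion of this example, not something the text above prescribes:

```python
import math

print(1 == 1.0 == (1 + 0j))              # True: equality works across types
print(0.1 + 0.2 == 0.3)                  # False: floating-point rounding
print(math.isclose(0.1 + 0.2, 0.3))      # True: approximate comparison

try:
    (1 + 2j) < (3 + 4j)                  # complex numbers have no ordering
except TypeError as exc:
    ordering_error = type(exc).__name__
    print('cannot order complex numbers:', ordering_error)
```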

Bitwise Operations on Integers

Integers can be interpreted as strings of bits and used with the bitwise operations shown in Table 3-2. Bitwise operators have lower priority than arithmetic operators. Positive integers are conceptually extended by an unbounded string of 0 bits on the left. Negative integers, as they’re held in two’s complement representation, are conceptually extended by an unbounded string of 1 bits on the left.
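A few illustrative cases (operands chosen arbitrarily) showing the operators, the conceptual sign extension of negative integers, and the lower priority of shifts relative to arithmetic:

```python
print(bin(6 & 3), bin(6 | 3), bin(6 ^ 3))   # 0b10 0b111 0b101
print(~5)                  # -6: all bits inverted, including the sign extension
print(-1 & 0xFF)           # 255: -1 behaves as an unbounded string of 1 bits
print(-8 >> 1)             # -4: the sign is preserved when shifting right
print(1 << 2 + 1)          # 8: + binds tighter than <<, so this is 1 << 3
```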

Sequence Operations

Python supports a variety of operations applicable to all sequences, including strings, lists, and tuples. Some sequence operations apply to all containers (including sets and dictionaries, which are not sequences); some apply to all iterables (meaning “any object over which you can loop,” as covered in “Iterables”; all containers, be they sequences or not, are iterable, and so are many objects that are not containers, such as files, covered in “The io Module”, and generators, covered in “Generators”). In the following we use the terms sequence, container, and iterable quite precisely, to indicate exactly which operations apply to each category.

Sequences in General

Sequences are ordered containers with items that are accessible by indexing and slicing. The built-in len function takes any container as an argument and returns the number of items in the container. The built-in min and max functions take one argument, a nonempty iterable whose items are comparable, and return the smallest and largest items, respectively. You can also call min and max with multiple arguments, in which case they return the smallest and largest arguments, respectively. The built-in sum function takes one argument, an iterable whose items are numbers, and returns the sum of the numbers.
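As a brief sketch of these built-ins on arbitrary sample data:

```python
print(len('abc'), len({'a': 1, 'b': 2}))   # 3 2: len accepts any container
print(min([3, 1, 2]), max(3, 1, 2))        # 1 3: one iterable, or several arguments
print(sum(x * x for x in range(4)))        # 14: sum accepts any iterable of numbers

try:
    min([])                                # the iterable must be nonempty
except ValueError as exc:
    empty_error = type(exc).__name__
    print('min of empty iterable:', empty_error)
```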

Sequence conversions

There is no implicit conversion between different sequence types except that Python, in v2 only, converts byte strings to Unicode strings if needed. (We cover string conversion in detail in “Unicode”.) You can call the built-ins tuple and list with a single argument (any iterable) to get a new instance of the type you’re calling, with the same items, in the same order, as in the argument.

Concatenation and repetition

You can concatenate sequences of the same type with the + operator. You can multiply a sequence S by an integer n with the * operator. S*n or n*S is the concatenation of n copies of S. When n<=0, S*n is an empty sequence of the same type as S.
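For example (arbitrary sample values):

```python
print([1, 2] + [3])             # [1, 2, 3]
print('ab' * 3, 3 * 'ab')       # ababab ababab
print((1, 2) * 0, [1] * -2)     # () []

try:
    [1] + (2,)                  # list + tuple: the types differ
except TypeError as exc:
    concat_error = type(exc).__name__
    print('cannot concatenate list and tuple:', concat_error)
```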

Membership testing

The x in S operator checks whether object x equals any item in the sequence (or other kind of container or iterable) S. It returns True when it does and False when it doesn’t. The x not in S operator is equivalent to not (x in S). For dictionaries, x in S tests for the presence of x as a key. In the specific case of strings, though, x in S is more widely applicable; in this case, x in S tests whether x equals any substring of S, not just any single character.
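A few illustrative membership tests (sample values are arbitrary):

```python
print(2 in [1, 2, 3])           # True
print(3 not in (1, 2))          # True
print('a' in {'a': 1})          # True: dictionaries test keys
print(1 in {'a': 1})            # False: values are not considered
print('ell' in 'hello')         # True: any substring, not just one character
print('' in 'hello')            # True: the empty string is a substring
```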

Indexing a sequence

To denote the nth item of a sequence S, use an indexing: S[n]. Indexing is zero-based (S’s first item is S[0]). If S has L items, the index n may be 0, 1…up to and including L-1, but no larger. n may also be -1, -2…down to and including -L, but no smaller. A negative n (e.g., -1) denotes the same item in S as L+n (e.g., L + -1) does. In other words, S[-1], like S[L-1], is the last element of S, S[-2] is the next-to-last one, and so on. For example:

x = [1, 2, 3, 4]
x[1]                  # 2
x[-1]                 # 4

Using an index >=L or <-L raises an exception. Assigning to an item with an invalid index also raises an exception. You can add elements to a list, but to do so you assign to a slice, not an item, as we’ll discuss shortly.

Slicing a sequence

To indicate a subsequence of S, you can use a slicing, with the syntax S[i:j], where i and j are integers. S[i:j] is the subsequence of S from the ith item, included, to the jth item, excluded. In Python, ranges always include the lower bound and exclude the upper bound. A slice is an empty subsequence when j is less than or equal to i, or when i is greater than or equal to L, the length of S. You can omit i when it is equal to 0, so that the slice begins from the start of S. You can omit j when it is greater than or equal to L, so that the slice extends all the way to the end of S. You can even omit both indices, to mean a shallow copy of the entire sequence: S[:]. Either or both indices may be less than 0. Here are some examples:

x = [1, 2, 3, 4]
x[1:3]                 # [2, 3]
x[1:]                  # [2, 3, 4]
x[:2]                  # [1, 2]

A negative index n in a slicing indicates the same spot in S as L+n, just like it does in an indexing. An index greater than or equal to L means the end of S, while a negative index less than or equal to -L means the start of S. Slicing can use the extended syntax S[i:j:k]. k is the stride of the slice, meaning the distance between successive indices. S[i:j] is equivalent to S[i:j:1], S[::2] is the subsequence of S that includes all items that have an even index in S, and S[::-1]5 has the same items as S, but in reverse order. With a negative stride, in order to have a nonempty slice, the second (“stop”) index needs to be smaller than the first (“start”) one—the reverse of the condition that must hold when the stride is positive. A stride of 0 raises an exception.

y = list(range(10))
y[-5:]           # last five items: [5, 6, 7, 8, 9]
y[::2]           # every other item: [0, 2, 4, 6, 8]
y[10:0:-2]       # every other item, in reverse order: [9, 7, 5, 3, 1]
y[:0:-2]         # every other item, in reverse order (simpler): [9, 7, 5, 3, 1]
y[::-2]          # every other item, in reverse order (best): [9, 7, 5, 3, 1]

Strings

String objects (byte strings, as well as text, AKA Unicode, ones) are immutable: attempting to rebind or delete an item or slice of a string raises an exception. The items of a string object (corresponding to each of the characters in the string) are themselves strings of the same kind, each of length 1—Python has no special data type for “single characters” (except for the items of a bytes object in v3: in that case, indexing produces an int). All slices of a string are strings of the same kind. String objects have many methods, covered in “Methods of String and Bytes Objects”.

Tuples

Tuple objects are immutable: therefore, attempting to rebind or delete an item or slice of a tuple raises an exception. The items of a tuple are arbitrary objects and may be of different types; tuple items may be mutable, but we don’t recommend this practice, as it can be confusing. The slices of a tuple are also tuples. Tuples have no normal (nonspecial) methods, except count and index, with the same meanings as for lists; they do have some of the special methods covered in “Special Methods”.

Lists

List objects are mutable: you may rebind or delete items and slices of a list. Items of a list are arbitrary objects and may be of different types. Slices of a list are lists.

Modifying a list

You can modify a single item in a list by assigning to an indexing. For instance:

x = [1, 2, 3, 4]
x[1] = 42                # x is now [1, 42, 3, 4]

Another way to modify a list object L is to use a slice of L as the target (LHS) of an assignment statement. The RHS of the assignment must be an iterable. When the LHS slice is in extended form (i.e., the slicing specifies a stride!=1), then the RHS must have just as many items as the number of items in the LHS slice. When the LHS slicing does not specify a stride, or explicitly specifies a stride of 1, the LHS slice and the RHS may each be of any length; assigning to such a slice of a list can make the list longer or shorter. For example:

x = [1, 2, 3, 4]
x[1:3] = [22, 33, 44]     # x is now [1, 22, 33, 44, 4]
x[1:4] = [8, 9]           # x is now [1, 8, 9, 4]

Here are some important special cases of assignment to slices:

  • Using the empty list [] as the RHS expression removes the target slice from L. In other words, L[i:j]=[] has the same effect as del L[i:j] (or the peculiar statement L[i:j]*=0).

  • Using an empty slice of L as the LHS target inserts the items of the RHS at the appropriate spot in L. For example, L[i:i]=['a','b'] inserts 'a' and 'b' before the item that was at index i in L prior to the assignment.

  • Using a slice that covers the entire list object, L[:], as the LHS target, totally replaces the contents of L.
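The special cases above, plus the equal-length rule for extended slices, can be sketched as follows (the lists and the names alias, after_removal, and after_insert are ours, for illustration only):

```python
L = ['a', 'b', 'c', 'd', 'e']

L[1:3] = []                  # removes the slice, like del L[1:3]
after_removal = list(L)      # ['a', 'd', 'e']

L[1:1] = ['x', 'y']          # empty slice as target: inserts before index 1
after_insert = list(L)       # ['a', 'x', 'y', 'd', 'e']

alias = L                    # a second reference to the same list object
L[:] = [1, 2, 3]             # replaces the contents, keeping the same object
# alias is L, and both now show [1, 2, 3]

M = [0, 1, 2, 3, 4, 5]
M[::2] = ['p', 'q', 'r']     # extended slice: three targets, three items
# M is now ['p', 1, 'q', 3, 'r', 5]
try:
    M[::2] = ['only', 'two'] # three targets but two items
except ValueError:
    length_mismatch = True
```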

You can delete an item or a slice from a list with del. For instance:

x = [1, 2, 3, 4, 5]
del x[1]                 # x is now [1, 3, 4, 5]
del x[::2]               # x is now [3, 5]

In-place operations on a list

List objects define in-place versions of the + and * operators, which you can use via augmented assignment statements. The augmented assignment statement L+=L1 has the effect of adding the items of iterable L1 to the end of L, just like L.extend(L1). L*=n has the effect of adding n-1 copies of L to the end of L; if n<=0, L*=n empties the contents of L, like L[:]=[].
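A short sketch of these in-place operators follows; the names alias, other, after_plus, and after_times are ours. The last three lines contrast augmented assignment with plain +, which builds a new list instead of altering the existing one.

```python
L = [1, 2]
alias = L              # second name for the same list object

L += (3, 4)            # any iterable on the right; like L.extend((3, 4))
after_plus = list(L)   # [1, 2, 3, 4]

L *= 2                 # adds 2-1 = 1 copy of the items to the end
after_times = list(L)  # [1, 2, 3, 4, 1, 2, 3, 4]

L *= -1                # n <= 0 empties the list in place
# L is now [], and alias is still the same (now empty) list object

M = [1, 2]
other = M
M = M + [3]            # plain + builds a new list and rebinds M
# other is still [1, 2]
```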

List methods

List objects provide several methods, as shown in Table 3-3. Nonmutating methods return a result without altering the object to which they apply, while mutating methods may alter the object to which they apply. Many of a list’s mutating methods behave like assignments to appropriate slices of the list. In Table 3-3, L indicates any list object, i any valid index in L, s any iterable, and x any object.

Table 3-3. List object methods
Method Description

Nonmutating

L.count(x)

Returns the number of items of L that are equal to x.

L.index(x)

Returns the index of the first occurrence of an item in L that is equal to x, or raises an exception if L has no such item.

Mutating

L.append(x)

Appends item x to the end of L; like L[len(L):]=[x].

L.extend(s)

Appends all the items of iterable s to the end of L; like L[len(L):]=s or L += s.

L.insert(i, x)

Inserts item x in L before the item at index i, moving following items of L (if any) “rightward” to make space (increases len(L) by one, does not replace any item, does not raise exceptions; acts just like L[i:i]=[x]).

L.remove(x)

Removes from L the first occurrence of an item in L that is equal to x, or raises an exception if L has no such item.

L.pop(i=-1)

Returns the value of the item at index i and removes it from L; when you omit i, removes and returns the last item; raises an exception if L is empty or i is an invalid index in L.

L.reverse()

Reverses, in place, the items of L.

L.sort(cmp=cmp, key=None, reverse=False)

Sorts, in place, the items of L, comparing items pairwise (in v2 only) via the function passed as cmp (by default, the built-in function cmp). When argument key is not None, what gets compared for each item x is key(x), not x itself. For more details, see “Sorting a list”. Argument cmp is deprecated in v2 (we recommend never using it) and does not exist at all in v3.

All mutating methods of list objects, except pop, return None.
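Here is a brief walk through several of the methods in Table 3-3, using a throwaway list of our own; each comment shows the state of L after the call.

```python
L = [10, 20, 30]
L.append(40)            # L is now [10, 20, 30, 40]
L.extend('ab')          # L is now [10, 20, 30, 40, 'a', 'b']
L.insert(1, 15)         # L is now [10, 15, 20, 30, 40, 'a', 'b']
L.insert(100, 'z')      # index beyond the end: 'z' goes last, no exception
last = L.pop()          # 'z'
second = L.pop(1)       # 15; L is now [10, 20, 30, 40, 'a', 'b']
L.remove('a')           # L is now [10, 20, 30, 40, 'b']
result = L.reverse()    # L is now ['b', 40, 30, 20, 10]; result is None
position = L.index(30)  # 2
try:
    L.remove('missing') # no such item
except ValueError:
    remove_failed = True
```

Because the mutating methods return None, a statement such as L = L.reverse() loses the list; call the method for its side effect only.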

Sorting a list

A list’s method sort causes the list to be sorted in-place (reordering items to place them in increasing order) in a way that is guaranteed to be stable (elements that compare equal are not exchanged). In practice, sort is extremely fast, often preternaturally fast, as it can exploit any order or reverse order that may be present in any sublist (the advanced algorithm sort uses, known as timsort to honor its inventor, great Pythonista Tim Peters, is a “non-recursive adaptive stable natural mergesort/binary insertion sort hybrid”—now there’s a mouthful for you!).

In v2, the sort method takes three optional arguments, which may be passed with either positional or named-argument syntax. The argument cmp (deprecated), when present, must be a function that, when called with any two list items as arguments, returns -1, 0, or 1, depending on whether the first item is to be considered less than, equal to, or greater than the second item for sorting purposes (when not present, it defaults to the built-in function cmp, which has exactly these semantics). The argument key, if not None, must be a function that can be called with any list item as its only argument. In this case, to compare any two items x and y, Python uses cmp(key(x),key(y)) rather than cmp(x,y) (internally, Python implements this in the same way as the decorate-sort-undecorate idiom presented in “Searching and sorting”, but substantially faster). The argument reverse, if True, causes the result of each comparison to be reversed; this is not the same thing as reversing L after sorting, because the sort is stable (elements that compare equal are never exchanged) whether the argument reverse is true or false.
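To see why reverse=True differs from reversing after the sort, consider this sketch (the records and the helper count_of are hypothetical, made up for the example). Items with equal keys keep their original relative order with reverse=True, but end up in the opposite order when you sort and then reverse.

```python
records = [('pear', 2), ('apple', 1), ('fig', 2), ('kiwi', 1)]

def count_of(record):
    # sort key: the number in each (name, count) pair
    return record[1]

descending = list(records)
descending.sort(key=count_of, reverse=True)
# [('pear', 2), ('fig', 2), ('apple', 1), ('kiwi', 1)]
# 'pear' still precedes 'fig', as in the input

ascending_then_reversed = list(records)
ascending_then_reversed.sort(key=count_of)
ascending_then_reversed.reverse()
# [('fig', 2), ('pear', 2), ('kiwi', 1), ('apple', 1)]
# 'fig' now precedes 'pear'
```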

v3’s sort works just like v2’s, but the previously deprecated argument cmp is now simply nonexistent (if you try passing it, Python raises an exception):

mylist = ['alpha', 'Beta', 'GAMMA']
mylist.sort()                  #  ['Beta', 'GAMMA', 'alpha']
mylist.sort(key=str.lower)     #  ['alpha', 'Beta', 'GAMMA']

Python also provides the built-in function sorted (covered in Table 7-2) to produce a sorted list from any input iterable. sorted, after the first argument (which is the iterable supplying the items), accepts the same arguments as a list’s method sort.

The standard library module operator (covered in “The operator Module”) supplies higher-order functions attrgetter and itemgetter, which produce functions particularly suitable for the key= optional argument of lists’ method sort and the built-in function sorted. The identical key= optional argument also exists for built-in functions min and max, and for functions nsmallest and nlargest in standard library module heapq, covered in “The heapq Module”.
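As a small sketch of how these key functions fit together (the data and the Point class are hypothetical, introduced only for this example):

```python
import heapq
from operator import attrgetter, itemgetter

pairs = [('b', 3), ('a', 1), ('c', 2)]

by_number = sorted(pairs, key=itemgetter(1))
# [('a', 1), ('c', 2), ('b', 3)]
largest = max(pairs, key=itemgetter(1))                      # ('b', 3)
two_smallest = heapq.nsmallest(2, pairs, key=itemgetter(1))
# [('a', 1), ('c', 2)]

class Point(object):          # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(3, 0), Point(1, 5), Point(2, 2)]
xs_in_order = [p.x for p in sorted(points, key=attrgetter('x'))]
# [1, 2, 3]
```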

Set Operations

Python provides a variety of operations applicable to sets (both plain and frozen). Since sets are containers, the built-in len function can take a set as its single argument and return the number of items in the set. A set is iterable, so you can pass it to any function or method that takes an iterable argument. In this case, iteration yields the items of the set in some arbitrary order. For example, for any set S, min(S) returns the smallest item in S, since min with a single argument iterates on that argument (the order does not matter, because the implied comparisons are transitive).

Set Membership

The k in S operator checks whether object k is one of the items of set S. It returns True when it is, False when it isn’t. k not in S is like not (k in S).

Set Methods

Set objects provide several methods, as shown in Table 3-4. Nonmutating methods return a result without altering the object to which they apply, and can also be called on instances of frozenset; mutating methods may alter the object to which they apply, and can be called only on instances of set. In Table 3-4, S denotes any set object, S1 any iterable with hashable items (often but not necessarily a set or frozenset), x any hashable object.

Table 3-4. Set object methods
Method Description

Nonmutating

S.copy()

Returns a shallow copy of S (a copy whose items are the same objects as S’s, not copies thereof), like set(S)

S.difference(S1)

Returns the set of all items of S that aren’t in S1

S.intersection(S1)

Returns the set of all items of S that are also in S1

S.issubset(S1)

Returns True when all items of S are also in S1; otherwise, returns False

S.issuperset(S1)

Returns True when all items of S1 are also in S; otherwise, returns False (like S1.issubset(S))

S.symmetric_difference(S1)

Returns the set of all items that are in either S or S1, but not both

S.union(S1)

Returns the set of all items that are in S, S1, or both

Mutating

S.add(x)

Adds x as an item to S; no effect if x was already an item in S

S.clear()

Removes all items from S, leaving S empty

S.discard(x)

Removes x as an item of S; no effect when x was not an item of S

S.pop()

Removes and returns an arbitrary item of S

S.remove(x)

Removes x as an item of S; raises a KeyError exception when x was not an item of S

All mutating methods of set objects, except pop, return None.

The pop method can be used for destructive iteration on a set, consuming little extra memory. The memory savings make pop usable for a loop on a huge set, when what you want is to “consume” the set in the course of the loop. A potential advantage of a destructive loop such as

while S:
    item = S.pop()
    ...handle item...

in comparison to a nondestructive loop such as

for item in S:
    ...handle item...

is that, in the body of the destructive loop, you’re allowed to modify S (adding and/or removing items), which is not allowed in the nondestructive loop.
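One situation where this matters is a “work list” that grows while you drain it. The sketch below uses a hypothetical graph (a dict mapping each node to the set of its neighbors) and collects the nodes reachable from 'a'; the second part shows the failure you get when you try to add items inside a nondestructive loop.

```python
# Hypothetical graph, for illustration: node -> set of neighbor nodes.
graph = {'a': {'b', 'c'}, 'b': {'d'}, 'c': {'d'}, 'd': set(), 'e': {'a'}}

pending = {'a'}
seen = set()
while pending:
    node = pending.pop()             # consume an arbitrary pending node
    seen.add(node)
    pending |= graph[node] - seen    # adding items inside the loop is fine
# seen is now {'a', 'b', 'c', 'd'}

S = {1, 2, 3}
try:
    for item in S:
        S.add(item + 10)             # changes the set's size while iterating
except RuntimeError:
    nondestructive_failed = True
```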

Sets also have mutating methods named difference_update, intersection_update, symmetric_difference_update, and update (the last one corresponding to nonmutating method union). Each such mutating method performs the same operation as the corresponding nonmutating method, but it performs the operation in place, altering the set on which you call it, and returns None. The four corresponding nonmutating methods are also accessible with operator syntax: where S2 is a set or frozenset, they are, respectively, S-S2, S&S2, S^S2, and S|S2. The mutating methods are accessible with augmented assignment syntax: respectively, S-=S2, S&=S2, S^=S2, and S|=S2. Note that, when you use operator or augmented assignment syntax, both operands must be sets or frozensets; however, when you call named methods, argument S1 can be any iterable with hashable items, and it works just as if the argument you passed was set(S1).
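A minimal sketch of the difference between named methods and operator syntax (all values are ours, chosen for illustration):

```python
S = {1, 2, 3}

from_method = S.union([3, 4])      # any iterable is accepted: {1, 2, 3, 4}
from_operator = S | {3, 4}         # both operands are sets: {1, 2, 3, 4}

try:
    S | [3, 4]                     # a list operand is rejected
except TypeError:
    operator_needs_sets = True

alias = S
S -= {1}                           # in place, like S.difference_update({1})
S.update('ab')                     # in place; adds the items 'a' and 'b'
# S is now {2, 3, 'a', 'b'}, and alias is still the same object

frozen_result = frozenset('abc') & S     # frozenset({'a', 'b'})
```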

Dictionary Operations

Python provides a variety of operations applicable to dictionaries. Since dictionaries are containers, the built-in len function can take a dictionary as its argument and return the number of items (key/value pairs) in the dictionary. A dictionary is iterable, so you can pass it to any function that takes an iterable argument. In this case, iteration yields only the keys of the dictionary, in arbitrary order.6 For example, for any dictionary D, min(D) returns the smallest key in D.

Dictionary Membership

The k in D operator checks whether object k is a key in dictionary D. It returns True when it is, False when it isn’t. k not in D is like not (k in D).

Indexing a Dictionary

To denote the value in a dictionary D currently associated with key k, use an indexing: D[k]. Indexing with a key that is not present in the dictionary raises an exception. For example:

d = {'x':42, 'y':3.14, 'z':7}
d['x']                           # 42
d['z']                           # 7
d['a']                           # raises KeyError exception

Plain assignment to a dictionary indexed with a key that is not yet in the dictionary (e.g., D[newkey]=value) is a valid operation and adds the key and value as a new item in the dictionary. For instance:

d = {'x':42, 'y':3.14}
d['a'] = 16                    # d is now {'x':42, 'y':3.14, 'a':16}

The del statement, in the form del D[k], removes from the dictionary the item whose key is k. When k is not a key in dictionary D, del D[k] raises a KeyError exception.

Dictionary Methods

Dictionary objects provide several methods, as shown in Table 3-5. Nonmutating methods return a result without altering the object to which they apply, while mutating methods may alter the object to which they apply. In Table 3-5, D and D1 indicate any dictionary object, k any hashable object, and x any object.

Table 3-5. Dictionary object methods
Method Description

Nonmutating

D.copy()

Returns a shallow copy of the dictionary (a copy whose items are the same objects as D’s, not copies thereof), like dict(D)

D.get(k[, x])

Returns D[k] when k is a key in D; otherwise, returns x (or None, when x is not given)

D.items()

In v2, returns a new list with all items (key/value pairs) in D; in v3, returns an iterable dict_items instance, not a list

D.iteritems()

Returns an iterator on all items (key/value pairs) in D (v2 only)

D.iterkeys()

Returns an iterator on all keys in D (v2 only)

D.itervalues()

Returns an iterator on all values in D (v2 only)

D.keys()

In v2, returns a new list with all keys in D; in v3, returns an iterable dict_keys instance, not a list

D.values()

In v2, returns a new list with all values in D; in v3, returns an iterable dict_values instance, not a list

Mutating

D.clear()

Removes all items from D, leaving D empty

D.pop(k[, x])

Removes and returns D[k] when k is a key in D; otherwise, returns x (or raises a KeyError exception when x is not given)

D.popitem()

Removes and returns an arbitrary item (key/value pair)

D.setdefault(k[, x])

Returns D[k] when k is a key in D; otherwise, sets D[k] equal to x (or None, when x is not given) and returns x

D.update(D1)

For each k in mapping D1, sets D[k] equal to D1[k]

The items, keys, and values methods return their resulting lists (in v2; dict_... iterable instances in v3) in arbitrary order.7 If you call more than one of these methods without any intervening change to the dictionary, the order of the results is the same for all. The iteritems, iterkeys, and itervalues methods (v2 only) return iterators equivalent to these lists (we cover iterators in “Iterators”).

Never modify a dict’s set of keys while iterating on it

An iterator or dict_... instance consumes less memory than a list, but you must never modify the set of keys in a dictionary (i.e., you must never either add or remove keys) while iterating over any of that dictionary’s iterators.

Iterating on the lists returned by items, keys, or values in v2 carries no such constraint (in v3, for the same purpose, you can explicitly make lists, such as list(D.keys())). Iterating directly on a dictionary D is exactly like iterating on D.iterkeys() in v2, on D.keys() in v3.
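The following v3 sketch (with a small dictionary of our own) shows the failure and two safe alternatives: rebinding values for existing keys, and iterating on an independent list of keys when you need to delete.

```python
d = {'x': 42, 'y': 3.14, 'z': 7}
try:
    for key in d:
        del d[key]                 # removes a key while iterating on d
except RuntimeError:
    unsafe_loop_failed = True

d = {'x': 42, 'y': 3.14, 'z': 7}
for key in d:
    d[key] = d[key] * 2            # rebinding values of existing keys is fine
# d is now {'x': 84, 'y': 6.28, 'z': 14}

for key in list(d):                # an independent list of the keys
    if d[key] > 10:
        del d[key]                 # safe: we are not iterating on d itself
# d is now {'y': 6.28}
```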

v3’s dict_... types are all iterable. dict_items and dict_keys also implement set nonmutating methods and behave much like frozensets; dict_values doesn’t, since, differently from the others (and from sets), it may contain some duplicate items.
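For instance, in v3 you can apply set operators directly to the results of keys and items; the results are ordinary sets. This sketch uses two made-up dictionaries:

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'c': 30, 'd': 4}

common_keys = d1.keys() & d2.keys()            # {'b', 'c'}
only_in_d1 = d1.keys() - d2.keys()             # {'a'}
same_items = d1.items() & d2.items()           # {('b', 2)}
disjoint = d1.keys().isdisjoint(['x', 'y'])    # True

try:
    d1.values() & d2.values()      # dict_values offers no set operations
except TypeError:
    values_not_setlike = True
```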

The popitem method can be used for destructive iteration on a dictionary. Both items and popitem return dictionary items as key/value pairs. popitem is usable for a loop on a huge dictionary, when what you want is to “consume” the dictionary in the course of the loop.

D.setdefault(k, x) returns the same result as D.get(k, x), but, when k is not a key in D, setdefault also has the side effect of binding D[k] to the value x. (In modern Python, setdefault is rarely used, since type collections.defaultdict, covered in “defaultdict”, offers similar functionality with more elegance and speed.)

The pop method returns the same result as get, but, when k is a key in D, pop also has the side effect of removing D[k] (when x is not specified, and k is not a key in D, get returns None, but pop raises an exception).

The update method can also accept an iterable of key/value pairs, as an alternative argument instead of a mapping, and can accept named arguments instead of—or in addition to—its positional argument; the semantics are the same as for passing such arguments when calling the built-in dict type, as covered in “Dictionaries”.
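The differences among get, setdefault, pop, and the forms of update can be sketched as follows (keys and values are arbitrary, picked for the example):

```python
d = {'x': 1}

got = d.get('y', 0)              # 0; d is unchanged
stored = d.setdefault('y', 0)    # 0; d is now {'x': 1, 'y': 0}
kept = d.setdefault('x', 99)     # 1; 'x' is already a key, so d['x'] stays 1

popped = d.pop('y')              # 0; d is now {'x': 1}
fallback = d.pop('y', None)      # None; key absent, so the default is returned
try:
    d.pop('y')                   # key absent and no default given
except KeyError:
    pop_raised = True

d.update([('a', 10), ('b', 20)])     # an iterable of key/value pairs
d.update({'b': 21}, c=30)            # a mapping plus a named argument
# d is now {'x': 1, 'a': 10, 'b': 21, 'c': 30}
```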

Control Flow Statements

A program’s control flow is the order in which the program’s code executes. The control flow of a Python program depends on conditional statements, loops, and function calls. (This section covers the if conditional statement and for and while loops; we cover functions in “Functions”.) Raising and handling exceptions also affects control flow; we cover exceptions in Chapter 5.

The if Statement

Often, you need to execute some statements only when some condition holds, or choose statements to execute depending on mutually exclusive conditions. The compound statement if —comprising if, elif, and else clauses—lets you conditionally execute blocks of statements. The syntax for the if statement is:

if expression:
    statement(s)
elif expression:
    statement(s)
elif expression:
    statement(s)
...
else:
    statement(s)

The elif and else clauses are optional. Note that, unlike some languages, Python does not have a “switch” statement. Use if, elif, and else for all conditional processing.

Here’s a typical if statement with all three kinds of clauses (not in optimal style):

if x < 0: print('x is negative')
elif x % 2: print('x is positive and odd')
else: print('x is even and non-negative')

When there are multiple statements in a clause (i.e., the clause controls a block of statements), place the statements on separate logical lines after the line containing the clause’s keyword (known as the header line of the clause), indented rightward from the header line. The block terminates when the indentation returns to that of the clause header (or further left from there). When there is just a single simple statement, as here, it can follow the : on the same logical line as the header, but it can also be on a separate logical line, immediately after the header line and indented rightward from it. Most Python programmers prefer the separate-line style, with four-space indents for the guarded statements. Such a style is more general and more readable, and recommended by PEP 8. So, a generally preferred style is:

if x < 0:
    print('x is negative')
elif x % 2:
    print('x is positive and odd')
else:
    print('x is even and non-negative')

You can use any Python expression as the condition in an if or elif clause. Using an expression this way is known as using it in a Boolean context. In a Boolean context, any value is taken as either true or false. As mentioned earlier, any nonzero number or nonempty container (string, tuple, list, dictionary, set) evaluates as true; zero (of any numeric type), None, and empty containers evaluate as false. To test a value x in a Boolean context, use the following coding style:

if x:

This is the clearest and most Pythonic form. Do not use any of the following:

if x is True:
if x == True:
if bool(x):

There is a crucial difference between saying that an expression returns True (meaning the expression returns the value 1 with the bool type) and saying that an expression evaluates as true (meaning the expression returns any result that is true in a Boolean context). When testing an expression, for example in an if clause, you only care about what it evaluates as, not what, precisely, it returns.

When the if clause’s condition evaluates as true, the statements after the if clause execute, then the entire if statement ends. Otherwise, Python evaluates each elif clause’s condition, in order. The statements after the first elif clause whose condition evaluates as true, if any, execute, and the entire if statement ends. Otherwise, when an else clause exists, its statements execute. In every case, execution then continues with the statements after the entire if construct.
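To make the distinction between “returns True” and “evaluates as true” concrete, here is a small sketch (the sample values are ours). A nonempty list passes the plain test, but fails the two comparisons with True that we advise against.

```python
values = [0, 0.0, '', (), [], {}, set(), None, 1, -1, 'a', [0], (None,)]
true_values = []
for v in values:
    if v:                        # the recommended form
        true_values.append(v)
# true_values is [1, -1, 'a', [0], (None,)]

x = [0]                          # nonempty, so it evaluates as true
plain_test = identity_test = equality_test = False
if x:
    plain_test = True            # this branch executes
if x is True:
    identity_test = True         # skipped: x is not the object True
if x == True:
    equality_test = True         # skipped: [0] does not equal True
```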

The while Statement

The while statement repeats execution of a statement or block of statements, controlled by a conditional expression. Here’s the syntax of the while statement:

while expression:
    statement(s)

A while statement can also include an else clause, covered in “The else Clause on Loop Statements”, and break and continue statements, covered in “The break Statement” and “The continue Statement”.

Here’s a typical while statement:

count = 0
while x > 0:
    x //= 2            # truncating division
    count += 1
print('The approximate log2 is', count)

First, Python evaluates expression, which is known as the loop condition. When the condition is false, the while statement ends. When the loop condition is true, the statement or statements that make up the loop body execute. Once the loop body finishes executing, Python evaluates the loop condition again to check whether another iteration should execute. This process continues until the loop condition is false, at which point the while statement ends.

The loop body should contain code that eventually makes the loop condition false; otherwise, the loop never ends (unless the body raises an exception or executes a break statement). A loop within a function’s body also ends if the loop body executes a return statement, since in this case the whole function ends.

The for Statement

The for statement repeats execution of a statement, or block of statements, controlled by an iterable expression. Here’s the syntax of the for statement:

for target in iterable:
    statement(s)

The in keyword is part of the syntax of the for statement; its purpose here is distinct from the in operator, which tests membership.

Here’s a typical for statement:

for letter in 'ciao':
    print('give me a', letter, '...')

A for statement can also include an else clause, covered in “The else Clause on Loop Statements”, and break and continue statements, covered in “The break Statement” and “The continue Statement”. iterable may be any Python expression suitable as an argument to built-in function iter, which returns an iterator object (explained in detail in the next section). In particular, any sequence is iterable. target is normally an identifier that names the control variable of the loop; the for statement successively rebinds this variable to each item of the iterator, in order. The statement or statements that make up the loop body execute once for each item in iterable (unless the loop ends because of an exception or a break or return statement). Note that, since the loop body may contain a break statement to terminate the loop, this is one case in which you may use an unbounded iterable—one that, per se, would never cease yielding items.

You can also have a target with multiple identifiers, like in an unpacking assignment. In this case, the iterator’s items must be iterables, each with exactly as many items as there are identifiers in the target (in v3 only, precisely one of the identifiers can be preceded by a star, in which case the starred item is bound to a list of all items, if any, that were not assigned to other targets). For example, when d is a dictionary, this is a typical way to loop on the items (key/value pairs) in d:

for key, value in d.items():
    if key and value:             # print only true keys and values
        print(key, value)

The items method returns a list (in v2; another kind of iterable, in v3) of key/value pairs, so we use a for loop with two identifiers in the target to unpack each item into key and value. Although components of a target are commonly identifiers, values can be bound to any acceptable LHS expression as covered in “Assignment Statements”:

prototype = [1, 'placemarker', 3]
for prototype[1] in 'xyz': print(prototype)
# prints [1, 'x', 3], then [1, 'y', 3], then [1, 'z', 3]
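The starred form of a multiple-identifier target, mentioned earlier in this section, can be sketched like this in v3 (the rows are invented for the example); the second loop shows what happens when an item has the wrong number of components for an unstarred target.

```python
rows = [('ann', 90, 85), ('bob',), ('cy', 70)]
scores_by_name = {}
for name, *scores in rows:        # v3 only; scores is always a list
    scores_by_name[name] = scores
# scores_by_name is {'ann': [90, 85], 'bob': [], 'cy': [70]}

try:
    for a, b in [(1, 2), (3, 4, 5)]:
        pass                      # the second item has three components
except ValueError:
    mismatch_detected = True
```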

Don’t alter mutable objects while looping on them

When an iterable has a mutable underlying object, don’t alter that object during a for loop on it. For example, the preceding key/value printing example cannot alter d in v3 (and couldn’t in v2 if it used iteritems instead of items). iteritems in v2 (and items in v3) return iterables whose underlying object is d, so the loop body cannot mutate the set of keys in d (e.g., by executing del d[key]). d.items() in v2 (and list(d.items()) in v3) return new, independent lists, so that d is not the underlying object of the iterable; therefore, the loop body could mutate d. Specifically:

  • When looping on a list, do not insert, append, or delete items (rebinding an item at an existing index is OK).

  • When looping on a dictionary, do not add or delete items (rebinding the value for an existing key is OK).

  • When looping on a set, do not add or delete items (no alteration permitted).

The control target variable(s) may be rebound in the loop body, but get rebound again to the next item in the iterator at the next iteration of the loop. The loop body does not execute at all if the iterator yields no items. In this case, the control variable is not bound or rebound in any way by the for statement. However, if the iterator yields at least one item, then, when the loop statement terminates, the control variable remains bound to the last value to which the loop statement has bound it. The following code is therefore correct only when someseq is not empty:

for x in someseq:
    process(x)
print('Last item processed was', x)
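One way to guard against the empty case is to bind the control variable before the loop. The two functions below are hypothetical helpers, written only to show the difference:

```python
def last_item(someseq):
    for x in someseq:
        pass
    return x               # fails when someseq is empty: x was never bound

def last_item_or_default(someseq, default=None):
    x = default            # bound up front, so an empty someseq is harmless
    for x in someseq:
        pass
    return x

try:
    last_item([])
except NameError:          # UnboundLocalError is a subclass of NameError
    empty_failed = True
```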

Iterators

An iterator is an object i such that you can call next(i). next(i) returns the next item of iterator i or, when iterator i has no more items, raises a StopIteration exception. Alternatively, you can call next(i, default), in which case, when iterator i has no more items, the call returns default.

When you write a class (see “Classes and Instances”), you can allow instances of the class to be iterators by defining a special method __next__ (in v2 the method must be called next—this was eventually seen as a design error) that takes no argument except self, and returns the next item or raises StopIteration. Most iterators are built by implicit or explicit calls to built-in function iter, covered in Table 7-2. Calling a generator also returns an iterator, as we’ll discuss in “Generators”.
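As a minimal sketch, here is a hypothetical iterator class, Countdown, that works in both v2 and v3. We also give it an __iter__ method that returns self (an addition beyond what this paragraph requires), so that instances can be passed to iter and therefore used in for loops and in calls such as list.

```python
class Countdown(object):
    """Hypothetical iterator yielding n, n-1, ..., 1."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self              # an iterator is its own iterator

    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # no more items
        self.n -= 1
        return self.n + 1

    next = __next__              # v2 spelling of the same method

it = Countdown(3)
first = next(it)                 # 3
rest = list(it)                  # [2, 1]
after_end = next(it, 'done')     # 'done': the default, not an exception
```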

The for statement implicitly calls iter to get an iterator. The statement:

for x in c:
    statement(s)

is exactly equivalent to:

_temporary_iterator = iter(c)
while True:
    try: x = next(_temporary_iterator)
    except StopIteration: break
    statement(s)

where _temporary_iterator is an arbitrary name not used elsewhere in the current scope.

Thus, when iter(c) returns an iterator i such that next(i) never raises StopIteration (an unbounded iterator), the loop for x in c never terminates (unless the statements in the loop body include suitable break or return statements, or raise or propagate exceptions). iter(c), in turn, calls special method c.__iter__() to obtain and return an iterator on c. We’ll talk more about the special method __iter__ in “Container methods”.

Many of the best ways to build and manipulate iterators are found in the standard library module itertools, covered in “The itertools Module”.

range and xrange

Looping over a sequence of integers is a common task, so Python provides built-in functions range (and, in v2, xrange) to generate integer sequences. The simplest way to loop n times in Python is:

for i in range(n):
    statement(s)

In v2, range(x) returns a list whose items are consecutive integers from 0 (included) up to x (excluded). range(x, y) returns a list whose items are consecutive integers from x (included) up to y (excluded). The result is the empty list when x is greater than or equal to y. range(x, y, stride) returns a list of integers from x (included) up to y (excluded), such that the difference between each two adjacent items is stride. If stride is less than 0, range counts down from x to y. range returns the empty list when x is greater than or equal to y and stride is greater than 0, or when x is less than or equal to y and stride is less than 0. When stride equals 0, range raises an exception.

While range, in v2, returns a normal list object, usable for any purpose, xrange (v2 only) returns a special-purpose object, intended just for use in iterations like the for statement shown previously (xrange itself does not return an iterator; however, you can easily obtain such an iterator, should you need one, by calling iter(xrange(...))). The special-purpose object returned by xrange consumes less memory (for wide ranges, much less memory) than the list object range returns; apart from such memory issues, you can use range wherever you could use xrange, but not vice versa. For example, in v2:

>>> print(range(1, 5))
[1, 2, 3, 4]
>>> print(xrange(1, 5))
xrange(1, 5)

Here, range returns a perfectly ordinary list, which displays quite normally, but xrange returns a special-purpose object, which displays in its own special way.

In v3, there is no xrange, and range works similarly to v2’s xrange. In v3, if you need a list that’s an arithmetic progression of ints, call list(range(...)).
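A few v3 examples of the rules above (the specific bounds are arbitrary). The last three lines suggest why the object range returns is economical: it answers len and indexing requests without building a list of its items.

```python
r = range(10, 0, -3)
as_list = list(r)                    # [10, 7, 4, 1]
no_items = list(range(5, 5))         # []: x equals y
wrong_way = list(range(1, 5, -1))    # []: x < y with a negative stride

try:
    range(1, 5, 0)                   # a stride of 0 is an error
except ValueError:
    zero_stride_rejected = True

big = range(10**9)                   # no billion-item list is built
size = len(big)                      # 1000000000
last = big[-1]                       # 999999999
```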

List comprehensions

A common use of a for loop is to inspect each item in an iterable and build a new list by appending the results of an expression computed on some or all of the items. The expression form known as a list comprehension or listcomp lets you code this common idiom concisely and directly. Since a list comprehension is an expression (rather than a block of statements), you can use it wherever you need an expression (e.g., as an argument in a function call, in a return statement, or as a subexpression of some other expression).

A list comprehension has the following syntax:

[ expression for target in iterable lc-clauses ]

target and iterable are the same as in a regular for statement. When expression denotes a tuple, you must enclose it in parentheses.

lc-clauses is a series of zero or more clauses, each with one of the two forms:

for target in iterable
if expression

target and iterable in each for clause of a list comprehension have the same syntax and meaning as those in a regular for statement, and the expression in each if clause of a list comprehension has the same syntax and meaning as the expression in a regular if statement.

A list comprehension is equivalent to a for loop that builds the same list by repeated calls to the resulting list’s append method. For example (assigning the list comprehension result to a variable for clarity):

result1 = [x+1 for x in some_sequence]

is (apart from the different variable name) the same as the for loop:

result2 = []
for x in some_sequence:
    result2.append(x+1)

Here’s a list comprehension that uses an if clause:

result3 = [x+1 for x in some_sequence if x>23]

This list comprehension is the same as a for loop that contains an if statement:

result4 = []
for x in some_sequence:
    if x>23:
        result4.append(x+1)

And here’s a list comprehension that uses a nested for clause to flatten a “list of lists” into a single list of items:

result5 = [x for sublist in listoflists for x in sublist]

This is the same as a for loop with another for loop nested inside:

result6 = []
for sublist in listoflists:
    for x in sublist:
        result6.append(x)

As these examples show, the order of for and if in a list comprehension is the same as in the equivalent loop, but, in the list comprehension, the nesting remains implicit. If you remember “order for clauses as in a nested loop,” it can help you get the ordering of the list comprehension’s clauses right.
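To check the ordering rule on a case that mixes for and if clauses, compare the following list comprehension with its nested-loop equivalent (listoflists here is sample data of our own). The last statement shows a tuple expression, which must be parenthesized.

```python
listoflists = [[1, 2, 3], [], [4, 5], [6]]

evens = [x for sublist in listoflists if len(sublist) > 1
           for x in sublist if x % 2 == 0]
# [2, 4]

evens_loop = []
for sublist in listoflists:
    if len(sublist) > 1:
        for x in sublist:
            if x % 2 == 0:
                evens_loop.append(x)
# evens_loop is also [2, 4]

pairs = [(x, y) for x in range(3) for y in range(x)]
# [(1, 0), (2, 0), (2, 1)]
```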

Don’t build a list if you don’t need to

If you are simply going to loop on the values, rather than requiring an actual, indexable list, consider a generator expression instead (covered in “Generator expressions”). This lets you avoid list creation, and takes less memory.
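For example, when all you need is a total, you can pass a generator expression directly to sum; the sketch below (with a sample range standing in for some_sequence) computes the same result both ways, but only the first form builds an intermediate list.

```python
some_sequence = range(1000)

total_via_list = sum([x*x for x in some_sequence])    # builds a 1000-item list
total_via_genexp = sum(x*x for x in some_sequence)    # no intermediate list
# both totals are 332833500
```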

List comprehensions and variable scope

In v2, target variables of for clauses in a list comprehension remain bound to their last value when the list comprehension is done, just like they would if you used for statements; so, remember to use target names that won’t trample over other local variables you need. In v3, a list comprehension is its own scope, so you need not worry; also, no such worry, in either v2 or v3, for generator expressions, set comprehensions, and dict comprehensions—only for listcomps.
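A short sketch of the difference, run under v3 (the names are ours):

```python
x = 'outer value'
squares = [x*x for x in range(4)]    # [0, 1, 4, 9]
# In v3, x is still 'outer value' here.
# In v2, the same code would leave x bound to 3, the last item of range(4).
```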

Set comprehensions

A set comprehension has exactly the same syntax and semantics as a list comprehension, except that you enclose it in braces ({}) rather than in brackets ([]). The result is a set, so the order of items is not meaningful. For example:

s = {n//2 for n in range(10)}
print(sorted(s))  # prints: [0, 1, 2, 3, 4]

A similar list comprehension would have each item repeated twice, but, of course, a set intrinsically removes duplicates.

Dict comprehensions

A dict comprehension has the same syntax as a set comprehension, except that, instead of a single expression before the for clause, you use two expressions with a colon : between them—key:value. The result is a dict, so, just as in a set, the order of items (in a dict, that’s key/value pairs) is not meaningful. For example:

d = {n:n//2 for n in range(5)}
print(d)  # prints: {0:0, 1:0, 2:1, 3:1, 4:2} or other order

The break Statement

The break statement is allowed only within a loop body. When break executes, the loop terminates. If a loop nests inside other loops, a break terminates only the innermost nested loop. In practice, a break is usually within a clause of an if in the loop body, so that break executes conditionally.
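Here is a small sketch (with made-up data) showing that a break in an inner loop leaves only that inner loop, while the outer loop carries on with its next iteration:

```python
found = []
for row in [[1, 2, 3], [5, 6, 7]]:
    for item in row:
        if item % 2 == 0:
            break            # terminates the inner loop only
        found.append(item)
    # the outer loop continues here with the next row
print(found)  # prints: [1, 5]
```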

One common use of break is in the implementation of a loop that decides whether it should keep looping only in the middle of each loop iteration (what Knuth called the “loop and a half” structure in his seminal 1974 paper, in which he also first proposed using “devices like indentation, rather than delimiters” to express program structure—just as Python does). For example:

while True:           # this loop can never terminate naturally
    x = get_next()
    y = preprocess(x)
    if not keep_looping(x, y): break
    process(x, y)

The continue Statement

The continue statement can exist only within a loop body. When continue executes, the current iteration of the loop body terminates, and execution continues with the next iteration of the loop. In practice, a continue is usually within a clause of an if in the loop body, so that continue executes conditionally.

Sometimes, a continue statement can take the place of nested if statements within a loop. For example:

for x in some_container:
    if not seems_ok(x): continue
    lowbound, highbound = bounds_to_test()
    if x<lowbound or x>=highbound: continue
    pre_process(x)
    if final_check(x):
        do_processing(x)

This equivalent code does conditional processing without continue:

for x in some_container:
    if seems_ok(x):
        lowbound, highbound = bounds_to_test()
        if lowbound <= x < highbound:
            pre_process(x)
            if final_check(x):
                do_processing(x)

Flat is better than nested

Both versions work the same way, so which one you use is a matter of personal preference and style. One of the principles of The Zen of Python (which you can see at any time by typing import this at an interactive Python interpreter prompt) is “Flat is better than nested.” The continue statement is just one way Python helps you move toward the goal of a less-nested structure in a loop, if you so choose.

The else Clause on Loop Statements

while and for statements may optionally have a trailing else clause. The statement or block under that else executes when the loop terminates naturally (at the end of the for iterator, or when the while loop condition becomes false), but not when the loop terminates prematurely (via break, return, or an exception). When a loop contains one or more break statements, you often need to check whether the loop terminates naturally or prematurely. You can use an else clause on the loop for this purpose:

for x in some_container:
    if is_ok(x): break  # item x is satisfactory, terminate loop
else:
    print('Beware: no satisfactory item was found in container')
    x = None
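The same idea applies to while loops. As an illustrative sketch, the hypothetical function smallest_divisor below (which assumes an integer n >= 2) uses else to detect that the loop condition became false without any break having executed:

```python
def smallest_divisor(n):
    """Return the smallest divisor of n greater than 1 and less than n.

    Return None when there is no such divisor (n is prime). Assumes n >= 2.
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            break          # found a divisor: the else clause is skipped
        d += 1
    else:
        d = None           # loop ended naturally: no divisor was found
    return d

print(smallest_divisor(15))   # prints: 3
print(smallest_divisor(13))   # prints: None
```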

The pass Statement

The body of a Python compound statement cannot be empty; it must always contain at least one statement. You can use a pass statement, which performs no action, as an explicit placeholder when a statement is syntactically required but you have nothing to do. Here’s an example of using pass in a conditional statement as a part of somewhat convoluted logic to test mutually exclusive conditions:

if condition1(x):
    process1(x)
elif x>23 or condition2(x) and x<5:
    pass  # nothing to be done in this case
elif condition3(x):
    process3(x)
else:
    process_default(x)

Empty def or class statements: use a docstring, not pass

As the body of an otherwise empty def or class statement, you should use a docstring, covered in “Docstrings”; when you do write a docstring, then you do not need to also add a pass statement (you can do so if you wish, but it’s not optimal Python style).
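For instance, in this sketch (NotReadyError and placeholder are hypothetical names), the docstring alone is a complete, valid body for each statement:

```python
class NotReadyError(Exception):
    """Raised when a resource is used before it has been initialized."""

def placeholder():
    """Do nothing; a stub to be filled in later."""

print(placeholder())   # prints: None
```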

The try and raise Statements

Python supports exception handling with the try statement, which includes try, except, finally, and else clauses. Your code can explicitly raise an exception with the raise statement. As we discuss in detail in “Exception Propagation”, when code raises an exception, normal control flow of the program stops, and Python looks for a suitable exception handler.

The with Statement

A with statement can be a much more readable and useful alternative to the try/finally statement. We discuss it in detail in “The with Statement and Context Managers”.

Functions

Most statements in a typical Python program are part of functions. Code in a function body may be faster than at a module’s top level, as covered in “Avoid exec and from … import *”, so there are excellent practical reasons to put most of your code into functions, in addition to the advantages in clarity and readability. Moreover, there are no disadvantages to organizing your code into functions versus leaving the code at module level.

A function is a group of statements that execute upon request. Python provides many built-in functions and allows programmers to define their own functions. A request to execute a function is known as a function call. When you call a function, you can pass arguments that specify data upon which the function performs its computation. In Python, a function always returns a result value, either None or a value that represents the results of the computation. Functions defined within class statements are also known as methods. We cover issues specific to methods in “Bound and Unbound Methods”; the general coverage of functions in this section, however, also applies to methods.

In Python, functions are objects (values), handled just like other objects. Thus, you can pass a function as an argument in a call to another function. Similarly, a function can return another function as the result of a call. A function, just like any other object, can be bound to a variable, can be an item in a container, and can be an attribute of an object. Functions can also be keys into a dictionary. For example, if you need to quickly find a function’s inverse given the function, you could define a dictionary whose keys and values are functions and then make the dictionary bidirectional. Here’s a small example of this idea, using some functions from module math, covered in “The math and cmath Modules”, where we create a basic inverse mapping and add the inverse of each entry:

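from math import sin, asin, cos, acos, tan, atan, log, exp

def make_inverse(inverse_dict):
    # loop over a snapshot of the keys, since the loop body adds entries
    for f in list(inverse_dict):
        inverse_dict[inverse_dict[f]] = f
    return inverse_dict
inverse = make_inverse({sin:asin, cos:acos, tan:atan, log:exp})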

The fact that functions are ordinary objects in Python is often expressed by saying that functions are first-class objects.

The def Statement

The def statement is the most common way to define a function. def is a single-clause compound statement with the following syntax:

def function_name(parameters):
    statement(s)

function_name is an identifier. It is a variable that gets bound (or rebound) to the function object when def executes.

parameters is an optional list of identifiers that get bound to the values supplied when you call the function, which we refer to as arguments to distinguish between definitions (parameters) and calls (arguments). In the simplest case a function doesn’t have any parameters, which means the function won’t accept any arguments when you call it. In this case, the function definition has empty parentheses after function_name, as must all calls to the function.

When a function does accept arguments, parameters contains one or more identifiers, separated by commas (,). In this case, each call to the function supplies values, known as arguments, corresponding to the parameters listed in the function definition. The parameters are local variables of the function (as we’ll discuss later in this section), and each call to the function binds these local variables in the function namespace to the corresponding values that the caller, or the function definition, supplies as arguments.

The nonempty block of statements, known as the function body, does not execute when the def statement executes. Rather, the function body executes later, each time you call the function. The function body can contain zero or more occurrences of the return statement, as we’ll discuss shortly.

Here’s an example of a simple function that returns a value that is twice the value passed to it each time it’s called:

def twice(x):
    return x*2

Note that the argument can be anything that can be multiplied by two, so the function could be called with a number, string, list, or tuple as an argument, and it would return, in each case, a new value of the same type as the argument.
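A quick sketch of such calls, repeating the definition of twice so the snippet stands on its own:

```python
def twice(x):
    return x*2

print(twice(21))        # prints: 42
print(twice('ab'))      # prints: abab
print(twice([1, 2]))    # prints: [1, 2, 1, 2]
print(twice((0,)))      # prints: (0, 0)
```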

Parameters

Parameters (more pedantically, formal parameters) name, and sometimes specify a default value for, the values passed into a function call. Each of the names is bound inside a new local namespace every time the function is called; this namespace is destroyed when the function returns or otherwise exits (values returned by the function may be kept alive by the caller binding the function result). Parameters that are just identifiers indicate positional parameters (also known as mandatory parameters). Each call to the function must supply a corresponding value (argument) for each mandatory (positional) parameter.

In the comma-separated list of parameters, zero or more positional parameters may be followed by zero or more named parameters (also known as optional parameters, and sometimes—confusingly, since they do not involve any keyword—as keyword parameters), each with the syntax:

identifier=expression

The def statement evaluates each such expression and saves a reference to the expression’s value, known as the default value for the parameter, among the attributes of the function object. When a function call does not supply an argument corresponding to a named parameter, the call binds the parameter’s identifier to its default value for that execution of the function.

Note that Python computes each default value precisely once, when the def statement evaluates, not each time the resulting function gets called. In particular, this means that exactly the same object, the default value, gets bound to the named parameter whenever the caller does not supply a corresponding argument.
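The following sketch (base and add_base are names we made up for illustration) shows a default value being computed when def executes, so that later rebinding the variable used in the default expression has no effect:

```python
base = 10
def add_base(x, b=base):   # the default is the value of base right now: 10
    return x + b

base = 20                  # rebinding base later does not change the default
print(add_base(1))         # prints: 11
print(add_base(1, base))   # prints: 21
```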

Beware using mutable default values

A mutable default value, such as a list, can be altered each time you call the function without an explicit argument corresponding to the respective parameter. This is usually not the behavior you want; see the following detailed explanation.

Things can be tricky when the default value is a mutable object and the function body alters the parameter. For example:

def f(x, y=[]):
    y.append(x)
    return y, id(y)
print(f(23))                # prints: ([23], 4302354376)
print(f(42))                # prints: ([23, 42], 4302354376)

The second print prints [23, 42] because the first call to f altered the default value of y, originally an empty list [], by appending 23 to it. The id values confirm that both calls return the same object. If you want y to be bound to a new empty list object each time you call f with a single argument, use the following idiom instead:

def f(x, y=None):
    if y is None: y = []
    y.append(x)
    return y, id(y)
print(f(23))               # prints: ([23], 4302354376)
print(f(42))               # prints: ([42], 4302180040)

Of course, there are cases in which you explicitly want to alter a parameter’s default value, most often for caching purposes, as in the following example:

def cached_compute(x, _cache={}):
    if x not in _cache:
        _cache[x] = costly_computation(x)
    return _cache[x]

Such caching behavior (also known as memoization), however, is usually best obtained by decorating the underlying function with functools.lru_cache, covered in Table 7-4.
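As a minimal sketch of that alternative (v3 only; costly_square and the calls list are hypothetical, the latter existing only to make the caching visible), note that lru_cache requires the function's arguments to be hashable:

```python
import functools

calls = []   # records each real computation, just to make caching visible

@functools.lru_cache(maxsize=None)
def costly_square(x):
    calls.append(x)
    return x * x

print(costly_square(12))   # prints: 144
print(costly_square(12))   # prints: 144  (served from the cache)
print(calls)               # prints: [12]
```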

At the end of the parameters, you may optionally use either or both of the special forms *args and **kwds.8 There is nothing special about the names—you can use any identifier you want in each special form; args and kwds or kwargs, as well as a and k, are just popular identifiers to use in these roles. If both forms are present, the form with two asterisks must come second.

*args specifies that any extra positional arguments to a call (i.e., those positional arguments not explicitly specified in the signature, defined in “Function signatures” below) get collected into a (possibly empty) tuple, and bound in the call namespace to the name args. Similarly **kwds specifies that any extra named arguments (i.e., those named arguments not explicitly specified in the signature, defined in “Function signatures” below) get collected into a (possibly empty) dictionary9 whose items are the names and values of those arguments, and bound to the name kwds in the function call namespace (we cover positional and named arguments in “Calling Functions”).

For example, here’s a function that accepts any number of positional arguments and returns their sum (also showing the use of an identifier other than *args):

def sum_args(*numbers):
    return sum(numbers)
print(sum_args(23, 42))          # prints: 65
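Similarly, here is a sketch of a function (the name describe is just illustrative) that uses both special forms and simply returns what each one collected:

```python
def describe(*args, **kwds):
    return args, kwds

print(describe(1, 2, color='red'))  # prints: ((1, 2), {'color': 'red'})
print(describe())                   # prints: ((), {})
```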

Function signatures

The number of parameters of a function, together with the parameters’ names, the number of mandatory parameters, and the information on whether (at the end of the parameters) either or both of the single- and double-asterisk special forms are present, collectively form a specification known as the function’s signature. A function’s signature defines the ways in which you can call the function.
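In v3, one way to examine a signature at runtime is the standard library's inspect.signature function. The sketch below uses a throwaway function f of our own to show the kind of information it exposes:

```python
import inspect

def f(a, b=2, *args, **kwds):
    pass

sig = inspect.signature(f)
print(sig)                          # prints: (a, b=2, *args, **kwds)
for name, param in sig.parameters.items():
    print(name, param.kind.name)
# prints:
# a POSITIONAL_OR_KEYWORD
# b POSITIONAL_OR_KEYWORD
# args VAR_POSITIONAL
# kwds VAR_KEYWORD
```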

“Keyword-only” Parameters (v3 Only)

In v3 only, a def statement also optionally lets you specify parameters that must, when you call the function, correspond to named arguments of the form identifier=expression (see “Kinds of arguments”) if they are present at all. These are known (confusingly, since keywords have nothing to do with them) as keyword-only parameters. If these parameters appear, they must come after *args (if any) and before **kwargs (if any).

Each keyword-only parameter can be specified in the parameter list as either a simple identifier (in which case it’s mandatory, when calling the function, to pass the corresponding named argument), or in identifier=default form (in which case it’s optional, when calling the function, whether to pass the corresponding argument; but, when you pass it, you must pass it as a named argument, not as a positional one). Either form can be used for each different keyword-only parameter. It’s more usual and readable to have simple identifiers, if any, at the start of the keyword-only parameter specifications, and identifier=default forms, if any, following them, though this is not a requirement of the language.

All the keyword-only parameter specifications come after the special form *args, if the latter is present. If that special form is not present, then the start of the sequence of keyword-only parameter specifications is a null parameter, consisting solely of an * (asterisk), to which no argument corresponds. For example:

def f(a, *, b, c=56):  # b and c are keyword-only
    return a, b, c  
f(12,b=34)  # returns (12, 34, 56)—c's optional, having a default
f(12)       # raises a TypeError exception 
# error message is: missing 1 required keyword-only argument: 'b'

If you also specify the special form **kwds, it must come at the end of the parameter list (after the keyword-only parameter specifications, if any). For example:

def g(x, *a, b=23, **k):  # b is keyword-only
    return x, a, b, k
g(1, 2, 3, c=99)  # returns (1, (2, 3), 23, {'c': 99})

Attributes of Function Objects

The def statement sets some attributes of a function object. Attribute __name__ refers to the identifier string given as the function name in the def. You may rebind the attribute to any string value, but trying to unbind it raises an exception. Attribute __defaults__, which you may freely rebind or unbind, refers to the tuple of default values for optional parameters (None, if the function has no optional parameters).
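A brief sketch of these attributes in v3, using a hypothetical function greet; rebinding __defaults__ changes the defaults that later calls use:

```python
def greet(name, greeting='Hello', punctuation='!'):
    return greeting + ', ' + name + punctuation

print(greet.__name__)        # prints: greet
print(greet.__defaults__)    # prints: ('Hello', '!')

greet.__defaults__ = ('Hi', '.')   # rebind the tuple of default values
print(greet('Ann'))          # prints: Hi, Ann.
```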

Docstrings

Another function attribute is the documentation string, also known as the docstring. You may use or rebind a function’s docstring attribute as __doc__. If the first statement in the function body is a string literal, the compiler binds that string as the function’s docstring attribute. A similar rule applies to classes (see “Class documentation strings”) and modules (see “Module documentation strings”). Docstrings usually span multiple physical lines, so you normally specify them in triple-quoted string literal form. For example:

def sum_args(*numbers):
    """Return the sum of multiple numerical arguments.

    The arguments are zero or more numbers.   
    The result is their sum.
    """
    return sum(numbers)

Documentation strings should be part of any Python code you write. They play a role similar to that of comments, but their applicability is wider, since they remain available at runtime (unless you run your program with python -OO, as covered in “Command-Line Syntax and Options”). Development environments and tools can use docstrings from function, class, and module objects to remind the programmer how to use those objects. The doctest module (covered in “The doctest Module”) makes it easy to check that sample code present in docstrings is accurate and correct, and remains so, as the code and docs get edited and maintained.

To make your docstrings as useful as possible, respect a few simple conventions. The first line of a docstring should be a concise summary of the function’s purpose, starting with an uppercase letter and ending with a period. It should not mention the function’s name, unless the name happens to be a natural-language word that comes naturally as part of a good, concise summary of the function’s operation. If the docstring is multiline, the second line should be empty, and the following lines should form one or more paragraphs, separated by empty lines, describing the function’s parameters, preconditions, return value, and side effects (if any). Further explanations, bibliographical references, and usage examples (which you should check with doctest) can optionally follow toward the end of the docstring.

Other attributes of function objects

In addition to its predefined attributes, a function object may have other arbitrary attributes. To create an attribute of a function object, bind a value to the appropriate attribute reference in an assignment statement after the def statement executes. For example, a function could count how many times it gets called:

def counter():
    counter.count += 1
    return counter.count
counter.count = 0

Note that this is not common usage. More often, when you want to group together some state (data) and some behavior (code), you should use the object-oriented mechanisms covered in Chapter 4. However, the ability to associate arbitrary attributes with a function can sometimes come in handy.
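For completeness, here is a sketch of how calls to counter behave (the definition is repeated so the snippet is self-contained):

```python
def counter():
    counter.count += 1
    return counter.count
counter.count = 0            # create the attribute after def executes

print(counter())             # prints: 1
print(counter())             # prints: 2
print(counter.count)         # prints: 2
```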

Function Annotations and Type Hints (v3 Only)

In v3 only, every parameter in a def clause can be annotated with an arbitrary expression—that is, wherever within the def’s parameter list you can use an identifier, you can alternatively use the form identifier:expression, and the expression’s value becomes the annotation for that parameter name.

You can annotate the return value of the function, using the form ->expression between the ) of the def clause and the : that ends the def clause; the expression’s value becomes the annotation for name 'return'. For example:

>>> def f(a:'foo', b)->'bar': pass
... 
>>> f.__annotations__
{'a': 'foo', 'return': 'bar'}

As shown in this example, the __annotations__ attribute of the function object is a dict mapping each annotated identifier to the respective annotation.

You can use annotations for whatever purpose you wish: Python itself does nothing with them, except constructing the __annotations__ attribute.

The intention is to let future, hypothetical third-party tools leverage annotations to perform more thorough static checking on annotated functions. (If you want to experiment with such static checks, we recommend you try the mypy project.)

Type hints (Python 3.5 only)

To further support hypothetical third-party tools, Python 3.5 has introduced a complicated set of commenting conventions, and a new provisional module called typing in the standard library, to standardize how annotations are to be used to denote typing hints (again, Python does nothing with the information, but third-party tools may eventually be developed to leverage this now-standardized information).

A provisional module is one that might change drastically, or even disappear, as early as the next feature release (so, in the future, Python 3.7 might not have the module typing in its standard library, or the module’s contents might be different). It is not yet intended for use “in production,” but rather just for experimental purposes.

As a consequence, we do not further cover type hints in this book; see PEP 484 and the many links from it for much, much more. (The mypy third-party project does fully support PEP 484–compliant v3 Python code.)

New in 3.6: Type annotations

Although the conventions defined in PEP 484 could help third-party static analysis tools operate on Python code, the PEP does not include syntax support for explicitly marking variables as being of a specific type; such type annotations were embedded in the form of comments. PEP 526 now defines syntax for variable type annotations.

Guido van Rossum has cited anecdotal evidence that type annotations can be helpful, for example, when engineering large legacy codebases. They may be unnecessarily complicated for beginners, but they remain completely optional and so can be taught as required to those with some Python experience. While you can still use PEP 484–style comments, it seems likely that PEP 526 annotations will supersede them. (The mypy third-party project, at the time of this writing, only partially supports PEP 526–compliant Python 3.6 code.)
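As a small sketch of the PEP 526 syntax (requires Python 3.6 or later; the names retries, label, and scale are ours), note that Python itself does not enforce the annotations at runtime:

```python
retries: int = 3            # annotated and bound
label: str                  # annotated only: this line does not bind label

def scale(value: float, factor: float = 2.0) -> float:
    return value * factor

retries = 'three'           # no error: annotations are not checked at runtime
print(scale(1.5))           # prints: 3.0
```

A static checker such as mypy would be expected to flag the rebinding of retries to a string; plain Python runs it without complaint.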

The return Statement

The return statement in Python can exist only inside a function body, and can optionally be followed by an expression. When return executes, the function terminates, and the value of the expression is the function’s result. A function returns None if it terminates by reaching the end of its body or by executing a return statement that has no expression (or, of course, by explicitly executing return None).

Good style in return statements

As a matter of good style, never write a return statement without an expression at the end of a function body. When some return statements in a function have an expression, then all return statements in the function should have an expression. return None should only ever be written explicitly to meet this style requirement. Python does not enforce these stylistic conventions, but your code is clearer and more readable when you follow them.
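A sketch of a function that follows these conventions (find_index is a hypothetical helper): since one return has an expression, the fall-through case returns None explicitly:

```python
def find_index(items, target):
    """Return the index of target in items, or None if it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None      # explicit, since another return has an expression

print(find_index(['a', 'b'], 'b'))   # prints: 1
print(find_index(['a', 'b'], 'z'))   # prints: None
```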

Calling Functions

A function call is an expression with the following syntax:

function_object(arguments)

function_object may be any reference to a function (or other callable) object; most often, it’s the function’s name. The parentheses denote the function-call operation itself. arguments, in the simplest case, is a series of zero or more expressions separated by commas (,), giving values for the function’s corresponding parameters. When the function call executes, the parameters are bound to the argument values, the function body executes, and the value of the function-call expression is whatever the function returns.

Don’t forget the trailing () to call a function

Just mentioning a function (or other callable object) does not, per se, call it. To call a function (or other object) without arguments, you must use () after the function’s name (or other reference to the callable object).
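The difference, in a short sketch with a made-up function answer:

```python
def answer():
    return 42

f = answer             # no call: f is just another reference to the function
print(f is answer)     # prints: True
print(f())             # prints: 42  (the parentheses perform the call)
```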

The semantics of argument passing

In traditional terms, all argument passing in Python is by value (although, in modern terminology, to say that argument passing is by object reference is more precise and accurate; some, but not all, of this book’s authors also like the synonym call by sharing). For example, if you pass a variable as an argument, Python passes to the function the object (value) to which the variable currently refers, not the variable itself, and this value is bound to the parameter name in the function call namespace. Thus, a function cannot rebind the caller’s variables. However, if you pass a mutable object as an argument, the function may request changes to that object, because Python passes the object itself, not a copy. Rebinding a variable and mutating an object are totally disjoint concepts. For example:

def f(x, y):
    x = 23
    y.append(42)
a = 77
b = [99]
f(a, b)
print(a, b)                # prints: 77 [99, 42]

print shows that a is still bound to 77. Function f’s rebinding of its parameter x to 23 has no effect on f’s caller, nor, in particular, on the binding of the caller’s variable that happened to be used to pass 77 as the parameter’s value. However, print also shows that b is now bound to [99, 42]. b is still bound to the same list object as before the call, but that object has mutated, as f has appended 42 to that list object. In either case, f has not altered the caller’s bindings, nor can f alter the number 77, since numbers are immutable. However, f can alter a list object, since list objects are mutable. In this example, f mutates the list object that the caller passes to f as the second argument by calling the object’s append method.

Kinds of arguments

Arguments that are just expressions are known as positional arguments. Each positional argument supplies the value for the parameter that corresponds to it by position (order) in the function definition. There is no requirement for positional arguments to match positional parameters: if there are more positional arguments than positional parameters, the additional arguments are bound by position to named parameters, if any, and then (if the function’s signature includes a *args special form) to the tuple bound to args. For example:

def f(a, b, c=23, d=42, *x):
    print(a, b, c, d, x)
f(1,2,3,4,5,6)  # prints: 1 2 3 4 (5, 6)

In a function call, zero or more positional arguments may be followed by zero or more named arguments (sometimes confusingly known as keyword arguments), each with the following syntax:

identifier=expression

In the absence of any **kwds parameter, the identifier must be one of the parameter names used in the def statement for the function. The expression supplies the value for the parameter of that name. Many built-in functions do not accept named arguments: you must call such functions with positional arguments only. However, functions coded in Python accept named as well as positional arguments, so you may call them in different ways. Positional parameters can be matched by named arguments, in the absence of matching positional arguments.

A function call must supply, via a positional or a named argument, exactly one value for each mandatory parameter, and zero or one value for each optional parameter. For example:

def divide(divisor, dividend):
    return dividend // divisor
print(divide(12, 94))                        # prints: 7
print(divide(dividend=94, divisor=12))       # prints: 7

As you can see, the two calls to divide are equivalent. You can pass named arguments for readability purposes whenever you think that identifying the role of each argument and controlling the order of arguments enhances your code’s clarity.

A common use of named arguments is to bind some optional parameters to specific values, while letting other optional parameters take default values:

def f(middle, begin='init', end='finis'):
    return begin+middle+end
print(f('tini', end=''))                    # prints: inittini

Using named argument end='', the caller specifies a value (the empty string '') for f’s third parameter, end, and still lets f’s second parameter, begin, use its default value, the string 'init'. Note that you may pass the arguments as positional arguments, even when the parameters are named; for example, using the preceding function:

print(f('a','c','t'))                       # prints: cat

At the end of the arguments in a function call, you may optionally use either or both of the special forms *seq and **dct. If both forms are present, the form with two asterisks must be last. *seq passes the items of seq to the function as positional arguments (after the normal positional arguments, if any, that the call gives with the usual syntax). seq may be any iterable. **dct passes the items of dct to the function as named arguments, where dct must be a dictionary whose keys are all strings. Each item’s key is a parameter name, and the item’s value is the argument’s value.

Sometimes you want to pass an argument of the form *seq or **dct when the parameters use similar forms, as covered earlier in “Parameters”. For example, using the function sum_args defined in that section (and shown again here), you may want to print the sum of all the values in dictionary d. This is easy with *seq:

def sum_args(*numbers):
    return sum(numbers)
print(sum_args(*d.values()))

(Of course, print(sum(d.values())) would be simpler and more direct.)
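The **dct form works analogously for named arguments. As a sketch, reusing the function f with parameters middle, begin, and end shown earlier in this section (the names args and options are ours):

```python
def f(middle, begin='init', end='finis'):
    return begin+middle+end

args = ('tini',)
options = {'end': '!', 'begin': '<'}
print(f(*args, **options))    # prints: <tini!
```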

You may also pass arguments *seq or **dct when calling a function that does not use the corresponding forms in its signature. In that case, of course, you must ensure that iterable seq has the right number of items, or, respectively, that dictionary dct uses the right identifier strings as its keys; otherwise, the call operation raises an exception. As noted in ““Keyword-only” Parameters (v3 Only)”, keyword-only parameters in v3 cannot be matched with a positional argument (unlike named parameters); they must match a named argument, explicit or passed via **dct.

In v3, a function call may have zero or more occurrences of *seq and/or **dct, as specified in PEP 448.
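A sketch of such a call (this needs Python 3.5 or later; collect, head, and tail are illustrative names), mixing several unpackings with an ordinary positional argument:

```python
def collect(*args, **kwds):
    return args, kwds

head = [1, 2]
tail = (3, 4)
print(collect(*head, 0, *tail, **{'a': 1}, **{'b': 2}))
# prints: ((1, 2, 0, 3, 4), {'a': 1, 'b': 2})
```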

Namespaces

A function’s parameters, plus any names that are bound (by assignment or by other binding statements, such as def) in the function body, make up the function’s local namespace, also known as local scope. Each of these variables is known as a local variable of the function.

Variables that are not local are known as global variables (in the absence of nested function definitions, which we’ll discuss shortly). Global variables are attributes of the module object, as covered in “Attributes of module objects”. Whenever a function’s local variable has the same name as a global variable, that name, within the function body, refers to the local variable, not the global one. We express this by saying that the local variable hides the global variable of the same name throughout the function body.
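The following sketch (limit and show_limit are made-up names) shows a local variable hiding a global one of the same name, while the global stays untouched:

```python
limit = 10                 # global variable

def show_limit():
    limit = 99             # binds a local variable that hides the global
    return limit

print(show_limit())        # prints: 99
print(limit)               # prints: 10
```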

The global statement

By default, any variable that is bound within a function body is a local variable of the function. If a function needs to bind or rebind some global variables (not a good practice!), the first statement of the function’s body must be:

global identifiers

where identifiers is one or more identifiers separated by commas (,). The identifiers listed in a global statement refer to the global variables (i.e., attributes of the module object) that the function needs to bind or rebind. For example, the function counter that we saw in “Other attributes of function objects” could be implemented using global and a global variable, rather than an attribute of the function object:

_count = 0
def counter():
    global _count
    _count += 1
    return _count

Without the global statement, the counter function would raise an UnboundLocalError exception because _count would then be an uninitialized (unbound) local variable. While the global statement enables this kind of programming, this style is inelegant and unadvisable. As we mentioned earlier, when you want to group together some state and some behavior, the object-oriented mechanisms covered in Chapter 4 are usually best.

Eschew global

Never use global if the function body just uses a global variable (including mutating the object bound to that variable, when the object is mutable). Use a global statement only if the function body rebinds a global variable (generally by assigning to the variable’s name). As a matter of style, don’t use global unless it’s strictly necessary, as its presence causes readers of your program to assume the statement is there for some useful purpose. In particular, never use global except as the first statement in a function body.

Nested functions and nested scopes

A def statement within a function body defines a nested function, and the function whose body includes the def is known as an outer function to the nested one. Code in a nested function’s body may access (but not rebind) local variables of an outer function, also known as free variables of the nested function.

The simplest way to let a nested function access a value is often not to rely on nested scopes, but rather to pass that value explicitly as one of the function’s arguments. If necessary, the argument’s value can be bound at nested function def time, by using the value as the default for an optional argument. For example:

def percent1(a, b, c):
    def pc(x, total=a+b+c):
        return (x*100.0) / total
    print('Percentages are:', pc(a), pc(b), pc(c))

Here’s the same functionality using nested scopes:

def percent2(a, b, c):
    def pc(x):
        return (x*100.0) / (a+b+c)
    print('Percentages are:', pc(a), pc(b), pc(c))

In this specific case, percent1 has at least a tiny advantage: the computation of a+b+c happens only once, while percent2’s inner function pc repeats the computation three times. However, when the outer function rebinds local variables between calls to the nested function, repeating the computation can be necessary: be aware of both approaches, and choose the appropriate one case by case.
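The two approaches differ in when the value is looked up, which matters once the outer function rebinds a variable. A minimal sketch, with the hypothetical names early_vs_late, early, late, and scale:

```python
def early_vs_late():
    scale = 2
    def early(x, scale=scale):   # default value is captured at def time
        return x * scale
    def late(x):                 # scale is looked up in the outer scope at call time
        return x * scale
    scale = 10                   # rebind the outer local variable
    return early(1), late(1)

result = early_vs_late()         # (2, 10)
```

early still sees the value 2 that was current when its def executed, while late sees the rebound value 10.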

A nested function that accesses values from outer local variables is also known as a closure. The following example shows how to build a closure:

def make_adder(augend):
    def add(addend):
        return addend+augend
    return add

Closures are sometimes an exception to the general rule that the object-oriented mechanisms covered in Chapter 4 are the best way to bundle together data and code. When you need specifically to construct callable objects, with some parameters fixed at object-construction time, closures can be simpler and more effective than classes. For example, the result of make_adder(7) is a function that accepts a single argument and adds 7 to that argument. An outer function that returns a closure is a “factory” for members of a family of functions distinguished by some parameters, such as the value of argument augend in the previous example, and may often help you avoid code duplication.
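As a usage sketch (add7 and add100 are names chosen for illustration), each call to the factory produces an independent function with its own fixed augend:

```python
def make_adder(augend):
    def add(addend):
        return addend + augend
    return add

add7 = make_adder(7)
add100 = make_adder(100)

# Each closure remembers the augend it was built with.
results = [add7(1), add100(1), add7(add100(0))]   # [8, 101, 107]
```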

In v3 only, the nonlocal keyword acts similarly to global, but it refers to a name in the namespace of some lexically surrounding function. When it occurs in a function definition nested several levels deep, the compiler searches the namespace of the most deeply nested containing function, then the function containing that one, and so on, until the name is found or there are no further containing functions, in which case the compiler raises an error.
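The following v3-only sketch illustrates both outcomes of that search, using hypothetical function names: a nonlocal name found two levels up, and a nonlocal name with no binding in any containing function, which the compiler rejects with a SyntaxError before any code runs.

```python
def outer():
    total = 0                 # local to outer
    def middle():
        def inner():
            nonlocal total    # not bound in middle, so resolved in outer
            total += 1
        inner()
        inner()
    middle()
    return total

levels_result = outer()       # 2

# No containing function binds the name: compilation itself fails.
bad_source = "def f():\n    nonlocal missing\n    missing = 1\n"
compile_failed = False
try:
    compile(bad_source, '<example>', 'exec')
except SyntaxError:
    compile_failed = True
```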

Here’s a v3-only nested-functions approach to the “counter” functionality we implemented in previous sections using a function attribute, then a global variable:

def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

c1 = make_counter()
c2 = make_counter()

print(c1(), c1(), c1())    # prints: 1 2 3
print(c2(), c2())          # prints: 1 2
print(c1(), c2(), c1())    # prints: 4 3 5 

A key advantage of this approach versus the previous ones is that nested functions, just like an OOP approach, let you make independent counters, here c1 and c2—each closure keeps its own state and doesn’t interfere with the other.

This example is v3-only because it requires nonlocal, since the inner function counter needs to rebind free variable count. In v2, you would need a somewhat nonintuitive trick to achieve the same effect without rebinding a free variable:

def make_counter():
    count = [0]
    def counter():
        count[0] += 1
        return count[0]
    return counter

Clearly, the v3-only version is more readable, since it doesn’t require any trick.

lambda Expressions

If a function body is a single return expression statement, you may choose to replace the function with the special lambda expression form:

lambda parameters: expression

A lambda expression is the anonymous equivalent of a normal function whose body is a single return statement. Note that the lambda syntax does not use the return keyword. You can use a lambda expression wherever you could use a reference to a function. lambda can sometimes be handy when you want to use an extremely simple function as an argument or return value. Here’s an example that uses a lambda expression as an argument to the built-in filter function (covered in Table 7-2):

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
low = 3
high = 7
filter(lambda x: low<=x<high, a_list)    # returns: [3, 4, 5, 6]

(In v3, filter returns an iterator, not a list.) Alternatively, you can always use a local def statement to give the function object a name, then use this name as an argument or return value. Here’s the same filter example using a local def statement:

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
def within_bounds(value, low=3, high=7):
    return low<=value<high
filter(within_bounds, a_list)            # returns: [3, 4, 5, 6]

While lambda can at times be handy, def is usually better: it’s more general, and makes the code more readable, as you can choose a clear name for the function.
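One concrete way to see the naming difference is to inspect the __name__ attribute of each function object. This sketch (by_lambda is a name introduced only for illustration) also wraps filter in list, so that it gives a list in both v2 and v3:

```python
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
low, high = 3, 7

by_lambda = lambda x: low <= x < high

def within_bounds(value, low=3, high=7):
    return low <= value < high

# list() turns the v3 iterator into a list; in v2 it just copies the list.
from_lambda = list(filter(by_lambda, a_list))
from_def = list(filter(within_bounds, a_list))

# A lambda has no name of its own; a def carries the name you chose.
names = (by_lambda.__name__, within_bounds.__name__)
```

Both calls select [3, 4, 5, 6], while names is ('<lambda>', 'within_bounds'), which is also what you would see in tracebacks.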

Generators

When the body of a function contains one or more occurrences of the keyword yield, the function is known as a generator, or more precisely a generator function. When you call a generator, the function body does not execute. Instead, the generator function returns a special iterator object, known as a generator object (sometimes, somewhat confusingly, also referred to as just “a generator”), that wraps the function body, its local variables (including its parameters), and the current point of execution, which is initially the start of the function.

When you call next on a generator object, the function body executes from the current point up to the next yield, which takes the form:

yield expression

A bare yield without the expression is also legal, and equivalent to yield None. When a yield executes, the function execution is “frozen,” with current point of execution and local variables preserved, and the expression following yield returned as the result of next. When you call next again, execution of the function body resumes where it left off, again up to the next yield. When the function body ends or executes a return statement, the iterator raises a StopIteration exception to indicate that the iteration is finished. In v2, return statements in a generator cannot contain expressions; in v3, the expression after return, if any, is an argument to the resulting StopIteration.
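Stepping through a generator by hand makes this protocol visible. The sketch below is v3-only because of the return with an expression; two_then_done is a hypothetical name:

```python
def two_then_done():
    yield 'first'
    yield              # bare yield: same as yield None
    return 'finished'  # v3 only: becomes the StopIteration argument

g = two_then_done()    # the body has not started executing yet
a = next(g)            # runs up to the first yield: 'first'
b = next(g)            # resumes, runs up to the bare yield: None
try:
    next(g)            # resumes, reaches return
except StopIteration as e:
    final = e.value    # 'finished'
```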

yield is an expression, not a statement. When you call g.send(value) on a generator object g, the value of the yield is value; when you call next(g), the value of the yield is None. We cover this in “Generators as near-coroutines”: it’s the elementary building block to implement coroutines in Python.

A generator function is often a very handy way to build an iterator. Since the most common way to use an iterator is to loop on it with a for statement, you typically call a generator like this (with the call to next implicit in the for statement):

for avariable in somegenerator(arguments):

For example, say that you want a sequence of numbers counting up from 1 to N and then down to 1 again. A generator can help:

def updown(N):
    for x in range(1, N):
        yield x
    for x in range(N, 0, -1):
        yield x
for i in updown(3): print(i)                  # prints: 1 2 3 2 1

Here is a generator that works somewhat like built-in range, but returns an iterator on floating-point values rather than on integers:

def frange(start, stop, stride=1.0):
    while start < stop:
        yield start
        start += stride

This frange example is only somewhat like range because, for simplicity, it makes arguments start and stop mandatory, and assumes stride is positive.

Generators are more flexible than functions that return lists. A generator may return an unbounded iterator, meaning one that yields an infinite stream of results (to use only in loops that terminate by other means, e.g., via a break statement). Further, a generator-object iterator performs lazy evaluation: the iterator computes each successive item only when and if needed, just in time, while the equivalent function does all computations in advance and may require large amounts of memory to hold the results list. Therefore, if all you need is the ability to iterate on a computed sequence, it is most often best to compute the sequence in a generator, rather than in a function that returns a list. If the caller needs a list of all the items produced by some bounded generator g(arguments), the caller can simply use the following code to explicitly request that a list be built:

resulting_list = list(g(arguments))
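For an unbounded generator, the loop must end by other means. A minimal sketch, assuming a hypothetical generator naturals, shows termination via break and, alternatively, via itertools.islice from the standard library:

```python
import itertools

def naturals(start=0):
    # Unbounded: yields start, start+1, start+2, ... forever.
    n = start
    while True:
        yield n
        n += 1

# Terminate with an explicit break.
small_squares = []
for n in naturals(1):
    if n * n > 50:
        break
    small_squares.append(n * n)

# Or take a bounded slice of the unbounded iterator.
first_five = list(itertools.islice(naturals(), 5))
```

Here small_squares is [1, 4, 9, 16, 25, 36, 49] and first_five is [0, 1, 2, 3, 4]; calling list(naturals()) directly would never finish.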

yield from (v3-only)

In v3 only, to improve execution efficiency and clarity when multiple levels of iteration are yielding values, you can use the form yield from expression, where expression is iterable. This yields the values from expression one at a time into the calling environment, avoiding the need to yield repeatedly. The updown generator defined earlier can be simplified in v3:

def updown(N):
    yield from range(1, N)
    yield from range(N, 0, -1)
for i in updown(3): print(i)                  # prints: 1 2 3 2 1

Moreover, the ability to use yield from means that, in v3 only, generators can be used as full-fledged coroutines, as covered in Chapter 18.

Generator expressions

Python offers an even simpler way to code particularly simple generators: generator expressions, commonly known as genexps. The syntax of a genexp is just like that of a list comprehension (as covered in “List comprehensions”), except that a genexp is within parentheses (()) instead of brackets ([]). The semantics of a genexp are the same as those of the corresponding list comprehension, except that a genexp produces an iterator yielding one item at a time, while a list comprehension produces a list of all results in memory (therefore, using a genexp, when appropriate, saves memory). For example, to sum the squares of all single-digit integers, you could code sum([x*x for x in range(10)]); however, you can express this even better, coding it as sum(x*x for x in range(10)) (just the same, but omitting the brackets), and obtain exactly the same result while consuming less memory. Note that the parentheses that indicate the function call also “do double duty” and enclose the genexp (no need for extra parentheses).
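The sketch below compares the two forms side by side and also shows one practical consequence of getting an iterator rather than a list: a genexp can be consumed only once.

```python
total_from_list = sum([x*x for x in range(10)])   # builds a 10-item list first
total_from_gen = sum(x*x for x in range(10))      # feeds items one at a time

squares = (x*x for x in range(10))   # a genexp bound to a name needs its own parentheses
first_pass = list(squares)           # consumes the iterator
second_pass = list(squares)          # already exhausted: []
```

Both totals are 285.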

Generators as near-coroutines

In all modern Python versions (both v2 and v3), generators are further enhanced, with the possibility of receiving a value (or an exception) back from the caller as each yield executes. This lets generators implement coroutines, as explained in PEP 342. The main change from older versions of Python is that yield is not a statement, but an expression, so it has a value. When a generator resumes via a call to next on it, the corresponding yield’s value is None. To pass a value x into some generator g (so that g receives x as the value of the yield on which it’s suspended), instead of calling next(g), call g.send(x) (g.send(None) is just like next(g)).
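As an illustration, here is a sketch of a generator that keeps a running average of the values sent to it (running_average is a hypothetical example, not a standard library function). The first next call is needed to advance the generator to its first yield, where it can then receive values:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # the yield's value is whatever the caller sends
        total += value
        count += 1
        average = total / count

avg = running_average()
primed = next(avg)        # run to the first yield; yields None
r1 = avg.send(10)         # 10.0
r2 = avg.send(20)         # 15.0
r3 = avg.send(30)         # 20.0
```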

Other modern enhancements to generators have to do with exceptions, and we cover them in “Generators and Exceptions”.

Recursion

Python supports recursion (i.e., a Python function can call itself), but there is a limit to how deep the recursion can be. By default, Python interrupts recursion and raises a RecursionError exception (a subclass of RuntimeError introduced in 3.5; v2 and earlier v3 releases raise RuntimeError itself; covered in “Standard Exception Classes”) when it detects that the stack of recursive calls has gone over a depth of 1,000. You can change the recursion limit with function setrecursionlimit of module sys, covered in Table 7-3.

However, changing the recursion limit does not give you unlimited recursion; the absolute maximum limit depends on the platform on which your program is running, particularly on the underlying operating system and C runtime library, but it’s typically a few thousand levels. If recursive calls get too deep, your program crashes. Such runaway recursion, after a call to setrecursionlimit that exceeds the platform’s capabilities, is one of the very few ways a Python program can crash: really crash, hard, without the usual safety net of Python’s exception mechanisms. Therefore, beware “fixing” a program that is getting RecursionError exceptions by raising the recursion limit with setrecursionlimit. Most often, you’re better advised to look for ways to remove the recursion or, at least, limit the depth of recursion that your program needs.
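A small sketch of hitting the limit safely, without touching setrecursionlimit; the function depth is made up for the example. In 3.5 and later the exception raised is RecursionError, a subclass of RuntimeError, so catching RuntimeError works in older versions too:

```python
import sys

def depth(n):
    # Recurses n levels deep before returning.
    return 1 if n == 0 else 1 + depth(n - 1)

limit = sys.getrecursionlimit()   # 1000 by default

shallow = depth(50)               # well within the limit: 51

hit_limit = False
try:
    depth(limit + 100)            # deeper than the interpreter allows
except RuntimeError:              # RecursionError in 3.5+, RuntimeError before
    hit_limit = True
```

The exception here is the safety net doing its job; the hard crash described above happens only when the limit is raised beyond what the platform’s stack can hold.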

Readers who are familiar with Lisp, Scheme, or functional programming languages must in particular be aware that Python does not implement the optimization of “tail-call elimination,” which is so important in these languages. In Python, any call, recursive or not, has the same cost in terms of both time and memory space, dependent only on the number of arguments: the cost does not change, whether the call is a “tail-call” (meaning that the call is the last operation that the caller executes) or any other, nontail call. This makes recursion removal even more important.

For example, consider a classic use for recursion: “walking a tree.” Suppose you have a “tree” data structure where each node is a tuple with three items: first, a payload value as item 0; then, as item 1, a similar tuple for the left-side descendant, or None; and similarly as item 2, for the right-side descendant. A simple example might be: (23, (42, (5, None, None), (55, None, None)), (94, None, None)) to represent the tree where the payloads are:

          23
       42    94
      5  55

We want to write a generator function that, given the root of such a tree, “walks the tree,” yielding each payload in top-down order. The simplest approach is clearly recursion; in v3:

def rec(t):
    yield t[0]
    for i in (1, 2):
        if t[i] is not None: yield from rec(t[i])

However, if a tree is very deep, recursion becomes a problem. To remove recursion, we just need to handle our own stack—a list used in last-in, first-out fashion, thanks to its append and pop methods. To wit, in either v2 or v3:

def norec(t):
    stack = [t]
    while stack:
        t = stack.pop()
        yield t[0]
        for i in (2, 1):
            if t[i] is not None: stack.append(t[i])

The only small issue to be careful about, to keep exactly the same order of yields as rec, is switching the (1, 2) index order in which to examine descendants, to (2, 1), adjusting to the reverse (last-in, first-out) behavior of stack.
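To check that the two walkers agree, the following v3 sketch runs both on the example tree, then runs norec on a deliberately deep tree. The deep tree (a chain of 5,000 left-side descendants) is an assumption made up for this illustration:

```python
def rec(t):
    yield t[0]
    for i in (1, 2):
        if t[i] is not None: yield from rec(t[i])

def norec(t):
    stack = [t]
    while stack:
        t = stack.pop()
        yield t[0]
        for i in (2, 1):
            if t[i] is not None: stack.append(t[i])

tree = (23, (42, (5, None, None), (55, None, None)), (94, None, None))
recursive_order = list(rec(tree))
iterative_order = list(norec(tree))

# Build a degenerate tree far deeper than the default recursion limit.
deep = None
for payload in range(5000):
    deep = (payload, deep, None)

deep_count = sum(1 for _ in norec(deep))   # no recursion, so depth is no problem
```

Both orders are [23, 42, 5, 55, 94], and norec visits all 5,000 nodes of the deep tree, where rec would run into the recursion limit.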

1 In an interactive interpreter session, you must enter an empty physical line (without any whitespace or comment) to terminate a multiline statement.

2 Control characters are nonprinting characters such as \t (tab) and \n (newline), both of which count as whitespace; and \a (alarm, AKA beep) and \b (backspace), which are not whitespace.

3 Each specific mapping type may put constraints on the type of keys it accepts: in particular, dictionaries only accept hashable keys.

4 Note that, perhaps curiously, a!=b!=c does not imply a!=c, just like the longer-form equivalent expression a!=b and b!=c wouldn’t.

5 A slicing that’s also whimsically known as “the Martian smiley.”

6 New in 3.6: dictionaries have been reimplemented; keys are now iterated in insertion order until the first noninsertion mutator on the dict. To keep your code solid, do not rely on this new, non-guaranteed behavior until Python’s official docs explicitly ensure it.

7 In 3.6, as mentioned in the previous note, insertion order until a noninsertion mutator...but, to make your code robust, don’t take advantage of this new constraint.

8 Also spelled **kwargs.

9 New in 3.6: the mapping to which kwds is bound now preserves the order in which you pass extra named arguments in the call.
