This chapter begins our in-depth tour of the Python language. In Python, data takes the form of objects—either built-in objects that Python provides, or objects we create using Python tools and other languages such as C. In fact, objects are the basis of every Python program you will ever write. Because they are the most fundamental notion in Python programming, objects are also our first focus in this book.
In the preceding chapter, we took a quick pass over Python’s core object types. Although essential terms were introduced in that chapter, we avoided covering too many specifics in the interest of space. Here, we’ll begin a more careful second look at data type concepts, to fill in details we glossed over earlier. Let’s get started by exploring our first data type category: Python’s numeric types.
Most of Python’s number types are fairly typical and will probably seem familiar if you’ve used almost any other programming language in the past. They can be used to keep track of your bank balance, the distance to Mars, the number of visitors to your website, and just about any other numeric quantity.
In Python, numbers are not really a single object type, but a category of similar types. Python supports the usual numeric types (integers and floating points), as well as literals for creating numbers and expressions for processing them. In addition, Python provides more advanced numeric programming support and objects for more advanced work. A complete inventory of Python’s numeric toolbox includes:
Integers and floating-point numbers
Complex numbers
Fixed-precision decimal numbers
Rational fraction numbers
Sets
Booleans
Unlimited integer precision
A variety of numeric built-ins and modules
This chapter starts with basic numbers and fundamentals, then moves on to explore the other tools in this list. Before we jump into code, though, the next few sections get us started with a brief overview of how we write and process numbers in our scripts.
Among its basic types, Python provides integers (positive and negative whole numbers) and floating-point numbers (numbers with a fractional part, sometimes called “floats” for economy). Python also allows us to write integers using hexadecimal, octal, and binary literals; offers a complex number type; and allows integers to have unlimited precision (they can grow to have as many digits as your memory space allows). Table 5-1 shows what Python’s numeric types look like when written out in a program, as literals.
Literal | Interpretation |
| Integers (unlimited size) |
| Floating-point numbers |
| Octal, hex, and binary literals in 2.6 |
| Octal, hex, and binary literals in 3.0 |
| Complex number literals |
In general, Python’s numeric type literals are straightforward to write, but a few coding concepts are worth highlighting here:
Integers are written as strings of decimal digits.
Floating-point numbers have a decimal point and/or an optional signed exponent introduced by an e or E and followed by an optional sign. If you write a number with a decimal point or exponent, Python makes it a floating-point object and uses floating-point (not integer) math when the object is used in an expression. Floating-point numbers are implemented as C “doubles,” and therefore get as much precision as the C compiler used to build the Python interpreter gives to doubles.
In Python 2.6 there are two integer types, normal (usually 32 bits) and long (unlimited precision), and an integer may end in an l or L to force it to become a long integer. Because integers are automatically converted to long integers when their values overflow the normal integer size, you never need to type the letter L yourself.
In Python 3.0, the normal and long integer types have been merged: there is only integer, which automatically supports the unlimited precision of Python 2.6’s separate long integer type. Because of this, integers can no longer be coded with a trailing l or L, and integers never print with this character either. Apart from this, most programs are unaffected by this change, unless they do type testing that checks for 2.6 long integers.
Integers may be coded in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2). Hexadecimals start with a leading 0x or 0X, followed by a string of hexadecimal digits (0–9 and A–F). Hex digits may be coded in lower- or uppercase. Octal literals start with a leading 0o or 0O (zero and lower- or uppercase letter “o”), followed by a string of digits (0–7). In 2.6 and earlier, octal literals can also be coded with just a leading 0, but not in 3.0 (this original octal form is too easily confused with decimal, and is replaced by the new 0o format). Binary literals, new in 2.6 and 3.0, begin with a leading 0b or 0B, followed by binary digits (0–1).
Note that all of these literals produce integer objects in program code; they are just alternative syntaxes for specifying values. The built-in calls hex(I), oct(I), and bin(I) convert an integer to its representation string in these three bases, and int(str, base) converts a runtime string to an integer per a given base.
Python complex literals are written as realpart+imaginarypart, where the imaginarypart is terminated with a j or J. The realpart is technically optional, so the imaginarypart may appear on its own. Internally, complex numbers are implemented as pairs of floating-point numbers, but all numeric operations perform complex math when applied to complex numbers. Complex numbers may also be created with the complex(real, imag) built-in call.
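As a rough illustration of the literal rules just described, here is a small sketch written for Python 3.X. The specific values are arbitrary examples chosen for this illustration (they are not taken from Table 5-1); the point is only that the alternative integer syntaxes all yield ordinary integer objects, and that hex, oct, bin, and int convert between integers and digit strings:

```python
# Illustrative values only; any integers would do (Python 3.X syntax)
n_hex = 0xFF          # hexadecimal literal
n_oct = 0o377         # octal literal (3.X form; a bare leading 0 is 2.X-only)
n_bin = 0b11111111    # binary literal

# All three are plain int objects with the same value
same_value = (n_hex == n_oct == n_bin == 255)

# Integer -> digit string in each base
as_strings = (hex(255), oct(255), bin(255))

# Digit string -> integer, per a given base
from_strings = (int('ff', 16), int('377', 8), int('11111111', 2))

# Floats have a decimal point and/or exponent; complex uses a j or J suffix
f = 3.14e-10
c = complex(3, 4)     # same value as the literal 3+4j

print(same_value, as_strings, from_strings, type(f).__name__, c == 3+4j)
```

Because the literals are just alternative spellings, nothing about the resulting objects records which base was used to write them.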
As we’ll see later in this chapter, there are additional, more advanced number types not included in Table 5-1. Some of these are created by calling functions in imported modules (e.g., decimals and fractions), and others have literal syntax all their own (e.g., sets).
Besides the built-in number literals shown in Table 5-1, Python provides a set of tools for processing number objects: expression operators, built-in functions, and utility modules.
We’ll meet all of these as we go along.
Although numbers are primarily processed with expressions, built-ins, and modules, they also have a handful of type-specific methods today, which we’ll meet in this chapter as well. Floating-point numbers, for example, have an as_integer_ratio method that is useful for the fraction number type, and an is_integer method to test if the number is an integer. Integers have various attributes, including a new bit_length method in the upcoming Python 3.1 release that gives the number of bits necessary to represent the object’s value. Moreover, as part collection and part number, sets also support both methods and expressions.
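The type-specific methods mentioned here can be tried directly. The following is a minimal sketch, assuming Python 3.1 or later (for bit_length); the sample values are arbitrary:

```python
# Float methods
ratio = (2.5).as_integer_ratio()   # exact numerator/denominator pair
whole = (3.0).is_integer()         # True: no fractional part
not_whole = (3.5).is_integer()     # False: has a fractional part

# Integer method (Python 3.1 and later)
bits = (255).bit_length()          # 255 is 0b11111111, so 8 bits

print(ratio, whole, not_whole, bits)
```

The parentheses around the literals are needed only so that Python does not read the method-call dot as part of the number.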
Since expressions are the most essential tool for most number types, though, let’s turn to them next.
Perhaps the most fundamental tool that processes numbers is the expression: a combination of numbers (or other objects) and operators that computes a value when executed by Python. In Python, expressions are written using the usual mathematical notation and operator symbols. For instance, to add two numbers X and Y you would say X + Y, which tells Python to apply the + operator to the values named by X and Y. The result of the expression is the sum of X and Y, another number object.
Table 5-2 lists all the operator expressions available in Python. Many are self-explanatory; for instance, the usual mathematical operators (+, -, *, /, and so on) are supported. A few will be familiar if you’ve used other languages in the past: % computes a division remainder, << performs a bitwise left-shift, & computes a bitwise AND result, and so on. Others are more Python-specific, and not all are numeric in nature: for example, the is operator tests object identity (i.e., address in memory, a strict form of equality), and lambda creates unnamed functions.
Operators | Description |
| Generator function |
| Anonymous function generation |
| Ternary selection ( |
| Logical OR ( |
| Logical AND ( |
| Logical negation |
| Membership (iterables, sets) |
| Object identity tests |
| Magnitude comparison, set subset and superset; Value equality operators |
| Bitwise OR, set union |
| Bitwise XOR, set symmetric difference |
| Bitwise AND, set intersection |
| Shift |
| Addition, concatenation; Subtraction, set difference |
| Multiplication, repetition; Remainder, format; Division: true and floor |
| Negation, identity |
| Bitwise NOT (inversion) |
| Power (exponentiation) |
| Indexing (sequence, mapping, others) |
| Slicing |
| Call (function, method, class, other callable) |
| Attribute reference |
| Tuple, expression, generator expression |
| List, list comprehension |
| Dictionary, set, set and dictionary comprehensions |
Since this book addresses both Python 2.6 and 3.0, here are some notes about version differences and recent additions related to the operators in Table 5-2:
In Python 2.6, value inequality can be written as either X != Y or X <> Y. In Python 3.0, the latter of these options is removed because it is redundant. In either version, best practice is to use X != Y for all value inequality tests.
In Python 2.6, a backquotes expression `X` works the same as repr(X) and converts objects to display strings. Due to its obscurity, this expression is removed in Python 3.0; use the more readable str and repr built-in functions, described in Numeric Display Formats.
The X // Y floor division expression always truncates fractional remainders in both Python 2.6 and 3.0. The X / Y expression performs true division in 3.0 (retaining remainders) and classic division in 2.6 (truncating for integers). See Division: Classic, Floor, and True.
The syntax [...] is used for both list literals and list comprehension expressions. The latter of these performs an implied loop and collects expression results in a new list. See Chapters 4, 14, and 20 for examples.
The syntax (...) is used for tuples and expressions, as well as generator expressions, a form of list comprehension that produces results on demand instead of building a result list. See Chapters 4 and 20 for examples. The parentheses may sometimes be omitted in all three constructs.
The syntax {...} is used for dictionary literals, and in Python 3.0 for set literals and both dictionary and set comprehensions. See the set coverage in this chapter and Chapters 4, 8, 14, and 20 for examples.
The yield and ternary if/else selection expressions are available in Python 2.5 and later. The former returns send(...) arguments in generators; the latter is shorthand for a multiline if statement. yield requires parentheses if not alone on the right side of an assignment statement.
Comparison operators may be chained: X < Y < Z produces the same result as X < Y and Y < Z. See Comparisons: Normal and Chained for details.
In recent Pythons, the slice expression X[I:J:K] is equivalent to indexing with a slice object: X[slice(I, J, K)].
In Python 2.X, magnitude comparisons of mixed types—converting numbers to a common type, and ordering other mixed types according to the type name—are allowed. In Python 3.0, nonnumeric mixed-type magnitude comparisons are not allowed and raise exceptions; this includes sorts by proxy.
Magnitude comparisons for dictionaries are also no longer supported in Python 3.0 (though equality tests are); comparing sorted(dict.items()) is one possible replacement.
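A few of these notes are easy to check at the keyboard. The sketch below assumes Python 3.X, and its names and values (X, d1, d2, and so on) are made up purely for illustration:

```python
# Slice expression vs. explicit slice object
X = list(range(10))
I, J, K = 1, 8, 2
same_slice = X[I:J:K] == X[slice(I, J, K)]

# A list comprehension builds a list; a generator expression yields on demand
squares_list = [n * n for n in range(4)]
squares_gen = (n * n for n in range(4))
gen_total = sum(squares_gen)

# Ternary if/else selection
label = 'even' if len(X) % 2 == 0 else 'odd'

# Dictionaries no longer support < in 3.X; compare sorted items instead
d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3}
d1_less = sorted(d1.items()) < sorted(d2.items())

print(same_slice, squares_list, gen_total, label, d1_less)
```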
We’ll see most of the operators in Table 5-2 in action later; first, though, we need to take a quick look at the ways these operators may be combined in expressions.
As in most languages, in Python, more complex expressions are coded by stringing together the operator expressions in Table 5-2. For instance, the sum of two multiplications might be written as a mix of variables and operators:
A * B + C * D
So, how does Python know which operation to perform first? The answer to this question lies in operator precedence. When you write an expression with more than one operator, Python groups its parts according to what are called precedence rules, and this grouping determines the order in which the expression’s parts are computed. Table 5-2 is ordered by operator precedence:
Operators lower in the table have higher precedence, and so bind more tightly in mixed expressions.
Operators in the same row in Table 5-2 generally group from left to right when combined (except for exponentiation, which groups right to left, and comparisons, which chain left to right).
For example, if you write X + Y * Z, Python evaluates the multiplication first (Y * Z), then adds that result to X because * has higher precedence (is lower in the table) than +. Similarly, in this section’s original example, both multiplications (A * B and C * D) will happen before their results are added.
You can forget about precedence completely if you’re careful to group parts of expressions with parentheses. When you enclose subexpressions in parentheses, you override Python’s precedence rules; Python always evaluates expressions in parentheses first before using their results in the enclosing expressions.
For instance, instead of coding X + Y * Z, you could write one of the following to force Python to evaluate the expression in the desired order:
(X + Y) * Z
X + (Y * Z)
In the first case, + is applied to X and Y first, because this subexpression is wrapped in parentheses. In the second case, the * is performed first (just as if there were no parentheses at all). Generally speaking, adding parentheses in large expressions is a good idea: it not only forces the evaluation order you want, but also aids readability.
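To make the grouping rules concrete, here is a short sketch with arbitrary sample values assigned to X, Y, and Z (the values are assumptions for illustration only):

```python
X, Y, Z = 2, 3, 4   # arbitrary sample values

default_grouping = X + Y * Z       # * binds tighter: 2 + (3 * 4)
forced_add_first = (X + Y) * Z     # parentheses override: (2 + 3) * 4
explicit_default = X + (Y * Z)     # same as the default grouping

# Exponentiation groups right to left: 2 ** (3 ** 2), not (2 ** 3) ** 2
power_chain = 2 ** 3 ** 2

print(default_grouping, forced_add_first, explicit_default, power_chain)
```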
Besides mixing operators in expressions, you can also mix numeric types. For instance, you can add an integer to a floating-point number:
40 + 3.14
But this leads to another question: what type is the result—integer or floating-point? The answer is simple, especially if you’ve used almost any other language before: in mixed-type numeric expressions, Python first converts operands up to the type of the most complicated operand, and then performs the math on same-type operands. This behavior is similar to type conversions in the C language.
Python ranks the complexity of numeric types like so: integers are simpler than floating-point numbers, which are simpler than complex numbers. So, when an integer is mixed with a floating point, as in the preceding example, the integer is converted up to a floating-point value first, and floating-point math yields the floating-point result. Similarly, any mixed-type expression where one operand is a complex number results in the other operand being converted up to a complex number, and the expression yields a complex result. (In Python 2.6, normal integers are also converted to long integers whenever their values are too large to fit in a normal integer; in 3.0, integers subsume longs entirely.)
You can force the issue by calling built-in functions to convert types manually:
>>> int(3.1415)     # Truncates float to integer
3
>>> float(3)        # Converts integer to float
3.0
However, you won’t usually need to do this: because Python automatically converts up to the more complex type within an expression, the results are normally what you want.
Also, keep in mind that all these mixed-type conversions apply only when mixing numeric types (e.g., an integer and a floating-point) in an expression, including those using numeric and comparison operators. In general, Python does not convert across any other type boundaries automatically. Adding a string to an integer, for example, results in an error, unless you manually convert one or the other; watch for an example when we meet strings in Chapter 7.
In Python 2.6, nonnumeric mixed types can be compared, but no conversions are performed (mixed types compare according to a fixed but arbitrary rule). In 3.0, nonnumeric mixed-type comparisons are not allowed and raise exceptions.
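The conversion rules in this section can be summarized in a few lines. This is only a sketch, assuming Python 3.X; the variable names are invented for the example:

```python
int_plus_float = 40 + 3.14          # int converted up to float
float_plus_complex = 1.5 + 2j       # float converted up to complex

result_types = (type(int_plus_float).__name__,
                type(float_plus_complex).__name__)

# No automatic conversion across nonnumeric type boundaries
try:
    'spam' + 1
except TypeError:
    str_plus_int_failed = True
else:
    str_plus_int_failed = False

# Manual conversion makes the intent explicit
as_number = int('42') + 1
as_text = 'spam' + str(1)

print(result_types, str_plus_int_failed, as_number, as_text)
```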
Although we’re focusing on built-in numbers right now, all Python operators may be overloaded (i.e., implemented) by Python classes and C extension types to work on objects you create. For instance, you’ll see later that objects coded with classes may be added or concatenated with + expressions, indexed with [i] expressions, and so on.
Furthermore, Python itself automatically overloads some operators, such that they perform different actions depending on the type of built-in objects being processed. For example, the + operator performs addition when applied to numbers but performs concatenation when applied to sequence objects such as strings and lists. In fact, + can mean anything at all when applied to objects you define with classes.
As we saw in the prior chapter, this property is usually called polymorphism—a term indicating that the meaning of an operation depends on the type of the objects being operated on. We’ll revisit this concept when we explore functions in Chapter 16, because it becomes a much more obvious feature in that context.
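Classes are covered much later in the book, but as a preview, a minimal sketch of operator overloading might look like the following. The Pair class here is hypothetical, invented only to show + and [i] being routed to methods of an object we define:

```python
class Pair:
    """Hypothetical two-item container used only for illustration."""
    def __init__(self, first, second):
        self.items = (first, second)

    def __add__(self, other):          # run for: pair + other
        return Pair(self.items[0] + other.items[0],
                    self.items[1] + other.items[1])

    def __getitem__(self, index):      # run for: pair[index]
        return self.items[index]

total = Pair(1, 2) + Pair(10, 20)

# The same + on built-in types: addition vs. concatenation
built_in = (1 + 2, 'ab' + 'cd', [1] + [2])

print(total[0], total[1], built_in)
```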
On to the code! Probably the best way to understand numeric objects and expressions is to see them in action, so let’s start up the interactive command line and try some basic but illustrative operations (see Chapter 3 for pointers if you need help starting an interactive session).
First of all, let’s exercise some basic math. In the following interaction, we first assign two variables (a and b) to integers so we can use them later in a larger expression. Variables are simply names, created by you or Python, that are used to keep track of information in your program. We’ll say more about this in the next chapter, but in Python:
Variables are created when they are first assigned values.
Variables are replaced with their values when used in expressions.
Variables must be assigned before they can be used in expressions.
Variables refer to objects and are never declared ahead of time.
In other words, these assignments cause the variables a and b to spring into existence automatically:
% python
>>> a = 3                 # Name created
>>> b = 4
I’ve also used a comment here. Recall that in Python code, text after a # mark and continuing to the end of the line is considered to be a comment and is ignored. Comments are a way to write human-readable documentation for your code. Because code you type interactively is temporary, you won’t normally write comments in this context, but I’ve added them to some of this book’s examples to help explain the code. In the next part of the book, we’ll meet a related feature, documentation strings, that attaches the text of your comments to objects.
Now, let’s use our new integer objects in some expressions. At this point, the values of a and b are still 3 and 4, respectively. Variables like these are replaced with their values whenever they’re used inside an expression, and the expression results are echoed back immediately when working interactively:
>>> a + 1, a - 1          # Addition (3 + 1), subtraction (3 - 1)
(4, 2)
>>> b * 3, b / 2          # Multiplication (4 * 3), division (4 / 2)
(12, 2.0)
>>> a % 2, b ** 2         # Modulus (remainder), power (4 ** 2)
(1, 16)
>>> 2 + 4.0, 2.0 ** b     # Mixed-type conversions
(6.0, 16.0)
Technically, the results being echoed back here are tuples of two values because the lines typed at the prompt contain two expressions separated by commas; that’s why the results are displayed in parentheses (more on tuples later). Note that the expressions work because the variables a and b within them have been assigned values. If you use a different variable that has never been assigned, Python reports an error rather than filling in some default value:
>>> c * 2
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'c' is not defined
You don’t need to predeclare variables in Python, but they must have been assigned at least once before you can use them. In practice, this means you have to initialize counters to zero before you can add to them, initialize lists to an empty list before you can append to them, and so on.
Here are two slightly larger expressions to illustrate operator grouping and more about conversions:
>>> b / 2 + a             # Same as ((4 / 2) + 3)
5.0
>>> print(b / (2.0 + a))  # Same as (4 / (2.0 + 3))
0.8
In the first expression, there are no parentheses, so Python automatically groups the components according to its precedence rules: because / is lower in Table 5-2 than +, it binds more tightly and so is evaluated first. The result is as if the expression had been organized with parentheses as shown in the comment to the right of the code.
Also, notice that all the numbers are integers in the first expression. Because of that, Python 2.6 performs integer division and addition and will give a result of 5, whereas Python 3.0 performs true division with remainders and gives the result shown. If you want integer division in 3.0, code this as b // 2 + a (more on division in a moment).
In the second expression, parentheses are added around the + part to force Python to evaluate it first (i.e., before the /). We also made one of the operands floating-point by adding a decimal point: 2.0. Because of the mixed types, Python converts the integer referenced by a to a floating-point value (3.0) before performing the +. If all the numbers in this expression were integers, integer division (4 / 5) would yield the truncated integer 0 in Python 2.6 but the floating-point 0.8 in Python 3.0 (again, stay tuned for division details).
Notice that we used a print operation in the last of the preceding examples. Without the print, you’ll see something that may look a bit odd at first glance:
>>> b / (2.0 + a)         # Auto echo output: more digits
0.80000000000000004
>>> print(b / (2.0 + a))  # print rounds off digits
0.8
The full story behind this odd result has to do with the limitations of floating-point hardware and its inability to exactly represent some values in a limited number of bits. Because computer architecture is well beyond this book’s scope, though, we’ll finesse this by saying that all of the digits in the first output are really there in your computer’s floating-point hardware; it’s just that you’re not accustomed to seeing them. In fact, this is really just a display issue: the interactive prompt’s automatic result echo shows more digits than the print statement. If you don’t want to see all the digits, use print; as the sidebar str and repr Display Formats will explain, you’ll get a user-friendly display.
Note, however, that not all values have so many digits to display:
>>> 1 / 2.0
0.5
and that there are more ways to display the bits of a number inside your computer than using print and automatic echoes:
>>> num = 1 / 3.0
>>> num                        # Echoes
0.33333333333333331
>>> print(num)                 # print rounds
0.333333333333
>>> '%e' % num                 # String formatting expression
'3.333333e-001'
>>> '%4.2f' % num              # Alternative floating-point format
'0.33'
>>> '{0:4.2f}'.format(num)     # String formatting method (Python 2.6 and 3.0)
'0.33'
The last three of these expressions employ string formatting, a tool that allows for format flexibility, which we will explore in the upcoming chapter on strings (Chapter 7). Its results are strings that are typically printed to displays or reports.
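As a hedged recap of the formatting options just shown, the sketch below collects them in script form. It assumes Python 2.6 or later; the format built-in is an addition of this example rather than something used above, and the exact digits printed by plain echoes and by the exponent in '%e' can vary by Python version and platform, so no such output is asserted here:

```python
num = 1 / 3.0

by_expression = '%4.2f' % num          # formatting expression
by_method = '{0:4.2f}'.format(num)     # formatting method
by_format = format(num, '.3f')         # format built-in (same format codes)
scientific = '%e' % num                # exponent width may vary by platform

print(by_expression, by_method, by_format, scientific)
```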
So far, we’ve been dealing with standard numeric operations (addition and multiplication), but numbers can also be compared. Normal comparisons work for numbers exactly as you’d expect—they compare the relative magnitudes of their operands and return a Boolean result (which we would normally test in a larger statement):
>>> 1 < 2                 # Less than
True
>>> 2.0 >= 1              # Greater than or equal: mixed-type 1 converted to 1.0
True
>>> 2.0 == 2.0            # Equal value
True
>>> 2.0 != 2.0            # Not equal value
False
Notice again how mixed types are allowed in numeric expressions (only); in the second test here, Python compares values in terms of the more complex type, float.
Interestingly, Python also allows us to chain multiple comparisons together to perform range tests. Chained comparisons are a sort of shorthand for larger Boolean expressions. The expression (A < B < C), for instance, tests whether B is between A and C; it is equivalent to the Boolean test (A < B and B < C) but is easier on the eyes (and the keyboard). For example, assume the following assignments:
>>> X = 2
>>> Y = 4
>>> Z = 6
The following two expressions have identical effects, but the first is shorter to type, and it may run slightly faster since Python needs to evaluate Y only once:
>>> X < Y < Z             # Chained comparisons: range tests
True
>>> X < Y and Y < Z
True
The same equivalence holds for false results, and arbitrary chain lengths are allowed:
>>> X < Y > Z
False
>>> X < Y and Y > Z
False
>>> 1 < 2 < 3.0 < 4
True
>>> 1 > 2 > 3.0 > 4
False
You can use other comparisons in chained tests, but the resulting expressions can become nonintuitive unless you evaluate them the way Python does. The following, for instance, is false just because 1 is not equal to 2:
>>> 1 == 2 < 3            # Same as: 1 == 2 and 2 < 3
False                     # Not same as: False < 3 (which means 0 < 3, which is true)
Python does not compare the 1 == 2 False result to 3; this would technically mean the same as 0 < 3, which would be True (as we’ll see later in this chapter, True and False are just customized 1 and 0).
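The claim that a chained comparison evaluates its middle operand only once can be observed with a function that records its own calls. This is a sketch only; the middle helper and the calls list are hypothetical names introduced for the demonstration, and the code assumes Python 3.3 or later (for list.clear):

```python
calls = []

def middle():
    """Hypothetical helper that records each time it is evaluated."""
    calls.append('middle')
    return 4

chained = 2 < middle() < 6                    # middle() runs once
chained_calls = len(calls)

calls.clear()
spelled_out = 2 < middle() and middle() < 6   # middle() runs twice
spelled_out_calls = len(calls)

# 1 == 2 < 3 means (1 == 2) and (2 < 3), not (1 == 2) < 3
surprising = (1 == 2 < 3, (1 == 2) < 3)

print(chained, chained_calls, spelled_out, spelled_out_calls, surprising)
```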
You’ve seen how division works in the previous sections, so you should know that it behaves slightly differently in Python 3.0 and 2.6. In fact, there are actually three flavors of division, and two different division operators, one of which changes in 3.0:
X / Y
Classic and true division. In Python 2.6 and earlier, this operator performs classic division, truncating results for integers and keeping remainders for floating-point numbers. In Python 3.0, it performs true division, always keeping remainders regardless of types.
X // Y
Floor division. Added in Python 2.2 and available in both Python 2.6 and 3.0, this operator always truncates fractional remainders down to their floor, regardless of types.
True division was added to address the fact that the results of the original classic division model are dependent on operand types, and so can be difficult to anticipate in a dynamically typed language like Python. Classic division was removed in 3.0 because of this constraint: the / and // operators implement true and floor division in 3.0.
In sum:
In 3.0, the / now always performs true division, returning a float result that includes any remainder, regardless of operand types. The // performs floor division, which truncates the remainder and returns an integer for integer operands or a float if any operand is a float.
In 2.6, the / does classic division, performing truncating integer division if both operands are integers and float division (keeping remainders) otherwise. The // does floor division and works as it does in 3.0, performing truncating division for integers and floor division for floats.
Here are the two operators at work in 3.0 and 2.6:
C:\misc> C:\Python30\python
>>>
>>> 10 / 4            # Differs in 3.0: keeps remainder
2.5
>>> 10 // 4           # Same in 3.0: truncates remainder
2
>>> 10 / 4.0          # Same in 3.0: keeps remainder
2.5
>>> 10 // 4.0         # Same in 3.0: truncates to floor
2.0
C:\misc> C:\Python26\python
>>>
>>> 10 / 4
2
>>> 10 // 4
2
>>> 10 / 4.0
2.5
>>> 10 // 4.0
2.0
Notice that the data type of the result for // is still dependent on the operand types in 3.0: if either is a float, the result is a float; otherwise, it is an integer. Although this may seem similar to the type-dependent behavior of / in 2.X that motivated its change in 3.0, the type of the return value is much less critical than differences in the return value itself. Moreover, because // was provided in part as a backward-compatibility tool for programs that rely on truncating integer division (and this is more common than you might expect), it must return integers for integers.
Although / behavior differs in 2.6 and 3.0, you can still support both versions in your code. If your programs depend on truncating integer division, use // in both 2.6 and 3.0. If your programs require floating-point results with remainders for integers, use float to guarantee that one operand is a float around a / when run in 2.6:
X = Y // Z          # Always truncates, always an int result for ints in 2.6 and 3.0
X = Y / float(Z)    # Guarantees float division with remainder in either 2.6 or 3.0
Alternatively, you can enable 3.0 / division in 2.6 with a __future__ import, rather than forcing it with float conversions:
C:\misc> C:\Python26\python
>>> from __future__ import division    # Enable 3.0 "/" behavior
>>> 10 / 4
2.5
>>> 10 // 4
2
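The version-neutral spellings above can be wrapped up for reuse. The following is a minimal sketch; portable_divide is a hypothetical helper name chosen for this example, and the code is written so that it gives the same results under both 2.6 and 3.0 division rules:

```python
def portable_divide(y, z):
    """Hypothetical helper: floor and true division spelled so that
    they behave the same under 2.6 and 3.0 rules."""
    floored = y // z            # floor division in both versions
    true = y / float(z)         # float operand forces true division in 2.6
    return floored, true

print(portable_divide(10, 4))     # integer operands
print(portable_divide(10, 4.0))   # float operand: both results are floats
```

The float call costs a conversion on each division, so code that targets only 3.X can simply use / directly.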
One subtlety: the // operator is generally referred to as truncating division, but it’s more accurate to refer to it as floor division: it truncates the result down to its floor, which means the closest whole number below the true result. The net effect is to round down, not strictly truncate, and this matters for negatives. You can see the difference for yourself with the Python math module (modules must be imported before you can use their contents; more on this later):
>>> import math
>>> math.floor(2.5)
2
>>> math.floor(-2.5)
-3
>>> math.trunc(2.5)
2
>>> math.trunc(-2.5)
-2
When running division operators, you really truncate only for positive results, where truncation and floor are the same; for negative results, you get the floor (strictly speaking, both cases are floor, but floor is the same as truncation for positives). Here’s the case for 3.0:
C:\misc> c:\python30\python
>>> 5 / 2, 5 / -2
(2.5, -2.5)
>>> 5 // 2, 5 // -2           # Truncates to floor: rounds to first lower integer
(2, -3)                       # 2.5 becomes 2, -2.5 becomes -3
>>> 5 / 2.0, 5 / -2.0
(2.5, -2.5)
>>> 5 // 2.0, 5 // -2.0       # Ditto for floats, though result is float too
(2.0, -3.0)
The 2.6 case is similar, but / results differ again:
C:\misc> c:\python26\python
>>> 5 / 2, 5 / -2             # Differs in 3.0
(2, -3)
>>> 5 // 2, 5 // -2           # This and the rest are the same in 2.6 and 3.0
(2, -3)
>>> 5 / 2.0, 5 / -2.0
(2.5, -2.5)
>>> 5 // 2.0, 5 // -2.0
(2.0, -3.0)
If you really want truncation regardless of sign, you can always run a float division result through math.trunc, regardless of Python version (also see the round built-in for related functionality):
C:\misc> c:\python30\python
>>> import math
>>> 5 / -2                        # Keep remainder
-2.5
>>> 5 // -2                       # Floor below result
-3
>>> math.trunc(5 / -2)            # Truncate instead of floor
-2
C:\misc> c:\python26\python
>>> import math
>>> 5 / float(-2)                 # Remainder in 2.6
-2.5
>>> 5 / -2, 5 // -2               # Floor in 2.6
(-3, -3)
>>> math.trunc(5 / float(-2))     # Truncate in 2.6
-2
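To see floor and truncation side by side across all sign combinations, consider the sketch below. The trunc_div helper is a hypothetical name introduced here; it simply packages the math.trunc technique shown above, and it goes through float division, so it is only suitable for values that floats can represent exactly:

```python
import math

def trunc_div(x, y):
    """Hypothetical helper: divide, then drop the fraction toward zero."""
    return math.trunc(x / float(y))

pairs = [(5, 2), (5, -2), (-5, 2), (-5, -2)]
floor_results = [x // y for x, y in pairs]            # rounds down
trunc_results = [trunc_div(x, y) for x, y in pairs]   # rounds toward zero

print(floor_results, trunc_results)
```

The two lists differ only where the true quotient is negative, which is exactly the case the text warns about.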
If you are using 3.0, here is the short story on division operators for reference:
>>> (5 / 2), (5 / 2.0), (5 / -2.0), (5 / -2)          # 3.0 true division
(2.5, 2.5, -2.5, -2.5)
>>> (5 // 2), (5 // 2.0), (5 // -2.0), (5 // -2)      # 3.0 floor division
(2, 2.0, -3.0, -3)
>>> (9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0)          # Both
(3.0, 3.0, 3, 3.0)
For 2.6 readers, division works as follows:
>>> (5 / 2), (5 / 2.0), (5 / -2.0), (5 / -2)          # 2.6 classic division
(2, 2.5, -2.5, -3)
>>> (5 // 2), (5 // 2.0), (5 // -2.0), (5 // -2)      # 2.6 floor division (same)
(2, 2.0, -3.0, -3)
>>> (9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0)          # Both
(3, 3.0, 3, 3.0)
Although results have yet to come in, it’s possible that the nontruncating behavior of / in 3.0 may break a significant number of programs. Perhaps because of a C language legacy, many programmers rely on division truncation for integers and will have to learn to use // in such contexts instead. Watch for a simple prime number while loop example in Chapter 13, and a corresponding exercise at the end of Part IV that illustrates the sort of code that may be impacted by this / change. Also stay tuned for more on the special from command used in this section; it’s discussed further in Chapter 24.
Division may differ slightly across Python releases, but it’s still fairly standard. Here’s something a bit more exotic. As mentioned earlier, Python 3.0 integers support unlimited size:
>>> 999999999999999999999999999999 + 1
1000000000000000000000000000000
Python 2.6 has a separate type for long integers, but it automatically converts any number too large to store in a normal integer to this type. Hence, you don’t need to code any special syntax to use longs, and the only way you can tell that you’re using 2.6 longs is that they print with a trailing “L”:
>>> 999999999999999999999999999999 + 1
1000000000000000000000000000000L
Unlimited-precision integers are a convenient built-in tool. For instance, you can use them to count the U.S. national debt in pennies in Python directly (if you are so inclined, and have enough memory on your computer for this year’s budget!). They are also why we were able to raise 2 to such large powers in the examples in Chapter 3. Here are the 3.0 and 2.6 cases:
>>> 2 ** 200
1606938044258990275541962092341162602522202993782792835301376

>>> 2 ** 200
1606938044258990275541962092341162602522202993782792835301376L
Because Python must do extra work to support their extended precision, integer math is usually substantially slower than normal when numbers grow large. However, if you need the precision, the fact that it’s built in for you to use will likely outweigh its performance penalty.
Although less widely used than the types we’ve been exploring thus far, complex numbers are a distinct core object type in Python. If you know what they are, you know why they are useful; if not, consider this section optional reading.
Complex numbers are represented as two floating-point numbers (the real and imaginary parts) and are coded by adding a j or J suffix to the imaginary part. We can also write complex numbers with a nonzero real part by adding the two parts with a +. For example, the complex number with a real part of 2 and an imaginary part of -3 is written 2 + -3j. Here are some examples of complex math at work:
>>> 1j * 1J
(-1+0j)
>>> 2 + 1j * 3
(2+3j)
>>> (2 + 1j) * 3
(6+3j)
Complex numbers also allow us to extract their parts as attributes, support all the usual mathematical expressions, and may be processed with tools in the standard cmath module (the complex version of the standard math module). Complex numbers typically find roles in engineering-oriented programs. Because they are advanced tools, check Python’s language reference manual for additional details.
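The attribute access and cmath support mentioned above can be sketched briefly. This is only an illustration of a few calls, not a survey of the module:

```python
import cmath

z = 2 + 3j
print(z.real, z.imag)        # 2.0 3.0: the parts are stored as floats
print(z.conjugate())         # (2-3j)
print(abs(3 + 4j))           # 5.0: magnitude of the complex number
print(cmath.sqrt(-1))        # 1j: math.sqrt would raise an error here
```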
As described earlier in this chapter, Python integers can be coded in hexadecimal, octal, and binary notation, in addition to the normal base 10 decimal coding. The coding rules were laid out at the start of this chapter; let’s look at some live examples here.
Keep in mind that these literals are simply an alternative syntax for specifying the value of an integer object. For example, the following literals coded in Python 3.0 or 2.6 produce normal integers with the specified values in all three bases:
>>> 0o1, 0o20, 0o377              # Octal literals
(1, 16, 255)
>>> 0x01, 0x10, 0xFF              # Hex literals
(1, 16, 255)
>>> 0b1, 0b10000, 0b11111111      # Binary literals
(1, 16, 255)
Here, the octal value 0o377, the hex value 0xFF, and the binary value 0b11111111 are all decimal 255. Python prints in decimal (base 10) by default but provides built-in functions that allow you to convert integers to other bases’ digit strings:
>>> oct(64), hex(64), bin(64)
('0o100', '0x40', '0b1000000')
The oct function converts decimal to octal, hex to hexadecimal, and bin to binary. To go the other way, the built-in int function converts a string of digits to an integer, and an optional second argument lets you specify the numeric base:
>>> int('64'), int('100', 8), int('40', 16), int('1000000', 2)
(64, 64, 64, 64)
>>> int('0x40', 16), int('0b1000000', 2)      # Literals okay too
(64, 64)
The eval function, which you’ll meet later in this book, treats strings as though they were Python code. Therefore, it has a similar effect (but usually runs more slowly: it actually compiles and runs the string as a piece of a program, and it assumes you can trust the source of the string being run; a clever user might be able to submit a string that deletes files on your machine!):
>>> eval('64'), eval('0o100'), eval('0x40'), eval('0b1000000')
(64, 64, 64, 64)
Finally, you can also convert integers to octal, hexadecimal, and binary digit strings with string formatting method calls and expressions:
>>> '{0:o}, {1:x}, {2:b}'.format(64, 64, 64)
'100, 40, 1000000'
>>> '%o, %x, %X' % (64, 255, 255)
'100, ff, FF'
String formatting is covered in more detail in Chapter 7.
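To tie the two directions together, here is a small round-trip sketch. The helper name to_base_strings is hypothetical; the format built-in it relies on is assumed to be available, as it is in Python 2.6 and 3.0:

```python
def to_base_strings(n):
    """Return n as (octal, hex, binary) digit strings without prefixes."""
    return format(n, 'o'), format(n, 'x'), format(n, 'b')

octal, hexa, binary = to_base_strings(64)
print(octal, hexa, binary)                            # 100 40 1000000

# Converting back with int() and an explicit base recovers the value
print(int(octal, 8), int(hexa, 16), int(binary, 2))   # 64 64 64
```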
Two notes before moving on. First, Python 2.6 users should remember that you can code octals with simply a leading zero, the original octal format in Python:
>>> 0o1, 0o20, 0o377      # New octal format in 2.6 (same as 3.0)
(1, 16, 255)
>>> 01, 020, 0377         # Old octal literals in 2.6 (and earlier)
(1, 16, 255)
In 3.0, the syntax in the second of these examples generates an error. Even though it’s not an error in 2.6, be careful not to begin a string of digits with a leading zero unless you really mean to code an octal value. Python 2.6 will treat it as base 8, which may not work as you’d expect: 010 is always decimal 8 in 2.6, not decimal 10 (despite what you may or may not think!). This, along with symmetry with the hex and binary forms, is why the octal format was changed in 3.0: you must use 0o010 in 3.0, and probably should in 2.6.
Secondly, note that these literals can produce arbitrarily long integers. The following, for instance, creates an integer with hex notation and then displays it first in decimal and then in octal and binary with converters (run in 2.6 here to reveal the long precision):
>>> X = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFF
>>> X
5192296858534827628530496329220095L
>>> oct(X)
'017777777777777777777777777777777777777L'
>>> bin(X)
'0b1111111111111111111111111111111111111111111111111111111111...and so on...
Speaking of binary digits, the next section shows tools for processing individual bits.
Besides the normal numeric operations (addition, subtraction, and so on), Python supports most of the numeric expressions available in the C language. This includes operators that treat integers as strings of binary bits. For instance, here they are at work performing bitwise shift and Boolean operations:
>>> x = 1          # 0001
>>> x << 2         # Shift left 2 bits: 0100
4
>>> x | 2          # Bitwise OR: 0011
3
>>> x & 1          # Bitwise AND: 0001
1
In the first expression, a binary 1 (in base 2, 0001) is shifted left two slots to create a binary 4 (0100). The last two operations perform a binary OR (0001|0010 = 0011) and a binary AND (0001&0001 = 0001). Such bit-masking operations allow us to encode multiple flags and other values within a single integer.
This is one area where the binary and hexadecimal number support in Python 2.6 and 3.0 becomes especially useful, because it allows us to code and inspect numbers by bit-strings:
>>> X = 0b0001             # Binary literals
>>> X << 2                 # Shift left
4
>>> bin(X << 2)            # Binary digits string
'0b100'
>>> bin(X | 0b010)         # Bitwise OR
'0b11'
>>> bin(X & 0b1)           # Bitwise AND
'0b1'
>>> X = 0xFF               # Hex literals
>>> bin(X)
'0b11111111'
>>> X ^ 0b10101010         # Bitwise XOR
85
>>> bin(X ^ 0b10101010)
'0b1010101'
>>> int('1010101', 2)      # String to int per base
85
>>> hex(85)                # Hex digit string
'0x55'
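The idea of packing multiple flags into a single integer, mentioned earlier, can be sketched as follows. The flag names are hypothetical and chosen only to make the bit positions readable:

```python
# Hypothetical permission flags, one bit each
READ    = 0b001
WRITE   = 0b010
EXECUTE = 0b100

perms = READ | WRITE              # set two flags: 0b011
print(bin(perms))                 # 0b11
print(bool(perms & WRITE))        # True: WRITE bit is set
print(bool(perms & EXECUTE))      # False: EXECUTE bit is not set

perms = perms & ~WRITE            # clear the WRITE flag
print(bin(perms))                 # 0b1
perms ^= EXECUTE                  # toggle the EXECUTE flag on
print(bin(perms))                 # 0b101
```

OR sets a bit, AND with a mask tests it, AND with the inverted mask clears it, and XOR toggles it.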
We won’t go into much more detail on “bit-twiddling” here. It’s supported if you need it, and it comes in handy if your Python code must deal with things like network packets or packed binary data produced by a C program. Be aware, though, that bitwise operations are often not as important in a high-level language such as Python as they are in a low-level language such as C. As a rule of thumb, if you find yourself wanting to flip bits in Python, you should think about which language you’re really coding. In general, there are often better ways to encode information in Python than bit strings.
In the upcoming Python 3.1 release, the integer bit_length method also allows you to query the number of bits required to represent a number’s value in binary. The same effect can often be achieved by subtracting 2 from the length of the bin string using the len built-in function we met in Chapter 4, though it may be less efficient:
>>> X = 99
>>> bin(X), X.bit_length()
('0b1100011', 7)
>>> bin(256), (256).bit_length()
('0b100000000', 9)
>>> len(bin(256)) - 2
9
In addition to its core object types, Python also provides both built-in functions and standard library modules for numeric processing. The pow and abs built-in functions, for instance, compute powers and absolute values, respectively. Here are some examples of the standard library math module (which contains most of the tools in the C language’s math library) and a few built-in functions at work:
>>> import math
>>> math.pi, math.e                        # Common constants
(3.1415926535897931, 2.7182818284590451)
>>> math.sin(2 * math.pi / 180)            # Sine, tangent, cosine
0.034899496702500969
>>> math.sqrt(144), math.sqrt(2)           # Square root
(12.0, 1.4142135623730951)
>>> pow(2, 4), 2 ** 4                      # Exponentiation (power)
(16, 16)
>>> abs(-42.0), sum((1, 2, 3, 4))          # Absolute value, summation
(42.0, 10)
>>> min(3, 1, 2, 4), max(3, 1, 2, 4)       # Minimum, maximum
(1, 4)
The sum function shown here works on a sequence of numbers, and min and max accept either a sequence or individual arguments. There are a variety of ways to drop the decimal digits of floating-point numbers. We met truncation and floor earlier; we can also round, both numerically and for display purposes:
>>> math.floor(2.567), math.floor(-2.567)          # Floor (next-lower integer)
(2, -3)
>>> math.trunc(2.567), math.trunc(-2.567)          # Truncate (drop decimal digits)
(2, -2)
>>> int(2.567), int(-2.567)                        # Truncate (integer conversion)
(2, -2)
>>> round(2.567), round(2.467), round(2.567, 2)    # Round (Python 3.0 version)
(3, 2, 2.5699999999999998)
>>> '%.1f' % 2.567, '{0:.2f}'.format(2.567)        # Round for display (Chapter 7)
('2.6', '2.57')
As we saw earlier, the last of these produces strings that we would usually print and supports a variety of formatting options. As also described earlier, the second to last test here will output (3, 2, 2.57) if we wrap it in a print call to request a more user-friendly display. The last two lines still differ, though: round rounds a floating-point number but still yields a floating-point number in memory, whereas string formatting produces a string and doesn’t yield a modified number:
>>> (1 / 3), round(1 / 3, 2), ('%.2f' % (1 / 3))
(0.33333333333333331, 0.33000000000000002, '0.33')
Interestingly, there are three ways to compute square roots in Python: using a module function, an expression, or a built-in function (if you’re interested in performance, we will revisit these in an exercise and its solution at the end of Part IV, to see which runs quicker):
>>> import math
>>> math.sqrt(144)              # Module
12.0
>>> 144 ** .5                   # Expression
12.0
>>> pow(144, .5)                # Built-in
12.0
>>> math.sqrt(1234567890)       # Larger numbers
35136.418286444619
>>> 1234567890 ** .5
35136.418286444619
>>> pow(1234567890, .5)
35136.418286444619
Notice that standard library modules such as math must be imported, but built-in functions such as abs and round are always available without imports. In other words, modules are external components, but built-in functions live in an implied namespace that Python automatically searches to find names used in your program. This namespace corresponds to the module called builtins in Python 3.0 (__builtin__ in 2.6). There is much more about name resolution in the function and module parts of this book; for now, when you hear “module,” think “import.”
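The implied namespace can be inspected directly. A brief sketch, assuming Python 3.x, where the module is named builtins:

```python
import builtins          # named __builtin__ in Python 2.6

# Built-in names are simply attributes of this module
print(builtins.abs is abs)       # True: the same function object
print(builtins.pow(2, 4))        # 16
print('round' in dir(builtins))  # True
```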
The standard library random module must be imported as well. This module provides tools for picking a random floating-point number between 0 and 1, selecting a random integer between two numbers, choosing an item at random from a sequence, and more:
>>> import random
>>> random.random()
0.44694718823781876
>>> random.random()
0.28970426439292829
>>> random.randint(1, 10)
5
>>> random.randint(1, 10)
4
>>> random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life'])
'Life of Brian'
>>> random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life'])
'Holy Grail'
The random module can be useful for shuffling cards in games, picking images at random in a slideshow GUI, performing statistical simulations, and much more. For more details, see Python’s library manual.
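The card-shuffling use mentioned above might be sketched like this. The ten-item list is a stand-in for a real deck, and the fixed seed is an assumption made so that repeated runs give the same results:

```python
import random

random.seed(42)                      # fixed seed: repeatable results

deck = list(range(1, 11))            # stand-in for a deck of ten cards
random.shuffle(deck)                 # reorder the list in place
hand = random.sample(deck, 3)        # pick three distinct items

print(sorted(deck) == list(range(1, 11)))   # True: same cards, new order
print(len(set(hand)))                       # 3: no repeats in the sample
```

Note that shuffle changes the list in place and returns None, whereas sample returns a new list and leaves its argument unchanged.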
So far in this chapter, we’ve been using Python’s core numeric types—integer, floating point, and complex. These will suffice for most of the number crunching that most programmers will ever need to do. Python comes with a handful of more exotic numeric types, though, that merit a quick look here.
Python 2.4 introduced a new core numeric type: the decimal object, formally known as Decimal. Syntactically, decimals are created by calling a function within an imported module, rather than running a literal expression. Functionally, decimals are like floating-point numbers, but they have a fixed number of decimal digits. Hence, decimals are fixed-precision floating-point values.
For example, with decimals, we can have a floating-point value that always retains just two decimal digits. Furthermore, we can specify how to round or truncate the extra decimal digits beyond the object’s cutoff. Although it generally incurs a small performance penalty compared to the normal floating-point type, the decimal type is well suited to representing fixed-precision quantities like sums of money and can help you achieve better numeric accuracy.
The last point merits elaboration. As you may or may not already know, floating-point math is less than exact, because of the limited space used to store values. For example, the following should yield zero, but it does not. The result is close to zero, but there are not enough bits to be precise here:
>>> 0.1 + 0.1 + 0.1 - 0.3
5.5511151231257827e-17
Printing the result to produce the user-friendly display format doesn’t completely help either, because the hardware related to floating-point math is inherently limited in terms of accuracy:
>>> print(0.1 + 0.1 + 0.1 - 0.3)
5.55111512313e-17
However, with decimals, the result can be dead-on:
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal('0.0')
As shown here, we can make decimal objects by calling the Decimal constructor function in the decimal module and passing in strings that have the desired number of decimal digits for the resulting object (we can use the str function to convert floating-point values to strings if needed). When decimals of different precision are mixed in expressions, Python converts up to the largest number of decimal digits automatically:
>>> Decimal('0.1') + Decimal('0.10') + Decimal('0.10') - Decimal('0.30')
Decimal('0.00')
In Python 3.1 (to be released after this book’s publication), it’s also possible to create a decimal object from a floating-point object, with a call of the form decimal.Decimal.from_float(1.25). The conversion is exact but can sometimes yield a large number of digits.
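To see what “exact but sometimes long” means in practice, here is a short sketch. It assumes a Python release that provides Decimal.from_float (3.1 or later, per the note above):

```python
from decimal import Decimal

# 1.25 is exactly representable in binary, so the result is short
print(Decimal.from_float(1.25))     # 1.25

# 0.1 is not, so the exact conversion exposes the stored approximation
print(Decimal.from_float(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Passing a string sidesteps the float entirely
print(Decimal('0.1'))               # 0.1
```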
Other tools in the decimal module can be used to set the precision of all decimal numbers, set up error handling, and more. For instance, a context object in this module allows for specifying precision (number of decimal digits) and rounding modes (down, ceiling, etc.). The precision is applied globally for all decimals created in the calling thread:
>>> import decimal
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal('0.1428571428571428571428571429')
>>> decimal.getcontext().prec = 4
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal('0.1429')
This is especially useful for monetary applications, where cents are represented as two decimal digits. Decimals are essentially an alternative to manual rounding and string formatting in this context:
>>> 1999 + 1.33
2000.3299999999999
>>>
>>> decimal.getcontext().prec = 2
>>> pay = decimal.Decimal(str(1999 + 1.33))
>>> pay
Decimal('2000.33')
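One detail the preceding interaction leaves implicit: the context precision counts significant digits and governs the results of arithmetic, while the Decimal constructor keeps every digit it is given, which is why 2000.33 survives a precision of 2. For amounts that must always carry exactly two digits after the decimal point, the quantize method is one commonly used alternative. The sketch below assumes the default context; the helper name to_cents and the tax rate are hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal('0.01')              # two digits after the decimal point

def to_cents(amount):
    """Round a Decimal amount to whole cents (hypothetical helper)."""
    return amount.quantize(CENTS, rounding=ROUND_HALF_UP)

price = Decimal('19.99')
total = price * 3                    # exact product: 59.97
tax = to_cents(total * Decimal('0.0825'))   # 4.947525 rounds to 4.95

print(total)                         # 59.97
print(tax)                           # 4.95
print(total + tax)                   # 64.92
```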
In Python 2.6 and 3.0 (and later), it’s also possible to reset precision temporarily by using the with context manager statement. The precision is reset to its original value on statement exit:
C:\misc> C:\Python30\python
>>> import decimal
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.3333333333333333333333333333')
>>>
>>> with decimal.localcontext() as ctx:
...     ctx.prec = 2
...     decimal.Decimal('1.00') / decimal.Decimal('3.00')
...
Decimal('0.33')
>>>
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.3333333333333333333333333333')
Though useful, this statement requires much more background knowledge than you’ve obtained at this point; watch for coverage of the with statement in Chapter 33.
Because use of the decimal type is still relatively rare in practice, I’ll defer to Python’s standard library manuals and interactive help for more details. And because decimals address some of the same floating-point accuracy issues as the fraction type, let’s move on to the next section to see how the two compare.
Python 2.6 and 3.0 debut a new numeric type, Fraction, which implements a rational number object. It essentially keeps both a numerator and a denominator explicitly, so as to avoid some of the inaccuracies and limitations of floating-point math.
Fraction is a sort of cousin to the existing Decimal fixed-precision type described in the prior section, as both can be used to control numerical accuracy by fixing decimal digits and specifying rounding or truncation policies. It’s also used in similar ways: like Decimal, Fraction resides in a module; import its constructor and pass in a numerator and a denominator to make one. The following interaction shows how:
>>> from fractions import Fraction
>>> x = Fraction(1, 3)          # Numerator, denominator
>>> y = Fraction(4, 6)          # Simplified to 2, 3 by gcd
>>> x
Fraction(1, 3)
>>> y
Fraction(2, 3)
>>> print(y)
2/3
Once created, Fractions can be used in mathematical expressions as usual:
>>> x + y
Fraction(1, 1)
>>> x - y                       # Results are exact: numerator, denominator
Fraction(-1, 3)
>>> x * y
Fraction(2, 9)
Fraction objects can also be created from floating-point number strings, much like decimals:
>>> Fraction('.25')
Fraction(1, 4)
>>> Fraction('1.25')
Fraction(5, 4)
>>>
>>> Fraction('.25') + Fraction('1.25')
Fraction(3, 2)
Notice that this is different from floating-point-type math, which is constrained by the underlying limitations of floating-point hardware. To compare, here are the same operations run with floating-point objects, and notes on their limited accuracy:
>>> a = 1 / 3.0                 # Only as accurate as floating-point hardware
>>> b = 4 / 6.0                 # Can lose precision over calculations
>>> a
0.33333333333333331
>>> b
0.66666666666666663
>>> a + b
1.0
>>> a - b
-0.33333333333333331
>>> a * b
0.22222222222222221
This floating-point limitation is especially apparent for values that cannot be represented accurately given their limited number of bits in memory. Both Fraction and Decimal provide ways to get exact results, albeit at the cost of some speed. For instance, in the following example (repeated from the prior section), floating-point numbers do not accurately give the zero answer expected, but both of the other types do:
>>> 0.1 + 0.1 + 0.1 - 0.3       # This should be zero (close, but not exact)
5.5511151231257827e-17
>>> from fractions import Fraction
>>> Fraction(1, 10) + Fraction(1, 10) + Fraction(1, 10) - Fraction(3, 10)
Fraction(0, 1)
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal('0.0')
Moreover, fractions and decimals both allow more intuitive and accurate results than floating points sometimes can, in different ways (by using rational representation and by limiting precision):
>>> 1 / 3                       # Use 3.0 in Python 2.6 for true "/"
0.33333333333333331
>>> Fraction(1, 3)              # Numeric accuracy
Fraction(1, 3)
>>> import decimal
>>> decimal.getcontext().prec = 2
>>> decimal.Decimal(1) / decimal.Decimal(3)
Decimal('0.33')
In fact, fractions both retain accuracy and automatically simplify results. Continuing the preceding interaction:
>>> (1 / 3) + (6 / 12)          # Use ".0" in Python 2.6 for true "/"
0.83333333333333326
>>> Fraction(6, 12)             # Automatically simplified
Fraction(1, 2)
>>> Fraction(1, 3) + Fraction(6, 12)
Fraction(5, 6)
>>> decimal.Decimal(str(1/3)) + decimal.Decimal(str(6/12))
Decimal('0.83')
>>> 1000.0 / 1234567890
8.1000000737100011e-07
>>> Fraction(1000, 1234567890)
Fraction(100, 123456789)
To support fraction conversions, floating-point objects now have a method that yields their numerator and denominator ratio, fractions have a from_float method, and float accepts a Fraction as an argument. Trace through the following interaction to see how this pans out (the * in the second test is special syntax that expands a tuple into individual arguments; more on this when we study function argument passing in Chapter 18):
>>> (2.5).as_integer_ratio()               # float object method
(5, 2)
>>> f = 2.5
>>> z = Fraction(*f.as_integer_ratio())    # Convert float -> fraction: two args
>>> z                                      # Same as Fraction(5, 2)
Fraction(5, 2)
>>> x                                      # x from prior interaction
Fraction(1, 3)
>>> x + z
Fraction(17, 6)                            # 5/2 + 1/3 = 15/6 + 2/6
>>> float(x)                               # Convert fraction -> float
0.33333333333333331
>>> float(z)
2.5
>>> float(x + z)
2.8333333333333335
>>> 17 / 6
2.8333333333333335
>>> Fraction.from_float(1.75)              # Convert float -> fraction: other way
Fraction(7, 4)
>>> Fraction(*(1.75).as_integer_ratio())
Fraction(7, 4)
Finally, some type mixing is allowed in expressions, though Fraction must sometimes be manually propagated to retain accuracy. Study the following interaction to see how this works:
>>> x
Fraction(1, 3)
>>> x + 2                       # Fraction + int -> Fraction
Fraction(7, 3)
>>> x + 2.0                     # Fraction + float -> float
2.3333333333333335
>>> x + (1./3)                  # Fraction + float -> float
0.66666666666666663
>>> x + (4./3)
1.6666666666666665
>>> x + Fraction(4, 3)          # Fraction + Fraction -> Fraction
Fraction(5, 3)
Caveat: although you can convert from floating-point to fraction, in some cases there is an unavoidable precision loss when you do so, because the number is inaccurate in its original floating-point form. When needed, you can simplify such results by limiting the maximum denominator value:
>>> 4.0 / 3
1.3333333333333333
>>> (4.0 / 3).as_integer_ratio()           # Precision loss from float
(6004799503160661, 4503599627370496)
>>> x
Fraction(1, 3)
>>> a = x + Fraction(*(4.0 / 3).as_integer_ratio())
>>> a
Fraction(22517998136852479, 13510798882111488)
>>> 22517998136852479 / 13510798882111488.   # 5 / 3 (or close to it!)
1.6666666666666667
>>> a.limit_denominator(10)                # Simplify to closest fraction
Fraction(5, 3)
For more details on the Fraction type, experiment further on your own and consult the Python 2.6 and 3.0 library manuals and other documentation.
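As one such experiment, the following sketch contrasts building a fraction from a float with building it from a string, using only the from_float and limit_denominator tools shown earlier:

```python
from fractions import Fraction

exact = Fraction.from_float(0.1)         # exact value of the stored float
print(exact == Fraction(1, 10))          # False: 0.1 is only approximated
print(exact.limit_denominator(100))      # 1/10
print(Fraction('0.1'))                   # 1/10: strings avoid the float
```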
Python 2.4 also introduced a new collection type, the set—an unordered collection of unique and immutable objects that supports operations corresponding to mathematical set theory. By definition, an item appears only once in a set, no matter how many times it is added. As such, sets have a variety of applications, especially in numeric and database-focused work.
Because sets are collections of other objects, they share some behavior with objects such as lists and dictionaries that are outside the scope of this chapter. For example, sets are iterable, can grow and shrink on demand, and may contain a variety of object types. As we’ll see, a set acts much like the keys of a valueless dictionary, but it supports extra operations.
However, because sets are unordered and do not map keys to values, they are neither sequence nor mapping types; they are a type category unto themselves. Moreover, because sets are fundamentally mathematical in nature (and for many readers, may seem more academic and be used much less often than more pervasive objects like dictionaries), we’ll explore the basic utility of Python’s set objects here.
There are a few ways to make sets today, depending on whether you are using Python 2.6 or 3.0. Since this book covers both, let’s begin with the 2.6 case, which also is available (and sometimes still required) in 3.0; we’ll refine this for 3.0 extensions in a moment. To make a set object, pass in a sequence or other iterable object to the built-in set function:
>>> x = set('abcde')
>>> y = set('bdxyz')
You get back a set object, which contains all the items in the object passed in (notice that sets do not have a positional ordering, and so are not sequences):
>>> x
set(['a', 'c', 'b', 'e', 'd']) # 2.6 display format
Sets made this way support the common mathematical set operations with expression operators. Note that we can’t perform these expressions on plain sequences—we must create sets from them in order to apply these tools:
>>> 'e' in x                    # Membership
True
>>> x - y                       # Difference
set(['a', 'c', 'e'])
>>> x | y                       # Union
set(['a', 'c', 'b', 'e', 'd', 'y', 'x', 'z'])
>>> x & y                       # Intersection
set(['b', 'd'])
>>> x ^ y                       # Symmetric difference (XOR)
set(['a', 'c', 'e', 'y', 'x', 'z'])
>>> x > y, x < y                # Superset, subset
(False, False)
In addition to expressions, the set object provides methods that correspond to these operations and more, and that support set changes: the set add method inserts one item, update is an in-place union, and remove deletes an item by value (run a dir call on any set instance or the set type name to see all the available methods). Assuming x and y are still as they were in the prior interaction:
>>> z = x.intersection(y)       # Same as x & y
>>> z
set(['b', 'd'])
>>> z.add('SPAM')               # Insert one item
>>> z
set(['b', 'd', 'SPAM'])
>>> z.update(set(['X', 'Y']))   # Merge: in-place union
>>> z
set(['Y', 'X', 'b', 'd', 'SPAM'])
>>> z.remove('b')               # Delete one item
>>> z
set(['Y', 'X', 'd', 'SPAM'])
As iterable containers, sets can also be used in operations such as len, for loops, and list comprehensions. Because they are unordered, though, they don’t support sequence operations like indexing and slicing:
>>> for item in set('abc'): print(item * 3)
...
aaa
ccc
bbb
Finally, although the set expressions shown earlier generally require two sets, their method-based counterparts can often work with any iterable type as well:
>>> S = set([1, 2, 3])
>>> S | set([3, 4])             # Expressions require both to be sets
set([1, 2, 3, 4])
>>> S | [3, 4]
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> S.union([3, 4])             # But their methods allow any iterable
set([1, 2, 3, 4])
>>> S.intersection((1, 3, 5))
set([1, 3])
>>> S.issubset(range(-5, 5))
True
For more details on set operations, see Python’s library reference manual or a reference book. Although set operations can be coded manually in Python with other types, like lists and dictionaries (and often were in the past), Python’s built-in sets use efficient algorithms and implementation techniques to provide quick and standard operation.
If you think sets are “cool,” they recently became noticeably cooler. In Python 3.0 we can still use the set built-in to make set objects, but 3.0 also adds a new set literal form, using the curly braces formerly reserved for dictionaries. In 3.0, the following are equivalent:
set([1, 2, 3, 4])               # Built-in call
{1, 2, 3, 4}                    # 3.0 set literals
This syntax makes sense, given that sets are essentially like valueless dictionaries—because a set’s items are unordered, unique, and immutable, the items behave much like a dictionary’s keys. This operational similarity is even more striking given that dictionary key lists in 3.0 are view objects, which support set-like behavior such as intersections and unions (see Chapter 8 for more on dictionary view objects).
In fact, regardless of how a set is made, 3.0 displays it using the new literal format. The set built-in is still required in 3.0 to create empty sets and to build sets from existing iterable objects (short of using set comprehensions, discussed later in this chapter), but the new literal is convenient for initializing sets of known structure:
C:\Misc> c:\python30\python
>>> set([1, 2, 3, 4])           # Built-in: same as in 2.6
{1, 2, 3, 4}
>>> set('spam')                 # Add all items in an iterable
{'a', 'p', 's', 'm'}
>>> {1, 2, 3, 4}                # Set literals: new in 3.0
{1, 2, 3, 4}
>>> S = {'s', 'p', 'a', 'm'}
>>> S.add('alot')
>>> S
{'a', 'p', 's', 'm', 'alot'}
All the set processing operations discussed in the prior section work the same in 3.0, but the result sets print differently:
>>> S1 = {1, 2, 3, 4}
>>> S1 & {1, 3}                 # Intersection
{1, 3}
>>> {1, 5, 3, 6} | S1           # Union
{1, 2, 3, 4, 5, 6}
>>> S1 - {1, 3, 4}              # Difference
{2}
>>> S1 > {1, 3}                 # Superset
True
Note that {} is still a dictionary in Python. Empty sets must be created with the set built-in, and print the same way:
>>> S1 - {1, 2, 3, 4}           # Empty sets print differently
set()
>>> type({})                    # Because {} is an empty dictionary
<class 'dict'>
>>> S = set()                   # Initialize an empty set
>>> S.add(1.23)
>>> S
{1.23}
As in Python 2.6, sets created with 3.0 literals support the same methods, some of which allow general iterable operands that expressions do not:
>>> {1, 2, 3} | {3, 4}
{1, 2, 3, 4}
>>> {1, 2, 3} | [3, 4]
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> {1, 2, 3}.union([3, 4])
{1, 2, 3, 4}
>>> {1, 2, 3}.union({3, 4})
{1, 2, 3, 4}
>>> {1, 2, 3}.union(set([3, 4]))
{1, 2, 3, 4}
>>> {1, 2, 3}.intersection((1, 3, 5))
{1, 3}
>>> {1, 2, 3}.issubset(range(-5, 5))
True
Sets are powerful and flexible objects, but they do have one constraint in both 3.0 and 2.6 that you should keep in mind: largely because of their implementation, sets can only contain immutable (a.k.a. “hashable”) object types. Hence, lists and dictionaries cannot be embedded in sets, but tuples can if you need to store compound values. Tuples compare by their full values when used in set operations:
>>> S
{1.23}
>>> S.add([1, 2, 3])            # Only immutable objects work in a set
TypeError: unhashable type: 'list'
>>> S.add({'a':1})
TypeError: unhashable type: 'dict'
>>> S.add((1, 2, 3))
>>> S                           # No list or dict, but tuple okay
{1.23, (1, 2, 3)}
>>> S | {(4, 5, 6), (1, 2, 3)}  # Union: same as S.union(...)
{1.23, (4, 5, 6), (1, 2, 3)}
>>> (1, 2, 3) in S              # Membership: by complete values
True
>>> (1, 4, 3) in S
False
Tuples in a set, for instance, might be used to represent dates, records, IP addresses, and so on (more on tuples later in this part of the book). Sets themselves are mutable too, and so cannot be nested in other sets directly; if you need to store a set inside another set, the frozenset built-in call works just like set but creates an immutable set that cannot change and thus can be embedded in other sets.
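A brief sketch of this nesting rule, assuming a Python with 3.0-style set literals:

```python
inner = frozenset([1, 2, 3])     # immutable, hence hashable
outer = {inner, frozenset([4, 5])}

print(inner in outer)                    # True
print(frozenset([3, 2, 1]) in outer)     # True: compared by value

try:
    {set([1, 2, 3])}                     # a mutable set is unhashable
except TypeError as exc:
    print('TypeError:', exc)             # TypeError: unhashable type: 'set'
```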
In addition to literals, 3.0 introduces a set comprehension construct; it is similar in form to the list comprehension we previewed in Chapter 4, but is coded in curly braces instead of square brackets and run to make a set instead of a list. Set comprehensions run a loop and collect the result of an expression on each iteration; a loop variable gives access to the current iteration value for use in the collection expression. The result is a new set created by running the code, with all the normal set behavior:
>>> {x ** 2 for x in [1, 2, 3, 4]}         # 3.0 set comprehension
{16, 1, 4, 9}
In this expression, the loop is coded on the right, and the collection expression is coded on the left (x ** 2). As for list comprehensions, we get back pretty much what this expression says: “Give me a new set containing X squared, for every X in a list.” Comprehensions can also iterate across other kinds of objects, such as strings (the first of the following examples illustrates the comprehension-based way to make a set from an existing iterable):
>>> {x for x in 'spam'}                    # Same as: set('spam')
{'a', 'p', 's', 'm'}
>>> {c * 4 for c in 'spam'}                # Set of collected expression results
{'ssss', 'aaaa', 'pppp', 'mmmm'}
>>> {c * 4 for c in 'spamham'}
{'ssss', 'aaaa', 'hhhh', 'pppp', 'mmmm'}
>>> S = {c * 4 for c in 'spam'}
>>> S | {'mmmm', 'xxxx'}
{'ssss', 'aaaa', 'pppp', 'mmmm', 'xxxx'}
>>> S & {'mmmm', 'xxxx'}
{'mmmm'}
Because the rest of the comprehensions story relies upon underlying concepts we’re not yet prepared to address, we’ll postpone further details until later in this book. In Chapter 8, we’ll meet a first cousin in 3.0, the dictionary comprehension, and I’ll have much more to say about all comprehensions (list, set, dictionary, and generator) later, especially in Chapters 14 and 20. As we’ll learn later, all comprehensions, including sets, support additional syntax not shown here, including nested loops and if tests, which can be difficult to understand until you’ve had a chance to study larger statements.
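As a small preview only (the full story still waits for later chapters), here is a sketch of the if test and nested loop forms in a set comprehension. The results are sorted for display because sets have no inherent order:

```python
# An if test filters the items fed to the collection expression
even_squares = {x ** 2 for x in range(10) if x % 2 == 0}
print(sorted(even_squares))          # [0, 4, 16, 36, 64]

# Nested for loops visit every combination; duplicates collapse
sums = {a + b for a in (1, 2, 3) for b in (10, 20)}
print(sorted(sums))                  # [11, 12, 13, 21, 22, 23]
```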
Set operations have a variety of common uses, some more practical than mathematical. For example, because items are stored only once in a set, sets can be used to filter duplicates out of other collections. Simply convert the collection to a set, and then convert it back again (because sets are iterable, they work in the list call here):
>>> L = [1, 2, 1, 3, 2, 4, 5]
>>> set(L)
{1, 2, 3, 4, 5}
>>> L = list(set(L))                       # Remove duplicates
>>> L
[1, 2, 3, 4, 5]
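One caveat worth noting: sets are unordered, so a round trip through set is not guaranteed to keep a list’s original order. The sketch below shows two common workarounds. It assumes a newer Python (3.7 or later, where dictionaries keep insertion order), which is not the 2.6/3.0 target of this chapter, and the variable names are illustrative only:

```python
L = [3, 1, 3, 2, 1, 5, 2]

unordered = list(set(L))            # Duplicates removed; result order is not guaranteed
ordered = list(dict.fromkeys(L))    # Duplicates removed; first-seen order kept (3.7+ assumed)

print(sorted(unordered))            # Sort when a predictable order is all you need
print(ordered)                      # [3, 1, 2, 5]
```

If order does not matter, the plain set conversion is the simpler choice.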
Sets can also be used to keep track of where you’ve already been when traversing a graph or other cyclic structure. For example, the transitive module reloader and inheritance tree lister examples we’ll study in Chapters 24 and 30, respectively, must keep track of items visited to avoid loops. Although recording states visited as keys in a dictionary is efficient, sets offer an alternative that’s essentially equivalent (and may be more or less intuitive, depending on who you ask).
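As a small preview of the idea, here is a minimal sketch of a traversal that uses a set to remember where it has been. It is not the reloader or tree lister code from later chapters; the graph dictionary and the reachable function are hypothetical, and it uses a function definition, a statement we won’t study until later in the book:

```python
# Hypothetical graph: each key maps to the nodes it links to.
# Note the cycle: A -> B -> C -> A.
graph = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A'], 'D': []}

def reachable(start):
    visited = set()                 # Nodes already seen
    pending = [start]               # Nodes still to be processed
    while pending:
        node = pending.pop()
        if node in visited:         # Been here before: skip it to avoid looping forever
            continue
        visited.add(node)
        pending.extend(graph[node])
    return visited

print(reachable('A') == {'A', 'B', 'C'})
```

Without the visited test, the cycle in this graph would keep the loop running indefinitely.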
Finally, sets are also convenient when dealing with large data sets (database query results, for example): the intersection of two sets contains objects in common to both categories, and the union contains all items in either set. To illustrate, here’s a somewhat more realistic example of set operations at work, applied to lists of people in a hypothetical company, using 3.0 set literals (use the set call in 2.6):
>>> engineers = {'bob', 'sue', 'ann', 'vic'}
>>> managers = {'tom', 'sue'}
>>> 'bob' in engineers                     # Is bob an engineer?
True
>>> engineers & managers                   # Who is both engineer and manager?
{'sue'}
>>> engineers | managers                   # All people in either category
{'vic', 'sue', 'tom', 'bob', 'ann'}
>>> engineers - managers                   # Engineers who are not managers
{'vic', 'bob', 'ann'}
>>> managers - engineers                   # Managers who are not engineers
{'tom'}
>>> engineers > managers                   # Are all managers engineers? (superset)
False
>>> {'bob', 'sue'} < engineers             # Are both engineers? (subset)
True
>>> (managers | engineers) > managers      # All people is a superset of managers
True
>>> managers ^ engineers                   # Who is in one but not both?
{'vic', 'bob', 'ann', 'tom'}
>>> (managers | engineers) - (managers ^ engineers)     # Intersection!
{'sue'}
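When one side of the operation is not yet a set (a list of query results, say), the method-based forms of these operations can be handy, because they accept any iterable where the operators require sets on both sides. A minimal sketch, reusing the company data with managers recoded as a list for illustration:

```python
engineers = {'bob', 'sue', 'ann', 'vic'}
managers = ['tom', 'sue']                               # A list this time, not a set

both = engineers.intersection(managers)                 # Like &, but accepts any iterable
everyone = engineers.union(managers)                    # Like |
only_engineers = engineers.difference(managers)         # Like -
managers_all_engineers = engineers.issuperset(managers) # Like >=

print(both, managers_all_engineers)
```

Here, engineers & managers would raise a TypeError because managers is a list; the method call sidesteps the conversion.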
You can find more details on set operations in the Python library manual and some mathematical and relational database theory texts. Also stay tuned for Chapter 8’s revival of some of the set operations we’ve seen here, in the context of dictionary view objects in Python 3.0.
Some argue that the Python Boolean type, bool, is numeric in nature because its two values, True and False, are just customized versions of the integers 1 and 0 that print themselves differently. Although that’s all most programmers need to know, let’s explore this type in a bit more detail.
More formally, Python today has an explicit Boolean data type called bool, with the values True and False available as new preassigned built-in names. Internally, the names True and False are instances of bool, which is in turn just a subclass (in the object-oriented sense) of the built-in integer type int. True and False behave exactly like the integers 1 and 0, except that they have customized printing logic: they print themselves as the words True and False, instead of the digits 1 and 0. bool accomplishes this by redefining str and repr string formats for its two objects.
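A few quick checks make this relationship concrete. This is just a sketch that uses standard built-ins to peek at what was described above:

```python
print(issubclass(bool, int))        # bool is a subclass of int
print(int(True), int(False))        # The underlying integer values: 1 0
print(repr(True), str(False))       # The customized display strings: True False
print(repr(int(True)))              # Converted to a plain int, it displays as 1
```

In other words, the only visible difference between True and 1 is the type and the text used to display it.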
Because of this customization, the output of Boolean expressions typed at the interactive prompt prints as the words True and False instead of the older and less obvious 1 and 0. In addition, Booleans make truth values more explicit. For instance, an infinite loop can now be coded as while True: instead of the less intuitive while 1:. Similarly, flags can be initialized more clearly with flag = False. We’ll discuss these statements further in Part III.
Again, though, for all other practical purposes, you can treat True and False as though they are predefined variables set to integer 1 and 0. Most programmers used to preassign True and False to 1 and 0 anyway; the bool type simply makes this standard. Its implementation can lead to curious results, though. Because True is just the integer 1 with a custom display format, True + 4 yields 5 in Python:
>>> type(True)
<class 'bool'>
>>> isinstance(True, int)
True
>>> True == 1                              # Same value
True
>>> True is 1                              # But different object: see the next chapter
False
>>> True or False                          # Same as: 1 or 0
True
>>> True + 4                               # (Hmmm)
5
Since you probably won’t come across an expression like the last of these in real Python code, you can safely ignore its deeper metaphysical implications....
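The integer nature of Booleans is occasionally put to work in a less contrived way, though: because True counts as 1 and False as 0, summing a collection of test results counts how many were true. A minimal sketch with made-up data (values and flags are illustrative names):

```python
values = [3, 8, 1, 9, 4]
flags = [v > 3 for v in values]     # One True or False per item
count = sum(flags)                  # True adds 1, False adds 0

print(flags)                        # [False, True, False, True, True]
print(count)                        # 3
```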
We’ll revisit Booleans in Chapter 9 (to define Python’s notion of truth) and again in Chapter 12 (to see how Boolean operators like and and or work).
Finally, although Python core numeric types offer plenty of power for most applications, there is a large library of third-party open source extensions available to address more focused needs. Because numeric programming is a popular domain for Python, you’ll find a wealth of advanced tools.
For example, if you need to do serious number crunching, an optional extension for Python called NumPy (Numeric Python) provides advanced numeric programming tools, such as a matrix data type, vector processing, and sophisticated computation libraries. Hardcore scientific programming groups at places like Los Alamos and NASA use Python with NumPy to implement the sorts of tasks they previously coded in C++, FORTRAN, or Matlab. The combination of Python and NumPy is often compared to a free, more flexible version of Matlab—you get NumPy’s performance, plus the Python language and its libraries.
Because it’s so advanced, we won’t talk further about NumPy in this book. You can find additional support for advanced numeric programming in Python, including graphics and plotting tools, statistics libraries, and the popular SciPy package at Python’s PyPI site, or by searching the Web. Also note that NumPy is currently an optional extension; it doesn’t come with Python and must be installed separately.
This chapter has taken a tour of Python’s numeric object types and the operations we can apply to them. Along the way, we met the standard integer and floating-point types, as well as some more exotic and less commonly used types such as complex numbers, fractions, and sets. We also explored Python’s expression syntax, type conversions, bitwise operations, and various literal forms for coding numbers in scripts.
Later in this part of the book, I’ll fill in some details about the next object type, the string. In the next chapter, however, we’ll take some time to explore the mechanics of variable assignment in more detail than we have here. This turns out to be perhaps the most fundamental idea in Python, so make sure you check out the next chapter before moving on. First, though, it’s time to take the usual chapter quiz.
What is the value of the expression 2 * (3 + 4) in Python?
What is the value of the expression 2 * 3 + 4 in Python?
What is the value of the expression 2 + 3 * 4 in Python?
What tools can you use to find a number’s square root, as well as its square?
What is the type of the result of the expression 1 + 2.0 + 3?
How can you truncate and round a floating-point number?
How can you convert an integer to a floating-point number?
How would you display an integer in octal, hexadecimal, or binary notation?
How might you convert an octal, hexadecimal, or binary string to a plain integer?
The value will be 14, the result of 2 * 7, because the parentheses force the addition to happen before the multiplication.
The value will be 10, the result of 6 + 4. Python’s operator precedence rules are applied in the absence of parentheses, and multiplication has higher precedence than (i.e., happens before) addition, per Table 5-2.
This expression yields 14, the result of 2 + 12, for the same precedence reasons as in the prior question.
Functions for obtaining the square root, as well as pi, tangents, and more, are available in the imported math module. To find a number’s square root, import math and call math.sqrt(N). To get a number’s square, use either the exponent expression X ** 2 or the built-in function pow(X, 2). Either of these last two can also compute the square root when given a power of 0.5 (e.g., X ** .5).
The result will be a floating-point number: the integers are converted up to floating point, the most complex type in the expression, and floating-point math is used to evaluate it.
The int(N) and math.trunc(N) functions truncate, and the round(N, digits) function rounds. We can also compute the floor with math.floor(N) and round for display with string formatting operations.
The float(I) function converts an integer to a floating point; mixing an integer with a floating point within an expression will result in a conversion as well. In some sense, Python 3.0 / division converts too: it always returns a floating-point result that includes the remainder, even if both operands are integers.
The oct(I) and hex(I) built-in functions return the octal and hexadecimal string forms for an integer. The bin(I) call also returns a number’s binary digits string in Python 2.6 and 3.0. The % string formatting expression and format string method also provide targets for some such conversions.
The int(S, base) function can be used to convert from octal, hexadecimal, and binary strings to normal integers (pass in 8, 16, or 2 for the base). The eval(S) function can be used for this purpose too, but it’s more expensive to run and can have security issues. Note that integers are always stored in binary in computer memory; these are just display string format conversions.
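If you’d like to check these answers for yourself, the following sketch runs each of them as code. It assumes Python 3.X (in 2.6, the / result and the octal display differ), and the sample values are arbitrary:

```python
import math

print(2 * (3 + 4), 2 * 3 + 4, 2 + 3 * 4)           # Precedence: 14 10 14
print(math.sqrt(144), 144 ** .5, pow(144, .5))     # Three ways to get a square root
print(12 ** 2, pow(12, 2))                         # Two ways to get a square
print(type(1 + 2.0 + 3))                           # Mixed types convert up to float
print(int(2.567), math.trunc(2.567))               # Truncate: drop the fraction
print(round(2.567, 2), math.floor(-2.567))         # Round to 2 digits; floor goes down
print(float(3), 10 / 4)                            # int to float; 3.X true division
print(oct(64), hex(64), bin(64))                   # Display strings for an integer
print('%o %x' % (64, 64), '{0:b}'.format(64))      # Formatting gives the digits only
print(int('100', 8), int('40', 16), int('1000000', 2))   # Strings back to integers
```

The last line prints 64 three times: the same integer, recovered from three different string forms.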
[15] If you’re working along, you don’t need to type any of the comment text from the # through to the end of the line; comments are simply ignored by Python and not required parts of the statements we’re running.