This book describes mainly the language defined by Python version 2.5. Python version 3.0 (and its companion “transition” release, 2.6) isn’t all that different. Most things work just as they did before, but the language cleanups introduced mean that some existing code will break.
If you’re transitioning from older code to Python 3.0, a couple of tools can come in quite handy. First, Python 2.6 comes with optional warnings about 3.0 incompatibilities (run Python with the -3 flag). If you first make sure your code runs without errors in 2.6 (which is largely backward-compatible), you can refactor away any incompatibility warnings. (Needless to say, you should have solid unit tests in place before you do this; see Chapter 16 for more advice on testing.) Second, Python 3.0 ships with an automatic refactoring tool called 2to3, which can automatically upgrade your source files. (Be sure to back up or check in your files before performing any large-scale transformations.) If you wish to have both 2.6 and 3.0 code available, you could keep working on the 2.6 code (with the proper warnings turned on) and generate 3.0 code when it’s time for a release.
Throughout the book, you’ll find notes about things that change in Python 3.0. This appendix gives a more comprehensive set of pointers for moving to the world of 3.0. I’ll describe some of the more noticeable changes, but not everything that is new in Python 3.0. There are many changes, both major and minor. Table D-1 (which is based on the document What’s New in Python 3.0?, by Guido van Rossum), at the end of this appendix, lists quite a few more changes and also refers to relevant PEP documents, when applicable (available from http://python.org/dev/peps). Table D-2 lists some other sources of further information.
The following sections deal with new features related to text. Strings are no longer simply byte sequences (although such sequences are still available), the input/print pair has been revamped slightly, and string formatting has had a major facelift.
The distinction between text and byte sequences is significantly cleaned up in Python 3.0. Strings in previous versions were based on the somewhat outmoded (yet still prevalent) notion that text characters can easily be represented as single bytes. While this is true for English and most western languages, it fails to account for ideographic scripts, such as Chinese.
The Unicode standard was created to encompass all written languages, and it admits about 100,000 different characters, each of which has a unique numeric code. In Python 3.0, str is, in fact, the unicode type from earlier versions: a sequence of Unicode characters. As there is no unique way of encoding these into byte sequences (which you need to do in order to perform disk I/O, for example), you must supply an encoding (with UTF-8 as the default in most cases). So, text files are now assumed to be encoded versions of Unicode, rather than simply arbitrary sequences of bytes. (Binary files are still just byte sequences, though.) As a consequence, constants such as string.letters have been given the prefix ascii_ (for example, string.ascii_letters) to make the link to a specific encoding clear.
To avoid losing the old functionality of the previous str class, there is a new class called bytes, which represents immutable sequences of bytes (as well as bytearray, its mutable sibling).
There is little reason to single out console printing to the degree that it has its own statement. Therefore, the print statement has been changed into a function. It still works in a manner very similar to the original statement (for example, you can print several arguments by separating them with commas), but the stream redirection functionality is now a keyword argument. In other words, instead of writing this:
print >> sys.stderr, "fatal error:", error
you would write this:
print("fatal error:", error, file=sys.stderr)
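The print function takes other useful keyword arguments as well, such as sep and end. The following sketch redirects output to an io.StringIO buffer (rather than sys.stderr) simply so the result is easy to inspect; the error message is, of course, made up:

```python
import io

# Collect output in a StringIO buffer instead of a real stream, so the
# redirection is easy to inspect; sys.stderr would work the same way.
buf = io.StringIO()
print("fatal error:", "disk full", file=buf)
print("a", "b", "c", sep=", ", end=".\n", file=buf)  # sep and end are keyword arguments too
output = buf.getvalue()
```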
Also, the behavior of the original input no longer has its own function: the name input is now used for what used to be raw_input, and you need to explicitly write eval(input()) to get the old behavior.
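The following sketch shows the idea; the string reply stands in for what a user would have typed at an input() prompt:

```python
# Hypothetical user reply; in a real program this string would come from input()
reply = "2 + 2"
value = eval(reply)  # eval(input()) reproduces the old input() behavior
```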
Strings now have a new method, called format, which allows you to perform rather advanced string formatting. The fields in the string where values are to be spliced in are enclosed in braces, rather than prefaced with a % (and literal braces are escaped by doubling them). The replacement fields refer to the arguments of the format method, either by number (for positional arguments) or by name (for keyword arguments):
>>> "{0}, {1}, {x}".format("a", 1, x=42)
'a, 1, 42'
In addition, the replacement fields can access attributes and elements of the values to be replaced, as in "{foo.bar}" or "{foo[bar]}", and can be modified by format specifiers similar to those in the current system. This new mechanism is quite flexible, and because it allows classes to specify their own format string behavior (through the magic __format__ method), you will be able to write much more elegant output formatting code.
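Here is a small sketch of how __format__ hooks into this; the Temperature class and its "f" specifier are invented for illustration:

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Treat the format specifier as a unit choice: "f" for Fahrenheit,
        # anything else (including the empty string) as Celsius
        if spec == "f":
            return "{0:.1f}F".format(self.degrees * 9 / 5 + 32)
        return "{0:.1f}C".format(self.degrees)

# "{0}" passes an empty specifier, "{0:f}" passes "f", both to __format__
boiling = "{0} is {0:f}".format(Temperature(100))
```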
Although none of the changes are quite as fundamental as the introduction of new-style classes, Python 3 has some goodies in store in the abstraction department: functions can now be annotated with information about parameters and return values, there is a framework for abstract base classes, metaclasses have a more convenient syntax, and you can have keyword-only parameters and nonlocal (but not global) variables.
The new function annotation system is something of a wildcard. It allows you to annotate the arguments and the return type of a function (or method) with the values of arbitrary expressions, and then to retrieve these values later. However, what this system is to be used for is not specified. It is motivated by several practical applications (such as more fine-grained docstring functionality, type specifications and checking, generic functions, and more), but you can basically use it for anything you like.
A function is annotated as follows:
def frozzbozz(x: foo, y: bar = 42) -> baz:
    pass
Here, foo, bar, and baz are annotations for the positional argument x, the keyword argument y, and the return value of frozzbozz, respectively. These can be retrieved from the dictionary frozzbozz.__annotations__, with the parameter names (or "return" for the return value) as keys.
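The following runnable sketch uses strings as the annotation expressions (so that foo, bar, and baz need not actually be defined):

```python
# String annotations stand in for the arbitrary expressions foo, bar, and baz
def frozzbozz(x: "foo", y: "bar" = 42) -> "baz":
    pass

# The annotations end up in a dictionary on the function object
ann = frozzbozz.__annotations__
```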
Sometimes you might want to implement only parts of a class. For example, you may have functionality that is to be shared among several classes, so you put it in a superclass. However, the superclass isn’t really complete and shouldn’t be instantiated by itself—it’s only there for others to inherit. This is called an abstract base class (or simply an abstract class). It’s quite common for such abstract classes to define nonfunctional methods that the subclasses need to override. In this way, the base class also acts as an interface definition.
You can certainly simulate this with older Python versions (for example, by raising NotImplementedError), but now there is a more complete framework for abstract base classes. This framework includes a new metaclass (ABCMeta) and the decorators @abstractmethod and @abstractproperty for defining abstract (that is, unimplemented) methods and properties, respectively. There’s also a separate module (abc) that serves as a “support framework” for abstract base classes.
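A minimal sketch of the framework in use; the Worker and Builder classes are invented for illustration:

```python
from abc import ABCMeta, abstractmethod

class Worker(metaclass=ABCMeta):
    @abstractmethod
    def do_work(self):
        pass

class Builder(Worker):
    def do_work(self):   # overriding the abstract method makes the class concrete
        return "building"

b = Builder()            # fine: all abstract methods are implemented
try:
    Worker()             # the abstract class itself cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False
```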
Class decorators work in a manner similar to function decorators. Simply put, instead of the following:

class A:
    pass
A = foo(A)

you can now write this:

@foo
class A:
    pass
In other words, this lets you do some processing on the newly created class object. In fact, it may let you do many of the things you might have used a metaclass for in the past. But if you do need a metaclass, there is even a new syntax for those. Instead of this:
class A:
    __metaclass__ = foo

you can now write this:

class A(metaclass=foo):
    pass
For more information about class decorators, see PEP 3129 (http://python.org/dev/peps/pep-3129), and for more on the new metaclass syntax, see PEP 3115 (http://python.org/dev/peps/pep-3115).
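A small sketch of a class decorator at work; the register function and the registry list are made up for the example:

```python
registry = []

def register(cls):
    # The decorator receives the newly created class object; here it
    # just records the class name and returns the class unchanged
    registry.append(cls.__name__)
    return cls

@register
class A:
    pass
```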
It’s now possible to define parameters that must be supplied as keywords (if at all). In previous versions, any keyword parameter could also be supplied as a positional parameter, unless you used a function definition such as def foo(**kwds): and processed the kwds dictionary yourself. If a keyword argument was required, you needed to raise an exception explicitly when it was missing.
The new functionality is simple, logical, and elegant. You can now put parameters after a varargs argument:
def foo(*args, my_param=42): ...
The parameter my_param will never be filled by a positional argument, as they are all eaten by args. If it is to be supplied, it must be supplied as a keyword argument. Interestingly, you do not even need to give these keyword-only parameters a default. If you don’t, they become required keyword-only parameters (that is, not supplying them would be an error). If you don’t want the varargs argument (args), you could use the new syntactic form, where the varargs operator (*) is used without a variable:

def foo(x, y, *, z): ...
Here, x and y are required positional parameters, and z is a required keyword-only parameter.
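The following sketch shows both sides of this behavior:

```python
def foo(x, y, *, z):
    return x + y + z

ok = foo(1, 2, z=3)        # z supplied by keyword: allowed
try:
    foo(1, 2, 3)           # z supplied positionally: TypeError
    positional_ok = True
except TypeError:
    positional_ok = False
```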
When nested (static) scopes were introduced in Python, they were read-only, and they have been ever since; that is, you can access the local variables of outer scopes, but you can’t rebind them. There’s a special case for the global scope, of course: if you declare a variable to be global (with the global keyword), you can rebind it globally. Now you can do the same for outer, non-global scopes, using the nonlocal keyword.
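A classic illustration is a counter closure, where the inner function rebinds a variable in the enclosing scope; the names here are invented for the example:

```python
def make_counter():
    count = 0
    def step():
        nonlocal count   # rebind count in the enclosing (non-global) scope
        count += 1
        return count
    return step

counter = make_counter()
first = counter()
second = counter()
```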
Some other new features include being able to collect excess elements when unpacking iterables, constructing dictionaries and sets in a manner similar to list comprehension, and creating dynamically updatable views of a dictionary. The use of iterable objects has also extended to the return values of several built-in functions.
Iterable unpacking (such as x, y, z = iterable) previously required that you know the exact number of items in the iterable object being unpacked. Now you can use the * operator, just as for parameters, to gather up extra items as a list. The operator can be used on any one of the variables on the left-hand side of the assignment, and that variable will gather up any items that are left over when the other variables have received theirs:
>>> a, *b, c, d = [1, 2, 3, 4, 5]
>>> a, b, c, d
(1, [2, 3], 4, 5)
It is now possible to construct dictionaries and sets using virtually the same comprehension syntax as for list comprehensions and generator expressions:
>>> {i:i for i in range(5)}
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
>>> {i for i in range(10)}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
The last result also demonstrates the new syntax for sets (see the section “Some Minor Issues,” later in this appendix).
You can now access different views on dictionaries. These views are collection-like objects that change automatically to reflect updates to the dictionary itself. The views returned by dict.keys and dict.items are set-like and cannot include duplicates, while the views returned by dict.values can. The set-like views permit set operations.
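The following sketch demonstrates both properties, with two made-up dictionaries:

```python
d = {"a": 1, "b": 2}
e = {"b": 2, "c": 3}

common = d.keys() & e.keys()   # set intersection on two key views
d["d"] = 4                     # the existing view reflects this update automatically
keys_now = sorted(d.keys())
```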
Several functions and methods that used to return lists now return lazier iterable objects instead. Examples include range, zip, map, and filter.
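If you really need a list (for indexing, say, or repeated traversal), you can always pass such an object to list:

```python
r = range(3)                       # a range object, not a list
squares = map(lambda i: i * i, r)  # lazy: nothing is computed yet
as_list = list(squares)            # forces evaluation when a list is needed
```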
Some functions simply disappear in Python 3.0. For example, you can no longer use apply. Then again, with the * and ** operators for argument splicing, you don’t really need it. Another notable example is callable. With it gone, you have two main options for finding out whether an object is callable: you can check whether it has the magic method __call__, or you can simply try to call it (using try/except). Other examples include execfile (use exec instead), reload (use imp.reload instead), reduce (it’s now in the functools module), coerce (not needed with the new numeric type hierarchy), and file (use open to open files).
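Here is a sketch of the two replacements just mentioned; the add and is_callable functions are invented for the example:

```python
# Argument splicing with * and ** replaces apply(add, args, kwds)
def add(a, b, c=0):
    return a + b + c

total = add(*(1, 2), **{"c": 3})

# Checking for the __call__ magic method replaces the old callable()
def is_callable(obj):
    return hasattr(obj, "__call__")
```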
The following are some minor issues that might trip you up:

- The inequality operator <> is no longer allowed. You should write != instead (which is common practice already).
- Backquotes (the alternative syntax for repr) are no longer allowed. You should use repr instead.
- The comparison operators (<, <=, and the like) won’t allow you to compare incompatible types. For example, you can no longer check whether 4 is greater than "5" (this is consistent with the existing rules for addition).
- There is a new literal syntax for sets: {1, 2, 3} is the same as set([1, 2, 3]). However, {} is still an empty dictionary. Use set() to get an empty set.
- Integer division has changed: 1/2 will give you 0.5, not 0. For integer division, use 1//2. Because this is a “silent error” (you won’t get any error messages if you try to use / for integer division), it can be insidious.

The standard library is reorganized quite a bit in Python 3.0. A thorough discussion can be found in PEP 3108 (http://www.python.org/dev/peps/pep-3108). Here are some examples:
- Several modules have been removed, including duplicates (such as mimetools and md5), platform-specific ones (for IRIX, Mac OS, and Solaris), and some that are hardly used (such as mutex) or obsolete (such as bsddb185). Important functionality is generally preserved through other modules.
- Several modules have been renamed, partly to conform to the PEP 8 naming conventions (http://www.python.org/dev/peps/pep-0008), among other things. For example, copy_reg is now copyreg, ConfigParser is configparser, cStringIO is dropped, and StringIO is added to the io module.
- Related modules (such as httplib, BaseHTTPServer, and Cookie) are now collected in the new http package (as http.client, http.server, and http.cookies).

The idea behind these changes is, of course, to tidy things up a bit.
As I mentioned at the beginning of this appendix, Python 3.0 has a lot of new features. Table D-1 lists many of them, including some I haven’t discussed in this appendix. If there’s something specific that’s tripping you up, you might want to take a look at the official documentation or play around with the help
function. See also Table D-2 for some sources of further information.
Name | URL |
Python v3.0 Documentation | http://docs.python.org/dev/3.0 |
What’s New in Python 3.0? | http://docs.python.org/dev/3.0/whatsnew/3.0.html |
PEP 3000: Python 3000 | http://www.python.org/dev/peps/pep-3000 |
Python 3000 and You | http://www.artima.com/weblogs/viewpost.jsp?thread=227041 |