We’ll dig into more class syntax details in the next chapter. Before we do, though, I’d like to show you a more realistic example of classes in action, one that’s more practical than what we’ve seen so far. In this chapter, we’re going to build a set of classes that does something more concrete: recording and processing information about people. As you’ll see, what we call instances and classes in Python programming can often serve the same roles as records and programs in more traditional terms.
Specifically, in this chapter we’re going to code two classes:

Person—a class that creates and processes information about people

Manager—a customization of Person that modifies inherited behavior
Along the way, we’ll make instances of both classes and test out their functionality. When we’re done, I’ll show you a nice example use case for classes—we’ll store our instances in a shelve object-oriented database, to make them permanent. That way, you can use this code as a template for fleshing out a full-blown personal database written entirely in Python.
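To give a flavor of where we’re headed, here is a minimal sketch of the shelve idea (the database filename persondb and this stripped-down Person class are placeholders of my own; the chapter develops the real versions step by step):

```python
import shelve

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

bob = Person('Bob Smith')

db = shelve.open('persondb')       # creates a small database file on disk
db['Bob Smith'] = bob              # store the instance under a string key
db.close()

db = shelve.open('persondb')       # reopen later: the object was persisted
restored = db['Bob Smith']         # fetched and re-created automatically
print(restored.name, restored.pay)
db.close()
```

Instances are pickled and unpickled for us behind the scenes; we never write any file-format code ourselves.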
Besides actual utility, though, our aim here is also educational: this chapter provides a tutorial on object-oriented programming in Python. Often, people grasp the last chapter’s class syntax on paper, but have trouble seeing how to get started when confronted with having to code a new class from scratch. Toward this end, we’ll take it one step at a time here, to help you learn the basics; we’ll build up the classes gradually, so you can see how their features come together in complete programs.
In the end, our classes will still be relatively small in terms of code, but they will demonstrate all of the main ideas in Python’s OOP model. Despite its syntax details, Python’s class system really is largely just a matter of searching for an attribute in a tree of objects, along with a special first argument for functions.
OK, so much for the design phase—let’s move on to implementation.
Our first task is to start coding the main class, Person
. In your favorite text editor, open a
new file for the code we’ll be writing. It’s a fairly strong
convention in Python to begin module names with a lowercase letter and class names with an uppercase letter; like the name of
self
arguments in methods, this is
not required by the language, but it’s so common that deviating might
be confusing to people who later read your code. To conform, we’ll
call our new module file person.py and our class within it Person
, like this:
# File person.py (start)
class Person:
All our work will be done in this file until later in this
chapter. We can code any number of functions and classes in a single
module file in Python, and this one’s person.py name might not make much sense if
we add unrelated components to it later. For now, we’ll assume
everything in it will be Person
-related. It probably should be
anyhow—as we’ve learned, modules tend to work best when they have a
single, cohesive purpose.
Now, the first thing we want to do with our Person
class is record basic information
about people—to fill out record fields, if you will. Of course,
these are known as instance object attributes in Python-speak, and they
generally are created by assignment to self
attributes in class method functions.
The normal way to give instance attributes their first
values is to assign them to self
in the __init__
constructor method, which contains code run
automatically by Python each time an instance is created. Let’s add
one to our class:
# Add record field initialization
class Person:
    def __init__(self, name, job, pay):      # Constructor takes 3 arguments
        self.name = name                     # Fill out fields when created
        self.job = job                       # self is the new instance object
        self.pay = pay
This is a very common coding pattern: we pass in the data to
be attached to an instance as arguments to the constructor method
and assign them to self
to retain
them permanently. In OO terms, self
is the newly created instance object,
and name
, job
, and pay
become state information—descriptive data saved on
an object for later use. Although other techniques (such as
enclosing scope references) can save details, too, instance attributes make this very
explicit and easy to understand.
Notice that the argument names appear
twice here. This code might seem a bit
redundant at first, but it’s not. The job
argument, for example, is a local
variable in the scope of the __init__
function, but self.job
is an attribute of the instance
that’s the implied subject of the method call. They are two
different variables, which happen to have the same name. By
assigning the job
local to the
self.job
attribute with self.job=job
, we save the passed-in
job
on the instance for later
use. As usual in Python, where a name is assigned (or what object it
is assigned to) determines what it means.
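A throwaway sketch can make this concrete (this Record class is illustrative only, not part of person.py): the argument and the self attribute really are two separate variables.

```python
class Record:
    def __init__(self, job):
        self.job = job          # attribute attached to the instance
        job = 'rebound'         # rebinds only the local variable
        self.last_local = job   # save the local's final value to compare

rec = Record('dev')
print(rec.job)                  # instance attribute kept the passed-in value
print(rec.last_local)           # the local name was independently rebound
```

Rebinding the local job has no effect on self.job, because assignment in Python changes a name's binding only in the scope where the assignment occurs.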
Speaking of arguments, there’s really nothing magical about
__init__
, apart from the fact
that it’s called automatically when an instance is made and has a
special first argument. Despite its weird name, it’s a normal
function and supports all the features of functions we’ve already
covered. We can, for example, provide defaults
for some of its arguments, so they need not be provided in cases
where their values aren’t available or useful.
To demonstrate, let’s make the job
argument optional—it will default to
None
, meaning the person being
created is not (currently) employed. If job
defaults to None
, we’ll probably want to default
pay
to 0
, too, for consistency (unless some of
the people you know manage to get paid without having jobs!). In
fact, we have to specify a default for pay
because according to Python’s syntax
rules, any arguments in a function’s header after the first default
must all have defaults, too:
# Add defaults for constructor arguments
class Person:
    def __init__(self, name, job=None, pay=0):    # Normal function args
        self.name = name
        self.job = job
        self.pay = pay
What this code means is that we’ll need to pass in a name when
making Person
s, but job
and pay
are now optional; they’ll default to
None
and 0
if omitted. The self
argument, as usual, is filled in by
Python automatically to refer to the instance object—assigning
values to attributes of self
attaches them to the new instance.
This class doesn’t do much yet—it essentially just fills out the fields of a new record—but it’s a real working class. At this point we could add more code to it for more features, but we won’t do that yet. As you’ve probably begun to appreciate already, programming in Python is really a matter of incremental prototyping—you write some code, test it, write more code, test again, and so on. Because Python provides both an interactive session and nearly immediate turnaround after code changes, it’s more natural to test as you go than to write a huge amount of code to test all at once.
Before adding more features, then, let’s test what we’ve got so far by making a few instances of our class and displaying their attributes as created by the constructor. We could do this interactively, but as you’ve also probably surmised by now, interactive testing has its limits—it gets tedious to have to reimport modules and retype test cases each time you start a new testing session. More commonly, Python programmers use the interactive prompt for simple one-off tests but do more substantial testing by writing code at the bottom of the file that contains the objects to be tested, like this:
# Add incremental self-test code
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

bob = Person('Bob Smith')                            # Test the class
sue = Person('Sue Jones', job='dev', pay=100000)     # Runs __init__ automatically
print(bob.name, bob.pay)                             # Fetch attached attributes
print(sue.name, sue.pay)                             # sue's and bob's attrs differ
Notice here that the bob
object accepts the defaults for job
and pay
, but sue
provides values explicitly. Also note
how we use keyword arguments when making sue
; we could pass by position instead,
but the keywords may help remind us later what the data is (and they
allow us to pass the arguments in any left-to-right order we like).
Again, despite its unusual name, __init__
is a normal function, supporting
everything you already know about functions—including both defaults
and pass-by-name keyword arguments.
When this file runs as a script, the test code at the bottom
makes two instances of our class and prints two attributes of each
(name
and pay
):
C:\misc> person.py
Bob Smith 0
Sue Jones 100000
You can also type this file’s test code at Python’s
interactive prompt (assuming you import the Person
class there first), but coding
canned tests inside the module file like this makes it much easier
to rerun them in the future.
Although this is fairly simple code, it’s already
demonstrating something important. Notice that bob
’s name
is not sue
’s, and sue
’s pay
is not bob
’s. Each is an independent record of
information. Technically, bob
and
sue
are both namespace objects—like all class instances,
they each have their own independent copy of the state information
created by the class. Because each instance of a class has its own
set of self
attributes, classes
are a natural for recording information for multiple objects this
way; just like built-in types, classes serve as a sort of
object factory. Other Python program
structures, such as functions and modules, have no such
concept.
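We can see this independence directly with a quick sketch, reusing the class as coded so far; each instance’s attributes live in its own namespace dictionary:

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

bob = Person('Bob Smith')
sue = Person('Sue Jones', job='dev', pay=100000)

print(bob.__dict__)       # each instance carries its own attribute namespace
print(sue.__dict__)

bob.pay = 50000           # changing one record...
print(sue.pay)            # ...leaves the other untouched
```

Because bob and sue are distinct namespace objects, updating one has no effect on the other, just as changes to one list don’t alter another.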
As is, the test code at the bottom of the file works, but
there’s a big catch—its top-level print
statements run both when the file is
run as a script and when it is imported as a module. This means if
we ever decide to import the class in this file in order to use it
somewhere else (and we will later in this chapter), we’ll see the
output of its test code every time the file is imported. That’s not
very good software citizenship, though: client programs probably
don’t care about our internal tests and won’t want to see our output
mixed in with their own.
Although we could split the test code off into a separate
file, it’s often more convenient to code tests in the same file as
the items to be tested. It would be better to arrange to run the
test statements at the bottom only when the
file is run for testing, not when the file is imported. That’s
exactly what the module __name__
check is
designed for, as you learned in the preceding part of this book.
Here’s what this addition looks like:
# Allow this file to be imported as well as run/tested
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':                            # When run for testing only
    # self-test code
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
Now, we get exactly the behavior we’re after—running the file
as a top-level script tests it because its __name__
is __main__
, but importing it as a library of
classes later does not:
C:\misc> person.py
Bob Smith 0
Sue Jones 100000

c:\misc> python
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) ...
>>> import person
>>>
When imported, the file now defines the class, but does not use it. When run directly, this file creates two instances of our class as before, and prints two attributes of each; again, because each instance is an independent namespace object, the values of their attributes differ.
Everything looks good so far—at this point, our class is essentially a record factory; it creates and fills out fields of records (attributes of instances, in more Pythonic terms). Even as limited as it is, though, we can still run some operations on its objects. Although classes add an extra layer of structure, they ultimately do most of their work by embedding and processing basic core data types like lists and strings. In other words, if you already know how to use Python’s simple core types, you already know much of the Python class story; classes are really just a minor structural extension.
For example, the name
field
of our objects is a simple string, so we can extract last names from
our objects by splitting on spaces and indexing. These are all core
data type operations, which work whether their subjects are embedded
in class instances or not:
>>> name = 'Bob Smith'         # Simple string, outside class
>>> name.split()               # Extract last name
['Bob', 'Smith']
>>> name.split()[-1]           # Or [1], if always just two parts
'Smith'
Similarly, we can give an object a pay raise by updating its
pay
field—that is, by changing its
state information in-place with an assignment. This task also involves
basic operations that work on Python’s core objects, regardless of
whether they are standalone or embedded in a class structure:
>>> pay = 100000               # Simple variable, outside class
>>> pay *= 1.10                # Give a 10% raise
>>> print(pay)                 # Or: pay = pay * 1.10, if you like to type
110000.0                       # Or: pay = pay + (pay * .10), if you _really_ do!
To apply these operations to the Person
objects created by our script, simply
do to bob.name
and sue.pay
what we just did to name
and pay
. The operations are the same, but the
subject objects are attached to attributes in our class
structure:
# Process embedded built-in types: strings, mutability
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
    print(bob.name.split()[-1])    # Extract object's last name
    sue.pay *= 1.10                # Give this object a raise
    print(sue.pay)
We’ve added the last three lines here; when they’re run, we
extract bob
’s last name by using
basic string and list operations and give sue
a pay raise by modifying her pay
attribute in-place with basic number
operations. In a sense, sue
is also
a mutable object—her state changes in-place just
like a list after an append
call:
Bob Smith 0
Sue Jones 100000
Smith
110000.0
The preceding code works as planned, but if you show it to a veteran software developer, they’ll probably tell you that its general approach is not a great idea in practice. Hardcoding operations like these outside of the class can lead to maintenance problems in the future.
For example, what if you’ve hardcoded the last-name-extraction formula at many different places in your program? If you ever need to change the way it works (to support a new name structure, for instance), you’ll need to hunt down and update every occurrence. Similarly, if the pay-raise code ever changes (e.g., to require approval or database updates), you may have multiple copies to modify. Just finding all the appearances of such code may be problematic in larger programs—they may be scattered across many files, split into individual steps, and so on.
What we really want to do here is employ a software design concept known as encapsulation. The idea with encapsulation is to wrap up operation logic behind interfaces, such that each operation is coded only once in our program. That way, if our needs change in the future, there is just one copy to update. Moreover, we’re free to change the single copy’s internals almost arbitrarily, without breaking the code that uses it.
In Python terms, we want to code operations on objects in class methods, instead of littering them throughout our program. In fact, this is one of the things that classes are very good at—factoring code to remove redundancy and thus optimize maintainability. As an added bonus, turning operations into methods enables them to be applied to any instance of the class, not just those that they’ve been hardcoded to process.
This is all simpler in code than it may sound in theory. The following achieves encapsulation by moving the two operations from code outside the class into class methods. While we’re at it, let’s change our self-test code at the bottom to use the new methods we’re creating, instead of hardcoding operations:
# Add methods to encapsulate operations for maintainability
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):                               # Behavior methods
        return self.name.split()[-1]                  # self is implied subject
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))      # Must change here only

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
    print(bob.lastName(), sue.lastName())             # Use the new methods
    sue.giveRaise(.10)                                # instead of hardcoding
    print(sue.pay)
As we’ve learned, methods are simply normal functions that
are attached to classes and designed to process instances of those
classes. The instance is the subject of the method call and is
passed to the method’s self
argument automatically.
The transformation to the methods in this version is
straightforward. The new lastName
method, for example, simply
does to self
what the previous
version hardcoded for bob
,
because self
is the implied
subject when the method is called. lastName
also returns the result, because
this operation is a called function now; it computes a value for its
caller to use, even if it is just to be printed. Similarly, the new
giveRaise
method just does to
self
what we did to sue
before.
When run now, our file’s output is similar to before—we’ve mostly just refactored the code to allow for easier changes in the future, not altered its behavior:
Bob Smith 0
Sue Jones 100000
Smith Jones
110000
A few coding details are worth pointing out here. First,
notice that sue
’s pay is now
still an integer after a pay raise—we convert
the math result back to an integer by calling the int
built-in within the method. Changing
the value to either int
or
float
is probably not a
significant concern for most purposes (integer and floating-point
objects have the same interfaces and can be mixed within
expressions), but we may need to address rounding issues in a real
system (money probably matters to Person
s!).
As we learned in Chapter 5, we might
handle this by using the round(N, 2)
built-in to round and retain cents, using the decimal
type to fix precision, or storing
monetary values as full floating-point numbers and displaying them
with a %.2f
or {0:.2f}
formatting string to show cents.
For this example, we’ll simply truncate any cents with int
. (For another idea, also see the
money
function in the formats.py module of Chapter 24; you can import this tool to
show pay with commas, cents, and dollar signs.)
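The alternatives just mentioned can be sketched outside any class, to keep it short; the math is the same whether the pay value is standalone or stored on an instance:

```python
from decimal import Decimal

pay = 100000 * 1.10                  # binary floating point: not exactly 110000
print(int(pay))                      # truncate cents, as our method does
print(round(pay, 2))                 # round to two decimal places instead
print('%.2f' % pay)                  # display-only: format with cents
print('{0:.2f}'.format(pay))         # same idea, str.format style
print(Decimal('100000') * Decimal('1.10'))   # fixed-precision decimal math
```

Note that the formatting calls change only the display, while int, round, and Decimal change the stored value itself; which is right depends on whether rounding errors may accumulate in your system.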
Second, notice that we’re also printing sue
’s last name this time—because the
last-name logic has been encapsulated in a method, we get to use it
on any instance of the class. As we’ve seen,
Python tells a method which instance to process by automatically
passing it in to the first argument, usually called self
. Specifically:
In the first call, bob.lastName()
, bob
is the implied subject passed to
self
.
In the second call, sue.lastName()
, sue
goes to self
instead.
Trace through these calls to see how the instance winds up in
self
. The net effect is that the
method fetches the name of the implied subject each time. The same
happens for giveRaise
. We could,
for example, give bob
a raise by
calling giveRaise
for both
instances this way, too; but unfortunately, bob
’s zero pay will prevent him from
getting a raise as the program is currently coded (something we may
want to address in a future 2.0 release of our software).
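To see bob’s predicament concretely, here is a quick sketch using the class as currently coded; multiplying a zero salary by any percentage still yields zero:

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

bob = Person('Bob Smith')
bob.giveRaise(.10)         # 10% of zero is still zero
print(bob.pay)
```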
Finally, notice that the giveRaise
method assumes that percent
is passed in as a floating-point
number between zero and one. That may be too radical an assumption
in the real world (a 1000% raise would probably be a bug for most of
us!); we’ll let it pass for this prototype, but we might want to
test or at least document this in a future iteration of this code.
Stay tuned for a rehash of this idea in a later chapter in this
book, where we’ll code something called function
decorators and explore Python’s assert
statement—alternatives that can do
the validity test for us automatically during development.
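As a preview, one way such a check might look with assert (the bounds and error message here are our own choices for this sketch, not the chapter’s final code):

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        # Fail fast on out-of-range percentages during development
        assert 0 <= percent <= 1, 'percent must be a fraction between 0 and 1'
        self.pay = int(self.pay * (1 + percent))

sue = Person('Sue Jones', job='dev', pay=100000)
sue.giveRaise(.10)                  # a sane raise passes the check
print(sue.pay)

rejected = False
try:
    sue.giveRaise(10)               # a "1000% raise" trips the assertion
except AssertionError as exc:
    rejected = True
    print('rejected:', exc)
```

Keep in mind that assert is a development aid, not production error handling; it can be disabled when Python is run with optimization on.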
At this point, we have a fairly full-featured class that generates and initializes instances, along with two new bits of behavior for processing instances (in the form of methods). So far, so good.
As it stands, though, testing is still a bit less convenient
than it needs to be—to trace our objects, we have to manually fetch
and print individual attributes (e.g., bob.name
, sue.pay
). It would be nice if displaying an
instance all at once actually gave us some useful information.
Unfortunately, the default display format for an instance object isn’t
very good—it displays the object’s class name, and its address in
memory (which is essentially useless in Python, except as a unique
identifier).
To see this, change the last line in the script to print(sue)
so it displays the object as a
whole. Here’s what you’ll get (the output says that sue
is an “object” in 3.0 and an “instance”
in 2.6):
Bob Smith 0
Sue Jones 100000
Smith Jones
<__main__.Person object at 0x02614430>
Fortunately, it’s easy to do better by employing
operator overloading—coding methods in a
class that intercept and process built-in operations when run on the
class’s instances.
Specifically, we can make use of what is probably the second most
commonly used operator overloading method in Python, after __init__
: the __str__
method
introduced in the preceding chapter. __str__
is run automatically every time an
instance is converted to its print string. Because that’s what
printing an object does, the net transitive effect is that printing
an object displays whatever is returned by the object’s __str__
method, if it either defines one
itself or inherits one from a superclass (double-underscored names
are inherited just like any other).
Technically speaking, the __init__
constructor method we’ve already
coded is operator overloading too—it is run automatically at
construction time to initialize a newly created instance.
Constructors are so common, though, that they almost seem like a
special case. More focused methods like __str__
allow us to tap into specific
operations and provide specialized behavior
when our objects are used in those contexts.
Let’s put this into code. The following extends our class to give a custom display that lists attributes when our class’s instances are displayed as a whole, instead of relying on the less useful default display:
# Add __str__ overload method for printing objects
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):                                        # Added method
        return '[Person: %s, %s]' % (self.name, self.pay)     # String to print

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
Notice that we’re doing string %
formatting to build the display string
in __str__
here; at the bottom,
classes use built-in type objects and operations like these to get
their work done. Again, everything you’ve already learned about both
built-in types and functions applies to class-based code. Classes
largely just add an additional layer of
structure that packages functions and data
together and supports extensions.
We’ve also changed our self-test code to print objects
directly, instead of printing individual attributes. When run, the
output is more coherent and meaningful now; the “[...]” lines are
returned by our new __str__
, run
automatically by print operations:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Here’s a subtle point: as we’ll learn in the next chapter, a
related overloading method, __repr__
, provides an as-code low-level
display of an object when present. Sometimes classes provide both a
__str__
for user-friendly
displays and a __repr__
with
extra details for developers to view. Because printing runs __str__
and the interactive prompt echoes
results with __repr__
, this can
provide both target audiences with an appropriate display. Since
we’re not interested in displaying an as-code format, __str__
is sufficient for our class.
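To illustrate the division of labor, here is a sketch of a class providing both methods (the bracketed and constructor-style display formats are example choices of my own, not the chapter’s final code):

```python
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def __str__(self):                       # user-friendly display
        return '[Person: %s, %s]' % (self.name, self.pay)
    def __repr__(self):                      # developer-oriented detail
        return 'Person(name=%r, pay=%r)' % (self.name, self.pay)

sue = Person('Sue Jones', pay=100000)
print(sue)          # print operations run __str__
print(repr(sue))    # interactive echoes and repr() run __repr__
```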
At this point, our class captures much of the OOP machinery in Python:
it makes instances, provides behavior in methods, and even does a bit
of operator overloading now to intercept print operations in __str__
. It effectively packages our data
and logic together into a single, self-contained software component, making it easy to locate
code and straightforward to change it in the future. By allowing us to
encapsulate behavior, it also allows us to factor that code to avoid
redundancy and its associated maintenance headaches.
The only major OOP concept it does not yet capture is customization by inheritance. In some sense, we’re already doing inheritance, because instances inherit methods from their classes. To demonstrate the real power of OOP, though, we need to define a superclass/subclass relationship that allows us to extend our software and replace bits of inherited behavior. That’s the main idea behind OOP, after all; by fostering a coding model based upon customization of work already done, it can dramatically cut development time.
As a next step, then, let’s put OOP’s methodology to use
and customize our Person
class by extending our software
hierarchy. For the purpose of this tutorial, we’ll define a subclass
of Person
called Manager
that replaces the inherited
giveRaise
method with a more
specialized version. Our new class begins as follows:
class Manager(Person): # Define a subclass of Person
This code means that we’re defining a new class named Manager
, which inherits from and may add
customizations to the superclass Person
. In plain terms, a Manager
is almost like a Person
(admittedly, a very long journey
for a very small joke...), but Manager
has a custom way to give
raises.
For the sake of argument, let’s assume that when a Manager
gets a raise, it receives the
passed-in percentage as usual, but also gets an extra bonus that
defaults to 10%. For instance, if a Manager
’s raise is specified as 10%, it
will really get 20%. (Any relation to Person
s living or dead is, of course,
strictly coincidental.) Our new method begins as follows; because
this redefinition of giveRaise
will be closer in the class tree to Manager
instances than the
original version in Person
, it
effectively replaces, and thereby customizes, the operation. Recall
that according to the inheritance search rules, the
lowest version of the name wins:
class Manager(Person):                              # Inherit Person attrs
    def giveRaise(self, percent, bonus=.10):        # Redefine to customize
Now, there are two ways we might code this Manager
customization: a good way and a
bad way. Let’s start with the bad way, since it
might be a bit easier to understand. The bad way is to cut and paste
the code of giveRaise
in Person
and modify it for Manager
, like this:
class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        self.pay = int(self.pay * (1 + percent + bonus))    # Bad: cut-and-paste
This works as advertised—when we later call the giveRaise
method of a Manager
instance, it will run this custom
version, which tacks on the extra bonus. So what’s wrong with
something that runs correctly?
The problem here is a very general one: any time you copy code with cut and paste, you essentially double your maintenance effort in the future. Think about it: because we copied the original version, if we ever have to change the way raises are given (and we probably will), we’ll have to change the code in two places, not one. Although this is a small and artificial example, it’s also representative of a universal issue—any time you’re tempted to program by copying code this way, you probably want to look for a better approach.
What we really want to do here is somehow
augment the original giveRaise
, instead of replacing it
altogether. The good way to do that in Python
is by calling to the original version directly, with augmented
arguments, like this:
class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)             # Good: augment original
This code leverages the fact that a class method can always be
called either through an instance (the usual
way, where Python sends the instance to the self
argument automatically) or through
the class (the less common scheme, where you
must pass the instance manually). In more symbolic terms, recall
that a normal method call of this form:
instance.method(args...)
is automatically translated by Python into this equivalent form:
class.method(instance, args...)
where the class containing the method to be run is determined
by the inheritance search rule applied to the method’s name. You can
code either form in your script, but there is a
slight asymmetry between the two—you must remember to pass along the
instance manually if you call through the class directly. The method
always needs a subject instance one way or another, and Python
provides it automatically only for calls made through an instance.
For calls through the class name, you need to send an instance to
self
yourself; for code inside a
method like giveRaise
, self
already is the
subject of the call, and hence the instance to pass along.
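A short sketch shows the symmetry (this pared-down Person stands in for our full class); both call forms run the very same method:

```python
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

sue = Person('Sue Jones', pay=100000)
sue.giveRaise(.10)                 # through the instance: self filled in automatically
Person.giveRaise(sue, .10)         # through the class: pass the instance manually
print(sue.pay)                     # both calls updated the same object
```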
Calling through the class directly effectively subverts
inheritance and kicks the call higher up the class tree to run a
specific version. In our case, we can use this technique to invoke
the default giveRaise
in Person
, even though it’s been redefined at
the Manager
level. In some sense, we
must call through Person
this way, because a self.giveRaise()
inside Manager
’s giveRaise
code would loop—since self
already is a Manager
, self.giveRaise()
would resolve again to
Manager.giveRaise
, and so on and
so forth until available memory is exhausted.
This “good” version may seem like a small difference in code,
but it can make a huge difference for future code
maintenance—because the giveRaise
logic lives in just one place
now (Person
’s method), we have
only one version to change in the future as needs evolve. And
really, this form captures our intent more directly anyhow—we want
to perform the standard giveRaise
operation, but simply tack on an extra bonus. Here’s our entire
module file with this step applied:
# Add customization of one behavior in a subclass
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):        # Redefine at this level
        Person.giveRaise(self, percent + bonus)     # Call Person's version

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 'mgr', 50000)        # Make a Manager: __init__
    tom.giveRaise(.10)                              # Runs custom version
    print(tom.lastName())                           # Runs inherited method
    print(tom)                                      # Runs inherited __str__
To test our Manager
subclass customization, we’ve also added self-test code that makes a
Manager
, calls its methods, and
prints it. Here’s the new version’s output:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
Everything looks good here: bob
and sue
are as before, and when tom
the Manager
is given a 10% raise, he really
gets 20% (his pay goes from $50K to $60K), because the customized
giveRaise
in Manager
is run for him only. Also notice
how printing tom
as a whole at
the end of the test code displays the nice format defined in
Person
’s __str__
: Manager
objects get this, lastName
, and the __init__
constructor method’s code “for
free” from Person
, by inheritance.
To make this acquisition of inherited behavior even more striking, we can add the following code at the end of our file:
if __name__ == '__main__':
    ...
    print('--All three--')
    for object in (bob, sue, tom):          # Process objects generically
        object.giveRaise(.10)               # Run this object's giveRaise
        print(object)                       # Run the common __str__
Here’s the resulting output:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
--All three--
[Person: Bob Smith, 0]
[Person: Sue Jones, 121000]
[Person: Tom Jones, 72000]
In the added code, object
is either a Person
or a Manager
, and Python runs the appropriate
giveRaise
automatically—our
original version in Person
for
bob
and sue
, and our customized version in
Manager
for tom
. Trace the method calls yourself to
see how Python selects the right giveRaise
method for each object.
This is just Python’s notion of
polymorphism, which we met earlier in the book,
at work again—what giveRaise
does
depends on what you do it to. Here, it’s made all the more obvious
when it selects from code we’ve written ourselves in classes. The
practical effect in this code is that sue
gets another 10% but tom
gets another 20%, because giveRaise is dispatched based upon
the object’s type. As we’ve learned, polymorphism is at the heart of
Python’s flexibility. Passing any of our three objects to a function
that calls a giveRaise
method,
for example, would have the same effect: the appropriate version
would be run automatically, depending on which type of object was
passed.
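To make this concrete, here is a minimal, self-contained sketch of such a function; the name give_bonus is hypothetical, and the classes are pared-down stand-ins for this chapter's Person and Manager:

```python
# Hypothetical helper: works on any object that has a giveRaise method
def give_bonus(worker, percent):
    worker.giveRaise(percent)          # Polymorphic: runs the object's own version

# Pared-down stand-ins for this chapter's classes
class Person:
    def __init__(self, name, pay=0):
        self.name, self.pay = name, pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

sue = Person('Sue Jones', pay=100000)
tom = Manager('Tom Jones', pay=50000)
give_bonus(sue, .10)                   # Runs Person.giveRaise: sue gets 10%
give_bonus(tom, .10)                   # Runs Manager.giveRaise: tom gets 20%
print(sue.pay, tom.pay)                # 110000 60000
```

The function never checks types; it simply calls giveRaise and lets the inheritance search select the right version for each object passed in.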
On the other hand, printing runs the same
__str__
for all three objects,
because it’s coded just once in Person
. Manager
both specializes and applies the
code we originally wrote in Person
. Although this example is small,
it’s already leveraging OOP’s talent for code customization and
reuse; with classes, this almost seems automatic at times.
In fact, classes can be even more flexible than our example
implies. In general, classes can inherit,
customize, or extend
existing code in superclasses. For example, although we’re focused
on customization here, we can also add unique methods to Manager
that are not present in Person
, if Manager
s require something completely
different (Python namesake reference intended). The following
snippet illustrates. Here, giveRaise
redefines a superclass method to
customize it, but someThingElse
defines something new to extend:
class Person:
    def lastName(self): ...
    def giveRaise(self): ...
    def __str__(self): ...

class Manager(Person):                      # Inherit
    def giveRaise(self, ...): ...           # Customize
    def someThingElse(self, ...): ...       # Extend

tom = Manager()
tom.lastName()                              # Inherited verbatim
tom.giveRaise()                             # Customized version
tom.someThingElse()                         # Extension here
print(tom)                                  # Inherited overload method
Extra methods like this code’s someThingElse
extend
the existing software and are available on Manager
objects only, not on Person
s. For the purposes of this
tutorial, however, we’ll limit our scope to customizing some of
Person
’s behavior by redefining
it, not adding to it.
As is, our code may be small, but it’s fairly functional. And really, it already illustrates the main point behind OOP in general: in OOP, we program by customizing what has already been done, rather than copying or changing existing code. This isn’t always an obvious win to newcomers at first glance, especially given the extra coding requirements of classes. But overall, the programming style implied by classes can cut development time radically compared to other approaches.
For instance, in our example we could theoretically have
implemented a custom giveRaise operation without
subclassing, but none of the other options yield code as optimal as
ours:
Although we could have simply coded Manager
from
scratch as new, independent code, we would have had
to reimplement all the behaviors in Person
that are the same for Manager
s.
Although we could have simply changed
the existing Person
class
in-place for the requirements of Manager
’s giveRaise
, doing so would probably
break the places where we still need the original Person
behavior.
Although we could have simply copied
the Person
class in its
entirety, renamed the copy to Manager
, and changed its giveRaise
, doing so would introduce
code redundancy that would double our work in the future—changes
made to Person
in the future
would not be picked up automatically, but would have to be
manually propagated to Manager
’s code. As usual, the
cut-and-paste approach may seem quick now, but it doubles your
work in the future.
The customizable hierarchies we can build with classes provide a much better solution for software that will evolve over time. No other tools in Python support this development mode. Because we can tailor and extend our prior work by coding new subclasses, we can leverage what we’ve already done, rather than starting from scratch each time, breaking what already works, or introducing multiple copies of code that may all have to be updated in the future. When done right, OOP is a powerful programmer’s ally.
Our code works as it is, but if you study the current
version closely, you may be struck by something a bit odd—it seems
pointless to have to provide a mgr
job name for Manager
objects when
we create them: this is already implied by the class itself. It would
be better if we could somehow fill in this value automatically when a
Manager
is made.
The trick we need to improve on this turns out to be the
same as the one we employed in the prior section:
we want to customize the constructor logic for Manager
s in such a way as to provide a job
name automatically. In terms of code, we want to redefine an __init__
method in Manager
that provides the mgr
string for us. And as with the
giveRaise
customization, we also
want to run the original __init__
in Person
by calling through the
class name, so it still initializes our objects’ state information
attributes.
The following extension will do the job—we’ve coded the new
Manager
constructor and changed the
call that creates tom
to not pass
in the mgr
job name:
# Add customization of constructor in a subclass

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def __init__(self, name, pay):                 # Redefine constructor
        Person.__init__(self, name, 'mgr', pay)    # Run original with 'mgr'
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)              # Job name not needed:
    tom.giveRaise(.10)                             # Implied/set by class
    print(tom.lastName())
    print(tom)
Again, we’re using the same technique to augment the __init__
constructor here that we used for
giveRaise
earlier—running the
superclass version by calling through the class name directly and
passing the self
instance along
explicitly. Although the constructor has a strange name, the effect is
identical. Because we need Person
’s
construction logic to run too (to initialize instance attributes), we
really have to call it this way; otherwise, instances would not have
any attributes attached.
Calling superclass constructors from redefinitions this way
turns out to be a very common
coding pattern in Python. By itself, Python uses inheritance to look
for and call only one __init__
method at construction time—the
lowest one in the class tree. If you need higher
__init__
methods to be run at
construction time (and you usually do), you must call them manually
through the superclass’s name. The upside to this is that you can be
explicit about which argument to pass up to the superclass’s
constructor and can choose to not call it at all: not calling the
superclass constructor allows you to replace its logic altogether,
rather than augmenting it.
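The difference between augmenting and replacing a superclass constructor can be sketched with two small hypothetical subclasses:

```python
class Base:
    def __init__(self):
        self.kind = 'base'

class Augments(Base):
    def __init__(self):
        Base.__init__(self)            # Run the superclass's logic first...
        self.extra = 'added'           # ...then tack on our own attributes

class Replaces(Base):
    def __init__(self):
        self.kind = 'replaced'         # Base.__init__ never runs here

a, r = Augments(), Replaces()
print(a.kind, a.extra)                 # base added
print(r.kind, hasattr(r, 'extra'))     # replaced False
```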
The output of this file’s self-test code is the same as before—we haven’t changed what it does, we’ve simply restructured to get rid of some logical redundancy:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
In this complete form, despite their sizes, our classes capture nearly all the important concepts in Python’s OOP machinery:
Instance creation—filling out instance attributes
Behavior methods—encapsulating logic in class methods
Operator overloading—providing behavior for built-in operations like printing
Customizing behavior—redefining methods in subclasses to specialize them
Customizing constructors—adding initialization logic to superclass steps
Most of these concepts are based upon just three simple ideas:
the inheritance search for attributes in object trees, the special
self
argument in methods, and
operator overloading’s automatic dispatch to methods.
Along the way, we’ve also made our code easy to change in the future, by harnessing the class’s propensity for factoring code to reduce redundancy. For example, we wrapped up logic in methods and called back to superclass methods from extensions to avoid having multiple copies of the same code. Most of these steps were a natural outgrowth of the structuring power of classes.
By and large, that’s all there is to OOP in Python. Classes certainly can become larger than this, and there are some more advanced class concepts, such as decorators and metaclasses, which we will meet in later chapters. In terms of the basics, though, our classes already do it all. In fact, if you’ve grasped the workings of the classes we’ve written, most OOP Python code should now be within your reach.
Having said that, I should also tell you that although the
basic mechanics of OOP are simple in Python, some of the art in
larger programs lies in the way that classes are put together. We’re
focusing on inheritance in this tutorial
because that’s the mechanism the Python language provides, but
programmers sometimes combine classes in other ways, too. For
example, a common coding pattern involves nesting objects inside
each other to build up composites. We’ll explore this pattern in
more detail in Chapter 30, which is
really more about design than about Python; as a quick example,
though, we could use this composition idea to code our Manager
extension by embedding a Person
, instead of inheriting
from it.
The following alternative does so by using the __getattr__
operator overloading method we will meet in Chapter 29 to intercept undefined attribute
fetches and delegate them to the embedded object with the getattr
built-in. The giveRaise
method here still achieves
customization, by changing the argument passed along to the embedded
object. In effect, Manager
becomes a controller layer that passes calls
down to the embedded object, rather than
up to superclass methods:
# Embedding-based Manager alternative

class Person:
    ...same...

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)     # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)     # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)          # Delegate all other attrs
    def __str__(self):
        return str(self.person)                    # Must overload again (in 3.0)

if __name__ == '__main__':
    ...same...
In fact, this Manager
alternative is representative of a general coding pattern usually
known as delegation—a composite-based structure that
manages a wrapped object and propagates method calls to it. This
pattern works in our example, but it requires about twice as much
code and is less well suited than inheritance to the kinds of direct
customizations we meant to express (in fact, no reasonable Python
programmer would code this example this way in practice, except
perhaps those writing general tutorials). Manager
isn’t really a Person
here, so we need extra code to
manually dispatch method calls to the embedded object; operator
overloading methods like __str__
must be redefined (in 3.0, at least, as noted in the upcoming
sidebar Catching Built-in Attributes in 3.0), and
adding new Manager
behavior is
less straightforward since state information is one level
removed.
Still, object embedding, and design patterns based
upon it, can be a very good fit when embedded objects require more
limited interaction with the container than direct customization
implies. A controller layer like this alternative Manager
, for example, might come in handy
if we want to trace or validate calls to another object’s methods
(indeed, we will use a nearly identical coding pattern when we study
class decorators later in the book). Moreover,
a hypothetical Department
class
like the following could aggregate other
objects in order to treat them as a set. Add this to the bottom of
the person.py file to try this
on your own:
# Aggregate embedded objects into a composite
...
bob = Person(...)
sue = Person(...)
tom = Manager(...)

class Department:
    def __init__(self, *args):
        self.members = list(args)
    def addMember(self, person):
        self.members.append(person)
    def giveRaises(self, percent):
        for person in self.members:
            person.giveRaise(percent)
    def showAll(self):
        for person in self.members:
            print(person)

development = Department(bob, sue)      # Embed objects in a composite
development.addMember(tom)
development.giveRaises(.10)             # Runs embedded objects' giveRaise
development.showAll()                   # Runs embedded objects' __str__s
Interestingly, this code uses both inheritance
and composition—Department
is a composite that embeds and
controls other objects to aggregate, but the embedded Person
and Manager
objects themselves use inheritance
to customize. As another example, a GUI might similarly use
inheritance to customize the behavior or
appearance of labels and buttons, but also
composition to build up larger packages of
embedded widgets, such as input forms, calculators, and text
editors. The class structure to use depends on the objects you are
trying to model.
Design issues like composition are explored in Chapter 30, so we’ll postpone further
investigations for now. But again, in terms of the basic mechanics
of OOP in Python, our Person
and
Manager
classes already tell the
entire story. Having mastered the basics of OOP, though, developing
general tools for applying it more easily in your scripts is often a
natural next step—and the topic of the next section.
Let’s make one final tweak before we throw our objects onto a database. As they are, our classes are complete and demonstrate most of the basics of OOP in Python. They still have two remaining issues we probably should iron out, though, before we go live with them:
First, if you look at the display of the objects as they are
right now, you’ll notice that when you print tom
the Manager
labels him as a Person
. That’s not technically
incorrect, since Manager
is a
kind of customized and specialized Person
. Still, it would be more accurate
to display objects with the most specific (that is,
lowest) classes possible.
Second, and perhaps more importantly, the current display
format shows only the attributes we include
in our __str__
, and that might
not account for future goals. For example, we can’t yet verify
that tom
’s job name has been
set to mgr
correctly by
Manager
’s constructor, because
the __str__
we coded for
Person
does not print this
field. Worse, if we ever expand or otherwise change the set of
attributes assigned to our objects in __init__
, we’ll have to remember to also
update __str__
for new names to
be displayed, or it will become out of sync over time.
The last point means that, yet again, we’ve made potential extra
work for ourselves in the future by introducing
redundancy in our code. Because any disparity in
__str__
will be reflected in the
program’s output, this redundancy may be more obvious than the other
forms we addressed earlier; still, avoiding extra work in the future
is generally a good thing.
We can address both issues with Python’s introspection tools—special attributes and functions that give us access to some of the internals of objects’ implementations. These tools are somewhat advanced and generally used more by people writing tools for other programmers to use than by programmers developing applications. Even so, a basic knowledge of some of these tools is useful because they allow us to write code that processes classes in generic ways. In our code, for example, there are two hooks that can help us out, both of which were introduced near the end of the preceding chapter:
The built-in instance
.__class__
attribute provides a link
from an instance to the class from which it was created. Classes
in turn have a __name__
, just
like modules, and a __bases__
sequence that provides access to superclasses. We can use these
here to print the name of the class from which an instance is
made rather than one we’ve hardcoded.
The built-in object
.__dict__
attribute provides a
dictionary with one key/value pair for every attribute attached
to a namespace object (including modules, classes, and
instances). Because it is a dictionary, we can fetch its keys
list, index by key, iterate over its keys, and so on, to process
all attributes generically. We can use this here to print every
attribute in any instance, not just those we hardcode in custom
displays.
Here’s what these tools look like in action at Python’s
interactive prompt. Notice how we load Person
at the interactive prompt with a
from
statement here—class names
live in and are imported from modules, exactly like function names
and other variables:
>>> from person import Person
>>> bob = Person('Bob Smith')
>>> print(bob)                               # Show bob's __str__
[Person: Bob Smith, 0]
>>> bob.__class__                            # Show bob's class and its name
<class 'person.Person'>
>>> bob.__class__.__name__
'Person'
>>> list(bob.__dict__.keys())                # Attributes are really dict keys
['pay', 'job', 'name']                       # Use list to force list in 3.0
>>> for key in bob.__dict__:
        print(key, '=>', bob.__dict__[key])  # Index manually

pay => 0
job => None
name => Bob Smith
>>> for key in bob.__dict__:
        print(key, '=>', getattr(bob, key))  # obj.attr, but attr is a var

pay => 0
job => None
name => Bob Smith
As noted briefly in the prior chapter, some attributes
accessible from an instance might not be stored in the __dict__
dictionary if the instance’s
class defines __slots__
, an
optional and relatively obscure feature of new-style classes (and
all classes in Python 3.0) that stores attributes in an array and
that we’ll discuss in Chapters 30
and 31. Since slots really belong to
classes instead of instances, and since they are very rarely used in
any event, we can safely ignore them here and focus on the normal
__dict__
.
We can put these interfaces to work in a superclass that
displays accurate class names and formats all attributes of an
instance of any class. Open a new file in your text editor to code
the following—it’s a new, independent module named classtools.py
that implements just such a class. Because its __str__
print overload uses generic
introspection tools, it will work on any
instance, regardless of its attributes set. And because
this is a class, it automatically becomes a general formatting tool:
thanks to inheritance, it can be mixed into any
class that wishes to use its display format. As an added
bonus, if we ever want to change how instances are displayed we need
only change this class, as every class that inherits its __str__
will automatically pick up the new
format when it’s next run:
# File classtools.py (new)
"Assorted class utilities and tools"

class AttrDisplay:
    """
    Provides an inheritable print overload method that displays
    instances with their class names and a name=value pair for
    each attribute stored on the instance itself (but not attrs
    inherited from its classes). Can be mixed into any class,
    and will work on any instance.
    """
    def gatherAttrs(self):
        attrs = []
        for key in sorted(self.__dict__):
            attrs.append('%s=%s' % (key, getattr(self, key)))
        return ', '.join(attrs)
    def __str__(self):
        return '[%s: %s]' % (self.__class__.__name__, self.gatherAttrs())

if __name__ == '__main__':
    class TopTest(AttrDisplay):
        count = 0
        def __init__(self):
            self.attr1 = TopTest.count
            self.attr2 = TopTest.count + 1
            TopTest.count += 2

    class SubTest(TopTest):
        pass

    X, Y = TopTest(), SubTest()
    print(X)                        # Show all instance attrs
    print(Y)                        # Show lowest class name
Notice the docstrings here—as a general-purpose tool, we want
to add some functional documentation for potential users to read. As
we saw in Chapter 15, docstrings
can be placed at the top of simple functions and modules, and also
at the start of classes and their methods; the help
function and the PyDoc tool extract
and displays these automatically (we’ll look at docstrings again in
Chapter 28).
When run directly, this module's self-test makes two instances
and prints them; the __str__ defined here shows each instance's
class and all its attribute names and values, in sorted attribute
name order:
C:\misc> classtools.py
[TopTest: attr1=0, attr2=1]
[SubTest: attr1=2, attr2=3]
If you study the classtools
module’s self-test code long enough, you’ll notice that its class
displays only instance attributes, attached to
the self
object at the bottom of
the inheritance tree; that’s what self
’s __dict__
contains. As an intended
consequence, we don’t see attributes inherited by the instance from
classes above it in the tree (e.g., count
in this file’s self-test code).
Inherited class attributes are attached to the class only, not
copied down to instances.
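A simplified version of the self-test class shows this split directly; count lives only on the class, while attr1 lives on the instance:

```python
class TopTest:
    count = 0                          # Class attribute: in TopTest.__dict__
    def __init__(self):
        self.attr1 = 99                # Instance attribute: in self.__dict__

x = TopTest()
print('count' in x.__dict__)           # False: not copied down to the instance
print('count' in TopTest.__dict__)     # True: stored on the class only
print(x.count)                         # 0: still found by inheritance search
```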
If you ever do wish to include inherited attributes too, you
can climb the __class__
link to
the instance’s class, use the __dict__
there to fetch class attributes,
and then iterate through the class’s __bases__
attribute to climb to even
higher superclasses (repeating as necessary). If you’re a fan of
simple code, running a built-in dir
call on the instance instead of using
__dict__
and climbing would have
much the same effect, since dir
results include inherited names in the sorted results list:
>>> from person import Person
>>> bob = Person('Bob Smith')

# In Python 2.6:
>>> bob.__dict__.keys()                      # Instance attrs only
['pay', 'job', 'name']
>>> dir(bob)                                 # + inherited attrs in classes
['__doc__', '__init__', '__module__', '__str__', 'giveRaise', 'job',
'lastName', 'name', 'pay']

# In Python 3.0:
>>> list(bob.__dict__.keys())                # 3.0 keys is a view, not a list
['pay', 'job', 'name']
>>> dir(bob)                                 # 3.0 includes class type methods
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
...more lines omitted...
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'giveRaise', 'job', 'lastName', 'name', 'pay']
The output here varies between Python 2.6 and 3.0, because
3.0’s dict.keys
is not a list,
and 3.0’s dir
returns extra
class-type implementation attributes. Technically, dir
returns more in 3.0 because classes
are all “new style” and inherit a large set of operator overloading
names from the class type. In fact, you’ll probably want to filter
out most of the __X__
names in the 3.0 dir
result, since they are internal
implementation details and not something you’d normally want to
display.
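One simple way to do that filtering is a list comprehension over the dir result; here's a small sketch using a stripped-down Person:

```python
class Person:
    def __init__(self, name):
        self.name = name
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith')

# Keep only the names that don't look like __X__ internals
visible = [attr for attr in dir(bob) if not attr.startswith('__')]
print(visible)                         # ['lastName', 'name']
```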
In the interest of space, we’ll leave optional display of
inherited class attributes with either tree climbs or dir
as suggested experiments for now. For
more hints on this front, though, watch for the classtree.py inheritance tree climber we
will write in Chapter 28, and the
lister.py attribute listers and
climbers we’ll code in Chapter 30.
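If you want to try the tree climb now, a minimal sketch might look like the following; the all_attrs function name and demo classes are hypothetical:

```python
def all_attrs(instance):
    """Collect attribute names from an instance and all its classes."""
    names = set(instance.__dict__)             # Instance attributes first
    classes = [instance.__class__]             # Start at the lowest class
    while classes:
        cls = classes.pop()
        names.update(cls.__dict__)             # Add this class's attributes
        classes.extend(cls.__bases__)          # Climb to higher superclasses
    return sorted(names)

class A:
    tax = 0.1

class B(A):
    def method(self): pass

b = B()
b.data = 42
print([n for n in all_attrs(b) if not n.startswith('__')])   # ['data', 'method', 'tax']
```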
One last subtlety here: because our AttrDisplay
class in the classtools
module is a general tool
designed to be mixed into other arbitrary classes, we have to be
aware of the potential for unintended name
collisions with client classes. As is, I’ve assumed that
client subclasses may want to use both its __str__
and gatherAttrs
, but the latter of these may
be more than a subclass expects—if a subclass innocently defines a
gatherAttrs
name of its own, it
will likely break our class, because the lower version in the
subclass will be used instead of ours.
To see this for yourself, add a gatherAttrs
to TopTest
in the file’s self-test code;
unless the new method is identical, or intentionally customizes the
original, our tool class will no longer work as planned:
class TopTest(AttrDisplay):
....
def gatherAttrs(self): # Replaces method in AttrDisplay!
return 'Spam'
This isn’t necessarily bad—sometimes we want other methods to
be available to subclasses, either for direct calls or for
customization. If we really meant to provide a __str__
only, though, this is less than
ideal.
To minimize the chances of name collisions like this, Python
programmers often prefix methods not meant for external use with a
single underscore: _gatherAttrs
in our case. This isn’t
foolproof (what if another class defines _gatherAttrs
, too?), but it’s usually
sufficient, and it’s a common Python naming convention for methods
internal to a class.
A better and less commonly used solution would be to use
two underscores at the front of the method name
only: __gatherAttrs
for us.
Python automatically expands such names to include the enclosing
class’s name, which makes them truly unique. This is a feature
usually called pseudoprivate class attributes,
which we’ll expand on in Chapter 30.
For now, we’ll make both our methods available.
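As a quick preview of that expansion, a double-underscore method name is rewritten to include its enclosing class's name, so same-named methods in a tool class and its client no longer collide (the class and method names here are hypothetical):

```python
class AttrTool:
    def __gather(self):                # Stored as _AttrTool__gather
        return 'tool version'
    def show(self):
        return self.__gather()         # Expands to self._AttrTool__gather()

class Client(AttrTool):
    def __gather(self):                # Stored as _Client__gather: no clash
        return 'client version'

c = Client()
print(c.show())                        # tool version: the tool's own method runs
print(c._Client__gather())             # client version
```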
Now, to use this generic tool in our classes, all we need to
do is import it from its module, mix it in by inheritance in our
top-level class, and get rid of the more specific __str__
we coded before. The new print
overload method will be inherited by
instances of Person
, as well as
Manager
; Manager
gets __str__
from Person
, which now obtains it from the
AttrDisplay
coded in another
module. Here is the final version of our person.py file with these
changes applied:
# File person.py (final)
from classtools import AttrDisplay
# Use generic display tool
class Person(AttrDisplay):
"""
Create and process person records
"""
def __init__(self, name, job=None, pay=0):
self.name = name
self.job = job
self.pay = pay
def lastName(self): # Assumes last is last
return self.name.split()[-1]
def giveRaise(self, percent): # Percent must be 0..1
self.pay = int(self.pay * (1 + percent))
class Manager(Person):
"""
A customized Person with special requirements
"""
def __init__(self, name, pay):
Person.__init__(self, name, 'mgr', pay)
def giveRaise(self, percent, bonus=.10):
Person.giveRaise(self, percent + bonus)
if __name__ == '__main__':
bob = Person('Bob Smith')
sue = Person('Sue Jones', job='dev', pay=100000)
print(bob)
print(sue)
print(bob.lastName(), sue.lastName())
sue.giveRaise(.10)
print(sue)
tom = Manager('Tom Jones', 50000)
tom.giveRaise(.10)
print(tom.lastName())
print(tom)
As this is the final revision, we’ve added a few
comments here to document our work—docstrings for functional
descriptions and #
for smaller
notes, per best-practice conventions. When we run this code now, we
see all the attributes of our objects, not just the ones we
hardcoded in the original __str__
. And our final issue is resolved:
because AttrDisplay
takes class
names off the self
instance
directly, each object is shown with the name of its closest (lowest)
class—tom
displays as a Manager
now, not a Person
, and we can finally verify that his
job name has been correctly filled in by the Manager
constructor:
C:\misc> person.py
[Person: job=None, name=Bob Smith, pay=0]
[Person: job=dev, name=Sue Jones, pay=100000]
Smith Jones
[Person: job=dev, name=Sue Jones, pay=110000]
Jones
[Manager: job=mgr, name=Tom Jones, pay=60000]
This is the more useful display we were after. From a larger perspective, though, our attribute display class has become a general tool, which we can mix into any class by inheritance to leverage the display format it defines. Further, all its clients will automatically pick up future changes in our tool. Later in the book, we’ll meet even more powerful class tool concepts, such as decorators and metaclasses; along with Python’s introspection tools, they allow us to write code that augments and manages classes in structured and maintainable ways.
At this point, our work is almost complete. We now have a two-module system that not only implements our original design goals for representing people, but also provides a general attribute display tool we can use in other programs in the future. By coding functions and classes in module files, we’ve ensured that they naturally support reuse. And by coding our software as classes, we’ve ensured that it naturally supports extension.
Although our classes work as planned, though, the objects they create are not real database records. That is, if we kill Python, our instances will disappear—they’re transient objects in memory and are not stored in a more permanent medium like a file, so they won’t be available in future program runs. It turns out that it’s easy to make instance objects more permanent, with a Python feature called object persistence—making objects live on after the program that creates them exits. As a final step in this tutorial, let’s make our objects permanent.
Object persistence is implemented by three standard library modules, available in every Python:
pickle
Serializes arbitrary Python objects to and from a string of bytes
dbm (named anydbm in Python 2.6)
Implements an access-by-key filesystem for storing strings
shelve
Uses the other two modules to store Python objects on a file by key
We met these modules very briefly in Chapter 9 when we studied file basics. They provide powerful data storage options. Although we can’t do them complete justice in this tutorial or book, they are simple enough that a brief introduction is enough to get you started.
The pickle
module is a sort
of super-general object formatting and deformatting tool: given a
nearly arbitrary Python object in memory, it’s clever enough to
convert the object to a string of bytes, which it can use later to
reconstruct the original object in memory. The pickle
module can handle almost any object
you can create—lists, dictionaries, nested combinations thereof, and
class instances. The latter are especially useful things to pickle,
because they provide both data (attributes) and behavior (methods);
in fact, the combination is roughly equivalent to “records” and
“programs.” Because pickle
is so
general, it can replace extra code you might otherwise write to
create and parse custom text file representations for your objects.
By storing an object’s pickle string on a file, you effectively make
it permanent and persistent: simply load and unpickle it later to
re-create the original object.
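A tiny round trip shows the idea—dumps converts an object to a byte string, and loads rebuilds an equivalent object from it (the class here is a pared-down Person, just for a self-contained demo):

```python
import pickle

class Person:
    def __init__(self, name, pay=0):
        self.name, self.pay = name, pay

bob = Person('Bob Smith', pay=30000)
data = pickle.dumps(bob)               # Object -> byte string
clone = pickle.loads(data)             # Byte string -> new, equivalent object
print(clone.name, clone.pay)           # Bob Smith 30000
print(clone is bob)                    # False: a re-created copy, not the original
```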
Although it’s easy to use pickle
by itself to store objects in
simple flat files and load them from there later, the shelve
module provides an extra layer of
structure that allows you to store pickled objects by
key. shelve
translates an object to its pickled string with pickle
and stores that string under a key
in a dbm
file; when later
loading, shelve
fetches the
pickled string by key and re-creates the original object in memory
with pickle
. This is all quite a
trick, but to your script a shelve of pickled objects looks just like a
dictionary—you index by key to fetch,
assign to keys to store, and use dictionary tools such as len
, in
, and dict.keys
to get information. Shelves
automatically map dictionary operations to objects stored in a
file.
In fact, to your script the only coding difference between a shelve and a normal dictionary is that you must open shelves initially and must close them after making changes. The net effect is that a shelve provides a simple database for storing and fetching native Python objects by keys, and thus makes them persistent across program runs. It does not support query tools such as SQL, and it lacks some advanced features found in enterprise-level databases (such as true transaction processing), but native Python objects stored on a shelve may be processed with the full power of the Python language once they are fetched back by key.
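The open/use/close pattern can be sketched in a few lines; this demo writes the database file under a temporary directory so it won't clutter your working directory, and uses a pared-down Person class:

```python
import os, shelve, tempfile

class Person:
    def __init__(self, name, pay=0):
        self.name, self.pay = name, pay

path = os.path.join(tempfile.mkdtemp(), 'persondb')

db = shelve.open(path)                 # Open (and create) the shelve file
db['bob'] = Person('Bob Smith', 30000) # Assigning to a key pickles and stores
db.close()                             # Close after making changes

db = shelve.open(path)                 # Reopen later: the object is still there
print(len(db), 'bob' in db)            # Dictionary tools work on shelves
fetched = db['bob']                    # Indexing by key unpickles the object
print(fetched.name, fetched.pay)       # Bob Smith 30000
db.close()
```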
Pickling and shelves are somewhat advanced topics, and we won’t go into all their details here; you can read more about them in the standard library manuals, as well as application-focused books such as Programming Python. This is all simpler in Python than in English, though, so let’s jump into some code.
Let’s write a new script that throws objects of our classes
onto a shelve. In your text editor, open a new file we’ll call
makedb.py. Since this is a new file, we’ll
need to import our classes in order to create a few instances to
store. We used from
to load a
class at the interactive prompt earlier, but really, as with
functions and other variables, there are two ways to load a class
from a file (class names are variables like any other, and not at
all magic in this context):
import person                       # Load class with import
bob = person.Person(...)            # Go through module name

from person import Person           # Load class with from
bob = Person(...)                   # Use name directly
We’ll use from
to load in
our script, just because it’s a bit less to type. Copy or retype
this code to make instances of our classes in the new script, so we
have something to store (this is a simple demo, so we won’t worry
about the test-code redundancy here). Once we have some instances,
it’s almost trivial to store them on a shelve. We simply import the
shelve
module, open a new shelve
with an external filename, assign the objects to keys in the shelve,
and close the shelve when we’re done because we’ve made
changes:
# File makedb.py: store Person objects on a shelve database
from person import Person, Manager          # Load our classes
bob = Person('Bob Smith')                   # Re-create objects to be stored
sue = Person('Sue Jones', job='dev', pay=100000)
tom = Manager('Tom Jones', 50000)

import shelve
db = shelve.open('persondb')                # Filename where objects are stored
for object in (bob, sue, tom):              # Use object's name attr as key
    db[object.name] = object                # Store object on shelve by key
db.close()                                  # Close after making changes
Notice how we assign objects to the shelve using their own
names as keys. This is just for convenience; in a shelve, the
key can be any string, including one we might
create to be unique using tools such as process IDs and timestamps
(available in the os
and time
standard library modules). The only
rule is that the keys must be strings and should be unique, since we
can store just one object per key (though that object can be a list
or dictionary containing many objects). The
values we store under keys, though, can be
Python objects of almost any sort: built-in types like strings,
lists, and dictionaries, as well as user-defined class instances,
and nested combinations of all of these.
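For instance, one hypothetical key scheme (make_key is our own invention here for illustration, not a standard tool) might combine the process ID, a timestamp, and a serial counter to guarantee unique string keys:

```python
import itertools
import os
import time

_serial = itertools.count()         # per-process counter breaks timestamp ties

def make_key(prefix='rec'):
    # hypothetical helper: pid + timestamp + serial number, as a string key
    return '%s-%d-%d-%d' % (prefix, os.getpid(), int(time.time()), next(_serial))

k1 = make_key()
k2 = make_key()
print(k1 != k2)                     # every call yields a distinct string
```

Any scheme works, as long as it produces unique strings; the shelve itself doesn't care what the keys mean.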
That’s all there is to it—if this script has no output when run, it means it probably worked; we’re not printing anything, just creating and storing objects:
C:\misc> makedb.py
At this point, there are one or more real files in the current
directory whose names all start with “persondb”. The actual files
created can vary per platform, and just like in the built-in
open
function, the filename in
shelve.open()
is relative to the
current working directory unless it includes a directory path.
Wherever they are stored, these files implement a keyed-access file
that contains the pickled representation of our three Python
objects. Don’t delete these files—they are your database, and are
what you’ll need to copy or transfer when you back up or move your
storage.
You can look at the shelve’s files if you want to, either from
Windows Explorer or the Python shell, but they are binary hash
files, and most of their content makes little sense outside the
context of the shelve
module.
With Python 3.0 and no extra software installed, our database is
stored in three files (in 2.6, it’s just one file, persondb, because the bsddb
extension module is preinstalled with
Python for shelves; in 3.0, bsddb
is a third-party open source add-on):
# Directory listing module: verify files are present
>>> import glob
>>> glob.glob('person*')
['person.py', 'person.pyc', 'persondb.bak', 'persondb.dat', 'persondb.dir']

# Type the file: text mode for string, binary mode for bytes
>>> print(open('persondb.dir').read())
'Tom Jones', (1024, 91)
...more omitted...

>>> print(open('persondb.dat', 'rb').read())
b'\x80\x03cperson\nPerson\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00payq\x03K...
...more omitted...
This content isn’t impossible to decipher, but it can vary on different platforms and doesn’t exactly qualify as a user-friendly database interface! To verify our work better, we can write another script, or poke around our shelve at the interactive prompt. Because shelves are Python objects containing Python objects, we can process them with normal Python syntax and development modes. Here, the interactive prompt effectively becomes a database client:
>>> import shelve
>>> db = shelve.open('persondb')         # Reopen the shelve

>>> len(db)                              # Three 'records' stored
3
>>> list(db.keys())                      # keys is the index (list to make a list in 3.0)
['Tom Jones', 'Sue Jones', 'Bob Smith']

>>> bob = db['Bob Smith']                # Fetch bob by key
>>> print(bob)                           # Runs __str__ from AttrDisplay
[Person: job=None, name=Bob Smith, pay=0]

>>> bob.lastName()                       # Runs lastName from Person
'Smith'

>>> for key in db:                       # Iterate, fetch, print
        print(key, '=>', db[key])

Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]

>>> for key in sorted(db):               # Iterate by sorted keys
        print(key, '=>', db[key])

Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Notice that we don’t have to import our Person
or Manager
classes here in order to load or
use our stored objects. For example, we can call bob
’s lastName
method freely, and get his custom
print display format automatically, even though we don’t have his
Person
class in our scope here.
This works because when Python pickles a class instance, it records
its self
instance attributes,
along with the name of the class it was created from and the module
where the class lives. When bob
is later fetched from the shelve and unpickled, Python will
automatically reimport the class and link bob
to it.
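We can verify this scheme directly: a pickle string embeds the instance's attributes plus the class's module and name, not the class's code. A Fraction from the standard library serves for this sketch, since its class lives in an importable module, just as our Person does:

```python
import pickle
from fractions import Fraction

f = Fraction(1, 3)
data = pickle.dumps(f)               # pickle the instance to a byte string

print(b'fractions' in data)          # the class's module name is recorded
print(b'Fraction' in data)           # and so is the class's name

copy = pickle.loads(data)            # load reimports the module, relinks the class
print(copy == f, copy.__class__ is Fraction)
```

Because only the module and class names are stored, the class's module file must still be importable whenever the instance is loaded, which brings us to the trade-offs below.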
The upshot of this scheme is that class instances automatically acquire all their class behavior when they are loaded in the future. We have to import our classes only to make new instances, not to process existing ones. Although a deliberate feature, this scheme has somewhat mixed consequences:
The downside is that classes and
their module’s files must be importable when an instance is
later loaded. More formally, pickleable classes must be coded at
the top level of a module file accessible from a directory
listed on the sys.path
module
search path (and shouldn’t live in the most script files’ module
__main__
unless they’re
always in that module when used). Because of this external
module file requirement, some applications choose to pickle
simpler objects such as dictionaries or lists, especially if
they are to be transferred across the Internet.
The upside is that changes in a class’s source code file are automatically picked up when instances of the class are loaded again; there is often no need to update stored objects themselves, since updating their class’s code changes their behavior.
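The external-module requirement in the first point is easy to demonstrate: an instance of a class that isn't coded at the top level of a module (here, one defined inside a function) can't be pickled at all. A quick sketch:

```python
import pickle

def make_instance():
    class Temp:                     # defined inside a function: not top level
        pass
    return Temp()

obj = make_instance()
try:
    pickle.dumps(obj)
    pickled_ok = True
except Exception:                   # pickle can't record an importable class path
    pickled_ok = False
print(pickled_ok)
```

The dump fails because there is no module-level name through which an unpickler could later reimport Temp.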
Shelves also have well-known limitations (the database suggestions at the end of this chapter mention a few of these). For simple object storage, though, shelves and pickles are remarkably easy-to-use tools.
Now for one last script: let’s write a program that
updates an instance (record) each time it runs, to prove the point
that our objects really are persistent (i.e.,
that their current values are available every time a Python program
runs). The following file, updatedb.py, prints the database and
gives a raise to one of our stored objects each time. If you trace
through what’s going on here, you’ll notice that we’re getting a lot
of utility “for free”—printing our objects automatically employs the
general __str__
overloading
method, and we give raises by calling the giveRaise
method we wrote earlier. This
all “just works” for objects based on OOP’s inheritance model, even
when they live in a file:
# File updatedb.py: update Person object on database
import shelve
db = shelve.open('persondb')            # Reopen shelve with same filename

for key in sorted(db):                  # Iterate to display database objects
    print(key, '=>', db[key])           # Prints with custom format

sue = db['Sue Jones']                   # Index by key to fetch
sue.giveRaise(.10)                      # Update in memory using class method
db['Sue Jones'] = sue                   # Assign to key to update in shelve
db.close()                              # Close after making changes
Because this script prints the database when it starts up, we
have to run it a few times to see our objects change. Here it is in
action, displaying all records and increasing sue
’s pay each time it’s run (it’s a
pretty good script for sue
...):
c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]

c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=110000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]

c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=121000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]

c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=133100]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Again, what we see here is a product of the shelve
and pickle
tools we get from Python, and of
the behavior we coded in our classes ourselves. And once again, we
can verify our script’s work at the interactive prompt (the shelve’s
equivalent of a database client):
c:\misc> python
>>> import shelve
>>> db = shelve.open('persondb')         # Reopen database
>>> rec = db['Sue Jones']                # Fetch object by key
>>> print(rec)
[Person: job=dev, name=Sue Jones, pay=146410]
>>> rec.lastName()
'Jones'
>>> rec.pay
146410
For another example of object persistence in this book, see
the sidebar in Chapter 30 titled
Why You Will Care: Classes and Persistence. It
stores a somewhat larger composite object in a flat file with
pickle
instead of shelve
, but the effect is similar. For
more details on both pickles and shelves, see other books or
Python’s manuals.
And that’s a wrap for this tutorial. At this point, you’ve seen all the basics of Python’s OOP machinery in action, and you’ve learned ways to avoid redundancy and its associated maintenance issues in your code. You’ve built full-featured classes that do real work. As an added bonus, you’ve made them real database records by storing them in a Python shelve, so their information lives on persistently.
There is much more we could explore here, of course. For example, we could extend our classes to make them more realistic, add new kinds of behavior to them, and so on. Giving a raise, for instance, should in practice verify that pay increase rates are between zero and one—an extension we’ll add when we meet decorators later in this book. You might also mutate this example into a personal contacts database, by changing the state information stored on objects, as well as the class methods used to process it. We’ll leave this a suggested exercise open to your imagination.
We could also expand our scope to use tools that either come with Python or are freely available in the open source world:
As is, we can only process our database with the
interactive prompt’s command-based interface, and scripts. We
could also work on expanding our object database’s usability by
adding a graphical user interface for browsing and updating its
records. GUIs can be built portably with either Python’s
tkinter
(Tkinter
in 2.6) standard library
support, or third-party toolkits such as WxPython and PyQt.
tkinter
ships with Python,
lets you build simple GUIs quickly, and is ideal for learning
GUI programming techniques; WxPython and PyQt tend to be more
complex to use but often produce higher-grade GUIs in the
end.
Although GUIs are convenient and fast, the Web is hard to beat in terms of accessibility. We might also implement a website for browsing and updating records, instead of or in addition to GUIs and the interactive prompt. Websites can be constructed with either basic CGI scripting tools that come with Python, or full-featured third-party web frameworks such as Django, TurboGears, Pylons, web2Py, Zope, or Google's App Engine. On the Web, your data can still be stored in a shelve, pickle file, or other Python-based medium; the scripts that process it are simply run automatically on a server in response to requests from web browsers and other clients, and they produce HTML to interact with a user, either directly or by interfacing with framework APIs.
Although web clients can often parse information in the replies from websites (a technique colorfully known as “screen scraping”), we might go further and provide a more direct way to fetch records on the Web via a web services interface such as SOAP or XML-RPC calls—APIs supported by either Python itself or the third-party open source domain. Such APIs return data in a more direct form, rather than embedded in the HTML of a reply page.
If our database becomes higher-volume or critical, we
might eventually move it from shelves to a more full-featured
storage mechanism such as the open source ZODB object-oriented database system (OODB), or a
more traditional SQL-based relational database system such as
MySQL, Oracle, PostgreSQL, or SQLite. Python itself comes with
the in-process SQLite database system built-in, but other open
source options are freely available on the Web. ZODB, for
example, is similar to Python’s shelve
but addresses many of its
limitations, supporting larger databases, concurrent updates,
transaction processing, and automatic write-through on in-memory
changes. SQL-based systems like MySQL offer enterprise-level
tools for database storage and may be used directly from
within a Python script.
If we do migrate to a relational database system for
storage, we don’t have to sacrifice Python’s OOP tools.
Object-relational mappers (ORMs) like SQLObject
and SQLAlchemy can automatically map relational tables and rows
to and from Python classes and instances, such that we can
process the stored data using normal Python class syntax. This
approach provides an alternative to OODBs like shelve
and ZODB and leverages the
power of both relational databases and Python’s class
model.
While I hope this introduction whets your appetite for future exploration, all of these topics are of course far beyond the scope of this tutorial and this book at large. If you want to explore any of them on your own, see the Web, Python’s standard library manuals, and application-focused books such as Programming Python. In the latter I pick up this example where we’ve stopped here, showing how to add both a GUI and a website on top of the database to allow for browsing and updating instance records. I hope to see you there eventually, but first, let’s return to class fundamentals and finish up the rest of the core Python language story.
In this chapter, we explored all the fundamentals of Python classes and OOP in action, by building upon a simple but real example, step by step. We added constructors, methods, operator overloading, customization with subclasses, and introspection tools, and we met other concepts (such as composition, delegation, and polymorphism) along the way.
In the end, we took objects created by our classes and made them persistent by storing them on a shelve object database—an easy-to-use system for saving and retrieving native Python objects by key. While exploring class basics, we also encountered multiple ways to factor our code to reduce redundancy and minimize future maintenance costs. Finally, we briefly previewed ways to extend our code with application-programming tools such as GUIs and databases, covered in follow-up books.
In the next chapters of this part of the book we’ll return to our study of the details behind Python’s class model and investigate its application to some of the design concepts used to combine classes in larger programs. Before we move ahead, though, let’s work through this chapter’s quiz to review what we covered here. Since we’ve already done a lot of hands-on work in this chapter, we’ll close with a set of mostly theory-oriented questions designed to make you trace through some of the code and ponder some of the bigger ideas behind it.
When we fetch a Manager
object from the shelve and print it, where does the display format
logic come from?
When we fetch a Person
object from a shelve without importing its module, how does the
object know that it has a giveRaise
method that we can
call?
Why is it so important to move processing into methods, instead of hardcoding it outside the class?
Why is it better to customize by subclassing rather than copying the original and modifying?
Why is it better to call back to a superclass method to run default actions, instead of copying and modifying its code in a subclass?
Why is it better to use tools like __dict__
that allow objects to be
processed generically than
to write more custom code for each type of class?
In general terms, when might you choose to use object embedding and composition instead of inheritance?
How might you modify the classes in this chapter to implement a personal contacts database in Python?
In the final version of our classes, Manager
ultimately inherits its __str__
printing method from AttrDisplay
in the separate classtools
module. Manager
doesn’t have one itself, so the
inheritance search climbs to its Person
superclass; because there is no
__str__
there either, the
search climbs higher and finds it in AttrDisplay
. The class names listed in
parentheses in a class
statement’s header line provide the
links to higher superclasses.
Shelves (really, the pickle
module they use) automatically
relink an instance to the class it was created from when that
instance is later loaded back into memory. Python reimports the
class from its module internally, creates an instance with its
stored attributes, and sets the instance’s __class__
link to point to its original
class. This way, loaded instances automatically obtain all their
original methods (like lastName
, giveRaise
, and __str__
), even if we have not imported
the instance’s class into our scope.
It’s important to move processing into methods so that there is only one copy to change in the future, and so that the methods can be run on any instance. This is Python’s notion of encapsulation—wrapping up logic behind interfaces, to better support future code maintenance. If you don’t do so, you create code redundancy that can multiply your work effort as the code evolves in the future.
Customizing with subclasses reduces development effort. In OOP, we code by customizing what has already been done, rather than copying or changing existing code. This is the real “big idea” in OOP—because we can easily extend our prior work by coding new subclasses, we can leverage what we’ve already done. This is much better than either starting from scratch each time, or introducing multiple redundant copies of code that may all have to be updated in the future.
Copying and modifying code doubles your potential work effort in the future, regardless of the context. If a subclass needs to perform default actions coded in a superclass method, it’s much better to call back to the original through the superclass’s name than to copy its code. This also holds true for superclass constructors. Again, copying code creates redundancy, which is a major issue as code evolves.
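As a condensed sketch of the chapter's technique (simplified here: only the pay logic is shown), Manager augments the raise and then calls back to Person's version through the class name, rather than copying its code:

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)   # call back, don't copy

tom = Manager('Tom Jones', 'mgr', 50000)
tom.giveRaise(.10)                  # 10% raise plus 10% bonus
print(tom.pay)
```

If the raise computation ever changes, only Person's method needs editing; Manager picks up the fix automatically.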
Generic tools can avoid hardcoded solutions that must be
kept in sync with the rest of the class as it evolves over time. A
generic __str__
print method,
for example, need not be updated each time a new attribute is
added to instances in an __init__
constructor. In addition, a
generic print
method inherited
by all classes only appears, and need only be modified, in one
place—changes in the generic version are picked up by all classes
that inherit from the generic class. Again, eliminating code
redundancy cuts future development effort;
that’s one of the primary assets classes bring to the
table.
Inheritance is best at coding extensions based on direct
customization (like our Manager
specialization of Person
).
Composition is well suited to scenarios where multiple objects are
aggregated into a whole and directed by a controller layer class.
Inheritance passes calls up to reuse, and composition passes down
to delegate. Inheritance and composition are not mutually
exclusive; often, the objects embedded in a controller are
themselves customizations based upon inheritance.
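A minimal composition sketch may help here (Department is our own illustration, not a class from the chapter): the controller class embeds Person objects and delegates work down to them, instead of inheriting from them:

```python
class Person:
    def __init__(self, name):
        self.name = name
    def lastName(self):
        return self.name.split()[-1]

class Department:                   # controller: embeds and directs Persons
    def __init__(self, *members):
        self.members = list(members)
    def addMember(self, person):
        self.members.append(person)
    def allLastNames(self):
        return [person.lastName() for person in self.members]   # delegate down

dept = Department(Person('Bob Smith'), Person('Sue Jones'))
dept.addMember(Person('Tom Jones'))
print(dept.allLastNames())
```

Here Department is not a kind of Person, so inheritance would be the wrong tool; it has Persons, and composition models that directly.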
The classes in this chapter could be used as boilerplate
“template” code to implement
a variety of types of databases. Essentially, you can repurpose
them by modifying the constructors to record different attributes
and providing whatever methods are appropriate for the target
application. For instance, you might use attributes such as name
, address
, birthday
, phone
, email
, and so on for a contacts
database, and methods appropriate for this purpose. A method named
sendmail
, for example, might
use Python’s standard library smtplib
module to send an email to one
of the contacts automatically when called (see Python’s manuals or
application-level books for more details on such tools). The
AttrDisplay
tool we wrote here
could be used verbatim to print your objects, because it is
intentionally generic. Most of the shelve database code here can
be used to store your objects, too, with minor changes.
[62] Yes, we use “shelve” as a noun in Python, much to the chagrin of a variety of editors I’ve worked with over the years, both electronic and human.