No ties bind so strongly as the links of inheritance.
This chapter is essentially a motley collection of ideas, techniques, and opinions related to Perl objects. I have not attempted to weave these threads too closely. The topics are as follows:
Search for an alternative way of representing object attributes, instead of hash tables. The two strategies examined in this chapter occupy less space and are faster.
How to use AUTOLOAD to automatically forward method calls.
What I find objectionable about inheritance, along with alternative ways of structuring classes.
Hash tables have traditionally been used for storing object attributes. There are good reasons for doing this:
Each attribute is self-describing (that is, the name and type of each attribute are easily obtained from the object), which makes it easy to write readable code. It also helps modules that do automatic object persistence or visualization of objects, without the object’s explicit cooperation.
Each class in an inheritance hierarchy can add attributes freely and independently.
In fact, each instance (not just the class) can possess a unique set of attributes and can change this set at run time. The artificial intelligence community often uses this slot- or frame-based approach because it adapts itself very well to new pieces of information.
Of course, not every problem requires this degree of generality. In addition, while Perl’s hash tables are fast (within 15% of the speed of arrays) and reasonably compact (key strings are not duplicated), they are not exactly inexpensive. Creating 100 objects means that you have 100 hash tables, each of which tends to optimistically allocate extra space to accommodate future insertions.
This section illustrates two alternate approaches, one using arrays and another using typeglobs. Both approaches are less general than the hash table approach but are faster and leaner. The first is a module called ObjectTemplate developed for this book.[31] The other uses typeglobs and has seen limited application in some standard CPAN modules, most notably IO and Net. I hesitate to suggest this as an alternative approach because it is way too “hackish,” but I present it here to enable you to understand these standard modules.
The module presented in this section uses arrays to store attributes (but not the array per object approach). Let us briefly see its usage before moving on to the implementation.
To implement the Employee class, with the attributes “name,” “age,” and “position,” you simply inherit from ObjectTemplate and supply a list of attribute names to a static method called attributes (exported by ObjectTemplate), as follows:
package Employee;
use ObjectTemplate;               # Import ObjectTemplate
@ISA = qw(ObjectTemplate);        # Inherit from it.
attributes qw(name age position); # Declare your attributes
That’s all. A user of this module can now create Employee objects using a dynamically generated method called new and retrieve and modify attributes using accessor methods (also created automagically):
use Employee;
$obj = Employee->new(
    "name" => "Norma Jean",
    "age"  => 25
);                                # new() created by ObjectTemplate
$obj->position("Actress");
print $obj->name, ":", $obj->age, "\n";
Note that Perl permits you to omit the trailing parentheses for any method call in which there is no ambiguity about its usage. Any word following an arrow is automatically treated as a method, as in the preceding case.
ObjectTemplate provides the following features for an inherited class:
An allocator function called new. This allocates an object blessed into the inherited class. new calls initialize, which in turn can be overridden in the inherited class, as explained earlier.
Accessor methods with the same name as the attributes. These methods are created in the inherited module, and everyone, including the object’s own methods, gains access to the attributes only through these methods. This is because ObjectTemplate is the only module that knows how the attributes are stored. For example,
package Employee;
sub promote {
    my $emp = shift;                           # $emp is the object
    my $current_position = $emp->position();   # Get attribute
    my $next_position = lookup_next_position($current_position);
    $emp->position($next_position);            # Set attribute
}
The user package can create its own custom accessor methods with the same naming convention as above; in this case, ObjectTemplate does not generate one automatically. If a custom accessor method wants access to the attribute managed by ObjectTemplate, it can use the get_attribute and set_attribute methods.
new() takes an initializer list, a sequence of attribute name-value pairs.
ObjectTemplate takes attribute inheritance (@ISA) into account, for both the memory layout and the accessors. Consider
package Employee;
use ObjectTemplate;
@ISA = qw(ObjectTemplate);
attributes qw(name age);

package HourlyEmployee;
@ISA = qw(Employee);
attributes qw(hourly_wage);
In this example, an object of the HourlyEmployee class contains two inherited attributes, name and age, that all employees possess, and hourly_wage, that only hourly employees possess.
All attributes are scalar-valued, so a multivalued attribute such as friends has to be stored as a reference:
attributes qw(friends);
$obj->friends(['Joe']);   # pass an array reference to the accessor
This is of course true of the hash table representation also.
Figure 8.1 shows how ObjectTemplate organizes object attributes. The data structure is quite simple. Instead of allocating one array or hash per object, ObjectTemplate creates only as many arrays as there are attributes (the columns shown in the figure). Each object is merely a “horizontal slice” across these attribute columns. When new() is called, it allocates a new logical row and inserts each value of the initializer list in the corresponding attribute column at the new row offset. The “object,” therefore, is merely a blessed scalar containing that row index. This scheme is more space-efficient than the hash approach, because it creates so few container arrays (only as many as there are attributes), and it is faster because array accesses are always a little faster than hash accesses.
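The column layout can be tried out independently of ObjectTemplate. The following is only a minimal sketch of the idea, not the module's code: the package name ColumnDemo, the %column hash that holds the columns, and the hand-written accessors are all assumptions made for the illustration (ObjectTemplate itself keeps each column in a package array and generates its accessors).

```perl
use strict;
use warnings;

package ColumnDemo;

# One array per attribute (the "columns"); all objects share them.
my %column = (name => [], age => []);
my $next_row = 0;                       # next unused row index

sub new {
    my ($class, %init) = @_;
    my $row = $next_row++;              # allocate a new logical row
    $column{$_}[$row] = $init{$_} for keys %init;
    return bless \$row, $class;         # the object is a blessed row index
}

# Each accessor reads or writes one cell: column = attribute, row = $$self
sub name { my $self = shift; $column{name}[$$self] = shift if @_; $column{name}[$$self] }
sub age  { my $self = shift; $column{age}[$$self]  = shift if @_; $column{age}[$$self]  }

package main;

my $norma = ColumnDemo->new(name => 'Norma Jean', age => 25);
my $joe   = ColumnDemo->new(name => 'Joe',        age => 40);
print $$norma, ": ", $norma->name, "\n";   # row index, then the name
print $$joe,   ": ", $joe->name,   "\n";
```

Dereferencing the object ($$norma) yields nothing but the row number; all the data lives in the shared columns.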
There’s a slight hitch when an object is deleted. Although the
corresponding row is logically free, we can’t really move up
the rest of the rows below, because the other object references
(which are indices) and their data will get out of sync.
ObjectTemplate therefore reuses deallocated (free) rows by maintaining a per-package “free list” called @_free. This is a linked list of all free rows, with a scalar $_free pointing to the head of this list. Each element of this list contains the row index of the next free row. When an object is deleted, $_free is made to point to that row, and the corresponding entry in the free list records the row that $_free pointed to previously.
Since the freed and active rows do not overlap, we take the liberty of using one of the attribute columns (the first one) to hold @_free. This is done using typeglob aliasing.
Figure 8.2 shows a snapshot of this structure.
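The free-list bookkeeping can be sketched on its own. This is an illustration under simplifying assumptions rather than ObjectTemplate's code: the function names allocate_row and release_row are hypothetical, and the list lives in a separate array instead of being aliased to the first attribute column.

```perl
use strict;
use warnings;

my @free;        # $free[$row] holds the index of the next free row
my $head = 0;    # head of the free list, or the next never-used row

sub allocate_row {
    my $row;
    if (defined $free[$head]) {      # head is a previously released row
        $row  = $head;
        $head = $free[$row];         # unlink it from the list
        undef $free[$row];           # leave no stale link behind
    } else {                         # nothing to reuse: take a fresh row
        $row = $head++;
    }
    return $row;
}

sub release_row {
    my ($row) = @_;
    $free[$row] = $head;             # released row points at the old head
    $head = $row;                    # and becomes the new head
}

my @rows = map { allocate_row() } 1 .. 3;     # three fresh rows
release_row(1);
release_row(0);
my @reused = map { allocate_row() } 1 .. 3;   # two reused rows, one fresh
print "@rows / @reused\n";                    # prints "0 1 2 / 0 1 3"
```

Released rows are handed out again in last-in, first-out order, and a fresh row is used only once the list is exhausted.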
You might have noticed that I’m using the same identifier name, _free, for two variables, $_free and @_free. Although I frown on this idea in general, I have used it here for two reasons. First, both are required for the same task; second, one typeglob alias gives us access to both variables in one shot. This is important for performance, as we shall see soon.
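A small experiment shows what a single typeglob alias buys. The variable names here mirror the ones in the text, but the script is only a sketch written for illustration:

```perl
use strict;
use warnings;

our (@_name, $_name);          # an attribute column and its scalar sibling
our (@_free, $_free);

*_free = *_name;               # one glob assignment aliases every slot

$_free = 7;                    # really writes $_name
push @_free, 'x';              # really pushes onto @_name
print "$_name @_name\n";       # prints "7 x"
```

After the assignment, both the scalar and the array named _free are simply other names for the corresponding _name variables.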
ObjectTemplate uses objects, typeglob aliasing, symbolic references, and eval liberally, so if you understand the code below, you can consider yourself a Perl hacker! One way to pore through this code is to read the descriptions supplied in this section while using the debugger to step through a small example that uses this module. Of course, you don’t have to understand the code to use it.
package ObjectTemplate;
require Exporter;
@ObjectTemplate::ISA = qw(Exporter);
@ObjectTemplate::EXPORT = qw(attributes);
my $debugging = 0; # assign 1 to it to see code generated on the fly
# Create accessor methods, and new()
sub attributes {
    my ($pkg) = caller;
    @{"${pkg}::_ATTRIBUTES_"} = @_;
    my $code = "";
    foreach my $attr (get_attribute_names($pkg)) {
        # If a field name is "color", create a global array in the
        # calling package called @_color
        @{"${pkg}::_$attr"} = ();
        # Define accessor only if it is not already present
        unless ($pkg->can("$attr")) {
            $code .= _define_accessor ($pkg, $attr);
        }
    }
    $code .= _define_constructor($pkg);
    print $code if $debugging;     # show the generated code
    eval $code;
    if ($@) {
        die "ERROR defining constructor and attributes for '$pkg':"
            . "\n$@\n"
            . "-----------------------------------------------------"
            . $code;
    }
}
attributes uses symbolic references to create a global array called @_ATTRIBUTES_ that remembers the attribute names. This array is then used by get_attribute_names to access all attributes defined in the current package and all its superclasses. For each such attribute, attributes creates a global array in the current package, as we saw in Figure 8.1. If an accessor has not been defined for that attribute, it calls _define_accessor to generate the method dynamically. Finally, it calls _define_constructor to create the subroutine new directly in the calling package.
sub _define_accessor {
    my ($pkg, $attr) = @_;
    # This code creates an accessor method for a given
    # attribute name. This method returns the attribute value
    # if given no args, and modifies it if given one arg.
    # Either way, it returns the latest value of that attribute

    # qq makes this block behave like a double-quoted string, so
    # every sigil that belongs to the generated code is escaped
    my $code = qq{
        package $pkg;
        sub $attr {                                    # Accessor ...
            \@_ > 1 ? \$_${attr}\[\${\$_[0]}] = \$_[1]   # set
                    : \$_${attr}\[\${\$_[0]}];           # get
        }
        if (!defined \$_free) {
            # Alias the first attribute column to _free
            *_free = *_$attr;
            \$_free = 0;
        };
    };
    $code;
}
_define_accessor is called for every field name given to attributes and for every attribute found in the module’s superclasses. For an attribute called age in the Employee module, for example, it generates the following code:
package Employee;
sub age {                                   # Accessor
    @_ > 1 ? $_age[${$_[0]}] = $_[1]        # set
           : $_age[${$_[0]}];               # get
}
if (!defined $_free) {
    *_free = *_age;    # Alias the first attribute column to _free
    $_free = 0;
};
$_[0] contains the object, and $_[1] contains the attribute value. Therefore ${$_[0]} contains the row index, and $_age[${$_[0]}] contains the value of the age attribute of that object. (The braces matter: $$_[0] would mean element 0 of the array referenced by $_.) In addition, _define_accessor aliases _free to _age if the alias doesn’t already exist.
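When code is assembled in a double-quoted (qq) string and handed to eval, every sigil that belongs to the generated code must be backslash-escaped, while variables that should be expanded at generation time are left bare. The following self-contained sketch shows the idea for a single accessor; the package name Demo is hypothetical, and the extra "our" declaration is there only so that the generated code compiles under strict.

```perl
use strict;
use warnings;

my ($pkg, $attr) = ('Demo', 'age');

# $pkg and $attr are interpolated now; escaped sigils survive into the
# generated source text.
my $code = qq{
    package $pkg;
    our \@_$attr;
    sub $attr {
        \@_ > 1 ? \$_${attr}\[\${\$_[0]}] = \$_[1]
                : \$_${attr}\[\${\$_[0]}];
    }
    1;
};
eval $code or die "generation failed: $@";

my $row = 0;
my $obj = bless \$row, $pkg;     # object = blessed row index
$obj->age(25);                   # set
print $obj->age, "\n";           # get: prints 25
```

Printing $code before the eval is a convenient way to check that the escapes came out as intended.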
sub _define_constructor {
    my $pkg = shift;
    # Sigils belonging to the generated code are escaped
    my $code = qq{
        package $pkg;
        sub new {
            my \$class = shift;
            my \$inst_id;
            if (defined(\$_free[\$_free])) {
                \$inst_id = \$_free;
                \$_free = \$_free[\$_free];
                undef \$_free[\$inst_id];
            } else {
                \$inst_id = \$_free++;
            }
            my \$obj = bless \\\$inst_id, \$class;
            \$obj->set_attributes(\@_) if \@_;
            \$obj->initialize;
            \$obj;
        }
    };
    $code;
}
_define_constructor generates code for a constructor called new to be installed in the calling package. new checks the free list, and if it contains rows to spare, it uses the row number from the top of that list. It then undefs the head of the list, because the free list is aliased to the first attribute column, and we don’t want that attribute’s accessor picking up garbage values. If the free list does not contain any spare rows, the object is assigned the next logical index.
sub get_attribute_names {
    my $pkg = shift;
    $pkg = ref($pkg) if ref($pkg);
    my @result = @{"${pkg}::_ATTRIBUTES_"};
    # No defined() test is needed (defined(@array) is an error in
    # recent Perls); an empty @ISA simply yields no iterations
    foreach my $base_pkg (@{"${pkg}::ISA"}) {
        push (@result, get_attribute_names($base_pkg));
    }
    @result;
}
get_attribute_names recurses through the package’s @ISA array to fetch all attribute names. This can be used by anyone requiring object meta-data (such as object persistence modules).
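The recursion is easy to try in isolation. In this sketch the three packages and the function name all_attribute_names are hypothetical; only the @_ATTRIBUTES_ and @ISA conventions are taken from the text.

```perl
use strict;
use warnings;

package Person;        our @_ATTRIBUTES_ = qw(name age);
package Worker;        our @ISA = ('Person'); our @_ATTRIBUTES_ = qw(position);
package HourlyWorker;  our @ISA = ('Worker'); our @_ATTRIBUTES_ = qw(hourly_wage);

package main;

sub all_attribute_names {
    my ($pkg) = @_;
    no strict 'refs';               # package arrays are looked up by name
    my @names = @{"${pkg}::_ATTRIBUTES_"};
    push @names, all_attribute_names($_) for @{"${pkg}::ISA"};
    return @names;
}

# prints "hourly_wage, position, name, age"
print join(", ", all_attribute_names('HourlyWorker')), "\n";
```

A class's own attributes come first, followed by those of each base class in @ISA order.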
# $obj->set_attributes (name => 'John', age => 23);
# Or, $obj->set_attributes (['age', 'name'], [23, "sriram"])
sub set_attributes {
    my $obj = shift;
    my $attr_name;
    if (ref($_[0])) {
        my ($attr_name_list, $attr_value_list) = @_;
        my $i = 0;
        foreach $attr_name (@$attr_name_list) {
            $obj->$attr_name($attr_value_list->[$i++]);
        }
    } else {
        my ($attr_name, $attr_value);
        while (@_) {
            $attr_name = shift;
            $attr_value = shift;
            $obj->$attr_name($attr_value);
        }
    }
}
set_attributes is given a list of attribute name-value pairs and simply calls the accessor method for each attribute. It can also be called with two parameters: a reference to an array of names and a reference to an array of values.
# @attrs = $obj->get_attributes (qw(name age));
sub get_attributes {
    my $obj = shift;
    map $obj->${_}(), @_;
}
get_attributes uses map to iterate through all attribute names, setting $_ to each name in every iteration. The first part of map simply calls the corresponding accessor method using a symbolic reference. Because of some weird precedence issues, you cannot omit the curly braces in ${_}.
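Calling a method whose name is held in a variable can be seen in a smaller setting. The Point class below is hypothetical and uses a plain hash; the sketch copies each name into a named variable first, which sidesteps the question of how $_ is parsed after the arrow.

```perl
use strict;
use warnings;

package Point;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub x   { $_[0]{x} }
sub y   { $_[0]{y} }

package main;

my $p = Point->new(x => 3, y => 4);

# The method to call is chosen at run time from the list of names
my @values = map { my $method = $_; $p->$method() } qw(x y);
print "@values\n";      # prints "3 4"
```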
sub set_attribute {
    my ($obj, $attr_name, $attr_value) = @_;
    my ($pkg) = ref($obj);
    ${"${pkg}::_$attr_name"}[$$obj] = $attr_value;
}

sub get_attribute {
    my ($obj, $attr_name, $attr_value) = @_;
    my ($pkg) = ref($obj);
    return ${"${pkg}::_$attr_name"}[$$obj];
}
The get/set_attribute pair reads or updates a single attribute. Unlike the earlier pair of methods, this pair does not call an accessor; it touches the attribute storage directly. We saw earlier that attributes does not attempt to create accessor methods for those that already exist. But if the custom accessors still want to use the storage scheme provided by ObjectTemplate, they can use the get/set_attribute pair. The expression ${pkg}::_$attr_name names the appropriate attribute column, and $$obj represents the logical row. (Recall that the object is simply a reference to an array index.) These methods are clearly not as fast as the generated accessor methods, because they use symbolic references (which involve variable interpolation in a string and an extra hash lookup).
sub DESTROY {
    # release id back to free list
    my $obj = $_[0];
    my $pkg = ref($obj);
    local *_free = *{"${pkg}::_free"};
    my $inst_id = $$obj;
    # Release all the attributes in that row
    my @attributes = get_attribute_names($pkg);
    foreach my $attr (@attributes) {
        undef ${"${pkg}::_$attr"}[$inst_id];
    }
    $_free[$inst_id] = $_free;
    $_free = $inst_id;
}
DESTROY releases all attribute values corresponding to that object. This is necessary because the object is merely a reference to an array index, which, when freed, won’t touch the reference counts of any of the attributes. A module defining its own DESTROY method must make sure that it always calls ObjectTemplate::DESTROY.
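One way for a subclass to honor this rule is to finish its own DESTROY with a SUPER:: call. The sketch below uses a stand-in base class (Base, a hypothetical name) so that it runs without ObjectTemplate; a log array records the order of the calls.

```perl
use strict;
use warnings;

package Base;                       # stand-in for ObjectTemplate
our @log;
sub new     { my $class = shift; my $id = 0; return bless \$id, $class }
sub DESTROY { push @log, "Base::DESTROY" }

package Derived;
our @ISA = ('Base');
sub DESTROY {
    my $self = shift;
    push @Base::log, "Derived::DESTROY";   # class-specific cleanup first
    $self->SUPER::DESTROY();               # then let the base class clean up
}

package main;

{ my $obj = Derived->new; }         # object is destroyed at the end of the block
print join(" -> ", @Base::log), "\n";   # prints "Derived::DESTROY -> Base::DESTROY"
```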
sub initialize { }; # dummy method, if subclass doesn't define one.
Modules are expected to override this method if they want to do specific initialization, in addition to what the automatically generated new() does.
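The hook pattern itself is independent of the storage scheme. As a sketch only, with a hypothetical hash-based Base class standing in for ObjectTemplate:

```perl
use strict;
use warnings;

package Base;                       # stand-in for ObjectTemplate
sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    $self->initialize;              # hook: looked up in the object's class first
    return $self;
}
sub initialize { }                  # default does nothing

package Account;
our @ISA = ('Base');
sub initialize { $_[0]{balance} = 0 }   # class-specific setup

package main;

my $acct = Account->new;
print "balance: $acct->{balance}\n";    # prints "balance: 0"
```

Because new calls initialize as a method, the subclass version is found without the constructor knowing anything about the subclass.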
There are (at least) two areas that could use considerable improvement. One is that get_attributes and set_attributes are slow because they always call accessor methods, even when they could tell which accessors were generated automatically and bypass them. Because set_attributes is called by the automatically generated new, it slows down object construction dramatically. (Using this new without arguments is twice as fast as allocating an anonymous hash, but after invoking set_attributes, it is around three times slower.)
Second, custom accessor methods suffer in speed because they are forced to invoke the other slow pair, get_attribute and set_attribute. Possibly a better alternative is to dynamically generate accessor methods prefixed with an “_”, so that the developer can write normal accessor methods (without the prefix), and also call these private methods.
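The suggestion might look something like the following. This is a sketch of the idea, not a change to ObjectTemplate: the Staff package is hypothetical, the private accessors are installed as closures rather than generated with eval, and the validation rule is invented for the example.

```perl
use strict;
use warnings;

package Staff;
my %column;                                  # attribute name => column array

for my $attr (qw(name position)) {
    my $col = $column{$attr} = [];
    no strict 'refs';
    *{"Staff::_$attr"} = sub {               # generated private accessor
        my $self = shift;
        $col->[$$self] = shift if @_;
        return $col->[$$self];
    };
}

my $rows = 0;
sub new { my $class = shift; my $row = $rows++; return bless \$row, $class }

sub position {                               # hand-written public accessor
    my $self = shift;
    return $self->_position unless @_;
    my $new = shift;
    die "position must not be empty\n" unless length $new;
    return $self->_position($new);           # storage is left to the private one
}

package main;

my $s = Staff->new;
$s->position('Clerk');
print $s->position, "\n";                    # prints "Clerk"
```

The public method adds its own checks and then delegates to the fast private accessor instead of going through symbolic references.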
You might also want to check out the MethodMaker module available on CPAN, and the Class::Template module that is bundled with the standard distribution. These modules also create accessor methods automatically but assume that the object representation is a hash table. If you like the interface these modules provide, you can attempt to merge their interface with the attribute storage scheme of ObjectTemplate.
The typeglob approach, as we mentioned earlier, is not exactly a paragon of readability and is presented here only because it is used in some freely available libraries on CPAN, like the IO and Net distributions. If you don’t wish to understand how these modules work, you can easily skip this section without loss of continuity.
We learned in Chapter 3 that a typeglob contains pointers to different types of values. If we somehow make a typeglob into an object reference, we can treat these value pointers as attributes and access them very quickly. Consider the following foo typeglob:
${*foo} = "Oh, my!!" ;     # Use the scalar part to store a string
@{*foo} = (10, 20);        # Use the array part to store an array
open (foo, "foo.txt");     # Use it as a filehandle
We are able to hang different types of values (at most one of each type) from just one identifier, foo. If we want many such objects, we can use the Symbol module in the Perl library to create references to dynamically created typeglobs:
use Symbol;
$obj = Symbol::gensym();   # ref to typeglob
$obj contains a reference to a typeglob. The different parts of a typeglob can be individually accessed (by replacing foo with $obj):
${*$obj} = "Oh, my!!" ;    # Use the scalar part to store a string
@{*$obj} = (10, 20);       # Use the array part to store an array
open ($obj, "foo");        # Use it as a filehandle
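The slots of a generated typeglob can be exercised in a self-contained script, leaving out the filehandle so that no file is needed. The attribute name "extra" is made up for this sketch; it shows the hash slot being used for an additional attribute.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

my $obj = gensym();                 # reference to a fresh anonymous typeglob
${*$obj} = "Oh, my!!";              # scalar slot
@{*$obj} = (10, 20);                # array slot
%{*$obj} = (extra => 'more data');  # hash slot, for any further attributes

my $other = gensym();               # a second glob has its own, empty slots

# prints "Oh, my!! | 10,20 | more data | 0"
print join(" | ", ${*$obj}, join(",", @{*$obj}),
                  ${*$obj}{extra}, scalar @{*$other}), "\n";
```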
Clearly, this is a hideous approach for most general objects; if you need another scalar-valued attribute, for example, you have no option but to put it in the hash part of this typeglob. The reason why the IO group of modules uses this hack is that an instance of any of these modules can be treated as a filehandle and passed directly (without dereferencing) to the built-in I/O functions such as read and write. For example:
$sock = new IO::Socket( ... various parameters ...) ;
print $sock "Hello, are you there";
$message = <$sock>;
We’ll use the IO::Socket module extensively in the chapters on networking with sockets.[32]
Let us build a small module called File to examine this technique in greater detail. This module allows you to open a file and read the next line; in addition, it allows you to put back a line so that the next attempt to read the file returns that line:
package main;
$obj = File->open("File.pm");
print $obj->next_line();
$obj->put_back("------------------------\n");
print $obj->next_line(); # Should print the string put back above
print $obj->next_line();
Since this code opens the File module itself, it should print the following:
package File;
------------------------
use Symbol;
This module uses the scalar part of the typeglob object as a “putback” buffer, the array part of the typeglob to store all the lines read from the file, and the filehandle part of the typeglob to store the filehandle. The implementation of the File module is shown in Example 8.1.
Example 8.1. File Module, Built Using a Typeglob Representation
package File;
use Symbol;

sub open {
    my ($pkg, $filename) = @_;
    my $obj = gensym();                       # Allocate a typeglob
    open ($obj, $filename) || return undef;   # Use it as a filehandle
    bless $obj, $pkg;                         # Upgrade to a File "object"
}

sub put_back {
    my ($r_obj, $line) = @_;
    ${*$r_obj} = $line;            # The scalar part holds the current line
}

sub next_line {
    my $r_obj = $_[0];
    my $retval;
    if (${*$r_obj}) {              # Check putback buffer
        $retval = ${*$r_obj};      # yep, it's got stuff
        ${*$r_obj} = "";           # empty it.
    } else {
        $retval = <$r_obj>;        # no. read from file
        push(@{*$r_obj}, $retval); # add to history list.
    }
    $retval;
}
1;