In this chapter, we get to start having fun, because we get to start talking about software design. If we're going to talk about good software design, we have to talk about Laziness, Impatience, and Hubris, the basis of good software design.
We've all fallen into the trap of using cut-and-paste when we should have defined a higher-level abstraction, if only just a loop or subroutine.[1] To be sure, some folks have gone to the opposite extreme of defining ever-growing mounds of higher-level abstractions when they should have used cut-and-paste.[2] Generally, though, most of us need to think about using more abstraction rather than less.
Caught somewhere in the middle are the people who have a balanced view of how much abstraction is good, but who jump the gun on writing their own abstractions when they should be reusing existing code.[3]
Whenever you're tempted to do any of these things, you need to sit back and think about what will do the most good for you and your neighbor over the long haul. If you're going to pour your creative energies into a lump of code, why not make the world a better place while you're at it? (Even if you're only aiming for the program to succeed, you need to make sure it fits the right ecological niche.)
The first step toward ecologically sustainable programming is simply this: don't litter in the park. When you write a chunk of code, think about giving the code its own namespace, so that your variables and functions don't clobber anyone else's, or vice versa. A namespace is a bit like your home, where you're allowed to be as messy as you like, as long as you keep your external interface to other citizens moderately civil. In Perl, a namespace is called a package. Packages provide the fundamental building block upon which the higher-level concepts of modules and classes are constructed.
Like the notion of "home", the notion of "package" is a bit nebulous. Packages are independent of files. You can have many packages in a single file, or a single package that spans several files, just as your home could be one small garret in a larger building (if you're a starving artist), or it could comprise several buildings (if your name happens to be Queen Elizabeth). But the usual size of a home is one building, and the usual size of a package is one file. Perl provides some special help for people who want to put one package in one file, as long as you're willing to give the file the same name as the package and use an extension of .pm, which is short for "perl module". The module is the fundamental unit of reusability in Perl. Indeed, the way you use a module is with the use command, which is a compiler directive that controls the importation of subroutines and variables from a module. Every example of use you've seen until now has been an example of module reuse.
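To make the "compiler directive" part concrete, here is a small sketch that imports one function with use and another by spelling out roughly what use does behind the scenes. List::Util is a standard module; the @temps data is made up for illustration.

```perl
use strict;
use warnings;

# use does its work at compile time, loading the module and then
# asking it to import the names we list.
use List::Util qw(max);

# Roughly the same thing, written out by hand.
BEGIN {
    require List::Util;          # load List/Util.pm (a no-op if already loaded)
    List::Util->import('min');   # ask the module to export min() to us
}

my @temps = (68, 72, 59);        # illustrative data
print "high ", max(@temps), ", low ", min(@temps), "\n";   # high 72, low 59
```

Both max and min end up as ordinary subroutines in the current package, which is all that "importation" really means.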
The Comprehensive Perl Archive Network, or CPAN, is where you should put your modules if other people might find them useful. Perl has thrived because of the willingness of programmers to share the fruits of their labor with the community. Naturally, CPAN is also where you can find modules that others have thoughtfully uploaded for everyone to use. See Chapter 22, and http://www.cpan.org for details.
The trend over the last 25 years or so has been to design computer languages that enforce a state of paranoia. You're expected to program every module as if it were in a state of siege. Certainly there are some feudal cultures where this is appropriate, but not all cultures are like this. In Perl culture, for instance, you're expected to stay out of someone's home because you weren't invited in, not because there are bars on the windows.[4]
This is not a book about object-oriented methodology, and we're not here to convert you into a raving object-oriented zealot, even if you want to be converted. There are already plenty of books out there for that. Perl's philosophy of object-oriented design fits right in with Perl's philosophy of everything else: use object-oriented design where it makes sense, and avoid it where it doesn't. Your call.
In OO-speak, every object belongs to a grouping called a class. In Perl, classes and packages and modules are all so closely related that novices can often think of them as being interchangeable. The typical class is implemented by a module that defines a package with the same name as the class. We'll explain all of this in the next few chapters.
When you use a module, you benefit from direct software reuse. With classes, you benefit from indirect software reuse when one class uses another through inheritance. And with classes, you get something more: a clean interface to another namespace. Everything in a class is accessed indirectly, insulating the class from the outside world.
As we mentioned in Chapter 8, object-oriented programming in Perl is accomplished through references whose referents know which class they belong to. In fact, now that you know about references, you know almost everything difficult about objects. The rest of it just "lays under the fingers", as a pianist would say. You will need to practice a little, though.
One of your basic finger exercises consists of learning how to protect different chunks of code from inadvertently tampering with each other's variables. Every chunk of code belongs to a particular package, which determines what variables and subroutines are available to it. As Perl encounters a chunk of code, it compiles that code into what we call the current package. The initial current package is called "main", but you can switch the current package to another one at any time with the package declaration. The current package determines which symbol table is used to find your variables, subroutines, I/O handles, and formats.
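A minimal sketch of switching the current package follows. The package name Garden is made up for illustration; the two subroutines share a name but live in different symbol tables, so neither clobbers the other.

```perl
use strict;
use warnings;

sub whoami { return "compiled into " . __PACKAGE__ }   # we start out in main

package Garden;          # hypothetical package name
sub whoami { return "compiled into " . __PACKAGE__ }   # a different sub: Garden::whoami

package main;            # switch back to where we started
print whoami(), "\n";           # compiled into main
print Garden::whoami(), "\n";   # compiled into Garden
```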
Any variable not declared with my is associated with a package--even seemingly omnipresent variables like $_ and %SIG. In fact, there's really no such thing as a global variable in Perl, just package variables. (Special identifiers like _ and SIG merely seem global because they default to the main package instead of the current one.)
The scope of a package declaration is from the declaration itself through the end of the enclosing scope (block, file, or eval--whichever comes first) or until another package declaration at the same level, which supersedes the earlier one. (This is a common practice.)
All subsequent identifiers (including those declared with our, but not including those declared with my or those qualified with a different package name) will be placed in the symbol table belonging to the current package. (Variables declared with my are independent of packages; they are always visible within, and only within, their enclosing scope, regardless of any package declarations.)
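The difference between our and my across a package boundary can be sketched like this. The package names Alpha and Beta and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

package Alpha;                 # hypothetical package names throughout
our $shared  = "package var";  # entered in the symbol table as $Alpha::shared
my  $private = "lexical var";  # entered in no symbol table at all

package Beta;
# The lexical is still visible here: its scope is the file, not the package.
sub peek_private { return $private }
# The package variable is reachable by its qualified name from anywhere.
sub peek_shared { return $Alpha::shared }
# Alpha's symbol table has no entry named "private".
sub has_private_symbol { return exists $Alpha::{private} ? 1 : 0 }

package main;
print Beta::peek_private(), "\n";         # lexical var
print Beta::peek_shared(), "\n";          # package var
print Beta::has_private_symbol(), "\n";   # 0
```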
Typically, a package declaration will be the first statement of a file meant to be included by require or use. But again, that's by convention. You can put a package declaration anywhere you can put a statement. You could even put it at the end of a block, in which case it would have no effect whatsoever. You can switch into a package in more than one place; a package declaration merely selects the symbol table to be used by the compiler for the rest of that block. (This is how a given package can span more than one file.)
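Here is a short sketch of a package declaration confined to a block, and of the same package being reopened later. Shed is a made-up package name.

```perl
use strict;
use warnings;

{
    package Shed;                       # hypothetical package name
    sub hammer { return __PACKAGE__ }   # Shed::hammer
}                                       # the declaration's scope ends with the block

sub outside { return __PACKAGE__ }      # back in main without saying so

{
    package Shed;                       # reopen the same package later on
    sub saw { return __PACKAGE__ }      # Shed::saw joins Shed::hammer
}

print join(", ", Shed::hammer(), outside(), Shed::saw()), "\n";   # Shed, main, Shed
```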
You can refer to identifiers[5] in other packages by prefixing ("qualifying") the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.[6]
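As a quick illustration of qualified names, the following sketch keeps one $sail in main and another in a made-up package called Boat, then reads them back under each spelling.

```perl
use strict;
use warnings;

our $sail = "mainsail";               # this is $main::sail

{
    package Boat;                     # hypothetical package name
    our $sail = "spinnaker";          # this is $Boat::sail
    # Unqualified, qualified with Boat, qualified with main, and null-qualified.
    sub sails { return ($sail, $Boat::sail, $main::sail, $::sail) }
}

print join(", ", Boat::sails()), "\n";   # spinnaker, spinnaker, mainsail, mainsail
```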
The old package delimiter was a single quote, so in old Perl programs you'll see variables like $main'sail and $somepack'horse. But the double colon is now the preferred delimiter, in part because it's more readable to humans, and in part because it's more readable to Emacs macros. It also makes C++ programmers feel like they know what's going on--as opposed to using the single quote as the separator, which was there to make Ada programmers feel like they knew what was going on. Because the old-fashioned syntax is still supported for backward compatibility, if you try to use a string like "This is $owner's house", you'll be accessing $owner::s; that is, the $s variable in package owner, which is probably not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house".
The double colon can be used to chain together identifiers in a package name: $Red::Blue::var. This means the $var belonging to the Red::Blue package. The Red::Blue package has nothing to do with any Red or Blue packages that might happen to exist. That is, a relationship between Red::Blue and Red or Blue may have meaning to the person writing or using the program, but it means nothing to Perl. (Well, other than the fact that, in the current implementation, the symbol table Red::Blue happens to be stored in the symbol table Red. But the Perl language makes no use of that directly.)
For this reason, every package declaration must declare a complete package name. No package name ever assumes any kind of implied "prefix", even if (seemingly) declared within the scope of some other package declaration.
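A small sketch may make the "no implied prefix" rule easier to believe. The subroutine names here are made up; the point is that a Blue declared inside Red's block is plain Blue, and that Red::Blue does not see Red's subroutines.

```perl
use strict;
use warnings;

{
    package Red;
    sub shade { return "crimson" }

    {
        package Blue;                        # top-level Blue, NOT Red::Blue
        sub name { return __PACKAGE__ }
    }
}

{
    package Red::Blue;
    sub name { return __PACKAGE__ }
    # An unqualified name is looked up in Red::Blue only, never in Red.
    sub sees_shade { return defined(&shade) ? 1 : 0 }
}

print Blue::name(), "\n";               # Blue
print Red::Blue::name(), "\n";          # Red::Blue
print Red::Blue::sees_shade(), "\n";    # 0
```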
Only identifiers (names starting with letters or an underscore) are stored in a package's symbol table. All other symbols are kept in the main package, including all the nonalphabetic variables, like $!, $?, and $_. In addition, when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are forced to be in package main, even when used for other purposes than their built-in ones. Don't name your package m, s, y, tr, q, qq, qr, qw, or qx unless you're looking for a lot of trouble. For instance, you won't be able to use the qualified form of an identifier as a filehandle because it will be interpreted instead as a pattern match, a substitution, or a transliteration.
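One way to convince yourself of this is to compare references from inside some other package. In this sketch, Kitchen and @menu are made-up names; each subroutine returns 1 if the two spellings name the same variable.

```perl
use strict;
use warnings;

package Kitchen;    # hypothetical package name

# Unqualified, these special names still mean the ones in main.
sub same_env   { return \%ENV == \%main::ENV ? 1 : 0 }
sub same_inc   { return \@INC == \@main::INC ? 1 : 0 }
sub same_topic { return \$_   == \$main::_   ? 1 : 0 }

# An ordinary identifier gets no such treatment: it stays in Kitchen.
our @menu = ("soup");
sub menu_in_kitchen { return \@menu == \@Kitchen::menu ? 1 : 0 }

package main;
print join(" ", Kitchen::same_env(), Kitchen::same_inc(),
                Kitchen::same_topic(), Kitchen::menu_in_kitchen()), "\n";   # 1 1 1 1
```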
Long ago, variables beginning with an underscore were forced into the main package, but we decided it was more useful for package writers to be able to use a leading underscore to indicate semi-private identifiers meant for internal use by that package only. (Truly private variables can be declared as file-scoped lexicals, but that works best when the package and module have a one-to-one relationship, which is common but not required.)
The %SIG hash (which is for trapping signals; see Chapter 16) is also special. If you define a signal handler as a string, it's assumed to refer to a subroutine in the main package unless another package name is explicitly used. Use a fully qualified signal handler name if you want to specify a particular package, or avoid strings entirely by assigning a typeglob or a function reference instead:

    $SIG{QUIT} = "Pkg::quit_catcher";   # fully qualified handler name
    $SIG{QUIT} = "quit_catcher";        # implies "main::quit_catcher"
    $SIG{QUIT} = *quit_catcher;         # forces current package's sub
    $SIG{QUIT} = \&quit_catcher;        # forces current package's sub
    $SIG{QUIT} = sub { print "Caught SIGQUIT\n" };   # anonymous sub
The notion of "current package" is both a compile-time and run-time concept. Most variable name lookups happen at compile time, but run-time lookups happen when symbolic references are dereferenced, and also when new bits of code are parsed under eval. In particular, when you eval a string, Perl knows which package the eval was invoked in and propagates that package inward when evaluating the string. (You can always switch to a different package inside the eval string, of course, since an eval string counts as a block, just like a file loaded in with do, require, or use.)
Alternatively, if an eval wants to find out what package it's in, the special symbol __PACKAGE__ contains the current package name. Since you can treat it as a string, you could use it in a symbolic reference to access a package variable. But if you were doing that, chances are you should have declared the variable with our instead so it could be accessed as if it were a lexical.
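The following sketch pulls these points together. Workshop, Attic, and $tool are names invented for the example; the symbolic reference is shown only to contrast it with the simpler our version.

```perl
use strict;
use warnings;

package Workshop;                 # hypothetical package name
our $tool = "hammer";

# A string eval is compiled in the package it was invoked from...
sub eval_default  { return eval '__PACKAGE__' }
# ...unless the string switches packages for itself.
sub eval_switched { return eval 'package Attic; __PACKAGE__' }

# __PACKAGE__ is just a string, so it can build a symbolic reference.
sub tool_by_name {
    my $name = __PACKAGE__ . '::tool';
    no strict 'refs';             # symbolic references need this under strict
    return ${$name};
}

# With our, the same variable needs no symbolic lookup at all.
sub tool_direct { return $tool }

package main;
print Workshop::eval_default(), "\n";    # Workshop
print Workshop::eval_switched(), "\n";   # Attic
print Workshop::tool_by_name(), "\n";    # hammer
print Workshop::tool_direct(), "\n";     # hammer
```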
The contents of a package are collectively called a symbol table. Symbol tables are stored in a hash whose name is the same as the package, but with two colons appended. The main symbol table's name is thus %main::. Since main also happens to be the default package, Perl provides %:: as an abbreviation for %main::. Likewise, the symbol table for the Red::Blue package is named %Red::Blue::. As it happens, the main symbol table contains all other top-level symbol tables, including itself, so %Red::Blue:: is also %main::Red::Blue::.
When we say that a symbol table "contains" another symbol table, we mean that it contains a reference to the other symbol table. Since main is the top-level package, it contains a reference to itself, with the result that %main:: is the same as %main::main::, and %main::main::main::, and so on, ad infinitum. It's important to check for this special case if you write code that traverses all symbol tables.
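Here is one possible sketch of such a traversal, with the check for main's self-reference in place. The subroutine name nested_packages is our own invention, and the code assumes that nested symbol tables show up as hash keys ending in a double colon, which is how current implementations store them.

```perl
use strict;
use warnings;

{
    package Red::Blue;       # gives the traversal something of ours to find
    our $var = 1;
}

# Return the names of all packages nested (at any depth) under $pkg.
sub nested_packages {
    my ($pkg) = @_;                              # e.g. "main" or "Red"
    my @found;
    no strict 'refs';                            # symbol tables are looked up by name
    for my $key (sort keys %{"${pkg}::"}) {
        next unless $key =~ /^(\w+)::\z/;        # keys ending in "::" are nested tables
        my $child = $1;
        next if $pkg eq 'main' && $child eq 'main';   # %main:: contains itself
        my $full = $pkg eq 'main' ? $child : "${pkg}::$child";
        push @found, $full, nested_packages($full);
    }
    return @found;
}

print "$_\n" for nested_packages('main');
```

Without the main-inside-main check, the recursion would never bottom out.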
Inside a symbol table's hash, each key/value pair matches a variable name to its value. The keys are the symbol identifiers, and the values are the corresponding typeglobs. So when you use the *NAME typeglob notation, you're really just accessing a value in the hash that holds the current package's symbol table. In fact, the following have (nearly) the same effect:

    *sym = *main::variable;
    *sym = $main::{"variable"};

The first is more efficient because the main symbol table is accessed at compile time. It will also create a new typeglob by that name if none previously exists, whereas the second form will not.
Since a package is a hash, you can look up the keys of the package and get to all the variables of the package. Since the values of the hash are typeglobs, you can dereference them in several ways. Try this:
    foreach $symname (sort keys %main::) {
        local *sym = $main::{$symname};
        print "\$$symname is defined\n" if defined $sym;
        print "\@$symname is nonnull\n" if         @sym;
        print "\%$symname is nonnull\n" if         %sym;
    }
Since all packages are accessible (directly or indirectly) through the main package, you can write Perl code to visit every package variable in your program. The Perl debugger does precisely that when you ask it to dump all your variables with the V command. Note that if you do this, you won't see variables declared with my since those are independent of packages, although you will see variables declared with our. See Chapter 20.
Earlier we said that only identifiers are stored in packages other than main. That was a bit of a fib: you can use any string you want as the key in a symbol table hash--it's just that it wouldn't be valid Perl if you tried to use a non-identifier directly:

    $!@#$% = 0;                   # WRONG, syntax error.
    ${'!@#$%'} = 1;               # Ok, though unqualified.
    ${'main::!@#$%'} = 2;         # Can qualify within the string.
    print ${ $main::{'!@#$%'} }   # Ok, prints 2!
Assignment to a typeglob performs an aliasing operation; that is,

    *dick = *richard;

causes variables, subroutines, formats, and file and directory handles accessible via the identifier richard to also be accessible via the symbol dick. If you want to alias only a particular variable or subroutine, assign a reference instead:

    *dick = \$richard;

That makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays. Tricky, eh?
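A runnable sketch of the scalar-only alias follows. The sample values are made up; the variables are declared with our so that the example passes use strict.

```perl
use strict;
use warnings;

our $richard = "scalar";
our @richard = (1, 2, 3);
our ($dick, @dick);

*dick = \$richard;                 # alias the scalar slot only

$dick = "changed via alias";       # writes through to $richard
push @dick, "separate";            # @richard is untouched

print "$richard\n";                # changed via alias
print scalar(@richard), "\n";      # 3
```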
This is how the Exporter works when importing symbols from one package to another. For example:

    *SomePack::dick = \&OtherPack::richard;

imports the &richard function from package OtherPack into SomePack, making it available as the &dick function. (The Exporter module is described in the next chapter.) If you precede the assignment with a local, the aliasing will only last as long as the current dynamic scope.
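To see the dynamic scoping of a localized alias, consider this sketch. The names $setting, $override, current, and with_override are invented for the example.

```perl
use strict;
use warnings;

our $setting  = "normal";
our $override = "special";

sub current { return $setting }

sub with_override {
    local *setting = \$override;   # $setting aliases $override for now
    return current();              # the callee sees the alias: dynamic scope
}

print with_override(), "\n";   # special
print current(), "\n";         # normal
```

Once with_override returns, the old typeglob is restored and $setting is its former self again.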
This mechanism may be used to retrieve a reference from a subroutine, making the referent available as the appropriate data type:
    *units = populate();          # Assign \%newhash to the typeglob
    print $units{kg};             # Prints 70; no dereferencing needed!

    sub populate {
        my %newhash = (km => 10, kg => 70);
        return \%newhash;
    }
Likewise, you can pass a reference into a subroutine and use it without dereferencing:
    %units = (miles => 6, stones => 11);
    fillerup( \%units );          # Pass in a reference
    print $units{quarts};         # Prints 4

    sub fillerup {
        local *hashsym = shift;   # Assign \%units to the typeglob
        $hashsym{quarts} = 4;     # Affects %units; no dereferencing needed!
    }
These are tricky ways to pass around references cheaply when you don't want to have to explicitly dereference them. Note that both techniques only work with package variables; they would not have worked had we declared %units with my.
Another use of symbol tables is for making "constant" scalars:
    *PI = \3.14159265358979;
Now you cannot alter $PI, which is probably a good thing, all in all. This isn't the same as a constant subroutine, which is optimized at compile time. A constant subroutine is one prototyped to take no arguments and to return a constant expression; see Section 6.4.1 in Chapter 6, for details. The use constant pragma (see Glossary) is a convenient shorthand:
use constant PI => 3.14159;
Under the hood, this uses the subroutine slot of *PI, instead of the scalar slot used earlier. It's equivalent to the more compact (but less readable):
*PI = sub () { 3.14159 };
That's a handy idiom to know anyway--assigning a sub {} to a typeglob is the way to give a name to an anonymous subroutine at run time.
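As a sketch of that idiom, the loop below manufactures three named subroutines from closures at run time. The tag-wrapping behavior and the color names are just an illustration.

```perl
use strict;
use warnings;

for my $color (qw(red green blue)) {
    no strict 'refs';                     # we are building glob names from strings
    *{"main::$color"} = sub { return "<$color>@_</$color>" };
}

print join(" ", red("stop"), green("go")), "\n";   # <red>stop</red> <green>go</green>
```

Each closure remembers its own $color, so one anonymous subroutine body serves for all three names.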
Assigning a typeglob reference to another typeglob (*sym = \*oldvar) is the same as assigning the entire typeglob, because Perl automatically dereferences the typeglob reference for you. And when you set a typeglob to a simple string, you get the entire typeglob named by that string, because Perl looks up the string in the current symbol table. The following are all equivalent to one another, though the first two compute the symbol table entry at compile time, while the last two do so at run time:

    *sym =   *oldvar;
    *sym =  \*oldvar;       # autodereference
    *sym = *{"oldvar"};     # explicit symbol table lookup
    *sym =   "oldvar";      # implicit symbol table lookup
When you perform any of the following assignments, you're replacing just one of the references within the typeglob:
    *sym = \$frodo;
    *sym = \@sam;
    *sym = \%merry;
    *sym = \&pippin;
If you think about it sideways, the typeglob itself can be viewed as a kind of hash, with entries for the different variable types in it. In this case, the keys are fixed, since a typeglob can contain exactly one scalar, one array, one hash, and so on. But you can pull out the individual references, like this:
    *pkg::sym{SCALAR}      # same as \$pkg::sym
    *pkg::sym{ARRAY}       # same as \@pkg::sym
    *pkg::sym{HASH}        # same as \%pkg::sym
    *pkg::sym{CODE}        # same as \&pkg::sym
    *pkg::sym{GLOB}        # same as \*pkg::sym
    *pkg::sym{IO}          # internal file/dir handle, no direct equivalent
    *pkg::sym{NAME}        # "sym" (not a reference)
    *pkg::sym{PACKAGE}     # "pkg" (not a reference)
You can say *foo{PACKAGE} and *foo{NAME} to find out what name and package the *foo symbol table entry comes from. This may be useful in a subroutine that is passed typeglobs as arguments:

    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
    }
    identify_typeglob(*foo);
    identify_typeglob(*bar::glarch);

This prints:

    You gave me main::foo
    You gave me bar::glarch
The *foo{THING} notation can be used to obtain references to individual elements of *foo. See Section 8.2.5 in Chapter 8 for details. This syntax is primarily used to get at the internal filehandle or directory handle reference, because the other internal references are already accessible in other ways. (The old *foo{FILEHANDLE} is still supported to mean *foo{IO}, but don't let its name fool you into thinking it can distinguish filehandles from directory handles.) But we thought we'd generalize it because it looks kind of pretty. Sort of. You probably don't need to remember all this unless you're planning to write another Perl debugger.
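For the curious, here is a brief sketch of pulling individual slots out of a typeglob. The variable $greeting and the two helper subroutines are made up for the example; the class that the IO reference is blessed into varies between Perl versions, so the code prints it rather than assuming it.

```perl
use strict;
use warnings;

our $greeting = "hi";

sub scalar_slot { return *greeting{SCALAR} }   # the same reference as \$greeting
sub stdout_io   { return *STDOUT{IO} }         # the IO object inside *STDOUT

my $io = stdout_io();
print {$io} "written through the IO slot\n";   # an IO reference works as a filehandle
print ref($io), "\n";                          # some IO:: class, depending on version
print ${ scalar_slot() }, "\n";                # hi
```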
[1] This is a form of False Laziness.
[2] This is a form of False Hubris.
[3] You guessed it—this is False Impatience. But if you're determined to reinvent the wheel, at least try to invent a better one.
[4] But Perl provides some bars if you want them, too. See "Handling Insecure Code" in Chapter 23.
[5] By identifiers, we mean the names used as symbol table keys for accessing scalar variables, array variables, hash variables, subroutines, file or directory handles, and formats. Syntactically speaking, labels are also identifiers, but they aren't put into a particular symbol table; rather, they are attached directly to the statements in your program. Labels cannot be package qualified.
[6] To clear up another bit of potential confusion, in a variable name like $main::sail, we use the term "identifier" to talk about main and sail, but not main::sail. We call that a variable name instead, because identifiers cannot contain colons.