In this chapter you'll extend your basic knowledge in two directions:
Subroutines
Using the Perl debugger
Subroutines are an important way to structure programs. You'll use them in Chapter 7, where you'll learn how to use randomization to simulate the mutation of DNA, and in all the following chapters. The Perl debugger examines a program's behavior in "slow motion" and helps you find those pesky bugs.
Subroutines are an important way to organize a program and are used in all major programming languages.
A subroutine wraps up a bit of code, gives the code a name, and provides a way to pass in some values for its calculations and then report back the results. The rest of the program can then use the subroutine's code just by calling its name, giving the needed values to pass in to the subroutine code and then collecting the results. This use or "invocation" of a subroutine is commonly referred to as calling the subroutine. You can think of a subroutine as a program within a program; just as you run programs to get results, so your programs call subroutines to get results. Once you have a subroutine, you can use it in a program simply by knowing which values to pass in and what kind of values to expect it to pass out.
Subroutines provide several benefits. They endow programs with abstraction, modularization, and the ability to create large programs by organizing the code into manageable chunks with defined inputs and outputs.
Say you need to calculate something, for instance the mean of a distribution, at several places in a program or in several different programs. By writing this calculation as a subroutine, you can write it once, and then call it whenever you need it, thus making your program:
Shorter, since you're reusing the code.
Easier to test, since you can test the subroutine separately.
Easier to understand, since it reduces clutter and better organizes programs.
More reliable, since you have less code when you reuse subroutines, so there are fewer opportunities for something to go wrong.
Faster to write, since you may, for example, have already written some subroutines that handle basic statistics and can just call the one that calculates the mean without having to write it again. Or better yet, you found a good statistics library someone else wrote, and you never had to write it at all.
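As a minimal sketch of the idea, here is how the mean calculation might look as a subroutine that is written once and called from two places. The subroutine name mean and the two data sets are invented for illustration, and returning nothing for an empty list is just one reasonable design choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two made-up data sets that both need a mean
my @gc_percentages = (40, 38, 52, 46);
my @read_lengths   = (100, 150, 125);

# The calculation is written once (below) and called twice
my $mean_gc     = mean(@gc_percentages);
my $mean_length = mean(@read_lengths);

print "Mean GC percentage: $mean_gc\n";
print "Mean read length: $mean_length\n";

# mean: return the arithmetic mean of a list of numbers
sub mean {
    my(@values) = @_;

    # Assumption: an empty list has no mean, so return nothing
    # (the caller sees undef in scalar context)
    return unless @values;

    my $sum = 0;
    foreach my $value (@values) {
        $sum += $value;
    }

    return $sum / scalar(@values);
}
```

If the way the mean is computed ever needs to change, only the body of the subroutine has to be edited; the two calls stay as they are.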
There is another subtle, yet powerful idea at work here. Subroutines can themselves call other subroutines, that is, a subroutine can use another subroutine for help in its calculations.[1] By writing a set of subroutines, each of which does one or a few things well, you can combine them in various ways to make new subroutines. You can then combine the new subroutines, and so on, and the end result can be large and flexible programming systems. Decomposing problems into sets of subroutines that can be conveniently combined allows you to create environments that can grow and adapt to changing conditions with a minimum of effort.
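To illustrate subroutines built from other subroutines, the following sketch combines two small subroutines into a third. The names complement, reverse_dna, and reverse_complement are hypothetical and chosen only for this example; it assumes the DNA contains just the bases A, C, G, and T in upper- or lowercase.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna    = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
my $revcom = reverse_complement($dna);

print "Original DNA:       $dna\n";
print "Reverse complement: $revcom\n";

# complement: swap each base for its complementary base
sub complement {
    my($dna) = @_;

    $dna =~ tr/ACGTacgt/TGCAtgca/;
    return $dna;
}

# reverse_dna: return the sequence written backwards
sub reverse_dna {
    my($dna) = @_;

    # Assigning to a scalar makes reverse work on the string's characters
    my $reversed = reverse $dna;
    return $reversed;
}

# reverse_complement: built entirely from the two subroutines above
sub reverse_complement {
    my($dna) = @_;

    return complement(reverse_dna($dna));
}
```

Each of the two smaller subroutines remains useful on its own, and the combined subroutine needs only one line of new code.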
The trick to all this is in how you partition the code into subroutines. You want subroutines that encapsulate something that will be generally useful, and not just called once (although that sometimes can be useful too). There are various rules of thumb: a subroutine should do one thing well, and it should be no more than a page or two of code. These are not real rules, and exceptions are frequent, but they can help you divide your code into manageable chunks, suitable for subroutines.
Let's look at how subroutines are used and then at how they're defined.
To use a subroutine, you pass data into the subroutine as arguments, and then you collect the return value(s) of the subroutine. For example, say you want a subroutine that, given some DNA, appends "ACGT" to the end of the DNA and returns the new, longer DNA. Let's call the subroutine addACGT. In Perl, you usually call a subroutine by typing its name, followed by a parenthesized list of arguments (if any). For example, here's a call to addACGT with the one argument $dna:
addACGT($dna);
Older versions of Perl required you to start the name of a subroutine with the & (ampersand) character when calling it. It's still okay to do so (e.g., &addACGT), but these days the ampersand is usually omitted.[2]
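As a small sketch, the two calling styles can be compared side by side. The subroutine definition used here is the same addACGT that Example 6-1 explains in detail; the variable names $without_ampersand and $with_ampersand are made up for this comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'CGACG';

# The usual modern call, without the ampersand
my $without_ampersand = addACGT($dna);

# The older style, with the ampersand and the same argument list
my $with_ampersand = &addACGT($dna);

print "Without ampersand: $without_ampersand\n";
print "With ampersand:    $with_ampersand\n";

# Append ACGT to a copy of the DNA and return the result
sub addACGT {
    my($dna) = @_;

    $dna .= 'ACGT';
    return $dna;
}
```

When the parenthesized argument list is given, as it is here, both calls pass the same argument and return the same string.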
Example 6-1 demonstrates a subroutine that shows in detail how this works.
Example 6-1. A subroutine to append ACGT to DNA
#!/usr/bin/perl -w
# A program with a subroutine to append ACGT to DNA

# The original DNA
$dna = 'CGACGTCTTCTCAGGCGA';

# The call to the subroutine "addACGT".
# The argument being passed in is $dna; the result is saved in $longer_dna
$longer_dna = addACGT($dna);

print "I added ACGT to $dna and got $longer_dna\n\n";

exit;

################################################################################
# Subroutines for Example 6-1
################################################################################

# Here is the definition for subroutine "addACGT"

sub addACGT {
    my($dna) = @_;

    $dna .= 'ACGT';
    return $dna;
}
Example 6-1 produces the following output:
I added ACGT to CGACGTCTTCTCAGGCGA and got CGACGTCTTCTCAGGCGAACGT
We'll now look more closely at this code to see how subroutines are defined and used in a Perl program.
The first thing to notice, taking the large view, is that the program now has two sections. The first section starts from the beginning of the program and ends with the exit command. Following that (and announced by a blizzard of comments for easy reading) is a section for subroutine definitions, in this case, only the one definition for subroutine addACGT. It is common to place all subroutine definitions together at the end of a program, for ease in reading. Usually they're listed alphabetically or in some other convenient way.
Actually, it is legal to put the subroutine definitions almost anywhere in a program. This is because Perl first scans through the code and does things like check the syntax and learn subroutine definitions, before it starts to run the program. In particular, subroutine definitions can come after the point in the code where you use them (not necessarily before, which many people assume is the rule), and they don't have to be grouped together but can be scattered throughout the code. But our method of collecting them together at the end can make reading a program much easier. The possible exception is when a small subroutine is used in one section of code, as sometimes happens with the sort function, for instance. In this case having the definition right there can save the reader paging back and forth between the subroutine definition and its use. Usually, it's more convenient to read the program without the subroutine definitions, to get the overall flow of the program first, and then go back and look into the subroutines, if necessary.
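The sort case might look something like the following sketch, in which a small comparison subroutine sits right next to the one sort that uses it, and is defined after the line that calls it. The subroutine name by_length and the list of DNA fragments are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fragments = ('ACGTACGT', 'AC', 'ACGTA', 'ACG');

# by_length is used here, before its definition appears in the file
my @sorted = sort by_length @fragments;

# A small comparison subroutine kept beside the sort that uses it.
# sort places the two items being compared in the variables $a and $b.
sub by_length {
    return length($a) <=> length($b);
}

print "@sorted\n";
```

This prints the fragments from shortest to longest: AC ACG ACGTA ACGTACGT.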
As you see, Example 6-1 is very simple. It first stores some DNA into the variable $dna and then passes that variable as an argument to the subroutine call, which looks like this: addACGT($dna). The subroutine is called by its name, followed by parentheses containing the arguments to the subroutine. There may be no arguments, or if more than one, they are separated by commas. The value returned by the subroutine can be saved; in this program the value is saved in a variable called $longer_dna, which is then printed, and the program exits.
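To show the other argument counts, here is a brief sketch with one subroutine that takes no arguments and one that takes two, separated by a comma. Both subroutines, default_dna and append_bases, are hypothetical and exist only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A call with no arguments: the parentheses are empty
my $dna = default_dna();

# A call with two arguments, separated by a comma
my $longer_dna = append_bases($dna, 'ACGT');

print "I started with $dna and got $longer_dna\n";

# default_dna: takes no arguments and returns a fixed piece of DNA
sub default_dna {
    return 'CGACGTCTTCTCAGGCGA';
}

# append_bases: takes the DNA and the bases to add to its end
sub append_bases {
    my($dna, $bases) = @_;

    $dna .= $bases;
    return $dna;
}
```

Here the bases to append are passed in rather than fixed inside the subroutine, so the same subroutine can add any string of bases.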
The part of the program from the beginning to the exit statement is called variously the main program or the main body of the program. By looking over this section of the code, you can see what happens from the beginning to the end of the program without looking into the details of the subroutines.
Now that you've looked over the main program of Example 6-1, it's time to look at the subroutine definition and how it uses the principle of scoping.
[1] Subroutines can even call themselves, and this so-called recursion can be an elegant way to compute (see Chapter 11).
[2] There are times, even in the newer versions of Perl, when an ampersand is required; you'll see one such case in Chapter 11, in Section 11.2.3, which describes the File::Find module. (See also the defined and undef functions in the documentation or the perlref manpage.)