A subroutine is defined by the reserved word[3] for subroutine definitions, sub; the subroutine's name, in this case, addACGT; and a block, enclosed in a pair of matching curly braces. This is the same kind of block as the block used in a loop to group statements together.
In Example 6-1, the name of the subroutine is addACGT, and the block is everything after the name. Here is the subroutine definition again:
sub addACGT {
    my($dna) = @_;
    $dna .= 'ACGT';
    return $dna;
}
Now let's look into the block of the subroutine.
A subroutine is like a separate helper program for the main program, and it needs to have its own variables. You will use two types of variables in your subroutines in this book:[4]
Arguments passed in to the subroutine
Other variables declared with my and restricted to the scope of the subroutine
Arguments are the values given to a subroutine when it is used, or called. The values of the arguments are passed into the subroutine by means of the special variable @_, as you'll see in the next section.
Other variables a subroutine might use must be protected from interacting with variables in other parts of the program, so they have effect only within the subroutine's own scope. This is accomplished by declaring them as my variables, as will be explained shortly.
Finally, most subroutines return their results via the return function. It can return a single scalar, as in return $dna in our subroutine addACGT; a list of scalars, as in return ($dna1, $dna2); the contents of an array, as in return @lines; and more.
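The three kinds of return just described can be sketched as follows. This is only an illustration: split_in_half and get_lines are hypothetical helpers invented for this sketch, and the sample strings are made up.

```perl
use strict;
use warnings;

# Return a single scalar.
sub addACGT {
    my($dna) = @_;
    return $dna . 'ACGT';
}

# Return a list of two scalars (hypothetical helper).
sub split_in_half {
    my($dna) = @_;
    my $middle = int(length($dna) / 2);
    return (substr($dna, 0, $middle), substr($dna, $middle));
}

# Return an array; the caller receives its elements as a list
# (hypothetical helper).
sub get_lines {
    my @lines = ("ACGT\n", "TTGA\n");
    return @lines;
}

my $longer        = addACGT('AAAA');
my($dna1, $dna2)  = split_in_half('AAAATTTT');
my @lines         = get_lines();

print "$longer\n";
print "$dna1 $dna2\n";
print "Number of lines: ", scalar(@lines), "\n";
```

Note how the caller collects each result in a variable of the matching shape: a scalar, a parenthesized list of scalars, or an array.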
To call a subroutine means to type its name and give it appropriate arguments and, usually, collect its results. Arguments, sometimes called parameters, usually contain the data that the subroutine computes on. In Example 6-1, this is the call of the subroutine addACGT with the argument $dna:
$longer_dna = addACGT($dna);
The essential point is that whenever you, the programmer, want to use a subroutine, you can call it with whatever arguments it is designed to accept and with which you need to compute (in this case, whatever DNA needs ACGT appended to it). The value of each argument then appears in the subroutine in the @_ array.
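Putting the definition and the call together gives a small complete program. This is a minimal sketch rather than a copy of Example 6-1: the sample DNA string is an assumption, and the main program's variables are declared with my, a habit explained later in this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append 'ACGT' to whatever DNA string is passed in.
sub addACGT {
    my($dna) = @_;      # collect the single argument from @_
    $dna .= 'ACGT';     # modify the subroutine's own copy
    return $dna;        # hand the result back to the caller
}

my $dna = 'CGACGTCTTCTCAGGCGA';    # sample data (an assumption)
my $longer_dna = addACGT($dna);    # call the subroutine, collect the result

print "Original: $dna\n";
print "Longer:   $longer_dna\n";
```

The same subroutine can be called again with any other string; each call places that call's value in @_.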
When you call a subroutine with certain arguments, the names of the arguments
you provide in the call are not important inside the subroutine. Only the values
of those arguments that are actually passed inside the subroutine are important.
The subroutine typically collects the values from the @_ array and assigns them to new variables that may or may not have the same names as the variables with which you called the subroutine. The only thing preserved is the order of the values, not the names of the variables containing the values.
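A short sketch may make this concrete. The subroutine describe and the variable names below are hypothetical, chosen only to show that the subroutine sees values in order and knows nothing about the caller's variable names.

```perl
use strict;
use warnings;

# Hypothetical subroutine: joins a gene name and its DNA into one label.
sub describe {
    my($name, $sequence) = @_;   # first value, then second value
    return "$name: $sequence";
}

my $gene_id = 'geneX';           # names differ from those in the subroutine
my $dna     = 'ACGT';

print describe($gene_id, $dna), "\n";   # values arrive in the order given
print describe($dna, $gene_id), "\n";   # swapping the order swaps the roles
```

The second call is almost certainly a mistake by the caller, but Perl has no way of knowing that: it simply hands over the two values in the order they were written.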
Here's how it works. The first line in the subroutine's block is:
my($dna) = @_;
The values of the arguments from the call of the subroutine are passed into the subroutine in the special array variable @_. You know it's an array because it starts with the @ character. It has the brief name "_", and it's a special array variable that comes predefined in Perl programs. (It's not a name you should pick for your own arrays.) The array @_ contains all the scalar values passed into the subroutine. These scalar values are the values of the arguments to the subroutine. In this case, there is one scalar value: the string of DNA that's the value of the variable $dna passed in as an argument.
If the subroutine has more arguments (for instance one argument for DNA, one for the associated protein, and one for the name of the gene), they are all passed in and assigned to my variables inside the subroutine:
my($dna,$protein,$name_of_gene) = @_;
If there are no arguments, just omit that statement in the subroutine.
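Here is a hedged sketch of both cases: a subroutine that collects three arguments and one that takes none. The subroutines gene_summary and bases, and the toy data, are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical subroutine taking three arguments, in a fixed order.
sub gene_summary {
    my($dna, $protein, $name_of_gene) = @_;
    return "$name_of_gene: " . length($dna) . " bases, "
         . length($protein) . " residues";
}

# Hypothetical subroutine taking no arguments, so it never reads @_.
sub bases {
    return ('A', 'C', 'G', 'T');
}

print gene_summary('ATGGCCTAA', 'MA', 'toy_gene'), "\n";
print join('', bases()), "\n";
```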
After the statement:
my($dna) = @_;
executes in the subroutine, the passed-in value is assigned to the subroutine's variable $dna. The next section explains why this is a new variable specific to the subroutine. The subroutine's variable can be called anything; it certainly doesn't have to be the same name as the argument, as it happens to be in this example. What's cool about scoping is that it doesn't matter if it is or not.
By keeping all variables a subroutine uses active only within the subroutine, you can make it safe to call the subroutines from anywhere. You make the variables specific only to the subroutine by declaring them as my variables. my is a keyword defined in Perl that limits variables to the block in which they are used (in this case, the block is the subroutine).[5]
Hiding variables and making them local to only a restricted part of a program is called scoping. In Perl, using my variables is known as lexical scoping, and it's an essential part of modularizing your programs.
You declare that a variable is a my variable like this:
my($x);
or:
my $x;
or, combining the declaration with an initialization to a value:
my($x) = '49';
or, if you're collecting an argument within a subroutine:
my($x) = @_;
Once a variable is declared in this fashion, it exists only until the end of the block it was declared in. So in a subroutine, if you declare all your variables like this (both the arguments and any other variables), they are active only in the subroutine. If any variable has the same name as another variable elsewhere in the program, you don't have to worry, because the my declaration actually creates a new variable, active only in the enclosing block, and any other variable of the same name used elsewhere outside the block is kept separate.
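The following sketch shows that behavior with three separate variables that all happen to be called $x. The names and strings are arbitrary; show_x is a hypothetical subroutine written only for this demonstration.

```perl
use strict;
use warnings;

my $x = 'outer';

{
    my $x = 'inner';          # a new variable, visible only in this block
    print "Inside the block:  $x\n";
}

print "Outside the block: $x\n";   # the outer variable is untouched

sub show_x {
    my($x) = @_;              # yet another $x, private to the subroutine
    $x .= ' (changed in sub)';
    return $x;
}

my $result = show_x($x);
print "From the sub:      $result\n";
print "Still outside:     $x\n";
```

Each my declaration creates its own variable, so the changes made inside the bare block and inside the subroutine never reach the $x of the main program.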
The example that showed collecting an argument in a subroutine uses parentheses around the variable. Because @_ is an array, the parentheses around the new variables make this a list assignment, so the right-hand side is evaluated in list context (called array context in older Perl documentation) and the variables are initialized correctly (see Chapter 4).
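To see why the parentheses matter, compare what happens with and without them. In this sketch the two subroutine names are invented for the comparison; without parentheses the array @_ is evaluated in scalar context, which yields the number of elements rather than the first element.

```perl
use strict;
use warnings;

sub with_parens {
    my($first) = @_;     # list assignment: $first gets the first argument
    return $first;
}

sub without_parens {
    my $count = @_;      # scalar assignment: $count gets the number of arguments
    return $count;
}

print with_parens('ACGT', 'TTTT'), "\n";      # prints the first argument
print without_parens('ACGT', 'TTTT'), "\n";   # prints how many arguments there were
```

Forgetting the parentheses is an easy mistake to make, and Perl does not warn about it, because counting the arguments is a perfectly legal thing to want.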
Always declare all your variables in your subroutines, even those variables that don't come in as arguments, with the my construct (unless you're using global variables, which we're not).
Why use scoping? Example 6-2 shows the trouble that can happen when you don't. Recall that one of the advantages of subroutines is writing a useful bit of code once and then using it whenever you need it. Example 6-2 is a program that has a variable in the main program with the same name as a variable in a subroutine it calls. This can easily happen if you write the subroutine at a time other than the main program (say six months later) or if you call a subroutine someone else wrote.
Example 6-2. The pitfalls of not using my variables
#!/usr/bin/perl -w
# Illustrating the pitfalls of not using my variables

$dna = 'AAAAA';

$result = A_to_T($dna);

print "I changed all the A's in $dna to T's and got $result\n\n";

exit;

################################################################################
# Subroutines
################################################################################

sub A_to_T {
    my($input) = @_;
    $dna = $input;
    $dna =~ s/A/T/g;
    return $dna;
}
Example 6-2 gives the following output:
I changed all the A's in TTTTT to T's and got TTTTT
What was expected was this output:
I changed all the A's in AAAAA to T's and got TTTTT
You can obtain the expected output by changing the definition of subroutine A_to_T to the following, in which the variable $dna in the subroutine is declared as a my variable:
sub A_to_T {
    my($input) = @_;
    my($dna) = $input;
    $dna =~ s/A/T/g;
    return $dna;
}
Where exactly did Example 6-2 go wrong? When the program entered the subroutine, and used the variable $dna to calculate the string with A's changed to T's, the Perl language saw that there was already a variable $dna being used in the main part of the program and just kept using it. When the program returned from the subroutine and got to the print statement, it was still using the same (the one and only) variable $dna. So, when it printed the results, the variable $dna, instead of having the original DNA in it, had the altered DNA that had been computed in the subroutine.
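You can watch the difference by running both versions of the subroutine in one program. The sketch below is an adaptation, not the book's listing: the names A_to_T_shared, A_to_T_private, $after_private, and $after_shared are invented here, and the program deliberately leaves its main-program variables undeclared so that the shared version behaves as in Example 6-2.

```perl
#!/usr/bin/perl
use warnings;
# Deliberately written with undeclared variables in the main program,
# so that the first subroutine can reach the main program's $dna.

# Version from Example 6-2: uses the main program's $dna.
sub A_to_T_shared {
    my($input) = @_;
    $dna = $input;
    $dna =~ s/A/T/g;
    return $dna;
}

# Corrected version: $dna is private to the subroutine.
sub A_to_T_private {
    my($input) = @_;
    my($dna) = $input;
    $dna =~ s/A/T/g;
    return $dna;
}

$dna = 'AAAAA';
$result = A_to_T_private($dna);
$after_private = $dna;             # the main program's $dna after the call
print "private: dna is $after_private, result is $result\n";

$result = A_to_T_shared($dna);
$after_shared = $dna;              # the main program's $dna after the call
print "shared:  dna is $after_shared, result is $result\n";
```

After the private version runs, the main program's $dna still holds AAAAA; after the shared version runs, it holds TTTTT, because the subroutine overwrote it.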
Now this sort of thing can happen a lot. Programmers tend to use certain names for variables a great deal: the usual suspects are names such as $tmp, $temp, $x, $a, $number, $variable, $var, $array, $input, $output, $result, $data, $file, $filename, and so on. Bioinformaticians are quite fond of $dna, $protein, $motif, $sequence, and the like. As you start using libraries of subroutines from other people and as your programs get larger, it's much easier (and a whole lot safer) to let the Perl language worry about avoiding the problem of name collisions.
In fact, from now on we're going to stop using undeclared variables. From this point forward, all our variables, even those in the main program, will be declared with my. You can enforce this discipline by adding the following directive to your programs:
use strict;
which has the effect of insisting that your programs have all their variables declared as my variables.
Lest you rail at this seemingly unnecessary complication to your coding, compared to the simpler and happier days of Chapter 4 and Chapter 5, you should know that many languages require declarations for all their variables. The fact that in Perl you don't have to enforce strict scoping is handy when you're writing short programs, for example, or when you're trying to teach programming without hitting the students with a thousand details at the beginning.
Another benefit of strict scoping appears if you accidentally misspell a variable name while writing a program. If the variables aren't being declared, Perl creates a new variable with the (misspelled) name. The program may not work correctly, and it may be hard to find where the problem is. When the program is strictly scoped, any misspelled variable is also undeclared, and Perl complains about it, saving you hours or days of hair-pulling and bad language.
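Here is one way to see that complaint without having your own program stop. This sketch keeps a tiny program with a misspelled variable in a string and asks Perl to compile it with eval; the variable names and the use of eval are choices made for this demonstration, not something the examples in this chapter rely on.

```perl
use strict;
use warnings;

# The code to check is kept in a string so that this program can compile it
# with eval and report the error instead of stopping.
my $code = q{
    use strict;
    my $sequence = 'ACGT';
    $sequense .= 'ACGT';    # misspelled on purpose
    1;
};

my $ok = eval $code;
if ($ok) {
    print "Compiled and ran without complaint\n";
} else {
    print "Perl complained: $@";
}
```

With use strict in effect, the misspelled name is reported as an undeclared variable when the code is compiled, so none of it runs. Without use strict, the same code would quietly create a second variable.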
Finally, let's recap how scoping, arguments, and subroutines work by taking another look at Example 6-1. The subroutine is called by writing its name addACGT, passing it the argument $dna, and collecting results (if any) by assignment to $longer_dna:
$longer_dna = addACGT($dna);
The first line in the subroutine gets the value of the argument from the special variable @_, and stores it in its own variable called $dna, which can't be seen outside the subroutine because it uses my.
Even though the original variable outside the subroutine is also called $dna, the variable called $dna within the subroutine is an entirely new variable (with the same name) that belongs only to the subroutine due to the use of my. This new variable is in effect only during the time the program is in the subroutine. Notice in the output from the print statement at the end of Example 6-1 that even though the value of a variable called $dna is altered inside the subroutine, the original variable, $dna, outside the subroutine isn't changed.
[3] A reserved word is a fundamental, defined word in the Perl language, such as if, while, foreach, or sub.
[4] In the subroutines in this book, we won't use global variables, which can be seen by both the main program and the subroutines; nor will we use variables declared with local, which provides a different kind of scoping restriction than my.
[5] There are different models of scoping; my implements a type called lexical scoping, also known as static scoping. Another method is available in Perl via the local construct, but you almost always want to use my.