When you start parsing GenBank, PDB, and BLAST files in later chapters, you'll need more complicated arguments to your subroutines to hold the several fields of data you'll parse out of the records. These next sections explain the way it's done in Perl. You can skim this section and return for a closer read when you get to Chapter 10.
So far, all our subroutines have had fairly simple arguments. The values of these arguments are copied and passed to the subroutines, and whatever happens to those values in the subroutine doesn't affect the values of the arguments in the main program. This is called pass by value or call by value. For example:
#!/usr/bin/perl -w
# Example of pass-by-value (a.k.a. call-by-value)

use strict;

my $i = 2;

simple_sub($i);

# The backslash in \$i prints the variable's name rather than its value
print "In main program, after the subroutine call, \$i equals $i\n";

exit;

################################################################################
# Subroutines
################################################################################

sub simple_sub {

    # This $i is a new lexical variable holding a copy of the argument
    my($i) = @_;

    $i += 100;

    print "In subroutine simple_sub, \$i equals $i\n";
}
This gives the following output:
In subroutine simple_sub, $i equals 102
In main program, after the subroutine call, $i equals 2
If you have more complicated arguments, say a mixture of scalars, arrays, and hashes, Perl often cannot distinguish between them. Perl passes all arguments into the subroutine as a single array, the special @_ array. If there are arrays or hashes as arguments, their elements get "flattened" out into this single @_ array in the subroutine. Here's an example:
#!/usr/bin/perl -w
# Example of problem of pass-by-value with two arrays

use strict;

my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');

print "In main program before calling subroutine: i = " . "@i\n";
print "In main program before calling subroutine: j = " . "@j\n";

reference_sub(@i, @j);

print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";

exit;

################################################################################
# Subroutines
################################################################################

sub reference_sub {

    # The first array in the list takes every element of @_
    my(@i, @j) = @_;

    print "In subroutine : i = " . "@i\n";
    print "In subroutine : j = " . "@j\n";

    push(@i, '4');
    shift(@j);
}
The following output illustrates the problem of this approach:
In main program before calling subroutine: i = 1 2 3
In main program before calling subroutine: j = a b c
In subroutine : i = 1 2 3 a b c
In subroutine : j = 
In main program after calling subroutine: i = 1 2 3
In main program after calling subroutine: j = a b c
As you see, in the subroutine all the elements of @i and @j were grouped into one @_ array. All distinction between the two arrays you started with was lost in the subroutine. When you try to get the two arrays back in the statement:
my(@i, @j) = @_;
Perl assigns everything to the first array, @i, and leaves @j empty. This behavior makes passing multiple arrays into subroutines somewhat dicey.
Also, as usual, the original arrays in the main program were not affected by the subroutine, since the subroutine worked on copies held in its own lexically scoped (my) variables.
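The same flattening happens when scalars and hashes are mixed in with arrays. The following is a minimal sketch, not part of the chapter's own examples; the names count_args, $name, @bases, and %counts are made up for illustration. It simply counts how many separate values arrive in @_.

```perl
#!/usr/bin/perl
# Sketch: a scalar, an array, and a hash are all flattened into @_
# (count_args, $name, @bases, and %counts are illustrative names)

use strict;
use warnings;

my $name   = 'seq1';
my @bases  = ('A', 'C', 'G', 'T');
my %counts = (A => 10, C => 7);

# 1 scalar + 4 array elements + 2 hash keys + 2 hash values = 9 values
my $n = count_args($name, @bases, %counts);

print "The subroutine received $n separate values\n";

sub count_args {

    # In scalar context, an array gives its number of elements
    return scalar(@_);
}
```

The subroutine sees nine values and has no way to tell where the array ends and the hash begins.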
To get around this problem, you can pass arguments into subroutines in a style called pass by reference or call by reference. Using pass by reference, you can pass a subroutine any collection of scalars, arrays, hashes, and more, and the subroutine can distinguish between them. There is a price to pay: the resulting code looks a little more complex. But the payoff is often well worth it.
There is one big difference in the behavior of arguments that are passed by reference. When argument variables are passed in this fashion, anything you do to the values of the argument variables in the subroutine also affects the values of the arguments in the main program.
To call a subroutine that has its arguments passed by reference, you call it the same way as before, with one difference: you must preface the argument names with a backslash. In the example of pass-by-reference later in this section, the subroutine call is accomplished like so:
reference_sub(\@i, \@j);
As you see here, the arguments are two arrays, and, to preserve the distinction between them as they are passed into the reference_sub subroutine, they are passed by reference by prepending their names with a backslash.
Within the subroutine, there are a few changes. First, the arguments are collected from the @_ array, and saved as scalar variables. This is because a reference is a special kind of data that is stored in a scalar variable, no matter whether it's a reference to a scalar, an array, a hash, or other. The example collects its arguments as follows:
my($i, $j) = @_;
reading them from the @_ array as scalars.
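To see that each reference really is a single scalar value, whatever it refers to, here is a small sketch. It uses Perl's built-in ref function, which this section does not otherwise cover; the variable names are illustrative only.

```perl
#!/usr/bin/perl
# Sketch: every reference is one scalar value, whatever it refers to
# ($number, @list, %table, @refs, and @kinds are illustrative names)

use strict;
use warnings;

my $number = 42;
my @list   = (1, 2, 3);
my %table  = (a => 1);

# Each backslash expression yields exactly one scalar value
my @refs = (\$number, \@list, \%table);

# The built-in ref function reports what kind of data a reference points to
my @kinds = map { ref($_) } @refs;

print "Stored ", scalar(@refs), " references: @kinds\n";
```

This prints "Stored 3 references: SCALAR ARRAY HASH", so three arguments passed this way arrive in @_ as exactly three values.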
The subroutine has to do one more thing with these referenced arguments. When it uses them, it has to dereference them. To dereference a referenced argument, you have to prepend the reference with the symbol that shows what kind of variable it is: a $ for a scalar, @ for an array, % for a hash. So these variables have two symbols before their name: reading left to right, their usual symbol and then a $ that indicates the variable is a reference. The lines:
push(@$i, '4');
shift(@$j);
in the following subroutine are the ones that manipulate the arguments. The push adds an element '4' to the end of the @i array, and the shift removes the first element from the @j array. Because these arrays have been passed by reference, their names in the subroutine are @$i and @$j. (If you want to look at the third element of the @j array, which normally is $j[2], you'd say $$j[2].)
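The dereferencing notation can be tried out on its own. The sketch below shows the $$j[2] form from the text alongside two equivalent spellings that Perl also accepts, the braced form and the arrow form, and applies the same idea to a hash. The hash %h and its contents are hypothetical, added only for illustration.

```perl
#!/usr/bin/perl
# Sketch: equivalent ways to dereference array and hash references
# (%h and its 'gene' entry are hypothetical illustration data)

use strict;
use warnings;

my @j = ('a', 'b', 'c');
my %h = (gene => 'BRCA1');

my $j = \@j;    # reference to the array @j
my $h = \%h;    # reference to the hash %h

my $third_a = $$j[2];      # the form used in the text
my $third_b = ${$j}[2];    # the same, with braces around the reference
my $third_c = $j->[2];     # the same, with the arrow operator

my $gene_a = $$h{gene};    # hash element through a reference
my $gene_b = $h->{gene};   # the same, with the arrow operator

my $size = scalar(@$j);    # whole array dereferenced, then counted

print "Third element: $third_a $third_b $third_c\n";
print "Gene: $gene_a $gene_b\n";
print "Array size: $size\n";
```

All three array spellings give c, and both hash spellings give BRCA1. Which form to use is largely a matter of taste; the arrow form is often easier to read once expressions get longer.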
Whatever changes you make to the arguments in the subroutine also take effect in the main program. This is because the references are references to the actual arguments; they are not copies of their values as in pass by value. So, as you see in the example, after calling the subroutine, the arrays in the main program have been altered accordingly:
#!/usr/bin/perl
# Example of pass-by-reference (a.k.a. call-by-reference)

use strict;
use warnings;

my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');

print "In main program before calling subroutine: i = " . "@i\n";
print "In main program before calling subroutine: j = " . "@j\n";

# The backslashes pass references to the arrays, not their elements
reference_sub(\@i, \@j);

print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";

exit;

################################################################################
# Subroutines
################################################################################

sub reference_sub {

    # Each argument is a reference, stored in a scalar
    my($i, $j) = @_;

    print "In subroutine : i = " . "@$i\n";
    print "In subroutine : j = " . "@$j\n";

    # These change the original arrays in the main program
    push(@$i, '4');
    shift(@$j);
}
This gives the following output:
In main program before calling subroutine: i = 1 2 3
In main program before calling subroutine: j = a b c
In subroutine : i = 1 2 3
In subroutine : j = a b c
In main program after calling subroutine: i = 1 2 3 4
In main program after calling subroutine: j = b c
The subroutine can now distinguish between the two arrays passed in as arguments. The changes that were made inside the subroutine to the variables remain in effect after the subroutine has ended, and you've returned to the main program. This is the essential characteristic of pass by reference.
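The same technique extends to a mixture of scalars, arrays, and hashes. The following is a rough sketch under assumed names (tally_bases, $label, @bases, and %counts are all invented here). It passes a plain scalar together with an array reference and a hash reference. It also shows one way to leave an argument untouched when that is what you want: copy the dereferenced array into a my variable and work on the copy.

```perl
#!/usr/bin/perl
# Sketch: passing a scalar, an array, and a hash in one call
# (tally_bases, $label, @bases, and %counts are illustrative names)

use strict;
use warnings;

my $label  = 'fragment';
my @bases  = ('A', 'C', 'G', 'T', 'A');
my %counts = ();

tally_bases($label, \@bases, \%counts);

foreach my $base (sort keys %counts) {
    print "$label: $base occurs $counts{$base} time(s)\n";
}
print "Array in main program is unchanged: @bases\n";

sub tally_bases {

    my($label, $bases, $counts) = @_;

    print "Counting bases in $label\n";

    # Copy the array, so the shifts below do not reach the caller's @bases
    my @copy = @$bases;

    while (@copy) {
        my $base = shift(@copy);

        # The hash is used through its reference, so the caller sees the counts
        $$counts{$base}++;
    }
}
```

After the call, %counts in the main program holds 2 for A and 1 each for C, G, and T, while @bases still has all five elements.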