Example 6-3 is another program that uses subroutines. You'll also use the command line to give the program information it needs (such as filenames, or strings of DNA) without having to interactively answer the program's prompts. This is useful if you're scheduling a program to run at a time when you won't be there, for instance.
Example 6-3 also shows a little more about using arrays. You'll see how to use subscripts to access a specific element of an array.
For command-line programs, you type the name of the program, followed by the arguments to the program, if any, and then hit the Enter (or Return) key to start the program running. In Example 6-3, when the user types the program name, she follows that with the argument, which, in this case, is just the string of DNA in which she'll count the G's. So the program is called and returns an answer like so:
% perl example6-3.pl AAGGGGTTTCCC
The DNA AAGGGGTTTCCC has 4 G's in it!
Of course, many programs come with a graphical user interface (GUI). This gives the program some or all of the computer screen and usually includes such things as menus, buttons, and places to type in values to set parameters from the keyboard.
However, many programs are run from a command line. Even the newer Mac OS X, which is built on top of Unix, now provides a command line. (Although most Windows users don't use the MS-DOS command window much, it's still useful, e.g., for running Perl programs.) As already mentioned, running a program noninteractively, passing parameters in as command-line arguments, allows you to run the program automatically, say in the middle of the night when no one is actually sitting at the computer.
Example 6-3 counts the number of G's in a string of DNA.
Example 6-3. Counting the G's in some DNA on the command line
#!/usr/bin/perl -w
# Counting the number of G's in some DNA on the command line

use strict;

# Collect the DNA from the arguments on the command line
#   when the user calls the program.
# If no arguments are given, print a USAGE statement and exit.
# $0 is a special variable that has the name of the program.
my($USAGE) = "$0 DNA\n\n";

# @ARGV is an array containing all command-line arguments.
#
# If it is empty, the test will fail and the print USAGE and exit
#   statements will be called.
unless(@ARGV) {
    print $USAGE;
    exit;
}

# Read in the DNA from the argument on the command line.
my($dna) = $ARGV[0];

# Call the subroutine that does the real work, and collect the result.
my($num_of_Gs) = countG ( $dna );

# Report the result and exit.
print "\nThe DNA $dna has $num_of_Gs G's in it!\n\n";

exit;

################################################################################
# Subroutines for Example 6-3
################################################################################

sub countG {
    # return a count of the number of G's in the argument $dna

    # initialize arguments and variables
    my($dna) = @_;

    my($count) = 0;

    # Use the fourth method of counting nucleotides in DNA, as shown in
    # Chapter Five, "Motifs and Loops"
    $count = ( $dna =~ tr/Gg//);

    return $count;
}
Now let's look at how this program works, while examining and explaining the new features. For starters, notice the new line:
use strict;
which I will use from now on to ensure all variables are declared with my, thus enforcing lexical scoping.
Perl has some special variables it sets so you can easily use the arguments from the command line. Every Perl program has an array variable @ARGV that contains any command-line arguments. Also, there's a special variable called $0 (a zero) that has the name of the program as it was called from the command line.
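As a minimal sketch of these two special variables, the following script reports the name it was called with and the arguments it received. The subroutine name describe_call and the script name show_args.pl used below are assumptions made for illustration; they are not part of Example 6-3.

```perl
#!/usr/bin/perl -w
# Hypothetical helper script (not from the book): report $0 and @ARGV.
use strict;

# describe_call is an illustrative name. It builds the report as a string
# from a program name and a list of arguments.
sub describe_call {
    my($program, @arguments) = @_;

    my($report) = "Program name: $program\n";

    # An array in scalar context gives the number of elements it holds.
    $report .= "Number of arguments: " . scalar(@arguments) . "\n";

    # List each argument in the order it was given.
    foreach my $argument (@arguments) {
        $report .= "Argument: $argument\n";
    }

    return $report;
}

# $0 holds the program name as called; @ARGV holds the arguments.
print describe_call($0, @ARGV);
```

If this were saved as show_args.pl and run as perl show_args.pl AAGG TTCC, it should report the program name show_args.pl, a count of 2, and the two arguments in order.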
Notice in Example 6-3 that an informative message is defined in the variable $USAGE and that it begins with the value of the variable $0, followed by an indication of the arguments the program needs. This is a common practice; if the user doesn't give the program what it needs, which is determined by some kind of test, the program prints information about how to properly use it and exits.
In fact, this program does check whether any arguments were typed on the command line. It tests @ARGV, which evaluates to true if it has anything in it and to false if it is completely empty. To require that an argument be given, you can use the unless conditional: if @ARGV is empty, the program prints the $USAGE statement and exits:
unless(@ARGV) {
    print $USAGE;
    exit;
}
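The truth test on an array can be tried out in isolation. The sketch below is only an illustration: the subroutine name has_arguments and the arrays @empty and @one are made up for this purpose and stand in for @ARGV.

```perl
use strict;
use warnings;

# has_arguments is a hypothetical helper. An array tested in a condition
# is true when it holds at least one element and false when it is empty.
sub has_arguments {
    my(@arguments) = @_;

    if (@arguments) {
        return 1;
    } else {
        return 0;
    }
}

# Two made-up arrays standing in for @ARGV.
my(@empty) = ();
my(@one)   = ('AAGGGGTTTCCC');

print "Empty array: ", has_arguments(@empty), "\n";
print "One argument: ", has_arguments(@one), "\n";

# unless runs its block only when the test is false,
# which here means the array is empty.
unless (@empty) {
    print "No arguments were given\n";
}
```

The first print reports 0 and the second reports 1, and the unless block runs because @empty has no elements.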
The next bit of code shows something new about arrays: how to extract one element from an array by its subscript. In other words, it shows how to get at the first, fourth, or whichever element. The code in Example 6-3 extracts the first element, which, as you've seen, is numbered 0:
my($dna) = $ARGV[0];
Now you already know there is a first element, since you've just tested to make sure the array isn't empty. You get the first element of the array @ARGV by changing the @ to a $ and appending square brackets containing the desired subscript: 0 for the first element, 1 for the second element, and so on. This syntax indicates that, since you're now looking at just one element of the array, and that element is a scalar value, you use the dollar sign, as you would for any other scalar variable.
In Example 6-3, you copy this first (and only) element of the command-line array @ARGV into the variable $dna.
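The same subscript syntax works for any array and any position. The following sketch uses a made-up array @bases in place of @ARGV, and a hypothetical subroutine nth_element, to pull out the first, second, and fourth elements.

```perl
use strict;
use warnings;

# @bases is a made-up array for illustration; it stands in for @ARGV.
my(@bases) = ('A', 'C', 'G', 'T');

# nth_element is a hypothetical helper that returns the element
# stored at the given subscript.
sub nth_element {
    my($subscript, @array) = @_;

    return $array[$subscript];
}

# Subscripts start at 0, so the fourth element has subscript 3.
my($first)  = $bases[0];
my($fourth) = $bases[3];

print "First: $first\n";
print "Fourth: $fourth\n";
print "Second: ", nth_element(1, @bases), "\n";
```

This prints A for the first element, T for the fourth, and C for the second.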
Finally comes the call to the subroutine, which contains nothing new but fulfills a dream from the final paragraph of Chapter 5:
my($num_of_Gs) = countG ( $dna );
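To try the counting step without the command line, here is a small sketch under stated assumptions: count_g is a hypothetical stand-in for countG, and the DNA strings are written directly into the script instead of being read from @ARGV.

```perl
use strict;
use warnings;

# count_g is a hypothetical stand-in for countG in Example 6-3.
sub count_g {
    my($dna) = @_;

    # tr returns the number of characters it matched. With an empty
    # replacement list, the string itself is left unchanged.
    my($count) = ($dna =~ tr/Gg//);

    return $count;
}

my($dna) = 'AAGGGGTTTCCC';
my($num_of_Gs) = count_g($dna);

print "The DNA $dna has $num_of_Gs G's in it!\n";

# Both uppercase and lowercase g's are counted.
print "Lowercase count: ", count_g('aaggtt'), "\n";
```

The first line of output reports 4 G's, and the second reports 2. The subroutine works on its own copy of the argument, so the caller's $dna is still AAGGGGTTTCCC afterward.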