This chapter continues the introduction to Perl basics begun in Chapter 4. By the end of the chapter, you will know how to:
Search for motifs in DNA or protein
Interact with users at the keyboard
Write data to files
Use loops
Use basic regular expressions
Take different actions depending on the outcome of conditional tests
Examine sequence data in detail by operating on strings and arrays
These topics, in addition to what you learned in Chapter 4, will give you the skills necessary to begin to write useful bioinformatics programs; in this chapter, you will learn to write a program that looks for motifs in sequence data.
Flow control is the order in which the statements of a program are executed. A program executes from the first statement at the top of the program to the last statement at the bottom, in order, unless told to do otherwise. There are two ways to tell a program to do otherwise: conditional statements and loops. A conditional statement executes a group of statements only if the conditional test succeeds; otherwise, it just skips the group of statements. A loop repeats a group of statements until an associated test fails.
Let's take another look at the open statement. Recall that if you try to open a nonexistent file, you get error messages. You can test for the existence of a file explicitly, before trying to open it. In fact, such tests are among the most powerful features of computer languages. The if, if-else, and unless conditional statements are three such testing mechanisms in Perl.
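As a minimal sketch of testing for a file before opening it, the following uses Perl's -e file test, which is true if the named file exists. The filename is hypothetical and is assumed not to exist; substitute a file of your own to see the other branch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filename; assumed NOT to exist in the current directory.
my $filename = 'no_such_sequence_file.pep';
my $status;

# -e is true if the named file exists, false otherwise.
if ( -e $filename ) {
    $status = 'exists';
    print "$filename exists; it is safe to try opening it.\n";
} else {
    $status = 'missing';
    print "$filename does not exist; not opening it.\n";
}
```

A file can exist and still fail to open (for example, if you lack permission to read it), so a test like this does not replace checking whether the open itself succeeded.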
The main feature of these constructs is a conditional test. A conditional evaluates to a true or false value. With if, the statements that follow are executed when the conditional is true and skipped when it is false; unless works the other way around.
However, "What is truth?" It's a question that programming languages may answer in slightly different ways.
This section contains a few examples that demonstrate some of Perl's conditionals. The true-false condition in each example is equality between two numbers. Notice that numeric equality is represented by two equal signs, ==, because the single equal sign, =, is already used for assignment to a variable. Confusion between = for assignment and == for numeric equality is a frequent programming bug, so watch for it!
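Here is a small sketch of that bug; the variable names are illustrative. The first test compares two numbers with ==. The second mistakenly uses =, which assigns instead of comparing. Because the value of an assignment is the value assigned, and 1 is true, the second block runs, and $count is changed as a side effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count  = 5;
my $wanted = 1;

# Correct: == compares. 5 does not equal 1, so this block is skipped.
if ( $count == $wanted ) {
    print "count equals $wanted\n";
}

# Bug: a single = ASSIGNS $wanted to $count. The value of the
# assignment is 1, which is true, so this block always runs.
if ( $count = $wanted ) {
    print "Oops: count is now $count\n";
}
```

Running it prints only the line "Oops: count is now 1", even though $count started out as 5.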
The following examples show whether each conditional evaluates to true or false. You don't ordinarily have much use for such simple tests. Usually you test the values that have been read into variables or the return values of function calls: things you don't necessarily know beforehand.
The if statement with a true conditional:
if ( 1 == 1 ) { print "1 equals 1\n"; }
produces the output:
1 equals 1
The test is 1 == 1, or, in English, "Does 1 equal 1?" Since it does, the conditional evaluates to true, the statement associated with the if statement is executed, and a message is printed out.
You can also just say:
if ( 1 ) { print "1 evaluates to true\n"; }
which produces the output:
1 evaluates to true
The if statement with a false conditional:
if ( 1 == 0 ) { print "1 equals 0\n"; }
produces no output! The test is 1 == 0, or, in English, "Does 1 equal 0?" Since it doesn't, the conditional evaluates to false, the statement associated with the if statement isn't executed, and no message is printed out.
You can also just say:
if ( 0 ) { print "0 evaluates to true\n"; }
which produces no output: 0 evaluates to false, so the statement associated with the if statement is skipped entirely.
There's another way to write short if statements that mirrors how the English language works. In English, you can say, equivalently, "If you build it, they will come" or "They will come if you build it." Not to be outdone, Perl also allows you to put the if after the action:
print "1 equals 1\n" if ( 1 == 1 );
which does the same thing as the first example in this section and prints out:
1 equals 1
Now, let's look at an if-else statement with a true conditional:
if ( 1 == 1 ) { print "1 equals 1\n"; } else { print "1 does not equal 1\n"; }
which produces the output:
1 equals 1
The if-else does one thing if the test evaluates to true and another if the test evaluates to false. Here is if-else with a false conditional:
if ( 1 == 0 ) { print "1 equals 0\n"; } else { print "1 does not equal 0\n"; }
which produces the output:
1 does not equal 0
The final example is unless, the opposite of if. It works like the English word "unless": e.g., "Unless you study Russian literature, you are ignorant of Chekhov." If the conditional evaluates to true, no action is taken; if it evaluates to false, the associated statements are executed. If "you study Russian literature" is false, "you are ignorant of Chekhov."
unless ( 1 == 0 ) { print "1 does not equal 0\n"; }
which produces the output:
1 does not equal 0
Two more comments are in order about these statements and their conditional tests.
First, there are several tests that can be used in the conditional part of these statements. In addition to numeric equality (==), as in the previous examples, you can test for numeric inequality (!=), greater than (>), less than (<), and more. For strings, you test equality with the eq operator: the test is true if the two strings are identical. There are also file test operators that let you test whether a file exists, whether it is empty, whether its permissions are set a certain way, and so on (see Appendix B). One common test is just a variable name: if the variable contains the number zero, it's considered false; any other number evaluates to true. If the variable contains a nonempty string, it evaluates to true, with one exception: the string "0" is also false. The empty string, designated by "" or '', is false, as is a variable that has no value at all (that is, one that is undefined).
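The following sketch checks a handful of values for truth and compares == with eq. It uses a foreach loop, a construct covered later in this chapter, to visit each value in turn; the values themselves are arbitrary illustrations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values to test for truth; only the first four are false in Perl.
my @values = ( 0, '0', '', undef, 1, -3, 'ACGT', '0.0', ' ' );
my @truth;

foreach my $value (@values) {
    # Show undef by name, since it has no printable value.
    my $label = defined $value ? "'$value'" : 'undef';
    if ($value) {
        push @truth, 'true';
        print "$label is true\n";
    } else {
        push @truth, 'false';
        print "$label is false\n";
    }
}

# == compares numbers; eq compares strings character by character.
my $numeric_equal = ( 10 == 10.0 )         ? 1 : 0;   # same number
my $string_equal  = ( '10' eq '10.0' )     ? 1 : 0;   # different strings
my $case_equal    = ( 'ACGT' eq 'acgt' )   ? 1 : 0;   # eq is case sensitive
print "10 == 10.0: $numeric_equal; '10' eq '10.0': $string_equal; ",
      "'ACGT' eq 'acgt': $case_equal\n";
```

Only 0, "0", "", and undef come out false; "0.0" and " " are nonempty strings other than "0", so they are true. The comparisons show why the choice of operator matters: 10 and 10.0 are the same number but different strings.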
Second, notice that the statements that follow the conditional are enclosed within a matching pair of curly braces. These statements within curly braces are called a block, and blocks arise frequently in Perl.[1] Matching pairs of parentheses, square brackets, angle brackets, and curly braces, i.e., ( ), [ ], < >, and { }, are common programming features. Every opening brace must have a matching closing brace in the right place for a Perl program to run correctly.
Matching braces are easy to lose track of, so don't be surprised if you miss some and get error messages when you try to run the program. This is a common syntax error; you have to go back and find the missing brace. As code gets more complex, it can be a challenge to figure out where the matching braces are wrong and how to fix them. Even if the braces are in the right place, it can be hard to figure out what statements are grouped together when you're reading code. You can avoid this problem by writing code that doesn't try to do too much on any one line and uses indentation to further highlight the blocks of code (see Section 5.2).[2]
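As an illustration of that layout, here is a sketch with one block nested inside another. The codon list is made-up illustrative data. Each closing brace lines up with the statement that opened its block, and the contents of each block are indented one level further, so you can see at a glance which statements belong together.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative codons, not taken from a real sequence.
my @codons = ( 'ATG', 'GGC', 'TAA', 'CCT', 'TGA' );
my $starts = 0;
my $stops  = 0;

foreach my $codon (@codons) {          # outer block opens here
    if ( $codon eq 'ATG' ) {           # inner block, indented one level
        $starts++;
    } elsif ( $codon eq 'TAA' or $codon eq 'TAG' or $codon eq 'TGA' ) {
        $stops++;
    }                                  # closes the if-elsif
}                                      # closes the foreach

print "Start codons: $starts, stop codons: $stops\n";
```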
Back to the conditional statements. The if-else also has an if-elsif-else form, as in Example 5-1 (note the spelling: elsif, not elseif). The conditionals, first the if and then the elsifs, are evaluated in turn, and as soon as one evaluates to true, its block is executed and the rest of the conditionals are ignored. If none of the conditionals evaluates to true, the else block is executed if there is one; the else is optional.
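A small sketch of this first-match rule, using a hypothetical sequence length: for 150, both $length > 100 and $length > 50 are true, but only the first true branch runs, so the order of the tests matters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length = 150;    # hypothetical sequence length
my $category;

# Tests are tried top to bottom; only the FIRST true one runs.
if ( $length > 100 ) {
    $category = 'long';
} elsif ( $length > 50 ) {      # also true for 150, but never reached
    $category = 'medium';
} else {
    $category = 'short';
}

print "A sequence of length $length is $category\n";
```

If the two tests were written in the opposite order, a length of 150 would be classified as medium.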
Example 5-1. if-elsif-else
#!/usr/bin/perl -w
# if-elsif-else

$word = 'MNIDDKL';

# if-elsif-else conditionals
if ( $word eq 'QSTVSGE' ) {
    print "QSTVSGE\n";
} elsif ( $word eq 'MRQQDMISHDEL' ) {
    print "MRQQDMISHDEL\n";
} elsif ( $word eq 'MNIDDKL' ) {
    print "MNIDDKL--the magic word!\n";
} else {
    print "Is \"$word\" a peptide? This program is not sure.\n";
}

exit;
Notice the \" in the else block's print statement; it lets you print a double-quote sign (") within a double-quoted string. The backslash character tells Perl to treat the following " as the sign itself and not interpret it as the marker for the end of the string. Also note the use of eq to check for equality between strings.
Example 5-1 gives the output:
MNIDDKL--the magic word!
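Backslash escapes are not the only way to put double quotes inside a string. The sketch below, which reuses the $word value from Example 5-1, compares three forms: the escaped double-quoted string, Perl's qq{} operator (a double-quoted string with a delimiter of your choice), and a single-quoted string, which does not interpolate variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = 'MNIDDKL';

# 1. Escape each inner double quote with a backslash, as in Example 5-1.
my $escaped = "Is \"$word\" a peptide?\n";

# 2. qq{...} behaves like double quotes (variables are interpolated),
#    but the string is delimited by the braces, so " needs no backslash.
my $with_qq = qq{Is "$word" a peptide?\n};

# 3. Single quotes do NOT interpolate: $word stays as the literal text.
my $single = 'Is "$word" a peptide?';

print $escaped, $with_qq, $single, "\n";
```

The first two strings are identical; the third prints the characters $word rather than MNIDDKL.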
A loop allows you to repeatedly execute a block of statements enclosed within matching curly braces. There are several ways to loop in Perl: while loops, for loops, foreach loops, and more. Example 5-2 (which modifies Example 4-7) shows the while loop and how it's used to read protein sequence data from a file.
Example 5-2. Reading protein sequence data from a file, take 4
#!/usr/bin/perl -w
# Reading protein sequence data from a file, take 4

# The filename of the file containing the protein sequence data
$proteinfilename = 'NM_021964fragment.pep';

# First we have to "open" the file, and in case the
# open fails, print an error message and exit the program.
unless ( open(PROTEINFILE, $proteinfilename) ) {
    print "Could not open file $proteinfilename!\n";
    exit;
}

# Read the protein sequence data from the file in a "while" loop,
# printing each line as it is read.
while ( $protein = <PROTEINFILE> ) {
    print "###### Here is the next line of the file:\n";
    print $protein;
}

# Close the file.
close PROTEINFILE;

exit;
Here's the output of Example 5-2:
###### Here is the next line of the file:
MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
###### Here is the next line of the file:
SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
###### Here is the next line of the file:
GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR
In the while loop, notice how the next line of the file is assigned to the variable $protein each time through the loop. In Perl, an assignment returns the value that was assigned. Here, the test is whether the assignment succeeds in reading another line. If there is another line to read in, the assignment occurs, the conditional is true, the new line is stored in the variable $protein, and the block with the two print statements is executed. If there are no more lines, <PROTEINFILE> returns the undefined value, so the conditional is false, and the program skips the block with the two print statements, quits the while loop, and continues to the following parts of the program (in this case, the close and exit functions).
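To experiment with the while loop without creating a file, you can open a string as if it were a file. This sketch assumes Perl 5.8 or later, which supports such in-memory filehandles through the three-argument form of open; the data are the three lines shown in the output of Example 5-2. Instead of printing the lines, it counts them and their residues, using chomp to remove each line's trailing newline before measuring its length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three lines from the output of Example 5-2, held in a string.
my $data = "MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD\n"
         . "SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ\n"
         . "GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR\n";

# Open the string for reading as if it were a file.
open( my $fh, '<', \$data ) or die "Cannot open in-memory data: $!";

my $line_count    = 0;
my $residue_count = 0;

# <$fh> returns the next line, or undef at the end of the data,
# which ends the loop.
while ( my $line = <$fh> ) {
    chomp $line;                      # remove the trailing newline
    $line_count++;
    $residue_count += length($line);
}
close $fh;

print "Read $line_count lines containing $residue_count residues\n";
```

One subtlety: for this particular form, while ( my $line = <$fh> ), Perl quietly tests whether the value read is defined rather than whether it is true, so a final line consisting of just 0 (with no newline) does not end the loop early.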
The open call is a system call, because to open a file, Perl must ask the operating system for it. The operating system may be a version of Unix or Linux, a version of Microsoft Windows, one of the Apple Macintosh operating systems, and so on. Files are managed by the operating system, and programs can access them only through it.
It's a good habit to check for the success or failure of system calls, especially when opening files. If a system call fails and you're not checking for it, your program will continue, perhaps attempting to read from or write to a file you couldn't open in the first place. You should always check for failure and let the user of the program know right away when a file can't be opened. Often you may want to exit the program on failure or try to open a different file.
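Here is a minimal sketch of checking an open and reporting why it failed. It uses the three-argument form of open, where '<' means "open for reading", and the special variable $!, which holds the operating system's error message after a failed system call. The filename is hypothetical and assumed not to exist.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filename; assumed NOT to exist in the current directory.
my $proteinfilename = 'missing_protein_file.pep';
my $error_message   = '';

# open returns true on success and false on failure; after a failure,
# $! holds the operating system's explanation.
unless ( open( my $fh, '<', $proteinfilename ) ) {
    $error_message = "Could not open file $proteinfilename: $!";
    print "$error_message\n";
}
```

Many Perl programs write the same check on one line, as open(my $fh, '<', $file) or die "Cannot open $file: $!"; which reports the problem and stops the program.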
In Example 5-2, the open system call is part of the test of the unless conditional. unless is the opposite of if. Just as in English you can say "do the statements in the block if the condition is true," you can also say the opposite, "do the statements in the block unless the condition is true." The open system call returns a true value if it successfully opens the file; so here, in the conditional test of the unless statement, if the open call fails, the statements in the block are performed: the program prints an error message and then exits.
To sum up, conditionals and loops are simple ideas and not difficult to learn in Perl. They are among the most powerful features of programming languages. Conditionals allow you to tailor a program to several alternatives, and in that way, make decisions based on the type of input the program gets. They are responsible for a large part of whatever artificial intelligence there is in a computer program. Loops harness the speed of the computer so that in a few lines of code, you can handle large amounts of input or continually iterate and refine a computation.
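To round out the loop types mentioned earlier, here is a sketch of foreach and the C-style for loop working on a short stretch of protein sequence (the first ten residues of the data in Example 5-2). It uses split with an empty pattern to break the string into single residues, substr to look at one position at a time, and last to leave a loop early.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $protein = 'MNIDDKLEGL';    # first ten residues from Example 5-2's data

# split with an empty pattern breaks a string into single characters.
my @residues = split( //, $protein );

# foreach: visit each element of the array in turn.
my $d_count = 0;
foreach my $residue (@residues) {
    $d_count++ if $residue eq 'D';
}

# C-style for: initialize; test; update. Walk positions 0 .. length-1.
my $first_l = -1;
for ( my $i = 0 ; $i < length($protein) ; $i++ ) {
    if ( substr( $protein, $i, 1 ) eq 'L' ) {
        $first_l = $i;
        last;                   # leave the loop at the first L
    }
}

print "D occurs $d_count times; first L at position $first_l (counting from 0)\n";
```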