This section covers getting information into programs and receiving data back from them.
Perl has several convenient ways to get information into a program. In this book, I've emphasized opening files and reading in the information they contain, because this approach is used frequently and behaves much the same way on all operating systems. You've seen the open and close functions and how to associate a filehandle with a file when you open it; the filehandle is then used to read in the data. As an example:
open(FILEHANDLE, "informationfile");
@data_from_informationfile = <FILEHANDLE>;
close(FILEHANDLE);
This code opens the file informationfile and associates the filehandle FILEHANDLE with it. The filehandle is then used within angle brackets < > to actually read in the contents of the file and store the contents in the array @data_from_informationfile. Finally, the file is closed by referring once again to the opened filehandle.
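A slightly more defensive version of the same pattern might look like the following sketch. It assumes a lexical filehandle, the three-argument form of open, and an explicit check that the open succeeded. The temporary file and its contents are made up so that the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the example is self-contained.
# The file contents are invented for illustration.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print $tmp_fh "ACGT\nTTGA\n";
close($tmp_fh) or die "Cannot close $filename: $!\n";

# Open the file for reading and stop with a message if that fails.
open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";

# In list context, the angle operator returns every line of the file.
my @data_from_informationfile = <$fh>;

close($fh);

print "Read ", scalar(@data_from_informationfile), " lines\n";
```

Checking the return value of open is worthwhile because a missing or unreadable file otherwise shows up only later, as mysteriously empty data.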
Perl allows you to read in any input that is automatically sent to your program via standard input (STDIN). STDIN is a filehandle that by default is always open. Your program may be expecting some input that way. For instance, on a Mac, you can drag and drop a file icon onto the Perl applet for your program to make the file's contents appear in STDIN. On Unix systems, you can pipe the output of some other program into the STDIN of your program with shell commands such as:
someprog | my_perl_program
You can also pipe the contents of a file into your program with:
cat file | my_perl_program
or with:
my_perl_program < file
Your program can then read in the data (from program or file) that comes as STDIN just as if it came from a file that you've opened:
@data_from_stdin = <STDIN>;
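Reading everything into an array is convenient, but for large inputs it may be preferable to process one line at a time. The sketch below assumes a hypothetical helper, summarize_input, and uses an in-memory filehandle as a stand-in for STDIN so that it runs without any piped input.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines and characters arriving on a filehandle.
sub summarize_input {
    my ($fh) = @_;
    my ($lines, $chars) = (0, 0);
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        chomp $line;              # remove the trailing newline
        $lines++;
        $chars += length $line;
    }
    return ($lines, $chars);
}

# For a self-contained demonstration, read from a string through an
# in-memory filehandle. In a real program you would pass \*STDIN instead.
my $fake_input = "ACGT\nGGCC\nTA\n";
open(my $in, '<', \$fake_input) or die "Cannot open in-memory filehandle: $!\n";
my ($lines, $chars) = summarize_input($in);
close($in);

print "$lines lines, $chars characters\n";
```

Calling summarize_input(\*STDIN) would apply the same loop to whatever is piped or redirected into the program.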
You can also name your input files on the command line. Perl places all command-line arguments into the array @ARGV. The <> operator is shorthand for <ARGV>: the ARGV filehandle treats the array @ARGV as a list of filenames and returns the contents of all those files, one line at a time. Some of the arguments may be special flags, which should be read and removed from @ARGV if datafiles are also named, because Perl assumes that anything left in @ARGV is an input filename when it reaches a <> operator. If @ARGV is empty, <> reads from STDIN instead. The contents of the file or files are then available to the program using the angle brackets <> without a filehandle, like so:
@data_from_files = <>;
For example, on Microsoft Windows, Unix, or Mac OS X, you specify input files at the command line, like so:
% my_program file1 file2 file3
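As a rough sketch of removing a flag before reading the named files, the following example simulates a command line by assigning to @ARGV directly. The -v flag and the temporary input files are assumptions made for illustration, not part of any real program.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build two small input files so the example can run on its own.
my @files;
for my $text ("ACGT\n", "GGCC\nTTAA\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close($fh) or die "Cannot close $name: $!\n";
    push @files, $name;
}

# Simulate the command line: my_program -v file1 file2
# (the -v flag is hypothetical).
@ARGV = ('-v', @files);

# Read and remove the flag so that only filenames remain in @ARGV.
my $verbose = 0;
if (@ARGV and $ARGV[0] eq '-v') {
    $verbose = 1;
    shift @ARGV;
}

# <> now reads every line of every file named in @ARGV, in order.
my @data_from_files = <>;

print "verbose=$verbose, lines read: ", scalar(@data_from_files), "\n";
```

For programs with more than one or two flags, the standard Getopt::Long module handles this parsing and leaves the remaining filenames in @ARGV.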
The print statement is the most common way to output data from a Perl program. The print statement takes as arguments a list of scalars separated by commas. An array can be an argument, in which case the elements of the array are all printed one after the other:
@array = ('DNA', 'RNA', 'Protein');
print @array;
This prints out:
DNARNAProtein
If you want to put spaces between the elements of an array, place the array between double quotes in the print statement, like this:
@array = ('DNA', 'RNA', 'Protein');
print "@array";
This prints out:
DNA RNA Protein
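The space that appears when an array is interpolated comes from the special variable $" (the list separator), which holds a single space by default. The following sketch compares interpolation with the join function, which lets you choose the separator explicitly; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my @array = ('DNA', 'RNA', 'Protein');

# Interpolation joins the elements with $" (a single space by default).
my $spaced = "@array";

# join lets you pick any separator explicitly.
my $comma_separated = join(', ', @array);

# Changing $" alters interpolation; local limits the change to this block.
my $tabbed;
{
    local $" = "\t";
    $tabbed = "@array";
}

print "$spaced\n";             # DNA RNA Protein
print "$comma_separated\n";    # DNA, RNA, Protein
print "$tabbed\n";             # the three names separated by tabs
```

In most programs join is the clearer choice, because the separator is visible at the point of use.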
The print statement accepts an optional filehandle, given as an indirect object between print and its arguments (with no comma after the filehandle), like so:
print FH "@array";
The printf function gives more control over the formatting of the output of numbers. For instance, you can specify field widths; the precision, or number of places after the decimal point; and whether the value is right- or left-justified in the field. I showed the most common options in Chapter 12 and refer you to the Perl documentation that comes with your copy of Perl for all the details.
The sprintf function is related to the printf function; it returns the formatted string instead of printing it out.
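A brief sketch of field width, precision, and justification follows. The sample name and value are made up, and the vertical bars are there only to make the field boundaries visible.

```perl
use strict;
use warnings;

my ($name, $gc_fraction) = ('sample1', 0.51234);    # made-up values

# %-10s : string, left-justified in a field 10 characters wide
# %8.3f : floating-point number, right-justified in 8 characters,
#         with 3 digits after the decimal point
my $row = sprintf("%-10s|%8.3f|", $name, $gc_fraction);

print "$row\n";    # sample1   |   0.512|

# printf formats and prints in one step.
printf("%5d sequences\n", 42);    # "   42 sequences"
```

Because sprintf returns a string, its result can be stored, compared, or passed to print later, whereas printf sends the formatted text straight to the output.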
The format and write commands are a way to format multiline output, as when generating reports. format can be a useful command, but in practice it isn't used much. The full details are available in your Perl documentation, and O'Reilly's Programming Perl contains an entire chapter on format. You can also see format in Chapter 12 of this book.
Standard output, with the filehandle STDOUT, is the default destination for output from a Perl program, so it doesn't have to be named. The following two statements are equivalent unless you used select to change the default output filehandle:
print "Hello biology world!\n";
print STDOUT "Hello biology world!\n";
Note that STDOUT isn't followed by a comma. STDOUT is usually directed to the computer screen, but it may be redirected at the command line to other programs or files. This Unix command pipes the STDOUT of my_program to the STDIN of your_program:
my_program | your_program
This Unix command directs the output of my_program to the file outputfile:
my_program > outputfile
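The effect of select on the default output filehandle can be sketched as follows. An in-memory filehandle stands in for an output file here, which is an assumption made so that the example leaves nothing behind on disk.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for an output file in this sketch.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory filehandle: $!\n";

# select returns the previously selected filehandle (normally STDOUT),
# so keep it in order to switch back later.
my $previous = select($out);
print "Hello biology world!\n";      # goes to $out, not to the screen
select($previous);

close($out);

print "Back on STDOUT\n";            # the default destination is STDOUT again
print STDOUT "Captured: $buffer";    # naming STDOUT explicitly always works
```

Saving and restoring the previous filehandle keeps the change local, so later print statements elsewhere in the program still go where their authors expect.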
It's also common to direct certain error messages to the predefined standard error filehandle STDERR or to a file you've opened for output and named with a particular filehandle. Here are examples of these two tasks:
print STDERR "If you reached this part of the program, something is terribly wrong!";
open(OUTPUTFD, ">output_file");
print OUTPUTFD "Here is the first line in the output file output_file\n";
STDERR is also usually directed to the computer screen by default, but it can be directed into a file from the command line. The syntax differs from system to system; on Unix with the sh or bash shells, for example:
myprogram 2>myprogram.error
You can also direct STDERR to a file from within your Perl program by including code such as the following before the first output to STDERR. This is the most portable way to redirect STDERR:
open (STDERR, ">myprogram.error") or die "Cannot open error file myprogram.error:$!\n";
The problem with this is that the original STDERR is lost. This method, taken from Programming Perl, saves and restores the original STDERR:
open ERRORFILE, ">myprogram.error" or die "Can't open myprogram.error";
open SAVEERR, ">&STDERR";
open STDERR, ">&ERRORFILE";
print STDERR "This will appear in error file myprogram.error\n";
# now, restore STDERR
close STDERR;
open STDERR, ">&SAVEERR";
print STDERR "This will appear on the computer screen\n";
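The same save-and-restore idea can be written with lexical filehandles and the three-argument form of open. This is only a sketch under those assumptions, not the version from Programming Perl; a temporary file stands in for myprogram.error, and the file is read back at the end to show what was captured.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for myprogram.error in this sketch.
my ($tmp_fh, $error_file) = tempfile(UNLINK => 1);
close($tmp_fh);

# Duplicate the current STDERR so it can be restored afterwards.
open(my $saved_stderr, '>&', \*STDERR) or die "Cannot save STDERR: $!\n";

# Point STDERR at the error file.
open(STDERR, '>', $error_file) or die "Cannot redirect STDERR: $!\n";
print STDERR "This will appear in the error file\n";

# Restore the original STDERR.
close(STDERR);
open(STDERR, '>&', $saved_stderr) or die "Cannot restore STDERR: $!\n";
close($saved_stderr);
print STDERR "This will appear on the computer screen\n";

# Read the error file back to confirm what was captured.
open(my $in, '<', $error_file) or die "Cannot open $error_file: $!\n";
my @captured = <$in>;
close($in);

print "Captured: @captured";
```

Every open is checked here, so a failure to save, redirect, or restore STDERR is reported instead of silently discarding error messages.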
There are a lot of details concerning filehandles not covered in this book, and redirecting one of the predefined filehandles such as STDERR can cause problems, especially as your programs get bigger and rely more on modules and libraries of subroutines. A safer approach is to define a new filehandle associated with an error file and to send all your error messages to it:
open (ERRORMESSAGES, ">myprogram.error") or die "Cannot open myprogram.error:$!\n";
print ERRORMESSAGES "This is an error message\n";
Note that the die function, and the closely related warn function, print their error messages to STDERR.