There are several input operators we'll discuss here because they parse as terms. Sometimes we call them pseudoliterals because they act like quoted strings in many ways. (Output operators like print parse as list operators and are discussed in Chapter 29.)
First of all, we have the command input operator, also known as the backtick operator, because it looks like this:
$info = `finger $user`;
A string enclosed by backticks (grave accents, technically) first undergoes variable interpolation just like a double-quoted string. The result is then interpreted as a command line by the system, and the output of that command becomes the value of the pseudoliteral. (This is modeled after a similar operator in Unix shells.) In scalar context, a single string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.)
The command is executed each time the pseudoliteral is evaluated. The numeric status value of the command is saved in $? (see Chapter 28 for the interpretation of $?, also known as $CHILD_ERROR). Unlike the csh version of this command, no translation is done on the return data; newlines remain newlines. Unlike in any of the shells, single quotes in Perl do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. The $user in our finger example above is interpolated by Perl, not by the shell. (Because the command undergoes shell processing, see Chapter 23 for security concerns.)
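As a rough illustration of these points, the following sketch runs a harmless command in both contexts and inspects $?. It assumes a Unix-like system where backticks go through /bin/sh and echo is available; the variable names and the DEMO_GREETING environment variable are invented for the example.

```perl
use strict;
use warnings;

my $word = "hello";

# Scalar context: all of the output comes back as one string.
my $all = `echo $word; echo world`;

# List context: one element per line of output, newlines included.
my @lines = `echo $word; echo world`;

# $? holds the wait status of the last command; the exit code is in the high byte.
my $exit_code = $? >> 8;

# A backslash hides the $ from Perl, so the shell expands the variable instead.
$ENV{DEMO_GREETING} = "from the shell";    # hypothetical variable for the demo
my $from_shell = `echo \$DEMO_GREETING`;
```

Here $word is interpolated by Perl before the shell ever sees the command, whereas \$DEMO_GREETING reaches the shell as $DEMO_GREETING and is expanded there.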
The generalized form of backticks is qx// (for "quoted execution"), but the operator works exactly the same way as ordinary backticks. You just get to pick your quote characters. As with similar quoting pseudofunctions, if you happen to choose a single quote as your delimiter, the command string doesn't undergo double-quote interpolation:
$perl_info  = qx(ps $$);    # that's Perl's $$
$shell_info = qx'ps $$';    # that's the shell's $$
The most heavily used input operator is the line input operator, also known as the angle operator or the readline function (since that's what it calls internally). Evaluating a filehandle in angle brackets (<STDIN>, for example) yields the next line from the associated filehandle. (The newline is included, so according to Perl's criteria for truth, a freshly input line is always true, up until end-of-file, at which point an undefined value is returned, which is conveniently false.) Ordinarily, you would assign the input value to a variable, but there is one situation where an automatic assignment happens. If and only if the line input operator is the only thing inside the conditional of a while loop, the value is automatically assigned to the special variable $_. The assigned value is then tested to see whether it is defined. (This may seem like an odd thing to you, but you'll use the construct frequently, so it's worth learning.) Anyway, the following lines are equivalent:
while (defined($_ = <STDIN>)) { print $_; }   # the longest way
while ($_ = <STDIN>) { print; }               # explicitly to $_
while (<STDIN>) { print; }                    # the short way
for (;<STDIN>;) { print; }                    # while loop in disguise
print $_ while defined($_ = <STDIN>);         # long statement modifier
print while $_ = <STDIN>;                     # explicitly to $_
print while <STDIN>;                          # short statement modifier
Remember that this special magic requires a while loop. If you use the input operator anywhere else, you must assign the result explicitly if you want to keep the value:
while (<FH1> && <FH2>) { … }               # WRONG: discards both inputs
if (<STDIN>) { print; }                    # WRONG: prints old value of $_
if ($_ = <STDIN>) { print; }               # suboptimal: doesn't test defined
if (defined($_ = <STDIN>)) { print; }      # best
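To see why the test is for definedness rather than plain truth, consider input whose last line is a bare 0 with no trailing newline. The sketch below reads from an in-memory string rather than STDIN (an assumption made only so the example is self-contained); the variable names are illustrative.

```perl
use strict;
use warnings;

my $data = "first\nsecond\n0";    # final line is "0" with no newline

# The while magic tests defined(), so the trailing "0" is still processed.
open(my $in, "<", \$data) or die "Can't open in-memory file: $!";
my @magic;
while (<$in>) {                   # really: while (defined($_ = <$in>))
    chomp;
    push @magic, $_;
}
close $in;

# A plain truth test stops short: the string "0" is false in Perl.
open($in, "<", \$data) or die "Can't open in-memory file: $!";
my @plain;
for my $attempt (1 .. 3) {
    my $line = <$in>;
    push @plain, $line if $line;  # skips the final "0"
}
close $in;
```

After this runs, @magic holds all three lines, while @plain holds only the first two.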
When you're implicitly assigning to $_ in a while loop, this is the global variable by that name, not one localized to the while loop. You can protect an existing value of $_ this way:
while (local $_ = <STDIN>) { print; } # use local $_
Any previous value is restored when the loop is done. $_ is still a global variable, though, so functions called from inside that loop could still access it, intentionally or otherwise. You can avoid this, too, by declaring a lexical variable:
while (my $line = <STDIN>) { print $line; } # now private
(Both of these while loops still implicitly test for whether the result of the assignment is defined, because my and local don't change how assignment is seen by the parser.) The filehandles STDIN, STDOUT, and STDERR are predefined and pre-opened. Additional filehandles may be created with the open or sysopen functions. See those functions' documentation in Chapter 29 for details on this.
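The difference between the implicit assignment and a lexical loop variable can be sketched as follows. The example reads from an in-memory string instead of STDIN so that it runs on its own; that substitution and all variable names are assumptions of the sketch.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\n";
$_ = "outer value";

# A lexical loop variable leaves the global $_ alone.
open(my $in, "<", \$data) or die "Can't open in-memory file: $!";
my @seen;
while (my $line = <$in>) {        # still tested with defined()
    chomp $line;
    push @seen, $line;
}
close $in;
my $after_lexical = $_;           # still "outer value"

# The implicit form assigns to the global $_ on every iteration,
# including the final undef that ends the loop.
open($in, "<", \$data) or die "Can't open in-memory file: $!";
while (<$in>) { }
close $in;
my $after_implicit = $_;          # now undef
```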
In the while loops above, we were evaluating the line input operator in a scalar context, so the operator returns each line separately. However, if you use the operator in a list context, a list consisting of all remaining input lines is returned, one line per list element. It's easy to make a large data space this way, so use this feature with care:
$one_line = <MYFILE>;     # Get first line.
@all_lines = <MYFILE>;    # Get the rest of the lines.
There is no while magic associated with the list form of the input operator, because the condition of a while loop always provides a scalar context (as does any conditional).
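A small self-contained sketch of the two contexts follows. It uses an in-memory filehandle in place of MYFILE, which is an assumption made so the example needs no external file.

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\nline 3\n";
open(my $in, "<", \$data) or die "Can't open in-memory file: $!";

my $first  = <$in>;    # scalar context: just the next line
my @rest   = <$in>;    # list context: every remaining line
my $at_eof = <$in>;    # nothing left, so undef

close $in;
```

For large inputs, a while loop that handles one line at a time keeps memory use flat, whereas the list form holds everything at once.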
Using the null filehandle within the angle operator is special; it emulates the command-line behavior of typical Unix filter programs such as sed and awk. When you read lines from <>, it magically gives you all the lines from all the files mentioned on the command line. If no files were mentioned, it gives you standard input instead, so your program is easy to insert into the middle of a pipeline of processes.
Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. More explicitly, the loop:
while (<>) {
    …                       # code for each line
}
is equivalent to the following Perl-like pseudocode:
@ARGV = ('-') unless @ARGV;     # assume STDIN iff empty
while (@ARGV) {
    $ARGV = shift @ARGV;        # shorten @ARGV each time
    if (!open(ARGV, $ARGV)) {
        warn "Can't open $ARGV: $!\n";
        next;
    }
    while (<ARGV>) {
        …                       # code for each line
    }
}
except that it isn't so cumbersome to say, and will actually work. It really does shift the @ARGV array and put the current filename into the global variable $ARGV. It also uses the special filehandle ARGV internally; <> is just a synonym for the more explicitly written <ARGV>, which is a magical filehandle. (The pseudocode above doesn't work because it treats <ARGV> as nonmagical.)
You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Because Perl uses its normal open function here, a filename of "-" counts as standard input wherever it is encountered, and the more esoteric features of open are automatically available to you (such as opening a "file" named "gzip -dc < file.gz|").
Line numbers ($.) continue as if the input were one big happy file. (But see the example under eof in Chapter 29 for how to reset line numbers on each file.)
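The behavior described above can be observed with a short sketch. It writes two throwaway files with the core File::Temp module, puts their names in @ARGV, and then reads them through <>. The file contents and variable names are invented for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files to stand in for command-line arguments.
my @paths;
for my $text ("a1\na2\n", "b1\n") {
    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} $text;
    close $tmp or die "Can't close $path: $!";
    push @paths, $path;
}

@ARGV = @paths;                   # must be set before the first <>

my (@log, %lines_per_file);
while (my $line = <>) {
    chomp $line;
    push @log, "$.:$line";        # $. keeps counting across files
    $lines_per_file{$ARGV}++;     # $ARGV names the file currently being read
}
```

When the loop finishes, @ARGV is empty because each filename was shifted off as its file was opened, and @log shows the line numbers running from 1 to 3 across both files.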
If you want to set @ARGV to your own list of files, go right ahead:
# default to README file if no args given
@ARGV = ("README") unless @ARGV;
If you want to pass switches into your script, you can use one of the Getopt::* modules or put a loop on the front like this:
while (@ARGV and $ARGV[0] =~ /^-/) {
    $_ = shift;
    last if /^--$/;
    if (/^-D(.*)/) { $debug = $1 }
    if (/^-v/)     { $verbose++  }
    …               # other switches
}
while (<>) {
    …               # code for each line
}
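For comparison, here is a minimal sketch of the module route using Getopt::Long, one of the Getopt::* modules shipped with Perl. It is not a drop-in replacement for the loop above: the switch spellings (--debug with an integer value, a repeatable -v) and the sample argument list are assumptions chosen for the example, and GetOptionsFromArray is used so the sketch doesn't touch the real @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument list standing in for @ARGV.
my @args = ('-v', '-v', '--debug=2', 'input.txt');

my ($debug, $verbose) = (0, 0);
GetOptionsFromArray(
    \@args,
    'debug|D=i'  => \$debug,      # --debug=N or -D N, integer value
    'verbose|v+' => \$verbose,    # each -v increments the counter
) or die "Bad command-line switches\n";

# Whatever is left in @args is the list of non-switch arguments.
```

In a real script you would call GetOptions instead, which works on @ARGV directly and leaves the filenames there for <> to consume.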
The <> symbol will return false only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, it will input from STDIN.
If the string inside the angle brackets is a scalar variable (for example, <$foo>), that variable contains an indirect filehandle, either the name of the filehandle to input from or a reference to such a filehandle. For example:
$fh = \*STDIN;      # reference to the STDIN typeglob
$line = <$fh>;
or:
open($fh, "<", "data.txt") or die "Can't open data.txt: $!";
$line = <$fh>;
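One practical consequence is that a filehandle held in a scalar can be passed to a subroutine. The sketch below defines a hypothetical helper, first_line, and calls it with both a lexical handle and a typeglob reference; the in-memory files and the DEMO handle are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the next line (chomped) from any handle it is given.
sub first_line {
    my ($fh) = @_;
    my $line = <$fh>;             # indirect filehandle in angle brackets
    chomp $line if defined $line;
    return $line;
}

my $data = "header\nbody\n";

# A lexical filehandle stored in a scalar.
open(my $in, "<", \$data) or die "Can't open in-memory file: $!";
my $header = first_line($in);
my $body   = first_line($in);
my $none   = first_line($in);     # undef at end-of-file
close $in;

# A reference to a named (bareword) filehandle works the same way.
open(DEMO, "<", \$data) or die "Can't open in-memory file: $!";
my $via_globref = first_line(\*DEMO);
close DEMO;
```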
You might wonder what happens to a line input operator if you put something fancier inside the angle brackets. What happens is that it mutates into a different operator. If the string inside the angle brackets is anything other than a filehandle name or a scalar variable (even if there are just extra spaces), it is interpreted as a filename pattern to be "globbed".[19] The pattern is matched against the files in the current directory (or the directory specified as part of the fileglob pattern), and the filenames so matched are returned by the operator. As with line input, names are returned one at a time in scalar context, or all at once in list context. The latter usage is more common; you often see things like:
@files = <*.xml>;
As with other kinds of pseudoliterals, one level of variable interpolation is done first, but you can't say <$foo> because that's an indirect filehandle as explained earlier. In older versions of Perl, programmers would insert braces to force interpretation as a fileglob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo), which is probably the right way to have invented it in the first place. So instead you'd write
@files = glob("*.xml");
if you despise overloading the angle operator for this. Which you're allowed to do.
Whether you use the glob function or the old angle-bracket form, the fileglob operator also does while magic like the line input operator, assigning the result to $_. (That was the rationale for overloading the angle operator in the first place.) For example, if you wanted to change the permissions on all your C code files, you might say:
while (glob "*.c") { chmod 0644, $_; }
which is equivalent to:
while (<*.c>) { chmod 0644, $_; }
The glob function was originally implemented as a shell command in older versions of Perl (and in even older versions of Unix), which meant it was comparatively expensive to execute and, worse still, wouldn't work exactly the same everywhere. Nowadays it's a built-in, so it's more reliable and a lot faster. See the description of the File::Glob module in Chapter 32 for how to alter the default behavior of this operator, such as whether to treat spaces in its operand (argument) as pathname separators, whether to expand tildes or braces, whether to be case insensitive, and whether to sort the return values, among other things.
Of course, the shortest and arguably the most readable way to do the chmod command above is to use the fileglob as a list operator:
chmod 0644, <*.c>;
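The three styles can be tried safely in a scratch directory. The sketch below creates one with File::Temp, fills it with a few empty files, and globs them in list context, in a while loop, and as the argument list to chmod. The filenames are invented, and the example assumes a Unix-like system and a temporary directory path without spaces (spaces would be treated as pattern separators by default).

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Scratch directory with two C files and one unrelated file.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c notes.txt)) {
    open(my $out, ">", "$dir/$name") or die "Can't create $dir/$name: $!";
    close $out;
}

# List context: every match at once.
my @c_files = map { basename($_) } glob("$dir/*.c");

# Scalar context in a while condition: one match per iteration.
my @from_loop;
while (my $path = glob("$dir/*.c")) {
    push @from_loop, basename($path);
}

# As a list operator argument; chmod returns the number of files changed.
my $changed = chmod 0644, glob("$dir/*.c");
```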
A fileglob evaluates its (embedded) operand only when starting a new list. All values must be read before the operator will start over. In a list context, this isn't important because you automatically get them all anyway. In a scalar context, however, the operator returns the next value each time it is called, or a false value if you've just run out. Again, false is returned only once. So if you're expecting a single value from a fileglob, it is much better to say:
($file) = <blurch*>; # list context
than to say:
$file = <blurch*>; # scalar context
because the former returns all matched filenames and resets the operator, whereas the latter alternates between returning filenames and returning false.
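The alternation is easy to observe when exactly one file matches. In this sketch (scratch directory and filename are assumptions, created with File::Temp), the same scalar-context glob is evaluated four times in a loop, followed by the list-context form.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Scratch directory containing a single matching file.
my $dir = tempdir(CLEANUP => 1);
open(my $out, ">", "$dir/blurch1") or die "Can't create $dir/blurch1: $!";
close $out;

# Scalar context: the operator steps through its list, then returns undef once.
my @scalar_results;
for my $attempt (1 .. 4) {
    my $file = glob("$dir/blurch*");
    push @scalar_results, defined $file ? basename($file) : "(undef)";
}

# List context: the whole list is consumed each time, so the first match
# is available on every evaluation.
my @list_results;
for my $attempt (1 .. 2) {
    my ($file) = glob("$dir/blurch*");
    push @list_results, basename($file);
}
```

The scalar loop collects blurch1, (undef), blurch1, (undef), while the list-context loop collects blurch1 both times.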
If you're trying to do variable interpolation, it's definitely better to use the glob operator because the older notation can cause confusion with the indirect filehandle notation. This is where it becomes apparent that the borderline between terms and operators is a bit mushy:
@files = <$dir/*.[ch]>;         # Works, but avoid.
@files = glob("$dir/*.[ch]");   # Call glob as function.
@files = glob $some_pattern;    # Call glob as operator.
We left the parentheses off of the last example to illustrate that glob can be used either as a function (a term) or as a unary operator; that is, a prefix operator that takes a single argument. The glob operator is an example of a named unary operator, which is just one kind of operator we'll talk about in the next chapter. Later, we'll talk about pattern-matching operators, which also parse like terms but behave like operators.
[19] Fileglobs have nothing to do with the previously mentioned typeglobs, other than that they both use the * character in a wildcard fashion. The * character has the nickname "glob" when used like this. With typeglobs, you're globbing symbols with the same name from the symbol table. With a fileglob, you're doing wildcard matching on the filenames in a directory, just as the various shells do.