Ask almost any Perl programmer, and they'll be glad to give you reams of advice on how to program. We're no different (in case you hadn't noticed). In this chapter, rather than trying to tell you about specific features of Perl, we'll go at it from the other direction and use a more scattergun approach to describe idiomatic Perl. Our hope is that, by putting together various bits of things that seemingly aren't related, you can soak up some of the feeling of what it's like to actually "think Perl". After all, when you're programming, you don't write a bunch of expressions, then a bunch of subroutines, then a bunch of objects. You have to go at everything all at once, more or less. So this chapter is a bit like that.
There is, however, a rudimentary organization to the chapter, in that we'll start with the negative advice and work our way towards the positive advice. We don't know if that will make you feel any better, but it makes us feel better.
The biggest goof of all is forgetting to use warnings, which identifies many errors. The second biggest goof is forgetting to use strict when it's appropriate. These two pragmas can save you hours of head-banging when your program starts getting bigger. (And it will.) Yet another faux pas is to forget to consult the online FAQ. Suppose you want to find out if Perl has a round function. You might try searching the FAQ first:
% perlfaq round
Apart from those "metagoofs", there are several kinds of programming traps. Some traps almost everyone falls into, and other traps you'll fall into only if you come from a particular culture that does things differently. We've separated these out in the following sections.
Putting a comma after the filehandle in a print statement. Although it looks extremely regular and pretty to say:
print STDOUT, "goodbye", $adj, "world! "; # WRONG
this is nonetheless incorrect, because of that first comma. What you want instead is the indirect object syntax:
print STDOUT "goodbye", $adj, "world! "; # ok
The syntax works this way so that you can say:
print $filehandle "goodbye", $adj, "world! ";
where $filehandle is a scalar holding the name of a filehandle at run time. This is distinct from:
print $notafilehandle, "goodbye", $adj, "world! ";
where $notafilehandle is simply a string that is part of the list of things to be printed. See "indirect object" in the Glossary.
Using == instead of eq and != instead of ne. The == and != operators are numeric tests. The other two are string tests. The strings "123" and "123.00" are equal as numbers, but not equal as strings. Also, any nonnumeric string is numerically equal to zero. Unless you are dealing with numbers, you almost always want the string comparison operators instead.
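Here is a small sketch of the difference, using the two strings mentioned above. The variable names are ours, and the numeric warning is switched off only so the nonnumeric comparison runs quietly:

```perl
use strict;
use warnings;

my ($x, $y) = ("123", "123.00");

my $same_number = ($x == $y) ? 1 : 0;   # numeric comparison
my $same_string = ($x eq $y) ? 1 : 0;   # string comparison

my $names_equal;
{
    # Both nonnumeric strings count as 0 here. Perl would normally warn
    # that the arguments aren't numeric, so silence that for the demo.
    no warnings 'numeric';
    $names_equal = ("camel" == "llama") ? 1 : 0;
}

print "numeric: $same_number, string: $same_string, camel==llama: $names_equal\n";
```

With warnings left on, that last comparison is exactly the sort of thing Perl complains about, which is one more reason to enable them.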
Forgetting the trailing semicolon. Every statement in Perl is terminated by a semicolon or the end of a block. Newlines aren't statement terminators as they are in awk, Python, or FORTRAN. Remember that Perl is like C.
A statement containing a here document is particularly prone to losing its semicolon. It ought to look like this:
print <<'FINIS';
A foolish consistency is the hobgoblin of little minds,
adored by little statesmen and philosophers and divines.
    --Ralph Waldo Emerson
FINIS
Forgetting that a BLOCK requires braces. Naked statements are not BLOCKs. If you are creating a control structure such as a while or an if that requires one or more BLOCKs, you must use braces around each BLOCK. Remember that Perl is not like C.
Not saving $1, $2, and so on, across regular expressions. Remember that every new m/atch/ or s/ubsti/tution/ will set (or clear, or mangle) your $1, $2...variables, as well as $`, $&, and $'. One way to save them right away is to evaluate the match within a list context, as in:
my ($one, $two) = /(\w+) (\w+)/;
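A minimal sketch of why the copy matters, assuming a made-up input string in $line. The second match overwrites $1, but the saved copies survive:

```perl
use strict;
use warnings;

my $line = "hello world";   # hypothetical input

# List-context match: the captures are copied into our own variables.
my ($one, $two) = $line =~ /(\w+) (\w+)/;

# A later successful match overwrites $1, but not our copies.
my $now_in_dollar_one;
if ("other text" =~ /(\w+)/) {
    $now_in_dollar_one = $1;
}

print "$one $two / $now_in_dollar_one\n";
```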
Not realizing that a local also changes the variable's value as seen by other subroutines called within the scope of the local. It's easy to forget that local is a run-time statement that does dynamic scoping, because there's no equivalent in languages like C. See the section "Scoped Declarations" in Chapter 4. Usually you want a my anyway.
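To make the distinction concrete, here is a sketch with one package variable and three hypothetical subroutines of our own invention. Only the local version changes what the called subroutine sees:

```perl
use strict;
use warnings;

our $level = "global";

sub show_level { return $level }   # reads the package variable $level

sub with_local {
    local $level = "localized";    # temporary value, visible to callees
    return show_level();
}

sub with_my {
    my $level = "lexical";         # private to this block; callees can't see it
    return show_level();
}

my @seen = (with_local(), with_my(), show_level());
print "@seen\n";
```

The final call shows that the global gets its old value back once with_local returns.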
Losing track of brace pairings. A good text editor will help you find the pairs. Get one. (Or two.)
Using loop control statements in do {} while. Although the braces in this control structure look suspiciously like part of a loop BLOCK, they aren't.
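If you really do need an early exit, one common workaround is to wrap the whole construct in a bare block, which does count as a loop that runs once. This is only a sketch; the label ATTEMPT and the counter are ours:

```perl
use strict;
use warnings;

my @seen;
my $i = 0;

# The outer bare block is what "last" exits; do {} while is not a loop block.
ATTEMPT: {
    do {
        $i++;
        last ATTEMPT if $i == 3;   # leaves the bare block, ending the loop
        push @seen, $i;
    } while ($i < 10);
}

print "@seen\n";
```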
Saying @foo[1] when you mean $foo[1]. The @foo[1] reference is an array slice, meaning an array consisting of the single element $foo[1]. Sometimes this doesn't make any difference, as in:
print "the answer is @foo[1] ";
but it makes a big difference for things like:
@foo[1] = <STDIN>;
which will slurp up all the rest of STDIN, assign the first line to $foo[1], and discard everything else. This is probably not what you intended. Get into the habit of thinking that $ means a single value, while @ means a list of values, and you'll do okay.
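The following sketch shows the context difference without needing real input. The subroutine next_lines is a hypothetical stand-in for <STDIN> that records which context it was called in:

```perl
use strict;
use warnings;

my @calls;

# Hypothetical stand-in for <STDIN>: notes the calling context.
sub next_lines {
    push @calls, (wantarray ? "list" : "scalar");
    return wantarray ? ("first", "second", "third") : "first";
}

my (@slice_target, @scalar_target);

{
    # Perl warns that a one-element slice is better written as a scalar;
    # silence that here because the mistake is the point of the demo.
    no warnings 'syntax';
    @slice_target[1] = next_lines();   # list context: extra values are dropped
}
$scalar_target[1] = next_lines();      # scalar context: one value is requested

print "@calls\n";
```

Both arrays end up with the same element, but only the slice version asked for (and threw away) the whole list.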
Forgetting the parentheses of a list operator like my:
my $x, $y = (4, 8);     # WRONG
my ($x, $y) = (4, 8);   # ok
Forgetting to select the right filehandle before setting $^, $~, or $|. These variables depend on the currently selected filehandle, as determined by select(FILEHANDLE). The initial filehandle so selected is STDOUT. You should really be using the filehandle methods from the FileHandle module instead. See Chapter 28.
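As a sketch, here are both styles side by side for $|. We use IO::Handle here on the assumption that its autoflush method is the one you want; FileHandle inherits the same method from it:

```perl
use strict;
use warnings;
use IO::Handle;   # provides the autoflush method on filehandles

# Old idiom: select the handle, set $|, then restore the previous selection.
my $previous = select(STDERR);
$| = 1;
select($previous);

# Method form: no need to touch the currently selected handle at all.
STDOUT->autoflush(1);
```

Forgetting the final select($previous) in the first form is its own goof, since every later bare print would then go to STDERR.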
Practicing Perl Programmers should take note of the following:
Remember that many operations behave differently in a list context than they do in a scalar one. For instance:
($x) = (4, 5, 6);   # List context; $x is set to 4
$x = (4, 5, 6);     # Scalar context; $x is set to 6
@a = (4, 5, 6);
$x = @a;            # Scalar context; $x is set to 3 (the array length)
Avoid barewords if you can, especially all lowercase ones. You can't tell just by looking at it whether a word is a function or a bareword string. By using quotes on strings and parentheses around function call arguments, you won't ever get them confused. In fact, the pragma use strict at the beginning of your program makes barewords a compile-time error--probably a good thing.
You can't tell just by looking which built-in functions are unary operators (like chop and chdir), which are list operators (like print and unlink), and which are argumentless (like time). You'll want to learn them by reading Chapter 29. As always, use parentheses if you aren't sure--or even if you aren't sure you're sure. Note also that user-defined subroutines are by default list operators, but they can be declared as unary operators with a prototype of ($) or argumentless with a prototype of ().
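Here is a sketch of how those two prototypes change parsing. The subroutines double_it and answer are hypothetical, and they must be declared before the calls for the prototypes to take effect:

```perl
use strict;
use warnings;

# Hypothetical helpers; the prototypes must be seen before the calls below.
sub double_it($) { return 2 * $_[0] }   # unary operator
sub answer()     { return 42 }          # takes no arguments

my @list = (double_it 5, 6);   # parses as (double_it(5), 6)
my $sum  = answer + 1;         # parses as answer() + 1

print "@list $sum\n";
```

Without the prototypes, double_it would swallow both 5 and 6, and answer would be handed +1 as an argument.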
People have a hard time remembering that some functions default to $_, or @ARGV, or whatever, while others do not. Take the time to learn which are which, or avoid default arguments.
<FH>
is not the name of a filehandle, but an angle operator that does a line-input operation on the handle. This confusion usually manifests itself when people try to print to the angle operator:
print <FH> "hi"; # WRONG, omit angles
Remember also that data read by the angle operator is assigned to $_ only when the file read is the sole condition in a while loop:
while (<FH>) { }   # Data assigned to $_.
<FH>;              # Data read and discarded!
Don't use = when you need =~; the two constructs are quite different:
$x = /foo/;    # Searches $_ for "foo", puts result in $x
$x =~ /foo/;   # Searches $x for "foo", discards result
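A runnable sketch of the same contrast, with sample strings of our own choosing so that the two searches give different answers:

```perl
use strict;
use warnings;

my $x = "seafood";        # contains "foo"
local $_ = "barstool";    # does not contain "foo"

my $matched_x       = ($x =~ /foo/) ? 1 : 0;   # tests $x
my $matched_default = /foo/ ? 1 : 0;           # tests $_

$x = /foo/;   # assignment: $x now holds the (false) result of matching $_
```

After the last line, the original contents of $x are gone, which is usually how this goof announces itself.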
Use my for local variables whenever you can get away with it. Using local merely gives a temporary value to a global variable, which leaves you open to unforeseen side effects of dynamic scoping.
Don't use local on a module's exported variables. If you localize an exported variable, its exported value will not change. The local name becomes an alias to a new value but the external name is still an alias for the original.
Cerebral C programmers should take note of the following:
Curlies are required for if and while blocks.
You must use elsif rather than "else if" or "elif". Syntax like this:
if (expression) {
    block;
}
else if (another_expression) {    # WRONG
    another_block;
}
is illegal. The else part is always a block, and a naked if is not a block. You mustn't expect Perl to be exactly the same as C. What you want instead is:
if (expression) {
    block;
}
elsif (another_expression) {
    another_block;
}
Note also that "elif" is "file" spelled backward. Only Algol-ers would want a keyword that was the same as another word spelled backward.
The break and continue keywords from C become in Perl last and next, respectively. Unlike in C, these do not work within a do {} while construct.
There's no switch statement. (But it's easy to build one on the fly; see "Bare Blocks" and "Case Structures" in Chapter 4.)
You can't take the address of anything, although a similar operator in Perl is the backslash, which creates a reference.
ARGV must be capitalized. $ARGV[0] is C's argv[1], and C's argv[0] ends up in $0.
Syscalls such as link, unlink, and rename return true for success, not 0.
The signal handlers in %SIG deal with signal names, not numbers.
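A brief sketch of what that looks like, assuming a Unix-like system where the USR1 signal exists. The variable $caught is ours:

```perl
use strict;
use warnings;

my $caught = '';

# The hash key is the signal name without the "SIG" prefix.
$SIG{USR1} = sub {
    my ($name) = @_;      # the handler is passed the signal name
    $caught = $name;
};

kill 'USR1', $$;          # signals can be sent by name as well
```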
Sharp shell programmers should take note of the following:
Variables are prefixed with $, @, or % on the left side of the assignment as well as the right. A shellish assignment like:
camel='dromedary'; # WRONG
won't be parsed the way you expect. You need:
$camel='dromedary'; # ok
The loop variable of a foreach also requires a $. Although csh likes:
foreach hump (one two)
    stuff_it $hump
end
in Perl, this is written as:
foreach $hump ("one", "two") {
    stuff_it($hump);
}
The backtick operator does variable interpolation without regard to the presence of single quotes in the command.
The backtick operator does no translation of the return value. In Perl, you have to trim the newline explicitly, like this:
chomp($thishost = `hostname`);
Shells (especially csh) do several levels of substitution on each command line. Perl does interpolation only within certain constructs such as double quotes, backticks, angle brackets, and search patterns.
Shells tend to interpret scripts a little bit at a time. Perl compiles the entire program before executing it (except for BEGIN blocks, which execute before the compilation is done).
Program arguments are available via @ARGV, not $1, $2, and so on.
The environment is not automatically made available as individual scalar variables. Use the standard Env module if you want that to happen.
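Here is a sketch of the Env module at work. The environment variable names CAMEL_DEMO_HOME and CAMEL_DEMO_PATH are invented for the example, so nothing real gets clobbered:

```perl
use strict;
use warnings;
use Env qw(CAMEL_DEMO_HOME @CAMEL_DEMO_PATH);   # hypothetical variable names

$ENV{CAMEL_DEMO_HOME} = "/tmp/oasis";
$ENV{CAMEL_DEMO_PATH} = "/bin:/usr/bin";

# The tied variables read through to %ENV.
my $home  = $CAMEL_DEMO_HOME;
my @parts = @CAMEL_DEMO_PATH;    # split on the platform's path separator

# Assigning to the scalar updates %ENV as well.
$CAMEL_DEMO_HOME = "/tmp/dune";
```

Naming the variables you want in the import list keeps the rest of the environment out of your namespace.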
Penitent Perl 4 (and Prior) Programmers should take note of the following changes between release 4 and release 5 that might affect old scripts:
@ now always interpolates an array in double-quotish strings. Some programs may now need to use backslashes to protect any @ that shouldn't interpolate.
Barewords that used to look like strings to Perl will now look like subroutine calls if a subroutine by that name is defined before the compiler sees them. For example:
sub SeeYa { die "Hasta la vista, baby!" }
$SIG{'QUIT'} = SeeYa;
In prior versions of Perl, that code would set the signal handler. Now, it actually calls the function! You may use the -w switch to find such risky usage or use strict to outlaw it.
Identifiers starting with "_" are no longer forced into package main, except for the bare underscore itself (as in $_, @_, and so on).
A double colon is now a valid package separator in an identifier. Thus, the statement:
print "$a::$b::$c ";
now parses $a:: as the variable reference, where in prior versions only the $a was considered to be the variable reference. Similarly:
print "$var::abc::xyz ";
is now interpreted as a single variable $var::abc::xyz, whereas in prior versions, the variable $var would have been followed by the constant text ::abc::xyz.
s'$pattern'replacement' now performs no interpolation on $pattern. (The $ would be interpreted as an end-of-line assertion.) This behavior occurs only when using single quotes as the substitution delimiter; in other substitutions, $pattern is always interpolated.
The second and third arguments of splice are now evaluated in scalar context rather than in list context.
These are now semantic errors because of precedence:
shift @list + 20;     # Now parses like shift(@list + 20), illegal!
$n = keys %map + 20;  # Now parses like keys(%map + 20), illegal!
Because if those were to work, then this couldn't:
sleep $dormancy + 20;
The precedence of assignment operators is now the same as the precedence of assignment. Previous versions of Perl mistakenly gave them the precedence of the associated operator. So you now must parenthesize them in expressions like:
/foo/ ? ($a += 2) : ($a -= 2);
Otherwise:
/foo/ ? $a += 2 : $a -= 2;
would be erroneously parsed as:
(/foo/ ? $a += 2 : $a) -= 2;
On the other hand:
$a += /foo/ ? 1 : 2;
now works as a C programmer would expect.
open FOO || die is incorrect. You need parentheses around the filehandle, because open has the precedence of a list operator.
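Two ways to get the error check you meant are sketched below. The path in $missing is hypothetical and assumed not to exist, and we use a lexical filehandle with three-argument open instead of the bareword FOO:

```perl
use strict;
use warnings;

my $missing = "/no/such/dir/camel.txt";   # hypothetical path, assumed absent

# Parentheses around the arguments: || applies to open's return value.
my $paren_ok = eval {
    open(my $fh, '<', $missing) || die "can't open $missing: $!";
    1;
};
my $paren_error = $@;

# Alternative: the low-precedence "or" works without parentheses.
my $or_ok = eval {
    open my $fh, '<', $missing or die "can't open $missing: $!";
    1;
};
my $or_error = $@;
```

Both forms die here because the file is missing, which is the behavior the unparenthesized || version silently fails to give you.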
The elements of argument lists for formats are now evaluated in list context. This means you can interpolate list values now.
You can't do a goto into a block that is optimized away. Darn.
It is no longer legal to use whitespace as the name of a variable or as a delimiter for any kind of quote construct. Double darn.
The caller function now returns a false value in scalar context if there is no caller. This lets modules determine whether they're being required or run directly.
m//g now attaches its state to the searched string rather than the regular expression. See Chapter 5 for further details.
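A short sketch of what "attached to the string" means in practice. The sample string is ours, and pos reports where the next //g match on that string will start:

```perl
use strict;
use warnings;

my $text = "aXbXc";   # sample string

my $found_first  = $text =~ /X/g;    # scalar-context //g remembers where it stopped
my $after_first  = pos($text);

my $found_second = $text =~ /X/g;    # resumes after the first match
my $after_second = pos($text);

# A different pattern picks up from the same position, because the
# position lives on $text rather than on either regex.
my $tail = ($text =~ /(\w)/g) ? $1 : undef;
```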
reverse is no longer allowed as the name of a sort subroutine.
taintperl is no longer a separate executable. There is now a -T switch to turn on tainting when it isn't turned on automatically.
Double-quoted strings may no longer end with an unescaped $ or @.
The archaic if BLOCK BLOCK syntax is no longer supported.
Negative array subscripts now count from the end of the array.
The comma operator in a scalar context is now guaranteed to give a scalar context to its arguments.
The ** operator now binds more tightly than unary minus.
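In other words, the exponent is applied first and the sign afterward, as this small sketch (with variable names of our own) shows:

```perl
use strict;
use warnings;

my $negated_square = -2**2;     # parsed as -(2**2)
my $square_of_neg  = (-2)**2;   # parentheses force the other reading

print "$negated_square $square_of_neg\n";
```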
Setting $#array lower now discards array elements immediately.
delete is not guaranteed to return the deleted value for tied arrays, since this capability may be onerous for some modules to implement.
The construct "this is $$x", which used to interpolate the process ID at that point, now tries to dereference $x. $$ by itself still works fine, however.
The behavior of foreach when it iterates over a list that is not an array has changed slightly. It used to assign the list to a temporary array but now, for efficiency, no longer does so. This means that you'll now be iterating over the actual values, not copies of the values. Modifications to the loop variable can change the original values, even after the grep! For instance:
% perl4 -e '@a = (1,2,3); for (grep(/./, @a)) { $_++ }; print "@a\n"'
1 2 3
% perl5 -e '@a = (1,2,3); for (grep(/./, @a)) { $_++ }; print "@a\n"'
2 3 4
To retain prior Perl semantics, you'd need to explicitly assign your list to a temporary array and then iterate over that. For example, you might need to change:
foreach $var (grep /x/, @list) { … }
to:
foreach $var (my @tmp = grep /x/, @list) { … }
Otherwise changing $var will clobber the values of @list. (This most often happens when you use $_ for the loop variable and call subroutines in the loop that don't properly localize $_.)
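Both behaviors can be seen side by side in the following sketch. The array names are ours, and the temporary array @tmp plays the role described above:

```perl
use strict;
use warnings;

my @aliased = (1, 2, 3);
for (grep { /./ } @aliased) { $_++ }      # grep returns aliases into @aliased

my @protected = (1, 2, 3);
for (my @tmp = grep { /./ } @protected) { $_++ }   # increments the copies in @tmp

print "@aliased / @protected\n";
```

The first array is modified in place, while the second is left alone because only the copies were incremented.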
Some error messages and warnings will be different.