Chapter 9. Expand Your Perl Foo

Hacks 84-101

What exactly is a Perl guru? Is it someone who’s programmed Perl for years and years? Is it someone with a dozen modules on the CPAN? Is it someone with patches in the core or her name in the Preface?[1]

Perhaps instead a guru is someone who knows something that most people never knew existed. A real guru can apply that knowledge productively and appropriately to solve a difficult problem with apparent ease.

Want to be a Perl guru? Here’s some of the magic you may never have suspected. Absorb these secrets. Recognize the situations where you can apply them. Then you too will be a guru.

Double Your Data with Dualvars

Store twice the information in a single scalar.

Some languages are really picky about the contents of a variable. If the variable is a string, it’s always a string. If the variable is a number, it’s always a number—especially of a certain type and size.

Perl’s not that picky; it happily converts back and forth depending on what you do with the variable. Consequently, one variable may hold several different pieces of data. You can even peek inside Perl’s storage and do different things depending on how you treat your variable—returning entirely different values whether you treat it as a number or a string.

The Hack

Consider a program that has a lot of constants—say, a graphical program [Hack #16] with screen size, color depth, difficulty level, and so on. If you’re debugging such a program, it can be difficult to keep track of which value means what when you’re passing around magic numbers. It gets worse when you have to deal with flags that you AND and OR together. How do you know when a number is really an important number or just the coincidental result of a calculation that merely looks like an important number?

Instead of having to look up the symbolic names (easy for programmers to remember) for values (easy for a computer to handle) every time you debug something, consider using dualvars:

use Scalar::Util 'dualvar';

my $width   = dualvar( 800, 'Screen width'       );
my $height  = dualvar( 600, 'Screen height'      );
my $colors  = dualvar(  16, 'Screen color depth' );

# some code
sub debug_variable
{
    my $constant = shift;
    printf STDERR "%s is %d\n", $constant, $constant;
}

Now whenever you encounter a variable you want to inspect for debugging, pass it to debug_variable( ), which will helpfully print something like:

Screen width is 800
Screen height is 600
Screen color depth is 16

Running the Hack

Every Perl scalar has several slots for several different types of data. The dualvar( ) function from the ever-useful Scalar::Util package takes two values: a number and a string. It stores the numeric value in the numeric (NV) slot of the scalar and sets a flag that it’s okay to look in that slot in numeric contexts. It does the same for the string value (PV slot, string contexts).

Whenever you access a scalar in a certain type of context, Perl first checks the appropriate flag for that context. If it’s okay to use the value in the appropriate slot, Perl does so. Otherwise, it converts an existing value from another slot to the appropriate type, puts the calculated value in the appropriate slot, and sets the flag it checked.

The dualvar nature affects the value stored in the variable, not the variable itself, so it’s safe to pass back and forth in and out of functions.
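
To see both slots at once, the core Devel::Peek module can dump a scalar’s innards (a quick sketch; the exact flag and slot names in the output vary between Perl versions):

use Scalar::Util 'dualvar';
use Devel::Peek;                       # core module; Dump( ) writes to STDERR

my $width = dualvar( 800, 'Screen width' );
Dump( $width );   # shows both the numeric and string slots populated,
                  # with the "okay to use" flag set for each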

Hacking the Hack

You can also use this trick with the constant pragma if you prefer that for your constants:

use constant AUTHOR => dualvar( 006, 'chromatic the superspy author' );

If that’s not paranoid enough for you, the Readonly module also works with this technique:

use Readonly;
Readonly::Scalar my $colors => dualvar(  16, 'Screen color depth' );

Replace Soft References with Real Ones

Combine the benefits of name-based references, lexicals, and typo-checking.

One of the first milestones in becoming an effective programmer is the sudden flash of Zen when you first ask the question “How can I use a variable as a variable name?”[2] The second half of that zap of enlightenment is when you realize why you usually don’t need to do that.

Sometimes it’s the easiest solution to a problem, though—especially when you’re refactoring a large, complex piece of code written by someone who just didn’t get it yet.

Don’t despair; you can have all of the benefits with almost none of the drawbacks.

Suppose you have a sales reporting application such as one the author had to maintain many lives ago. There are multiple types of items for sale, each with its own separate total in the report. You have a parser that reads external data, giving you a key-value pair with the name of the sale category and the value of the item.

Unfortunately, the program uses several lexical-but-file-global variables and you don’t have time to change the whole thing to use an obvious %totals hash.

Tip

If that’s not scary enough, imagine that the actual system really did use symbolic references here (yes, that implies globals, not lexicals!) without error checking and that the sales totals came from the Internet through unencrypted means. Suddenly being a writer seems appealing.

The code, minus the section you need to change and with a fake data-reading section for the purpose of the example, might look something like:

use strict;
use warnings;

my ($books_total, $movies_total, $candy_total, $certificates_total, $total);

create_report( );
print_report( );
exit( );

sub print_report
{
    print <<END_REPORT;
SALES
    Books:             $books_total
    Movies:            $movies_total
    Candy:             $candy_total
    Gift Certificates: $certificates_total

TOTAL:                 $total
END_REPORT
}

sub create_report
{
    # your code here
}

sub get_row
{
    return unless defined( my $line = <DATA> );
    chomp( $line );
    return split( ':', $line );
}

__DATA__
books:10.00
movies:15.00
candy:7.50
certificates:8.00

The Hack

Use a hash, as the FAQ suggests, stuffed full of references to the variables you need to update. You can build this very concisely, with only a little bit of duplication, with a sadly underused piece of Perl syntax—the list reference constructor. When given a list of scalars, the reference constructor (\) returns a list of references to those scalars. That’s perfect for the list of values to a hash slice!

sub create_report
{
    my %totals;
    @totals{ qw( books movies candy certificates total )} =
    \( $books_total, $movies_total, $candy_total,
       $certificates_total, $total
     );

    while (my ($category, $value)  = get_row( ))
    {
        ${ $totals{ $category } } += $value;
        ${ $totals{total}       } += $value;
    }
}

That’s better. When your data feed changes next week and gives you a list of product names, not categories, change the hash slice assignment to %totals to store multiple references to the same category total scalars under different keys. You still have an ugly mapping of strings to lexical variables, but until you can refactor out the yuck of the rest of the application, you’ve at least localized the problem to a single spot you need to touch.

Besides, as far as the author knows, the original application is likely still running.

Hacking the Hack

Validation is still a problem here; how do you prevent a typo in the data from an external source from causing a run-time error? With the hash, you can check for a valid key with exists (though storing total in the hash as well is a potential bug waiting to happen). This may be an appropriate place to use a locked hash [Hack #87].
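
For example, a guard inside create_report( ) might look like this sketch (using the same %totals hash and get_row( ) as above):

    while (my ($category, $value) = get_row( ))
    {
        # 'total' lives in %totals too, but isn't a valid data category...
        unless (exists $totals{ $category } && $category ne 'total')
        {
            warn "Ignoring unknown sales category '$category'\n";
            next;
        }

        ${ $totals{ $category } } += $value;
        ${ $totals{ total }     } += $value;
    }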

Optimize Away the Annoying Stuff

File off the rough edges of your code.

Sit down and look at the code you actually write every day. Is there something tedious that you do all the time, something that is a tiny (but constant) irritation? Maybe it’s as simple as adding that final "\n" to the end of just about every print command:

print "Showing first ", @results / 2, " of ", scalar @results, "\n";

for (@results[ 0 .. @results / 2 - 1 ])
{
    if (m/($IDENT): \s* (.*?) \s* $/x)
    {
        print "$1 -> ", normalize($2), "\n";
    }
    else
    {
        print "Strange result: ", substr( $_, 0, 10 ), "\n";
    }
}

The Hack

If you find that you have to tack a newline onto the end of most (or even just many) of your print statements, factor out that tiny annoyance:

sub say { print @_, "\n" }

# and ever after...

say "Showing first ", @results / 2, " of ", scalar @results;

for (@results[ 0 .. @results / 2 - 1 ])
{
    if (m/($IDENT): \s* (.*?) \s* $/x)
    {
        say "$1 -> ", normalize($2);
    }
    else
    {
        say "Strange result: ", substr( $_, 0, 10 );
    }
}

Likewise, if you’re forever opening a file and reading in the contents:

open my $fh, '<', $filename or die "Couldn't open '$filename'";
my $contents = do { local $/; <$fh> };

you could automate that:

use Carp 'croak';

sub slurp
{
    my ($file) = @_;
    open my $fh, '<', $file or croak "Couldn't open '$file': $!";
    local $/;
    return <$fh>;
}

# and thereafter:

my $contents = slurp $filename;

Hacking the Hack

The key here is to find the repetitive, low-level, mechanical things you do and hoist them out of your code into shorter, cleaner, higher-level abbreviations. Factoring out these common “micropatterns” makes the resulting code more readable and less prone to typos and other mishaps. It also frees you to concentrate on solving your actual problem.

There are already good implementations on CPAN for many of these micropatterns. For example, see Perl6::Say, File::Slurp, List::Util, Term::ReadKey, or Sort::Maker. Subroutines like say( ) and slurp( ) are prime candidates for adding to your standard toolkit [Hack #34].
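
For instance, a minimal toolkit module of your own might collect them like so (MyToolkit is just a placeholder name):

package MyToolkit;

use strict;
use warnings;
use Carp 'croak';

use base 'Exporter';
our @EXPORT_OK = qw( say slurp );

sub say { print @_, "\n" }

sub slurp
{
    my ($file) = @_;
    open my $fh, '<', $file or croak "Couldn't open '$file': $!";
    local $/;
    return <$fh>;
}

1;

Then a quick use MyToolkit qw( say slurp ); makes them available wherever you need them.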

Lock Down Your Hashes

Protect against typos in your hashes.

As much as the scalar is the fundamental data type in Perl, the hash is perhaps the most useful—except for one small flaw. Though use strict protects against embarrassing typos in variable names, it does nothing to protect against mistyping hash keys.

If you’re fortunate enough to use Perl 5.8.0 or newer, you can protect yourself with locked hashes.

The Hack

Suppose you’re working on code that needs to sling around several hashes and your coworkers keep mistyping key names.[3] The code all looks correct, so the typos are difficult to catch and debug, and they cause you plenty of problems.

Rather than searching the entire program’s source code (and vetting the contents of all possible variables and configuration files and every place you could get a new hash key), lock the hash’s keys:

use Test::More tests => 2;
use Hash::Util 'lock_keys';

my %locked   = ( foo => 1, bar => 2 );
my %unlocked = ( foo => 1, bar => 2 );
lock_keys( %locked );

eval {   $locked{fool} = 1 };
eval { $unlocked{fool} = 1 };

is( keys %locked,  2, 'hash with locked keys should disallow unknown key' );
is( keys %unlocked, 3, '... but unlocked hash should not' );

Running the Hack

Anyone can add any keys and values to %unlocked, but once you’ve called lock_keys( ), trying to read from or write to a key not already in %locked (in this case, fool) will fail with the error:

Attempt to access disallowed key 'fool' in a restricted hash...

Run your test suite and check the line number of the error message to find the culprit. Fix. Repeat.

Note that you can still call exists without triggering the exception. Also, if your co-workers are particularly evil or at least actively malicious, they can call unlock_keys( ) before doing bad things. If this is the case, you have bigger problems than misspellings—such as not having a comprehensive test suite.
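
If you also want to freeze the values of the existing keys, Hash::Util provides lock_hash( ) and unlock_hash( ) as well; a small sketch:

use Hash::Util qw( lock_hash unlock_hash );

my %config = ( width => 800, height => 600 );
lock_hash( %config );                  # locks the keys *and* the values

eval { $config{width} = 1024 };
warn "Modification failed: $@" if $@;  # changing a locked value also dies

unlock_hash( %config );                # undo it, if you really must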

Tip

Though this may seem like it solves some of the problems of using blessed hashes as the basis of objects, it has several limitations, especially that it still doesn’t enforce any encapsulation. See instead “Turn Your Objects Inside Out” [Hack #43].

Clean Up at the End of a Scope

Execute your cleanup code, no matter how you exit a scope.

Successful programs are robust. Even if errors happen, the programs can adapt, continuing if possible, but always exiting cleanly and sensibly.

Robust programs often need to guarantee that some sort of cleanup happens, whether that’s closing spawned programs properly, flushing buffers, removing temporary files, or giving up an exclusive resource. Some programming languages and platforms provide ways to ensure that your cleanup code always runs. Perl doesn’t—but it does provide the hooks to make it possible.

The Hack

Imagine that you have to write a program that processes and analyzes records from a database. The processing isn’t idempotent, so you need to keep track of the most recent record processed. You can assume that the record ids increase monotonically. However, admins can interrupt the program as necessary, as the task has a low priority. You want to make sure that, no matter what happens, you always record the id of the most-recently processed item.

Use Scope::Guard and a closure to schedule an end-of-scope operation:

use Scope::Guard;

sub process_records
{
    my $records  = fetch_records( );
    my $last_rec = 0;
    my $cleanup  = sub { cleanup( $last_rec ) if $last_rec };
    my $guard    = Scope::Guard->new( $cleanup );

    for my $record ( @$records )
    {
        process_record( $record );
        $last_rec = $record;
    }
}

sub cleanup
{
    my $record = shift;

    # mark record as last record successfully completed
}

process_records( ) declares a lexical variable, $last_rec, to hold the last record successfully processed. It then builds a closure in $cleanup which calls the cleanup( ) subroutine, passing $last_rec. Then it creates a new Scope::Guard object with the closure.

In the normal flow of operation, the subroutine processes all of the records and then returns. At that point, Perl garbage collects $guard, which calls the closure, which in turn calls the cleanup( ) subroutine to mark the last successfully processed record.

It’s possible that process_record( ) may throw an exception if it cannot process a record appropriately. It’s also possible that an admin will interrupt the program, a resource limit will stop it, or something else will end it before it finishes processing all of the records. As long as perl itself gets the chance to unwind (through an exception, a normal exit, or a trapped signal, though not an untrappable kill), $guard still goes out of scope and calls cleanup( ).

Tip

Could you modify process_record( ) to update the record of the last successfully processed record? Absolutely. However, it’s not always appropriate or efficient or possible to do so.

Hacking the Hack

Scope::Guard isn’t just good for cleanup. You can perform all sorts of interesting operations when you leave a scope. For example, what if you need to chdir to various directories to run external processes that expect very specific current working directories? You could write a chdir replacement that takes a directory and returns a Scope::Guard object that restores the previous working directory when the guard goes out of scope:

use Cwd;
use Scope::Guard;

sub change_directory
{
    my $newdir = shift;
    my $curdir = cwd( );
    chdir( $newdir ) or die "Cannot chdir to '$newdir': $!";
    return Scope::Guard->new( sub { chdir $curdir } );
}
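
A usage sketch (the directory and command here are only illustrative):

{
    my $guard = change_directory( '/tmp/build' );
    system( 'make' );
}
# $guard went out of scope at the closing brace,
# so you're back in the original working directory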

Of course, you could also use David Golden’s File::pushd module from the CPAN.

Invoke Functions in Odd Ways

Hide function calls behind familiar syntax.

Everyone’s familiar with the normal ways to invoke Perl functions: foo( ) or foo(someparams...) or, in the case of method calls, class->foo(...) or $obj->foo(...).

The Hack

There are far stranger ways to invoke a function. All of them are too weird for normal use and useful only on occasion. That’s another way[4] to say that they are the perfect hack.

Make a bareword invoke a function

If you have a function named foo, Perl will let you call it without parens as long as it has seen the function defined (or predefined) by the time it sees the call. That is, if you have:

sub foo
{
    ...a bunch of code....
}

or even just:

sub foo;   # predefine 'foo'

then later in your code you are free to write foo($x,$y,$z) as foo $x,$y,$z. A degenerate case of this is that if you have defined foo as taking no parameters, with the prototype syntax,[5] like so:

sub foo ( )
{
    ...a bunch of code...
}

or:

sub foo ( );

then you can write foo( ) as just plain foo! Incidentally, the constant pragma prior to Perl 5.9.3 defines constants this way. The Perl time( ) function (very non-constant) also uses this approach—that’s why both of these syntaxes mean the same thing:

my $x = time;
my $x = time( );

You could implement a function that returns the same value as time( ), except measured in days instead of seconds:

sub time_days ( )
{
    return time( ) / (24 * 60 * 60);
}

my $xd = time_days;

If you tried calling time_days, without parens, before you defined the function, you would get an error message Bareword "time_days" not allowed while "strict subs" in use. That’s assuming you’re running under “use strict,” which of course you are.
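
If you do want to call such a function before its definition appears in the file, predeclare it; a small sketch:

use strict;

sub time_days ();             # predeclaration lets the bareword parse below

print "Day number: ", time_days, "\n";

sub time_days ()
{
    return time( ) / (24 * 60 * 60);
}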

A further example is:

use Date::Format 'time2str';

sub ymd_now ( )
{
    time2str( '%Y-%m-%d', time );
}

print "It is now ", ymd_now, "!!\n";

Tie a scalar variable to a function

Perl provides a mechanism, called “tying”, to call a function when someone apparently accesses a particular variable. See perldoc perltie for the agonizing details.

Consider a scalar variable whose value is really variable—a variable that, when read, returns the current time, somewhat in the style of some old BASIC dialects’ $TIME (or TIME$) variable:

{
    package TimeVar_YMDhms;

    use Tie::Scalar ( );
    use base 'Tie::StdScalar';
    use Date::Format 'time2str';

    sub FETCH { time2str('%Y-%m-%dT%H:%M:%S', time) }
}

tie my $TIME, 'TimeVar_YMDhms';

print "It is now $TIME\n";
sleep 3;
print "It is now $TIME\n";

That produces output like:

It is now 2006-02-03T16:04:17
It is now 2006-02-03T16:04:20

You can even rewrite that to use a general-purpose class, which will produce the same output:

{
    package Tie::ScalarFnParams;
    sub TIESCALAR
    {
        my($class, $fn, @params) = @_;
        return bless sub { $fn->(@params) }, $class;
    }

    sub FETCH { return shift( )->( ) }
    sub STORE { return } # called for $var = somevalue;
}

use Date::Format 'time2str';

tie my $TIME, 'Tie::ScalarFnParams',
    # And now any function and optional parameter(s):
    sub { time2str(shift, time) }, '%Y-%m-%dT%H:%M:%S';

print "It is now $TIME\n";
sleep 3;
print "It is now $TIME\n";

Tie an array variable to a function

A more sophisticated approach is to tie an array to a function, so that $somearray[123] will call that function with the parameter 123. Consider, for example, the task of giving a number an English ordinal suffix—that is, taking 2 and returning “2nd,” taking 12 and returning “12th,” and so on. The CPAN module Lingua::EN::Numbers::Ordinate’s ordinate function can do this:

use Lingua::EN::Numbers::Ordinate 'ordinate';
print ordinate(4), "!\n";

This shows “4th!”. To invoke this function on the sly, use a tied-array class:

{
    package Tie::Ordinalize;

    use Lingua::EN::Numbers::Ordinate 'ordinate';
    use base 'Tie::Array';

    sub TIEARRAY  { return bless { }, shift } # dummy obj
    sub FETCH     { return ordinate( $_[1] ) }
    sub FETCHSIZE { return 0 }
}

tie my @TH, 'Tie::Ordinalize';
print $TH[4], "!\n";

which also shows “4th!”. Perl calls the required method FETCH when reading $TH[someindex] and FETCHSIZE when reading @TH in a scalar context (like $x = 2 + @TH). There are other methods that you can define for accessing the tied array as $TH[123] = somevalue, push(@TH,...), or any of the various other operations you can perform on a normal Perl array. The Tie::Array documentation has all the gory details.

Tying an array to something may seem like a pretty strange idea, but Mark Jason Dominus’s excellent core-Perl module Tie::File [Hack #19] puts this to good use.
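
Tie::File maps each array element to a line of a file, reading and writing lines on demand; a quick sketch (error.log is just an example file):

use Tie::File;

tie my @lines, 'Tie::File', 'error.log'
    or die "Cannot tie error.log: $!";

print "Last line: $lines[-1]\n";            # reads only that record from disk
push @lines, 'checked at ' . localtime( );  # appends a line to the file

untie @lines;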

Tie a hash variable to a function

One of the limitations of tying an array to a function is that the index (as FETCH sees in $somearray[ index ]) obviously has to be a number. With a tied hash, the FETCH method gets a string argument ($somehash{ index }). In this case, you can use tying to make $NowAs{ str } call time2str( str ):

{
    package Tie::TimeFormatty;
    use Tie::Hash ( );
    use base 'Tie::StdHash';
    use Date::Format 'time2str';
    sub FETCH { time2str($_[1], time) }
}

tie my %NowAs, 'Tie::TimeFormatty';

print "It is now $NowAs{'%Y-%m-%dT%H:%M:%S'}\n";
sleep 3;
print "It is now $NowAs{'%c'}\n";

That produces output like:

It is now 2006-02-03T18:28:06
It is now 02/03/06 18:28:09

An earlier example showed how to make a class Tie::ScalarFnParams which makes any scalar variable call any function with any parameters. You can more easily do the same thing for hashes—except that it already exists. It’s the CPAN module called Interpolation, originally by Mark Jason Dominus. Use it to rewrite the previous code like:

use Date::Format 'time2str';
use Interpolation NowAs => sub { time2str($_[0],time) };

print "It is now $NowAs{'%Y-%m-%dT%H:%M:%S'}\n";
sleep 3;
print "It is now $NowAs{'%c'}\n";

Other hacks based on this on the CPAN include Tie::DBI, which makes $somehash{ somekey } query arbitrary databases (or DB_File, which makes it query a mere Berkeley-style database), and Tie::Ispell, which makes $dict{ word } spellcheck the word and suggest possibilities if it appears incorrect.

Of course, you can also tie a filehandle to a function [Hack #90].

Add a function-calling layer to filehandles

Modern versions of Perl provide an even more powerful expansion of the idea of tied filehandles, called “PerlIO layers”, where each layer between the program and the actual filehandle can call particular functions to manipulate the data passing through. The non-hackish uses of this include encoding conversion and newline translation, and you specify the layers like so:

open $fh, '>:somelayer:someotherlayer:yetmore', 'file.dat'

as in:

open( $out, '>:utf8', 'resume.utf' ) or die "Cannot write resume: $!\n";
print {$out} "\x{2605} My R\xE9sum\xE9 \x{2605}\n";  # \x{2605} is a star character
close( $out );

The documentation for PerlIO::via, PerlIO, and PerlIO::encoding describes the complex interface for writing layers. For super-simple layers, you can use a base class to reduce the interface to a single method, change. Here it is with two layer classes, Scream and Cookiemonster:

package Function_IO_Layer;

# A dumb base class for simple PerlIO::via::* layers.
# See PerlIO::via::dynamic for a smarter version of this.

sub PUSHED { bless { }, $_[0] } # our dumb ctor

# when reading
sub FILL
{
    my($this, $fh) = @_;
    defined(my $line = readline($fh)) or return undef;
    return $this->change($line);
}

sub WRITE
{
    my($this,$buf,$fh) = @_;
    print {$fh} $this->change($buf)  or return -1;
    return length($buf);
}

sub change { my($this,$str) = @_;  $str; } #override!

# Puts everything in allcaps.
package PerlIO::via::Scream;

use base 'Function_IO_Layer';

sub change
{
    my($this, $str) = @_;
    return uc($str);
}

# Changes "I" to "me".
package PerlIO::via::Cookiemonster;

use base 'Function_IO_Layer';

sub change
{
    my($this, $str) = @_;
    $str =~ s<\bI\b><me>g;
    return $str;
}

Use these layers as simply as:

open my $fh, '>:via(Scream):via(Cookiemonster)',
   'author_bio.txt' or die $!;

print {$fh} "I eat cookies without cease or restraint.\n",
   "I like cookies.\n";

close($fh);

That will make author_bio.txt consist of:

ME EAT COOKIES WITHOUT CEASE OR RESTRAINT.
ME LIKE COOKIES.

You can use PerlIO layers to operate on files you’re reading (just change the '>' to '<') or appending to ('>>'), or even to alter data coming to or from processes ('-|' or '|-'). For example, this:

open my $ls, '-|:via(Scream)', 'ls -C /usr' or die $!;
print <$ls>;

shows:

BIN  GAMES    KERBEROS  LIBEXEC  SBIN   SRC  X11R6
ETC  INCLUDE  LIB       LOCAL    SHARE  TMP

where a simple ls -C /usr would show just:

bin  games    kerberos  libexec  sbin   src  X11R6
etc  include  lib       local    share  tmp

Going from encoding-conversion (open $fh, '>:utf8'...) to uppercasing (open $fh, '>:via(Scream)'...) may seem a leap from the sublime to the ridiculous—but consider that in between the two, intrepid CPAN authors have already written such classes as PerlIO::via::LineNumber, which transparently adds line numbers to the start of lines, or PerlIO::via::StripHTML, which strips HTML tags—all very hackish, and yet very useful.

Glob Those Sequences

Don’t settle for counting from one to n by one.

Perl has a syntax for generating simple sequences of increasing integers:

@count_up = 0..100;

There’s no syntax for anything more interesting, such as counting by twos or counting down—unless you create one yourself.

The Hack

The angle brackets in Perl have two distinct purposes: as a shorthand for calling readline, and as a shorthand for calling glob:

my $input = <$fh>;      # shorthand for: readline($fh)

my @files = <*.pl>;     # shorthand for: glob("*.pl")

Assuming you’re not interested in that second rather specialized usage (and you can always use the standard File::Glob module, if you are), you can hijack non-readline angles for something much tastier: list comprehensions.

Tip

A list comprehension is an expression that filters and transforms one list to create another, more interesting, list. Of course, Perl already has map and grep to do that:

@prime_countdown = grep { is_prime($_) } map { 100-$_ } 0..99;

but doesn’t have a dedicated (and optimized) syntax for it:

@prime_countdown = <100..1 : is_prime(X)>;

Running the Hack

Defining your own CORE::GLOBAL::glob( ) subroutine replaces both the builtin glob( ) function and the angle-bracketed operator version, so you can retarget the <...> syntax to do whatever you like: for example, building sophisticated lists.

Do so with:

package Glob::Lists;

use Carp;

# Regexes to parse the extended list specifications...
my $NUM    = qr{\s* [+-]? \d+ (?:\.\d*)? \s* }xms;
my $TO     = qr{\s* \.\. \s*}xms;
my $FILTER = qr{ (?: : (.*) )? }xms;
my $ABtoZ  = qr{\A ($NUM) (,) ($NUM) ,? $TO ($NUM) $FILTER \Z}xms;
my $AZxN   = qr{\A ($NUM) $TO ($NUM) (?:x ($NUM))? $FILTER \Z}xms;

# Install a new glob( ) function...
no warnings 'redefine';
*CORE::GLOBAL::glob = sub
{
    my ($listspec) = @_;

    # Does the spec match any of the acceptable forms?
    croak "Bad list specification: <$listspec>"
        if $listspec !~ $ABtoZ && $listspec !~ $AZxN;

    # Extract the range of values and any filter...
    my ($from, $to, $incr, $filter) =  $2 eq ',' ? ($1, $4, $3-$1, $5)
                                    :              ($1, $2, $3,    $4);

    # Work out the implicit increment, if no explicit one...
    $incr = $from > $to ? -1 : 1 unless defined $incr;

    # Check for nonsensical increments (zero or the wrong sign)...
    my $delta = $to - $from;
    croak sprintf "Sequence <%s, %s, %s...> will never reach %s",
        $from, $from+$incr, $from+2*$incr, $to
            if $incr == 0 || $delta * $incr < 0;

    # Generate list of values (and return it, if no filter)...
    my @vals = map { $from + $incr * $_ } 0..($delta/$incr);
    return @vals unless defined $filter;

    # Apply the filter before returning the values...
    $filter =~ s/\b[A-Z]\b/\$_/g;
    return eval "grep {package ".caller."; $filter } \@vals";
};

The $ABtoZ and $AZxN regexes match two kinds of sequence specifiers:

<from, then,..to>

and:

<from..to x increment>

and both also allow you to specify a filtering expression after a colon:

<from, then,..to : filter>
<from..to x incr : filter>

The regexes capture and extract the relevant start and end values, the increment amount, and the filter. The subroutine then computes the increment in the cases where it is implicit, and checks to see that the sequence makes sense (that is, it isn’t something like <1..10 x -1> or <1,2,..-10>).

The code then generates the sequence using a map, and immediately returns it if there is no filter. If there is a filter, the code evals it into a grep and returns the filtered list instead.

Then you can write:

use Glob::Lists;

for ( <1..100 x 7> ) {...}           # 1, 8, 15, 22,...85, 92, 99

my @even_countdown  = <10,8..0>;     # 10, 8, 6, 4, 2, 0

my @fract_countdown = <10,9.5,..0>;  # 10, 9.5, 9,...1, 0.5, 0

my @some_primes = <1..100 x 3 : /7/ && is_prime(N)>;
                                     # 7, 37, 67, 73, 79, 97

Hacking the Hack

One of the neater hacks based on this idea is the CPAN module Tie::FTP, which diverts file operations to functions that actually perform those operations on files on remote FTP servers. Another notable module is Tie::STDERR, which provides handy options for diverting STDERR output to email to root, an errorlog file, or an arbitrary function. For Zen-like transcendence of the very concept of “hack” or “purpose”, the CPAN module IO::Null can tie to functions that do, and return, nothing at all!

Write Less Error-Checking Code

Identify runtime errors without writing code.

One of the less-endearing features of working with the outside world is that things can fail: you might run out of disk space, lose your network connection, or have some other sort of serious error. Robust programs check for these errors and retry or fail gracefully as necessary. Of course, checking every potential point of failure for every possible failure can make a lot of repetitive code.

Fortunately, Perl provides a way to fail on errors without having to check for them explicitly—the Fatal core module.

The Hack

One of the most failure-prone points of programming is IO programming, whether working with files or other computers across a network. File paths may be wrong, file permissions may change, disks can mysteriously fill up, and transitory networking problems may make remote computers temporarily invisible. If you work much with files, using Fatal can reduce the amount of code you need to write.

The Fatal module takes a list of function names to override to raise exceptions on failures. open and close are good candidates. Pass their names to the use line to avoid writing the or die( ) idiom:

use Fatal qw( open close );

open( my $fh, '>', '/invalid_directory/invalid_file' );
print {$fh} "Hello\n";
close $fh;

If you run this (and don’t have a directory named /invalid_directory), you’ll receive an error message:

Can't open(GLOB(0x10159d74), >, /invalid_directory/invalid_file): No such file or
  directory at (eval 1) line 3
  main::__ANON__('GLOB(0x10159d74)', '>', '/invalid_directory/invalid_file') called
      at fatal_io.pl line 8

If it’s appropriate for your program to exit with an error if this happens, this is all you need to do. To handle the error with more grace, wrap the code in an eval block and do what you need to do:

use Fatal qw( open close );
eval {
    open( my $fh, '>', '/invalid_directory/invalid_file' );
    print {$fh} "Hello\n";
    close $fh;
};

die "File error: $!" if $@;
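
If the failure might be transient, you can go further and retry a few times before giving up (a sketch; $filename stands in for whatever file you are after):

use Fatal qw( open close );

my $fh;
my $opened = 0;

for my $attempt ( 1 .. 3 )
{
    eval { open( $fh, '<', $filename ); $opened = 1 };
    last if $opened;
    warn "Attempt $attempt failed: $@";
    sleep 1;
}

die "Giving up on '$filename'" unless $opened;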

Tip

Of course, nothing says that your code must do something with the caught exception—but at least consider how robust your code should be.

Hacking the Hack

Fatal can also wrap your own functions. Use it within your own modules just as you would normally:

package MyCode;

sub succeed { 1 }
sub fail    { 0 }

use Fatal qw( :void succeed fail );

succeed( );
fail( );

1;

Because fail( ) returns false, Fatal throws an exception. This code has one trick, in that the subroutine declarations come before the Fatal call. If you use the module before Perl parses the subroutine declarations, Fatal will not be able to find them and will throw an error.

This can be useful for your own code, but it’s even more useful when you export these functions to other code. The order of the use lines doesn’t matter here, though:[6]

package MyCode;
use base 'Exporter';
our @EXPORT = qw( succeed fail );

sub succeed { 1 }
sub fail    { 0 }

use Fatal qw( :void succeed fail );

1;

This technique even works with classes and objects; you don’t have to export your methods for Fatal to work.

Return Smarter Values

Choose the correct scalar for any context.

There’s always one troublemaker in any bunch. When it comes to return contexts, that troublemaker is scalar context.

List and void contexts are easy. In list context you just return everything. In void context, return nothing. Scalar contexts allow you to return only one thing, but there are just too many alternatives: a string, a count, a boolean value, a reference, a typeglob, or an object.

The real problem, though, isn’t actually that there are too many types of possible return value in scalar context; the real problem is that Perl simply doesn’t provide you with enough...well...context with which to decide. The only basis you have for knowing whether to return a string, number, boolean, and so on, is receiving a single uninformative defined-but-false value from wantarray.

Even then, using wantarray leads to a lot of unnecessary and self-undocumenting infrastructure:

if (wantarray)                # wantarray true      --> list context
{
    return @some_list;
}
elsif (defined wantarray)     # wantarray defined   --> scalar context
{
    return $some_scalar;
}
else                          # wantarray undefined --> void context
{
    do_something( );
    return;
}

It would be much easier if you could just specify a single return statement that knew what to return in different contexts, perhaps:

return
    LIST   { @some_list     }
    SCALAR { $some_scalar   }
    VOID   { do_something( ) };

That’s exactly what the Contextual::Return CPAN module does. It makes the previous example work like you’d expect.
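
Here is a minimal sketch of that interface in action (the subroutine and its data are invented for illustration):

use Contextual::Return;

sub get_results
{
    my @results = (2, 4, 8);

    return
        LIST   { @results        }               # my @all   = get_results( );
        SCALAR { scalar @results }               # my $count = get_results( );
        VOID   { warn "Discarding results\n" };  # get_results( );
}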

Fine Distinctions

The module also allows you to be more specific about what to return in different kinds of scalar context.

For example, you might want a stopwatch( ) subroutine that returns the elapsed time in seconds in numeric contexts, but an HH:MM:SS representation in string contexts. You might also want it to return a true or false value depending on whether the stopwatch is currently running. You can do all of that with:

use Time::HiRes 'time';
use Contextual::Return;

my $elapsed       = 0;
my $started_at    = 0;
my $is_running    = 0;

# Convert elapsed seconds to an HH:MM:SS string...
sub _HMS
{
    my ($elapsed) = @_;
    my $hours     = int($elapsed / 3600);
    my $mins      = int($elapsed / 60 % 60);
    my $secs      = int($elapsed) % 60;
    return sprintf "%02d:%02d:%02d", $hours, $mins, $secs;
}

sub stopwatch
{
    my ($run)     = @_;

    # Update elapsed time...
    my $now       =  time( );
    $elapsed     +=  $now - $started_at if $is_running;
    $started_at   =  $now;

    # Defined arg turns stopwatch on/off, undef arg resets it...
    $is_running   =  $run if @_;
    $elapsed      =  0 if @_ && !defined $run;

    # Handle different scalar contexts...
    return
         NUM { $elapsed         }
         STR { _HMS( $elapsed ) }
        BOOL { $is_running      }
}

With that arrangement, you can write code like:

print "The clock's already ticking\n"
    if stopwatch( );                              # treat as a boolean
stopwatch(1);                                    # start
do_stuff( );
stopwatch(0);                                    # stop
print "Did stuff in ", stopwatch( ), "\n";        # report as string

stopwatch(undef);                                # reset
stopwatch(1);                                    # start
do_more_stuff( );
print "Did more stuff in ", stopwatch(0), "\n";  # stop and report

print "Sorry for the delay\n"
    if stopwatch( ) > 5;                          # treat as number

Name that Return Value

The stopwatch example works well, but it still doesn’t explore the full range of possibilities for a scalar return value. For example, the single piece of numeric information you want back might not be the elapsed time, but rather when the stopwatch started. You might also want to return a boolean indicating whether the stopwatch is currently running without always having to cast your call into boolean context:

$stopwatch_running = !!stopwatch( );      # !! --> boolean context

It would be handy if, in addition to all the other return options, stopwatch( ) would also return a hash reference, so you could write:

$stopwatch_running = stopwatch->{running};

print "Stopwatch started at ", stopwatch->{started}, "\n";

Returning a hash reference allows you to send back all the information you have available, from which the caller can then pick out (by name) the interesting bits. Using names to select what you want back also helps the code document what it’s doing.

Contextual::Return makes it easy to add this kind of behavior to stopwatch( ). Just add a specific return value for the HASHREF context:

# Handle different scalar contexts...
return
        NUM { $elapsed         }
        STR { _HMS( $elapsed ) }
       BOOL { $is_running      }
    HASHREF { {   elapsed => $elapsed,
                  started => $now - $elapsed,
                  running => $is_running,
              }
            }

Out, Out, Damn Commas!

Contextual::Return can handle other types of reference returns as well. One of the most useful is SCALARREF {...}. This block specifies what to return when the calling code uses the return value as a reference to a scalar. That is, what to return if you write:

${ stopwatch( ) }    # Call stopwatch( ) and treat result as scalar ref

The reason this particular construct is so interesting is that you can interpolate it directly into a double quoted string. For example, add a SCALARREF return block to stopwatch( ):

# Handle different scalar contexts...
return
        NUM { $elapsed         }
        STR { _HMS($elapsed)   }
  SCALARREF { \ _HMS($elapsed) }
       BOOL { $is_running      }
    HASHREF { {   elapsed => $elapsed,
                  started => $now - $elapsed,
                  running => $is_running,
              }
            }

Then, whenever it’s called in a scalar-ref context, the subroutine returns a reference to the HH:MM:SS elapsed string, which the scalar ref context then automatically dereferences. Instead of having to write:

print "Did stuff in ", stopwatch( ), "\n";

you can interpolate the call right into the string itself:

print "Did stuff in ${stopwatch( )}\n";

This turns out to be so amazingly useful that it’s Contextual::Return’s default behaviour. That is, any subroutine that specifies one or more of STR {...}, NUM {...}, or BOOL {...} automatically gets a SCALARREF {...} as well: one that returns a reference to the appropriate string, number, or boolean.

Return Active Values

Return values that automatically change as you use them.

The Contextual::Return module [Hack #92] has another very powerful trick up its sleeve. The scalar values it returns don’t have to be constants; they can be “active.” An active value is one that adapts itself each time it is evaluated. This is useful for performing initialization, cleanup, or error-handling code without forcing the caller to do anything special.

The Hack

For example, you can create a subroutine that returns a value that automatically tracks the elapsed time between events:

use Contextual::Return;
use Time::HiRes qw( sleep time );      # Allow subsecond timing

# Subroutine returns an active timer value...
sub timer
{
    my $start = time;                  # Set initial start time

    return VALUE                       # Return an active value that...
    {                     
        my $elapsed = time - $start;   #    1. computes elapsed time
        $start      = time;            #    2. resets start time
        return $elapsed;               #    3. returns elapsed time
    }
}

# Create an active value...
my $process_timer = timer( );

# Use active value...
while (1)
{
    do_some_long_process( );
    print "Process took $process_timer seconds\n";
}

Because the timer( ) subroutine returns a contextual value that is computed within the VALUE block itself, that returned value becomes active. Each time the value of $process_timer is reevaluated (in the print statement), the value’s VALUE block executes, recomputing and resetting the value stored in $process_timer.
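
The same trick works for any state you care to close over. For example, a purely illustrative active counter:

use Contextual::Return;

sub counter
{
    my $count = 0;
    return VALUE { ++$count };    # runs again on every evaluation
}

my $hits = counter( );
print "$hits $hits $hits\n";      # prints something like: 1 2 3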

Running the Hack

Of course, the real advantage here is that you can have the subroutine create two or more timers for you:

my $task_timer    = timer( );
my $subtask_timer = timer( );

for my $task (@tasks)
{
    print "Performing $task...\n";
    for my $subtask ($task->get_subtasks( ))
    {
        $subtask->perform( );
        print "\t$subtask took $subtask_timer seconds\n";
    }
    print "Finished $task in $task_timer seconds\n\n";
}

to produce something like:

$ perl do_tasks.pl

Performing set-up...
    Finding files took 0.775737047195435 seconds
    Reading files took 0.985733032226562 seconds
    Verifying data took 0.137604951858521 seconds
Finished set-up in 1.98483791351318 seconds

Performing initialization...
    Creating data structures took 0.627048969268799 seconds
    Cross-correlating took 2.756386041641235 seconds
Finished initialization in 3.45225400924683 seconds

etc.

Hacking the Hack

Active values can use all the other features of the Contextual::Return module. In particular, they can still be context-sensitive. For example, you could create a safer version of the built-in open function, where “safer” means that this version will return a filehandle that explodes catastrophically if you ever try to use the handle without first verifying that it was opened correctly.

Implement it like this:

use Contextual::Return;
use Carp;

sub safe_open
{
    my ($mode, $filename) = @_;
    my $user_has_tested   = 0;

    # Open a filehandle and remember where it was opened...
    open my($filehandle), $mode, $filename;
    my $where = sprintf("'%s' line %s", (caller)[1,2]);

    # Return an active value that's only usable after it's been tested...
    return (
        BOOL
        {
            $user_has_tested = 1;
            return defined $filehandle;
        }
        DEFAULT
        {
            croak "Used untested filehandle (opened at $where)"
                unless $user_has_tested;
            return $filehandle;
        }
    )
}

The safe_open subroutine expects two arguments: the opening mode and the name of the file to open:

my $fh = safe_open '<', $some_file;

The returned value acts like a filehandle in all contexts, but only after you have tested the value in a boolean context. Accessing the returned value in a boolean context invokes the value’s BOOL block, which actively sets the $user_has_tested flag true. If you try to use the filehandle before you’ve tested it:

my $fh    = safe_open '<', $some_file;

my $input = <$fh>;          # Use of untested return value
                            # invokes DEFAULT block

the BOOL block will not have run, so the internal flag will still be false, and the value’s DEFAULT block will throw an exception:

$ perl demo.pl

Used untested filehandle (opened at 'demo.pl' line 12) at demo.pl line 14

If however, the filehandle has been tested in any boolean context:

my $fh = safe_open '<', $some_file
    or croak "Couldn't open $some_file";   # the 'or' evaluates $fh in a
                                           # boolean context so it invokes
                                           # returned value's BOOL block

my $input = <$fh>;          # Invokes returned value's DEFAULT block

then the value’s BOOL block will have set the $user_has_tested flag. Once the flag is set, the DEFAULT block will thereafter return the filehandle without detonating.

Of course, this is incompatible with the use of Fatal as shown in “Write Less Error-Checking Code” [Hack #91].

Add Your Own Perl Syntax

Shape the language as you see fit.

Perl is a great language, but it’s certainly not perfect. Sometimes bits and pieces of the implementation poke through. Sometimes the natural solution to a problem doesn’t fit the existing language very well at all. Some problems are easier if you can just define them away.

Sometimes the simplest solution is just to change the syntax of Perl.

For example, it’s frustrating that you can’t specify a simple parameter list for a subroutine (without some gyrations, as in “Autodeclare Method Arguments” [Hack #47]):

my @NUMERAL_FOR    = (0..9, 'A'..'Z');

sub convert_to_base($base, $number)
{
    my $converted  = "";
    while ($number > 0)
    {
        $converted = $NUMERAL_FOR[$number % $base] . $converted;
        $number    = int( $number / $base);
    }
    return $converted;
}

Instead, you have to do it yourself:

sub convert_to_base
{
    my ($base, $number) = @_;   # <-- DIY parameter list

    my $converted       = '';
    while ($number > 0)
    {
        $converted      = $NUMERAL_FOR[$number % $base] . $converted;
        $number         = int( $number / $base);
    }

    return $converted;
}

This is why far too many people just write:

sub convert_to_base
{
    my $converted  = '';

    while ($_[1] > 0)
    {
        $converted = $NUMERAL_FOR[$_[1] % $_[0]] . $converted;
        $_[1]      = int( $_[1] / $_[0]);
    }

    return $converted;
}

buying themselves a world of future maintenance pain in the process.

The Hack

Although Perl may not be perfect, it is perfectable. For example, recent versions of Perl provide a way to grab your program’s source code before it even reaches the compiler, change it in some useful manner, and then send it on to be compiled and executed. The easiest way to do that is to write a module that uses the standard (in version 5.8.0 and later) Filter::Simple module:

package My::Filter;
use Filter::Simple;

FILTER_ONLY code => sub
{
    # The code from any program that uses this module
    # is passed into this subroutine in $_.
    # Whatever is in $_ at the end of this subroutine
    # becomes the source code that the compiler eventually sees.
};

1;

Because the Perl compiler only sees the end result of these source filters, only that end result has to be valid Perl code. The original source code that the filter intercepts can be anything you like, as long as the filter can transform that anything into valid Perl.

For example, you could augment the Perl subroutine declaration syntax by creating a source filter that looks for subs with parameter lists and converts them to normal subs:

package Sub::With::Params;
use Filter::Simple;

# Regex that matches a valid Perl identifier (e.g. a sub name)...
my $IDENT = qr/[^\W\d]\w*/;

# Apply this filter to the code of any program
# that uses Sub::With::Params...
FILTER_ONLY code => sub
{
    s{ ( sub \s* $IDENT \s* )   # Match any named sub declaration
       (   \( .*? \)        )   # ...followed by a parameter list
       (   \s* \{           )   # ...followed by a sub body
    }
    {$1$3 my $2 = \@_;}gxs;     # Then move the param list inside the
                                # sub, converting it to a list of
                                # lexical variables initialized from @_
};

1;

By setting up this filter module so that it expects subs with parameter lists and transforms them into regular subs that unpack @_ into lexicals, now you can write:

use Sub::With::Params;

sub convert_to_base($base, $number)
{
    my $converted  = '';
    while ($number > 0)
    {
        $converted = $NUMERAL_FOR[$number % $base] . $converted;
        $number    = int( $number / $base);
    }
    return $converted;
}

and have it work as you expect. Sub::With::Params will now intercept your source code on its way to the compiler and convert it to:

sub convert_to_base { my ($base, $number) = @_;
    my $converted  = '';
    while ($number > 0)
    {
        $converted = $NUMERAL_FOR[$number % $base] . $converted;
        $number    = int( $number / $base);
    }
    return $converted;
}

Modify Semantics with a Source Filter

Tweak Perl’s behavior at the syntactic level.

In addition to adding new syntax [Hack #94], source code filters can change the behavior of existing Perl constructs. For example, a common complaint about Perl is that you cannot indent a heredoc properly. Instead you have to write something messed-up like:

sub usage
{
    if ($::VERBOSE)
    {
        print <<"END_USAGE";
Usage: $0 [options] <infile> <outfile>

Options:
    -z       Zero tolerance on formatting errors
    -o       Output overview only
    -d       Debugging mode
END_USAGE
    }
}

rather than something tidily indented like:

sub usage
{
    if ($::VERBOSE)
    {
        print <<"END_USAGE";
            Usage: $0 [options] <infile> <outfile>

            Options:
                -z       Zero tolerance on formatting errors
                -o       Output overview only
                -d       Debugging mode
            END_USAGE
    }
}

Except, of course, you can have your heredoc and indent it too. You just need to filter out the unacceptable indentation before the code reaches the compiler. This is another job for source filters.

The Hack

Suppose that you could use the starting column of a heredoc’s terminator to indicate the left margin of each line of the preceding heredoc content. In other words, what if you could indent every line in the heredoc by the same amount as the final terminator marker? If that were the case, then the previous example would work as expected, printing:

$ ksv -z filename

Usage: ksv [options] <infile> <outfile>

Options:
    -z       Zero tolerance on formatting errors
    -o       Output overview only
    -d       Debugging mode

with the start of each line hard against the left margin.

To make that happen in real life, you need a source filter that recognizes indented heredocs and rewrites them as unindented heredocs before they reach the compiler. Here’s a module that provides just that:

package Heredoc::Indenting;

use Filter::Simple;

FILTER
{
    # Find all instances of...
    1 while
        s{ <<                     #     Heredoc marker
           ( ['"]             )   # $1: Quote for terminator
           ( (?:\\\1|[^\n])*? )   # $2: Terminator specification
             \1                   #     Matching closing quote
           ( [^\n]*  \n       )   # $3: The rest of the statement line
           ( .*? \n           )   # $4: The heredoc contents
           ( [^\S\n]*         )   # $5: Any whitespace indent before...
             \2 \n                #     ...the terminator itself
        }

        # ... and replace it with the same heredoc, with its terminator
        # outdented and the heredoc contents passed through a subroutine
        # that removes the indent from each line...
        {Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3}xms;
};

use Carp;

# Remove indentations from a string...
sub outdent
{
    my ($name, $indentation, $string) = @_;

    # Complain if any line doesn't have the specified indentation...
    if ($string =~ m/^((?:.*\n)*?)(?!$indentation)(.*\S.*)\n/m)
    {
        my ($good_lines, $bad_line) = ($1, $2);
        my $bad_line_pos = 1 + ($good_lines =~ tr/\n/\n/);
        croak "Negative indentation on line $bad_line_pos ",
              "of <<$name heredoc specified";
    }

    # Otherwise remove the indentations from each line...
    $string =~ s/^$indentation//gm;
    return $string;
}

1;

The FILTER {...} block tells Filter::Simple how to filter any code that uses the Heredoc::Indenting module. The code arrives in the $_ variable, and the block then uses a repeated regex substitution to replace each indented heredoc with a regular left-justified heredoc.

The regex is complex because it has to break a heredoc up into: introducer, quoted terminator specification, remainder of statement, heredoc contents, terminator indent, and terminator. The replacement is complex too, as it reorders those components as: outdenter function, introducer, quoted terminator specification, heredoc contents, terminator, and remainder of statement.

This reordering also explains why the FILTER block uses 1 while s/.../.../ instead of s/.../.../g. The /g flag doesn't allow overlapping matches, so it would resume searching after the rewritten remainder-of-statement component. That remainder might contain another indented heredoc, which the substitution would then miss. In contrast, the 1 while... form rematches the partially rewritten source code from the start, so it correctly handles multiple heredocs on the same line.
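
You can see the difference in miniature with a substitution whose replacement can create a new match earlier in the string (a toy example, nothing to do with heredocs):

# Reduce every "ab" to a "b" until none remain...
my $with_g     = 'aab';
my $with_while = 'aab';

$with_g =~ s/ab/b/g;               # leaves "ab": /g resumes after the "b"
1 while $with_while =~ s/ab/b/;    # rescans from the start, leaving "b"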

There’s a cunning layout trick used here. Because each heredoc is rewritten as a (modified) heredoc, on the second iteration of the 1 while, the first heredoc it will find is the one it just rewrote, so the substitution is in danger of reprocessing and re-reprocessing and re-re-reprocessing that very first heredoc ad infinitum. To avoid that, the module requires that indented heredocs have no space between their << introducer and their terminator specification, like so:

print <<"END_USAGE";

Then it carefully rewrites each heredoc so that it does have a space between those two components:

{Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3}xms;
#                            note the space between the '<<' and '$1$2$1'

That way, the next time the iterated substitution matches against the source code, it will ignore any already-rewritten heredocs and move on to the first unrewritten one instead.

Each heredoc is rewritten to pass through the Heredoc::Indenting::outdent( ) subroutine at runtime. This subroutine removes the specified indentation (passed as its second argument) from the heredoc text, checking for invalid indentations as it does so.

Hacking the Hack

As an alternative, the FILTER block itself could run the heredoc contents through outdent( ) as it rewrites them. To do that, the second half of the substitution would look instead like:

{"<< $1$2$1\n" . Heredoc::Indenting::outdent($1.$2.$1, $5, $4) . "$2\n$3"}exms;

with the /e flag allowing you to specify the replacement as an expression to be evaluated, rather than as a simple string.

The advantage of this second version of the filter is that the outdenting of each heredoc now occurs only once, at compile time during the original source filtering, rather than every time perl encounters the heredoc at run-time. The disadvantage is that Perl will report any errors during the outdenting as occurring at the use Heredoc::Indenting line, rather than in the correct position of the heredoc in the source code. Although that’s entirely accurate—they are occurring during the loading of the filtering module—it’s not very useful to users of the module, who really want to know where their heredocs are broken, not where your module detected the breakage.

Use Shared Libraries Without XS

Call C code from Perl without needing a compiler.

One of the few ways in which installing Perl modules is painful is when they link to shared libraries written in other languages. The first pain is that someone has to write XS, Inline::C, or SWIG bindings for the shared library. The second is that installing such modules usually requires a working C development environment—and not every machine or user has such a luxury.

For simple tasks that merely wrap a shared library, there’s no reason you need that much; with a little clever coding you can use just about any shared library written in C with an idea backported from Perl 6 to Perl 5.

The Hack

Consider how Perl passes arguments to functions: in @_, on a stack. As far as the calling conventions work, any function that takes a string, an array reference, and a hash reference looks the same for the purposes of calling the function.

Consider how C passes arguments to functions: much the same way. Any function that takes two integers and returns a double looks the same, again as far as the calling conventions go.

Similarly, any XS code that converts between a Perl function that passes two integers (well, scalars containing integers) to a C function and expects a double (well, a scalar containing a numeric value) is the same. The only difference is the name of the function as Perl sees it and the actual C function it calls. Those are actually very easy to vary.

The P5NCI module builds up its own library of thunk functions (the glue between Perl and C) and allows you to bind thunks to functions in shared libraries.

Running the Hack

Suppose you want to use a good, fast library for determining the cube root of any given number. The libm shared library (the standard C math library) provides a nice function, cbrt, that takes a double and returns a double, and does so fairly quickly—at least more quickly than you could do it in Perl (and certainly more easily than you could write XS code to do it).

Loading the library and creating the wrapper around cbrt is easy, once you know the name of the shared library and the function signature:

use P5NCI::Library;

my $lib = P5NCI::Library->new( library => 'm' );
$lib->install_function( 'cbrt', 'dd' );

print cbrt( 27 ), "\n";
print cbrt( 31 ), "\n";

Note that, if you have P5NCI installed, you don’t need a compiler to do this, nor do you even need math.h installed for this to work! Just create a new P5NCI::Library object, passing the name of the library without any platform-specific prefix or suffix.[7] Then, call install_function( ) on that object, passing the name of the function within the shared library to wrap as well as the signature, where the first letter is the type of the returned variable and the remaining letters are the types of the arguments to the function. You probably need to read the header or at least the documentation for the shared library to find out the signature, but you don’t have to have the development package or tools installed when you deploy your code.
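
For reference, here is how the C prototype maps onto the signature string used above (the prototype comes from the C standard; the mapping is just the convention described in the previous paragraph):

# C prototype (math.h):    double cbrt( double x );
#                           ^return    ^argument
# P5NCI signature string:   'dd' -- return type first, then each argument type
$lib->install_function( 'cbrt', 'dd' );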

Then call the function as if it were a normal Perl function—as far as the code cares, it is.

See the module’s documentation for other call signatures.

Hacking the Hack

One drawback of the NCI approach as described here is that it requires a shared library containing the thunking layer between Perl’s and C’s calling conventions. Even with a few possible data types and signatures no longer than four characters, there are still many, many possible necessary thunking functions. If your project only needs a few types of signatures, building your own thunking library can be useful. If you want to distribute this thunking layer, you must compile it for the destination machines. Fortunately, you only have to compile it once.

For general use, wrapping a library such as libffi may be a better approach—it can generate the thunks on its own as needed, requiring only that you have the FFI library installed. Look for updates to P5NCI on the CPAN that do this.

It’s possible to handle pointers to structures passed to and from the shared library as well. Marcus Holland-Moritz’s Convert::Binary::C is a likely candidate for giving access to struct members.
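
As a rough sketch of that direction (the struct, its members, and the values here are invented for illustration; see the Convert::Binary::C documentation for the full interface), converting between Perl data and the raw bytes a C library expects might look like:

use Convert::Binary::C;

# Describe the C layout once...
my $c = Convert::Binary::C->new;
$c->parse( 'struct point { int x; int y; };' );

# ...then move between Perl hashes and packed struct bytes
my $binary = $c->pack( 'point', { x => 3, y => 4 } );   # bytes to hand to C
my $point  = $c->unpack( 'point', $binary );            # bytes handed back from C

print "x = $point->{x}, y = $point->{y}\n";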

Run Two Services on a Single TCP Port

Reuse your precious ports simultaneously.

It is a well-known trick to use the HTTP CONNECT method to politely ask a web proxy to open a specific port on a specific machine on the Internet. This is how many people manage to connect back to their SSH server at home. SSH clients such as PuTTY know how to go through web proxies using this technique.

However, your company security administrator may have configured the proxy to only allow port 443[8] for outgoing CONNECT requests. Well, you can easily set up your SSH server so that it listens on both ports 22 and 443:

# sshd_config file
Port 22
Port 443

What if you also run an HTTPS server on this machine? There is no way for you to reach it from outside on any port other than 443 (due to the security policy), and besides, everyone else using the service at https://home.example.com/ uses port 443.

You have one port and two services. Do you really have to abandon one of them?

The Hack

You need some kind of proxy, or rather a reverse proxy, sitting on port 443 at home.example.com that can tell the difference between an SSL connection and an SSH connection.

Using a tool such as Ethereal, it’s quite easy to notice the differences between the two protocols by looking at the first few packets of data exchanged. The SSH server packets look something like:

SSH-2.0-OpenSSH_3.9p1

while the client resembles:

SSH-2.0-OpenSSH_4.2p1 Debian-5

Then they both negotiate the cipher and everything else. HTTP over SSL looks different. A common session might be:

Client: ....s...o......$.@]w#.U!..F.(.h..^.#y....D....[/.x.=...."..w.4..
Server: ....J...F..C.B.....y..cY.}s......h\.qo.......9..8.i.|..7..

Here it’s unreadable garbage from the beginning—but did you notice the difference?

When using a protocol like SSH, the server always speaks first, and sends a banner to the client. When using HTTP over SSL, it’s the client that speaks first.

Now you have a way to discriminate between the two services: once a client has connected to the reverse proxy’s port 443, if it immediately sends data, then it’s an SSL client; if it does nothing and waits for data from the server, then it’s an SSH client. The reverse proxy can wait for a short timeout before deciding which of the two services to contact. With the connection established, it can start its proxy work and send data back and forth between client and server.

In 2003, Philippe “BooK” Bruhat wrote a 160-line script that did just that. Nowadays, all the necessary logic to write a network proxy is in a module aptly named Net::Proxy. Creating a reverse proxy to serve both HTTPS and SSH on port 443 of your home machine is now a handful of lines of code away:

#!/usr/bin/perl

use strict;
use warnings;

use Net::Proxy;

# show some information on STDERR
Net::Proxy->set_verbosity(1);

# run this on the server that should listen on port 443
my $proxy = Net::Proxy->new(
    {   in =>
        {
            type         => 'dual',
            host         => '0.0.0.0',
            port         => 443,
            client_first =>
            {
                type => 'tcp',
                port => 444,     # move the https server to another port
            },
            server_first =>
            {
                type => 'tcp',
                port => 22,      # good old SSH
            },

            # wait during a 2 second timeout
            timeout      => 2,
        },
        out => { type => 'dummy' },
    }
);

$proxy->register( );

Net::Proxy->mainloop( );

Depending on your operating system, you may need to run the program with some administrative privileges to listen on a port below 1024.

Running the Hack

Run the program on your server. From your workstation, connect as usual. The only limitation of dual is that you have to find a pair of services with these special characteristics. HTTP and HTTPS are protocols where the client speaks first. The server speaks first with SSH, POP3, and SMTP.

Hacking the Hack

Net::Proxy grew out of BooK’s original sslh hack (the 160-line Perl script). The module introduces the concept of connectors: in connectors accept incoming (client) connections and forward them to servers via out connectors.

There is a tcp connector that handles standard TCP inbound and outbound connections, a connect connector that implements the CONNECT trick mentioned earlier, and a dummy connector that does nothing. BooK plans to add new connectors over time.

The dual connector used in this example uses the timeout trick to decide which connector (server_first or client_first) will contact the remote service. So the usual out parameter is just a dummy connector.

You can also use Net::Proxy to proxy an outgoing SSH connection through the corporate proxy:

#!/usr/bin/perl

use strict;
use warnings;

use Net::Proxy;

# show some information on STDERR
Net::Proxy->set_verbosity(1);

# run this on your workstation
my $proxy = Net::Proxy->new(
    {   in =>
        {
            # local port for local SSH client
            port => 2222,
            type => 'tcp',
        },
        out =>
        {
            type        => 'connect',
            host        => 'home.example.com',
            port        => 443,
            proxy_host  => 'proxy.company.com',
            proxy_port  => 8080,
            proxy_user  => 'id23494',
            proxy_pass  => 's3kr3t',
            proxy_agent => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)',
        },
    }
);

$proxy->register( );

Net::Proxy->mainloop( );

To reach the https server, use your browser as usual—you’ve already configured it to use the corporate proxy!
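
To reach the SSH server instead, point your SSH client at the local port the proxy listens on (the login name here is hypothetical):

$ ssh -p 2222 yourlogin@localhost

Net::Proxy forwards that connection through the corporate proxy to port 443 at home, where the dual connector notices that the client stays silent and hands it to sshd on port 22.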

Two scripts included in the Net::Proxy distribution support both of these uses. sslh lets you run two services on a single port on the server side and connect-tunnel helps you get through the corporate proxy on the client side.

Improve Your Dispatch Tables

Run code based on regex matches.

A dispatch table, in the form of a hash, is a useful technique for associating code with keys:

my %dispatch =
(
    red   => sub { return qq{<font color="#ff0000">$_[0]</font>} },
    green => sub { return qq{<font color="#00ff00">$_[0]</font>} },
    blue  => sub { return qq{<font color="#0000ff">$_[0]</font>} },
    black => sub { return qq{<font color="#000000">$_[0]</font>} },
    white => sub { return qq{<font color="#ffffff">$_[0]</font>} },
);

This approach lets you print out pretty HTML:

print $dispatch{black}->('knight');

Of course, this only works as long as the keys you use are fixed strings, because the hash lookup relies on string equality.

A regular expression that contains meta-characters (such as \d or [abc]) can match strings, but the string matched is not equal (in the sense of string equality) to the regular expression. In other words, this reasonable-looking code just does not work:

my %dispatch =
(
  # note that backslashes need to be "doubled up"
  '\\d'   => sub { return "saw a digit" },
  '[a-z]' => sub { return "saw a lowercase letter" },
);

Looking up $dispatch{5} won’t find anything. Being able to make it work would be very useful; Regexp::Assemble will let you do just that.
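
For comparison, the workaround without Regexp::Assemble is to loop over the keys and try each one as a pattern in turn (a rough sketch; hash order is unpredictable, so overlapping patterns may fire in any order):

# Naive fallback: try every key as a pattern until one matches
for my $pattern (keys %dispatch)
{
    if ('5' =~ /$pattern/)
    {
        print $dispatch{$pattern}->( ), "\n";   # prints "saw a digit"
        last;
    }
}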

The Hack

The idea is to gather all the different keys of the dispatch table and assemble them into a single regular expression. Given such an expression, you can then apply it to a target string and see what matches.

Even better, specifying a tracked pattern lets you find out after the match which pattern from the dispatch table triggered the match. Once you have that pattern, use it as a key into the dispatch table and call the corresponding code block. The more keys there are, the bigger the payoff, because instead of running sequentially through a long if/elsif/elsif chain of regular expression matches, you need only one match to try them all at once.

At the simplest, assemble the keys in the above dispatch table into a single tracked regular expression with:

my $re = Regexp::Assemble->new->track->add(keys %dispatch);

You can then use this to process a file with a loop as simple as:

while (<>)
{
    $re->match($_) and print $dispatch{$re->matched}->( );
}
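
For instance, assuming the two-pattern table above, a quick in-memory test might look like:

for my $input ('5', 'x')
{
    $re->match($input) and print $dispatch{$re->matched}->( ), "\n";
}

# prints:
#   saw a digit
#   saw a lowercase letter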

Running the Code

As an example, consider an IRC bot. You may wish to program a bot to react to many different messages observed on a channel. Ordinarily, you might do this with a mini-parser running through a list of regular expressions. Regexp::Assemble allows you to use a dispatch table instead.

All that you need is a hash whose keys are regular expressions (or to be precise, scalars usable as regexps), and whose values are code references.

First assemble the hash keys, and then match the resulting expression against incoming messages on an IRC channel. When a match occurs, recover the original regexp and use it to look up the code reference in the dispatch table and call that, passing in the captured variables that the pattern specified.

Here’s a bare-bones IRC bot that has just enough smarts to keep track of karma (foo++, bar--) and factoids (for instance, the association that TPF is The Perl Foundation, so when someone asks “TPF?”, the bot responds with the definition).

The instantiating code is very short, thanks to Bot::BasicBot:

use DispatchBot;

my $bot = DispatchBot->new(
    server   => "irc.perl.org",
    port     => "6667",
    channels => ["#bottest"],
    nick     => 'rebot',
);
$bot->run( );

The package DispatchBot is where everything happens:

package DispatchBot;

use strict;
use Regexp::Assemble;
use Bot::BasicBot;
use YAML qw(LoadFile DumpFile);

use vars qw( $VERSION @ISA );
$VERSION    = '0.03';
@ISA        = 'Bot::BasicBot';

my $factoid = _load( 'factoid.dat' ); # "foo" is "bar" factoids
my $karma   = _load( 'karma.dat' );   # keep track of foo++ and foo--

sub _load
{
    my $file = shift;
    return -e $file ? LoadFile($file) : { };
}

sub _save
{
    my ($dictionary, $file) = @_;
    DumpFile( $file, $dictionary );
}

sub _flush
{
    _save( $factoid, 'factoid.dat' );
    _save( $karma,   'karma.dat' );
}

END { _flush }

my %dispatch =
(
    # define a factoid
    '(\\S+) is (.*)$' => sub { $factoid->{$_[0]} = $_[1]; _flush; return },

    # query a factoid
    '(\\S+)\s*\\?$' => sub
    {
        exists $factoid->{$_[0]}
            and return "I believe that $_[0] is $factoid->{$_[0]}"
    },

    # drop a factoid
    'forget (\\S+)$'=> sub
    {
        if (exists $factoid->{$_[0]})
        {
            my $message = "I forgot $_[0]";
            delete $factoid->{$_[0]};
            _flush;
            return $message;
        }
    },

    # karma shifts
    '(\\S+)\\+\\+' => sub { $karma->{$_[0]}++; _flush; return },
    '(\\S+)--'     => sub { $karma->{$_[0]}--; _flush; return },

    # karma query
    '^karma (\\S+)$' => sub
    {
        return exists $karma->{$_[0]}
            ? "$_[0] has karma of $karma->{$_[0]}"
            : "$_[0] has neutral karma"
    },

    # time... to die
    '^!quit$' => sub { exit },
);

my $re = Regexp::Assemble->new->track->add(keys %dispatch);

sub said
{
    my ($self, $arg) = @_;
    $re->match($arg->{body})
        and return $dispatch{$re->matched}->($re->capture);
    return;
}
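
A hypothetical channel session (nicknames invented) shows the dispatch table at work:

<alice> TPF is The Perl Foundation
<alice> TPF?
<rebot> I believe that TPF is The Perl Foundation
<alice> rebot++
<alice> karma rebot
<rebot> rebot has karma of 1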

Track Your Approximations

Avoid rounding errors; get the right results.

Floating-point numbers are inherently approximate. Perl represents numbers in the most accurate way it can on your hardware, but typically it can’t do better than about 16 significant digits. That means that a calculation you think is accurate:

my $dx   = 21123059652674105965267410596526741059652674100000000;
my $rate = 1.23e12;

my $end  = ( 23 * $dx - $rate * 230 - 2.34562516 ** 2 - 0.5 ) ** 0.33;

may in fact not be precise.

On a 32-bit machine, the various floating-point approximations introduced along the way may mean the final value of $end is inaccurate by about 937. Of course, the correct value of $end is approximately 520642400412471062.6461124479995125761153. If $end is your annual profit margin, you may not really care if it’s off by a thousand dollars or so. On the other hand, if the same calculation were a trajectory for a lunar lander, turning off the retro-rockets 937 meters too high might matter a great deal.

How can you make sure your calculations aren’t fatally inaccurate? The easiest way is to let Perl do it for you. To accomplish that, you need to change the way floating-point numbers work. Easy.

Interval interlude

Interval arithmetic is a simple technique for tracking the accuracy of numeric computations. Instead of representing each value as a single number, encode it as a range: minimum possible value to maximum possible value.

For example, most platforms can’t represent the number 1.2345678901234567890 exactly. Interval arithmetic would encode it as the range [1.23456789012345, 1.23456789012346], assuming those two values are the closest lower and upper bounds that your machine can represent. On the other hand, if your machine could represent the number 1.23456, then it would encode it as [1.23456, 1.23456], with identical lower and upper bounds as there is no uncertainty about the actual value.

Once every number is properly encoded, any unary operations on the number are applied to both the minimal and maximal values, producing a new range that encodes the result. For example, the operation:

sqrt( [1.2, 1.3] )

yields:

[1.095445, 1.140175]

being the square roots of the minimal and maximal values. Logically, the true square root of the true value must lie somewhere in that new range.

Binary operations are more complex. They have to be performed on every possible combination of the minimal and maximal values of each operand. Then the minimal and maximal outcomes are used as the encoding of the result. For example, the multiplication:

[1.2, 1.3] * [-1, 0.9]

produces the result:

[-1.3, 1.17]

because:

1.2 *  -1 → -1.2
1.3 *  -1 → -1.3   (minimal value)
1.2 * 0.9 →  1.08
1.3 * 0.9 →  1.17  (maximal value)
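
In Perl terms, you could sketch this rule for any binary operation like so (a hypothetical helper, separate from the module developed below):

use List::Util qw( min max );

# Apply a binary operation to every corner of two [min, max] ranges
# and keep the extremes as the resulting range
sub interval_op
{
    my ($op, $x, $y) = @_;
    my @corners = map { my $a = $_; map { $op->( $a, $_ ) } @$y } @$x;
    return [ min(@corners), max(@corners) ];
}

my $product = interval_op( sub { $_[0] * $_[1] }, [ 1.2, 1.3 ], [ -1, 0.9 ] );
# $product is [ -1.3, 1.17 ], as in the worked example above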

The advantage of interval arithmetic is that, provided you’re careful about rounding errors, the exact result is always guaranteed to lie somewhere in the interval you produce. Intervals also make it easy to estimate how accurate your computation is: the smaller the interval, the more precise the answer.

You can also think of intervals in the notation average(half-interval): the midpoint of the range, followed by its half-width in parentheses. In that notation, you could also write the operation:

[1.2, 1.3] * [-1, 0.9]  →  [-1.3, 1.17]

as:

1.25(0.05) * -0.05(0.95)  →  -0.065(1.235)

This representation gives a clear indication of how the accuracy of the approximation changes under different operations, and how the uncertainty in the result grows over time.
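
A tiny helper (hypothetical, but mirroring the stringification added in “Overload Your Operators” [Hack #100]) converts a range into that notation:

# Render a [min, max] range as midpoint(half-width)
sub as_midpoint
{
    my ($min, $max) = @_;
    my $mid  = ($min + $max) / 2;
    my $half = ($max - $min) / 2;
    return sprintf '%g(%g)', $mid, $half;
}

print as_midpoint( -1.3, 1.17 ), "\n";   # prints -0.065(1.235)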

Teaching Perl to think in intervals

To make Perl track the accuracy of your floating-point calculations, you first have to convince it to represent every floating point number as an interval:

package Number::Intervals;

# Compute maximal error in the representation of a given number...
sub _eps_for
{
    my ($num, $epsilon) = (shift) x 2;              # copy arg to both vars
    $epsilon /= 2 while $num + $epsilon/2 != $num;  # whittle epsilon down
    return $epsilon;
}

# Create an interval object, allowing for representation errors...
sub _interval
{
    use List::Util qw( min max );
    my ($min, $max) = ( min(@_), max(@_) );
    return bless [$min - _eps_for($min), $max + _eps_for($max)], __PACKAGE__;
}

# Convert all floating-point constants to interval objects...
sub import
{
    use overload;

    overload::constant(
        float => sub
        {
            my ($raw, $cooked) = @_;
            return _interval($cooked);
        },
    );
}

When your code uses Number::Intervals, its import( ) will call the constant( ) subroutine from the standard overload module. That subroutine does exactly what its name suggests: it overloads the standard handling of constants with new behaviors. In this case, you’re overloading the handling of floating-point constants by providing a subroutine that will be called every time a literal floating-point value appears in your program.

That handler subroutine receives two arguments: a string containing the raw source code that defined the constant ($raw), and a numeric value that is how Perl would normally interpret that source code definition ($cooked). The subroutine should return an object to use instead of the constant value.

In this instance, the handler just returns the corresponding interval object for the cooked value, as provided by _interval($cooked). That function determines the minimal and maximal values it receives and uses them as the lower and upper bounds of the resulting range. Note that it also subtracts the smallest possible amount (_eps_for($min)) from the lower bound and adds the smallest possible amount (_eps_for($max)) to the upper bound.

Adding and subtracting these epsilon values doesn’t produce the smallest possible interval representing the original value, but the interval it does produce has three essential advantages: it’s trivial to compute, it’s guaranteed to correctly bound any number passed to _interval( ) (regardless of the rounding scheme your floating-point implementation uses), and it still produces the second-smallest possible interval representing the original number.

Of course, this isn’t the end of the story. Depending on how you use Number::Intervals objects, you may get the wrong results. You can fix that too, though, in “Overload Your Operators” [Hack #100].

Overload Your Operators

Make your objects look like numbers, strings, and booleans sensibly.

Few people realize that Perl is an operator-oriented language, where the behavior of data depends on the operations you perform on it. You’ve probably had the experience of inadvertently stringifying an object or reference and wondering where and why you suddenly see memory addresses.

Fortunately, you can control what happens to your objects in various contexts.

Consider the Number::Intervals module from “Track Your Approximations” [Hack #99]. It’s useful, but as shown there it has a few drawbacks.

The effect of the import( ) subroutine is that any code that declares:

use Number::Intervals;

will thereafter have every floating-point constant replaced by a Number::Intervals object that encodes upper and lower bounds on the original constant. That impressive achievement (utterly impossible in most other programming languages) will, sadly, be somewhat undermined when you then write:

use Number::Intervals;

my $avogadro    = 6.02214199e23;   # standard physical constant
my $atomic_mass = 55.847;          # atomic mass of iron
my $mass        = 100;             # mass in grams

my $count       = int( $mass * $avogadro/$atomic_mass );

print "Number of atoms in $mass grams of iron = $count\n";

The unfortunate result is:

$ perl count_atoms.pl

Number of atoms in 100 grams of iron = 99

Iron atoms are heavy, but they’re not that heavy. The correct answer is just a little over 1 million billion billion, so converting to intervals appears to have made the calculation noticeably less accurate.

The problem is that the import( ) code you implemented to reinterpret Perl’s floating-point constants did just that. It converted those constants into interval objects; that is, into references to blessed arrays. When you multiply and divide those interval objects, Perl converts the corresponding array references to integer addresses, which it then multiplies and divides. The calculation:

$mass * $avogadro / $atomic_mass

becomes something like:

100 * 0x1808248 / 0x182dc10

which is:

100 * 25199176 / 25353232

which is where the spurious 99 came from.

Somehow, you need to teach Perl not only how to convert floating-point numbers to interval objects, but also how to compute sensibly with those objects.

The Hack

The trick, of course, is to overload the arithmetic operators that will apply to Number::Intervals objects by using the overload pragma:

# Overload operators for Number::Intervals objects...
use overload
(
    # Add two intervals by independently adding minima and maxima...
    q{+} => sub
    {
        my ($x, $y) = _check_args(@_);
        return _interval($x->[0] + $y->[0], $x->[1] + $y->[1]);
    },

    # Subtract intervals by subtracting maxima from minima and vice versa...
    q{-} => sub
    {
        my ($x, $y) = _check_args(@_);
        return _interval($x->[0] - $y->[1], $x->[1] - $y->[0]);
    },

    # Multiply intervals by taking least and greatest products...
    q{*} => sub
    {
        my ($x, $y) = _check_args(@_);
        return _interval($x->[0] * $y->[0], $x->[1] * $y->[0],
                         $x->[1] * $y->[1], $x->[0] * $y->[1],
                        );
    },

    # Divide intervals by taking least and greatest quotients...
    q{/} => sub
    {
        my ($x, $y) = _check_args(@_);
        return _interval($x->[0] / $y->[0], $x->[1] / $y->[0],
                         $x->[1] / $y->[1], $x->[0] / $y->[1],
                        );
    },

    # Exponentiate intervals by taking least and greatest powers...
    q{**} => sub
    {
        my ($x, $y) = _check_args(@_);
        return _interval($x->[0] ** $y->[0], $x->[1] ** $y->[0],
                         $x->[1] ** $y->[1], $x->[0] ** $y->[1],
                        );
    },

    # Integer value of an interval is integer value of bounds...
    q{int} => sub
    {
        my ($x) = @_;
        return _interval(int $x->[0], int $x->[1]);
    },

    # Square root of interval is square roots of bounds...
    q{sqrt} => sub
    {
        my ($x) = @_;
        return _interval(sqrt $x->[0], sqrt $x->[1]);
    },

    # Unary minus: negate bounds and swap upper/lower:
    q{neg} => sub
    {
        my ($x) = @_;
        return _interval(-$x->[1], -$x->[0]);
    },

    # etc. etc. for the other arithmetic operators...
);

The overload module expects a list of key/value pairs, where each key is the name of an operator and each value is a subroutine that implements that operator. Once they’re installed, each of the implementation subroutines will be called whenever an object of the class is an argument to the corresponding operator.

Unary operators (including int, neg, and sqrt) receive the operand object as their only argument; binary operators (like +, *, and **) receive three arguments: their two operands and an extra flag indicating whether the operands appear in reversed order (because the first operand wasn’t an object). Binary operators therefore need to check, and sometimes unreverse, their arguments, which the _check_args( ) subroutine does for them:

# Flip args if necessary, converting to an interval if not already...
sub _check_args
{
    my ($x, $y, $reversed) = @_;

    return $reversed              ?  ( _interval($y), $x            )
         : ref $y ne __PACKAGE__  ?  ( $x,            _interval($y) )
         :                           ( $x,            $y            );
}

Note that this utility subroutine also converts any non-interval arguments (integers, for example) to interval ranges. This means that, after calling _check_args( ), all of the binary handlers can be certain that their operands are in the correct order and that both operands are proper interval objects. This greatly simplifies the implementation of the overloaded operators. In particular, they don’t need to implement three separate sets of logic for handling interval/number, number/interval, and interval/interval interactions.
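
To see what the reversal flag means in practice, here is a worked trace (comments only, not extra code in the module) for an expression where the plain number comes first:

# For:   10 / $interval
# Perl calls the q{/} handler as:
#     $handler->( $interval, 10, 1 );      # operands swapped, third argument set
# and _check_args() restores the intended order:
#     ( _interval(10), $interval )         # 10 promoted to a tiny interval around 10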

Saying what you mean

Reimplementing the necessary operators enables you to add, subtract, multiply, divide, and so on, interval representations correctly. However, even with the overloading in place, the results of counting the atoms are still more ironic than ferric:

$ perl count_atoms_v2.pl

Number of atoms = Number::Intervals=ARRAY(0x182f89c)

The problem is that, although Perl now knows how to do operations on interval objects, it still has no idea how to convert those interval objects back to simple numbers, or to strings. When you try to print a floating-point interval object, it prints the string representation of the object reference, rather than the string representation of the value that the object represents.

Fortunately, it’s easy to tell the interpreter how to convert intervals back to sensible numbers and strings. Just give the Number::Intervals class two extra handlers for stringification and numerification, like this:

use Carp qw( carp );

use overload
(
    # Stringify intervals as: VALUE (UNCERTAINTY)...
    q{""} => sub
    {
        my ($self) = @_;

        my $uncert = ($self->[1] - $self->[0]) / 2;

        use charnames qw( :full );
        return $self->[0]+$uncert . " (\N{PLUS-MINUS SIGN}$uncert)";
    },

    # Numerify intervals by averaging their bounds (with warning)...
    q{0+} => sub
    {
        my ($self) = @_;
        carp "Approximating interval by a single (averaged) number";
        return ($self->[0] + $self->[1]) /2;
    },
);

With that back-translation in place, the floating point calculations can finally proceed correctly, with their accuracy being automatically tracked and reported as well:

$ perl count_atoms_v3.pl

Number of atoms = 1.07832864612244e+24 (±805306368)

Learn from Obfuscations

Learn more about Perl from the play of others.

Perl has a reputation for serious play. Think of Perl golf (solving problems in the fewest characters possible), JAPHS (printing a simple message in creative ways), and obfuscation (writing odd code that does surprising things). Though you’d never use these tricks in production code, producing such creative programs requires careful study and exploration—both tricks of good hackers.

Exploring obfuscation can also expand your Perl skills.

Consider an obfuscation I posted at Perl Monks (http://www.perlmonks.org/index.pl?node_id=77619; the link includes a deconstruction and explanation by Guildenstern). It is a non-traditional JAPH that is self-referential. Sort of. The use of a variable called pi, the use of the sin function, and the visual layout of the code all hint at what the output will be. The irony of course is that while the layout helps you know what to expect, it actually hinders understanding.

#!/usr/bin/perl                              # how to (ab)use substr
use warnings;
use strict;

my $pi='3.14159210535152623346475240375062163750446240333543375062';

     substr      ($^X,0)=
       substr    ($pi,-6);map{
         substr  ($^X,$.++,1)=chr(
          substr ($pi,21,2)+
          substr ($pi,$_,2))}(12,28,-18,-6,-10,14);map{$^O=$"x(
         substr  ($pi,-5,2));
       substr    ($^O,sin(++$a/8)*32+
     substr      ($pi,-2)/2+1,1)=$_;
   substr        ($^O,sin($a/4)*(
 substr          ($pi,2,2))+
substr           ($pi,-7,-5)-1,1)=$_;print"$^O$/";eval($^X.('$b,'x3).
substr           ($pi,-3,1).'.'.
 substr          ($pi,9,2));}(map{chr($_+
   substr        ($pi,21,2))}(
     substr      ($pi,8)x6)=~/../g);

“So”, you may think, “what could be the pedagogical value of this rather ridiculous piece of code?”. I believe its value lies in its ability to raise questions that cause the curious to seek out answers.

Complete beginners might inquire:

  • l00k5 k3w1 d00d!!! Wh4t l4ngu4g3 15 th4t?

  • Is this really a computer program that runs and does something?

Those with a little exposure to Perl may ask:

  • Does Perl allow such bizarre formatting without throwing errors?

  • Can you really create a number with that many digits of accuracy?

  • What does substr do? Is it like the C function?

More experienced programmers may wonder:

  • How does the animation work with only one print statement and no for or while loops?

  • You can put substr on the left hand side?

  • Are there both two and three argument forms of substr?

  • I’ve figured out that there’s a select call in the program, but what’s it doing?

  • How come use strict; and use warnings; don’t complain? Aren’t they supposed to ensure the quality of your code?

  • Why didn’t he have to declare $a and $b using my?

As for myself, the value of this obfu was in the enjoyment I received from creating and sharing it with the Perl community, and in the comments and feedback I received because of it. I consider the few hours that it took to write it time well spent.

When you come across Perl play in your travels, certainly do take apart the code with B::Deparse and pore through perldoc perlfunc and perldoc perlvar to find out what’s going on. Then take a step back to ask the deeper questions and ponder how you too can assemble your own creations from Perl’s rich vocabulary.

Happy coding!



[1] Actually, that might be a good sign!

[2] See perlfaq7 if you’ve really never asked this.

[3] As if anyone would ever misspell “referrer.”

[4] “One of God’s own prototypes. Some kind of high-powered mutant never even considered for mass production. Too weird to live, and too rare to die.” —Hunter S. Thompson

[5] See “Prototypes” in the Perldoc perlsub.

[7] On a Unix system, the file is actually libm.so. On a Windows system, it’s probably math.dll. Mac OS X likely refers to it as libm.dylib.

[8] The standard port for HTTPS.
