So far, what we've seen in this chapter are simple, two-level, homogeneous data structures: each element contains the same kind of referent as all the other elements at that level. It certainly doesn't have to be that way. Any element can hold any kind of scalar, which means that it could be a string, a number, or a reference to anything at all. The reference could be an array or hash reference, or a pseudohash, or a reference to a named or anonymous function, or an object. The only thing you can't do is to stuff multiple referents into one scalar. If you find yourself trying to do that, it's a sign that you need an array or hash reference to collapse multiple values into one.
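As a minimal sketch of that last point, the following stores several values in a single hash element by wrapping them in an anonymous array. The names %person and phones are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical record: one element needs to hold several phone numbers.
my %person = ( name => "fred" );

# A scalar slot holds exactly one value, so wrap the list in an
# anonymous array and store the single reference instead.
$person{phones} = [ "555-1234", "555-9876" ];

# Dereference to get the individual values back.
my $count = @{ $person{phones} };
my $first = $person{phones}[0];
print "$person{name} has $count numbers, the first is $first\n";
```

The same idea works with an anonymous hash when the values need names rather than positions.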
In the sections that follow, you will find code examples designed to illustrate many of the possible types of data you might want to store in a record, which we'll implement using a hash reference. The keys are uppercase strings, a convention sometimes employed (and occasionally unemployed, but only briefly) when the hash is being used as a specific record type.
Here is a record with six disparate fields:
    $rec = {
        TEXT      => $string,
        SEQUENCE  => [ @old_values ],
        LOOKUP    => { %some_table },
        THATCODE  => \&some_function,
        THISCODE  => sub { $_[0] ** $_[1] },
        HANDLE    => \*STDOUT,
    };
The TEXT field is a simple string, so you can just print it:
print $rec->{TEXT};
SEQUENCE and LOOKUP are regular array and hash references:
    print $rec->{SEQUENCE}[0];
    $last = pop @{ $rec->{SEQUENCE} };

    print $rec->{LOOKUP}{"key"};
    ($first_k, $first_v) = each %{ $rec->{LOOKUP} };
THATCODE is a named subroutine and THISCODE is an anonymous subroutine, but they're invoked identically:
    $that_answer = $rec->{THATCODE}->($arg1, $arg2);
    $this_answer = $rec->{THISCODE}->($arg1, $arg2);
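To make the identical call syntax concrete, here is a small self-contained sketch. The body of some_function is a made-up stand-in (the text does not say what it does), and the argument values are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical named subroutine standing in for some_function.
sub some_function { return $_[0] + $_[1] }

my $rec = {
    THATCODE => \&some_function,           # reference to a named sub
    THISCODE => sub { $_[0] ** $_[1] },    # anonymous sub
};

# Both fields hold code references, so the call syntax is the same.
my $that_answer = $rec->{THATCODE}->(2, 3);    # 2 + 3
my $this_answer = $rec->{THISCODE}->(2, 3);    # 2 ** 3

# The arrow between subscripts is optional, and &{ ... } also works.
my $same_that = $rec->{THATCODE}(2, 3);
my $same_this = &{ $rec->{THISCODE} }(2, 3);

print "$that_answer $this_answer\n";
```

Note that the named subroutine needs a backslash in front of the ampersand; without it, Perl would call the function while the record is being built and store its return value instead of a reference.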
With an extra pair of braces, you can treat $rec->{HANDLE} as an indirect object:
    print { $rec->{HANDLE} } "a string\n";
If you're using the FileHandle module, you can even treat the handle as a regular object:
    use FileHandle;
    $rec->{HANDLE}->autoflush(1);
    $rec->{HANDLE}->print("a string\n");
Naturally, the fields of your data structures can themselves be arbitrarily complex data structures in their own right:
    %TV = (
        flintstones => {
            series   => "flintstones",
            nights   => [ "monday", "thursday", "friday" ],
            members  => [
                { name => "fred",    role => "husband", age => 36, },
                { name => "wilma",   role => "wife",    age => 31, },
                { name => "pebbles", role => "kid",     age =>  4, },
            ],
        },

        jetsons => {
            series   => "jetsons",
            nights   => [ "wednesday", "saturday" ],
            members  => [
                { name => "george",  role => "husband", age => 41, },
                { name => "jane",    role => "wife",    age => 39, },
                { name => "elroy",   role => "kid",     age =>  9, },
            ],
        },

        simpsons => {
            series   => "simpsons",
            nights   => [ "monday" ],
            members  => [
                { name => "homer",   role => "husband", age => 34, },
                { name => "marge",   role => "wife",    age => 37, },
                { name => "bart",    role => "kid",     age => 11, },
            ],
        },
    );
Because Perl is quite good at parsing complex data structures, you might just put your data declarations in a separate file as regular Perl code, and then load them in with the do or require built-in functions. Another popular approach is to use a CPAN module (such as XML::Parser) to load in arbitrary data structures expressed in some other language (such as XML).
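The do approach might look roughly like the sketch below. It is only one way to arrange things: here the data file's last expression is an anonymous hash, which do returns, and a temporary file created with File::Temp stands in for a real data file. The file contents are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small data file containing ordinary Perl code. The leading +
# tells Perl the braces are an anonymous hash rather than a block.
my ($fh, $filename) = tempfile( SUFFIX => ".pl", UNLINK => 1 );
print $fh <<'END_DATA';
+{
    series => "jetsons",
    nights => [ "wednesday", "saturday" ],
};
END_DATA
close $fh or die "can't close $filename: $!";

# do FILE compiles and runs the file, returning its last value.
my $rec = do $filename;
die "couldn't parse $filename: $@" if $@;
die "couldn't read $filename: $!"  unless defined $rec;

print "$rec->{series} airs on @{ $rec->{nights} }\n";
```

Because the file is executed as code, this technique is appropriate only for data files you trust.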
You can build data structures piecemeal:
    $rec = {};
    $rec->{series} = "flintstones";
    $rec->{nights} = [ find_days() ];
Or read them in from a file (here, assumed to be in field=value syntax):
    @members = ();
    while (<>) {
        %fields = split /[\s=]+/;
        push @members, { %fields };
    }
    $rec->{members} = [ @members ];
And fold them into larger data structures keyed by one of the subfields:
$TV{ $rec->{series} } = $rec;
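The three steps above can be put together in one runnable sketch. The input lines are invented sample data, read through an in-memory filehandle instead of a real file, with one member per line and whitespace between the field=value pairs.

```perl
use strict;
use warnings;

# Made-up sample input in field=value syntax, one member per line.
my $input = <<'END_INPUT';
name=fred role=husband age=36
name=wilma role=wife age=31
name=pebbles role=kid age=4
END_INPUT

open my $in, '<', \$input or die "can't open in-memory file: $!";

my @members;
while (my $line = <$in>) {
    next unless $line =~ /\S/;             # skip blank lines
    my %fields = split /[\s=]+/, $line;    # name, fred, role, husband, ...
    push @members, { %fields };            # store a fresh copy of the hash
}
close $in;

# Build the record piecemeal, then file it under its own series name.
my %TV;
my $rec = {};
$rec->{series}  = "flintstones";
$rec->{members} = [ @members ];
$TV{ $rec->{series} } = $rec;

print "$TV{flintstones}{members}[1]{name} is $TV{flintstones}{members}[1]{age}\n";
```

The split relies on each line starting with a field name; leading whitespace would produce an empty first field and shift every key and value by one.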
You can use extra pointer fields to avoid duplicate data. For example, you might want a "kids" field included in a person's record, which might be a reference to an array containing references to the kids' own records. By having parts of your data structure refer to other parts, you avoid the data skew that would result from updating the data in one place but not in another:
    for $family (keys %TV) {
        my $rec = $TV{$family};   # temporary pointer
        @kids = ();
        for $person ( @{$rec->{members}} ) {
            if ($person->{role} =~ /kid|son|daughter/) {
                push @kids, $person;
            }
        }
        # $rec and $TV{$family} point to same data!
        $rec->{kids} = [ @kids ];
    }
The $rec->{kids} = [ @kids ] assignment copies the array contents, but those contents are merely references to uncopied data. This means that if you age Bart as follows:
$TV{simpsons}{kids}[0]{age}++; # increments to 12
then you'll see the following result, because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] both point to the same underlying anonymous hash table:
print $TV{simpsons}{members}[2]{age}; # also prints 12
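The sharing can be checked in isolation with a cut-down record. This is only an illustrative sketch: the single-member structure and the $snapshot variable are assumptions added for contrast, showing what a one-level copy of the hash would do instead.

```perl
use strict;
use warnings;

my $bart = { name => "bart", role => "kid", age => 11 };
my $rec  = { members => [ $bart ] };

# [ @kids ] makes a new array, but its elements are the same references.
my @kids = grep { $_->{role} eq "kid" } @{ $rec->{members} };
$rec->{kids} = [ @kids ];

# For contrast, { %$bart } copies the hash itself, one level deep.
my $snapshot = { %$bart };

$rec->{kids}[0]{age}++;

print "members:  $rec->{members}[0]{age}\n";   # sees the increment
print "snapshot: $snapshot->{age}\n";          # independent copy, unchanged
```

Comparing two references numerically, as in $rec->{kids}[0] == $rec->{members}[0], is a quick way to confirm that they point at the same hash.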
Now, to print the entire %TV structure:
    for $family ( keys %TV ) {
        print "the $family";
        print " is on ", join (" and ", @{ $TV{$family}{nights} }), "\n";
        print "its members are:\n";
        for $who ( @{ $TV{$family}{members} } ) {
            print " $who->{name} ($who->{role}), age $who->{age}\n";
        }
        print "children: ";
        print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
        print "\n\n";
    }