Another Example: A Database of Artists and Their Works

Nested data structures work best for representing complex sets of data and allow you to do various things with that data. In this example, then, we'll look at a database of artists, some information about those artists, and their various works. To save space, we'll keep this example short. All this example does is

  • Read the artist data from a file into a complex nested data structure

  • Prompt for a search string

  • Print the data for that particular artist, given that search string

The data we'll look at in this example consists of an artist's first and last names, their birth and death dates, and a list of titles of their works. The artist's data is stored in an external file consisting of two lines per artist:

Monet,Claude,1840,1926
Woman With a Parasol:Field of Poppies:Camille at the Window:Water Lillies

The first line consists of the artist's personal data, separated by commas. The second line is the artist's works, separated by colons. The data file (which I've called artists.txt) contains a number of artists in this format.

The structure we'll read this information into is a hash of hashes with a nested array. The topmost hash is keyed by artist's last name. The extra artist data is a nested hash with the keys “FN,” “BD,” “DD,” and “works.” The value of the works key is, in turn, an array consisting of all the titles. Figure 19.5 shows how a single record (artist) of this structure might look and where each part of the data fits into that structure.
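As a rough sketch of that shape, a single record could be written out by hand as a literal. The literal below is only an illustration built from the sample data shown earlier; the script itself builds the structure from the file rather than from a literal.

```perl
use strict;
use warnings;

# One record of the artists database, written as a literal.
# The outer hash is keyed by last name; each value is a reference
# to an inner hash, and the "works" value of that inner hash is a
# reference to an array of titles.
my %artists = (
    Monet => {
        FN    => 'Claude',
        BD    => 1840,
        DD    => 1926,
        works => [
            'Woman With a Parasol',
            'Field of Poppies',
            'Camille at the Window',
            'Water Lillies',
        ],
    },
);

my $count = scalar @{ $artists{Monet}{works} };
print "$artists{Monet}{FN} has $count works\n";
```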

Listing 19.2 shows the code for this simple example. Before reading down to the discussion of this code, look carefully at the lines inside the while loop in the &read_input() subroutine (lines 21 through 35), and the dereferences in the &process() subroutine (lines 53 and 55).

Listing 19.2. The artists.pl Script
1:  #!/usr/bin/perl -w
2:  use strict;
3:
4:  my $artdb = "artists.txt";              # name of artists database
5:  my %artists = ();                       # hash of artists, keyed by last name
6:
7:  &read_input();
8:  &process();
9:
10: sub read_input {
11:     my $in = '';                # temp input line
12:     my ($fn,$ln,$bd,$dd);       # first name, last name
13:                                 # date of birth, date of death
14:     my %artist = ();            # temp artist hash
15:
16:     open(FILE, $artdb) or die "Cannot open artist's database ($artdb): $!\n";
17:
18:     while () {
19:         # name and dates on first line
20:         chomp($in = <FILE>);
21:         if ($in) {
22:             ($ln,$fn,$bd,$dd) = split(',',$in);
23:             $artist{FN}  = $fn;
24:             $artist{BD}  = $bd;
25:             $artist{DD}  = $dd;
26:
27:             chomp($in = <FILE>); # list of works in second line
28:             if ($in) {
29:                 my @works = split(':',$in);
30:                 $artist{works}  = \@works;
31:             }  else { print "no works";}
32:
33:             # add a reference to the artist hash in the bigger
34:             # artists hash
35:             $artists{$ln}  = { %artist } ;
36:
37:         }  else { last; }         # end of DB
38:     }
39:
40: }
41:
42: sub process {
43:     my $input = '';
44:     my $matched = 0;
45:
46:     print "Enter an Artist's Name: ";
47:     chomp($input = <>);
48:
49:     foreach (keys %artists) {
50:         if (/$input/i  and !$matched) {
51:             $matched = 1;
52:             my $ref = $artists{$_} ;
53:             print "$_, $ref->{FN} $ref->{BD}-$ref->{DD}\n";
54:             my $work = '';
55:             foreach $work (@{$ref->{works} } ) {
56:                 print "   $work\n";
57:             }
58:         }
59:     }
60:     if (!$matched) {
61:         print "Artist $input not found.\n";
62:     }
63: }
					

You may note that in this example I did exactly the reverse of the last example: I'm using a global variable to hold the global artists database, rather than keeping all variables local. How you organize your data and variables is your choice; in this case I'm using a global variable because the dereferences are complicated enough without adding another level of reference at the topmost level.

One other slightly odd thing I did in this example is to hard-code the name of the artists database into the script, rather than taking the name of the database file from the command line. Once again, this is a question of programmer choice and how the script will be used; either way will work equally well. Note, however, that I put the filename of the artists database right at the top of the script so it can be easily changed if necessary.
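For comparison, here is a minimal sketch of the command-line alternative. The db_name() helper is hypothetical (it is not part of Listing 19.2); it simply returns the first argument if there is one and otherwise falls back to the hard-coded name.

```perl
use strict;
use warnings;

# Hypothetical helper, not in the listing: pick the database file
# from a list of arguments, falling back to the default name.
sub db_name {
    my @args = @_;
    return @args ? $args[0] : 'artists.txt';
}

my $artdb = db_name(@ARGV);
print "Using database: $artdb\n";
```

If you took this route in the real script, keep in mind that &process() reads the search string with <>, which reads from any files still named in @ARGV. You would need to remove the filename from @ARGV (with shift, for example) or read the search string from <STDIN> instead.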

Let's look first at the &read_input() subroutine, which reads the artists database and fills our nested data structure with that data. The way I've approached this task is to create a temporary hash for the current artist, to fill up that hash with the data, and then to put that temporary hash into the larger hash with a reference.

We start in line 18 with a loop that reads in the artists database file, two lines at a time. The loop will be exited when there's no more data (as determined by the test in line 21). We'll start with the first line of data, which contains the artist's name and date information:

Monet,Claude,1840,1926

Line 22 splits this data into its component parts, and lines 23 through 25 put that data into a temporary hash (called %artist, not to be confused with the larger %artists hash). The last name isn't stored in the temporary hash, because it becomes the key in %artists.

Line 27 reads the second line of each artist's data, the list of works:

Woman With a Parasol:Field of Poppies:Camille at the Window:Water Lillies

In line 29, we split this line into list elements, based on a “:” separator character, and then store that list into the @works temporary array. In line 30 we add a reference to that array to the temporary %artist hash with the key “works.” Note that each time the while loop executes, we'll end up with a new @works temporary array (declared with my), so we'll avoid the problem of referencing the same memory location each time.
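The backslash in line 30 matters. The following sketch, which uses a shortened list of works and an extra "count" key invented for the comparison, shows what is stored with and without it:

```perl
use strict;
use warnings;

my %artist;
my $in = 'Woman With a Parasol:Field of Poppies';

my @works = split(':', $in);

# Assigning the array itself to a hash value puts the array in
# scalar context, so only the number of elements is stored.
$artist{count} = @works;

# The backslash stores a reference to the array instead.
$artist{works} = \@works;

print "count: $artist{count}\n";
print "first: $artist{works}[0]\n";
```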

With the individual artist's data built, we can add that record to the larger %artists hash with the last name as the key. Line 35 does just that. Note that because we reuse the same %artist hash on each turn of the loop, we use an anonymous hash constructor with a copy of the %artist hash, which ensures that each stored reference points to a different memory location.
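To see why the copy is needed, compare storing a plain reference to the reused hash with storing a reference to an anonymous copy. This is a small sketch; the %shared and %copied hashes and the sample names are made up for the comparison.

```perl
use strict;
use warnings;

my %artist;                 # reused on every pass, as in read_input
my (%shared, %copied);

foreach my $name ('Claude', 'Vincent') {
    $artist{FN} = $name;
    $shared{$name} = \%artist;      # reference to the one reused hash
    $copied{$name} = { %artist };   # reference to a fresh anonymous copy
}

# Every entry in %shared points at the same hash, so the first
# entry now shows the data from the last pass of the loop.
print "shared: $shared{Claude}{FN}\n";

# Each entry in %copied has its own hash, so the data survives.
print "copied: $copied{Claude}{FN}\n";
```

Note that { %artist } makes a shallow copy: nested references, such as the one stored under the works key, are copied as references. That is fine in the listing because @works is a new array on each pass.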

If &read_input() puts data into the nested hash, then the &process() subroutine gets that data out again. Here we'll use a simple search on the artist's last name and print the matching record. The output that gets printed looks like this:

Enter an Artist's Name: Monet
Monet, Claude 1840-1926
   Woman With a Parasol
   Field of Poppies
   Camille at the Window
   Water Lillies
   The Artist's Garden at Giverny

The most important parts of this subroutine are the parts that dereference the references to get at the important data in lines 52 through 56. But let's back up a bit and start from line 49, the foreach loop. In this loop, because we don't have an actual loop variable, Perl will store each key (each artist's last name) in the $_ variable.

Line 50 is our core test: We use a pattern-match here with the input and the current key to see if a match was made. And, because we're only interested in the first match for this example, we'll also keep track of a $matched variable to see if we've already found a match.

Assuming a match was indeed found, we move to line 52. Here we'll create a temporary variable to hold the reference to the artist's data record—as in the stats example, not entirely necessary, but it makes it easier to manage references this way. In this case, because $_ holds the matched key, we can use a simple hash lookup to get the reference.

With the reference in hand, we can dereference it to gain access to the contents of the hash. In line 53 we print the basic data: the last name ($_), the first name (the value of the key “FN” in the hash), the date of birth (“BD”), and date of death (“DD”).

Lines 54 through 57 print each of the artist's works on a separate line. The only odd part of these lines is the dereference in the foreach loop. Let's look at that one in detail:

@{$ref->{works} }

Remember that what we have in $ref is a reference to a hash. The expression $ref->{works} dereferences that reference, and returns the value indicated by the key “works.” But that value is also a reference, this time a reference to an array. To dereference that reference, and end up with an actual array for the foreach loop to iterate over, you need the block syntax for dereferencing: @{}.
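The steps in that expression can be pulled apart in a short sketch. The record here is a cut-down stand-in for one artist, not the full data from the file.

```perl
use strict;
use warnings;

# A cut-down stand-in for one artist's record.
my $ref = {
    FN    => 'Claude',
    works => [ 'Field of Poppies', 'Water Lillies' ],
};

# $ref->{works} is itself a reference, this time to an array.
my $works_ref = $ref->{works};
my $kind      = ref($works_ref);

# Wrapping that expression in @{ } yields the array it points to.
my @titles = @{ $ref->{works} };

# A single element can be reached by chaining another arrow.
my $first = $ref->{works}->[0];

print "works holds a reference of type $kind\n";
print scalar(@titles), " titles, first is $first\n";
```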

Figuring out references and how to get at the actual data you want can be a complex process. It helps to start from the outer data structure and work inward, using blocks where necessary and temporary variables where it's helpful. Examining different referencing expressions in the Perl debugger or with print statements can also go a long way toward helping create the right dereferences.
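One convenient way to do the print-statement kind of inspection is the standard Data::Dumper module. The text above doesn't name it; it is offered here as one option. This sketch dumps a small hand-built record, then just one nested piece of it.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order

# A small hand-built record, standing in for data read from the file.
my %artists = (
    Monet => {
        FN    => 'Claude',
        BD    => 1840,
        DD    => 1926,
        works => [ 'Field of Poppies' ],
    },
);

# Dump the whole structure to see how the levels nest.
print Dumper(\%artists);

# Dump one nested piece to check that a dereference reaches it.
my $dump = Dumper($artists{Monet}{works});
print $dump;
```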
