A Searchable Address Book (address.pl)

Our first script today consists of two parts:

  • A simple address book file, containing names, addresses, and phone numbers

  • A Perl script that prompts you for things to search for, and then prints out any matching addresses

This script makes use of just about everything you've learned so far this week: scalar and hash data, conditionals, loops, input and output, subroutines, local variables, and pattern matching. There's even a function call here and there to make things interesting. And so, without further ado, let's dive in.

How It Works

The address.pl script is called with a single argument: the address file, called address.txt. Call it on the command line as you have other Perl scripts:

% address.pl address.txt
						

The first thing the address book script does is prompt you for what you want to search:

Search for what? Johnson
						

The search pattern you give to address.pl can be in several different forms:

  • Single words, such as the Johnson in the preceding example.

  • Multiple words (John Maggie Alice). Any addresses that match any of those words will be printed (equivalent to an OR search).

  • Multiple words separated by AND or OR (in upper or lowercase). Boolean searches behave as logical operators do in Perl, and are tested left to right. (Note that AND searches only make sense when matched inside a single address; they will not match across multiple addresses the way OR searches will.)

  • Multiple words surrounded by quotes ("this that") are treated as a single search pattern. Spaces are relevant in this case.

  • Pattern-matching characters are accepted and processed as regular expressions (don't include the // around the patterns).

So, for example, in my sample address.txt file, the search for the word Johnson returned this output:

*********************
Paul Johnson
212 345 9492
234 33rd St Apt 12C, NY, NY 10023
http://www.foo.org/users/don/paul.html
*********************
Alice Johnson
(502) 348 2387
(502) 348 2341
*********************
Mary Johnson
(408) 342 0999
(408) 323 2342
[email protected]
http://www.mjproductions.com
*********************
						

Note

In generating this sample address file, I made up all names, addresses, phone numbers, and Web pages. Any similarity between this data and any persons living or dead is coincidental.


The Address File

The core of the address book (and something you'll have to generate yourself if you want to use this script) is a file of addresses in a specific format that Perl can understand. You could consider this a simple text database, and write Perl scripts to add and delete records (addresses) to or from that database.

The format of the address book file looks like this, generically:

Name: Name
Phone: number
Fax: number
Address: address
Email: email address
URL: Web URL
---

For example:

Name: Paul Johnson
Phone: 212 345 9492
Address: 234 33rd St Apt 12C, NY, NY 10023
URL: http://www.foo.org/users/don/paul.html
---

Each record has a list of fields (Name, Phone, Fax, Address, Email, and URL, although not all of them are required), and ends with three dashes. The field names (Name, Phone, URL, and so on) are separated from their values with a colon and a space. The values do not have to be in any specific format. You can include other field names in the database, and search keys will search those extra fields, but those fields will be ignored in the final printed output. (If you had to have an extra field, say, for a pager number, you could always modify the script. Perl is easy that way.)
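To see how a single line in this format breaks into a field name and a value, here is a minimal sketch. The helper name parse_field and the "Bldg: 7" address are made up for illustration; only the split on a colon and a space, limited to two pieces, comes from the script itself.

```perl
use strict;
use warnings;

# Split one "Field: value" line into its field name and value.
# parse_field is a hypothetical helper name, not part of address.pl.
sub parse_field {
    my ($line) = @_;
    # The limit of 2 keeps any later ": " inside the value intact.
    my ($key, $val) = split(/: /, $line, 2);
    return ($key, $val);
}

# Made-up sample line whose value itself contains a colon and a space.
my ($key, $val) = parse_field('Address: Bldg: 7, 234 33rd St, NY, NY 10023');
print "$key => $val\n";
```

Without the limit, the value would be cut off at the second colon, so the limit matters for any field whose text can contain ": ".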

You can have as many addresses in the address.txt file as you like; the larger the address book, the longer it will take to find matching records, as each address is checked from start to finish in turn. Unless you have four or five million friends, you probably won't notice Perl working very hard.

Inside the Script

The address.pl script reads in the address.txt file, one address at a time, and then processes the search pattern for each of those addresses. The topmost part of the script is a while loop that does just this, which in turn works through five other subroutines to handle more complex parts of the script.

Let's start with this topmost part of the script. At this top level, we define three global variables:

  • %rec, which will hold the current address record, indexed by field name.

  • $search, the search pattern you enter at the prompt.

  • $bigmatch, whether a record was found anywhere in the address file that matched the search pattern (there are also local variables for whether the current record matches, but we'll get to those soon enough).

Step one in this outer part of the script is to prompt for the search pattern and store it in $search:

$search = &getpattern();        # prompt for pattern

The &getpattern() subroutine is the basic “read the input/chomp it/return the result” code you've seen all too often in this book so far:

sub getpattern {
    my $in = '';  # input
    print 'Search for what? ';
    chomp($in = <STDIN>);
    return $in;
}

Step two in the outer script is an infinite while loop that reads in a record, processes the search pattern, and prints it if it matches. That while loop looks like this:

while () {                      # range over address file
    my %rec = &read_addr();
    if (%rec) {         # got a record
        &perform_search($search, %rec);
    } else {            # end of address file, finish up
        if (!$bigmatch) {
            print "Nothing found.\n";
        } else { print "*********************\n"; }
        last;           # exit, we're done
    }
}

Inside the while loop, we call &read_addr() to read in a record, and if a record was found, we search for it by calling &perform_search(). At the end of the address file, if the $bigmatch variable is 0, that means no matches were found, and we can print a helpful message. At any rate, at the end of the address file, we call last to exit from the loop and finish the script.

Reading the Address

The &read_addr() subroutine is used to read in an address record. Listing 14.1 shows the contents of &read_addr().

Listing 14.1. The &read_addr() Subroutine
sub read_addr {
    my %curr = ();              # current record
    my $key = '';               # temp key
    my $val = '';               # temp value

    while (<>) {                # stop if we get to EOF
        chomp;
        if (!/^---/) {      # record separator
            ($key, $val) = split(/: /,$_,2);
            $curr{$key} = $val;
        }
        else { last; }
    }
    return %curr;

}

In past examples of using a while loop with <>, we've read in and processed the entire file at once. This while loop is a little different; this one reads chunks of the file, and stops when it reaches a record separator (in this case, the string '---'). I use a regular expression that matches any lines that don't begin with '---'. The next time the &read_addr() subroutine is called, the while loop picks up where it left off in the address file. Perl has no problem with this stopping and restarting the input, and it makes it particularly convenient for reading and processing sections of a file as we have here.

That said, what this subroutine does is read in a line. If the line does not begin with '---', then it's the inside of a record, and that line gets split into the field name (Name, Phone, and so on) and the value. The call to the split function in line 9 is where this takes place; note that the 2 argument at the end of split means we'll only end up with two things overall. With the field name ($key) and the value ($val), you can start building up the hash for this address.

If the line that gets read is the end-of-record marker, then the if statement in line 8 drops to the else part in line 12, and the last command exits the loop. The result of this subroutine is a hash containing all the lines in the address, indexed by the field name.
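The stop-and-resume behavior is easier to see in isolation. The following is a minimal sketch, not the script's own code: read_record is a hypothetical variant of &read_addr() that takes an explicit filehandle, and the data is read from an in-memory string (an assumption made so the sketch runs without an address.txt file).

```perl
use strict;
use warnings;

# Sample data in the address-file format (made up for this sketch).
my $data = <<'END';
Name: Paul Johnson
Phone: 212 345 9492
---
Name: Alice Johnson
Phone: (502) 348 2387
---
END

# Open the string as if it were a file, so no address.txt is needed.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# Like read_addr, but reads from an explicit filehandle instead of <>.
sub read_record {
    my ($in) = @_;
    my %curr;
    while (my $line = <$in>) {
        chomp $line;
        last if $line =~ /^---/;          # record separator: stop here
        my ($key, $val) = split(/: /, $line, 2);
        $curr{$key} = $val;
    }
    return %curr;
}

# Each call resumes where the previous one stopped.
my %first  = read_record($fh);
my %second = read_record($fh);
my %third  = read_record($fh);            # empty: end of data

print "First:  $first{Name}\n";
print "Second: $second{Name}\n";
print "Third is empty\n" unless %third;
```

The third call returns an empty hash, which is the same condition the outer loop uses to decide that the address file is finished.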

Performing the Search

At this point in the script's execution, you have a search pattern, stored in the $search variable, and an address stored in the %rec variable. The next step is to move to the next part of our big while loop at the top of the script, where if %rec is defined (an address exists), then we call the &perform_search() subroutine to actually see if the pattern in $search can be matched to the address in %rec.

The &perform_search() subroutine is shown in Listing 14.2.

Listing 14.2. The &perform_search() subroutine
sub perform_search {
    my ($str, %rec) = @_;
    my $matched = 0;            # overall match
    my $i = 0;                  # position inside pattern
    my $thing = '';             # temporary word

    my @things = $str =~ /("[^"]+"|\S+)/g;  # split into search items

    while ($i <= $#things) {
        $thing = $things[$i];   # search item, AND or OR
        if ($thing =~ /^or$/i) { # OR case
            if (!$matched) {    # no match yet, look at next thing
                $matched = &isitthere($things[$i+1], %rec);
            }
            $i += 2;            # skip OR and next thing
        }
        elsif ($thing =~ /^and$/i) { # AND case
            if ($matched) {     # got a match, need to check other side
                $matched = &isitthere($things[$i+1], %rec);
            }
            $i += 2;            # skip AND and next thing
        }
        elsif (!$matched) {     # no match yet
            $matched = &isitthere($thing, %rec);
            $i++;               # next!
        }
        else { $i++; }         # $matched is true, move on to next thing
    }

    if ($matched) {             # all keys done, did we match?
        $bigmatch = 1;  # yes, we found something
        print_addr(%rec);      # print the record then
    }
}

That's one large subroutine, and quite complex, but it's not as awful as it looks. Starting from the top, then, this subroutine takes two arguments: the search pattern and the address hash. Note that because those values are stored in global variables, it's not necessary to pass these along into the subroutine via arguments; we could have referred to those global variables in the body of this subroutine. This strategy, however, makes the subroutine more self-contained in that the only data it deals with is that which it gets explicitly. You could, for example, copy and paste this subroutine into another search script without having to worry about renaming any variables.

The first real operation we do in this subroutine is in line 7, where we split the search pattern into its component parts. Remember that the search pattern can appear in many different ways, including quoted strings, ANDs and ORs, or just a list of keywords. Line 7 extracts each element from the search pattern and stores all the search “things” in the array @things (note that the regular expression has the g option at the end, and is evaluated in a list context, meaning the @things list will contain all the possible matches captured by the parentheses). What does that particular pattern match? There are two groups of patterns, separated by alternation (|). The first is this one:

"[^"]+"

Which, if you remember your patterns, is a double-quote, followed by one or more characters that are not a double-quote, followed by another closing quote. This pattern will match quoted strings in the search pattern such as "John Smith" or "San Francisco" and treat them as a single search element. This could also be written as ".+?", the question mark indicating that this is a nongreedy expression (thereby stopping at the first " it finds, rather than the final one).

The second part of the pattern is simply one or more characters that are not whitespace (\S+). This part of the pattern matches any single words, such as AND or OR, or single keywords. Between these two patterns, a long complex pattern such as "San Jose" OR "San Francisco" AND John will break into the list ("San Jose", OR, "San Francisco", AND, John).
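You can try the pattern on its own to see the list it produces. This is a minimal sketch; tokenize is a hypothetical wrapper name, and the pattern inside it is the one from &perform_search().

```perl
use strict;
use warnings;

# Break a search string into quoted phrases and single words.
# tokenize is a hypothetical wrapper around the pattern in perform_search.
sub tokenize {
    my ($str) = @_;
    # With /g in list context, the match returns every captured item.
    return $str =~ /("[^"]+"|\S+)/g;
}

my @things = tokenize('"San Jose" OR "San Francisco" AND John');
print join(' | ', @things), "\n";
```

Note that because the quotes sit inside the capturing parentheses, each quoted item in the resulting list still carries its surrounding quote characters.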

With all our search objects in a list, the hard part is to work through that list, search the address when necessary, and deal with the logical expressions. This all takes place in the big while loop that starts in line 9, which keeps a placeholder variable $i for the current position in the pattern, and loops over the pattern until the end. Throughout the while loop, the $matched variable keeps track of whether any particular part of the pattern has matched the record. We start with a 0—false—for no match yet.

Inside the while loop, we start in line 10 by setting the variable $thing to the current part of the pattern we're examining, just as a shorthand. Then, there are four major tests:

  • If the current thing is an OR, then we're in the middle of two tests, one of which has already occurred and returned either true or false depending on the value of $matched. If $matched was true, then the thing on the left side was a match, and there's no point to actually trying the thing on the right (yes, it's a short-circuiting OR). If the thing on the left didn't match, the $matched variable will be 0, and we have to test the thing on the right. That's what line 13 does; it calls the &isitthere() subroutine to actually search for a search pattern, giving it an argument of the right side of the OR (the next thing in the @things array) and the record itself (%rec).

    Whether there was a match or not, this test handles both the OR itself and the pattern on the right of the OR, so we can skip two elements forward in the @things array. Line 15 increments the $i counter to do just that.

  • If the current thing is an AND, we trigger the test in line 17. This section operates in much the same way as the OR did, with one exception; it short-circuits in the other way. Remember, given a test x AND y, if x is false then the entire expression is false. If x is true, you still have to test to see if y is also true. That's how this test works; if the $matched variable is true, then the left side of the AND was true, and we call &isitthere() to test the right side. Otherwise, we do nothing, and in either case we just skip the AND and the right side of the AND ($i+=2, line 21) and move on.

  • At line 23, we've covered ANDs and ORs, so the thing we're looking at must be an actual search pattern. It could be a single search pattern, or it could be one of many search patterns contained in a string. Doesn't matter; each one can be treated individually. But we only need to actually search for it if we haven't already found a match (remember, multiple searches are treated as OR tests, so if one matches we're all set for the others). So, in line 23, if we haven't found a match, we search for that thing using &isitthere().

  • The final case covers a $thing that's an actual search key, and $matched is true. We don't need to actually do anything here because the match has already been made. So, increment the position in the pattern by one and restart the loop.

If you can follow all of that, you've made it through the hardest part of this script, by far. If it's still confusing you, try working through various search patterns, with single search elements, elements separated with ANDs and ORs, and patterns with multiple search keys. Watch the values of $i and $matched as the loop progresses (when you learn how to use the Perl debugger, this will be easy, but you can do it on paper by hand as well).
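If you would rather let Perl do the tracing, the sketch below repeats the same left-to-right logic and prints $i and $matched after each step. It is a simplified illustration, not the script's code: evaluate and matches_record are hypothetical names, the record is made up, and the result is returned rather than used to print the address.

```perl
use strict;
use warnings;

# Hypothetical stand-in for isitthere: true if any value in the record
# matches the pattern.
sub matches_record {
    my ($pat, %rec) = @_;
    foreach my $val (values %rec) {
        return 1 if $val =~ /$pat/;
    }
    return 0;
}

# Same left-to-right logic as perform_search, but it returns the result
# and prints a trace line after each step.
sub evaluate {
    my ($things, %rec) = @_;
    my ($matched, $i) = (0, 0);
    while ($i <= $#$things) {
        my $thing = $things->[$i];
        if ($thing =~ /^or$/i) {            # OR: test right side only if no match yet
            $matched = matches_record($things->[$i+1], %rec) if !$matched;
            $i += 2;
        }
        elsif ($thing =~ /^and$/i) {        # AND: test right side only if matched so far
            $matched = matches_record($things->[$i+1], %rec) if $matched;
            $i += 2;
        }
        else {                              # plain search key
            $matched = matches_record($thing, %rec) if !$matched;
            $i++;
        }
        printf "after %-8s i=%d matched=%d\n", $thing, $i, $matched;
    }
    return $matched;
}

my %rec = (Name => 'Paul Johnson', Phone => '212 345 9492');
my $result = evaluate([qw(Alice OR Johnson AND 212)], %rec);
print "Result: $result\n";
```

Changing the list passed to evaluate lets you check other combinations, such as an AND whose left side fails.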

So what happens in the mysterious &isitthere() subroutine that gets called throughout that big while loop? That's where the actual searching takes place, given a pattern and the record. I'm not going to show you the contents of &isitthere() itself (you can see it in the full code printout in Listing 14.3), other than to note that it simply loops through the contents of the address hash and compares the pattern to each line using a regular expression. If it matches, the subroutine returns 1, and returns 0 if it doesn't match.
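Because the search key is dropped straight into a regular expression, pattern-matching characters in it are live. The sketch below illustrates that behavior on a couple of made-up values; grep_values is a hypothetical helper, not part of address.pl.

```perl
use strict;
use warnings;

my @values = ('Mary Johnson', '(408) 342 0999');

# Return the values that match a user-supplied pattern.
# grep_values is a hypothetical helper for this sketch.
sub grep_values {
    my ($pat, @vals) = @_;
    return grep { /$pat/ } @vals;
}

print join(', ', grep_values('^Mary', @values)), "\n";     # anchored at the start
print join(', ', grep_values('\(408\)', @values)), "\n";   # literal parentheses
print join(', ', grep_values('john', @values)), "\n";      # case matters: nothing found
```

The last search prints an empty line, since the comparison is case-sensitive as written.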

In the last part of the subroutine, all the parts of the pattern have been read, some amount of searching has taken place, and now we know whether the pattern matched the record or not. In lines 30 through 33, we test to see if a match was made, and if it was we set the $bigmatch variable (we found at least one address that matched), and call &print_addr() to print the actual address.

Printing the Record

It's all downhill from here. The last subroutine in the file is one that's only called if a match was made. The &print_addr() subroutine simply loops over the record hash and prints out the values to display the address record:

sub print_addr {
    my %record = @_;
    print "*********************\n";
    foreach my $key (qw(Name Phone Fax Address Email URL)) {
        if (defined($record{$key})) {
            print "$record{$key}\n";
        }
    }
}

The only interesting part of this subroutine is the list of keys in the foreach loop. I've listed the specific keys here in this order (and quoted them using the qw function) so that the output will print in a specific order. The keys in a hash are not stored in any reliable order, so I have to take measures like this one. It also lets us print only the lines that were actually available—the call to defined inside the foreach loop makes sure that only those fields that existed in the record get printed.
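As an illustration of how the key list controls the output, here is a minimal sketch that adds an extra field to the list. The Pager field, its number, and the name format_addr are assumptions made for this example; the sketch returns the lines instead of printing them directly so the order is easy to inspect.

```perl
use strict;
use warnings;

# Return the printable lines for a record in a fixed field order.
# format_addr and the Pager field are assumptions for this sketch.
sub format_addr {
    my %record = @_;
    my @lines;
    foreach my $key (qw(Name Phone Pager Fax Address Email URL)) {
        # Skip any field this record does not have.
        push @lines, $record{$key} if defined $record{$key};
    }
    return @lines;
}

# Fields deliberately given out of order; the output follows the qw list.
my %rec = (
    URL   => 'http://www.foo.org/users/don/paul.html',
    Pager => '212 555 0100',
    Name  => 'Paul Johnson',
);
print "$_\n" foreach format_addr(%rec);
```

A field that is not named in the list is never printed, which matches the way the script ignores extra fields in its output.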

The Code

Got it? No? Sometimes seeing all the code at once can help. Listing 14.3 shows the full code for address.pl. If you've downloaded the source from this book's Web site at http://www.typerl.com, the code there has many more comments to help you figure out what's going on.

Note

As I mentioned yesterday in the section on my variables, some versions of Perl might have difficulties with this script's use of my variables and foreach loops. To get around this problem, simply predeclare the foreach variable before using it, like this:

my $key = 0;
foreach $key (qw(Name Phone Fax Address Email URL)) { ...


Listing 14.3. The Code for address.pl
#!/usr/bin/perl -w
use strict;

my $bigmatch = 0;               # was anything found?
my $search = '';                # thing to search for

$search = &getpattern();        # prompt for pattern

while () {                      # range over address file
    my %rec = &read_addr();
    if (%rec) {         # got a record
        &perform_search($search, %rec);
    } else {            # end of address file, finish up
        if (!$bigmatch) {
            print "Nothing found.\n";
        } else { print "*********************\n"; }
        last;           # exit, we're done
    }
}

sub getpattern {
    my $in = '';  # input
    print 'Search for what? ';
    chomp($in = <STDIN>);
    return $in;
}

sub read_addr {
    my %curr = ();              # current record
    my $key = '';               # temp key
    my $val = '';               # temp value

    while (<>) {                # stop if we get to EOF
        chomp;
        if (!/^---/) {      # record separator
            ($key, $val) = split(/: /,$_,2);
            $curr{$key} = $val;
        }
        else { last; }
    }
    return %curr;
}

sub perform_search {
    my ($str, %rec) = @_;
    my $matched = 0;            # overall match
    my $i = 0;                  # position inside pattern
    my $thing = '';             # temporary word

    my @things = $str =~ /("[^"]+"|\S+)/g;  # split into search items

    while ($i <= $#things) {
        $thing = $things[$i];   # search item, AND or OR
        if ($thing =~ /^or$/i) { # OR case
            if (!$matched) {    # no match yet, look at next thing
                $matched = &isitthere($things[$i+1], %rec);
            }
            $i += 2;            # skip OR and next thing
        }
        elsif ($thing =~ /^and$/i) { # AND case
            if ($matched) {     # got a match, need to check other side
                $matched = &isitthere($things[$i+1], %rec);
            }
            $i += 2;            # skip AND and next thing
        }
        elsif (!$matched) {     # no match yet
            $matched = &isitthere($thing, %rec);
            $i++;               # next!
        }
        else { $i++; }          # $matched is true, move on to next thing
    }

    if ($matched) {             # all keys done, did we match?
        $bigmatch = 1;          # yes, we found something
        print_addr(%rec);       # print the record then
    }
}

sub isitthere {                 # simple test
    my ($pat, %rec) = @_;
    foreach my $line (values %rec) {
        if ($line =~ /$pat/) {
            return 1;
        }
    }
    return 0;
}

sub print_addr {
    my %record = @_;
    print "*********************\n";
    foreach my $key (qw(Name Phone Fax Address Email URL)) {
        if (defined($record{$key})) {
            print "$record{$key}\n";
        }
    }
}
						
