The Whys and Wherefores of Pattern Matching

Pattern matching is the technique of searching a string of text or binary data for some set of characters based on a specific search pattern. When you search for a string of characters in a file using the Find command in your word processor, or when you use a search engine to look for something on the Web, you're using a simple version of pattern matching: your criterion is “find these characters.” In those environments, you can often customize your criteria in particular ways: to search for this or that, to search for this or that but not the other thing, to search for whole words only, or to search only for words that are 12 points and underlined. As you've seen from the regular expressions I've already explained, however, pattern matching in Perl can be far more sophisticated than that. Using Perl, you can define an incredibly specific set of search criteria, and do it in an incredibly small amount of space, using a pattern-definition minilanguage called regular expressions.
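To make those everyday search criteria concrete, here is a minimal sketch of how each one might look as a Perl pattern. The sample sentence and the variable names are invented for illustration; they don't come from any particular program.

```perl
use strict;
use warnings;

# Hypothetical sample text, made up for this sketch.
my $text = "The cat sat near the catalog; no dog was present.";

# "Find these characters": a plain literal search.
my $literal = $text =~ /cat/ ? 1 : 0;

# "This or that": alternation with |.
my $either = $text =~ /dog|bird/ ? 1 : 0;

# "This but not the other thing": combine a match with a negated match.
my $cat_not_bird = ($text =~ /cat/ && $text !~ /bird/) ? 1 : 0;

# "Whole words only": \b marks a word boundary, so "catalog" is skipped.
my $whole_words = () = $text =~ /\bcat\b/g;

# Without the boundaries, both "cat" and "catalog" are counted.
my $any_cat = () = $text =~ /cat/g;

print "literal: $literal, either: $either, cat but no bird: $cat_not_bird\n";
print "whole-word matches: $whole_words, all matches: $any_cat\n";
```

The `() = ...` idiom forces the match into list context so that assigning the result to a scalar yields the number of matches rather than a simple true or false.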

Perl's regular expressions, often called just regexps or REs, borrow from the regular expressions used in many Unix tools, such as grep(1) and sed(1). As with many other features Perl has borrowed from other places, however, Perl includes slight changes and lots of added capabilities. If you've used regular expressions before, you'll be able to pick up Perl's fairly easily because most of the same rules apply (although there are some gotchas to be aware of, particularly if you've used sophisticated regular expressions in the past).

Note

The term regular expressions might seem sort of nonsensical. They don't really seem to be expressions, nor is it easy to figure out what's regular about them. Don't get hung up on the term itself; regular expression is a term borrowed from mathematics that refers to the actual language with which you write patterns for pattern matching in Perl.


I used the example of the search engine and the Find command earlier to describe the sorts of things that pattern matching can do. Don't let that example convince you that pattern matching is only good for plain old searching. The sorts of things regular expressions can do in Perl include:

  • Making sure your user has entered the data you're looking for—input validation.

  • Verifying that input is in the right specific format, for example, that e-mail addresses have the right components.

  • Extracting parts of a file that match specific criteria (for example, you could extract the headings from a file to build a table of contents, or extract all the links in an HTML file).

  • Splitting a string into elements based on different separator fields (and often, complex nested separator fields).

  • Finding irregularities in a set of data—multiple spaces that don't belong there, duplicated words, errors in formatting.

  • Counting the number of occurrences of a pattern in a string.

  • Searching and replacing—find a string that matches a pattern and replace it with some other string.

This is only a partial list, of course—you can apply Perl's regular expressions to all kinds of tasks. Generally, if there's a task for which you'd want to iterate over a string or over your data in another language, that task is probably better solved in Perl using regular expressions. Many of the operations you learned about yesterday for finding bits of strings can be better done with patterns.
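As a rough illustration of several items in the list above, the following sketch validates, extracts, splits, counts, and replaces using one made-up line of data. The sample record, its field layout, and the deliberately loose e-mail pattern are all assumptions made for this example; a real address validator would need to be much stricter.

```perl
use strict;
use warnings;

# Hypothetical record; the double space after "Ada" is a deliberate irregularity.
my $line = "Name: Ada  Lovelace; email: ada\@example.com; tags: math,poetry, engines";

# Validation and extraction: capture something shaped like an e-mail address.
# This loose pattern is for illustration only.
my ($email) = $line =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/;
my $has_email = defined $email ? 1 : 0;

# Splitting: pull out the tag list, then split on commas with optional spaces.
my ($tag_text) = $line =~ /tags:\s*(.*)$/;
my @tags = split /\s*,\s*/, $tag_text;

# Finding irregularities: count runs of two or more spaces.
my $space_runs = () = $line =~ / {2,}/g;

# Counting occurrences: how many semicolon separators are there?
my $separators = () = $line =~ /;/g;

# Search and replace: collapse each run of spaces into a single space,
# working on a copy so the original line is left untouched.
(my $clean = $line) =~ s/ {2,}/ /g;

print "email found: $has_email ($email)\n";
print "tags: @tags\n";
print "space runs: $space_runs, separators: $separators\n";
print "cleaned: $clean\n";
```

Each task takes a line or two of pattern code here; doing the same work by stepping through the string one character at a time would take considerably more.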
