Pattern Matching

We provided a peek at ANSI SQL pattern matching earlier with the query:

SELECT name FROM people WHERE name LIKE 'Stac%'

Using the LIKE operator, we compared a column value (name) to an incomplete literal ('Stac%'). MySQL supports the ability to place special characters into string literals that match like wild cards. The % character, for example, matches any arbitrary number of characters, including no character at all. The above SELECT statement would therefore match Stacey, Stacie, Stacy, and even Stac. The character _ matches any single character. Stac_y would match only Stacey. Stac__ would match Stacie and Stacey, but not Stacy or Stac.

Pattern-matching expressions should never be used with the basic comparison operators. Instead, they require the LIKE and NOT LIKE operators. It is also important to remember that these comparisons are case-insensitive except on binary columns.

MySQL supports a non-ANSI kind of pattern matching that is actually much more powerful using the same kind of expressions to which Perl programmers and grep users are accustomed. MySQL refers to these as extended regular expressions. Instead of LIKE and NOT LIKE, these operators must be used with the REGEXP and NOT REGEXP operators. MySQL provides synonyms for these: RLIKE and NOT RLIKE. Table 3-7 contains a list of the supported extended regular expression patterns.

Table 3-7. MySQL extended regular expressions

Pattern

Description

Examples

.

Matches any single character.

Stac.. matches any value containing the characters “Stac” followed by two characters of any value.

[]

Matches any character in the brackets. You can also match a range of characters.

[Ss]tacey matches values containing both “Stacey” and “stacey.”

[a-zA-Z] matches values containing one instance of any character in the English (unaccented) portion of the Roman alphabet.

*

Matches zero or more instances of the character that precedes it.

Ap*le matches values containing “Aple,” “Apple,” “Appple,” etc.

Los .*es matches values containing the strings “Los " and “es” with anything in between.

[0-9]* matches values containing any arbitrary number.

^

What follows must come at the beginning of the value.

^Stacey matches values that start with “Stacey.”

$

What precedes it must end the value.

cheese$ matches any value ending in the string “cheese.”

You should note a couple of important facts about extended regular expressions. Unlike basic pattern matching, MySQL extended regular expressions are case sensitive. They also do not require a match for the entire string. The pattern simply needs to occur somewhere within the value. Consider the following example:

mysql> SELECT * FROM BOOK;
+---------+-----------------------------------------+---------------+
| BOOK_ID | TITLE                                   | AUTHOR        |
+---------+-----------------------------------------+---------------+
|       1 | Database Programming with JDBC and Java | George Reese  |
|       2 | JavaServer Pages                        | Hans Bergsten |
|       3 | Java Distributed Computing              | Jim Farley    |
+---------+-----------------------------------------+---------------+
3 rows in set (0.01 sec)

In this table, we have three books from O’Reilly’s Java series. The interesting thing about the Java series is that all books begin with or end with the word “Java.” The first sample query checks for any titles LIKE 'Java':

mysql> SELECT TITLE FROM BOOK WHERE TITLE LIKE 'Java'; 
Empty set (0.01 sec)

Because LIKE looks for an exact match of the pattern specified, no rows match—none of the titles are exactly 'Java'. To find out which books start with the word Java using simple patterns, we need to add a %:

mysql> SELECT TITLE FROM BOOK WHERE TITLE LIKE 'Java%';
+----------------------------+
| TITLE                      |
+----------------------------+
| JavaServer Pages           |
| Java Distributed Computing |
+----------------------------+
2 rows in set (0.00 sec)

This query had two matches because only two of the books had titles that matched Java% exactly. The extended regular expression matches, however, are not exact matches. They simply look for the expression anywhere within the compared value:

mysql> SELECT TITLE FROM BOOK WHERE TITLE REGEXP 'Java';
+-----------------------------------------+
| TITLE                                   |
+-----------------------------------------+
| Database Programming with JDBC and Java |
| JavaServer Pages                        |
| Java Distributed Computing              |
+-----------------------------------------+
3 rows in set (0.06 sec)

By simply changing the operator from LIKE to REGEXP, we changed how it matches things. Java appears somewhere in each of the titles, so the query returns all the titles. To find only the titles that start with the word Java using extended regular expressions, we need to specify that we are interested in the start:

mysql> SELECT TITLE FROM BOOK WHERE TITLE REGEXP '^Java';
+----------------------------+
| TITLE                      |
+----------------------------+
| JavaServer Pages           |
| Java Distributed Computing |
+----------------------------+
2 rows in set (0.01 sec)

The same thing applies to finding titles with Java at the end:

mysql> SELECT TITLE FROM BOOK WHERE TITLE REGEXP 'Java$';
+-----------------------------------------+
| TITLE                                   |
+-----------------------------------------+
| Database Programming with JDBC and Java |
+-----------------------------------------+
1 row in set (0.00 sec)

The extended regular expression syntax is definitely much more complex than the simple pattern matching of ANSI SQL. In addition to the burden of extra complexity, you should also consider the fact that MySQL extended regular expressions do not work in most other databases. When you need complex pattern matching, however, they provide you with power that is simply unsupportable by simple pattern matching.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset