We provided a peek at ANSI SQL pattern matching earlier with the query:
SELECT name FROM people WHERE name LIKE 'Stac%'
Using the LIKE
operator, we compared a column
value (name
) to an incomplete literal
('Stac%'
). MySQL supports the ability to place
special characters into string literals that match like wild cards.
The %
character, for example, matches any
arbitrary number of characters, including no character at all. The
above SELECT
statement would therefore match
Stacey
, Stacie
,
Stacy
, and even Stac
. The
character _
matches any single character.
Stac_y
would match only Stacey
.
Stac__
would match Stacie
and
Stacey
, but not Stacy
or
Stac
.
Pattern-matching expressions should never be used with the basic
comparison operators. Instead, they require the
LIKE
and NOT
LIKE
operators. It is also important to
remember that these comparisons are case-insensitive except on binary
columns.
MySQL supports a non-ANSI kind of pattern matching that is actually
much more powerful using the same kind of expressions to which Perl
programmers and grep users are accustomed. MySQL
refers to these as extended regular expressions.
Instead of LIKE
and NOT LIKE
,
these operators must be used with the
REGEXP
and NOT REGEXP
operators. MySQL provides synonyms for these:
RLIKE
and NOT
RLIKE
. Table 3-7 contains a
list of the supported extended regular expression patterns.
Pattern |
Description |
Examples |
Matches any single character. |
| |
Matches any character in the brackets. You can also match a range of characters. |
[a-zA-Z] matches values containing one instance of any character in the English (unaccented) portion of the Roman alphabet. | |
Matches zero or more instances of the character that precedes it. |
Los .*es matches values containing the strings “Los " and “es” with anything in between. [0-9]* matches values containing any arbitrary number. | |
What follows must come at the beginning of the value. |
| |
What precedes it must end the value. |
|
You should note a couple of important facts about extended regular expressions. Unlike basic pattern matching, MySQL extended regular expressions are case sensitive. They also do not require a match for the entire string. The pattern simply needs to occur somewhere within the value. Consider the following example:
mysql> SELECT * FROM BOOK;
+---------+-----------------------------------------+---------------+
| BOOK_ID | TITLE | AUTHOR |
+---------+-----------------------------------------+---------------+
| 1 | Database Programming with JDBC and Java | George Reese |
| 2 | JavaServer Pages | Hans Bergsten |
| 3 | Java Distributed Computing | Jim Farley |
+---------+-----------------------------------------+---------------+
3 rows in set (0.01 sec)
In this table, we have three books from
O’Reilly’s Java series. The
interesting thing about the Java series is that all books begin with
or end with the word “Java.” The
first sample query checks for any titles LIKE
'Java'
:
mysql> SELECT TITLE FROM BOOK WHERE TITLE LIKE 'Java';
Empty set (0.01 sec)
Because LIKE
looks for an exact match of the
pattern specified, no rows match—none of the titles are exactly
'Java'
. To find out which books start with the
word Java using simple patterns, we need to add a
%
:
mysql> SELECT TITLE FROM BOOK WHERE TITLE LIKE 'Java%';
+----------------------------+
| TITLE |
+----------------------------+
| JavaServer Pages |
| Java Distributed Computing |
+----------------------------+
2 rows in set (0.00 sec)
This query had two matches because only two of the books had titles
that matched Java%
exactly. The extended regular
expression matches, however, are not exact matches. They simply look
for the expression anywhere within the compared value:
mysql> SELECT TITLE FROM BOOK WHERE TITLE REGEXP 'Java';
+-----------------------------------------+
| TITLE |
+-----------------------------------------+
| Database Programming with JDBC and Java |
| JavaServer Pages |
| Java Distributed Computing |
+-----------------------------------------+
3 rows in set (0.06 sec)
By simply changing the operator from LIKE
to
REGEXP
, we changed how it matches things.
Java
appears somewhere in each of the titles, so
the query returns all the titles. To find only the titles that start
with the word Java using extended regular expressions, we need to
specify that we are interested in the start:
mysql> SELECT TITLE FROM BOOK WHERE TITLE REGEXP '^Java';
+----------------------------+
| TITLE |
+----------------------------+
| JavaServer Pages |
| Java Distributed Computing |
+----------------------------+
2 rows in set (0.01 sec)
The same thing applies to finding titles with Java at the end:
mysql> SELECT TITLE FROM BOOK WHERE TITLE REGEXP 'Java$';
+-----------------------------------------+
| TITLE |
+-----------------------------------------+
| Database Programming with JDBC and Java |
+-----------------------------------------+
1 row in set (0.00 sec)
The extended regular expression syntax is definitely much more complex than the simple pattern matching of ANSI SQL. In addition to the burden of extra complexity, you should also consider the fact that MySQL extended regular expressions do not work in most other databases. When you need complex pattern matching, however, they provide you with power that is simply unsupportable by simple pattern matching.