5.4. Find All Except a Specific Word

Problem

You want to use a regular expression to match any complete word except cat. Catwoman, vindicate, and other words that merely contain the letters “cat” should be matched—just not cat.

Solution

A negative lookahead can help you rule out specific words, and is key to this next regex:

\b(?!cat\b)\w+

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

Although a negated character class (written as ‹[^⋯]›) makes it easy to match anything except a specific character, you can’t just write ‹[^cat]› to match anything except the word cat. ‹[^cat]› is a valid regex, but it matches any single character except c, a, or t. Hence, although ‹\b[^cat]+\b› would avoid matching the word cat, it wouldn’t match the word time either, because it contains the forbidden letter t. The regular expression ‹\b[^c][^a][^t]\w*› is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that regex doesn’t restrict the first three letters to word characters, and it only matches words with at least three characters, since none of the negated character classes are optional.
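The failure of both negated-class attempts is easy to check. The following is a minimal Python sketch, not part of the recipe itself; the word list and variable names are illustrative assumptions. Each candidate is tested against a whole word with `fullmatch`, so no word boundaries are needed in the patterns.

```python
import re

# Illustrative sample words (an assumption, not from the recipe).
words = ["cat", "time", "cup", "bat", "at", "dog"]

# Rejects any word containing the letter c, a, or t anywhere.
negated_class = re.compile(r"[^cat]+", re.IGNORECASE)

# Rejects words with c first, a second, or t third, and needs at least 3 characters.
per_position = re.compile(r"[^c][^a][^t]\w*", re.IGNORECASE)

accepted_by_class = [w for w in words if negated_class.fullmatch(w)]
accepted_by_position = [w for w in words if per_position.fullmatch(w)]

print(accepted_by_class)     # ['dog']
print(accepted_by_position)  # ['time', 'dog']
```

Neither pattern accepts cup, bat, or at, even though none of them is the word cat.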

With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:

\b     # Assert position at a word boundary.
(?!    # Not followed by:
  cat  #   Match "cat".
  \b   #   Assert position at a word boundary.
)      # End the negative lookahead.
\w+    # Match one or more word characters.

Regex options: Free-spacing, case insensitive

Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

The key to this pattern is its negative lookahead, ‹(?!⋯)›. The negative lookahead disallows the sequence cat followed by a word boundary, without preventing the use of those letters when they do not appear in that exact sequence, or when they appear as part of a longer or shorter word. There’s no word boundary at the very end of the regular expression, because it wouldn’t change what the regex matches. The ‹+› quantifier in ‹\w+› repeats the word character token as many times as possible, which means that it will always match until the next word boundary.

When applied to the subject string categorically match any word except cat, the regex will find five matches: categorically, match, any, word, and except.
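As a quick check of that result, here is a small Python sketch using the standard `re` module. The variable names are illustrative, and the second subject string is an assumption built from the words mentioned in the problem statement.

```python
import re

# Word boundary, then a negative lookahead that rules out "cat" as a complete word.
pattern = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

subject = "categorically match any word except cat"
matches = pattern.findall(subject)
print(matches)  # ['categorically', 'match', 'any', 'word', 'except']

# Words that merely contain "cat" still match; the lone word does not,
# regardless of case.
print(pattern.findall("Catwoman vindicate CAT"))  # ['Catwoman', 'vindicate']
```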

Variations

Find words that don’t contain another word

If, instead of trying to match any word that is not cat, you are trying to match any word that does not contain cat, a slightly different approach is needed:

\b(?:(?!cat)\w)+\b

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

In the earlier section of this recipe, the word boundary at the beginning of the regular expression provided a convenient anchor that allowed us to simply place the negative lookahead at the beginning of the word. The solution used here is not as efficient, but it’s nevertheless a commonly used construct that allows you to match something other than a particular word or pattern. It does this by repeating a group containing a negative lookahead and a single word character. Before matching each character, the regex engine makes sure that the word cat cannot be matched starting at the current position.

Unlike the previous regular expression, this one requires a terminating word boundary. Otherwise, it could match just the first part of a word, up to where cat appears within it.

When applied to the subject string categorically match any word except cat, the regex will find four matches: match, any, word, and except.
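The effect of the terminating word boundary can be seen in a short Python sketch. The names `pattern` and `unanchored` are illustrative, and the test word vindicate is borrowed from the problem statement.

```python
import re

# Each word character is consumed only if "cat" does not start at that position.
pattern = re.compile(r"\b(?:(?!cat)\w)+\b", re.IGNORECASE)

subject = "categorically match any word except cat"
matches = pattern.findall(subject)
print(matches)  # ['match', 'any', 'word', 'except']

# Without the trailing \b, the regex stops just before "cat" and
# returns only the first part of the word.
unanchored = re.compile(r"\b(?:(?!cat)\w)+", re.IGNORECASE)
print(unanchored.findall("vindicate"))  # ['vindi']
print(pattern.findall("vindicate"))     # []
```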
