6.7. Numbers Within a Certain Range

Problem

You want to match an integer number within a certain range of numbers. You want the regular expression to specify the range accurately, rather than just limiting the number of digits.

Solution

1 to 12 (hour or month):

^(1[0-2]|[1-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1 to 24 (hour):

^(2[0-4]|1[0-9]|[1-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1 to 31 (day of the month):

^(3[01]|[12][0-9]|[1-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1 to 53 (week of the year):

^(5[0-3]|[1-4][0-9]|[1-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

0 to 59 (minute or second):

^[1-5]?[0-9]$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

0 to 100 (percentage):

^(100|[1-9]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1 to 100:

^(100|[1-9][0-9]?)$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

32 to 126 (printable ASCII codes):

^(12[0-6]|1[01][0-9]|[4-9][0-9]|3[2-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

0 to 127 (nonnegative signed byte):

^(12[0-7]|1[01][0-9]|[1-9]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

–128 to 127 (signed byte):

^(12[0-7]|1[01][0-9]|[1-9]?[0-9]|-(12[0-8]|1[01][0-9]|[1-9]?[0-9]))$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

0 to 255 (unsigned byte):

^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1 to 366 (day of the year):

^(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|[1-9][0-9]?)$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1900 to 2099 (year):

^(19|20)[0-9]{2}$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

0 to 32767 (nonnegative signed word):

^(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|↵
[1-9][0-9]{1,3}|[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

–32768 to 32767 (signed word):

^(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|↵
[1-9][0-9]{1,3}|[0-9]|-(3276[0-8]|327[0-5][0-9]|32[0-6][0-9]{2}|↵
3[01][0-9]{3}|[12][0-9]{4}|[1-9][0-9]{1,3}|[0-9]))$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

0 to 65535 (unsigned word):

^(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|↵
[1-9][0-9]{1,3}|[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
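As a rough illustration of how one of these solutions might be used to validate input, the following Python sketch checks a string against the 0 to 255 regex. The helper name `is_unsigned_byte` is made up for this example. It uses `re.fullmatch` with the unanchored pattern instead of ‹^…$›, because in Python ‹$› also matches just before a trailing newline.

```python
import re

# The 0 to 255 regex from the Solution, without the anchors.
UNSIGNED_BYTE = re.compile(r"25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]")

def is_unsigned_byte(text: str) -> bool:
    """Return True if text is exactly one integer from 0 to 255 (hypothetical helper)."""
    # fullmatch requires the whole string to match, like ^...$ but without
    # Python's special case of $ matching before a final newline.
    return UNSIGNED_BYTE.fullmatch(text) is not None

for candidate in ["0", "7", "99", "255", "256", "01", "-1", ""]:
    print(repr(candidate), is_unsigned_byte(candidate))
```

Leading zeros such as 01 are rejected, because none of the alternatives allows a zero in front of another digit.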

Discussion

The previous recipes matched integers with any number of digits, or with a certain number of digits. They allowed the full range of digits for all the digits in the number. Such regular expressions are very straightforward.

Matching a number in a specific range (e.g., a number between 0 and 255) is not a simple task with regular expressions. You can’t write ‹[0-255]›. Well, you could, but it wouldn’t match a number between 0 and 255. This character class, which is equivalent to ‹[0125]›, matches a single character that is one of the digits 0, 1, 2, or 5.
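A quick way to convince yourself of this is to test both character classes against the ten digits. This is only a small Python sketch; the variable names are arbitrary.

```python
import re

# Both classes match exactly one character out of the set {0, 1, 2, 5}.
naive = re.compile(r"[0-255]")
explicit = re.compile(r"[0125]")

digits = "0123456789"
naive_hits = [d for d in digits if naive.fullmatch(d)]
explicit_hits = [d for d in digits if explicit.fullmatch(d)]

print(naive_hits)
print(naive_hits == explicit_hits)
# A character class never matches more than one character.
print(naive.fullmatch("255"))
```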

Tip

Because these regular expressions are quite a bit longer, the solutions all use anchors to make the regex suitable to check whether a string, such as user input, consists of a single acceptable number. Recipe 6.1 explains how you can use word boundaries or lookaround instead of the anchors for other purposes. In the discussion, we show the regexes without any anchors, to keep the focus on dealing with numeric ranges. If you want to use any of these regexes, you’ll have to add anchors or word boundaries to make sure your regex doesn’t match digits that are part of a longer number.
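The effect of leaving out the boundaries can be seen in a short Python sketch. The sample text is invented for illustration, and the regex is the 1 to 12 pattern from the Solution.

```python
import re

text = "months 3, 12, 13 and 120"

# No boundaries: the regex also matches digits inside longer numbers.
loose = re.findall(r"1[0-2]|[1-9]", text)

# Word boundaries: only standalone numbers from 1 to 12 are found.
strict = re.findall(r"\b(?:1[0-2]|[1-9])\b", text)

print(loose)
print(strict)
```

Without boundaries, the 1 and the 3 in 13 and the 12 in 120 are all reported as matches. With ‹\b› on both sides, only 3 and 12 remain.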

Regular expressions work character by character. If we want to match a number that consists of more than one digit, we have to spell out all the possible combinations for the digits. The essential building blocks are character classes (Recipe 2.3) and alternation (Recipe 2.8).

In character classes, we can use ranges for single digits, such as ‹[0-5]›. That’s because the characters for the digits 0 through 9 occupy consecutive positions in the ASCII and Unicode character tables. ‹[0-5]› matches one of six characters, just like ‹[j-o]› and ‹[\x09-\x0E]› match different ranges of six characters.
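To see that each of these classes covers six consecutive code points, you could enumerate the ASCII characters each one matches. The helper `class_members` below is a hypothetical name used only in this sketch.

```python
import re

def class_members(char_class: str) -> list[str]:
    """List the ASCII characters matched by a character class (hypothetical helper)."""
    pattern = re.compile(char_class)
    return [chr(code) for code in range(128) if pattern.fullmatch(chr(code))]

for char_class in [r"[0-5]", r"[j-o]", r"[\x09-\x0E]"]:
    members = class_members(char_class)
    # Print the class, how many characters it matches, and their code points.
    print(char_class, len(members), [ord(c) for c in members])
```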

When a numeric range is represented as text, it consists of a number of positions. Each position allows a certain range of digits. Some ranges have a fixed number of positions, such as 12 to 24. Others have a variable number of positions, such as 1 to 12. The range of digits allowed by each position can be either interdependent or independent of the digits in the other positions. In the range 40 to 59, the positions are independent. In the range 44 to 55, the positions are interdependent.

The easiest ranges are those with a fixed number of independent positions, such as 40 to 59. To code these as a regular expression, all you need to do is to string together a bunch of character classes. Use one character class for each position, specifying the range of digits allowed at that position.

[45][0-9]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

The range 40 to 59 requires a number with two digits. Thus we need two character classes. The first digit must be a 4 or 5. The character class ‹[45]› matches either digit. The second digit can be any of the 10 digits. ‹[0-9]› does the trick.
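Because the set of candidate numbers is small, a regex like this can be checked by brute force. The following Python sketch tries every integer below 1000; `matched_numbers` is a hypothetical helper, and `re.fullmatch` stands in for the anchors that the discussion leaves out.

```python
import re

def matched_numbers(regex: str, limit: int) -> list[int]:
    """Return every n in 0..limit-1 whose decimal form the regex matches entirely."""
    pattern = re.compile(regex)
    return [n for n in range(limit) if pattern.fullmatch(str(n))]

hits = matched_numbers(r"[45][0-9]", 1000)
# Smallest match, largest match, and how many numbers matched.
print(hits[0], hits[-1], len(hits))
```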

Tip

We could also have used the shorthand ‹\d› instead of ‹[0-9]›. We use the explicit range ‹[0-9]› for consistency with the other character classes, to help maintain readability. Reducing the number of backslashes in your regexes is also very helpful if you’re working with a programming language such as Java that requires backslashes to be escaped in literal strings.
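One practical difference is worth checking in your own flavor. In Python 3, for example, ‹\d› applied to a text string also matches decimal digits outside ASCII unless the `re.ASCII` flag is set. The sketch below uses an Arabic-Indic digit as the sample character; other flavors may behave differently.

```python
import re

subject = "4\u0663"  # "4" followed by ARABIC-INDIC DIGIT THREE

# Explicit ASCII range: the second character is not in 0-9.
explicit_range = bool(re.fullmatch(r"[45][0-9]", subject))
# Python 3 str patterns: \d matches any Unicode decimal digit.
unicode_digit = bool(re.fullmatch(r"[45]\d", subject))
# With re.ASCII, \d is limited to 0-9 again.
ascii_digit = bool(re.fullmatch(r"[45]\d", subject, re.ASCII))

print(explicit_range, unicode_digit, ascii_digit)
```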

The numbers in the range 44 to 55 also need two positions, but they’re not independent. The first digit must be 4 or 5. If the first digit is 4, the second digit must be between 4 and 9. That covers the numbers 44 to 49. If the first digit is 5, the second digit must be between 0 and 5. That covers the numbers 50 to 55. To create our regex, we simply use alternation to combine the two ranges:

4[4-9]|5[0-5]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

By using alternation, we’re telling the regex engine to match ‹4[4-9]› or ‹5[0-5]›. The alternation operator has the lowest precedence of all regex operators, and so we don’t need to group the digits, as in ‹(4[4-9])|(5[0-5])›.
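The same low precedence explains why the regexes in the Solution wrap the alternatives in a group before adding anchors. The following Python sketch, with arbitrary variable names, first confirms that grouping each alternative changes nothing, and then shows what happens when anchors are added without a group around the whole alternation.

```python
import re

# Grouping each alternative is redundant: both regexes accept the same numbers.
plain = re.compile(r"4[4-9]|5[0-5]")
grouped = re.compile(r"(4[4-9])|(5[0-5])")
same = all(
    bool(plain.fullmatch(str(n))) == bool(grouped.fullmatch(str(n)))
    for n in range(100)
)

# Without a group, ^4[4-9]|5[0-5]$ means
# "starts with 44 to 49" OR "ends with 50 to 55".
ungrouped_anchors = re.compile(r"^4[4-9]|5[0-5]$")
grouped_anchors = re.compile(r"^(4[4-9]|5[0-5])$")

print(same)
print(bool(ungrouped_anchors.search("449")), bool(grouped_anchors.search("449")))
```

The ungrouped version accepts 449 because its first alternative only requires the string to start with 44.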

You can string together as many ranges using alternation as you want. The range 34 to 65 also has two interdependent positions. The first digit must be between 3 and 6. If the first digit is 3, the second must be 4 to 9. If the first is 4 or 5, the second can be any digit. If the first is 6, the second must be 0 to 5:

3[4-9]|[45][0-9]|6[0-5]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Just like we use alternation to split ranges with interdependent positions into multiple ranges with independent positions, we can use alternation to split ranges with a variable number of positions into multiple ranges with a fixed number of positions. The range 1 to 12 has numbers with one or two positions. We split this into the range 1 to 9 with one position, and the range 10 to 12 with two positions. The positions in each of these two ranges are independent, so we don’t need to split them up further:

1[0-2]|[1-9]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

We listed the range with two digits before the one with a single digit. This is intentional because the regular expression engine is eager. It scans the alternatives from left to right, and stops as soon as one matches. If your subject text is 12, then ‹1[0-2]|[1-9]› matches 12, whereas ‹[1-9]|1[0-2]› matches just ‹1›. The first alternative, ‹[1-9]›, is tried first. Since that alternative is happy to match just 1, the regex engine never tries to check whether ‹1[0-2]› might offer a “better” solution.
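Python's `re` module is one of the eager engines, so the behavior can be reproduced directly. This is a minimal sketch using `re.match`, which attempts a match at the start of the subject without requiring it to reach the end.

```python
import re

subject = "12"

# Two-digit alternative first: the engine finds the whole number.
long_first = re.match(r"1[0-2]|[1-9]", subject)
# One-digit alternative first: the engine stops after the first digit.
short_first = re.match(r"[1-9]|1[0-2]", subject)

print(long_first.group())
print(short_first.group())
```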

Some Regex Engines Are Not Eager

POSIX-compliant regex engines and DFA regex engines do not follow this rule. They try all alternatives, and return the one that finds the longest match. All the flavors discussed in this book, however, are NFA engines, which don’t do the extra work required by POSIX. They will all tell you that ‹[1-9]|1[0-2]› matches 1 in 12.

In practice, you’ll usually use anchors or word boundaries around your list of alternatives. Then the order of alternatives doesn’t really matter. ‹^([1-9]|1[0-2])$› and ‹^(1[0-2]|[1-9])$› both match 12 in 12 with all regex flavors in this book, as well as POSIX “extended” regular expressions and DFA engines. The anchors require the regex to match either the whole string or nothing at all. DFA and NFA are defined in the sidebar History of the Term “Regular Expression” in Chapter 1.
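With anchors in place, both orderings can be compared by brute force. The sketch below, with a hypothetical helper `accepted`, checks every integer below 200 against the two anchored regexes.

```python
import re

long_first = re.compile(r"^(1[0-2]|[1-9])$")
short_first = re.compile(r"^([1-9]|1[0-2])$")

def accepted(pattern: re.Pattern) -> list[int]:
    """Return the integers below 200 that the anchored pattern accepts."""
    return [n for n in range(200) if pattern.search(str(n))]

print(accepted(long_first) == accepted(short_first))
# The anchors force the engine to backtrack into the second alternative.
print(short_first.search("12").group())
```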

The range 85 to 117 includes numbers of two different lengths. The range 85 to 99 has two positions, and the range 100 to 117 has three positions. The positions in these ranges are interdependent, and so we have to split them up further. For the two-digit range, if the first digit is 8, the second must be between 5 and 9. If the first digit is 9, the second digit can be any digit. For the three-digit range, the first position allows only the digit 1. If the second position has the digit 0, the third position allows any digit. But if the second digit is 1, then the third digit must be between 0 and 7. This gives us four ranges total: 85 to 89, 90 to 99, 100 to 109, and 110 to 117. Though things are getting long-winded, the regular expression remains as straightforward as the previous ones:

8[5-9]|9[0-9]|10[0-9]|11[0-7]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

That’s all there really is to matching numeric ranges with regular expressions: simply split up the range until you have ranges with a fixed number of positions with independent digits. This way, you’ll always get a correct regular expression that is easy to read and maintain, even if it may get a bit long-winded.
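The splitting procedure is mechanical enough to automate. The following Python sketch is one possible way to do it for nonnegative integers, not a method taken from this recipe. The function names `digit_class`, `fixed_width`, and `range_regex` are hypothetical, and the output may differ from the hand-written regexes in the order of the alternatives.

```python
import re

def digit_class(low: int, high: int) -> str:
    """Regex for one position that allows the digits low..high."""
    if low == high:
        return str(low)
    if high == low + 1:
        return f"[{low}{high}]"
    return f"[{low}-{high}]"

def fixed_width(low: str, high: str) -> list[str]:
    """Alternatives for low..high, given as digit strings of equal length."""
    if not low:
        return [""]
    rest = len(low) - 1
    if low[0] == high[0]:
        # Same leading digit: that position is fixed, split the remainder.
        return [low[0] + alt for alt in fixed_width(low[1:], high[1:])]
    first, last = int(low[0]), int(high[0])
    alternatives = []
    if low[1:] != "0" * rest:
        # The lowest leading digit restricts the remaining positions from below.
        alternatives += [low[0] + alt for alt in fixed_width(low[1:], "9" * rest)]
        first += 1
    upper_edge = []
    if high[1:] != "9" * rest:
        # The highest leading digit restricts the remaining positions from above.
        upper_edge = [high[0] + alt for alt in fixed_width("0" * rest, high[1:])]
        last -= 1
    if first <= last:
        # Leading digits in between leave all remaining positions independent.
        alternatives.append(digit_class(first, last) + "[0-9]" * rest)
    return alternatives + upper_edge

def range_regex(low: int, high: int) -> str:
    """Unanchored regex for the integers low..high (0 <= low <= high)."""
    alternatives = []
    # Longest numbers first, so an eager engine prefers the longer match.
    for width in range(len(str(high)), len(str(low)) - 1, -1):
        smallest = 0 if width == 1 else 10 ** (width - 1)
        largest = 10 ** width - 1
        alternatives += fixed_width(str(max(low, smallest)), str(min(high, largest)))
    return "|".join(alternatives)

print(range_regex(85, 117))
```

For the range 85 to 117 this sketch produces the same four alternatives as the regex above, with the three-digit alternatives listed first. It does not apply quantifiers, so longer ranges come out in the long-winded form.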

There are some extra techniques that allow for shorter regular expressions. For example, using the previous system, the range 0 to 65535 would require this regex:

6553[0-5]|655[0-2][0-9]|65[0-4][0-9][0-9]|6[0-4][0-9][0-9][0-9]|↵
[1-5][0-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9]|↵
[1-9][0-9]|[0-9]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

This regular expression works perfectly, and you won’t be able to come up with a regex that runs measurably faster. Any optimizations that could be made (e.g., there are various alternatives starting with a 6) are already made by the regular expression engine when it compiles your regular expression. There’s no need to waste your time to make your regex more complicated in the hopes of getting it faster. But you can make your regex shorter, to reduce the amount of typing you need to do, while still keeping it readable.

Several of the alternatives have identical character classes next to each other. You can eliminate the duplication by using quantifiers. Recipe 2.12 tells you all about those.

6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|↵
[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[0-9]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

The ‹[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]› part of the regex has three very similar alternatives, and they all have the same pair of character classes. The only difference is the number of times the second class is repeated. We can easily combine that into ‹[1-9][0-9]{1,3}›.

6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|↵
[1-9][0-9]{1,3}|[0-9]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Any further tricks will hurt readability. For example, you could isolate the leading 6 from the first four alternatives:

6(?:553[0-5]|55[0-2][0-9]|5[0-4][0-9]{2}|[0-4][0-9]{3})|[1-5][0-9]{4}|↵
[1-9][0-9]{1,3}|[0-9]

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

But this regex is actually one character longer because we had to add a noncapturing group to isolate the alternatives with the leading 6 from the other alternatives. You won’t get a performance benefit with any of the regex flavors discussed in this book. They all make this optimization internally.
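If you do shorten a regex by hand, it is easy to verify that the result still accepts exactly the same numbers. The Python sketch below compares the four versions of the 0 to 65535 regex shown in this discussion against every integer below 100000. The helper name `accepted` is hypothetical.

```python
import re

variants = [
    # One alternative per fixed-width range, no quantifiers.
    r"6553[0-5]|655[0-2][0-9]|65[0-4][0-9][0-9]|6[0-4][0-9][0-9][0-9]|"
    r"[1-5][0-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9]|"
    r"[1-9][0-9]|[0-9]",
    # Repeated classes replaced by quantifiers.
    r"6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|"
    r"[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[0-9]",
    # Three similar alternatives merged into [1-9][0-9]{1,3}.
    r"6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|"
    r"[1-9][0-9]{1,3}|[0-9]",
    # Leading 6 isolated with a noncapturing group.
    r"6(?:553[0-5]|55[0-2][0-9]|5[0-4][0-9]{2}|[0-4][0-9]{3})|[1-5][0-9]{4}|"
    r"[1-9][0-9]{1,3}|[0-9]",
]

def accepted(regex: str) -> set[int]:
    """Return the integers below 100000 that the regex matches entirely."""
    pattern = re.compile(regex)
    return {n for n in range(100_000) if pattern.fullmatch(str(n))}

results = [accepted(regex) for regex in variants]
print(all(result == set(range(65536)) for result in results))
# Length difference between the last two versions.
print(len(variants[3]) - len(variants[2]))
```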
