You want to find various kinds of integer numbers in a
larger body of text, or check whether a string variable holds an integer
number. Underscores are allowed as separators between groups of numbers,
to make the integers easier to read. Numbers may not begin or end with
an underscore. You want to allow decimal, octal, hexadecimal, and binary
numbers. Hexadecimal and binary numbers must be prefixed with 0x and 0b, respectively.
0b0111_1111_1111_1111_1111_1111_1111_1111, 0177_7777_7777, 2_147_483_647, and 0x7fff_ffff are examples of valid numbers.
Find any decimal or octal integer with optional underscores in a larger body of text:
\b[0-9]+(_+[0-9]+)*\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find any hexadecimal integer with optional underscores in a larger body of text:
\b0x[0-9A-F]+(_+[0-9A-F]+)*\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find any binary integer with optional underscores in a larger body of text:
\b0b[01]+(_+[01]+)*\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find any decimal, octal, hexadecimal, or binary integer with optional underscores in a larger body of text:
\b([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
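As a rough illustration of the combined search, here is a minimal Python sketch. It assumes the pattern is wrapped in word boundaries so that a partial token, such as the leading 0 of a hexadecimal number, is not reported as a match of its own. The names INTEGER, find_integers, and sample are hypothetical and exist only for this example.

```python
import re

# Combined pattern from this recipe. The surrounding \b word boundaries are an
# assumption of this sketch; they keep the decimal alternative from matching
# just the "0" in "0x7fff_ffff" or "0b0111".
INTEGER = re.compile(
    r"\b([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\b",
    re.IGNORECASE,
)

def find_integers(text):
    """Return every integer literal found in text, in order of appearance."""
    return [m.group(0) for m in INTEGER.finditer(text)]

sample = "mask 0x7fff_ffff, mode 0177_7777_7777, max 2_147_483_647, flag 0b0111"
found = find_integers(sample)
print(found)
```

Because the underscore counts as a word character, the word boundaries also prevent a match from starting or stopping next to a stray leading or trailing underscore.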
Check whether a text string holds just a decimal, octal, hexadecimal, or binary integer with optional underscores:
\A([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\Z
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
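The two anchored variants can behave differently depending on the flavor. The following Python sketch assumes the first variant is anchored with \A and \Z and the second with ^ and $; the names BODY, STRICT, LOOSE, and is_integer are hypothetical. In Python, $ also matches just before a trailing line break, while \Z matches only at the very end of the string.

```python
import re

# Shared body of both validation patterns from this recipe.
BODY = r"([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)"

# \A and \Z anchor to the very start and very end of the string.
STRICT = re.compile(r"\A" + BODY + r"\Z", re.IGNORECASE)
# ^ and $ without the multiline option; in Python, $ tolerates one trailing "\n".
LOOSE = re.compile(r"^" + BODY + r"$", re.IGNORECASE)

def is_integer(s):
    """Return True if s holds exactly one integer in any of the four notations."""
    return STRICT.search(s) is not None

for candidate in ["0x7fff_ffff", "2_147_483_647", "_123", "123_", "0x", "0b102"]:
    print(candidate, is_integer(candidate))
```

If the input may carry a trailing line break that should be rejected, the \A and \Z variant is the safer choice in Python.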
Recipes 6.1, 6.2, and 6.3 explain in detail how to match
integer numbers. These recipes do not allow underscores in the numbers.
Their regular expressions can easily use ‹[0-9]+›, ‹[0-9A-F]+›, and ‹[01]+› to match decimal, hexadecimal, and binary
numbers.
If we wanted to allow underscores anywhere, we could just add the
underscore to these three character classes. But we do not want to allow
underscores at the start or the end. The first and last characters in
the number must be digits. You might think of ‹[0-9][0-9_]+[0-9]› as an easy solution. But this
pattern requires at least three characters, so it fails to match one- and two-digit numbers. So we need a slightly more complex
solution.
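The shortcomings of both shortcuts are easy to check. This Python sketch is only an illustration; the names loose, naive, and results are hypothetical, and fullmatch is used so that each test string must be matched in its entirety.

```python
import re

# Shortcut 1: underscore added to the character class; accepts underscores anywhere.
loose = re.compile(r"[0-9_]+")
# Shortcut 2: digit, then one or more digits or underscores, then digit.
naive = re.compile(r"[0-9][0-9_]+[0-9]")

# The loose class accepts leading and trailing underscores, which we want to reject.
print(bool(loose.fullmatch("_1_")))

# The naive pattern needs at least three characters, so short numbers fail.
results = {s: bool(naive.fullmatch(s)) for s in ["1_000", "100", "42", "7"]}
print(results)
```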
Our solution ‹[0-9]+(_+[0-9]+)*› uses ‹[0-9]+› to match the initial digit or digits as
before. We add ‹(_+[0-9]+)*› to allow the digits to be followed by
one or more underscores, as long as those underscores are followed by
more digits. ‹_+› allows one or more sequential underscores. ‹[0-9]+› allows one or more digits after the
underscores. We put those two inside a group that we repeat zero or more
times with an asterisk. This allows any number of separate runs of
underscores with digits in between them and after them, while also
allowing numbers with no underscores at all.
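To see the pattern at work, here is a small Python sketch that validates a decimal number and then converts it. It is an illustration under stated assumptions, not part of the recipe: DECIMAL and parse_decimal are hypothetical names, the whole string must match, and a leading zero is treated as part of a decimal number instead of marking octal.

```python
import re

# Digits, followed by zero or more groups of "underscores then digits".
DECIMAL = re.compile(r"[0-9]+(_+[0-9]+)*")

def parse_decimal(s):
    """Return the value of s as an int, or None if s is not a valid
    underscore-separated decimal number."""
    if DECIMAL.fullmatch(s) is None:
        return None
    # The separators carry no value, so drop them before converting.
    return int(s.replace("_", ""))

for candidate in ["2_147_483_647", "7", "1__000", "_1", "1_", ""]:
    print(repr(candidate), parse_decimal(candidate))
```

Stripping the underscores by hand is a deliberate choice here, since the pattern permits several underscores in a row while Python's own integer literals accept only single ones.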
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.6 explains word boundaries. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.