You want to find hexadecimal numbers in a larger body of text, or check whether a string variable holds a hexadecimal number.
Find any hexadecimal number in a larger body of text:
\b[0-9A-F]+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
\b[0-9A-Fa-f]+\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Check whether a text string holds just a hexadecimal number:
\A[0-9A-F]+\Z
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^[0-9A-F]+$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
Find a hexadecimal number with a 0x prefix:
\b0x[0-9A-F]+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a hexadecimal number with an &H prefix:
&H[0-9A-F]+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a hexadecimal number with an H suffix:
\b[0-9A-F]+H\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a hexadecimal byte value or 8-bit number:
\b[0-9A-F]{2}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a hexadecimal word value or 16-bit number:
\b[0-9A-F]{4}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a hexadecimal double word value or 32-bit number:
\b[0-9A-F]{8}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a hexadecimal quad word value or 64-bit number:
\b[0-9A-F]{16}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Find a string of hexadecimal bytes (i.e., an even number of hexadecimal digits):
\b(?:[0-9A-F]{2})+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
The technique for matching hexadecimal integers with a regular expression is the same as for matching decimal integers. The only difference is that the character class that matches a single digit now has to include the letters A through F. You have to consider whether the letters must be all uppercase or all lowercase, or whether mixed case is permitted. The regular expressions shown here all allow mixed case.
By default, regular expressions are case-sensitive. ‹[0-9a-f]› allows only lowercase hexadecimal digits, and ‹[0-9A-F]› allows only uppercase hexadecimal digits. To allow mixed case, use ‹[0-9a-fA-F]› or turn on the option to make your regular expression case insensitive. Recipe 3.4 explains how to do that with the programming languages covered by this book. The first regex in the solution is shown twice, using the two different ways of making it case-insensitive. The others shown use only the second method.
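As a rough illustration of the two approaches, the following Python sketch compiles the same pattern both ways and compares the results with a case-sensitive variant. The sample text and variable names are made up for this example.

```python
import re

# Hypothetical sample text for illustration only.
text = "Colors: ff00AA, BEEF, 1234, and xyz."

# Method 1: list both cases in the character class.
explicit = re.compile(r"\b[0-9A-Fa-f]+\b")

# Method 2: keep the class uppercase-only and turn on case insensitivity.
flagged = re.compile(r"\b[0-9A-F]+\b", re.IGNORECASE)

# Case-sensitive variant: only uppercase letters are accepted.
upper_only = re.compile(r"\b[0-9A-F]+\b")

explicit_hits = explicit.findall(text)
flagged_hits = flagged.findall(text)
upper_hits = upper_only.findall(text)

print(explicit_hits)  # ['ff00AA', 'BEEF', '1234']
print(flagged_hits)   # ['ff00AA', 'BEEF', '1234']
print(upper_hits)     # ['BEEF', '1234']
```

Words such as "Colors" and "and" begin with a hexadecimal letter but are not matched, because the word boundary after the digits cannot fall in the middle of a word.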
If you want to allow only uppercase letters in hexadecimal numbers, use the regexes shown with case insensitivity turned off. To allow only lowercase letters, turn off case insensitivity and replace ‹A-F› with ‹a-f›.
‹(?:[0-9A-F]{2})+› matches an even number of hexadecimal digits. ‹[0-9A-F]{2}› matches exactly two hexadecimal digits. ‹(?:[0-9A-F]{2})+› does that one or more times. The noncapturing group (see Recipe 2.9) is required because the plus needs to repeat the character class and the quantifier ‹{2}› combined. ‹[0-9]{2}+› is not a syntax error in Java, PCRE, and Perl 5.10, but it doesn’t do what you want. The extra ‹+› makes the ‹{2}› possessive. That has no effect, because ‹{2}› cannot repeat fewer than two times anyway.
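The effect of repeating the two-digit group can be checked with a short Python sketch. The sample strings are invented for this example, and `fullmatch` is used here only as a convenient way to test each sample as a whole.

```python
import re

# One or more repetitions of exactly two hexadecimal digits.
hex_bytes = re.compile(r"\b(?:[0-9A-F]{2})+\b", re.IGNORECASE)

# Hypothetical samples: even and odd digit counts.
samples = ["DEADBEEF", "ABC", "0f", "7"]
results = {s: bool(hex_bytes.fullmatch(s)) for s in samples}
print(results)
# {'DEADBEEF': True, 'ABC': False, '0f': True, '7': False}

# Within running text, odd-length runs are skipped entirely because the
# trailing word boundary cannot fall between two hexadecimal digits.
found = hex_bytes.findall("CAFE ABC 00ff")
print(found)  # ['CAFE', '00ff']
```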
Several of the solutions show how to require the hexadecimal number to have one of the prefixes or suffixes commonly used to identify hexadecimal numbers. These are used to differentiate between decimal numbers and hexadecimal numbers that happen to consist of only decimal digits. For example, 10 could be the decimal number between 9 and 11, or the hexadecimal number between F and 11.
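The ambiguity of 10 can be made concrete with a small Python sketch. The capturing groups around the digits are an addition for this example (the recipe's regexes do not capture anything), and the sample sentence and variable names are hypothetical.

```python
import re

# Hypothetical sample text containing the same digits in four notations.
text = "Decimal 10, C-style 0x10, BASIC-style &H10, and assembler-style 10h."

# Capturing groups are added here so findall() returns only the digits.
c_style = re.compile(r"\b0x([0-9A-F]+)\b", re.IGNORECASE)
basic_style = re.compile(r"&H([0-9A-F]+)\b", re.IGNORECASE)
asm_style = re.compile(r"\b([0-9A-F]+)H\b", re.IGNORECASE)

# Convert the captured digits using base 16.
c_values = [int(d, 16) for d in c_style.findall(text)]
basic_values = [int(d, 16) for d in basic_style.findall(text)]
asm_values = [int(d, 16) for d in asm_style.findall(text)]

print(c_values, basic_values, asm_values)  # [16] [16] [16]
print(int("10"), int("10", 16))            # 10 16
```

The plain 10 at the start of the sentence is not matched by any of the three patterns, which is the point of requiring a prefix or suffix.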
Most solutions are shown with word boundaries (Recipe 2.6). Use word boundaries as shown to find numbers within a larger body of text. Notice that the regex using the &H prefix does not have a word boundary at the start. That’s because the ampersand is not a word character. If we put a word boundary at the start of that regex, it would only find hexadecimal numbers immediately after a word character.
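A minimal Python sketch of this behavior, using an invented input string, compares the regex with and without a leading word boundary:

```python
import re

# Hypothetical input: one &H number after a space, one right after a letter.
text = "x = &H1F + A&H2B"

without_leading_boundary = re.compile(r"&H[0-9A-F]+\b", re.IGNORECASE)
with_leading_boundary = re.compile(r"\b&H[0-9A-F]+\b", re.IGNORECASE)

all_hits = without_leading_boundary.findall(text)
boundary_hits = with_leading_boundary.findall(text)

print(all_hits)       # ['&H1F', '&H2B']
print(boundary_hits)  # ['&H2B']
```

With the leading boundary, only the number glued to the letter A is found, which is rarely what you want.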
If you want to check whether your string holds nothing but a hexadecimal number, simply put start-of-string and end-of-string anchors around your regex. ‹\A› and ‹\Z› are your best options, because their meanings don’t change. Unfortunately, JavaScript doesn’t support them. In JavaScript, use ‹^› and ‹$›, and make sure you don’t specify the /m flag that makes the caret and dollar match at line breaks. In Ruby, the caret and dollar always match at line breaks, so you can’t reliably use them to force your regex to match the whole string.
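The difference between the two sets of anchors can be sketched in Python as follows. The helper name `is_hex` and the sample strings are made up for this example. Note that this shows Python's behavior specifically: in Python, ‹\Z› matches only at the very end of the string, whereas several other flavors also let it match before a final line break.

```python
import re

anchored = re.compile(r"\A[0-9A-F]+\Z", re.IGNORECASE)
caret_dollar = re.compile(r"^[0-9A-F]+$", re.IGNORECASE)
caret_dollar_multiline = re.compile(r"^[0-9A-F]+$", re.IGNORECASE | re.MULTILINE)


def is_hex(s: str) -> bool:
    """Hypothetical helper: True if s holds nothing but hexadecimal digits."""
    return anchored.search(s) is not None


print(is_hex("1aF"))    # True
print(is_hex("1aF\n"))  # False: Python's \Z does not skip a trailing newline
print(is_hex("0x1aF"))  # False: the prefix is not part of this regex

# In Python, $ also matches just before a newline at the end of the string.
print(bool(caret_dollar.search("1aF\n")))  # True

# With the multiline flag, ^ and $ match at every line break, so a
# hexadecimal line buried in other text is accepted.
multi = "zz\n1aF\nzz"
print(bool(caret_dollar.search(multi)))            # False
print(bool(caret_dollar_multiline.search(multi)))  # True
```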
All the other recipes in this chapter show more ways of matching different kinds of numbers with a regular expression.
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.6 explains word boundaries. Recipe 2.12 explains repetition.