6.2. Hexadecimal Numbers

Problem

You want to find hexadecimal numbers in a larger body of text, or check whether a string variable holds a hexadecimal number.

Solution

Find any hexadecimal number in a larger body of text:

\b[0-9A-F]+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-Fa-f]+\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Check whether a text string holds just a hexadecimal number:

\A[0-9A-F]+\Z
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[0-9A-F]+$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Find a hexadecimal number with a 0x prefix:

\b0x[0-9A-F]+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a hexadecimal number with an &H prefix:

&H[0-9A-F]+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a hexadecimal number with an H suffix:

\b[0-9A-F]+H\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a hexadecimal byte value or 8-bit number:

\b[0-9A-F]{2}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a hexadecimal word value or 16-bit number:

\b[0-9A-F]{4}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a hexadecimal double word value or 32-bit number:

\b[0-9A-F]{8}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a hexadecimal quad word value or 64-bit number:

\b[0-9A-F]{16}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find a string of hexadecimal bytes (i.e., an even number of hexadecimal digits):

\b(?:[0-9A-F]{2})+\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

The technique for matching hexadecimal integers with a regular expression is the same as for matching decimal integers. The only difference is that the character class that matches a single digit now has to include the letters A through F. You have to consider whether the letters must be uppercase, must be lowercase, or may be mixed case. The regular expressions shown here all allow mixed case.

By default, regular expressions are case-sensitive. [0-9a-f] allows only lowercase hexadecimal digits, and [0-9A-F] allows only uppercase hexadecimal digits. To allow mixed case, use [0-9a-fA-F] or turn on the option to make your regular expression case insensitive. Recipe 3.4 explains how to do that with the programming languages covered by this book. The first regex in the solution is shown twice, using the two different ways of making it case-insensitive. The others shown use only the second method.
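
As a rough illustration of the two approaches, here is a minimal Python sketch. The sample text and the variable names are invented for this example, and the patterns assume the word boundaries described later in this discussion.

```python
import re

# Sample input made up for this sketch.
text = "ff8800, FF8800, Ff88aA, xyz, 12G"

# Approach 1: uppercase-only class plus the case-insensitive option.
with_flag = re.compile(r"\b[0-9A-F]+\b", re.IGNORECASE)

# Approach 2: both letter ranges spelled out, no option required.
with_class = re.compile(r"\b[0-9A-Fa-f]+\b")

# Neither approach applied: only uppercase hexadecimal digits are accepted.
upper_only = re.compile(r"\b[0-9A-F]+\b")

flag_matches = with_flag.findall(text)
class_matches = with_class.findall(text)
upper_matches = upper_only.findall(text)

print(flag_matches)
print(class_matches)
print(upper_matches)
```

Both approaches should find the same three numbers in this text, while the case-sensitive variant should find only the all-uppercase one.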

If you only want to allow uppercase letters in hexadecimal numbers, use the regexes shown with case insensitivity turned off. To allow only lowercase letters, turn off case insensitivity and replace A-F with a-f.

(?:[0-9A-F]{2})+ matches an even number of hexadecimal digits. [0-9A-F]{2} matches exactly two hexadecimal digits. (?:[0-9A-F]{2})+ does that one or more times. The noncapturing group (see Recipe 2.9) is required because the plus needs to repeat the character class and the quantifier {2} combined. [0-9]{2}+ is not a syntax error in Java, PCRE, and Perl 5.10, but it doesn’t do what you want. The extra + makes the {2} possessive. That has no effect, because {2} cannot repeat fewer than two times anyway.
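
The effect of the noncapturing group can be seen in a short Python sketch. The sample text and names below are hypothetical, and the word boundaries are an assumption carried over from the solution regexes.

```python
import re

# The group repeats "two hex digits" as a unit, so only even-length runs match.
hex_bytes = re.compile(r"\b(?:[0-9A-F]{2})+\b", re.IGNORECASE)

# Sample input made up for this sketch: one 8-digit run, one 3-digit run,
# and one 4-digit run.
text = "DEADBEEF ABC 00ff"

# Because the group is noncapturing, findall() returns the whole matches.
even_runs = hex_bytes.findall(text)

# Each even-length run can then be decoded into raw bytes.
decoded = [bytes.fromhex(run) for run in even_runs]

print(even_runs)
print(decoded)
```

The three-digit run is skipped entirely: the regex cannot stop after two digits because no word boundary follows them.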

Several of the solutions show how to require the hexadecimal number to have one of the prefixes or suffixes commonly used to identify hexadecimal numbers. These are used to differentiate between decimal numbers and hexadecimal numbers that happen to consist of only decimal digits. For example, 10 could be the decimal number between 9 and 11, or the hexadecimal number between F and 11.
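
To show why the prefix or suffix matters, the following Python sketch combines the three notations into one pattern and converts each match to an integer. Combining them with alternation is a choice made for this example, not something the recipe prescribes; the group names and the hex_values function are hypothetical.

```python
import re

# One alternative per notation; each named group captures only the digits.
hex_literal = re.compile(
    r"\b0x(?P<c_style>[0-9A-F]+)\b"       # 0x prefix
    r"|&H(?P<basic_style>[0-9A-F]+)\b"    # &H prefix (no leading \b)
    r"|\b(?P<asm_style>[0-9A-F]+)H\b",    # H suffix
    re.IGNORECASE,
)

def hex_values(text):
    """Return the integer value of every marked hexadecimal number in text."""
    values = []
    for match in hex_literal.finditer(text):
        digits = (match.group("c_style")
                  or match.group("basic_style")
                  or match.group("asm_style"))
        values.append(int(digits, 16))
    return values

# The bare "10" at the end has no marker, so it is left alone.
values = hex_values("0x10 &H10 10h 10")
print(values)
```

Each marked 10 is read as sixteen, while the unmarked 10 is not treated as hexadecimal at all.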

Most solutions are shown with word boundaries (Recipe 2.6). Use word boundaries as shown to find numbers within a larger body of text. Notice that the regex using the &H prefix does not have a word boundary at the start. That’s because the ampersand is not a word character. If we put a word boundary at the start of that regex, it would only find hexadecimal numbers immediately after a word character.
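
The behavior of a leading word boundary in front of the ampersand can be checked with a small Python sketch. The sample strings and variable names are invented for this example.

```python
import re

no_leading_boundary = re.compile(r"&H[0-9A-F]+\b", re.IGNORECASE)
leading_boundary = re.compile(r"\b&H[0-9A-F]+\b", re.IGNORECASE)

# A space precedes the ampersand: both characters are nonword characters,
# so there is no word boundary between them.
spaced = "Dim mask = &HFF"
found = no_leading_boundary.findall(spaced)
missed = leading_boundary.findall(spaced)

# A letter precedes the ampersand: now a word boundary does exist there.
glued = leading_boundary.findall("x&HFF")

print(found)
print(missed)
print(glued)
```

With the leading word boundary, the ordinary case (a space before the prefix) is missed, and only the unlikely case of a word character directly before the ampersand matches.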

If you want to check whether your string holds nothing but a hexadecimal number, simply put start-of-string and end-of-string anchors around your regex. \A and \Z are your best options, because their meanings don’t change. Unfortunately, JavaScript doesn’t support them. In JavaScript, use ^ and $, and make sure you don’t specify the /m flag that makes the caret and dollar match at line breaks. In Ruby, the caret and dollar always match at line breaks, so you can’t reliably use them to force your regex to match the whole string.
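
A minimal Python sketch of the difference between the two kinds of anchors follows. The is_hex function and the sample strings are hypothetical, and the behavior shown is specific to Python's re module; other flavors may treat a trailing line break differently.

```python
import re

# \A and \Z always refer to the start and end of the whole string.
strict = re.compile(r"\A[0-9A-F]+\Z", re.IGNORECASE)

# With the multiline option, ^ and $ also match at line breaks.
multiline = re.compile(r"^[0-9A-F]+$", re.IGNORECASE | re.MULTILINE)

# Even without the multiline option, Python's $ matches just before a
# single trailing newline; \Z does not.
dollar = re.compile(r"^[0-9A-F]+$", re.IGNORECASE)

def is_hex(s):
    """True if s consists of nothing but hexadecimal digits."""
    return strict.search(s) is not None

two_lines = "1A2b\nxyz"
strict_on_two_lines = is_hex(two_lines)
multiline_on_two_lines = multiline.search(two_lines) is not None
dollar_allows_newline = dollar.search("1A2b\n") is not None
strict_allows_newline = is_hex("1A2b\n")

print(is_hex("1A2b"), strict_on_two_lines, multiline_on_two_lines)
print(dollar_allows_newline, strict_allows_newline)
```

The multiline variant accepts a string whose first line merely looks like a hexadecimal number, which is why the fixed anchors are the safer choice for validation.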

See Also

All the other recipes in this chapter show more ways of matching different kinds of numbers with a regular expression.

Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.6 explains word boundaries. Recipe 2.12 explains repetition.
