6.13. Roman Numerals

Problem

You want to match Roman numerals such as IV, XIII, and MVIII.

Solution

Roman numerals without validation:

^[MDCLXVI]+$

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Modern Roman numerals, strict:

^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Modern Roman numerals, flexible:

^(?=[MDCLXVI])M*(C[MD]|D?C*)(X[CL]|L?X*)(I[XV]|V?I*)$

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Simple Roman numerals:

^(?=[MDCLXVI])M*D?C{0,4}L?X{0,4}V?I{0,4}$

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

Roman numerals are written using the letters M, D, C, L, X, V, and I, representing the values 1,000, 500, 100, 50, 10, 5, and 1, respectively. The first regex matches any string composed of these letters, without checking whether the letters appear in the order or quantity necessary to form a proper Roman numeral.
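To see how permissive the first regex is, here is a minimal Python sketch (Python being one of the flavors listed above). The helper name `looks_roman` is invented for this illustration and is not part of the recipe.

```python
import re

# First regex from the Solution: any run of Roman numeral letters.
LOOSE = re.compile(r"^[MDCLXVI]+$", re.IGNORECASE)

def looks_roman(text):
    """Return True if text contains nothing but the letters M, D, C, L, X, V, I."""
    return LOOSE.search(text) is not None

# IIIIM and VX are not proper numerals, but they use only the allowed letters.
for sample in ["XIII", "mviii", "IIIIM", "VX", "ABC", ""]:
    print(repr(sample), looks_roman(sample))
```

Strings such as IIIIM and VX pass because the regex checks letters only, not order or quantity. In Python, ‹$› also matches just before a single trailing newline, so ‹\Z› is the stricter end anchor if that matters for your input.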

In modern times (meaning during the past few hundred years), Roman numerals have generally been written following a strict set of rules. These rules yield exactly one Roman numeral per number. For example, 4 is always written as IV, never as IIII. The second regex in the solution matches only Roman numerals that follow these modern rules.

Each nonzero digit of the decimal number is written out separately in the Roman numeral. 1999 is written as MCMXCIX, where M is 1000, CM is 900, XC is 90, and IX is 9. We don’t write MIM or IMM.
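The digit-by-digit rule can be sketched in the opposite direction, building a numeral from a number. This is only an illustration of the rule under the modern conventions described above; the function name `to_roman` and the lookup tables are assumptions made for this example, not code from the recipe.

```python
# One entry per decimal digit 0-9; the empty string stands for an unwritten zero.
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]

def to_roman(number):
    """Build a modern Roman numeral one decimal digit at a time."""
    if number < 1:
        raise ValueError("Roman numerals cannot express zero or negative numbers")
    thousands, rest = divmod(number, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, units = divmod(rest, 10)
    # One M per thousand, then one table entry per remaining digit.
    return "M" * thousands + HUNDREDS[hundreds] + TENS[tens] + UNITS[units]

print(to_roman(1999))  # MCMXCIX
```

Because each digit is translated on its own, 1999 can only come out as M + CM + XC + IX, never as MIM.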

The thousands are easy: one M per thousand, easily matched with ‹M*›.

There are 10 variations for the hundreds, which we match using two alternatives. ‹C[MD]› matches CM and CD, which represent 900 and 400. ‹D?C{0,3}› matches DCCC, DCC, DC, D, CCC, CC, C, and the empty string, representing 800, 700, 600, 500, 300, 200, 100, and nothing. Together, the two alternatives cover all 10 possible digits for the hundreds.

We match the tens with ‹X[CL]|L?X{0,3}› and the units with ‹I[XV]|V?I{0,3}›. These use the same syntax, but with different letters.
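Since the three sub-patterns differ only in their letters, they can be generated from one template. The following Python sketch does that and checks the hundreds pattern against its 10 digits; `digit_pattern` and `matches_digit` are hypothetical helper names used only here.

```python
import re

def digit_pattern(one, five, ten):
    """Build the sub-pattern for one decimal position from its three letters."""
    return f"{one}[{ten}{five}]|{five}?{one}{{0,3}}"

hundreds = digit_pattern("C", "D", "M")  # C[MD]|D?C{0,3}
tens = digit_pattern("X", "L", "C")      # X[CL]|L?X{0,3}
units = digit_pattern("I", "V", "X")     # I[XV]|V?I{0,3}

def matches_digit(pattern, text):
    """Return True if the whole of text is one digit in the given position."""
    return re.fullmatch(pattern, text) is not None

# The 10 ways to write a hundreds digit, indexed by the digit's value.
HUNDRED_DIGITS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
print(all(matches_digit(hundreds, text) for text in HUNDRED_DIGITS))
print(matches_digit(hundreds, "CCCC"))
```

The generated strings are the same sub-patterns used in the strict regex, and CCCC is rejected because ‹C{0,3}› stops at three repetitions.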

All four parts of the regex allow everything to be optional, because each of the digits could be zero. The Romans did not have a symbol, or even a word, to represent zero. Thus, zero is unwritten in Roman numerals. While each part of the regex should indeed be optional, they’re not all optional at the same time. We have to make sure our regex does not allow zero-length matches. To do this, we put the lookahead ‹(?=[MDCLXVI])› at the start of the regex. This lookahead, as Recipe 2.16 explains, makes sure that there’s at least one letter in the regex match. The lookahead does not consume the letter that it matches, so that letter can be matched again by the remainder of the regex.
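The effect of the lookahead is easy to check by compiling the strict regex with and without it. This is a small Python sketch; the variable names are chosen for this example only.

```python
import re

# The strict regex minus its leading anchor and lookahead.
BODY = r"M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

without_lookahead = re.compile("^" + BODY, re.IGNORECASE)
with_lookahead = re.compile("^(?=[MDCLXVI])" + BODY, re.IGNORECASE)

# Every part is optional, so without the lookahead the empty string matches.
print(without_lookahead.search("") is not None)
print(with_lookahead.search("") is not None)

# The lookahead consumes nothing, so real numerals still match in full.
print(with_lookahead.search("MCMXCIX").group(0))
```

Only the version with the lookahead rejects the empty string, while both versions accept MCMXCIX.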

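The third regex is a bit more flexible. It replaces each ‹{0,3}› quantifier with ‹*›, so a letter may be repeated any number of times. It therefore accepts numerals such as IIII, while still accepting IV.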
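A side-by-side check of the strict and flexible regexes shows where they differ. This is a Python sketch with an invented helper name, `accepted_by`, that returns one flag per regex.

```python
import re

STRICT = re.compile(
    r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$",
    re.IGNORECASE,
)
FLEXIBLE = re.compile(
    r"^(?=[MDCLXVI])M*(C[MD]|D?C*)(X[CL]|L?X*)(I[XV]|V?I*)$",
    re.IGNORECASE,
)

def accepted_by(text):
    """Return (strict, flexible) flags for one candidate numeral."""
    return (bool(STRICT.search(text)), bool(FLEXIBLE.search(text)))

for sample in ["IV", "IIII", "XXXX", "VIIII", "IIV"]:
    print(sample, accepted_by(sample))
```

Both regexes accept IV and reject IIV, since neither allows two letters to be subtracted. Only the flexible one accepts the repeated forms IIII, XXXX, and VIIII.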

The fourth regex only allows numerals written without using subtraction, and therefore all the letters must be in descending order. 4 must be written as IIII rather than IV. The Romans themselves usually wrote numbers this way.
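The same kind of check works for the fourth regex. In this Python sketch, `is_simple_roman` is a hypothetical helper name; the pattern is copied from the Solution.

```python
import re

SIMPLE = re.compile(
    r"^(?=[MDCLXVI])M*D?C{0,4}L?X{0,4}V?I{0,4}$", re.IGNORECASE
)

def is_simple_roman(text):
    """Return True for numerals written without subtraction."""
    return SIMPLE.search(text) is not None

# 1999 written additively, then in the modern subtractive form.
for sample in ["IIII", "IV", "MDCCCCLXXXXVIIII", "MCMXCIX", "IIIII"]:
    print(sample, is_simple_roman(sample))
```

IIII and the additive spelling of 1999 are accepted. IV and MCMXCIX are rejected because a smaller letter precedes a larger one, and IIIII is rejected because five units would be written as V.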

Tip

All regular expressions are wrapped between anchors (Recipe 2.5) to make sure we check whether the whole input is a Roman numeral, as opposed to a Roman numeral occurring in a larger string. You can replace ‹^› and ‹$› with ‹\b› word boundaries if you want to find Roman numerals in a larger body of text.
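Searching running text with word boundaries might look like the Python sketch below. Two choices here are assumptions of this example rather than part of the recipe: the search is case sensitive, to avoid matching ordinary words such as "mix", and zero-length matches are discarded. The second choice matters because, with ‹\b› in place of the anchors, a word that merely starts with one of the seven letters (such as "Chapter") satisfies the lookahead and both boundaries at the same position, producing an empty match.

```python
import re

# Strict regex with ^ and $ replaced by \b; case sensitive by choice.
WORD_BOUNDED = re.compile(
    r"\b(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})\b"
)

def find_roman_numerals(text):
    """Return the non-empty strict Roman numerals found in text."""
    return [m.group(0) for m in WORD_BOUNDED.finditer(text) if m.group(0)]

text = "Chapter XIV follows chapter IX, published in MCMXCIX."
print(find_roman_numerals(text))  # ['XIV', 'IX', 'MCMXCIX']
```

Even with these precautions, a capitalized one-letter word such as the pronoun "I" is still reported as a numeral, so results from free text usually need a sanity check.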

Convert Roman Numerals to Decimal

This Perl function uses the “strict” regular expression from this recipe to check whether the input is a valid Roman numeral. If it is, it uses the regex ‹[MDLV]|C[MD]?|X[CL]?|I[XV]?› to iterate over the numeral one token at a time (a single letter, or a subtractive pair such as CM or IV), adding up the tokens' values. If the input is not a valid numeral, the function returns 0:

sub roman2decimal {
    my $roman = shift;
    if ($roman =~
        m/^(?=[MDCLXVI])
          (M*)               # 1000
          (C[MD]|D?C{0,3})   # 100
          (X[CL]|L?X{0,3})   # 10
          (I[XV]|V?I{0,3})   # 1
          $/ix)
    {
        # Roman numeral found
        my %r2d = ('I' =>    1, 'IV' =>   4, 'V' =>   5, 'IX' =>   9,
                   'X' =>   10, 'XL' =>  40, 'L' =>  50, 'XC' =>  90,
                   'C' =>  100, 'CD' => 400, 'D' => 500, 'CM' => 900,
                   'M' => 1000);
        my $decimal = 0;
        while ($roman =~ m/[MDLV]|C[MD]?|X[CL]?|I[XV]?/ig) {
            $decimal += $r2d{uc($&)};
        }
        return $decimal;
    } else {
        # Not a Roman numeral
        return 0;
    }
}
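For readers working in another flavor, the same two-step approach can be sketched in Python. This is an illustrative port, not code from the recipe; the names `roman_to_decimal`, `STRICT`, `TOKEN`, and `VALUES` are chosen for this example. Like the Perl function, it returns 0 for input that is not a strict Roman numeral.

```python
import re

# Strict regex in free-spacing mode, mirroring the Perl /x layout.
STRICT = re.compile(r"""
    ^(?=[MDCLXVI])
    M*                  # 1000
    (C[MD]|D?C{0,3})    # 100
    (X[CL]|L?X{0,3})    # 10
    (I[XV]|V?I{0,3})    # 1
    $""", re.IGNORECASE | re.VERBOSE)

# One token is a single letter or a subtractive pair such as CM or IV.
TOKEN = re.compile(r"[MDLV]|C[MD]?|X[CL]?|I[XV]?", re.IGNORECASE)

VALUES = {"I": 1, "IV": 4, "V": 5, "IX": 9,
          "X": 10, "XL": 40, "L": 50, "XC": 90,
          "C": 100, "CD": 400, "D": 500, "CM": 900,
          "M": 1000}

def roman_to_decimal(roman):
    """Return the value of a strict Roman numeral, or 0 if roman is invalid."""
    if not STRICT.search(roman):
        return 0
    return sum(VALUES[t.group(0).upper()] for t in TOKEN.finditer(roman))

print(roman_to_decimal("MCMXCIX"))  # 1999
print(roman_to_decimal("IIII"))     # 0
```

Validating first keeps the token regex simple: it can assume the letters already appear in a legal order, so it only has to decide whether a C, X, or I stands alone or begins a subtractive pair.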
