You want to match INI parameter name-value pairs (e.g., Item1=Value1),
separating each match into two parts using capturing groups.
Backreference 1 should contain the parameter name (Item1), and
backreference 2 should contain the value (Value1).
Here’s the regular expression to get the job done:
^([^=;\r\n]+)=([^;\r\n]*)
Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Or with free-spacing mode turned on:
^               # Start of a line
( [^=;\r\n]+ )  # Capture the name to backreference 1
=               # Name-value delimiter
( [^;\r\n]* )   # Capture the value to backreference 2
Regex options: ^ and $ match at line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
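As a rough illustration, here is how both forms of the regex might be applied in Python, one of the listed flavors. The names INI_PAIR, INI_PAIR_VERBOSE, and sample are hypothetical and chosen only for this sketch; re.MULTILINE corresponds to the "^ and $ match at line breaks" option, and re.VERBOSE to free-spacing mode.

```python
import re

# Compact form; re.MULTILINE makes ^ match at the start of every line.
INI_PAIR = re.compile(r"^([^=;\r\n]+)=([^;\r\n]*)", re.MULTILINE)

# Free-spacing form; re.VERBOSE ignores the layout whitespace and # comments.
INI_PAIR_VERBOSE = re.compile(r"""
    ^               # Start of a line
    ( [^=;\r\n]+ )  # Capture the name to backreference 1
    =               # Name-value delimiter
    ( [^;\r\n]* )   # Capture the value to backreference 2
""", re.MULTILINE | re.VERBOSE)

# Hypothetical INI content used only for this sketch.
sample = "[Section1]\nItem1=Value1\nItem2=Value2\n"

pairs = INI_PAIR.findall(sample)
pairs_verbose = INI_PAIR_VERBOSE.findall(sample)

print(pairs)          # [('Item1', 'Value1'), ('Item2', 'Value2')]
print(pairs_verbose)  # [('Item1', 'Value1'), ('Item2', 'Value2')]
```

The section header line is skipped because it contains no equals sign, so the regex cannot complete a match on it.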
Like the other INI recipes in this chapter, we’re working with
pretty straightforward regex ingredients here. The pattern starts with
‹^›, to
match the position at the start of a line (make sure the “^ and $ match
at line breaks” option is enabled). This is important because without
the assurance that matches start at the beginning of a line, you could
match part of a commented-out line.
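The effect of the anchor can be seen in a small Python sketch. The sample text and variable names below are assumptions made for illustration. It compares the pattern without ‹^›, with ‹^› plus the line-break option, and with ‹^› but without that option.

```python
import re

# The name-value pattern without its leading anchor.
PATTERN = r"([^=;\r\n]+)=([^;\r\n]*)"

# Hypothetical input: the first line is commented out.
sample = "; Item1=Value1\nItem2=Value2\n"

# No anchor: the match starts right after the semicolon of the comment.
unanchored = re.findall(PATTERN, sample)

# Anchor plus re.MULTILINE: only lines that begin with a name can match.
anchored = re.findall("^" + PATTERN, sample, flags=re.MULTILINE)

# Anchor without re.MULTILINE: ^ matches only at the start of the string.
string_start_only = re.findall("^" + PATTERN, sample)

print(unanchored)         # [(' Item1', 'Value1'), ('Item2', 'Value2')]
print(anchored)           # [('Item2', 'Value2')]
print(string_start_only)  # []
```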
Next, the regex uses a capturing group that contains the negated
character class ‹[^=;\r\n]› followed by the ‹+›
one-or-more quantifier to match the name of the parameter and remember
it as backreference 1. The negated class matches any character except
the following four: equals sign, semicolon, carriage return (‹\r›),
and line feed (‹\n›). The carriage return and line
feed characters are both used to end an INI parameter, a semicolon marks
the start of a comment, and an equals sign separates a parameter’s name
and value.
After matching the parameter name, the regex matches a literal
equals sign (the name-value delimiter), and then the parameter value.
The value is matched using a second capturing group that is similar to
the pattern used to match the parameter name but has two fewer
restrictions. First, this second subpattern allows matching equals signs
as part of the value (i.e., there is one less negated character in the
character class). Second, it uses a ‹*› quantifier to remove the need to match at least
one character since parameter values may be empty.
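The two relaxed restrictions on the value can be checked with a short Python sketch. The sample parameters and the names used here are hypothetical, picked only to exercise an equals sign inside a value, an empty value, and a trailing comment.

```python
import re

INI_PAIR = re.compile(r"^([^=;\r\n]+)=([^;\r\n]*)", re.MULTILINE)

# Hypothetical parameters: a value containing "=", an empty value,
# and a value followed by a comment on the same line.
sample = (
    "Url=http://example.com/?a=1\n"
    "Empty=\n"
    "Name=Value ; trailing comment\n"
)

results = [(m.group(1), m.group(2)) for m in INI_PAIR.finditer(sample)]

for name, value in results:
    print(repr(name), repr(value))
# 'Url' 'http://example.com/?a=1'
# 'Empty' ''
# 'Name' 'Value '
```

Note that the last value keeps the space that precedes the semicolon, since the regex stops at the comment marker but does not trim whitespace. Code that consumes the captures may want to strip it.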
And we’re done.
Recipe 9.13 explains how to match INI section headers. Recipe 9.14 covers how to match INI section blocks.
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.2 explains how to match nonprinting characters. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.