9.15. Match INI Name-Value Pairs

Problem

You want to match INI parameter name-value pairs (e.g., Item1=Value1), separating each match into two parts using capturing groups. Backreference 1 should contain the parameter name (Item1), and backreference 2 should contain the value (Value1).

Solution

Here’s the regular expression to get the job done:

^([^=;\r\n]+)=([^;\r\n]*)
Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Or with free-spacing mode turned on:

^                 # Start of a line
( [^=;\r\n]+ )    # Capture the name to backreference 1
=                 # Name-value delimiter
( [^;\r\n]* )     # Capture the value to backreference 2
Regex options: ^ and $ match at line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
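
As a rough illustration of how the free-spacing version might be applied, here is a minimal Python sketch. The sample INI text and the names PAIR, SAMPLE, and pairs are assumptions made up for this example; re.MULTILINE and re.VERBOSE are Python's names for the "^ and $ match at line breaks" and free-spacing options.

```python
import re

# The free-spacing regex from this recipe, compiled with Python's
# equivalents of the two regex options it requires.
PAIR = re.compile(
    r"""
    ^                 # Start of a line
    ( [^=;\r\n]+ )    # Capture the name to group 1
    =                 # Name-value delimiter
    ( [^;\r\n]* )     # Capture the value to group 2
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical INI content, invented for illustration only.
SAMPLE = (
    "[Section1]\n"
    "Item1=Value1\n"
    "; Comment=ignored\n"
    "Item2=a=b ; trailing comment\n"
    "Item3=\n"
)

# Collect (name, value) tuples from capturing groups 1 and 2.
pairs = [(m.group(1), m.group(2)) for m in PAIR.finditer(SAMPLE)]

for name, value in pairs:
    print(repr(name), repr(value))
```

Note that the value for Item2 keeps the space before the trailing comment, because the regex stops only at a semicolon or a line break. Trimming whitespace, if you want it, would be a separate step.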

Discussion

Like the other INI recipes in this chapter, we’re working with pretty straightforward regex ingredients here. The pattern starts with ^, to match the position at the start of a line (make sure the “^ and $ match at line breaks” option is enabled). This is important because without the assurance that matches start at the beginning of a line, you could match part of a commented-out line.
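
To see why the anchor and the line-break option matter, the following Python sketch compares three variants of the pattern on the same input. The sample text and variable names are assumptions chosen for this illustration, not part of the recipe.

```python
import re

# Hypothetical input: the second line is commented out.
text = "Item1=Value1\n;Item2=Value2\nItem3=Value3"

# The recipe's regex with "^ and $ match at line breaks" enabled.
anchored = re.compile(r"^([^=;\r\n]+)=([^;\r\n]*)", re.MULTILINE)

# The same regex without the leading ^ anchor.
unanchored = re.compile(r"([^=;\r\n]+)=([^;\r\n]*)")

# The anchor kept, but the line-break option left off.
no_flag = re.compile(r"^([^=;\r\n]+)=([^;\r\n]*)")

anchored_names = [m.group(1) for m in anchored.finditer(text)]
unanchored_names = [m.group(1) for m in unanchored.finditer(text)]
no_flag_names = [m.group(1) for m in no_flag.finditer(text)]

print(anchored_names)    # commented-out Item2 is skipped
print(unanchored_names)  # Item2 is picked up from inside the comment
print(no_flag_names)     # only the first line of the text can match
```

Without the anchor, the match simply starts after the semicolon and captures the commented-out parameter. With the anchor but without the option, ^ matches only at the very start of the subject text, so every parameter after the first line is missed.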

Next, the regex uses a capturing group that contains the negated character class [^=;\r\n] followed by the + one-or-more quantifier to match the name of the parameter and remember it as backreference 1. The negated class matches any character except the following four: equals sign, semicolon, carriage return (\r), and line feed (\n). The carriage return and line feed characters are both used to end an INI parameter, a semicolon marks the start of a comment, and an equals sign separates a parameter’s name and value.

After matching the parameter name, the regex matches a literal equals sign (the name-value delimiter), and then the parameter value. The value is matched using a second capturing group that is similar to the pattern used to match the parameter name but has two fewer restrictions. First, this second subpattern allows equals signs as part of the value (i.e., the negated character class excludes one fewer character). Second, it uses a * quantifier instead of +, removing the need to match at least one character, since parameter values may be empty.
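
The two relaxed restrictions on the value can be checked with a short Python sketch. The parameter names and values below are hypothetical, and collecting the matches into a dictionary is just one convenient way to inspect them.

```python
import re

PAIR = re.compile(r"^([^=;\r\n]+)=([^;\r\n]*)", re.MULTILINE)

# Hypothetical parameters, invented to exercise the value subpattern.
text = (
    "Path=C:\\Temp\n"
    "Query=a=1&b=2\n"         # equals signs inside the value
    "Empty=\n"                # empty value, allowed by the * quantifier
    "Note=kept ;dropped\n"    # value ends at the semicolon
    "=NoName\n"               # no name, so the + quantifier rejects it
)

# Map each parameter name (group 1) to its value (group 2).
settings = {m.group(1): m.group(2) for m in PAIR.finditer(text)}

for name, value in settings.items():
    print(name, "->", repr(value))
```

The line that starts with an equals sign produces no match at all, because the name subpattern requires at least one character.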

And we’re done.

See Also

Recipe 9.13 explains how to match INI section headers. Recipe 9.14 covers how to match INI section blocks.

Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.2 explains how to match nonprinting characters. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.
