Here Documents

Problem

You need a regex that matches here documents in source files for a scripting language in which a here document can be started with << followed by a word. The word may have single or double quotes around it. The here document ends when that word appears at the very start of a line, without any quotes, using the same case.

Solution

<<(["']?)([A-Za-z]+)\b\1.*?^\2\b

Regex options: Dot matches line breaks, ^ and $ match at line breaks

Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<<(["']?)([A-Za-z]+)\b\1[\s\S]*?^\2\b

Regex options: ^ and $ match at line breaks

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
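
As a rough illustration, the first regex can be tried in Python, one of the listed flavors. The sample source text and the variable names are invented for this sketch; `re.DOTALL` and `re.MULTILINE` are Python's names for the two regex options.

```python
import re

# First solution regex. DOTALL = "dot matches line breaks",
# MULTILINE = "^ and $ match at line breaks".
HEREDOC = re.compile(
    r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b""",
    re.DOTALL | re.MULTILINE,
)

# Hypothetical script source containing two here documents.
source = (
    'print <<"EOT";\n'
    "Hello\n"
    "World\n"
    "EOT\n"
    "print <<END;\n"
    "more text\n"
    "END\n"
)

# Full text of each here document, and the terminating word of each.
heredocs = [m.group(0) for m in HEREDOC.finditer(source)]
words = [m.group(2) for m in HEREDOC.finditer(source)]

for text in heredocs:
    print(repr(text))
```

Each match runs from `<<` through the terminating word, and group 2 holds that word.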

Discussion

This regex may look a bit cryptic, but it is very straightforward. ‹<<› simply matches <<. ‹(["']?)› then matches an optional single or double quote. The parentheses form a capturing group to store the quote, or the lack thereof. It is important that the quantifier ‹?› is inside the group rather than outside of it, so that the group always participates in the match. If we made the group itself optional, the group would not participate in the match when no quote can be matched, and a backreference to that group would fail to match.
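
The effect of placing the quantifier inside the group can be checked with a small sketch in Python, where a backreference to a group that did not participate fails to match. The sample strings and variable names are assumptions made for this example.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE

# Quantifier inside the group, as in the solution: group 1 always participates.
inside = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b""", FLAGS)
# Quantifier outside the group: group 1 is skipped when there is no quote.
outside = re.compile(r"""<<(["'])?([A-Za-z]+)\b\1.*?^\2\b""", FLAGS)

unquoted = "<<END\nsome text\nEND\n"
quoted = "<<'END'\nsome text\nEND\n"

inside_unquoted = inside.search(unquoted)    # matches; group 1 is ''
outside_unquoted = outside.search(unquoted)  # no match: the backreference fails
outside_quoted = outside.search(quoted)      # matches; group 1 is "'"
```

The variant with the optional group only finds here documents whose word is quoted.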

The capturing group with character class ‹([A-Za-z]+)› matches a word and stores it into the second backreference. The word boundary ‹\b› makes sure we match the entire word after ‹<<›. If we were to omit the word boundary, the regex engine would backtrack. It would try to match the word partially if the backreference ‹\2› cannot be matched. We do not need a word boundary before the word, because ‹<<(["']?)› already makes sure there is a nonword character before the word.
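
A minimal Python sketch of the partial-word backtracking described above. It assumes an unterminated here document whose word is ENDING, while only END appears at the start of a line. The sample text is hypothetical.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE

# Solution regex, with the word boundary after the captured word.
with_boundary = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b""", FLAGS)
# Same regex with that first word boundary removed.
no_boundary = re.compile(r"""<<(["']?)([A-Za-z]+)\1.*?^\2\b""", FLAGS)

# The here document starts with ENDING but is never terminated by ENDING.
source = "<<ENDING\nline one\nEND\n"

good = with_boundary.search(source)  # None: no valid terminator exists
bad = no_boundary.search(source)     # group 2 shrinks to 'END' and matches

print(good)
print(bad.group(2))
```

Without the boundary, the engine gives back letters from the captured word until a line starting with the shortened word is found, which produces a false match.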

‹\1› is a backreference to the first capturing group. This group will hold the quote if we matched one; otherwise, the group holds the empty string. Thus ‹\1› matches the same quote matched by the capturing group. ‹\1› has no effect if the capturing group holds the empty string.

‹.*?› matches any amount of text. We turned on the option “dot matches line breaks” to allow it to span multiple lines. JavaScript does not have that option, and so for JavaScript we use ‹[\s\S]*?› to match the text. Either way, the question mark makes the asterisk lazy, telling it to match as few characters as possible. The here document should end at the first occurrence of the terminating word rather than the last occurrence. The file may have multiple here documents using the same terminating word, and the lazy quantifier makes sure we match each here document separately.
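
The difference between the lazy and the greedy quantifier can be sketched in Python. The sample text, with two here documents sharing the terminating word EOT, is made up for this comparison.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE

# Solution regex with the lazy quantifier.
lazy = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b""", FLAGS)
# Variant with a greedy quantifier, for comparison only.
greedy = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*^\2\b""", FLAGS)

source = "a = <<EOT\nfirst\nEOT\nb = <<EOT\nsecond\nEOT\n"

lazy_matches = [m.group(0) for m in lazy.finditer(source)]
greedy_matches = [m.group(0) for m in greedy.finditer(source)]

print(len(lazy_matches))    # the two here documents are matched separately
print(len(greedy_matches))  # one match running to the last EOT
```

The greedy variant swallows everything between the first `<<EOT` and the final `EOT` as a single match.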

‹^› matches at the start of any line because we turned on the option to make the caret and dollar match at line breaks. Ruby does not have this option. Because the caret and dollar always match at line breaks in Ruby, this does not change our solution. There is just one less option to set.
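
In flavors where the option must be set explicitly, leaving it off means the caret matches only at the very start of the subject text, so the terminating word is never found. A short Python sketch, with an invented sample string:

```python
import re

PATTERN = r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b"""
source = "x = <<EOT\nbody\nEOT\n"

# Both options on: the caret matches at the start of the line holding EOT.
with_multiline = re.search(PATTERN, source, re.DOTALL | re.MULTILINE)
# MULTILINE off: the caret can only match at the start of the string.
without_multiline = re.search(PATTERN, source, re.DOTALL)

print(with_multiline is not None, without_multiline is None)
```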

‹\2› is a backreference to the second capturing group. This group holds the word we matched at the start of the here document. Because the here document syntax of our scripting language is case sensitive, our regex needs to be case sensitive too. That’s why we used ‹[A-Za-z]+› to match the word rather than using ‹[a-z]+› or ‹[A-Z]+› and turning on case insensitivity. Backreferences also become case insensitive when the case insensitivity option is turned on.
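
The following Python sketch illustrates why case insensitivity is avoided. It compares the solution with a hypothetical variant that uses ‹[a-z]+› together with `re.IGNORECASE`; the sample text is invented for the comparison.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE

# Solution regex: case sensitive, so the backreference must match EOT exactly.
sensitive = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b""", FLAGS)
# Variant relying on case insensitivity; the backreference now also matches eot.
insensitive = re.compile(
    r"""<<(["']?)([a-z]+)\b\1.*?^\2\b""", FLAGS | re.IGNORECASE
)

# The line "eot" is ordinary content; only "EOT" should end the here document.
source = "<<EOT\ntext\neot\nmore\nEOT\n"

correct = sensitive.search(source).group(0)
too_short = insensitive.search(source).group(0)

print(repr(correct))
print(repr(too_short))
```

The case-insensitive variant stops at the lowercase line, cutting the here document short.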

Finally, another word boundary ‹\b› makes sure that the regex stops only if ‹\2› matched the word on its own, rather than as part of a longer word. We do not need a word boundary before ‹\2›, as the caret has already made sure the word is at the start of the line. Whenever ‹\2› or the final ‹\b› fail to match, the regex engine will backtrack and let ‹.*?› match more characters.
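
A small Python sketch of what the final word boundary prevents, assuming a here document whose body contains a line starting with a longer word, EOTX. The sample text and variable names are hypothetical.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE

# Solution regex, ending in a word boundary.
with_final = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2\b""", FLAGS)
# Variant without the final word boundary, for comparison only.
without_final = re.compile(r"""<<(["']?)([A-Za-z]+)\b\1.*?^\2""", FLAGS)

source = "<<EOT\nEOTX is not the end\nEOT\n"

full = with_final.search(source).group(0)
truncated = without_final.search(source).group(0)

print(repr(full))       # ends at the standalone EOT line
print(repr(truncated))  # stops inside EOTX
```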
