Multiline Comments

Problem

You want to match a comment that starts with /* and ends with */. Nested comments are not permitted. Any /* between /* and */ is simply part of the comment. Comments can span across lines.

Solution

/\*.*?\*/
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
/\*[\s\S]*?\*/
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

The forward slash has no special meaning in regular expressions, but the asterisk does. We need to escape the asterisk with a backslash. This gives /\* and \*/ to match /* and */. Backslashes and/or forward slashes may get other special meanings when you add literal regular expressions to your source code, so you may need to escape the forward slashes as explained in Recipe 3.1.
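As a minimal sketch of the escaping described above, assuming Python as the host language: a raw string keeps the backslashes intact for the regex engine, and the forward slash needs no escaping because Python has no /.../ regex literal. The names OPEN and CLOSE are illustrative only.

```python
import re

# Raw strings pass the backslash through to the regex engine unchanged.
OPEN = r"/\*"    # matches the literal text /*
CLOSE = r"\*/"   # matches the literal text */

# Each pattern matches exactly its delimiter and nothing else.
open_ok = re.fullmatch(OPEN, "/*") is not None
close_ok = re.fullmatch(CLOSE, "*/") is not None

# re.escape builds a pattern that matches arbitrary literal text.
escaped_ok = re.fullmatch(re.escape("/*"), "/*") is not None

print(open_ok, close_ok, escaped_ok)
```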

We use .*? to match anything between the two delimiters of the comment. The option “dot matches line breaks” that most regex engines have allows this to span multiple lines. We need to use a lazy quantifier to make sure that the comment stops at the first */ after the /*, rather than at the last */ in the file.
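The effect of the lazy quantifier and of the "dot matches line breaks" option can be seen in a short Python sketch. The sample text and variable names are made up for illustration. In Python the option is spelled re.DOTALL.

```python
import re

source = "int a; /* first\ncomment */ int b; /* second */ int c;"

# Lazy quantifier: each match stops at the first */ after its /*.
lazy = re.compile(r"/\*.*?\*/", re.DOTALL)
# Greedy quantifier, shown only for contrast: runs to the last */.
greedy = re.compile(r"/\*.*\*/", re.DOTALL)
# Without DOTALL the dot cannot cross the line break in the first comment.
no_dotall = re.compile(r"/\*.*?\*/")

lazy_matches = lazy.findall(source)
greedy_matches = greedy.findall(source)
single_line_matches = no_dotall.findall(source)

print(lazy_matches)         # two separate comments
print(greedy_matches)       # one match spanning both comments
print(single_line_matches)  # only the comment that fits on one line
```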

JavaScript is the only regex flavor in this book that does not have an option to make the dot match line breaks. If you’re using JavaScript without the XRegExp library, you can use [\s\S] to accomplish the same. Although you could use [\s\S] with the other regex flavors too, we do not recommend it, as regex engines generally have optimized code to handle the dot, which is one of the most elementary features of regular expressions.
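To illustrate that the character class is a drop-in replacement for the dot, here is a small Python sketch comparing the two forms on the same made-up input. Python is used only for consistency with the other examples. The class form is the one you would carry over to JavaScript.

```python
import re

source = "x = 1; /* spans\ntwo lines */ y = 2; /* short */"

# Dot with the "dot matches line breaks" option.
dot_version = re.compile(r"/\*.*?\*/", re.DOTALL)
# [\s\S] matches any character, whitespace or not, with no option set.
class_version = re.compile(r"/\*[\s\S]*?\*/")

dot_matches = dot_version.findall(source)
class_matches = class_version.findall(source)

print(dot_matches == class_matches)
```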

Variations

If the regex will be used in a system that needs to deal with source code files while they’re being edited, you may want to make the closing delimiter optional. Then everything until the end of the file will be matched as a comment while it is being typed in, until the closing */ has been typed in. Syntax coloring in text editors, for example, usually works this way. Making the closing delimiter optional does not change how this regex works on files that only have properly closed multiline comments. The quantifier for the closing delimiter is greedy, so it will be matched if present. A lazy dot no longer works here, though: when everything after .*? is optional, nothing forces the lazy quantifier to expand, and the match would end right after /*. Instead, a negative lookahead checks before each character that it does not start a closing delimiter, so the greedy loop stops at the first */, or at the end of the file if there is none.

/\*(?:(?!\*/).)*(?:\*/)?
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
/\*(?:(?!\*/)[\s\S])*(?:\*/)?
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
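The following Python sketch shows one way to make the closing delimiter optional. It assumes a negative lookahead is used to stop the dot before the first */, because a lazy .*? followed only by an optional group stops right after /*. The last line demonstrates that pitfall. Sample strings and variable names are hypothetical.

```python
import re

# Consume any character that does not start a closing */,
# then take the closing delimiter if it is there.
open_comment = re.compile(r"/\*(?:(?!\*/).)*(?:\*/)?", re.DOTALL)

closed = "a = 1; /* done */ b = 2;"
unclosed = "a = 1; /* still typing\nmore text"

closed_match = open_comment.search(closed).group()
unclosed_match = open_comment.search(unclosed).group()

# For contrast: a lazy dot followed only by an optional group never expands.
lazy_match = re.search(r"/\*.*?(?:\*/)?", closed, re.DOTALL).group()

print(repr(closed_match))    # the complete comment
print(repr(unclosed_match))  # everything up to the end of the text
print(repr(lazy_match))      # only the opening delimiter
```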

See Also

Recipe 2.4 explains the dot, including the option to make it match line breaks, and the workaround for JavaScript. Recipe 2.13 explains the difference between greedy and lazy quantifiers.
