You need a regex that matches a string, which is a sequence of zero or more characters enclosed by double quotes. A string with nothing between the quotes is an empty string. A double quote can be included in the string by escaping it with a backslash, and backslashes can also be used to escape other characters in the string. Strings cannot include line breaks, and line breaks cannot be escaped with backslashes.
"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
This regular expression has the same structure as the one in the
preceding recipe. The difference is that we now have two characters with
a special meaning: the double quote and the backslash. We exclude both
from the characters matched by the two negated character classes. We use
‹\\.› to separately match
any escaped character. ‹\\› matches a single backslash, and ‹.› matches
any character that is not a line break. Make sure the option “dot
matches line breaks” is turned off.
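As a minimal sketch of how this regex behaves, the following Python snippet applies it to a line containing several string literals. The names `STRING_RE`, `find_strings`, and `sample` are illustrative and not part of the recipe.

```python
import re

# Double-quoted string with backslash escapes. Line breaks are excluded
# from the negated classes, and the dot does not match \n by default.
STRING_RE = re.compile(r'"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"')

def find_strings(text):
    """Return every double-quoted string literal found in text."""
    return STRING_RE.findall(text)

# Raw string so that the backslashes reach the regex engine unchanged.
sample = r'say "hi", then "a \"quoted\" word" and "C:\\dir\\"'

for literal in find_strings(sample):
    print(literal)
```

Each escaped quote stays inside its string because ‹\\.› consumes the backslash together with the character that follows it, so the closing quote is only matched when it is not escaped.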
Strings delimited with single quotes can be matched just as easily:
'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
If your language supports both single-quoted and double-quoted strings, you’ll need to handle those as separate alternatives:
"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"|'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
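One way to keep the two alternatives in sync is to generate them from a single template. The sketch below does this in Python; the helper `quoted` and the names `EITHER_QUOTE_RE` and `sample` are hypothetical and only serve the illustration.

```python
import re

def quoted(q):
    """Build the escaped-string regex for the quote character q."""
    # Raw f-string: {q} is substituted, backslashes are kept literally.
    return rf"{q}[^{q}\\\r\n]*(?:\\.[^{q}\\\r\n]*)*{q}"

# Double-quoted and single-quoted strings as separate alternatives.
EITHER_QUOTE_RE = re.compile(quoted('"') + "|" + quoted("'"))

sample = r"""a = "it's"; b = 'say "hi"'; c = 'don\'t'"""
matches = EITHER_QUOTE_RE.findall(sample)

for literal in matches:
    print(literal)
```

Because each alternative only treats its own delimiter as special, a single quote inside a double-quoted string (and vice versa) is matched as an ordinary character.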
If strings can include line breaks escaped with a backslash, we
can modify our original regular expression to allow a line break to be
matched after the backslash. We use ‹(?:.|\r?\n)› rather than just the dot with the
“dot matches line breaks” option to make sure that Windows-style line
breaks are matched correctly. The dot would match only the CR in a CR LF
line break, and the regex would then fail to match the LF. ‹\r?\n›
handles both Windows-style and Unix-style line breaks.
"[^"\\\r\n]*(?:\\(?:.|\r?\n)[^"\\\r\n]*)*"
Regex options: None (make sure “dot matches line breaks” is off)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
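The following Python sketch checks this variant against Unix-style and Windows-style escaped line breaks, and against a line break that is not escaped. The variable names are illustrative assumptions.

```python
import re

# A backslash may be followed by any non-line-break character
# or by a complete line break (LF or CR LF).
CONTINUED_RE = re.compile(r'"[^"\\\r\n]*(?:\\(?:.|\r?\n)[^"\\\r\n]*)*"')

unix = '"first \\\nsecond"'        # backslash followed by LF
windows = '"first \\\r\nsecond"'   # backslash followed by CR LF
bare = '"first \nsecond"'          # line break without a backslash

for label, text in [("unix", unix), ("windows", windows), ("bare", bare)]:
    matched = CONTINUED_RE.fullmatch(text) is not None
    print(label, matched)
```

Only the two escaped variants match as a whole; the unescaped line break still ends the attempt because CR and LF remain excluded from the negated character classes.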
If strings can include line breaks even when they are not escaped, remove the line break characters from the negated character classes. Also make sure to allow the dot to match line breaks.
"[^"\\]*(?:\\.[^"\\]*)*"
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
We need a separate solution for JavaScript without XRegExp, because it does not have an option to make the dot match line breaks.
"[^"\\]*(?:\\[\s\S][^"\\]*)*"
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
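To illustrate that the two multiline variants are interchangeable, this Python sketch runs both on the same input: one relies on the `re.DOTALL` flag (Python's “dot matches line breaks” option), the other on ‹[\s\S]› with no flags. The names `DOTALL_RE`, `PORTABLE_RE`, and `text` are assumptions made for the example.

```python
import re

# Variant 1: the dot matches line breaks thanks to re.DOTALL.
DOTALL_RE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.DOTALL)

# Variant 2: [\s\S] matches any character, so no flag is needed.
PORTABLE_RE = re.compile(r'"[^"\\]*(?:\\[\s\S][^"\\]*)*"')

# One string spanning three lines, with an unescaped line break,
# an escaped line break, and two escaped quotes.
text = 'msg = "line one\nline two \\\n\\"end\\""'

dotall_matches = DOTALL_RE.findall(text)
portable_matches = PORTABLE_RE.findall(text)
print(dotall_matches == portable_matches)
```

Both patterns find the same single string here, starting at the first quote and running to the end of `text`.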
The recipe “Strings” explains the basic structure of the regular expression in this recipe’s solution. Recipe 2.4 explains the dot, including the option to make it match line breaks, and the workaround for JavaScript.