Traditional grep tools apply your regular expression to one line of text at a time, and display the lines matched (or not matched) by the regular expression. You have an array of strings, or a multiline string, that you want to process in this way.
C#
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    string[] lines = Regex.Split(subjectString, "\r?\n");
Then, iterate over the lines array:
    Regex regexObj = new Regex("regex pattern");
    for (int i = 0; i < lines.Length; i++) {
        if (regexObj.IsMatch(lines[i])) {
            // The regex matches lines[i]
        } else {
            // The regex does not match lines[i]
        }
    }
VB.NET
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    Dim Lines = Regex.Split(SubjectString, "\r?\n")
Then, iterate over the Lines array:
    Dim RegexObj As New Regex("regex pattern")
    For i As Integer = 0 To Lines.Length - 1
        If RegexObj.IsMatch(Lines(i)) Then
            'The regex matches Lines(i)
        Else
            'The regex does not match Lines(i)
        End If
    Next
Java
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    String[] lines = subjectString.split("\r?\n");
Then, iterate over the lines array:
    Pattern regex = Pattern.compile("regex pattern");
    Matcher regexMatcher = regex.matcher("");
    for (int i = 0; i < lines.length; i++) {
        regexMatcher.reset(lines[i]);
        if (regexMatcher.find()) {
            // The regex matches lines[i]
        } else {
            // The regex does not match lines[i]
        }
    }
JavaScript
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    var lines = subject.split(/\r?\n/);
Then, iterate over the lines array:
    var regexp = /regex pattern/;
    for (var i = 0; i < lines.length; i++) {
        if (lines[i].match(regexp)) {
            // The regex matches lines[i]
        } else {
            // The regex does not match lines[i]
        }
    }
PHP
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    $lines = preg_split('/\r?\n/', $subject);
Then, iterate over the $lines array:
    foreach ($lines as $line) {
        if (preg_match('/regex pattern/', $line)) {
            // The regex matches $line
        } else {
            // The regex does not match $line
        }
    }
Perl
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    @lines = split(m/\r?\n/, $subject);
Then, iterate over the @lines array:
    foreach $line (@lines) {
        if ($line =~ m/regex pattern/) {
            # The regex matches $line
        } else {
            # The regex does not match $line
        }
    }
Python
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    lines = re.split("\r?\n", subject)
Then, iterate over the lines array:
    reobj = re.compile("regex pattern")
    for line in lines[:]:
        if reobj.search(line):
            # The regex matches line
            pass
        else:
            # The regex does not match line
            pass
Ruby
If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:
    lines = subject.split(/\r?\n/)
Then, iterate over the lines array:
    re = /regex pattern/
    lines.each { |line|
        if line =~ re
            # The regex matches line
        else
            # The regex does not match line
        end
    }
When working with line-based data, you can save yourself a lot of trouble if you split the data into an array of lines, instead of trying to work with one long string with embedded line breaks. Then, you can apply your actual regex to each string in the array, without worrying about matching more than one line. This approach also makes it easy to keep track of the relationship between lines. For example, you could easily iterate over the array using one regex to find a header line and then another to find the footer line. With the delimiting lines found, you can then use a third regex to find the data lines you’re interested in. Though this may seem like a lot of work, it’s all very straightforward, and will yield code that performs well. Trying to craft a single regex to find the header, data, and footer all at once will be a lot more complicated, and will result in a much slower regex.
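The header, data, and footer approach described above could be sketched in Python roughly as follows. The three patterns, the sample text, and the function name `extract_data` are hypothetical and chosen only for illustration; your own header, footer, and data formats will differ.

```python
import re

# Hypothetical patterns, chosen only for illustration.
HEADER = re.compile(r"^BEGIN\b")       # line that opens the block
FOOTER = re.compile(r"^END\b")         # line that closes the block
DATA = re.compile(r"^(\w+)=(\d+)$")    # data lines of the form key=number


def extract_data(subject):
    """Return (key, value) pairs found between the header and footer lines."""
    lines = re.split(r"\r?\n", subject)
    pairs = []
    inside = False
    for line in lines:
        if not inside:
            # Outside the block: only look for the header line.
            if HEADER.search(line):
                inside = True
        elif FOOTER.search(line):
            # The footer line ends the block.
            inside = False
        else:
            # Inside the block: keep only the lines that look like data.
            match = DATA.search(line)
            if match:
                pairs.append((match.group(1), int(match.group(2))))
    return pairs


sample = ("junk=1\nBEGIN report\nalpha=10\nnot a data line\n"
          "beta=20\nEND report\ngamma=30")
print(extract_data(sample))  # [('alpha', 10), ('beta', 20)]
```

Each of the three regexes stays short because it only ever sees a single line, and the `inside` flag carries the relationship between lines that a single large regex would otherwise have to encode.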
Processing a string line by line also makes it easy to negate a regular expression. Regular expressions don’t provide an easy way of saying “match a line that does not contain this or that word.” Only character classes can be easily negated. But if you’ve already split your string into lines, finding the lines that don’t contain a word becomes as easy as doing a literal text search in all the lines, and removing the ones in which the word can be found.
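As a minimal sketch of this kind of negation in Python, the snippet below keeps only the lines that do not contain a given word. The function name `lines_without` and the sample text are made up for the example, and the check is a plain substring search, so it would also reject lines in which the word appears as part of a longer word.

```python
import re


def lines_without(subject, word):
    """Return the lines of subject that do not contain word as literal text."""
    lines = re.split(r"\r?\n", subject)
    # A literal substring test is enough here; no negated regex is needed.
    return [line for line in lines if word not in line]


text = "error: disk full\nall good\r\nstartup complete\nfatal error"
print(lines_without(text, "error"))  # ['all good', 'startup complete']
```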
Recipe 3.19 shows how you can easily split a string into an array. The regular expression ‹\r\n› matches a pair of CR and LF characters, which delimit lines on the Microsoft Windows platforms. ‹\n› matches an LF character, which delimits lines on Unix and its derivatives, such as Linux and even OS X. Since these two regular expressions are essentially plain text, you don’t even need to use a regular expression. If your programming language can split strings using literal text, by all means split the string that way.
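In Python, for instance, the built-in `str.split` method splits on literal text, so no regex is involved when the line break style is known. This is only a small sketch; the helper name `split_literal` and the sample strings are invented for the example.

```python
def split_literal(text, line_break):
    """Split text on a known, literal line break sequence."""
    return text.split(line_break)


windows_text = "first\r\nsecond\r\nthird"
unix_text = "first\nsecond\nthird"

print(split_literal(windows_text, "\r\n"))  # ['first', 'second', 'third']
print(split_literal(unix_text, "\n"))       # ['first', 'second', 'third']
```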
If you’re not sure which line break style your data uses, you could split it using the regular expression ‹\r?\n›. By making the CR optional, this regex matches either a CRLF Windows line break or an LF Unix line break.
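To illustrate the difference, here is a short Python sketch with a made-up string that mixes both line break styles. Splitting on the regex `\r?\n` removes both kinds of line break, while splitting on a literal LF leaves a stray CR at the end of the Windows-style line.

```python
import re

# Hypothetical sample that mixes a CRLF and an LF line break.
mixed = "windows line\r\nunix line\nlast line"

regex_lines = re.split(r"\r?\n", mixed)
literal_lines = mixed.split("\n")

print(regex_lines)    # ['windows line', 'unix line', 'last line']
print(literal_lines)  # ['windows line\r', 'unix line', 'last line']
```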
Once you have your strings in the array, you can easily loop over it. Inside the loop, follow Recipe 3.5 to check which lines match, and which don’t.
This recipe uses techniques introduced by two earlier recipes. Recipe 3.11 shows code to iterate over all the matches a regex can find in a string. Recipe 3.19 shows code to split a string into an array or list using a regular expression.