GREP

Grep is perhaps the most powerful tool for searching text. Since everyone asks, g/re/p (“globally find regular expression and print”) was an early Unix command that found a text pattern (regular expression) and printed its occurrences. It has a long and colorful history.

Literal strings are easy to look for, but more abstract patterns require some thought and some code. InDesign comes with much of that code built into the special character menu in the GREP section of Find/Change, and there are many online resources to help us build useful queries. There are wildcards that represent letters, digits, upper- and lowercase characters, and even locations in text (the end or beginning of paragraphs, for example).

Code for Good

Let’s revisit an example from earlier. We need to find every paragraph that opens a chapter and apply a chapter header paragraph style to them. Earlier, we benefited from a writer who put something unique at the beginning of every such paragraph. But what if they didn’t?

How would such a paragraph be structured? It would start with the word “Chapter” (capitalized), followed by a space, then either a number or a word. Odds are slim that any other paragraph opens with that word, so we can leverage that position to find those header paragraphs and apply the correct style.

So in the GREP section of Find/Change, we type “Chapter” in the Find what field. Case is important in grep, so we’re careful to capitalize it. In front of that word, we need to insert a special character. From the Special characters menu, we choose Locations > Beginning of Paragraph. This inserts a caret (^), which in grep represents the start of a paragraph. A dollar sign ($) means “end of paragraph,” if you’re curious. In Change Format, we choose our Chapter Head paragraph style. When we click Change All, every paragraph that contains our pattern is formatted. The text in the example above even contains an occurrence of the word “Chapter,” which is capitalized, but it isn’t affected because it’s not the opening word of that paragraph.
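For readers who want to try the same idea outside InDesign, here is a minimal sketch in Python. It assumes each paragraph is one item in a list, and the sample text is invented for illustration. Python's `re` module is a stand-in for InDesign's grep here, not the same engine.

```python
import re

# Hypothetical story text: each list item stands in for one paragraph.
paragraphs = [
    "Chapter 1",
    "It was a quiet morning. Chapter headings were the last thing on her mind.",
    "Chapter Two",
    "The next Chapter of her life began at noon.",
]

# The caret anchors the match to the very start of the text being searched,
# which plays the role of InDesign's Beginning of Paragraph location.
chapter_head = re.compile(r"^Chapter")

# Keep only the paragraphs that open with the capitalized word.
heads = [p for p in paragraphs if chapter_head.search(p)]
print(heads)
```

Only the first and third paragraphs are picked up. The other two contain the capitalized word, but not in the opening position.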

Building a Query

My first experience with grep was with a list of 800 names, the members of the Seattle InDesign User Group that I ran at the time. Unfortunately, the list was not made surname first. It was more like the list shown here, though I think none of the folks in this list were members.

Let’s consider a simple case of the data that confronted me: Givenname Surname. Almost all were like that. What I needed was Surname, Givenname. That is, I had a chunk of characters, a space, then another chunk of characters; but I needed the second chunk of characters to come first, followed by a comma and a space, then the first chunk of characters.

The Find what query could be built like this: choosing the GREP Special characters menu > Wildcards > Any Character enters a period (.). That’s right, a simple dot in grep means “any character.” (I suggest acceptance is the right attitude here, rather than comprehension.) But I wanted a chunk of characters, so I chose Special characters menu > Repeat > One or More Times, which inserted a plus sign. Thus, “.+” means “one or more characters.” That could be the given name, but since “any character” can include spaces and much else, it could also be the entire name! So I needed to be more specific.

So I added a space and another “.+” to mean two chunks of characters separated by a space. My Find what now read like this: .+ .+

Now, I often use code even for spaces since they’re hard to see in that small field. The code for “horizontal space” (which includes various size spaces and tabs) is \h. So my query could have been .+\h.+ but I didn’t know that back then.

In the Change to field, I needed a way to refer to the chunks on either side of that space so I could transpose them. It took some time to figure out that meant I needed a “Marking Subexpression,” which in practical terms means surrounding each chunk with parentheses. This “marks” them so they can be referenced in the Change to field. So my final Find what read (.+) (.+).

To refer to the second chunk (the surname) we use $2, and the first is $1. Since we want a comma and space between them, that made my Change to $2, $1. How did I get lucky with the middle names? Grep used the longest match for the first chunk, then a space, then the surname.
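The transposition can be sketched in Python as well. The names below are invented, and note one difference in notation: Python's replacement string refers to the marked chunks as `\1` and `\2` where InDesign's Change to field uses $1 and $2.

```python
import re

# Hypothetical member names, including some with middle names.
names = ["Ada Lovelace", "Mary Ann Evans", "Grace Brewster Hopper"]

# Two marked chunks of "one or more characters" separated by a space.
swap = re.compile(r"(.+) (.+)")

# \2 is the second chunk (surname), \1 the first (given names).
swapped = [swap.sub(r"\2, \1", name) for name in names]
print(swapped)
```

Because the first `.+` is greedy, it grabs as much as it can and gives back only enough for the rest of the pattern to match. That leaves the last space as the dividing point, so middle names stay with the given name.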

More Grep Queries

If you look at the list in the Query menu at the top of Find/Change, many of the options use grep. Consider the one for Phone Number Conversion (dot format).

Find what: \(?(\d\d\d)\)?[-. ]?(\d\d\d)[-. ]?(\d\d\d\d)

Change to: $1.$2.$3

This takes a North American phone number in any common configuration and returns one like this: 206.555.5555. In the Change to field, the only special characters (beyond the metacharacters of a plain Text search) are the Founds ($1 = Found 1, etc.). So the periods there do not mean “any character,” as they do in the Find what field.

When configuring a query in grep, sometimes we need to specify a literal character that’s used for a special grep purpose, like a period or parenthesis. To search for a literal period, we use \. (a backslash followed by the period). It’s said that the backslash “escapes” the period, freeing it, I suppose, to be a simple period again. So in the phone number search above, we are looking for the possible use of literal parentheses with \(? and \)?, where the question mark means zero or one of them (or as I prefer to phrase it, “maybe it’s there, maybe it ain’t”). We’re also using unescaped parentheses to gather the digits into groups we can refer to in the Change to field. Each \d means “any digit,” and we can scan across that query and pick out three, then three, then four of them, as in a phone number. Another way to look for exactly three digits is \d{3}, which could make that whole query a little shorter.

The square brackets contain literal characters between which we’re to read “or.” So between each chunk of digits, there could be a hyphen, a dot, or a space. Or there could be none of those things, so the question mark is added after the bracket to mean zero or one of those.

It’s a wonderfully thought-out query that we didn’t have to come up with! If you don’t like the dot format, just change the Change to. Want it more old school? ($1) $2-$3 would give (206) 555-5555.
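As a rough illustration, the same conversion can be tried in Python. The sample numbers are invented, the query uses the shorter `\d{3}` form, and Python writes the Founds as `\1`, `\2`, `\3` rather than $1, $2, $3.

```python
import re

# Hypothetical phone numbers in several common configurations.
samples = ["(206) 555-5555", "206-555-5555", "206 555 5555", "2065555555"]

# Escaped parentheses are literal and optional; unescaped ones mark the groups.
phone = re.compile(r"\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})")

# Dot format: the periods in the replacement are plain periods.
dotted = [phone.sub(r"\1.\2.\3", s) for s in samples]

# Old-school format: the parentheses in the replacement are plain parentheses.
old_school = [phone.sub(r"(\1) \2-\3", s) for s in samples]

print(dotted)
print(old_school)
```

All four samples come out identically in each format, which is the point of making every separator optional.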

Another way to search for one of several alternatives is to use the pipe character (|). Since I live fairly close to Canada, I see the two spellings for center: center or centre. To search for both, I can use cent(er|re). In a case where a letter may or may not be in a word, the ? comes in handy: harbou?r finds “harbor” and “harbour.” I can replace either with my preferred version. Might it be capitalized? Then look for [Hh]arbou?r. If we’re looking for a capitalized word, we can specify an uppercase letter (\u) that’s followed by one or more lowercase letters (\l+): \u\l+ will match “Photoshop” as a whole word but not “InDesign” (there it finds only the pieces “In” and “Design”).
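Here is a small Python sketch of the alternation and optional-letter queries, using an invented sentence. Python's `re` module has no uppercase or lowercase wildcard of the kind InDesign offers, so the capitalized-word query is left out of this sketch.

```python
import re

# Hypothetical sample sentence with both spellings of each word.
text = "The center of town, the city centre, Harbour views, and harbor tours."

# cent(er|re): the pipe means "or" inside the group.
# group(0) is the whole match, not just the captured ending.
centers = [m.group(0) for m in re.finditer(r"cent(er|re)", text)]

# [Hh] allows either case for the first letter; u? makes the "u" optional.
harbors = re.findall(r"[Hh]arbou?r", text)

# Replace either spelling with a preferred version.
normalized = re.sub(r"cent(er|re)", "center", text)

print(centers)
print(harbors)
print(normalized)
```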

We often like to find characters before or after others, but don’t wish to include those others when we use Change to. An example is the ordinal after a number (“st,” “nd,” “rd,” or “th”), but not the number itself. To indicate the entity behind (before) the text we’re interested in changing, we use a positive lookbehind. It looks like this: (?<=), with the character that’s just before the text we want to match inserted after the equal sign. For a digit (\d), it would be (?<=\d). The whole query (the lookbehind and the text we want to match) looks like this:
(?<=\d)(st|nd|rd|th)
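To see that the digit is required but not included in the match, here is a brief Python sketch. The sentence is invented, and the square brackets are only a hypothetical stand-in for whatever formatting (superscript, say) would be applied in Change Format.

```python
import re

# Hypothetical sample text; "first" and "test" contain "st" but no digit before it.
text = "the 1st, 2nd, 3rd, and 4th floors; the first test"

# Positive lookbehind: a digit must come just before, but is not part of the match.
ordinal = re.compile(r"(?<=\d)(st|nd|rd|th)")

# Wrap only the matched ordinal suffix in brackets.
marked = ordinal.sub(r"[\1]", text)
print(marked)
```

The digits sit outside the brackets, and the “st” in “first” and “test” is left alone because no digit precedes it.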

There’s also a positive lookahead for something that comes after the text we’re matching. The negative versions mean the entity does not precede or follow the text we’re seeking.

I sometimes wish to find any paragraphs that begin with a lowercase letter. We know now that the caret (^) means beginning of paragraph, and \l is a lowercase character, thus we’d use: ^\l. Unfortunately, this also finds lowercase letters after a forced line break. To exclude lowercase letters that follow those, we use a negative lookbehind: (?<!), putting the code for a forced line break (\n) after the exclamation mark.

So, to find a lowercase letter that starts a paragraph but doesn’t follow a forced line break, the query is:

(?<!\n)^\l
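The effect of the negative lookbehind can be sketched in Python under a few assumptions: each list item is one paragraph, `\n` inside an item is a forced line break, `[a-z]` stands in for InDesign's lowercase wildcard (which Python's `re` lacks), and the `re.MULTILINE` flag makes the caret match after a line break as well as at the start.

```python
import re

# Hypothetical paragraphs; "\n" marks a forced line break inside a paragraph.
paragraphs = [
    "An opening line\nthen a forced break",
    "lowercase start\nand a forced break",
    "Proper start",
]

# Without the lookbehind, the caret also matches right after a forced line break.
naive = re.compile(r"^[a-z]", re.MULTILINE)

# The negative lookbehind rejects any position that follows a forced line break.
strict = re.compile(r"(?<!\n)^[a-z]", re.MULTILINE)

naive_hits = [i for i, p in enumerate(paragraphs) if naive.search(p)]
strict_hits = [i for i, p in enumerate(paragraphs) if strict.search(p)]

print(naive_hits)
print(strict_hits)
```

The first query flags the opening paragraph only because of the lowercase letter after its forced line break. The second flags just the paragraph that truly begins in lowercase.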

Grep Resources

There are many sources of wisdom on this topic from generous and knowledgeable folks. A favorite is the site indesignsecrets.com/resources/grep. And if you type “grep Erica Gamet” into your favorite internet search engine, you’ll be richly rewarded. The InDesign user community really is a community, and this topic seems to show that. The folks above and many more have likely made some footprints in this neck of the InDesign woods, and we’d do well to follow them.
