Chapter 3. Special Characters

In this chapter, we will take a look at some special characters and some more advanced techniques that will help us create more detailed Regex patterns. We will also gradually move away from our Regex testing environment and go back to using standard JavaScript to build more complete, real-world examples.

Before we get ahead of ourselves, there are still a couple of things we can learn using our current setup, starting with some constraints.

In this chapter, we will cover the following topics:

  • Defining boundaries for a Regex
  • Defining nongreedy quantifiers
  • Defining Regex with groups

Nonvisual constraints

Until now, all the constraints we have put on our patterns had to do with characters that could or couldn't be displayed. Regex also provides a number of positional constraints, which allow you to filter out some false positives.

Matching the beginning and end of an input

The first such set is the start-of-string and end-of-string matchers. Using the caret character (^) to match the start of a string and the dollar sign ($) to match the end, we can force a pattern to be positioned at these locations. For example, you can add the dollar sign at the end of a word to make sure that it is the last thing in the provided string. In the next example, I used the /^word|word$/g pattern to match an occurrence of word that either starts or ends a string. The following image shows what this regular expression matches when given a text input:

[Image: Matching the beginning and end of an input]
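The same pattern can also be tried directly in JavaScript. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not taken from the original example.

```javascript
// Pattern from the text: "word" at the very start or the very end of the string.
const edgeWord = /^word|word$/g;

// Hypothetical sample inputs, chosen only for illustration.
const startsAndEnds = "word in the middle word and at the end word";
const middleOnly = "a word in the middle";

// With the g flag, match() returns every match, or null when there are none.
const edgeMatches = startsAndEnds.match(edgeWord);
const middleMatches = middleOnly.match(edgeWord);

console.log(edgeMatches);   // ["word", "word"] (the first and last occurrence only)
console.log(middleMatches); // null (word neither starts nor ends the string)
```

Note that the occurrence of word in the middle of the first string is skipped, because it touches neither end of the input.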

Using both the start and end characters together ensures that your pattern is the only thing in the string. For example, if you have a /world/ pattern, it will match the world string as well as any other string that merely contains world, such as hello world. However, if you want to make sure that the string contains only world, you can modify the pattern to be /^world$/. This means that Regex will attempt to find a match that both begins and ends the string, which can only happen if the pattern is the only thing in the string.
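To see the difference between the two patterns in plain JavaScript, we can use the test method of a Regex object. This is only a small sketch, and the variable names are our own:

```javascript
// Unanchored: matches any string that contains "world".
const containsWorld = /world/;
// Anchored on both sides: matches only when "world" is the whole string.
const onlyWorld = /^world$/;

const anchorResults = {
  containsPlain: containsWorld.test("world"),       // true
  containsHello: containsWorld.test("hello world"), // true
  onlyPlain: onlyWorld.test("world"),               // true
  onlyHello: onlyWorld.test("hello world"),         // false
};

console.log(anchorResults);
```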

This is the default behavior, but it is worth mentioning that this isn't always the case. In the previous chapter, we saw the m (multiline) flag. This flag makes the caret character match not only the beginning of the string but also the beginning of every line. The same goes for the dollar sign: it will match the end of each line in addition to the end of the entire string. So, it really comes down to what you need in a given situation.
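The effect of the m flag is easy to check with a string that contains line breaks. The following sketch uses a made-up three-line input; only the flags differ between the two runs:

```javascript
// Hypothetical three-line input, used only for illustration.
const multiLineText = "word one\ntwo word\nword";

// Without m, ^ and $ refer to the start and end of the whole string.
const startWithoutM = multiLineText.match(/^word/g); // ["word"]
const endWithoutM = multiLineText.match(/word$/g);   // ["word"]

// With m, ^ and $ also match at the start and end of every line.
const startWithM = multiLineText.match(/^word/gm);   // ["word", "word"] (lines 1 and 3)
const endWithM = multiLineText.match(/word$/gm);     // ["word", "word"] (lines 2 and 3)

console.log(startWithoutM.length, startWithM.length); // 1 2
console.log(endWithoutM.length, endWithM.length);     // 1 2
```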

Matching word boundaries

Word boundaries are very similar to the string boundaries we just saw, except that they work in the context of a single word. For example, suppose we want to match can, but only can on its own, and not the can in candy. If you just type a pattern such as /can/g, you will get matches for can even when it is part of another word, for example, in a situation where the user typed candy. Using the backslash b (\b) character, we can denote a word boundary (either at the beginning or at the end of a word), so we can fix this problem using a pattern similar to /\bcan\b/g, as shown here:

[Image: Matching word boundaries]
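A quick way to see which occurrences a pattern picks up in JavaScript is to wrap each match in brackets with replace. This is a sketch with a sample sentence of our own, assuming the /\bcan\b/g form of the pattern:

```javascript
// Hypothetical sample sentence, used only for illustration.
const canText = "I can see a candy can";

// Without boundaries, the "can" inside "candy" is matched as well.
const plainMarked = canText.replace(/can/g, "[can]");
// With \b on both sides, only the standalone word "can" is matched.
const boundedMarked = canText.replace(/\bcan\b/g, "[can]");

console.log(plainMarked);   // I [can] see a [can]dy [can]
console.log(boundedMarked); // I [can] see a candy [can]
```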

Matching nonword boundaries

Paired with the \b character, we have the \B symbol, which is its inverse. As we have seen on multiple occasions, a capital symbol usually refers to the opposite functionality, and \B is no exception. The uppercase version puts a constraint on the pattern that prevents it from being at the edge of a word. Now, we'll run the same example text, except with /can\B/g, which will swap the matches; this is because the n in the standalone can sits at a word boundary, whereas the n in candy does not:

[Image: Matching nonword boundaries]
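The swap can be reproduced in JavaScript by marking the matches of /can\B/g. As before, this is only a sketch, and the sample sentence is our own:

```javascript
// Hypothetical sample sentence, used only for illustration.
const nonBoundaryText = "I can see a candy can";

// \B requires that the position after "n" is NOT a word boundary,
// so only the "can" that continues into "candy" qualifies.
const nonBoundaryMarked = nonBoundaryText.replace(/can\B/g, "[can]");

console.log(nonBoundaryMarked); // I can see a [can]dy can
```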

Matching a whitespace character

You can match a whitespace character using the backslash s (\s) character, which matches things such as spaces and tabs. It is similar to a word boundary, but it does have some distinctions. First of all, a word boundary matches the end of a word even if it is the last word in the string, unlike the whitespace character, which would require an extra space. So, /foo\b/ would match foo. However, /foo\s/ would not, because there is no whitespace character following foo at the end of the string. Another difference is that a boundary matcher will count something such as a period or dash as an actual boundary, whereas the whitespace character will only match if there is actual whitespace:

[Image: Matching a whitespace character]
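The differences between the two matchers can be put side by side in JavaScript. The following sketch runs /foo\b/ and /foo\s/ against a handful of inputs we picked for illustration:

```javascript
const fooBoundary = /foo\b/;   // foo followed by a word boundary
const fooWhitespace = /foo\s/; // foo followed by an actual whitespace character

// Hypothetical inputs: end of string, space, tab, period, and dash after foo.
const fooSamples = ["foo", "foo bar", "foo\tbar", "foo.", "foo-bar"];

const fooComparison = fooSamples.map((input) => ({
  input,
  boundary: fooBoundary.test(input),
  whitespace: fooWhitespace.test(input),
}));

console.log(fooComparison);
// boundary is true for all five inputs;
// whitespace is true only for "foo bar" and "foo\tbar".
```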

Note

It's worth mentioning that the whitespace character has an inverse matcher, \S, which will match anything but a whitespace character.
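As a small illustration of the inverse matcher, the following sketch uses \S+ (one or more non-whitespace characters) to pull the words out of a string, whatever kind of whitespace separates them. The input string is made up for this example:

```javascript
// Hypothetical input containing a double space and a tab.
const spacedText = "foo  bar\tbaz";

// \S+ matches each run of consecutive non-whitespace characters.
const nonWhitespaceRuns = spacedText.match(/\S+/g);

console.log(nonWhitespaceRuns); // ["foo", "bar", "baz"]
```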
