In this chapter, we will look at some special characters and more advanced techniques that will help us create more detailed Regex patterns. We will also gradually move away from our Regex testing environment and go back to using standard JavaScript to build more complete, real-world examples.
Before we get ahead of ourselves, there are still a couple of things we can learn using our current setup, starting with some constraints.
In this chapter, we will cover the following topics:
Until now, all the constraints we have been putting on our patterns had to do with which characters could or couldn't appear, but Regex also provides a number of positional constraints, which allow you to filter out some false positives.
The first such set is the start-of-string and end-of-string matchers. Using the caret character (^) to match the start of a string and the dollar sign ($) to match the end, we can force a pattern to be positioned in these locations. For example, you can add the dollar sign at the end of a word to make sure that it is the last thing in the provided string. In the next example, I used the /^word|word$/g pattern to match an occurrence of word that either starts or ends a string. The following image exemplifies the match of the regular expression when given a Text input:
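Outside the testing environment, the same pattern can be run directly in JavaScript. This is a minimal sketch; the sample strings are made up for illustration. Note that the alternation splits the pattern into two halves, ^word or word$, so each half carries its own anchor.

```javascript
// Match "word" only when it starts or ends the string.
// The | splits the pattern into (^word) OR (word$).
const startOrEnd = /^word|word$/g;

const samples = [
  "word at the start",
  "ends with word",
  "a word in the middle",
  "word and word",
];

for (const sample of samples) {
  // With the g flag, String.prototype.match returns every match, or null if none
  console.log(JSON.stringify(sample), "->", sample.match(startOrEnd));
}
```

The third sample produces null, because its only occurrence of word is neither at the start nor at the end of the string.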
Using both the start and end characters together ensures that your pattern is the only thing in the string. For example, a /world/ pattern will match both the world string and any other string that merely contains world, such as hello world. However, if you want to make sure that the string contains only world, you can modify the pattern to be /^world$/. Regex will then look for a match that both begins and ends the string, which, of course, can only happen if world is the only thing in the string.
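Here is a short sketch comparing the unanchored and fully anchored patterns with RegExp.prototype.test. Neither regex uses the g flag, so test() keeps no state between calls.

```javascript
const contains = /world/;   // matches "world" anywhere in the string
const exactly = /^world$/;  // the match must start AND end the string

console.log(contains.test("world"));       // true
console.log(contains.test("hello world")); // true
console.log(exactly.test("world"));        // true
console.log(exactly.test("hello world"));  // false
```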
This is the default behavior, but it is worth mentioning that it isn't always the case. In the previous chapter, we saw the m, or multiline, flag. This flag makes the caret character match not only the beginning of the string but also the beginning of any line. The same goes for the dollar sign: it will match the end of each line as well as the end of the entire string. So, it really comes down to what you need in a given situation.
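A minimal sketch of the difference, using a made-up two-line string. Without m, the anchors only refer to the whole string; with m, they also match around each line break.

```javascript
const text = "first line\nsecond line";

// Without m, $ only matches at the very end of the string
const withoutM = text.match(/line$/g);
// With m, $ also matches right before each line break
const withM = text.match(/line$/gm);

// The same applies to ^ and the start of each line
const startsWithoutM = text.match(/^\w+/g);
const startsWithM = text.match(/^\w+/gm);

console.log(withoutM.length, withM.length); // 1 2
console.log(startsWithoutM, startsWithM);
```

The last line logs only first for the version without m, and both first and second for the multiline version.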
Word boundaries are very similar to the string boundaries we just saw, except that they work in the context of a single word. For example, suppose we want to match can, but only can on its own, not the can in candy. As we saw in the previous example, if you just type a pattern such as /can/g, you will get matches for can even when it is part of another word, for example, when the user typed candy. Using the \b character (a backslash followed by a lowercase b), we can denote a word boundary (either at the beginning or at the end of a word), so we can fix this problem using a pattern similar to /\bcan\b/g, as shown here:
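In plain JavaScript, String.prototype.matchAll (which requires the g flag) makes it easy to see where each match lands. This sketch uses a made-up sentence containing both the standalone word and candy.

```javascript
const text = "I can eat candy, can you?";

// Without boundaries, the "can" inside "candy" also matches
const loose = [...text.matchAll(/can/g)].map((m) => m.index);
// \b on both sides only accepts "can" as a whole word
const whole = [...text.matchAll(/\bcan\b/g)].map((m) => m.index);

console.log(loose); // indices 2, 10 and 17
console.log(whole); // indices 2 and 17; index 10 (inside "candy") is gone
```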
Paired with the \b character, we have the \B symbol, which is its inverse. As we have seen on multiple occasions, a capital symbol usually refers to the opposite functionality, and \B is no exception. The uppercase version asserts that the position is not at the edge of a word. Now, we'll run the same example text, except with /can\B/g, which will swap the matches; this is because the n in the standalone can is at a word boundary, while the n in candy is not:
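Running the same made-up sentence through /can\B/g shows the swap: only the occurrence inside candy survives, because it is the only one followed by another word character.

```javascript
const text = "I can eat candy, can you?";

// \B after "can" requires the next position to NOT be a word boundary,
// so "can" must be followed by another word character
const inner = [...text.matchAll(/can\B/g)].map((m) => m.index);

console.log(inner); // only index 10, the "can" inside "candy"
```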
You can match a whitespace character using \s (a backslash followed by a lowercase s), which matches characters such as spaces, tabs, and line breaks. It is similar to a word boundary, but there are some distinctions. First of all, a word boundary matches the end of a word even if it is the last word in the string, whereas the whitespace character would require an extra space. So, /foo\b/ would match foo. However, /foo\s/ would not, because there is no space character following foo at the end of the string. Another difference is that a boundary matcher will count characters such as a period or a dash as an actual boundary, whereas the whitespace character will only match if there is actual whitespace:
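The following sketch checks both differences side by side on a few made-up strings. It also shows a third distinction: \b is zero-width (it only checks a position), while \s consumes a character, so the space becomes part of the matched text.

```javascript
const boundary = /foo\b/;   // zero-width: only checks the position after "foo"
const whitespace = /foo\s/; // consumes one whitespace character after "foo"

for (const s of ["foo", "foo bar", "foo.bar", "foo-bar"]) {
  console.log(JSON.stringify(s), "\\b:", boundary.test(s), "\\s:", whitespace.test(s));
}

// The matched text differs too: \s includes the space in the match
console.log(JSON.stringify("foo bar".match(boundary)[0]));   // "foo"
console.log(JSON.stringify("foo bar".match(whitespace)[0])); // "foo "
```

Only "foo bar" satisfies both patterns; "foo", "foo.bar", and "foo-bar" satisfy the boundary pattern alone.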