Dots and pipes

This section covers two basic pieces of regular expression syntax: the dot and the pipe. We begin by installing a regular expression library for Haskell, and then we introduce the dot and the pipe. Open the Terminal and install the library with the following command:

    cabal install regex-posix

This installs the regex-posix package, which is our regular expression library. Once it is installed, create a new notebook and name it RegexLearning. We need to import Text.Regex.Posix so that we can access the =~ operator, which matches a string against a regular expression. Let's define a couple of strings to get us started:
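A minimal sketch of this setup, assuming it is compiled as a standalone program rather than entered into a notebook cell (the main function is there only for that reason):

```haskell
import Text.Regex.Posix ((=~))  -- imported now; used in the later examples

-- The two sample strings used throughout this section.
str1 :: String
str1 = "one fish two fish red fish blue fish"

str2 :: String
str2 = "The quick brown fox jumps over the lazy dog."

main :: IO ()
main = do
  putStrLn str1
  putStrLn str2
```

In a notebook, the import and the two definitions would typically go in their own cells, without main.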

As you can see, str1 is "one fish two fish red fish blue fish", the title of a popular Dr. Seuss book that I like to use when teaching regular expressions. The second string, str2, is a classic: "The quick brown fox jumps over the lazy dog.". With the library imported and two strings defined, we can use the =~ operator to test whether a pattern occurs in a string. Let's do a few quick examples. The first is very simple; we check whether one string occurs inside another:
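The check might be written as follows. This is a sketch that assumes a standalone program, and the name str1HasOne is introduced here for illustration only. Asking for a Bool result tells =~ to report only whether a match exists.

```haskell
import Text.Regex.Posix ((=~))

str1 :: String
str1 = "one fish two fish red fish blue fish"

-- Hypothetical name: does the literal pattern "one" occur anywhere in str1?
str1HasOne :: Bool
str1HasOne = str1 =~ "one"

main :: IO ()
main = print str1HasOne  -- prints True
```

In a notebook cell, the same check can be written inline as str1 =~ "one" :: Bool.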

What we are asking here is, does the substring one exist inside of str1? And, we can see that it is True. Now, let's do this again with str2:
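The same check against the second string, again as a sketch with an illustrative name (str2HasOne) that does not come from the original text:

```haskell
import Text.Regex.Posix ((=~))

str2 :: String
str2 = "The quick brown fox jumps over the lazy dog."

-- Hypothetical name: does the literal pattern "one" occur anywhere in str2?
str2HasOne :: Bool
str2HasOne = str2 =~ "one"

main :: IO ()
main = print str2HasOne  -- prints False
```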

We are checking whether the same string, one, exists in str2. The result is False as there is no such string in str2.

Now, let's go over our first bit of regular expression syntax, which is the dot, also called the period. The dot matches any one character. We know that the word one exists in str1, but what about a pattern that uses the dot? Consider the following:
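The pattern in question is presumably o.e, that is, an o, then any one character, then an e. A sketch under that assumption, with illustrative binding names:

```haskell
import Text.Regex.Posix ((=~))

str1, str2 :: String
str1 = "one fish two fish red fish blue fish"
str2 = "The quick brown fox jumps over the lazy dog."

-- "o.e": an o, then any single character, then an e.
dotInStr1, dotInStr2 :: Bool
dotInStr1 = str1 =~ "o.e"
dotInStr2 = str2 =~ "o.e"

-- Asking for a String instead of a Bool returns the first matched text.
firstDotMatch :: String
firstDotMatch = str2 =~ "o.e"

main :: IO ()
main = do
  print dotInStr1         -- True, matched by "one"
  print dotInStr2         -- True, matched by "ove" in "over"
  putStrLn firstDotMatch  -- ove
```

Requesting a String result is a convenient way to see which part of the input the dot actually matched.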

So, what we're asking here is, does the sequence o, followed by any one character, followed by e exist in str1? Well, we already know that the word one appears in str1, and it satisfies this condition, hence the output is True. We did the same thing for str2, and that too came out to be True, because the letters o, v, and e in the word over match the pattern.

The second regular expression character that we would like to introduce in this section is the pipe. The pipe is the vertical bar character, which sits above the Enter key on most keyboards. Many programming languages use the pipe to represent OR, and regular expressions are no different. We can put a pipe between two expressions, and the pattern then matches if either the first or the second expression matches. So, let's do a quick example:
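A sketch of such an alternation, assuming the pattern fish|fox and using illustrative binding names:

```haskell
import Text.Regex.Posix ((=~))

str1, str2 :: String
str1 = "one fish two fish red fish blue fish"
str2 = "The quick brown fox jumps over the lazy dog."

-- "fish|fox" matches wherever either "fish" or "fox" occurs.
fishOrFox1, fishOrFox2 :: Bool
fishOrFox1 = str1 =~ "fish|fox"
fishOrFox2 = str2 =~ "fish|fox"

main :: IO ()
main = do
  print fishOrFox1  -- True, because str1 contains "fish"
  print fishOrFox2  -- True, because str2 contains "fox"
```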

In this first case, we are checking whether the word fish or fox appears in our string str1. We know that the word fish appears several times in our first string, so it will return True. We also did the same with str2 and we know that the word fox appears in our second string. So, of course, this also results in a True.

We can try out one more example and check for whether the word dog or cat exists in our strings, str1 and str2:
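A sketch of this check, assuming the pattern dog|cat and the two strings exactly as defined earlier (the binding names are illustrative):

```haskell
import Text.Regex.Posix ((=~))

str1, str2 :: String
str1 = "one fish two fish red fish blue fish"
str2 = "The quick brown fox jumps over the lazy dog."

-- "dog|cat" matches wherever either "dog" or "cat" occurs.
dogOrCat1, dogOrCat2 :: Bool
dogOrCat1 = str1 =~ "dog|cat"
dogOrCat2 = str2 =~ "dog|cat"

main :: IO ()
main = do
  print dogOrCat1  -- False: str1 contains neither word
  print dogOrCat2  -- True: str2 ends with "dog."
```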

For str1 the result is False, because it contains neither word. For str2 the result is True, because the string ends with the word dog.

So, in this section, we installed the regular expression library and looked at two symbols within the regular expression syntax: the dot and the pipe. The dot matches any one character, and the pipe means that either of two expressions can match. We can also chain more than two expressions with pipes, so that a match on any one of them is enough. In our next section, we will look at simple modifiers with regular expressions, and understand what an atom is.
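To illustrate chaining, here is a small sketch with patterns chosen for this example only. It also uses getAllTextMatches, which Text.Regex.Posix re-exports, to list every match rather than just report whether one exists.

```haskell
import Text.Regex.Posix ((=~), getAllTextMatches)

str1, str2 :: String
str1 = "one fish two fish red fish blue fish"
str2 = "The quick brown fox jumps over the lazy dog."

-- A chain of three alternatives: any one of them is enough for a match.
petInStr1, petInStr2 :: Bool
petInStr1 = str1 =~ "cat|bird|dog"
petInStr2 = str2 =~ "cat|bird|dog"

-- Collect every non-overlapping match of a four-way chain in str1.
numbersAndColors :: [String]
numbersAndColors = getAllTextMatches (str1 =~ "one|two|red|blue")

main :: IO ()
main = do
  print petInStr1         -- False: none of the three words occurs
  print petInStr2         -- True: "dog" occurs
  print numbersAndColors  -- ["one","two","red","blue"]
```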
