Chapter 1. Getting Started with Regex

Regular expressions are special kinds of tools used to represent patterns syntactically. When working with any type of textual input, you don't always know what the value will be, but you can usually assume (or even demand) the format you are going to receive into your application. These types of situations arise when you create a regular expression to extract and manipulate this input.

Consequently, to match a specific pattern requires a very mechanical syntax, since a change in even a single character or two can vastly change the behavior of a regular expression and, as a result, the final outcome as well.

Regular expressions by themselves (or Regex, for short) are not specific to any single programming language and you can definitely use them in nearly all the modern languages straight out of the box. However, different languages have implemented Regex with different feature sets and options; in this book, we will be taking a look at Regex through JavaScript, and its specific implementation and functions.

It's all about patterns

Regular expressions are strings that describe a pattern using a specialized syntax of characters, and throughout this book, we will be learning about these different characters and codes that are used to match and manipulate different pieces of data in a vague sort of manner. Now, before we can attempt to create a regular expression, we need to be able to spot and describe these patterns (in English). Let's take a look at a few different and common examples and later on in the book, when we have a stronger grasp on the syntax, we will see how to represent these patterns in code.

Analyzing a phone number

Let's begin with something simple, and take a look at a single phone number:

123-123-1234

We can describe this pattern as being three digits, a dash, then another three numbers, followed by a second dash, and finally four more numbers. It is pretty simple to do; we look at a string and describe how it is made up, and the preceding description will work perfectly if all your numbers follow the given pattern. Now, let's say, we add the following three phone numbers to this set:

123-123-1234
(123)-123-1234
1231231234

These are all valid phone numbers, and in your application, you probably want to be able to match all of them, giving the user the flexibility to write in whichever manner they feel most comfortable. So, let's have another go at our pattern. Now, I would say we have three numbers, optionally inside brackets, then an optional dash, another three numbers, followed by another optional dash, and finally four more digits. In this example, the only parts that are mandatory are the ten digits: the placing of dashes and brackets would completely be up to the user.

Notice also that we haven't put any constraints on the actual digits, and as a matter of fact, we don't even know what they will be, but we do know that they have to be numbers (as opposed to letters, for instance), so we've only placed this constraint:

Analyzing a phone number

Analyzing a simple log file

Sometimes, we might have a more specific constraint than just a digit or a letter; in other cases, we may want a specific word or at least a word from a specific group. In these cases (and mostly with all patterns), the more specific you can be, the better. Let's take the following example:

[info] – App Started
[warning] – Job Queue Full
[info] – Client Connected
[error] – Error Parsing Input
[info] – Application Exited Successfully

This is an example of some sort of log, of course, and we can simply say that each line is a single log message. However, this doesn't help us if we want to manipulate or extract the data more specifically. Another option would be to say that we have some kind of word in brackets, which refers to the log level, and then a message after the dash, which will consist of any number of words. Again, this isn't too specific, and our application may only know how to handle the three preceding log levels, so, you may want to ignore everything else or raise an error.

To best describe the preceding pattern, we would say that you have a word, which can either be info, a warning, or an error inside a pair of square brackets, followed by a dash and then some sort of sentence, which makes up the log message. This will allow us to capture the information from the log more accurately and make sure our system is ready to handle the data before we send it:

Analyzing a simple log file

Analyzing an XML file

The last example I want to discuss is when your pattern relies on itself; a perfect example of this is with something like XML. In XML you may have the following markup:

<title>Demo</title>
<size>45MB</size>
<date>24 Dec, 2013</date>

We could just say that the pattern consists of a tag, some text, and a closing tag. This isn't really specific enough for it to be a valid XML, since the closing tag has to match the opening one. So, if we define the pattern again, we would say that it contains some text wrapped by an opening tag on the left-hand side and a matching closing tag on the right-hand side:

Analyzing an XML file

The last three examples were just used to get us into the Regex train of thought; these are just a few of the common types of patterns and constraints, which you can use in your own applications.

Now that we know what kind of patterns we can create, let's take a moment to discuss what we can do with them; this includes the actual features and functions JavaScript provides to allow us to use these patterns once they're made.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset