Manipulating data

We are going to add one more input to our form, which will be for the user's description. We will parse the description for things such as e-mail addresses and then create both a plain text and an HTML version of it.

The HTML for this form is pretty straightforward; we will use a standard textarea and give it an appropriate id:

Description: <br />
<textarea id="description_field"></textarea><br />

Next, let's start with the bare scaffold needed to begin processing the form data:

function process_description() {
    var field = document.getElementById("description_field");
    var description = field.value;

    data.text_description = description;

    // More Processing Here

    data.html_description = "<p>" + description + "</p>";

    return true;
}

fns.push(process_description);

This code gets the text from the textarea on the page and then saves both a plain text version and an HTML version of it. At this stage, the HTML version is simply the plain text version wrapped in a pair of paragraph tags, and that is what we will work on now. The first thing I want to do is split the text into paragraphs. In a text area, the user can break up text in two ways: into lines and into paragraphs. For our example, if the user entered a single newline character, we will add a <br /> tag, and if there is more than one newline character in a row, we will start a new paragraph using the <p> tag.

Using the String.replace method

We are going to use JavaScript's replace method on the string object. This function can accept a Regex pattern as its first parameter and a function as its second; each time it finds the pattern, it calls the function, and whatever the function returns is inserted in place of the matched text.
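To see the callback form of replace in isolation, here is a minimal sketch that is unrelated to the form itself. The names digits_pattern and double_numbers are hypothetical and chosen only for this illustration:

```javascript
// Matches every run of one or more digits.
var digits_pattern = /\d+/g;

function double_numbers(text) {
    // The callback receives the matched text; whatever it returns
    // is substituted for that match in the resulting string.
    return text.replace(digits_pattern, function(match) {
        return String(Number(match) * 2);
    });
}

console.log(double_numbers("2 apples and 15 pears"));
// prints "4 apples and 30 pears"
```

The original string is left untouched; replace returns a new string, which is why the result has to be assigned or returned.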

So, for our example, we will be looking for new line characters, and in the function, we will decide if we want to replace the new line with a break line tag or an actual new paragraph, based on how many new line characters it was able to pick up:

var line_pattern = /\n+/g;
description = description.replace(line_pattern, function(match) {
    if (match == "\n") {
        return "<br />";
    } else {
        return "</p><p>";
    }
});

The first thing you may notice is that we need to use the g flag in the pattern, so that it will look for all possible matches as opposed to only the first. Besides this, the rest is pretty straightforward. Consider this form:

[Figure: the form]

If you take a look at the output from the console of the preceding code, you should get something similar to this:

[Figure: console output]
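The effect of the g flag can also be checked outside the browser. The following is only a sketch: convert_lines is a hypothetical helper that takes the pattern as a parameter so both variants can be compared, and the sample text is made up for the illustration:

```javascript
// Hypothetical helper: applies the line-break replacement with a given pattern.
function convert_lines(description, pattern) {
    return description.replace(pattern, function(match) {
        // A single newline becomes a line break; a longer run starts a new paragraph.
        return match == "\n" ? "<br />" : "</p><p>";
    });
}

var sample = "Line one\nLine two\n\nSecond paragraph";

// With the g flag, every run of newlines is replaced.
console.log(convert_lines(sample, /\n+/g));
// prints "Line one<br />Line two</p><p>Second paragraph"

// Without the g flag, only the first run is replaced and the
// double newline before "Second paragraph" is left as it was.
console.log(convert_lines(sample, /\n+/));
```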

Matching a description field

The next thing we need to do is try and extract e-mails from the text and automatically wrap them in a link tag. We have already covered a Regexp pattern to capture e-mails, but we will need to modify it slightly, as our previous pattern expects that an e-mail is the only thing present in the text. In this situation, we are interested in all the e-mails included in a large body of text.

If you were simply looking for a word, you would be able to use the \b matcher, which matches any word boundary (that can be the end of a word or the end of a sentence). So, instead of the dollar sign, which we used before to denote the end of a string, we would place the boundary character to denote the end of a word. However, in our case this isn't quite good enough, as some characters that create a boundary are also valid e-mail characters; the period, for example, is valid. To get around this, we can use the boundary character in conjunction with a lookahead group and say that we want the match to end on a word boundary, but only if it is followed by a space or the end of a sentence/string. This ensures that we don't cut off a subdomain or part of a domain if there is some invalid information midway through the address.

Now, we aren't creating something that will try and parse e-mails no matter how they are entered; the point of creating validators and patterns is to force the user to enter something logical. That said, we assume that if the user wrote an e-mail address and then a period, that he/she didn't enter an invalid address, rather, he/she entered an address and then ended a sentence (the period is not part of the address).

In our code, we assume that to end an address, the user is either going to follow it with a space, possibly after some kind of punctuation, or end the string/line. We no longer have to deal with line breaks because we converted them to HTML, but we do have to make sure that our pattern doesn't pick up an HTML tag in the process.

At the end of this, our pattern will look similar to this:

/\b[^\s<>@]+@[^\s<>@.]+\.[^\s<>@]+\b(?=.?(?:\s|<|$))/g

We start off with a word boundary, then, we look for the pattern we had before. I added both the (>) greater-than and the (<) less-than characters to the group of disallowed characters, so that it will not pick up any HTML tags. At the end of the pattern, you can see that we want to end on a word boundary, but only if it is followed by a space, an HTML tag, or the end of a string. The complete function, which does all the matching, is as follows:

function process_description() {
    var field = document.getElementById("description_field");
    var description = field.value;

    data.text_description = description;

    var line_pattern = /\n+/g;
    description = description.replace(line_pattern, function(match) {
        if (match == "\n") {
            return "<br />";
        } else {
            return "</p><p>";
        }
    });

    var email_pattern = /\b[^\s<>@]+@[^\s<>@.]+\.[^\s<>@]+\b(?=.?(?:\s|<|$))/g;
    description = description.replace(email_pattern, function(match){
        return "<a href='mailto:" + match + "'>" + match + "</a>";
    });

    data.html_description = "<p>" + description + "</p>";

    return true;
}
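To try the two replacements without a page or a form, the same steps can be wrapped in a function that takes the text directly. This is a sketch under assumptions: description_to_html is a hypothetical name, the sample text is invented, and the data object and DOM lookup from the function above are deliberately left out:

```javascript
var line_pattern = /\n+/g;
var email_pattern = /\b[^\s<>@]+@[^\s<>@.]+\.[^\s<>@]+\b(?=.?(?:\s|<|$))/g;

// Hypothetical DOM-free variant of process_description.
function description_to_html(description) {
    // Step 1: turn newlines into <br /> tags or paragraph breaks.
    var html = description.replace(line_pattern, function(match) {
        return match == "\n" ? "<br />" : "</p><p>";
    });

    // Step 2: wrap each e-mail address in a mailto link.
    html = html.replace(email_pattern, function(match) {
        return "<a href='mailto:" + match + "'>" + match + "</a>";
    });

    return "<p>" + html + "</p>";
}

console.log(description_to_html("Hi there\nWrite to me at john@example.com."));
// prints "<p>Hi there<br />Write to me at <a href='mailto:john@example.com'>john@example.com</a>.</p>"
```

Note that the period ending the sentence stays outside the link, which is the behavior the lookahead was added for.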

We could continue to add fields, but I think the point has been made. Once you have a pattern that matches what you want, you can extract the matched data and manipulate it into any format you may need.

Understanding the description Regex

Let's go back to the regular expression used to match the e-mail addresses in the description entered by the user:

/\b[^\s<>@]+@[^\s<>@.]+\.[^\s<>@]+\b(?=.?(?:\s|<|$))/g

This is a brief explanation of the Regex:

  • \b asserts its position at a (^\w|\w$|\W\w|\w\W) word boundary
  • [^\s<>@]+ matches a single character not present in the list:
    • The + quantifier between one and unlimited times
    • \s matches a [\r\n\t\f ] whitespace character
    • <>@ is a single character in the <>@ list literally (case sensitive)
  • @ matches the @ character literally
  • [^\s<>@.]+ matches a single character not present in the list:
    • The + quantifier between one and unlimited times
    • \s matches a [\r\n\t\f ] whitespace character
    • <>@. is a single character in the <>@. list literally (case sensitive)
  • \. matches the . character literally
  • [^\s<>@]+ matches a single character not present in the list:
    • The + quantifier between one and unlimited times
    • \s matches a [\r\n\t\f ] whitespace character
    • <>@ is a single character in the <>@ list literally (case sensitive)
  • \b asserts its position at a (^\w|\w$|\W\w|\w\W) word boundary
  • (?=.?(?:\s|<|$)) Positive lookahead: asserts that the Regex below can be matched
    • .? matches any character (except new line)
    • The ? quantifier between zero and one time
    • (?:\s|<|$) is a non-capturing group:
      • First alternative: \s matches a [\r\n\t\f ] whitespace character
      • Second alternative: < matches the character < literally
      • Third alternative: $ asserts its position at the end of the string
  • The g modifier: global match. Returns all matches of the regular expression, not only the first one
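The breakdown above can be checked against a concrete string. The sample sentence below is invented for this illustration; it mixes an address with subdomains, an address followed by a period, and a string with no dot after the @ sign:

```javascript
var email_pattern = /\b[^\s<>@]+@[^\s<>@.]+\.[^\s<>@]+\b(?=.?(?:\s|<|$))/g;

var sample = "Mail sales@shop.example.co.uk, or ask bob@example.com. " +
             "Not an address: user@localhost";

// With the g flag, match returns an array of every matched substring.
var found = sample.match(email_pattern);

console.log(found);
// prints [ 'sales@shop.example.co.uk', 'bob@example.com' ]
```

The trailing comma and period are left out because the match must end on a word boundary, the subdomains are kept because the final character class allows periods, and user@localhost is skipped because the pattern requires a literal period after the @ sign.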

Explaining a Markdown example

More examples of regular expressions can be seen with the popular Markdown syntax (refer to http://en.wikipedia.org/wiki/Markdown). Here, the user is asked to write things in a custom format, but it is a format that saves typing and is easy to read. For example, to create a link in Markdown, you would type something similar to this:

[Click Me](http://gabrielmanricks.com)

This would then be converted to:

<a href="http://gabrielmanricks.com">Click Me</a>

Disregarding any validation on the URL itself, this can easily be achieved using this pattern:

/\[([^\]]*)\]\(([^)]*)\)/g

It looks a little complex because square brackets and parentheses are both special characters that need to be escaped. Basically, what we are saying is that we want an open square bracket, anything up to the closing square bracket, then an open parenthesis, and again, anything up to the closing parenthesis.

Tip

A good website for writing Markdown documents is http://dillinger.io/.

Since we wrapped each section into its own capture group, we can write this function:

text.replace(/\[([^\]]*)\]\(([^)]*)\)/g, function(match, text, link){
    return "<a href='" + link + "'>" + text + "</a>";
});

We haven't been using capture groups in our manipulation examples, but if you use them, then the first parameter to the callback is the entire match (similar to the ones we have been working with) and then all the individual groups are passed as subsequent parameters, in the order that they appear in the pattern.
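As a runnable illustration of how the groups arrive in the callback, here is a small sketch. The names link_pattern and convert_links are hypothetical, the callback parameter is called label to avoid shadowing the input, and the link group is written as [^)]* so that it stops at the first closing parenthesis:

```javascript
// Group 1 captures the link text, group 2 captures the URL.
var link_pattern = /\[([^\]]*)\]\(([^)]*)\)/g;

function convert_links(text) {
    // match is the whole "[label](link)" string; the capture groups
    // follow as extra arguments, in the order they appear in the pattern.
    return text.replace(link_pattern, function(match, label, link) {
        return "<a href='" + link + "'>" + label + "</a>";
    });
}

console.log(convert_links("[Click Me](http://gabrielmanricks.com) or read about [Markdown](http://en.wikipedia.org/wiki/Markdown)"));
// prints "<a href='http://gabrielmanricks.com'>Click Me</a> or read about <a href='http://en.wikipedia.org/wiki/Markdown'>Markdown</a>"
```

For a substitution this simple, replace also accepts a replacement string in which $1 and $2 stand for the first and second groups, so "<a href='$2'>$1</a>" would give the same result; the callback form is more useful once the groups need further processing.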
