Text-to-HTML Converter Script

Let's finish up with a slightly more useful script: webbuild.pl takes a simple text file as an argument, prompts you for some basic values, and then spits out an HTML version of your text file. It's not a very sophisticated HTML generator—it won't handle embedded boldface or other formatting, it doesn't handle links or images; really, it does little other than stick paragraph tags in the right place, let you specify the foreground and background colors, and give you a simple heading and a link to your e-mail address. But it does give you a basic HTML template to work from.

How It Works

In addition to simply converting text input to HTML, the webbuild.pl script prompts you for several other values, including

  • The title of the page (<title></title> in HTML).

  • Background and text colors (here I've limited the choices to the sixteen color names built into HTML, and the script verifies that the input is one of them). This part also includes some rudimentary online help.

  • An initial heading (<h1></h1> in HTML).

  • An e-mail address, which will be inserted as a link at the bottom of the final HTML page.

Here's what running the webbuild.pl script looks like, with some sample answers to the prompts and the output it produces:

% webbuild.pl janeeyre.txt
Enter the title to use for your web page: Charlotte Bronte, Jane Eyre, Chapter One
Enter the background color (? for options): ?
One of:
white, black, red, green, blue,
orange, purple, yellow, aqua, gray,
silver, fuchsia, lime, maroon, navy,
olive, or Return for none
Enter the background color (? for options): white
Enter the text color (? for options): black
Enter a heading: Chapter One
Enter your email address: [email protected]
******************************
<html>
<head>
<title>Charlotte Bronte, Jane Eyre, Chapter One</title>
</head>
<body bgcolor="white" text="black">
<h1>Chapter One</h1>
<p>There was no possibility of taking a walk that day. We had been
wandering, indeed, in the leafless shrubbery an hour in the morning;
... more text deleted for space ...
fireside, and with her darlings about her (for the time neither
</p>
<hr>
<address><a href="mailto:[email protected]">[email protected]</a></address>
</body>
</html>

The resulting HTML, as the previous output shows, could then be copied and pasted into a text editor, saved as a file, and loaded into a Web browser to see the result (Figure 7.1 shows that result).

Figure 7.1. The result of the webbuild.pl script.


Later in this book (on Day 15, “Managing I/O,” specifically), I'll show you a way to output the data to a file, rather than to the screen.

The Input File

One note about the text file you give to webbuild.pl to convert: The script assumes the data you give it is a file of paragraphs, with each paragraph separated by a blank line. For example, here are the contents of the file janeeyre.txt, which I used for the example output:

There was no possibility of taking a walk that day. We had been
wandering, indeed, in the leafless shrubbery an hour in the morning;
but since dinner (Mrs. Reed, when there was no company, dined early)
the cold winter wind had brought with it clouds so sombre, and a
rain so penetrating, that further outdoor exercise was now out of
the question.

I was glad of it: I never liked long walks, especially on chilly
afternoons: dreadful to me was the coming home in the raw twilight,
with nipped fingers and toes, and a heart saddened by the chidings
of Bessie, the nurse, and humbled by the consciousness of my
physical inferiority to Eliza, John, and Georgiana Reed.

The said Eliza, John, and Georgiana were now clustered round
their mama in the drawing-room: she lay reclined on a sofa by the
fireside, and with her darlings about her (for the time neither

The Script

Listing 7.3 shows the code for our script.

Listing 7.3. webbuild.pl
1:  #!/usr/local/bin/perl -w
2:  #
3:  # webbuild:  simple text-file conversion to HTML
4:  # *very* simple.  Assumes no funky characters, embedded
5:  # links or boldface, etc.  Blank spaces == paragraph
6:  # breaks.
7:
8:  $title = '';                    # <TITLE>
9:  $bgcolor = '';                  # BGCOLOR
10: $text = '';                     # TEXT
11: $head = '';                     # main heading
12: $mail = '';                     # email address
13: $paragraph = '';                # is there currently an open paragraph tag?
14:
15: print "Enter the title to use for your web page: ";
16: chomp($title = <STDIN>);
17:
18: foreach $color ('background', 'text') { # run twice, once for each color
19:     $in = '';                   # temporary input
20:     while () {
21:         print "Enter the $color color (? for options): ";
22:         chomp($in = <STDIN>);
23:         $in = lc $in;
24:
25:         if ($in eq '?') {       # print help
26:             print "One of: \nwhite, black, red, green, blue,\n";
27:             print "orange, purple, yellow, aqua, gray,\n";
28:             print "silver, fuchsia, lime, maroon, navy,\n";
29:             print "olive, or Return for none\n";
30:             next;
31:         }  elsif ($in eq '' or
32:                  $in eq 'white' or
33:                  $in eq 'black' or
34:                  $in eq 'red' or
35:                  $in eq 'blue' or
36:                  $in eq 'green' or
37:                  $in eq 'orange' or
38:                  $in eq 'purple' or
39:                  $in eq 'yellow' or
40:                  $in eq 'aqua' or
41:                  $in eq 'gray' or
42:                  $in eq 'silver' or
43:                  $in eq 'fuchsia' or
44:                  $in eq 'lime' or
45:                  $in eq 'maroon' or
46:                  $in eq 'navy' or
47:                  $in eq 'olive') { last; }
48:         else {
49:             print "that's not a color.\n";
50:         }
51:     }
52:
53:     if ($color eq 'background') {
54:         $bgcolor = $in;
55:     }  else {
56:         $text = $in;
57:     }
58: }
59:
60: print "Enter a heading: ";
61: chomp($head = <STDIN>);
62:
63: print "Enter your email address: ";
64: chomp($mail = <STDIN>);
65:
66: print '*' x 30;
67:
68: print "\n<html>\n<head>\n<title>$title</title>\n";
69: print "</head>\n<body";
70: if ($bgcolor ne '') { print qq( bgcolor="$bgcolor"); }
71: if ($text ne '') { print qq( text="$text"); }
72: print ">\n";
73: print "<h1>$head</h1>\n<p>";
74: $paragraph = 'y';
75:
76: while (<>) {
77:     if ($_ =~ /^\s$/) {
78:         if ($paragraph eq 'y') {
79:             print "</p>\n";
80:             $paragraph = 'n';
81:         }
82:
83:         print "<p>\n";
84:         $paragraph = 'y';
85:     }  else {
86:         print $_;
87:     }
88: }
89:
90: if ($paragraph eq 'y') {
91:     print "</p>\n";
92: }
93:
94: print qq(<hr>\n<address><a href="mailto:$mail">$mail</a></address>\n);
95: print "</body>\n</html>\n";

There's little that's overly complex, syntax-wise, in this script; it doesn't even use any arrays or hashes (it doesn't need to; there's nothing that really needs storing or processing here). It's just a lot of loops and tests.

There are at least a few points to be made about why I organized the script the way I did, so we can't end this lesson quite yet. Let's start with the large foreach loop starting in line 18.

This loop handles the prompt for both the background and text colors. Because both of these prompts behave in exactly the same way, I didn't want to have to repeat the same code for each one (particularly given that there's a really huge if test in lines 31 through 47). Later, you'll learn how to put this kind of repetitive code into a subroutine, and then just call the subroutine twice. But for now, because we know a lot about loops at this point, and nothing about subroutines, I opted for a sneaky foreach loop.

The loop will run twice, once for the string 'background' and once for the string 'text'. We'll use these strings for the prompts, and later to make sure the right value gets assigned to the right variable ($bgcolor or $text).

Inside the foreach loop, we have another loop, an infinite while loop, which will repeat each prompt until we get acceptable input (input verification is always a good programming practice). At the prompt, the user has three choices: enter one of the sixteen built-in colors, hit Return (or Enter) to use the default colors, or type ? for a list of the choices.

The tests in lines 25 through 50 process each of these choices. First, ?. In response to a question mark, all we have to do is print a helpful message, and then use next to drop down to the next iteration of the while loop (that is, redisplay the prompt and wait for more data).

The next test (starting in line 31) makes sure we have correct input: either a Return, in which case the input is empty (line 31), or one of the sixteen built-in colors. Note that the tests all compare against lowercase color names, which would seem overly limiting if the user typed BLACK or Black or some other odd combination of upper- and lowercase. But fear not; in line 23, we used the lc function to lowercase the input, which collapses all those case variations into one (and conveniently doesn't affect an input of ?).
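As a quick illustration of that normalization step, here's a minimal sketch. The @typed and @normalized arrays are hypothetical names of my own (the script itself reads one value at a time from STDIN), and the sample inputs are made up.

```perl
use strict;
use warnings;

# Hypothetical things a user might type at the color prompt.
my @typed = ('BLACK', 'Black', 'bLaCk', '?');

my @normalized;
foreach my $in (@typed) {
    push @normalized, lc($in);    # lc lowercases letters and leaves '?' alone
}

print join(', ', @normalized), "\n";
```

Every spelling of black ends up as the same string, so a single eq test per color is enough.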

If the input matches any of those seventeen cases, we call last in line 47 to drop out of the while loop (keep in mind that next and last, minus the presence of labels, refer to the nearest enclosing loop—to the while, not to the foreach). If the input doesn't match, we drop to the final else case in line 48, print an error message, and restart the while loop.
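To see that next and last act on the inner while rather than on the outer foreach, here's a minimal sketch of the same loop shape. It's not the script itself: the canned @answers array stands in for keyboard input, the @log array is a hypothetical way to record what happened, and the color test is cut down to two colors to keep it short.

```perl
use strict;
use warnings;

# Hypothetical canned input standing in for what a user might type.
my @answers = ('?', 'plaid', 'Navy', '?', 'white');
my @log;                                  # records each event, in order

foreach my $color ('background', 'text') {
    while (1) {
        my $in = lc(shift @answers);      # take the next canned answer
        if ($in eq '?') {
            push @log, "$color: help";
            next;                         # restarts the while, not the foreach
        } elsif ($in eq 'navy' or $in eq 'white') {
            push @log, "$color: accepted $in";
            last;                         # leaves the while only
        } else {
            push @log, "$color: rejected $in";
        }
    }
}

print "$_\n" foreach @log;
```

The log shows the background prompt repeating until it gets navy, and only then does the foreach move on to the text color.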

The final test in the foreach loop determines whether we have a value for the background color or for the text color, and assigns that value to the appropriate variable.

The final part of the script, starting on line 68 and continuing to the end, prints the top part of our HTML file, reads in and converts the text file indicated on the command line to HTML, and finishes up with the last part of the HTML file. Note the tests in lines 70 and 71; if there are no values for $bgcolor or $text, we'll leave those attributes off the HTML <body> tag altogether. (A simpler version would be to just leave them there, as bgcolor="" or text="", but that doesn't look as nice in the output.)
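Here's a small sketch of that optional-attribute logic on its own. It builds the tag in a hypothetical $body variable instead of printing it piece by piece as the script does, and the sample values (no background color, black text) are made up for the illustration.

```perl
use strict;
use warnings;

my $bgcolor = '';         # as if the user pressed Return at the background prompt
my $text    = 'black';

my $body = "<body";
if ($bgcolor ne '') { $body .= qq( bgcolor="$bgcolor"); }   # skipped: value is empty
if ($text ne '')    { $body .= qq( text="$text"); }
$body .= ">";

print "$body\n";
```

With these values the tag comes out with a text attribute and no bgcolor attribute at all.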

You'll note also the use of the qq function. You learned about qq in passing way back in the “Going Deeper” section on Day 2, “Working with Strings and Numbers.” The qq function is a way of creating a double-quoted string without actually using any double-quotes. I used it here because if I had actually used double-quotes, I would have had to backslash the double-quotes in the string itself. I think it looks better this way.
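For comparison, here's a minimal sketch of the same string written both ways. The address is a hypothetical placeholder, and the $escaped and $with_qq variable names are mine, not the script's.

```perl
use strict;
use warnings;

# Hypothetical address; single quotes keep the @ from being interpolated.
my $mail = 'reader@example.com';

# Ordinary double quotes: every inner double quote needs a backslash.
my $escaped = "<a href=\"mailto:$mail\">$mail</a>";

# qq with parentheses as delimiters: inner double quotes need no escaping.
my $with_qq = qq(<a href="mailto:$mail">$mail</a>);

print "$with_qq\n";
print "identical\n" if $escaped eq $with_qq;
```

Both forms interpolate $mail and produce the same string; the qq version is just easier to read.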

Lines 76 through 88 read in the input file (using <>), and then simply print it all back out again, inserting paragraph tags at the appropriate spots (that is, where there are blank lines). I use the $paragraph variable to keep track of whether there's an open <p> tag with no corresponding closing tag. If there is, the script prints out a closing </p> tag before printing another opening <p>. A more robust version of this script would watch for things such as embedded special characters (accents, bullets, and so on) and replace them with the appropriate HTML codes, but that's a task done much more easily with pattern matching, so we'll leave it for later.
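If you'd like to watch the paragraph bookkeeping without creating a file, here's a minimal sketch of the same logic. It assumes a hypothetical @lines array in place of the <> input and collects the result in an $html string (also my own name) rather than printing as it goes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines that <> would read from a file.
my @lines = (
    "First paragraph, line one.\n",
    "First paragraph, line two.\n",
    "\n",
    "Second paragraph.\n",
);

my $html      = "<p>";    # the first paragraph is opened before the loop
my $paragraph = 'y';      # 'y' while a <p> is waiting for its </p>

foreach my $line (@lines) {
    if ($line =~ /^\s$/) {            # a line holding only its newline
        if ($paragraph eq 'y') {
            $html .= "</p>\n";        # close the paragraph that was open
            $paragraph = 'n';
        }
        $html .= "<p>\n";             # and open the next one
        $paragraph = 'y';
    } else {
        $html .= $line;               # ordinary text passes through unchanged
    }
}
$html .= "</p>\n" if $paragraph eq 'y';   # close whatever is still open

print $html;
```

Each blank line closes one paragraph and opens the next, and the final test closes the last paragraph once the input runs out.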

All that's left is to print the final e-mail link (using an HTML mailto URL and link tags) and finish up the HTML file.
