Chapter 2
In This Chapter
Introducing the concept of valid pages
Using a doctype
Setting the character set
Meeting the W3C validator
Fixing things when they go wrong
Using HTML Tidy to clean your pages
Web development is undergoing a revolution. As the web matures and becomes a greater part of everyday life, it's important to ensure that web pages perform properly — thus, a call for web developers to follow voluntary standards of web development.
In the bad old days, the web was an informal affair. People wrote HTML pages any way they wanted. Although this was easy, it led to a lot of problems:
In short, the world of HTML was a real mess.
In 2000, the World Wide Web Consortium (usually abbreviated as W3C) got together and proposed some fixes for HTML. The basic plan was to create a new form of HTML, called XHTML, that complied with the stricter rules of eXtensible Markup Language (XML). The details are long and boring, but essentially, they came up with some agreements about how web pages should be standardized. Here are some of those standards:
This sounds like strict librarian rules, but really they aren't restricting at all. Most of the good HTML coders were already following these guidelines or something similar.
Even though you're moving past XHTML into HTML5, these aspects of XHTML remain, and they are guidelines all good HTML5 developers still use.
In old-style HTML, you never really knew how your pages would look on various browsers. In fact, you never really knew if your page was even written properly. Some mistakes would look fine on one browser but cause another browser to blow up.
The idea of validation is to take away some of the uncertainty of HTML. It's like a spell checker for your code. My regular spell checker makes me feel a little stupid sometimes because I make mistakes. I like it, though, because I'm the only one who sees the errors. I can fix the spelling errors before I pass the document on to you, so I look smart. (Well, maybe.)
It'd be cool if you could have a special kind of checker that does the same things for your web pages. Instead of checking your spelling, it'd test your page for errors and let you know if you made any mistakes. It'd be even cooler if you could have some sort of certification that your page follows a standard of excellence.
That's how page validation works. You can designate that your page will follow a particular standard and use a software tool to ensure that your page meets that standard's specifications. The software tool is a validator. I show you two different validators in the upcoming “Validating Your Page” section.
The browsers also promise to follow a particular standard. If your page validates to a given standard, any browser that conforms to that same standard can reproduce your document correctly, which is a big deal.
The most important validator is the W3C validator at http://validator.w3.org, as shown in Figure 2-1.
A validator is actually the front end of a piece of software that checks pages for validity. It looks at your web page's doctype and sees whether the page conforms to the rules of that doctype. If not, it tells you what might have gone wrong.
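As a rough sketch of what the validator checks first, the skeleton below declares the HTML5 doctype and a UTF-8 character set, the same two declarations used in this chapter's listings. The title and paragraph text are placeholders chosen for illustration, not part of the book's example files.

```html
<!DOCTYPE html>
<html lang="en-US">
  <head>
    <!-- character set declaration, placed early in the head -->
    <meta charset="UTF-8">
    <!-- page title shown in the browser tab (placeholder text) -->
    <title>Placeholder title</title>
  </head>
  <body>
    <!-- placeholder content -->
    <p>Placeholder content.</p>
  </body>
</html>
```

The doctype on the first line is what tells the validator which set of rules to hold the rest of the page to.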
You can submit code to a validator in three ways:
Validation might sound like a big hassle, but it's really a wonderful tool because sloppy HTML code can cause lots of problems. Worse, you might think everything's okay until somebody else looks at your page, and suddenly, the page doesn't display correctly.
To explain all this, I created a web page the way Aesop might have done in ancient Greece. Okay, maybe Aesop didn't write his famous fables as web pages, but if he had, they might have looked like the following code listing:
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<!-- oxWheels1.html -->
<!-- note this page has deliberate errors! Please see the text
and oxWheelsCorrect.html for a corrected version.
-->
</head>
<body>
<title>The Oxen and the Wheels</title>
<h1>The Oxen and the Wheels
<h2></h1>From Aesop's Fables</h2>
<p>
A pair of Oxen were drawing a heavily loaded wagon along a
miry country road. They had to use all their strength to pull
the wagon, but they did not complain.
<p>
<p>
The Wheels of the wagon were of a different sort. Though the
task they had to do was very light compared with that of the
Oxen, they creaked and groaned at every turn. The poor Oxen,
pulling with all their might to draw the wagon through the
deep mud, had their ears filled with the loud complaining of
the Wheels. And this, you may well know, made their work so
much the harder to endure.
</p>
<p>
"Silence!" the Oxen cried at last, out of patience. "What have
you Wheels to complain about so loudly? We are drawing all the
weight, not you, and we are keeping still about it besides."
</p>
<h2>
They complain most who suffer least.
</h2>
</body>
</html>
The code looks okay, but actually has a number of problems. Aesop may have been a great storyteller, but from this example, it appears he was a sloppy coder. The mistakes can be hard to see, but trust me, they're there. The question is, how do you find the problems before your users do?
You might think that the problems would be evident if you viewed the page in a web browser. The various web browsers seem to handle the page decently, even if they don't display it in an identical way. Figure 2-2 shows oxWheels1.html in a browser.
Chrome appears to handle the page pretty well, but From Aesop's Fables is supposed to be a level-two heading, or H2, and instead it appears as plain text. Other than that, there's very little indication that something is wrong.
If it looks fine, who cares if it's exactly right? You might wonder why we care if there are mistakes in the underlying code, as long as everything works okay. After all, who's going to look at the code if the page displays properly?
The problem is, you don't know if it'll display properly, and mistakes in your code will eventually come back to haunt you. If possible, you want to know immediately what parts of your code are problematic so you can fix them and not worry.
To find out what's going on with this page, pay a visit to the W3C validator at http://validator.w3.org. Figure 2-3 shows me visiting this site and uploading a copy of oxWheels1.html to it.
Hold your breath and click the Check button. You might be surprised at the results shown in Figure 2-4.
The validator is a picky beast, and it doesn't seem to like this page at all. The validator does return some useful information and gives enough hints that you can decode things soon enough.
Before you look at the specific complaints, take a quick look at the web page the validator sends you. The web page is chock-full of handy information. The top of the page tells you a lot of useful things:
The validator doesn't always tell you everything you need to know, but it does give you some pretty good clues. Page validation is tedious but not as difficult as it might seem at first. Here are some strategies for working through page validation:
Look again at the results for the oxWheels1.html page. The first errors appear in Figure 2-5.
Figure 2-5 shows the first two error messages. The first complains that the head is missing a title. The second error message is whining about the title being in the body. The relevant code is repeated here:
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<!-- oxWheels1.html -->
<!-- note this page has deliberate errors! Please see the text
and oxWheelsCorrect.html for a corrected version.
-->
</head>
<body>
<title>The Oxen and the Wheels</title>
Look carefully at the head and title tag pairs and review the notes in the error messages, and you'll probably see the problem. The <title> element is supposed to be in the head, but I accidentally put it in the body! (Okay, it wasn't accidental; I made this mistake deliberately here to show you what happens. However, I have made this mistake for real in the past.)
If the title tag is the problem, a quick change in the HTML should fix it. oxWheels2.html shows another form of the page with my proposed fix:
<head>
<meta charset="UTF-8">
<!-- oxWheels2.html -->
<!-- Moved the title tag inside the head -->
<title>The Oxen and the Wheels</title>
</head>
<body>
Note: I'm only showing the parts of the page that I changed. The entire page is available on this book's website. See this book's Introduction for more on the website.
The fix for this problem is pretty easy:
One down, but more to go. The next error (refer to Figure 2-6) looks strange, but it makes sense when you look over the code.
This type of error is very common. What it usually means is you forgot to close something or you put something in the wrong place. The error message indicates a problem in line 17. The next error is line 17, too. See if you can find the problem here in the relevant code:
<body>
<h1>The Oxen and the Wheels
<h2></h1>From Aesop's Fables</h2>
After you know where to look, the problem becomes a bit easier to spot. I got sloppy and started the <h2> tag before I finished the <h1>. In many cases, one tag can be completely embedded inside another, but you can't have tag definitions overlap as I've done here. The <h1> has to close before I can start the <h2> tag.
This explains why browsers might be confused about how to display the headings. It isn't clear whether this code should be displayed in H1 or H2 format, or perhaps with no special formatting at all. It's much better to know the problem and fix it than to remain ignorant until something goes wrong.
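To make the nesting rule concrete, here's a small sketch. The sentence and the choice of <strong> and <em> are assumptions made for illustration and don't come from the oxWheels pages. The first paragraph nests one element completely inside another, which is fine. The overlapping version is tucked inside a comment so the snippet itself stays valid.

```html
<!-- Fine: <em> opens and closes entirely inside <strong> -->
<p><strong>The Oxen pulled <em>without complaint</em>.</strong></p>

<!-- Not fine: the tags overlap, because <strong> closes
     while the <em> inside it is still open:
     <p><strong>The Oxen pulled <em>without complaint</strong>.</em></p>
-->
```

A handy rule of thumb: close tags in the reverse order that you opened them.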
The third version, oxWheels3.html, fixes this part of the page:
<!-- oxWheels3.html -->
<!-- sort out the h1 and h2 tags at the top -->
<title>The Oxen and the Wheels</title>
</head>
<body>
<h1>The Oxen and the Wheels</h1>
<h2>From Aesop's Fables</h2>
The validator has helped fix a number of errors, but there's one really sneaky problem still in the page. See if you can find it, and then read ahead.
The W3C validator isn't the only game in town. Another great resource — HTML Tidy — can be used to fix your pages. You can download Tidy or just use the online version at http://infohound.net/tidy. Figure 2-7 illustrates the online version.
Unlike W3C's validator, Tidy actually attempts to fix your page. Figure 2-8 displays how Tidy suggests the oxWheels1.html page be fixed.
Tidy examines the page for a number of common errors and does its best to fix the errors. However, the result is not quite perfect:
I sometimes use Tidy when I'm stumped because I find its error messages easier to understand than those of the W3C validator. However, I never trust it completely. Until it's updated to truly understand HTML5, it sometimes deletes perfectly valid HTML5 tags. There's really no substitute for good old detective skills and the official W3C validator.
Did you figure out that last error? I tried to close a paragraph with <p> rather than </p>. That sort of thing freaks out an XHTML validator, but HTML takes it in stride, so you might not even know there is a problem. Tidy does notice the problem and repairs it. Remember this when you're working with a complex page and something doesn't seem right. It's possible there's a mistake you can't even see, and it's messing you up. In that case, consider using a validator and Tidy to figure out what's going wrong and fix it.
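For reference, here's roughly what the first paragraph looks like once the stray <p> becomes a proper closing tag. Treat this as a sketch of the fix rather than the official listing; the complete corrected page is oxWheelsCorrect.html.

```html
<p>
A pair of Oxen were drawing a heavily loaded wagon along a
miry country road. They had to use all their strength to pull
the wagon, but they did not complain.
</p> <!-- was <p>, which started a new, empty paragraph instead of closing this one -->
```

Because HTML quietly closes an open paragraph when a new one begins, the original mistake produced an extra empty paragraph rather than a visible error.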