The Perversion of HTML

Everybody is doing HTML wrong.

Oh, alright, a few people are doing HTML properly. But the number is so small as to be statistically insignificant. Most "Web designers" don't follow the rules of proper HTML. Many Web designers don't even realise there are rules for correctly writing HTML.

A Historical Overview

In the days before HTML, computers didn't have a convenient way to share documents. Each different type of computer stored documents in a different format, so copying a document from a PC to a Mac® to a Unix system to a mainframe was very awkward and cumbersome. Even on the same type of computer, different pieces of software treated documents in different ways and often couldn't talk to each other. As more and more computers became connected over the Internet, a way was needed of seamlessly moving documents around so that anybody could see them on any computer.

That problem was solved ten years ago by a visionary computer scientist, Tim Berners-Lee. In one fell swoop, he invented HyperText Transfer Protocol (HTTP), a system for communicating between different computers, and HyperText Mark-up Language (HTML), a document format that every computer in the world could potentially read. In short, he invented the World Wide Web.

Some Technical Stuff

HTML, as defined by Berners-Lee and his early co-workers, was a simple system of codes that described the structure of a document. In theory any document in the world, from the "Gettysburg Address" to Stephen King's It, from Einstein's paper on Special Relativity to a script for Frasier, could be completely and thoroughly defined by these simple codes.

And the codes really were simple. <P> denoted a paragraph, <H1> indicated the document's main title, <H2> showed a sub-heading, <BLOCKQUOTE> showed a paragraph of quoted material, and so on.

<H1>My Simple Document</H1>

<P>This is a very simple document.

<P>It has a title with three paragraphs below it.

<P>See how easy it is?

It really was incredibly simple. So simple that anybody with anything to say could package up their writing in HTML and have it read anywhere in the world. The World Wide Web exploded.

Back To The History

The software used to read an HTML page is called a Web Browser (or just Browser). Initially, browsers were very simple pieces of software. All they needed to do was understand how to break a document into paragraphs wherever the little <P> codes told them to.

Then along came Netscape® Navigator.

In their own way, the people who founded the company that would become Netscape were as visionary as Berners-Lee. For they realised that there was money in the Web. And one way to make the money was to make sure that everybody used your Web browser. So this is what they set out to do. Combining clever software design with aggressive marketing, they made sure that almost everybody who used the rapidly-expanding Web did so with their Navigator browser. Netscape became one of the biggest success stories in history and a household word in the computer industry. (And when Microsoft® repeated the formula a few years later but did it better, Netscape went crying for government intervention. It always makes me think of spoiled kids in the playground, but that's a rant for another day.)

What made Netscape Navigator take off? Why did every Web user in the world suddenly want it?

Netscape added bits to HTML. They took Tim Berners-Lee's simple document mark-up system, and blew it out of all recognition. Suddenly, Web designers could do tricks in HTML that Berners-Lee had never considered. You want coloured text? You want different fonts? You want a page to sing and dance (literally)? Great, here's how you do it! The only problem is, all these new tricks only show up if your readers are using Netscape Navigator. And everybody wants these new tricks. What a superb marketing ploy!

Except, it's an open market. Nobody owns HTML. What Netscape did, others could copy. So they did. Suddenly, we had a market flooded with Browsers. (Everybody knows Netscape Navigator and Microsoft Internet Explorer, but believe me there are many others. I have nine different ones on my computer.)

Of course, every company wants you to use their browser, not the other guys'. So everybody raced to add new and different features. Everybody wanted their browser to do something the others didn't. No two browsers ever worked the same. An HTML page which looked great on one browser would look awful, or possibly not show up at all, on another.

Remember why HTML was invented in the first place? I'll recap:

"computers didn't have a convenient way to share documents. Each different type of computer stored documents in a different format . . . different pieces of software treated documents in different ways and often couldn't talk to each other."

The wheel has come full circle.

Setting The Standard

For something to work universally, there has to be a commonly-agreed standard. So we have the metric system, for example, a standard which allows anybody in the world to measure something in the same way. We have international standards in chemistry, which allow chemists to swap formulas and share research without blowing each other up. And so on and so on. HTML should have given us a standard means of swapping computer documents, but it all went horribly wrong.

So Tim Berners-Lee tried to correct the mess. He founded the World Wide Web Consortium (W3C), a non-profit organisation dedicated to defining a standard form of HTML. They devised a back-to-basics specification that would fulfil the original purpose of HTML: allow documents to be swapped between different computer systems.

The HTML standards are remarkably simple, clean, and easy to use. Much easier than the mess of bolt-on options and ad-hoc features that Netscape and others have provided us with.

But nobody is following the standards.

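Why not? Well, if you know a little HTML, you might be surprised to find that the following tags do not exist in strict, standards-compliant HTML (and this is just a small selection):

<font color=red>
<font face=Arial>
<center>
<u>

(<b> and <i> do survive in the strict HTML 4 specification, but they describe how text looks rather than what it is, so the logical tags <strong> and <em> are the better choice.)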

Yes, I've just told all of you who use HTML to pretty-up your pages here on Themestream that you are wrong. Every piece of HTML code you have painstakingly learned is non-standard, bad practice, and basically not HTML. I'm sorry, but there it is.

I've seen a lot of attractive pages here on Themestream, and on the Web in general. Some of you are doing stunning design work, and you should be proud of what you produce. But that doesn't alter the fact that you're using your tools incorrectly.

So What IS HTML For?

If you can't make your pages look good, what's the point of using HTML? Well, HTML is not a page layout language. It was never intended to make a page look pretty. It's a logical mark-up language. It describes the structure of a document, not the look of a document.

Logical mark-up is a tool with incredible depth and breadth of use, and yet is elegant in its simplicity. I don't have the space here to explain the full implications of it. Logical page mark-up requires a huge conceptual leap away from what you are all currently using HTML for, and it deserves an essay to itself. All you need to know, for now, is that it has absolutely nothing to do with changing the colour or the typeface of your writing.
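To give a rough flavour of the difference, here is a small sketch that sets the presentational habit beside the logical one. The heading and paragraph text are invented purely for illustration, and the first line is deliberately the non-standard sort of thing this article complains about.

```html
<!-- Presentational: says how the text should look, not what it is -->
<font face="Arial" size="5" color="red">Chapter One</font>

<!-- Logical: says what the text is; the browser decides how to show it -->
<h2>Chapter One</h2>
<p>A paragraph that contains <em>emphasised</em> words and
<strong>strongly emphasised</strong> words.</p>
<blockquote><p>A paragraph of quoted material.</p></blockquote>
```

A screen reader, a text-only browser, and a graphical browser can each present the second version in whatever way suits them, because the tags describe structure rather than appearance.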

If you want to design a page and lay it out according to your exact specifications, you shouldn't be using HTML. Try Adobe® PDF or PostScript® instead, or Microsoft's RTF, or any one of a hundred proprietary page layout formats. Or try that old stalwart of publishing, offset lithography.

So Why Should You Care?

The pages you write look great when you look at them, don't they? What's the point of a standard that doesn't let you do what you want to do?

First, look at any of my articles on Themestream. If you have a recent browser version, you will see colour, boxes, columns, fonts, and pretty much everything I've just told you that the standards ban. Yet my work is 100% standards-compliant. How? Well, I don't add the pretty stuff with HTML, I use a complementary technology called CSS. Which also happens to be a standard defined by the W3C. And I can actually do more neat stuff by following the CSS standard than any of you can do with your versions of HTML.
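As a minimal sketch of that division of labour, the page below keeps its HTML purely structural and leaves colour and typeface to a CSS style sheet. The title, text, colours, and font names are assumptions made up for the example; this is not the actual style sheet behind any Themestream article.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>My Simple Document</title>
<style type="text/css">
/* Presentation lives here, not in the mark-up */
h1 { color: red; font-family: Arial, sans-serif; }
p  { font-family: Georgia, serif; }
</style>
</head>
<body>
<h1>My Simple Document</h1>
<p>This is a very simple document.</p>
<p>Its colour and typeface come from the style sheet above.</p>
</body>
</html>
```

A browser that ignores the style sheet still gets a perfectly readable title and two paragraphs, which is the whole point of keeping the two jobs separate.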

Yes, so? That doesn't explain why a standard, whether for HTML, CSS, ST:TNG, or WTF, is important.

Standards are just inherently important. Isn't this a self-evident truth? Think about the metric system I mentioned earlier. How would we function if people started measuring things in phooms, plonks, Queerbits, or any other unit they felt like inventing? How could we ever know what people were talking about or asking us for? (It's hard enough that some people deal in inches and some in centimetres.)

But Web browsers mostly cope with all this non-standard HTML, don't they? And tools like Dreamweaver or FrontPage produce this non-standard HTML too, don't they? Everything works.

No. Not everything works. Not every page displays properly everywhere. Many pages can't be accessed by blind people with screen-reading software. Many pages throw up messages like "Please download component XYZ to see anything on this page". Some pages show up completely blank in your particular browser, and you don't have a clue why. You won't get any of those problems with a page that follows the correct standards.

Think of standards in English. Yes, you could ignore the accepted standards of spelling and grammar and still make yourself understood. You're reeders be smart. Them can werk out what u r righting even tho u be righting rubbish. Yes, sure they can. But they have to work a bit harder to do it. And somebody who has English as a second language might have no chance at all of deciphering your meaning. As writers, we should aspire to follow accepted standards of grammar and spelling, even when that constrains our creativity. Or be good enough that it doesn't constrain our creativity. So it is with HTML.

Convinced Yet?

If you've made it this far, I hope you're at least considering that standard HTML might be something worth investigating.

I am considering writing a regular column on how to do HTML. I mean strict, standards-compliant HTML. But I'm not going to write a column that nobody will read. So, are you interested? Please use the "talk back" section, below, to give me your views. Maybe nobody cares about doing things right. I care, and I hope I've convinced a few of you.

Thanks for listening.

For more of my ranting about the Web, read Why I Hate Working on the World Wide Web.


© 2001 by David Meadows. All rights reserved.
2 February 2001