Choosing a doctype

Making sure that your webpages validate is important because valid pages render more consistently across different browsers. That means you can spend less time debugging cross-browser incompatibilities and more time with your loved ones/down the pub/plotting world domination.
[Image: programmer dreaming of world domination]

But what are we going to validate against? The current most popular options are:

  • HTML 4.01 Transitional
  • HTML 4.01 Strict
  • XHTML 1.0 Transitional
  • XHTML 1.0 Strict

HTML or XHTML?

The only difference between HTML 4.01 and XHTML 1.0 is syntax. XHTML documents are XML documents and HTML documents aren’t. This means that XHTML has some extra rules to make it compatible with XML (the most obvious being lowercase tags and terminating empty elements).

There are no differences between the tags and attributes supported in XHTML 1.0 and HTML 4.01. Each flavour (strict, transitional and frameset) supports the same elements in both standards. The only difference is the syntax. XHTML 1.1 goes further and removes some elements and attributes that were allowed in XHTML 1.0, mostly presentational ones, because presentation should be controlled with CSS.
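
To make the syntax difference concrete, here is a rough sketch of the same fragment written both ways. The file name logo.png is made up for the example; everything else is just the two sets of syntax rules applied to the same content.

```html
<!-- HTML 4.01: tag and attribute names are case-insensitive, empty elements
     such as BR and IMG have no closing slash, and the </P> end tag is optional -->
<P>First line<BR>
<IMG SRC="logo.png" ALT="Logo">

<!-- XHTML 1.0: names must be lowercase, attribute values must be quoted,
     and every element must be closed, including empty ones -->
<p>First line<br />
<img src="logo.png" alt="Logo" /></p>
```

Both fragments use exactly the same elements and attributes; only the way they are written down changes.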

HTML 4.01

The main advantages of using regular HTML are:

  • It is supported by all browsers. Older browsers and alternative browsers like mobile browsers may not understand XHTML. Even IE6 and IE7 support is a bit dubious.
  • The syntax is simpler. Empty tags don’t need to be terminated. Elements like tbody can be automatically inferred.
  • Even if there are mistakes, the page will still attempt to render and in most cases will still be usable by the person viewing the page. In some browsers when XHTML is served as “application/xhtml+xml” errors in markup will cause an error message to be displayed rather than the page’s content.

XHTML 1.0

The main advantages of using XHTML are:

  • It’s XML so you can use XML technologies like XSLT and XPath on the document.
  • Markup mistakes are easy to find because some browsers will display an error message if they’re told to treat the page as an XML document. This in theory makes maintenance easier but may make it a bad choice for documents with dynamic or user-contributed content.
  • In theory it’s extensible so it’s possible to plug other XML standards (imagine scalable vector graphics) into the same document. This doesn’t have widespread browser support yet.

There’s no good answer for ASP.NET developers

Choosing between XHTML and HTML for ASP.NET developers is choosing between a rock and a hard place.

ASP.NET 2.0 only really supports XHTML by default. Deciding to use HTML instead will mean fighting the framework. There isn’t a built-in way to get it to render HTML syntax (that I could find; please leave a comment if I’m wrong). You can force ASP.NET 2.0 to render HTML 4.01 compatible syntax but it could lead to unpredictable problems because the framework expects to be producing XHTML syntax.

On the other hand, the browser support for XHTML isn’t really there yet. IE6 and IE7 don’t support XHTML in the same way that other new browsers like Firefox and Opera do. IE can display XHTML webpages if the web server sends them as HTML documents (that is, with a MIME type of text/html), but it is really only displaying them as if they were HTML pages. This is technically allowed for XHTML 1.0 documents only, and it can be problematic. Other browsers will also treat any XHTML page sent as text/html as regular HTML; it doesn’t trigger XML mode.

The problem is that XHTML pages are meant to be sent from the server as XML (MIME type application/xhtml+xml). Other browsers can display this, but there are many subtle differences from the way they display HTML pages. The one that makes my blood run a bit cold is that accessing element.tagName in JavaScript returns an uppercase tag name for a page running in text/html mode and a lowercase tag name for a page running in application/xhtml+xml mode.
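
One way to defend against the tagName difference is to normalise the case before comparing. The page below is only a sketch: the isTag helper and the "demo" id are names I made up for the example, and the page is assumed to be served as text/html.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>tagName case check</title>
<script type="text/javascript">
// Hypothetical helper: compares tag names case-insensitively, so it gives
// the same answer whether tagName comes back as "P" or as "p".
function isTag(element, name) {
    return element.tagName.toLowerCase() === name.toLowerCase();
}

window.onload = function () {
    var para = document.getElementById("demo");
    // Show the raw tagName next to the result of the normalised comparison.
    alert(para.tagName + " matches p: " + isTag(para, "p"));
};
</script>
</head>
<body>
<p id="demo">Hello</p>
</body>
</html>
```

Script that compares tagName against a literal like "DIV" without normalising it is exactly the kind of code that breaks when a page moves from one mode to the other.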

IE can’t display pages that are sent from the server as XML at all. Here’s what IE users will see if they try to view a page sent as XML:
[Image: the page is displayed as an XML document in IE]

To find out more about the differences between HTML and XHTML, there’s a great explanation on SitePoint.

Configuring ASP.NET

ASP.NET 2.0 generates XHTML by default and doesn’t have a mode for generating HTML 4.01-specific syntax (let me know in the comments if I’m wrong about this; I searched but couldn’t find a setting that would work).

If you decide that HTML 4.01 is a better fit for your site, you can use the ASP.NET 2.0 adapter model to create a custom HtmlTextWriter object that generates HTML 4.01 compliant markup. This has the advantage of giving you very fine-grained control over the markup that is created by the elements on your page, but it uses fairly advanced ASP.NET functionality and could lead to unpredictable problems with code that assumes XHTML compliant syntax. This is not the path for an easy life, but it’s possible if you really want or need to use HTML 4.01.

The pragmatist in me says that fighting with the framework is ultimately a pretty futile thing to do. I think we’re probably stuck with XHTML syntax for this version of ASP.NET. Hopefully Microsoft will make it a choice in future versions, especially now that HTML development is continuing with HTML 5.

Strict vs transitional

The main difference between the strict and transitional versions is the set of tags that they support. Transitional versions contain older tags that are retired in the strict version. For example, it is perfectly legitimate to use an iframe tag in HTML 4.01 Transitional but not in HTML 4.01 Strict.

Elements left out of the strict version tend to be tags that historically haven’t worked consistently across different browsers (like the iframe) or tags that have been replaced by a new technology (like the font tag, which has been replaced by CSS styling).

It doesn’t matter whether you’re using HTML or XHTML, you should be using the strict doctype where possible. The legacy tags in HTML 4.01 Transitional will one day be retired and have been replaced by better ways to do things. Only use the transitional version if you need some of the elements that it supports.

Configuring ASP.NET

Fortunately setting strict or transitional is pretty simple.

Markup generated by ASP.NET (for example by server controls) targets the transitional doctype by default. You can configure it to target strict instead by setting the xhtmlConformance element in your site’s web.config:

    <system.web>
        …
        <xhtmlConformance mode="Strict" />
        …
    </system.web>

ASP.NET does not use the doctype in the webpage to decide whether to generate strict or transitional markup. It’s up to the developer to make sure that the xhtmlConformance setting matches the doctype of the webpages in the site.

Here’s an example of the difference the xhtmlConformance setting makes:
[Image: differences between strict and transitional]

Doctypes

Once you’ve chosen what type of HTML you want to use, you need to let the browser know by adding a doctype to your page. This is a statement you add to the first line of the webpage. It has the type of HTML/XHTML to use and a link to the DTD file that defines it.

Here’s a page with an HTML 4.01 strict doctype:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
        …
    </html>

Browsers are very particular about what doctypes they will accept. They must be completely accurate or the browser will use a partial standards mode or will act as though the page has no doctype at all. There’s a complete list of valid doctypes on the W3C website.
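
For reference, these are the doctype declarations for the four options listed at the top of this post, as I understand them from the W3C recommendations. Only one of them goes in a page, on the very first line; the comments are just labels for this listing and shouldn’t be copied in front of the doctype.

```html
<!-- HTML 4.01 Transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

It’s safest to copy and paste these rather than type them out, since a single wrong character is enough to change the rendering mode.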

Pages without a doctype are rendered in a backwards-compatible mode called quirks mode, which behaves differently from one browser to the next.

Posted on 31 Dec 07 by Helen Emerson (last updated on 31 Dec 07).
Filed under ASP.NET