TMT: Document Type Declaration

This is an installment of Ten Minute Tech, where I pick a technology-related subject and write a paragraph or two about what I know. I then pick and read a reference article on the subject and write another paragraph or two about what I learned. This edition’s topic is Document Type Declarations.

Before

In my opinion, Document Type Declarations (DOCTYPEs) are one of the most overlooked items in web development. What I mean is that many developers either don’t use them at all or use them without understanding why they should. It really is remarkable how much a proper DOCTYPE in your XHTML files can affect the rendering of your pages, especially when it comes to cross-browser coding.

I am not going to go into much more detail about how a proper DOCTYPE can affect web page rendering (volumes could be written on it). I guess I am just really tired of people still complaining about the Box Model problem in Internet Explorer 6; if they used a proper DOCTYPE, they wouldn’t have to worry about it.

Now, a DOCTYPE is an instruction (a line of code) that links to a file describing the markup syntax for the rest of the page. Its most practical application is at the top of an XHTML or HTML page, where it tells the browser which tags and attributes the page can use. The file the DOCTYPE links to is written in a language of its own, known as a Document Type Definition (DTD).

For a non-technical example, take the English language. If you were writing a letter to a friend in English and wrote ‘This letter is written in English’ at the very top, that line would be the equivalent of the DOCTYPE. You are simply stating right away what someone is going to need to understand in order to read the rest of the letter.

The Document Type Definition (DTD) would be the dictionaries, thesauruses and grammar books that someone could use to decipher the message (if they didn’t already know the language).

For the computer, a proper DOCTYPE not only declares the language but also tells the computer where to find the DTD. This would be like not only writing that the letter is in English, but also listing a bunch of ISBNs so the reader can look up the English books.

DOCTYPEs and their definitions are typically used to define markup languages (HTML, XHTML, RSS, etc.), but I believe they can be extended to almost any type of syntax (if only in theory).

As for the syntax of the DOCTYPE itself, I can never remember it; I always have to look it up.
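
For reference, here is a sketch of what one looks like in practice, using the XHTML 1.0 Strict declaration published by the W3C. The page around it (title, paragraph text) is made up purely for illustration.

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
  Reading the declaration above from left to right:
    html                 : the root element the document must start with
    PUBLIC               : a public identifier follows (the first quoted string)
    second quoted string : the system identifier, a URL where the DTD file lives
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Example page</title>
  </head>
  <body>
    <!-- Hypothetical content, only here to make the example complete. -->
    <p>Hello, DOCTYPE.</p>
  </body>
</html>
```

The declaration goes first on purpose: as far as I know, Internet Explorer 6 falls back to its quirks rendering if an XML prolog precedes the DOCTYPE.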

Article: Document Type Declaration

After

The article itself is actually very narrowly written. It focuses on the various HTML and XHTML DOCTYPEs (it could probably be expanded). It does, however, mention that DOCTYPEs are used for SGML- and XML-based documents.

I think the real potential in DOCTYPEs and DTDs is the idea that they are a practical way to create a computer-readable language that describes other languages (e.g. human languages). I think this could really be the foundation of something much greater. Just think of a time when we could truly teach a computer how to read and understand a human language. The applications in translation and human-to-computer understanding could be great, and I think we will slowly grow in this direction. I would be interested to know if there are projects and studies out there today working toward this very thing.

Some good additional reading is the article on Document Type Definitions. It goes into the nuts and bolts of how these technologies are used today to describe custom markup languages. Unfortunately, the article does not talk about the theory behind it.
http://en.wikipedia.org/wiki/Document_Type_Definition

Here is an article on markup languages in general. It briefly touches on semantic markup: the idea that the code reflects the [human] meaning of the information inside it.
http://en.wikipedia.org/wiki/Markup_language

One Response to “TMT: Document Type Declaration”

  1. jordoncm Says:

    I am getting a lot of misdirected feedback on this article and I just wanted to take a moment to clarify what DTD/Schema really is.

    DTD/Schema is a set of computer-readable languages that can be used to describe the structure and format of a markup language. It is used today to define the structure of many markup languages.

    The idea is that you can define your own markup language specific to your needs.

    For example, let’s say I was trying to write an application that stores and displays addresses.

    A sample of the data I am storing would be:

    123 Main St. Apt 210. Somewhere IA 12345

    To a computer, the line above really means nothing beyond being a string of text.

    Now, if I were to use a DTD and a DOCTYPE, I could store the information like this:

    <address>
      <street>123 Main St.</street>
      <apartment>210</apartment>
      <city>Somewhere</city>
      <state>IA</state>
      <zipcode>12345</zipcode>
    </address>
    

    In this format I bring more meaning to the data by wrapping it in a custom markup language. It becomes much easier for a computer to extract specific parts of the data. The markup also makes the data more meaningful to a human reading over it.

    We have effectively bridged a gap between the human and the computer. We have brought more meaning to a human reading the raw data and we have enabled the computer to analyze and break down the data in the same way a human would.

    DTD/Schema and a DOCTYPE would be used to tell the computer that inside the tag ‘address’ there are the tags ‘street’, ‘apartment’, etc.
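
    As a minimal sketch of what that could look like, here is the same address with the DTD written inline in the DOCTYPE (an internal subset). The content model is my assumption for illustration only: the tags must appear in this order, and ‘apartment’ is optional.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE address [
      <!-- An address holds these child tags, in this order; apartment may be omitted. -->
      <!ELEMENT address (street, apartment?, city, state, zipcode)>
      <!-- Each child tag holds plain text only. -->
      <!ELEMENT street    (#PCDATA)>
      <!ELEMENT apartment (#PCDATA)>
      <!ELEMENT city      (#PCDATA)>
      <!ELEMENT state     (#PCDATA)>
      <!ELEMENT zipcode   (#PCDATA)>
    ]>
    <address>
      <street>123 Main St.</street>
      <apartment>210</apartment>
      <city>Somewhere</city>
      <state>IA</state>
      <zipcode>12345</zipcode>
    </address>
    ```

    The same declarations could instead live in a separate file and be referenced with a line such as <!DOCTYPE address SYSTEM "address.dtd">, where address.dtd is a hypothetical file name. A validating parser would then reject an address that, for example, left out the city.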

    My comments about applying this in the arena of artificial intelligence were simply remarking on the theory it provides: we have a computer language that can define the syntax of another language. I think this idea could someday grow into use with human spoken language.
