HTML5 Microdata: What is it, and why should you care?

Tags: html5, microdata, microformat, tutorial, vocabulary
October 4, 2010

HTML5 is booming. One of the main reasons more and more articles about this subject are popping up on the web is that more and more web browsers support it. Even "Internet Explorer", the browser web developers fear most, is making huge progress: Microsoft is working to make IE9 HTML5-ready, and the demos it has created already tell us enough.

But what exactly is HTML5? The subject is way too big to fit into one blog article, so I’ll cover several aspects of it in several posts. For today, we’ll take a look at one of the new features of HTML5 called microdata. I’ll explain what it is, and why you should start using it.

I assume you don’t want to read any further if you can’t start using HTML5 microdata right now. Lucky for you, microdata is one of the features that you can start using today! Browsers that don’t (fully) support HTML5 will simply ignore the microdata attributes. And for those who love SEO: search engines will absolutely love microdata.

Semantics

As many people know by now, one of the goals of HTML5 is to be semantic (to give meaning to the HTML). Consider the following piece of code (many websites these days have similar HTML):
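A minimal sketch of such markup, assuming hypothetical id and class names and placeholder text (none of it taken from the original listing), could look like this:

```html
<!-- Sketch only: the id/class values and all text are illustrative assumptions -->
<div id="header">
  <h1>My blog</h1>
  <div id="menu">
    <a href="/">Home</a>
    <a href="/about">About</a>
  </div>
</div>
<div id="content">
  <div class="post">
    <h2>Post title</h2>
    <p>Post text goes here.</p>
  </div>
</div>
<div id="footer">
  <p>Copyright notice</p>
</div>
```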

Seems pretty familiar, doesn’t it? But take note of the id and class attributes that are used. The developer added these values to reference the elements from JavaScript or CSS, and gave them logical names that only he/she understands (id and class can contain anything, even words in other languages). When you strip those, you’ll see what the browser actually sees:
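Assuming a div-based page with a header, a menu, one post and a footer (an illustrative structure with placeholder text, not the original listing), the stripped markup could look roughly like this:

```html
<!-- Sketch only: the same kind of layout with every id and class removed -->
<div>
  <h1>My blog</h1>
  <div>
    <a href="/">Home</a>
    <a href="/about">About</a>
  </div>
</div>
<div>
  <div>
    <h2>Post title</h2>
    <p>Post text goes here.</p>
  </div>
</div>
<div>
  <p>Copyright notice</p>
</div>
```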

Since the browser doesn’t understand what id="footer" means, it doesn’t do anything special with it. The browser is blind to these values, and so are search engines. The only thing they see is a bunch of <div> (division) elements. At this point, the browser doesn’t know what kind of information is stored inside each element.

That is where HTML5 semantics kick in. Here, we’ve rewritten the previous HTML to use a couple of new HTML5 elements to give the HTML more meaning:
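A hedged sketch of such a rewrite, assuming the page consists of a header with navigation, one post and a footer (the structure and text are illustrative placeholders):

```html
<!-- Sketch only: header, nav, article and footer are real HTML5 elements;
     the content inside them is made up -->
<header>
  <h1>My blog</h1>
  <nav>
    <a href="/">Home</a>
    <a href="/about">About</a>
  </nav>
</header>
<article>
  <h2>Post title</h2>
  <p>Post text goes here.</p>
</article>
<footer>
  <p>Copyright notice</p>
</footer>
```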

As you can see, the HTML now has a lot more meaning. Not only will your browser understand these elements, search engines will also know where important data can be found (the article element will probably contain more useful info than the footer).

HTML5 has loads of new elements, each created for a specific task (examples are the figure, details and nav elements). Their goal is to give more meaning to the HTML (read more about HTML and semantics on 1stwebdesigner).

We want more!

Cool! We can now give each HTML element a more meaningful name. But there is a problem with HTML5’s goal of being semantic through elements alone. What if you are in need of an element, but it doesn’t exist? Take a look at the following examples:
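As an illustration only, elements like the ones below would be convenient to have. Every tag name in this sketch is invented for the example, and none of them is part of HTML5:

```html
<!-- None of these elements exist in HTML5; the names are made up for illustration -->
<person>
  <name>Jane Doe</name>
  <jobtitle>.NET developer</jobtitle>
</person>
<recipe>
  <ingredient>Flour</ingredient>
  <ingredient>Sugar</ingredient>
</recipe>
```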

That looks pretty good, doesn’t it? But sadly, none of the elements described above exist. It would be impossible for the W3C to create all these kinds of very specific elements. But does this mean HTML5 doesn’t meet the goal of being fully semantic?

Microdata to the rescue

Now we (finally) get to the point of microdata. In short, microdata allows the user to create "custom" elements (sort of), to give a specific meaning to them. But how can this be achieved?

We will need a vocabulary to make this work, set up in an object-oriented way: an item has a type, and that type defines named properties. Consider the following HTML:
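A minimal sketch of such markup, assuming a hypothetical person (Jane Doe) whose details are all made up; no microdata has been added at this stage:

```html
<!-- Sketch only: Jane Doe, her job, address and URL are illustrative assumptions -->
<div>
  <img src="jane.jpg" alt="Photo of Jane Doe">
  <h1>Jane Doe</h1>
  <p>.NET developer at Example Corp</p>
  <p>
    Main Street 1,
    Springfield
  </p>
  <a href="http://www.example.com/">www.example.com</a>
</div>
```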

Looks pretty good, doesn’t it? But how does "a machine" (your browser, a search engine) know which part of this HTML displays your picture, or tells us your name? For this example, we’re going to use the vocabulary called data-vocabulary.org (created by Google). Take note that you can create your own vocabulary, extend it and use it to suit your specific needs. Just make sure you document it, since you want the vocabulary to grow and people to use it.

Since we’re going to describe a person, we can make use of the Person microdata. Let’s add this to our previous code, and see what the HTML looks like with added microdata:
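A hedged sketch of person markup annotated with the data-vocabulary.org Person and Address types. The person (Jane Doe) and every detail about her are assumptions made up for this example:

```html
<!-- Sketch only: Jane Doe and all of her details are made up -->
<div itemscope itemtype="http://data-vocabulary.org/Person">
  <img itemprop="photo" src="jane.jpg" alt="Photo of Jane Doe">
  <h1 itemprop="name">Jane Doe</h1>
  <p>
    <span itemprop="title">.NET developer</span> at
    <span itemprop="affiliation">Example Corp</span>
  </p>
  <!-- A nested item: the address property is itself an item of type Address -->
  <p itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
    <span itemprop="street-address">Main Street 1</span>,
    <span itemprop="locality">Springfield</span>
  </p>
  <a itemprop="url" href="http://www.example.com/">www.example.com</a>
</div>
```

Here itemscope marks an element as the root of an item, itemtype names the vocabulary type that item belongs to, and each itemprop labels one property of the item. Nesting a second itemscope inside the first is what gives the markup its object-like structure.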

Whoa, our HTML just exploded! Why would you do that? Take note of the added itemscope and itemtype attributes.

Apart from the fact that you’ll feel good because your HTML is proper HTML5, Google (and other search engines) will reward you for your effort. They can now easily scan your HTML (since they know which parts of it contain a name, an address, etc.) and prettify the search results. When I enter the HTML described above in the Google Webmasters Rich Snippets tool, it shows an example of how it would look in the Google search results:

As you can see, Google extracted the data from the page and displays it in the search results. Looks pretty cool, doesn’t it? This valuable information could help people find your page faster. For example, in the future, a search for ".NET developer" could bring up actual people (if their pages use microdata).

Conclusion

Although the goal of HTML5 to be semantic is being achieved using microdata, the only use for it (right now) is to enhance search results (we’re helping search engines with microdata). On the other hand, it’s pretty easy to implement and people might find your page faster, so why not use it right now?

As for the future, I hope browsers will make smart use of microdata too. For example, when contact info is stored on the page using microdata, one press of a button could copy this contact directly into our address book, without us having to type anything. Use your imagination to extend the use of microdata even further.

What do you think about this neat HTML feature? Are we becoming "Google slaves", or does it really add great value to the next version of HTML? Feel free to share.

Marcofolio.net