Tuesday, 11 May 2010

Bridging the Human and Data Webs

Introduction
“Today's web is built predominantly for human consumption. Even as machine-readable data begins to appear on the web, it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, web browsers can provide only minimal assistance to humans in parsing and processing web data: browsers only see presentation information.” (W3C RDFa Primer)

Web pages are written using a language called HTML (HyperText Markup Language). Without going too far into a technical explanation, HTML is basically a means of marking up the content of your web page with tags, which your browser then uses to decide how to display the page.

For example, to make a piece of text bold on a web page, you wrap it in the bold tags <b> and </b>:

<b>This is bold text</b>

When viewed in a browser, the above text will appear bold, courtesy of the opening <b> and closing </b> tags. HTML defines dozens of such tags for formatting text, adding links and images, and structuring the page in other ways. All web pages are built from HTML, so it is a very important language within the world of the web.

Web pages are naturally created for humans to look at. Humans visit web sites, buy things online, read blogs and so on, so this makes perfect sense. But there are scenarios where it would be useful for a web page to be understood by other applications or programs as well. Say your web page is an events page, containing the dates, times and places of events. Wouldn't it be really useful if, simply by marking these events up in a specific way, you made it possible to add them straight into your Outlook calendar?

By creating more intelligent web pages, we make it possible to build more intelligent applications that use the additional information in those pages. Such applications can act upon and respond to this information, merge related content and update other applications, all based on the additional markup (tags and attributes) embedded in the page.

RDFa
This is where RDFa (Resource Description Framework in attributes) comes in. RDFa is a W3C Recommendation for embedding rich metadata into web content. It does this by adding attribute-level extensions to HTML (or, more precisely, XHTML).

To prevent web page authors from adding their own ad hoc metadata terms to their web pages, RDFa metadata is normally expressed using well-established vocabularies, such as the elements defined by the Dublin Core standard. The Dublin Core metadata elements are widely used in library science as well as in computing. They provide a standard way of adding supplementary information to resources, including web pages.

Dublin Core
The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements that can be added to content:
  • Title
  • Creator
  • Subject
  • Description
  • Publisher
  • Contributor
  • Date
  • Type
  • Format
  • Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights
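As a small illustration (a sketch, not part of any official Dublin Core tooling), the list above can be used to check that a dc: property on a page actually names one of the 15 elements. The helper name is_dcmes_term is hypothetical, and it assumes the dc prefix has been bound to http://purl.org/dc/elements/1.1/, as in the examples later in this post.

```python
# The 15 elements of the Simple Dublin Core Metadata Element Set, in the
# lowercase form used in property names such as dc:title.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def is_dcmes_term(curie: str) -> bool:
    """Hypothetical helper: True if curie is dc:<one of the 15 elements>.

    Assumes the "dc" prefix is bound to http://purl.org/dc/elements/1.1/.
    """
    prefix, _, name = curie.partition(":")
    return prefix == "dc" and name in DCMES_ELEMENTS


print(is_dcmes_term("dc:creator"))  # True
print(is_dcmes_term("dc:author"))   # False: there is no "author" element
```

A check like this would catch a plausible slip such as writing dc:author where dc:creator was meant.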


Example of adding information about a web page
In the following example, the text on the web page is enhanced by marking up the <h2> and <h3> heading tags to indicate that they represent the title and author of the web page.

Here is the original XHTML
<h2>Dominic’s Little Blog</h2>
<h3>Dominic Burford</h3>


Here is the transformed XHTML containing RDFa attributes.
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
<h2 property="dc:title">Dominic’s Little Blog</h2>
<h3 property="dc:creator">Dominic Burford</h3>
...
</div>


The dc above stands for Dublin Core.

To explain what we have done here, we have introduced the Dublin Core namespace to the page by adding a reference to http://purl.org/dc/elements/1.1/ in an xmlns:dc attribute:
<div xmlns:dc="http://purl.org/dc/elements/1.1/">

We then mark the <h2> tag as the title of the document by setting its property attribute (an attribute introduced specifically to support RDFa) to dc:title.
<h2 property="dc:title">Dominic’s Little Blog</h2>

We then mark the <h3> tag as the author of the document by setting its property attribute to dc:creator.
<h3 property="dc:creator">Dominic Burford</h3>
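To show how an application might read this markup, here is a deliberately simplified sketch using Python's standard html.parser module. It is not a full RDFa processor: it only understands xmlns: prefix declarations and the property attribute, assumes well-formed XHTML (every element explicitly closed), and implicitly treats the page itself as the subject of every statement. The class name RDFaPropertyExtractor is hypothetical.

```python
from html.parser import HTMLParser


class RDFaPropertyExtractor(HTMLParser):
    """Collect (property URI, text) pairs from a small XHTML fragment.

    Simplified sketch: understands only xmlns:prefix declarations and the
    property attribute, and assumes every element is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}  # prefix -> namespace URI, from xmlns:prefix="..."
        self.open = []      # per open element: [property URI, text parts] or None
        self.pairs = []     # collected (property URI, text) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        entry = None
        if "property" in attrs:
            # Expand the CURIE, e.g. dc:title -> http://purl.org/dc/elements/1.1/title
            prefix, _, local = attrs["property"].partition(":")
            entry = [self.prefixes[prefix] + local, []]
        self.open.append(entry)

    def handle_data(self, data):
        # Text counts towards every enclosing element that carries a property.
        for entry in self.open:
            if entry is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        entry = self.open.pop()
        if entry is not None:
            self.pairs.append((entry[0], "".join(entry[1]).strip()))


PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
<h2 property="dc:title">Dominic’s Little Blog</h2>
<h3 property="dc:creator">Dominic Burford</h3>
</div>
"""

extractor = RDFaPropertyExtractor()
extractor.feed(PAGE)
extractor.close()
for uri, text in extractor.pairs:
    print(uri, "->", text)
```

A complete RDFa processor would also handle the remaining RDFa attributes, prefix scoping and base URIs.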

Example of adding contact information
Next, we can embed contact information (email, phone number, etc.) in the page. This allows visitors to the web site to add the contact details to their contact management system easily.

Here is the original XHTML
<div>
<p>
 Dominic Burford
</p>
<p>
 Email: <a href="mailto:dominic@blog.com">dominic@blog.com</a>
</p>
<p>
 Phone: <a href="tel:+999 999">+999 999</a>
</p>
</div>


Here is the transformed XHTML containing RDFa attributes
<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<p property="foaf:name">
 Dominic Burford
</p>
<p>
 Email: <a rel="foaf:mbox" href="mailto:dominic@blog.com">dominic@blog.com</a>
</p>
<p>
 Phone: <a rel="foaf:phone" href="tel:+999 999">+999 999</a>
</p>
</div>


Friend-of-a-Friend
To add contact information, you are obviously going to need to describe yourself (or whoever the author is). Unfortunately, the Dublin Core vocabulary does not contain property names for describing contact information, but the Friend-of-a-Friend (FOAF) vocabulary does**. With RDFa it is quite common to combine vocabularies (such as Dublin Core and Friend-of-a-Friend) on the same page.

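** FOAF is a vocabulary for describing people and the relationships between them, the kind of information social networks such as Facebook use when finding friends you have in common with other people. Relationships between people are expressed using the knows property (foaf:knows).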
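As a rough illustration of that idea, the sketch below finds friends in common from a list of foaf:knows statements, each written as a (subject, predicate, object) triple. The people and their example.org URIs are made up, and the helper names are hypothetical. The sketch treats knows as one-directional: a statement that A knows B says nothing about whether the reverse statement has been published.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Made-up people and relationships, purely for illustration.
TRIPLES = [
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/carol"),
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/dave"),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/carol"),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/erin"),
]


def people_known_by(person, triples):
    """The set of people that `person` states they know via foaf:knows."""
    return {obj for subj, pred, obj in triples if subj == person and pred == FOAF_KNOWS}


def friends_in_common(a, b, triples):
    """People whom both a and b know: the intersection of their knows sets."""
    return people_known_by(a, triples) & people_known_by(b, triples)


print(friends_in_common("http://example.org/alice", "http://example.org/bob", TRIPLES))
# {'http://example.org/carol'}
```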

To explain what we have done, we have introduced the Friend-of-a-Friend namespace to the page by adding a reference to http://xmlns.com/foaf/0.1/.

This is similar to what we did in the earlier example.
<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">

Next we introduce the contact information using elements from the FOAF vocabulary.

Contact Name
<p property="foaf:name">

Email
<a rel="foaf:mbox" href="mailto:dominic@blog.com">dominic@blog.com</a>

Telephone number
<a rel="foaf:phone" href="tel:+999 999">+999 999</a>

The typeof attribute declares the contact to be of the FOAF type Person (foaf:Person), and the name, mbox (mailbox, i.e. email) and phone properties describe that person.
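To make the effect of typeof, property and rel concrete, the following simplified sketch (built on Python's html.parser, not a conforming RDFa processor) turns the contact markup into (subject, predicate, object) triples. In this sketch typeof starts a new anonymous subject (a blank node, written _:b0), property takes its value from the element's text, and rel takes its value from the element's href. The class name RDFaTripleExtractor is hypothetical, and the input is assumed to be well-formed XHTML.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class RDFaTripleExtractor(HTMLParser):
    """Turn a small, well-formed XHTML fragment into (subject, predicate, object) triples.

    Simplified sketch covering only xmlns:prefix, typeof, property and rel+href;
    a conforming RDFa processor applies many more rules.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}  # prefix -> namespace URI
        self.stack = []     # per open element: (subject, [predicate, text parts] or None)
        self.triples = []
        self.bnode_count = 0

    def expand(self, curie):
        # foaf:name -> http://xmlns.com/foaf/0.1/name (a single CURIE only)
        prefix, _, local = curie.partition(":")
        return self.prefixes[prefix] + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        # Inherit the subject from the parent element ("<>" = the page itself).
        subject = self.stack[-1][0] if self.stack else "<>"
        if "typeof" in attrs:
            # typeof starts a new, anonymous subject (a blank node).
            subject = f"_:b{self.bnode_count}"
            self.bnode_count += 1
            self.triples.append((subject, RDF_TYPE, self.expand(attrs["typeof"])))
        if "rel" in attrs and "href" in attrs:
            # rel links the subject to the resource named in href.
            self.triples.append((subject, self.expand(attrs["rel"]), attrs["href"]))
        literal = [self.expand(attrs["property"]), []] if "property" in attrs else None
        self.stack.append((subject, literal))

    def handle_data(self, data):
        for _, literal in self.stack:
            if literal is not None:
                literal[1].append(data)

    def handle_endtag(self, tag):
        subject, literal = self.stack.pop()
        if literal is not None:
            # property takes its value from the element's text content.
            self.triples.append((subject, literal[0], "".join(literal[1]).strip()))


PAGE = """
<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<p property="foaf:name">
 Dominic Burford
</p>
<p>
 Email: <a rel="foaf:mbox" href="mailto:dominic@blog.com">dominic@blog.com</a>
</p>
<p>
 Phone: <a rel="foaf:phone" href="tel:+999 999">+999 999</a>
</p>
</div>
"""

extractor = RDFaTripleExtractor()
extractor.feed(PAGE)
extractor.close()
for triple in extractor.triples:
    print(*triple)
```

Each printed line is one statement an application could use, for example to create a new entry in a contact management system.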

Summary
Although these examples were fairly trivial, they demonstrate the usefulness of adding machine-readable information to web pages in a consistent way, using standard attributes and vocabularies. Adding such information makes web pages much more useful to applications, and brings the web ever nearer to becoming a semantic and social web.
