rNews: Why rNews?

The average web document, like the one you're reading right now, is written in the Hypertext Markup Language (HTML). In the early days of the web, whole sites consisted of nothing but folder upon folder of pre-written HTML documents. As the web matured, folks began moving page content into databases and using programming languages like Perl and PHP to generate HTML pages dynamically from that data. This separation of a website's functions into data, logic and presentation is known as a three-tiered architecture, and it's the way most modern websites are built.

In a three-tiered architecture you have a data tier, a logic tier and a presentation tier. The data tier consists of one or more databases used to house a site's content. The logic tier is a bunch of code that responds to user requests for specific pages by reading the required data out of the database, building an HTML document to display that data, and delivering that page to the requesting user's browser. The presentation tier is the HTML document and accompanying resources (scripts, images, etc.) that the logic tier delivers to the requesting user's browser.

So that's how most modern websites are built. But there is a problem lurking in this architecture: the problem of structured data. At the data tier, a news publisher can precisely specify the structure of its content. For instance, a news publisher might store an article in a database table with fields for ID, headline, byline, publication date and so forth. When a user requests that article, the logic tier fetches the required data from the highly structured database and renders it into HTML for display.
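To make the three tiers concrete, here is a minimal Python sketch, assuming an in-memory SQLite database stands in for the data tier and a plain function stands in for the logic tier. The table layout, the `render_article` function and the sample article are hypothetical. Notice that the rendered HTML keeps the headline's styling but drops the fact that it came from the `headline` field.

```python
import sqlite3
from html import escape


# Data tier (hypothetical schema): a structured table with named fields.
def build_data_tier() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        "id INTEGER PRIMARY KEY, headline TEXT, byline TEXT, pub_date TEXT)"
    )
    conn.execute(
        "INSERT INTO articles (id, headline, byline, pub_date) VALUES (?, ?, ?, ?)",
        (1, "City Council Approves Budget", "Jane Doe", "2011-06-01"),
    )
    conn.commit()
    return conn


# Logic tier: fetch one article by ID and render it as HTML.
def render_article(conn: sqlite3.Connection, article_id: int) -> str:
    row = conn.execute(
        "SELECT headline, byline, pub_date FROM articles WHERE id = ?",
        (article_id,),
    ).fetchone()
    if row is None:
        raise KeyError(article_id)
    headline, byline, pub_date = row
    # Presentation tier: only display markup survives; the field names are gone.
    return (
        f'<h1 style="font-size:16pt;font-weight:bold">{escape(headline)}</h1>\n'
        f"<p>By {escape(byline)}, {escape(pub_date)}</p>"
    )


if __name__ == "__main__":
    print(render_article(build_data_tier(), 1))
```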

And here's where the problem arises.

HTML is a language for specifying how content should look, not for specifying what the content means. For instance, HTML makes it easy to display an article's headline in a 16-point bold typeface, but provides no mechanism for asserting that a block of text is an article's headline. Why is this a problem? Because the search, social and aggregation sites that news publishers rely on for referral traffic have no reliable way to tell which text on a page is the headline, so they can't automatically show it when linking to an article.
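The following Python sketch illustrates the consumer's side of the problem. It assumes a hypothetical aggregator that guesses the headline is the text of the first `<h1>` element. Two equally valid pages defeat that guess, because neither one actually states which text is the headline.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstH1Finder(HTMLParser):
    """Hypothetical consumer heuristic: treat the first <h1> text as the headline."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self.headline: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.headline is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headline = (self.headline or "") + data


def guess_headline(page: str) -> Optional[str]:
    finder = FirstH1Finder()
    finder.feed(page)
    finder.close()
    return finder.headline


# Two publishers style the same headline with different, equally valid markup.
page_a = "<h1>Council Approves Budget</h1><p>By Jane Doe</p>"
page_b = ('<div class="hed"><b style="font-size:16pt">Council Approves Budget</b></div>'
          "<p>By Jane Doe</p>")

print(guess_headline(page_a))  # Council Approves Budget
print(guess_headline(page_b))  # None: nothing in the markup says "headline"
```

Any rule the consumer picks (first `<h1>`, largest font, a particular class name) breaks as soon as a publisher styles its pages differently.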

If that sounds bad, the full story is even worse. At the data tier, publishers harbor a wealth of information that could improve the quality of inbound links, including bylines, thumbnails, images, datelines, tags and summaries. Yet publishers are prevented from providing this information in the presentation tier because of the fundamental limitations of HTML.

Fortunately, several technologies have arisen to address this shortcoming, including HTML 5 Microdata and RDFa. Both HTML 5 Microdata and RDFa allow web authors to indicate that a certain portion of a page has a certain meaning. It is up to the author, however, to provide the vocabulary used to name and describe those meanings.
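As a rough illustration of what Microdata annotations give a consumer, the sketch below uses Python's standard `html.parser` to collect `itemprop` values from a page. The `itemtype` URL and the `headline` and `byline` property names are hypothetical placeholders for an author-chosen vocabulary, and the parser is deliberately simplified; the full processing model is defined in the Microdata specification.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Elements with no end tag; this simplified sketch does not read values from them.
VOID_ELEMENTS = frozenset({"area", "base", "br", "col", "embed", "hr", "img",
                           "input", "link", "meta", "source", "track", "wbr"})


class ItempropCollector(HTMLParser):
    """Collect the text of elements that carry an itemprop attribute.

    Simplified sketch: ignores nested itemscopes, itemref, and values carried in
    attributes (content, datetime, src) that a full Microdata parser would read.
    """

    def __init__(self) -> None:
        super().__init__()
        self.props: Dict[str, List[str]] = {}
        self._name: Optional[str] = None  # itemprop currently being captured
        self._tag: Optional[str] = None   # tag that opened the capture
        self._depth = 0                   # nesting depth of that same tag
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._name is not None:
            if tag == self._tag:
                self._depth += 1
            return
        name = dict(attrs).get("itemprop")
        if name and tag not in VOID_ELEMENTS:
            self._name, self._tag, self._depth, self._text = name, tag, 1, []

    def handle_endtag(self, tag):
        if self._name is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            value = "".join(self._text).strip()
            self.props.setdefault(self._name, []).append(value)
            self._name = self._tag = None

    def handle_data(self, data):
        if self._name is not None:
            self._text.append(data)


def extract_itemprops(page: str) -> Dict[str, List[str]]:
    collector = ItempropCollector()
    collector.feed(page)
    collector.close()
    return collector.props


# The itemtype URL and property names below are hypothetical placeholders.
page = """
<div itemscope itemtype="https://example.org/vocab/Article">
  <div itemprop="headline"><b style="font-size:16pt">Council Approves Budget</b></div>
  <p>By <span itemprop="byline">Jane Doe</span></p>
</div>
"""

print(extract_itemprops(page))
# {'headline': ['Council Approves Budget'], 'byline': ['Jane Doe']}
```

The headline is found even though it is styled with a `<b>` inside a `<div>`, because the annotation, not the styling, carries the meaning.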

This can lead to some confusion. Is the short explanatory text immediately preceding an article called a "title", a "headline" or an "Überschrift"? Should an article include attributes for indicating the postal address of people mentioned in the text?

To avoid this kind of confusion, it is necessary to define a common vocabulary and a data model for embedding publishing metadata into web documents.

And that's what rNews is. rNews is a data model for embedding machine-readable publishing metadata in web documents and a set of suggested implementations in both HTML 5 Microdata and RDFa.
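A shared vocabulary also makes consumer-side checks straightforward. The Python sketch below assumes a hypothetical, agreed-upon set of article property names (an illustrative subset, not the normative rNews property list) and flags anything outside it, such as the competing "title" and "Überschrift" labels.

```python
from typing import Dict, List

# Hypothetical shared vocabulary agreed on by publishers and consumers.
# Illustrative only; NOT the normative rNews property list.
SHARED_ARTICLE_PROPERTIES = frozenset(
    {"headline", "byline", "datePublished", "description"}
)


def check_vocabulary(props: Dict[str, List[str]]) -> List[str]:
    """Return the property names outside the shared vocabulary, sorted."""
    return sorted(name for name in props if name not in SHARED_ARTICLE_PROPERTIES)


publisher_a = {"headline": ["Council Approves Budget"], "byline": ["Jane Doe"]}
publisher_b = {"title": ["Council Approves Budget"],
               "Überschrift": ["Council Approves Budget"]}

print(check_vocabulary(publisher_a))  # []
print(check_vocabulary(publisher_b))  # ['title', 'Überschrift']
```

Without an agreed list, every consumer would need its own mapping from each publisher's labels to its internal fields.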

In the following lessons, we'll introduce both RDFa and HTML 5 Microdata. After that, we'll provide full details on how you can use these technologies to implement rNews on your website.

Want to comment on rNews? We invite you to post your comment to the rNews Forum.