rNews 1.0: Schema.org-compatible Implementation Guide HTML 5 Microdata

Schema.org-compatible rNews Implementation Guide for HTML 5 Microdata

This guide shows you how to implement a schema.org-compatible version of the rNews data model using HTML 5 Microdata. With the exception of two properties, this implementation of rNews relies on vocabularies specified at schema.org. If you are unfamiliar with either Microdata or schema.org, we suggest that you start by reading the excellent "Getting started with schema.org" guide provided by Google, Bing and Yahoo!

Now that you're an expert in both Schema.org and HTML 5 Microdata, there is one additional wrinkle to the schema.org-compatible Microdata implementation of rNews. All of the rNews properties specified at Schema.org have exactly the same names as they do in the rNews specification. However, several rNews classes have different names from their Schema.org counterparts. The following table specifies the alignment between the classes in the rNews specification and their Schema.org counterparts.

rNews Class             Schema.org Class
rNews:NewsItem          http://schema.org/CreativeWork
rNews:Article           http://schema.org/NewsArticle
rNews:ImageObject       http://schema.org/ImageObject
rNews:AudioObject       http://schema.org/AudioObject
rNews:VideoObject       http://schema.org/VideoObject
rNews:UserComment       http://schema.org/UserComments
rNews:Concept           http://schema.org/Thing
rNews:Place             http://schema.org/Place
rNews:GeoCoordinates    http://schema.org/GeoCoordinates, http://schema.org/GeoShape
rNews:Person            http://schema.org/Person
rNews:Organization      http://schema.org/Organization
rNews:PostalAddress     http://schema.org/PostalAddress
Figure: Alignment between rNews and Schema.org classes.
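
To make the alignment concrete, here is a minimal sketch of how two rows of the table might be applied. It assumes you want to describe an rNews:Place with nested rNews:GeoCoordinates, so the itemtype values use the Schema.org class URLs from the right-hand column. The geo, latitude and longitude property names come from schema.org; the coordinate values are placeholders for illustration only and are not part of the sample article's data model.

```html
<!-- rNews:Place is written with its Schema.org class URL -->
<div itemscope itemtype="http://schema.org/Place">
  <span itemprop="name">Ajdabiyah, Libya</span>
  <!-- rNews:GeoCoordinates maps to http://schema.org/GeoCoordinates -->
  <span itemprop="geo" itemscope itemtype="http://schema.org/GeoCoordinates">
    <!-- Placeholder values, not real coordinates -->
    <meta itemprop="latitude" content="0.0"/>
    <meta itemprop="longitude" content="0.0"/>
  </span>
</div>
```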

OK, let's implement rNews in HTML 5 Microdata!

 
The First Steps

The very first thing we have to do is to assert that the primary thing we'll be describing is of type Article and that it has the name <http://dev.iptc.org/rnews/sample_story.html>. So we modify the third line of the sample HTML document to read:

  <html itemscope 
      itemtype="http://schema.org/NewsArticle"
      itemid="http://dev.iptc.org/rnews/sample_story.html">
  

Please note that in these examples, we're adding line breaks for clarity. You do not need to include these in your implementation.

Adding the basics

Now that we've asserted the name, type and initial scope, we'll continue by embedding the headline, alternate headline, dateline and creation date into our sample article. To do this, we change lines 7 and 15 like so:

    <div itemprop="headline" class="headline">
      Allies Are Split on Goal and Exit Strategy in Libya
    </div>
    <div itemprop="alternativeHeadline" class="rider">
      NATO Takes Command
    </div>
  
  <span itemprop="dateline">WASHINGTON</span> | 
  <meta itemprop="dateCreated" content="2011-03-24"/>March 24, 2011
  

A couple of notes:

  • To annotate the dateline and dateCreated fields it was necessary to introduce two new elements as containers for the microdata markup. We introduced a new <span> element for the dateline and a new <meta> element for the dateCreated.
  • A further note on the dateCreated property. The documentation for this property specifies that the date be given in YYYY-MM-DD format, but the date appears in the text as "March 24, 2011". As a consequence, we need to introduce a hidden element to properly specify this property. In HTML 5 Microdata we do this by introducing a <meta> element and setting its content attribute to the desired value. This may seem unusual, as the <meta> element usually appears in the <head> of an HTML document, but it is perfectly legal in HTML 5.
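
The hidden <meta> approach described in the notes above can also be sketched with the HTML 5 <time> element, which keeps the human-readable date visible and carries the machine-readable value in its datetime attribute. This is an alternative we are suggesting here, not something the sample article uses, and you would want to verify that the consumers you care about read <time> values.

```html
<div class="publication_date">
  <span itemprop="dateline">WASHINGTON</span> |
  <!-- A Microdata parser takes the value from the datetime attribute,
       not from the visible text -->
  <time itemprop="dateCreated" datetime="2011-03-24">March 24, 2011</time>
</div>
```

The trade-off is mostly one of tidiness: <time> avoids an extra invisible element, while <meta> works for any property value that has no natural visible counterpart.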

Applying these notes, we'll now annotate the article's body text, copyright statement, copyright year, and usage terms as follows:

  <div class="article_text">
    <p itemprop="articleBody">
    Having largely succeeded...
    </p>
    <p itemprop="articleBody">
    The United States has all but...
    </p>
  </div>
  <div class="legalese">
    <p>
    <a  itemprop="copyrightNotice" 
      href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html">
       &copy; Copyright 
    </a>
    <span itemprop="copyrightYear">2011</span>
    ...
    </p>
    <p>
    <a  itemprop="usageTerms"
      href="http://www.nytimes.com/ref/membercenter/help/agree.html">
      Usage Terms
    </a>
    </p>
  </div>
  

Two more notes:

  • You may have noticed that we specified the attribute itemprop="articleBody" twice; once on the first paragraph and once on the second. This is perfectly legal in HTML 5 Microdata. When a property is specified more than once in the scope of the same item, it is parsed as an array of values. So, in this case, the text of our sample article will be parsed as an array containing the values of both paragraphs.
  • A careful examination of schema.org will reveal that the rNews properties copyrightNotice and usageTerms are not part of the schema.org vocabularies. So why are we specifying these properties as if they were part of schema.org? This is because schema.org provides an extension mechanism for including properties and classes not included in the official specification. To include a property as an extension to a schema.org class, one simply specifies the property as if it were part of the class. Full details of the schema.org extension mechanism can be found here.
Embedding Hidden Metadata

Next we will embed a brief description of the article along with the article's language and a thumbnail image. "But wait", you say, "that information isn't visible on the article page or hiding anywhere in the HTML document!"

You, of course, are right.

So how do we add this missing metadata? By including a number of <meta> elements in the document's <head> section. The markup looks like this.

  <meta itemprop="description" content="The questions about the command..." />
  <meta itemprop="inLanguage" content="en-US" />
  <meta itemprop="thumbnailUrl" content="http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif" />
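
Since thumbnailUrl holds a URL rather than plain text, one possible variation is to use a <link> element, whose href attribute a Microdata parser treats as a URL value. The following is a sketch of that alternative under the same assumptions as the markup above (the <html> element carries the itemscope for the article); it is not the markup used in the rest of this guide.

```html
<head>
  <meta itemprop="description" content="The questions about the command..." />
  <meta itemprop="inLanguage" content="en-US" />
  <!-- href supplies the property value as a URL -->
  <link itemprop="thumbnailUrl" href="http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif" />
</head>
```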
  
Embedding Multiple Objects

Our next task is to embed our article's provenance information into our document. As you will recall, the model for this information is as follows:

  <copyrightHolder>
    <Organization>
    <id>http://www.nytimes.com</id>
    <itemtype>http://schema.org/Organization</itemtype>
    <name>The New York Times Company</name>
    <tickerSymbol>NYSE NYT</tickerSymbol>
    </Organization>
  </copyrightHolder>
  
  <provider>
    <Organization>
    <id>http://www.nytimes.com</id>
    <itemtype>http://schema.org/Organization</itemtype>
    <name>The New York Times Company</name>
    <tickerSymbol>NYSE NYT</tickerSymbol>
    </Organization>
  </provider>

  <sourceOrganization>
    <Organization>
    <id>http://www.nytimes.com</id>
    <itemtype>http://schema.org/Organization</itemtype>
    <name>The New York Times Company</name>
    <tickerSymbol>NYSE NYT</tickerSymbol>
    </Organization>
  </sourceOrganization>

  

We'll start with the copyrightHolder information. To express this in Microdata, we add the following markup to our document:

  <html itemscope 
      itemtype="http://schema.org/NewsArticle"
      itemid="http://dev.iptc.org/rnews/sample_story.html">

     ...  

     <span
     itemprop="copyrightHolder" 
     itemscope
     itemtype="http://schema.org/Organization"
     itemid="http://www.nytimes.com">
     <span itemprop="name">The New York Times Company</span>
     <meta itemprop="tickerSymbol" content="NYSE NYT"/>
     </span>

     ...

  </html>
  

The above code snippet asserts that an object of type http://schema.org/NewsArticle with the id http://dev.iptc.org/rnews/sample_story.html has a copyrightHolder and that this copyrightHolder is described by an object of type http://schema.org/Organization with the id http://www.nytimes.com.

Next we want to assert that our sample article has both a provider and a sourceOrganization. As you can see from the model, the article's provider, sourceOrganization, and copyrightHolder are all the same organization, so we can save some typing by adding these relations to the existing itemprop attribute as follows.

  <span
    itemprop="copyrightHolder provider sourceOrganization" 
    itemscope
    itemtype="http://schema.org/Organization"
    itemid="http://www.nytimes.com">
    <span itemprop="name">The New York Times Company</span>
    <meta itemprop="tickerSymbol" content="NYSE NYT"/>
  </span>
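
The shortcut above only applies when every listed property points at the same item. As a hedged sketch, suppose the provider were a different organization from the copyright holder. The second organization below, including its name and its example.com URL, is hypothetical and not part of the sample data model. In that case each organization gets its own scoped element:

```html
<!-- The New York Times Company remains copyrightHolder and sourceOrganization -->
<span
  itemprop="copyrightHolder sourceOrganization"
  itemscope
  itemtype="http://schema.org/Organization"
  itemid="http://www.nytimes.com">
  <span itemprop="name">The New York Times Company</span>
</span>
<!-- Hypothetical second organization acting as provider -->
<span
  itemprop="provider"
  itemscope
  itemtype="http://schema.org/Organization"
  itemid="http://www.example.com/syndication">
  <span itemprop="name">Example Syndication Service</span>
</span>
```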
  
Marking Up The Byline

Now we're going to combine several of the techniques we've discussed so far to mark up the byline. As you will recall, the original HTML for the byline looks like this.

  <div>By STEVEN LEE MYERS</div>
  

And our sample model describes the byline as follows:

  <createdBy>
    <Person>
    <id>http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/</id>
    <itemtype>http://schema.org/Person</itemtype>
    <name>STEVEN LEE MYERS</name>
    </Person>
  </createdBy>    
  

To shoehorn all of this metadata into our sample HTML document, we write the following:

  <div
    itemprop="createdBy" 
    itemscope
    itemtype="http://schema.org/Person"
    itemid="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/"
    class="byline">By 
      <span itemprop="name">STEVEN LEE MYERS</span>
  </div>    
Marking Up The Concepts

Following a similar approach we can mark up this original HTML for the tag.

  <div>
    <div>
    <div>People</div>
    <div>Qaddafi, Muammar el-</div>
    </div>
  </div>      
  

With this data.

  <about>
    <Person>
    <id>http://data.nytimes.com/91178019641520997503</id>
    <itemtype>http://schema.org/Person</itemtype>
    <name>Qaddafi, Muammar el-</name>
    </Person>
  </about>
  

Giving us this result.

   <div
      itemprop="about" 
      itemscope
      itemtype="http://schema.org/Person"
      itemid="http://data.nytimes.com/91178019641520997503"
      class="tag tag_first tag_last">
     <span itemprop="name">Qaddafi, Muammar el-</span>
   </div>
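
An article usually carries more than one tag. Because a property may be repeated within the scope of the same item, each tag can be marked up as its own about item. In this sketch the second tag, its heading, and its example.com identifier are hypothetical additions for illustration; only the first tag comes from the sample data model.

```html
<div>
  <div class="tag_head">People</div>
  <div
    itemprop="about"
    itemscope
    itemtype="http://schema.org/Person"
    itemid="http://data.nytimes.com/91178019641520997503"
    class="tag tag_first tag_last">
    <span itemprop="name">Qaddafi, Muammar el-</span>
  </div>
</div>
<div>
  <!-- Hypothetical second tag; the itemid is a placeholder -->
  <div class="tag_head">Locations</div>
  <div
    itemprop="about"
    itemscope
    itemtype="http://schema.org/Place"
    itemid="http://www.example.com/places/libya"
    class="tag tag_first tag_last">
    <span itemprop="name">Libya</span>
  </div>
</div>
```

A parser reading this markup would report two values for the article's about property, one of type Person and one of type Place.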
  
Marking up The User Comment

Original HTML:

   <div>
     <div>Discussion (3)</div>
     <div>
     <div>
       "So the question is..."
     </div>
     <div>
       <a href="http://timespeople.nytimes.com/view/user/27242827/activities.html">Chuck</a>
     </div>
     <div>
       March 25th, 2011 8:27 am
     </div>
     </div>
   </div>
  

Data:

  <comment>
    <UserComment>
    <commentText>So the question is: Why is Secretary of Defense Hillary Clinton speaking as the Defense Secretary...</commentText>
    <commentTime>2011-03-25T08:27Z</commentTime>
    <id>http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=1#comment1</id>
    <itemtype>http://schema.org/UserComment</itemtype>
    
    <creator>
      <Person>
      <id>http://timespeople.nytimes.com/view/user/27242827/activities.html</id>
      <itemtype>http://schema.org/Person</itemtype>
      <name>Chuck</name>
      </Person>
    </creator>
    
    </UserComment>
  </comment>
  

Result:

  <div
     itemprop="comment" 
     itemscope
     itemtype="http://schema.org/UserComment"
     itemid="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=1#comment1"
     class="comment">
    <div>&quot;<span itemprop="commentText" class="comment_text">So the question is: Why is Secretary of Defense Hillary Clinton speaking as the Defense Secretary...</span>&quot;</div>
    <div
       itemprop="creator" 
       itemscope
       itemtype="http://schema.org/Person"
       itemid="http://timespeople.nytimes.com/view/user/27242827/activities.html"
       class="username">
    <a href="http://timespeople.nytimes.com/view/user/27242827/activities.html">
      <span itemprop="name">Chuck</span>
    </a>
    </div>
    <div class="username">
    <meta itemprop="commentTime" content="2011-03-25T08:27Z"/>March 25th, 2011 8:27 am
    </div>
  </div>
  
Marking Up The Image

We're going to finish by annotating all of the data we've asserted concerning the article's associated image. In case you forgot, this data is as follows:

  <associatedMedia>
    <ImageObject>
    <itemtype>http://schema.org/ImageObject</itemtype>
    <id>http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg</id>
    <description>Rebel fighters take cover during a shelling near Ajdabiyah, Libya on Thursday.</description>
    
    <copyrightHolder>
      <Organization>
      <id>http://www.reuters.com</id>
      <itemtype>http://schema.org/Organization</itemtype>
      <name>Reuters</name>
      </Organization>
    </copyrightHolder>
    
    <sourceOrganization>
      <Organization>
      <id>http://www.reuters.com</id>
      <itemtype>http://schema.org/Organization</itemtype>
      <name>Reuters</name>
      </Organization>
    </sourceOrganization>
    
    <provider>
      <Organization>
      <id>http://www.reuters.com</id>
      <itemtype>http://schema.org/Organization</itemtype>
      <name>Reuters</name>
      </Organization>
    </provider>
    
    <creator>
      <Person>
      <id>http://blogs.reuters.com/goran-tomasevic/</id>
      <itemtype>http://schema.org/Person</itemtype>
      <name>Goran Tomasevic</name>
      </Person>
    </creator>
    
    </ImageObject>
  </associatedMedia>
  

The original HTML for the image looks like this.

  <div>
    <img src="img/libya_sample_reuters.jpg"/>
    <div>Credit: Goran Tomasevic/Reuters</div>
    <div>Rebel fighters take...</div>
  </div>
  

Using all the tricks we've covered so far, we can embed the above data into this HTML as follows:

  <div
     itemprop="associatedMedia"
     itemscope
     itemtype="http://schema.org/ImageObject"
     itemid="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg"
     class="main_image">
    <img class="image" src="/files/Simple-Sample/libya_sample_reuters.jpg"/>
    <div class="image_credit">Credit: 
      <span
        itemprop="creator" 
        itemscope
        itemtype="http://schema.org/Person"
        itemid="http://blogs.reuters.com/goran-tomasevic/">
        <span itemprop="name">Goran Tomasevic</span>
      </span>
      /
      <span
        itemprop="copyrightHolder sourceOrganization provider" 
        itemscope
        itemtype="http://schema.org/Organization"
        itemid="http://www.reuters.com">
        <span itemprop="name">Reuters</span>
      </span>
    </div>
    <div itemprop="description" class="image_caption">Rebel fighters...</div>
  </div>
  
Final Assembly

In the examples above, we've demonstrated how to embed rNews triples into HTML documents using HTML 5 Microdata. Now, as a parting gift, we'd like to show you what the complete HTML for our sample article looks like once we've embedded all of the data in our sample data model.

  <!DOCTYPE html>
  <html itemscope itemtype="http://schema.org/NewsArticle" itemid="http://dev.iptc.org/rnews/sample_story.html">
  <head>
    <style type="text/css">@import url(/files/Simple-Sample/iptc_times2.css);</style>
    <meta itemprop="description" content="The questions about the command mirrored the strategic divisions over how the coalition will end the operation." />
    <meta itemprop="inLanguage" content="en-US" />
    <meta itemprop="thumbnailUrl" content="http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif" />
  </head>
  <body>
    <div class="article" style="height:623px">
    <div class="a_column">
      <div itemprop="headline" class="headline">Allies Are Split on Goal and Exit Strategy in Libya</div>
      <div itemprop="alternativeHeadline" class="rider">NATO Takes Command</div>
      <div      
      itemprop="associatedMedia" 
      itemscope
      itemtype="http://schema.org/ImageObject"
      itemid="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg"
      class="main_image">
      <img class="image" src="/files/Simple-Sample/libya_sample_reuters.jpg"/>
      <div class="image_credit">Credit: 
        <span
          itemprop="creator" 
          itemscope
          itemtype="http://schema.org/Person"
          itemid="http://blogs.reuters.com/goran-tomasevic/">
          <span itemprop="name">Goran Tomasevic</span>
        </span>
        /
        <span
          itemprop="copyrightHolder sourceOrganization provider" 
          itemscope
          itemtype="http://schema.org/Organization"
          itemid="http://www.reuters.com">
          <span itemprop="name">Reuters</span>
        </span>
      </div>
      <div itemprop="description" class="image_caption">Rebel fighters take cover during a shelling near Ajdabiyah, Libya on Thursday.</div>
      </div>
      <div
      itemprop="createdBy" 
      itemscope
      itemtype="http://schema.org/Person"
      itemid="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/"
      class="byline">By 
        <span itemprop="name">STEVEN LEE MYERS</span>
      </div>
      <div class="publication_date">
      <span itemprop="dateline">WASHINGTON</span> | 
      <meta itemprop="dateCreated" content="2011-03-24"/>March 24, 2011
      </div>
      <div class="article_text">
      <p itemprop="articleBody">Having largely succeeded in stopping a rout of Libya&rsquo;s rebels, the inchoate coalition attacking Col. Muammar el-Qaddafi&rsquo;s forces remains divided over the ultimate goal &mdash; and exit strategy &mdash; of what officials acknowledged Thursday would be a military campaign that could last for weeks.        </p>
      </div>
      <div class="legalese">
      <p>
        <a  itemprop="copyrightNotice" 
          href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html">
           &copy; Copyright 
        </a>
        <span itemprop="copyrightYear">2011</span>
        <span
        itemprop="copyrightHolder provider sourceOrganization" 
        itemscope
        itemtype="http://schema.org/Organization"
        itemid="http://www.nytimes.com">
        <span itemprop="name">The New York Times Company</span>
        <meta itemprop="tickerSymbol" content="NYSE NYT"/>
        </span>
      </p>
      <p>
        <a  itemprop="usageTerms"
          href="http://www.nytimes.com/ref/membercenter/help/agree.html">
          Usage Terms
        </a>
      </p>
      </div>
    </div>
    <div class="b_column">
      <div class="b_module">
      <div class="section_head">Section</div>
      <div class="tag" itemprop="articleSection">World</div>
      </div>
    
      <div class="section_head">Tags</div>
      <div class="b_module tag_module">
      <div>
        <div class="tag_head">People</div>
        <div 
        itemprop="about" 
        itemscope
        itemtype="http://schema.org/Person"
        itemid="http://data.nytimes.com/91178019641520997503"
        class="tag tag_first tag_last">
        <span itemprop="name">Qaddafi, Muammar el-</span>
        </div>
      </div>
      </div>
      
      <div class="b_module">
      <div class="section_head">Discussion</div>
      <div 
        itemprop="comment" 
        itemscope
        itemtype="http://schema.org/UserComment"
        itemid="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=1#comment1"
        class="comment">
        <div>&quot;<span itemprop="commentText" class="comment_text">So the question is: Why is Secretary of Defense Hillary Clinton speaking as the Defense Secretary...</span>&quot;</div>
        <div 
        itemprop="creator" 
        itemscope
        itemtype="http://schema.org/Person"
        itemid="http://timespeople.nytimes.com/view/user/27242827/activities.html"
        class="username">
        <a href="http://timespeople.nytimes.com/view/user/27242827/activities.html">
          <span itemprop="name">Chuck</span>
        </a>
        </div>
        <div class="username">
        <meta itemprop="commentTime" content="2011-03-25T08:27Z"/>March 25th, 2011 8:27 am
        </div>
      </div>
      </div>
    </div>
    </div>    
  </body>
  </html>
  
In Conclusion

So that's it! All you need to embed rNews markup into your news documents. Now go forth and embed.

Want to comment on rNews? We invite you to post your comments to the rNews Forum.