NewsCodes: The CV Server

IPTC shares its Controlled Vocabularies (CVs) via a server at http://cv.iptc.org/newscodes/.

This document provides guidelines for using this server to retrieve full CVs or single concepts.

The Key Features of this Server

  • it implements IPTC's CV design: each CV and each concept in a CV has an http-URL as its identifier, so the data of a CV or concept can be retrieved by accessing the corresponding URL.
  • it provides a catalog of all available CVs
  • each CV is delivered as a list of concepts pertaining to this CV and additional CV-specific details
  • any concept which is a member of an IPTC CV is delivered as a dataset
  • the datasets of the CVs and concepts are delivered in four formats: HTML as the human-readable variant, and NewsML-G2 Knowledge Items (XML), RDF/XML and RDF/Turtle as primarily machine-readable variants.

Quick Start Guide

  • Go to the catalog of available IPTC CVs at http://cv.iptc.org/newscodes/
  • All names and definitions of CVs and concepts are displayed in the preferred language of your web browser if a translation into that language is available. If no translation exists, they are displayed in the default language, British English (language tag "en-GB").
  • Browse the available CVs and click on the Scheme URI of a CV to see all its member concepts.
  • To see a single concept only, click on the Concept ID (URI) link displayed for each concept in this list.
  • If you need a CV or a concept in another language:
    • to display all available languages: append ?lang=x-all to the web address in the browser
    • to display a specific language: append ?lang= followed by the language tag (e.g. ?lang=fr) to the web address in the browser.
  • If you need the data in a machine-readable format, see the guidelines below.
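A browser is not the only way to add the language parameter. As a minimal sketch in bash (with_lang is a hypothetical helper name, not an IPTC tool), a script can append ?lang=... to a CV or concept URL, switching to & when the URL already carries a query string:

```bash
# Hypothetical helper: append the lang parameter to a CV or concept URL.
# Usage: with_lang URL LANGUAGE_TAG   (e.g. fr, de, x-all)
with_lang() {
  local url="$1" tag="$2"
  # If the URL already has a query string ("?"), add the parameter with "&".
  if [[ "$url" == *\?* ]]; then
    printf '%s&lang=%s\n' "$url" "$tag"
  else
    printf '%s?lang=%s\n' "$url" "$tag"
  fi
}
```

For example, with_lang http://cv.iptc.org/newscodes/genre/ x-all prints http://cv.iptc.org/newscodes/genre/?lang=x-all, which can then be opened in the browser.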

Semantic Design of IPTC CVs

IPTC CVs implement the design and rules of IPTC's QCodes and of W3C SKOS:
  • Each CV has an http-URL as Globally Unique Identifier (GUID)
  • For each CV, a name and a definition are provided (at least) in British English.
  • Each concept has an http-URL as Globally Unique Identifier (GUID): it is formed by appending the code of the concept to the URL of its CV (see QCodes in a Nutshell), e.g. http://cv.iptc.org/newscodes/genre/Actuality
  • For each concept, a name and a definition are provided (at least) in British English.
  • Hierarchical relationships of concepts within a scheme are expressed by the skos:broader and skos:narrower properties.
  • Mappings of concepts to concepts in other CVs, and only in other CVs, are expressed by skos:closeMatch, skos:exactMatch, skos:broadMatch or skos:narrowMatch.
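The URI rule for concepts can be sketched in bash as two small functions. concept_uri and split_concept_uri are hypothetical helper names; the split assumes that a concept code never contains a "/".

```bash
# Hypothetical helper: build a concept URI from its scheme (CV) URI and code.
concept_uri() {
  local scheme="$1" code="$2"
  # Ensure the scheme URI ends with exactly one "/" before appending the code.
  scheme="${scheme%/}/"
  printf '%s%s\n' "$scheme" "$code"
}

# Hypothetical helper: split a concept URI back into scheme URI and code.
# Everything up to the last "/" is the scheme URI, the rest is the code
# (assumes the code itself contains no "/").
split_concept_uri() {
  local uri="$1"
  printf '%s %s\n' "${uri%/*}/" "${uri##*/}"
}
```

For example, concept_uri http://cv.iptc.org/newscodes/genre/ Actuality prints http://cv.iptc.org/newscodes/genre/Actuality, the example concept URL used in this document.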

Catalog of Available CVs

A catalog of all available IPTC CVs can be found at http://cv.iptc.org/newscodes/. Accessing this URL delivers the list of CVs as an HTML page; no other formats are available.
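Because the catalog is only available as HTML, a script that needs the list of CVs has to extract it from the page. The bash sketch below assumes that the page contains each scheme URI literally in the form http://cv.iptc.org/newscodes/<name>/; the page markup is not specified here, so treat the pattern as an assumption. list_scheme_uris is a hypothetical name.

```bash
# Hypothetical helper: read catalog HTML on stdin and print each distinct
# scheme URI once. Assumes URIs appear as http://cv.iptc.org/newscodes/<name>/
# (a concept URI in the page also yields its scheme URI).
list_scheme_uris() {
  grep -oE 'http://cv\.iptc\.org/newscodes/[A-Za-z0-9_-]+/' | sort -u
}
```

Typical use is curl -sL http://cv.iptc.org/newscodes/ | list_scheme_uris. If the page links use https:// instead, the pattern has to be adjusted.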
 

Delivery of CVs or Concepts by URLs

To retrieve a CV, send an HTTP request to the URL assigned as its GUID.
The response delivers the data in the requested format and language; see below.
Example: http://cv.iptc.org/newscodes/genre/

To retrieve a single concept, send an HTTP request to the URL assigned as its GUID.
The response delivers the data in the requested format and language; see below. Example: http://cv.iptc.org/newscodes/genre/Actuality

How to Select Different Formats and Languages for Delivery

The data format and the language of the server's HTTP response can be selected in the HTTP request.
 
  • One option is the so-called HTTP content negotiation:
    • To select the format, the HTTP request sends an Accept header with the IANA Media Type (also known as MIME type) that corresponds to the requested format. If the server can deliver this format, it returns status code 200 with the data in that format and adds this MIME type to the Content-Type header of the response. If the format cannot be delivered, the IPTC CV server returns status code 404.
      If no MIME type is set in the Accept header, HTML is delivered as the default format.
      These IANA Media (MIME) Types may be used:
      • for HTML: text/html or application/xhtml+xml
      • for NewsML-G2 Knowledge Items: application/vnd.iptc.g2.knowledgeitem+xml
      • for RDF/XML documents: application/rdf+xml
      • for RDF/Turtle documents: text/turtle
    • To select the language, the HTTP request sends an Accept-Language header with one or more language tags as defined by IETF BCP 47, e.g. fr for French, es for Spanish or de for German.
      If multiple tags are in the header, the IPTC CV server uses only the first one. If the natural language properties (name, definition, notes) of the CV or concept are available in this language, they are delivered in it; if not, they are delivered in British English as the default language.
  • The other option is the use of URL parameters:
    • To select the format, use the parameter format with one of these values:
      • for HTML: format=html
      • for NewsML-G2 Knowledge Items: format=g2ki
      • for RDF/XML documents: format=rdfxml
      • for RDF/Turtle documents: format=rdfttl
    • To select the language, use the parameter lang, e.g.:
      - lang=fr ... French, selected by its tag
      - lang=x-all ... all available languages for this CV or concept are delivered. Be aware that this can produce a large volume of data.
    • Example: http://cv.iptc.org/newscodes/scene/?format=g2ki&lang=de delivers the concepts of the Scene NewsCodes CV as NewsML-G2 Knowledge Items with the natural language properties in German.
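Both options can be scripted. The bash sketch below is an illustration, not an IPTC tool: mime_for_format, param_url and fetch_negotiated are hypothetical names, and it assumes curl is installed. Only the media types, parameter values and status codes come from the description above.

```bash
# Map a format parameter value to the corresponding IANA media type.
mime_for_format() {
  case "$1" in
    html)   echo "text/html" ;;
    g2ki)   echo "application/vnd.iptc.g2.knowledgeitem+xml" ;;
    rdfxml) echo "application/rdf+xml" ;;
    rdfttl) echo "text/turtle" ;;
    *)      return 1 ;;  # not one of the four supported formats
  esac
}

# URL parameter option: build a URL with format and lang parameters.
# Assumes the input URL has no query string yet.
param_url() {
  local url="$1" format="$2" lang="$3"
  printf '%s?format=%s&lang=%s\n' "$url" "$format" "$lang"
}

# Content negotiation option: send Accept and Accept-Language headers,
# write the response body to $4 and print the HTTP status code.
# -L follows redirects; -w prints the status of the final response.
fetch_negotiated() {
  local url="$1" format="$2" lang="$3" out="$4" mime
  mime=$(mime_for_format "$format") || { echo "unknown format: $format" >&2; return 1; }
  curl -sL -o "$out" -w '%{http_code}\n' \
    -H "Accept: $mime" \
    -H "Accept-Language: $lang" \
    "$url"
}
```

For example, according to the rules above, fetch_negotiated http://cv.iptc.org/newscodes/scene/ rdfttl de scene-de.ttl should print 200 and write the Scene CV as RDF/Turtle with German names to scene-de.ttl, while curl -sL -o scene-de.xml "$(param_url http://cv.iptc.org/newscodes/scene/ g2ki de)" requests the same CV as NewsML-G2 Knowledge Items through URL parameters.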

Conditions/limitations for using the IPTC CV server

IPTC provides access to all of its Controlled Vocabularies on the CV server under these conditions:

  • They are copyright protected and can be used under the conditions of the Creative Commons Attribution 4.0 license - see the full license agreement at http://creativecommons.org/licenses/by/4.0/
  • They can be used free of any royalty fee
  • The IPTC CV server is not intended for production use: clients that regularly send more than ten requests per hour may be blocked.
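If you script requests, spacing them out keeps you within this limit. A minimal bash sketch, assuming that requests spread evenly over the hour are acceptable; min_interval and throttled_each are hypothetical helpers.

```bash
# Seconds to wait between requests for a given hourly budget,
# rounded up so the budget is never exceeded.
min_interval() {
  local per_hour="$1"
  echo $(( (3600 + per_hour - 1) / per_hour ))
}

# Run COMMAND once per URL, sleeping SECONDS between consecutive calls.
# COMMAND must be a single command or function name.
# Usage: throttled_each SECONDS COMMAND URL...
throttled_each() {
  local interval="$1" cmd="$2"; shift 2
  local first=1 url
  for url in "$@"; do
    (( first )) || sleep "$interval"  # no wait before the first request
    first=0
    "$cmd" "$url"
  done
}
```

With a budget of ten requests per hour, min_interval 10 prints 360. For example, throttled_each "$(min_interval 10)" echo URL1 URL2 prints the URLs 360 seconds apart; replacing echo with a small function of your own that calls wget or curl turns it into a throttled download loop.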

Tools for Retrieving CVs or Concepts (in different formats or languages)

Below are two of the many tools that can be used to retrieve CVs or concepts in non-HTML formats:

** wget

This widely used command line tool for retrieving web content can be tailored to request any of the formats above. The command lines below retrieve the IPTC Scene NewsCodes as NewsML-G2 Knowledge Items, RDF/XML or RDF/Turtle and store them in a file whose name starts with IPTCscene and ends with an extension corresponding to the format.

For NewsML-G2 Knowledge Items:
wget -O IPTCscene-g2.xml --header="Accept: application/vnd.iptc.g2.knowledgeitem+xml" http://cv.iptc.org/newscodes/scene/
or
wget -O IPTCscene-g2.xml "http://cv.iptc.org/newscodes/scene/?format=g2ki"

For RDF/XML:
wget -O IPTCscene.rdf --header="Accept: application/rdf+xml" http://cv.iptc.org/newscodes/scene/
or
wget -O IPTCscene.rdf "http://cv.iptc.org/newscodes/scene/?format=rdfxml"

For RDF/Turtle:
wget -O IPTCscene.ttl --header="Accept: text/turtle" http://cv.iptc.org/newscodes/scene/
or
wget -O IPTCscene.ttl "http://cv.iptc.org/newscodes/scene/?format=rdfttl"
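To fetch all three machine-readable formats in one go, the wget calls can be wrapped in a loop. A minimal bash sketch; fetch_all_formats and the WGET override variable are hypothetical conveniences, and three requests stay well within the ten-per-hour guidance above.

```bash
# Hypothetical helper: fetch one CV in all three machine-readable formats
# via the format parameter, using the file names of the wget examples.
# Usage: fetch_all_formats CV_URL BASE_NAME
fetch_all_formats() {
  local cv_url="$1" base="$2" fmt out
  for fmt in g2ki rdfxml rdfttl; do
    case "$fmt" in
      g2ki)   out="$base-g2.xml" ;;
      rdfxml) out="$base.rdf" ;;
      rdfttl) out="$base.ttl" ;;
    esac
    # The URL is quoted so the shell never treats "?" as a glob character.
    # Set WGET=echo to print the arguments instead of downloading.
    "${WGET:-wget}" -O "$out" "$cv_url?format=$fmt"
  done
}
```

fetch_all_formats http://cv.iptc.org/newscodes/scene/ IPTCscene produces IPTCscene-g2.xml, IPTCscene.rdf and IPTCscene.ttl; running WGET=echo fetch_all_formats http://cv.iptc.org/newscodes/scene/ IPTCscene only prints the three wget argument lists.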
 

** Firefox with the "Modify Headers" plug-in

For the Firefox browser, a plug-in named "Modify Headers" is available for tweaking HTTP request headers: search for it, then download and install it.

The following then needs to be specified in the Modify Headers user interface:
Action = 'modify'
Name = 'accept'
Value = one of the MIME types listed above

When you enable such an entry in the Modify Headers user interface and then open the URL of a full scheme or a single concept, Firefox retrieves it in the selected format. How Firefox handles the machine-readable formats depends on its settings; in most cases it asks whether to open or save the response. We recommend saving it and opening the saved file with an appropriate tool.