QCodes in the Semantic Web context
QCodes were initially defined for use in the context of news, where values from Controlled Vocabularies (CVs) are typically used to categorize news content. Examples are ticker codes for companies assigned by a stock exchange, codes for countries, and codes for the subject of a news story.
The major IPTC business requirements for identifiers of Concepts in CVs, as raised by its members, were:
- URIs/IRIs which align with the Semantic Web requirements shall be used to identify Concepts.
- A subsection of such a URI string shall identify the CV to which this Concept URI pertains.
- This subsection should be shortened by using a CV Alias to save space and bandwidth, and to improve human readability.
- The mapping of CV Aliases to CV URIs should be done in the XML document or by an external file which is referenced by a URL from the XML document, thus saving space and bandwidth again.
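The requirements above can be illustrated with a minimal sketch. It assumes that a QCode is written as `alias:code` and that the Concept URI is obtained by appending the code to the CV URI mapped to the alias. The alias names, the `example.org` URIs, and the function `resolve_qcode` are hypothetical and exist only for this illustration.

```python
# Hypothetical alias table. In practice the mapping would be declared in the
# XML document itself or in an external file referenced by a URL.
CV_ALIASES = {
    "subj": "http://example.org/cv/subject/",
    "ctry": "http://example.org/cv/country/",
}


def resolve_qcode(qcode: str, aliases: dict) -> str:
    """Expand 'alias:code' into a full Concept URI (assumed: CV URI + code)."""
    # Split at the first colon only, so the code itself may contain colons.
    alias, sep, code = qcode.partition(":")
    if not sep or not alias or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in aliases:
        raise ValueError(f"unknown CV Alias: {alias!r}")
    return aliases[alias] + code


print(resolve_qcode("subj:04000000", CV_ALIASES))
```

The short form `subj:04000000` carries the same information as the full URI, but the CV URI is stated only once in the alias table, which is where the saving in space and bandwidth comes from.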
Many of these requirements may sound familiar to people acquainted with XML technology and the Semantic Web.
Why didn’t the IPTC adopt an existing technology? The reasons are:
a) W3C XML Namespaces and QNames:
The Local Part of a QName (the part to the right of the colon) cannot start with a digit (as per the W3C specifications). As many existing CVs include Code Values which use a digit as their first character, adopting QNames would require changing those Code Values, which would mean losing compatibility with well-known Code Values.
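The digit restriction can be checked with a short sketch. The pattern below is a simplified, ASCII-only approximation of the XML NCName rule (the real rule also permits many non-ASCII characters), and the sample Code Values are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production:
# the first character must be a letter or underscore, never a digit.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_qname_local_part(code: str) -> bool:
    """Return True if the code could serve as the Local Part of a QName."""
    return NCNAME_ASCII.fullmatch(code) is not None


# Hypothetical Code Values: an alphabetic one and a purely numeric one.
for code in ("US", "04000000"):
    print(code, is_valid_qname_local_part(code))
```

A purely numeric Code Value fails this check, so it could only be carried in a QName after being renamed.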
b) W3C CURIE:
The technology behind the QCodes is the IPTC News Architecture (NAR), which was designed in the period 2004 to 2007. At that time only first drafts of the CURIE specification existed. Currently (as of summer 2012) only a Candidate Recommendation 1.0 of January 2009 exists, but no final recommendation.
Furthermore, CURIEs do not align with the IPTC requirements for a key feature: the CURIE specification defines that the subsection of the URI which is represented by a prefix may be any subsection of the URI; that subsection neither has to be a valid URI by itself nor does it have a specific meaning or role. For QCodes, this subsection is the CV URI, the identifier for a specific CV, and it is required to be a fully valid URI.
Finally, the mapping of CURIE prefixes to URI sections is not part of the CURIE specification; this is left to the "hosting language" and is therefore indeterminate.
On the other hand, the IPTC aligned the QCodes as much as possible with Semantic Web technology:
- They follow the notion of URIs being globally unique Web resource identifiers.
- They follow the recommendation that these URIs should be HTTP URLs and activating such a URL should retrieve information about the Concept.
- Additionally, for retrieving information by such a URL, the recommendation for using content negotiation is supported. This means that by specifying appropriate MIME Types in the Accept header of the HTTP request, the information about the Concept can be retrieved in different formats, and the formats may be human readable or machine readable.
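Content negotiation as described in the last point can be sketched with Python's standard library. The Concept URI and the helper `concept_request` are hypothetical, and the requests are only constructed here, not sent; which MIME Types a given CV server actually offers is not specified on this page.

```python
from urllib.request import Request


def concept_request(concept_uri: str, mime_type: str) -> Request:
    """Build an HTTP GET request asking for one representation of a Concept."""
    # The Accept header tells the server which format the client prefers.
    return Request(concept_uri, headers={"Accept": mime_type})


# Hypothetical Concept URI, used only for illustration.
CONCEPT_URI = "http://example.org/cv/subject/04000000"

# Same URI, two representations: one for people, one for machines.
html_req = concept_request(CONCEPT_URI, "text/html")
rdf_req = concept_request(CONCEPT_URI, "application/rdf+xml")

print(html_req.get_header("Accept"))
print(rdf_req.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would send it; the identifier of the Concept stays the same in both cases, and only the requested format differs.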
Further, the IPTC is aware that the design of QCodes aligns with requirements and use cases of recent Semantic Web technologies:
- RDFa has adopted CURIEs for expressing RDF Predicates and Objects, but has adapted their use in a QCode-like way by proposing that the CURIE prefix could be an identifier for a CV.
- RDF syntaxes like N3 or Turtle use a prefix to shorten the URIs of RDF Predicates or Objects. The only concern is that N3 and Turtle define this prefix mechanism to align with the QName syntax rules of XML (see the IPTC concerns about that above). Additionally, for Objects the RDF/XML specification allows only the xml:base mechanism to abbreviate URIs.
We welcome feedback on and questions about QCodes. You may post to the public QCodes Forum.