Deploying Linked Data Guide - Part 2c: Transparent Content Negotiation

Transparent Content Negotiation

So as not to overload our preceding description of Linked Data deployment with excessive detail, the description of content negotiation presented thus far was kept deliberately brief. This section discusses content negotiation in more detail.

HTTP/1.1 Content Negotiation

Recall that a resource (conceptual entity) identified by a URI may be associated with more than one representation (e.g. multiple languages, data formats, sizes or resolutions). If multiple representations are available, the resource is referred to as negotiable and each of its representations is termed a variant. For instance, a Web document resource named 'ALFKI' may have three variants, alfki.xml, alfki.html and alfki.txt, all representing the same data. Content negotiation provides a mechanism for selecting the best variant.

As outlined in the earlier brief discussion of content negotiation, when a user agent requests a resource, it can include Accept headers (Accept, Accept-Language, Accept-Charset, Accept-Encoding etc.) with the request, expressing the user's preferences and the user agent's capabilities. The server then chooses and returns the best variant based on the Accept headers. Because the selection of the best resource representation is made by the server, this scheme is classed as server-driven negotiation.
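To make the server-driven model concrete, here is a minimal Python sketch of a server choosing among the 'ALFKI' variants using only the Accept header. It rests on simplifying assumptions (only q values are read, Accept-Language and Accept-Charset are ignored, ties go to the first variant seen) and the function names are hypothetical; it is not how any particular server implements negotiation.

```python
# Minimal sketch of HTTP/1.1 server-driven negotiation: the server parses the
# Accept header and returns the variant whose media type the client rates highest.
# Simplification (assumption): media-range parameters other than q are ignored.


def parse_accept(header):
    """Return a dict mapping media ranges to q values (q defaults to 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[media_range] = q
    return prefs


def q_for(media_type, prefs):
    """Most specific matching range wins: exact type, then type/*, then */*."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def choose_variant(variants, accept_header):
    """variants maps variant name -> media type; returns the best name or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for name, media_type in variants.items():
        q = q_for(media_type, prefs)
        if q > best_q:
            best, best_q = name, q
    return best


variants = {"alfki.xml": "text/xml", "alfki.html": "text/html", "alfki.txt": "text/plain"}
print(choose_variant(variants, "text/html;q=0.9, text/plain;q=0.5, */*;q=0.1"))  # alfki.html
```

Note that the client has to describe all of its preferences up front on every request; the next subsections discuss why that is a weakness.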

Transparent Content Negotiation

An alternative content negotiation mechanism is Transparent Content Negotiation (TCN), a protocol defined by RFC2295. For suitably enabled user agents, TCN offers a number of benefits over standard HTTP/1.1 negotiation.

RFC2295 introduces a number of new HTTP headers, including the Negotiate request header and the TCN and Alternates response headers. (Krishnamurthy et al. note that although the HTTP/1.1 specification reserved the Alternates header for use in agent-driven negotiation, it did not fully specify it. Consequently, under a pure HTTP/1.1 implementation as defined by RFC2616, server-driven content negotiation is the only option. RFC2295 addresses this issue.)

Deficiencies of HTTP/1.1 Server-Driven Negotiation

Weaknesses of server-driven negotiation highlighted by RFCs 2295 and 2616 include:

  • Inefficiency - Sending details of a user agent's capabilities and preferences with every request is wasteful, not least because very few Web resources have multiple variants. It is also expensive: fully describing the capabilities of all but the simplest browsers requires a large number of Accept headers.
  • Server doesn't always know 'best' - Having the server decide on the 'best' variant may not always result in the most suitable resource representation being returned to the client. The user agent may often be better placed to decide what is best for its needs.

Variant Selection By User Agent

Rather than rely on server-driven negotiation and variant selection by the server, a user agent can take full control of choosing the best variant by explicitly requesting transparent content negotiation through the Negotiate request header. The negotiation is 'transparent' because it makes all the variants on the server visible to the agent.

Under this scheme, the server sends the user agent a list, represented in an Alternates header, containing the available variants and their properties. The user agent can then choose the best variant itself. Consequently, the agent no longer needs to send large Accept headers describing its capabilities and preferences in detail. (However, unless caching is used, agent-driven negotiation does suffer from the disadvantage of needing a second request to obtain the best representation. By sending its best guess in the first response, server-driven negotiation avoids this second request if the initial best guess is acceptable.)
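The following Python sketch illustrates agent-driven selection. It parses an Alternates header in the form Virtuoso emits (a quoted variant URI, its source quality, then attribute lists such as {type text/html}) and lets the client weight each variant's source quality by its own type preferences. The parser covers only the simple forms used in this guide, not the full RFC2295 grammar, and the helper names are hypothetical.

```python
import re

# One variant description, e.g. {"page.html" 0.900000 {type text/html}}
VARIANT_RE = re.compile(r'\{"(?P<uri>[^"]+)"\s+(?P<qs>[0-9.]+)(?P<attrs>(?:\s*\{[^{}]*\})*)\}')
# One attribute list inside a description, e.g. {type text/html} or {language de}
ATTR_RE = re.compile(r"\{(\w+)\s+([^}]*)\}")


def parse_alternates(header):
    """Return a list of (uri, source_quality, attributes) tuples."""
    variants = []
    for m in VARIANT_RE.finditer(header):
        attrs = {name: value.strip() for name, value in ATTR_RE.findall(m.group("attrs"))}
        variants.append((m.group("uri"), float(m.group("qs")), attrs))
    return variants


def pick_locally(header, type_prefs):
    """Client-side choice: source quality weighted by the client's own preference."""
    best, best_score = None, 0.0
    for uri, qs, attrs in parse_alternates(header):
        score = qs * type_prefs.get(attrs.get("type", ""), 0.0)
        if score > best_score:
            best, best_score = uri, score
    return best


header = ('{"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}}, '
          '{"page.xml" 1.000000 {type text/xml}}')
print(pick_locally(header, {"text/xml": 1.0, "text/html": 0.7}))  # page.xml
```

Because the preferences stay on the client, they never need to be serialized into Accept headers; the cost is the extra round trip mentioned above.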

Variant Selection By Server

As well as variant selection by the user agent, TCN allows the server to choose on behalf of the user agent if the user agent explicitly allows it through the Negotiate request header. This option allows the user agent to send smaller Accept headers containing just enough information for the server to choose the best variant and return it directly. The server's choice is controlled by a 'remote variant selection algorithm' as defined in RFC2296.

Variant Selection By End-User

A further option is to allow the end user to select a variant, in case the choice made by the negotiation process is not optimal. For instance, the user agent could display an HTML-based 'pick list' of variants constructed from the variant list returned by the server. Alternatively, the server could generate this pick list itself and include it in the response to a user agent's request for a variant list. (Virtuoso currently responds this way.)
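A server-generated pick list can be as simple as an HTML list of links. The Python sketch below, with hypothetical names, builds such a body from (URI, description, media type) triples. Its layout is loosely modelled on the 300 Multiple Choices page Virtuoso returns; it escapes the values it inserts but is otherwise deliberately minimal.

```python
from html import escape


def pick_list_html(variants):
    """Build a 300 Multiple Choices body from (uri, description, media_type) triples."""
    items = "\n".join(
        f'<li><a href="{escape(uri)}">{escape(desc)}</a>, type {escape(mtype)}</li>'
        for uri, desc, mtype in variants
    )
    return (
        "<html>\n<head>\n<title>300 Multiple Choices</title>\n</head>\n<body>\n"
        "<h1>Multiple Choices</h1>\nAvailable variants:\n"
        f"<ul>\n{items}\n</ul>\n</body>\n</html>"
    )


print(pick_list_html([
    ("page.html", "HTML variant", "text/html"),
    ("page.xml", "XML variant", "text/xml"),
]))
```

The end user then simply follows one of the links, which requests that variant directly without further negotiation.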

Transparent Content Negotiation in Virtuoso HTTP Server

The following section describes the Virtuoso HTTP server's TCN implementation, which is based on RFC2295 but omits "Feature" negotiation. OpenLink's RDF rich clients, iSparql and the OpenLink RDF Browser, both support TCN. User agents which do not support transparent content negotiation continue to be handled using HTTP/1.1-style content negotiation (whereby server-side selection is the only option: the server selects the best variant and returns a list of variants in an Alternates response header).

Describing Resource Variants

In order to negotiate a resource, the server needs information about each of its variants. Variant descriptions are held in the SQL table HTTP_VARIANT_MAP and can be created, updated or deleted using Virtuoso/PL or through the Conductor UI.

HTTP_VARIANT_MAP Table

The table definition is as follows:


create table DB.DBA.HTTP_VARIANT_MAP (
  VM_ID integer identity,         -- unique ID
  VM_RULELIST varchar,            -- HTTP rule list name
  VM_URI varchar,                 -- name of requested resource e.g. 'page'
  VM_VARIANT_URI varchar,         -- name of variant e.g. 'page.xml', 'page.de.html' etc.
  VM_QS float,                    -- source quality, a number in the range 0.001-1.000, with 3 digit precision
  VM_TYPE varchar,                -- content type of the variant e.g. 'text/xml'
  VM_LANG varchar,                -- content language e.g. 'en', 'de' etc.
  VM_ENC varchar,                 -- content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
  VM_DESCRIPTION long varchar,    -- human-readable description of the variant e.g. 'Profile in RDF format'
  VM_ALGO int default 0,          -- reserved for future use
  primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
);
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);

Configuration using Virtuoso/PL

Two functions are provided for managing variant descriptions from Virtuoso/PL: one adds or updates a variant description, the other removes variant descriptions.

Adding or Updating a Resource Variant
DB.DBA.HTTP_VARIANT_ADD (
  in rulelist_uri varchar,          -- HTTP rule list name
  in uri varchar,                   -- requested resource name e.g. 'page'
  in variant_uri varchar,           -- variant name e.g. 'page.xml', 'page.de.html' etc.
  in mime varchar,                  -- content type of the variant e.g. 'text/xml'
  in qs float := 1.0,               -- source quality, a floating point number with 3 digit precision in the 0.001-1.000 range
  in description varchar := null,   -- human-readable description of the variant e.g. 'Profile in RDF format'
  in lang varchar := null,          -- content language e.g. 'en', 'bg', 'de' etc.
  in enc varchar := null            -- content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
)

Removing a Resource Variant


DB.DBA.HTTP_VARIANT_REMOVE (
  in rulelist_uri varchar,          -- HTTP rule list name
  in uri varchar,                   -- name of requested resource e.g. 'page'
  in variant_uri varchar := '%'     -- variant name filter
)
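To illustrate the add-or-update and remove semantics, the sketch below models the variant map in an in-memory SQLite database from Python. This is not Virtuoso: the table and function names are hypothetical stand-ins, "add or update" is approximated by keying rows on the same three columns as the real primary key, and we assume the variant_uri filter of HTTP_VARIANT_REMOVE behaves like an SQL LIKE pattern (as its '%' default suggests).

```python
import sqlite3

# Hypothetical stand-in for DB.DBA.HTTP_VARIANT_MAP (SQLite, not Virtuoso),
# keyed on the same columns as the real table's primary key.
SCHEMA = """
create table http_variant_map (
  vm_rulelist    text not null,
  vm_uri         text not null,
  vm_variant_uri text not null,
  vm_qs          real,
  vm_type        text,
  vm_lang        text,
  vm_enc         text,
  vm_description text,
  primary key (vm_rulelist, vm_uri, vm_variant_uri)
)"""


def variant_add(db, rulelist, uri, variant_uri, mime, qs=1.0, description=None, lang=None, enc=None):
    # Same primary key => the existing row is replaced, i.e. "add or update".
    db.execute(
        "insert or replace into http_variant_map values (?, ?, ?, ?, ?, ?, ?, ?)",
        (rulelist, uri, variant_uri, qs, mime, lang, enc, description),
    )


def variant_remove(db, rulelist, uri, variant_uri="%"):
    # Assumption: the filter is a LIKE pattern, so the default '%' removes all variants.
    db.execute(
        "delete from http_variant_map"
        " where vm_rulelist = ? and vm_uri = ? and vm_variant_uri like ?",
        (rulelist, uri, variant_uri),
    )


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
variant_add(db, "http_rule_list_1", "page", "page.html", "text/html", 0.9, "HTML variant")
variant_add(db, "http_rule_list_1", "page", "page.xml", "text/xml", 1.0, "XML variant")
variant_add(db, "http_rule_list_1", "page", "page.html", "text/html", 0.8, "HTML variant")  # update
print(db.execute("select vm_variant_uri, vm_qs from http_variant_map order by 1").fetchall())
```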

Configuration using Conductor UI

The Conductor 'Content negotiation' panel for describing resource variants and configuring content negotiation is depicted below. It can be reached by selecting the 'Virtual Domains & Directories' tab under the 'Web Application Server' menu item, then selecting the 'URL rewrite' option for a logical path listed amongst those for the relevant HTTP host, e.g. '{Default Web Site}'.

The screen snapshot shows the variant descriptions created by issuing the HTTP_VARIANT_ADD and VHOST_DEFINE Virtuoso/PL calls detailed in the examples at the end of this section. These definitions could equally have been created entirely 'from scratch' through the Conductor UI.

The input fields reflect the supported 'dimensions' of negotiation which include content type, language and encoding. Quality values corresponding to the options for 'Source Quality' are as follows:

Source Quality                                 Quality Value
perfect representation                         1.000
threshold of noticeable loss of quality        0.900
noticeable, but acceptable quality reduction   0.800
barely acceptable quality                      0.500
severely degraded quality                      0.300
completely degraded quality                    0.000

Content negotiation rules in Conductor

Variant Selection Algorithm

When a user agent instructs the server to select the best variant, Virtuoso does so using the selection algorithm below:

If a virtual directory has URL rewriting enabled (has the 'url_rewrite' option set), the web server:

  1. Looks in DB.DBA.HTTP_VARIANT_MAP for a VM_RULELIST matching the one specified in the 'url_rewrite' option
  2. If present, it loops over all variants for which VM_URI is equal to the resource requested
  3. For every variant it calculates an overall quality from the variant's source quality (VM_QS) and the quality value the user agent assigns to that variant in its Accept headers
  4. If the best variant is found, it adds TCN HTTP headers to the response and passes the VM_VARIANT_URI to the URL rewriter
  5. If the user agent has asked for a variant list, it composes such a list and returns an 'Alternates' HTTP header with response code 300
  6. If no URL rewriter rules exist for the target URL, the web server returns the content of the dereferenced VM_VARIANT_URI.

The server may return the best-choice resource representation or a list of available resource variants. When a user agent requests transparent negotiation, the web server returns the TCN header "choice". When a user agent asks for a variant list, the server returns the TCN header "list".
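The steps above can be approximated in a few lines of Python. In this sketch, which is our simplification rather than Virtuoso's code, a list stands in for the HTTP_VARIANT_MAP rows of the 'page' example used in this section, only the media type is negotiated, and a variant's overall quality is assumed to be VM_QS multiplied by the q value the Accept header gives its type. Returning 406 when nothing is acceptable is also an assumption.

```python
# Simplified model of the selection steps (illustration only, not Virtuoso's code).
VARIANT_MAP = [
    # (VM_VARIANT_URI, VM_TYPE, VM_QS)
    ("page.html", "text/html", 0.9),
    ("page.txt", "text/plain", 0.5),
    ("page.xml", "text/xml", 1.0),
]


def accept_q(media_type, accept):
    """q value an Accept header gives media_type (exact match, then type/*, then */*)."""
    prefs = {}
    for item in accept.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[media_range] = q
    major = media_type.split("/")[0]
    for media_range in (media_type, major + "/*", "*/*"):
        if media_range in prefs:
            return prefs[media_range]
    return 0.0


def alternates_header():
    """Format the variant list in the style of Virtuoso's Alternates header."""
    return ", ".join(f'{{"{uri}" {qs:.6f} {{type {mtype}}}}}' for uri, mtype, qs in VARIANT_MAP)


def negotiate(accept, want_list=False):
    """Return (status, headers) for a TCN list or choice response."""
    if want_list:  # step 5: the user agent asked for the variant list
        return 300, {"TCN": "list", "Vary": "negotiate,accept", "Alternates": alternates_header()}
    # step 3: overall quality assumed to be VM_QS times the client's q for the type
    scored = [(qs * accept_q(mtype, accept), uri, mtype) for uri, mtype, qs in VARIANT_MAP]
    best_q, best_uri, best_type = max(scored)
    if best_q <= 0:
        return 406, {}  # assumption: nothing acceptable -> 406 Not Acceptable
    # step 4: best variant found, so add the TCN headers
    return 200, {"TCN": "choice", "Vary": "negotiate,accept",
                 "Content-Location": best_uri, "Content-Type": best_type}


print(negotiate("text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3"))
```

With the first Accept header used in the curl examples, HTML scores 0.9 (0.9 × 1.0), ahead of XML (1.0 × 0.3) and plain text (0.5 × 0.5). That ordering is consistent with the Content-Location: page.html response Virtuoso returns for the same request.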

Transparent Content Negotiation Examples

Simple TCN with Static Content

In this example we assume the following files have been uploaded to the Virtuoso WebDAV server, with each containing the same information but in different formats:

  • /DAV/TCN/page.xml - a XML variant
  • /DAV/TCN/page.html - a HTML variant
  • /DAV/TCN/page.txt - a text variant

We add TCN rules and define a virtual directory:


DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html', 'text/html', 0.900000, 'HTML variant'); 
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain', 0.500000, 'Text document'); 
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml', 1.000000, 'XML variant'); 
DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/', ppath=>'/DAV/TCN/', is_dav=>1, vsp_user=>'dba',
    opts=>vector ('url_rewrite', 'http_rule_list_1')); 

Having done this we can now test the setup with a suitable HTTP client, in this case the curl command line utility. In the following examples, the curl client supplies Negotiate request headers containing content negotiation directives which include:

  • "trans" - The user agent supports transparent content negotiation for the current request.
  • "vlist" - The user agent requests that any transparently negotiated response for the current request includes an Alternates header with the variant list bound to the negotiable resource. Implies "trans".
  • "*" - The user agent allows servers and proxies to run any remote variant selection algorithm.

The server returns a TCN response header signalling that the resource is transparently negotiated and either a choice or a list response as appropriate.
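As a rough guide to how these directives combine, the Python sketch below classifies the kind of response a TCN-aware server might send for a given Negotiate header. It reflects our reading of the directives listed above; treating "*" as also implying "trans" is an assumption, other RFC2295 directives are ignored, and the function names are hypothetical.

```python
# Rough classification of TCN responses from the Negotiate header (our reading of
# the directives above; other RFC2295 directives are ignored).


def parse_negotiate(header):
    """Split a Negotiate header into a set of lower-cased directives."""
    return {d.strip().lower() for d in header.split(",") if d.strip()}


def response_kind(negotiate_header):
    """Return 'server-driven', 'list', 'choice' or 'choice+list'."""
    if negotiate_header is None:
        return "server-driven"  # no Negotiate header: plain HTTP/1.1 negotiation
    directives = parse_negotiate(negotiate_header)
    may_choose = "*" in directives  # server may run a remote variant selection algorithm
    # "vlist" implies "trans"; assumption: "*" implies it as well.
    supports_tcn = may_choose or bool(directives & {"trans", "vlist"})
    if not supports_tcn:
        return "server-driven"
    if may_choose:
        # "vlist" asks for the Alternates header on any TCN response, choice included
        return "choice+list" if "vlist" in directives else "choice"
    return "list"  # TCN-capable agent, but the server is not allowed to choose


for value in (None, "*", "vlist", "vlist, *"):
    print(repr(value), "->", response_kind(value))
```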

In the first curl exchange, the user agent indicates to the server that, of the formats it recognizes, HTML is preferred, and through "Negotiate: *" it allows the server to select the variant on its behalf. In the response, the Vary header field expresses the parameters the server used to select a representation, i.e. only the Negotiate and Accept header fields are considered.


$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" \
          -H "Negotiate: *" http://localhost:8890/DAV/TCN/page
HTTP/1.1 200 OK 
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB 
Connection: Keep-Alive
Date: Wed, 31 Oct 2007 15:43:18 GMT 
Accept-Ranges: bytes 
TCN: choice 
Vary: negotiate,accept
Content-Location: page.html 
Content-Type: text/html 
ETag: "14056a25c066a6e0a6e65889754a0602" 
Content-Length: 49

<html>
<body>
some html 
</body>
</html>
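The same request can be issued from Python's standard library. In this sketch the URL assumes the local Virtuoso instance on port 8890 used above, and the helper names are hypothetical. Building the request needs no server, while fetch_tcn_headers() only works against a running instance.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: the local Virtuoso instance on port 8890 configured above.
URL = "http://localhost:8890/DAV/TCN/page"
TCN_HEADERS = ("TCN", "Vary", "Alternates", "Content-Location", "Content-Type")


def tcn_request(url, accept, negotiate):
    """Build a GET request carrying the same Accept and Negotiate headers as curl -H."""
    return Request(url, headers={"Accept": accept, "Negotiate": negotiate})


def fetch_tcn_headers(req):
    """Send the request (needs a running server); return status and TCN-related headers."""
    try:
        with urlopen(req) as resp:
            status, headers = resp.status, resp.headers
    except HTTPError as err:  # e.g. 300 Multiple Choices is reported as an HTTPError
        status, headers = err.code, err.headers
    return status, {name: headers.get(name) for name in TCN_HEADERS}


# Same headers as the first curl exchange; building the request needs no server.
req = tcn_request(URL, "text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3", "*")
print(req.get_header("Negotiate"), req.full_url)
```

Passing "vlist" as the Negotiate value should produce the 300 list response, which the helper catches to read its TCN and Alternates headers. Unlike curl without -L, urlopen follows 303 redirects automatically.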

Next, the quality values in the Accept header are adjusted so that the user agent indicates that XML is its preferred format.


$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
          -H "Negotiate: *" http://localhost:8890/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive 
Date: Wed, 31 Oct 2007 15:44:07 GMT
Accept-Ranges: bytes 
TCN: choice 
Vary: negotiate,accept
Content-Location: page.xml 
Content-Type: text/xml 
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39

<?xml version="1.0" ?>
<a>some xml</a>

In the final example, the user agent wants to decide for itself which is the most suitable representation, so it asks for a list of variants. The server provides the list in the form of an Alternates response header and, in addition, sends an HTML representation of the list so that the end user can choose the preferred variant if the user agent is unable to.


$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
          -H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page
HTTP/1.1 300 Multiple Choices 
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB 
Connection: close 
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2007 15:44:35 GMT 
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}}, 
    {"page.xml" 1.000000 {type text/xml}}
Content-Length: 368

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head> 
<title>300 Multiple Choices</title> 
</head>
<body> 
<h1>Multiple Choices</h1>
Available variants:
<ul> 
<li><a href="page.html">HTML variant</a>, type text/html</li> 
<li><a href="page.txt">Text document</a>, type text/plain</li> 
<li><a href="page.xml">XML variant</a>, type text/xml</li> 
</ul> 
</body>
</html>

Northwind Linked Data View

Our next example illustrates the use of a slash URI scheme in a Linked Data View, and shows how to combine URL rewriting and transparent content negotiation. The example is taken from the Linked Data View tutorial, one of many Virtuoso online tutorials.

The view generates an RDF rendering of Virtuoso's Northwind 'Demo' database. (Note: The 'tutorial' Linked Data View described here is distinct from the hash-URI-based 'demo' Linked Data View created by the Demonstration VAD.) If you intend to try the example locally, both the Demonstration and Tutorial VADs must be installed on the local machine.

To generate the Linked Data View and set up the URL rewriting rules, the tutorial runs the script rd_v_1.sql (see the 'View Source' tab of the Linked Data View tutorial, or the WebDAV folder DAV/VAD/tutorial/rdfview/rd_v_1). The view creates two RDF graphs:

  • http://<URIQADefaultHost>/tutorial/Northwind - containing the base RDF data
  • http://<URIQADefaultHost>/tutorial/Northwind/ontology - containing the OWL class definitions

A slash URI scheme is adhered to throughout. Each entity exposed by the view is identified by a URI with the prefix http://<URIQADefaultHost>/tutorial/Northwind/resource/, for example http://<URIQADefaultHost>/tutorial/Northwind/resource/Customer/ALFKI.

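RDF and HTML description documents for Northwind entities are identified by URIs with the prefixes http://<URIQADefaultHost>/tutorial/Northwind/data/ and http://<URIQADefaultHost>/tutorial/Northwind/page/ respectively, e.g. http://<URIQADefaultHost>/tutorial/Northwind/data/Customer/ALFKI.xml and http://<URIQADefaultHost>/tutorial/Northwind/page/Customer/ALFKI.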

Transparent content negotiation is enabled so that entity descriptions can be rendered in several formats. The available variants can be listed using curl, e.g.


curl -I -L -H "Negotiate: vlist" "http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI"

returns


HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 15 May 2009 11:11:19 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"ALFKI.html" 0.600000 {type text/html}}, {"ALFKI.n3" 0.800000 {type text/rdf+n3}}, 
    {"ALFKI.ttl" 0.700000 {type application/x-turtle}}, {"ALFKI.xml" 0.950000 {type application/rdf+xml}}
Location: http://demo.openlinksw.com/tutorial/Northwind/page/Customer/ALFKI
Content-Length: 443

Requesting RDF/XML as the preferred representation of a resource (and asking for only the HTTP headers to be displayed):


curl -I -L -H "Accept: application/rdf+xml;q=0.95,text/rdf+n3;q=0.80" \
           -H "Negotiate: *" "http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI"

returns


HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Fri, 15 May 2009 16:17:11 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: ALFKI.xml
Content-Type: application/rdf+xml; qs=0.9025
Location: http://demo.openlinksw.com/tutorial/Northwind/data/Customer/ALFKI.xml
Content-Length: 0

HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 15 May 2009 16:17:11 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?default-graph-uri=http%3A//demo.openlinksw.com/tutorial/Northwind&
    query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com/tutorial/Northwind%2Fresource%2FCustomer%2FALFKI%3E&format=rdf
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Fri, 15 May 2009 16:17:11 GMT
Accept-Ranges: bytes
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 6358

Likewise, specifying N3 as the preferred format:


curl -I -L -H "Accept: text/rdf+n3;q=1.0,application/rdf+xml;q=0.5" \
           -H "Negotiate: *" "http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI"

generates


HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Fri, 15 May 2009 16:30:27 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: ALFKI.n3
Content-Type: text/rdf+n3; qs=0.8
Location: http://demo.openlinksw.com/tutorial/Northwind/data/Customer/ALFKI.n3
Content-Length: 0

HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 15 May 2009 16:30:28 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?default-graph-uri=http%3A//demo.openlinksw.com/tutorial/Northwind&
    query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com/tutorial/Northwind%2Fresource%2FCustomer%2FALFKI%3E&format=n3
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Fri, 15 May 2009 16:30:28 GMT
Accept-Ranges: bytes
Content-Type: text/rdf+n3; charset=UTF-8
Content-Length: 2018
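The qs values Virtuoso attaches to the chosen Content-Type in these two exchanges appear to be the product of the variant's source quality (set with HTTP_VARIANT_ADD) and the q value the client gave that media type. This is consistent with the overall-quality rule of RFC2296 (RVSA/1.0), which, as we understand it, multiplies the source quality q_s by the type, charset, language and feature quality factors and rounds to five decimal places. With only the media type negotiated, the factors other than q_s and q_t are taken as 1:

```latex
\begin{aligned}
Q &= \operatorname{round5}(q_s \cdot q_t \cdot q_c \cdot q_l \cdot q_f) \\
Q_{\text{RDF/XML}} &= 0.95 \times 0.95 = 0.9025 \\
Q_{\text{N3}} &= 0.80 \times 1.0 = 0.80
\end{aligned}
```

Here q_s is the variant's VM_QS and q_t is the q value from the request's Accept header.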

To explain how this TCN configuration is set up, the salient portions of the rd_v_1.sql setup script are described below.

A URL rewriting rule list, nwtut_rule_list_1, is associated with the logical path /tutorial/Northwind/resource. Two rules, resource_rule_1 and resource_rule_2, are added to the rule list. Each rewrites request paths containing '/tutorial/Northwind/resource/'.


DB.DBA.VHOST_DEFINE (lpath=>'/tutorial/Northwind/resource',
ppath=>'/DAV/VAD/tutorial/rdfview/rd_v_1/', is_dav=>1, is_brws=>1, vsp_user=>'dba',
opts=>vector ('url_rewrite', 'nwtut_rule_list_1'));
 ...
DB.DBA.URLREWRITE_CREATE_RULELIST ('nwtut_rule_list_1', 1, vector ('resource_rule_1', 'resource_rule_2'));

The first rule, resource_rule_1, acts as a 'catch all', handling requests for content types not handled by the second rule. The second rule, resource_rule_2, handles requests for the RDF serialization formats RDF/XML, N3 and TTL, redirecting them to paths under /tutorial/Northwind/data/. resource_rule_1 treats requests for any other content type as requests for 'text/html', redirecting them to paths under /tutorial/Northwind/page/.


DB.DBA.URLREWRITE_CREATE_REGEX_RULE ('resource_rule_1', 1, '/resource/([^.]*)', 
  vector ('par_1'), 1,'/tutorial/Northwind/page/%s', 
  vector ('par_1'), NULL, NULL, 2, 303, NULL);
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ('resource_rule_2', 1, '/resource/(.*)\x24', 
  vector ('par_1'), 1,'/tutorial/Northwind/data/%s', 
  vector ('par_1'), NULL, '(application/rdf.xml)|(text/rdf.n3)|(application/x-turtle)', 2, 303);

So, requests for /tutorial/Northwind/resource/$1 are routed to:

  • /tutorial/Northwind/data/$1.xml - if content type application/rdf+xml was requested
  • /tutorial/Northwind/data/$1.n3 - if content type text/rdf+n3 was requested
  • /tutorial/Northwind/data/$1.ttl - if content type application/x-turtle was requested
  • /tutorial/Northwind/page/$1 - if content type text/html, or any other content type, was requested

where $1 signifies the remainder portion of the input path. The Customer entity ALFKI has four description document variants, ALFKI.xml, ALFKI.n3, ALFKI.ttl and ALFKI.html. Each variant is described using function HTTP_VARIANT_ADD. (Here, the '$' character is coded using its hex value, \x24.)


DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.xml', 'application/rdf+xml', 0.95, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.n3', 'text/rdf+n3', 0.80, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.ttl', 'application/x-turtle', 0.70, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.html', 'text/html', 0.60, location_hook=>null);
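To see how the rewrite rules and variant descriptions combine, here is a hypothetical Python emulation of the routing summarized above. It assumes that TCN has already chosen a variant, whose suffix is appended to the request path before rewriting, and that resource_rule_2's content-type pattern is tested against the chosen variant's media type. It is a sketch, not Virtuoso's rewriter.

```python
import re

# Hypothetical emulation of the Northwind routing (not Virtuoso's rewriter).
VARIANTS = [  # (suffix, media type, VM_QS) as registered with HTTP_VARIANT_ADD
    (".xml", "application/rdf+xml", 0.95),
    (".n3", "text/rdf+n3", 0.80),
    (".ttl", "application/x-turtle", 0.70),
    (".html", "text/html", 0.60),
]
# Same content-type pattern as resource_rule_2
RDF_TYPES = re.compile(r"(application/rdf.xml)|(text/rdf.n3)|(application/x-turtle)")


def redirect_target(path, chosen_type):
    """Return the 303 target for a /resource/ path once a variant type is chosen."""
    suffix = next(s for s, t, _ in VARIANTS if t == chosen_type)  # StopIteration if unknown
    variant_path = path + suffix
    if RDF_TYPES.search(chosen_type):
        # resource_rule_2: '/resource/(.*)$' keeps the suffix -> /data/...
        m = re.search(r"/resource/(.*)$", variant_path)
        return "/tutorial/Northwind/data/%s" % m.group(1)
    # resource_rule_1 (catch-all): '/resource/([^.]*)' stops at the first '.'
    m = re.search(r"/resource/([^.]*)", variant_path)
    return "/tutorial/Northwind/page/%s" % m.group(1)


for mtype in ("application/rdf+xml", "text/rdf+n3", "text/html"):
    print(mtype, "->", redirect_target("/tutorial/Northwind/resource/Customer/ALFKI", mtype))
```

Because the catch-all pattern ([^.]*) stops at the first '.', the HTML case yields /tutorial/Northwind/page/Customer/ALFKI with no suffix, matching the Location header returned for the Negotiate: vlist request earlier.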

Finally, the paths /tutorial/Northwind/data and /tutorial/Northwind/page have their own rewrite rules, attached to the rule lists nwtut_rule_list_2 and nwtut_rule_list_3 respectively.


DB.DBA.VHOST_DEFINE (lpath=>'/tutorial/Northwind/data',
ppath=>'/DAV/VAD/tutorial/rdfview/rd_v_1/',
is_dav=>1, is_brws=>1, vsp_user=>'dba',
opts=>vector ('url_rewrite', 'nwtut_rule_list_2'));

DB.DBA.VHOST_DEFINE (lpath=>'/tutorial/Northwind/page',
ppath=>'/DAV/VAD/tutorial/rdfview/rd_v_1/',
is_dav=>1, is_brws=>1, vsp_user=>'dba',
opts=>vector ('url_rewrite', 'nwtut_rule_list_3'));

nwtut_rule_list_2 contains three rewrite rules (data_rule_1/2/3), one for each RDF description document variant. Each rewrites the resource request as a SPARQL DESCRIBE query; the queries differ only in the serialization format requested for the response. nwtut_rule_list_3 contains one rule (page_rule_1), which re-routes requests for text/html through the /about/html Sponger proxy to generate an HTML rendering. Each rule strips off any file suffix identifying the variant; e.g. only the 'Customer/ALFKI' portion of 'Customer/ALFKI.n3' or 'Customer/ALFKI.html' is inserted into the rewritten request.


DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'data_rule_1', 1, '/data/(.*)\\.(xml)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A//^{URIQADefaultHost}^/tutorial/Northwind&query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/tutorial/Northwind%%2Fresource%%2F%U%%3E&format=rdf',
vector ('par_1'), NULL, NULL, 2, 303, '');

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'data_rule_2', 1, '/data/(.*)\\.(ttl)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A//^{URIQADefaultHost}^/tutorial/Northwind&query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/tutorial/Northwind%%2Fresource%%2F%U%%3E&format=n3',
vector ('par_1'), NULL, NULL, 2, 303, '');

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'data_rule_3', 1, '/data/(.*)\\.(n3|rdf)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A//^{URIQADefaultHost}^/tutorial/Northwind&query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/tutorial/Northwind%%2Fresource%%2F%U%%3E&format=%U',
vector ('par_1', 'f'), NULL, NULL, 2, 303, '');

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'page_rule_1', 1, '/page/([^.]*)', vector ('par_1'), 1,
'/about/html/http://^{URIQADefaultHost}^/tutorial/Northwind/resource/%s', vector ('par_1'), NULL, '(text/html)', 2, 303);
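The data rules amount to turning a /data/ path into a SPARQL protocol URL. The Python sketch below, with hypothetical names, does the same for the Northwind graph. The file suffix selects the format parameter (xml becomes rdf, ttl becomes n3, n3 and rdf pass through, as in data_rule_1/2/3) and the rest of the path names the entity to DESCRIBE. Its percent-encoding comes from urlencode, so the string differs from Virtuoso's (which leaves some slashes unescaped), but it decodes to the same parameters.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical re-implementation of data_rule_1/2/3 (not Virtuoso's rewriter).
DATA_PATH = re.compile(r"/data/(.*)\.(xml|ttl|n3|rdf)$")
FORMAT_FOR_SUFFIX = {"xml": "rdf", "ttl": "n3", "n3": "n3", "rdf": "rdf"}


def describe_url(path, host):
    """Rewrite /tutorial/Northwind/data/<entity>.<suffix> as a SPARQL DESCRIBE URL."""
    m = DATA_PATH.search(path)
    if m is None:
        return None  # not a data document request
    entity, suffix = m.groups()  # the suffix is stripped from the entity path
    graph = f"http://{host}/tutorial/Northwind"
    params = {
        "default-graph-uri": graph,
        "query": f"DESCRIBE <{graph}/resource/{entity}>",
        "format": FORMAT_FOR_SUFFIX[suffix],
    }
    return "/sparql?" + urlencode(params)


url = describe_url("/tutorial/Northwind/data/Customer/ALFKI.n3", "demo.openlinksw.com")
print(parse_qs(urlsplit(url).query)["query"])
```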

DBpedia

Under the umbrella of the W3C Linking Open Data (LOD) Community Project, DBpedia is a well-known initiative to extract structured information from Wikipedia and make this information available on the Web. The DBpedia knowledge base is accessible through a SPARQL endpoint or through a Linked Data interface. As DBpedia defines Linked Data URIs for millions of concepts, it forms one of the central interlinking hubs in the LOD Cloud and the emerging Web of Data.

When serving the DBpedia dataset as Linked Data, DBpedia supports transparent content negotiation in a similar manner to that already described for the Northwind tutorial Linked Data View. Indeed, the Northwind Linked Data View's TCN configuration was modelled as a simplified version of DBpedia's.

DBpedia uses a slash URI scheme to distinguish between resource and description document URIs. Depending on the content type preferences expressed in the consuming client's Accept request headers and on the 'best' variant selected by the server, a request for the resource http://dbpedia.org/resource/The_Beatles is redirected either to an HTML description document under http://dbpedia.org/page/ or to an RDF description document under http://dbpedia.org/data/.

As with the Northwind Linked Data View, the URI prefixes http://dbpedia.org/resource/..., http://dbpedia.org/page/... and http://dbpedia.org/data/... distinguish between a resource and its HTML or RDF description documents.

The available RDF description document variants can be listed using curl. The command:


curl -I -L -H "Negotiate: vlist" \
           -H "Accept: application/rdf+xml" "http://dbpedia.org/resource/The_Beatles"

yields:


HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=UTF-8
Date: Mon, 18 May 2009 14:47:31 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"The_Beatles.n3" 0.800000 {type text/rdf+n3}}, {"The_Beatles.ttl" 0.700000 {type application/x-turtle}}, 
    {"The_Beatles.xml" 0.950000 {type application/rdf+xml}}
Location: http://dbpedia.org/data/__The_Beatles
Content-Length: 418

Requesting resource "The_Beatles" with RDF/XML as the preferred description format, using:


curl -I -L -H "Negotiate: *" \
           -H "Accept: application/rdf+xml;q=0.95,text/rdf+n3;q=0.80,text/html;q=0.60" "http://dbpedia.org/resource/The_Beatles"

returns:


HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Mon, 18 May 2009 14:56:39 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: The_Beatles.xml
Content-Type: application/rdf+xml; qs=0.9025
Location: http://dbpedia.org/data/The_Beatles.xml
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Mon, 18 May 2009 14:56:40 GMT
Accept-Ranges: bytes
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 55844

Changing the preferred description format to N3:


curl -I -L -H "Negotiate: *" -H "Accept: application/rdf+xml;q=0.70,text/rdf+n3;q=0.95,text/html;q=0.60" \
     "http://dbpedia.org/resource/The_Beatles"

results in the response:


HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Mon, 18 May 2009 15:00:16 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: The_Beatles.n3
Content-Type: text/rdf+n3; qs=0.76
Location: http://dbpedia.org/data/The_Beatles.n3
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Mon, 18 May 2009 15:00:20 GMT
Accept-Ranges: bytes
Content-Type: text/rdf+n3; charset=UTF-8
Content-Length: 29259

DBpedia's URL rewriting rules and TCN support are configured using the script dbpedia_init.sql, portions of which are listed below.

Using VHOST_DEFINE, the logical paths http://dbpedia.org/resource, http://dbpedia.org/page and http://dbpedia.org/data are each associated with a URL rewriting rule list. Requests to /resource are redirected to /page/%s or /data/__%s, depending on whether an HTML or an RDF description is being requested, where %s is the portion of the request path after /resource/. Resource descriptions provided by the path /data/__%s are available in three variants, RDF/XML, N3 and TTL, each of which is described using HTTP_VARIANT_ADD.


DB.DBA.VHOST_DEFINE ( lhost=>':80', vhost=>'dbpedia.org', lpath=>'/resource',
ppath=>'/', is_dav=>0, def_page=>'',
opts=>vector ('url_rewrite', 'dbp_rule_list_2'));

 ...

DB.DBA.VHOST_DEFINE ( lhost=>':80', vhost=>'dbpedia.org', lpath=>'/page',
ppath=>registry_get('_dbpedia_path_'),
is_dav=>atoi (registry_get('_dbpedia_dav_')),
opts=>vector ('url_rewrite', 'dbp_rule_list_7'));

 ...

DB.DBA.VHOST_DEFINE ( lhost=>':80', vhost=>'dbpedia.org', lpath=>'/data',
ppath=>registry_get('_dbpedia_path_'),
is_dav=>atoi (registry_get('_dbpedia_dav_')), vsp_user=>'dba',
opts=>vector ('url_rewrite', 'pvsp_rule_list2'));

DB.DBA.URLREWRITE_CREATE_RULELIST ( 'dbp_rule_list_2', 1, vector ('dbp_rule_14', 'dbp_rule_12'));

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'dbp_rule_14', 1, '/resource/(.*)\x24', vector ('par_1'), 1,
'/page/%s', vector ('par_1'), NULL, NULL, 2, 303, NULL);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'dbp_rule_12', 1, '/resource/(.*)\x24', vector ('par_1'), 1,
'/data/__%s', vector ('par_1'), NULL, '(application/rdf.xml)|(text/rdf.n3)|(application/x-turtle)', 2, 303);

delete from DB.DBA.HTTP_VARIANT_MAP where VM_RULELIST = 'dbp_rule_list_2';

DB.DBA.HTTP_VARIANT_ADD ('dbp_rule_list_2', '__(.*)', '\x241.xml', 'application/rdf+xml', 0.95, location_hook=>null);

DB.DBA.HTTP_VARIANT_ADD ('dbp_rule_list_2', '__(.*)', '\x241.n3', 'text/rdf+n3', 0.80, location_hook=>null);

DB.DBA.HTTP_VARIANT_ADD ('dbp_rule_list_2', '__(.*)', '\x241.ttl', 'application/x-turtle', 0.70, location_hook=>null);

 ...

DB.DBA.URLREWRITE_CREATE_RULELIST ( 'dbp_rule_list_7', 1, vector ('dbp_rule_13'));

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'dbp_rule_13', 1, '(/[^#]*)', vector ('par_1'), 1,
registry_get('_dbpedia_path_')||'description.vsp?res=%U', vector ('par_1'), NULL, NULL, 0, 0, '');

 ...

DB.DBA.URLREWRITE_CREATE_RULELIST ( 'pvsp_rule_list2', 1, vector ('pvsp_data_rule2', 'pvsp_data_rule3', 'pvsp_data_rule4'));

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'pvsp_data_rule2', 1, '/data/(.*)\\.(xml)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A%%2F%%2Fdbpedia.org&query=DESCRIBE+%%3Chttp%%3A%%2F%%2Fdbpedia.org%%2Fresource%%2F%U%%3E&format=rdf',
vector ('par_1'), NULL, NULL, 2, null, '');

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'pvsp_data_rule3', 1, '/data/(.*)\\.(ttl)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A%%2F%%2Fdbpedia.org&query=DESCRIBE+%%3Chttp%%3A%%2F%%2Fdbpedia.org%%2Fresource%%2F%U%%3E&format=n3',
vector ('par_1'), NULL, NULL, 2, null, '');

DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'pvsp_data_rule4', 1, '/data/(.*)\\.(n3|rdf)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A%%2F%%2Fdbpedia.org&query=DESCRIBE+%%3Chttp%%3A%%2F%%2Fdbpedia.org%%2Fresource%%2F%U%%3E&format=%U',
vector ('par_1', 'f'), NULL, NULL, 2, null, '');

Requests redirected to /data/__%s are in turn redirected to /data/%s.xml, /data/%s.n3 or /data/%s.ttl, depending on the media type of the variant selected from the dbp_rule_list_2 variant map. The pvsp_data_rule2, pvsp_data_rule3 and pvsp_data_rule4 rules then furnish the data for these RDF variants (and for any /data/%s.rdf request) through SPARQL DESCRIBE queries that differ only in the format= query string parameter, which specifies the serialization format of the result.
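To make this two-stage flow concrete, here is a minimal Python sketch (not Virtuoso code) of what the dbp_rule_list_2 variant map and the pvsp_data_rule* rules do together. It assumes the server scores each variant by multiplying its source quality (the 0.95, 0.80 and 0.70 qs values passed to HTTP_VARIANT_ADD) by the q-value the client's Accept header gives that media type, roughly in the spirit of RFC 2296's remote variant selection algorithm; Virtuoso's actual selection logic may differ in detail. The names VARIANTS, FORMAT_PARAM, parse_accept, accept_q, choose_variant and describe_url are hypothetical, and the Accept parser is deliberately simplified.

```python
from urllib.parse import quote

# Variants for /data/__<name>, mirroring the HTTP_VARIANT_ADD calls above:
# (file suffix, media type, source quality qs).
VARIANTS = [
    ("xml", "application/rdf+xml", 0.95),
    ("n3", "text/rdf+n3", 0.80),
    ("ttl", "application/x-turtle", 0.70),
]

# File suffix -> format= value used by pvsp_data_rule2/3/4.
FORMAT_PARAM = {"xml": "rdf", "ttl": "n3", "n3": "n3", "rdf": "rdf"}


def parse_accept(header):
    """Simplified Accept parser: {media_range: q}; parameters other than q are ignored."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0].lower()] = q
    return prefs


def accept_q(prefs, media_type):
    """q the client gives media_type, falling back to type/* and then */*."""
    for key in (media_type, media_type.split("/")[0] + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def choose_variant(name, accept_header):
    """Path of the variant with the highest qs * q score, or None if nothing is acceptable."""
    prefs = parse_accept(accept_header)
    best, best_score = None, 0.0
    for suffix, media_type, qs in VARIANTS:
        score = qs * accept_q(prefs, media_type)
        if score > best_score:
            best, best_score = f"/data/{name}.{suffix}", score
    return best


def describe_url(variant_path):
    """Map /data/<name>.<suffix> onto the SPARQL DESCRIBE URL built by the pvsp rules."""
    name, suffix = variant_path[len("/data/"):].rsplit(".", 1)
    # %%3C ... %%3E in the rule templates become %3C ... %3E, i.e. an encoded <IRI>.
    resource = quote("http://dbpedia.org/resource/" + name, safe="")
    return ("/sparql?default-graph-uri=" + quote("http://dbpedia.org", safe="")
            + "&query=DESCRIBE+%3C" + resource + "%3E"
            + "&format=" + FORMAT_PARAM[suffix])
```

For example, with Accept: text/rdf+n3;q=0.9, application/rdf+xml;q=0.5 the N3 variant wins (0.80 x 0.9 = 0.72 against 0.95 x 0.5 = 0.475), so choose_variant("Berlin", ...) returns /data/Berlin.n3, whereas a client sending */* gets the RDF/XML variant because it carries the highest qs.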

Requests redirected to /page/%s are in turn rewritten to the page description template description.vsp, which provides the HTML rendering. In effect, this is equivalent to the external 303 redirect to the /about/html proxy used by the Northwind tutorial Linked Data View, since that proxy itself uses description.vsp internally.
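The dbp_rule_13 rewrite can be sketched in Python as a regular-expression substitution. Two assumptions are made here: DBPEDIA_PATH is a hypothetical stand-in for whatever registry_get('_dbpedia_path_') returns, and %U is taken to percent-encode the whole captured value, slashes included. The rule's redirect code of 0 is read as "no HTTP redirect", so the function returns the internal target rather than a Location header. Since browsers never send the #fragment part of a URI, in practice the pattern captures the whole request path.

```python
import re
from urllib.parse import quote

# Hypothetical value standing in for registry_get('_dbpedia_path_').
DBPEDIA_PATH = "/dbpedia/"

# Same regular expression as dbp_rule_13: capture the path up to any '#'.
RULE_13 = re.compile(r"(/[^#]*)")


def rewrite_to_description(path):
    """Return the internal description.vsp target for `path`, or None if the rule does not match."""
    match = RULE_13.match(path)
    if match is None:
        return None
    # %U in the rule's target template URL-encodes the captured parameter.
    return DBPEDIA_PATH + "description.vsp?res=" + quote(match.group(1), safe="")
```

For a request path such as /page/Berlin, this yields /dbpedia/description.vsp?res=%2Fpage%2FBerlin.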

Simplifying Deployment with RDFa

With Yahoo and Google both having announced support for RDFa, this format has arguably become the most important of the RDF syntaxes. For content providers, RDFa brings benefits beyond the obvious attraction of potentially improving your content's search ranking by supplying RDFa-aware crawlers with more accurate, semantically richer metadata. Chief amongst these is that RDFa provides the simplest route to deploying Linked Data.

No Content Negotiation or 303 Redirects

In this guide we have emphasized the distinction between a real-world concept or entity and its (possibly many) descriptions, each associated with a different media type. Earlier examples have shown how to serve multiple representation formats: HTML, RDF/XML, N3, Turtle (TTL) etc. In essence, these formats boil down to a choice between an HTML representation and some serialization of RDF. RDFa gives you both representations combined in a single entity description document, so the need for content negotiation, or for 303 redirects to separate representation documents, is removed. The following three diagrams depict this fundamental difference, contrasting HTML+RDFa with serving separate HTML and RDF description documents through a hash or slash URI scheme.

Content negotiation with a hash URI scheme
Content negotiation and 303 redirect with a slash URI scheme
HTML+RDFa potentially removes the need for content negotiation and HTTP redirects

Generating RDFa Dynamically Using Description.vsp

While authors of small sites might opt to serve static content and mark up their HTML with RDFa manually, for large datasets this becomes unattractive. In cases where the HTML representation itself is being generated from an RDF quad store, it makes sense to generate any embedded RDFa alongside the HTML. Virtuoso provides this option through description.vsp, a Virtuoso Server Page which provides an HTML description of RDF Linked Data. Appendix A provides a brief overview.

When an entity URI is dereferenced, the description returned is determined by the media type(s) specified in the client's Accept header, which expresses its preferred representation formats. A client can request an XHTML+RDFa description by supplying an Accept header with a media type of application/xhtml+xml or text/html. In the absence of an Accept header, OpenLink's rewriting rules are normally configured to return HTML+RDFa by default. (Rewriting rules configured by the cartridges_dav VAD typically follow this convention.)
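The selection rule just described can be sketched in Python. This is a minimal sketch of the decision only, not of OpenLink's rewriting rules: the two media-type groups, the tie-break in favour of HTML+RDFa (which makes */* behave like a missing Accept header) and the use of None to stand for a 406 Not Acceptable response are assumptions for illustration, and parse_accept, q_for and pick_representation are hypothetical names.

```python
# Media types that select the HTML+RDFa description (as in the text above).
HTML_TYPES = ("application/xhtml+xml", "text/html")
# RDF media types assumed here to select a separate RDF document instead.
RDF_TYPES = ("application/rdf+xml", "text/rdf+n3", "application/x-turtle")


def parse_accept(header):
    """Simplified Accept parser: {media_range: q}; parameters other than q are ignored."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0]:
            q = 1.0
            for param in parts[1:]:
                if param.startswith("q="):
                    q = float(param[2:])
            prefs[parts[0].lower()] = q
    return prefs


def q_for(prefs, media_type):
    """q for media_type: exact match first, then type/*, then */*."""
    for key in (media_type, media_type.split("/")[0] + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def pick_representation(accept_header):
    """Return "html+rdfa", "rdf", or None (nothing acceptable)."""
    if not accept_header or not accept_header.strip():
        return "html+rdfa"  # no Accept header: default to HTML+RDFa
    prefs = parse_accept(accept_header)
    html_q = max(q_for(prefs, t) for t in HTML_TYPES)
    rdf_q = max(q_for(prefs, t) for t in RDF_TYPES)
    if html_q == 0.0 and rdf_q == 0.0:
        return None
    # Ties go to HTML+RDFa, matching the default behaviour.
    return "html+rdfa" if html_q >= rdf_q else "rdf"
```

A browser sending text/html, a client sending no Accept header and a client sending only */* all receive HTML+RDFa under this sketch; only a client that ranks an RDF media type strictly higher gets a separate RDF document.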

As our earlier coverage of Virtuoso's proxy service URIs explained, requests for an HTML rendering of an entity description are normally redirected internally to the /about/html proxy. This proxy in turn uses description.vsp to generate an HTML rendering with embedded RDFa. So, by exploiting the default URL rewriting rules, internal redirects (as opposed to external 303 redirects, which cost the client an extra HTTP round trip) and the /about/html proxy service, it is possible to combine description.vsp's HTML+RDFa generation capabilities with the deployment benefits of RDFa.

RDFa Output From Non-RDF Data Sources

Viewed purely as an RDF publishing service, Virtuoso treats RDFa as simply another supported syntax for encoding RDF metadata, alongside RDF/XML, N3, Turtle, N-Triples and JSON. However, the RDF metadata drawn from the Virtuoso quad store and rendered in one of these formats may itself have been extracted directly from, or synthesised from, a multitude of non-RDF data sources by Virtuoso's Sponger. (Raw RDF data can, of course, also be imported directly.)

When sponging an XHTML resource, the Sponger's xHTML cartridge automatically ingests any RDFa found and caches the extracted RDF in the quad store. However, the Sponger can also generate RDF metadata describing non-RDF data sources. The net result is that the Sponger, in combination with description.vsp, can generate RDFa for data sources containing neither RDF nor RDFa.
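The xHTML cartridge itself is internal to Virtuoso, but a rough idea of what "ingesting RDFa" involves can be given with a deliberately tiny Python extractor built on the standard library's html.parser. It is a sketch, nowhere near a conformant RDFa processor: it understands only @about (which sets the subject), @rel together with @resource (URI objects) and @property with element text (literal objects); CURIE expansion, @typeof, @rev, @content, hanging rels and the rest of the RDFa processing rules are ignored, and it assumes well-formed XHTML in which every element is closed. The class name TinyRDFaExtractor is hypothetical.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from a very small RDFa subset.

    Predicates are returned exactly as written (no CURIE/prefix expansion).
    """

    def __init__(self, base):
        super().__init__()
        self.stack = [(base, False)]  # (current subject, element opened a literal?)
        self.literal = None           # (subject, predicate, text chunks) being collected
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # @about sets the subject; otherwise inherit the enclosing element's subject.
        subject = a.get("about", self.stack[-1][0])
        opens_literal = False
        if "rel" in a and "resource" in a:
            self.triples.append((subject, a["rel"], a["resource"]))
        elif "property" in a and self.literal is None:
            self.literal = (subject, a["property"], [])
            opens_literal = True
        self.stack.append((subject, opens_literal))

    def handle_endtag(self, tag):
        if len(self.stack) == 1:
            return  # stray end tag: keep the document base as subject
        _, opens_literal = self.stack.pop()
        if opens_literal:
            subject, predicate, chunks = self.literal
            self.triples.append((subject, predicate, "".join(chunks).strip()))
            self.literal = None

    def handle_data(self, data):
        # Text only matters while inside an element carrying @property.
        if self.literal is not None:
            self.literal[2].append(data)
```

Feeding it `<div about="http://example.org/company/acme"><span property="foaf:name">Acme</span><a rel="foaf:homepage" resource="http://acme.example/">home</a></div>` yields one literal triple for foaf:name and one URI triple for foaf:homepage, both with the @about value as subject.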

Sample RDFa Output From Description.vsp

As well as being invoked by the /about/html proxy, description.vsp also underpins the "View Page Metadata" option of the OpenLink Data Explorer (ODE). ODE therefore provides a simple means to examine the RDFa generated by description.vsp.

The screenshot below shows ODE's "View Page Metadata" output when http://www.crunchbase.com/company/twitter is fetched by the public Sponger at http://linkeddata.uriburner.com. The subsequent screenshot highlights some of the RDFa markup in a heavily cut-down extract from the description.vsp-generated page source.

Retrieved Twitter company profile
Page source extract highlighting snippets of generated RDFa

In the description.vsp output page, values listed under the "Has Attributes & Values" tab are marked up with the RDFa attributes @rel and @resource if the object of the triple is a URI, or with @property if the object is a literal. Entities listed under the "Is Attribute Value Of" tab, i.e. the subjects of triples whose object is the entity being described, are marked up with the RDFa attributes @rev and @resource.
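The attribute mapping just described can be illustrated with a minimal Python sketch that renders one entity description as an HTML+RDFa fragment. It is not description.vsp's actual output (which also renders tabs, labels and prefix declarations); the function name render_rdfa, its input shapes and the CURIE-style predicates in the usage are assumptions. Each URI-valued link carries both @resource and @href with the same value; RDFa gives @resource precedence, so the duplication is harmless and keeps the link clickable.

```python
from html import escape


def render_rdfa(subject, outgoing, incoming):
    """Render one entity description as an HTML+RDFa fragment.

    outgoing: (predicate, object, object_is_uri) for triples with `subject` as subject
              (the "Has Attributes & Values" tab).
    incoming: (other_subject, predicate) for triples whose object is `subject`
              (the "Is Attribute Value Of" tab).
    """
    out = [f'<div about="{escape(subject)}">']
    for predicate, obj, object_is_uri in outgoing:
        if object_is_uri:
            # URI object: @rel names the predicate, @resource the object.
            out.append(f'  <a rel="{escape(predicate)}" resource="{escape(obj)}" '
                       f'href="{escape(obj)}">{escape(obj)}</a>')
        else:
            # Literal object: @property names the predicate, the element text is the value.
            out.append(f'  <span property="{escape(predicate)}">{escape(obj)}</span>')
    for other, predicate in incoming:
        # Reverse link: @rev with @resource yields the triple <other> predicate <subject>.
        out.append(f'  <a rev="{escape(predicate)}" resource="{escape(other)}" '
                   f'href="{escape(other)}">{escape(other)}</a>')
    out.append("</div>")
    return "\n".join(out)
```

For instance, render_rdfa("http://example.org/company/acme", [("foaf:name", "Acme", False)], [("http://example.org/person/jane", "foaf:made")]) produces a span with @property for the name and a reverse link with @rev="foaf:made" pointing back at the person URI.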
