The preceding sections described a generic approach to deploying Linked Data into the existing Web. We now turn our attention to Virtuoso, to describe its solution for Linked Data deployment. In fact, Virtuoso's solution is to implement the generic approach outlined in the prior sections, using the twin pillars of content negotiation and URL rewriting.
Virtuoso provides a URL rewriter that can be enabled for URLs matching specified patterns. Coupled with customizable HTTP response headers and response codes, Linked Data Web server administrators can configure highly flexible rules for driving content negotiation and URL rewriting. The key elements of the URL rewriter are:
Location:" response headers
Each of these elements is described in more detail below, although complete descriptions of the features or functions in question are not given. The intention here is to provide an overview of Virtuoso's URL rewriting capabilities and their application to deploying Linked Data. Please refer to the Virtuoso Reference Documentation for full details.
Virtuoso is a full-blown HTTP server in its own right. The HTTP server functionality co-exists with the product core (i.e. DBMS Engine, Web Services Platform, WebDAV filesystem, and other components of the Universal Server). As a result, it has the ability to multi-home Web domains within a single instance across a variety of domain name and port combinations. In addition, it also enables the creation of multiple virtual directories per domain.
In addition to the basic functionality described above, Virtuoso lets you associate URL rewrite rules with the virtual directories of a particular hosted Web domain.
In all cases, Virtuoso enables you to configure virtual domains, virtual directories and URL rewrite rules for one or more virtual directories, via the (X)HTML-based Conductor Admin User Interface or a collection of Virtuoso Stored Procedure Language (PL)-based APIs.
A Virtuoso virtual directory maps a logical path to a physical directory in
your file system or WebDAV repository. This mechanism allows physical locations
to be hidden or simply reorganised. Virtual directory definitions are held in
the system table DB.DBA.HTTP_PATH. Virtual directories can be
administered in three basic ways:
vhost_define() and vhost_remove(); and
HTTP_PATH system table.
Although we are approaching the URL Rewriter from the perspective of deploying Linked Data, the rewriter was developed with additional objectives in mind. These in turn have influenced the names of some formal arguments in the Configuration API function prototypes. In the following sections, "long" URLs are those containing a query string with named parameters; "nice" (also known as "source") URLs have data encoded in some other format. The primary goal of the Rewriter was to accept a nice URL from an application and convert it into a long URL, which then identifies the page that should actually be retrieved.
When an HTTP request is accepted by the Virtuoso HTTP server, the received
nice URL is passed to an internal path translation function. This function
takes the nice URL and, if the current virtual directory has a
url_rewrite option set to an existing rule list name, tries to
match the corresponding rule lists and rules; that is, the function performs a
recursive traversal of any rule list associated with the virtual directory. For
every rule in the rule list, the same logic is applied (only the logic for
regex-based rules is described; that for sprintf-based rules is very
similar):
/' after the host:port fields to the end of the URL.
split_and_decode().
POST method, the value of a named parameter in the body of the POST request; or
Note: The path translation function described above is internal
to the Web server, so its signature is not appropriate for Virtuoso/PL calls
and thus is not published. Virtuoso/PL developers can harness the same
functionality using the DB.DBA.URLREWRITE_APPLY API call.
The URL rewriting examples which follow are taken from the Virtuoso Northwind demonstration database, which is included in the Demo VAD (Virtuoso Application Distribution) archive.
To check which version of the Demo VAD is installed, or to upgrade it, refer to the Conductor's 'VAD Packages' screen, reachable through the 'System Admin' > 'Packages' menu items. The latest VADs for the closed source releases of Virtuoso can be downloaded from the downloads area of the OpenLink website. Select either the 'DBMS (WebDAV) Hosted' or 'File System Hosted' product format from the 'Distributed Collaborative Applications' section, depending on whether you want the Virtuoso application to be run from WebDAV or native filesystem storage. VADs for Virtuoso Open Source edition (VOS) are available for download from the VOS Wiki.
The Virtuoso Northwind database (contained in the "Demo" catalog) is very
similar to the Northwind example database available for SQL Server. Its schema
comprises commonly understood SQL tables that include: Customers,
Orders, Employees, Products,
Product Categories, Shippers, Countries,
Provinces, etc.
Northwind is installed with a preconfigured RDF view and a set of preconfigured URL rewrite rules that collectively expose RDF based entity graphs and URLs of (X)HTML web pages that describe the back-end relational data.
An RDF View over relational data is a named collection (graph) of RDF records (triples) derived from an RDBMS-to-RDF source data map exposed via a Virtuoso Quad Store. The process of declaring RDF Views over RDBMS data using the Virtuoso Meta-schema Language is described in detail in our RDF Views of SQL white paper.
To view the Northwind entity graph in RDF format, starting with the entity
"ALFKI", simply place the following document URL into the OpenLink Data Explorer:
http://demo.openlinksw.com/Northwind/Customer/ALFKI
Alternatively, you can view an (X)HTML based description of the entity
"ALFKI" by pointing your Web browser to the same URL. (The details
of these URLs will be explained shortly; for now they are presented purely as
pointers to illustrate example data available from Northwind.)
The steps for configuring URL Rewrite rules via the Virtuoso Conductor are as follows:
[Figure: Conductor's Hosted Domains and Virtual Directories screen]
[Figure: Accessing the URL rewrite rules for the Northwind demo database]
[Figure: Northwind URL rewrite rule for HTML requests]
[Figure: Northwind URL rewrite rule for RDF requests]
cURL" or any other HTTP-based user agent.
The screenshots above show the default Northwind rewrite rules. Let's analyze what they are doing.
The regex rule for handling RDF/XML or N3 representation requests specifies
a 'Request Path Pattern' of (/[^#]*). Recall that
the input path is the portion of the input URL from the first '/'
after the host:port fields to the end of the normalized URL. So,
given a request for
http://demo.openlinksw.com/Northwind/Customer/ALFKI, the request
path pattern would match /Northwind/Customer/ALFKI. Parentheses in
the pattern collect the results of the pattern matching into parameters. Each
successive pair of parentheses denotes a parameter, referred to elsewhere in
the rewrite rule as $U1, $U2, $U3, ..., or $s1, $s2, $s3,
..., etc. These parameters can then be used to substitute a part of the
input path that was matched into the new URL being composed. The parameter
markers $U1 and $s1 (likewise $U2 and
$s2 etc.) identify the same pattern segment in the request path
pattern. The only difference between them is how the matched text is encoded
when it is inserted into the new URL. The 's' format specifier
inserts the matched text as is, whereas the 'U' format specifier
causes the inserted text to be URL encoded.
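The difference between the two marker forms can be sketched in a few lines of Python. This only illustrates the substitution behaviour described above and is not Virtuoso's implementation; the helper name `expand_markers` and the host `example.com` are assumptions made for the example.

```python
import re
from urllib.parse import quote

REQUEST_PATH_PATTERN = r"(/[^#]*)"  # the 'Request Path Pattern' from the rule


def expand_markers(template: str, groups: tuple) -> str:
    """Replace $sN with the raw Nth match and $UN with its URL-encoded form."""

    def substitute(marker: re.Match) -> str:
        value = groups[int(marker.group(2)) - 1]
        # 's' inserts the text as is; 'U' percent-encodes it first.
        return value if marker.group(1) == "s" else quote(value, safe="")

    return re.sub(r"\$([sU])(\d+)", substitute, template)


match = re.match(REQUEST_PATH_PATTERN, "/Northwind/Customer/ALFKI")
groups = match.groups()

# Same captured text, inserted raw and then URL-encoded.
print(expand_markers("/about/html/http://example.com$s1", groups))
print(expand_markers("<http://example.com$U1#this>", groups))
```

With `$s1` the captured path keeps its slashes, whereas `$U1` turns each '/' into '%2F', which is what makes the value safe to embed inside a query string.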
Content types specified in the request's Accept header and
matched by the 'Accept Header Request Pattern' are available for
substitution into the rewritten URL through the $accept
variable.
Rather than hardcoding host names and ports, the rules are made more generic
by using the convenience macro URIQADefaultHost. Every occurrence
of ^{URIQADefaultHost}^ will be substituted with the value of the
DefaultHost parameter defined in the URIQA section of the Virtuoso
configuration file, virtuoso.ini. "DefaultHost" is
the "canonical" server name that is used to identify the service. It should be
either a server host name including domain (i.e. an FQDN), or an IP address in
standard notation. If Virtuoso's default HTTP port is not equal to
80 then the port should also be included, e.g.
"www.example.com:8890".
The parameter markers, variables and macros just described provide the
building blocks for constructing the 'Destination Path Format' which
serves as a template for the rewritten URL. It must be stressed that it is not necessary to URL-encode the Destination
Path Format by hand. You need only write the underlying
CONSTRUCT or DESCRIBE SPARQL query. When defining a
new Destination Path Format, click on the SPARQL button to enable a text box
(shown below) into which you can enter the base SPARQL query which will
describe the entity being dereferenced. On clicking the 'Format' button
to return, the SPARQL query will be expanded into a full query string,
including a result-set format-specifier, and URL-encoded automatically. For
example, the base query:
DESCRIBE <http://^{URIQADefaultHost}^$U1#this> <http://^{URIQADefaultHost}^$U1> FROM <http://^{URIQADefaultHost}^/Northwind>
becomes:
/sparql?query=DESCRIBE+%3Chttp%3A//^{URIQADefaultHost}^$U1%23this%3E+%3Chttp%3A//^{URIQADefaultHost}^$U1%3E+FROM+%3Chttp%3A//^{URIQADefaultHost}^/Northwind%3E&format=$accept
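The expansion shown above can be approximated outside the Conductor with Python's standard library. The sketch below is not the code behind the 'Format' button. The set of characters left unencoded (`SAFE_CHARS`) is an assumption inferred from the expanded form shown above, and the function name `to_destination_path` is hypothetical.

```python
from urllib.parse import quote_plus

BASE_QUERY = (
    "DESCRIBE <http://^{URIQADefaultHost}^$U1#this> "
    "<http://^{URIQADefaultHost}^$U1> "
    "FROM <http://^{URIQADefaultHost}^/Northwind>"
)

# Assumption: '/', the macro delimiters and the parameter markers are left
# untouched, as they are in the expanded Destination Path Format.
SAFE_CHARS = "/^{}$"


def to_destination_path(query: str) -> str:
    """URL-encode a base SPARQL query and append the format specifier."""
    return "/sparql?query=" + quote_plus(query, safe=SAFE_CHARS) + "&format=$accept"


print(to_destination_path(BASE_QUERY))
```

Spaces become '+', while '<', '>', ':' and '#' are percent-encoded, so the macros and markers survive intact for later substitution by the rewriter.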
The pre-configured
http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI#this
and
http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI
http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI
identifies a document (an entity of type
foaf:Document) that has the entity
http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI#this
as its foaf:primaryTopic property value. This
relationship is the key to using the description of the document (a
report) about "ALFKI" to expose the deeper entity graph
that describes the entity "ALFKI#this".
[Figure: Defining the SPARQL query for the Northwind RDF requests]
The process of rewriting a request for an RDF representation of Northwind customer ALFKI, through the corresponding regex rule, is depicted below as a data flow diagram. The arcs connecting similarly-colored items attempt to illustrate how portions of the input request are matched and substituted into the rewritten request.
[Figure: Breakdown of the URL rewriting process for Northwind RDF requests]
The Northwind regex rule for HTML requests functions in a similar way to the regex rule for RDF requests. That is, the mechanisms for pattern matching and parameter substitution are the same. The only differences are the content types matched and the target URL.
In this case, the destination path format is:
/about/html/http://^{URIQADefaultHost}^$s1
Here, the path /about/html/ redirects the client to the Virtuoso Sponger proxy interface. The Sponger itself is a
highly customizable RDFizer. Virtuoso reserves two paths for the proxy service,
'/about/rdf/' and '/about/html/'. (Note: These proxy paths
have since been augmented to support a richer slash URI scheme for identifying format variants.
Please refer to Appendix B for more details.)
The web service takes the target URL following the proxy path and either returns the content
"as is" or tries to transform it to RDF. The RDF graph derived from the
sponging process is then rendered in one of the RDF serialization formats
(RDF/XML or N3) or HTML depending on whether the request specified /about/rdf/
or /about/html/. Thus, the proxy service can be used as a middleware for
enabling RDF based exploration of non-RDF sources using dedicated RDF browsers
or standard (X)HTML browsers.
The mechanism through which Virtuoso composes an HTML rendering of RDF data
(whether this be a native RDF description, or one extracted by the Sponger) is
via the "description.vsp" rendering template, a specialized Virtuoso
Server Page specifically aimed at RDF-model-based resource description. The
"description.vsp" template is described in more detail in Appendix
A. A usage example covering the description of the entity
<http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>
is shown below.
[Figure: description.vsp HTML rendering of Customer entity ALFKI]
While the Conductor UI provides the easiest way to set up URL rewriting, on occasion it may be preferable to configure URL rewriting programmatically using Virtuoso PL.
The Conductor lets you export configured rules as Virtuoso PL, making it easier to use them on another system, for instance. The exported script recreates the rewrite rules using Virtuoso's URL Rewriting Configuration API.
[Figure: Conductor's 'Export' button for exporting URL rewrite rules]
The code listing below shows the exported Northwind rules. Describing the Configuration API and this exported rules file forms the focus of this section.
DB.DBA.VHOST_REMOVE ( lhost=>'*ini*', vhost=>'*ini*', lpath=>'/Northwind');
DB.DBA.VHOST_DEFINE ( lhost=>'*ini*', vhost=>'*ini*', lpath=>'/Northwind',
ppath=>'/DAV/home/demo/', is_dav=>1, vsp_user=>'dba', ses_vars=>0, opts=>
vector ('url_rewrite', 'demo_nw_rule_list1'), is_default_host=>0);
DB.DBA.URLREWRITE_CREATE_RULELIST ( 'demo_nw_rule_list1', 1, vector ('demo_nw_rule1', 'demo_nw_rule2'));
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'demo_nw_rule1', 1, '(/[^#]*)', vector ('path'), 1,
'/about/html/http://^{URIQADefaultHost}^%s', vector ('path'), NULL, '(text/html)|(\\*/\\*)', 0, 303, NULL );
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'demo_nw_rule2', 1, '(/[^#]*)', vector ('path'), 1,
'/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind%%3E&format=%U',
vector ('path', 'path', '*accept*'), NULL, '(text/rdf.n3)|(application/rdf.xml)', 0, NULL, NULL );
| Exporting Rewrite Rules from a Script
Use the function
|
As can be seen above, the vhost_define() API call is used to
define virtual hosts and virtual paths hosted by the Virtuoso HTTP server. URL
rewriting is enabled through this function's opts parameter.
opts is of type ANY, e.g. a vector of field-value
pairs. Numerous fields are recognized for controlling different options. The
field name url_rewrite controls URL rewriting. The corresponding
field value is the IRI of a rule list to apply.
Virtuoso includes the following functions for managing URL rewrite rules and rule lists. The names are self-explanatory.
DB.DBA.URLREWRITE_DROP_RULE - Deletes a rewrite rule.
DB.DBA.URLREWRITE_CREATE_SPRINTF_RULE - Creates a rewrite rule which uses sprintf-based pattern matching.
DB.DBA.URLREWRITE_CREATE_REGEX_RULE - Creates a rewrite rule which uses regular expression (regex)-based pattern matching.
DB.DBA.URLREWRITE_DROP_RULELIST - Deletes a rewrite rule list.
DB.DBA.URLREWRITE_CREATE_RULELIST - Creates a rewrite rule list.
DB.DBA.URLREWRITE_ENUMERATE_RULES - Lists all the rules whose IRIs match the specified 'SQL like' pattern.
DB.DBA.URLREWRITE_ENUMERATE_RULELISTS - Lists all the rule lists whose IRIs match the specified 'SQL like' pattern.
Rewrite rules take two forms: sprintf-based or regex-based. When used for nice URL to long URL conversion, the only difference between them is the syntax of format strings. The reverse long to nice conversion works only for sprintf-based rules, whereas regex-based rules are unidirectional. For the purpose of describing how to make dereferenceable URIs for Linked Data, we will focus on regex-based rules.
Regex rules are created using the
URLREWRITE_CREATE_REGEX_RULE() function.
Function Prototype:
URLREWRITE_CREATE_REGEX_RULE ( rule_iri, allow_update, nice_match, nice_params, nice_min_params, target_compose, target_params, target_expn := null, accept_pattern := null, do_not_continue := 0, http_redirect_code := null, http_headers := null );
Parameters:
rule_iri : VARCHAR
allow_update : INTEGER
1 indicates yes; 0 indicates no. The update is subject to the following rules:
If rule_iri is already in use as a rule list identifier, an error is signalled.
If rule_iri is already in use as a rule identifier and allow_update for the existing rule is zero, an error is signalled.
If rule_iri is already in use as a rule identifier and allow_update for the existing rule is non-zero, the existing rule is updated.
nice_match : VARCHAR
nice_params : ANY
(...)' specifiers in the format string.
nice_min_params : INTEGER
target_compose : VARCHAR
target_params : ANY
target_compose) as $1, $2 and so on.
target_expn : VARCHAR
accept_pattern : VARCHAR
Accept header
do_not_continue : INTEGER
1 signifies do not try the next rule from the same rule list, and 0 signifies try the next rule.
http_redirect_code : INTEGER
NULL or the integer values 301, 302, 303, or 406, are currently allowed. If a 3xx redirect code is given, an HTTP redirect response will be sent back to the client. If NULL is specified, the server will process the redirect internally.
http_headers : VARCHAR
Having briefly outlined the URL Rewriting API, we return now to the Northwind rule configuration script listed earlier.
At the start of the script, we define a virtual directory in order to turn
on URL rewriting through vhost_define(). We first remove any
existing definition for logical path /Northwind on the virtual
host defined by vhost, before redefining the logical path.
vhost specifies the host name sent to a user-agent in an HTTP
response. This must be a valid fully-qualified host name or alias and port
separated by ':'. This parameter accepts the special value
'*ini*' which will be replaced with the hostname and port
configured in the virtuoso.ini file.
The /Northwind virtual directory is mapped to a DAV folder
(indicated by is_dav being non-zero) whose physical path is
/DAV/home/demo. The machine hosting the virtual directory listens
on the IP address and port specified by lhost (i.e., listen host).
Like vhost, this accepts the special value '*ini*'.
Any VSP pages contained in the virtual directory will run as user
'dba'.
URL rewriting is enabled through the url_rewrite field in the
opts vector; the URL rewriter will use the rule list named
demo_nw_rule_list1. The latter is defined by the
URLREWRITE_CREATE_RULELIST function call which follows. The rule
list contains two regex-based rules, demo_nw_rule1 and
demo_nw_rule2, each defined by calls to function
URLREWRITE_CREATE_REGEX_RULE.
Consider first rule demo_nw_rule2. In this rule, the regular
expression '(/[^#]*)' specified for nice_match matches the input
IRI up to fragment delimiter (#). The corresponding occurrence is
named 'path' in the nice_params vector. The client
must be requesting the return data as RDF serialized as N3 or RDF/XML in order
for the rule to apply.
Argument target_compose specifies a URL-encoded template for
the rewritten destination URL. Spaces are encoded as '+' or
'%20', the reserved character '#' is percent-encoded as '%23' and the
'%' character itself is escaped by '%'.
Removing the URL encoding and the final format specifier
('&format=%U'), the SPARQL DESCRIBE query being
built takes the form:
DESCRIBE <http://^{URIQADefaultHost}^%U#this>
<http://^{URIQADefaultHost}^%U> FROM
<http://^{URIQADefaultHost}^/Northwind>
Unsurprisingly this is almost identical to the SPARQL query displayed by
Conductor, when the same rewrite rules are viewed through the Conductor UI. The
only difference lies in the slightly different syntax used for parameter
markers (%U or %s, as opposed to $U1, $U2,
... or $s1, $s2, ... in Conductor). Here, the two
sprintf-like format characters %U are placeholders which receive
the first two entries in the target_params vector, i.e., the value
of 'path'. In our example, the value of 'path' would
be '/Northwind/Customer/ALFKI'.
The query response format is controlled by the format query parameter. In
the format specifier ('&format=%U') at the end of the
constructed query string, the third placeholder '%U' receives the
value of the third entry in the target_params vector,
'*accept*'. The '*accept*' parameter is used to pass
the part of Accept header matched against
accept_pattern, e.g. if the Accept header specified
MIME types of 'application/rdf+xml, application/xml' and the
accept_pattern is
'(text/rdf.n3)|(application/rdf.xml)', then the
'*accept*' parameter will have the value of
'application/rdf+xml'.
The other rule, demo_nw_rule1, is essentially similar, but
targeted at HTML browsers rather than RDF browsers. Rather than the internal
redirect used by demo_nw_rule2, this rule returns HTTP redirect
code 303 to the client when the rewrite rule is applied.
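The behaviour of the two rules discussed above can be simulated with a short Python sketch. It copies the patterns, templates and parameter vectors from the exported script, but the surrounding logic is an assumption: the function names `compose` and `rewrite` are hypothetical, rules are simply tried in order with the first match winning, and `DEFAULT_HOST` stands in for whatever the URIQADefaultHost macro would expand to.

```python
import re
from urllib.parse import quote

DEFAULT_HOST = "demo.openlinksw.com"  # assumed expansion of ^{URIQADefaultHost}^

# (name, nice_match, target_compose, target_params, accept_pattern, redirect code)
RULES = [
    ("demo_nw_rule1", r"(/[^#]*)",
     "/about/html/http://^{URIQADefaultHost}^%s",
     ["path"], r"(text/html)|(\*/\*)", 303),
    ("demo_nw_rule2", r"(/[^#]*)",
     "/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E"
     "+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%3E"
     "+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind%%3E&format=%U",
     ["path", "path", "*accept*"], r"(text/rdf.n3)|(application/rdf.xml)", None),
]


def compose(template, values):
    """Fill %s (raw), %U (URL-encoded) and %% (literal '%') placeholders."""
    pending = iter(values)

    def fill(spec):
        kind = spec.group(1)
        if kind == "%":
            return "%"
        value = next(pending)
        return value if kind == "s" else quote(value, safe="")

    return re.sub(r"%([%sU])", fill, template)


def rewrite(path, accept):
    """Return (rule name, redirect code or None, rewritten URL) for the first match."""
    for name, nice_match, target, params, accept_pattern, code in RULES:
        path_match = re.match(nice_match, path)
        accept_match = re.search(accept_pattern, accept)
        if not (path_match and accept_match):
            continue
        # '*accept*' carries the part of the Accept header that matched.
        named = {"path": path_match.group(1), "*accept*": accept_match.group(0)}
        target = target.replace("^{URIQADefaultHost}^", DEFAULT_HOST)
        return name, code, compose(target, [named[p] for p in params])
    return None


print(rewrite("/Northwind/Customer/ALFKI", "text/html"))
print(rewrite("/Northwind/Customer/ALFKI", "application/rdf+xml, application/xml"))
```

A redirect code of 303 in the result corresponds to an external redirect sent to the client, while `None` corresponds to the rewritten URL being processed internally by the server.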
Internal Rewrites vs External Redirects
External redirect: Tells the client to ask for the requested content again using a new URL and HTTP request. An external redirect is indicated by one of the HTTP response codes:
301 - Moved permanently (for permanent redirection)
302 - Found (the most common way of performing a redirection)
303 - See Other (the correct manner in which to redirect web applications to a new URI)
Internal rewrite/redirect: Gets the content for the requested URL from a different server file path than implied by the requested URL.
As described earlier when examining the Conductor-configured rules, HTML
requests are redirected to description.vsp via the Sponger proxy
interface.
System Tables Supporting URL Rewriting
If you need to check your rewrite rule definitions, an alternative
to inspecting them using Conductor is to query Virtuoso's system tables
directly. The relevant system tables for URL rewriting are
Earlier we presented a data flow diagram showing the process of rewriting a request for an RDF representation of Northwind customer ALFKI, through a regex rule defined in the Conductor. Below is a similar diagram, depicting the same request rewrite, this time using the Virtuoso PL definition of the same rule. As before, the arcs connecting similarly coloured items illustrate how portions of the input request are matched and substituted into the rewritten request.
[Figure: Breakdown of the URL rewriting process for Northwind RDF requests]
cURL
As illustrated earlier, the curl utility provides a useful tool for verifying HTTP server responses and rewrite rules. The first two curl exchanges below show the default Northwind URL rewrite rules being applied.
Example 1:
$ curl -I -H "Accept: text/html" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 06 Feb 2009 11:11:01 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/about/html/http://demo.openlinksw.com/Northwind/Customer/ALFKI
Content-Length: 0
Example 2:
$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 200 OK
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Fri, 06 Feb 2009 11:14:49 GMT
Accept-Ranges: bytes
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 9488
Example 3:
$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Feb 2009 11:23:31 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml
Content-Length: 0
The third example shows the response generated when the default rule for RDF requests is changed to return an HTTP response code of 303, rather than use an internal redirect. Making this temporary change allows the generated SPARQL query to be viewed and checked with curl.
In this section, we are going to interact with Linked Data deployed into the Linked Data Web from a live instance of Virtuoso, which uses the URL Rewrite rules from the prior section.
The components used in the example are as follows:
Steps:
Continuing in this way, one can navigate over the Northwind RDF graph, drilling down to uncover more details of selected entities.
We can interact with the same Information Resource and associated RDF using the iSPARQL Query tool as follows:
So as not to overload our preceding description of Linked Data deployment with excessive detail, the description of content negotiation presented thus far was kept deliberately brief. This section discusses content negotiation in more detail.
Recall that a resource (conceptual entity) identified by a URI may be associated with more than one representation (e.g. multiple languages, data formats, sizes, resolutions). If multiple representations are available, the resource is referred to as negotiable and each of its representations is termed a variant. For instance, a Web document resource, named 'ALFKI' may have three variants: alfki.xml, alfki.html and alfki.txt all representing the same data. Content negotiation provides a mechanism for selecting the best variant.
As outlined in the earlier brief discussion of content negotiation, when a user agent requests a resource, it can include with the request Accept headers (Accept, Accept-Language, Accept-Charset, Accept-Encoding etc.) which express the user preferences and user agent capabilities. The server then chooses and returns the best variant based on the Accept headers. Because the selection of the best resource representation is made by the server, this scheme is classed as server-driven negotiation.
An alternative content negotiation mechanism is Transparent Content Negotiation (TCN), a protocol defined by RFC2295. TCN offers a number of benefits over standard HTTP/1.1 negotiation, for suitably enabled user agents.
RFC2295 introduces a number of new HTTP headers including the Negotiate request header, and the TCN and Alternates response headers. (Krishnamurthy et al. note that although the HTTP/1.1 specification reserved the Alternates header for use in agent driven negotiation, it was not fully specified. Consequently under a pure HTTP/1.1 implementation as defined by RFC2616, server-driven content negotiation is the only option. RFC2295 addresses this issue.)
Weaknesses of server-driven negotiation highlighted by RFCs 2295 and 2616 include:
Rather than rely on server-driven negotiation and variant selection by the server, a user agent can take full control over deciding the best variant by explicitly requesting transparent content negotiation through the Negotiate request header. The negotiation is 'transparent' because it makes all the variants on the server visible to the agent.
Under this scheme, the server sends the user agent a list, represented in an Alternates header, containing the available variants and their properties. The user agent can then choose the best variant itself. Consequently, the agent no longer needs to send large Accept headers describing in detail its capabilities and preferences. (However, unless caching is used, user-agent driven negotiation does suffer from the disadvantage of needing a second request to obtain the best representation. By sending its best guess as the first response, server driven negotiation avoids this second request if the initial best guess is acceptable.)
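To make the variant list concrete, the sketch below builds an Alternates header value in the syntax defined by RFC2295 (a quoted variant URI, a source quality, then attribute lists such as `{type ...}`). The variant names reuse the 'ALFKI' document example given earlier; the quality values and the function name `alternates_header` are hypothetical and chosen only for illustration.

```python
# Hypothetical variants: (variant URI, source quality, content type)
ALFKI_VARIANTS = [
    ("alfki.xml", 1.0, "text/xml"),
    ("alfki.html", 0.9, "text/html"),
    ("alfki.txt", 0.5, "text/plain"),
]


def alternates_header(variants):
    """Format variant descriptions as an RFC2295-style Alternates header."""
    entries = []
    for uri, quality, media_type in variants:
        # Each entry: {"uri" source-quality {type media/type}}
        entries.append('{"%s" %.3f {type %s}}' % (uri, quality, media_type))
    return "Alternates: " + ", ".join(entries)


print(alternates_header(ALFKI_VARIANTS))
```

A user agent receiving such a header has enough information to pick a variant itself and request it directly by its URI.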
As well as variant selection by the user agent, TCN allows the server to choose on behalf of the user agent if the user agent explicitly allows it through the Negotiate request header. This option allows the user agent to send smaller Accept headers containing enough information to allow the server to choose the best variant and return it directly. The server's choice is controlled by a 'remote variant selection algorithm' as defined in RFC2296.
A further option is to allow the end-user to select a variant, in case the choice made by negotiation process is not optimal. For instance, the user agent could display an HTML-based 'pick list' of variants constructed from the variant list returned by the server. Alternatively the server could generate this pick list itself and include it in the response to a user agent's request for a variant list. (Virtuoso currently responds this way.)
The following section describes the Virtuoso HTTP server's TCN implementation which is based on RFC2295, but without "Feature" negotiation. OpenLink's RDF rich clients, iSPARQL and the OpenLink RDF Browser, both support TCN. User agents which do not support transparent content negotiation continue to be handled using HTTP/1.1 style content negotiation (whereby server-side selection is the only option - the server selects the best variant and returns a list of variants in an Alternates response header).
In order to negotiate a resource, the server needs to be given information about each of the variants. Variant descriptions are held in SQL table HTTP_VARIANT_MAP. The descriptions themselves can be created, updated or deleted using Virtuoso/PL or through the Conductor UI.
The table definition is as follows:
create table DB.DBA.HTTP_VARIANT_MAP (
  VM_ID integer identity,        -- unique ID
  VM_RULELIST varchar,           -- HTTP rule list name
  VM_URI varchar,                -- name of requested resource e.g. 'page'
  VM_VARIANT_URI varchar,        -- name of variant e.g. 'page.xml', 'page.de.html' etc.
  VM_QS float,                   -- Source quality, a number in the range 0.001-1.000, with 3 digit precision
  VM_TYPE varchar,               -- Content type of the variant e.g. text/xml
  VM_LANG varchar,               -- Content language e.g. 'en', 'de' etc.
  VM_ENC varchar,                -- Content encoding e.g. 'utf-8', 'ISO-8892' etc.
  VM_DESCRIPTION long varchar,   -- a human readable description about the variant e.g. 'Profile in RDF format'
  VM_ALGO int default 0,         -- reserved for future use
  primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
);
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);
Two functions are provided for adding, updating, or removing variant descriptions using Virtuoso/PL:
Adding or Updating a Resource Variant
DB.DBA.HTTP_VARIANT_ADD (
  in rulelist_uri varchar,       -- HTTP rule list name
  in uri varchar,                -- Requested resource name e.g. 'page'
  in variant_uri varchar,        -- Variant name e.g. 'page.xml', 'page.de.html' etc.
  in mime varchar,               -- Content type of the variant e.g. text/xml
  in qs float := 1.0,            -- Source quality, a floating point number with 3 digit precision in 0.001-1.000 range
  in description varchar := null, -- a human readable description of the variant e.g. 'Profile in RDF format'
  in lang varchar := null,       -- Content language e.g. 'en', 'bg', 'de' etc.
  in enc varchar := null         -- Content encoding e.g. 'utf-8', 'ISO-8892' etc.
)
Removing a Resource Variant
DB.DBA.HTTP_VARIANT_REMOVE (
  in rulelist_uri varchar,       -- HTTP rule list name
  in uri varchar,                -- Name of requested resource e.g. 'page'
  in variant_uri varchar := '%'  -- Variant name filter
)
The Conductor 'Content negotiation' panel for describing resource variants and configuring content negotiation is depicted below. It can be reached by selecting the 'Virtual Domains & Directories' tab under the 'Web Application Server' menu item, then selecting the 'URL rewrite' option for a logical path listed amongst those for the relevant HTTP host, e.g. '{Default Web Site}'.
The screen snapshot shows the variant descriptions created by issuing the HTTP_VARIANT_ADD and VHOST_DEFINE Virtuoso/PL calls detailed in the examples at the end of this section. Obviously these definitions could instead have been created entirely 'from scratch' through the Conductor UI.
The input fields reflect the supported 'dimensions' of negotiation which include content type, language and encoding. Quality values corresponding to the options for 'Source Quality' are as follows:
| Source Quality | Quality Value |
| perfect representation | 1.000 |
| threshold of noticeable loss of quality | 0.900 |
| noticeable, but acceptable quality reduction | 0.800 |
| barely acceptable quality | 0.500 |
| severely degraded quality | 0.300 |
| completely degraded quality | 0.000 |
Content negotiation rules in Conductor
When a user agent instructs the server to select the best variant, Virtuoso does so using the selection algorithm below:
If a virtual directory has URL rewriting enabled (has the 'url_rewrite' option set), the web server:
The server may return the best-choice resource representation or a list of available resource variants. When a user agent requests transparent negotiation, the web server returns the TCN header "choice". When a user agent asks for a variant list, the server returns the TCN header "list".
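The selection step can be approximated in a few lines of Python. This is a minimal sketch, not Virtuoso's implementation: it assumes that the overall quality of a variant is the product of the client's Accept q-value and the variant's source quality, an assumption that is consistent with the qs values reported in the response headers elsewhere in this document (for example 0.95 x 0.95 = 0.9025). The function names are hypothetical, and language and encoding are ignored.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # q defaults to 1.0 when omitted
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((parts[0], q))
    return prefs


def client_q(prefs, mime):
    """Return the q-value of the most specific media range matching mime."""
    major = mime.split("/")[0]
    for candidate in (mime, major + "/*", "*/*"):
        for media_range, q in prefs:
            if media_range == candidate:
                return q
    return 0.0


def choose_variant(accept_header, variants):
    """Pick the variant with the highest q * qs (assumed scoring rule).

    variants is a list of (variant_uri, mime, source_quality) tuples.
    """
    prefs = parse_accept(accept_header)
    scored = [(client_q(prefs, mime) * qs, uri) for uri, mime, qs in variants]
    best_score, best_uri = max(scored)
    return best_uri if best_score > 0 else None


# Source qualities as used in the HTTP_VARIANT_ADD example in this section.
VARIANTS = [
    ("page.html", "text/html", 0.9),
    ("page.txt", "text/plain", 0.5),
    ("page.xml", "text/xml", 1.0),
]

print(choose_variant("text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3", VARIANTS))
print(choose_variant("text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3", VARIANTS))
```

Under this scoring rule the two Accept headers select page.html and page.xml respectively, which matches the Content-Location values in the curl exchanges in this section.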
In this example we assume the files page.html, page.txt and page.xml (the variants named in the rules below) have been uploaded to the /DAV/TCN/ folder of the Virtuoso WebDAV server, each containing the same information but in a different format.
We add TCN rules and define a virtual directory:
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html', 'text/html', 0.900000, 'HTML variant');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain', 0.500000, 'Text document');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml', 1.000000, 'XML variant');
DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/', ppath=>'/DAV/TCN/', is_dav=>1, vsp_user=>'dba',
opts=>vector ('url_rewrite', 'http_rule_list_1'));
Having done this we can now test the setup with a suitable HTTP client, in this case the curl command line utility. In the following examples, the curl client supplies Negotiate request headers containing content negotiation directives which include:
The server returns a TCN response header signalling that the resource is transparently negotiated and either a choice or a list response as appropriate.
In the first curl exchange, the user agent indicates to the server that, of the formats it recognizes, HTML is preferred and it instructs the server to perform transparent content negotiation. In the response, the Vary header field expresses the parameters the server used to select a representation, i.e. only the Negotiate and Accept header fields are considered.
$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" \
    -H "Negotiate: *" http://localhost:8890/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2007 15:43:18 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.html
Content-Type: text/html
ETag: "14056a25c066a6e0a6e65889754a0602"
Content-Length: 49
<html>
<body>
some html
</body>
</html>
Next, the q-values in the Accept header are adjusted so that the user agent indicates that XML is its preferred format. The source quality values of the variants are unchanged.
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
    -H "Negotiate: *" http://localhost:8890/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2007 15:44:07 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.xml
Content-Type: text/xml
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39
<?xml version="1.0" ?>
<a>some xml</a>
In the final example, the user agent wants to decide itself which is the most suitable representation, so it asks for a list of variants. The server provides the list, in the form of an Alternates response header, and, in addition, sends an HTML representation of the list so that the end user can decide on the preferred variant himself if the user agent is unable to.
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
    -H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page
HTTP/1.1 300 Multiple Choices
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2007 15:44:35 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}},
{"page.xml" 1.000000 {type text/xml}}
Content-Length: 368
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>300 Multiple Choices</title>
</head>
<body>
<h1>Multiple Choices</h1>
Available variants:
<ul>
<li><a href="page.html">HTML variant</a>, type text/html</li>
<li><a href="page.txt">Text document</a>, type text/plain</li>
<li><a href="page.xml">XML variant</a>, type text/xml</li>
</ul>
</body>
</html>
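A client that prefers to make its own choice needs to read the Alternates header. The following Python sketch parses the simple form of the header shown in the response above. It is illustrative only: the function name is hypothetical, and it handles just the `{type ...}` attribute that appears in this document, not the other variant attributes that RFC 2295 permits.

```python
import re

# Matches one variant description of the form {"uri" qs {type mime}}.
ALTERNATE_RE = re.compile(r'\{"([^"]+)"\s+([0-9.]+)\s+\{type\s+([^}\s]+)\}\}')


def parse_alternates(value):
    """Return a list of dicts, one per variant found in an Alternates value."""
    return [
        {"uri": uri, "qs": float(qs), "type": mime}
        for uri, qs, mime in ALTERNATE_RE.findall(value)
    ]


header = (
    '{"page.html" 0.900000 {type text/html}}, '
    '{"page.txt" 0.500000 {type text/plain}}, '
    '{"page.xml" 1.000000 {type text/xml}}'
)
variants = parse_alternates(header)

# One possible client-side policy: take the variant with the highest source quality.
best = max(variants, key=lambda v: v["qs"])
print(best["uri"])
```

Choosing by source quality alone is only one possible policy; a real user agent would normally weigh the list against its own format preferences.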
Our next example illustrates the use of a slash URI scheme in an RDF view, and shows how to combine URL rewriting and transparent content negotiation. The example is taken from the RDF View tutorial, one of many Virtuoso on-line tutorials.
The view generates an RDF rendering of Virtuoso's Northwind 'Demo' database. (Note: The 'tutorial' RDF view described here is distinct from the hash-URI-based 'demo' RDF view created by the Demonstration VAD.) If you intend trying the example locally, both the Demonstration and Tutorial VADs must be installed on the local machine.
To generate the RDF view and set up the URL rewriting rules, the tutorial runs the script rd_v_1.sql (see the 'View Source' tab of the RDF View tutorial, or WebDAV folder DAV/VAD/tutorial/rdfview/rd_v_1). The view creates two RDF graphs:
http://<URIQADefaultHost>/tutorial/Northwind - containing the base RDF data
http://<URIQADefaultHost>/tutorial/Northwind/ontology - containing the OWL class definitions
A slash URI scheme is adhered to throughout. Each entity exposed by the view
is identified by the URI prefix
http://<URIQADefaultHost>/tutorial/Northwind/resource/. For
example:
RDF and HTML representation documents describing Northwind entities are
identified by URIs with prefixes
http://<URIQADefaultHost>/tutorial/Northwind/data/ and
http://<URIQADefaultHost>/tutorial/Northwind/page/, e.g.
Transparent content negotiation is enabled to allow entity representations to be rendered in several formats. The available variants can be seen using curl, e.g.
curl -I -L -H "Negotiate: vlist" "http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI"
returns
HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 15 May 2009 11:11:19 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"ALFKI.html" 0.600000 {type text/html}}, {"ALFKI.n3" 0.800000 {type text/rdf+n3}},
{"ALFKI.ttl" 0.700000 {type application/x-turtle}}, {"ALFKI.xml" 0.950000 {type application/rdf+xml}}
Location: http://demo.openlinksw.com/tutorial/Northwind/page/Customer/ALFKI
Content-Length: 443
Requesting RDF/XML as the preferred representation of a resource (and requesting only the HTTP headers be displayed)
curl -I -L -H "Accept: application/rdf+xml;q=0.95,text/rdf+n3;q=0.80" \
    -H "Negotiate: *" "http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI"
returns
HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Fri, 15 May 2009 16:17:11 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: ALFKI.xml
Content-Type: application/rdf+xml; qs=0.9025
Location: http://demo.openlinksw.com/tutorial/Northwind/data/Customer/ALFKI.xml
Content-Length: 0

HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 15 May 2009 16:17:11 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?default-graph-uri=http%3A//demo.openlinksw.com/tutorial/Northwind&
query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com/tutorial/Northwind%2Fresource%2FCustomer%2FALFKI%3E&format=rdf
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Fri, 15 May 2009 16:17:11 GMT
Accept-Ranges: bytes
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 6358
Likewise, specifying N3 as the preferred format
curl -I -L -H "Accept: text/rdf+n3;q=1.0,application/rdf+xml;q=0.5" \
    -H "Negotiate: *" "http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI"
generates
HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Fri, 15 May 2009 16:30:27 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: ALFKI.n3
Content-Type: text/rdf+n3; qs=0.8
Location: http://demo.openlinksw.com/tutorial/Northwind/data/Customer/ALFKI.n3
Content-Length: 0

HTTP/1.1 303 See Other
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 15 May 2009 16:30:28 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?default-graph-uri=http%3A//demo.openlinksw.com/tutorial/Northwind&
query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com/tutorial/Northwind%2Fresource%2FCustomer%2FALFKI%3E&format=n3
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.10.3038 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Fri, 15 May 2009 16:30:28 GMT
Accept-Ranges: bytes
Content-Type: text/rdf+n3; charset=UTF-8
Content-Length: 2018
To explain how this TCN configuration is set up, the salient portions of the rd_v_1.sql setup script are described below.
A URL rewriting rule list, nwtut_rule_list_1, is associated with logical
path /tutorial/Northwind/resource. Two rules, resource_rule_1 and
resource_rule_2 are added to the rule list. Each rewrites request paths
containing '/tutorial/Northwind/resource/'.
DB.DBA.VHOST_DEFINE (lpath=>'/tutorial/Northwind/resource',
ppath=>'/DAV/VAD/tutorial/rdfview/rd_v_1/', is_dav=>1, is_brws=>1,
vsp_user=>'dba',opts=>vector ('url_rewrite', 'nwtut_rule_list_1'));
...
DB.DBA.URLREWRITE_CREATE_RULELIST ('nwtut_rule_list_1', 1, vector('resource_rule_1', 'resource_rule_2'));
The first rule, resource_rule_1, acts as a 'catch all', handling requests
for content types not handled by the second rule. The latter handles requests
for different RDF serialization formats: RDF/XML, N3, TTL, redirecting them to
path /tutorial/Northwind/data/... . resource_rule_1 forces
requests for any other content types to 'text/html', redirecting the request to
path /tutorial/Northwind/page/... .
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ('resource_rule_1', 1, '/resource/([^.]*)',
vector ('par_1'), 1,'/tutorial/Northwind/page/%s',
vector ('par_1'), NULL, NULL, 2, 303, NULL);
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ('resource_rule_2', 1, '/resource/(.*)\x24',
vector ('par_1'), 1,'/tutorial/Northwind/data/%s',
vector ('par_1'), NULL, '(application/rdf.xml)|(text/rdf.n3)|(application/x-turtle)', 2, 303);
So, requests for /tutorial/Northwind/resource/$1 are routed to:
where $1 signifies the remainder portion of the input path. The Customer entity ALFKI has four description document variants, ALFKI.xml, ALFKI.n3, ALFKI.ttl and ALFKI.html. Each variant is described using function HTTP_VARIANT_ADD. (Here, the '$' character is coded using its hex value, \x24.)
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.xml', 'application/rdf+xml', 0.95, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.n3', 'text/rdf+n3', 0.80, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.ttl', 'application/x-turtle', 0.70, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('nwtut_rule_list_1', '(.*)', '\x241.html', 'text/html', 0.60, location_hook=>null);
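The routing performed by resource_rule_1 and resource_rule_2 can be pictured with a short Python sketch. It models the behaviour described in the prose, not the Virtuoso rewriting engine: it assumes that a request whose Accept header matches the RDF pattern goes to the data path and that everything else falls through to the page path, and it ignores the variant suffix that content negotiation subsequently appends. The function name is hypothetical.

```python
import re

# Accept pattern taken from resource_rule_2. The unescaped '.' in 'rdf.xml'
# and 'rdf.n3' is a regex wildcard, which is how it matches the '+' in
# 'application/rdf+xml' and 'text/rdf+n3'.
RDF_ACCEPT = re.compile(r"(application/rdf.xml)|(text/rdf.n3)|(application/x-turtle)")


def route_resource(path, accept=""):
    """Return (status, target path) for a /resource/ request, or None."""
    match = re.search(r"/resource/(.*)$", path)
    if match is None:
        return None
    entity = match.group(1)
    if RDF_ACCEPT.search(accept):
        # RDF serializations: redirect to the data documents.
        return 303, "/tutorial/Northwind/data/" + entity
    # Catch-all: treat the request as text/html.
    return 303, "/tutorial/Northwind/page/" + entity


print(route_resource("/tutorial/Northwind/resource/Customer/ALFKI", "text/rdf+n3"))
print(route_resource("/tutorial/Northwind/resource/Customer/ALFKI", "text/html"))
```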
Finally, the paths /tutorial/Northwind/data and
/tutorial/Northwind/page have their own rewrite rules, attached to
rule lists nwtut_rule_list_2 and nwtut_rule_list_3 respectively.
DB.DBA.VHOST_DEFINE (lpath=>'/tutorial/Northwind/data',
ppath=>'/DAV/VAD/tutorial/rdfview/rd_v_1/',
is_dav=>1, is_brws=>1, vsp_user=>'dba',opts=>vector ('url_rewrite', 'nwtut_rule_list_2'));
DB.DBA.VHOST_DEFINE (lpath=>'/tutorial/Northwind/page',
ppath=>'/DAV/VAD/tutorial/rdfview/rd_v_1/',
is_dav=>1, is_brws=>1, vsp_user=>'dba',
opts=>vector ('url_rewrite', 'nwtut_rule_list_3'));
nwtut_rule_list_2 contains three rewrite rules (data_rule_1/2/3), one for each RDF description document variant. Each rewrites the resource request as a SPARQL DESCRIBE query, the only difference between the queries being the response serialization format. nwtut_rule_list_3 contains one rule (page_rule_1) to re-route requests for text/html through the /about/html Sponger proxy, and so generate an HTML rendering. Each rule strips off any file suffix identifying the variant; e.g. only the 'Customer/ALFKI' portion of 'Customer/ALFKI.n3' or 'Customer/ALFKI.html' is inserted into the rewritten request.
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'data_rule_1', 1, '/data/(.*)\\.(xml)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A//^{URIQADefaultHost}^/tutorial/Northwind&
query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/tutorial/Northwind%%2Fresource%%2F%U%%3E&format=rdf',
vector ('par_1'), NULL, NULL, 2, 303, '');
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'data_rule_2', 1, '/data/(.*)\\.(ttl)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A//^{URIQADefaultHost}^/tutorial/Northwind&
query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/tutorial/Northwind%%2Fresource%%2F%U%%3E&format=n3',
vector ('par_1'), NULL, NULL, 2, 303, '');
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'data_rule_3', 1, '/data/(.*)\\.(n3|rdf)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A//^{URIQADefaultHost}^/tutorial/Northwind&
query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/tutorial/Northwind%%2Fresource%%2F%U%%3E&format=%U',
vector ('par_1', 'f'), NULL, NULL, 2, 303, '');
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'page_rule_1', 1, '/page/([^.]*)', vector ('par_1'), 1,
'/about/html/http://^{URIQADefaultHost}^/tutorial/Northwind/resource/%s',
vector ('par_1'), NULL, '(text/html)', 2, 303);
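To make the effect of the three data rules concrete, the Python sketch below reproduces the rewrite for a given request path. It is an approximation under stated assumptions: the rules are tried in order and the first match wins, %U is treated as URL-encoding of the captured value, ^{URIQADefaultHost}^ is replaced by a host name passed in by the caller, and the line break inside the target string in the listing above is taken to be a typesetting wrap. The function name is hypothetical.

```python
import re
from urllib.parse import quote

# (pattern, format) pairs mirroring data_rule_1/2/3. A format of None means
# "reuse the captured file suffix", as data_rule_3 does with its second %U.
DATA_RULES = [
    (re.compile(r"/data/(.*)\.(xml)"), "rdf"),
    (re.compile(r"/data/(.*)\.(ttl)"), "n3"),
    (re.compile(r"/data/(.*)\.(n3|rdf)"), None),
]


def rewrite_data_path(path, host):
    """Return the /sparql URL a /data/ request is rewritten to, or None."""
    for pattern, fmt in DATA_RULES:
        match = pattern.search(path)
        if match:
            entity, suffix = match.group(1), match.group(2)
            graph = f"http%3A//{host}/tutorial/Northwind"
            resource = quote(entity, safe="")  # '/' becomes %2F
            return (
                f"/sparql?default-graph-uri={graph}"
                f"&query=DESCRIBE+%3C{graph}%2Fresource%2F{resource}%3E"
                f"&format={fmt or suffix}"
            )
    return None


print(rewrite_data_path("/tutorial/Northwind/data/Customer/ALFKI.n3", "demo.openlinksw.com"))
```

For Customer/ALFKI.n3 and Customer/ALFKI.xml on demo.openlinksw.com, the result corresponds to the path and query string of the Location headers in the curl output shown earlier.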
Under the umbrella of the W3C Linking Open Data (LOD) Community Project, DBpedia is a well known initiative to extract structured information from Wikipedia and make this information available on the Web. The DBpedia knowledge base is accessible through a SPARQL endpoint or through a Linked Data interface. As DBpedia defines Linked Data URIs for millions of concepts, it forms one of the central interlinking hubs in the LOD Cloud and the emerging Web of Data.
When serving the DBpedia dataset as Linked Data, DBpedia supports transparent content negotiation in a similar manner to that already described for the Northwind Tutorial RDF View. Indeed, the Northwind RDF View's TCN configuration was modelled as a simplified version of DBpedia's.
DBpedia uses a slash URI scheme when distinguishing between resource and description document URIs. Depending on the content type preferences of the consuming client expressed in any 'Accept' request headers and the 'best' variant as selected by the server, a request for resource http://dbpedia.org/resource/The_Beatles is redirected to one of:
As with the Northwind RDF view, the URI prefixes
http://dbpedia.org/resource/...,
http://dbpedia.org/page/... and
http://dbpedia.org/data/... distinguish between a resource and its
HTML or RDF description documents.
The available RDF description document variants can be listed using curl. The command:
curl -I -L -H "Negotiate: vlist" -H "Accept: application/rdf+xml" "http://dbpedia.org/resource/The_Beatles"
yields:
HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=UTF-8
Date: Mon, 18 May 2009 14:47:31 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"The_Beatles.n3" 0.800000 {type text/rdf+n3}}, {"The_Beatles.ttl" 0.700000 {type application/x-turtle}},
{"The_Beatles.xml" 0.950000 {type application/rdf+xml}}
Location: http://dbpedia.org/data/__The_Beatles
Content-Length: 418
Requesting resource "The_Beatles" with RDF/XML as the preferred description format, using:
curl -I -L -H "Negotiate: *" \
    -H "Accept: application/rdf+xml;q=0.95,text/rdf+n3;q=0.80,text/html;q=0.60" \
    "http://dbpedia.org/resource/The_Beatles"
returns:
HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Mon, 18 May 2009 14:56:39 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: The_Beatles.xml
Content-Type: application/rdf+xml; qs=0.9025
Location: http://dbpedia.org/data/The_Beatles.xml
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Mon, 18 May 2009 14:56:40 GMT
Accept-Ranges: bytes
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 55844
Changing the preferred description format to N3:
curl -I -L -H "Negotiate: *" \
    -H "Accept: application/rdf+xml;q=0.70,text/rdf+n3;q=0.95,text/html;q=0.60" \
    "http://dbpedia.org/resource/The_Beatles"
results in the response:
HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Date: Mon, 18 May 2009 15:00:16 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: The_Beatles.n3
Content-Type: text/rdf+n3; qs=0.76
Location: http://dbpedia.org/data/The_Beatles.n3
Content-Length: 0

HTTP/1.1 200 OK
Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Mon, 18 May 2009 15:00:20 GMT
Accept-Ranges: bytes
Content-Type: text/rdf+n3; charset=UTF-8
Content-Length: 29259
DBpedia's URL rewriting rules and TCN support are configured using script dbpedia_init.sql, portions of which are listed below. For completeness, dbpedia_init.sql is available here.
Using VHOST_DEFINE, the logical paths http://dbpedia.org/resource,
http://dbpedia.org/page and http://dbpedia.org/data are each associated with
URL rewriting rule lists. Requests to /resource are redirected to
/page/%s or /data/__%s depending on whether an HTML or RDF description is
being requested, where %s is the portion of the request path after
/resource/. Resource descriptions provided by path /data/__%s are available
in three variants: RDF/XML, N3 and TTL. Each variant is described using
HTTP_VARIANT_ADD.
DB.DBA.VHOST_DEFINE ( lhost=>':80', vhost=>'dbpedia.org', lpath=>'/resource',
ppath=>'/', is_dav=>0, def_page=>'',
opts=>vector ('url_rewrite', 'dbp_rule_list_2'));
...
DB.DBA.VHOST_DEFINE ( lhost=>':80', vhost=>'dbpedia.org',
lpath=>'/page',
ppath=>registry_get('_dbpedia_path_'),
is_dav=>atoi (registry_get('_dbpedia_dav_')),
opts=>vector ('url_rewrite', 'dbp_rule_list_7'));
...
DB.DBA.VHOST_DEFINE ( lhost=>':80', vhost=>'dbpedia.org', lpath=>'/data',
ppath=>registry_get('_dbpedia_path_'),
is_dav=>atoi (registry_get('_dbpedia_dav_')), vsp_user=>'dba',
opts=>vector ('url_rewrite', 'pvsp_rule_list2'));
DB.DBA.URLREWRITE_CREATE_RULELIST ( 'dbp_rule_list_2', 1, vector('dbp_rule_14', 'dbp_rule_12'));
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'dbp_rule_14', 1, '/resource/(.*)\x24', vector ('par_1'), 1,
'/page/%s', vector ('par_1'), NULL, NULL, 2, 303, NULL);
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'dbp_rule_12', 1, '/resource/(.*)\x24', vector ('par_1'), 1,
'/data/__%s', vector ('par_1'), NULL, '(application/rdf.xml)|(text/rdf.n3)|(application/x-turtle)', 2, 303);
delete from DB.DBA.HTTP_VARIANT_MAP where VM_RULELIST = 'dbp_rule_list_2';
DB.DBA.HTTP_VARIANT_ADD ('dbp_rule_list_2', '__(.*)', '\x241.xml', 'application/rdf+xml', 0.95, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('dbp_rule_list_2', '__(.*)', '\x241.n3', 'text/rdf+n3', 0.80, location_hook=>null);
DB.DBA.HTTP_VARIANT_ADD ('dbp_rule_list_2', '__(.*)', '\x241.ttl', 'application/x-turtle', 0.70, location_hook=>null);
...
DB.DBA.URLREWRITE_CREATE_RULELIST ( 'dbp_rule_list_7', 1, vector ('dbp_rule_13'));
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'dbp_rule_13', 1, '(/[^#]*)', vector ('par_1'), 1,
registry_get('_dbpedia_path_')||'description.vsp?res=%U', vector ('par_1'),
NULL, NULL, 0, 0, '');
...
DB.DBA.URLREWRITE_CREATE_RULELIST ( 'pvsp_rule_list2', 1, vector ('pvsp_data_rule2', 'pvsp_data_rule3', 'pvsp_data_rule4'));
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'pvsp_data_rule2', 1, '/data/(.*)\\.(xml)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A%%2F%%2Fdbpedia.org&
query=DESCRIBE+%%3Chttp%%3A%%2F%%2Fdbpedia.org%%2Fresource%%2F%U%%3E&format=rdf',
vector ('par_1'), NULL, NULL, 2, null, '');
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'pvsp_data_rule3', 1, '/data/(.*)\\.(ttl)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A%%2F%%2Fdbpedia.org&
query=DESCRIBE+%%3Chttp%%3A%%2F%%2Fdbpedia.org%%2Fresource%%2F%U%%3E&format=n3',
vector ('par_1'), NULL, NULL, 2, null, '');
DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'pvsp_data_rule4', 1, '/data/(.*)\\.(n3|rdf)', vector ('par_1', 'f'), 1,
'/sparql?default-graph-uri=http%%3A%%2F%%2Fdbpedia.org&
query=DESCRIBE+%%3Chttp%%3A%%2F%%2Fdbpedia.org%%2Fresource%%2F%U%%3E&format=%U',
vector ('par_1', 'f'), NULL, NULL, 2, null, '');
Requests redirected to /data/__%s are redirected to
/data/%s.xml, /data/%s.ttl or
/data/%s.(n3|rdf) depending on the content type of the chosen
variant. The data for these RDF variants is furnished by similar SPARQL
DESCRIBE queries which differ only in the format= query string parameter used
to specify the result set representation format.
Requests redirected to /page/%s are in turn redirected to the
page description template description.vsp which provides the HTML rendering. In
effect, this is equivalent to the external 303 redirect to the
/about/html proxy used by the Northwind tutorial RDF view - the
proxy uses description.vsp internally.
With Yahoo and Google both having announced support for RDFa, this format has arguably become the most important of the RDF syntaxes. From the perspective of content providers, RDFa brings other benefits beyond the obvious attraction of increasing your content's page rank by providing more accurate, semantically richer metadata to RDFa aware crawlers. Key amongst these is that RDFa provides the simplest route to deploying Linked Data.
In this guide we have emphasized the distinction between a real-world concept or entity and its possibly many descriptions, where each description is associated with a different media-type. Earlier examples have shown how to serve multiple representation formats: HTML, RDF/XML, N3, TTL etc. In essence these formats boil down to a choice between either an HTML representation or some variant of RDF. What RDFa gives you is both representations combined in a single entity description document. Consequently the need for content negotiation or 303 redirects to different representation documents is removed. This fundamental difference is depicted in the following three diagrams, which contrast serving content using HTML+RDFa with serving separate HTML and RDF description documents through a hash or slash URI scheme.



While authors of small sites might opt to serve static content and mark up their HTML with RDFa manually, for large datasets this becomes unattractive. In cases where the HTML representation itself is being generated from an RDF quad store, it makes sense to generate any embedded RDFa alongside the HTML. Virtuoso provides this option through description.vsp, a Virtuoso Server Page which provides an HTML description of RDF Linked Data. Appendix A provides a brief overview.
When dereferencing an entity URI, the description returned is determined by the media-type(s) specified in any Accept headers expressing the client's preferred representation formats. A client can request an XHTML+RDFa description by supplying an Accept header with a media-type of application/xhtml+xml or text/html. In the absence of Accept headers, OpenLink's rewriting rules are normally configured to return HTML+RDFa by default. (Rewriting rules configured by the rdf_mappers VAD typically use this convention.)
As our earlier coverage of Virtuoso's proxy service URIs explained, requests for an HTML rendering of an entity description are normally redirected internally to the /about/html proxy. This proxy in turn uses description.vsp to generate an HTML rendering with embedded RDFa. So, by exploiting the default URL rewriting rules, internal redirects (as opposed to much slower external 303 redirects) and the /about/html proxy service, it is possible to combine description.vsp's HTML+RDFa generation capabilities with the deployment benefits of RDFa.
If viewing Virtuoso purely as an RDF publishing service, RDFa simply constitutes another supported syntax for encoding RDF metadata, alongside RDF/XML, N3, Turtle, NTriples and JSON. However, RDF metadata drawn from the Virtuoso quad store and rendered in one of these formats can itself have been extracted directly or synthesised from a multitude of non-RDF data sources using Virtuoso's Sponger. (Obviously raw RDF data can also be imported directly.)
When sponging an XHTML resource, the Sponger will, via the xHTML cartridge, automatically ingest any RDFa found and cache the extracted RDF in the quad store. The Sponger can also generate RDF metadata describing non-RDF data sources. The net result is that the Sponger in combination with description.vsp can generate RDFa for data sources containing neither RDF nor RDFa.
As well as being invoked by the /about/html proxy, description.vsp also underpins the OpenLink Data Explorer's "View Page Metadata" option. ODE provides a simple means to examine the RDFa generated by description.vsp.
The screenshot below shows ODE's "View Page Metadata" output when http://www.crunchbase.com/company/twitter is sponged by the public Sponger at http://linkeddata.uriburner.com. The subsequent screenshot highlights some of the RDFa markup in a heavily cut-down extract from the page source generated by description.vsp.
Essentially, in the description.vsp output page, values listed under the "Has Attributes & Values" tab are described using RDFa attributes @rel and @resource, if the object part of the triple is a URI, or using @property if the object part is a literal. Entities listed under the "Is Attribute Value Of" tab are described using RDFa attributes @rev and @resource.
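The attribute choice described above can be illustrated with a small Python sketch. This is not the markup description.vsp actually emits; the element names, the function names and the example predicates (which would also need matching namespace declarations in a real page) are assumptions made purely for illustration.

```python
from html import escape


def outgoing_rdfa(predicate, obj, obj_is_uri):
    """Markup for a triple whose subject is the described entity."""
    p, o = escape(predicate), escape(obj)
    if obj_is_uri:
        # URI object: @rel names the predicate, @resource the object.
        return f'<a rel="{p}" resource="{o}" href="{o}">{o}</a>'
    # Literal object: @property names the predicate, the text is the value.
    return f'<span property="{p}">{o}</span>'


def incoming_rdfa(predicate, subject_uri):
    """Markup for a triple whose object is the described entity."""
    p, s = escape(predicate), escape(subject_uri)
    # @rev reverses the direction: the linked resource is the subject.
    return f'<a rev="{p}" resource="{s}" href="{s}">{s}</a>'


print(outgoing_rdfa("foaf:homepage", "http://example.org/", True))
print(outgoing_rdfa("foaf:name", "Tom & Co", False))
print(incoming_rdfa("dc:creator", "http://example.org/doc"))
```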