OpenLink Virtuoso - Linked Data



Deploying Linked Data


Linked Data

Common Web & Different Nature of URIs

The 'Linked Data Web' and the 'Document Web' are two dimensions of the Web separated by a common element: the Uniform Resource Identifier (URI)


What are Resources?

Web parlance for a Data Object or Entity that may be physical or abstract

Resource Identity, Representation, and Access

Linked Data Deployment Requirements

To establish real-world object URIs in the Linked Data Web realm, a Linked Data Server needs to honour the following requirements:

Challenges:

Real-World Object Naming - URI Schemes

Linked Data Web URIs can take two forms: slash URIs and hash URIs.

Slash URI Semantics

Separating identification and naming from representation using Slash URIs

Hash URI Semantics

Separating identification and naming from representation using Hash URIs

Handling Identity with 'Slash' URIs

For this URI scheme HTTP redirection (30X response) is required in order for resource "Identity" to be separated from "representation". Examples:
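A minimal sketch of the redirection step, in Python. The function name and the /id/ and /doc/ path layout are hypothetical and chosen only for illustration; the point is that the entity URI answers with a 30x response pointing at a separate document URL.

```python
def respond_to_slash_uri(entity_path):
    """Answer a GET on an entity (slash) URI with a 303 redirect.

    The entity URI names the real-world object; the Location header
    points at a separate document that describes it.
    """
    # Hypothetical layout: entities live under /id/, descriptions under /doc/.
    if not entity_path.startswith("/id/"):
        return 404, {}
    document_path = "/doc/" + entity_path[len("/id/"):]
    return 303, {"Location": document_path}


status, headers = respond_to_slash_uri("/id/ALFKI")
print(status, headers["Location"])
```

The client then issues a second GET on the Location URL, so the name of the entity and the address of its description never coincide.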

Handling Identity with 'Hash' URIs

For this URI scheme HTTP redirection isn't required in order for resource "Identity" to be separated from "representation". Examples:
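A short Python illustration of why no redirect is needed: an HTTP client removes the fragment before sending the request, so the entity URI and the document URL it fetches already differ. The entity URI is the Northwind customer used elsewhere in this deck.

```python
from urllib.parse import urldefrag

# Entity URI (names the customer) versus the document URL a client requests.
entity_uri = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"
document_url, fragment = urldefrag(entity_uri)

print(document_url)  # the URL that is actually sent over HTTP
print(fragment)      # the part that stays on the client side
```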

Negotiated Representation of Resource Descriptions

Use HTTP's inbuilt Content Negotiation mechanism to:

Content Negotiation - Example

HTTP Request:

An HTML browser requests an HTML/XHTML document in English or French

GET /whitepapers/data_mngmnt HTTP/1.1
Host: www.openlinksw.com
Accept: text/html, application/xhtml+xml
Accept-Language: en, fr


Content Negotiation - Example

HTTP Response:

Server redirects to a URL where the appropriate version can be found

HTTP/1.1 302 Found
Location: http://www.openlinksw.com/whitepapers/data_mngmnt.en.html
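The request/response pair above can be approximated in Python as follows. This is a simplified sketch, not how any particular server implements negotiation: the VARIANTS list and the function names are assumptions, wildcards such as */* are ignored, and ties go to the first listed variant.

```python
def parse_preferences(header):
    """Parse an Accept-style header into {value: q}; q defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


# Hypothetical variants of the resource: (media type, language, URL suffix).
VARIANTS = [
    ("text/html", "en", ".en.html"),
    ("text/html", "fr", ".fr.html"),
]


def negotiate(path, accept, accept_language):
    """Return (status, headers) redirecting to the best matching variant."""
    types = parse_preferences(accept)
    langs = parse_preferences(accept_language)
    best, best_score = None, 0.0
    for media_type, lang, suffix in VARIANTS:
        score = types.get(media_type, 0.0) * langs.get(lang, 0.0)
        if score > best_score:  # strict comparison: first listed wins ties
            best, best_score = suffix, score
    if best is None:
        return 406, {}
    return 302, {"Location": "http://www.openlinksw.com" + path + best}


print(negotiate("/whitepapers/data_mngmnt",
                "text/html, application/xhtml+xml", "en, fr"))
```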


Content Negotiation Decision Table

For static descriptions of a Data Object:
Assumes there are static HTML and RDF documents available to provide HTML and RDF representations of the customer entity ALFKI

Dynamic RDF Renderings

If entity descriptions are held in an RDF quad store:

To provide a dynamic RDF rendering of the entity being dereferenced by the client:

Use SPARQL DESCRIBE or CONSTRUCT
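As an illustration, the Python sketch below builds a DESCRIBE request URL for an entity. The /sparql endpoint with its query and format parameters follows the rewrite rule shown later in this deck; the function name, the '#this' suffix convention and the graph path argument are assumptions made for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def describe_url(host, entity_path, graph_path, fmt):
    """Build a /sparql URL that DESCRIBEs one entity from one graph."""
    entity = f"http://{host}{entity_path}#this"
    graph = f"http://{host}{graph_path}"
    query = f"DESCRIBE <{entity}> FROM <{graph}>"
    # urlencode percent-encodes the query text and the MIME type.
    return f"http://{host}/sparql?" + urlencode({"query": query, "format": fmt})


url = describe_url("demo.openlinksw.com", "/Northwind/Customer/ALFKI",
                   "/Northwind", "application/rdf+xml")
# Decode the URL again to check the round trip.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
print(decoded["format"][0])
```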


Content Negotiation Decision Table

For dynamically derived descriptions of a Data Object using SPARQL DESCRIBE:

URL Rewriting


URL Rewriting - Example Pipeline

Content negotiation for RDF representation

Deploying Linked Data Using Virtuoso


Virtuoso - URL Rewriter Key Elements

Conductor UI for URL Rewriter

RDF view for Northwind sample database:
Rewriting rule for HTML requests

RDF view for Northwind sample database:
Rewriting rule for RDF/XML or N3 based resource description requests

Defining the SPARQL query underpinning the 'Destination Path Format' of the RDF/XML / N3 rewriting rule - Automatically URL encoded when saved

Rewrite Rule Components in Conductor UI

URL Rewriter - URIQADefaultHost Macro

URIQADefaultHost Macro


URL Rewriting Process for RDF Requests

URL Rewriting Process for HTML Requests

HTML requests are redirected via proxy /about/html to a rendering template - description.vsp

description.vsp rendering of Customer entity <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>

description.vsp - Rendering RDF as HTML

Destination path in rewrite rule for HTML requests: /about/html/http://^{URIQADefaultHost}^$s1


Exporting URL Rewriting Rules from Conductor

Example Exported Rule Definitions

-- Attach the rule list to the /Northwind virtual directory
DB.DBA.VHOST_DEFINE (
  lhost=>'*ini*', vhost=>'*ini*',
  lpath=>'/Northwind', ppath=>'/DAV/home/demo/',
  is_dav=>1, vsp_user=>'dba',
  ses_vars=>0,
  opts=>vector ('url_rewrite', 'demo_nw_rule_list1'),
  is_default_host=>0);

-- Rule list holding both rules
DB.DBA.URLREWRITE_CREATE_RULELIST (
  'demo_nw_rule_list1', 1, vector ('demo_nw_rule1', 'demo_nw_rule2'));

-- Rule 1: HTML requests, 303 redirect to the /about/html proxy
DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
  'demo_nw_rule1', 1, '(/[^#]*)', vector ('path'), 1,
  '/about/html/http://^{URIQADefaultHost}^%s',
  vector ('path'),
  NULL, '(text/html)|(\\*/\\*)', 0, 303, NULL);

-- Rule 2: RDF/XML or N3 requests, rewritten to a SPARQL DESCRIBE query
DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
  'demo_nw_rule2', 1, '(/[^#]*)', vector ('path'), 1,
  '/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind%%3E&format=%U',
  vector ('path', 'path', '*accept*'),
  NULL, '(text/rdf.n3)|(application/rdf.xml)', 0, NULL, NULL);

URL Rewriter API: Enabling Rewriting

For example:

DB.DBA.VHOST_DEFINE (
lhost=>'*ini*', vhost=>'*ini*',
lpath=>'/Northwind',ppath=>'/DAV/home/demo/',
is_dav=>1, vsp_user=>'dba',
ses_vars=>0, opts=>vector ('url_rewrite', 'demo_nw_rule_list1'), is_default_host=>0);

URL Rewriter API: Enabling Rewriting

Functions in DB.DBA schema:

'Nice' URLs vs 'Long' URLs

Sprintf Rules vs Regex Rules


URLREWRITE_CREATE_REGEX_RULE

URLREWRITE_CREATE_REGEX_RULE (
  rule_iri, allow_update, nice_match, nice_params, nice_min_params,
  target_compose, target_params, target_expn := null,
  accept_pattern := null, do_not_continue := 0,
  http_redirect_code := null, http_headers := null);

rule_iri: rule's name / identifier
nice_match: regex to parse the URL into a vector of 'occurrences'
nice_params: vector of names of the parsed parameters; its length equals the number of '(...)' specifiers in the regex
target_compose: 'compose' regex for the destination URL
target_params: vector of names of parameters to pass to the 'compose' expression as $1, $2 etc.
target_expn: optional SQL text to execute instead of a regex compose
accept_pattern: regex to match against the HTTP Accept header
do_not_continue: on a match, 0 = try the next rule in the rule list, 1 = don't
http_redirect_code: null, 301, 302 or 303; a 30x value triggers an HTTP redirect
http_headers: HTTP headers to supply with the rewritten request
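To make the interplay of nice_match, accept_pattern, target_compose and target_params concrete, here is a simplified Python model of how the two exported Northwind rules could be evaluated. It is a sketch under stated assumptions, not Virtuoso's implementation: it assumes %s inserts a value as-is, %U inserts it URL-encoded, %% is a literal percent sign, '*accept*' is the matched part of the Accept header, and the ^{URIQADefaultHost}^ macro expands to the server's default host name. The function and dictionary names are hypothetical.

```python
import re
from urllib.parse import quote

MACRO = "^{URIQADefaultHost}^"

RULES = [
    {   # demo_nw_rule1: HTML requests
        "nice_match": r"(/[^#]*)", "nice_params": ["path"],
        "target_compose": "/about/html/http://^{URIQADefaultHost}^%s",
        "target_params": ["path"],
        "accept_pattern": r"(text/html)|(\*/\*)",
    },
    {   # demo_nw_rule2: RDF/XML or N3 requests
        "nice_match": r"(/[^#]*)", "nice_params": ["path"],
        "target_compose": "/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind%%3E&format=%U",
        "target_params": ["path", "path", "*accept*"],
        "accept_pattern": r"(text/rdf.n3)|(application/rdf.xml)",
    },
]


def apply_rule(rule, path, accept, default_host):
    """Return the rewritten URL, or None if the rule does not apply."""
    accepted = re.search(rule["accept_pattern"], accept)
    matched = re.fullmatch(rule["nice_match"], path)
    if accepted is None or matched is None:
        return None
    values = dict(zip(rule["nice_params"], matched.groups()))
    values["*accept*"] = accepted.group(0)
    args = iter([values[name] for name in rule["target_params"]])

    def substitute(m):
        kind = m.group(1)
        if kind == "%":            # '%%' is an escaped percent sign
            return "%"
        value = next(args)         # consume the next target parameter
        return quote(value, safe="") if kind == "U" else value

    template = rule["target_compose"].replace(MACRO, default_host)
    return re.sub(r"%([%sU])", substitute, template)


def rewrite(path, accept, default_host="demo.openlinksw.com"):
    """Try each rule in order and return the first rewritten URL."""
    for rule in RULES:
        result = apply_rule(rule, path, accept, default_host)
        if result is not None:
            return result
    return None


print(rewrite("/Northwind/Customer/ALFKI", "text/html"))
print(rewrite("/Northwind/Customer/ALFKI", "application/rdf+xml"))
```

Under these assumptions the second call yields the same path and query string as the Location header in the curl verification slide.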


URL Rewriter - Verification with curl

The curl utility is a useful tool for verifying HTTP server responses and rewriting rules

$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Feb 2009 11:23:31 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp
%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp
%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp
%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml
Content-Length: 0

Note: for this example the default rule for RDF requests was changed to return an HTTP 303 response rather than use an internal redirect, so that the generated SPARQL query can be viewed and checked with curl
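To read the generated query more easily, the Location header from the curl output can be decoded with Python's standard library. The URL below is the header value from the transcript above with its wrapped lines joined.

```python
from urllib.parse import urlsplit, parse_qs

# Location header from the 303 response, wrapped lines joined.
location = (
    "http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp"
    "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp"
    "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp"
    "%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml"
)

params = parse_qs(urlsplit(location).query)
sparql = params["query"][0]   # decoded SPARQL text
fmt = params["format"][0]     # requested result format

print(sparql)
print(fmt)
```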


Browsing & Exploring Linked Data

OpenLink Data Explorer (ODE)

iSparql Query Tool

Content Negotiation Revisited - TCN

Virtuoso supports two flavours of content negotiation:

Transparent Content Negotiation

Variant Selection by User Agent:

Variant Selection by Server:

TCN - Basic Mechanics

Client

*New headers introduced by RFC 2295

Example - Preferred format: XML

$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
          -H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2008 15:44:07 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.xml
Content-Type: text/xml
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39
<?xml version="1.0" ?>
<a>some xml</a>


Example - Preferred format: HTML

$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" \
          -H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2008 15:43:18 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.html
Content-Type: text/html
ETag: "14056a25c066a6e0a6e65889754a0602"
Content-Length: 49
<html>
  <body>
    some html
  </body>
</html>


Example - Variant list request

$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
          -H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page
HTTP/1.1 300 Multiple Choices
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2008 15:44:35 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}}, {"page.xml" 1.000000 {type text/xml}}

Content-Length: 368
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>300 Multiple Choices</title></head>
<body><h1>Multiple Choices</h1>Available variants:<ul>
<li><a href="page.html">HTML variant</a>, type text/html</li>
<li><a href="page.txt">Text document</a>, type text/plain</li>
<li><a href="page.xml">XML variant</a>, type text/xml</li>

</ul></body></html>
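The two "preferred format" responses are consistent with a simple scoring rule: multiply each variant's source quality (the number in the Alternates list) by the q-value the client's Accept header gives its media type, and serve the highest-scoring variant. The Python sketch below illustrates that rule; it is a simplified model in the spirit of RFC 2295 variant selection, not Virtuoso's actual algorithm, and the function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into {media range: q}; q defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


def accept_q(prefs, media_type):
    """Look up q for a media type: exact match, then type/*, then */*."""
    if media_type in prefs:
        return prefs[media_type]
    major = media_type.split("/")[0] + "/*"
    if major in prefs:
        return prefs[major]
    return prefs.get("*/*", 0.0)


# (variant URI, source quality, media type), as in the Alternates list above.
VARIANTS = [
    ("page.html", 0.9, "text/html"),
    ("page.txt", 0.5, "text/plain"),
    ("page.xml", 1.0, "text/xml"),
]


def choose_variant(accept):
    """Return the variant URI with the highest source quality x Accept q."""
    prefs = parse_accept(accept)
    best = max(VARIANTS, key=lambda v: v[1] * accept_q(prefs, v[2]))
    return best[0]


print(choose_variant("text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3"))
print(choose_variant("text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3"))
```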


TCN Configuration - via Virtuoso/PL

create table DB.DBA.HTTP_VARIANT_MAP (
   VM_ID integer identity, -- unique ID
   VM_RULELIST varchar, -- HTTP rule list name
   VM_URI varchar, -- name of requested resource e.g. 'page'
   VM_VARIANT_URI varchar, -- name of variant e.g. 'page.xml', 'page.de.html' etc.
   VM_QS float, -- source quality, number in the range 0.001-1.000, with 3 digit precision
   VM_TYPE varchar, -- content type of the variant e.g. text/xml
   VM_LANG varchar, -- content language e.g. 'en', 'de' etc.
   VM_ENC varchar, -- content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
   VM_DESCRIPTION long varchar, -- human readable variant description e.g. 'Profile in RDF format'
   VM_ALGO int default 0, -- reserved for future use
   primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
);
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);


TCN Configuration - via Conductor UI

TCN Configuration - Variant Description

DB.DBA.HTTP_VARIANT_ADD (
   in rulelist_uri varchar, -- HTTP rule list name
   in uri varchar, -- Requested resource name e.g. 'page'
   in variant_uri varchar, -- Variant name e.g. 'page.xml', 'page.de.html' etc.
   in mime varchar, -- Content type of the variant e.g. text/xml
   in qs float := 1.0, -- Source quality, a floating point number with 3 digit precision in 0.001-1.000 range
   in description varchar := null, -- a human readable description of the variant e.g. 'Profile in RDF format'
   in lang varchar := null, -- Content language e.g. 'en', 'bg', 'de' etc.
   in enc varchar := null -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
   )

DB.DBA.HTTP_VARIANT_REMOVE (
   in rulelist_uri varchar, -- HTTP rule list name
   in uri varchar, -- Name of requested resource e.g. 'page'
   in variant_uri varchar := '%' -- Variant name filter
   )


TCN Configuration - via Virtuoso/PL

Adding resource variant descriptions

DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html', 'text/html', 0.900000, 'HTML variant');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain', 0.500000, 'Text document');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml', 1.000000, 'XML variant');

DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/', ppath=>'/DAV/TCN/', is_dav=>1, vsp_user=>'dba', opts=>vector ('url_rewrite', 'http_rule_list_1'));
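The three variant descriptions registered above line up with the Alternates header in the variant list response shown earlier. As a rough illustration of that correspondence, the Python sketch below formats the same data into the header syntax; the function name is hypothetical and the formatting is inferred from the sample response rather than taken from Virtuoso's code.

```python
# (variant URI, MIME type, source quality), mirroring the HTTP_VARIANT_ADD calls.
variants = [
    ("page.html", "text/html", 0.9),
    ("page.txt", "text/plain", 0.5),
    ("page.xml", "text/xml", 1.0),
]


def alternates_header(rows):
    """Format variant rows as the value of an Alternates response header."""
    return ", ".join(
        '{"%s" %.6f {type %s}}' % (uri, qs, mime) for uri, mime, qs in rows
    )


print("Alternates: " + alternates_header(variants))
```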