OpenLink Virtuoso - Linked Data
Deploying Linked Data
Linked Data
- Term coined by Tim Berners-Lee
- Describes recommended best practice for exposing & connecting data on the Semantic Web
- Use the RDF data model
- Identify real or abstract things (resources) in your 'universe of discourse' (Data Spaces), using URIs as unique IDs
- Make URIs accessible via HTTP so people can discover and explore these Data Spaces
- Allow these URIs to be dereferenced and return information
- Include links to provide 'discovery paths' to entities in other Data Spaces
Common Web & Different Nature of URIs
'Linked Data Web' and the 'Document Web'
Two dimensions of the Web separated by a common element - the Uniform Resource Identifier (URI)
- Document Web URIs
- These always point to physical resources i.e., they are URLs
- Linked Data Web
- URIs identify physical or abstract resources
What are Resources?
Web parlance for a Data Object or Entity that may be physical or abstract
- Document Web Resources are physical units of information (containers of contextualized data)
- Linked Data Web Resources are generic real-world data objects or entities that include:
- People, Places, and other Things
- Abstract concepts (e.g. Emotion)
- Subject Matter (e.g. Science, Geography, Economics etc.)
Resource Identity, Representation, and Access
- Identity (URI) of an Object or Entity should be unambiguous and globally unique
- On the Web a URI should provide an unambiguous data access path
- Reference to abstract (physically inaccessible) Objects or Entities is only achievable via
conduit documents that carry representations of entity descriptions (which at best are facets of an
entire description)
- The descriptive representations of an Object or Entity must be distinct from their URIs
- Data Access mechanisms must be independent and facilitate negotiation of representation.
Linked Data Deployment Requirements
To establish real-world object URIs in the Linked Data Web realm, a Linked Data Server needs to honour
the following requirements:
- Unique Global Identity for Resources using HTTP-based URIs
- Deployment platform needs ability to generate proxy Web resources to convey descriptions of
real-world (possibly abstract) resources
Challenges:
- Separation of Identity and Representation within the context of HTTP protocol mechanics
- Negotiable representation of resource descriptions through Transparent Content
Negotiation and client-side or server-side QoS algorithms
- URL rewriting and query association
Real-World Object Naming - URI Schemes
Linked Data Web URIs can take two forms:
- 'Slash' URIs - don't contain a fragment identifier (#)
- E.g.
http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
- Identify an entity, its HTML representation (document), and its RDF representation (document) respectively
- 'Hash' URIs - contain a fragment identifier
- E.g. http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
- Identifies the entity ALFKI, distinct from its representation
(http://demo.openlinksw.com/Northwind/Customer/ALFKI)
Slash URI Semantics
Separating identification and naming from representation using Slash URIs
Hash URI Semantics
Separating identification and naming from representation using Hash URIs
Handling Identity with 'Slash' URIs
For this URI scheme HTTP redirection (30X response) is required in order for resource "Identity" to
be separated from "representation". Examples:
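As a minimal sketch of the idea, the Python below maps a request for the entity URI (ending in /id) to a 303 redirect pointing at the /page or /data document. The function name redirect_for and the simple Accept test are assumptions made for illustration; this is not how Virtuoso implements it.

```python
def redirect_for(path, accept):
    """Return (status, location) for a request on a 'Slash' URI."""
    if not path.endswith("/id"):
        # Already a document URL (/page or /data): serve it directly.
        return 200, path
    base = path[: -len("/id")]
    # Assumption: RDF-aware clients name an RDF MIME type in Accept.
    wants_rdf = "application/rdf+xml" in accept or "text/rdf+n3" in accept
    # 303 See Other: the entity itself cannot be returned, only a description.
    return 303, base + ("/data" if wants_rdf else "/page")


html_result = redirect_for("/Northwind/Customer/ALFKI/id", "text/html")
rdf_result = redirect_for("/Northwind/Customer/ALFKI/id", "application/rdf+xml")
print(html_result)
print(rdf_result)
```

The entity URI never returns a 200 response itself, which is what keeps identity separate from representation.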
Handling Identity with 'Hash' URIs
For this URI scheme HTTP redirection isn't required in order for resource "Identity" to be separated from "representation". Examples:
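A short Python sketch (standard library only) of why no redirect is needed: the client strips the fragment before sending the request, so the server only ever sees the document URL, while the full URI still names the entity.

```python
from urllib.parse import urldefrag

entity_uri = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"

# urldefrag splits a URI into the part sent over HTTP and the fragment,
# which stays on the client side.
document_url, fragment = urldefrag(entity_uri)

print(document_url)  # http://demo.openlinksw.com/Northwind/Customer/ALFKI
print(fragment)      # this
```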
Negotiated Representation of Resource Descriptions
Use HTTP's inbuilt Content Negotiation mechanism to:
- Serve different format variants of the same resource description from one location
- Enable user agent (client) side specification of preferred description representations by preference order
- Enable server side specification of preferred description representations by preference order
Content Negotiation - Example
HTTP Request:
HTML browser requests an HTML/XHTML document in English or French
GET /whitepapers/data_mngmnt HTTP/1.1
Host: www.openlinksw.com
Accept: text/html, application/xhtml+xml
Accept-Language: en, fr
- Accept header indicates preferred MIME types
- RDF browser might instead stipulate a MIME type of application/rdf+xml or text/rdf+n3
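How a server might read the Accept header can be sketched in Python as follows. The helper names parse_accept and choose are hypothetical, and the sketch ignores wildcards such as */*; it only shows preference ordering by q-value.

```python
def parse_accept(header):
    """Parse an Accept header into (mime_type, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # a type with no q parameter defaults to 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((parts[0], q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose(header, available):
    """Return the most preferred MIME type that the server can supply."""
    for mime, q in parse_accept(header):
        if q > 0 and mime in available:
            return mime
    return None


available = {"text/html", "application/rdf+xml"}
print(choose("text/html, application/xhtml+xml", available))
print(choose("application/rdf+xml, text/html;q=0.5", available))
```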
Content Negotiation - Example
HTTP Response:
Server redirects to a URL where the appropriate version can be found
HTTP/1.1 302 Found
Location: http://www.openlinksw.com/whitepapers/data_mngmnt.en.html
- Redirect is indicated by HTTP status code 302 (Found)
- Client then sends another HTTP request to the new URL
- HTTP defines several 3xx status codes for redirection
Content Negotiation Decision Table
For static descriptions of a Data Object:
Assumes there are static HTML and RDF documents available to provide HTML and RDF representations
of the customer entity ALFKI
Dynamic RDF Renderings
If entity descriptions are held in an RDF quad store:
To provide a dynamic RDF rendering of the entity being dereferenced by the client:
Use SPARQL DESCRIBE or CONSTRUCT
- DESCRIBE <entity-uri> FROM <graph-uri>
- 'Unconstrained' - DESCRIBE output not prescribed by SPARQL specification
- Virtuoso supports custom procedures for generating output through SPARQL define sql:describe-mode
- CONSTRUCT { <entity-uri> ?p ?o } FROM <graph-uri> WHERE { <entity-uri> ?p ?o }
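To hand such a query to a SPARQL endpoint over HTTP it has to be URL-encoded. The Python sketch below builds a request URL using the query and format parameter names that appear in the rewrite rules of this deck; the function name describe_url and the example endpoint path are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def describe_url(endpoint, entity_uri, graph_uri, fmt="application/rdf+xml"):
    """Build a GET URL that runs a DESCRIBE query for one entity."""
    query = f"DESCRIBE <{entity_uri}> FROM <{graph_uri}>"
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query, "format": fmt})


url = describe_url(
    "http://demo.openlinksw.com/sparql",
    "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this",
    "http://demo.openlinksw.com/Northwind",
)
print(url)

# Decoding the query string gives back the original SPARQL text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```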
Content Negotiation Decision Table
For dynamically derived descriptions of a Data Object using SPARQL
DESCRIBE:
URL Rewriting
- Is the act of modifying a URL prior to final processing by a Web server
- Provides a means to build a URL 'on the fly' identifying the resource in the required representation format referred to by a 303 redirection
- Ideal solution is a rules-based URL rewriting processing pipeline using regular expression or sprintf substitutions
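A rules-based pipeline of this kind can be sketched in a few lines of Python. The rule tuples, the RULES list, and the rewrite function are illustrative assumptions and do not reflect Virtuoso's own rule format; the point is only that the first rule whose path pattern and Accept pattern both match decides the redirect target.

```python
import re

# Each rule: (path regex, Accept regex, destination template, redirect code).
RULES = [
    (r"^(/Northwind/.*)/id$", r"text/html", r"\1/page", 303),
    (r"^(/Northwind/.*)/id$", r"application/rdf\+xml|text/rdf\+n3", r"\1/data", 303),
]


def rewrite(path, accept):
    """Apply the first matching rule; return (code, new_path) or None."""
    for path_re, accept_re, template, code in RULES:
        match = re.match(path_re, path)
        if match and re.search(accept_re, accept):
            # expand() substitutes \1 with the captured part of the path.
            return code, match.expand(template)
    return None


print(rewrite("/Northwind/Customer/ALFKI/id", "application/rdf+xml"))
print(rewrite("/Northwind/Customer/ALFKI/id", "text/html"))
```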
URL Rewriting - Example Pipeline
Content negotiation for RDF representation
Deploying Linked Data Using Virtuoso
- Virtuoso's approach is to implement the generic solution outlined so far, using
- Content negotiation
- URL rewriting
- Virtuoso includes a Rules-based URL Rewriter
- Can be used to inject Linked Data into the Document Web
Virtuoso - URL Rewriter Key Elements
- Rewriting Rule
- Describes how to parse a 'nice' URL and compose the actual 'long' URL of the resource to be returned
- Two types: sprintf-based and regex-based
- Rewriting Rule List
- Named, ordered list of rewriting rules or rule lists
- Tried from top to bottom; the first matching rule is applied
- Conductor UI for rewriting rule configuration
- Configuration API
- Alternative to Conductor UI, for scripts
- Functions for creating, dropping, enumerating rules & rule lists
Conductor UI for URL Rewriter
RDF view for Northwind sample database:
Rewriting rule for HTML requests
Conductor UI for URL Rewriter
RDF view for Northwind sample database:
Rewriting rule for RDF/XML or N3 based resource description requests
Conductor UI for URL Rewriter
Defining the SPARQL query underpinning the 'Destination Path Format' of the RDF/XML / N3 rewriting
rule - Automatically URL encoded when saved
Rewrite Rule Components in Conductor UI
- Request Path Pattern e.g. (/[^#]*)
- a regular expression matched against the input path
- Substitution parameters
- Each successive pair of parentheses in the regex denotes a parameter referred to elsewhere in the rewrite rule as $U1, $U2, $U3 ... or $s1, $s2, $s3 ...
- Can be used to substitute the part of the input path that was matched into the new URL being composed
- $accept parameter substitutes matched content types specified in Accept header
- 'U' format specifier - URL encodes inserted text
- 's' format specifier - inserts matched text 'as is'
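The difference between the 'U' and 's' specifiers can be illustrated with a small Python sketch. The compose function is hypothetical, and encoding every reserved character (including '/') is an assumption, chosen to be consistent with the encoded paths in the curl output later in this deck.

```python
import re
from urllib.parse import quote


def compose(template, values):
    """Fill $U1/$s1-style placeholders in a destination template."""
    def fill(match):
        kind, index = match.group(1), int(match.group(2))
        value = values[index - 1]
        # 'U' URL-encodes the matched text; 's' inserts it unchanged.
        return quote(value, safe="") if kind == "U" else value
    return re.sub(r"\$([Us])(\d+)", fill, template)


path = "/Northwind/Customer/ALFKI"
as_is = compose("/about/html/http://example.org$s1", [path])
encoded = compose("/sparql?query=$U1", [path])
print(as_is)
print(encoded)
```

The 's' form suits destinations where the matched path stays part of the URL path, while the 'U' form is needed when it is embedded in a query string.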
URL Rewriter - URIQADefaultHost Macro
URIQADefaultHost Macro
URL Rewriting Process for RDF Requests
URL Rewriting Process for HTML Requests
HTML requests are redirected via proxy /about/html to a rendering template - description.vsp
description.vsp rendering of Customer entity
<http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>
description.vsp - Rendering RDF as HTML
Destination path in rewrite rule for HTML requests: /about/html/http://^{URIQADefaultHost}^$s1
- Redirects client to the Virtuoso 'Page Description Service' via proxy interface /about/html
- Page Description Service invokes description.vsp which in turn invokes the Virtuoso Sponger
- Sponger: a customizable RDFizer with pluggable cartridges
- Extracts RDF from the target URL
- Native RDF sources: RDF is returned 'as is'
- Non-RDF sources: Meta-data is extracted and converted to RDF using ontology mapping and XSLT
- description.vsp renders the extracted RDF as HTML
- Substitutes RDF 'hyperdata' links with HTML hyperlinks
Exporting URL Rewriting Rules from Conductor
- Rewrite rules configured in Conductor can be exported as Virtuoso PL for backup, use on another
system etc.
- Exported script recreates rules using Virtuoso's URL Rewriting Configuration API
Example Exported Rule Definitions
DB.DBA.VHOST_DEFINE (
lhost=>'*ini*', vhost=>'*ini*',
lpath=>'/Northwind',ppath=>'/DAV/home/demo/',
is_dav=>1, vsp_user=>'dba',
ses_vars=>0,
opts=>vector ('url_rewrite', 'demo_nw_rule_list1'),
is_default_host=>0);
DB.DBA.URLREWRITE_CREATE_RULELIST (
'demo_nw_rule_list1', 1, vector ('demo_nw_rule1', 'demo_nw_rule2'));
DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
'demo_nw_rule1', 1, '(/[^#]*)',vector ('path'), 1,
'/about/html/http://^{URIQADefaultHost}^%s',
vector ('path'),
NULL, '(text/html)|(\\*/\\*)', 0, 303, NULL);
DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
'demo_nw_rule2', 1, '(/[^#]*)', vector ('path'), 1,
'/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind%%3E&format=%U', vector ('path', 'path', '*accept*'), NULL, '(text/rdf.n3)|(application/rdf.xml)', 0, NULL, NULL);
URL Rewriter API: Enabling Rewriting
- Enabled through vhost_define( ) function
- vhost_define( ) defines a virtual host or virtual path
- opts parameter is a vector of field-value pairs
- Field url_rewrite controls / enables URL rewriting
- Field value is the IRI of the rule list to apply
E.g.
DB.DBA.VHOST_DEFINE (
lhost=>'*ini*', vhost=>'*ini*',
lpath=>'/Northwind',ppath=>'/DAV/home/demo/',
is_dav=>1, vsp_user=>'dba',
ses_vars=>0, opts=>vector ('url_rewrite', 'demo_nw_rule_list1'), is_default_host=>0);
URL Rewriter API: Rule Management Functions
Functions in DB.DBA schema:
- URLREWRITE_CREATE_SPRINTF_RULE
- URLREWRITE_CREATE_REGEX_RULE
- URLREWRITE_CREATE_RULELIST
- URLREWRITE_DROP_RULE
- URLREWRITE_DROP_RULELIST
- URLREWRITE_ENUMERATE_RULES
- URLREWRITE_ENUMERATE_RULELISTS
'Nice' URLs vs 'Long' URLs
- Rewriter was developed with broader objectives than Linked Data, which influenced its terminology
- Rewriter takes a 'nice' URL and rewrites it as a 'long' URL
- 'Nice' URL
- Free from parameters, typically short
- 'Long' URL
- Typically contains query string with named parameters
- Often ignored by web crawlers (viewed as highly dynamic) => low page ranking
Sprintf Rules vs Regex Rules
- For 'nice' to 'long' URL conversion
- Functionally equivalent
- Only difference is syntax of match pattern definition
- For 'long' to 'nice' URL conversion
- Only works for sprintf-based rules
- Regex-based rules are unidirectional
URLREWRITE_CREATE_REGEX_RULE
URLREWRITE_CREATE_REGEX_RULE (
rule_iri, allow_update, nice_match, nice_params, nice_min_params, target_compose, target_params, target_expn := null,
accept_pattern := null, do_not_continue := 0,
http_redirect_code := null, http_headers := null) ;
rule_iri: rule's name / identifier
nice_match: regex to parse URL into a vector of 'occurrences'
nice_params: vector of names of the parsed parameters. Length of vector equals # of '(...)' specifiers in the regex
target_compose: 'compose' regex for the destination URL
target_params: vector of names of parameters to pass to the 'compose' expression as $1, $2 etc
target_expn: optional SQL text to execute instead of a regex compose
accept_pattern: regex expression to match the HTTP Accept header
do_not_continue: on a match, try / don't try next rule in rule list
http_redirect_code: null, 301, 302 or 303. 30x => HTTP redirect
http_headers: HTTP headers to supply with the rewritten request
URL Rewriter - Verification with curl
curl utility provides a useful tool for verifying HTTP server responses and rewriting rules
$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Feb 2009 11:23:31 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp
%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp
%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp
%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml
Content-Length: 0
Note: default rule for RDF requests changed to return HTTP response 303, rather than use an internal redirect, to allow the generated SPARQL query to be viewed and checked with curl
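The encoded Location header is hard to read by eye. As a small aid, the Python sketch below decodes the value shown in the curl output above, with its wrapped lines rejoined, back into the SPARQL query and the requested format.

```python
from urllib.parse import urlsplit, parse_qs

# Location header from the curl output, with the wrapped lines rejoined.
location = (
    "http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp"
    "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp"
    "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp"
    "%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml"
)

# parse_qs percent-decodes the values and turns '+' back into spaces.
params = parse_qs(urlsplit(location).query)
sparql_query = params["query"][0]
result_format = params["format"][0]

print(sparql_query)
print(result_format)
```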
Browsing & Exploring Linked Data
OpenLink Data Explorer (ODE)
- Browser extension (Firefox, support for others to follow)
- RDF and HTML views of Linked Data
- RDF view incorporates 'hyperdata' links between entities
- HTML view substitutes hyperlinks
- Also available as a hosted service
iSparql Query Tool
Content Negotiation Revisited - TCN
Virtuoso supports two flavours of content negotiation:
- HTTP/1.1 style content negotiation (introduced earlier)
- Server-driven negotiation only
- Transparent Content Negotiation (TCN)
- Server-driven or agent-driven negotiation
- Suitably enabled user agents / browsers can take advantage of TCN
- Non-TCN capable user agents continue to be handled using HTTP/1.1 content negotiation
Transparent Content Negotiation
- A protocol defined by RFC2295, layered on top of HTTP/1.1
- Addresses deficiencies in HTTP/1.1 content negotiation
- Limited to server selecting best variant (server-driven negotiation)
- Server doesn't always know/select best variant
- User agent might often be better placed to decide what is best for its needs
- Inefficient
- Sending details of user agent's capabilities and preferences with every request is inefficient
- Large number of Accept headers required
- Very few Web resources have multiple variants
Transparent Content Negotiation
- Supports variant selection by user agent or by server
- Transparent - all variants on server are visible to the agent
Variant Selection by User Agent:
- User agent chooses best variant itself from variant list sent by server
- Requires sending fewer/smaller Accept headers
Variant Selection by Server:
- User agent can instruct server to select best variant on its behalf
- Server uses 'remote variant selection algorithm' (RFC2296)
TCN - Basic Mechanics
Client
- Supplies Negotiate* request header
- Content negotiation directives include:
- "trans" - user agent supports TCN for the current request
- "vlist" - user agent wants a variant list for the resource
- Variant list is expressed as an Alternates header.
- Implies "trans".
- "*" - user agent allows servers and proxies to run any remote variant selection algorithm
Server
- Returns a TCN* response header signalling that the resource is transparently negotiated and either a choice or a list response as appropriate
*New headers introduced by RFC2295
Example - Preferred format: XML
- Assumes Virtuoso WebDAV server contains 3 variants of resource named 'page':
- /DAV/TCN/page.xml
- /DAV/TCN/page.html
- /DAV/TCN/page.txt
- User agent indicates preference for XML
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
-H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2008 15:44:07 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.xml
Content-Type: text/xml
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39
<?xml version="1.0" ?>
<a>some xml</a>
Example - Preferred format: HTML
- User agent indicates preference for HTML
$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" \
-H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2008 15:43:18 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.html
Content-Type: text/html
ETag: "14056a25c066a6e0a6e65889754a0602"
Content-Length: 49
<html>
<body>
some html
</body>
</html>
Example - Variant list request
- User agent asks for a list of variants
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
-H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page
HTTP/1.1 300 Multiple Choices
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2008 15:44:35 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type
text/plain}}, {"page.xml" 1.000000 {type text/xml}}
Content-Length: 368
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>300 Multiple Choices</title></head>
<body><h1>Multiple Choices</h1>Available variants:<ul>
<li><a href="page.html">HTML variant</a>, type text/html</li>
<li><a href="page.txt">Text document</a>, type text/plain</li>
<li><a href="page.xml">XML variant</a>, type text/xml</li>
</ul></body></html>
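A user agent that receives such a list has to parse the Alternates header before it can choose. The Python sketch below handles only the simple form shown in this response (URI, source quality, and a type attribute); the regular expression is an assumption fitted to this example and not a full RFC2295 parser.

```python
import re

# Alternates header from the response above, with the wrapped line rejoined.
alternates = (
    '{"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type '
    'text/plain}}, {"page.xml" 1.000000 {type text/xml}}'
)

# One variant: {"uri" source-quality {type mime/type}}
pattern = re.compile(r'\{"([^"]+)"\s+([\d.]+)\s+\{type\s+([^}\s]+)\}\}')

variants = [
    (uri, float(quality), mime)
    for uri, quality, mime in pattern.findall(alternates)
]
for variant in variants:
    print(variant)

# The agent is free to apply its own policy; here it takes the highest
# source quality.
best = max(variants, key=lambda v: v[1])
print(best[0])
```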
TCN Configuration - via Virtuoso/PL
- Variant descriptions held in SQL table HTTP_VARIANT_MAP
- Added/updated/removed through Virtuoso/PL or Conductor UI
create table DB.DBA.HTTP_VARIANT_MAP (
VM_ID integer identity, -- unique ID
VM_RULELIST varchar, -- HTTP rule list name
VM_URI varchar, -- name of requested resource e.g. 'page'
VM_VARIANT_URI varchar, -- name of variant e.g. 'page.xml','page.de.html' etc.
VM_QS float, -- Source quality, number in the range 0.001-1.000, with 3 digit precision
VM_TYPE varchar, -- Content type of the variant e.g. text/xml
VM_LANG varchar, -- Content language e.g. 'en', 'de' etc.
VM_ENC varchar, -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
VM_DESCRIPTION long varchar, -- human readable variant description e.g. 'Profile in RDF format'
VM_ALGO int default 0, -- reserved for future use
primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
);
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);
TCN Configuration - via Conductor UI
TCN Configuration - Variant Description
- Adding or Updating a Resource Variant
DB.DBA.HTTP_VARIANT_ADD (
in rulelist_uri varchar, -- HTTP rule list name
in uri varchar, -- Requested resource name e.g. 'page'
in variant_uri varchar, -- Variant name e.g. 'page.xml', 'page.de.html' etc.
in mime varchar, -- Content type of the variant e.g. text/xml
in qs float := 1.0, -- Source quality, a floating point number with 3 digit precision in 0.001-1.000 range
in description varchar := null, -- a human readable description of the variant e.g. 'Profile in RDF format'
in lang varchar := null, -- Content language e.g. 'en', 'bg', 'de' etc.
in enc varchar := null -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
)
- Removing a Resource Variant
DB.DBA.HTTP_VARIANT_REMOVE (
in rulelist_uri varchar, -- HTTP rule list name
in uri varchar, -- Name of requested resource e.g. 'page'
in variant_uri varchar := '%' -- Variant name filter
)
TCN Configuration - via Virtuoso/PL
Adding resource variant descriptions
- Define variant descriptions & associate them with a rule list
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html', 'text/html', 0.900000, 'HTML variant');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain', 0.500000, 'Text document');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml', 1.000000, 'XML variant');
- Define a virtual directory & associate the rule list with it
DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/', ppath=>'/DAV/TCN/', is_dav=>1, vsp_user=>'dba', opts=>vector ('url_rewrite', 'http_rule_list_1'));
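With these three variants registered, the server-side choice can be approximated in Python. This is a deliberately simplified sketch: it multiplies each variant's source quality by the q-value the client gave for its media type, whereas the RFC2296 algorithm also weighs language, charset and feature factors. The names accept_q and best_variant are hypothetical.

```python
# (variant, source quality, MIME type) as registered via HTTP_VARIANT_ADD.
VARIANTS = [
    ("page.html", 0.9, "text/html"),
    ("page.txt", 0.5, "text/plain"),
    ("page.xml", 1.0, "text/xml"),
]


def accept_q(header):
    """Map each MIME type in an Accept header to its q-value."""
    table = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        table[parts[0]] = q
    return table


def best_variant(header, variants=VARIANTS):
    """Pick the variant with the highest (source quality x client q)."""
    prefs = accept_q(header)

    def overall(variant):
        _, source_quality, mime = variant
        # Fall back to the */* wildcard when the type is not listed.
        return source_quality * prefs.get(mime, prefs.get("*/*", 0.0))

    winner = max(variants, key=overall)
    return winner[0] if overall(winner) > 0 else None


prefer_xml = "text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3"
prefer_html = "text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3"
print(best_variant(prefer_xml))
print(best_variant(prefer_html))
```

The two Accept headers are the ones used in the curl examples earlier, and the sketch selects the same variants that the server returned there.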