Deploying Linked Data - Part 3

Glossary

  • class - A concept in a domain of interest. A class describes the common attributes and behaviours shared by entities belonging to the same group by virtue of their common characteristics.
  • content negotiation - A mechanism defined in HTTP which supports serving different representations of a URL-addressable resource. An HTTP client can indicate which representation formats it understands and prefers.
  • cURL - A command line tool for transferring files to or from a URL. It writes to standard output by default and provides a good tool for simulating a web browser's interaction with an HTTP server.
  • data source - A source of data (e.g. a place that provides access to property values associated with one or more Entities).
  • data resource - same as data source
  • data space - A moniker for Web-accessible atomic containers that manage and expose data, information, services, processes, and knowledge. Data Spaces are fundamentally problem-domain-specific database applications with the benefit of being data model and query language agnostic.
  • dereferencing - The act of accessing and retrieving data, in the desired representation, from a location identified by a URL.
  • document resource - A Web information resource in a specific representation that is identifiable and accessible via a URL. Documents are the dominant information resource form on the Document Web (i.e., the current Web).
  • Document Web - Web of Linked Documents.
  • entity - Something, real or conceptual, which exists apart from other things.
  • entity ID - An identifier that uniquely distinguishes a particular entity instance from other similar entities (typically of the same type or class).
  • entity set - A collection of entities all belonging to the same class.
  • HTTP (Hypertext Transfer Protocol) - A communication protocol for information transfer on the World Wide Web.
  • HTTP header - A text record exchanged between an HTTP client and server, which forms part of an HTTP request or HTTP response message. A request consists of a method (or verb), headers, and an optional message body. The request header fields allow the client to send additional information about the request and the client itself. A response consists of a status line, headers, and an optional message body. A response header typically contains information about the data being returned and about the server itself.
  • information resource - An encapsulation of data and representation that forms the basic payload unit (packet) on the Web Information Bus.
  • IRI (Internationalized Resource Identifier) - An internationalized version of a Uniform Resource Identifier (URI). While URIs are limited to a subset of the ASCII character set, IRIs may contain any Unicode character.
  • Linked Data - A Data Access by Reference mechanism that uses HTTP as a pointer system for accessing the negotiated representation of resource/entity descriptions. For example, an RDF model based resource description can be projected (represented) using (X)HTML, N3, Turtle, or RDF/XML via content negotiation. At all times the data access mechanism and ultimate presentation/representation format are distinct.
  • Linked Data Web - Web of Linked Data.
  • non-information resource - Any resource that is not an information resource (i.e., not Web transportable in basic form). Structured data resource (see below) is a more accurate and preferable term.
  • structured data resource - A Web accessible container of structured data representing physical and abstract entities.
  • structured data - Data organized into semantic chunks or entities, with similar entities grouped together in relations or classes, and presented in a patterned manner.
  • structured data source - A repository of structured data.
  • URL (Uniform Resource Locator) - A URI that identifies a physical Web resource.
  • URI (Uniform Resource Identifier) - A global identification mechanism for resources (entities or objects) that is completely distinct from their presentation, representation and data access mechanism.
  • Web information resource - A compound document style of artifact that provides a materialized contextualization of data.

Appendix A: Description.vsp - Rendering RDF as HTML

Description.vsp is a Virtuoso Server Page (Virtuoso's equivalent of ASP) which provides a hypertext description of RDF Linked Data. Its purpose is to provide a default HTML rendering of RDF data, to allow it to be navigated using an HTML, rather than RDF, browser. Description.vsp underpins the 'Page Description' facility in the OpenLink Data Explorer (ODE) browser extension. (ODE is also available as a hosted service - e.g. http://demo.openlinksw.com/ode.) The HTML view it provides substitutes RDF hyperdata links with hypertext links. The description is tabular, listing the properties of the entity being described and, adjacent, the property values.

fig8

Description.vsp is invoked through the Virtuoso 'Page Description' service, a proxy service accessed via /about/html. For instance, to extract RDF data from the HTML page http://musicbrainz.org/artist/72c090b6-a68e-4cb9-b330-85278681a714.html, which describes the musician John Cale, and then view the extracted RDF as HTML, retrieve the page through the Page Description service hosted by the Virtuoso instance at linkeddata.uriburner.com.

Similarly, when deploying your own Linked Data, you can exploit the power of Virtuoso's URL rewriting to automatically redirect requests for HTML renditions of the RDF data to the /about/html proxy.

When description.vsp is executed, the source URI is sponged to extract the RDF data to be displayed. Two routes through the Virtuoso Sponger are possible:

  1. If the source contains RDF directly, this is used 'as is'.
  2. If not, the Virtuoso Sponger extracts any available metadata through one or more Sponger cartridges and converts this to RDF.

Data from the Northwind RDF view follows route 1. Data from the MusicBrainz page on John Cale follows route 2. Whichever route is followed, the Sponger caches the RDF data in the Virtuoso RDF quad store.

With the Northwind demo rewrite rule for HTML requests set up with a Request Path Pattern of /about/html/(.*) and a Destination Path Format of /rdfdesc/description.vsp?g=$U1 , a request for an HTML rendering of http://myhost/Northwind/Customer/ALFKI#this results in description.vsp being called with parameter g set to http://myhost/Northwind/Customer/ALFKI#this.
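
The effect of this rewrite rule can be approximated in a few lines of Python. This is only a sketch of the pattern substitution, not Virtuoso's rewriting engine. It assumes that $U1 stands for the first captured group, inserted URL-encoded; the function name rewrite_html_request and the constant names are hypothetical.

```python
import re
from urllib.parse import quote, unquote

# Request Path Pattern and Destination Path Format from the rule above.
# {U1} stands in for Virtuoso's $U1 placeholder (assumed here to mean
# "first captured group, URL-encoded").
REQUEST_PATH_PATTERN = re.compile(r"^/about/html/(.*)$")
DESTINATION_PATH_FORMAT = "/rdfdesc/description.vsp?g={U1}"


def rewrite_html_request(request_path):
    """Return the rewritten path, or None if the rule does not apply."""
    match = REQUEST_PATH_PATTERN.match(request_path)
    if match is None:
        return None
    encoded = quote(match.group(1), safe="")
    return DESTINATION_PATH_FORMAT.format(U1=encoded)


entity_uri = "http://myhost/Northwind/Customer/ALFKI#this"
destination = rewrite_html_request("/about/html/" + entity_uri)

# Decoding the query parameter recovers the entity URI passed as 'g'.
g_value = unquote(destination.split("?g=", 1)[1])
print(destination)
print(g_value == entity_uri)
```

Decoding the g parameter gives back the original entity URI, which is the value description.vsp receives.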

description.vsp uses Virtuoso's SPARQL extensions for IRI dereferencing (see the Virtuoso on-line documentation: IRI Dereferencing for FROM clauses) to invoke the Sponger via the get:soft "soft" option and crawl the URI identified by 'g'. For example:

sparql define get:soft "soft" SELECT * from <http://myhost/Northwind/Customer/ALFKI> where { ?x ?y ?z }

The Sponger creates graph <http://myhost/Northwind/Customer/ALFKI> to hold the extracted RDF data describing entity ALFKI. Once the data is cached in the quad store, description.vsp issues a series of SPARQL queries to identify the predicates and predicate values of all RDF statements that have <http://myhost/Northwind/Customer/ALFKI#this> as their subject. These are then displayed in an HTML table.
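
As an illustration of the kind of query involved, the sketch below builds a SPARQL string that lists the predicates and values for one subject. It is not the set of queries description.vsp actually issues. The function name describe_query is hypothetical, and the graph IRI is assumed to be the entity URI with its fragment removed, as in the ALFKI example.

```python
from urllib.parse import urldefrag


def describe_query(entity_uri):
    """Build a SPARQL query listing predicate/value pairs for entity_uri."""
    # Assumption: the graph holding the sponged data is named after the
    # entity URI minus its '#...' fragment.
    graph_iri = urldefrag(entity_uri).url
    return (
        'define get:soft "soft" '
        f"SELECT ?p ?o FROM <{graph_iri}> "
        f"WHERE {{ <{entity_uri}> ?p ?o }}"
    )


query = describe_query("http://myhost/Northwind/Customer/ALFKI#this")
print(query)
```

Each row of the result (?p, ?o) would correspond to one row of the HTML table: a property and its value.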

Appendix B: New Proxy URI Formats

As of September 2009, the Sponger proxy paths /about/html and /about/rdf have been augmented to support a richer slash URI scheme for identifying an entity and its metadata in a variety of representation formats.

The proxy path /about/html returns an XHTML description of an entity as before, but now includes richer embedded RDFa. Although some of the examples in this document still refer to /about/rdf (which is still usable), please bear in mind that this path has been deprecated in favour of /about/id.

The new proxy path /about/id returns an RDF description of an entity, using RDF/XML as the default serialization format. Other serialization formats can be requested by specifying the appropriate media type in an Accept header; the supported alternatives are N3, Turtle (TTL), N-Triples and JSON. Alternatively, a serialization format can be requested directly through another new proxy path, /about/data. In this case no Accept header is needed, because the format is specified as part of the request URL.

To dereference the description of a Web-addressable resource in your browser, enter one of the following URL patterns:

  • HTML description - http://<sponger proxy host>/about/html/<URLscheme>/<hostname>/<localpart>
  • RDF description - http://<sponger proxy host>/about/data/<format>/<URLscheme>/<hostname>/<localpart> where format is one of xml, n3, nt, ttl or json.
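
The two patterns can be assembled mechanically from a source URL. The following is a minimal sketch, assuming the source URL has the usual scheme://rest form and that the query string is passed through unchanged; the helper names are hypothetical and not part of Virtuoso.

```python
# Formats accepted by the /about/data/<format>/ pattern above.
FORMATS = ("xml", "n3", "nt", "ttl", "json")


def split_source_url(source_url):
    """Split 'http://host/path' into ('http', 'host/path')."""
    scheme, separator, rest = source_url.partition("://")
    if not separator or not rest:
        raise ValueError(f"not an absolute URL: {source_url!r}")
    return scheme, rest


def html_description_url(proxy_host, source_url):
    """Proxy URL for the HTML description of source_url."""
    scheme, rest = split_source_url(source_url)
    return f"http://{proxy_host}/about/html/{scheme}/{rest}"


def rdf_description_url(proxy_host, source_url, fmt):
    """Proxy URL for the RDF description of source_url in format fmt."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    scheme, rest = split_source_url(source_url)
    return f"http://{proxy_host}/about/data/{fmt}/{scheme}/{rest}"


product = ("http://www.bestbuy.com/site/olspage.jsp"
           "?skuId=9491935&type=product&id=1218115079278")
print(html_description_url("linkeddata.uriburner.com", product))
print(rdf_description_url("linkeddata.uriburner.com", product, "ttl"))
```

The "://" separator of the source URL becomes a single "/" in the proxy path, which is what distinguishes the slash URI scheme from embedding the full source URL.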

Examples

The examples which follow, illustrating how RDF metadata about a product described at www.bestbuy.com can be requested in different formats, use a public Virtuoso Sponger service hosted at linkeddata.uriburner.com. For more information refer to the URIBurner Wiki.

Notice how requests to /about/id are redirected to /about/html, /about/data/nt, /about/data/xml or /about/data/json depending on the requested format. The required URL rewriting rules are preconfigured when the rdf_mappers VAD is installed.
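
The redirect behaviour can be summarised as a lookup from the requested media type to a target proxy path. The sketch below mirrors only the five media types used in the examples in this section; it is not the rule set installed by the rdf_mappers VAD. Two simplifications are assumed: only the first media type in the Accept header is considered (q-values are ignored), and unknown types fall back to RDF/XML, the default serialization for /about/id. The function name redirect_location is hypothetical.

```python
# Media type -> proxy path prefix, as observed in the example responses.
REDIRECT_PREFIX_BY_MEDIA_TYPE = {
    "text/html": "/about/html",
    "application/xhtml+xml": "/about/html",
    "text/n3": "/about/data/nt",
    "application/rdf+xml": "/about/data/xml",
    "application/rdf+json": "/about/data/json",
}

# Assumption: fall back to the RDF/XML default for unlisted media types.
DEFAULT_PREFIX = "/about/data/xml"


def redirect_location(request_path, accept):
    """Return the path a 303 response would point to for an /about/id request."""
    if not request_path.startswith("/about/id/"):
        raise ValueError("not an /about/id request")
    remainder = request_path[len("/about/id"):]
    # Simplification: take the first listed media type, drop parameters.
    media_type = accept.split(",")[0].split(";")[0].strip().lower()
    prefix = REDIRECT_PREFIX_BY_MEDIA_TYPE.get(media_type, DEFAULT_PREFIX)
    return prefix + remainder


path = "/about/id/http/www.bestbuy.com/site/olspage.jsp"
print(redirect_location(path, "text/html"))
print(redirect_location(path, "application/rdf+json"))
```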

HTML+RDFa based metadata
curl -I -H "Accept: text/html" "http://linkeddata.uriburner.com/about/id/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278"

HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3040 (Solaris) x86_64-sun-solaris2.10-64  VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Tue, 01 Sep 2009 21:41:52 GMT
Accept-Ranges: bytes
Location: http://linkeddata.uriburner.com/about/html/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278
Content-Length: 13
    
or
curl -I -H "Accept: application/xhtml+xml" "http://linkeddata.uriburner.com/about/id/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278"

HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3040 (Solaris) x86_64-sun-solaris2.10-64  VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 03 Sep 2009 14:33:45 GMT
Accept-Ranges: bytes
Location: http://linkeddata.uriburner.com/about/html/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278
Content-Length: 13
    
N3 based metadata
curl -I -H "Accept: text/n3" "http://linkeddata.uriburner.com/about/id/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278"

HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3040 (Solaris) x86_64-sun-solaris2.10-64  VDB
Connection: close
Date: Tue, 01 Sep 2009 21:38:44 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: /about/data/nt/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935%26type=product%26id=1218115079278
Content-Type: text/n3; qs=0.8
Location: http://linkeddata.uriburner.com/about/data/nt/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935%26type=product%26id=1218115079278
Content-Length: 13
    
RDF/XML based metadata
curl -I -H "Accept: application/rdf+xml" "http://linkeddata.uriburner.com/about/id/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278"

HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3040 (Solaris) x86_64-sun-solaris2.10-64  VDB
Connection: close
Date: Tue, 01 Sep 2009 21:33:23 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: /about/data/xml/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935%26type=product%26id=1218115079278
Content-Type: application/rdf+xml; qs=0.95
Location: http://linkeddata.uriburner.com/about/data/xml/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935%26type=product%26id=1218115079278
Content-Length: 13
    
JSON based metadata
curl -I -H "Accept: application/rdf+json" "http://linkeddata.uriburner.com/about/id/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935&type=product&id=1218115079278"

HTTP/1.1 303 See Other
Server: Virtuoso/05.11.3040 (Solaris) x86_64-sun-solaris2.10-64  VDB
Connection: close
Date: Tue, 01 Sep 2009 11:22:52 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: /about/data/json/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935%26type=product%26id=1218115079278
Content-Type: application/rdf+json; qs=0.7
Location: http://linkeddata.uriburner.com/about/data/json/http/www.bestbuy.com/site/olspage.jsp?skuId=9491935%26type=product%26id=1218115079278
Content-Length: 13