Open Conceptual Data Models
Making the Conceptual Layer Real
via
HTTP-based Linked Data
Conceptual Data Models in the Linked Data Web
Linked Data Vision:
- The transition of HTTP-based Webs (Intranet, Extranet, or Internet)
- from Webs of Linked Documents
- to Webs of interlinked structured data items (aka entities, data objects, resources)
Concurrent trend in the IT industry:
- A recognition of the benefits of conceptual data models vs logical data models
The Big Question:
- To what extent does Linked Data support conceptual level data models?
Open Conceptual Data Models
Topics:
- Conceptual & Logical Data Models
- Conceptual Models for the Semantic Web
- Realizing Conceptual Models through Ontologies & Linked Data
- Virtuoso's RDF based Linked Data Views
- ADO.NET Data Services & the Entity Data Model
Data Model Layers
Physical
- How data is physically represented on disk
Logical (aka logical schema)
- Expresses problem domain in terms of data management technology (tables / columns)
- e.g. relational schema
Conceptual (aka conceptual schema)
- Purely semantic description of problem space
- Describes things (entities), their characteristics (attributes) & associations between things (relationships)
Logical Data Model
- Most prominent of the three data model types
- Main focus of database applications
- Due to pervasiveness of relational database driven applications within the enterprise and across the Web
Weaknesses
- Impedance mismatch
- Loss of semantics during development process
- Heterogeneous databases & interoperability
Logical Data Model Weaknesses
Impedance Mismatch
- SQL expresses queries in terms of tables / views
- => targets logical schema
- Normalization fragments the data model
- Entities & their attributes may be split across several tables
- Navigation between objects requires relational joins over two or more tables
- Table rows must be reconstituted into higher level conceptual entities
Conceptual level data model is desirable to:
- Remove impedance mismatch
- Isolate application from changes to logical data model
- Provide framework for human level interaction
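The impedance mismatch above can be sketched with Python's built-in sqlite3 module. The schema (artist, album, track) and the load_album helper are illustrative assumptions, not taken from any real application; the point is only that one conceptual entity has to be rebuilt from rows joined across several tables.

```python
import sqlite3

# Hypothetical normalized schema: one conceptual "album" entity is
# spread over three tables (all names are illustrative assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (album_id INTEGER PRIMARY KEY, title TEXT,
                         artist_id INTEGER REFERENCES artist(artist_id));
    CREATE TABLE track  (track_id INTEGER PRIMARY KEY, title TEXT,
                         album_id INTEGER REFERENCES album(album_id));
    INSERT INTO artist VALUES (1, 'John Lennon');
    INSERT INTO album  VALUES (10, 'Imagine', 1);
    INSERT INTO track  VALUES (100, 'Imagine', 10);
    INSERT INTO track  VALUES (101, 'Jealous Guy', 10);
""")


def load_album(conn, album_id):
    """Reconstitute one conceptual album entity from its table rows."""
    rows = conn.execute(
        """SELECT album.title, artist.name, track.title
           FROM album
           JOIN artist ON artist.artist_id = album.artist_id
           JOIN track  ON track.album_id = album.album_id
           WHERE album.album_id = ?
           ORDER BY track.track_id""",
        (album_id,),
    ).fetchall()
    if not rows:
        return None
    # Every row repeats the album and artist columns; only the track differs.
    return {
        "title": rows[0][0],
        "artist": rows[0][1],
        "tracks": [row[2] for row in rows],
    }


album = load_album(conn, 10)
print(album)
```

The application code has to know which tables to join and how to fold the repeated rows back into a single object, which is exactly the knowledge a conceptual layer would otherwise carry.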
Logical Data Model Weaknesses
Loss of Semantic Fidelity During Development
Process:
- Develop conceptual model (E-R modelling)
- Transform to logical model for implementation
- DBMS generates physical model
Problems:
- Each move to a lower level model degrades the semantic fidelity of the higher level model
- Conceptual Model semantics fragmented across schema / business rules / application code
- Application & Users must understand logical data model
- Must be hardcoded or inferred (imperfectly) from system tables
Logical Data Model Weaknesses
Heterogeneous Databases & Interoperability
Logical data model
- Describes problem domain in terms of tables/columns
- Requires costly table joins to navigate model
Application
- Exposed to specifics of a particular vendor's RDBMS
In heterogeneous database environment, must handle
- Different SQL dialects
- Different schemas
- No explicit data model. No explicit semantics.
- Interoperability/integration = perpetual problem for IT depts
Conceptual Models for Linked Data Webs
The explosion of User Generated Data from Web 2.0 applications, and the Data Silos they create, is driving recognition of the need to move from logical to conceptual models, as exemplified by:
- Microsoft's Entity Data Model / Entity Framework
- W3C's Semantic Web Project which includes powerful technologies for this paradigm shift such as:
- Resource Description Framework (RDF Data Model and Data Representation Formats)
- Web Ontology Language (OWL)
- SPARQL (Query Language, RESTful Interface, and Query Result Serialization Formats)
Benefits of Conceptual Models
- More faithfully represents human view of domain of interest
- Conceptual model & semantics
- Explicit & available globally
- Not implicit & fragmented across business logic / UI etc
- Better / explicit semantics facilitates move from "search" to "esoteric precision find"
- Much easier heterogeneous data integration
- User Generated Data is inherently heterogeneous & disparately located
Application Areas - Present & Future
Social Media, eCommerce, Distributed Collaborative Apps
- Require shareable, standards-based, cross-platform conceptual views of data
Data portability
- Needed as users maintain multiple points of presence & identity across blogs, social network accounts etc.
Open business models
- Require exchange & integration of large amounts of data
Scientific research - sharing of knowledge & findings
- Requires transparent access to distributed heterogeneous data
- Requires database integration using global schema
Autonomous intelligent agents
- Free humans from large-volume information processing
Semantic Web Project Technologies
These technologies offer:
Ontologies
- For representing common semantics
- Spanning databases, applications, enterprises, on-line communities
- Deliver shared conceptual model
- Provide common schemas (Dublin Core, FOAF, SIOC, GoodRelations etc)
Common Semantics (Ontologies) & Common Data Representation (RDF)
- Enable cross data source querying using SPARQL
- Data across several databases (or data spaces) can be meshed, expanded, and explored
- Querying using proprietary APIs unnecessary
- Brute force data merging via code is unnecessary
Open Data Formats, Platform Independence, Common Models
- Facilitate data portability, accessibility, and integration.
Realizing Conceptual Models
Ontologies
- Provide the building blocks for conceptual models
- Define the concepts and their relationships in a domain of interest (or world view)
Describing Classes & Properties - Ontology Languages
- RDFS
- Introduces the notions of concepts (classes) & instances
- OWL
- Adds more vocabulary for describing:
- relations between classes
- cardinality
- richer typing of properties, etc.
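As a rough illustration of what class-level vocabulary buys, the sketch below follows rdfs:subClassOf links to work out every class an instance belongs to. The rdf:type and rdfs:subClassOf URIs are the standard ones; the example.org class names and the inferred_types helper are assumptions made for this sketch, and a real reasoner does far more.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/music#"  # hypothetical namespace

triples = {
    (EX + "SoloArtist", RDFS_SUBCLASS_OF, EX + "Artist"),
    (EX + "Artist", RDFS_SUBCLASS_OF, EX + "Agent"),
    (EX + "lennon", RDF_TYPE, EX + "SoloArtist"),
}


def inferred_types(instance, triples):
    """Return every class the instance belongs to, following rdfs:subClassOf."""
    pending = [o for s, p, o in triples if s == instance and p == RDF_TYPE]
    found = set()
    while pending:
        cls = pending.pop()
        if cls in found:
            continue  # already visited; also guards against cyclic hierarchies
        found.add(cls)
        pending.extend(
            o for s, p, o in triples if s == cls and p == RDFS_SUBCLASS_OF
        )
    return found


print(sorted(inferred_types(EX + "lennon", triples)))
```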
Goodness of Fit
- RDF was designed from the ground up as a metadata data model
- RDF / RDFS / OWL work directly at the level of conceptual models
- Conceptual model terminology matches RDF/OWL terminology
- Concepts, entities, attributes, relationships
A natural fit!
RDF lends itself naturally to describing conceptual models
Semantic Expressivity Comparison
Data Definition Language (DDL)-based Relational Model
- Relationship between two entities isn't explicit
- Foreign key relating two rows in separate tables doesn't express the nature of the relationship
- Semantics must often be inferred from table definitions
RDF-based Conceptual Model
- Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple
- Semantic expressivity of RDF/RDFS/OWL is much better than DDL
- Has richer semantic content than equivalent DDL-based logical/relational model
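The contrast can be sketched in a few lines of Python. The row, the example.org URIs, and the predicate names (appearsOn, recordedBy) are all made up for illustration; the point is that the triple names the relationship while the foreign key column does not.

```python
# A foreign key only says "this row points at that row".
# (Hypothetical row; the column names are illustrative.)
track_row = {"track_id": 100, "title": "Imagine", "album_id": 10}

# A triple names the relationship in its predicate.
# All URIs below are made-up examples under example.org.
EX = "http://example.org/music/"
triples = [
    (EX + "track/100", EX + "schema#title", "Imagine"),
    (EX + "track/100", EX + "schema#appearsOn", EX + "album/10"),
    (EX + "album/10", EX + "schema#recordedBy", EX + "artist/1"),
]


def relationships(subject, triples):
    """Return the (predicate, object) pairs stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


for predicate, obj in relationships(EX + "track/100", triples):
    print(predicate, "->", obj)
```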
RDF Conceptual Model - Artist / Records / Tracks
Global Granular Information Sharing
Traditional Logical/Relational Data Model
- Schema described by DDL is internal to DBMS
- Primary keys identifying an individual table row (i.e. entity instance) not globally unique, not easily usable outside host DBMS
- Gives rise to 'data silos'
RDF's use of Generic HTTP-based URIs
- Externalises the data and schema
- Makes both globally accessible & scalable
- Provides globally unique IDs for entities/relations/classes
- A vehicle for granular, global information sharing down to the equivalent of the record level
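One way to picture record-level global identifiers is a function that turns a local primary key into an HTTP URI. The URI layout below (host / dataset / entity type / key, ending in #this) is an assumption chosen for illustration, as are the entity_uri name and the example.org host.

```python
from urllib.parse import quote


def entity_uri(host, dataset, entity_type, key):
    """Build an HTTP URI for one record; the URI layout is an assumption."""
    return "http://{}/{}/{}/{}#this".format(
        host,
        quote(dataset),
        quote(entity_type),
        quote(str(key), safe=""),  # percent-encode anything unsafe in the key
    )


uri = entity_uri("example.org", "Northwind", "Customer", "ALFKI")
print(uri)
```

A key that is only unique inside one table becomes an identifier that any other data source can refer to.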
Linked Data - What is It?
A method for exposing, sharing & connecting data on HTTP-based Data Networks.
- A term coined by Tim Berners-Lee that describes a RESTful mechanism for HTTP-based Data Access & Manipulation by Reference
- A record-level, HTTP-based Open Data Access & Connectivity mechanism
- A richer hyperlinking mechanism that takes us from Hypertext Links (Document to Document) to Hypertext Links (Data Item to Data Item)
Linked Data - Why Is It Important?
- It exposes the compound nature of Data Containers (e.g., Documents) such that
- Data Containers are uniquely identified & referenceable
- Data Items within Data Containers are uniquely identified & referenceable
- It provides a conceptual-model-oriented Open Data Access & Connectivity mechanism
- It delivers a powerful mechanism for meshing disparate and heterogeneous data sources
Linked Data Model
Changes the focus from linked documents to linked entities
The document as a data container becomes less relevant
Hyperdata Links Between Data Objects
Linked Data Benefits - Data Exploration
Natural Navigation Through Typed Links
- RDF entities (instance data, classes, and properties) are identified by dereferenceable HTTP URIs
- Navigating from one data item to another is easy via:
- A single link click from any HTTP user agent, which starts navigation of a data item's relationships
- Linked Data Browsers such as OpenLink Data Explorer
Relational/Logical Model
- Cumbersome
- Requires SQL joins + typically Object-Relational mapping
- e.g. in C#: track = lennonAlbum.Tracks["Imagine"]
Linked Data Benefits - Aggregatable Data
Often desirable to have an integrated view of all the data available about an item or topic
Database Realm
- Integration problematic, difficult to combine logical schemas
Semantic Web
- Data aggregation is easy: every resource has a unique URI
- Individual items can be linked
- Conceptual models can be linked
- Cross-domain links enrich domain knowledge
- Different facets of the same entity may be described by different URIs minted by different authors
- Can be linked, e.g. via owl:sameAs or rdf:type predicates
- May expose facts not directly represented in any one source
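A minimal sketch of aggregation via owl:sameAs, assuming two hypothetical sources (a.example and b.example) that describe the same artist under different URIs. The owl:sameAs URI is the standard one; everything else, including the aggregate helper, is illustrative. Only subjects are merged here; a fuller treatment would also canonicalize URIs in the object position.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical sources describing the same artist under different URIs.
source_a = [
    ("http://a.example/artist/lennon", "http://a.example/schema#born", "1940"),
]
source_b = [
    ("http://b.example/people/jl", "http://b.example/schema#memberOf",
     "http://b.example/bands/beatles"),
    ("http://b.example/people/jl", OWL_SAME_AS,
     "http://a.example/artist/lennon"),
]


def aggregate(triples):
    """Merge descriptions of subjects connected by owl:sameAs links."""
    canonical = {}

    def find(uri):
        # Follow alias links until reaching the representative URI.
        while canonical.get(uri, uri) != uri:
            uri = canonical[uri]
        return uri

    # First pass: record which URIs are aliases of one another.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                canonical[root_s] = root_o

    # Second pass: file every other statement under the representative URI.
    merged = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged.setdefault(find(s), set()).add((p, o))
    return merged


merged = aggregate(source_a + source_b)
for subject, facts in merged.items():
    print(subject, sorted(facts))
```

Neither source alone states both the birth year and the band membership; the combined description does.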
Linked Data - Data Aggregation
Linked Data Benefits - Self Describing Data
Resource Description Framework (RDF)
- A technology for creating self-describing Web resources
- A Data Item's type definition 'accompanies' it via rdf:type relations
- RDF-based data can be queried using SPARQL without knowing anything beforehand about the data definition (schema comes last in this realm)
- Provides the basis for powerful deductive data exploration tools
Logical / Relational Schema
- Users / applications need a detailed understanding of the schema to use and navigate the data
- Application's knowledge of the schema typically hardcoded
- Ad-hoc end-user data exploration potentially error prone
Linked Data Benefits - SPARQL
If a user agent has no built-in knowledge of a particular Data Item, it can dereference its Generic HTTP URI to obtain such information
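Dereferencing usually relies on HTTP content negotiation. The sketch below only prepares the request and does not send it; the example.org URI is hypothetical, linked_data_request is a name invented here, and whether a given server actually offers text/turtle is an assumption.

```python
from urllib.request import Request


def linked_data_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET asking for an RDF description."""
    # The fragment identifier is never sent to the server, so drop it.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": media_type})


req = linked_data_request("http://example.org/Northwind/Customer/ALFKI#this")
print(req.full_url, req.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(req)
```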
The Power of SPARQL
Discover what sorts of things a data source contains
- SELECT DISTINCT ?URI ?ObjectType WHERE { ?URI a ?ObjectType }
Determine all the properties of a data item's class
- SELECT * WHERE { <http://my.org/resourceTypes/Department> ?property ?hasValue }
Determine all the properties and values of a data item instance
- DESCRIBE <http://my.org/resource/Accounts>
No prior knowledge of the RDF data source is needed
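To show why no schema knowledge is needed, here is a toy stand-in for the first query's single triple pattern, written in plain Python. It is not a SPARQL engine: the match helper is invented for this sketch, and the headcount predicate and the Sales resource are assumed sample data alongside the http://my.org URIs used above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


triples = [
    ("http://my.org/resource/Accounts", RDF_TYPE,
     "http://my.org/resourceTypes/Department"),
    ("http://my.org/resource/Accounts", "http://my.org/schema#headcount", "12"),
    ("http://my.org/resource/Sales", RDF_TYPE,
     "http://my.org/resourceTypes/Department"),
]

# Equivalent in spirit to: SELECT DISTINCT ?URI ?ObjectType WHERE { ?URI a ?ObjectType }
types = sorted(
    {(b["?URI"], b["?ObjectType"])
     for b in match(("?URI", RDF_TYPE, "?ObjectType"), triples)}
)
for uri, object_type in types:
    print(uri, object_type)
```

The pattern works on any set of triples, whatever vocabulary they use, because the data carries its own type statements.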
Virtuoso - Linked Data Generation Options
Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity
Virtuoso RDF based Linked Data Views
- Expose relational model data as RDF graph model data
- Provide the means to move from a logical model to a conceptual model view
- Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL)
- No physical regeneration of relational data
- RDF Views = Virtuoso RDF Meta-Schema + Meta-Schema Language (MSL)
- MSL = a domain-specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model
Northwind Demo Database: RDF View Definition Extract
Demo.demo.Customers:
Northwind RDF View Definition
prefix northwind: <http://www.openlinksw.com/schemas/northwind#>
...
create iri class northwind:Customer
<http://^{URIQADefaultHost}^/Northwind/Customer/%U#this>
(in customer_id varchar not null)
...
alter quad storage virtrdf:DefaultQuadStorage
...
from Demo.demo.Customers as customers
from Demo.demo.Orders as orders ...{
create virtrdf:NorthwindDemo
as graph iri ("http://^{URIQADefaultHost}^/Northwind") {
...
northwind:Customer(customers.CustomerID) a foaf:Organization
as virtrdf:Customer-CustomerID ;
northwind:companyName customers.CompanyName as ... ;
...
northwind:fax customers.Fax as virtrdf:Customer-fax .
northwind:Customer(orders.CustomerID)
northwind:has_order northwind:Order(orders.OrderID)
as virtrdf:Order-has_order .
...
}}
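The effect of the view definition above can be approximated in plain Python: rows are turned into triples on demand rather than copied into a separate store. This is only an analogy, not how Virtuoso implements RDF Views. The Customer URI pattern follows the IRI class shown in the extract; the Order URI pattern is an assumption, because that IRI class is elided there, and the sample rows and function names are illustrative.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
NORTHWIND = "http://www.openlinksw.com/schemas/northwind#"


def customer_iri(host, customer_id):
    # Mirrors the northwind:Customer IRI class template in the extract.
    return "http://{}/Northwind/Customer/{}#this".format(
        host, quote(customer_id, safe="")
    )


def order_iri(host, order_id):
    # ASSUMPTION: the Order IRI class is elided in the extract; this
    # pattern is invented purely for the sake of the illustration.
    return "http://{}/Northwind/Order/{}#this".format(host, order_id)


def view_triples(host, customers, orders):
    """Generate triples from rows lazily; nothing is materialized."""
    for c in customers:
        subject = customer_iri(host, c["CustomerID"])
        yield (subject, RDF_TYPE, FOAF + "Organization")
        yield (subject, NORTHWIND + "companyName", c["CompanyName"])
    for o in orders:
        yield (
            customer_iri(host, o["CustomerID"]),
            NORTHWIND + "has_order",
            order_iri(host, o["OrderID"]),
        )


# Illustrative sample rows standing in for Demo.demo.Customers / Orders.
customers = [{"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste"}]
orders = [{"OrderID": 10643, "CustomerID": "ALFKI"}]

triples = list(view_triples("example.org", customers, orders))
for t in triples:
    print(t)
```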
Northwind Demo Database: Customer Table to RDF data item Mapping
LinqToRdf to MusicBrainz - Conceptual Model Veneer
ADO.NET Data Services & Entity Data Model
A framework for exposing 'pure data' service over HTTP
No support for RDF
- Gains none of RDF's inherent benefits
Lack of platform independence & standards compliance
- Supports REST-style interfaces
- Supports Atom, JSON and XML payloads
But
- Server-side: Windows only
- Consuming Astoria (the codename for ADO.NET Data Services) services at a higher level requires a Windows .NET client or a Silverlight-supported browser
ADO.NET Data Services & Entity Data Model
Server-side only conceptual model
- Powerful URL addressing to query/navigate/sort/filter etc
- Customers collection: http://myserver/data.svc/Customers
- Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')
- Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders
But
- Client must know conceptual schema
- e.g. to construct above URIs
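The point about schema knowledge can be seen in a small helper that builds the URIs listed above. The entity_url function is hypothetical and handles only string keys; note that the caller must already know the entity set name, the key value, and the navigation property name.

```python
from urllib.parse import quote


def entity_url(service_root, entity_set, key=None, navigation=None):
    """Build a data-service URL; the caller must supply the schema names."""
    url = service_root.rstrip("/") + "/" + entity_set
    if key is not None:
        # String keys are wrapped in single quotes, as in the examples above.
        url += "('" + quote(str(key), safe="") + "')"
    if navigation:
        url += "/" + navigation
    return url


root = "http://myserver/data.svc"
print(entity_url(root, "Customers"))
print(entity_url(root, "Customers", "ALFKI"))
print(entity_url(root, "Customers", "ALFKI", "Orders"))
```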
Lack of Dereferenceable Entity IDs
- Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET
c.f. SPARQL's ability to handle unknown data sources
ADO.NET Data Services & Entity Data Model
No Support for Non-SQL Data Sources
- Astoria is aimed exclusively at making relational data Web accessible
c.f. Linked Data
- Recognizes that vast amounts of data reside in unstructured and semi-structured data sources
- Support for embedding RDF into existing (X)HTML
- Emerging tools for converting non-RDF data to RDF model data
- Emerging tools for exposing Relational data as RDF Graph Model data
Astoria lacks scalability & scope of Semantic Web technologies