Open Conceptual Data Models
Conceptual Data Models in the Linked Data Web
Linked Data Vision:
- The transition of the Web from a Web of linked documents to a Web of interlinked structured data items (aka entities, data objects, resources)
Concurrent trend in the IT industry:
- A recognition of the benefits of conceptual data models vs logical data models
The Big Question:
- To what extent does Linked Data support conceptual-level data models?
Topics:
- Conceptual & Logical Data Models
- Conceptual Models for the Semantic Web
- Realizing Conceptual Models through Ontologies & Linked Data
- Virtuoso RDF Views
- ADO.NET Data Services & the Entity Data Model
Conceptual & Logical Data Models
Data models describe a software system's target problem space
Today's database-driven applications typically have three levels of data model:
Physical
- How data is physically represented on disk
Logical (aka logical schema)
- Expresses problem domain in terms of data management technology (tables / columns)
- e.g. relational schema
Conceptual (aka conceptual schema)
- Purely semantic description of problem space
- Describes things (entities), their characteristics (attributes) & associations between things (relationships)
Logical Data Model
- Most prominent of the three data model types
- Main focus of database applications
- Due to pervasiveness of SQL in application code
Weaknesses
- Impedance mismatch
- Loss of semantics during development process
- Heterogeneous databases & interoperability
Logical Data Model Weaknesses
Impedance Mismatch
- SQL expresses queries in terms of tables / views
- => targets logical schema
- Normalization fragments the data model
- Entities & their attributes may be split across several tables
- Navigation between objects requires relational joins over two or more tables
- Table rows must be reconstituted into higher level conceptual entities
Conceptual level data model is desirable to:
- Remove impedance mismatch
- Isolate application from changes to logical data model
- Provide framework for human level interaction
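A minimal sketch of the impedance mismatch, assuming simplified, hypothetical Customers and Orders tables in an in-memory SQLite database (Python standard library only). The application has to join two tables and then rebuild one conceptual-level Customer object from the flattened rows.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    """Conceptual-level entity: attributes plus a relationship (its orders)."""
    customer_id: str
    company_name: str
    orders: List[int] = field(default_factory=list)


def load_customer(conn: sqlite3.Connection, customer_id: str) -> Customer:
    # The normalized logical schema splits the entity across two tables,
    # so the application must join them ...
    rows = conn.execute(
        """
        SELECT c.CustomerID, c.CompanyName, o.OrderID
        FROM Customers AS c
        LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
        WHERE c.CustomerID = ?
        ORDER BY o.OrderID
        """,
        (customer_id,),
    ).fetchall()
    if not rows:
        raise KeyError(customer_id)
    # ... and then reconstitute one object from several flattened rows.
    # A LEFT JOIN yields one row per order, or one row with a NULL OrderID.
    customer = Customer(rows[0][0], rows[0][1])
    customer.orders = [r[2] for r in rows if r[2] is not None]
    return customer


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID TEXT REFERENCES Customers(CustomerID));
    INSERT INTO Customers VALUES ('ALFKI', 'Alfreds Futterkiste');
    INSERT INTO Orders VALUES (1, 'ALFKI'), (2, 'ALFKI');
    """
)
```

An object-relational mapper automates this join-and-rebuild step; a conceptual-level data model aims to remove the need for it.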
Logical Data Model Weaknesses
Loss of Semantics During Development
Process:
- Develop conceptual model (E-R modelling)
- Transform to logical model for implementation
- Derive physical model from logical model
Problems:
- Each move to a lower level model discards meaning
- Higher level model typically not retained
- Model semantics fragmented across schema / business rules / application code
- Application must know logical data model
- Must be hardcoded or inferred (imperfectly) from system tables
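A sketch of inferring structure from system tables, using SQLite's `PRAGMA foreign_key_list` on two hypothetical tables. The catalog reveals which column points at which key, but not what the relationship means: only a column name hints that `ReportsTo` is a management link, and nothing says how an employee relates to an order.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        ReportsTo  INTEGER REFERENCES Employees(EmployeeID));
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        EmployeeID INTEGER REFERENCES Employees(EmployeeID));
    """
)


def inferred_links(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str, str, str]]:
    """Return (table, column, referenced_table, referenced_column) tuples.

    PRAGMA foreign_key_list rows are
    (id, seq, table, from, to, on_update, on_delete, match).
    The table name is interpolated directly, so only pass trusted names.
    """
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(table, r[3], r[2], r[4]) for r in rows]

# inferred_links(conn, "Orders") reports Orders.EmployeeID -> Employees.EmployeeID,
# but not whether the employee took, approved or shipped the order.
```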
Logical Data Model Weaknesses
Heterogeneous Databases & Interoperability
Logical data model
- Describes problem domain in terms of tables/columns
- Requires SQL to navigate model
Application
- Exposed to specifics of a particular vendor's RDBMS
In heterogeneous database environment, must handle
- Different SQL dialects
- Different schemas
- No explicit data model
- No explicit semantics
- Interoperability/integration = perpetual problem for IT depts
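One small, well-known example of dialect divergence is row limiting. The templates below are an illustrative subset (the Oracle form assumes version 12c or later), not a portability layer; an application facing heterogeneous databases ends up carrying tables like this for every such construct.

```python
# Illustrative subset of "first N rows" syntax across SQL dialects.
TOP_N_TEMPLATES = {
    "sqlserver": "SELECT TOP {n} {cols} FROM {table}",
    "mysql": "SELECT {cols} FROM {table} LIMIT {n}",
    "postgresql": "SELECT {cols} FROM {table} LIMIT {n}",
    "oracle12c": "SELECT {cols} FROM {table} FETCH FIRST {n} ROWS ONLY",
}


def top_n_query(dialect: str, table: str, cols: list, n: int) -> str:
    """Render the same logical request in a given vendor's dialect."""
    template = TOP_N_TEMPLATES[dialect]  # KeyError for an unsupported dialect
    return template.format(n=int(n), cols=", ".join(cols), table=table)
```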
Conceptual Models for the Semantic Web
- Growing recognition in the industry of the benefits of a conceptual, rather than logical, model for data-centric applications
- e.g. Microsoft's Entity Data Model / Entity Framework
- Semantic Web technologies provide powerful tools for this paradigm shift
Benefits of Conceptual Models
How the Semantic Web benefits
- More faithfully represents human view of domain of interest
- Conceptual model & semantics
- Explicit & available globally
- Not implicit & fragmented across business logic / UI etc
- Better / explicit semantics promises better search engines
- Much easier heterogeneous data integration
- Data on the Web is inherently heterogeneous
Application Areas - Present & Future
Social networking, e-commerce, collaborative working
- Require shareable, standards-based, cross-platform conceptual views of data
Data portability
- Needed as Web users maintain multiple points of presence: blogs, social network accounts etc.
Open business models
- Require exchange & integration of large amounts of data
Scientific research: sharing of knowledge & findings
- Requires transparent access to distributed heterogeneous data
- Requires database integration using global schema
Autonomous intelligent agents
- Free humans from large-volume information processing
Semantic Web Technology Benefits
What Semantic Web technologies bring:
Ontologies
- Can represent common semantics
- Spanning databases, applications, enterprises, on-line communities
- Act as a shared conceptual model
- Provide common models (FOAF, SIOC etc.)
Common Semantics (Ontologies) & Common Data Representation (RDF)
- Enable cross data source querying using SPARQL
- Content from several sites can be combined / explored
- Querying using proprietary APIs unnecessary
- Brute force data merging unnecessary
Open Data Formats, Platform Independence, Common Models
- Allow data portability and data integration
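A minimal sketch of why a common data representation makes merging cheap, assuming two hypothetical sites that describe the same person with the same URI and the FOAF vocabulary. With triples held as plain tuples, merging is a set union and one query spans both sources. (A full RDF merge must also keep blank nodes from different sources apart; this sketch uses URIs only.)

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice#me"  # hypothetical URI

# Each source is a set of (subject, predicate, object) triples.
site_a = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", "http://example.net/people/bob#me"),
}
site_b = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "weblog", "http://blog.example.org/alice"),
}

# Shared URIs + a shared data model: no schema mapping, just set union.
# The duplicate foaf:name triple collapses into one.
merged = site_a | site_b


def objects(graph, subject, predicate):
    """Values of one property of one subject, across every merged source."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)
```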
Realizing Conceptual Models
Ontologies
- Provide the building blocks of Semantic Web conceptual models
- Define the concepts and their relationships in a domain of interest
Describing Classes & Properties: Ontology Languages
- RDFS
- Introduces the notions of concepts (classes) & instances
- OWL
- Adds more vocabulary for describing:
- relations between classes
- cardinality
- richer typing of properties, etc.
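To show what RDFS class semantics buy, a minimal forward-chaining sketch of two RDFS entailment rules: rdfs11 (rdfs:subClassOf is transitive) and rdfs9 (an instance of a subclass is an instance of its superclass). The ex: terms are hypothetical; OWL constructs such as cardinality are out of scope here.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Apply rules rdfs9 and rdfs11 repeatedly until no new triples appear."""
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBCLASS_OF}
        new = set()
        # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        # rdfs9: (x type A), (A subClassOf B) => (x type B)
        for x, p, a in graph:
            if p == RDF_TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= graph:  # fixpoint reached
            return graph
        graph |= new


ontology_and_data = [
    ("ex:SoloArtist", SUBCLASS_OF, "ex:MusicArtist"),
    ("ex:MusicArtist", SUBCLASS_OF, "ex:Agent"),
    ("ex:lennon", RDF_TYPE, "ex:SoloArtist"),
]
inferred = rdfs_subclass_closure(ontology_and_data)
```

The inferred graph states that ex:lennon is also an ex:MusicArtist and an ex:Agent, although no source asserted either fact directly.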
Goodness of Fit
- RDF was designed from the ground up as a metadata data model
- RDF / RDFS / OWL work directly at the level of conceptual models
- Conceptual model terminology matches RDF/OWL terminology
- Concepts, entities, attributes, relationships
A natural fit!
RDF lends itself naturally to describing conceptual models
Semantic Expressivity
DDL-based Relational Model
- Relationship between two entities isn't explicit
- Foreign key relating two rows in separate tables doesn't express the nature of the relationship
- Semantics must often be inferred from table definitions
RDF-based Conceptual Model
- Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple
- Semantic expressivity of RDF/RDFS/OWL is much better than that of DDL
- Has richer semantic content than equivalent DDL-based logical/relational model
RDF Conceptual Model - Artist / Records / Tracks
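As a rough illustration of such a model, a hypothetical Artist / Record / Track description written as triples. Every relationship is a named predicate (ex:made, ex:track), so the nature of each link can be read off the data itself, unlike a bare foreign key. The URIs and predicate names are illustrative assumptions, not the vocabulary of the original diagram.

```python
EX = "http://example.org/music/"  # hypothetical namespace

LENNON = EX + "artist/lennon"
IMAGINE = EX + "record/imagine"
TRACK_1 = EX + "track/imagine-1"

triples = [
    (LENNON, "rdf:type", EX + "Artist"),
    (LENNON, EX + "name", "John Lennon"),
    (LENNON, EX + "made", IMAGINE),          # Artist -> Record
    (IMAGINE, "rdf:type", EX + "Record"),
    (IMAGINE, EX + "title", "Imagine"),
    (IMAGINE, EX + "track", TRACK_1),        # Record -> Track
    (TRACK_1, "rdf:type", EX + "Track"),
    (TRACK_1, EX + "title", "Imagine"),
]


def relationships(graph, subject, obj):
    """Which named relationships link subject to obj? The answer is in the data."""
    return [p for s, p, o in graph if s == subject and o == obj]
```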
Global Granular Information Sharing
Traditional Logical/Relational Data Model
- Schema described by DDL is internal to the DBMS
- Primary keys identifying an individual table row (i.e. entity instance) are not globally unique and not easily usable outside the host DBMS
- Gives rise to 'data silos'
RDF's use of HTTP-based URLs
- Externalises the data and schema
- Makes both globally accessible & scalable
- Provides globally unique IDs for entities/relations/classes
- A vehicle for granular, global information sharing down to the equivalent of the record level
Linked Data - What is It?
A method for exposing, sharing & connecting data on the Web
- A term coined by Tim Berners-Lee that describes HTTP-based Data Access by Reference for the Web
- An Open Data Access & Connectivity mechanism for the Web
- A richer linking mechanism for the Web that takes us from Hypertext Links (document to document) to Hyperdata Links (across the things that documents are about)
Linked Data - Why Is It Important
- It exposes the compound nature of Web Resources
- Information resources (Containers) are uniquely identified & referenceable
- Entities within Containers are uniquely identified & referenceable
- It provides an Open Data Access & Connectivity mechanism for the Web
- It delivers a powerful mechanism for meshing disparate and heterogeneous data sources
Linked Data Model
Changes the focus from linked documents to linked entities
The document as a data container becomes less relevant
Hyperdata Links Between Data Objects
Linked Data Benefits - Natural Navigation
Natural Navigation Through Typed Links
- RDF entities are identified by dereferenceable URIs (URLs)
- Navigating from one data item to another is easy
- One click to dereference in a Semantic Web browser
- e.g. OpenLink Data Explorer
- URI of the object in an RDF statement is a typed link
- Link's "type" is defined by the statement's predicate
Relational/Logical Model
- Cumbersome
- Requires SQL joins + typically Object-Relational mapping
- e.g. in C#: track = lennonAlbum.Tracks["Imagine"]
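A minimal follow-your-nose sketch, assuming a hypothetical in-memory dictionary stands in for the Web (a real agent would issue an HTTP GET for each URI). Each predicate acts as a typed link, so navigating from Artist to Record to Track title is a walk over links rather than a chain of joins.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for the Web: "dereferencing" a URI returns the
# triples that describe it. URIs and predicates are illustrative only.
WEB: Dict[str, List[Triple]] = {
    "http://example.org/artist/lennon": [
        ("http://example.org/artist/lennon", "ex:made", "http://example.org/record/imagine"),
    ],
    "http://example.org/record/imagine": [
        ("http://example.org/record/imagine", "ex:track", "http://example.org/track/1"),
        ("http://example.org/record/imagine", "ex:track", "http://example.org/track/2"),
    ],
    "http://example.org/track/1": [
        ("http://example.org/track/1", "ex:title", "Imagine"),
    ],
}


def dereference(uri: str) -> List[Triple]:
    """Fetch a resource's description; unknown URIs yield no triples."""
    return WEB.get(uri, [])


def follow(start: str, path: List[str]) -> List[str]:
    """Follow a sequence of typed links (predicates) outward from start."""
    frontier = [start]
    for predicate in path:
        frontier = [o for uri in frontier for s, p, o in dereference(uri)
                    if s == uri and p == predicate]
    return frontier
```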
Linked Data Benefits - Aggregatable Data
Often desirable to have an integrated view of all the data available about an item or topic
Database Realm
- Integration problematic, difficult to combine logical schemas
Semantic Web
- Data aggregation is easy: every resource has a unique URI
- Individual items can be linked
- Conceptual models can be linked
- Cross-domain links enrich domain knowledge
- Different facets of the same entity may be described by different URIs minted by different authors
- These can be linked, e.g. via owl:sameAs or rdf:type predicates
- May expose facts not directly represented in any one source
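A minimal sketch of owl:sameAs-based aggregation ("smushing"), assuming two hypothetical publishers mint different URIs for the same artist. A union-find groups co-referent URIs, each group is rewritten to one canonical URI (here simply the lexicographically smallest), and the combined description holds facts that neither source holds on its own.

```python
OWL_SAME_AS = "owl:sameAs"


def smush(triples):
    """Merge URIs linked by owl:sameAs into one canonical URI per group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # owl:sameAs is symmetric and transitive (an equivalence relation),
    # which is exactly what union-find maintains.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            ra, rb = find(s), find(o)
            if ra != rb:
                keep, drop = sorted((ra, rb))
                parent[drop] = keep

    # Rewrite every other triple onto canonical URIs; the sameAs links are dropped.
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


ENC = "http://encyclopedia.example.org/Lennon"   # hypothetical
MUS = "http://music.example.org/artist/1"        # hypothetical
sources = [
    (ENC, "ex:name", "John Lennon"),
    (MUS, "ex:made", "http://music.example.org/record/imagine"),
    (MUS, OWL_SAME_AS, ENC),
]
merged_view = smush(sources)
```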
Linked Data - Data Aggregation
Linked Data Benefits - Self Describing Data
RDF
- A technology for creating self-describing Web resources
- An entity's type definition 'accompanies' it via rdf:type
- An RDF dataset can be queried using SPARQL without knowing anything beforehand about the data
- Provides the basis for powerful data exploration tools
Logical / Relational Schema
- Users / applications need a detailed understanding of the schema to use and navigate the data
- Application's knowledge of the schema typically hardcoded
- Ad-hoc end-user data exploration potentially error prone
Linked Data Benefits - SPARQL
If a user agent has no built-in knowledge of a particular RDF subject, predicate or object, it can use the URI to retrieve the information
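A sketch of that dereferencing step with Python's standard urllib, assuming the publisher supports HTTP content negotiation (a common Linked Data practice, though not universal). The Accept header asks for an RDF serialization instead of an HTML page; urlopen follows redirects such as 303 See Other by default.

```python
import urllib.request

# Media types for two common RDF serializations, Turtle preferred.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# Usage sketch (network I/O, so not executed here); the URI is hypothetical:
# with urllib.request.urlopen(rdf_request("http://example.org/resource/x")) as resp:
#     print(resp.headers.get("Content-Type"))
#     description = resp.read()
```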
The Power of SPARQL
Discover what sorts of things a data source contains
- select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }
Determine all the properties of an entity class
- select * where { <http://my.org/resourceTypes/Department> ?property ?hasValue }
Determine all the properties and values of an entity instance
- DESCRIBE <http://my.org/resource/Accounts>
No prior knowledge of the RDF data source is needed
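The same three exploratory questions can be emulated over an in-memory list of triples: the functions need no schema, only the triple structure. This is a plain-Python sketch, not a SPARQL engine; the sample data reuses the slide's my.org URIs with made-up properties, and the SPARQL specification leaves the exact output of DESCRIBE to the implementation.

```python
from collections import defaultdict

DEPT = "http://my.org/resourceTypes/Department"
ACCOUNTS = "http://my.org/resource/Accounts"

# Hypothetical sample data; the functions below know nothing about it in advance.
graph = [
    (ACCOUNTS, "rdf:type", DEPT),
    (ACCOUNTS, "rdfs:label", "Accounts"),
    (ACCOUNTS, "http://my.org/props/floor", "3"),
    (DEPT, "rdfs:label", "Department"),
]


def distinct_types(g):
    """~ select distinct ?ObjectType where { ?URI a ?ObjectType }"""
    return sorted({o for s, p, o in g if p == "rdf:type"})


def properties_of(g, uri):
    """~ select * where { <uri> ?property ?hasValue }"""
    return [(p, o) for s, p, o in g if s == uri]


def describe(g, uri):
    """~ DESCRIBE <uri>, simplified to the subject's own property values."""
    props = defaultdict(list)
    for p, o in properties_of(g, uri):
        props[p].append(o)
    return dict(props)
```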
Virtuoso - Linked Data Generation Options
Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity
Virtuoso RDF Views
- Expose relational data as RDF
- Provide the means to move from a logical model view to a conceptual model view
- Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL)
- No physical regeneration of relational data
- RDF Views = Virtuoso RDF Meta-Schema + Meta-Schema Language (MSL)
- MSL = a domain-specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model
Northwind Demo Database: RDF View Definition Extract
Demo.demo.Customers:
Northwind RDF View Definition
prefix northwind: <http://www.openlinksw.com/schemas/northwind#>
...
create iri class northwind:Customer
<http://^{URIQADefaultHost}^/Northwind/Customer/%U#this>
(in customer_id varchar not null)
...
alter quad storage virtrdf:DefaultQuadStorage
...
from Demo.demo.Customers as customers
from Demo.demo.Orders as orders ...{
create virtrdf:NorthwindDemo
as graph iri ("http://^{URIQADefaultHost}^/Northwind") {
...
northwind:Customer(customers.CustomerID) a foaf:Organization
as virtrdf:Customer-CustomerID ;
northwind:companyName customers.CompanyName as ... ;
...
northwind:fax customers.Fax as virtrdf:Customer-fax .
northwind:Customer(orders.CustomerID)
northwind:has_order northwind:Order(orders.OrderID)
as virtrdf:Order-has_order .
...
}}
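To make the mapping concrete, a Python sketch of what the view above declares for one Customers row. Assumptions: the hypothetical host example.com replaces ^{URIQADefaultHost}^, Virtuoso's %U is treated as "percent-encode the value", and only the rdf:type, companyName and fax mappings visible in the extract are covered. Unlike this sketch, an RDF View does not materialize triples; they are produced from the relational tables at query time.

```python
from urllib.parse import quote

HOST = "example.com"  # hypothetical stand-in for ^{URIQADefaultHost}^
NORTHWIND = "http://www.openlinksw.com/schemas/northwind#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"


def customer_iri(customer_id: str, host: str = HOST) -> str:
    """Mirror the IRI class template .../Northwind/Customer/%U#this."""
    # Assumption: %U percent-encodes the key so any CustomerID yields a valid IRI.
    return f"http://{host}/Northwind/Customer/{quote(customer_id, safe='')}#this"


def customer_triples(row: dict):
    """Yield the triples the quad map declares for one Customers row."""
    subject = customer_iri(row["CustomerID"])
    yield (subject, RDF_TYPE, FOAF_ORGANIZATION)
    yield (subject, NORTHWIND + "companyName", row["CompanyName"])
    if row.get("Fax") is not None:  # this sketch emits nothing for a NULL column
        yield (subject, NORTHWIND + "fax", row["Fax"])
```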
Northwind Demo Database: Customer Table to RDF Entity Mapping
LinqToRdf to MusicBrainz - Conceptual Model Veneer
ADO.NET Data Services & Entity Data Model
A framework (codename 'Astoria') for exposing 'pure data' services over HTTP
No support for RDF
- Forgoes all of RDF's inherent benefits
Lack of platform independence & standards compliance
- Supports REST-style interfaces
- Supports Atom, JSON and XML payloads
But
- Server-side: Windows only
- Consuming Astoria services at a higher level requires Windows .NET client or Silverlight-supported browser
ADO.NET Data Services & Entity Data Model
Server-side only conceptual model
- Powerful URL addressing to query/navigate/sort/filter etc.
- Customers collection: http://myserver/data.svc/Customers
- Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')
- Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders
But
- Client must know conceptual schema
- e.g. to construct above URIs
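A sketch of the client-side URL construction in the style shown above; the helper name is hypothetical. It only works because the caller already knows the entity set name (Customers), the key's type and quoting rules, and the navigation property name (Orders), i.e. the server's conceptual schema. It assumes string keys are single-quoted with embedded quotes doubled; query options such as $filter or $orderby are not covered.

```python
from typing import Optional, Union
from urllib.parse import quote


def data_service_url(base: str, entity_set: str,
                     key: Optional[Union[str, int]] = None,
                     navigation: Optional[str] = None) -> str:
    """Compose an entity-set / entity / navigation URL (hypothetical helper)."""
    url = f"{base.rstrip('/')}/{entity_set}"
    if key is not None:
        if isinstance(key, str):
            # Assumed literal convention: single quotes, embedded quotes doubled.
            literal = "'" + key.replace("'", "''") + "'"
        else:
            literal = str(key)
        url += "(" + quote(literal, safe="'") + ")"
    if navigation:
        url += "/" + navigation
    return url
```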
Lack of Dereferenceable Entity IDs
- Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET
cf. SPARQL's ability to handle unknown data sources
ADO.NET Data Services & Entity Data Model
No Support for Non-SQL Data Sources
- Astoria is aimed exclusively at making relational data Web accessible
cf. Semantic Web & Linked Data
- Recognize that vast amounts of data reside in unstructured and semi-structured data sources
- Support for embedding RDF into existing (X)HTML
- Emerging tools for converting non-RDF data to RDF
- Emerging tools for exposing SQL data as RDF
Astoria lacks scalability & scope of Semantic Web technologies