    VOS.VirtCartridgesReification (Last) -- DAVWikiAdmin, 2019-12-05 15:01:41

    Reification in the Virtuoso Sponger

    Note: Some of the underlying implementation of reification is in flux, so details described here may change.

    What is Reification?

    Reification is a level of abstraction in which individual triples are modeled as resources in their own right, so that the triples themselves can be described and annotated.

    A typical use is provenance: when a given resource is sponged, many components of the Virtuoso Sponger can contribute triples, so it is useful to be able to trace which cartridge produced each one.
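
    As a minimal sketch of the idea, the following Python turns one triple into a statement resource using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and attaches a provenance annotation to it. The statement IRI, the extractedBy predicate, and the reify helper are hypothetical names chosen for illustration; this is not the Sponger's own implementation.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(triple, statement_iri, annotations=()):
    """Return the triples that describe `triple` as an rdf:Statement resource."""
    subject, predicate, obj = triple
    reified = [
        (statement_iri, RDF + "type", RDF + "Statement"),
        (statement_iri, RDF + "subject", subject),
        (statement_iri, RDF + "predicate", predicate),
        (statement_iri, RDF + "object", obj),
    ]
    # Annotations are (predicate, object) pairs about the statement itself,
    # for example which extractor produced the original triple.
    reified.extend((statement_iri, ap, ao) for ap, ao in annotations)
    return reified


triple = (
    "http://example.org/person/Mark_Twain",
    "http://example.org/relation/author",
    "http://example.org/books/Huckleberry_Finn",
)

# Hypothetical statement IRI and provenance predicate (assumptions, not Sponger terms).
statements = reify(
    triple,
    "http://example.org/doc#stmt1",
    annotations=[("http://example.org/vocab#extractedBy", "HTML+Variants extractor")],
)

for statement in statements:
    print(statement)
```

    The original triple is left untouched. The reified form adds four triples that point at its subject, predicate, and object, plus one triple per annotation.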

    Data Islands

    In addition to the datasource-specific cartridges, the HTML+Variants extractor cartridge identifies several ways of embedding RDF data in HTML, which we term data islands.

    • HTML5 Microdata (itemscope, itemtype, itemprop attributes)
    • RDFa (about, property, resource attributes)
    • JSON-LD using <script type="application/ld+json"> ... </script>
    • Turtle and N3 using <script type="text/turtle"> ... </script>
    • GRDDL (hRecipe, hCard, hCalendar, hProduct, xFolk, eRDF, etc.)

    Additionally, if installed, the Turtle Meta-cartridge identifies Turtle in any "content" triple, e.g. titles, descriptions, social media post bodies, etc.
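
    To make the markers above concrete, here is a rough Python sketch that scans an HTML document for the attributes and script types listed and reports which kinds of data island appear to be present. It uses only the standard library's html.parser, and the IslandFinder class and find_islands function are hypothetical names. It only illustrates the detection idea: it ignores GRDDL and makes no claim about how the cartridge itself detects islands.

```python
from html.parser import HTMLParser

# Script MIME types and attribute names taken from the list above.
SCRIPT_TYPES = {
    "application/ld+json": "JSON-LD",
    "text/turtle": "Turtle",
}
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"about", "property", "resource"}


class IslandFinder(HTMLParser):
    """Collects the kinds of data island hinted at by tags and attributes."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attribute names arrive lowercased
        names = set(attr_map)
        if tag == "script":
            script_type = (attr_map.get("type") or "").strip().lower()
            kind = SCRIPT_TYPES.get(script_type)
            if kind:
                self.found.add(kind)
        if names & MICRODATA_ATTRS:
            self.found.add("HTML5 Microdata")
        if names & RDFA_ATTRS:
            self.found.add("RDFa")


def find_islands(html):
    """Return a sorted list of the data-island kinds detected in `html`."""
    finder = IslandFinder()
    finder.feed(html)
    finder.close()
    return sorted(finder.found)


sample = """
<html>
  <head>
    <title property="dc:title" content="Turtle test">Turtle-in-script test</title>
    <script type="text/turtle">
      <http://example.org/a> <http://example.org/b> "c" .
    </script>
  </head>
  <body><div itemscope itemtype="http://schema.org/Person"></div></body>
</html>
"""

print(find_islands(sample))  # ['HTML5 Microdata', 'RDFa', 'Turtle']
```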

    Configuration

    The HTML+Variants extractor cartridge takes a handful of options that control which data islands contribute triples and which of them are reified:

    • rdfa=yes - controls whether the RDFa extractor runs
    • reify_rdfa=1 - determines whether extracted RDFa is reified
    • reify_html5md=1 - determines whether extracted HTML5 Microdata is reified
    • reify_jsonld=1 - determines whether extracted JSON-LD is reified
    • reify_all_grddl=0 - determines whether all other GRDDL data is reified
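
    The sketch below shows one way to reason about how these options combine. It is written in Python purely for illustration and rests on several assumptions: that the values shown in the list above are the defaults, that options are supplied as comma- or space-separated key=value pairs, that "1" and "yes" both mean enabled, and that RDFa can only be reified when the RDFa extractor runs. The parse_options and reified_islands helpers are hypothetical and are not part of Virtuoso.

```python
# Option names and values as listed above, treated here as defaults (an assumption).
DEFAULTS = {
    "rdfa": "yes",
    "reify_rdfa": "1",
    "reify_html5md": "1",
    "reify_jsonld": "1",
    "reify_all_grddl": "0",
}


def parse_options(text):
    """Parse 'key=value' pairs separated by commas or whitespace over the defaults."""
    options = dict(DEFAULTS)
    for item in text.replace(",", " ").split():
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key not in DEFAULTS:
            raise ValueError(f"unknown option {key!r}")
        options[key] = value
    return options


def is_on(value):
    """Treat '1' and 'yes' as enabled; anything else as disabled."""
    return value.strip().lower() in {"1", "yes"}


def reified_islands(options):
    """List the data-island kinds that would be reified under `options`."""
    reified = []
    # Assumption: RDFa can only be reified if the RDFa extractor runs at all.
    if is_on(options["rdfa"]) and is_on(options["reify_rdfa"]):
        reified.append("RDFa")
    if is_on(options["reify_html5md"]):
        reified.append("HTML5 Microdata")
    if is_on(options["reify_jsonld"]):
        reified.append("JSON-LD")
    if is_on(options["reify_all_grddl"]):
        reified.append("other GRDDL")
    return reified


print(reified_islands(parse_options("rdfa=yes, reify_jsonld=0")))
```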

    Sample Input

    Let us assume a very simple input HTML document, as follows:


    <html 
      xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
      <head>
        <title property="dc:title" content="Turtle test">Turtle-in-script test</title>
        <script type="text/turtle">
        <![CDATA[
        <http://example.org/person/Mark_Twain>
          <http://example.org/relation/author>
            <http://example.org/books/Huckleberry_Finn> ;
          <http://xmlns.com/foaf/0.1/name> "Mark Twain" .
        ]]>
        </script>
      </head>
      <body>
        <h1>Testing Turtle in scripts</h1>
        Stuff
        <hr />
      </body>
    </html>
    

    As we can see, this contains one RDFa statement in the <title> element and two Turtle triples in a <script> element.

    Sample Output

    When sponging this document with the HTML+Variants extractor cartridge enabled and left at its default settings, we see:

    type          Document
    sameAs        #this
    container of  Embedded RDFa Statement 1
                  Embedded TTL-script Statement 1
                  Embedded TTL-script Statement 2
    Title         Turtle-in-script test

    Expanding the Embedded RDFa Statement 1, we see:

    type          Statement
    label         Embedded RDFa Statement 1
    described by  Turtle test
                  <>
    subject       Turtle test
    predicate     Title
    object        Turtle test
    Sponge Time   2014-06-11 14:42:40.200348 (xsd:date)