Sponger Cartridge RDF Extractor

Used to extract RDF from a Web data source. It consumes services from Virtuoso PL, C/C++, or Java based RDF extractors.

The RDF mappers provide a way to extract metadata from non-RDF documents such as HTML pages, images, Office documents, etc., and pass it to the SPARQL sponger (a crawler which retrieves missing source graphs). For brevity, the rest of this article refers to an "RDF mapper" simply as a "mapper".

RDF Mappers Concept

A mapper consists of a PL procedure (the hook) and an extractor. The extractor itself can be built using PL, C, or any external language supported by the Virtuoso server. See the Sponger Cartridge RDF Extractor PL Requirements for more information.

Once the mapper is developed, it must be plugged into the SPARQL engine by adding a record to the table DB.DBA.SYS_RDF_MAPPERS.

If a SPARQL query instructs the SPARQL processor to retrieve a target graph into local storage, the SPARQL sponger is invoked. If the target graph IRI represents a dereferenceable URL, the content is retrieved using content negotiation. The next step is to detect the content type:

  • If the content is RDF and no further transformation such as GRDDL is needed, the process stops.
  • If the content is of another type, such as 'text/plain', that is not known to carry metadata, the SPARQL sponger looks in the DB.DBA.SYS_RDF_MAPPERS table in order of RM_ID and calls the mapper hook for every matching URL or MIME type pattern (which of the two is matched depends on the column RM_TYPE).
    • If the hook returns zero, the next mapper is tried;
    • If the result is negative, the process stops and tells the SPARQL processor that nothing was retrieved;
    • If the result is positive, the process stops and tells the SPARQL processor that metadata was retrieved.
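
The dispatch rules above can be illustrated with a small sketch. It is written in Python rather than Virtuoso PL and is only a model of the described behaviour, not the sponger's actual implementation. The Mapper class, its pattern and hook fields, the "URL" and "MIME" values for RM_TYPE, the use of regular expressions for matching, and the choice to report "nothing retrieved" when no mapper matches are all assumptions made for the example; only RM_ID, RM_TYPE, and the meaning of the hook's return value come from the text.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Mapper:
    """Hypothetical in-memory stand-in for a row of DB.DBA.SYS_RDF_MAPPERS."""
    rm_id: int                        # RM_ID: determines the order of evaluation
    rm_type: str                      # RM_TYPE: "URL" or "MIME" (assumed values)
    pattern: str                      # assumed: a regular expression to match
    hook: Callable[[str, str], int]   # assumed signature: (url, mime_type) -> int


def run_mappers(mappers: List[Mapper], url: str, mime_type: str) -> bool:
    """Return True if some hook reported that metadata was retrieved."""
    for mapper in sorted(mappers, key=lambda m: m.rm_id):
        # RM_TYPE decides whether the pattern applies to the URL or the MIME type.
        subject = url if mapper.rm_type == "URL" else mime_type
        if re.search(mapper.pattern, subject) is None:
            continue                  # pattern does not match: hook is not called
        result = mapper.hook(url, mime_type)
        if result == 0:
            continue                  # zero: try the next mapper
        return result > 0             # positive: retrieved; negative: stop, nothing
    return False                      # assumption: no mapper handled the content


# Illustrative hooks; real hooks are Virtuoso PL procedures.
def declining_hook(url: str, mime_type: str) -> int:
    return 0


def successful_hook(url: str, mime_type: str) -> int:
    return 1


def failing_hook(url: str, mime_type: str) -> int:
    return -1


registry = [
    Mapper(2, "MIME", r"^text/plain$", successful_hook),
    Mapper(1, "URL", r"example\.org", declining_hook),
]

# The URL mapper (RM_ID 1) runs first and returns zero, so the MIME mapper
# (RM_ID 2) is tried next and reports success.
retrieved = run_mappers(registry, "http://example.org/notes.txt", "text/plain")
```

In this model a mapper with a lower RM_ID that returns a negative value prevents any later mapper from running, which is why the order of records in the table matters.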

Virtuoso and the Virtuoso Website are Copyright (C) OpenLink Software 2006-