
Setting up a Content Crawler Job to Retrieve Content from a SPARQL Endpoint

The following step-by-step guide walks you through the process of:

  • Populating a Virtuoso Quad Store with data from a 3rd party SPARQL endpoint;
  • Generating RDF dumps that are accessible to basic HTTP or WebDAV user agents.
  1. Sample SPARQL query producing a list of SPARQL endpoints; a hedged way of running it over the SPARQL protocol is sketched right after the query:

    PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl:     <http://www.w3.org/2002/07/owl#>
    PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
    PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX scovo:   <http://purl.org/NET/scovo#>
    PREFIX void:    <http://rdfs.org/ns/void#>
    PREFIX akt:     <http://www.aktors.org/ontology/portal#>
    SELECT DISTINCT ?endpoint
    WHERE
      {
        ?ds a void:Dataset .
        ?ds void:sparqlEndpoint ?endpoint
      }
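
     This query can be run against any endpoint that speaks the SPARQL protocol and hosts voiD descriptions. A minimal cURL sketch, assuming a local Virtuoso instance at http://localhost:8890/sparql and the Virtuoso-specific format parameter (both are assumptions; substitute the endpoint and result format you actually want):

    $ curl --get "http://localhost:8890/sparql" \
        --data-urlencode "query=PREFIX void: <http://rdfs.org/ns/void#>
            SELECT DISTINCT ?endpoint
            WHERE { ?ds a void:Dataset . ?ds void:sparqlEndpoint ?endpoint }" \
        --data-urlencode "format=application/sparql-results+json"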

  2. Here is a sample SPARQL protocol URL constructed from one of the SPARQL endpoints in the result of the query above:

  3. Here is the cURL output showing a Virtuoso SPARQL URL that executes against a 3rd party SPARQL Endpoint URL (a federated-query alternative is sketched after the output):

    $ curl " omepage+%3Furl+%7D%0D%0A&format=sparql" <?xml version="1.0"?> <sparql xmlns=""> <head> <variable name="url"/> </head> <results ordered="false" distinct="true"> <result> <binding name="url"><uri></uri></binding> </result> <result> <binding name="url"><uri></uri></binding> </result> <result> <binding name="url"><uri></uri></binding> </result> <result> <binding name="url"><uri></uri></binding> </result> ... ... ... </results> </sparql>

  4. Go to the Conductor UI, for example http://localhost:8890/conductor :

  5. Enter the dba credentials.
  6. Go to "Web Application Server" -> "Content Management" -> "Content Imports"

  7. Click "New Target"

  8. In the presented form, enter for example:
    • "Crawl Job Name": voiD store
    • "Data Source Address (URL)": the url from above i.e.:

    • "Local WebDAV Identifier":


    • "Follow links matching (delimited with ;)":


    • Un-hatch "Use robots.txt" ;
    • "XPath expression for links extraction":


    • Hatch "Semantic Web Crawling";
    • "If Graph IRI is unassigned use this Data Source URL:": enter for ex:


    • Hatch "Follow URLs outside of the target host";
    • Hatch "Run "Sponger" and "Accept RDF"

  9. Click "Create".
  10. The target should be created and presented in the list of available targets:

  11. Click "Import Queues":

  12. Click "Run" for the imported target:

  13. To check the retrieved content, go to "Web Application Server" -> "Content Management" -> "Content Imports" -> "Retrieved Sites":

  14. Click "voiD store" -> "Edit":

  15. To check the imported URLs, go to "Web Application Server" -> "Content Management" -> "Repository" and browse the DAV/ path; an equivalent WebDAV-level check is sketched below.
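
     The same collection can also be listed over plain WebDAV. A hedged cURL sketch, assuming the local instance at http://localhost:8890, dba/dba credentials, and a hypothetical collection path /DAV/home/dba/void_store/ (use the "Local WebDAV Identifier" you actually entered in step 8):

    $ curl --user dba:dba -X PROPFIND -H "Depth: 1" \
        "http://localhost:8890/DAV/home/dba/void_store/"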

  16. To check the data inserted into the RDF Quad Store, go to http://cname/sparql and execute the following query (a quicker count variant is sketched after it):

    SELECT * FROM <http://void.collection> WHERE { ?s ?p ?o }
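
     If the graph holds a large number of triples, counting them is a quicker sanity check than listing everything; the query below assumes the same graph IRI as above:

    SELECT (COUNT(*) AS ?triples)
    FROM <http://void.collection>
    WHERE { ?s ?p ?o }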

