Virtuoso Frequently Asked Questions

SQL Database Engine Functionality

Is Virtuoso a relational DBMS?

Virtuoso implements the functionality of a traditional relational database engine. This includes SQL (Structured Query Language) for relational data, a query processor, view support, standard data type support (character, numeric and date types), stored procedures, concurrency control, transactions, distributed transactions and scrollable cursors, to name a few.

What client APIs does Virtuoso support?

ODBC, JDBC, .NET data provider and OLE/DB. The Java and .NET clients are pure Java and pure .NET respectively. ODBC is Virtuoso's native low-level CLI.

What level of SQL is supported?

Virtuoso is fully SQL-92 compliant and supports several of the SQL 200n features, including:

User Defined Data Types with methods, inheritance etc.
User defined aggregates, including 20+ built-in statistical operations.
SQLX for generating XML results in queries, including a native XML data type.
CUBE and ROLLUP in GROUP BY.

What is the SQL security model supported by Virtuoso?

Virtuoso supports standard SQL role-based security and full table-level and column-level granting of privileges. Additionally, Virtuoso offers row-level security through a system of policy functions. Event hooks can be defined for performing extra validation at login time, such as consulting an LDAP directory for the user's credentials.
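
As a rough illustration of the policy-function idea, the Python sketch below registers a per-table function that returns an extra filter for the current user; the filter is applied to every row before it is returned. The names POLICIES, orders_policy and select_rows are hypothetical and exist only for this example. Virtuoso's own policy functions are defined in SQL and are not shown here.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A policy takes the user name and returns a predicate over rows.
Policy = Callable[[str], Callable[[Row], bool]]


def orders_policy(user: str) -> Callable[[Row], bool]:
    """Hypothetical policy: 'dba' sees every row, other users only their own."""
    if user == "dba":
        return lambda row: True
    return lambda row: row["owner"] == user


# Table name -> policy function (illustrative registry, not a Virtuoso structure).
POLICIES: Dict[str, Policy] = {"orders": orders_policy}


def select_rows(table: str, rows: List[Row], user: str) -> List[Row]:
    """Return the rows of `table` that are visible to `user`."""
    policy = POLICIES.get(table)
    if policy is None:
        return list(rows)  # no policy registered: no row-level restriction
    visible = policy(user)
    return [row for row in rows if visible(row)]


orders = [
    {"id": 1, "owner": "alice"},
    {"id": 2, "owner": "bob"},
]
alice_view = select_rows("orders", orders, "alice")
```

The point of the design is that the application query stays the same for every user; only the policy decides which rows come back.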

What is Virtuoso's transaction support?

Virtuoso's Transaction Manager component ensures that transactions are Atomic, Consistent, Isolated and Durable (ACID). The Transaction Manager ensures VDB Engines are capable of supporting Online Transaction Processing (OLTP) and Distributed Transaction oriented applications and services. Virtuoso provides the four standard isolation levels: dirty read, read committed, repeatable read and serializable. Repeatable read is the default setting. See Virtual Database Questions for a discussion of distributed transactions.

What is Virtuoso's distributed transaction support?

Virtuoso preserves transaction atomicity, consistency, integrity, and durability across its own database servers as well as heterogeneous servers by supporting transaction commits and rollbacks through a 2-phase commit protocol.
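
The general shape of a 2-phase commit can be sketched in a few lines of Python. This is a minimal, in-memory illustration of the protocol idea only; the Participant class and two_phase_commit function are hypothetical names, and nothing here reflects how Virtuoso implements its coordinator, logging or recovery.

```python
from typing import List


class Participant:
    """A resource manager that can vote on and then finish a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: vote. A real server would first force its changes to a log.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit on all participants only if every one of them votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:  # phase 2: roll back everywhere
        p.rollback()
    return False


local, remote = Participant("local"), Participant("remote")
committed = two_phase_commit([local, remote])
```

If any participant votes no in the first phase, every participant is rolled back, which is what keeps the outcome atomic across servers.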

What SQL optimization does Virtuoso perform?

Virtuoso has a cost-based SQL optimizer. It uses table row count, data size and value distribution statistics for evaluating the cost of diverse execution plans. For each plan, loop invariants are extracted, loop and hash join types are evaluated, different indices are compared for access performance and predicates are evaluated as early as possible, most restrictive first, for any join order being contemplated. Additionally, the programmer can explicitly specify the join order and join type (loop/hash) for each table.
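
To make the idea of comparing join orders by estimated cost concrete, here is a toy Python sketch. It assumes a deliberately simple cost model (the sum of intermediate result sizes for a chain of loop joins) and made-up row counts and selectivities; the function names and the model are illustrative assumptions, not Virtuoso's actual optimizer.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def plan_cost(order: Sequence[str], rows: Dict[str, int],
              selectivity: Dict[str, float]) -> float:
    """Estimate cost as the sum of intermediate result sizes for a join order."""
    cost = 0.0
    intermediate = 1.0
    for table in order:
        # Every row produced so far is combined with the next table;
        # only the selected fraction of the combinations survives.
        intermediate *= rows[table] * selectivity[table]
        cost += intermediate
    return cost


def best_join_order(rows: Dict[str, int],
                    selectivity: Dict[str, float]) -> Tuple[Tuple[str, ...], float]:
    """Try every join order and keep the cheapest one under the toy model."""
    candidates = ((order, plan_cost(order, rows, selectivity))
                  for order in permutations(rows))
    return min(candidates, key=lambda pair: pair[1])


# Made-up statistics for three hypothetical tables.
row_counts = {"customers": 1000, "orders": 100000, "regions": 10}
selectivities = {"customers": 0.01, "orders": 0.001, "regions": 0.1}
order, cost = best_join_order(row_counts, selectivities)
```

Under this model the cheapest plan starts with the table that contributes the fewest surviving rows, which mirrors the "most restrictive first" rule described above.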

How does one administer a Virtuoso database?

A Virtuoso database can be administered through the Virtuoso Admin Interface or through the interactive SQL tool, ISQL.

How does Virtuoso handle backups?

Backups can be handled in a variety of ways. Virtuoso has both off-line and on-line incremental backups.

Virtuoso's transaction mechanism is based on keeping a read-only checkpoint plus a transaction log for any committed but non-checkpointed transactions. Virtuoso has an incremental, on-line backup function, which can back up a running database in its checkpoint state without restricting concurrent update activity. At each checkpoint, Virtuoso records the pages that have changed since the last backup. Hence the next backup will only cover pages that have changed in the checkpoint state since the last backup. Each checkpoint can optionally start a new transaction log, leaving a full audit trail of transaction logs. A series of full plus optional incremental backups up to a certain checkpoint, followed by the transaction logs consecutive to the last backed-up checkpoint, guarantees a full recovery up to the latest transaction committed at the time of failure. The backup function automatically compresses the backup and splits it into fixed-size chunks for convenience in handling large databases.

On the other hand, as long as no checkpoint intervenes, it is safe to copy the database file(s) plus the log for an up-to-date clean image of the database.
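
The "only pages changed since the last backup" idea can be sketched as follows. This is a simplified Python model with a hypothetical PagedStore class; it ignores checkpoints, transaction logs, compression and chunking, all of which the real backup function handles.

```python
from typing import Dict, List, Set


class PagedStore:
    """Toy page store that remembers which pages changed since the last backup."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}
        self.changed_since_backup: Set[int] = set()

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data
        self.changed_since_backup.add(page_no)

    def backup(self) -> Dict[int, bytes]:
        """Return a copy of only the pages changed since the previous backup."""
        chunk = {n: self.pages[n] for n in sorted(self.changed_since_backup)}
        self.changed_since_backup.clear()
        return chunk


def restore(backups: List[Dict[int, bytes]]) -> Dict[int, bytes]:
    """Replay the first (full) backup and then the incrementals, oldest first."""
    pages: Dict[int, bytes] = {}
    for chunk in backups:
        pages.update(chunk)
    return pages


store = PagedStore()
store.write(1, b"alpha")
store.write(2, b"beta")
full = store.backup()          # first backup covers every page written so far
store.write(2, b"beta-v2")
incremental = store.backup()   # second backup covers only page 2
recovered = restore([full, incremental])
```

Replaying the backups in order reproduces the latest page images, which is why a full backup plus its incrementals is enough to rebuild the checkpoint state.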

What is Virtuoso's support of stored procedures?

Virtuoso has a very extensive procedure language called Virtuoso PL. The syntax of Virtuoso PL resembles C with in-line SQL 99 and PSM 96 features such as exception handlers. Stored procedures provide a significant performance increase over client applications on the same machine for any application involving a number of short SQL operations, as in the case of OLTP.

Can one extend Virtuoso's SQL?

Virtuoso can extend SQL by importing Java and .NET classes and functions, as well as through built-in SQL functions written in C. In addition, you can persist instances of imported classes into Virtuoso tables. For all intents and purposes, a hosted class instance is indistinguishable from a native SQL user defined type instance.

Is there full text indexing support?

Virtuoso supports full-text indexing. It provides the ability to choose complex, multi-part document IDs for application-specific sorting of hits, efficient storage of secondary, non-free-text data in the free-text index for best retrieval performance, options for restarting searches at a specific hit, and ascending or descending order of the document IDs.

What is the internationalization support of Virtuoso?

Virtuoso supports Unicode (NCHAR/LONG NVARCHAR) columns as well as a national character set, which defines how strings are converted from narrow to wide characters. A number of pre-defined character sets are included in a system table called SYS_CHARSETS. This list can be extended by defining new character sets with the built-in function charset_define().

What encoding formats are supported?

Virtuoso supports Unicode, ASCII and UTF.

What development environment/debugging/profiling facilities are there for SQL and stored procedures?

Virtuoso comes with a library of online tutorials which demonstrate all salient aspects of software development in Virtuoso/PL. Examples include XML processing, full text features, dynamic web pages, and hosting Java, .NET and various scripting languages.

Virtuoso has a rich set of debugging and profiling tools. The interactive SQL utility has a debug mode where it can set breakpoints and single step stored procedures with functionality similar to what gdb or dbx offer for C.

On Windows, there is an MS Visual Studio extension package which allows defining Virtuoso SQL projects and provides syntax highlighting, IntelliSense completion and many other features. The Virtuoso .NET data provider has design-time interfaces to Visual Studio for drag-and-drop use in C# projects.

For performance profiling, Virtuoso offers a call graph profiler and test coverage utility. These show elapsed times and line-by-line execution counts for stored procedure code.

For regular database statistics, Virtuoso provides index by index cache hit rates, wait times, deadlock counts and other information for assisting database tuning.

Does Virtuoso support stored procedures across different database engines?

Yes, Virtuoso enables you to write stored procedures that reference tables hosted in different database engines. This has two major benefits:

You can store more of your database-centric application logic within Virtuoso and then leverage the performance advantages that stored procedures have over dynamic SQL.
Your stored procedures no longer have to be database specific, which is a major reason why stored procedures are not used when writing database-independent applications.

Does Virtuoso support VIEWS that include tables hosted in different database environments?

Yes, you can create logical VIEWS that include joins across tables within different database engines and data sources (XML and Web Services).

What tools does Virtuoso support for analyzing data?

Virtuoso supports the basic SQL OLAP extensions for GROUP BY, i.e. CUBE, ROLLUP, GROUPING etc. Virtuoso does not offer specialized storage for OLAP cubes.
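
For readers unfamiliar with ROLLUP, the Python sketch below computes the same grouping sets that GROUP BY ROLLUP (region, product) would produce over a small in-memory table. The function name and the sample rows are invented for illustration; None stands in for the SQL NULL that marks a rolled-up column.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rollup_sum(rows: List[Dict[str, object]], keys: Sequence[str],
               measure: str) -> Dict[Tuple[object, ...], int]:
    """Sum `measure` for each ROLLUP grouping set of `keys`."""
    totals: Dict[Tuple[object, ...], int] = defaultdict(int)
    # ROLLUP(a, b) groups by (a, b), then by (a), then by () for the grand total.
    for prefix_len in range(len(keys), -1, -1):
        for row in rows:
            group = tuple(row[k] if i < prefix_len else None
                          for i, k in enumerate(keys))
            totals[group] += row[measure]
    return dict(totals)


sales = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 5},
    {"region": "US", "product": "A", "amount": 7},
]
summary = rollup_sum(sales, ["region", "product"], "amount")
```

CUBE differs only in that it produces every subset of the grouping columns rather than just the prefixes.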

Object-Relational Functionality

Object-Relational Database

What about SQL-200n Object Oriented features?

Virtuoso SQL supports SQL2000-style objects as standalone data in procedures and as column data types. The Virtuoso Object System supports single inheritance, late binding, polymorphism and persistence of objects as column values in SQL tables. An object's implementation may be native SQL, with methods in Virtuoso/PL, or the objects may be implemented in Java, or in any Microsoft .NET or Mono ECMA CLI bound language. Hosted and native objects are indistinguishable from the perspective of a Virtuoso SQL application. The native methods and data members are automatically made accessible from SQL when the class or class hierarchy is imported into the SQL schema (data dictionary). Virtuoso supports single inheritance between tables through the SQL-200n UNDER clause. A subtable inherits the supertable's columns, primary key and indices and can add new columns and indices of its own. Selects on supertables also include rows that belong to subtables.

What does Virtuoso offer in terms of User Defined Types or other datatype extensions?

Virtuoso supports SQL 2K User Defined Types (UDTs) and User-Defined Functions (UDFs), which can be defined in SQL, Java and .NET-bound languages. Virtuoso also offers Procedure Views, which are similar to table-valued functions and allow you to define a view as a stored procedure.

How do Third-party Runtime-Hosting modules execute?

Runtime-hosted languages such as Perl, Python and Ruby are loaded in process. They are callable from SQL via special functions. Web pages in these languages can be hosted without any special programming.

What types of security or privileges are provided for Third-party Runtime Hosting?

Privileges are role-based at the table, column, procedure and type levels. Row-level security is based on table-by-table policy functions.

Does Virtuoso Object-Relational database support arbitrary length types?

Virtuoso supports all combinations of narrow, wide and binary strings.

What does Virtuoso offer in terms of Complex Objects like the creation of Composite Types?

Virtuoso allows you to create UDTs which can reference any other UDTs. You can also declare columns to contain short and long UDT instances regardless of implementation, be it SQL, Java or .NET.

How are Composite Types handled and called?

User-Defined Functions' arguments and return values are passed by value and by reference. OID-referenced objects have no persistent identity. Virtuoso functions and composite types support dot operators for data members and member functions, as in other formal languages. In addition, since the objects are typed at runtime, arrays and other complex data structures can also contain UDT instances as well as other data.

What about Data Inheritance?

Single inheritance is provided via UNDER for tables and UNDER for UDTs. Function overloading is provided via dynamic binding of methods based on the runtime type of self; when there are multiple eligible methods, they are discriminated by the compile-time types of the arguments.

Does the Virtuoso Object-relational Database Support Triggers?

Virtuoso supports SQL99 row-level triggers that can call outside code for signalling to clients although no special client-notification mechanism is supported by the Virtuoso client-server protocol.

XML Data Management

XML Database

How does Virtuoso store XML?

Virtuoso offers the XMLType data type. A value of this type can be created from a text representation and converted back into text. XPATH, XSLT, XQuery and SQLX operations can be applied to it. It can be stored either as text or in a pre-parsed binary format. When XMLType is used as a column value, the XML tree can be full-text indexed in a way that takes XML elements into account. Besides normal text index operations, this indexing also allows resolving the occurrence of elements, the inclusion of text inside elements, and so on.

How does Virtuoso generate XML from relational data?

Virtuoso has several mechanisms for transforming SQL data into XML. One option is the "FOR XML" clause, which is a SQL option that can convert the output of a SELECT statement into a tree.

SQLX - SQLX is the standard SQL way of creating XML content from relational data. Special aggregates and functions allow composing trees in queries.
Mapping Schemas - A mapping schema is an XML schema which defines how the data is to be extracted from joins of relational tables. When an XPATH query is evaluated against a mapping schema, Virtuoso generates an SQL query which retrieves only the required rows and then generates the XML tree.
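
To show what "composing a tree from rows" means in practice, the following Python sketch builds an XML document from a list of row dictionaries, roughly the way an element constructor plus an aggregate would in a SQLX query. It uses Python's standard xml.etree module rather than Virtuoso; the function name, tag names and sample rows are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_xml(rows: List[Dict[str, object]], root_tag: str,
                row_tag: str, id_column: str) -> str:
    """Turn each row into an element: id column as attribute, others as children."""
    root = ET.Element(root_tag)
    for row in rows:
        # One element per row, with the key column exposed as an attribute.
        elem = ET.SubElement(root, row_tag, {id_column: str(row[id_column])})
        for column, value in row.items():
            if column == id_column:
                continue
            # Remaining columns become child elements named after the column.
            ET.SubElement(elem, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


employees = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
document = rows_to_xml(employees, "employees", "employee", "id")
```

In SQLX the same shaping happens inside the query itself, so the XML arrives as a column of the result set instead of being assembled by the client.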

Can Virtuoso present legacy RDBMSs as XML stores?

The SQL-based XML-generation features do not differentiate between local and remote tables, so any relational data will be accessible via these means as XML.

How does one combine native XML and relational data in queries?

Virtuoso SQL offers special predicates for testing stored XML against XPATH and text conditions as well as for extracting fragments from stored XML trees. New XML can be made from these with the SQLX functions or XSLT stylesheets, for example. A result column of a query may well consist of XML data. Inside a stored procedure, further XML operations can be applied to the data. In the case of a client, the XML is seen as text via the client API.

Another approach is to write queries in XQuery and access SQL data from there. Any XML mapping schema appears as a document accessible with the XMLView XPATH extension function. Joins of tables can thus be accessed and filtered as if they were XML data to begin with. Note that this does not entail constructing XML for data that is not actually needed by the XQuery statement.

How do you make and transform XML inside queries and SQL procedures?

XSLT and XQuery are most convenient for extensive processing of XML. For quickly generating simple XML trees from SQL data, SQLX functions are most convenient. For complex mapping of relational and XML data, mapping schemas are most convenient. For a one-to-one mapping between a foreign-key relation in SQL and a tree hierarchy in XML, the "FOR XML" SQL extension is most convenient.

What is the XPATH, XSLT, XQuery standard-compliance?

Virtuoso currently supports XPATH 2.0, XSLT 1.1, and XQuery 1.0.

What are the searching functions available: Standard XML languages (XQuery, XPath or/and proprietary languages)?

XPath, XQuery and free-text queries with AND, OR, NOT, proximity and wildcard operators. A free-text expression may be embedded inside an XPath predicate using a special XPath expression.

What about SQLX standard support?

Virtuoso offers full support for SQLX, which is SQL with a collection of functions added for creating XML entities from standard relational queries. SQL/XML is an emerging standard driven by the H2.3 Task Group (formerly the SQLX Group). Currently supported functions include XMLELEMENT, XMLATTRIBUTES, XMLFOREST, XMLCONCAT, and XMLAGG. Combined with SQL, they produce XML that is returned in a column of a result set.

How can one combine full text and XML structure-based queries in document-centric applications?

This is possible in either SQL or XQuery. In SQL, for example, one can join a table of XML documents with author information by writing a SQL join between a publications table and an authors table. The author reference would just be extracted with XPATH after application of a free-text-based content-filtering.

In XQuery, the documents table and authors table would be represented as documents via mapping schemas, and one could then write FLWR expressions for joining the two.

RDF Data Management

RDF Triple Store

How do you load RDF data into Virtuoso RDF Triple Store?

You can load N3, Turtle, and RDF/XML files into a Virtuoso-hosted "named graph" using Virtuoso SQL functions. The same functionality is also available for single triple statements.

Virtuoso also has the ability to automatically extract metadata from DAV resources via metadata extractors. Virtuoso includes a number of metadata extractors for a range of known data formats (typically microformats and some popular binary file types). These metadata extractors enable automatic triple generation, graph association, and storage in Virtuoso's RDF Triple Store. It is also important to note that Virtuoso converts WebDAV metadata into RDF, providing richer query capability against WebDAV resources.

How do you query and perform searches with Virtuoso RDF Triple Store?

SPARQL statements can be written inside SQL statements or presented as top-level SQL queries. This means that any ODBC, JDBC, .NET or OLE/DB application can simply make SPARQL queries just as if they were SQL queries.

Virtuoso also supports the SPARQL transport protocol, allowing SPARQL queries to be executed over HTTP. It also supports the SPARQL XML results serialization format.
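
As a sketch of what a SPARQL protocol request over HTTP looks like, the Python snippet below builds the GET URL for a query. The query and default-graph-uri parameter names come from the SPARQL protocol itself; the endpoint address and graph IRI are placeholders assumed for the example, so substitute the values of your own server. The request is only constructed here, not sent.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint and graph; both are assumptions for this example.
ENDPOINT = "http://localhost:8890/sparql"
GRAPH = "http://example.org/graph"


def sparql_get_url(endpoint: str, query: str,
                   default_graph: Optional[str] = None) -> str:
    """Build a SPARQL protocol GET URL with a properly encoded query string."""
    params = {"query": query}
    if default_graph:
        params["default-graph-uri"] = default_graph
    return endpoint + "?" + urlencode(params)


query_text = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, query_text, GRAPH)

# Decoding the URL again shows that the query survives encoding unchanged.
decoded = parse_qs(urlsplit(url).query)
```

A client would then fetch this URL with an Accept header of application/sparql-results+xml to receive the SPARQL XML results format.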

What RDF Datatypes are supported in Virtuoso RDF Triple Store?

If an RDF type corresponds to a SQL data type, the data is stored as a native instance of the SQL type in question. For strings with language tags and other RDF data that has no direct SQL counterpart, a special representation preserving the RDF semantics is used.

How many graph models can Virtuoso RDF Triple Store support?

Unlimited.

What Data Access methods are supported?

SPARQL protocol over HTTP; any SQL client library can be used for issuing SPARQL queries just as well as SQL queries.

Application Integration (Web Services & SOA)

Web Services

Why is Virtuoso described as a Web Services Platform?

Virtuoso implements the complete stack of Web Services foundation protocols; SOAP, WSDL, and UDDI. It enables SQL Stored Procedures, Microsoft .NET, Mono, and Java-based application logic to be invoked using SOAP. These SOAP-compliant services are automatically described using WSDL, and advertised for binding via UDDI. This entire process is achievable without writing a single line of new code.

What is the benefit of Service Composition in Virtuoso?

Exposure of existing time-tested application-logic for invocation using Web Services protocols without any code re-writes. Code format support includes SQL Stored Procedures, .NET assemblies, Java Classes, C/C++ modules, etc.

What is meant by Service Invocation Endpoints?

Service invocation endpoints are provided by Virtuoso's HTTP/WebDAV-based virtual directory and multi-homing functionality, which supplies endpoints for SOAP-, WS-Security-, WSDL-, and UDDI-compliant interactions with composite services.

Content Management, WebDAV and HTTP Services

Can OpenLink Virtuoso act as a repository?

OpenLink Virtuoso can act as a repository through DAV, or WebDAV, a protocol for Web-based Distributed Authoring and Versioning. Repository content elements are called documents, corresponding to files, and folders/collections, corresponding to directories. Collectively these documents and folders (collections) are known as resources.

OpenLink Virtuoso implements the DAV protocol, allowing you to create and manage resources either directly through repository manipulations or indirectly, through a variety of WebDAV services.

Can I host an entire Web Site inside an OpenLink Virtuoso Database?

Yes. As an HTTP/WebDAV-compliant server, Virtuoso makes all of the components within its WebDAV repository accessible by URL, over HTTP and the WebDAV protocol, to HTTP/WebDAV-compliant clients (user agents such as web browsers). The same applies to local operating system files (subject to security controls, of course).

Examples of WebDAV clients include Windows (via the Web Folders feature), Mac OS X, and the Nautilus desktop (Linux and Solaris). An ever-increasing pool of web development tools and content-management systems includes built-in support for WebDAV.

Migrating your existing Web Site to one hosted by OpenLink Virtuoso is as simple as using the OpenLink Virtuoso HTML-based UI to import the site.

What file-transfer protocols does OpenLink Virtuoso support?

Virtuoso has provided a built-in RFC-959/RFC-2389 protocol server for FTP, as well as an FTP client, since V3.2. The OpenLink Virtuoso FTP client allows Virtuoso PL code to retrieve, submit and list files on any FTP server and to store the results on a local file system or in the database. The FTP server provides FTP access to the OpenLink Virtuoso WebDAV repository using the same authentication and permissions system as WebDAV.

Web Server

Does Virtuoso offer a Web application development language?

Yes, the Web application development language is called OpenLink Virtuoso Server Pages (VSP; file extension .vsp) which can be used in conjunction with XML-based server side control called OpenLink Virtuoso Server Pages for XML (VSPX, file extension .vspx). VSPX offers a suite of data-bound controls for browsing and updating SQL data, input-validation, session-management and any other common web application development environment features.

XSLT can be used for pre- and post-processing VSPX pages. On the preprocessing side, it can serve to divide logic from layout by expanding simple markup into complex controls and scripting. On the output side it can be used to process the HTML generated by a dynamic web page to add HTML layout and graphic elements to bare-bones data produced by the business logic.

Can Virtuoso be used as Search Engine?

Yes, like any search engine it possesses Free Text search capabilities that leverage its ability to produce Free Text indexes on all text data (SQL or otherwise). OpenLink Virtuoso free text search includes word-proximity searches and the ability to combine XPATH, Free Text and Regular Expressions, if required, in the same query. Ultimately you can harvest and then index any form of text-based web data (HTML, XHTML, XML etc.) stored in Virtuoso.

What is Virtuoso's relevance in the world of XML?

It addresses the fundamental question: where is all the XML data going to come from? You can't exploit the benefits of XML without XML data; likewise, you can't manually recreate XML data in an attempt to address this reality. OpenLink Virtuoso enables you to create XML data from existing data sources such as your SQL databases. It also enables the creation of XML data from external, URL-accessible data. A built-in validating XML parser and an XSL-T engine lie at the root of Virtuoso's XML Services offerings.

Does Virtuoso store XML Data Natively?

Yes, XML documents are stored in an XML repository. These documents may be parsed or unparsed at time of storage; in either case indexes are built which provide rapid access to these documents.

Why is XML Data-Storage Important?

A major benefit of XML is its ability to provide an open format for data representation, exchange, protocol and application modeling. By using XML as a uniform data interface to disparate data sources, it becomes much easier to cost-effectively develop and deploy next generation web applications; increasingly these applications will depend on data hosted in a variety of databases and data sources.

Can I create Dynamic XML documents from SQL Data?

Yes, Virtuoso implements SQL extensions that enable the results of standard SQL queries to be transformed into XML documents, which are openly accessible to user agents such as Web Browsers via HTTP and/or WebDAV. It is important to note that these SQL-XML documents are accessible by URL and reflect changes that occur in the underlying database tables from which they are derived.

Can Dynamic XML documents be built using SQL data from different databases?

Yes, the SQL to XML functionality sits above Virtuoso's virtual database (VDB) functionality, which enables a unified logical and physical representation of database tables and views that reside in disparate database engines from different database vendors. The only requirement is that these databases have to be ODBC- or JDBC-accessible - implying the existence of data-access drivers for these databases.

Which databases are supported by Virtuoso's SQL to XML functionality?

It supports any database that supports ODBC and/or JDBC, so pretty much every database.

Why is creation of XML documents from SQL important?

A lot of data that you would typically like to use as the foundation of your web application initiatives more than likely resides in application databases that are predominantly SQL-based.

What is the core web services protocol support?

SOAP 1.1, 1.2, WS-Security, WS-Trust and WS-Policy.

What WS-Security features are supported?

Virtuoso support of Web Services Security includes enabling the use of symmetric and asymmetric encryption, digital signing, and identity authentication as defined by the WS-Security specification.

How does one test web services under Virtuoso?

Published SOAP services can be tested through VSMX, a Virtuoso-generated test page. The SOAP services are referenced by the server instance's URL and the virtual directory containing the logical path of the services.

How does one publish existing business logic via Web Services?

Business logic defined in stored procedures and functions can be exposed as SOAP services, whether they are native to Virtuoso or come from remote data sources. These stored procedures can be published by linking the selected stored procedure through the Remote Procedure Interface, then creating a new Virtual Directory or selecting an existing one, and finally publishing to the Virtual Directory using the publish function in the Virtual Directory User Interface.
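
From the client side, calling a published procedure comes down to posting a SOAP envelope to the virtual directory's URL. The Python sketch below builds a SOAP 1.1 request body with the standard library; the operation name get_order_total, the urn:example:orders namespace and the parameter are hypothetical, and the exact element layout a given Virtuoso service expects should be taken from its WSDL.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation: str, namespace: str,
                       params: Dict[str, object]) -> str:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element is qualified with the service's own namespace.
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and namespace, for illustration only.
request_xml = build_soap_request("get_order_total", "urn:example:orders",
                                 {"order_id": 42})
```

The resulting string would be sent with an HTTP POST and a text/xml content type to the endpoint that the virtual directory exposes.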

What is the proven interoperability of the Virtuoso Web Services implementation as compared to other vendors?

Virtuoso participates in interoperability test rounds for different implementations of the SOAP and WSDL specifications, as defined by the SOAP Builders and WS-RM Interop communities. The Virtuoso SOAP server and client implement SOAP protocol versions 1.1 and 1.2. The SOAP services hosted in Virtuoso can be described with an automatically generated WSDL document or with a user-supplied WSDL document. The Virtuoso SOAP server and client with protocol version 1.1 support and automatically generated WSDL documents have been tested. Interop tests are organized into rounds, and each round consists of the following groups (a list of service endpoints and client results is available in HTML form or via SOAP):

SOAP Interop Round 1 - Base interop tests are superseded by Round 2 Base test
SOAP Interop Round 2 - SOAP interop base, GroupB and GroupC
SOAP Interop Round 3 - WSDL interop including EmptySA, Import1-3, Compound1 & 2, DocLit, DocLit Parameters and RpcEncoded Tests
SOAP Interop Round 4 - Contains DIME/SwA, Fault message processing, WSDL/XSD
WS-RM - WS-ReliableMessaging tests to test messaging over SOAP

Virtuoso's SOAP server and client with protocol support for version 1.2 have been tested with test cases as per W3C SOAP Version 1.2 Specification Assertions and Test Collection. Rounds 1-4 tests are located on Virtuoso Interop and through our SOAP on-line tutorials. The WS-RM interop tests can be experienced through our on-line WS-RM tutorials.

Process Management & Integration (BPEL)

What is BPEL?

BPEL4WS (BPEL for short) is an XML vocabulary for orchestrating SOAP and WSDL-compliant Web Services. It is the critical standard for creating composite processes from a collection of Web Services using the principles of Service Oriented Architecture (SOA).

What is Virtuoso BPEL Process Manager?

Virtuoso BPEL Process Manager (VBPM) engine is a run-time and administration environment for executing processes based on BPEL4WS 1.1 (BPEL for Web Services), the latest version of a specification designed by Microsoft, IBM, BEA Systems and Siebel Systems. The software vendors are shepherding the spec through e-business standards body OASIS.

BPEL engine and Process

Is the BPEL engine and Process Manager compliant with Web Service standards?

Virtuoso supports a number of Web Services protocols that follow the WS-I standards, including Security (WS-Security) and Reliability (WS-Reliable Messaging). The Virtuoso BPEL engine also includes a number of Web Services protocols that add security, reliability and enterprise scalability:

XML
UDDI
SOAP
HTTP
BPEL4WS
WSDL
WS-Security
WS-Routing
WS-Reliable Messaging
WS-Policy Attachment
WS-Policy Assertions
WS-Policy
WS-Addressing
XML Signature
XML Encryption

How is support for WS-Reliable Messaging integrated with VBPM?

Processes can be deployed for any business-critical transactions over the Internet using the WS-Reliable Messaging (WS-RM) specification, which provides guaranteed (at-most-once, at-least-once, and exactly-once) messaging for any partner via BPEL Process Manager.
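
The difference between these delivery guarantees can be illustrated with a small Python model. The sender retransmits until it sees an acknowledgement (at-least-once on the wire), and the receiver discards duplicates by message identifier, which yields exactly-once delivery to the application. The class and function names and the lost-acknowledgement counter are assumptions for the sketch; this is not the WS-RM wire protocol.

```python
from typing import List, Set


class Receiver:
    """Acknowledges every copy but hands each message to the application once."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.delivered: List[str] = []

    def receive(self, message_id: int, payload: str) -> bool:
        if message_id not in self.seen:  # duplicate copies are dropped here
            self.seen.add(message_id)
            self.delivered.append(payload)
        return True  # acknowledgement


def send_reliably(receiver: Receiver, message_id: int, payload: str,
                  lost_acks: int, max_attempts: int = 5) -> int:
    """Retransmit until an acknowledgement arrives; return the attempts used.

    `lost_acks` simulates how many acknowledgements vanish in transit.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        ack = receiver.receive(message_id, payload)
        if ack and attempts > lost_acks:
            return attempts
    raise RuntimeError("message not acknowledged")


partner = Receiver()
attempts_used = send_reliably(partner, 1, "purchase-order", lost_acks=2)
```

Even though the message crosses the wire three times in this run, the application sees it once, which is the behaviour a business process relies on for non-repeatable steps.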

Can I use my existing BPEL files with the BPEL Process Manager?

Yes. Virtuoso BPEL Process Manager fully implements the BPEL specification and can deploy any BPEL document created with any BPEL modeling tool that supports this specification.

How do I monitor process activity?

The BPEL Process Manager includes a web-based user interface for testing, debugging, and monitoring deployed business processes.

How are problems with long-running transactions handled?

The BPEL Process Manager includes support for compensation, fault handling and event handling.

How do I test my business processes for all of the errors?

The BPEL Process Manager product includes a debugger to test business processes in the Debug Message Queue, which includes the state of a given process, actions for the process and list of instances for the process.

What does the BPEL Process Manager system tell me about my executing processes?

The BPEL Process Manager provides real-time status for all running processes and transactions on any BPEL processes deployed on the server, including information on BPEL source, WSDL and partner links.

What information is provided on process?

The BPEL Process Manager provides statistics and reporting details on processes and instances so that processes can be analyzed and later optimized for deployment.

What other integration does the BPEL Process Manager offer?

The BPEL Process Manager has extensive database, XML and Web Service integration and provides for intelligent transformation of XML and non-XML applications and data. Virtuoso allows integration of databases into Service-Oriented Architectures (SOAs) by automating the creation of Web Services from multiple tables in any ODBC-, JDBC- or ADO.NET-compliant database.

What about support for .NET or Java?

Through the BPEL Process Manager, any BPEL process can directly invoke locally hosted Java or .NET logic as well as access SQL data resident in local and remote tables.

What about Interoperability?

Virtuoso BPEL has gone through extensive interoperability testing against Microsoft, Oracle, and other vendors' implementations. Virtuoso also offers an interoperability site for testing and sharing results, which validates the Virtuoso BPEL engine and ensures the rapid orchestration of existing Web services.

Are there any special requirements for using Virtuoso BPEL?

The BPEL engine requires at least Virtuoso Universal Server 4.5 and the BPEL Process Manager, which is compatible with any browser on any platform.

Collaboration and Network Effects FAQ

SMTP & POP3 and NNTP

Mail

What Mail Services does Virtuoso provide?

Virtuoso includes an SMTP sink/driver that enables any SMTP-compliant Mail Transfer Agent (MTA, such as sendmail or exim) to write its mail into a Virtuoso database. Email data stored in Virtuoso is retrievable via the POP3 protocol, with IMAP4 support planned for a later release. SMTP and POP3 are also available as Virtuoso PL functions, enabling the creation of sophisticated mail-oriented applications.

Can I store e-mail received by a mail server in Virtuoso?

Yes, Virtuoso provides e-mail storage drivers for popular SMTP servers such as sendmail and Windows SMTP. This means that you can configure your mail server such that it stores e-mails inside Virtuoso rather than in the OS.

How do I interact with my e-mails if stored within Virtuoso?

There are two ways of doing this. The first is to use Virtuoso as your POP3 server and let your POP3-based mail client interact directly with Virtuoso. The second is to move mail from the Virtuoso SQL repository into the WebDAV repository for direct access.

What about Spam-filtering and Mail Services?

Virtuoso enables you to combine the functionality of a mail system and database so that you can custom develop spam filters or customize Virtuoso (via its server extensions kit) to work with 3rd party Spam filter tools. Virtuoso can enable spam-filtering to occur immediately after mails are deposited in your mailbox, and prior to POP3 mail retrieval.

Enterprise Data Management

News

Can I use Virtuoso to host newsgroups?

Yes, Virtuoso implements NNTP (Network News Transport Protocol) allowing a very simple interface for creating your own newsgroups.

Can I use Virtuoso to communicate with other news servers?

Yes, you can use Virtuoso to attach to newsgroups hosted on any NNTP Server.

Data-access and Security

What security is built into Virtuoso?

For ODBC/JDBC Access:

Standard SQL-92 GRANT/REVOKE statements for DB users.
Access control lists for incoming clients based on source IP.
Each incoming connection can be passed to a hook function (a Virtuoso/PL stored procedure) for custom security measures.
SSL/TLS is available for both ODBC and JDBC clients.
X.509 certificates for both ODBC and JDBC.

For HTTP/DAV/SOAP Access:

By default, requests for resources contained within the DAV repository are checked using HTTP/1.1 Digest authentication, using credentials stored in system users tables.
A UNIX-like permission mechanism is used to control access to contents of DAV repository: these permissions apply to users and groups of users.
BASIC and HTTP/1.1 Digest authentication.
Custom authentication methods may be implemented via an authentication hook API.

Database:

Role-Based Security

Web Services:

WS-Security - enabling the use of symmetric and asymmetric encryption, digital signing, and identity authentication as defined by the WS-Security specification. Visit the online tutorials.

General

How big is a typical Virtuoso installation?

Approximately 300MB. If you exclude tutorial materials, sample applications, demos, and documentation the installation is approximately 100MB.

How much memory does Virtuoso require?

Nothing special. It can run within a 64MB environment if this is all you have, but it can also take advantage of large system memory, giving you the ability to configure Virtuoso as an in-memory database if you so desire.

What operating systems does Virtuoso support?

Virtuoso currently supports Windows (XP/2K/2003), Linux (Red Hat, SUSE), Mac OS X, FreeBSD, Solaris, and other UNIX platforms (32- and 64-bit).

How is Virtuoso Commercially Packaged?

Virtuoso comes in two distinct formats: Standard Edition and Enterprise Edition. The Enterprise Edition includes out of the box bindings for third-party distributed transaction-management environments such as MTS/COM+ and J2EE.

Does your product have an API and/or an SDK?

Virtuoso clients connect to Virtuoso via standard APIs and protocols. The APIs include ODBC, JDBC, a .NET data provider, and OLE/DB. Client libraries are provided for all of these. Protocols include SOAP, most WS-* protocols, DAV, XML for Analysis (XML/A), and others. Common operating systems and development environments offer clients for these protocols.

For adding application logic server side, one may use Virtuoso/PL, Java, .NET bound languages or C. Examples of all these are included in the package.

Can Virtuoso be configured to make use of multiple processors?

A single query runs on a single thread (plus I/O in a separate thread) in Virtuoso. When the query does not have to wait for disk, running it in a single thread is faster than parallelizing it and paying the synchronization overhead. While the query waits for disk access, CPU time does not matter, since idle CPUs would only be waiting for disk reads.

The key requirement for an RDBMS is its ability to serve hundreds of clients in parallel. With 4 or more queries from 4 or more clients, all cores of the machine will have something to run.

When we complete the Virtuoso cluster version, you will be able to parallelize a query between boxes. In that case many boxes may simultaneously run many threads per query, but any given box of the cluster will run no more than one thread per query at a time. The cluster wins in speed because the total RAM is bigger, not because multi-threading per query is so useful.

Linked Data

What is Linked Data? How does Linked Data benefit me or my enterprise?

See detailed information here.

Virtuoso 6 FAQ

What is the storage cost per triple?

This depends on the index scheme. If indexed 2 ways, assuming that the graph will always be stated in queries, this is 31 bytes.

With 4 indices, supporting queries where the graph can be left unspecified (i.e., triples from any graph will be considered in query evaluation), this is 39 bytes. The numbers are measured with the LUBM validation data set of 121K triples, with no full-text index on literals.

With 4 indices and a full text index on all literals, the Billion Triples Challenge data set, 1115M triples, is about 120 GB of database pages. The database file size is larger due to space in reserve and other factors. 120 GB is the number to use when assessing RAM-to-disk ratio, i.e., how much RAM the system ought to have in order to provide good response. This data set is a heterogeneous collection including social network data, conversations harvested from the Web, DBpedia, Freebase, etc., with relatively numerous and long text literals.

The numbers do not involve any database page stream compression such as gzip. Using such compression does not save in terms of RAM because cached pages must be kept uncompressed but will cut the disk usage to about half.
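As a rough worked example of the per-triple figures above, the following sketch multiplies a triple count by the quoted bytes per triple. The function name and the lookup table are illustrative assumptions, and the result covers database pages only, for data resembling the LUBM set without a full-text index.

```python
# Bytes per triple quoted in this FAQ (LUBM data, no full-text index,
# no page stream compression). Keys are the number of indices.
BYTES_PER_TRIPLE = {2: 31, 4: 39}


def estimate_page_bytes(triples: int, indices: int) -> int:
    """Estimate the size of database pages, in bytes, for `triples` triples."""
    if indices not in BYTES_PER_TRIPLE:
        raise ValueError("this FAQ only quotes figures for 2 or 4 indices")
    return triples * BYTES_PER_TRIPLE[indices]
```

For one billion triples this gives 31 GB with 2 indices and 39 GB with 4 indices (decimal gigabytes); the database file itself will be larger, as noted above.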

What is the cost to insert a triple (for the insertion itself, as well as for updating any indices)?

The more triples are inserted at a time, the faster this goes. Also, the more concurrent triple insertions are going on, the better the throughput. When loading data such as the US Census, a cluster of 2 commodity servers can insert up to 100,000 triples per second.

A single 4-core machine can load 1 billion triples of LUBM data at an average rate of 36K triples per second. This is limited by disk.
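To put the load rate above into wall-clock terms, here is a minimal sketch that converts a triple count and a sustained rate into hours. The function name is hypothetical, and a real load will not hold a perfectly constant rate.

```python
def load_hours(triples: int, triples_per_second: float) -> float:
    """Return the hours needed to load `triples` at a constant average rate."""
    if triples_per_second <= 0:
        raise ValueError("rate must be positive")
    return triples / triples_per_second / 3600.0
```

At 36K triples per second, one billion triples take a little under 8 hours.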

What is the cost to delete a triple (for the deletion itself, as well as for updating any indices)?

The delete cost is similar to insert cost.

What is the cost to search on a given property?

If we are looking for equality matches, a single 2GHz core can do about 250,000 single triple random lookups per second as long as disk reads are not involved. If each triple requires a disk seek the number is naturally lower.

Parallelism depends on the query. With a query like counting all x and y such that x knows y and y knows x, we get up to 3.4 million single-triple lookups-per-second on a cluster of 2 8-core Xeon servers. With complex nested sub-queries the parallelism may be less.

Lookups involving ranges of values, such as ranges of geographical coordinates or dates, use an index, since quads are indexed in a manner that collates in the natural order of the data type.

What data types are supported?

Virtuoso supports all RDF data types, including language-tagged and XML schema typed strings as native data types. Thus there is no overhead converting between RDF data types and types supported by the underlying DBMS.

What inferencing is supported?

Subclass, subproperty, identity by inverse-functional properties, and owl:sameAs are processed at run time if an inference context option is specified in the query.

There is a general-purpose transitivity feature that can be used for a wide variety of graph algorithms. For example:

SELECT ?friend
WHERE
{
<alice> foaf:knows ?friend option (transitive)
}

would return all the people directly or indirectly known by <alice>.
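The meaning of such a transitive pattern can be illustrated outside the database. The sketch below is not how Virtuoso evaluates the query; it is a plain breadth-first search over an in-memory adjacency map, with hypothetical names, that returns everyone reachable through one or more "knows" edges.

```python
from collections import deque


def transitive_objects(edges, start):
    """Return every node reachable from `start` via one or more edges.

    `edges` maps a subject to an iterable of the objects it points to.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

With `{"alice": ["bob"], "bob": ["carol"]}` as the map, the call for `"alice"` yields both `"bob"` (known directly) and `"carol"` (known indirectly).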

Is the inferencing dynamic, or is an extra step required before inferencing can be used?

The mentioned types of inferencing are enabled by a switch in the query and are done at run-time, with no step for materialization of entailed triples needed. The pattern:

{<thing> a ?class}

will, if the match of ?class has superclasses, also return the superclasses even though the superclass membership is not physically stored for each superclass.

Of course, one can always materialize entailed triples by running SPARQL/SPARUL statements to explicitly add any implied information.

If two subjects have the same inverse functional property with the same value, they will be considered the same. For example, if two people have the same email address, they will be considered the same.

If two subjects are declared to be owl:sameAs, either directly or through a chain of x owl:sameAs y, y owl:sameAs z, and so on, they will be considered the same.
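The chain rule for owl:sameAs amounts to grouping subjects into equivalence classes. The following sketch shows that idea with a small union-find structure; it illustrates the semantics only and makes no claim about how Virtuoso resolves identity at run time. All names are hypothetical.

```python
def same_as_groups(pairs):
    """Group identifiers connected by direct or chained sameAs statements.

    `pairs` is an iterable of (x, y) tuples meaning "x owl:sameAs y".
    Returns a list of sets, one per equivalence class.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

Given the statements x sameAs y and y sameAs z, all three identifiers land in one group even though x and z are never linked directly.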

These features can be individually enabled and disabled. They all have some run time cost, hence they are optional. The advantage is that no preprocessing of the data itself is needed before querying, and the data does not get bigger. This is important, especially if the database is very large and queries touch only small parts of it. In such cases, materializing implied triples can be very costly. See discussion at E Pluribus Unum ...

Do you support full-text search?

Virtuoso has an optional full-text index on RDF literals. Searching for text matches using the SPARQL regex feature is very inefficient in the best of cases. This is why Virtuoso offers a special bif:contains predicate similar to the SQL contains predicate of many relational databases. This supports a full-text query language with proximity, and/or/and-not, wildcards, etc.

While the full-text index is a general-purpose SQL feature in Virtuoso, there is extra RDF-specific intelligence built into it. One can, for example, specify which properties are indexed, and within which graphs this applies.

What programming interfaces are supported? Do you support standard SPARQL protocol?

Virtuoso supports the standard SPARQL protocol.

Virtuoso offers drivers for the Jena, Sesame, and Redland frameworks. These allow using Virtuoso's store and SPARQL implementation as the back end of Jena, Sesame, or Redland applications. Virtuoso will then do the query optimization and execution. Jena and Sesame drivers come standard; contact us about Redland.

Virtuoso SPARQL can be used through any SQL call level interface (CLI) supported by Virtuoso (i.e., ODBC, JDBC, OLE-DB, ADO.NET, XMLA). All have suitable extensions for RDF specific data types such as IRIs and typed literals. In this way, one can write, for example, PHP web pages with SPARQL queries embedded, just using the SQL tools. Prefixing a SQL query with the keyword "sparql" will invoke SPARQL instead of SQL, through any SQL client API.
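As a small illustration of the keyword prefix described above, the helper below turns a SPARQL query into a statement that can be passed to a SQL client API. The helper name is hypothetical, and no connection is opened here; how the statement is executed depends on the driver in use.

```python
def as_sparql_statement(sparql_query: str) -> str:
    """Prefix a SPARQL query with the "sparql" keyword for a SQL client API."""
    return "sparql " + sparql_query.strip()


# The resulting string would be handed to whatever execute call the chosen
# SQL client API provides; that call is not shown because it is driver-specific.
statement = as_sparql_statement("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
```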

How can data be partitioned across multiple servers?

Virtuoso Cluster partitions each index of all tables containing RDF data separately. The partitioning is by hash. The result is that the data is evenly distributed over the selected number of servers. Immediately consecutive triples are generally in the same partition, since the low bits of IDs do not enter into the partition hash. This means that key compression works well.
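The effect of leaving the low bits out of the partition hash can be sketched as follows. This is not Virtuoso's actual hash function: the number of ignored bits and the use of a plain modulo are assumptions chosen only to show why neighboring IDs end up together.

```python
LOW_BITS_IGNORED = 8  # hypothetical value, chosen for illustration


def partition_for(key_id: int, n_partitions: int) -> int:
    """Map an ID to a partition while ignoring its low bits.

    IDs that differ only in the ignored bits always share a partition,
    so runs of consecutive IDs stay on the same server.
    """
    return (key_id >> LOW_BITS_IGNORED) % n_partitions
```

With 8 ignored bits, IDs 0x1200 through 0x12FF all map to the same partition, while 0x1300 starts the next run.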

Since RDF tables are in the end just SQL tables, SQL can be used for specifying a non-standard partitioning scheme. For example, one could dedicate one set of servers for one index, and another set for another index. Special cases might justify doing this.

With very large deployments, using a degree of application-specific data structures may be advisable. See "Does Virtuoso support property tables" below.

How many triples can a single server handle?

With free-form data and text indexing enabled, 500M triples per 16G RAM can be a ballpark guideline. If the triples are very short and repetitive, like the LUBM test data, then 16G per one billion triples is a possibility. Much depends on the expected query load. If queries are simple lookups, then less memory per billion triples is needed. If queries will be complex (analytics, join sequences, and aggregations all over the data set), then relatively more RAM is necessary for good performance.
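The two ballpark ratios above can be turned into a quick sizing helper. This is a sketch under the stated guideline only; the function and parameter names are hypothetical, and the expected query load can move the real requirement in either direction.

```python
def ram_guideline_gb(triples: int, free_form: bool = True) -> float:
    """Return a ballpark RAM figure in GB for a single server.

    free_form=True  : free-form data with text indexing, 500M triples per 16 GB.
    free_form=False : short, repetitive triples (LUBM-like), 1 billion per 16 GB.
    """
    triples_per_16gb = 500_000_000 if free_form else 1_000_000_000
    return 16.0 * triples / triples_per_16gb
```

One billion free-form triples come out at 32 GB, against 16 GB for LUBM-like data.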

The count of quads has little impact on performance as long as the working set fits in memory. If the working set is in memory, there may be 15-20% difference between a million and a billion triples. If the database must frequently go to disk, this degrades performance since one can easily do 2000 random accesses in memory in the time it takes to do one random access from disk. But working-set characteristics depend entirely on the application.
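To see why the working set matters so much, the sketch below computes the average cost of a random access relative to a pure in-memory access, taking the 2000:1 ratio quoted above as given. The function name and the simple two-level model are assumptions made for illustration.

```python
DISK_TO_MEMORY_RATIO = 2000  # one disk access costs about 2000 memory accesses


def relative_access_cost(cache_hit_ratio: float) -> float:
    """Average cost of a random access, in units of one in-memory access."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - cache_hit_ratio
    return cache_hit_ratio * 1.0 + miss_ratio * DISK_TO_MEMORY_RATIO
```

Even a 99% hit ratio makes the average access roughly 21 times slower than a fully cached working set, which is why the size of the working set matters far more than the raw triple count.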

Whether the quads in a store all belong to one graph or any number of graphs makes no difference. There are Virtuoso instances in regular online use with hundreds of millions of triples, such as the DBpedia database.

What is the performance impact of going from a billion to a trillion triples?

Performance dynamics change when going from a single server to a cluster. If each partition is around a billion triples in size, then the single triple lookup takes the same time, but there is cluster interconnect latency added to the mix.

On the other hand, queries that touch multiple partitions or multiple triples in a partition will do this in parallel and usually with a single message per partition. Thus throughput is higher.

In general terms, operations on a single triple at a time from a single thread are penalized and operations on hundreds or more triples at a time win. Multiuser throughput is generally better due to more cores and more memory, and latency is absorbed by having large numbers of concurrent requests.

See Post about Virtuoso & SPARQL scalability.

Do you support additional metadata for triples, such as time-stamps, security tags etc?

Since quads (triple plus graph) are stored in a regular SQL table with special data types, changing the table layout to add a column is possible. This column would not, however, be visible to SPARQL without some extra tuning. For coarse-grained provenance and security information, we recommend doing this at the graph level, so that triples that belong together and carry the same provenance or security tag are in the same graph. The graph can then have the relevant metadata as its properties.

If tagging at the single triple level is needed, this will most often not be needed for all triples. Hence altering the table for all triples may not be the best choice. Making a special table that has the graph, subject, predicate and object of the tagged triple as a key and the tag data as a dependent part may be more efficient. Also, this table could be more easily accessed from SPARQL.
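The shape of such a side table can be sketched as below. The example uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the table name, column names, and text column types are hypothetical and do not reflect Virtuoso's own RDF data types or DDL.

```python
import sqlite3


def make_tag_store():
    """Create an in-memory table keyed by (graph, subject, predicate, object)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE triple_tag (
            g   TEXT NOT NULL,
            s   TEXT NOT NULL,
            p   TEXT NOT NULL,
            o   TEXT NOT NULL,
            tag TEXT NOT NULL,          -- dependent part: the tag data
            PRIMARY KEY (g, s, p, o)    -- key: the tagged triple itself
        )
        """
    )
    return conn


def tag_triple(conn, g, s, p, o, tag):
    """Attach (or overwrite) the tag for one triple."""
    conn.execute(
        "INSERT OR REPLACE INTO triple_tag VALUES (?, ?, ?, ?, ?)",
        (g, s, p, o, tag),
    )


def tag_of(conn, g, s, p, o):
    """Return the tag for a triple, or None if the triple is untagged."""
    row = conn.execute(
        "SELECT tag FROM triple_tag WHERE g = ? AND s = ? AND p = ? AND o = ?",
        (g, s, p, o),
    ).fetchone()
    return row[0] if row else None
```

Only tagged triples occupy rows here, so the main quad table stays unchanged for the untagged majority.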

Using the RDF reification vocabulary is not recommended as a first choice but is possible without any alterations.

Alterations of this nature are possible but we recommend contacting us for specifics. We can provide consultancy on the best way to do this for each application. Altering the storage layout without some extra support from us is not recommended.

Should we use RDF for our large metadata store? What are the alternatives?

If the application has high heterogeneity of schema and frequent need for adaptation, then RDF is recommended. The alternative is making a relational database.

Making a custom non-RDF object-attribute-value representation on Virtuoso or some other RDBMS is possible but not recommended.

The reason for this is that this would miss many of the optimizations made specifically for RDF, use of the SPARQL language, inference, compatibility with diverse browsers and front end tools, etc. Not to mention interoperability and joinability with the body of linked data. Even if the application is strictly private, using entity names and ontologies from the open world can still have advantages.

If some customization to the quad (triple plus graph) layout is needed, we can provide consultancy on how to do this while staying within the general RDF framework and retaining all the interoperability benefits.

How multithreaded is Virtuoso?

All server and client components are multithreaded, using pthreads on Unix/Linux and native Windows threads on Windows. Multithread/multicore scalability is good; see BSBM.

In the case of Virtuoso Cluster, in order to have the maximum number of threads on a single query, we recommend that each server on the cluster be running one Virtuoso process per 1.2 cores.

Can multiple servers run off a single shared disk database?

This might be possible with some customization but this is not our preferred way. Instead, we can store selected indices in duplicate or more copies inside a clustered database. In this way, all servers can have their own disk. Each key of each index will belong to one partition but each partition will have more than one physical copy, each on a different server. The cluster query logic will perform the load balancing. On the update side, the cluster will automatically do a distributed transaction with two phase commit to keep the duplicates in sync.

Can Virtuoso run on a SAN?

Yes. Unlike Oracle RAC, for example, Virtuoso Cluster does not require a SAN. Each server has its own database files and is solely responsible for these. In this way, having shared disk among all servers is not required. Running on a SAN may still be desirable for administration reasons. If using a SAN, the connection to the SAN should be high performance, such as Infiniband.

How does Virtuoso join across partitions?

Partitioning is entirely transparent to the application. Virtuoso has a highly optimized message-flow between cluster nodes that combines operations into large batches and evaluates conditions close to the data. See Post about Virtuoso & Web Scale RDF.

Does Virtuoso support federated triple stores? If there are multiple SPARQL end points, can Virtuoso be used to do queries joining between these?

This is a planned extension. The logic for optimizing message flow between multiple end-points on a wide-area network is similar to the logic for message-optimization on a cluster. This will allow submitting a query with a list of end-points. The query will then consider triples from each of the end points, as if the content of all the end points were in a single store.

End-point meta information, such as voiD descriptions of the graphs in the end-points, may be used to avoid sending queries to end points that are known not to have a certain type of data.

How many servers can a cluster contain?

There is no fixed limit. If you have a large cluster installed, you can try Virtuoso there. Having an even point-to-point latency is desirable.

How do I reconfigure a cluster, adding and removing machines, etc?

We are working on a system whereby servers can be added and removed from a cluster during operation and no repartitioning of the data is needed.

In the first release, the number of server processes that make up the cluster is set when creating the database. These processes with their database files can then be moved between machines but this requires stopping the cluster and updating configuration files.

How will Virtuoso handle regional clusters?

Performance of a cluster depends on the latency and bandwidth of the interconnect. At least dual 1Gbit ethernet is recommended for each node. Thus a cluster should be on a single local or system area network.

If regional copies are needed, we would replicate between clusters by asynchronous log shipping. This requires some custom engineering.

When a transaction is committed at one site, it is logged and sent to the subscribing sites if they are online. If there is no connection, the subscribing sites will get the data from the log. This scheme now works between single Virtuoso servers, and needs some custom development to be adapted to clusters.

If replicating all the data of one site to another site is not possible, then application logic should be involved. Also, if consolidated queries should be made against large, geographically-separated clusters, then it is best to query them separately and merge the results in the application. Everything depends on the application-level rules about where data resides.
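Merging in the application can be as simple as concatenating the result sets and dropping duplicates. The sketch below assumes each cluster's result has already been fetched as a list of tuples; how the rows are fetched is left out, and the de-duplication step is a design choice that suits DISTINCT-style queries rather than a requirement.

```python
def merge_results(*result_sets):
    """Merge rows from several result sets, keeping the first copy of each row.

    Each result set is an iterable of hashable rows (for example tuples).
    Row order follows the order of the result sets given.
    """
    seen = set()
    merged = []
    for rows in result_sets:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged
```

Queries with aggregation or ordering need a further application-side step after the merge, since each cluster only sees its own data.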

Is there a mechanism for terminating long running queries?

Virtuoso SPARQL and SQL offer an "anytime" option that will return partial results after a configurable timeout.

In this way, queries will return in a predictable time and indicate whether the results are complete or not, as well as give a summary of resource utilization.

This is especially useful for publishing a SPARQL endpoint where a single long running query could impact the performance of the whole system. This timeout significantly reduces the risk of denial of service.

This is also more user-friendly than simply timing-out a query after a set period and returning an error. With the anytime option, the user gets a feel for what data may exist, including whether any data exists at all. This feature works with arbitrarily complex queries, including aggregation, GROUP BY, ORDER BY, transitivity, etc.
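The idea behind the anytime option can be illustrated with a generic sketch: consume rows until a deadline passes, then return what has been gathered along with a completeness flag. This is not Virtuoso's implementation or API; the function name and the injectable clock are assumptions made so the example is self-contained and testable.

```python
import time


def run_anytime(row_source, timeout_s, clock=time.monotonic):
    """Collect rows from `row_source` until done or until the timeout passes.

    Returns (rows, complete), where `complete` is False when the deadline
    cut the evaluation short and the rows are only a partial result.
    """
    deadline = clock() + timeout_s
    rows = []
    for row in row_source:
        if clock() >= deadline:
            return rows, False  # partial result, flagged as incomplete
        rows.append(row)
    return rows, True
```

The caller always gets an answer within roughly the configured time and can tell from the flag whether to treat it as final.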

Since the Virtuoso SPARQL endpoint supports open authentication (OAuth), the authentication can be used for setting timeouts, so as to give different service to different users.

It is also possible to set a timeout that will simply abort a query or an update transaction if it fails to terminate in a set time.

Disconnecting the client from the server will also terminate any processing on behalf of that client, regardless of timeout settings.

The SQL client call-level interfaces (ODBC, JDBC, OLE-DB, ADO.NET, XMLA) each support a cancel call that can terminate a long running query from the application, without needing to disconnect.

Can the user be asynchronously notified when a long running query terminates?

There is no off-the-shelf API for this but making an adaptation of the SPARQL endpoint that could proceed after the client disconnected and, for example, could send results by email is trivial. Since SOAP and REST Web services can be programmed directly in Virtuoso's stored procedure language, implementing and exposing this type of application logic is easy.

How many concurrent queries can Virtuoso handle?

There is no set limit. As with any DBMS, response times get longer if there is severe congestion.

For example, having 2 or 3 concurrent queries per core is a good performance point which will keep all parts of the system busy. Having more than this is possible but will not increase overall throughput.

With a cluster, each server has both HTTP and SQL listeners, so clients can be evenly spread across all nodes. In a heavy traffic Web application, it is best to have a load balancer in front of the HTTP endpoints to divide the connections among the servers and to keep some cap on the number of concurrently running requests, enforcing a maximum request-rate per client IP address, etc.
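A cap on concurrently running requests, sized from the queries-per-core guideline above, could be sketched like this on the application or load-balancer side. The class and method names are hypothetical, and the default of 3 per core simply follows the "2 or 3" figure quoted earlier.

```python
import threading


class QueryGate:
    """Admit at most `cores * per_core` requests at the same time."""

    def __init__(self, cores: int, per_core: int = 3):
        self.limit = cores * per_core
        self._slots = threading.BoundedSemaphore(self.limit)

    def try_enter(self) -> bool:
        """Take a slot without blocking; return False when the cap is reached."""
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        """Give the slot back once the request has finished."""
        self._slots.release()
```

A request that fails `try_enter()` can be queued or rejected by the front end instead of adding to congestion on the database.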

What is the relative performance of SPARQL queries vs native relational queries?

This is application dependent. In Virtuoso, SPARQL and SQL share the same query execution engine, query optimizer, and cost model. If data is highly regular (i.e., a good fit for relational representation), and if queries typically access most of the row, then SQL will be more efficient. If queries are unpredictable, data is ragged, schema changes frequent, or inference is needed, then RDF will do relatively better.

The recent Berlin SPARQL Benchmark shows some figures comparing Virtuoso SQL, Virtuoso SPARQL, and SPARQL in front of a relational representation. However, the test workload is heavily biased in favor of relational. See also BSBM: MySQL vs Virtuoso.

With the TPC-H workload, relationally stored data, and SPARQL mapped to SQL, we find that with about half the queries there is no significant cost to SPARQL. With some queries there is additional overhead because the mapping does not produce a SQL query identical to that specified in the benchmark.

Does Virtuoso Support Property Tables?

For large applications, we would recommend RDF whenever there is significant variability of schema, but would still use an application-specific, relational style representation for those parts of the data that are regular in format. This is possible without loss of flexibility for the variable-schema part. However, this will introduce relational-style restrictions on the regular data; for example, a person could only have a single date-of-birth by design. In many cases, such restrictions are quite acceptable. Querying will still take place in SPARQL, and the representation will be transparent.

A relational table where the primary key is the RDF subject and where columns represent single-valued properties is usually called a property table. These can be defined in a manner similar to defining RDF mappings of relational tables.

What performance metrics does Virtuoso offer?

There is an extensive array of performance metrics. This includes:

Cluster status summary with thread counts, CPU utilization, interconnect traffic, clean and dirty cache pages, virtual memory swapping warning, etc. This is either a cluster total or a total with breakdown per cluster node.
Disk access, lock contention, general concurrency, and access count per index
Statistics on memory usage for disk caching index-by-index, cache replacement statistics, disk random and sequential read times
Count of random, sequential index access, disk access, lock contention, cluster interconnect traffic per query/client
Detailed query-execution plans are available through the explain function

What support do you provide for concurrent/multithreaded operation? Is your interface thread-safe?

All client interfaces and server-side processes are multithreaded. As usual, each thread of an application should use a different connection to the database.

What level of ACID properties is supported?

Virtuoso supports all 4 isolation levels from dirty read to serializable, for both relational and RDF data.

The recommended default isolation is read-committed, which offers a clean historical read of data that has uncommitted updates. This mode is similar to the Oracle default isolation and guarantees that no uncommitted data is seen, and that no read will block waiting for a lock held by another client.

There is transaction logging and roll forward recovery, with two phase commit used in Virtuoso Cluster if an update transaction modifies more than one server.

For RDF workloads which typically are not transactional and have large bulk loads, we recommend running in a "row autocommit" mode without transaction logging. This virtually eliminates log contention but still guarantees consistent results of multithreaded bulk loads.

Setting this up requires some consultancy and custom development but is well worthwhile for large projects.

Do you provide the ability to atomically add a set of triples, where either all are added or none are added?

Yes. Doing this with millions of triples per transaction may run out of rollback space. Also, there is risk of deadlock if multiple such inserts run at the same time. For good concurrency, the inserts should be of moderate size. As usual, deadlocks are resolved by aborting one of the conflicting transactions.
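Keeping inserts to a moderate size usually means splitting a large load into batches and committing each batch as its own transaction. The sketch below shows only the splitting step; the function name is hypothetical and the transaction calls are left out because they depend on the client API. Note that each batch is then atomic on its own, while the load as a whole is not.

```python
from itertools import islice


def batches(triples, batch_size):
    """Yield lists of at most `batch_size` triples from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(triples)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk
```

A suitable batch size depends on available rollback space and on how many loaders run concurrently, so it is best found by experiment.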

Do you provide the ability to add a set of triples, respecting the isolation property (so concurrent accessors either see none of the triple values, or all of them)?

Yes. The reading client should specify serializable isolation, and the inserting client should perform the insert as a single transaction, not in row autocommit mode.

What is the time to start a database, create/open a graph?

Starting a Virtuoso server process takes a few seconds. Making a new graph takes no time beyond the time to insert the triples into it. Once the server process(es) are running, all the data is online.

With high-traffic applications, reaching cruising speed may sometimes take a long time, especially if the load is random-access intensive. Filling gigabytes of RAM with cached disk pages takes a long time if done a page at a time. To alleviate this, Virtuoso pre-reads 2MB-sized extents instead of single pages if there is repeated access to the same extent within a short time. Thus cache warm-up times are shortened.
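The benefit of reading whole extents can be estimated by counting read operations. The sketch below assumes an 8 KB database page, which is not stated in this FAQ and is used here only as an illustrative figure; the function name is hypothetical as well.

```python
PAGE_BYTES = 8 * 1024           # assumed page size, for illustration only
EXTENT_BYTES = 2 * 1024 * 1024  # 2MB extent, as described above


def reads_to_fill(cache_bytes: int, read_bytes: int) -> int:
    """Number of read operations needed to fill a cache of `cache_bytes`."""
    if read_bytes <= 0:
        raise ValueError("read_bytes must be positive")
    return -(-cache_bytes // read_bytes)  # ceiling division
```

Under these assumptions, filling an 8 GB cache takes 1,048,576 single-page reads but only 4,096 extent reads, a 256-fold reduction in read operations.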

What sort of security features are built into Virtuoso?

For SQL, we have the standard role-based security and an Oracle-style row-level security (policy) feature.

For SPARQL, users may have read or update roles at the level of the quad store.

With RDF, a graph may be owned by a user. The user may specify read and write privileges on the graph. These are then enforced for SPARUL (the SPARQL update language) and SPARQL.

When an RDF graph is based on relationally stored data in Virtuoso or another RDBMS through Virtuoso's SQL federation feature (i.e., if the graph is an RDF View of underlying SQL data), then all relational security controls apply.

Further, due to the dual-nature of Virtuoso, sophisticated ontology-based security models are feasible. Such models are not currently used by default, but they are achievable with our consultancy.