
Contents

  1. Introduction
  2. Benchmark Datasets
  3. Benchmark Machine
  4. Benchmark Results for the Explore Use Case
    1. BigData
    2. BigOwlim
    3. TDB
    4. Virtuoso6 & Virtuoso7
  5. Benchmark Results for the BI Use Case
    1. BigData
    2. BigOwlim
    3. TDB
    4. Virtuoso6 & Virtuoso7
  6. Benchmark Results for the Cluster Edition
    1. BigOwlim
    2. Virtuoso7
  7. Store Comparison
  8. Thanks


Document Version: 0.9
Publication Date: 04/22/2013


 

1. Introduction

The Berlin SPARQL Benchmark (BSBM) is a benchmark for comparing the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and consumers have posted reviews about products.

The data generator and test driver used here are an updated version of the original BSBM tools (http://sf.net/projects/bibm), with several modifications to the test driver and the data generator. These changes have been adopted in the official BSBM V3.1 benchmark definition. The changes are as follows:

  • The test driver reports more metrics, and more detailed ones, including "power" and "throughput" scores.
  • The test driver has a drill-down mode that starts at a broad product category and then narrows down to smaller categories in subsequent queries. Previously, the product category query parameter was picked randomly for each query: if it was a broad category, the query would be very slow; if it was a very specific category, the query would run very fast. This made it hard to compare individual query runs and also introduced large variation in the overall result metric. The drill-down mode makes the results more stable and also tests a query pattern (drill-down) that is common in practice.
  • One query (BI Q6), which returned a quadratic result, was removed. This query would become very expensive in the 1B-triple and larger tests, so its performance would dominate the result.
  • The text in the generated strings is more realistic, so (more) sensible keyword queries can be run on it.
  • The new generator supports parallel data generation: it can be told to generate only a subset of the data files. By starting data generators on multiple machines, data generation can thus be parallelized by hand. This is quite handy for the larger tests, which would otherwise literally take weeks.
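The drill-down idea can be illustrated with a toy sketch. It is not the BIBM test driver's code: the category tree and the names `CATEGORY_TREE` and `drill_down_path` are hypothetical, and the real driver picks categories from the product-type hierarchy produced by the data generator. Each step moves from the current category to a randomly chosen subcategory and stops at a leaf; a fixed seed makes the path reproducible.

```python
import random

# Hypothetical product-type hierarchy (parent -> subcategories).
# The real hierarchy comes from the BSBM data generator; these names are made up.
CATEGORY_TREE = {
    "ProductType1": ["ProductType2", "ProductType3"],
    "ProductType2": ["ProductType4", "ProductType5"],
    "ProductType3": ["ProductType6"],
    "ProductType4": [],
    "ProductType5": [],
    "ProductType6": [],
}


def drill_down_path(tree, root, steps, rng):
    """Start at a broad category and zoom into a random subcategory at each step.

    Returns the list of visited categories; stops early at a leaf category.
    """
    path = [root]
    current = root
    for _ in range(steps):
        children = tree.get(current, [])
        if not children:
            break  # leaf category: nothing narrower to zoom into
        current = rng.choice(children)
        path.append(current)
    return path


if __name__ == "__main__":
    # A fixed seed gives a reproducible sequence of parameters, as with -seed.
    rng = random.Random(9834533)
    print(drill_down_path(CATEGORY_TREE, "ProductType1", steps=3, rng=rng))
```

Because consecutive queries in one sequence work on nested categories, the cost of a query no longer depends on whether a single random pick happened to be broad or narrow.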


As in the original BSBM benchmark, the test driver supports both single-user and multi-user runs.

This document presents the results of an April 2013 BSBM experiment in which the Berlin SPARQL Benchmark Version 3.1 was used to measure the performance of:

  • BigData (rev. 6528 as of July 02 2012)
  • BigOwlim (version 5.2.5524), and BigOwlim (version 5.3.5777) for the cluster edition
  • TDB (version 0.9.4)
  • Virtuoso (06.04.3132-pthreads for Linux as of May 14 2012)
  • Virtuoso (07.00.3202-pthreads for Linux as of Jan 1 2013)

The stores were benchmarked with datasets of up to 150 billion triples. The dataset sizes are shown in the following table.


Setup            Use case        Datasets (million triples)
Single machine   Explore         100, 200, 1000
Single machine   BI              10, 100, 1000
Cluster          Explore & BI    10000, 50000, 150000

These results extend the state-of-the-art in various dimensions:
  • scale: this is the first time that RDF store benchmark results on such a large size have been published. The largest previously published BSBM results were on 200M triples; the 150B experiments thus mark a 750x increase in scale.
  • workload: this is the first time that results on the Business Intelligence (BI) workload are published. In contrast to the Explore workload, which features short-running "transactional" queries, the BI workload consists of queries that go through possibly billions of tuples, grouping and aggregating them (using the grouping and aggregation functionality new in SPARQL 1.1). In contrast to one year ago, we find that the majority of the RDF stores are now able to run the BI workload.
  • architecture: this is the first time that RDF store technology with cluster functionality has been publicly benchmarked. These experiments include tests using the Virtuoso7 Cluster Edition as well as the BigOwlim 5.3 Cluster Edition.


 

2. Benchmark Datasets

We ran the benchmark using the triple version of the BSBM dataset, with different dataset sizes. The datasets were generated using the BIBM data generator and fulfill the characteristics described in the BSBM specification.

Details about the benchmark datasets are summarized in the following table:

 

Number of Triples                10M        100M        200M        1B          10B          50B           150B
Number of Products               28480      284800      569600      2848000     28480000     142400000     427200000
Number of Producers              559        5623        11232       56288       563142       2815554       8446788
Number of Product Features       19180      47531       93876       167836      423832       796470        1593390
Number of Product Types          585        2011        3949        7021        22527        42129         84259
Number of Vendors                284        2838        5675        28439       284610       1421729       4264028
Number of Offers                 569600     5696000     11392000    56960000    569600000    2848000000    8544000000
Number of Reviewers              14613      145961      291923      1459584     14599162     72989573      218974622
Number of Reviews                284800     2848000     5696000     28480000    284800000    1424000000    4272000000
Exact Total Number of Triples*   10119864   100062249   199945456   999700717   9967546016   49853640808   149513009920
File Size Turtle (unzipped)      467 MB     4.6 GB      9.2 GB      48 GB       568 GB       2.8 TB        8.6 TB

(*: Because the 10B, 50B and 150B datasets were generated in parallel on 8 machines, their triple counts are approximations, computed by multiplying the number of triples generated on one machine by 8.)
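The approximation described in the footnote is a single multiplication. A minimal sketch (the function names are illustrative), which also inverts it to show the per-machine count implied by a reported total, assuming every machine produced the same number of triples:

```python
MACHINES = 8  # machines used for parallel generation of the 10B, 50B and 150B datasets


def estimate_total_triples(triples_on_one_machine, machines=MACHINES):
    """Approximate dataset size: one machine's triple count times the machine count."""
    return triples_on_one_machine * machines


def implied_triples_per_machine(total, machines=MACHINES):
    """Per-machine triple count implied by a reported (approximate) total."""
    if total % machines:
        raise ValueError("total is not a multiple of the machine count")
    return total // machines


if __name__ == "__main__":
    per_machine = implied_triples_per_machine(149513009920)  # 150B dataset
    print(per_machine, estimate_total_triples(per_machine))
```

For the 150B dataset, the reported total implies 18,689,126,240 triples per machine.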

Note: All datasets were generated with the -fc option for forward chaining.

The BSBM dataset generator and test driver can be downloaded from SourceForge.

The RDF representation of the benchmark datasets can be generated in the following way:

To generate the 100M dataset as a Turtle file, type the following command in the BSBM directory:

./generate -fc -s ttl -fn dataset_100M -pc 284826 -pareto


To generate the 150B dataset as 1000 Turtle files on multiple machines (e.g., 8 machines producing 125 files each),
type the following command in the BSBM directory on each machine:

./generate -fc -s ttl -fn dataset150000m -pc 427200000 -nof 1000 -nom 8 -mId <machineID> -pareto

(<machineID> is 1, 2, 3, …, 8, depending on which of the 8 machines the command is run on.)


Variations:

* Generate N-Triples instead of Turtle:

use -s nt instead of -s ttl

* Generate an update dataset for the Explore and Update use case:

add -ud

* Generate multiple files instead of one, for example 100 files:

add -nof 100

* Write test driver data to a different directory (default is td_data), for example for the 100M dataset:

add -dir td_data_100M
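The options above combine into a single command line. The sketch below is a hypothetical helper (`generate_command` is not part of BIBM) that assembles `./generate` invocations using only the flags documented in this section. It assumes the generator accepts the flags in this order and keeps `-pareto` last, as in the examples; `shlex.join` requires Python 3.8 or later.

```python
import shlex


def generate_command(file_name, product_count, fmt="ttl", update_dataset=False,
                     num_files=None, num_machines=None, machine_id=None,
                     td_dir=None):
    """Assemble a ./generate command line from the options described above.

    Hypothetical helper: always adds -fc (forward chaining) and -pareto, as in
    the examples in this section.
    """
    args = ["./generate", "-fc", "-s", fmt, "-fn", file_name, "-pc", str(product_count)]
    if num_files is not None:
        args += ["-nof", str(num_files)]  # split the output into several files
    if num_machines is not None:
        # Parallel generation: this machine produces only its share of the files.
        if machine_id is None or not 1 <= machine_id <= num_machines:
            raise ValueError("machine_id must be between 1 and num_machines")
        args += ["-nom", str(num_machines), "-mId", str(machine_id)]
    if update_dataset:
        args.append("-ud")  # update dataset for the Explore and Update use case
    if td_dir is not None:
        args += ["-dir", td_dir]  # test driver data directory (default: td_data)
    args.append("-pareto")
    return shlex.join(args)


if __name__ == "__main__":
    print(generate_command("dataset_100M", 284826))
    for machine in range(1, 9):
        print(generate_command("dataset150000m", 427200000, num_files=1000,
                               num_machines=8, machine_id=machine))
```

The loop at the end prints the eight per-machine commands for the 150B dataset; each machine then writes its 125-file share.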
 

 

3. Benchmark Machine

We used the CWI Scilens cluster (www.scilens.org) for the benchmark experiments. This cluster is designed for high I/O bandwidth and consists of multiple layers of machines. To get large amounts of RAM, we used only the “bricks” layer, which contains its most powerful machines. The machines were connected by Mellanox MCX353A-QCBT ConnectX3 VPI HCA cards (QDR IB 40Gb/s and 10GigE) through an InfiniScale IV QDR InfiniBand Switch (Mellanox MIS5025Q). Each machine has the following specification:

  • Hardware: (8 machines)
    • Processors: 2 x Intel(R) Xeon(R) CPU E5-2650, 2.00GHz (8 cores & hyperthreading), Sandy Bridge architecture
    • Memory: 256GB
    • Hard Disks: 3 x 1.8TB (7,200 rpm) SATA in RAID 0 (180MB/s sequential throughput).
  • Software:
    • Operating System: Linux version 3.3.4-3.fc16.x86_64
      • Filesystem: ext4
    • Java Version and JVM: Version 1.6.0_31, 64-Bit Server VM (build 20.6-b01).
    • BSBM generator and test driver version: bibm-0.7.8

The total cost of this configuration was EUR 70,000 when acquired in 2012.


 

4. Benchmark Results for the Explore Use Case

This section reports the results of running the Explore use case of the BSBM benchmark against:

  • BigData (rev. 6528)
  • BigOwlim (version 5.2.5524)
  • TDB (version 0.9.4)
  • Virtuoso6 (06.04.3132-pthreads for Linux as of May 14 2012)
  • Virtuoso7 (07.00.3202-pthreads for Linux as of Jan  1 2013)

Test Procedure

The load performance of the systems was measured by loading the Turtle representation of the BSBM datasets into the triple stores. The loaded datasets were forward chained and contained all rdf:type statements for product types. Thus the systems under test did not have to do any inferencing.

The query performance of the systems was measured by running 500 BSBM query mixes against the systems over the SPARQL protocol. The test driver and the system under test (SUT) ran on the same machine to reduce the influence of network latency. To measure the sustainable performance of the SUTs, a large number of warm-up query mixes (2000 for the single-client run) were executed as a ramp-up period before the actual single-client test run. Drill-down mode was used for all tests.

We applied the following test procedure to each store:

  1. Load data into the store.
  2. Shut down the store (optionally clearing OS caches and swap) and restart it.
  3. Execute single-client test run (500 mixes performance measurement, randomizer seed: 9834533) with 2000 warm-up runs.
    ./testdriver -seed 9834533 -w 2000 -runs 500 -drill -o result_single.xml http://sparql-endpoint

  4. Execute multi-client runs (4, 8 and 64 clients; randomizer seeds: 8188326, 9175932 and 4187411). For each run, use twice as many warm-up query mixes as there are clients.
    For example, for a run with 4 clients execute:

    ./testdriver -seed 8188326 -w 8 -mt 4 -drill -o results_4clients.xml http://sparql-endpoint

The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.
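The procedure above is regular enough to generate. The sketch below builds the test driver command lines from the seeds and client counts listed in the steps, with the warm-up count set to twice the number of clients; the helper names are hypothetical and only the flags shown in the commands above are used. The endpoint URL is the placeholder from the text.

```python
import shlex

# Seeds and client counts as listed in the test procedure above.
SINGLE_CLIENT_SEED = 9834533
MULTI_CLIENT_SEEDS = {4: 8188326, 8: 9175932, 64: 4187411}


def single_client_command(endpoint, warmups=2000, runs=500, use_case=None,
                          output="result_single.xml"):
    """Single-client measurement run with a warm-up (ramp-up) phase."""
    args = ["./testdriver", "-seed", str(SINGLE_CLIENT_SEED)]
    if use_case is not None:
        args += ["-uc", use_case]  # e.g. "bsbm/bi" for the BI use case
    args += ["-w", str(warmups), "-runs", str(runs), "-drill", "-o", output, endpoint]
    return shlex.join(args)


def multi_client_command(endpoint, clients, use_case=None):
    """Multi-client run: warm-up mixes = 2 x number of clients."""
    seed = MULTI_CLIENT_SEEDS[clients]  # KeyError for client counts not in the plan
    args = ["./testdriver", "-seed", str(seed)]
    if use_case is not None:
        args += ["-uc", use_case]
    args += ["-w", str(2 * clients), "-mt", str(clients), "-drill",
             "-o", f"results_{clients}clients.xml", endpoint]
    return shlex.join(args)


if __name__ == "__main__":
    endpoint = "http://sparql-endpoint"
    print(single_client_command(endpoint))
    for n in sorted(MULTI_CLIENT_SEEDS):
        print(multi_client_command(endpoint, n))
```

Passing use_case="bsbm/bi" together with warmups=25 and runs=10 gives the commands of the BI test procedure in Section 5.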

An overview of the load times for the SUTs and the different datasets is given in the following table (in hh:mm:ss):

SUT         10M        100M       200M       1B
BigData     00:02:39   00:25:35   00:59:25   -
BigOwlim    00:02:31   00:22:47   00:47:19   04:09:39
TDB         00:09:41   01:37:55   03:34:59   -
Virtuoso6   00:07:06   00:19:26   00:31:30   01:10:30
Virtuoso7   -          00:03:09   -          00:27:11

* The 10M, 100M, 200M and 1B datasets were split into 1, 10, 20 and 100 Turtle files, respectively.
- : this dataset was not loaded/tested with this SUT.
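Load times are easier to compare across dataset sizes as load rates. The sketch below converts an h:m:s load time into triples per second using the exact triple counts from the dataset table in Section 2; it assumes each load time covers loading the whole dataset, and the helper names are illustrative.

```python
# Exact triple counts from the dataset table in Section 2.
TRIPLES = {"10M": 10119864, "100M": 100062249, "200M": 199945456, "1B": 999700717}


def hms_to_seconds(text):
    """Parse an 'h:m:s' load time such as '4:9:39' or '00:03:09' into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def triples_per_second(dataset, load_time):
    """Average load rate, assuming the time covers loading the whole dataset."""
    return TRIPLES[dataset] / hms_to_seconds(load_time)


if __name__ == "__main__":
    print(f"BigOwlim 1B:  {triples_per_second('1B', '04:09:39'):,.0f} triples/s")
    print(f"Virtuoso7 1B: {triples_per_second('1B', '00:27:11'):,.0f} triples/s")
```

For example, BigOwlim's 1B load works out to roughly 66,700 triples per second and Virtuoso7's to roughly 613,000.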

4.1 BigData


  BigData homepage

4.1.1 Configuration

The following changes were made to the default configuration of the software:

  • BigData: Version rev. 6528
The bibm3 directory was copied into the bigdata-perf directory, and the ANT script in "bigdata-perf/bibm3" was used to load the data and start the server.


4.1.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M       200M
00:25:35   00:59:25



4.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


           100M      200M
Query 1    49.955    49.520
Query 2    42.769    43.713
Query 3    37.280    38.355
Query 4    36.846    36.830
Query 5    2.684     1.799
Query 7    16.172    16.548
Query 8    37.498    38.721
Query 9    59.524    61.476
Query 10   41.326    42.427
Query 11   62.375    63.784
Query 12   50.989    52.094

 4.1.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



Clients    1           4           8           64
100M       12512.278   17949.632   19574.007   20422.626
200M       10059.940   9762.856    11572.433   12935.595
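QMpH relates the number of completed query mixes to the elapsed measurement time. Assuming it is computed as completed mixes divided by wall-clock hours (in multi-client runs, counting the mixes of all clients together), a minimal sketch of the conversion in both directions, with illustrative function names:

```python
SECONDS_PER_HOUR = 3600.0


def qmph(completed_mixes, elapsed_seconds):
    """Query mixes per hour: completed mixes scaled to one hour of run time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return completed_mixes * SECONDS_PER_HOUR / elapsed_seconds


def seconds_per_mix(qmph_value):
    """Average time between completed query mixes implied by a QMpH figure."""
    return SECONDS_PER_HOUR / qmph_value


if __name__ == "__main__":
    # BigData, 100M, single client (table above).
    per_mix = seconds_per_mix(12512.278)
    print(round(per_mix, 3), round(500 * per_mix))
```

For BigData on the 100M dataset, 12512.278 QMpH corresponds to about 0.29 seconds per mix, so the 500 measured mixes took roughly 144 seconds.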


4.1.5 Result Summaries



4.2 BigOwlim


Owlim homepage

4.2.1 Configuration

The following changes were made to the default configuration of the software:

  • BigOwlim: Version 5.2.5524
  • Tomcat: Version 7.0.30
Modified heap size:

JAVA_OPTS="-Dinfo.aduna.platform.appdata.basedir=`pwd`/data -Xmx200G "


4.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M       200M       1B
00:22:47   00:47:19   04:09:39


4.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


           100M      200M      1B
Query 1    93.773    65.385    25.128
Query 2    115.960   65.158    34.181
Query 3    170.242   61.155    26.042
Query 4    140.607   127.747   12.868
Query 5    1.868     1.199     0.198
Query 7    75.746    98.357    32.593
Query 8    93.467    193.087   60.702
Query 9    202.041   105.759   38.391
Query 10   146.327   69.411    60.357
Query 11   368.732   74.074    65.428
Query 12   244.738   197.239   61.418

 4.2.4 Benchmark Overall results: QMpH for the 100M, 200M, 1B datasets for all runs

For the 100M, 200M, 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



Clients    1           4           8           64
100M       14029.453   17184.314   11677.860   8321.202
200M       9170.083    8130.137    5614.489    5150.768
1B         1669.899    2246.865    1081.508    912.518


4.2.5 Result Summaries



  • BigOwlim 1B: number of clients: single, 4, 8, 64; download links: xml, xml

4.3 TDB


TDB homepage

Fuseki homepage

4.3.1 Configuration

The following changes were made to the default configuration of the software:

  • TDB: Version 0.9.4
Loading was done with tdbloader2

Statistics for the BGP optimizer were generated with the "tdbconfig stats" command and copied into the database directory.
  • Fuseki: Version 0.2.5
Started server with: ./fuseki-server --loc /database/tdb /bsbm


4.3.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M       200M
01:37:55   03:34:59



4.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


           100M      200M
Query 1    119.048   94.877
Query 2    158.755   151.883
Query 3    84.660    70.492
Query 4    70.912    52.759
Query 5    1.959     1.308
Query 7    196.754   184.349
Query 8    228.258   199.362
Query 9    355.999   319.489
Query 10   297.619   267.094
Query 11   483.092   450.045
Query 12   204.834   192.901

 4.3.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



Clients    1           4           8           64
100M       15381.857   19036.097   24646.705   14838.483
200M       10573.858   9540.452    18610.896   8265.151


4.3.5 Result Summaries



4.4 Virtuoso6 & Virtuoso7


Virtuoso homepage

4.4.1 Configuration

The following changes were made to the default configuration of the software:

  • Virtuoso6: Version 06.04.3132-pthreads for Linux as of May 14 2012
  • Virtuoso7: Version 07.00.3202-pthreads for Linux as of Jan    1 2013
Loading of datasets:

Loading was done by running multiple loader processes in parallel (each calling the rdf_loader_run() function).
For the 100M, 200M and 1B datasets, 10, 20 and 100 files were generated, respectively.

For the configuration, see the "virtuoso.ini" file.
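As a hedged illustration of this parallel loading, the sketch below assembles the statements for the Virtuoso bulk loader: `ld_dir()` registers the files to load, each concurrently running isql session then calls `rdf_loader_run()`, and a final `checkpoint` persists the loaded data. The `ld_dir()` step, the directory, the graph IRI and the number of loader sessions are assumptions not stated in this document, and the helper names are hypothetical.

```python
def sql_string(value):
    """Quote a value as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def bulk_load_plan(data_dir, graph_iri, loaders, file_mask="*.ttl"):
    """Return (registration, per-session loader calls, final statement).

    Each entry of the second element is meant to run in its own concurrent
    isql session; all sessions take files from the list registered by ld_dir.
    """
    if loaders < 1:
        raise ValueError("at least one loader session is required")
    register = (f"ld_dir ({sql_string(data_dir)}, {sql_string(file_mask)}, "
                f"{sql_string(graph_iri)});")
    sessions = ["rdf_loader_run ();" for _ in range(loaders)]
    return register, sessions, "checkpoint;"


if __name__ == "__main__":
    # Hypothetical directory holding the 10 files of the 100M dataset.
    register, sessions, finish = bulk_load_plan(
        "/data/bsbm_100M", "http://example.org/bsbm", loaders=4)
    print(register)
    print(len(sessions), "x", sessions[0])
    print(finish)
```

The number of loader sessions is a parameter here because the document does not state how many were used for the single-machine loads.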


4.4.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

  • Virtuoso6
100M       200M       1B
00:19:26   00:31:30   01:10:30


  • Virtuoso7
100M       1B
00:03:09   00:27:11


4.4.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

  • Virtuoso6
           100M      200M      1B
Query 1    232.234   217.865   87.245
Query 2    109.445   110.019   79.791
Query 3    180.245   174.216   119.104
Query 4    116.604   111.732   42.586
Query 5    9.976     7.168     1.201
Query 7    30.001    32.918    31.840
Query 8    117.247   124.502   127.698
Query 9    397.456   363.042   132.459
Query 10   122.926   123.487   99.433
Query 11   539.957   493.583   500.501
Query 12   220.167   215.424   207.641

  • Virtuoso7
           100M      1B
Query 1    125.786   75.324
Query 2    68.929    68.820
Query 3    117.426   62.243
Query 4    58.514    30.473
Query 5    21.182    6.064
Query 7    54.484    55.356
Query 8    93.336    97.248
Query 9    173.898   176.772
Query 10   107.968   101.678
Query 11   214.133   225.124
Query 12   126.743   137.287

 4.4.4 Benchmark Overall results: QMpH for the 100M, 200M, 1B datasets for all runs

For the 100M, 200M, 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

  • Virtuoso6

Clients    1           4           8            64
100M       37678.319   64885.747   112388.811   20647.413
200M       32969.006   31387.107   77224.941    14480.812
1B         8984.789    15637.439   14343.728    2800.053

  • Virtuoso7

Clients    1           4           8            64
100M       47178.820   91505.200   188632.144   216118.852
1B         27933.682   56714.875   79261.626    132685.957
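The multi-client figures are easier to compare as a ratio to the single-client QMpH of the same store and dataset. A small sketch (the function name is illustrative), using values copied from the table above and from Section 4.2.4:

```python
def scaling_factors(qmph_by_clients):
    """QMpH at each client count relative to the single-client QMpH."""
    baseline = qmph_by_clients[1]
    return {clients: qmph / baseline
            for clients, qmph in sorted(qmph_by_clients.items())}


# Values copied from the QMpH tables in Sections 4.4.4 and 4.2.4.
VIRTUOSO7_100M = {1: 47178.820, 4: 91505.200, 8: 188632.144, 64: 216118.852}
BIGOWLIM_100M = {1: 14029.453, 4: 17184.314, 8: 11677.860, 64: 8321.202}

if __name__ == "__main__":
    for name, row in [("Virtuoso7 100M", VIRTUOSO7_100M), ("BigOwlim 100M", BIGOWLIM_100M)]:
        factors = scaling_factors(row)
        print(name, {c: round(f, 2) for c, f in factors.items()})
```

A factor above 1 means the additional clients increased total throughput: Virtuoso7 reaches about 4.58 times its single-client QMpH with 64 clients on the 100M dataset, whereas BigOwlim's factor of about 0.59 at 64 clients means its throughput fell below the single-client level.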

4.4.5 Result Summaries

  • Virtuoso6 100M: number of clients: single, 4, 8, 64; download links: xml, xml

  • Virtuoso6 200M: number of clients: single, 4, 8, 64; download links: xml, xml


  • Virtuoso7 100M: number of clients: single, 4, 8, 64; download links: xml, xml



5. Benchmark Results for the BI Use Case


This section reports the results of running the BI use case of the BSBM benchmark against:

  • BigData (rev. 6528)
  • BigOwlim (version 5.2.5524)
  • TDB (version 0.9.4)
  • Virtuoso6 (06.04.3132-pthreads for Linux as of May 14 2012)
  • Virtuoso7 (07.00.3202-pthreads for Linux as of Jan  1 2013)

Test Procedure

The load process is the same as for the Explore use case. (See section 4)

The test procedure is similar to that for the Explore use case; however, the single-client run uses only 25 warm-up runs. Since a BI query mix touches most of the data, a few warm-up runs are enough to warm up the SUTs sufficiently for them to reach sustainable performance.
 

We applied the following test procedure to each store:

  1. Load data into the store.
  2. Shut down the store (optionally clearing OS caches and swap) and restart it.
  3. Execute single-client test run (10 mixes performance measurement, randomizer seed: 9834533) with 25 warm-up runs.
    ./testdriver -seed 9834533 -uc bsbm/bi -w 25 -runs 10 -drill -o result_single.xml http://sparql-endpoint

  4. Execute multi-client runs (4, 8 and 64 clients; randomizer seeds: 8188326, 9175932 and 4187411). For each run, use twice as many warm-up query mixes as there are clients.
    For example, for a run with 4 clients execute:

    ./testdriver -seed 8188326 -uc bsbm/bi -w 8 -mt 4 -drill -o results_4clients.xml http://sparql-endpoint

The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.


5.1 BigData


  BigData homepage

5.1.1 Configuration

The following changes were made to the default configuration of the software:

  • BigData: Version rev. 6528
The bibm3 directory was copied into the bigdata-perf directory, and the ANT script in "bigdata-perf/bibm3" was used to load the data and start the server.


5.1.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10M
00:02:39



5.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):


           10M
Query 1    0.453
Query 2    0.445
Query 3    0.300
Query 4    0.167
Query 5    1.992
Query 6    9.917
Query 7    0.006
Query 8    0.568

5.1.4 Benchmark Overall results: QMpH for the 10M dataset for all runs

For the 10M dataset we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



Clients    1          4          8          64
10M        7.290      16.222     16.439     18.812


5.1.5 Result Summaries



5.2 BigOwlim


Owlim homepage

5.2.1 Configuration

The following changes were made to the default configuration of the software:

  • BigOwlim: Version 5.2.5524
  • Tomcat: Version 7.0.30
Modified heap size:

JAVA_OPTS="-Dinfo.aduna.platform.appdata.basedir=`pwd`/data -Xmx200G "


5.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10M        100M       1B
00:02:31   00:22:47   04:09:39

5.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):


           10M       100M      1B
Query 1    1.426     0.176     0.016
Query 2    0.069     0.009     0.001
Query 3    2.540     0.105     0.002
Query 4    0.150     0.027     0.003
Query 5    1.923     0.240     0.020
Query 6    23.923    15.538    13.951
Query 7    2.232     0.369     0.040
Query 8    1.395     0.191     0.016

5.2.4 Benchmark Overall results: QMpH for the 10M, 100M and 1B datasets for all runs

For the 10M, 100M and 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



Clients    1          4          8          64
10M        121.841    265.294    177.338    218.678
100M       15.512     33.986     20.263     15.076
1B         1.400      3.465      2.323      *

(*: This 64-client run produced no errors, but it was stopped after running for more than 2 days.)


5.2.5 Result Summaries



  • BigOwlim 1B: number of clients: single, 4, 8; download links: xml, xml, xml

5.3 TDB


TDB homepage

Fuseki homepage

5.3.1 Configuration

The following changes were made to the default configuration of the software:

  • TDB: Version 0.9.4
Loading was done with tdbloader2

Statistics for the BGP optimizer were generated with the "tdbconfig stats" command and copied into the database directory.
  • Fuseki: Version 0.2.5
Started server with: ./fuseki-server --loc /database/tdb /bsbm


5.3.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10M
00:09:41


5.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):


           10M
Query 1    0.488
Query 2    0.023
Query 3    0.018
Query 4    0.140
Query 5    0.008
Query 6    16.202
Query 7    0.849
Query 8    0.018

5.3.4 Benchmark Overall results: QMpH for the 10M dataset for all runs

For the 10M dataset we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



Clients    1          4          8          64
10M        7.468      17.698     9.503      8.414


5.3.5 Result Summaries

  • TDB 10M: number of clients: single, 4, 8, 64; download links: xml, xml, xml, xml


5.4 Virtuoso6 & Virtuoso7


Virtuoso homepage

5.4.1 Configuration

The following changes were made to the default configuration of the software:

  • Virtuoso6: Version 06.04.3132-pthreads for Linux as of May 14 2012
  • Virtuoso7: Version 07.00.3202-pthreads for Linux as of Jan    1 2013
Loading of datasets:

Loading was done by running multiple loader processes in parallel (each calling the rdf_loader_run() function).
For the 100M, 200M and 1B datasets, 10, 20 and 100 files were generated, respectively.

For the configuration, see the "virtuoso.ini" file.


5.4.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

  • Virtuoso6
10M        100M       1B
00:07:06   00:19:26   01:10:30


  • Virtuoso7
100M       1B
00:03:09   00:27:11


5.4.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):

  • Virtuoso6
           10M       100M      1B
Query 1    1.469     0.118     0.009
Query 2    37.707    7.931     0.635
Query 3    0.768     0.090     0.007
Query 4    1.183     0.216     0.020
Query 5    1.920     0.240     0.009
Query 6    14.988    10.767    3.726
Query 7    9.849     1.466     0.122
Query 8    0.592     0.048     0.003
  • Virtuoso7
           100M      1B
Query 1    11.558    0.462
Query 2    28.969    2.409
Query 3    0.886     0.035
Query 4    3.773     0.644
Query 5    5.496     0.468
Query 6    18.997    10.517
Query 7    14.816    1.912
Query 8    2.512     0.215

5.4.4 Benchmark Overall results: QMpH for the 10M, 100M and 1B datasets for all runs

For the 10M, 100M and 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

  • Virtuoso6

Clients    1          4          8          64
10M        431.465    2667.657   3915.854   1401.186
100M       35.342     191.431    268.428    99.321
1B         2.383      17.777     21.457     8.355

  • Virtuoso7

Clients    1          4          8          64
100M       996.795    5644.323   6402.190   7132.212
1B         75.236     348.666    361.205    134.459

5.4.5 Result Summaries


  • Virtuoso6 100M: number of clients: single, 4, 8, 64; download links: xml, xml


  • Virtuoso7 100M: number of clients: single, 4, 8, 64; download links: xml, xml




6. Benchmark Results for the Cluster Edition

This section reports the results of running the Explore and BI use cases of the BSBM benchmark with the cluster editions of:

  • BigOwlim (version 5.3.5777)
  • Virtuoso7 (07.00.3202-pthreads for Linux as of Jan  1 2013)


Test Procedure

For the 10B-triple dataset, we applied the following test procedure to each store:
  1. Load data into the store.
  2. Shut down the store (optionally clearing OS caches and swap) and restart it.
  3. Execute single-client test run of the Explore use case (100 mixes performance measurement, randomizer seed: 9834533) with 100 warm-up runs.
    ./testdriver -seed 9834533 -uc bsbm/explore -w 100 -runs 100 -drill -o result_single.xml http://sparql-endpoint

  4. Execute multi-client runs with 8 clients of the Explore use case (randomizer seed 9175932) and 16 warm-up runs.
    ./testdriver -seed 8188326 -uc bsbm/explore -w 16 -mt 8 -drill -o results_8clients.xml http://sparql-endpoint
  5. Execute single-client test run of the BI use case (1 mix performance measurement, randomizer seed: 9834533), with no warm-up run.
    ./testdriver -seed 9834533 -uc bsbm/bi -runs 1 -drill -o result_single_bi.xml http://sparql-endpoint

  6. Execute multi-client runs with 8 clients of the BI use case (randomizer seed 9175932), with no warm-up run.
    ./testdriver -seed 8188326 -uc bsbm/bi -mt 8 -drill -o results_8clients_bi.xml http://sparql-endpoint
    


The 50B and 150B datasets were tested with the Virtuoso7 Cluster Edition only. For these datasets, no specific warm-up was used for the BI use case, and the single-user run was executed immediately following the multi-user run, which was started cold.

6.1 BigOwlim


Owlim homepage

6.1.1 Configuration

The following changes were made to the default configuration of the software:

  • BigOwlim: Version 5.3.5777
Modified the heap size and cache memory in example.sh:

-Xmx200G -Xms160G -Dcache-memory=100G


6.1.2 Load Time

We used the application in the getting-started directory to load the data.

The dataset was first generated as 100 .nt files (~100 million triples per file) and then copied to getting-started/preload for loading. For BigOwlim, the data generator was also modified so that it writes the first 100 million triples to the first file, the next 100 million triples to the second file, and so on. (Note: the original data generator writes triples to the 100 files in round-robin fashion, i.e., the first triple goes to the first file, the next triple to the second file, ..., the 100th triple to the 100th file, the 101st triple to the first file again, and so on.)

However, since we had to stop and resume the loading process many times to tune parameters and to solve problems that occurred during loading, it is hard to determine the total loading time.

After the getting-started application had finished loading, the resulting database was manually copied to each worker node. With 8 machines in the cluster, we thus had 8 replicas.
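The two file layouts described above differ only in how a triple's position maps to a file number. A sketch of both mappings, with 0-based indices (the function names are illustrative, not taken from the generator's source):

```python
def round_robin_file(triple_index, num_files):
    """Original generator: the i-th triple (0-based) goes to file i mod num_files."""
    return triple_index % num_files


def sequential_file(triple_index, triples_per_file):
    """Modified generator for BigOwlim: fill one file completely, then the next."""
    return triple_index // triples_per_file


if __name__ == "__main__":
    # 0-based indices: triple 100 is the "101st triple" from the text.
    print([round_robin_file(i, 100) for i in (0, 1, 99, 100)])  # [0, 1, 99, 0]
    print([sequential_file(i, 100_000_000) for i in (0, 99_999_999, 100_000_000)])  # [0, 0, 1]
```

With the sequential layout, each .nt file holds one contiguous block of about 100 million triples of the generator's output; with round-robin, consecutive triples are spread over all 100 files.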


6.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query in single-client runs (in QpS):
  • Explore use case
           10B
Query 1    1.262
Query 2    8.771
Query 3    0.230
Query 4    0.685
Query 5    0.010
Query 7    0.864
Query 8    2.811
Query 9    20.222
Query 10   1.763
Query 11   20.325
Query 12   35.461

  • BI use case
           10B
Query 1    0.000076
Query 2    0.00005
Query 3    0.002
Query 4    0.0003
Query 5    0.0003
Query 6    0.036
Query 7    0.001
Query 8    0.001

6.1.4 Benchmark Overall results: QMpH for the 10B dataset for all runs

For the 10B dataset we ran tests with 1 and 8 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

  • Explore use case

Clients    1          8
10B        16.506     257.399
  • BI use case

Clients    1          8
10B        0.044      0.120



6.1.5 Result Summaries

  • BigOwlim 10B Explore use case: number of clients: single, 8; download links: xml, xml

  • BigOwlim 10B BI use case: number of clients: single, 8; download links: xml, xml

6.2 Virtuoso7 cluster


Virtuoso homepage

6.2.1 Configuration

The following changes were made to the default configuration of the software:

  • Virtuoso7: Version 07.00.3202-pthreads for Linux as of Jan    1 2013
- Loading of datasets:

Data was loaded by running multiple loader processes on all cluster nodes (call cl_exec ('rdf_ld_srv ()')).

For all datasets, 1000 files were generated (125 files on each node).
This means that multiple files are read at the same time by the multiple cores of each CPU.


- The best performance was obtained with 7 loading threads per server process.
Hence, with two server processes per machine and 8 machines, 112 files were being read at the same time.
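The figure of 112 concurrently read files follows from multiplying the three factors above. The sketch below spells out that arithmetic and, under the simplifying assumption (ours, not the document's) that all files take equally long to load, the minimum number of file "rounds" that each machine's 14 loader threads go through. The function names are illustrative.

```python
import math


def concurrent_files(threads_per_process, processes_per_machine, machines):
    """Files read at the same time across the cluster (one file per loader thread)."""
    return threads_per_process * processes_per_machine * machines


def files_per_machine(total_files, machines):
    """Files assigned to each machine, assuming an even split."""
    if total_files % machines:
        raise ValueError("files are assumed to be split evenly across machines")
    return total_files // machines


def min_file_rounds(total_files, machines, threads_per_process, processes_per_machine):
    """Lower bound on sequential 'rounds' per machine, assuming equal-sized files."""
    readers = threads_per_process * processes_per_machine
    return math.ceil(files_per_machine(total_files, machines) / readers)


if __name__ == "__main__":
    print(concurrent_files(7, 2, 8))        # 112
    print(files_per_machine(1000, 8))       # 125
    print(min_file_rounds(1000, 8, 7, 2))   # 9
```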


6.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10B        50B        150B
01:05:00   06:28:00   *

*: The largest load (150B) was slowed down by one machine showing markedly lower disk write throughput than the others. On the slowest machine, iostat showed continuous disk activity of about 700 device transactions per second, writing anything from 1 to 3 MB of data per second. On the other machines, disks were mostly idle, with occasional flushing of database buffers to disk producing up to 2000 device transactions per second and 100MB/s write throughput. Since the data is evenly divided among the server processes, when 2 of the 16 processes were not runnable because the OS had too many buffered disk writes, the whole cluster could stall for up to several minutes at a stretch. We suspect that these problems were caused by a hardware malfunction. To complete the 150B load, we interrupted the stalling server processes, moved their data directories to different drives, and resumed loading. The need for manual intervention and the preceding period of very slow progress make it hard to determine the total time the 150B load took.


6.2.3 Benchmark Query results: QpS (Queries per Second)

We configured the BSBM test driver to use 4 SPARQL endpoints for these query tests, so that not all clients connect through the same machine.

The table below summarizes the query throughput for each type of query in single-client runs (in QpS):

  • Explore use case
           10B        50B        150B
Query 1    15.530     15.049     8.843
Query 2    23.725     22.005     15.107
Query 3    15.349     8.921      9.016
Query 4    10.335     6.416      3.244
Query 5    0.959      0.267      0.124
Query 7    16.496     6.451      3.873
Query 8    22.121     10.035     5.314
Query 9    86.843     92.400     87.527
Query 10   35.663     6.823      4.987
Query 11   204.918    198.413    177.936
Query 12   65.402     73.046     75.075

  • BI use case
           10B        50B        150B
Query 1    0.111      0.002      0.001
Query 2    1.511      0.005      0.005
Query 3    0.385      0.003      0.001
Query 4    0.088      0.055      0.005
Query 5    0.061      0.005      0.001
Query 6    3.802      0.021      0.041
Query 7    0.514      0.027      0.017
Query 8    0.045      0.004      0.001

6.2.4 Benchmark Overall results: QMpH for the 10B, 50B, 150B datasets for all runs

For the 10B dataset we ran tests with 1 and 8 clients; for the 50B and 150B datasets, we ran tests with 1 and 4 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

  • Explore use case

Clients    1          4          8
10B        2360.210   -          4978.511
50B        4253.157   2837.285   -
150B       2090.574   1471.032   -

  • BI use case

Clients    1          4          8
10B        13.078     -          20.554
50B        0.964      1.588      -
150B       0.285      0.480      -

6.2.5 Result Summaries

  • Virtuoso7 10B - Explore use case: number of clients: single, 8; download links: xml

  • Virtuoso7 10B - BI use case: number of clients: single, 8; download links: xml

  • Virtuoso7 50B - Explore use case: number of clients: single, 4; download links: xml

  • Virtuoso7 50B - BI use case: number of clients: single, 4; download links: xml


  • Virtuoso7 150B - Explore use case: number of clients: single, 4; download links: xml

  • Virtuoso7 150B - BI use case: number of clients: single, 4; download links: xml


7. Store Comparison

This section compares the SPARQL query performance of the different stores.

7.1 Query Mixes per Hour for Single Clients

Running 500 query mixes against the different stores resulted in the following performance numbers (in QMpH). The best performance figure for each dataset size is set in bold in the tables.


7.1.1 QMpH: Explore use case

The complete query mix is given here.


            100M        200M        1B
BigData     12512.278   10059.940   -
BigOwlim    14029.453   9170.083    1669.899
TDB         15381.857   10573.858   -
Virtuoso6   37678.319   32969.006   8984.789
Virtuoso7   47178.820   -           27933.682

A much more detailed view of the results for the Explore use case is given in section 7.3, Detailed Results For The Explore-Query-Mix Benchmark Run.
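Picking out the best figure per dataset size in the Explore table (the value set in bold) amounts to a column-wise maximum that skips missing results. A minimal sketch, assuming "-" means no result was reported and is represented as `None`; `EXPLORE_QMPH` and `best_store` are illustrative names.

```python
from typing import Dict, Optional

# Single-client Explore QMpH from the 7.1.1 table; None stands for "-" (no result).
EXPLORE_QMPH: Dict[str, Dict[str, Optional[float]]] = {
    "BigData":   {"100M": 12512.278, "200M": 10059.940, "1B": None},
    "BigOwlim":  {"100M": 14029.453, "200M": 9170.083,  "1B": 1669.899},
    "TDB":       {"100M": 15381.857, "200M": 10573.858, "1B": None},
    "Virtuoso6": {"100M": 37678.319, "200M": 32969.006, "1B": 8984.789},
    "Virtuoso7": {"100M": 47178.820, "200M": None,      "1B": 27933.682},
}


def best_store(table: Dict[str, Dict[str, Optional[float]]], dataset: str) -> Optional[str]:
    """Return the store with the highest QMpH for one dataset size, skipping missing results."""
    scored = [(store, row[dataset]) for store, row in table.items()
              if row.get(dataset) is not None]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])[0]


for size in ("100M", "200M", "1B"):
    print(size, best_store(EXPLORE_QMPH, size))
```

Skipping missing cells matters here: Virtuoso7 has no 200M result, so the 200M column is decided among the remaining four stores.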

7.1.2 QMpH: BI use case


            10M         100M        1B
BigData     7.290       -           -
BigOwlim    121.841     15.512      1.400
TDB         7.468       -           -
Virtuoso6   431.465     35.342      2.383
Virtuoso7   -           996.795     75.236

A much more detailed view of the results for the BI use case is given in section 7.4, Detailed Results For The BI-Query-Mix Benchmark Run.


7.1.3 QMpH: Cluster edition

  • Explore use case

            10B         50B         150B
BigOwlim    16.506      -           -
Virtuoso7   2360.210    4253.157    2090.574

  • BI use case

            10B         50B         150B
BigOwlim    0.044       -           -
Virtuoso7   13.078      0.964       0.285

7.2 Query Mixes per Hour for Multiple Clients



  • Explore use case

Dataset Size 100M (QMpH by number of clients)

            1 client    4 clients   8 clients   64 clients
BigData     12512.278   17949.632   19574.007   20422.626
BigOwlim    14029.453   17184.314   11677.860   8321.202
TDB         15381.857   19036.097   24646.705   14838.483
Virtuoso6   37678.319   64885.747   112388.811  20647.413
Virtuoso7   47178.820   91505.200   188632.144  216118.852

 

Dataset Size 200M (QMpH by number of clients)

            1 client    4 clients   8 clients   64 clients
BigData     10059.940   9762.856    11572.433   12935.595
BigOwlim    9170.083    8130.137    5614.489    5150.768
TDB         10573.858   9540.452    18610.896   8265.151
Virtuoso6   32969.006   31387.107   77224.941   14480.812


Dataset Size 1B (QMpH by number of clients)

            1 client    4 clients   8 clients   64 clients
BigOwlim    1669.899    2246.865    1081.508    912.518
Virtuoso6   8984.789    15637.439   14343.728   2800.053
Virtuoso7   27933.682   56714.875   79261.626   132685.957


  • BI use case

Dataset Size 10M (QMpH by number of clients)

            1 client    4 clients   8 clients   64 clients
BigData     7.290       16.222      16.439      18.812
BigOwlim    121.841     265.294     177.338     218.678
TDB         7.468       17.698      9.503       8.414
Virtuoso6   431.465     2667.657    3915.854    1401.186


Dataset Size 100M (QMpH by number of clients)

            1 client    4 clients   8 clients   64 clients
BigOwlim    15.512      33.986      20.263      15.076
Virtuoso6   35.342      191.431     268.428     99.321
Virtuoso7   996.795     5644.323    6402.190    7132.212


Dataset Size 1B (QMpH by number of clients)

            1 client    4 clients   8 clients   64 clients
BigOwlim    1.400       3.465       2.323       -
Virtuoso6   2.383       17.777      21.457      8.355
Virtuoso7   75.236      348.666     361.205     134.459

  • Cluster - Explore use case (10B only)

Dataset Size 10B (QMpH by number of clients)

            1 client    8 clients
BigOwlim    16.506      257.399
Virtuoso7   2360.210    4978.511

  • Cluster - BI use case (10B only)

Dataset Size 10B (QMpH by number of clients)

            1 client    8 clients
BigOwlim    0.044       0.120
Virtuoso7   13.078      20.554
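One way to read the multi-client tables is to compare each store against ideal linear scaling, where n clients would complete n times the single-client QMpH. The sketch below applies that reference line to the Explore 100M table, assuming (as above) that multi-client QMpH counts all mixes from all clients; `scaling_efficiency` and `peak_clients` are illustrative names, and linear scaling is only a yardstick, not a claim about what the hardware could reach.

```python
CLIENTS = (1, 4, 8, 64)

# Explore use case, 100M dataset: QMpH for 1, 4, 8 and 64 clients (table above).
EXPLORE_100M = {
    "BigData":   (12512.278, 17949.632, 19574.007, 20422.626),
    "BigOwlim":  (14029.453, 17184.314, 11677.860, 8321.202),
    "TDB":       (15381.857, 19036.097, 24646.705, 14838.483),
    "Virtuoso6": (37678.319, 64885.747, 112388.811, 20647.413),
    "Virtuoso7": (47178.820, 91505.200, 188632.144, 216118.852),
}


def scaling_efficiency(qmph_by_clients, clients=CLIENTS):
    """Multi-client QMpH divided by n times the single-client QMpH (1.0 = linear scaling)."""
    single = qmph_by_clients[0]
    return {n: q / (n * single) for n, q in zip(clients, qmph_by_clients)}


def peak_clients(qmph_by_clients, clients=CLIENTS):
    """Client count at which the store reached its highest QMpH."""
    best_index = max(range(len(clients)), key=lambda i: qmph_by_clients[i])
    return clients[best_index]


for store, values in EXPLORE_100M.items():
    eff = scaling_efficiency(values)
    print(f"{store:10s} peak at {peak_clients(values):2d} clients, "
          f"efficiency at 8 clients {eff[8]:.2f}")
```

A store whose peak falls below the largest client count (for example BigOwlim, which peaks at 4 clients) loses throughput when more clients are added.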


7.3 Detailed Results For The Explore-Query-Mix Benchmark Run

The details of running the Explore query mix are given here. There are two different views:

7.3.1 Queries per Second by Query and Dataset Size

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set in bold in the tables.


Query 1

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        49.955      93.773      119.048     232.234     125.786
200M        49.520      52.094      94.877      217.865     -
1B          -           25.128      -           87.245      75.324

Query 2

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        42.769      115.960     158.755     109.445     68.929
200M        43.713      65.158      151.883     110.019     -
1B          -           34.181      -           79.791      68.820

Query 3

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        37.280      170.242     84.660      180.245     117.426
200M        38.355      61.155      70.492      174.216     -
1B          -           26.042      -           119.104     62.243

Query 4

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        36.846      140.607     70.912      116.604     58.514
200M        36.830      127.747     52.759      111.732     -
1B          -           12.868      -           42.586      30.473

Query 5

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        2.684       1.868       1.959       9.976       21.182
200M        1.799       1.199       1.308       7.168       -
1B          -           0.198       -           1.201       6.064

Query 6

Removed.

Query 7

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        16.172      75.746      196.754     30.001      54.484
200M        16.548      98.357      184.349     32.918      -
1B          -           32.593      -           31.840      55.356

Query 8

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        37.498      93.467      228.258     117.247     93.336
200M        38.721      193.087     199.362     124.502     -
1B          -           60.702      -           127.698     97.248

Query 9

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        59.524      202.041     355.999     397.456     173.898
200M        61.476      105.759     319.489     363.042     -
1B          -           38.391      -           132.459     176.772

Query 10

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        41.326      146.327     297.619     122.926     107.968
200M        42.427      69.411      267.094     123.487     -
1B          -           60.357      -           99.433      101.678

Query 11

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        62.375      368.732     483.092     539.957     214.133
200M        63.784      74.074      450.045     493.583     -
1B          -           65.428      -           500.501     225.124

Query 12

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
100M        50.989      244.738     204.834     220.167     126.743
200M        52.094      197.239     192.901     215.424     -
1B          -           61.418      -           207.641     137.287

7.3.2 Queries per Second by Dataset Size and Query

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each query is set in bold in the tables.


100M

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
Query 1     49.955      93.773      119.048     232.234     125.786
Query 2     42.769      115.960     158.755     109.445     68.929
Query 3     37.280      170.242     84.660      180.245     117.426
Query 4     36.846      140.607     70.912      116.604     58.514
Query 5     2.684       1.868       1.959       9.976       21.182
Query 7     16.172      75.746      196.754     30.001      54.484
Query 8     37.498      93.467      228.258     117.247     93.336
Query 9     59.524      202.041     355.999     397.456     173.898
Query 10    41.326      146.327     297.619     122.926     107.968
Query 11    62.375      368.732     483.092     539.957     214.133
Query 12    50.989      244.738     204.834     220.167     126.743

200M

  BigData BigOwlim TDB Virtuoso6
Query 1 49.520 52.094 94.877 217.865
Query 2 43.713 65.158 151.883 110.019
Query 3 38.355 61.155 70.492 174.216
Query 4 36.830 127.747 52.759 111.732
Query 5 1.799 1.199 1.308 7.168
Query 7 16.548 98.357 184.349 32.918
Query 8 38.721 193.087 199.362 124.502
Query 9 61.476 105.759 319.489 363.042
Query 10 42.427 69.411 267.094 123.487
Query 11 63.784 74.074 450.045 493.583
Query 12 52.094 197.239 192.901 215.424

 

1B

  BigOwlim Virtuoso6 Virtuoso7
Query 1 25.128 87.245 75.324
Query 2 34.181 79.791 68.820
Query 3 26.042 119.104 62.243
Query 4 12.868 42.586 30.473
Query 5 0.198 1.201 6.064
Query 7 32.593 31.840 55.356
Query 8 60.702 127.698 97.248
Query 9 38.391 132.459 176.772
Query 10 60.357 99.433 101.678
Query 11 65.428 500.501 225.124
Query 12 61.418 207.641 137.287
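The views in 7.3.1 and 7.3.2 hold the same numbers grouped differently: one is keyed by query first, the other by dataset size first. Converting between them is a swap of the two outer keys. The sketch below shows this on an excerpt of the Explore data, assuming a nested-dictionary layout chosen only for illustration (`BY_QUERY`, `pivot` and `BY_SIZE` are hypothetical names).

```python
from collections import defaultdict

# Excerpt from 7.3.1, keyed query -> dataset size -> store (values in QpS).
BY_QUERY = {
    "Query 1": {
        "100M": {"BigData": 49.955, "BigOwlim": 93.773, "TDB": 119.048,
                 "Virtuoso6": 232.234, "Virtuoso7": 125.786},
        "1B": {"BigOwlim": 25.128, "Virtuoso6": 87.245, "Virtuoso7": 75.324},
    },
    "Query 5": {
        "100M": {"BigData": 2.684, "BigOwlim": 1.868, "TDB": 1.959,
                 "Virtuoso6": 9.976, "Virtuoso7": 21.182},
        "1B": {"BigOwlim": 0.198, "Virtuoso6": 1.201, "Virtuoso7": 6.064},
    },
}


def pivot(by_query):
    """Swap the two outer keys: query -> size -> store becomes size -> query -> store."""
    by_size = defaultdict(dict)
    for query, sizes in by_query.items():
        for size, stores in sizes.items():
            by_size[size][query] = dict(stores)  # copy so the two views stay independent
    return dict(by_size)


BY_SIZE = pivot(BY_QUERY)
print(BY_SIZE["1B"]["Query 5"])
```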

 

7.4 Detailed Results For The BI-Query-Mix Benchmark Run

The details of running the BI query mix are given here. There are two different views:

7.4.1 Queries per Second by Query and Dataset Size

Running 10 query mixes against the different stores led to the following query throughput for each type of query over all 10 runs (in Queries per Second). The best performance figure for each dataset size is set in bold in the tables.


Query 1

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         0.453       1.426       0.488       1.469       -
100M        -           0.176       -           0.118       11.558
1B          -           0.016       -           0.009       0.462

Query 2

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         0.445       0.069       0.023       37.707      -
100M        -           0.009       -           7.931       28.969
1B          -           0.001       -           0.635       2.409

Query 3

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         0.300       2.540       0.018       0.768       -
100M        -           0.105       -           0.090       0.886
1B          -           0.002       -           0.007       0.035

Query 4

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         0.167       0.150       0.140       1.183       -
100M        -           0.027       -           0.216       3.773
1B          -           0.003       -           0.020       0.644

Query 5

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         1.992       1.923       0.008       1.920       -
100M        -           0.240       -           0.240       5.496
1B          -           0.020       -           0.009       0.468

Query 6

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         9.917       23.923      16.202      14.988      -
100M        -           15.538      -           10.767      18.997
1B          -           13.951      -           3.726       10.517

Query 7

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         0.006       2.232       0.849       9.849       -
100M        -           0.369       -           1.466       14.816
1B          -           0.040       -           0.122       1.912

Query 8

            BigData     BigOwlim    TDB         Virtuoso6   Virtuoso7
10M         0.568       1.395       0.018       0.592       -
100M        -           0.191       -           0.048       2.512
1B          -           0.016       -           0.003       0.215

7.4.2 Queries per Second by Dataset Size and Query

Running 10 query mixes against the different stores led to the following query throughput for each type of query over all 10 runs (in Queries per Second). The best performance figure for each query is set in bold in the tables.


10M

            BigData     BigOwlim    TDB         Virtuoso6
Query 1     0.453       1.426       0.488       1.469
Query 2     0.445       0.069       0.023       37.707
Query 3     0.300       2.540       0.018       0.768
Query 4     0.167       0.150       0.140       1.183
Query 5     1.992       1.923       0.008       1.920
Query 6     9.917       23.923      16.202      14.988
Query 7     0.006       2.232       0.849       9.849
Query 8     0.568       1.395       0.018       0.592

100M

            BigOwlim    Virtuoso6   Virtuoso7
Query 1     0.176       0.118       11.558
Query 2     0.009       7.931       28.969
Query 3     0.105       0.090       0.886
Query 4     0.027       0.216       3.773
Query 5     0.240       0.240       5.496
Query 6     15.538      10.767      18.997
Query 7     0.369       1.466       14.816
Query 8     0.191       0.048       2.512

1B

            BigOwlim    Virtuoso6   Virtuoso7
Query 1     0.016       0.009       0.462
Query 2     0.001       0.635       2.409
Query 3     0.002       0.007       0.035
Query 4     0.003       0.020       0.644
Query 5     0.020       0.009       0.468
Query 6     13.951      3.726       10.517
Query 7     0.040       0.122       1.912
Query 8     0.016       0.003       0.215
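Comparing the 100M and 1B tables query by query shows how each BI query copes with a tenfold larger dataset. The sketch below does this for Virtuoso7, using the tenfold data growth only as a rough reference line: a throughput drop larger than 10x means the query fell off faster than the data grew. The names `slowdown` and `DATA_GROWTH` are illustrative, not part of the benchmark.

```python
# Virtuoso7, BI use case: QpS per query at 100M and 1B (tables above).
V7_100M = {"Query 1": 11.558, "Query 2": 28.969, "Query 3": 0.886, "Query 4": 3.773,
           "Query 5": 5.496, "Query 6": 18.997, "Query 7": 14.816, "Query 8": 2.512}
V7_1B = {"Query 1": 0.462, "Query 2": 2.409, "Query 3": 0.035, "Query 4": 0.644,
         "Query 5": 0.468, "Query 6": 10.517, "Query 7": 1.912, "Query 8": 0.215}

DATA_GROWTH = 10.0  # the 1B dataset is ten times the size of the 100M dataset


def slowdown(small, large):
    """Factor by which throughput dropped from the smaller to the larger dataset."""
    return {q: small[q] / large[q] for q in small if q in large}


factors = slowdown(V7_100M, V7_1B)
superlinear = [q for q, f in factors.items() if f > DATA_GROWTH]

for q, f in factors.items():
    print(f"{q}: {f:5.1f}x lower throughput")
print("Dropped more than the data grew:", superlinear)
```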



8. Thanks


Thanks a lot to BSBM authors Chris Bizer and Andreas Schultz for providing instructions and sharing the software/scripts at the very beginning of our benchmark experiment.

We want to thank the store vendors and implementors for helping us to set up and configure their stores for the experiment. Lots of thanks to Orri Erling, Ivan Mikhailov, Mitko Iliev, Hugh Williams, Alexei Kaigorodov, Zdravko Tashev, Barry Bishop, Bryan Thompson, Mike Personick.

The work on the BSBM Benchmark Version 3 is funded through the LOD2 - Creating Knowledge out of Linked Data project.

 

Please send comments and feedback about the benchmark to and .