Comparison of Open Source Databases with TPC-D Queries
Last time we talked about the database engine and transactions. Now we come to the realm of query processing, continuing our revisit of the DBMS side of Virtuoso.
The well established, respectable standard benchmark for the basics of query processing is TPC-D, with its derivatives TPC-H and TPC-R. So, to test how different SQL optimizers manage the 22 queries, we ran a mini version of the D queries against a 1% scale database, some 30 MB of data, all in memory. This basically catches whether SQL implementations miss some of the expected tricks, and how efficient their in-memory loop joins, hash joins, and aggregation are.
When we get to our next stop, high-volume I/O, we will run the same workload with TPC-D databases in the 10 GB ballpark.
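To give a feel for what the optimizers are up against, the first of the 22 queries is a single-scan grouped aggregation over the lineitem table. A minimal sketch of the pattern in Python with sqlite3; the column names follow the TPC-D schema, but the three rows of data are invented for illustration and this is of course not one of the engines tested:

```python
import sqlite3

# Toy lineitem table with a few of the TPC-D columns; the values are made up.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE lineitem (
    l_returnflag TEXT, l_linestatus TEXT,
    l_quantity REAL, l_extendedprice REAL, l_discount REAL)""")
con.executemany("INSERT INTO lineitem VALUES (?,?,?,?,?)", [
    ("N", "O", 17, 24000.0, 0.04),
    ("N", "O", 36, 45000.0, 0.09),
    ("R", "F",  8, 13000.0, 0.10),
])

# Q1-style grouped aggregation: totals per (returnflag, linestatus) pair.
rows = con.execute("""
    SELECT l_returnflag, l_linestatus,
           SUM(l_quantity)                         AS sum_qty,
           SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
           COUNT(*)                                AS count_order
      FROM lineitem
     GROUP BY l_returnflag, l_linestatus
     ORDER BY l_returnflag, l_linestatus""").fetchall()
for r in rows:
    print(r)
```

The whole benchmark is 22 such queries of increasing join and subquery complexity; Q1 mostly measures raw scan and aggregation speed rather than plan choice.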
The databases were tested on the same machine, with warm cache, taking the best of 3 runs. All had full statistics and ran with read committed isolation, where applicable. The data was generated using the procedures from the Virtuoso test suite. The Virtuoso version tested was 5.0, to be released shortly; the MySQL version was 5.0.27, and the PostgreSQL version was 8.1.6.
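The warm-cache, best-of-3 measurement can be reproduced with a trivial harness. The sketch below uses Python with an in-memory sqlite3 database as a stand-in for the engines actually tested; the table and query are hypothetical:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100000)])

def best_of_3(sql):
    """Run once untimed to warm caches, then report the best of 3 timed runs."""
    con.execute(sql).fetchall()                 # warm-up pass, not timed
    times = []
    for _ in range(3):
        t0 = time.perf_counter()
        con.execute(sql).fetchall()
        times.append(time.perf_counter() - t0)
    return min(times) * 1000                    # best run, in milliseconds

msec = best_of_3("SELECT COUNT(*), SUM(x) FROM t")
print(f"{msec:.1f} ms")
```

Taking the best run rather than the average filters out scheduling noise and any residual cold-cache effects.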
Query times in milliseconds:

| Query | Virtuoso | PostgreSQL | MySQL | MySQL with InnoDB |
|-------|---------:|-----------:|------:|------------------:|
| Q1 | 206 | 763 | 312 | 198 |
| Q2 | 4 | 6 | 3 | 3 |
| Q3 | 13 | 51 | 254 | 64 |
| Q4 | 4 | 16 | 24 | 60 |
| Q5 | 15 | 22 | 64 | 68 |
| Q6 | 9 | 70 | 189 | 65 |
| Q7 | 52 | 143 | 211 | 84 |
| Q8 | 29 | 31 | 13 | 11 |
| Q9 | 36 | 114 | 97 | 61 |
| Q10 | 32 | 51 | 117 | 57 |
| Q11 | 16 | 9 | 12 | 10 |
| Q12 | 8 | 21 | 18 | 130 |
| Q13 | 18 | 74 | - | - |
| Q14 | 7 | 21 | 418 | 1425 |
| Q15 | 14 | 43 | 389 | 122 |
| Q16 | 16 | 22 | 18 | 25 |
| Q17 | 1 | 54 | 26 | 10 |
| Q18 | 82 | 120 | - | - |
| Q19 | 19 | 8 | 2 | 17 |
| Q20 | 7 | 15 | 66 | 52 |
| Q21 | 34 | 86 | 524 | 278 |
| Q22 | 4 | 323 | 3311 | 805 |
| Total (msec) | 626 | 2063 | 6068 | 3545 |
We lead by a fair margin, but MySQL is hampered by getting some execution plans obviously wrong, and by not completing Q13 and Q18 at all, at least not within several tens of seconds. In the interest of comparable totals, we left those two queries out of the total for every database.
As usual, we also ran the workload on Oracle 10g R2. Since Oracle does not like its numbers being published without explicit approval, we will just say that we are even with them within the parameters described above. Oracle has a more efficient decimal type, so it wins where that is central, as in Q1. It also seems to notice that the GROUP BYs of Q18 are produced in order of the grouping columns, so it needs no intermediate table for storing the aggregates. If we addressed these matters, we would lead by some 15%, whereas now we are even. A faster decimal arithmetic implementation may come in the release after next.
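The Q18 optimization mentioned above amounts to streaming aggregation instead of hash aggregation: when rows arrive already ordered on the grouping columns, each group can be emitted as soon as the key changes, with no intermediate table of partial aggregates. A language-neutral sketch (not Virtuoso's or Oracle's actual implementation):

```python
from itertools import groupby

# Rows already sorted on the grouping key, as a sorted subquery would deliver them.
rows = [("A", 1), ("A", 2), ("B", 5), ("B", 1), ("C", 7)]

def hash_aggregate(rows):
    """General case: buffer one partial aggregate per key in a hash table."""
    acc = {}
    for key, val in rows:
        acc[key] = acc.get(key, 0) + val
    return sorted(acc.items())

def streaming_aggregate(sorted_rows):
    """Ordered input: emit each group when the key changes; nothing is buffered
    beyond the running total of the current group."""
    return [(key, sum(v for _, v in grp))
            for key, grp in groupby(sorted_rows, key=lambda r: r[0])]

print(streaming_aggregate(rows))
```

Both produce the same answer; the streaming form uses constant memory per group and preserves the input order for free, which is what makes the intermediate table unnecessary.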
In the next posts, we will look at I/O and disk allocation, and also return to RDF and LUBM.