Paraccel does it again?
Today analytic database vendor Paraccel announced another record-breaking TPC-H benchmark result in the 1TB size category. There are indeed a couple of things to be proud of here: the Price/QphH now matches Kickfire's overall lowest result, and they beat Oracle by a nice margin in pure performance. And yes, they did this using a virtualized cluster, though you can question whether that helped or hurt. In any case, it's a nice showcase for VMware, which sponsored this benchmark, suggesting that there's no such thing as a virtualization performance penalty anymore. At least not a noticeable one.
So much for the good news; there's another side to this benchmark too. For starters: I like TPC-H for the purpose I use it for myself, which is to provide a level playing field for different products and see whether a) they can run the query set AT ALL and b) they run it reasonably fast. Several products fail these tests because they cannot run correlated subqueries, return wrong results, or just take forever to complete a query. It's also just fun to be able to use the same benchmark as the big industry vendors.
What I don't like about TPC-H is the ridiculous amount of hardware that vendors throw at it. Paraccel is doing somewhat (but not spectacularly) better than Oracle and Exasol on the 1TB run, but that is probably due to advances in hardware (more on that later). It certainly doesn't come close to the claim on their website that they are "5 to 500 times faster than any other database (even those others that claim to be fastest)". The cluster they used has almost 3TB of RAM and 96TB of disk space, and the 40 machines consume 30 kW of power. In my opinion, that's totally insane for a 1TB database! As I've said somewhere before, it's about time that Watt/QphH is taken into account as well (and I'm very pleased to see that this column has now actually been added to the TPC results overview, though no results are available yet).
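To put those hardware numbers next to the 1TB data set, here is a minimal Python sketch of the arithmetic. It only uses figures quoted in this post and assumes 1 TB = 1,000 GB. The `watts_per_qphh` helper is a hypothetical illustration of how a Watt/QphH figure could be derived; it is not the TPC's official definition, and no QphH value is plugged in because none is quoted here.

```python
# Figures quoted in this post for the Paraccel 1TB TPC-H cluster.
DATABASE_TB = 1.0
RAM_TB = 2.88        # 2,880 GB, i.e. "almost 3TB" (assumes 1 TB = 1,000 GB)
DISK_TB = 96.0
POWER_KW = 30.0
MACHINES = 40


def watts_per_qphh(power_kw: float, qphh: float) -> float:
    """Hypothetical helper: watts drawn per unit of QphH throughput."""
    if qphh <= 0:
        raise ValueError("qphh must be positive")
    return power_kw * 1000.0 / qphh


# How much hardware is provisioned per TB of raw data.
ram_ratio = RAM_TB / DATABASE_TB
disk_ratio = DISK_TB / DATABASE_TB

# Average power draw per machine, in watts.
watts_per_machine = POWER_KW * 1000.0 / MACHINES

print(f"RAM per TB of data:  {ram_ratio:.2f} TB")
print(f"Disk per TB of data: {disk_ratio:.0f} TB")
print(f"Power per machine:   {watts_per_machine:.0f} W")
```

Once a published QphH figure is at hand, passing it to `watts_per_qphh` together with the 30 kW draw gives a rough efficiency number that can be compared across submissions.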
Now back to a little analysis of the Paraccel results, and let's forget about Oracle here (to put things into perspective: the total cost of the system they used for running a measly 1TB was $6.3 MILLION). Comparing with Exasol is a lot more interesting; the technology used is very similar but there are some notable differences too:
| | Paraccel | Exasol |
|---|---|---|
| CPU type | Xeon 5560 | Xeon 5460 |
| RAM type | DDR3 1333 MHz | DDR2 667 MHz |
| RAM amount | 2,880 GB | 768 GB |
| Disk type | SAS 10k 300GB | SAS 15k 146GB |
| Load time | 16 min 23 s | 1 hr 3 min 42 s |
So Exasol used 20% more servers (48 vs. 40) but was running previous-generation CPUs: the raw performance improvement of the Xeon 55xx over the 54xx series is 35-40%. The memory speed of the Paraccel cluster is double that of the older DDR2 memory Exasol used, and since both of these benchmarks ran entirely in memory, this makes all the difference. Furthermore, with more than 3 times the number of disks, it's no wonder that Paraccel's load times are lower. And don't get confused by the RPM difference: the read/write speeds of the current 10K 2.5" SAS disks that Paraccel used are far better than those of the 15K 3.5" disks in the Exasol cluster.
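For anyone who wants the ratios spelled out, the comparison above can be sketched in a few lines of Python. The numbers are the ones from the table and the text; the dictionary keys are just illustrative names, and the load times assume that the trailing figures in the table are seconds.

```python
# Hardware figures for both 1TB TPC-H runs, as quoted in this post.
paraccel = {
    "servers": 40,
    "ram_gb": 2880,
    "mem_mhz": 1333,
    "load_seconds": 16 * 60 + 23,            # 16 min 23 s (seconds assumed)
}
exasol = {
    "servers": 48,
    "ram_gb": 768,
    "mem_mhz": 667,
    "load_seconds": 1 * 3600 + 3 * 60 + 42,  # 1 hr 3 min 42 s (seconds assumed)
}

# Exasol's server count relative to Paraccel's (the "20% more servers").
server_ratio = exasol["servers"] / paraccel["servers"]

# Paraccel's advantage in total RAM and in memory clock speed.
ram_ratio = paraccel["ram_gb"] / exasol["ram_gb"]
mem_speed_ratio = paraccel["mem_mhz"] / exasol["mem_mhz"]

# How many times longer Exasol's load took.
load_time_ratio = exasol["load_seconds"] / paraccel["load_seconds"]

print(f"Exasol servers vs. Paraccel: {server_ratio:.2f}x")
print(f"Paraccel RAM vs. Exasol:     {ram_ratio:.2f}x")
print(f"Paraccel memory clock:       {mem_speed_ratio:.2f}x")
print(f"Exasol load time:            {load_time_ratio:.2f}x longer")
```

The point of laying it out this way is that the RAM and load-time gaps are both close to a factor of four, which is in the same range as the hardware differences alone.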
This brings us back to the title of this post, or actually the question mark at the end of it. Is Paraccel really doing so much better, or are they just taking advantage of hardware innovations? The only way to tell is to have Exasol run the same benchmark on the VMware cluster Paraccel used, but I doubt that will ever happen.