In addition to my previously posted thoughts on the Oracle Exadata/data warehouse appliance announcement, let me offer some more concise observations.
Microsoft had leapfrogged Oracle with its DATAllegro acquisition. Now Oracle’s back in the game.
But Oracle Exadata Release 1 is hardly going to put Teradata, Netezza, or Greenplum out of business.
After long denying it, Oracle has finally admitted that putting more than 10 TB on Oracle had been an extremely painful thing to do.
Oracle’s idea of splitting database processing between a couple of types of server is a smart one, and is consistent with what multiple other vendors are doing.
Medium-long term, the Exadata technical strategy could work very well. Exadata storage management addr... Read more
Oracle Exadata was pre-teased as “Extreme performance.” Some incorrect speculation shortly before the announcement focused on the possibility of OLTP without disk, which clearly would speed things up a lot. I interpret that speculation in part as wishful thinking.
The most compelling approach I’ve seen to that problem yet is H-Store, which however makes some radical architectural assumptions. One point I didn’t stress in my earlier posts, but which turned out to be a deal-breaker for one early tire-kicker, is that to use H-Store you have to be able to shoehorn each transaction into its own stored procedure. Depending on how intricate your logic is, that might make it hard to port an existing app to H-Store.
Even for new apps, it could get in the way of some ... Read more
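To make that constraint concrete, here is a minimal sketch, written in Python rather than H-Store’s actual interface, of what “each transaction is its own stored procedure” implies: all of a transaction’s reads, branching logic, and writes have to be packaged up front as one named, single-shot procedure, with no mid-transaction round trips back to the application. The procedure name and data below are invented purely for illustration.

```python
# Illustrative sketch only -- not the real H-Store API.
# The point: every transaction must be pre-defined as one server-side
# procedure, executed in a single shot, with no mid-transaction
# round trips back to the application.

PROCEDURES = {}

def stored_procedure(fn):
    """Register a function as a named, single-shot transaction."""
    PROCEDURES[fn.__name__] = fn
    return fn

@stored_procedure
def new_order(db, customer_id, items):
    # All reads, decisions, and writes happen here, inside one call.
    balance = db["balances"][customer_id]
    total = sum(price for _, price in items)
    if total > balance:                  # branching must live in the procedure,
        return {"status": "rejected"}    # not in the calling application
    db["balances"][customer_id] = balance - total
    db["orders"].setdefault(customer_id, []).extend(items)
    return {"status": "ok", "total": total}

def run_transaction(db, name, *args):
    # The engine executes the whole procedure atomically; the client
    # only ever sees the final result, never intermediate state.
    return PROCEDURES[name](db, *args)

db = {"balances": {42: 100.0}, "orders": {}}
print(run_transaction(db, "new_order", 42, [("widget", 30.0), ("gizmo", 25.0)]))
```

Porting an existing application therefore means refactoring every interactive, multi-statement transaction into such a procedure, which is exactly where the deal-breaker mentioned above came from.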
Some kind Oracle development managers have reached out and helped me better understand where Oracle does or doesn’t stand in query and analytic parallelization. This post supersedes prior discussions of the subject over the past week.
Let’s start with the part everybody pretty much knows already:
There are two parts to a parallelization story — how you get data off of disk, and what you do with it once you have it.
To a first approximation, the best way to get a lot of data off of disk is in parallel, specifically with different CPUs talking to different disk drives. Until last week’s announcement of Exadata, Oracle was the most prominent holdout against this view. (That dubious honor now goes to Sybase.)
If processing units are working ... Read more
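As a toy illustration of that two-part story (my own simplification, not any particular vendor’s design), the sketch below reads data in parallel, with each worker owning its own partition as if a separate CPU were talking to a separate drive, and then merges the per-partition results:

```python
# Toy illustration of the two halves of a parallelization story:
# (1) getting data off of disk in parallel -- each worker scans its own
#     partition, as if a separate CPU were talking to a separate drive;
# (2) doing something with the data -- here, a filtered aggregate that is
#     computed locally per partition and then merged.
from concurrent.futures import ThreadPoolExecutor

partitions = [  # stand-ins for separate disk drives
    [("east", 120), ("west", 80), ("east", 45)],
    [("west", 200), ("east", 10)],
    [("east", 300), ("west", 5), ("west", 60)],
]

def scan_and_aggregate(rows):
    # Local work: filter and partially aggregate before anything is shipped.
    return sum(amount for region, amount in rows if region == "east")

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(scan_and_aggregate, partitions))

print(sum(partials))  # 475 -- merged result of the per-partition work
```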
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi. Our call focused on the part that I didn’t think I’d understood well before, namely:
What are the database design and programming assumptions required for H-Store to work?
How generally ... Read more
Obviously, the big news this week is Exadata, and its parallelization or lack thereof. But let’s not forget the rest of Oracle’s data warehousing technology.
Frankly, I’ve come to think that disk-based OLAP cubes and materialized views are both cop-outs, indicative of a relational data warehouse architecture that can’t answer queries quickly enough straight-up. But if you disagree, then you might like Oracle’s new OLAP cube materialized views, which sound like a worthy competitor to Microsoft Analysis Services. (Further confusing things, I’ve seen reports that Oracle is increasing its commitment to Essbase, a separate MOLAP engine. I hope those are incorrect.)
A few weeks ago, I came to realize that Oracle’s data mining database f... Read more
Edit: Answers to the title question have now shown up, and so the post below is now superseded by this one.
In most respects — including most data warehousing respects — Oracle’s query optimizer is the most sophisticated on the planet (even ahead of IBM’s, I’d say). But in all the Exadata discussion — and also in a good, comprehensive review of Oracle’s data warehouse technology — I haven’t seen any claims that Oracle has tackled the hard problems of parallel analytics.
Yes, Oracle is now getting data off of multiple disks onto multiple processors at once, without SAN bottlenecks, and doing some local filtering. That’s the heart of the Exadata storage story, and it’s indeed a huge advance over Oracle’s pr... Read more
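To illustrate why that kind of local filtering matters (this is a generic sketch of predicate and projection pushdown, not Oracle’s actual Smart Scan code), compare shipping every row to the database servers against filtering and projecting at the storage tier first:

```python
# Schematic sketch of predicate/projection pushdown -- not Oracle's code.
# Without pushdown, the storage tier ships every row over the interconnect
# and the database servers do all the filtering. With pushdown, the storage
# tier applies the filter and returns only the needed columns of the
# qualifying rows, so far less data crosses the wire.

rows = [
    {"order_id": i, "region": "east" if i % 3 else "west", "amount": i * 10.0}
    for i in range(1, 10001)
]

def storage_scan_no_pushdown():
    return rows                      # everything travels to the DB servers

def storage_scan_with_pushdown(predicate, columns):
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

shipped_all = storage_scan_no_pushdown()
shipped_filtered = storage_scan_with_pushdown(
    predicate=lambda r: r["region"] == "west" and r["amount"] > 50000,
    columns=("order_id", "amount"),
)
print(len(shipped_all), len(shipped_filtered))  # 10000 rows vs. 1667 rows
```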
I’ve long argued that:
Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include:
A definitive agreement has been signed.
Deal closing is expected in a few weeks.
I got the impression that the undisclosed price is a nice step-up from the Series D round that closed a few months ago.
DATAllegro CEO Stuart Frost will run an engineering division, based at DATAllegro’s c... Read more
Geospatial data management is one of the flavors of the month:
Last week, Teradata claimed it has the most sophisticated analytic geospatial data management capability.
Also last week, Netezza’s newly acquired Netezza Spatial technology attracted a lot of attention.
This week, Oracle called attention to its geospatial capabilities.
So I asked Netezza and Teradata what this geospatial analytics stuff is all about.
The first thing to note is that OLTP/general-purpose DBMS and analytic DBMS handle geospatial data differently. That is, most serious general-purpose RDBMS use an indexing scheme such as R-trees, which excel at managing individual geographic-coordinate database records. Analytic DBMS vendors, however, who focus on large sets o... Read more
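For readers who haven’t met R-trees, the core idea is nested bounding rectangles: each index node stores a rectangle enclosing everything beneath it, so a lookup only descends into branches whose rectangles intersect the query region. A stripped-down sketch, with invented coordinates and record names:

```python
# Stripped-down illustration of the R-tree idea: nested bounding boxes.
# Real R-trees balance themselves and bound node fan-out; this sketch only
# shows why bounding rectangles make record-at-a-time geospatial lookups fast.

def intersects(a, b):
    # Each box is (min_x, min_y, max_x, max_y).
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

class Node:
    def __init__(self, box, children=None, records=None):
        self.box = box                    # rectangle enclosing everything below
        self.children = children or []    # child nodes (internal node)
        self.records = records or []      # (box, payload) pairs (leaf node)

def search(node, query_box, hits):
    if not intersects(node.box, query_box):
        return                            # prune this whole subtree
    for box, payload in node.records:
        if intersects(box, query_box):
            hits.append(payload)
    for child in node.children:
        search(child, query_box, hits)

leaf_a = Node((0, 0, 5, 5), records=[((1, 1, 2, 2), "store #1"), ((4, 4, 5, 5), "store #2")])
leaf_b = Node((7, 7, 10, 10), records=[((7, 7, 8, 8), "store #3")])
root = Node((0, 0, 10, 10), children=[leaf_a, leaf_b])

hits = []
search(root, (3, 3, 6, 6), hits)
print(hits)  # ['store #2'] -- the (7,7,10,10) branch was pruned without inspection
```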
Oracle has put up an Exadata white paper (hat tip to Kevin Closson’s Exadata FAQ). There’s a section on Smart Scan Join Processing. Sounds exciting, huh? It reads, in its entirety:
Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom Filters, which are a very efficient probabilistic method to determine whether a row is a member of the desired result set.
Jeez. That almost sounds as if Exadata is an immature, Release 1 data warehouse appliance!... Read more
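For context on the quoted passage: a Bloom filter is a compact bit array built from the small (dimension) table’s join keys, and fact-table rows whose keys definitely aren’t in the filter can be discarded before the real join runs, at the cost of occasional false positives that the join itself later weeds out. A bare-bones sketch, mine rather than anything from Oracle’s white paper:

```python
# Bare-bones Bloom filter sketch -- illustrative, not Oracle's implementation.
# Build the filter from the small table's join keys, then use it to discard
# most non-matching rows of the large table before the real join runs.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits & (1 << pos) for pos in self._positions(key))

# Small lookup (dimension) table and large fact table, star-schema style.
dim_keys = {"US", "CA", "MX"}
fact_rows = [("US", 100), ("FR", 70), ("CA", 55), ("JP", 20), ("MX", 5)] * 1000

bf = BloomFilter()
for key in dim_keys:
    bf.add(key)

survivors = [row for row in fact_rows if bf.might_contain(row[0])]
joined = [row for row in survivors if row[0] in dim_keys]  # exact join removes false positives
print(len(fact_rows), len(survivors), len(joined))
```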
The figures in this post have now been updated. There’s a new spreadsheet at that link as well.
I’ve been trying to figure out how much Oracle Exadata actually costs. My first cut comes up with prices of $58-190K/TB (user data), based on a total system price of $5,322,000, and user data figures of 28 and 92.4 TB for the two available sizes of disk drive. But of course there are a lot of uncertainties in these figures. You can use this spreadsheet (Edit: That’s the old one) to see where the final numbers come from, and to modify the estimates as you see fit. Difficulties include:
The Oracle Exadata package has two parts — a storage cluster and a RAC server cluster. It’s not certain what the proper balance between the two is …
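For what it’s worth, the headline $/TB range comes from simply dividing the quoted system price by the two user-data capacities; here is the back-of-the-envelope arithmetic, using only the figures cited above:

```python
# Back-of-the-envelope check of the $/TB range quoted above.
system_price = 5_322_000                  # quoted total system price, in dollars
for user_data_tb in (28.0, 92.4):         # the two user-data capacities cited
    price_per_tb = system_price / user_data_tb
    print(f"{user_data_tb} TB of user data -> ${price_per_tb / 1000:,.0f}K per TB")
# 28.0 TB -> ~$190K per TB
# 92.4 TB -> ~$58K per TB
```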