June, 2008
Top-end data warehouse sizes have grown hundreds-fold over the past 12 years
I just tripped across a link from February, 1996 in which NCR/Teradata:
Bragged that it had half a dozen customers with >1 TB of raw user data
Showed off a “record-breaking” 11 TB simulation
That represents roughly a 60-70% annual growth rate in top-end database sizes in the intervening 12 years....
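As a quick sanity check on that arithmetic, here is a minimal Python sketch of what a 60-70% annual rate compounds to over 12 years. The function names are mine, purely for illustration; only the rate range and the 12-year span come from the post.

```python
def fold_after(annual_rate: float, years: float) -> float:
    """Total fold increase after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def implied_annual_growth(fold_increase: float, years: float) -> float:
    """Compound annual growth rate implied by a total fold increase."""
    return fold_increase ** (1.0 / years) - 1.0


if __name__ == "__main__":
    years = 12  # February 1996 to 2008, per the post
    for rate in (0.60, 0.70):
        print(f"{rate:.0%} per year for {years} years: "
              f"about {fold_after(rate, years):.0f}-fold")
```

Both ends of the range land in the hundreds, which is consistent with the "hundreds-fold" headline.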
Netezza, enterprise data warehouses, and the 100 terabyte mark
Phil Francisco of Netezza checked in tonight with some news that will be embargoed for a few hours. While I had him on the phone anyway, I asked him about large databases and/or enterprise data warehouses. Highlights included:
Netezza has one customer with 200 TB of user data. The name is confidential (but he told me who it was).
Netezza has sold 15 or so of its NPS 10-800s, which are rated at 100 TB capacity.
The second-largest database in production on Netezza is probably 80 TB or so at Catalina Marketing, which has been a Netezza early adopter all along.
Netezza’s biggest users typically have a handful (literally: off the top of his head, Phil said “4 to 6”) of applications, each with its own primary set of fact tables.
Each appl...
Netezza on compression
Phil Francisco put up a nice post on Netezza’s company blog about a month ago, explaining the Netezza compression story. Highlights include:
Like other row-based vendors, Netezza compresses data on a column-by-column basis, then stores the results in rows. This is obviously something of a limitation — no run-length encoding for them — but can surely accommodate several major compression techniques.
The Netezza “Compress Engine” compresses data on a block-by-block basis. This is a disadvantage for row-based systems vs. columnar ones in the area of compression, because columnar systems have more values per block to play with, and that yields higher degrees of compression. And among row-based systems, typical block size is an indicator of compress...
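To illustrate why run-length encoding favors columnar layouts, here is a small Python sketch. It is not Netezza's Compress Engine or any vendor's actual code; the two-column table and the function name are hypothetical, and the encoder is plain textbook run-length encoding.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical table with two columns (state, channel), sorted by state.
rows = [
    ("CA", "web"), ("CA", "web"), ("CA", "web"),
    ("NY", "web"), ("NY", "web"), ("NY", "web"),
]

# Row-wise layout: values from different columns are interleaved on disk.
row_major = [value for row in rows for value in row]

# Columnar layout: each column's values sit next to each other.
column_major = [row[i] for i in range(2) for row in rows]

if __name__ == "__main__":
    print("row-wise runs:", len(run_length_encode(row_major)))
    print("columnar runs:", len(run_length_encode(column_major)))
    print(run_length_encode(column_major))
```

In the row-wise layout no two neighboring values are equal, so run-length encoding saves nothing. In the columnar layout the same twelve values collapse into three runs.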
DATAllegro on compression
DATAllegro CEO Stuart Frost has been blogging quite a bit recently (and not before time!). A couple of his posts have touched on compression. In one he gave actual numbers for compression, namely:
DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1.
In another recent post, Stuart touched on architecture, saying:
Due to the way our compression code works, DATAllegro’s current products are optimized for performance under heavy concurrency. The end result is that we don’t use the full power of the platform when running one query at a time.
Not immediately seeing the connection, I emailed Stuart for clarification. His answer boiled down to:
Compression adds to I/O latency....
Data warehouse appliance power user TEOCO
If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.
A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:
TEOCO runs a >200 TB DATAllegro warehouse for a major US telco. (When we hear about a big DATAllegro telco site that’s been in production for a while, that’s surely the one they’re talking about.)
TEOCO runs around 450 TB total of DATAllegro databases across its various customers. (When Stuart Frost blogs of >...
Netezza has an EMC deal too
Netezza has an EMC deal too. As befits a hardware vendor, Netezza has an actual OEM relationship with EMC, in which it is offering CLARiiONs built straight into NPS appliances. 5 TB of CLARiiON will be free in any Netezza system from 2 racks on upward. (A rack holds about 12.5 TB.) In addition, you’ll be able to buy 10 TB more of CLARiiON in every Netezza rack, if you want. The whole thing is supposed to ship before year-end.
Unlike DATAllegro, which manages all of your data on EMC boxes, or ParAccel, which wants to manage some of it on EMC, Netezza will continue to manage data solely on direct-attached disk, using CLARiiONs just as staging areas. The cutesy name for this is “Storage pad”, unless I misheard and it’s really “Storage pod”; the...
Detailed analysis of Perst and other in-memory object-oriented DBMS
Dan Weinreb — inspired by but not linking to my recent short post on McObject’s object-oriented in-memory DBMS Perst — has posted a detailed discussion of Perst on his own blog. For context, he compares it briefly to analogous products, most especially Progress’s — which used to be ObjectStore, of which Dan was the chief architect.
This was based on documentation and general sleuthing (Dan figured out who McObject got Perst from), rather than hands-on experience, so performance figures and the like aren’t validated. Still, if you’re interested in such technology, it’s a fascinating post....
McObject eXtremeDB
McObject — vendor of memory-centric DBMS eXtremeDB — is a tiny, tiny company, without a development team of the size one would think needed to turn out one or more highly-reliable DBMS. So I haven’t spent a lot of time thinking about whether it’s a serious alternative to solidDB for embedded DBMS, e.g. in telecom equipment. However:
IBM’s acquisition of Solid seems to suggest a focus on DB2 caching rather than the embedded market.
McObject actually has built up something of a customer list, as per the boilerplate on any of its press releases.
And they do seem to have some nice features, including Patricia tries (like solidDB), R-trees (for geospatial), and some kind of hybrid disk-centric/memory-centric operation....
ANTs bails out of the DBMS market
ANTs Data Server — i.e., the ANTs DBMS — has been sold off to a company called 4Js. It is now to be called Genero DB. Actually, 4Js has been selling or working on a version of the product called Genero DB since 2006, specifically an Informix-compatible one.
I’m not totally clear on why an Informix-compatible DBMS is needed in a world that already has Informix SE, but maybe IBM is overcharging for maintenance even on the low-end version of the product.
Meanwhile, ANTs, which had originally tried to get enterprises to migrate away from Oracle, is now focused on middleware called the ANTs Compatibility Server to help them migrate to Oracle, specifically/initially from Sybase....
Open source in-memory DBMS
I’ve gotten email about two different open source in-memory DBMS products/projects. I don’t know much about either, but in case you care, here are some pointers to more info.
First, the McObject guys — who also sell a relational in-memory product — have an object-oriented, apparently Java-centric product called Perst. They’ve sent over various press releases about same, the details of which didn’t make much of an impression on me. (Upon review, I see that one of the main improvements they cite in Perst 3.0 is that they added 38 pages of documentation.)
Second, I just got email about something called CSQL Cache. You can read more about CSQL Cache here, if you’re willing to navigate some fractured English. CSQL’s SourceForge pa...