Michael Stonebraker is predicting that the dominance of the generic relational database is coming to an end. Having recently founded several database companies, he has a vested interest in this prediction.

Here is Stonebraker’s logic: we can outperform relational databases with specialized solutions. Therefore, users will migrate to specialized engines. In effect, specialized players such as Vertica will grab market share from Oracle Database and Microsoft SQL Server.

Unfortunately, Stonebraker’s arguments are misleading. As far as performance is concerned, Stonebraker is obviously right: we are undergoing major changes. As pointed out by Daniel Tunkelang, you can store a lot of data in 32GB of RAM. Solid-state drives can be used to wipe out some IO bottlenecks. Yet, these technological changes will not change the game for two reasons:

  • We have always been able to outperform generic relational databases: (1) column stores have been around since the seventies, when they were called transposed files; (2) search engines have always used their own indexes; (3) lightweight key-value engines like Tokyo Cabinet have always been around. Generic relational databases did not achieve dominance due to their superior performance.
  • Generic relational databases are frequently catching up to specialized engines. In particular, they are not limited to row stores. Curt Monash’s blog post on Oracle’s hybrid columnar approach makes this obvious. Nicolas Bruno, in Teaching an Old Elephant New Tricks, predicted that the lessons learned by start-ups such as Vertica will be integrated into traditional relational engines.
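
To make the row store versus column store ("transposed file") distinction concrete, here is a minimal sketch in Python. The table, its field names and the helper functions are all hypothetical and chosen only for illustration; no real engine stores data this way. It shows only that the same table can be laid out by row or by column, and that an aggregate over one field reads a single array in the column layout.

```python
# Hypothetical toy table; field names and values are illustrative only.
rows = [
    {"id": 1, "region": "east", "sales": 120},
    {"id": 2, "region": "west", "sales": 80},
    {"id": 3, "region": "east", "sales": 200},
]


def to_columns(rows):
    """Transpose a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_sales_row_store(rows):
    # Row layout: every full record is visited to read one field.
    return sum(row["sales"] for row in rows)


def total_sales_column_store(columns):
    # Column layout: only the "sales" column is read.
    return sum(columns["sales"])


columns = to_columns(rows)
```

Both functions return the same total. The difference is in how much data each one has to touch, which is the kind of advantage a specialized engine exploits and that a generic engine can also adopt.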

Further reading: I was motivated by the latest StorageMojo blog post. See also my blog posts Trading compression for speed with vectorization, Changing your perspective: horizontal, vertical and hybrid data models, Column stores and row stores: should you care? and Native XML databases: have they taken the world over yet?

7 Comments

  1. Daniel, I think you make an excellent point. There might just be what I call a “Borg effect” whereby the big boys start absorbing from columnar and other such niche technologies (I use the term loosely). We see this in PAX/Hybrid approaches to tabular/columnar and of course the new improved Exadata V2 which kinda indicates an OLTP/OLAP mix, although the latter seems a bit tough to swallow based on existing numbers.

    Comment by Jerome Pineau — 16/9/2009 @ 13:08

  2. Thanks Jerome. My point exactly. Though I don’t predict that Oracle will continue to do well… I just don’t think that Stonebraker’s arguments as to why it will soon fail are correct. Making predictions that come true and having the proper world model are two different things.

    I should point out that I am a big fan of Stonebraker’s research. I would just not hire him as a business analyst.

    Comment by Daniel Lemire — 16/9/2009 @ 13:19

  3. I think the sentiment of Stonebraker’s message and the hype assigned to get it noticed are probably two slightly different things.

    Yes, column-like stores and hash tables, Berkeley and so on have been around for a long time. But over the last two decades we have largely forgotten about them and moved towards the general-purpose RDBMS for almost everything. Ask 100 dev shops when was the last time they built something data-centric not for a GP RDBMS (or even not for MySQL, Oracle or SQL Server) and I would be surprised if you got more than a couple of responses. But this has had a lot of highly positive benefits in terms of standardization of development and economies of scale in data management (plus the GP RDBMS brings a lot to the party in terms of performance, reliability, availability, recoverability and consistency etc).

    But right now it is clear we still have problems. Some time ago, analytic data volumes began to exceed the levels at which it was possible to process queries in a reasonable timeframe on a single node. Plucky young analytics startups realized this and now it is pretty common to find MPP solutions in many organizations across all sectors (yes I know MPP also isn’t new and Teradata has been around since the late 70’s, but MPP really hasn’t been mainstream accessible until recently). Also, even fairly benign enterprises, which may have hundreds or thousands of general-purpose RDBMS deployments, are now commonly struggling with a set of very high transaction processing requirements that are restricted by the limitations of GP RDBMS.

    And of course in the web space the GP RDBMS was prevalent, with MySQL the modern default for any web-related data store. But Web 2.0 forced the big players to go off and solve their own scale problems. This has led to the development of Cassandra, Voldemort, Dynamo and Hadoop etc – and of course spawned the whole NoSQL movement. This was needs-driven; I am sure they would have preferred the database community to have had a solution for them.

    Any slightly extreme requirement in terms of performance, scalability, predictability or volume on GP RDBMS becomes an expensive and continually frustrating challenge.

    With over 90% of a $20b market GP RDBMS won’t be going away soon. However I think it is clear that “one size” really doesn’t fit all, and the percentage of requirements it does fit is slowly declining (but still maintaining the vast majority for the time being). A more diverse set of data management options with specialized solutions in both OLTP & analytics seems to me to be a positive path forward.

    Comment by Tony Bain — 17/9/2009 @ 6:45

  4. It is a case of standardization and the “good enough” effect. Will RDBMSes always be on top of the heap? I don’t think so. But I do think they are going to be around a long time.

    The problem is there are a lot of things already built on top of them and quite a bit performs just fine.

    Just because another technology is superior in every way doesn’t guarantee that it will dominate over all others.

    Comment by Wes Brown — 17/9/2009 @ 8:14

  5. Mmmmm…
    Isn’t that old news, I mean… Internetwise!

    Comment by Kevembuangga — 17/9/2009 @ 14:24

  6. MongoDB use cases:
    “Highly transactional systems, such as banking systems and accounting. Applications with highly complex yet atomic transactions are more suited to a traditional relational DBMS.”
    http://www.mongodb.org/display/DOCS/Use+Cases

    Maybe most applications will be better off with key-value stores, but there are applications where RDBMSs are close to the perfect solution.

    Comment by Daniel Haran — 17/9/2009 @ 17:32

  7. What would be the equivalent of this for database development?
    Ha! Ha!

    Comment by Kevembuangga — 24/9/2009 @ 1:48
