The Web is killing database systems

A typical enterprise computing architecture relies on databases, professionally managed by DBAs. Developers build applications that all update or query the same databases. The value is not in the software per se, but in the data architecture.

Given the DNA of our industrial-age organizations, this makes sense. The data was stored in books and entered by clerks. The clerks had to be interchangeable, easily replaceable. The data, however, is the lifeblood of your company. When running a factory, if you can’t keep track of sales and income, you die. Software replaced the clerks, but it is just as insignificant, just as replaceable. The database system itself is akin to the books: it is not thought of as software, but as support for the data. In this sense, database systems acquire a mythical status in enterprise computing. You get people swearing by the database system as if it were a religion. Of course, enterprises are often stuck with software that they cannot replace. But this is often seen as a weakness. Meanwhile, being stuck with a database system is not a concern in enterprise computing.

People often think that a company like Google is all about the data. But, of course, this is wrong. I could wipe out all of Google’s databases. It would hurt Google’s stock price. But within 6 months to a year, Google would be back where it is. Part of the value of Google is the brand itself. But if the brand were everything, then Microsoft or Yahoo! would have wiped out Google a long time ago. The value of Google is in the software itself (and in its software engineers).

For many people who love software, the natural evolution of an architecture goes as follows:

  1. Build an application with what is effectively an embedded database. If you use a database system, it is mostly to save yourself some coding.
  2. If others need your data, you build an API engineered from your application (typically as a web service). Instead of offering people direct access to part of your database, you effectively build a machine-friendly version of your application.

This means that the ratio between applications and databases moves closer to 1. Let us call this model software-centric.
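The two steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the `OrderApp` class, its `orders` table and the `api_customer_total` method are hypothetical names, and SQLite (through Python's standard `sqlite3` module) merely stands in for whatever embedded store an application might use.

```python
import json
import sqlite3


class OrderApp:
    """Hypothetical application that owns its embedded database."""

    def __init__(self):
        # Step 1: the database is a private implementation detail,
        # used mostly to save some coding.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer TEXT, cents INTEGER)"
        )

    def place_order(self, customer, cents):
        # The business rule lives in the software, not in the schema.
        if cents <= 0:
            raise ValueError("order amount must be positive")
        cur = self._db.execute(
            "INSERT INTO orders (customer, cents) VALUES (?, ?)",
            (customer, cents),
        )
        self._db.commit()
        return cur.lastrowid

    def api_customer_total(self, customer):
        # Step 2: others get a machine-friendly view of the application
        # (here a JSON string), never direct access to the tables.
        row = self._db.execute(
            "SELECT COALESCE(SUM(cents), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return json.dumps({"customer": customer, "total_cents": row[0]})


app = OrderApp()
app.place_order("alice", 1250)
app.place_order("alice", 500)
print(app.api_customer_total("alice"))
```

Because callers only ever see the JSON returned by the API, the storage layer could in principle be swapped for something else without anyone outside the application noticing.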

One of the reasons a database system like Oracle is valuable in enterprise computing is that you can throw away the applications, and you still have your data. It is data-centric. But if you use Oracle with the software-centric model, the value lies entirely in the scalability, expressivity and reliability of Oracle’s software. While Oracle makes solid software, other people may make software that is a better fit for the application at hand.

The software-centric model also allows more innovation. It is not tempting to invent something better than the double-entry accounting system: it works and all clerks should know about it. Similarly, in traditional enterprise computing, it is not tempting to use something other than a relational database. But in the software-centric model, only a few people will ever touch the database system, assuming there is one. And, in fact, developers often change the database system. For this reason, they do not want others to access the database directly.

In a typical enterprise computing database, the semantics must lie with the database (e.g., through documentation) because we want to be able to throw away the software. The software-centric system, by contrast, captures the semantics in software. While this can be tragic if you have poor programmers, it can be a blessing if you have top-notch programmers. There is nothing more reusable than a well-designed API.

So? Which is better? The software-centric or the data-centric approach? Most large organizations have many bad programmers. This should not come as a surprise: they don’t value software much. So, the software-centric approach would be a catastrophe for them. And, at least so far, they have not had much need for great software.

But the innovation is with software-centric engineering. This means that eventually, software-centric tools will be orders of magnitude better than the data-centric ones. The IT department at your local large company is already outgunned a hundred-to-one by what Google can offer. The gap will grow wider until data-centric systems are finally retired.

The Web, like nothing else, has embraced the software-centric approach. In this sense, the Web is killing database systems. How else could you explain the relative dominance of MySQL? (Remember that Facebook uses MySQL.) MySQL is hardly the best database system around, even among free solutions. The truth is that the database system no longer matters.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

17 thoughts on “The Web is killing database systems”

  1. @Regehr

    Good or bad?

    We are in pain right now with respect to data. Most people are stuck with paradigms that have always only barely worked.

    My guess is that it is a good time for innovation.

  2. @Itman

    Consider what I wrote:

    Similar, in traditional enterprise computing, it is not tempting to use something other than a relational database.

    Moreover, Google is not interested in building the kind of sales operations that Microsoft and Oracle have.

    So if I were in charge of the system you describe, I would not change it, nor would I fear Google.

    But I am pretty sure that the guys running corporate email systems thought that Hotmail was a joke… and until recently, they also thought that Gmail was a joke… except that more and more large organizations are ditching corporate email systems for Gmail. Oooops! The fact is that Google’s software is 10 or 100 times better than most corporate email systems. Yes, it falls short of doing everything. But it gets the things that matter right.

  3. Imagine an enterprise that has a relatively simple accounting system, which relies, say, on a database with 100 tables and 200 stored procedures/functions. Everyone familiar with enterprise computing will agree that this would be a small system. What can Google suggest to replace this system?

  4. MySQL is free. Oracle is not.

    PostgreSQL is free too, and it has been technically superior on most points to MySQL for years.

    IT, however, is *by definition* all about managing information for the benefit of the business (…) as I noted above, data is the only reason you HAVE an IT application .. product data, order data, customer data, sales data, logistics data, etc.

    Yes. We agree. Enterprise computing is data centric.

  5. This is utterly wrong-headed.

    Firstly, you’re conflating the reasons for why people like things like MySQL with some broader trend. It’s much simpler than that: MySQL is free. Oracle is not.

    Secondly, you’re confusing technology businesses and IT. Google’s technology business is all about the software, because their business is search and advertising. IT, however, is *by definition* all about managing information for the benefit of the business.

    If, hypothetically speaking, Google’s IT databases (and backups) failed, their stock price would more than crash – they would lose the ability to run and operate their business and claim massive ($billions) financial losses while they tried to recover. Senior executives would have to resign. They don’t have the human power to manually process and recover transactions; nobody does.

    Furthermore, IT applications at most shops are rarely thrown out – the legacy lasts upwards of 10 to 15 years. Yes, data (often) must be preserved over the applications because, as I noted above, data is the only reason you HAVE an IT application .. product data, order data, customer data, sales data, logistics data, etc. All of that is a massive pain in the ass to lose or have in poor quality as it leads to poor customer service and poor operational performance.

    Finally, there’s the need for BI and analysis of data. Massive data warehouses are the norm – with Hadoop-style clusters growing in influence because they handle scale well, but held back because of all the BI tooling. Companies like Amazon, Yahoo, etc., all have huge Oracle databases or the like to complement their “software-centric” systems; you just don’t see them on geek blogs because they’re old hat.

    Basically you seem to be arguing that technology companies are better than non-technology companies because they don’t have to build IT systems. Which is nonsense. A more modest argument, on the other hand, would be that it’s better for a developer to work for a technology company than an IT shop; with that I have no real argument (IT shops usually suck, though it can be fun if you have the authority to help them suck less).

  6. @Where_

    Your speculations are complete bullshit, Daniel.

    Oddly, I agree with what you wrote and I think it matches what my post says. Can we blame this disagreement on my communication skills?

    But what Google does best is data generalization and coverage. No enterprise is interested in doing that.

    Many companies would like to drown in data the way Google or Amazon do, but they simply cannot do it. It took years for Microsoft to catch up with Google and they had unlimited budgets.

    (…) in 2-3 years’ time Google (and the like) WILL become those enterprise databases.

    While I did not specify a time frame (e.g. 2-3 years), I made exactly the same prediction:

    “This means that eventually, software-centric tools will be orders of magnitude better than the data-centric ones. The IT department at your local large company is already outgunned a hundred-to-one by what Google can offer. The gap will grow wider until data-centric systems are finally retired.”

  7. Your speculations are complete bullshit, Daniel. They are based on no facts, nor on any familiarity with the Google ecosystem.

    Google is all about data, just as much as any enterprise. But what Google does best is data generalization and coverage. No enterprise is interested in doing that. Add to this the scale, processing power, access APIs, frameworks and platforms built all around that huge data, and you get to one and only one conclusion: in 2-3 years’ time Google (and the like) WILL become those enterprise databases.

    Start learning facts. Way to go Google visionaries!

  8. Daniel,

    Well, if that’s what you meant…
    You confused me a lot with the web and data-centric and API and software vs. data stuff. The points should be simple:
    1) a universal database (both in terms of scale and usage scenarios) is just a matter of technology;
    2) Google (and the like) are able to deliver those technologies BECAUSE they are not nailed to a few business processes, and the data thereof, to milk their money;
    3) that’s a great business for the future, but only a few companies will enjoy a share of this market;
    4) the rest will pay them a fortune, but still save money compared to today’s investment in IT and stuff;
    5) the time has arrived, the Cloud is just another buzzword, let’s see beyond;
    6) in 2-3 years’ time we’ll start to feel a substantial move of the data from enterprises into Google (and the like) silos

    You MUST learn the emerging Google ecosystem to see the complete picture. To name a few: Chrome, Android, Apps, APIs (Mail, Contacts, Calendar, Tasks, Prediction, Checkout, Authentication, Maps, and whatever you could possibly think of… and map your enterprise processes/data to!)

  9. @Where_

    In the future, we are going to offer services comparable to what enterprise computing offers right now, but using orders of magnitude fewer staff members. I have trouble imagining a future where this does not happen.

    Of course, if I *were* working in an IT department, I would not worry. I would feel quite safe. After all, we have been talking about automated maintenance of IT systems for decades and nothing much has happened.

    But because I look at things from a distance, I can only imagine that many of these IT folks will quickly become unemployed (and unemployable). The reason IT folks cannot see the threat is that it does not come from enterprise computing vendors. It comes from outsiders, and possibly from companies that don’t exist yet.

    For me, as a database researcher, this means that I need to play with more radical ideas. I use this blog to discuss some of them. See, for example, my post “Who will need database administrators in 2020?”

    Possibly, people who are interested in the future of data systems should create discussion groups: forums where ideas about the future can be discussed.

  10. I suspect the problem you’re pointing to is that of the database schema as a user interface. The plus of a database for accounting is that it’s not meant to serve as a user interface, it’s meant to serve as a substrate for report design and generation.

    In your software-centric model, the database is not the whole story. Either it is buffered by services accessing limited views (materialized tables), or it is augmented by principal components data such as user profiles and query choices made.

    The questions to me are what semantic domain a database supports, if fuzzier or sharper results are needed, and if queries are fresh or canned.

  11. Your text is not as clear as it could be IMHO, but after reading the comments and your responses, it’s much clearer.
    Thanks.

    I think another way to tell the story is the following:
    – the package is now bigger than before
    – the package is now all-inclusive: so, just “plug and play”
    – you don’t have to care about internals, and thus about the type of database
    – you don’t see and you don’t care about the database, all you see is user experience and software
    – all you care about is GUI+SLA

    So, in one sentence : users are moving towards SaaS

    I have got another way to tell it, through metaphor, this time 😉
    – let’s say you are very fond of dancing
    – the only place you know for dancing is a place where the girls all look the same
    – so, with limited choice, you are quite inclined to follow whatever these girls ask of you
    – you dance less, as you say ‘yes’ to whatever the girls ask of you, to please them, because they are the only girls you know for dancing
    – then, you discover “Google Search” and guess what !?
    – you discover many more dancefloors
    – you will forget the time when the only girls you met for dancing all looked like clones.
    – maybe you are able to find a girl with whom you feel a mutual attraction, and you dance more or less, that’s not the point: in fact, you have had a choice, and you are happier overall
    – and Google-like companies are now your friends 😉
