XML for databases: a dead idea

One of my colleagues is teaching an artificial intelligence class. In his class, he uses old videos where experts from the early eighties make predictions about where AI is going. These experts come from the best schools such as Stanford.

These videos were not meant as a joke. When you watch them today, however, they are properly hilarious. One of the predictions by a famous AI researcher was that the software industry would be dominated by expert systems by the year 2000. This was a reasonable prediction: Wikipedia says that in the early 1980s, two thirds of the Fortune 1000 companies used expert systems in their daily business activities.

I believe that the majority of software programmers today would describe the importance of expert systems in their work as… negligible. Of course, the researchers have not given up: the Semantic Web initiative can be viewed as a direct descendant of expert systems. And there are still some specific applications where an expert system is the right tool, I am sure. However, to put it bluntly, expert systems were a failure by the standards set forth by their proponents.

Did you ever notice how much energy people put into promoting (their) new ideas, and how little you hear about failures? That’s because there is little profit in calling something a failure and much risk: there are always people in denial who will fight you to the death.

I think it is unfortunate that we never dare look at our mistakes. What did Burke say? “Those who don’t know history are destined to repeat it.”

When XML was originally conceived, it was meant for document formats. And by that standard… boy! did it succeed! Virtually all word processing and e-book formats are in XML today. The only notable failure is HTML. They tried to make HTML and XML work together, but it was never a good fit (except maybe within e-books). In a sense, the inventors of XML could not have succeeded more thoroughly.

Then, unfortunately, the data people took XML and decided that it solved their problems. So we got configuration files in XML, databases in XML, and so on. Some of these applications did OK. Storing data in XML for long-term interoperability is an acceptable use of XML. Indeed, XML is supported by virtually all programming languages and that is unlikely to change.

However, XML as a technology for databases was supposed to solve new problems. All major database vendors added support for XML. DBAs were told to learn XML or else… We also got handfuls of serious XML databases. More critically, the major database research conferences were flooded with XML research papers.

And then it stopped. For fun, I estimated the number of research papers focused on XML in a major conference like VLDB: 2003: 27; 2008: 14; 2012: 3. That is, it went from a very popular topic for researchers to a niche topic. Meanwhile, the International XML Database Symposium ran from 2003 to 2010, missing only 2008. It now appears dead.

That is not to say that there is no valid research focused on XML today. The few papers on XML accepted in major database journals and conferences are solid. In fact, the papers from 2003 were probably mostly excellent. Just last week, I reviewed a paper on XML for a major journal and I recommended acceptance. I have been teaching a course on XML every year since 2005 and I plan to continue to teach it. Still, it is undeniable that XML databases have failed as anything but a niche topic.

I initially wanted to write an actual research article to examine why XML for databases failed. I was strongly discouraged from doing so: it would be unpublishable because too many people would want to argue against the failure itself. This is probably a great defect of modern science: we are obsessed with success and we work to forget failure.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

14 thoughts on “XML for databases: a dead idea”

  1. @s

    To make things interesting, you need a sizeable community with stated goals and ambitions. You cannot just take any niche topic and call it a failure. A failure compared to what?

    I predicted the demise of the Semantic Web. It is still going strong under different names. It is all going to eventually collapse the same way expert systems failed.

    But failures in research are also about not investigating some problems, or falsely considering them as being solved. Database design comes to mind: it is considered a textbook topic… but it is far from being solved! In fact, the information provided by textbooks is not even valid! The truth is that textbooks don’t tell you how to design databases in the real world.

    That is, I believe, a great sin. We often insist that some problems are solved when what we advocate fails. Again, we refuse to see failures.

  2. @Robert Primmer

    What happened when expert systems failed? Did all these researchers renounce their ways? Some have, but a lot continued to do the same research under a different guise.

    Tim Berners-Lee teamed up with James Hendler to propose the Semantic Web. Hendler is a long-time classical AI guy who wrote a textbook on expert systems. You could reasonably view the Semantic Web as a distributed expert system where the expertise is encoded as RDF.

    They all share a common vision of what intelligence is… this vision goes back to classical AI. To these people, intelligence is a collection of facts together with a “reasoning engine”.

    This vision has failed, repeatedly, and it keeps on failing… but its proponents have no trouble finding new disciples with each generation because their vision is compelling. It is still wrong though. It does not lead to intelligence.

    It all makes me a bit sad.

  3. The VLDB statistics are even more striking when you note that the number of VLDB papers per year has roughly doubled in that time frame (from 100 to 200). So as a percentage of accepted VLDB papers, the numbers look more like 2003: 25%, 2008: 10%, 2012: 1.5%.

  4. I have a huge grudge against XML. People are so obsessed with it that they won’t tolerate a couple of binary formats here and there. At the same time, from a performance perspective, XML sucks big time.

  5. Some would argue that we sort of have that mega expert system that transformed the world. Some call it “Dr. Google” and ask it all sorts of questions in their native language. While the answers they get are mostly indirect, answers they are still.

    You can find more ‘expert systems’ like that (e.g., Wolfram Alpha), and while purists would object that those are not expert systems, I certainly think they are, and I am sure that had we described what they do to someone back in the early ’80s, they would also have counted them as expert systems.

  6. @Harari

    I am sure that some would call Google an expert system. The only problem is that it works nothing like an expert system.

    In fact, if you went back to 1980 and told people how Google works, they would not believe you that such a thing is possible.

    Heck. They would not believe that Wikipedia is possible.

  7. @Itman

    I actually would have liked XML databases to succeed. It would have given database researchers decades of research problems.

    I don’t understand why the research aspect of it collapsed so fast. It is somewhat puzzling. Researchers are typically more stubborn.

  8. Okay, so XML is not great as a database per se compared to MySQL, etc., but many bodies and industries use XML to store and carry data, with standards such as HL7 and the finance standards for credit card data.
