Native XML databases: have they taken the world over yet?

Some years ago, the database research community jumped into XML. Finally, something new to work on! For about five years now, I have seen predictions that XML databases would take over the world. Every organization would soon have its XML database. People would run web sites out of XML databases. Countless start-ups emerged, each ready to become the next Oracle.

What happened in practice is a bit underwhelming. Oracle, Microsoft, MySQL and others all included some XML support in their relational databases, but native XML databases failed to grab any significant market share.

Where are we?

  • Regarding programming languages, XQuery finally became a W3C recommendation in January 2007. More or less, XQuery, together with XPath, specifies the equivalent of a SELECT statement in SQL.
  • What if you want to update your XML database?
    XUpdate has been around for some time, but it is not widely supported. The W3C is working on something called XQuery Update Facility.
  • Interfacing XQuery with your favorite programming language is still awkward. We have an API for XML databases (XML:DB), but I am not sure how well it is supported by the various vendors.
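To make the SELECT analogy concrete, here is a minimal Python sketch that asks the same question of a relational table and of an XML document. It uses only the standard library's sqlite3 and xml.etree.ElementTree modules. ElementTree implements a limited subset of XPath and none of XQuery, so this illustrates the select-like role of path expressions, not an XML database. The table, the document and the function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same three books, first as a relational table...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Alpha", 1999), ("Beta", 2005), ("Gamma", 2007)],
)

# ...and then as an XML document.
doc = ET.fromstring("""
<library>
  <book year="1999"><title>Alpha</title></book>
  <book year="2005"><title>Beta</title></book>
  <book year="2007"><title>Gamma</title></book>
</library>
""".strip())


def sql_titles(year):
    """SQL: filter rows on year, project the title column."""
    rows = conn.execute(
        "SELECT title FROM book WHERE year = ? ORDER BY title", (year,)
    )
    return [title for (title,) in rows]


def xpath_titles(year):
    """XPath subset: select book elements whose year attribute matches,
    then step down to their title children."""
    # ElementTree supports attribute-equality predicates like [@year='2005'].
    return [t.text for t in doc.findall(f"./book[@year='{year}']/title")]


for y in (1999, 2005, 2010):
    print(y, sql_titles(y), xpath_titles(y))
```

Both functions return the same titles for each year; the path expression plays the role of the FROM/WHERE/SELECT clauses. Full XQuery adds FLWOR expressions (for, let, where, order by, return) on top of this, which is where the comparison with SQL becomes closer.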

Want to take an XML database out for a spin? Some XML databases worthy of mention:

  • eXist is open source and free.
  • Sedna is another free XML database.

My take: Once again, the relational data model shows great resilience in the marketplace. It is entirely possible that XML databases may go the way of object-oriented databases: useful for some niche problems, but nothing more. We could blame the lack of standards for the failure of XML databases, but SQL, despite its ISO standard, was never implemented consistently across vendors, and it still took off.

I like XML. I like CSS. I like XSLT/XPath. But I have always been less certain about XQuery.

XML databases look too much like a solution in search of a problem.

Reference: The W3C publishes the results of its XQuery conformance testing. There is a lot of room for improvement!

10 thoughts on “Native XML databases: have they taken the world over yet?”

  1. Those databases came around at a time when I was starting to give up on XML and XSLT.

    The new hotness – one that’s not just driven by marketing hype – is document dbs. CouchDB looks like it’s on course to let me do things as a developer that none of these legacy vendors are really attempting.

  2. XML and XSLT are fine. I like them both. For some tasks, they are ideal. (And no, they are not good for most things.)

    Document-based databases such as CouchDB and Lotus Notes are indeed very interesting. I am just too cheap and lazy to get a cluster and work with CouchDB.

  3. Mark Logic (I work there) offers an XML database that can be used for free for personal non-commercial projects:
    http://developer.marklogic.com/

    XML databases are ideal for the storage and query of documents where the content and the structure of the document are part of the query. In this case an XML database provides the ability to query documents at an arbitrary granularity across different document schemas.

    An interesting read related to the topic of trends in different types of database systems is the ‘One Size Fits All’ paper from Stonebraker et al:
    http://www.cs.brown.edu/~ugur/fits_all.pdf

  4. @Vermeulen

    RDF is a (flexible) data model, not a database technology. Comparing CouchDB to RDF is comparing apples to oranges. RDF does not say anything about indexing, aggregation, querying, updating… it is just a data model. In fact, RDF is not even XML (a common misconception)… it is just often written as XML.

    What something like CouchDB does is to allow you to search and aggregate without *any* top-down schema definition.

    Suppose, for example, that you want to add a new attribute to an existing database, say “cost in Canada”. With a tool like MySQL, this means you must change some table definition. But you cannot allow just any user to do it.

    So your tool is not very flexible. With CouchDB… you are free as a bird.

    But how do they still get fast queries? Ah! There is the magic!

  5. @Daniel Thanks for the detailed explanation!

    I was actually referring to RDF mapped into a relational DB to allow for more flexible schemas. But I’m not at all sure if this will really work, and what will be the performance implications. See: http://www.rdfabout.com/comparisons.xpd#versus-rdbms and http://infolab.stanford.edu/~melnik/rdf/db.html for how RDF could be mapped to a relational DB.

    CouchDB certainly seems interesting, I should have a better look at it.

  6. @Vermeulen Ok. But still, RDF is at the model level. Something like CouchDB is really at the physical level. (It is actually an implementation of a physical model.)

    I guess you could map an RDF model to CouchDB or to just about any database engine. As far as I can see, any database able to represent a 3-column table can be used with RDF.

  7. I see. I often just use RDF as a distributed data store whose schema can evolve easily 🙂 Heavy inferencing is usually too slow on mobile devices anyway. Maybe CouchDB can then be an alternative for this particular use case.

    I believe so as well, representing (subject predicate object) triples is all you need.

  8. Hi Daniel,

    Having worked at an object database company (Versant) and an XML database company (Mark Logic), I believe that things are different this time.

    I believe ODBMS failed to achieve broad adoption for two reasons: (1) the RDBMS itself was just being adopted so the timing was too early, (2) the primary value of an ODBMS was in easy persistence of C++ objects that could be worked around with about 15% more effort to map them relationally.

    I think XML databases are different in a few respects. (1) ODBMS were DBMS, i.e., they focused on D, data. Successful XML databases (e.g., MarkLogic) focus on content (i.e., documents). The data/document divide is real, and there is a bigger gap that’s harder for the RDBMSs to simply absorb. Sure, they can stuff XML in columns, but can they search large amounts of it effectively? Not yet.

    (2) XML databases are emerging at a time of general specialization in the DBMS market.

    Think Teradata (data warehouse), Netezza (DW), Streambase (streams), MarkLogic (XML), Vertica (columns), and arguably even BigTable (parallelization) as many different types of DBMSs that are emerging.

    So the idea isn’t as simple as one new type of DBMS will replace the RDBMS. The RDBMS, slowly and over time, will be replaced by a family of specialized ones.
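The claim from the exchange with Vermeulen, that any database able to represent a three-column table can hold RDF, can be combined with the “cost in Canada” example in a minimal Python sketch using the standard library's sqlite3 module. This is an illustration under stated assumptions, not how CouchDB or any particular triple store works: the table name, the product data and the helper functions are hypothetical, every object is stored as plain text, and there is no indexing or inference.

```python
import sqlite3

# One three-column table holds (subject, predicate, object) triples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")

# Hypothetical product data.
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("widget", "name", "Widget"),
        ("widget", "cost", "10.00"),
        ("gadget", "name", "Gadget"),
        ("gadget", "cost", "25.00"),
    ],
)

# Adding a new attribute is just inserting a new predicate value:
# no ALTER TABLE, so the table definition never changes.
conn.execute(
    "INSERT INTO triple VALUES (?, ?, ?)", ("widget", "cost in Canada", "13.50")
)


def objects(subject, predicate):
    """Return every object stored for a given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM triple WHERE subject = ? AND predicate = ? "
        "ORDER BY object",
        (subject, predicate),
    )
    return [obj for (obj,) in rows]


def subjects_with(predicate):
    """Return the subjects that have at least one value for a predicate."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM triple WHERE predicate = ? ORDER BY subject",
        (predicate,),
    )
    return [s for (s,) in rows]


# The schema still has exactly three columns after the new attribute.
column_count = len(conn.execute("PRAGMA table_info(triple)").fetchall())

print(objects("widget", "cost in Canada"), subjects_with("cost"), column_count)
```

The flexibility comes at a price: every attribute lookup becomes a filter (or, for multi-attribute queries, a self-join) on the same table, which is one reason the performance question raised in the thread is a fair one.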
