It is often important to index XPath queries. Not only is XPath useful on its own, but it is also the basis for the FLWOR expressions in XQuery.

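A typical XPath expression selects only a small fraction of an XML document (such as the value of a particular attribute), so an index that jumps directly to the relevant nodes can avoid scanning the whole document. A sensible strategy is therefore to represent XML documents as tables, which databases already know how to index. There are several possible mappings from XML documents to tables; one of the most common relies on ORDPATH node labels.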

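In the ORDPATH model, the root node receives the identifier 1, the first node contained in the root node receives the identifier 1.1, the second receives 1.2, and so on: each child extends its parent's identifier with one more component. (In the actual scheme by O'Neil et al., initial labels use only odd numbers, such as 1.1, 1.3, 1.5, so that new nodes can later be inserted between existing ones without relabeling; this post uses consecutive numbers for simplicity.) Given the ORDPATH identifiers, we can easily determine whether two nodes are siblings, or whether they have a parent-child relationship.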
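To make these relationship tests concrete, here is a minimal Python sketch that works on the simplified labels used in this post (consecutive integers, no gaps). The function names are illustrative and not taken from any library; the real ORDPATH encoding (odd numbers, compressed binary form) would need more care. Every test reduces to comparing label prefixes:

```python
def parse(label: str) -> tuple:
    """Turn a dotted label such as '1.3.1' into the tuple (1, 3, 1)."""
    return tuple(int(part) for part in label.split("."))


def is_parent(a: str, b: str) -> bool:
    """True if node a is the parent of node b: b is a plus one component."""
    pa, pb = parse(a), parse(b)
    return len(pb) == len(pa) + 1 and pb[:-1] == pa


def is_ancestor(a: str, b: str) -> bool:
    """True if a is a proper ancestor of b: a's label is a proper prefix of b's."""
    pa, pb = parse(a), parse(b)
    return len(pa) < len(pb) and pb[: len(pa)] == pa


def are_siblings(a: str, b: str) -> bool:
    """True if a and b are distinct, non-root nodes sharing the same parent."""
    pa, pb = parse(a), parse(b)
    return pa != pb and len(pa) == len(pb) > 1 and pa[:-1] == pb[:-1]


def are_adjacent_siblings(a: str, b: str) -> bool:
    """True if a and b are siblings with consecutive numbers (assumes no deletions)."""
    return are_siblings(a, b) and abs(parse(a)[-1] - parse(b)[-1]) == 1


print(is_parent("1.3", "1.3.1"))      # True
print(is_ancestor("1", "1.3.1"))      # True
print(are_siblings("1.2", "1.3.1"))   # False
print(sorted(["1.3.1", "1.2", "1", "1.3", "1.1"], key=parse))
# ['1', '1.1', '1.2', '1.3', '1.3.1']
```

The last line shows a side benefit: comparing the integer tuples lexicographically sorts nodes in document order, since a parent's label is a prefix of, and therefore sorts before, its children's labels.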

As an example, here’s an XML document and its (simplified) ORDPATH representation:


<liste temps="janvier">
  <bateau />
  <bateau>
    <canard />
  </bateau>
</liste>

ORDPATH  name    type       value
1        liste   element    -
1.1      temps   attribute  janvier
1.2      bateau  element    -
1.3      bateau  element    -
1.3.1    canard  element    -
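The table above can be produced mechanically. The following sketch assumes Python's standard xml.etree.ElementTree module and the post's simplified numbering: attributes are numbered before child elements, as in the table, and text nodes are ignored because the example contains only whitespace. The function name ordpath_rows is hypothetical. It walks the document depth-first and emits one row per node:

```python
import xml.etree.ElementTree as ET

DOC = """<liste temps="janvier">
  <bateau/>
  <bateau>
    <canard/>
  </bateau>
</liste>"""


def ordpath_rows(xml_text):
    """Return (ordpath, name, type, value) rows using simplified ORDPATH labels."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, label):
        rows.append((label, elem.tag, "element", "-"))
        child = 0
        # Attributes get the first child numbers, in document order.
        for name, value in elem.attrib.items():
            child += 1
            rows.append((f"{label}.{child}", name, "attribute", value))
        # Child elements follow; text and tail whitespace is not labeled.
        for sub in elem:
            child += 1
            visit(sub, f"{label}.{child}")

    visit(root, "1")
    return rows


for row in ordpath_rows(DOC):
    print(*row)
```

Running it prints the same five rows as the table.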

Once the data is in a table, we can index it with standard structures such as B-trees or hash tables. For example, if we index the value column, we can quickly process the XPath predicate @temps="janvier".
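Here is a minimal sketch of both kinds of standard index on that table. A Python dict plays the role of the hash table, and a sorted list searched with the bisect module stands in for a B-tree (Python's standard library has no B-tree). The helper names are hypothetical:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Rows of the ORDPATH table above: (ordpath, name, type, value).
ROWS = [
    ("1", "liste", "element", "-"),
    ("1.1", "temps", "attribute", "janvier"),
    ("1.2", "bateau", "element", "-"),
    ("1.3", "bateau", "element", "-"),
    ("1.3.1", "canard", "element", "-"),
]

# Hash index on the value column: value -> positions of matching rows.
value_index = defaultdict(list)
for pos, row in enumerate(ROWS):
    value_index[row[3]].append(pos)


def match_attribute(name, value):
    """Labels of attribute nodes satisfying @name="value", via the hash index."""
    return [ROWS[p][0] for p in value_index.get(value, [])
            if ROWS[p][1] == name and ROWS[p][2] == "attribute"]


# Ordered index on the value column (a sorted list standing in for a B-tree).
ordered = sorted((row[3], row[0]) for row in ROWS)
ordered_keys = [value for value, _ in ordered]


def lookup_ordered(value):
    """Labels of all rows whose value equals `value`, found by binary search."""
    lo = bisect_left(ordered_keys, value)
    hi = bisect_right(ordered_keys, value)
    return [label for _, label in ordered[lo:hi]]


def parent_label(label):
    """Parent in the simplified scheme: drop the last component (None for the root)."""
    return label.rsplit(".", 1)[0] if "." in label else None


hits = match_attribute("temps", "janvier")
print(hits)                              # ['1.1']
print([parent_label(h) for h in hits])   # ['1']
print(lookup_ordered("janvier"))         # ['1.1']
```

Because a node's parent label is its own label minus the last component, the element that carries the matching attribute (here liste, label 1) is found without any extra index.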

In effect, we can translate XPath and XQuery queries into SQL. This leaves relatively little room for XML-specific indexes. I am certain that XML database designers have even smarter strategies, but do they work significantly better?
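As an illustration of the translation to SQL, here is a sketch using Python's built-in sqlite3 module (SQLite stores its indexes as B-trees). The schema is an assumption: besides the columns of the table above, it stores each node's parent label, which in the simplified scheme is the label with its last component removed, and it uses NULL where the table shows -. The two XPath expressions are rewritten by hand; this is not an automatic XPath-to-SQL compiler:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE node (
    ordpath TEXT PRIMARY KEY,
    parent  TEXT,  -- label of the parent node (NULL for the root)
    name    TEXT,
    type    TEXT,
    value   TEXT)""")
rows = [
    ("1", None, "liste", "element", None),
    ("1.1", "1", "temps", "attribute", "janvier"),
    ("1.2", "1", "bateau", "element", None),
    ("1.3", "1", "bateau", "element", None),
    ("1.3.1", "1.3", "canard", "element", None),
]
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)
# An ordinary secondary index on the columns used by the attribute predicate.
conn.execute("CREATE INDEX node_value ON node (value, name)")

# XPath //liste[@temps="janvier"]: join each attribute row to its parent element.
q1 = """SELECT e.ordpath FROM node AS e
        JOIN node AS a ON a.parent = e.ordpath
        WHERE e.type = 'element' AND e.name = 'liste'
          AND a.type = 'attribute' AND a.name = 'temps' AND a.value = 'janvier'"""
print(conn.execute(q1).fetchall())  # [('1',)]

# XPath //liste//canard: a descendant's label starts with the ancestor's label + '.'.
q2 = """SELECT d.ordpath FROM node AS anc
        JOIN node AS d ON d.ordpath LIKE anc.ordpath || '.%'
        WHERE anc.name = 'liste' AND d.type = 'element' AND d.name = 'canard'"""
print(conn.execute(q2).fetchall())  # [('1.3.1',)]
```

The descendant axis becomes a prefix match on labels, which is what makes ORDPATH-style labels convenient in a relational engine. Whether SQLite actually uses an index for such a LIKE predicate depends on its optimizer rules, so treat this as a sketch of the mapping rather than a performance claim.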

Reference: P. O'Neil et al., ORDPATHs: insert-friendly XML node labels, 2004.

Further reading: Native XML databases: have they taken the world over yet?

4 Comments

  1. Thanks Daniel. I’m interested in anything more you learn on this topic, so please follow up if and when you do.

    I think there’s a typo with “”: it should probably be “”.

    Comment by Andre Vellino — 22/6/2010 @ 10:41

  2. Interesting bug with WordPress. I tried to say that \ should be \ but all I see are “” signs

    Comment by Andre Vellino — 22/6/2010 @ 10:44

  3. OK I give up – how *do* you type XML in WordPress?

    Comment by Andre Vellino — 22/6/2010 @ 10:44

  4. Thanks Andre. My XML was not well formed. Hopefully, it is correct now.

    Comment by Daniel Lemire — 22/6/2010 @ 17:46
