Indexing XML

It is often important to index XPath queries. Not only is XPath useful on its own, but it is also the basis for the FLWOR expressions in XQuery.

A typical XPath expression will select only a small fraction of any XML document (such as the value of a particular attribute). Thus, a sensible strategy is to represent the XML documents as tables. There are several possible maps from XML documents to tables. One of the most common is ORDPATH.

In the ORDPATH model, the root node receives the identifier 1, the first node contained in the root node receives the identifier 1.1, the second one receives the identifier 1.2, and so on. Given the ORDPATH identifiers, we can easily determine whether two nodes are neighbors, or whether they have a child-parent relationship.

As an example, here’s an XML document and its (simplified) ORDPATH representation:


<liste temps="janvier" >
<bateau />
<bateau >
<canard />
</bateau>
</liste>

ORDPATH name type value
1 liste element
1.1 temps attribute janvier
1.2 bateau element
1.3 bateau element
1.3.1 canard element

Given a table, we can easily index it using standard indexes such as B trees or hash tables. For example, if we index the value column, we can quickly process the XPath expression @temps=”janvier”.

Effectively, we can map XPath and XQuery queries into SQL. This leaves relatively little room for XML-specific indexes. I am certain that XML database designers have even smarter strategies, but do they work significantly better?

Reference: P. O’Neil, et al.. ORDPATHs: insert-friendly XML node labels. 2004.

Further reading: Native XML databases: have they taken the world over yet?

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

4 thoughts on “Indexing XML”

  1. Thanks Daniel. I’m interested in anything more you learn on this topic, so please follow up if and when you do.

    I think there’s a typo with “”: it should probably be “”.

Leave a Reply

Your email address will not be published. The comment form expects plain text. If you need to format your text, you can use HTML elements such strong, blockquote, cite, code and em. For formatting code as HTML automatically, I recommend tohtml.com.

You may subscribe to this blog by email.