XPath support in Java 1.5

Things are getting somewhat better in Java land. You can now do some XPath work in Java; see this sample code I wrote this morning (it is not standalone, though):

    String xpathexpression = "//xdoc[dtd!='']/fname/text()";
    XPath xpath = XPathFactory.newInstance().newXPath();
    InputSource indexname_input = new InputSource(indexname);
    NodeList nl = (NodeList) xpath.evaluate(xpathexpression, 
                              indexname_input, XPathConstants.NODESET);
    for (int i = 0; i < nl.getLength(); ++i) {
      System.out.println("loading document " + (i + 1) + " of " + nl.getLength());
      System.out.println("It uses DTD: "+xpath.evaluate("../../dtd",nl.item(i)));
      String xmlfile = nl.item(i).getNodeValue();
      String xmlPath = baseurl + datadir + xmlfile;
      // ... (rest of the loop body omitted)
    }
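
Since the fragment above depends on variables defined elsewhere, here is a minimal self-contained sketch of the same query. The class name `XPathIndexDemo`, the method `fileNames`, and the sample index document are all made up for illustration; only the element names (`xdoc`, `dtd`, `fname`) come from the XPath expression above, and the real index file may well look different.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathIndexDemo {
    // Hypothetical index document: the second entry has an empty dtd element.
    static final String SAMPLE_INDEX =
        "<index>"
        + "<xdoc><dtd>a.dtd</dtd><fname>one.xml</fname></xdoc>"
        + "<xdoc><dtd></dtd><fname>two.xml</fname></xdoc>"
        + "<xdoc><dtd>b.dtd</dtd><fname>three.xml</fname></xdoc>"
        + "</index>";

    // Returns the fname text of every xdoc entry whose dtd element is non-empty.
    static List<String> fileNames(String indexXml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource input = new InputSource(new StringReader(indexXml));
        NodeList nl = (NodeList) xpath.evaluate(
            "//xdoc[dtd!='']/fname/text()", input, XPathConstants.NODESET);
        List<String> names = new ArrayList<String>();
        // NodeList is not Iterable, so an index-based loop is needed here.
        for (int i = 0; i < nl.getLength(); ++i) {
            names.add(nl.item(i).getNodeValue());
        }
        return names;
    }

    public static void main(String[] args) throws XPathExpressionException {
        for (String name : fileNames(SAMPLE_INDEX)) {
            System.out.println("would load " + name);
        }
    }
}
```

Note that once the results are copied into a `List<String>`, the new loop syntax works on them without complaint.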

However, I was disappointed to see that the new “foreach” construct in Java doesn’t apply to NodeList objects… I’m sorry, but I am getting more and more convinced, with every version, that Java is an ugly hack. I mean, you have a collection of nodes, a standard one at that, and you can’t “foreach” it… what gives?

What is the “foreach” construct? Java 1.5 introduces the idea, well known in many languages, of the “foreach” construct. In effect, if you have a set of elements and want to go through them one at a time, using “for(int i=0; i < length; ++i)” is ugly and error-prone. It is much better to write “for (element in set)”. Java 1.5 now has this as “for (type element : set)”. That said, I was under the impression that the Java people had been careful to make sure that the “foreach” construct would work with all standard collections of objects… not so, alas.
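
For what it is worth, the construct accepts arrays and anything that implements `java.lang.Iterable`, and `NodeList` is neither. A thin adapter is one possible workaround. The sketch below is only an illustration: the class name `NodeListIterable` is hypothetical and not part of any standard API, and the tiny XML document in `main` is made up for the demonstration.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical adapter that lets a DOM NodeList be used in a foreach loop.
class NodeListIterable implements Iterable<Node> {
    private final NodeList nodes;

    NodeListIterable(NodeList nodes) {
        this.nodes = nodes;
    }

    public Iterator<Node> iterator() {
        return new Iterator<Node>() {
            private int index = 0;

            public boolean hasNext() {
                return index < nodes.getLength();
            }

            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return nodes.item(index++);
            }

            // Removal is not supported: the adapter is a read-only view.
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<r><a/><b/><c/></r>")));
        NodeList children = doc.getDocumentElement().getChildNodes();
        // The adapter makes the new loop syntax available.
        for (Node n : new NodeListIterable(children)) {
            System.out.println(n.getNodeName());
        }
    }
}
```

The adapter takes only a few lines, which is rather the point: it could have shipped with the platform.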

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

10 thoughts on “XPath support in Java 1.5”

  1. It probably would not be that hard to write an iterator for the NodeList object, or even to extend NodeList to implement Iterable. If you did that, foreach should work just fine.

  2. The problem is that the Sun engineers should have made the foreach construct work with their API.

    Basically, I think that Sun’s leadership with regard to Java is damaging: we don’t have the benefit of a community-based approach (Sun keeps on reinventing what other people have done) and we have the drawbacks of a community-based approach (a heterogeneous API).

  3. John, did I promote .NET? No I did not. Life is not a choice between Java and C#. There are many other languages out there… Python, PHP, Lisp, Scheme, Haskell, Prolog, BASIC, Delphi…

    Some, like Python, implement iterators in an elegant and consistent fashion.

    In fact, using Jython, under a JVM, you get iterators that work well. So the problem is not with the Java architecture, but rather with the fact that the Java language itself is a bit of a hack and it is certainly not as good as it could have been.

    I don’t use .NET. I don’t use C#. I don’t care about these technologies and I’m never comparing Java to C# or VB.net.

  4. A little missing feature like foreach is far outweighed by the rest of the advantages that Java offers. You’re obviously acutely ignorant. Why don’t you go use .NET and see if your code still runs when .NET 2 comes out.

  5. NodeList has nothing to do with Sun. It is a W3C DOM API. And of course, the DOM API has to catch up to Java 1.5 to offer an iterable NodeList. So your accusation against Sun and Java is plainly invalid.

  6. However, while NodeList doesn’t belong to Sun, several other objects belonging to Sun don’t support foreach. For example, you cannot use foreach with an iterator (why not?).

  7. FYI … This is from the 1.5 tutorial and explains when the for each construct is used in java:

    Traversing Collections
    There are two ways to traverse collections: (1) with the for-each construct and (2) by using Iterators.
    for-each Construct
    The for-each construct allows you to concisely traverse a collection or array using a for loop — see The for Statement. The following code uses the for-each construct to print out each element of a collection on a separate line.
    for (Object o : collection)
        System.out.println(o);

    An Iterator is an object that enables you to traverse through a collection and to remove elements from the collection selectively, if desired. You get an Iterator for a collection by calling its iterator method. The following is the Iterator interface.
    public interface Iterator<E> {
        boolean hasNext();
        E next();
        void remove(); //optional
    }

    The hasNext method returns true if the iteration has more elements, and the next method returns the next element in the iteration. The remove method removes the last element that was returned by next from the underlying Collection. The remove method may be called only once per call to next and throws an exception if this rule is violated.
    Note that Iterator.remove is the only safe way to modify a collection during iteration; the behavior is unspecified if the underlying collection is modified in any other way while the iteration is in progress.

    Use Iterator instead of the for-each construct when you need to:

    - Remove the current element. The for-each construct hides the iterator, so you cannot call remove. Therefore, the for-each construct is not usable for filtering.
    - Replace elements in a list or array as you traverse it.
    - Iterate over multiple collections in parallel.

    The following method shows you how to use an Iterator to filter an arbitrary Collection, that is, traverse the collection removing specific elements.
    static void filter(Collection<?> c) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            if (!cond(i.next()))
                i.remove();
    }

    This simple piece of code is polymorphic, which means that it works for any Collection regardless of implementation. This example demonstrates how easy it is to write a polymorphic algorithm using the Java Collections Framework.
