Parsing comma-separated integers in Java

We often encounter lists of integers (e.g., “1,2,3,10,1000”) stored in strings. Parsing these strings for the integer values can become a performance bottleneck if you have to scan thousands of those strings.

The standard Java approach is to use the Scanner class, as follows:

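import java.util.ArrayList;
import java.util.Scanner;

// Read comma-delimited tokens; the loop ends at the first token that is not an int.
Scanner sc = new Scanner(input).useDelimiter(",");
ArrayList<Integer> al = new ArrayList<Integer>();
while (sc.hasNextInt()) {
    al.add(sc.nextInt());
}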

Another sensible approach is to use the String.split method:

String[] p = input.split(",");
int[] ans = new int[p.length];
for (int i = 0; i < p.length; i++) {
    ans[i] = Integer.parseInt(p[i]);
}
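To make the two approaches easy to compare side by side, here is a minimal sketch that wraps each one in a method. The class name ParseApproaches and the method names are my own placeholders for illustration. Besides speed, the two differ in how they treat a malformed token: the Scanner loop stops silently at the first token that is not an integer, whereas Integer.parseInt throws a NumberFormatException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the names are placeholders, not from the original code.
class ParseApproaches {

    // Scanner-based parsing: reads ints until the first token that is not an int.
    static List<Integer> withScanner(String input) {
        List<Integer> values = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter(",")) {
            while (sc.hasNextInt()) {
                values.add(sc.nextInt());
            }
        }
        return values;
    }

    // Split-based parsing: throws NumberFormatException on a malformed token.
    static int[] withSplit(String input) {
        String[] parts = input.split(",");
        int[] ans = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ans[i] = Integer.parseInt(parts[i]);
        }
        return ans;
    }

    public static void main(String[] args) {
        String input = "1,2,3,10,1000";
        System.out.println(withScanner(input));               // [1, 2, 3, 10, 1000]
        System.out.println(Arrays.toString(withSplit(input))); // [1, 2, 3, 10, 1000]
    }
}
```

Given the input "1,x,3", withScanner returns a list holding only 1, while withSplit throws.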

Which is faster? I tested with a string made of 2048 random integers. My results indicate that splitting the string is about ten times faster:

Method     Throughput
Scanner    800 ops/s
Split      8000 ops/s

That is pathetic. If all my machine had to do was serve requests to parse strings of 2048 integers, it would top out at 800 queries per second when using the Scanner method.
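For readers who want to try a measurement of this kind, here is a rough sketch of how such a test could be set up. It is not the benchmark that produced the numbers above: the class name BenchInput, the seed, and the repetition count are assumptions of mine, and a plain System.nanoTime loop is far cruder than a proper harness such as JMH, so treat any figure it prints as a ballpark only.

```java
import java.util.Random;
import java.util.StringJoiner;

// Hypothetical timing sketch; names, seed, and repetition count are assumptions.
class BenchInput {

    // Build a comma-separated string of `count` random ints (seeded for repeatability).
    static String randomCsv(int count, long seed) {
        Random rnd = new Random(seed);
        StringJoiner sj = new StringJoiner(",");
        for (int i = 0; i < count; i++) {
            sj.add(Integer.toString(rnd.nextInt()));
        }
        return sj.toString();
    }

    // The split-based parser from the text.
    static int[] parseWithSplit(String input) {
        String[] p = input.split(",");
        int[] ans = new int[p.length];
        for (int i = 0; i < p.length; i++) {
            ans[i] = Integer.parseInt(p[i]);
        }
        return ans;
    }

    public static void main(String[] args) {
        String input = randomCsv(2048, 42L);
        int reps = 1000;
        long sink = 0; // accumulate results so the calls have an observable effect

        // Warm-up pass so the JIT has a chance to compile the hot loop.
        for (int i = 0; i < reps; i++) {
            sink += parseWithSplit(input).length;
        }

        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sink += parseWithSplit(input).length;
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("split: %.0f ops/s (sink=%d)%n", reps * 1e9 / elapsed, sink);
    }
}
```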

Is this the best you can do? Not by a long shot. I threw together a manual solution that is twice as fast as the split method described above. But that, itself, is probably not even close to being optimal. My guess is that it ought to be possible to be at least 10 times faster than the Split method. And it is entirely possible that I am being pessimistic.
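To give an idea of what a manual solution can look like, here is a minimal sketch. It is not the code I benchmarked, and I make no claim about its speed. It assumes well-formed input: ASCII digits, an optional leading minus sign, no whitespace, and values that fit in an int. The class name ManualParse is a placeholder.

```java
// Hypothetical hand-written parser; assumes well-formed input (see the text above).
class ManualParse {

    static int[] parse(String input) {
        if (input.isEmpty()) {
            return new int[0];
        }
        // First pass: count commas so the output array can be sized exactly.
        int count = 1;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == ',') {
                count++;
            }
        }
        int[] out = new int[count];
        int pos = 0;
        int value = 0;
        boolean negative = false;
        // Second pass: accumulate digits and emit a value at each comma.
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ',') {
                out[pos++] = negative ? -value : value;
                value = 0;
                negative = false;
            } else if (c == '-') {
                negative = true;
            } else {
                value = 10 * value + (c - '0'); // no validation: c is assumed to be a digit
            }
        }
        // The last value has no trailing comma.
        out[pos] = negative ? -value : value;
        return out;
    }
}
```

The point of going manual is to avoid creating one substring per value, as String.split does, and to skip the validation that Integer.parseInt performs.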

My code is available.

Published by

Daniel Lemire

A computer science professor at the Université du Québec (TELUQ).

2 thoughts on “Parsing comma-separated integers in Java”

  1. Your manual implementation doesn’t even work. The results don’t correlate with the actual values; partially because of some false assumptions. I could go on but it was actually easier to write a working implementation with ~4x the throughput.

    m.l.m.p.ParseInt.manualSplit thrpt 5 12624.529 ± 445.839 ops/s
    m.l.m.p.ParseInt.monolithicSplit thrpt 5 41031.703 ± 5054.782 ops/s

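    @Benchmark
    public int[] monolithicSplit(BenchmarkState s) {
        final String text = s.myarray;
        final int length = text.length();
        // Upper bound on the value count: every value but the last
        // takes at least two characters (a digit and a comma).
        final int limit = (length / 2) + 1;
        int[] array = new int[limit];
        int pos = 0, tmp = 0;
        boolean neg = false;
        for (int i = 0; i < length; i++) {
            char c = text.charAt(i);
            if (c == ',') {
                // End of a value: store it with its sign, then reset.
                array[pos++] = neg ? 0 - tmp : tmp;
                tmp = 0;
                neg = false;
            } else if (c == '-') {
                neg = true;
            } else {
                // tmp * 10 + digit; assumes c is an ASCII digit.
                tmp = (tmp << 3) + (tmp << 1) + (c & 0xF);
            }
        }
        // Store the final value, which has no trailing comma.
        array[pos++] = neg ? 0 - tmp : tmp;
        // Trim the array to the number of values actually parsed.
        int[] result = new int[pos];
        System.arraycopy(array, 0, result, 0, pos);
        return result;
    }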
