Parsing comma-separated integers in Java

We often encounter lists of integers (e.g., “1,2,3,10,1000”) stored in strings. Parsing these strings for the integer values can become a performance bottleneck if you have to scan thousands of those strings.

The standard Java approach is to use the Scanner class, as follows:

Scanner sc = new Scanner(input).useDelimiter(",");
ArrayList<Integer> al = new ArrayList<Integer>();
while (sc.hasNextInt()) {
    al.add(sc.nextInt());
}

Another sensible approach is to use the String.split method:

String[] p = input.split(",");
 int[] ans = new int[p.length];
 for (int i = 0; i < p.length; i++) {
    ans[i] = Integer.parseInt(p[i]);
 }

Which is fastest? I tested with a string made of 2048 random integers. My results indicate that splitting the strings is much faster:

Scanner800 ops/s
Split8000 ops/s

It is pathetic. If all my machine had to do was serve requests to split strings of 2048 integers, it would top at 800 queries per second when using the Scanner method.

Is this the best you can do? Not by a long shot. I threw together a manual solution that is twice as fast as the split method described above. But that, itself, is probably not even close to being optimal. My guess is that it ought to be possible to be at least 10 times faster than the Split method. And it is entirely possible that I am being pessimistic.

My code is available.

2 thoughts on “Parsing comma-separated integers in Java”

  1. Your manual implementation doesn’t even work. The results don’t correlate with the actual values; partially because of some false assumptions. I could go on but it was actually easier to write a working implementation with ~4x the throughput.

    m.l.m.p.ParseInt.manualSplit thrpt 5 12624.529 � 445.839 ops/s
    m.l.m.p.ParseInt.monolithicSplit thrpt 5 41031.703 � 5054.782 ops/s

    @Benchmark
    public int[] monolithicSplit(BenchmarkState s) {
    final String text = s.myarray;
    final int length = text.length();
    final int limit = (length / 2) + 1;
    int[] array = new int[limit];
    int pos = 0, tmp = 0;
    boolean neg = false;
    for (int i = 0; i < length; i++) {
    char c = text.charAt(i);
    if (c == ',') {
    array[pos++] = neg ? 0 – tmp : tmp;
    tmp = 0;
    neg = false;
    } else if (c == '-') {
    neg = true;
    } else {
    tmp = (tmp << 3) + (tmp << 1) + (c & 0xF);
    }
    }
    array[pos++] = neg ? 0 – tmp : tmp;
    int[] result = new int[pos];
    System.arraycopy(array, 0, result, 0, pos);
    return result;
    }

Leave a Reply

Your email address will not be published. Required fields are marked *