We often encounter lists of integers (e.g., “1,2,3,10,1000”) stored in strings. Parsing these strings for the integer values can become a performance bottleneck if you have to scan thousands of those strings.
The standard Java approach is to use the Scanner class, as follows:
Scanner sc = new Scanner(input).useDelimiter(","); ArrayList<Integer> al = new ArrayList<Integer>(); while (sc.hasNextInt()) { al.add(sc.nextInt()); }
Another sensible approach is to use the String.split method:
String[] p = input.split(","); int[] ans = new int[p.length]; for (int i = 0; i < p.length; i++) { ans[i] = Integer.parseInt(p[i]); }
Which is fastest? I tested with a string made of 2048 random integers. My results indicate that splitting the strings is much faster:
| Scanner | 800 ops/s |
| Split | 8000 ops/s |
It is pathetic. If all my machine had to do was serve requests to split strings of 2048 integers, it would top at 800 queries per second when using the Scanner method.
Is this the best you can do? Not by a long shot. I threw together a manual solution that is twice as fast as the split method described above. But that, itself, is probably not even close to being optimal. My guess is that it ought to be possible to be at least 10 times faster than the Split method. And it is entirely possible that I am being pessimistic.
Your manual implementation doesn’t even work. The results don’t correlate with the actual values; partially because of some false assumptions. I could go on but it was actually easier to write a working implementation with ~4x the throughput.
m.l.m.p.ParseInt.manualSplit thrpt 5 12624.529 � 445.839 ops/s
m.l.m.p.ParseInt.monolithicSplit thrpt 5 41031.703 � 5054.782 ops/s
@Benchmark
public int[] monolithicSplit(BenchmarkState s) {
final String text = s.myarray;
final int length = text.length();
final int limit = (length / 2) + 1;
int[] array = new int[limit];
int pos = 0, tmp = 0;
boolean neg = false;
for (int i = 0; i < length; i++) {
char c = text.charAt(i);
if (c == ',') {
array[pos++] = neg ? 0 – tmp : tmp;
tmp = 0;
neg = false;
} else if (c == '-') {
neg = true;
} else {
tmp = (tmp << 3) + (tmp << 1) + (c & 0xF);
}
}
array[pos++] = neg ? 0 – tmp : tmp;
int[] result = new int[pos];
System.arraycopy(array, 0, result, 0, pos);
return result;
}
Your approach is indeed better than my manual hack and I have updated the code accordingly. Note that this was indicated in my post.