Don’t read your data from a straw

It is common for binary data to be serialized to bytes. Data structures like indexes can be saved to disk or transmitted over the network in such a form. Many serialized data structures can be viewed as sets of ‘integer’ values. That is the case, for example, of a Roaring bitmap. We must then read back this data. An integer could be serialized as four bytes in a stream of bytes.

We are thus interested in the problem where we want to convert an array of bytes into an array of integer values.

In C or C++, you can safely convert from a stream of bytes to in-memory values using a function like ‘memcpy’. It is fast. Or you can just cheat and do a “cast”.

What do you do in Java?

A convenient approach is to wrap the byte array into an InputStream and then wrap that InputStream into a DataInputStream, like in this example where we convert an array of bytes into an array of integers:

byte[] array = ...
int[] recipient = new int[N / 4];
DataInput di = new DataInputStream(new 
            ByteArrayInputStream(array));
for(int k = 0; k < recipient.length; k++)
    recipient[k] = di.readInt();

The benefit of this approach is improved abstraction: you do not care whether the data comes from an array of bytes or from disk, it is all the same code. If you are have serialization and deserialization code, it is probably written in terms of OutputStream and InputStream anyhow, so why not reuse that perfectly good code?

However, Java offers a performance-oriented concept called a ByteBuffer to represent and array of bytes. It is not as high level as an input stream since it assumes that you do have, somewhere, an array of bytes.

You can achieve the same conversion as before using a ByteBuffer instead:

byte[] array = ...
int[] recipient = new int[N / 4];
ByteBuffer bb = ByteBuffer.wrap(s.array);
bb.asIntBuffer().get(recipient);

Here is the time required to convert 1 million 32-bit integers on my 4.2GHz 2018 iMac:

DataInputStream 10 ms
ByteBuffer 1 ms

That is, the ByteBuffer is 10x faster. My code is available.

Because I have 1 million integers, we can convert back these timings into “time per integer”: the ByteBuffer approach achieves a speed of one 32-bit integer converted per nanosecond. Given that my iMac can execute probably something like a dozen operations per nanosecond, that’s not impressive… but it is at least a bit respectable. The DataInputStream takes 10 nanosecond (or something like 40 or 50 cycles) per integer: it is grossly inefficient.

This has interesting practical consequences. In the RoaringBitmap project, Alvarado pointed out that it is faster to create a bitmap using a ByteBuffer backend, and then convert it back into an normal data structure, than to construct it directly from an input steam. And the difference is not small.

Practically, this means that it may be worth it to provide a special function that can construct a bitmap directly from a ByteBuffer or a byte array, bypassing the stream approach. (Thankfully, we have bitmaps backed with ByteBuffer to support applications such as memory-file mapping.)

Speaking for myself, I was going under the impression that Java would do a fine job reading from a byte array using an input stream. At the very least, we found that ByteArrayInputStream is not the right tool. That a ByteBuffer would be fast is not so surprising: as far as I can tell, they were introduced in the standard API precisely for performance reasons. However, a factor of ten is quite a bit more than I expected.

In any case, it is a perfectly good example of the problem whereas abstractions force you to consume data as if it went through a straw. Streams and iterators are handy abstractions but they often lead you astray with respect to performance.

Further reading. Ondrej Kokes has reproduced these results in Go.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “Don’t read your data from a straw”

  1. One of the things that can hurt Java here is the classic big endian vs. little endian problem…
    In a lot of cases, Java is prepared to swap endianess to be compatible across different CPU architectures. Something where in C you usually have to manually insert htonl and ntohl calls etc. – is all your code endianess safe?

    1. Thanks for raising this point.

      In the case above, we do Java-vs-Java comparisons so endianness is not an issue.

      In both Java and C/C++, you sometimes need to flip the bytes around. In C/C++, you have to check whether you have a big endian or little endian system, whereas with Java, it is always big endian. Yet, in my own experience, I have been able to safely assume that all systems I care about are little endian. So I have designed binary formats that are explicitly little endian.

      This being said, the computational burden of reversing byte order is tiny.

  2. ByteBuffer has very unfortunate API. Pretty much all systems/high-performance projects in Java (Netty, Aeron, Chronicle, etc) reimplement it on their own.

    We’ve built Memory project (link) specifically to back up data structure implementations in Java. It is used in DataSketches (link).

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax