Writing out large arrays in Go: binary.Write is inefficient for large arrays

Programmers often need to write data structures to disk or to networks. The data structure then needs to be interpreted as a sequence of bytes. Regarding integer values, most computer systems adopt “little endian” encoding, whereby an 8-byte integer is written out with the least significant byte first. In the Go programming language, you can write an array of integers to a buffer as follows:

var data []uint64
var buf *bytes.Buffer = new(bytes.Buffer)

...

err := binary.Write(buf, binary.LittleEndian, data)
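To make the byte order concrete, here is a minimal, self-contained sketch that encodes two integers with binary.Write and prints the resulting bytes. The helper name encodeLittleEndian and the sample values are mine, chosen only for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeLittleEndian (a hypothetical helper) serializes data with
// binary.Write, eight bytes per integer, least significant byte first.
func encodeLittleEndian(data []uint64) ([]byte, error) {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.LittleEndian, data); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := encodeLittleEndian([]uint64{0x0102030405060708, 1})
	if err != nil {
		panic(err)
	}
	// The first integer comes out as 08 07 06 05 04 03 02 01,
	// the second as 01 followed by seven zero bytes.
	fmt.Printf("% x\n", out)
}
```

Each integer occupies exactly eight bytes, so a slice of n integers produces 8n bytes of output.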

Until recently, I assumed that the binary.Write function did not allocate memory. Unfortunately, it does. The function converts the input array to a new, temporary byte array.
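One way to observe the allocation is to compare the runtime's cumulative allocation counter before and after the call. The following is a rough sketch, not a rigorous measurement: the helper names allocatedBytes and binaryWriteAllocation are hypothetical, and the output is sent to io.Discard (available since Go 1.16) so that the destination does not contribute allocations of its own.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"runtime"
)

// allocatedBytes (a hypothetical helper) reports how many heap bytes were
// allocated while f ran, using the cumulative TotalAlloc counter.
func allocatedBytes(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

// binaryWriteAllocation measures binary.Write on n integers. The output
// goes to io.Discard so that the destination itself does not allocate.
func binaryWriteAllocation(n int) uint64 {
	data := make([]uint64, n)
	return allocatedBytes(func() {
		if err := binary.Write(io.Discard, binary.LittleEndian, data); err != nil {
			panic(err)
		}
	})
}

func main() {
	n := 1 << 20 // about one million integers, 8 MiB of payload
	fmt.Printf("payload: %d bytes, allocated: %d bytes\n",
		8*n, binaryWriteAllocation(n))
}
```

If the function behaves as described, the allocated amount should be at least as large as the payload. Other goroutines may allocate at the same time, so the figure is an upper estimate rather than an exact count.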

Instead, you can create a small buffer just big enough to hold your 8-byte integer and write that small buffer repeatedly:

var item = make([]byte, 8)
for _, x := range data {
    binary.LittleEndian.PutUint64(item, x)
    buf.Write(item)
}

Sadly, this might have poor performance on disks or networks where each write or read has a high overhead. To avoid this problem, you can use Go’s buffered writer and write the integers one by one. Internally, Go will allocate a small buffer (4 kB by default).

writer := bufio.NewWriter(buf)
var item = make([]byte, 8)
for _, x := range data {
	binary.LittleEndian.PutUint64(item, x)
	writer.Write(item)
}
writer.Flush()

I wrote a small benchmark that writes an array of 100M integers to memory.

function              memory usage   time
binary.Write          1.5 GB         1.2 s
one-by-one            0              0.87 s
buffered one-by-one   4 kB           1.2 s

(Timings will vary depending on your hardware and testing procedure. I used Go 1.16.)

The buffered one-by-one approach is not beneficial with respect to speed in this instance, but it would be more helpful in other cases, such as writing to a disk or a network. In my benchmark, the simple one-by-one approach is the fastest and uses the least memory. For small inputs, binary.Write would be faster. The ideal function might have a fast path for small arrays, and a more careful handling of the larger inputs.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).
