Writing out large arrays in Go: binary.Write is inefficient for large arrays

Programmers often need to write data structures to disk or to networks. The data structure then needs to be interpreted as a sequence of bytes. Regarding integer values, most computer systems adopt “little endian” encoding whereas an 8-byte integer is written out using the least significant bytes first. In the Go programming language, you can write an array of integers to a buffer as follows:

var data []uint64
var buf *bytes.Buffer = new(bytes.Buffer)

...

err := binary.Write(buf, binary.LittleEndian, data)

Until recently, I assumed that the binary.Write function did not allocate memory. Unfortunately, it does. The function converts the input array to a new, temporary byte arrays.

Instead, you can create a small buffer just big enough to hold you 8-byte integer and write that small buffer repeatedly:

var item = make([]byte, 8)
for _, x := range data {
    binary.LittleEndian.PutUint64(item, x)
    buf.Write(item)
}

Sadly, this might have poor performance on disks or networks where each write/read has a high overhead. To avoid this problem, you can use Go’s buffered writer and write the integers one by one. Internally, Go will allocate a small buffer.

writer := bufio.NewWriter(buf)
var item = make([]byte, 8)
for _, x := range data {
	binary.LittleEndian.PutUint64(item, x)
	writer.Write(item)
}
writer.Flush()

I wrote a small benchmark that writes an array of 100M integers to memory.

function memory usage time
binary.Write 1.5 GB 1.2 s
one-by-one 0 0.87 s
buffered one-by-one 4 kB 1.2 s

(Timings will vary depending on your hardware and testing procedure. I used Go 1.16.)

The buffered one-by-one approach is not beneficial with respect to speed in this instance, but it would be more helpful in other cases. In my benchmark, the simple one-by-one approach is fastest and uses least memory. For small inputs, binary.Write would be faster. The ideal function might have a fast path for small arrays, and a more careful handling of the larger inputs.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “Writing out large arrays in Go: binary.Write is inefficient for large arrays”

Leave a Reply to pasha Cancel reply

Your email address will not be published. The comment form expects plain text. If you need to format your text, you can use HTML elements such strong, blockquote, cite, code and em. For formatting code as HTML automatically, I recommend tohtml.com.

You may subscribe to this blog by email.