Programmers often need to write data structures to disk or to networks. To do so, the data structure must be serialized as a sequence of bytes. For integer values, most computer systems adopt “little endian” encoding, whereby an 8-byte integer is written out with its least significant byte first. In the Go programming language, you can write a slice of integers to a buffer as follows:
```go
var data []uint64
buf := new(bytes.Buffer)
// ... (populate data)
err := binary.Write(buf, binary.LittleEndian, data)
```
Until recently, I assumed that the binary.Write function did not allocate memory. Unfortunately, it does: it converts the entire input slice into a new, temporary byte slice before writing it out.
Instead, you can create a small buffer just big enough to hold your 8-byte integer and reuse it for each value:
```go
var item = make([]byte, 8) // scratch space for one 8-byte integer
for _, x := range data {
	binary.LittleEndian.PutUint64(item, x)
	buf.Write(item) // bytes.Buffer.Write always returns a nil error
}
```
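Sadly, when the destination is a disk or a network connection rather than memory, this might perform poorly, since each write or read carries a high overhead. To avoid this problem, you can wrap the destination in Go’s buffered writer and still write the integers one by one. Internally, bufio allocates a small buffer (4096 bytes by default) and passes data on to the underlying writer only when that buffer fills up or when you call Flush.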
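```go
writer := bufio.NewWriter(buf) // 4096-byte buffer by default
var item = make([]byte, 8)
for _, x := range data {
	binary.LittleEndian.PutUint64(item, x)
	writer.Write(item) // errors are sticky and reported by Flush
}
// Flush writes out whatever is still buffered; without it the tail is lost.
err := writer.Flush()
```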
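To see why buffering matters for a file or socket, one can count how many Write calls actually reach the destination. In the sketch below, a hypothetical countingWriter stands in for such a destination (it discards the bytes and only counts calls), and writeItems and countWrites are likewise illustrative helpers. With 1,000 integers (8,000 bytes), the unbuffered loop issues 1,000 writes, while a bufio.Writer with its default 4096-byte buffer issues only two: one full buffer, then the remainder at Flush.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
)

// countingWriter (hypothetical) discards its input but records how many
// times Write was called, standing in for a file or network connection.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// writeItems writes each value as 8 little-endian bytes directly to w.
func writeItems(w io.Writer, data []uint64) error {
	var item [8]byte
	for _, x := range data {
		binary.LittleEndian.PutUint64(item[:], x)
		if _, err := w.Write(item[:]); err != nil {
			return err
		}
	}
	return nil
}

// countWrites reports how many Write calls reach the underlying writer,
// without and with a bufio.Writer in between.
func countWrites(data []uint64) (unbuffered, buffered int, err error) {
	direct := &countingWriter{}
	if err = writeItems(direct, data); err != nil {
		return
	}

	sink := &countingWriter{}
	bw := bufio.NewWriter(sink) // default buffer size: 4096 bytes
	if err = writeItems(bw, data); err != nil {
		return
	}
	// Flush sends whatever is still buffered; skipping it would lose data.
	if err = bw.Flush(); err != nil {
		return
	}
	return direct.calls, sink.calls, nil
}

func main() {
	data := make([]uint64, 1000) // 8000 bytes in total
	u, b, err := countWrites(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("unbuffered writes:", u) // 1000
	fmt.Println("buffered writes:", b)   // 2
}
```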
I wrote a small benchmark that writes a slice of 100M 64-bit integers to an in-memory buffer.
| approach | memory usage | time |
|---|---|---|
| binary.Write | 1.5 GB | 1.2 s |
| one-by-one | 0 | 0.87 s |
| buffered one-by-one | 4 kB | 1.2 s |
(Timings will vary depending on your hardware and testing procedure. I used Go 1.16.)
The buffered one-by-one approach brings no speed benefit in this instance, because the destination is already an in-memory buffer, but it would help when each write to the destination is expensive, as with a file or a network connection. In my benchmark, the simple one-by-one approach is fastest and uses the least memory. For small inputs, however, binary.Write would be faster. The ideal function might have a fast path for small arrays and a more careful handling of larger inputs.
For Go 1.18, I get slightly different results.
Was mmap (including the time to .Flush()) worth considering?
Maybe. Did you happen to try it out?