Daniel Lemire's blog

Data alignment for speed: myth or reality?

Compilers align data structures so that if you read an object using 4 bytes, its memory address is divisible by 4. There are two reasons for data alignment:

So, data alignment is important for performance.

Is it?

I decided to write a little program to test it out. My program takes a long array, it initializes it, and it computes a Karp-Rabin-like hash value from the result. It repeats this operation on arrays that have a different offset from an aligned boundary. For example, when it uses 4-byte integers, it will try offsets of 0, 1, 2 and 3. If aligned data is faster, then the case with an offset of 0 should be faster.

I repeat all tests 20 times and report the average wall clock time (in milliseconds). My source code in C++ is available.

4-byte integers

offset  Core i7   Core 2 
0 28.1 34.1
1 28.7 38.1
2 28.7 38.1
3 28.7 38.1

8-byte integers

offset  Core i7   Core 2 
0 33.6 69.1
1 33.6 77
2 33.6 77
3 33.6 77
4 33.6 77
5 33.6 77
6 33.6 77
7 33.6 77

Analysis:

I see no evidence that unaligned data processing could be several times slower. On a cheap Core 2 processor, there is a difference of about 10% in my tests. On a more recent processor (Core i7), there is no measurable difference.

On recent Intel processors (Sandy Bridge and Nehalem), there is no performance penalty for reading or writing misaligned memory operands. There might be more of a difference on some AMD processors, but the busy AMD server I tested showed no measurable penalty due to data alignment.

Intel processors use 64-byte cache lines and if you need to load a register overlapping two cache lines, it might limit the best speed you can get. But we are not talking about a severalfold penalty except maybe if you need to lock the instruction (for parallel programming). Thus it only matters in very specific code where loading and storing data from the fastest CPU cache is a critical bottleneck, and even then, you should not expect a large difference.

Conclusion: On recent Intel processors, data alignment does not make processing a lot faster. It is a micro-optimization. Data alignment for speed is a myth.

Acknowledgement: I am grateful to Owen Kaser for pointing me to the references on this issue.

Further reading:

Update: Laurent Gauthier provided a counter-example where unaligned access is significantly slower (by 50%). However, it involves a particular setup where you read words separated by specific intervals.