Data structure size and cache-line accesses

On many systems, memory is accessed in fixed blocks called “cache lines”. On Intel systems, the cache line spans 64 bytes. That is, if you access memory at byte address 64, 65… up to 127… it is all on the same cache line. The next cache line starts at address 128, and so forth.

In turn, data in software is often organized in data structures having a fixed size (in bytes). We often organize these data structures in arrays. In general, a data structure may reside on more than one cache line. For example, if I put a 5-byte data structure at byte address 127, then it will occupy the last byte of one cache line, and four bytes in the next cache line.

When loading a data structure from memory, a naive model of the cost is the number of cache lines that are accessed. If your data structure spans 32 bytes or 64 bytes, and you have aligned the first element of an array, then you only ever need to access one cache line every time you load a data structure.

What if my data structures has 5 bytes? Suppose that I packed them in an array, using only 5 bytes per instance. What if I pick one at random… how many cache lines do I touch? Expectedly, the answer is barely more than 1 cache line on average.

Let us generalize.

Suppose that my data structure spans z bytes. Let g be the greatest common divisor between z and 64. Suppose that you load one instance of the data structure at random from a large array. In general, the expected number of additional cache lines accesses is (z – g)/64. The expected total number of cache line accesses is one more: 1 + (z – g)/64. You can check that it works for z = 32, since g is then 32 and you have (z – g)/64 is (32-32)/64 or zero.

I created the following table for all data structures no larger than a cache line. The worst-case scenario is a data structure spanning 63 bytes: you then almost always touch two cache lines.

I find it interesting that you have the same expected number of cache line accesses for data structures of size 17, 20, 24. It does not follow that computational cost a data structure spanning 24 bytes is the same as the cost of a data structure spanning 17 bytes. Everything else being identical, a smaller data structure should fare better, as it can fit more easily in CPU cache.

size of data structure (z) expected cache line access
1 1.0
2 1.0
3 1.03125
4 1.0
5 1.0625
6 1.0625
7 1.09375
8 1.0
9 1.125
10 1.125
11 1.15625
12 1.125
13 1.1875
14 1.1875
15 1.21875
16 1.0
17 1.25
18 1.25
19 1.28125
20 1.25
21 1.3125
22 1.3125
23 1.34375
24 1.25
25 1.375
26 1.375
27 1.40625
28 1.375
29 1.4375
30 1.4375
31 1.46875
32 1.0
33 1.5
34 1.5
35 1.53125
36 1.5
37 1.5625
38 1.5625
39 1.59375
40 1.5
41 1.625
42 1.625
43 1.65625
44 1.625
45 1.6875
46 1.6875
47 1.71875
48 1.5
49 1.75
50 1.75
51 1.78125
52 1.75
53 1.8125
54 1.8125
55 1.84375
56 1.75
57 1.875
58 1.875
59 1.90625
60 1.875
61 1.9375
62 1.9375
63 1.96875
64 1.0

Thanks to Maximilian Böther for the motivation of this post.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “Data structure size and cache-line accesses”

  1. It’s quite bit counter-intuitive.

    Take size=2 for example, the offset could be any random value. If address span is [63, 64], then we touch 2 cache lines. So size = 2 can not be 1.0.

    I write a python script to compute expected cache line access

    def expected_access(size):
    C = 64 # Cache line size
    ss = []
    for off in range(0, C): # offset could be random value.
    a = off // C
    b = (off + size – 1) // C
    ss.append(b – a + 1) # access b-a+1 cache lines
    return sum(ss) / len(ss) # compute average

    and 2 bytes expected value is 1.015625, 32 bytes expected value is 1.484375.

    1. If you have a 2-byte value, and you lay it out in a packed array as described in the blog post, you will never have a 2-byte value covering two cache lines.

      I think that you are considering a different model where you put your values anywhere in memory (at a random address).

      Then I suspect that the probability of an overlap is min(1-(64-x+1)/64,1).

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.