AVX-512 throttling: heavy instructions are maybe not so dangerous

Recent Intel processors have fancy instructions operating over 512-bit registers. They are reported to cause frequency throttling of the core where they run, and possibly of other cores in some cases. Thus, it has been recommended to avoid AVX-512 instructions. I have written a series of blog posts on the topic, trying to reproduce the effect. Though I can measure some performance degradation if I work hard, I simply cannot find the “obvious” performance degradations (50%) that are often advertised. I tested on two distinct processors, with both single-threaded and multi-threaded code.

There is more to the story than appears at first.

Travis Downs wrote a fancy tool to investigate the issue. Let me reproduce some of his findings in my own words. According to Intel’s documentation, there are two types of AVX-512 instructions: light instructions (e.g., integer additions) and heavy instructions (e.g., multiplications). Heavy instructions reportedly cause a much greater frequency reduction. None of my tests showed that. Travis found that it is quite hard to trigger:

Even a stream of 1 FMAD [fused multiply–add] every 4 or even 2 cycles doesn’t set the frequency down lower. The lowest speed is only reached if FMAs [fused multiply–add] come at a rate of more than 1 every 2 cycles.

As far as I can tell, this is absent from Intel’s documentation. If Travis is right, and I have no reason to doubt him, this means that the reported massive frequency throttling (slowest license) that we find everywhere online (including on Intel’s site) requires substantial qualification. Few people will ever achieve the rate of sustained heavy instructions that Travis documents.

For example, if you use AVX-512 for pattern matching (Intel Hyperscan), to encode and decode base64, or to compress and decompress integers, you are probably never going to trigger massive throttling. If you do a lot of cryptography, machine learning or number crunching, the story might be different.

It is important to take into account how much you gain by going to AVX-512 in the first place. For example, the OpenSSL project found that a particular cryptographic routine involving many multiplications ran 30% faster on a per-cycle basis with AVX-512. Once you factor in some throttling, it is easy to see how the switch could end up being wasteful. So a sensible approach may be to adopt AVX-512 code that involves many heavy instructions only when it brings substantial gains.
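To make the trade-off concrete, here is a back-of-the-envelope sketch. Under a simple model, the wall-clock speedup is the per-cycle speedup multiplied by the ratio of the throttled frequency to the unthrottled frequency. The 30% per-cycle figure comes from the OpenSSL example; the 20% frequency drop used below is a hypothetical number chosen for illustration, not a measurement. The model also assumes, for simplicity, that the whole workload runs at the throttled frequency.

```python
def net_speedup(per_cycle_speedup: float, clock_ratio: float) -> float:
    """Estimate the wall-clock speedup of an AVX-512 code path.

    per_cycle_speedup: work done per cycle relative to the baseline
        (1.30 means 30% faster per cycle, as in the OpenSSL example).
    clock_ratio: throttled frequency divided by unthrottled frequency
        (hypothetical; 0.80 means the core runs 20% slower).
    Simplifying assumption: the whole workload runs at the throttled clock.
    """
    return per_cycle_speedup * clock_ratio


def break_even_clock_ratio(per_cycle_speedup: float) -> float:
    """Lowest clock ratio at which the AVX-512 path still breaks even."""
    return 1.0 / per_cycle_speedup


if __name__ == "__main__":
    # 30% faster per cycle, but running at a hypothetical 80% of the clock
    print(f"{net_speedup(1.30, 0.80):.2f}")          # 1.04
    # how far the clock may drop before the gain disappears entirely
    print(f"{break_even_clock_ratio(1.30):.2f}")     # 0.77
```

Under these assumptions, a 30% per-cycle gain shrinks to about 4% once the clock drops by 20%, and a clock reduction of more than about 23% would make the AVX-512 path slower than the baseline.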

Update: The same holds true for AVX (256-bit) instructions. For AVX instructions to lead to any throttling at all, you have to sustain expensive instructions repeatedly, every 1 or 2 cycles.
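The findings above can be condensed into a rough rule of thumb. The sketch below is a hypothetical model, not an Intel API or a documented algorithm: it maps a loop’s vector width and its rate of heavy instructions per cycle to a predicted frequency license (0 is the fastest, 2 the slowest). The threshold of more than one heavy instruction every 2 cycles comes from Travis’s FMA measurements on AVX-512; applying the same threshold to 256-bit code is an assumption based on the update above, and treating any 512-bit instruction as enough to reach license 1 follows the first comment below. The model ignores other factors, such as the number of active cores.

```python
# Hypothetical model of frequency licenses (0 = fastest, 2 = slowest).
# It encodes the observations in this post; it is not an Intel API.

# Assumption: a stream is "dense" if it has more than one heavy
# instruction every 2 cycles (i.e., more than 0.5 per cycle).
DENSE_HEAVY_RATE = 0.5


def predicted_license(width_bits: int, heavy_per_cycle: float) -> int:
    dense_heavy = heavy_per_cycle > DENSE_HEAVY_RATE
    if width_bits == 512:
        # Assumption (per the comment below): any 512-bit instruction
        # drops the core to at least license 1; only a dense stream
        # of heavy instructions reaches the slowest license, 2.
        return 2 if dense_heavy else 1
    if width_bits == 256:
        # AVX/AVX2: light instructions stay at full speed; a dense
        # stream of heavy ones drops the core to license 1.
        return 1 if dense_heavy else 0
    # Scalar and 128-bit code: no vector-related throttling modeled.
    return 0


if __name__ == "__main__":
    print(predicted_license(512, 0.25))  # one FMA every 4 cycles -> 1
    print(predicted_license(512, 0.5))   # one FMA every 2 cycles -> 1
    print(predicted_license(512, 1.0))   # one FMA every cycle    -> 2
    print(predicted_license(256, 1.0))   # dense AVX2 heavy       -> 1
```

Under this model, even a fairly steady stream of FMAs, one every 2 cycles, never reaches the slowest license, which matches the observation that few programs will sustain the required rate.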

Further reading: AVX-512: when and how to use these new instructions

3 thoughts on “AVX-512 throttling: heavy instructions are maybe not so dangerous”

  1. Note that heavy/light applies to both AVX2 and AVX-512.

    So for example heavy AVX2 is in principle the same thing as light AVX-512 (best to show in a chart).

    However, the same “heavy doesn’t necessarily mean heavy unless you do it a lot” thing you discuss in this post applies to AVX2 heavy instructions, which in fact makes them quite different from AVX-512 light, because AVX-512 light instructions take effect immediately as soon as one occurs (as far as I can tell), but AVX2 heavy instructions need to be run a lot. So in practice, AVX2 heavy is lets you run one speed tier higher (the highest tier, in fact) compared to AVX-512 light.

    1. “So in practice, AVX2 heavy is lets you run one speed tier higher (the highest tier, in fact) compared to AVX-512 light.”

      That should read “… AVX2 heavy often lets you run one speed tier higher…”.
