Modern processors use many tricks to go faster. They are superscalar, which means that they can execute several instructions at once. They are multicore, which means that each CPU is made of several baby processors that are partially independent. And they are vectorized, which means that they have instructions that can operate over wide registers (spanning 128, 256 or even 512 bits).
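To make the register widths concrete, here is a minimal sketch that counts how many values fit in one register of each width. The element types (32-bit and 64-bit floats) and the function name `lanes` are my own choices for illustration.

```python
# Number of values ("lanes") that fit in one vector register.
REGISTER_BITS = (128, 256, 512)  # register widths mentioned in the text
ELEMENT_BITS = {"float32": 32, "float64": 64}  # illustrative element types (assumption)


def lanes(register_bits, element_bits):
    """Return how many elements of the given size one register holds."""
    return register_bits // element_bits


for width in REGISTER_BITS:
    counts = {name: lanes(width, bits) for name, bits in ELEMENT_BITS.items()}
    print(width, counts)
```

A 512-bit register holds sixteen 32-bit floats, four times as many as a 128-bit register, so a single instruction can in principle process four times as many values.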
Regarding vectorization, Intel is currently ahead of the curve with its AVX-512 instruction sets. They have the only commodity-level processors able to work over 512-bit registers. AMD is barely doing 256 bits and your phone is limited to 128 bits.
The more you use your CPU, the more heat it produces and the more energy it uses. Intel does not want your CPU to burn out or to run out of power. So it throttles the CPU (makes it run slower). Your CPU stays warm but not too hot, and it does not use too much power. When the processor does not have to use AVX-512 instructions, some of the silicon remains dark, and thus less heat is generated and less power is consumed. Conversely, when you do use AVX-512 instructions, more of the silicon lights up, and the processor may lower its frequency to compensate.
Vlad Krasnov from Cloudflare wrote a blog post last year warning us against AVX-512 throttling:
If you do not require AVX-512 for some specific high performance tasks, I suggest you disable AVX-512 execution on your server or desktop, to avoid accidental AVX-512 throttling.
I am sure that it is the case that AVX-512 can cause problems for some use cases. It is also the case that some people die if you give them aspirin; yet we don’t retire aspirin.
Should we really disable AVX-512 as a precautionary stance?
In an earlier blog post, I tried to measure this throttling on a server I own but initially found no effect whatsoever. (Update: The fuller story is that I was the victim of a GNU LIBC bug.)
Vlad offered me a test case in C. His test case involves AVX-512 multiplications, while much of the running time is spent on some bubble-sort routine. It can run in both AVX-512 mode and in the regular (non-AVX-512) mode. To be clear, it is not meant to be a demonstration in favour of AVX-512: it is meant to show that AVX-512 can be detrimental.
I did not want to run my tests using my own server this time. So I went to Packet and launched a powerful two-CPU Xeon Gold server (Intel Xeon Gold 5120 CPU @ 2.20GHz). Each of these processors has 14 cores, so we have 28 cores in total. Because of hyperthreading, it supports up to 56 hardware threads (running two threads per core).
| threads | AVX-512 disabled | with AVX-512 |
|---|---|---|
| 20 | 8.4 s | 7.2 s |
| 40 | 6 s | 5.0 s |
| 80 | 5.7 s | 4.7 s |
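To read the table as speedups rather than raw timings, one can divide the two columns. The sketch below does just that; the timings are copied from the table, while the names `timings` and `speedup` are mine.

```python
# Timings in seconds copied from the table:
# number of threads -> (AVX-512 disabled, with AVX-512)
timings = {20: (8.4, 7.2), 40: (6.0, 5.0), 80: (5.7, 4.7)}


def speedup(without_avx512, with_avx512):
    """Ratio of the two running times; above 1.0 means AVX-512 is faster."""
    return without_avx512 / with_avx512


for threads, (off, on) in timings.items():
    print(f"{threads} threads: {speedup(off, on):.2f}x faster with AVX-512")
```

The ratios come out at roughly 1.17, 1.20 and 1.21, so in this benchmark the AVX-512 build is about 17% to 21% faster.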
In an earlier test, I was using an older compiler (GCC 5) and when exceeding the number of physically supported threads (56), I was getting poorer results, especially with AVX-512. I am not sure why that would be but I suspect it is not related to throttling; it might have to do with context switching and register initialization (though that is speculation on my part).
In my latest test, I use a more recent compiler (GCC 7). As you can see, the AVX-512 version is always faster: I see no negative effect from the use of AVX-512. If there is throttling, it appears that the benefits of AVX-512 offset it.
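The trade-off between throttling and faster vector code can be sketched with a toy model. This is my own simplification, not something measured here. It assumes that a fixed share of the running time benefits from AVX-512, and that a lower clock slows the whole program proportionally. All the parameter values below are hypothetical.

```python
def relative_time(vector_fraction, vector_speedup, clock_ratio):
    """Running time with AVX-512 relative to the run without it (1.0 = same).

    vector_fraction: share of the original running time that AVX-512 accelerates
    vector_speedup:  how much faster that share gets at equal clock speed
    clock_ratio:     throttled frequency divided by normal frequency (<= 1.0)
    """
    # Time at an unchanged clock: the scalar part is untouched, the vector part shrinks.
    same_clock_time = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Assumption: a lower clock stretches the whole running time proportionally.
    return same_clock_time / clock_ratio


# Hypothetical parameters, chosen only to show both outcomes.
print(relative_time(0.5, 2.0, 0.9))   # below 1.0: AVX-512 wins despite throttling
print(relative_time(0.05, 2.0, 0.9))  # above 1.0: throttling costs more than it gains
```

Under these assumptions, AVX-512 pays off when a large enough share of the work is vectorized. When only a sliver of the running time benefits, the lower clock can dominate.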
My code is available along with all the scripts and the outputs for your inspection. You should be able to reproduce my results. It is not like Xeon Gold processors are magical faeries: anyone can grab an instance. For the record, the bill I got from Packet was $2.
Update: Later, Travis Downs reviewed this benchmark and found that it was almost designed to make AVX-512 look good. If one changes the parameters, it is easy to measure the throttling effect.
Note: I have no conflict of interest to disclose. I do not own Intel stock.
Instructions: On Packet, hit “deploy servers”. Choose Sunnyvale CA and choose m2.xlarge.x86. I went with the latest Ubuntu (18.04). You can then access it through ssh. It is a pain that GCC is not already installed once the server is deployed, and that sudo apt-get install gcc fails. I fixed the problem by prepending “us.” to the host names of the hyperlinks in /etc/apt/sources.list and running sudo apt-get update. (You may achieve this result with sudo sed -i 's|http://|http://us.|g' /etc/apt/sources.list.)
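If you want to see what the substitution does before touching the real file, here is a small sketch that applies the same rewrite to a string. The sample line is only an illustration of what a sources.list entry may look like, and the function name `prepend_us` is mine.

```python
def prepend_us(sources_text):
    """Insert "us." after every "http://", like sed 's|http://|http://us.|g'."""
    return sources_text.replace("http://", "http://us.")


# Illustrative entry (assumption): the actual content of your file may differ.
sample = "deb http://archive.ubuntu.com/ubuntu bionic main restricted\n"
print(prepend_us(sample), end="")
```

Note that the substitution is not idempotent: applying it twice prepends “us.” twice, so run it only once.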
Further reading: AVX-512: when and how to use these new instructions