Amazon has some neat ARM-based systems built on Amazon’s own chips (Graviton). You can access them through Amazon Web Services (AWS). These processors have advanced vector instructions that can process many values at once. These instructions are part of an instruction set called SVE, for Scalable Vector Extension. SVE has a trick: it hides its internal register size from you. Thus, to the question “how many values can it process at once?”, the answer is “it depends”.
Thankfully, you can still write a program to find out. The svcntb intrinsic tells you how many 8-bit integers fit in a full register. Thus the following C++ line should tell you the vector register size in bytes:
std::cout << "detected vector register size (SVE): " << svcntb() << " bytes" << std::endl;
And here is what I get currently on an AWS server:
$ ./svesize
detected vector register size (SVE): 32 bytes
It is hard to find ARM processors with such wide registers (32 bytes), and it is unclear whether future iterations will still have 32-byte registers.
Interesting – svlen_* isn’t documented in the current ACLE, but it seems like it may have been in an earlier version. It seems like current GCC/Clang accept it as well.
Should probably use the documented svcntb() instead though.
I think Neoverse V1 is the only ARM processor with 256-bit vectors. Neoverse V2 has reverted to 128-bit vectors.
I agree that svcntb is nicer. Thanks.
The svlen_* intrinsics are documented in a currently available manual.
Reference:
Arm C Language Extensions for SVE
https://developer.arm.com/documentation/100987/0000/
Section 6.27.6. LEN: Return the number of elements in a vector
Weird, I searched that exact document but search didn’t find it for some reason. Oh well, thanks for the correction!
ARM processors with wide vector registers may be difficult to find, but you can find wider vectors in Fugaku’s Fujitsu A64FX CPUs. Their ARM cores also support SVE, with 512-bit-wide SIMD registers:
https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
Thanks for the link.