Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake

Most systems today rely on Unicode strings. However, we have two popular Unicode formats: UTF-8 and UTF-16. We often need to convert from one format to the other. For example, you might have a database formatted with UTF-16, but you need to produce JSON documents using UTF-8. This conversion is often called ‘transcoding’.

In the last few years, we wrote a specialized library that process Unicode strings, with a focus on performance: the simdutf library. The library is used by JavaScript runtimes (Node JS and bun).

The simdutf library is able to benefit from the latest and most powerful instructions on your processors. In particular, it does well with recent processors with AVX-512 instructions (Intel Ice Lake, Rocket Lake, as well as AMD Zen 4).

I do not yet have a Zen 4 processor, but Velu Erwan was kind of enough to benchmark it for me. A reasonable task is to transcode an Arabic file from UTF-8 to UTF-16: it is typically a non-trivial task because Arabic UTF-8 is a mix of one-byte and two-byte characters that we must convert to two-byte UTF-16 characters (with validation). The steps required (under Linux) are as follows:

git clone https://github.com/simdutf/simdutf && 
cd simdutf && 
cmake -B build && cmake --build build && 
wget –content-disposition https://cutt.ly/d2cIxRx && 
./build/benchmarks/benchmark -F Arabic-Lipsum.utf8.txt -P convert_utf8 

(Ideally, run the last command with privileged access to the performance counters.)

Like Intel, AMD has its own compiler. I did not have access to the Intel compiler for my tests, but Velu has the AMD compiler.

A sensible reference point is the iconv, as it is provided by the runtime library. The AMD processor is running much faster than the Intel core (5.4 GHz vs. 3.4 GHz). We use GCC 12.

transcoder Intel Ice Lake (GCC) AMD Zen 4 (GCC) AMD Zen 4 (AMD compiler)
iconv 0.70 GB/s 0.97 GB/s 0.98 GB/s
simdutf 7.8 GB/s 11 GB/s 12 GB/s

At a glance, the Zen 4 processor is slightly less efficient on a per-cycle basis when running the simdutf AVX-512 code (2.8 instructions/cycle for AMD versus 3.1 instructions/cycle for Intel) but keep in mind that we did not have access to a Zen 4 processor when tuning our code. The efficiency difference is small enough that we can consider the processors roughly on par pending further investigations.

The big difference that the AMD Zen 4 runs at a much higher frequency. If I rely on wikipedia, I do not think that there is an Ice Lake processor that can match the 5.4 GHz. However, some Rocket Lake processors come close.

In our benchmarks, we track the CPU frequency and we get the same measured frequency when running an AVX-512 as when running conventional code (iconv).  Thus AVX-512 can be really advantageous.

These results suggest that AMD Zen 4 is matching Intel Ice Lake in AVX-512 performance. Given that the Zen 4 microarchitecture is the first AMD attempt at supporting AVX-512 commercially, it is a remarkable feat.

Further reading: AMD Zen 4 performance while parsing JSON (phoronix.com).

Note: Raw AMD results are available: GCC 12 and AOCC.

Credit: Velu Erwan got the processor from AMD France. The exact specification is AMD 7950X, 2x16GB DDR5 4800MT reconfig as 5600MT. The UTF-8 to UTF-16 transcoder is largely based on the work of Robert Clausecker.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

11 thoughts on “Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake”

  1. It’s worth noting that Zen 4 implements AVX-512 by splitting execution into two 256 bit stages, so instructions take twice more cycles (at least those that are 1-cycle on Intel, for complex instructions the difference is less than 2x, and in fact Zen 4 has powerful shuffling units, IIRC).

    1. That is not true. It uses two 256-bit units (if available), but it takes only a single amount of cycles.

    1. That’s interesting. I gave up on Intel compilers some time ago because it was tiring to manage the licensing. It seems like great news that they have simplified the process.

      1. Intel switched to LLVM two years ago for their main C/C++ compiler, but they still release and update their Compiler Classic, based on their own compiler internals: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Release_history

        They seem to support a lot more optimization and acceleration than competitors, though the branding for the features is hard to keep track of. They might be the only compiler to support the new matrix instructions (AMX), and have lots of support for OpenMP up through 5.x, and their libraries like Threaded Building Blocks and SYCL/OpenCL support.

        But it’s hard to keep track of their branding, products, and features. A lot of it is under “OneAPI” now. I think OneAPI is meant to include their compiler, but I’m not sure.

        Have you looked at the matrix instructions?

  2. Is there any reason for maintaining “a database formatted with UTF-16”? I had thought that the only use for UTF-16 in the modern age is for legacy operating system interfaces.

Leave a Reply

Your email address will not be published. The comment form expects plain text. If you need to format your text, you can use HTML elements such strong, blockquote, cite, code and em. For formatting code as HTML automatically, I recommend tohtml.com.

You may subscribe to this blog by email.