JSON Parsing: Intel Sapphire Rapids versus AMD Zen 4

Intel has released a new generation of server processors (Sapphire Rapids), while the latest AMD technology (Zen 4) is now broadly available. There are extensive comparisons available. Of particular interest are the open benchmark results, which assess various aspects of processor speed, including JSON parsing performance. In these benchmarks, AMD systems appear to dominate.

I decided to run my own benchmarks using JSON parsing as a reference and large, commonly available Amazon instances. For these tests, I use Amazon Linux 2023 with GCC 11. I use two instances that each cost about 5 dollars per hour: Amazon charges me about the same amount for the AMD system as for the Intel system.

The AMD instance is of type c7a.24xlarge with an AMD EPYC 9R14 processor (Zen 4 microarchitecture). The Intel instance is of type c7i.metal-24xl with an Intel Xeon Platinum 8488C (Sapphire Rapids microarchitecture). Both systems have many cores, but my benchmark is entirely single-threaded. I could have optimized either setup by choosing systems that have fewer cores running hotter. In practice, both processors run at a comparable frequency, with a slight advantage for AMD (3.5 GHz vs 3.4 GHz).

The gist of the results is that neither system dominates the other. In some benchmarks Intel wins; in others AMD wins. The two are very closely matched.

Intel results:

| benchmark | simdjson On-Demand | simdjson DOM | yyjson | rapidjson | nlohmann/json | Boost JSON |
|---|---|---|---|---|---|---|
| json2msgpack | 3.68 GB/s | 2.67 GB/s | 1.72 GB/s | 0.71 GB/s | 0.03 GB/s | 0.42 GB/s |
| partial_tweets | 6.83 GB/s | 4.77 GB/s | 2.41 GB/s | 0.77 GB/s | 0.13 GB/s | 0.50 GB/s |
| distinct_user_id | 6.99 GB/s | 4.90 GB/s | 2.52 GB/s | 0.67 GB/s | 0.14 GB/s | 0.49 GB/s |
| kostya | 2.92 GB/s | 2.03 GB/s | 0.83 GB/s | 0.80 GB/s | 0.12 GB/s | 0.47 GB/s |

AMD results:

| benchmark | simdjson On-Demand | simdjson DOM | yyjson | rapidjson | nlohmann/json | Boost JSON |
|---|---|---|---|---|---|---|
| json2msgpack | 3.09 GB/s | 2.45 GB/s | 1.93 GB/s | 0.68 GB/s | 0.03 GB/s | 0.38 GB/s |
| partial_tweets | 6.84 GB/s | 4.22 GB/s | 2.64 GB/s | 0.77 GB/s | 0.12 GB/s | 0.46 GB/s |
| distinct_user_id | 6.94 GB/s | 4.26 GB/s | 2.58 GB/s | 0.77 GB/s | 0.13 GB/s | 0.47 GB/s |
| kostya | 4.03 GB/s | 2.71 GB/s | 1.00 GB/s | 0.78 GB/s | 0.12 GB/s | 0.52 GB/s |
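
To make the "neither system dominates" reading easier to check, here is a small sketch that compares the two result sets cell by cell. The numbers are copied from the Intel and AMD results above; the function names (`amd_over_intel`, `count_wins`) are my own and purely illustrative, and the script is not part of the simdjson benchmark suite.

```python
# Throughput in GB/s, transcribed from the Intel and AMD results above.
# Column order: simdjson On-Demand, simdjson DOM, yyjson,
#               rapidjson, nlohmann/json, Boost JSON.
INTEL = {
    "json2msgpack":     [3.68, 2.67, 1.72, 0.71, 0.03, 0.42],
    "partial_tweets":   [6.83, 4.77, 2.41, 0.77, 0.13, 0.50],
    "distinct_user_id": [6.99, 4.90, 2.52, 0.67, 0.14, 0.49],
    "kostya":           [2.92, 2.03, 0.83, 0.80, 0.12, 0.47],
}
AMD = {
    "json2msgpack":     [3.09, 2.45, 1.93, 0.68, 0.03, 0.38],
    "partial_tweets":   [6.84, 4.22, 2.64, 0.77, 0.12, 0.46],
    "distinct_user_id": [6.94, 4.26, 2.58, 0.77, 0.13, 0.47],
    "kostya":           [4.03, 2.71, 1.00, 0.78, 0.12, 0.52],
}


def amd_over_intel(intel, amd):
    """Return {benchmark: [AMD/Intel ratio per library]}; above 1 means AMD is faster."""
    return {name: [a / i for i, a in zip(intel[name], amd[name])] for name in intel}


def count_wins(intel, amd):
    """Return (AMD wins, Intel wins, ties) over all benchmark/library cells."""
    pairs = [(i, a) for name in intel for i, a in zip(intel[name], amd[name])]
    amd_wins = sum(a > i for i, a in pairs)
    intel_wins = sum(i > a for i, a in pairs)
    ties = sum(i == a for i, a in pairs)
    return amd_wins, intel_wins, ties


if __name__ == "__main__":
    for name, ratios in amd_over_intel(INTEL, AMD).items():
        print(name, " ".join(f"{r:.2f}" for r in ratios))
    print("AMD wins, Intel wins, ties:", count_wins(INTEL, AMD))
```

On the figures as reported (rounded to two decimals), AMD is ahead in 9 of the 24 cells, Intel in 12, and 3 are ties. AMD's largest lead is on kostya with simdjson On-Demand, at about 1.38 times the Intel throughput.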

You can reproduce my results by grabbing simdjson and running bench_ondemand.

I do not pretend that this single data point is sufficient to make purchasing decisions or to assess the Intel and AMD technology. Take it as a data point.

Further reading. On-demand JSON: A better way to parse documents?, Software: Practice and Experience (to appear)

Daniel Lemire, "JSON Parsing: Intel Sapphire Rapids versus AMD Zen 4," in Daniel Lemire's blog, February 9, 2024.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “JSON Parsing: Intel Sapphire Rapids versus AMD Zen 4”

  1. The cloud numbers look really low. My Ryzen 7900 (not the 7900x) gets:
    ~/git/simdjson/build/benchmark$ cat log | grep partial_tweets | grep "simdjson_ondemand"
    partial_tweets/manual_time 66705 ns 75966 ns 10453 best_bytes_per_sec=9.59662G best_docs_per_sec=15.1962k best_items_per_sec=1.51962M bytes=631.515k bytes_per_second=8.81706G/s docs_per_sec=14.9913k/s items=100 items_per_second=1.49913M/s [BEST: throughput= 9.60 GB/s doc_throughput= 15196 docs/s items= 100 avg_time= 66705 ns]

    So 3x faster than the AMD EPYC 9R14?

  2. Are they both using AVX-512? Does Zen 4 support AVX-512? I thought AMD was splitting registers on AVX2 (which Intel might have done until Skylake? I don’t remember.), and their initial AVX-512 implementation might be similarly suboptimal. I thought Intel improved AVX-512 after the first gen implementation, the throttling issues, etc.

    If they’re both using AVX-512, it implies AMD has more room to improve since their implementation should be suboptimal first-gen. It might also mean that AMD would win if it were AVX2 only. That chip beat the Intel chip in all of Phoronix’s tests, but I don’t know what many of these benchmarks are, or if any are parsing centered: https://www.phoronix.com/review/aws-m7a-ec2-benchmarks

    AMD uses a better node in TSMC’s N5 “5nm”, compared to Intel’s 10nm SuperFin or Enhanced SuperFin (which they rebranded as Intel 7). Intel’s “10nm” should be comparable to TSMC’s “7nm” family, used in previous generation Zen CPUs. There are a few other factors in CPU engineering of course – Intel seems to beat AMD when using comparable nodes – but I’d be surprised if Intel can hang with Zen 4 generally, given their inferior node.

    It would also be interesting to compare GB/s per watt, even though AWS doesn’t bill for that.

    1. “their initial AVX-512 implementation might be similarly suboptimal.”

      AMD Zen 4 has an AVX-512 implementation that is comparable in performance to Intel Ice Lake. I was very surprised. See this other blog post where I report on another benchmark… https://lemire.me/blog/2023/01/05/transcoding-unicode-with-avx-512-amd-zen-4-vs-intel-ice-lake/

      You can also review the latency/throughput tables and you will notice that Zen 4 is not significantly behind Intel as far as AVX-512 instructions are concerned. AMD has a couple of isolated weak points, like the compressed stores… but that’s about it.

      “I thought AMD was splitting registers on AVX2”

      I suspect that it is too much of a simplification. For example, AMD have fast byte-wise compression instructions. You give it a 64-bit mask and you get to keep or remove any of the 64 bytes from the input, packing them left. It is a fundamentally 64-byte operation and AMD does it at a competitive speed compared to Intel.
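
For readers unfamiliar with the byte-wise compression mentioned in the last reply, the following is a scalar sketch of what such an operation computes. It is only a model of the semantics, not how the hardware implements it and not code from simdjson. I am assuming the reply refers to the AVX-512 VBMI2 byte compress instruction (`vpcompressb`) in its zeroing form, that bit i of the mask selects byte i, and that unused trailing bytes are set to zero; the function name `compress_bytes` is hypothetical.

```python
def compress_bytes(mask: int, data: bytes) -> bytes:
    """Scalar model of a 64-byte compress.

    Keeps data[i] when bit i of mask is set, packs the kept bytes toward
    index 0 in their original order, and zero-fills the rest.
    (Assumption: this mirrors the zeroing form of the instruction.)
    """
    if len(data) != 64:
        raise ValueError("expected exactly 64 input bytes")
    if not 0 <= mask < (1 << 64):
        raise ValueError("mask must fit in 64 bits")
    kept = bytes(b for i, b in enumerate(data) if (mask >> i) & 1)
    return kept + bytes(64 - len(kept))


if __name__ == "__main__":
    block = bytes(range(64))
    # Keep only bytes 1 and 3: they end up in positions 0 and 1.
    print(list(compress_bytes(0b1010, block)[:4]))
```

The loop visits every byte, whereas the hardware instruction handles the whole 64-byte register at once, which is why it is described above as a fundamentally 64-byte operation.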
