Whenever you enter a URL into a system, it must be parsed and validated. It is a surprisingly challenging task: it may require hundreds of nanoseconds and possibly over a thousand cycles to parse a typical URL.
We can use URL parsing as a reasonable benchmark of a system's performance. Of course, no single measure is sufficient… but URL parsing is interesting because it is a fairly generic task involving strings, substrings, character searches and so forth.
I am going to compare the following ARM-based systems:
- c7g.large: Amazon Graviton 3 running Ubuntu 22.04 (GCC 11)
- MacBook Air 2022: Apple M2 (LLVM 14)
- Windows Dev Kit 2023: Qualcomm 8cx 3rd gen running Ubuntu 22.04 (GCC 11) inside WSL (Windows 11)
The Windows Dev Kit is a little plastic box designed to let Windows developers get their applications ready for Windows on 64-bit ARM. It is a tiny low-power device that I leave on my desk. The Graviton 3 nodes are Amazon's best ARM-based servers. The MacBook Air contains one of the best laptop processors on the market.
The benchmark we run loads 100,000 URLs found on the top 100 most visited web sites. It is single-threaded and requires no disk or network access: it is a pure CPU test.
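To make the ns/url metric concrete, here is a rough Python sketch of how such a figure can be measured. It is not the ada benchmark: it uses Python's `urlsplit`, a different and much slower parser than the C++ ada library, so its numbers are not comparable with the results in this post. The function name `average_ns_per_url` and the synthetic URLs are assumptions made for illustration.

```python
import time
from urllib.parse import urlsplit


def average_ns_per_url(urls, repeats=5):
    """Return the best observed average time, in ns, to parse one URL."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for url in urls:
            urlsplit(url)
        elapsed = time.perf_counter_ns() - start
        # Keep the fastest pass: it is the least disturbed by system noise.
        best = min(best, elapsed / len(urls))
    return best


# Hypothetical inputs: distinct synthetic URLs, so that the small cache of
# recent results kept by urllib.parse rarely hits.
urls = [
    f"https://example.com/section/{i}/page?id={i}&lang=en#top"
    for i in range(10_000)
]

print(f"{average_ns_per_url(urls):.0f} ns/url")
```

Like the real benchmark, this loop is single-threaded and touches neither the disk nor the network once the URLs are in memory.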
I run the following routine:
- git clone https://github.com/ada-url/ada
- cd ada
- cmake -B build -D ADA_BENCHMARKS=ON
- cmake --build build --target benchdata
- ./build/benchmarks/benchdata --benchmark_filter=BasicBench_AdaURL_aggregator_href
|System|Average time|
|---|---|
|Graviton 3|285 ns/url|
|Apple M2|190 ns/url|
|Qualcomm 8cx 3rd gen|245 ns/url|
We can also plot these average timings.
On this particular benchmark, the Qualcomm processor is about 30% slower than the Apple M2 processor (245 ns versus 190 ns per URL). That is to be expected: Apple Silicon is generally superior.
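As a quick check of that ratio, the following sketch recomputes the relative slowdown from the timings in the table above. The helper name `percent_slower` is my own, chosen for illustration. The Qualcomm figure comes out at about 29%, which rounds to the 30% quoted.

```python
# Average timings from the table above, in nanoseconds per URL.
ns_per_url = {
    "Graviton 3": 285,
    "Apple M2": 190,
    "Qualcomm 8cx 3rd gen": 245,
}


def percent_slower(system: str, baseline: str) -> float:
    """How much longer `system` takes per URL than `baseline`, in percent."""
    return (ns_per_url[system] / ns_per_url[baseline] - 1.0) * 100.0


for name in ns_per_url:
    print(f"{name}: {percent_slower(name, 'Apple M2'):.0f}% slower than Apple M2")
```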
However, in this particular test, the Qualcomm system beats the Graviton 3 node from Amazon. On a related benchmark, I showed that the Graviton 3 had competitive performance and could beat state-of-the-art Intel Ice Lake nodes. Amazon themselves claim that Graviton 3 instances might be superior for machine learning tasks.
We can try to correct for frequency differences. The Graviton runs at 2.6 GHz, the Apple M2 runs at 3.5 GHz and the Qualcomm processor at 3.0 GHz. Let us rescale each timing to a common 3 GHz clock, multiplying it by the processor's frequency and dividing by 3:
|System|Time corrected for 3 GHz|
|---|---|
|Graviton 3 (model)|245 ns/url|
|Apple M2 (model)|220 ns/url|
|Qualcomm 8cx 3rd gen|245 ns/url|
Note that you cannot blindly correct for frequency in this manner because it is not physically possible to just change the frequency as I did: it is a model to help us think.
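The model can be written out in a few lines. This is only a sketch of the arithmetic, under the assumption that the number of cycles per URL stays fixed when the clock changes. The function names are mine. The unrounded results are about 247 ns and 222 ns for the Graviton 3 and the Apple M2, so the table appears to round to the nearest 5 ns.

```python
# Measured timings (ns/url) and the clock frequencies (GHz) quoted in the post.
measured = {
    "Graviton 3": (285, 2.6),
    "Apple M2": (190, 3.5),
    "Qualcomm 8cx 3rd gen": (245, 3.0),
}

REFERENCE_GHZ = 3.0  # common frequency used for the comparison


def cycles_per_url(ns: float, ghz: float) -> float:
    # 1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    return ns * ghz


def corrected_ns(ns: float, ghz: float, reference_ghz: float = REFERENCE_GHZ) -> float:
    # Assumption: the cycle count is unchanged and only the clock differs.
    return cycles_per_url(ns, ghz) / reference_ghz


for name, (ns, ghz) in measured.items():
    print(
        f"{name}: {cycles_per_url(ns, ghz):.0f} cycles/url, "
        f"{corrected_ns(ns, ghz):.0f} ns/url at {REFERENCE_GHZ} GHz"
    )
```

The intermediate cycle counts (roughly 665 to 741 cycles per URL) show why the frequency matters: the three processors need a similar number of cycles, and the clock speed accounts for much of the gap.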
Overall, these numbers suggest that the Qualcomm processor is competitive. It is not likely to establish speed records, but I would not shy away from a Qualcomm-based system if it is meant for low power usage.
How likely is it that my results are misleading? They seem to match roughly the results that Alex Ellis got running a more complete benchmark:
So I believe that my result is roughly correct: Qualcomm is inferior to Apple Silicon, but not by a very wide margin.
A separate issue is the Windows performance itself. Much of Windows is still x86 specific and though Windows can run x86 applications in emulation under 64-bit ARM, there is a penalty which could be substantial. Nevertheless, my own experience has been quite good. Of course, I do not play games on these machines nor do I do video editing. Your mileage will vary.
Further reading: Linux on Microsoft Dev Kit 2023
7 thoughts on “Graviton 3, Apple M2 and Qualcomm 8cx 3rd gen: a URL parsing benchmark”
asahi is fast 🙂
Run on (8 X 2424 MHz CPU s)
BasicBench_AdaURL_aggregator_href 15501605 ns 15478609 ns 45 speed=561.297M/s time/byte=1.78159ns time/url=154.747ns url/s=6.46214M/s
Linux amac 6.2.0-asahi-11-1-edge-ARCH #2 SMP PREEMPT_DYNAMIC Sun, 19 Mar 2023 10:26:57 +0000 aarch64 GNU/Linux
sudo dmesg | grep Machine
[ 0.000000] Machine model: Apple MacBook Air (13-inch, M2, 2022)
But why WSL? There is some performance penalty for WSL.
This benchmark is purely computational: there is no disk or I/O access, so I do not expect any penalty from WSL. That being said, given that nobody knows how to install Linux on the Windows Dev Kit, the point is somewhat moot regarding Linux.
Admittedly, I could have installed FreeBSD on it, but that’s a fair amount of effort.
As for running the benchmarks under Windows proper… Visual Studio is typically behind in terms of optimization and performance. Running benchmarks under WSL, with native GCC or Clang, is often better than compiling binaries using Visual Studio.
It is disappointing, but Microsoft has not kept up very well in terms of compiler technology.
AMD 7950x, clang 14 (targeting GCC 11.2’s STL), linux:
BasicBench_AdaURL_aggregator_href 10916160 ns 10902775 ns 64 GHz=5.41849 cycle/byte=6.81415 cycles/url=591.872 instructions/byte=23.2465 instructions/cycle=3.4115 instructions/ns=18.4852 instructions/url=2.01917k ns/url=109.232 speed=796.87M/s time/byte=1.25491ns time/url=109ns url/s=9.17427M/s
Corrected for 3.0 GHz:
* using the reported frequency of 5.4 – it is 196 ns/url, which is still faster. 🙂
* using the max boost of 5.7 – it is 207 ns/url
AMD Zen 4 looks very competitive.
Intel and AMD embarrassed Apple's M2. 1280P and 5900HS are proven to be generally better. Apple's silicon is generally inferior. But hey, that's just after a total of 190 benchmarks.