Inquire about the performance of gcrypt on ARM architecture
Jussi Kivilinna
jussi.kivilinna at iki.fi
Sun Nov 16 13:21:36 CET 2025
Hello,
On 14/11/2025 11:05, Mizar Zhou via Gnupg-users wrote:
> Hi everyone,
>
>
> I’d like to ask about the performance of Libgcrypt on ARM architectures.
>
>
> In my tests, using the same Libgcrypt version on ARMv8 results in performance that is *three times slower, or even more*, compared to Intel. Is this expected behavior? If not, are there any performance-related configuration options or build switches that I might have overlooked?
>
When comparing two different systems, you also need to check the differences between those systems. For example, does the other system have a significantly higher clock speed? When comparing AES performance, do both systems have AES acceleration instruction sets available? On a Linux system, you can check /proc/cpuinfo. You can check whether libgcrypt detects AES acceleration on the CPU with 'tests/version'.
Here's an example on the x86-64 architecture:
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 97
model name : AMD Ryzen 9 7900X 12-Core Processor
stepping : 2
microcode : 0xa60120c
cpu MHz : 4947.451
cache size : 1024 KB
physical id : 0
siblings : 24
core id : 0
cpu cores : 12
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt ***aes*** xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni ***vaes*** vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso spectre_v2_user tsa vmscape
bogomips : 9400.07
TLB size : 3584 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
$ tests/version
version:1.12.0-beta677:10c00:1.51:13300:
cc:150200:gcc:15.2.0:
ciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:idea:salsa20:gost28147:chacha20:sm4:aria:
pubkeys:dsa:elgamal:rsa:ecc:kyber:dilithium:
digests:crc:gostr3411-94::md4:md5:rmd160:sha1:sha256:sha512:sha3:tiger:whirlpool:stribog:blake2:sm3:
rnd-mod:getentropy:
cpu-arch:x86:amd64:
mpi-asm:amd64/mpih-add1.S:amd64/mpih-sub1.S:amd64/mpih-mul1.S:amd64/mpih-mul2.S:amd64/mpih-mul3.S:amd64/mpih-lshift.S:amd64/mpih-rshift.S:
mpi-powm:fixed-window
hwflist:intel-bmi2:intel-ssse3:intel-sse4.1:intel-pclmul:***intel-aesni***:intel-rdrand:intel-avx:intel-avx2:intel-rdtsc:intel-shaext:***intel-vaes-vpclmul***:intel-avx512:intel-gfni:
fips-mode:n:::
rng-type:standard:1:3030000:1:
compliance:::
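The two checks above can also be scripted. The following is only a minimal sketch, assuming the /proc/cpuinfo feature line is labelled 'flags' (x86) or 'Features' (ARM64) and that 'tests/version' prints a colon-separated 'hwflist:' line as in the listings in this mail. The function names cpu_features and hwf_list are made up for illustration and are not part of libgcrypt. The '***' markers in the listings appear to be emphasis added in this mail and would not show up in real output.

```python
def cpu_features(cpuinfo_text):
    """Return the set of CPU feature names found in /proc/cpuinfo text.

    Assumption: the feature line is labelled 'flags' on x86 and
    'Features' on ARM64, as in the listings quoted in this mail.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def hwf_list(version_text):
    """Return the feature names from the 'hwflist:' line of tests/version output."""
    for line in version_text.splitlines():
        if line.startswith("hwflist:"):
            # Fields are colon-separated; drop the label and empty trailing field.
            return [name for name in line.split(":")[1:] if name]
    return []


# Sample input copied (shortened) from the ARMv8 listings in this mail.
sample_cpuinfo = (
    "processor : 0\n"
    "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid\n"
)
sample_version = (
    "cpu-arch:arm:\n"
    "hwflist:arm-neon:arm-aes:arm-sha1:arm-sha2:arm-pmull:\n"
)

print("CPU advertises AES:", "aes" in cpu_features(sample_cpuinfo))
print("libgcrypt detected AES:", "arm-aes" in hwf_list(sample_version))
```

On a real system you would feed the functions the contents of /proc/cpuinfo and the output of 'tests/version' instead of the sample strings.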
You can disable AES acceleration by passing the --disable-hwf option to 'tests/benchmark' or 'tests/bench-slope'. This way you can check whether libgcrypt's AES acceleration is active by default on your target system:
$ tests/benchmark --large-buffers --cipher-repetitions 1000 cipher aes256
Running each test 1000 times.
         ECB/Stream      CBC/Poly1305    CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
         --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256      40ms    50ms   710ms    50ms   670ms    40ms   980ms   980ms    40ms    50ms    50ms    60ms   710ms   710ms    80ms    70ms    50ms    40ms   710ms   710ms
$ ./benchmark --large-buffers --cipher-repetitions 1000 --disable-hwf intel-aesni --disable-hwf intel-vaes-vpclmul cipher aes256
Running each test 1000 times.
         ECB/Stream      CBC/Poly1305    CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
         --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256    3060ms  3870ms  3280ms  3830ms  3250ms  2950ms  3330ms  3320ms  3000ms  3000ms  3110ms  3990ms  6270ms  6290ms  3090ms  3060ms  3020ms  3900ms  6290ms  6260ms
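To see how much the acceleration matters per mode, the two result rows can be divided against each other. This is a rough sketch, assuming the two figures under each mode heading are the encryption time followed by the decryption time. The helper names parse_row and speedup are hypothetical, and the rows are copied from the two x86-64 runs above.

```python
MODES = ["ECB/Stream", "CBC/Poly1305", "CFB", "OFB", "CTR",
         "XTS", "CCM", "GCM", "OCB", "EAX"]


def parse_row(row):
    """Parse a result row such as 'AES256 40ms 50ms ...'.

    Returns {mode: (first_ms, second_ms)}; the pair is assumed to be
    (encrypt, decrypt) times in milliseconds.
    """
    fields = row.split()
    times = [int(field[:-2]) for field in fields[1:]]  # strip the "ms" suffix
    if len(times) != 2 * len(MODES):
        raise ValueError("expected two timings per mode")
    return {mode: (times[2 * i], times[2 * i + 1])
            for i, mode in enumerate(MODES)}


def speedup(slow, fast):
    """Return {mode: (enc_ratio, dec_ratio)} of slow timings over fast timings."""
    return {mode: (slow[mode][0] / fast[mode][0],
                   slow[mode][1] / fast[mode][1])
            for mode in MODES}


accelerated = parse_row(
    "AES256 40ms 50ms 710ms 50ms 670ms 40ms 980ms 980ms 40ms 50ms "
    "50ms 60ms 710ms 710ms 80ms 70ms 50ms 40ms 710ms 710ms")
software = parse_row(
    "AES256 3060ms 3870ms 3280ms 3830ms 3250ms 2950ms 3330ms 3320ms "
    "3000ms 3000ms 3110ms 3990ms 6270ms 6290ms 3090ms 3060ms 3020ms "
    "3900ms 6290ms 6260ms")

for mode, (enc, dec) in speedup(software, accelerated).items():
    print(f"{mode:<13} enc {enc:6.1f}x  dec {dec:6.1f}x")
```

If the ratios on your target system stay close to 1 when the AES feature is disabled, the acceleration was probably not in use in the first place.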
To get an estimate of your CPU frequency during tests, you can use the '--cpu-mhz auto' setting of the bench-slope tool:
$ tests/bench-slope --cpu-mhz auto cipher aes256
Cipher:
AES256 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.040 ns/B 23975 MiB/s 0.224 c/B 5624
ECB dec | 0.040 ns/B 24061 MiB/s 0.223 c/B 5624
CBC enc | 0.647 ns/B 1473 MiB/s 3.60 c/B 5555±1
CBC dec | 0.040 ns/B 24044 MiB/s 0.223 c/B 5624
CFB enc | 0.647 ns/B 1475 MiB/s 3.55 c/B 5487±5
CFB dec | 0.041 ns/B 23519 MiB/s 0.223 c/B 5500
OFB enc | 0.937 ns/B 1018 MiB/s 5.27 c/B 5624
OFB dec | 0.932 ns/B 1024 MiB/s 5.24 c/B 5624
CTR enc | 0.041 ns/B 23294 MiB/s 0.225 c/B 5500
CTR dec | 0.041 ns/B 23332 MiB/s 0.225 c/B 5500
XTS enc | 0.053 ns/B 17877 MiB/s 0.293 c/B 5500
XTS dec | 0.054 ns/B 17652 MiB/s 0.297 c/B 5500
CCM enc | 0.692 ns/B 1378 MiB/s 3.90 c/B 5640±3
CCM dec | 0.688 ns/B 1386 MiB/s 3.74 c/B 5437±5
CCM auth | 0.647 ns/B 1475 MiB/s 3.65 c/B 5651±1
EAX enc | 0.691 ns/B 1380 MiB/s 3.95 c/B 5717±4
EAX dec | 0.692 ns/B 1378 MiB/s 3.80 c/B 5487±5
EAX auth | 0.646 ns/B 1476 MiB/s 3.54 c/B 5473±1
GCM enc | 0.072 ns/B 13281 MiB/s 0.395 c/B 5500
GCM dec | 0.072 ns/B 13271 MiB/s 0.395 c/B 5500
GCM auth | 0.030 ns/B 31772 MiB/s 0.165 c/B 5500
OCB enc | 0.041 ns/B 23461 MiB/s 0.224 c/B 5500
OCB dec | 0.044 ns/B 21456 MiB/s 0.244 c/B 5500
OCB auth | 0.040 ns/B 23562 MiB/s 0.223 c/B 5500
SIV enc | 0.693 ns/B 1376 MiB/s 3.82 c/B 5510±1
SIV dec | 0.696 ns/B 1370 MiB/s 3.82 c/B 5487±5
SIV auth | 0.650 ns/B 1466 MiB/s 3.67 c/B 5637±4
GCM-SIV enc | 0.074 ns/B 12831 MiB/s 0.418 c/B 5624
GCM-SIV dec | 0.079 ns/B 12045 MiB/s 0.445 c/B 5624
GCM-SIV auth | 0.033 ns/B 29124 MiB/s 0.180 c/B 5500
=
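The columns in the bench-slope output are related by plain unit arithmetic, which is handy when you only have one of them. The sketch below is not how bench-slope itself computes the values; it only restates the unit conversions, and the function names are made up for illustration. Because the printed ns/B figures are rounded, the recomputed values differ slightly from the printed ones.

```python
def cycles_per_byte(ns_per_byte, mhz):
    """Convert nanoseconds/byte to cycles/byte at the given clock in MHz.

    1 MHz is 1000 cycles per millisecond, i.e. mhz / 1000 cycles per nanosecond.
    """
    return ns_per_byte * mhz / 1000.0


def mebibytes_per_sec(ns_per_byte):
    """Convert nanoseconds/byte to throughput in MiB/s (1 MiB = 2**20 bytes)."""
    bytes_per_sec = 1e9 / ns_per_byte
    return bytes_per_sec / (1024 * 1024)


# 'CBC enc' row from the x86-64 run above: 0.647 ns/B at about 5555 MHz.
print(f"{cycles_per_byte(0.647, 5555):.2f} c/B")
print(f"{mebibytes_per_sec(0.647):.0f} MiB/s")
```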
>
> I’m using *Libgcrypt 1.10.0* on ARMv8, compiled with the default settings.
>
>
> Arm:
>
> [root@node-2 tests]# ./benchmark --large-buffers --cipher-repetitions 1000 cipher aes256
>
> Running each test 1000 times.
>
> ECB/Stream CBC/Poly1305 CFB OFB CTR *XTS* CCM GCM OCB EAX
>
> --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
>
> AES256 380ms 390ms 1350ms 440ms 1360ms 440ms 1350ms 1360ms 430ms 440ms *530ms 550ms* 1820ms 1800ms 680ms 670ms 480ms 480ms 1820ms 1810ms
>
My old ARMv8 system (with AES acceleration) shows the following results:
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm ***aes*** pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
$ ./version
version:1.11.1-beta23:10b01:1.50-beta2:13200:
cc:130200:gcc:13.2.0:
ciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:idea:salsa20:gost28147:chacha20:sm4:aria:
pubkeys:dsa:elgamal:rsa:ecc:
digests:crc:gostr3411-94::md4:md5:rmd160:sha1:sha256:sha512:sha3:tiger:whirlpool:stribog:blake2:sm3:
rnd-mod:getentropy:
cpu-arch:arm:
mpi-asm:arm/mpih-add1.S:arm/mpih-sub1.S:arm/mpih-mul1.S:arm/mpih-mul2.S:arm/mpih-mul3.S:generic/mpih-lshift.c:generic/mpih-rshift.c:
hwflist:arm-neon:***arm-aes***:arm-sha1:arm-sha2:arm-pmull:
fips-mode:n:::
rng-type:standard:1:3030000:2:
compliance:::
With AES acceleration:
$ ./benchmark --large-buffers --cipher-repetitions 1000 cipher aes256
Running each test 1000 times.
         ECB/Stream      CBC/Poly1305    CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
         --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256    6340ms  6540ms  1950ms  1200ms  1830ms  1190ms  6740ms  6740ms  1340ms  1340ms  1560ms  1560ms  3330ms  3350ms  2280ms  2260ms  1560ms  1550ms  3350ms  3340ms
$ ./bench-slope --cpu-mhz auto cipher aes256
Cipher:
AES256 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.02 ns/B 936.5 MiB/s 1.17 c/B 1152
ECB dec | 1.02 ns/B 935.9 MiB/s 1.17 c/B 1152
CBC enc | 1.57 ns/B 605.9 MiB/s 1.81 c/B 1152
CBC dec | 1.06 ns/B 899.6 MiB/s 1.22 c/B 1152
CFB enc | 1.63 ns/B 585.7 MiB/s 1.88 c/B 1152
CFB dec | 1.06 ns/B 899.8 MiB/s 1.22 c/B 1152
OFB enc | 6.29 ns/B 151.5 MiB/s 7.25 c/B 1152
OFB dec | 6.29 ns/B 151.5 MiB/s 7.25 c/B 1152
CTR enc | 1.11 ns/B 857.2 MiB/s 1.28 c/B 1152
CTR dec | 1.11 ns/B 855.3 MiB/s 1.28 c/B 1152
XTS enc | 1.44 ns/B 660.5 MiB/s 1.66 c/B 1152
XTS dec | 1.44 ns/B 661.3 MiB/s 1.66 c/B 1152
CCM enc | 2.75 ns/B 347.4 MiB/s 3.16 c/B 1152
CCM dec | 2.74 ns/B 347.5 MiB/s 3.16 c/B 1152
CCM auth | 1.74 ns/B 549.0 MiB/s 2.00 c/B 1152
EAX enc | 2.75 ns/B 347.1 MiB/s 3.16 c/B 1152
EAX dec | 2.76 ns/B 345.9 MiB/s 3.18 c/B 1152
EAX auth | 1.63 ns/B 583.8 MiB/s 1.88 c/B 1152
GCM enc | 1.99 ns/B 478.2 MiB/s 2.30 c/B 1152
GCM dec | 2.00 ns/B 477.9 MiB/s 2.30 c/B 1152
GCM auth | 0.881 ns/B 1082 MiB/s 1.02 c/B 1152
OCB enc | 1.21 ns/B 788.1 MiB/s 1.39 c/B 1152
OCB dec | 1.21 ns/B 785.2 MiB/s 1.40 c/B 1152
OCB auth | 1.32 ns/B 724.7 MiB/s 1.52 c/B 1152
SIV enc | 2.76 ns/B 346.1 MiB/s 3.17 c/B 1152
SIV dec | 2.85 ns/B 334.6 MiB/s 3.28 c/B 1152
SIV auth | 1.63 ns/B 583.7 MiB/s 1.88 c/B 1152
GCM-SIV enc | 2.10 ns/B 453.1 MiB/s 2.42 c/B 1152
GCM-SIV dec | 2.20 ns/B 433.2 MiB/s 2.54 c/B 1152
GCM-SIV auth | 0.990 ns/B 963.6 MiB/s 1.14 c/B 1152
=
Without AES acceleration:
$ ./benchmark --large-buffers --cipher-repetitions 1000 --disable-hwf arm-aes cipher aes256
Running each test 1000 times.
         ECB/Stream      CBC/Poly1305    CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
         --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256   30100ms 30640ms 26200ms 26330ms 26290ms 26180ms 30400ms 30370ms 26740ms 26690ms 26360ms 26380ms 52890ms 52910ms 27640ms 27620ms 27310ms 27520ms 52980ms 52900ms
$ ./bench-slope --cpu-mhz auto --disable-hwf arm-aes cipher aes256
Cipher:
AES256 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 30.67 ns/B 31.09 MiB/s 35.33 c/B 1152
ECB dec | 33.22 ns/B 28.71 MiB/s 38.27 c/B 1152
CBC enc | 26.99 ns/B 35.33 MiB/s 31.09 c/B 1152
CBC dec | 29.21 ns/B 32.65 MiB/s 33.64 c/B 1152
CFB enc | 26.94 ns/B 35.40 MiB/s 31.03 c/B 1152
CFB dec | 26.94 ns/B 35.40 MiB/s 31.03 c/B 1152
OFB enc | 30.94 ns/B 30.82 MiB/s 35.64 c/B 1152
OFB dec | 30.94 ns/B 30.82 MiB/s 35.64 c/B 1152
CTR enc | 19.82 ns/B 48.12 MiB/s 22.83 c/B 1152
CTR dec | 19.82 ns/B 48.11 MiB/s 22.83 c/B 1152
XTS enc | 31.00 ns/B 30.77 MiB/s 35.71 c/B 1152
XTS dec | 33.49 ns/B 28.47 MiB/s 38.58 c/B 1152
CCM enc | 46.87 ns/B 20.35 MiB/s 54.00 c/B 1152
CCM dec | 46.88 ns/B 20.34 MiB/s 53.99 c/B 1152
CCM auth | 27.10 ns/B 35.19 MiB/s 31.22 c/B 1152
EAX enc | 46.88 ns/B 20.34 MiB/s 54.00 c/B 1152
EAX dec | 46.88 ns/B 20.34 MiB/s 54.00 c/B 1152
EAX auth | 27.05 ns/B 35.25 MiB/s 31.16 c/B 1152
GCM enc | 20.70 ns/B 46.08 MiB/s 23.84 c/B 1152
GCM dec | 20.70 ns/B 46.07 MiB/s 23.85 c/B 1152
GCM auth | 0.877 ns/B 1087 MiB/s 1.01 c/B 1152
OCB enc | 27.32 ns/B 34.91 MiB/s 31.46 c/B 1152
OCB dec | 29.58 ns/B 32.24 MiB/s 34.08 c/B 1152
OCB auth | 27.27 ns/B 34.97 MiB/s 31.42 c/B 1152
SIV enc | 46.89 ns/B 20.34 MiB/s 54.00 c/B 1152
SIV dec | 46.97 ns/B 20.30 MiB/s 54.11 c/B 1152
SIV auth | 27.05 ns/B 35.25 MiB/s 31.16 c/B 1152
GCM-SIV enc | 32.09 ns/B 29.72 MiB/s 36.96 c/B 1152
GCM-SIV dec | 32.19 ns/B 29.63 MiB/s 37.08 c/B 1152
GCM-SIV auth | 0.976 ns/B 976.7 MiB/s 1.12 c/B 1152
=
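When comparing two systems with the bench-slope numbers, it can help to split the wall-clock difference into a clock-speed part and a per-cycle part. The sketch below does this for the accelerated 'ECB enc' rows of the two systems in this mail (0.040 ns/B at 5624 MHz on the x86-64 system, 1.02 ns/B at 1152 MHz on the ARMv8 system). The function name slowdown_breakdown is hypothetical, and the split is only simple arithmetic on the printed, rounded values.

```python
def slowdown_breakdown(ns_a, mhz_a, ns_b, mhz_b):
    """Split system B's slowdown relative to system A into two factors.

    Returns (wall_clock, clock, per_cycle) where
    wall_clock = clock * per_cycle, and per_cycle equals the ratio of
    the cycles/byte figures of B and A.
    """
    wall_clock = ns_b / ns_a        # how much longer B needs per byte
    clock = mhz_a / mhz_b           # how much faster A's clock runs
    per_cycle = wall_clock / clock  # what remains after clock speed is removed
    return wall_clock, clock, per_cycle


wall_clock, clock, per_cycle = slowdown_breakdown(0.040, 5624, 1.02, 1152)
print(f"wall-clock slowdown: {wall_clock:.1f}x")
print(f"  from clock speed:  {clock:.2f}x")
print(f"  from cycles/byte:  {per_cycle:.2f}x")
```

The same split applied to your Intel and ARMv8 measurements should show how much of the difference is explained by clock frequency alone.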
-Jussi