Inquire about the performance of gcrypt on ARM architecture

Jussi Kivilinna jussi.kivilinna at iki.fi
Sun Nov 16 13:21:36 CET 2025


Hello,

On 14/11/2025 11:05, Mizar Zhou via Gnupg-users wrote:
> Hi everyone,
>
> I’d like to ask about the performance of Libgcrypt on ARM architectures.
>
> In my tests, using the same Libgcrypt version on ARMv8 results in performance that is *three times slower, or even more*, compared to Intel. Is this expected behavior? If not, are there any performance-related configuration options or build switches that I might have overlooked?
>

When comparing two different systems, you also need to look at the differences between those systems. For example, does the other system have a significantly higher clock speed? When comparing AES performance, do both systems have AES acceleration instruction sets available? On a Linux system, you can check /proc/cpuinfo. You can check whether libgcrypt detects AES acceleration on the CPU with 'tests/version'.

Here's an example on the x86-64 architecture:


$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 97
model name      : AMD Ryzen 9 7900X 12-Core Processor
stepping        : 2
microcode       : 0xa60120c
cpu MHz         : 4947.451
cache size      : 1024 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt ***aes*** xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni ***vaes*** vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso spectre_v2_user tsa vmscape
bogomips        : 9400.07
TLB size        : 3584 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

$ tests/version
version:1.12.0-beta677:10c00:1.51:13300:
cc:150200:gcc:15.2.0:
ciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:idea:salsa20:gost28147:chacha20:sm4:aria:
pubkeys:dsa:elgamal:rsa:ecc:kyber:dilithium:
digests:crc:gostr3411-94::md4:md5:rmd160:sha1:sha256:sha512:sha3:tiger:whirlpool:stribog:blake2:sm3:
rnd-mod:getentropy:
cpu-arch:x86:amd64:
mpi-asm:amd64/mpih-add1.S:amd64/mpih-sub1.S:amd64/mpih-mul1.S:amd64/mpih-mul2.S:amd64/mpih-mul3.S:amd64/mpih-lshift.S:amd64/mpih-rshift.S:
mpi-powm:fixed-window
hwflist:intel-bmi2:intel-ssse3:intel-sse4.1:intel-pclmul:***intel-aesni***:intel-rdrand:intel-avx:intel-avx2:intel-rdtsc:intel-shaext:***intel-vaes-vpclmul***:intel-avx512:intel-gfni:
fips-mode:n:::
rng-type:standard:1:3030000:1:
compliance:::
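The two checks described above (AES flags in /proc/cpuinfo, and the AES entries in the hwflist line of tests/version) can be scripted. This is a minimal POSIX shell sketch, assuming awk is available and the Linux /proc/cpuinfo layout ("flags" on x86, "Features" on ARM). The function names are hypothetical helpers, not part of libgcrypt, and only the hwflist names that appear in this thread are recognised.

```shell
#!/bin/sh
# Hypothetical helpers (not part of libgcrypt).

# cpu_aes_flags [FILE]: print the AES-related CPU flags ("aes", "vaes")
# found on the "flags" (x86) or "Features" (ARM) lines of FILE.
# FILE defaults to /proc/cpuinfo; "-" reads standard input.
cpu_aes_flags() {
    awk -F: '
        /^(flags|Features)[ \t]*:/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++)
                if (f[i] == "aes" || f[i] == "vaes") seen[f[i]] = 1
        }
        END {
            if ("aes" in seen) print "aes"
            if ("vaes" in seen) print "vaes"
        }' "${1:-/proc/cpuinfo}"
}

# gcry_aes_hwf [FILE]: print the AES-related entries of the "hwflist:" line
# of tests/version output (standard input by default). Only intel-aesni,
# intel-vaes-vpclmul and arm-aes are matched; other architectures would
# need their own names added.
gcry_aes_hwf() {
    awk -F: '
        $1 == "hwflist" {
            for (i = 2; i <= NF; i++)
                if ($i == "intel-aesni" || $i == "intel-vaes-vpclmul" || $i == "arm-aes")
                    print $i
        }' "${1:--}"
}
```

Typical use is `cpu_aes_flags` followed by `tests/version | gcry_aes_hwf`: if the first prints `aes` but the second prints nothing, the CPU has AES instructions that this libgcrypt build is not using.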


You can disable AES acceleration with the --disable-hwf option of 'tests/benchmark' or 'tests/bench-slope'. This way you can check whether libgcrypt's AES acceleration is active by default on your target system:


$ tests/benchmark --large-buffers --cipher-repetitions 1000 cipher aes256
Running each test 1000 times.
                 ECB/Stream    CBC/Poly1305         CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
              --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256          40ms    50ms   710ms    50ms   670ms    40ms   980ms   980ms    40ms    50ms    50ms    60ms   710ms   710ms    80ms    70ms    50ms    40ms   710ms   710ms

$ ./benchmark --large-buffers --cipher-repetitions 1000 --disable-hwf intel-aesni --disable-hwf intel-vaes-vpclmul cipher aes256
Running each test 1000 times.
                 ECB/Stream    CBC/Poly1305         CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
              --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256        3060ms  3870ms  3280ms  3830ms  3250ms  2950ms  3330ms  3320ms  3000ms  3000ms  3110ms  3990ms  6270ms  6290ms  3090ms  3060ms  3020ms  3900ms  6290ms  6260ms
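Which --disable-hwf options are needed depends on the architecture: intel-aesni and intel-vaes-vpclmul on this x86-64 machine, arm-aes on the ARMv8 system further down. A small sketch that derives them from the hwflist line of tests/version, assuming a POSIX shell with awk; aes_disable_args is a hypothetical helper, and only the feature names seen in this thread are recognised.

```shell
#!/bin/sh
# Hypothetical helper (not part of libgcrypt): build "--disable-hwf NAME"
# options for the AES-related entries of a tests/version "hwflist:" line.
# Recognised names are limited to those shown in this thread.
aes_disable_args() {
    printf '%s\n' "$1" | awk -F: '
        $1 == "hwflist" {
            out = ""; sep = ""
            for (i = 2; i <= NF; i++)
                if ($i == "intel-aesni" || $i == "intel-vaes-vpclmul" || $i == "arm-aes") {
                    out = out sep "--disable-hwf " $i
                    sep = " "
                }
            print out
        }'
}
```

For example, `tests/benchmark --large-buffers --cipher-repetitions 1000 $(aes_disable_args "$(tests/version | grep '^hwflist:')") cipher aes256`. The outer `$(...)` is deliberately left unquoted so that each option becomes a separate argument.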


To get an estimate of your CPU frequency during the tests, you can use the '--cpu-mhz auto' setting of the bench-slope tool:

$ tests/bench-slope --cpu-mhz auto cipher aes256
Cipher:
  AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         ECB enc |     0.040 ns/B     23975 MiB/s     0.224 c/B      5624
         ECB dec |     0.040 ns/B     24061 MiB/s     0.223 c/B      5624
         CBC enc |     0.647 ns/B      1473 MiB/s      3.60 c/B      5555±1
         CBC dec |     0.040 ns/B     24044 MiB/s     0.223 c/B      5624
         CFB enc |     0.647 ns/B      1475 MiB/s      3.55 c/B      5487±5
         CFB dec |     0.041 ns/B     23519 MiB/s     0.223 c/B      5500
         OFB enc |     0.937 ns/B      1018 MiB/s      5.27 c/B      5624
         OFB dec |     0.932 ns/B      1024 MiB/s      5.24 c/B      5624
         CTR enc |     0.041 ns/B     23294 MiB/s     0.225 c/B      5500
         CTR dec |     0.041 ns/B     23332 MiB/s     0.225 c/B      5500
         XTS enc |     0.053 ns/B     17877 MiB/s     0.293 c/B      5500
         XTS dec |     0.054 ns/B     17652 MiB/s     0.297 c/B      5500
         CCM enc |     0.692 ns/B      1378 MiB/s      3.90 c/B      5640±3
         CCM dec |     0.688 ns/B      1386 MiB/s      3.74 c/B      5437±5
        CCM auth |     0.647 ns/B      1475 MiB/s      3.65 c/B      5651±1
         EAX enc |     0.691 ns/B      1380 MiB/s      3.95 c/B      5717±4
         EAX dec |     0.692 ns/B      1378 MiB/s      3.80 c/B      5487±5
        EAX auth |     0.646 ns/B      1476 MiB/s      3.54 c/B      5473±1
         GCM enc |     0.072 ns/B     13281 MiB/s     0.395 c/B      5500
         GCM dec |     0.072 ns/B     13271 MiB/s     0.395 c/B      5500
        GCM auth |     0.030 ns/B     31772 MiB/s     0.165 c/B      5500
         OCB enc |     0.041 ns/B     23461 MiB/s     0.224 c/B      5500
         OCB dec |     0.044 ns/B     21456 MiB/s     0.244 c/B      5500
        OCB auth |     0.040 ns/B     23562 MiB/s     0.223 c/B      5500
         SIV enc |     0.693 ns/B      1376 MiB/s      3.82 c/B      5510±1
         SIV dec |     0.696 ns/B      1370 MiB/s      3.82 c/B      5487±5
        SIV auth |     0.650 ns/B      1466 MiB/s      3.67 c/B      5637±4
     GCM-SIV enc |     0.074 ns/B     12831 MiB/s     0.418 c/B      5624
     GCM-SIV dec |     0.079 ns/B     12045 MiB/s     0.445 c/B      5624
    GCM-SIV auth |     0.033 ns/B     29124 MiB/s     0.180 c/B      5500
                 =
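The cycles/byte column is what makes machines running at different clock speeds comparable. Judging from the tables in this thread, it appears to be nanosecs/byte multiplied by the measured clock (for example, 1.02 ns/B at 1152 MHz gives about 1.17 c/B on the ARMv8 system further down), while mebibytes/sec is the reciprocal of nanosecs/byte. The following sketch reproduces both under that assumption; slope_derive is a hypothetical name, and awk is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: recompute bench-slope's derived columns, assuming
#   cycles/byte = ns/B * MHz / 1000
#   MiB/s       = 1e9 / ns/B / 1048576
# slope_derive NS_PER_BYTE MHZ  ->  prints "<cycles/byte> <MiB/s>"
slope_derive() {
    awk -v ns="$1" -v mhz="$2" 'BEGIN {
        printf "%.3f %.1f\n", ns * mhz / 1000, 1e9 / ns / 1048576
    }'
}
```

`slope_derive 0.040 5624` gives `0.225 23841.9`, close to the 0.224 c/B and 23975 MiB/s above. The small mismatch comes from bench-slope printing nanosecs/byte rounded to a few digits.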

> 
> I’m using *Libgcrypt 1.10.0* on ARMv8, compiled with the default settings.
>
> Arm:
>
> [root@node-2 tests]# ./benchmark --large-buffers --cipher-repetitions 1000 cipher aes256
> Running each test 1000 times.
>                  ECB/Stream    CBC/Poly1305         CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
>               --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
> AES256         380ms   390ms  1350ms   440ms  1360ms   440ms  1350ms  1360ms   430ms   440ms   530ms   550ms  1820ms  1800ms   680ms   670ms   480ms   480ms  1820ms  1810ms
>

My old ARMv8 system (with AES acceleration) shows the following results:


$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 48.00
Features        : fp asimd evtstrm ***aes*** pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

$ ./version
version:1.11.1-beta23:10b01:1.50-beta2:13200:
cc:130200:gcc:13.2.0:
ciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:idea:salsa20:gost28147:chacha20:sm4:aria:
pubkeys:dsa:elgamal:rsa:ecc:
digests:crc:gostr3411-94::md4:md5:rmd160:sha1:sha256:sha512:sha3:tiger:whirlpool:stribog:blake2:sm3:
rnd-mod:getentropy:
cpu-arch:arm:
mpi-asm:arm/mpih-add1.S:arm/mpih-sub1.S:arm/mpih-mul1.S:arm/mpih-mul2.S:arm/mpih-mul3.S:generic/mpih-lshift.c:generic/mpih-rshift.c:
hwflist:arm-neon:***arm-aes***:arm-sha1:arm-sha2:arm-pmull:
fips-mode:n:::
rng-type:standard:1:3030000:2:
compliance:::


With AES acceleration:


$ ./benchmark --large-buffers --cipher-repetitions 1000 cipher aes256
Running each test 1000 times.
                 ECB/Stream    CBC/Poly1305         CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
              --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256        6340ms  6540ms  1950ms  1200ms  1830ms  1190ms  6740ms  6740ms  1340ms  1340ms  1560ms  1560ms  3330ms  3350ms  2280ms  2260ms  1560ms  1550ms  3350ms  3340ms

$ ./bench-slope --cpu-mhz auto cipher aes256
Cipher:
  AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         ECB enc |      1.02 ns/B     936.5 MiB/s      1.17 c/B      1152
         ECB dec |      1.02 ns/B     935.9 MiB/s      1.17 c/B      1152
         CBC enc |      1.57 ns/B     605.9 MiB/s      1.81 c/B      1152
         CBC dec |      1.06 ns/B     899.6 MiB/s      1.22 c/B      1152
         CFB enc |      1.63 ns/B     585.7 MiB/s      1.88 c/B      1152
         CFB dec |      1.06 ns/B     899.8 MiB/s      1.22 c/B      1152
         OFB enc |      6.29 ns/B     151.5 MiB/s      7.25 c/B      1152
         OFB dec |      6.29 ns/B     151.5 MiB/s      7.25 c/B      1152
         CTR enc |      1.11 ns/B     857.2 MiB/s      1.28 c/B      1152
         CTR dec |      1.11 ns/B     855.3 MiB/s      1.28 c/B      1152
         XTS enc |      1.44 ns/B     660.5 MiB/s      1.66 c/B      1152
         XTS dec |      1.44 ns/B     661.3 MiB/s      1.66 c/B      1152
         CCM enc |      2.75 ns/B     347.4 MiB/s      3.16 c/B      1152
         CCM dec |      2.74 ns/B     347.5 MiB/s      3.16 c/B      1152
        CCM auth |      1.74 ns/B     549.0 MiB/s      2.00 c/B      1152
         EAX enc |      2.75 ns/B     347.1 MiB/s      3.16 c/B      1152
         EAX dec |      2.76 ns/B     345.9 MiB/s      3.18 c/B      1152
        EAX auth |      1.63 ns/B     583.8 MiB/s      1.88 c/B      1152
         GCM enc |      1.99 ns/B     478.2 MiB/s      2.30 c/B      1152
         GCM dec |      2.00 ns/B     477.9 MiB/s      2.30 c/B      1152
        GCM auth |     0.881 ns/B      1082 MiB/s      1.02 c/B      1152
         OCB enc |      1.21 ns/B     788.1 MiB/s      1.39 c/B      1152
         OCB dec |      1.21 ns/B     785.2 MiB/s      1.40 c/B      1152
        OCB auth |      1.32 ns/B     724.7 MiB/s      1.52 c/B      1152
         SIV enc |      2.76 ns/B     346.1 MiB/s      3.17 c/B      1152
         SIV dec |      2.85 ns/B     334.6 MiB/s      3.28 c/B      1152
        SIV auth |      1.63 ns/B     583.7 MiB/s      1.88 c/B      1152
     GCM-SIV enc |      2.10 ns/B     453.1 MiB/s      2.42 c/B      1152
     GCM-SIV dec |      2.20 ns/B     433.2 MiB/s      2.54 c/B      1152
    GCM-SIV auth |     0.990 ns/B     963.6 MiB/s      1.14 c/B      1152
                 =


Without AES acceleration:


$ ./benchmark --large-buffers --cipher-repetitions 1000 --disable-hwf arm-aes cipher aes256
Running each test 1000 times.
                 ECB/Stream    CBC/Poly1305         CFB             OFB             CTR             XTS             CCM             GCM             OCB             EAX
              --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
AES256       30100ms 30640ms 26200ms 26330ms 26290ms 26180ms 30400ms 30370ms 26740ms 26690ms 26360ms 26380ms 52890ms 52910ms 27640ms 27620ms 27310ms 27520ms 52980ms 52900ms
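To see how much each mode gains from AES acceleration, the --disable-hwf row can be divided column by column by the default row (for the ECB column on this system, 30100ms / 6340ms is roughly 4.7x). A minimal sketch, assuming a POSIX shell with awk and rows pasted as strings from tests/benchmark output; bench_ratio is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: divide two tests/benchmark result rows column by
# column and print SLOW/FAST for each timing column with one decimal.
# bench_ratio SLOW_ROW FAST_ROW   (rows like "AES256  3060ms  3870ms ...")
# Returns non-zero if the rows have a different number of columns.
bench_ratio() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Field 1 is the algorithm name; the rest are timings like "40ms".
            for (i = 2; i <= NF; i++) {
                v = $i
                sub(/ms$/, "", v)
                t[NR, i] = v + 0
            }
            n[NR] = NF
        }
        END {
            if (n[1] != n[2]) exit 1          # rows must have the same columns
            sep = ""
            for (i = 2; i <= n[1]; i++) {
                if (t[2, i] > 0)
                    printf "%s%.1f", sep, t[1, i] / t[2, i]
                else
                    printf "%s-", sep         # fast time of 0ms: ratio undefined
                sep = " "
            }
            printf "\n"
        }'
}
```

Ratios are more robust than raw milliseconds when comparing two machines, because the fixed repetition count and buffer size cancel out.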

$ ./bench-slope --cpu-mhz auto --disable-hwf arm-aes cipher aes256
Cipher:
  AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
         ECB enc |     30.67 ns/B     31.09 MiB/s     35.33 c/B      1152
         ECB dec |     33.22 ns/B     28.71 MiB/s     38.27 c/B      1152
         CBC enc |     26.99 ns/B     35.33 MiB/s     31.09 c/B      1152
         CBC dec |     29.21 ns/B     32.65 MiB/s     33.64 c/B      1152
         CFB enc |     26.94 ns/B     35.40 MiB/s     31.03 c/B      1152
         CFB dec |     26.94 ns/B     35.40 MiB/s     31.03 c/B      1152
         OFB enc |     30.94 ns/B     30.82 MiB/s     35.64 c/B      1152
         OFB dec |     30.94 ns/B     30.82 MiB/s     35.64 c/B      1152
         CTR enc |     19.82 ns/B     48.12 MiB/s     22.83 c/B      1152
         CTR dec |     19.82 ns/B     48.11 MiB/s     22.83 c/B      1152
         XTS enc |     31.00 ns/B     30.77 MiB/s     35.71 c/B      1152
         XTS dec |     33.49 ns/B     28.47 MiB/s     38.58 c/B      1152
         CCM enc |     46.87 ns/B     20.35 MiB/s     54.00 c/B      1152
         CCM dec |     46.88 ns/B     20.34 MiB/s     53.99 c/B      1152
        CCM auth |     27.10 ns/B     35.19 MiB/s     31.22 c/B      1152
         EAX enc |     46.88 ns/B     20.34 MiB/s     54.00 c/B      1152
         EAX dec |     46.88 ns/B     20.34 MiB/s     54.00 c/B      1152
        EAX auth |     27.05 ns/B     35.25 MiB/s     31.16 c/B      1152
         GCM enc |     20.70 ns/B     46.08 MiB/s     23.84 c/B      1152
         GCM dec |     20.70 ns/B     46.07 MiB/s     23.85 c/B      1152
        GCM auth |     0.877 ns/B      1087 MiB/s      1.01 c/B      1152
         OCB enc |     27.32 ns/B     34.91 MiB/s     31.46 c/B      1152
         OCB dec |     29.58 ns/B     32.24 MiB/s     34.08 c/B      1152
        OCB auth |     27.27 ns/B     34.97 MiB/s     31.42 c/B      1152
         SIV enc |     46.89 ns/B     20.34 MiB/s     54.00 c/B      1152
         SIV dec |     46.97 ns/B     20.30 MiB/s     54.11 c/B      1152
        SIV auth |     27.05 ns/B     35.25 MiB/s     31.16 c/B      1152
     GCM-SIV enc |     32.09 ns/B     29.72 MiB/s     36.96 c/B      1152
     GCM-SIV dec |     32.19 ns/B     29.63 MiB/s     37.08 c/B      1152
    GCM-SIV auth |     0.976 ns/B     976.7 MiB/s      1.12 c/B      1152
                 =


-Jussi

