From vincent.torri at gmail.com Sun May 3 00:38:42 2026 From: vincent.torri at gmail.com (Vincent Torri) Date: Sun, 3 May 2026 00:38:42 +0200 Subject: libgcrypt: detect libgpg-error with pkg-config Message-ID: Hello in libgpg-error's configure.ac : 'gpg-error-config command is deprecated.' So, pkg-config detection should be used Vincent Torri From ametzler at bebt.de Sun May 3 07:29:22 2026 From: ametzler at bebt.de (Andreas Metzler) Date: Sun, 3 May 2026 07:29:22 +0200 Subject: libgcrypt: detect libgpg-error with pkg-config In-Reply-To: References: Message-ID: On 2026-05-03 Vincent Torri via Gcrypt-devel wrote: > Hello > in libgpg-error's configure.ac : 'gpg-error-config command is deprecated.' > So, pkg-config detection should be used Misleadingly shortened quote. The actual text in libgpg-error's configure.ac is: # We used to provide gpg-error-config command always. Now, it's # gpgrt-config command with gpg-error.pc configuration file, which # does same thing. gpg-error-config command is deprecated. libgcrypt uses AM_PATH_GPG_ERROR which will prefer gpgrt-config over gpg-error-config. cu Andreas -- "You people are noisy," Nia said. I made the gesture of agreement. From vincent.torri at gmail.com Sun May 3 09:12:14 2026 From: vincent.torri at gmail.com (Vincent Torri) Date: Sun, 3 May 2026 09:12:14 +0200 Subject: libgcrypt: detect libgpg-error with pkg-config In-Reply-To: References: Message-ID: and about using pkg-config instead to detect libgpg-error ? On Sun, May 3, 2026 at 8:22 AM Andreas Metzler wrote: > > On 2026-05-03 Vincent Torri via Gcrypt-devel wrote: > > Hello > > > in libgpg-error's configure.ac : 'gpg-error-config command is deprecated.' > > So, pkg-config detection should be used > > > Misleadingly shortened quote. The actual text in libgpg-error's > configure.ac is: > # We used to provide gpg-error-config command always. Now, it's > # gpgrt-config command with gpg-error.pc configuration file, which > # does same thing. gpg-error-config command is deprecated. > > libgcrypt uses AM_PATH_GPG_ERROR which will prefer gpgrt-config over > gpg-error-config. > > cu Andreas > > -- > "You people are noisy," Nia said. > I made the gesture of agreement. > > _______________________________________________ > Gcrypt-devel mailing list > Gcrypt-devel at gnupg.org > https://lists.gnupg.org/mailman/listinfo/gcrypt-devel From gniibe at fsij.org Sun May 3 10:43:14 2026 From: gniibe at fsij.org (Niibe Yutaka) Date: Sun, 03 May 2026 17:43:14 +0900 Subject: libgcrypt: detect libgpg-error with pkg-config In-Reply-To: References: Message-ID: <87y0i03mu5.fsf@jumper.gniibe.org> Vincent Torri wrote: > and about using pkg-config instead to detect libgpg-error ? The configure script of libgcrypt is written for use of gpgrt-config. It's same for other libraries of GnuPG, and GnuPG itself. gpgrt-config is developed to replace many *-config for GnuPG (gpg-error-config, libgcrypt-config, libassuan-config, libksba-config, and npth-config, etc.). gpgrt-config is a minimum subset of pkg-config. "Minimum" means that, it should have enough but no other features to support GnuPG build. The intention is to allow building GnuPG without pkg-config. GnuPG build could be done in an early stage of OS porting, so, we pursue less dependency other than GNU toolchain. True, technically, it is possible to modify those configure scripts to use pkg-config instead of gpgrt-config. But our intention is use of gpgrt-config here. We (GnuPG team) don't have any plan to modify those configure scripts with pkg-config.
-- From ametzler at bebt.de Sun May 3 13:13:44 2026 From: ametzler at bebt.de (Andreas Metzler) Date: Sun, 3 May 2026 13:13:44 +0200 Subject: libgcrypt: detect libgpg-error with pkg-config In-Reply-To: <87y0i03mu5.fsf@jumper.gniibe.org> References: <87y0i03mu5.fsf@jumper.gniibe.org> Message-ID: On 2026-05-03 Niibe Yutaka wrote: > Vincent Torri wrote: > > and about using pkg-config instead to detect libgpg-error ? > The configure script of libgcrypt is written for use of gpgrt-config. > It's same for other libraries of GnuPG, and GnuPG itself. > gpgrt-config is developed to replace many *-config for GnuPG > (gpg-error-config, libgcrypt-config, libassuan-config, libksba-config, > and npth-config, etc.). gpgrt-config is a minimum subset of pkg-config. > "Minimum" means that, it should have enough but no other features to > support GnuPG build. > The intention is to allow building GnuPG without pkg-config. GnuPG > build could be done in an early stage of OS porting, so, we pursue > less dependency other than GNU toolchain. > True, technically, it is possible to modify those configure scripts > to use pkg-config instead of gpgrt-config. But our intention is > use of gpgrt-config here. > We (GnuPG team) don't have any plan to modify those configure scripts > with pkg-config Hello, well nowadays everybody is using pkgconf instead of the original pkg-config implementation. pkg-config depended on glib but pkgconf basically only needs a C-compiler and is therefore very easy to bootstrap. From a user's point of view the biggest downside of the gpgrt-config-using autoconf tests is their lack of speed: Compare running minimal configure.ac with PKG_CHECK_MODULES([GPGERROR], [gpg-error >= 1.0] ) ... (sid)ametzler at argenau:/tmp/HELLO$ time ./configure [...] checking for pkg-config... /usr/bin/pkg-config checking pkg-config is at least version 0.9.0... yes checking for gpg-error >= 1.0 ... yes configure: creating ./config.status real 0m0.372s user 0m0.276s sys 0m0.107s ... OTOH with AM_PATH_GPG_ERROR([1.0]) checking build system type... x86_64-pc-linux-gnu checking host system type... x86_64-pc-linux-gnu checking for gpg-error-config... no checking for gpgrt-config... /usr/bin/gpgrt-config configure: Use gpgrt-config with /usr/lib/x86_64-linux-gnu as gpg-error-config checking for GPG Error - version >= 1.0... yes (1.59) configure: creating ./config.status real 0m2.236s user 0m1.671s sys 0m0.737s cu Andreas -- "You people are noisy," Nia said. I made the gesture of agreement. From vincent.torri at gmail.com Sun May 3 13:32:37 2026 From: vincent.torri at gmail.com (Vincent Torri) Date: Sun, 3 May 2026 13:32:37 +0200 Subject: libgcrypt: detect libgpg-error with pkg-config In-Reply-To: References: <87y0i03mu5.fsf@jumper.gniibe.org> Message-ID: On Sun, May 3, 2026 at 1:14 PM Andreas Metzler wrote: > > On 2026-05-03 Niibe Yutaka wrote: > > Vincent Torri wrote: > > > and about using pkg-config instead to detect libgpg-error ? > > > The configure script of libgcrypt is written for use of gpgrt-config. > > It's same for other libraries of GnuPG, and GnuPG itself. > > > gpgrt-config is developed to replace many *-config for GnuPG > > (gpg-error-config, libgcrypt-config, libassuan-config, libksba-config, > > and npth-config, etc.). gpgrt-config is a minimum subset of pkg-config. > > "Minimum" means that, it should have enough but no other features to > > support GnuPG build. > > > The intention is to allow building GnuPG without pkg-config.
GnuPG > > build could be done in an early stage of OS porting, so, we pursue > > less dependency other than GNU toolchain. > > > True, technically, it is possible to modify those configure scripts > > to use pkg-config instead of gpgrt-config. But our intention is > > use of gpgrt-config here. > > > We (GnuPG team) don't have any plan to modify those configure scripts > > with pkg-config > > Hello, > > well nowadays everybody is using pkgconf instead of the original > pkg-config implementation. pkg-config depended on glib but pkgconf > basically only needs a C-compiler and is therefore very easy to > bootstrap. > > From a user's point of view the biggest downside of the > gpgrt-config-using autoconf tests is their lack of speed: > > Compare running minimal configure.ac with > PKG_CHECK_MODULES([GPGERROR], [gpg-error >= 1.0] ) ... > > (sid)ametzler at argenau:/tmp/HELLO$ time ./configure > [...] > checking for pkg-config... /usr/bin/pkg-config > checking pkg-config is at least version 0.9.0... yes > checking for gpg-error >= 1.0 ... yes > configure: creating ./config.status > > real 0m0.372s > user 0m0.276s > sys 0m0.107s > > ... OTOH with AM_PATH_GPG_ERROR([1.0]) > checking build system type... x86_64-pc-linux-gnu > checking host system type... x86_64-pc-linux-gnu > checking for gpg-error-config... no > checking for gpgrt-config... /usr/bin/gpgrt-config > configure: Use gpgrt-config with /usr/lib/x86_64-linux-gnu as gpg-error-config > checking for GPG Error - version >= 1.0... yes (1.59) > configure: creating ./config.status > > real 0m2.236s > user 0m1.671s > sys 0m0.737s > if you want speed, use a meson build system using muon (C port of meson) instead of autotools... Vincent Torri From gniibe at fsij.org Tue May 5 02:39:31 2026 From: gniibe at fsij.org (Niibe Yutaka) Date: Tue, 05 May 2026 09:39:31 +0900 Subject: libgcrypt: detect libgpg-error with pkg-config In-Reply-To: References: <87y0i03mu5.fsf@jumper.gniibe.org> Message-ID: <87v7d28zb0.fsf@jumper.gniibe.org> Hello, Thank you for your comments. It would be OK to improve gpgrt-config implementation, possibly written in C. However, an additional dependency on pkg-config won't be our way to go. And moving from GNU autotools + libtool to another... we don't have any plan in this direction at all. -- From jussi.kivilinna at iki.fi Wed May 6 08:57:20 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 06 May 2026 09:57:20 +0300 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 In-Reply-To: <20260506040715.2478247-1-mikey@neuling.org> References: <20260506040715.2478247-1-mikey@neuling.org> Message-ID: <335853b6baaebc2488b266d299cdefd3@iki.fi> Hello, On 2026-05-06 07:07, Michael Neuling wrote: > cipher/rijndael-riscv-zvkned.c's m4 batching code assumes m1 holds > exactly one 16-byte AES block (i.e. VLEN == 128). On VLEN >= 256 > the four-block m4 group is laid out differently and AES_CRYPT m4 > vl=16 miscomputes blocks 1..3. The m4 batching code path selects 128-bit vectors (4 32-bit elements or 16 8-bit elements) and "m4" grouping. Whether the HW supports VLEN>128 or only VLEN=128 should not matter here. > > Replace the existing __riscv_vsetvl_e32m1(4) == 4 gate (which only > checked "VLEN >= 128") with __riscv_vsetvlmax_e32m1() == 4 (== 4 > if VLEN == 128). On any other VLEN the backend refuses setup > and libgcrypt's dispatcher in cipher/rijndael.c falls through to > USE_VP_RISCV (rijndael-vp-riscv.c), which is Zvbb-based and has no VLEN dependency.
> > Issue found by Claude Opus using qemu on Tenstorrent Ascalon model > (-cpu tt-ascalon). So, question is: * Is build system buggy for AES+m4? There was GCC bug for m4 aes intrinsics that configure.ac attempts to detect. Was GCC fixed incorrectly and is it now buggy in some other way (thus bypassing configure.ac check)? * Is QEMU implementation buggy for VLEN=256? * Or is the implementation actually wrong and I did get the RISC-V vector instruction set wrong? * SpaceMit K3 should have vector AES extension and VLEN=256... it would be nice to get hands on it and test this with real hardware. If issue is the first one, configure.ac should be improved. Or _gcry_aes_riscv_zvkned_setup_acceleration() to be improved to do run-time check for buggy aes+m4 build-system/HW. -Jussi > > Tested-on: tt-ascalon (VLEN=256) under qemu 9.1.92 > Tested-on: rva23s64 (VLEN=128) under qemu 9.1.92 > Signed-off-by: Michael Neuling > --- > cipher/rijndael-riscv-zvkned.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/cipher/rijndael-riscv-zvkned.c > b/cipher/rijndael-riscv-zvkned.c > index 434b9562be..d083c05703 100644 > --- a/cipher/rijndael-riscv-zvkned.c > +++ b/cipher/rijndael-riscv-zvkned.c > @@ -142,7 +142,10 @@ int ASM_FUNC_ATTR_NOINLINE FUNC_ATTR_OPT_O2 > _gcry_aes_riscv_zvkned_setup_acceleration(RIJNDAEL_context *ctx) > { > (void)ctx; > - return (__riscv_vsetvl_e32m1(4) == 4); > + /* The m4 batching code assumes m1 holds exactly one 16-byte > + AES block (i.e. VLEN == 128). Refuse the backend on any other > + VLEN. */ > + return (__riscv_vsetvlmax_e32m1() == 4); > } From jussi.kivilinna at iki.fi Wed May 6 11:41:31 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 06 May 2026 12:41:31 +0300 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 In-Reply-To: References: <20260506040715.2478247-1-mikey@neuling.org> <335853b6baaebc2488b266d299cdefd3@iki.fi> Message-ID: Hello, On 2026-05-06 10:47, Michael Neuling wrote: > Jussi, > > Thanks for the reply. > >> m4 batching code path selects 128-bit vectors (4 32-bit elements or >> 16 8-bit elements) and "m4" grouping. Whatever HW supports VLEN>128 >> or VLEN=128 not should not matter here. > > I think the code's assumption around __riscv_vset_v_u32m1_u32m4() may > be wrong. Thanks for checking this out and for the reproducer. I tested with clang and same problem persists, __riscv_vset_v_u32m1_u32m4 usage must be wrong. I'll check for proper fix. Btw, VLEN=512 gives yet another output: $ clang --target=riscv64-linux-gnu libgcrypt-rvv-vlen128-assumption.c -o libgcrypt-rvv-vlen128-assumption -O2 -march=rv64gcv -static $ qemu-riscv64 -cpu max,vlen=512 ./libgcrypt-rvv-vlen128-assumption Element-by-element view of out[0..15]: out[ 0] = 10001111 out[ 1] = 10002222 out[ 2] = 10003333 out[ 3] = 10004444 out[ 4] = 00000000 out[ 5] = 00000000 out[ 6] = 00000000 out[ 7] = 00000000 out[ 8] = 00000000 out[ 9] = 00000000 out[10] = 00000000 out[11] = 00000000 out[12] = 00000000 out[13] = 00000000 out[14] = 00000000 out[15] = 00000000 libgcrypt-shaped layout (VLEN=128 assumption): BUG -- AES_CRYPT m4 vl=16 will not find the 4 blocks here Where each loaded m1 register actually lands in g (per RVV intrinsic spec, sub-register N -> elements N*VLMAX_m1 .. (N+1)*VLMAX_m1 - 1): out[0..3] = sub-register 0 (= r0 + r0-tail) ... 
and so on for sub-registers 1..3 -Jussi From jussi.kivilinna at iki.fi Wed May 6 13:05:22 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 06 May 2026 14:05:22 +0300 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 In-Reply-To: References: <20260506040715.2478247-1-mikey@neuling.org> <335853b6baaebc2488b266d299cdefd3@iki.fi> Message-ID: <4c82c74d04d170c5e0e5de6a7940b27b@iki.fi> On 2026-05-06 12:51, Michael Neuling wrote: > Jussi, > >> Thanks for checking this out and for the reproducer. I tested with >> clang >> and same problem persists, __riscv_vset_v_u32m1_u32m4 usage must be >> wrong. I'll check for proper fix. > > Thanks! I'm happy to test if you have something. > > If it helps, Claude had a go at a fix that works for me: > > https://github.com/mikey/libgcrypt/commit/c22ff9747c4e7b8b77360eb29b7a8cd91cee1a00 Thanks. I'm considering either vslideup or vcreate+vgather method for quick fix to avoid memory writing+reading. Then for actual fix, implementation would need to be checked if 4xM1->M4 transitions could be avoided altogether. -Jussi From mikey at neuling.org Wed May 6 06:07:15 2026 From: mikey at neuling.org (Michael Neuling) Date: Wed, 6 May 2026 04:07:15 +0000 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 Message-ID: <20260506040715.2478247-1-mikey@neuling.org> cipher/rijndael-riscv-zvkned.c's m4 batching code assumes m1 holds exactly one 16-byte AES block (i.e. VLEN == 128). On VLEN >= 256 the four-block m4 group is laid out differently and AES_CRYPT m4 vl=16 miscomputes blocks 1..3. Replace the existing __riscv_vsetvl_e32m1(4) == 4 gate (which only checked "VLEN >= 128") with __riscv_vsetvlmax_e32m1() == 4 (== 4 if VLEN == 128). On any other VLEN the backend refuses setup and libgcrypt's dispatcher in cipher/rijndael.c falls through to USE_VP_RISCV (rijndael-vp-riscv.c), which is Zvbb-based and has no VLEN dependency. Issue found by Claude Opus using qemu on Tenstorrent Ascalon model (-cpu tt-ascalon). Tested-on: tt-ascalon (VLEN=256) under qemu 9.1.92 Tested-on: rva23s64 (VLEN=128) under qemu 9.1.92 Signed-off-by: Michael Neuling --- cipher/rijndael-riscv-zvkned.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/cipher/rijndael-riscv-zvkned.c b/cipher/rijndael-riscv-zvkned.c index 434b9562be..d083c05703 100644 --- a/cipher/rijndael-riscv-zvkned.c +++ b/cipher/rijndael-riscv-zvkned.c @@ -142,7 +142,10 @@ int ASM_FUNC_ATTR_NOINLINE FUNC_ATTR_OPT_O2 _gcry_aes_riscv_zvkned_setup_acceleration(RIJNDAEL_context *ctx) { (void)ctx; - return (__riscv_vsetvl_e32m1(4) == 4); + /* The m4 batching code assumes m1 holds exactly one 16-byte + AES block (i.e. VLEN == 128). Refuse the backend on any other + VLEN. */ + return (__riscv_vsetvlmax_e32m1() == 4); } -- 2.43.0 From mikey at neuling.org Wed May 6 09:47:35 2026 From: mikey at neuling.org (Michael Neuling) Date: Wed, 6 May 2026 17:47:35 +1000 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 In-Reply-To: <335853b6baaebc2488b266d299cdefd3@iki.fi> References: <20260506040715.2478247-1-mikey@neuling.org> <335853b6baaebc2488b266d299cdefd3@iki.fi> Message-ID: Jussi, Thanks for the reply. > The m4 batching code path selects 128-bit vectors (4 32-bit elements or > 16 8-bit elements) and "m4" grouping. Whether the HW supports VLEN>128 > or only VLEN=128 should not matter here. I think the code's assumption around __riscv_vset_v_u32m1_u32m4() may be wrong. I'm not familiar with RVV intrinsics so I apologize for my ignorance here.
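From what I can tell from the RVV intrinsics spec, __riscv_vset_v_u32m1_u32m4(dest, N, src) replaces whole sub-register N of the m4 group, i.e. u32 elements N*VLMAX .. (N+1)*VLMAX-1, where VLMAX = VLEN/32 for e32/m1. Only with VLEN=128 is VLMAX equal to 4, so only there does sub-register N coincide with 16-byte AES block N. A minimal scalar sketch of that index arithmetic (my own code, nothing from libgcrypt; build with -march=rv64gcv):

#include <riscv_vector.h>
#include <stdio.h>

int main(void)
{
  /* VLMAX for e32/m1 is VLEN/32: 4 on VLEN=128, 8 on VLEN=256, ... */
  size_t vlmax = __riscv_vsetvlmax_e32m1();
  size_t n;

  for (n = 0; n < 4; n++)
    printf("vset slot %zu covers u32 elements %zu..%zu; "
           "AES block %zu needs elements %zu..%zu\n",
           n, n * vlmax, (n + 1) * vlmax - 1, n, n * 4, n * 4 + 3);
  return 0;
}

On VLEN=128 the two ranges coincide for every slot; on anything wider they only coincide for slot 0.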
Claude did give a minimal reproducer below using the same pattern as the libgcrypt code. Results using qemu-user: % riscv64-linux-gnu-gcc -O2 -march=rv64gcv -static -o libgcrypt-rvv-vlen128-assumption libgcrypt-rvv-vlen128-assumption.c % qemu-riscv64 -cpu rv64,v=true,vlen=128 ./libgcrypt-rvv-vlen128-assumption Element-by-element view of out[0..15]: out[ 0] = 10001111 out[ 1] = 10002222 out[ 2] = 10003333 out[ 3] = 10004444 out[ 4] = 20001111 out[ 5] = 20002222 out[ 6] = 20003333 out[ 7] = 20004444 out[ 8] = 30001111 out[ 9] = 30002222 out[10] = 30003333 out[11] = 30004444 out[12] = 40001111 out[13] = 40002222 out[14] = 40003333 out[15] = 40004444 libgcrypt-shaped layout (VLEN=128 assumption): OK -- 4 contiguous AES blocks at out[0..15] % qemu-riscv64 -cpu rv64,v=true,vlen=256 ./libgcrypt-rvv-vlen128-assumption Element-by-element view of out[0..15]: out[ 0] = 10001111 out[ 1] = 10002222 out[ 2] = 10003333 out[ 3] = 10004444 out[ 4] = 00000000 out[ 5] = 00000000 out[ 6] = 00000000 out[ 7] = 00000000 out[ 8] = 20001111 out[ 9] = 20002222 out[10] = 20003333 out[11] = 20004444 out[12] = 00000000 out[13] = 00000000 out[14] = 00000000 out[15] = 00000000 libgcrypt-shaped layout (VLEN=128 assumption): BUG -- AES_CRYPT m4 vl=16 will not find the 4 blocks here Where each loaded m1 register actually lands in g (per RVV intrinsic spec, sub-register N -> elements N*VLMAX_m1 .. (N+1)*VLMAX_m1 - 1): out[0..3] = sub-register 0 (= r0 + r0-tail) ... and so on for sub-registers 1..3 % cat libgcrypt-rvv-vlen128-assumption.c /* * Minimal reproducer: libgcrypt's RVV Zvkned AES kernels miscompute on * RVV-capable CPUs whose VLEN > 128. The same bug shows up as AES-CFB * encrypt-decrypt mismatch in libgcrypt's tests/basic on tt-ascalon * (VLEN=256). * * NOT A GCC BUG. The __riscv_vundefined / __riscv_vset / __riscv_vget * intrinsics behave exactly as the RVV intrinsic spec defines them. * The bug is in libgcrypt's USAGE pattern, which silently assumes * VLEN=128. * * The pattern (cipher/rijndael-riscv-zvkned.c CFB-DEC m4 path): * * size_t vl = 4; // u32 elements per AES block * vsetvli e32, m1, vl=4 (implicit via intrinsics with vl arg) * vuint32m1_t r0, r1, r2, r3; // each = 1 AES block * vuint32m4_t g = __riscv_vundefined_u32m4(); * g = __riscv_vset_v_u32m1_u32m4(g, 0, r0); * g = __riscv_vset_v_u32m1_u32m4(g, 1, r1); * g = __riscv_vset_v_u32m1_u32m4(g, 2, r2); * g = __riscv_vset_v_u32m1_u32m4(g, 3, r3); * AES_CRYPT(e, m4, rounds, g, vl * 4); // vsetvli e32, m4, vl=16 * * The author intended "place 4 AES blocks contiguously at elements * 0..15 of g, then encrypt all four". That's what the code does on * VLEN=128 (m1 has 4 elements / register; the m4 group has 16 elements * total at 4 per sub-register). * * On VLEN=256 (m1 = 8 elements/register, m4 = 32 elements): * - vset places r0 in g's *register* 0 (whole register, 8 elements). * Of those 8 elements, only the first 4 are r0's active data; the * other 4 are r0's tail (whatever vsetvli e32m1 vl=4 left there -- * "all-1s" on tt-ascalon's tail-agnostic policy, "undisturbed" or * other garbage on other CPUs). * - vset slot 1 -> g register 1 (whole register), holds r1 + r1-tail. * - vset slot 2 -> g register 2 (whole register), holds r2 + r2-tail. * - vset slot 3 -> g register 3 (whole register), holds r3 + r3-tail. * - AES_CRYPT with vl=16 then processes only the first 16 elements * of g (elements 0..15 = sub-registers 0 and 1 only, with each * sub-register holding ONE valid block + ONE tail block). 
* * Net effect: AES sees blocks * block 0: r0's valid 4 elements (= intended slot 0) GOOD * block 1: r0's TAIL WRONG * block 2: r1's valid 4 elements (intended for slot 1!) WRONG * block 3: r1's TAIL WRONG * and r2, r3 in sub-registers 2, 3 are never touched. * * This program demonstrates the layout difference. It builds a u32m4 * group via the libgcrypt pattern and stores 16 elements (the same vl * AES_CRYPT m4 uses) -- showing where each input block actually lands. * * Build: * riscv64-linux-gnu-gcc -O2 -march=rv64gcv -static \ * -o libgcrypt-rvv-vlen128-assumption \ * libgcrypt-rvv-vlen128-assumption.c * * Run: * qemu-riscv64 -cpu rv64,v=true,vlen=128 ./libgcrypt-rvv-vlen128-assumption * PASS: blocks 0..3 land at out[0..3], out[4..7], out[8..11], out[12..15] * * qemu-riscv64 -cpu rv64,v=true,vlen=256 ./libgcrypt-rvv-vlen128-assumption * FAIL: out[4..7] is r0's tail; out[8..11] is r1; out[12..15] is r1's tail * * The libgcrypt fix is to lay the 4 blocks out byte-contiguously in * memory (4 blocks * 16 bytes = 64 bytes) and reload via * __riscv_vle8_v_u8m4 with vl=64, so the four blocks always land at * element positions 0..15 of the m4 group regardless of VLEN. See * targets/libgcrypt-zvkned-fix.py in the wr2 harness for the patch. * * Author: Claude Opus 4.6 */ #include #include #include int main(void) { /* Distinct per-block payloads so we can identify where each block's data ends up in memory. Top nibble = source slot id. */ uint32_t b0_in[4] = { 0x10001111, 0x10002222, 0x10003333, 0x10004444 }; uint32_t b1_in[4] = { 0x20001111, 0x20002222, 0x20003333, 0x20004444 }; uint32_t b2_in[4] = { 0x30001111, 0x30002222, 0x30003333, 0x30004444 }; uint32_t b3_in[4] = { 0x40001111, 0x40002222, 0x40003333, 0x40004444 }; size_t vl = 4; /* AES block = 4 u32 = 16 bytes */ vuint32m1_t r0 = __riscv_vle32_v_u32m1(b0_in, vl); vuint32m1_t r1 = __riscv_vle32_v_u32m1(b1_in, vl); vuint32m1_t r2 = __riscv_vle32_v_u32m1(b2_in, vl); vuint32m1_t r3 = __riscv_vle32_v_u32m1(b3_in, vl); /* libgcrypt's pattern: build u32m4 group via vundefined + 4 vsets. */ vuint32m4_t g = __riscv_vundefined_u32m4(); g = __riscv_vset_v_u32m1_u32m4(g, 0, r0); g = __riscv_vset_v_u32m1_u32m4(g, 1, r1); g = __riscv_vset_v_u32m1_u32m4(g, 2, r2); g = __riscv_vset_v_u32m1_u32m4(g, 3, r3); /* Store the same number of elements that AES_CRYPT m4 vl=16 would process (16 u32 = elements 0..15 of the m4 group). */ uint32_t out[16]; __riscv_vse32_v_u32m4(out, g, vl * 4); printf("Element-by-element view of out[0..15]:\n\n"); for (int i = 0; i < 16; i++) printf(" out[%2d] = %08x\n", i, out[i]); /* The libgcrypt-shaped expectation: blocks 0..3 contiguous at out[0..3], out[4..7], out[8..11], out[12..15]. This holds on VLEN=128 only. */ int libgcrypt_layout_ok = 1; uint32_t *blocks[4] = { b0_in, b1_in, b2_in, b3_in }; for (int blk = 0; blk < 4; blk++) { for (int e = 0; e < 4; e++) { int idx = blk * 4 + e; if (out[idx] != blocks[blk][e]) { libgcrypt_layout_ok = 0; break; } } } printf("\nlibgcrypt-shaped layout (VLEN=128 assumption): %s\n", libgcrypt_layout_ok ? "OK -- 4 contiguous AES blocks at out[0..15]" : "BUG -- AES_CRYPT m4 vl=16 will not find the 4 blocks here"); if (!libgcrypt_layout_ok) { /* Show where the blocks actually went on this VLEN. */ printf("\nWhere each loaded m1 register actually lands in g (per " "RVV intrinsic spec, sub-register N -> elements " "N*VLMAX_m1 .. (N+1)*VLMAX_m1 - 1):\n"); printf(" out[0..%zu] = sub-register 0 (= r0 + r0-tail)\n", (size_t)(vl * 4 / 4 - 1)); printf(" ... 
and so on for sub-registers 1..3\n"); } return libgcrypt_layout_ok ? 0 : 1; } From mikey at neuling.org Wed May 6 11:51:34 2026 From: mikey at neuling.org (Michael Neuling) Date: Wed, 6 May 2026 19:51:34 +1000 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 In-Reply-To: References: <20260506040715.2478247-1-mikey@neuling.org> <335853b6baaebc2488b266d299cdefd3@iki.fi> Message-ID: Jussi, > Thanks for checking this out and for the reproducer. I tested with clang > and same problem persists, __riscv_vset_v_u32m1_u32m4 usage must be > wrong. I'll check for proper fix. Thanks! I'm happy to test if you have something. If it helps, Claude had a go at a fix that works for me: https://github.com/mikey/libgcrypt/commit/c22ff9747c4e7b8b77360eb29b7a8cd91cee1a00 Mikey From mikey at neuling.org Wed May 6 11:28:28 2026 From: mikey at neuling.org (Michael Neuling) Date: Wed, 6 May 2026 19:28:28 +1000 Subject: [PATCH] cipher:riscv: gate Zvkned AES backend on VLEN == 128 In-Reply-To: References: <20260506040715.2478247-1-mikey@neuling.org> <335853b6baaebc2488b266d299cdefd3@iki.fi> Message-ID: Jussi, To try to eliminate qemu and gcc, I've done some more testing: I run this test case on a Banana BPI-F3 with Spacemit X60 cores (RVA22 + RVV 1.0 with VLEN=256) and it also fails there. This was compiled with gcc 14.2. The earlier qemu test I did with gcc 15.0 and gcc 13.3. Both fail with VLEN=256. SpacemiT X60 result: % ./libgcrypt-rvv-vlen128-assumption Element-by-element view of out[0..15]: out[ 0] = 10001111 out[ 1] = 10002222 out[ 2] = 10003333 out[ 3] = 10004444 out[ 4] = 00000000 out[ 5] = 00000000 out[ 6] = 00000000 out[ 7] = 00000000 out[ 8] = 20001111 out[ 9] = 20002222 out[10] = 20003333 out[11] = 20004444 out[12] = 00000000 out[13] = 00000000 out[14] = 00000000 out[15] = 00000000 libgcrypt-shaped layout (VLEN=128 assumption): BUG -- AES_CRYPT m4 vl=16 will not find the 4 blocks here Where each loaded m1 register actually lands in g (per RVV intrinsic spec, sub-register N -> elements N*VLMAX_m1 .. (N+1)*VLMAX_m1 - 1): out[0..3] = sub-register 0 (= r0 + r0-tail) ... and so on for sub-registers 1..3 % From jussi.kivilinna at iki.fi Wed May 6 20:50:20 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 6 May 2026 21:50:20 +0300 Subject: [PATCH] rijndael-riscv-zvkned: fix m4 grouping when VLEN greater than 128 In-Reply-To: References: Message-ID: <20260506185020.1553147-1-jussi.kivilinna@iki.fi> * cipher/rijndael-riscv-zvkned.c (vxor_u8_u32m1, vxor_u8_u32m4): Mark as ASM_FUNC_ATTR_INLINE. (merge_4x_u32m1_to_u32m4, split_u32m4_to_4x_u32m1): New. (_gcry_aes_riscv_zvkned_ctr_enc, _gcry_aes_riscv_zvkned_ctr32le_enc) (aes_riscv_ocb_enc, aes_riscv_ocb_dec, _gcry_aes_riscv_zvkned_ocb_auth) (aes_riscv_xts_enc, aes_riscv_xts_dec): Use merge_4x_u32m1_to_u32m4 and split_u32m4_to_4x_u32m1 instead of __riscv_vset_v_u32m1_u32m4 and __riscv_vget_v_u32m4_u32m1. (_gcry_aes_riscv_zvkned_cfb_dec, _gcry_aes_riscv_zvkned_cbc_dec): Slide m4 groups instead of splitting to m1 and combining back to m4. -- Implementation was making wrong assumptions about m4 grouping with different VLEN configurations. Implementation did work with VLEN=128 but broke apart with VLEN=256, etc when VLEN>128. This commit switches riscv-zvkned to use vslideup/vslidedown for setting up m4 vector group with 4x128-bits data from four vectors with 128-bits of data. Tested with "qemu-riscv64 -cpu max,vlen={128, 256, 512, 1024}". 
Reported-by: Michael Neuling Signed-off-by: Jussi Kivilinna --- cipher/rijndael-riscv-zvkned.c | 218 +++++++++++++++++++-------------- 1 file changed, 123 insertions(+), 95 deletions(-) diff --git a/cipher/rijndael-riscv-zvkned.c b/cipher/rijndael-riscv-zvkned.c index 434b9562..064c093f 100644 --- a/cipher/rijndael-riscv-zvkned.c +++ b/cipher/rijndael-riscv-zvkned.c @@ -115,7 +115,7 @@ unaligned_store_u32m4(void *ptr, vuint32m4_t vec, size_t vl_u32) __riscv_vse8_v_u8m4(ptr, cast_u32m4_u8m4(vec), vl_bytes); } -static vuint32m1_t +static ASM_FUNC_ATTR_INLINE vuint32m1_t vxor_u8_u32m1(vuint32m1_t a, vuint32m1_t b, size_t vl_u32) { size_t vl_bytes = vl_u32 * 4; @@ -124,7 +124,7 @@ vxor_u8_u32m1(vuint32m1_t a, vuint32m1_t b, size_t vl_u32) cast_u32m1_u8m1(b), vl_bytes)); } -static vuint32m4_t +static ASM_FUNC_ATTR_INLINE vuint32m4_t vxor_u8_u32m4(vuint32m4_t a, vuint32m4_t b, size_t vl_u32) { size_t vl_bytes = vl_u32 * 4; @@ -133,6 +133,45 @@ vxor_u8_u32m4(vuint32m4_t a, vuint32m4_t b, size_t vl_u32) cast_u32m4_u8m4(b), vl_bytes)); } +static ASM_FUNC_ATTR_INLINE vuint32m4_t +merge_4x_u32m1_to_u32m4(vuint32m1_t v0, vuint32m1_t v1, vuint32m1_t v2, + vuint32m1_t v3) +{ + vuint32m2_t v01, v23, tmp2; + vuint32m4_t out, tmp4; + size_t vl = 4; + + v01 = __riscv_vlmul_ext_v_u32m1_u32m2(v0); + tmp2 = __riscv_vlmul_ext_v_u32m1_u32m2(v1); + v01 = __riscv_vslideup_vx_u32m2(v01, tmp2, vl, vl * 2); + v23 = __riscv_vlmul_ext_v_u32m1_u32m2(v2); + tmp2 = __riscv_vlmul_ext_v_u32m1_u32m2(v3); + v23 = __riscv_vslideup_vx_u32m2(v23, tmp2, vl, vl * 2); + out = __riscv_vlmul_ext_v_u32m2_u32m4(v01); + tmp4 = __riscv_vlmul_ext_v_u32m2_u32m4(v23); + return __riscv_vslideup_vx_u32m4(out, tmp4, vl * 2, vl * 4); +} + +static ASM_FUNC_ATTR_INLINE vuint32m1x4_t +split_u32m4_to_4x_u32m1(vuint32m4_t v0123) +{ + vuint32m2_t v01 = __riscv_vlmul_trunc_v_u32m4_u32m2(v0123); + vuint32m2_t v23 = __riscv_vlmul_trunc_v_u32m4_u32m2( + __riscv_vslidedown_vx_u32m4(v0123, 8, 16)); + vuint32m1_t v0 = __riscv_vlmul_trunc_v_u32m2_u32m1(v01); + vuint32m1_t v1 = __riscv_vlmul_trunc_v_u32m2_u32m1( + __riscv_vslidedown_vx_u32m2(v01, 4, 8)); + vuint32m1_t v2 = __riscv_vlmul_trunc_v_u32m2_u32m1(v23); + vuint32m1_t v3 = __riscv_vlmul_trunc_v_u32m2_u32m1( + __riscv_vslidedown_vx_u32m2(v23, 4, 8)); + vuint32m1x4_t out = __riscv_vundefined_u32m1x4(); + out = __riscv_vset_v_u32m1_u32m1x4(out, 0, v0); + out = __riscv_vset_v_u32m1_u32m1x4(out, 1, v1); + out = __riscv_vset_v_u32m1_u32m1x4(out, 2, v2); + out = __riscv_vset_v_u32m1_u32m1x4(out, 3, v3); + return out; +} + /* * HW support detection @@ -780,11 +819,8 @@ _gcry_aes_riscv_zvkned_ctr_enc (void *context, unsigned char *ctr_arg, ctr_u32_3 = bswap128_u32m1(ctr_u32_3, vl); ctr_u32_4 = bswap128_u32m1(ctr_u32_4, vl); - ctr4blks = __riscv_vundefined_u32m4(); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 0, ctr); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 1, ctr_u32_1); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 2, ctr_u32_2); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 3, ctr_u32_3); + ctr4blks = merge_4x_u32m1_to_u32m4(ctr, ctr_u32_1, ctr_u32_2, + ctr_u32_3); ctr = ctr_u32_4; } else @@ -794,17 +830,14 @@ _gcry_aes_riscv_zvkned_ctr_enc (void *context, unsigned char *ctr_arg, vuint8m1_t ctr1 = __riscv_vadd_vv_u8m1(ctr_u8, add1, vl_bytes); vuint8m1_t ctr2 = __riscv_vadd_vv_u8m1(ctr_u8, add2, vl_bytes); vuint8m1_t ctr3 = __riscv_vadd_vv_u8m1(ctr_u8, add3, vl_bytes); - vuint8m4_t ctr0123_u8 = __riscv_vundefined_u8m4(); ctr = cast_u8m1_u32m1(__riscv_vadd_vv_u8m1(ctr_u8, add4, 
vl_bytes)); - ctr0123_u8 = __riscv_vset_v_u8m1_u8m4(ctr0123_u8, 0, ctr_u8); - ctr0123_u8 = __riscv_vset_v_u8m1_u8m4(ctr0123_u8, 1, ctr1); - ctr0123_u8 = __riscv_vset_v_u8m1_u8m4(ctr0123_u8, 2, ctr2); - ctr0123_u8 = __riscv_vset_v_u8m1_u8m4(ctr0123_u8, 3, ctr3); - - ctr4blks = cast_u8m4_u32m4(ctr0123_u8); + ctr4blks = merge_4x_u32m1_to_u32m4(cast_u8m1_u32m1(ctr_u8), + cast_u8m1_u32m1(ctr1), + cast_u8m1_u32m1(ctr2), + cast_u8m1_u32m1(ctr3)); } data4blks = __riscv_vle8_v_u8m4(inbuf, vl_bytes * 4); @@ -904,13 +937,10 @@ _gcry_aes_riscv_zvkned_ctr32le_enc (void *context, unsigned char *ctr_arg, vuint32m1_t ctr1 = __riscv_vadd_vv_u32m1(ctr, add1, vl); vuint32m1_t ctr2 = __riscv_vadd_vv_u32m1(ctr, add2, vl); vuint32m1_t ctr3 = __riscv_vadd_vv_u32m1(ctr, add3, vl); - vuint32m4_t ctr4blks = __riscv_vundefined_u32m4(); + vuint32m4_t ctr4blks; vuint8m4_t data4blks; - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 0, ctr); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 1, ctr1); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 2, ctr2); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 3, ctr3); + ctr4blks = merge_4x_u32m1_to_u32m4(ctr, ctr1, ctr2, ctr3); ctr = __riscv_vadd_vv_u32m1(ctr, add4, vl); data4blks = __riscv_vle8_v_u8m4(inbuf, vl_bytes * 4); @@ -968,17 +998,12 @@ _gcry_aes_riscv_zvkned_cfb_dec (void *context, unsigned char *iv_arg, for (; nblocks >= 4; nblocks -= 4) { vuint32m4_t data4blks = unaligned_load_u32m4(inbuf, vl * 4); - vuint32m1_t iv1 = __riscv_vget_v_u32m4_u32m1(data4blks, 0); - vuint32m1_t iv2 = __riscv_vget_v_u32m4_u32m1(data4blks, 1); - vuint32m1_t iv3 = __riscv_vget_v_u32m4_u32m1(data4blks, 2); - vuint32m1_t iv4 = __riscv_vget_v_u32m4_u32m1(data4blks, 3); - vuint32m4_t iv4blks = __riscv_vundefined_u32m4(); - - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 0, iv); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 1, iv1); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 2, iv2); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 3, iv3); - iv = iv4; + vuint32m1_t new_iv = __riscv_vlmul_trunc_v_u32m4_u32m1( + __riscv_vslidedown_vx_u32m4(data4blks, 12, 16)); + vuint32m4_t iv_m4 = __riscv_vlmul_ext_v_u32m1_u32m4(iv); + vuint32m4_t iv4blks = __riscv_vslideup_vx_u32m4(iv_m4, data4blks, 4, 16); + + iv = new_iv; AES_CRYPT(e, m4, rounds, iv4blks, vl * 4); @@ -1036,22 +1061,16 @@ _gcry_aes_riscv_zvkned_cbc_dec (void *context, unsigned char *iv_arg, for (; nblocks >= 4; nblocks -= 4) { vuint32m4_t data4blks = unaligned_load_u32m4(inbuf, vl * 4); - vuint32m1_t iv1 = __riscv_vget_v_u32m4_u32m1(data4blks, 0); - vuint32m1_t iv2 = __riscv_vget_v_u32m4_u32m1(data4blks, 1); - vuint32m1_t iv3 = __riscv_vget_v_u32m4_u32m1(data4blks, 2); - vuint32m1_t iv4 = __riscv_vget_v_u32m4_u32m1(data4blks, 3); - vuint32m4_t iv4blks = __riscv_vundefined_u32m4(); + vuint32m4_t iv_m4 = __riscv_vlmul_ext_v_u32m1_u32m4(iv); + vuint32m4_t iv4blks = __riscv_vslideup_vx_u32m4(iv_m4, data4blks, 4, 16); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 0, iv); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 1, iv1); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 2, iv2); - iv4blks = __riscv_vset_v_u32m1_u32m4(iv4blks, 3, iv3); + iv = __riscv_vlmul_trunc_v_u32m4_u32m1( + __riscv_vslidedown_vx_u32m4(data4blks, 12, 16)); AES_CRYPT(d, m4, rounds, data4blks, vl * 4); data4blks = vxor_u8_u32m4(iv4blks, data4blks, vl * 4); unaligned_store_u32m4(outbuf, data4blks, vl * 4); - iv = iv4; inbuf += 4 * BLOCKSIZE; outbuf += 4 * BLOCKSIZE; @@ -1101,20 +1120,16 @@ aes_riscv_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, if (nblocks >= 
4) { - vuint32m4_t ctr4blks = __riscv_vundefined_u32m4(); vuint32m1_t zero = __riscv_vmv_v_x_u32m1(0, vl); - - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 0, ctr); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 1, zero); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 2, zero); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 3, zero); + vuint32m4_t ctr4blks = merge_4x_u32m1_to_u32m4(ctr, zero, zero, zero); for (; nblocks >= 4; nblocks -= 4) { const unsigned char *l; vuint8m1_t l_ntzi; vuint32m4_t data4blks = unaligned_load_u32m4(inbuf, vl * 4); - vuint32m4_t offsets = __riscv_vundefined_u32m4(); + vuint32m1_t offset0, offset1, offset2, offset3; + vuint32m4_t offsets; /* Checksum_i = Checksum_{i-1} xor P_i */ ctr4blks = vxor_u8_u32m4(ctr4blks, data4blks, vl * 4); @@ -1124,22 +1139,24 @@ aes_riscv_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 0, iv); + offset0 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 1, iv); + offset1 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 2, iv); + offset2 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 3, iv); + offset3 = iv; + + offsets = merge_4x_u32m1_to_u32m4(offset0, offset1, offset2, offset3); data4blks = vxor_u8_u32m4(offsets, data4blks, vl * 4); @@ -1154,10 +1171,13 @@ aes_riscv_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, } /* Checksum_i = Checksum_{i-1} xor P_i */ - ctr = vxor_u8_u32m1(__riscv_vget_v_u32m4_u32m1(ctr4blks, 0), - __riscv_vget_v_u32m4_u32m1(ctr4blks, 1), vl); - ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m4_u32m1(ctr4blks, 2), vl); - ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m4_u32m1(ctr4blks, 3), vl); + { + vuint32m1x4_t ctr0123 = split_u32m4_to_4x_u32m1(ctr4blks); + ctr = vxor_u8_u32m1(__riscv_vget_v_u32m1x4_u32m1(ctr0123, 0), + __riscv_vget_v_u32m1x4_u32m1(ctr0123, 1), vl); + ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m1x4_u32m1(ctr0123, 2), vl); + ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m1x4_u32m1(ctr0123, 3), vl); + } } for (; nblocks; nblocks--) @@ -1228,42 +1248,40 @@ aes_riscv_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, if (nblocks >= 4) { - vuint32m4_t ctr4blks = __riscv_vundefined_u32m4(); vuint32m1_t zero = __riscv_vmv_v_x_u32m1(0, vl); - - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 0, ctr); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 1, zero); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 2, zero); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 3, zero); + vuint32m4_t ctr4blks = merge_4x_u32m1_to_u32m4(ctr, zero, zero, zero); for (; nblocks >= 4; nblocks -= 4) { const unsigned char *l; vuint8m1_t l_ntzi; vuint32m4_t data4blks = unaligned_load_u32m4(inbuf, vl * 4); - vuint32m4_t offsets = __riscv_vundefined_u32m4(); + vuint32m1_t offset0, offset1, offset2, offset3; + vuint32m4_t offsets; /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ /* P_i = Offset_i xor ENCIPHER(K, C_i xor Offset_i) */ l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 0, 
iv); + offset0 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 1, iv); + offset1 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 2, iv); + offset2 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 3, iv); + offset3 = iv; + + offsets = merge_4x_u32m1_to_u32m4(offset0, offset1, offset2, offset3); data4blks = vxor_u8_u32m4(offsets, data4blks, vl * 4); @@ -1281,10 +1299,13 @@ aes_riscv_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, } /* Checksum_i = Checksum_{i-1} xor P_i */ - ctr = vxor_u8_u32m1(__riscv_vget_v_u32m4_u32m1(ctr4blks, 0), - __riscv_vget_v_u32m4_u32m1(ctr4blks, 1), vl); - ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m4_u32m1(ctr4blks, 2), vl); - ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m4_u32m1(ctr4blks, 3), vl); + { + vuint32m1x4_t ctr0123 = split_u32m4_to_4x_u32m1(ctr4blks); + ctr = vxor_u8_u32m1(__riscv_vget_v_u32m1x4_u32m1(ctr0123, 0), + __riscv_vget_v_u32m1x4_u32m1(ctr0123, 1), vl); + ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m1x4_u32m1(ctr0123, 2), vl); + ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m1x4_u32m1(ctr0123, 3), vl); + } } for (; nblocks; nblocks--) @@ -1360,42 +1381,40 @@ _gcry_aes_riscv_zvkned_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, if (nblocks >= 4) { - vuint32m4_t ctr4blks = __riscv_vundefined_u32m4(); vuint32m1_t zero = __riscv_vmv_v_x_u32m1(0, vl); - - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 0, ctr); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 1, zero); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 2, zero); - ctr4blks = __riscv_vset_v_u32m1_u32m4(ctr4blks, 3, zero); + vuint32m4_t ctr4blks = merge_4x_u32m1_to_u32m4(ctr, zero, zero, zero); for (; nblocks >= 4; nblocks -= 4) { const unsigned char *l; vuint8m1_t l_ntzi; vuint32m4_t data4blks = unaligned_load_u32m4(abuf, vl * 4); - vuint32m4_t offsets = __riscv_vundefined_u32m4(); + vuint32m1_t offset0, offset1, offset2, offset3; + vuint32m4_t offsets; /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 0, iv); + offset0 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 1, iv); + offset1 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 2, iv); + offset2 = iv; l = ocb_get_l(c, ++n); l_ntzi = __riscv_vle8_v_u8m1(l, vl_bytes); iv = vxor_u8_u32m1(iv, cast_u8m1_u32m1(l_ntzi), vl); - offsets = __riscv_vset_v_u32m1_u32m4(offsets, 3, iv); + offset3 = iv; + + offsets = merge_4x_u32m1_to_u32m4(offset0, offset1, offset2, offset3); data4blks = vxor_u8_u32m4(offsets, data4blks, vl * 4); @@ -1407,10 +1426,13 @@ _gcry_aes_riscv_zvkned_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, } /* Checksum_i = Checksum_{i-1} xor P_i */ - ctr = vxor_u8_u32m1(__riscv_vget_v_u32m4_u32m1(ctr4blks, 0), - __riscv_vget_v_u32m4_u32m1(ctr4blks, 1), vl); - ctr = vxor_u8_u32m1(ctr, 
__riscv_vget_v_u32m4_u32m1(ctr4blks, 2), vl); - ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m4_u32m1(ctr4blks, 3), vl); + { + vuint32m1x4_t ctr0123 = split_u32m4_to_4x_u32m1(ctr4blks); + ctr = vxor_u8_u32m1(__riscv_vget_v_u32m1x4_u32m1(ctr0123, 0), + __riscv_vget_v_u32m1x4_u32m1(ctr0123, 1), vl); + ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m1x4_u32m1(ctr0123, 2), vl); + ctr = vxor_u8_u32m1(ctr, __riscv_vget_v_u32m1x4_u32m1(ctr0123, 3), vl); + } } for (; nblocks; nblocks--) @@ -1492,17 +1514,20 @@ aes_riscv_xts_enc (void *context, unsigned char *tweak_arg, void *outbuf_arg, for (; nblocks >= 4; nblocks -= 4) { vuint32m4_t data4blks = unaligned_load_u32m4(inbuf, vl * 4); - vuint32m4_t tweaks = __riscv_vundefined_u32m4(); + vuint32m1_t tweak0, tweak1, tweak2, tweak3; + vuint32m4_t tweaks; - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 0, tweak); + tweak0 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 1, tweak); + tweak1 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 2, tweak); + tweak2 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 3, tweak); + tweak3 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); + tweaks = merge_4x_u32m1_to_u32m4(tweak0, tweak1, tweak2, tweak3); + data4blks = vxor_u8_u32m4(tweaks, data4blks, vl * 4); AES_CRYPT(e, m4, rounds, data4blks, vl * 4); @@ -1569,17 +1594,20 @@ aes_riscv_xts_dec (void *context, unsigned char *tweak_arg, void *outbuf_arg, for (; nblocks >= 4; nblocks -= 4) { vuint32m4_t data4blks = unaligned_load_u32m4(inbuf, vl * 4); - vuint32m4_t tweaks = __riscv_vundefined_u32m4(); + vuint32m1_t tweak0, tweak1, tweak2, tweak3; + vuint32m4_t tweaks; - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 0, tweak); + tweak0 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 1, tweak); + tweak1 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 2, tweak); + tweak2 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); - tweaks = __riscv_vset_v_u32m1_u32m4(tweaks, 3, tweak); + tweak3 = tweak; tweak = xts_gfmul_byA(tweak, xts_gfmul, xts_swap64, vl); + tweaks = merge_4x_u32m1_to_u32m4(tweak0, tweak1, tweak2, tweak3); + data4blks = vxor_u8_u32m4(tweaks, data4blks, vl * 4); AES_CRYPT(d, m4, rounds, data4blks, vl * 4); -- 2.53.0 From jussi.kivilinna at iki.fi Thu May 7 06:28:18 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 7 May 2026 07:28:18 +0300 Subject: [PATCH] rijndael-riscv-zvkned: fix m4 grouping when VLEN greater than 128 In-Reply-To: References: <20260506185020.1553147-1-jussi.kivilinna@iki.fi> Message-ID: Hello, On 07/05/2026 05:30, Michael Neuling wrote: > Jussi, > >> Implementation was making wrong assumptions about m4 grouping with >> different VLEN configurations. Implementation did work with VLEN=128 >> but broke apart with VLEN=256, etc when VLEN>128. This commit switches >> riscv-zvkned to use vslideup/vslidedown for setting up m4 vector group >> with 4x128-bits data from four vectors with 128-bits of data. >> >> Tested with "qemu-riscv64 -cpu max,vlen={128, 256, 512, 1024}". >> >> Reported-by: Michael Neuling >> Signed-off-by: Jussi Kivilinna > > Tested-by: Michael Neuling > > This works here. Thanks for the quick turnaround. > > Mikey Thanks to reporting and testing. 
I've added different vlen variants to my qemu-riscv64 CI runs and this kind of bug should not pass anymore. -Jussi From mikey at neuling.org Thu May 7 04:30:01 2026 From: mikey at neuling.org (Michael Neuling) Date: Thu, 7 May 2026 12:30:01 +1000 Subject: [PATCH] rijndael-riscv-zvkned: fix m4 grouping when VLEN greater than 128 In-Reply-To: <20260506185020.1553147-1-jussi.kivilinna@iki.fi> References: <20260506185020.1553147-1-jussi.kivilinna@iki.fi> Message-ID: Jussi, > Implementation was making wrong assumptions about m4 grouping with > different VLEN configurations. Implementation did work with VLEN=128 > but broke apart with VLEN=256, etc when VLEN>128. This commit switches > riscv-zvkned to use vslideup/vslidedown for setting up m4 vector group > with 4x128-bits data from four vectors with 128-bits of data. > > Tested with "qemu-riscv64 -cpu max,vlen={128, 256, 512, 1024}". > > Reported-by: Michael Neuling > Signed-off-by: Jussi Kivilinna Tested-by: Michael Neuling This works here. Thanks for the quick turnaround. Mikey From mikey at neuling.org Thu May 7 07:30:25 2026 From: mikey at neuling.org (Michael Neuling) Date: Thu, 7 May 2026 15:30:25 +1000 Subject: [PATCH] rijndael-riscv-zvkned: fix m4 grouping when VLEN greater than 128 In-Reply-To: References: <20260506185020.1553147-1-jussi.kivilinna@iki.fi> Message-ID: > Thanks to reporting and testing. I've added different vlen variants to > my qemu-riscv64 CI runs and this kind of bug should not pass anymore. Nice! FWIW Another common mistake is with tail agnostic bits. Running qemu with -cpu rvv_ta_all_1s/rvv_ma_all_1s=true/false may help find these. Using our Ascalon qemu model (with -cpu tt-ascalon) is also an option. We have different VLEN and TA behaviour than the qemu defaults. I'd love to add an option to qemu like -cpu rva23s64,randomize=true to test different implementation options like these. Anyway, thanks again! Mikey From stefbon at gmail.com Sun May 10 07:51:10 2026 From: stefbon at gmail.com (Stef Bon) Date: Sun, 10 May 2026 07:51:10 +0200 Subject: Get required size of signature before signing. Message-ID: Hi, I'm using gcry_pk_sign, and I want the code to get the size of the signature before using the sign function. Is there a way to do this? (or is the size of the signature the same as the hash?) S. Bon From johnthacker at gmail.com Sun May 10 16:29:17 2026 From: johnthacker at gmail.com (John Thacker) Date: Sun, 10 May 2026 10:29:17 -0400 Subject: PATCH: Use FreeLibrary instead of CloseHandle to free Win32 DLL opened with LoadLibraryEx Message-ID: Below is a patch to fix an issue with a recent libgcrypt commit that opens the Windows "shell32.dll" with LoadLibraryEx but closes it with CloseHandle instead of the preferred FreeLibrary. This caused problems running Wireshark under certain circumstances, e.g. under the MSVC debugger. 
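For illustration, here is the correct pairing in isolation (a minimal sketch; the symbol and load flag here are only examples, not necessarily what libgcrypt uses):

#include <windows.h>

/* A module handle from LoadLibraryEx must be released with
   FreeLibrary, which decrements the loader's reference count on the
   module.  CloseHandle is for kernel object handles; an HMODULE is
   not one, so CloseHandle does not unload the module and, under a
   debugger, typically raises an invalid-handle error instead. */
static void
load_and_release (void)
{
  HMODULE h = LoadLibraryExW (L"shell32.dll", NULL,
                              LOAD_LIBRARY_SEARCH_SYSTEM32);
  if (h != NULL)
    {
      FARPROC fn = GetProcAddress (h, "SHGetKnownFolderPath");
      (void)fn;          /* ... use the symbol ... */
      FreeLibrary (h);   /* correct: reference-counted release */
      /* CloseHandle (h);  wrong: HMODULE is not a kernel handle */
    }
}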
Thanks, John Thacker From 5979c49b981142efc2a0a8d67ee65fe62252dad7 Mon Sep 17 00:00:00 2001 From: gpotter2 <10530980+gpotter2 at users.noreply.github.com> Date: Thu, 7 May 2026 00:52:19 +0200 Subject: [PATCH] Fix 'Invalid Handle' crash on Win32 --- src/hwfeatures.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 1b107e63..34017cf7 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -336,7 +336,7 @@ _gcry_get_sysconfdir (void) strcat (appdata, "/GNU/etc/gcrypt/"); } xfree (buf); - CloseHandle (handle); + FreeLibrary(handle); } if (!appdata) appdata = xstrdup ("c:/ProgramData/GNU/etc/gcrypt/"); -- 2.47.0.windows.1 From johnthacker at gmail.com Sun May 10 16:21:43 2026 From: johnthacker at gmail.com (John Thacker) Date: Sun, 10 May 2026 10:21:43 -0400 Subject: libgcrypt 1.12.0 and 1.11.3 Windows LoadLibraryEx, CloseHandle/FreeLibrary mismatch Message-ID: Hello, In Wireshark we have noticed some issues with the recent libgcrypt releases 1.11.3 and 1.12.2, discussed here: https://gitlab.com/wireshark/wireshark/-/work_items/21251 A recent commit opens the Windows DLL "shell32.dll" with LoadLibraryEx, but closes it with CloseHandle instead of FreeLibrary. FreeLibrary does reference counting, but CloseHandle does not, which causes problems if an application has already opened shell32.dll for other reasons: https://github.com/gpg/libgcrypt/commit/d5e3cbfd8845a872d39f468da27a443cea4587e2 https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa Separately, we have a more minor concern: it would be more appropriate to typedef ssize_t to the Windows SSIZE_T type rather than to long, because long is 32-bit on Windows even on 64-bit platforms whereas SSIZE_T is not. Because the GnuPG/libgcrypt bug system is not accepting registrations right now due to scraping, none of the Wireshark core developers have accounts with which to create bugs or post the patches. Please look at our discussion linked above. Thanks, John Thacker From wk at gnupg.org Mon May 11 09:50:29 2026 From: wk at gnupg.org (Werner Koch) Date: Mon, 11 May 2026 09:50:29 +0200 Subject: Get required size of signature before signing. In-Reply-To: (Stef Bon via Gcrypt-devel's message of "Sun, 10 May 2026 07:51:10 +0200") References: Message-ID: <87pl32mlkq.fsf@jacob.g10code.de> Hi! On Sun, 10 May 2026 07:51, Stef Bon said: > I'm using gcry_pk_sign, and I want the code to get the size of the > signature before using the sign function. Is there a way to do this? The size of the signature depends on the algorithm and its parameters. In general the size of a signature is fixed. It is best to do trial signing to find out which signature size is yielded for your algorithm. There is no dummy signing, sorry. Shalom-Salam, Werner -- The pioneers of a warless world are the youth that refuse military service. - A. Einstein -------------- next part -------------- A non-text attachment was scrubbed...
Name: openpgp-digital-signature.asc Type: application/pgp-signature Size: 284 bytes Desc: not available URL: From gniibe at fsij.org Tue May 12 08:00:05 2026 From: gniibe at fsij.org (NIIBE Yutaka) Date: Tue, 12 May 2026 15:00:05 +0900 Subject: PATCH: Use FreeLibrary instead of CloseHandle to free Win32 DLL opened with LoadLibraryEx In-Reply-To: References: Message-ID: <87qznhxj4q.fsf@haruna.fsij.org> Hello, John Thacker wrote: > Below is a patch to fix an issue with a recent libgcrypt commit that > opens the Windows "shell32.dll" with LoadLibraryEx but closes it with > CloseHandle instead of the preferred FreeLibrary. Thank you for the patch. Applied and pushed in master, 1.11 branch, and 1.8 branch. -- From jussi.kivilinna at iki.fi Tue May 12 19:33:04 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Tue, 12 May 2026 20:33:04 +0300 Subject: [PATCH] configure: use AC_LINK_IFELSE for intrinsics to fix LTO builds Message-ID: <20260512173305.2267647-1-jussi.kivilinna@iki.fi> * configure.ac (gcry_cv_cc_x86_avx512_intrinsics) (GCRY_AARCH64_NEON_INTRINSICS_TEST) (GCRY_POWERPC_VECTOR_INTRINSICS_TEST) (GCRY_RISCV_VECTOR_INTRINSICS_TEST): Add main() function to test program. (gcry_cv_cc_x86_avx512_intrinsics) (gcry_cv_cc_aarch64_neon_intrinsics) (gcry_cv_cc_aarch64_neon_intrinsics_cflags) (gcry_cv_cc_ppc_altivec, gcry_cv_cc_ppc_altivec_cflags) (gcry_cv_cc_riscv_vector_intrinsics) (gcry_cv_cc_riscv_vector_intrinsics_cflags): Change AC_COMPILE_IFELSE to AC_LINK_IFELSE. -- With LTO enabled, AC_COMPILE_IFELSE does not perform actual compilation. Solve issue by using AC_LINK_IFELSE to ensure LTO performs code generation to detect missing vector extensions. Signed-off-by: Jussi Kivilinna --- configure.ac | 52 +++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 43 insertions(+), 9 deletions(-) diff --git a/configure.ac b/configure.ac index 30be86b5..b41f67f4 100644 --- a/configure.ac +++ b/configure.ac @@ -1768,7 +1768,7 @@ AC_CACHE_CHECK([whether compiler supports x86/AVX512 intrinsics], gcry_cv_cc_x86_avx512_intrinsics="n/a" else gcry_cv_cc_x86_avx512_intrinsics=no - AC_COMPILE_IFELSE([AC_LANG_SOURCE( + AC_LINK_IFELSE([AC_LANG_SOURCE( [[#include __m512i fn(void *in, __m128i y) { @@ -1781,6 +1781,12 @@ AC_CACHE_CHECK([whether compiler supports x86/AVX512 intrinsics], ::"x"(y),"r"(in):"memory","xmm6"); return x; } + int main(void) + { + __m128i y = _mm_setzero_si128(); + __m512i r = fn(&y, y); + return (int)_mm_cvtsi128_si32(_mm512_castsi512_si128(r)); + } ]])], [gcry_cv_cc_x86_avx512_intrinsics=yes]) fi]) @@ -2303,6 +2309,12 @@ m4_define([GCRY_AARCH64_NEON_INTRINSICS_TEST], memory_barrier_with_vec(in); return in; } + int main(void) + { + __m128i v = {0, 0}; + v = fn(v); + return (int)vgetq_lane_u64(v, 0); + } ]] )] ) @@ -2314,7 +2326,7 @@ AC_CACHE_CHECK([whether compiler supports AArch64/NEON/crypto intrinsics], gcry_cv_cc_aarch64_neon_intrinsics="n/a" else gcry_cv_cc_aarch64_neon_intrinsics=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_AARCH64_NEON_INTRINSICS_TEST], [gcry_cv_cc_aarch64_neon_intrinsics=yes]) fi]) @@ -2333,7 +2345,7 @@ if test "$gcry_cv_cc_aarch64_neon_intrinsics" = "no" && [gcry_cv_cc_aarch64_neon_intrinsics_cflags], [ gcry_cv_cc_aarch64_neon_intrinsics_cflags=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_AARCH64_NEON_INTRINSICS_TEST], [gcry_cv_cc_aarch64_neon_intrinsics_cflags=yes]) ]) @@ -2372,6 +2384,12 @@ m4_define([GCRY_POWERPC_VECTOR_INTRINSICS_TEST], y = vec_sld_u32 (y, y, 3); return vec_cipher_be (t, in) ^ (block)y; } + int main(void) + { + block 
b = {0}; + b = fn(b); + return (int)vec_extract((vecu32)b, 0); + } ]] )] ) @@ -2383,7 +2401,7 @@ AC_CACHE_CHECK([whether compiler supports PowerPC AltiVec/VSX/crypto intrinsics] gcry_cv_cc_ppc_altivec="n/a" else gcry_cv_cc_ppc_altivec=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_POWERPC_VECTOR_INTRINSICS_TEST], [gcry_cv_cc_ppc_altivec=yes]) fi]) @@ -2402,7 +2420,7 @@ if test "$gcry_cv_cc_ppc_altivec" = "no" && [gcry_cv_cc_ppc_altivec_cflags], [ gcry_cv_cc_ppc_altivec_cflags=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_POWERPC_VECTOR_INTRINSICS_TEST], [gcry_cv_cc_ppc_altivec_cflags=yes]) ]) @@ -2771,6 +2789,12 @@ m4_define([GCRY_RISCV_VECTOR_INTRINSICS_TEST], clear_vec_reg_v0(); return in; } + int main(void) + { + __m128i v = __riscv_vmv_v_x_u8m1(0, 16); + v = fn(v); + return (int)__riscv_vmv_x_s_u8m1_u8(v); + } ]] )] ) @@ -2782,7 +2806,7 @@ AC_CACHE_CHECK([whether compiler supports RISC-V vector intrinsics], gcry_cv_cc_riscv_vector_intrinsics="n/a" else gcry_cv_cc_riscv_vector_intrinsics=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_RISCV_VECTOR_INTRINSICS_TEST], [gcry_cv_cc_riscv_vector_intrinsics=yes]) fi]) @@ -2806,7 +2830,7 @@ if test "$gcry_cv_cc_riscv_vector_intrinsics" = "no" && [gcry_cv_cc_riscv_vector_intrinsics_cflags], [ gcry_cv_cc_riscv_vector_intrinsics_cflags=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_RISCV_VECTOR_INTRINSICS_TEST], [gcry_cv_cc_riscv_vector_intrinsics_cflags=yes]) ]) @@ -2906,6 +2930,16 @@ m4_define([GCRY_RISCV_VECTOR_CRYPTO_INTRINSICS_TEST], ); __riscv_vse32_v_u32m1(ptr + 0 * vl, a, vl); } + int main(void) + { + unsigned int buf[256]; + test_sha2(buf); + test_aes_key(buf); + test_aes_crypt(buf); + test_ghash(buf); + test_inline_vec_asm(buf); + return 0; + } ]] )] ) @@ -2917,7 +2951,7 @@ AC_CACHE_CHECK([whether compiler supports RISC-V vector cryptography intrinsics] gcry_cv_cc_riscv_vector_crypto_intrinsics="n/a" else gcry_cv_cc_riscv_vector_crypto_intrinsics=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_RISCV_VECTOR_CRYPTO_INTRINSICS_TEST], [gcry_cv_cc_riscv_vector_crypto_intrinsics=yes]) fi]) @@ -2941,7 +2975,7 @@ if test "$gcry_cv_cc_riscv_vector_crypto_intrinsics" = "no" && AC_CACHE_CHECK([whether compiler supports RISC-V vector intrinsics with extra GCC flags], [gcry_cv_cc_riscv_vector_crypto_intrinsics_cflags], [gcry_cv_cc_riscv_vector_crypto_intrinsics_cflags=no - AC_COMPILE_IFELSE( + AC_LINK_IFELSE( [GCRY_RISCV_VECTOR_CRYPTO_INTRINSICS_TEST], [gcry_cv_cc_riscv_vector_crypto_intrinsics_cflags=yes])]) if test "$gcry_cv_cc_riscv_vector_crypto_intrinsics_cflags" = "yes" ; then -- 2.53.0 From jussi.kivilinna at iki.fi Wed May 13 07:17:29 2026 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 13 May 2026 08:17:29 +0300 Subject: [PATCH] Fix carry flag clobber for powerpc inline assembly with clang Message-ID: <20260513051729.1157323-1-jussi.kivilinna@iki.fi> * mpi/longlong.h [_ARCH_PPC || _ARCH_PPC64] (__PPC_CLOBBER_CC) (add_ssaaaa, sub_ddmmss): Add "xer" to clobber list. * mpi/ec-inline.h [__powerpc__] (ADD3_LIMB64, SUB3_LIMB64) (ADD4_LIMB64, SUB4_LIMB64, ADD5_LIMB64, SUB5_LIMB64): Likewise. * cipher/poly1305.c [__powerpc__] (ADD_1305_64): Likewise. -- Add "xer" to inline assembly clobber list as carry flag is passed through XER register on PowerPC. On GCC "cc" clobbers XER implicitly but with clang XER needs to be explicitly clobbered. 
From jussi.kivilinna at iki.fi  Wed May 13 07:17:29 2026
From: jussi.kivilinna at iki.fi (Jussi Kivilinna)
Date: Wed, 13 May 2026 08:17:29 +0300
Subject: [PATCH] Fix carry flag clobber for powerpc inline assembly with
 clang
Message-ID: <20260513051729.1157323-1-jussi.kivilinna@iki.fi>

* mpi/longlong.h [_ARCH_PPC || _ARCH_PPC64] (__PPC_CLOBBER_CC)
(add_ssaaaa, sub_ddmmss): Add "xer" to clobber list.
* mpi/ec-inline.h [__powerpc__] (ADD3_LIMB64, SUB3_LIMB64)
(ADD4_LIMB64, SUB4_LIMB64, ADD5_LIMB64, SUB5_LIMB64): Likewise.
* cipher/poly1305.c [__powerpc__] (ADD_1305_64): Likewise.
--

Add "xer" to the inline assembly clobber lists, as the carry flag is
kept in the XER register on PowerPC.  With GCC, "cc" clobbers XER
implicitly, but with clang, XER needs to be clobbered explicitly.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
---
 cipher/poly1305.c |  2 +-
 mpi/ec-inline.h   | 12 ++++++------
 mpi/longlong.h    | 38 ++++++++++++++++++++++----------------
 3 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/cipher/poly1305.c b/cipher/poly1305.c
index 8bc65699..8739a4ef 100644
--- a/cipher/poly1305.c
+++ b/cipher/poly1305.c
@@ -194,7 +194,7 @@ static void poly1305_init (poly1305_context_t *ctx,
       "adde %2, %5, %2\n" \
       : "+r" (A0), "+r" (A1), "+r" (A2) \
       : "r" (B0), "r" (B1), "r" (B2) \
-      : "cc" )
+      : "cc", "xer" )
 
 #endif /* __powerpc__ */

diff --git a/mpi/ec-inline.h b/mpi/ec-inline.h
index 662aa5c3..9edd20e5 100644
--- a/mpi/ec-inline.h
+++ b/mpi/ec-inline.h
@@ -322,7 +322,7 @@ LIMB64_HILO(u32 hi, u32 lo)
      "r" ((mpi_limb_t)(C2)), \
      "r" ((mpi_limb_t)(C1)), \
      "r" ((mpi_limb_t)(C0)) \
-    : "cc", "r0")
+    : "cc", "xer", "r0")
 
 #define SUB3_LIMB64(A2, A1, A0, B2, B1, B0, C2, C1, C0) \
   __asm__ ("subfc %2, %8, %5\n" \
@@ -337,7 +337,7 @@ LIMB64_HILO(u32 hi, u32 lo)
      "r" ((mpi_limb_t)(C2)), \
      "r" ((mpi_limb_t)(C1)), \
      "r" ((mpi_limb_t)(C0)) \
-    : "cc", "r0")
+    : "cc", "xer", "r0")
 
 #define ADD4_LIMB64(A3, A2, A1, A0, B3, B2, B1, B0, C3, C2, C1, C0) \
   __asm__ ("addc %3, %11, %7\n" \
@@ -356,7 +356,7 @@ LIMB64_HILO(u32 hi, u32 lo)
      "r" ((mpi_limb_t)(C2)), \
      "r" ((mpi_limb_t)(C1)), \
      "r" ((mpi_limb_t)(C0)) \
-    : "cc")
+    : "cc", "xer")
 
 #define SUB4_LIMB64(A3, A2, A1, A0, B3, B2, B1, B0, C3, C2, C1, C0) \
   __asm__ ("subfc %3, %11, %7\n" \
@@ -375,7 +375,7 @@ LIMB64_HILO(u32 hi, u32 lo)
      "r" ((mpi_limb_t)(C2)), \
      "r" ((mpi_limb_t)(C1)), \
      "r" ((mpi_limb_t)(C0)) \
-    : "cc")
+    : "cc", "xer")
 
 #define ADD5_LIMB64(A4, A3, A2, A1, A0, B4, B3, B2, B1, B0, \
                     C4, C3, C2, C1, C0) \
@@ -399,7 +399,7 @@ LIMB64_HILO(u32 hi, u32 lo)
      "r" ((mpi_limb_t)(C2)), \
      "r" ((mpi_limb_t)(C1)), \
      "r" ((mpi_limb_t)(C0)) \
-    : "cc")
+    : "cc", "xer")
 
 #define SUB5_LIMB64(A4, A3, A2, A1, A0, B4, B3, B2, B1, B0, \
                     C4, C3, C2, C1, C0) \
@@ -423,7 +423,7 @@ LIMB64_HILO(u32 hi, u32 lo)
      "r" ((mpi_limb_t)(C2)), \
      "r" ((mpi_limb_t)(C1)), \
      "r" ((mpi_limb_t)(C0)) \
-    : "cc")
+    : "cc", "xer")
 
 #endif /* __powerpc__ */

diff --git a/mpi/longlong.h b/mpi/longlong.h
index 46de33a8..453f2704 100644
--- a/mpi/longlong.h
+++ b/mpi/longlong.h
@@ -981,6 +981,12 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 /***************************************
  **************  PPC  ******************
  ***************************************/
+#if __GNUC__ >= 2 && (defined (_ARCH_PPC) || defined (_ARCH_PPC64) \
+                      || defined (__powerpc__) || defined (__powerpc64__))
+# define __PPC_CLOBBER_CC : "cc", "xer"
+#else
+# define __PPC_CLOBBER_CC
+#endif
 /* Powerpc 32 bit support taken from GCC longlong.h. */
 #if (defined (_ARCH_PPC) || defined (__powerpc__)) && W_TYPE_SIZE == 32
 # define add_ssaaaa(sh, sl, ah, al, bh, bl) \
@@ -988,40 +994,40 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
     if (__builtin_constant_p (bh) && (bh) == 0) \
       __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (bh) && (bh) == ~(USItype) 0) \
       __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else \
       __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3" \
         : "=r" (sh), "=&r" (sl) \
        : "%r" (ah), "r" (bh), "%r" (al), "rI" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
   } while (0)
 # define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   do { \
     if (__builtin_constant_p (ah) && (ah) == 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\tsubfze %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (bh), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (ah) && (ah) == ~(USItype) 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\tsubfme %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (bh), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (bh) && (bh) == 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\taddme %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (bh) && (bh) == ~(USItype) 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\taddze %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else \
       __asm__ ("subf%I4c %1,%5,%4\n\tsubfe %0,%3,%2" \
         : "=r" (sh), "=&r" (sl) \
        : "r" (ah), "r" (bh), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
   } while (0)
 # define count_leading_zeros(count, x) \
   __asm__ ("cntlzw %0,%1" : "=r" (count) : "r" (x))
@@ -1052,40 +1058,40 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
     if (__builtin_constant_p (bh) && (bh) == 0) \
       __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (bh) && (bh) == ~(UDItype) 0) \
       __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else \
       __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3" \
         : "=r" (sh), "=&r" (sl) \
        : "%r" (ah), "r" (bh), "%r" (al), "rI" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
   } while (0)
 # define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   do { \
     if (__builtin_constant_p (ah) && (ah) == 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\tsubfze %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (bh), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (ah) && (ah) == ~(UDItype) 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\tsubfme %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (bh), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (bh) && (bh) == 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\taddme %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else if (__builtin_constant_p (bh) && (bh) == ~(UDItype) 0) \
       __asm__ ("subf%I3c %1,%4,%3\n\taddze %0,%2" \
         : "=r" (sh), "=&r" (sl) : "r" (ah), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
     else \
       __asm__ ("subf%I4c %1,%5,%4\n\tsubfe %0,%3,%2" \
         : "=r" (sh), "=&r" (sl) \
        : "r" (ah), "r" (bh), "rI" (al), "r" (bl) \
-        __CLOBBER_CC); \
+        __PPC_CLOBBER_CC); \
   } while (0)
 # define count_leading_zeros(count, x) \
   __asm__ ("cntlzd %0,%1" : "=r" (count) : "r" (x))
-- 
2.53.0
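To see why the clobber matters, consider a minimal add-with-carry
helper in the same style as the macros above (illustrative only, not
taken from the patch): the carry moves from addc to adde through
XER.CA, so without "xer" in the clobber list clang may assume the
carry bit survives across the asm statement and miscompile the
surrounding code.

  #include <stdint.h>

  /* Two-limb add; carry propagates through XER.CA.  "cc" alone is
     not enough for clang, so "xer" is listed as well.  */
  static inline void
  add2 (uint64_t *hi, uint64_t *lo, uint64_t ahi, uint64_t alo,
        uint64_t bhi, uint64_t blo)
  {
    __asm__ ("addc %1, %3, %5\n\t"
             "adde %0, %2, %4"
             : "=r" (*hi), "=&r" (*lo)
             : "r" (ahi), "r" (alo), "r" (bhi), "r" (blo)
             : "cc", "xer");
  }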
From stefbon at gmail.com  Wed May 13 21:30:13 2026
From: stefbon at gmail.com (Stef Bon)
Date: Wed, 13 May 2026 21:30:13 +0200
Subject: Get required size of signature before signing.
In-Reply-To: <87pl32mlkq.fsf@jacob.g10code.de>
References: <87pl32mlkq.fsf@jacob.g10code.de>
Message-ID: 

On Mon, 11 May 2026 at 09:47, Werner Koch wrote:
>
> Hi!
>
> On Sun, 10 May 2026 07:51, Stef Bon said:
>
> > I'm using gcry_pk_sign, and I want the code to get the size of the
> > signature before using the sign function. Is there a way to do this?
>
> The size of the signature depends on the algorithm and its parameters.
> In general the size of a signature is fixed.  It is best to do trial
> signing to find out which signature size is yielded for your algorithm.

In my opinion it is the task of the crypto library to provide this
information; having to implement a trial-signing function myself is a
workaround.  I thought the function gcry_pk_algo_info could report it
(via something like a GCRY_GET_ALGO_SIGNSIZE query), because, as you
mention, the size is in general fixed, but it does not.  Would that be
an idea?

Stef
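For reference, the trial-signing approach Werner describes can be
wrapped in a small helper.  A sketch under the assumption that the
caller already has the private key and the data to sign as
s-expressions (error handling trimmed; "canonical encoding length" is
one reasonable definition of signature size, the raw component sizes
would need extracting the MPIs instead):

  #include <gcrypt.h>

  /* Determine the size of a signature by trial signing.  Returns the
     length of the signature in canonical s-expression encoding, or 0
     on error.  */
  static size_t
  signature_size (gcry_sexp_t seckey, gcry_sexp_t data)
  {
    gcry_sexp_t sig;
    size_t len;

    if (gcry_pk_sign (&sig, data, seckey))
      return 0;

    /* With a NULL buffer, gcry_sexp_sprint returns the required
       buffer length.  */
    len = gcry_sexp_sprint (sig, GCRYSEXP_FMT_CANON, NULL, 0);
    gcry_sexp_release (sig);
    return len;
  }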