[PATCH] Add SM3 x86-64 AVX/BMI2 assembly implementation
Tianjia Zhang
tianjia.zhang at linux.alibaba.com
Tue Dec 14 08:21:56 CET 2021
Hi Jussi,
On 12/12/21 10:49 PM, Jussi Kivilinna wrote:
> * cipher/Makefile.am: Add 'sm3-avx-bmi2-amd64.S'.
> * cipher/sm3-avx-bmi2-amd64.S: New.
> * cipher/sm3.c (USE_AVX_BMI2, ASM_FUNC_ABI, ASM_EXTRA_STACK): New.
> (SM3_CONTEXT): Define 'h' as array instead of separate fields 'h1',
> 'h2', etc.
> [USE_AVX_BMI2] (_gcry_sm3_transform_amd64_avx_bmi2)
> (do_sm3_transform_amd64_avx_bmi2): New.
> (sm3_init): Select AVX/BMI2 transform function if supported by HW; Update
> to use 'hd->h' as array.
> (transform_blk, sm3_final): Update to use 'hd->h' as array.
> * configure.ac: Add 'sm3-avx-bmi2-amd64.lo'.
> --
>
> Benchmark on AMD Zen3:
>
> Before:
>                 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>  SM3            |      2.18 ns/B     436.6 MiB/s     10.59 c/B      4850
>
> After (~43% faster):
>                 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>  SM3            |      1.52 ns/B     627.4 MiB/s      7.37 c/B      4850
>
>
> Benchmark on Intel Skylake:
>
> Before:
>                 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>  SM3            |      4.35 ns/B     219.2 MiB/s     13.48 c/B      3098
>
> After (~34% faster):
>                 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>  SM3            |      3.24 ns/B     294.4 MiB/s     10.04 c/B      3098
>
>
> Benchmark on AMD Zen2:
>
> Before:
>                 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>  SM3            |      2.73 ns/B     348.9 MiB/s     11.86 c/B      4339
>
> After (~38% faster):
>                 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
>  SM3            |      1.97 ns/B     483.0 MiB/s      8.52 c/B      4318
>
>
> Signed-off-by: Jussi Kivilinna <jussi.kivilinna at iki.fi>
> ---
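For readers skimming the ChangeLog above, the two structural changes (state kept as an array 'h', and a transform function chosen once at init time depending on CPU support) can be sketched roughly as follows. This is only an illustration and not code from the patch: every `demo_*` / `DEMO_*` name is hypothetical, the feature bits do not correspond to libgcrypt's real hardware-feature flags, and both transform functions are empty stand-ins that perform no SM3 compression. Only the initial state words are taken from the SM3 specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_HWF_AVX  (1u << 0)
#define DEMO_HWF_BMI2 (1u << 1)

struct demo_sm3_ctx;

typedef unsigned int (*demo_transform_fn)(struct demo_sm3_ctx *ctx,
                                          const unsigned char *data,
                                          size_t nblks);

/* Chaining state held as one array rather than eight named fields, so a
   routine written in assembly can address it as a contiguous block. */
struct demo_sm3_ctx {
  uint32_t h[8];
  demo_transform_fn transform;
};

/* Stand-in for the portable C transform (does no real work). */
static unsigned int
demo_transform_generic(struct demo_sm3_ctx *ctx, const unsigned char *data,
                       size_t nblks)
{
  (void)ctx; (void)data; (void)nblks;
  return 0;
}

/* Stand-in for an accelerated transform (does no real work). */
static unsigned int
demo_transform_avx_bmi2(struct demo_sm3_ctx *ctx, const unsigned char *data,
                        size_t nblks)
{
  (void)ctx; (void)data; (void)nblks;
  return 0;
}

/* Load the SM3 initial value and pick a transform once, based on the
   feature mask passed in by the caller. */
static void
demo_sm3_init(struct demo_sm3_ctx *ctx, unsigned int features)
{
  static const uint32_t iv[8] = {
    0x7380166fu, 0x4914b2b9u, 0x172442d7u, 0xda8a0600u,
    0xa96f30bcu, 0x163138aau, 0xe38dee4du, 0xb0fb0e4eu
  };
  const unsigned int need = DEMO_HWF_AVX | DEMO_HWF_BMI2;
  size_t i;

  for (i = 0; i < 8; i++)
    ctx->h[i] = iv[i];

  /* Default to the portable path; switch only if both features exist. */
  ctx->transform = demo_transform_generic;
  if ((features & need) == need)
    ctx->transform = demo_transform_avx_bmi2;

  assert(ctx->transform != NULL);
}
```

With this shape, the per-block code only ever calls `ctx->transform(...)`, so the feature check is paid once per context rather than once per block.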
Great job, this is very valuable to us. If possible, please add this tag:
Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang at linux.alibaba.com>
Best regards,
Tianjia