From dtsen at us.ibm.com Mon Mar 2 02:37:32 2026
From: dtsen at us.ibm.com (Danny Tsen)
Date: Mon, 2 Mar 2026 01:37:32 +0000
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To: <87h5r3r50l.fsf@jacob.g10code.de>
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87bjhetrnl.fsf@jacob.g10code.de> <87h5r3r50l.fsf@jacob.g10code.de>
Message-ID: 

Hi Werner,

For some reason, I couldn't display your message at first. I can display it now.

I don't have a good performance comparison format for ML-KEM. But here are the raw performance numbers for MLDSA.

Thanks.

-Danny

[15:47] danny at ltcden12-lp1 mldsa-ntt_tests % ./perf_mldsa_ntt_opt
=== Optimized assembly NTT test
cpu_time_used (sec)=0.046582 loops=100000
-->ops / sec = 2146751.964278
=== Original C NTT test
cpu_time_used (sec)=0.229215 loops=100000
-->ops / sec = 436271.622712
-->Optimized improvement over original = 3.920678
-->Optimized speed over original faster = 4.920678
=== Optimized Assembly Inverse NTT test
cpu_time_used (sec)=0.052021 loops=100000
-->ops / sec = 1922300.609369
=== Original C Inverse NTT test
cpu_time_used (sec)=0.270790 loops=100000
-->ops / sec = 369289.855608
-->Optimized improvement over original = 4.205398
-->Optimized speed over original faster = 5.205398

________________________________
From: Werner Koch
Sent: Thursday, February 26, 2026 9:47 PM
To: Danny Tsen via Gcrypt-devel
Cc: Danny Tsen
Subject: [EXTERNAL] Re: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for

On Thu, 26 Feb 2026 10:23, Danny Tsen said:

> I don't have benchmark for libgcrypt. I do have my own testing
> performance number on NTT operation. That probably not what you are

I just noticed that we do have support for MLKEM and MLDSA in our
./bench-slope . We should change that to make it easier to run
benchmarks. I was actually looking only for a rough figure on how much
performance you gain with your patches.

Salam-Shalom,

   Werner

-- 
The pioneers of a warless world are the youth that
refuse military service. - A. Einstein

From dtsen at us.ibm.com Mon Mar 2 03:19:29 2026
From: dtsen at us.ibm.com (Danny Tsen)
Date: Mon, 2 Mar 2026 02:19:29 +0000
Subject: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for
In-Reply-To: 
References: <20260224002753.151873-1-dtsen@us.ibm.com> <87bjhetrnl.fsf@jacob.g10code.de> <87h5r3r50l.fsf@jacob.g10code.de>
Message-ID: 

Hi Werner,

I made some modifications for the ML-KEM format. Here are the raw performance numbers for the ML-KEM NTT. Hope this helps.

Thanks.

-Danny

[16:33] danny at ltcden12-lp1 mlkem-ipcri % ./perf_mlkem_test
=== Optimized assembly NTT test
cpu_time_used (sec)=0.016707 loops=100000
-->ops / sec = 5985515.053570
=== Original C NTT test
cpu_time_used (sec)=0.107232 loops=100000
-->ops / sec = 932557.445539
-->Optimized improvement over original = 5.418388
-->Optimized speed over original faster = 6.418388
=== Optimized Assembly Inverse NTT test
cpu_time_used (sec)=0.031500 loops=100000
-->ops / sec = 3174603.174603
=== Original C Inverse NTT test
cpu_time_used (sec)=0.138457 loops=100000
-->ops / sec = 722245.895838
-->Optimized improvement over original = 3.395460
-->Optimized speed over original faster = 4.395460

________________________________
From: Gcrypt-devel on behalf of Danny Tsen via Gcrypt-devel
Sent: Monday, March 2, 2026 9:37 AM
To: Werner Koch; Danny Tsen via Gcrypt-devel
Subject: [EXTERNAL] RE: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for

Hi Werner,

For some reason, I couldn't display your message at first. I can display it now.

I don't have a good performance comparison format for ML-KEM. But here are the raw performance numbers for MLDSA.

Thanks.

-Danny

[15:47] danny at ltcden12-lp1 mldsa-ntt_tests % ./perf_mldsa_ntt_opt
=== Optimized assembly NTT test
cpu_time_used (sec)=0.046582 loops=100000
-->ops / sec = 2146751.964278
=== Original C NTT test
cpu_time_used (sec)=0.229215 loops=100000
-->ops / sec = 436271.622712
-->Optimized improvement over original = 3.920678
-->Optimized speed over original faster = 4.920678
=== Optimized Assembly Inverse NTT test
cpu_time_used (sec)=0.052021 loops=100000
-->ops / sec = 1922300.609369
=== Original C Inverse NTT test
cpu_time_used (sec)=0.270790 loops=100000
-->ops / sec = 369289.855608
-->Optimized improvement over original = 4.205398
-->Optimized speed over original faster = 5.205398

________________________________
From: Werner Koch
Sent: Thursday, February 26, 2026 9:47 PM
To: Danny Tsen via Gcrypt-devel
Cc: Danny Tsen
Subject: [EXTERNAL] Re: [PATCH 0/5] dilithium-kyber: Optimized (i)NTT support for

On Thu, 26 Feb 2026 10:23, Danny Tsen said:

> I don't have benchmark for libgcrypt. I do have my own testing
> performance number on NTT operation. That probably not what you are

I just noticed that we do have support for MLKEM and MLDSA in our
./bench-slope . We should change that to make it easier to run
benchmarks. I was actually looking only for a rough figure on how much
performance you gain with your patches.

Salam-Shalom,

   Werner

-- 
The pioneers of a warless world are the youth that
refuse military service. - A. Einstein
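
For readers comparing the figures, the summary lines in the benchmark output above appear to follow directly from the raw timings: "speed over original faster" matches the ratio of original to optimized CPU time, "improvement over original" matches that ratio minus one, and "ops / sec" matches loops divided by CPU time. The following is a minimal sketch that recomputes them from the posted numbers. The function names and the `timings` table are hypothetical and are not part of the posted test programs, whose source is not shown in this thread.

```python
# Hypothetical helpers; they only mirror the arithmetic that the posted
# benchmark output appears to use, not the actual test program.

def ops_per_sec(loops: int, cpu_time_sec: float) -> float:
    """Operations per second for a run of `loops` iterations."""
    return loops / cpu_time_sec


def speedup(original_sec: float, optimized_sec: float) -> tuple[float, float]:
    """Return (improvement, ratio) where ratio = original / optimized
    and improvement = ratio - 1."""
    ratio = original_sec / optimized_sec
    return ratio - 1.0, ratio


# (original C time, optimized assembly time) in seconds, copied from the
# output posted in this thread; each run used 100000 loops.
timings = {
    "MLDSA NTT": (0.229215, 0.046582),
    "MLDSA inverse NTT": (0.270790, 0.052021),
    "ML-KEM NTT": (0.107232, 0.016707),
    "ML-KEM inverse NTT": (0.138457, 0.031500),
}

for name, (orig, opt) in timings.items():
    improvement, ratio = speedup(orig, opt)
    print(f"{name}: improvement={improvement:.6f} ratio={ratio:.6f} "
          f"optimized ops/sec={ops_per_sec(100000, opt):.1f}")
```

Read this way, the optimized routines run roughly 4.4x to 6.4x faster than the C versions in these isolated NTT tests; the figures say nothing about end-to-end ML-KEM or ML-DSA operations.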