From cvs at cvs.gnupg.org Tue Jan 3 16:39:05 2017 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Tue, 03 Jan 2017 16:39:05 +0100 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-49-g98b4969 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 98b49695b1ffe3c406ae39a45051b8594f903b9d (commit) via 3582641469f1c74078f0d758c4d5458cc0ee5649 (commit) from 0996d5f1c34a3d3012facd098a139d8abbde085f (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 98b49695b1ffe3c406ae39a45051b8594f903b9d Author: Werner Koch Date: Tue Jan 3 16:30:54 2017 +0100 Extend GCRYCTL_PRINT_CONFIG to print compiler version. * src/global.c (print_config): Print version of libgpg-error and used compiler. Signed-off-by: Werner Koch diff --git a/NEWS b/NEWS index ef882b7..179b18d 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,11 @@ Noteworthy changes in version 1.8.0 (unreleased) [C21/A1/R_] - GCRYCTL_REINIT_SYSCALL_CLAMP allows to init nPth after Libgcrypt. + * Extended interfaces: + + - GCRYCTL_PRINT_CONFIG does now also print build information for + libgpg-error and the used compiler version. + * Internal changes: - Libgpg-error 1.25 is now required. This avoids stalling of nPth diff --git a/src/global.c b/src/global.c index cfb7618..25815dd 100644 --- a/src/global.c +++ b/src/global.c @@ -279,7 +279,25 @@ print_config ( int (*fnc)(FILE *fp, const char *format, ...), FILE *fp) int i; const char *s; - fnc (fp, "version:%s:\n", VERSION); + fnc (fp, "version:%s:%x:%s:%x:\n", + VERSION, GCRYPT_VERSION_NUMBER, + GPGRT_VERSION, GPGRT_VERSION_NUMBER); + fnc (fp, "cc:%d:%s:\n", +#if GPGRT_VERSION_NUMBER >= 0x011b00 /* 1.27 */ + GPGRT_GCC_VERSION +#else + _GPG_ERR_GCC_VERSION /* Due to a bug in gpg-error.h. */ +#endif + , +#ifdef __clang__ + "clang:" __VERSION__ +#elif __GNUC__ + "gcc:" __VERSION__ +#else + ":" +#endif + ); + fnc (fp, "ciphers:%s:\n", LIBGCRYPT_CIPHERS); fnc (fp, "pubkeys:%s:\n", LIBGCRYPT_PUBKEY_CIPHERS); fnc (fp, "digests:%s:\n", LIBGCRYPT_DIGESTS); commit 3582641469f1c74078f0d758c4d5458cc0ee5649 Author: Werner Koch Date: Tue Jan 3 15:34:33 2017 +0100 tests: Add option --disable-hwf to the version utility. * src/hwfeatures.c (_gcry_disable_hw_feature): Rewrite to allow passing a colon delimited feature set. (parse_hwf_deny_file): Remove unused var I. * tests/version.c (main): Add options --verbose and --disable-hwf. Signed-off-by: Werner Koch diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index cb539da..47ac19e 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -906,10 +906,14 @@ success or an error code on failure. Libgcrypt detects certain features of the CPU at startup time. For performance tests it is sometimes required not to use such a feature. This option may be used to disable a certain feature; i.e. Libgcrypt -behaves as if this feature has not been detected. Note that the -detection code might be run if the feature has been disabled. This -command must be used at initialization time; i.e. before calling - at code{gcry_check_version}. +behaves as if this feature has not been detected. This call can be +used several times to disable a set of features, or features may be +given as a colon or comma delimited string. 
The special feature +"all" can be used to disable all available features. + +Note that the detection code might be run if the feature has been +disabled. This command must be used at initialization time; +i.e. before calling @code{gcry_check_version}. @item GCRYCTL_REINIT_SYSCALL_CLAMP; Arguments: none diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 99aba34..82f8bf2 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -82,20 +82,34 @@ gpg_err_code_t _gcry_disable_hw_feature (const char *name) { int i; + size_t n1, n2; - if (!strcmp(name, "all")) + while (name && *name) { - disabled_hw_features = ~0; - return 0; + n1 = strcspn (name, ":,"); + if (!n1) + ; + else if (n1 == 3 && !strncmp (name, "all", 3)) + disabled_hw_features = ~0; + else + { + for (i=0; i < DIM (hwflist); i++) + { + n2 = strlen (hwflist[i].desc); + if (n1 == n2 && !strncmp (hwflist[i].desc, name, n2)) + { + disabled_hw_features |= hwflist[i].flag; + break; + } + } + if (!(i < DIM (hwflist))) + return GPG_ERR_INV_NAME; + } + name += n1; + if (*name) + name++; /* Skip delimiter ':' or ','. */ } - - for (i=0; i < DIM (hwflist); i++) - if (!strcmp (hwflist[i].desc, name)) - { - disabled_hw_features |= hwflist[i].flag; - return 0; - } - return GPG_ERR_INV_NAME; + return 0; } @@ -131,7 +145,7 @@ parse_hwf_deny_file (void) FILE *fp; char buffer[256]; char *p, *pend; - int i, lnr = 0; + int lnr = 0; fp = fopen (fname, "r"); if (!fp) diff --git a/tests/version.c b/tests/version.c index f22c305..baf984e 100644 --- a/tests/version.c +++ b/tests/version.c @@ -42,8 +42,47 @@ int main (int argc, char **argv) { - (void)argc; - (void)argv; + int last_argc = -1; + + if (argc) + { argc--; argv++; } + + while (argc && last_argc != argc ) + { + last_argc = argc; + if (!strcmp (*argv, "--")) + { + argc--; argv++; + break; + } + else if (!strcmp (*argv, "--verbose")) + { + verbose++; + argc--; argv++; + } + else if (!strcmp (*argv, "--debug")) + { + /* Dummy option */ + argc--; argv++; + } + else if (!strcmp (*argv, "--disable-hwf")) + { + argc--; + argv++; + if (argc) + { + if (gcry_control (GCRYCTL_DISABLE_HWF, *argv, NULL)) + fprintf (stderr, + PGM + ": unknown hardware feature '%s' - option ignored\n", + *argv); + argc--; + argv++; + } + } + } + + xgcry_control (GCRYCTL_SET_VERBOSITY, (int)verbose); xgcry_control (GCRYCTL_DISABLE_SECMEM, 0); if (strcmp (GCRYPT_VERSION, gcry_check_version (NULL))) ----------------------------------------------------------------------- Summary of changes: NEWS | 5 +++++ doc/gcrypt.texi | 12 ++++++++---- src/global.c | 20 +++++++++++++++++++- src/hwfeatures.c | 38 ++++++++++++++++++++++++++------------ tests/version.c | 43 +++++++++++++++++++++++++++++++++++++++++-- 5 files changed, 99 insertions(+), 19 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From wk at gnupg.org Tue Jan 3 20:57:52 2017 From: wk at gnupg.org (Werner Koch) Date: Tue, 03 Jan 2017 20:57:52 +0100 Subject: SSSE3 problems on Nehalem? Message-ID: <87h95fex1b.fsf@wheatstone.g10code.de> Hi! Due to hardware failures on our old Jenkins server, we switched to an E5520 box. Although this box is older than the former Intel pre-release Clarkdale box it is with its 8 cores more powerful and thus anyway better for our purposes. 
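
The tests/basic --disable-hwf option used below drives the GCRYCTL_DISABLE_HWF control extended in the patch above. As a minimal sketch of the equivalent call sequence in an application (the feature names here are illustrative, and the call must come before gcry_check_version):

#include <gcrypt.h>

int
main (void)
{
  /* Sketch only: disable a colon/comma delimited set of hardware
     features before Libgcrypt runs its feature detection.  */
  gcry_control (GCRYCTL_DISABLE_HWF, "intel-ssse3:intel-aesni", NULL);

  if (!gcry_check_version (GCRYPT_VERSION))
    return 1;
  gcry_control (GCRYCTL_DISABLE_SECMEM, 0);
  gcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0);
  return 0;
}
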
Now, here is the problem: We do not have AES-NI anymore and thus the
SSSE3-optimized AES implementation is used - which fails in the CTR
mode selftest.  I was not able to replicate this failure on other
machines, even when forcing the use of SSSE3, for example by using

  tests/basic --disable-hwf intel-fast-shld:intel-pclmul:intel-aesni:intel-avx

(this works for master; you may need to use several --disable-hwf
options).  Disabling intel-ssse3 on the E5520 is possible (via
/etc/gcrypt/hwf.deny) but is not a proper fix.

The selftest should yield these values for rijndael.c:selftest_ctr_128
around line 487 in _gcry_selftest_helper_ctr (with diff==0):

iv   : 00000800000000000000000000000008
plain: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f \
       202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f \
       404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f \
       606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f \
       808182838485868788898a8b8c8d8e8f
ciphr: eadf062f4bc843fe7662191a78dccd8011bea2ba43937fc63b66ddfaf902eb23 \
       4585dcf111ea27c00ade03493a89ed6880a4bdc12f3ac0df9493db796266b611 \
       e51cdbf3bb9be44981c2d4e6b7b34dd326d8676d1dd19949a848ba72343611fa \
       6f636ddd8db82f0c17ed1bab5bfc1912082c87ff588404305ce8908d32f380c8 \
       875ee5d348b357227991bf5f5d8f7186
plain: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f \
       202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f \
       404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f \
       606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f \
       808182838485868788898a8b8c8d8e8f

All fine.  But on the E5520 I get this back after decryption:

plain: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f \
       e5a9525c2fcb886698104111a6edaeb407f3b66338c43f35621b5e1bc4c33b9b \
       ad1c9778f4694da7cbe11352030b156d99a857fc80e124250a358009af6b7ef8 \
       5f6fc100ac3276af2d9670709718b43c96a62959bb48d623d21d1dedf32fcf0f \
       da6405a4ba56eeb8e05e623acb304391

Thus _gcry_aes_ssse3_ctr_enc fails after one block (128 bits).

Does anyone with an E5520 or another Nehalem CPU see the same problem?

Shalom-Salam,

   Werner

-- 
Thoughts are free.  Exceptions are regulated by a federal law.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: 
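
For reference, the /etc/gcrypt/hwf.deny file Werner mentions is read by parse_hwf_deny_file (touched in the patch above); as far as I understand the format, it takes one hardware feature name per line, with '#' starting a comment. A hypothetical entry for this machine could look like:

  # locally mask the SSSE3 code paths (workaround, not a fix)
  intel-ssse3
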
From jussi.kivilinna at iki.fi  Wed Jan  4 11:01:30 2017
From: jussi.kivilinna at iki.fi (Jussi Kivilinna)
Date: Wed, 4 Jan 2017 12:01:30 +0200
Subject: SSSE3 problems on Nehalem?
In-Reply-To: <87h95fex1b.fsf@wheatstone.g10code.de>
References: <87h95fex1b.fsf@wheatstone.g10code.de>
Message-ID: 

Hello,

On 03.01.2017 21:57, Werner Koch wrote:
> 
> The selftest should yield these values for rijndael.c:selftest_ctr_128
> around line 487 in _gcry_selftest_helper_ctr (with diff==0):
> 
> iv   : 00000800000000000000000000000008
> plain: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f \
>        202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f \
>        404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f \
>        606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f \
>        808182838485868788898a8b8c8d8e8f
> ciphr: eadf062f4bc843fe7662191a78dccd8011bea2ba43937fc63b66ddfaf902eb23 \
>        4585dcf111ea27c00ade03493a89ed6880a4bdc12f3ac0df9493db796266b611 \
>        e51cdbf3bb9be44981c2d4e6b7b34dd326d8676d1dd19949a848ba72343611fa \
>        6f636ddd8db82f0c17ed1bab5bfc1912082c87ff588404305ce8908d32f380c8 \
>        875ee5d348b357227991bf5f5d8f7186
> plain: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f \
>        202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f \
>        404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f \
>        606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f \
>        808182838485868788898a8b8c8d8e8f
> 
> All fine.  But on the E5520 I get this back after decryption:
> 
> plain: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f \
>        e5a9525c2fcb886698104111a6edaeb407f3b66338c43f35621b5e1bc4c33b9b \
>        ad1c9778f4694da7cbe11352030b156d99a857fc80e124250a358009af6b7ef8 \
>        5f6fc100ac3276af2d9670709718b43c96a62959bb48d623d21d1dedf32fcf0f \
>        da6405a4ba56eeb8e05e623acb304391
> 
> Thus _gcry_aes_ssse3_ctr_enc fails after one block (128 bits).

The bug is in _gcry_aes_ssse3_ctr_enc.  'ctrlow' is passed to the
assembly block as a read-only register when it should be read/write,
since the assembly block does a 64-bit increment on it.  What this
ends up breaking depends on compiler register allocation (thus on
version & flags).  So, on that machine, the compiler copies 'ctrlow'
to a temporary register before the assembly, the assembly part
increments that register, and the calculation is lost.

I'll push a fix for this soon.  Diff for rijndael-ssse3 attached below.

-Jussi

---
diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c
index a8e89d4..2adb73f 100644
--- a/cipher/rijndael-ssse3-amd64.c
+++ b/cipher/rijndael-ssse3-amd64.c
@@ -387,8 +387,8 @@ _gcry_aes_ssse3_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf,
     ".Lno_carry%=:\n\t"
     "pshufb %%xmm6, %%xmm7\n\t"
-    :
-    : [ctr] "r" (ctr), [ctrlow] "r" (ctrlow)
+    : [ctrlow] "+r" (ctrlow)
+    : [ctr] "r" (ctr)
     : "cc", "memory");

   do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr);

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 273 bytes
Desc: OpenPGP digital signature
URL: 
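
To illustrate the operand-constraint difference Jussi describes, here is a minimal, self-contained sketch (an illustration, not code from the libgcrypt tree): with an input-only "r" constraint the compiler may hand the asm a scratch copy of the variable and discard the increment, while the read/write "+r" constraint forces the updated value back into the variable.

#include <stdio.h>

/* x86-64 only.  Sketch of read-only vs. read/write asm operands. */
static unsigned long
inc_rw (unsigned long x)
{
  /* "+r": x is both read and written; the increment survives. */
  __asm__ ("incq %0" : "+r" (x) : : "cc");
  return x;
}

static unsigned long
inc_ro (unsigned long x)
{
  /* Input-only "r": the compiler assumes x is unchanged, so it may
     give the asm a throwaway register and the increment is lost.
     This mirrors the bug fixed in _gcry_aes_ssse3_ctr_enc.  */
  __asm__ ("incq %0" : : "r" (x) : "cc");
  return x;
}

int
main (void)
{
  printf ("inc_rw (41) = %lu\n", inc_rw (41));  /* always 42 */
  printf ("inc_ro (41) = %lu\n", inc_ro (41));  /* often 41: update lost */
  return 0;
}
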
From cvs at cvs.gnupg.org  Wed Jan  4 11:20:22 2017
From: cvs at cvs.gnupg.org (by Jussi Kivilinna)
Date: Wed, 04 Jan 2017 11:20:22 +0100
Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-50-gaada604
Message-ID: 

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  aada604594fd42224d366d3cb98f67fd3b989cd6 (commit)
      from  98b49695b1ffe3c406ae39a45051b8594f903b9d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit aada604594fd42224d366d3cb98f67fd3b989cd6
Author: Jussi Kivilinna
Date:   Wed Jan 4 12:02:36 2017 +0200

    rijndael-ssse3: fix counter operand from read-only to read/write

    * cipher/rijndael-ssse3-amd64.c (_gcry_aes_ssse3_ctr_enc): Change
    'ctrlow' operand from read-only to read-write.
    --

    With a read-only operand, the compiler is allowed to pass a
    temporary register to the assembly block and throw away any
    calculations that have been done on that register.  On the other
    hand, the compiler is also allowed to keep the operand value
    permanently in one register, as the value is treated as read-only,
    and then everything effectively operates as expected.  The
    selection between these two depends on compiler version and the
    flags used.

    Signed-off-by: Jussi Kivilinna

diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c
index a8e89d4..2adb73f 100644
--- a/cipher/rijndael-ssse3-amd64.c
+++ b/cipher/rijndael-ssse3-amd64.c
@@ -387,8 +387,8 @@ _gcry_aes_ssse3_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf,
     ".Lno_carry%=:\n\t"
     "pshufb %%xmm6, %%xmm7\n\t"
-    :
-    : [ctr] "r" (ctr), [ctrlow] "r" (ctrlow)
+    : [ctrlow] "+r" (ctrlow)
+    : [ctr] "r" (ctr)
     : "cc", "memory");

   do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr);

-----------------------------------------------------------------------

Summary of changes:
 cipher/rijndael-ssse3-amd64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

hooks/post-receive
-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits at gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits

From wk at gnupg.org  Wed Jan  4 16:45:57 2017
From: wk at gnupg.org (Werner Koch)
Date: Wed, 04 Jan 2017 16:45:57 +0100
Subject: SSSE3 problems on Nehalem?
In-Reply-To: (Jussi Kivilinna's message of "Wed, 4 Jan 2017 12:01:30 +0200")
References: <87h95fex1b.fsf@wheatstone.g10code.de>
Message-ID: <87bmvmde16.fsf@wheatstone.g10code.de>

On Wed, 4 Jan 2017 11:01, jussi.kivilinna at iki.fi said:
> The bug is in _gcry_aes_ssse3_ctr_enc.  'ctrlow' is passed to the
> assembly block as a read-only register when it should be read/write,
> since the assembly block does a 64-bit increment on it.  What this
> ends up breaking depends on compiler register allocation (thus on
> version & flags).

Hmmm, we have exactly the same compiler version on both machines:

  gcc (Debian 6.2.1-5) 6.2.1 20161124

but I just noticed that, for whatever reason, on the Jenkins we use -fPIC.

> I'll push a fix for this soon.  Diff for rijndael-ssse3 attached below.

Thanks.  I can confirm that it works.

Shalom-Salam,

   Werner

-- 
Thoughts are free.  Exceptions are regulated by a federal law.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: 

From jussi.kivilinna at iki.fi  Wed Jan  4 16:15:06 2017
From: jussi.kivilinna at iki.fi (Jussi Kivilinna)
Date: Wed, 04 Jan 2017 17:15:06 +0200
Subject: [PATCH] Add XTS cipher mode
Message-ID: <148354290694.7561.5683091455706158460.stgit@localhost6.localdomain6>

* cipher/Makefile.am: Add 'cipher-xts.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'bulk.xts_crypt' and 'u_mode.xts' members. (_gcry_cipher_xts_crypt): New prototype. * cipher/cipher-xts.c: New. * cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey) (cipher_reset, cipher_encrypt, cipher_decrypt): Add XTS mode handling. * doc/gcrypt.texi: Add XTS mode to documentation. * src/gcrypt.h.in (GCRY_CIPHER_MODE_XTS, GCRY_XTS_BLOCK_LEN): New. * tests/basic.c (do_check_xts_cipher, check_xts_cipher): New. (check_bulk_cipher_modes): Add XTS test-vectors. (check_one_cipher_core, check_one_cipher, check_ciphers): Add XTS testing support. (check_cipher_modes): Add XTS test. * tests/bench-slope.c (bench_xts_encrypt_init) (bench_xts_encrypt_do_bench, bench_xts_decrypt_do_bench) (xts_encrypt_ops, xts_decrypt_ops): New. (cipher_modes, cipher_bench_one): Add XTS. * tests/benchmark.c (cipher_bench): Add XTS testing. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index ac0ec58..71a25ed 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -44,7 +44,7 @@ cipher.c cipher-internal.h \ cipher-cbc.c cipher-cfb.c cipher-ofb.c cipher-ctr.c cipher-aeswrap.c \ cipher-ccm.c cipher-cmac.c cipher-gcm.c cipher-gcm-intel-pclmul.c \ cipher-gcm-armv8-aarch32-ce.S cipher-gcm-armv8-aarch64-ce.S \ -cipher-poly1305.c cipher-ocb.c \ +cipher-poly1305.c cipher-ocb.c cipher-xts.c \ cipher-selftest.c cipher-selftest.h \ pubkey.c pubkey-internal.h pubkey-util.c \ md.c \ diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index 7204d48..33d0629 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -146,6 +146,9 @@ struct gcry_cipher_handle const void *inbuf_arg, size_t nblocks, int encrypt); size_t (*ocb_auth)(gcry_cipher_hd_t c, const void *abuf_arg, size_t nblocks); + void (*xts_crypt)(gcry_cipher_hd_t c, unsigned char *tweak, + void *outbuf_arg, const void *inbuf_arg, + size_t nblocks, int encrypt); } bulk; @@ -309,6 +312,12 @@ struct gcry_cipher_handle } ocb; + /* Mode specific storage for XTS mode. */ + struct { + /* Pointer to tweak cipher context, allocated after actual + * cipher context. */ + char *tweak_context; + } xts; } u_mode; /* What follows are two contexts of the cipher in use. The first @@ -461,6 +470,12 @@ gcry_err_code_t _gcry_cipher_ocb_check_tag const unsigned char *intag, size_t taglen); +/*-- cipher-xts.c --*/ +gcry_err_code_t _gcry_cipher_xts_crypt +/* */ (gcry_cipher_hd_t c, unsigned char *outbuf, size_t outbuflen, + const unsigned char *inbuf, size_t inbuflen, int encrypt); + + /* Return the L-value for block N. Note: 'cipher_ocb.c' ensures that N * will never be multiple of 65536 (1 << OCB_L_TABLE_SIZE), thus N can * be directly passed to _gcry_ctz() function and resulting index will diff --git a/cipher/cipher-xts.c b/cipher/cipher-xts.c new file mode 100644 index 0000000..699382b --- /dev/null +++ b/cipher/cipher-xts.c @@ -0,0 +1,165 @@ +/* cipher-xts.c - XTS mode implementation + * Copyright (C) 2017 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser general Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +#include +#include +#include +#include +#include + +#include "g10lib.h" +#include "cipher.h" +#include "bufhelp.h" +#include "./cipher-internal.h" + + +static inline void xts_gfmul_byA (unsigned char *out, const unsigned char *in) +{ + u64 hi = buf_get_le64 (in + 8); + u64 lo = buf_get_le64 (in + 0); + u64 carry = -(hi >> 63) & 0x87; + + hi = (hi << 1) + (lo >> 63); + lo = (lo << 1) ^ carry; + + buf_put_le64 (out + 8, hi); + buf_put_le64 (out + 0, lo); +} + + +static inline void xts_inc128 (unsigned char *seqno) +{ + u64 lo = buf_get_le64 (seqno + 0); + u64 hi = buf_get_le64 (seqno + 8); + + hi += !(++lo); + + buf_put_le64 (seqno + 0, lo); + buf_put_le64 (seqno + 8, hi); +} + + +gcry_err_code_t +_gcry_cipher_xts_crypt (gcry_cipher_hd_t c, + unsigned char *outbuf, size_t outbuflen, + const unsigned char *inbuf, size_t inbuflen, + int encrypt) +{ + gcry_cipher_encrypt_t tweak_fn = c->spec->encrypt; + gcry_cipher_encrypt_t crypt_fn = + encrypt ? c->spec->encrypt : c->spec->decrypt; + unsigned char tmp[GCRY_XTS_BLOCK_LEN]; + unsigned int burn, nburn; + size_t nblocks; + + if (c->spec->blocksize != GCRY_XTS_BLOCK_LEN) + return GPG_ERR_CIPHER_ALGO; + if (outbuflen < inbuflen) + return GPG_ERR_BUFFER_TOO_SHORT; + if (inbuflen < GCRY_XTS_BLOCK_LEN) + return GPG_ERR_BUFFER_TOO_SHORT; + + /* Data-unit max length: 2^20 blocks. */ + if (inbuflen > GCRY_XTS_BLOCK_LEN << 20) + return GPG_ERR_INV_LENGTH; + + nblocks = inbuflen / GCRY_XTS_BLOCK_LEN; + nblocks -= !encrypt && (inbuflen % GCRY_XTS_BLOCK_LEN) != 0; + + /* Generate first tweak value. */ + burn = tweak_fn (c->u_mode.xts.tweak_context, c->u_ctr.ctr, c->u_iv.iv); + + /* Use a bulk method if available. */ + if (nblocks && c->bulk.xts_crypt) + { + c->bulk.xts_crypt (c, c->u_ctr.ctr, outbuf, inbuf, nblocks, encrypt); + inbuf += nblocks * GCRY_XTS_BLOCK_LEN; + outbuf += nblocks * GCRY_XTS_BLOCK_LEN; + inbuflen -= nblocks * GCRY_XTS_BLOCK_LEN; + nblocks = 0; + } + + /* If we don't have a bulk method use the standard method. We also + use this method for the a remaining partial block. */ + + while (nblocks) + { + /* Xor-Encrypt/Decrypt-Xor block. */ + buf_xor (tmp, inbuf, c->u_ctr.ctr, GCRY_XTS_BLOCK_LEN); + nburn = crypt_fn (&c->context.c, tmp, tmp); + burn = nburn > burn ? nburn : burn; + buf_xor (outbuf, tmp, c->u_ctr.ctr, GCRY_XTS_BLOCK_LEN); + + outbuf += GCRY_XTS_BLOCK_LEN; + inbuf += GCRY_XTS_BLOCK_LEN; + inbuflen -= GCRY_XTS_BLOCK_LEN; + nblocks--; + + /* Generate next tweak. */ + xts_gfmul_byA (c->u_ctr.ctr, c->u_ctr.ctr); + } + + /* Handle remaining data with ciphertext stealing. */ + if (inbuflen) + { + if (!encrypt) + { + gcry_assert (inbuflen > GCRY_XTS_BLOCK_LEN); + gcry_assert (inbuflen < GCRY_XTS_BLOCK_LEN * 2); + + /* Generate last tweak. */ + xts_gfmul_byA (tmp, c->u_ctr.ctr); + + /* Decrypt last block first. */ + buf_xor (outbuf, inbuf, tmp, GCRY_XTS_BLOCK_LEN); + nburn = crypt_fn (&c->context.c, outbuf, outbuf); + burn = nburn > burn ? nburn : burn; + buf_xor (outbuf, outbuf, tmp, GCRY_XTS_BLOCK_LEN); + + inbuflen -= GCRY_XTS_BLOCK_LEN; + inbuf += GCRY_XTS_BLOCK_LEN; + outbuf += GCRY_XTS_BLOCK_LEN; + } + + gcry_assert (inbuflen < GCRY_XTS_BLOCK_LEN); + outbuf -= GCRY_XTS_BLOCK_LEN; + + /* Steal ciphertext from previous block. 
*/ + buf_cpy (tmp, outbuf, GCRY_XTS_BLOCK_LEN); + buf_cpy (tmp, inbuf, inbuflen); + buf_cpy (outbuf + GCRY_XTS_BLOCK_LEN, outbuf, inbuflen); + + /* Decrypt/Encrypt last block. */ + buf_xor (tmp, tmp, c->u_ctr.ctr, GCRY_XTS_BLOCK_LEN); + nburn = crypt_fn (&c->context.c, tmp, tmp); + burn = nburn > burn ? nburn : burn; + buf_xor (outbuf, tmp, c->u_ctr.ctr, GCRY_XTS_BLOCK_LEN); + } + + /* Auto-increment data-unit sequence number */ + xts_inc128 (c->u_iv.iv); + + wipememory (tmp, sizeof(tmp)); + wipememory (c->u_ctr.ctr, sizeof(c->u_ctr.ctr)); + + if (burn > 0) + _gcry_burn_stack (burn + 4 * sizeof(void *)); + + return 0; +} diff --git a/cipher/cipher.c b/cipher/cipher.c index 55853da..aa4e925 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -405,6 +405,13 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, err = GPG_ERR_INV_CIPHER_MODE; break; + case GCRY_CIPHER_MODE_XTS: + if (spec->blocksize != GCRY_XTS_BLOCK_LEN) + err = GPG_ERR_INV_CIPHER_MODE; + if (!spec->encrypt || !spec->decrypt) + err = GPG_ERR_INV_CIPHER_MODE; + break; + case GCRY_CIPHER_MODE_ECB: case GCRY_CIPHER_MODE_CBC: case GCRY_CIPHER_MODE_CFB: @@ -468,6 +475,18 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, #endif /*NEED_16BYTE_ALIGNED_CONTEXT*/ ); + /* Space needed per mode. */ + switch (mode) + { + case GCRY_CIPHER_MODE_XTS: + /* Additional cipher context for tweak. */ + size += 2 * spec->contextsize + 15; + break; + + default: + break; + } + if (secure) h = xtrycalloc_secure (1, size); else @@ -478,6 +497,7 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, else { size_t off = 0; + char *tc; #ifdef NEED_16BYTE_ALIGNED_CONTEXT if ( ((uintptr_t)h & 0x0f) ) @@ -578,6 +598,13 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, h->u_mode.ocb.taglen = 16; /* Bytes. */ break; + case GCRY_CIPHER_MODE_XTS: + tc = h->context.c + spec->contextsize * 2; + tc += (16 - (uintptr_t)tc % 16) % 16; + h->u_mode.xts.tweak_context = tc; + + break; + default: break; } @@ -630,6 +657,14 @@ cipher_setkey (gcry_cipher_hd_t c, byte *key, size_t keylen) { gcry_err_code_t rc; + if (c->mode == GCRY_CIPHER_MODE_XTS) + { + /* XTS uses two keys. */ + if (keylen % 2) + return GPG_ERR_INV_KEYLEN; + keylen /= 2; + } + rc = c->spec->setkey (&c->context.c, key, keylen); if (!rc) { @@ -653,6 +688,20 @@ cipher_setkey (gcry_cipher_hd_t c, byte *key, size_t keylen) _gcry_cipher_poly1305_setkey (c); break; + case GCRY_CIPHER_MODE_XTS: + /* Setup tweak cipher with second part of XTS key. */ + rc = c->spec->setkey (c->u_mode.xts.tweak_context, key + keylen, + keylen); + if (!rc) + { + /* Duplicate initial tweak context. */ + memcpy (c->u_mode.xts.tweak_context + c->spec->contextsize, + c->u_mode.xts.tweak_context, c->spec->contextsize); + } + else + c->marks.key = 0; + break; + default: break; }; @@ -751,6 +800,12 @@ cipher_reset (gcry_cipher_hd_t c) c->u_mode.ocb.taglen = 16; break; + case GCRY_CIPHER_MODE_XTS: + memcpy (c->u_mode.xts.tweak_context, + c->u_mode.xts.tweak_context + c->spec->contextsize, + c->spec->contextsize); + break; + default: break; /* u_mode unused by other modes. 
*/ } @@ -872,6 +927,10 @@ cipher_encrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, rc = _gcry_cipher_ocb_encrypt (c, outbuf, outbuflen, inbuf, inbuflen); break; + case GCRY_CIPHER_MODE_XTS: + rc = _gcry_cipher_xts_crypt (c, outbuf, outbuflen, inbuf, inbuflen, 1); + break; + case GCRY_CIPHER_MODE_STREAM: c->spec->stencrypt (&c->context.c, outbuf, (byte*)/*arggg*/inbuf, inbuflen); @@ -995,6 +1054,10 @@ cipher_decrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, rc = _gcry_cipher_ocb_decrypt (c, outbuf, outbuflen, inbuf, inbuflen); break; + case GCRY_CIPHER_MODE_XTS: + rc = _gcry_cipher_xts_crypt (c, outbuf, outbuflen, inbuf, inbuflen, 0); + break; + case GCRY_CIPHER_MODE_STREAM: c->spec->stdecrypt (&c->context.c, outbuf, (byte*)/*arggg*/inbuf, inbuflen); diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 47ac19e..80c369b 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -1692,6 +1692,23 @@ set to 12 (for 96 bit) or 8 (for 64 bit) provided for the Note that the use of @code{gcry_cipher_final} is required. + at item GCRY_CIPHER_MODE_XTS + at cindex XTS, XTS mode +XEX-based tweaked-codebook mode with ciphertext stealing (XTS) mode +is used to implement the AES-XTS as specified in IEEE 1619 Standard +Architecture for Encrypted Shared Storage Media and NIST SP800-38E. + +The XTS mode requires doubling key-length, for example, using 512-bit +key with AES-256 (@code{GCRY_CIPHER_AES256}). The 128-bit tweak value +is feed to XTS mode as little-endian byte array using + at code{gcry_cipher_setiv} function. When encrypting or decrypting, +full-sized data unit buffers needs to be passed to + at code{gcry_cipher_encrypt} or @code{gcry_cipher_decrypt}. The tweak +value is automatically incremented after each call of + at code{gcry_cipher_encrypt} and @code{gcry_cipher_decrypt}. +Auto-increment allows avoiding need of setting IV between processing +of sequential data units. + @end table @node Working with cipher handles @@ -1725,9 +1742,9 @@ ChaCha20 stream cipher. The block cipher modes @code{GCRY_CIPHER_MODE_CFB}, @code{GCRY_CIPHER_MODE_OFB} and @code{GCRY_CIPHER_MODE_CTR}) will work with any block cipher algorithm. GCM mode (@code{GCRY_CIPHER_MODE_CCM}), CCM mode -(@code{GCRY_CIPHER_MODE_GCM}), and OCB mode -(@code{GCRY_CIPHER_MODE_OCB}) will only work with block cipher -algorithms which have the block size of 16 bytes. +(@code{GCRY_CIPHER_MODE_GCM}), OCB mode (@code{GCRY_CIPHER_MODE_OCB}), +and XTS mode (@code{GCRY_CIPHER_MODE_XTS}) will only work +with block cipher algorithms which have the block size of 16 bytes. The third argument @var{flags} can either be passed as @code{0} or as the bit-wise OR of the following constants. diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in index 77ff947..a0fdaf9 100644 --- a/src/gcrypt.h.in +++ b/src/gcrypt.h.in @@ -961,7 +961,8 @@ enum gcry_cipher_modes GCRY_CIPHER_MODE_GCM = 9, /* Galois Counter Mode. */ GCRY_CIPHER_MODE_POLY1305 = 10, /* Poly1305 based AEAD mode. */ GCRY_CIPHER_MODE_OCB = 11, /* OCB3 mode. */ - GCRY_CIPHER_MODE_CFB8 = 12 /* Cipher feedback (8 bit mode). */ + GCRY_CIPHER_MODE_CFB8 = 12, /* Cipher feedback (8 bit mode). */ + GCRY_CIPHER_MODE_XTS = 13 /* XTS mode. */ }; /* Flags used with the open function. */ @@ -982,6 +983,9 @@ enum gcry_cipher_flags /* OCB works only with blocks of 128 bits. */ #define GCRY_OCB_BLOCK_LEN (128 / 8) +/* XTS works only with blocks of 128 bits. */ +#define GCRY_XTS_BLOCK_LEN (128 / 8) + /* Create a handle for algorithm ALGO to be used in MODE. 
FLAGS may be given as an bitwise OR of the gcry_cipher_flags values. */ gcry_error_t gcry_cipher_open (gcry_cipher_hd_t *handle, diff --git a/tests/basic.c b/tests/basic.c index 9223222..9b6fc13 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -3831,6 +3831,337 @@ check_ocb_cipher (void) check_ocb_cipher_splitaad (); } + + +static void +do_check_xts_cipher (int inplace) +{ + /* Note that we use hex strings and not binary strings in TV. That + makes it easier to maintain the test vectors. */ + static const struct + { + int algo; + const char *key; /* NULL means "000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F" */ + const char *iv; + const char *plain; + const char *ciph; + } tv[] = { + /* CAVS; hex/XTSGenAES128.rsp; COUNT=100 */ + { GCRY_CIPHER_AES, + "bcb6613c495de4bdad9c19f04e4b3915f9ecb379e1a575b633337e934fca1050", + "64981173159d58ac355a20120c8e81f1", + "189acacee06dfa7c94484c7dae59e166", + "7900191d0f19a97668fdba9def84eedc" + }, + /* CAVS; hex/XTSGenAES128.rsp; COUNT=101 */ + { GCRY_CIPHER_AES, + "b7b93f516aef295eff3a29d837cf1f135347e8a21dae616ff5062b2e8d78ce5e", + "873edea653b643bd8bcf51403197ed14", + "236f8a5b58dd55f6194ed70c4ac1a17f1fe60ec9a6c454d087ccb77d6b638c47", + "22e6a3c6379dcf7599b052b5a749c7f78ad8a11b9f1aa9430cf3aef445682e19" + }, + /* CAVS; hex/XTSGenAES128.rsp; COUNT=301 */ + { GCRY_CIPHER_AES, + "394c97881abd989d29c703e48a72b397a7acf51b59649eeea9b33274d8541df4", + "4b15c684a152d485fe9937d39b168c29", + "2f3b9dcfbae729583b1d1ffdd16bb6fe2757329435662a78f0", + "f3473802e38a3ffef4d4fb8e6aa266ebde553a64528a06463e" + }, + /* CAVS; hex/XTSGenAES128.rsp; COUNT=500 */ + { GCRY_CIPHER_AES, + "783a83ec52a27405dff9de4c57f9c979b360b6a5df88d67ec1a052e6f582a717", + "886e975b29bdf6f0c01bb47f61f6f0f5", + "b04d84da856b9a59ce2d626746f689a8051dacd6bce3b990aa901e4030648879", + "f941039ebab8cac39d59247cbbcb4d816c726daed11577692c55e4ac6d3e6820" + }, + /* CAVS; hex/XTSGenAES256.rsp; COUNT=1 */ + { GCRY_CIPHER_AES256, + "1ea661c58d943a0e4801e42f4b0947149e7f9f8e3e68d0c7505210bd311a0e7c" + "d6e13ffdf2418d8d1911c004cda58da3d619b7e2b9141e58318eea392cf41b08", + "adf8d92627464ad2f0428e84a9f87564", + "2eedea52cd8215e1acc647e810bbc3642e87287f8d2e57e36c0a24fbc12a202e", + "cbaad0e2f6cea3f50b37f934d46a9b130b9d54f07e34f36af793e86f73c6d7db" + }, + /* CAVS; hex/XTSGenAES256.rsp; COUNT=101 */ + { GCRY_CIPHER_AES256, + "266c336b3b01489f3267f52835fd92f674374b88b4e1ebd2d36a5f457581d9d0" + "42c3eef7b0b7e5137b086496b4d9e6ac658d7196a23f23f036172fdb8faee527", + "06b209a7a22f486ecbfadb0f3137ba42", + "ca7d65ef8d3dfad345b61ccddca1ad81de830b9e86c7b426d76cb7db766852d9" + "81c6b21409399d78f42cc0b33a7bbb06", + "c73256870cc2f4dd57acc74b5456dbd776912a128bc1f77d72cdebbf270044b7" + "a43ceed29025e1e8be211fa3c3ed002d" + }, + /* CAVS; hex/XTSGenAES256.rsp; COUNT=401 */ + { GCRY_CIPHER_AES256, + "33e89e817ff8d037d6ac5a2296657503f20885d94c483e26449066bd9284d130" + "2dbdbb4b66b6b9f4687f13dd028eb6aa528ca91deb9c5f40db93218806033801", + "a78c04335ab7498a52b81ed74b48e6cf", + "14c3ac31291b075f40788247c3019e88c7b40bac3832da45bbc6c4fe7461371b" + "4dfffb63f71c9f8edb98f28ff4f33121", + "dead7e587519bc78c70d99279fbe3d9b1ad13cdaae69824e0ab8135413230bfd" + "b13babe8f986fbb30d46ab5ec56b916e" + }, + /* From https://github.com/heisencoder/XTS-AES/blob/master/testvals/ */ + { GCRY_CIPHER_AES, + "fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0", + "9a785634120000000000000000000000", + "000102030405060708090a0b0c0d0e0f10", + "7fb2e8beccbb5c118aa52ddca31220bb1b" + }, + { GCRY_CIPHER_AES, + 
"fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0bfbebdbcbbbab9b8b7b6b5b4b3b2b1b0", + "9a785634120000000000000000000000", + "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e", + "d05bc090a8e04f1b3d3ecdd5baec0fd4edbf9dace45d6f6a7306e64be5dd82" + }, + { GCRY_CIPHER_AES, + "2718281828459045235360287471352631415926535897932384626433832795", + "00000000000000000000000000000000", + "000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F" + "20212223", + "27A7479BEFA1D476489F308CD4CFA6E288F548E5C4239F91712A587E2B05AC3D" + "A96E4BBE" + }, + { GCRY_CIPHER_AES256, + "2718281828459045235360287471352662497757247093699959574966967627" + "3141592653589793238462643383279502884197169399375105820974944592", + "11000000000000000000000000000000", + "3A060A8CAD115A6F44572E3759E43C8F8832FEDC28A8E35B357B5CF3EDBEF788" + "CAD8BFCB23", + "6D1C78A8BAD91DB2924C507CCEDE835F5BADD157DA0AF55C98BBC28CF676F9FA" + "61618FA696" + }, + { GCRY_CIPHER_AES256, + "2718281828459045235360287471352662497757247093699959574966967627" + "3141592653589793238462643383279502884197169399375105820974944592", + "11000000000000000000000000000000", + "3A060A8CAD115A6F44572E3759E43C8F8832FEDC28A8E35B357B5CF3EDBEF788" + "CAD8BFCB23", + "6D1C78A8BAD91DB2924C507CCEDE835F5BADD157DA0AF55C98BBC28CF676F9FA" + "61618FA696" + }, + { GCRY_CIPHER_AES, + "e0e1e2e3e4e5e6e7e8e9eaebecedeeefc0c1c2c3c4c5c6c7c8c9cacbcccdcecf", + "21436587a90000000000000000000000", + "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f" + "202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f" + "404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f" + "606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f" + "808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f" + "a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf" + "c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf" + "e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff" + "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f" + "202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f" + "404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f" + "606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f" + "808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f" + "a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf" + "c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf" + "e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff" + "0001020304050607", + "38b45812ef43a05bd957e545907e223b954ab4aaf088303ad910eadf14b42be6" + "8b2461149d8c8ba85f992be970bc621f1b06573f63e867bf5875acafa04e42cc" + "bd7bd3c2a0fb1fff791ec5ec36c66ae4ac1e806d81fbf709dbe29e471fad3854" + "9c8e66f5345d7c1eb94f405d1ec785cc6f6a68f6254dd8339f9d84057e01a177" + "41990482999516b5611a38f41bb6478e6f173f320805dd71b1932fc333cb9ee3" + "9936beea9ad96fa10fb4112b901734ddad40bc1878995f8e11aee7d141a2f5d4" + "8b7a4e1e7f0b2c04830e69a4fd1378411c2f287edf48c6c4e5c247a19680f7fe" + "41cefbd49b582106e3616cbbe4dfb2344b2ae9519391f3e0fb4922254b1d6d2d" + "19c6d4d537b3a26f3bcc51588b32f3eca0829b6a5ac72578fb814fb43cf80d64" + "a233e3f997a3f02683342f2b33d25b492536b93becb2f5e1a8b82f5b88334272" + "9e8ae09d16938841a21a97fb543eea3bbff59f13c1a18449e398701c1ad51648" + "346cbc04c27bb2da3b93a1372ccae548fb53bee476f9e9c91773b1bb19828394" + "d55d3e1a20ed69113a860b6829ffa847224604435070221b257e8dff783615d2" + "cae4803a93aa4334ab482a0afac9c0aeda70b45a481df5dec5df8cc0f423c77a" + 
"5fd46cd312021d4b438862419a791be03bb4d97c0e59578542531ba466a83baf" + "92cefc151b5cc1611a167893819b63fb37ec662bc0fc907db74a94468a55a7bc" + "8a6b18e86de60290" + }, + { GCRY_CIPHER_AES256, + "2718281828459045235360287471352662497757247093699959574966967627" + "3141592653589793238462643383279502884197169399375105820974944592", + "ffffffff000000000000000000000000", + "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f" + "202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f" + "404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f" + "606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f" + "808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f" + "a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf" + "c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf" + "e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff" + "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f" + "202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f" + "404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f" + "606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f" + "808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f" + "a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf" + "c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf" + "e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff", + "bf53d2dade78e822a4d949a9bc6766b01b06a8ef70d26748c6a7fc36d80ae4c5" + "520f7c4ab0ac8544424fa405162fef5a6b7f229498063618d39f0003cb5fb8d1" + "c86b643497da1ff945c8d3bedeca4f479702a7a735f043ddb1d6aaade3c4a0ac" + "7ca7f3fa5279bef56f82cd7a2f38672e824814e10700300a055e1630b8f1cb0e" + "919f5e942010a416e2bf48cb46993d3cb6a51c19bacf864785a00bc2ecff15d3" + "50875b246ed53e68be6f55bd7e05cfc2b2ed6432198a6444b6d8c247fab941f5" + "69768b5c429366f1d3f00f0345b96123d56204c01c63b22ce78baf116e525ed9" + "0fdea39fa469494d3866c31e05f295ff21fea8d4e6e13d67e47ce722e9698a1c" + "1048d68ebcde76b86fcf976eab8aa9790268b7068e017a8b9b749409514f1053" + "027fd16c3786ea1bac5f15cb79711ee2abe82f5cf8b13ae73030ef5b9e4457e7" + "5d1304f988d62dd6fc4b94ed38ba831da4b7634971b6cd8ec325d9c61c00f1df" + "73627ed3745a5e8489f3a95c69639c32cd6e1d537a85f75cc844726e8a72fc00" + "77ad22000f1d5078f6b866318c668f1ad03d5a5fced5219f2eabbd0aa5c0f460" + "d183f04404a0d6f469558e81fab24a167905ab4c7878502ad3e38fdbe62a4155" + "6cec37325759533ce8f25f367c87bb5578d667ae93f9e2fd99bcbc5f2fbba88c" + "f6516139420fcff3b7361d86322c4bd84c82f335abb152c4a93411373aaa8220" + } + }; + gpg_error_t err = 0; + gcry_cipher_hd_t hde, hdd; + int tidx; + + if (verbose) + fprintf (stderr, " Starting XTS checks.\n"); + + for (tidx = 0; tidx < DIM (tv); tidx++) + { + const char *hexkey = tv[tidx].key; + char *key, *iv, *ciph, *plain, *out; + size_t keylen, ivlen, ciphlen, plainlen, outlen; + + if (verbose) + fprintf (stderr, " checking XTS mode for %s [%i] (tv %d)\n", + gcry_cipher_algo_name (tv[tidx].algo), tv[tidx].algo, tidx); + + if (!hexkey) + hexkey = "000102030405060708090A0B0C0D0E0F" + "101112131415161718191A1B1C1D1E1F"; + + /* Convert to hex strings to binary. 
*/ + key = hex2buffer (hexkey, &keylen); + iv = hex2buffer (tv[tidx].iv, &ivlen); + plain = hex2buffer (tv[tidx].plain, &plainlen); + ciph = hex2buffer (tv[tidx].ciph, &ciphlen); + outlen = plainlen + 5; + out = xmalloc (outlen); + + assert (plainlen == ciphlen); + assert (plainlen <= outlen); + assert (out); + + err = gcry_cipher_open (&hde, tv[tidx].algo, GCRY_CIPHER_MODE_XTS, 0); + if (!err) + err = gcry_cipher_open (&hdd, tv[tidx].algo, GCRY_CIPHER_MODE_XTS, 0); + if (err) + { + fail ("cipher-xts, gcry_cipher_open failed (tv %d): %s\n", + tidx, gpg_strerror (err)); + return; + } + + err = gcry_cipher_setkey (hde, key, keylen); + if (!err) + err = gcry_cipher_setkey (hdd, key, keylen); + if (err) + { + fail ("cipher-ocb, gcry_cipher_setkey failed (tv %d): %s\n", + tidx, gpg_strerror (err)); + gcry_cipher_close (hde); + gcry_cipher_close (hdd); + return; + } + + err = gcry_cipher_setiv (hde, iv, ivlen); + if (!err) + err = gcry_cipher_setiv (hdd, iv, ivlen); + if (err) + { + fail ("cipher-ocb, gcry_cipher_setiv failed (tv %d): %s\n", + tidx, gpg_strerror (err)); + gcry_cipher_close (hde); + gcry_cipher_close (hdd); + return; + } + + if (inplace) + { + memcpy(out, plain, plainlen); + err = gcry_cipher_encrypt (hde, out, plainlen, NULL, 0); + } + else + { + err = gcry_cipher_encrypt (hde, out, outlen, plain, plainlen); + } + if (err) + { + fail ("cipher-xts, gcry_cipher_encrypt failed (tv %d): %s\n", + tidx, gpg_strerror (err)); + gcry_cipher_close (hde); + gcry_cipher_close (hdd); + return; + } + + /* Check that the encrypt output matches the expected cipher text. */ + if (memcmp (ciph, out, plainlen)) + { + mismatch (ciph, plainlen, out, plainlen); + fail ("cipher-xts, encrypt data mismatch (tv %d)\n", tidx); + } + + /* Now for the decryption. */ + if (inplace) + { + err = gcry_cipher_decrypt (hdd, out, plainlen, NULL, 0); + } + else + { + memcpy(ciph, out, ciphlen); + err = gcry_cipher_decrypt (hdd, out, plainlen, ciph, ciphlen); + } + if (err) + { + fail ("cipher-xts, gcry_cipher_decrypt (tv %d) failed: %s\n", + tidx, gpg_strerror (err)); + gcry_cipher_close (hde); + gcry_cipher_close (hdd); + return; + } + + /* Check that the decrypt output matches the expected plain text. */ + if (memcmp (plain, out, plainlen)) + { + mismatch (plain, plainlen, out, plainlen); + fail ("cipher-xts, decrypt data mismatch (tv %d)\n", tidx); + } + + gcry_cipher_close (hde); + gcry_cipher_close (hdd); + + xfree (iv); + xfree (ciph); + xfree (plain); + xfree (key); + xfree (out); + } + + if (verbose) + fprintf (stderr, " Completed XTS checks.\n"); +} + + +static void +check_xts_cipher (void) +{ + /* Check XTS cipher with separate destination and source buffers for + * encryption/decryption. */ + do_check_xts_cipher(0); + + /* Check XTS cipher with inplace encrypt/decrypt. 
*/ + do_check_xts_cipher(1); +} + + static void check_gost28147_cipher (void) { @@ -5233,6 +5564,20 @@ check_bulk_cipher_modes (void) /*[14]*/ { 0x2d, 0x71, 0x54, 0xb9, 0xc5, 0x28, 0x76, 0xff, 0x76, 0xb5, 0x99, 0x37, 0x99, 0x9d, 0xf7, 0x10, 0x6d, 0x86, 0x4f, 0x3f } + }, + { GCRY_CIPHER_AES128, GCRY_CIPHER_MODE_XTS, + "abcdefghijklmnopABCDEFGHIJKLMNOP", 32, + "1234567890123456", 16, +/*[15]*/ + { 0x71, 0x46, 0x40, 0xb0, 0xed, 0x6f, 0xc4, 0x82, 0x2b, 0x3f, + 0xb6, 0xf7, 0x81, 0x08, 0x4c, 0x8b, 0xc1, 0x66, 0x4c, 0x1b } + }, + { GCRY_CIPHER_AES256, GCRY_CIPHER_MODE_XTS, + "abcdefghijklmnopABCDEFGHIJKLMNOP_abcdefghijklmnopABCDEFGHIJKLMNO", 64, + "1234567890123456", 16, +/*[16]*/ + { 0x8e, 0xbc, 0xa5, 0x21, 0x0a, 0x4b, 0x53, 0x14, 0x79, 0x81, + 0x25, 0xad, 0x24, 0x45, 0x98, 0xbd, 0x9f, 0x27, 0x5f, 0x01 } } }; gcry_cipher_hd_t hde = NULL; @@ -5437,15 +5782,16 @@ check_one_cipher_core (int algo, int mode, int flags, blklen = get_algo_mode_blklen(algo, mode); - assert (nkey == 32); + assert (nkey == 64); assert (nplain == 1040); assert (sizeof(in_buffer) == nplain + 1); assert (sizeof(out_buffer) == sizeof(in_buffer)); assert (blklen > 0); - if (mode == GCRY_CIPHER_MODE_CBC && (flags & GCRY_CIPHER_CBC_CTS)) + if ((mode == GCRY_CIPHER_MODE_CBC && (flags & GCRY_CIPHER_CBC_CTS)) || + mode == GCRY_CIPHER_MODE_XTS) { - /* TODO: examine why CBC with CTS fails. */ + /* Input cannot be split in to multiple operations with CTS . */ blklen = nplain; } @@ -5484,6 +5830,11 @@ check_one_cipher_core (int algo, int mode, int flags, return -1; } + if (mode == GCRY_CIPHER_MODE_XTS) + { + keylen *= 2; + } + err = gcry_cipher_open (&hd, algo, mode, flags); if (err) { @@ -5695,14 +6046,15 @@ check_one_cipher_core (int algo, int mode, int flags, static void check_one_cipher (int algo, int mode, int flags) { - char key[32+1]; + char key[64+1]; unsigned char plain[1040+1]; int bufshift, i; for (bufshift=0; bufshift < 4; bufshift++) { /* Pass 0: Standard test. */ - memcpy (key, "0123456789abcdef.,;/[]{}-=ABCDEF", 32); + memcpy (key, "0123456789abcdef.,;/[]{}-=ABCDEF_" + "0123456789abcdef.,;/[]{}-=ABCDEF", 64); memcpy (plain, "foobar42FOOBAR17", 16); for (i = 16; i < 1040; i += 16) { @@ -5713,25 +6065,25 @@ check_one_cipher (int algo, int mode, int flags) plain[i+14]++; } - if (check_one_cipher_core (algo, mode, flags, key, 32, plain, 1040, + if (check_one_cipher_core (algo, mode, flags, key, 64, plain, 1040, bufshift, 0+10*bufshift)) return; /* Pass 1: Key not aligned. */ - memmove (key+1, key, 32); - if (check_one_cipher_core (algo, mode, flags, key+1, 32, plain, 1040, + memmove (key+1, key, 64); + if (check_one_cipher_core (algo, mode, flags, key+1, 64, plain, 1040, bufshift, 1+10*bufshift)) return; /* Pass 2: Key not aligned and data not aligned. */ memmove (plain+1, plain, 1040); - if (check_one_cipher_core (algo, mode, flags, key+1, 32, plain+1, 1040, + if (check_one_cipher_core (algo, mode, flags, key+1, 64, plain+1, 1040, bufshift, 2+10*bufshift)) return; /* Pass 3: Key aligned and data not aligned. 
*/ - memmove (key, key+1, 32); - if (check_one_cipher_core (algo, mode, flags, key, 32, plain+1, 1040, + memmove (key, key+1, 64); + if (check_one_cipher_core (algo, mode, flags, key, 64, plain+1, 1040, bufshift, 3+10*bufshift)) return; } @@ -5831,6 +6183,8 @@ check_ciphers (void) check_one_cipher (algos[i], GCRY_CIPHER_MODE_GCM, 0); if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_OCB_BLOCK_LEN) check_one_cipher (algos[i], GCRY_CIPHER_MODE_OCB, 0); + if (gcry_cipher_get_algo_blklen (algos[i]) == GCRY_XTS_BLOCK_LEN) + check_one_cipher (algos[i], GCRY_CIPHER_MODE_XTS, 0); } for (i = 0; algos2[i]; i++) @@ -5874,6 +6228,7 @@ check_cipher_modes(void) check_gcm_cipher (); check_poly1305_cipher (); check_ocb_cipher (); + check_xts_cipher (); check_gost28147_cipher (); check_stream_cipher (); check_stream_cipher_large_block (); diff --git a/tests/bench-slope.c b/tests/bench-slope.c index 4ed98cb..6d93ad2 100644 --- a/tests/bench-slope.c +++ b/tests/bench-slope.c @@ -742,6 +742,126 @@ static struct bench_ops decrypt_ops = { }; +static int +bench_xts_encrypt_init (struct bench_obj *obj) +{ + struct bench_cipher_mode *mode = obj->priv; + gcry_cipher_hd_t hd; + int err, keylen; + + /* For XTS, benchmark with typical data-unit size (512 byte sectors). */ + obj->min_bufsize = 512; + obj->max_bufsize = 16 * obj->min_bufsize; + obj->step_size = obj->min_bufsize; + obj->num_measure_repetitions = num_measurement_repetitions; + + err = gcry_cipher_open (&hd, mode->algo, mode->mode, 0); + if (err) + { + fprintf (stderr, PGM ": error opening cipher `%s'\n", + gcry_cipher_algo_name (mode->algo)); + exit (1); + } + + /* Double key-length for XTS. */ + keylen = gcry_cipher_get_algo_keylen (mode->algo) * 2; + if (keylen) + { + char key[keylen]; + int i; + + for (i = 0; i < keylen; i++) + key[i] = 0x33 ^ (11 - i); + + err = gcry_cipher_setkey (hd, key, keylen); + if (err) + { + fprintf (stderr, PGM ": gcry_cipher_setkey failed: %s\n", + gpg_strerror (err)); + gcry_cipher_close (hd); + exit (1); + } + } + else + { + fprintf (stderr, PGM ": failed to get key length for algorithm `%s'\n", + gcry_cipher_algo_name (mode->algo)); + gcry_cipher_close (hd); + exit (1); + } + + obj->priv = hd; + + return 0; +} + +static void +bench_xts_encrypt_do_bench (struct bench_obj *obj, void *buf, size_t buflen) +{ + gcry_cipher_hd_t hd = obj->priv; + unsigned int pos; + static const char tweak[16] = { 0xff, 0xff, 0xfe, }; + size_t sectorlen = obj->step_size; + char *cbuf = buf; + int err; + + gcry_cipher_setiv (hd, tweak, sizeof (tweak)); + + /* Process each sector separately. */ + + for (pos = 0; pos < buflen; pos += sectorlen, cbuf += sectorlen) + { + err = gcry_cipher_encrypt (hd, cbuf, sectorlen, cbuf, sectorlen); + if (err) + { + fprintf (stderr, PGM ": gcry_cipher_encrypt failed: %s\n", + gpg_strerror (err)); + gcry_cipher_close (hd); + exit (1); + } + } +} + +static void +bench_xts_decrypt_do_bench (struct bench_obj *obj, void *buf, size_t buflen) +{ + gcry_cipher_hd_t hd = obj->priv; + unsigned int pos; + static const char tweak[16] = { 0xff, 0xff, 0xfe, }; + size_t sectorlen = obj->step_size; + char *cbuf = buf; + int err; + + gcry_cipher_setiv (hd, tweak, sizeof (tweak)); + + /* Process each sector separately. 
*/ + + for (pos = 0; pos < buflen; pos += sectorlen, cbuf += sectorlen) + { + err = gcry_cipher_decrypt (hd, cbuf, sectorlen, cbuf, sectorlen); + if (err) + { + fprintf (stderr, PGM ": gcry_cipher_encrypt failed: %s\n", + gpg_strerror (err)); + gcry_cipher_close (hd); + exit (1); + } + } +} + +static struct bench_ops xts_encrypt_ops = { + &bench_xts_encrypt_init, + &bench_encrypt_free, + &bench_xts_encrypt_do_bench +}; + +static struct bench_ops xts_decrypt_ops = { + &bench_xts_encrypt_init, + &bench_encrypt_free, + &bench_xts_decrypt_do_bench +}; + + static void bench_ccm_encrypt_do_bench (struct bench_obj *obj, void *buf, size_t buflen) { @@ -1166,6 +1286,8 @@ static struct bench_cipher_mode cipher_modes[] = { {GCRY_CIPHER_MODE_OFB, "OFB dec", &decrypt_ops}, {GCRY_CIPHER_MODE_CTR, "CTR enc", &encrypt_ops}, {GCRY_CIPHER_MODE_CTR, "CTR dec", &decrypt_ops}, + {GCRY_CIPHER_MODE_XTS, "XTS enc", &xts_encrypt_ops}, + {GCRY_CIPHER_MODE_XTS, "XTS dec", &xts_decrypt_ops}, {GCRY_CIPHER_MODE_CCM, "CCM enc", &ccm_encrypt_ops}, {GCRY_CIPHER_MODE_CCM, "CCM dec", &ccm_decrypt_ops}, {GCRY_CIPHER_MODE_CCM, "CCM auth", &ccm_authenticate_ops}, @@ -1219,8 +1341,12 @@ cipher_bench_one (int algo, struct bench_cipher_mode *pmode) if (mode.mode == GCRY_CIPHER_MODE_GCM && blklen != GCRY_GCM_BLOCK_LEN) return; + /* XTS has restrictions for block-size */ + if (mode.mode == GCRY_CIPHER_MODE_XTS && blklen != GCRY_XTS_BLOCK_LEN) + return; + /* Our OCB implementaion has restrictions for block-size. */ - if (mode.mode == GCRY_CIPHER_MODE_OCB && blklen != 16) + if (mode.mode == GCRY_CIPHER_MODE_OCB && blklen != GCRY_OCB_BLOCK_LEN) return; bench_print_mode (14, mode.name); diff --git a/tests/benchmark.c b/tests/benchmark.c index a63cce3..44a8711 100644 --- a/tests/benchmark.c +++ b/tests/benchmark.c @@ -764,12 +764,15 @@ cipher_bench ( const char *algoname ) int req_blocksize; int authlen; int noncelen; + int doublekey; } modes[] = { { GCRY_CIPHER_MODE_ECB, " ECB/Stream", 1 }, { GCRY_CIPHER_MODE_CBC, " CBC", 1 }, { GCRY_CIPHER_MODE_CFB, " CFB", 0 }, { GCRY_CIPHER_MODE_OFB, " OFB", 0 }, { GCRY_CIPHER_MODE_CTR, " CTR", 0 }, + { GCRY_CIPHER_MODE_XTS, " XTS", 0, + NULL, GCRY_XTS_BLOCK_LEN, 0, 0, 1 }, { GCRY_CIPHER_MODE_CCM, " CCM", 0, ccm_aead_init, GCRY_CCM_BLOCK_LEN, 8 }, { GCRY_CIPHER_MODE_GCM, " GCM", 0, @@ -841,13 +844,13 @@ cipher_bench ( const char *algoname ) algoname); exit (1); } - if ( keylen > sizeof key ) + if ( keylen * 2 > sizeof key ) { fprintf (stderr, PGM ": algo %d, keylength problem (%d)\n", algo, keylen ); exit (1); } - for (i=0; i < keylen; i++) + for (i=0; i < keylen * 2; i++) key[i] = i + (clock () & 0xff); blklen = gcry_cipher_get_algo_blklen (algo); @@ -863,6 +866,8 @@ cipher_bench ( const char *algoname ) for (modeidx=0; modes[modeidx].mode; modeidx++) { + size_t modekeylen = keylen * (!!modes[modeidx].doublekey + 1); + if ((blklen > 1 && modes[modeidx].mode == GCRY_CIPHER_MODE_STREAM) || (blklen == 1 && modes[modeidx].mode != GCRY_CIPHER_MODE_STREAM)) continue; @@ -886,7 +891,7 @@ cipher_bench ( const char *algoname ) if (!cipher_with_keysetup) { - err = gcry_cipher_setkey (hd, key, keylen); + err = gcry_cipher_setkey (hd, key, modekeylen); if (err) { fprintf (stderr, "gcry_cipher_setkey failed: %s\n", @@ -905,7 +910,7 @@ cipher_bench ( const char *algoname ) { if (cipher_with_keysetup) { - err = gcry_cipher_setkey (hd, key, keylen); + err = gcry_cipher_setkey (hd, key, modekeylen); if (err) { fprintf (stderr, "gcry_cipher_setkey failed: %s\n", @@ -969,7 +974,7 @@ cipher_bench ( const char 
*algoname )

       if (!cipher_with_keysetup)
         {
-          err = gcry_cipher_setkey (hd, key, keylen);
+          err = gcry_cipher_setkey (hd, key, modekeylen);
           if (err)
             {
               fprintf (stderr, "gcry_cipher_setkey failed: %s\n",
@@ -984,7 +989,7 @@ cipher_bench ( const char *algoname )
         {
           if (cipher_with_keysetup)
             {
-              err = gcry_cipher_setkey (hd, key, keylen);
+              err = gcry_cipher_setkey (hd, key, modekeylen);
               if (err)
                 {
                   fprintf (stderr, "gcry_cipher_setkey failed: %s\n",

From jussi.kivilinna at iki.fi  Wed Jan  4 16:15:17 2017
From: jussi.kivilinna at iki.fi (Jussi Kivilinna)
Date: Wed, 04 Jan 2017 17:15:17 +0200
Subject: [PATCH] Add AVX2/vpgather bulk implementation of Twofish
Message-ID: <148354291755.7623.13548603576778423286.stgit@localhost6.localdomain6>

* cipher/Makefile.am: Add 'twofish-avx2-amd64.S'.
* cipher/twofish-avx2-amd64.S: New.
* cipher/twofish.c (USE_AVX2): New.
(TWOFISH_context) [USE_AVX2]: Add 'use_avx2' member.
(ASM_FUNC_ABI): New.
(twofish_setkey): Add check for AVX2 and fast VPGATHER HW features.
(_gcry_twofish_avx2_ctr_enc, _gcry_twofish_avx2_cbc_dec)
(_gcry_twofish_avx2_cfb_dec, _gcry_twofish_avx2_ocb_enc)
(_gcry_twofish_avx2_ocb_dec, _gcry_twofish_avx2_ocb_auth): New.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec, _gcry_twofish_cfb_dec)
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Add AVX2 bulk
handling.
(selftest_ctr, selftest_cbc, selftest_cfb): Increase nblocks from
3+X to 16+X.
* configure.ac: Add 'twofish-avx2-amd64.lo'.
* src/g10lib.h (HWF_INTEL_FAST_VPGATHER): New.
* src/hwf-x86.c (detect_x86_gnuc): Add detection for
HWF_INTEL_FAST_VPGATHER.
* src/hwfeatures.c (HWF_INTEL_FAST_VPGATHER): Add "intel-fast-vpgather"
for HWF_INTEL_FAST_VPGATHER.
--

Benchmark on Intel Core i3-6100 (3.7 GHz):

Before:
 TWOFISH        |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      4.25 ns/B     224.5 MiB/s     15.71 c/B
        ECB dec |      4.16 ns/B     229.5 MiB/s     15.38 c/B
        CBC enc |      4.53 ns/B     210.4 MiB/s     16.77 c/B
        CBC dec |      2.71 ns/B     351.6 MiB/s     10.04 c/B
        CFB enc |      4.60 ns/B     207.3 MiB/s     17.02 c/B
        CFB dec |      2.70 ns/B     353.5 MiB/s      9.98 c/B
        OFB enc |      4.25 ns/B     224.2 MiB/s     15.74 c/B
        OFB dec |      4.24 ns/B     225.0 MiB/s     15.68 c/B
        CTR enc |      2.72 ns/B     350.6 MiB/s     10.06 c/B
        CTR dec |      2.72 ns/B     350.7 MiB/s     10.06 c/B
        CCM enc |      7.25 ns/B     131.5 MiB/s     26.83 c/B
        CCM dec |      7.25 ns/B     131.5 MiB/s     26.83 c/B
       CCM auth |      4.57 ns/B     208.9 MiB/s     16.89 c/B
        GCM enc |      3.02 ns/B     315.3 MiB/s     11.19 c/B
        GCM dec |      3.02 ns/B     315.6 MiB/s     11.18 c/B
       GCM auth |     0.297 ns/B    3208.4 MiB/s      1.10 c/B
        OCB enc |      2.73 ns/B     349.7 MiB/s     10.09 c/B
        OCB dec |      2.82 ns/B     338.3 MiB/s     10.43 c/B
       OCB auth |      2.77 ns/B     343.7 MiB/s     10.27 c/B

After (CBC-dec & CFB-dec & CTR & OCB, ~1.5x faster):
 TWOFISH        |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      4.25 ns/B     224.2 MiB/s     15.74 c/B
        ECB dec |      4.15 ns/B     229.5 MiB/s     15.37 c/B
        CBC enc |      4.61 ns/B     206.8 MiB/s     17.06 c/B
        CBC dec |      1.75 ns/B     544.0 MiB/s      6.49 c/B
        CFB enc |      4.52 ns/B     211.0 MiB/s     16.72 c/B
        CFB dec |      1.72 ns/B     554.1 MiB/s      6.37 c/B
        OFB enc |      4.27 ns/B     223.3 MiB/s     15.80 c/B
        OFB dec |      4.28 ns/B     222.7 MiB/s     15.84 c/B
        CTR enc |      1.73 ns/B     549.9 MiB/s      6.42 c/B
        CTR dec |      1.75 ns/B     545.1 MiB/s      6.47 c/B
        CCM enc |      6.31 ns/B     151.2 MiB/s     23.34 c/B
        CCM dec |      6.42 ns/B     148.5 MiB/s     23.76 c/B
       CCM auth |      4.56 ns/B     208.9 MiB/s     16.89 c/B
        GCM enc |      1.90 ns/B     502.8 MiB/s      7.02 c/B
        GCM dec |      2.00 ns/B     477.8 MiB/s      7.38 c/B
       GCM auth |     0.300 ns/B    3178.6 MiB/s      1.11 c/B
        OCB enc |      1.76 ns/B     542.2 MiB/s      6.51 c/B
        OCB dec |      1.76 ns/B     540.7 MiB/s      6.53 c/B
       OCB auth |      1.76 ns/B     542.8 MiB/s      6.50 c/B

Signed-off-by: Jussi Kivilinna
---
 0 files changed

diff --git 
a/cipher/Makefile.am b/cipher/Makefile.am index 71a25ed..8c9fc0e 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -100,6 +100,7 @@ stribog.c \ tiger.c \ whirlpool.c whirlpool-sse2-amd64.S \ twofish.c twofish-amd64.S twofish-arm.S twofish-aarch64.S \ + twofish-avx2-amd64.S \ rfc2268.c \ camellia.c camellia.h camellia-glue.c camellia-aesni-avx-amd64.S \ camellia-aesni-avx2-amd64.S camellia-arm.S camellia-aarch64.S diff --git a/cipher/twofish-avx2-amd64.S b/cipher/twofish-avx2-amd64.S new file mode 100644 index 0000000..db6e218 --- /dev/null +++ b/cipher/twofish-avx2-amd64.S @@ -0,0 +1,1012 @@ +/* twofish-avx2-amd64.S - AMD64/AVX2 assembly implementation of Twofish cipher + * + * Copyright (C) 2013-2017 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + */ + +#ifdef __x86_64 +#include +#if (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) && defined(USE_TWOFISH) && \ + defined(ENABLE_AVX2_SUPPORT) + +#ifdef HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS +# define ELF(...) __VA_ARGS__ +#else +# define ELF(...) /*_*/ +#endif + +#ifdef __PIC__ +# define RIP (%rip) +#else +# define RIP +#endif + +.text + +/* structure of TWOFISH_context: */ +#define s0 0 +#define s1 ((s0) + 4 * 256) +#define s2 ((s1) + 4 * 256) +#define s3 ((s2) + 4 * 256) +#define w ((s3) + 4 * 256) +#define k ((w) + 4 * 8) + +/* register macros */ +#define CTX %rdi + +#define RROUND %rbp +#define RROUNDd %ebp +#define RS0 CTX +#define RS1 %r8 +#define RS2 %r9 +#define RS3 %r10 +#define RK %r11 +#define RW %rax + +#define RA0 %ymm8 +#define RB0 %ymm9 +#define RC0 %ymm10 +#define RD0 %ymm11 +#define RA1 %ymm12 +#define RB1 %ymm13 +#define RC1 %ymm14 +#define RD1 %ymm15 + +/* temp regs */ +#define RX0 %ymm0 +#define RY0 %ymm1 +#define RX1 %ymm2 +#define RY1 %ymm3 +#define RT0 %ymm4 +#define RIDX %ymm5 + +#define RX0x %xmm0 +#define RY0x %xmm1 +#define RX1x %xmm2 +#define RY1x %xmm3 +#define RT0x %xmm4 +#define RIDXx %xmm5 + +#define RTMP0 RX0 +#define RTMP0x RX0x +#define RTMP1 RX1 +#define RTMP1x RX1x +#define RTMP2 RY0 +#define RTMP2x RY0x +#define RTMP3 RY1 +#define RTMP3x RY1x +#define RTMP4 RIDX +#define RTMP4x RIDXx + +/* vpgatherdd mask and '-1' */ +#define RNOT %ymm6 +#define RNOTx %xmm6 + +/* byte mask, (-1 >> 24) */ +#define RBYTE %ymm7 + +/********************************************************************** + 16-way AVX2 twofish + **********************************************************************/ +#define init_round_constants() \ + vpcmpeqd RNOT, RNOT, RNOT; \ + leaq k(CTX), RK; \ + leaq w(CTX), RW; \ + vpsrld $24, RNOT, RBYTE; \ + leaq s1(CTX), RS1; \ + leaq s2(CTX), RS2; \ + leaq s3(CTX), RS3; \ + +#define g16(ab, rs0, rs1, rs2, rs3, xy) \ + vpand RBYTE, ab ## 0, RIDX; \ + vpgatherdd RNOT, (rs0, RIDX, 4), xy ## 0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + \ + vpand RBYTE, ab ## 1, RIDX; \ + vpgatherdd RNOT, (rs0, RIDX, 4), 
xy ## 1; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + \ + vpsrld $8, ab ## 0, RIDX; \ + vpand RBYTE, RIDX, RIDX; \ + vpgatherdd RNOT, (rs1, RIDX, 4), RT0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + vpxor RT0, xy ## 0, xy ## 0; \ + \ + vpsrld $8, ab ## 1, RIDX; \ + vpand RBYTE, RIDX, RIDX; \ + vpgatherdd RNOT, (rs1, RIDX, 4), RT0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + vpxor RT0, xy ## 1, xy ## 1; \ + \ + vpsrld $16, ab ## 0, RIDX; \ + vpand RBYTE, RIDX, RIDX; \ + vpgatherdd RNOT, (rs2, RIDX, 4), RT0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + vpxor RT0, xy ## 0, xy ## 0; \ + \ + vpsrld $16, ab ## 1, RIDX; \ + vpand RBYTE, RIDX, RIDX; \ + vpgatherdd RNOT, (rs2, RIDX, 4), RT0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + vpxor RT0, xy ## 1, xy ## 1; \ + \ + vpsrld $24, ab ## 0, RIDX; \ + vpgatherdd RNOT, (rs3, RIDX, 4), RT0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + vpxor RT0, xy ## 0, xy ## 0; \ + \ + vpsrld $24, ab ## 1, RIDX; \ + vpgatherdd RNOT, (rs3, RIDX, 4), RT0; \ + vpcmpeqd RNOT, RNOT, RNOT; \ + vpxor RT0, xy ## 1, xy ## 1; + +#define g1_16(a, x) \ + g16(a, RS0, RS1, RS2, RS3, x); + +#define g2_16(b, y) \ + g16(b, RS1, RS2, RS3, RS0, y); + +#define encrypt_round_end16(a, b, c, d, nk, r) \ + vpaddd RY0, RX0, RX0; \ + vpaddd RX0, RY0, RY0; \ + vpbroadcastd ((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RX0, RX0; \ + vpbroadcastd 4+((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RY0, RY0; \ + \ + vpxor RY0, d ## 0, d ## 0; \ + \ + vpxor RX0, c ## 0, c ## 0; \ + vpsrld $1, c ## 0, RT0; \ + vpslld $31, c ## 0, c ## 0; \ + vpor RT0, c ## 0, c ## 0; \ + \ + vpaddd RY1, RX1, RX1; \ + vpaddd RX1, RY1, RY1; \ + vpbroadcastd ((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RX1, RX1; \ + vpbroadcastd 4+((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RY1, RY1; \ + \ + vpxor RY1, d ## 1, d ## 1; \ + \ + vpxor RX1, c ## 1, c ## 1; \ + vpsrld $1, c ## 1, RT0; \ + vpslld $31, c ## 1, c ## 1; \ + vpor RT0, c ## 1, c ## 1; \ + +#define encrypt_round16(a, b, c, d, nk, r) \ + g2_16(b, RY); \ + \ + vpslld $1, b ## 0, RT0; \ + vpsrld $31, b ## 0, b ## 0; \ + vpor RT0, b ## 0, b ## 0; \ + \ + vpslld $1, b ## 1, RT0; \ + vpsrld $31, b ## 1, b ## 1; \ + vpor RT0, b ## 1, b ## 1; \ + \ + g1_16(a, RX); \ + \ + encrypt_round_end16(a, b, c, d, nk, r); + +#define encrypt_round_first16(a, b, c, d, nk, r) \ + vpslld $1, d ## 0, RT0; \ + vpsrld $31, d ## 0, d ## 0; \ + vpor RT0, d ## 0, d ## 0; \ + \ + vpslld $1, d ## 1, RT0; \ + vpsrld $31, d ## 1, d ## 1; \ + vpor RT0, d ## 1, d ## 1; \ + \ + encrypt_round16(a, b, c, d, nk, r); + +#define encrypt_round_last16(a, b, c, d, nk, r) \ + g2_16(b, RY); \ + \ + g1_16(a, RX); \ + \ + encrypt_round_end16(a, b, c, d, nk, r); + +#define decrypt_round_end16(a, b, c, d, nk, r) \ + vpaddd RY0, RX0, RX0; \ + vpaddd RX0, RY0, RY0; \ + vpbroadcastd ((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RX0, RX0; \ + vpbroadcastd 4+((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RY0, RY0; \ + \ + vpxor RX0, c ## 0, c ## 0; \ + \ + vpxor RY0, d ## 0, d ## 0; \ + vpsrld $1, d ## 0, RT0; \ + vpslld $31, d ## 0, d ## 0; \ + vpor RT0, d ## 0, d ## 0; \ + \ + vpaddd RY1, RX1, RX1; \ + vpaddd RX1, RY1, RY1; \ + vpbroadcastd ((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RX1, RX1; \ + vpbroadcastd 4+((nk)+((r)*8))(RK), RT0; \ + vpaddd RT0, RY1, RY1; \ + \ + vpxor RX1, c ## 1, c ## 1; \ + \ + vpxor RY1, d ## 1, d ## 1; \ + vpsrld $1, d ## 1, RT0; \ + vpslld $31, d ## 1, d ## 1; \ + vpor RT0, d ## 1, d ## 1; + +#define decrypt_round16(a, b, c, d, nk, r) \ + g1_16(a, RX); \ + \ + vpslld $1, a ## 0, RT0; \ + vpsrld $31, a ## 0, a ## 0; \ + vpor RT0, a ## 0, a ## 0; \ + \ + vpslld $1, a ## 1, 
RT0; \ + vpsrld $31, a ## 1, a ## 1; \ + vpor RT0, a ## 1, a ## 1; \ + \ + g2_16(b, RY); \ + \ + decrypt_round_end16(a, b, c, d, nk, r); + +#define decrypt_round_first16(a, b, c, d, nk, r) \ + vpslld $1, c ## 0, RT0; \ + vpsrld $31, c ## 0, c ## 0; \ + vpor RT0, c ## 0, c ## 0; \ + \ + vpslld $1, c ## 1, RT0; \ + vpsrld $31, c ## 1, c ## 1; \ + vpor RT0, c ## 1, c ## 1; \ + \ + decrypt_round16(a, b, c, d, nk, r) + +#define decrypt_round_last16(a, b, c, d, nk, r) \ + g1_16(a, RX); \ + \ + g2_16(b, RY); \ + \ + decrypt_round_end16(a, b, c, d, nk, r); + +#define encrypt_cycle16(r) \ + encrypt_round16(RA, RB, RC, RD, 0, r); \ + encrypt_round16(RC, RD, RA, RB, 8, r); + +#define encrypt_cycle_first16(r) \ + encrypt_round_first16(RA, RB, RC, RD, 0, r); \ + encrypt_round16(RC, RD, RA, RB, 8, r); + +#define encrypt_cycle_last16(r) \ + encrypt_round16(RA, RB, RC, RD, 0, r); \ + encrypt_round_last16(RC, RD, RA, RB, 8, r); + +#define decrypt_cycle16(r) \ + decrypt_round16(RC, RD, RA, RB, 8, r); \ + decrypt_round16(RA, RB, RC, RD, 0, r); + +#define decrypt_cycle_first16(r) \ + decrypt_round_first16(RC, RD, RA, RB, 8, r); \ + decrypt_round16(RA, RB, RC, RD, 0, r); + +#define decrypt_cycle_last16(r) \ + decrypt_round16(RC, RD, RA, RB, 8, r); \ + decrypt_round_last16(RA, RB, RC, RD, 0, r); + +#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ + vpunpckhdq x1, x0, t2; \ + vpunpckldq x1, x0, x0; \ + \ + vpunpckldq x3, x2, t1; \ + vpunpckhdq x3, x2, x2; \ + \ + vpunpckhqdq t1, x0, x1; \ + vpunpcklqdq t1, x0, x0; \ + \ + vpunpckhqdq x2, t2, x3; \ + vpunpcklqdq x2, t2, x2; + +#define read_blocks8(offs,a,b,c,d) \ + vmovdqu 16*offs(RIO), a; \ + vmovdqu 16*offs+32(RIO), b; \ + vmovdqu 16*offs+64(RIO), c; \ + vmovdqu 16*offs+96(RIO), d; \ + \ + transpose_4x4(a, b, c, d, RX0, RY0); + +#define write_blocks8(offs,a,b,c,d) \ + transpose_4x4(a, b, c, d, RX0, RY0); \ + \ + vmovdqu a, 16*offs(RIO); \ + vmovdqu b, 16*offs+32(RIO); \ + vmovdqu c, 16*offs+64(RIO); \ + vmovdqu d, 16*offs+96(RIO); + +#define inpack_enc8(a,b,c,d) \ + vpbroadcastd 4*0(RW), RT0; \ + vpxor RT0, a, a; \ + \ + vpbroadcastd 4*1(RW), RT0; \ + vpxor RT0, b, b; \ + \ + vpbroadcastd 4*2(RW), RT0; \ + vpxor RT0, c, c; \ + \ + vpbroadcastd 4*3(RW), RT0; \ + vpxor RT0, d, d; + +#define outunpack_enc8(a,b,c,d) \ + vpbroadcastd 4*4(RW), RX0; \ + vpbroadcastd 4*5(RW), RY0; \ + vpxor RX0, c, RX0; \ + vpxor RY0, d, RY0; \ + \ + vpbroadcastd 4*6(RW), RT0; \ + vpxor RT0, a, c; \ + vpbroadcastd 4*7(RW), RT0; \ + vpxor RT0, b, d; \ + \ + vmovdqa RX0, a; \ + vmovdqa RY0, b; + +#define inpack_dec8(a,b,c,d) \ + vpbroadcastd 4*4(RW), RX0; \ + vpbroadcastd 4*5(RW), RY0; \ + vpxor RX0, a, RX0; \ + vpxor RY0, b, RY0; \ + \ + vpbroadcastd 4*6(RW), RT0; \ + vpxor RT0, c, a; \ + vpbroadcastd 4*7(RW), RT0; \ + vpxor RT0, d, b; \ + \ + vmovdqa RX0, c; \ + vmovdqa RY0, d; + +#define outunpack_dec8(a,b,c,d) \ + vpbroadcastd 4*0(RW), RT0; \ + vpxor RT0, a, a; \ + \ + vpbroadcastd 4*1(RW), RT0; \ + vpxor RT0, b, b; \ + \ + vpbroadcastd 4*2(RW), RT0; \ + vpxor RT0, c, c; \ + \ + vpbroadcastd 4*3(RW), RT0; \ + vpxor RT0, d, d; + +#define transpose4x4_16(a,b,c,d) \ + transpose_4x4(a ## 0, b ## 0, c ## 0, d ## 0, RX0, RY0); \ + transpose_4x4(a ## 1, b ## 1, c ## 1, d ## 1, RX0, RY0); + +#define inpack_enc16(a,b,c,d) \ + inpack_enc8(a ## 0, b ## 0, c ## 0, d ## 0); \ + inpack_enc8(a ## 1, b ## 1, c ## 1, d ## 1); + +#define outunpack_enc16(a,b,c,d) \ + outunpack_enc8(a ## 0, b ## 0, c ## 0, d ## 0); \ + outunpack_enc8(a ## 1, b ## 1, c ## 1, d ## 1); + +#define inpack_dec16(a,b,c,d) \ + 
inpack_dec8(a ## 0, b ## 0, c ## 0, d ## 0); \ + inpack_dec8(a ## 1, b ## 1, c ## 1, d ## 1); + +#define outunpack_dec16(a,b,c,d) \ + outunpack_dec8(a ## 0, b ## 0, c ## 0, d ## 0); \ + outunpack_dec8(a ## 1, b ## 1, c ## 1, d ## 1); + +.align 8 +ELF(.type __twofish_enc_blk16, at function;) +__twofish_enc_blk16: + /* input: + * %rdi: ctx, CTX + * RA0, RB0, RC0, RD0, RA1, RB1, RC1, RD1: sixteen parallel + * plaintext blocks + * output: + * RA0, RB0, RC0, RD0, RA1, RB1, RC1, RD1: sixteen parallel + * ciphertext blocks + */ + init_round_constants(); + + transpose4x4_16(RA, RB, RC, RD); + inpack_enc16(RA, RB, RC, RD); + + encrypt_cycle_first16(0); + encrypt_cycle16(2); + encrypt_cycle16(4); + encrypt_cycle16(6); + encrypt_cycle16(8); + encrypt_cycle16(10); + encrypt_cycle16(12); + encrypt_cycle_last16(14); + + outunpack_enc16(RA, RB, RC, RD); + transpose4x4_16(RA, RB, RC, RD); + + ret; +ELF(.size __twofish_enc_blk16,.-__twofish_enc_blk16;) + +.align 8 +ELF(.type __twofish_dec_blk16, at function;) +__twofish_dec_blk16: + /* input: + * %rdi: ctx, CTX + * RA0, RB0, RC0, RD0, RA1, RB1, RC1, RD1: sixteen parallel + * plaintext blocks + * output: + * RA0, RB0, RC0, RD0, RA1, RB1, RC1, RD1: sixteen parallel + * ciphertext blocks + */ + init_round_constants(); + + transpose4x4_16(RA, RB, RC, RD); + inpack_dec16(RA, RB, RC, RD); + + decrypt_cycle_first16(14); + decrypt_cycle16(12); + decrypt_cycle16(10); + decrypt_cycle16(8); + decrypt_cycle16(6); + decrypt_cycle16(4); + decrypt_cycle16(2); + decrypt_cycle_last16(0); + + outunpack_dec16(RA, RB, RC, RD); + transpose4x4_16(RA, RB, RC, RD); + + ret; +ELF(.size __twofish_dec_blk16,.-__twofish_dec_blk16;) + +#define inc_le128(x, minus_one, tmp) \ + vpcmpeqq minus_one, x, tmp; \ + vpsubq minus_one, x, x; \ + vpslldq $8, tmp, tmp; \ + vpsubq tmp, x, x; + +.align 8 +.globl _gcry_twofish_avx2_ctr_enc +ELF(.type _gcry_twofish_avx2_ctr_enc, at function;) +_gcry_twofish_avx2_ctr_enc: + /* input: + * %rdi: ctx, CTX + * %rsi: dst (16 blocks) + * %rdx: src (16 blocks) + * %rcx: iv (big endian, 128bit) + */ + + movq 8(%rcx), %rax; + bswapq %rax; + + vzeroupper; + + vbroadcasti128 .Lbswap128_mask RIP, RTMP3; + vpcmpeqd RNOT, RNOT, RNOT; + vpsrldq $8, RNOT, RNOT; /* ab: -1:0 ; cd: -1:0 */ + vpaddq RNOT, RNOT, RTMP2; /* ab: -2:0 ; cd: -2:0 */ + + /* load IV and byteswap */ + vmovdqu (%rcx), RTMP4x; + vpshufb RTMP3x, RTMP4x, RTMP4x; + vmovdqa RTMP4x, RTMP0x; + inc_le128(RTMP4x, RNOTx, RTMP1x); + vinserti128 $1, RTMP4x, RTMP0, RTMP0; + vpshufb RTMP3, RTMP0, RA0; /* +1 ; +0 */ + + /* check need for handling 64-bit overflow and carry */ + cmpq $(0xffffffffffffffff - 16), %rax; + ja .Lhandle_ctr_carry; + + /* construct IVs */ + vpsubq RTMP2, RTMP0, RTMP0; /* +3 ; +2 */ + vpshufb RTMP3, RTMP0, RB0; + vpsubq RTMP2, RTMP0, RTMP0; /* +5 ; +4 */ + vpshufb RTMP3, RTMP0, RC0; + vpsubq RTMP2, RTMP0, RTMP0; /* +7 ; +6 */ + vpshufb RTMP3, RTMP0, RD0; + vpsubq RTMP2, RTMP0, RTMP0; /* +9 ; +8 */ + vpshufb RTMP3, RTMP0, RA1; + vpsubq RTMP2, RTMP0, RTMP0; /* +11 ; +10 */ + vpshufb RTMP3, RTMP0, RB1; + vpsubq RTMP2, RTMP0, RTMP0; /* +13 ; +12 */ + vpshufb RTMP3, RTMP0, RC1; + vpsubq RTMP2, RTMP0, RTMP0; /* +15 ; +14 */ + vpshufb RTMP3, RTMP0, RD1; + vpsubq RTMP2, RTMP0, RTMP0; /* +16 */ + vpshufb RTMP3x, RTMP0x, RTMP0x; + + jmp .Lctr_carry_done; + +.Lhandle_ctr_carry: + /* construct IVs */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RB0; /* +3 ; +2 */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RC0; 
/* +5 ; +4 */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RD0; /* +7 ; +6 */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RA1; /* +9 ; +8 */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RB1; /* +11 ; +10 */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RC1; /* +13 ; +12 */ + inc_le128(RTMP0, RNOT, RTMP1); + inc_le128(RTMP0, RNOT, RTMP1); + vpshufb RTMP3, RTMP0, RD1; /* +15 ; +14 */ + inc_le128(RTMP0, RNOT, RTMP1); + vextracti128 $1, RTMP0, RTMP0x; + vpshufb RTMP3x, RTMP0x, RTMP0x; /* +16 */ + +.align 4 +.Lctr_carry_done: + /* store new IV */ + vmovdqu RTMP0x, (%rcx); + + call __twofish_enc_blk16; + + vpxor (0 * 32)(%rdx), RA0, RA0; + vpxor (1 * 32)(%rdx), RB0, RB0; + vpxor (2 * 32)(%rdx), RC0, RC0; + vpxor (3 * 32)(%rdx), RD0, RD0; + vpxor (4 * 32)(%rdx), RA1, RA1; + vpxor (5 * 32)(%rdx), RB1, RB1; + vpxor (6 * 32)(%rdx), RC1, RC1; + vpxor (7 * 32)(%rdx), RD1, RD1; + + vmovdqu RA0, (0 * 32)(%rsi); + vmovdqu RB0, (1 * 32)(%rsi); + vmovdqu RC0, (2 * 32)(%rsi); + vmovdqu RD0, (3 * 32)(%rsi); + vmovdqu RA1, (4 * 32)(%rsi); + vmovdqu RB1, (5 * 32)(%rsi); + vmovdqu RC1, (6 * 32)(%rsi); + vmovdqu RD1, (7 * 32)(%rsi); + + vzeroall; + + ret +ELF(.size _gcry_twofish_avx2_ctr_enc,.-_gcry_twofish_avx2_ctr_enc;) + +.align 8 +.globl _gcry_twofish_avx2_cbc_dec +ELF(.type _gcry_twofish_avx2_cbc_dec, at function;) +_gcry_twofish_avx2_cbc_dec: + /* input: + * %rdi: ctx, CTX + * %rsi: dst (16 blocks) + * %rdx: src (16 blocks) + * %rcx: iv + */ + + vzeroupper; + + vmovdqu (0 * 32)(%rdx), RA0; + vmovdqu (1 * 32)(%rdx), RB0; + vmovdqu (2 * 32)(%rdx), RC0; + vmovdqu (3 * 32)(%rdx), RD0; + vmovdqu (4 * 32)(%rdx), RA1; + vmovdqu (5 * 32)(%rdx), RB1; + vmovdqu (6 * 32)(%rdx), RC1; + vmovdqu (7 * 32)(%rdx), RD1; + + call __twofish_dec_blk16; + + vmovdqu (%rcx), RNOTx; + vinserti128 $1, (%rdx), RNOT, RNOT; + vpxor RNOT, RA0, RA0; + vpxor (0 * 32 + 16)(%rdx), RB0, RB0; + vpxor (1 * 32 + 16)(%rdx), RC0, RC0; + vpxor (2 * 32 + 16)(%rdx), RD0, RD0; + vpxor (3 * 32 + 16)(%rdx), RA1, RA1; + vpxor (4 * 32 + 16)(%rdx), RB1, RB1; + vpxor (5 * 32 + 16)(%rdx), RC1, RC1; + vpxor (6 * 32 + 16)(%rdx), RD1, RD1; + vmovdqu (7 * 32 + 16)(%rdx), RNOTx; + vmovdqu RNOTx, (%rcx); /* store new IV */ + + vmovdqu RA0, (0 * 32)(%rsi); + vmovdqu RB0, (1 * 32)(%rsi); + vmovdqu RC0, (2 * 32)(%rsi); + vmovdqu RD0, (3 * 32)(%rsi); + vmovdqu RA1, (4 * 32)(%rsi); + vmovdqu RB1, (5 * 32)(%rsi); + vmovdqu RC1, (6 * 32)(%rsi); + vmovdqu RD1, (7 * 32)(%rsi); + + vzeroall; + + ret +ELF(.size _gcry_twofish_avx2_cbc_dec,.-_gcry_twofish_avx2_cbc_dec;) + +.align 8 +.globl _gcry_twofish_avx2_cfb_dec +ELF(.type _gcry_twofish_avx2_cfb_dec, at function;) +_gcry_twofish_avx2_cfb_dec: + /* input: + * %rdi: ctx, CTX + * %rsi: dst (16 blocks) + * %rdx: src (16 blocks) + * %rcx: iv + */ + + vzeroupper; + + /* Load input */ + vmovdqu (%rcx), RNOTx; + vinserti128 $1, (%rdx), RNOT, RA0; + vmovdqu (0 * 32 + 16)(%rdx), RB0; + vmovdqu (1 * 32 + 16)(%rdx), RC0; + vmovdqu (2 * 32 + 16)(%rdx), RD0; + vmovdqu (3 * 32 + 16)(%rdx), RA1; + vmovdqu (4 * 32 + 16)(%rdx), RB1; + vmovdqu (5 * 32 + 16)(%rdx), RC1; + vmovdqu (6 * 32 + 16)(%rdx), RD1; + + /* Update IV */ + vmovdqu (7 * 32 + 16)(%rdx), RNOTx; + vmovdqu RNOTx, (%rcx); + + call __twofish_enc_blk16; + + vpxor (0 * 32)(%rdx), RA0, RA0; + vpxor (1 * 32)(%rdx), RB0, RB0; + vpxor (2 * 32)(%rdx), RC0, RC0; + vpxor (3 * 32)(%rdx), RD0, RD0; + vpxor (4 * 
32)(%rdx), RA1, RA1; + vpxor (5 * 32)(%rdx), RB1, RB1; + vpxor (6 * 32)(%rdx), RC1, RC1; + vpxor (7 * 32)(%rdx), RD1, RD1; + + vmovdqu RA0, (0 * 32)(%rsi); + vmovdqu RB0, (1 * 32)(%rsi); + vmovdqu RC0, (2 * 32)(%rsi); + vmovdqu RD0, (3 * 32)(%rsi); + vmovdqu RA1, (4 * 32)(%rsi); + vmovdqu RB1, (5 * 32)(%rsi); + vmovdqu RC1, (6 * 32)(%rsi); + vmovdqu RD1, (7 * 32)(%rsi); + + vzeroall; + + ret +ELF(.size _gcry_twofish_avx2_cfb_dec,.-_gcry_twofish_avx2_cfb_dec;) + +.align 8 +.globl _gcry_twofish_avx2_ocb_enc +ELF(.type _gcry_twofish_avx2_ocb_enc, at function;) + +_gcry_twofish_avx2_ocb_enc: + /* input: + * %rdi: ctx, CTX + * %rsi: dst (16 blocks) + * %rdx: src (16 blocks) + * %rcx: offset + * %r8 : checksum + * %r9 : L pointers (void *L[16]) + */ + + vzeroupper; + + subq $(4 * 8), %rsp; + + movq %r10, (0 * 8)(%rsp); + movq %r11, (1 * 8)(%rsp); + movq %r12, (2 * 8)(%rsp); + movq %r13, (3 * 8)(%rsp); + + vmovdqu (%rcx), RTMP0x; + vmovdqu (%r8), RTMP1x; + + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ + /* Checksum_i = Checksum_{i-1} xor P_i */ + /* C_i = Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ + +#define OCB_INPUT(n, l0reg, l1reg, yreg) \ + vmovdqu (n * 32)(%rdx), yreg; \ + vpxor (l0reg), RTMP0x, RNOTx; \ + vpxor (l1reg), RNOTx, RTMP0x; \ + vinserti128 $1, RTMP0x, RNOT, RNOT; \ + vpxor yreg, RTMP1, RTMP1; \ + vpxor yreg, RNOT, yreg; \ + vmovdqu RNOT, (n * 32)(%rsi); + + movq (0 * 8)(%r9), %r10; + movq (1 * 8)(%r9), %r11; + movq (2 * 8)(%r9), %r12; + movq (3 * 8)(%r9), %r13; + OCB_INPUT(0, %r10, %r11, RA0); + OCB_INPUT(1, %r12, %r13, RB0); + movq (4 * 8)(%r9), %r10; + movq (5 * 8)(%r9), %r11; + movq (6 * 8)(%r9), %r12; + movq (7 * 8)(%r9), %r13; + OCB_INPUT(2, %r10, %r11, RC0); + OCB_INPUT(3, %r12, %r13, RD0); + movq (8 * 8)(%r9), %r10; + movq (9 * 8)(%r9), %r11; + movq (10 * 8)(%r9), %r12; + movq (11 * 8)(%r9), %r13; + OCB_INPUT(4, %r10, %r11, RA1); + OCB_INPUT(5, %r12, %r13, RB1); + movq (12 * 8)(%r9), %r10; + movq (13 * 8)(%r9), %r11; + movq (14 * 8)(%r9), %r12; + movq (15 * 8)(%r9), %r13; + OCB_INPUT(6, %r10, %r11, RC1); + OCB_INPUT(7, %r12, %r13, RD1); +#undef OCB_INPUT + + vextracti128 $1, RTMP1, RNOTx; + vmovdqu RTMP0x, (%rcx); + vpxor RNOTx, RTMP1x, RTMP1x; + vmovdqu RTMP1x, (%r8); + + movq (0 * 8)(%rsp), %r10; + movq (1 * 8)(%rsp), %r11; + movq (2 * 8)(%rsp), %r12; + movq (3 * 8)(%rsp), %r13; + + call __twofish_enc_blk16; + + addq $(4 * 8), %rsp; + + vpxor (0 * 32)(%rsi), RA0, RA0; + vpxor (1 * 32)(%rsi), RB0, RB0; + vpxor (2 * 32)(%rsi), RC0, RC0; + vpxor (3 * 32)(%rsi), RD0, RD0; + vpxor (4 * 32)(%rsi), RA1, RA1; + vpxor (5 * 32)(%rsi), RB1, RB1; + vpxor (6 * 32)(%rsi), RC1, RC1; + vpxor (7 * 32)(%rsi), RD1, RD1; + + vmovdqu RA0, (0 * 32)(%rsi); + vmovdqu RB0, (1 * 32)(%rsi); + vmovdqu RC0, (2 * 32)(%rsi); + vmovdqu RD0, (3 * 32)(%rsi); + vmovdqu RA1, (4 * 32)(%rsi); + vmovdqu RB1, (5 * 32)(%rsi); + vmovdqu RC1, (6 * 32)(%rsi); + vmovdqu RD1, (7 * 32)(%rsi); + + vzeroall; + + ret; +ELF(.size _gcry_twofish_avx2_ocb_enc,.-_gcry_twofish_avx2_ocb_enc;) + +.align 8 +.globl _gcry_twofish_avx2_ocb_dec +ELF(.type _gcry_twofish_avx2_ocb_dec, at function;) + +_gcry_twofish_avx2_ocb_dec: + /* input: + * %rdi: ctx, CTX + * %rsi: dst (16 blocks) + * %rdx: src (16 blocks) + * %rcx: offset + * %r8 : checksum + * %r9 : L pointers (void *L[16]) + */ + + vzeroupper; + + subq $(4 * 8), %rsp; + + movq %r10, (0 * 8)(%rsp); + movq %r11, (1 * 8)(%rsp); + movq %r12, (2 * 8)(%rsp); + movq %r13, (3 * 8)(%rsp); + + vmovdqu (%rcx), RTMP0x; + + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ + /* C_i = 
Offset_i xor ENCIPHER(K, P_i xor Offset_i) */ + +#define OCB_INPUT(n, l0reg, l1reg, yreg) \ + vmovdqu (n * 32)(%rdx), yreg; \ + vpxor (l0reg), RTMP0x, RNOTx; \ + vpxor (l1reg), RNOTx, RTMP0x; \ + vinserti128 $1, RTMP0x, RNOT, RNOT; \ + vpxor yreg, RNOT, yreg; \ + vmovdqu RNOT, (n * 32)(%rsi); + + movq (0 * 8)(%r9), %r10; + movq (1 * 8)(%r9), %r11; + movq (2 * 8)(%r9), %r12; + movq (3 * 8)(%r9), %r13; + OCB_INPUT(0, %r10, %r11, RA0); + OCB_INPUT(1, %r12, %r13, RB0); + movq (4 * 8)(%r9), %r10; + movq (5 * 8)(%r9), %r11; + movq (6 * 8)(%r9), %r12; + movq (7 * 8)(%r9), %r13; + OCB_INPUT(2, %r10, %r11, RC0); + OCB_INPUT(3, %r12, %r13, RD0); + movq (8 * 8)(%r9), %r10; + movq (9 * 8)(%r9), %r11; + movq (10 * 8)(%r9), %r12; + movq (11 * 8)(%r9), %r13; + OCB_INPUT(4, %r10, %r11, RA1); + OCB_INPUT(5, %r12, %r13, RB1); + movq (12 * 8)(%r9), %r10; + movq (13 * 8)(%r9), %r11; + movq (14 * 8)(%r9), %r12; + movq (15 * 8)(%r9), %r13; + OCB_INPUT(6, %r10, %r11, RC1); + OCB_INPUT(7, %r12, %r13, RD1); +#undef OCB_INPUT + + vmovdqu RTMP0x, (%rcx); + mov %r8, %rcx + + movq (0 * 8)(%rsp), %r10; + movq (1 * 8)(%rsp), %r11; + movq (2 * 8)(%rsp), %r12; + movq (3 * 8)(%rsp), %r13; + + call __twofish_dec_blk16; + + vmovdqu (%rcx), RTMP1x; + + vpxor (0 * 32)(%rsi), RA0, RA0; + vpxor (1 * 32)(%rsi), RB0, RB0; + vpxor (2 * 32)(%rsi), RC0, RC0; + vpxor (3 * 32)(%rsi), RD0, RD0; + vpxor (4 * 32)(%rsi), RA1, RA1; + vpxor (5 * 32)(%rsi), RB1, RB1; + vpxor (6 * 32)(%rsi), RC1, RC1; + vpxor (7 * 32)(%rsi), RD1, RD1; + + addq $(4 * 8), %rsp; + + /* Checksum_i = Checksum_{i-1} xor P_i */ + + vmovdqu RA0, (0 * 32)(%rsi); + vpxor RA0, RTMP1, RTMP1; + vmovdqu RB0, (1 * 32)(%rsi); + vpxor RB0, RTMP1, RTMP1; + vmovdqu RC0, (2 * 32)(%rsi); + vpxor RC0, RTMP1, RTMP1; + vmovdqu RD0, (3 * 32)(%rsi); + vpxor RD0, RTMP1, RTMP1; + vmovdqu RA1, (4 * 32)(%rsi); + vpxor RA1, RTMP1, RTMP1; + vmovdqu RB1, (5 * 32)(%rsi); + vpxor RB1, RTMP1, RTMP1; + vmovdqu RC1, (6 * 32)(%rsi); + vpxor RC1, RTMP1, RTMP1; + vmovdqu RD1, (7 * 32)(%rsi); + vpxor RD1, RTMP1, RTMP1; + + vextracti128 $1, RTMP1, RNOTx; + vpxor RNOTx, RTMP1x, RTMP1x; + vmovdqu RTMP1x, (%rcx); + + vzeroall; + + ret; +ELF(.size _gcry_twofish_avx2_ocb_dec,.-_gcry_twofish_avx2_ocb_dec;) + +.align 8 +.globl _gcry_twofish_avx2_ocb_auth +ELF(.type _gcry_twofish_avx2_ocb_auth, at function;) + +_gcry_twofish_avx2_ocb_auth: + /* input: + * %rdi: ctx, CTX + * %rsi: abuf (16 blocks) + * %rdx: offset + * %rcx: checksum + * %r8 : L pointers (void *L[16]) + */ + + vzeroupper; + + subq $(4 * 8), %rsp; + + movq %r10, (0 * 8)(%rsp); + movq %r11, (1 * 8)(%rsp); + movq %r12, (2 * 8)(%rsp); + movq %r13, (3 * 8)(%rsp); + + vmovdqu (%rdx), RTMP0x; + + /* Offset_i = Offset_{i-1} xor L_{ntz(i)} */ + /* Sum_i = Sum_{i-1} xor ENCIPHER(K, A_i xor Offset_i) */ + +#define OCB_INPUT(n, l0reg, l1reg, yreg) \ + vmovdqu (n * 32)(%rsi), yreg; \ + vpxor (l0reg), RTMP0x, RNOTx; \ + vpxor (l1reg), RNOTx, RTMP0x; \ + vinserti128 $1, RTMP0x, RNOT, RNOT; \ + vpxor yreg, RNOT, yreg; + + movq (0 * 8)(%r8), %r10; + movq (1 * 8)(%r8), %r11; + movq (2 * 8)(%r8), %r12; + movq (3 * 8)(%r8), %r13; + OCB_INPUT(0, %r10, %r11, RA0); + OCB_INPUT(1, %r12, %r13, RB0); + movq (4 * 8)(%r8), %r10; + movq (5 * 8)(%r8), %r11; + movq (6 * 8)(%r8), %r12; + movq (7 * 8)(%r8), %r13; + OCB_INPUT(2, %r10, %r11, RC0); + OCB_INPUT(3, %r12, %r13, RD0); + movq (8 * 8)(%r8), %r10; + movq (9 * 8)(%r8), %r11; + movq (10 * 8)(%r8), %r12; + movq (11 * 8)(%r8), %r13; + OCB_INPUT(4, %r10, %r11, RA1); + OCB_INPUT(5, %r12, %r13, RB1); + movq (12 * 8)(%r8), %r10; 
+ movq (13 * 8)(%r8), %r11; + movq (14 * 8)(%r8), %r12; + movq (15 * 8)(%r8), %r13; + OCB_INPUT(6, %r10, %r11, RC1); + OCB_INPUT(7, %r12, %r13, RD1); +#undef OCB_INPUT + + vmovdqu RTMP0x, (%rdx); + + movq (0 * 8)(%rsp), %r10; + movq (1 * 8)(%rsp), %r11; + movq (2 * 8)(%rsp), %r12; + movq (3 * 8)(%rsp), %r13; + + call __twofish_enc_blk16; + + vpxor RA0, RB0, RA0; + vpxor RC0, RD0, RC0; + vpxor RA1, RB1, RA1; + vpxor RC1, RD1, RC1; + + vpxor RA0, RC0, RA0; + vpxor RA1, RC1, RA1; + + addq $(4 * 8), %rsp; + + vpxor RA1, RA0, RTMP1; + + vextracti128 $1, RTMP1, RNOTx; + vpxor (%rcx), RTMP1x, RTMP1x; + vpxor RNOTx, RTMP1x, RTMP1x; + vmovdqu RTMP1x, (%rcx); + + vzeroall; + + ret; +ELF(.size _gcry_twofish_avx2_ocb_auth,.-_gcry_twofish_avx2_ocb_auth;) + +.align 16 + +/* For CTR-mode IV byteswap */ + _gcry_twofish_bswap128_mask: +.Lbswap128_mask: + .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 +ELF(.size _gcry_twofish_bswap128_mask,.-_gcry_twofish_bswap128_mask;) + +#endif /*defined(USE_TWOFISH) && defined(ENABLE_AVX2_SUPPORT)*/ +#endif /*__x86_64*/ diff --git a/cipher/twofish.c b/cipher/twofish.c index 55f6fb9..942e8d4 100644 --- a/cipher/twofish.c +++ b/cipher/twofish.c @@ -72,6 +72,15 @@ # endif # endif +/* USE_AVX2 indicates whether to compile with AMD64 AVX2 code. */ +#undef USE_AVX2 +#if defined(__x86_64__) && (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) +# if defined(ENABLE_AVX2_SUPPORT) +# define USE_AVX2 1 +# endif +#endif + /* Prototype for the self-test function. */ static const char *selftest(void); @@ -82,8 +91,25 @@ static const char *selftest(void); * that k[i] corresponds to what the Twofish paper calls K[i+8]. */ typedef struct { u32 s[4][256], w[8], k[32]; + +#ifdef USE_AVX2 + int use_avx2; +#endif } TWOFISH_context; + +/* Assembly implementations use SystemV ABI, ABI conversion and additional + * stack to store XMM6-XMM15 needed on Win64. */ +#undef ASM_FUNC_ABI +#if defined(USE_AVX2) +# ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ASM_FUNC_ABI __attribute__((sysv_abi)) +# else +# define ASM_FUNC_ABI +# endif +#endif + + /* These two tables are the q0 and q1 permutations, exactly as described in * the Twofish paper. */ @@ -711,12 +737,66 @@ static gcry_err_code_t twofish_setkey (void *context, const byte *key, unsigned int keylen) { TWOFISH_context *ctx = context; - int rc = do_twofish_setkey (ctx, key, keylen); + unsigned int hwfeatures = _gcry_get_hw_features (); + int rc; + + rc = do_twofish_setkey (ctx, key, keylen); + +#ifdef USE_AVX2 + ctx->use_avx2 = 0; + if ((hwfeatures & HWF_INTEL_AVX2) && (hwfeatures & HWF_INTEL_FAST_VPGATHER)) + { + ctx->use_avx2 = 1; + } +#endif + + (void)hwfeatures; + _gcry_burn_stack (23+6*sizeof(void*)); return rc; } +#ifdef USE_AVX2 +/* Assembler implementations of Twofish using AVX2. Process 16 block in + parallel. 
+ */ +extern void _gcry_twofish_avx2_ctr_enc(const TWOFISH_context *ctx, + unsigned char *out, + const unsigned char *in, + unsigned char *ctr) ASM_FUNC_ABI; + +extern void _gcry_twofish_avx2_cbc_dec(const TWOFISH_context *ctx, + unsigned char *out, + const unsigned char *in, + unsigned char *iv) ASM_FUNC_ABI; + +extern void _gcry_twofish_avx2_cfb_dec(const TWOFISH_context *ctx, + unsigned char *out, + const unsigned char *in, + unsigned char *iv) ASM_FUNC_ABI; + +extern void _gcry_twofish_avx2_ocb_enc(const TWOFISH_context *ctx, + unsigned char *out, + const unsigned char *in, + unsigned char *offset, + unsigned char *checksum, + const u64 Ls[16]) ASM_FUNC_ABI; + +extern void _gcry_twofish_avx2_ocb_dec(const TWOFISH_context *ctx, + unsigned char *out, + const unsigned char *in, + unsigned char *offset, + unsigned char *checksum, + const u64 Ls[16]) ASM_FUNC_ABI; + +extern void _gcry_twofish_avx2_ocb_auth(const TWOFISH_context *ctx, + const unsigned char *abuf, + unsigned char *offset, + unsigned char *checksum, + const u64 Ls[16]) ASM_FUNC_ABI; +#endif + #ifdef USE_AMD64_ASM @@ -1111,6 +1191,31 @@ _gcry_twofish_ctr_enc(void *context, unsigned char *ctr, void *outbuf_arg, unsigned int burn, burn_stack_depth = 0; int i; +#ifdef USE_AVX2 + if (ctx->use_avx2) + { + int did_use_avx2 = 0; + + /* Process data in 16 block chunks. */ + while (nblocks >= 16) + { + _gcry_twofish_avx2_ctr_enc(ctx, outbuf, inbuf, ctr); + + nblocks -= 16; + outbuf += 16 * TWOFISH_BLOCKSIZE; + inbuf += 16 * TWOFISH_BLOCKSIZE; + did_use_avx2 = 1; + } + + if (did_use_avx2) + { + /* twofish-avx2 assembly code does not use stack */ + if (nblocks == 0) + burn_stack_depth = 0; + } + } +#endif + #ifdef USE_AMD64_ASM { /* Process data in 3 block chunks. */ @@ -1169,6 +1274,31 @@ _gcry_twofish_cbc_dec(void *context, unsigned char *iv, void *outbuf_arg, unsigned char savebuf[TWOFISH_BLOCKSIZE]; unsigned int burn, burn_stack_depth = 0; +#ifdef USE_AVX2 + if (ctx->use_avx2) + { + int did_use_avx2 = 0; + + /* Process data in 16 block chunks. */ + while (nblocks >= 16) + { + _gcry_twofish_avx2_cbc_dec(ctx, outbuf, inbuf, iv); + + nblocks -= 16; + outbuf += 16 * TWOFISH_BLOCKSIZE; + inbuf += 16 * TWOFISH_BLOCKSIZE; + did_use_avx2 = 1; + } + + if (did_use_avx2) + { + /* twofish-avx2 assembly code does not use stack */ + if (nblocks == 0) + burn_stack_depth = 0; + } + } +#endif + #ifdef USE_AMD64_ASM { /* Process data in 3 block chunks. */ @@ -1218,6 +1348,31 @@ _gcry_twofish_cfb_dec(void *context, unsigned char *iv, void *outbuf_arg, const unsigned char *inbuf = inbuf_arg; unsigned int burn, burn_stack_depth = 0; +#ifdef USE_AVX2 + if (ctx->use_avx2) + { + int did_use_avx2 = 0; + + /* Process data in 16 block chunks. */ + while (nblocks >= 16) + { + _gcry_twofish_avx2_cfb_dec(ctx, outbuf, inbuf, iv); + + nblocks -= 16; + outbuf += 16 * TWOFISH_BLOCKSIZE; + inbuf += 16 * TWOFISH_BLOCKSIZE; + did_use_avx2 = 1; + } + + if (did_use_avx2) + { + /* twofish-avx2 assembly code does not use stack */ + if (nblocks == 0) + burn_stack_depth = 0; + } + } +#endif + #ifdef USE_AMD64_ASM { /* Process data in 3 block chunks. 
*/ @@ -1264,6 +1419,62 @@ _gcry_twofish_ocb_crypt (gcry_cipher_hd_t c, void *outbuf_arg, unsigned int burn, burn_stack_depth = 0; u64 blkn = c->u_mode.ocb.data_nblocks; +#ifdef USE_AVX2 + if (ctx->use_avx2) + { + int did_use_avx2 = 0; + u64 Ls[16]; + unsigned int n = 16 - (blkn % 16); + u64 *l; + int i; + + if (nblocks >= 16) + { + for (i = 0; i < 16; i += 8) + { + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + } + + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; + l = &Ls[(15 + n) % 16]; + + /* Process data in 16 block chunks. */ + while (nblocks >= 16) + { + blkn += 16; + *l = (uintptr_t)(void *)ocb_get_l(c, blkn - blkn % 16); + + if (encrypt) + _gcry_twofish_avx2_ocb_enc(ctx, outbuf, inbuf, c->u_iv.iv, + c->u_ctr.ctr, Ls); + else + _gcry_twofish_avx2_ocb_dec(ctx, outbuf, inbuf, c->u_iv.iv, + c->u_ctr.ctr, Ls); + + nblocks -= 16; + outbuf += 16 * TWOFISH_BLOCKSIZE; + inbuf += 16 * TWOFISH_BLOCKSIZE; + did_use_avx2 = 1; + } + } + + if (did_use_avx2) + { + /* twofish-avx2 assembly code does not use stack */ + if (nblocks == 0) + burn_stack_depth = 0; + } + } +#endif + { /* Use u64 to store pointers for x32 support (assembly function * assumes 64-bit pointers). */ @@ -1321,6 +1532,59 @@ _gcry_twofish_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, unsigned int burn, burn_stack_depth = 0; u64 blkn = c->u_mode.ocb.aad_nblocks; +#ifdef USE_AVX2 + if (ctx->use_avx2) + { + int did_use_avx2 = 0; + u64 Ls[16]; + unsigned int n = 16 - (blkn % 16); + u64 *l; + int i; + + if (nblocks >= 16) + { + for (i = 0; i < 16; i += 8) + { + /* Use u64 to store pointers for x32 support (assembly function + * assumes 64-bit pointers). */ + Ls[(i + 0 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 1 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 2 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 3 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[2]; + Ls[(i + 4 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + Ls[(i + 5 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[1]; + Ls[(i + 6 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[0]; + } + + Ls[(7 + n) % 16] = (uintptr_t)(void *)c->u_mode.ocb.L[3]; + l = &Ls[(15 + n) % 16]; + + /* Process data in 16 block chunks. */ + while (nblocks >= 16) + { + blkn += 16; + *l = (uintptr_t)(void *)ocb_get_l(c, blkn - blkn % 16); + + _gcry_twofish_avx2_ocb_auth(ctx, abuf, c->u_mode.ocb.aad_offset, + c->u_mode.ocb.aad_sum, Ls); + + nblocks -= 16; + abuf += 16 * TWOFISH_BLOCKSIZE; + did_use_avx2 = 1; + } + } + + if (did_use_avx2) + { + /* twofish-avx2 assembly code does not use stack */ + if (nblocks == 0) + burn_stack_depth = 0; + } + + /* Use generic code to handle smaller chunks... */ + } +#endif + { /* Use u64 to store pointers for x32 support (assembly function * assumes 64-bit pointers). 
*/ @@ -1367,7 +1631,7 @@ _gcry_twofish_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, static const char * selftest_ctr (void) { - const int nblocks = 3+1; + const int nblocks = 16+1; const int blocksize = TWOFISH_BLOCKSIZE; const int context_size = sizeof(TWOFISH_context); @@ -1381,7 +1645,7 @@ selftest_ctr (void) static const char * selftest_cbc (void) { - const int nblocks = 3+2; + const int nblocks = 16+2; const int blocksize = TWOFISH_BLOCKSIZE; const int context_size = sizeof(TWOFISH_context); @@ -1395,7 +1659,7 @@ selftest_cbc (void) static const char * selftest_cfb (void) { - const int nblocks = 3+2; + const int nblocks = 16+2; const int blocksize = TWOFISH_BLOCKSIZE; const int context_size = sizeof(TWOFISH_context); diff --git a/configure.ac b/configure.ac index 91562a9..4932786 100644 --- a/configure.ac +++ b/configure.ac @@ -2070,6 +2070,11 @@ if test "$found" = "1" ; then x86_64-*-*) # Build with the assembly implementation GCRYPT_CIPHERS="$GCRYPT_CIPHERS twofish-amd64.lo" + + if test x"$avx2support" = xyes ; then + # Build with the AVX2 implementation + GCRYPT_CIPHERS="$GCRYPT_CIPHERS twofish-avx2-amd64.lo" + fi ;; arm*-*-*) # Build with the assembly implementation diff --git a/src/g10lib.h b/src/g10lib.h index f0a4628..1308cff 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -196,27 +196,28 @@ char **_gcry_strtokenize (const char *string, const char *delim); /*-- src/hwfeatures.c --*/ -#define HWF_PADLOCK_RNG (1 << 0) -#define HWF_PADLOCK_AES (1 << 1) -#define HWF_PADLOCK_SHA (1 << 2) -#define HWF_PADLOCK_MMUL (1 << 3) - -#define HWF_INTEL_CPU (1 << 4) -#define HWF_INTEL_FAST_SHLD (1 << 5) -#define HWF_INTEL_BMI2 (1 << 6) -#define HWF_INTEL_SSSE3 (1 << 7) -#define HWF_INTEL_SSE4_1 (1 << 8) -#define HWF_INTEL_PCLMUL (1 << 9) -#define HWF_INTEL_AESNI (1 << 10) -#define HWF_INTEL_RDRAND (1 << 11) -#define HWF_INTEL_AVX (1 << 12) -#define HWF_INTEL_AVX2 (1 << 13) - -#define HWF_ARM_NEON (1 << 14) -#define HWF_ARM_AES (1 << 15) -#define HWF_ARM_SHA1 (1 << 16) -#define HWF_ARM_SHA2 (1 << 17) -#define HWF_ARM_PMULL (1 << 18) +#define HWF_PADLOCK_RNG (1 << 0) +#define HWF_PADLOCK_AES (1 << 1) +#define HWF_PADLOCK_SHA (1 << 2) +#define HWF_PADLOCK_MMUL (1 << 3) + +#define HWF_INTEL_CPU (1 << 4) +#define HWF_INTEL_FAST_SHLD (1 << 5) +#define HWF_INTEL_BMI2 (1 << 6) +#define HWF_INTEL_SSSE3 (1 << 7) +#define HWF_INTEL_SSE4_1 (1 << 8) +#define HWF_INTEL_PCLMUL (1 << 9) +#define HWF_INTEL_AESNI (1 << 10) +#define HWF_INTEL_RDRAND (1 << 11) +#define HWF_INTEL_AVX (1 << 12) +#define HWF_INTEL_AVX2 (1 << 13) +#define HWF_INTEL_FAST_VPGATHER (1 << 14) + +#define HWF_ARM_NEON (1 << 15) +#define HWF_ARM_AES (1 << 16) +#define HWF_ARM_SHA1 (1 << 17) +#define HWF_ARM_SHA2 (1 << 18) +#define HWF_ARM_PMULL (1 << 19) gpg_err_code_t _gcry_disable_hw_feature (const char *name); diff --git a/src/hwf-x86.c b/src/hwf-x86.c index eeacccb..a746ab2 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -176,6 +176,7 @@ detect_x86_gnuc (void) unsigned int max_cpuid_level; unsigned int fms, family, model; unsigned int result = 0; + unsigned int avoid_vpgather = 0; (void)os_supports_avx_avx2_registers; @@ -262,11 +263,33 @@ detect_x86_gnuc (void) case 0x47: case 0x4E: case 0x5E: + case 0x8E: + case 0x9E: case 0x55: case 0x66: result |= HWF_INTEL_FAST_SHLD; break; } + + /* These Intel Core processors that have AVX2 have slow VPGATHER and + * should be avoided for table-lookup use. 
*/ + switch (model) + { + case 0x3C: + case 0x3F: + case 0x45: + case 0x46: + /* Haswell */ + avoid_vpgather |= 1; + break; + } + } + else + { + /* Avoid VPGATHER for non-Intel CPUs as testing is needed to + * make sure it is fast enough. */ + + avoid_vpgather |= 1; } #ifdef ENABLE_PCLMUL_SUPPORT @@ -324,6 +347,9 @@ detect_x86_gnuc (void) if (features & 0x00000020) if (os_supports_avx_avx2_registers) result |= HWF_INTEL_AVX2; + + if ((result & HWF_INTEL_AVX2) && !avoid_vpgather) + result |= HWF_INTEL_FAST_VPGATHER; #endif /*ENABLE_AVX_SUPPORT*/ } diff --git a/src/hwfeatures.c b/src/hwfeatures.c index 82f8bf2..b2ae7c3 100644 --- a/src/hwfeatures.c +++ b/src/hwfeatures.c @@ -42,25 +42,26 @@ static struct const char *desc; } hwflist[] = { - { HWF_PADLOCK_RNG, "padlock-rng" }, - { HWF_PADLOCK_AES, "padlock-aes" }, - { HWF_PADLOCK_SHA, "padlock-sha" }, - { HWF_PADLOCK_MMUL, "padlock-mmul"}, - { HWF_INTEL_CPU, "intel-cpu" }, - { HWF_INTEL_FAST_SHLD, "intel-fast-shld" }, - { HWF_INTEL_BMI2, "intel-bmi2" }, - { HWF_INTEL_SSSE3, "intel-ssse3" }, - { HWF_INTEL_SSE4_1, "intel-sse4.1" }, - { HWF_INTEL_PCLMUL, "intel-pclmul" }, - { HWF_INTEL_AESNI, "intel-aesni" }, - { HWF_INTEL_RDRAND, "intel-rdrand" }, - { HWF_INTEL_AVX, "intel-avx" }, - { HWF_INTEL_AVX2, "intel-avx2" }, - { HWF_ARM_NEON, "arm-neon" }, - { HWF_ARM_AES, "arm-aes" }, - { HWF_ARM_SHA1, "arm-sha1" }, - { HWF_ARM_SHA2, "arm-sha2" }, - { HWF_ARM_PMULL, "arm-pmull" } + { HWF_PADLOCK_RNG, "padlock-rng" }, + { HWF_PADLOCK_AES, "padlock-aes" }, + { HWF_PADLOCK_SHA, "padlock-sha" }, + { HWF_PADLOCK_MMUL, "padlock-mmul"}, + { HWF_INTEL_CPU, "intel-cpu" }, + { HWF_INTEL_FAST_SHLD, "intel-fast-shld" }, + { HWF_INTEL_BMI2, "intel-bmi2" }, + { HWF_INTEL_SSSE3, "intel-ssse3" }, + { HWF_INTEL_SSE4_1, "intel-sse4.1" }, + { HWF_INTEL_PCLMUL, "intel-pclmul" }, + { HWF_INTEL_AESNI, "intel-aesni" }, + { HWF_INTEL_RDRAND, "intel-rdrand" }, + { HWF_INTEL_AVX, "intel-avx" }, + { HWF_INTEL_AVX2, "intel-avx2" }, + { HWF_INTEL_FAST_VPGATHER, "intel-fast-vpgather" }, + { HWF_ARM_NEON, "arm-neon" }, + { HWF_ARM_AES, "arm-aes" }, + { HWF_ARM_SHA1, "arm-sha1" }, + { HWF_ARM_SHA2, "arm-sha2" }, + { HWF_ARM_PMULL, "arm-pmull" } }; /* A bit vector with the hardware features which shall not be used. From smueller at chronox.de Wed Jan 4 17:09:31 2017 From: smueller at chronox.de (Stephan =?ISO-8859-1?Q?M=FCller?=) Date: Wed, 04 Jan 2017 17:09:31 +0100 Subject: [PATCH] Add XTS cipher mode In-Reply-To: <148354290694.7561.5683091455706158460.stgit@localhost6.localdomain6> References: <148354290694.7561.5683091455706158460.stgit@localhost6.localdomain6> Message-ID: <1751051.oe8UdD1sG7@tauon.atsec.com> Am Mittwoch, 4. Januar 2017, 17:15:06 CET schrieb Jussi Kivilinna: Hi Jussi, > + case GCRY_CIPHER_MODE_XTS: > + /* Setup tweak cipher with second part of XTS key. */ > + rc = c->spec->setkey (c->u_mode.xts.tweak_context, key + keylen, > + keylen); > + if (!rc) > + { > + /* Duplicate initial tweak context. */ > + memcpy (c->u_mode.xts.tweak_context + c->spec->contextsize, > + c->u_mode.xts.tweak_context, c->spec->contextsize); > + } > + else > + c->marks.key = 0; > + break; > + As libgcrypt is intended to be used in FIPS 140-2 context, can you please add a check that the key and tweak key are not identical? If they are, setkey should fail. See https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/ tree/include/crypto/xts.h#n43 for an example code. 
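A rough sketch of such a check (fips_mode() and GPG_ERR_WEAK_KEY are only placeholder names here; the actual patch may of course look different): compare the data key in the first half of the XTS key material against the tweak key in the second half, analogous to the kernel helper linked above:

  /* Sketch only: in FIPS mode the data key (first keylen bytes) and
   * the tweak key (second keylen bytes) must not be identical.
   * fips_mode() and GPG_ERR_WEAK_KEY are assumed names, not taken
   * from the posted patch. */
  if (fips_mode () && !memcmp (key, key + keylen, keylen))
    return GPG_ERR_WEAK_KEY;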
Thanks Stephan From jussi.kivilinna at iki.fi Wed Jan 4 21:42:14 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 4 Jan 2017 22:42:14 +0200 Subject: [PATCH] Add XTS cipher mode In-Reply-To: <1751051.oe8UdD1sG7@tauon.atsec.com> References: <148354290694.7561.5683091455706158460.stgit@localhost6.localdomain6> <1751051.oe8UdD1sG7@tauon.atsec.com> Message-ID: On 04.01.2017 18:09, Stephan Müller wrote: > Am Mittwoch, 4. Januar 2017, 17:15:06 CET schrieb Jussi Kivilinna: > > Hi Jussi, > >> + case GCRY_CIPHER_MODE_XTS: >> + /* Setup tweak cipher with second part of XTS key. */ >> + rc = c->spec->setkey (c->u_mode.xts.tweak_context, key + keylen, >> + keylen); >> + if (!rc) >> + { >> + /* Duplicate initial tweak context. */ >> + memcpy (c->u_mode.xts.tweak_context + c->spec->contextsize, >> + c->u_mode.xts.tweak_context, c->spec->contextsize); >> + } >> + else >> + c->marks.key = 0; >> + break; >> + > > As libgcrypt is intended to be used in FIPS 140-2 context, can you please add > a check that the key and tweak key are not identical? If they are, setkey > should fail. Sure, I'll add a check that is active in FIPS mode. -Jussi From jussi.kivilinna at iki.fi Wed Jan 4 23:35:58 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 05 Jan 2017 00:35:58 +0200 Subject: [PATCH] mpi: amd64: fix too large jump alignment in mpih-rshift Message-ID: <148356935841.19209.3011828834131755094.stgit@localhost6.localdomain6> * mpi/amd64/mpih-rshift.S (_gcry_mpih_rshift): Use 16-byte alignment with 'ALIGN(4)' instead of 256-byte. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/mpi/amd64/mpih-rshift.S b/mpi/amd64/mpih-rshift.S index 311b85b..7bd5942 100644 --- a/mpi/amd64/mpih-rshift.S +++ b/mpi/amd64/mpih-rshift.S @@ -57,7 +57,7 @@ C_SYMBOL_NAME(_gcry_mpih_rshift:) addq $2, %rdx jg .Lendo - ALIGN(8) /* minimal alignment for claimed speed */ + ALIGN(4) /* minimal alignment for claimed speed */ .Loop: movq -8(%rsi,%rdx,8), %mm6 movq %mm6, %mm2 psllq %mm0, %mm6 From jussi.kivilinna at iki.fi Wed Jan 4 23:35:42 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Thu, 05 Jan 2017 00:35:42 +0200 Subject: [PATCH] rijndael-ssse3: move assembly functions to separate source-file Message-ID: <148356934217.19150.7143664773806268178.stgit@localhost6.localdomain6> * cipher/Makefile.am: Add 'rijndael-ssse3-amd64-asm.S'. * cipher/rijndael-ssse3-amd64-asm.S: Moved assembly functions here ... * cipher/rijndael-ssse3-amd64.c: ... from this file. (_gcry_aes_ssse3_enc_preload, _gcry_aes_ssse3_dec_preload) (_gcry_aes_ssse3_schedule_core, _gcry_aes_ssse3_encrypt_core) (_gcry_aes_ssse3_decrypt_core): New. (vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec) (_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption) (do_vpaes_ssse3_enc, do_vpaes_ssse3_dec): Update to use external assembly functions; remove 'aes_const_ptr' variable usage. (_gcry_aes_ssse3_encrypt, _gcry_aes_ssse3_decrypt) (_gcry_aes_ssse3_cfb_enc, _gcry_aes_ssse3_cbc_enc) (_gcry_aes_ssse3_ctr_enc, _gcry_aes_ssse3_cfb_dec) (_gcry_aes_ssse3_cbc_dec, ssse3_ocb_enc, ssse3_ocb_dec) (_gcry_aes_ssse3_ocb_auth): Remove 'aes_const_ptr' variable usage. * configure.ac: Add 'rijndael-ssse3-amd64-asm.lo'. -- After this change, libgcrypt can be compiled with -flto optimization enabled on x86-64.
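To illustrate the underlying issue with a minimal, hypothetical example (the symbol name below is made up and not from this patch): a function whose body is emitted through a top-level asm() statement inside a C file never enters the LTO intermediate representation, so the symbol can be lost at link time; a body assembled from a standalone .S file, with only an extern declaration on the C side, does not have this problem.

  /* Fragile under -flto: the body is raw assembler text inside the
   * C translation unit and is invisible to the LTO bytecode. */
  __asm__ (".globl my_asm_core\n\t"
           "my_asm_core:\n\t"
           "ret\n\t");

  /* Robust: the body lives in my-asm-core.S and is assembled
   * normally; the C side only declares the symbol. */
  extern void my_asm_core (void);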
Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/Makefile.am b/cipher/Makefile.am index 8c9fc0e..fb0b7d2 100644 --- a/cipher/Makefile.am +++ b/cipher/Makefile.am @@ -80,7 +80,8 @@ md4.c \ md5.c \ poly1305-sse2-amd64.S poly1305-avx2-amd64.S poly1305-armv7-neon.S \ rijndael.c rijndael-internal.h rijndael-tables.h rijndael-aesni.c \ - rijndael-padlock.c rijndael-amd64.S rijndael-arm.S rijndael-ssse3-amd64.c \ + rijndael-padlock.c rijndael-amd64.S rijndael-arm.S \ + rijndael-ssse3-amd64.c rijndael-ssse3-amd64-asm.S \ rijndael-armv8-ce.c rijndael-armv8-aarch32-ce.S rijndael-armv8-aarch64-ce.S \ rijndael-aarch64.S \ rmd160.c \ diff --git a/cipher/rijndael-ssse3-amd64-asm.S b/cipher/rijndael-ssse3-amd64-asm.S new file mode 100644 index 0000000..3ae55e8 --- /dev/null +++ b/cipher/rijndael-ssse3-amd64-asm.S @@ -0,0 +1,853 @@ +/* SSSE3 vector permutation AES for Libgcrypt + * Copyright (C) 2014-2017 Jussi Kivilinna + * + * This file is part of Libgcrypt. + * + * Libgcrypt is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * Libgcrypt is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this program; if not, see . + * + * + * The code is based on the public domain library libvpaes version 0.5 + * available at http://crypto.stanford.edu/vpaes/ and which carries + * this notice: + * + * libvpaes: constant-time SSSE3 AES encryption and decryption. + * version 0.5 + * + * By Mike Hamburg, Stanford University, 2009. Public domain. + * I wrote essentially all of this code. I did not write the test + * vectors; they are the NIST known answer tests. I hereby release all + * the code and documentation here that I wrote into the public domain. + * + * This is an implementation of AES following my paper, + * "Accelerating AES with Vector Permute Instructions + * CHES 2009; http://shiftleft.org/papers/vector_aes/ + */ + +#if defined(__x86_64__) +#include +#if defined(HAVE_GCC_INLINE_ASM_SSSE3) && \ + (defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS) || \ + defined(HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS)) + +#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS +# define ELF(...) +#else +# define ELF(...) 
__VA_ARGS__ +#endif + +.text + +## +## _gcry_aes_ssse3_enc_preload +## +ELF(.type _gcry_aes_ssse3_enc_preload, at function) +.globl _gcry_aes_ssse3_enc_preload +_gcry_aes_ssse3_enc_preload: + lea .Laes_consts(%rip), %rax + movdqa (%rax), %xmm9 # 0F + movdqa .Lk_inv (%rax), %xmm10 # inv + movdqa .Lk_inv+16(%rax), %xmm11 # inva + movdqa .Lk_sb1 (%rax), %xmm13 # sb1u + movdqa .Lk_sb1+16(%rax), %xmm12 # sb1t + movdqa .Lk_sb2 (%rax), %xmm15 # sb2u + movdqa .Lk_sb2+16(%rax), %xmm14 # sb2t + ret +ELF(.size _gcry_aes_ssse3_enc_preload,.-_gcry_aes_ssse3_enc_preload) + +## +## _gcry_aes_ssse3_dec_preload +## +ELF(.type _gcry_aes_ssse3_dec_preload, at function) +.globl _gcry_aes_ssse3_dec_preload +_gcry_aes_ssse3_dec_preload: + lea .Laes_consts(%rip), %rax + movdqa (%rax), %xmm9 # 0F + movdqa .Lk_inv (%rax), %xmm10 # inv + movdqa .Lk_inv+16(%rax), %xmm11 # inva + movdqa .Lk_dsb9 (%rax), %xmm13 # sb9u + movdqa .Lk_dsb9+16(%rax), %xmm12 # sb9t + movdqa .Lk_dsbd (%rax), %xmm15 # sbdu + movdqa .Lk_dsbb (%rax), %xmm14 # sbbu + movdqa .Lk_dsbe (%rax), %xmm8 # sbeu + ret +ELF(.size _gcry_aes_ssse3_dec_preload,.-_gcry_aes_ssse3_dec_preload) + +## +## Constant-time SSSE3 AES core implementation. +## +## By Mike Hamburg (Stanford University), 2009 +## Public domain. +## + +## +## _aes_encrypt_core +## +## AES-encrypt %xmm0. +## +## Inputs: +## %xmm0 = input +## %xmm9-%xmm15 as in .Laes_preheat +## (%rdx) = scheduled keys +## %rax = nrounds - 1 +## +## Output in %xmm0 +## Clobbers %xmm1-%xmm4, %r9, %r11, %rax, %rcx +## Preserves %xmm6 - %xmm7 so you get some local vectors +## +## +.align 16 +ELF(.type _gcry_aes_ssse3_encrypt_core, at function) +.globl _gcry_aes_ssse3_encrypt_core +_gcry_aes_ssse3_encrypt_core: +_aes_encrypt_core: + lea .Laes_consts(%rip), %rcx + leaq .Lk_mc_backward(%rcx), %rdi + mov $16, %rsi + movdqa .Lk_ipt (%rcx), %xmm2 # iptlo + movdqa %xmm9, %xmm1 + pandn %xmm0, %xmm1 + psrld $4, %xmm1 + pand %xmm9, %xmm0 + pshufb %xmm0, %xmm2 + movdqa .Lk_ipt+16(%rcx), %xmm0 # ipthi + pshufb %xmm1, %xmm0 + pxor (%rdx),%xmm2 + pxor %xmm2, %xmm0 + add $16, %rdx + jmp .Laes_entry + +.align 8 +.Laes_loop: + # middle of middle round + movdqa %xmm13, %xmm4 # 4 : sb1u + pshufb %xmm2, %xmm4 # 4 = sb1u + pxor (%rdx), %xmm4 # 4 = sb1u + k + movdqa %xmm12, %xmm0 # 0 : sb1t + pshufb %xmm3, %xmm0 # 0 = sb1t + pxor %xmm4, %xmm0 # 0 = A + movdqa %xmm15, %xmm4 # 4 : sb2u + pshufb %xmm2, %xmm4 # 4 = sb2u + movdqa .Lk_mc_forward-.Lk_mc_backward(%rsi,%rdi), %xmm1 + movdqa %xmm14, %xmm2 # 2 : sb2t + pshufb %xmm3, %xmm2 # 2 = sb2t + pxor %xmm4, %xmm2 # 2 = 2A + movdqa %xmm0, %xmm3 # 3 = A + pshufb %xmm1, %xmm0 # 0 = B + pxor %xmm2, %xmm0 # 0 = 2A+B + pshufb (%rsi,%rdi), %xmm3 # 3 = D + lea 16(%esi),%esi # next mc + pxor %xmm0, %xmm3 # 3 = 2A+B+D + lea 16(%rdx),%rdx # next key + pshufb %xmm1, %xmm0 # 0 = 2B+C + pxor %xmm3, %xmm0 # 0 = 2A+3B+C+D + and $48, %rsi # ... 
mod 4 + dec %rax # nr-- + +.Laes_entry: + # top of round + movdqa %xmm9, %xmm1 # 1 : i + pandn %xmm0, %xmm1 # 1 = i<<4 + psrld $4, %xmm1 # 1 = i + pand %xmm9, %xmm0 # 0 = k + movdqa %xmm11, %xmm2 # 2 : a/k + pshufb %xmm0, %xmm2 # 2 = a/k + pxor %xmm1, %xmm0 # 0 = j + movdqa %xmm10, %xmm3 # 3 : 1/i + pshufb %xmm1, %xmm3 # 3 = 1/i + pxor %xmm2, %xmm3 # 3 = iak = 1/i + a/k + movdqa %xmm10, %xmm4 # 4 : 1/j + pshufb %xmm0, %xmm4 # 4 = 1/j + pxor %xmm2, %xmm4 # 4 = jak = 1/j + a/k + movdqa %xmm10, %xmm2 # 2 : 1/iak + pshufb %xmm3, %xmm2 # 2 = 1/iak + pxor %xmm0, %xmm2 # 2 = io + movdqa %xmm10, %xmm3 # 3 : 1/jak + pshufb %xmm4, %xmm3 # 3 = 1/jak + pxor %xmm1, %xmm3 # 3 = jo + jnz .Laes_loop + + # middle of last round + movdqa .Lk_sbo(%rcx), %xmm4 # 3 : sbou + pshufb %xmm2, %xmm4 # 4 = sbou + pxor (%rdx), %xmm4 # 4 = sb1u + k + movdqa .Lk_sbo+16(%rcx), %xmm0 # 0 : sbot + pshufb %xmm3, %xmm0 # 0 = sb1t + pxor %xmm4, %xmm0 # 0 = A + pshufb .Lk_sr(%rsi,%rcx), %xmm0 + ret +ELF(.size _aes_encrypt_core,.-_aes_encrypt_core) + +## +## Decryption core +## +## Same API as encryption core. +## +.align 16 +.globl _gcry_aes_ssse3_decrypt_core +ELF(.type _gcry_aes_ssse3_decrypt_core, at function) +_gcry_aes_ssse3_decrypt_core: +_aes_decrypt_core: + lea .Laes_consts(%rip), %rcx + movl %eax, %esi + shll $4, %esi + xorl $48, %esi + andl $48, %esi + movdqa .Lk_dipt (%rcx), %xmm2 # iptlo + movdqa %xmm9, %xmm1 + pandn %xmm0, %xmm1 + psrld $4, %xmm1 + pand %xmm9, %xmm0 + pshufb %xmm0, %xmm2 + movdqa .Lk_dipt+16(%rcx), %xmm0 # ipthi + pshufb %xmm1, %xmm0 + pxor (%rdx), %xmm2 + pxor %xmm2, %xmm0 + movdqa .Lk_mc_forward+48(%rcx), %xmm5 + lea 16(%rdx), %rdx + neg %rax + jmp .Laes_dec_entry + +.align 16 +.Laes_dec_loop: +## +## Inverse mix columns +## + movdqa %xmm13, %xmm4 # 4 : sb9u + pshufb %xmm2, %xmm4 # 4 = sb9u + pxor (%rdx), %xmm4 + movdqa %xmm12, %xmm0 # 0 : sb9t + pshufb %xmm3, %xmm0 # 0 = sb9t + movdqa .Lk_dsbd+16(%rcx),%xmm1 # 1 : sbdt + pxor %xmm4, %xmm0 # 0 = ch + lea 16(%rdx), %rdx # next round key + + pshufb %xmm5, %xmm0 # MC ch + movdqa %xmm15, %xmm4 # 4 : sbdu + pshufb %xmm2, %xmm4 # 4 = sbdu + pxor %xmm0, %xmm4 # 4 = ch + pshufb %xmm3, %xmm1 # 1 = sbdt + pxor %xmm4, %xmm1 # 1 = ch + + pshufb %xmm5, %xmm1 # MC ch + movdqa %xmm14, %xmm4 # 4 : sbbu + pshufb %xmm2, %xmm4 # 4 = sbbu + inc %rax # nr-- + pxor %xmm1, %xmm4 # 4 = ch + movdqa .Lk_dsbb+16(%rcx),%xmm0 # 0 : sbbt + pshufb %xmm3, %xmm0 # 0 = sbbt + pxor %xmm4, %xmm0 # 0 = ch + + pshufb %xmm5, %xmm0 # MC ch + movdqa %xmm8, %xmm4 # 4 : sbeu + pshufb %xmm2, %xmm4 # 4 = sbeu + pshufd $0x93, %xmm5, %xmm5 + pxor %xmm0, %xmm4 # 4 = ch + movdqa .Lk_dsbe+16(%rcx),%xmm0 # 0 : sbet + pshufb %xmm3, %xmm0 # 0 = sbet + pxor %xmm4, %xmm0 # 0 = ch + +.Laes_dec_entry: + # top of round + movdqa %xmm9, %xmm1 # 1 : i + pandn %xmm0, %xmm1 # 1 = i<<4 + psrld $4, %xmm1 # 1 = i + pand %xmm9, %xmm0 # 0 = k + movdqa %xmm11, %xmm2 # 2 : a/k + pshufb %xmm0, %xmm2 # 2 = a/k + pxor %xmm1, %xmm0 # 0 = j + movdqa %xmm10, %xmm3 # 3 : 1/i + pshufb %xmm1, %xmm3 # 3 = 1/i + pxor %xmm2, %xmm3 # 3 = iak = 1/i + a/k + movdqa %xmm10, %xmm4 # 4 : 1/j + pshufb %xmm0, %xmm4 # 4 = 1/j + pxor %xmm2, %xmm4 # 4 = jak = 1/j + a/k + movdqa %xmm10, %xmm2 # 2 : 1/iak + pshufb %xmm3, %xmm2 # 2 = 1/iak + pxor %xmm0, %xmm2 # 2 = io + movdqa %xmm10, %xmm3 # 3 : 1/jak + pshufb %xmm4, %xmm3 # 3 = 1/jak + pxor %xmm1, %xmm3 # 3 = jo + jnz .Laes_dec_loop + + # middle of last round + movdqa .Lk_dsbo(%rcx), %xmm4 # 3 : sbou + pshufb %xmm2, %xmm4 # 4 = sbou + pxor (%rdx), %xmm4 # 4 = sb1u + k + movdqa 
.Lk_dsbo+16(%rcx), %xmm0 # 0 : sbot + pshufb %xmm3, %xmm0 # 0 = sb1t + pxor %xmm4, %xmm0 # 0 = A + pshufb .Lk_sr(%rsi,%rcx), %xmm0 + ret +ELF(.size _aes_decrypt_core,.-_aes_decrypt_core) + +######################################################## +## ## +## AES key schedule ## +## ## +######################################################## + +.align 16 +.globl _gcry_aes_ssse3_schedule_core +ELF(.type _gcry_aes_ssse3_schedule_core, at function) +_gcry_aes_ssse3_schedule_core: +_aes_schedule_core: + # rdi = key + # rsi = size in bits + # rdx = buffer + # rcx = direction. 0=encrypt, 1=decrypt + + # load the tables + lea .Laes_consts(%rip), %r10 + movdqa (%r10), %xmm9 # 0F + movdqa .Lk_inv (%r10), %xmm10 # inv + movdqa .Lk_inv+16(%r10), %xmm11 # inva + movdqa .Lk_sb1 (%r10), %xmm13 # sb1u + movdqa .Lk_sb1+16(%r10), %xmm12 # sb1t + movdqa .Lk_sb2 (%r10), %xmm15 # sb2u + movdqa .Lk_sb2+16(%r10), %xmm14 # sb2t + + movdqa .Lk_rcon(%r10), %xmm8 # load rcon + movdqu (%rdi), %xmm0 # load key (unaligned) + + # input transform + movdqu %xmm0, %xmm3 + lea .Lk_ipt(%r10), %r11 + call .Laes_schedule_transform + movdqu %xmm0, %xmm7 + + test %rcx, %rcx + jnz .Laes_schedule_am_decrypting + + # encrypting, output zeroth round key after transform + movdqa %xmm0, (%rdx) + jmp .Laes_schedule_go + +.Laes_schedule_am_decrypting: + # decrypting, output zeroth round key after shiftrows + pshufb .Lk_sr(%r8,%r10),%xmm3 + movdqa %xmm3, (%rdx) + xor $48, %r8 + +.Laes_schedule_go: + cmp $192, %rsi + je .Laes_schedule_192 + cmp $256, %rsi + je .Laes_schedule_256 + # 128: fall though + +## +## .Laes_schedule_128 +## +## 128-bit specific part of key schedule. +## +## This schedule is really simple, because all its parts +## are accomplished by the subroutines. +## +.Laes_schedule_128: + mov $10, %rsi + +.Laes_schedule_128_L: + call .Laes_schedule_round + dec %rsi + jz .Laes_schedule_mangle_last + call .Laes_schedule_mangle # write output + jmp .Laes_schedule_128_L + +## +## .Laes_schedule_192 +## +## 192-bit specific part of key schedule. +## +## The main body of this schedule is the same as the 128-bit +## schedule, but with more smearing. The long, high side is +## stored in %xmm7 as before, and the short, low side is in +## the high bits of %xmm6. +## +## This schedule is somewhat nastier, however, because each +## round produces 192 bits of key material, or 1.5 round keys. +## Therefore, on each cycle we do 2 rounds and produce 3 round +## keys. +## +.Laes_schedule_192: + movdqu 8(%rdi),%xmm0 # load key part 2 (very unaligned) + call .Laes_schedule_transform # input transform + pshufd $0x0E, %xmm0, %xmm6 + pslldq $8, %xmm6 # clobber low side with zeros + mov $4, %rsi + +.Laes_schedule_192_L: + call .Laes_schedule_round + palignr $8,%xmm6,%xmm0 + call .Laes_schedule_mangle # save key n + call .Laes_schedule_192_smear + call .Laes_schedule_mangle # save key n+1 + call .Laes_schedule_round + dec %rsi + jz .Laes_schedule_mangle_last + call .Laes_schedule_mangle # save key n+2 + call .Laes_schedule_192_smear + jmp .Laes_schedule_192_L + +## +## .Laes_schedule_192_smear +## +## Smear the short, low side in the 192-bit key schedule. 
+## +## Inputs: +## %xmm7: high side, b a x y +## %xmm6: low side, d c 0 0 +## %xmm13: 0 +## +## Outputs: +## %xmm6: b+c+d b+c 0 0 +## %xmm0: b+c+d b+c b a +## +.Laes_schedule_192_smear: + pshufd $0x80, %xmm6, %xmm0 # d c 0 0 -> c 0 0 0 + pxor %xmm0, %xmm6 # -> c+d c 0 0 + pshufd $0xFE, %xmm7, %xmm0 # b a _ _ -> b b b a + pxor %xmm6, %xmm0 # -> b+c+d b+c b a + pshufd $0x0E, %xmm0, %xmm6 + pslldq $8, %xmm6 # clobber low side with zeros + ret + +## +## .Laes_schedule_256 +## +## 256-bit specific part of key schedule. +## +## The structure here is very similar to the 128-bit +## schedule, but with an additional 'low side' in +## %xmm6. The low side's rounds are the same as the +## high side's, except no rcon and no rotation. +## +.Laes_schedule_256: + movdqu 16(%rdi),%xmm0 # load key part 2 (unaligned) + call .Laes_schedule_transform # input transform + mov $7, %rsi + +.Laes_schedule_256_L: + call .Laes_schedule_mangle # output low result + movdqa %xmm0, %xmm6 # save cur_lo in xmm6 + + # high round + call .Laes_schedule_round + dec %rsi + jz .Laes_schedule_mangle_last + call .Laes_schedule_mangle + + # low round. swap xmm7 and xmm6 + pshufd $0xFF, %xmm0, %xmm0 + movdqa %xmm7, %xmm5 + movdqa %xmm6, %xmm7 + call .Laes_schedule_low_round + movdqa %xmm5, %xmm7 + + jmp .Laes_schedule_256_L + +## +## .Laes_schedule_round +## +## Runs one main round of the key schedule on %xmm0, %xmm7 +## +## Specifically, runs subbytes on the high dword of %xmm0 +## then rotates it by one byte and xors into the low dword of +## %xmm7. +## +## Adds rcon from low byte of %xmm8, then rotates %xmm8 for +## next rcon. +## +## Smears the dwords of %xmm7 by xoring the low into the +## second low, result into third, result into highest. +## +## Returns results in %xmm7 = %xmm0. +## Clobbers %xmm1-%xmm4, %r11. +## +.Laes_schedule_round: + # extract rcon from xmm8 + pxor %xmm1, %xmm1 + palignr $15, %xmm8, %xmm1 + palignr $15, %xmm8, %xmm8 + pxor %xmm1, %xmm7 + + # rotate + pshufd $0xFF, %xmm0, %xmm0 + palignr $1, %xmm0, %xmm0 + + # fall through... + + # low round: same as high round, but no rotation and no rcon. +.Laes_schedule_low_round: + # smear xmm7 + movdqa %xmm7, %xmm1 + pslldq $4, %xmm7 + pxor %xmm1, %xmm7 + movdqa %xmm7, %xmm1 + pslldq $8, %xmm7 + pxor %xmm1, %xmm7 + pxor .Lk_s63(%r10), %xmm7 + + # subbytes + movdqa %xmm9, %xmm1 + pandn %xmm0, %xmm1 + psrld $4, %xmm1 # 1 = i + pand %xmm9, %xmm0 # 0 = k + movdqa %xmm11, %xmm2 # 2 : a/k + pshufb %xmm0, %xmm2 # 2 = a/k + pxor %xmm1, %xmm0 # 0 = j + movdqa %xmm10, %xmm3 # 3 : 1/i + pshufb %xmm1, %xmm3 # 3 = 1/i + pxor %xmm2, %xmm3 # 3 = iak = 1/i + a/k + movdqa %xmm10, %xmm4 # 4 : 1/j + pshufb %xmm0, %xmm4 # 4 = 1/j + pxor %xmm2, %xmm4 # 4 = jak = 1/j + a/k + movdqa %xmm10, %xmm2 # 2 : 1/iak + pshufb %xmm3, %xmm2 # 2 = 1/iak + pxor %xmm0, %xmm2 # 2 = io + movdqa %xmm10, %xmm3 # 3 : 1/jak + pshufb %xmm4, %xmm3 # 3 = 1/jak + pxor %xmm1, %xmm3 # 3 = jo + movdqa .Lk_sb1(%r10), %xmm4 # 4 : sbou + pshufb %xmm2, %xmm4 # 4 = sbou + movdqa .Lk_sb1+16(%r10), %xmm0 # 0 : sbot + pshufb %xmm3, %xmm0 # 0 = sb1t + pxor %xmm4, %xmm0 # 0 = sbox output + + # add in smeared stuff + pxor %xmm7, %xmm0 + movdqa %xmm0, %xmm7 + ret + +## +## .Laes_schedule_transform +## +## Linear-transform %xmm0 according to tables at (%r11) +## +## Requires that %xmm9 = 0x0F0F... 
as in preheat +## Output in %xmm0 +## Clobbers %xmm1, %xmm2 +## +.Laes_schedule_transform: + movdqa %xmm9, %xmm1 + pandn %xmm0, %xmm1 + psrld $4, %xmm1 + pand %xmm9, %xmm0 + movdqa (%r11), %xmm2 # lo + pshufb %xmm0, %xmm2 + movdqa 16(%r11), %xmm0 # hi + pshufb %xmm1, %xmm0 + pxor %xmm2, %xmm0 + ret + +## +## .Laes_schedule_mangle +## +## Mangle xmm0 from (basis-transformed) standard version +## to our version. +## +## On encrypt, +## xor with 0x63 +## multiply by circulant 0,1,1,1 +## apply shiftrows transform +## +## On decrypt, +## xor with 0x63 +## multiply by 'inverse mixcolumns' circulant E,B,D,9 +## deskew +## apply shiftrows transform +## +## +## Writes out to (%rdx), and increments or decrements it +## Keeps track of round number mod 4 in %r8 +## Preserves xmm0 +## Clobbers xmm1-xmm5 +## +.Laes_schedule_mangle: + movdqa %xmm0, %xmm4 # save xmm0 for later + movdqa .Lk_mc_forward(%r10),%xmm5 + test %rcx, %rcx + jnz .Laes_schedule_mangle_dec + + # encrypting + add $16, %rdx + pxor .Lk_s63(%r10),%xmm4 + pshufb %xmm5, %xmm4 + movdqa %xmm4, %xmm3 + pshufb %xmm5, %xmm4 + pxor %xmm4, %xmm3 + pshufb %xmm5, %xmm4 + pxor %xmm4, %xmm3 + + jmp .Laes_schedule_mangle_both + +.Laes_schedule_mangle_dec: + lea .Lk_dks_1(%r10), %r11 # first table: *9 + call .Laes_schedule_transform + movdqa %xmm0, %xmm3 + pshufb %xmm5, %xmm3 + + add $32, %r11 # next table: *B + call .Laes_schedule_transform + pxor %xmm0, %xmm3 + pshufb %xmm5, %xmm3 + + add $32, %r11 # next table: *D + call .Laes_schedule_transform + pxor %xmm0, %xmm3 + pshufb %xmm5, %xmm3 + + add $32, %r11 # next table: *E + call .Laes_schedule_transform + pxor %xmm0, %xmm3 + pshufb %xmm5, %xmm3 + + movdqa %xmm4, %xmm0 # restore %xmm0 + add $-16, %rdx + +.Laes_schedule_mangle_both: + pshufb .Lk_sr(%r8,%r10),%xmm3 + add $-16, %r8 + and $48, %r8 + movdqa %xmm3, (%rdx) + ret + +## +## .Laes_schedule_mangle_last +## +## Mangler for last round of key schedule +## Mangles %xmm0 +## when encrypting, outputs out(%xmm0) ^ 63 +## when decrypting, outputs unskew(%xmm0) +## +## Always called right before return... 
jumps to cleanup and exits +## +.Laes_schedule_mangle_last: + # schedule last round key from xmm0 + lea .Lk_deskew(%r10),%r11 # prepare to deskew + test %rcx, %rcx + jnz .Laes_schedule_mangle_last_dec + + # encrypting + pshufb .Lk_sr(%r8,%r10),%xmm0 # output permute + lea .Lk_opt(%r10), %r11 # prepare to output transform + add $32, %rdx + +.Laes_schedule_mangle_last_dec: + add $-16, %rdx + pxor .Lk_s63(%r10), %xmm0 + call .Laes_schedule_transform # output transform + movdqa %xmm0, (%rdx) # save last key + + #_aes_cleanup + pxor %xmm0, %xmm0 + pxor %xmm1, %xmm1 + pxor %xmm2, %xmm2 + pxor %xmm3, %xmm3 + pxor %xmm4, %xmm4 + pxor %xmm5, %xmm5 + pxor %xmm6, %xmm6 + pxor %xmm7, %xmm7 + pxor %xmm8, %xmm8 + ret +ELF(.size _aes_schedule_core,.-_aes_schedule_core) + +######################################################## +## ## +## Constants ## +## ## +######################################################## + +.align 16 +ELF(.type _aes_consts, at object) +.Laes_consts: +_aes_consts: + # s0F + .Lk_s0F = .-.Laes_consts + .quad 0x0F0F0F0F0F0F0F0F + .quad 0x0F0F0F0F0F0F0F0F + + # input transform (lo, hi) + .Lk_ipt = .-.Laes_consts + .quad 0xC2B2E8985A2A7000 + .quad 0xCABAE09052227808 + .quad 0x4C01307D317C4D00 + .quad 0xCD80B1FCB0FDCC81 + + # inv, inva + .Lk_inv = .-.Laes_consts + .quad 0x0E05060F0D080180 + .quad 0x040703090A0B0C02 + .quad 0x01040A060F0B0780 + .quad 0x030D0E0C02050809 + + # sb1u, sb1t + .Lk_sb1 = .-.Laes_consts + .quad 0xB19BE18FCB503E00 + .quad 0xA5DF7A6E142AF544 + .quad 0x3618D415FAE22300 + .quad 0x3BF7CCC10D2ED9EF + + + # sb2u, sb2t + .Lk_sb2 = .-.Laes_consts + .quad 0xE27A93C60B712400 + .quad 0x5EB7E955BC982FCD + .quad 0x69EB88400AE12900 + .quad 0xC2A163C8AB82234A + + # sbou, sbot + .Lk_sbo = .-.Laes_consts + .quad 0xD0D26D176FBDC700 + .quad 0x15AABF7AC502A878 + .quad 0xCFE474A55FBB6A00 + .quad 0x8E1E90D1412B35FA + + # mc_forward + .Lk_mc_forward = .-.Laes_consts + .quad 0x0407060500030201 + .quad 0x0C0F0E0D080B0A09 + .quad 0x080B0A0904070605 + .quad 0x000302010C0F0E0D + .quad 0x0C0F0E0D080B0A09 + .quad 0x0407060500030201 + .quad 0x000302010C0F0E0D + .quad 0x080B0A0904070605 + + # mc_backward + .Lk_mc_backward = .-.Laes_consts + .quad 0x0605040702010003 + .quad 0x0E0D0C0F0A09080B + .quad 0x020100030E0D0C0F + .quad 0x0A09080B06050407 + .quad 0x0E0D0C0F0A09080B + .quad 0x0605040702010003 + .quad 0x0A09080B06050407 + .quad 0x020100030E0D0C0F + + # sr + .Lk_sr = .-.Laes_consts + .quad 0x0706050403020100 + .quad 0x0F0E0D0C0B0A0908 + .quad 0x030E09040F0A0500 + .quad 0x0B06010C07020D08 + .quad 0x0F060D040B020900 + .quad 0x070E050C030A0108 + .quad 0x0B0E0104070A0D00 + .quad 0x0306090C0F020508 + + # rcon + .Lk_rcon = .-.Laes_consts + .quad 0x1F8391B9AF9DEEB6 + .quad 0x702A98084D7C7D81 + + # s63: all equal to 0x63 transformed + .Lk_s63 = .-.Laes_consts + .quad 0x5B5B5B5B5B5B5B5B + .quad 0x5B5B5B5B5B5B5B5B + + # output transform + .Lk_opt = .-.Laes_consts + .quad 0xFF9F4929D6B66000 + .quad 0xF7974121DEBE6808 + .quad 0x01EDBD5150BCEC00 + .quad 0xE10D5DB1B05C0CE0 + + # deskew tables: inverts the sbox's 'skew' + .Lk_deskew = .-.Laes_consts + .quad 0x07E4A34047A4E300 + .quad 0x1DFEB95A5DBEF91A + .quad 0x5F36B5DC83EA6900 + .quad 0x2841C2ABF49D1E77 + +## +## Decryption stuff +## Key schedule constants +## + # decryption key schedule: x -> invskew x*9 + .Lk_dks_1 = .-.Laes_consts + .quad 0xB6116FC87ED9A700 + .quad 0x4AED933482255BFC + .quad 0x4576516227143300 + .quad 0x8BB89FACE9DAFDCE + + # decryption key schedule: invskew x*9 -> invskew x*D + .Lk_dks_2 = .-.Laes_consts + .quad 0x27438FEBCCA86400 
+ .quad 0x4622EE8AADC90561 + .quad 0x815C13CE4F92DD00 + .quad 0x73AEE13CBD602FF2 + + # decryption key schedule: invskew x*D -> invskew x*B + .Lk_dks_3 = .-.Laes_consts + .quad 0x03C4C50201C6C700 + .quad 0xF83F3EF9FA3D3CFB + .quad 0xEE1921D638CFF700 + .quad 0xA5526A9D7384BC4B + + # decryption key schedule: invskew x*B -> invskew x*E + 0x63 + .Lk_dks_4 = .-.Laes_consts + .quad 0xE3C390B053732000 + .quad 0xA080D3F310306343 + .quad 0xA0CA214B036982E8 + .quad 0x2F45AEC48CE60D67 + +## +## Decryption stuff +## Round function constants +## + # decryption input transform + .Lk_dipt = .-.Laes_consts + .quad 0x0F505B040B545F00 + .quad 0x154A411E114E451A + .quad 0x86E383E660056500 + .quad 0x12771772F491F194 + + # decryption sbox output *9*u, *9*t + .Lk_dsb9 = .-.Laes_consts + .quad 0x851C03539A86D600 + .quad 0xCAD51F504F994CC9 + .quad 0xC03B1789ECD74900 + .quad 0x725E2C9EB2FBA565 + + # decryption sbox output *D*u, *D*t + .Lk_dsbd = .-.Laes_consts + .quad 0x7D57CCDFE6B1A200 + .quad 0xF56E9B13882A4439 + .quad 0x3CE2FAF724C6CB00 + .quad 0x2931180D15DEEFD3 + + # decryption sbox output *B*u, *B*t + .Lk_dsbb = .-.Laes_consts + .quad 0xD022649296B44200 + .quad 0x602646F6B0F2D404 + .quad 0xC19498A6CD596700 + .quad 0xF3FF0C3E3255AA6B + + # decryption sbox output *E*u, *E*t + .Lk_dsbe = .-.Laes_consts + .quad 0x46F2929626D4D000 + .quad 0x2242600464B4F6B0 + .quad 0x0C55A6CDFFAAC100 + .quad 0x9467F36B98593E32 + + # decryption sbox final output + .Lk_dsbo = .-.Laes_consts + .quad 0x1387EA537EF94000 + .quad 0xC7AA6DB9D4943E2D + .quad 0x12D7560F93441D00 + .quad 0xCA4B8159D8C58E9C +ELF(.size _aes_consts,.-_aes_consts) + +#endif +#endif diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c index 2adb73f..25d1849 100644 --- a/cipher/rijndael-ssse3-amd64.c +++ b/cipher/rijndael-ssse3-amd64.c @@ -1,5 +1,5 @@ /* SSSE3 vector permutation AES for Libgcrypt - * Copyright (C) 2014-2015 Jussi Kivilinna + * Copyright (C) 2014-2017 Jussi Kivilinna * * This file is part of Libgcrypt. * @@ -57,11 +57,22 @@ #endif +/* Assembly functions in rijndael-ssse3-amd64-asm.S. Note that these + have custom calling convention and need to be called from assembly + blocks, not directly. */ +extern void _gcry_aes_ssse3_enc_preload(void); +extern void _gcry_aes_ssse3_dec_preload(void); +extern void _gcry_aes_ssse3_schedule_core(void); +extern void _gcry_aes_ssse3_encrypt_core(void); +extern void _gcry_aes_ssse3_decrypt_core(void); + + + /* Two macros to be called prior and after the use of SSSE3 - instructions. There should be no external function calls between - the use of these macros. There purpose is to make sure that the - SSE registers are cleared and won't reveal any information about - the key or the data. */ + instructions. There should be no external function calls between + the use of these macros. There purpose is to make sure that the + SSE registers are cleared and won't reveal any information about + the key or the data. */ #ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS # define SSSE3_STATE_SIZE (16 * 10) /* XMM6-XMM15 are callee-saved registers on WIN64. 
*/ @@ -115,34 +126,19 @@ ::: "memory" ) #endif -#define vpaes_ssse3_prepare_enc(const_ptr) \ +#define vpaes_ssse3_prepare_enc() \ vpaes_ssse3_prepare(); \ - asm volatile ("lea .Laes_consts(%%rip), %q0 \n\t" \ - "movdqa (%q0), %%xmm9 # 0F \n\t" \ - "movdqa .Lk_inv (%q0), %%xmm10 # inv \n\t" \ - "movdqa .Lk_inv+16(%q0), %%xmm11 # inva \n\t" \ - "movdqa .Lk_sb1 (%q0), %%xmm13 # sb1u \n\t" \ - "movdqa .Lk_sb1+16(%q0), %%xmm12 # sb1t \n\t" \ - "movdqa .Lk_sb2 (%q0), %%xmm15 # sb2u \n\t" \ - "movdqa .Lk_sb2+16(%q0), %%xmm14 # sb2t \n\t" \ - : "=c" (const_ptr) \ + asm volatile ("call *%[core] \n\t" \ : \ - : "memory" ) + : [core] "r" (_gcry_aes_ssse3_enc_preload) \ + : "rax", "cc", "memory" ) -#define vpaes_ssse3_prepare_dec(const_ptr) \ +#define vpaes_ssse3_prepare_dec() \ vpaes_ssse3_prepare(); \ - asm volatile ("lea .Laes_consts(%%rip), %q0 \n\t" \ - "movdqa (%q0), %%xmm9 # 0F \n\t" \ - "movdqa .Lk_inv (%q0), %%xmm10 # inv \n\t" \ - "movdqa .Lk_inv+16(%q0), %%xmm11 # inva \n\t" \ - "movdqa .Lk_dsb9 (%q0), %%xmm13 # sb9u \n\t" \ - "movdqa .Lk_dsb9+16(%q0), %%xmm12 # sb9t \n\t" \ - "movdqa .Lk_dsbd (%q0), %%xmm15 # sbdu \n\t" \ - "movdqa .Lk_dsbb (%q0), %%xmm14 # sbbu \n\t" \ - "movdqa .Lk_dsbe (%q0), %%xmm8 # sbeu \n\t" \ - : "=c" (const_ptr) \ + asm volatile ("call *%[core] \n\t" \ : \ - : "memory" ) + : [core] "r" (_gcry_aes_ssse3_dec_preload) \ + : "rax", "cc", "memory" ) @@ -159,9 +155,10 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) "leaq %[buf], %%rdx" "\n\t" "movl %[dir], %%ecx" "\n\t" "movl %[rotoffs], %%r8d" "\n\t" - "call _aes_schedule_core" "\n\t" + "call *%[core]" "\n\t" : - : [key] "m" (*key), + : [core] "r" (&_gcry_aes_ssse3_schedule_core), + [key] "m" (*key), [bits] "g" (keybits), [buf] "m" (ctx->keyschenc32[0][0]), [dir] "g" (0), @@ -169,10 +166,31 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) : "r8", "r9", "r10", "r11", "rax", "rcx", "rdx", "rdi", "rsi", "cc", "memory"); - vpaes_ssse3_cleanup(); - /* Save key for setting up decryption. */ - memcpy(&ctx->keyschdec32[0][0], key, keybits / 8); + if (keybits > 192) + asm volatile ("movdqu (%[src]), %%xmm0\n\t" + "movdqu 16(%[src]), %%xmm1\n\t" + "movdqu %%xmm0, (%[dst])\n\t" + "movdqu %%xmm1, 16(%[dst])\n\t" + : /* No output */ + : [dst] "r" (&ctx->keyschdec32[0][0]), [src] "r" (key) + : "memory" ); + else if (keybits == 192) + asm volatile ("movdqu (%[src]), %%xmm0\n\t" + "movq 16(%[src]), %%xmm1\n\t" + "movdqu %%xmm0, (%[dst])\n\t" + "movq %%xmm1, 16(%[dst])\n\t" + : /* No output */ + : [dst] "r" (&ctx->keyschdec32[0][0]), [src] "r" (key) + : "memory" ); + else + asm volatile ("movdqu (%[src]), %%xmm0\n\t" + "movdqu %%xmm0, (%[dst])\n\t" + : /* No output */ + : [dst] "r" (&ctx->keyschdec32[0][0]), [src] "r" (key) + : "memory" ); + + vpaes_ssse3_cleanup(); } @@ -190,9 +208,10 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) "leaq %[buf], %%rdx" "\n\t" "movl %[dir], %%ecx" "\n\t" "movl %[rotoffs], %%r8d" "\n\t" - "call _aes_schedule_core" "\n\t" + "call *%[core]" "\n\t" : - : [key] "m" (ctx->keyschdec32[0][0]), + : [core] "r" (_gcry_aes_ssse3_schedule_core), + [key] "m" (ctx->keyschdec32[0][0]), [bits] "g" (keybits), [buf] "m" (ctx->keyschdec32[ctx->rounds][0]), [dir] "g" (1), @@ -207,32 +226,30 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) /* Encrypt one block using the Intel SSSE3 instructions. Block is input * and output through SSE register xmm0. 
*/ static inline void -do_vpaes_ssse3_enc (const RIJNDAEL_context *ctx, unsigned int nrounds, - const void *aes_const_ptr) +do_vpaes_ssse3_enc (const RIJNDAEL_context *ctx, unsigned int nrounds) { unsigned int middle_rounds = nrounds - 1; const void *keysched = ctx->keyschenc32; - asm volatile ("call _aes_encrypt_core" "\n\t" - : "+a" (middle_rounds), "+d" (keysched) - : "c" (aes_const_ptr) - : "rdi", "rsi", "cc", "memory"); + asm volatile ("call *%[core]" "\n\t" + : "+a" (middle_rounds), "+d" (keysched) + : [core] "r" (_gcry_aes_ssse3_encrypt_core) + : "rcx", "rsi", "rdi", "cc", "memory"); } /* Decrypt one block using the Intel SSSE3 instructions. Block is input * and output through SSE register xmm0. */ static inline void -do_vpaes_ssse3_dec (const RIJNDAEL_context *ctx, unsigned int nrounds, - const void *aes_const_ptr) +do_vpaes_ssse3_dec (const RIJNDAEL_context *ctx, unsigned int nrounds) { unsigned int middle_rounds = nrounds - 1; const void *keysched = ctx->keyschdec32; - asm volatile ("call _aes_decrypt_core" "\n\t" + asm volatile ("call *%[core]" "\n\t" : "+a" (middle_rounds), "+d" (keysched) - : "c" (aes_const_ptr) - : "rsi", "cc", "memory"); + : [core] "r" (_gcry_aes_ssse3_decrypt_core) + : "rcx", "rsi", "cc", "memory"); } @@ -241,15 +258,14 @@ _gcry_aes_ssse3_encrypt (const RIJNDAEL_context *ctx, unsigned char *dst, const unsigned char *src) { unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_enc (aes_const_ptr); + vpaes_ssse3_prepare_enc (); asm volatile ("movdqu %[src], %%xmm0\n\t" : : [src] "m" (*src) : "memory" ); - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("movdqu %%xmm0, %[dst]\n\t" : [dst] "=m" (*dst) : @@ -265,10 +281,9 @@ _gcry_aes_ssse3_cfb_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, size_t nblocks) { unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_enc (aes_const_ptr); + vpaes_ssse3_prepare_enc (); asm volatile ("movdqu %[iv], %%xmm0\n\t" : /* No output */ @@ -277,7 +292,7 @@ _gcry_aes_ssse3_cfb_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, for ( ;nblocks; nblocks-- ) { - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("movdqu %[inbuf], %%xmm1\n\t" "pxor %%xmm1, %%xmm0\n\t" @@ -305,10 +320,9 @@ _gcry_aes_ssse3_cbc_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, size_t nblocks, int cbc_mac) { unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_enc (aes_const_ptr); + vpaes_ssse3_prepare_enc (); asm volatile ("movdqu %[iv], %%xmm7\n\t" : /* No output */ @@ -323,7 +337,7 @@ _gcry_aes_ssse3_cbc_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, : [inbuf] "m" (*inbuf) : "memory" ); - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("movdqa %%xmm0, %%xmm7\n\t" "movdqu %%xmm0, %[outbuf]\n\t" @@ -353,11 +367,10 @@ _gcry_aes_ssse3_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, static const unsigned char be_mask[16] __attribute__ ((aligned (16))) = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }; unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; u64 ctrlow; - vpaes_ssse3_prepare_enc (aes_const_ptr); + vpaes_ssse3_prepare_enc (); asm volatile ("movdqa %[mask], %%xmm6\n\t" /* Preload mask */ "movdqa (%[ctr]), %%xmm7\n\t" /* 
Preload CTR */ @@ -388,10 +401,10 @@ _gcry_aes_ssse3_ctr_enc (RIJNDAEL_context *ctx, unsigned char *outbuf, "pshufb %%xmm6, %%xmm7\n\t" : [ctrlow] "+r" (ctrlow) - : [ctr] "r" (ctr) + : : "cc", "memory"); - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("movdqu %[src], %%xmm1\n\t" /* xmm1 := input */ "pxor %%xmm1, %%xmm0\n\t" /* EncCTR ^= input */ @@ -418,15 +431,14 @@ _gcry_aes_ssse3_decrypt (const RIJNDAEL_context *ctx, unsigned char *dst, const unsigned char *src) { unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_dec (aes_const_ptr); + vpaes_ssse3_prepare_dec (); asm volatile ("movdqu %[src], %%xmm0\n\t" : : [src] "m" (*src) : "memory" ); - do_vpaes_ssse3_dec (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_dec (ctx, nrounds); asm volatile ("movdqu %%xmm0, %[dst]\n\t" : [dst] "=m" (*dst) : @@ -442,10 +454,9 @@ _gcry_aes_ssse3_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, size_t nblocks) { unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_enc (aes_const_ptr); + vpaes_ssse3_prepare_enc (); asm volatile ("movdqu %[iv], %%xmm0\n\t" : /* No output */ @@ -454,7 +465,7 @@ _gcry_aes_ssse3_cfb_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, for ( ;nblocks; nblocks-- ) { - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("movdqa %%xmm0, %%xmm6\n\t" "movdqu %[inbuf], %%xmm0\n\t" @@ -483,45 +494,40 @@ _gcry_aes_ssse3_cbc_dec (RIJNDAEL_context *ctx, unsigned char *outbuf, size_t nblocks) { unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_dec (aes_const_ptr); + vpaes_ssse3_prepare_dec (); - asm volatile - ("movdqu %[iv], %%xmm7\n\t" /* use xmm7 as fast IV storage */ - : /* No output */ - : [iv] "m" (*iv) - : "memory"); + asm volatile ("movdqu %[iv], %%xmm7\n\t" /* use xmm7 as fast IV storage */ + : /* No output */ + : [iv] "m" (*iv) + : "memory"); for ( ;nblocks; nblocks-- ) { - asm volatile - ("movdqu %[inbuf], %%xmm0\n\t" - "movdqa %%xmm0, %%xmm6\n\t" /* use xmm6 as savebuf */ - : /* No output */ - : [inbuf] "m" (*inbuf) - : "memory"); - - do_vpaes_ssse3_dec (ctx, nrounds, aes_const_ptr); - - asm volatile - ("pxor %%xmm7, %%xmm0\n\t" /* xor IV with output */ - "movdqu %%xmm0, %[outbuf]\n\t" - "movdqu %%xmm6, %%xmm7\n\t" /* store savebuf as new IV */ - : [outbuf] "=m" (*outbuf) - : - : "memory"); + asm volatile ("movdqu %[inbuf], %%xmm0\n\t" + "movdqa %%xmm0, %%xmm6\n\t" /* use xmm6 as savebuf */ + : /* No output */ + : [inbuf] "m" (*inbuf) + : "memory"); + + do_vpaes_ssse3_dec (ctx, nrounds); + + asm volatile ("pxor %%xmm7, %%xmm0\n\t" /* xor IV with output */ + "movdqu %%xmm0, %[outbuf]\n\t" + "movdqu %%xmm6, %%xmm7\n\t" /* store savebuf as new IV */ + : [outbuf] "=m" (*outbuf) + : + : "memory"); outbuf += BLOCKSIZE; inbuf += BLOCKSIZE; } - asm volatile - ("movdqu %%xmm7, %[iv]\n\t" /* store IV */ - : /* No output */ - : [iv] "m" (*iv) - : "memory"); + asm volatile ("movdqu %%xmm7, %[iv]\n\t" /* store IV */ + : /* No output */ + : [iv] "m" (*iv) + : "memory"); vpaes_ssse3_cleanup (); } @@ -536,10 +542,9 @@ ssse3_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, const unsigned char *inbuf = inbuf_arg; u64 n = c->u_mode.ocb.data_nblocks; unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_enc 
(aes_const_ptr); + vpaes_ssse3_prepare_enc (); /* Preload Offset and Checksum */ asm volatile ("movdqu %[iv], %%xmm7\n\t" @@ -568,7 +573,7 @@ ssse3_ocb_enc (gcry_cipher_hd_t c, void *outbuf_arg, [inbuf] "m" (*inbuf) : "memory" ); - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("pxor %%xmm7, %%xmm0\n\t" "movdqu %%xmm0, %[outbuf]\n\t" @@ -600,10 +605,9 @@ ssse3_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, const unsigned char *inbuf = inbuf_arg; u64 n = c->u_mode.ocb.data_nblocks; unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_dec (aes_const_ptr); + vpaes_ssse3_prepare_dec (); /* Preload Offset and Checksum */ asm volatile ("movdqu %[iv], %%xmm7\n\t" @@ -631,7 +635,7 @@ ssse3_ocb_dec (gcry_cipher_hd_t c, void *outbuf_arg, [inbuf] "m" (*inbuf) : "memory" ); - do_vpaes_ssse3_dec (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_dec (ctx, nrounds); asm volatile ("pxor %%xmm7, %%xmm0\n\t" "pxor %%xmm0, %%xmm6\n\t" @@ -675,10 +679,9 @@ _gcry_aes_ssse3_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, const unsigned char *abuf = abuf_arg; u64 n = c->u_mode.ocb.aad_nblocks; unsigned int nrounds = ctx->rounds; - const void *aes_const_ptr; byte ssse3_state[SSSE3_STATE_SIZE]; - vpaes_ssse3_prepare_enc (aes_const_ptr); + vpaes_ssse3_prepare_enc (); /* Preload Offset and Sum */ asm volatile ("movdqu %[iv], %%xmm7\n\t" @@ -705,7 +708,7 @@ _gcry_aes_ssse3_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, [abuf] "m" (*abuf) : "memory" ); - do_vpaes_ssse3_enc (ctx, nrounds, aes_const_ptr); + do_vpaes_ssse3_enc (ctx, nrounds); asm volatile ("pxor %%xmm0, %%xmm6\n\t" : @@ -726,774 +729,4 @@ _gcry_aes_ssse3_ocb_auth (gcry_cipher_hd_t c, const void *abuf_arg, vpaes_ssse3_cleanup (); } - -#ifdef HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS -# define X(...) -#else -# define X(...) __VA_ARGS__ -#endif - -asm ( - "\n\t" "##" - "\n\t" "## Constant-time SSSE3 AES core implementation." - "\n\t" "##" - "\n\t" "## By Mike Hamburg (Stanford University), 2009" - "\n\t" "## Public domain." - "\n\t" "##" - - "\n\t" ".text" - - "\n\t" "##" - "\n\t" "## _aes_encrypt_core" - "\n\t" "##" - "\n\t" "## AES-encrypt %xmm0." 
- "\n\t" "##" - "\n\t" "## Inputs:" - "\n\t" "## %xmm0 = input" - "\n\t" "## %xmm9-%xmm15 as in .Laes_preheat" - "\n\t" "## %rcx = .Laes_consts" - "\n\t" "## (%rdx) = scheduled keys" - "\n\t" "## %rax = nrounds - 1" - "\n\t" "##" - "\n\t" "## Output in %xmm0" - "\n\t" "## Clobbers %xmm1-%xmm4, %r9, %r11, %rax" - "\n\t" "## Preserves %xmm6 - %xmm7 so you get some local vectors" - "\n\t" "##" - "\n\t" "##" - "\n\t" ".align 16" -X("\n\t" ".type _aes_encrypt_core, at function") - "\n\t" "_aes_encrypt_core:" - "\n\t" " leaq .Lk_mc_backward(%rcx), %rdi" - "\n\t" " mov $16, %rsi" - "\n\t" " movdqa .Lk_ipt (%rcx), %xmm2 # iptlo" - "\n\t" " movdqa %xmm9, %xmm1" - "\n\t" " pandn %xmm0, %xmm1" - "\n\t" " psrld $4, %xmm1" - "\n\t" " pand %xmm9, %xmm0" - "\n\t" " pshufb %xmm0, %xmm2" - "\n\t" " movdqa .Lk_ipt+16(%rcx), %xmm0 # ipthi" - "\n\t" " pshufb %xmm1, %xmm0" - "\n\t" " pxor (%rdx),%xmm2" - "\n\t" " pxor %xmm2, %xmm0" - "\n\t" " add $16, %rdx" - "\n\t" " jmp .Laes_entry" - - "\n\t" ".align 8" - "\n\t" ".Laes_loop:" - "\n\t" " # middle of middle round" - "\n\t" " movdqa %xmm13, %xmm4 # 4 : sb1u" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sb1u" - "\n\t" " pxor (%rdx), %xmm4 # 4 = sb1u + k" - "\n\t" " movdqa %xmm12, %xmm0 # 0 : sb1t" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sb1t" - "\n\t" " pxor %xmm4, %xmm0 # 0 = A" - "\n\t" " movdqa %xmm15, %xmm4 # 4 : sb2u" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sb2u" - "\n\t" " movdqa .Lk_mc_forward-.Lk_mc_backward(%rsi,%rdi), %xmm1" - "\n\t" " movdqa %xmm14, %xmm2 # 2 : sb2t" - "\n\t" " pshufb %xmm3, %xmm2 # 2 = sb2t" - "\n\t" " pxor %xmm4, %xmm2 # 2 = 2A" - "\n\t" " movdqa %xmm0, %xmm3 # 3 = A" - "\n\t" " pshufb %xmm1, %xmm0 # 0 = B" - "\n\t" " pxor %xmm2, %xmm0 # 0 = 2A+B" - "\n\t" " pshufb (%rsi,%rdi), %xmm3 # 3 = D" - "\n\t" " lea 16(%esi),%esi # next mc" - "\n\t" " pxor %xmm0, %xmm3 # 3 = 2A+B+D" - "\n\t" " lea 16(%rdx),%rdx # next key" - "\n\t" " pshufb %xmm1, %xmm0 # 0 = 2B+C" - "\n\t" " pxor %xmm3, %xmm0 # 0 = 2A+3B+C+D" - "\n\t" " and $48, %rsi # ... mod 4" - "\n\t" " dec %rax # nr--" - - "\n\t" ".Laes_entry:" - "\n\t" " # top of round" - "\n\t" " movdqa %xmm9, %xmm1 # 1 : i" - "\n\t" " pandn %xmm0, %xmm1 # 1 = i<<4" - "\n\t" " psrld $4, %xmm1 # 1 = i" - "\n\t" " pand %xmm9, %xmm0 # 0 = k" - "\n\t" " movdqa %xmm11, %xmm2 # 2 : a/k" - "\n\t" " pshufb %xmm0, %xmm2 # 2 = a/k" - "\n\t" " pxor %xmm1, %xmm0 # 0 = j" - "\n\t" " movdqa %xmm10, %xmm3 # 3 : 1/i" - "\n\t" " pshufb %xmm1, %xmm3 # 3 = 1/i" - "\n\t" " pxor %xmm2, %xmm3 # 3 = iak = 1/i + a/k" - "\n\t" " movdqa %xmm10, %xmm4 # 4 : 1/j" - "\n\t" " pshufb %xmm0, %xmm4 # 4 = 1/j" - "\n\t" " pxor %xmm2, %xmm4 # 4 = jak = 1/j + a/k" - "\n\t" " movdqa %xmm10, %xmm2 # 2 : 1/iak" - "\n\t" " pshufb %xmm3, %xmm2 # 2 = 1/iak" - "\n\t" " pxor %xmm0, %xmm2 # 2 = io" - "\n\t" " movdqa %xmm10, %xmm3 # 3 : 1/jak" - "\n\t" " pshufb %xmm4, %xmm3 # 3 = 1/jak" - "\n\t" " pxor %xmm1, %xmm3 # 3 = jo" - "\n\t" " jnz .Laes_loop" - - "\n\t" " # middle of last round" - "\n\t" " movdqa .Lk_sbo(%rcx), %xmm4 # 3 : sbou" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sbou" - "\n\t" " pxor (%rdx), %xmm4 # 4 = sb1u + k" - "\n\t" " movdqa .Lk_sbo+16(%rcx), %xmm0 # 0 : sbot" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sb1t" - "\n\t" " pxor %xmm4, %xmm0 # 0 = A" - "\n\t" " pshufb .Lk_sr(%rsi,%rcx), %xmm0" - "\n\t" " ret" -X("\n\t" ".size _aes_encrypt_core,.-_aes_encrypt_core") - - "\n\t" "##" - "\n\t" "## Decryption core" - "\n\t" "##" - "\n\t" "## Same API as encryption core." 
- "\n\t" "##" - "\n\t" ".align 16" -X("\n\t" ".type _aes_decrypt_core, at function") - "\n\t" "_aes_decrypt_core:" - "\n\t" " movl %eax, %esi" - "\n\t" " shll $4, %esi" - "\n\t" " xorl $48, %esi" - "\n\t" " andl $48, %esi" - "\n\t" " movdqa .Lk_dipt (%rcx), %xmm2 # iptlo" - "\n\t" " movdqa %xmm9, %xmm1" - "\n\t" " pandn %xmm0, %xmm1" - "\n\t" " psrld $4, %xmm1" - "\n\t" " pand %xmm9, %xmm0" - "\n\t" " pshufb %xmm0, %xmm2" - "\n\t" " movdqa .Lk_dipt+16(%rcx), %xmm0 # ipthi" - "\n\t" " pshufb %xmm1, %xmm0" - "\n\t" " pxor (%rdx), %xmm2" - "\n\t" " pxor %xmm2, %xmm0" - "\n\t" " movdqa .Lk_mc_forward+48(%rcx), %xmm5" - "\n\t" " lea 16(%rdx), %rdx" - "\n\t" " neg %rax" - "\n\t" " jmp .Laes_dec_entry" - - "\n\t" ".align 16" - "\n\t" ".Laes_dec_loop:" - "\n\t" "##" - "\n\t" "## Inverse mix columns" - "\n\t" "##" - "\n\t" " movdqa %xmm13, %xmm4 # 4 : sb9u" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sb9u" - "\n\t" " pxor (%rdx), %xmm4" - "\n\t" " movdqa %xmm12, %xmm0 # 0 : sb9t" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sb9t" - "\n\t" " movdqa .Lk_dsbd+16(%rcx),%xmm1 # 1 : sbdt" - "\n\t" " pxor %xmm4, %xmm0 # 0 = ch" - "\n\t" " lea 16(%rdx), %rdx # next round key" - - "\n\t" " pshufb %xmm5, %xmm0 # MC ch" - "\n\t" " movdqa %xmm15, %xmm4 # 4 : sbdu" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sbdu" - "\n\t" " pxor %xmm0, %xmm4 # 4 = ch" - "\n\t" " pshufb %xmm3, %xmm1 # 1 = sbdt" - "\n\t" " pxor %xmm4, %xmm1 # 1 = ch" - - "\n\t" " pshufb %xmm5, %xmm1 # MC ch" - "\n\t" " movdqa %xmm14, %xmm4 # 4 : sbbu" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sbbu" - "\n\t" " inc %rax # nr--" - "\n\t" " pxor %xmm1, %xmm4 # 4 = ch" - "\n\t" " movdqa .Lk_dsbb+16(%rcx),%xmm0 # 0 : sbbt" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sbbt" - "\n\t" " pxor %xmm4, %xmm0 # 0 = ch" - - "\n\t" " pshufb %xmm5, %xmm0 # MC ch" - "\n\t" " movdqa %xmm8, %xmm4 # 4 : sbeu" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sbeu" - "\n\t" " pshufd $0x93, %xmm5, %xmm5" - "\n\t" " pxor %xmm0, %xmm4 # 4 = ch" - "\n\t" " movdqa .Lk_dsbe+16(%rcx),%xmm0 # 0 : sbet" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sbet" - "\n\t" " pxor %xmm4, %xmm0 # 0 = ch" - - "\n\t" ".Laes_dec_entry:" - "\n\t" " # top of round" - "\n\t" " movdqa %xmm9, %xmm1 # 1 : i" - "\n\t" " pandn %xmm0, %xmm1 # 1 = i<<4" - "\n\t" " psrld $4, %xmm1 # 1 = i" - "\n\t" " pand %xmm9, %xmm0 # 0 = k" - "\n\t" " movdqa %xmm11, %xmm2 # 2 : a/k" - "\n\t" " pshufb %xmm0, %xmm2 # 2 = a/k" - "\n\t" " pxor %xmm1, %xmm0 # 0 = j" - "\n\t" " movdqa %xmm10, %xmm3 # 3 : 1/i" - "\n\t" " pshufb %xmm1, %xmm3 # 3 = 1/i" - "\n\t" " pxor %xmm2, %xmm3 # 3 = iak = 1/i + a/k" - "\n\t" " movdqa %xmm10, %xmm4 # 4 : 1/j" - "\n\t" " pshufb %xmm0, %xmm4 # 4 = 1/j" - "\n\t" " pxor %xmm2, %xmm4 # 4 = jak = 1/j + a/k" - "\n\t" " movdqa %xmm10, %xmm2 # 2 : 1/iak" - "\n\t" " pshufb %xmm3, %xmm2 # 2 = 1/iak" - "\n\t" " pxor %xmm0, %xmm2 # 2 = io" - "\n\t" " movdqa %xmm10, %xmm3 # 3 : 1/jak" - "\n\t" " pshufb %xmm4, %xmm3 # 3 = 1/jak" - "\n\t" " pxor %xmm1, %xmm3 # 3 = jo" - "\n\t" " jnz .Laes_dec_loop" - - "\n\t" " # middle of last round" - "\n\t" " movdqa .Lk_dsbo(%rcx), %xmm4 # 3 : sbou" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sbou" - "\n\t" " pxor (%rdx), %xmm4 # 4 = sb1u + k" - "\n\t" " movdqa .Lk_dsbo+16(%rcx), %xmm0 # 0 : sbot" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sb1t" - "\n\t" " pxor %xmm4, %xmm0 # 0 = A" - "\n\t" " pshufb .Lk_sr(%rsi,%rcx), %xmm0" - "\n\t" " ret" -X("\n\t" ".size _aes_decrypt_core,.-_aes_decrypt_core") - - "\n\t" "########################################################" - "\n\t" "## ##" - "\n\t" "## AES key schedule ##" - "\n\t" "## ##" - 
"\n\t" "########################################################" - - "\n\t" ".align 16" -X("\n\t" ".type _aes_schedule_core, at function") - "\n\t" "_aes_schedule_core:" - "\n\t" " # rdi = key" - "\n\t" " # rsi = size in bits" - "\n\t" " # rdx = buffer" - "\n\t" " # rcx = direction. 0=encrypt, 1=decrypt" - - "\n\t" " # load the tables" - "\n\t" " lea .Laes_consts(%rip), %r10" - "\n\t" " movdqa (%r10), %xmm9 # 0F" - "\n\t" " movdqa .Lk_inv (%r10), %xmm10 # inv" - "\n\t" " movdqa .Lk_inv+16(%r10), %xmm11 # inva" - "\n\t" " movdqa .Lk_sb1 (%r10), %xmm13 # sb1u" - "\n\t" " movdqa .Lk_sb1+16(%r10), %xmm12 # sb1t" - "\n\t" " movdqa .Lk_sb2 (%r10), %xmm15 # sb2u" - "\n\t" " movdqa .Lk_sb2+16(%r10), %xmm14 # sb2t" - - "\n\t" " movdqa .Lk_rcon(%r10), %xmm8 # load rcon" - "\n\t" " movdqu (%rdi), %xmm0 # load key (unaligned)" - - "\n\t" " # input transform" - "\n\t" " movdqu %xmm0, %xmm3" - "\n\t" " lea .Lk_ipt(%r10), %r11" - "\n\t" " call .Laes_schedule_transform" - "\n\t" " movdqu %xmm0, %xmm7" - - "\n\t" " test %rcx, %rcx" - "\n\t" " jnz .Laes_schedule_am_decrypting" - - "\n\t" " # encrypting, output zeroth round key after transform" - "\n\t" " movdqa %xmm0, (%rdx)" - "\n\t" " jmp .Laes_schedule_go" - - "\n\t" ".Laes_schedule_am_decrypting:" - "\n\t" " # decrypting, output zeroth round key after shiftrows" - "\n\t" " pshufb .Lk_sr(%r8,%r10),%xmm3" - "\n\t" " movdqa %xmm3, (%rdx)" - "\n\t" " xor $48, %r8" - - "\n\t" ".Laes_schedule_go:" - "\n\t" " cmp $192, %rsi" - "\n\t" " je .Laes_schedule_192" - "\n\t" " cmp $256, %rsi" - "\n\t" " je .Laes_schedule_256" - "\n\t" " # 128: fall though" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_128" - "\n\t" "##" - "\n\t" "## 128-bit specific part of key schedule." - "\n\t" "##" - "\n\t" "## This schedule is really simple, because all its parts" - "\n\t" "## are accomplished by the subroutines." - "\n\t" "##" - "\n\t" ".Laes_schedule_128:" - "\n\t" " mov $10, %rsi" - - "\n\t" ".Laes_schedule_128_L:" - "\n\t" " call .Laes_schedule_round" - "\n\t" " dec %rsi" - "\n\t" " jz .Laes_schedule_mangle_last" - "\n\t" " call .Laes_schedule_mangle # write output" - "\n\t" " jmp .Laes_schedule_128_L" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_192" - "\n\t" "##" - "\n\t" "## 192-bit specific part of key schedule." - "\n\t" "##" - "\n\t" "## The main body of this schedule is the same as the 128-bit" - "\n\t" "## schedule, but with more smearing. The long, high side is" - "\n\t" "## stored in %xmm7 as before, and the short, low side is in" - "\n\t" "## the high bits of %xmm6." - "\n\t" "##" - "\n\t" "## This schedule is somewhat nastier, however, because each" - "\n\t" "## round produces 192 bits of key material, or 1.5 round keys." - "\n\t" "## Therefore, on each cycle we do 2 rounds and produce 3 round" - "\n\t" "## keys." 
- "\n\t" "##" - "\n\t" ".Laes_schedule_192:" - "\n\t" " movdqu 8(%rdi),%xmm0 # load key part 2 (very unaligned)" - "\n\t" " call .Laes_schedule_transform # input transform" - "\n\t" " pshufd $0x0E, %xmm0, %xmm6" - "\n\t" " pslldq $8, %xmm6 # clobber low side with zeros" - "\n\t" " mov $4, %rsi" - - "\n\t" ".Laes_schedule_192_L:" - "\n\t" " call .Laes_schedule_round" - "\n\t" " palignr $8,%xmm6,%xmm0 " - "\n\t" " call .Laes_schedule_mangle # save key n" - "\n\t" " call .Laes_schedule_192_smear" - "\n\t" " call .Laes_schedule_mangle # save key n+1" - "\n\t" " call .Laes_schedule_round" - "\n\t" " dec %rsi" - "\n\t" " jz .Laes_schedule_mangle_last" - "\n\t" " call .Laes_schedule_mangle # save key n+2" - "\n\t" " call .Laes_schedule_192_smear" - "\n\t" " jmp .Laes_schedule_192_L" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_192_smear" - "\n\t" "##" - "\n\t" "## Smear the short, low side in the 192-bit key schedule." - "\n\t" "##" - "\n\t" "## Inputs:" - "\n\t" "## %xmm7: high side, b a x y" - "\n\t" "## %xmm6: low side, d c 0 0" - "\n\t" "## %xmm13: 0" - "\n\t" "##" - "\n\t" "## Outputs:" - "\n\t" "## %xmm6: b+c+d b+c 0 0" - "\n\t" "## %xmm0: b+c+d b+c b a" - "\n\t" "##" - "\n\t" ".Laes_schedule_192_smear:" - "\n\t" " pshufd $0x80, %xmm6, %xmm0 # d c 0 0 -> c 0 0 0" - "\n\t" " pxor %xmm0, %xmm6 # -> c+d c 0 0" - "\n\t" " pshufd $0xFE, %xmm7, %xmm0 # b a _ _ -> b b b a" - "\n\t" " pxor %xmm6, %xmm0 # -> b+c+d b+c b a" - "\n\t" " pshufd $0x0E, %xmm0, %xmm6" - "\n\t" " pslldq $8, %xmm6 # clobber low side with zeros" - "\n\t" " ret" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_256" - "\n\t" "##" - "\n\t" "## 256-bit specific part of key schedule." - "\n\t" "##" - "\n\t" "## The structure here is very similar to the 128-bit" - "\n\t" "## schedule, but with an additional 'low side' in" - "\n\t" "## %xmm6. The low side's rounds are the same as the" - "\n\t" "## high side's, except no rcon and no rotation." - "\n\t" "##" - "\n\t" ".Laes_schedule_256:" - "\n\t" " movdqu 16(%rdi),%xmm0 # load key part 2 (unaligned)" - "\n\t" " call .Laes_schedule_transform # input transform" - "\n\t" " mov $7, %rsi" - - "\n\t" ".Laes_schedule_256_L:" - "\n\t" " call .Laes_schedule_mangle # output low result" - "\n\t" " movdqa %xmm0, %xmm6 # save cur_lo in xmm6" - - "\n\t" " # high round" - "\n\t" " call .Laes_schedule_round" - "\n\t" " dec %rsi" - "\n\t" " jz .Laes_schedule_mangle_last" - "\n\t" " call .Laes_schedule_mangle " - - "\n\t" " # low round. swap xmm7 and xmm6" - "\n\t" " pshufd $0xFF, %xmm0, %xmm0" - "\n\t" " movdqa %xmm7, %xmm5" - "\n\t" " movdqa %xmm6, %xmm7" - "\n\t" " call .Laes_schedule_low_round" - "\n\t" " movdqa %xmm5, %xmm7" - - "\n\t" " jmp .Laes_schedule_256_L" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_round" - "\n\t" "##" - "\n\t" "## Runs one main round of the key schedule on %xmm0, %xmm7" - "\n\t" "##" - "\n\t" "## Specifically, runs subbytes on the high dword of %xmm0" - "\n\t" "## then rotates it by one byte and xors into the low dword of" - "\n\t" "## %xmm7." - "\n\t" "##" - "\n\t" "## Adds rcon from low byte of %xmm8, then rotates %xmm8 for" - "\n\t" "## next rcon." - "\n\t" "##" - "\n\t" "## Smears the dwords of %xmm7 by xoring the low into the" - "\n\t" "## second low, result into third, result into highest." - "\n\t" "##" - "\n\t" "## Returns results in %xmm7 = %xmm0." - "\n\t" "## Clobbers %xmm1-%xmm4, %r11." 
- "\n\t" "##" - "\n\t" ".Laes_schedule_round:" - "\n\t" " # extract rcon from xmm8" - "\n\t" " pxor %xmm1, %xmm1" - "\n\t" " palignr $15, %xmm8, %xmm1" - "\n\t" " palignr $15, %xmm8, %xmm8" - "\n\t" " pxor %xmm1, %xmm7" - - "\n\t" " # rotate" - "\n\t" " pshufd $0xFF, %xmm0, %xmm0" - "\n\t" " palignr $1, %xmm0, %xmm0" - - "\n\t" " # fall through..." - - "\n\t" " # low round: same as high round, but no rotation and no rcon." - "\n\t" ".Laes_schedule_low_round:" - "\n\t" " # smear xmm7" - "\n\t" " movdqa %xmm7, %xmm1" - "\n\t" " pslldq $4, %xmm7" - "\n\t" " pxor %xmm1, %xmm7" - "\n\t" " movdqa %xmm7, %xmm1" - "\n\t" " pslldq $8, %xmm7" - "\n\t" " pxor %xmm1, %xmm7" - "\n\t" " pxor .Lk_s63(%r10), %xmm7" - - "\n\t" " # subbytes" - "\n\t" " movdqa %xmm9, %xmm1" - "\n\t" " pandn %xmm0, %xmm1" - "\n\t" " psrld $4, %xmm1 # 1 = i" - "\n\t" " pand %xmm9, %xmm0 # 0 = k" - "\n\t" " movdqa %xmm11, %xmm2 # 2 : a/k" - "\n\t" " pshufb %xmm0, %xmm2 # 2 = a/k" - "\n\t" " pxor %xmm1, %xmm0 # 0 = j" - "\n\t" " movdqa %xmm10, %xmm3 # 3 : 1/i" - "\n\t" " pshufb %xmm1, %xmm3 # 3 = 1/i" - "\n\t" " pxor %xmm2, %xmm3 # 3 = iak = 1/i + a/k" - "\n\t" " movdqa %xmm10, %xmm4 # 4 : 1/j" - "\n\t" " pshufb %xmm0, %xmm4 # 4 = 1/j" - "\n\t" " pxor %xmm2, %xmm4 # 4 = jak = 1/j + a/k" - "\n\t" " movdqa %xmm10, %xmm2 # 2 : 1/iak" - "\n\t" " pshufb %xmm3, %xmm2 # 2 = 1/iak" - "\n\t" " pxor %xmm0, %xmm2 # 2 = io" - "\n\t" " movdqa %xmm10, %xmm3 # 3 : 1/jak" - "\n\t" " pshufb %xmm4, %xmm3 # 3 = 1/jak" - "\n\t" " pxor %xmm1, %xmm3 # 3 = jo" - "\n\t" " movdqa .Lk_sb1(%r10), %xmm4 # 4 : sbou" - "\n\t" " pshufb %xmm2, %xmm4 # 4 = sbou" - "\n\t" " movdqa .Lk_sb1+16(%r10), %xmm0 # 0 : sbot" - "\n\t" " pshufb %xmm3, %xmm0 # 0 = sb1t" - "\n\t" " pxor %xmm4, %xmm0 # 0 = sbox output" - - "\n\t" " # add in smeared stuff" - "\n\t" " pxor %xmm7, %xmm0 " - "\n\t" " movdqa %xmm0, %xmm7" - "\n\t" " ret" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_transform" - "\n\t" "##" - "\n\t" "## Linear-transform %xmm0 according to tables at (%r11)" - "\n\t" "##" - "\n\t" "## Requires that %xmm9 = 0x0F0F... as in preheat" - "\n\t" "## Output in %xmm0" - "\n\t" "## Clobbers %xmm1, %xmm2" - "\n\t" "##" - "\n\t" ".Laes_schedule_transform:" - "\n\t" " movdqa %xmm9, %xmm1" - "\n\t" " pandn %xmm0, %xmm1" - "\n\t" " psrld $4, %xmm1" - "\n\t" " pand %xmm9, %xmm0" - "\n\t" " movdqa (%r11), %xmm2 # lo" - "\n\t" " pshufb %xmm0, %xmm2" - "\n\t" " movdqa 16(%r11), %xmm0 # hi" - "\n\t" " pshufb %xmm1, %xmm0" - "\n\t" " pxor %xmm2, %xmm0" - "\n\t" " ret" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_mangle" - "\n\t" "##" - "\n\t" "## Mangle xmm0 from (basis-transformed) standard version" - "\n\t" "## to our version." 
- "\n\t" "##" - "\n\t" "## On encrypt," - "\n\t" "## xor with 0x63" - "\n\t" "## multiply by circulant 0,1,1,1" - "\n\t" "## apply shiftrows transform" - "\n\t" "##" - "\n\t" "## On decrypt," - "\n\t" "## xor with 0x63" - "\n\t" "## multiply by 'inverse mixcolumns' circulant E,B,D,9" - "\n\t" "## deskew" - "\n\t" "## apply shiftrows transform" - "\n\t" "##" - "\n\t" "##" - "\n\t" "## Writes out to (%rdx), and increments or decrements it" - "\n\t" "## Keeps track of round number mod 4 in %r8" - "\n\t" "## Preserves xmm0" - "\n\t" "## Clobbers xmm1-xmm5" - "\n\t" "##" - "\n\t" ".Laes_schedule_mangle:" - "\n\t" " movdqa %xmm0, %xmm4 # save xmm0 for later" - "\n\t" " movdqa .Lk_mc_forward(%r10),%xmm5" - "\n\t" " test %rcx, %rcx" - "\n\t" " jnz .Laes_schedule_mangle_dec" - - "\n\t" " # encrypting" - "\n\t" " add $16, %rdx" - "\n\t" " pxor .Lk_s63(%r10),%xmm4" - "\n\t" " pshufb %xmm5, %xmm4" - "\n\t" " movdqa %xmm4, %xmm3" - "\n\t" " pshufb %xmm5, %xmm4" - "\n\t" " pxor %xmm4, %xmm3" - "\n\t" " pshufb %xmm5, %xmm4" - "\n\t" " pxor %xmm4, %xmm3" - - "\n\t" " jmp .Laes_schedule_mangle_both" - - "\n\t" ".Laes_schedule_mangle_dec:" - "\n\t" " lea .Lk_dks_1(%r10), %r11 # first table: *9" - "\n\t" " call .Laes_schedule_transform" - "\n\t" " movdqa %xmm0, %xmm3" - "\n\t" " pshufb %xmm5, %xmm3" - - "\n\t" " add $32, %r11 # next table: *B" - "\n\t" " call .Laes_schedule_transform" - "\n\t" " pxor %xmm0, %xmm3" - "\n\t" " pshufb %xmm5, %xmm3" - - "\n\t" " add $32, %r11 # next table: *D" - "\n\t" " call .Laes_schedule_transform" - "\n\t" " pxor %xmm0, %xmm3" - "\n\t" " pshufb %xmm5, %xmm3" - - "\n\t" " add $32, %r11 # next table: *E" - "\n\t" " call .Laes_schedule_transform" - "\n\t" " pxor %xmm0, %xmm3" - "\n\t" " pshufb %xmm5, %xmm3" - - "\n\t" " movdqa %xmm4, %xmm0 # restore %xmm0" - "\n\t" " add $-16, %rdx" - - "\n\t" ".Laes_schedule_mangle_both:" - "\n\t" " pshufb .Lk_sr(%r8,%r10),%xmm3" - "\n\t" " add $-16, %r8" - "\n\t" " and $48, %r8" - "\n\t" " movdqa %xmm3, (%rdx)" - "\n\t" " ret" - - "\n\t" "##" - "\n\t" "## .Laes_schedule_mangle_last" - "\n\t" "##" - "\n\t" "## Mangler for last round of key schedule" - "\n\t" "## Mangles %xmm0" - "\n\t" "## when encrypting, outputs out(%xmm0) ^ 63" - "\n\t" "## when decrypting, outputs unskew(%xmm0)" - "\n\t" "##" - "\n\t" "## Always called right before return... 
jumps to cleanup and exits" - "\n\t" "##" - "\n\t" ".Laes_schedule_mangle_last:" - "\n\t" " # schedule last round key from xmm0" - "\n\t" " lea .Lk_deskew(%r10),%r11 # prepare to deskew" - "\n\t" " test %rcx, %rcx" - "\n\t" " jnz .Laes_schedule_mangle_last_dec" - - "\n\t" " # encrypting" - "\n\t" " pshufb .Lk_sr(%r8,%r10),%xmm0 # output permute" - "\n\t" " lea .Lk_opt(%r10), %r11 # prepare to output transform" - "\n\t" " add $32, %rdx" - - "\n\t" ".Laes_schedule_mangle_last_dec:" - "\n\t" " add $-16, %rdx" - "\n\t" " pxor .Lk_s63(%r10), %xmm0" - "\n\t" " call .Laes_schedule_transform # output transform" - "\n\t" " movdqa %xmm0, (%rdx) # save last key" - - "\n\t" " #_aes_cleanup" - "\n\t" " pxor %xmm0, %xmm0" - "\n\t" " pxor %xmm1, %xmm1" - "\n\t" " pxor %xmm2, %xmm2" - "\n\t" " pxor %xmm3, %xmm3" - "\n\t" " pxor %xmm4, %xmm4" - "\n\t" " pxor %xmm5, %xmm5" - "\n\t" " pxor %xmm6, %xmm6" - "\n\t" " pxor %xmm7, %xmm7" - "\n\t" " pxor %xmm8, %xmm8" - "\n\t" " ret" -X("\n\t" ".size _aes_schedule_core,.-_aes_schedule_core") - - "\n\t" "########################################################" - "\n\t" "## ##" - "\n\t" "## Constants ##" - "\n\t" "## ##" - "\n\t" "########################################################" - - "\n\t" ".align 16" -X("\n\t" ".type _aes_consts, at object") - "\n\t" ".Laes_consts:" - "\n\t" "_aes_consts:" - "\n\t" " # s0F" - "\n\t" " .Lk_s0F = .-.Laes_consts" - "\n\t" " .quad 0x0F0F0F0F0F0F0F0F" - "\n\t" " .quad 0x0F0F0F0F0F0F0F0F" - - "\n\t" " # input transform (lo, hi)" - "\n\t" " .Lk_ipt = .-.Laes_consts" - "\n\t" " .quad 0xC2B2E8985A2A7000" - "\n\t" " .quad 0xCABAE09052227808" - "\n\t" " .quad 0x4C01307D317C4D00" - "\n\t" " .quad 0xCD80B1FCB0FDCC81" - - "\n\t" " # inv, inva" - "\n\t" " .Lk_inv = .-.Laes_consts" - "\n\t" " .quad 0x0E05060F0D080180" - "\n\t" " .quad 0x040703090A0B0C02" - "\n\t" " .quad 0x01040A060F0B0780" - "\n\t" " .quad 0x030D0E0C02050809" - - "\n\t" " # sb1u, sb1t" - "\n\t" " .Lk_sb1 = .-.Laes_consts" - "\n\t" " .quad 0xB19BE18FCB503E00" - "\n\t" " .quad 0xA5DF7A6E142AF544" - "\n\t" " .quad 0x3618D415FAE22300" - "\n\t" " .quad 0x3BF7CCC10D2ED9EF" - - - "\n\t" " # sb2u, sb2t" - "\n\t" " .Lk_sb2 = .-.Laes_consts" - "\n\t" " .quad 0xE27A93C60B712400" - "\n\t" " .quad 0x5EB7E955BC982FCD" - "\n\t" " .quad 0x69EB88400AE12900" - "\n\t" " .quad 0xC2A163C8AB82234A" - - "\n\t" " # sbou, sbot" - "\n\t" " .Lk_sbo = .-.Laes_consts" - "\n\t" " .quad 0xD0D26D176FBDC700" - "\n\t" " .quad 0x15AABF7AC502A878" - "\n\t" " .quad 0xCFE474A55FBB6A00" - "\n\t" " .quad 0x8E1E90D1412B35FA" - - "\n\t" " # mc_forward" - "\n\t" " .Lk_mc_forward = .-.Laes_consts" - "\n\t" " .quad 0x0407060500030201" - "\n\t" " .quad 0x0C0F0E0D080B0A09" - "\n\t" " .quad 0x080B0A0904070605" - "\n\t" " .quad 0x000302010C0F0E0D" - "\n\t" " .quad 0x0C0F0E0D080B0A09" - "\n\t" " .quad 0x0407060500030201" - "\n\t" " .quad 0x000302010C0F0E0D" - "\n\t" " .quad 0x080B0A0904070605" - - "\n\t" " # mc_backward" - "\n\t" " .Lk_mc_backward = .-.Laes_consts" - "\n\t" " .quad 0x0605040702010003" - "\n\t" " .quad 0x0E0D0C0F0A09080B" - "\n\t" " .quad 0x020100030E0D0C0F" - "\n\t" " .quad 0x0A09080B06050407" - "\n\t" " .quad 0x0E0D0C0F0A09080B" - "\n\t" " .quad 0x0605040702010003" - "\n\t" " .quad 0x0A09080B06050407" - "\n\t" " .quad 0x020100030E0D0C0F" - - "\n\t" " # sr" - "\n\t" " .Lk_sr = .-.Laes_consts" - "\n\t" " .quad 0x0706050403020100" - "\n\t" " .quad 0x0F0E0D0C0B0A0908" - "\n\t" " .quad 0x030E09040F0A0500" - "\n\t" " .quad 0x0B06010C07020D08" - "\n\t" " .quad 0x0F060D040B020900" - "\n\t" " .quad 
0x070E050C030A0108" - "\n\t" " .quad 0x0B0E0104070A0D00" - "\n\t" " .quad 0x0306090C0F020508" - - "\n\t" " # rcon" - "\n\t" " .Lk_rcon = .-.Laes_consts" - "\n\t" " .quad 0x1F8391B9AF9DEEB6" - "\n\t" " .quad 0x702A98084D7C7D81" - - "\n\t" " # s63: all equal to 0x63 transformed" - "\n\t" " .Lk_s63 = .-.Laes_consts" - "\n\t" " .quad 0x5B5B5B5B5B5B5B5B" - "\n\t" " .quad 0x5B5B5B5B5B5B5B5B" - - "\n\t" " # output transform" - "\n\t" " .Lk_opt = .-.Laes_consts" - "\n\t" " .quad 0xFF9F4929D6B66000" - "\n\t" " .quad 0xF7974121DEBE6808" - "\n\t" " .quad 0x01EDBD5150BCEC00" - "\n\t" " .quad 0xE10D5DB1B05C0CE0" - - "\n\t" " # deskew tables: inverts the sbox's 'skew'" - "\n\t" " .Lk_deskew = .-.Laes_consts" - "\n\t" " .quad 0x07E4A34047A4E300" - "\n\t" " .quad 0x1DFEB95A5DBEF91A" - "\n\t" " .quad 0x5F36B5DC83EA6900" - "\n\t" " .quad 0x2841C2ABF49D1E77" - - "\n\t" "##" - "\n\t" "## Decryption stuff" - "\n\t" "## Key schedule constants" - "\n\t" "##" - "\n\t" " # decryption key schedule: x -> invskew x*9" - "\n\t" " .Lk_dks_1 = .-.Laes_consts" - "\n\t" " .quad 0xB6116FC87ED9A700" - "\n\t" " .quad 0x4AED933482255BFC" - "\n\t" " .quad 0x4576516227143300" - "\n\t" " .quad 0x8BB89FACE9DAFDCE" - - "\n\t" " # decryption key schedule: invskew x*9 -> invskew x*D" - "\n\t" " .Lk_dks_2 = .-.Laes_consts" - "\n\t" " .quad 0x27438FEBCCA86400" - "\n\t" " .quad 0x4622EE8AADC90561" - "\n\t" " .quad 0x815C13CE4F92DD00" - "\n\t" " .quad 0x73AEE13CBD602FF2" - - "\n\t" " # decryption key schedule: invskew x*D -> invskew x*B" - "\n\t" " .Lk_dks_3 = .-.Laes_consts" - "\n\t" " .quad 0x03C4C50201C6C700" - "\n\t" " .quad 0xF83F3EF9FA3D3CFB" - "\n\t" " .quad 0xEE1921D638CFF700" - "\n\t" " .quad 0xA5526A9D7384BC4B" - - "\n\t" " # decryption key schedule: invskew x*B -> invskew x*E + 0x63" - "\n\t" " .Lk_dks_4 = .-.Laes_consts" - "\n\t" " .quad 0xE3C390B053732000" - "\n\t" " .quad 0xA080D3F310306343" - "\n\t" " .quad 0xA0CA214B036982E8" - "\n\t" " .quad 0x2F45AEC48CE60D67" - - "\n\t" "##" - "\n\t" "## Decryption stuff" - "\n\t" "## Round function constants" - "\n\t" "##" - "\n\t" " # decryption input transform" - "\n\t" " .Lk_dipt = .-.Laes_consts" - "\n\t" " .quad 0x0F505B040B545F00" - "\n\t" " .quad 0x154A411E114E451A" - "\n\t" " .quad 0x86E383E660056500" - "\n\t" " .quad 0x12771772F491F194" - - "\n\t" " # decryption sbox output *9*u, *9*t" - "\n\t" " .Lk_dsb9 = .-.Laes_consts" - "\n\t" " .quad 0x851C03539A86D600" - "\n\t" " .quad 0xCAD51F504F994CC9" - "\n\t" " .quad 0xC03B1789ECD74900" - "\n\t" " .quad 0x725E2C9EB2FBA565" - - "\n\t" " # decryption sbox output *D*u, *D*t" - "\n\t" " .Lk_dsbd = .-.Laes_consts" - "\n\t" " .quad 0x7D57CCDFE6B1A200" - "\n\t" " .quad 0xF56E9B13882A4439" - "\n\t" " .quad 0x3CE2FAF724C6CB00" - "\n\t" " .quad 0x2931180D15DEEFD3" - - "\n\t" " # decryption sbox output *B*u, *B*t" - "\n\t" " .Lk_dsbb = .-.Laes_consts" - "\n\t" " .quad 0xD022649296B44200" - "\n\t" " .quad 0x602646F6B0F2D404" - "\n\t" " .quad 0xC19498A6CD596700" - "\n\t" " .quad 0xF3FF0C3E3255AA6B" - - "\n\t" " # decryption sbox output *E*u, *E*t" - "\n\t" " .Lk_dsbe = .-.Laes_consts" - "\n\t" " .quad 0x46F2929626D4D000" - "\n\t" " .quad 0x2242600464B4F6B0" - "\n\t" " .quad 0x0C55A6CDFFAAC100" - "\n\t" " .quad 0x9467F36B98593E32" - - "\n\t" " # decryption sbox final output" - "\n\t" " .Lk_dsbo = .-.Laes_consts" - "\n\t" " .quad 0x1387EA537EF94000" - "\n\t" " .quad 0xC7AA6DB9D4943E2D" - "\n\t" " .quad 0x12D7560F93441D00" - "\n\t" " .quad 0xCA4B8159D8C58E9C" -X("\n\t" ".size _aes_consts,.-_aes_consts") -); - #endif /* USE_SSSE3 */ diff --git 
a/configure.ac b/configure.ac index 4932786..31c0d55 100644 --- a/configure.ac +++ b/configure.ac @@ -2031,6 +2031,7 @@ if test "$found" = "1" ; then # Build with the SSSE3 implementation GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-ssse3-amd64.lo" + GCRYPT_CIPHERS="$GCRYPT_CIPHERS rijndael-ssse3-amd64-asm.lo" ;; arm*-*-*) # Build with the assembly implementation From wk at gnupg.org Thu Jan 5 08:56:31 2017 From: wk at gnupg.org (Werner Koch) Date: Thu, 05 Jan 2017 08:56:31 +0100 Subject: [PATCH] rijndael-ssse3: move assembly functions to separate source-file In-Reply-To: <148356934217.19150.7143664773806268178.stgit@localhost6.localdomain6> (Jussi Kivilinna's message of "Thu, 05 Jan 2017 00:35:42 +0200") References: <148356934217.19150.7143664773806268178.stgit@localhost6.localdomain6> Message-ID: <871swiaqj4.fsf@wheatstone.g10code.de> On Wed, 4 Jan 2017 23:35, jussi.kivilinna at iki.fi said: > After this change, libgcrypt can be compiled with -flto optimization > enabled on x86-64. This is the fix for bug 2882, right? If so, please add a line GnuPG-bug-id: 2882 while applying the patch. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 227 bytes Desc: not available URL: From jussi.kivilinna at iki.fi Fri Jan 6 11:20:49 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 6 Jan 2017 12:20:49 +0200 Subject: [PATCH] rijndael-ssse3: move assembly functions to separate source-file In-Reply-To: <871swiaqj4.fsf@wheatstone.g10code.de> References: <148356934217.19150.7143664773806268178.stgit@localhost6.localdomain6> <871swiaqj4.fsf@wheatstone.g10code.de> Message-ID: On 05.01.2017 09:56, Werner Koch wrote: > On Wed, 4 Jan 2017 23:35, jussi.kivilinna at iki.fi said: > >> After this change, libgcrypt can be compiled with -flto optimization >> enabled on x86-64. > > This is the fix for bug 2882, right? If so, please add a line > > GnuPG-bug-id: 2882 > > while applying the patch. Yes, patch fixes bug 2882. I'll add bug-id line. -Jussi From jussi.kivilinna at iki.fi Fri Jan 6 14:29:36 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 06 Jan 2017 15:29:36 +0200 Subject: [PATCH] tests/basic: add invalid-tag negative tests for GCM Message-ID: <148370937599.7792.103440657952953764.stgit@localhost6.localdomain6> * tests/basic.c (_check_gcm_cipher): Add invalid-tag tests. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/tests/basic.c b/tests/basic.c index 6d086b5..89ea3d5 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -1263,7 +1263,8 @@ _check_gcm_cipher (unsigned int step) char out[MAX_DATA_LEN]; char tag[MAX_DATA_LEN]; int taglen; - int should_fail; + int wrong_taglen; + int bad_tag; } tv[] = { /* http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/gcm/gcm-revised-spec.pdf */ @@ -1454,7 +1455,80 @@ _check_gcm_cipher (unsigned int step) "\xee\xb2\xb2\x2a\xaf\xde\x64\x19\xa0\x58\xab\x4f\x6f\x74\x6b\xf4" "\x0f\xc0\xc3\xb7\x80\xf2\x44\x45\x2d\xa3\xeb\xf1\xc5\xd8\x2c\xde" "\xa2\x41\x89\x97\x20\x0e\xf8\x2e\x44\xae\x7e\x3f", - "\xa4\x4a\x82\x66\xee\x1c\x8e\xb0\xc8\xb5\xd4\xcf\x5a\xe9\xf1\x9a" } + "\xa4\x4a\x82\x66\xee\x1c\x8e\xb0\xc8\xb5\xd4\xcf\x5a\xe9\xf1\x9a" }, + /* negative tests, invalid tag. 
*/ + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\xa4\x4a\x82\x66\xee\x1c\x8e\xb0\xc8\xb5\xd4\xcf\x5a\xe9\xf1\x9a", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\xd8\xe2\xfc\xce\xfa\x7e\x30\x61\x36\x7f\x1d\x57\xa4\xe7\x45\x5a", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\x58\xe2\xfc\xce\xfa\x7e\x30\x61\x36\x7f\x1d\x57\xa4\xe7\x45\x5b", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\xd8\xe2\xfc\xce\xfa\x7e\x30\x61\x36\x7f\x1d\x57\xa4\xe7\x45\x5b", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\x58\xe2\xfc\xce\xfa\x7e\x30\xFF\x36\x7f\x1d\x57\xa4\xe7\x45\x5a", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\x58\xe2\xfc\xce\xfa\x7e\x30\xFF\xFF\x7f\x1d\x57\xa4\xe7\x45\x5a", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\x58\xe2\xfc\xce\xfa\x7e\x30\xFF\x36\x7f\x1d\x57\xa4\xe7\x45\xFF", + 16, 0, 1 }, + { GCRY_CIPHER_AES, + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", + "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12, + "", 0, + "", + 0, + "", + "\xFF\xe2\xfc\xce\xfa\x7e\x30\x61\xFF\x7f\x1d\x57\xa4\xe7\x45\x5a", + 16, 0, 1 }, }; gcry_cipher_hd_t hde, hdd; @@ -1607,7 +1681,7 @@ _check_gcm_cipher (unsigned int step) err = gcry_cipher_gettag (hde, out, taglen2); if (err) { - if (tv[i].should_fail) + if (tv[i].wrong_taglen) goto next_tv; fail ("aes-gcm, gcry_cipher_gettag(%d) failed: %s\n", @@ -1617,8 +1691,13 @@ _check_gcm_cipher (unsigned int step) return; } - if (memcmp (tv[i].tag, out, taglen2)) - fail ("aes-gcm, encrypt tag mismatch entry %d\n", i); + if ((memcmp (tv[i].tag, out, taglen2) == 0) == tv[i].bad_tag) + { + if (!tv[i].bad_tag) + fail ("aes-gcm, encrypt tag mismatch entry %d\n", i); + else + fail ("aes-gcm, encrypt tag match bad-tag entry %d\n", i); + } err = gcry_cipher_checktag (hdd, out, taglen2); if (err) @@ -1702,7 +1781,7 @@ _check_gcm_cipher (unsigned int step) err = gcry_cipher_gettag (hde, tag, taglen2); if (err) { - if (tv[i].should_fail) + if (tv[i].wrong_taglen) goto next_tv; fail ("aes-gcm, gcry_cipher_gettag(%d, %lu) (byte-buf) failed: %s\n", @@ -1714,8 +1793,14 @@ _check_gcm_cipher (unsigned int step) taglen2 = tv[i].taglen ? 
tv[i].taglen : GCRY_GCM_BLOCK_LEN; - if (memcmp (tv[i].tag, tag, taglen2)) - fail ("aes-gcm, encrypt tag mismatch entry %d, (byte-buf)\n", i); + if ((memcmp (tv[i].tag, tag, taglen2) == 0) == tv[i].bad_tag) + { + if (!tv[i].bad_tag) + fail ("aes-gcm, encrypt tag mismatch entry %d, (byte-buf)\n", i); + else + fail ("aes-gcm, encrypt tag match bad-tag entry %d, (byte-buf)\n", + i); + } for (byteNum = 0; byteNum < tv[i].inlen; ++byteNum) { @@ -1733,9 +1818,12 @@ _check_gcm_cipher (unsigned int step) if (memcmp (tv[i].plaintext, out, tv[i].inlen)) fail ("aes-gcm, decrypt mismatch entry %d\n", i); - err = gcry_cipher_checktag (hdd, tag, taglen2); + err = gcry_cipher_checktag (hdd, tv[i].tag, taglen2); if (err) { + if (tv[i].bad_tag) + goto next_tv; + fail ("aes-gcm, gcry_cipher_checktag(%d) (byte-buf) failed: %s\n", i, gpg_strerror (err)); gcry_cipher_close (hde); @@ -1762,7 +1850,7 @@ _check_gcm_cipher (unsigned int step) return; } - if (tv[i].should_fail) + if (tv[i].wrong_taglen || tv[i].bad_tag) { fail ("aes-gcm, negative test succeeded %d\n", i); gcry_cipher_close (hde); From nathan at nathanrossi.com Tue Jan 10 15:41:12 2017 From: nathan at nathanrossi.com (Nathan Rossi) Date: Wed, 11 Jan 2017 00:41:12 +1000 Subject: [PATCH] configure.ac: Set 'mym4_revision' to 0 if not a git repo Message-ID: <20170110144112.5361-1-nathan@nathanrossi.com> --- It is possible for the source to not be located in a git repository (e.g. source is from a tarball). In which case the git repository information is not available. This results in the mym4_revision being an empty string however this value is used in BUILD_FILEVERSION where it is assumed to be 4 decimal values. Additionally BUILD_REVISION uses this value and is also assumed to be non-empty. In the case of BUILD_FILEVERSION it is used in versioninfo.rc.in, where it must be populated as 4 decimal values due to the expected syntax. In cases where it is not (e.g. when BUILD_FILEVERSION = '1,7,5,' a syntax error is raised. windres: versioninfo.rc.in:21: syntax error This patch changes mym4_revision so that if the 'git rev-parse' returns non-zero (e.g. not in a git repository) the value falls back to '0'. This propagates as '0' to both BUILD_FILEVERSION and BUILD_REVISION. Signed-off-by: Nathan Rossi --- configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 31c0d553fa..a3deffa6e9 100644 --- a/configure.ac +++ b/configure.ac @@ -39,7 +39,7 @@ m4_define(mym4_version_micro, [0]) m4_define(mym4_version, [mym4_version_major.mym4_version_minor.mym4_version_micro]) m4_define([mym4_revision], - m4_esyscmd([git rev-parse --short HEAD | tr -d '\n\r'])) + m4_esyscmd([(git rev-parse --short HEAD || printf '0') | tr -d '\n\r'])) m4_define([mym4_revision_dec], m4_esyscmd_s([echo $((0x$(echo ]mym4_revision[|head -c 4)))])) m4_define([mym4_betastring], -- 2.11.0 From dkg at fifthhorseman.net Wed Jan 11 01:40:11 2017 From: dkg at fifthhorseman.net (Daniel Kahn Gillmor) Date: Tue, 10 Jan 2017 19:40:11 -0500 Subject: [PATCH] configure.ac: Set 'mym4_revision' to 0 if not a git repo In-Reply-To: <20170110144112.5361-1-nathan@nathanrossi.com> References: <20170110144112.5361-1-nathan@nathanrossi.com> Message-ID: <87o9zeqvis.fsf@alice.fifthhorseman.net> On Tue 2017-01-10 09:41:12 -0500, Nathan Rossi wrote: > --- > It is possible for the source to not be located in a git repository > (e.g. source is from a tarball). In which case the git repository > information is not available. 
This results in the mym4_revision being an
> empty string however this value is used in BUILD_FILEVERSION where it is
> assumed to be 4 decimal values. Additionally BUILD_REVISION uses this
> value and is also assumed to be non-empty.
>
> In the case of BUILD_FILEVERSION it is used in versioninfo.rc.in, where
> it must be populated as 4 decimal values due to the expected syntax. In
> cases where it is not (e.g. when BUILD_FILEVERSION = '1,7,5,' a syntax
> error is raised.
>
> windres: versioninfo.rc.in:21: syntax error
>
> This patch changes mym4_revision so that if the 'git rev-parse' returns
> non-zero (e.g. not in a git repository) the value falls back to '0'.
> This propagates as '0' to both BUILD_FILEVERSION and BUILD_REVISION.
>
> Signed-off-by: Nathan Rossi
> ---
> configure.ac | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 31c0d553fa..a3deffa6e9 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -39,7 +39,7 @@ m4_define(mym4_version_micro, [0])
> m4_define(mym4_version,
> [mym4_version_major.mym4_version_minor.mym4_version_micro])
> m4_define([mym4_revision],
> - m4_esyscmd([git rev-parse --short HEAD | tr -d '\n\r']))
> + m4_esyscmd([(git rev-parse --short HEAD || printf '0') | tr -d '\n\r']))
> m4_define([mym4_revision_dec],
> m4_esyscmd_s([echo $((0x$(echo ]mym4_revision[|head -c 4)))]))
> m4_define([mym4_betastring],

If this is accepted, this change should probably be applied to all the
gpg-related tools that have this kind of git-trickery in configure.ac
(libgpg-error, etc). I'm currently patching out similar things in
debian so that we get consistent and stable versioning and fewer error
messages in the build logs.

     --dkg

From ametzler at bebt.de Wed Jan 11 18:59:37 2017
From: ametzler at bebt.de (Andreas Metzler)
Date: Wed, 11 Jan 2017 18:59:37 +0100
Subject: SSSE3 problems on Nehalem?
In-Reply-To: 
References: <87h95fex1b.fsf@wheatstone.g10code.de>
Message-ID: <20170111175937.4qpbffp3zspfrnql@argenau.bebt.de>

[repost, gmane -> list swallowed first try]

Jussi Kivilinna wrote:
> On 03.01.2017 21:57, Werner Koch wrote:
[...]
>> Thus _gcry_aes_ssse3_ctr_enc fails after one block (128 bits).

> Bug is in _gcry_aes_ssse3_ctr_enc. 'ctrlow' is passed to assembly block
> as read-only register when it should be read/write as assembly block does
> 64-bit increment on it. Whatever this ends up breaking depends on compiler
> register allocation (thus version & flags).

> So, on that machine, compiler passes 'ctrlow' to temporary register
> before assembly and assembly part increments that register and
> calculation is lost.

> I'll push fix for this soon. Diff for rinjdael-ssse3 attached below.

Hello,

should I cherrypick this patch for Debian's 1.7 packages? Is there
anything else that should go into soon-to-be-frozen next Debian
release?

Thanks, cu Andreas
--
`What a good friend you are to him, Dr. Maturin. His other friends are
so grateful to you.' `I sew his ears on from time to time, sure'

From chris.westervelt at advantor.com Wed Jan 11 17:04:26 2017
From: chris.westervelt at advantor.com (Chris Westervelt)
Date: Wed, 11 Jan 2017 16:04:26 +0000
Subject: libgcrypt eclipse cross compile autotools
Message-ID: 

Hi!

Do any of you libgcrypt developers have an Eclipse project that allows
you to test builds for ARM as a target configuration?
I need to do a proof-of-concept project with a hacked-up version of libgcrypt that uses hardware plugins to accelerate the AES functions via a SPI encryption device. Chris Westervelt Senior Product Development Engineer Advantor Systems. 12612 Challenger Pkwy Suite 300 Orlando, FL 32826 http://www.advantor.com Office: (407) 926-6983 Mobile: (407) 595-7023 Fax: (407) 857-1635 Notice of Confidentiality: This e-mail communication and the attachments hereto, if any, are intended solely for the information and use of the addressee(s) identified above and may contain information which is legally privileged and/or otherwise confidential. If a recipient of this e-mail communication is not an addressee (or an authorized representative of an addressee), such recipient is hereby advised that any review, disclosure, reproduction, re-transmission or other dissemination or use of this e-mail communication (or any information contained herein) is strictly prohibited. If you are not an addressee and have received this e-mail communication in error, please advise the sender of that circumstance either by reply e-mail or by telephone at (800) 238-2686, immediately delete this e-mail communication from any computer and destroy all physical copies of same. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5101 bytes Desc: not available URL:

From nathan at nathanrossi.com Thu Jan 12 06:03:44 2017 From: nathan at nathanrossi.com (Nathan Rossi) Date: Thu, 12 Jan 2017 15:03:44 +1000 Subject: [PATCH] configure.ac: Set 'mym4_revision' to 0 if not a git repo In-Reply-To: <87o9zeqvis.fsf@alice.fifthhorseman.net> References: <20170110144112.5361-1-nathan@nathanrossi.com> <87o9zeqvis.fsf@alice.fifthhorseman.net> Message-ID: On 11 January 2017 at 10:40, Daniel Kahn Gillmor wrote: > On Tue 2017-01-10 09:41:12 -0500, Nathan Rossi wrote: >> --- >> It is possible for the source to not be located in a git repository >> (e.g. source is from a tarball), in which case the git repository >> information is not available. This results in the mym4_revision being an >> empty string; however, this value is used in BUILD_FILEVERSION where it is >> assumed to be 4 decimal values. Additionally BUILD_REVISION uses this >> value and is also assumed to be non-empty. >> >> In the case of BUILD_FILEVERSION it is used in versioninfo.rc.in, where >> it must be populated as 4 decimal values due to the expected syntax. In >> cases where it is not (e.g. when BUILD_FILEVERSION = '1,7,5,') a syntax >> error is raised. >> >> windres: versioninfo.rc.in:21: syntax error >> >> This patch changes mym4_revision so that if the 'git rev-parse' returns >> non-zero (e.g. not in a git repository) the value falls back to '0'. >> This propagates as '0' to both BUILD_FILEVERSION and BUILD_REVISION.
>> >> Signed-off-by: Nathan Rossi >> --- >> configure.ac | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/configure.ac b/configure.ac >> index 31c0d553fa..a3deffa6e9 100644 >> --- a/configure.ac >> +++ b/configure.ac >> @@ -39,7 +39,7 @@ m4_define(mym4_version_micro, [0]) >> m4_define(mym4_version, >> [mym4_version_major.mym4_version_minor.mym4_version_micro]) >> m4_define([mym4_revision], >> - m4_esyscmd([git rev-parse --short HEAD | tr -d '\n\r'])) >> + m4_esyscmd([(git rev-parse --short HEAD || printf '0') | tr -d '\n\r'])) >> m4_define([mym4_revision_dec], >> m4_esyscmd_s([echo $((0x$(echo ]mym4_revision[|head -c 4)))])) >> m4_define([mym4_betastring], > > > If this is accepted, this change should probably be applied to all the > gpg-related tools that have this kind of git-trickery in configure.ac > (libgpg-error, etc). I'm currently patching out similar things in > debian so that we get consistent and stable versioning and fewer error > messages in the build logs. That was the intention; I did send a patch like this for libgpg-error at the same time as this (however I think I mucked up the subscribe/send order and it was not received). I have resent it now. Other than libgpg-error, which tools/libraries should be changed? I am not particularly familiar with all the related tools/libraries. I did have a look at some of the gnupg related tools/libraries but they use differing mechanisms for this process (most use autogen.sh --find-version). Thanks, Nathan

From stefbon at gmail.com Thu Jan 12 16:28:34 2017 From: stefbon at gmail.com (Stef Bon) Date: Thu, 12 Jan 2017 16:28:34 +0100 Subject: Howto implement chacha20-poly1305? In-Reply-To: References: <87mvgh56re.fsf@wheatstone.g10code.de> <87mvgg2g0p.fsf@wheatstone.g10code.de> <4d2f55cc-910e-bdd4-0505-c4a5f7c3ed3d@iki.fi> Message-ID: Hi, I still do not have the chacha20 cipher running. When I look at it again, I get errors from the openssh server like: sshd[13449]: padding error: need 60 block 8 mod 4 [preauth] sshd[13449]: ssh_dispatch_run_fatal: Connection from 192.168.2.20 port 46440: message authentication code incorrect [preauth] It looks like the mac is constructed from the packet buffer minus the first four bytes. Right now my software gets the mac from the whole packet buffer, which is also according to the rfc: https://tools.ietf.org/html/rfc4253#section-6.4 I read on the PROTOCOL.chacha20poly1305: "The second instance, keyed by K_2, is used in conjunction with poly1305 to build an AEAD (Authenticated Encryption with Associated Data) that is used to encrypt and authenticate the entire packet." Well not the entire packet obviously? Do you know how to write and verify the mac? When writing the mac (or aead), does the data to read start at packetbuffer or at packetbuffer + 4? In the last case that explains the error from openssh: it's not properly aligned. Stef

From wk at gnupg.org Fri Jan 13 16:58:47 2017 From: wk at gnupg.org (Werner Koch) Date: Fri, 13 Jan 2017 16:58:47 +0100 Subject: SSSE3 problems on Nehalem? In-Reply-To: <20170111175937.4qpbffp3zspfrnql@argenau.bebt.de> (Andreas Metzler's message of "Wed, 11 Jan 2017 18:59:37 +0100") References: <87h95fex1b.fsf@wheatstone.g10code.de> <20170111175937.4qpbffp3zspfrnql@argenau.bebt.de> Message-ID: <8737gngde0.fsf@wheatstone.g10code.de> On Wed, 11 Jan 2017 18:59, ametzler at bebt.de said: > should I cherry-pick this patch for Debian's 1.7 packages? Yes, please. As an alternative I could do another 1.7 release on Monday.
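(For readers following the ctrlow bug quoted earlier in this thread, a minimal illustration of the bug class being fixed here; this is illustrative C, not the actual rijndael-ssse3-amd64.c code:

    /* WRONG: "r" is an input-only constraint.  The asm increments the
       register, but the compiler may hand the asm a scratch copy, so
       whether the increment survives depends on register allocation --
       exactly the compiler/flags-dependent symptom seen on the E5520. */
    static inline unsigned long long
    ctr_inc_broken (unsigned long long ctrlow)
    {
      __asm__ volatile ("incq %q0" : : "r" (ctrlow) : "cc");
      return ctrlow;
    }

    /* RIGHT: "+r" declares the operand read/write, so the compiler
       knows the value is modified and keeps the incremented result. */
    static inline unsigned long long
    ctr_inc_fixed (unsigned long long ctrlow)
    {
      __asm__ volatile ("incq %q0" : "+r" (ctrlow) : : "cc");
      return ctrlow;
    }
)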
Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 227 bytes Desc: not available URL:

From wk at gnupg.org Fri Jan 13 17:03:51 2017 From: wk at gnupg.org (Werner Koch) Date: Fri, 13 Jan 2017 17:03:51 +0100 Subject: [PATCH] configure.ac: Set 'mym4_revision' to 0 if not a git repo In-Reply-To: (Nathan Rossi's message of "Thu, 12 Jan 2017 15:03:44 +1000") References: <20170110144112.5361-1-nathan@nathanrossi.com> <87o9zeqvis.fsf@alice.fifthhorseman.net> Message-ID: <87y3yfeyl4.fsf@wheatstone.g10code.de> On Thu, 12 Jan 2017 06:03, nathan at nathanrossi.com said: > That was the intention; I did send a patch like this for libgpg-error > at the same time as this (however I think I mucked up the I noticed your patches and looked at them. However, I consider the way we handle this in gnupg better... > have a look at some of the gnupg related tools/libraries but they use > differing mechanisms for this process (most use autogen.sh > --find-version). Right. This is easier to maintain because autogen.sh should be identical for all gnupg related packages. Meanwhile I have ported this to libgpg-error but not yet pushed it. I need to do a few more tests, though. Thanks for your work and please have some patience until I can push that to libgpg-error and other packages. Shalom-Salam, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 227 bytes Desc: not available URL:

From kristian.fiskerstrand at sumptuouscapital.com Fri Jan 13 17:41:37 2017 From: kristian.fiskerstrand at sumptuouscapital.com (Kristian Fiskerstrand) Date: Fri, 13 Jan 2017 17:41:37 +0100 Subject: SSSE3 problems on Nehalem? In-Reply-To: <8737gngde0.fsf@wheatstone.g10code.de> References: <87h95fex1b.fsf@wheatstone.g10code.de> <20170111175937.4qpbffp3zspfrnql@argenau.bebt.de> <8737gngde0.fsf@wheatstone.g10code.de> Message-ID: <393a619b-6be6-fe6b-5ae4-322962170f55@sumptuouscapital.com> On 01/13/2017 04:58 PM, Werner Koch wrote: > On Wed, 11 Jan 2017 18:59, ametzler at bebt.de said: > >> should I cherry-pick this patch for Debian's 1.7 packages? > > Yes, please. As an alternative I could do another 1.7 release on > Monday. > I'd support a release so that it reaches other distros as well (then I'll hold off until the new release in Gentoo as well) -- ---------------------------- Kristian Fiskerstrand Blog: https://blog.sumptuouscapital.com Twitter: @krifisk ---------------------------- Public OpenPGP keyblock at hkp://pool.sks-keyservers.net fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3 ---------------------------- "History doesn't repeat itself, but it does rhyme." (Mark Twain) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL:

From ametzler at bebt.de Fri Jan 13 20:30:45 2017 From: ametzler at bebt.de (Andreas Metzler) Date: Fri, 13 Jan 2017 20:30:45 +0100 Subject: SSSE3 problems on Nehalem?
In-Reply-To: <8737gngde0.fsf@wheatstone.g10code.de> References: <87h95fex1b.fsf@wheatstone.g10code.de> <20170111175937.4qpbffp3zspfrnql@argenau.bebt.de> Message-ID: <20170113193045.qgn7pbyjstvsfpu5@argenau.bebt.de> On 2017-01-13 Werner Koch wrote: > On Wed, 11 Jan 2017 18:59, ametzler at bebt.de said: >> should I cherry-pick this patch for Debian's 1.7 packages? > Yes, please. As an alternative I could do another 1.7 release on > Monday. Hello, I do not mind cherry-picking a single patch that applies without fuzz. Perhaps you could publish it on LIBGCRYPT-1-7-BRANCH and integrate it in the next 1.7 release whenever that is needed? thanks, cu Andreas -- `What a good friend you are to him, Dr. Maturin. His other friends are so grateful to you.' `I sew his ears on from time to time, sure'

From ametzler at bebt.de Sat Jan 14 17:29:35 2017 From: ametzler at bebt.de (Andreas Metzler) Date: Sat, 14 Jan 2017 17:29:35 +0100 Subject: SSSE3 problems on Nehalem? In-Reply-To: <20170113193045.qgn7pbyjstvsfpu5@argenau.bebt.de> References: <87h95fex1b.fsf@wheatstone.g10code.de> <20170111175937.4qpbffp3zspfrnql@argenau.bebt.de> <8737gngde0.fsf@wheatstone.g10code.de> <20170113193045.qgn7pbyjstvsfpu5@argenau.bebt.de> Message-ID: <20170114162935.vhhzic2s7vl3ngax@argenau.bebt.de> On 2017-01-13 Andreas Metzler wrote: > On 2017-01-13 Werner Koch wrote: > > On Wed, 11 Jan 2017 18:59, ametzler at bebt.de said: > >> should I cherry-pick this patch for Debian's 1.7 packages? >> Yes, please. As an alternative I could do another 1.7 release on >> Monday. > Hello, > I do not mind cherry-picking a single patch that applies without fuzz. [x] Uploaded.

From stefbon at gmail.com Sun Jan 15 10:56:28 2017 From: stefbon at gmail.com (Stef Bon) Date: Sun, 15 Jan 2017 10:56:28 +0100 Subject: Howto implement chacha20-poly1305? In-Reply-To: References: <87mvgh56re.fsf@wheatstone.g10code.de> <87mvgg2g0p.fsf@wheatstone.g10code.de> <4d2f55cc-910e-bdd4-0505-c4a5f7c3ed3d@iki.fi> Message-ID: Well I've got it working. It has been an alignment issue. It looks to me that the code Jussi has written is correct. I had to find out that the way the padding is determined has also changed. The chacha20poly1305 uses two ciphers; the data of the main cipher (starting at byte 4) has to be aligned, which is --not-- documented at all. If you want me to test the performance compared to other ciphers, let me know. Stef

From cvs at cvs.gnupg.org Wed Jan 18 10:27:29 2017 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Wed, 18 Jan 2017 10:27:29 +0100 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-55-g623aab8 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via 623aab8a940ea61afe3fef650ad485a755ed9fe7 (commit) from ddcfe31e2425e88b280e7cdaf3f0eaaad8ccc023 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 623aab8a940ea61afe3fef650ad485a755ed9fe7 Author: Werner Koch Date: Wed Jan 18 10:24:06 2017 +0100 random: Call getrandom before select and emitting a progress callback. * random/rndlinux.c (_gcry_rndlinux_gather_random): Move the getrandom call before the select.
-- A select for getrandom does not make any sense because there is no file descriptor for getrandom. Thus if getrandom is available we now select only when we want to read from the blocking /dev/random. In most cases this avoids all progress callbacks. Signed-off-by: Werner Koch diff --git a/random/rndlinux.c b/random/rndlinux.c index 562149a..d3a144a 100644 --- a/random/rndlinux.c +++ b/random/rndlinux.c @@ -195,50 +195,6 @@ _gcry_rndlinux_gather_random (void (*add)(const void*, size_t, struct timeval tv; int rc; - /* If we collected some bytes update the progress indicator. We - do this always and not just if the select timed out because - often just a few bytes are gathered within the timeout - period. */ - if (any_need_entropy || last_so_far != (want - length) ) - { - last_so_far = want - length; - _gcry_random_progress ("need_entropy", 'X', - (int)last_so_far, (int)want); - any_need_entropy = 1; - } - - /* If the system has no limit on the number of file descriptors - and we encounter an fd which is larger than the fd_set size, - we don't use the select at all. The select code is only used - to emit progress messages. A better solution would be to - fall back to poll() if available. */ -#ifdef FD_SETSIZE - if (fd < FD_SETSIZE) -#endif - { - FD_ZERO(&rfds); - FD_SET(fd, &rfds); - tv.tv_sec = delay; - tv.tv_usec = delay? 0 : 100000; - _gcry_pre_syscall (); - rc = select (fd+1, &rfds, NULL, NULL, &tv); - _gcry_post_syscall (); - if (!rc) - { - any_need_entropy = 1; - delay = 3; /* Use 3 seconds henceforth. */ - continue; - } - else if( rc == -1 ) - { - log_error ("select() error: %s\n", strerror(errno)); - if (!delay) - delay = 1; /* Use 1 second if we encounter an error before - we have ever blocked. */ - continue; - } - } - /* If we have a modern Linux kernel and we want to read from the * the non-blocking /dev/urandom, we first try to use the new * getrandom syscall. That call guarantees that the kernel's @@ -283,6 +239,50 @@ _gcry_rndlinux_gather_random (void (*add)(const void*, size_t, } #endif + /* If we collected some bytes update the progress indicator. We + do this always and not just if the select timed out because + often just a few bytes are gathered within the timeout + period. */ + if (any_need_entropy || last_so_far != (want - length) ) + { + last_so_far = want - length; + _gcry_random_progress ("need_entropy", 'X', + (int)last_so_far, (int)want); + any_need_entropy = 1; + } + + /* If the system has no limit on the number of file descriptors + and we encounter an fd which is larger than the fd_set size, + we don't use the select at all. The select code is only used + to emit progress messages. A better solution would be to + fall back to poll() if available. */ +#ifdef FD_SETSIZE + if (fd < FD_SETSIZE) +#endif + { + FD_ZERO(&rfds); + FD_SET(fd, &rfds); + tv.tv_sec = delay; + tv.tv_usec = delay? 0 : 100000; + _gcry_pre_syscall (); + rc = select (fd+1, &rfds, NULL, NULL, &tv); + _gcry_post_syscall (); + if (!rc) + { + any_need_entropy = 1; + delay = 3; /* Use 3 seconds henceforth. */ + continue; + } + else if( rc == -1 ) + { + log_error ("select() error: %s\n", strerror(errno)); + if (!delay) + delay = 1; /* Use 1 second if we encounter an error before + we have ever blocked. 
*/ + continue; + } + } + do { size_t nbytes; ----------------------------------------------------------------------- Summary of changes: random/rndlinux.c | 88 +++++++++++++++++++++++++++---------------------------- 1 file changed, 44 insertions(+), 44 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jo.vanbulck at cs.kuleuven.be Thu Jan 19 17:22:52 2017 From: jo.vanbulck at cs.kuleuven.be (Jo Van Bulck) Date: Thu, 19 Jan 2017 17:22:52 +0100 Subject: [PATCH] ecc: store EdDSA session key in secure memory Message-ID: <7288bb2b-f88a-997f-a126-fb7846f91631@cs.kuleuven.be> Hi gcrypt-devel, Regarding the function _gcry_ecc_eddsa_sign (cipher/ecc-eddsa.c), I am wondering why the long-term secret key 'a' is stored in secure memory, whereas the derived session key 'r' is not. This seems particularly important in the case of EdDSA as the function _gcry_mpi_ec_mul_point (mpi/ec.c) attempts to provide side-channel protection by using constant time operations for scalars residing in secure memory. As far as I understand from Bernstein et al. (http://cr.yp.to/papers.html#ed25519), an attacker who learns 'r' from side-channel observation during the signing process can easily recover 'a' as follows: Given a valid signature (R,S) for message m, public key pk, point G and hash function H, S = r + a * H(encodepoint(R) + encodepoint(pk) + m) mod n => a = (S - r) / H(encodepoint(R) + encodepoint(pk) + m) mod n Or am I missing something here? If not, I included a simple patch below. Regards, Jo From dbc810e1a75559413e610d3913796559de792677 Mon Sep 17 00:00:00 2001 From: Jo Van Bulck Date: Thu, 19 Jan 2017 17:00:15 +0100 Subject: [PATCH 1/1] ecc: store EdDSA session key in secure memory. * cipher/ecc-eddsa.c (_gcry_ecc_eddsa_sign): use mpi_snew to allocate session key. -- An attacker who learns the EdDSA session key from side-channel observation during the signing process, can easily revover the long- term secret key. Storing the session key in secure memory ensures that constant time point operations are used in the MPI library. Signed-off-by: Jo Van Bulck --- cipher/ecc-eddsa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cipher/ecc-eddsa.c b/cipher/ecc-eddsa.c index f91f848..813e030 100644 --- a/cipher/ecc-eddsa.c +++ b/cipher/ecc-eddsa.c @@ -603,7 +603,7 @@ _gcry_ecc_eddsa_sign (gcry_mpi_t input, ECC_secret_key *skey, a = mpi_snew (0); x = mpi_new (0); y = mpi_new (0); - r = mpi_new (0); + r = mpi_snew (0); ctx = _gcry_mpi_ec_p_internal_new (skey->E.model, skey->E.dialect, 0, skey->E.p, skey->E.a, skey->E.b); b = (ctx->nbits+7)/8; -- 2.7.4 From jussi.kivilinna at iki.fi Mon Jan 23 19:31:18 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Mon, 23 Jan 2017 20:31:18 +0200 Subject: [PATCH] bufhelp: use 'may_alias' attribute unaligned pointer types Message-ID: <148519627849.27087.13508872859182789192.stgit@localhost6.localdomain6> * configure.ac (gcry_cv_gcc_attribute_may_alias) (HAVE_GCC_ATTRIBUTE_MAY_ALIAS): New check for 'may_alias' attribute. * cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only if HAVE_GCC_ATTRIBUTE_MAY_ALIAS is defined. [BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_int_t, bufhelp_u32_t) (bufhelp_u64_t): Add 'may_alias' attribute. * src/g10lib.h (fast_wipememory_t): Add HAVE_GCC_ATTRIBUTE_MAY_ALIAS defined check; Add 'may_alias' attribute. 
-- Attribute 'may_alias' was missing from bufhelp unaligned memory access pointer types, and was causing problems with newer GCC versions (with more aggressive optimization). This patch fixes broken Camellia-CFB with '-O3 -flto' flags with GCC-6 on x86-64 and generic GCM with default '-O2' on x32. Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index df35594..3616515 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -26,6 +26,7 @@ #undef BUFHELP_FAST_UNALIGNED_ACCESS #if defined(HAVE_GCC_ATTRIBUTE_PACKED) && \ defined(HAVE_GCC_ATTRIBUTE_ALIGNED) && \ + defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) && \ (defined(__i386__) || defined(__x86_64__) || \ (defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)) || \ defined(__aarch64__)) @@ -43,7 +44,7 @@ typedef struct bufhelp_int_s { uintptr_t a; -} __attribute__((packed, aligned(1))) bufhelp_int_t; +} __attribute__((packed, aligned(1), may_alias)) bufhelp_int_t; #else /* Define type with default alignment for other architectures (unaligned accessed handled in per byte loops). @@ -370,7 +371,7 @@ static inline void buf_put_le64(void *_buf, u64 val) typedef struct bufhelp_u32_s { u32 a; -} __attribute__((packed, aligned(1))) bufhelp_u32_t; +} __attribute__((packed, aligned(1), may_alias)) bufhelp_u32_t; /* Functions for loading and storing unaligned u32 values of different endianness. */ @@ -400,7 +401,7 @@ static inline void buf_put_le32(void *_buf, u32 val) typedef struct bufhelp_u64_s { u64 a; -} __attribute__((packed, aligned(1))) bufhelp_u64_t; +} __attribute__((packed, aligned(1), may_alias)) bufhelp_u64_t; /* Functions for loading and storing unaligned u64 values of different endianness. */ diff --git a/configure.ac b/configure.ac index 31c0d55..5dd27ca 100644 --- a/configure.ac +++ b/configure.ac @@ -994,6 +994,21 @@ fi # +# Check whether the compiler supports the GCC style may_alias attribute +# +AC_CACHE_CHECK([whether the GCC style may_alias attribute is supported], + [gcry_cv_gcc_attribute_may_alias], + [gcry_cv_gcc_attribute_may_alias=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[struct { int a; } foo __attribute__ ((may_alias));]])], + [gcry_cv_gcc_attribute_may_alias=yes])]) +if test "$gcry_cv_gcc_attribute_may_alias" = "yes" ; then + AC_DEFINE(HAVE_GCC_ATTRIBUTE_MAY_ALIAS,1, + [Defined if a GCC style "__attribute__ ((may_alias))" is supported]) +fi + + +# # Check whether the compiler supports 'asm' or '__asm__' keyword for # assembler blocks. # diff --git a/src/g10lib.h b/src/g10lib.h index 1308cff..8ce84b8 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -334,6 +334,7 @@ void __gcry_burn_stack (unsigned int bytes); /* Following architectures can handle unaligned accesses fast. 
*/ #if defined(HAVE_GCC_ATTRIBUTE_PACKED) && \ defined(HAVE_GCC_ATTRIBUTE_ALIGNED) && \ + defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) && \ (defined(__i386__) || defined(__x86_64__) || \ defined(__powerpc__) || defined(__powerpc64__) || \ (defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)) || \ @@ -342,7 +343,7 @@ void __gcry_burn_stack (unsigned int bytes); typedef struct fast_wipememory_s { FASTWIPE_T a; -} __attribute__((packed, aligned(1))) fast_wipememory_t; +} __attribute__((packed, aligned(1), may_alias)) fast_wipememory_t; #else #define fast_wipememory2_unaligned_head(_vptr,_vset,_vlen) do { \ while((size_t)(_vptr)&(sizeof(FASTWIPE_T)-1) && _vlen) \ From jussi.kivilinna at iki.fi Mon Jan 23 19:31:49 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Mon, 23 Jan 2017 20:31:49 +0200 Subject: [PATCH] rijndael-ssse3-amd64: fix building on x32 Message-ID: <148519630940.27202.4473572007616432061.stgit@localhost6.localdomain6> * cipher/rijndael-ssse3-amd64.c: Use 64-bit call instructions with 64-bit registers. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c index 25d1849..78d8234 100644 --- a/cipher/rijndael-ssse3-amd64.c +++ b/cipher/rijndael-ssse3-amd64.c @@ -128,14 +128,14 @@ extern void _gcry_aes_ssse3_decrypt_core(void); #define vpaes_ssse3_prepare_enc() \ vpaes_ssse3_prepare(); \ - asm volatile ("call *%[core] \n\t" \ + asm volatile ("callq *%q[core] \n\t" \ : \ : [core] "r" (_gcry_aes_ssse3_enc_preload) \ : "rax", "cc", "memory" ) #define vpaes_ssse3_prepare_dec() \ vpaes_ssse3_prepare(); \ - asm volatile ("call *%[core] \n\t" \ + asm volatile ("callq *%q[core] \n\t" \ : \ : [core] "r" (_gcry_aes_ssse3_dec_preload) \ : "rax", "cc", "memory" ) @@ -155,7 +155,7 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) "leaq %[buf], %%rdx" "\n\t" "movl %[dir], %%ecx" "\n\t" "movl %[rotoffs], %%r8d" "\n\t" - "call *%[core]" "\n\t" + "callq *%q[core]" "\n\t" : : [core] "r" (&_gcry_aes_ssse3_schedule_core), [key] "m" (*key), @@ -208,7 +208,7 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) "leaq %[buf], %%rdx" "\n\t" "movl %[dir], %%ecx" "\n\t" "movl %[rotoffs], %%r8d" "\n\t" - "call *%[core]" "\n\t" + "callq *%q[core]" "\n\t" : : [core] "r" (_gcry_aes_ssse3_schedule_core), [key] "m" (ctx->keyschdec32[0][0]), @@ -231,7 +231,7 @@ do_vpaes_ssse3_enc (const RIJNDAEL_context *ctx, unsigned int nrounds) unsigned int middle_rounds = nrounds - 1; const void *keysched = ctx->keyschenc32; - asm volatile ("call *%[core]" "\n\t" + asm volatile ("callq *%q[core]" "\n\t" : "+a" (middle_rounds), "+d" (keysched) : [core] "r" (_gcry_aes_ssse3_encrypt_core) : "rcx", "rsi", "rdi", "cc", "memory"); @@ -246,7 +246,7 @@ do_vpaes_ssse3_dec (const RIJNDAEL_context *ctx, unsigned int nrounds) unsigned int middle_rounds = nrounds - 1; const void *keysched = ctx->keyschdec32; - asm volatile ("call *%[core]" "\n\t" + asm volatile ("callq *%q[core]" "\n\t" : "+a" (middle_rounds), "+d" (keysched) : [core] "r" (_gcry_aes_ssse3_decrypt_core) : "rcx", "rsi", "cc", "memory"); From cvs at cvs.gnupg.org Mon Jan 23 22:00:18 2017 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Mon, 23 Jan 2017 22:00:18 +0100 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-57-g39b9302 Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". 
The branch, master has been updated via 39b9302da5d08bd52688d20befe626fee0b6c41d (commit) via bf9e0b79e620ca2324224893b07522462b125412 (commit) from 623aab8a940ea61afe3fef650ad485a755ed9fe7 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 39b9302da5d08bd52688d20befe626fee0b6c41d Author: Jussi Kivilinna Date: Mon Jan 23 20:01:32 2017 +0200 rijndael-ssse3-amd64: fix building on x32 * cipher/rijndael-ssse3-amd64.c: Use 64-bit call instructions with 64-bit registers. -- Signed-off-by: Jussi Kivilinna diff --git a/cipher/rijndael-ssse3-amd64.c b/cipher/rijndael-ssse3-amd64.c index 25d1849..78d8234 100644 --- a/cipher/rijndael-ssse3-amd64.c +++ b/cipher/rijndael-ssse3-amd64.c @@ -128,14 +128,14 @@ extern void _gcry_aes_ssse3_decrypt_core(void); #define vpaes_ssse3_prepare_enc() \ vpaes_ssse3_prepare(); \ - asm volatile ("call *%[core] \n\t" \ + asm volatile ("callq *%q[core] \n\t" \ : \ : [core] "r" (_gcry_aes_ssse3_enc_preload) \ : "rax", "cc", "memory" ) #define vpaes_ssse3_prepare_dec() \ vpaes_ssse3_prepare(); \ - asm volatile ("call *%[core] \n\t" \ + asm volatile ("callq *%q[core] \n\t" \ : \ : [core] "r" (_gcry_aes_ssse3_dec_preload) \ : "rax", "cc", "memory" ) @@ -155,7 +155,7 @@ _gcry_aes_ssse3_do_setkey (RIJNDAEL_context *ctx, const byte *key) "leaq %[buf], %%rdx" "\n\t" "movl %[dir], %%ecx" "\n\t" "movl %[rotoffs], %%r8d" "\n\t" - "call *%[core]" "\n\t" + "callq *%q[core]" "\n\t" : : [core] "r" (&_gcry_aes_ssse3_schedule_core), [key] "m" (*key), @@ -208,7 +208,7 @@ _gcry_aes_ssse3_prepare_decryption (RIJNDAEL_context *ctx) "leaq %[buf], %%rdx" "\n\t" "movl %[dir], %%ecx" "\n\t" "movl %[rotoffs], %%r8d" "\n\t" - "call *%[core]" "\n\t" + "callq *%q[core]" "\n\t" : : [core] "r" (_gcry_aes_ssse3_schedule_core), [key] "m" (ctx->keyschdec32[0][0]), @@ -231,7 +231,7 @@ do_vpaes_ssse3_enc (const RIJNDAEL_context *ctx, unsigned int nrounds) unsigned int middle_rounds = nrounds - 1; const void *keysched = ctx->keyschenc32; - asm volatile ("call *%[core]" "\n\t" + asm volatile ("callq *%q[core]" "\n\t" : "+a" (middle_rounds), "+d" (keysched) : [core] "r" (_gcry_aes_ssse3_encrypt_core) : "rcx", "rsi", "rdi", "cc", "memory"); @@ -246,7 +246,7 @@ do_vpaes_ssse3_dec (const RIJNDAEL_context *ctx, unsigned int nrounds) unsigned int middle_rounds = nrounds - 1; const void *keysched = ctx->keyschdec32; - asm volatile ("call *%[core]" "\n\t" + asm volatile ("callq *%q[core]" "\n\t" : "+a" (middle_rounds), "+d" (keysched) : [core] "r" (_gcry_aes_ssse3_decrypt_core) : "rcx", "rsi", "cc", "memory"); commit bf9e0b79e620ca2324224893b07522462b125412 Author: Jussi Kivilinna Date: Mon Jan 23 19:48:28 2017 +0200 bufhelp: use 'may_alias' attribute unaligned pointer types * configure.ac (gcry_cv_gcc_attribute_may_alias) (HAVE_GCC_ATTRIBUTE_MAY_ALIAS): New check for 'may_alias' attribute. * cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only if HAVE_GCC_ATTRIBUTE_MAY_ALIAS is defined. [BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_int_t, bufhelp_u32_t) (bufhelp_u64_t): Add 'may_alias' attribute. * src/g10lib.h (fast_wipememory_t): Add HAVE_GCC_ATTRIBUTE_MAY_ALIAS defined check; Add 'may_alias' attribute. -- Attribute 'may_alias' was missing from bufhelp unaligned memory access pointer types, and was causing problems with newer GCC versions (with more aggressive optimization). 
This patch fixes broken Camellia-CFB with '-O3 -flto' flags with GCC-6 on x86-64 and generic GCM with default '-O2' on x32. Signed-off-by: Jussi Kivilinna diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index df35594..3616515 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -26,6 +26,7 @@ #undef BUFHELP_FAST_UNALIGNED_ACCESS #if defined(HAVE_GCC_ATTRIBUTE_PACKED) && \ defined(HAVE_GCC_ATTRIBUTE_ALIGNED) && \ + defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) && \ (defined(__i386__) || defined(__x86_64__) || \ (defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)) || \ defined(__aarch64__)) @@ -43,7 +44,7 @@ typedef struct bufhelp_int_s { uintptr_t a; -} __attribute__((packed, aligned(1))) bufhelp_int_t; +} __attribute__((packed, aligned(1), may_alias)) bufhelp_int_t; #else /* Define type with default alignment for other architectures (unaligned accessed handled in per byte loops). @@ -370,7 +371,7 @@ static inline void buf_put_le64(void *_buf, u64 val) typedef struct bufhelp_u32_s { u32 a; -} __attribute__((packed, aligned(1))) bufhelp_u32_t; +} __attribute__((packed, aligned(1), may_alias)) bufhelp_u32_t; /* Functions for loading and storing unaligned u32 values of different endianness. */ @@ -400,7 +401,7 @@ static inline void buf_put_le32(void *_buf, u32 val) typedef struct bufhelp_u64_s { u64 a; -} __attribute__((packed, aligned(1))) bufhelp_u64_t; +} __attribute__((packed, aligned(1), may_alias)) bufhelp_u64_t; /* Functions for loading and storing unaligned u64 values of different endianness. */ diff --git a/configure.ac b/configure.ac index 31c0d55..5dd27ca 100644 --- a/configure.ac +++ b/configure.ac @@ -994,6 +994,21 @@ fi # +# Check whether the compiler supports the GCC style may_alias attribute +# +AC_CACHE_CHECK([whether the GCC style may_alias attribute is supported], + [gcry_cv_gcc_attribute_may_alias], + [gcry_cv_gcc_attribute_may_alias=no + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [[struct { int a; } foo __attribute__ ((may_alias));]])], + [gcry_cv_gcc_attribute_may_alias=yes])]) +if test "$gcry_cv_gcc_attribute_may_alias" = "yes" ; then + AC_DEFINE(HAVE_GCC_ATTRIBUTE_MAY_ALIAS,1, + [Defined if a GCC style "__attribute__ ((may_alias))" is supported]) +fi + + +# # Check whether the compiler supports 'asm' or '__asm__' keyword for # assembler blocks. # diff --git a/src/g10lib.h b/src/g10lib.h index 1308cff..8ce84b8 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -334,6 +334,7 @@ void __gcry_burn_stack (unsigned int bytes); /* Following architectures can handle unaligned accesses fast. 
*/ #if defined(HAVE_GCC_ATTRIBUTE_PACKED) && \ defined(HAVE_GCC_ATTRIBUTE_ALIGNED) && \ + defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) && \ (defined(__i386__) || defined(__x86_64__) || \ defined(__powerpc__) || defined(__powerpc64__) || \ (defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)) || \ @@ -342,7 +343,7 @@ void __gcry_burn_stack (unsigned int bytes); typedef struct fast_wipememory_s { FASTWIPE_T a; -} __attribute__((packed, aligned(1))) fast_wipememory_t; +} __attribute__((packed, aligned(1), may_alias)) fast_wipememory_t; #else #define fast_wipememory2_unaligned_head(_vptr,_vset,_vlen) do { \ while((size_t)(_vptr)&(sizeof(FASTWIPE_T)-1) && _vlen) \ ----------------------------------------------------------------------- Summary of changes: cipher/bufhelp.h | 7 ++++--- cipher/rijndael-ssse3-amd64.c | 12 ++++++------ configure.ac | 15 +++++++++++++++ src/g10lib.h | 3 ++- 4 files changed, 27 insertions(+), 10 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From mathias.baumann at sociomantic.com Tue Jan 24 15:43:31 2017 From: mathias.baumann at sociomantic.com (Mathias L. Baumann) Date: Tue, 24 Jan 2017 15:43:31 +0100 Subject: [PATCH] CFB 8 Bit implementation Message-ID: Hello dear Gcrypt Devs, as a followup to Lizas request/inqueries I now implemented CFB in 8 bit mode. Please let me know about any changes you want to see. I should add that my stack burning code is more or less just guessing how it should be done by imitating what I saw in the other functions. Please validate that I did that correct :) The patch is attached and can also found at https://github.com/mathias-baumann-sociomantic/libgcrypt/tree/cfb8 cheers, --Mathias Baumann -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Implement-CFB-with-8bit-mode.patch Type: text/x-patch Size: 6769 bytes Desc: not available URL: From smueller at chronox.de Wed Jan 25 14:27:58 2017 From: smueller at chronox.de (Stephan =?ISO-8859-1?Q?M=FCller?=) Date: Wed, 25 Jan 2017 14:27:58 +0100 Subject: [PATCH] CFB 8 Bit implementation In-Reply-To: References: Message-ID: <5403891.bK0XkydZDg@tauon.atsec.com> Am Dienstag, 24. Januar 2017, 15:43:31 CET schrieb Mathias L. Baumann: Hi Mathias, > Hello dear Gcrypt Devs, > > as a followup to Lizas request/inqueries I now implemented CFB in 8 bit > mode. > > Please let me know about any changes you want to see. > > I should add that my stack burning code is more or less just guessing > how it should be done by imitating what I saw in the other functions. > Please validate that I did that correct :) I guess you should take at least one or two test vectors from http:// csrc.nist.gov/groups/STM/cavp/block-ciphers.html#aes and add as a self test as you find in all other implementations. If you want to confirm your implementation, have all test vectors you find at the given URL processed by your implementation. 
> > > The patch is attached and can also found at > https://github.com/mathias-baumann-sociomantic/libgcrypt/tree/cfb8 > > cheers, > > --Mathias Baumann Ciao Stephan From mathias.baumann at sociomantic.com Wed Jan 25 14:55:54 2017 From: mathias.baumann at sociomantic.com (Mathias Baumann) Date: Wed, 25 Jan 2017 13:55:54 +0000 Subject: [PATCH] CFB 8 Bit implementation In-Reply-To: <5403891.bK0XkydZDg@tauon.atsec.com> References: , <5403891.bK0XkydZDg@tauon.atsec.com> Message-ID: > I guess you should take at least one or two test vectors from http:// > csrc.nist.gov/groups/STM/cavp/block-ciphers.html#aes and add as a self test as > you find in all other implementations. > If you want to confirm your implementation, have all test vectors you find at > the given URL processed by your implementation. I have used the vectors found at http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf which is also what the other tests in that file use. Unfortunately that document didn't seem to provide vectors >1 byte. Neither does your link though. I did test the implementation with numerous internal tests in our applications that previously worked with the mcrypt library. With those tests and my own I am fairly confident about the implementation. However if you wish to see all those tests implemented before accepting the patch, I can do that. cheers, --Marenz ________________________________ From: Stephan M?ller Sent: 25 January 2017 14:27:58 To: gcrypt-devel at gnupg.org Cc: Mathias Baumann Subject: Re: [PATCH] CFB 8 Bit implementation Am Dienstag, 24. Januar 2017, 15:43:31 CET schrieb Mathias L. Baumann: Hi Mathias, > Hello dear Gcrypt Devs, > > as a followup to Lizas request/inqueries I now implemented CFB in 8 bit > mode. > > Please let me know about any changes you want to see. > > I should add that my stack burning code is more or less just guessing > how it should be done by imitating what I saw in the other functions. > Please validate that I did that correct :) I guess you should take at least one or two test vectors from http:// csrc.nist.gov/groups/STM/cavp/block-ciphers.html#aes and add as a self test as you find in all other implementations. If you want to confirm your implementation, have all test vectors you find at the given URL processed by your implementation. > > > The patch is attached and can also found at > https://github.com/mathias-baumann-sociomantic/libgcrypt/tree/cfb8 > > cheers, > > --Mathias Baumann Ciao Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jussi.kivilinna at iki.fi Wed Jan 25 22:15:10 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:10 +0200 Subject: [PATCH 1/7] bufhelp: add 'may_alias' attribute for properly aligned 'bufhelp_int_t' Message-ID: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> * cipher/bufhelp.h [!BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_int_t): Add 'may_alias' attribute. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index 3616515..1c52db5 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -49,11 +49,18 @@ typedef struct bufhelp_int_s /* Define type with default alignment for other architectures (unaligned accessed handled in per byte loops). 
*/ +#ifdef HAVE_GCC_ATTRIBUTE_MAY_ALIAS +typedef struct bufhelp_int_s +{ + uintptr_t a; +} __attribute__((may_alias)) bufhelp_int_t; +#else typedef struct bufhelp_int_s { uintptr_t a; } bufhelp_int_t; #endif +#endif /* Optimized function for small buffer copying */ From jussi.kivilinna at iki.fi Wed Jan 25 22:15:15 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:15 +0200 Subject: [PATCH 2/7] configure.ac: fix may_alias attribute check In-Reply-To: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> References: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> Message-ID: <148537891562.23293.13704630593950904486.stgit@localhost6.localdomain6> * configure.ac: Test may_alias attribute on type, not on variable. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/configure.ac b/configure.ac index 5dd27ca..d2b863c 100644 --- a/configure.ac +++ b/configure.ac @@ -1000,7 +1000,8 @@ AC_CACHE_CHECK([whether the GCC style may_alias attribute is supported], [gcry_cv_gcc_attribute_may_alias], [gcry_cv_gcc_attribute_may_alias=no AC_COMPILE_IFELSE([AC_LANG_SOURCE( - [[struct { int a; } foo __attribute__ ((may_alias));]])], + [[typedef struct foo_s { int a; } + __attribute__ ((may_alias)) foo_t;]])], [gcry_cv_gcc_attribute_may_alias=yes])]) if test "$gcry_cv_gcc_attribute_may_alias" = "yes" ; then AC_DEFINE(HAVE_GCC_ATTRIBUTE_MAY_ALIAS,1, From jussi.kivilinna at iki.fi Wed Jan 25 22:15:20 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:20 +0200 Subject: [PATCH 3/7] configure.ac: fix attribute checks In-Reply-To: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> References: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> Message-ID: <148537892064.23293.17556691217072586615.stgit@localhost6.localdomain6> * configure.ac: Add -Werror flag for attribute checks. -- Compilter ignores unknown attributes and just shows warning. Therefore attribute checks need to be run with -Werror. Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/configure.ac b/configure.ac index d2b863c..bc5bed4 100644 --- a/configure.ac +++ b/configure.ac @@ -958,6 +958,12 @@ if test "$gcry_cv_visibility_attribute" = "yes" \ fi +# Following attribute tests depend on warnings to cause compile to fail, +# so set -Werror temporarily. +_gcc_cflags_save=$CFLAGS +CFLAGS="$CFLAGS -Werror" + + # # Check whether the compiler supports the GCC style aligned attribute # @@ -1009,6 +1015,10 @@ if test "$gcry_cv_gcc_attribute_may_alias" = "yes" ; then fi +# Restore flags. +CFLAGS=$_gcc_cflags_save; + + # # Check whether the compiler supports 'asm' or '__asm__' keyword for # assembler blocks. From jussi.kivilinna at iki.fi Wed Jan 25 22:15:25 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:25 +0200 Subject: [PATCH 4/7] crc-intel-pclmul: fix undefined behavior with unaligned access In-Reply-To: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> References: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> Message-ID: <148537892566.23293.14366022559234944901.stgit@localhost6.localdomain6> * cipher/crc-intel-pclmul.c (u16_unaligned_s): New. (crc32_reflected_less_than_16, crc32_less_than_16): Use 'u16_unaligned_s' for unaligned memory access. 
-- GnuPG-bug-id: 2292 Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/crc-intel-pclmul.c b/cipher/crc-intel-pclmul.c index 7a344e2..8ff08ec 100644 --- a/cipher/crc-intel-pclmul.c +++ b/cipher/crc-intel-pclmul.c @@ -44,6 +44,12 @@ #define ALIGNED_16 __attribute__ ((aligned (16))) +struct u16_unaligned_s +{ + u16 a; +} __attribute__((packed, aligned (1), may_alias)); + + /* Constants structure for generic reflected/non-reflected CRC32 CLMUL * functions. */ struct crc32_consts_s @@ -345,14 +351,14 @@ crc32_reflected_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, } else if (inlen == 2) { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data ^= crc; data <<= 16; crc >>= 16; } else { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data |= inbuf[2] << 16; data ^= crc; data <<= 8; @@ -709,14 +715,14 @@ crc32_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, } else if (inlen == 2) { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data ^= crc; data = _gcry_bswap32(data << 16); crc = _gcry_bswap32(crc >> 16); } else { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data |= inbuf[2] << 16; data ^= crc; data = _gcry_bswap32(data << 8); From jussi.kivilinna at iki.fi Wed Jan 25 22:15:30 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:30 +0200 Subject: [PATCH 5/7] cipher-xts: fix pointer casting to wrong alignment and aliasing In-Reply-To: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> References: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> Message-ID: <148537893068.23293.3823481032219292262.stgit@localhost6.localdomain6> * cipher/cipher-xts.c (xts_gfmul_byA, xts_inc128): Use buf_get_le64 and buf_put_le64 for accessing data; Change parameter pointers to 'unsigned char *' type. (_gcry_cipher_xts_crypt): Do not cast buffer pointers to 'u64 *' for helper functions. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/cipher-xts.c b/cipher/cipher-xts.c index 7a7181b..4da89e5 100644 --- a/cipher/cipher-xts.c +++ b/cipher/cipher-xts.c @@ -29,29 +29,29 @@ #include "./cipher-internal.h" -static inline void xts_gfmul_byA (u64 *out, const u64 *in) +static inline void xts_gfmul_byA (unsigned char *out, const unsigned char *in) { - u64 hi = le_bswap64 (in[1]); - u64 lo = le_bswap64 (in[0]); + u64 hi = buf_get_le64 (in + 8); + u64 lo = buf_get_le64 (in + 0); u64 carry = -(hi >> 63) & 0x87; hi = (hi << 1) + (lo >> 63); lo = (lo << 1) ^ carry; - out[1] = le_bswap64 (hi); - out[0] = le_bswap64 (lo); + buf_put_le64 (out + 8, hi); + buf_put_le64 (out + 0, lo); } -static inline void xts_inc128 (u64 *seqno) +static inline void xts_inc128 (unsigned char *seqno) { - u64 lo = le_bswap64 (seqno[0]); - u64 hi = le_bswap64 (seqno[1]); + u64 lo = buf_get_le64 (seqno + 0); + u64 hi = buf_get_le64 (seqno + 8); hi += !(++lo); - seqno[0] = le_bswap64 (lo); - seqno[1] = le_bswap64 (hi); + buf_put_le64 (seqno + 0, lo); + buf_put_le64 (seqno + 8, hi); } @@ -117,7 +117,7 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, nblocks--; /* Generate next tweak. */ - xts_gfmul_byA ((u64 *)c->u_ctr.ctr, (u64 *)c->u_ctr.ctr); + xts_gfmul_byA (c->u_ctr.ctr, c->u_ctr.ctr); } /* Handle remaining data with ciphertext stealing. 
*/ @@ -129,7 +129,7 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, gcry_assert (inbuflen < GCRY_XTS_BLOCK_LEN * 2); /* Generate last tweak. */ - xts_gfmul_byA (tmp.x64, (u64 *)c->u_ctr.ctr); + xts_gfmul_byA (tmp.x1, c->u_ctr.ctr); /* Decrypt last block first. */ buf_xor (outbuf, inbuf, tmp.x64, GCRY_XTS_BLOCK_LEN); @@ -158,7 +158,7 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, } /* Auto-increment data-unit sequence number */ - xts_inc128 ((u64 *)c->u_iv.iv); + xts_inc128 (c->u_iv.iv); wipememory (&tmp, sizeof(tmp)); wipememory (c->u_ctr.ctr, sizeof(c->u_ctr.ctr)); From jussi.kivilinna at iki.fi Wed Jan 25 22:15:35 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:35 +0200 Subject: [PATCH 6/7] rijndael-aesni: fix u128_t strict-aliasing rule breaking In-Reply-To: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> References: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> Message-ID: <148537893570.23293.1505879704478552210.stgit@localhost6.localdomain6> * cipher/rijndael-aesni.c (u128_t): Add attributes to tell GCC and clang that casting from 'char *' to 'u128_t *' is ok. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 7852e19..735e5cd 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -41,7 +41,10 @@ #endif -typedef struct u128_s { u32 a, b, c, d; } u128_t; +typedef struct u128_s +{ + u32 a, b, c, d; +} __attribute__((packed, aligned(1), may_alias)) u128_t; /* Two macros to be called prior and after the use of AESNI From jussi.kivilinna at iki.fi Wed Jan 25 22:15:40 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Wed, 25 Jan 2017 23:15:40 +0200 Subject: [PATCH 7/7] bufhelp: use unaligned dword and qword types for endianess helpers In-Reply-To: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> References: <148537891059.23293.16333466738521393139.stgit@localhost6.localdomain6> Message-ID: <148537894072.23293.15355883908013139954.stgit@localhost6.localdomain6> * cipher/bufhelp.h (BUFHELP_UNALIGNED_ACCESS): New, defined if attributes 'packed', 'aligned' and 'may_alias' are supported. (BUFHELP_FAST_UNALIGNED_ACCESS): Define if have BUFHELP_UNALIGNED_ACCESS. -- Now that compiler is properly told that reads from these types may do not follow strict-aliasing and may be unaligned, we enable use of these for all architectures and compiler will emit more optimized, yet correct, code (for example, use special unaligned read/write instructions instead of accessing byte-by-byte). Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index 1c52db5..3110a1d 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -23,10 +23,19 @@ #include "bithelp.h" -#undef BUFHELP_FAST_UNALIGNED_ACCESS +#undef BUFHELP_UNALIGNED_ACCESS #if defined(HAVE_GCC_ATTRIBUTE_PACKED) && \ defined(HAVE_GCC_ATTRIBUTE_ALIGNED) && \ - defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) && \ + defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) +/* Compiler is supports attributes needed for automatically issuing unaligned + memory access instructions. 
+ */ +# define BUFHELP_UNALIGNED_ACCESS 1 +#endif + + +#undef BUFHELP_FAST_UNALIGNED_ACCESS +#if defined(BUFHELP_UNALIGNED_ACCESS) && \ (defined(__i386__) || defined(__x86_64__) || \ (defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)) || \ defined(__aarch64__)) @@ -290,7 +299,7 @@ buf_eq_const(const void *_a, const void *_b, size_t len) } -#ifndef BUFHELP_FAST_UNALIGNED_ACCESS +#ifndef BUFHELP_UNALIGNED_ACCESS /* Functions for loading and storing unaligned u32 values of different endianness. */ @@ -373,7 +382,7 @@ static inline void buf_put_le64(void *_buf, u64 val) out[0] = val; } -#else /*BUFHELP_FAST_UNALIGNED_ACCESS*/ +#else /*BUFHELP_UNALIGNED_ACCESS*/ typedef struct bufhelp_u32_s { @@ -435,6 +444,6 @@ static inline void buf_put_le64(void *_buf, u64 val) } -#endif /*BUFHELP_FAST_UNALIGNED_ACCESS*/ +#endif /*BUFHELP_UNALIGNED_ACCESS*/ #endif /*GCRYPT_BUFHELP_H*/ From cvs at cvs.gnupg.org Fri Jan 27 09:28:30 2017 From: cvs at cvs.gnupg.org (by Werner Koch) Date: Fri, 27 Jan 2017 09:28:30 +0100 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-59-ga351fbd Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via a351fbde8548ce3f57298c618426f043844fbc78 (commit) via 8bbefa2ab283dd1443cb7453749aa0b51aec6ec4 (commit) from 39b9302da5d08bd52688d20befe626fee0b6c41d (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit a351fbde8548ce3f57298c618426f043844fbc78 Author: Werner Koch Date: Fri Jan 27 09:16:31 2017 +0100 w32: New envvar GCRYPT_RNDW32_DBG. * random/rndw32.c (_gcry_rndw32_gather_random): Use getenv to set DEBUG_ME. Signed-off-by: Werner Koch diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi index 80c369b..a905d0f 100644 --- a/doc/gcrypt.texi +++ b/doc/gcrypt.texi @@ -5388,6 +5388,13 @@ for entropy. On some older Windows systems this could help to speed up the creation of random numbers but also decreases the amount of data used to init the random number generator. + at item GCRYPT_RNDW32_DBG + at cindex GCRYPT_RNDW32_DBG +Setting the value of this variable to a positive integer logs +information about the Windows entropy gatherer using the standard log +interface. + + @item HOME @cindex HOME This is used to locate the socket to connect to the EGD random diff --git a/random/rndw32.c b/random/rndw32.c index de6e783..8c507ac 100644 --- a/random/rndw32.c +++ b/random/rndw32.c @@ -245,12 +245,13 @@ static RTLGENRANDOM pRtlGenRandom; static int system_rng_available; /* Whether a system RNG is available. */ static HCRYPTPROV hRNGProv; /* Handle to Intel RNG CSP. */ -static int debug_me; /* Debug flag. */ +/* The debug flag. Debugging is enabled if the value of the envvar + * GCRY_RNDW32_DBG is a postive number.*/ +static int debug_me; static int system_is_w2000; /* True if running on W2000. */ - /* Try and connect to the system RNG if there's one present. 
*/ static void @@ -787,11 +788,16 @@ _gcry_rndw32_gather_random (void (*add)(const void*, size_t, if (!is_initialized) { OSVERSIONINFO osvi = { sizeof( osvi ) }; + const char *s; + + if ((s = getenv ("GCRYPT_RNDW32_DBG")) && atoi (s) > 0) + debug_me = 1; GetVersionEx( &osvi ); if (osvi.dwPlatformId != VER_PLATFORM_WIN32_NT) log_fatal ("can only run on a Windows NT platform\n" ); system_is_w2000 = (osvi.dwMajorVersion == 5 && osvi.dwMinorVersion == 0); + init_system_rng (); is_initialized = 1; } commit 8bbefa2ab283dd1443cb7453749aa0b51aec6ec4 Author: Werner Koch Date: Fri Jan 27 09:13:07 2017 +0100 Update NEWS with release info from 1.7.4 to 1.7.6. -- diff --git a/NEWS b/NEWS index 179b18d..995aac3 100644 --- a/NEWS +++ b/NEWS @@ -10,6 +10,22 @@ Noteworthy changes in version 1.8.0 (unreleased) [C21/A1/R_] - GCRYCTL_PRINT_CONFIG does now also print build information for libgpg-error and the used compiler version. + * Performance: + + - More ARMv8/AArch32 improvements for AES, GCM, SHA-256, and SHA-1. + [also in 1.7.4] + + - Add ARMv8/AArch32 assembly implementation for Twofish and + Camellia. [also in 1.7.4] + + - Add bulk processing implementation for ARMv8/AArch32. + [also in 1.7.4] + + - Add Stribog OIDs. [also in 1.7.4] + + - Improve the DRBG performance and sync the code with the Linux + version. [also in 1.7.4] + * Internal changes: - Libgpg-error 1.25 is now required. This avoids stalling of nPth @@ -26,54 +42,33 @@ Noteworthy changes in version 1.8.0 (unreleased) [C21/A1/R_] allocated as needed. These new pools are not protected against being swapped out (mlock can't be used). However, these days this is considered a minor issue and can easily be mitigated by - using encrypted swap space. - - - * Interface changes relative to the 1.7.0 release: - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - GCRYCTL_REINIT_SYSCALL_CLAMP NEW macro. - - -Noteworthy changes in version 1.7.5 (2016-12-15) [C21/A1/R5] ------------------------------------------------- + using encrypted swap space. [also in 1.7.4] * Bug fixes: - - Fix regression in mlock detection [bug#2870]. + - Fix AES CTR self-check detected failure in the SSSE3 based + implementation. [also in 1.7.6] + - Remove gratuitous select before the getrandom syscall. + [also in 1.7.6] -Noteworthy changes in version 1.7.4 (2016-12-09) [C21/A1/R4] ------------------------------------------------- + - Fix regression in mlock detection. [bug#2870] [also in 1.7.5] - * Performance: + - Fix GOST 28147 CryptoPro-B S-box. [also in 1.7.4] - - More ARMv8/AArch32 improvements for AES, GCM, SHA-256, and SHA-1. - - - Add ARMv8/AArch32 assembly implementation for Twofish and - Camellia. - - - Add bulk processing implementation for ARMv8/AArch32. - - - Add Stribog OIDs. - - - Improve the DRBG performance and sync the code with the Linux - version. - - * Internal changes: + - Fix error code handling of mlock calls. [also in 1.7.4] - - When secure memory is requested by the MPI functions or by - gcry_xmalloc_secure, they do not anymore lead to a fatal error if - the secure memory pool is used up. Instead new pools are - allocated as needed. These new pools are not protected against - being swapped out (mlock can't be used). However, these days - this is considered a minor issue and can easily be mitigated by - using encrypted swap space. - * Bug fixes: + * Interface changes relative to the 1.7.0 release: + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + GCRYCTL_REINIT_SYSCALL_CLAMP NEW macro. - - Fix GOST 28147 CryptoPro-B S-box. 
- - Fix error code handling of mlock calls. + * Release dates of 1.7.x versions: + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Version 1.7.6 (2017-01-18) [C21/A1/R6] + Version 1.7.5 (2016-12-15) [C21/A1/R5] + Version 1.7.4 (2016-12-09) [C21/A1/R4] Noteworthy changes in version 1.7.3 (2016-08-17) [C21/A1/R3] ----------------------------------------------------------------------- Summary of changes: NEWS | 69 ++++++++++++++++++++++++++------------------------------- doc/gcrypt.texi | 7 ++++++ random/rndw32.c | 10 +++++++-- 3 files changed, 47 insertions(+), 39 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Fri Jan 27 18:59:49 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 27 Jan 2017 19:59:49 +0200 Subject: [PATCH] CFB 8 Bit implementation In-Reply-To: References: Message-ID: <44db07fc-d345-1983-ee32-8d8bd730ab64@iki.fi> Hello, On 24.01.2017 16:43, Mathias L. Baumann wrote: > Hello dear Gcrypt Devs, > > as a followup to Lizas request/inqueries I now implemented CFB in 8 bit mode. > > Please let me know about any changes you want to see. > Looks mostly ok, just few requests: - Change '//' C++ style comments to /* .. */ - Add few multibyte test-vectors from [1] and [2]. - Add changelog to commit message, see other commit for example (see [3]). - Send signed DCO to mailing list, and add 'Signed-off-by' to commit message (see [3]). > I should add that my stack burning code is more or less just guessing how it should be done by imitating what I saw in the other functions. > Please validate that I did that correct :) > Yes, it's done correctly. -Jussi [1] http://csrc.nist.gov/groups/STM/cavp/documents/des/tdesmmt.zip [2] http://csrc.nist.gov/groups/STM/cavp/documents/aes/aesmmt.zip [3] https://github.com/mathias-baumann-sociomantic/libgcrypt/blob/master/doc/HACKING -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 273 bytes Desc: OpenPGP digital signature URL: From cvs at cvs.gnupg.org Sat Jan 28 10:34:27 2017 From: cvs at cvs.gnupg.org (by Jussi Kivilinna) Date: Sat, 28 Jan 2017 10:34:27 +0100 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-66-ge7b941c Message-ID: This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "The GNU crypto library". The branch, master has been updated via e7b941c3de9c9b6319298c02f844cc0cadbf8562 (commit) via 92b4a29d2453712192ced2d7226abc49679dcb1e (commit) via 4f31d816dcc1e95dc647651e92acbdfed53f5c14 (commit) via 55cf1b5588705cab5f45e2817c4aa1d204dc0042 (commit) via b29b1b9f576f501d4b993be0a751567045274a1a (commit) via 136c8416ea540dd126be3997d94d7063b3aaf577 (commit) via d1ae52a0e23308f33b78cffeba56005b687f23c0 (commit) from a351fbde8548ce3f57298c618426f043844fbc78 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. 
- Log -----------------------------------------------------------------

commit e7b941c3de9c9b6319298c02f844cc0cadbf8562 Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200

bufhelp: use unaligned dword and qword types for endianness helpers

* cipher/bufhelp.h (BUFHELP_UNALIGNED_ACCESS): New, defined if attributes 'packed', 'aligned' and 'may_alias' are supported. (BUFHELP_FAST_UNALIGNED_ACCESS): Define if we have BUFHELP_UNALIGNED_ACCESS.

--

Now that the compiler is properly told that reads through these types need not follow the strict-aliasing rules and may be unaligned, we can enable them for all architectures, and the compiler will emit more optimized, yet correct, code (for example, using special unaligned read/write instructions instead of accessing byte by byte).

Signed-off-by: Jussi Kivilinna

diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index 1c52db5..3110a1d 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -23,10 +23,19 @@ #include "bithelp.h" -#undef BUFHELP_FAST_UNALIGNED_ACCESS +#undef BUFHELP_UNALIGNED_ACCESS #if defined(HAVE_GCC_ATTRIBUTE_PACKED) && \ defined(HAVE_GCC_ATTRIBUTE_ALIGNED) && \ - defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) && \ + defined(HAVE_GCC_ATTRIBUTE_MAY_ALIAS) +/* Compiler supports the attributes needed for automatically issuing unaligned + memory access instructions. + */ +# define BUFHELP_UNALIGNED_ACCESS 1 +#endif + + +#undef BUFHELP_FAST_UNALIGNED_ACCESS +#if defined(BUFHELP_UNALIGNED_ACCESS) && \ (defined(__i386__) || defined(__x86_64__) || \ (defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)) || \ defined(__aarch64__)) @@ -290,7 +299,7 @@ buf_eq_const(const void *_a, const void *_b, size_t len) } -#ifndef BUFHELP_FAST_UNALIGNED_ACCESS +#ifndef BUFHELP_UNALIGNED_ACCESS /* Functions for loading and storing unaligned u32 values of different endianness. */ @@ -373,7 +382,7 @@ static inline void buf_put_le64(void *_buf, u64 val) out[0] = val; } -#else /*BUFHELP_FAST_UNALIGNED_ACCESS*/ +#else /*BUFHELP_UNALIGNED_ACCESS*/ typedef struct bufhelp_u32_s { @@ -435,6 +444,6 @@ static inline void buf_put_le64(void *_buf, u64 val) } -#endif /*BUFHELP_FAST_UNALIGNED_ACCESS*/ +#endif /*BUFHELP_UNALIGNED_ACCESS*/ #endif /*GCRYPT_BUFHELP_H*/

commit 92b4a29d2453712192ced2d7226abc49679dcb1e Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200

rijndael-aesni: fix u128_t strict-aliasing rule breaking

* cipher/rijndael-aesni.c (u128_t): Add attributes to tell GCC and clang that casting from 'char *' to 'u128_t *' is ok.

--

Signed-off-by: Jussi Kivilinna

diff --git a/cipher/rijndael-aesni.c b/cipher/rijndael-aesni.c index 7852e19..735e5cd 100644 --- a/cipher/rijndael-aesni.c +++ b/cipher/rijndael-aesni.c @@ -41,7 +41,10 @@ #endif -typedef struct u128_s { u32 a, b, c, d; } u128_t; +typedef struct u128_s +{ + u32 a, b, c, d; +} __attribute__((packed, aligned(1), may_alias)) u128_t; /* Two macros to be called prior and after the use of AESNI

commit 4f31d816dcc1e95dc647651e92acbdfed53f5c14 Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200

cipher-xts: fix pointer casting to wrong alignment and aliasing

* cipher/cipher-xts.c (xts_gfmul_byA, xts_inc128): Use buf_get_le64 and buf_put_le64 for accessing data; Change parameter pointers to 'unsigned char *' type. (_gcry_cipher_xts_crypt): Do not cast buffer pointers to 'u64 *' for helper functions.
-- Signed-off-by: Jussi Kivilinna diff --git a/cipher/cipher-xts.c b/cipher/cipher-xts.c index 7a7181b..4da89e5 100644 --- a/cipher/cipher-xts.c +++ b/cipher/cipher-xts.c @@ -29,29 +29,29 @@ #include "./cipher-internal.h" -static inline void xts_gfmul_byA (u64 *out, const u64 *in) +static inline void xts_gfmul_byA (unsigned char *out, const unsigned char *in) { - u64 hi = le_bswap64 (in[1]); - u64 lo = le_bswap64 (in[0]); + u64 hi = buf_get_le64 (in + 8); + u64 lo = buf_get_le64 (in + 0); u64 carry = -(hi >> 63) & 0x87; hi = (hi << 1) + (lo >> 63); lo = (lo << 1) ^ carry; - out[1] = le_bswap64 (hi); - out[0] = le_bswap64 (lo); + buf_put_le64 (out + 8, hi); + buf_put_le64 (out + 0, lo); } -static inline void xts_inc128 (u64 *seqno) +static inline void xts_inc128 (unsigned char *seqno) { - u64 lo = le_bswap64 (seqno[0]); - u64 hi = le_bswap64 (seqno[1]); + u64 lo = buf_get_le64 (seqno + 0); + u64 hi = buf_get_le64 (seqno + 8); hi += !(++lo); - seqno[0] = le_bswap64 (lo); - seqno[1] = le_bswap64 (hi); + buf_put_le64 (seqno + 0, lo); + buf_put_le64 (seqno + 8, hi); } @@ -117,7 +117,7 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, nblocks--; /* Generate next tweak. */ - xts_gfmul_byA ((u64 *)c->u_ctr.ctr, (u64 *)c->u_ctr.ctr); + xts_gfmul_byA (c->u_ctr.ctr, c->u_ctr.ctr); } /* Handle remaining data with ciphertext stealing. */ @@ -129,7 +129,7 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, gcry_assert (inbuflen < GCRY_XTS_BLOCK_LEN * 2); /* Generate last tweak. */ - xts_gfmul_byA (tmp.x64, (u64 *)c->u_ctr.ctr); + xts_gfmul_byA (tmp.x1, c->u_ctr.ctr); /* Decrypt last block first. */ buf_xor (outbuf, inbuf, tmp.x64, GCRY_XTS_BLOCK_LEN); @@ -158,7 +158,7 @@ _gcry_cipher_xts_crypt (gcry_cipher_hd_t c, } /* Auto-increment data-unit sequence number */ - xts_inc128 ((u64 *)c->u_iv.iv); + xts_inc128 (c->u_iv.iv); wipememory (&tmp, sizeof(tmp)); wipememory (c->u_ctr.ctr, sizeof(c->u_ctr.ctr)); commit 55cf1b5588705cab5f45e2817c4aa1d204dc0042 Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200 crc-intel-pclmul: fix undefined behavior with unaligned access * cipher/crc-intel-pclmul.c (u16_unaligned_s): New. (crc32_reflected_less_than_16, crc32_less_than_16): Use 'u16_unaligned_s' for unaligned memory access. -- GnuPG-bug-id: 2292 Signed-off-by: Jussi Kivilinna diff --git a/cipher/crc-intel-pclmul.c b/cipher/crc-intel-pclmul.c index 7a344e2..8ff08ec 100644 --- a/cipher/crc-intel-pclmul.c +++ b/cipher/crc-intel-pclmul.c @@ -44,6 +44,12 @@ #define ALIGNED_16 __attribute__ ((aligned (16))) +struct u16_unaligned_s +{ + u16 a; +} __attribute__((packed, aligned (1), may_alias)); + + /* Constants structure for generic reflected/non-reflected CRC32 CLMUL * functions. 
*/ struct crc32_consts_s @@ -345,14 +351,14 @@ crc32_reflected_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, } else if (inlen == 2) { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data ^= crc; data <<= 16; crc >>= 16; } else { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data |= inbuf[2] << 16; data ^= crc; data <<= 8; @@ -709,14 +715,14 @@ crc32_less_than_16 (u32 *pcrc, const byte *inbuf, size_t inlen, } else if (inlen == 2) { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data ^= crc; data = _gcry_bswap32(data << 16); crc = _gcry_bswap32(crc >> 16); } else { - data = *((const u16 *)inbuf); + data = ((const struct u16_unaligned_s *)inbuf)->a; data |= inbuf[2] << 16; data ^= crc; data = _gcry_bswap32(data << 8);

commit b29b1b9f576f501d4b993be0a751567045274a1a Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200

configure.ac: fix attribute checks

* configure.ac: Add -Werror flag for attribute checks.

--

The compiler ignores unknown attributes and just emits a warning; therefore the attribute checks need to be run with -Werror.

Signed-off-by: Jussi Kivilinna

diff --git a/configure.ac b/configure.ac index d2b863c..bc5bed4 100644 --- a/configure.ac +++ b/configure.ac @@ -958,6 +958,12 @@ if test "$gcry_cv_visibility_attribute" = "yes" \ fi + +# Following attribute tests depend on warnings to cause compile to fail, +# so set -Werror temporarily. +_gcc_cflags_save=$CFLAGS +CFLAGS="$CFLAGS -Werror" + + # # Check whether the compiler supports the GCC style aligned attribute # @@ -1009,6 +1015,10 @@ if test "$gcry_cv_gcc_attribute_may_alias" = "yes" ; then fi + +# Restore flags. +CFLAGS=$_gcc_cflags_save; + + # # Check whether the compiler supports 'asm' or '__asm__' keyword for # assembler blocks.

commit 136c8416ea540dd126be3997d94d7063b3aaf577 Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200

configure.ac: fix may_alias attribute check

* configure.ac: Test may_alias attribute on type, not on variable.

--

Signed-off-by: Jussi Kivilinna

diff --git a/configure.ac b/configure.ac index 5dd27ca..d2b863c 100644 --- a/configure.ac +++ b/configure.ac @@ -1000,7 +1000,8 @@ AC_CACHE_CHECK([whether the GCC style may_alias attribute is supported], [gcry_cv_gcc_attribute_may_alias], [gcry_cv_gcc_attribute_may_alias=no AC_COMPILE_IFELSE([AC_LANG_SOURCE( - [[struct { int a; } foo __attribute__ ((may_alias));]])], + [[typedef struct foo_s { int a; } + __attribute__ ((may_alias)) foo_t;]])], [gcry_cv_gcc_attribute_may_alias=yes])]) if test "$gcry_cv_gcc_attribute_may_alias" = "yes" ; then AC_DEFINE(HAVE_GCC_ATTRIBUTE_MAY_ALIAS,1,

commit d1ae52a0e23308f33b78cffeba56005b687f23c0 Author: Jussi Kivilinna Date: Sat Jan 28 11:26:02 2017 +0200

bufhelp: add 'may_alias' attribute for properly aligned 'bufhelp_int_t'

* cipher/bufhelp.h [!BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_int_t): Add 'may_alias' attribute.

--

Signed-off-by: Jussi Kivilinna

diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index 3616515..1c52db5 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -49,11 +49,18 @@ typedef struct bufhelp_int_s /* Define type with default alignment for other architectures (unaligned accesses handled in per-byte loops.
*/ +#ifdef HAVE_GCC_ATTRIBUTE_MAY_ALIAS +typedef struct bufhelp_int_s +{ + uintptr_t a; +} __attribute__((may_alias)) bufhelp_int_t; +#else typedef struct bufhelp_int_s { uintptr_t a; } bufhelp_int_t; #endif +#endif /* Optimized function for small buffer copying */ ----------------------------------------------------------------------- Summary of changes: cipher/bufhelp.h | 26 +++++++++++++++++++++----- cipher/cipher-xts.c | 26 +++++++++++++------------- cipher/crc-intel-pclmul.c | 14 ++++++++++---- cipher/rijndael-aesni.c | 5 ++++- configure.ac | 13 ++++++++++++- 5 files changed, 60 insertions(+), 24 deletions(-) hooks/post-receive -- The GNU crypto library http://git.gnupg.org _______________________________________________ Gnupg-commits mailing list Gnupg-commits at gnupg.org http://lists.gnupg.org/mailman/listinfo/gnupg-commits From jussi.kivilinna at iki.fi Sat Jan 28 14:13:09 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 28 Jan 2017 15:13:09 +0200 Subject: [PATCH 1/4] cipher: add explicit blocksize checks to allow better optimization Message-ID: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> * cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt) (_gcry_cipher_cbc_decrypt): Add explicit check for cipher blocksize of 64-bit or 128-bit. * cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt) (_gcry_cipher_cfb_decrypt): Ditto. * cipher/cipher-cmac.c (cmac_write, cmac_generate_subkeys) (cmac_final): Ditto. * cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto. * cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt): Ditto. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/cipher-cbc.c b/cipher/cipher-cbc.c index 67814b7..95c49b2 100644 --- a/cipher/cipher-cbc.c +++ b/cipher/cipher-cbc.c @@ -44,6 +44,11 @@ _gcry_cipher_cbc_encrypt (gcry_cipher_hd_t c, size_t nblocks = inbuflen / blocksize; unsigned int burn, nburn; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return GPG_ERR_INV_LENGTH; + if (outbuflen < ((c->flags & GCRY_CIPHER_CBC_MAC)? blocksize : inbuflen)) return GPG_ERR_BUFFER_TOO_SHORT; @@ -133,6 +138,11 @@ _gcry_cipher_cbc_decrypt (gcry_cipher_hd_t c, size_t nblocks = inbuflen / blocksize; unsigned int burn, nburn; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return GPG_ERR_INV_LENGTH; + if (outbuflen < inbuflen) return GPG_ERR_BUFFER_TOO_SHORT; diff --git a/cipher/cipher-cfb.c b/cipher/cipher-cfb.c index f289ed3..21c81ca 100644 --- a/cipher/cipher-cfb.c +++ b/cipher/cipher-cfb.c @@ -41,6 +41,11 @@ _gcry_cipher_cfb_encrypt (gcry_cipher_hd_t c, size_t blocksize_x_2 = blocksize + blocksize; unsigned int burn, nburn; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return GPG_ERR_INV_LENGTH; + if (outbuflen < inbuflen) return GPG_ERR_BUFFER_TOO_SHORT; @@ -138,6 +143,11 @@ _gcry_cipher_cfb_decrypt (gcry_cipher_hd_t c, size_t blocksize_x_2 = blocksize + blocksize; unsigned int burn, nburn; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. 
*/ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return GPG_ERR_INV_LENGTH; + if (outbuflen < inbuflen) return GPG_ERR_BUFFER_TOO_SHORT; diff --git a/cipher/cipher-cmac.c b/cipher/cipher-cmac.c index eca1c1a..da3ef75 100644 --- a/cipher/cipher-cmac.c +++ b/cipher/cipher-cmac.c @@ -42,6 +42,11 @@ cmac_write (gcry_cipher_hd_t c, const byte * inbuf, size_t inlen) unsigned int burn = 0; unsigned int nblocks; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return; + if (!inlen || !inbuf) return; @@ -109,6 +114,11 @@ cmac_generate_subkeys (gcry_cipher_hd_t c) byte buf[MAX_BLOCKSIZE]; } u; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return; + if (MAX_BLOCKSIZE < blocksize) BUG (); @@ -149,6 +159,11 @@ cmac_final (gcry_cipher_hd_t c) unsigned int burn; byte *subkey; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return; + if (count == blocksize) subkey = c->u_mode.cmac.subkeys[0]; /* K1 */ else diff --git a/cipher/cipher-ctr.c b/cipher/cipher-ctr.c index 4bbfaae..f9cb6b5 100644 --- a/cipher/cipher-ctr.c +++ b/cipher/cipher-ctr.c @@ -42,6 +42,11 @@ _gcry_cipher_ctr_encrypt (gcry_cipher_hd_t c, size_t nblocks; unsigned int burn, nburn; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return GPG_ERR_INV_LENGTH; + if (outbuflen < inbuflen) return GPG_ERR_BUFFER_TOO_SHORT; diff --git a/cipher/cipher-ofb.c b/cipher/cipher-ofb.c index 7db7658..f821d1b 100644 --- a/cipher/cipher-ofb.c +++ b/cipher/cipher-ofb.c @@ -40,6 +40,11 @@ _gcry_cipher_ofb_encrypt (gcry_cipher_hd_t c, size_t blocksize = c->spec->blocksize; unsigned int burn, nburn; + /* Tell compiler that we require a cipher with a 64bit or 128 bit block + * length, to allow better optimization of this function. */ + if (blocksize > 16 || blocksize < 8 || blocksize & (8 - 1)) + return GPG_ERR_INV_LENGTH; + if (outbuflen < inbuflen) return GPG_ERR_BUFFER_TOO_SHORT; From jussi.kivilinna at iki.fi Sat Jan 28 14:13:14 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 28 Jan 2017 15:13:14 +0200 Subject: [PATCH 2/4] hwf-x86: avoid type-punching In-Reply-To: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> References: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> Message-ID: <148560919422.13097.871021115218275910.stgit@localhost6.localdomain6> * src/hwf-x86.c (detect_x86_gnuc): Use union for vendor_id. 
-- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/src/hwf-x86.c b/src/hwf-x86.c index a746ab2..53e00d9 100644 --- a/src/hwf-x86.c +++ b/src/hwf-x86.c @@ -170,7 +170,11 @@ get_xgetbv(void) static unsigned int detect_x86_gnuc (void) { - char vendor_id[12+1]; + union + { + char c[12+1]; + unsigned int ui[3]; + } vendor_id; unsigned int features; unsigned int os_supports_avx_avx2_registers = 0; unsigned int max_cpuid_level; @@ -183,16 +187,14 @@ detect_x86_gnuc (void) if (!is_cpuid_available()) return 0; - get_cpuid(0, &max_cpuid_level, - (unsigned int *)&vendor_id[0], - (unsigned int *)&vendor_id[8], - (unsigned int *)&vendor_id[4]); - vendor_id[12] = 0; + get_cpuid(0, &max_cpuid_level, &vendor_id.ui[0], &vendor_id.ui[2], + &vendor_id.ui[1]); + vendor_id.c[12] = 0; if (0) ; /* Just to make "else if" and ifdef macros look pretty. */ #ifdef ENABLE_PADLOCK_SUPPORT - else if (!strcmp (vendor_id, "CentaurHauls")) + else if (!strcmp (vendor_id.c, "CentaurHauls")) { /* This is a VIA CPU. Check what PadLock features we have. */ @@ -225,12 +227,12 @@ detect_x86_gnuc (void) } } #endif /*ENABLE_PADLOCK_SUPPORT*/ - else if (!strcmp (vendor_id, "GenuineIntel")) + else if (!strcmp (vendor_id.c, "GenuineIntel")) { /* This is an Intel CPU. */ result |= HWF_INTEL_CPU; } - else if (!strcmp (vendor_id, "AuthenticAMD")) + else if (!strcmp (vendor_id.c, "AuthenticAMD")) { /* This is an AMD CPU. */ } From jussi.kivilinna at iki.fi Sat Jan 28 14:13:19 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 28 Jan 2017 15:13:19 +0200 Subject: [PATCH 3/4] rndhw: avoid type-punching In-Reply-To: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> References: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> Message-ID: <148560919924.13097.3269343013591826920.stgit@localhost6.localdomain6> * random/rndhw.c (rdrand_long, rdrand_nlong): Add 'volatile' for pointer. (poll_drng): Convert buffer to 'unsigned long[]' and make use of DIM macro. 
-- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/random/rndhw.c b/random/rndhw.c index 8e50751..7c75955 100644 --- a/random/rndhw.c +++ b/random/rndhw.c @@ -129,7 +129,7 @@ poll_padlock (void (*add)(const void*, size_t, enum random_origins), # define RDRAND_LONG RDRAND_INT # endif static inline int -rdrand_long (unsigned long *v) +rdrand_long (volatile unsigned long *v) { int ok; asm volatile ("1: " RDRAND_LONG "\n\t" @@ -145,7 +145,7 @@ rdrand_long (unsigned long *v) static inline int -rdrand_nlong (unsigned long *v, int count) +rdrand_nlong (volatile unsigned long *v, int count) { while (count--) if (!rdrand_long(v++)) @@ -157,12 +157,12 @@ rdrand_nlong (unsigned long *v, int count) static size_t poll_drng (add_fn_t add, enum random_origins origin, int fast) { - volatile char buffer[64] __attribute__ ((aligned (8))); + volatile unsigned long buffer[8]; unsigned int nbytes = sizeof (buffer); (void)fast; - if (!rdrand_nlong ((unsigned long *)buffer, sizeof(buffer)/sizeof(long))) + if (!rdrand_nlong (buffer, DIM(buffer))) return 0; (*add)((void *)buffer, nbytes, origin); return nbytes; From jussi.kivilinna at iki.fi Sat Jan 28 14:13:24 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Sat, 28 Jan 2017 15:13:24 +0200 Subject: [PATCH 4/4] Add UNLIKELY and LIKELY macros In-Reply-To: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> References: <148560918918.13097.2811551016110191421.stgit@localhost6.localdomain6> Message-ID: <148560920426.13097.3468065541879813575.stgit@localhost6.localdomain6> * src/g10lib.h (LIKELY, UNLIKELY): New. (gcry_assert): Use LIKELY for assert check. (fast_wipememory2_unaligned_head): Use UNLIKELY for unaligned branching. * cipher/bufhelp.h (buf_cpy, buf_xor, buf_xor_1, buf_xor_2dst) (buf_xor_n_copy_2): Ditto. -- Signed-off-by: Jussi Kivilinna --- 0 files changed diff --git a/cipher/bufhelp.h b/cipher/bufhelp.h index 3110a1d..b854bc0 100644 --- a/cipher/bufhelp.h +++ b/cipher/bufhelp.h @@ -1,5 +1,5 @@ /* bufhelp.h - Some buffer manipulation helpers - * Copyright (C) 2012 Jussi Kivilinna + * Copyright (C) 2012-2017 Jussi Kivilinna * * This file is part of Libgcrypt. * @@ -20,6 +20,7 @@ #define GCRYPT_BUFHELP_H +#include "g10lib.h" #include "bithelp.h" @@ -88,7 +89,7 @@ buf_cpy(void *_dst, const void *_src, size_t len) const unsigned int longmask = sizeof(bufhelp_int_t) - 1; /* Skip fast processing if buffers are unaligned. */ - if (((uintptr_t)dst | (uintptr_t)src) & longmask) + if (UNLIKELY(((uintptr_t)dst | (uintptr_t)src) & longmask)) goto do_bytes; #endif @@ -124,7 +125,7 @@ buf_xor(void *_dst, const void *_src1, const void *_src2, size_t len) const unsigned int longmask = sizeof(bufhelp_int_t) - 1; /* Skip fast processing if buffers are unaligned. */ - if (((uintptr_t)dst | (uintptr_t)src1 | (uintptr_t)src2) & longmask) + if (UNLIKELY(((uintptr_t)dst | (uintptr_t)src1 | (uintptr_t)src2) & longmask)) goto do_bytes; #endif @@ -160,7 +161,7 @@ buf_xor_1(void *_dst, const void *_src, size_t len) const unsigned int longmask = sizeof(bufhelp_int_t) - 1; /* Skip fast processing if buffers are unaligned. */ - if (((uintptr_t)dst | (uintptr_t)src) & longmask) + if (UNLIKELY(((uintptr_t)dst | (uintptr_t)src) & longmask)) goto do_bytes; #endif @@ -196,7 +197,7 @@ buf_xor_2dst(void *_dst1, void *_dst2, const void *_src, size_t len) const unsigned int longmask = sizeof(bufhelp_int_t) - 1; /* Skip fast processing if buffers are unaligned. 
*/ - if (((uintptr_t)src | (uintptr_t)dst1 | (uintptr_t)dst2) & longmask) + if (UNLIKELY(((uintptr_t)src | (uintptr_t)dst1 | (uintptr_t)dst2) & longmask)) goto do_bytes; #endif @@ -238,8 +239,8 @@ buf_xor_n_copy_2(void *_dst_xor, const void *_src_xor, void *_srcdst_cpy, const unsigned int longmask = sizeof(bufhelp_int_t) - 1; /* Skip fast processing if buffers are unaligned. */ - if (((uintptr_t)src_cpy | (uintptr_t)src_xor | (uintptr_t)dst_xor | - (uintptr_t)srcdst_cpy) & longmask) + if (UNLIKELY(((uintptr_t)src_cpy | (uintptr_t)src_xor | (uintptr_t)dst_xor | + (uintptr_t)srcdst_cpy) & longmask)) goto do_bytes; #endif diff --git a/src/g10lib.h b/src/g10lib.h index 8ce84b8..0309a83 100644 --- a/src/g10lib.h +++ b/src/g10lib.h @@ -75,6 +75,14 @@ #define GCC_ATTR_UNUSED #endif +#if __GNUC__ >= 3 +#define LIKELY( expr ) __builtin_expect( !!(expr), 1 ) +#define UNLIKELY( expr ) __builtin_expect( !!(expr), 0 ) +#else +#define LIKELY( expr ) (!!(expr)) +#define UNLIKELY( expr ) (!!(expr)) +#endif + /* Gettext macros. */ #define _(a) _gcry_gettext(a) @@ -165,15 +173,15 @@ int _gcry_log_verbosity( int level ); #ifdef JNLIB_GCC_M_FUNCTION #define BUG() _gcry_bug( __FILE__ , __LINE__, __FUNCTION__ ) -#define gcry_assert(expr) ((expr)? (void)0 \ +#define gcry_assert(expr) (LIKELY(expr)? (void)0 \ : _gcry_assert_failed (STR(expr), __FILE__, __LINE__, __FUNCTION__)) #elif __STDC_VERSION__ >= 199901L #define BUG() _gcry_bug( __FILE__ , __LINE__, __func__ ) -#define gcry_assert(expr) ((expr)? (void)0 \ +#define gcry_assert(expr) (LIKELY(expr)? (void)0 \ : _gcry_assert_failed (STR(expr), __FILE__, __LINE__, __func__)) #else #define BUG() _gcry_bug( __FILE__ , __LINE__ ) -#define gcry_assert(expr) ((expr)? (void)0 \ +#define gcry_assert(expr) (LIKELY(expr)? (void)0 \ : _gcry_assert_failed (STR(expr), __FILE__, __LINE__)) #endif @@ -346,7 +354,7 @@ typedef struct fast_wipememory_s } __attribute__((packed, aligned(1), may_alias)) fast_wipememory_t; #else #define fast_wipememory2_unaligned_head(_vptr,_vset,_vlen) do { \ - while((size_t)(_vptr)&(sizeof(FASTWIPE_T)-1) && _vlen) \ + while(UNLIKELY((size_t)(_vptr)&(sizeof(FASTWIPE_T)-1)) && _vlen) \ { *_vptr=(_vset); _vptr++; _vlen--; } \ } while(0) typedef struct fast_wipememory_s From mathias.baumann at sociomantic.com Mon Jan 30 14:47:13 2017 From: mathias.baumann at sociomantic.com (Mathias L. Baumann) Date: Mon, 30 Jan 2017 14:47:13 +0100 Subject: [PATCH] Implement CFB with 8bit mode Message-ID: <6ff7a494-28aa-22be-7aa5-0c2e4efb4c16@sociomantic.com> * cipher/cipher-cfb.c: add 8bit variants of decrypt/encrypt functions * tests/basic.c: add tests for cfb8 with AES and 3DES * adjust code flow to work with constant GCRY_CIPHER_MODE_CFB8 Signed-off-by: Mathias L. 
Baumann --- cipher/cipher-cfb.c | 86 ++++++++++++++++++++++++++++++++++++++++++++ cipher/cipher-internal.h | 8 +++++ cipher/cipher.c | 9 +++++ tests/basic.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 195 insertions(+), 1 deletion(-) diff --git a/cipher/cipher-cfb.c b/cipher/cipher-cfb.c index f289ed38..dee4a1cb 100644 --- a/cipher/cipher-cfb.c +++ b/cipher/cipher-cfb.c @@ -223,3 +223,89 @@ _gcry_cipher_cfb_decrypt (gcry_cipher_hd_t c, return 0; } + + +gcry_err_code_t +_gcry_cipher_cfb8_encrypt (gcry_cipher_hd_t c, + unsigned char *outbuf, size_t outbuflen, + const unsigned char *inbuf, size_t inbuflen) +{ + gcry_cipher_encrypt_t enc_fn = c->spec->encrypt; + size_t blocksize = c->spec->blocksize; + unsigned int burn, nburn; + + if (outbuflen < inbuflen) + return GPG_ERR_BUFFER_TOO_SHORT; + + burn = 0; + + while ( inbuflen > 0) + { + /* Encrypt the IV. */ + nburn = enc_fn ( &c->context.c, c->lastiv, c->u_iv.iv ); + burn = nburn > burn ? nburn : burn; + + outbuf[0] = c->lastiv[0] ^ inbuf[0]; + + /* Bitshift iv by 8 bit to the left */ + for (int i = 0; i < blocksize-1; i++) + c->u_iv.iv[i] = c->u_iv.iv[i+1]; + + /* append cipher text to iv */ + c->u_iv.iv[blocksize-1] = outbuf[0]; + + outbuf += 1; + inbuf += 1; + inbuflen -= 1; + } + + if (burn > 0) + _gcry_burn_stack (burn + 4 * sizeof(void *)); + + return 0; +} + + +gcry_err_code_t +_gcry_cipher_cfb8_decrypt (gcry_cipher_hd_t c, + unsigned char *outbuf, size_t outbuflen, + const unsigned char *inbuf, size_t inbuflen) +{ + gcry_cipher_encrypt_t enc_fn = c->spec->encrypt; + size_t blocksize = c->spec->blocksize; + unsigned int burn, nburn; + unsigned char appendee; + + if (outbuflen < inbuflen) + return GPG_ERR_BUFFER_TOO_SHORT; + + burn = 0; + + while (inbuflen > 0) + { + /* Encrypt the IV. */ + nburn = enc_fn ( &c->context.c, c->lastiv, c->u_iv.iv ); + burn = nburn > burn ? 
nburn : burn; + + /* inbuf might == outbuf, make sure we keep the value + so we can append it later */ + appendee = inbuf[0]; + + outbuf[0] = inbuf[0] ^ c->lastiv[0]; + + /* Bitshift iv by 8 bit to the left */ + for (int i = 0; i < blocksize-1; i++) + c->u_iv.iv[i] = c->u_iv.iv[i+1]; + + c->u_iv.iv[blocksize-1] = appendee; + + outbuf += 1; + inbuf += 1; + inbuflen -= 1; + } + + if (burn > 0) + _gcry_burn_stack (burn + 4 * sizeof(void *)); + + return 0; +} diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h index 33d0629c..ea9c33d3 100644 --- a/cipher/cipher-internal.h +++ b/cipher/cipher-internal.h @@ -348,6 +348,14 @@ gcry_err_code_t _gcry_cipher_cfb_decrypt /* */ (gcry_cipher_hd_t c, unsigned char *outbuf, size_t outbuflen, const unsigned char *inbuf, size_t inbuflen); +gcry_err_code_t _gcry_cipher_cfb8_encrypt +/* */ (gcry_cipher_hd_t c, + unsigned char *outbuf, size_t outbuflen, + const unsigned char *inbuf, size_t inbuflen); +gcry_err_code_t _gcry_cipher_cfb8_decrypt +/* */ (gcry_cipher_hd_t c, + unsigned char *outbuf, size_t outbuflen, + const unsigned char *inbuf, size_t inbuflen); /*-- cipher-ofb.c --*/ diff --git a/cipher/cipher.c b/cipher/cipher.c index 06ce1dad..124700e9 100644 --- a/cipher/cipher.c +++ b/cipher/cipher.c @@ -415,6 +415,7 @@ _gcry_cipher_open_internal (gcry_cipher_hd_t *handle, case GCRY_CIPHER_MODE_ECB: case GCRY_CIPHER_MODE_CBC: case GCRY_CIPHER_MODE_CFB: + case GCRY_CIPHER_MODE_CFB8: case GCRY_CIPHER_MODE_OFB: case GCRY_CIPHER_MODE_CTR: case GCRY_CIPHER_MODE_AESWRAP: @@ -902,6 +903,10 @@ cipher_encrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, rc = _gcry_cipher_cfb_encrypt (c, outbuf, outbuflen, inbuf, inbuflen); break; + case GCRY_CIPHER_MODE_CFB8: + rc = _gcry_cipher_cfb8_encrypt (c, outbuf, outbuflen, inbuf, inbuflen); + break; + case GCRY_CIPHER_MODE_OFB: rc = _gcry_cipher_ofb_encrypt (c, outbuf, outbuflen, inbuf, inbuflen); break; @@ -1029,6 +1034,10 @@ cipher_decrypt (gcry_cipher_hd_t c, byte *outbuf, size_t outbuflen, rc = _gcry_cipher_cfb_decrypt (c, outbuf, outbuflen, inbuf, inbuflen); break; + case GCRY_CIPHER_MODE_CFB8: + rc = _gcry_cipher_cfb8_decrypt (c, outbuf, outbuflen, inbuf, inbuflen); + break; + case GCRY_CIPHER_MODE_OFB: rc = _gcry_cipher_ofb_encrypt (c, outbuf, outbuflen, inbuf, inbuflen); break; diff --git a/tests/basic.c b/tests/basic.c index 6d086b55..8b17bf75 100644 --- a/tests/basic.c +++ b/tests/basic.c @@ -893,7 +893,98 @@ check_cfb_cipher (void) 16, "\x75\xa3\x85\x74\x1a\xb9\xce\xf8\x20\x31\x62\x3d\x55\xb1\xe4\x71" } } - } + }, + { GCRY_CIPHER_AES, 1, + "\x2b\x7e\x15\x16\x28\xae\xd2\xa6\xab\xf7\x15\x88\x09\xcf\x4f\x3c", + "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", + { { "\x6b", + 1, + "\x3b"}, + { "\xc1", + 1, + "\x79"}, + { "\xbe", + 1, + "\x42"}, + { "\xe2", + 1, + "\x4c"}, + } + }, + { GCRY_CIPHER_AES192, 1, + "\x8e\x73\xb0\xf7\xda\x0e\x64\x52\xc8\x10\xf3\x2b\x80\x90\x79\xe5\x62\xf8\xea\xd2\x52\x2c\x6b\x7b", + "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", + { { "\x6b", + 1, + "\xcd"}, + { "\xc1", + 1, + "\xa2"}, + { "\xbe", + 1, + "\x52"}, + { "\xe2", + 1, + "\x1e"}, + } + }, + { GCRY_CIPHER_AES256, 1, + "\x60\x3d\xeb\x10\x15\xca\x71\xbe\x2b\x73\xae\xf0\x85\x7d\x77\x81\x1f\x35\x2c\x07\x3b\x61\x08\xd7\x2d\x98\x10\xa3\x09\x14\xdf\xf4", + "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", + { { "\x6b", + 1, + "\xdc"}, + { "\xc1", + 1, + "\x1f"}, + { "\xbe", + 1, + "\x1a"}, + { "\xe2", + 1, + "\x85"}, + } + }, + { GCRY_CIPHER_AES, 1, + 
"\x3a\x6f\x91\x59\x26\x3f\xa6\xce\xf2\xa0\x75\xca\xfa\xce\x58\x17", + "\x0f\xc2\x36\x62\xb7\xdb\xf7\x38\x27\xf0\xc7\xde\x32\x1c\xa3\x6e", + { { "\x87\xef\xeb\x8d\x55\x9e\xd3\x36\x77\x28", + 10, + "\x8e\x9c\x50\x42\x56\x14\xd5\x40\xce\x11"}, + } + }, + { GCRY_CIPHER_AES192, 1, + "\x53\x7e\x7b\xf6\x61\xfd\x40\x24\xa0\x24\x61\x3f\x15\xb1\x36\x90\xf7\xd0\xc8\x47\xc1\xe1\x89\x65", + "\x3a\x81\xf9\xd9\xd3\xc1\x55\xb0\xca\xad\x5d\x73\x34\x94\x76\xfc", + { { "\xd3\xd8\xb9\xb9\x84\xad\xc2\x42\x37\xee", + 10, + "\x38\x79\xfe\xa7\x2a\xc9\x99\x29\xe5\x3a"}, + } + }, + { GCRY_CIPHER_AES256, 1, + "\xeb\xbb\x45\x66\xb5\xe1\x82\xe0\xf0\x72\x46\x6b\x0b\x31\x1d\xf3\x8f\x91\x75\xbc\x02\x13\xa5\x53\x0b\xce\x2e\xc4\xd7\x4f\x40\x0d", + "\x09\x56\xa4\x8e\x01\x00\x2c\x9e\x16\x37\x6d\x6e\x30\x8d\xba\xd1", + { { "\xb0\xfe\x25\xac\x8d\x3d\x28\xa2\xf4\x71", + 10, + "\x63\x8c\x68\x23\xe7\x25\x6f\xb5\x62\x6e"}, + } + }, + { GCRY_CIPHER_3DES, 1, + "\xe3\x34\x7a\x6b\x0b\xc1\x15\x2c\x64\x2a\x25\xcb\xd3\xbc\x31\xab\xfb\xa1\x62\xa8\x1f\x19\x7c\x15", + "\xb7\x40\xcc\x21\xe9\x25\xe3\xc8", + { { "\xdb\xe9\x15\xfc\xb3\x3b\xca\x18\xef\x14", + 10, + "\xf4\x80\x1a\x8d\x03\x9d\xb4\xca\x8f\xf6"}, + } + }, + { GCRY_CIPHER_3DES, 1, + "\x7c\xa2\x89\x38\xba\x6b\xec\x1f\xfe\xc7\x8f\x7c\xd6\x97\x61\x94\x7c\xa2\x89\x38\xba\x6b\xec\x1f", + "\x95\x38\x96\x58\x6e\x49\xd3\x8f", + { { "\x2e\xa9\x56\xd4\xa2\x11\xdb\x68\x59\xb7", + 10, + "\xf2\x0e\x53\x66\x74\xa6\x6f\xa7\x38\x05"}, + } + }, }; gcry_cipher_hd_t hde, hdd; unsigned char out[MAX_DATA_LEN]; -- 2.11.0 -------------- next part -------------- A non-text attachment was scrubbed... Name: 0xDF9A49AD.asc Type: application/pgp-keys Size: 3144 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From mathias.baumann at sociomantic.com Mon Jan 30 14:30:42 2017 From: mathias.baumann at sociomantic.com (Mathias L. Baumann) Date: Mon, 30 Jan 2017 14:30:42 +0100 Subject: My signed DCO Message-ID: <07c06d79-0828-b564-d604-fd16c7c86ebe@sociomantic.com> Libgcrypt Developer's Certificate of Origin. Version 1.0 ========================================================= By making a contribution to the Libgcrypt project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the free software license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate free software license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same free software license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the free software license(s) involved. Signed-off-by: Mathias L. Baumann -------------- next part -------------- A non-text attachment was scrubbed... 
From jussi.kivilinna at iki.fi Fri Jan 27 10:17:17 2017 From: jussi.kivilinna at iki.fi (Jussi Kivilinna) Date: Fri, 27 Jan 2017 11:17:17 +0200 Subject: [git] GCRYPT - branch, master, updated. libgcrypt-1.7.3-59-ga351fbd In-Reply-To: References: Message-ID:

Hello,

> > + * Performance:
> +
> + - More ARMv8/AArch32 improvements for AES, GCM, SHA-256, and SHA-1.
> + [also in 1.7.4]

The new architecture naming in ARM is a bit confusing. These were new 'crypto extension' implementations for the new 64-bit ARM architecture (ARMv8/AArch64), whereas 1.7.3 added new 'crypto extension' implementations for the refreshed 32-bit ARM architecture (ARMv8/AArch32).

> +
> + - Add ARMv8/AArch32 assembly implementation for Twofish and
> + Camellia. [also in 1.7.4]

These were for AArch64 too.

-Jussi
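The AArch32/AArch64 split Jussi describes is visible directly in the predefined compiler macros that build-time dispatch keys off: AArch64 is its own target, not a superset of the 32-bit ARM target, so the two get separate assembly implementations. A small illustrative check follows; the macros are the standard GCC/clang ones, but the program itself is not from the library:

#include <stdio.h>

int
main (void)
{
#if defined(__aarch64__)
  /* 64-bit ARMv8; 32-bit AArch32 code paths do not apply here.  */
  puts ("ARMv8/AArch64 build");
#elif defined(__arm__)
  /* 32-bit ARM: ARMv7, or an ARMv8 core running in AArch32 mode.  */
  puts ("AArch32 build");
#else
  puts ("not an ARM target");
#endif
  return 0;
}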