peter/libgcrypt - libgcrypt source repository for Peter

Age	Commit message	Author	Files	Lines
2015-01-05	doc: State that gcry_md_write et al may be used after md_read.	Werner Koch	2	-1/+7
	--
2015-01-02	rmd160: restore native-endian store in _gcry_rmd160_mixblock	Jussi Kivilinna	1	-3/+4
	* cipher/rmd160.c (_gcry_rmd160_mixblock): Store result to buffer in native endianness. -- Commit 4515315f61fbf79413e150fbd1d5f5a2435f2bc5 unintentionally changed this native-endian store to little-endian. Reported-by: Yuriy Kaminskiy <yumkam@gmail.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
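The difference between the two kinds of store named in this commit can be sketched as follows. This is a minimal illustration, not the code in cipher/rmd160.c; the function names are hypothetical. A native-endian store copies the value's bytes as the host lays them out, while a little-endian store always writes the least significant byte first, so the two only agree on little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store V in the host's own byte order.
   memcpy also sidesteps any alignment requirement on DST. */
void store_native32(unsigned char *dst, uint32_t v)
{
  assert(dst != NULL);
  memcpy(dst, &v, sizeof v);
}

/* Hypothetical helper: store V least-significant byte first,
   whatever the host byte order is. */
void store_le32(unsigned char *dst, uint32_t v)
{
  assert(dst != NULL);
  dst[0] = (unsigned char)(v);
  dst[1] = (unsigned char)(v >> 8);
  dst[2] = (unsigned char)(v >> 16);
  dst[3] = (unsigned char)(v >> 24);
}

/* Hypothetical helper: read back a value written by store_native32(). */
uint32_t load_native32(const unsigned char *src)
{
  uint32_t v;

  assert(src != NULL);
  memcpy(&v, src, sizeof v);
  return v;
}
```

A caller that later reads the buffer as host-order words needs the native variant; on a big-endian machine the little-endian variant would hand back byte-swapped words.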
2014-12-27	Add Intel SSSE3 based vector permutation AES implementation	Jussi Kivilinna	4	-3/+1313
	* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'. * cipher/rijndael-internal.h (USE_SSSE3): New. (RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'. * cipher/rijndael-ssse3-amd64.c: New. * cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey) (_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt) (_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc) (_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc) (_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New. (do_setkey): Add HWF check for SSSE3 and setup for SSSE3 implementation. (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add selection for SSSE3 implementation. * configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
	--
	This patch adds "AES with vector permutations" implementation by Mike Hamburg. Public-domain source-code is available at: http://crypto.stanford.edu/vpaes/
	Benchmark on Intel Core2 T8100 (2.1Ghz, no turbo):
	Old (AMD64 asm):
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   8.79 ns/B    108.5 MiB/s    18.46 c/B
	 ECB dec |   9.07 ns/B    105.1 MiB/s    19.05 c/B
	 CBC enc |   7.77 ns/B    122.7 MiB/s    16.33 c/B
	 CBC dec |   7.74 ns/B    123.2 MiB/s    16.26 c/B
	 CFB enc |   7.88 ns/B    121.0 MiB/s    16.54 c/B
	 CFB dec |   7.56 ns/B    126.1 MiB/s    15.88 c/B
	 OFB enc |   9.02 ns/B    105.8 MiB/s    18.94 c/B
	 OFB dec |   9.07 ns/B    105.1 MiB/s    19.05 c/B
	 CTR enc |   7.80 ns/B    122.2 MiB/s    16.38 c/B
	 CTR dec |   7.81 ns/B    122.2 MiB/s    16.39 c/B
	New (ssse3):
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   5.77 ns/B    165.2 MiB/s    12.13 c/B
	 ECB dec |   7.13 ns/B    133.7 MiB/s    14.98 c/B
	 CBC enc |   5.27 ns/B    181.0 MiB/s    11.06 c/B
	 CBC dec |   6.39 ns/B    149.3 MiB/s    13.42 c/B
	 CFB enc |   5.27 ns/B    180.9 MiB/s    11.07 c/B
	 CFB dec |   5.28 ns/B    180.7 MiB/s    11.08 c/B
	 OFB enc |   6.11 ns/B    156.1 MiB/s    12.83 c/B
	 OFB dec |   6.13 ns/B    155.5 MiB/s    12.88 c/B
	 CTR enc |   5.26 ns/B    181.5 MiB/s    11.04 c/B
	 CTR dec |   5.24 ns/B    182.0 MiB/s    11.00 c/B
	Benchmark on Intel i5-2450M (2.5Ghz, no turbo, aes-ni disabled):
	Old (AMD64 asm):
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   8.06 ns/B    118.3 MiB/s    20.15 c/B
	 ECB dec |   8.21 ns/B    116.1 MiB/s    20.53 c/B
	 CBC enc |   7.88 ns/B    121.1 MiB/s    19.69 c/B
	 CBC dec |   7.57 ns/B    126.0 MiB/s    18.92 c/B
	 CFB enc |   7.87 ns/B    121.2 MiB/s    19.67 c/B
	 CFB dec |   7.56 ns/B    126.2 MiB/s    18.89 c/B
	 OFB enc |   8.27 ns/B    115.3 MiB/s    20.67 c/B
	 OFB dec |   8.28 ns/B    115.1 MiB/s    20.71 c/B
	 CTR enc |   8.02 ns/B    119.0 MiB/s    20.04 c/B
	 CTR dec |   8.02 ns/B    118.9 MiB/s    20.05 c/B
	New (ssse3):
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   4.03 ns/B    236.6 MiB/s    10.07 c/B
	 ECB dec |   5.28 ns/B    180.8 MiB/s    13.19 c/B
	 CBC enc |   3.77 ns/B    252.7 MiB/s     9.43 c/B
	 CBC dec |   4.69 ns/B    203.3 MiB/s    11.73 c/B
	 CFB enc |   3.75 ns/B    254.3 MiB/s     9.37 c/B
	 CFB dec |   3.69 ns/B    258.6 MiB/s     9.22 c/B
	 OFB enc |   4.17 ns/B    228.7 MiB/s    10.43 c/B
	 OFB dec |   4.17 ns/B    228.7 MiB/s    10.42 c/B
	 CTR enc |   3.72 ns/B    256.5 MiB/s     9.30 c/B
	 CTR dec |   3.72 ns/B    256.1 MiB/s     9.31 c/B
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
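The three columns in these benchmark tables are related by simple arithmetic, which the sketch below spells out. The function names are hypothetical and this is not the code of tests/bench-slope; it only assumes that a mebibyte is 1024 * 1024 bytes and that cycles/byte is the per-byte time multiplied by the clock rate.

```c
#include <assert.h>

/* Throughput in MiB/s for a given cost in nanoseconds per byte. */
double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte for a given cost in ns/byte and a CPU clock in MHz. */
double cycles_per_byte(double ns_per_byte, double cpu_mhz)
{
  assert(cpu_mhz > 0.0);
  return ns_per_byte * cpu_mhz / 1000.0;
}
```

For the old ECB encryption figure on the Core2 T8100 (8.79 ns/B at 2.1 GHz), `mib_per_sec(8.79)` is about 108.5 and `cycles_per_byte(8.79, 2100.0)` is about 18.46, matching the first table row.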
2014-12-25	scrypt: fix compiler warnings on ARM	Jussi Kivilinna	1	-1/+1
	* cipher/scrypt.c (_scryptBlockMix): Cast X to 'u32 *' through 'void *'.
	--
	Patch fixes 'cast increases required alignment' warnings seen on GCC:
	 scrypt.c: In function '_scryptBlockMix':
	 scrypt.c:145:22: warning: cast increases required alignment of target type [-Wcast-align]
	    _salsa20_core ((u32*)X, (u32*)X, 8);
	 scrypt.c:145:31: warning: cast increases required alignment of target type [-Wcast-align]
	    _salsa20_core ((u32*)X, (u32*)X, 8);
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-25	hash: fix compiler warning on ARM	Jussi Kivilinna	8	-11/+11
	* cipher/md.c (md_open, md_copy): Cast 'char *' to ctx through 'void *'. * cipher/md4.c (md4_final): Use buf_put_* helper instead of converting 'char *' to 'u32 *'. * cipher/md5.c (md5_final): Ditto. * cipher/rmd160.c (_gcry_rmd160_mixblock, rmd160_final): Ditto. * cipher/sha1.c (sha1_final): Ditto. * cipher/sha256.c (sha256_final): Ditto. * cipher/sha512.c (sha512_final): Ditto. * cipher/tiger.c (tiger_final): Ditto.
	--
	Patch fixes 'cast increases required alignment' warnings seen on GCC:
	 md.c: In function 'md_open':
	 md.c:318:23: warning: cast increases required alignment of target type [-Wcast-align]
	    hd->ctx = ctx = (struct gcry_md_context *) ((char *) hd + n);
	 md.c: In function 'md_copy':
	 md.c:491:22: warning: cast increases required alignment of target type [-Wcast-align]
	    bhd->ctx = b = (struct gcry_md_context *) ((char *) bhd + n);
	 md4.c: In function 'md4_final':
	 md4.c:258:20: warning: cast increases required alignment of target type [-Wcast-align]
	    #define X(a) do { *(u32*)p = le_bswap32((*hd).a) ; p += 4; } while(0)
	 md4.c:259:3: note: in expansion of macro 'X'
	    X(A);
	 md4.c:258:20: warning: cast increases required alignment of target type [-Wcast-align]
	    #define X(a) do { *(u32*)p = le_bswap32((*hd).a) ; p += 4; } while(0)
	 md4.c:260:3: note: in expansion of macro 'X'
	    X(B);
	 [removed the rest]
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
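The md.c part of this fix, placing a context right behind a handle in one allocation and reaching it through 'void *', can be sketched like this. All type and function names here are hypothetical stand-ins, not the definitions in cipher/md.c, and the sketch assumes a C11 compiler for `_Alignof`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a handle and its trailing context. */
struct demo_ctx { uint32_t flags; size_t bufpos; };
struct demo_handle { struct demo_ctx *ctx; int bufsize; };

struct demo_handle *demo_open(void)
{
  size_t align = _Alignof(struct demo_ctx);
  /* Round the handle size up so the context behind it is aligned. */
  size_t n = (sizeof(struct demo_handle) + align - 1) / align * align;
  struct demo_handle *hd = malloc(n + sizeof(struct demo_ctx));

  if (!hd)
    return NULL;

  /* A direct (struct demo_ctx *)((char *)hd + n) cast is what
     -Wcast-align complains about on strict-alignment targets.  Going
     through 'void *' states that we vouch for the alignment, which
     the rounding above guarantees. */
  hd->ctx = (void *)((char *)hd + n);
  assert((uintptr_t)hd->ctx % align == 0);

  hd->ctx->flags = 0;
  hd->ctx->bufpos = 0;
  hd->bufsize = 0;
  return hd;
}
```

The cast through 'void *' only silences the warning; it is the rounding of `n` that makes the access valid, so the two belong together.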
2014-12-25	rijndael: fix compiler warnings on ARM	Jussi Kivilinna	2	-72/+75
	* cipher/rijndael-internal.h (RIJNDAEL_context_s): Add u32 variants of keyschedule arrays to unions u1 and u2. (keyschedenc32, keyscheddec32): New. * cipher/rijndael.c (u32_a_t): Remove. (do_setkey): Add and use tkk[].data32, k_u32, tk_u32 and W_u32; Remove casting byte arrays to u32_a_t. (prepare_decryption, do_encrypt_fn, do_decrypt_fn): Use keyschedenc32 and keyscheddec32; Remove casting byte arrays to u32_a_t.
	--
	Patch fixes 'cast increases required alignment' compiler warnings that GCC was showing:
	 rijndael.c: In function 'do_setkey':
	 rijndael.c:310:13: warning: cast increases required alignment of target type [-Wcast-align]
	    *((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
	 rijndael.c:310:34: warning: cast increases required alignment of target type [-Wcast-align]
	    *((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
	 [removed the rest]
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-23	Poly1305-AEAD: updated implementation to match draft-irtf-cfrg-chacha20-poly1305-03	Jussi Kivilinna	3	-25/+56
	* cipher/cipher-internal.h (gcry_cipher_handle): Use separate byte counters for AAD and data in Poly1305. * cipher/cipher-poly1305.c (poly1305_fill_bytecount): Remove. (poly1305_fill_bytecounts, poly1305_do_padding): New. (poly1305_aad_finish): Fill padding to Poly1305 and do not fill AAD length. (_gcry_cipher_poly1305_authenticate, _gcry_cipher_poly1305_encrypt) (_gcry_cipher_poly1305_decrypt): Update AAD and data length separately. (_gcry_cipher_poly1305_tag): Fill padding and bytecounts to Poly1305. (_gcry_cipher_poly1305_setkey, _gcry_cipher_poly1305_setiv): Reset AAD and data byte counts; only allow 96-bit IV. * cipher/cipher.c (_gcry_cipher_open_internal): Limit Poly1305-AEAD to ChaCha20 cipher. * tests/basic.c (_check_poly1305_cipher): Update test-vectors. (check_ciphers): Limit Poly1305-AEAD checks to ChaCha20. * tests/bench-slope.c (cipher_bench_one): Ditto. -- The latest Internet-Draft version of "ChaCha20 and Poly1305 for IETF protocols" has added additional padding to Poly1305-AEAD and limited the supported IV size to 96 bits: https://www.ietf.org/rfcdiff?url1=draft-nir-cfrg-chacha20-poly1305-03&difftype=--html&submit=Go!&url2=draft-irtf-cfrg-chacha20-poly1305-03 This patch makes the Poly1305-AEAD implementation match these changes and limits Poly1305-AEAD to ChaCha20 only. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
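The padding and byte counts mentioned here follow the MAC input layout of the draft: AAD, zero padding to a 16-byte boundary, ciphertext, zero padding to a 16-byte boundary, then the AAD length and the ciphertext length as 64-bit little-endian integers. A minimal sketch of the two computations is given below; the function names are hypothetical and are not the internal poly1305_do_padding and poly1305_fill_bytecounts functions named in the ChangeLog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of zero bytes needed to bring LEN up to a multiple of 16. */
size_t pad16_len(uint64_t len)
{
  return (size_t)((16 - (len % 16)) % 16);
}

/* Build the final 16-byte MAC input block: AADLEN followed by
   DATALEN, each as a 64-bit little-endian integer. */
void fill_bytecounts(unsigned char out[16], uint64_t aadlen, uint64_t datalen)
{
  int i;

  assert(out != NULL);
  for (i = 0; i < 8; i++)
    {
      out[i] = (unsigned char)(aadlen >> (8 * i));
      out[8 + i] = (unsigned char)(datalen >> (8 * i));
    }
}
```

Keeping the two lengths in separate counters, as the commit does, is what makes the final block possible: both values are only known once all AAD and all data have been processed.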
2014-12-23	chacha20: allow setting counter for stream random access	Jussi Kivilinna	1	-5/+16
	* cipher/chacha20.c (CHACHA20_CTR_SIZE): New. (chacha20_ivsetup): Add setup for full counter. (chacha20_setiv): Allow ivlen == CHACHA20_CTR_SIZE. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
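Random access into a ChaCha20 stream works because each 64-byte keystream block depends only on the key, the nonce and the block counter. The sketch below shows how a byte offset could be turned into a starting counter and a number of keystream bytes to discard. The names are hypothetical, and how the counter is packed into the CHACHA20_CTR_SIZE-byte IV is not described in the commit message, so it is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CHACHA20_BLOCK_SIZE 64u  /* bytes of keystream per counter value */

struct demo_seek
{
  uint64_t block;   /* counter value of the block holding the offset */
  unsigned skip;    /* keystream bytes to discard inside that block */
};

/* Hypothetical helper: locate BYTE_OFFSET within the keystream. */
struct demo_seek demo_chacha20_seek(uint64_t byte_offset)
{
  struct demo_seek s;

  s.block = byte_offset / DEMO_CHACHA20_BLOCK_SIZE;
  s.skip = (unsigned)(byte_offset % DEMO_CHACHA20_BLOCK_SIZE);
  assert(s.skip < DEMO_CHACHA20_BLOCK_SIZE);
  return s;
}
```

A caller would set the counter to `block`, generate one block of keystream, drop the first `skip` bytes and continue from there.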
2014-12-23	gcm: do not pass extra key pointer for setupM/fillM	Jussi Kivilinna	2	-8/+9
	* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_setup_intel_pclmul): Remove 'h' parameter. * cipher/cipher-gcm.c (_gcry_ghash_setup_intel_pclmul): Ditto. (fillM): Get 'h' pointer from 'c'. (setupM): Remove 'h' parameter. (_gcry_cipher_gcm_setkey): Only pass 'c' to setupM. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-23	rijndael: use more compact look-up tables and add table prefetching	Jussi Kivilinna	5	-3426/+820
	* cipher/rijndael-internal.h (rijndael_prefetchfn_t): New. (RIJNDAEL_context): Add 'prefetch_enc_fn' and 'prefetch_dec_fn'. * cipher/rijndael-tables.h (S, T1, T2, T3, T4, T5, T6, T7, T8, S5, U1) (U2, U3, U4): Remove. (encT, dec_tables, decT, inv_sbox): Add. * cipher/rijndael.c (_gcry_aes_amd64_encrypt_block) (_gcry_aes_amd64_decrypt_block, _gcry_aes_arm_encrypt_block) (_gcry_aes_arm_decrypt_block): Add parameter for passing table pointer to assembly implementation. (prefetch_table, prefetch_enc, prefetch_dec): New. (do_setkey): Setup context prefetch functions depending on selected rijndael implementation; Use new tables for key setup. (prepare_decryption): Use new tables for decryption key setup. (do_encrypt_aligned): Rename to... (do_encrypt_fn): ... to this, change to use new compact tables, make it handle unaligned input and unroll rounds loop by two. (do_encrypt): Remove handling of unaligned input/output; pass table pointer to assembly implementations. (rijndael_encrypt, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec): Prefetch encryption tables before encryption. (do_decrypt_aligned): Rename to... (do_decrypt_fn): ... to this, change to use new compact tables, make it handle unaligned input and unroll rounds loop by two. (do_decrypt): Remove handling of unaligned input/output; pass table pointer to assembly implementations. (rijndael_decrypt, _gcry_aes_cbc_dec): Prefetch decryption tables before decryption. * cipher/rijndael-amd64.S: Use 1+1.25 KiB tables for encryption+decryption; remove tables from assembly file. * cipher/rijndael-arm.S: Ditto.
	--
	The patch replaces the 4+4.25 KiB look-up tables in the generic implementation, the 8+8 KiB tables in the AMD64 implementation and the 2+2 KiB tables in the ARM implementation with 1+1.25 KiB tables, and adds prefetching of the look-up tables. The AMD64 assembly is slower than before because of the additional rotation instructions. The generic C implementation is now better optimized and actually faster than before.
	Benchmark results on Intel i5-4570 (turbo off) (64-bit, AMD64 assembly):
	tests/bench-slope --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes
	Old:
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   3.10 ns/B    307.5 MiB/s     9.92 c/B
	 ECB dec |   3.15 ns/B    302.5 MiB/s    10.09 c/B
	 CBC enc |   3.46 ns/B    275.5 MiB/s    11.08 c/B
	 CBC dec |   3.19 ns/B    299.2 MiB/s    10.20 c/B
	 CFB enc |   3.48 ns/B    274.4 MiB/s    11.12 c/B
	 CFB dec |   3.23 ns/B    294.8 MiB/s    10.35 c/B
	 OFB enc |   3.29 ns/B    290.2 MiB/s    10.52 c/B
	 OFB dec |   3.31 ns/B    288.3 MiB/s    10.58 c/B
	 CTR enc |   3.64 ns/B    261.7 MiB/s    11.66 c/B
	 CTR dec |   3.65 ns/B    261.6 MiB/s    11.67 c/B
	New:
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   4.21 ns/B    226.7 MiB/s    13.46 c/B
	 ECB dec |   4.27 ns/B    223.2 MiB/s    13.67 c/B
	 CBC enc |   4.15 ns/B    229.8 MiB/s    13.28 c/B
	 CBC dec |   3.85 ns/B    247.8 MiB/s    12.31 c/B
	 CFB enc |   4.16 ns/B    229.1 MiB/s    13.32 c/B
	 CFB dec |   3.88 ns/B    245.9 MiB/s    12.41 c/B
	 OFB enc |   4.38 ns/B    217.8 MiB/s    14.01 c/B
	 OFB dec |   4.36 ns/B    218.6 MiB/s    13.96 c/B
	 CTR enc |   4.30 ns/B    221.6 MiB/s    13.77 c/B
	 CTR dec |   4.30 ns/B    221.7 MiB/s    13.76 c/B
	Benchmark on Intel i5-4570 (turbo off) (32-bit mingw, generic C):
	tests/bench-slope.exe --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes
	Old:
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   6.03 ns/B    158.2 MiB/s    19.29 c/B
	 ECB dec |   5.81 ns/B    164.1 MiB/s    18.60 c/B
	 CBC enc |   6.22 ns/B    153.4 MiB/s    19.90 c/B
	 CBC dec |   5.91 ns/B    161.3 MiB/s    18.92 c/B
	 CFB enc |   6.25 ns/B    152.7 MiB/s    19.99 c/B
	 CFB dec |   6.24 ns/B    152.8 MiB/s    19.97 c/B
	 OFB enc |   6.33 ns/B    150.6 MiB/s    20.27 c/B
	 OFB dec |   6.33 ns/B    150.7 MiB/s    20.25 c/B
	 CTR enc |   6.28 ns/B    152.0 MiB/s    20.08 c/B
	 CTR dec |   6.28 ns/B    151.7 MiB/s    20.11 c/B
	New:
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |   5.02 ns/B    190.0 MiB/s    16.06 c/B
	 ECB dec |   5.33 ns/B    178.8 MiB/s    17.07 c/B
	 CBC enc |   4.64 ns/B    205.4 MiB/s    14.86 c/B
	 CBC dec |   4.95 ns/B    192.7 MiB/s    15.84 c/B
	 CFB enc |   4.75 ns/B    200.7 MiB/s    15.20 c/B
	 CFB dec |   4.74 ns/B    201.1 MiB/s    15.18 c/B
	 OFB enc |   5.29 ns/B    180.3 MiB/s    16.93 c/B
	 OFB dec |   5.29 ns/B    180.3 MiB/s    16.93 c/B
	 CTR enc |   4.77 ns/B    200.0 MiB/s    15.26 c/B
	 CTR dec |   4.77 ns/B    199.8 MiB/s    15.27 c/B
	Benchmark on Cortex-A8 (ARM assembly):
	tests/bench-slope --cpu-mhz 1008 cipher aes
	Old:
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |  21.84 ns/B    43.66 MiB/s    22.02 c/B
	 ECB dec |  22.35 ns/B    42.67 MiB/s    22.53 c/B
	 CBC enc |  22.97 ns/B    41.53 MiB/s    23.15 c/B
	 CBC dec |  23.48 ns/B    40.61 MiB/s    23.67 c/B
	 CFB enc |  22.72 ns/B    41.97 MiB/s    22.90 c/B
	 CFB dec |  23.41 ns/B    40.74 MiB/s    23.59 c/B
	 OFB enc |  23.65 ns/B    40.32 MiB/s    23.84 c/B
	 OFB dec |  23.67 ns/B    40.29 MiB/s    23.86 c/B
	 CTR enc |  23.24 ns/B    41.03 MiB/s    23.43 c/B
	 CTR dec |  23.23 ns/B    41.05 MiB/s    23.42 c/B
	New:
	 AES     | nanosecs/byte  mebibytes/sec  cycles/byte
	 ECB enc |  26.03 ns/B    36.64 MiB/s    26.24 c/B
	 ECB dec |  26.97 ns/B    35.36 MiB/s    27.18 c/B
	 CBC enc |  23.21 ns/B    41.09 MiB/s    23.39 c/B
	 CBC dec |  23.36 ns/B    40.83 MiB/s    23.54 c/B
	 CFB enc |  23.02 ns/B    41.42 MiB/s    23.21 c/B
	 CFB dec |  23.67 ns/B    40.28 MiB/s    23.86 c/B
	 OFB enc |  27.86 ns/B    34.24 MiB/s    28.08 c/B
	 OFB dec |  27.87 ns/B    34.21 MiB/s    28.10 c/B
	 CTR enc |  23.47 ns/B    40.63 MiB/s    23.66 c/B
	 CTR dec |  23.49 ns/B    40.61 MiB/s    23.67 c/B
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
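The trade-off described in this commit, smaller tables paid for with extra rotation instructions, rests on the fact that the four classic AES encryption T-tables are byte rotations of one another. A single table of 256 four-byte entries (1 KiB) is therefore enough if each lookup is followed by a rotate. The sketch below shows this for one entry. It uses the common big-endian T-table convention, which is an assumption and may not match the byte order of libgcrypt's encT; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by N bits (0 < N < 32). */
uint32_t ror32(uint32_t x, unsigned n)
{
  assert(n > 0 && n < 32);
  return (x >> n) | (x << (32 - n));
}

/* Multiply by 2 in GF(2^8) with the AES reduction polynomial. */
uint8_t xtime(uint8_t b)
{
  return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* Entry of the first T-table for S-box output S: (2s, s, s, 3s). */
uint32_t te0_entry(uint8_t s)
{
  uint8_t s2 = xtime(s);
  uint8_t s3 = (uint8_t)(s2 ^ s);

  return ((uint32_t)s2 << 24) | ((uint32_t)s << 16)
         | ((uint32_t)s << 8) | (uint32_t)s3;
}

/* Entry of table K (0..3), derived from table 0 by rotation
   instead of being stored in a table of its own. */
uint32_t te_entry(uint8_t s, unsigned k)
{
  uint32_t t = te0_entry(s);

  assert(k < 4);
  return k ? ror32(t, 8 * k) : t;
}
```

With the S-box output 0x63 (the AES S-box value for input 0), table 0 holds 0xc66363a5 and the other three values are that word rotated right by 8, 16 and 24 bits.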
2014-12-12	rijndael: further optimizations for AES-NI accelerated CBC and CFB bulk modes	Jussi Kivilinna	1	-140/+104
	* cipher/rijndael-aesni.c (do_aesni_enc, do_aesni_dec): Pass input/output through SSE register XMM0. (do_aesni_cfb): Remove. (_gcry_aes_aesni_encrypt, _gcry_aes_aesni_decrypt): Add loading/storing input/output to/from XMM0. (_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc) (_gcry_aes_aesni_cfb_dec): Update to use renewed 'do_aesni_enc' and move IV loading/storing outside loop. (_gcry_aes_aesni_cbc_dec): Update to use renewed 'do_aesni_dec'. -- CBC encryption speed is improved ~16% on Intel Haswell and CFB encryption ~8%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-12	GCM: move Intel PCLMUL accelerated implementation to separate file	Jussi Kivilinna	4	-377/+430
	* cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'. * cipher/cipher-gcm-intel-pclmul.c: New. * cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New prototypes. [GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move to 'cipher-gcm-intel-pclmul.c'. (ghash): Rename to... (ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new function in 'cipher-gcm-intel-pclmul.c'. (setupM): Move GCM_USE_INTEL_PCLMUL part to new function in 'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based on available HW acceleration. (do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'. * cipher/internal.h (ghash_fn_t): New. (gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-06	rijndael: split Padlock part to separate file	Jussi Kivilinna	3	-79/+111
	* cipher/Makefile.am: Add 'rijndael-padlock.c'. * cipher/rijndael-padlock.c: New. * cipher/rijndael.c (do_padlock, do_padlock_encrypt) (do_padlock_decrypt): Move to 'rijndael-padlock.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-01	rijndael: refactor to reduce number of #ifdefs and branches	Jussi Kivilinna	5	-223/+172
	* cipher/rijndael-aesni.c (_gcry_aes_aesni_encrypt) (_gcry_aes_aesni_decrypt): Make return stack burn depth. * cipher/rijndael-amd64.S (_gcry_aes_amd64_encrypt_block) (_gcry_aes_amd64_decrypt_block): Ditto. * cipher/rijndael-arm.S (_gcry_aes_arm_encrypt_block) (_gcry_aes_arm_decrypt_block): Ditto. * cipher/rijndael-internal.h (RIJNDAEL_context_s) (rijndael_cryptfn_t): New. (RIJNDAEL_context): New members 'encrypt_fn' and 'decrypt_fn'. * cipher/rijndael.c (_gcry_aes_amd64_encrypt_block) (_gcry_aes_amd64_decrypt_block, _gcry_aes_aesni_encrypt) (_gcry_aes_aesni_decrypt, _gcry_aes_arm_encrypt_block) (_gcry_aes_arm_decrypt_block): Change prototypes. (do_padlock_encrypt, do_padlock_decrypt): New. (do_setkey): Separate key-length to rounds conversion from HW features check; Add selection for ctx->encrypt_fn and ctx->decrypt_fn. (do_encrypt_aligned, do_decrypt_aligned): Move inside '[!USE_AMD64_ASM && !USE_ARM_ASM]'; Move USE_AMD64_ASM and USE_ARM_ASM to... (do_encrypt, do_decrypt): ...here; Return stack depth; Remove second temporary buffer from non-aligned input/output case. (do_padlock): Move decrypt_flag to last argument; Return stack depth. (rijndael_encrypt): Remove #ifdefs, just call ctx->encrypt_fn. (_gcry_aes_cfb_enc, _gcry_aes_cbc_enc): Remove USE_PADLOCK; Call ctx->encrypt_fn in place of do_encrypt/do_encrypt_aligned. (_gcry_aes_ctr_enc): Call ctx->encrypt_fn in place of do_encrypt_aligned; Make tmp buffer 16-byte aligned and wipe buffer after use. (rijndael_decrypt): Remove #ifdefs, just call ctx->decrypt_fn. (_gcry_aes_cfb_dec): Remove USE_PADLOCK; Call ctx->decrypt_fn in place of do_decrypt/do_decrypt_aligned. (_gcry_aes_cbc_dec): Ditto; Make savebuf buffer 16-byte aligned. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
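The shape of this refactoring, choosing an implementation once at key setup and then calling it through a pointer stored in the context, can be sketched as follows. Everything here is a hypothetical stand-in: the two "implementations" only copy the block and return a made-up stack burn depth, and none of the names are those used in cipher/rijndael.c. Only the key-length to rounds mapping (10, 12 and 14 rounds for 16, 24 and 32 byte keys) is standard AES.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_context;

/* Block function type; the return value stands for a stack burn depth. */
typedef unsigned int (*demo_cryptfn_t)(const struct demo_context *ctx,
                                       unsigned char *dst,
                                       const unsigned char *src);

struct demo_context
{
  int rounds;
  demo_cryptfn_t encrypt_fn;
};

/* Placeholder "generic" implementation: copies the block unchanged. */
unsigned int demo_encrypt_generic(const struct demo_context *ctx,
                                  unsigned char *dst,
                                  const unsigned char *src)
{
  (void)ctx;
  memmove(dst, src, 16);
  return 64;  /* made-up amount of stack to wipe afterwards */
}

/* Placeholder "hardware" implementation: uses no stack worth wiping. */
unsigned int demo_encrypt_hw(const struct demo_context *ctx,
                             unsigned char *dst, const unsigned char *src)
{
  (void)ctx;
  memmove(dst, src, 16);
  return 0;
}

/* Key setup: derive the rounds first, then pick the implementation. */
int demo_setkey(struct demo_context *ctx, size_t keylen, int have_hw)
{
  assert(ctx != NULL);
  switch (keylen)
    {
    case 16: ctx->rounds = 10; break;
    case 24: ctx->rounds = 12; break;
    case 32: ctx->rounds = 14; break;
    default: return -1;
    }
  ctx->encrypt_fn = have_hw ? demo_encrypt_hw : demo_encrypt_generic;
  return 0;
}

/* Callers need no #ifdefs: they just go through the pointer. */
unsigned int demo_encrypt(const struct demo_context *ctx,
                          unsigned char *dst, const unsigned char *src)
{
  return ctx->encrypt_fn(ctx, dst, src);
}
```

Since every implementation reports its own burn depth, the caller can wipe the stack in one place regardless of which function was selected.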
2014-12-01	rijndael: move AES-NI blocks before Padlock	Jussi Kivilinna	1	-43/+45
	* cipher/rijndael.c (do_setkey, rijndael_encrypt, _gcry_aes_cfb_enc) (rijndael_decrypt, _gcry_aes_cfb_dec): Move USE_AESNI before USE_PADLOCK. (check_decryption_praparation) [USE_PADLOCK]: Move to... (prepare_decryption) [USE_PADLOCK]: ...here. -- Make order of AES-NI and Padlock #ifdefs consistent. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-01	rijndael: split AES-NI functions to separate file	Jussi Kivilinna	4	-1331/+1471
	* cipher/Makefile.in: Add 'rijndael-aesni.c'. * cipher/rijndael-aesni.c: New. * cipher/rijndael-internal.h: New. * cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16) (USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context) (keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'. (u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6) (aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4) (do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move to 'rijndael-aesni.c'. (prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc) (_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt) (_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions in 'rijndael-aesni.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'. -- Clean up rijndael.c before new hardware acceleration support gets added. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-19	ecc: Improve Montgomery curve implementation.	NIIBE Yutaka	2	-6/+86
	* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Support MPI_EC_MONTGOMERY. * cipher/ecc.c (test_ecdh_only_keys): New. (nist_generate_key): Call test_ecdh_only_keys for MPI_EC_MONTGOMERY. (check_secret_key): Handle Montgomery curve of x-coordinate only. * mpi/ec.c (_gcry_mpi_ec_mul_point): Resize points before the loop. Simplify, using pointers of Q1, Q2, PRD, and SUM. --
2014-11-02	Add ARM/NEON implementation of Poly1305	Jussi Kivilinna	4	-1/+747
	* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'. * cipher/poly1305-armv7-neon.S: New. * cipher/poly1305-internal.h (POLY1305_USE_NEON) (POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE) (POLY1305_NEON_ALIGNMENT): New. * cipher/poly1305.c [POLY1305_USE_NEON] (_gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New. (_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation if HWF_ARM_NEON set. * configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
	--
	Add Andrew Moon's public domain NEON implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt
	Benchmark on Cortex-A8 (--cpu-mhz 1008):
	Old:
	          | nanosecs/byte  mebibytes/sec  cycles/byte
	 POLY1305 |  12.34 ns/B    77.27 MiB/s    12.44 c/B
	New:
	          | nanosecs/byte  mebibytes/sec  cycles/byte
	 POLY1305 |   2.12 ns/B    450.7 MiB/s     2.13 c/B
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02	chacha20: add ARMv7/NEON implementation	Jussi Kivilinna	3	-0/+745
	* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'. * cipher/chacha20-armv7-neon.S: New. * cipher/chacha20.c (USE_NEON): New. [USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New. (chacha20_do_setkey) [USE_NEON]: Use Neon implementation if HWF_ARM_NEON flag set. (selftest): Self-test encrypting buffer byte by byte. * configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
	--
	Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt
	Benchmark on Cortex-A8 (--cpu-mhz 1008):
	Old:
	 CHACHA20   | nanosecs/byte  mebibytes/sec  cycles/byte
	 STREAM enc |  13.45 ns/B    70.92 MiB/s    13.56 c/B
	 STREAM dec |  13.45 ns/B    70.90 MiB/s    13.56 c/B
	New:
	 CHACHA20   | nanosecs/byte  mebibytes/sec  cycles/byte
	 STREAM enc |   6.20 ns/B    153.9 MiB/s     6.25 c/B
	 STREAM dec |   6.20 ns/B    153.9 MiB/s     6.25 c/B
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-08	Fix prime test for 2 and lower and add check command to mpicalc.	Werner Koch	1	-9/+10
	* cipher/primegen.c (check_prime): Return true for the small primes. (_gcry_prime_check): Return correct values for 2 and lower numbers. * src/mpicalc.c (do_primecheck): New. (main): Add command 'P'. (main): Allow for larger input data.
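The edge cases this commit addresses (2 is prime, anything below 2 is not) are easy to get wrong when a test starts by rejecting even numbers. The sketch below shows the intended answers with plain trial division on machine integers. This is a swapped-in technique for illustration only: libgcrypt works on MPIs and its own primality test is not reproduced here, and the function name is hypothetical.

```c
#include <assert.h>

/* Return 1 if N is prime, 0 otherwise (trial division, small N only). */
int demo_is_prime(unsigned long n)
{
  unsigned long d;

  if (n < 2)
    return 0;           /* 0 and 1 are not prime */
  if (n == 2)
    return 1;           /* the only even prime */
  if (n % 2 == 0)
    return 0;
  for (d = 3; d <= n / d; d += 2)
    if (n % d == 0)
      return 0;
  return 1;
}
```

The order of the first three checks matters: testing for evenness before handling 2 would wrongly reject it.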
2014-10-04	Add Whirlpool AMD64/SSE2 assembly implementation	Jussi Kivilinna	3	-37/+391
	* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'. * cipher/whirlpool-sse2-amd64.S: New. * cipher/whirlpool.c (USE_AMD64_ASM): New. (whirlpool_tables_s): New. (rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single structure and replace old tables with macros of same name. (tab): New structure containing above tables. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64) (whirlpool_transform): New. * configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'.
	--
	Benchmark results:
	On Intel Core i5-4570 (3.2 Ghz):
	 After:  WHIRLPOOL |  4.82 ns/B  197.8 MiB/s  15.43 c/B
	 Before: WHIRLPOOL |  9.10 ns/B  104.8 MiB/s  29.13 c/B
	On Intel Core i5-2450M (2.5 Ghz):
	 After:  WHIRLPOOL |  8.43 ns/B  113.1 MiB/s  21.09 c/B
	 Before: WHIRLPOOL | 13.45 ns/B  70.92 MiB/s  33.62 c/B
	On Intel Core2 T8100 (2.1 Ghz):
	 After:  WHIRLPOOL | 10.22 ns/B  93.30 MiB/s  21.47 c/B
	 Before: WHIRLPOOL | 19.87 ns/B  48.00 MiB/s  41.72 c/B
	Summary, old vs new ratio:
	 Intel Core i5-4570:  1.88x
	 Intel Core i5-2450M: 1.59x
	 Intel Core2 T8100:   1.94x
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-04	Improved ripemd160 performance	Andrei Scherer	1	-189/+178
	* cipher/rmd160.c (transform): Interleave the left and right lane rounds to introduce more instruction level parallelism.
	--
	The benchmarks on different systems:
	Intel(R) Atom(TM) CPU N570 @ 1.66GHz
	before:
	 Hash:     | nanosecs/byte  mebibytes/sec  cycles/byte
	 RIPEMD160 |  13.07 ns/B    72.97 MiB/s    - c/B
	after:
	 Hash:     | nanosecs/byte  mebibytes/sec  cycles/byte
	 RIPEMD160 |  11.37 ns/B    83.84 MiB/s    - c/B
	Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
	before:
	 Hash:     | nanosecs/byte  mebibytes/sec  cycles/byte
	 RIPEMD160 |   3.31 ns/B    288.0 MiB/s    - c/B
	after:
	 Hash:     | nanosecs/byte  mebibytes/sec  cycles/byte
	 RIPEMD160 |   2.08 ns/B    458.5 MiB/s    - c/B
	Signed-off-by: Andrei Scherer <andsch@inbox.com>
2014-09-30	mac: Fix gcry_mac_close to allow for a NULL handle.	Werner Koch	1	-1/+2
	* cipher/mac.c (_gcry_mac_close): Check for NULL. -- We always allow this for easier cleanup. Actually, the docs already state that this is allowed.
2014-08-21	cipher: Fix a segv in case of calling with wrong parameters.	Werner Koch	1	-1/+1
	* cipher/md.c (_gcry_md_info): Fix arg testing. -- GnuPG-bug-id: 1697
2014-08-21	cipher: Fix possible NULL deref in call to prime generator.	Werner Koch	3	-18/+41
	* cipher/primegen.c (_gcry_generate_elg_prime): Change to return an error code. * cipher/dsa.c (generate): Take care of new return code. * cipher/elgamal.c (generate): Change to return an error code. Take care of _gcry_generate_elg_prime return code. (generate_using_x): Take care of _gcry_generate_elg_prime return code. (elg_generate): Propagate return code from generate. -- GnuPG-bug-id: 1699, 1700 Reported-by: S.K. Gupta Note that the NULL deref may have only happened on malloc failure.
2014-08-08	ecc: Add cofactor to domain parameters.	NIIBE Yutaka	5	-72/+151
	* src/ec-context.h (mpi_ec_ctx_s): Add cofactor 'h'. * cipher/ecc-common.h (elliptic_curve_t): Add cofactor 'h'. (_gcry_ecc_update_curve_param): New API adding cofactor. * cipher/ecc-curves.c (ecc_domain_parms_t): Add cofactor 'h'. (ecc_domain_parms_t domain_parms): Add cofactors. (_gcry_ecc_fill_in_curve, _gcry_ecc_update_curve_param) (_gcry_ecc_get_curve, _gcry_mpi_ec_new, _gcry_ecc_get_param_sexp) (_gcry_ecc_get_mpi): Handle cofactor. * cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Likewise. * cipher/ecc-misc.c (_gcry_ecc_curve_free) (_gcry_ecc_curve_copy): Likewise. * cipher/ecc.c (nist_generate_key, ecc_generate) (ecc_check_secret_key, ecc_sign, ecc_verify, ecc_encrypt_raw) (ecc_decrypt_raw, _gcry_pk_ecc_get_sexp, _gcry_pubkey_spec_ecc): Likewise. (compute_keygrip): Handle cofactor, but skip it for its computation. * mpi/ec.c (ec_deinit): Likewise. * tests/t-mpi-point.c (context_param): Likewise. (test_curve): Add cofactors. * tests/curves.c (sample_key_1, sample_key_2): Add cofactors. * tests/keygrip.c (key_grips): Add cofactors. -- We keep compatibility of compute_keygrip in cipher/ecc.c.
2014-07-25	ecc: Support the non-standard 0x40 compression flag for EdDSA.	Werner Koch	4	-67/+99
	* cipher/ecc.c (ecc_generate): Check the "comp" flag for EdDSA. * cipher/ecc-eddsa.c (eddsa_encode_x_y): Add arg WITH_PREFIX. (_gcry_ecc_eddsa_encodepoint): Ditto. (_gcry_ecc_eddsa_ensure_compact): Handle the 0x40 compression prefix. (_gcry_ecc_eddsa_decodepoint): Ditto. * tests/keygrip.c: Check a compressed Ed25519 key with prefix. * tests/t-ed25519.inp: Ditto.
2014-07-25	cipher: Fix compiler warning for chacha20.	Werner Koch	1	-0/+3
	* cipher/chacha20.c (chacha20_blocks) [!USE_SSE2]: Do not build.
2014-06-29	Speed-up SHA-1 NEON assembly implementation	Jussi Kivilinna	1	-73/+82
	* cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
	--
	Benchmark on Cortex-A8 1008Mhz:
	New:
	      | nanosecs/byte  mebibytes/sec  cycles/byte
	 SHA1 |   7.04 ns/B    135.4 MiB/s    7.10 c/B
	Old:
	      | nanosecs/byte  mebibytes/sec  cycles/byte
	 SHA1 |   7.79 ns/B    122.4 MiB/s    7.85 c/B
	Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-06-28	gostr3411_94: rewrite to use u32 mathematic	Dmitry Eremin-Solenikov	3	-103/+139
	* cipher/gost28147.c (_gcry_gost_enc_data): New. * cipher/gostr3411-94.c: Rewrite implementation to use u32 mathematic internally. * cipher/gost28147.c (_gcry_gost_enc_one): Remove.
	--
	On my box (Core2 Duo, i386) this highly improves GOST R 34.11-94 speed.
	 Before: GOSTR3411_94 | 55.04 ns/B  17.33 MiB/s  - c/B
	 After:  GOSTR3411_94 | 36.70 ns/B  25.99 MiB/s  - c/B
	Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-06-28	gost28147: use bufhelp helpers	Dmitry Eremin-Solenikov	1	-36/+10
	* cipher/gost28147.c (gost_setkey, gost_encrypt_block, gost_decrypt_block): use buf_get_le32/buf_put_le32 helpers. -- On my box this boosts GOST 28147-89 speed from 36 MiB/s up to 44.5 MiB/s. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-06-28	Add GOST R 34.11-94 variant using id-GostR3411-94-CryptoProParamSet	Dmitry Eremin-Solenikov	4	-8/+31
	* src/gcrypt.h.in (GCRY_MD_GOSTR3411_CP): New. * src/cipher.h (_gcry_digest_spec_gost3411_cp): New. * cipher/gost28147.c (_gcry_gost_enc_one): Differentiate between CryptoPro and Test S-Boxes. * cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_cp, gost3411_cp_init): New. * cipher/md.c (md_open): GCRY_MD_GOSTR3411_CP also uses B=32. -- RFC4357 defines only two S-Boxes that should be used together with GOST R 34.11-94 - a testing one (from standard itself, for testing only) and CryptoPro one. Instead of adding a separate gcry_md_ctrl() function just to switch s-boxes, add a separate MD algorithm using CryptoPro S-box. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-06-28	gost28147: support GCRYCTL_SET_SBOX	Dmitry Eremin-Solenikov	1	-0/+39
	cipher/gost28147.c (gost_set_extra_info, gost_set_sbox): New. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-06-28	Support setting s-box for the ciphers that require it	Dmitry Eremin-Solenikov	1	-0/+7
	* src/gcrypt.h.in (GCRYCTL_SET_SBOX, gcry_cipher_set_sbox): New. * cipher/cipher.c (_gcry_cipher_ctl): pass GCRYCTL_SET_SBOX to set_extra_info callback. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-06-28	cipher/gost28147: generate optimized s-boxes from compact ones	Dmitry Eremin-Solenikov	4	-274/+270
	* cipher/gost-s-box.c: New. Outputs optimized expanded representation of s-boxes (4x256) from compact 16x8 representation. * cipher/Makefile.am: Add gost-sb.h dependency to gost28147.lo * cipher/gost.h: Add sbox to the GOST28147_context structure. * cipher/gost28147.c (gost_setkey): Set default s-box to test s-box from GOST R 34.11 (previously the only S-box). * cipher/gost28147.c (gost_val): Use sbox from the context. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
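The idea of expanding compact S-boxes into four 256-entry tables can be sketched as follows. In the GOST 28147-89 round function a 32-bit value passes through eight 4-bit S-boxes and is then rotated left by 11 bits. Pairing two neighbouring S-boxes gives one byte-wide table, and the shift and rotation can be folded into the table entries, so a round needs only four lookups and three ORs. The S-box used below is a hypothetical placeholder permutation, not a real GOST parameter set, and the layout of the generated gost-sb.h is not claimed to match this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder 4-bit S-box number K (0..7); NOT real GOST parameters.
   v*5+k mod 16 is a permutation of 0..15 because 5 is odd. */
uint8_t demo_sbox(unsigned k, unsigned v)
{
  assert(k < 8 && v < 16);
  return (uint8_t)((v * 5u + k) & 0xfu);
}

uint32_t rol32(uint32_t x, unsigned n)
{
  assert(n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

uint32_t demo_expanded[4][256];

/* Build the 4x256 tables: table J substitutes byte J of the input
   and already contains the shift into place and the rotation by 11. */
void demo_expand_sboxes(void)
{
  unsigned j, b;

  for (j = 0; j < 4; j++)
    for (b = 0; b < 256; b++)
      {
        uint32_t lo = demo_sbox(2 * j, b & 0xfu);
        uint32_t hi = demo_sbox(2 * j + 1, b >> 4);

        demo_expanded[j][b] = rol32(((hi << 4) | lo) << (8 * j), 11);
      }
}

/* Substitution plus rotation with four table lookups. */
uint32_t demo_subst_tables(uint32_t x)
{
  return demo_expanded[0][x & 0xff] | demo_expanded[1][(x >> 8) & 0xff]
         | demo_expanded[2][(x >> 16) & 0xff] | demo_expanded[3][x >> 24];
}

/* Reference: eight nibble substitutions, then one rotation. */
uint32_t demo_subst_direct(uint32_t x)
{
  uint32_t y = 0;
  unsigned k;

  for (k = 0; k < 8; k++)
    y |= (uint32_t)demo_sbox(k, (x >> (4 * k)) & 0xfu) << (4 * k);
  return rol32(y, 11);
}
```

The table version is correct because the substituted bytes occupy disjoint bit positions, so rotating each one separately and ORing the results equals rotating the combined word once.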
2014-06-28	gost28147: add OIDs used to define cipher mode	Dmitry Eremin-Solenikov	1	-1/+11
	* cipher/gost28147 (oids_gost28147): Add OID from RFC4357. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-06-28	GOST R 34.11-94 add OIDs	Dmitry Eremin-Solenikov	1	-1/+14
	* cipher/gostr3411-94.c: Add OIDs for GOST R 34.11-94 from RFC 4357. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-05-21	sha512: fix ARM/NEON implementation	Jussi Kivilinna	1	-1/+1
	* cipher/sha512-armv7-neon.S (_gcry_sha512_transform_armv7_neon): Byte-swap RW67q and RW1011q correctly in multi-block loop. * tests/basic.c (check_digests): Add large test vector for SHA512. -- Patch fixes bug introduced to multi-block processing by commit df629ba53a6, "Improve performance of SHA-512/ARM/NEON implementation". Patch also adds multi-block test vector for SHA-512. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-20	Fix ARM assembly when building __PIC__	Jussi Kivilinna	4	-10/+64
	* cipher/camellia-arm.S (GET_DATA_POINTER): New. (_gcry_camellia_arm_encrypt_block): Use GET_DATA_POINTER. (_gcry_camellia_arm_decrypt_block): Ditto. * cipher/cast5-arm.S (GET_DATA_POINTER): New. (_gcry_cast5_arm_encrypt_block, _gcry_cast5_arm_decrypt_block) (_gcry_cast5_arm_enc_blk2, _gcry_cast5_arm_dec_blk2): Use GET_DATA_POINTER. * cipher/rijndael-arm.S (GET_DATA_POINTER): New. (_gcry_aes_arm_encrypt_block, _gcry_aes_arm_decrypt_block): Use GET_DATA_POINTER. * cipher/sha1-armv7-neon.S (GET_DATA_POINTER): New. (.LK_VEC): Move from .text to .data section. (_gcry_sha1_transform_armv7_neon): Use GET_DATA_POINTER. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-16	chacha20: add SSE2/AMD64 optimized implementation	Jussi Kivilinna	3	-1/+671
	* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'. * cipher/chacha20-sse2-amd64.S: New. * cipher/chacha20.c (USE_SSE2): New. [USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New. (chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks function. * configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Intel i5-4570 (haswell), with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3": Old: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 1.97 ns/B 483.8 MiB/s 6.31 c/B STREAM dec \| 1.97 ns/B 483.6 MiB/s 6.31 c/B New: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.931 ns/B 1024.7 MiB/s 2.98 c/B STREAM dec \| 0.930 ns/B 1025.0 MiB/s 2.98 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-16	poly1305: add AMD64/AVX2 optimized implementation	Jussi Kivilinna	4	-4/+1001
	* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'. * cipher/poly1305-avx2-amd64.S: New. * cipher/poly1305-internal.h (POLY1305_USE_AVX2) (POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE) (POLY1305_AVX2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed. * cipher/poly1305.c [POLY1305_USE_AVX2] (_gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if AVX2 supported by CPU. * configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'. -- Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.448 ns/B 2129.5 MiB/s 1.43 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.205 ns/B 4643.5 MiB/s 0.657 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	poly1305: add AMD64/SSE2 optimized implementation	Jussi Kivilinna	4	-3/+1084
	* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'. * cipher/poly1305-internal.h (POLY1305_USE_SSE2) (POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE) (POLY1305_SSE2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed. * cipher/poly1305-sse2-amd64.S: New. * cipher/poly1305.c [POLY1305_USE_SSE2] (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_SSE2]: Use SSE2 version. * configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.844 ns/B 1130.2 MiB/s 2.70 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.448 ns/B 2129.5 MiB/s 1.43 c/B Benchmarks on Intel i5-2450M (sandy-bridge): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 1.25 ns/B 763.0 MiB/s 3.12 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.605 ns/B 1575.9 MiB/s 1.51 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	Add Poly1305 based cipher AEAD mode	Jussi Kivilinna	4	-5/+382
	* cipher/Makefile.am: Add 'cipher-poly1305.c'. * cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'. (_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt) (_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate) (_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New. * cipher/cipher-poly1305.c: New. * cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey) (cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv) (_gcry_cipher_authenticate, _gcry_cipher_gettag) (_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'. (cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ... (_gcry_cipher_setiv): ... here, as with other modes. * src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'. * tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New. (check_ciphers): Add Poly1305 check. (check_cipher_modes): Call 'check_poly1305_cipher'. * tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to bench_aead_... and take nonce as argument. (bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto. (bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench) (bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench) (bench_poly1305_decrypt_do_bench) (bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops) (poly1305_decrypt_ops, poly1305_authenticate_ops): New. (cipher_modes): Add Poly1305. (cipher_bench_one): Add special handling for Poly1305. -- Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant of this mode is proposed for use in TLS and ipsec: https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04 http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	Add Poly1305-AES (-Camellia, etc) MACs	Jussi Kivilinna	3	-14/+180
	* cipher/mac-internal.h (_gcry_mac_type_spec_poly1305_aes) (_gcry_mac_type_spec_poly1305_camellia) (_gcry_mac_type_spec_poly1305_twofish) (_gcry_mac_type_spec_poly1305_serpent) (_gcry_mac_type_spec_poly1305_seed): New. * cipher/mac-poly1305.c (poly1305mac_context_s): Add 'hd' and 'nonce_set'. (poly1305mac_open, poly1305mac_close, poly1305mac_setkey): Add handling for Poly1305-*** MACs. (poly1305mac_prepare_key, poly1305mac_setiv): New. (poly1305mac_reset, poly1305mac_write, poly1305mac_read): Add handling for 'nonce_set'. (poly1305mac_ops): Add 'poly1305mac_setiv'. (_gcry_mac_type_spec_poly1305_aes) (_gcry_mac_type_spec_poly1305_camellia) (_gcry_mac_type_spec_poly1305_twofish) (_gcry_mac_type_spec_poly1305_serpent) (_gcry_mac_type_spec_poly1305_seed): New. * cipher/mac.c (mac_list): Add Poly1305-AES, Poly1305-Twofish, Poly1305-Serpent, Poly1305-SEED and Poly1305-Camellia. * src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305_AES', 'GCRY_MAC_POLY1305_CAMELLIA', 'GCRY_MAC_POLY1305_TWOFISH', 'GCRY_MAC_POLY1305_SERPENT' and 'GCRY_MAC_POLY1305_SEED'. * tests/basic.c (check_mac): Add Poly1305-AES test vectors. * tests/bench-slope.c (bench_mac_init): Set IV for Poly1305-*** MACs. * tests/bench-slope.c (mac_bench): Set IV for Poly1305-*** MACs. -- Patch adds Bernstein's Poly1305-AES message authentication code to libgcrypt and other variants of Poly1305-<128-bit block cipher>. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	Add Poly1305 MAC	Jussi Kivilinna	6	-2/+1091
	* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and 'poly1305-internal.h'. * cipher/mac-internal.h (poly1305mac_context_s): New. (gcry_mac_handle): Add 'u.poly1305mac'. (_gcry_mac_type_spec_poly1305mac): New. * cipher/mac-poly1305.c: New. * cipher/mac.c (mac_list): Add Poly1305. * cipher/poly1305-internal.h: New. * cipher/poly1305.c: New. * src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'. * tests/basic.c (check_mac): Add Poly1305 test vectors; allow overriding lengths of data and key buffers. * tests/bench-slope.c (mac_bench): Increase max algo number from 500 to 600. * tests/benchmark.c (mac_bench): Ditto. -- Patch adds Bernstein's Poly1305 message authentication code to libgcrypt. Implementation is based on Andrew Moon's public domain implementation from: https://github.com/floodyberry/poly1305-opt The algorithm added by this patch is the plain Poly1305 without AES and takes a 32-byte key that must not be reused. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	chacha20/AVX2: clear upper halves of YMM registers on entry	Jussi Kivilinna	1	-0/+1
	* cipher/chacha20-avx2-amd64.S (_gcry_chacha20_amd64_avx2_blocks): Add 'vzeroupper' at beginning. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	chacha20/AVX2: check for ENABLE_AVX2_SUPPORT instead of HAVE_GCC_INLINE_ASM_AVX2	Jussi Kivilinna	2	-2/+2
	* cipher/chacha20.c (USE_AVX2): Enable depending on ENABLE_AVX2_SUPPORT, not HAVE_GCC_INLINE_ASM_AVX2. * cipher/chacha20-avx2-amd64.S: Ditto. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	chacha20/SSSE3: clear XMM registers after use	Jussi Kivilinna	1	-0/+16
	* cipher/chacha20-ssse3-amd64.S (_gcry_chacha20_amd64_ssse3_blocks): On return, clear XMM registers. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11	chacha20: add AVX2/AMD64 assembly implementation	Jussi Kivilinna	3	-2/+969
	* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'. * cipher/chacha20-avx2-amd64.S: New. * cipher/chacha20.c (USE_AVX2): New macro. [USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New. (chacha20_do_setkey): Select AVX2 implementation if there is HW support. (selftest): Increase size of buf by 256. * configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'. -- Add AVX2 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. SSSE3 (Intel Haswell): CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec \| 0.741 ns/B 1286.5 MiB/s 2.37 c/B AVX2: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.393 ns/B 2428.0 MiB/s 1.26 c/B STREAM dec \| 0.392 ns/B 2433.6 MiB/s 1.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11	chacha20: add SSSE3 assembly implementation	Jussi Kivilinna	3	-1/+633
	* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'. * cipher/chacha20-ssse3-amd64.S: New. * cipher/chacha20.c (USE_SSSE3): New macro. [USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New. (chacha20_do_setkey): Select SSSE3 implementation if there is HW support. * configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'. -- Add SSSE3 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. Before (Intel Haswell): CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 1.97 ns/B 483.6 MiB/s 6.31 c/B STREAM dec \| 1.97 ns/B 484.0 MiB/s 6.31 c/B After: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec \| 0.741 ns/B 1286.5 MiB/s 2.37 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>