peter/libgcrypt - libgcrypt source repository for Peter

Age	Commit message	Author	Files	Lines
2016-02-08	Add ARM assembly implementation of SHA-512	Jussi Kivilinna	1	-0/+4
	* cipher/Makefile.am: Add 'sha512-arm.S'. * cipher/sha512-arm.S: New. * cipher/sha512.c (USE_ARM_ASM): New. (_gcry_sha512_transform_arm): New. (transform) [USE_ARM_ASM]: Use ARM assembly implementation instead of generic. * configure.ac: Add 'sha512-arm.lo'. -- Benchmark on Cortex-A8 (armv6, 1008 Mhz): Before: \| nanosecs/byte mebibytes/sec cycles/byte SHA512 \| 112.0 ns/B 8.52 MiB/s 112.9 c/B After (3.3x faster): \| nanosecs/byte mebibytes/sec cycles/byte SHA512 \| 34.01 ns/B 28.04 MiB/s 34.28 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
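The three benchmark columns in the message above are related by simple arithmetic. A minimal sketch in C, assuming the 1008 MHz clock quoted in the message; the helper names are hypothetical and not part of libgcrypt:

```c
#include <assert.h>

/* Hypothetical helpers for reading the benchmark tables; not libgcrypt API. */

/* cycles/byte = nanoseconds/byte * clock frequency in GHz. */
static double cycles_per_byte(double ns_per_byte, double cpu_mhz)
{
  return ns_per_byte * cpu_mhz / 1000.0;
}

/* MiB/s = bytes processed per second divided by 2^20. */
static double mib_per_sec(double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Speed-up factor between two ns/byte measurements. */
static double speedup(double before_ns, double after_ns)
{
  return before_ns / after_ns;
}

/* Re-derive the SHA512 figures quoted above, within rounding. */
static void check_sha512_figures(void)
{
  assert(cycles_per_byte(112.0, 1008.0) > 112.8
         && cycles_per_byte(112.0, 1008.0) < 113.0);
  assert(mib_per_sec(34.01) > 28.0 && mib_per_sec(34.01) < 28.1);
  assert(speedup(112.0, 34.01) > 3.2 && speedup(112.0, 34.01) < 3.4);
}
```

With these helpers, 112.0 ns/B against 34.01 ns/B gives roughly the 3.3x factor stated in the message.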
2015-11-01	Add ARMv7/NEON implementation of Keccak	Jussi Kivilinna	1	-1/+1
	* cipher/Makefile.am: Add 'keccak-armv7-neon.S'. * cipher/keccak-armv7-neon.S: New. * cipher/keccak.c (USE_64BIT_ARM_NEON): New. (NEED_COMMON64): Select if USE_64BIT_ARM_NEON. [NEED_COMMON64] (round_consts_64bit): Rename to... [NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add terminator at end. [USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon) (_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon) (keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New. (keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation if supported by HW. * cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update to use new round constant table. * configure.ac: Add 'keccak-armv7-neon.lo'. -- Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch is based on public-domain implementation by Ronny Van Keer from SUPERCOP package: https://github.com/floodyberry/supercop/blob/master/crypto_hash/\ keccakc1024/inplace-armv7a-neon/keccak2.s Benchmark results on Cortex-A8 @ 1008 Mhz: Before (generic 32-bit bit-interleaved impl.): \| nanosecs/byte mebibytes/sec cycles/byte SHAKE128 \| 83.00 ns/B 11.49 MiB/s 83.67 c/B SHAKE256 \| 101.7 ns/B 9.38 MiB/s 102.5 c/B SHA3-224 \| 96.13 ns/B 9.92 MiB/s 96.90 c/B SHA3-256 \| 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 \| 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 \| 189.1 ns/B 5.04 MiB/s 190.6 c/B After (ARM/NEON, ~3.2x faster): \| nanosecs/byte mebibytes/sec cycles/byte SHAKE128 \| 25.09 ns/B 38.01 MiB/s 25.29 c/B SHAKE256 \| 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-224 \| 29.24 ns/B 32.61 MiB/s 29.48 c/B SHA3-256 \| 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-384 \| 40.42 ns/B 23.59 MiB/s 40.74 c/B SHA3-512 \| 58.37 ns/B 16.34 MiB/s 58.84 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-25	Add configure option --enable-build-timestamp.	Werner Koch	1	-1/+10
	* configure.ac (BUILD_TIMESTAMP): Set to "<none>" by default. -- This is based on libgpg-error commit d620005fd1a655d591fccb44639e22ea445e4554 but changed to be disabled by default. Check there for some background. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-08-08	Add framework to eventually support SHA3.	Werner Koch	1	-1/+19
	* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256) (GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New. (GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256) (GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New. * cipher/keccak.c: New with stub functions. * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c. * configure.ac (available_digests): Add sha3. (USE_SHA3): New. * src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests. * cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos. (md_open): Ditto for hmac processing. * cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping. * cipher/hmac-tests.c (run_selftests): Prepare for tests. * cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx". -- Note that the algo GCRY_MD_SHA3_xxx are preliminary. We should try to sync them with OpenPGP. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-05-01	Enable AES/AES-NI, AES/SSSE3 and GCM/PCLMUL implementations on WIN64	Jussi Kivilinna	1	-3/+105
	* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul) ( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector registers before use and restore after. * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency on !defined(__WIN64__). * cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare, aesni_prepare_2_6, aesni_cleanup) ( aesni_cleanup_2_6): New. [!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New. (_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc) (_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec) (_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use 'aesni_prepare_2_6'. * cipher/rijndael-internal.h (USE_SSSE3): Enable if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS. (USE_AESNI): Remove dependency on !defined(__WIN64__). * cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New. [!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New. (vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use 'vpaes_ssse3_prepare'. (_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use 'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to exclude '.type' and '.size' markers from assembly code, as they are not supported on WIN64/COFF objects. * configure.ac (gcry_cv_gcc_attribute_ms_abi) (gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi) (gcry_cv_gcc_default_abi_is_sysv_abi) (gcry_cv_gcc_win64_platform_as_ok): New checks. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-05-01	Fix rndhw for 64-bit Windows build	Jussi Kivilinna	1	-0/+1
	* configure.ac: Add sizeof check for 'void *'. * random/rndhw.c (poll_padlock): Check for SIZEOF_VOID_P == 8 instead of defined(__LP64__). (RDRAND_LONG): Check for SIZEOF_UNSIGNED_LONG == 8 instead of defined(__LP64__). -- __LP64__ is not predefined for 64-bit mingw64-gcc, which caused wrong assembly code selections. Do selection based on type sizes instead, to support x86_64, x32 and win64 properly. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
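The reasoning in the message above is that pointer width and 'unsigned long' width vary independently across 64-bit targets, so a single __LP64__ test cannot stand in for both. A minimal sketch in C, using sizeof at run time in place of the configure-generated SIZEOF_* macros; the enum and function names are hypothetical:

```c
#include <assert.h>

/* Hypothetical classification; not part of libgcrypt. */
enum data_model { MODEL_ILP32, MODEL_LLP64, MODEL_LP64, MODEL_OTHER };

static enum data_model detect_data_model(void)
{
  /* LP64: 64-bit pointers and 64-bit long (typical x86_64 Unix). */
  if (sizeof(void *) == 8 && sizeof(unsigned long) == 8)
    return MODEL_LP64;
  /* LLP64: 64-bit pointers but 32-bit long (win64). */
  if (sizeof(void *) == 8 && sizeof(unsigned long) == 4)
    return MODEL_LLP64;
  /* ILP32: 32-bit pointers and long (32-bit x86, and also x32). */
  if (sizeof(void *) == 4 && sizeof(unsigned long) == 4)
    return MODEL_ILP32;
  return MODEL_OTHER;
}

/* The result must agree with the two independent size tests. */
static void check_data_model(void)
{
  int lp64 = (sizeof(void *) == 8 && sizeof(unsigned long) == 8);
  assert((detect_data_model() == MODEL_LP64) == lp64);
}
```

Code that depends on pointer width would test the pointer size, and code that depends on the width of 'unsigned long' would test that size, which is the split the commit describes for poll_padlock and RDRAND_LONG.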
2015-05-01	Fix packed attribute check for Windows targets	Jussi Kivilinna	1	-1/+3
	* configure.ac (gcry_cv_gcc_attribute_packed): Move 'long b' to its own packed structure. -- Change packed attribute test so that it works with both MS ABI and SYSV ABI. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-03-21	bufhelp: use one-byte aligned type for unaligned memory accesses	Jussi Kivilinna	1	-0/+18
	* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only when HAVE_GCC_ATTRIBUTE_PACKED and HAVE_GCC_ATTRIBUTE_ALIGNED are defined. (bufhelp_int_t): New type. (buf_cpy, buf_xor, buf_xor_1, buf_xor_2dst, buf_xor_n_copy_2): Use 'bufhelp_int_t'. [BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_u32_t, bufhelp_u64_t): New. [BUFHELP_FAST_UNALIGNED_ACCESS] (buf_get_be32, buf_get_le32) (buf_put_be32, buf_put_le32, buf_get_be64, buf_get_le64) (buf_put_be64, buf_put_le64): Use 'bufhelp_uXX_t'. * configure.ac (gcry_cv_gcc_attribute_packed): New. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
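The technique named in the subject, a one-byte aligned type for unaligned accesses, can be sketched as follows. This assumes a GCC-compatible compiler; the type and function names are hypothetical, and the may_alias attribute is an addition made here for aliasing safety rather than something the message states:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper type: a 32-bit word that the compiler must treat
   as having alignment 1, so it emits code that is safe for any address.
   may_alias is an assumption of this sketch, not taken from the commit. */
typedef struct unaligned_u32_s
{
  uint32_t a;
} __attribute__((packed, aligned(1), may_alias)) unaligned_u32_t;

/* Load a host-endian 32-bit value from a possibly unaligned address. */
static uint32_t load_u32_unaligned(const void *p)
{
  return ((const unaligned_u32_t *)p)->a;
}

/* Store a host-endian 32-bit value to a possibly unaligned address. */
static void store_u32_unaligned(void *p, uint32_t v)
{
  ((unaligned_u32_t *)p)->a = v;
}

/* Cross-check against memcpy at an odd offset. */
static void check_unaligned_access(void)
{
  unsigned char buf[8] = { 0 };
  uint32_t via_memcpy;

  store_u32_unaligned(buf + 1, 0x11223344u);
  memcpy(&via_memcpy, buf + 1, sizeof via_memcpy);
  assert(via_memcpy == 0x11223344u);
  assert(load_u32_unaligned(buf + 1) == 0x11223344u);
  assert(buf[0] == 0 && buf[5] == 0);
}
```

On targets with cheap unaligned loads the compiler can still turn these accessors into a single instruction, while on strict-alignment targets it falls back to byte-wise access.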
2015-01-15	Add functions to count trailing zero bits in a word.	Werner Koch	1	-0/+15
	* cipher/bithelp.h (_gcry_ctz, _gcry_ctz64): New. * configure.ac (HAVE_BUILTIN_CTZ): Add new test. -- Note that these functions return the number of bits in the word when passing 0. Signed-off-by: Werner Koch <wk@gnupg.org>
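The zero-input behaviour noted above can be illustrated with a small sketch. It assumes fixed-width arguments and uses the GCC builtins when available; the names ctz32 and ctz64 are hypothetical and are not the functions added by the commit:

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits; returns 32 for x == 0, because the
   compiler builtin is undefined for a zero argument. */
static unsigned int ctz32(uint32_t x)
{
  if (x == 0)
    return 32;
#if defined(__GNUC__)
  return (unsigned int)__builtin_ctz(x);
#else
  {
    unsigned int n = 0;
    while (!(x & 1))
      {
        x >>= 1;
        n++;
      }
    return n;
  }
#endif
}

/* 64-bit variant; returns 64 for x == 0. */
static unsigned int ctz64(uint64_t x)
{
  if (x == 0)
    return 64;
#if defined(__GNUC__)
  return (unsigned int)__builtin_ctzll(x);
#else
  {
    unsigned int n = 0;
    while (!(x & 1))
      {
        x >>= 1;
        n++;
      }
    return n;
  }
#endif
}

static void check_ctz(void)
{
  assert(ctz32(0) == 32);
  assert(ctz32(8) == 3);
  assert(ctz64(0) == 64);
}
```

Handling zero explicitly keeps the result well defined on every path, which matters for callers that use the count as a shift amount or table index.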
2015-01-05	random: Silent warning under NetBSD using rndunix	Werner Koch	1	-4/+3
	* random/rndunix.c (STDERR_FILENO): Define if needed. (start_gatherer): Re-open standard descriptors. Fix an unsigned/signed pointer warning. -- GnuPG-bug-id: 1702
2015-01-05	build: Require automake 1.14.	Werner Koch	1	-2/+2
	* configure.ac (AM_INIT_AUTOMAKE): Add serial-tests. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-12-27	Add Intel SSSE3 based vector permutation AES implementation	Jussi Kivilinna	1	-0/+3
	* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'. * cipher/rijndael-internal.h (USE_SSSE3): New. (RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'. * cipher/rijndael-ssse3-amd64.c: New. * cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey) (_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt) (_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc) (_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc) (_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New. (do_setkey): Add HWF check for SSSE3 and setup for SSSE3 implementation. (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add selection for SSSE3 implementation. * configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'. -- This patch adds "AES with vector permutations" implementation by Mike Hamburg. Public-domain source-code is available at: http://crypto.stanford.edu/vpaes/ Benchmark on Intel Core2 T8100 (2.1Ghz, no turbo): Old (AMD64 asm): AES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 8.79 ns/B 108.5 MiB/s 18.46 c/B ECB dec \| 9.07 ns/B 105.1 MiB/s 19.05 c/B CBC enc \| 7.77 ns/B 122.7 MiB/s 16.33 c/B CBC dec \| 7.74 ns/B 123.2 MiB/s 16.26 c/B CFB enc \| 7.88 ns/B 121.0 MiB/s 16.54 c/B CFB dec \| 7.56 ns/B 126.1 MiB/s 15.88 c/B OFB enc \| 9.02 ns/B 105.8 MiB/s 18.94 c/B OFB dec \| 9.07 ns/B 105.1 MiB/s 19.05 c/B CTR enc \| 7.80 ns/B 122.2 MiB/s 16.38 c/B CTR dec \| 7.81 ns/B 122.2 MiB/s 16.39 c/B New (ssse3): AES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 5.77 ns/B 165.2 MiB/s 12.13 c/B ECB dec \| 7.13 ns/B 133.7 MiB/s 14.98 c/B CBC enc \| 5.27 ns/B 181.0 MiB/s 11.06 c/B CBC dec \| 6.39 ns/B 149.3 MiB/s 13.42 c/B CFB enc \| 5.27 ns/B 180.9 MiB/s 11.07 c/B CFB dec \| 5.28 ns/B 180.7 MiB/s 11.08 c/B OFB enc \| 6.11 ns/B 156.1 MiB/s 12.83 c/B OFB dec \| 6.13 ns/B 155.5 MiB/s 12.88 c/B CTR enc \| 5.26 ns/B 181.5 MiB/s 11.04 c/B CTR dec \| 5.24 ns/B 182.0 MiB/s 11.00 c/B Benchmark on Intel i5-2450M (2.5Ghz, no 
turbo, aes-ni disabled): Old (AMD64 asm): AES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 8.06 ns/B 118.3 MiB/s 20.15 c/B ECB dec \| 8.21 ns/B 116.1 MiB/s 20.53 c/B CBC enc \| 7.88 ns/B 121.1 MiB/s 19.69 c/B CBC dec \| 7.57 ns/B 126.0 MiB/s 18.92 c/B CFB enc \| 7.87 ns/B 121.2 MiB/s 19.67 c/B CFB dec \| 7.56 ns/B 126.2 MiB/s 18.89 c/B OFB enc \| 8.27 ns/B 115.3 MiB/s 20.67 c/B OFB dec \| 8.28 ns/B 115.1 MiB/s 20.71 c/B CTR enc \| 8.02 ns/B 119.0 MiB/s 20.04 c/B CTR dec \| 8.02 ns/B 118.9 MiB/s 20.05 c/B New (ssse3): AES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 4.03 ns/B 236.6 MiB/s 10.07 c/B ECB dec \| 5.28 ns/B 180.8 MiB/s 13.19 c/B CBC enc \| 3.77 ns/B 252.7 MiB/s 9.43 c/B CBC dec \| 4.69 ns/B 203.3 MiB/s 11.73 c/B CFB enc \| 3.75 ns/B 254.3 MiB/s 9.37 c/B CFB dec \| 3.69 ns/B 258.6 MiB/s 9.22 c/B OFB enc \| 4.17 ns/B 228.7 MiB/s 10.43 c/B OFB dec \| 4.17 ns/B 228.7 MiB/s 10.42 c/B CTR enc \| 3.72 ns/B 256.5 MiB/s 9.30 c/B CTR dec \| 3.72 ns/B 256.1 MiB/s 9.31 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-15	build: Add configure option --disable-doc.	Werner Koch	1	-1/+11
	* Makefile.am (AUTOMAKE_OPTIONS): Remove. (doc) [!BUILD_DOC]: Do not recurse into the dir. * configure.ac (AM_INIT_AUTOMAKE): Add option formerly in Makefile.am. (BUILD_DOC): Add new am_conditional.
2014-12-06	rijndael: split Padlock part to separate file	Jussi Kivilinna	1	-0/+3
	* cipher/Makefile.am: Add 'rijndael-padlock.c'. * cipher/rijndael-padlock.c: New. * cipher/rijndael.c (do_padlock, do_padlock_encrypt) (do_padlock_decrypt): Move to 'rijndael-padlock.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-01	rijndael: split AES-NI functions to separate file	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.in: Add 'rijndael-aesni.c'. * cipher/rijndael-aesni.c: New. * cipher/rijndael-internal.h: New. * cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16) (USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context) (keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'. (u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6) (aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4) (do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move to 'rijndael-aesni.c'. (prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc) (_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt) (_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions in 'rijndael-aesni.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'. -- Clean-up rijndael.c before new hardware acceleration support gets added. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02	Add ARM/NEON implementation of Poly1305	Jussi Kivilinna	1	-0/+5
	* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'. * cipher/poly1305-armv7-neon.S: New. * cipher/poly1305-internal.h (POLY1305_USE_NEON) (POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE) (POLY1305_NEON_ALIGNMENT): New. * cipher/poly1305.c [POLY1305_USE_NEON] (_gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New. (_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation if HWF_ARM_NEON set. * configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'. -- Add Andrew Moon's public domain NEON implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 12.34 ns/B 77.27 MiB/s 12.44 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 2.12 ns/B 450.7 MiB/s 2.13 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02	chacha20: add ARMv7/NEON implementation	Jussi Kivilinna	1	-0/+5
	* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'. * cipher/chacha20-armv7-neon.S: New. * cipher/chacha20.c (USE_NEON): New. [USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New. (chacha20_do_setkey) [USE_NEON]: Use Neon implementation if HWF_ARM_NEON flag set. (selftest): Self-test encrypting buffer byte by byte. * configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'. -- Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 13.45 ns/B 70.92 MiB/s 13.56 c/B STREAM dec \| 13.45 ns/B 70.90 MiB/s 13.56 c/B New: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 6.20 ns/B 153.9 MiB/s 6.25 c/B STREAM dec \| 6.20 ns/B 153.9 MiB/s 6.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-04	Add Whirlpool AMD64/SSE2 assembly implementation	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'. * cipher/whirlpool-sse2-amd64.S: New. * cipher/whirlpool.c (USE_AMD64_ASM): New. (whirlpool_tables_s): New. (rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single structure and replace old tables with macros of same name. (tab): New structure containing above tables. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64) (whirlpool_transform): New. * configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'. -- Benchmark results: On Intel Core i5-4570 (3.2 Ghz): After: WHIRLPOOL \| 4.82 ns/B 197.8 MiB/s 15.43 c/B Before: WHIRLPOOL \| 9.10 ns/B 104.8 MiB/s 29.13 c/B On Intel Core i5-2450M (2.5 Ghz): After: WHIRLPOOL \| 8.43 ns/B 113.1 MiB/s 21.09 c/B Before: WHIRLPOOL \| 13.45 ns/B 70.92 MiB/s 33.62 c/B On Intel Core2 T8100 (2.1 Ghz): After: WHIRLPOOL \| 10.22 ns/B 93.30 MiB/s 21.47 c/B Before: WHIRLPOOL \| 19.87 ns/B 48.00 MiB/s 41.72 c/B Summary, old vs new ratio: Intel Core i5-4570: 1.88x Intel Core i5-2450M: 1.59x Intel Core2 T8100: 1.94x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-02	build: Document SYSROOT.	Werner Koch	1	-0/+2
	* configure.ac: Mark SYSROOT as arg var.
2014-10-02	build: Support SYSROOT based config script finding.	Werner Koch	1	-2/+10
	* src/libgcrypt.m4: Add support for SYSROOT and set gpg_config_script_warn. Use AC_PATH_PROG instead of AC_PATH_TOOL because the config script is not expected to be installed with a prefix for its name. * configure.ac: Print a library mismatch warning. * m4/gpg-error.m4: Update from git master. -- Also fixed the false copyright notice in libgcrypt.m4.
2014-05-16	chacha20: add SSE2/AMD64 optimized implementation	Jussi Kivilinna	1	-0/+1
	* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'. * cipher/chacha20-sse2-amd64.S: New. * cipher/chacha20.c (USE_SSE2): New. [USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New. (chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks function. * configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Intel i5-4570 (haswell), with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3": Old: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 1.97 ns/B 483.8 MiB/s 6.31 c/B STREAM dec \| 1.97 ns/B 483.6 MiB/s 6.31 c/B New: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.931 ns/B 1024.7 MiB/s 2.98 c/B STREAM dec \| 0.930 ns/B 1025.0 MiB/s 2.98 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-16	poly1305: add AMD64/AVX2 optimized implementation	Jussi Kivilinna	1	-0/+1
	* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'. * cipher/poly1305-avx2-amd64.S: New. * cipher/poly1305-internal.h (POLY1305_USE_AVX2) (POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE) (POLY1305_AVX2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed. * cipher/poly1305.c [POLY1305_USE_AVX2] (_gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if AVX2 supported by CPU. * configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'. -- Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.448 ns/B 2129.5 MiB/s 1.43 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.205 ns/B 4643.5 MiB/s 0.657 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12	poly1305: add AMD64/SSE2 optimized implementation	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'. * cipher/poly1305-internal.h (POLY1305_USE_SSE2) (POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE) (POLY1305_SSE2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed. * cipher/poly1305-sse2-amd64.S: New. * cipher/poly1305.c [POLY1305_USE_SSE2] (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_SSE2]: Use SSE2 version. * configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.844 ns/B 1130.2 MiB/s 2.70 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.448 ns/B 2129.5 MiB/s 1.43 c/B Benchmarks on Intel i5-2450M (sandy-bridge): Old: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 1.25 ns/B 763.0 MiB/s 3.12 c/B New: \| nanosecs/byte mebibytes/sec cycles/byte POLY1305 \| 0.605 ns/B 1575.9 MiB/s 1.51 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11	chacha20: add AVX2/AMD64 assembly implementation	Jussi Kivilinna	1	-0/+1
	* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'. * cipher/chacha20-avx2-amd64.S: New. * cipher/chacha20.c (USE_AVX2): New macro. [USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New. (chacha20_do_setkey): Select AVX2 implementation if there is HW support. (selftest): Increase size of buf by 256. * configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'. -- Add AVX2 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. SSSE3 (Intel Haswell): CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec \| 0.741 ns/B 1286.5 MiB/s 2.37 c/B AVX2: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.393 ns/B 2428.0 MiB/s 1.26 c/B STREAM dec \| 0.392 ns/B 2433.6 MiB/s 1.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11	chacha20: add SSSE3 assembly implementation	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'. * cipher/chacha20-ssse3-amd64.S: New. * cipher/chacha20.c (USE_SSSE3): New macro. [USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New. (chacha20_do_setkey): Select SSSE3 implementation if there is HW support. * configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'. -- Add SSSE3 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. Before (Intel Haswell): CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 1.97 ns/B 483.6 MiB/s 6.31 c/B STREAM dec \| 1.97 ns/B 484.0 MiB/s 6.31 c/B After: CHACHA20 \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec \| 0.741 ns/B 1286.5 MiB/s 2.37 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11	Add ChaCha20 stream cipher	Jussi Kivilinna	1	-1/+7
	* cipher/Makefile.am: Add 'chacha20.c'. * cipher/chacha20.c: New. * cipher/cipher.c (cipher_list): Add ChaCha20. * configure.ac: Add ChaCha20. * doc/gcrypt.texi: Add ChaCha20. * src/cipher.h (_gcry_cipher_spec_chacha20): New. * src/gcrypt.h.in (GCRY_CIPHER_CHACHA20): Add new algo. * tests/basic.c (MAX_DATA_LEN): Increase to 128 from 100. (check_stream_cipher): Add ChaCha20 test-vectors. (check_ciphers): Add ChaCha20. -- Patch adds Bernstein's ChaCha20 cipher to libgcrypt. Implementation is based on public domain implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
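For orientation, the core operation of ChaCha20 is the quarter round, which mixes four 32-bit state words using only additions, XORs and rotations. The sketch below follows the public ChaCha20 specification and is checked against the quarter-round test vector from RFC 7539; it is an illustration, not the code from cipher/chacha20.c, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
  return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter round on four state words, updated in place. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32(*d, 16);
  *c += *d; *b ^= *c; *b = rotl32(*b, 12);
  *a += *b; *d ^= *a; *d = rotl32(*d, 8);
  *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Quarter-round test vector from RFC 7539, section 2.1.1. */
static void check_quarter_round(void)
{
  uint32_t a = 0x11111111u, b = 0x01020304u;
  uint32_t c = 0x9b8d6f43u, d = 0x01234567u;

  quarter_round(&a, &b, &c, &d);
  assert(a == 0xea2a92f4u);
  assert(b == 0xcb1cf8ceu);
  assert(c == 0x4581472eu);
  assert(d == 0x5881c4bbu);
}
```

A full block function applies this quarter round to the columns and diagonals of a 4x4 word state for 20 rounds, which is why the operation lends itself to the SIMD implementations listed elsewhere in this log.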
2014-05-07	Bump LT version.	Werner Koch	1	-2/+3
	* configure.ac: Bump LT version to C21/A1/R0. -- This is to avoid conflicts with the 1.6 series. Note that if we add a new interface to 1.6 we would need to bump age again.
2014-03-30	3des: add amd64 assembly implementation for 3DES	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.am: Add 'des-amd64.S'. * cipher/cipher-selftests.c (_gcry_selftest_helper_cbc) (_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Handle failures from 'setkey' function. * cipher/cipher.c (_gcry_cipher_open_internal) [USE_DES]: Setup bulk functions for 3DES. * cipher/des-amd64.S: New file. * cipher/des.c (USE_AMD64_ASM, ATTR_ALIGNED_16): New macros. [USE_AMD64_ASM] (_gcry_3des_amd64_crypt_block) (_gcry_3des_amd64_ctr_enc, _gcry_3des_amd64_cbc_dec) (_gcry_3des_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (tripledes_ecb_crypt): New function. (TRIPLEDES_ECB_BURN_STACK): New macro. (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec) (bulk_selftest_setkey, selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Add call to CTR, CBC and CFB selftest functions. (do_tripledes_encrypt, do_tripledes_decrypt): Use TRIPLEDES_ECB_BURN_STACK. * configure.ac [host=x86-64]: Add 'des-amd64.lo'. * src/cipher.h (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec) (_gcry_3des_cfb_dec): New prototypes. -- Add non-parallel functions for small speed-up and 3-way parallel functions for modes of operation that support parallel processing. 
Old vs new (Intel Core i5-4570): ================================ enc dec ECB 1.17x 1.17x CBC 1.17x 2.51x CFB 1.16x 2.49x OFB 1.17x 1.17x CTR 2.56x 2.56x Old vs new (Intel Core i5-2450M): ================================= enc dec ECB 1.28x 1.28x CBC 1.27x 2.33x CFB 1.27x 2.34x OFB 1.27x 1.27x CTR 2.36x 2.35x New (Intel Core i5-4570): ========================= 3DES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 28.39 ns/B 33.60 MiB/s 90.84 c/B ECB dec \| 28.27 ns/B 33.74 MiB/s 90.45 c/B CBC enc \| 29.50 ns/B 32.33 MiB/s 94.40 c/B CBC dec \| 13.35 ns/B 71.45 MiB/s 42.71 c/B CFB enc \| 29.59 ns/B 32.23 MiB/s 94.68 c/B CFB dec \| 13.41 ns/B 71.12 MiB/s 42.91 c/B OFB enc \| 28.90 ns/B 33.00 MiB/s 92.47 c/B OFB dec \| 28.90 ns/B 33.00 MiB/s 92.48 c/B CTR enc \| 13.39 ns/B 71.20 MiB/s 42.86 c/B CTR dec \| 13.39 ns/B 71.21 MiB/s 42.86 c/B Old (Intel Core i5-4570): ========================= 3DES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 33.24 ns/B 28.69 MiB/s 106.4 c/B ECB dec \| 33.26 ns/B 28.67 MiB/s 106.4 c/B CBC enc \| 34.45 ns/B 27.69 MiB/s 110.2 c/B CBC dec \| 33.45 ns/B 28.51 MiB/s 107.1 c/B CFB enc \| 34.43 ns/B 27.70 MiB/s 110.2 c/B CFB dec \| 33.41 ns/B 28.55 MiB/s 106.9 c/B OFB enc \| 33.79 ns/B 28.22 MiB/s 108.1 c/B OFB dec \| 33.79 ns/B 28.22 MiB/s 108.1 c/B CTR enc \| 34.27 ns/B 27.83 MiB/s 109.7 c/B CTR dec \| 34.27 ns/B 27.83 MiB/s 109.7 c/B New (Intel Core i5-2450M): ========================== 3DES \| nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 42.21 ns/B 22.59 MiB/s 105.5 c/B ECB dec \| 42.23 ns/B 22.58 MiB/s 105.6 c/B CBC enc \| 43.70 ns/B 21.82 MiB/s 109.2 c/B CBC dec \| 23.25 ns/B 41.02 MiB/s 58.12 c/B CFB enc \| 43.71 ns/B 21.82 MiB/s 109.3 c/B CFB dec \| 23.23 ns/B 41.05 MiB/s 58.08 c/B OFB enc \| 42.73 ns/B 22.32 MiB/s 106.8 c/B OFB dec \| 42.73 ns/B 22.32 MiB/s 106.8 c/B CTR enc \| 23.31 ns/B 40.92 MiB/s 58.27 c/B CTR dec \| 23.35 ns/B 40.84 MiB/s 58.38 c/B Old (Intel Core i5-2450M): ========================== 3DES \| 
nanosecs/byte mebibytes/sec cycles/byte ECB enc \| 53.98 ns/B 17.67 MiB/s 134.9 c/B ECB dec \| 54.00 ns/B 17.66 MiB/s 135.0 c/B CBC enc \| 55.43 ns/B 17.20 MiB/s 138.6 c/B CBC dec \| 54.27 ns/B 17.57 MiB/s 135.7 c/B CFB enc \| 55.42 ns/B 17.21 MiB/s 138.6 c/B CFB dec \| 54.35 ns/B 17.55 MiB/s 135.9 c/B OFB enc \| 54.49 ns/B 17.50 MiB/s 136.2 c/B OFB dec \| 54.49 ns/B 17.50 MiB/s 136.2 c/B CTR enc \| 55.02 ns/B 17.33 MiB/s 137.5 c/B CTR dec \| 55.01 ns/B 17.34 MiB/s 137.5 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-03-11	Add MD2 message digest implementation	Dmitry Eremin-Solenikov	1	-1/+9
	* cipher/md2.c: New. * cipher/md.c (digest_list): add _gcry_digest_spec_md2. * tests/basic.c (check_digests): add MD2 test vectors. * configure.ac (default_digests): disable md2 by default. -- Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Some minor indentation fixes by wk.
2014-02-04	Fix ARMv6 detection when CFLAGS modify target CPU architecture	Jussi Kivilinna	1	-4/+10
	* configure.ac (gcry_cv_cc_arm_arch_is_v6): Use compiler test instead of preprocessor test. -- The old test used the C preprocessor to check ARM version macros and missed the fact that different CFLAGS affect those macros (CFLAGS are not passed to preprocessor checks). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-01-27	Small Windows build tweaks.	Werner Koch	1	-3/+5
	* configure.ac (HAVE_PTHREAD): Do test when building for Windows. * tests/basic.c: Replace "%zi" by "%z" and a cast to make it work under Windows. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-01-24	tests: Add a test for the internal locking	Werner Koch	1	-2/+2
	* src/global.c (external_lock_test): New. (_gcry_vcontrol): Call new function with formerly reserved code 61. * tests/t-common.h: New. Taken from current libgpg-error. * tests/t-lock.c: New. Based on t-lock.c from libgpg-error. * configure.ac (HAVE_PTHREAD): Set macro to 1 if defined. (AC_CHECK_FUNCS): Check for flockfile. * tests/Makefile.am (tests_bin): Add t-lock. (noinst_HEADERS): Add t-common.h (LDADD): Move value to ... (default_ldadd): new. (t_lock_LDADD): New. -- Signed-off-by: Werner Koch <wk@gnupg.org> (cherry picked from commit fa42c61a84996b6a7574c32233dfd8d9f254d93a) Resolved conflicts: * src/ath.c: Remove as not anymore used in 1.7. * tests/Makefile.am: Merge. Changes: * src/global.c (external_lock_test): Use the gpgrt function for locking. Changed subject because here we are only adding the test case.
2014-01-24	Check compiler features only for the relevant platform.	Werner Koch	1	-102/+169
	* mpi/config.links (mpi_cpu_arch): Always set for ARM. Set for HPPA. Set to "undefined" for unknown platforms. (try_asm_modules): Act upon only after having detected the CPU. * configure.ac: Move the call to config.links before the platform specific compiler checks. Check platform specific features only if the platform is targeted. -- There is no need to check x86 options if we are targeting ARM and vice versa. This may only introduce build problems. With this patch the summary output at the end of the compiler also shows more reasonable messages. Signed-off-by: Werner Koch <wk@gnupg.org> (cherry picked from commit 04d478d9b0f92d80105ddaf2c011f40ae8260cfb)
2014-01-17	Actually check for uint64_t.	Werner Koch	1	-0/+9
	* configure.ac: Check size of uint64_t and the UINT64_C macro. -- configure.ac used $ac_cv_sizeof_uint64_t but never set this variable. Due to the availability of long long on all platforms supporting uint64_t this was not a real problem. Found while removing the corresponding test from gnupg. Signed-off-by: Werner Koch <wk@gnupg.org>
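A C-level analogue of the two properties being checked above, the width of uint64_t and a working UINT64_C macro, might look like the sketch below. It is an illustration under the assumption of a C99 <stdint.h>, not the probe that configure.ac actually runs, and the function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Return 1 if uint64_t is exactly 64 bits wide and UINT64_C yields a
   constant of that type that wraps around as expected. */
static int have_usable_uint64(void)
{
  uint64_t all_ones = UINT64_C(0xffffffffffffffff);

  return sizeof(uint64_t) * CHAR_BIT == 64
         && (uint64_t)(all_ones + 1) == 0
         && (all_ones >> 63) == 1;
}

static void check_uint64(void)
{
  assert(have_usable_uint64());
}
```

A build system would normally perform such a check at configure time and record the result in a macro, so that the C sources never have to guess.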
2014-01-16	Replace ath based mutexes by gpgrt based locks.	Werner Koch	1	-6/+1
	* configure.ac (NEED_GPG_ERROR_VERSION): Require 1.13. (gl_LOCK): Remove. * src/ath.c, src/ath.h: Remove. Remove from all files. Replace all mutexes by gpgrt based statically initialized locks. * src/global.c (global_init): Remove ath_init. (_gcry_vcontrol): Make ath install a dummy function. (print_config): Remove threads info line. * doc/gcrypt.texi: Simplify the multi-thread related documentation. -- The current code does only work on ELF systems with weak symbol support. In particular no locks were used under Windows. With the new gpgrt_lock functions from the soon to be released libgpg-error 1.13 we have a better portable scheme which also allows for static initialized mutexes. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-01-12	Fix assembly division check	Jussi Kivilinna	1	-1/+1
	* configure.ac (gcry_cv_gcc_as_const_division_ok): Correct variable name mismatch at '-Wa,--divide' workaround check. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-01-12	Fix constant division for AMD64 assembly on Solaris/x86	Jussi Kivilinna	1	-1/+37
	* configure.ac (gcry_cv_gcc_as_const_division_ok): Add new check for constant division in assembly and test for "-Wa,--divide" workaround. (gcry_cv_gcc_amd64_platform_as_ok): Check for also constant division. -- Appearantly on Solaris/x86 '/' character is treated as begining of line comment by GNU as. This causes problems when compiling SHA-1 SSSE3 implementation: On 02.01.2014 16:26, Richard PALO wrote: >> COLLECT_GCC_OPTIONS='-D' 'HAVE_CONFIG_H' '-I' '.' '-I' '..' '-I' '../src' '-I' '/var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include' '-I' '/var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include/gettext' '-D' '_REENTRANT' '-O2' '-MT' 'sha1-ssse3-amd64.lo' '-MD' '-MP' '-MF' '.deps/sha1-ssse3-amd64.Tpo' '-c' '-fPIC' '-D' 'PIC' '-o' '.libs/sha1-ssse3-amd64.o' '-v' '-mtune=generic' '-march=x86-64' >> /usr/gnu/bin/as -v -I . -I .. -I ../src -I /var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include -I /var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include/gettext -V -Qy -s --64 -o .libs/sha1-ssse3-amd64.o /var/tmp//ccAxWPXX.s >> GNU assembler version 2.23.1 (i386-pc-solaris2.11) using BFD version (GNU Binutils) 2.23.1 >> /var/tmp//ccAxWPXX.s: Assembler messages: >> /var/tmp//ccAxWPXX.s:34: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:38: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:42: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:46: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:54: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:58: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:62: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:66: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:70: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:74: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:78: Error: unbalanced parenthesis in operand 1. 
>> /var/tmp//ccAxWPXX.s:82: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:86: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:90: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:94: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:98: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:102: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:106: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:110: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:114: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:119: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:123: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:127: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:132: Error: unbalanced parenthesis in operand 1. > > > apparently the paddd code, such as > `paddd (.LK_XMM + ((i)/20)*16) RIP, tmp0;` > isn't digested well, appended is the generated assembler code. On 02.01.2014 17:41, Richard PALO wrote: > Hi again, after finding the following: > https://sourceware.org/bugzilla/show_bug.cgi?id=4572 > > I tried using '-Wa,--divide' and that seemed to workaround the problem... > > perhaps the code, or at least the Makefile could be adapted accordingly? Patch adds detection of this feature and attempts to work around the issue by adding "-Wa,--divide" to CPPFLAGS. If the workaround does not work (old GAS on Solaris/x86), we'll disable AMD64 assembly. [v3]: - Update CPPFLAGS after testing instead of CFLAGS. Reported-and-tested-by: Richard PALO <richard.palo@free.fr> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
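The operand quoted in the report, `(.LK_XMM + ((i)/20)*16)`, depends on the assembler evaluating a constant division. If '/' starts a comment, the rest of the operand is dropped, which would explain the "unbalanced parenthesis" errors. The sketch below mirrors that arithmetic in C. It assumes 80 rounds with one 16-byte table entry per group of 20 rounds, as the expression suggests; the helper name `k_xmm_offset` is hypothetical and does not appear in libgcrypt.

```c
/* Hypothetical helper mirroring the assembler expression ((i)/20)*16:
   byte offset of the table entry used in round i, assuming 16-byte
   entries and one entry per group of 20 rounds.  */
static unsigned int
k_xmm_offset (unsigned int i)
{
  return (i / 20) * 16;   /* integer division, as in the assembler */
}
```

With these assumptions, rounds 0..19 map to offset 0, rounds 20..39 to 16, and so on up to 48 for rounds 60..79.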
2014-01-10	Move all helper scripts to build-aux/	Werner Koch	1	-0/+2
	* scripts/: Rename to build-aux/. * compile, config.guess, config.rpath, config.sub * depcomp, doc/mdate-sh, doc/texinfo.tex * install-sh, ltmain.sh, missing: Move to build-aux/. * Makefile.am (EXTRA_DIST): Adjust. * configure.ac (AC_CONFIG_AUX_DIR): New. (AM_SILENT_RULES): New. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-12-30	Add AMD64 assembly implementation for arcfour	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.am: Add 'arcfour-amd64.S'. * cipher/arcfour-amd64.S: New. * cipher/arcfour.c (USE_AMD64_ASM): New. [USE_AMD64_ASM] (ARCFOUR_context, _gcry_arcfour_amd64) (encrypt_stream): New. * configure.ac [host=x86_64]: Add 'arcfour-amd64.lo'. -- Patch adds Marc Bevand's public-domain AMD64 assembly implementation of RC4 to libgcrypt. Original implementation is at: http://www.zorinaq.com/papers/rc4-amd64.html Benchmarks on Intel i5-4570 (3200 Mhz): New: ARCFOUR \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 1.29 ns/B 737.7 MiB/s 4.14 c/B STREAM dec \| 1.31 ns/B 730.6 MiB/s 4.18 c/B Old (C-language): ARCFOUR \| nanosecs/byte mebibytes/sec cycles/byte STREAM enc \| 2.09 ns/B 457.4 MiB/s 6.67 c/B STREAM dec \| 2.09 ns/B 457.2 MiB/s 6.68 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-30	Fix buggy/incomplete detection of AVX/AVX2 support	Jussi Kivilinna	1	-2/+2
	* configure.ac: Also check for 'xgetbv' instruction in AVX and AVX2 inline assembly checks. * src/hwf-x86.c [__i386__] (get_xgetbv): New function. [__x86_64__] (get_xgetbv): New function. [HAS_X86_CPUID] (detect_x86_gnuc): Check for OSXSAVE and OS support for XMM&YMM registers and enable AVX/AVX2 only if XMM&YMM registers are supported by OS. -- This patch is based on original patch and bug report by Panagiotis Christopoulos: Adding better detection of AVX/AVX2 support After upgrading libgcrypt from 1.5.3 to 1.6.0 on a remote XEN system (linode) my gpg2 stopped working properly, throwing SIGILL signals when doing sha512 operations etc. I managed to debug this with the help of Doublas Freed (dwfreed at mtu.edu) and it seems that the current AVX detection just checks for bit 28 on cpuid but the check still works on systems that have disabled the avx/avx2 instructions for some reason (eg. performance/unstability) resulting in SIGILLs (eg. when trying _gcry_sha512_transform_amd64_avx() ). From Intel resources[1][2], I found additional checks for better AVX detection and applied them in the following patch. Please review/change accordingly and commit some better AVX detection mechanism. The AVX part is tested but could not test the AVX2 one, because I lack proper hardware. I can provide additional information upon request. Use the patch only as a guideline, as it's not thoroughly tested. [1] http://software.intel.com/en-us/blogs/2011/04/14/is-avx-enabled [2] http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf (sections 14.3 and 14.7.1) Reported-by: Panagiotis Christopoulos (pchrist) <pchrist@gentoo.org> Cc: Doublas Freed <dwfreed@mtu.edu> Cc: Tim Harder <radhermit@gentoo.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
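The detection logic described above can be sketched as a pure decision function. This is an illustration rather than the code in src/hwf-x86.c: the function and macro names are hypothetical, and the caller is assumed to have obtained ECX from CPUID leaf 1 and XCR0 from `xgetbv`. The bit positions follow the Intel documentation: CPUID.1:ECX bit 27 is OSXSAVE and bit 28 is AVX; XCR0 bit 1 covers XMM state and bit 2 covers YMM state.

```c
#include <stdint.h>

/* Bit positions as documented by Intel; macro names are illustrative. */
#define SKETCH_CPUID1_ECX_OSXSAVE (1u << 27)
#define SKETCH_CPUID1_ECX_AVX     (1u << 28)
#define SKETCH_XCR0_XMM_STATE     (1u << 1)
#define SKETCH_XCR0_YMM_STATE     (1u << 2)

/* Return 1 only if the CPU reports AVX, the OS has enabled XSAVE,
   and the OS saves both XMM and YMM registers.  Checking the AVX
   bit alone is what led to the SIGILL described in the report.  */
static int
sketch_avx_usable (uint32_t cpuid1_ecx, uint64_t xcr0)
{
  const uint64_t need = SKETCH_XCR0_XMM_STATE | SKETCH_XCR0_YMM_STATE;

  if (!(cpuid1_ecx & SKETCH_CPUID1_ECX_AVX))
    return 0;
  if (!(cpuid1_ecx & SKETCH_CPUID1_ECX_OSXSAVE))
    return 0;   /* xgetbv must not be executed in this case */
  return (xcr0 & need) == need;
}
```

Keeping the decision separate from the `cpuid`/`xgetbv` instructions makes the logic testable on any host, at the cost of not showing the inline assembly itself.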
2013-12-18	Add ARM/NEON implementation for SHA-1	Jussi Kivilinna	1	-0/+4
	* cipher/Makefile.am: Add 'sha1-armv7-neon.S'. * cipher/sha1-armv7-neon.S: New. * cipher/sha1.c (USE_NEON): New. (SHA1_CONTEXT, sha1_init) [USE_NEON]: Add and initialize 'use_neon'. [USE_NEON] (_gcry_sha1_transform_armv7_neon): New. (transform) [USE_NEON]: Use ARM/NEON assembly if enabled. * configure.ac: Add 'sha1-armv7-neon.lo'. -- Patch adds ARM/NEON implementation for SHA-1. Benchmarks show 1.72x improvement on ARM Cortex-A8, 1008 Mhz: jussi@cubie:~/libgcrypt$ tests/bench-slope --cpu-mhz 1008 hash sha1 Hash: \| nanosecs/byte mebibytes/sec cycles/byte SHA1 \| 7.80 ns/B 122.3 MiB/s 7.86 c/B = jussi@cubie:~/libgcrypt$ tests/bench-slope --disable-hwf arm-neon --cpu-mhz 1008 hash sha1 Hash: \| nanosecs/byte mebibytes/sec cycles/byte SHA1 \| 13.41 ns/B 71.10 MiB/s 13.52 c/B = Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
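The benchmark columns above are related by simple arithmetic, sketched here in C. The function names are hypothetical; the only assumption is that cycles/byte is derived from nanoseconds/byte and the clock passed via `--cpu-mhz`.

```c
/* cycles/byte = (ns/byte) * (cycles/ns), where cycles/ns = MHz / 1000. */
static double
sketch_cycles_per_byte (double ns_per_byte, double cpu_mhz)
{
  return ns_per_byte * cpu_mhz / 1000.0;
}

/* Speedup of the new implementation over the old one. */
static double
sketch_speedup (double old_ns_per_byte, double new_ns_per_byte)
{
  return old_ns_per_byte / new_ns_per_byte;
}
```

For the figures in this commit, 7.80 ns/B at 1008 MHz gives about 7.86 c/B, 13.41 ns/B gives about 13.52 c/B, and 13.41 / 7.80 is about 1.72, matching the reported improvement.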
2013-12-18	Add AVX and AVX2/BMI implementations for SHA-256	Jussi Kivilinna	1	-0/+2
	* LICENSES: Add 'cipher/sha256-avx-amd64.S' and 'cipher/sha256-avx2-bmi2-amd64.S'. * cipher/Makefile.am: Add 'sha256-avx-amd64.S' and 'sha256-avx2-bmi2-amd64.S'. * cipher/sha256-avx-amd64.S: New. * cipher/sha256-avx2-bmi2-amd64.S: New. * cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in a few places for tiny speed improvement. * cipher/sha256.c (USE_AVX, USE_AVX2): New. (SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'. (sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above new context members. [USE_AVX] (_gcry_sha256_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New. (transform) [USE_AVX2]: Use AVX2 assembly if enabled. (transform) [USE_AVX]: Use AVX assembly if enabled. * configure.ac: Add 'sha256-avx-amd64.lo' and 'sha256-avx2-bmi2-amd64.lo'. -- Patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA-256 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so the AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. Benchmarks: cpu C-lang SSSE3 AVX/AVX2 C vs AVX/AVX2 vs SSSE3 Intel i5-4570 13.86 c/B 10.27 c/B 8.70 c/B 1.59x 1.18x Intel i5-2450M 17.25 c/B 12.36 c/B 10.31 c/B 1.67x 1.19x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-17	Add AVX and AVX/BMI2 implementations for SHA-1	Jussi Kivilinna	1	-0/+2
	* cipher/Makefile.am: Add 'sha1-avx-amd64.S' and 'sha1-avx-bmi2-amd64.S'. * cipher/sha1-avx-amd64.S: New. * cipher/sha1-avx-bmi2-amd64.S: New. * cipher/sha1.c (USE_AVX, USE_BMI2): New. (SHA1_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA1_CONTEXT) [USE_BMI2]: Add 'use_bmi2'. (sha1_init): Initialize 'use_avx' and 'use_bmi2'. [USE_AVX] (_gcry_sha1_transform_amd64_avx): New. [USE_BMI2] (_gcry_sha1_transform_amd64_bmi2): New. (transform) [USE_BMI2]: Use BMI2 assembly if enabled. (transform) [USE_AVX]: Use AVX assembly if enabled. * configure.ac: Add 'sha1-avx-amd64.lo' and 'sha1-avx-bmi2-amd64.lo'. -- Patch adds AVX (for Sandybridge and Ivybridge) and AVX/BMI2 (for Haswell) optimized implementations of SHA-1. Note: AVX implementation is currently limited to Intel CPUs due to use of SHLD instruction for faster rotations on Sandybridge. Benchmarks: cpu C-version SSSE3 AVX/(SHLD\|BMI2) New vs C New vs SSSE3 Intel i5-4570 8.84 c/B 4.61 c/B 3.86 c/B 2.29x 1.19x Intel i5-2450M 9.45 c/B 5.30 c/B 4.39 c/B 2.15x 1.20x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
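The context flags and the `transform` dispatch listed above can be sketched roughly as follows. This is a simplified model, not the code in cipher/sha1.c: all names and feature-bit values are hypothetical, the selection order (BMI2, then AVX, then SSSE3, then generic C) is an assumption based on the order in the ChangeLog entry, and requiring AVX alongside BMI2 is likewise an assumption. The Intel-only gate on AVX reflects the SHLD note in the commit message.

```c
/* Hypothetical feature bits; the real HWF_* values are in src/g10lib.h. */
#define SKETCH_HWF_SSSE3     (1u << 0)
#define SKETCH_HWF_AVX       (1u << 1)
#define SKETCH_HWF_BMI2      (1u << 2)
#define SKETCH_HWF_INTEL_CPU (1u << 3)

enum sketch_impl { IMPL_GENERIC, IMPL_SSSE3, IMPL_AVX, IMPL_BMI2 };

struct sketch_sha1_ctx
{
  unsigned int use_ssse3:1;
  unsigned int use_avx:1;
  unsigned int use_bmi2:1;
};

/* Set the per-context flags once, at init time. */
static void
sketch_sha1_init (struct sketch_sha1_ctx *ctx, unsigned int features)
{
  ctx->use_ssse3 = (features & SKETCH_HWF_SSSE3) != 0;
  /* The SHLD-based AVX code is limited to Intel CPUs. */
  ctx->use_avx = (features & SKETCH_HWF_AVX)
                 && (features & SKETCH_HWF_INTEL_CPU);
  /* Assumption: the AVX/BMI2 code needs both feature bits. */
  ctx->use_bmi2 = (features & SKETCH_HWF_AVX)
                  && (features & SKETCH_HWF_BMI2);
}

/* Pick the fastest available implementation for each block. */
static enum sketch_impl
sketch_select_transform (const struct sketch_sha1_ctx *ctx)
{
  if (ctx->use_bmi2)
    return IMPL_BMI2;
  if (ctx->use_avx)
    return IMPL_AVX;
  if (ctx->use_ssse3)
    return IMPL_SSSE3;
  return IMPL_GENERIC;
}
```

Caching the decision in the context means the feature test runs once per hash handle instead of once per block.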
2013-12-16	Open new development branch.	Werner Koch	1	-2/+2
	--
2013-12-16	Post release updates.	Werner Koch	1	-1/+1
	--
2013-12-16	Release 1.6.0.	Werner Koch	1	-3/+3

2013-12-16	Add configure option --enable-large-data-tests.	Werner Koch	1	-0/+11
	* configure.ac: Add option --enable-large-data-tests. * tests/hashtest-256g.in: New. * tests/Makefile.am (EXTRA_DIST): Add hashtest-256g.in. (TESTS): Split up into tests_bin, tests_bin_last, tests_sh, and tests_sh_last. (tests_sh_last): Add hashtest-256g (noinst_PROGRAMS): Add only tests_bin and tests_bin_last. (bench-slope.log, hashtest-256g.log): New rules to enforce serial run. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-12-13	SHA-1: Add SSSE3 implementation	Jussi Kivilinna	1	-0/+7
	* cipher/Makefile.am: Add 'sha1-ssse3-amd64.c'. * cipher/sha1-ssse3-amd64.c: New. * cipher/sha1.c (USE_SSSE3): New. (SHA1_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'. (sha1_init) [USE_SSSE3]: Initialize 'use_ssse3'. (transform): Rename to... (_transform): this. (transform): New. * configure.ac [host=x86_64]: Add 'sha1-ssse3-amd64.lo'. -- Patch adds SSSE3 implementation based on white paper "Improving the Performance of the Secure Hash Algorithm (SHA-1)" at http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1 Benchmarks: cpu Old New Diff Intel i5-4570 9.02 c/B 5.22 c/B 1.72x Intel i5-2450M 12.27 c/B 7.24 c/B 1.69x Intel Core2 T8100 7.94 c/B 6.76 c/B 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13	Fix empty clobber in AVX2 assembly check	Jussi Kivilinna	1	-1/+1
	* configure.ac (gcry_cv_gcc_inline_asm_avx2): Add "cc" as assembly clobber. -- Apparently empty clobbers only work in some cases on Linux, and fail on mingw32. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13	SHA-512: Add AVX and AVX2 implementations for x86-64	Jussi Kivilinna	1	-0/+19
	* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and 'sha512-avx2-bmi2-amd64.S'. * cipher/sha512-avx-amd64.S: New. * cipher/sha512-avx2-bmi2-amd64.S: New. * cipher/sha512.c (USE_AVX, USE_AVX2): New. (SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'. (sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'. (sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'. [USE_AVX] (_gcry_sha512_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New. (transform) [USE_AVX2]: Add call for AVX2 implementation. (transform) [USE_AVX]: Add call for AVX implementation. * configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check. (sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'. * doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'. * src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New. * src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2". * src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and HWF_INTEL_BMI2. -- Patch adds fast AVX and AVX2 implementations of SHA-512 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA512 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$ Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so the AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. 
Benchmarks: cpu Old SSSE3 AVX/AVX2 Old vs AVX/AVX2 vs SSSE3 Intel i5-4570 10.11 c/B 7.56 c/B 6.72 c/B 1.50x 1.12x Intel i5-2450M 14.11 c/B 10.53 c/B 8.88 c/B 1.58x 1.18x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
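The note about using SHLD to emulate RORQ can be illustrated with a small C model. This is a sketch of the identity only, not of the assembly: the function names are hypothetical, and the model assumes shift counts strictly between 0 and 64. SHLD shifts the destination left and fills the vacated low bits from the top of the source; with both operands equal this is a left rotation, and a left rotation by 64 - r equals a right rotation by r.

```c
#include <stdint.h>

/* Model of 64-bit SHLD: shift dst left by n, filling the low bits
   from the high bits of src.  Valid for 0 < n < 64.  */
static uint64_t
sketch_shld64 (uint64_t dst, uint64_t src, unsigned int n)
{
  return (dst << n) | (src >> (64 - n));
}

/* Rotate right by r (0 < r < 64), expressed through SHLD with both
   operands equal and a count of 64 - r.  */
static uint64_t
sketch_ror64_via_shld (uint64_t x, unsigned int r)
{
  return sketch_shld64 (x, x, 64 - r);
}

/* Plain rotate right, for comparison.  Valid for 0 < r < 64.  */
static uint64_t
sketch_ror64 (uint64_t x, unsigned int r)
{
  return (x >> r) | (x << (64 - r));
}
```

Both forms compute the same value; the commit message's point is only that their relative speed differs between CPU vendors.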