summaryrefslogtreecommitdiff
path: root/configure.ac
AgeCommit message (Collapse)AuthorFilesLines
2016-03-20Fix ARM NEON support detection on ARMv6 targetJussi Kivilinna1-1/+1
* configure.ac (gcry_cv_gcc_inline_asm_neon): Use '.arm' directive instead of '.thumb'. -- Fix allows building ARM NEON assembly implementations when compiler target is ARMv6. This enables NEON implementations on ARMv7+NEON CPUs running on ARMv6 OS (for example, Raspbian on Raspberry Pi 2/3). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-03-18Always require a 64 bit integer typeWerner Koch1-9/+9
* configure.ac (available_digests_64): Merge with available_digests. (available_kdfs_64): Merge with available_kdfs. <64 bit datatype test>: Bail out if no such type is available. * src/types.h: Emit #error if no u64 can be defined. (PROPERLY_ALIGNED_TYPE): Always add u64 type. * cipher/bithelp.h: Remove all code paths which handle the case of !HAVE_U64_TYPEDEF. * cipher/bufhelp.h: Ditto. * cipher/cipher-ccm.c: Ditto. * cipher/cipher-gcm.c: Ditto. * cipher/cipher-internal.h: Ditto. * cipher/cipher.c: Ditto. * cipher/hash-common.h: Ditto. * cipher/md.c: Ditto. * cipher/poly1305.c: Ditto. * cipher/scrypt.c: Ditto. * cipher/tiger.c: Ditto. * src/g10lib.h: Ditto. * tests/basic.c: Ditto. * tests/bench-slope.c: Ditto. * tests/benchmark.c: Ditto. -- Given that SHA-2 and some other algorithms require a 64 bit type it does not make anymore sense to conditionally compile some part when the platform does not provide such a type. GnuPG-bug-id: 1815. Signed-off-by: Werner Koch <wk@gnupg.org>
2016-03-12Add Intel PCLMUL implementations of CRC algorithmsJussi Kivilinna1-0/+7
* cipher/Makefile.am: Add 'crc-intel-pclmul.c'. * cipher/crc-intel-pclmul.c: New. * cipher/crc.c (USE_INTEL_PCLMUL): New macro. (CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'. [USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul) (gcry_crc24rfc2440_intel_pclmul): New. (crc32_init, crc32rfc1510_init, crc24rfc2440_init) [USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL HW features detected. (crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL implementation if enabled. (crc24_init): Document storage format of 24-bit CRC. (crc24_next4): Use only 'data' for last table look-up. * configure.ac: Add 'crc-intel-pclmul.lo'. * src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include Intel SSE4.1. * src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection. * src/hwfeatures.c (hwflist): Add 'intel-sse4.1'. * tests/basic.c (fillbuf_count): New. (check_one_md): Add "?" check (million byte data-set with byte pattern 0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?" checks. (check_one_md_multi): Skip "?". (check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256, SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160, CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!" test-vectors for CRC32_RFC1510 and CRC24_RFC2440. -- Add Intel PCLMUL accelerated implmentations of CRC algorithms. CRC performance is improved ~11x on x86_64 and i386 on Intel Haswell, and ~2.7x on Intel Sandy-bridge. Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-02-08Add ARM assembly implementation of SHA-512Jussi Kivilinna1-0/+4
* cipher/Makefile.am: Add 'sha512-arm.S'. * cipher/sha512-arm.S: New. * cipher/sha512.c (USE_ARM_ASM): New. (_gcry_sha512_transform_arm): New. (transform) [USE_ARM_ASM]: Use ARM assembly implementation instead of generic. * configure.ac: Add 'sha512-arm.lo'. -- Benchmark on Cortex-A8 (armv6, 1008 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA512 | 112.0 ns/B 8.52 MiB/s 112.9 c/B After (3.3x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA512 | 34.01 ns/B 28.04 MiB/s 34.28 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-01Add ARMv7/NEON implementation of KeccakJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'keccak-armv7-neon.S'. * cipher/keccak-armv7-neon.S: New. * cipher/keccak.c (USE_64BIT_ARM_NEON): New. (NEED_COMMON64): Select if USE_64BIT_ARM_NEON. [NEED_COMMON64] (round_consts_64bit): Rename to... [NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add terminator at end. [USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon) (_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon) (keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New. (keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation if supported by HW. * cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update to use new round constant table. * configure.ac: Add 'keccak-armv7-neon.lo'. -- Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch is based on public-domain implementation by Ronny Van Keer from SUPERCOP package: https://github.com/floodyberry/supercop/blob/master/crypto_hash/\ keccakc1024/inplace-armv7a-neon/keccak2.s Benchmark results on Cortex-A8 @ 1008 Mhz: Before (generic 32-bit bit-interleaved impl.): | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 83.00 ns/B 11.49 MiB/s 83.67 c/B SHAKE256 | 101.7 ns/B 9.38 MiB/s 102.5 c/B SHA3-224 | 96.13 ns/B 9.92 MiB/s 96.90 c/B SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 | 189.1 ns/B 5.04 MiB/s 190.6 c/B After (ARM/NEON, ~3.2x faster): | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 25.09 ns/B 38.01 MiB/s 25.29 c/B SHAKE256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-224 | 29.24 ns/B 32.61 MiB/s 29.48 c/B SHA3-256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-384 | 40.42 ns/B 23.59 MiB/s 40.74 c/B SHA3-512 | 58.37 ns/B 16.34 MiB/s 58.84 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-25Add configure option --enable-build-timestamp.Werner Koch1-1/+10
* configure.ac (BUILD_TIMESTAMP): Set to "<none>" by default. -- This is based on libgpg-error commit d620005fd1a655d591fccb44639e22ea445e4554 but changed to be disabled by default. Check there for some background. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-08-08Add framework to eventually support SHA3.Werner Koch1-1/+19
* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256) (GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New. (GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256) (GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New. * cipher/keccak.c: New with stub functions. * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c. * configure.ac (available_digests): Add sha3. (USE_SHA3): New. * src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests. * cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos. (md_open): Ditto for hmac processing. * cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping. * cipher/hmac-tests.c (run_selftests): Prepare for tests. * cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx". -- Note that the algo GCRY_MD_SHA3_xxx are prelimanry. We should try to sync them with OpenPGP. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-05-01Enable AES/AES-NI, AES/SSSE3 and GCM/PCLMUL implementations on WIN64Jussi Kivilinna1-3/+105
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul) ( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector registers before use and restore after. * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency on !defined(__WIN64__). * cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare, aesni_prepare_2_6, aesni_cleanup) ( aesni_cleanup_2_6): New. [!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New. (_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc) (_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec) (_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use 'aesni_prepare_2_6'. * cipher/rijndael-internal.h (USE_SSSE3): Enable if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS. (USE_AESNI): Remove dependency on !defined(__WIN64__) * cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New. [!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New. (vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use 'vpaes_ssse3_prepare'. (_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use 'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'. [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to exclude '.type' and '.size' markers from assembly code, as they are not support on WIN64/COFF objects. * configure.ac (gcry_cv_gcc_attribute_ms_abi) (gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi) (gcry_cv_gcc_default_abi_is_sysv_abi) (gcry_cv_gcc_win64_platform_as_ok): New checks. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-05-01Fix rndhw for 64-bit Windows buildJussi Kivilinna1-0/+1
* configure.ac: Add sizeof check for 'void *'. * random/rndhw.c (poll_padlock): Check for SIZEOF_VOID_P == 8 instead of defined(__LP64__). (RDRAND_LONG): Check for SIZEOF_UNSIGNED_LONG == 8 instead of defined(__LP64__). -- __LP64__ is not predefined for 64-bit mingw64-gcc, which caused wrong assembly code selections. Do selection based on type sizes instead, to support x86_64, x32 and win64 properly. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-05-01Fix packed attribute check for Windows targetsJussi Kivilinna1-1/+3
* configure.ac (gcry_cv_gcc_attribute_packed): Move 'long b' to its own packed structure. -- Change packed attribute test so that it works with both MS ABI and SYSV ABI. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-03-21bufhelp: use one-byte aligned type for unaligned memory accessesJussi Kivilinna1-0/+18
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only when HAVE_GCC_ATTRIBUTE_PACKED and HAVE_GCC_ATTRIBUTE_ALIGNED are defined. (bufhelp_int_t): New type. (buf_cpy, buf_xor, buf_xor_1, buf_xor_2dst, buf_xor_n_copy_2): Use 'bufhelp_int_t'. [BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_u32_t, bufhelp_u64_t): New. [BUFHELP_FAST_UNALIGNED_ACCESS] (buf_get_be32, buf_get_le32) (buf_put_be32, buf_put_le32, buf_get_be64, buf_get_le64) (buf_put_be64, buf_put_le64): Use 'bufhelp_uXX_t'. * configure.ac (gcry_cv_gcc_attribute_packed): New. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-01-15Add functions to count trailing zero bits in a word.Werner Koch1-0/+15
* cipher/bithelp.h (_gcry_ctz, _gcry_ctz64): New. * configure.ac (HAVE_BUILTIN_CTZ): Add new test. -- Note that these functions return the number of bits in the word when passing 0. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-01-05random: Silent warning under NetBSD using rndunixWerner Koch1-4/+3
* random/rndunix.c (STDERR_FILENO): Define if needed. (start_gatherer): Re-open standard descriptors. Fix an unsigned/signed pointer warning. -- GnuPG-bug-id: 1702
2015-01-05build: Require automake 1.14.Werner Koch1-2/+2
* configure.ac (AM_INIT_AUTOMAKE): Add serial-tests. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-12-27Add Intel SSSE3 based vector permutation AES implementationJussi Kivilinna1-0/+3
* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'. * cipher/rijndael-internal.h (USE_SSSE3): New. (RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'. * cipher/rijndael-ssse3-amd64.c: New. * cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey) (_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt) (_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc) (_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc) (_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New. (do_setkey): Add HWF check for SSSE3 and setup for SSSE3 implementation. (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add selection for SSSE3 implementation. * configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'. -- This patch adds "AES with vector permutations" implementation by Mike Hamburg. Public-domain source-code is available at: http://crypto.stanford.edu/vpaes/ Benchmark on Intel Core2 T8100 (2.1Ghz, no turbo): Old (AMD64 asm): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 8.79 ns/B 108.5 MiB/s 18.46 c/B ECB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B CBC enc | 7.77 ns/B 122.7 MiB/s 16.33 c/B CBC dec | 7.74 ns/B 123.2 MiB/s 16.26 c/B CFB enc | 7.88 ns/B 121.0 MiB/s 16.54 c/B CFB dec | 7.56 ns/B 126.1 MiB/s 15.88 c/B OFB enc | 9.02 ns/B 105.8 MiB/s 18.94 c/B OFB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B CTR enc | 7.80 ns/B 122.2 MiB/s 16.38 c/B CTR dec | 7.81 ns/B 122.2 MiB/s 16.39 c/B New (ssse3): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.77 ns/B 165.2 MiB/s 12.13 c/B ECB dec | 7.13 ns/B 133.7 MiB/s 14.98 c/B CBC enc | 5.27 ns/B 181.0 MiB/s 11.06 c/B CBC dec | 6.39 ns/B 149.3 MiB/s 13.42 c/B CFB enc | 5.27 ns/B 180.9 MiB/s 11.07 c/B CFB dec | 5.28 ns/B 180.7 MiB/s 11.08 c/B OFB enc | 6.11 ns/B 156.1 MiB/s 12.83 c/B OFB dec | 6.13 ns/B 155.5 MiB/s 12.88 c/B CTR enc | 5.26 ns/B 181.5 MiB/s 11.04 c/B CTR dec | 5.24 ns/B 182.0 MiB/s 11.00 c/B Benchmark on Intel i5-2450M (2.5Ghz, no turbo, aes-ni disabled): Old (AMD64 asm): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 8.06 ns/B 118.3 MiB/s 20.15 c/B ECB dec | 8.21 ns/B 116.1 MiB/s 20.53 c/B CBC enc | 7.88 ns/B 121.1 MiB/s 19.69 c/B CBC dec | 7.57 ns/B 126.0 MiB/s 18.92 c/B CFB enc | 7.87 ns/B 121.2 MiB/s 19.67 c/B CFB dec | 7.56 ns/B 126.2 MiB/s 18.89 c/B OFB enc | 8.27 ns/B 115.3 MiB/s 20.67 c/B OFB dec | 8.28 ns/B 115.1 MiB/s 20.71 c/B CTR enc | 8.02 ns/B 119.0 MiB/s 20.04 c/B CTR dec | 8.02 ns/B 118.9 MiB/s 20.05 c/B New (ssse3): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 4.03 ns/B 236.6 MiB/s 10.07 c/B ECB dec | 5.28 ns/B 180.8 MiB/s 13.19 c/B CBC enc | 3.77 ns/B 252.7 MiB/s 9.43 c/B CBC dec | 4.69 ns/B 203.3 MiB/s 11.73 c/B CFB enc | 3.75 ns/B 254.3 MiB/s 9.37 c/B CFB dec | 3.69 ns/B 258.6 MiB/s 9.22 c/B OFB enc | 4.17 ns/B 228.7 MiB/s 10.43 c/B OFB dec | 4.17 ns/B 228.7 MiB/s 10.42 c/B CTR enc | 3.72 ns/B 256.5 MiB/s 9.30 c/B CTR dec | 3.72 ns/B 256.1 MiB/s 9.31 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-15build: Add configure option --disable-doc.Werner Koch1-1/+11
* Makefile.am (AUTOMAKE_OPTIONS): Remove. (doc) [!BUILD_DOC]: Do not recurse into the dir. * configure.ac (AM_INIT_AUTOMAKE): Add option formerly in Makefile.am. (BUILD_DOC): Add new am_conditional.
2014-12-06rijndael: split Padlock part to separate fileJussi Kivilinna1-0/+3
* cipher/Makefile.am: Add 'rijndael-padlock.c'. * cipher/rijndael-padlock.c: New. * cipher/rijndael.c (do_padlock, do_padlock_encrypt) (do_padlock_decrypt): Move to 'rijndael-padlock.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-01rijndael: split AES-NI functions to separate fileJussi Kivilinna1-0/+7
* cipher/Makefile.in: Add 'rijndael-aesni.c'. * cipher/rijndael-aesni.c: New. * cipher/rijndael-internal.h: New. * cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16) (USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context) (keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'. (u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6) (aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4) (do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move to 'rijndael-aesni.c'. (prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc) (_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt) (_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions in 'rijdael-aesni.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'. -- Clean-up rijndael.c before new new hardware acceleration support gets added. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02Add ARM/NEON implementation of Poly1305Jussi Kivilinna1-0/+5
* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'. * cipher/poly1305-armv7-neon.S: New. * cipher/poly1305-internal.h (POLY1305_USE_NEON) (POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE) (POLY1305_NEON_ALIGNMENT): New. * cipher/poly1305.c [POLY1305_USE_NEON] (_gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New. (_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation if HWF_ARM_NEON set. * configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'. -- Add Andrew Moon's public domain NEON implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 12.34 ns/B 77.27 MiB/s 12.44 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 2.12 ns/B 450.7 MiB/s 2.13 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02chacha20: add ARMv7/NEON implementationJussi Kivilinna1-0/+5
* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'. * cipher/chacha20-armv7-neon.S: New. * cipher/chacha20.c (USE_NEON): New. [USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New. (chacha20_do_setkey) [USE_NEON]: Use Neon implementation if HWF_ARM_NEON flag set. (selftest): Self-test encrypting buffer byte by byte. * configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'. -- Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 13.45 ns/B 70.92 MiB/s 13.56 c/B STREAM dec | 13.45 ns/B 70.90 MiB/s 13.56 c/B New: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 6.20 ns/B 153.9 MiB/s 6.25 c/B STREAM dec | 6.20 ns/B 153.9 MiB/s 6.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-04Add Whirlpool AMD64/SSE2 assembly implementationJussi Kivilinna1-0/+7
* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'. * cipher/whirlpool-sse2-amd64.S: New. * cipher/whirlpool.c (USE_AMD64_ASM): New. (whirlpool_tables_s): New. (rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single structure and replace old tables with macros of same name. (tab): New structure containing above tables. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64) (whirlpool_transform): New. * configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'. -- Benchmark results: On Intel Core i5-4570 (3.2 Ghz): After: WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B Before: WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B On Intel Core i5-2450M (2.5 Ghz): After: WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B Before: WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B On Intel Core2 T8100 (2.1 Ghz): After: WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B Before: WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B Summary, old vs new ratio: Intel Core i5-4570: 1.88x Intel Core i5-2450M: 1.59x Intel Core2 T8100: 1.94x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-02build: Document SYSROOT.Werner Koch1-0/+2
* configure.ac: Mark SYSROOT as arg var.
2014-10-02build: Support SYSROOT based config script finding.Werner Koch1-2/+10
* src/libgcrypt.m4: Add support for SYSROOT and set gpg_config_script_warn. Use AC_PATH_PROG instead of AC_PATH_TOOL because the config script is not expected to be installed with a prefix for its name * configure.ac: Print a library mismatch warning. * m4/gpg-error.m4: Update from git master. -- Also fixed the false copyright notice in libgcrypt.m4.
2014-05-16chacha20: add SSE2/AMD64 optimized implementationJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'. * cipher/chacha20-sse2-amd64.S: New. * cipher/chacha20.c (USE_SSE2): New. [USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New. (chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks function. * configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Intel i5-4570 (haswell), with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3": Old: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.97 ns/B 483.8 MiB/s 6.31 c/B STREAM dec | 1.97 ns/B 483.6 MiB/s 6.31 c/B New: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.931 ns/B 1024.7 MiB/s 2.98 c/B STREAM dec | 0.930 ns/B 1025.0 MiB/s 2.98 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-16poly1305: add AMD64/AVX2 optimized implementationJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'. * cipher/poly1305-avx2-amd64.S: New. * cipher/poly1305-internal.h (POLY1305_USE_AVX2) (POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE) (POLY1305_AVX2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed. * cipher/poly1305.c [POLY1305_USE_AVX2] (_gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if AVX2 supported by CPU. * configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'. -- Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.205 ns/B 4643.5 MiB/s 0.657 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12poly1305: add AMD64/SSE2 optimized implementationJussi Kivilinna1-0/+7
* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'. * cipher/poly1305-internal.h (POLY1305_USE_SSE2) (POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE) (POLY1305_SSE2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed. * cipher/poly1305-sse2-amd64.S: New. * cipher/poly1305.c [POLY1305_USE_SSE2] (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New. (_gcry_polu1305_init) [POLY1305_USE_SSE2]: Use SSE2 version. * configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.844 ns/B 1130.2 MiB/s 2.70 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B Benchmarks on Intel i5-2450M (sandy-bridge): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 1.25 ns/B 763.0 MiB/s 3.12 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.605 ns/B 1575.9 MiB/s 1.51 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11chacha20: add AVX2/AMD64 assembly implementationJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'. * cipher/chacha20-avx2-amd64.S: New. * cipher/chacha20.c (USE_AVX2): New macro. [USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New. (chacha20_do_setkey): Select AVX2 implementation if there is HW support. (selftest): Increase size of buf by 256. * configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'. -- Add AVX2 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. SSSE3 (Intel Haswell): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B AVX2: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.393 ns/B 2428.0 MiB/s 1.26 c/B STREAM dec | 0.392 ns/B 2433.6 MiB/s 1.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11chacha20: add SSSE3 assembly implementationJussi Kivilinna1-0/+7
* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'. * cipher/chacha20-ssse3-amd64.S: New. * cipher/chacha20.c (USE_SSSE3): New macro. [USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New. (chacha20_do_setkey): Select SSSE3 implementation if there is HW support. * configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'. -- Add SSSE3 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. Before (Intel Haswell): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.97 ns/B 483.6 MiB/s 6.31 c/B STREAM dec | 1.97 ns/B 484.0 MiB/s 6.31 c/B After: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11Add ChaCha20 stream cipherJussi Kivilinna1-1/+7
* cipher/Makefile.am: Add 'chacha20.c'. * cipher/chacha20.c: New. * cipher/cipher.c (cipher_list): Add ChaCha20. * configure.ac: Add ChaCha20. * doc/gcrypt.texi: Add ChaCha20. * src/cipher.h (_gcry_cipher_spec_chacha20): New. * src/gcrypt.h.in (GCRY_CIPHER_CHACHA20): Add new algo. * tests/basic.c (MAX_DATA_LEN): Increase to 128 from 100. (check_stream_cipher): Add ChaCha20 test-vectors. (check_ciphers): Add ChaCha20. -- Patch adds Bernstein's ChaCha20 cipher to libgcrypt. Implementation is based on public domain implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-07Bump LT version.Werner Koch1-2/+3
* configure.ac: Bumb LT version to C21/A1/R0. -- This is to avoid conflicts with the 1.6 series. Note that if we add a new interface to 1.6 we would need to bump age again.
2014-03-303des: add amd64 assembly implementation for 3DESJussi Kivilinna1-0/+7
* cipher/Makefile.am: Add 'des-amd64.S'. * cipher/cipher-selftests.c (_gcry_selftest_helper_cbc) (_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Handle failures from 'setkey' function. * cipher/cipher.c (_gcry_cipher_open_internal) [USE_DES]: Setup bulk functions for 3DES. * cipher/des-amd64.S: New file. * cipher/des.c (USE_AMD64_ASM, ATTR_ALIGNED_16): New macros. [USE_AMD64_ASM] (_gcry_3des_amd64_crypt_block) (_gcry_3des_amd64_ctr_enc), _gcry_3des_amd64_cbc_dec) (_gcry_3des_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (tripledes_ecb_crypt): New function. (TRIPLEDES_ECB_BURN_STACK): New macro. (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec) (bulk_selftest_setkey, selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Add call to CTR, CBC and CFB selftest functions. (do_tripledes_encrypt, do_tripledes_decrypt): Use TRIPLEDES_ECB_BURN_STACK. * configure.ac [host=x86-64]: Add 'des-amd64.lo'. * src/cipher.h (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec) (_gcry_3des_cfb_dec): New prototypes. -- Add non-parallel functions for small speed-up and 3-way parallel functions for modes of operation that support parallel processing. Old vs new (Intel Core i5-4570): ================================ enc dec ECB 1.17x 1.17x CBC 1.17x 2.51x CFB 1.16x 2.49x OFB 1.17x 1.17x CTR 2.56x 2.56x Old vs new (Intel Core i5-2450M): ================================= enc dec ECB 1.28x 1.28x CBC 1.27x 2.33x CFB 1.27x 2.34x OFB 1.27x 1.27x CTR 2.36x 2.35x New (Intel Core i5-4570): ========================= 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 28.39 ns/B 33.60 MiB/s 90.84 c/B ECB dec | 28.27 ns/B 33.74 MiB/s 90.45 c/B CBC enc | 29.50 ns/B 32.33 MiB/s 94.40 c/B CBC dec | 13.35 ns/B 71.45 MiB/s 42.71 c/B CFB enc | 29.59 ns/B 32.23 MiB/s 94.68 c/B CFB dec | 13.41 ns/B 71.12 MiB/s 42.91 c/B OFB enc | 28.90 ns/B 33.00 MiB/s 92.47 c/B OFB dec | 28.90 ns/B 33.00 MiB/s 92.48 c/B CTR enc | 13.39 ns/B 71.20 MiB/s 42.86 c/B CTR dec | 13.39 ns/B 71.21 MiB/s 42.86 c/B Old (Intel Core i5-4570): ========================= 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 33.24 ns/B 28.69 MiB/s 106.4 c/B ECB dec | 33.26 ns/B 28.67 MiB/s 106.4 c/B CBC enc | 34.45 ns/B 27.69 MiB/s 110.2 c/B CBC dec | 33.45 ns/B 28.51 MiB/s 107.1 c/B CFB enc | 34.43 ns/B 27.70 MiB/s 110.2 c/B CFB dec | 33.41 ns/B 28.55 MiB/s 106.9 c/B OFB enc | 33.79 ns/B 28.22 MiB/s 108.1 c/B OFB dec | 33.79 ns/B 28.22 MiB/s 108.1 c/B CTR enc | 34.27 ns/B 27.83 MiB/s 109.7 c/B CTR dec | 34.27 ns/B 27.83 MiB/s 109.7 c/B New (Intel Core i5-2450M): ========================== 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 42.21 ns/B 22.59 MiB/s 105.5 c/B ECB dec | 42.23 ns/B 22.58 MiB/s 105.6 c/B CBC enc | 43.70 ns/B 21.82 MiB/s 109.2 c/B CBC dec | 23.25 ns/B 41.02 MiB/s 58.12 c/B CFB enc | 43.71 ns/B 21.82 MiB/s 109.3 c/B CFB dec | 23.23 ns/B 41.05 MiB/s 58.08 c/B OFB enc | 42.73 ns/B 22.32 MiB/s 106.8 c/B OFB dec | 42.73 ns/B 22.32 MiB/s 106.8 c/B CTR enc | 23.31 ns/B 40.92 MiB/s 58.27 c/B CTR dec | 23.35 ns/B 40.84 MiB/s 58.38 c/B Old (Intel Core i5-2450M): ========================== 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 53.98 ns/B 17.67 MiB/s 134.9 c/B ECB dec | 54.00 ns/B 17.66 MiB/s 135.0 c/B CBC enc | 55.43 ns/B 17.20 MiB/s 138.6 c/B CBC dec | 54.27 ns/B 17.57 MiB/s 135.7 c/B CFB enc | 55.42 ns/B 17.21 MiB/s 138.6 c/B CFB dec | 54.35 ns/B 17.55 MiB/s 135.9 c/B OFB enc | 54.49 ns/B 17.50 MiB/s 136.2 c/B OFB dec | 54.49 ns/B 17.50 MiB/s 136.2 c/B CTR enc | 55.02 ns/B 17.33 MiB/s 137.5 c/B CTR dec | 55.01 ns/B 17.34 MiB/s 137.5 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-03-11Add MD2 message digest implementationDmitry Eremin-Solenikov1-1/+9
* cipher/md2.c: New. * cipher/md.c (digest_list): add _gcry_digest_spec_md2. * tests/basic.c (check_digests): add MD2 test vectors. * configure.ac (default_digests): disable md2 by default. -- Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Some minor indentation fixes by wk.
2014-02-04Fix ARMv6 detection when CFLAGS modify target CPU architectureJussi Kivilinna1-4/+10
* configure.ac (gcry_cv_cc_arm_arch_is_v6): Use compiler test instead of preprocessor test. -- Old test was using C preprocessor to check ARM version macros and missed fact that using different CFLAGS affect those macros (CFLAGS are not passed to preprocessor checks). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-01-27Small Windows build tweaks.Werner Koch1-3/+5
* configure.ac (HAVE_PTHREAD): Do test when building for Windows. * tests/basic.c: Replace "%zi" by "%z" and a cast to make it work under Windows. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-01-24tests: Add a test for the internal lockingWerner Koch1-2/+2
* src/global.c (external_lock_test): New. (_gcry_vcontrol): Call new function with formerly reserved code 61. * tests/t-common.h: New. Taken from current libgpg-error. * tests/t-lock.c: New. Based on t-lock.c from libgpg-error. * configure.ac (HAVE_PTHREAD): Set macro to 1 if defined. (AC_CHECK_FUNCS): Check for flockfile. * tests/Makefile.am (tests_bin): Add t-lock. (noinst_HEADERS): Add t-common.h (LDADD): Move value to ... (default_ldadd): new. (t_lock_LDADD): New. -- Signed-off-by: Werner Koch <wk@gnupg.org> (cherry picked from commit fa42c61a84996b6a7574c32233dfd8d9f254d93a) Resolved conflicts: * src/ath.c: Remove as not anymore used in 1.7. * tests/Makefile.am: Merge. Changes: * src/global.c (external_lock_test): Use the gpgrt function for locking. Changed subject because here we are only adding the test case.
2014-01-24Check compiler features only for the relevant platform.Werner Koch1-102/+169
* mpi/config.links (mpi_cpu_arch): Always set for ARM. Set for HPPA. Set to "undefined" for unknown platforms. (try_asm_modules): Act upon only after having detected the CPU. * configure.ac: Move the call to config.links before the platform specific compiler checks. Check platform specific features only if the platform is targeted. -- There is no need to check x86 options if we are targeting ARM and vice versa. This may only introduce build problems. With this patch the summary output at the end of the compiler also shows more reasonable messages. Signed-off-by: Werner Koch <wk@gnupg.org> (cherry picked from commit 04d478d9b0f92d80105ddaf2c011f40ae8260cfb)
2014-01-17Actually check for uint64_t.Werner Koch1-0/+9
* configure.ac: Check size of uint64_t and the UINT64_C macro. -- configure.ac used $ac_cv_sizeof_uint64_t but never set this variable. Due to the availability of long long on all platforms supporting uint64_t this was not a real problem. Found while remove the corresponding test from gnupg. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-01-16Replace ath based mutexes by gpgrt based locks.Werner Koch1-6/+1
* configure.ac (NEED_GPG_ERROR_VERSION): Require 1.13. (gl_LOCK): Remove. * src/ath.c, src/ath.h: Remove. Remove from all files. Replace all mutexes by gpgrt based statically initialized locks. * src/global.c (global_init): Remove ath_init. (_gcry_vcontrol): Make ath install a dummy function. (print_config): Remove threads info line. * doc/gcrypt.texi: Simplify the multi-thread related documentation. -- The current code does only work on ELF systems with weak symbol support. In particular no locks were used under Windows. With the new gpgrt_lock functions from the soon to be released libgpg-error 1.13 we have a better portable scheme which also allows for static initialized mutexes. Signed-off-by: Werner Koch <wk@gnupg.org>
2014-01-12Fix assembly division checkJussi Kivilinna1-1/+1
* configure.ac (gcry_cv_gcc_as_const_division_ok): Correct variable name mismatch at '--Wa,--divide' workaround check. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-01-12Fix constant division for AMD64 assembly on Solaris/x86Jussi Kivilinna1-1/+37
* configure.ac (gcry_cv_gcc_as_const_division_ok): Add new check for constant division in assembly and test for "-Wa,--divide" workaround. (gcry_cv_gcc_amd64_platform_as_ok): Check for also constant division. -- Appearantly on Solaris/x86 '/' character is treated as begining of line comment by GNU as. This causes problems when compiling SHA-1 SSSE3 implementation: On 02.01.2014 16:26, Richard PALO wrote: >> COLLECT_GCC_OPTIONS='-D' 'HAVE_CONFIG_H' '-I' '.' '-I' '..' '-I' '../src' '-I' '/var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include' '-I' '/var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include/gettext' '-D' '_REENTRANT' '-O2' '-MT' 'sha1-ssse3-amd64.lo' '-MD' '-MP' '-MF' '.deps/sha1-ssse3-amd64.Tpo' '-c' '-fPIC' '-D' 'PIC' '-o' '.libs/sha1-ssse3-amd64.o' '-v' '-mtune=generic' '-march=x86-64' >> /usr/gnu/bin/as -v -I . -I .. -I ../src -I /var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include -I /var/tmp/pkgsrc/security/libgcrypt/work/.buildlink/include/gettext -V -Qy -s --64 -o .libs/sha1-ssse3-amd64.o /var/tmp//ccAxWPXX.s >> GNU assembler version 2.23.1 (i386-pc-solaris2.11) using BFD version (GNU Binutils) 2.23.1 >> /var/tmp//ccAxWPXX.s: Assembler messages: >> /var/tmp//ccAxWPXX.s:34: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:38: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:42: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:46: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:54: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:58: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:62: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:66: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:70: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:74: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:78: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:82: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:86: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:90: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:94: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:98: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:102: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:106: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:110: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:114: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:119: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:123: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:127: Error: unbalanced parenthesis in operand 1. >> /var/tmp//ccAxWPXX.s:132: Error: unbalanced parenthesis in operand 1. > > > apparently the paddd code, such as > `paddd (.LK_XMM + ((i)/20)*16) RIP, tmp0;` > isn't digested well, appended is the generated assembler code. On 02.01.2014 17:41, Richard PALO wrote: > Hi again, after finding the following: > https://sourceware.org/bugzilla/show_bug.cgi?id=4572 > > I tried using '-Wa,--divide' and that seemed to workaround the problem... > > perhaps the code, or at least the Makefile could be adapted accordingly? Patch adds detection of this feature and attempts to workaround issue with by adding "-Wa,--divide" to CPPFLAGS. If workaround does not work (old GAS on Solaris/x86), we'll disable AMD64 assembly. [v3]: - Update CPPFLAGS after testing instead of CFLAGS. Reported-and-tested-by: Richard PALO <richard.palo@free.fr> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-01-10Move all helper scripts to build-aux/Werner Koch1-0/+2
* scripts/: Rename to build-aux/. * compile, config.guess, config.rpath, config.sub * depcomp, doc/mdate-sh, doc/texinfo.tex * install-sh, ltmain.sh, missing: Move to build-aux/. * Makefile.am (EXTRA_DIST): Adjust. * configure.ac (AC_CONFIG_AUX_DIR): New. (AM_SILENT_RULES): New. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-12-30Add AMD64 assembly implementation for arcfourJussi Kivilinna1-0/+7
* cipher/Makefile.am: Add 'arcfour-amd64.S'. * cipher/arcfour-amd64.S: New. * cipher/arcfour.c (USE_AMD64_ASM): New. [USE_AMD64_ASM] (ARCFOUR_context, _gcry_arcfour_amd64) (encrypt_stream): New. * configure.ac [host=x86_64]: Add 'arcfour-amd64.lo'. -- Patch adds Marc Bevand's public-domain AMD64 assembly implementation of RC4 to libgcrypt. Original implementation is at: http://www.zorinaq.com/papers/rc4-amd64.html Benchmarks on Intel i5-4570 (3200 Mhz): New: ARCFOUR | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.29 ns/B 737.7 MiB/s 4.14 c/B STREAM dec | 1.31 ns/B 730.6 MiB/s 4.18 c/B Old (C-language): ARCFOUR | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 2.09 ns/B 457.4 MiB/s 6.67 c/B STREAM dec | 2.09 ns/B 457.2 MiB/s 6.68 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-30Fix buggy/incomplete detection of AVX/AVX2 supportJussi Kivilinna1-2/+2
* configure.ac: Also check for 'xgetbv' instruction in AVX and AVX2 inline assembly checks. * src/hwf-x86.c [__i386__] (get_xgetbv): New function. [__x86_64__] (get_xgetbv): New function. [HAS_X86_CPUID] (detect_x86_gnuc): Check for OSXSAVE and OS support for XMM&YMM registers and enable AVX/AVX2 only if XMM&YMM registers are supported by OS. -- This patch is based on original patch and bug report by Panagiotis Christopoulos: Adding better detection of AVX/AVX2 support After upgrading libgcrypt from 1.5.3 to 1.6.0 on a remote XEN system (linode) my gpg2 stopped working properly, throwing SIGILL signals when doing sha512 operations etc. I managed to debug this with the help of Doublas Freed (dwfreed at mtu.edu) and it seems that the current AVX detection just checks for bit 28 on cpuid but the check still works on systems that have disabled the avx/avx2 instructions for some reason (eg. performance/unstability) resulting in SIGILLs (eg. when trying _gcry_sha512_transform_amd64_avx() ). From Intel resources[1][2], I found additional checks for better AVX detection and applied them in the following patch. Please review/change accordingly and commit some better AVX detection mechanism. The AVX part is tested but could not test the AVX2 one, because I lack proper hardware. I can provide additional information upon request. Use the patch only as a guideline, as it's not thoroughly tested. [1] http://software.intel.com/en-us/blogs/2011/04/14/is-avx-enabled [2] http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf (sections 14.3 and 14.7.1) Reported-by: Panagiotis Christopoulos (pchrist) <pchrist@gentoo.org> Cc: Doublas Freed <dwfreed@mtu.edu> Cc: Tim Harder <radhermit@gentoo.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-18Add ARM/NEON implementation for SHA-1Jussi Kivilinna1-0/+4
* cipher/Makefile.am: Add 'sha1-armv7-neon.S'. * cipher/sha1-armv7-neon.S: New. * cipher/sha1.c (USE_NEON): New. (SHA1_CONTEXT, sha1_init) [USE_NEON]: Add and initialize 'use_neon'. [USE_NEON] (_gcry_sha1_transform_armv7_neon): New. (transform) [USE_NEON]: Use ARM/NEON assembly if enabled. * configure.ac: Add 'sha1-armv7-neon.lo'. -- Patch adds ARM/NEON implementation for SHA-1. Benchmarks show 1.72x improvement on ARM Cortex-A8, 1008 Mhz: jussi@cubie:~/libgcrypt$ tests/bench-slope --cpu-mhz 1008 hash sha1 Hash: | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 7.80 ns/B 122.3 MiB/s 7.86 c/B = jussi@cubie:~/libgcrypt$ tests/bench-slope --disable-hwf arm-neon --cpu-mhz 1008 hash sha1 Hash: | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 13.41 ns/B 71.10 MiB/s 13.52 c/B = Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-18Add AVX and AVX2/BMI implementations for SHA-256Jussi Kivilinna1-0/+2
* LICENSES: Add 'cipher/sha256-avx-amd64.S' and 'cipher/sha256-avx2-bmi2-amd64.S'. * cipher/Makefile.am: Add 'sha256-avx-amd64.S' and 'sha256-avx2-bmi2-amd64.S'. * cipher/sha256-avx-amd64.S: New. * cipher/sha256-avx2-bmi2-amd64.S: New. * cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in few places for tiny speed improvement. * cipher/sha256.c (USE_AVX, USE_AVX2): New. (SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'. (sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above new context members. [USE_AVX] (_gcry_sha256_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New. (transform) [USE_AVX2]: Use AVX2 assembly if enabled. (transform) [USE_AVX]: Use AVX assembly if enabled. * configure.ac: Add 'sha256-avx-amd64.lo' and 'sha256-avx2-bmi2-amd64.lo'. -- Patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA - 256 Implementations on IntelĀ® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so therefore AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. Benchmarks: cpu C-lang SSSE3 AVX/AVX2 C vs AVX/AVX2 vs SSSE3 Intel i5-4570 13.86 c/B 10.27 c/B 8.70 c/B 1.59x 1.18x Intel i5-2450M 17.25 c/B 12.36 c/B 10.31 c/B 1.67x 1.19x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-17Add AVX and AVX/BMI2 implementations for SHA-1Jussi Kivilinna1-0/+2
* cipher/Makefile.am: Add 'sha1-avx-amd64.S' and 'sha1-avx-bmi2-amd64.S'. * cipher/sha1-avx-amd64.S: New. * cipher/sha1-avx-bmi2-amd64.S: New. * cipher/sha1.c (USE_AVX, USE_BMI2): New. (SHA1_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA1_CONTEXT) [USE_BMI2]: Add 'use_bmi2'. (sha1_init): Initialize 'use_avx' and 'use_bmi2'. [USE_AVX] (_gcry_sha1_transform_amd64_avx): New. [USE_BMI2] (_gcry_sha1_transform_amd64_bmi2): New. (transform) [USE_BMI2]: Use BMI2 assembly if enabled. (transform) [USE_AVX]: Use AVX assembly if enabled. * configure.ac: Add 'sha1-avx-amd64.lo' and 'sha1-avx-bmi2-amd64.lo'. -- Patch adds AVX (for Sandybridge and Ivybridge) and AVX/BMI2 (for Haswell) optimized implementations of SHA-1. Note: AVX implementation is currently limited to Intel CPUs due to use of SHLD instruction for faster rotations on Sandybrigde. Benchmarks: cpu C-version SSSE3 AVX/(SHLD|BMI2) New vs C New vs SSSE3 Intel i5-4570 8.84 c/B 4.61 c/B 3.86 c/B 2.29x 1.19x Intel i5-2450M 9.45 c/B 5.30 c/B 4.39 c/B 2.15x 1.20x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-16Open new development branch.Werner Koch1-2/+2
--
2013-12-16Post release updates.Werner Koch1-1/+1
--
2013-12-16Release 1.6.0.Werner Koch1-3/+3
2013-12-16Add configure option --enable-large-data-tests.Werner Koch1-0/+11
* configure.ac: Add option --enable-large-data-tests. * tests/hashtest-256g.in: New. * tests/Makefile.am (EXTRA_DIST): Add hashtest-256g.in. (TESTS): Split up into tests_bin, tests_bin_last, tests_sh, and tests_sh_last. (tests_sh_last): Add hashtest-256g (noinst_PROGRAMS): Add only tests_bin and tests_bin_last. (bench-slope.log, hashtest-256g.log): New rules to enforce serial run. Signed-off-by: Werner Koch <wk@gnupg.org>