summaryrefslogtreecommitdiff
path: root/cipher/Makefile.am
AgeCommit message (Collapse)AuthorFilesLines
2016-03-12Add Intel PCLMUL implementations of CRC algorithmsJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'crc-intel-pclmul.c'. * cipher/crc-intel-pclmul.c: New. * cipher/crc.c (USE_INTEL_PCLMUL): New macro. (CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'. [USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul) (gcry_crc24rfc2440_intel_pclmul): New. (crc32_init, crc32rfc1510_init, crc24rfc2440_init) [USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL HW features detected. (crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL implementation if enabled. (crc24_init): Document storage format of 24-bit CRC. (crc24_next4): Use only 'data' for last table look-up. * configure.ac: Add 'crc-intel-pclmul.lo'. * src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include Intel SSE4.1. * src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection. * src/hwfeatures.c (hwflist): Add 'intel-sse4.1'. * tests/basic.c (fillbuf_count): New. (check_one_md): Add "?" check (million byte data-set with byte pattern 0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?" checks. (check_one_md_multi): Skip "?". (check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256, SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160, CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!" test-vectors for CRC32_RFC1510 and CRC24_RFC2440. -- Add Intel PCLMUL accelerated implmentations of CRC algorithms. CRC performance is improved ~11x on x86_64 and i386 on Intel Haswell, and ~2.7x on Intel Sandy-bridge. Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-02-08Add ARM assembly implementation of SHA-512Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'sha512-arm.S'. * cipher/sha512-arm.S: New. * cipher/sha512.c (USE_ARM_ASM): New. (_gcry_sha512_transform_arm): New. (transform) [USE_ARM_ASM]: Use ARM assembly implementation instead of generic. * configure.ac: Add 'sha512-arm.lo'. -- Benchmark on Cortex-A8 (armv6, 1008 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA512 | 112.0 ns/B 8.52 MiB/s 112.9 c/B After (3.3x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA512 | 34.01 ns/B 28.04 MiB/s 34.28 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-01Add ARMv7/NEON implementation of KeccakJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'keccak-armv7-neon.S'. * cipher/keccak-armv7-neon.S: New. * cipher/keccak.c (USE_64BIT_ARM_NEON): New. (NEED_COMMON64): Select if USE_64BIT_ARM_NEON. [NEED_COMMON64] (round_consts_64bit): Rename to... [NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add terminator at end. [USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon) (_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon) (keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New. (keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation if supported by HW. * cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update to use new round constant table. * configure.ac: Add 'keccak-armv7-neon.lo'. -- Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch is based on public-domain implementation by Ronny Van Keer from SUPERCOP package: https://github.com/floodyberry/supercop/blob/master/crypto_hash/\ keccakc1024/inplace-armv7a-neon/keccak2.s Benchmark results on Cortex-A8 @ 1008 Mhz: Before (generic 32-bit bit-interleaved impl.): | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 83.00 ns/B 11.49 MiB/s 83.67 c/B SHAKE256 | 101.7 ns/B 9.38 MiB/s 102.5 c/B SHA3-224 | 96.13 ns/B 9.92 MiB/s 96.90 c/B SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 | 189.1 ns/B 5.04 MiB/s 190.6 c/B After (ARM/NEON, ~3.2x faster): | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 25.09 ns/B 38.01 MiB/s 25.29 c/B SHAKE256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-224 | 29.24 ns/B 32.61 MiB/s 29.48 c/B SHA3-256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-384 | 40.42 ns/B 23.59 MiB/s 40.74 c/B SHA3-512 | 58.37 ns/B 16.34 MiB/s 58.84 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28keccak: rewrite for improved performanceJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'keccak_permute_32.h' and 'keccak_permute_64.h'. * cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove. * cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2) (USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI) (keccak_ops_t): New. (KECCAK_STATE): Add 'state64' and 'state32bi' members. (KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'. (rol64, keccak_f1600_state_permute): Remove. [NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New. [NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi) (keccak_absorb_lane32bi): New. [USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64) (keccak_absorb_lanes64, keccak_generic64_ops): New. [USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld) (keccak_absorb_lanes64_shld, keccak_shld_64_ops): New. [USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2) (keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New. [USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi) (keccak_absorb_lanes32bi, keccak_generic32bi_ops): New. [USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2) (pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2) (keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New. (keccak_write): New. (keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation selection based on HWF features. (keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops' for state manipulation. (keccak_read): Adjust to KECCAK_CONTEXT changes. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use 'keccak_write' instead of '_gcry_md_block_write'. * cipher/keccak_permute_32.h: New. * cipher/keccak_permute_64.h: New. -- Patch adds new generic 64-bit and 32-bit implementations and optimized implementations for SHA3: - Generic 64-bit implementation based on 'simple' implementation from SUPERCOP package. - Generic 32-bit bit-inteleaved implementataion based on 'simple32bi' implementation from SUPERCOP package. - Intel BMI2 optimized variants of 64-bit and 32-bit BI implementations. - Intel SHLD optimized variant of 64-bit implementation. Patch also makes proper use of sponge construction to avoid use of addition input buffer. Below are bench-slope benchmarks for new 64-bit implementations made on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2). Before (amd64): SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B After (generic 64-bit, amd64), 1.10x faster): SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B After (Intel SHLD 64-bit, amd64, 1.13x faster): SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B After (Intel BMI2 64-bit, amd64, 1.45x faster): SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B Benchmarks of new 32-bit implementations on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2): Before (win32): SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B After (generic 32-bit BI, win32, 1.23x to 1.29x faster): SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster): SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B Benchmarks of new 32-bit implementation on ARM Cortex-A8 (1008 Mhz, gcc-4.9.1): Before: SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B After (1.56x faster): SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-08Add framework to eventually support SHA3.Werner Koch1-0/+1
* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256) (GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New. (GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256) (GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New. * cipher/keccak.c: New with stub functions. * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c. * configure.ac (available_digests): Add sha3. (USE_SHA3): New. * src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests. * cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos. (md_open): Ditto for hmac processing. * cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping. * cipher/hmac-tests.c (run_selftests): Prepare for tests. * cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx". -- Note that the algo GCRY_MD_SHA3_xxx are prelimanry. We should try to sync them with OpenPGP. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-01-28Fix building of GOST s-boxes when cross-compiling.Werner Koch1-3/+8
* cipher/Makefile.am (gost-s-box): USe CC_FOR_BUILD. (noinst_PROGRAMS): Remove. (EXTRA_DIST): New. (CLEANFILES): New. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-01-16Add OCB cipher modeWerner Koch1-1/+2
* cipher/cipher-ocb.c: New. * cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-ocb.c * cipher/cipher-internal.h (OCB_BLOCK_LEN, OCB_L_TABLE_SIZE): New. (gcry_cipher_handle): Add fields marks.finalize and u_mode.ocb. * cipher/cipher.c (_gcry_cipher_open_internal): Add OCB mode. (_gcry_cipher_open_internal): Setup default taglen of OCB. (cipher_reset): Clear OCB specific data. (cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate) (_gcry_cipher_gettag, _gcry_cipher_checktag): Call OCB functions. (_gcry_cipher_setiv): Add OCB specific nonce setting. (_gcry_cipher_ctl): Add GCRYCTL_FINALIZE and GCRYCTL_SET_TAGLEN * src/gcrypt.h.in (GCRYCTL_SET_TAGLEN): New. (gcry_cipher_final): New. * cipher/bufhelp.h (buf_xor_1): New. * tests/basic.c (hex2buffer): New. (check_ocb_cipher): New. (main): Call it here. Add option --cipher-modes. * tests/bench-slope.c (bench_aead_encrypt_do_bench): Call gcry_cipher_final. (bench_aead_decrypt_do_bench): Ditto. (bench_aead_authenticate_do_bench): Ditto. Check error code. (bench_ocb_encrypt_do_bench): New. (bench_ocb_decrypt_do_bench): New. (bench_ocb_authenticate_do_bench): New. (ocb_encrypt_ops): New. (ocb_decrypt_ops): New. (ocb_authenticate_ops): New. (cipher_modes): Add them. (cipher_bench_one): Skip wrong block length for OCB. * tests/benchmark.c (cipher_bench): Add field noncelen to MODES. Add OCB support. -- See the comments on top of cipher/cipher-ocb.c for the patent status of the OCB mode. The implementation has not yet been optimized and as such is not faster that the other AEAD modes. A first candidate for optimization is the double_block function. Large improvements can be expected by writing an AES ECB function to work on multiple blocks. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-01-06Make make distcheck work again.Werner Koch1-0/+2
* Makefile.am (DISTCHECK_CONFIGURE_FLAGS): Remove --enable-ciphers. * cipher/Makefile.am (DISTCLEANFILES): Add gost-sb.h.
2015-01-06Remove the old Manifest filesWerner Koch1-2/+0
-- The Manifest file have been part of an experiment a long time ago to implement source level integrity. I is not maintained for more than a decade and with the advent of git this is superfluous anyway.
2014-12-27Add Intel SSSE3 based vector permutation AES implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'. * cipher/rijndael-internal.h (USE_SSSE3): New. (RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'. * cipher/rijndael-ssse3-amd64.c: New. * cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey) (_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt) (_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc) (_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc) (_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New. (do_setkey): Add HWF check for SSSE3 and setup for SSSE3 implementation. (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc) (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add selection for SSSE3 implementation. * configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'. -- This patch adds "AES with vector permutations" implementation by Mike Hamburg. Public-domain source-code is available at: http://crypto.stanford.edu/vpaes/ Benchmark on Intel Core2 T8100 (2.1Ghz, no turbo): Old (AMD64 asm): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 8.79 ns/B 108.5 MiB/s 18.46 c/B ECB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B CBC enc | 7.77 ns/B 122.7 MiB/s 16.33 c/B CBC dec | 7.74 ns/B 123.2 MiB/s 16.26 c/B CFB enc | 7.88 ns/B 121.0 MiB/s 16.54 c/B CFB dec | 7.56 ns/B 126.1 MiB/s 15.88 c/B OFB enc | 9.02 ns/B 105.8 MiB/s 18.94 c/B OFB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B CTR enc | 7.80 ns/B 122.2 MiB/s 16.38 c/B CTR dec | 7.81 ns/B 122.2 MiB/s 16.39 c/B New (ssse3): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.77 ns/B 165.2 MiB/s 12.13 c/B ECB dec | 7.13 ns/B 133.7 MiB/s 14.98 c/B CBC enc | 5.27 ns/B 181.0 MiB/s 11.06 c/B CBC dec | 6.39 ns/B 149.3 MiB/s 13.42 c/B CFB enc | 5.27 ns/B 180.9 MiB/s 11.07 c/B CFB dec | 5.28 ns/B 180.7 MiB/s 11.08 c/B OFB enc | 6.11 ns/B 156.1 MiB/s 12.83 c/B OFB dec | 6.13 ns/B 155.5 MiB/s 12.88 c/B CTR enc | 5.26 ns/B 181.5 MiB/s 11.04 c/B CTR dec | 5.24 ns/B 182.0 MiB/s 11.00 c/B Benchmark on Intel i5-2450M (2.5Ghz, no turbo, aes-ni disabled): Old (AMD64 asm): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 8.06 ns/B 118.3 MiB/s 20.15 c/B ECB dec | 8.21 ns/B 116.1 MiB/s 20.53 c/B CBC enc | 7.88 ns/B 121.1 MiB/s 19.69 c/B CBC dec | 7.57 ns/B 126.0 MiB/s 18.92 c/B CFB enc | 7.87 ns/B 121.2 MiB/s 19.67 c/B CFB dec | 7.56 ns/B 126.2 MiB/s 18.89 c/B OFB enc | 8.27 ns/B 115.3 MiB/s 20.67 c/B OFB dec | 8.28 ns/B 115.1 MiB/s 20.71 c/B CTR enc | 8.02 ns/B 119.0 MiB/s 20.04 c/B CTR dec | 8.02 ns/B 118.9 MiB/s 20.05 c/B New (ssse3): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 4.03 ns/B 236.6 MiB/s 10.07 c/B ECB dec | 5.28 ns/B 180.8 MiB/s 13.19 c/B CBC enc | 3.77 ns/B 252.7 MiB/s 9.43 c/B CBC dec | 4.69 ns/B 203.3 MiB/s 11.73 c/B CFB enc | 3.75 ns/B 254.3 MiB/s 9.37 c/B CFB dec | 3.69 ns/B 258.6 MiB/s 9.22 c/B OFB enc | 4.17 ns/B 228.7 MiB/s 10.43 c/B OFB dec | 4.17 ns/B 228.7 MiB/s 10.42 c/B CTR enc | 3.72 ns/B 256.5 MiB/s 9.30 c/B CTR dec | 3.72 ns/B 256.1 MiB/s 9.31 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-12GCM: move Intel PCLMUL accelerated implementation to separate fileJussi Kivilinna1-2/+2
* cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'. * cipher/cipher-gcm-intel-pclmul.c: New. * cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New prototypes. [GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move to 'cipher-gcm-intel-pclmul.c'. (ghash): Rename to... (ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new function in 'cipher-gcm-intel-pclmul.c'. (setupM): Move GCM_USE_INTEL_PCLMUL part to new function in 'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based on available HW acceleration. (do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'. * cipher/internal.h (ghash_fn_t): New. (gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-06rijndael: split Padlock part to separate fileJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'rijndael-padlock.c'. * cipher/rijndael-padlock.c: New. * cipher/rijndael.c (do_padlock, do_padlock_encrypt) (do_padlock_decrypt): Move to 'rijndael-padlock.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-12-01rijndael: split AES-NI functions to separate fileJussi Kivilinna1-1/+2
* cipher/Makefile.in: Add 'rijndael-aesni.c'. * cipher/rijndael-aesni.c: New. * cipher/rijndael-internal.h: New. * cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16) (USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context) (keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'. (u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6) (aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4) (do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move to 'rijndael-aesni.c'. (prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc) (_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt) (_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions in 'rijdael-aesni.c'. * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'. -- Clean-up rijndael.c before new new hardware acceleration support gets added. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02Add ARM/NEON implementation of Poly1305Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'. * cipher/poly1305-armv7-neon.S: New. * cipher/poly1305-internal.h (POLY1305_USE_NEON) (POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE) (POLY1305_NEON_ALIGNMENT): New. * cipher/poly1305.c [POLY1305_USE_NEON] (_gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New. (_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation if HWF_ARM_NEON set. * configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'. -- Add Andrew Moon's public domain NEON implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 12.34 ns/B 77.27 MiB/s 12.44 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 2.12 ns/B 450.7 MiB/s 2.13 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-11-02chacha20: add ARMv7/NEON implementationJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'. * cipher/chacha20-armv7-neon.S: New. * cipher/chacha20.c (USE_NEON): New. [USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New. (chacha20_do_setkey) [USE_NEON]: Use Neon implementation if HWF_ARM_NEON flag set. (selftest): Self-test encrypting buffer byte by byte. * configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'. -- Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 13.45 ns/B 70.92 MiB/s 13.56 c/B STREAM dec | 13.45 ns/B 70.90 MiB/s 13.56 c/B New: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 6.20 ns/B 153.9 MiB/s 6.25 c/B STREAM dec | 6.20 ns/B 153.9 MiB/s 6.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-10-04Add Whirlpool AMD64/SSE2 assembly implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'. * cipher/whirlpool-sse2-amd64.S: New. * cipher/whirlpool.c (USE_AMD64_ASM): New. (whirlpool_tables_s): New. (rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single structure and replace old tables with macros of same name. (tab): New structure containing above tables. [USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64) (whirlpool_transform): New. * configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'. -- Benchmark results: On Intel Core i5-4570 (3.2 Ghz): After: WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B Before: WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B On Intel Core i5-2450M (2.5 Ghz): After: WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B Before: WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B On Intel Core2 T8100 (2.1 Ghz): After: WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B Before: WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B Summary, old vs new ratio: Intel Core i5-4570: 1.88x Intel Core i5-2450M: 1.59x Intel Core2 T8100: 1.94x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-06-28cipher/gost28147: generate optimized s-boxes from compact onesDmitry Eremin-Solenikov1-0/+5
* cipher/gost-s-box.c: New. Outputs optimized expanded representation of s-boxes (4x256) from compact 16x8 representation. * cipher/Makefile.am: Add gost-sb.h dependency to gost28147.lo * cipher/gost.h: Add sbox to the GOST28147_context structure. * cipher/gost28147.c (gost_setkey): Set default s-box to test s-box from GOST R 34.11 (this was the only one S-box before). * cipher/gost28147.c (gost_val): Use sbox from the context. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2014-05-16chacha20: add SSE2/AMD64 optimized implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'. * cipher/chacha20-sse2-amd64.S: New. * cipher/chacha20.c (USE_SSE2): New. [USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New. (chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks function. * configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Intel i5-4570 (haswell), with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3": Old: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.97 ns/B 483.8 MiB/s 6.31 c/B STREAM dec | 1.97 ns/B 483.6 MiB/s 6.31 c/B New: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.931 ns/B 1024.7 MiB/s 2.98 c/B STREAM dec | 0.930 ns/B 1025.0 MiB/s 2.98 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-16poly1305: add AMD64/AVX2 optimized implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'. * cipher/poly1305-avx2-amd64.S: New. * cipher/poly1305-internal.h (POLY1305_USE_AVX2) (POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE) (POLY1305_AVX2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed. * cipher/poly1305.c [POLY1305_USE_AVX2] (_gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if AVX2 supported by CPU. * configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'. -- Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.205 ns/B 4643.5 MiB/s 0.657 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12poly1305: add AMD64/SSE2 optimized implementationJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'. * cipher/poly1305-internal.h (POLY1305_USE_SSE2) (POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE) (POLY1305_SSE2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed. * cipher/poly1305-sse2-amd64.S: New. * cipher/poly1305.c [POLY1305_USE_SSE2] (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New. (_gcry_polu1305_init) [POLY1305_USE_SSE2]: Use SSE2 version. * configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.844 ns/B 1130.2 MiB/s 2.70 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B Benchmarks on Intel i5-2450M (sandy-bridge): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 1.25 ns/B 763.0 MiB/s 3.12 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.605 ns/B 1575.9 MiB/s 1.51 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12Add Poly1305 based cipher AEAD modeJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'cipher-poly1305.c'. * cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'. (_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt) (_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate) (_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New. * cipher/cipher-poly1305.c: New. * cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey) (cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv) (_gcry_cipher_authenticate, _gcry_cipher_gettag) (_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'. (cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ... (_gcry_cipher_setiv): ... here, as with other modes. * src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'. * tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New. (check_ciphers): Add Poly1305 check. (check_cipher_modes): Call 'check_poly1305_cipher'. * tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to bench_aead_... and take nonce as argument. (bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto. (bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench) (bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench) (bench_poly1305_decrypt_do_bench) (bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops) (poly1305_decrypt_ops, poly1305_authenticate_ops): New. (cipher_modes): Add Poly1305. (cipher_bench_one): Add special handling for Poly1305. -- Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant of this mode is proposed for use in TLS and ipsec: https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04 http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-12Add Poly1305 MACJussi Kivilinna1-1/+2
* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and 'poly1305-internal.h'. * cipher/mac-internal.h (poly1305mac_context_s): New. (gcry_mac_handle): Add 'u.poly1305mac'. (_gcry_mac_type_spec_poly1305mac): New. * cipher/mac-poly1305.c: New. * cipher/mac.c (mac_list): Add Poly1305. * cipher/poly1305-internal.h: New. * cipher/poly1305.c: New. * src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'. * tests/basic.c (check_mac): Add Poly1035 test vectors; Allow overriding lengths of data and key buffers. * tests/bench-slope.c (mac_bench): Increase max algo number from 500 to 600. * tests/benchmark.c (mac_bench): Ditto. -- Patch adds Bernstein's Poly1305 message authentication code to libgcrypt. Implementation is based on Andrew Moon's public domain implementation from: https://github.com/floodyberry/poly1305-opt The algorithm added by this patch is the plain Poly1305 without AES and takes 32-bit key that must not be reused. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11chacha20: add AVX2/AMD64 assembly implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'. * cipher/chacha20-avx2-amd64.S: New. * cipher/chacha20.c (USE_AVX2): New macro. [USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New. (chacha20_do_setkey): Select AVX2 implementation if there is HW support. (selftest): Increase size of buf by 256. * configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'. -- Add AVX2 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. SSSE3 (Intel Haswell): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B AVX2: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.393 ns/B 2428.0 MiB/s 1.26 c/B STREAM dec | 0.392 ns/B 2433.6 MiB/s 1.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11chacha20: add SSSE3 assembly implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'. * cipher/chacha20-ssse3-amd64.S: New. * cipher/chacha20.c (USE_SSSE3): New macro. [USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New. (chacha20_do_setkey): Select SSSE3 implementation if there is HW support. * configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'. -- Add SSSE3 optimized implementation for ChaCha20. Based on implementation by Andrew Moon. Before (Intel Haswell): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.97 ns/B 483.6 MiB/s 6.31 c/B STREAM dec | 1.97 ns/B 484.0 MiB/s 6.31 c/B After: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-05-11Add ChaCha20 stream cipherJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'chacha20.c'. * cipher/chacha20.c: New. * cipher/cipher.c (cipher_list): Add ChaCha20. * configure.ac: Add ChaCha20. * doc/gcrypt.texi: Add ChaCha20. * src/cipher.h (_gcry_cipher_spec_chacha20): New. * src/gcrypt.h.in (GCRY_CIPHER_CHACHA20): Add new algo. * tests/basic.c (MAX_DATA_LEN): Increase to 128 from 100. (check_stream_cipher): Add ChaCha20 test-vectors. (check_ciphers): Add ChaCha20. -- Patch adds Bernstein's ChaCha20 cipher to libgcrypt. Implementation is based on public domain implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2014-03-303des: add amd64 assembly implementation for 3DESJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'des-amd64.S'. * cipher/cipher-selftests.c (_gcry_selftest_helper_cbc) (_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Handle failures from 'setkey' function. * cipher/cipher.c (_gcry_cipher_open_internal) [USE_DES]: Setup bulk functions for 3DES. * cipher/des-amd64.S: New file. * cipher/des.c (USE_AMD64_ASM, ATTR_ALIGNED_16): New macros. [USE_AMD64_ASM] (_gcry_3des_amd64_crypt_block) (_gcry_3des_amd64_ctr_enc), _gcry_3des_amd64_cbc_dec) (_gcry_3des_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (tripledes_ecb_crypt): New function. (TRIPLEDES_ECB_BURN_STACK): New macro. (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec) (bulk_selftest_setkey, selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Add call to CTR, CBC and CFB selftest functions. (do_tripledes_encrypt, do_tripledes_decrypt): Use TRIPLEDES_ECB_BURN_STACK. * configure.ac [host=x86-64]: Add 'des-amd64.lo'. * src/cipher.h (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec) (_gcry_3des_cfb_dec): New prototypes. -- Add non-parallel functions for small speed-up and 3-way parallel functions for modes of operation that support parallel processing. Old vs new (Intel Core i5-4570): ================================ enc dec ECB 1.17x 1.17x CBC 1.17x 2.51x CFB 1.16x 2.49x OFB 1.17x 1.17x CTR 2.56x 2.56x Old vs new (Intel Core i5-2450M): ================================= enc dec ECB 1.28x 1.28x CBC 1.27x 2.33x CFB 1.27x 2.34x OFB 1.27x 1.27x CTR 2.36x 2.35x New (Intel Core i5-4570): ========================= 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 28.39 ns/B 33.60 MiB/s 90.84 c/B ECB dec | 28.27 ns/B 33.74 MiB/s 90.45 c/B CBC enc | 29.50 ns/B 32.33 MiB/s 94.40 c/B CBC dec | 13.35 ns/B 71.45 MiB/s 42.71 c/B CFB enc | 29.59 ns/B 32.23 MiB/s 94.68 c/B CFB dec | 13.41 ns/B 71.12 MiB/s 42.91 c/B OFB enc | 28.90 ns/B 33.00 MiB/s 92.47 c/B OFB dec | 28.90 ns/B 33.00 MiB/s 92.48 c/B CTR enc | 13.39 ns/B 71.20 MiB/s 42.86 c/B CTR dec | 13.39 ns/B 71.21 MiB/s 42.86 c/B Old (Intel Core i5-4570): ========================= 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 33.24 ns/B 28.69 MiB/s 106.4 c/B ECB dec | 33.26 ns/B 28.67 MiB/s 106.4 c/B CBC enc | 34.45 ns/B 27.69 MiB/s 110.2 c/B CBC dec | 33.45 ns/B 28.51 MiB/s 107.1 c/B CFB enc | 34.43 ns/B 27.70 MiB/s 110.2 c/B CFB dec | 33.41 ns/B 28.55 MiB/s 106.9 c/B OFB enc | 33.79 ns/B 28.22 MiB/s 108.1 c/B OFB dec | 33.79 ns/B 28.22 MiB/s 108.1 c/B CTR enc | 34.27 ns/B 27.83 MiB/s 109.7 c/B CTR dec | 34.27 ns/B 27.83 MiB/s 109.7 c/B New (Intel Core i5-2450M): ========================== 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 42.21 ns/B 22.59 MiB/s 105.5 c/B ECB dec | 42.23 ns/B 22.58 MiB/s 105.6 c/B CBC enc | 43.70 ns/B 21.82 MiB/s 109.2 c/B CBC dec | 23.25 ns/B 41.02 MiB/s 58.12 c/B CFB enc | 43.71 ns/B 21.82 MiB/s 109.3 c/B CFB dec | 23.23 ns/B 41.05 MiB/s 58.08 c/B OFB enc | 42.73 ns/B 22.32 MiB/s 106.8 c/B OFB dec | 42.73 ns/B 22.32 MiB/s 106.8 c/B CTR enc | 23.31 ns/B 40.92 MiB/s 58.27 c/B CTR dec | 23.35 ns/B 40.84 MiB/s 58.38 c/B Old (Intel Core i5-2450M): ========================== 3DES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 53.98 ns/B 17.67 MiB/s 134.9 c/B ECB dec | 54.00 ns/B 17.66 MiB/s 135.0 c/B CBC enc | 55.43 ns/B 17.20 MiB/s 138.6 c/B CBC dec | 54.27 ns/B 17.57 MiB/s 135.7 c/B CFB enc | 55.42 ns/B 17.21 MiB/s 138.6 c/B CFB dec | 54.35 ns/B 17.55 MiB/s 135.9 c/B OFB enc | 54.49 ns/B 17.50 MiB/s 136.2 c/B OFB dec | 54.49 ns/B 17.50 MiB/s 136.2 c/B CTR enc | 55.02 ns/B 17.33 MiB/s 137.5 c/B CTR dec | 55.01 ns/B 17.34 MiB/s 137.5 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-30Add blowfish/serpent ARM assembly files to Makefile.amJussi Kivilinna1-2/+2
* cipher/Makefile.am: Add 'blowfish-arm.S' and 'serpent-armv7-neon.S'. -- Fix for bug https://bugs.g10code.com/gnupg/issue1584 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-30Add AMD64 assembly implementation for arcfourJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'arcfour-amd64.S'. * cipher/arcfour-amd64.S: New. * cipher/arcfour.c (USE_AMD64_ASM): New. [USE_AMD64_ASM] (ARCFOUR_context, _gcry_arcfour_amd64) (encrypt_stream): New. * configure.ac [host=x86_64]: Add 'arcfour-amd64.lo'. -- Patch adds Marc Bevand's public-domain AMD64 assembly implementation of RC4 to libgcrypt. Original implementation is at: http://www.zorinaq.com/papers/rc4-amd64.html Benchmarks on Intel i5-4570 (3200 Mhz): New: ARCFOUR | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.29 ns/B 737.7 MiB/s 4.14 c/B STREAM dec | 1.31 ns/B 730.6 MiB/s 4.18 c/B Old (C-language): ARCFOUR | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 2.09 ns/B 457.4 MiB/s 6.67 c/B STREAM dec | 2.09 ns/B 457.2 MiB/s 6.68 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-18Add ARM/NEON implementation for SHA-1Jussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'sha1-armv7-neon.S'. * cipher/sha1-armv7-neon.S: New. * cipher/sha1.c (USE_NEON): New. (SHA1_CONTEXT, sha1_init) [USE_NEON]: Add and initialize 'use_neon'. [USE_NEON] (_gcry_sha1_transform_armv7_neon): New. (transform) [USE_NEON]: Use ARM/NEON assembly if enabled. * configure.ac: Add 'sha1-armv7-neon.lo'. -- Patch adds ARM/NEON implementation for SHA-1. Benchmarks show 1.72x improvement on ARM Cortex-A8, 1008 Mhz: jussi@cubie:~/libgcrypt$ tests/bench-slope --cpu-mhz 1008 hash sha1 Hash: | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 7.80 ns/B 122.3 MiB/s 7.86 c/B = jussi@cubie:~/libgcrypt$ tests/bench-slope --disable-hwf arm-neon --cpu-mhz 1008 hash sha1 Hash: | nanosecs/byte mebibytes/sec cycles/byte SHA1 | 13.41 ns/B 71.10 MiB/s 13.52 c/B = Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-18Add AVX and AVX2/BMI implementations for SHA-256Jussi Kivilinna1-1/+1
* LICENSES: Add 'cipher/sha256-avx-amd64.S' and 'cipher/sha256-avx2-bmi2-amd64.S'. * cipher/Makefile.am: Add 'sha256-avx-amd64.S' and 'sha256-avx2-bmi2-amd64.S'. * cipher/sha256-avx-amd64.S: New. * cipher/sha256-avx2-bmi2-amd64.S: New. * cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in few places for tiny speed improvement. * cipher/sha256.c (USE_AVX, USE_AVX2): New. (SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'. (sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above new context members. [USE_AVX] (_gcry_sha256_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New. (transform) [USE_AVX2]: Use AVX2 assembly if enabled. (transform) [USE_AVX]: Use AVX assembly if enabled. * configure.ac: Add 'sha256-avx-amd64.lo' and 'sha256-avx2-bmi2-amd64.lo'. -- Patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA - 256 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so therefore AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. Benchmarks: cpu C-lang SSSE3 AVX/AVX2 C vs AVX/AVX2 vs SSSE3 Intel i5-4570 13.86 c/B 10.27 c/B 8.70 c/B 1.59x 1.18x Intel i5-2450M 17.25 c/B 12.36 c/B 10.31 c/B 1.67x 1.19x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-17Add AVX and AVX/BMI2 implementations for SHA-1Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'sha1-avx-amd64.S' and 'sha1-avx-bmi2-amd64.S'. * cipher/sha1-avx-amd64.S: New. * cipher/sha1-avx-bmi2-amd64.S: New. * cipher/sha1.c (USE_AVX, USE_BMI2): New. (SHA1_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA1_CONTEXT) [USE_BMI2]: Add 'use_bmi2'. (sha1_init): Initialize 'use_avx' and 'use_bmi2'. [USE_AVX] (_gcry_sha1_transform_amd64_avx): New. [USE_BMI2] (_gcry_sha1_transform_amd64_bmi2): New. (transform) [USE_BMI2]: Use BMI2 assembly if enabled. (transform) [USE_AVX]: Use AVX assembly if enabled. * configure.ac: Add 'sha1-avx-amd64.lo' and 'sha1-avx-bmi2-amd64.lo'. -- Patch adds AVX (for Sandybridge and Ivybridge) and AVX/BMI2 (for Haswell) optimized implementations of SHA-1. Note: AVX implementation is currently limited to Intel CPUs due to use of SHLD instruction for faster rotations on Sandybrigde. Benchmarks: cpu C-version SSSE3 AVX/(SHLD|BMI2) New vs C New vs SSSE3 Intel i5-4570 8.84 c/B 4.61 c/B 3.86 c/B 2.29x 1.19x Intel i5-2450M 9.45 c/B 5.30 c/B 4.39 c/B 2.15x 1.20x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-14Minor fixes to SHA assembly implementationsJussi Kivilinna1-2/+3
* cipher/Makefile.am: Correct 'sha256-avx*.S' to 'sha512-avx*.S'. * cipher/sha1-ssse3-amd64.S: First line, correct filename. * cipher/sha256-ssse3-amd64.S: Return correct stack burn depth. * cipher/sha512-avx-amd64.S: Use 'vzeroall' to clear registers. * cipher/sha512-avx2-bmi2-amd64.S: Ditto and return correct stack burn depth. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13Convert SHA-1 SSSE3 implementation from mixed asm&C to pure asmJussi Kivilinna1-1/+1
* cipher/Makefile.am: Change 'sha1-ssse3-amd64.c' to 'sha1-ssse3-amd64.S'. * cipher/sha1-ssse3-amd64.c: Remove. * cipher/sha1-ssse3-amd64.S: New. -- Mixed C&asm implementation appears to trigger GCC bugs easily. Therefore convert SSSE3 implementation to pure assembly for safety. Benchmark also show smallish speed improvement. cpu C&asm asm Intel i5-4570 5.22 c/B 5.09 c/B Intel i5-2450M 7.24 c/B 7.00 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13SHA-1: Add SSSE3 implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'sha1-ssse3-amd64.c'. * cipher/sha1-ssse3-amd64.c: New. * cipher/sha1.c (USE_SSSE3): New. (SHA1_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'. (sha1_init) [USE_SSSE3]: Initialize 'use_ssse3'. (transform): Rename to... (_transform): this. (transform): New. * configure.ac [host=x86_64]: Add 'sha1-ssse3-amd64.lo'. -- Patch adds SSSE3 implementation based on white paper "Improving the Performance of the Secure Hash Algorithm (SHA-1)" at http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1 Benchmarks: cpu Old New Diff Intel i5-4570 9.02 c/B 5.22 c/B 1.72x Intel i5-2450M 12.27 c/B 7.24 c/B 1.69x Intel Core2 T8100 7.94 c/B 6.76 c/B 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13SHA-512: Add AVX and AVX2 implementations for x86-64Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and 'sha512-avx2-bmi2-amd64.S'. * cipher/sha512-avx-amd64.S: New. * cipher/sha512-avx2-bmi2-amd64.S: New. * cipher/sha512.c (USE_AVX, USE_AVX2): New. (SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'. (sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'. (sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'. [USE_AVX] (_gcry_sha512_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New. (transform) [USE_AVX2]: Add call for AVX2 implementation. (transform) [USE_AVX]: Add call for AVX implementation. * configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check. (sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'. * doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'. * src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New. * src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2". * src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and HWF_INTEL_BMI2. -- Patch adds fast AVX and AVX2 implementation of SHA-512 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA512 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$ Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so therefore AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. Benchmarks: cpu Old SSSE3 AVX/AVX2 Old vs AVX/AVX2 vs SSSE3 Intel i5-4570 10.11 c/B 7.56 c/B 6.72 c/B 1.50x 1.12x Intel i5-2450M 14.11 c/B 10.53 c/B 8.88 c/B 1.58x 1.18x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13SHA-512: Add SSSE3 implementation for x86-64Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'sha512-ssse3-amd64.S'. * cipher/sha512-ssse3-amd64.S: New. * cipher/sha512.c (USE_SSSE3): New. (SHA512_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'. (sha512_init, sha384_init) [USE_SSSE3]: Initialize 'use_ssse3'. [USE_SSSE3] (_gcry_sha512_transform_amd64_ssse3): New. (transform) [USE_SSSE3]: Call SSSE3 implementation. * configure.ac (sha512): Add 'sha512-ssse3-amd64.lo'. -- Patch adds fast SSSE3 implementation of SHA-512 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA512 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementations-ia-processors-paper.html Benchmarks: cpu Old New Diff Intel i5-4570 10.11 c/B 7.56 c/B 1.33x Intel i5-2450M 14.11 c/B 10.53 c/B 1.33x Intel Core2 T8100 11.92 c/B 10.22 c/B 1.16x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-12SHA-256: Add SSSE3 implementation for x86-64Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'sha256-ssse3-amd64.S'. * cipher/sha256-ssse3-amd64.S: New. * cipher/sha256.c (USE_SSSE3): New. (SHA256_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'. (sha256_init, sha224_init) [USE_SSSE3]: Initialize 'use_ssse3'. (transform): Rename to... (_transform): This. [USE_SSSE3] (_gcry_sha256_transform_amd64_ssse3): New. (transform): New. * configure.ac (HAVE_INTEL_SYNTAX_PLATFORM_AS): New check. (sha256): Add 'sha256-ssse3-amd64.lo'. * doc/gcrypt.texi: Document 'intel-ssse3'. * src/g10lib.h (HWF_INTEL_SSSE3): New. * src/hwfeatures.c (hwflist): Add "intel-ssse3". * src/hwf-x86.c (detect_x86_gnuc): Test for SSSE3. -- Patch adds fast SSSE3 implementation of SHA-256 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA - 256 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html Benchmarks: cpu Old New Diff Intel i5-4570 13.99 c/B 10.66 c/B 1.31x Intel i5-2450M 21.53 c/B 15.79 c/B 1.36x Intel Core2 T8100 20.84 c/B 15.07 c/B 1.38x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-11-21Add GMAC to MAC APIJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'mac-gmac.c'. * cipher/mac-gmac.c: New. * cipher/mac-internal.h (gcry_mac_handle): Add 'u.gcm'. (_gcry_mac_type_spec_gmac_aes, _gcry_mac_type_spec_gmac_twofish) (_gcry_mac_type_spec_gmac_serpent, _gcry_mac_type_spec_gmac_seed) (_gcry_mac_type_spec_gmac_camellia): New externs. * cipher/mac.c (mac_list): Add GMAC specifications. * doc/gcrypt.texi: Add mention of GMAC. * src/gcrypt.h.in (gcry_mac_algos): Add GCM algorithms. * tests/basic.c (check_one_mac): Add support for MAC IVs. (check_mac): Add support for MAC IVs and add GMAC test vectors. * tests/bench-slope.c (mac_bench): Iterate algorithm numbers to 499. * tests/benchmark.c (mac_bench): Iterate algorithm numbers to 499. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-11-19Initial implementation of GCMDmitry Eremin-Solenikov1-1/+1
* cipher/Makefile.am: Add 'cipher-gcm.c'. * cipher/cipher-ccm.c (_gcry_ciphert_ccm_set_lengths) (_gcry_cipher_ccm_authenticate, _gcry_cipher_ccm_tag) (_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt): Change 'c->u_mode.ccm.tag' to 'c->marks.tag'. * cipher/cipher-gcm.c: New. * cipher/cipher-internal.h (GCM_USE_TABLES): New. (gcry_cipher_handle): Add 'marks.tag', 'u_tag', 'length' and 'gcm_table'; Remove 'u_mode.ccm.tag'. (_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt) (_gcry_cipher_gcm_setiv, _gcry_cipher_gcm_authenticate) (_gcry_cipher_gcm_get_tag, _gcry_cipher_gcm_check_tag): New. * cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey) (cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate) (_gcry_cipher_gettag, _gcry_cipher_checktag): Add GCM mode handling. * src/gcrypt.h.in (gcry_cipher_modes): Add GCRY_CIPHER_MODE_GCM. (GCRY_GCM_BLOCK_LEN): New. * tests/basic.c (check_gcm_cipher): New. (check_ciphers): Add GCM check. (check_cipher_modes): Call 'check_gcm_cipher'. * tests/bench-slope.c (bench_gcm_encrypt_do_bench) (bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench) (gcm_encrypt_ops, gcm_decrypt_ops, gcm_authenticate_ops): New. (cipher_modes): Add GCM enc/dec/auth. (cipher_bench_one): Limit GCM to block ciphers with 16 byte block-size. * tests/benchmark.c (cipher_bench): Add GCM. -- Currently it is still quite slow. Still no support for generate_iv(). Is it really necessary? TODO: Merge/reuse cipher-internal state used by CCM. Changelog entry will be present in final patch submission. Changes since v1: - 6x-7x speedup. - added bench-slope support Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> [jk: mangle new file throught 'indent -nut'] [jk: few fixes] [jk: changelog]
2013-11-19Add CMAC (Cipher-based MAC) to MAC APIJussi Kivilinna1-2/+2
* cipher/Makefile.am: Add 'cipher-cmac.c' and 'mac-cmac.c'. * cipher/cipher-cmac.c: New. * cipher/cipher-internal.h (gcry_cipher_handle.u_mode): Add 'cmac'. * cipher/cipher.c (gcry_cipher_open): Rename to... (_gcry_cipher_open_internal): ...this and add CMAC. (gcry_cipher_open): New wrapper that disallows use of internal modes (CMAC) from outside. (cipher_setkey, cipher_encrypt, cipher_decrypt) (_gcry_cipher_authenticate, _gcry_cipher_gettag) (_gcry_cipher_checktag): Add handling for CMAC mode. (cipher_reset): Do not reset 'marks.key' and do not clear subkeys in 'u_mode' in CMAC mode. * cipher/mac-cmac.c: New. * cipher/mac-internal.h: Add CMAC support and algorithms. * cipher/mac.c: Add CMAC algorithms. * doc/gcrypt.texi: Add documentation for CMAC. * src/cipher.h (gcry_cipher_internal_modes): New. (_gcry_cipher_open_internal, _gcry_cipher_cmac_authenticate) (_gcry_cipher_cmac_get_tag, _gcry_cipher_cmac_check_tag) (_gcry_cipher_cmac_set_subkeys): New prototypes. * src/gcrypt.h.in (gcry_mac_algos): Add CMAC algorithms. * tests/basic.c (check_mac): Add CMAC test vectors. -- Patch adds CMAC (Cipher-based MAC) as defined in RFC 4493 and NIST Special Publication 800-38B. Internally CMAC is added to cipher module, but is available to outside only through MAC API. [v2]: - Add documentation. [v3]: - CMAC algorithm ids start from 201. - Coding style fixes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-11-16Add new MAC API, initially with HMACJussi Kivilinna1-0/+2
* cipher/Makefile.am: Add 'mac.c', 'mac-internal.h' and 'mac-hmac.c'. * cipher/bufhelp.h (buf_eq_const): New. * cipher/cipher-ccm.c (_gcry_cipher_ccm_tag): Use 'buf_eq_const' for constant-time compare. * cipher/mac-hmac.c: New. * cipher/mac-internal.h: New. * cipher/mac.c: New. * doc/gcrypt.texi: Add documentation for MAC API. * src/gcrypt-int.h [GPG_ERROR_VERSION_NUMBER < 1.13] (GPG_ERR_MAC_ALGO): New. * src/gcrypt.h.in (gcry_mac_handle, gcry_mac_hd_t, gcry_mac_algos) (gcry_mac_flags, gcry_mac_open, gcry_mac_close, gcry_mac_ctl) (gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write) (gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen) (gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name) (gcry_mac_reset, gcry_mac_test_algo): New. * src/libgcrypt.def (gcry_mac_open, gcry_mac_close, gcry_mac_ctl) (gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write) (gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen) (gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New. * src/libgcrypt.vers (gcry_mac_open, gcry_mac_close, gcry_mac_ctl) (gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write) (gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen) (gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New. * src/visibility.c (gcry_mac_open, gcry_mac_close, gcry_mac_ctl) (gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write) (gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen) (gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New. * src/visibility.h (gcry_mac_open, gcry_mac_close, gcry_mac_ctl) (gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write) (gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen) (gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New. * tests/basic.c (check_one_mac, check_mac): New. (main): Call 'check_mac'. * tests/bench-slope.c (bench_print_header, bench_print_footer): Allow variable algorithm name width. (_cipher_bench, hash_bench): Update to above change. (bench_hash_do_bench): Add 'gcry_md_reset'. (bench_mac_mode, bench_mac_init, bench_mac_free, bench_mac_do_bench) (mac_ops, mac_modes, mac_bench_one, _mac_bench, mac_bench): New. (main): Add 'mac' benchmark options. * tests/benchmark.c (mac_repetitions, mac_bench): New. (main): Add 'mac' benchmark options. -- Add MAC API, with HMAC algorithms. Internally uses HMAC functionality of the MD module. [v2]: - Add documentation for MAC API. - Change length argument for gcry_mac_read from size_t to size_t* for returning number of written bytes. [v3]: - HMAC algorithm ids start from 101. - Fix coding style for new files. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-28Add ARM NEON assembly implementation of Salsa20Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'. * cipher/salsa20-armv7-neon.S: New. * cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro. (struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t) (salsa20_ivsetup_t): New. (SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'. (SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'. (salsa20_core): Change 'src' argument to 'ctx'. [USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype. [USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon) (salsa20_ivsetup_neon): New. (salsa20_do_setkey): Setup keysetup, ivsetup and core with default functions. (salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support detect, set keysetup, ivsetup and core with ARM NEON functions. (salsa20_do_setkey): Call 'ctx->keysetup'. (salsa20_setiv): Call 'ctx->ivsetup'. (salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers in ARM NEON implementation. (salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly calling 'salsa20_core'. (selftest): Add test to check large buffer processing and block counter updating. * configure.ac [neonsupport]: 'Add salsa20-armv7-neon.lo'. -- Patch adds fast ARM NEON assembly implementation for Salsa20. Implementation gains extra speed by processing three blocks in parallel with help of ARM NEON vector processing unit. This implementation is based on public domain code by Peter Schwabe and D. J. Bernstein and it is available in SUPERCOP benchmarking framework. For more details on this work, check paper "NEON crypto" by Daniel J. Bernstein and Peter Schwabe: http://cryptojedi.org/papers/#neoncrypto Benchmark results on Cortex-A8 (1008 Mhz): Before: SALSA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 18.88 ns/B 50.51 MiB/s 19.03 c/B STREAM dec | 18.89 ns/B 50.49 MiB/s 19.04 c/B = SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 13.60 ns/B 70.14 MiB/s 13.71 c/B STREAM dec | 13.60 ns/B 70.13 MiB/s 13.71 c/B After: SALSA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 5.48 ns/B 174.1 MiB/s 5.52 c/B STREAM dec | 5.47 ns/B 174.2 MiB/s 5.52 c/B = SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 3.65 ns/B 260.9 MiB/s 3.68 c/B STREAM dec | 3.65 ns/B 261.6 MiB/s 3.67 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-28Add AMD64 assembly implementation of Salsa20Jussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'salsa20-amd64.S'. * cipher/salsa20-amd64.S: New. * cipher/salsa20.c (USE_AMD64): New macro. [USE_AMD64] (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup) (_gcry_salsa20_amd64_encrypt_blocks): New prototypes. [USE_AMD64] (salsa20_keysetup, salsa20_ivsetup, salsa20_core): New. [!USE_AMD64] (salsa20_core): Change 'src' to non-constant, update block counter in 'salsa20_core' and return burn stack depth. [!USE_AMD64] (salsa20_keysetup, salsa20_ivsetup): New. (salsa20_do_setkey): Move generic key setup to 'salsa20_keysetup'. (salsa20_setkey): Fix burn stack depth. (salsa20_setiv): Move generic IV setup to 'salsa20_ivsetup'. (salsa20_do_encrypt_stream) [USE_AMD64]: Process large buffers in AMD64 implementation. (salsa20_do_encrypt_stream): Move stack burning to this function... (salsa20_encrypt_stream, salsa20r12_encrypt_stream): ...from these functions. * configure.ac [x86-64]: Add 'salsa20-amd64.lo'. -- Patch adds fast AMD64 assembly implementation for Salsa20. This implementation is based on public domain code by D. J. Bernstein and it is available at http://cr.yp.to/snuffle.html (amd64-xmm6). Implementation gains extra speed by processing four blocks in parallel with help SSE2 instructions. Benchmark results on Intel Core i5-4570 (3.2 Ghz): Before: SALSA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 3.88 ns/B 246.0 MiB/s 12.41 c/B STREAM dec | 3.88 ns/B 246.0 MiB/s 12.41 c/B = SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 2.46 ns/B 387.9 MiB/s 7.87 c/B STREAM dec | 2.46 ns/B 387.7 MiB/s 7.87 c/B After: SALSA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.985 ns/B 967.8 MiB/s 3.15 c/B STREAM dec | 0.987 ns/B 966.5 MiB/s 3.16 c/B = SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.636 ns/B 1500.5 MiB/s 2.03 c/B STREAM dec | 0.636 ns/B 1499.2 MiB/s 2.04 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-23Enable assembler optimizations on earlier ARM coresDmitry Eremin-Solenikov1-4/+4
* cipher/blowfish-armv6.S => cipher/blowfish-arm.S: adapt to pre-armv6 CPUs. * cipher/blowfish.c: enable assembly on armv4/armv5 little-endian CPUs. * cipher/camellia-armv6.S => cipher/camellia-arm.S: adapt to pre-armv6 CPUs. * cipher/camellia.c, cipher-camellia-glue.c: enable assembly on armv4/armv5 little-endian CPUs. * cipher/cast5-armv6.S => cipher/cast5-arm.S: adapt to pre-armv6 CPUs. * cipher/cast5.c: enable assembly on armv4/armv5 little-endian CPUs. * cipher/rijndael-armv6.S => cipher/rijndael-arm.S: adapt to pre-armv6 CPUs. * cipher/rijndael.c: enable assembly on armv4/armv5 little-endian CPUs. * cipher/twofish-armv6.S => cipher/twofish-arm.S: adapt to pre-armv6 CPUs. * cipher/twofish.c: enable assembly on armv4/armv5 little-endian CPUs. -- Our ARMv6 assembly optimized code can be easily adapted to earlier CPUs. The only incompatible place is rev instruction used to do byte swapping. Replace it on <= ARMv6 with a series of 4 instructions. Compare: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- AES 620ms 610ms 650ms 680ms 620ms 630ms 660ms 660ms 630ms 630ms CAMELLIA128 720ms 720ms 780ms 790ms 770ms 760ms 780ms 780ms 770ms 760ms CAMELLIA256 910ms 910ms 970ms 970ms 960ms 950ms 970ms 970ms 960ms 950ms CAST5 820ms 820ms 930ms 920ms 890ms 860ms 930ms 920ms 880ms 890ms BLOWFISH 550ms 560ms 650ms 660ms 630ms 600ms 660ms 650ms 610ms 620ms ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- AES 130ms 140ms 180ms 200ms 160ms 170ms 190ms 200ms 170ms 170ms CAMELLIA128 150ms 160ms 210ms 220ms 200ms 190ms 210ms 220ms 190ms 190ms CAMELLIA256 180ms 180ms 260ms 240ms 240ms 230ms 250ms 250ms 230ms 230ms CAST5 170ms 160ms 270ms 120ms 240ms 130ms 260ms 270ms 130ms 120ms BLOWFISH 160ms 150ms 260ms 110ms 230ms 120ms 250ms 260ms 110ms 120ms Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> [ jk: in camellia.h and twofish.c, USE_ARMV6_ASM => USE_ARM_ASM ] [ jk: fix blowfish-arm.S when __ARM_FEATURE_UNALIGNED defined ] [ jk: in twofish.S remove defined(HAVE_ARM_ARCH_V6) ] [ jk: ARMv6 => ARM in comments ]
2013-10-23ecc: Refactor ecc.cWerner Koch1-0/+1
* cipher/ecc-ecdsa.c, cipher/ecc-eddsa.c, cipher/ecc-gost.c: New. * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add new files. * configure.ac (GCRYPT_PUBKEY_CIPHERS): Add new files. * cipher/ecc.c (point_init, point_free): Move to ecc-common.h. (sign_ecdsa): Move to ecc-ecdsa.c as _gcry_ecc_ecdsa_sign. (verify_ecdsa): Move to ecc-ecdsa.c as _gcry_ecc_ecdsa_verify. (sign_gost): Move to ecc-gots.c as _gcry_ecc_gost_sign. (verify_gost): Move to ecc-gost.c as _gcry_ecc_gost_verify. (sign_eddsa): Move to ecc-eddsa.c as _gcry_ecc_eddsa_sign. (verify_eddsa): Move to ecc-eddsa.c as _gcry_ecc_eddsa_verify. (eddsa_generate_key): Move to ecc-eddsa.c as _gcry_ecc_eddsa_genkey. (reverse_buffer): Move to ecc-eddsa.c. (eddsa_encodempi, eddsa_encode_x_y): Ditto. (_gcry_ecc_eddsa_encodepoint, _gcry_ecc_eddsa_decodepoint): Ditto. -- This change should make it easier to add new ECC algorithms. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-10-22twofish: add ARMv6 assembly implementationJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'twofish-armv6.S'. * cipher/twofish-armv6.S: New. * cipher/twofish.c (USE_ARMV6_ASM): New macro. [USE_ARMV6_ASM] (_gcry_twofish_armv6_encrypt_block) (_gcry_twofish_armv6_decrypt_block): New prototypes. [USE_AMDV6_ASM] (twofish_encrypt, twofish_decrypt): Add. [USE_AMD64_ASM] (do_twofish_encrypt, do_twofish_decrypt): Remove. (_gcry_twofish_ctr_enc, _gcry_twofish_cfb_dec): Use 'twofish_encrypt' instead of 'do_twofish_encrypt'. (_gcry_twofish_cbc_dec): Use 'twofish_decrypt' instead of 'do_twofish_decrypt'. * configure.ac [arm]: Add 'twofish-armv6.lo'. -- Add optimized ARMv6 assembly implementation for Twofish. Implementation is tuned for Cortex-A8. Unaligned access handling is done in assembly part. For now, only enable this on little-endian systems as big-endian correctness have not been tested yet. Old (gcc-4.8) vs new (twofish-asm), Cortex-A8 (on armhf): ECB/Stream CBC CFB OFB CTR CCM --------------- --------------- --------------- --------------- --------------- --------------- TWOFISH 1.23x 1.25x 1.16x 1.26x 1.16x 1.30x 1.18x 1.17x 1.23x 1.23x 1.22x 1.22x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-22Add Counter with CBC-MAC mode (CCM)Jussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'cipher-ccm.c'. * cipher/cipher-ccm.c: New. * cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode'. (_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt) (_gcry_cipher_ccm_set_nonce, _gcry_cipher_ccm_authenticate) (_gcry_cipher_ccm_get_tag, _gcry_cipher_ccm_check_tag) (_gcry_cipher_ccm_set_lengths): New prototypes. * cipher/cipher.c (gcry_cipher_open, cipher_encrypt, cipher_decrypt) (_gcry_cipher_setiv, _gcry_cipher_authenticate, _gcry_cipher_gettag) (_gcry_cipher_checktag, gry_cipher_ctl): Add handling for CCM mode. * doc/gcrypt.texi: Add documentation for GCRY_CIPHER_MODE_CCM. * src/gcrypt.h.in (gcry_cipher_modes): Add 'GCRY_CIPHER_MODE_CCM'. (gcry_ctl_cmds): Add 'GCRYCTL_SET_CCM_LENGTHS'. (GCRY_CCM_BLOCK_LEN): New. * tests/basic.c (check_ccm_cipher): New. (check_cipher_modes): Call 'check_ccm_cipher'. * tests/benchmark.c (ccm_aead_init): New. (cipher_bench): Add handling for AEAD modes and add CCM benchmarking. -- Patch adds CCM (Counter with CBC-MAC) mode as defined in RFC 3610 and NIST Special Publication 800-38C. Example for encrypting message (split in two buffers; buf1, buf2) and authenticating additional non-encrypted data (split in two buffers; aadbuf1, aadbuf2) with authentication tag length of eigth bytes: size_t params[3]; taglen = 8; gcry_cipher_setkey(h, key, len(key)); gcry_cipher_setiv(h, nonce, len(nonce)); params[0] = len(buf1) + len(buf2); /* 0: enclen */ params[1] = len(aadbuf1) + len(aadbuf2); /* 1: aadlen */ params[2] = taglen; /* 2: authtaglen */ gcry_cipher_ctl(h, GCRYCTL_SET_CCM_LENGTHS, params, sizeof(size_t) * 3); gcry_cipher_authenticate(h, aadbuf1, len(aadbuf1)); gcry_cipher_authenticate(h, aadbuf2, len(aadbuf2)); gcry_cipher_encrypt(h, buf1, len(buf1), buf1, len(buf1)); gcry_cipher_encrypt(h, buf2, len(buf2), buf2, len(buf2)); gcry_cipher_gettag(h, tag, taglen); Example for decrypting above message and checking authentication tag: size_t params[3]; taglen = 8; gcry_cipher_setkey(h, key, len(key)); gcry_cipher_setiv(h, nonce, len(nonce)); params[0] = len(buf1) + len(buf2); /* 0: enclen */ params[1] = len(aadbuf1) + len(aadbuf2); /* 1: aadlen */ params[2] = taglen; /* 2: authtaglen */ gcry_cipher_ctl(h, GCRYCTL_SET_CCM_LENGTHS, params, sizeof(size_t) * 3); gcry_cipher_authenticate(h, aadbuf1, len(aadbuf1)); gcry_cipher_authenticate(h, aadbuf2, len(aadbuf2)); gcry_cipher_decrypt(h, buf1, len(buf1), buf1, len(buf1)); gcry_cipher_decrypt(h, buf2, len(buf2), buf2, len(buf2)); err = gcry_cipher_checktag(h, tag, taglen); if (gpg_err_code (err) == GPG_ERR_CHECKSUM) { /* Authentication failed. */ } else if (err == 0) { /* Authentication ok. */ } Example for encrypting message without additional authenticated data: size_t params[3]; taglen = 10; gcry_cipher_setkey(h, key, len(key)); gcry_cipher_setiv(h, nonce, len(nonce)); params[0] = len(buf1); /* 0: enclen */ params[1] = 0; /* 1: aadlen */ params[2] = taglen; /* 2: authtaglen */ gcry_cipher_ctl(h, GCRYCTL_SET_CCM_LENGTHS, params, sizeof(size_t) * 3); gcry_cipher_encrypt(h, buf1, len(buf1), buf1, len(buf1)); gcry_cipher_gettag(h, tag, taglen); To reset CCM state for cipher handle, one can either set new nonce or use 'gcry_cipher_reset'. This implementation reuses existing CTR mode code for encryption/decryption and is there for able to process multiple buffers that are not multiple of blocksize. AAD data maybe also be passed into gcry_cipher_authenticate in non-blocksize chunks. [v4]: GCRYCTL_SET_CCM_PARAMS => GCRY_SET_CCM_LENGTHS Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-08pubkey: Move sexp parsing for gcry_pk_getkey to the modules.Werner Koch1-1/+1
* cipher/pubkey-util.c: New. (_gcry_pk_util_get_nbits): New. Based on code from gcry_pk_genkey. (_gcry_pk_util_get_rsa_use_e): Ditto. * cipher/pubkey.c (gcry_pk_genkey): Strip most code and pass. * cipher/rsa.c (rsa_generate): Remove args ALGO, NBITS and EVALUE. Call new fucntions to get these values. * cipher/dsa.c (dsa_generate): Remove args ALGO, NBITS and EVALUE. Call _gcry_pk_util_get_nbits to get nbits. Always parse genparms. * cipher/elgamal.c (elg_generate): Ditto. * cipher/ecc.c (ecc_generate): Ditto. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-09-19pk: Move RSA encoding functions to a new file.Werner Koch1-1/+1
* cipher/rsa-common: New. * cipher/pubkey.c (pkcs1_encode_for_encryption): Move to rsa-common.c and rename to _gcry_rsa_pkcs1_encode_for_enc. (pkcs1_decode_for_encryption): Move to rsa-common.c and rename to _gcry_rsa_pkcs1_decode_for_enc. (pkcs1_encode_for_signature): Move to rsa-common.c and rename to _gcry_rsa_pkcs1_encode_for_sig. (oaep_encode): Move to rsa-common.c and rename to _gcry_rsa_oaep_encode. (oaep_decode): Move to rsa-common.c and rename to _gcry_rsa_oaep_decode. (pss_encode): Move to rsa-common.c and rename to _gcry_rsa_pss_encode. (pss_verify): Move to rsa-common.c and rename to _gcry_rsa_pss_decode. (octet_string_from_mpi, mgf1): Move to rsa-common.c. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-09-18Add GOST R 34.11-2012 implementation (Stribog)Dmitry Eremin-Solenikov1-0/+1
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_12_256) (GCRY_MD_GOSTR3411_12_512): New. * cipher/stribog.c: New. * configure.ac (available_digests_64): Add stribog. * src/cipher.h: Declare Stribog declarations. * cipher/md.c: Register Stribog digest. * tests/basic.c (check_digests) Add 4 testcases for Stribog from standard. * doc/gcrypt.texi: Document new constants. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>