path: root/cipher/keccak.c
Age | Commit message | Author | Files | Lines
2015-11-01 | Add ARMv7/NEON implementation of Keccak | Jussi Kivilinna | 1 | -3/+68

* cipher/Makefile.am: Add 'keccak-armv7-neon.S'.
* cipher/keccak-armv7-neon.S: New.
* cipher/keccak.c (USE_64BIT_ARM_NEON): New.
(NEED_COMMON64): Select if USE_64BIT_ARM_NEON.
[NEED_COMMON64] (round_consts_64bit): Rename to...
[NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add
terminator at end.
[USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon)
(_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon)
(keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New.
(keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation
if supported by HW.
* cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update
to use new round constant table.
* configure.ac: Add 'keccak-armv7-neon.lo'.
--

Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch
is based on public-domain implementation by Ronny Van Keer from
SUPERCOP package:
 https://github.com/floodyberry/supercop/blob/master/crypto_hash/\
 keccakc1024/inplace-armv7a-neon/keccak2.s

Benchmark results on Cortex-A8 @ 1008 MHz:

Before (generic 32-bit bit-interleaved impl.):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHAKE128 |     83.00 ns/B     11.49 MiB/s     83.67 c/B
 SHAKE256 |     101.7 ns/B      9.38 MiB/s     102.5 c/B
 SHA3-224 |     96.13 ns/B      9.92 MiB/s     96.90 c/B
 SHA3-256 |     101.5 ns/B      9.40 MiB/s     102.3 c/B
 SHA3-384 |     131.4 ns/B      7.26 MiB/s     132.5 c/B
 SHA3-512 |     189.1 ns/B      5.04 MiB/s     190.6 c/B

After (ARM/NEON, ~3.2x faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHAKE128 |     25.09 ns/B     38.01 MiB/s     25.29 c/B
 SHAKE256 |     30.95 ns/B     30.82 MiB/s     31.19 c/B
 SHA3-224 |     29.24 ns/B     32.61 MiB/s     29.48 c/B
 SHA3-256 |     30.95 ns/B     30.82 MiB/s     31.19 c/B
 SHA3-384 |     40.42 ns/B     23.59 MiB/s     40.74 c/B
 SHA3-512 |     58.37 ns/B     16.34 MiB/s     58.84 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-01 | Optimize Keccak 64-bit absorb functions | Jussi Kivilinna | 1 | -66/+93

* cipher/keccak.c [USE_64BIT] [__x86_64__] (absorb_lanes64_8)
(absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New.
* cipher/keccak.c [USE_64BIT] [!__x86_64__] (absorb_lanes64_8)
(absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New.
[USE_64BIT] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT] (keccak_absorb_lanes64): Remove.
[USE_64BIT_SHLD] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT_SHLD] (keccak_absorb_lanes64_shld): Remove.
[USE_64BIT_BMI2] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT_BMI2] (keccak_absorb_lanes64_bmi2): Remove.
* cipher/keccak_permute_64.h (KECCAK_F1600_ABSORB_FUNC_NAME): New.
--

Optimize 64-bit absorb functions for a small speed-up. After this
change, the 64-bit BMI2 implementation matches the speed of the fastest
results from SUPERCOP for Intel Haswell CPUs (long messages).

Benchmark on Intel Haswell @ 3.2 GHz:

Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHAKE128 |      2.32 ns/B     411.7 MiB/s      7.41 c/B
 SHAKE256 |      2.84 ns/B     336.2 MiB/s      9.08 c/B
 SHA3-224 |      2.69 ns/B     354.9 MiB/s      8.60 c/B
 SHA3-256 |      2.84 ns/B     336.0 MiB/s      9.08 c/B
 SHA3-384 |      3.69 ns/B     258.4 MiB/s     11.81 c/B
 SHA3-512 |      5.30 ns/B     179.9 MiB/s     16.97 c/B

After:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHAKE128 |      2.27 ns/B     420.6 MiB/s      7.26 c/B
 SHAKE256 |      2.79 ns/B     341.4 MiB/s      8.94 c/B
 SHA3-224 |      2.64 ns/B     361.7 MiB/s      8.44 c/B
 SHA3-256 |      2.79 ns/B     341.5 MiB/s      8.94 c/B
 SHA3-384 |      3.65 ns/B     261.4 MiB/s     11.68 c/B
 SHA3-512 |      5.27 ns/B     181.0 MiB/s     16.87 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
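
For readers unfamiliar with the absorb step that this commit tunes, the following is a minimal sketch of the underlying idea: message bytes are read as little-endian 64-bit lanes and XORed into the front of the 25-lane state. The names `load_lane_le` and `absorb_lanes` are hypothetical and the code is not taken from libgcrypt; the commit's 8/4/2/1-lane helpers presumably unroll this kind of loop, which is an assumption on our part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load one 64-bit lane from 8 bytes, least significant byte first.
 * Written byte-by-byte so it works regardless of host endianness. */
static uint64_t load_lane_le(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* XOR `nlanes` input lanes into the start of the 25-lane state.
 * Hypothetical helper for illustration; it does not run the
 * permutation, which a real absorb routine would do after each
 * full block. */
static void absorb_lanes(uint64_t state[25], const unsigned char *in,
                         size_t nlanes)
{
    assert(nlanes <= 25);
    for (size_t i = 0; i < nlanes; i++)
        state[i] ^= load_lane_le(in + 8 * i);
}
```

A full SHAKE128 block is 168 bytes, so one block would be absorbed with `nlanes` set to 21.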
2015-10-31 | Keccak: Add SHAKE Extendable-Output Functions | Jussi Kivilinna | 1 | -29/+246

* src/hash-common.c (_gcry_hash_selftest_check_one): Add handling for
XOFs.
* src/keccak.c (keccak_ops_t): Rename 'extract_inplace' to 'extract'
and add 'pos' argument.
(KECCAK_CONTEXT): Add 'suffix'.
(keccak_extract_inplace64): Rename to...
(keccak_extract64): ...this; Add handling for 'pos' argument.
(keccak_extract_inplace32bi): Rename to...
(keccak_extract32bi): ...this; Add handling for 'pos' argument.
(keccak_extract_inplace64): Rename to...
(keccak_extract64): ...this; Add handling for 'pos' argument.
(keccak_extract_inplace32bi_bmi2): Rename to...
(keccak_extract32bi_bmi2): ...this; Add handling for 'pos' argument.
(keccak_init): Setup 'suffix'; add SHAKE128 & SHAKE256.
(shake128_init, shake256_init): New.
(keccak_final): Do not initial permute for SHAKE output; use correct
suffix for SHAKE.
(keccak_extract): New.
(keccak_selftests_keccak): Add SHAKE128 & SHAKE256 test-vectors.
(run_selftests): Add SHAKE128 & SHAKE256.
(shake128_asn, oid_spec_shake128, shake256_asn, oid_spec_shake256)
(_gcry_digest_spec_shake128, _gcry_digest_spec_shake256): New.
* cipher/md.c (digest_list): Add SHAKE128 & SHAKE256.
* doc/gcrypt.texi: Ditto.
* src/cipher.h (_gcry_digest_spec_shake128)
(_gcry_digest_spec_shake256): New.
* src/gcrypt.h.in (GCRY_MD_SHAKE128, GCRY_MD_SHAKE256): New.
* tests/basic.c (check_one_md): Add XOF check; Add 'elen' argument.
(check_one_md_multi): Skip if algo is XOF.
(check_digests): Add SHAKE128 & SHAKE256 test vectors.
* tests/bench-slope.c (kdf_bench_one): Skip XOFs.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
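
The 'suffix' added to the context is what separates SHA3 from SHAKE at finalization time. As a rough sketch of that step (not libgcrypt's code; `pad_block`, `SHA3_SUFFIX` and `SHAKE_SUFFIX` are illustrative names), the byte-oriented padding from FIPS 202 XORs a suffix byte at the current position and `0x80` into the last byte of the block. The suffix values `0x06` and `0x1F` are the standard delimited suffixes for SHA3 and SHAKE.

```c
#include <assert.h>
#include <stddef.h>

/* Domain-separation suffixes in byte-oriented form (FIPS 202). */
#define SHA3_SUFFIX  0x06
#define SHAKE_SUFFIX 0x1F

/* Apply the final padding to a byte view of the sponge state.
 * `used` is the number of message bytes already absorbed into the
 * current block and `rate` is the block size in bytes.
 * Hypothetical helper for illustration only. */
static void pad_block(unsigned char *state_bytes, size_t rate, size_t used,
                      unsigned char suffix)
{
    assert(used < rate);
    state_bytes[used] ^= suffix;    /* suffix bits plus first padding bit */
    state_bytes[rate - 1] ^= 0x80;  /* last padding bit of the block */
}
```

When only one byte of the block is free, both XORs land on the same byte, which is why the two writes are XORs and not assignments.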
2015-10-28 | md: add variable length output interface | Jussi Kivilinna | 1 | -4/+4

* cipher/crc.c (_gcry_digest_spec_crc32)
(_gcry_digest_spec_crc32_rfc1510, _gcry_digest_spec_crc24_rfc2440): Set
'extract' NULL.
* cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_94)
(_gcry_digest_spec_gost3411_cp): Ditto.
* cipher/keccak.c (_gcry_digest_spec_sha3_224)
(_gcry_digest_spec_sha3_256, _gcry_digest_spec_sha3_384)
(_gcry_digest_spec_sha3_512): Ditto.
* cipher/md2.c (_gcry_digest_spec_md2): Ditto.
* cipher/md4.c (_gcry_digest_spec_md4): Ditto.
* cipher/md5.c (_gcry_digest_spec_md5): Ditto.
* cipher/rmd160.c (_gcry_digest_spec_rmd160): Ditto.
* cipher/sha1.c (_gcry_digest_spec_sha1): Ditto.
* cipher/sha256.c (_gcry_digest_spec_sha224)
(_gcry_digest_spec_sha256): Ditto.
* cipher/sha512.c (_gcry_digest_spec_sha384)
(_gcry_digest_spec_sha512): Ditto.
* cipher/stribog.c (_gcry_digest_spec_stribog_256)
(_gcry_digest_spec_stribog_512): Ditto.
* cipher/tiger.c (_gcry_digest_spec_tiger)
(_gcry_digest_spec_tiger1, _gcry_digest_spec_tiger2): Ditto.
* cipher/whirlpool.c (_gcry_digest_spec_whirlpool): Ditto.
* cipher/md.c (md_enable): Do not allow combination of HMAC and
'extendable-output function'.
(md_final): Check if spec->read is NULL before calling.
(md_read): Ditto.
(md_extract, _gcry_md_extract): New.
* doc/gcrypt.texi: Add SHA3 algorithms and gcry_md_extract.
* src/cipher-proto.h (gcry_md_extract_t): New.
(gcry_md_spec_t): Add 'extract'.
* src/gcrypt-int.g (_gcry_md_extract): New.
* src/gcrypt.h.in (gcry_md_extract): New.
* src/libgcrypt.def: Add gcry_md_extract.
* src/libgcrypt.vers: Add gcry_md_extract.
* src/visibility.c (gcry_md_extract): New.
* src/visibility.h (gcry_md_extract): New.
--

Patch adds a new interface for reading output from 'extendable-output
function' MD algorithms that can give variable length output (i.e. the
SHAKE algorithms from FIPS-202).

The new function to read output is

  gpg_error_t gcry_md_extract(gcry_md_hd_t md, int algo, void *buffer,
                              size_t length);

The function implicitly finalizes the algorithm so that no new input can
be given. Subsequent calls of the function return more output bytes from
the algorithm.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
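
The extract semantics described in this message (the first call finalizes, later calls continue the output stream) can be modelled with a small self-contained SHAKE128 sponge. This is only a sketch under stated assumptions: `xof_ctx`, `xof_init`, `xof_write` and `xof_extract` are hypothetical names, the permutation is a plain table-driven Keccak-f[1600], and none of it is libgcrypt's implementation or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

static const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};
static const int ROTC[24] = { 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
                              27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 };
static const int PILN[24] = { 10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
                              15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 };

/* Plain Keccak-f[1600]: 24 rounds of theta, rho+pi, chi, iota. */
static void keccak_f1600(uint64_t st[25])
{
    uint64_t bc[5], t;
    for (int r = 0; r < 24; r++) {
        for (int i = 0; i < 5; i++)                     /* theta */
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (int i = 0; i < 5; i++) {
            t = bc[(i + 4) % 5] ^ ROL64(bc[(i + 1) % 5], 1);
            for (int j = 0; j < 25; j += 5)
                st[j + i] ^= t;
        }
        t = st[1];                                      /* rho + pi */
        for (int i = 0; i < 24; i++) {
            uint64_t tmp = st[PILN[i]];
            st[PILN[i]] = ROL64(t, ROTC[i]);
            t = tmp;
        }
        for (int j = 0; j < 25; j += 5) {               /* chi */
            for (int i = 0; i < 5; i++)
                bc[i] = st[j + i];
            for (int i = 0; i < 5; i++)
                st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
        }
        st[0] ^= RC[r];                                 /* iota */
    }
}

typedef struct {
    uint64_t st[25];
    size_t rate;     /* block size in bytes; 168 for SHAKE128 */
    size_t pos;      /* byte position inside the current block */
    int finalized;   /* set by the first extract call */
} xof_ctx;

/* XOR one byte into the state at byte offset k (little-endian lanes). */
static void xor_byte(xof_ctx *c, size_t k, unsigned char b)
{
    c->st[k / 8] ^= (uint64_t)b << (8 * (k % 8));
}

static void xof_init(xof_ctx *c) { memset(c, 0, sizeof *c); c->rate = 168; }

static void xof_write(xof_ctx *c, const unsigned char *in, size_t len)
{
    assert(!c->finalized);  /* no new input once output has started */
    for (size_t i = 0; i < len; i++) {
        xor_byte(c, c->pos++, in[i]);
        if (c->pos == c->rate) { keccak_f1600(c->st); c->pos = 0; }
    }
}

static void xof_extract(xof_ctx *c, unsigned char *out, size_t len)
{
    if (!c->finalized) {            /* first call: pad and switch modes */
        xor_byte(c, c->pos, 0x1F);  /* SHAKE suffix */
        xor_byte(c, c->rate - 1, 0x80);
        keccak_f1600(c->st);
        c->pos = 0;
        c->finalized = 1;
    }
    for (size_t i = 0; i < len; i++) {
        if (c->pos == c->rate) { keccak_f1600(c->st); c->pos = 0; }
        out[i] = (unsigned char)(c->st[c->pos / 8] >> (8 * (c->pos % 8)));
        c->pos++;
    }
}
```

In this model, calling `xof_extract` for 150 bytes and then for 50 more yields the same 200 bytes as a single 200-byte call, because the context remembers its position in the output stream between calls.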
2015-10-28 | keccak: rewrite for improved performance | Jussi Kivilinna | 1 | -233/+575

* cipher/Makefile.am: Add 'keccak_permute_32.h' and
'keccak_permute_64.h'.
* cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove.
* cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2)
(USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI)
(keccak_ops_t): New.
(KECCAK_STATE): Add 'state64' and 'state32bi' members.
(KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'.
(rol64, keccak_f1600_state_permute): Remove.
[NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New.
[NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi)
(keccak_absorb_lane32bi): New.
[USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64)
(keccak_absorb_lanes64, keccak_generic64_ops): New.
[USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld)
(keccak_absorb_lanes64_shld, keccak_shld_64_ops): New.
[USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2)
(keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New.
[USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi)
(keccak_absorb_lanes32bi, keccak_generic32bi_ops): New.
[USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2)
(pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2)
(keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New.
(keccak_write): New.
(keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation
selection based on HWF features.
(keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops'
for state manipulation.
(keccak_read): Adjust to KECCAK_CONTEXT changes.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use
'keccak_write' instead of '_gcry_md_block_write'.
* cipher/keccak_permute_32.h: New.
* cipher/keccak_permute_64.h: New.
--

Patch adds new generic 64-bit and 32-bit implementations and optimized
implementations for SHA3:
 - Generic 64-bit implementation based on 'simple' implementation
   from SUPERCOP package.
 - Generic 32-bit bit-interleaved implementation based on
   'simple32bi' implementation from SUPERCOP package.
 - Intel BMI2 optimized variants of 64-bit and 32-bit BI
   implementations.
 - Intel SHLD optimized variant of 64-bit implementation.

Patch also makes proper use of the sponge construction to avoid the use
of an additional input buffer.

Below are bench-slope benchmarks for new 64-bit implementations made on
Intel Core i5-4570 (no turbo, 3.2 GHz, gcc-4.9.2).

Before (amd64):
 SHA3-224 |      3.92 ns/B     243.2 MiB/s     12.55 c/B
 SHA3-256 |      4.15 ns/B     230.0 MiB/s     13.27 c/B
 SHA3-384 |      5.40 ns/B     176.6 MiB/s     17.29 c/B
 SHA3-512 |      7.77 ns/B     122.7 MiB/s     24.87 c/B

After (generic 64-bit, amd64, 1.10x faster):
 SHA3-224 |      3.57 ns/B     267.4 MiB/s     11.42 c/B
 SHA3-256 |      3.77 ns/B     252.8 MiB/s     12.07 c/B
 SHA3-384 |      4.91 ns/B     194.1 MiB/s     15.72 c/B
 SHA3-512 |      7.06 ns/B     135.0 MiB/s     22.61 c/B

After (Intel SHLD 64-bit, amd64, 1.13x faster):
 SHA3-224 |      3.48 ns/B     273.7 MiB/s     11.15 c/B
 SHA3-256 |      3.68 ns/B     258.9 MiB/s     11.79 c/B
 SHA3-384 |      4.80 ns/B     198.7 MiB/s     15.36 c/B
 SHA3-512 |      6.89 ns/B     138.4 MiB/s     22.05 c/B

After (Intel BMI2 64-bit, amd64, 1.45x faster):
 SHA3-224 |      2.71 ns/B     352.1 MiB/s      8.67 c/B
 SHA3-256 |      2.86 ns/B     333.2 MiB/s      9.16 c/B
 SHA3-384 |      3.72 ns/B     256.2 MiB/s     11.91 c/B
 SHA3-512 |      5.34 ns/B     178.5 MiB/s     17.10 c/B

Benchmarks of new 32-bit implementations on Intel Core i5-4570 (no
turbo, 3.2 GHz, gcc-4.9.2):

Before (win32):
 SHA3-224 |     12.05 ns/B     79.16 MiB/s     38.56 c/B
 SHA3-256 |     12.75 ns/B     74.78 MiB/s     40.82 c/B
 SHA3-384 |     16.63 ns/B     57.36 MiB/s     53.22 c/B
 SHA3-512 |     23.97 ns/B     39.79 MiB/s     76.72 c/B

After (generic 32-bit BI, win32, 1.23x to 1.29x faster):
 SHA3-224 |      9.76 ns/B     97.69 MiB/s     31.25 c/B
 SHA3-256 |     10.27 ns/B     92.82 MiB/s     32.89 c/B
 SHA3-384 |     13.22 ns/B     72.16 MiB/s     42.31 c/B
 SHA3-512 |     18.65 ns/B     51.13 MiB/s     59.70 c/B

After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster):
 SHA3-224 |      7.26 ns/B     131.4 MiB/s     23.23 c/B
 SHA3-256 |      7.65 ns/B     124.7 MiB/s     24.47 c/B
 SHA3-384 |      9.87 ns/B     96.67 MiB/s     31.58 c/B
 SHA3-512 |     14.05 ns/B     67.85 MiB/s     44.99 c/B

Benchmarks of new 32-bit implementation on ARM Cortex-A8 (1008 MHz,
gcc-4.9.1):

Before:
 SHA3-224 |     148.6 ns/B      6.42 MiB/s     149.8 c/B
 SHA3-256 |     157.2 ns/B      6.07 MiB/s     158.4 c/B
 SHA3-384 |     205.3 ns/B      4.65 MiB/s     206.9 c/B
 SHA3-512 |     296.3 ns/B      3.22 MiB/s     298.6 c/B

After (1.56x faster):
 SHA3-224 |     96.12 ns/B      9.92 MiB/s     96.89 c/B
 SHA3-256 |     101.5 ns/B      9.40 MiB/s     102.3 c/B
 SHA3-384 |     131.4 ns/B      7.26 MiB/s     132.5 c/B
 SHA3-512 |     188.2 ns/B      5.07 MiB/s     189.7 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
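
The 'ops' member and the "implementation selection based on HWF features" describe a function-pointer dispatch pattern. A minimal sketch of that pattern follows; every name in it (`demo_ops_t`, `FEAT_BMI2`, `FEAT_FAST_SHLD`, `select_ops`, and so on) is hypothetical, the permute bodies are placeholders, and the preference for BMI2 over SHLD is an assumption based only on the benchmark ordering above, not on libgcrypt's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits; libgcrypt's real HWF flags are not
 * reproduced here. */
enum { FEAT_BMI2 = 1u << 0, FEAT_FAST_SHLD = 1u << 1 };

/* Table of implementation entry points chosen once at init time. */
typedef struct {
    const char *name;
    void (*permute)(uint64_t state[25]);
} demo_ops_t;

/* Placeholder bodies: a real build would put the generic, SHLD and
 * BMI2 variants of Keccak-f[1600] here. Each one just tags the state
 * so the selected variant is observable. */
static void permute_generic(uint64_t s[25]) { s[0] ^= 1; }
static void permute_shld(uint64_t s[25])    { s[0] ^= 2; }
static void permute_bmi2(uint64_t s[25])    { s[0] ^= 4; }

static const demo_ops_t generic_ops = { "generic64", permute_generic };
static const demo_ops_t shld_ops    = { "shld64",    permute_shld };
static const demo_ops_t bmi2_ops    = { "bmi2-64",   permute_bmi2 };

/* Pick the fastest variant the CPU supports (assumed priority order). */
static const demo_ops_t *select_ops(unsigned int features)
{
    if (features & FEAT_BMI2)
        return &bmi2_ops;
    if (features & FEAT_FAST_SHLD)
        return &shld_ops;
    return &generic_ops;
}

typedef struct {
    uint64_t state[25];
    const demo_ops_t *ops;
} demo_ctx;

static void demo_init(demo_ctx *ctx, unsigned int features)
{
    memset(ctx, 0, sizeof *ctx);
    ctx->ops = select_ops(features);
}

/* All later state manipulation goes through the selected table. */
static void demo_permute(demo_ctx *ctx)
{
    assert(ctx->ops != NULL);
    ctx->ops->permute(ctx->state);
}
```

Selecting once at init keeps the per-block hot path free of feature checks, at the cost of one indirect call per operation.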
2015-08-12 | Keccak: Fix array indexes in θ step | Jussi Kivilinna | 1 | -12/+12

* cipher/keccak.c (keccak_f1600_state_permute): Fix indexes for D[5].
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
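
For context, the θ step computes five column parities C[x] and then D[x] = C[x-1] XOR ROL(C[x+1], 1), with indexes taken modulo 5, which is where off-by-one indexing is easy to get wrong. The sketch below shows the step in isolation as defined in FIPS 202; the function names are illustrative and the code does not reproduce either the buggy or the fixed libgcrypt version.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits, 0 < n < 64. */
static uint64_t rol64(uint64_t x, unsigned int n)
{
    assert(n > 0 && n < 64);
    return (x << n) | (x >> (64 - n));
}

/* Theta on a 5x5 lane state stored as A[x + 5*y]. */
static void theta(uint64_t A[25])
{
    uint64_t C[5], D[5];

    /* Column parities. */
    for (int x = 0; x < 5; x++)
        C[x] = A[x] ^ A[x + 5] ^ A[x + 10] ^ A[x + 15] ^ A[x + 20];

    /* D[x] mixes the left neighbour column with the rotated right
     * neighbour column; (x + 4) % 5 is x - 1 modulo 5. */
    for (int x = 0; x < 5; x++)
        D[x] = C[(x + 4) % 5] ^ rol64(C[(x + 1) % 5], 1);

    /* Fold D[x] into every lane of column x. */
    for (int i = 0; i < 25; i++)
        A[i] ^= D[i % 5];
}
```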
2015-08-10 | Add generic SHA3 implementation | Jussi Kivilinna | 1 | -39/+390

* cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE): Increase blocksize
when USE_SHA3 is enabled.
* cipher/keccak.c (SHA3_DELIMITED_SUFFIX, SHAKE_DELIMITED_SUFFIX): New.
(KECCAK_STATE): Add proper state.
(KECCAK_CONTEXT): Add 'outlen'.
(rol64, keccak_f1600_state_permute, transform_blk, transform): New.
(keccak_init): Add proper initialization.
(keccak_final): Add proper finalization.
(selftests_keccak): Add selftests.
(oid_spec_sha3_224, oid_spec_sha3_256, oid_spec_sha3_384)
(oid_spec_sha3_512): Add OID.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Fix output
length.
* cipher/mac-hmac.c (map_mac_algo_to_md): Fix mapping for SHA3-512.
(hmac_get_keylen): Return proper blocksizes for SHA3 algorithms.
[USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224)
(_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384)
(_gcry_mac_type_spec_hmac_sha3_512): New.
* cipher/mac-internal [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224)
(_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384)
(_gcry_mac_type_spec_hmac_sha3_512): New.
* cipher/mac.c (mac_list) [USE_SHA3]: Add SHA3 algorithms.
* cipher/md.c (md_open): Use proper SHA-3 blocksizes for HMAC macpads.
* tests/basic.c (check_digests): Add SHA3 test vectors.
--

Patch adds generic implementation for SHA3.

Currently missing with this patch:
 - HMAC SHA3 test vectors, not available from NIST (yet?)
 - ASNs

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
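
The "proper blocksizes" mentioned for HMAC follow from the sponge parameters: the state is 200 bytes and, for the SHA3 fixed-length hashes, the capacity is twice the digest length, so the block size (rate) is what remains. The helper below is a small sketch of that arithmetic; `sha3_rate_bytes` is a hypothetical name, not a libgcrypt function.

```c
#include <assert.h>
#include <stddef.h>

/* Rate (block size) in bytes for a SHA3 hash producing `outlen_bytes`
 * digest bytes: 200-byte state minus a capacity of twice the digest
 * length. Hypothetical helper for illustration. */
static size_t sha3_rate_bytes(size_t outlen_bytes)
{
    assert(outlen_bytes > 0 && 2 * outlen_bytes < 200);
    return 200 - 2 * outlen_bytes;
}
```

This gives 144, 136, 104 and 72 bytes for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 respectively, which is why a single shared block size does not work for these algorithms.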
2015-08-08 | Add framework to eventually support SHA3. | Werner Koch | 1 | -0/+264

* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256)
(GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New.
(GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256)
(GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New.
* cipher/keccak.c: New with stub functions.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c.
* configure.ac (available_digests): Add sha3.
(USE_SHA3): New.
* src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests.
* cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos.
(md_open): Ditto for hmac processing.
* cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping.
* cipher/hmac-tests.c (run_selftests): Prepare for tests.
* cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx".
--

Note that the GCRY_MD_SHA3_xxx algo identifiers are preliminary. We
should try to sync them with OpenPGP.

Signed-off-by: Werner Koch <wk@gnupg.org>