summaryrefslogtreecommitdiff
path: root/cipher/hash-common.h
AgeCommit message (Collapse)AuthorFilesLines
2016-03-18Always require a 64 bit integer typeWerner Koch1-1/+1
* configure.ac (available_digests_64): Merge with available_digests. (available_kdfs_64): Merge with available_kdfs. <64 bit datatype test>: Bail out if no such type is available. * src/types.h: Emit #error if no u64 can be defined. (PROPERLY_ALIGNED_TYPE): Always add u64 type. * cipher/bithelp.h: Remove all code paths which handle the case of !HAVE_U64_TYPEDEF. * cipher/bufhelp.h: Ditto. * cipher/cipher-ccm.c: Ditto. * cipher/cipher-gcm.c: Ditto. * cipher/cipher-internal.h: Ditto. * cipher/cipher.c: Ditto. * cipher/hash-common.h: Ditto. * cipher/md.c: Ditto. * cipher/poly1305.c: Ditto. * cipher/scrypt.c: Ditto. * cipher/tiger.c: Ditto. * src/g10lib.h: Ditto. * tests/basic.c: Ditto. * tests/bench-slope.c: Ditto. * tests/benchmark.c: Ditto. -- Given that SHA-2 and some other algorithms require a 64 bit type it does not make anymore sense to conditionally compile some part when the platform does not provide such a type. GnuPG-bug-id: 1815. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-10-28keccak: rewrite for improved performanceJussi Kivilinna1-9/+3
* cipher/Makefile.am: Add 'keccak_permute_32.h' and 'keccak_permute_64.h'. * cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove. * cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2) (USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI) (keccak_ops_t): New. (KECCAK_STATE): Add 'state64' and 'state32bi' members. (KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'. (rol64, keccak_f1600_state_permute): Remove. [NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New. [NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi) (keccak_absorb_lane32bi): New. [USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64) (keccak_absorb_lanes64, keccak_generic64_ops): New. [USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld) (keccak_absorb_lanes64_shld, keccak_shld_64_ops): New. [USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2) (keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New. [USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi) (keccak_absorb_lanes32bi, keccak_generic32bi_ops): New. [USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2) (pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2) (keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New. (keccak_write): New. (keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation selection based on HWF features. (keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops' for state manipulation. (keccak_read): Adjust to KECCAK_CONTEXT changes. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use 'keccak_write' instead of '_gcry_md_block_write'. * cipher/keccak_permute_32.h: New. * cipher/keccak_permute_64.h: New. -- Patch adds new generic 64-bit and 32-bit implementations and optimized implementations for SHA3: - Generic 64-bit implementation based on 'simple' implementation from SUPERCOP package. - Generic 32-bit bit-inteleaved implementataion based on 'simple32bi' implementation from SUPERCOP package. - Intel BMI2 optimized variants of 64-bit and 32-bit BI implementations. - Intel SHLD optimized variant of 64-bit implementation. Patch also makes proper use of sponge construction to avoid use of addition input buffer. Below are bench-slope benchmarks for new 64-bit implementations made on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2). Before (amd64): SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B After (generic 64-bit, amd64), 1.10x faster): SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B After (Intel SHLD 64-bit, amd64, 1.13x faster): SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B After (Intel BMI2 64-bit, amd64, 1.45x faster): SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B Benchmarks of new 32-bit implementations on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2): Before (win32): SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B After (generic 32-bit BI, win32, 1.23x to 1.29x faster): SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster): SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B Benchmarks of new 32-bit implementation on ARM Cortex-A8 (1008 Mhz, gcc-4.9.1): Before: SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B After (1.56x faster): SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-10Add generic SHA3 implementationJussi Kivilinna1-3/+9
* cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE): Increase blocksize USE_SHA3 enabled. * cipher/keccak.c (SHA3_DELIMITED_SUFFIX, SHAKE_DELIMITED_SUFFIX): New. (KECCAK_STATE): Add proper state. (KECCAK_CONTEXT): Add 'outlen'. (rol64, keccak_f1600_state_permute, transform_blk, transform): New. (keccak_init): Add proper initialization. (keccak_final): Add proper finalization. (selftests_keccak): Add selftests. (oid_spec_sha3_224, oid_spec_sha3_256, oid_spec_sha3_384) (oid_spec_sha3_512): Add OID. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Fix output length. * cipher/mac-hmac.c (map_mac_algo_to_md): Fix mapping for SHA3-512. (hmac_get_keylen): Return proper blocksizes for SHA3 algorithms. [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224) (_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384) (_gcry_mac_type_spec_hmac_sha3_512): New. * cipher/mac-internal [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224) (_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384) (_gcry_mac_type_spec_hmac_sha3_512): New. * cipher/mac.c (mac_list) [USE_SHA3]: Add SHA3 algorithms. * cipher/md.c (md_open): Use proper SHA-3 blocksizes for HMAC macpads. * tests/basic.c (check_digests): Add SHA3 test vectors. -- Patch adds generic implementation for SHA3. Currently missing with this patch: - HMAC SHA3 test vectors, not available from NIST (yet?) - ASNs Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-17Add bulk processing for hash transform functionsJussi Kivilinna1-1/+2
* cipher/hash-common.c (_gcry_md_block_write): Preload 'hd->blocksize' to stack, pass number of blocks to 'hd->bwrite'. * cipher/hash-common.c (_gcry_md_block_write_t): Add 'nblks'. * cipher/gostr3411-94.c: Rename 'transform' function to 'transform_blk', add new 'transform' function with 'nblks' as additional input. * cipher/md4.c: Ditto. * cipher/md5.c: Ditto. * cipher/md4.c: Ditto. * cipher/rmd160.c: Ditto. * cipher/sha1.c: Ditto. * cipher/sha256.c: Ditto. * cipher/sha512.c: Ditto. * cipher/stribog.c: Ditto. * cipher/tiger.c: Ditto. * cipher/whirlpool.c: Ditto. -- Pass number of blocks to algorithm for futher optimizations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-11-14md: Fix hashing for data >= 256 GBWerner Koch1-0/+1
* cipher/hash-common.h (gcry_md_block_ctx): Add "nblocks_high". * cipher/hash-common.c (_gcry_md_block_write): Bump NBLOCKS_HIGH. * cipher/md4.c (md4_init, md4_final): Take care of NBLOCKS_HIGH. * cipher/md5.c (md5_init, md5_final): Ditto. * cipher/rmd160.c (_gcry_rmd160_init, rmd160_final): Ditto. * cipher/sha1.c (sha1_init, sha1_final): Ditto. * cipher/sha256.c (sha256_init, sha224_init, sha256_final): Ditto. * cipher/sha512.c (sha512_init, sha384_init, sha512_final): Ditto. * cipher/tiger.c (do_init, tiger_final): Ditto. * cipher/whirlpool.c (whirlpool_final): Ditto. * cipher/md.c (gcry_md_algo_info): Add GCRYCTL_SELFTEST. (_gcry_md_selftest): Return "not implemented" as required. * tests/hashtest.c: New. * tests/genhashdata.c: New. * tests/Makefile.am (TESTS): Add hashtest. (noinst_PROGRAMS): Add genhashdata -- Problem found by Denis Corbin and analyzed by Yuriy Kaminskiy. sha512 and whirlpool should not have this problem because they use 64 bit types for counting the blocks. However, a similar fix has been employed to allow for really huge sizes - despite that it will be very hard to test them. The test vectors have been produced by sha{1,224,256}sum and the genhashdata tool. A sequence of 'a' is used for them because a test using one million 'a' is commonly used for test vectors. More test vectors are required. Running the large tests needs to be done manual for now: ./hashtest --gigs 256 tests all algorithms, ./hashtest --gigs 256 sha1 sha224 sha256 only the given ones. A configure option to include these test in the standard regression suite will be useful. The tests will take looong. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-09-30Make Whirlpool use the _gcry_md_block_write helperJussi Kivilinna1-2/+2
* cipher/whirlpool.c (whirlpool_context_t): Add 'bctx', remove 'buffer', 'count' and 'nblocks'. (whirlpool_init): Initialize 'bctx'. (whirlpool_transform): Adjust context argument type and burn stack depth. (whirlpool_add): Remove. (whirlpool_write): Use _gcry_md_block_write. (whirlpool_final, whirlpool_read): Adjust for 'bctx' usage. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-09-21Use hash transform function return type for passing burn stack depthJussi Kivilinna1-2/+2
* cipher/gostr4311-94.c (transform): Return stack burn depth. * cipher/hash-common.c (_gcry_md_block_write): Use stack burn depth returned by 'hd->bwrite'. * cipher/hash-common.h (_gcry_md_block_write_t): Change return type to 'unsigned int'. (gry_md_block_ctx_t): Remove 'stack_burn'. * cipher/md4.c (transform): Return stack burn depth. (md4_final): Use stack burn depth from transform. * cipher/md5.c (transform): Return stack burn depth. (md5_final): Use stack burn depth from transform. * cipher/rmd160.c (transform): Return stack burn depth. (rmd160_final): Use stack burn depth from transform. * cipher/sha1.c (transform): Return stack burn depth. (sha1_final): Use stack burn depth from transform. * cipher/sha256.c (transform): Return stack burn depth. (sha256_final): Use stack burn depth from transform. * cipher/sha512.c (__transform, transform): Return stack burn depth. (sha512_final): Use stack burn depth from transform. * cipher/stribog.c (transform64): Return stack burn depth. * cipher/tiger.c (transform): Return stack burn depth. (tiger_final): Use stack burn depth from transform. -- Transform function might want different depth of stack burn depending on detected CPU features (like in SHA-512 on ARM with NEON). So return stack burn depth from transform functions as a request or a hint to calling function. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-09-21Make SHA-512 use the new _gcry_md_block_write helperJussi Kivilinna1-3/+14
* cipher/hash-common.c (_gcry_md_block_write): Check that hd->buf is large enough. * cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE, MD_NBLOCKS_TYPE): New macros. (gcry_md_block_ctx_t): Use above macros for 'nblocks' and 'buf'. * cipher/sha512.c (SHA512_STATE): New struct. (SHA512_CONTEXT): Add 'bctx' and 'state'. (sha512_init, sha384_init): Initialize 'bctx'. (__transform, _gcry_sha512_transform_armv7_neon): Use SHA512_STATE for 'hd'. (transform): For now, do not return burn stack. (sha512_write): Remove. (sha512_final): Use _gcry_md_block_write and bctx. (_gcry_digest_spec_sha512, _gcry_digest_spec_sha384): Use _gcry_md_block_write. -- Patch changes 'nblocks' counter to 64-bits when SHA-512 is enabled. This does not cause problems with other algorithms; they are already casting 'nblocks' to u32 variable in their finalization functions. Also move 'buf' member to head of 'gcry_md_block_ctx_t' to ensure proper alignment; this is because some algorithms cast buffer pointer to (u64*) in final endian conversion. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-09-18Separate common md block codeDmitry Eremin-Solenikov1-3/+16
* cipher/hash-common.c (_gcry_md_block_write): New function to handle block md operations. The current implementation is limited to 64 byte buffer and u32 block counter. * cipher/md4.c, cipher/md5.c, cipher/rmd.h, cipher/rmd160.c *cipher/sha1.c, cipher/sha256.c, cipher/tiger.c: Convert to use _gcry_md_block_write. -- Whirlpool and SHA512 are left as before, as SHA512 uses 128 bytes buffer and u64 blocks counter and Whirlpool does not have trivial block handling structure. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Indentation changes, minor edits and adjustment of _gcry_sha1_hash_buffers by wk.
2011-02-04Nuked almost all trailing whitespace.Werner Koch1-3/+3
Check and install the standard git pre-commit hook.
2008-09-12Add files.Werner Koch1-0/+33