Age | Commit message (Collapse) | Author | Files | Lines |
|
* configure.ac (available_digests_64): Merge with available_digests.
(available_kdfs_64): Merge with available_kdfs.
<64 bit datatype test>: Bail out if no such type is available.
* src/types.h: Emit #error if no u64 can be defined.
(PROPERLY_ALIGNED_TYPE): Always add u64 type.
* cipher/bithelp.h: Remove all code paths which handle the
case of !HAVE_U64_TYPEDEF.
* cipher/bufhelp.h: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-internal.h: Ditto.
* cipher/cipher.c: Ditto.
* cipher/hash-common.h: Ditto.
* cipher/md.c: Ditto.
* cipher/poly1305.c: Ditto.
* cipher/scrypt.c: Ditto.
* cipher/tiger.c: Ditto.
* src/g10lib.h: Ditto.
* tests/basic.c: Ditto.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
--
Given that SHA-2 and some other algorithms require a 64 bit type it
does not make anymore sense to conditionally compile some part when
the platform does not provide such a type.
GnuPG-bug-id: 1815.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/Makefile.am: Add 'keccak_permute_32.h' and
'keccak_permute_64.h'.
* cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove.
* cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2)
(USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI)
(keccak_ops_t): New.
(KECCAK_STATE): Add 'state64' and 'state32bi' members.
(KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'.
(rol64, keccak_f1600_state_permute): Remove.
[NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New.
[NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi)
(keccak_absorb_lane32bi): New.
[USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64)
(keccak_absorb_lanes64, keccak_generic64_ops): New.
[USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld)
(keccak_absorb_lanes64_shld, keccak_shld_64_ops): New.
[USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2)
(keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New.
[USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi)
(keccak_absorb_lanes32bi, keccak_generic32bi_ops): New.
[USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2)
(pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2)
(keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New.
(keccak_write): New.
(keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation
selection based on HWF features.
(keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops'
for state manipulation.
(keccak_read): Adjust to KECCAK_CONTEXT changes.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use
'keccak_write' instead of '_gcry_md_block_write'.
* cipher/keccak_permute_32.h: New.
* cipher/keccak_permute_64.h: New.
--
Patch adds new generic 64-bit and 32-bit implementations and
optimized implementations for SHA3:
- Generic 64-bit implementation based on 'simple' implementation
from SUPERCOP package.
- Generic 32-bit bit-inteleaved implementataion based on
'simple32bi' implementation from SUPERCOP package.
- Intel BMI2 optimized variants of 64-bit and 32-bit BI
implementations.
- Intel SHLD optimized variant of 64-bit implementation.
Patch also makes proper use of sponge construction to avoid
use of addition input buffer.
Below are bench-slope benchmarks for new 64-bit implementations
made on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2).
Before (amd64):
SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B
SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B
SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B
SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B
After (generic 64-bit, amd64), 1.10x faster):
SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B
SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B
SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B
SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B
After (Intel SHLD 64-bit, amd64, 1.13x faster):
SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B
SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B
SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B
SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B
After (Intel BMI2 64-bit, amd64, 1.45x faster):
SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B
SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B
SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B
SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B
Benchmarks of new 32-bit implementations on Intel Core i5-4570
(no turbo, 3.2 Ghz, gcc-4.9.2):
Before (win32):
SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B
SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B
SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B
SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B
After (generic 32-bit BI, win32, 1.23x to 1.29x faster):
SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B
SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B
SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B
SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B
After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster):
SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B
SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B
SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B
SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B
Benchmarks of new 32-bit implementation on ARM Cortex-A8
(1008 Mhz, gcc-4.9.1):
Before:
SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B
SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B
SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B
SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B
After (1.56x faster):
SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B
SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B
SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B
SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE): Increase blocksize
USE_SHA3 enabled.
* cipher/keccak.c (SHA3_DELIMITED_SUFFIX, SHAKE_DELIMITED_SUFFIX): New.
(KECCAK_STATE): Add proper state.
(KECCAK_CONTEXT): Add 'outlen'.
(rol64, keccak_f1600_state_permute, transform_blk, transform): New.
(keccak_init): Add proper initialization.
(keccak_final): Add proper finalization.
(selftests_keccak): Add selftests.
(oid_spec_sha3_224, oid_spec_sha3_256, oid_spec_sha3_384)
(oid_spec_sha3_512): Add OID.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Fix output
length.
* cipher/mac-hmac.c (map_mac_algo_to_md): Fix mapping for SHA3-512.
(hmac_get_keylen): Return proper blocksizes for SHA3 algorithms.
[USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224)
(_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384)
(_gcry_mac_type_spec_hmac_sha3_512): New.
* cipher/mac-internal [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224)
(_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384)
(_gcry_mac_type_spec_hmac_sha3_512): New.
* cipher/mac.c (mac_list) [USE_SHA3]: Add SHA3 algorithms.
* cipher/md.c (md_open): Use proper SHA-3 blocksizes for HMAC macpads.
* tests/basic.c (check_digests): Add SHA3 test vectors.
--
Patch adds generic implementation for SHA3. Currently missing with this
patch:
- HMAC SHA3 test vectors, not available from NIST (yet?)
- ASNs
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/hash-common.c (_gcry_md_block_write): Preload 'hd->blocksize'
to stack, pass number of blocks to 'hd->bwrite'.
* cipher/hash-common.c (_gcry_md_block_write_t): Add 'nblks'.
* cipher/gostr3411-94.c: Rename 'transform' function to
'transform_blk', add new 'transform' function with 'nblks' as
additional input.
* cipher/md4.c: Ditto.
* cipher/md5.c: Ditto.
* cipher/md4.c: Ditto.
* cipher/rmd160.c: Ditto.
* cipher/sha1.c: Ditto.
* cipher/sha256.c: Ditto.
* cipher/sha512.c: Ditto.
* cipher/stribog.c: Ditto.
* cipher/tiger.c: Ditto.
* cipher/whirlpool.c: Ditto.
--
Pass number of blocks to algorithm for futher optimizations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/hash-common.h (gcry_md_block_ctx): Add "nblocks_high".
* cipher/hash-common.c (_gcry_md_block_write): Bump NBLOCKS_HIGH.
* cipher/md4.c (md4_init, md4_final): Take care of NBLOCKS_HIGH.
* cipher/md5.c (md5_init, md5_final): Ditto.
* cipher/rmd160.c (_gcry_rmd160_init, rmd160_final): Ditto.
* cipher/sha1.c (sha1_init, sha1_final): Ditto.
* cipher/sha256.c (sha256_init, sha224_init, sha256_final): Ditto.
* cipher/sha512.c (sha512_init, sha384_init, sha512_final): Ditto.
* cipher/tiger.c (do_init, tiger_final): Ditto.
* cipher/whirlpool.c (whirlpool_final): Ditto.
* cipher/md.c (gcry_md_algo_info): Add GCRYCTL_SELFTEST.
(_gcry_md_selftest): Return "not implemented" as required.
* tests/hashtest.c: New.
* tests/genhashdata.c: New.
* tests/Makefile.am (TESTS): Add hashtest.
(noinst_PROGRAMS): Add genhashdata
--
Problem found by Denis Corbin and analyzed by Yuriy Kaminskiy.
sha512 and whirlpool should not have this problem because they use 64
bit types for counting the blocks. However, a similar fix has been
employed to allow for really huge sizes - despite that it will be very
hard to test them.
The test vectors have been produced by sha{1,224,256}sum and the
genhashdata tool. A sequence of 'a' is used for them because a test
using one million 'a' is commonly used for test vectors. More test
vectors are required. Running the large tests needs to be done
manual for now:
./hashtest --gigs 256
tests all algorithms,
./hashtest --gigs 256 sha1 sha224 sha256
only the given ones. A configure option to include these test in the
standard regression suite will be useful. The tests will take looong.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/whirlpool.c (whirlpool_context_t): Add 'bctx', remove
'buffer', 'count' and 'nblocks'.
(whirlpool_init): Initialize 'bctx'.
(whirlpool_transform): Adjust context argument type and burn stack
depth.
(whirlpool_add): Remove.
(whirlpool_write): Use _gcry_md_block_write.
(whirlpool_final, whirlpool_read): Adjust for 'bctx' usage.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/gostr4311-94.c (transform): Return stack burn depth.
* cipher/hash-common.c (_gcry_md_block_write): Use stack burn depth
returned by 'hd->bwrite'.
* cipher/hash-common.h (_gcry_md_block_write_t): Change return type to
'unsigned int'.
(gry_md_block_ctx_t): Remove 'stack_burn'.
* cipher/md4.c (transform): Return stack burn depth.
(md4_final): Use stack burn depth from transform.
* cipher/md5.c (transform): Return stack burn depth.
(md5_final): Use stack burn depth from transform.
* cipher/rmd160.c (transform): Return stack burn depth.
(rmd160_final): Use stack burn depth from transform.
* cipher/sha1.c (transform): Return stack burn depth.
(sha1_final): Use stack burn depth from transform.
* cipher/sha256.c (transform): Return stack burn depth.
(sha256_final): Use stack burn depth from transform.
* cipher/sha512.c (__transform, transform): Return stack burn depth.
(sha512_final): Use stack burn depth from transform.
* cipher/stribog.c (transform64): Return stack burn depth.
* cipher/tiger.c (transform): Return stack burn depth.
(tiger_final): Use stack burn depth from transform.
--
Transform function might want different depth of stack burn depending on
detected CPU features (like in SHA-512 on ARM with NEON). So return
stack burn depth from transform functions as a request or a hint to
calling function.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/hash-common.c (_gcry_md_block_write): Check that hd->buf is
large enough.
* cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE, MD_NBLOCKS_TYPE): New
macros.
(gcry_md_block_ctx_t): Use above macros for 'nblocks' and 'buf'.
* cipher/sha512.c (SHA512_STATE): New struct.
(SHA512_CONTEXT): Add 'bctx' and 'state'.
(sha512_init, sha384_init): Initialize 'bctx'.
(__transform, _gcry_sha512_transform_armv7_neon): Use SHA512_STATE for
'hd'.
(transform): For now, do not return burn stack.
(sha512_write): Remove.
(sha512_final): Use _gcry_md_block_write and bctx.
(_gcry_digest_spec_sha512, _gcry_digest_spec_sha384): Use
_gcry_md_block_write.
--
Patch changes 'nblocks' counter to 64-bits when SHA-512 is enabled. This does
not cause problems with other algorithms; they are already casting 'nblocks'
to u32 variable in their finalization functions. Also move 'buf' member to
head of 'gcry_md_block_ctx_t' to ensure proper alignment; this is because some
algorithms cast buffer pointer to (u64*) in final endian conversion.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/hash-common.c (_gcry_md_block_write): New function to handle
block md operations. The current implementation is limited to 64 byte
buffer and u32 block counter.
* cipher/md4.c, cipher/md5.c, cipher/rmd.h, cipher/rmd160.c
*cipher/sha1.c, cipher/sha256.c, cipher/tiger.c: Convert to use
_gcry_md_block_write.
--
Whirlpool and SHA512 are left as before, as SHA512 uses 128 bytes buffer
and u64 blocks counter and Whirlpool does not have trivial block handling
structure.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Indentation changes, minor edits and adjustment of
_gcry_sha1_hash_buffers by wk.
|
|
Check and install the standard git pre-commit hook.
|
|
|