Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/Makefile.am: Add 'keccak-armv7-neon.S'.
* cipher/keccak-armv7-neon.S: New.
* cipher/keccak.c (USE_64BIT_ARM_NEON): New.
(NEED_COMMON64): Select if USE_64BIT_ARM_NEON.
[NEED_COMMON64] (round_consts_64bit): Rename to...
[NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add
terminator at end.
[USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon)
(_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon)
(keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New.
(keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation
if supported by HW.
* cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update
to use new round constant table.
* configure.ac: Add 'keccak-armv7-neon.lo'.
--
Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch
is based on public-domain implementation by Ronny Van Keer from
SUPERCOP package:
https://github.com/floodyberry/supercop/blob/master/crypto_hash/\
keccakc1024/inplace-armv7a-neon/keccak2.s
Benchmark results on Cortex-A8 @ 1008 Mhz:
Before (generic 32-bit bit-interleaved impl.):
| nanosecs/byte mebibytes/sec cycles/byte
SHAKE128 | 83.00 ns/B 11.49 MiB/s 83.67 c/B
SHAKE256 | 101.7 ns/B 9.38 MiB/s 102.5 c/B
SHA3-224 | 96.13 ns/B 9.92 MiB/s 96.90 c/B
SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B
SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B
SHA3-512 | 189.1 ns/B 5.04 MiB/s 190.6 c/B
After (ARM/NEON, ~3.2x faster):
| nanosecs/byte mebibytes/sec cycles/byte
SHAKE128 | 25.09 ns/B 38.01 MiB/s 25.29 c/B
SHAKE256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B
SHA3-224 | 29.24 ns/B 32.61 MiB/s 29.48 c/B
SHA3-256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B
SHA3-384 | 40.42 ns/B 23.59 MiB/s 40.74 c/B
SHA3-512 | 58.37 ns/B 16.34 MiB/s 58.84 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/keccak.c [USE_64BIT] [__x86_64__] (absorb_lanes64_8)
(absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New.
* cipher/keccak.c [USE_64BIT] [!__x86_64__] (absorb_lanes64_8)
(absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New.
[USE_64BIT] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT] (keccak_absorb_lanes64): Remove.
[USE_64BIT_SHLD] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT_SHLD] (keccak_absorb_lanes64_shld): Remove.
[USE_64BIT_BMI2] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT_BMI2] (keccak_absorb_lanes64_bmi2): Remove.
* cipher/keccak_permute_64.h (KECCAK_F1600_ABSORB_FUNC_NAME): New.
--
Optimize 64-bit absorb functions for small speed-up. After this
change, 64-bit BMI2 implementation matches speed of fastest results
from SUPERCOP for Intel Haswell CPUs (long messages).
Benchmark on Intel Haswell @ 3.2 Ghz:
Before:
| nanosecs/byte mebibytes/sec cycles/byte
SHAKE128 | 2.32 ns/B 411.7 MiB/s 7.41 c/B
SHAKE256 | 2.84 ns/B 336.2 MiB/s 9.08 c/B
SHA3-224 | 2.69 ns/B 354.9 MiB/s 8.60 c/B
SHA3-256 | 2.84 ns/B 336.0 MiB/s 9.08 c/B
SHA3-384 | 3.69 ns/B 258.4 MiB/s 11.81 c/B
SHA3-512 | 5.30 ns/B 179.9 MiB/s 16.97 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
SHAKE128 | 2.27 ns/B 420.6 MiB/s 7.26 c/B
SHAKE256 | 2.79 ns/B 341.4 MiB/s 8.94 c/B
SHA3-224 | 2.64 ns/B 361.7 MiB/s 8.44 c/B
SHA3-256 | 2.79 ns/B 341.5 MiB/s 8.94 c/B
SHA3-384 | 3.65 ns/B 261.4 MiB/s 11.68 c/B
SHA3-512 | 5.27 ns/B 181.0 MiB/s 16.87 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/hash-common.c (_gcry_hash_selftest_check_one): Add handling for
XOFs.
* src/keccak.c (keccak_ops_t): Rename 'extract_inplace' to 'extract'
and add 'pos' argument.
(KECCAK_CONTEXT): Add 'suffix'.
(keccak_extract_inplace64): Rename to...
(keccak_extract64): ...this; Add handling for 'pos' argument.
(keccak_extract_inplace32bi): Rename to...
(keccak_extract32bi): ...this; Add handling for 'pos' argument.
(keccak_extract_inplace64): Rename to...
(keccak_extract64): ...this; Add handling for 'pos' argument.
(keccak_extract_inplace32bi_bmi2): Rename to...
(keccak_extract32bi_bmi2): ...this; Add handling for 'pos' argument.
(keccak_init): Setup 'suffix'; add SHAKE128 & SHAKE256.
(shake128_init, shake256_init): New.
(keccak_final): Do not initial permute for SHAKE output; use correct
suffix for SHAKE.
(keccak_extract): New.
(keccak_selftests_keccak): Add SHAKE128 & SHAKE256 test-vectors.
(run_selftests): Add SHAKE128 & SHAKE256.
(shake128_asn, oid_spec_shake128, shake256_asn, oid_spec_shake256)
(_gcry_digest_spec_shake128, _gcry_digest_spec_shake256): New.
* cipher/md.c (digest_list): Add SHAKE128 & SHAKE256.
* doc/gcrypt.texi: Ditto.
* src/cipher.h (_gcry_digest_spec_shake128)
(_gcry_digest_spec_shake256): New.
* src/gcrypt.h.in (GCRY_MD_SHAKE128, GCRY_MD_SHAKE256): New.
* tests/basic.c (check_one_md): Add XOF check; Add 'elen' argument.
(check_one_md_multi): Skip if algo is XOF.
(check_digests): Add SHAKE128 & SHAKE256 test vectors.
* tests/bench-slope.c (kdf_bench_one): Skip XOFs.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/crc.c (_gcry_digest_spec_crc32)
(_gcry_digest_spec_crc32_rfc1510, _gcry_digest_spec_crc24_rfc2440): Set
'extract' NULL.
* cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_94)
(_gcry_digest_spec_gost3411_cp): Ditto.
* cipher/keccak.c (_gcry_digest_spec_sha3_224)
(_gcry_digest_spec_sha3_256, _gcry_digest_spec_sha3_384)
(_gcry_digest_spec_sha3_512): Ditto.
* cipher/md2.c (_gcry_digest_spec_md2): Ditto.
* cipher/md4.c (_gcry_digest_spec_md4): Ditto.
* cipher/md5.c (_gcry_digest_spec_md5): Ditto.
* cipher/rmd160.c (_gcry_digest_spec_rmd160): Ditto.
* cipher/sha1.c (_gcry_digest_spec_sha1): Ditto.
* cipher/sha256.c (_gcry_digest_spec_sha224)
(_gcry_digest_spec_sha256): Ditto.
* cipher/sha512.c (_gcry_digest_spec_sha384)
(_gcry_digest_spec_sha512): Ditto.
* cipher/stribog.c (_gcry_digest_spec_stribog_256)
(_gcry_digest_spec_stribog_512): Ditto.
* cipher/tiger.c (_gcry_digest_spec_tiger)
(_gcry_digest_spec_tiger1, _gcry_digest_spec_tiger2): Ditto.
* cipher/whirlpool.c (_gcry_digest_spec_whirlpool): Ditto.
* cipher/md.c (md_enable): Do not allow combination of HMAC and
'expandable-output function'.
(md_final): Check if spec->read is NULL before calling.
(md_read): Ditto.
(md_extract, _gcry_md_extract): New.
* doc/gcrypt.texi: Add SHA3 algorithms and gcry_md_extract.
* src/cipher-proto.h (gcry_md_extract_t): New.
(gcry_md_spec_t): Add 'extract'.
* src/gcrypt-int.g (_gcry_md_extract): New.
* src/gcrypt.h.in (gcry_md_extract): New.
* src/libgcrypt.def: Add gcry_md_extract.
* src/libgcrypt.vers: Add gcry_md_extract.
* src/visibility.c (gcry_md_extract): New.
* src/visibility.h (gcry_md_extract): New.
--
Patch adds new interface for reading output from 'expandable-output
function' MD algorithms that can give variable length output (ie.
SHAKE algorithms from FIPS-202). New function to read output is
gpg_error_t gcry_md_extract(gcry_md_hd_t md, int algo,
void *buffer, size_t length);
Function implicitly finalizes algorithm so that no new input can
be given. Subsequents calls of the function return more output
bytes from the algorithm.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'keccak_permute_32.h' and
'keccak_permute_64.h'.
* cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove.
* cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2)
(USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI)
(keccak_ops_t): New.
(KECCAK_STATE): Add 'state64' and 'state32bi' members.
(KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'.
(rol64, keccak_f1600_state_permute): Remove.
[NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New.
[NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi)
(keccak_absorb_lane32bi): New.
[USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64)
(keccak_absorb_lanes64, keccak_generic64_ops): New.
[USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld)
(keccak_absorb_lanes64_shld, keccak_shld_64_ops): New.
[USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2)
(keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New.
[USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi)
(keccak_absorb_lanes32bi, keccak_generic32bi_ops): New.
[USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2)
(pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2)
(keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New.
(keccak_write): New.
(keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation
selection based on HWF features.
(keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops'
for state manipulation.
(keccak_read): Adjust to KECCAK_CONTEXT changes.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use
'keccak_write' instead of '_gcry_md_block_write'.
* cipher/keccak_permute_32.h: New.
* cipher/keccak_permute_64.h: New.
--
Patch adds new generic 64-bit and 32-bit implementations and
optimized implementations for SHA3:
- Generic 64-bit implementation based on 'simple' implementation
from SUPERCOP package.
- Generic 32-bit bit-inteleaved implementataion based on
'simple32bi' implementation from SUPERCOP package.
- Intel BMI2 optimized variants of 64-bit and 32-bit BI
implementations.
- Intel SHLD optimized variant of 64-bit implementation.
Patch also makes proper use of sponge construction to avoid
use of addition input buffer.
Below are bench-slope benchmarks for new 64-bit implementations
made on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2).
Before (amd64):
SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B
SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B
SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B
SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B
After (generic 64-bit, amd64), 1.10x faster):
SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B
SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B
SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B
SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B
After (Intel SHLD 64-bit, amd64, 1.13x faster):
SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B
SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B
SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B
SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B
After (Intel BMI2 64-bit, amd64, 1.45x faster):
SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B
SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B
SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B
SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B
Benchmarks of new 32-bit implementations on Intel Core i5-4570
(no turbo, 3.2 Ghz, gcc-4.9.2):
Before (win32):
SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B
SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B
SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B
SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B
After (generic 32-bit BI, win32, 1.23x to 1.29x faster):
SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B
SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B
SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B
SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B
After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster):
SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B
SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B
SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B
SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B
Benchmarks of new 32-bit implementation on ARM Cortex-A8
(1008 Mhz, gcc-4.9.1):
Before:
SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B
SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B
SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B
SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B
After (1.56x faster):
SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B
SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B
SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B
SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/keccak.c (keccak_f1600_state_permute): Fix indexes for D[5].
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE): Increase blocksize
USE_SHA3 enabled.
* cipher/keccak.c (SHA3_DELIMITED_SUFFIX, SHAKE_DELIMITED_SUFFIX): New.
(KECCAK_STATE): Add proper state.
(KECCAK_CONTEXT): Add 'outlen'.
(rol64, keccak_f1600_state_permute, transform_blk, transform): New.
(keccak_init): Add proper initialization.
(keccak_final): Add proper finalization.
(selftests_keccak): Add selftests.
(oid_spec_sha3_224, oid_spec_sha3_256, oid_spec_sha3_384)
(oid_spec_sha3_512): Add OID.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Fix output
length.
* cipher/mac-hmac.c (map_mac_algo_to_md): Fix mapping for SHA3-512.
(hmac_get_keylen): Return proper blocksizes for SHA3 algorithms.
[USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224)
(_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384)
(_gcry_mac_type_spec_hmac_sha3_512): New.
* cipher/mac-internal [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224)
(_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384)
(_gcry_mac_type_spec_hmac_sha3_512): New.
* cipher/mac.c (mac_list) [USE_SHA3]: Add SHA3 algorithms.
* cipher/md.c (md_open): Use proper SHA-3 blocksizes for HMAC macpads.
* tests/basic.c (check_digests): Add SHA3 test vectors.
--
Patch adds generic implementation for SHA3. Currently missing with this
patch:
- HMAC SHA3 test vectors, not available from NIST (yet?)
- ASNs
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256)
(GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New.
(GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256)
(GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New.
* cipher/keccak.c: New with stub functions.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c.
* configure.ac (available_digests): Add sha3.
(USE_SHA3): New.
* src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests.
* cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos.
(md_open): Ditto for hmac processing.
* cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping.
* cipher/hmac-tests.c (run_selftests): Prepare for tests.
* cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx".
--
Note that the algo GCRY_MD_SHA3_xxx are prelimanry. We should try to
sync them with OpenPGP.
Signed-off-by: Werner Koch <wk@gnupg.org>
|