Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_NEON)
(POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT): New.
* cipher/poly1305.c [POLY1305_USE_NEON]
(_gcry_poly1305_armv7_neon_init_ext)
(_gcry_poly1305_armv7_neon_finish_ext)
(_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation
if HWF_ARM_NEON set.
* configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
--
Add Andrew Moon's public domain NEON implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 12.34 ns/B 77.27 MiB/s 12.44 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 2.12 ns/B 450.7 MiB/s 2.13 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'.
* cipher/chacha20-armv7-neon.S: New.
* cipher/chacha20.c (USE_NEON): New.
[USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New.
(chacha20_do_setkey) [USE_NEON]: Use Neon implementation if
HWF_ARM_NEON flag set.
(selftest): Self-test encrypting buffer byte by byte.
* configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
--
Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.45 ns/B 70.92 MiB/s 13.56 c/B
STREAM dec | 13.45 ns/B 70.90 MiB/s 13.56 c/B
New:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 6.20 ns/B 153.9 MiB/s 6.25 c/B
STREAM dec | 6.20 ns/B 153.9 MiB/s 6.25 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/primegen.c (check_prime): Return true for the small primes.
(_gcry_prime_check): Return correct values for 2 and lower numbers.
* src/mpicalc.c (do_primecheck): New.
(main): Add command 'P'.
(main): Allow for larger input data.
|
|
* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'.
* cipher/whirlpool-sse2-amd64.S: New.
* cipher/whirlpool.c (USE_AMD64_ASM): New.
(whirlpool_tables_s): New.
(rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single
structure and replace old tables with macros of same name.
(tab): New structure containing above tables.
[USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64)
(whirlpool_transform): New.
* configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'.
--
Benchmark results:
On Intel Core i5-4570 (3.2 Ghz):
After:
WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B
Before:
WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B
On Intel Core i5-2450M (2.5 Ghz):
After:
WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B
Before:
WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B
On Intel Core2 T8100 (2.1 Ghz):
After:
WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B
Before:
WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B
Summary, old vs new ratio:
Intel Core i5-4570: 1.88x
Intel Core i5-2450M: 1.59x
Intel Core2 T8100: 1.94x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rmd160.c (transform): Interleave the left and right lane
rounds to introduce more instruction level parallelism.
--
The benchmarks on different systems:
Intel(R) Atom(TM) CPU N570 @ 1.66GHz
before:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 13.07 ns/B 72.97 MiB/s - c/B
after:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 11.37 ns/B 83.84 MiB/s - c/B
Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 3.31 ns/B 288.0 MiB/s - c/B
after:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 2.08 ns/B 458.5 MiB/s - c/B
Signed-off-by: Andrei Scherer <andsch@inbox.com>
|
|
* cipher/mac.c (_gcry_mac_close): Check for NULL.
--
We always allow this for easier cleanup. actually the docs already
tell that this is allowed.
|
|
* cipher/md.c (_gcry_md_info): Fix arg testing.
--
GnuPG-bug-id: 1697
|
|
* cipher/primegen.c (_gcry_generate_elg_prime): Change to return an
error code.
* cipher/dsa.c (generate): Take care of new return code.
* cipher/elgamal.c (generate): Change to return an error code. Take
care of _gcry_generate_elg_prime return code.
(generate_using_x): Take care of _gcry_generate_elg_prime return code.
(elg_generate): Propagate return code from generate.
--
GnuPG-bug-id: 1699, 1700
Reported-by: S.K. Gupta
Note that the NULL deref may have only happened on malloc failure.
|
|
* src/ec-context.h (mpi_ec_ctx_s): Add cofactor 'h'.
* cipher/ecc-common.h (elliptic_curve_t): Add cofactor 'h'.
(_gcry_ecc_update_curve_param): New API adding cofactor.
* cipher/ecc-curves.c (ecc_domain_parms_t): Add cofactor 'h'.
(ecc_domain_parms_t domain_parms): Add cofactors.
(_gcry_ecc_fill_in_curve, _gcry_ecc_update_curve_param)
(_gcry_ecc_get_curve, _gcry_mpi_ec_new, _gcry_ecc_get_param_sexp)
(_gcry_ecc_get_mpi): Handle cofactor.
* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Likewise.
* cipher/ecc-misc.c (_gcry_ecc_curve_free)
(_gcry_ecc_curve_copy): Likewise.
* cipher/ecc.c (nist_generate_key, ecc_generate)
(ecc_check_secret_key, ecc_sign, ecc_verify, ecc_encrypt_raw)
(ecc_decrypt_raw, _gcry_pk_ecc_get_sexp, _gcry_pubkey_spec_ecc):
Likewise.
(compute_keygrip): Handle cofactor, but skip it for its computation.
* mpi/ec.c (ec_deinit): Likewise.
* tests/t-mpi-point.c (context_param): Likewise.
(test_curve): Add cofactors.
* tests/curves.c (sample_key_1, sample_key_2): Add cofactors.
* tests/keygrip.c (key_grips): Add cofactors.
--
We keep compatibility of compute_keygrip in cipher/ecc.c.
|
|
* cipher/ecc.c (ecc_generate): Check the "comp" flag for EdDSA.
* cipher/ecc-eddsa.c (eddsa_encode_x_y): Add arg WITH_PREFIX.
(_gcry_ecc_eddsa_encodepoint): Ditto.
(_gcry_ecc_eddsa_ensure_compact): Handle the 0x40 compression prefix.
(_gcry_ecc_eddsa_decodepoint): Ditto.
* tests/keygrip.c: Check an compresssed with prefix Ed25519 key.
* tests/t-ed25519.inp: Ditto.
|
|
* cipher/chacha20.c (chacha20_blocks) [!USE_SSE2]: Do not build.
|
|
* cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
--
Benchmark on Cortex-A8 1008Mhz:
New:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.04 ns/B 135.4 MiB/s 7.10 c/B
Old:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.79 ns/B 122.4 MiB/s 7.85 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/gost28147.c (_gcry_gost_enc_data): New.
* cipher/gostr3411-94.c: Rewrite implementation to use u32 mathematic
internally.
* cipher/gost28147.c (_gcry_gost_enc_one): Remove.
--
On my box (Core2 Duo, i386) this highly improves GOST R 34.11-94 speed.
Before:
GOSTR3411_94 | 55.04 ns/B 17.33 MiB/s - c/B
After:
GOSTR3411_94 | 36.70 ns/B 25.99 MiB/s - c/B
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/gost28147.c (gost_setkey, gost_encrypt_block, gost_decrypt_block):
use buf_get_le32/buf_put_le32 helpers.
--
On my box this boosts GOST 28147-89 speed from 36 MiB/s up to 44.5 MiB/s.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_CP): New.
* src/cipher.h (_gcry_digest_spec_gost3411_cp): New.
* cipher/gost28147.c (_gcry_gost_enc_one): Differentiate between
CryptoPro and Test S-Boxes.
* cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_cp,
gost3411_cp_init): New.
* cipher/md.c (md_open): GCRY_MD_GOSTR3411_CP also uses B=32.
--
RFC4357 defines only two S-Boxes that should be used together with
GOST R 34.11-94 - a testing one (from standard itself, for testing only)
and CryptoPro one. Instead of adding a separate gcry_md_ctrl() function
just to switch s-boxes, add a separate MD algorithm using CryptoPro
S-box.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
cipher/gost28147.c (gost_set_extra_info, gost_set_sbox): New.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* src/gcrypt.h.in (GCRYCTL_SET_SBOX, gcry_cipher_set_sbox): New.
* cipher/cipher.c (_gcry_cipher_ctl): pass GCRYCTL_SET_SBOX to
set_extra_info callback.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/gost-s-box.c: New. Outputs optimized expanded representation of
s-boxes (4x256) from compact 16x8 representation.
* cipher/Makefile.am: Add gost-sb.h dependency to gost28147.lo
* cipher/gost.h: Add sbox to the GOST28147_context structure.
* cipher/gost28147.c (gost_setkey): Set default s-box to test s-box from
GOST R 34.11 (this was the only one S-box before).
* cipher/gost28147.c (gost_val): Use sbox from the context.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/gost28147 (oids_gost28147): Add OID from RFC4357.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/gostr3411-94.c: Add OIDs for GOST R 34.11-94 from RFC 4357.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/sha512-armv7-neon.S
(_gcry_sha512_transform_armv7_neon): Byte-swap RW67q and RW1011q
correctly in multi-block loop.
* tests/basic.c (check_digests): Add large test vector for SHA512.
--
Patch fixes bug introduced to multi-block processing by commit df629ba53a6,
"Improve performance of SHA-512/ARM/NEON implementation". Patch also adds
multi-block test vector for SHA-512.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/camellia-arm.S (GET_DATA_POINTER): New.
(_gcry_camellia_arm_encrypt_block): Use GET_DATA_POINTER.
(_gcry_camellia_arm_decrypt_block): Ditto.
* cipher/cast5-arm.S (GET_DATA_POINTER): New.
(_gcry_cast5_arm_encrypt_block, _gcry_cast5_arm_decrypt_block)
(_gcry_cast5_arm_enc_blk2, _gcry_cast5_arm_dec_blk2): Use
GET_DATA_POINTER.
* cipher/rijndael-arm.S (GET_DATA_POINTER): New.
(_gcry_aes_arm_encrypt_block, _gcry_aes_arm_decrypt_block): Use
GET_DATA_POINTER.
* cipher/sha1-armv7-neon.S (GET_DATA_POINTER): New.
(.LK_VEC): Move from .text to .data section.
(_gcry_sha1_transform_armv7_neon): Use GET_DATA_POINTER.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'.
* cipher/chacha20-sse2-amd64.S: New.
* cipher/chacha20.c (USE_SSE2): New.
[USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New.
(chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks
function.
* configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'.
--
Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt
Benchmark on Intel i5-4570 (haswell),
with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3":
Old:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.97 ns/B 483.8 MiB/s 6.31 c/B
STREAM dec | 1.97 ns/B 483.6 MiB/s 6.31 c/B
New:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.931 ns/B 1024.7 MiB/s 2.98 c/B
STREAM dec | 0.930 ns/B 1025.0 MiB/s 2.98 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'.
* cipher/poly1305-avx2-amd64.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX2)
(POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE)
(POLY1305_AVX2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed.
* cipher/poly1305.c [POLY1305_USE_AVX2]
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if
AVX2 supported by CPU.
* configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'.
--
Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmarks on Intel i5-4570 (haswell):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.205 ns/B 4643.5 MiB/s 0.657 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'.
* cipher/poly1305-internal.h (POLY1305_USE_SSE2)
(POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed.
* cipher/poly1305-sse2-amd64.S: New.
* cipher/poly1305.c [POLY1305_USE_SSE2]
(_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New.
(_gcry_polu1305_init) [POLY1305_USE_SSE2]: Use SSE2 version.
* configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'.
--
Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmarks on Intel i5-4570 (haswell):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.844 ns/B 1130.2 MiB/s 2.70 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B
Benchmarks on Intel i5-2450M (sandy-bridge):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 1.25 ns/B 763.0 MiB/s 3.12 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.605 ns/B 1575.9 MiB/s 1.51 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cipher-poly1305.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'.
(_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt)
(_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate)
(_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New.
* cipher/cipher-poly1305.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'.
(cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ...
(_gcry_cipher_setiv): ... here, as with other modes.
* src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'.
* tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New.
(check_ciphers): Add Poly1305 check.
(check_cipher_modes): Call 'check_poly1305_cipher'.
* tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to
bench_aead_... and take nonce as argument.
(bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto.
(bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench)
(bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench)
(bench_poly1305_decrypt_do_bench)
(bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops)
(poly1305_decrypt_ops, poly1305_authenticate_ops): New.
(cipher_modes): Add Poly1305.
(cipher_bench_one): Add special handling for Poly1305.
--
Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant
of this mode is proposed for use in TLS and ipsec:
https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/mac-internal.h (_gcry_mac_type_spec_poly1305_aes)
(_gcry_mac_type_spec_poly1305_camellia)
(_gcry_mac_type_spec_poly1305_twofish)
(_gcry_mac_type_spec_poly1305_serpent)
(_gcry_mac_type_spec_poly1305_seed): New.
* cipher/mac-poly1305.c (poly1305mac_context_s): Add 'hd' and
'nonce_set'.
(poly1305mac_open, poly1305mac_close, poly1305mac_setkey): Add handling
for Poly1305-*** MACs.
(poly1305mac_prepare_key, poly1305mac_setiv): New.
(poly1305mac_reset, poly1305mac_write, poly1305mac_read): Add handling
for 'nonce_set'.
(poly1305mac_ops): Add 'poly1305mac_setiv'.
(_gcry_mac_type_spec_poly1305_aes)
(_gcry_mac_type_spec_poly1305_camellia)
(_gcry_mac_type_spec_poly1305_twofish)
(_gcry_mac_type_spec_poly1305_serpent)
(_gcry_mac_type_spec_poly1305_seed): New.
* cipher/mac.c (mac_list): Add Poly1305-AES, Poly1305-Twofish,
Poly1305-Serpent, Poly1305-SEED and Poly1305-Camellia.
* src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305_AES',
'GCRY_MAC_POLY1305_CAMELLIA', 'GCRY_MAC_POLY1305_TWOFISH',
'GCRY_MAC_POLY1305_SERPENT' and 'GCRY_MAC_POLY1305_SEED'.
* tests/basic.c (check_mac): Add Poly1305-AES test vectors.
* tests/bench-slope.c (bench_mac_init): Set IV for Poly1305-*** MACs.
* tests/bench-slope.c (mac_bench): Set IV for Poly1305-*** MACs.
--
Patch adds Bernstein's Poly1305-AES message authentication code to libgcrypt
and other variants of Poly1305-<128-bit block cipher>.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and
'poly1305-internal.h'.
* cipher/mac-internal.h (poly1305mac_context_s): New.
(gcry_mac_handle): Add 'u.poly1305mac'.
(_gcry_mac_type_spec_poly1305mac): New.
* cipher/mac-poly1305.c: New.
* cipher/mac.c (mac_list): Add Poly1305.
* cipher/poly1305-internal.h: New.
* cipher/poly1305.c: New.
* src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'.
* tests/basic.c (check_mac): Add Poly1035 test vectors; Allow
overriding lengths of data and key buffers.
* tests/bench-slope.c (mac_bench): Increase max algo number from 500 to
600.
* tests/benchmark.c (mac_bench): Ditto.
--
Patch adds Bernstein's Poly1305 message authentication code to libgcrypt.
Implementation is based on Andrew Moon's public domain implementation
from: https://github.com/floodyberry/poly1305-opt
The algorithm added by this patch is the plain Poly1305 without AES and
takes 32-bit key that must not be reused.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/chacha20-avx2-amd64.S (_gcry_chacha20_amd64_avx2_blocks): Add
'vzeroupper' at beginning.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/chacha20.c (USE_AVX2): Enable depending on
ENABLE_AVX2_SUPPORT, not HAVE_GCC_INLINE_ASM_AVX2.
* cipher/chacha20-avx2-amd64.S: Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/chacha20-ssse3-amd64.S (_gcry_chacha20_amd64_ssse3_blocks): On
return, clear XMM registers.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'.
* cipher/chacha20-avx2-amd64.S: New.
* cipher/chacha20.c (USE_AVX2): New macro.
[USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New.
(chacha20_do_setkey): Select AVX2 implementation if there is HW
support.
(selftest): Increase size of buf by 256.
* configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'.
--
Add AVX2 optimized implementation for ChaCha20. Based on implementation by
Andrew Moon.
SSSE3 (Intel Haswell):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B
STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B
AVX2:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.393 ns/B 2428.0 MiB/s 1.26 c/B
STREAM dec | 0.392 ns/B 2433.6 MiB/s 1.25 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'.
* cipher/chacha20-ssse3-amd64.S: New.
* cipher/chacha20.c (USE_SSSE3): New macro.
[USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New.
(chacha20_do_setkey): Select SSSE3 implementation if there is HW
support.
* configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'.
--
Add SSSE3 optimized implementation for ChaCha20. Based on implementation
by Andrew Moon.
Before (Intel Haswell):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.97 ns/B 483.6 MiB/s 6.31 c/B
STREAM dec | 1.97 ns/B 484.0 MiB/s 6.31 c/B
After:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B
STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20.c'.
* cipher/chacha20.c: New.
* cipher/cipher.c (cipher_list): Add ChaCha20.
* configure.ac: Add ChaCha20.
* doc/gcrypt.texi: Add ChaCha20.
* src/cipher.h (_gcry_cipher_spec_chacha20): New.
* src/gcrypt.h.in (GCRY_CIPHER_CHACHA20): Add new algo.
* tests/basic.c (MAX_DATA_LEN): Increase to 128 from 100.
(check_stream_cipher): Add ChaCha20 test-vectors.
(check_ciphers): Add ChaCha20.
--
Patch adds Bernstein's ChaCha20 cipher to libgcrypt. Implementation is based
on public domain implementations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/pubkey.c (map_algo): Mape RSA_E and RSA_S.
|
|
* cipher/md.c (_gcry_md_selftest): Check for spec being NULL.
--
Also removed left-over code in unused file cipher/test-getrusage.c.
Found by Hans-Christoph Steiner with cppcheck.
|
|
* cipher/Makefile.am: Add 'des-amd64.S'.
* cipher/cipher-selftests.c (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Handle failures
from 'setkey' function.
* cipher/cipher.c (_gcry_cipher_open_internal) [USE_DES]: Setup bulk
functions for 3DES.
* cipher/des-amd64.S: New file.
* cipher/des.c (USE_AMD64_ASM, ATTR_ALIGNED_16): New macros.
[USE_AMD64_ASM] (_gcry_3des_amd64_crypt_block)
(_gcry_3des_amd64_ctr_enc), _gcry_3des_amd64_cbc_dec)
(_gcry_3des_amd64_cfb_dec): New prototypes.
[USE_AMD64_ASM] (tripledes_ecb_crypt): New function.
(TRIPLEDES_ECB_BURN_STACK): New macro.
(_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec)
(bulk_selftest_setkey, selftest_ctr, selftest_cbc, selftest_cfb): New
functions.
(selftest): Add call to CTR, CBC and CFB selftest functions.
(do_tripledes_encrypt, do_tripledes_decrypt): Use
TRIPLEDES_ECB_BURN_STACK.
* configure.ac [host=x86-64]: Add 'des-amd64.lo'.
* src/cipher.h (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec)
(_gcry_3des_cfb_dec): New prototypes.
--
Add non-parallel functions for small speed-up and 3-way parallel functions for
modes of operation that support parallel processing.
Old vs new (Intel Core i5-4570):
================================
enc dec
ECB 1.17x 1.17x
CBC 1.17x 2.51x
CFB 1.16x 2.49x
OFB 1.17x 1.17x
CTR 2.56x 2.56x
Old vs new (Intel Core i5-2450M):
=================================
enc dec
ECB 1.28x 1.28x
CBC 1.27x 2.33x
CFB 1.27x 2.34x
OFB 1.27x 1.27x
CTR 2.36x 2.35x
New (Intel Core i5-4570):
=========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 28.39 ns/B 33.60 MiB/s 90.84 c/B
ECB dec | 28.27 ns/B 33.74 MiB/s 90.45 c/B
CBC enc | 29.50 ns/B 32.33 MiB/s 94.40 c/B
CBC dec | 13.35 ns/B 71.45 MiB/s 42.71 c/B
CFB enc | 29.59 ns/B 32.23 MiB/s 94.68 c/B
CFB dec | 13.41 ns/B 71.12 MiB/s 42.91 c/B
OFB enc | 28.90 ns/B 33.00 MiB/s 92.47 c/B
OFB dec | 28.90 ns/B 33.00 MiB/s 92.48 c/B
CTR enc | 13.39 ns/B 71.20 MiB/s 42.86 c/B
CTR dec | 13.39 ns/B 71.21 MiB/s 42.86 c/B
Old (Intel Core i5-4570):
=========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 33.24 ns/B 28.69 MiB/s 106.4 c/B
ECB dec | 33.26 ns/B 28.67 MiB/s 106.4 c/B
CBC enc | 34.45 ns/B 27.69 MiB/s 110.2 c/B
CBC dec | 33.45 ns/B 28.51 MiB/s 107.1 c/B
CFB enc | 34.43 ns/B 27.70 MiB/s 110.2 c/B
CFB dec | 33.41 ns/B 28.55 MiB/s 106.9 c/B
OFB enc | 33.79 ns/B 28.22 MiB/s 108.1 c/B
OFB dec | 33.79 ns/B 28.22 MiB/s 108.1 c/B
CTR enc | 34.27 ns/B 27.83 MiB/s 109.7 c/B
CTR dec | 34.27 ns/B 27.83 MiB/s 109.7 c/B
New (Intel Core i5-2450M):
==========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 42.21 ns/B 22.59 MiB/s 105.5 c/B
ECB dec | 42.23 ns/B 22.58 MiB/s 105.6 c/B
CBC enc | 43.70 ns/B 21.82 MiB/s 109.2 c/B
CBC dec | 23.25 ns/B 41.02 MiB/s 58.12 c/B
CFB enc | 43.71 ns/B 21.82 MiB/s 109.3 c/B
CFB dec | 23.23 ns/B 41.05 MiB/s 58.08 c/B
OFB enc | 42.73 ns/B 22.32 MiB/s 106.8 c/B
OFB dec | 42.73 ns/B 22.32 MiB/s 106.8 c/B
CTR enc | 23.31 ns/B 40.92 MiB/s 58.27 c/B
CTR dec | 23.35 ns/B 40.84 MiB/s 58.38 c/B
Old (Intel Core i5-2450M):
==========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 53.98 ns/B 17.67 MiB/s 134.9 c/B
ECB dec | 54.00 ns/B 17.66 MiB/s 135.0 c/B
CBC enc | 55.43 ns/B 17.20 MiB/s 138.6 c/B
CBC dec | 54.27 ns/B 17.57 MiB/s 135.7 c/B
CFB enc | 55.42 ns/B 17.21 MiB/s 138.6 c/B
CFB dec | 54.35 ns/B 17.55 MiB/s 135.9 c/B
OFB enc | 54.49 ns/B 17.50 MiB/s 136.2 c/B
OFB dec | 54.49 ns/B 17.50 MiB/s 136.2 c/B
CTR enc | 55.02 ns/B 17.33 MiB/s 137.5 c/B
CTR dec | 55.01 ns/B 17.34 MiB/s 137.5 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/md2.c: New.
* cipher/md.c (digest_list): add _gcry_digest_spec_md2.
* tests/basic.c (check_digests): add MD2 test vectors.
* configure.ac (default_digests): disable md2 by default.
--
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Some minor indentation fixes by wk.
|
|
* src/cipher.h (PUBKEY_ENC_PKCS1_RAW): New.
* cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Handle pkcs1-raw
flag.
* cipher/pubkey-util.c (_gcry_pk_util_data_to_mpi):
Handle s-exp like (data (flags pkcs1-raw) (value xxxxx))
* cipher/rsa-common.c (_gcry_rsa_pkcs1_encode_raw_for_sig):
PKCS#1-encode data with embedded hash OID for signature verification.
* tests/basic.c (check_pubkey_sign): Add tests for s-exps with pkcs1-raw
flag.
--
Allow user to specify (flags pkcs1-raw) to enable pkcs1 padding of raw
value (no hash algorithm is specified). It is up to the user to verify
that the passed value is properly formatted and includes DER-encoded
ASN OID of the used hash function.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/rsa.c (rsa_decrypt): Loop to get multiplicative inverse.
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
* cipher/salsa20.c (USE_ARM_NEON_ASM): Define only if
ENABLE_NEON_SUPPORT is defined.
* cipher/serpent.c (USE_NEON): Ditto.
* cipher/sha1.c (USE_NEON): Ditto.
* cipher/sha512.c (USE_ARM_NEON_ASM): Ditto.
--
The generic C source files must only include NEON support if that is
enabled. The dedicated ASM files are conditionally compiled and thus
do not need to use it.
GnuPG-bug-id: 1603
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/ecc-curves.c (_gcry_ecc_update_curve_param): Release passed mpi
values.
* cipher/ecc.c (compute_keygrip): Fix potential memory leak in error
path.
* cipher/ecc.c (_gcry_ecc_get_curve): Release temporary mpi.
--
==11657== 252 (80 direct, 172 indirect) bytes in 4 blocks are definitely lost in loss record 8 of 8
==11657== at 0x4028A28: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==11657== by 0x404178F: _gcry_private_malloc (stdmem.c:113)
==11657== by 0x403CED1: do_malloc.constprop.4 (global.c:768)
==11657== by 0x403DD01: _gcry_xmalloc (global.c:790)
==11657== by 0x409EAE0: _gcry_mpi_alloc (mpiutil.c:84)
==11657== by 0x409C4E4: _gcry_mpi_scan (mpicoder.c:466)
==11657== by 0x404009C: _gcry_sexp_nth_mpi (sexp.c:796)
==11657== by 0x40410B5: _gcry_sexp_vextract_param (sexp.c:2327)
==11657== by 0x4041396: _gcry_sexp_extract_param (sexp.c:2378)
==11657== by 0x407B895: compute_keygrip (ecc.c:1492)
==11657== by 0x404BBE8: _gcry_pk_get_keygrip (pubkey.c:674)
==11657== by 0x403B1BF: gcry_pk_get_keygrip (visibility.c:1056)
==16502== 144 (60 direct, 84 indirect) bytes in 3 blocks are definitely lost in loss record 3 of 7
==16502== at 0x4028A28: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==16502== by 0x404B4DE: _gcry_private_malloc (stdmem.c:113)
==16502== by 0x404667B: do_malloc (global.c:768)
==16502== by 0x40466E7: _gcry_malloc (global.c:790)
==16502== by 0x4046A55: _gcry_xmalloc (global.c:944)
==16502== by 0x40CD25B: _gcry_mpi_alloc (mpiutil.c:84)
==16502== by 0x40CAC3E: _gcry_mpi_scan (mpicoder.c:548)
==16502== by 0x40A72B2: scanval (ecc-curves.c:432)
==16502== by 0x40A7B0D: _gcry_ecc_get_curve (ecc-curves.c:685)
==16502== by 0x4058164: _gcry_pk_get_curve (pubkey.c:747)
==16502== by 0x4043E14: gcry_pk_get_curve (visibility.c:1067)
==16502== by 0x8048934: check_matching (curves.c:124)
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/rmd160.c (_gcry_rmd160_mixblock): pass 1 to transform
--
Currently _gcry_rmd160_mixblock() passes 64 as nblocks to transform()
function, while passing only one block of data. This causes acess after
the allocated data and tons of errors on each valgrind invokation.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
This fixes commit 50b8c834.
|
|
* cipher/tiger.c (tiger_init): Add arg FLAGS.
(tiger1_init, tiger2_init): Ditto.
|
|
* src/gcrypt.h.in (GCRY_MD_FLAG_BUGEMU1): New.
* src/cipher-proto.h (gcry_md_init_t): Add arg FLAGS. Change all code
to implement that flag.
* cipher/md.c (gcry_md_context): Replace SECURE and FINALIZED by bit
field FLAGS. Add flag BUGEMU1. Change all users.
(md_open): Replace args SECURE and HMAC by FLAGS. Init flags.bugemu1.
(_gcry_md_open): Add for GCRY_MD_FLAG_BUGEMU1.
(md_enable): Pass bugemu1 flag to the hash init function.
(_gcry_md_reset): Ditto.
--
This problem is for example exhibited in the Linux cryptsetup tool.
See https://bbs.archlinux.org/viewtopic.php?id=175737 . It has be
been tracked down by Milan Broz.
The suggested way of using the flag is:
if (whirlpool_bug_assumed)
{
#if GCRYPT_VERSION_NUMBER >= 0x010601
err = gcry_md_open (&hd, GCRY_MD_WHIRLPOOL, GCRY_MD_FLAG_BUGEMU1)
if (gpg_err_code (err) == GPG_ERR_INV_ARG)
error ("Need at least Libggcrypt 1.6.1 for the fix");
else
{
do_hash (hd);
gcry_md_close (hd);
}
#endif
}
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* configure.ac (NEED_GPG_ERROR_VERSION): Require 1.13.
(gl_LOCK): Remove.
* src/ath.c, src/ath.h: Remove. Remove from all files. Replace all
mutexes by gpgrt based statically initialized locks.
* src/global.c (global_init): Remove ath_init.
(_gcry_vcontrol): Make ath install a dummy function.
(print_config): Remove threads info line.
* doc/gcrypt.texi: Simplify the multi-thread related documentation.
--
The current code does only work on ELF systems with weak symbol
support. In particular no locks were used under Windows. With the
new gpgrt_lock functions from the soon to be released libgpg-error
1.13 we have a better portable scheme which also allows for static
initialized mutexes.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/kdf.c (_gcry_kdf_pkdf2): Use gcry_md_reset
to speed up calculation.
--
Current PBKDF2 implementation uses gcry_md_set_key in every iteration
which is extremely slow (even in comparison with other implementations).
Use gcry_md_reset instead and set key only once.
With this test program:
char input[32000], salt[8], key[16];
gcry_kdf_derive(input, sizeof(input), GCRY_KDF_PBKDF2,
gcry_md_map_name("sha1"),
salt, sizeof(salt), 100000, sizeof(key), key);
running time without patch:
real 0m11.165s
user 0m11.136s
sys 0m0.000s
and with patch applied
real 0m0.230s
user 0m0.184s
sys 0m0.024s
(The problem was found when cryptsetup started to use gcrypt internal PBKDF2
and for very long keyfiles unlocking time increased drastically.
See https://bugzilla.redhat.com/show_bug.cgi?id=1051733)
Signed-off-by: Milan Broz <gmazyland@gmail.com>
|
|
* cipher/bithelp.h (bswap32): Rename to _gcry_bswap32.
(bswap64): Rename to _gcry_bswap64.
--
NetBSD provides system macros bswap32 and bswap64 which conflicts with
our macros. Prefixing them with _gcry_ is easier than to come up with
a proper test.
GnuPG-bug-id: 1600
Signed-off-by: Werner Koch <wk@gnupg.org>
(cherry picked from commit 36214bfa8f612cd2faa4de217d1a12a8b5faadbf)
|
|
* cipher/dsa-common (_gcry_dsa_normalize_hash): New. Truncate opaque
mpis as required for DSA and ECDSA signature schemas.
* cipher/dsa.c (verify): Return gpg_err_code_t value from verify() to
behave like the rest of internal sign/verify functions.
* cipher/dsa.c (sign, verify, dsa_verify): Factor out hash truncation.
* cipher/ecc-ecdsa.c (_gcry_ecc_ecdsa_sign): Factor out hash truncation.
* cipher/ecc-ecdsa.c (_gcry_ecc_ecdsa_verify):
as required by ECDSA scheme, truncate hash values to bitlength of
used curve.
* tests/pubkey.c (check_ecc_sample_key): add a testcase for hash
truncation.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/ecc-curves.c (domain_parmss): Add two GOST R 34.10-2012 curves
proposed/pending to standardization by TC26 (Russian cryptography
technical comitee).
* cipher/ecc-curves.c (curve_alias): Add OID aliases.
* tests/curves.c: Increase N_CURVES.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|