|
* cipher/cipher.c (cipher_reset): Setup default taglen for OCB after
clearing state.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/arcfour-amd64.S (_gcry_arcfour_amd64): Fix swapped store of
'x' and 'y'.
* tests/basic.c (get_algo_mode_blklen): New.
(check_one_cipher_core): Add new tests for split buffer input on
encryption and decryption.
--
Reported-by: Dima Kukulniak <dima.ky@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-gcm-intel-pclmul.c [gcc-version >= 4.4]: Add GCC target
pragma to disable compiler use of SSE.
* cipher/rijndael-aesni.c [gcc-version >= 4.4]: Ditto.
* cipher/rijndael-ssse3-amd64.c [gcc-version >= 4.4]: Ditto.
--
These implementations assume that the compiler does not use XMM registers
between assembly blocks.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-internal.h (gcry_cipher_handle): Add bulk.ocb_crypt
and bulk.ocb_auth.
(_gcry_cipher_ocb_get_l): New prototype.
* cipher/cipher-ocb.c (get_l): Rename to ...
(_gcry_cipher_ocb_get_l): ... this.
(_gcry_cipher_ocb_authenticate, ocb_crypt): Use bulk function when
available.
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk
functions for AES.
* cipher/rijndael-aesni.c (get_l, aesni_ocb_enc, aes_ocb_dec)
(_gcry_aes_aesni_ocb_crypt, _gcry_aes_aesni_ocb_auth): New.
* cipher/rijndael.c [USE_AESNI] (_gcry_aes_aesni_ocb_crypt)
(_gcry_aes_aesni_ocb_auth): New prototypes.
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New
prototypes.
* tests/basic.c (check_ocb_cipher_largebuf): New.
(check_ocb_cipher): Add large buffer encryption/decryption test.
--
Patch adds bulk encryption/decryption/authentication code for AES-NI
accelerated AES.
Benchmark on Intel i5-4570 (3200 MHz, turbo off):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 2.12 ns/B 449.7 MiB/s 6.79 c/B
OCB dec | 2.12 ns/B 449.6 MiB/s 6.79 c/B
OCB auth | 2.07 ns/B 459.9 MiB/s 6.64 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 0.292 ns/B 3262.5 MiB/s 0.935 c/B
OCB dec | 0.297 ns/B 3212.2 MiB/s 0.950 c/B
OCB auth | 0.260 ns/B 3666.1 MiB/s 0.832 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only when
HAVE_GCC_ATTRIBUTE_PACKED and HAVE_GCC_ATTRIBUTE_ALIGNED are defined.
(bufhelp_int_t): New type.
(buf_cpy, buf_xor, buf_xor_1, buf_xor_2dst, buf_xor_n_copy_2): Use
'bufhelp_int_t'.
[BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_u32_t, bufhelp_u64_t): New.
[BUFHELP_FAST_UNALIGNED_ACCESS] (buf_get_be32, buf_get_le32)
(buf_put_be32, buf_put_le32, buf_get_be64, buf_get_le64)
(buf_put_be64, buf_put_le64): Use 'bufhelp_uXX_t'.
* configure.ac (gcry_cv_gcc_attribute_packed): New.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/bufhelp.h: Move include for uintptr_t to ...
* src/types.h: here. Check that config.h has been included.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
--
|
|
* cipher/hash-common.c (_gcry_md_block_write): Remove NULL check for
hd->buf.
--
HD->BUF is not allocated but part of the struct. HD has already been
dereferenced twice, thus the check does not make sense. Detected by
Stack 0.3:
bug: anti-simplify
model: |
%cmp4 = icmp eq i8* %arraydecay, null, !dbg !29
--> false
stack:
- /home/wk/s/libgcrypt/cipher/hash-common.c:114:0
ncore: 1
core:
- /home/wk/s/libgcrypt/cipher/hash-common.c:108:0
- null pointer dereference
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/cipher-ocb.c (ocb_checksum): New.
(ocb_crypt): Move checksum calculation outside main crypt loop, do
checksum calculation for encryption before inbuf is overwritten.
* tests/basic.c (check_ocb_cipher): Rename to ...
(do_check_ocb_cipher): ... to this and add argument for testing
in-place encryption/decryption.
(check_ocb_cipher): New.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/elgamal.c (USE_BLINDING): New.
(decrypt): Rewrite to use ciphertext blinding.
--
CVE-id: CVE-2014-3591
As a countermeasure to new side-channel attacks on sliding-window
exponentiation we blind the ciphertext for Elgamal decryption. This
is similar to what we are doing with RSA. This patch is a backport of
the GnuPG 1.4 commit ff53cf06e966dce0daba5f2c84e03ab9db2c3c8b.
Unfortunately, the performance impact of Elgamal blinding is quite
noticeable (i5-2410M CPU @ 2.30GHz TP 220):
Before:
Algorithm generate 100*priv 100*public
------------------------------------------------
ELG 1024 bit - 100ms 90ms
ELG 2048 bit - 330ms 350ms
ELG 3072 bit - 660ms 790ms
After:
Algorithm generate 100*priv 100*public
------------------------------------------------
ELG 1024 bit - 150ms 90ms
ELG 2048 bit - 520ms 360ms
ELG 3072 bit - 1100ms 800ms
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/Makefile.am (gost-s-box): Use CC_FOR_BUILD.
(noinst_PROGRAMS): Remove.
(EXTRA_DIST): New.
(CLEANFILES): New.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/rijndael.c (do_setkey): Use USE_SSSE3 instead of USE_AESNI
around SSSE3 setkey selection.
--
Reported-by: Richard H Lee <ricardohenrylee@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-ocb.c: New.
* cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-ocb.c
* cipher/cipher-internal.h (OCB_BLOCK_LEN, OCB_L_TABLE_SIZE): New.
(gcry_cipher_handle): Add fields marks.finalize and u_mode.ocb.
* cipher/cipher.c (_gcry_cipher_open_internal): Add OCB mode.
(_gcry_cipher_open_internal): Setup default taglen of OCB.
(cipher_reset): Clear OCB specific data.
(cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate)
(_gcry_cipher_gettag, _gcry_cipher_checktag): Call OCB functions.
(_gcry_cipher_setiv): Add OCB specific nonce setting.
(_gcry_cipher_ctl): Add GCRYCTL_FINALIZE and GCRYCTL_SET_TAGLEN
* src/gcrypt.h.in (GCRYCTL_SET_TAGLEN): New.
(gcry_cipher_final): New.
* cipher/bufhelp.h (buf_xor_1): New.
* tests/basic.c (hex2buffer): New.
(check_ocb_cipher): New.
(main): Call it here. Add option --cipher-modes.
* tests/bench-slope.c (bench_aead_encrypt_do_bench): Call
gcry_cipher_final.
(bench_aead_decrypt_do_bench): Ditto.
(bench_aead_authenticate_do_bench): Ditto. Check error code.
(bench_ocb_encrypt_do_bench): New.
(bench_ocb_decrypt_do_bench): New.
(bench_ocb_authenticate_do_bench): New.
(ocb_encrypt_ops): New.
(ocb_decrypt_ops): New.
(ocb_authenticate_ops): New.
(cipher_modes): Add them.
(cipher_bench_one): Skip wrong block length for OCB.
* tests/benchmark.c (cipher_bench): Add field noncelen to MODES. Add
OCB support.
--
See the comments on top of cipher/cipher-ocb.c for the patent status
of the OCB mode.
The implementation has not yet been optimized and as such is not faster
than the other AEAD modes. A first candidate for optimization is the
double_block function. Large improvements can be expected by writing
an AES ECB function to work on multiple blocks.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/bithelp.h (_gcry_ctz, _gcry_ctz64): New.
* configure.ac (HAVE_BUILTIN_CTZ): Add new test.
--
Note that these functions return the number of bits in the word when
passing 0.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* Makefile.am (DISTCHECK_CONFIGURE_FLAGS): Remove --enable-ciphers.
* cipher/Makefile.am (DISTCLEANFILES): Add gost-sb.h.
|
|
--
The Manifest file was part of an experiment a long time ago to
implement source-level integrity. It has not been maintained for more
than a decade, and with the advent of git it is superfluous anyway.
|
|
* cipher/stribog.c (C16): Avoid allocating superfluous space.
--
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/gostr3411-94.c (gost3411_final): Fix loop
--
The maximum iteration count for filling the l (bit length) array was
incorrectly set to 32 (missed that in the u8->u32 refactoring). This did
not result in stack corruption, since the nblocks variable would be
exhausted earlier compared to 8 32-bit values (the size of the array).
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/primegen.c (prime_generate_internal): Refactor generator code
to not leak memory for non-implemented feature.
(_gcry_prime_group_generator): Refactor to not leak memory for invalid
args. Also make sure that R_G is set as soon as possible.
--
GnuPG-bug-id: 1705
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
--
|
|
* cipher/scrypt.c (_salsa20_core): Rename to salsa20_core. Change
callers.
(_scryptBlockMix): Rename to scrypt_block_mix. Change callers.
(_scryptROMix): Rename to scrypt_ro_mix. Change callers.
--
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
--
|
|
* cipher/rmd160.c (_gcry_rmd160_mixblock): Store result to buffer in
native endianness.
--
Commit 4515315f61fbf79413e150fbd1d5f5a2435f2bc5 unintentionally changed this
native-endian store to little-endian.
Reported-by: Yuriy Kaminskiy <yumkam@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'.
* cipher/rijndael-internal.h (USE_SSSE3): New.
(RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'.
* cipher/rijndael-ssse3-amd64.c: New.
* cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey)
(_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New.
(do_setkey): Add HWF check for SSSE3 and setup for SSSE3
implementation.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add
selection for SSSE3 implementation.
* configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
--
This patch adds "AES with vector permutations" implementation by
Mike Hamburg. Public-domain source-code is available at:
http://crypto.stanford.edu/vpaes/
Benchmark on Intel Core2 T8100 (2.1 GHz, no turbo):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.79 ns/B 108.5 MiB/s 18.46 c/B
ECB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CBC enc | 7.77 ns/B 122.7 MiB/s 16.33 c/B
CBC dec | 7.74 ns/B 123.2 MiB/s 16.26 c/B
CFB enc | 7.88 ns/B 121.0 MiB/s 16.54 c/B
CFB dec | 7.56 ns/B 126.1 MiB/s 15.88 c/B
OFB enc | 9.02 ns/B 105.8 MiB/s 18.94 c/B
OFB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CTR enc | 7.80 ns/B 122.2 MiB/s 16.38 c/B
CTR dec | 7.81 ns/B 122.2 MiB/s 16.39 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.77 ns/B 165.2 MiB/s 12.13 c/B
ECB dec | 7.13 ns/B 133.7 MiB/s 14.98 c/B
CBC enc | 5.27 ns/B 181.0 MiB/s 11.06 c/B
CBC dec | 6.39 ns/B 149.3 MiB/s 13.42 c/B
CFB enc | 5.27 ns/B 180.9 MiB/s 11.07 c/B
CFB dec | 5.28 ns/B 180.7 MiB/s 11.08 c/B
OFB enc | 6.11 ns/B 156.1 MiB/s 12.83 c/B
OFB dec | 6.13 ns/B 155.5 MiB/s 12.88 c/B
CTR enc | 5.26 ns/B 181.5 MiB/s 11.04 c/B
CTR dec | 5.24 ns/B 182.0 MiB/s 11.00 c/B
Benchmark on Intel i5-2450M (2.5 GHz, no turbo, aes-ni disabled):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.06 ns/B 118.3 MiB/s 20.15 c/B
ECB dec | 8.21 ns/B 116.1 MiB/s 20.53 c/B
CBC enc | 7.88 ns/B 121.1 MiB/s 19.69 c/B
CBC dec | 7.57 ns/B 126.0 MiB/s 18.92 c/B
CFB enc | 7.87 ns/B 121.2 MiB/s 19.67 c/B
CFB dec | 7.56 ns/B 126.2 MiB/s 18.89 c/B
OFB enc | 8.27 ns/B 115.3 MiB/s 20.67 c/B
OFB dec | 8.28 ns/B 115.1 MiB/s 20.71 c/B
CTR enc | 8.02 ns/B 119.0 MiB/s 20.04 c/B
CTR dec | 8.02 ns/B 118.9 MiB/s 20.05 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.03 ns/B 236.6 MiB/s 10.07 c/B
ECB dec | 5.28 ns/B 180.8 MiB/s 13.19 c/B
CBC enc | 3.77 ns/B 252.7 MiB/s 9.43 c/B
CBC dec | 4.69 ns/B 203.3 MiB/s 11.73 c/B
CFB enc | 3.75 ns/B 254.3 MiB/s 9.37 c/B
CFB dec | 3.69 ns/B 258.6 MiB/s 9.22 c/B
OFB enc | 4.17 ns/B 228.7 MiB/s 10.43 c/B
OFB dec | 4.17 ns/B 228.7 MiB/s 10.42 c/B
CTR enc | 3.72 ns/B 256.5 MiB/s 9.30 c/B
CTR dec | 3.72 ns/B 256.1 MiB/s 9.31 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/scrypt.c (_scryptBlockMix): Cast X to 'u32 *' through 'void *'.
--
Patch fixes 'cast increases required alignment' warnings seen on GCC:
scrypt.c: In function '_scryptBlockMix':
scrypt.c:145:22: warning: cast increases required alignment of target type [-Wcast-align]
_salsa20_core ((u32*)X, (u32*)X, 8);
^
scrypt.c:145:31: warning: cast increases required alignment of target type [-Wcast-align]
_salsa20_core ((u32*)X, (u32*)X, 8);
^
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/md.c (md_open, md_copy): Cast 'char *' to ctx through
'void *'.
* cipher/md4.c (md4_final): Use buf_put_* helper instead of
converting 'char *' to 'u32 *'.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (_gcry_rmd160_mixblock, rmd160_final): Ditto.
* cipher/sha1.c (sha1_final): Ditto.
* cipher/sha256.c (sha256_final): Ditto.
* cipher/sha512.c (sha512_final): Ditto.
* cipher/tiger.c (tiger_final): Ditto.
--
Patch fixes 'cast increases required alignment' warnings seen on GCC:
md.c: In function 'md_open':
md.c:318:23: warning: cast increases required alignment of target type [-Wcast-align]
hd->ctx = ctx = (struct gcry_md_context *) ((char *) hd + n);
^
md.c: In function 'md_copy':
md.c:491:22: warning: cast increases required alignment of target type [-Wcast-align]
bhd->ctx = b = (struct gcry_md_context *) ((char *) bhd + n);
^
md4.c: In function 'md4_final':
md4.c:258:20: warning: cast increases required alignment of target type [-Wcast-align]
#define X(a) do { *(u32*)p = le_bswap32((*hd).a) ; p += 4; } while(0)
^
md4.c:259:3: note: in expansion of macro 'X'
X(A);
^
md4.c:258:20: warning: cast increases required alignment of target type [-Wcast-align]
#define X(a) do { *(u32*)p = le_bswap32((*hd).a) ; p += 4; } while(0)
^
md4.c:260:3: note: in expansion of macro 'X'
X(B);
^
[removed the rest]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-internal.h (RIJNDAEL_context_s): Add u32 variants of
keyschedule arrays to unions u1 and u2.
(keyschedenc32, keyscheddec32): New.
* cipher/rijndael.c (u32_a_t): Remove.
(do_setkey): Add and use tkk[].data32, k_u32, tk_u32 and W_u32; Remove
casting byte arrays to u32_a_t.
(prepare_decryption, do_encrypt_fn, do_decrypt_fn): Use keyschedenc32
and keyscheddec32; Remove casting byte arrays to u32_a_t.
--
Patch fixes 'cast increases required alignment' compiler warnings that GCC was showing:
rijndael.c: In function 'do_setkey':
rijndael.c:310:13: warning: cast increases required alignment of target type [-Wcast-align]
*((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
^
rijndael.c:310:34: warning: cast increases required alignment of target type [-Wcast-align]
*((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
[removed the rest]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
draft-irtf-cfrg-chacha20-poly1305-03
* cipher/cipher-internal.h (gcry_cipher_handle): Use separate byte
counters for AAD and data in Poly1305.
* cipher/cipher-poly1305.c (poly1305_fill_bytecount): Remove.
(poly1305_fill_bytecounts, poly1305_do_padding): New.
(poly1305_aad_finish): Fill padding to Poly1305 and do not fill AAD
length.
(_gcry_cipher_poly1305_authenticate, _gcry_cipher_poly1305_encrypt)
(_gcry_cipher_poly1305_decrypt): Update AAD and data length separately.
(_gcry_cipher_poly1305_tag): Fill padding and bytecounts to Poly1305.
(_gcry_cipher_poly1305_setkey, _gcry_cipher_poly1305_setiv): Reset
AAD and data byte counts; only allow 96-bit IV.
* cipher/cipher.c (_gcry_cipher_open_internal): Limit Poly1305-AEAD to
ChaCha20 cipher.
* tests/basic.c (_check_poly1305_cipher): Update test-vectors.
(check_ciphers): Limit Poly1305-AEAD checks to ChaCha20.
* tests/bench-slope.c (cipher_bench_one): Ditto.
--
The latest Internet-Draft version of "ChaCha20 and Poly1305 for IETF protocols"
has added additional padding to Poly1305-AEAD and limited the supported IV
size to 96 bits:
https://www.ietf.org/rfcdiff?url1=draft-nir-cfrg-chacha20-poly1305-03&difftype=--html&submit=Go!&url2=draft-irtf-cfrg-chacha20-poly1305-03
The patch makes the Poly1305-AEAD implementation match these changes and
limits Poly1305-AEAD to ChaCha20 only.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/chacha20.c (CHACHA20_CTR_SIZE): New.
(chacha20_ivsetup): Add setup for full counter.
(chacha20_setiv): Allow ivlen == CHACHA20_CTR_SIZE.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-gcm-intel-pclmul.c
(_gcry_ghash_setup_intel_pclmul): Remove 'h' parameter.
* cipher/cipher-gcm.c (_gcry_ghash_setup_intel_pclmul): Ditto.
(fillM): Get 'h' pointer from 'c'.
(setupM): Remove 'h' parameter.
(_gcry_cipher_gcm_setkey): Only pass 'c' to setupM.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-internal.h (rijndael_prefetchfn_t): New.
(RIJNDAEL_context): Add 'prefetch_enc_fn' and 'prefetch_dec_fn'.
* cipher/rijndael-tables.h (S, T1, T2, T3, T4, T5, T6, T7, T8, S5, U1)
(U2, U3, U4): Remove.
(encT, dec_tables, decT, inv_sbox): Add.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_encrypt_block): Add parameter for passing table pointer
to assembly implementation.
(prefetch_table, prefetch_enc, prefetch_dec): New.
(do_setkey): Setup context prefetch functions depending on selected
rijndael implementation; Use new tables for key setup.
(prepare_decryption): Use new tables for decryption key setup.
(do_encrypt_aligned): Rename to...
(do_encrypt_fn): ... to this, change to use new compact tables,
make it handle unaligned input, and unroll the rounds loop by two.
(do_encrypt): Remove handling of unaligned input/output; pass table
pointer to assembly implementations.
(rijndael_encrypt, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec): Prefetch encryption tables
before encryption.
(do_decrypt_aligned): Rename to...
(do_decrypt_fn): ... to this, change to use new compact tables,
make it handle unaligned input, and unroll the rounds loop by two.
(do_decrypt): Remove handling of unaligned input/output; pass table
pointer to assembly implementations.
(rijndael_decrypt, _gcry_aes_cbc_dec): Prefetch decryption tables
before decryption.
* cipher/rijndael-amd64.S: Use 1+1.25 KiB tables for
encryption+decryption; remove tables from assembly file.
* cipher/rijndael-arm.S: Ditto.
--
The patch replaces the 4+4.25 KiB look-up tables in the generic
implementation, the 8+8 KiB tables in the AMD64 implementation, and the
2+2 KiB tables in the ARM implementation with 1+1.25 KiB look-up tables,
and adds prefetching of the look-up tables.
AMD64 assembly is slower than before because of additional rotation
instructions. The generic C implementation is now better optimized and
actually faster than before.
Benchmark results on Intel i5-4570 (turbo off) (64-bit, AMD64 assembly):
tests/bench-slope --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes
Old:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.10 ns/B 307.5 MiB/s 9.92 c/B
ECB dec | 3.15 ns/B 302.5 MiB/s 10.09 c/B
CBC enc | 3.46 ns/B 275.5 MiB/s 11.08 c/B
CBC dec | 3.19 ns/B 299.2 MiB/s 10.20 c/B
CFB enc | 3.48 ns/B 274.4 MiB/s 11.12 c/B
CFB dec | 3.23 ns/B 294.8 MiB/s 10.35 c/B
OFB enc | 3.29 ns/B 290.2 MiB/s 10.52 c/B
OFB dec | 3.31 ns/B 288.3 MiB/s 10.58 c/B
CTR enc | 3.64 ns/B 261.7 MiB/s 11.66 c/B
CTR dec | 3.65 ns/B 261.6 MiB/s 11.67 c/B
New:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.21 ns/B 226.7 MiB/s 13.46 c/B
ECB dec | 4.27 ns/B 223.2 MiB/s 13.67 c/B
CBC enc | 4.15 ns/B 229.8 MiB/s 13.28 c/B
CBC dec | 3.85 ns/B 247.8 MiB/s 12.31 c/B
CFB enc | 4.16 ns/B 229.1 MiB/s 13.32 c/B
CFB dec | 3.88 ns/B 245.9 MiB/s 12.41 c/B
OFB enc | 4.38 ns/B 217.8 MiB/s 14.01 c/B
OFB dec | 4.36 ns/B 218.6 MiB/s 13.96 c/B
CTR enc | 4.30 ns/B 221.6 MiB/s 13.77 c/B
CTR dec | 4.30 ns/B 221.7 MiB/s 13.76 c/B
Benchmark on Intel i5-4570 (turbo off) (32-bit mingw, generic C):
tests/bench-slope.exe --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes
Old:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 6.03 ns/B 158.2 MiB/s 19.29 c/B
ECB dec | 5.81 ns/B 164.1 MiB/s 18.60 c/B
CBC enc | 6.22 ns/B 153.4 MiB/s 19.90 c/B
CBC dec | 5.91 ns/B 161.3 MiB/s 18.92 c/B
CFB enc | 6.25 ns/B 152.7 MiB/s 19.99 c/B
CFB dec | 6.24 ns/B 152.8 MiB/s 19.97 c/B
OFB enc | 6.33 ns/B 150.6 MiB/s 20.27 c/B
OFB dec | 6.33 ns/B 150.7 MiB/s 20.25 c/B
CTR enc | 6.28 ns/B 152.0 MiB/s 20.08 c/B
CTR dec | 6.28 ns/B 151.7 MiB/s 20.11 c/B
New:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.02 ns/B 190.0 MiB/s 16.06 c/B
ECB dec | 5.33 ns/B 178.8 MiB/s 17.07 c/B
CBC enc | 4.64 ns/B 205.4 MiB/s 14.86 c/B
CBC dec | 4.95 ns/B 192.7 MiB/s 15.84 c/B
CFB enc | 4.75 ns/B 200.7 MiB/s 15.20 c/B
CFB dec | 4.74 ns/B 201.1 MiB/s 15.18 c/B
OFB enc | 5.29 ns/B 180.3 MiB/s 16.93 c/B
OFB dec | 5.29 ns/B 180.3 MiB/s 16.93 c/B
CTR enc | 4.77 ns/B 200.0 MiB/s 15.26 c/B
CTR dec | 4.77 ns/B 199.8 MiB/s 15.27 c/B
Benchmark on Cortex-A8 (ARM assembly):
tests/bench-slope --cpu-mhz 1008 cipher aes
Old:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 21.84 ns/B 43.66 MiB/s 22.02 c/B
ECB dec | 22.35 ns/B 42.67 MiB/s 22.53 c/B
CBC enc | 22.97 ns/B 41.53 MiB/s 23.15 c/B
CBC dec | 23.48 ns/B 40.61 MiB/s 23.67 c/B
CFB enc | 22.72 ns/B 41.97 MiB/s 22.90 c/B
CFB dec | 23.41 ns/B 40.74 MiB/s 23.59 c/B
OFB enc | 23.65 ns/B 40.32 MiB/s 23.84 c/B
OFB dec | 23.67 ns/B 40.29 MiB/s 23.86 c/B
CTR enc | 23.24 ns/B 41.03 MiB/s 23.43 c/B
CTR dec | 23.23 ns/B 41.05 MiB/s 23.42 c/B
New:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 26.03 ns/B 36.64 MiB/s 26.24 c/B
ECB dec | 26.97 ns/B 35.36 MiB/s 27.18 c/B
CBC enc | 23.21 ns/B 41.09 MiB/s 23.39 c/B
CBC dec | 23.36 ns/B 40.83 MiB/s 23.54 c/B
CFB enc | 23.02 ns/B 41.42 MiB/s 23.21 c/B
CFB dec | 23.67 ns/B 40.28 MiB/s 23.86 c/B
OFB enc | 27.86 ns/B 34.24 MiB/s 28.08 c/B
OFB dec | 27.87 ns/B 34.21 MiB/s 28.10 c/B
CTR enc | 23.47 ns/B 40.63 MiB/s 23.66 c/B
CTR dec | 23.49 ns/B 40.61 MiB/s 23.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-aesni.c (do_aesni_enc, do_aesni_dec): Pass
input/output through SSE register XMM0.
(do_aesni_cfb): Remove.
(_gcry_aes_aesni_encrypt, _gcry_aes_aesni_decrypt): Add loading/storing
input/output to/from XMM0.
(_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc)
(_gcry_aes_aesni_cfb_dec): Update to use renewed 'do_aesni_enc' and
move IV loading/storing outside loop.
(_gcry_aes_aesni_cbc_dec): Update to use renewed 'do_aesni_dec'.
--
CBC encryption speed is improved ~16% on Intel Haswell and CFB encryption ~8%.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'.
* cipher/cipher-gcm-intel-pclmul.c: New.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL]
(_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New
prototypes.
[GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move
to 'cipher-gcm-intel-pclmul.c'.
(ghash): Rename to...
(ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new
function in 'cipher-gcm-intel-pclmul.c'.
(setupM): Move GCM_USE_INTEL_PCLMUL part to new function in
'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based
on available HW acceleration.
(do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'.
* cipher/internal.h (ghash_fn_t): New.
(gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'rijndael-padlock.c'.
* cipher/rijndael-padlock.c: New.
* cipher/rijndael.c (do_padlock, do_padlock_encrypt)
(do_padlock_decrypt): Move to 'rijndael-padlock.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-aesni.c (_gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt): Make return stack burn depth.
* cipher/rijndael-amd64.S (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block): Ditto.
* cipher/rijndael-arm.S (_gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Ditto.
* cipher/rijndael-internal.h (RIJNDAEL_context_s)
(rijndael_cryptfn_t): New.
(RIJNDAEL_context): New members 'encrypt_fn' and 'decrypt_fn'.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Change prototypes.
(do_padlock_encrypt, do_padlock_decrypt): New.
(do_setkey): Separate key-length to rounds conversion from
HW features check; Add selection for ctx->encrypt_fn and
ctx->decrypt_fn.
(do_encrypt_aligned, do_decrypt_aligned): Move inside
'[!USE_AMD64_ASM && !USE_ARM_ASM]'; Move USE_AMD64_ASM and
USE_ARM_ASM to...
(do_encrypt, do_decrypt): ...here; Return stack depth; Remove second
temporary buffer from non-aligned input/output case.
(do_padlock): Move decrypt_flag to last argument; Return stack depth.
(rijndael_encrypt): Remove #ifdefs, just call ctx->encrypt_fn.
(_gcry_aes_cfb_enc, _gcry_aes_cbc_enc): Remove USE_PADLOCK; Call
ctx->encrypt_fn in place of do_encrypt/do_encrypt_aligned.
(_gcry_aes_ctr_enc): Call ctx->encrypt_fn in place of
do_encrypt_aligned; Make tmp buffer 16-byte aligned and wipe buffer
after use.
(rijndael_decrypt): Remove #ifdefs, just call ctx->decrypt_fn.
(_gcry_aes_cfb_dec): Remove USE_PADLOCK; Call ctx->decrypt_fn in place
of do_decrypt/do_decrypt_aligned.
(_gcry_aes_cbc_dec): Ditto; Make savebuf buffer 16-byte aligned.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael.c (do_setkey, rijndael_encrypt, _gcry_aes_cfb_enc)
(rijndael_decrypt, _gcry_aes_cfb_dec): Move USE_AESNI before
USE_PADLOCK.
(check_decryption_praparation) [USE_PADLOCK]: Move to...
(prepare_decryption) [USE_PADLOCK]: ...here.
--
Make order of AES-NI and Padlock #ifdefs consistent.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.in: Add 'rijndael-aesni.c'.
* cipher/rijndael-aesni.c: New.
* cipher/rijndael-internal.h: New.
* cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16)
(USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context)
(keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'.
(u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6)
(aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4)
(do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move
to 'rijndael-aesni.c'.
(prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc)
(_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions
in 'rijndael-aesni.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'.
--
Clean up rijndael.c before new hardware acceleration support gets added.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Support
MPI_EC_MONTGOMERY.
* cipher/ecc.c (test_ecdh_only_keys): New.
(nist_generate_key): Call test_ecdh_only_keys for MPI_EC_MONTGOMERY.
(check_secret_key): Handle Montgomery curve of x-coordinate only.
* mpi/ec.c (_gcry_mpi_ec_mul_point): Resize points before the loop.
Simplify, using pointers of Q1, Q2, PRD, and SUM.
--
|
|
* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_NEON)
(POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT): New.
* cipher/poly1305.c [POLY1305_USE_NEON]
(_gcry_poly1305_armv7_neon_init_ext)
(_gcry_poly1305_armv7_neon_finish_ext)
(_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation
if HWF_ARM_NEON set.
* configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
--
Add Andrew Moon's public domain NEON implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 12.34 ns/B 77.27 MiB/s 12.44 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 2.12 ns/B 450.7 MiB/s 2.13 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'.
* cipher/chacha20-armv7-neon.S: New.
* cipher/chacha20.c (USE_NEON): New.
[USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New.
(chacha20_do_setkey) [USE_NEON]: Use Neon implementation if
HWF_ARM_NEON flag set.
(selftest): Self-test encrypting buffer byte by byte.
* configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
--
Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.45 ns/B 70.92 MiB/s 13.56 c/B
STREAM dec | 13.45 ns/B 70.90 MiB/s 13.56 c/B
New:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 6.20 ns/B 153.9 MiB/s 6.25 c/B
STREAM dec | 6.20 ns/B 153.9 MiB/s 6.25 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/primegen.c (check_prime): Return true for the small primes.
(_gcry_prime_check): Return correct values for 2 and lower numbers.
* src/mpicalc.c (do_primecheck): New.
(main): Add command 'P'.
(main): Allow for larger input data.
|
|
* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'.
* cipher/whirlpool-sse2-amd64.S: New.
* cipher/whirlpool.c (USE_AMD64_ASM): New.
(whirlpool_tables_s): New.
(rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single
structure and replace old tables with macros of same name.
(tab): New structure containing above tables.
[USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64)
(whirlpool_transform): New.
* configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'.
--
Benchmark results:
On Intel Core i5-4570 (3.2 GHz):
After:
WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B
Before:
WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B
On Intel Core i5-2450M (2.5 GHz):
After:
WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B
Before:
WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B
On Intel Core2 T8100 (2.1 GHz):
After:
WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B
Before:
WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B
Summary, old vs new ratio:
Intel Core i5-4570: 1.88x
Intel Core i5-2450M: 1.59x
Intel Core2 T8100: 1.94x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rmd160.c (transform): Interleave the left and right lane
rounds to introduce more instruction level parallelism.
--
The benchmarks on different systems:
Intel(R) Atom(TM) CPU N570 @ 1.66GHz
before:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 13.07 ns/B 72.97 MiB/s - c/B
after:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 11.37 ns/B 83.84 MiB/s - c/B
Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 3.31 ns/B 288.0 MiB/s - c/B
after:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 2.08 ns/B 458.5 MiB/s - c/B
Signed-off-by: Andrei Scherer <andsch@inbox.com>
|
|
* cipher/mac.c (_gcry_mac_close): Check for NULL.
--
We always allow this for easier cleanup; actually, the docs already
state that this is allowed.
|
|
* cipher/md.c (_gcry_md_info): Fix arg testing.
--
GnuPG-bug-id: 1697
|
|
* cipher/primegen.c (_gcry_generate_elg_prime): Change to return an
error code.
* cipher/dsa.c (generate): Take care of new return code.
* cipher/elgamal.c (generate): Change to return an error code. Take
care of _gcry_generate_elg_prime return code.
(generate_using_x): Take care of _gcry_generate_elg_prime return code.
(elg_generate): Propagate return code from generate.
--
GnuPG-bug-id: 1699, 1700
Reported-by: S.K. Gupta
Note that the NULL deref may have only happened on malloc failure.
|
|
* src/ec-context.h (mpi_ec_ctx_s): Add cofactor 'h'.
* cipher/ecc-common.h (elliptic_curve_t): Add cofactor 'h'.
(_gcry_ecc_update_curve_param): New API adding cofactor.
* cipher/ecc-curves.c (ecc_domain_parms_t): Add cofactor 'h'.
(ecc_domain_parms_t domain_parms): Add cofactors.
(_gcry_ecc_fill_in_curve, _gcry_ecc_update_curve_param)
(_gcry_ecc_get_curve, _gcry_mpi_ec_new, _gcry_ecc_get_param_sexp)
(_gcry_ecc_get_mpi): Handle cofactor.
* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Likewise.
* cipher/ecc-misc.c (_gcry_ecc_curve_free)
(_gcry_ecc_curve_copy): Likewise.
* cipher/ecc.c (nist_generate_key, ecc_generate)
(ecc_check_secret_key, ecc_sign, ecc_verify, ecc_encrypt_raw)
(ecc_decrypt_raw, _gcry_pk_ecc_get_sexp, _gcry_pubkey_spec_ecc):
Likewise.
(compute_keygrip): Handle cofactor, but skip it for its computation.
* mpi/ec.c (ec_deinit): Likewise.
* tests/t-mpi-point.c (context_param): Likewise.
(test_curve): Add cofactors.
* tests/curves.c (sample_key_1, sample_key_2): Add cofactors.
* tests/keygrip.c (key_grips): Add cofactors.
--
We keep compatibility of compute_keygrip in cipher/ecc.c.
|
|
* cipher/ecc.c (ecc_generate): Check the "comp" flag for EdDSA.
* cipher/ecc-eddsa.c (eddsa_encode_x_y): Add arg WITH_PREFIX.
(_gcry_ecc_eddsa_encodepoint): Ditto.
(_gcry_ecc_eddsa_ensure_compact): Handle the 0x40 compression prefix.
(_gcry_ecc_eddsa_decodepoint): Ditto.
* tests/keygrip.c: Check a compressed-with-prefix Ed25519 key.
* tests/t-ed25519.inp: Ditto.
|
|
* cipher/chacha20.c (chacha20_blocks) [!USE_SSE2]: Do not build.
|
|
* cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
--
Benchmark on Cortex-A8 1008 MHz:
New:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.04 ns/B 135.4 MiB/s 7.10 c/B
Old:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.79 ns/B 122.4 MiB/s 7.85 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|