Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/rijndael-aesni.c (do_aesni_ctr_4): Split assembly block in
two parts to reduce number of register constraints needed.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/camellia-glue.c (_gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Precalculate Ls array always, instead of
just if 'blkn % <parallel blocks> == 0'.
* cipher/serpent.c (_gcry_serpent_ocb_crypt)
(_gcry_serpent_ocb_auth): Ditto.
* cipher/rijndael-aesni.c (get_l): Remove low-bit checks.
(aes_ocb_enc, aes_ocb_dec, _gcry_aes_aesni_ocb_auth): Handle leading
blocks until block counter is multiple of 4, so that parallel block
processing loop can use 'c->u_mode.ocb.L' array directly.
* tests/basic.c (check_ocb_cipher_largebuf): Rename to...
(check_ocb_cipher_largebuf_split): ...this and add option to process
large buffer as two split buffers.
(check_ocb_cipher_largebuf): New.
--
Patch simplifies source and reduce object size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-aesni.c (do_aesni_ctr_4): Do addition using
CTR in big-endian form, if least-significant byte does not overflow.
--
Patch improves AES-NI CTR speed by 20%.
Benchmark on Intel Haswell (3.2 Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.273 ns/B 3489.8 MiB/s 0.875 c/B
CTR dec | 0.273 ns/B 3491.0 MiB/s 0.874 c/B
After:
CTR enc | 0.228 ns/B 4190.0 MiB/s 0.729 c/B
CTR dec | 0.228 ns/B 4190.2 MiB/s 0.729 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-internal.h (ocb_get_l): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/camellia-glue.c (get_l): Remove.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/rijndael-aesni.c (get_l): Add fast path for 75% most common
offsets.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Precalculate
offset array when block count matches parallel operation size.
* cipher/rijndael-ssse3-amd64.c (get_l): Add fast path for 75% most
common offsets.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Use
'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/serpent.c (get_l): Remove.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/twofish.c (get_l): Remove.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Use 'ocb_get_l'
instead of 'get_l'.
--
Patch optimizes OCB offset calculation for generic code and
assembly implementations with parallel block processing.
Benchmark of OCB AES-NI on Intel Haswell:
$ tests/bench-slope --cpu-mhz 3201 cipher aes
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.274 ns/B 3483.9 MiB/s 0.876 c/B
CTR dec | 0.273 ns/B 3490.0 MiB/s 0.875 c/B
OCB enc | 0.289 ns/B 3296.1 MiB/s 0.926 c/B
OCB dec | 0.299 ns/B 3189.9 MiB/s 0.957 c/B
OCB auth | 0.260 ns/B 3670.0 MiB/s 0.832 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.273 ns/B 3489.4 MiB/s 0.875 c/B
CTR dec | 0.273 ns/B 3487.5 MiB/s 0.875 c/B
OCB enc | 0.248 ns/B 3852.8 MiB/s 0.792 c/B
OCB dec | 0.261 ns/B 3659.5 MiB/s 0.834 c/B
OCB auth | 0.227 ns/B 4205.5 MiB/s 0.726 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector
registers before use and restore after.
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency
on !defined(__WIN64__).
* cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable,
aesni_prepare, aesni_prepare_2_6, aesni_cleanup)
( aesni_cleanup_2_6): New.
[!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New.
(_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc)
(_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec)
(_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use
'aesni_prepare_2_6'.
* cipher/rijndael-internal.h (USE_SSSE3): Enable if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS.
(USE_AESNI): Remove dependency on !defined(__WIN64__)
* cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]
(vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New.
[!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New.
(vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use
'vpaes_ssse3_prepare'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use
'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to
exclude '.type' and '.size' markers from assembly code, as they are
not support on WIN64/COFF objects.
* configure.ac (gcry_cv_gcc_attribute_ms_abi)
(gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi)
(gcry_cv_gcc_default_abi_is_sysv_abi)
(gcry_cv_gcc_win64_platform_as_ok): New checks.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-gcm-intel-pclmul.c [gcc-version >= 4.4]: Add GCC target
pragma to disable compiler use of SSE.
* cipher/rijndael-aesni.c [gcc-version >= 4.4]: Ditto.
* cipher/rijndael-ssse3-amd64.c [gcc-version >= 4.4]: Ditto.
--
These implementations assume that compiler does not use XMM registers
between assembly blocks.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-internal.h (gcry_cipher_handle): Add bulk.ocb_crypt
and bulk.ocb_auth.
(_gcry_cipher_ocb_get_l): New prototype.
* cipher/cipher-ocb.c (get_l): Rename to ...
(_gcry_cipher_ocb_get_l): ... this.
(_gcry_cipher_ocb_authenticate, ocb_crypt): Use bulk function when
available.
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk
functions for AES.
* cipher/rijndael-aesni.c (get_l, aesni_ocb_enc, aes_ocb_dec)
(_gcry_aes_aesni_ocb_crypt, _gcry_aes_aesni_ocb_auth): New.
* cipher/rijndael.c [USE_AESNI] (_gcry_aes_aesni_ocb_crypt)
(_gcry_aes_aesni_ocb_auth): New prototypes.
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New
prototypes.
* tests/basic.c (check_ocb_cipher_largebuf): New.
(check_ocb_cipher): Add large buffer encryption/decryption test.
--
Patch adds bulk encryption/decryption/authentication code for AES-NI
accelerated AES.
Benchmark on Intel i5-4570 (3200 Mhz, turbo off):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 2.12 ns/B 449.7 MiB/s 6.79 c/B
OCB dec | 2.12 ns/B 449.6 MiB/s 6.79 c/B
OCB auth | 2.07 ns/B 459.9 MiB/s 6.64 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 0.292 ns/B 3262.5 MiB/s 0.935 c/B
OCB dec | 0.297 ns/B 3212.2 MiB/s 0.950 c/B
OCB auth | 0.260 ns/B 3666.1 MiB/s 0.832 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-aesni.c (do_aesni_enc, do_aesni_dec): Pass
input/output through SSE register XMM0.
(do_aesni_cfb): Remove.
(_gcry_aes_aesni_encrypt, _gcry_aes_aesni_decrypt): Add loading/storing
input/output to/from XMM0.
(_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc)
(_gcry_aes_aesni_cfb_dec): Update to use renewed 'do_aesni_enc' and
move IV loading/storing outside loop.
(_gcry_aes_aesni_cbc_dec): Update to use renewed 'do_aesni_dec'.
--
CBC encryption speed is improved ~16% on Intel Haswell and CFB encryption ~8%.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-aesni.c (_gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt): Make return stack burn depth.
* cipher/rijndael-amd64.S (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block): Ditto.
* cipher/rijndael-arm.S (_gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Ditto.
* cipher/rijndael-internal.h (RIJNDAEL_context_s)
(rijndael_cryptfn_t): New.
(RIJNDAEL_context): New members 'encrypt_fn' and 'decrypt_fn'.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Change prototypes.
(do_padlock_encrypt, do_padlock_decrypt): New.
(do_setkey): Separate key-length to rounds conversion from
HW features check; Add selection for ctx->encrypt_fn and
ctx->decrypt_fn.
(do_encrypt_aligned, do_decrypt_aligned): Move inside
'[!USE_AMD64_ASM && !USE_ARM_ASM]'; Move USE_AMD64_ASM and
USE_ARM_ASM to...
(do_encrypt, do_decrypt): ...here; Return stack depth; Remove second
temporary buffer from non-aligned input/output case.
(do_padlock): Move decrypt_flag to last argument; Return stack depth.
(rijndael_encrypt): Remove #ifdefs, just call ctx->encrypt_fn.
(_gcry_aes_cfb_enc, _gcry_aes_cbc_enc): Remove USE_PADLOCK; Call
ctx->encrypt_fn in place of do_encrypt/do_encrypt_aligned.
(_gcry_aes_ctr_enc): Call ctx->encrypt_fn in place of
do_encrypt_aligned; Make tmp buffer 16-byte aligned and wipe buffer
after use.
(rijndael_encrypt): Remove #ifdefs, just call ctx->decrypt_fn.
(_gcry_aes_cfb_dec): Remove USE_PADLOCK; Call ctx->decrypt_fn in place
of do_decrypt/do_decrypt_aligned.
(_gcry_aes_cbc_dec): Ditto; Make savebuf buffer 16-byte aligned.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.in: Add 'rijndael-aesni.c'.
* cipher/rijndael-aesni.c: New.
* cipher/rijndael-internal.h: New.
* cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16)
(USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context)
(keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'.
(u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6)
(aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4)
(do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move
to 'rijndael-aesni.c'.
(prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc)
(_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions
in 'rijdael-aesni.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'.
--
Clean-up rijndael.c before new new hardware acceleration support gets added.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|