Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/cipher-internal.h (ocb_get_l): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/camellia-glue.c (get_l): Remove.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/rijndael-aesni.c (get_l): Add fast path for 75% most common
offsets.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Precalculate
offset array when block count matches parallel operation size.
* cipher/rijndael-ssse3-amd64.c (get_l): Add fast path for 75% most
common offsets.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Use
'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/serpent.c (get_l): Remove.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/twofish.c (get_l): Remove.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Use 'ocb_get_l'
instead of 'get_l'.
--
Patch optimizes OCB offset calculation for generic code and
assembly implementations with parallel block processing.
Benchmark of OCB AES-NI on Intel Haswell:
$ tests/bench-slope --cpu-mhz 3201 cipher aes
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.274 ns/B 3483.9 MiB/s 0.876 c/B
CTR dec | 0.273 ns/B 3490.0 MiB/s 0.875 c/B
OCB enc | 0.289 ns/B 3296.1 MiB/s 0.926 c/B
OCB dec | 0.299 ns/B 3189.9 MiB/s 0.957 c/B
OCB auth | 0.260 ns/B 3670.0 MiB/s 0.832 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.273 ns/B 3489.4 MiB/s 0.875 c/B
CTR dec | 0.273 ns/B 3487.5 MiB/s 0.875 c/B
OCB enc | 0.248 ns/B 3852.8 MiB/s 0.792 c/B
OCB dec | 0.261 ns/B 3659.5 MiB/s 0.834 c/B
OCB auth | 0.227 ns/B 4205.5 MiB/s 0.726 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/rijndael-ssse3-amd64.c (SSSE3_STATE_SIZE): New.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): Use
'ssse3_state' for storing current SSSE3 state.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]
(vpaes_ssse3_cleanup): Restore SSSE3 state from 'ssse3_state'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption)
(_gcry_aes_ssse3_encrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_dec)
(_gcry_aes_ssse3_cbc_dec, _gcry_aes_ssse3_cbc_dec): Add 'ssse3_state'
array.
(get_l, ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_crypt)
(_gcry_aes_ssse3_ocb_auth): New.
* cipher/rijndael.c (_gcry_aes_ssse3_ocb_crypt)
(_gcry_aes_ssse3_ocb_auth): New.
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_SSSE3]: Use SSSE3
implementation for OCB.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector
registers before use and restore after.
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency
on !defined(__WIN64__).
* cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable,
aesni_prepare, aesni_prepare_2_6, aesni_cleanup)
( aesni_cleanup_2_6): New.
[!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New.
(_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc)
(_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec)
(_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use
'aesni_prepare_2_6'.
* cipher/rijndael-internal.h (USE_SSSE3): Enable if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS.
(USE_AESNI): Remove dependency on !defined(__WIN64__)
* cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]
(vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New.
[!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New.
(vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use
'vpaes_ssse3_prepare'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use
'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to
exclude '.type' and '.size' markers from assembly code, as they are
not support on WIN64/COFF objects.
* configure.ac (gcry_cv_gcc_attribute_ms_abi)
(gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi)
(gcry_cv_gcc_default_abi_is_sysv_abi)
(gcry_cv_gcc_win64_platform_as_ok): New checks.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/cipher-gcm-intel-pclmul.c [gcc-version >= 4.4]: Add GCC target
pragma to disable compiler use of SSE.
* cipher/rijndael-aesni.c [gcc-version >= 4.4]: Ditto.
* cipher/rijndael-ssse3-amd64.c [gcc-version >= 4.4]: Ditto.
--
These implementations assume that compiler does not use XMM registers
between assembly blocks.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
--
|
|
* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'.
* cipher/rijndael-internal.h (USE_SSSE3): New.
(RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'.
* cipher/rijndael-ssse3-amd64.c: New.
* cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey)
(_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New.
(do_setkey): Add HWF check for SSSE3 and setup for SSSE3
implementation.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add
selection for SSSE3 implementation.
* configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
--
This patch adds "AES with vector permutations" implementation by
Mike Hamburg. Public-domain source-code is available at:
http://crypto.stanford.edu/vpaes/
Benchmark on Intel Core2 T8100 (2.1Ghz, no turbo):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.79 ns/B 108.5 MiB/s 18.46 c/B
ECB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CBC enc | 7.77 ns/B 122.7 MiB/s 16.33 c/B
CBC dec | 7.74 ns/B 123.2 MiB/s 16.26 c/B
CFB enc | 7.88 ns/B 121.0 MiB/s 16.54 c/B
CFB dec | 7.56 ns/B 126.1 MiB/s 15.88 c/B
OFB enc | 9.02 ns/B 105.8 MiB/s 18.94 c/B
OFB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CTR enc | 7.80 ns/B 122.2 MiB/s 16.38 c/B
CTR dec | 7.81 ns/B 122.2 MiB/s 16.39 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.77 ns/B 165.2 MiB/s 12.13 c/B
ECB dec | 7.13 ns/B 133.7 MiB/s 14.98 c/B
CBC enc | 5.27 ns/B 181.0 MiB/s 11.06 c/B
CBC dec | 6.39 ns/B 149.3 MiB/s 13.42 c/B
CFB enc | 5.27 ns/B 180.9 MiB/s 11.07 c/B
CFB dec | 5.28 ns/B 180.7 MiB/s 11.08 c/B
OFB enc | 6.11 ns/B 156.1 MiB/s 12.83 c/B
OFB dec | 6.13 ns/B 155.5 MiB/s 12.88 c/B
CTR enc | 5.26 ns/B 181.5 MiB/s 11.04 c/B
CTR dec | 5.24 ns/B 182.0 MiB/s 11.00 c/B
Benchmark on Intel i5-2450M (2.5Ghz, no turbo, aes-ni disabled):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.06 ns/B 118.3 MiB/s 20.15 c/B
ECB dec | 8.21 ns/B 116.1 MiB/s 20.53 c/B
CBC enc | 7.88 ns/B 121.1 MiB/s 19.69 c/B
CBC dec | 7.57 ns/B 126.0 MiB/s 18.92 c/B
CFB enc | 7.87 ns/B 121.2 MiB/s 19.67 c/B
CFB dec | 7.56 ns/B 126.2 MiB/s 18.89 c/B
OFB enc | 8.27 ns/B 115.3 MiB/s 20.67 c/B
OFB dec | 8.28 ns/B 115.1 MiB/s 20.71 c/B
CTR enc | 8.02 ns/B 119.0 MiB/s 20.04 c/B
CTR dec | 8.02 ns/B 118.9 MiB/s 20.05 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.03 ns/B 236.6 MiB/s 10.07 c/B
ECB dec | 5.28 ns/B 180.8 MiB/s 13.19 c/B
CBC enc | 3.77 ns/B 252.7 MiB/s 9.43 c/B
CBC dec | 4.69 ns/B 203.3 MiB/s 11.73 c/B
CFB enc | 3.75 ns/B 254.3 MiB/s 9.37 c/B
CFB dec | 3.69 ns/B 258.6 MiB/s 9.22 c/B
OFB enc | 4.17 ns/B 228.7 MiB/s 10.43 c/B
OFB dec | 4.17 ns/B 228.7 MiB/s 10.42 c/B
CTR enc | 3.72 ns/B 256.5 MiB/s 9.30 c/B
CTR dec | 3.72 ns/B 256.1 MiB/s 9.31 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|