Age | Commit message (Collapse) | Author | Files | Lines |
|
cipher/blowfish-amd64.S: Change utf-8 encoded copyright character to
'(C)'.
cipher/blowfish-arm.S: Ditto.
cipher/bufhelp.h: Ditto.
cipher/camellia-aesni-avx-amd64.S: Ditto.
cipher/camellia-aesni-avx2-amd64.S: Ditto.
cipher/camellia-arm.S: Ditto.
cipher/cast5-amd64.S: Ditto.
cipher/cast5-arm.S: Ditto.
cipher/cipher-ccm.c: Ditto.
cipher/cipher-cmac.c: Ditto.
cipher/cipher-gcm.c: Ditto.
cipher/cipher-selftest.c: Ditto.
cipher/cipher-selftest.h: Ditto.
cipher/mac-cmac.c: Ditto.
cipher/mac-gmac.c: Ditto.
cipher/mac-hmac.c: Ditto.
cipher/mac-internal.h: Ditto.
cipher/mac.c: Ditto.
cipher/rijndael-amd64.S: Ditto.
cipher/rijndael-arm.S: Ditto.
cipher/salsa20-amd64.S: Ditto.
cipher/salsa20-armv7-neon.S: Ditto.
cipher/serpent-armv7-neon.S: Ditto.
cipher/serpent-avx2-amd64.S: Ditto.
cipher/serpent-sse2-amd64.S: Ditto.
--
Avoid use of '©' for easier parsing of source for copyright information.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/salsa20-amd64.S: Rename '._labels' to '.L_labels'.
* cipher/salsa20-armv7-neon.S: Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'serpent-armv7-neon.S'.
* cipher/serpent-armv7-neon.S: New.
* cipher/serpent.c (USE_NEON): New macro.
(serpent_context_t) [USE_NEON]: Add 'use_neon'.
[USE_NEON] (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec): New prototypes.
(serpent_setkey_internal) [USE_NEON]: Detect NEON support.
(_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec) [USE_NEON]: Use NEON implementations
to process eight blocks in parallel.
* configure.ac [neonsupport]: Add 'serpent-armv7-neon.lo'.
--
Patch adds ARM NEON optimized implementation of Serpent cipher
to speed up parallelizable bulk operations.
Benchmarks on ARM Cortex-A8 (armhf, 1008 Mhz):
Old:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte
CBC dec | 43.53 ns/B 21.91 MiB/s 43.88 c/B
CFB dec | 44.77 ns/B 21.30 MiB/s 45.13 c/B
CTR enc | 45.21 ns/B 21.10 MiB/s 45.57 c/B
CTR dec | 45.21 ns/B 21.09 MiB/s 45.57 c/B
New:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte
CBC dec | 26.26 ns/B 36.32 MiB/s 26.47 c/B
CFB dec | 26.21 ns/B 36.38 MiB/s 26.42 c/B
CTR enc | 26.20 ns/B 36.40 MiB/s 26.41 c/B
CTR dec | 26.20 ns/B 36.40 MiB/s 26.41 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'.
* cipher/salsa20-armv7-neon.S: New.
* cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro.
(struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t)
(salsa20_ivsetup_t): New.
(SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'.
(salsa20_core): Change 'src' argument to 'ctx'.
[USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype.
[USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon)
(salsa20_ivsetup_neon): New.
(salsa20_do_setkey): Setup keysetup, ivsetup and core with default
functions.
(salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support detect,
set keysetup, ivsetup and core with ARM NEON functions.
(salsa20_do_setkey): Call 'ctx->keysetup'.
(salsa20_setiv): Call 'ctx->ivsetup'.
(salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers
in ARM NEON implementation.
(salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly
calling 'salsa20_core'.
(selftest): Add test to check large buffer processing and block counter
updating.
* configure.ac [neonsupport]: 'Add salsa20-armv7-neon.lo'.
--
Patch adds fast ARM NEON assembly implementation for Salsa20. Implementation
gains extra speed by processing three blocks in parallel with help of ARM
NEON vector processing unit.
This implementation is based on public domain code by Peter Schwabe and D. J.
Bernstein and it is available in SUPERCOP benchmarking framework. For more
details on this work, check paper "NEON crypto" by Daniel J. Bernstein and
Peter Schwabe:
http://cryptojedi.org/papers/#neoncrypto
Benchmark results on Cortex-A8 (1008 Mhz):
Before:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 18.88 ns/B 50.51 MiB/s 19.03 c/B
STREAM dec | 18.89 ns/B 50.49 MiB/s 19.04 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.60 ns/B 70.14 MiB/s 13.71 c/B
STREAM dec | 13.60 ns/B 70.13 MiB/s 13.71 c/B
After:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 5.48 ns/B 174.1 MiB/s 5.52 c/B
STREAM dec | 5.47 ns/B 174.2 MiB/s 5.52 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 3.65 ns/B 260.9 MiB/s 3.68 c/B
STREAM dec | 3.65 ns/B 261.6 MiB/s 3.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|