author | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2013-10-26 15:00:48 +0300 |
---|---|---|
committer | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2013-10-28 16:12:20 +0200 |
commit | 3ff9d2571c18cd7a34359f9c60a10d3b0f932b23 (patch) | |
tree | 9b33f85e2b1c56e2f7704ebf7e60dddb8bfea69b /configure.ac | |
parent | 5a3d43485efdc09912be0967ee0a3ce345b3b15a (diff) | |
download | libgcrypt-3ff9d2571c18cd7a34359f9c60a10d3b0f932b23.tar.gz |
Add ARM NEON assembly implementation of Salsa20
* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'.
* cipher/salsa20-armv7-neon.S: New.
* cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro.
(struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t)
(salsa20_ivsetup_t): New.
(SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'.
(salsa20_core): Change 'src' argument to 'ctx'.
[USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype.
[USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon)
(salsa20_ivsetup_neon): New.
(salsa20_do_setkey): Set up keysetup, ivsetup and core with default
functions.
(salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support is detected,
set keysetup, ivsetup and core to the ARM NEON functions.
(salsa20_do_setkey): Call 'ctx->keysetup'.
(salsa20_setiv): Call 'ctx->ivsetup'.
(salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers
in ARM NEON implementation.
(salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly
calling 'salsa20_core'.
(selftest): Add test to check large buffer processing and block counter
updating.
* configure.ac [neonsupport]: Add 'salsa20-armv7-neon.lo'.
--
This patch adds a fast ARM NEON assembly implementation of Salsa20. The
implementation gains extra speed by processing three blocks in parallel with
the help of the ARM NEON vector processing unit.
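The dispatch described in the ChangeLog entries above (a per-context 'core' pointer chosen at key setup, plus a bulk path for large buffers) can be illustrated with a minimal C sketch. Everything named toy_* is hypothetical and is not taken from cipher/salsa20.c. The keystream is a placeholder rather than Salsa20, the bulk routine is plain C standing in for the NEON assembly, and each call is assumed to start on a block boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 64 /* Salsa20 produces 64-byte keystream blocks */

struct toy_ctx;
typedef void (*toy_core_t) (uint8_t *dst, struct toy_ctx *ctx);

struct toy_ctx
{
  uint32_t counter; /* block counter, advanced once per block */
  int use_bulk;     /* hypothetical stand-in for 'use_neon' */
  toy_core_t core;  /* one-block routine chosen at key setup */
};

/* Placeholder keystream: NOT Salsa20, only a deterministic function of
   the block counter so that both code paths can be compared. */
static void toy_block (uint8_t *dst, uint32_t counter)
{
  for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
    dst[i] = (uint8_t) (counter * 31u + i * 7u);
}

/* Generic path: produce one block and advance the counter by one. */
static void toy_core_generic (uint8_t *dst, struct toy_ctx *ctx)
{
  toy_block (dst, ctx->counter++);
}

/* Stand-in for the bulk routine: XOR 'nblks' whole blocks, taking up to
   three blocks per iteration and updating the counter accordingly. */
static void toy_bulk_xor (struct toy_ctx *ctx, uint8_t *out,
                          const uint8_t *in, size_t nblks)
{
  uint8_t ks[3 * TOY_BLOCK_SIZE];

  while (nblks > 0)
    {
      size_t n = nblks >= 3 ? 3 : nblks;

      for (size_t b = 0; b < n; b++)
        toy_block (ks + b * TOY_BLOCK_SIZE, ctx->counter + (uint32_t) b);
      for (size_t i = 0; i < n * TOY_BLOCK_SIZE; i++)
        out[i] = in[i] ^ ks[i];

      ctx->counter += (uint32_t) n;
      out += n * TOY_BLOCK_SIZE;
      in += n * TOY_BLOCK_SIZE;
      nblks -= n;
    }
}

/* Key setup: install the default routine, then record whether the bulk
   path is available ('have_bulk' models a runtime feature check). */
static void toy_setkey (struct toy_ctx *ctx, int have_bulk)
{
  ctx->counter = 0;
  ctx->core = toy_core_generic;
  ctx->use_bulk = have_bulk;
}

/* Stream encryption: whole blocks go through the bulk path when it is
   enabled; the remainder is handled one block at a time via ctx->core. */
static void toy_encrypt_stream (struct toy_ctx *ctx, uint8_t *out,
                                const uint8_t *in, size_t len)
{
  assert (ctx->core != NULL); /* toy_setkey must have been called */

  if (ctx->use_bulk && len >= TOY_BLOCK_SIZE)
    {
      size_t nblks = len / TOY_BLOCK_SIZE;

      toy_bulk_xor (ctx, out, in, nblks);
      out += nblks * TOY_BLOCK_SIZE;
      in += nblks * TOY_BLOCK_SIZE;
      len -= nblks * TOY_BLOCK_SIZE;
    }

  while (len > 0)
    {
      uint8_t ks[TOY_BLOCK_SIZE];
      size_t n = len < TOY_BLOCK_SIZE ? len : TOY_BLOCK_SIZE;

      ctx->core (ks, ctx);
      for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ ks[i];
      out += n;
      in += n;
      len -= n;
    }
}
```

Encrypting the same multi-block buffer with and without the bulk path should give identical output and leave the block counter at the same value, which is the property the new selftest for large buffers and counter updating is meant to cover.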
This implementation is based on public domain code by Peter Schwabe and D. J.
Bernstein, which is available in the SUPERCOP benchmarking framework. For more
details on this work, see the paper "NEON crypto" by Daniel J. Bernstein and
Peter Schwabe:
http://cryptojedi.org/papers/#neoncrypto
Benchmark results on Cortex-A8 (1008 MHz):
Before:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 18.88 ns/B 50.51 MiB/s 19.03 c/B
STREAM dec | 18.89 ns/B 50.49 MiB/s 19.04 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.60 ns/B 70.14 MiB/s 13.71 c/B
STREAM dec | 13.60 ns/B 70.13 MiB/s 13.71 c/B
After:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 5.48 ns/B 174.1 MiB/s 5.52 c/B
STREAM dec | 5.47 ns/B 174.2 MiB/s 5.52 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 3.65 ns/B 260.9 MiB/s 3.68 c/B
STREAM dec | 3.65 ns/B 261.6 MiB/s 3.67 c/B
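The three columns above are related by simple unit conversions, and the before/after ratio works out to roughly 3.4x for SALSA20 and 3.7x for SALSA20R12. A small C sketch of that arithmetic follows. The helper names are hypothetical and are not part of libgcrypt; the clock is assumed to be the 1008 MHz quoted above.

```c
#include <assert.h>

/* cycles/byte = ns/byte * cycles/ns; a clock of f MHz is f/1000 cycles
   per nanosecond. */
static double cycles_per_byte (double ns_per_byte, double mhz)
{
  assert (ns_per_byte > 0.0 && mhz > 0.0);
  return ns_per_byte * mhz / 1000.0;
}

/* MiB/s = (1e9 ns per second / ns per byte) / 2^20 bytes per MiB. */
static double mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Speedup as the ratio of the old to the new time per byte. */
static double speedup (double before_ns, double after_ns)
{
  assert (before_ns > 0.0 && after_ns > 0.0);
  return before_ns / after_ns;
}
```

For example, cycles_per_byte (18.88, 1008.0) is about 19.03 and mib_per_sec (18.88) is about 50.51, matching the "Before" row for SALSA20 encryption. Small differences in the other rows come from the ns/B figures already being rounded.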
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Diffstat (limited to 'configure.ac')
-rw-r--r-- | configure.ac | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/configure.ac b/configure.ac
index 114460c2..19c97bd7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1560,6 +1560,11 @@ if test "$found" = "1" ; then
          GCRYPT_CIPHERS="$GCRYPT_CIPHERS salsa20-amd64.lo"
       ;;
    esac
+
+   if test x"$neonsupport" = xyes ; then
+     # Build with the NEON implementation
+     GCRYPT_CIPHERS="$GCRYPT_CIPHERS salsa20-armv7-neon.lo"
+   fi
 fi
 
 LIST_MEMBER(gost28147, $enabled_ciphers)