summaryrefslogtreecommitdiff
path: root/cipher/salsa20-armv7-neon.S
AgeCommit message (Collapse)AuthorFilesLines
2013-12-18Change utf-8 copyright characters to '(C)'Jussi Kivilinna1-1/+1
cipher/blowfish-amd64.S: Change utf-8 encoded copyright character to '(C)'. cipher/blowfish-arm.S: Ditto. cipher/bufhelp.h: Ditto. cipher/camellia-aesni-avx-amd64.S: Ditto. cipher/camellia-aesni-avx2-amd64.S: Ditto. cipher/camellia-arm.S: Ditto. cipher/cast5-amd64.S: Ditto. cipher/cast5-arm.S: Ditto. cipher/cipher-ccm.c: Ditto. cipher/cipher-cmac.c: Ditto. cipher/cipher-gcm.c: Ditto. cipher/cipher-selftest.c: Ditto. cipher/cipher-selftest.h: Ditto. cipher/mac-cmac.c: Ditto. cipher/mac-gmac.c: Ditto. cipher/mac-hmac.c: Ditto. cipher/mac-internal.h: Ditto. cipher/mac.c: Ditto. cipher/rijndael-amd64.S: Ditto. cipher/rijndael-arm.S: Ditto. cipher/salsa20-amd64.S: Ditto. cipher/salsa20-armv7-neon.S: Ditto. cipher/serpent-armv7-neon.S: Ditto. cipher/serpent-avx2-amd64.S: Ditto. cipher/serpent-sse2-amd64.S: Ditto. -- Avoid use of '©' for easier parsing of source for copyright information. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-11-03Make jump labels local in Salsa20 assemblyJussi Kivilinna1-25/+25
* cipher/salsa20-amd64.S: Rename '._labels' to '.L_labels'. * cipher/salsa20-armv7-neon.S: Ditto. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-28Add ARM NEON assembly implementation of SerpentJussi Kivilinna1-1/+1
* cipher/Makefile.am: Add 'serpent-armv7-neon.S'. * cipher/serpent-armv7-neon.S: New. * cipher/serpent.c (USE_NEON): New macro. (serpent_context_t) [USE_NEON]: Add 'use_neon'. [USE_NEON] (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec) (_gcry_serpent_neon_cbc_dec): New prototypes. (serpent_setkey_internal) [USE_NEON]: Detect NEON support. (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec) (_gcry_serpent_neon_cbc_dec) [USE_NEON]: Use NEON implementations to process eight blocks in parallel. * configure.ac [neonsupport]: Add 'serpent-armv7-neon.lo'. -- Patch adds ARM NEON optimized implementation of Serpent cipher to speed up parallelizable bulk operations. Benchmarks on ARM Cortex-A8 (armhf, 1008 Mhz): Old: SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte CBC dec | 43.53 ns/B 21.91 MiB/s 43.88 c/B CFB dec | 44.77 ns/B 21.30 MiB/s 45.13 c/B CTR enc | 45.21 ns/B 21.10 MiB/s 45.57 c/B CTR dec | 45.21 ns/B 21.09 MiB/s 45.57 c/B New: SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte CBC dec | 26.26 ns/B 36.32 MiB/s 26.47 c/B CFB dec | 26.21 ns/B 36.38 MiB/s 26.42 c/B CTR enc | 26.20 ns/B 36.40 MiB/s 26.41 c/B CTR dec | 26.20 ns/B 36.40 MiB/s 26.41 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-10-28Add ARM NEON assembly implementation of Salsa20Jussi Kivilinna1-0/+899
* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'. * cipher/salsa20-armv7-neon.S: New. * cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro. (struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t) (salsa20_ivsetup_t): New. (SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'. (SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'. (salsa20_core): Change 'src' argument to 'ctx'. [USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype. [USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon) (salsa20_ivsetup_neon): New. (salsa20_do_setkey): Setup keysetup, ivsetup and core with default functions. (salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support detect, set keysetup, ivsetup and core with ARM NEON functions. (salsa20_do_setkey): Call 'ctx->keysetup'. (salsa20_setiv): Call 'ctx->ivsetup'. (salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers in ARM NEON implementation. (salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly calling 'salsa20_core'. (selftest): Add test to check large buffer processing and block counter updating. * configure.ac [neonsupport]: 'Add salsa20-armv7-neon.lo'. -- Patch adds fast ARM NEON assembly implementation for Salsa20. Implementation gains extra speed by processing three blocks in parallel with help of ARM NEON vector processing unit. This implementation is based on public domain code by Peter Schwabe and D. J. Bernstein and it is available in SUPERCOP benchmarking framework. For more details on this work, check paper "NEON crypto" by Daniel J. Bernstein and Peter Schwabe: http://cryptojedi.org/papers/#neoncrypto Benchmark results on Cortex-A8 (1008 Mhz): Before: SALSA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 18.88 ns/B 50.51 MiB/s 19.03 c/B STREAM dec | 18.89 ns/B 50.49 MiB/s 19.04 c/B = SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 13.60 ns/B 70.14 MiB/s 13.71 c/B STREAM dec | 13.60 ns/B 70.13 MiB/s 13.71 c/B After: SALSA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 5.48 ns/B 174.1 MiB/s 5.52 c/B STREAM dec | 5.47 ns/B 174.2 MiB/s 5.52 c/B = SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 3.65 ns/B 260.9 MiB/s 3.68 c/B STREAM dec | 3.65 ns/B 261.6 MiB/s 3.67 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>