path: root/cipher/sha512-armv7-neon.S
Age | Commit message | Author | Files | Lines
2014-05-21 | sha512: fix ARM/NEON implementation | Jussi Kivilinna | 1 | -1/+1

* cipher/sha512-armv7-neon.S (_gcry_sha512_transform_armv7_neon): Byte-swap RW67q and RW1011q correctly in multi-block loop.
* tests/basic.c (check_digests): Add large test vector for SHA512.
--
Patch fixes bug introduced to multi-block processing by commit df629ba53a6, "Improve performance of SHA-512/ARM/NEON implementation". Patch also adds multi-block test vector for SHA-512.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

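The reason a large test vector catches this kind of bug can be illustrated with a small consistency check. The sketch below is not the test added to tests/basic.c; it uses Python's standard `hashlib` rather than libgcrypt, and the helper names and the input pattern are hypothetical. It hashes the same multi-block input once in a single call and once fed 128 bytes (one SHA-512 block) at a time; an implementation whose multi-block path is broken would be expected to disagree with its own block-at-a-time path.

```python
import hashlib

BLOCK_SIZE = 128  # SHA-512 consumes its input in 128-byte blocks


def sha512_one_shot(data: bytes) -> str:
    """Hash the whole buffer in one call (may take a multi-block path)."""
    return hashlib.sha512(data).hexdigest()


def sha512_chunked(data: bytes, chunk_size: int) -> str:
    """Hash the same buffer fed in pieces of at most chunk_size bytes."""
    h = hashlib.sha512()
    for offset in range(0, len(data), chunk_size):
        h.update(data[offset:offset + chunk_size])
    return h.hexdigest()


# Hypothetical input: 1000 full blocks plus a 17-byte tail, so that both
# the multi-block case and the final partial block are exercised.
data = bytes(i % 251 for i in range(BLOCK_SIZE * 1000 + 17))

if sha512_one_shot(data) == sha512_chunked(data, BLOCK_SIZE):
    print("one-shot and block-at-a-time digests agree")
else:
    print("MISMATCH: multi-block path disagrees with single-block path")
```

A short input that fits in a single block never enters the multi-block loop, which is why a regression of this kind can go unnoticed until a large vector is added.
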
2013-12-18 | Change utf-8 copyright characters to '(C)' | Jussi Kivilinna | 1 | -1/+1

cipher/blowfish-amd64.S: Change utf-8 encoded copyright character to '(C)'.
cipher/blowfish-arm.S: Ditto.
cipher/bufhelp.h: Ditto.
cipher/camellia-aesni-avx-amd64.S: Ditto.
cipher/camellia-aesni-avx2-amd64.S: Ditto.
cipher/camellia-arm.S: Ditto.
cipher/cast5-amd64.S: Ditto.
cipher/cast5-arm.S: Ditto.
cipher/cipher-ccm.c: Ditto.
cipher/cipher-cmac.c: Ditto.
cipher/cipher-gcm.c: Ditto.
cipher/cipher-selftest.c: Ditto.
cipher/cipher-selftest.h: Ditto.
cipher/mac-cmac.c: Ditto.
cipher/mac-gmac.c: Ditto.
cipher/mac-hmac.c: Ditto.
cipher/mac-internal.h: Ditto.
cipher/mac.c: Ditto.
cipher/rijndael-amd64.S: Ditto.
cipher/rijndael-arm.S: Ditto.
cipher/salsa20-amd64.S: Ditto.
cipher/salsa20-armv7-neon.S: Ditto.
cipher/serpent-armv7-neon.S: Ditto.
cipher/serpent-avx2-amd64.S: Ditto.
cipher/serpent-sse2-amd64.S: Ditto.
--
Avoid use of '©' for easier parsing of source for copyright information.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

2013-12-18 | Improve performance of SHA-512/ARM/NEON implementation | Jussi Kivilinna | 1 | -117/+250

* cipher/sha512-armv7-neon.S (RT01q, RT23q, RT45q, RT67q): New.
(round_0_63, round_64_79): Remove.
(rounds2_0_63, rounds2_64_79): New.
(_gcry_sha512_transform_armv7_neon): Add 'nblks' input; Handle multiple input blocks; Use new round macros.
* cipher/sha512.c [USE_ARM_NEON_ASM] (_gcry_sha512_transform_armv7_neon): Add 'num_blks'.
(transform) [USE_ARM_NEON_ASM]: Pass nblks to assembly.
--
Benchmarks on ARM Cortex-A8:
  C-language:    139.1 c/B
  Old ARM/NEON:  34.30 c/B
  New ARM/NEON:  24.46 c/B
  New vs C:      5.68x
  New vs Old:    1.40x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

2013-08-31 | sha512: add ARM/NEON assembly version of transform function | Jussi Kivilinna | 1 | -0/+316

* cipher/Makefile.am: Add 'sha512-armv7-neon.S'.
* cipher/sha512-armv7-neon.S: New file.
* cipher/sha512.c (USE_ARM_NEON_ASM): New macro.
(SHA512_CONTEXT) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(sha512_init, sha384_init) [USE_ARM_NEON_ASM]: Enable 'use_neon' if CPU supports NEON instructions.
(k): Round constant array moved outside of 'transform' function.
(__transform): Renamed from 'transform' function.
[USE_ARM_NEON_ASM] (_gcry_sha512_transform_armv7_neon): New prototype.
(transform): New wrapper function for different transform versions.
(sha512_write, sha512_final): Burn stack by the amount returned by transform function.
* configure.ac (sha512) [neonsupport]: Add 'sha512-armv7-neon.lo'.
--
Add NEON assembly for transform function for faster SHA512 on ARM. Major speed up thanks to 64-bit integer registers and large register file that can hold full input buffer.

Benchmark results on Cortex-A8, 1 GHz:

Old:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512   17050ms  18780ms  29120ms  18040ms  17190ms
SHA384   17130ms  18720ms  29160ms  18090ms  17280ms

New:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512    3600ms   5070ms  15330ms   4510ms   3480ms
SHA384    3590ms   5060ms  15350ms   4510ms   3520ms

New vs old:
SHA512   4.74x  3.70x  1.90x  4.00x  4.94x
SHA384   4.77x  3.70x  1.90x  4.01x  4.91x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
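
The "New vs old" rows are the old timing divided by the new timing, column by column. As a quick check of that arithmetic, the following sketch recomputes them from the millisecond figures quoted in the commit message; the variable and function names are illustrative only and are not part of libgcrypt or its benchmark tool.

```python
# Timings in milliseconds, copied from the commit message above.
# Each list holds the five benchmark columns in the order printed.
old_ms = {
    "SHA512": [17050, 18780, 29120, 18040, 17190],
    "SHA384": [17130, 18720, 29160, 18090, 17280],
}
new_ms = {
    "SHA512": [3600, 5070, 15330, 4510, 3480],
    "SHA384": [3590, 5060, 15350, 4510, 3520],
}


def speedups(old, new):
    """Return old/new ratios per algorithm, formatted to two decimals."""
    return {
        name: [f"{o / n:.2f}x" for o, n in zip(old[name], new[name])]
        for name in old
    }


table = speedups(old_ms, new_ms)
for name, row in table.items():
    print(name, "  ".join(row))
```

The recomputed ratios match the table in the commit message to two decimal places.
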