Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/camellia.c: Add include for <config.h> and "types.h".
(u32): Remove.
(u8): Typedef as 'byte'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/blowfish-armv6.S => cipher/blowfish-arm.S: adapt to pre-armv6 CPUs.
* cipher/blowfish.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/camellia-armv6.S => cipher/camellia-arm.S: adapt to pre-armv6 CPUs.
* cipher/camellia.c, cipher-camellia-glue.c: enable assembly on armv4/armv5
little-endian CPUs.
* cipher/cast5-armv6.S => cipher/cast5-arm.S: adapt to pre-armv6 CPUs.
* cipher/cast5.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/rijndael-armv6.S => cipher/rijndael-arm.S: adapt to pre-armv6 CPUs.
* cipher/rijndael.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/twofish-armv6.S => cipher/twofish-arm.S: adapt to pre-armv6 CPUs.
* cipher/twofish.c: enable assembly on armv4/armv5 little-endian CPUs.
--
Our ARMv6 assembly optimized code can be easily adapted to earlier CPUs.
The only incompatible place is rev instruction used to do byte swapping.
Replace it on <= ARMv6 with a series of 4 instructions.
Compare:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 620ms 610ms 650ms 680ms 620ms 630ms 660ms 660ms 630ms 630ms
CAMELLIA128 720ms 720ms 780ms 790ms 770ms 760ms 780ms 780ms 770ms 760ms
CAMELLIA256 910ms 910ms 970ms 970ms 960ms 950ms 970ms 970ms 960ms 950ms
CAST5 820ms 820ms 930ms 920ms 890ms 860ms 930ms 920ms 880ms 890ms
BLOWFISH 550ms 560ms 650ms 660ms 630ms 600ms 660ms 650ms 610ms 620ms
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 130ms 140ms 180ms 200ms 160ms 170ms 190ms 200ms 170ms 170ms
CAMELLIA128 150ms 160ms 210ms 220ms 200ms 190ms 210ms 220ms 190ms 190ms
CAMELLIA256 180ms 180ms 260ms 240ms 240ms 230ms 250ms 250ms 230ms 230ms
CAST5 170ms 160ms 270ms 120ms 240ms 130ms 260ms 270ms 130ms 120ms
BLOWFISH 160ms 150ms 260ms 110ms 230ms 120ms 250ms 260ms 110ms 120ms
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
[ jk: in camellia.h and twofish.c, USE_ARMV6_ASM => USE_ARM_ASM ]
[ jk: fix blowfish-arm.S when __ARM_FEATURE_UNALIGNED defined ]
[ jk: in twofish.S remove defined(HAVE_ARM_ARCH_V6) ]
[ jk: ARMv6 => ARM in comments ]
|
|
* cipher/bithelp.h (bswap32, bswap64, le_bswap32, be_bswap32)
(le_bswap64, be_bswap64): New.
* cipher/bufhelp.h (buf_get_be32, buf_get_le32, buf_put_le32)
(buf_put_be32, buf_get_be64, buf_get_le64, buf_put_be64)
(buf_put_le64): New.
* cipher/blowfish.c (do_encrypt_block, do_decrypt_block): Use new
endian conversion helpers.
(do_bf_setkey): Turn endian specific code to generic.
* cipher/camellia.c (GETU32, PUTU32): Use new endian conversion
helpers.
* cipher/cast5.c (rol): Remove, use rol from bithelp.
(F1, F2, F3): Fix to use rol from bithelp.
(do_encrypt_block, do_decrypt_block, do_cast_setkey): Use new endian
conversion helpers.
* cipher/des.c (READ_64BIT_DATA, WRITE_64BIT_DATA): Ditto.
* cipher/md4.c (transform, md4_final): Ditto.
* cipher/md5.c (transform, md5_final): Ditto.
* cipher/rmd160.c (transform, rmd160_final): Ditto.
* cipher/salsa20.c (LE_SWAP32, LE_READ_UINT32): Ditto.
* cipher/scrypt.c (READ_UINT64, LE_READ_UINT64, LE_SWAP32): Ditto.
* cipher/seed.c (GETU32, PUTU32): Ditto.
* cipher/serpent.c (byte_swap_32): Remove.
(serpent_key_prepare, serpent_encrypt_internal)
(serpent_decrypt_internal): Use new endian conversion helpers.
* cipher/sha1.c (transform, sha1_final): Ditto.
* cipher/sha256.c (transform, sha256_final): Ditto.
* cipher/sha512.c (__transform, sha512_final): Ditto.
* cipher/stribog.c (transform, stribog_final): Ditto.
* cipher/tiger.c (transform, tiger_final): Ditto.
* cipher/twofish.c (INPACK, OUTUNPACK): Ditto.
* cipher/whirlpool.c (buffer_to_block, block_to_buffer): Ditto.
* configure.ac (gcry_cv_have_builtin_bswap32): Check for compiler
provided __builtin_bswap32.
(gcry_cv_have_builtin_bswap64): Check for compiler provided
__builtin_bswap64.
--
Patch add helper functions that provide conversions to/from integers and
buffers of different endianess. Benefits are code cleanup and optimization
for architectures that have byte-swaping instructions and/or can do fast
unaligned memory accesses.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'camellia-armv6.S'.
* cipher/camellia-armv6.S: New file.
* cipher/camellia-glue.c [USE_ARMV6_ASM]
(_gcry_camellia_armv6_encrypt_block)
(_gcry_camellia_armv6_decrypt_block): New prototypes.
[USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock)
(camellia_encrypt, camellia_decrypt): New functions.
* cipher/camellia.c [!USE_ARMV6_ASM]: Compile encryption and decryption
routines if USE_ARMV6_ASM macro is _not_ defined.
* cipher/camellia.h (USE_ARMV6_ASM): New macro.
[!USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock): If
USE_ARMV6_ASM is defined, disable these function prototypes.
(camellia) [arm]: Add 'camellia-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for Camellia. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now. only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 1.44x 1.47x 1.35x 1.34x 1.43x 1.39x 1.38x 1.36x 1.38x 1.39x
CAMELLIA192 1.60x 1.62x 1.52x 1.47x 1.56x 1.54x 1.52x 1.53x 1.52x 1.53x
CAMELLIA256 1.59x 1.60x 1.49x 1.47x 1.53x 1.54x 1.51x 1.50x 1.52x 1.53x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* configure.ac: Add option --disable-avx-support.
(HAVE_GCC_INLINE_ASM_AVX): New.
(ENABLE_AVX_SUPPORT): New.
(camellia) [ENABLE_AVX_SUPPORT, ENABLE_AESNI_SUPPORT]: Add
camellia_aesni_avx_x86-64.lo.
* cipher/Makefile.am (AM_CCASFLAGS): Add.
(EXTRA_libcipher_la_SOURCES): Add camellia_aesni_avx_x86-64.S
* cipher/camellia-glue.c [ENABLE_AESNI_SUPPORT, ENABLE_AVX_SUPPORT]
[__x86_64__] (USE_AESNI_AVX): Add macro.
(struct Camellia_context) [USE_AESNI_AVX]: Add use_aesni_avx.
[USE_AESNI_AVX] (_gcry_camellia_aesni_avx_ctr_enc)
(_gcry_camellia_aesni_avx_cbc_dec): New prototypes to assembly
functions.
(camellia_setkey) [USE_AESNI_AVX]: Enable AES-NI/AVX if hardware
support both.
(_gcry_camellia_ctr_enc) [USE_AESNI_AVX]: Add AES-NI/AVX code.
(_gcry_camellia_cbc_dec) [USE_AESNI_AVX]: Add AES-NI/AVX code.
* cipher/camellia_aesni_avx_x86-64.S: New.
* src/g10lib.h (HWF_INTEL_AVX): New.
* src/global.c (hwflist): Add HWF_INTEL_AVX.
* src/hwf-x86.c (detect_x86_gnuc) [ENABLE_AVX_SUPPORT]: Add detection
for AVX.
--
Before:
Running each test 250 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 2210ms 2200ms 2300ms 2050ms 2240ms 2250ms 2290ms 2270ms 2070ms 2070ms
CAMELLIA256 2810ms 2800ms 2920ms 2670ms 2840ms 2850ms 2910ms 2890ms 2660ms 2640ms
After:
Running each test 250 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 2200ms 2220ms 2290ms 470ms 2240ms 2270ms 2270ms 2290ms 480ms 480ms
CAMELLIA256 2820ms 2820ms 2900ms 600ms 2860ms 2860ms 2900ms 2920ms 620ms 620ms
AES-NI/AVX implementation works by processing 16 parallel blocks (256 bytes).
It's bytesliced implementation that uses AES-NI (Subbyte) for Camellia sboxes,
with help of prefiltering/postfiltering. For smaller data sets generic C
implementation is used.
Speed-up for CBC-decryption and CTR-mode (large data): 4.3x
Tests were run on: Intel Core i5-2450M
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
(license boiler plate update by wk)
|
|
* cipher/camellia-glue.c (CAMELLIA_encrypt_stack_burn_size)
(CAMELLIA_decrypt_stack_burn_size): Increase stack burn size.
* cipher/camellia.c (CAMELLIA_ROUNDSM): Move key-material mixing in
the front.
(camellia_setup128, camellia_setup256): Remove now unneeded
key-material mangling.
(camellia_encrypt128, camellia_decrypt128, amellia_encrypt256)
(camellia_decrypt256): Copy block to stack, so that compiler can
optimize it for register usage.
--
Camellia implementation needs to be modified slightly for compatibility with
AES-NI/AVX version.
Before:
Running each test 100 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 800ms 790ms 840ms 730ms 810ms 800ms 820ms 820ms 730ms 740ms
CAMELLIA192 1040ms 1040ms 1030ms 930ms 1000ms 1000ms 1020ms 1020ms 940ms 930ms
CAMELLIA256 1000ms 980ms 1040ms 930ms 1010ms 990ms 1040ms 1040ms 940ms 930ms
After:
Running each test 100 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 780ms 750ms 810ms 690ms 780ms 770ms 810ms 810ms 700ms 690ms
CAMELLIA192 1020ms 990ms 1000ms 890ms 970ms 970ms 990ms 1000ms 890ms 900ms
CAMELLIA256 960ms 960ms 1000ms 900ms 970ms 970ms 990ms 1010ms 900ms 890ms
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|
|
Check and install the standard git pre-commit hook.
|
|
|
|
|