Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'.
* cipher/salsa20-armv7-neon.S: New.
* cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro.
(struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t)
(salsa20_ivsetup_t): New.
(SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'.
(salsa20_core): Change 'src' argument to 'ctx'.
[USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype.
[USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon)
(salsa20_ivsetup_neon): New.
(salsa20_do_setkey): Setup keysetup, ivsetup and core with default
functions.
(salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support detect,
set keysetup, ivsetup and core with ARM NEON functions.
(salsa20_do_setkey): Call 'ctx->keysetup'.
(salsa20_setiv): Call 'ctx->ivsetup'.
(salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers
in ARM NEON implementation.
(salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly
calling 'salsa20_core'.
(selftest): Add test to check large buffer processing and block counter
updating.
* configure.ac [neonsupport]: 'Add salsa20-armv7-neon.lo'.
--
Patch adds fast ARM NEON assembly implementation for Salsa20. Implementation
gains extra speed by processing three blocks in parallel with help of ARM
NEON vector processing unit.
This implementation is based on public domain code by Peter Schwabe and D. J.
Bernstein and it is available in SUPERCOP benchmarking framework. For more
details on this work, check paper "NEON crypto" by Daniel J. Bernstein and
Peter Schwabe:
http://cryptojedi.org/papers/#neoncrypto
Benchmark results on Cortex-A8 (1008 Mhz):
Before:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 18.88 ns/B 50.51 MiB/s 19.03 c/B
STREAM dec | 18.89 ns/B 50.49 MiB/s 19.04 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.60 ns/B 70.14 MiB/s 13.71 c/B
STREAM dec | 13.60 ns/B 70.13 MiB/s 13.71 c/B
After:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 5.48 ns/B 174.1 MiB/s 5.52 c/B
STREAM dec | 5.47 ns/B 174.2 MiB/s 5.52 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 3.65 ns/B 260.9 MiB/s 3.68 c/B
STREAM dec | 3.65 ns/B 261.6 MiB/s 3.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'salsa20-amd64.S'.
* cipher/salsa20-amd64.S: New.
* cipher/salsa20.c (USE_AMD64): New macro.
[USE_AMD64] (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup)
(_gcry_salsa20_amd64_encrypt_blocks): New prototypes.
[USE_AMD64] (salsa20_keysetup, salsa20_ivsetup, salsa20_core): New.
[!USE_AMD64] (salsa20_core): Change 'src' to non-constant, update block
counter in 'salsa20_core' and return burn stack depth.
[!USE_AMD64] (salsa20_keysetup, salsa20_ivsetup): New.
(salsa20_do_setkey): Move generic key setup to 'salsa20_keysetup'.
(salsa20_setkey): Fix burn stack depth.
(salsa20_setiv): Move generic IV setup to 'salsa20_ivsetup'.
(salsa20_do_encrypt_stream) [USE_AMD64]: Process large buffers in AMD64
implementation.
(salsa20_do_encrypt_stream): Move stack burning to this function...
(salsa20_encrypt_stream, salsa20r12_encrypt_stream): ...from these
functions.
* configure.ac [x86-64]: Add 'salsa20-amd64.lo'.
--
Patch adds fast AMD64 assembly implementation for Salsa20. This implementation
is based on public domain code by D. J. Bernstein and it is available at
http://cr.yp.to/snuffle.html (amd64-xmm6). Implementation gains extra speed
by processing four blocks in parallel with help SSE2 instructions.
Benchmark results on Intel Core i5-4570 (3.2 Ghz):
Before:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 3.88 ns/B 246.0 MiB/s 12.41 c/B
STREAM dec | 3.88 ns/B 246.0 MiB/s 12.41 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 2.46 ns/B 387.9 MiB/s 7.87 c/B
STREAM dec | 2.46 ns/B 387.7 MiB/s 7.87 c/B
After:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.985 ns/B 967.8 MiB/s 3.15 c/B
STREAM dec | 0.987 ns/B 966.5 MiB/s 3.16 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.636 ns/B 1500.5 MiB/s 2.03 c/B
STREAM dec | 0.636 ns/B 1499.2 MiB/s 2.04 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/blowfish-armv6.S => cipher/blowfish-arm.S: adapt to pre-armv6 CPUs.
* cipher/blowfish.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/camellia-armv6.S => cipher/camellia-arm.S: adapt to pre-armv6 CPUs.
* cipher/camellia.c, cipher-camellia-glue.c: enable assembly on armv4/armv5
little-endian CPUs.
* cipher/cast5-armv6.S => cipher/cast5-arm.S: adapt to pre-armv6 CPUs.
* cipher/cast5.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/rijndael-armv6.S => cipher/rijndael-arm.S: adapt to pre-armv6 CPUs.
* cipher/rijndael.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/twofish-armv6.S => cipher/twofish-arm.S: adapt to pre-armv6 CPUs.
* cipher/twofish.c: enable assembly on armv4/armv5 little-endian CPUs.
--
Our ARMv6 assembly optimized code can be easily adapted to earlier CPUs.
The only incompatible place is rev instruction used to do byte swapping.
Replace it on <= ARMv6 with a series of 4 instructions.
Compare:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 620ms 610ms 650ms 680ms 620ms 630ms 660ms 660ms 630ms 630ms
CAMELLIA128 720ms 720ms 780ms 790ms 770ms 760ms 780ms 780ms 770ms 760ms
CAMELLIA256 910ms 910ms 970ms 970ms 960ms 950ms 970ms 970ms 960ms 950ms
CAST5 820ms 820ms 930ms 920ms 890ms 860ms 930ms 920ms 880ms 890ms
BLOWFISH 550ms 560ms 650ms 660ms 630ms 600ms 660ms 650ms 610ms 620ms
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 130ms 140ms 180ms 200ms 160ms 170ms 190ms 200ms 170ms 170ms
CAMELLIA128 150ms 160ms 210ms 220ms 200ms 190ms 210ms 220ms 190ms 190ms
CAMELLIA256 180ms 180ms 260ms 240ms 240ms 230ms 250ms 250ms 230ms 230ms
CAST5 170ms 160ms 270ms 120ms 240ms 130ms 260ms 270ms 130ms 120ms
BLOWFISH 160ms 150ms 260ms 110ms 230ms 120ms 250ms 260ms 110ms 120ms
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
[ jk: in camellia.h and twofish.c, USE_ARMV6_ASM => USE_ARM_ASM ]
[ jk: fix blowfish-arm.S when __ARM_FEATURE_UNALIGNED defined ]
[ jk: in twofish.S remove defined(HAVE_ARM_ARCH_V6) ]
[ jk: ARMv6 => ARM in comments ]
|
|
* configure.ac: correct HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS test to
require neither ARMv6, nor thumb mode. Our assembly code works
perfectly even on ARMv4 now.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/ecc-ecdsa.c, cipher/ecc-eddsa.c, cipher/ecc-gost.c: New.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add new files.
* configure.ac (GCRYPT_PUBKEY_CIPHERS): Add new files.
* cipher/ecc.c (point_init, point_free): Move to ecc-common.h.
(sign_ecdsa): Move to ecc-ecdsa.c as _gcry_ecc_ecdsa_sign.
(verify_ecdsa): Move to ecc-ecdsa.c as _gcry_ecc_ecdsa_verify.
(sign_gost): Move to ecc-gots.c as _gcry_ecc_gost_sign.
(verify_gost): Move to ecc-gost.c as _gcry_ecc_gost_verify.
(sign_eddsa): Move to ecc-eddsa.c as _gcry_ecc_eddsa_sign.
(verify_eddsa): Move to ecc-eddsa.c as _gcry_ecc_eddsa_verify.
(eddsa_generate_key): Move to ecc-eddsa.c as _gcry_ecc_eddsa_genkey.
(reverse_buffer): Move to ecc-eddsa.c.
(eddsa_encodempi, eddsa_encode_x_y): Ditto.
(_gcry_ecc_eddsa_encodepoint, _gcry_ecc_eddsa_decodepoint): Ditto.
--
This change should make it easier to add new ECC algorithms.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/Makefile.am: Add 'twofish-armv6.S'.
* cipher/twofish-armv6.S: New.
* cipher/twofish.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_twofish_armv6_encrypt_block)
(_gcry_twofish_armv6_decrypt_block): New prototypes.
[USE_AMDV6_ASM] (twofish_encrypt, twofish_decrypt): Add.
[USE_AMD64_ASM] (do_twofish_encrypt, do_twofish_decrypt): Remove.
(_gcry_twofish_ctr_enc, _gcry_twofish_cfb_dec): Use 'twofish_encrypt'
instead of 'do_twofish_encrypt'.
(_gcry_twofish_cbc_dec): Use 'twofish_decrypt' instead of
'do_twofish_decrypt'.
* configure.ac [arm]: Add 'twofish-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for Twofish. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old (gcc-4.8) vs new (twofish-asm), Cortex-A8 (on armhf):
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
TWOFISH 1.23x 1.25x 1.16x 1.26x 1.16x 1.30x 1.18x 1.17x 1.23x 1.23x 1.22x 1.22x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/serpent-avx2-amd64.S: Remove use of GAS macros.
* cipher/serpent-sse2-amd64.S: Ditto.
* configure.ac [HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Do not check
for GAS macros.
--
This way we have better portability; for example, when compiling with clang
on x86-64, the assembly implementations are now enabled and working.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* configure.ac: New check, HAVE_GCC_ASM_VOLATILE_MEMORY.
* src/g10lib.h (_gcry_burn_stack): Rename to __gcry_burn_stack.
(__gcry_burn_stack_dummy): New.
(_gcry_burn_stack): New macro.
* src/misc.c (_gcry_burn_stack): Rename to __gcry_burn_stack.
(__gcry_burn_stack_dummy): New.
--
Tail call optimization can turn _gcry_burn_stack call in to tail jump. When
this happens, stack pointer is restored to initial state of current function.
This causes problem for _gcry_burn_stack because its callers do not count in
current function stack depth.
One solution is to prevent gcry_burn_stack being tail optimized by inserting
dummy function call behind it. Another would be to add memory barrier 'asm
volatile("":::"memory")' behind every _gcry_burn_stack call. This however
requires GCC asm support from compiler.
Patch adds detection for memory barrier support and when available uses
memory barrier to prevent when tail call optimization. If not available
dummy function call is used instead.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/bithelp.h (bswap32, bswap64, le_bswap32, be_bswap32)
(le_bswap64, be_bswap64): New.
* cipher/bufhelp.h (buf_get_be32, buf_get_le32, buf_put_le32)
(buf_put_be32, buf_get_be64, buf_get_le64, buf_put_be64)
(buf_put_le64): New.
* cipher/blowfish.c (do_encrypt_block, do_decrypt_block): Use new
endian conversion helpers.
(do_bf_setkey): Turn endian specific code to generic.
* cipher/camellia.c (GETU32, PUTU32): Use new endian conversion
helpers.
* cipher/cast5.c (rol): Remove, use rol from bithelp.
(F1, F2, F3): Fix to use rol from bithelp.
(do_encrypt_block, do_decrypt_block, do_cast_setkey): Use new endian
conversion helpers.
* cipher/des.c (READ_64BIT_DATA, WRITE_64BIT_DATA): Ditto.
* cipher/md4.c (transform, md4_final): Ditto.
* cipher/md5.c (transform, md5_final): Ditto.
* cipher/rmd160.c (transform, rmd160_final): Ditto.
* cipher/salsa20.c (LE_SWAP32, LE_READ_UINT32): Ditto.
* cipher/scrypt.c (READ_UINT64, LE_READ_UINT64, LE_SWAP32): Ditto.
* cipher/seed.c (GETU32, PUTU32): Ditto.
* cipher/serpent.c (byte_swap_32): Remove.
(serpent_key_prepare, serpent_encrypt_internal)
(serpent_decrypt_internal): Use new endian conversion helpers.
* cipher/sha1.c (transform, sha1_final): Ditto.
* cipher/sha256.c (transform, sha256_final): Ditto.
* cipher/sha512.c (__transform, sha512_final): Ditto.
* cipher/stribog.c (transform, stribog_final): Ditto.
* cipher/tiger.c (transform, tiger_final): Ditto.
* cipher/twofish.c (INPACK, OUTUNPACK): Ditto.
* cipher/whirlpool.c (buffer_to_block, block_to_buffer): Ditto.
* configure.ac (gcry_cv_have_builtin_bswap32): Check for compiler
provided __builtin_bswap32.
(gcry_cv_have_builtin_bswap64): Check for compiler provided
__builtin_bswap64.
--
Patch add helper functions that provide conversions to/from integers and
buffers of different endianess. Benefits are code cleanup and optimization
for architectures that have byte-swaping instructions and/or can do fast
unaligned memory accesses.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_12_256)
(GCRY_MD_GOSTR3411_12_512): New.
* cipher/stribog.c: New.
* configure.ac (available_digests_64): Add stribog.
* src/cipher.h: Declare Stribog declarations.
* cipher/md.c: Register Stribog digest.
* tests/basic.c (check_digests) Add 4 testcases for Stribog from
standard.
* doc/gcrypt.texi: Document new constants.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_94): New.
* cipher/gostr3411-94.c: New.
* configure.ac (available_digests): Add gostr3411-94.
* src/cipher.h: Add gostr3411-94 definitions.
* cipher/md.c: Register GOST R 34.11-94.
* tests/basic.c (check_digests): Add 4 tests for GOST R 34.11-94
hash algo. Two are defined in the standard itself, two other are
more or less common tests - an empty string an exclamation mark.
* doc/gcrypt.texi: Add an entry describing GOST R 34.11-94 to the MD
algorithms table.
--
Add simple implementation of GOST R 34.11-94 hash function. Currently
there is no way to specify hash parameters (it always uses GOST R 34.11-94
test parameters).
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Stack burn value in gost3411_init added by wk.
|
|
* src/gcrypt.h.in (GCRY_CIPHER_GOST28147): New.
* cipher/gost.h, cipher/gost28147.c: New.
* configure.ac (available_ciphers): Add gost28147.
* src/cipher.h: Add gost28147 definitions.
* cipher/cipher.c: Register gost28147.
* tests/basic.c (check_ciphers): Enable simple test for gost28147.
* doc/gcrypt.texi: document GCRY_CIPHER_GOST28147.
--
Add a very basic implementation of GOST 28147-89 cipher: from modes
defined in standard only ECB and CFB are supported, sbox is limited
to the "test variant" as provided in GOST 34.11-94.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* configure.ac: Implement new disable flag.
--
Doing a static build of Libgcrypt currently throws an as error on my
box. Adding this configure option as a workaround
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* configure.ac (HAVE_VLA): Add check.
* src/misc.c (_gcry_burn_stack) [HAVE_VLA]: Add VLA code.
--
Some gcc versions convert _gcry_burn_stack into loop that overwrites the same
64-byte stack buffer instead of burn stack deeper. It's argued at GCC bugzilla
that _gcry_burn_stack is doing wrong thing here [1] and that this kind of
optimization is allowed.
So lets fix _gcry_burn_stack by using variable length array when VLAs are
supported by compiler. This should ensure proper stack burning to the requested
depth and avoid GCC loop optimizations.
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-armv7-neon.S'.
* cipher/sha512-armv7-neon.S: New file.
* cipher/sha512.c (USE_ARM_NEON_ASM): New macro.
(SHA512_CONTEXT) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(sha512_init, sha384_init) [USE_ARM_NEON_ASM]: Enable 'use_neon' if
CPU support NEON instructions.
(k): Round constant array moved outside of 'transform' function.
(__transform): Renamed from 'tranform' function.
[USE_ARM_NEON_ASM] (_gcry_sha512_transform_armv7_neon): New prototype.
(transform): New wrapper function for different transform versions.
(sha512_write, sha512_final): Burn stack by the amount returned by
transform function.
* configure.ac (sha512) [neonsupport]: Add 'sha512-armv7-neon.lo'.
--
Add NEON assembly for transform function for faster SHA512 on ARM. Major speed
up thanks to 64-bit integer registers and large register file that can hold
full input buffer.
Benchmark results on Cortex-A8, 1Ghz:
Old:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512 17050ms 18780ms 29120ms 18040ms 17190ms
SHA384 17130ms 18720ms 29160ms 18090ms 17280ms
New:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512 3600ms 5070ms 15330ms 4510ms 3480ms
SHA384 3590ms 5060ms 15350ms 4510ms 3520ms
New vs old:
SHA512 4.74x 3.70x 1.90x 4.00x 4.94x
SHA384 4.77x 3.70x 1.90x 4.01x 4.91x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* configure.ac: Add option --disable-neon-support.
(HAVE_GCC_INLINE_ASM_NEON): New.
(ENABLE_NEON_SUPPORT): New.
[arm]: Add 'hwf-arm.lo' as HW feature module.
* src/Makefile.am: Add 'hwf-arm.c'.
* src/g10lib.h (HWF_ARM_NEON): New macro.
* src/global.c (hwflist): Add HWF_ARM_NEON entry.
* src/hwf-arm.c: New file.
* src/hwf-common.h (_gcry_hwf_detect_arm): New prototype.
* src/hwfeatures.c (_gcry_detect_hw_features) [HAVE_CPU_ARCH_ARM]: Add
call to _gcry_hwf_detect_arm.
--
Add HW detection module for detecting ARM NEON instruction set. ARM does not
have cpuid instruction so we have to rely on OS to pass feature set information
to user-space. For linux, NEON support can be detected by parsing
'/proc/self/auxv' for hardware capabilities information. For other OSes, NEON
can be detected by checking if platform/compiler only supports NEON capable
CPUs (by check if __ARM_NEON__ macro is defined).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/ecc-common.h, cipher/ecc-curves.c, cipher/ecc-misc.c: New.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add new files.
* configure.ac (GCRYPT_PUBKEY_CIPHERS): Add new .c files.
* cipher/ecc.c (curve_aliases, ecc_domain_parms_t, domain_parms)
(scanval): Move to ecc-curves.c.
(fill_in_curve): Move to ecc-curve.c as _gcry_ecc_fill_in_curve.
(ecc_get_curve): Move to ecc-curve.c as _gcry_ecc_get_curve.
(_gcry_mpi_ec_ec2os): Move to ecc-misc.c.
(ec2os): Move to ecc-misc.c as _gcry_ecc_ec2os.
(os2ec): Move to ecc-misc.c as _gcry_ecc_os2ec.
(point_set): Move as inline function to ecc-common.h.
(_gcry_ecc_curve_free): Move to ecc-misc.c as _gcry_ecc_curve_free.
(_gcry_ecc_curve_copy): Move to ecc-misc.c as _gcry_ecc_curve_copy.
(mpi_from_keyparam, point_from_keyparam): Move to ecc-curves.c.
(_gcry_mpi_ec_new): Move to ecc-curves.c.
(ecc_get_param): Move to ecc-curves.c as _gcry_ecc_get_param.
(ecc_get_param_sexp): Move to ecc-curves.c as _gcry_ecc_get_param_sexp.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/blowfish-armv6.S: Replace __ARM_ARCH >= 6 checks with
HAVE_ARM_ARCH_V6.
* cipher/blowfish.c: Ditto.
* cipher/camellia-armv6.S: Ditto.
* cipher/camellia.h: Ditto.
* cipher/cast5-armv6.S: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael-armv6.S: Ditto.
* cipher/rijndael.c: Ditto.
* configure.ac: Add HAVE_ARM_ARCH_V6 check.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'camellia-armv6.S'.
* cipher/camellia-armv6.S: New file.
* cipher/camellia-glue.c [USE_ARMV6_ASM]
(_gcry_camellia_armv6_encrypt_block)
(_gcry_camellia_armv6_decrypt_block): New prototypes.
[USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock)
(camellia_encrypt, camellia_decrypt): New functions.
* cipher/camellia.c [!USE_ARMV6_ASM]: Compile encryption and decryption
routines if USE_ARMV6_ASM macro is _not_ defined.
* cipher/camellia.h (USE_ARMV6_ASM): New macro.
[!USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock): If
USE_ARMV6_ASM is defined, disable these function prototypes.
(camellia) [arm]: Add 'camellia-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for Camellia. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now. only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 1.44x 1.47x 1.35x 1.34x 1.43x 1.39x 1.38x 1.36x 1.38x 1.39x
CAMELLIA192 1.60x 1.62x 1.52x 1.47x 1.56x 1.54x 1.52x 1.53x 1.52x 1.53x
CAMELLIA256 1.59x 1.60x 1.49x 1.47x 1.53x 1.54x 1.51x 1.50x 1.52x 1.53x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'blowfish-armv6.S'.
* cipher/blowfish-armv6.S: New file.
* cipher/blowfish.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_blowfish_armv6_do_encrypt)
(_gcry_blowfish_armv6_encrypt_block)
(_gcry_blowfish_armv6_decrypt_block, _gcry_blowfish_armv6_ctr_enc)
(_gcry_blowfish_armv6_cbc_dec, _gcry_blowfish_armv6_cfb_dec): New
prototypes.
[USE_ARMV6_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block)
(encrypt_block, decrypt_block): New functions.
(_gcry_blowfish_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_blowfish_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_blowfish_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
* configure.ac (blowfish) [arm]: Add 'blowfish-armv6.lo'.
--
Patch provides non-parallel implementations for small speed-up and 2-way
parallel implementations that gets accelerated on multi-issue CPUs (hand-tuned
for in-order dual-issue Cortex-A8). Unaligned access handling is done in
assembly.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new (Cortex-A8, Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
BLOWFISH 1.28x 1.16x 1.21x 2.16x 1.26x 1.86x 1.21x 1.25x 1.89x 1.96x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cast5-armv6.S'.
* cipher/cast5-armv6.S: New file.
* cipher/cast5.c (USE_ARMV6_ASM): New macro.
(CAST5_context) [USE_ARMV6_ASM]: New members 'Kr_arm_enc' and
'Kr_arm_dec'.
[USE_ARMV6_ASM] (_gcry_cast5_armv6_encrypt_block)
(_gcry_cast5_armv6_decrypt_block, _gcry_cast5_armv6_ctr_enc)
(_gcry_cast5_armv6_cbc_dec, _gcry_cast5_armv6_cfb_dec): New prototypes.
[USE_ARMV6_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block)
(decrypt_block): New functions.
(_gcry_cast5_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_cast5_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_cast5_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_cast_setkey) [USE_ARMV6_ASM]: Initialize 'Kr_arm_enc' and
'Kr_arm_dec'.
* configure.ac (cast5) [arm]: Add 'cast5-armv6.lo'.
--
Provides non-parallel implementations for small speed-up and 2-way parallel
implementations that gets accelerated on multi-issue CPUs (hand-tuned for
in-order dual-issue Cortex-A8). Unaligned access handling is done in assembly.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new (Cortex-A8, Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAST5 1.15x 1.12x 1.12x 2.07x 1.14x 1.60x 1.12x 1.13x 1.62x 1.63x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'rijndael-armv6.S'.
* cipher/rijndael-armv6.S: New file.
* cipher/rijndael.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_aes_armv6_encrypt_block)
(_gcry_aes_armv6_decrypt_block): New prototypes.
(do_encrypt_aligned) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_encrypt): Disable input/output alignment when USE_ARMV6_ASM.
(do_decrypt_aligned) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_decrypt): Disable input/output alignment when USE_ARMV6_ASM.
* configure.ac (HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS): New check for
gcc/as compatibility with ARM assembly implementations.
(aes) [arm]: Add 'rijndael-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for AES. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 2.61x 3.12x 2.16x 2.59x 2.26x 2.25x 2.08x 2.08x 2.23x 2.23x
AES192 2.60x 3.06x 2.18x 2.65x 2.29x 2.29x 2.12x 2.12x 2.25x 2.27x
AES256 2.62x 3.09x 2.24x 2.72x 2.30x 2.34x 2.17x 2.19x 2.32x 2.32x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/gcrypt.h.in (GCRY_CIPHER_SALSA20): New.
* cipher/salsa20.c: New.
* configure.ac (available_ciphers): Add Salsa20.
* cipher/cipher.c: Register Salsa20.
(cipher_setiv): Allow to divert an IV to a cipher module.
* src/cipher-proto.h (cipher_setiv_func_t): New.
(cipher_extra_spec): Add field setiv.
* src/cipher.h: Declare Salsa20 definitions.
* tests/basic.c (check_stream_cipher): New.
(check_stream_cipher_large_block): New.
(check_cipher_modes): Run new test functions.
(check_ciphers): Add simple test for Salsa20.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* configure.ac (AH_BOTTOM): Move GPG_ERR_ replacement defines to ...
* src/gcrypt-int.h: new file.
* src/visibility.h, src/cipher.h: Replace gcrypt.h by gcrypt-int.h.
* tests/: Ditto for all test files.
--
Defining newer gpg-error codes in config.h was not a good idea,
because config.h is usually included before gpg-error.h and thus
gpg-error.h would be double defines to lead to faulty code there like
typedef enum
{
[...]
191 = 191,
[...]
};
|
|
* cipher/blowfish-amd64.S: Enable only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined.
* cipher/camellia-aesni-avx-amd64.S: Ditto.
* cipher/camellia-aesni-avx2-amd64.S: Ditto.
* cipher/cast5-amd64.S: Ditto.
* cipher/rinjdael-amd64.S: Ditto.
* cipher/serpent-avx2-amd64.S: Ditto.
* cipher/serpent-sse2-amd64.S: Ditto.
* cipher/twofish-amd64.S: Ditto.
* cipher/blowfish.c: Use AMD64 assembly implementation only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
* configure.ac: Check gcc/as compatibility with AMD64 assembly
implementations.
--
Later these checks can be split and assembly implementations adapted to handle
different platforms, but for now disable AMD64 assembly implementations if
assembler does not look to be able to handle them.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'camellia-aesni-avx2-amd64.S'.
* cipher/camellia-aesni-avx2-amd64.S: New file.
* cipher/camellia-glue.c (USE_AESNI_AVX2): New macro.
(CAMELLIA_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'.
[USE_AESNI_AVX2] (_gcry_camellia_aesni_avx2_ctr_enc)
(_gcry_camellia_aesni_avx2_cbc_dec)
(_gcry_camellia_aesni_avx2_cfb_dec): New prototypes.
(camellia_setkey) [USE_AESNI_AVX2]: Check AVX2+AES-NI capable hardware
and set 'ctx->use_aesni_avx2'.
(_gcry_camellia_ctr_enc) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(_gcry_camellia_cbc_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(_gcry_camellia_cfb_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths get tested.
* configure.ac (camellia) [avx2support, aesnisupport]: Add
'camellia-aesni-avx2-amd64.lo'.
--
Add new AVX2/AES-NI implementation of Camellia that processes 32 blocks in
parallel.
Speed old (AVX/AES-NI) vs. new (AVX2/AES-NI) on Intel Core i5-4570:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 1.00x 0.99x 1.00x 1.53x 1.00x 1.49x 1.00x 1.00x 1.54x 1.54x
CAMELLIA256 0.99x 1.00x 1.00x 1.50x 1.00x 1.50x 1.00x 1.00x 1.54x 1.52x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'serpent-avx2-amd64.S'.
* cipher/serpent-avx2-amd64.S: New file.
* cipher/serpent.c (USE_AVX2): New macro.
(serpent_context_t) [USE_AVX2]: Add 'use_avx2'.
[USE_AVX2] (_gcry_serpent_avx2_ctr_enc, _gcry_serpent_avx2_cbc_dec)
(_gcry_serpent_avx2_cfb_dec): New prototypes.
(serpent_setkey_internal) [USE_AVX2]: Check for AVX2 capable hardware
and set 'use_avx2'.
(_gcry_serpent_ctr_enc) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cbc_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cfb_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths are tested.
* configure.ac (serpent) [avx2support]: Add 'serpent-avx2-amd64.lo'.
--
Add new AVX2 implementation of Serpent that processes 16 blocks in parallel.
Speed old (SSE2) vs. new (AVX2) on Intel Core i5-4570:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.00x 1.00x 1.00x 2.10x 1.00x 2.16x 1.01x 1.00x 2.16x 2.18x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* configure.ac: Add option --disable-avx2-support.
(HAVE_GCC_INLINE_ASM_AVX2): New.
(ENABLE_AVX2_SUPPORT): New.
* src/g10lib.h (HWF_INTEL_AVX2): New.
* src/global.c (hwflist): Add HWF_INTEL_AVX2.
* src/hwf-x86.c [__i386__] (get_cpuid): Initialize registers to zero
before cpuid.
[__x86_64__] (get_cpuid): Initialize registers to zero before cpuid.
(detect_x86_gnuc): Store maximum cpuid level.
(detect_x86_gnuc) [ENABLE_AVX2_SUPPORT]: Add detection for AVX2.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'twofish-amd64.S'.
* cipher/twofish-amd64.S: New file.
* cipher/twofish.c (USE_AMD64_ASM): New macro.
[USE_AMD64_ASM] (_gcry_twofish_amd64_encrypt_block)
(_gcry_twofish_amd64_decrypt_block, _gcry_twofish_amd64_ctr_enc)
(_gcry_twofish_amd64_cbc_dec, _gcry_twofish_amd64_cfb_dec): New
prototypes.
[USE_AMD64_ASM] (do_twofish_encrypt, do_twofish_decrypt)
(twofish_encrypt, twofish_decrypt): New functions.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec, _gcry_twofish_cfb_dec)
(selftest_ctr, selftest_cbc, selftest_cfb): New functions.
(selftest): Call new bulk selftests.
* cipher/cipher.c (gcry_cipher_open) [USE_TWOFISH]: Register Twofish
bulk functions for ctr-enc, cbc-dec and cfb-dec.
* configure.ac (twofish) [x86_64]: Add 'twofish-amd64.lo'.
* src/cipher.h (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(gcry_twofish_cfb_dec): New prototypes.
--
Provides non-parallel implementations for small speed-up and 3-way parallel
implementations that gets accelerated on `out-of-order' CPUs.
Speed old vs. new on Intel Core i5-4570:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
TWOFISH128 1.08x 1.07x 1.10x 1.80x 1.09x 1.70x 1.08x 1.08x 1.70x 1.69x
Speed old vs. new on Intel Core2 T8100:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
TWOFISH128 1.11x 1.10x 1.13x 1.65x 1.13x 1.62x 1.12x 1.11x 1.63x 1.59x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'rijndael-amd64.S'.
* cipher/rijndael-amd64.S: New file.
* cipher/rijndael.c (USE_AMD64_ASM): New macro.
[USE_AMD64_ASM] (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block): New prototypes.
(do_encrypt_aligned) [USE_AMD64_ASM]: Use amd64 assembly function.
(do_encrypt): Disable input/output alignment when USE_AMD64_ASM is set.
(do_decrypt_aligned) [USE_AMD64_ASM]: Use amd64 assembly function.
(do_decrypt): Disable input/output alignment when USE_AMD64_AES is set.
* configure.ac (aes) [x86-64]: Add 'rijndael-amd64.lo'.
--
Add optimized amd64 assembly implementation for AES.
Old vs new, on AMD Phenom II:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 1.74x 1.72x 1.81x 1.85x 1.82x 1.76x 1.67x 1.64x 1.79x 1.81x
AES192 1.77x 1.77x 1.79x 1.88x 1.90x 1.80x 1.69x 1.69x 1.85x 1.81x
AES256 1.79x 1.81x 1.83x 1.89x 1.88x 1.82x 1.72x 1.70x 1.87x 1.89x
Old vs new, on Intel Core2:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 1.77x 1.75x 1.78x 1.76x 1.76x 1.77x 1.75x 1.76x 1.76x 1.82x
AES192 1.80x 1.73x 1.81x 1.76x 1.79x 1.85x 1.77x 1.76x 1.80x 1.85x
AES256 1.81x 1.77x 1.81x 1.77x 1.80x 1.79x 1.78x 1.77x 1.81x 1.85x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'blowfish-amd64.S'.
* cipher/blowfish-amd64.S: New file.
* cipher/blowfish.c (USE_AMD64_ASM): New macro.
[USE_AMD64_ASM] (_gcry_blowfish_amd64_do_encrypt)
(_gcry_blowfish_amd64_encrypt_block)
(_gcry_blowfish_amd64_decrypt_block, _gcry_blowfish_amd64_ctr_enc)
(_gcry_blowfish_amd64_cbc_dec, _gcry_blowfish_amd64_cfb_dec): New
prototypes.
[USE_AMD64_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block)
(encrypt_block, decrypt_block): New functions.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec, selftest_ctr, selftest_cbc, selftest_cfb): New
functions.
(selftest): Call new bulk selftests.
* cipher/cipher.c (gcry_cipher_open) [USE_BLOWFISH]: Register Blowfish
bulk functions for ctr-enc, cbc-dec and cfb-dec.
* configure.ac (blowfish) [x86_64]: Add 'blowfish-amd64.lo'.
* src/cipher.h (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(gcry_blowfish_cfb_dec): New prototypes.
--
Add non-parallel functions for small speed-up and 4-way parallel functions for
modes of operation that support parallel processing.
Speed old vs. new on AMD Phenom II X6 1055T:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
BLOWFISH 1.21x 1.12x 1.17x 3.52x 1.18x 3.34x 1.16x 1.15x 3.38x 3.47x
Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
BLOWFISH 1.16x 1.10x 1.17x 2.98x 1.18x 2.88x 1.16x 1.15x 3.00x 3.02x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cast5-amd64.S'.
* cipher/cast5-amd64.S: New file.
* cipher/cast5.c (USE_AMD64_ASM): New macro.
(_gcry_cast5_s1tos4): Merge arrays s1, s2, s3, s4 to single array to
simplify access from assembly implementation.
(s1, s2, s3, s4): New macros pointing to subarrays in
_gcry_cast5_s1tos4.
[USE_AMD64_ASM] (_gcry_cast5_amd64_encrypt_block)
(_gcry_cast5_amd64_decrypt_block, _gcry_cast5_amd64_ctr_enc)
(_gcry_cast5_amd64_cbc_dec, _gcry_cast5_amd64_cfb_dec): New prototypes.
[USE_AMD64_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block)
(decrypt_block): New functions.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec)
(selftest_ctr, selftest_cbc, selftest_cfb): New functions.
(selftest): Call new bulk selftests.
* cipher/cipher.c (gcry_cipher_open) [USE_CAST5]: Register CAST5 bulk
functions for ctr-enc, cbc-dec and cfb-dec.
* configure.ac (cast5) [x86_64]: Add 'cast5-amd64.lo'.
* src/cipher.h (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec)
(gcry_cast5_cfb_dec): New prototypes.
--
Provides non-parallel implementations for small speed-up and 4-way parallel
implementations that gets accelerated on `out-of-order' CPUs.
Speed old vs. new on AMD Phenom II X6 1055T:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAST5 1.23x 1.22x 1.21x 2.86x 1.21x 2.83x 1.22x 1.17x 2.73x 2.73x
Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAST5 1.00x 1.04x 1.06x 2.56x 1.06x 2.37x 1.03x 1.01x 2.43x 2.41x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* configure.ac (serpent): Add 'serpent-sse2-amd64.lo'.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add
'serpent-sse2-amd64.S'.
* cipher/cipher.c (gcry_cipher_open) [USE_SERPENT]: Register bulk
functions for CBC-decryption and CTR-mode.
* cipher/serpent.c (USE_SSE2): New macro.
[USE_SSE2] (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec):
New prototypes to assembler functions.
(serpent_setkey): Set 'serpent_init_done' before calling serpent_test.
(_gcry_serpent_ctr_enc): New function.
(_gcry_serpent_cbc_dec): New function.
(selftest_ctr_128): New function.
(selftest_cbc_128): New function.
(selftest): Call selftest_ctr_128 and selftest_cbc_128.
* cipher/serpent-sse2-amd64.S: New file.
* src/cipher.h (_gcry_serpent_ctr_enc): New prototype.
(_gcry_serpent_cbc_dec): New prototype.
--
[v2]: Converted to SSE2, to support all amd64 processors (SSE2 is required
feature by AMD64 SysV ABI).
Patch adds word-sliced SSE2 implementation of Serpent for amd64 for speeding
up parallelizable workloads (CTR mode, CBC mode decryption). Implementation
processes eight blocks in parallel, with two four-block sets interleaved for
out-of-order scheduling.
Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.00x 0.99x 1.00x 3.98x 1.00x 1.01x 1.00x 1.01x 4.04x 4.04x
Speed old vs. new on AMD Phenom II X6 1055T:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.02x 1.01x 1.00x 2.83x 1.00x 1.00x 1.00x 1.00x 2.72x 2.72x
Speed old vs. new on Intel Core2 Duo T8100:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.00x 1.02x 0.97x 4.02x 0.98x 1.01x 0.98x 1.00x 3.82x 3.91x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/camellia_aesni_avx_x86-64.S: Remove.
* cipher/camellia-aesni-avx-amd64.S: New.
* cipher/Makefile.am: Use the new filename.
* configure.ac: Use the new filename.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/gcrypt.h.in (GCRY_PK_GET_PUBKEY): New.
(GCRY_PK_GET_SECKEY): New.
(gcry_pubkey_get_sexp): New.
* src/visibility.c (gcry_pubkey_get_sexp): New.
* src/visibility.h (gcry_pubkey_get_sexp): Mark visible.
* src/libgcrypt.def, src/libgcrypt.vers: Add new function.
* cipher/pubkey-internal.h: New.
* cipher/Makefile.am (libcipher_la_SOURCES): Add new file.
* cipher/ecc.c: Include pubkey-internal.h
(_gcry_pk_ecc_get_sexp): New.
* cipher/pubkey.c: Include pubkey-internal.h and context.h.
(_gcry_pubkey_get_sexp): New.
* src/context.c (_gcry_ctx_find_pointer): New.
* src/cipher-proto.h: Add _gcry_pubkey_get_sexp.
* tests/t-mpi-point.c (print_sexp): New.
(context_param, basic_ec_math_simplified): Add tests for the new
function.
* configure.ac (NEED_GPG_ERROR_VERSION): Set to 1.11.
(AH_BOTTOM) Add error codes from gpg-error 1.12
* src/g10lib.h (fips_not_operational): Use GPG_ERR_NOT_OPERATIONAL.
* mpi/ec.c (_gcry_mpi_ec_get_mpi): Fix computation of Q.
(_gcry_mpi_ec_get_point): Ditto.
--
While checking the new code I figured that the auto-computation of Q
must have led to a segv. It seems we had no test case for that.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* tests/t-kdf.c (check_scrypt): New.
(main): Call new test.
* configure.ac: Support disabling of the scrypt algorithm. Make KDF
enabling similar to the other algorithm classes. Disable scrypt if we
don't have a 64 bit type.
* cipher/memxor.c, cipher/memxor.h: Remove.
* cipher/scrypt.h: Remove.
* cipher/kdf-internal.h: New.
* cipher/Makefile.am: Remove files. Add new file. Move scrypt.c to
EXTRA_libcipher_la_SOURCES.
(GCRYPT_MODULES): Add GCRYPT_KDFS.
* src/gcrypt.h.in (GCRY_KDF_SCRYPT): Change value.
* cipher/kdf.c (pkdf2): Rename to _gcry_kdf_pkdf2.
(_gcry_kdf_pkdf2): Don't bail out for SALTLEN==0.
(gcry_kdf_derive): Allow for a passwordlen of zero for scrypt. Check
for SALTLEN > 0 for GCRY_KDF_PBKDF2. Pass algo to _gcry_kdf_scrypt.
(gcry_kdf_derive) [!USE_SCRYPT]: Return an error.
* cipher/scrypt.c: Replace memxor.h by bufhelp.h. Replace scrypt.h by
kdf-internal.h. Enable code only if HAVE_U64_TYPEDEF is defined.
Replace C99 types uint64_t, uint32_t, and uint8_t by libgcrypt types.
(_SALSA20_INPUT_LENGTH): Remove underscore from identifier.
(_scryptBlockMix): Replace memxor by buf_xor.
(_gcry_kdf_scrypt): Use gcry_malloc and gcry_free. Check for integer
overflow. Add hack to support blocksize of 1 for tests. Return
errors from calls to _gcry_kdf_pkdf2.
* cipher/kdf.c (openpgp_s2k): Make static.
--
This patch prepares the addition of more KDF functions, brings the
code into Libgcrypt shape, adds a test case and makes the code more
robust. For example, scrypt would have fail silently if Libgcrypt was
not build with SHA256 support. Also fixed symbol naming for systems
without a visibility support.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* configure.ac: s/AM_CONFIG_HEADER/AC_CONFIG_HEADER/
|
|
* configure.ac (HAVE_GCC_INLINE_ASM_SSSE3): New test.
(ENABLE_AESNI_SUPPORT): Do not define without SSSE3 support.
(HAVE_GCC_INLINE_ASM_SSSE3, ENABLE_AVX_SUPPORT): Split up detection
and definition.
--
For example the assembler of FreeBSD 7.3 does not know about pshufb
and thus rijndael.c can't be compiled without using
--disable-aesni-support. This check that the toolchain can use SSSE3
instructions before trying to build with AES_NI support.
|
|
* src/gcrypt.h.in (GCRYPT_VERSION_NUMBER): New.
* configure.ac (VERSION_NUMBER): New ac_subst.
* src/global.c (_gcry_vcontrol): Move call to above function ...
(gcry_check_version): .. here.
* configure.ac (BUILD_REVISION, BUILD_FILEVERSION)
(BUILD_TIMESTAMP): Define on all platforms.
* compat/compat.c (_gcry_compat_identification): Include revision and
timestamp.
|
|
* acinclude.m4 (GNUPG_MSG_PRINT): Remove.
(GCRY_MSG_SHOW, GCRY_MSG_WRAP): New.
* configure.ac: Use new macros for the feedback.
|
|
* configure.ac [freebsd]: Do not add /usr/local to CPPFLAGS and
LDFLAGS.
--
Back in ~2000 we introduced a quick hack to make building of Libgcrypt
on FreeBSD easier by always adding -I/usr/local/include and
-L/usr/local/lib . It turned out that this is a bad idea if one wants
to build with library version which is not installed in /usr/local.
|
|
* configure.ac: Add option --disable-avx-support.
(HAVE_GCC_INLINE_ASM_AVX): New.
(ENABLE_AVX_SUPPORT): New.
(camellia) [ENABLE_AVX_SUPPORT, ENABLE_AESNI_SUPPORT]: Add
camellia_aesni_avx_x86-64.lo.
* cipher/Makefile.am (AM_CCASFLAGS): Add.
(EXTRA_libcipher_la_SOURCES): Add camellia_aesni_avx_x86-64.S
* cipher/camellia-glue.c [ENABLE_AESNI_SUPPORT, ENABLE_AVX_SUPPORT]
[__x86_64__] (USE_AESNI_AVX): Add macro.
(struct Camellia_context) [USE_AESNI_AVX]: Add use_aesni_avx.
[USE_AESNI_AVX] (_gcry_camellia_aesni_avx_ctr_enc)
(_gcry_camellia_aesni_avx_cbc_dec): New prototypes to assembly
functions.
(camellia_setkey) [USE_AESNI_AVX]: Enable AES-NI/AVX if hardware
support both.
(_gcry_camellia_ctr_enc) [USE_AESNI_AVX]: Add AES-NI/AVX code.
(_gcry_camellia_cbc_dec) [USE_AESNI_AVX]: Add AES-NI/AVX code.
* cipher/camellia_aesni_avx_x86-64.S: New.
* src/g10lib.h (HWF_INTEL_AVX): New.
* src/global.c (hwflist): Add HWF_INTEL_AVX.
* src/hwf-x86.c (detect_x86_gnuc) [ENABLE_AVX_SUPPORT]: Add detection
for AVX.
--
Before:
Running each test 250 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 2210ms 2200ms 2300ms 2050ms 2240ms 2250ms 2290ms 2270ms 2070ms 2070ms
CAMELLIA256 2810ms 2800ms 2920ms 2670ms 2840ms 2850ms 2910ms 2890ms 2660ms 2640ms
After:
Running each test 250 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 2200ms 2220ms 2290ms 470ms 2240ms 2270ms 2270ms 2290ms 480ms 480ms
CAMELLIA256 2820ms 2820ms 2900ms 600ms 2860ms 2860ms 2900ms 2920ms 620ms 620ms
AES-NI/AVX implementation works by processing 16 parallel blocks (256 bytes).
It's bytesliced implementation that uses AES-NI (Subbyte) for Camellia sboxes,
with help of prefiltering/postfiltering. For smaller data sets generic C
implementation is used.
Speed-up for CBC-decryption and CTR-mode (large data): 4.3x
Tests were run on: Intel Core i5-2450M
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
(license boiler plate update by wk)
|
|
* configure.ac (GCRYPT_HWF_MODULES): New.
(HAVE_CPU_ARCH_X86, HAVE_CPU_ARCH_ALPHA, HAVE_CPU_ARCH_SPARC)
(HAVE_CPU_ARCH_MIPS, HAVE_CPU_ARCH_M68K, HAVE_CPU_ARCH_PPC)
(HAVE_CPU_ARCH_ARM): New AC_DEFINEs.
* mpi/config.links (mpi_cpu_arch): New.
* src/global.c (print_config): Print new tag "cpu-arch".
* src/Makefile.am (libgcrypt_la_SOURCES): Add hwf-common.h
(EXTRA_libgcrypt_la_SOURCES): New.
(gcrypt_hwf_modules): New.
(libgcrypt_la_DEPENDENCIES, libgcrypt_la_LIBADD): Add that one.
* src/hwfeatures.c: Factor most code out to ...
* src/hwf-x86.c: New file.
(detect_x86_gnuc): Return the feature vector.
(_gcry_hwf_detect_x86): New.
* src/hwf-common.h: New.
* src/hwfeatures.c (_gcry_detect_hw_features): Dispatch using
HAVE_CPU_ARCH_ macros.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* configure.ac: Add option --disable-drng-support.
(ENABLE_DRNG_SUPPORT): New.
* random/rndhw.c (USE_DRNG): New.
(rdrand_long, rdrand_nlong, poll_drng): New.
(_gcry_rndhw_poll_fast, _gcry_rndhw_poll_slow): Call poll function.
* src/g10lib.h (HWF_INTEL_RDRAND): New.
* src/global.c (hwflist): Add "intel-rdrand".
* src/hwfeatures.c (detect_x86_64_gnuc) [ENABLE_DRNG_SUPPORT]: Detect
RDRAND.
(detect_ia32_gnuc) [ENABLE_DRNG_SUPPORT]: Detect RDRAND.
--
This patch provides support for using Digital Random Number Generator (DRNG)
engine, which is available on the latest Intel's CPUs. DRNG engine is
accesible via new the RDRAND instruction.
This patch adds the following:
- support for disabling using of rdrand instruction
- checking for RDRAND instruction support using cpuid
- RDRAND usage implementation
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
ChangeLog and editorial changes by wk.
|
|
* configure.ac: Add check for missing 'asm' keyword in C90 mode and
replacement with '__asm__'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|
|
* configure.ac (HAVE_GCC_ATTRIBUTE_ALIGNED): New test and ac_define.
* cipher/cipher-internal.h, cipher/rijndael.c, random/rndhw.c: Use new
macro instead of a fixed test for __GNUC__.
--
We assume that compilers that grok "__attribute__ ((aligned (16)))"
implement that in the same way as gcc does. In case it turns out
that this is not the case we will need to do two more things: Detect
such different behaviour and come up with a construct to allows the
use of that other style of alignment forcing.
|
|
* configure.ac (mmm4_revision): Use git rev-parse.
|
|
* configure.ac: Bump LT version at C20/A0/R0 to adjust for a planned
API update in 1.5.1.
|
|
* configure.ac: Fix usage of AC_LANG_PROGRAM.
|
|
* configure.ac: Add all the required m4 magic.
|