summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2013-07-29Implement deterministic ECDSA as specified by rfc-6979.Werner Koch3-45/+626
* cipher/ecc.c (sign): Add args FLAGS and HASHALGO. Convert an opaque MPI as INPUT. Implement rfc-6979. (ecc_sign): Remove the opaque MPI code and pass FLAGS to sign. (verify): Do not allocate and compute Y; it is not used. (ecc_verify): Truncate the hash value if needed. * tests/dsa-rfc6979.c (check_dsa_rfc6979): Add ECDSA test cases. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-26Implement deterministic DSA as specified by rfc-6979.Werner Koch5-32/+834
* cipher/dsa.c (dsa_sign): Move opaque mpi extraction to sign. (sign): Add args FLAGS and HASHALGO. Implement deterministic DSA. Add code path for R==0 to comply with the standard. (dsa_verify): Left fill opaque mpi based hash values. * cipher/dsa-common.c (int2octets, bits2octets): New. (_gcry_dsa_gen_rfc6979_k): New. * tests/dsa-rfc6979.c: New. * tests/Makefile.am (TESTS): Add dsa-rfc6979. -- This patch also fixes a recent patch (37d0a1e) which allows to pass the hash in a (hash) element. Support for deterministic ECDSA will come soon. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-26Allow the use of a private-key s-expression with gcry_pk_verify.Werner Koch1-1/+6
* cipher/pubkey.c (sexp_to_key): Fallback to private key. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-25Mitigate a flush+reload cache attack on RSA secret exponents.Werner Koch1-2/+11
* mpi/mpi-pow.c (gcry_mpi_powm): Always perfrom the mpi_mul for exponents in secure memory. -- The attack is published as http://eprint.iacr.org/2013/448 : Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack by Yuval Yarom and Katrina Falkner. 18 July 2013. Flush+Reload is a cache side-channel attack that monitors access to data in shared pages. In this paper we demonstrate how to use the attack to extract private encryption keys from GnuPG. The high resolution and low noise of the Flush+Reload attack enables a spy program to recover over 98% of the bits of the private key in a single decryption or signing round. Unlike previous attacks, the attack targets the last level L3 cache. Consequently, the spy program and the victim do not need to share the execution core of the CPU. The attack is not limited to a traditional OS and can be used in a virtualised environment, where it can attack programs executing in a different VM. (cherry picked from commit 55237c8f6920c6629debd23db65e90b42a3767de)
2013-07-19pk: Allow the use of a hash element for DSA sign and verify.Werner Koch6-17/+152
* cipher/pubkey.c (pubkey_sign): Add arg ctx and pass it to the sign module. (gcry_pk_sign): Pass CTX to pubkey_sign. (sexp_data_to_mpi): Add flag rfc6979 and code to alls hash with *DSA * cipher/rsa.c (rsa_sign, rsa_verify): Return an error if an opaque MPI is given for DATA/HASH. * cipher/elgamal.c (elg_sign, elg_verify): Ditto. * cipher/dsa.c (dsa_sign, dsa_verify): Convert a given opaque MPI. * cipher/ecc.c (ecc_sign, ecc_verify): Ditto. * tests/basic.c (check_pubkey_sign_ecdsa): Add a test for using a hash element with DSA. -- This patch allows the use of (data (flags raw) (hash sha256 #80112233445566778899AABBCCDDEEFF 000102030405060708090A0B0C0D0E0F#)) in addition to the old but more efficient (data (flags raw) (value #80112233445566778899AABBCCDDEEFF 000102030405060708090A0B0C0D0E0F#)) for DSA and ECDSA. With the hash element the flag "raw" must be explicitly given because existing regression test code expects that conflict error is return if no flags but a hash element is given. Note that the hash algorithm name is currently not checked. It may eventually be used to cross-check the length of the provided hash value. It is suggested that the correct hash name is given - even if a truncated hash value is used. Finally this patch adds a way to pass the hash algorithm and flag values to the signing module. "rfc6979" as been implemented as a new but not yet used flag. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-19sexp: Add function gcry_sexp_nth_buffer.Werner Koch8-4/+76
* src/sexp.c (gcry_sexp_nth_buffer): New. * src/visibility.c, src/visibility.h: Add function wrapper. * src/libgcrypt.vers, src/libgcrypt.def: Add to API. * src/gcrypt.h.in: Add prototype. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-18Update AUTHORS with info on Salsa20.Werner Koch1-2/+2
--
2013-07-18Add support for Salsa20.Werner Koch10-8/+1023
* src/gcrypt.h.in (GCRY_CIPHER_SALSA20): New. * cipher/salsa20.c: New. * configure.ac (available_ciphers): Add Salsa20. * cipher/cipher.c: Register Salsa20. (cipher_setiv): Allow to divert an IV to a cipher module. * src/cipher-proto.h (cipher_setiv_func_t): New. (cipher_extra_spec): Add field setiv. * src/cipher.h: Declare Salsa20 definitions. * tests/basic.c (check_stream_cipher): New. (check_stream_cipher_large_block): New. (check_cipher_modes): Run new test functions. (check_ciphers): Add simple test for Salsa20. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-17Typo fix in comment.Werner Koch1-1/+1
--
2013-07-17Allow gcry_mpi_dump to print opaque MPIs.Werner Koch2-0/+44
* mpi/mpicoder.c (gcry_mpi_dump): Detect abd print opaque MPIs. * tests/mpitests.c (test_opaque): New. (main): Call new test. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-17cipher: Prepare to pass extra info to the sign functions.Werner Koch6-7/+25
* src/gcrypt-module.h (gcry_pk_sign_t): Add parms flags and hashalgo. * cipher/rsa.c (rsa_sign): Add parms and mark them as unused. * cipher/dsa.c (dsa_sign): Ditto. * cipher/elgamal.c (elg_sign): Ditto. * cipher/pubkey.c (dummy_sign): Ditto. (pubkey_sign): Pass 0 for the new args. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-17Fix a special case bug in mpi_powm for e==0.Werner Koch1-2/+7
* mpi/mpi-pow.c (gcry_mpi_powm): For a zero exponent, make sure that the result has been allocated. -- This code triggered the problem: modulus = gcry_mpi_set_ui(NULL, 100); generator = gcry_mpi_set_ui(NULL, 3); exponent = gcry_mpi_set_ui(NULL, 0); result = gcry_mpi_new(0); gcry_mpi_powm(result, generator, exponent, modulus); gcry_mpi_new(0) does not allocate the limb space thus it is not possible to write even into the first limb. Workaround was to use gcry_mpi_new (1) but a real fix is better. Reported-by: Ian Goldberg Signed-off-by: Werner Koch <wk@gnupg.org>
2013-07-15Register DCO for Dmitry Kasatkin.Werner Koch1-0/+3
--
2013-07-15Fix memory leak in t-mpi-point testDmitry Eremin-Solenikov1-0/+2
* tests/t-mpi-point.c (basic_ec_math, basic_ec_math_simplified): add calls to gcry_ctx_release() to free contexts after they become unused. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2013-07-10Fix 'Please include winsock2.h before windows.h' warnings with mingw32Jussi Kivilinna3-0/+3
* random/rndw32.c: include winsock2.h before windows.h. * src/ath.h [_WIN32]: Ditto. * tests/benchmark.c [_WIN32]: Ditto. -- Patch silences warnings of following type: /usr/lib/gcc/i686-w64-mingw32/4.6/../../../../i686-w64-mingw32/include/winsock2.h:15:2: warning: #warning Please include winsock2.h before windows.h [-Wcpp] Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-07-10Remove duplicate header from mpi/amd64/mpih-mul2.SJussi Kivilinna1-43/+0
* mpi/amd64/mpih-mul2.S: remove duplicated header. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-07-10Fix i386/amd64 inline assembly "cc" clobbersJussi Kivilinna5-11/+21
* cipher/bithelp.h [__GNUC__, __i386__] (rol, ror): add "cc" globber for inline assembly. * cipher/cast5.c [__GNUC__, __i386__] (rol): Ditto. * random/rndhw.c [USE_DRNG] (rdrand_long): Ditto. * src/hmac256.c [__GNUC__, __i386__] (ror): Ditto. * mpi/longlong.c [__i386__] (add_ssaaaa, sub_ddmmss, umul_ppmm) (udiv_qrnnd, count_leading_zeros, count_trailing_zeros): Ditto. -- These assembly snippets modify cflags but do not mark "cc" clobber. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-07-10bufhelp: Suppress 'cast increases required alignment' warningJussi Kivilinna1-10/+10
* cipher/bufhelp.h (buf_xor, buf_xor_2dst, buf_xor_n_copy): Cast to larger element pointer through (void *) to suppress -Wcast-error. -- Patch disables bogus warnings caused by -Wcast-error. We know that byte pointers are properly aligned at these phases, or that hardware can handle unaligned accesses. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-07-10mpi: Add __ARM_ARCH for older GCCJussi Kivilinna1-7/+26
* mpi/longlong.h [__arm__]: Construct __ARM_ARCH if not provided by compiler. -- GCC 4.8 defines __ARM_ARCH which provides forward compatible way to detect ARM architecture. Use this when available and construct otherwise. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-07-10mpi: add missing "cc" clobber for ARM assemblyJussi Kivilinna1-3/+3
* mpi/longlong.h [__arm__] (add_ssaaaa, sub_ddmmss): Add __CLOBBER_CC. [__arm__][__ARM_ARCH <= 3] (umul_ppmm): Ditto. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-07-10Tweak ARM inline assembly for mpiJussi Kivilinna1-4/+16
mpi/longlong.h [__arm__]: Enable inline assembly if __thumb2__ is defined. [__arm__]: Use __ARCH_ARM when defined. [__arm__] [__ARM_ARCH >= 5] (count_leading_zeros): New. -- Current ARM Linux distributions use EABI that enables thumb2, and therefore inline assembly is disable (because !defined(__thumb__) selector). However thumb2 allows the use of assembly instructions that longlong.h contains for ARM. So this patch enables inline assembly for ARM when __thumb2__ is defined in addition to __thumb__. Patch also adds optimization for count_leading_zeros() macro for ARM. Results on Cortex-A8, 1Ghz: === Before: Algorithm generate 100*sign 100*verify ------------------------------------------------ RSA 1024 bit 750ms 2780ms 110ms RSA 2048 bit 14280ms 17250ms 300ms RSA 3072 bit 38630ms 51300ms 650ms RSA 4096 bit 60940ms 111430ms 1000ms jussi@cubie:~/libgcrypt$ tests/benchmark dsa Algorithm generate 100*sign 100*verify ------------------------------------------------ DSA 1024/160 - 1410ms 1680ms DSA 2048/224 - 6100ms 7390ms DSA 3072/256 - 14350ms 17120ms jussi@cubie:~/libgcrypt$ tests/benchmark ecc Algorithm generate 100*sign 100*verify ------------------------------------------------ ECDSA 192 bit 90ms 2160ms 3940ms ECDSA 224 bit 110ms 2810ms 5400ms ECDSA 256 bit 150ms 3570ms 6970ms ECDSA 384 bit 340ms 8320ms 16420ms ECDSA 521 bit 850ms 19760ms 38480ms After: jussi@cubie:~/libgcrypt$ tests/benchmark rsa Algorithm generate 100*sign 100*verify ------------------------------------------------ RSA 1024 bit 590ms 2230ms 80ms RSA 2048 bit 2320ms 13090ms 240ms RSA 3072 bit 60580ms 38420ms 460ms RSA 4096 bit 115130ms 82250ms 750ms jussi@cubie:~/libgcrypt$ tests/benchmark dsa Algorithm generate 100*sign 100*verify ------------------------------------------------ DSA 1024/160 - 1070ms 1290ms DSA 2048/224 - 4500ms 5550ms DSA 3072/256 - 10280ms 12200ms jussi@cubie:~/libgcrypt$ tests/benchmark ecc Algorithm generate 100*sign 100*verify ------------------------------------------------ ECDSA 192 bit 70ms 1900ms 3560ms ECDSA 224 bit 100ms 2490ms 4750ms ECDSA 256 bit 120ms 3140ms 5920ms ECDSA 384 bit 270ms 6990ms 13790ms ECDSA 521 bit 680ms 17080ms 33490ms Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-06-26Make gpg-error replacement defines more robust.Werner Koch25-33/+67
* configure.ac (AH_BOTTOM): Move GPG_ERR_ replacement defines to ... * src/gcrypt-int.h: new file. * src/visibility.h, src/cipher.h: Replace gcrypt.h by gcrypt-int.h. * tests/: Ditto for all test files. -- Defining newer gpg-error codes in config.h was not a good idea, because config.h is usually included before gpg-error.h and thus gpg-error.h would be double defines to lead to faulty code there like typedef enum { [...] 191 = 191, [...] };
2013-06-20Check if assembler is compatible with AMD64 assembly implementationsJussi Kivilinna15-16/+50
* cipher/blowfish-amd64.S: Enable only if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined. * cipher/camellia-aesni-avx-amd64.S: Ditto. * cipher/camellia-aesni-avx2-amd64.S: Ditto. * cipher/cast5-amd64.S: Ditto. * cipher/rinjdael-amd64.S: Ditto. * cipher/serpent-avx2-amd64.S: Ditto. * cipher/serpent-sse2-amd64.S: Ditto. * cipher/twofish-amd64.S: Ditto. * cipher/blowfish.c: Use AMD64 assembly implementation only if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined * cipher/camellia-glue.c: Ditto. * cipher/cast5.c: Ditto. * cipher/rijndael.c: Ditto. * cipher/serpent.c: Ditto. * cipher/twofish.c: Ditto. * configure.ac: Check gcc/as compatibility with AMD64 assembly implementations. -- Later these checks can be split and assembly implementations adapted to handle different platforms, but for now disable AMD64 assembly implementations if assembler does not look to be able to handle them. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-06-09Optimize _gcry_burn_stack for 32-bit and 64-bit architecturesJussi Kivilinna1-0/+26
* src/misc.c (_gcry_burn_stack): Add optimization for 32-bit and 64-bit architectures. -- Busy looping 'tests/benchmark --cipher-repetitions 10 cipher blowfish' on ARM Cortex-A8 shows that _gcry_burn_stack takes 21% of CPU time. With this patch, that number drops to 3.4%. On AMD64 (Intel i5-4570) CPU usage for _gcry_burn_stack in the same test drops from 3.5% to 1.1%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-06-09Add Camellia AES-NI/AVX2 implementationJussi Kivilinna4-4/+1395
* cipher/Makefile.am: Add 'camellia-aesni-avx2-amd64.S'. * cipher/camellia-aesni-avx2-amd64.S: New file. * cipher/camellia-glue.c (USE_AESNI_AVX2): New macro. (CAMELLIA_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'. [USE_AESNI_AVX2] (_gcry_camellia_aesni_avx2_ctr_enc) (_gcry_camellia_aesni_avx2_cbc_dec) (_gcry_camellia_aesni_avx2_cfb_dec): New prototypes. (camellia_setkey) [USE_AESNI_AVX2]: Check AVX2+AES-NI capable hardware and set 'ctx->use_aesni_avx2'. (_gcry_camellia_ctr_enc) [USE_AESNI_AVX2]: Add AVX2 accelerated code. (_gcry_camellia_cbc_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code. (_gcry_camellia_cfb_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code. (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks' so that AVX2 codepaths get tested. * configure.ac (camellia) [avx2support, aesnisupport]: Add 'camellia-aesni-avx2-amd64.lo'. -- Add new AVX2/AES-NI implementation of Camellia that processes 32 blocks in parallel. Speed old (AVX/AES-NI) vs. new (AVX2/AES-NI) on Intel Core i5-4570: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- CAMELLIA128 1.00x 0.99x 1.00x 1.53x 1.00x 1.49x 1.00x 1.00x 1.54x 1.54x CAMELLIA256 0.99x 1.00x 1.00x 1.50x 1.00x 1.50x 1.00x 1.00x 1.54x 1.52x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-06-09Add Serpent AVX2 implementationJussi Kivilinna4-4/+1028
* cipher/Makefile.am: Add 'serpent-avx2-amd64.S'. * cipher/serpent-avx2-amd64.S: New file. * cipher/serpent.c (USE_AVX2): New macro. (serpent_context_t) [USE_AVX2]: Add 'use_avx2'. [USE_AVX2] (_gcry_serpent_avx2_ctr_enc, _gcry_serpent_avx2_cbc_dec) (_gcry_serpent_avx2_cfb_dec): New prototypes. (serpent_setkey_internal) [USE_AVX2]: Check for AVX2 capable hardware and set 'use_avx2'. (_gcry_serpent_ctr_enc) [USE_AVX2]: Use AVX2 accelerated functions. (_gcry_serpent_cbc_dec) [USE_AVX2]: Use AVX2 accelerated functions. (_gcry_serpent_cfb_dec) [USE_AVX2]: Use AVX2 accelerated functions. (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks' so that AVX2 codepaths are tested. * configure.ac (serpent) [avx2support]: Add 'serpent-avx2-amd64.lo'. -- Add new AVX2 implementation of Serpent that processes 16 blocks in parallel. Speed old (SSE2) vs. new (AVX2) on Intel Core i5-4570: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.00x 1.00x 1.00x 2.10x 1.00x 2.16x 1.01x 1.00x 2.16x 2.18x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-06-09Add detection for Intel AVX2 instruction setJussi Kivilinna4-3/+58
* configure.ac: Add option --disable-avx2-support. (HAVE_GCC_INLINE_ASM_AVX2): New. (ENABLE_AVX2_SUPPORT): New. * src/g10lib.h (HWF_INTEL_AVX2): New. * src/global.c (hwflist): Add HWF_INTEL_AVX2. * src/hwf-x86.c [__i386__] (get_cpuid): Initialize registers to zero before cpuid. [__x86_64__] (get_cpuid): Initialize registers to zero before cpuid. (detect_x86_gnuc): Store maximum cpuid level. (detect_x86_gnuc) [ENABLE_AVX2_SUPPORT]: Add detection for AVX2. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-06-09twofish: add amd64 assembly implementationJussi Kivilinna6-1/+1036
* cipher/Makefile.am: Add 'twofish-amd64.S'. * cipher/twofish-amd64.S: New file. * cipher/twofish.c (USE_AMD64_ASM): New macro. [USE_AMD64_ASM] (_gcry_twofish_amd64_encrypt_block) (_gcry_twofish_amd64_decrypt_block, _gcry_twofish_amd64_ctr_enc) (_gcry_twofish_amd64_cbc_dec, _gcry_twofish_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (do_twofish_encrypt, do_twofish_decrypt) (twofish_encrypt, twofish_decrypt): New functions. (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec, _gcry_twofish_cfb_dec) (selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Call new bulk selftests. * cipher/cipher.c (gcry_cipher_open) [USE_TWOFISH]: Register Twofish bulk functions for ctr-enc, cbc-dec and cfb-dec. * configure.ac (twofish) [x86_64]: Add 'twofish-amd64.lo'. * src/cipher.h (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec) (gcry_twofish_cfb_dec): New prototypes. -- Provides non-parallel implementations for small speed-up and 3-way parallel implementations that gets accelerated on `out-of-order' CPUs. Speed old vs. new on Intel Core i5-4570: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- TWOFISH128 1.08x 1.07x 1.10x 1.80x 1.09x 1.70x 1.08x 1.08x 1.70x 1.69x Speed old vs. new on Intel Core2 T8100: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- TWOFISH128 1.11x 1.10x 1.13x 1.65x 1.13x 1.62x 1.12x 1.11x 1.63x 1.59x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-29rinjdael: add amd64 assembly implementationJussi Kivilinna4-1/+1456
* cipher/Makefile.am: Add 'rijndael-amd64.S'. * cipher/rijndael-amd64.S: New file. * cipher/rijndael.c (USE_AMD64_ASM): New macro. [USE_AMD64_ASM] (_gcry_aes_amd64_encrypt_block) (_gcry_aes_amd64_decrypt_block): New prototypes. (do_encrypt_aligned) [USE_AMD64_ASM]: Use amd64 assembly function. (do_encrypt): Disable input/output alignment when USE_AMD64_ASM is set. (do_decrypt_aligned) [USE_AMD64_ASM]: Use amd64 assembly function. (do_decrypt): Disable input/output alignment when USE_AMD64_AES is set. * configure.ac (aes) [x86-64]: Add 'rijndael-amd64.lo'. -- Add optimized amd64 assembly implementation for AES. Old vs new, on AMD Phenom II: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- AES 1.74x 1.72x 1.81x 1.85x 1.82x 1.76x 1.67x 1.64x 1.79x 1.81x AES192 1.77x 1.77x 1.79x 1.88x 1.90x 1.80x 1.69x 1.69x 1.85x 1.81x AES256 1.79x 1.81x 1.83x 1.89x 1.88x 1.82x 1.72x 1.70x 1.87x 1.89x Old vs new, on Intel Core2: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- AES 1.77x 1.75x 1.78x 1.76x 1.76x 1.77x 1.75x 1.76x 1.76x 1.82x AES192 1.80x 1.73x 1.81x 1.76x 1.79x 1.85x 1.77x 1.76x 1.80x 1.85x AES256 1.81x 1.77x 1.81x 1.77x 1.80x 1.79x 1.78x 1.77x 1.81x 1.85x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-29blowfish: add amd64 assembly implementationJussi Kivilinna6-1/+832
* cipher/Makefile.am: Add 'blowfish-amd64.S'. * cipher/blowfish-amd64.S: New file. * cipher/blowfish.c (USE_AMD64_ASM): New macro. [USE_AMD64_ASM] (_gcry_blowfish_amd64_do_encrypt) (_gcry_blowfish_amd64_encrypt_block) (_gcry_blowfish_amd64_decrypt_block, _gcry_blowfish_amd64_ctr_enc) (_gcry_blowfish_amd64_cbc_dec, _gcry_blowfish_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block) (encrypt_block, decrypt_block): New functions. (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec) (_gcry_blowfish_cfb_dec, selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Call new bulk selftests. * cipher/cipher.c (gcry_cipher_open) [USE_BLOWFISH]: Register Blowfish bulk functions for ctr-enc, cbc-dec and cfb-dec. * configure.ac (blowfish) [x86_64]: Add 'blowfish-amd64.lo'. * src/cipher.h (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec) (gcry_blowfish_cfb_dec): New prototypes. -- Add non-parallel functions for small speed-up and 4-way parallel functions for modes of operation that support parallel processing. Speed old vs. new on AMD Phenom II X6 1055T: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- BLOWFISH 1.21x 1.12x 1.17x 3.52x 1.18x 3.34x 1.16x 1.15x 3.38x 3.47x Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge): ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- BLOWFISH 1.16x 1.10x 1.17x 2.98x 1.18x 2.88x 1.16x 1.15x 3.00x 3.02x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-24ecc: Simplify the compliant point generation.Werner Koch1-28/+17
* cipher/ecc.c (generate_key): Use point_snatch_set, replaces unneeded variable copies, etc. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-05-24ecc: Fix a minor flaw in the generation of K.Werner Koch5-104/+117
* cipher/dsa.c (gen_k): Factor code out to .. * cipher/dsa-common.c (_gcry_dsa_gen_k): new file and function. Add arg security_level and re-indent a bit. * cipher/ecc.c (gen_k): Remove and change callers to _gcry_dsa_gen_k. * cipher/dsa.c: Include pubkey-internal. * cipher/Makefile.am (libcipher_la_SOURCES): Add dsa-common.c -- The ECDSA code used the simple $k = k \bmod p$ method which introduces a small bias. We now use the bias free method we have always used with DSA. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-05-24cast5: add amd64 assembly implementationJussi Kivilinna6-9/+885
* cipher/Makefile.am: Add 'cast5-amd64.S'. * cipher/cast5-amd64.S: New file. * cipher/cast5.c (USE_AMD64_ASM): New macro. (_gcry_cast5_s1tos4): Merge arrays s1, s2, s3, s4 to single array to simplify access from assembly implementation. (s1, s2, s3, s4): New macros pointing to subarrays in _gcry_cast5_s1tos4. [USE_AMD64_ASM] (_gcry_cast5_amd64_encrypt_block) (_gcry_cast5_amd64_decrypt_block, _gcry_cast5_amd64_ctr_enc) (_gcry_cast5_amd64_cbc_dec, _gcry_cast5_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block) (decrypt_block): New functions. (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec) (selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Call new bulk selftests. * cipher/cipher.c (gcry_cipher_open) [USE_CAST5]: Register CAST5 bulk functions for ctr-enc, cbc-dec and cfb-dec. * configure.ac (cast5) [x86_64]: Add 'cast5-amd64.lo'. * src/cipher.h (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec) (gcry_cast5_cfb_dec): New prototypes. -- Provides non-parallel implementations for small speed-up and 4-way parallel implementations that gets accelerated on `out-of-order' CPUs. Speed old vs. new on AMD Phenom II X6 1055T: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- CAST5 1.23x 1.22x 1.21x 2.86x 1.21x 2.83x 1.22x 1.17x 2.73x 2.73x Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge): ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- CAST5 1.00x 1.04x 1.06x 2.56x 1.06x 2.37x 1.03x 1.01x 2.43x 2.41x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-24cipher-selftest: make selftest work with any block-sizeJussi Kivilinna5-79/+85
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc_128) (_gcry_selftest_helper_cfb_128, _gcry_selftest_helper_ctr_128): Renamed functions from '<name>_128' to '<name>'. (_gcry_selftest_helper_cbc, _gcry_selftest_helper_cfb) (_gcry_selftest_helper_ctr): Make work with different block sizes. * cipher/cipher-selftest.h (_gcry_selftest_helper_cbc_128) (_gcry_selftest_helper_cfb_128, _gcry_selftest_helper_ctr_128): Renamed prototypes from '<name>_128' to '<name>'. * cipher/camellia-glue.c (selftest_ctr_128, selftest_cfb_128) (selftest_ctr_128): Change to use new function names. * cipher/rijndael.c (selftest_ctr_128, selftest_cfb_128) (selftest_ctr_128): Change to use new function names. * cipher/serpent.c (selftest_ctr_128, selftest_cfb_128) (selftest_ctr_128): Change to use new function names. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-23serpent: add parallel processing for CFB decryptionJussi Kivilinna4-0/+158
* cipher/cipher.c (gcry_cipher_open): Add bulf CFB decryption function for Serpent. * cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_cfb_dec): New function. * cipher/serpent.c (_gcry_serpent_sse2_cfb_dec): New prototype. (_gcry_serpent_cfb_dec) New function. (selftest_cfb_128) New function. (selftest) Call selftest_cfb_128. * src/cipher.h (_gcry_serpent_cfb_dec): New prototype. -- Patch makes Serpent-CFB decryption 4.0 times faster on Intel Sandy-Bridge and 2.7 times faster on AMD K10. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-23camellia: add parallel processing for CFB decryptionJussi Kivilinna4-0/+143
* cipher/camellia-aesni-avx-amd64.S (_gcry_camellia_aesni_avx_cfb_dec): New function. * cipher/camellia-glue.c (_gcry_camellia_aesni_avx_cfb_dec): New prototype. (_gcry_camellia_cfb_dec): New function. (selftest_cfb_128): New function. (selftest): Call selftest_cfb_128. * cipher/cipher.c (gry_cipher_open): Add bulk CFB decryption function for Camellia. * src/cipher.h (_gcry_camellia_cfb_dec): New prototype. -- Patch makes Camellia-CFB decryption 4.7 times faster on Intel Sandy-Bridge. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-23rinjdael: add parallel processing for CFB decryption with AES-NIJussi Kivilinna3-1/+299
* cipher/cipher-selftest.c (_gcry_selftest_helper_cfb_128): New function for CFB selftests. * cipher/cipher-selftest.h (_gcry_selftest_helper_cfb_128): New prototype. * cipher/rijndael.c [USE_AESNI] (do_aesni_enc_vec4): New function. (_gcry_aes_cfb_dec) [USE_AESNI]: Add parallelized CFB decryption. (selftest_cfb_128): New function. (selftest): Call selftest_cfb_128. -- CFB decryption can be parallelized for additional performance. On Intel Sandy-Bridge processor, this change makes CFB decryption 4.6 times faster. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-23Avoid compiler warning due to the global symbol setkey.Werner Koch1-4/+6
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc_128) (_gcry_selftest_helper_ctr_128): Rename setkey to setkey_func. -- setkey is a POSIX.1 function defined in stdlib.
2013-05-23serpent: add SSE2 accelerated amd64 implementationJussi Kivilinna6-3/+1066
* configure.ac (serpent): Add 'serpent-sse2-amd64.lo'. * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add 'serpent-sse2-amd64.S'. * cipher/cipher.c (gcry_cipher_open) [USE_SERPENT]: Register bulk functions for CBC-decryption and CTR-mode. * cipher/serpent.c (USE_SSE2): New macro. [USE_SSE2] (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec): New prototypes to assembler functions. (serpent_setkey): Set 'serpent_init_done' before calling serpent_test. (_gcry_serpent_ctr_enc): New function. (_gcry_serpent_cbc_dec): New function. (selftest_ctr_128): New function. (selftest_cbc_128): New function. (selftest): Call selftest_ctr_128 and selftest_cbc_128. * cipher/serpent-sse2-amd64.S: New file. * src/cipher.h (_gcry_serpent_ctr_enc): New prototype. (_gcry_serpent_cbc_dec): New prototype. -- [v2]: Converted to SSE2, to support all amd64 processors (SSE2 is required feature by AMD64 SysV ABI). Patch adds word-sliced SSE2 implementation of Serpent for amd64 for speeding up parallelizable workloads (CTR mode, CBC mode decryption). Implementation processes eight blocks in parallel, with two four-block sets interleaved for out-of-order scheduling. Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge): ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.00x 0.99x 1.00x 3.98x 1.00x 1.01x 1.00x 1.01x 4.04x 4.04x Speed old vs. new on AMD Phenom II X6 1055T: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.02x 1.01x 1.00x 2.83x 1.00x 1.00x 1.00x 1.00x 2.72x 2.72x Speed old vs. new on Intel Core2 Duo T8100: ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.00x 1.02x 0.97x 4.02x 0.98x 1.01x 0.98x 1.00x 3.82x 3.91x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-23Serpent: faster S-box implementationJussi Kivilinna1-350/+250
* cipher/serpent.c (SBOX0, SBOX1, SBOX2, SBOX3, SBOX4, SBOX5, SBOX6) (SBOX7, SBOX0_INVERSE, SBOX1_INVERSE, SBOX2_INVERSE, SBOX3_INVERSE) (SBOX4_INVERSE, SBOX5_INVERSE, SBOX6_INVERSE, SBOX7_INVERSE): Replace with new definitions. -- These new S-box definitions are from paper: D. A. Osvik, “Speeding up Serpent,” in Third AES Candidate Conference, (New York, New York, USA), p. 317–329, National Institute of Standards and Technology, 2000. Available at http://www.ii.uib.no/~osvik/pub/aes3.ps.gz Although these were optimized for two-operand instructions on i386 and for old Pentium-1 processors, they are slightly faster on current processors on i386 and x86-64. On ARM, the performance of these S-boxes is about the same as with the old S-boxes. new vs old speed ratios (AMD K10, x86-64): ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.06x 1.02x 1.06x 1.02x 1.06x 1.06x 1.06x 1.05x 1.07x 1.07x new vs old speed ratios (Intel Atom, i486): ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.12x 1.15x 1.12x 1.15x 1.13x 1.11x 1.12x 1.12x 1.12x 1.13x new vs old speed ratios (ARM Cortex A8): ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- --------------- SERPENT128 1.04x 1.02x 1.02x 0.99x 1.02x 1.02x 1.03x 1.03x 1.01x 1.01x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-22w32: Fix installing of .def file.Werner Koch1-0/+1
* src/Makefile.am (install-def-file): Create libdir first. -- Reported-by: LRN <lrn1986@gmail.com>
2013-05-22Register a DCO.Werner Koch1-0/+3
--
2013-05-22Add control commands to disable mlock and setuid dropping.Werner Koch6-9/+59
* src/gcrypt.h.in (GCRYCTL_DISABLE_LOCKED_SECMEM): New. (GCRYCTL_DISABLE_PRIV_DROP): New. * src/global.c (_gcry_vcontrol): Implement them. * src/secmem.h (GCRY_SECMEM_FLAG_NO_MLOCK): New. (GCRY_SECMEM_FLAG_NO_PRIV_DROP): New. * src/secmem.c (no_mlock, no_priv_drop): New. (_gcry_secmem_set_flags, _gcry_secmem_get_flags): Set and get them. (lock_pool): Handle no_mlock and no_priv_drop. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-05-22Fix libtool 2.4.2 to correctly detect .def files.Werner Koch2-5/+9
* ltmain.sh (sed_uncomment_deffile): New. (orig_export_symbols): Uncomment def file before testing for EXPORTS. * m4/libtool.m4: Do the same for the generated code. -- The old code was not correct in that it only looked at the first line and puts an EXPORTS keyword in front if missing. Binutils 2.22 accepted a duplicated EXPORTS keyword but at least 2.23.2 is more stringent and bails out without this fix. There is no need to send this upstream. Upstream's git master has a lot of changes including a similar fix for this problems. There are no signs that a libtool 2.4.3 will be released to fix this problem and thus we need to stick to our copy of 2.4.2 along with this patch. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-05-22Add AES bulk CBC decryption selftestJussi Kivilinna1-0/+18
* cipher/rinjdael.c (selftest_cbc_128): New. (selftest): Call selftest_cbc_128. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-22Change AES bulk CTR encryption selftest use new selftest helper functionJussi Kivilinna1-86/+7
* cipher/rinjdael.c: (selftest_ctr_128): Change to use new selftest helper function. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-22Convert bulk CTR and CBC selftest functions in Camellia to generic selftest ↵Jussi Kivilinna4-157/+357
helper functions * cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-selftest files. * cipher/camellia-glue.c (selftest_ctr_128, selftest_cbc_128): Change to use the new selftest helper functions. * cipher/cipher-selftest.c: New. * cipher/cipher-selftest.h: New. -- Convert selftest functions into generic helper functions for code sharing. [v2]: use syslog for more detailed selftest error messages Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-22camellia: add bulk CBC decryption selftestJussi Kivilinna1-0/+83
* cipher/camellia-glue.c: (selftest_cbc_128): New selftest function for bulk CBC decryption. (selftest): Add call to selftest_cbc_128. -- Add selftest for the parallel code paths in bulk CBC decryption. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-22camellia: Rename camellia_aesni_avx_x86-64.S to camellia-aesni-avx-amd64.SJussi Kivilinna3-4/+4
* cipher/camellia_aesni_avx_x86-64.S: Remove. * cipher/camellia-aesni-avx-amd64.S: New. * cipher/Makefile.am: Use the new filename. * configure.ac: Use the new filename. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-05-21Fix indentation and save on string space.Werner Koch2-33/+46
* cipher/ecc.c (generate_key): Use the same string for both fatal messages.