summaryrefslogtreecommitdiff
path: root/cipher
AgeCommit message (Collapse)AuthorFilesLines
2016-03-16cipher: Update comment.Justus Winter1-2/+2
* cipher/ecc.c (ecc_get_nbits): Update comment to reflect the fact that a curve parameter can be given. Signed-off-by: Justus Winter <justus@g10code.com>
2016-03-12Add Intel PCLMUL implementations of CRC algorithmsJussi Kivilinna3-2/+970
* cipher/Makefile.am: Add 'crc-intel-pclmul.c'. * cipher/crc-intel-pclmul.c: New. * cipher/crc.c (USE_INTEL_PCLMUL): New macro. (CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'. [USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul) (gcry_crc24rfc2440_intel_pclmul): New. (crc32_init, crc32rfc1510_init, crc24rfc2440_init) [USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL HW features detected. (crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL implementation if enabled. (crc24_init): Document storage format of 24-bit CRC. (crc24_next4): Use only 'data' for last table look-up. * configure.ac: Add 'crc-intel-pclmul.lo'. * src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include Intel SSE4.1. * src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection. * src/hwfeatures.c (hwflist): Add 'intel-sse4.1'. * tests/basic.c (fillbuf_count): New. (check_one_md): Add "?" check (million byte data-set with byte pattern 0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?" checks. (check_one_md_multi): Skip "?". (check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256, SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160, CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!" test-vectors for CRC32_RFC1510 and CRC24_RFC2440. -- Add Intel PCLMUL accelerated implmentations of CRC algorithms. CRC performance is improved ~11x on x86_64 and i386 on Intel Haswell, and ~2.7x on Intel Sandy-bridge. Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-02-19Add new private header gcrypt-testapi.h.Werner Koch1-2/+3
* src/gcrypt-testapi.h: New. * src/Makefile.am (libgcrypt_la_SOURCES): Add new file. * random/random.h: Include gcrypt-testapi.h. (struct gcry_drbg_test_vector) : Move to gcrypt-testapi.h. * src/global.c: Include gcrypt-testapi.h. (_gcry_vcontrol): Use PRIV_CTL_* constants instead of 58, 59, 60, 61. * cipher/cipher.c: Include gcrypt-testapi.h. (_gcry_cipher_ctl): Use PRIV_CIPHERCTL_ constants instead of 61, 62. * tests/fipsdrv.c: Include gcrypt-testapi.h. Remove definition of PRIV_CTL_ constants and replace their use by the new PRIV_CIPHERCTL_ constants. * tests/t-lock.c: Include gcrypt-testapi.h. Remove PRIV_CTL_EXTERNAL_LOCK_TEST and EXTERNAL_LOCK_TEST_ constants. * random/random-drbg.c (gcry_rngdrbg_cavs_test): Rename to ... (_gcry_rngdrbg_cavs_test): this. (gcry_rngdrbg_healthcheck_one): Rename to ... (_gcry_rngdrbg_healthcheck_one): this. Signed-off-by: Werner Koch <wk@gnupg.org>
2016-02-13bufhelp: disable unaligned memory accesses on powerpcJussi Kivilinna1-1/+0
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Disable for __powerpc__ and __powerpc64__. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-02-12ecc: Not validate input point for Curve25519.NIIBE Yutaka1-1/+3
* cipher/ecc.c (ecc_decrypt_raw): Curve25519 is an exception. -- Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
2016-02-10ecc: Fix memory leaks on error.NIIBE Yutaka1-2/+2
* cipher/ecc.c (ecc_decrypt_raw): Go to leave to release memory. * mpi/ec.c (_gcry_mpi_ec_curve_point): Likewise. -- Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
2016-02-09ecc: input validation on ECDH.NIIBE Yutaka1-0/+6
* cipher/ecc.c (ecc_decrypt_raw): Validate the point. -- Signed-off-by: NIIBE Yutaka <gniibe@fsij.org> (forward port from LIBGCRYPT-1-6-BRANCH commit 28eb424e4427b320ec1c9c4ce56af25d495230bd)
2016-02-08Add ARM assembly implementation of SHA-512Jussi Kivilinna3-33/+516
* cipher/Makefile.am: Add 'sha512-arm.S'. * cipher/sha512-arm.S: New. * cipher/sha512.c (USE_ARM_ASM): New. (_gcry_sha512_transform_arm): New. (transform) [USE_ARM_ASM]: Use ARM assembly implementation instead of generic. * configure.ac: Add 'sha512-arm.lo'. -- Benchmark on Cortex-A8 (armv6, 1008 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA512 | 112.0 ns/B 8.52 MiB/s 112.9 c/B After (3.3x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA512 | 34.01 ns/B 28.04 MiB/s 34.28 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-02-02ecc: Fix Curve25519 for data by older implementation.NIIBE Yutaka1-20/+18
* cipher/ecc-misc.c (gcry_ecc_mont_decodepoint): Fix code path for short length data. -- Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
2016-02-02ecc: more fix of Curve25519.NIIBE Yutaka1-4/+3
* cipher/ecc-misc.c (gcry_ecc_mont_decodepoint): Fix removing of prefix. Clear the MSB, according to RFC7748. -- This change fixes two things. * Handle the case the prefix 0x40 comes at the end when scanned as standard MPI. * Implement MSB handling. In the page 7 of RFC7748, it says about decoding u-coordinate: When receiving such an array, implementations of X25519 (but not X448) MUST mask the most significant bit in the final byte. Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
2016-02-02ecc: Fix ECDH of Curve25519.NIIBE Yutaka2-18/+27
* cipher/ecc-misc.c (_gcry_ecc_mont_decodepoint): Fix calc of NBITS and prefix detection. * cipher/ecc.c (ecc_generate): Use NBITS instead of CTX->NBITS. (ecc_encrypt_raw): Use NBITS from curve instead of from P. Fix rawmpilen calculation. (ecc_decrypt_raw): Likewise. Add debug output. -- This fixes the commit dd3d06e7. NBITS is defined 256 in ecc-curves.c, thus, ecc_get_nbits returns 256. But CTX->NBITS has 255 for Montgomery curve.
2016-01-29Improve performance of generic SHA256 implementationJussi Kivilinna1-87/+83
* cipher/sha256.c (R): Let caller do variable shuffling. (Chro, Maj, Sum0, Sum1): Convert from inline functions to macros. (W, I): New. (transform_blk): Unroll round loop; inline message expansion to rounds to make message expansion buffer smaller. -- Benchmark on Cortex-A8 (armv6, 1008 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 27.63 ns/B 34.52 MiB/s 27.85 c/B After (1.31x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 20.97 ns/B 45.48 MiB/s 21.13 c/B Benchmark on Cortex-A8 (armv7, 1008 Mhz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 24.18 ns/B 39.43 MiB/s 24.38 c/B After (1.13x faster): | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 21.28 ns/B 44.82 MiB/s 21.45 c/B Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 5.78 ns/B 164.9 MiB/s 18.51 c/B After (1.06x faster) | nanosecs/byte mebibytes/sec cycles/byte SHA256 | 5.41 ns/B 176.1 MiB/s 17.33 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2016-01-28ecc: New API function gcry_mpi_ec_decode_point.Werner Koch1-7/+2
* mpi/ec.c (_gcry_mpi_ec_decode_point): New. * cipher/ecc-common.h: Move two prototypes to ... * src/ec-context.h: here. * src/gcrypt.h.in (gcry_mpi_ec_decode_point): New. * src/libgcrypt.def (gcry_mpi_ec_decode_point): New. * src/libgcrypt.vers (gcry_mpi_ec_decode_point): New. * src/visibility.c (gcry_mpi_ec_decode_point): New. * src/visibility.h: Add new function. -- This new function make the use of the gcry_mpi_ec_curve_point function possible in many contexts. Here is a code snippet which could be used in gpg to check a point: static gpg_error_t check_point (PKT_public_key *pk, gcry_mpi_t m_point) { gpg_error_t err; char *curve; gcry_ctx_t gctx = NULL; gcry_mpi_point_t point = NULL; /* Get the curve name from the first OpenPGP key parameter. */ curve = openpgp_oid_to_str (pk->pkey[0]); if (!curve) { err = gpg_error_from_syserror (); goto leave; } point = gcry_mpi_point_new (0); if (!point) { err = gpg_error_from_syserror (); goto leave; } err = gcry_mpi_ec_new (&gctx, NULL, curve); if (err) goto leave; err = gcry_mpi_ec_decode_point (point, m_point, gctx); if (err) goto leave; if (!gcry_mpi_ec_curve_point (point, gctx)) err = gpg_error (GPG_ERR_BAD_DATA); leave: gcry_ctx_release (gctx); gcry_mpi_point_release (point); xfree (curve); return err; } Signed-off-by: Werner Koch <wk@gnupg.org>
2015-12-07cipher: Improve error handling.Justus Winter1-1/+4
* cipher/ecc.c (ecc_decrypt_raw): Improve error handling. -- Found using the Clang Static Analyzer. Signed-off-by: Justus Winter <justus@g10code.com>
2015-12-07cipher: Initialize 'flags'.Justus Winter1-1/+1
* cipher/ecc.c (ecc_encrypt_raw): Initialize 'flags' to 0. -- Found using the Clang Static Analyzer. Signed-off-by: Justus Winter <justus@g10code.com>
2015-12-05ecc: CHANGE point representation of Curve25519.NIIBE Yutaka2-17/+52
* cipher/ecc-misc.c (_gcry_ecc_mont_decodepoint): Decode point with the prefix 0x40, additional 0x00 by MPI handling, and shorter octets by MPI normalization. * cipher/ecc.c (ecc_generate, ecc_encrypt_raw, ecc_decrypt_raw): Always add the prefix 0x40. -- Curve25519 native little-endian point representation is not friendly to existing practice of OpenPGP code, where MPI is assumed. MPI handling might insert 0x00 in the beginning to avoid sign confusion. MPI handling also might remove 0x00s in the front. So, it is safe to put the prefix 0x40. While we support old point representation of no prefix in ecc_mont_decodepoint, new libgcrypt always put the prefix.
2015-12-03chacha20: fix alignment of self-test contextJussi Kivilinna1-21/+25
* cipher/chacha20.c (selftest): Ensure 16-byte alignment for chacha20 context structure. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-12-03salsa20: fix alignment of self-test contextJussi Kivilinna1-15/+19
* cipher/salsa20.c (selftest): Ensure 16-byte alignment for salsa20 context structure. -- Reported-by: Carlos J Puga Medina <cpm@fbsd.es> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-18cipher: Fix error handling.Justus Winter1-0/+1
* cipher/cipher.c (_gcry_cipher_ctl): Fix error handling. -- Found using the Clang Static Analyzer. Signed-off-by: Justus Winter <justus@g10code.com>
2015-11-18Tweak Keccak for small speed-upJussi Kivilinna2-30/+27
* cipher/keccak_permute_32.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Track rounds with round constant pointer instead of separate round counter. * cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Ditto. (KECCAK_F1600_ABSORB_FUNC_NAME): Tweak lanes pointer increment for bulk absorb loops. -- Patch makes small tweaks to improve performance. Benchmark on Intel Haswell @ 3.2 Ghz: Before: | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 2.27 ns/B 420.5 MiB/s 7.26 c/B SHAKE256 | 2.79 ns/B 341.4 MiB/s 8.94 c/B SHA3-224 | 2.64 ns/B 361.7 MiB/s 8.44 c/B SHA3-256 | 2.79 ns/B 341.4 MiB/s 8.94 c/B SHA3-384 | 3.65 ns/B 261.3 MiB/s 11.68 c/B SHA3-512 | 5.27 ns/B 181.0 MiB/s 16.86 c/B After: | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 2.25 ns/B 423.5 MiB/s 7.21 c/B SHAKE256 | 2.77 ns/B 343.9 MiB/s 8.88 c/B SHA3-224 | 2.62 ns/B 364.1 MiB/s 8.38 c/B SHA3-256 | 2.77 ns/B 343.8 MiB/s 8.88 c/B SHA3-384 | 3.63 ns/B 262.6 MiB/s 11.63 c/B SHA3-512 | 5.23 ns/B 182.3 MiB/s 16.75 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-17Fix typos found using codespellJustus Winter7-10/+10
* cipher/cipher-ocb.c: Fix typos. * cipher/des.c: Likewise. * cipher/dsa-common.c: Likewise. * cipher/ecc.c: Likewise. * cipher/pubkey.c: Likewise. * cipher/rsa-common.c: Likewise. * cipher/scrypt.c: Likewise. * random/random-csprng.c: Likewise. * random/random-fips.c: Likewise. * random/rndw32.c: Likewise. * src/cipher-proto.h: Likewise. * src/context.c: Likewise. * src/fips.c: Likewise. * src/gcrypt.h.in: Likewise. * src/global.c: Likewise. * src/sexp.c: Likewise. * tests/mpitests.c: Likewise. * tests/t-lock.c: Likewise. Signed-off-by: Justus Winter <justus@g10code.com>
2015-11-01Improve performance of Tiger hash algorithmsJussi Kivilinna1-64/+40
* cipher/tiger.c (tiger_round, pass, key_schedule): Convert functions to macros. (transform_blk): Pass variable names instead of pointers to 'pass'. -- Benchmark results on Intel Haswell @ 3.2 Ghz: Before: | nanosecs/byte mebibytes/sec cycles/byte TIGER | 3.25 ns/B 293.5 MiB/s 10.40 c/B After (1.75x faster): | nanosecs/byte mebibytes/sec cycles/byte TIGER | 1.85 ns/B 515.3 MiB/s 5.92 c/B Benchmark results on Cortex-A8 @ 1008 Mhz: Before: | nanosecs/byte mebibytes/sec cycles/byte TIGER | 63.42 ns/B 15.04 MiB/s 63.93 c/B After (1.26x faster): | nanosecs/byte mebibytes/sec cycles/byte TIGER | 49.99 ns/B 19.08 MiB/s 50.39 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-01Add ARMv7/NEON implementation of KeccakJussi Kivilinna4-5/+1015
* cipher/Makefile.am: Add 'keccak-armv7-neon.S'. * cipher/keccak-armv7-neon.S: New. * cipher/keccak.c (USE_64BIT_ARM_NEON): New. (NEED_COMMON64): Select if USE_64BIT_ARM_NEON. [NEED_COMMON64] (round_consts_64bit): Rename to... [NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add terminator at end. [USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon) (_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon) (keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New. (keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation if supported by HW. * cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update to use new round constant table. * configure.ac: Add 'keccak-armv7-neon.lo'. -- Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch is based on public-domain implementation by Ronny Van Keer from SUPERCOP package: https://github.com/floodyberry/supercop/blob/master/crypto_hash/\ keccakc1024/inplace-armv7a-neon/keccak2.s Benchmark results on Cortex-A8 @ 1008 Mhz: Before (generic 32-bit bit-interleaved impl.): | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 83.00 ns/B 11.49 MiB/s 83.67 c/B SHAKE256 | 101.7 ns/B 9.38 MiB/s 102.5 c/B SHA3-224 | 96.13 ns/B 9.92 MiB/s 96.90 c/B SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 | 189.1 ns/B 5.04 MiB/s 190.6 c/B After (ARM/NEON, ~3.2x faster): | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 25.09 ns/B 38.01 MiB/s 25.29 c/B SHAKE256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-224 | 29.24 ns/B 32.61 MiB/s 29.48 c/B SHA3-256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B SHA3-384 | 40.42 ns/B 23.59 MiB/s 40.74 c/B SHA3-512 | 58.37 ns/B 16.34 MiB/s 58.84 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-11-01Optimize Keccak 64-bit absorb functionsJussi Kivilinna2-66/+192
* cipher/keccak.c [USE_64BIT] [__x86_64__] (absorb_lanes64_8) (absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New. * cipher/keccak.c [USE_64BIT] [!__x86_64__] (absorb_lanes64_8) (absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New. [USE_64BIT] (KECCAK_F1600_ABSORB_FUNC_NAME): New. [USE_64BIT] (keccak_absorb_lanes64): Remove. [USE_64BIT_SHLD] (KECCAK_F1600_ABSORB_FUNC_NAME): New. [USE_64BIT_SHLD] (keccak_absorb_lanes64_shld): Remove. [USE_64BIT_BMI2] (KECCAK_F1600_ABSORB_FUNC_NAME): New. [USE_64BIT_BMI2] (keccak_absorb_lanes64_bmi2): Remove. * cipher/keccak_permute_64.h (KECCAK_F1600_ABSORB_FUNC_NAME): New. -- Optimize 64-bit absorb functions for small speed-up. After this change, 64-bit BMI2 implementation matches speed of fastest results from SUPERCOP for Intel Haswell CPUs (long messages). Benchmark on Intel Haswell @ 3.2 Ghz: Before: | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 2.32 ns/B 411.7 MiB/s 7.41 c/B SHAKE256 | 2.84 ns/B 336.2 MiB/s 9.08 c/B SHA3-224 | 2.69 ns/B 354.9 MiB/s 8.60 c/B SHA3-256 | 2.84 ns/B 336.0 MiB/s 9.08 c/B SHA3-384 | 3.69 ns/B 258.4 MiB/s 11.81 c/B SHA3-512 | 5.30 ns/B 179.9 MiB/s 16.97 c/B After: | nanosecs/byte mebibytes/sec cycles/byte SHAKE128 | 2.27 ns/B 420.6 MiB/s 7.26 c/B SHAKE256 | 2.79 ns/B 341.4 MiB/s 8.94 c/B SHA3-224 | 2.64 ns/B 361.7 MiB/s 8.44 c/B SHA3-256 | 2.79 ns/B 341.5 MiB/s 8.94 c/B SHA3-384 | 3.65 ns/B 261.4 MiB/s 11.68 c/B SHA3-512 | 5.27 ns/B 181.0 MiB/s 16.87 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-31Keccak: Add SHAKE Extendable-Output FunctionsJussi Kivilinna3-35/+270
* src/hash-common.c (_gcry_hash_selftest_check_one): Add handling for XOFs. * src/keccak.c (keccak_ops_t): Rename 'extract_inplace' to 'extract' and add 'pos' argument. (KECCAK_CONTEXT): Add 'suffix'. (keccak_extract_inplace64): Rename to... (keccak_extract64): ...this; Add handling for 'pos' argument. (keccak_extract_inplace32bi): Rename to... (keccak_extract32bi): ...this; Add handling for 'pos' argument. (keccak_extract_inplace64): Rename to... (keccak_extract64): ...this; Add handling for 'pos' argument. (keccak_extract_inplace32bi_bmi2): Rename to... (keccak_extract32bi_bmi2): ...this; Add handling for 'pos' argument. (keccak_init): Setup 'suffix'; add SHAKE128 & SHAKE256. (shake128_init, shake256_init): New. (keccak_final): Do not initial permute for SHAKE output; use correct suffix for SHAKE. (keccak_extract): New. (keccak_selftests_keccak): Add SHAKE128 & SHAKE256 test-vectors. (run_selftests): Add SHAKE128 & SHAKE256. (shake128_asn, oid_spec_shake128, shake256_asn, oid_spec_shake256) (_gcry_digest_spec_shake128, _gcry_digest_spec_shake256): New. * cipher/md.c (digest_list): Add SHAKE128 & SHAKE256. * doc/gcrypt.texi: Ditto. * src/cipher.h (_gcry_digest_spec_shake128) (_gcry_digest_spec_shake256): New. * src/gcrypt.h.in (GCRY_MD_SHAKE128, GCRY_MD_SHAKE256): New. * tests/basic.c (check_one_md): Add XOF check; Add 'elen' argument. (check_one_md_multi): Skip if algo is XOF. (check_digests): Add SHAKE128 & SHAKE256 test vectors. * tests/bench-slope.c (kdf_bench_one): Skip XOFs. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28md: add variable length output interfaceJussi Kivilinna14-26/+89
* cipher/crc.c (_gcry_digest_spec_crc32) (_gcry_digest_spec_crc32_rfc1510, _gcry_digest_spec_crc24_rfc2440): Set 'extract' NULL. * cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_94) (_gcry_digest_spec_gost3411_cp): Ditto. * cipher/keccak.c (_gcry_digest_spec_sha3_224) (_gcry_digest_spec_sha3_256, _gcry_digest_spec_sha3_384) (_gcry_digest_spec_sha3_512): Ditto. * cipher/md2.c (_gcry_digest_spec_md2): Ditto. * cipher/md4.c (_gcry_digest_spec_md4): Ditto. * cipher/md5.c (_gcry_digest_spec_md5): Ditto. * cipher/rmd160.c (_gcry_digest_spec_rmd160): Ditto. * cipher/sha1.c (_gcry_digest_spec_sha1): Ditto. * cipher/sha256.c (_gcry_digest_spec_sha224) (_gcry_digest_spec_sha256): Ditto. * cipher/sha512.c (_gcry_digest_spec_sha384) (_gcry_digest_spec_sha512): Ditto. * cipher/stribog.c (_gcry_digest_spec_stribog_256) (_gcry_digest_spec_stribog_512): Ditto. * cipher/tiger.c (_gcry_digest_spec_tiger) (_gcry_digest_spec_tiger1, _gcry_digest_spec_tiger2): Ditto. * cipher/whirlpool.c (_gcry_digest_spec_whirlpool): Ditto. * cipher/md.c (md_enable): Do not allow combination of HMAC and 'expandable-output function'. (md_final): Check if spec->read is NULL before calling. (md_read): Ditto. (md_extract, _gcry_md_extract): New. * doc/gcrypt.texi: Add SHA3 algorithms and gcry_md_extract. * src/cipher-proto.h (gcry_md_extract_t): New. (gcry_md_spec_t): Add 'extract'. * src/gcrypt-int.g (_gcry_md_extract): New. * src/gcrypt.h.in (gcry_md_extract): New. * src/libgcrypt.def: Add gcry_md_extract. * src/libgcrypt.vers: Add gcry_md_extract. * src/visibility.c (gcry_md_extract): New. * src/visibility.h (gcry_md_extract): New. -- Patch adds new interface for reading output from 'expandable-output function' MD algorithms that can give variable length output (ie. SHAKE algorithms from FIPS-202). New function to read output is gpg_error_t gcry_md_extract(gcry_md_hd_t md, int algo, void *buffer, size_t length); Function implicitly finalizes algorithm so that no new input can be given. Subsequents calls of the function return more output bytes from the algorithm. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28md: check hmac flag in prepare_macpadsJussi Kivilinna1-0/+3
* cipher/md.c (prepare_macpads): Check hmac flag. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28keccak: rewrite for improved performanceJussi Kivilinna5-243/+1404
* cipher/Makefile.am: Add 'keccak_permute_32.h' and 'keccak_permute_64.h'. * cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove. * cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2) (USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI) (keccak_ops_t): New. (KECCAK_STATE): Add 'state64' and 'state32bi' members. (KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'. (rol64, keccak_f1600_state_permute): Remove. [NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New. [NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi) (keccak_absorb_lane32bi): New. [USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64) (keccak_absorb_lanes64, keccak_generic64_ops): New. [USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld) (keccak_absorb_lanes64_shld, keccak_shld_64_ops): New. [USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2) (keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New. [USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi) (keccak_absorb_lanes32bi, keccak_generic32bi_ops): New. [USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2) (pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2) (keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New. (keccak_write): New. (keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation selection based on HWF features. (keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops' for state manipulation. (keccak_read): Adjust to KECCAK_CONTEXT changes. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use 'keccak_write' instead of '_gcry_md_block_write'. * cipher/keccak_permute_32.h: New. * cipher/keccak_permute_64.h: New. -- Patch adds new generic 64-bit and 32-bit implementations and optimized implementations for SHA3: - Generic 64-bit implementation based on 'simple' implementation from SUPERCOP package. - Generic 32-bit bit-inteleaved implementataion based on 'simple32bi' implementation from SUPERCOP package. - Intel BMI2 optimized variants of 64-bit and 32-bit BI implementations. - Intel SHLD optimized variant of 64-bit implementation. Patch also makes proper use of sponge construction to avoid use of addition input buffer. Below are bench-slope benchmarks for new 64-bit implementations made on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2). Before (amd64): SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B After (generic 64-bit, amd64), 1.10x faster): SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B After (Intel SHLD 64-bit, amd64, 1.13x faster): SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B After (Intel BMI2 64-bit, amd64, 1.45x faster): SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B Benchmarks of new 32-bit implementations on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2): Before (win32): SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B After (generic 32-bit BI, win32, 1.23x to 1.29x faster): SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster): SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B Benchmarks of new 32-bit implementation on ARM Cortex-A8 (1008 Mhz, gcc-4.9.1): Before: SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B After (1.56x faster): SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28hwf-x86: add detection for Intel CPUs with fast SHLD instructionJussi Kivilinna3-5/+5
* cipher/sha1.c (sha1_init): Use HWF_INTEL_FAST_SHLD instead of HWF_INTEL_CPU. * cipher/sha256.c (sha256_init, sha224_init): Ditto. * cipher/sha512.c (sha512_init, sha384_init): Ditto. * src/g10lib.h (HWF_INTEL_FAST_SHLD): New. (HWF_INTEL_BMI2, HWF_INTEL_SSSE3, HWF_INTEL_PCLMUL, HWF_INTEL_AESNI) (HWF_INTEL_RDRAND, HWF_INTEL_AVX, HWF_INTEL_AVX2) (HWF_ARM_NEON): Update. * src/hwf-x86.c (detect_x86_gnuc): Add detection of Intel Core CPUs with fast SHLD/SHRD instruction. * src/hwfeatures.c (hwflist): Add "intel-fast-shld". -- Intel Core CPUs since codename sandy-bridge have been able to execute SHLD/SHRD instructions faster than rotate instructions ROL/ROR. Since SHLD/SHRD can be used to do rotation, some optimized implementations (SHA1/SHA256/SHA512) use SHLD/SHRD instructions in-place of ROL/ROR. This patch provides more accurate detection of CPUs with fast SHLD implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28Fix OCB amd64 assembly implementations for x32Jussi Kivilinna3-116/+136
* cipher/camellia-glue.c (_gcry_camellia_aesni_avx_ocb_enc) (_gcry_camellia_aesni_avx_ocb_dec, _gcry_camellia_aesni_avx_ocb_auth) (_gcry_camellia_aesni_avx2_ocb_enc, _gcry_camellia_aesni_avx2_ocb_dec) (_gcry_camellia_aesni_avx2_ocb_auth, _gcry_camellia_ocb_crypt) (_gcry_camellia_ocb_auth): Change 'Ls' from pointer array to u64 array. * cipher/serpent.c (_gcry_serpent_sse2_ocb_enc) (_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth) (_gcry_serpent_avx2_ocb_enc, _gcry_serpent_avx2_ocb_dec) (_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Ditto. * cipher/twofish.c (_gcry_twofish_amd64_ocb_enc) (_gcry_twofish_amd64_ocb_dec, _gcry_twofish_amd64_ocb_auth) (twofish_amd64_ocb_enc, twofish_amd64_ocb_dec, twofish_amd64_ocb_auth) (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Ditto. -- Pointers on x32 are 32-bit, but amd64 assembly implementations expect 64-bit pointers. Pass 'Ls' array to 64-bit integers so that input arrays has correct format for assembly functions. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-22md: keep contexts for HMAC in GcryDigestEntry.NIIBE Yutaka1-124/+120
* cipher/md.c (struct gcry_md_context): Add flags.hmac. Remove macpads and mcpads_Bsize. (md_open): Initialize flags.hmac. Remove macpads initialization. (md_enable): Allocate contexts when flags.hmac is enabled. (md_copy): Remove macpads copying. Add copying contexts. (_gcry_md_reset): When flags.hmac is enabled, restore precomputed context with input pad (md_close): Remove macpads wiping. (md_final): When flags.hmac is enabled, compute hmac by precomputed context with output pad. (prepare_macpads): Prepare precomputed contexts with input pad and output pad for each registered digest entry. (_gcry_md_setkey): Just call prepare_macpads. -- This change is making things straight in HMAC computation. This makes HMAC computation allow multple algorithms in future. Libgcrypt's code has a potential to compute digests for multiple algorithms at once (currently, it's not enabled). HMAC code didn't work well with multple algorithms, because the macpads were only allocated for an algorithm. Now, it's allocated for each algorithm. We now precompute hash contexts, instead of keeping input pad and output pad. This can be performance improvement, which is described in RFC 2104. Thanks to: Andrea Visconti, Simone Bossi, Hany Ragab and Alexandro Calò For the discussion and their paper of CANS2015, which titled: On the weaknesses of PBKDF2
2015-10-14Fix gpg_error_t and gpg_err_code_t confusion.NIIBE Yutaka5-15/+13
* src/gcrypt-int.h (_gcry_sexp_extract_param): Revert the change. * cipher/dsa.c (dsa_check_secret_key): Ditto. * src/sexp.c (_gcry_sexp_extract_param): Return gpg_err_code_t. * src/gcrypt-int.h (_gcry_err_make_from_errno) (_gcry_error_from_errno): Return gpg_error_t. * cipher/cipher.c (_gcry_cipher_open_internal) (_gcry_cipher_ctl, _gcry_cipher_ctl): Don't use gcry_error. * src/global.c (_gcry_vcontrol): Likewise. * cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Use gpg_err_code_from_syserror. * cipher/mac.c (mac_reset, mac_setkey, mac_setiv, mac_write) (mac_read, mac_verify): Return gcry_err_code_t. * cipher/rsa-common.c (mgf1): Use gcry_err_code_t for ERR. * src/visibility.c (gcry_error_from_errno): Return gpg_error_t. -- Reverting a part of 73374fdd and fix _gcry_sexp_extract_param return type, instead. Fix similar coding mistakes, throughout.
2015-10-13Fix compiling AES/AES-NI implementation on linux-i386Jussi Kivilinna1-12/+13
* cipher/rijndael-aesni.c (do_aesni_ctr_4): Split assembly block in two parts to reduce number of register constraints needed. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-13Fix declaration of return type.NIIBE Yutaka1-3/+5
* src/gcrypt-int.h (_gcry_sexp_extract_param): Return gpg_error_t. * cipher/dsa.c (dsa_generate): Fix call to _gcry_sexp_extract_param. * src/g10lib.h (_gcry_vcontrol): Return gcry_err_code_t. * src/visibility.c (gcry_mpi_snatch): Fix call to _gcry_mpi_snatch. -- GnuPG-bug-id: 2074
2015-09-04w32: Avoid a few compiler warnings.Werner Koch1-0/+6
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc) (_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Mark variable as unused. * random/rndw32.c (slow_gatherer): Avoid signed pointer mismatch warning. * src/secmem.c (init_pool): Avoid unused variable warning. * tests/random.c (writen, readn): Include on if needed. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-09-04w32: Fix alignment problem with AESNI on Windows >= 8Werner Koch3-15/+70
* cipher/cipher-selftest.c (_gcry_cipher_selftest_alloc_ctx): New. * cipher/rijndael.c (selftest_basic_128, selftest_basic_192) (selftest_basic_256): Allocate context on the heap. -- The stack alignment on Windows changed and because ld seems to limit stack variables to a 8 byte alignment (we request 16), we get bus errors from the selftests if AESNI is in use. GnuPG-bug-id: 2085 Signed-off-by: Werner Koch <wk@gnupg.org>
2015-08-31rsa: Add verify after sign to avoid Lenstra's CRT attack.Werner Koch1-1/+18
* cipher/rsa.c (rsa_sign): Check the CRT. -- Failures in the computation of the CRT (e.g. due faulty hardware) can lead to a leak of the private key. The standard precaution against this is to verify the signature after signing. GnuPG does this itself and even has an option to disable this. However, the low performance impact of this extra precaution suggest that it should always be done and Libgcrypt is the right place here. For decryption is not done because the application will detect the failure due to garbled plaintext and in any case no key derived material will be send to the user. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-08-12Keccak: Fix array indexes in θ stepJussi Kivilinna1-12/+12
* cipher/keccak.c (keccak_f1600_state_permute): Fix indexes for D[5]. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-12Simplify OCB offset calculation for parallel implementationsJussi Kivilinna3-642/+544
* cipher/camellia-glue.c (_gcry_camellia_ocb_crypt) (_gcry_camellia_ocb_auth): Precalculate Ls array always, instead of just if 'blkn % <parallel blocks> == 0'. * cipher/serpent.c (_gcry_serpent_ocb_crypt) (_gcry_serpent_ocb_auth): Ditto. * cipher/rijndael-aesni.c (get_l): Remove low-bit checks. (aes_ocb_enc, aes_ocb_dec, _gcry_aes_aesni_ocb_auth): Handle leading blocks until block counter is multiple of 4, so that parallel block processing loop can use 'c->u_mode.ocb.L' array directly. * tests/basic.c (check_ocb_cipher_largebuf): Rename to... (check_ocb_cipher_largebuf_split): ...this and add option to process large buffer as two split buffers. (check_ocb_cipher_largebuf): New. -- Patch simplifies source and reduce object size. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-12Add carryless 8-bit addition fast-path for AES-NI CTR modeJussi Kivilinna1-2/+33
* cipher/rijndael-aesni.c (do_aesni_ctr_4): Do addition using CTR in big-endian form, if least-significant byte does not overflow. -- Patch improves AES-NI CTR speed by 20%. Benchmark on Intel Haswell (3.2 Ghz): Before: AES | nanosecs/byte mebibytes/sec cycles/byte CTR enc | 0.273 ns/B 3489.8 MiB/s 0.875 c/B CTR dec | 0.273 ns/B 3491.0 MiB/s 0.874 c/B After: CTR enc | 0.228 ns/B 4190.0 MiB/s 0.729 c/B CTR dec | 0.228 ns/B 4190.2 MiB/s 0.729 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-10Add generic SHA3 implementationJussi Kivilinna6-45/+453
* cipher/hash-common.h (MD_BLOCK_MAX_BLOCKSIZE): Increase blocksize USE_SHA3 enabled. * cipher/keccak.c (SHA3_DELIMITED_SUFFIX, SHAKE_DELIMITED_SUFFIX): New. (KECCAK_STATE): Add proper state. (KECCAK_CONTEXT): Add 'outlen'. (rol64, keccak_f1600_state_permute, transform_blk, transform): New. (keccak_init): Add proper initialization. (keccak_final): Add proper finalization. (selftests_keccak): Add selftests. (oid_spec_sha3_224, oid_spec_sha3_256, oid_spec_sha3_384) (oid_spec_sha3_512): Add OID. (_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256) (_gcry_digest_spec_sha3_384, _gcry_digest_spec_sha3_512): Fix output length. * cipher/mac-hmac.c (map_mac_algo_to_md): Fix mapping for SHA3-512. (hmac_get_keylen): Return proper blocksizes for SHA3 algorithms. [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224) (_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384) (_gcry_mac_type_spec_hmac_sha3_512): New. * cipher/mac-internal [USE_SHA3] (_gcry_mac_type_spec_hmac_sha3_224) (_gcry_mac_type_spec_hmac_sha3_256, _gcry_mac_type_spec_hmac_sha3_384) (_gcry_mac_type_spec_hmac_sha3_512): New. * cipher/mac.c (mac_list) [USE_SHA3]: Add SHA3 algorithms. * cipher/md.c (md_open): Use proper SHA-3 blocksizes for HMAC macpads. * tests/basic.c (check_digests): Add SHA3 test vectors. -- Patch adds generic implementation for SHA3. Currently missing with this patch: - HMAC SHA3 test vectors, not available from NIST (yet?) - ASNs Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-10Optimize OCB offset calculationJussi Kivilinna8-351/+597
* cipher/cipher-internal.h (ocb_get_l): New. * cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate) (ocb_crypt): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'. * cipher/camellia-glue.c (get_l): Remove. (_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Precalculate offset array when block count matches parallel operation size; Use 'ocb_get_l' instead of 'get_l'. * cipher/rijndael-aesni.c (get_l): Add fast path for 75% most common offsets. (aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Precalculate offset array when block count matches parallel operation size. * cipher/rijndael-ssse3-amd64.c (get_l): Add fast path for 75% most common offsets. * cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'. * cipher/serpent.c (get_l): Remove. (_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Precalculate offset array when block count matches parallel operation size; Use 'ocb_get_l' instead of 'get_l'. * cipher/twofish.c (get_l): Remove. (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Use 'ocb_get_l' instead of 'get_l'. -- Patch optimizes OCB offset calculation for generic code and assembly implementations with parallel block processing. Benchmark of OCB AES-NI on Intel Haswell: $ tests/bench-slope --cpu-mhz 3201 cipher aes Before: AES | nanosecs/byte mebibytes/sec cycles/byte CTR enc | 0.274 ns/B 3483.9 MiB/s 0.876 c/B CTR dec | 0.273 ns/B 3490.0 MiB/s 0.875 c/B OCB enc | 0.289 ns/B 3296.1 MiB/s 0.926 c/B OCB dec | 0.299 ns/B 3189.9 MiB/s 0.957 c/B OCB auth | 0.260 ns/B 3670.0 MiB/s 0.832 c/B After: AES | nanosecs/byte mebibytes/sec cycles/byte CTR enc | 0.273 ns/B 3489.4 MiB/s 0.875 c/B CTR dec | 0.273 ns/B 3487.5 MiB/s 0.875 c/B OCB enc | 0.248 ns/B 3852.8 MiB/s 0.792 c/B OCB dec | 0.261 ns/B 3659.5 MiB/s 0.834 c/B OCB auth | 0.227 ns/B 4205.5 MiB/s 0.726 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-08-10ecc: fix Montgomery curve bugs.NIIBE Yutaka2-2/+4
* cipher/ecc.c (check_secret_key): Y1 should not be NULL when check. (ecc_check_secret_key): Support Montgomery curve. * mpi/ec.c (_gcry_mpi_ec_curve_point): Fix condition.
2015-08-08Add framework to eventually support SHA3.Werner Koch6-0/+296
* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256) (GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New. (GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256) (GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New. * cipher/keccak.c: New with stub functions. * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c. * configure.ac (available_digests): Add sha3. (USE_SHA3): New. * src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests. * cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos. (md_open): Ditto for hmac processing. * cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping. * cipher/hmac-tests.c (run_selftests): Prepare for tests. * cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx". -- Note that the algo GCRY_MD_SHA3_xxx are prelimanry. We should try to sync them with OpenPGP. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-08-06ecc: Free memory also when in error branch.Ismo Puustinen1-3/+5
* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_sign): Init DISGEST and goto leave on error. -- Fixing an issue found by static analysis. Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com> Added DIGEST init and wrote Changelog. Signed-off-by: Werner Koch <wk@gnupg.org>
2015-08-06Add Curve25519 support.NIIBE Yutaka5-50/+228
* cipher/ecc-curves.c (curve_aliases, domain_parms): Add Curve25519. * tests/curves.c (N_CURVES): It's 22 now. * src/cipher.h (PUBKEY_FLAG_DJB_TWEAK): New. * cipher/ecc-common.h (_gcry_ecc_mont_decodepoint): New. * cipher/ecc-misc.c (_gcry_ecc_mont_decodepoint): New. * cipher/ecc.c (nist_generate_key): Handle the case of PUBKEY_FLAG_DJB_TWEAK and Montgomery curve. (test_ecdh_only_keys, check_secret_key): Likewise. (ecc_generate): Support Curve25519 which is Montgomery curve with flag PUBKEY_FLAG_DJB_TWEAK and PUBKEY_FLAG_COMP. (ecc_encrypt_raw): Get flags from KEYPARMS and handle PUBKEY_FLAG_DJB_TWEAK and Montgomery curve. (ecc_decrypt_raw): Likewise. (compute_keygrip): Handle the case of PUBKEY_FLAG_DJB_TWEAK. * cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): PUBKEY_FLAG_EDDSA implies PUBKEY_FLAG_DJB_TWEAK. Parse "djb-tweak" for PUBKEY_FLAG_DJB_TWEAK. -- With PUBKEY_FLAG_DJB_TWEAK, secret key has msb set and it should be always multiple by cofactor.
2015-07-27Reduce code size for Twofish key-setup and remove key dependend branchJussi Kivilinna1-50/+26
* cipher/twofish.c (poly_to_exp): Increase size by one, change type from byte to u16 and insert '492' to index 0. (exp_to_poly): Increase size by 256, let new cells have zero value. (CALC_S): Execute unconditionally with help of modified tables. (do_twofish_setkey): Change type for 'tmp' to 'unsigned int'; Un-unroll CALC_K256 and CALC_K phases to reduce generated object size. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-07-27Reduce amount of duplicated code in OCB bulk implementationsJussi Kivilinna6-209/+101
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate) (ocb_crypt): Change bulk function to return number of unprocessed blocks. * src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) (_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth) (_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth) (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type to 'size_t'. * cipher/camellia-glue.c (get_l): Only if USE_AESNI_AVX or USE_AESNI_AVX2 defined. (_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Change return type to 'size_t' and return remaining blocks; Remove unaccelerated common code path. Enable remaining common code only if USE_AESNI_AVX or USE_AESNI_AVX2 defined; Remove unaccelerated common code. * cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Change return type to 'size_t' and return zero. * cipher/serpent.c (get_l): Only if USE_SSE2, USE_AVX2 or USE_NEON defined. (_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Change return type to 'size_t' and return remaining blocks; Remove unaccelerated common code path. Enable remaining common code only if USE_SSE2, USE_AVX2 or USE_NEON defined; Remove unaccelerated common code. * cipher/twofish.c (get_l): Only if USE_AMD64_ASM defined. (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type to 'size_t' and return remaining blocks; Remove unaccelerated common code path. Enable remaining common code only if USE_AMD64_ASM defined; Remove unaccelerated common code. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-07-27Add bulk OCB for Serpent SSE2, AVX2 and NEON implementationsJussi Kivilinna5-3/+1287
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk functions for Serpent. * cipher/serpent-armv7-neon.S: Add OCB assembly functions. * cipher/serpent-avx2-amd64.S: Add OCB assembly functions. * cipher/serpent-sse2-amd64.S: Add OCB assembly functions. * cipher/serpent.c (_gcry_serpent_sse2_ocb_enc) (_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth) (_gcry_serpent_neon_ocb_enc, _gcry_serpent_neon_ocb_dec) (_gcry_serpent_neon_ocb_auth, _gcry_serpent_avx2_ocb_enc) (_gcry_serpent_avx2_ocb_dec, _gcry_serpent_avx2_ocb_auth): New prototypes. (get_l, _gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): New. * src/cipher.h (_gcry_serpent_ocb_crypt) (_gcry_serpent_ocb_auth): New. * tests/basic.c (check_ocb_cipher): Add test-vector for serpent. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-07-27Add bulk OCB for Twofish AMD64 implementationJussi Kivilinna3-1/+570
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk functions for Twofish. * cipher/twofish-amd64.S: Add OCB assembly functions. * cipher/twofish.c (_gcry_twofish_amd64_ocb_enc) (_gcry_twofish_amd64_ocb_dec, _gcry_twofish_amd64_ocb_auth): New prototypes. (call_sysv_fn5, call_sysv_fn6, twofish_amd64_ocb_enc) (twofish_amd64_ocb_dec, twofish_amd64_ocb_auth, get_l) (_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): New. * src/cipher.h (_gcry_twofish_ocb_crypt) (_gcry_twofish_ocb_auth): New. * tests/basic.c (check_ocb_cipher): Add test-vector for Twofish. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>