Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/Makefile.am: Add 'crc-intel-pclmul.c'.
* cipher/crc-intel-pclmul.c: New.
* cipher/crc.c (USE_INTEL_PCLMUL): New macro.
(CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'.
[USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul)
(gcry_crc24rfc2440_intel_pclmul): New.
(crc32_init, crc32rfc1510_init, crc24rfc2440_init)
[USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL
HW features detected.
(crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL
implementation if enabled.
(crc24_init): Document storage format of 24-bit CRC.
(crc24_next4): Use only 'data' for last table look-up.
* configure.ac: Add 'crc-intel-pclmul.lo'.
* src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include
Intel SSE4.1.
* src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection.
* src/hwfeatures.c (hwflist): Add 'intel-sse4.1'.
* tests/basic.c (fillbuf_count): New.
(check_one_md): Add "?" check (million byte data-set with byte pattern
0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?"
checks.
(check_one_md_multi): Skip "?".
(check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256,
SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160,
CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!"
test-vectors for CRC32_RFC1510 and CRC24_RFC2440.
--
Add Intel PCLMUL accelerated implmentations of CRC algorithms.
CRC performance is improved ~11x on x86_64 and i386 on Intel
Haswell, and ~2.7x on Intel Sandy-bridge.
Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B
CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B
CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B
CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B
CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B
Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B
CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B
CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B
CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B
CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B
Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B
CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B
CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B
CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B
CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-arm.S'.
* cipher/sha512-arm.S: New.
* cipher/sha512.c (USE_ARM_ASM): New.
(_gcry_sha512_transform_arm): New.
(transform) [USE_ARM_ASM]: Use ARM assembly implementation instead of
generic.
* configure.ac: Add 'sha512-arm.lo'.
--
Benchmark on Cortex-A8 (armv6, 1008 Mhz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
SHA512 | 112.0 ns/B 8.52 MiB/s 112.9 c/B
After (3.3x faster):
| nanosecs/byte mebibytes/sec cycles/byte
SHA512 | 34.01 ns/B 28.04 MiB/s 34.28 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'keccak-armv7-neon.S'.
* cipher/keccak-armv7-neon.S: New.
* cipher/keccak.c (USE_64BIT_ARM_NEON): New.
(NEED_COMMON64): Select if USE_64BIT_ARM_NEON.
[NEED_COMMON64] (round_consts_64bit): Rename to...
[NEED_COMMON64] (_gcry_keccak_round_consts_64bit): ...this; Add
terminator at end.
[USE_64BIT_ARM_NEON] (_gcry_keccak_permute_armv7_neon)
(_gcry_keccak_absorb_lanes64_armv7_neon, keccak_permute64_armv7_neon)
(keccak_absorb_lanes64_armv7_neon, keccak_armv7_neon_64_ops): New.
(keccak_init) [USE_64BIT_ARM_NEON]: Select ARM/NEON implementation
if supported by HW.
* cipher/keccak_permute_64.h (KECCAK_F1600_PERMUTE_FUNC_NAME): Update
to use new round constant table.
* configure.ac: Add 'keccak-armv7-neon.lo'.
--
Patch adds ARMv7/NEON implementation of Keccak (SHAKE/SHA3). Patch
is based on public-domain implementation by Ronny Van Keer from
SUPERCOP package:
https://github.com/floodyberry/supercop/blob/master/crypto_hash/\
keccakc1024/inplace-armv7a-neon/keccak2.s
Benchmark results on Cortex-A8 @ 1008 Mhz:
Before (generic 32-bit bit-interleaved impl.):
| nanosecs/byte mebibytes/sec cycles/byte
SHAKE128 | 83.00 ns/B 11.49 MiB/s 83.67 c/B
SHAKE256 | 101.7 ns/B 9.38 MiB/s 102.5 c/B
SHA3-224 | 96.13 ns/B 9.92 MiB/s 96.90 c/B
SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B
SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B
SHA3-512 | 189.1 ns/B 5.04 MiB/s 190.6 c/B
After (ARM/NEON, ~3.2x faster):
| nanosecs/byte mebibytes/sec cycles/byte
SHAKE128 | 25.09 ns/B 38.01 MiB/s 25.29 c/B
SHAKE256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B
SHA3-224 | 29.24 ns/B 32.61 MiB/s 29.48 c/B
SHA3-256 | 30.95 ns/B 30.82 MiB/s 31.19 c/B
SHA3-384 | 40.42 ns/B 23.59 MiB/s 40.74 c/B
SHA3-512 | 58.37 ns/B 16.34 MiB/s 58.84 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'keccak_permute_32.h' and
'keccak_permute_64.h'.
* cipher/hash-common.h [USE_SHA3] (MD_BLOCK_MAX_BLOCKSIZE): Remove.
* cipher/keccak.c (USE_64BIT, USE_32BIT, USE_64BIT_BMI2)
(USE_64BIT_SHLD, USE_32BIT_BMI2, NEED_COMMON64, NEED_COMMON32BI)
(keccak_ops_t): New.
(KECCAK_STATE): Add 'state64' and 'state32bi' members.
(KECCAK_CONTEXT): Remove 'bctx'; add 'blocksize', 'count' and 'ops'.
(rol64, keccak_f1600_state_permute): Remove.
[NEED_COMMON64] (round_consts_64bit, keccak_extract_inplace64): New.
[NEED_COMMON32BI] (round_consts_32bit, keccak_extract_inplace32bi)
(keccak_absorb_lane32bi): New.
[USE_64BIT] (ANDN64, ROL64, keccak_f1600_state_permute64)
(keccak_absorb_lanes64, keccak_generic64_ops): New.
[USE_64BIT_SHLD] (ANDN64, ROL64, keccak_f1600_state_permute64_shld)
(keccak_absorb_lanes64_shld, keccak_shld_64_ops): New.
[USE_64BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute64_bmi2)
(keccak_absorb_lanes64_bmi2, keccak_bmi2_64_ops): New.
[USE_32BIT] (ANDN64, ROL64, keccak_f1600_state_permute32bi)
(keccak_absorb_lanes32bi, keccak_generic32bi_ops): New.
[USE_32BIT_BMI2] (ANDN64, ROL64, keccak_f1600_state_permute32bi_bmi2)
(pext, pdep, keccak_absorb_lane32bi_bmi2, keccak_absorb_lanes32bi_bmi2)
(keccak_extract_inplace32bi_bmi2, keccak_bmi2_32bi_ops): New.
(keccak_write): New.
(keccak_init): Adjust to KECCAK_CONTEXT changes; add implementation
selection based on HWF features.
(keccak_final): Adjust to KECCAK_CONTEXT changes; use selected 'ops'
for state manipulation.
(keccak_read): Adjust to KECCAK_CONTEXT changes.
(_gcry_digest_spec_sha3_224, _gcry_digest_spec_sha3_256)
(_gcry_digest_spec_sha3_348, _gcry_digest_spec_sha3_512): Use
'keccak_write' instead of '_gcry_md_block_write'.
* cipher/keccak_permute_32.h: New.
* cipher/keccak_permute_64.h: New.
--
Patch adds new generic 64-bit and 32-bit implementations and
optimized implementations for SHA3:
- Generic 64-bit implementation based on 'simple' implementation
from SUPERCOP package.
- Generic 32-bit bit-inteleaved implementataion based on
'simple32bi' implementation from SUPERCOP package.
- Intel BMI2 optimized variants of 64-bit and 32-bit BI
implementations.
- Intel SHLD optimized variant of 64-bit implementation.
Patch also makes proper use of sponge construction to avoid
use of addition input buffer.
Below are bench-slope benchmarks for new 64-bit implementations
made on Intel Core i5-4570 (no turbo, 3.2 Ghz, gcc-4.9.2).
Before (amd64):
SHA3-224 | 3.92 ns/B 243.2 MiB/s 12.55 c/B
SHA3-256 | 4.15 ns/B 230.0 MiB/s 13.27 c/B
SHA3-384 | 5.40 ns/B 176.6 MiB/s 17.29 c/B
SHA3-512 | 7.77 ns/B 122.7 MiB/s 24.87 c/B
After (generic 64-bit, amd64), 1.10x faster):
SHA3-224 | 3.57 ns/B 267.4 MiB/s 11.42 c/B
SHA3-256 | 3.77 ns/B 252.8 MiB/s 12.07 c/B
SHA3-384 | 4.91 ns/B 194.1 MiB/s 15.72 c/B
SHA3-512 | 7.06 ns/B 135.0 MiB/s 22.61 c/B
After (Intel SHLD 64-bit, amd64, 1.13x faster):
SHA3-224 | 3.48 ns/B 273.7 MiB/s 11.15 c/B
SHA3-256 | 3.68 ns/B 258.9 MiB/s 11.79 c/B
SHA3-384 | 4.80 ns/B 198.7 MiB/s 15.36 c/B
SHA3-512 | 6.89 ns/B 138.4 MiB/s 22.05 c/B
After (Intel BMI2 64-bit, amd64, 1.45x faster):
SHA3-224 | 2.71 ns/B 352.1 MiB/s 8.67 c/B
SHA3-256 | 2.86 ns/B 333.2 MiB/s 9.16 c/B
SHA3-384 | 3.72 ns/B 256.2 MiB/s 11.91 c/B
SHA3-512 | 5.34 ns/B 178.5 MiB/s 17.10 c/B
Benchmarks of new 32-bit implementations on Intel Core i5-4570
(no turbo, 3.2 Ghz, gcc-4.9.2):
Before (win32):
SHA3-224 | 12.05 ns/B 79.16 MiB/s 38.56 c/B
SHA3-256 | 12.75 ns/B 74.78 MiB/s 40.82 c/B
SHA3-384 | 16.63 ns/B 57.36 MiB/s 53.22 c/B
SHA3-512 | 23.97 ns/B 39.79 MiB/s 76.72 c/B
After (generic 32-bit BI, win32, 1.23x to 1.29x faster):
SHA3-224 | 9.76 ns/B 97.69 MiB/s 31.25 c/B
SHA3-256 | 10.27 ns/B 92.82 MiB/s 32.89 c/B
SHA3-384 | 13.22 ns/B 72.16 MiB/s 42.31 c/B
SHA3-512 | 18.65 ns/B 51.13 MiB/s 59.70 c/B
After (Intel BMI2 32-bit BI, win32, 1.66x to 1.70x faster):
SHA3-224 | 7.26 ns/B 131.4 MiB/s 23.23 c/B
SHA3-256 | 7.65 ns/B 124.7 MiB/s 24.47 c/B
SHA3-384 | 9.87 ns/B 96.67 MiB/s 31.58 c/B
SHA3-512 | 14.05 ns/B 67.85 MiB/s 44.99 c/B
Benchmarks of new 32-bit implementation on ARM Cortex-A8
(1008 Mhz, gcc-4.9.1):
Before:
SHA3-224 | 148.6 ns/B 6.42 MiB/s 149.8 c/B
SHA3-256 | 157.2 ns/B 6.07 MiB/s 158.4 c/B
SHA3-384 | 205.3 ns/B 4.65 MiB/s 206.9 c/B
SHA3-512 | 296.3 ns/B 3.22 MiB/s 298.6 c/B
After (1.56x faster):
SHA3-224 | 96.12 ns/B 9.92 MiB/s 96.89 c/B
SHA3-256 | 101.5 ns/B 9.40 MiB/s 102.3 c/B
SHA3-384 | 131.4 ns/B 7.26 MiB/s 132.5 c/B
SHA3-512 | 188.2 ns/B 5.07 MiB/s 189.7 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/gcrypt.h.in (GCRY_MD_SHA3_224, GCRY_MD_SHA3_256)
(GCRY_MD_SHA3_384, GCRY_MD_SHA3_512): New.
(GCRY_MAC_HMAC_SHA3_224, GCRY_MAC_HMAC_SHA3_256)
(GCRY_MAC_HMAC_SHA3_384, GCRY_MAC_HMAC_SHA3_512): New.
* cipher/keccak.c: New with stub functions.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add keccak.c.
* configure.ac (available_digests): Add sha3.
(USE_SHA3): New.
* src/fips.c (run_hmac_selftests): Add SHA3 to the required selftests.
* cipher/md.c (digest_list) [USE_SHA3]: Add standard SHA3 algos.
(md_open): Ditto for hmac processing.
* cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping.
* cipher/hmac-tests.c (run_selftests): Prepare for tests.
* cipher/pubkey-util.c (get_hash_algo): Add "sha3-xxx".
--
Note that the algo GCRY_MD_SHA3_xxx are prelimanry. We should try to
sync them with OpenPGP.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/Makefile.am (gost-s-box): USe CC_FOR_BUILD.
(noinst_PROGRAMS): Remove.
(EXTRA_DIST): New.
(CLEANFILES): New.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/cipher-ocb.c: New.
* cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-ocb.c
* cipher/cipher-internal.h (OCB_BLOCK_LEN, OCB_L_TABLE_SIZE): New.
(gcry_cipher_handle): Add fields marks.finalize and u_mode.ocb.
* cipher/cipher.c (_gcry_cipher_open_internal): Add OCB mode.
(_gcry_cipher_open_internal): Setup default taglen of OCB.
(cipher_reset): Clear OCB specific data.
(cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate)
(_gcry_cipher_gettag, _gcry_cipher_checktag): Call OCB functions.
(_gcry_cipher_setiv): Add OCB specific nonce setting.
(_gcry_cipher_ctl): Add GCRYCTL_FINALIZE and GCRYCTL_SET_TAGLEN
* src/gcrypt.h.in (GCRYCTL_SET_TAGLEN): New.
(gcry_cipher_final): New.
* cipher/bufhelp.h (buf_xor_1): New.
* tests/basic.c (hex2buffer): New.
(check_ocb_cipher): New.
(main): Call it here. Add option --cipher-modes.
* tests/bench-slope.c (bench_aead_encrypt_do_bench): Call
gcry_cipher_final.
(bench_aead_decrypt_do_bench): Ditto.
(bench_aead_authenticate_do_bench): Ditto. Check error code.
(bench_ocb_encrypt_do_bench): New.
(bench_ocb_decrypt_do_bench): New.
(bench_ocb_authenticate_do_bench): New.
(ocb_encrypt_ops): New.
(ocb_decrypt_ops): New.
(ocb_authenticate_ops): New.
(cipher_modes): Add them.
(cipher_bench_one): Skip wrong block length for OCB.
* tests/benchmark.c (cipher_bench): Add field noncelen to MODES. Add
OCB support.
--
See the comments on top of cipher/cipher-ocb.c for the patent status
of the OCB mode.
The implementation has not yet been optimized and as such is not faster
that the other AEAD modes. A first candidate for optimization is the
double_block function. Large improvements can be expected by writing
an AES ECB function to work on multiple blocks.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* Makefile.am (DISTCHECK_CONFIGURE_FLAGS): Remove --enable-ciphers.
* cipher/Makefile.am (DISTCLEANFILES): Add gost-sb.h.
|
|
--
The Manifest file have been part of an experiment a long time ago to
implement source level integrity. I is not maintained for more than a
decade and with the advent of git this is superfluous anyway.
|
|
* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'.
* cipher/rijndael-internal.h (USE_SSSE3): New.
(RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'.
* cipher/rijndael-ssse3-amd64.c: New.
* cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey)
(_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New.
(do_setkey): Add HWF check for SSSE3 and setup for SSSE3
implementation.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add
selection for SSSE3 implementation.
* configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
--
This patch adds "AES with vector permutations" implementation by
Mike Hamburg. Public-domain source-code is available at:
http://crypto.stanford.edu/vpaes/
Benchmark on Intel Core2 T8100 (2.1Ghz, no turbo):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.79 ns/B 108.5 MiB/s 18.46 c/B
ECB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CBC enc | 7.77 ns/B 122.7 MiB/s 16.33 c/B
CBC dec | 7.74 ns/B 123.2 MiB/s 16.26 c/B
CFB enc | 7.88 ns/B 121.0 MiB/s 16.54 c/B
CFB dec | 7.56 ns/B 126.1 MiB/s 15.88 c/B
OFB enc | 9.02 ns/B 105.8 MiB/s 18.94 c/B
OFB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CTR enc | 7.80 ns/B 122.2 MiB/s 16.38 c/B
CTR dec | 7.81 ns/B 122.2 MiB/s 16.39 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.77 ns/B 165.2 MiB/s 12.13 c/B
ECB dec | 7.13 ns/B 133.7 MiB/s 14.98 c/B
CBC enc | 5.27 ns/B 181.0 MiB/s 11.06 c/B
CBC dec | 6.39 ns/B 149.3 MiB/s 13.42 c/B
CFB enc | 5.27 ns/B 180.9 MiB/s 11.07 c/B
CFB dec | 5.28 ns/B 180.7 MiB/s 11.08 c/B
OFB enc | 6.11 ns/B 156.1 MiB/s 12.83 c/B
OFB dec | 6.13 ns/B 155.5 MiB/s 12.88 c/B
CTR enc | 5.26 ns/B 181.5 MiB/s 11.04 c/B
CTR dec | 5.24 ns/B 182.0 MiB/s 11.00 c/B
Benchmark on Intel i5-2450M (2.5Ghz, no turbo, aes-ni disabled):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.06 ns/B 118.3 MiB/s 20.15 c/B
ECB dec | 8.21 ns/B 116.1 MiB/s 20.53 c/B
CBC enc | 7.88 ns/B 121.1 MiB/s 19.69 c/B
CBC dec | 7.57 ns/B 126.0 MiB/s 18.92 c/B
CFB enc | 7.87 ns/B 121.2 MiB/s 19.67 c/B
CFB dec | 7.56 ns/B 126.2 MiB/s 18.89 c/B
OFB enc | 8.27 ns/B 115.3 MiB/s 20.67 c/B
OFB dec | 8.28 ns/B 115.1 MiB/s 20.71 c/B
CTR enc | 8.02 ns/B 119.0 MiB/s 20.04 c/B
CTR dec | 8.02 ns/B 118.9 MiB/s 20.05 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.03 ns/B 236.6 MiB/s 10.07 c/B
ECB dec | 5.28 ns/B 180.8 MiB/s 13.19 c/B
CBC enc | 3.77 ns/B 252.7 MiB/s 9.43 c/B
CBC dec | 4.69 ns/B 203.3 MiB/s 11.73 c/B
CFB enc | 3.75 ns/B 254.3 MiB/s 9.37 c/B
CFB dec | 3.69 ns/B 258.6 MiB/s 9.22 c/B
OFB enc | 4.17 ns/B 228.7 MiB/s 10.43 c/B
OFB dec | 4.17 ns/B 228.7 MiB/s 10.42 c/B
CTR enc | 3.72 ns/B 256.5 MiB/s 9.30 c/B
CTR dec | 3.72 ns/B 256.1 MiB/s 9.31 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'.
* cipher/cipher-gcm-intel-pclmul.c: New.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL]
(_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New
prototypes.
[GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move
to 'cipher-gcm-intel-pclmul.c'.
(ghash): Rename to...
(ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new
function in 'cipher-gcm-intel-pclmul.c'.
(setupM): Move GCM_USE_INTEL_PCLMUL part to new function in
'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based
on available HW acceleration.
(do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'.
* cipher/internal.h (ghash_fn_t): New.
(gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'rijndael-padlock.c'.
* cipher/rijndael-padlock.c: New.
* cipher/rijndael.c (do_padlock, do_padlock_encrypt)
(do_padlock_decrypt): Move to 'rijndael-padlock.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.in: Add 'rijndael-aesni.c'.
* cipher/rijndael-aesni.c: New.
* cipher/rijndael-internal.h: New.
* cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16)
(USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context)
(keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'.
(u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6)
(aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4)
(do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move
to 'rijndael-aesni.c'.
(prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc)
(_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions
in 'rijdael-aesni.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'.
--
Clean-up rijndael.c before new new hardware acceleration support gets added.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_NEON)
(POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT): New.
* cipher/poly1305.c [POLY1305_USE_NEON]
(_gcry_poly1305_armv7_neon_init_ext)
(_gcry_poly1305_armv7_neon_finish_ext)
(_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation
if HWF_ARM_NEON set.
* configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
--
Add Andrew Moon's public domain NEON implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 12.34 ns/B 77.27 MiB/s 12.44 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 2.12 ns/B 450.7 MiB/s 2.13 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'.
* cipher/chacha20-armv7-neon.S: New.
* cipher/chacha20.c (USE_NEON): New.
[USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New.
(chacha20_do_setkey) [USE_NEON]: Use Neon implementation if
HWF_ARM_NEON flag set.
(selftest): Self-test encrypting buffer byte by byte.
* configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
--
Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.45 ns/B 70.92 MiB/s 13.56 c/B
STREAM dec | 13.45 ns/B 70.90 MiB/s 13.56 c/B
New:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 6.20 ns/B 153.9 MiB/s 6.25 c/B
STREAM dec | 6.20 ns/B 153.9 MiB/s 6.25 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'.
* cipher/whirlpool-sse2-amd64.S: New.
* cipher/whirlpool.c (USE_AMD64_ASM): New.
(whirlpool_tables_s): New.
(rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single
structure and replace old tables with macros of same name.
(tab): New structure containing above tables.
[USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64)
(whirlpool_transform): New.
* configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'.
--
Benchmark results:
On Intel Core i5-4570 (3.2 Ghz):
After:
WHIRLPOOL | 4.82 ns/B 197.8 MiB/s 15.43 c/B
Before:
WHIRLPOOL | 9.10 ns/B 104.8 MiB/s 29.13 c/B
On Intel Core i5-2450M (2.5 Ghz):
After:
WHIRLPOOL | 8.43 ns/B 113.1 MiB/s 21.09 c/B
Before:
WHIRLPOOL | 13.45 ns/B 70.92 MiB/s 33.62 c/B
On Intel Core2 T8100 (2.1 Ghz):
After:
WHIRLPOOL | 10.22 ns/B 93.30 MiB/s 21.47 c/B
Before:
WHIRLPOOL | 19.87 ns/B 48.00 MiB/s 41.72 c/B
Summary, old vs new ratio:
Intel Core i5-4570: 1.88x
Intel Core i5-2450M: 1.59x
Intel Core2 T8100: 1.94x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/gost-s-box.c: New. Outputs optimized expanded representation of
s-boxes (4x256) from compact 16x8 representation.
* cipher/Makefile.am: Add gost-sb.h dependency to gost28147.lo
* cipher/gost.h: Add sbox to the GOST28147_context structure.
* cipher/gost28147.c (gost_setkey): Set default s-box to test s-box from
GOST R 34.11 (this was the only one S-box before).
* cipher/gost28147.c (gost_val): Use sbox from the context.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'.
* cipher/chacha20-sse2-amd64.S: New.
* cipher/chacha20.c (USE_SSE2): New.
[USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New.
(chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks
function.
* configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'.
--
Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt
Benchmark on Intel i5-4570 (haswell),
with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3":
Old:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.97 ns/B 483.8 MiB/s 6.31 c/B
STREAM dec | 1.97 ns/B 483.6 MiB/s 6.31 c/B
New:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.931 ns/B 1024.7 MiB/s 2.98 c/B
STREAM dec | 0.930 ns/B 1025.0 MiB/s 2.98 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'.
* cipher/poly1305-avx2-amd64.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX2)
(POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE)
(POLY1305_AVX2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed.
* cipher/poly1305.c [POLY1305_USE_AVX2]
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if
AVX2 supported by CPU.
* configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'.
--
Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmarks on Intel i5-4570 (haswell):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.205 ns/B 4643.5 MiB/s 0.657 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'.
* cipher/poly1305-internal.h (POLY1305_USE_SSE2)
(POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed.
* cipher/poly1305-sse2-amd64.S: New.
* cipher/poly1305.c [POLY1305_USE_SSE2]
(_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New.
(_gcry_polu1305_init) [POLY1305_USE_SSE2]: Use SSE2 version.
* configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'.
--
Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmarks on Intel i5-4570 (haswell):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.844 ns/B 1130.2 MiB/s 2.70 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B
Benchmarks on Intel i5-2450M (sandy-bridge):
Old:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 1.25 ns/B 763.0 MiB/s 3.12 c/B
New:
| nanosecs/byte mebibytes/sec cycles/byte
POLY1305 | 0.605 ns/B 1575.9 MiB/s 1.51 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cipher-poly1305.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'.
(_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt)
(_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate)
(_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New.
* cipher/cipher-poly1305.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'.
(cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ...
(_gcry_cipher_setiv): ... here, as with other modes.
* src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'.
* tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New.
(check_ciphers): Add Poly1305 check.
(check_cipher_modes): Call 'check_poly1305_cipher'.
* tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to
bench_aead_... and take nonce as argument.
(bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto.
(bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench)
(bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench)
(bench_poly1305_decrypt_do_bench)
(bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops)
(poly1305_decrypt_ops, poly1305_authenticate_ops): New.
(cipher_modes): Add Poly1305.
(cipher_bench_one): Add special handling for Poly1305.
--
Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant
of this mode is proposed for use in TLS and ipsec:
https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and
'poly1305-internal.h'.
* cipher/mac-internal.h (poly1305mac_context_s): New.
(gcry_mac_handle): Add 'u.poly1305mac'.
(_gcry_mac_type_spec_poly1305mac): New.
* cipher/mac-poly1305.c: New.
* cipher/mac.c (mac_list): Add Poly1305.
* cipher/poly1305-internal.h: New.
* cipher/poly1305.c: New.
* src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'.
* tests/basic.c (check_mac): Add Poly1035 test vectors; Allow
overriding lengths of data and key buffers.
* tests/bench-slope.c (mac_bench): Increase max algo number from 500 to
600.
* tests/benchmark.c (mac_bench): Ditto.
--
Patch adds Bernstein's Poly1305 message authentication code to libgcrypt.
Implementation is based on Andrew Moon's public domain implementation
from: https://github.com/floodyberry/poly1305-opt
The algorithm added by this patch is the plain Poly1305 without AES and
takes 32-bit key that must not be reused.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'.
* cipher/chacha20-avx2-amd64.S: New.
* cipher/chacha20.c (USE_AVX2): New macro.
[USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New.
(chacha20_do_setkey): Select AVX2 implementation if there is HW
support.
(selftest): Increase size of buf by 256.
* configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'.
--
Add AVX2 optimized implementation for ChaCha20. Based on implementation by
Andrew Moon.
SSSE3 (Intel Haswell):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B
STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B
AVX2:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.393 ns/B 2428.0 MiB/s 1.26 c/B
STREAM dec | 0.392 ns/B 2433.6 MiB/s 1.25 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'.
* cipher/chacha20-ssse3-amd64.S: New.
* cipher/chacha20.c (USE_SSSE3): New macro.
[USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New.
(chacha20_do_setkey): Select SSSE3 implementation if there is HW
support.
* configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'.
--
Add SSSE3 optimized implementation for ChaCha20. Based on implementation
by Andrew Moon.
Before (Intel Haswell):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.97 ns/B 483.6 MiB/s 6.31 c/B
STREAM dec | 1.97 ns/B 484.0 MiB/s 6.31 c/B
After:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B
STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'chacha20.c'.
* cipher/chacha20.c: New.
* cipher/cipher.c (cipher_list): Add ChaCha20.
* configure.ac: Add ChaCha20.
* doc/gcrypt.texi: Add ChaCha20.
* src/cipher.h (_gcry_cipher_spec_chacha20): New.
* src/gcrypt.h.in (GCRY_CIPHER_CHACHA20): Add new algo.
* tests/basic.c (MAX_DATA_LEN): Increase to 128 from 100.
(check_stream_cipher): Add ChaCha20 test-vectors.
(check_ciphers): Add ChaCha20.
--
Patch adds Bernstein's ChaCha20 cipher to libgcrypt. Implementation is based
on public domain implementations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'des-amd64.S'.
* cipher/cipher-selftests.c (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Handle failures
from 'setkey' function.
* cipher/cipher.c (_gcry_cipher_open_internal) [USE_DES]: Setup bulk
functions for 3DES.
* cipher/des-amd64.S: New file.
* cipher/des.c (USE_AMD64_ASM, ATTR_ALIGNED_16): New macros.
[USE_AMD64_ASM] (_gcry_3des_amd64_crypt_block)
(_gcry_3des_amd64_ctr_enc), _gcry_3des_amd64_cbc_dec)
(_gcry_3des_amd64_cfb_dec): New prototypes.
[USE_AMD64_ASM] (tripledes_ecb_crypt): New function.
(TRIPLEDES_ECB_BURN_STACK): New macro.
(_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec)
(bulk_selftest_setkey, selftest_ctr, selftest_cbc, selftest_cfb): New
functions.
(selftest): Add call to CTR, CBC and CFB selftest functions.
(do_tripledes_encrypt, do_tripledes_decrypt): Use
TRIPLEDES_ECB_BURN_STACK.
* configure.ac [host=x86-64]: Add 'des-amd64.lo'.
* src/cipher.h (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec)
(_gcry_3des_cfb_dec): New prototypes.
--
Add non-parallel functions for small speed-up and 3-way parallel functions for
modes of operation that support parallel processing.
Old vs new (Intel Core i5-4570):
================================
enc dec
ECB 1.17x 1.17x
CBC 1.17x 2.51x
CFB 1.16x 2.49x
OFB 1.17x 1.17x
CTR 2.56x 2.56x
Old vs new (Intel Core i5-2450M):
=================================
enc dec
ECB 1.28x 1.28x
CBC 1.27x 2.33x
CFB 1.27x 2.34x
OFB 1.27x 1.27x
CTR 2.36x 2.35x
New (Intel Core i5-4570):
=========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 28.39 ns/B 33.60 MiB/s 90.84 c/B
ECB dec | 28.27 ns/B 33.74 MiB/s 90.45 c/B
CBC enc | 29.50 ns/B 32.33 MiB/s 94.40 c/B
CBC dec | 13.35 ns/B 71.45 MiB/s 42.71 c/B
CFB enc | 29.59 ns/B 32.23 MiB/s 94.68 c/B
CFB dec | 13.41 ns/B 71.12 MiB/s 42.91 c/B
OFB enc | 28.90 ns/B 33.00 MiB/s 92.47 c/B
OFB dec | 28.90 ns/B 33.00 MiB/s 92.48 c/B
CTR enc | 13.39 ns/B 71.20 MiB/s 42.86 c/B
CTR dec | 13.39 ns/B 71.21 MiB/s 42.86 c/B
Old (Intel Core i5-4570):
=========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 33.24 ns/B 28.69 MiB/s 106.4 c/B
ECB dec | 33.26 ns/B 28.67 MiB/s 106.4 c/B
CBC enc | 34.45 ns/B 27.69 MiB/s 110.2 c/B
CBC dec | 33.45 ns/B 28.51 MiB/s 107.1 c/B
CFB enc | 34.43 ns/B 27.70 MiB/s 110.2 c/B
CFB dec | 33.41 ns/B 28.55 MiB/s 106.9 c/B
OFB enc | 33.79 ns/B 28.22 MiB/s 108.1 c/B
OFB dec | 33.79 ns/B 28.22 MiB/s 108.1 c/B
CTR enc | 34.27 ns/B 27.83 MiB/s 109.7 c/B
CTR dec | 34.27 ns/B 27.83 MiB/s 109.7 c/B
New (Intel Core i5-2450M):
==========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 42.21 ns/B 22.59 MiB/s 105.5 c/B
ECB dec | 42.23 ns/B 22.58 MiB/s 105.6 c/B
CBC enc | 43.70 ns/B 21.82 MiB/s 109.2 c/B
CBC dec | 23.25 ns/B 41.02 MiB/s 58.12 c/B
CFB enc | 43.71 ns/B 21.82 MiB/s 109.3 c/B
CFB dec | 23.23 ns/B 41.05 MiB/s 58.08 c/B
OFB enc | 42.73 ns/B 22.32 MiB/s 106.8 c/B
OFB dec | 42.73 ns/B 22.32 MiB/s 106.8 c/B
CTR enc | 23.31 ns/B 40.92 MiB/s 58.27 c/B
CTR dec | 23.35 ns/B 40.84 MiB/s 58.38 c/B
Old (Intel Core i5-2450M):
==========================
3DES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 53.98 ns/B 17.67 MiB/s 134.9 c/B
ECB dec | 54.00 ns/B 17.66 MiB/s 135.0 c/B
CBC enc | 55.43 ns/B 17.20 MiB/s 138.6 c/B
CBC dec | 54.27 ns/B 17.57 MiB/s 135.7 c/B
CFB enc | 55.42 ns/B 17.21 MiB/s 138.6 c/B
CFB dec | 54.35 ns/B 17.55 MiB/s 135.9 c/B
OFB enc | 54.49 ns/B 17.50 MiB/s 136.2 c/B
OFB dec | 54.49 ns/B 17.50 MiB/s 136.2 c/B
CTR enc | 55.02 ns/B 17.33 MiB/s 137.5 c/B
CTR dec | 55.01 ns/B 17.34 MiB/s 137.5 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'blowfish-arm.S' and 'serpent-armv7-neon.S'.
--
Fix for bug https://bugs.g10code.com/gnupg/issue1584
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'arcfour-amd64.S'.
* cipher/arcfour-amd64.S: New.
* cipher/arcfour.c (USE_AMD64_ASM): New.
[USE_AMD64_ASM] (ARCFOUR_context, _gcry_arcfour_amd64)
(encrypt_stream): New.
* configure.ac [host=x86_64]: Add 'arcfour-amd64.lo'.
--
Patch adds Marc Bevand's public-domain AMD64 assembly implementation of RC4 to
libgcrypt. Original implementation is at:
http://www.zorinaq.com/papers/rc4-amd64.html
Benchmarks on Intel i5-4570 (3200 Mhz):
New:
ARCFOUR | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.29 ns/B 737.7 MiB/s 4.14 c/B
STREAM dec | 1.31 ns/B 730.6 MiB/s 4.18 c/B
Old (C-language):
ARCFOUR | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 2.09 ns/B 457.4 MiB/s 6.67 c/B
STREAM dec | 2.09 ns/B 457.2 MiB/s 6.68 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha1-armv7-neon.S'.
* cipher/sha1-armv7-neon.S: New.
* cipher/sha1.c (USE_NEON): New.
(SHA1_CONTEXT, sha1_init) [USE_NEON]: Add and initialize 'use_neon'.
[USE_NEON] (_gcry_sha1_transform_armv7_neon): New.
(transform) [USE_NEON]: Use ARM/NEON assembly if enabled.
* configure.ac: Add 'sha1-armv7-neon.lo'.
--
Patch adds ARM/NEON implementation for SHA-1.
Benchmarks show 1.72x improvement on ARM Cortex-A8, 1008 Mhz:
jussi@cubie:~/libgcrypt$ tests/bench-slope --cpu-mhz 1008 hash sha1
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.80 ns/B 122.3 MiB/s 7.86 c/B
=
jussi@cubie:~/libgcrypt$ tests/bench-slope --disable-hwf arm-neon --cpu-mhz 1008 hash sha1
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 13.41 ns/B 71.10 MiB/s 13.52 c/B
=
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* LICENSES: Add 'cipher/sha256-avx-amd64.S' and
'cipher/sha256-avx2-bmi2-amd64.S'.
* cipher/Makefile.am: Add 'sha256-avx-amd64.S' and
'sha256-avx2-bmi2-amd64.S'.
* cipher/sha256-avx-amd64.S: New.
* cipher/sha256-avx2-bmi2-amd64.S: New.
* cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in few
places for tiny speed improvement.
* cipher/sha256.c (USE_AVX, USE_AVX2): New.
(SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'.
(sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above
new context members.
[USE_AVX] (_gcry_sha256_transform_amd64_avx): New.
[USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New.
(transform) [USE_AVX2]: Use AVX2 assembly if enabled.
(transform) [USE_AVX]: Use AVX assembly if enabled.
* configure.ac: Add 'sha256-avx-amd64.lo' and
'sha256-avx2-bmi2-amd64.lo'.
--
Patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel
Corporation. The assembly source is licensed under 3-clause BSD license,
thus compatible with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA - 256 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html
Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's
faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much
slower than RORQ, so therefore AVX implementation is (for now) limited
to Intel CPUs.
Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional
HWF flag.
Benchmarks:
cpu C-lang SSSE3 AVX/AVX2 C vs AVX/AVX2
vs SSSE3
Intel i5-4570 13.86 c/B 10.27 c/B 8.70 c/B 1.59x 1.18x
Intel i5-2450M 17.25 c/B 12.36 c/B 10.31 c/B 1.67x 1.19x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha1-avx-amd64.S' and
'sha1-avx-bmi2-amd64.S'.
* cipher/sha1-avx-amd64.S: New.
* cipher/sha1-avx-bmi2-amd64.S: New.
* cipher/sha1.c (USE_AVX, USE_BMI2): New.
(SHA1_CONTEXT) [USE_AVX]: Add 'use_avx'.
(SHA1_CONTEXT) [USE_BMI2]: Add 'use_bmi2'.
(sha1_init): Initialize 'use_avx' and 'use_bmi2'.
[USE_AVX] (_gcry_sha1_transform_amd64_avx): New.
[USE_BMI2] (_gcry_sha1_transform_amd64_bmi2): New.
(transform) [USE_BMI2]: Use BMI2 assembly if enabled.
(transform) [USE_AVX]: Use AVX assembly if enabled.
* configure.ac: Add 'sha1-avx-amd64.lo' and 'sha1-avx-bmi2-amd64.lo'.
--
Patch adds AVX (for Sandybridge and Ivybridge) and AVX/BMI2 (for Haswell)
optimized implementations of SHA-1.
Note: AVX implementation is currently limited to Intel CPUs due to use
of SHLD instruction for faster rotations on Sandybrigde.
Benchmarks:
cpu C-version SSSE3 AVX/(SHLD|BMI2) New vs C New vs SSSE3
Intel i5-4570 8.84 c/B 4.61 c/B 3.86 c/B 2.29x 1.19x
Intel i5-2450M 9.45 c/B 5.30 c/B 4.39 c/B 2.15x 1.20x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Correct 'sha256-avx*.S' to 'sha512-avx*.S'.
* cipher/sha1-ssse3-amd64.S: First line, correct filename.
* cipher/sha256-ssse3-amd64.S: Return correct stack burn depth.
* cipher/sha512-avx-amd64.S: Use 'vzeroall' to clear registers.
* cipher/sha512-avx2-bmi2-amd64.S: Ditto and return correct stack burn
depth.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Change 'sha1-ssse3-amd64.c' to
'sha1-ssse3-amd64.S'.
* cipher/sha1-ssse3-amd64.c: Remove.
* cipher/sha1-ssse3-amd64.S: New.
--
Mixed C&asm implementation appears to trigger GCC bugs easily. Therefore
convert SSSE3 implementation to pure assembly for safety.
Benchmark also show smallish speed improvement.
cpu C&asm asm
Intel i5-4570 5.22 c/B 5.09 c/B
Intel i5-2450M 7.24 c/B 7.00 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha1-ssse3-amd64.c'.
* cipher/sha1-ssse3-amd64.c: New.
* cipher/sha1.c (USE_SSSE3): New.
(SHA1_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha1_init) [USE_SSSE3]: Initialize 'use_ssse3'.
(transform): Rename to...
(_transform): this.
(transform): New.
* configure.ac [host=x86_64]: Add 'sha1-ssse3-amd64.lo'.
--
Patch adds SSSE3 implementation based on white paper "Improving the Performance
of the Secure Hash Algorithm (SHA-1)" at
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1
Benchmarks:
cpu Old New Diff
Intel i5-4570 9.02 c/B 5.22 c/B 1.72x
Intel i5-2450M 12.27 c/B 7.24 c/B 1.69x
Intel Core2 T8100 7.94 c/B 6.76 c/B 1.17x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and
'sha512-avx2-bmi2-amd64.S'.
* cipher/sha512-avx-amd64.S: New.
* cipher/sha512-avx2-bmi2-amd64.S: New.
* cipher/sha512.c (USE_AVX, USE_AVX2): New.
(SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'.
(SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'.
(sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'.
(sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'.
[USE_AVX] (_gcry_sha512_transform_amd64_avx): New.
[USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New.
(transform) [USE_AVX2]: Add call for AVX2 implementation.
(transform) [USE_AVX]: Add call for AVX implementation.
* configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check.
(sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'.
* doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'.
* src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New.
* src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2".
* src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and
HWF_INTEL_BMI2.
--
Patch adds fast AVX and AVX2 implementation of SHA-512 by Intel Corporation.
The assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA512 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$
Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's
faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much
slower than RORQ, so therefore AVX implementation is (for now) limited
to Intel CPUs.
Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional
HWF flag.
Benchmarks:
cpu Old SSSE3 AVX/AVX2 Old vs AVX/AVX2
vs SSSE3
Intel i5-4570 10.11 c/B 7.56 c/B 6.72 c/B 1.50x 1.12x
Intel i5-2450M 14.11 c/B 10.53 c/B 8.88 c/B 1.58x 1.18x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-ssse3-amd64.S'.
* cipher/sha512-ssse3-amd64.S: New.
* cipher/sha512.c (USE_SSSE3): New.
(SHA512_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha512_init, sha384_init) [USE_SSSE3]: Initialize 'use_ssse3'.
[USE_SSSE3] (_gcry_sha512_transform_amd64_ssse3): New.
(transform) [USE_SSSE3]: Call SSSE3 implementation.
* configure.ac (sha512): Add 'sha512-ssse3-amd64.lo'.
--
Patch adds fast SSSE3 implementation of SHA-512 by Intel Corporation. The
assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA512 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementations-ia-processors-paper.html
Benchmarks:
cpu Old New Diff
Intel i5-4570 10.11 c/B 7.56 c/B 1.33x
Intel i5-2450M 14.11 c/B 10.53 c/B 1.33x
Intel Core2 T8100 11.92 c/B 10.22 c/B 1.16x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha256-ssse3-amd64.S'.
* cipher/sha256-ssse3-amd64.S: New.
* cipher/sha256.c (USE_SSSE3): New.
(SHA256_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha256_init, sha224_init) [USE_SSSE3]: Initialize 'use_ssse3'.
(transform): Rename to...
(_transform): This.
[USE_SSSE3] (_gcry_sha256_transform_amd64_ssse3): New.
(transform): New.
* configure.ac (HAVE_INTEL_SYNTAX_PLATFORM_AS): New check.
(sha256): Add 'sha256-ssse3-amd64.lo'.
* doc/gcrypt.texi: Document 'intel-ssse3'.
* src/g10lib.h (HWF_INTEL_SSSE3): New.
* src/hwfeatures.c (hwflist): Add "intel-ssse3".
* src/hwf-x86.c (detect_x86_gnuc): Test for SSSE3.
--
Patch adds fast SSSE3 implementation of SHA-256 by Intel Corporation. The
assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA - 256 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html
Benchmarks:
cpu Old New Diff
Intel i5-4570 13.99 c/B 10.66 c/B 1.31x
Intel i5-2450M 21.53 c/B 15.79 c/B 1.36x
Intel Core2 T8100 20.84 c/B 15.07 c/B 1.38x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'mac-gmac.c'.
* cipher/mac-gmac.c: New.
* cipher/mac-internal.h (gcry_mac_handle): Add 'u.gcm'.
(_gcry_mac_type_spec_gmac_aes, _gcry_mac_type_spec_gmac_twofish)
(_gcry_mac_type_spec_gmac_serpent, _gcry_mac_type_spec_gmac_seed)
(_gcry_mac_type_spec_gmac_camellia): New externs.
* cipher/mac.c (mac_list): Add GMAC specifications.
* doc/gcrypt.texi: Add mention of GMAC.
* src/gcrypt.h.in (gcry_mac_algos): Add GCM algorithms.
* tests/basic.c (check_one_mac): Add support for MAC IVs.
(check_mac): Add support for MAC IVs and add GMAC test vectors.
* tests/bench-slope.c (mac_bench): Iterate algorithm numbers to 499.
* tests/benchmark.c (mac_bench): Iterate algorithm numbers to 499.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cipher-gcm.c'.
* cipher/cipher-ccm.c (_gcry_ciphert_ccm_set_lengths)
(_gcry_cipher_ccm_authenticate, _gcry_cipher_ccm_tag)
(_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt): Change
'c->u_mode.ccm.tag' to 'c->marks.tag'.
* cipher/cipher-gcm.c: New.
* cipher/cipher-internal.h (GCM_USE_TABLES): New.
(gcry_cipher_handle): Add 'marks.tag', 'u_tag', 'length' and
'gcm_table'; Remove 'u_mode.ccm.tag'.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_setiv, _gcry_cipher_gcm_authenticate)
(_gcry_cipher_gcm_get_tag, _gcry_cipher_gcm_check_tag): New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate)
(_gcry_cipher_gettag, _gcry_cipher_checktag): Add GCM mode handling.
* src/gcrypt.h.in (gcry_cipher_modes): Add GCRY_CIPHER_MODE_GCM.
(GCRY_GCM_BLOCK_LEN): New.
* tests/basic.c (check_gcm_cipher): New.
(check_ciphers): Add GCM check.
(check_cipher_modes): Call 'check_gcm_cipher'.
* tests/bench-slope.c (bench_gcm_encrypt_do_bench)
(bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench)
(gcm_encrypt_ops, gcm_decrypt_ops, gcm_authenticate_ops): New.
(cipher_modes): Add GCM enc/dec/auth.
(cipher_bench_one): Limit GCM to block ciphers with 16 byte block-size.
* tests/benchmark.c (cipher_bench): Add GCM.
--
Currently it is still quite slow.
Still no support for generate_iv(). Is it really necessary?
TODO: Merge/reuse cipher-internal state used by CCM.
Changelog entry will be present in final patch submission.
Changes since v1:
- 6x-7x speedup.
- added bench-slope support
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
[jk: mangle new file throught 'indent -nut']
[jk: few fixes]
[jk: changelog]
|
|
* cipher/Makefile.am: Add 'cipher-cmac.c' and 'mac-cmac.c'.
* cipher/cipher-cmac.c: New.
* cipher/cipher-internal.h (gcry_cipher_handle.u_mode): Add 'cmac'.
* cipher/cipher.c (gcry_cipher_open): Rename to...
(_gcry_cipher_open_internal): ...this and add CMAC.
(gcry_cipher_open): New wrapper that disallows use of internal
modes (CMAC) from outside.
(cipher_setkey, cipher_encrypt, cipher_decrypt)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Add handling for CMAC mode.
(cipher_reset): Do not reset 'marks.key' and do not clear subkeys in
'u_mode' in CMAC mode.
* cipher/mac-cmac.c: New.
* cipher/mac-internal.h: Add CMAC support and algorithms.
* cipher/mac.c: Add CMAC algorithms.
* doc/gcrypt.texi: Add documentation for CMAC.
* src/cipher.h (gcry_cipher_internal_modes): New.
(_gcry_cipher_open_internal, _gcry_cipher_cmac_authenticate)
(_gcry_cipher_cmac_get_tag, _gcry_cipher_cmac_check_tag)
(_gcry_cipher_cmac_set_subkeys): New prototypes.
* src/gcrypt.h.in (gcry_mac_algos): Add CMAC algorithms.
* tests/basic.c (check_mac): Add CMAC test vectors.
--
Patch adds CMAC (Cipher-based MAC) as defined in RFC 4493 and NIST
Special Publication 800-38B.
Internally CMAC is added to cipher module, but is available to outside
only through MAC API.
[v2]:
- Add documentation.
[v3]:
- CMAC algorithm ids start from 201.
- Coding style fixes.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'mac.c', 'mac-internal.h' and 'mac-hmac.c'.
* cipher/bufhelp.h (buf_eq_const): New.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_tag): Use 'buf_eq_const' for
constant-time compare.
* cipher/mac-hmac.c: New.
* cipher/mac-internal.h: New.
* cipher/mac.c: New.
* doc/gcrypt.texi: Add documentation for MAC API.
* src/gcrypt-int.h [GPG_ERROR_VERSION_NUMBER < 1.13]
(GPG_ERR_MAC_ALGO): New.
* src/gcrypt.h.in (gcry_mac_handle, gcry_mac_hd_t, gcry_mac_algos)
(gcry_mac_flags, gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name)
(gcry_mac_reset, gcry_mac_test_algo): New.
* src/libgcrypt.def (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/libgcrypt.vers (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/visibility.c (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/visibility.h (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* tests/basic.c (check_one_mac, check_mac): New.
(main): Call 'check_mac'.
* tests/bench-slope.c (bench_print_header, bench_print_footer): Allow
variable algorithm name width.
(_cipher_bench, hash_bench): Update to above change.
(bench_hash_do_bench): Add 'gcry_md_reset'.
(bench_mac_mode, bench_mac_init, bench_mac_free, bench_mac_do_bench)
(mac_ops, mac_modes, mac_bench_one, _mac_bench, mac_bench): New.
(main): Add 'mac' benchmark options.
* tests/benchmark.c (mac_repetitions, mac_bench): New.
(main): Add 'mac' benchmark options.
--
Add MAC API, with HMAC algorithms. Internally uses HMAC functionality of the
MD module.
[v2]:
- Add documentation for MAC API.
- Change length argument for gcry_mac_read from size_t to size_t* for
returning number of written bytes.
[v3]:
- HMAC algorithm ids start from 101.
- Fix coding style for new files.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'.
* cipher/salsa20-armv7-neon.S: New.
* cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro.
(struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t)
(salsa20_ivsetup_t): New.
(SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'.
(salsa20_core): Change 'src' argument to 'ctx'.
[USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype.
[USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon)
(salsa20_ivsetup_neon): New.
(salsa20_do_setkey): Setup keysetup, ivsetup and core with default
functions.
(salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support detect,
set keysetup, ivsetup and core with ARM NEON functions.
(salsa20_do_setkey): Call 'ctx->keysetup'.
(salsa20_setiv): Call 'ctx->ivsetup'.
(salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers
in ARM NEON implementation.
(salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly
calling 'salsa20_core'.
(selftest): Add test to check large buffer processing and block counter
updating.
* configure.ac [neonsupport]: 'Add salsa20-armv7-neon.lo'.
--
Patch adds fast ARM NEON assembly implementation for Salsa20. Implementation
gains extra speed by processing three blocks in parallel with help of ARM
NEON vector processing unit.
This implementation is based on public domain code by Peter Schwabe and D. J.
Bernstein and it is available in SUPERCOP benchmarking framework. For more
details on this work, check paper "NEON crypto" by Daniel J. Bernstein and
Peter Schwabe:
http://cryptojedi.org/papers/#neoncrypto
Benchmark results on Cortex-A8 (1008 Mhz):
Before:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 18.88 ns/B 50.51 MiB/s 19.03 c/B
STREAM dec | 18.89 ns/B 50.49 MiB/s 19.04 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 13.60 ns/B 70.14 MiB/s 13.71 c/B
STREAM dec | 13.60 ns/B 70.13 MiB/s 13.71 c/B
After:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 5.48 ns/B 174.1 MiB/s 5.52 c/B
STREAM dec | 5.47 ns/B 174.2 MiB/s 5.52 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 3.65 ns/B 260.9 MiB/s 3.68 c/B
STREAM dec | 3.65 ns/B 261.6 MiB/s 3.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'salsa20-amd64.S'.
* cipher/salsa20-amd64.S: New.
* cipher/salsa20.c (USE_AMD64): New macro.
[USE_AMD64] (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup)
(_gcry_salsa20_amd64_encrypt_blocks): New prototypes.
[USE_AMD64] (salsa20_keysetup, salsa20_ivsetup, salsa20_core): New.
[!USE_AMD64] (salsa20_core): Change 'src' to non-constant, update block
counter in 'salsa20_core' and return burn stack depth.
[!USE_AMD64] (salsa20_keysetup, salsa20_ivsetup): New.
(salsa20_do_setkey): Move generic key setup to 'salsa20_keysetup'.
(salsa20_setkey): Fix burn stack depth.
(salsa20_setiv): Move generic IV setup to 'salsa20_ivsetup'.
(salsa20_do_encrypt_stream) [USE_AMD64]: Process large buffers in AMD64
implementation.
(salsa20_do_encrypt_stream): Move stack burning to this function...
(salsa20_encrypt_stream, salsa20r12_encrypt_stream): ...from these
functions.
* configure.ac [x86-64]: Add 'salsa20-amd64.lo'.
--
Patch adds fast AMD64 assembly implementation for Salsa20. This implementation
is based on public domain code by D. J. Bernstein and it is available at
http://cr.yp.to/snuffle.html (amd64-xmm6). Implementation gains extra speed
by processing four blocks in parallel with help SSE2 instructions.
Benchmark results on Intel Core i5-4570 (3.2 Ghz):
Before:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 3.88 ns/B 246.0 MiB/s 12.41 c/B
STREAM dec | 3.88 ns/B 246.0 MiB/s 12.41 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 2.46 ns/B 387.9 MiB/s 7.87 c/B
STREAM dec | 2.46 ns/B 387.7 MiB/s 7.87 c/B
After:
SALSA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.985 ns/B 967.8 MiB/s 3.15 c/B
STREAM dec | 0.987 ns/B 966.5 MiB/s 3.16 c/B
=
SALSA20R12 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.636 ns/B 1500.5 MiB/s 2.03 c/B
STREAM dec | 0.636 ns/B 1499.2 MiB/s 2.04 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/blowfish-armv6.S => cipher/blowfish-arm.S: adapt to pre-armv6 CPUs.
* cipher/blowfish.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/camellia-armv6.S => cipher/camellia-arm.S: adapt to pre-armv6 CPUs.
* cipher/camellia.c, cipher-camellia-glue.c: enable assembly on armv4/armv5
little-endian CPUs.
* cipher/cast5-armv6.S => cipher/cast5-arm.S: adapt to pre-armv6 CPUs.
* cipher/cast5.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/rijndael-armv6.S => cipher/rijndael-arm.S: adapt to pre-armv6 CPUs.
* cipher/rijndael.c: enable assembly on armv4/armv5 little-endian CPUs.
* cipher/twofish-armv6.S => cipher/twofish-arm.S: adapt to pre-armv6 CPUs.
* cipher/twofish.c: enable assembly on armv4/armv5 little-endian CPUs.
--
Our ARMv6 assembly optimized code can be easily adapted to earlier CPUs.
The only incompatible place is rev instruction used to do byte swapping.
Replace it on <= ARMv6 with a series of 4 instructions.
Compare:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 620ms 610ms 650ms 680ms 620ms 630ms 660ms 660ms 630ms 630ms
CAMELLIA128 720ms 720ms 780ms 790ms 770ms 760ms 780ms 780ms 770ms 760ms
CAMELLIA256 910ms 910ms 970ms 970ms 960ms 950ms 970ms 970ms 960ms 950ms
CAST5 820ms 820ms 930ms 920ms 890ms 860ms 930ms 920ms 880ms 890ms
BLOWFISH 550ms 560ms 650ms 660ms 630ms 600ms 660ms 650ms 610ms 620ms
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 130ms 140ms 180ms 200ms 160ms 170ms 190ms 200ms 170ms 170ms
CAMELLIA128 150ms 160ms 210ms 220ms 200ms 190ms 210ms 220ms 190ms 190ms
CAMELLIA256 180ms 180ms 260ms 240ms 240ms 230ms 250ms 250ms 230ms 230ms
CAST5 170ms 160ms 270ms 120ms 240ms 130ms 260ms 270ms 130ms 120ms
BLOWFISH 160ms 150ms 260ms 110ms 230ms 120ms 250ms 260ms 110ms 120ms
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
[ jk: in camellia.h and twofish.c, USE_ARMV6_ASM => USE_ARM_ASM ]
[ jk: fix blowfish-arm.S when __ARM_FEATURE_UNALIGNED defined ]
[ jk: in twofish.S remove defined(HAVE_ARM_ARCH_V6) ]
[ jk: ARMv6 => ARM in comments ]
|
|
* cipher/ecc-ecdsa.c, cipher/ecc-eddsa.c, cipher/ecc-gost.c: New.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add new files.
* configure.ac (GCRYPT_PUBKEY_CIPHERS): Add new files.
* cipher/ecc.c (point_init, point_free): Move to ecc-common.h.
(sign_ecdsa): Move to ecc-ecdsa.c as _gcry_ecc_ecdsa_sign.
(verify_ecdsa): Move to ecc-ecdsa.c as _gcry_ecc_ecdsa_verify.
(sign_gost): Move to ecc-gots.c as _gcry_ecc_gost_sign.
(verify_gost): Move to ecc-gost.c as _gcry_ecc_gost_verify.
(sign_eddsa): Move to ecc-eddsa.c as _gcry_ecc_eddsa_sign.
(verify_eddsa): Move to ecc-eddsa.c as _gcry_ecc_eddsa_verify.
(eddsa_generate_key): Move to ecc-eddsa.c as _gcry_ecc_eddsa_genkey.
(reverse_buffer): Move to ecc-eddsa.c.
(eddsa_encodempi, eddsa_encode_x_y): Ditto.
(_gcry_ecc_eddsa_encodepoint, _gcry_ecc_eddsa_decodepoint): Ditto.
--
This change should make it easier to add new ECC algorithms.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/Makefile.am: Add 'twofish-armv6.S'.
* cipher/twofish-armv6.S: New.
* cipher/twofish.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_twofish_armv6_encrypt_block)
(_gcry_twofish_armv6_decrypt_block): New prototypes.
[USE_AMDV6_ASM] (twofish_encrypt, twofish_decrypt): Add.
[USE_AMD64_ASM] (do_twofish_encrypt, do_twofish_decrypt): Remove.
(_gcry_twofish_ctr_enc, _gcry_twofish_cfb_dec): Use 'twofish_encrypt'
instead of 'do_twofish_encrypt'.
(_gcry_twofish_cbc_dec): Use 'twofish_decrypt' instead of
'do_twofish_decrypt'.
* configure.ac [arm]: Add 'twofish-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for Twofish. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old (gcc-4.8) vs new (twofish-asm), Cortex-A8 (on armhf):
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
TWOFISH 1.23x 1.25x 1.16x 1.26x 1.16x 1.30x 1.18x 1.17x 1.23x 1.23x 1.22x 1.22x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cipher-ccm.c'.
* cipher/cipher-ccm.c: New.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode'.
(_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt)
(_gcry_cipher_ccm_set_nonce, _gcry_cipher_ccm_authenticate)
(_gcry_cipher_ccm_get_tag, _gcry_cipher_ccm_check_tag)
(_gcry_cipher_ccm_set_lengths): New prototypes.
* cipher/cipher.c (gcry_cipher_open, cipher_encrypt, cipher_decrypt)
(_gcry_cipher_setiv, _gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag, gry_cipher_ctl): Add handling for CCM mode.
* doc/gcrypt.texi: Add documentation for GCRY_CIPHER_MODE_CCM.
* src/gcrypt.h.in (gcry_cipher_modes): Add 'GCRY_CIPHER_MODE_CCM'.
(gcry_ctl_cmds): Add 'GCRYCTL_SET_CCM_LENGTHS'.
(GCRY_CCM_BLOCK_LEN): New.
* tests/basic.c (check_ccm_cipher): New.
(check_cipher_modes): Call 'check_ccm_cipher'.
* tests/benchmark.c (ccm_aead_init): New.
(cipher_bench): Add handling for AEAD modes and add CCM benchmarking.
--
Patch adds CCM (Counter with CBC-MAC) mode as defined in RFC 3610 and NIST
Special Publication 800-38C.
Example for encrypting message (split in two buffers; buf1, buf2) and
authenticating additional non-encrypted data (split in two buffers; aadbuf1,
aadbuf2) with authentication tag length of eigth bytes:
size_t params[3];
taglen = 8;
gcry_cipher_setkey(h, key, len(key));
gcry_cipher_setiv(h, nonce, len(nonce));
params[0] = len(buf1) + len(buf2); /* 0: enclen */
params[1] = len(aadbuf1) + len(aadbuf2); /* 1: aadlen */
params[2] = taglen; /* 2: authtaglen */
gcry_cipher_ctl(h, GCRYCTL_SET_CCM_LENGTHS, params, sizeof(size_t) * 3);
gcry_cipher_authenticate(h, aadbuf1, len(aadbuf1));
gcry_cipher_authenticate(h, aadbuf2, len(aadbuf2));
gcry_cipher_encrypt(h, buf1, len(buf1), buf1, len(buf1));
gcry_cipher_encrypt(h, buf2, len(buf2), buf2, len(buf2));
gcry_cipher_gettag(h, tag, taglen);
Example for decrypting above message and checking authentication tag:
size_t params[3];
taglen = 8;
gcry_cipher_setkey(h, key, len(key));
gcry_cipher_setiv(h, nonce, len(nonce));
params[0] = len(buf1) + len(buf2); /* 0: enclen */
params[1] = len(aadbuf1) + len(aadbuf2); /* 1: aadlen */
params[2] = taglen; /* 2: authtaglen */
gcry_cipher_ctl(h, GCRYCTL_SET_CCM_LENGTHS, params, sizeof(size_t) * 3);
gcry_cipher_authenticate(h, aadbuf1, len(aadbuf1));
gcry_cipher_authenticate(h, aadbuf2, len(aadbuf2));
gcry_cipher_decrypt(h, buf1, len(buf1), buf1, len(buf1));
gcry_cipher_decrypt(h, buf2, len(buf2), buf2, len(buf2));
err = gcry_cipher_checktag(h, tag, taglen);
if (gpg_err_code (err) == GPG_ERR_CHECKSUM)
{ /* Authentication failed. */ }
else if (err == 0)
{ /* Authentication ok. */ }
Example for encrypting message without additional authenticated data:
size_t params[3];
taglen = 10;
gcry_cipher_setkey(h, key, len(key));
gcry_cipher_setiv(h, nonce, len(nonce));
params[0] = len(buf1); /* 0: enclen */
params[1] = 0; /* 1: aadlen */
params[2] = taglen; /* 2: authtaglen */
gcry_cipher_ctl(h, GCRYCTL_SET_CCM_LENGTHS, params, sizeof(size_t) * 3);
gcry_cipher_encrypt(h, buf1, len(buf1), buf1, len(buf1));
gcry_cipher_gettag(h, tag, taglen);
To reset CCM state for cipher handle, one can either set new nonce or use
'gcry_cipher_reset'.
This implementation reuses existing CTR mode code for encryption/decryption
and is there for able to process multiple buffers that are not multiple of
blocksize. AAD data maybe also be passed into gcry_cipher_authenticate
in non-blocksize chunks.
[v4]: GCRYCTL_SET_CCM_PARAMS => GCRY_SET_CCM_LENGTHS
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/pubkey-util.c: New.
(_gcry_pk_util_get_nbits): New. Based on code from gcry_pk_genkey.
(_gcry_pk_util_get_rsa_use_e): Ditto.
* cipher/pubkey.c (gcry_pk_genkey): Strip most code and pass.
* cipher/rsa.c (rsa_generate): Remove args ALGO, NBITS and EVALUE.
Call new fucntions to get these values.
* cipher/dsa.c (dsa_generate): Remove args ALGO, NBITS and EVALUE.
Call _gcry_pk_util_get_nbits to get nbits. Always parse genparms.
* cipher/elgamal.c (elg_generate): Ditto.
* cipher/ecc.c (ecc_generate): Ditto.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/rsa-common: New.
* cipher/pubkey.c (pkcs1_encode_for_encryption): Move to rsa-common.c
and rename to _gcry_rsa_pkcs1_encode_for_enc.
(pkcs1_decode_for_encryption): Move to rsa-common.c and rename to
_gcry_rsa_pkcs1_decode_for_enc.
(pkcs1_encode_for_signature): Move to rsa-common.c and rename to
_gcry_rsa_pkcs1_encode_for_sig.
(oaep_encode): Move to rsa-common.c and rename to
_gcry_rsa_oaep_encode.
(oaep_decode): Move to rsa-common.c and rename to
_gcry_rsa_oaep_decode.
(pss_encode): Move to rsa-common.c and rename to _gcry_rsa_pss_encode.
(pss_verify): Move to rsa-common.c and rename to _gcry_rsa_pss_decode.
(octet_string_from_mpi, mgf1): Move to rsa-common.c.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_12_256)
(GCRY_MD_GOSTR3411_12_512): New.
* cipher/stribog.c: New.
* configure.ac (available_digests_64): Add stribog.
* src/cipher.h: Declare Stribog declarations.
* cipher/md.c: Register Stribog digest.
* tests/basic.c (check_digests) Add 4 testcases for Stribog from
standard.
* doc/gcrypt.texi: Document new constants.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|