Age | Commit message (Collapse) | Author | Files | Lines |
|
* cipher/Makefile.am: Add 'crc-intel-pclmul.c'.
* cipher/crc-intel-pclmul.c: New.
* cipher/crc.c (USE_INTEL_PCLMUL): New macro.
(CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'.
[USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul)
(gcry_crc24rfc2440_intel_pclmul): New.
(crc32_init, crc32rfc1510_init, crc24rfc2440_init)
[USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL
HW features detected.
(crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL
implementation if enabled.
(crc24_init): Document storage format of 24-bit CRC.
(crc24_next4): Use only 'data' for last table look-up.
* configure.ac: Add 'crc-intel-pclmul.lo'.
* src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include
Intel SSE4.1.
* src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection.
* src/hwfeatures.c (hwflist): Add 'intel-sse4.1'.
* tests/basic.c (fillbuf_count): New.
(check_one_md): Add "?" check (million byte data-set with byte pattern
0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?"
checks.
(check_one_md_multi): Skip "?".
(check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256,
SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160,
CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!"
test-vectors for CRC32_RFC1510 and CRC24_RFC2440.
--
Add Intel PCLMUL accelerated implmentations of CRC algorithms.
CRC performance is improved ~11x on x86_64 and i386 on Intel
Haswell, and ~2.7x on Intel Sandy-bridge.
Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B
CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B
CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B
CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B
CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B
Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B
CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B
CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B
CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B
CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B
Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B
CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B
CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B
CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B
CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/sha1.c (sha1_init): Use HWF_INTEL_FAST_SHLD instead of
HWF_INTEL_CPU.
* cipher/sha256.c (sha256_init, sha224_init): Ditto.
* cipher/sha512.c (sha512_init, sha384_init): Ditto.
* src/g10lib.h (HWF_INTEL_FAST_SHLD): New.
(HWF_INTEL_BMI2, HWF_INTEL_SSSE3, HWF_INTEL_PCLMUL, HWF_INTEL_AESNI)
(HWF_INTEL_RDRAND, HWF_INTEL_AVX, HWF_INTEL_AVX2)
(HWF_ARM_NEON): Update.
* src/hwf-x86.c (detect_x86_gnuc): Add detection of Intel Core
CPUs with fast SHLD/SHRD instruction.
* src/hwfeatures.c (hwflist): Add "intel-fast-shld".
--
Intel Core CPUs since codename sandy-bridge have been able to
execute SHLD/SHRD instructions faster than rotate instructions
ROL/ROR. Since SHLD/SHRD can be used to do rotation, some
optimized implementations (SHA1/SHA256/SHA512) use SHLD/SHRD
instructions in-place of ROL/ROR.
This patch provides more accurate detection of CPUs with
fast SHLD implementation.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and
'sha512-avx2-bmi2-amd64.S'.
* cipher/sha512-avx-amd64.S: New.
* cipher/sha512-avx2-bmi2-amd64.S: New.
* cipher/sha512.c (USE_AVX, USE_AVX2): New.
(SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'.
(SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'.
(sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'.
(sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'.
[USE_AVX] (_gcry_sha512_transform_amd64_avx): New.
[USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New.
(transform) [USE_AVX2]: Add call for AVX2 implementation.
(transform) [USE_AVX]: Add call for AVX implementation.
* configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check.
(sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'.
* doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'.
* src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New.
* src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2".
* src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and
HWF_INTEL_BMI2.
--
Patch adds fast AVX and AVX2 implementation of SHA-512 by Intel Corporation.
The assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA512 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$
Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's
faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much
slower than RORQ, so therefore AVX implementation is (for now) limited
to Intel CPUs.
Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional
HWF flag.
Benchmarks:
cpu Old SSSE3 AVX/AVX2 Old vs AVX/AVX2
vs SSSE3
Intel i5-4570 10.11 c/B 7.56 c/B 6.72 c/B 1.50x 1.12x
Intel i5-2450M 14.11 c/B 10.53 c/B 8.88 c/B 1.58x 1.18x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha256-ssse3-amd64.S'.
* cipher/sha256-ssse3-amd64.S: New.
* cipher/sha256.c (USE_SSSE3): New.
(SHA256_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha256_init, sha224_init) [USE_SSSE3]: Initialize 'use_ssse3'.
(transform): Rename to...
(_transform): This.
[USE_SSSE3] (_gcry_sha256_transform_amd64_ssse3): New.
(transform): New.
* configure.ac (HAVE_INTEL_SYNTAX_PLATFORM_AS): New check.
(sha256): Add 'sha256-ssse3-amd64.lo'.
* doc/gcrypt.texi: Document 'intel-ssse3'.
* src/g10lib.h (HWF_INTEL_SSSE3): New.
* src/hwfeatures.c (hwflist): Add "intel-ssse3".
* src/hwf-x86.c (detect_x86_gnuc): Test for SSSE3.
--
Patch adds fast SSSE3 implementation of SHA-256 by Intel Corporation. The
assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA - 256 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html
Benchmarks:
cpu Old New Diff
Intel i5-4570 13.99 c/B 10.66 c/B 1.31x
Intel i5-2450M 21.53 c/B 15.79 c/B 1.36x
Intel Core2 T8100 20.84 c/B 15.07 c/B 1.38x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/hwfeatures.c: Inclyde syslog.h and ctype.h.
(HWF_DENY_FILE): New.
(my_isascii): New.
(parse_hwf_deny_file): New.
(_gcry_detect_hw_features): Call it.
* src/mpicalc.c (main): Correctly initialize Libgcrypt. Add options
"--print-config" and "--disable-hwf".
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/global.c (hwflist, disabled_hw_features): Move to ..
* src/hwfeatures.c: here.
(_gcry_disable_hw_feature): New.
(_gcry_enum_hw_features): New.
(_gcry_detect_hw_features): Remove arg DISABLED_FEATURES.
* src/global.c (print_config, _gcry_vcontrol, global_init): Adjust
accordingly.
--
It is better to keep the hardware feature infor at one place.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* configure.ac: Add option --disable-neon-support.
(HAVE_GCC_INLINE_ASM_NEON): New.
(ENABLE_NEON_SUPPORT): New.
[arm]: Add 'hwf-arm.lo' as HW feature module.
* src/Makefile.am: Add 'hwf-arm.c'.
* src/g10lib.h (HWF_ARM_NEON): New macro.
* src/global.c (hwflist): Add HWF_ARM_NEON entry.
* src/hwf-arm.c: New file.
* src/hwf-common.h (_gcry_hwf_detect_arm): New prototype.
* src/hwfeatures.c (_gcry_detect_hw_features) [HAVE_CPU_ARCH_ARM]: Add
call to _gcry_hwf_detect_arm.
--
Add HW detection module for detecting ARM NEON instruction set. ARM does not
have cpuid instruction so we have to rely on OS to pass feature set information
to user-space. For linux, NEON support can be detected by parsing
'/proc/self/auxv' for hardware capabilities information. For other OSes, NEON
can be detected by checking if platform/compiler only supports NEON capable
CPUs (by check if __ARM_NEON__ macro is defined).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* configure.ac (GCRYPT_HWF_MODULES): New.
(HAVE_CPU_ARCH_X86, HAVE_CPU_ARCH_ALPHA, HAVE_CPU_ARCH_SPARC)
(HAVE_CPU_ARCH_MIPS, HAVE_CPU_ARCH_M68K, HAVE_CPU_ARCH_PPC)
(HAVE_CPU_ARCH_ARM): New AC_DEFINEs.
* mpi/config.links (mpi_cpu_arch): New.
* src/global.c (print_config): Print new tag "cpu-arch".
* src/Makefile.am (libgcrypt_la_SOURCES): Add hwf-common.h
(EXTRA_libgcrypt_la_SOURCES): New.
(gcrypt_hwf_modules): New.
(libgcrypt_la_DEPENDENCIES, libgcrypt_la_LIBADD): Add that one.
* src/hwfeatures.c: Factor most code out to ...
* src/hwf-x86.c: New file.
(detect_x86_gnuc): Return the feature vector.
(_gcry_hwf_detect_x86): New.
* src/hwf-common.h: New.
* src/hwfeatures.c (_gcry_detect_hw_features): Dispatch using
HAVE_CPU_ARCH_ macros.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/hwfeatures.c [__i386__ && __GNUC__] (detect_ia32_gnuc): Remove.
[__x86_64__ && __GNUC__] (detect_x86_64_gnuc): Remove.
[__i386__ && __GNUC__] (is_cpuid_available, get_cpuid)
(HAS_X86_CPUID): New.
[__x86_64__ && __GNUC__] (is_cpuid_available, get_cpuid)
(HAS_X86_CPUID): New.
[HAS_X86_CPUID] (detect_x86_gnuc): New.
(_gcry_detect_hw_features) [__i386__ && GNUC]: Remove detect_ia32_gnuc
call.
(_gcry_detect_hw_features) [__x86_64__ && GNUC]: Remove
detect_x86_64_gnuc call.
(_gcry_detect_hw_features) [HAS_X86_CPUID]: Add detect_x86_gnuc call.
--
For hwfeatures.c clean up, merge i386/x86-64 hardware detection and move
i386/x86-64 spesific assembler to separate functions, is_cpuid_available() and
get_cpuid().
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|
|
* configure.ac: Add option --disable-drng-support.
(ENABLE_DRNG_SUPPORT): New.
* random/rndhw.c (USE_DRNG): New.
(rdrand_long, rdrand_nlong, poll_drng): New.
(_gcry_rndhw_poll_fast, _gcry_rndhw_poll_slow): Call poll function.
* src/g10lib.h (HWF_INTEL_RDRAND): New.
* src/global.c (hwflist): Add "intel-rdrand".
* src/hwfeatures.c (detect_x86_64_gnuc) [ENABLE_DRNG_SUPPORT]: Detect
RDRAND.
(detect_ia32_gnuc) [ENABLE_DRNG_SUPPORT]: Detect RDRAND.
--
This patch provides support for using Digital Random Number Generator (DRNG)
engine, which is available on the latest Intel's CPUs. DRNG engine is
accesible via new the RDRAND instruction.
This patch adds the following:
- support for disabling using of rdrand instruction
- checking for RDRAND instruction support using cpuid
- RDRAND usage implementation
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
ChangeLog and editorial changes by wk.
|
|
* src/hwfeatures.c (detect_ia32_gnuc): Detect AESNI support only if
that support has been enabled.
--
We better do not try to detect AESNI support if the support has been
disabled in the configure run. Disabling the support might have been
done due to problem with the AESNI support on a certain platform and
we can't exclude problem for sure with the detection code either.
|
|
* cipher/rijndael.c [ENABLE_AESNI_SUPPORT]: Enable USE_AESNI on x86-64.
(do_setkey) [USE_AESNI_is_disabled_here]: Use %[key] and %[ksch]
directly as registers instead of using temporary register %%esi.
[USE_AESNI] (do_aesni_enc_aligned, do_aesni_dec_aligned, do_aesni_cfb,
do_aesni_ctr, do_aesni_ctr_4): Use %[key] directly as register instead
of using temporary register %%esi.
[USE_AESNI] (do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Change %[key]
from generic "g" type to register "r".
* src/hwfeatures.c (_gcry_detect_hw_features) [__x86_64__]: Do not
clear AES-NI feature flag.
--
AES-NI assembler uses %%esi for key-material pointer register. However
%[key] can be marked as "r" (register) and automatically be 64bit on
x86-64 and be 32bit on i386.
So use %[key] for pointer register instead of %esi and that way make
same AES-NI code work on both x86-64 and i386.
[v2]
- Add GNU style changelog
- Fixed do_setkey changes, use %[ksch] for output instead of %[key]
- Changed [key] assembler arguments from "g" to "r" to force use of
registers in all cases (when tested v1, "g" did work as indented
and %[key] mapped to register on i386 and x86-64, but that might
not happen always).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|
|
* src/hwfeatures.c (detect_x86_64_gnuc, detect_ia32_gnuc): Allow
Intel features be detect from CPU by other vendors too.
--
detect_x86_64_gnuc() and detect_ia32_gnuc() incorrectly exclude Intel
features on all other vendor CPUs. What we want here, is to detect if
CPU from any vendor support said Intel feature (in this case AES-NI).
[v2]
- Add GNU style changelog
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|
|
* src/hwfeatures.c (detect_x86_64_gnuc): Add missing %ebx assembler
clobbers.
(detect_x86_64_gnuc, detect_ia32_gnuc) [ENABLE_PADLOCK_SUPPORT]: Add
missing %ecx assembler clobbers.
--
detect_x86_64_gnuc() and detect_ia32_gnuc() have missing clobbers in
assembler statements. "%ebx" is missing in x86-64, probably because
copy-paste error (i386 code saves and restores %ebx to/from stack).
"%ecx" is missing from PadLock detection.
[v2]
- add GNU style changelog
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|
|
* src/hwfeatures.c (_gcry_detect_hw_features) [__x86_64__]: Clear
AESNI feature flag.
|
|
* cipher/rijndael.c: Replace C99 feature from last patch. Keep cpp
lines short.
* random/rndhw.c: Keep cpp lines short.
* src/hwfeatures.c (_gcry_detect_hw_features): Make cpp def chain
better readable.
|
|
* cipher/rijndael.c: Duplicate x86 assembly and convert to x86_64.
* random/rndhw.c: Likewise.
* src/hwfeatures.c: Likewise.
--
Changes made to the x86 assembly:
- *l -> *q (long -> quad)
- e** registers -> r** registers (use widest registers available)
- don't mess with ebx GOT register
Tested with make check on VIA Nano X2 L4350
Signed-off-by: Rafaël Carré <funman@videolan.org>
|
|
Really a kind of bown paper bag bug: Use AND and not SUB for bit
testing. I should have known that, given that 30 years ago I wrote
almost everything in asm.
|
|
This option is useful to disable detected hardware features. It has
been implemented in benchmark, so that it is now possible to run
tests/benchmark --disable-hwf intel-aesni cipher aes aes192 aes256
to compare the use of AES-NI insns to the pure C code.
|
|
|
|
Check and install the standard git pre-commit hook.
|
|
|
|
|
|
FIPS cleanups.
Documentation upodates.
|
|
* hwfeatures.c (detect_ia32_gnuc): Fix inline asm.
|
|
|
|
|
|
This can be improved by using fucntion aliases instead
of wrapper functions.
|