summaryrefslogtreecommitdiff
path: root/src/hwfeatures.c
AgeCommit message (Collapse)AuthorFilesLines
2016-03-12Add Intel PCLMUL implementations of CRC algorithmsJussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'crc-intel-pclmul.c'. * cipher/crc-intel-pclmul.c: New. * cipher/crc.c (USE_INTEL_PCLMUL): New macro. (CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'. [USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul) (gcry_crc24rfc2440_intel_pclmul): New. (crc32_init, crc32rfc1510_init, crc24rfc2440_init) [USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL HW features detected. (crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL implementation if enabled. (crc24_init): Document storage format of 24-bit CRC. (crc24_next4): Use only 'data' for last table look-up. * configure.ac: Add 'crc-intel-pclmul.lo'. * src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include Intel SSE4.1. * src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection. * src/hwfeatures.c (hwflist): Add 'intel-sse4.1'. * tests/basic.c (fillbuf_count): New. (check_one_md): Add "?" check (million byte data-set with byte pattern 0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?" checks. (check_one_md_multi): Skip "?". (check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256, SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160, CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!" test-vectors for CRC32_RFC1510 and CRC24_RFC2440. -- Add Intel PCLMUL accelerated implmentations of CRC algorithms. CRC performance is improved ~11x on x86_64 and i386 on Intel Haswell, and ~2.7x on Intel Sandy-bridge. Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2015-10-28hwf-x86: add detection for Intel CPUs with fast SHLD instructionJussi Kivilinna1-13/+14
* cipher/sha1.c (sha1_init): Use HWF_INTEL_FAST_SHLD instead of HWF_INTEL_CPU. * cipher/sha256.c (sha256_init, sha224_init): Ditto. * cipher/sha512.c (sha512_init, sha384_init): Ditto. * src/g10lib.h (HWF_INTEL_FAST_SHLD): New. (HWF_INTEL_BMI2, HWF_INTEL_SSSE3, HWF_INTEL_PCLMUL, HWF_INTEL_AESNI) (HWF_INTEL_RDRAND, HWF_INTEL_AVX, HWF_INTEL_AVX2) (HWF_ARM_NEON): Update. * src/hwf-x86.c (detect_x86_gnuc): Add detection of Intel Core CPUs with fast SHLD/SHRD instruction. * src/hwfeatures.c (hwflist): Add "intel-fast-shld". -- Intel Core CPUs since codename sandy-bridge have been able to execute SHLD/SHRD instructions faster than rotate instructions ROL/ROR. Since SHLD/SHRD can be used to do rotation, some optimized implementations (SHA1/SHA256/SHA512) use SHLD/SHRD instructions in-place of ROL/ROR. This patch provides more accurate detection of CPUs with fast SHLD implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13SHA-512: Add AVX and AVX2 implementations for x86-64Jussi Kivilinna1-0/+2
* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and 'sha512-avx2-bmi2-amd64.S'. * cipher/sha512-avx-amd64.S: New. * cipher/sha512-avx2-bmi2-amd64.S: New. * cipher/sha512.c (USE_AVX, USE_AVX2): New. (SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'. (sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'. (sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'. [USE_AVX] (_gcry_sha512_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New. (transform) [USE_AVX2]: Add call for AVX2 implementation. (transform) [USE_AVX]: Add call for AVX implementation. * configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check. (sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'. * doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'. * src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New. * src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2". * src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and HWF_INTEL_BMI2. -- Patch adds fast AVX and AVX2 implementation of SHA-512 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA512 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$ Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so therefore AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. Benchmarks: cpu Old SSSE3 AVX/AVX2 Old vs AVX/AVX2 vs SSSE3 Intel i5-4570 10.11 c/B 7.56 c/B 6.72 c/B 1.50x 1.12x Intel i5-2450M 14.11 c/B 10.53 c/B 8.88 c/B 1.58x 1.18x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-12SHA-256: Add SSSE3 implementation for x86-64Jussi Kivilinna1-0/+1
* cipher/Makefile.am: Add 'sha256-ssse3-amd64.S'. * cipher/sha256-ssse3-amd64.S: New. * cipher/sha256.c (USE_SSSE3): New. (SHA256_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'. (sha256_init, sha224_init) [USE_SSSE3]: Initialize 'use_ssse3'. (transform): Rename to... (_transform): This. [USE_SSSE3] (_gcry_sha256_transform_amd64_ssse3): New. (transform): New. * configure.ac (HAVE_INTEL_SYNTAX_PLATFORM_AS): New check. (sha256): Add 'sha256-ssse3-amd64.lo'. * doc/gcrypt.texi: Document 'intel-ssse3'. * src/g10lib.h (HWF_INTEL_SSSE3): New. * src/hwfeatures.c (hwflist): Add "intel-ssse3". * src/hwf-x86.c (detect_x86_gnuc): Test for SSSE3. -- Patch adds fast SSSE3 implementation of SHA-256 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA - 256 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html Benchmarks: cpu Old New Diff Intel i5-4570 13.99 c/B 10.66 c/B 1.31x Intel i5-2450M 21.53 c/B 15.79 c/B 1.36x Intel Core2 T8100 20.84 c/B 15.07 c/B 1.38x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-12Add a configuration file to disable hardware features.Werner Koch1-0/+76
* src/hwfeatures.c: Inclyde syslog.h and ctype.h. (HWF_DENY_FILE): New. (my_isascii): New. (parse_hwf_deny_file): New. (_gcry_detect_hw_features): Call it. * src/mpicalc.c (main): Correctly initialize Libgcrypt. Add options "--print-config" and "--disable-hwf". Signed-off-by: Werner Koch <wk@gnupg.org>
2013-12-12Move list of hardware features to hwfeatures.c.Werner Koch1-2/+55
* src/global.c (hwflist, disabled_hw_features): Move to .. * src/hwfeatures.c: here. (_gcry_disable_hw_feature): New. (_gcry_enum_hw_features): New. (_gcry_detect_hw_features): Remove arg DISABLED_FEATURES. * src/global.c (print_config, _gcry_vcontrol, global_init): Adjust accordingly. -- It is better to keep the hardware feature infor at one place. Signed-off-by: Werner Koch <wk@gnupg.org>
2013-08-31Add ARM HW feature detection module and add NEON detectionJussi Kivilinna1-0/+5
* configure.ac: Add option --disable-neon-support. (HAVE_GCC_INLINE_ASM_NEON): New. (ENABLE_NEON_SUPPORT): New. [arm]: Add 'hwf-arm.lo' as HW feature module. * src/Makefile.am: Add 'hwf-arm.c'. * src/g10lib.h (HWF_ARM_NEON): New macro. * src/global.c (hwflist): Add HWF_ARM_NEON entry. * src/hwf-arm.c: New file. * src/hwf-common.h (_gcry_hwf_detect_arm): New prototype. * src/hwfeatures.c (_gcry_detect_hw_features) [HAVE_CPU_ARCH_ARM]: Add call to _gcry_hwf_detect_arm. -- Add HW detection module for detecting ARM NEON instruction set. ARM does not have cpuid instruction so we have to rely on OS to pass feature set information to user-space. For linux, NEON support can be detected by parsing '/proc/self/auxv' for hardware capabilities information. For other OSes, NEON can be detected by checking if platform/compiler only supports NEON capable CPUs (by check if __ARM_NEON__ macro is defined). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2012-12-21Prepare for hardware feature detection on other platforms.Werner Koch1-182/+6
* configure.ac (GCRYPT_HWF_MODULES): New. (HAVE_CPU_ARCH_X86, HAVE_CPU_ARCH_ALPHA, HAVE_CPU_ARCH_SPARC) (HAVE_CPU_ARCH_MIPS, HAVE_CPU_ARCH_M68K, HAVE_CPU_ARCH_PPC) (HAVE_CPU_ARCH_ARM): New AC_DEFINEs. * mpi/config.links (mpi_cpu_arch): New. * src/global.c (print_config): Print new tag "cpu-arch". * src/Makefile.am (libgcrypt_la_SOURCES): Add hwf-common.h (EXTRA_libgcrypt_la_SOURCES): New. (gcrypt_hwf_modules): New. (libgcrypt_la_DEPENDENCIES, libgcrypt_la_LIBADD): Add that one. * src/hwfeatures.c: Factor most code out to ... * src/hwf-x86.c: New file. (detect_x86_gnuc): Return the feature vector. (_gcry_hwf_detect_x86): New. * src/hwf-common.h: New. * src/hwfeatures.c (_gcry_detect_hw_features): Dispatch using HAVE_CPU_ARCH_ macros. Signed-off-by: Werner Koch <wk@gnupg.org>
2012-12-21Clean up i386/x86-64 cpuid usage in hwfeatures.cJussi Kivilinna1-218/+122
* src/hwfeatures.c [__i386__ && __GNUC__] (detect_ia32_gnuc): Remove. [__x86_64__ && __GNUC__] (detect_x86_64_gnuc): Remove. [__i386__ && __GNUC__] (is_cpuid_available, get_cpuid) (HAS_X86_CPUID): New. [__x86_64__ && __GNUC__] (is_cpuid_available, get_cpuid) (HAS_X86_CPUID): New. [HAS_X86_CPUID] (detect_x86_gnuc): New. (_gcry_detect_hw_features) [__i386__ && GNUC]: Remove detect_ia32_gnuc call. (_gcry_detect_hw_features) [__x86_64__ && GNUC]: Remove detect_x86_64_gnuc call. (_gcry_detect_hw_features) [HAS_X86_CPUID]: Add detect_x86_gnuc call. -- For hwfeatures.c clean up, merge i386/x86-64 hardware detection and move i386/x86-64 spesific assembler to separate functions, is_cpuid_available() and get_cpuid(). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
2012-12-18Add support for using DRNG random number generatorDmitry Kasatkin1-0/+30
* configure.ac: Add option --disable-drng-support. (ENABLE_DRNG_SUPPORT): New. * random/rndhw.c (USE_DRNG): New. (rdrand_long, rdrand_nlong, poll_drng): New. (_gcry_rndhw_poll_fast, _gcry_rndhw_poll_slow): Call poll function. * src/g10lib.h (HWF_INTEL_RDRAND): New. * src/global.c (hwflist): Add "intel-rdrand". * src/hwfeatures.c (detect_x86_64_gnuc) [ENABLE_DRNG_SUPPORT]: Detect RDRAND. (detect_ia32_gnuc) [ENABLE_DRNG_SUPPORT]: Detect RDRAND. -- This patch provides support for using Digital Random Number Generator (DRNG) engine, which is available on the latest Intel's CPUs. DRNG engine is accesible via new the RDRAND instruction. This patch adds the following: - support for disabling using of rdrand instruction - checking for RDRAND instruction support using cpuid - RDRAND usage implementation Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com> ChangeLog and editorial changes by wk.
2012-11-21Do not detect AES-NI support if disabled by configure.Werner Koch1-4/+10
* src/hwfeatures.c (detect_ia32_gnuc): Detect AESNI support only if that support has been enabled. -- We better do not try to detect AESNI support if the support has been disabled in the configure run. Disabling the support might have been done due to problem with the AESNI support on a certain platform and we can't exclude problem for sure with the detection code either.
2012-11-21Add x86_64 support for AES-NIJussi Kivilinna1-3/+0
* cipher/rijndael.c [ENABLE_AESNI_SUPPORT]: Enable USE_AESNI on x86-64. (do_setkey) [USE_AESNI_is_disabled_here]: Use %[key] and %[ksch] directly as registers instead of using temporary register %%esi. [USE_AESNI] (do_aesni_enc_aligned, do_aesni_dec_aligned, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Use %[key] directly as register instead of using temporary register %%esi. [USE_AESNI] (do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Change %[key] from generic "g" type to register "r". * src/hwfeatures.c (_gcry_detect_hw_features) [__x86_64__]: Do not clear AES-NI feature flag. -- AES-NI assembler uses %%esi for key-material pointer register. However %[key] can be marked as "r" (register) and automatically be 64bit on x86-64 and be 32bit on i386. So use %[key] for pointer register instead of %esi and that way make same AES-NI code work on both x86-64 and i386. [v2] - Add GNU style changelog - Fixed do_setkey changes, use %[ksch] for output instead of %[key] - Changed [key] assembler arguments from "g" to "r" to force use of registers in all cases (when tested v1, "g" did work as indented and %[key] mapped to register on i386 and x86-64, but that might not happen always). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
2012-11-21Fix cpuid vendor-id check for i386 and x86-64Jussi Kivilinna1-27/+32
* src/hwfeatures.c (detect_x86_64_gnuc, detect_ia32_gnuc): Allow Intel features be detect from CPU by other vendors too. -- detect_x86_64_gnuc() and detect_ia32_gnuc() incorrectly exclude Intel features on all other vendor CPUs. What we want here, is to detect if CPU from any vendor support said Intel feature (in this case AES-NI). [v2] - Add GNU style changelog Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
2012-11-21Fix hwdetect assembler clobbersJussi Kivilinna1-4/+4
* src/hwfeatures.c (detect_x86_64_gnuc): Add missing %ebx assembler clobbers. (detect_x86_64_gnuc, detect_ia32_gnuc) [ENABLE_PADLOCK_SUPPORT]: Add missing %ecx assembler clobbers. -- detect_x86_64_gnuc() and detect_ia32_gnuc() have missing clobbers in assembler statements. "%ebx" is missing in x86-64, probably because copy-paste error (i386 code saves and restores %ebx to/from stack). "%ecx" is missing from PadLock detection. [v2] - add GNU style changelog Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
2012-06-21Clear AESNI feature flag for x86_64.Werner Koch1-0/+3
* src/hwfeatures.c (_gcry_detect_hw_features) [__x86_64__]: Clear AESNI feature flag.
2012-06-21Beautify last change.Werner Koch1-8/+14
* cipher/rijndael.c: Replace C99 feature from last patch. Keep cpp lines short. * random/rndhw.c: Keep cpp lines short. * src/hwfeatures.c (_gcry_detect_hw_features): Make cpp def chain better readable.
2012-06-21Enable VIA Padlock on x86_64 platformsRafaël Carré1-0/+97
* cipher/rijndael.c: Duplicate x86 assembly and convert to x86_64. * random/rndhw.c: Likewise. * src/hwfeatures.c: Likewise. -- Changes made to the x86 assembly: - *l -> *q (long -> quad) - e** registers -> r** registers (use widest registers available) - don't mess with ebx GOT register Tested with make check on VIA Nano X2 L4350 Signed-off-by: Rafaël Carré <funman@videolan.org>
2011-02-17Fix AES-NI detection.Werner Koch1-1/+1
Really a kind of bown paper bag bug: Use AND and not SUB for bit testing. I should have known that, given that 30 years ago I wrote almost everything in asm.
2011-02-16Add GCRYCTL_DISABLE_HWFWerner Koch1-2/+4
This option is useful to disable detected hardware features. It has been implemented in benchmark, so that it is now possible to run tests/benchmark --disable-hwf intel-aesni cipher aes aes192 aes256 to compare the use of AES-NI insns to the pure C code.
2011-02-11Renamed existing flag for AES-NI and fixed detection.Werner Koch1-1/+1
2011-02-04Nuked almost all trailing whitespace.Werner Koch1-12/+12
Check and install the standard git pre-commit hook.
2010-08-27Prepare support of ia32 AES instructionsWerner Koch1-5/+28
2009-11-29Fix detection of cpuid statement.Werner Koch1-1/+1
2008-08-29Changed the way the FIPS RNG is seeded.Werner Koch1-1/+4
FIPS cleanups. Documentation upodates.
2008-01-212008-01-21 Marcus Brinkmann <marcus@g10code.de>Marcus Brinkmann1-5/+4
* hwfeatures.c (detect_ia32_gnuc): Fix inline asm.
2007-12-10Preparing 1.4.0.Werner Koch1-2/+4
2007-11-29The ACE engine of VIA processors is now used for AES-128.Werner Koch1-0/+94
2007-10-31Reorganizatiosn to support the visibility attribute.Werner Koch1-0/+69
This can be improved by using fucntion aliases instead of wrapper functions.