From 5d601dd57fcb41aa2015ab655fd6fc51537da667 Mon Sep 17 00:00:00 2001 From: Jussi Kivilinna Date: Sat, 12 Mar 2016 17:07:21 +0200 Subject: Add Intel PCLMUL implementations of CRC algorithms * cipher/Makefile.am: Add 'crc-intel-pclmul.c'. * cipher/crc-intel-pclmul.c: New. * cipher/crc.c (USE_INTEL_PCLMUL): New macro. (CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'use_pclmul'. [USE_INTEL_PCLMUL] (_gcry_crc32_intel_pclmul) (gcry_crc24rfc2440_intel_pclmul): New. (crc32_init, crc32rfc1510_init, crc24rfc2440_init) [USE_INTEL_PCLMUL]: Select PCLMUL implementation if SSE4.1 and PCLMUL HW features detected. (crc32_write, crc24rfc2440_write) [USE_INTEL_PCLMUL]: Use PCLMUL implementation if enabled. (crc24_init): Document storage format of 24-bit CRC. (crc24_next4): Use only 'data' for last table look-up. * configure.ac: Add 'crc-intel-pclmul.lo'. * src/g10lib.h (HWF_*, HWF_INTEL_SSE4_1): Update HWF flags to include Intel SSE4.1. * src/hwf-x86.c (detect_x86_gnuc): Add SSE4.1 detection. * src/hwfeatures.c (hwflist): Add 'intel-sse4.1'. * tests/basic.c (fillbuf_count): New. (check_one_md): Add "?" check (million byte data-set with byte pattern 0x00,0x01,0x02,...); Test all buffer sizes 1 to 1000, for "!" and "?" checks. (check_one_md_multi): Skip "?". (check_digests): Add "?" test-vectors for MD5, SHA1, SHA224, SHA256, SHA384, SHA512, SHA3_224, SHA3_256, SHA3_384, SHA3_512, RIPEMD160, CRC32, CRC32_RFC1510, CRC24_RFC2440, TIGER1 and WHIRLPOOL; Add "!" test-vectors for CRC32_RFC1510 and CRC24_RFC2440. -- Add Intel PCLMUL accelerated implmentations of CRC algorithms. CRC performance is improved ~11x on x86_64 and i386 on Intel Haswell, and ~2.7x on Intel Sandy-bridge. Benchmark on Intel Core i5-4570 (x86_64, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B CRC32RFC1510 | 0.865 ns/B 1102.7 MiB/s 2.77 c/B CRC24RFC2440 | 0.865 ns/B 1103.0 MiB/s 2.77 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.079 ns/B 12051.7 MiB/s 0.253 c/B CRC32RFC1510 | 0.079 ns/B 12050.6 MiB/s 0.253 c/B CRC24RFC2440 | 0.079 ns/B 12100.0 MiB/s 0.252 c/B Benchmark on Intel Core i5-4570 (i386, 3.2 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.860 ns/B 1109.0 MiB/s 2.75 c/B CRC32RFC1510 | 0.861 ns/B 1108.3 MiB/s 2.75 c/B CRC24RFC2440 | 0.860 ns/B 1108.6 MiB/s 2.75 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC32RFC1510 | 0.078 ns/B 12207.0 MiB/s 0.250 c/B CRC24RFC2440 | 0.080 ns/B 11925.6 MiB/s 0.256 c/B Benchmark on Intel Core i5-2450M (x86_64, 2.5 Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 1.25 ns/B 762.3 MiB/s 3.13 c/B CRC32RFC1510 | 1.26 ns/B 759.1 MiB/s 3.14 c/B CRC24RFC2440 | 1.25 ns/B 764.9 MiB/s 3.12 c/B After: | nanosecs/byte mebibytes/sec cycles/byte CRC32 | 0.451 ns/B 2114.3 MiB/s 1.13 c/B CRC32RFC1510 | 0.451 ns/B 2114.6 MiB/s 1.13 c/B CRC24RFC2440 | 0.457 ns/B 2085.0 MiB/s 1.14 c/B Signed-off-by: Jussi Kivilinna --- configure.ac | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'configure.ac') diff --git a/configure.ac b/configure.ac index 8b503609..ff72e3fd 100644 --- a/configure.ac +++ b/configure.ac @@ -2023,6 +2023,13 @@ LIST_MEMBER(crc, $enabled_digests) if test "$found" = "1" ; then GCRYPT_DIGESTS="$GCRYPT_DIGESTS crc.lo" AC_DEFINE(USE_CRC, 1, [Defined if this module should be included]) + + case "${host}" in + i?86-*-* | x86_64-*-*) + # Build with the assembly implementation + GCRYPT_DIGESTS="$GCRYPT_DIGESTS crc-intel-pclmul.lo" + ;; + esac fi LIST_MEMBER(gostr3411-94, $enabled_digests) -- cgit v1.2.1