summaryrefslogtreecommitdiff
path: root/cipher/sha512-avx2-bmi2-amd64.S
AgeCommit message (Collapse)AuthorFilesLines
2015-05-02Enable AMD64 SHA512 implementations for WIN64Jussi Kivilinna1-2/+9
* cipher/sha512-avx-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/sha512-avx-bmi2-amd64.S: Ditto. * cipher/sha512-ssse3-amd64.S: Ditto. * cipher/sha512.c (USE_SSSE3, USE_AVX, USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. [USE_SSSE3 || USE_AVX || USE_AVX2] (ASM_FUNC_ABI) (ASM_EXTRA_STACK): New. (_gcry_sha512_transform_amd64_ssse3, _gcry_sha512_transform_amd64_avx) (_gcry_sha512_transform_amd64_avx_bmi2): Add ASM_FUNC_ABI to prototypes. (transform): Add ASM_EXTRA_STACK to stack burn value. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-14Minor fixes to SHA assembly implementationsJussi Kivilinna1-13/+3
* cipher/Makefile.am: Correct 'sha256-avx*.S' to 'sha512-avx*.S'. * cipher/sha1-ssse3-amd64.S: First line, correct filename. * cipher/sha256-ssse3-amd64.S: Return correct stack burn depth. * cipher/sha512-avx-amd64.S: Use 'vzeroall' to clear registers. * cipher/sha512-avx2-bmi2-amd64.S: Ditto and return correct stack burn depth. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13Add missing register clearing in to SHA-256 and SHA-512 assemblyJussi Kivilinna1-0/+14
* cipher/sha256-ssse3-amd64.S: Clear used XMM/YMM registers at return. * cipher/sha512-avx-amd64.S: Ditto. * cipher/sha512-avx2-bmi2-amd64.S: Ditto. * cipher/sha512-ssse3-amd64.S: Ditto. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
2013-12-13SHA-512: Add AVX and AVX2 implementations for x86-64Jussi Kivilinna1-0/+783
* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and 'sha512-avx2-bmi2-amd64.S'. * cipher/sha512-avx-amd64.S: New. * cipher/sha512-avx2-bmi2-amd64.S: New. * cipher/sha512.c (USE_AVX, USE_AVX2): New. (SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'. (SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'. (sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'. (sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'. [USE_AVX] (_gcry_sha512_transform_amd64_avx): New. [USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New. (transform) [USE_AVX2]: Add call for AVX2 implementation. (transform) [USE_AVX]: Add call for AVX implementation. * configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check. (sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'. * doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'. * src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New. * src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2". * src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and HWF_INTEL_BMI2. -- Patch adds fast AVX and AVX2 implementation of SHA-512 by Intel Corporation. The assembly source is licensed under 3-clause BSD license, thus compatible with LGPL2.1+. Original source can be accessed at: http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs Implementation is described in white paper "Fast SHA512 Implementations on Intel® Architecture Processors" http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$ Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much slower than RORQ, so therefore AVX implementation is (for now) limited to Intel CPUs. Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional HWF flag. Benchmarks: cpu Old SSSE3 AVX/AVX2 Old vs AVX/AVX2 vs SSSE3 Intel i5-4570 10.11 c/B 7.56 c/B 6.72 c/B 1.50x 1.12x Intel i5-2450M 14.11 c/B 10.53 c/B 8.88 c/B 1.58x 1.18x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>