Age | Commit message (Collapse) | Author | Files | Lines |
|
* src/cipher-proto.h (gcry_pk_spec_t): Add fields ALGO and FLAGS.
* cipher/dsa.c (_gcry_pubkey_spec_dsa): Set these fields.
* cipher/ecc.c (_gcry_pubkey_spec_ecdsa): Ditto.
(_gcry_pubkey_spec_ecdh): Ditto.
* cipher/rsa.c (_gcry_pubkey_spec_rsa): Ditto.
* cipher/elgamal.c (_gcry_pubkey_spec_elg): Ditto
(_gcry_pubkey_spec_elg_e): New.
* cipher/pubkey.c: Change most code to replace the former module
system by a simpler system to gain information about the algorithms.
(disable_pubkey_algo): SImplified. Not anymore thread-safe, though.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt-module.h (gcry_pk_spec_t): Move this typedef and the
corresponding function typedefs to ...
* src/cipher-proto.h: here.
(pk_extra_spec_t): Remove typedef and merge fields into
gcry_pk_spec_t.
* cipher/rsa.c, cipher/dsa.c, cipher/elg.c, cipher/ecc.c: Ditto.
* cipher/pubkey.c: Change accordingly.
* src/cipher.h (_gcry_pubkey_extraspec_rsa): Remove.
(_gcry_pubkey_extraspec_dsa): Remove.
(_gcry_pubkey_extraspec_elg): Remove.
(_gcry_pubkey_extraspec_ecdsa): Remove.
--
Now that we don't have loadable modules anymore, we don't need to keep
the internal API between the modules and thus can simplify the code.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/gost.h (_gcry_gost_enc_one): Change return type to
'unsigned int'.
* cipher/gost28147.c (max): New macro.
(gost_encrypt_block, gost_decrypt_block): Return burn stack depth.
(_gcry_gost_enc_one): Return burn stack depth from gost_encrypt_block.
--
Return type for block cipher functions was lately changed from 'void' to
'unsigned int' to pass burn stack depth to cipher mode code. Patch fixes
gost28147 to return stack burn value.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
--
Dots and dashes in the names are probably not a good idea. I also
renamed the identifiers to names which are easier to remember.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_12_256)
(GCRY_MD_GOSTR3411_12_512): New.
* cipher/stribog.c: New.
* configure.ac (available_digests_64): Add stribog.
* src/cipher.h: Declare Stribog declarations.
* cipher/md.c: Register Stribog digest.
* tests/basic.c (check_digests) Add 4 testcases for Stribog from
standard.
* doc/gcrypt.texi: Document new constants.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* src/gcrypt.h.in (GCRY_MD_GOSTR3411_94): New.
* cipher/gostr3411-94.c: New.
* configure.ac (available_digests): Add gostr3411-94.
* src/cipher.h: Add gostr3411-94 definitions.
* cipher/md.c: Register GOST R 34.11-94.
* tests/basic.c (check_digests): Add 4 tests for GOST R 34.11-94
hash algo. Two are defined in the standard itself, two other are
more or less common tests - an empty string an exclamation mark.
* doc/gcrypt.texi: Add an entry describing GOST R 34.11-94 to the MD
algorithms table.
--
Add simple implementation of GOST R 34.11-94 hash function. Currently
there is no way to specify hash parameters (it always uses GOST R 34.11-94
test parameters).
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Stack burn value in gost3411_init added by wk.
|
|
* cipher/hash-common.c (_gcry_md_block_write): New function to handle
block md operations. The current implementation is limited to 64 byte
buffer and u32 block counter.
* cipher/md4.c, cipher/md5.c, cipher/rmd.h, cipher/rmd160.c
*cipher/sha1.c, cipher/sha256.c, cipher/tiger.c: Convert to use
_gcry_md_block_write.
--
Whirlpool and SHA512 are left as before, as SHA512 uses 128 bytes buffer
and u64 blocks counter and Whirlpool does not have trivial block handling
structure.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Indentation changes, minor edits and adjustment of
_gcry_sha1_hash_buffers by wk.
|
|
* src/gcrypt.h.in (GCRY_CIPHER_GOST28147): New.
* cipher/gost.h, cipher/gost28147.c: New.
* configure.ac (available_ciphers): Add gost28147.
* src/cipher.h: Add gost28147 definitions.
* cipher/cipher.c: Register gost28147.
* tests/basic.c (check_ciphers): Enable simple test for gost28147.
* doc/gcrypt.texi: document GCRY_CIPHER_GOST28147.
--
Add a very basic implementation of GOST 28147-89 cipher: from modes
defined in standard only ECB and CFB are supported, sbox is limited
to the "test variant" as provided in GOST 34.11-94.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* src/mpi.h (enum ecc_dialects): New.
* src/ec-context.h (mpi_ec_ctx_s): Add field DIALECT.
* cipher/ecc-common.h (elliptic_curve_t): Ditto.
* cipher/ecc-curves.c (ecc_domain_parms_t): Ditto.
(domain_parms): Add dialect values.
(_gcry_ecc_fill_in_curve): Set dialect.
(_gcry_ecc_get_curve): Ditto.
(_gcry_mpi_ec_new): Ditto.
(_gcry_ecc_get_param): Use ECC_DIALECT_STANDARD for now.
* cipher/ecc-misc.c (_gcry_ecc_curve_copy): Copy dialect.
(_gcry_ecc_dialect2str): New.
* mpi/ec.c (ec_p_init): Add arg DIALECT.
(_gcry_mpi_ec_p_internal_new): Ditto.
(_gcry_mpi_ec_p_new): Ditto.
* mpi/mpiutil.c (gcry_mpi_set_opaque): Set the secure flag.
(_gcry_mpi_set_opaque_copy): New.
* cipher/ecc-misc.c (_gcry_ecc_os2ec): Take care of an opaque MPI.
* cipher/ecc.c (eddsa_generate_key): New.
(generate_key): Rename to nist_generate_key and factor some code out
to ...
(ecc_generate_ext): here. Divert to eddsa_generate_key if desired.
(eddsa_decodepoint): Take care of an opaque MPI.
(ecc_check_secret_key): Ditto.
(ecc_sign): Ditto.
* cipher/pubkey.c (sexp_elements_extract_ecc): Store public and secret
key as opaque MPIs.
(gcry_pk_genkey): Add the curve_name also to the private key part of
the result.
* tests/benchmark.c (ecc_bench): Support Ed25519.
(main): Add option --debug.
* tests/curves.c (sample_key_2): Make sure that P and N are positive.
* tests/keygen.c (show): New.
(check_ecc_keys): Support Ed25519.
--
There are two main purposes of this patch: Add a key generation
feature for Ed25519 and add the "dialect" thingy which will eventually
be used to add curve specific optimization.
Note that the entire way of how we interface between the public key
modules and pubkey.c is overly complex and probably also the cause for
a lot of performance overhead. Given that we don't have the loadable
module system anymore, we should entirely get rid of the MPI-array
based internal interface and move parts of the s-expression handling
direct into the pubkey modules. This needs to be fixed or we are
turning Libgcrypt into another software incarnation of Heathrow
Airport.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/ecc-curves.c (domain_parms): Add curve "Ed25519".
* cipher/ecc.c (reverse_buffer): New.
(eddsa_encodempi): New.
(eddsa_encodepoint): New.
(eddsa_decodepoint): New.
(sign_eddsa): Implement.
(verify_eddsa): Implement.
(ecc_sign): Init unused Q. Pass public key to sign_eddsa.
(ecc_verify): Init pk.Q if not used. Pass public key verbatim to
verify_eddsa.
* cipher/pubkey.c (sexp_elements_extract): Add arg OPAQUE. Change all
callers to pass 0.
(sexp_to_sig): Add arg OPAQUE and pass it to sexp_elements_extract.
(sexp_data_to_mpi): Allow for a zero length "value".
(gcry_pk_verify): Reorder parameter processing. Pass OPAQUE flag as
required.
* mpi/ec.c (ec_invm): Print a warning if the inverse does not exist.
(_gcry_mpi_ec_get_affine): Implement for our Twisted Edwards curve
model.
(dup_point_twistededwards): Implement.
(add_points_twistededwards): Implement.
(_gcry_mpi_ec_mul_point): Support Twisted Edwards.
* mpi/mpicoder.c (do_get_buffer): Add arg FILL_LE.
(_gcry_mpi_get_buffer): Ditto. Change all callers.
(_gcry_mpi_get_secure_buffer): Ditto.
* src/sexp.c (_gcry_sexp_nth_opaque_mpi): New.
* tests/t-ed25519.c: New.
* tests/t-ed25519.inp: New.
* tests/t-mpi-point.c (basic_ec_math_simplified): Print some output
only in debug mode.
(twistededwards_math): New test.
(main): Call new test.
--
This is a non optimized version which takes far too long. On my X220
Thinkpad the 1024 test cases take 14 seconds (12 with --sign-with-pk).
There should be a lot of room for improvements.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* mpi/mpicoder.c (gcry_mpi_dump): Remove.
(_gcry_log_mpidump): Remove.
* src/misc.c (_gcry_log_printhex): Factor all code out to ...
(do_printhex): new. Add line wrapping a and compact printing.
(_gcry_log_printmpi): New.
* src/mpi.h (log_mpidump): Remove macro.
* src/g10lib.h (log_mpidump): Add compatibility macro.
(log_printmpi): New macro
* src/visibility.c (gcry_mpi_dump): Call _gcry_log_printmpi.
* cipher/primegen.c (prime_generate_internal): Replace gcry_mpi_dump
by log_printmpi.
(gcry_prime_group_generator): Ditto.
* cipher/pubkey.c: Remove extra colons from log_mpidump call.
* cipher/rsa.c (stronger_key_check): Use log_printmpi.
--
The values to debug get longer and longer and the different debug
functions made it hard to check them out. Now MPIs and hex buffers are
printed very similar. Lines may now wrap with an backslash as
indicator. MPIs are distinguished from plain buffers in the output by
always using a sign.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt.h.in (gcry_buffer_t): new.
(gcry_md_hash_buffers): New.
* src/visibility.c, src/visibility.h: Add wrapper for new function.
* src/libgcrypt.def, src/libgcrypt.vers: Export new function.
* cipher/md.c (gcry_md_hash_buffers): New.
* cipher/sha1.c (_gcry_sha1_hash_buffers): New.
* tests/basic.c (check_one_md_multi): New.
(check_digests): Run that test.
* tests/hmac.c (check_hmac_multi): New.
(main): Run that test.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/whirlpool.c (whirlpool_add): Remove shortcut return so that
byte counter is always properly updated.
--
Using the forthcoming gcry_md_hash_buffers() and its test suite, I
found that a message of size 62 won't yield the correct hash if it is
fed into Whirlpool into in chunks. The fix is obvious. The wrong
code was likely due to using similar structure as SHA-1 but neglecting
that bytes and not blocks are counted.
|
|
--
|
|
* cipher/rijndael-amd64.S: Correct 'RIP' macro for non-PIC build.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/scrypt.c (_salsa20_core): Fix endianess issues.
--
On big-endian systems 'tests/t-kdf' was failing scrypt tests. Patch fixes the
issue.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* src/gcrypt.h.in (GCRY_CIPHER_SALSA20R12): New.
* src/salsa20.c (salsa20_core, salsa20_do_encrypt_stream): Add support
for reduced round versions.
(salsa20r12_encrypt_stream, _gcry_cipher_spec_salsa20r12): Implement
Salsa20/12 - a 12 round version of Salsa20 selected by eStream.
* src/cipher.h: Declsare Salsa20/12 definition.
* cipher/cipher.c: Register Salsa20/12
* tests/basic.c: (check_stream_cipher, check_stream_cipher_large_block):
Populate Salsa20/12 tests with test vectors from ecrypt
(check_ciphers): Add simple test for Salsa20/12
--
Salsa20/12 is a reduced round version of Salsa20 that is amongst ciphers
selected by eSTREAM for Phase 3 of Profile 1 algorithm. Moreover it is
one of proposed ciphers for TLS (draft-josefsson-salsa20-tls-02).
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* mpi/ec.c (ec_p_init): Add args MODEL and P. Change all callers.
(_gcry_mpi_ec_p_internal_new): Ditto.
(_gcry_mpi_ec_p_new): Ditto.
* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Return
GPG_ERR_UNKNOWN_CURVE instead of invalid value. Init curve model.
* cipher/ecc.c (ecc_verify, ecc_encrypt_raw): Ditto.
* cipher/pubkey.c (sexp_data_to_mpi): Fix EDDSA flag error checking.
--
(fixes commit c26be7a337d0bf98193bc58e043209e46d0769bb)
|
|
* src/gcrypt.h.in (gcry_mpi_is_neg): New.
(gcry_mpi_neg, gcry_mpi_abs): New.
* mpi/mpiutil.c (_gcry_mpi_is_neg): New.
(_gcry_mpi_neg, _gcry_mpi_abs): New.
* src/visibility.c, src/visibility.h: Add wrappers.
* src/libgcrypt.def, src/libgcrypt.vers: Export them.
* src/mpi.h (mpi_is_neg): New. Rename old macro to mpi_has_sign.
* mpi/mpi-mod.c (_gcry_mpi_mod_barrett): Use mpi_has_sign.
* mpi/mpi-mpow.c (calc_barrett): Ditto.
* cipher/primegen.c (_gcry_derive_x931_prime): Ditto
* cipher/rsa.c (secret): Ditto.
|
|
* src/cipher.h (PUBKEY_FLAG_EDDSA): New.
* cipher/pubkey.c (pubkey_verify): Repalce args CMP and OPAQUEV by
CTX. Pass flags and hash algo to the verify function. Change all
verify functions to accept these args.
(sexp_data_to_mpi): Implement new flag "eddsa".
(gcry_pk_verify): Pass CTX instead of the compare function to
pubkey_verify.
* cipher/ecc.c (sign): Rename to sign_ecdsa. Change all callers.
(verify): Rename to verify_ecdsa. Change all callers.
(sign_eddsa, verify_eddsa): New stub functions.
(ecc_sign): Divert to sign_ecdsa or sign_eddsa.
(ecc_verify): Divert to verify_ecdsa or verify_eddsa.
|
|
* src/mpi.h (gcry_mpi_ec_models): New.
* src/ec-context.h (mpi_ec_ctx_s): Add MODEL.
* cipher/ecc-common.h (elliptic_curve_t): Ditto.
* cipher/ecc-curves.c (ecc_domain_parms_t): Ditto.
(domain_parms): Mark als as Weierstrass.
(_gcry_ecc_fill_in_curve): Check model.
(_gcry_ecc_get_curve): Set model to Weierstrass.
* cipher/ecc-misc.c (_gcry_ecc_model2str): New.
* cipher/ecc.c (generate_key, ecc_generate_ext): Print model in the
debug output.
* mpi/ec.c (_gcry_mpi_ec_dup_point): Switch depending on model.
Factor code out to ...
(dup_point_weierstrass): new.
(dup_point_montgomery, dup_point_twistededwards): New stub functions.
(_gcry_mpi_ec_add_points): Switch depending on model. Factor code out
to ...
(add_points_weierstrass): new.
(add_points_montgomery, add_points_twistededwards): New stub
functions.
* tests/Makefile.am (TESTS): Reorder tests.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt-module.h (gcry_cipher_encrypt_t)
(gcry_cipher_decrypt_t): Return 'unsigned int'.
* cipher/cipher.c (dummy_encrypt_block, dummy_decrypt_block): Return
zero.
(do_ecb_encrypt, do_ecb_decrypt): Get largest stack burn depth from
block cipher crypt function and burn stack at end.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt)
(_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_cbc_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
* cipher/blowfish.c (encrypt_block, decrypt_block): Return burn stack
depth.
* cipher/camellia-glue.c (camellia_encrypt, camellia_decrypt): Ditto.
* cipher/cast5.c (encrypt_block, decrypt_block): Ditto.
* cipher/des.c (do_tripledes_encrypt, do_tripledes_decrypt)
(do_des_encrypt, do_des_decrypt): Ditto.
* cipher/idea.c (idea_encrypt, idea_decrypt): Ditto.
* cipher/rijndael.c (rijndael_encrypt, rijndael_decrypt): Ditto.
* cipher/seed.c (seed_encrypt, seed_decrypt): Ditto.
* cipher/serpent.c (serpent_encrypt, serpent_decrypt): Ditto.
* cipher/twofish.c (twofish_encrypt, twofish_decrypt): Ditto.
* cipher/rfc2268.c (encrypt_block, decrypt_block): New.
(_gcry_cipher_spec_rfc2268_40): Use encrypt_block and decrypt_block.
--
Patch moves stack burning from block ciphers and cipher mode loop to end of
cipher mode functions. This greatly reduces the overall CPU usage of the
problematic _gcry_burn_stack. Internal cipher module API is changed so
that encrypt/decrypt functions now return the stack burn depth as unsigned
int to cipher mode function.
(Note, patch also adds missing burn_stack for RFC2268_40 cipher).
_gcry_burn_stack CPU time (looping tests/benchmark cipher blowfish):
arch CPU Old New
i386 Intel-Haswell 4.1% 0.16%
x86_64 Intel-Haswell 3.4% 0.07%
armhf Cortex-A8 8.7% 0.14%
New vs. old (armhf/Cortex-A8):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 1.05x 1.05x 1.04x 1.04x 1.04x 1.04x 1.07x 1.05x 1.04x 1.04x
3DES 1.04x 1.03x 1.04x 1.03x 1.04x 1.04x 1.04x 1.04x 1.04x 1.04x
CAST5 1.19x 1.20x 1.15x 1.00x 1.17x 1.00x 1.15x 1.05x 1.00x 1.00x
BLOWFISH 1.21x 1.22x 1.16x 1.00x 1.18x 1.00x 1.16x 1.16x 1.00x 1.00x
AES 1.09x 1.09x 1.00x 1.00x 1.00x 1.00x 1.07x 1.07x 1.00x 1.00x
AES192 1.11x 1.11x 1.00x 1.00x 1.00x 1.00x 1.08x 1.09x 1.01x 1.00x
AES256 1.07x 1.08x 1.01x .99x 1.00x 1.00x 1.07x 1.06x 1.00x 1.00x
TWOFISH 1.10x 1.09x 1.09x 1.00x 1.09x 1.00x 1.08x 1.09x 1.00x 1.00x
ARCFOUR 1.00x 1.00x
DES 1.07x 1.11x 1.06x 1.08x 1.07x 1.07x 1.06x 1.06x 1.06x 1.06x
TWOFISH128 1.10x 1.10x 1.09x 1.00x 1.09x 1.00x 1.08x 1.08x 1.00x 1.00x
SERPENT128 1.06x 1.07x 1.02x 1.00x 1.06x 1.00x 1.06x 1.05x 1.00x 1.00x
SERPENT192 1.07x 1.06x 1.03x 1.00x 1.06x 1.00x 1.06x 1.05x 1.00x 1.00x
SERPENT256 1.06x 1.07x 1.02x 1.00x 1.06x 1.00x 1.05x 1.06x 1.00x 1.00x
RFC2268_40 0.97x 1.01x 0.99x 0.98x 1.00x 0.97x 0.96x 0.96x 0.97x 0.97x
SEED 1.45x 1.54x 1.53x 1.56x 1.50x 1.51x 1.50x 1.50x 1.42x 1.42x
CAMELLIA128 1.08x 1.07x 1.06x 1.00x 1.07x 1.00x 1.06x 1.06x 1.00x 1.00x
CAMELLIA192 1.08x 1.08x 1.08x 1.00x 1.07x 1.00x 1.07x 1.07x 1.00x 1.00x
CAMELLIA256 1.08x 1.09x 1.07x 1.01x 1.08x 1.00x 1.07x 1.07x 1.00x 1.00x
SALSA20 .99x 1.00x
Raw data:
New (armhf/Cortex-A8):
Running each test 100 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 8620ms 8680ms 9640ms 10010ms 9140ms 8960ms 9630ms 9660ms 9180ms 9180ms
3DES 13990ms 14000ms 14780ms 15300ms 14320ms 14370ms 14780ms 14780ms 14480ms 14480ms
CAST5 2980ms 2980ms 3780ms 2300ms 3290ms 2320ms 3770ms 4100ms 2320ms 2320ms
BLOWFISH 2740ms 2660ms 3530ms 2060ms 3050ms 2080ms 3530ms 3530ms 2070ms 2070ms
AES 2200ms 2330ms 2330ms 2450ms 2270ms 2270ms 2700ms 2690ms 2330ms 2320ms
AES192 2550ms 2670ms 2700ms 2910ms 2630ms 2640ms 3060ms 3060ms 2680ms 2690ms
AES256 2920ms 3010ms 3040ms 3190ms 3010ms 3000ms 3380ms 3420ms 3050ms 3050ms
TWOFISH 2790ms 2840ms 3300ms 2950ms 3010ms 2870ms 3310ms 3280ms 2940ms 2940ms
ARCFOUR 2050ms 2050ms
DES 5640ms 5630ms 6440ms 6970ms 5960ms 6000ms 6440ms 6440ms 6120ms 6120ms
TWOFISH128 2790ms 2840ms 3300ms 2950ms 3010ms 2890ms 3310ms 3290ms 2930ms 2930ms
SERPENT128 4530ms 4340ms 5210ms 4470ms 4740ms 4620ms 5020ms 5030ms 4680ms 4680ms
SERPENT192 4510ms 4340ms 5190ms 4460ms 4750ms 4620ms 5020ms 5030ms 4680ms 4680ms
SERPENT256 4540ms 4330ms 5220ms 4460ms 4730ms 4600ms 5030ms 5020ms 4680ms 4680ms
RFC2268_40 10530ms 7790ms 11140ms 9490ms 10650ms 10710ms 11710ms 11690ms 11000ms 11000ms
SEED 4530ms 4540ms 5050ms 5380ms 4760ms 4810ms 5060ms 5060ms 4850ms 4860ms
CAMELLIA128 2660ms 2630ms 3170ms 2750ms 2880ms 2740ms 3170ms 3170ms 2780ms 2780ms
CAMELLIA192 3430ms 3400ms 3930ms 3530ms 3650ms 3500ms 3940ms 3940ms 3570ms 3560ms
CAMELLIA256 3430ms 3390ms 3940ms 3500ms 3650ms 3510ms 3930ms 3940ms 3550ms 3550ms
SALSA20 1910ms 1900ms
Old (armhf/Cortex-A8):
Running each test 100 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 9030ms 9100ms 10050ms 10410ms 9540ms 9360ms 10350ms 10190ms 9560ms 9570ms
3DES 14580ms 14460ms 15300ms 15720ms 14880ms 14900ms 15350ms 15330ms 15030ms 15020ms
CAST5 3560ms 3570ms 4350ms 2300ms 3860ms 2330ms 4340ms 4320ms 2330ms 2320ms
BLOWFISH 3320ms 3250ms 4110ms 2060ms 3610ms 2080ms 4100ms 4090ms 2070ms 2070ms
AES 2390ms 2530ms 2320ms 2460ms 2280ms 2270ms 2890ms 2880ms 2330ms 2330ms
AES192 2830ms 2970ms 2690ms 2900ms 2630ms 2650ms 3320ms 3330ms 2700ms 2690ms
AES256 3110ms 3250ms 3060ms 3170ms 3000ms 3000ms 3610ms 3610ms 3050ms 3060ms
TWOFISH 3080ms 3100ms 3600ms 2940ms 3290ms 2880ms 3560ms 3570ms 2940ms 2930ms
ARCFOUR 2060ms 2050ms
DES 6060ms 6230ms 6850ms 7540ms 6380ms 6400ms 6830ms 6840ms 6500ms 6510ms
TWOFISH128 3060ms 3110ms 3600ms 2940ms 3290ms 2890ms 3560ms 3560ms 2940ms 2930ms
SERPENT128 4820ms 4630ms 5330ms 4460ms 5030ms 4620ms 5300ms 5300ms 4680ms 4680ms
SERPENT192 4830ms 4620ms 5320ms 4460ms 5040ms 4620ms 5300ms 5300ms 4680ms 4680ms
SERPENT256 4820ms 4640ms 5330ms 4460ms 5030ms 4620ms 5300ms 5300ms 4680ms 4660ms
RFC2268_40 10260ms 7850ms 11080ms 9270ms 10620ms 10380ms 11250ms 11230ms 10690ms 10710ms
SEED 6580ms 6990ms 7710ms 8370ms 7140ms 7240ms 7600ms 7610ms 6870ms 6900ms
CAMELLIA128 2860ms 2820ms 3360ms 2750ms 3080ms 2740ms 3350ms 3360ms 2790ms 2790ms
CAMELLIA192 3710ms 3680ms 4240ms 3520ms 3910ms 3510ms 4200ms 4210ms 3560ms 3560ms
CAMELLIA256 3700ms 3680ms 4230ms 3520ms 3930ms 3510ms 4200ms 4210ms 3550ms 3560ms
SALSA20 1900ms 1900ms
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/camellia-aesni-avx2-amd64.S
(_gcry_camellia_aesni_avx2_ctr_enc): Add 'vzeroall'.
(_gcry_camellia_aesni_avx2_cbc_dec)
(_gcry_camellia_aesni_avx2_cfb_dec): Add 'vzeroupper' at head and
'vzeroall' at tail.
* cipher/camellia-glue.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_avx2_cfb_dec) [USE_AESNI_AVX2]: Remove register
clearing.
--
Patch moves register clearing with 'vzeroall' to assembly functions and
adds missing 'vzeroupper' instructions at head of assembly functions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/camellia-aesni-avx-amd64.S (_gcry_camellia_aesni_avx_ctr_enc)
(_gcry_camellia_aesni_avx_cbc_dec)
(_gcry_camellia_aesni_avx_cfb_dec): Add 'vzeroupper' at head and
'vzeroall' at tail.
* cipher/camellia-glue.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_avx2_cfb_dec) [USE_AESNI_AVX]: Remove register clearing.
--
Patch moves register clearing with 'vzeroall' to assembly functions and
adds missing 'vzeroupper' instructions at head of assembly functions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_ctr_enc)
(_gcry_serpent_avx2_cbc_dec, _gcry_serpent_avx2_cfb_dec): Change last
'vzeroupper' to 'vzeroall'.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_avx2_cfb_dec) [USE_AVX2]: Remove register clearing with
'vzeroall'.
--
AVX2 implementation was already clearing upper halfs of YMM registers at end of
assembly functions to prevent long SSE<->AVX transition stalls present on Intel
CPUs. Patch changes these 'vzeroupper' instructions to 'vzeroall' to fully
clear YMM registers. After this change register clearing in serpent.c in not
needed.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-armv7-neon.S'.
* cipher/sha512-armv7-neon.S: New file.
* cipher/sha512.c (USE_ARM_NEON_ASM): New macro.
(SHA512_CONTEXT) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(sha512_init, sha384_init) [USE_ARM_NEON_ASM]: Enable 'use_neon' if
CPU support NEON instructions.
(k): Round constant array moved outside of 'transform' function.
(__transform): Renamed from 'tranform' function.
[USE_ARM_NEON_ASM] (_gcry_sha512_transform_armv7_neon): New prototype.
(transform): New wrapper function for different transform versions.
(sha512_write, sha512_final): Burn stack by the amount returned by
transform function.
* configure.ac (sha512) [neonsupport]: Add 'sha512-armv7-neon.lo'.
--
Add NEON assembly for transform function for faster SHA512 on ARM. Major speed
up thanks to 64-bit integer registers and large register file that can hold
full input buffer.
Benchmark results on Cortex-A8, 1Ghz:
Old:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512 17050ms 18780ms 29120ms 18040ms 17190ms
SHA384 17130ms 18720ms 29160ms 18090ms 17280ms
New:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512 3600ms 5070ms 15330ms 4510ms 3480ms
SHA384 3590ms 5060ms 15350ms 4510ms 3520ms
New vs old:
SHA512 4.74x 3.70x 1.90x 4.00x 4.94x
SHA384 4.77x 3.70x 1.90x 4.01x 4.91x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/sha512.c (transform): Change 'u64 w[80]' to 'u64 w[16]' and
inline input expansion to first 64 rounds.
(sha512_write, sha512_final): Reduce burn_stack depth by 512 bytes.
--
The input expansion to w[] array can be inlined with rounds and size of array
reduced from u64[80] to u64[16]. On Cortex-A8, this change gives small boost,
possibly thanks to reduced burn_stack depth.
New vs old (tests/benchmark md sha512 sha384):
SHA512 1.09x 1.11x 1.06x 1.09x 1.08x
SHA384 1.09x 1.11x 1.06x 1.09x 1.09x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/ecc-common.h, cipher/ecc-curves.c, cipher/ecc-misc.c: New.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add new files.
* configure.ac (GCRYPT_PUBKEY_CIPHERS): Add new .c files.
* cipher/ecc.c (curve_aliases, ecc_domain_parms_t, domain_parms)
(scanval): Move to ecc-curves.c.
(fill_in_curve): Move to ecc-curve.c as _gcry_ecc_fill_in_curve.
(ecc_get_curve): Move to ecc-curve.c as _gcry_ecc_get_curve.
(_gcry_mpi_ec_ec2os): Move to ecc-misc.c.
(ec2os): Move to ecc-misc.c as _gcry_ecc_ec2os.
(os2ec): Move to ecc-misc.c as _gcry_ecc_os2ec.
(point_set): Move as inline function to ecc-common.h.
(_gcry_ecc_curve_free): Move to ecc-misc.c as _gcry_ecc_curve_free.
(_gcry_ecc_curve_copy): Move to ecc-misc.c as _gcry_ecc_curve_copy.
(mpi_from_keyparam, point_from_keyparam): Move to ecc-curves.c.
(_gcry_mpi_ec_new): Move to ecc-curves.c.
(ecc_get_param): Move to ecc-curves.c as _gcry_ecc_get_param.
(ecc_get_param_sexp): Move to ecc-curves.c as _gcry_ecc_get_param_sexp.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_ctr_enc)
(_gcry_serpent_sse2_cbc_dec, _gcry_serpent_sse2_cfb_dec): Clear used
XMM registers.
cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
( _gcry_serpent_cfb_dec) [USE_SSE2]: Remove XMM register clearing from
bulk functions.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/twofish-amd64.S (__twofish_dec_blk3): Do not export symbol as
global.
(__twofish_dec_blk3): Mark symbol as function.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/blowfish-armv6.S: Replace __ARM_ARCH >= 6 checks with
HAVE_ARM_ARCH_V6.
* cipher/blowfish.c: Ditto.
* cipher/camellia-armv6.S: Ditto.
* cipher/camellia.h: Ditto.
* cipher/cast5-armv6.S: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael-armv6.S: Ditto.
* cipher/rijndael.c: Ditto.
* configure.ac: Add HAVE_ARM_ARCH_V6 check.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/bufhelp.h [__arm__ && __ARM_FEATURE_UNALIGNED]: Enable
BUFHELP_FAST_UNALIGNED_ACCESS.
--
Newer ARM systems support unaligned memory accesses and on gcc-4.7 and onwards
this is identified by __ARM_FEATURE_UNALIGNED macro.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'camellia-armv6.S'.
* cipher/camellia-armv6.S: New file.
* cipher/camellia-glue.c [USE_ARMV6_ASM]
(_gcry_camellia_armv6_encrypt_block)
(_gcry_camellia_armv6_decrypt_block): New prototypes.
[USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock)
(camellia_encrypt, camellia_decrypt): New functions.
* cipher/camellia.c [!USE_ARMV6_ASM]: Compile encryption and decryption
routines if USE_ARMV6_ASM macro is _not_ defined.
* cipher/camellia.h (USE_ARMV6_ASM): New macro.
[!USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock): If
USE_ARMV6_ASM is defined, disable these function prototypes.
(camellia) [arm]: Add 'camellia-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for Camellia. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now. only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 1.44x 1.47x 1.35x 1.34x 1.43x 1.39x 1.38x 1.36x 1.38x 1.39x
CAMELLIA192 1.60x 1.62x 1.52x 1.47x 1.56x 1.54x 1.52x 1.53x 1.52x 1.53x
CAMELLIA256 1.59x 1.60x 1.49x 1.47x 1.53x 1.54x 1.51x 1.50x 1.52x 1.53x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'blowfish-armv6.S'.
* cipher/blowfish-armv6.S: New file.
* cipher/blowfish.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_blowfish_armv6_do_encrypt)
(_gcry_blowfish_armv6_encrypt_block)
(_gcry_blowfish_armv6_decrypt_block, _gcry_blowfish_armv6_ctr_enc)
(_gcry_blowfish_armv6_cbc_dec, _gcry_blowfish_armv6_cfb_dec): New
prototypes.
[USE_ARMV6_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block)
(encrypt_block, decrypt_block): New functions.
(_gcry_blowfish_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_blowfish_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_blowfish_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
* configure.ac (blowfish) [arm]: Add 'blowfish-armv6.lo'.
--
Patch provides non-parallel implementations for small speed-up and 2-way
parallel implementations that gets accelerated on multi-issue CPUs (hand-tuned
for in-order dual-issue Cortex-A8). Unaligned access handling is done in
assembly.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new (Cortex-A8, Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
BLOWFISH 1.28x 1.16x 1.21x 2.16x 1.26x 1.86x 1.21x 1.25x 1.89x 1.96x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'cast5-armv6.S'.
* cipher/cast5-armv6.S: New file.
* cipher/cast5.c (USE_ARMV6_ASM): New macro.
(CAST5_context) [USE_ARMV6_ASM]: New members 'Kr_arm_enc' and
'Kr_arm_dec'.
[USE_ARMV6_ASM] (_gcry_cast5_armv6_encrypt_block)
(_gcry_cast5_armv6_decrypt_block, _gcry_cast5_armv6_ctr_enc)
(_gcry_cast5_armv6_cbc_dec, _gcry_cast5_armv6_cfb_dec): New prototypes.
[USE_ARMV6_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block)
(decrypt_block): New functions.
(_gcry_cast5_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_cast5_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_cast5_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_cast_setkey) [USE_ARMV6_ASM]: Initialize 'Kr_arm_enc' and
'Kr_arm_dec'.
* configure.ac (cast5) [arm]: Add 'cast5-armv6.lo'.
--
Provides non-parallel implementations for small speed-up and 2-way parallel
implementations that gets accelerated on multi-issue CPUs (hand-tuned for
in-order dual-issue Cortex-A8). Unaligned access handling is done in assembly.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new (Cortex-A8, Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAST5 1.15x 1.12x 1.12x 2.07x 1.14x 1.60x 1.12x 1.13x 1.62x 1.63x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'rijndael-armv6.S'.
* cipher/rijndael-armv6.S: New file.
* cipher/rijndael.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_aes_armv6_encrypt_block)
(_gcry_aes_armv6_decrypt_block): New prototypes.
(do_encrypt_aligned) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_encrypt): Disable input/output alignment when USE_ARMV6_ASM.
(do_decrypt_aligned) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_decrypt): Disable input/output alignment when USE_ARMV6_ASM.
* configure.ac (HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS): New check for
gcc/as compatibility with ARM assembly implementations.
(aes) [arm]: Add 'rijndael-armv6.lo'.
--
Add optimized ARMv6 assembly implementation for AES. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
AES 2.61x 3.12x 2.16x 2.59x 2.26x 2.25x 2.08x 2.08x 2.23x 2.23x
AES192 2.60x 3.06x 2.18x 2.65x 2.29x 2.29x 2.12x 2.12x 2.25x 2.27x
AES256 2.62x 3.09x 2.24x 2.72x 2.30x 2.34x 2.17x 2.19x 2.32x 2.32x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/pubkey.c (gcry_pk_sign): Handle the specific case of ECC,
where there is NULL whichi is not the sentinel.
--
This is a kind of makeshift fix, but the MPI array API is internal
only and will be removed, it is better not to change API now.
|
|
* cipher/ecc.c (ecc_get_curve): Free TMP.
|
|
* cipher/elgamal.c (elg_generate_ext): Free XVALUE.
* cipher/pubkey.c (sexp_elements_extract): Don't use IDX for loop.
Call mpi_free.
(sexp_elements_extract_ecc): Call mpi_free.
|
|
* cipher/ecc.c (check_secret_key): replace wrong comparison of Q and
sk->Q points with correct one.
--
Currently check_secret_keys compares pointers to coordinates of Q
(calculated) and sk->Q (provided) points. Instead it should convert them
to affine representations and use mpi_cmp to compare coordinates.
This has an implication that keys that were (erroneously) verified as
valid could now become invalid.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
* cipher/ecc.c (sign): Add args FLAGS and HASHALGO. Convert an opaque
MPI as INPUT. Implement rfc-6979.
(ecc_sign): Remove the opaque MPI code and pass FLAGS to sign.
(verify): Do not allocate and compute Y; it is not used.
(ecc_verify): Truncate the hash value if needed.
* tests/dsa-rfc6979.c (check_dsa_rfc6979): Add ECDSA test cases.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/dsa.c (dsa_sign): Move opaque mpi extraction to sign.
(sign): Add args FLAGS and HASHALGO. Implement deterministic DSA.
Add code path for R==0 to comply with the standard.
(dsa_verify): Left fill opaque mpi based hash values.
* cipher/dsa-common.c (int2octets, bits2octets): New.
(_gcry_dsa_gen_rfc6979_k): New.
* tests/dsa-rfc6979.c: New.
* tests/Makefile.am (TESTS): Add dsa-rfc6979.
--
This patch also fixes a recent patch (37d0a1e) which allows to pass
the hash in a (hash) element.
Support for deterministic ECDSA will come soon.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/pubkey.c (sexp_to_key): Fallback to private key.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/pubkey.c (pubkey_sign): Add arg ctx and pass it to the sign
module.
(gcry_pk_sign): Pass CTX to pubkey_sign.
(sexp_data_to_mpi): Add flag rfc6979 and code to alls hash with *DSA
* cipher/rsa.c (rsa_sign, rsa_verify): Return an error if an opaque
MPI is given for DATA/HASH.
* cipher/elgamal.c (elg_sign, elg_verify): Ditto.
* cipher/dsa.c (dsa_sign, dsa_verify): Convert a given opaque MPI.
* cipher/ecc.c (ecc_sign, ecc_verify): Ditto.
* tests/basic.c (check_pubkey_sign_ecdsa): Add a test for using a hash
element with DSA.
--
This patch allows the use of
(data (flags raw)
(hash sha256 #80112233445566778899AABBCCDDEEFF
000102030405060708090A0B0C0D0E0F#))
in addition to the old but more efficient
(data (flags raw)
(value #80112233445566778899AABBCCDDEEFF
000102030405060708090A0B0C0D0E0F#))
for DSA and ECDSA. With the hash element the flag "raw" must be
explicitly given because existing regression test code expects that
conflict error is return if no flags but a hash element is given.
Note that the hash algorithm name is currently not checked. It may
eventually be used to cross-check the length of the provided hash
value. It is suggested that the correct hash name is given - even if
a truncated hash value is used.
Finally this patch adds a way to pass the hash algorithm and flag
values to the signing module. "rfc6979" as been implemented as a new
but not yet used flag.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt.h.in (GCRY_CIPHER_SALSA20): New.
* cipher/salsa20.c: New.
* configure.ac (available_ciphers): Add Salsa20.
* cipher/cipher.c: Register Salsa20.
(cipher_setiv): Allow to divert an IV to a cipher module.
* src/cipher-proto.h (cipher_setiv_func_t): New.
(cipher_extra_spec): Add field setiv.
* src/cipher.h: Declare Salsa20 definitions.
* tests/basic.c (check_stream_cipher): New.
(check_stream_cipher_large_block): New.
(check_cipher_modes): Run new test functions.
(check_ciphers): Add simple test for Salsa20.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* src/gcrypt-module.h (gcry_pk_sign_t): Add parms flags and hashalgo.
* cipher/rsa.c (rsa_sign): Add parms and mark them as unused.
* cipher/dsa.c (dsa_sign): Ditto.
* cipher/elgamal.c (elg_sign): Ditto.
* cipher/pubkey.c (dummy_sign): Ditto.
(pubkey_sign): Pass 0 for the new args.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
* cipher/bithelp.h [__GNUC__, __i386__] (rol, ror): add "cc" globber
for inline assembly.
* cipher/cast5.c [__GNUC__, __i386__] (rol): Ditto.
* random/rndhw.c [USE_DRNG] (rdrand_long): Ditto.
* src/hmac256.c [__GNUC__, __i386__] (ror): Ditto.
* mpi/longlong.c [__i386__] (add_ssaaaa, sub_ddmmss, umul_ppmm)
(udiv_qrnnd, count_leading_zeros, count_trailing_zeros): Ditto.
--
These assembly snippets modify cflags but do not mark "cc" clobber.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/bufhelp.h (buf_xor, buf_xor_2dst, buf_xor_n_copy): Cast
to larger element pointer through (void *) to suppress -Wcast-error.
--
Patch disables bogus warnings caused by -Wcast-error. We know that byte
pointers are properly aligned at these phases, or that hardware can handle
unaligned accesses.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/blowfish-amd64.S: Enable only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined.
* cipher/camellia-aesni-avx-amd64.S: Ditto.
* cipher/camellia-aesni-avx2-amd64.S: Ditto.
* cipher/cast5-amd64.S: Ditto.
* cipher/rinjdael-amd64.S: Ditto.
* cipher/serpent-avx2-amd64.S: Ditto.
* cipher/serpent-sse2-amd64.S: Ditto.
* cipher/twofish-amd64.S: Ditto.
* cipher/blowfish.c: Use AMD64 assembly implementation only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
* configure.ac: Check gcc/as compatibility with AMD64 assembly
implementations.
--
Later these checks can be split and assembly implementations adapted to handle
different platforms, but for now disable AMD64 assembly implementations if
assembler does not look to be able to handle them.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'camellia-aesni-avx2-amd64.S'.
* cipher/camellia-aesni-avx2-amd64.S: New file.
* cipher/camellia-glue.c (USE_AESNI_AVX2): New macro.
(CAMELLIA_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'.
[USE_AESNI_AVX2] (_gcry_camellia_aesni_avx2_ctr_enc)
(_gcry_camellia_aesni_avx2_cbc_dec)
(_gcry_camellia_aesni_avx2_cfb_dec): New prototypes.
(camellia_setkey) [USE_AESNI_AVX2]: Check AVX2+AES-NI capable hardware
and set 'ctx->use_aesni_avx2'.
(_gcry_camellia_ctr_enc) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(_gcry_camellia_cbc_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(_gcry_camellia_cfb_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths get tested.
* configure.ac (camellia) [avx2support, aesnisupport]: Add
'camellia-aesni-avx2-amd64.lo'.
--
Add new AVX2/AES-NI implementation of Camellia that processes 32 blocks in
parallel.
Speed old (AVX/AES-NI) vs. new (AVX2/AES-NI) on Intel Core i5-4570:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
CAMELLIA128 1.00x 0.99x 1.00x 1.53x 1.00x 1.49x 1.00x 1.00x 1.54x 1.54x
CAMELLIA256 0.99x 1.00x 1.00x 1.50x 1.00x 1.50x 1.00x 1.00x 1.54x 1.52x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|