path: root/cipher/cipher-cbc.c
Age | Commit message | Author | Files | Lines
2014-01-16 | Replace ath based mutexes by gpgrt based locks. | Werner Koch | 1 | -1/+0
* configure.ac (NEED_GPG_ERROR_VERSION): Require 1.13.
(gl_LOCK): Remove.
* src/ath.c, src/ath.h: Remove. Remove from all files. Replace all mutexes by gpgrt based statically initialized locks.
* src/global.c (global_init): Remove ath_init.
(_gcry_vcontrol): Make ath install a dummy function.
(print_config): Remove threads info line.
* doc/gcrypt.texi: Simplify the multi-thread related documentation.
--
The current code only works on ELF systems with weak symbol support. In particular, no locks were used under Windows. With the new gpgrt_lock functions from the soon to be released libgpg-error 1.13 we have a more portable scheme which also allows for statically initialized mutexes.

Signed-off-by: Werner Koch <wk@gnupg.org>
2013-11-15 | cipher: use size_t for internal buffer lengths | Jussi Kivilinna | 1 | -10/+10
* cipher/arcfour.c (do_encrypt_stream, encrypt_stream): Use 'size_t' for buffer lengths.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec) (_gcry_blowfish_cfb_dec): Ditto.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc) (_gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec) (_gcry_cast5_cfb_dec): Ditto.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt) (_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt) (_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_encrypt) (_gcry_cipher_ccm_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt) (_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-internal.h (gcry_cipher_handle->bulk) (_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt) (_gcry_cipher_cfb_encrypt, _gcry_cipher_cfb_decrypt) (_gcry_cipher_ofb_encrypt, _gcry_cipher_ctr_encrypt) (_gcry_cipher_aeswrap_encrypt, _gcry_cipher_aeswrap_decrypt) (_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt): Ditto.
* cipher/cipher-selftest.h (gcry_cipher_bulk_cbc_dec_t) (gcry_cipher_bulk_cfb_dec_t, gcry_cipher_bulk_ctr_enc_t): Ditto.
* cipher/cipher.c (cipher_setkey, cipher_setiv, do_ecb_crypt) (do_ecb_encrypt, do_ecb_decrypt, cipher_encrypt) (cipher_decrypt): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc, _gcry_aes_cbc_dec) (_gcry_aes_cfb_dec, _gcry_aes_cbc_enc, _gcry_aes_cfb_enc): Ditto.
* cipher/salsa20.c (salsa20_setiv, salsa20_do_encrypt_stream) (salsa20_encrypt_stream, salsa20r12_encrypt_stream): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec) (_gcry_serpent_cfb_dec): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec) (_gcry_twofish_cfb_dec): Ditto.
* src/cipher-proto.h (gcry_cipher_stencrypt_t) (gcry_cipher_stdecrypt_t, cipher_setiv_fuct_t): Ditto.
* src/cipher.h (_gcry_aes_cfb_enc, _gcry_aes_cfb_dec) (_gcry_aes_cbc_enc, _gcry_aes_cbc_dec, _gcry_aes_ctr_enc) (_gcry_blowfish_cfb_dec, _gcry_blowfish_cbc_dec) (_gcry_blowfish_ctr_enc, _gcry_cast5_cfb_dec, _gcry_cast5_cbc_dec) (_gcry_cast5_ctr_enc, _gcry_camellia_cfb_dec, _gcry_camellia_cbc_dec) (_gcry_camellia_ctr_enc, _gcry_serpent_cfb_dec, _gcry_serpent_cbc_dec) (_gcry_serpent_ctr_enc, _gcry_twofish_cfb_dec, _gcry_twofish_cbc_dec) (_gcry_twofish_ctr_enc): Ditto.
--
On 64-bit platforms, the cipher module internally converts 64-bit size_t values to 32-bit unsigned integers.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
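The truncation described in the note above can be illustrated with a minimal sketch. The helper names `sketch_len_as_u32` and `sketch_len_as_u64` are hypothetical and not part of libgcrypt; fixed-width types stand in for `unsigned int` and a 64-bit `size_t` so that the behavior is the same on every platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an internal function whose length
   parameter is a 32-bit unsigned integer: the upper 32 bits of the
   caller's length are silently dropped. */
uint32_t
sketch_len_as_u32 (uint64_t nbytes)
{
  return (uint32_t) nbytes;
}

/* Hypothetical stand-in for the same function once the parameter is
   as wide as the caller's length: nothing is lost. */
uint64_t
sketch_len_as_u64 (uint64_t nbytes)
{
  return nbytes;
}

/* Usage example: a buffer length just above 4 GiB. */
void
sketch_len_example (void)
{
  uint64_t n = (UINT64_C (1) << 32) + 16;

  assert (sketch_len_as_u32 (n) == 16);  /* truncated to the low bits */
  assert (sketch_len_as_u64 (n) == n);   /* preserved */
}
```

With the narrow parameter, a request to process slightly more than 4 GiB would be treated as a request for 16 bytes, which is the kind of mismatch a uniform `size_t` interface avoids.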
2013-10-23 | Improve the speed of the cipher mode code | Jussi Kivilinna | 1 | -24/+29
* cipher/bufhelp.h (buf_cpy): New.
(buf_xor, buf_xor_2dst): If buffers unaligned, always jump to per-byte processing.
(buf_xor_n_copy_2): New.
(buf_xor_n_copy): Use 'buf_xor_n_copy_2'.
* cipher/blowfish.c (_gcry_blowfish_cbc_dec): Avoid extra memory copy and use new 'buf_xor_n_copy_2'.
* cipher/camellia-glue.c (_gcry_camellia_cbc_dec): Ditto.
* cipher/cast5.c (_gcry_cast5_cbc_dec): Ditto.
* cipher/serpent.c (_gcry_serpent_cbc_dec): Ditto.
* cipher/twofish.c (_gcry_twofish_cbc_dec): Ditto.
* cipher/rijndael.c (_gcry_aes_cbc_dec): Ditto.
(do_encrypt, do_decrypt): Use 'buf_cpy' instead of 'memcpy'.
(_gcry_aes_cbc_enc): Avoid copying IV, use 'last_iv' pointer instead.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt): Avoid copying IV, update pointer to IV instead.
(_gcry_cipher_cbc_decrypt): Avoid extra memory copy and use new 'buf_xor_n_copy_2'.
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt): Avoid extra accesses to c->spec, use 'buf_cpy' instead of memcpy.
* cipher/cipher-ccm.c (do_cbc_mac): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt) (_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt) (_gcry_cipher_ofb_decrypt): Ditto.
* cipher/cipher.c (do_ecb_encrypt, do_ecb_decrypt): Ditto.
--
This patch improves the speed of the generic block cipher mode code. Especially on targets without fast unaligned memory accesses, the generic code was slower than the algorithm-specific bulk versions. With this patch, this issue should be solved.

Each mode column lists two timings per cipher, as printed by the benchmark.

Tests on Cortex-A8; compiled for ARMv4, without unaligned-accesses:

Before:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           490ms   500ms   560ms   580ms   530ms   540ms   560ms   560ms   550ms   540ms  1080ms  1080ms
TWOFISH        230ms   230ms   290ms   300ms   260ms   240ms   290ms   290ms   240ms   240ms   520ms   510ms
DES            720ms   720ms   800ms   860ms   770ms   770ms   810ms   820ms   770ms   780ms       -       -
CAST5          340ms   340ms   440ms   250ms   390ms   250ms   440ms   430ms   260ms   250ms       -       -

After:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           500ms   490ms   520ms   520ms   530ms   520ms   530ms   540ms   500ms   520ms  1060ms  1070ms
TWOFISH        230ms   220ms   250ms   230ms   260ms   230ms   260ms   260ms   230ms   230ms   500ms   490ms
DES            720ms   720ms   750ms   760ms   740ms   750ms   770ms   770ms   760ms   760ms       -       -
CAST5          340ms   340ms   370ms   250ms   370ms   250ms   380ms   390ms   250ms   250ms       -       -

Tests on Cortex-A8; compiled for ARMv7-A, with unaligned-accesses:

Before:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           430ms   440ms   480ms   530ms   470ms   460ms   490ms   480ms   470ms   460ms   930ms   940ms
TWOFISH        220ms   220ms   250ms   230ms   240ms   230ms   270ms   250ms   230ms   240ms   480ms   470ms
DES            550ms   540ms   620ms   690ms   570ms   540ms   630ms   650ms   590ms   580ms       -       -
CAST5          300ms   300ms   380ms   230ms   330ms   230ms   380ms   370ms   230ms   230ms       -       -

After:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           430ms   430ms   460ms   450ms   460ms   450ms   470ms   470ms   460ms   470ms   900ms   930ms
TWOFISH        220ms   210ms   240ms   230ms   230ms   230ms   250ms   250ms   230ms   230ms   470ms   470ms
DES            540ms   540ms   580ms   570ms   570ms   570ms   560ms   620ms   580ms   570ms       -       -
CAST5          300ms   290ms   310ms   230ms   320ms   230ms   350ms   350ms   230ms   230ms       -       -

Tests on Intel Atom N160 (i386):

Before:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           380ms   380ms   410ms   420ms   400ms   400ms   410ms   410ms   390ms   400ms   820ms   800ms
TWOFISH        340ms   340ms   370ms   350ms   360ms   340ms   370ms   370ms   330ms   340ms   710ms   700ms
DES            660ms   650ms   710ms   740ms   680ms   700ms   700ms   710ms   680ms   680ms       -       -
CAST5          340ms   340ms   380ms   330ms   360ms   330ms   390ms   390ms   320ms   330ms       -       -

After:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           380ms   380ms   390ms   410ms   400ms   390ms   410ms   400ms   400ms   390ms   810ms   800ms
TWOFISH        330ms   340ms   350ms   360ms   350ms   340ms   380ms   370ms   340ms   360ms   700ms   710ms
DES            630ms   640ms   660ms   690ms   680ms   680ms   700ms   690ms   680ms   680ms       -       -
CAST5          340ms   330ms   350ms   330ms   370ms   340ms   380ms   390ms   330ms   330ms       -       -

Tests on Intel i5-4570 (x86-64):

Before:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           560ms   560ms   600ms   590ms   600ms   570ms   570ms   570ms   580ms   590ms  1200ms  1180ms
TWOFISH        240ms   240ms   270ms   160ms   260ms   160ms   250ms   250ms   160ms   160ms   430ms   430ms
DES            570ms   570ms   640ms   590ms   630ms   580ms   600ms   600ms   610ms   620ms       -       -
CAST5          410ms   410ms   470ms   150ms   470ms   150ms   450ms   450ms   150ms   160ms       -       -

After:
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
SEED           560ms   560ms   590ms   570ms   580ms   570ms   570ms   570ms   590ms   590ms  1200ms  1200ms
TWOFISH        240ms   240ms   260ms   160ms   250ms   170ms   250ms   250ms   160ms   160ms   430ms   430ms
DES            570ms   570ms   620ms   580ms   630ms   570ms   600ms   590ms   620ms   620ms       -       -
CAST5          410ms   410ms   460ms   150ms   460ms   160ms   450ms   450ms   150ms   150ms       -       -

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
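The "avoid copying IV, update pointer to IV instead" idea for CBC encryption can be sketched roughly as follows. This is an illustration under stated assumptions, not the libgcrypt code: `toy_encrypt_block`, `sketch_cbc_encrypt`, and the 8-byte block size are hypothetical, and the toy cipher (a one-byte XOR) is not secure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCKSIZE 8

/* Toy "block cipher" for illustration only: XORs each byte with a
   one-byte key.  Not secure, and not a libgcrypt function. */
static void
toy_encrypt_block (unsigned char key, unsigned char *out,
                   const unsigned char *in)
{
  for (size_t i = 0; i < SKETCH_BLOCKSIZE; i++)
    out[i] = in[i] ^ key;
}

/* CBC-encrypt NBLOCKS full blocks.  Rather than copying every
   ciphertext block back into IV, keep a pointer to the most recent
   ciphertext block and copy it into IV once, after the loop. */
void
sketch_cbc_encrypt (unsigned char key, unsigned char *iv,
                    unsigned char *out, const unsigned char *in,
                    size_t nblocks)
{
  const unsigned char *ivp = iv;

  for (size_t n = 0; n < nblocks; n++)
    {
      for (size_t i = 0; i < SKETCH_BLOCKSIZE; i++)
        out[i] = in[i] ^ ivp[i];          /* chain with previous block */
      toy_encrypt_block (key, out, out);  /* encrypt in place */
      ivp = out;                          /* next IV is this ciphertext */
      in += SKETCH_BLOCKSIZE;
      out += SKETCH_BLOCKSIZE;
    }

  if (ivp != iv)
    memcpy (iv, ivp, SKETCH_BLOCKSIZE);   /* single copy at the end */
}

/* Usage example: after the call, IV holds the last ciphertext block. */
void
sketch_cbc_example (void)
{
  unsigned char iv[SKETCH_BLOCKSIZE] = { 0 };
  unsigned char pt[2 * SKETCH_BLOCKSIZE] = { 0 };
  unsigned char ct[2 * SKETCH_BLOCKSIZE];

  sketch_cbc_encrypt (0x5a, iv, ct, pt, 2);
  assert (memcmp (iv, ct + SKETCH_BLOCKSIZE, SKETCH_BLOCKSIZE) == 0);
}
```

The saving is one block-sized copy per block processed; on targets where small copies are comparatively expensive this is plausibly part of the CBC improvement reported in the tables above.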
2013-10-01 | cipher: Simplify the cipher dispatcher cipher.c. | Werner Koch | 1 | -13/+13
* src/gcrypt-module.h (gcry_cipher_spec_t): Move to ...
* src/cipher-proto.h (gcry_cipher_spec_t): here. Merge with cipher_extra_spec_t. Add fields ALGO and FLAGS. Set these fields in all cipher modules.
* cipher/cipher.c: Change most code to replace the former module system by a simpler system to gain information about the algorithms.
(disable_pubkey_algo): Simplified. No longer thread-safe, though.
* cipher/md.c (_gcry_md_selftest): Use correct structure. Not a real problem because both define the same function as their first field.
* cipher/pubkey.c (_gcry_pk_selftest): Take care of the disabled flag.

Signed-off-by: Werner Koch <wk@gnupg.org>
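One way to picture a dispatcher in which each spec record carries its own ALGO and FLAGS fields is a static, NULL-terminated table searched by algorithm id. The sketch below is only an assumption about the general shape; every name in it (`sketch_cipher_spec_t`, `sketch_spec_from_algo`, `sketch_disable_algo`, the toy entries) is hypothetical and the real gcry_cipher_spec_t has many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced spec record.  Because the algorithm id
   and the flags live in the record itself, no separate module
   registry is needed to find or disable an algorithm. */
typedef struct
{
  int algo;                                /* algorithm identifier */
  struct { unsigned int disabled:1; } flags;
  const char *name;
  size_t blocksize;
} sketch_cipher_spec_t;

static sketch_cipher_spec_t sketch_spec_a = { 1, { 0 }, "TOY-A", 8 };
static sketch_cipher_spec_t sketch_spec_b = { 2, { 0 }, "TOY-B", 16 };

/* NULL-terminated static list of all known specs. */
static sketch_cipher_spec_t *sketch_cipher_list[] =
  { &sketch_spec_a, &sketch_spec_b, NULL };

/* Linear search by id; unknown or disabled algorithms yield NULL. */
sketch_cipher_spec_t *
sketch_spec_from_algo (int algo)
{
  for (size_t i = 0; sketch_cipher_list[i]; i++)
    if (sketch_cipher_list[i]->algo == algo)
      return (sketch_cipher_list[i]->flags.disabled
              ? NULL : sketch_cipher_list[i]);
  return NULL;
}

/* Disabling flips a flag in the static record.  No lock is taken, so
   a concurrent lookup may race with this update. */
void
sketch_disable_algo (int algo)
{
  sketch_cipher_spec_t *spec = sketch_spec_from_algo (algo);

  if (spec)
    spec->flags.disabled = 1;
}

/* Usage example: look up a known and an unknown algorithm. */
void
sketch_dispatch_example (void)
{
  assert (sketch_spec_from_algo (2) != NULL);
  assert (sketch_spec_from_algo (99) == NULL);
}
```

The unlocked flag update in `sketch_disable_algo` shows why a simplification of this kind can give up thread-safety, as the ChangeLog entry notes for the disable function.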
2013-09-04 | Move stack burning from block ciphers to cipher modes | Jussi Kivilinna | 1 | -5/+22
* src/gcrypt-module.h (gcry_cipher_encrypt_t) (gcry_cipher_decrypt_t): Return 'unsigned int'.
* cipher/cipher.c (dummy_encrypt_block, dummy_decrypt_block): Return zero.
(do_ecb_encrypt, do_ecb_decrypt): Get largest stack burn depth from block cipher crypt function and burn stack at end.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt) (_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt) (_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt) (_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt) (_gcry_cipher_ofb_decrypt): Ditto.
* cipher/blowfish.c (encrypt_block, decrypt_block): Return burn stack depth.
* cipher/camellia-glue.c (camellia_encrypt, camellia_decrypt): Ditto.
* cipher/cast5.c (encrypt_block, decrypt_block): Ditto.
* cipher/des.c (do_tripledes_encrypt, do_tripledes_decrypt) (do_des_encrypt, do_des_decrypt): Ditto.
* cipher/idea.c (idea_encrypt, idea_decrypt): Ditto.
* cipher/rijndael.c (rijndael_encrypt, rijndael_decrypt): Ditto.
* cipher/seed.c (seed_encrypt, seed_decrypt): Ditto.
* cipher/serpent.c (serpent_encrypt, serpent_decrypt): Ditto.
* cipher/twofish.c (twofish_encrypt, twofish_decrypt): Ditto.
* cipher/rfc2268.c (encrypt_block, decrypt_block): New.
(_gcry_cipher_spec_rfc2268_40): Use encrypt_block and decrypt_block.
--
This patch moves stack burning from the block ciphers and the cipher mode loops to the end of the cipher mode functions. This greatly reduces the overall CPU usage of the problematic _gcry_burn_stack. The internal cipher module API is changed so that the encrypt/decrypt functions now return the stack burn depth as an unsigned int to the cipher mode function. (Note: the patch also adds the missing burn_stack for the RFC2268_40 cipher.)

_gcry_burn_stack CPU time (looping tests/benchmark cipher blowfish):

arch     CPU             Old     New
i386     Intel-Haswell   4.1%    0.16%
x86_64   Intel-Haswell   3.4%    0.07%
armhf    Cortex-A8       8.7%    0.14%

Each mode column lists two values per cipher, as printed by the benchmark.

New vs. old (armhf/Cortex-A8):
               ECB/Stream          CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA           1.05x   1.05x   1.04x   1.04x   1.04x   1.04x   1.07x   1.05x   1.04x   1.04x
3DES           1.04x   1.03x   1.04x   1.03x   1.04x   1.04x   1.04x   1.04x   1.04x   1.04x
CAST5          1.19x   1.20x   1.15x   1.00x   1.17x   1.00x   1.15x   1.05x   1.00x   1.00x
BLOWFISH       1.21x   1.22x   1.16x   1.00x   1.18x   1.00x   1.16x   1.16x   1.00x   1.00x
AES            1.09x   1.09x   1.00x   1.00x   1.00x   1.00x   1.07x   1.07x   1.00x   1.00x
AES192         1.11x   1.11x   1.00x   1.00x   1.00x   1.00x   1.08x   1.09x   1.01x   1.00x
AES256         1.07x   1.08x   1.01x   0.99x   1.00x   1.00x   1.07x   1.06x   1.00x   1.00x
TWOFISH        1.10x   1.09x   1.09x   1.00x   1.09x   1.00x   1.08x   1.09x   1.00x   1.00x
ARCFOUR        1.00x   1.00x
DES            1.07x   1.11x   1.06x   1.08x   1.07x   1.07x   1.06x   1.06x   1.06x   1.06x
TWOFISH128     1.10x   1.10x   1.09x   1.00x   1.09x   1.00x   1.08x   1.08x   1.00x   1.00x
SERPENT128     1.06x   1.07x   1.02x   1.00x   1.06x   1.00x   1.06x   1.05x   1.00x   1.00x
SERPENT192     1.07x   1.06x   1.03x   1.00x   1.06x   1.00x   1.06x   1.05x   1.00x   1.00x
SERPENT256     1.06x   1.07x   1.02x   1.00x   1.06x   1.00x   1.05x   1.06x   1.00x   1.00x
RFC2268_40     0.97x   1.01x   0.99x   0.98x   1.00x   0.97x   0.96x   0.96x   0.97x   0.97x
SEED           1.45x   1.54x   1.53x   1.56x   1.50x   1.51x   1.50x   1.50x   1.42x   1.42x
CAMELLIA128    1.08x   1.07x   1.06x   1.00x   1.07x   1.00x   1.06x   1.06x   1.00x   1.00x
CAMELLIA192    1.08x   1.08x   1.08x   1.00x   1.07x   1.00x   1.07x   1.07x   1.00x   1.00x
CAMELLIA256    1.08x   1.09x   1.07x   1.01x   1.08x   1.00x   1.07x   1.07x   1.00x   1.00x
SALSA20        0.99x   1.00x

Raw data:

New (armhf/Cortex-A8): Running each test 100 times.
               ECB/Stream          CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA          8620ms  8680ms  9640ms 10010ms  9140ms  8960ms  9630ms  9660ms  9180ms  9180ms
3DES         13990ms 14000ms 14780ms 15300ms 14320ms 14370ms 14780ms 14780ms 14480ms 14480ms
CAST5         2980ms  2980ms  3780ms  2300ms  3290ms  2320ms  3770ms  4100ms  2320ms  2320ms
BLOWFISH      2740ms  2660ms  3530ms  2060ms  3050ms  2080ms  3530ms  3530ms  2070ms  2070ms
AES           2200ms  2330ms  2330ms  2450ms  2270ms  2270ms  2700ms  2690ms  2330ms  2320ms
AES192        2550ms  2670ms  2700ms  2910ms  2630ms  2640ms  3060ms  3060ms  2680ms  2690ms
AES256        2920ms  3010ms  3040ms  3190ms  3010ms  3000ms  3380ms  3420ms  3050ms  3050ms
TWOFISH       2790ms  2840ms  3300ms  2950ms  3010ms  2870ms  3310ms  3280ms  2940ms  2940ms
ARCFOUR       2050ms  2050ms
DES           5640ms  5630ms  6440ms  6970ms  5960ms  6000ms  6440ms  6440ms  6120ms  6120ms
TWOFISH128    2790ms  2840ms  3300ms  2950ms  3010ms  2890ms  3310ms  3290ms  2930ms  2930ms
SERPENT128    4530ms  4340ms  5210ms  4470ms  4740ms  4620ms  5020ms  5030ms  4680ms  4680ms
SERPENT192    4510ms  4340ms  5190ms  4460ms  4750ms  4620ms  5020ms  5030ms  4680ms  4680ms
SERPENT256    4540ms  4330ms  5220ms  4460ms  4730ms  4600ms  5030ms  5020ms  4680ms  4680ms
RFC2268_40   10530ms  7790ms 11140ms  9490ms 10650ms 10710ms 11710ms 11690ms 11000ms 11000ms
SEED          4530ms  4540ms  5050ms  5380ms  4760ms  4810ms  5060ms  5060ms  4850ms  4860ms
CAMELLIA128   2660ms  2630ms  3170ms  2750ms  2880ms  2740ms  3170ms  3170ms  2780ms  2780ms
CAMELLIA192   3430ms  3400ms  3930ms  3530ms  3650ms  3500ms  3940ms  3940ms  3570ms  3560ms
CAMELLIA256   3430ms  3390ms  3940ms  3500ms  3650ms  3510ms  3930ms  3940ms  3550ms  3550ms
SALSA20       1910ms  1900ms

Old (armhf/Cortex-A8): Running each test 100 times.
               ECB/Stream          CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA          9030ms  9100ms 10050ms 10410ms  9540ms  9360ms 10350ms 10190ms  9560ms  9570ms
3DES         14580ms 14460ms 15300ms 15720ms 14880ms 14900ms 15350ms 15330ms 15030ms 15020ms
CAST5         3560ms  3570ms  4350ms  2300ms  3860ms  2330ms  4340ms  4320ms  2330ms  2320ms
BLOWFISH      3320ms  3250ms  4110ms  2060ms  3610ms  2080ms  4100ms  4090ms  2070ms  2070ms
AES           2390ms  2530ms  2320ms  2460ms  2280ms  2270ms  2890ms  2880ms  2330ms  2330ms
AES192        2830ms  2970ms  2690ms  2900ms  2630ms  2650ms  3320ms  3330ms  2700ms  2690ms
AES256        3110ms  3250ms  3060ms  3170ms  3000ms  3000ms  3610ms  3610ms  3050ms  3060ms
TWOFISH       3080ms  3100ms  3600ms  2940ms  3290ms  2880ms  3560ms  3570ms  2940ms  2930ms
ARCFOUR       2060ms  2050ms
DES           6060ms  6230ms  6850ms  7540ms  6380ms  6400ms  6830ms  6840ms  6500ms  6510ms
TWOFISH128    3060ms  3110ms  3600ms  2940ms  3290ms  2890ms  3560ms  3560ms  2940ms  2930ms
SERPENT128    4820ms  4630ms  5330ms  4460ms  5030ms  4620ms  5300ms  5300ms  4680ms  4680ms
SERPENT192    4830ms  4620ms  5320ms  4460ms  5040ms  4620ms  5300ms  5300ms  4680ms  4680ms
SERPENT256    4820ms  4640ms  5330ms  4460ms  5030ms  4620ms  5300ms  5300ms  4680ms  4660ms
RFC2268_40   10260ms  7850ms 11080ms  9270ms 10620ms 10380ms 11250ms 11230ms 10690ms 10710ms
SEED          6580ms  6990ms  7710ms  8370ms  7140ms  7240ms  7600ms  7610ms  6870ms  6900ms
CAMELLIA128   2860ms  2820ms  3360ms  2750ms  3080ms  2740ms  3350ms  3360ms  2790ms  2790ms
CAMELLIA192   3710ms  3680ms  4240ms  3520ms  3910ms  3510ms  4200ms  4210ms  3560ms  3560ms
CAMELLIA256   3700ms  3680ms  4230ms  3520ms  3930ms  3510ms  4200ms  4210ms  3550ms  3560ms
SALSA20       1900ms  1900ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
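The pattern this commit describes, where the block function reports its burn depth and the mode function wipes the stack once at the end, might look roughly like the sketch below. All names (`sketch_crypt_fn`, `sketch_burn_stack`, `toy_crypt`, `sketch_ecb_crypt`) are hypothetical, the stand-in for the stack wiper only records the requested depth, and the small safety margin added to the depth is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCKSIZE 8

/* Hypothetical block function type: returns how many bytes of stack
   the call may have left sensitive data in. */
typedef unsigned int (*sketch_crypt_fn) (unsigned char *out,
                                         const unsigned char *in);

/* Depth of the most recent burn request, kept for demonstration. */
unsigned int sketch_last_burn;

/* Stand-in for a stack wiper such as _gcry_burn_stack.  It only
   records the requested depth instead of overwriting the stack. */
static void
sketch_burn_stack (unsigned int bytes)
{
  sketch_last_burn = bytes;
}

/* Toy block function: copies the block, reports a fixed depth. */
unsigned int
toy_crypt (unsigned char *out, const unsigned char *in)
{
  memcpy (out, in, SKETCH_BLOCKSIZE);
  return 64;
}

/* ECB-style loop: remember the largest depth any call reported and
   burn the stack once after the loop, not once per block. */
void
sketch_ecb_crypt (sketch_crypt_fn fn, unsigned char *out,
                  const unsigned char *in, size_t nblocks)
{
  unsigned int burn = 0;

  for (size_t n = 0; n < nblocks; n++)
    {
      unsigned int nburn = fn (out, in);

      burn = nburn > burn ? nburn : burn;
      in += SKETCH_BLOCKSIZE;
      out += SKETCH_BLOCKSIZE;
    }

  if (burn > 0)  /* margin below is an illustrative assumption */
    sketch_burn_stack (burn + (unsigned int) (4 * sizeof (void *)));
}

/* Usage example: two blocks, one burn request. */
void
sketch_burn_example (void)
{
  unsigned char in[2 * SKETCH_BLOCKSIZE] = { 1, 2, 3 };
  unsigned char out[2 * SKETCH_BLOCKSIZE];

  sketch_ecb_crypt (toy_crypt, out, in, 2);
  assert (sketch_last_burn == 64 + 4 * sizeof (void *));
}
```

Since the wipe cost is paid once per call instead of once per block, the share of time spent burning the stack shrinks as buffers grow, which fits the drop in _gcry_burn_stack CPU time shown in the table above.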
2012-12-03 | Optimize buffer xoring. | Jussi Kivilinna | 1 | -9/+5
* cipher/Makefile.am (libcipher_la_SOURCES): Add 'bufhelp.h'.
* cipher/bufhelp.h: New.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt) (_gcry_cipher_aeswrap_decrypt): Use 'buf_xor' for buffer xoring.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt) (_gcry_cipher_cbc_decrypt): Use 'buf_xor' for buffer xoring and remove resulting unused variables.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt): Use 'buf_xor_2dst' for buffer xoring and remove resulting unused variables.
(_gcry_cipher_cfb_decrypt): Use 'buf_xor_n_copy' for buffer xoring and remove resulting unused variables.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Use 'buf_xor' for buffer xoring and remove resulting unused variables.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt) (_gcry_cipher_ofb_decrypt): Use 'buf_xor' for buffer xoring and remove resulting unused variables.
* cipher/rijndael.c (_gcry_aes_cfb_enc): Use 'buf_xor_2dst' for buffer xoring and remove resulting unused variables.
(_gcry_aes_cfb_dec): Use 'buf_xor_n_copy' for buffer xoring and remove resulting unused variables.
(_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, _gcry_aes_cbc_dec): Use 'buf_xor' for buffer xoring and remove resulting unused variables.
--
Add faster helper functions for buffer xoring and replace the byte-wise buffer xor loops. This gives the following speedup. Note that the CTR speedup comes from refactoring the code to use buf_xor() and from removing the integer division/modulo operations issued for each processed byte. Removing div/mod most likely gives an even greater speed increase on CPU architectures that do not have a hardware division unit.

Each mode column lists two ratios per cipher, as printed by the benchmark.

Benchmark ratios (old-vs-new, AMD Phenom II, x86-64):
               ECB/Stream          CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA           0.99x   1.01x   1.06x   1.02x   1.03x   1.06x   1.04x   1.02x   1.58x   1.58x
3DES           1.00x   1.00x   1.01x   1.01x   1.02x   1.02x   1.02x   1.01x   1.22x   1.23x
CAST5          0.98x   1.00x   1.09x   1.03x   1.09x   1.09x   1.07x   1.07x   1.98x   1.95x
BLOWFISH       1.00x   1.00x   1.18x   1.05x   1.07x   1.07x   1.05x   1.05x   1.93x   1.91x
AES            1.00x   0.98x   1.18x   1.14x   1.13x   1.13x   1.14x   1.14x   1.18x   1.18x
AES192         0.98x   1.00x   1.13x   1.14x   1.13x   1.10x   1.14x   1.16x   1.15x   1.15x
AES256         0.97x   1.02x   1.09x   1.13x   1.13x   1.09x   1.10x   1.14x   1.11x   1.13x
TWOFISH        1.00x   1.00x   1.15x   1.17x   1.18x   1.16x   1.18x   1.13x   2.37x   2.31x
ARCFOUR        1.03x   0.97x
DES            1.01x   1.00x   1.04x   1.04x   1.04x   1.05x   1.05x   1.02x   1.56x   1.55x
TWOFISH128     0.97x   1.03x   1.18x   1.17x   1.18x   1.15x   1.15x   1.15x   2.37x   2.31x
SERPENT128     1.00x   1.00x   1.10x   1.11x   1.08x   1.09x   1.08x   1.06x   1.66x   1.67x
SERPENT192     1.00x   1.00x   1.07x   1.08x   1.08x   1.09x   1.08x   1.08x   1.65x   1.66x
SERPENT256     1.00x   1.00x   1.09x   1.09x   1.08x   1.09x   1.08x   1.06x   1.66x   1.67x
RFC2268_40     1.03x   0.99x   1.05x   1.02x   1.03x   1.03x   1.04x   1.03x   1.46x   1.46x
SEED           1.00x   1.00x   1.10x   1.10x   1.09x   1.09x   1.10x   1.07x   1.80x   1.76x
CAMELLIA128    1.00x   1.00x   1.23x   1.12x   1.15x   1.17x   1.15x   1.12x   2.15x   2.13x
CAMELLIA192    1.05x   1.03x   1.23x   1.21x   1.21x   1.16x   1.12x   1.25x   1.90x   1.90x
CAMELLIA256    1.03x   1.07x   1.10x   1.19x   1.08x   1.14x   1.12x   1.10x   1.90x   1.92x

Benchmark ratios (old-vs-new, AMD Phenom II, i386):
               ECB/Stream          CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA           1.00x   1.00x   1.04x   1.05x   1.04x   1.02x   1.02x   1.02x   1.38x   1.40x
3DES           1.01x   1.00x   1.02x   1.04x   1.03x   1.01x   1.00x   1.02x   1.20x   1.20x
CAST5          1.00x   1.00x   1.03x   1.09x   1.07x   1.04x   1.13x   1.00x   1.74x   1.74x
BLOWFISH       1.04x   1.08x   1.03x   1.13x   1.07x   1.12x   1.03x   1.00x   1.78x   1.74x
AES            0.96x   1.00x   1.09x   1.08x   1.14x   1.13x   1.07x   1.03x   1.14x   1.09x
AES192         1.00x   1.03x   1.07x   1.03x   1.07x   1.07x   1.06x   1.03x   1.08x   1.11x
AES256         1.00x   1.00x   1.06x   1.06x   1.10x   1.06x   1.05x   1.03x   1.10x   1.10x
TWOFISH        0.95x   1.10x   1.13x   1.23x   1.05x   1.14x   1.09x   1.13x   1.95x   1.86x
ARCFOUR        1.00x   1.00x
DES            1.02x   0.98x   1.04x   1.04x   1.05x   1.02x   1.04x   1.00x   1.45x   1.48x
TWOFISH128     0.95x   1.10x   1.26x   1.19x   1.09x   1.14x   1.17x   1.00x   2.00x   1.91x
SERPENT128     1.02x   1.00x   1.08x   1.04x   1.10x   1.06x   1.08x   1.04x   1.42x   1.42x
SERPENT192     1.02x   1.02x   1.06x   1.06x   1.10x   1.08x   1.04x   1.06x   1.42x   1.42x
SERPENT256     1.02x   0.98x   1.06x   1.06x   1.10x   1.06x   1.04x   1.06x   1.42x   1.40x
RFC2268_40     1.00x   1.00x   1.02x   1.06x   1.04x   1.02x   1.02x   1.02x   1.35x   1.35x
SEED           1.00x   0.97x   1.11x   1.05x   1.06x   1.08x   1.08x   1.05x   1.56x   1.57x
CAMELLIA128    1.03x   0.97x   1.12x   1.14x   1.06x   1.10x   1.06x   1.06x   1.73x   1.59x
CAMELLIA192    1.06x   1.00x   1.13x   1.10x   1.11x   1.11x   1.15x   1.08x   1.57x   1.58x
CAMELLIA256    1.06x   1.03x   1.10x   1.10x   1.11x   1.11x   1.13x   1.08x   1.57x   1.62x

[v2]:
- include stdint.h only when it's available
- use uintptr_t instead of long and intptr_t

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
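Replacing a byte-wise XOR loop with a word-at-a-time helper can be sketched as below. This is not the bufhelp.h code: `sketch_buf_xor` is a hypothetical name, and the sketch deliberately swaps in `memcpy`-based word loads and stores, which stay within C's aliasing rules and need no alignment check, in place of direct word-pointer accesses. Compilers commonly turn such fixed-size `memcpy` calls into plain loads and stores, though that is an optimization and not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR LEN bytes of SRC1 with SRC2 into DST.  Full uintptr_t-sized
   chunks are handled one word at a time; any remaining tail is
   handled bytewise.  DST may be the same buffer as SRC1 or SRC2. */
void
sketch_buf_xor (void *dst_, const void *src1_, const void *src2_,
                size_t len)
{
  unsigned char *dst = dst_;
  const unsigned char *src1 = src1_;
  const unsigned char *src2 = src2_;

  /* Word-sized chunks. */
  while (len >= sizeof (uintptr_t))
    {
      uintptr_t a, b;

      memcpy (&a, src1, sizeof a);
      memcpy (&b, src2, sizeof b);
      a ^= b;
      memcpy (dst, &a, sizeof a);

      dst += sizeof a;
      src1 += sizeof a;
      src2 += sizeof a;
      len -= sizeof a;
    }

  /* Remaining tail bytes. */
  for (; len; len--)
    *dst++ = *src1++ ^ *src2++;
}

/* Usage example: XOR an 11-byte buffer with an all-ones mask. */
void
sketch_buf_xor_example (void)
{
  unsigned char a[11], mask[11], out[11];

  for (int i = 0; i < 11; i++)
    {
      a[i] = (unsigned char) i;
      mask[i] = 0xff;
    }
  sketch_buf_xor (out, a, mask, sizeof out);
  assert (out[0] == 0xff && out[10] == (unsigned char) (10 ^ 0xff));
}
```

The length of 11 in the usage example is chosen so that both the word loop and the byte-wise tail run on typical 32-bit and 64-bit targets.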
2011-08-03 | Factor cipher mode code out to separate files. | Werner Koch | 1 | -0/+187
This prepares for adding more modes that are more complicated and thus call for separate files. For uniformity we do this for all modes except ECB. It also has the advantage of making CPU-specific variants of the code easier to implement (e.g. the XOR operations).