Parallel modular multiplication using 512-bit advanced vector instructions

Published: 2021-02-13
ISSN: 2190-8508
Container-title: Journal of Cryptographic Engineering
Language: en
Short-container-title: J Cryptogr Eng

Authors:

Buhrow Benjamin, Gilbert Barry, Haider Clifton

Abstract

Applications such as public-key cryptography are critically reliant on the speed of modular multiplication for their performance. This paper introduces a new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using the new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families. Our parallel-multiplication approach also allows for squaring and sub-quadratic Karatsuba enhancements. We demonstrate a 1.9× improvement in decryption throughput in comparison with OpenSSL and a 1.5× improvement in modular exponentiation throughput compared to GMP-6.1.2 on an Intel Xeon CPU. In addition, we show a 1.4× improvement in decryption throughput in comparison with state-of-the-art vector implementations on many-core Knights Landing Xeon Phi hardware. Finally, we show how interleaving Chinese remainder theorem-based RSA calculations within our parallel BPS technique halves decryption latency while providing protection against fault-injection attacks.
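The abstract builds on two standard ideas: Montgomery multiplication and CRT-based RSA decryption. The Python sketch below shows scalar, textbook versions of both so the terms are concrete. It is not the authors' Block Product Scanning method and uses no AVX-512 vectorization. The limb size `word_bits` and the helper names (`montgomery_setup`, `redc`, `mont_mul`, `mont_pow`, `rsa_decrypt_crt_checked`) are hypothetical. The fault check shown (re-encrypting the result with the public exponent before releasing it, the usual response to the Boneh, DeMillo and Lipton fault attack) is one common countermeasure and not necessarily the mechanism used in the paper. It assumes Python 3.8+ for `pow(x, -1, m)`.

```python
# Textbook scalar sketch: NOT the paper's block-based BPS method and not vectorized.
# Function names and the 64-bit limb size are hypothetical choices for illustration.

def montgomery_setup(n, word_bits=64):
    """Return (R, n_prime) for an odd modulus n, where R = 2**(k*word_bits) > n
    and n * n_prime == -1 (mod R)."""
    if n < 3 or n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus n >= 3")
    k = (n.bit_length() + word_bits - 1) // word_bits  # number of limbs
    r = 1 << (k * word_bits)
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t, n, r, n_prime):
    """Montgomery reduction: return t * R^-1 mod n, valid for 0 <= t < n*R."""
    m = ((t % r) * n_prime) % r
    u = (t + m * n) // r  # exact: t + m*n is divisible by R by choice of n_prime
    return u - n if u >= n else u  # u < 2n, so one conditional subtraction suffices


def mont_mul(a_bar, b_bar, n, r, n_prime):
    """Multiply two operands already in Montgomery form (x_bar = x*R mod n)."""
    return redc(a_bar * b_bar, n, r, n_prime)


def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply with every product done in Montgomery form."""
    r, n_prime = montgomery_setup(n)
    x_bar = (base % n) * r % n
    acc = r % n  # Montgomery form of 1
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r, n_prime)  # squaring step
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, r, n_prime)
    return redc(acc, n, r, n_prime)  # convert back out of Montgomery form


def rsa_decrypt_crt_checked(c, p, q, d, e):
    """CRT-based RSA decryption followed by a re-encryption check against faults."""
    n = p * q
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    # The two half-size exponentiations are independent of each other.
    mp = mont_pow(c % p, dp, p)
    mq = mont_pow(c % q, dq, q)
    h = (q_inv * (mp - mq)) % p  # Garner recombination
    m = mq + h * q
    # A fault in either half makes m wrong modulo one prime only, which can
    # leak a factor of n; refuse to release a result that fails re-encryption.
    if pow(m, e, n) != c % n:
        raise RuntimeError("CRT result failed verification (possible fault)")
    return m
```

In the CRT path, the exponentiations modulo p and modulo q do not depend on each other, which is what makes it possible to run them side by side; the abstract reports that interleaving them within the parallel BPS technique halves decryption latency.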

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Software

Link

http://link.springer.com/content/pdf/10.1007/s13389-021-00256-9.pdf

References (30 articles)

1. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the importance of checking cryptographic protocols for faults (extended abstract). In: Advances in Cryptology—EUROCRYPT ’97, International Conference on the Theory and Application of Cryptographic Techniques, May 11–15, 1997, Lecture Notes in Computer Science, vol. 1233, pp. 37–51. Springer (1997). https://doi.org/10.1007/3-540-69053-0_4

2. Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Selected Areas in Cryptography—SAC, August 14–16, 2013, pp. 471–489 (2013). https://doi.org/10.1007/978-3-662-43414-7_24

3. Chang, C., Yao, S., Yu, D.: Vectorized big integer operations for cryptosystems on the Intel MIC architecture. In: 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp. 194–203 (2015). https://doi.org/10.1109/HiPC.2015.54

4. Drucker, N., Gueron, S.: Fast modular squaring with AVX512IFMA. Cryptology ePrint Archive, Report 2018/335 (2018). http://eprint.iacr.org/2018/335

5. Emmart, N., Luitjens, J., Weems, C., Woolley, C.: Optimizing modular multiplication for NVIDIA’s Maxwell GPUs. In: 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH), pp. 47–54 (2016). https://doi.org/10.1109/ARITH.2016.21

Cited by 1 article.

1. Efficient Large Integer Multiplication with Arm SVE Instructions. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (2023-02-27).