Abstract
AbstractApplications such as public-key cryptography are critically reliant on the speed of modular multiplication for their performance. This paper introduces a new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families. Our parallel-multiplication approach also allows for squaring and sub-quadratic Karatsuba enhancements. We demonstrate $$1.9\,\times $$
1.9
×
improvement in decryption throughput in comparison with OpenSSL and $$1.5\,\times $$
1.5
×
improvement in modular exponentiation throughput compared to GMP-6.1.2 on an Intel Xeon CPU. In addition, we show $$1.4\,\times $$
1.4
×
improvement in decryption throughput in comparison with state-of-the-art vector implementations on many-core Knights Landing Xeon Phi hardware. Finally, we show how interleaving Chinese remainder theorem-based RSA calculations within our parallel BPS technique halves decryption latency while providing protection against fault-injection attacks.
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Software
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Efficient Large Integer Multiplication with Arm SVE Instructions;Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region;2023-02-27