MaskSIMD-lib: on the performance gap of a generic C optimized assembly and wide vector extensions for masked software with an Ascon-p test case
-
Published:2023-05-29
Issue:3
Volume:13
Page:325-342
-
ISSN:2190-8508
-
Container-title:Journal of Cryptographic Engineering
-
language:en
-
Short-container-title:J Cryptogr Eng
Author:
Salomon Dor, Levi ItamarORCID
Abstract
AbstractEfficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to show the prime performance gaps we can expect between different implementations and suggest how to harness the underlying hardware efficiently, a daunting task for various masking-orders or masking algorithm (multiplications, refreshing etc.). This paper focuses on implementations targeting wide vector bitsliced designs, such as the ISAP algorithm. We explore concrete instances of implementations utilizing processors enabled by wide-vector capability extensions of the AMD64 Instruction Set Architecture (ISA); namely, the SSE2/3/4.1, AVX-2 and AVX-512 Streaming Single Instruction Multiple Data extensions. These extensions mainly enable efficient memory level parallelism and provide a gradual reduction in computation-time as a function of the level of extension and the hardware support for instruction-level parallelism. For the first time we provide a complete open-source repository of such gadgets tailored for these extensions, various gadgets types and for all orders. We evaluate the disparities between $$ generic $$
generic
high-level language masking implementations for optimized (inline-) assembly and conventional single execution path data-path architectures such as the ARM architecture. We underscore the crucial trade-off between state storage in the data-memory as compared to keeping it in the register-file (RF). This relates specifically to masked designs, and is particularly difficult to resolve because it requires inline-assembly manipulations and is not natively supported by compilers. Moreover, as the masking order (d) increases and the state gets larger, there must be an increase in data memory read/write accesses for state handling since the RF is simply not large enough. This requires careful optimization which depends to a considerable extent on the underlying algorithm to implement. We discuss how full utilization of SSE extensions is not always possible; i.e. when d is not a power of two, and pin-point the optimal d values and very sub-optimal values of d which aggressively under-utilize the hardware. More generally, this paper presents several different fully generic masked implementations for any order or multiple highly optimized (inline-) assembly instances which are quite generic (for a wide spectrum of ISAs and extensions), and provide very specific implementations targeting specific extensions. The goal is to promote open-source availability, research, improvement and implementations relating to SCA security and masked designs. The building blocks and methodologies provided here are portable and can be easily adapted to other algorithms.
Funder
Israel Science Foundation
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Software
Reference26 articles.
1. Barthe, G., Belaïd, S., Cassiers, G., Fouque, B.P.-A., Grégoire, B., Standaert, F.-X.: Maskverif: Automated verification of higher-order masking in presence of physical defaults. In: European Symposium on Research in Computer Security, pp. 300–318. Springer (2019) 2. Barthe, G., Belaïd, S., Dupressoir, F., Fouque, P.-A., Grégoire, B., Strub, P.-Y., Zucchini, R.: Strong non-interference and type-directed higher-order masking. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 116–129 (2016) 3. Barthe, G., Dupressoir, F., Faust, S., Grégoire, B., Standaert, F.-X., Strub, P.-Y.: Parallel implementations of masking schemes and the bounded moment leakage model. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 535–566. Springer (2017) 4. Belaïd, S., Dagand, P.-E., Mercadier, D., Rivain, M., Wintersdorff, R.: Tornado: automatic generation of probing-secure masked bitsliced implementations. In: Eurocrypt 2020-39th Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 12107, pp. 311–341. Springer (2020) 5. Bilgin, B., De Meyer, L., Duval, S., Levi, I., Standaert, F.-X.: Low and depth and efficient inverses: a guide on s-boxes for low-latency masking. IACR Trans. Symmetric Cryptol. 2020(1), 144–184 (2020)
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|