MaskSIMD-lib: on the performance gap of a generic C optimized assembly and wide vector extensions for masked software with an Ascon-p test case

Author:

Salomon Dor,Levi ItamarORCID

Abstract

AbstractEfficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to show the prime performance gaps we can expect between different implementations and suggest how to harness the underlying hardware efficiently, a daunting task for various masking-orders or masking algorithm (multiplications, refreshing etc.). This paper focuses on implementations targeting wide vector bitsliced designs, such as the ISAP algorithm. We explore concrete instances of implementations utilizing processors enabled by wide-vector capability extensions of the AMD64 Instruction Set Architecture (ISA); namely, the SSE2/3/4.1, AVX-2 and AVX-512 Streaming Single Instruction Multiple Data extensions. These extensions mainly enable efficient memory level parallelism and provide a gradual reduction in computation-time as a function of the level of extension and the hardware support for instruction-level parallelism. For the first time we provide a complete open-source repository of such gadgets tailored for these extensions, various gadgets types and for all orders. We evaluate the disparities between $$ generic $$ generic high-level language masking implementations for optimized (inline-) assembly and conventional single execution path data-path architectures such as the ARM architecture. We underscore the crucial trade-off between state storage in the data-memory as compared to keeping it in the register-file (RF). This relates specifically to masked designs, and is particularly difficult to resolve because it requires inline-assembly manipulations and is not natively supported by compilers. Moreover, as the masking order (d) increases and the state gets larger, there must be an increase in data memory read/write accesses for state handling since the RF is simply not large enough. This requires careful optimization which depends to a considerable extent on the underlying algorithm to implement. We discuss how full utilization of SSE extensions is not always possible; i.e. when d is not a power of two, and pin-point the optimal d values and very sub-optimal values of d which aggressively under-utilize the hardware. More generally, this paper presents several different fully generic masked implementations for any order or multiple highly optimized (inline-) assembly instances which are quite generic (for a wide spectrum of ISAs and extensions), and provide very specific implementations targeting specific extensions. The goal is to promote open-source availability, research, improvement and implementations relating to SCA security and masked designs. The building blocks and methodologies provided here are portable and can be easily adapted to other algorithms.

Funder

Israel Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications,Software

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3