Affiliation:
1. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences , No.19 Shucun Road, Haidian District, Beijing 100093 , China
2. School of Cyber Security, University of Chinese Academy of Sciences , No.1 Yanqihu East Road, Huairou District, Beijing 101408 , China
Abstract
AbstractMasking is a well used and widely deployed countermeasure against side channel attacks, both in software and hardware. With masking comes at a great cost, search has focused on how to lower a performance penalty or find efficient masking implementation. In particular, our contribution is 2-fold: for software masking, we first find bitsliced implementations of Sbox with Multiplicative Complexity 4 and Multiplicative Depth 2, then adapt the common shares approach introduced by Coron et al. at CHES 2016 to make many cross-products $a_{i}\cdot b_{j}$ can be reuse for parallel ISW-based 32-bit nonlinear operations. Therefore, we improve the efficiency of 2$\times b/4/32$ parallel high-order masking of ISW scheme for RECTANGLE, TANGRAM and KNOT on 32-bit ARM embedded microprocessor, with roughly a 13%-34% speed-up, at cost of $(1+d) \times 32$-bit randomness. For hardware masking, 4 bit cubic Sboxes with quadratic decomposition length 2, including RECTANGLE, TANGRAM, KNOT and LWC third-round candidates, can be implemented with a 3-share and 4-share threshold implementation (TI) by decomposing cubic permutations $S$ as a composition of sub-permutations having lower algebraic degrees. We use two decomposition form: one composition of two quadratic permutations $G$ and $F$, $S = F\circ G$, is for efficiency; the other composition of some linear permutations $A_i$ and one quadratic permutation $G$, $S=A_3 \circ G \circ A_2 \circ G \circ A_1 $, is for reducing the area requirements. For $S = F\circ G$, we introduce a new approach of searching through all possible quadratic permutations $G$ with 2$^{25.71}$, which is effcient than 2$^{26.23}$ in Poschmann et al. at J. Cryptol 2011. For $S=A_3 \circ G \circ A_2 \circ G \circ A_1 $, our approach of finding $A_i$ with complexity 2$^{27.71} $, which is effcient than the method introduced by Moradi et al. at ASIACRYPT 2016. In addition, we proposes a new decomposition that $S=G \circ A_2 \circ G \circ A_1 $. We can find the fastest and the smallest hard-ware decomposition implementation of 4-bit permutations for TI with 3 and 4 shares.
Funder
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)