Affiliation:
1. Beijing Smartchip Microelectronics Technology Co., Ltd., Beijing 102299, China
2. School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
Abstract
Recently, frequent data movement between computing units and memory during floating-point arithmetic has become a major bottleneck for scientific computing. Computing-in-memory (CIM) is a novel computing paradigm that merges computing logic into memory, addressing the data movement problem with excellent power efficiency. However, previous CIM designs have not supported the double-precision floating-point format (FP64) because of its computational complexity. This paper presents a novel all-digital CIM macro, DCIM-FF, that performs the FP64-based fused multiply-add (FMA) operation for the first time. With 16 sub-CIM cells integrating digital multipliers to complete the mantissa multiplication, DCIM-FF provides correctly rounded results for normalized/denormalized inputs in round-to-nearest-even mode and round-to-zero mode, respectively. To evaluate our design, we synthesized and tested the DCIM-FF macro in 55-nm CMOS technology. With a minimum power consumption of 0.12 mW and a maximum computing efficiency of 26.9 TOPS/W, we demonstrated that DCIM-FF can run the FP64-based FMA operation without error. Compared with related work, the proposed CIM-based DCIM-FF macro shows a significant improvement in power efficiency and lower area overhead. This work opens a path toward high-performance implementation of FP64-based matrix-vector multiplication (MVM), which is essential for hyperscale scientific computing.
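To make the FMA and rounding terminology concrete, the following is a minimal software sketch of what "a single correctly rounded a*b + c" means in FP64. It is not the DCIM-FF hardware datapath: the function name `fma_reference` and the `mode` strings are hypothetical, and the sketch assumes finite inputs only (no NaN or infinity handling, no signed-zero preservation, and results that overflow raise an error).

```python
import math
from fractions import Fraction


def fma_reference(a: float, b: float, c: float, mode: str = "rne") -> float:
    """Hypothetical reference model: a*b + c rounded once to FP64.

    mode "rne" = round-to-nearest-even, mode "rz" = round-to-zero.
    Assumes finite inputs and a result within the FP64 range.
    """
    # Every finite double is an exact rational, so this sum has no rounding error.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Python's integer true division is correctly rounded, so float() applies
    # one round-to-nearest-even step, including for subnormal results.
    nearest = float(exact)
    if mode == "rne":
        return nearest
    if mode == "rz":
        # If the nearest value lies farther from zero than the exact result,
        # step back to the adjacent double toward zero (Python 3.9+).
        if abs(Fraction(nearest)) > abs(exact):
            return math.nextafter(nearest, 0.0)
        return nearest
    raise ValueError(f"unknown rounding mode: {mode}")


# Fused versus unfused: the product 1 - 2**-60 is not representable in FP64.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
unfused = a * b + c              # product rounds to 1.0 first, so this is 0.0
fused = fma_reference(a, b, c)   # single rounding keeps -2**-60
print(unfused, fused)

# The two rounding modes differ when the exact result is not representable.
x = 3 * 2.0**-54                 # 0.75 of the spacing between doubles just above 1.0
print(fma_reference(1.0, 1.0, x, "rne"))  # rounds up to 1 + 2**-52
print(fma_reference(1.0, 1.0, x, "rz"))   # truncates to 1.0
```

The exact-rational approach is chosen for clarity rather than speed; it serves only as a golden model against which a hardware or fixed-width implementation could be compared.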
Funder
The Laboratory Open Fund of Beijing Smart-Chip Microelectronics Technology Co., Ltd.
National Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science