SG-Float: Achieving Memory Access and Computing Power Reduction Using Self-Gating Float in CNNs

Published:2023-11-09 Issue:6 Volume:22 Page:1-22
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Wu Jun-Shen¹, Hsu Tsen-Wei¹, Liu Ren-Shuo¹

Affiliation:

1. Department of Electrical Engineering, National Tsing Hua University, Taiwan

Abstract

Convolutional neural networks (CNNs) are essential for advancing the field of artificial intelligence. However, because these networks are highly demanding in memory and computation, implementing CNNs can be challenging. To make CNNs more accessible to energy-constrained devices, researchers are exploring new algorithmic techniques and hardware designs that reduce memory and computation requirements. In this work, we present self-gating float (SG-Float), an algorithm-hardware co-design of a novel binary number format that can significantly reduce memory access and computing power requirements in CNNs. SG-Float is a self-gating format that uses the exponent to self-gate the mantissa to zero. It exploits two properties: the exponent determines the magnitude of a floating-point value, and CNNs tolerate error. SG-Float represents relatively small values using only the exponent, which increases the proportion of ineffective mantissas and correspondingly reduces the number of floating-point mantissa multiplications. To minimize the accuracy loss caused by the approximation error that SG-Float introduces, we propose a fine-tuning process that determines the exponent thresholds of SG-Float and recovers the lost accuracy. We also develop a hardware optimization technique, called the SG-Float buffering strategy, to best match SG-Float with CNN accelerators and further reduce memory access. We apply the SG-Float buffering strategy to vector-vector multiplication processing elements (PEs), which NVDLA adopts, in TSMC 40 nm technology. Our evaluation results demonstrate that SG-Float can achieve up to a 35% reduction in memory access power and up to a 54% reduction in computing power compared with AdaptivFloat, a state-of-the-art format, with negligible power and area overhead. Additionally, we show that SG-Float can be combined with neural network pruning methods to further reduce memory access and mantissa multiplications in pruned CNN models. Overall, our work shows that SG-Float is a promising solution to the problem of CNN memory access and computing power.
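
The self-gating idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `self_gate`, the parameter `exp_threshold`, the sample values, and the choice to truncate (rather than round) a gated value to a signed power of two are all assumptions made for illustration. The abstract does not specify the bit widths, the rounding rule, or how thresholds are assigned.

```python
import math

def self_gate(value, exp_threshold):
    """Return (approximation, gated) for a single value.

    Assumption (not taken from the paper): a value is gated when its
    normalized exponent is below exp_threshold, and a gated value keeps
    only its sign and exponent, so it becomes +/-2**exponent.
    """
    if value == 0.0:
        return 0.0, False
    # math.frexp gives value == m * 2**e with 0.5 <= |m| < 1, so the
    # normalized form 1.f * 2**(e - 1) has exponent e - 1.
    _, e = math.frexp(value)
    exponent = e - 1
    if exponent < exp_threshold:
        # Mantissa fraction forced to zero: only sign and exponent remain.
        return math.copysign(math.ldexp(1.0, exponent), value), True
    return value, False

# Hypothetical weights; with exp_threshold=-1, magnitudes below 0.5 are gated.
weights = [0.75, 0.1, -0.3, 3.0, 0.0]
results = [self_gate(w, exp_threshold=-1) for w in weights]
approx = [a for a, _ in results]
gated_fraction = sum(g for _, g in results) / len(weights)
```

In this sketch a gated value needs no mantissa multiplication, because multiplying by a power of two only adjusts the exponent. Raising `exp_threshold` gates more values at the cost of a larger approximation error, which is the trade-off the fine-tuning process described in the abstract is meant to balance.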

Funder

NSTC

MOE

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3624582
