Author:
Hassani Sadi Mohammad, Sudarshan Chirag, Wehn Norbert
Abstract
Training Deep Neural Networks (DNNs) incurs a high energy cost, and off-chip memory accesses contribute a major portion of the overall energy consumption. The number of off-chip memory transactions can be reduced by quantizing data words to a low bit-width (e.g., 8 bits), but low-bit-width data formats suffer from a limited dynamic range, which degrades accuracy. In this paper, a novel 8-bit Floating Point (FP8) quantized DNN training methodology is presented that adapts to the required dynamic range on the fly. Our methodology relies on varying the bias value of the FP8 format to fit its dynamic range to the range of the DNN parameters and input feature maps. The range fitting during training is performed adaptively by an online statistical analysis hardware unit without stalling the compute units or their data accesses. Our approach is compatible with any DNN compute core without major modifications to its architecture. We propose to integrate the new FP8 quantization unit into the memory controller: FP32 data from the compute core are converted to FP8 in the memory controller before being written to DRAM and converted back after being read from DRAM. Our results show that DRAM access energy is reduced by 3.07× when using the 8-bit data format instead of 32-bit, while the accuracy loss of the proposed 8-bit quantized training is ≈1% across various networks on image and natural language processing datasets.
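To illustrate the range-fitting idea described above, the sketch below quantizes an FP32 tensor to an FP8-like format whose exponent bias is chosen from the tensor's maximum magnitude. The 1-4-3 (sign-exponent-mantissa) layout, the bias-selection rule, and all function names are illustrative assumptions for this sketch, not the paper's hardware implementation or its exact FP8 encoding.

```python
import numpy as np

# Assumed FP8-like layout for illustration: 1 sign bit, 4 exponent bits,
# 3 mantissa bits, with a programmable exponent bias (no special encodings).
EXP_BITS, MAN_BITS = 4, 3
EXP_MAX = (1 << EXP_BITS) - 1  # largest biased exponent

def choose_bias(tensor):
    """Pick an exponent bias so the tensor's largest magnitude falls near the
    top of the FP8 range (the 'range fitting' the paper performs online)."""
    max_abs = float(np.max(np.abs(tensor)))
    if max_abs == 0.0:
        return 0
    return EXP_MAX - int(np.floor(np.log2(max_abs)))

def fp32_to_fp8(x, bias):
    """Round-to-nearest quantization to the assumed FP8 format; returns the
    dequantized FP32 values so the induced quantization error is visible."""
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, -bias, EXP_MAX - bias)        # saturate to the FP8 exponent range
    scale = 2.0 ** (exp - MAN_BITS)                  # spacing of representable values
    mantissa = np.minimum(np.round(mag / scale),     # clamp to the largest significand
                          (1 << (MAN_BITS + 1)) - 1)
    return sign * mantissa * scale

x = (np.random.randn(1024) * 1e-3).astype(np.float32)  # small-magnitude values
bias = choose_bias(x)                                   # adapt the bias to the data
x_q = fp32_to_fp8(x, bias)
print("bias:", bias, "max abs error:", np.max(np.abs(x - x_q)))
```

In the paper's scheme this conversion sits in the memory controller, so the compute core continues to operate on FP32 while only the DRAM traffic uses the 8-bit representation.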
Funder
Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Publisher
Springer Science and Business Media LLC