Optimization of the 24-Bit Fixed-Point Format for the Laplacian Source
Published: 2023-01-21
Container-title: Mathematics
Volume: 11
Issue: 3
Page: 568
ISSN: 2227-7390
Language: en
Author:
Zoran Perić 1, Milan Dinčić 1
Affiliation:
1. Faculty of Electronic Engineering, University of Niš, 18104 Niš, Serbia
Abstract
The 32-bit floating-point (FP32) binary format, commonly used for data representation in computers, introduces high complexity, requiring powerful and expensive processing hardware and high energy consumption; it is therefore unsuitable for sensor nodes, edge devices, and other devices with limited hardware resources. It is thus often necessary to use binary formats of lower complexity than FP32. This paper proposes a 24-bit fixed-point format that reduces complexity in two ways: by decreasing the number of bits and by using fixed-point rather than floating-point representation, which is significantly less complex. The paper optimizes the 24-bit fixed-point format and examines its performance for data with the Laplacian distribution, exploiting the analogy between fixed-point binary representation and uniform quantization. First, the 24-bit uniform quantizer is optimized by deriving two new closed-form formulas for a highly accurate calculation of its maximal amplitude. Then, the 24-bit fixed-point format is optimized by tuning its key parameter and by proposing two adaptation procedures, with the aim of matching the performance of the optimal uniform quantizer over a wide range of input-data variances. It is shown that the proposed 24-bit fixed-point format achieves 18.425 dB higher performance than the floating-point format with the same number of bits while being less complex.
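The analogy between fixed-point representation and uniform quantization mentioned above can be made concrete with a short numerical experiment. The following is a minimal Python sketch, not taken from the paper: it models an R-bit fixed-point word as a 2^R-level mid-rise uniform quantizer with support [-x_max, x_max] and estimates the resulting signal-to-quantization-noise ratio (SQNR) for a Laplacian source. The support x_max = 14*sigma is an illustrative placeholder; the paper derives closed-form formulas for the optimal maximal amplitude, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

R = 24                       # total bits of the fixed-point format
sigma = 1.0                  # standard deviation of the Laplacian source

# Laplacian samples with variance sigma^2 (scale parameter b = sigma/sqrt(2))
x = rng.laplace(loc=0.0, scale=sigma / np.sqrt(2), size=1_000_000)

# An R-bit fixed-point word acts like an N-level uniform quantizer,
# N = 2^R, supported on [-x_max, x_max] with step delta = 2*x_max / N.
N = 2 ** R
x_max = 14.0 * sigma         # illustrative support width (assumption, not
                             # the paper's optimal maximal amplitude)
delta = 2.0 * x_max / N

# Mid-rise uniform quantization, with clipping (overload) at the support edges
xq = delta * (np.floor(x / delta) + 0.5)
xq = np.clip(xq, -x_max + delta / 2, x_max - delta / 2)

# SQNR in dB: signal power over quantization-noise power
sqnr_db = 10.0 * np.log10(np.mean(x ** 2) / np.mean((x - xq) ** 2))
print(f"SQNR = {sqnr_db:.2f} dB")
```

Re-running the sketch with a different input variance while x_max stays fixed shows the performance loss under variance mismatch, which is what the paper's two adaptation procedures are designed to remove.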
Funder
Science Fund of the Republic of Serbia
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by
3 articles.