Research on High-Performance Fourier Transform Algorithms Based on the NPU
Published: 2024-01-01
Issue: 1
Volume: 14
Page: 405
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Li Qing (1, 2), Zuo Decheng (1), Feng Yi (1), Wen Dongxin (1)
Affiliation:
1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
2. Jiangsu Automation Research Institute, Lianyungang 222006, China
Abstract
Backpack computers for field wearables need powerful intelligent-computing capabilities while keeping energy consumption in check. A CPU + NPU-based SoC is a recommended solution for this demand. In many wearable intelligence applications, the Fourier Transform is an essential, computationally intensive preprocessing task. However, due to the unique structure of the NPU, conventional Fourier Transform algorithms cannot be applied to it directly. This paper proposes two NPU-accelerated Fourier Transform algorithms that leverage the unique hardware structure of the NPU and provides three implementations of those algorithms, namely MM-2DFT, MV-2FFTm, and MV-2FFTv. We then benchmarked the speed and energy efficiency of our algorithms on a gray-image edge filtering task on the Huawei Atlas200I-DK-A2 development kit against the Cooley-Tukey algorithm running on CPU and GPU platforms. The experimental results show that MM-2DFT outperforms the OpenCL-based FFT on the NVIDIA Tegra X2 GPU for small input sizes, with a 4- to 8-fold speedup. Once the input image resolution exceeds 2048, MV-2FFTv approaches GPU computation speed. Additionally, two scenarios were tested and analyzed for energy efficiency, revealing that the cube units of the NPU are more energy efficient, whereas the vector and CPU units are better suited to sparse matrix multiplication and small-scale inputs, respectively.
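The abstract does not spell out how the algorithms work, but the name MM-2DFT suggests a 2D DFT expressed as dense matrix multiplication, which is the kind of workload a matrix (cube) unit handles well. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: it relies on the standard identity that the 2D DFT of an image X equals W_rows · X · W_cols, where W_n is the n × n DFT matrix. The function names `dft_matrix` and `dft2_by_matmul` are hypothetical, and splitting the complex product into real and imaginary parts is an assumption made here to mimic hardware that multiplies only real matrices.

```python
import numpy as np

def dft_matrix(n):
    """Return the real and imaginary parts of the n x n DFT matrix."""
    k = np.arange(n)
    angle = -2.0 * np.pi * np.outer(k, k) / n
    return np.cos(angle), np.sin(angle)

def dft2_by_matmul(image):
    """2D DFT of a real-valued image using only real matrix products."""
    rows, cols = image.shape
    wr_re, wr_im = dft_matrix(rows)
    wc_re, wc_im = dft_matrix(cols)
    # Transform along the row index: W_rows @ X (X is real).
    t_re = wr_re @ image
    t_im = wr_im @ image
    # Transform along the column index: T @ W_cols.
    # The DFT matrix is symmetric, so no transpose is needed.
    out_re = t_re @ wc_re - t_im @ wc_im
    out_im = t_re @ wc_im + t_im @ wc_re
    return out_re + 1j * out_im

if __name__ == "__main__":
    x = np.random.default_rng(0).random((8, 16))
    # Check against NumPy's FFT-based reference implementation.
    print(np.allclose(dft2_by_matmul(x), np.fft.fft2(x)))
```

This direct form costs O(N³) operations for an N × N image, compared with O(N² log N) for an FFT. That is consistent with the abstract's observation that the matrix-multiplication variant is fastest for small inputs, while FFT-style variants catch up at larger resolutions.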
Funder
National Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science