FPGA Implementation of Keyword Spotting System Using Depthwise Separable Binarized and Ternarized Neural Networks
Author:
Bae Seongwoo 1, Kim Haechan 1, Lee Seongjoo 2,3,4, Jung Yunho 1,5
Affiliation:
1. School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540, Republic of Korea
2. Department of Semiconductor Systems Engineering, Sejong University, Seoul 05006, Republic of Korea
3. Institute of Semiconductor and System IC, Sejong University, Seoul 05006, Republic of Korea
4. Department of Convergence Engineering of Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
5. Department of Smart Air Mobility, Korea Aerospace University, Goyang-si 10540, Republic of Korea
Abstract
Keyword spotting (KWS) systems are used for human–machine communication in various applications. In many cases, KWS involves a combination of wake-up-word (WUW) recognition for device activation and voice command classification. These tasks are challenging for embedded systems because of the complexity of deep learning algorithms and the need for a separately optimized network for each application. In this paper, we propose a depthwise separable binarized/ternarized neural network (DS-BTNN) hardware accelerator capable of performing both WUW recognition and command classification on a single device. The design achieves significant area efficiency by reusing the same bitwise operators for the computations of the binarized neural network (BNN) and the ternary neural network (TNN). In a complementary metal-oxide semiconductor (CMOS) 40 nm process environment, the DS-BTNN accelerator demonstrated significant efficiency: compared with a design in which the BNN and TNN were developed independently and integrated into the system as two separate modules, our method achieved a 49.3% area reduction, with an area of 0.558 mm². The designed KWS system, implemented on a Xilinx UltraScale+ ZCU104 field-programmable gate array (FPGA) board, receives real-time audio from the microphone, preprocesses it into a mel spectrogram, and uses this as input to the classifier. Depending on the requested task, the network operates as a BNN for WUW recognition or as a TNN for command classification. Operating at 170 MHz, our system achieved 97.1% accuracy in BNN-based WUW recognition and 90.5% in TNN-based command classification.
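The area sharing between the BNN and TNN datapaths described above can be illustrated with a minimal sketch. The encoding below (a sign bit-plane plus a non-zero mask for ternary weights) and all names (`bnn_dot`, `tnn_dot`, `N`) are illustrative assumptions rather than the paper's actual RTL or encoding; the sketch only shows how a single XNOR/popcount path can evaluate both a binarized and a ternarized dot product.

```python
# Minimal sketch (assumed encoding, not the paper's hardware design):
# a BNN dot product uses XNOR + popcount; a TNN dot product reuses the
# same path by storing ternary weights as a sign plane plus a non-zero mask.

N = 64  # assumed vector length per popcount window
FULL = (1 << N) - 1  # N-bit mask to keep Python integers non-negative

def popcount(x: int) -> int:
    return bin(x).count("1")

def bnn_dot(a_bits: int, w_bits: int) -> int:
    """Binary dot product; each bit encodes {+1 -> 1, -1 -> 0}."""
    matches = popcount(~(a_bits ^ w_bits) & FULL)  # XNOR + popcount
    return 2 * matches - N

def tnn_dot(a_bits: int, w_sign: int, w_mask: int) -> int:
    """Ternary dot product with weights in {-1, 0, +1}.
    w_mask bit = 1 where the weight is non-zero; w_sign bit = 1 where it is +1.
    The XNOR/popcount path is reused, gated by the mask."""
    matches = popcount(~(a_bits ^ w_sign) & w_mask & FULL)
    return 2 * matches - popcount(w_mask)

if __name__ == "__main__":
    import random
    random.seed(0)
    a = random.getrandbits(N)      # packed ±1 activations
    w = random.getrandbits(N)      # packed ±1 weights (BNN mode)
    sign = random.getrandbits(N)   # ternary sign plane (TNN mode)
    mask = random.getrandbits(N)   # ternary non-zero mask (TNN mode)
    print(bnn_dot(a, w), tnn_dot(a, sign, mask))
```

Under this assumed encoding, the only extra logic the ternary mode needs beyond the binary XNOR/popcount path is an AND with the mask plane, which is one plausible route to the kind of operator sharing and area reduction the abstract reports.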
Funder
Ministry of Trade, Industry, and Energy; IDEC
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
2 articles.