A 34.7 µW Speech Keyword Spotting IC Based on Subband Energy Feature Extraction

Published: 2023-07-31
Volume: 12
Issue: 15
Page: 3287
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Wu Gexuan¹, Wei Jianlong¹, Wang Shuai¹, Wei Guangshun¹, Li Bing¹

Affiliation:

1. State Key Laboratory of Radio Frequency Heterogeneous Integration, Shenzhen University, Shenzhen 518060, China

Abstract

In the era of the Internet of Things (IoT), voice control has enhanced human–machine interaction, and the accuracy of keyword spotting (KWS) algorithms has reached 97%; however, the high power consumption of KWS algorithms, caused by their large computing and storage requirements, has limited their application in Artificial Intelligence of Things (AIoT) devices. In this study, voice features are extracted using the fast discrete cosine transform (FDCT) for the frequency-domain transformation, which also shortens the calculation of the logarithmic spectrum and cepstrum. The designed KWS system is a two-stage wake-up system in which a sound detection (SD) stage wakes up the KWS stage. KWS network inference uses time-division computation, which reduces the KWS clock to an ultra-low frequency of 24 kHz. In addition, the use of a depthwise separable convolutional neural network (DSCNN) greatly reduces the number of parameters and the amount of computation. In GSMC 0.11 µm technology, post-layout simulation results show that the total synthesized area of the entire system circuit is 0.58 mm², the power consumption is 34.7 µW, and the F1-score of the KWS is 0.89 with 10 dB noise, which makes the design suitable as a KWS system in AIoT devices.
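The parameter saving that the abstract attributes to the DSCNN can be illustrated with simple arithmetic. The sketch below compares the weight count of a standard convolution layer with that of a depthwise separable layer (a depthwise k×k convolution followed by a 1×1 pointwise convolution). The kernel size and channel counts are hypothetical values chosen for illustration; they are not taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, assumed for illustration only.
K, C_IN, C_OUT = 3, 64, 64

std = standard_conv_params(K, C_IN, C_OUT)
dsc = depthwise_separable_params(K, C_IN, C_OUT)
ratio = dsc / std  # equals 1 / C_OUT + 1 / K**2

print(f"standard: {std}, depthwise separable: {dsc}, ratio: {ratio:.3f}")
```

For this assumed shape, the depthwise separable layer needs roughly one eighth of the weights of the standard layer. The ratio 1/C_out + 1/k² shows that the saving grows with the kernel size and the number of output channels.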

Funder

Shenzhen Science and Technology Development Funds

Guangdong Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/15/3287/pdf
