Low-Power Feature-Attention Chinese Keyword Spotting Framework with Distillation Learning

Authors:

Lei Lei¹, Yuan Guoshun², Zhang Tianle³, Yu Hongjiang¹

Affiliation:

1. Institute of Microelectronics of Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China

2. Institute of Microelectronics of Chinese Academy of Sciences, Beijing, China

3. Institute of Automation, Chinese Academy of Sciences, China and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Abstract

In this paper, we propose a novel Low-Power Feature-Attention Chinese Keyword Spotting Framework based on a depthwise separable convolutional neural network (DSCNN) with distillation learning to recognize speech signals of Chinese wake-up words. The framework consists of a low-power feature-attention acoustic model and its learning methods. Unlike existing models, the proposed acoustic model based on connectionist temporal classification (CTC) focuses on reducing power consumption by cutting network parameters and multiply-accumulate (MAC) operations through our designed feature-attention network and DSCNN. In particular, the feature-attention network is specially designed to extract effective syllable features from a large number of MFCC features. It refines the MFCC features by selectively focusing on informative speech features and discarding uninformative ones, which significantly reduces the parameters and MAC operations of the whole acoustic model. Moreover, DSCNN, which requires fewer parameters and MAC operations than a traditional convolutional neural network, is adopted to extract effective high-dimensional features from the syllable features. Furthermore, we apply a distillation learning algorithm to efficiently train the proposed low-power acoustic model by utilizing the knowledge of a trained large acoustic model. Experimental results verify the effectiveness of our model and show that the proposed acoustic model achieves better accuracy than other acoustic models while exhibiting the lowest power consumption and latency, as measured on an NVIDIA Jetson TX2. It has only 14.524 KB of parameters, consumes only 0.141 J of energy per query, and incurs 17.9 ms of latency on the platform, making it hardware-friendly.
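A minimal sketch of the components the abstract describes, assuming PyTorch. The module names (FeatureAttention, DSConvBlock), layer sizes, number of retained features, and the distillation temperature are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Hedged sketch in PyTorch; names, sizes, and hyperparameters are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAttention(nn.Module):
    """Scores each MFCC coefficient and keeps only the most informative ones,
    shrinking the feature dimension before the convolutional front end."""

    def __init__(self, n_mfcc: int = 40, n_keep: int = 20):
        super().__init__()
        self.score = nn.Linear(n_mfcc, n_mfcc)  # per-coefficient attention logits
        self.n_keep = n_keep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_mfcc)
        weights = torch.sigmoid(self.score(x.mean(dim=1)))   # (batch, n_mfcc)
        x = x * weights.unsqueeze(1)                          # re-weight features
        topk = weights.topk(self.n_keep, dim=-1).indices      # drop weak features
        return torch.gather(x, 2, topk.unsqueeze(1).expand(-1, x.size(1), -1))


class DSConvBlock(nn.Module):
    """Depthwise separable 1-D convolution: a depthwise conv followed by a
    pointwise conv, using far fewer parameters/MACs than a standard conv."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))


def distillation_loss(student_logits, teacher_logits, ctc_loss,
                      alpha: float = 0.5, T: float = 2.0):
    """Combine the hard CTC loss with a soft KL term against the teacher."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * ctc_loss + (1.0 - alpha) * soft
```

The attention gate illustrates the abstract's feature-reduction idea (fewer input features mean fewer downstream parameters and MACs), and the depthwise/pointwise split is the standard way a DSCNN trades a full convolution for two cheaper ones.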

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

