Affiliation:
1. Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, Liaoning, P. R. China
Abstract
Keyword spotting plays a crucial role in realizing voice-based user interaction on intelligent equipment terminals and service robots. In this task, it remains challenging to balance low memory use against high precision. To better satisfy this requirement, we propose an end-to-end neural architecture built from sandglass residual blocks with an embedded gated channel-wise attention mechanism. The sandglass residual blocks use 1D separable convolutions to extract bottleneck temporal features, which drives the model to focus on the speech segment while using fewer parameters. In particular, the gated attention mechanism helps the model enhance the critical temporal speech features, suppress the uninformative ones, and focus further on the part of the human speech region that matters most for keyword spotting. Experimental results on the Google Speech Commands Dataset show that our proposed model reaches an accuracy of 97.4% with only 46K parameters. Compared with the most accurate baseline method, our model has 54% fewer parameters and 0.8% higher accuracy. This brings us a step closer to the goal of low memory and high precision.
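The parameter saving that the abstract attributes to 1D separable convolutions can be illustrated with a short count. This is only a sketch: it assumes the common depthwise-plus-pointwise form of a separable convolution, and the channel counts and kernel size are hypothetical values chosen for illustration, not the paper's configuration.

```python
def conv1d_params(c_in, c_out, k, bias=False):
    """Weight count of a standard 1D convolution."""
    return c_in * c_out * k + (c_out if bias else 0)


def separable_conv1d_params(c_in, c_out, k, bias=False):
    """Weight count of a depthwise conv (one k-tap filter per input
    channel) followed by a pointwise (kernel size 1) conv."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes (assumed, not taken from the paper).
c_in, c_out, k = 64, 64, 9

standard = conv1d_params(c_in, c_out, k)
separable = separable_conv1d_params(c_in, c_out, k)
print(standard, separable, round(separable / standard, 3))
```

For these assumed sizes the separable layer needs roughly one eighth of the weights of the standard layer, which is the kind of reduction that makes small parameter budgets feasible.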
Funder
Liaoning United Foundation
Liaoning Key R&D Program
Liaoning Revitalization Talents Program
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
3 articles.