Author:
Albaqshi Hussain, Sagheer Alaa
Abstract
Automatic speech recognition (ASR) automatically transcribes human speech into text. ASR systems have recently come close to human performance in specific scenarios. In contrast, dysarthric speech recognition (DSR) remains challenging for several reasons. These include unintelligible speech, irregular articulation of phonemes, and scarce, heterogeneous data. Most existing DSR works use ASR systems trained on unimpaired speech to recognize impaired speech, which is impractical and inefficient. In this paper, we developed a deep convolutional recurrent neural network (CRNN) model and compared its performance with that of a vanilla convolutional neural network (CNN) model. We trained both models on samples from the Torgo dataset, which contains a mix of impaired and unimpaired speech data. The experimental results show that the CRNN model attains 40.6%, compared with 31.4% for the vanilla CNN. This indicates that the proposed hybrid CRNN structure is effective at improving the recognition of dysarthric speech.
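The abstract names the hybrid CRNN structure without describing its layers. A rough, hedged illustration of the general idea follows. A convolutional front end extracts local features from each window of acoustic frames. The vanilla CNN then summarises those features over time by pooling, while the CRNN passes them through a recurrent layer that carries a state across frames. This is not the authors' architecture. The following are all hypothetical choices made only for illustration:

- the 13-dimensional MFCC-like input frames
- the single convolutional layer
- the plain Elman recurrence (the paper may use LSTM or GRU cells)
- the mean-pooled CNN baseline
- the layer sizes

The weights are random and untrained.

```python
import math
import random

# Hypothetical, untrained CRNN vs. vanilla-CNN sketch in pure Python.
# Assumptions (not from the paper): input is a list of T feature frames
# (e.g. 13 MFCC-like coefficients), one conv layer, an Elman RNN, and
# mean pooling over time for the CNN baseline.

def conv1d_relu(frames, kernels, bias):
    """Valid 1-D convolution over time followed by ReLU.

    frames: T vectors of length F; kernels: C filters, each K x F.
    Returns T-K+1 vectors of length C.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for c, kern in enumerate(kernels):
            s = bias[c]
            for j in range(k):
                s += sum(w * x for w, x in zip(kern[j], frames[t + j]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def elman_rnn(seq, w_in, w_rec, b):
    """h_t = tanh(W_in x_t + W_rec h_{t-1} + b); returns the last h_t."""
    h = [0.0] * len(b)
    for x in seq:
        # The comprehension reads the previous h throughout, then rebinds it.
        h = [
            math.tanh(
                b[i]
                + sum(w * xi for w, xi in zip(w_in[i], x))
                + sum(w * hj for w, hj in zip(w_rec[i], h))
            )
            for i in range(len(b))
        ]
    return h

def dense_softmax(v, w, b):
    """Linear layer followed by a numerically stable softmax."""
    z = [b[k] + sum(wi * vi for wi, vi in zip(w[k], v)) for k in range(len(b))]
    m = max(z)
    e = [math.exp(zk - m) for zk in z]
    total = sum(e)
    return [ek / total for ek in e]

def cnn_forward(frames, p):
    """Vanilla CNN baseline (assumed): conv features averaged over time."""
    feats = conv1d_relu(frames, p["kernels"], p["conv_b"])
    pooled = [sum(col) / len(feats) for col in zip(*feats)]
    return dense_softmax(pooled, p["cnn_w_out"], p["out_b"])

def crnn_forward(frames, p):
    """CRNN (assumed): the same conv features, summarised by a recurrent layer."""
    feats = conv1d_relu(frames, p["kernels"], p["conv_b"])
    h = elman_rnn(feats, p["w_in"], p["w_rec"], p["rnn_b"])
    return dense_softmax(h, p["rnn_w_out"], p["out_b"])

def init_params(n_feat, n_filters, kernel, n_hidden, n_classes, seed=0):
    """Small random weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": [mat(kernel, n_feat) for _ in range(n_filters)],
        "conv_b": [0.0] * n_filters,
        "w_in": mat(n_hidden, n_filters),
        "w_rec": mat(n_hidden, n_hidden),
        "rnn_b": [0.0] * n_hidden,
        "cnn_w_out": mat(n_classes, n_filters),
        "rnn_w_out": mat(n_classes, n_hidden),
        "out_b": [0.0] * n_classes,
    }

if __name__ == "__main__":
    # 50 frames of hypothetical 13-dimensional features, 10 hypothetical classes.
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
    params = init_params(n_feat=13, n_filters=8, kernel=3, n_hidden=16, n_classes=10)
    print(len(cnn_forward(frames, params)), len(crnn_forward(frames, params)))
```

Both forward passes share the same convolutional front end. They differ only in how the per-frame features are summarised over time: averaging versus a recurrent state. This is one plausible reading of the CNN versus CRNN contrast in the abstract. A practical model would use a deep-learning framework, gated recurrent cells, and weights trained on the speech data.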
Publisher
The Intelligent Networks and Systems Society
Subject
General Engineering, General Computer Science
Cited by
9 articles.