Abstract
People with speech disorders face a range of challenges in communicating effectively, and one condition they may experience is dysarthria. Dysarthria is a motor speech disorder in which impaired control of the muscles responsible for speech production affects an individual's ability to speak. People with dysarthria may have difficulty with articulation, pronunciation, intonation, rhythm, and pace, resulting in slow or slurred speech that can be hard to understand. Augmentative and Alternative Communication (AAC) aids that use speech recognition technology have emerged as an appealing way to support communication for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained solely on typical speech data may not accurately recognize dysarthric speech because of its distinct speech patterns and pronunciation differences, and the limited availability of dysarthric speech data makes training such systems a significant challenge. To address these challenges, this work proposes a hybrid architecture combining the Transformer with Connectionist Temporal Classification (CTC). The Transformer's self-attention mechanism makes it effective at learning speech patterns from limited data, while the CTC approach directly maps input speech features to output character sequences without requiring explicit alignment information, which is especially beneficial for speech recognition where speech patterns vary widely. The hybrid architecture is trained on the UA-Speech corpus, allowing it to focus on the important features of the speech and capture the relationships between them, leading to more accurate recognition. The proposed ASR system achieves a notable decrease in Word Error Rate (WER) of up to 2.78% and 15.67% for speakers with dysarthria of low and very low intelligibility, respectively.
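As a rough illustration of the hybrid design the abstract describes, the sketch below pairs a Transformer encoder with a CTC loss in PyTorch. Every dimension, layer count, and name here is an illustrative assumption rather than the paper's actual configuration, and positional encodings are omitted for brevity.

# Minimal sketch of a Transformer encoder trained with CTC, assuming
# log-mel filterbank inputs and a character-level output vocabulary.
# All hyperparameters below are placeholders, not the paper's settings.
import torch
import torch.nn as nn

class TransformerCTCModel(nn.Module):
    def __init__(self, n_mels=80, d_model=256, n_heads=4, n_layers=6, n_chars=30):
        super().__init__()
        # Project acoustic features into the model dimension.
        self.input_proj = nn.Linear(n_mels, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Self-attention layers capture relationships between speech frames.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Per-frame character logits; index 0 is reserved for the CTC blank.
        self.output_proj = nn.Linear(d_model, n_chars)

    def forward(self, feats):
        # feats: (batch, time, n_mels)
        x = self.input_proj(feats)
        x = self.encoder(x)
        # CTC requires log-probabilities over the output vocabulary.
        return self.output_proj(x).log_softmax(dim=-1)

model = TransformerCTCModel()
ctc_loss = nn.CTCLoss(blank=0)

# Dummy batch: 2 utterances of 100 frames; targets are character-index
# sequences, so no frame-level alignment is needed.
feats = torch.randn(2, 100, 80)
targets = torch.randint(1, 30, (2, 20))
log_probs = model(feats).transpose(0, 1)  # CTCLoss expects (time, batch, classes)
loss = ctc_loss(
    log_probs,
    targets,
    input_lengths=torch.full((2,), 100),
    target_lengths=torch.full((2,), 20),
)
loss.backward()

The CTC loss marginalizes over all frame-to-character alignments, which is what lets the model learn from transcript-level supervision alone; this property is what the abstract points to as valuable for dysarthric speech, where timing and pace vary widely between speakers.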
Publisher
Research Square Platform LLC