Empowering Dysarthric Communication: Hybrid Transformer-CTC based Speech Recognition System

Author:

R. Vinotha1, D. Hepsiba1, Anand Vijay1

Affiliation:

1. Karunya University

Abstract

People with speech disorders face many challenges in communicating effectively; one such condition is dysarthria, a motor speech disorder that impairs speaking due to difficulty controlling the muscles responsible for speech production. People with dysarthria may have trouble with articulation, pronunciation, intonation, rhythm, and pace, resulting in slow or slurred speech that can be difficult to understand. Augmentative and Alternative Communication (AAC) aids that use speech recognition technology have emerged as an appealing means of supporting communication for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained solely on typical speech may not accurately recognize dysarthric speech because of its variations in speech patterns and accent, and a further challenge in training ASR systems for dysarthric speech is the limited availability of data. To overcome these challenges, this work proposes a hybrid architecture combining a Transformer with Connectionist Temporal Classification (CTC). The Transformer is effective at learning speech patterns from limited data owing to its self-attention mechanism, while CTC provides a direct mapping between input speech features and output character sequences without requiring explicit alignment information, which is especially beneficial when speech patterns vary widely. The hybrid architecture is trained on the UA-Speech corpus, allowing it to focus on salient features of the speech and capture the relationships between them, leading to more accurate recognition. The proposed ASR system shows a remarkable decrease in Word Error Rate (WER) of up to 2.78% and 15.67% for individuals with dysarthria who have low and very low intelligibility, respectively.
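To make the described architecture concrete, the sketch below shows a minimal hybrid Transformer-CTC acoustic model in PyTorch. It is an illustrative assumption rather than the authors' implementation: the class name TransformerCTC, the feature dimension (80 filterbank channels), the model width, the number of layers, and the character vocabulary size are hypothetical placeholders; only the overall pattern (a self-attention encoder whose per-frame outputs are trained with CTC loss, so no frame-level alignments are needed) reflects the approach summarized in the abstract.

# Minimal sketch of a hybrid Transformer-CTC acoustic model in PyTorch.
# All sizes (80-dim filterbank features, 256-dim model, 6 layers, ~30 characters)
# are illustrative assumptions, not values taken from the paper.
import torch
import torch.nn as nn

class TransformerCTC(nn.Module):
    def __init__(self, n_feats=80, d_model=256, n_heads=4, n_layers=6, n_chars=30):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)            # project acoustic frames
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, n_chars + 1)         # +1 for the CTC blank label

    def forward(self, x):                                  # x: (batch, time, n_feats)
        h = self.encoder(self.proj(x))                     # self-attention over frames
        return self.out(h).log_softmax(dim=-1)             # per-frame label log-probs

model = TransformerCTC()
ctc_loss = nn.CTCLoss(blank=30, zero_infinity=True)        # blank = last class index

# Dummy batch: two utterances of 200 frames, 12 target characters each.
feats = torch.randn(2, 200, 80)
targets = torch.randint(1, 30, (2, 12))
log_probs = model(feats).transpose(0, 1)                   # nn.CTCLoss expects (T, N, C)
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.full((2,), 200),
                target_lengths=torch.full((2,), 12))
loss.backward()

During training, CTC sums over all valid frame-to-character alignments, which is why no hand-labelled alignment information is required; this is the property the abstract highlights as useful for highly variable dysarthric speech.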

Publisher

Research Square Platform LLC

