Usage of Prosody Modification and Acoustic Adaptation for Robust Automatic Speech Recognition (ASR) System

Published:2021-06-30 Issue:3 Volume:35 Page:235-242
ISSN:0992-499X
Container-title:Revue d'Intelligence Artificielle
Short-container-title:RIA

Author:

Bhardwaj Vivek, Kukreja Vinay, Singh Amitoj

Abstract

Most automatic speech recognition (ASR) systems are trained on adult speech because children's speech datasets are scarce. Such systems achieve a very low recognition rate when tested on children's speech, owing to the inter-speaker acoustic variability between adult and child speech. This variability arises mainly from children's higher pitch and lower speaking rate. The main objective of this work is therefore to increase the recognition rate of the Punjabi-ASR system by reducing this variability through prosody modification and speaker adaptive training. Prosody modification alters the pitch period and duration (speaking rate) of the speech signal without affecting its naturalness or message, and thereby helps to overcome the acoustic differences between adult and child speech. The developed Punjabi-ASR system is trained on adult speech and prosody-modified adult speech. The prosody-modified speech reduces the need for large amounts of children's speech in training and improves the recognition rate. Results show that prosody modification and speaker adaptive training reduce the word error rate (WER) of the Punjabi-ASR system to 8.79% when tested on children's speech.
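
The abstract reports its result as a word error rate but does not define the metric. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a word-level edit distance, divided by the number of reference words), WER can be computed as below. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the authors' scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text
    normalization (casing, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.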

Publisher

International Information and Engineering Technology Association

Subject

Artificial Intelligence, Software

Cited by 9 articles.

1. Deep Learning Dermoscopy: Unveiling CNN-SVM Synergy in Skin Lesion Detection;2023 4th International Conference on Intelligent Technologies (CONIT);2024-06-21

2. Comprehensive literature review on children automatic speech recognition system, acoustic linguistic mismatch approaches and challenges;Multimedia Tools and Applications;2024-03-11

3. Enhanced Emotion Recognition from Spoken Assamese Dialect: A Machine Learning Approach with Language-Independent Features;Traitement du Signal;2023-10-30

4. Proposed Framework for Managing Customer Queries in Banking sector using Robotic Process Automation;2023 International Conference on Smart Computing and Application (ICSCA);2023-02-05

5. Augmentation Techniques for Adult-Speech to Generate Child-Like Speech Data Samples at Scale;IEEE Access;2023