Prosodic Feature-Based Discriminatively Trained Low Resource Speech Recognition System

Published:2022-01-06 Issue:2 Volume:14 Page:614
ISSN:2071-1050
Container-title:Sustainability
language:en
Short-container-title:Sustainability

Author:

Hasija Taniya, Kadyan Virender, Guleria Kalpna, Alharbi Abdullah, Alyami Hashem, Goyal Nitin

Abstract

Speech recognition has been an active field of research in the last few decades since it facilitates better human–computer interaction. Native language automatic speech recognition (ASR) systems are still underdeveloped. Punjabi ASR systems are in their infancy because most research has been conducted only on adult speech systems, and less work has been performed on Punjabi children’s ASR systems. This research aimed to build a prosodic feature-based automatic children’s speech recognition system using discriminative modeling techniques. The corpus of Punjabi children’s speech has various runtime challenges, such as acoustic variations with varying speakers’ ages. To overcome such issues, out-domain data augmentation was implemented using a Tacotron-based text-to-speech synthesizer. The prosodic features were extracted from the Punjabi children’s speech corpus, then particular prosodic features were coupled with Mel Frequency Cepstral Coefficient (MFCC) features before being submitted to an ASR framework. The system modeling process investigated various approaches, which included Maximum Mutual Information (MMI), Boosted Maximum Mutual Information (bMMI), and feature-based Maximum Mutual Information (fMMI). The out-domain data augmentation was performed to enhance the corpus. After that, prosodic features were also extracted from the extended corpus, and experiments were conducted on both individual and integrated prosodic-based acoustic features. It was observed that the fMMI technique exhibited a 20% to 25% relative improvement in word error rate compared with the MMI and bMMI techniques. Further, it was enhanced using an augmented dataset and hybrid front-end features (MFCC + POV + F0 + Voice quality) with a relative improvement of 13% compared with the earlier baseline system.
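The abstract reports its results as relative improvements in word error rate (WER). As a minimal sketch of how such figures are usually computed, the Python below derives WER from a word-level edit distance and then the relative improvement between two systems. The function names and the example values are hypothetical and are not taken from the paper, whose own scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer reduces baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not results from the paper).
wer = word_error_rate("a b c d", "a x c")
gain = relative_improvement(0.20, 0.15)
print(f"WER: {wer:.2f}, relative improvement: {gain:.0%}")
```

A relative improvement is a ratio of error rates, so a 25% relative gain over a 20% baseline corresponds to an absolute reduction of 5 percentage points.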

Funder

Taif University

Publisher

MDPI AG

Subject

Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development

Link

https://www.mdpi.com/2071-1050/14/2/614/pdf

References: 50 articles.

1. Automatic Speech Recognition;Yu,2016

2. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants

3. Automatic speech recognition and speech variability: A review

4. A review on speech recognition challenges and approaches;Radha;World Comput. Sci. Inf. Technol. J. (WCSIT),2012

5. Emotions, speech and the ASR framework;Bosch;Speech Commun.,2003

Cited by 12 articles.

1. Advanced differential evolution for gender-aware English speech emotion recognition;Scientific Reports;2024-07-31

2. Comprehensive literature review on children automatic speech recognition system, acoustic linguistic mismatch approaches and challenges;Multimedia Tools and Applications;2024-03-11

3. Efficacy of Current Dysarthric Speech Recognition Techniques;2023 International Conference on Advanced Computing & Communication Technologies (ICACCTech);2023-12-23

4. English Speech Emotion Classification Based on Multi-Objective Differential Evolution;Applied Sciences;2023-11-13

5. An Attention Based Bi-LSTM DenseNet Model for Named Entity Recognition in English Texts;Wireless Personal Communications;2023-03-12