Authors:
Bawa Puneet, Kadyan Virender, Tripathy Abinash, Singh Thipendra P.
Abstract
Development of a robust native-language ASR framework is challenging and remains an active area of research. Effective front-end and back-end approaches need to be investigated to tackle environmental differences, large training complexity, and inter-speaker variability in building a successful recognition system. In this paper, four front-end approaches, mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), relative spectral-perceptual linear prediction (RASTA-PLP), and power-normalized cepstral coefficients (PNCC), have been investigated to generate unique and robust feature vectors at different SNR values. Furthermore, to handle the large training-data complexity, parameter optimization has been performed with sequence-discriminative training techniques: maximum mutual information (MMI), minimum phone error (MPE), boosted MMI (bMMI), and state-level minimum Bayes risk (sMBR). This is demonstrated through selection of optimal parameter values using lattice generation and adjustment of learning rates. In the proposed framework, four different systems have been tested by analyzing various feature-extraction approaches (with or without speaker normalization of the test set through vocal tract length normalization (VTLN)) and classification strategies, with or without artificial extension of the training dataset. To compare system performance, matched (adult train and test, S1; child train and test, S2) and mismatched (adult train and child test, S3; adult + child train and child test, S4) systems have been demonstrated on a large adult and a very small children's Punjabi clean-speech corpus. Consequently, gender-based in-domain data augmentation is used to moderate acoustic and phonetic variations between adult and children's speech under mismatched conditions. The experimental results show that an effective framework built on the PNCC + VTLN front-end approach with a TDNN-sMBR-based model and parameter optimization yields relative improvements (RI) of 40.18%, 47.51%, and 49.87% in the matched, mismatched, and gender-based in-domain augmented systems under typical clean and noisy conditions, respectively.
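As a minimal sketch of one front-end step the abstract describes (not the authors' exact pipeline), the snippet below mixes clean speech with noise at a chosen SNR and extracts MFCC features; the file names, SNR values, and window settings are illustrative assumptions, and any of the other front-ends (GFCC, RASTA-PLP, PNCC) would slot into the same loop.

```python
# Sketch: corrupt clean speech at a target SNR, then compute MFCC features.
# File names, SNR values, and analysis parameters are illustrative assumptions.
import numpy as np
import librosa

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix."""
    noise = np.resize(noise, clean.shape)            # repeat/trim noise to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

clean, sr = librosa.load("clean_utt.wav", sr=16000)    # hypothetical clean utterance
noise, _ = librosa.load("babble_noise.wav", sr=16000)  # hypothetical noise recording

for snr_db in (0, 5, 10, 15):                          # example SNR conditions
    noisy = mix_at_snr(clean, noise, snr_db)
    # 13-dimensional MFCCs with a 25 ms window and 10 ms hop (common ASR defaults)
    mfcc = librosa.feature.mfcc(y=noisy, sr=sr, n_mfcc=13,
                                n_fft=int(0.025 * sr),
                                hop_length=int(0.010 * sr))
    print(f"SNR {snr_db} dB -> MFCC matrix shape {mfcc.shape}")
```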
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics, Engineering (miscellaneous), Information Systems, Artificial Intelligence
Cited by
4 articles.