Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks

Published: 2021-09-10 Issue: 18 Volume: 11 Page: 8412
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Na Hyeong-Ju, Park Jeong-Sik

Abstract

The performance of automatic speech recognition (ASR) can degrade on accented speech because it differs linguistically from standard speech. Conventional studies on accented speech recognition have used the accent embedding method, in which accent embedding features are fed directly into the ASR network. Although this method improves the recognition of accented speech, it has limitations, such as increased computational cost. This study proposes an efficient method of training an ASR model for accented speech in a domain adversarial way, based on the Domain Adversarial Neural Network (DANN). The DANN performs domain adaptation for settings in which the training data and test data have different distributions. Our approach is therefore expected to yield a reliable ASR model for accented speech by reducing the distribution differences between accented speech and standard speech. The DANN has three sub-networks: the feature extractor, the domain classifier, and the label predictor. To adapt the DANN to accented speech recognition, we constructed these three sub-networks independently, considering the characteristics of accented speech. In particular, we used an end-to-end framework based on Connectionist Temporal Classification (CTC) to develop the label predictor, an important module that directly affects ASR results. To evaluate the proposed approach, we conducted accented speech recognition experiments on four English accents: Australian, Canadian, British (England), and Indian. The experimental results showed that the proposed DANN-based model outperformed the baseline model for all accents, indicating that end-to-end domain adversarial training effectively reduced the distribution differences between accented speech and standard speech.
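As a rough illustration of the mechanisms named in the abstract, the following dependency-free Python sketch shows two pieces: how gradients from the label predictor and the domain classifier might be combined at the feature extractor under gradient reversal (the standard DANN mechanism), and how a frame-level CTC output is collapsed into a label sequence. The abstract does not give these details, so all function names, the reversal-weight schedule, and the choice of blank index 0 are assumptions made for illustration, not the authors' implementation.

```python
import math


def grl_lambda(progress, gamma=10.0):
    """Hypothetical schedule for the gradient reversal weight.

    Ramps smoothly from 0 to nearly 1 as training progress goes
    from 0.0 to 1.0. This schedule is commonly used with DANN; the
    abstract does not say which schedule (if any) the paper uses.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def feature_extractor_gradient(grad_label, grad_domain, lam):
    """Combine the gradients that reach the feature extractor.

    The label predictor's gradient passes through unchanged, while
    the domain classifier's gradient is reversed and scaled by lam.
    The extractor is thus pushed toward features that help label
    prediction but make the domains (accents) hard to tell apart.
    """
    return [gl - lam * gd for gl, gd in zip(grad_label, grad_domain)]


def ctc_greedy_collapse(frame_ids, blank=0):
    """Collapse a per-frame CTC label sequence into an output sequence.

    Consecutive repeats are merged first, then blank symbols are
    dropped. The blank index 0 is an assumption of this sketch.
    """
    output = []
    previous = None
    for label in frame_ids:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


# Usage with toy values (not data from the paper).
combined = feature_extractor_gradient([1.0, 2.0], [0.5, -1.0], lam=0.5)
decoded = ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])
```

In a real system the reversal would be implemented inside an automatic differentiation framework as a layer that is the identity in the forward pass and negates the gradient in the backward pass. The list arithmetic above only mirrors the resulting update direction.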

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/18/8412/pdf

Cited by 14 articles.

1. DFNet: Decoupled Fusion Network for Dialectal Speech Recognition;Mathematics;2024-06-17

2. An Image Classification Method of Unbalanced Ship Coating Defects Based on DCCVAE-ACWGAN-GP;Coatings;2024-02-27

3. The impact of non-native English speakers’ phonological and prosodic features on automatic speech recognition accuracy;Speech Communication;2024-02

4. The Application and Research of Intelligent Mobile Terminal in Mixed Listening and Speaking Teaching of College English;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2024

5. Towards a Corpus (and Language)-Independent Screening of Parkinson’s Disease from Voice and Speech through Domain Adaptation;Bioengineering;2023-11-15