Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children

Published:2016-04-12 Issue:3 Volume:23 Page:325-350
ISSN:1351-3249
Container-title:Natural Language Engineering
language:en
Short-container-title:Nat. Lang. Eng.

Author:

Romain Serizel, Diego Giuliani

Abstract

This paper introduces deep neural network (DNN)–hidden Markov model (HMM)-based methods to tackle speech recognition in heterogeneous groups of speakers including children. We target three speaker groups consisting of children, adult males and adult females. Two different kinds of approaches are introduced here: approaches based on DNN adaptation and approaches relying on vocal-tract length normalisation (VTLN). First, the recent approach that consists in adapting a general DNN to domain/language-specific data is extended to target age/gender groups in the context of DNN–HMM. Then, VTLN is investigated by training a DNN–HMM system using either mel frequency cepstral coefficients normalised with standard VTLN or acoustic features derived from mel frequency cepstral coefficients combined with the posterior probabilities of the VTLN warping factors. In this latter, novel approach, the posterior probabilities of the warping factors are obtained with a separate DNN, and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Finally, the different approaches presented here are combined to take advantage of their complementarity. The combination of several approaches is shown to improve the baseline phone error rate performance by thirty per cent to thirty-five per cent relative and the baseline word error rate performance by about ten per cent relative.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence,Linguistics and Language,Language and Linguistics,Software

References: 59 articles.

1. Welling L., Kanthak S., and Ney H. 1999. Improved methods for vocal tract normalization. In Proceedings of ICASSP, IEEE, New York (NY), United States (ICASSP, ASRU, SLT, IJCNN), vol. 2, 761–4.

2. Serizel R. and Giuliani D. 2014b. Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition. In Proceedings of SLT. IEEE, New York (NY), United States (ICASSP, ASRU, SLT, IJCNN).

Cited by 37 articles.

1. Experimental studies for improving the performance of children's speaker verification system using short utterances;Applied Acoustics;2024-01

2. PGST: A Persian gender style transfer method;Natural Language Engineering;2023-08-15

3. Gammatone-Filterbank Based Pitch-Normalized Cepstral Coefficients for Zero-Resource Children’s ASR;Speech and Computer;2023

4. Enhancement of formant regions in magnitude spectra to develop children’s KWS system in zero resource scenario;Speech Communication;2022-10

5. Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition;Applied Computational Intelligence and Soft Computing;2022-09-16