Abstract
Speech-based human-machine interaction and natural language understanding applications have seen rapid development and wide adoption over the last few decades. This has led to a proliferation of studies that investigate error detection and classification in Automatic Speech Recognition (ASR) systems. However, these studies use different data sets and evaluation protocols, making direct comparisons of the proposed approaches (e.g. features and models) difficult. In this paper we perform an extensive evaluation of the effectiveness and efficiency of state-of-the-art approaches in a unified framework for both error detection and error type classification. We make three primary contributions: (1) we compare our Variant Recurrent Neural Network (V-RNN) model with three other state-of-the-art neural-based models and show that the V-RNN model is the most effective classifier for ASR error detection in terms of accuracy and speed; (2) we compare four feature settings, corresponding to different categories of predictor features, and show that the generic features are particularly suitable for real-time ASR error detection applications; and (3) we examine the generalization ability of our error detection framework and perform a detailed post-detection analysis to identify the recognition errors that are difficult to detect.
Publisher
Springer Science and Business Media LLC
Subject
Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems
Reference: 21 articles.
1. Errattahi R, El Hannani A. Recent advances in LVCSR: a benchmark comparison of performances. Int J Electr Comput Eng. 2017;7(6):3358–68.
2. Errattahi R, El Hannani A, Ouahmane H. Automatic speech recognition errors detection and correction: a review. Procedia Comput Sci. 2018;128:32–7.
3. Errattahi R, El Hannani A, Hain T, Ouahmane H. System-independent ASR error detection and classification using recurrent neural network. Comput Speech Lang. 2019;55:187–99.
4. Zhang R, Rudnicky AI. Word level confidence annotation using combinations of features. In: The proceedings of the European conference on speech communication and technology (EuroSpeech); 2001. p. 2105–8.
5. Gibson M, Hain T. Application of SVM-based correctness predictions to unsupervised discriminative speaker adaptation. In: The proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP); 2012. p. 4341–4.
Cited by
6 articles.