Development of Speech Recognition Systems in Emergency Call Centers

Published:2021-04-09 Issue:4 Volume:13 Page:634
ISSN:2073-8994
Container-title:Symmetry
language:en
Short-container-title:Symmetry

Author:

Valizada Alakbar, Akhundova Natavan, Rustamov Samir

Abstract

This paper investigates and compares acoustic modeling, language modeling, and data labeling methods for automatic speech recognition of spoken dialogues in emergency call centers. Because call-center dialogue speech has a specific context and is recorded in noisy, emotional environments, available speech recognition systems perform poorly on it. To recognize such dialogue speech accurately, the main modules of speech recognition systems (language models and acoustic training methodologies) as well as symmetric data labeling approaches were investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Effective language models for dialogue systems were then identified using extrinsic and intrinsic evaluation methods. Lastly, our suggested data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable margin. Based on the experimental results, we determined that an effective configuration for dialogue speech recognition in emergency call centers combines a DNN/HMM acoustic model, a trigram language model with Kneser–Ney discounting, and a labeling method that applies spelling correction to the data before training. The research was conducted with two different datasets collected from emergency calls: the Dialogue dataset (27 h), which contains call agents' speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Although the speech taken from the emergency call center is in Azerbaijani, which belongs to the Turkic group of languages, our approaches are not tightly tied to language-specific features. Hence, it is anticipated that the suggested approaches can be applied to other languages of the same group.
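
The abstract does not name the metrics behind its "extrinsic and intrinsic" language model evaluation. As a minimal sketch, assuming the usual choices in speech recognition (word error rate as the extrinsic measure and perplexity as the intrinsic one), the two quantities could be computed as follows. The function names and the English example sentences are hypothetical illustrations, not taken from the paper or its datasets.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumed extrinsic metric: counts substitutions, deletions and insertions
    needed to turn the hypothesis into the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def perplexity(word_log_probs: list[float]) -> float:
    """Perplexity from per-word natural-log probabilities.

    Assumed intrinsic metric: exp of the negative mean log-probability that
    a language model assigns to each word of a held-out text.
    """
    if not word_log_probs:
        raise ValueError("need at least one word probability")
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("send an ambulance now", "send ambulance now"))
    # A model that gives every word probability 1/4.
    print(perplexity([math.log(0.25)] * 4))
```

Under these assumptions, a lower word error rate means the recognizer's output is closer to the reference transcript, and a lower perplexity means the language model predicts held-out text better on its own.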

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2073-8994/13/4/634/pdf

Reference: 28 articles.

1. Automatic Speech Recognition with Deep Neural Networks for Impaired Speech

2. Robust i-Vector Based Adaptation of DNN Acoustic Model for Speech Recognition;Garimella,2015

3. End-to-end Speech Recognition Using Lattice-free MMI

4. ASR for Under-Resourced Languages From Probabilistic Transcription

Cited by 11 articles.

1. From crisis to opportunity: advancements in emergency language services;Humanities and Social Sciences Communications;2024-09-10

2. An effective speaker adaption using deep learning for the identification of speakers in emergency situation;Multimedia Tools and Applications;2024-07-02

3. Multilingual end-to-end ASR for low-resource Turkic languages with common alphabets;Scientific Reports;2024-06-15

4. Building an NLP based speech recognition technology for emergency call centers;AIP Conference Proceedings;2024

5. Performance Analysis of Human Emotion via Speech Recognition using Convolution Neural Network Algorithm compared with Hidden Markov Model Classifier for Improved Accuracy;2023 9th International Conference on Smart Structures and Systems (ICSSS);2023-11-23