Abstract
A high-performance, versatile computer-assisted pronunciation training (CAPT) system that gives learners immediate feedback on whether their pronunciation is correct is very helpful for learning correct pronunciation: it lets learners practice at any time, with unlimited repetitions, and without an instructor present. In this paper, we propose deep learning-based techniques to build such a CAPT system for mispronunciation detection and diagnosis (MDD) and articulatory feedback generation for non-native learners of Arabic. The proposed system can locate the error in pronunciation, recognize the mispronounced phonemes, and detect the corresponding articulatory features (AFs), not only in words but also in sentences. We formulate the recognition of phonemes and their corresponding AFs as a multi-label object recognition problem, where the objects are the phonemes and their AFs in a spectral image. Moreover, we investigate the use of cutting-edge neural text-to-speech (TTS) technology to generate a new corpus of high-quality speech from predefined text containing the substitution errors most common among Arabic learners. The proposed model and its enhanced variants achieved excellent results. Compared with the state-of-the-art end-to-end MDD technique, our system performed better; fusing the proposed model with the end-to-end model improved performance further. Our best model achieved a 3.83% phoneme error rate (PER) in the phoneme recognition task, a 70.53% F1-score in the MDD task, and a 2.6% detection error rate (DER) in the AF detection task.
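For context, the phoneme error rate (PER) reported above is conventionally computed as the Levenshtein (edit) distance between the recognized phoneme sequence and the reference sequence, normalized by the reference length. A minimal sketch (the function name and interface here are illustrative, not from the paper):

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance between reference and hypothesized phoneme
    sequences, normalized by the reference length."""
    m, n = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / m

# Example: one substitution in a three-phoneme reference gives PER = 1/3.
per = phoneme_error_rate(["b", "a", "t"], ["p", "a", "t"])
```

A reported PER of 3.83% thus means that, on average, fewer than 4 of every 100 reference phonemes are substituted, deleted, or inserted by the recognizer.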
Funder
King Abdulaziz City for Science and Technology
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by
13 articles.