End-to-end Jordanian dialect speech-to-text self-supervised learning framework

Published: 2022-12-22
Volume: 9
ISSN: 2296-9144
Container-title: Frontiers in Robotics and AI
Short-container-title: Front. Robot. AI

Author:

Safieh Ali A., Alhaol Ibrahim Abu, Ghnemat Rawan

Abstract

Speech-to-text engines are in high demand across many applications and are an essential enabler of human–robot interaction. Still, some languages lack labeled speech data, especially Arabic dialects and other low-resource languages. Self-supervised training, together with self-training that uses noisy training, has proven to be one of the promising, feasible solutions. This article proposes an end-to-end, transformer-based model with a framework for low-resource languages. In addition, the framework incorporates customized audio-to-text processing algorithms to achieve a highly efficient Jordanian Arabic dialect speech-to-text system. The proposed framework can ingest data from many sources, making it feasible to build ground truth from external sources by speeding up the manual annotation process. The framework supports training with noisy student training and self-supervised learning, so that unlabeled data is used in both the pre- and post-training stages, and it incorporates multiple types of data augmentation. The proposed self-training approach outperforms the fine-tuned Wav2Vec model by 5% in terms of word error rate reduction. This work provides the research community with a Jordanian-spoken data set along with an end-to-end approach for dealing with low-resource languages. This is done by combining pretraining, post-training, and the injection of noisy labeled and augmented data with minimal human intervention. It enables new applications in Arabic speech-to-text, such as question-answering systems and intelligent control systems, and it will add human-like perception and hearing sensors to intelligent robots.
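
The abstract reports its result as a word error rate (WER) reduction. As a reminder of what that metric measures, here is a minimal sketch of the standard WER definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not taken from the paper. The abstract does not say whether the 5% is an absolute or a relative reduction, so the sketch only computes the metric itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference `"a b c d"` with the hypothesis `"a x c"` involves one substitution and one deletion over four reference words. Real evaluations usually normalize the text (punctuation, diacritics, casing) before scoring, which this sketch leaves out.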

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Computer Science Applications

Cited by 1 article.

1. Explainable Artificial Intelligence (XAI) for Deep Learning Based Medical Imaging Classification;Journal of Imaging;2023-08-30