Improving Tone Recognition Performance using Wav2vec 2.0-Based Learned Representation in Yoruba, a Low-Resourced Language

Authors:

Bengono Obiang Saint Germes B. (1,2), Tsopze Norbert (3,2), Melatagia Yonta Paulin (4,2), Bonastre Jean-Francois (5,6), Jiménez Tania (5)

Affiliation:

1. University of Yaounde I, Yaounde, Cameroon

2. Sorbonne Université, IRD, UMMISCO, F-93143 Bondy, France

3. Faculty of Sciences, Computer Science, University of Yaounde I, Yaounde, Cameroon

4. Computer Science, Faculty of Sciences, University of Yaounde I, Yaounde, Cameroon

5. Avignon Université, Laboratoire Informatique d'Avignon, Avignon, France

6. Defense and Security Dept., Inria, Paris, France

Abstract

Many sub-Saharan African languages are tone languages, and most are classified as low-resource languages because of the limited resources and tools available to process them. Identifying the tone associated with a syllable is therefore a key challenge for speech recognition in these languages. We propose models that automate tone recognition in continuous speech and that can easily be incorporated into a speech recognition pipeline for these languages. We investigated several neural architectures as well as several speech feature extraction algorithms (filter banks, LEAF, cepstrogram, MFCC). Given the low-resource setting, we also evaluated Wav2vec 2.0 models for this task. In this work, we use a public Yoruba speech recognition dataset. Using the combination of features obtained from the cepstrogram (CS) and filter banks (FB), we obtain a minimum TER (Tone Error Rate) of 19.54%, while the models based on Wav2vec 2.0 reach a TER of 17.72%, demonstrating that Wav2vec 2.0 representations outperform the models used in the literature for tone identification in low-resource languages.
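To illustrate the kind of pipeline the abstract describes, the sketch below extracts both classical frame-level features (log-mel filter banks, MFCCs) and Wav2vec 2.0 contextual representations from a single utterance. It is a minimal sketch under stated assumptions, not the authors' implementation: the checkpoint name "facebook/wav2vec2-base", the file "utterance.wav", the 16 kHz mono input, and the torchaudio/transformers calls are illustrative choices, and the cepstrogram and LEAF front-ends used in the paper are not shown.

```python
# Illustrative sketch only: feature extraction for a syllable-level tone classifier.
# Assumed (not from the paper): checkpoint "facebook/wav2vec2-base",
# input file "utterance.wav", 16 kHz mono audio.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

waveform, sr = torchaudio.load("utterance.wav")             # (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)               # force mono
waveform = torchaudio.functional.resample(waveform, sr, 16_000)

# Classical hand-crafted features: 40-dim log-mel filter banks and 13-dim MFCCs.
fbank = torchaudio.compliance.kaldi.fbank(waveform, num_mel_bins=40)        # (frames, 40)
mfcc = torchaudio.transforms.MFCC(sample_rate=16_000, n_mfcc=13)(waveform)  # (1, 13, frames)

# Learned representation: Wav2vec 2.0 contextual vectors, roughly one 768-dim
# vector every 20 ms, used in place of (or alongside) the features above.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
w2v = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()
inputs = extractor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    w2v_feats = w2v(inputs.input_values).last_hidden_state  # (1, frames, 768)

# Any of these frame-level sequences can be fed to a sequence model (e.g. a CNN
# or BiLSTM) that predicts one tone label per syllable and is scored with TER.
```

In such a setup, TER would be computed like a word error rate, i.e. an edit distance between the predicted and reference tone sequences normalized by the reference length; the exact scoring protocol used in the paper may differ.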

Publisher

Association for Computing Machinery (ACM)

References (31 articles)

1. Oliver Adams, Trevor Cohn, Graham Neubig, and Alexis Michaud. 2017. Phonemic Transcription of Low-Resource Tonal Languages. In Proceedings of the Australasian Language Technology Association Workshop 2017. Brisbane, Australia, 53–60. https://aclanthology.org/U17-1006

2. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol.  33. Curran Associates, Inc., 12449–12460. https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf

3. Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, and Hao Zheng. 2017. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline. In O-COCOSDA 2017.

4. Malgorzata Ćavar, Damir Ćavar, and Hilaria Cruz. 2016. Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), Portorož, Slovenia, 4004–4011. https://aclanthology.org/L16-1632

5. Charles Chen, Razvan C. Bunescu, Li Xu, and Chang Liu. 2016. Tone Classification in Mandarin Chinese Using Convolutional Neural Networks. In Interspeech 2016.
