Low-Resource Machine Transliteration Using Recurrent Neural Networks

Author:

Le Ngoc Tan1, Sadat Fatiha2, Menard Lucie3, Dinh Dien4

Affiliation:

1. Universite du Quebec a Montreal, Faculty of IT, Montreal, Quebec, Canada

2. Universite du Quebec a Montreal, Faculty of IT, Montreal, Quebec, Canada

3. Universite du Quebec a Montreal, Faculty of Linguistics, Quebec, Canada

4. University of Sciences, Knowledge Engineering, Ho Chi Minh, Vietnam

Abstract

Grapheme-to-phoneme models are key components in automatic speech recognition and text-to-speech systems. They are particularly useful for low-resource language pairs that lack available, well-developed pronunciation lexicons. These models are based on initial alignments between grapheme source and phoneme target sequences. Inspired by sequence-to-sequence recurrent neural network-based translation methods, the current research presents an approach that applies an alignment representation for input sequences and pretrained source and target embeddings to address the transliteration problem for a low-resource language pair. Experiments and evaluation involving French and Vietnamese showed that, with only a small bilingual pronunciation dictionary available for training the transliteration models, promising results were obtained, with a large increase in BLEU scores and a reduction in Translation Error Rate (TER) and Phoneme Error Rate (PER). Moreover, we compared our proposed neural network-based transliteration approach with a statistical one.
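
The abstract reports Phoneme Error Rate (PER) but does not spell out how it is computed. A minimal sketch, assuming the common definition of PER as the Levenshtein edit distance between the reference and hypothesis phoneme sequences divided by the reference length, might look like the following. The function names and the sample phoneme sequences are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical phoneme sequences, for illustration only.
reference = ["a", "b", "c", "d"]
hypothesis = ["a", "x", "c"]
per = phoneme_error_rate(reference, hypothesis)
```

Under this assumed definition a lower PER is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.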

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 14 articles.

1. Improving neural machine translation by integrating transliteration for low-resource English–Assamese language;Natural Language Processing;2024-05-27

2. A Study of Word Embedding Models for Machine Translation of North Eastern Languages;Communications in Computer and Information Science;2023-11-30

3. Speech-to-speech Low-resource Translation;2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI);2023-08

4. Translating the List of Participants in the 2020 Tokyo Olympic Games into Japanese;Journal of Natural Language Processing;2023

5. A Review on Transliterated Text Retrieval for Indian Languages;Proceedings of International Conference on Computational Intelligence;2022-10-04
