Transliteration for Resource-Scarce Languages

Published:2010-12 Issue:4 Volume:9 Page:1-30
ISSN:1530-0226
Container-title:ACM Transactions on Asian Language Information Processing
language:en

Author:

Chinnakotla Manoj K.¹, Damani Om P.¹, Satoskar Avijit¹

Affiliation:

1. Indian Institute of Technology Bombay

Abstract

Today, parallel corpus-based systems dominate the transliteration landscape. Resource-scarce languages, however, do not enjoy the luxury of a large parallel transliteration corpus. For these languages, rule-based transliteration is the only viable option. In this article, we show that by properly harnessing monolingual resources in conjunction with a manually created rule base, one can achieve reasonable transliteration performance. We achieve this performance by exploiting the power of Character Sequence Modeling (CSM), which requires only monolingual resources. We present the results of our rule-based system for Hindi-to-English, English-to-Hindi, and Persian-to-English transliteration tasks. We also perform an extrinsic evaluation of transliteration systems in the context of Cross-Lingual Information Retrieval. Another important contribution of our work is to explain the widely varying accuracy numbers reported in the transliteration literature in terms of the entropy of the language pairs and the datasets involved.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1838751.1838753

References: 45 articles.

1. Statistical transliteration for English-Arabic cross language information retrieval

2. Translating named entities using monolingual and bilingual resources

3. Algorithms for Arabic name transliteration

4. A hybrid back-transliteration system for Japanese

5. Data Compression Using Adaptive Coding and Partial String Matching

Cited by 7 articles.

1. Moroccan Arabizi-to-Arabic conversion using rule-based transliteration and weighted Levenshtein algorithm;Scientific African;2024-03

2. Phonetic-Based Forward Online Transliteration Tool from English to Tamil Language;International Journal of Reliability, Quality and Safety Engineering;2023-04-19

3. Transliterating Latin to Amharic scripts using user-defined rules and character mappings;International Journal on Digital Libraries;2023-03

4. Forward-backward Transliteration of Punjabi Gurmukhi Script Using N-gram Language Model;ACM Transactions on Asian and Low-Resource Language Information Processing;2022-12-27

5. ISM@FIRE-2014;Proceedings of the Forum for Information Retrieval Evaluation on - FIRE '14;2015