Affiliation:
1. Microsoft Research India
2. Indian Institute of Technology Bombay
Abstract
Machine transliteration is an important problem in an increasingly multilingual world, as it plays a critical role in many downstream applications, such as machine translation and crosslingual information retrieval (CLIR) systems. In this article, we propose compositional machine transliteration systems, in which multiple transliteration components are composed either to improve existing transliteration quality or to enable transliteration between languages even when no direct parallel names corpus exists between them. Specifically, we propose two distinct forms of composition: serial and parallel. A serial compositional system chains individual transliteration components, say, X → Y and Y → Z systems, to provide the transliteration functionality X → Z. In parallel composition, evidence from multiple transliteration paths from X to Z is aggregated to improve the quality of a direct system. We demonstrate the functionality and performance benefits of the compositional methodology using a state-of-the-art machine transliteration framework on English and a set of Indian languages, namely, Hindi, Marathi, and Kannada. Finally, we underscore the utility and practicality of our compositional approach by showing that a CLIR system integrated with compositional transliteration systems performs consistently on par with, and sometimes better than, one integrated with a direct transliteration system.
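The two forms of composition can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how candidates are scored or combined, so the sketch assumes each component returns candidate strings with probability-like scores, that serial composition multiplies scores along a path and sums over intermediate candidates, and that parallel composition linearly interpolates the direct and pivot-path scores. All names (`table_component`, `serial_compose`, `parallel_compose`, `weight`) and the toy lookup tables are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict

# Assumption: a transliteration component maps a source string to
# candidate target strings with probability-like scores.
Component = Callable[[str], Dict[str, float]]


def table_component(table: Dict[str, Dict[str, float]]) -> Component:
    """Build a toy component from a lookup table (illustrative stand-in for a trained system)."""
    return lambda name: table.get(name, {})


def serial_compose(x_to_y: Component, y_to_z: Component) -> Component:
    """Chain X -> Y and Y -> Z into X -> Z.

    Assumed scoring: multiply scores along each path and sum over
    the intermediate Y candidates that lead to the same Z string.
    """
    def x_to_z(name: str) -> Dict[str, float]:
        scores: Dict[str, float] = defaultdict(float)
        for y, p_y in x_to_y(name).items():
            for z, p_z in y_to_z(y).items():
                scores[z] += p_y * p_z
        return dict(scores)
    return x_to_z


def parallel_compose(direct: Component, via_pivot: Component,
                     weight: float = 0.5) -> Component:
    """Aggregate evidence from a direct X -> Z system and a pivot path.

    Assumed combination: linear interpolation with a hypothetical `weight`
    on the direct system and (1 - weight) on the pivot path.
    """
    def combined(name: str) -> Dict[str, float]:
        scores: Dict[str, float] = defaultdict(float)
        for z, p in direct(name).items():
            scores[z] += weight * p
        for z, p in via_pivot(name).items():
            scores[z] += (1.0 - weight) * p
        return dict(scores)
    return combined


def best(scores: Dict[str, float]) -> str:
    """Return the highest-scoring candidate."""
    return max(scores, key=scores.get)


# Made-up toy data; the Y_/Z_ prefixes only mark which language a string belongs to.
x_to_y = table_component({"delhi": {"Y_dilli": 0.6, "Y_delhi": 0.4}})
y_to_z = table_component({
    "Y_dilli": {"Z_dilli": 0.9, "Z_dili": 0.1},
    "Y_delhi": {"Z_delhi": 0.8, "Z_dilli": 0.2},
})
direct_x_to_z = table_component({"delhi": {"Z_delhi": 0.55, "Z_dilli": 0.45}})

serial = serial_compose(x_to_y, y_to_z)
parallel = parallel_compose(direct_x_to_z, serial, weight=0.5)

serial_scores = serial("delhi")
parallel_scores = parallel("delhi")
print(best(serial_scores), best(parallel_scores))
```

In this toy setting the direct system alone prefers `Z_delhi`, while the evidence aggregated from the pivot path shifts the combined ranking toward `Z_dilli`, which is the kind of effect parallel composition is meant to exploit.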
Publisher
Association for Computing Machinery (ACM)
Cited by
12 articles.
1. Different Models of Transliteration - A Comprehensive Review;2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA);2023-03-14
2. Study of machine transliteration for cross language retrieval;MACHINE LEARNING AND INFORMATION PROCESSING: PROCEEDINGS OF ICMLIP 2023;2023
3. Hindi title generation using rule-based approach;APPLIED DATA SCIENCE AND SMART SYSTEMS;2023
4. A Bilingual Machine Transliteration System for Sanskrit-English Using Rule-Based Approach;2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST);2022-12-09
5. A Review on Transliterated Text Retrieval for Indian Languages;Proceedings of International Conference on Computational Intelligence;2022-10-04