Affiliation:
1. PES University, Bangalore, Karnataka, India
Abstract
Crosslingual word embeddings developed from multiple parallel corpora help in understanding the relationships between languages and in improving the prediction quality of machine translation. However, in low-resource languages with complex, agglutinative morphology, inducing good-quality crosslingual embeddings is challenging because of complex morphological forms and rare words. This is true even for languages that share a common linguistic structure. In this work, we show that performing a simple morphological segmentation of the corpora before generating crosslingual word embeddings for both roots and suffixes greatly improves prediction quality and captures semantic similarities more effectively. To demonstrate this, we chose two related languages of the Dravidian language family, Telugu and Kannada. We also tested our method on Hindi, a widely spoken North Indian language belonging to the Indo-European language family, and observed encouraging results.
Publisher
Association for Computing Machinery (ACM)
Cited by
6 articles.