Affiliation:
1. Department of Information Science and Engineering, BMS College of Engineering, Bangalore 560019, India, and affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India
Abstract
Parallel corpora are regarded as one of the most promising resources for extracting dictionaries. Most substantial work is based on parallel corpora, but building a parallel corpus for resource-scarce language pairs is a challenging task. To overcome this issue, researchers have found that comparable corpora can serve as an alternative source for dictionary extraction. The proposed approach extracts a dictionary for the low-resource language pair English and Kannada using comparable corpora obtained from Wikipedia dumps and a corpus received from the Indian Language Corpus Initiative (ILCI). The constructed dictionary comprises both translation and transliteration entities with term-level associations from English to Kannada. The resulting dictionary contains 77545 tokens, with a precision score of 0.79. The proposed work is language-independent and could be extended to other language pairs.
Publisher
American Scientific Publishers
Subject
Electrical and Electronic Engineering, Computational Mathematics, Condensed Matter Physics, General Materials Science, General Chemistry
Cited by
2 articles.