Authors:
Abbas Rasha Hassan, Kareem Firas Abdul Elah Abdul
Abstract
People describe the world, tell stories, share ideas, and connect with one another in more than 6,900 languages. Information on the Internet may appear unlimited. Throughout history, electrical and computer experts have built tools such as the telephone, the telegraph, and the Internet router, which have helped people communicate. Computer software that translates between languages is one such tool. The first step in translating a text is to identify its language. In this research, a program for automatically identifying the language of a text was designed and tested. It relies on letter statistics (the frequency, self-information, and entropy of certain chosen letters) for English, French, and German. When applied to randomly selected text files, the program successfully detected these languages. The detection program was written in C++.
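The three letter statistics named in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' program: the function names `letterFrequencies`, `selfInformation`, and `entropy` are hypothetical, the sketch counts only the 26 unaccented ASCII letters (accented French and German characters are skipped), and it uses the standard definitions I(p) = -log2(p) and H = -Σ p·log2(p). The abstract does not say which letters were chosen or how the statistics are compared across languages, so that step is left out.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <string>

// Relative frequency of each letter a-z in `text`, case-insensitive.
// Assumption: only ASCII letters are counted; all other bytes are ignored.
// Returns all zeros when the text contains no letters.
std::array<double, 26> letterFrequencies(const std::string& text) {
    std::array<double, 26> freq{};
    std::size_t total = 0;
    for (unsigned char c : text) {
        if (c < 128 && std::isalpha(c)) {
            freq[std::tolower(c) - 'a'] += 1.0;
            ++total;
        }
    }
    if (total > 0) {
        for (double& f : freq) f /= static_cast<double>(total);
    }
    return freq;
}

// Self-information of an outcome with probability p, in bits.
double selfInformation(double p) {
    assert(p > 0.0 && p <= 1.0);
    return -std::log2(p);
}

// Shannon entropy of a letter distribution, in bits.
// Letters with zero frequency contribute nothing to the sum.
double entropy(const std::array<double, 26>& freq) {
    double h = 0.0;
    for (double p : freq) {
        if (p > 0.0) h -= p * std::log2(p);
    }
    return h;
}
```

A caller would pass the contents of a text file to `letterFrequencies`, then read off the frequency and self-information of the letters of interest and the entropy of the whole distribution. Any decision rule built on those values is outside what the abstract specifies.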
Publisher
Southwest Jiaotong University