Affiliation:
1. Keldysh Institute of Applied Mathematics
2. National Research University “Higher School of Economics”
Abstract
In our previous research, we found that the grammatical ambiguity of the most frequent words in European languages is distributed differently from that of less frequent words. In the current research, we investigate the reasons for this phenomenon in more detail, paying special attention to the first thousand most frequent tokens. Our investigation of modern disambiguation systems demonstrated that the increase in language diversity that we had found for the most frequent words leads to an increase in the number of mistakes made by those systems.
Publisher
Keldysh Institute of Applied Mathematics