Applying artificial intelligence methods for solving problems of searching for semantic associates: case of toponym Moskva

Authors:

Borovsky Andrei Viktorovich¹, Rakovskaya Elena Evgenievna¹

Affiliation:

1. Baikal State University

Abstract

Current problems in toponymy involve the study of individual words in order to recover the lost conceptual meaning of geographical names and to establish how those names reflected the characteristic features of the terrain, the type of activity of the people who inhabited it, and so on. The purpose of this study is to determine the origin of the toponym Moskva using artificial intelligence methods. Semantic similarity between words is computed with the GeoWAC fastText embedding model, which is trained on a corpus of Russian-language texts and provided by the RusVectōrēs service. In this model, the semantic associates of a toponym are found by representing words as vectors in a semantic space and selecting the word vectors located closest to the vector of the original word. The toponym is analyzed with three methods: the method of semantic associates; cluster analysis; and a combined method that first transforms a word whose meaning has been lost and then analyzes the semantic associates of the resulting set of transformants. The method is formalized by a model that determines the similarity between the word under study and its associates, using different versions of the model for one or more text corpora. The associates produced by the model are treated as a semantic cluster, and the cosine similarity between their vectors serves as the measure of similarity between cluster elements. To identify competing hypotheses about the origin of the toponym Moskva, a cluster analysis was performed on the full set of the top ten vector associates of all transformants of this word. As a result, four hypotheses were put forward: “a famous man”, “firearms”, “beekeeping”, and “blood-sucking insects”. The probabilities of these hypotheses are estimated from the frequencies of words in the language corpus. The main hypothesis is “a famous man”.
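The pipeline described in the abstract (ranking associates by cosine similarity, cos(u, v) = u·v / (‖u‖‖v‖), pooling the top associates of several transformants, clustering the pool, and weighting the clusters by corpus frequency) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D vectors, the transformant names `q1`/`q2`, the frequency table, the 0.9 cosine threshold, the single-linkage grouping and the frequency-share weighting are all hypothetical stand-ins, since the abstract specifies neither the clustering algorithm nor the exact probability formula. With a real model, the toy dictionary would be replaced by the GeoWAC fastText vectors and `k` would be 10.

```python
import math
from itertools import combinations

# Toy 2-D vectors (hypothetical values, NOT taken from the GeoWAC model).
TOY_VECTORS = {
    "q1": [1.0, 0.0], "q2": [0.0, 1.0],  # hypothetical transformants
    "a1": [0.9, 0.1], "a2": [0.8, 0.2],
    "b1": [0.1, 0.9], "b2": [0.2, 0.8],
    "c1": [-1.0, 0.0],
}
# Hypothetical corpus frequencies used to weight the hypotheses.
TOY_FREQ = {"a1": 30, "a2": 10, "b1": 5, "b2": 5, "c1": 1}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top_associates(word, vectors, k, exclude=()):
    """The k words whose vectors are closest (by cosine) to `word`."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items()
              if w != word and w not in exclude]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def pool_associates(transformants, vectors, k):
    """Union of the top-k associates of every transformant, keeping the best similarity."""
    pooled = {}
    for t in transformants:
        for w, sim in top_associates(t, vectors, k, exclude=transformants):
            pooled[w] = max(sim, pooled.get(w, -1.0))
    return pooled


def single_linkage(words, vectors, threshold):
    """Clusters = connected components of the graph 'cosine >= threshold'."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for u, v in combinations(words, 2):
        if cosine(vectors[u], vectors[v]) >= threshold:
            parent[find(u)] = find(v)
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return [sorted(g) for g in groups.values()]


def hypothesis_weights(clusters, freq):
    """Share of total corpus frequency carried by each cluster (an assumption)."""
    totals = [sum(freq.get(w, 0) for w in c) for c in clusters]
    grand = sum(totals)
    ranked = sorted(zip(clusters, totals), key=lambda ct: ct[1], reverse=True)
    return [(c, t / grand) for c, t in ranked]


if __name__ == "__main__":
    transformants = ["q1", "q2"]
    pooled = pool_associates(transformants, TOY_VECTORS, k=2)
    clusters = single_linkage(sorted(pooled), TOY_VECTORS, threshold=0.9)
    for cluster, weight in hypothesis_weights(clusters, TOY_FREQ):
        print(cluster, round(weight, 2))
```

Run as a script, this prints the two toy clusters with their frequency shares: `['a1', 'a2'] 0.8` and `['b1', 'b2'] 0.2`. Single linkage at a fixed threshold is used here only because it reduces to connected components and needs no external library; hierarchical or k-means clustering would be equally plausible choices for the cluster-analysis step.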

Publisher

Astrakhan State Technical University

References (25 articles)

1. Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // International Conference on Learning Representations. Scottsdale, 2013. URL: https://arxiv.org/abs/1301.3781 (accessed: 12.09.2021).

2. Goldberg Y., Levy O. Word2vec Explained: Deriving Mikolov et al.'s Negative-sampling Word-Embedding Method // ArXiv. 2014. URL: https://arxiv.org/abs/1402.3722 (accessed: 12.09.2021).

3. Borovskiy A. V., Rakovskaya E. E. Issledovanie toponimov Irkutskoy oblasti s primeneniem metodov iskusstvennogo intellekta [A study of toponyms of the Irkutsk Region using artificial intelligence methods] // Izv. Baykal. gos. un-ta. 2021. V. 32. N. 3. P. 382–390.

4. Bojanowski P., Grave E., Joulin A., Mikolov T. Enriching word vectors with subword information // Transactions of the Association for Computational Linguistics. 2017. V. 5. N. 1. P. 135–146.

5. RusVectōrēs: semantic models for the Russian language. URL: https://rusvectores.org/ (accessed: 12.09.2021).
