Authors:
Bochkarev Vladimir V., Shevlyakova Anna V.
Abstract
Several methods have been proposed for detecting changes in word semantics and the emergence of new word meanings. These methods use different techniques for estimating the semantic distance between words. Some rely on neural network vector models, others on simpler vector representations built from the frequencies of n-grams that contain the words under study. This paper proposes a method for calculating confidence intervals for semantic distance estimates obtained from the frequency data of n-grams extracted from a large diachronic corpus. The task is complicated because the distribution law of frequency fluctuations of words and n-grams remains an open question despite a number of studies. The confidence intervals are calculated by statistical modeling using random permutations of n-gram frequencies. To test the proposed method, the semantic distance between two Russian synonyms is estimated as an example.
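The abstract does not specify the distance measure or the exact permutation scheme. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' procedure. It assumes that each word is represented by yearly frequencies of context n-grams over a short window of years. It also assumes that the semantic distance is the cosine distance between the two words' context-frequency vectors for one chosen year. Finally, it assumes that each n-gram's frequency is stationary within the window, so that independently permuting each n-gram's yearly values yields plausible surrogate data. The function names `cosine_distance`, `surrogate_vector` and `distance_ci` are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-negative frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def surrogate_vector(series_list, year, rng):
    """Hypothetical surrogate: shuffle each n-gram's yearly frequencies
    independently and read off the value that lands at position `year`.
    Assumes frequencies are stationary within the window of years."""
    out = []
    for series in series_list:
        shuffled = list(series)
        rng.shuffle(shuffled)  # independent random permutation per n-gram
        out.append(shuffled[year])
    return out


def distance_ci(freq_a, freq_b, year, n_iter=2000, alpha=0.05, seed=1):
    """Point estimate and percentile interval of the cosine distance.

    freq_a, freq_b: lists of per-n-gram yearly frequency series
    (rows = context n-grams, columns = years) for word A and word B.
    """
    rng = random.Random(seed)
    point = cosine_distance([s[year] for s in freq_a], [s[year] for s in freq_b])
    samples = sorted(
        cosine_distance(surrogate_vector(freq_a, year, rng),
                        surrogate_vector(freq_b, year, rng))
        for _ in range(n_iter)
    )
    k = int(round(alpha / 2 * n_iter))  # number of samples cut from each tail
    return point, (samples[k], samples[n_iter - 1 - k])


if __name__ == "__main__":
    # Invented toy counts: 3 context n-grams x 5 years for each word.
    word_a = [[10, 12, 11, 9, 13], [5, 4, 6, 5, 5], [1, 0, 2, 1, 1]]
    word_b = [[9, 11, 10, 12, 10], [2, 3, 2, 1, 2], [4, 5, 3, 4, 6]]
    estimate, (low, high) = distance_ci(word_a, word_b, year=2)
    print(f"distance = {estimate:.4f}, 95% interval = [{low:.4f}, {high:.4f}]")
```

A percentile interval is used because it makes no parametric assumption about the fluctuation law, which matches the difficulty noted in the abstract. Permuting within a window is only reasonable when the window is short enough that trends in n-gram frequencies can be neglected.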
Subject
General Physics and Astronomy
Cited by
5 articles.