Evaluation of word embedding models used for diachronic semantic change analysis

Published:2024-02-01 Issue:1 Volume:2701 Page:012082
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

Maslennikova Yulia, Bochkarev Vladimir

Abstract

In the last decade, the quantitative analysis of diachronic language change and lexical semantic change has become a subject of active research. The development of new, effective word embedding techniques has played a significant role, as a number of studies have demonstrated. Some of these studies have focused on the optimal type of word2vec model, the training hyperparameters, and evaluation techniques. In this research, we used the Corpus of Historical American English (COHA). The paper presents the results of multiple training runs and compares word2vec models trained with different hyperparameter settings for lexical semantic change detection. In addition to traditional word similarity and analogical reasoning tests, we tested the models on an extended set of synonyms: more than 100,000 English synsets randomly selected from the WordNet database. We show that changing the word2vec model parameters (such as the embedding dimension, the context window size, the model type, and the word discard rate) can significantly affect the resulting word embedding vector space and the detected lexical semantic changes. Additionally, the results depended strongly on properties of the corpus, such as the word frequency distribution.
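The abstract does not spell out how the synonym test is scored. As a minimal sketch of one plausible reading, the Python below counts how often a word's nearest neighbours (by cosine similarity) include another member of its synset. The function name `synonym_hit_rate`, the parameter `k`, and the toy vectors are assumptions made for illustration; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_hit_rate(vectors, synsets, k=1):
    """Fraction of query words whose k nearest neighbours contain a synonym.

    vectors: dict mapping word -> list of floats (hypothetical embedding table)
    synsets: list of lists of words treated as synonyms
    """
    hits = 0
    total = 0
    for synset in synsets:
        # Only words present in the vocabulary can be queried or matched.
        members = [w for w in synset if w in vectors]
        if len(members) < 2:
            continue
        for word in members:
            candidates = [w for w in vectors if w != word]
            candidates.sort(
                key=lambda w: cosine(vectors[word], vectors[w]), reverse=True
            )
            hits += any(w in members for w in candidates[:k])
            total += 1
    return hits / total if total else 0.0


# Toy data (assumed, two-dimensional for readability).
toy_vectors = {
    "car": [1.0, 0.1],
    "automobile": [0.9, 0.2],
    "happy": [0.1, 1.0],
    "glad": [0.2, 0.9],
    "stone": [0.7, 0.7],
}
toy_synsets = [["car", "automobile"], ["happy", "glad"]]
score = synonym_hit_rate(toy_vectors, toy_synsets, k=1)
print(score)
```

Under this reading, models trained with different hyperparameters could be compared by computing the same score over a shared synset list. The brute-force neighbour search is only practical for small vocabularies; a real evaluation over 100,000 synsets would need a vectorised or indexed search.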

Publisher

IOP Publishing

Link

https://iopscience.iop.org/article/10.1088/1742-6596/2701/1/012082/pdf

References (21 articles)

1. Evaluation methods for unsupervised word embeddings;Schnabel;Proc. of the 2015 Conf. on Empirical Methods in Natural Language Processing,2015

2. Evaluating Word Embeddings Using a Representative Suite of Practical Tasks;Nayak;Proc. of the 1st Workshop on Evaluating Vector Space Representations for NLP (Berlin),2016

3. Placing Search in Context: The Concept Revisited;Finkelstein;ACM Transactions on Information Systems,2002

4. EVALution 1.0: an Evolving Semantic Dataset for Training and Evaluation of Distributional Semantic Models;Santus;Proc. of the 4th Workshop on Linked Data in Linguistics (LDL-2015) (Beijing, China),2015

5. How to Train Good Word Embeddings for Biomedical NLP;Chiu;Proc. of the 15th Workshop on Biomedical Natural Language Processing (Berlin),2016