Diachronic Semantic Tracking for Chinese Words and Morphemes over Centuries
Published: 2024-04-30
Issue: 9
Volume: 13
Page: 1728
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Chi Yang (1), Giunchiglia Fausto (1, 2, 3), Xu Hao (1, 2)
Affiliation:
1. School of Artificial Intelligence, Jilin University, Changchun 130012, China
2. College of Computer Science and Technology, Jilin University, Changchun 130012, China
3. Department of Computer Science and Information Engineering (DISI), University of Trento, 38123 Trento, Italy
Abstract
Lexical semantic changes spanning centuries can reveal the complex development of language and social culture. In recent years, natural language processing (NLP) methods have been applied in this field to track diachronic changes in word-sense frequency across large-scale historical corpora, for instance, by analyzing which senses appear, increase, or decrease in which periods. However, Chinese diachronic corpora and datasets that support supervised learning and text mining remain scarce, and at the method level, few existing works analyze Chinese semantic change at the morpheme level. This paper constructs a diachronic Chinese dataset spanning 3000 years for semantic tracking applications and extends the existing framework to the level of Chinese characters and morphemes. The extended framework comprises four main steps: contextual sense representation, sense identification, morpheme sense mining, and diachronic semantic change representation. The experiments show the effectiveness of our method at each step. Finally, a statistical analysis reveals a strong positive correlation in both frequency and trend of change between monosyllabic word senses and their corresponding morphemes.
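The last step of the framework and the closing correlation claim can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sense identification has already produced one (period, sense label) pair per occurrence, and the function names `sense_proportions` and `pearson`, the period labels, and the toy data are all hypothetical. It computes the share of each sense per period, then the Pearson correlation between a word-sense series and a morpheme-sense series.

```python
from collections import Counter, defaultdict
from math import sqrt


def sense_proportions(occurrences, periods):
    """Return {sense: [share of that sense in each period]}.

    occurrences: iterable of (period, sense_label) pairs, assumed to come
    from an upstream sense-identification step (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for period, sense in occurrences:
        counts[period][sense] += 1
    senses = sorted({s for c in counts.values() for s in c})
    series = {}
    for sense in senses:
        series[sense] = [
            counts[p][sense] / sum(counts[p].values()) if counts[p] else 0.0
            for p in periods
        ]
    return series


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Toy data, invented for illustration only: three periods, two senses.
periods = ["P1", "P2", "P3"]
word_occ = [("P1", "a"), ("P1", "a"), ("P1", "b"),
            ("P2", "a"), ("P2", "b"),
            ("P3", "b"), ("P3", "b")]
morph_occ = [("P1", "a"), ("P1", "b"),
             ("P2", "a"), ("P2", "b"), ("P2", "b"),
             ("P3", "b")]

word_series = sense_proportions(word_occ, periods)
morph_series = sense_proportions(morph_occ, periods)
r = pearson(word_series["a"], morph_series["a"])
```

In this toy data, sense "a" declines over the three periods for both the word and the morpheme, so `r` is positive. Real data would need period boundaries and sense inventories taken from the dataset itself.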
Funder
Paleography and Chinese Civilization Inheritance and Development Program Collaborative Innovation Platform