A generating model for Finnish nominal inflection using distributional semantics

Author:

Nikolaev Alexandre (1), Chuang Yu-Ying (2), Baayen R. Harald (2)

Affiliation:

1. University of Eastern Finland

2. University of Tübingen

Abstract

Finnish nouns are characterized by rich inflectional variation, with obligatory marking of case and number, with optional possessive suffixes and with the possibility of further cliticization. We present a model for the conceptualization of Finnish inflected nouns, using pre-compiled fasttext embeddings (300-dimensional semantic vectors that approximate words’ meanings). Instead of deriving the semantic vector of an inflected word from another word in its paradigm, we propose that an inflected word is conceptualized by means of summation of latent vectors representing the meanings of its lexeme and its inflectional features. We tested this model on the 2,000 most frequent Finnish nouns and their inflected word forms from a corpus of Finnish (84 million tokens). Visualization of the semantic space of Finnish using t-SNE clarified that a ‘main effects’ additive model does not do justice to the semantics of inflection. In Finnish, how number is realized turns out to vary substantially with case. Further interactions emerged with the possessive suffixes and the clitics. By taking these interactions into account, the accuracy of our model, evaluated with the fasttext embeddings as gold standard, improved from 76% to 89%. Analyses of the errors made by the model clarified that 7.5% of errors are due to overabundance (and hence not true errors), and that 16.5% of the errors involved exchanges of semantically highly similar stems (lexemes). Our results indicate, first, that the semantics of Finnish noun inflection are more intricate than assumed thus far, and second, that these intricacies can be captured with surprisingly high accuracy by a simple generating model based on imputed semantic vectors for lexemes, inflectional features, and interactions of inflectional features.
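The additive idea in the abstract can be illustrated with a minimal sketch in Python with NumPy. Everything concrete here is an assumption made for illustration: the three lexemes, the feature labels, the random vectors, the noise level, and the function names are hypothetical toy stand-ins, not the authors' data or implementation. In the paper the latent vectors are imputed from fasttext embeddings; here they are simply drawn at random, and the toy gold standard is built from them.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
DIM = 300  # the embeddings in the abstract are 300-dimensional

# Hypothetical toy inventory (assumed for illustration only).
LEXEMES = ("talo", "kissa", "vesi")
NUMBERS = ("sg", "pl")
CASES = ("nominative", "inessive", "elative")


def random_vectors(keys):
    """Give each key a random vector as a stand-in for an imputed latent vector."""
    return {key: rng.normal(size=DIM) for key in keys}


lexeme_vec = random_vectors(LEXEMES)
feature_vec = random_vectors(NUMBERS + CASES)
# One extra vector per number-by-case combination (the interaction terms).
interaction_vec = random_vectors(itertools.product(NUMBERS, CASES))


def compose(lexeme, number, case, use_interactions=True):
    """Sum the lexeme vector, the feature vectors and, optionally, the interaction vector."""
    vec = lexeme_vec[lexeme] + feature_vec[number] + feature_vec[case]
    if use_interactions:
        vec = vec + interaction_vec[(number, case)]
    return vec


# Toy gold standard: the interaction model plus a little noise.
# In the paper, the gold standard is the fasttext embedding of each word form.
gold = {
    form: compose(*form) + 0.1 * rng.normal(size=DIM)
    for form in itertools.product(LEXEMES, NUMBERS, CASES)
}


def nearest(vec):
    """Return the gold form whose vector has the highest cosine similarity to vec."""
    forms = list(gold)
    matrix = np.stack([gold[form] for form in forms])
    sims = matrix @ vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec))
    return forms[int(np.argmax(sims))]


def accuracy(use_interactions):
    """Share of forms whose composed vector is closest to its own gold vector."""
    hits = [
        nearest(compose(*form, use_interactions=use_interactions)) == form
        for form in gold
    ]
    return sum(hits) / len(hits)


print("main effects only:", accuracy(False))
print("with interactions:", accuracy(True))
```

Because random high-dimensional vectors are nearly orthogonal, this toy setup is far easier than real data and should not be read as reproducing the 76% and 89% figures reported above. It only shows the mechanics: a main-effects model sums lexeme and feature vectors, and the interaction model adds one further vector per feature combination before the nearest-neighbour evaluation.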

Publisher

John Benjamins Publishing Company

Subject

Cognitive Neuroscience, Linguistics and Language, Language and Linguistics
