Attributive Collocations in the Gold Standard of Russian Collocability and Their Representation in Dictionaries and Corpora

Author:

Khokhlova Maria V.,

Abstract

The article discusses how collocations are represented in Russian dictionaries and how information about them can be covered in a collocation database that is being developed. Such a resource (gold standard) can be in demand when developing applications for teaching or learning Russian as a foreign language and solving other theoretical and applied issues. The aim of the study was twofold: firstly, to analyze how explanatory and specialized dictionaries of the Russian language represent collocations and hence to what extent their data coincide with each other, and, secondly, to investigate how these dictionary collocations are reflected in text corpora. This allows tracing the relation between manually collected data and modern corpora. For the study, the author used the disambiguated subcorpus and the main corpus of the Russian National Corpus (RNC) with a volume of 6 million and 321 million words, respectively, as well as the large Internet corpus ruTenTen with a volume of more than 14.5 billion words. The author considered attributive phrases built according to the “adjective/participle + noun” model. She analyzed 120 collocations with different dictionary index, i.e. the number of dictionaries in which this phrase is given. The following hypothesis was tested: high collocation frequencies correspond to the fact that the item is recorded in several dictionaries. In the analysis, nonparametric analogues of analysis of variance (Friedman and Kruskal-Wallis tests) were used to assess the statistical significance of differences in quantitative data. The frequencies of collocations in corpora of different volume and in different dictionaries were compared. In total, more than 15 thousand examples were processed, less than 0.5% of them were presented in four of the six reviewed dictionaries (five printed and one electronic). The results show data heterogeneity, items selected for a dictionary do not coincide with their frequency characteristics and thus word combinations turn out to be low-frequency. About 34% of the examples are absent in the RNC corpus with removed ambiguity, and about 12% of analyzed collocations are rare (less than 0.01 ipm) even in the ruTenTen corpus. The presence of collocations in several dictionaries indicates their higher frequencies and hence reproducibility in speech. Explanatory dictionaries and collocation dictionaries show the smallest intersection of data. The results show that the amount of data is a crucial issue, and the very phenomenon of collocability should be studied on large corpora.

Publisher

Tomsk State University

Subject

Linguistics and Language,Language and Linguistics

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3