Word frequency and text complexity: an eye-tracking study of young Russian readers


Laposhina Antonina N., Lebedeva Maria Yu., Berlin Khenis Alexandra A.


Although word frequency is often associated with the cognitive load on the reader and is widely used in automated text complexity assessment, no eye-tracking data have so far been obtained on how well this parameter predicts text complexity for Russian primary school readers. In addition, the best way to aggregate the frequencies of individual words into a complexity estimate for an entire text has not yet been determined. This article aims to fill these gaps. The study was conducted on a sample of 53 children of primary school age. As stimulus material, we used 6 texts that differ in their scores on the classical Flesch readability formula and in the frequency of their words. As sources of frequency data, we used the general frequency dictionary based on the Russian National Corpus and DetCorpus, a corpus of literature addressed to children. The speed of reading a text aloud, in words per minute and averaged over grades, was used as the measure of text complexity. The best predictions of relative reading time were obtained using lemma frequency data from DetCorpus. At the text level, the highest correlation with reading speed was shown by text coverage with a list of the 5,000 most frequent words, and the lists from both sources, the Russian National Corpus and DetCorpus, showed almost the same correlation values. For a more detailed analysis, we also calculated the correlation of the frequency parameters of specific word forms and lemmas with three parameters of oculomotor activity: dwell time, fixation count, and average fixation duration. At the word level, lemma frequency from DetCorpus showed the highest correlation with relative reading time. The results confirm the feasibility of using frequency data to assess text complexity for primary school children and indicate the most effective ways to calculate such data.
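The text-level measure described in the abstract, coverage of a text by a list of the most frequent words and its correlation with reading speed, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `coverage` and `pearson` are hypothetical, the tokenizer is a simple regular expression, tokens are compared as word forms without lemmatization, and the abstract does not say which correlation coefficient was used (Pearson is assumed here purely for illustration).

```python
import re
from statistics import mean


def coverage(text, frequent_words):
    """Share of word tokens in `text` found in `frequent_words`.

    Assumption: `frequent_words` is a set of lowercase word forms,
    e.g. the top 5,000 entries of a frequency list. A real pipeline
    would lemmatize the tokens first; this sketch does not.
    """
    # [^\W\d_]+ matches runs of Unicode letters, including Cyrillic.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in frequent_words for t in tokens) / len(tokens)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumption: both sequences have non-zero variance.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Under these assumptions, one would compute `coverage` for each stimulus text against the chosen frequency list and then pass the per-text coverage values, together with the mean reading speeds in words per minute, to `pearson`.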


Peoples' Friendship University of Russia


Linguistics and Language, Language and Linguistics

