Abstract
The article examines the correlation between two indices that characterize the linguistic or semantic complexity of book content. The first index is the age rating assigned under the Russian Age Rating System for information products. The second index is the ease of understanding of the text, calculated using common readability metrics. The author compares the values of readability metrics for texts with different age rating scores. The experiments were carried out on a collection of 5,516 book previews compiled by the author. The previews are freely available in electronic libraries and carry age rating scores assigned by their publishers. Under the system adopted in the Russian Federation, age rating scores indicate which of the following age categories a book targets: 0+, 6+, 12+, 16+, and 18+. In most cases a preview covers 10% of the full text, which is enough to calculate readability indices. The collected texts were scored with five commonly used readability metrics: the Flesch-Kincaid Index, the Coleman-Liau Index, the Automated Readability Index (ARI), the SMOG Index, and the Dale-Chall Formula. For the texts of each age category, the readability assessment yielded the recommended level of education necessary to understand them. The obtained values were averaged within each age category and analyzed. The results of the experiments show that in most cases there is a direct relationship between the age rating score of a book and the expected level of education required to understand it. Moreover, readability scores under all the considered metrics are directly proportional to age rating scores for the age categories from 0+ to 16+. The readability scores of books in the 18+ category roughly correspond to those of children's literature, which is apparently explained by the genre characteristics of the books labeled 18+.
First, the results indicate that the existing approach to assessing a book's age rating is adequate for attributing the text to a target audience by age. Second, the relationship between readability indices and age rating scores allows the values of readability metrics to be used as text features in computational linguistics tasks aimed at predicting the addressee of a text.
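The scoring and per-category averaging described above can be illustrated with a minimal sketch. It covers only two of the five metrics, ARI and Coleman-Liau, because both rely on letter, word, and sentence counts alone. The coefficients are the standard published English-language ones; whether the study used these or coefficients adapted for Russian is not stated here, so treat them as an assumption. The tokenization rules and the function names (`text_counts`, `ari`, `coleman_liau`, `mean_scores_by_rating`) are likewise hypothetical and not taken from the article.

```python
import re
from collections import defaultdict
from statistics import mean


def text_counts(text):
    """Return (letters, words, sentences) using deliberately simple rules."""
    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of Unicode letters (digits are ignored).
    words = re.findall(r"[^\W\d_]+", text)
    letters = sum(len(w) for w in words)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return letters, len(words), len(sentences)


def ari(text):
    """Automated Readability Index with the standard English coefficients.

    Letters stand in for the 'characters' count of the original formula.
    """
    letters, words, sentences = text_counts(text)
    return 4.71 * letters / words + 0.5 * words / sentences - 21.43


def coleman_liau(text):
    """Coleman-Liau Index with the standard English coefficients."""
    letters, words, sentences = text_counts(text)
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def mean_scores_by_rating(previews):
    """Average (ARI, Coleman-Liau) within each age rating category.

    `previews` is an iterable of (age_rating, text) pairs, e.g. ("12+", "...").
    """
    grouped = defaultdict(list)
    for rating, text in previews:
        grouped[rating].append((ari(text), coleman_liau(text)))
    return {
        rating: (mean(a for a, _ in scores), mean(c for _, c in scores))
        for rating, scores in grouped.items()
    }
```

Calling `mean_scores_by_rating` on a list of `(age_rating, text)` pairs returns one averaged pair of scores per category, which can then be compared across 0+, 6+, 12+, 16+, and 18+.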
Subject
Library and Information Sciences, Media Technology, Visual Arts and Performing Arts, Communication, Information Systems