Abstract
Statistics on the frequency distribution of consonant letters were collected for the main modern languages of the Indo-European family. The frequency distributions, ranked in descending order, were studied using literary texts of about 1 million characters each. It is shown that an invariant can be introduced for each language group (Germanic, Romance, Slavic and Baltic), defined as the distance between the members of the group in the L1 norm. The threshold distance at which the languages cluster into fully connected subgraphs is 0.14. It is also shown that the structure of the graph of near and far neighbors is consistent with a model of dependent random variables.
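The distance-and-threshold idea can be illustrated with a minimal Python sketch. This is not the authors' code. The consonant set (Latin letters only, with "y" treated as a vowel), the zero-padding of the rank-ordered vectors to a common length, and the inclusive comparison against the 0.14 threshold are all assumptions. The helper names are hypothetical, and the vectors in the usage note are toy numbers rather than measured language data.

```python
from collections import Counter
from itertools import combinations

# Assumed consonant inventory for Latin-script text; real languages (and
# Cyrillic-script Slavic languages) would need their own sets.
LATIN_CONSONANTS = frozenset("bcdfghjklmnpqrstvwxz")


def descending_frequencies(text: str, consonants=LATIN_CONSONANTS) -> list[float]:
    """Relative consonant frequencies sorted in descending order.

    The vector is zero-padded to the size of the consonant set so that
    vectors from different texts have equal length (an assumption).
    """
    counts = Counter(ch for ch in text.lower() if ch in consonants)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no consonants from the given set")
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return freqs + [0.0] * (len(consonants) - len(freqs))


def l1_distance(p: list[float], q: list[float]) -> float:
    """L1 (sum of absolute differences) distance between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have equal length")
    return sum(abs(a - b) for a, b in zip(p, q))


def threshold_graph(vectors: dict[str, list[float]], threshold: float) -> set[frozenset[str]]:
    """Undirected edges joining languages whose L1 distance is <= threshold."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(vectors), 2)
        if l1_distance(vectors[a], vectors[b]) <= threshold
    }


def is_clique(group: list[str], edges: set[frozenset[str]]) -> bool:
    """True if every pair in `group` is joined by an edge (a fully connected subgraph)."""
    return all(frozenset(pair) in edges for pair in combinations(group, 2))
```

With toy vectors `{"A": [0.30, 0.25, 0.20, 0.15, 0.10], "B": [0.28, 0.26, 0.21, 0.15, 0.10], "C": [0.50, 0.20, 0.15, 0.10, 0.05]}` and a threshold of 0.14, only A and B are linked (distance 0.04). So `is_clique(["A", "B"], edges)` holds, while `is_clique(["A", "B", "C"], edges)` does not. Comparing rank-ordered vectors rather than letter-by-letter frequencies measures the shape of the distribution independently of which consonants occupy each rank.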
Publisher: Keldysh Institute of Applied Mathematics