A large quantitative analysis of written language challenges the idea that all languages are equally complex

Published: 2023-09-16 Issue: 1 Volume: 13
ISSN: 2045-2322
Container-title: Scientific Reports
Language: en
Short-container-title: Sci Rep

Author:

Koplenig Alexander, Wolfer Sascha, Meyer Peter

Abstract

One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~3.5 billion words or ~9.0 billion characters and covering 2069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of what we call average prediction complexity. We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. In addition, we show that speaker population size predicts entropy. We argue that both results constitute evidence against the equi-complexity hypothesis from an information-theoretic perspective.

Funder

Leibniz-Institut für Deutsche Sprache (IDS)

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-42327-3.pdf

References: 121 articles.

1. Nowak, M. A. Evolutionary biology of language. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 355, 1615–1622 (2000).

2. Sampson, G. A linguistic axiom challenged. In Language Complexity as an Evolving Variable (eds Sampson, G. et al.) 1–18 (Oxford University Press, 2009).

3. Lupyan, G. & Dale, R. Why are there different languages? The role of adaptation in linguistic diversity. Trends Cogn. Sci. 20, 649–660 (2016).

4. Dediu, D. et al. Cultural evolution of language. In Cultural Evolution (eds Richerson, P. J. & Christiansen, M. H.) 303–332 (The MIT Press, 2013). https://doi.org/10.7551/mitpress/9780262019750.003.0016.

5. Coupé, C., Oh, Y. M., Dediu, D. & Pellegrino, F. Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche. Sci. Adv. 5, eaaw2594 (2019).

Cited by 1 article.

1. Languages with more speakers tend to be harder to (machine-)learn. Scientific Reports (2023-10-28).