Languages with more speakers tend to be harder to (machine-)learn

Published:2023-10-28 Issue:1 Volume:13
ISSN:2045-2322
Container-title:Scientific Reports
Language:en
Short-container-title:Sci Rep

Author:

Koplenig Alexander,Wolfer Sascha

Abstract

Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs (ranging from very simple n-gram models to state-of-the-art deep neural networks) on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.

Funder

Leibniz-Institut für Deutsche Sprache (IDS)

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-45373-z.pdf

References: 129 articles.

1. Nettle, D. Social scale and structural complexity in human languages. Philos. Trans. R. Soc. B Biol. Sci. 367, 1829–1836 (2012).

2. Lupyan, G. & Dale, R. Why are there different languages? The role of adaptation in linguistic diversity. Trends Cogn. Sci. 20, 649–660 (2016).

3. Wells, R. Archiving and language typology. Int. J. Am. Linguist. 20, 101–107 (1954).

4. Hockett, C. F. A Course in Modern Linguistics (Collier-Macmillan, 1958).

5. Trudgill, P. Accent, Dialect and the School (Edward Arnold, 1975).

Cited by 1 article.

1. An Information-Theoretic Approach to Morphosyntactic Complexity in English, Dutch and German. Journal of Quantitative Linguistics (2024-07-10).