Using lexical language models to detect borrowings in monolingual wordlists

Author:

Miller John E., Tresoldi Tiago, Zariquiey Roberto, Beltrán Castañón César A., Morozova Natalia, List Johann-Mattis

Abstract

Lexical borrowing, the transfer of words from one language to another, is one of the most frequent processes in language evolution. To detect borrowings, linguists use a range of strategies, combining evidence from multiple sources. Despite the increasing popularity of computational approaches in comparative linguistics, automated approaches to lexical borrowing detection are still in their infancy and disregard many aspects of the evidence that human experts routinely consider. One example of this kind of evidence is phonological and phonotactic clues, which are especially useful for detecting recent borrowings that have not yet been adapted to the structure of their recipient languages. In this study, we test how these clues can be exploited in automated frameworks for borrowing detection. By modeling phonology and phonotactics with the help of support vector machines, Markov models, and recurrent neural networks, we propose a framework for the supervised detection of borrowings in monolingual wordlists. Based on a substantially revised dataset in which lexical borrowings have been thoroughly annotated for 41 languages from different families, featuring a large typological diversity, we use these models to conduct a series of experiments investigating their performance in monolingual borrowing detection. While the general results appear largely unsatisfying at first glance, further tests show that the performance of our models improves with increasing amounts of attested borrowings and in those cases where most borrowings were introduced by a single donor language. Our results show that phonological and phonotactic clues derived from monolingual language data are often not sufficient to detect borrowings when used in isolation. Based on our detailed findings, however, we express hope that they could prove useful in integrated approaches that take multilingual information into account.
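
The idea of using a Markov model as a lexical language model can be illustrated with a minimal sketch. This is not the authors' implementation: the bigram order, the add-one smoothing, the decision rule, and the toy wordlists are all assumptions made for illustration, and each character is treated as one sound segment. One model is trained on words annotated as inherited and one on words annotated as borrowed; a new word receives the label of the model that finds it less surprising.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"


class BigramModel:
    """Add-one smoothed bigram model over sound segments (illustrative only)."""

    def __init__(self, words, vocab):
        # Possible next symbols: every segment plus END (START is never predicted).
        self.vocab_size = len(vocab) + 1
        self.bigrams = Counter()
        self.contexts = Counter()
        for word in words:
            seq = [START] + list(word) + [END]
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def cross_entropy(self, word):
        """Average negative log2-probability per transition in the word."""
        seq = [START] + list(word) + [END]
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            count = self.bigrams[(prev, cur)] + 1
            p = count / (self.contexts[prev] + self.vocab_size)
            total -= math.log2(p)
        return total / (len(seq) - 1)


def classify(word, inherited_model, borrowed_model):
    """Label a word by whichever model assigns it the lower cross-entropy."""
    h_inherited = inherited_model.cross_entropy(word)
    h_borrowed = borrowed_model.cross_entropy(word)
    return "borrowed" if h_borrowed < h_inherited else "inherited"


# Hypothetical toy data: invented forms, not taken from the paper's dataset.
inherited_words = ["pata", "tapa", "kata", "paka", "taka"]
borrowed_words = ["strim", "sprint", "skrin"]
vocab = set("".join(inherited_words + borrowed_words))

inherited_model = BigramModel(inherited_words, vocab)
borrowed_model = BigramModel(borrowed_words, vocab)

for candidate in ["strip", "kapa"]:
    print(candidate, classify(candidate, inherited_model, borrowed_model))
```

In this toy setting the consonant clusters of the borrowed forms dominate the decision, which matches the abstract's observation that such clues work best when borrowings are numerous and come mostly from a single donor language.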

Funder

Pontificia Universidad Católica del Perú

European Research Council

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 57 articles.

1. List JM. Automatic detection of borrowing (Open problems in computational diversity linguistics 2); 2019. Web blog at: http://phylonetworks.blogspot.com/2019/03/automatic-detection-of-borrowing-open.html.

Cited by 7 articles.

1. Open Problems in Computational Historical Linguistics;Open Research Europe;2023-11-20

2. COVID-19, kovhidhi, dzihwamupengo: Language use, language change, and pandemic perceptions among Shona-speakers in Gweru, Zimbabwe;The African Journal of Information and Communication (AJIC);2023-06-30

3. Loanword identification based on web resources: A case study on wikipedia;Computer Speech & Language;2023-06

4. Improving the Robustness of Loanword Identification in Social Media Texts;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-03-24

5. Evolutionary Aspects of Language Change;Synthese Library;2023
