Authors:
Khalilia Hadi, Bella Gábor, Freihat Abed Alhakim, Darma Shandy, Giunchiglia Fausto
Abstract
Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, and the other with three Indonesian languages. Our results, made available as browsable and downloadable computational resources, extend prior linguistics research on kinship terminology and provide insight into the extent of diversity even within linguistically and culturally close communities.