Affiliation:
1. University of Milano-Bicocca. a.devarda@campus.unimib.it
2. University of Milano-Bicocca. m.marelli@unimib.it
Abstract
Massively multilingual models such as mBERT and XLM-R are increasingly valued in Natural Language Processing research and applications, due to their ability to tackle the uneven distribution of resources available for different languages. The models’ ability to process multiple languages relying on a shared set of parameters raises the question of whether the grammatical knowledge they extracted during pre-training can be considered as a data-driven cross-lingual grammar. The present work studies the inner workings of mBERT and XLM-R in order to test the cross-lingual consistency of the individual neural units that respond to a precise syntactic phenomenon, that is, number agreement, in five languages (English, German, French, Hebrew, Russian). We found that there is a significant overlap in the latent dimensions that encode agreement across the languages we considered. This overlap is larger (a) for long- vis-à-vis short-distance agreement and (b) when considering XLM-R as compared to mBERT, and peaks in the intermediate layers of the network. We further show that a small set of syntax-sensitive neurons can capture agreement violations across languages; however, their contribution is not decisive in agreement processing.
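The overlap analysis summarized above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that each language already comes with one agreement-sensitivity score per hidden unit, selects the top-k units per language, and compares the size of each pairwise intersection with the overlap expected by chance. The function names, the scoring scheme, and the toy data are all hypothetical; only the hidden size of 768 (mBERT base) is taken from the model itself.

```python
import random
from itertools import combinations


def top_k_units(scores, k):
    """Return the indices of the k units with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def pairwise_overlap(scores_by_lang, k):
    """Count the top-k units shared by every pair of languages."""
    top = {lang: top_k_units(s, k) for lang, s in scores_by_lang.items()}
    return {
        (a, b): len(top[a] & top[b])
        for a, b in combinations(sorted(top), 2)
    }


def chance_overlap(n_units, k):
    """Expected overlap of two independent random k-subsets of n_units."""
    return k * k / n_units


if __name__ == "__main__":
    # Toy data (assumption): a shared signal plus language-specific noise.
    random.seed(0)
    n_units, k = 768, 100
    shared = [random.gauss(0, 1) for _ in range(n_units)]
    langs = ["de", "en", "fr", "he", "ru"]
    scores = {
        lang: [s + random.gauss(0, 1) for s in shared] for lang in langs
    }
    print("chance level:", round(chance_overlap(n_units, k), 2))
    for pair, n_shared in pairwise_overlap(scores, k).items():
        print(pair, n_shared)
```

An observed overlap well above `chance_overlap(n_units, k)` would point to units shared across languages; a proper analysis would replace this point estimate with a significance test.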
Subject
Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics