Affiliation:
1. Sapienza University of Rome, Italy
2. University of Edinburgh, United Kingdom
Abstract
In this article, we introduce and discuss the pervasive issue of bias in the large language models that currently underpin mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. We then survey the types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with research directions for measuring and mitigating these types of bias.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
References: 132 articles.
Cited by: 44 articles.