Affiliation:
1. Sapienza University of Rome, Italy
2. University of Edinburgh, United Kingdom
Abstract
In this article, we introduce and discuss the pervasive issue of bias in the large language models that currently underpin mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. We then survey the types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with research directions for measuring and mitigating these types of bias.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
References: 132 articles.
Cited by: 44 articles.