Affiliation:
1. University of California, Berkeley, USA
Abstract
As natural language processing tools powered by big data become increasingly ubiquitous, questions of how to design, develop, and manage these tools and their impacts on diverse populations are pressing. We propose utilizing the concept of linguistic justice—the realization of equitable access to social and political life regardless of language—to provide a framework for examining natural language processing tools that learn from and use human language data. To support linguistic justice, we argue that natural language processing tools (along with the datasets that are used to train and evaluate them) must be examined not only from the perspective of a privileged, majority language user, but also from the perspectives of minoritized language users. Considering such perspectives can help to surface areas in which the data used within natural language processing tools may be (often inadvertently) working against linguistic justice by failing to provide access to information, services, or opportunities in users’ language of choice, underperforming for certain linguistic groups, or advancing harmful stereotypes that can lead to negative life outcomes for members of marginalized groups. At the same time, this framework can help to illuminate ways that these shortcomings can be addressed and allow us to use inclusive language data and approaches to leverage natural language processing technologies that advance linguistic justice.
Funder
The Center for Equity, Gender, and Leadership at Berkeley Haas School of Business
Subject
Library and Information Sciences, Information Systems and Management, Computer Science Applications, Communication, Information Systems
Cited by
5 articles.