Author:
Baeza-Yates Ricardo, Murgai Leena
Abstract
Bias is everywhere, sometimes blatantly explicit, but most of the time it’s hidden, as it often arises from that which is missing: the gaps in our knowledge or data. In this chapter, we cover what bias is and its different sources: how it arises, persists, feeds back into a system, and can be amplified through algorithms. To exemplify the problem, we use the Web, the largest information repository created by humankind. The first countermeasure against bias is awareness: to understand what is represented so that we may identify what is not. So, we systematically explore a wide variety of biases which originate at different points on the Web’s information production and consumption cycle. Today, many if not all of the predictive algorithms we interact with online rely on vast amounts of data harvested from the Web. Biased data will of course lead to biased algorithms, but those biases need not be replicated precisely. Without intervention, they are typically amplified. We start with engagement bias, that is, the difference in rates at which people produce content versus passively consume it. We then move on to data bias: who is producing data on the Web, in what language, and the associated measurement and cultural biases. Algorithmic bias and fairness are intertwined. We discuss the difficulty in defining fairness and provide examples of algorithmic bias in predictive systems. Lastly, we look at biases in user interactions. We discuss how position bias can be mitigated by distributing visuals across results, and how shared information about other users can lead to different social biases. We also discuss how biases continually feed back into the Web and grow through content creation and diffusion.
Publisher
Springer Nature Switzerland