Abstract
For years, the number of opaque algorithmic decision-making systems (ADM systems) with a large impact on society has been increasing: e.g., systems that compute decisions about the future recidivism of criminals or about creditworthiness, or the many small decision-computing systems within social networks that create rankings, provide recommendations, or filter content. Concerns that such a system makes biased decisions can be difficult to investigate, be it by affected people, NGOs, stakeholders, governmental testing and auditing authorities, or other external parties. The scientific testing and auditing literature rarely focuses on the specific needs of such investigations and suffers from ambiguous terminology. With this paper, we aim to support this investigation process by collecting, explaining, and categorizing methods of testing for bias that are applicable to black-box systems, given that inputs and the respective outputs can be observed. For this purpose, we provide a taxonomy that can be used to select suitable test methods adapted to the respective situation. This taxonomy takes multiple aspects into account, for example the effort to implement a given test method, its technical requirements (such as the need for ground truth), and the social constraints of the investigation, e.g., the protection of business secrets. Furthermore, we analyze which test methods can be used in the context of which black-box audit concept. It turns out that various factors, such as the type of black-box audit or the lack of an oracle, may limit the selection of applicable tests. With the help of this paper, people or organizations who want to test an ADM system for bias can identify which test methods and auditing concepts are applicable and what implications they entail.
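To illustrate the setting the abstract describes (testing for bias when only the inputs and outputs of a black box are observable), the following is a minimal sketch that probes a system with synthetic inputs and measures the demographic parity gap, one common group-fairness criterion. The black_box_decide function, the "group" attribute, and all thresholds are hypothetical stand-ins for illustration, not the paper's method.

```python
import random

# Hypothetical opaque ADM system: an auditor can observe inputs and the
# corresponding binary outputs (e.g., approve/deny), but not its internals.
def black_box_decide(applicant: dict) -> bool:
    # Stand-in decision logic; unknown to the external tester.
    return random.random() < (0.7 if applicant["income"] > 40000 else 0.4)

def demographic_parity_gap(applicants: list, attribute: str) -> float:
    """Absolute difference in positive-decision rates between the two
    groups defined by a binary protected attribute."""
    outcomes = {0: [], 1: []}
    for a in applicants:
        outcomes[a[attribute]].append(black_box_decide(a))
    rate = lambda xs: sum(xs) / len(xs)
    return abs(rate(outcomes[0]) - rate(outcomes[1]))

# Probe the black box with synthetic applicants and compare group outcomes.
applicants = [
    {"income": random.randint(20000, 80000), "group": random.randint(0, 1)}
    for _ in range(10000)
]
print(f"Demographic parity gap: {demographic_parity_gap(applicants, 'group'):.3f}")
```

A test of this kind needs no ground truth, only observed input/output pairs; other criteria the fairness literature discusses (e.g., equalized odds) would additionally require labeled outcomes, which is one of the technical requirements the paper's taxonomy distinguishes.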
Funders
Volkswagen Foundation
Bundesministerium für Bildung und Forschung
Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Publisher
Springer Science and Business Media LLC
Reference99 articles.
1. Altenbockum, J. V. (2011). NRW verliert seinen letzten Frauenbuchladen. boersenblatt.net.
2. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias—there’s software used across the country to predict future criminals. and it’s biased against blacks. ProPublica.
3. Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and machine learning: Limitations and opportunities. MIT.
4. Barr, E. T., Harman, M., McMinn, P., Shahbaz, M., & Yoo, S. (2014). The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering, 41(5), 507–525.
5. Binns, R. (2020). On the apparent conflict between individual and group fairness. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. FAT* ’20 (pp. 514–524). Association for Computing Machinery. https://doi.org/10.1145/3351095.3372864