Affiliations:
1. DeepMind, London N1C 4DN, United Kingdom
2. Department of Psychology, University of Oxford, Oxford OX2 6GG, United Kingdom
Abstract
The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an AI assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared with participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness than those in the Control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: in the studies presented here, the VoI increases the likelihood that participants continue to endorse their initial choice in a subsequent round in which they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.
Publisher
Proceedings of the National Academy of Sciences
Cited by
8 articles.