The shutdown problem: an AI engineering puzzle for decision theorists

Published: 2024-06-19
ISSN: 0031-8116
Container-title: Philosophical Studies
Language: en
Short-container-title: Philos Stud

Author:

Elliott Thornley

Abstract

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Funder

Center for AI Safety

Forethought Foundation

AI Alignment Awards

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11098-024-02153-3.pdf

References (51 articles)

1. Adaptive Agent Team, Bauer, J., Baumli, K., Baveja, S., Behbahani, F., Bhoopchand, A., Bradley-Schmieg, N., et al. (2023). Human-timescale adaptation in an open-ended task space. arXiv. https://doi.org/10.48550/arXiv.2301.07608

2. Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., et al. (2022). Do as I can, not as I say: Grounding language in robotic affordances. arXiv. https://doi.org/10.48550/arXiv.2204.01691

3. Ahn, M., Dwibedi, D., Finn, C., Arenas, M. G., Gopalakrishnan, K., Hausman, K., Ichter, B., et al. (2024). AutoRT: Embodied foundation models for large scale orchestration of robotic agents. https://auto-rt.github.io/static/pdf/AutoRT.pdf

4. Armstrong, S. (2015). Motivated value selection for artificial agents. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. https://www.fhi.ox.ac.uk/wp-content/uploads/2015/03/Armstrong_AAAI_2015_Motivated_Value_Selection.pdf

5. Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22, 71. https://doi.org/10.1007/s11023-012-9281-3