Current cases of AI misalignment and their implications for future risks

Published: 2023-10-26
Volume: 202, Issue: 5
ISSN: 1573-0964
Container title: Synthese
Language: en

Author:

Dung, Leonard

Abstract

How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: it can be hard to detect, predict, and remedy; it does not depend on a specific architecture or training paradigm; it tends to diminish a system’s usefulness; and it is the default outcome of creating AI via machine learning. Based on these features, I then show that the risk from AI misalignment grows for more capable systems. Not only might more capable systems cause more harm when misaligned, but aligning them should also be expected to be more difficult than aligning current AI.

Funder

Friedrich-Alexander-Universität Erlangen-Nürnberg

Publisher

Springer Science and Business Media LLC

Subject

General Social Sciences, Philosophy

Link

https://link.springer.com/content/pdf/10.1007/s11229-023-04367-0.pdf

References (71 articles)

1. Armstrong, S., Bostrom, N., & Shulman, C. (2016). Racing to the precipice: A model of artificial intelligence development. AI & SOCIETY, 31(2), 201–206. https://doi.org/10.1007/s00146-015-0590-y.

2. Arrhenius, G., Bykvist, K., Campbell, T., & Finneron-Burns, E. (Eds.). (2022). The Oxford Handbook of Population Ethics (1st ed.). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190907686.001.0001.

3. Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., Joseph, N., Kadavath, S., Kernion, J., Conerly, T., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Hume, T., & Kaplan, J. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (arXiv:2204.05862). arXiv. https://doi.org/10.48550/arXiv.2204.05862.

4. Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., & Mordatch, I. (2020). Emergent Tool Use From Multi-Agent Autocurricula (arXiv:1909.07528). arXiv. https://doi.org/10.48550/arXiv.1909.07528.

5. Belrose, N., Furman, Z., Smith, L., Halawi, D., Ostrovsky, I., McKinney, L., Biderman, S., & Steinhardt, J. (2023). Eliciting Latent Predictions from Transformers with the Tuned Lens (arXiv:2303.08112). arXiv. http://arxiv.org/abs/2303.08112.

Cited by (7 articles)

1. Language Agents and Malevolent Design. Philosophy & Technology, 2024-08-17.

2. Dimensions of disagreement: Divergence and misalignment in cognitive science and artificial intelligence. Decision, 2024-08-15.

3. AGI crimes? The role of criminal law in mitigating existential risks posed by artificial general intelligence. AI & SOCIETY, 2024-08-06.

4. Aligning artificial intelligence with moral intuitions: an intuitionist approach to the alignment problem. AI and Ethics, 2024-05-27.

5. The argument for near-term human disempowerment through AI. AI & SOCIETY, 2024-04-14.