Author:
Cameron Domenico Kirk-Giannini
Abstract
Yum (2024) argues that the widespread adoption of language agent architectures would likely increase the risk posed by AI by simplifying the process of aligning artificial systems with human values and thereby making it easier for malicious actors to use them to cause a variety of harms. Yum takes this to be an example of a broader phenomenon: progress on the alignment problem is likely to be net safety-negative because it makes artificial systems easier for malicious actors to control. I offer some reasons for skepticism about this surprising and pessimistic conclusion.
Publisher
Springer Science and Business Media LLC
References
1. Bales, A., D’Alessandro, W., & Kirk-Giannini, C. D. (2024). Artificial intelligence: Arguments for catastrophic risk. Philosophy Compass, 19(2), e12964.
2. Carlsmith, J. (2021). Is power-seeking AI an existential risk? arXiv preprint.
3. Goldstein, S., & Kirk-Giannini, C. D. (2023a). AI wellbeing. PhilPapers preprint.
4. Goldstein, S., & Kirk-Giannini, C. D. (2023b). Language agents reduce the risk of existential catastrophe. AI & Society. Online First.
5. Tubert, A., & Tiehen, J. (2024). Existentialist risk and value misalignment. Philosophical Studies. Online First.