Abstract
We argue that the later Wittgenstein’s philosophy of language and mathematics, substantially focused on rule-following, is relevant to understanding and improving on the Artificial Intelligence (AI) alignment problem: his discussions of the categories that shape alignment between humans can inform the categories that should be controlled to improve alignment when creating large data sets for supervised and unsupervised learning algorithms, as well as when introducing hard-coded guardrails for AI models. We cast these considerations in a model of human–human and human–machine alignment and sketch basic alignment strategies based on these categories and on further reflections on rule-following, such as the notion of meaning as use. To support the validity of these considerations, we also show that successful techniques employed by AI safety researchers to better align new AI systems with human goals are congruent with the stipulations we derive from the later Wittgenstein’s philosophy. However, their application may benefit from the added specificity and stipulations of our framework: it extends current efforts and provides further, specific AI alignment techniques. Thus, we argue that the categories of the model and the core alignment strategies presented in this work can inform further AI alignment techniques.
Publisher
Springer Science and Business Media LLC
Cited by
1 article.