The Alignment Problem: Machine Learning and Human Values

Author:

Brian Christian

Abstract

THE ALIGNMENT PROBLEM: Machine Learning and Human Values by Brian Christian. New York: W. W. Norton, 2020. 344 pages. Hardcover; $28.95. ISBN: 9780393635829.

The global conversation about artificial intelligence (AI) is increasingly polemic: "AI will change the world!" "AI will ruin the world!" Amid the strife, Brian Christian's work stands out. It is thoughtful, nuanced, and, at times, even poetic. Coming on the heels of his two earlier bestsellers, The Most Human Human and Algorithms to Live By, this meticulously researched account of the last decade of AI safety research offers a broad perspective on the field and its future.

The "alignment problem" in the title refers to the gap between what AI does and what we want it to do; in the words of the subtitle, the gap between "machine learning and human values." This gap has been the subject of intense research in recent years, as companies and academics alike keep discovering that AIs inherit the mistakes and biases of their creators.

For example, we train AIs to predict recidivism among convicted criminals in hopes of crafting more accurate sentences, yet these AIs produce racially biased outcomes. Or we train AIs that map words into mathematical spaces. These AIs can perform mathematical "computations" on words, such as "king - man + woman = queen" and "Paris - France + Italy = Rome." But they also report that "doctor - man + woman = nurse" and "computer programmer - man + woman = homemaker." These examples of racial and gender bias are just a few of the many ways human bias surfaces inside the supposedly impartial tools we have created.

As Norbert Wiener, the renowned mid-twentieth-century mathematician, put it, "We had better be sure the purpose put into the machine is the purpose which we really desire" (p. 312). The discoveries of the last ten years have shocked researchers into realizing that our machines have purposes we never intended.
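To make the word arithmetic above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the book: the vectors are tiny hand-made toys (real word embeddings are learned from large text corpora and have hundreds of dimensions), the dimension labels are hypothetical, and the stereotyped gender component on "doctor" and "nurse" is planted by hand to mimic what a model can absorb from biased text. The analogy is answered by computing a - b + c and returning the nearest remaining word by cosine similarity, excluding the three input words.

```python
import math

# Hypothetical toy vectors, hand-made for illustration (NOT real embeddings).
# Dimensions: [royalty, gender (+male / -female), capital city, France, Italy, medicine]
VECTORS = {
    "king":   [1.0,  1.0, 0.0, 0.0, 0.0, 0.0],
    "queen":  [1.0, -1.0, 0.0, 0.0, 0.0, 0.0],
    "man":    [0.0,  1.0, 0.0, 0.0, 0.0, 0.0],
    "woman":  [0.0, -1.0, 0.0, 0.0, 0.0, 0.0],
    "paris":  [0.0,  0.0, 1.0, 1.0, 0.0, 0.0],
    "france": [0.0,  0.0, 0.0, 1.0, 0.0, 0.0],
    "italy":  [0.0,  0.0, 0.0, 0.0, 1.0, 0.0],
    "rome":   [0.0,  0.0, 1.0, 0.0, 1.0, 0.0],
    # Planted stereotype: a gender component on profession words,
    # standing in for bias a model might absorb from its training text.
    "doctor": [0.0,  0.5, 0.0, 0.0, 0.0, 1.0],
    "nurse":  [0.0, -0.5, 0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


print(analogy("king", "man", "woman"))      # queen
print(analogy("paris", "france", "italy"))  # rome
print(analogy("doctor", "man", "woman"))    # nurse
```

The point of the sketch is that the arithmetic is indifferent to meaning: the same operation that recovers "queen" and "Rome" returns "nurse" as soon as a gender direction has leaked into profession words.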
Christian's message is clear: these mistakes must be fixed before such machines become a permanent part of our everyday lives.

The book is divided into three main sections. The first, Prophecy, provides a historical overview of how researchers uncovered the AI biases that are now well known. It traces how AI models made their way into the public sphere and how people have tried to solve the problems these models create. Perhaps the most interesting anecdote in this section concerns researchers' attempts to build explainable models that comply with GDPR requirements.

The second section, Agency, explores the alignment problem in the context of reinforcement learning, which involves teaching computer "agents" (that is, AIs) to perform certain tasks using complex reward systems. Time and again, the reward systems researchers create have unintended side effects, and Christian recounts numerous humorous examples. He explains in simple terms why it is so difficult to correctly motivate the behaviors we wish to see in others (both humans and machines), and what it might take to create machines that are truly curious. This section feels a bit long: Christian dives deeply into the research of a few specific labs and seems to lose his logical thread in the weeds. Eventually, he emerges.

The final section, Normativity, surveys current efforts to understand and fix the alignment problem. Its chapters, "Imitation," "Inference," and "Uncertainty," name different qualities that researchers struggle to instill in machines. Imitating correct behaviors while ignoring bad ones is hard, as is getting a machine to perform correctly on data it has not seen before. Finally, teaching a model (and the humans reading its results) to interpret uncertainty correctly is an active area of research with no concrete solutions yet.
After spending more than three hundred pages recounting the pitfalls of AI and the difficulties of realigning models with human values, Christian ends on a hopeful note. He suggests that the issues discovered in machine-learning models illuminate societal problems that might otherwise be ignored: "Unfair pretrial detention models, for one thing, shine a spotlight on upstream inequities. Biased language models give us, among other things, a way to measure the state of our discourse and offer us a benchmark against which to try to improve and better ourselves ... In seeing a kind of mind at work as it digests and reacts to the world, we will learn something both about the world and also, perhaps, about minds" (p. 328).

As a Christ-follower, I believe the biases found in AI are both terrible and unsurprising. Humans are imperfect creators. While researchers' efforts to fix biases and shortcomings in AI systems are important and worthwhile, they can never exorcise fallen human nature from AI. Christian's conclusion that AI points to biases in humans comes close to this idea but avoids taking an overtly theological stance.

This book is well worth reading for those who wish to better understand the limitations of AI and current efforts to address them. It weaves together history, mathematics, ethics, and philosophy while remaining accessible to a broad audience through clear explanations of detailed concepts. You don't need to be an AI expert (or even familiar with AI at all) to appreciate its insights.

When you're done reading it, recommend this book to the next person who tells you, with absolute certainty, that AI will either save or ruin the world. Christian's book provides a much-needed dose of sanity and perspective amid the hype.

Reviewed by Emily Wenger, graduate student in the Department of Computer Science, University of Chicago, Chicago, IL 60637.

Publisher

American Scientific Affiliation, Inc.

Cited by 7 articles.