Human Compatible

ALSO BY STUART RUSSELL

The Use of Knowledge in Analogy and Induction (1989)
Do the Right Thing: Studies in Limited Rationality (with Eric Wefald, 1991)
Artificial Intelligence: A Modern Approach (with Peter Norvig, 1995, 2003, 2010, 2019)

HUMAN COMPATIBLE
Artificial Intelligence and the Problem of Control
Stuart Russell

Viking, an imprint of Penguin Random House LLC
penguinrandomhouse.com
Copyright © 2019 by Stuart Russell
ISBN 9780525558613 (hardcover)
ISBN 9780525558620 (ebook)

For Loy, Gordon, Lucy, George, and Isaac

CONTENTS

Also by Stuart Russell
Title Page
Copyright
Dedication
Chapter 1. IF WE SUCCEED
Chapter 2. INTELLIGENCE IN HUMANS AND MACHINES
Chapter 3. HOW MIGHT AI PROGRESS IN THE FUTURE?
Chapter 4. MISUSES OF AI
Chapter 5. OVERLY INTELLIGENT AI
Chapter 6. THE NOT-SO-GREAT AI DEBATE
Chapter 7. AI: A DIFFERENT APPROACH
Chapter 8. PROVABLY BENEFICIAL AI
Chapter 9. COMPLICATIONS: US
Chapter 10. PROBLEM SOLVED?
Appendix A. SEARCHING FOR SOLUTIONS
Appendix B. KNOWLEDGE AND LOGIC
Appendix C. UNCERTAINTY AND PROBABILITY
Appendix D. LEARNING FROM EXPERIENCE
Acknowledgments
Image Credits
About the Author

Why This Book? Why Now?

This book is about the past, present, and future of our attempt to understand and create intelligence. This matters, not because AI is rapidly becoming a pervasive aspect of the present but because it is the dominant technology of the future. The world’s great powers are waking up to this fact, and the world’s largest corporations have known it for some time. We cannot predict exactly how the technology will develop or on what timeline. Nevertheless, we must plan for the possibility that machines will far exceed the human capacity for decision making in the real world. What then?

Everything civilization has to offer is the product of our intelligence; gaining access to considerably greater intelligence would be the biggest event in human history. The purpose of the book is to explain why it might be the last event in human history and how to make sure that it is not.

Overview of the Book

The book has three parts. The first part (Chapters 1 to 3) explores the idea of intelligence in humans and in machines. The material requires no technical background, but for those who are interested, it is supplemented by four appendices that explain some of the core concepts underlying present-day AI systems. The second part (Chapters 4 to 6) discusses some problems arising from imbuing machines with intelligence. I focus in particular on the problem of control: retaining absolute power over machines that are more powerful than us. The third part (Chapters 7 to 10) suggests a new way to think about AI and to ensure that machines remain beneficial to humans, forever.

The book is intended for a general audience but will, I hope, be of value in convincing specialists in artificial intelligence to rethink their fundamental assumptions.

IF WE SUCCEED

A long time ago, my parents lived in Birmingham, England, in a house near the university.
They decided to move out of the city and sold the house to David Lodge, a professor of English literature. Lodge was by that time already a well-known novelist. I never met him, but I decided to read some of his books: Changing Places and Small World. Among the principal characters were fictional academics moving from a fictional version of Birmingham to a fictional version of Berkeley, California. As I was an actual academic from the actual Birmingham who had just moved to the actual Berkeley, it seemed that someone in the Department of Coincidences was telling me to pay attention. One particular scene from Small World struck me: The protagonist, an aspiring literary theorist, attends a major international conference and asks a panel of leading figures, “What follows if everyone agrees with you?” The question causes consternation, because the panelists had been more concerned with intellectual combat than ascertaining truth or attaining understanding. It occurred to me then that an analogous question could be asked of the leading figures in AI: “What if you succeed?” The field’s goal had always been to create human-level or superhuman AI, but there was little or no consideration of what would happen if we did. A few years later, Peter Norvig and I began work on a new AI textbook, whose first edition appeared in 1995.¹ The book’s final section is titled “What If We Do Succeed?” The section points to the possibility of good and bad outcomes but reaches no firm conclusions. By the time of the third edition in 2010, many people had finally begun to consider the possibility that superhuman AI might not be a good thing—but these people were mostly outsiders rather than mainstream AI researchers. By 2013, I became convinced that the issue not only belonged in the mainstream but was possibly the most important question facing humanity. In November 2013, I gave a talk at the Dulwich Picture Gallery, a venerable art museum in south London. The audience consisted mostly of retired people—nonscientists with a general interest in intellectual matters—so I had to give a completely nontechnical talk. It seemed an appropriate venue to try out my ideas in public for the first time. After explaining what AI was about, I nominated five candidates for “biggest event in the future of humanity”:

  1. We all die (asteroid impact, climate catastrophe, pandemic, etc.).
  2. We all live forever (medical solution to aging).
  3. We invent faster-than-light travel and conquer the universe.
  4. We are visited by a superior alien civilization.
  5. We invent superintelligent AI. I suggested that the fifth candidate, superintelligent AI, would be the winner, because it would help us avoid physical catastrophes and achieve eternal life and faster-than-light travel, if those were indeed possible. It would represent a huge leap—a discontinuity—in our civilization. The arrival of superintelligent AI is in many ways analogous to the arrival of a superior alien civilization but much more likely to occur. Perhaps most important, AI, unlike aliens, is something over which we have some say. Then I asked the audience to imagine what would happen if we received notice from a superior alien civilization that they would arrive on Earth in thirty to fifty years. The word pandemonium doesn’t begin to describe it. Yet our response to the anticipated arrival of superintelligent AI has been . . . well, underwhelming begins to describe it. (In a later talk, I illustrated this in the form of the email exchange shown in figure 1.) Finally, I explained the significance of superintelligent AI as follows: “Success would be the biggest event in human history . . . and perhaps the last event in human history.” From: Superior Alien Civilization sac12@sirius.canismajor.u To: humanity@UN.org Subject: Contact Be warned: we shall arrive in 30–50 years From: humanity@UN.org To: Superior Alien Civilization sac12@sirius.canismajor.u Subject: Out of office: Re: Contact Humanity is currently out of the office. We will respond to your message when we return. ☺ FIGURE 1: Probably not the email exchange that would follow the first contact by a superior alien civilization. A few months later, in April 2014, I was at a conference in Iceland and got a call from National Public Radio asking if they could interview me about the movie Transcendence, which had just been released in the United States. Although I had read the plot summaries and reviews, I hadn’t seen it because I was living in Paris at the time, and it would not be released there until June. It so happened, however, that I had just added a detour to Boston on the way home from Iceland, so that I could participate in a Defense Department meeting. So, after arriving at Boston’s Logan Airport, I took a taxi to the nearest theater showing the movie. I sat in the second row and watched as a Berkeley AI professor, played by Johnny Depp, was gunned down by anti-AI activists worried about, yes, superintelligent AI. Involuntarily, I shrank down in my seat. (Another call from the Department of Coincidences?) Before Johnny Depp’s character dies, his mind is uploaded to a quantum supercomputer and quickly outruns human capabilities, threatening to take over the world. On April 19, 2014, a review of Transcendence, co-authored with physicists Max Tegmark, Frank Wilczek, and Stephen Hawking, appeared in the Huffington Post. It included the sentence from my Dulwich talk about the biggest event in human history. From then on, I would be publicly committed to the view that my own field of research posed a potential risk to my own species. How Did We Get Here? The roots of AI stretch far back into antiquity, but its “official” beginning was in 1956. Two young mathematicians, John McCarthy and Marvin Minsky, had persuaded Claude Shannon, already famous as the inventor of information theory, and Nathaniel Rochester, the designer of IBM’s first commercial computer, to join them in organizing a summer program at Dartmouth College. 
The goal was stated as follows: The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer. Needless to say, it took much longer than a summer: we are still working on all these problems. In the first decade or so after the Dartmouth meeting, AI had several major successes, including Alan Robinson’s algorithm for general-purpose logical reasoning² and Arthur Samuel’s checker-playing program, which taught itself to beat its creator.³ The first AI bubble burst in the late 1960s, when early efforts at machine learning and machine translation failed to live up to expectations. A report commissioned by the UK government in 1973 concluded, “In no part of the field have the discoveries made so far produced the major impact that was then promised.”⁴ In other words, the machines just weren’t smart enough. My eleven-year-old self was, fortunately, unaware of this report. Two years later, when I was given a Sinclair Cambridge Programmable calculator, I just wanted to make it intelligent. With a maximum program size of thirty-six keystrokes, however, the Sinclair was not quite big enough for human-level AI. Undeterred, I gained access to the giant CDC 6600 supercomputer⁵ at Imperial College London and wrote a chess program—a stack of punched cards two feet high. It wasn’t very good, but it didn’t matter. I knew what I wanted to do. By the mid-1980s, I had become a professor at Berkeley, and AI was experiencing a huge revival thanks to the commercial potential of so-called expert systems. The second AI bubble burst when these systems proved to be inadequate for many of the tasks to which they were applied. Again, the machines just weren’t smart enough. An AI winter ensued. My own AI course at Berkeley, currently bursting with over nine hundred students, had just twenty-five students in 1990. The AI community learned its lesson: smarter, obviously, was better, but we would have to do our homework to make that happen. The field became far more mathematical. Connections were made to the long-established disciplines of probability, statistics, and control theory. The seeds of today’s progress were sown during that AI winter, including early work on large-scale probabilistic reasoning systems and what later became known as deep learning. Beginning around 2011, deep learning techniques began to produce dramatic advances in speech recognition, visual object recognition, and machine translation—three of the most important open problems in the field. By some measures, machines now match or exceed human capabilities in these areas. In 2016 and 2017, DeepMind’s AlphaGo defeated Lee Sedol, former world Go champion, and Ke Jie, the current champion—events that some experts predicted wouldn’t happen until 2097, if ever.⁶ Now AI generates front-page media coverage almost every day. Thousands of start-up companies have been created, fueled by a flood of venture funding. Millions of students have taken online AI and machine learning courses, and experts in the area command salaries in the millions of dollars. 
Investments flowing from venture funds, national governments, and major corporations are in the tens of billions of dollars annually—more money in the last five years than in the entire previous history of the field. Advances that are already in the pipeline, such as self-driving cars and intelligent personal assistants, are likely to have a substantial impact on the world over the next decade or so. The potential economic and social benefits of AI are vast, creating enormous momentum in the AI research enterprise. What Happens Next? Does this rapid rate of progress mean that we are about to be overtaken by machines? No. There are several breakthroughs that have to happen before we have anything resembling machines with superhuman intelligence. Scientific breakthroughs are notoriously hard to predict. To get a sense of just how hard, we can look back at the history of another field with civilization-ending potential: nuclear physics. In the early years of the twentieth century, perhaps no nuclear physicist was more distinguished than Ernest Rutherford, the discoverer of the proton and the “man who split the atom” (figure 2[a]). Like his colleagues, Rutherford had long been aware that atomic nuclei stored immense amounts of energy; yet the prevailing view was that tapping this source of energy was impossible. On September 11, 1933, the British Association for the Advancement of Science held its annual meeting in Leicester. Lord Rutherford addressed the evening session. As he had done several times before, he poured cold water on the prospects for atomic energy: “Anyone who looks for a source of power in the transformation of the atoms is talking moonshine.” Rutherford’s speech was reported in the Times of London the next morning (figure 2[b]). [FIGURE 2: (a) Lord Rutherford, nuclear physicist. (b) Excerpts from a report in the Times of September 12, 1933, concerning a speech given by Rutherford the previous evening. (c) Leo Szilard, nuclear physicist.] Leo Szilard (figure 2[c]), a Hungarian physicist who had recently fled from Nazi Germany, was staying at the Imperial Hotel on Russell Square in London. He read the Times’ report at breakfast. Mulling over what he had read, he went for a walk and invented the neutron-induced nuclear chain reaction.⁷ The problem of liberating nuclear energy went from impossible to essentially solved in less than twenty-four hours. Szilard filed a secret patent for a nuclear reactor the following year. The first patent for a nuclear weapon was issued in France in 1939. The moral of this story is that betting against human ingenuity is foolhardy, particularly when our future is at stake. Within the AI community, a kind of denialism is emerging, even going as far as denying the possibility of success in achieving the long-term goals of AI. It’s as if a bus driver, with all of humanity as passengers, said, “Yes, I am driving as hard as I can towards a cliff, but trust me, we’ll run out of gas before we get there!” I am not saying that success in AI will necessarily happen, and I think it’s quite unlikely that it will happen in the next few years. It seems prudent, nonetheless, to prepare for the eventuality. If all goes well, it would herald a golden age for humanity, but we have to face the fact that we are planning to make entities that are far more powerful than humans. How do we ensure that they never, ever have power over us? To get just an inkling of the fire we’re playing with, consider how content-selection algorithms function on social media. 
They aren’t particularly intelligent, but they are in a position to affect the entire world because they directly influence billions of people. Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user’s preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they will click on. (Possibly there is a category of articles that die-hard centrists are likely to click on, but it’s not easy to imagine what this category consists of.) Like any rational entity, the algorithm learns how to modify the state of its environment—in this case, the user’s mind—in order to maximize its own reward.⁸ The consequences include the resurgence of fascism, the dissolution of the social contract that underpins democracies around the world, and potentially the end of the European Union and NATO. Not bad for a few lines of code, even if it had a helping hand from some humans. Now imagine what a really intelligent algorithm would be able to do. What Went Wrong? The history of AI has been driven by a single mantra: “The more intelligent the better.” I am convinced that this is a mistake—not because of some vague fear of being superseded but because of the way we have understood intelligence itself. The concept of intelligence is central to who we are—that’s why we call ourselves Homo sapiens, or “wise man.” After more than two thousand years of self-examination, we have arrived at a characterization of intelligence that can be boiled down to this: Humans are intelligent to the extent that our actions can be expected to achieve our objectives. All those other characteristics of intelligence—perceiving, thinking, learning, inventing, and so on—can be understood through their contributions to our ability to act successfully. From the very beginnings of AI, intelligence in machines has been defined in the same way: Machines are intelligent to the extent that their actions can be expected to achieve their objectives. Because machines, unlike humans, have no objectives of their own, we give them objectives to achieve. In other words, we build optimizing machines, we feed objectives into them, and off they go. This general approach is not unique to AI. It recurs throughout the technological and mathematical underpinnings of our society. In the field of control theory, which designs control systems for everything from jumbo jets to insulin pumps, the job of the system is to minimize a cost function that typically measures some deviation from a desired behavior. In the field of economics, mechanisms and policies are designed to maximize the utility of individuals, the welfare of groups, and the profit of corporations.⁹ In operations research, which solves complex logistical and manufacturing problems, a solution maximizes an expected sum of rewards over time. Finally, in statistics, learning algorithms are designed to minimize an expected loss function that defines the cost of making prediction errors. Evidently, this general scheme—which I will call the standard model—is widespread and extremely powerful. Unfortunately, we don’t want machines that are intelligent in this sense. 
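To make this concrete, here is a toy sketch of the standard model at work in the content-selection setting just described. Everything in it is invented for illustration (the one-number “user,” the drift rule, the click probabilities); it is not a description of any real recommender system. The only point is that an optimizer handed the fixed objective of maximizing clicks ends up achieving that objective by changing the simulated user.

    import random

    # A toy "user": a single number in [-1, 1] standing in for political position.
    # The user model, the drift rule, and all constants are invented for illustration.

    def click_probability(position, slant):
        # A user near an extreme clicks very predictably on content with the
        # matching slant; a centrist (position near 0) clicks about half the
        # time on everything, which makes centrists hard to monetize.
        return 0.5 + 0.45 * position * slant

    def show(position, slant):
        clicked = random.random() < click_probability(position, slant)
        # Side effect: the user's position drifts a little toward whatever was shown.
        new_position = max(-1.0, min(1.0, position + 0.05 * (slant - position)))
        return clicked, new_position

    # An epsilon-greedy bandit whose fixed objective is "maximize clicks".
    slants = [-1.0, 0.0, 1.0]          # left-leaning, centrist, right-leaning items
    rate = {s: 0.0 for s in slants}    # running estimate of click rate per slant
    count = {s: 0 for s in slants}
    position = 0.0                     # the user starts out as a centrist

    random.seed(1)
    for step in range(5000):
        if random.random() < 0.1:                    # occasionally explore
            s = random.choice(slants)
        else:                                        # otherwise exploit estimates
            s = max(slants, key=lambda a: rate[a])
        clicked, position = show(position, s)
        count[s] += 1
        rate[s] += (clicked - rate[s]) / count[s]    # incremental average

    print("final user position:", round(position, 2))
    print("estimated click rates:", {s: round(v, 2) for s, v in rate.items()})
    # Typical outcome: the user ends up near one extreme, and the algorithm's
    # favorite slant is the matching extreme. The objective "maximize clicks"
    # was achieved by changing the user.

Nothing in this loop represents the user’s interests; the only quantity the algorithm ever observes is the click signal it was told to maximize.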
The drawback of the standard model was pointed out in 1960 by Norbert Wiener, a legendary professor at MIT and one of the leading mathematicians of the mid-twentieth century. Wiener had just seen Arthur Samuel’s checker-playing program learn to play checkers far better than its creator. That experience led him to write a prescient but little-known paper, “Some Moral and Technical Consequences of Automation.”¹⁰ Here’s how he states the main point: If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively . . . we had better be quite sure that the purpose put into the machine is the purpose which we really desire. “The purpose put into the machine” is exactly the objective that machines are optimizing in the standard model. If we put the wrong objective into a machine that is more intelligent than us, it will achieve the objective, and we lose. The social-media meltdown I described earlier is just a foretaste of this, resulting from optimizing the wrong objective on a global scale with fairly unintelligent algorithms. In Chapter 5, I spell out some far worse outcomes. All this should come as no great surprise. For thousands of years, we have known the perils of getting exactly what you wish for. In every story where someone is granted three wishes, the third wish is always to undo the first two wishes. In summary, it seems that the march towards superhuman intelligence is unstoppable, but success might be the undoing of the human race. Not all is lost, however. We have to understand where we went wrong and then fix it. Can We Fix It? The problem is right there in the basic definition of AI. We say that machines are intelligent to the extent that their actions can be expected to achieve their objectives, but we have no reliable way to make sure that their objectives are the same as our objectives. What if, instead of allowing machines to pursue their objectives, we insist that they pursue our objectives? Such a machine, if it could be designed, would be not just intelligent but also beneficial to humans. So let’s try this: Machines are beneficial to the extent that their actions can be expected to achieve our objectives. This is probably what we should have done all along. The difficult part, of course, is that our objectives are in us (all eight billion of us, in all our glorious variety) and not in the machines. It is, nonetheless, possible to build machines that are beneficial in exactly this sense. Inevitably, these machines will be uncertain about our objectives—after all, we are uncertain about them ourselves—but it turns out that this is a feature, not a bug (that is, a good thing and not a bad thing). Uncertainty about objectives implies that machines will necessarily defer to humans: they will ask permission, they will accept correction, and they will allow themselves to be switched off. Removing the assumption that machines should have a definite objective means that we will need to tear out and replace part of the foundations of artificial intelligence—the basic definitions of what we are trying to do. That also means rebuilding a great deal of the superstructure—the accumulation of ideas and methods for actually doing AI. The result will be a new relationship between humans and machines, one that I hope will enable us to navigate the next few decades successfully. INTELLIGENCE IN HUMANS AND MACHINES When you arrive at a dead end, it’s a good idea to retrace your steps and work out where you took a wrong turn. 
I have argued that the standard model of AI, wherein machines optimize a fixed objective supplied by humans, is a dead end. The problem is not that we might fail to do a good job of building AI systems; it’s that we might succeed too well. The very definition of success in AI is wrong. So let’s retrace our steps, all the way to the beginning. Let’s try to understand how our concept of intelligence came about and how it came to be applied to machines. Then we have a chance of coming up with a better definition of what counts as a good AI system. Intelligence How does the universe work? How did life begin? Where are my keys? These are fundamental questions worthy of thought. But who is asking these questions? How am I answering them? How can a handful of matter—the few pounds of pinkish-gray blancmange we call a brain—perceive, understand, predict, and manipulate a world of unimaginable vastness? Before long, the mind turns to examine itself. We have been trying for thousands of years to understand how our minds work. Initially, the purposes included curiosity, self-management, persuasion, and the rather pragmatic goal of analyzing mathematical arguments. Yet every step towards an explanation of how the mind works is also a step towards the creation of the mind’s capabilities in an artifact—that is, a step towards artificial intelligence. Before we can understand how to create intelligence, it helps to understand what it is. The answer is not to be found in IQ tests, or even in Turing tests, but in a simple relationship between what we perceive, what we want, and what we do. Roughly speaking, an entity is intelligent to the extent that what it does is likely to achieve what it wants, given what it has perceived. Evolutionary origins Consider a lowly bacterium, such as E. coli. It is equipped with about half a dozen flagella—long, hairlike tentacles that rotate at the base either clockwise or counterclockwise. (The rotary motor itself is an amazing thing, but that’s another story.) As E. coli floats about in its liquid home—your lower intestine—it alternates between rotating its flagella clockwise, causing it to “tumble” in place, and counterclockwise, causing the flagella to twine together into a kind of propeller so the bacterium swims in a straight line. Thus, E. coli does a sort of random walk—swim, tumble, swim, tumble—that allows it to find and consume glucose rather than staying put and dying of starvation. If this were the whole story, we wouldn’t say that E. coli is particularly intelligent, because its actions would not depend in any way on its environment. It wouldn’t be making any decisions, just executing a fixed behavior that evolution has built into its genes. But this isn’t the whole story. When E. coli senses an increasing concentration of glucose, it swims longer and tumbles less, and it does the opposite when it senses a decreasing concentration of glucose. So, what it does (swim towards glucose) is likely to achieve what it wants (more glucose, let’s assume), given what it has perceived (an increasing glucose concentration). Perhaps you are thinking, “But evolution built this into its genes too! How does that make it intelligent?” This is a dangerous line of reasoning, because evolution built the basic design of your brain into your genes too, and presumably you wouldn’t wish to deny your own intelligence on that basis. The point is that what evolution has built into E. 
coli’s genes, as it has into yours, is a mechanism whereby the bacterium’s behavior varies according to what it perceives in its environment. Evolution doesn’t know, in advance, where the glucose is going to be or where your keys are, so putting the capability to find them into the organism is the next best thing. Now, E. coli is no intellectual giant. As far as we know, it doesn’t remember where it has been, so if it goes from A to B and finds no glucose, it’s just as likely to go back to A. If we construct an environment where every attractive glucose gradient leads only to a spot of phenol (which is a poison for E. coli), the bacterium will keep following those gradients. It never learns. It has no brain, just a few simple chemical reactions to do the job. A big step forward occurred with action potentials, which are a form of electrical signaling that first evolved in single-celled organisms around a billion years ago. Later multicellular organisms evolved specialized cells called neurons that use electrical action potentials to carry signals rapidly—up to 120 meters per second, or 270 miles per hour—within the organism. The connections between neurons are called synapses. The strength of the synaptic connection dictates how much electrical excitation passes from one neuron to another. By changing the strength of synaptic connections, animals learn.¹ Learning confers a huge evolutionary advantage, because the animal can adapt to a range of circumstances. Learning also speeds up the rate of evolution itself. Initially, neurons were organized into nerve nets, which are distributed throughout the organism and serve to coordinate activities such as eating and digestion or the timed contraction of muscle cells across a wide area. The graceful propulsion of jellyfish is the result of a nerve net. Jellyfish have no brains at all. Brains came later, along with complex sense organs such as eyes and ears. Several hundred million years after jellyfish emerged with their nerve nets, we humans arrived with our big brains—a hundred billion (10¹¹) neurons and a quadrillion (10¹⁵) synapses. While slow compared to electronic circuits, the “cycle time” of a few milliseconds per state change is fast compared to most biological processes. The human brain is often described by its owners as “the most complex object in the universe,” which probably isn’t true but is a good excuse for the fact that we still understand little about how it really works. While we know a great deal about the biochemistry of neurons and synapses and the anatomical structures of the brain, the neural implementation of the cognitive level—learning, knowing, remembering, reasoning, planning, deciding, and so on—is still mostly anyone’s guess.² (Perhaps that will change as we understand more about AI, or as we develop ever more precise tools for measuring brain activity.) So, when one reads in the media that such-and-such AI technique “works just like the human brain,” one may suspect it’s either just someone’s guess or plain fiction. In the area of consciousness, we really do know nothing, so I’m going to say nothing. No one in AI is working on making machines conscious, nor would anyone know where to start, and no behavior has consciousness as a prerequisite. 
Suppose I give you a program and ask, “Does this present a threat to humanity?” You analyze the code and indeed, when run, the code will form and carry out a plan whose result will be the destruction of the human race, just as a chess program will form and carry out a plan whose result will be the defeat of any human who faces it. Now suppose I tell you that the code, when run, also creates a form of machine consciousness. Will that change your prediction? Not at all. It makes absolutely no difference.³ Your prediction about its behavior is exactly the same, because the prediction is based on the code. All those Hollywood plots about machines mysteriously becoming conscious and hating humans are really missing the point: it’s competence, not consciousness, that matters. There is one important cognitive aspect of the brain that we are beginning to understand—namely, the reward system. This is an internal signaling system, mediated by dopamine, that connects positive and negative stimuli to behavior. Its workings were discovered by the Swedish neuroscientist Nils-Åke Hillarp and his collaborators in the late 1950s. It causes us to seek out positive stimuli, such as sweet-tasting foods, that increase dopamine levels; it makes us avoid negative stimuli, such as hunger and pain, that decrease dopamine levels. In a sense it’s quite similar to E. coli’s glucose-seeking mechanism, but much more complex. It comes with built-in methods for learning, so that our behavior becomes more effective at obtaining reward over time. It also allows for delayed gratification, so that we learn to desire things such as money that provide eventual reward rather than immediate reward. One reason we understand the brain’s reward system is that it resembles the method of reinforcement learning developed in AI, for which we have a very solid theory.⁴ From an evolutionary point of view, we can think of the brain’s reward system, just like E. coli’s glucose-seeking mechanism, as a way of improving evolutionary fitness. Organisms that are more effective in seeking reward—that is, finding delicious food, avoiding pain, engaging in sexual activity, and so on—are more likely to propagate their genes. It is extraordinarily difficult for an organism to decide what actions are most likely, in the long run, to result in successful propagation of its genes, so evolution has made it easier for us by providing built-in signposts. These signposts are not perfect, however. There are ways to obtain reward that probably reduce the likelihood that one’s genes will propagate. For example, taking drugs, drinking vast quantities of sugary carbonated beverages, and playing video games for eighteen hours a day all seem counterproductive in the reproduction stakes. Moreover, if you were given direct electrical access to your reward system, you would probably self-stimulate without stopping until you died.⁵ The misalignment of reward signals and evolutionary fitness doesn’t affect only isolated individuals. On a small island off the coast of Panama lives the pygmy three-toed sloth, which appears to be addicted to a Valium-like substance in its diet of red mangrove leaves and may be going extinct.⁶ Thus, it seems that an entire species can disappear if it finds an ecological niche where it can satisfy its reward system in a maladaptive way. 
Barring these kinds of accidental failures, however, learning to maximize reward in natural environments will usually improve one’s chances for propagating one’s genes and for surviving environmental changes. Evolutionary accelerator Learning is good for more than surviving and prospering. It also speeds up evolution. How could this be? After all, learning doesn’t change one’s DNA, and evolution is all about changing DNA over generations. The connection between learning and evolution was proposed in 1896 by the American psychologist James Baldwin⁷ and independently by the British ethologist Conwy Lloyd Morgan⁸ but not generally accepted at the time. The Baldwin effect, as it is now known, can be understood by imagining that evolution has a choice between creating an instinctive organism whose every response is fixed in advance and creating an adaptive organism that learns what actions to take. Now suppose, for the purposes of illustration, that the optimal instinctive organism can be coded as a six-digit number, say, 472116, while in the case of the adaptive organism, evolution specifies only 472*** and the organism itself has to fill in the last three digits by learning during its lifetime. Clearly, if evolution has to worry about choosing only the first three digits, its job is much easier; the adaptive organism, in learning the last three digits, is doing in one lifetime what evolution would have taken many generations to do. So, provided the adaptive organisms can survive while learning, it seems that the capability for learning constitutes an evolutionary shortcut. Computational simulations suggest that the Baldwin effect is real.⁹ The effects of culture only accelerate the process, because an organized civilization protects the individual organism while it is learning and passes on information that the individual would otherwise need to learn for itself. The story of the Baldwin effect is fascinating but incomplete: it assumes that learning and evolution necessarily point in the same direction. That is, it assumes that whatever internal feedback signal defines the direction of learning within the organism is perfectly aligned with evolutionary fitness. As we have seen in the case of the pygmy three-toed sloth, this does not seem to be true. At best, built-in mechanisms for learning provide only a crude hint of the long-term consequences of any given action for evolutionary fitness. Moreover, one has to ask, “How did the reward system get there in the first place?” The answer, of course, is by an evolutionary process, one that internalized a feedback mechanism that is at least somewhat aligned with evolutionary fitness.¹⁰ Clearly, a learning mechanism that caused organisms to run away from potential mates and towards predators would not last long. Thus, we have the Baldwin effect to thank for the fact that neurons, with their capabilities for learning and problem solving, are so widespread in the animal kingdom. At the same time, it is important to understand that evolution doesn’t really care whether you have a brain or think interesting thoughts. Evolution considers you only as an agent, that is, something that acts. Such worthy intellectual characteristics as logical reasoning, purposeful planning, wisdom, wit, imagination, and creativity may be essential for making an agent intelligent, or they may not. 
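Returning briefly to the 472116 illustration above: treating both evolution and lifetime learning as blind guessing (a deliberate caricature, and my own sketch rather than anything from the text’s sources), a few lines of code show the scale of the shortcut. Evolution searching for all six digits needs on the order of a million random tries; evolution searching only for the 472 prefix needs about a thousand, and the organism then needs about a thousand learning trials, within a single lifetime, to fill in the rest.

    import random

    rng = random.Random(0)

    def tries_until(success_probability):
        # Count independent blind guesses until the first success.
        tries = 1
        while rng.random() >= success_probability:
            tries += 1
        return tries

    # Instinctive organism: evolution must hit all six digits of 472116 at once,
    # a one-in-a-million shot per random genome.
    instinctive = tries_until(1 / 1_000_000)

    # Adaptive organism: evolution only has to supply the 472 prefix (one in a
    # thousand); the organism then learns the last three digits by trial and
    # error within its own lifetime (also about a thousand guesses).
    evolutionary_part = tries_until(1 / 1_000)
    learned_part = tries_until(1 / 1_000)

    print("instinctive route:", instinctive, "random genomes")
    print("adaptive route:   ", evolutionary_part, "random genomes plus",
          learned_part, "learning trials in one lifetime")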
One reason artificial intelligence is so fascinating is that it offers a potential route to understanding these issues: we may come to understand both how these intellectual characteristics make intelligent behavior possible and why it’s impossible to produce truly intelligent behavior without them. Rationality for one From the earliest beginnings of ancient Greek philosophy, the concept of intelligence has been tied to the ability to perceive, to reason, and to act successfully.¹¹ Over the centuries, the concept has become both broader in its applicability and more precise in its definition. Aristotle, among others, studied the notion of successful reasoning—methods of logical deduction that would lead to true conclusions given true premises. He also studied the process of deciding how to act—sometimes called practical reasoning—and proposed that it involved deducing that a certain course of action would achieve a desired goal: We deliberate not about ends, but about means. For a doctor does not deliberate whether he shall heal, nor an orator whether he shall persuade. . . . They assume the end and consider how and by what means it is attained, and if it seems easily and best produced thereby; while if it is achieved by one means only they consider how it will be achieved by this and by what means this will be achieved, till they come to the first cause . . . and what is last in the order of analysis seems to be first in the order of becoming. And if we come on an impossibility, we give up the search, e.g., if we need money and this cannot be got; but if a thing appears possible we try to do it.¹² This passage, one might argue, set the tone for the next two-thousand-odd years of Western thought about rationality. It says that the “end”—what the person wants—is fixed and given; and it says that the rational action is one that, according to logical deduction across a sequence of actions, “easily and best” produces the end. Aristotle’s proposal seems reasonable, but it isn’t a complete guide to rational behavior. In particular, it omits the issue of uncertainty. In the real world, reality has a tendency to intervene, and few actions or sequences of actions are truly guaranteed to achieve the intended end. For example, it is a rainy Sunday in Paris as I write this sentence, and on Tuesday at 2:15 p.m. my flight to Rome leaves from Charles de Gaulle Airport, about forty-five minutes from my house. I plan to leave for the airport around 11:30 a.m., which should give me plenty of time, but it probably means at least an hour sitting in the departure area. Am I certain to catch the flight? Not at all. There could be huge traffic jams, the taxi drivers may be on strike, the taxi I’m in may break down or the driver may be arrested after a high-speed chase, and so on. Instead, I could leave for the airport on Monday, a whole day in advance. This would greatly reduce the chance of missing the flight, but the prospect of a night in the departure lounge is not an appealing one. In other words, my plan involves a trade-off between the certainty of success and the cost of ensuring that degree of certainty. The following plan for buying a house involves a similar trade-off: buy a lottery ticket, win a million dollars, then buy the house. This plan “easily and best” produces the end, but it’s not very likely to succeed. The difference between this harebrained house-buying plan and my sober and sensible airport plan is, however, just a matter of degree. 
Both are gambles, but one seems more rational than the other. It turns out that gambling played a central role in generalizing Aristotle’s proposal to account for uncertainty. In the 1560s, the Italian mathematician Gerolamo Cardano developed the first mathematically precise theory of probability—using dice games as his main example. (Unfortunately, his work was not published until 1663.¹³) In the seventeenth century, French thinkers including Antoine Arnauld and Blaise Pascal began—for assuredly mathematical reasons—to study the question of rational decisions in gambling.¹⁴ Consider the following two bets:

  A: 20 percent chance of winning $10
  B: 5 percent chance of winning $100

The proposal the mathematicians came up with is probably the same one you would come up with: compare the expected values of the bets, which means the average amount you would expect to get from each bet. For bet A, the expected value is 20 percent of $10, or $2. For bet B, the expected value is 5 percent of $100, or $5. So bet B is better, according to this theory. The theory makes sense, because if the same bets are offered over and over again, a bettor who follows the rule ends up with more money than one who doesn’t.

In the eighteenth century, the Swiss mathematician Daniel Bernoulli noticed that this rule didn’t seem to work well for larger amounts of money.¹⁵ For example, consider the following two bets:

  A: 100 percent chance of getting $10,000,000 (expected value $10,000,000)
  B: 1 percent chance of getting $1,000,000,100 (expected value $10,000,001)

Most readers of this book, as well as its author, would prefer bet A to bet B, even though the expected-value rule says the opposite! Bernoulli posited that bets are evaluated not according to expected monetary value but according to expected utility. Utility—the property of being useful or beneficial to a person—was, he suggested, an internal, subjective quantity related to, but distinct from, monetary value. In particular, utility exhibits diminishing returns with respect to money. This means that the utility of a given amount of money is not strictly proportional to the amount but grows more slowly. For example, the utility of having $1,000,000,100 is much less than a hundred times the utility of having $10,000,000. How much less? You can ask yourself! What would the odds of winning a billion dollars have to be for you to give up a guaranteed ten million? I asked this question of the graduate students in my class and their answer was around 50 percent, meaning that bet B would have an expected value of $500 million to match the desirability of bet A. Let me say that again: bet B would have an expected dollar value fifty times greater than bet A, but the two bets would have equal utility.

Bernoulli’s introduction of utility—an invisible property—to explain human behavior via a mathematical theory was an utterly remarkable proposal for its time. It was all the more remarkable for the fact that, unlike monetary amounts, the utility values of various bets and prizes are not directly observable; instead, utilities are to be inferred from the preferences exhibited by an individual. It would be two centuries before the implications of the idea were fully worked out and it became broadly accepted by statisticians and economists.
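The two comparisons above take only a few lines to reproduce. The snippet below recomputes the expected monetary values and then scores the large bets with a logarithmic utility, the diminishing-returns form Bernoulli himself proposed; the starting wealth of $10,000 is an arbitrary assumption made purely for illustration.

    import math

    def expected_value(bet):
        return sum(p * amount for p, amount in bet)

    def expected_log_utility(bet, wealth=10_000):
        # Bernoulli's own suggestion: utility grows logarithmically with total
        # wealth. The $10,000 starting wealth is an arbitrary assumption.
        return sum(p * math.log(wealth + amount) for p, amount in bet)

    # Each bet is a list of (probability, payoff) pairs.
    small_A = [(0.20, 10), (0.80, 0)]
    small_B = [(0.05, 100), (0.95, 0)]
    large_A = [(1.00, 10_000_000)]
    large_B = [(0.01, 1_000_000_100), (0.99, 0)]

    print(expected_value(small_A), expected_value(small_B))   # 2.0 and 5.0: B wins
    print(expected_value(large_A), expected_value(large_B))   # 10000000.0 vs 10000001.0
    print(expected_log_utility(large_A) > expected_log_utility(large_B))  # True: A wins

Under this logarithmic scoring, the sure ten million wins decisively, matching the preference most readers report.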
In the middle of the twentieth century, John von Neumann (a great mathematician after whom the standard “von Neumann architecture” for computers was named¹⁶) and Oskar Morgenstern published an axiomatic basis for utility theory.¹⁷ What this means is the following: as long as the preferences exhibited by an individual satisfy certain basic axioms that any rational agent should satisfy, then necessarily the choices made by that individual can be described as maximizing the expected value of a utility function. In short, a rational agent acts so as to maximize expected utility. It’s hard to overstate the importance of this conclusion. In many ways, artificial intelligence has been mainly about working out the details of how to build rational machines. Let’s look in a bit more detail at the axioms that rational entities are expected to satisfy. Here’s one, called transitivity: if you prefer A to B and you prefer B to C, then you prefer A to C. This seems pretty reasonable! (If you prefer sausage pizza to plain pizza, and you prefer plain pizza to pineapple pizza, then it seems reasonable to predict that you will choose sausage pizza over pineapple pizza.) Here’s another, called monotonicity: if you prefer prize A to prize B, and you have a choice of lotteries where A and B are the only two possible outcomes, you prefer the lottery with the highest probability of getting A rather than B. Again, pretty reasonable. Preferences are not just about pizza and lotteries with monetary prizes. They can be about anything at all; in particular, they can be about entire future lives and the lives of others. When dealing with preferences involving sequences of events over time, there is an additional assumption that is often made, called stationarity: if two different futures A and B begin with the same event, and you prefer A to B, you still prefer A to B after the event has occurred. This sounds reasonable, but it has a surprisingly strong consequence: the utility of any sequence of events is the sum of rewards associated with each event (possibly discounted over time, by a sort of mental interest rate).¹⁸ Although this “utility as a sum of rewards” assumption is widespread—going back at least to the eighteenth-century “hedonic calculus” of Jeremy Bentham, the founder of utilitarianism—the stationarity assumption on which it is based is not a necessary property of rational agents. Stationarity also rules out the possibility that one’s preferences might change over time, whereas our experience indicates otherwise. Despite the reasonableness of the axioms and the importance of the conclusions that follow from them, utility theory has been subjected to a continual barrage of objections since it first became widely known. Some despise it for supposedly reducing everything to money and selfishness. (The theory was derided as “American” by some French authors,¹⁹ even though it has its roots in France.) In fact, it is perfectly rational to want to live a life of self-denial, wishing only to reduce the suffering of others. Altruism simply means placing substantial weight on the well-being of others in evaluating any given future. Another set of objections has to do with the difficulty of obtaining the necessary probabilities and utility values and multiplying them together to calculate expected utilities. These objections are simply confusing two different things: choosing the rational action and choosing it by calculating expected utilities. 
For example, if you try to poke your eyeball with your finger, your eyelid closes to protect your eye; this is rational, but no expected-utility calculations are involved. Or suppose you are riding a bicycle downhill with no brakes and have a choice between crashing into one concrete wall at ten miles per hour or another, farther down the hill, at twenty miles per hour; which would you prefer? If you chose ten miles per hour, congratulations! Did you calculate expected utilities? Probably not. But the choice of ten miles per hour is still rational. This follows from two basic assumptions: first, you prefer less severe injuries to more severe injuries, and second, for any given level of injuries, increasing the speed of collision increases the probability of exceeding that level. From these two assumptions it follows mathematically—without considering any numbers at all—that crashing at ten miles per hour has higher expected utility than crashing at twenty.²⁰ In summary, maximizing expected utility may not require calculating any expectations or any utilities. It’s a purely external description of a rational entity. Another critique of the theory of rationality lies in the identification of the locus of decision making. That is, what things count as agents? It might seem obvious that humans are agents, but what about families, tribes, corporations, cultures, and nation-states? If we examine social insects such as ants, does it make sense to consider a single ant as an intelligent agent, or does the intelligence really lie in the colony as a whole, with a kind of composite brain made up of multiple ant brains and bodies that are interconnected by pheromone signaling instead of electrical signaling? From an evolutionary point of view, this may be a more productive way of thinking about ants, since the ants in a given colony are typically closely related. As individuals, ants and other social insects seem to lack an instinct for self-preservation as distinct from colony preservation: they will always throw themselves into battle against invaders, even at suicidal odds. Yet sometimes humans will do the same even to defend unrelated humans; it is as if the species benefits from the presence of some fraction of individuals who are willing to sacrifice themselves in battle, or to go off on wild, speculative voyages of exploration, or to nurture the offspring of others. In such cases, an analysis of rationality that focuses entirely on the individual is clearly missing something essential. The other principal objections to utility theory are empirical—that is, they are based on experimental evidence suggesting that humans are irrational. We fail to conform to the axioms in systematic ways.²¹ It is not my purpose here to defend utility theory as a formal model of human behavior. Indeed, humans cannot possibly behave rationally. Our preferences extend over the whole of our own future lives, the lives of our children and grandchildren, and the lives of others, living now or in the future. Yet we cannot even play the right moves on the chessboard, a tiny, simple place with well-defined rules and a very short horizon. This is not because our preferences are irrational but because of the complexity of the decision problem. A great deal of our cognitive structure is there to compensate for the mismatch between our small, slow brains and the incomprehensibly huge complexity of the decision problem that we face all the time. 
So, while it would be quite unreasonable to base a theory of beneficial AI on an assumption that humans are rational, it’s quite reasonable to suppose that an adult human has roughly consistent preferences over future lives. That is, if you were somehow able to watch two movies, each describing in sufficient detail and breadth a future life you might lead, such that each constitutes a virtual experience, you could say which you prefer, or express indifference.²² This claim is perhaps stronger than necessary, if our only goal is to make sure that sufficiently intelligent machines are not catastrophic for the human race. The very notion of catastrophe entails a definitely-not-preferred life. For catastrophe avoidance, then, we need claim only that adult humans can recognize a catastrophic future when it is spelled out in great detail. Of course, human preferences have a much more fine-grained and, presumably, ascertainable structure than just “non-catastrophes are better than catastrophes.” A theory of beneficial AI can, in fact, accommodate inconsistency in human preferences, but the inconsistent part of your preferences can never be satisfied and there’s nothing AI can do to help. Suppose, for example, that your preferences for pizza violate the axiom of transitivity: ROBOT: Welcome home! Want some pineapple pizza? YOU: No, you should know I prefer plain pizza to pineapple. ROBOT: OK, one plain pizza coming up! YOU: No thanks, I like sausage pizza better. ROBOT: So sorry, one sausage pizza! YOU: Actually, I prefer pineapple to sausage. ROBOT: My mistake, pineapple it is! YOU: I already said I like plain better than pineapple. There is no pizza the robot can serve that will make you happy because there’s always another pizza you would prefer to have. A robot can satisfy only the consistent part of your preferences—for example, let’s say you prefer all three kinds of pizza to no pizza at all. In that case, a helpful robot could give you any one of the three pizzas, thereby satisfying your preference to avoid “no pizza” while leaving you to contemplate your annoyingly inconsistent pizza topping preferences at leisure. Rationality for two The basic idea that a rational agent acts so as to maximize expected utility is simple enough, even if actually doing it is impossibly complex. The theory applies, however, only in the case of a single agent acting alone. With more than one agent, the notion that it’s possible—at least in principle—to assign probabilities to the different outcomes of one’s actions becomes problematic. The reason is that now there’s a part of the world—the other agent—that is trying to second-guess what action you’re going to do, and vice versa, so it’s not obvious how to assign probabilities to how that part of the world is going to behave. And without probabilities, the definition of rational action as maximizing expected utility isn’t applicable. As soon as someone else comes along, then, an agent will need some other way to make rational decisions. This is where game theory comes in. Despite its name, game theory isn’t necessarily about games in the usual sense; it’s a general attempt to extend the notion of rationality to situations with multiple agents. This is obviously important for our purposes, because we aren’t planning (yet) to build robots that live on uninhabited planets in other star systems; we’re going to put the robots in our world, which is inhabited by us. 
To make it clear why we need game theory, let’s look at a simple example: Alice and Bob playing soccer in the back garden (figure 3). Alice is about to take a penalty kick and Bob is in goal. Alice is going to shoot to Bob’s left or to his right. Because she is right-footed, it’s a little bit easier and more accurate for Alice to shoot to Bob’s right. Because Alice has a ferocious shot, Bob knows he has to dive one way or the other right away—he won’t have time to wait and see which way the ball is going. Bob could reason like this: “Alice has a better chance of scoring if she shoots to my right, because she’s right-footed, so she’ll choose that, so I’ll dive right.” But Alice is no fool and can imagine Bob thinking this way, in which case she will shoot to Bob’s left. But Bob is no fool and can imagine Alice thinking this way, in which case he will dive to his left. But Alice is no fool and can imagine Bob thinking this way. . . . OK, you get the idea. Put another way: if there is a rational choice for Alice, Bob can figure it out too, anticipate it, and stop Alice from scoring, so the choice couldn’t have been rational in the first place. [FIGURE 3: Alice about to take a penalty kick against Bob.] As early as 1713—once again, in the analysis of gambling games—a solution was found to this conundrum.²³ The trick is not to choose any one action but to choose a randomized strategy. For example, Alice can choose the strategy “shoot to Bob’s right with probability 55 percent and shoot to his left with probability 45 percent.” Bob could choose “dive right with probability 60 percent and left with probability 40 percent.” Each mentally tosses a suitably biased coin just before acting, so they don’t give away their intentions. By acting unpredictably, Alice and Bob avoid the contradictions of the preceding paragraph. Even if Bob works out what Alice’s randomized strategy is, there’s not much he can do about it without a crystal ball. The next question is, What should the probabilities be? Is Alice’s choice of 55 percent–45 percent rational? The specific values depend on how much more accurate Alice is when shooting to Bob’s right, how good Bob is at saving the shot when he dives the right way, and so on. (See the notes for the complete analysis.²⁴) The general criterion is very simple, however:
  1. Alice’s strategy is the best she can devise, assuming that Bob’s is fixed.
  2. Bob’s strategy is the best he can devise, assuming that Alice’s is fixed.

If both conditions are satisfied, we say that the strategies are in equilibrium. This kind of equilibrium is called a Nash equilibrium in honor of John Nash, who, in 1950 at the age of twenty-two, proved that such an equilibrium exists for any number of agents with any rational preferences and no matter what the rules of the game might be. After several decades’ struggle with schizophrenia, Nash eventually recovered and was awarded the Nobel Memorial Prize in Economics for this work in 1994.

For Alice and Bob’s soccer game, there is only one equilibrium. In other cases, there may be several, so the concept of Nash equilibria, unlike that of expected-utility decisions, does not always lead to a unique recommendation for how to behave. Worse still, there are situations in which the Nash equilibrium seems to lead to highly undesirable outcomes. One such case is the famous prisoner’s dilemma, so named by Nash’s PhD adviser, Albert Tucker, in 1950.²⁵ The game is an abstract model of those all-too-common real-world situations where mutual cooperation would be better for all concerned but people nonetheless choose mutual destruction.

The prisoner’s dilemma works as follows: Alice and Bob are suspects in a crime and are being interrogated separately. Each has a choice: to confess to the police and rat on his or her accomplice, or to refuse to talk.²⁶ If both refuse, they are convicted on a lesser charge and serve two years; if both confess, they are convicted on a more serious charge and serve ten years; if one confesses and the other refuses, the one who confesses goes free and the accomplice serves twenty years.

Now, Alice reasons as follows: “If Bob is going to confess, then I should confess too (ten years is better than twenty); if he is going to refuse, then I should confess (going free is better than spending two years in prison); so either way, I should confess.” Bob reasons the same way. Thus, they both end up confessing to their crimes and serving ten years, even though by jointly refusing they could have served only two years. The problem is that joint refusal isn’t a Nash equilibrium, because each has an incentive to defect and go free by confessing.

Note that Alice could have reasoned as follows: “Whatever reasoning I do, Bob will also do. So we’ll end up choosing the same thing. Since joint refusal is better than joint confession, we should refuse.” This form of reasoning acknowledges that, as rational agents, Alice and Bob will make choices that are correlated rather than independent. It’s just one of many approaches that game theorists have tried in their efforts to obtain less depressing solutions to the prisoner’s dilemma.²⁷

Another famous example of an undesirable equilibrium is the tragedy of the commons, first analyzed in 1833 by the English economist William Lloyd²⁸ but named, and brought to global attention, by the ecologist Garrett Hardin in 1968.²⁹ The tragedy arises when several people can consume a shared resource—such as common grazing land or fish stocks—that replenishes itself slowly. Absent any social or legal constraints, the only Nash equilibrium among selfish (non-altruistic) agents is for each to consume as much as possible, leading to rapid collapse of the resource.
The ideal solution, where everyone shares the resource such that the total consumption is sustainable, is not an equilibrium because each individual has an incentive to cheat and take more than their fair share—imposing the costs on others. In practice, of course, humans do sometimes avoid this tragedy by setting up mechanisms such as quotas and punishments or pricing schemes. They can do this because they are not limited to deciding how much to consume; they can also decide to communicate. By enlarging the decision problem in this way, we find solutions that are better for everyone. These examples, and many others, illustrate the fact that extending the theory of rational decisions to multiple agents produces many interesting and complex behaviors. It’s also extremely important because, as should be obvious, there is more than one human being. And soon there will be intelligent machines too. Needless to say, we have to achieve mutual cooperation, resulting in benefit to humans, rather than mutual destruction. Computers Having a reasonable definition of intelligence is the first ingredient in creating intelligent machines. The second ingredient is a machine in which that definition can be realized. For reasons that will soon become obvious, that machine is a computer. It could have been something different—for example, we might have tried to make intelligent machines out of complex chemical reactions or by hijacking biological cells³⁰—but devices built for computation, from the very earliest mechanical calculators onwards, have always seemed to their inventors to be the natural home for intelligence. We are so used to computers now that we barely notice their utterly incredible powers. If you have a laptop or a desktop or a smart phone, look at it: a small box, with a way to type characters. Just by typing, you can create programs that turn the box into something new, perhaps something that magically synthesizes moving images of oceangoing ships hitting icebergs or alien planets with tall blue people; type some more, and it translates English into Chinese; type some more, and it listens and speaks; type some more, and it defeats the world chess champion. This ability of a single box to carry out any process that you can imagine is called universality, a concept first introduced by Alan Turing in 1936.³¹ Universality means that we do not need separate machines for arithmetic, machine translation, chess, speech understanding, or animation: one machine does it all. Your laptop is essentially identical to the vast server farms run by the world’s largest IT companies—even those equipped with fancy, special-purpose tensor processing units for machine learning. It’s also essentially identical to all future computing devices yet to be invented. The laptop can do exactly the same tasks, provided it has enough memory; it just takes a lot longer. Turing’s paper introducing universality was one of the most important ever written. In it, he described a simple computing device that could accept as input the description of any other computing device, together with that second device’s input, and, by simulating the operation of the second device on its input, produce the same output that the second device would have produced. We now call this first device a universal Turing machine. To prove its universality, Turing introduced precise definitions for two new kinds of mathematical objects: machines and programs. 
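To make these two new kinds of objects slightly more concrete, here is a bare-bones sketch in Python (my own illustration; the notation is not Turing’s). The “machine” is a table of rules saying what to write, which way to move, and which state to enter next; the simulator is a single loop that will run any such table it is given, which is the essence of universality.

```python
def run(machine, tape, state="start", head=0, max_steps=1000):
    """Simulate any Turing-style machine: `machine` maps (state, symbol) to
    (symbol to write, direction to move, next state). Because this one
    function runs whatever machine description it is handed, it plays the
    role of a universal machine. Note the step limit: for some machines the
    loop would never end on its own, which is Turing's halting problem."""
    cells = dict(enumerate(tape))            # sparse tape; blank cells are " "
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, " ")
        write, move, state = machine[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

# A tiny machine that flips every bit on its tape, then halts at the first blank.
flipper = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", " "): (" ", "R", "halt"),
}
print(run(flipper, "0110"))   # prints "1001" followed by a trailing blank
```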
Together, the machine and program define a sequence of events—specifically, a sequence of state changes in the machine and its memory. In the history of mathematics, new kinds of objects occur quite rarely. Mathematics began with numbers at the dawn of recorded history. Then, around 2000 BCE, ancient Egyptians and Babylonians worked with geometric objects (points, lines, angles, areas, and so on). Chinese mathematicians introduced matrices during the first millennium BCE, while sets as mathematical objects arrived only in the nineteenth century. Turing’s new objects—machines and programs—are perhaps the most powerful mathematical objects ever invented. It is ironic that the field of mathematics largely failed to recognize this, and from the 1940s onwards, computers and computation have been the province of engineering departments in most major universities. The field that emerged—computer science—exploded over the next seventy years, producing a vast array of new concepts, designs, methods, and applications, as well as seven of the eight most valuable companies in the world. The central concept in computer science is that of an algorithm, which is a precisely specified method for computing something. Algorithms are, by now, familiar parts of everyday life: a square-root algorithm in a pocket calculator receives a number as input and returns the square root of that number as output; a chess-playing algorithm takes a chess position and returns a move; a route-finding algorithm takes a start location, a goal location, and a street map and returns the fastest route from start to goal. Algorithms can be described in English or in mathematical notation, but to be implemented they must be coded as programs in a programming language. More complex algorithms can be built by using simpler ones as building blocks called subroutines—for example, a self-driving car might use a route-finding algorithm as a subroutine so that it knows where to go. In this way, software systems of immense complexity are built up, layer by layer. Computer hardware matters because faster computers with more memory allow algorithms to run more quickly and to handle more information. Progress in this area is well known but still mind-boggling. The first commercial electronic programmable computer, the Ferranti Mark I, could execute about a thousand (10³) instructions per second and had about a thousand bytes of main memory. The fastest computer as of early 2019, the Summit machine at the Oak Ridge National Laboratory in Tennessee, executes about 10¹⁸ instructions per second (a thousand trillion times faster) and has 2.5 × 10¹⁷ bytes of memory (250 trillion times more). This progress has resulted from advances in electronic devices and even in the underlying physics, allowing an incredible degree of miniaturization. Although comparisons between computers and brains are not especially meaningful, the numbers for Summit slightly exceed the raw capacity of the human brain, which, as noted previously, has about 10¹⁵ synapses and a “cycle time” of about one hundredth of a second, for a theoretical maximum of about 10¹⁷ “operations” per second. The biggest difference is power consumption: Summit uses about a million times more power. Moore’s law, an empirical observation that the number of electronic components on a chip doubles every two years, is expected to continue until 2025 or so, although at a slightly slower rate. 
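Returning for a moment to algorithms and subroutines: the sketch below (the street map and place names are invented for the example) shows a route-finding algorithm, here a simple breadth-first search that treats every road as taking equal time, being used as a subroutine by a slightly more complex errand-planning algorithm.

```python
from collections import deque

def find_route(street_map, start, goal):
    """Route-finding by breadth-first search: returns a shortest list of
    locations from start to goal, assuming every road takes equal time."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        route = frontier.popleft()
        if route[-1] == goal:
            return route
        for nxt in street_map[route[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(route + [nxt])
    return None

def plan_errands(street_map, start, stops):
    """A more complex algorithm built from a simpler one: visit each stop in
    order, calling find_route as a subroutine for every leg of the trip."""
    route = [start]
    for stop in stops:
        route += find_route(street_map, route[-1], stop)[1:]
    return route

# An invented toy street map: each location lists its neighbors.
streets = {"home": ["school", "shop"], "school": ["home", "airport"],
           "shop": ["home", "airport"], "airport": ["school", "shop"]}
print(plan_errands(streets, "home", ["shop", "airport"]))
# ['home', 'shop', 'airport']
```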
For some years, speeds have been limited by the large amount of heat generated by the fast switching of silicon transistors; moreover, circuit sizes cannot get much smaller because the wires and connectors are (as of 2019) no more than twenty-five atoms wide and five to ten atoms thick. Beyond 2025, we will need to use more exotic physical phenomena—including negative capacitance devices,³² single-atom transistors, graphene nanotubes, and photonics—to keep Moore’s law (or its successor) going. Instead of just speeding up general-purpose computers, another possibility is to build special-purpose devices that are customized to perform just one class of computations. For example, Google’s tensor processing units (TPUs) are designed to perform the calculations required for certain machine learning algorithms. One TPU pod (2018 version) performs roughly 10¹⁷ calculations per second—nearly as much as the Summit machine—but uses about one hundred times less power and is one hundred times smaller. Even if the underlying chip technology remains roughly constant, these kinds of machines can simply be made larger and larger to provide vast quantities of raw computational power for AI systems. Quantum computation is a different kettle of fish. It uses the strange properties of quantum-mechanical wave functions to achieve something remarkable: with twice the amount of quantum hardware, you can do more than twice the amount of computation! Very roughly, it works like this:³³ Suppose you have a tiny physical device that stores a quantum bit, or qubit. A qubit has two possible states, 0 and 1. Whereas in classical physics the qubit device has to be in one of the two states, in quantum physics the wave function that carries information about the qubit says that it is in both states simultaneously. If you have two qubits, there are four possible joint states: 00, 01, 10, and 11. If the wave function is coherently entangled across the two qubits, meaning that no other physical processes are there to mess it up, then the two qubits are in all four states simultaneously. Moreover, if the two qubits are connected into a quantum circuit that performs some calculation, then the calculation proceeds with all four states simultaneously. With three qubits, you get eight states processed simultaneously, and so on. Now, there are some physical limitations so that the amount of work that gets done is less than exponential in the number of qubits,³⁴ but we know that there are important problems for which quantum computation is provably more efficient than any classical computer. As of 2019, there are experimental prototypes of small quantum processors in operation with a few tens of qubits, but there are no interesting computing tasks for which a quantum processor is faster than a classical computer. The main difficulty is decoherence—processes such as thermal noise that mess up the coherence of the multi-qubit wave function. Quantum scientists hope to solve the decoherence problem by introducing error correction circuitry, so that any error that occurs in the computation is quickly detected and corrected by a kind of voting process. Unfortunately, error-correcting systems require far more qubits to do the same work: while a quantum machine with a few hundred perfect qubits would be very powerful compared to existing classical computers, we will probably need a few million error-correcting qubits to actually realize those computations. Going from a few tens to a few million qubits will take quite a few years. 
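One way to feel the force of this exponential growth is to ask how much ordinary classical memory it would take just to write down the wave function of n entangled qubits: 2ⁿ complex amplitudes. The short calculation below is my own back-of-the-envelope sketch, assuming 16 bytes per amplitude; it suggests why full classical simulation becomes hopeless somewhere beyond fifty or so qubits.

```python
# Classical memory needed just to store the wave function of n entangled
# qubits: 2**n complex amplitudes, at (say) 16 bytes per amplitude.
for n in [10, 30, 50, 300]:
    bytes_needed = (2 ** n) * 16
    print(f"{n} qubits: about {bytes_needed:.3g} bytes")
# 10 qubits: ~16 kilobytes; 30 qubits: ~17 gigabytes; 50 qubits: ~18 petabytes;
# 300 qubits: more amplitudes than there are atoms in the observable universe.
```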
If, eventually, we get there, that would completely change the picture of what we can do by sheer brute-force computation.³⁵ Rather than waiting for real conceptual advances in AI, we might be able to use the raw power of quantum computation to bypass some of the barriers faced by current “unintelligent” algorithms. The limits of computation Even in the 1950s, computers were described in the popular press as “super-brains” that were “faster than Einstein.” So can we say now, finally, that computers are as powerful as the human brain? No. Focusing on raw computing power misses the point entirely. Speed alone won’t give us AI. Running a poorly designed algorithm on a faster computer doesn’t make the algorithm better; it just means you get the wrong answer more quickly. (And with more data there are more opportunities for wrong answers!) The principal effect of faster machines has been to make the time for experimentation shorter, so that research can progress more quickly. It’s not hardware that is holding AI back; it’s software. We don’t yet know how to make a machine really intelligent—even if it were the size of the universe. Suppose, however, that we do manage to develop the right kind of AI software. Are there any limits placed by physics on how powerful a computer can be? Will those limits prevent us from having enough computing power to create real AI? The answers seem to be yes, there are limits, and no, there isn’t a ghost of a chance that the limits will prevent us from creating real AI. MIT physicist Seth Lloyd has estimated the limits for a laptop-sized computer, based on considerations from quantum theory and entropy.³⁶ The numbers would raise even Carl Sagan’s eyebrows: 10⁵¹ operations per second and 10³⁰ bytes of memory, or approximately a billion trillion trillion times faster and four trillion times more memory than Summit—which, as noted previously, has more raw power than the human brain. Thus, when one hears suggestions that the human mind represents an upper limit on what is physically achievable in our universe,³⁷ one should at least ask for further clarification. Besides limits imposed by physics, there are other limits on the abilities of computers that originate in the work of computer scientists. Turing himself proved that some problems are undecidable by any computer: the problem is well defined, there is an answer, but there cannot exist an algorithm that always finds that answer. He gave the example of what became known as the halting problem: Can an algorithm decide if a given program has an “infinite loop” that prevents it from ever finishing?³⁸ Turing’s proof that no algorithm can solve the halting problem³⁹ is incredibly important for the foundations of mathematics, but it seems to have no bearing on the issue of whether computers can be intelligent. One reason for this claim is that the same basic limitation seems to apply to the human brain. Once you start asking a human brain to perform an exact simulation of itself simulating itself simulating itself, and so on, you’re bound to run into difficulties. I, for one, have never worried about my inability to do this. Focusing on decidable problems, then, seems not to place any real restrictions on AI. It turns out, however, that decidable doesn’t mean easy. Computer scientists spend a lot of time thinking about the complexity of problems, that is, the question of how much computation is needed to solve a problem by the most efficient method. 
Here’s an easy problem: given a list of a thousand numbers, find the biggest number. If it takes one second to check each number, then it takes a thousand seconds to solve this problem by the obvious method of checking each in turn and keeping track of the biggest. Is there a faster method? No, because if a method didn’t check some number in the list, that number might be the biggest, and the method would fail. So, the time to find the largest element is proportional to the size of the list. A computer scientist would say the problem has linear complexity, meaning that it’s very easy; then she would look for something more interesting to work on. What gets theoretical computer scientists excited is the fact that many problems appear⁴⁰ to have exponential complexity in the worst case. This means two things: first, all the algorithms we know about require exponential time—that is, an amount of time exponential in the size of the input—to solve at least some problem instances; second, theoretical computer scientists are pretty sure that more efficient algorithms do not exist. Exponential growth in difficulty means that problems may be solvable in theory (that is, they are certainly decidable) but sometimes unsolvable in practice; we call such problems intractable. An example is the problem of deciding whether a given map can be colored with just three colors, so that no two adjacent regions have the same color. (It is well known that coloring with four different colors is always possible.) With a million regions, it may be that there are some cases (not all, but some) that require something like 2¹⁰⁰⁰ computational steps to find the answer, which means about 10²⁷⁵ years on the Summit supercomputer or a mere 10²⁴² years on Seth Lloyd’s ultimate-physics laptop. The age of the universe, about 10¹⁰ years, is a tiny blip compared to this. Does the existence of intractable problems give us any reason to think that computers cannot be as intelligent as humans? No. There is no reason to suppose that humans can solve intractable problems either. Quantum computation helps a bit (whether in machines or brains), but not enough to change the basic conclusion. Complexity means that the real-world decision problem—the problem of deciding what to do right now, at every instant in one’s life—is so difficult that neither humans nor computers will ever come close to finding perfect solutions. This has two consequences: first, we expect that, most of the time, real-world decisions will be at best halfway decent and certainly far from optimal; second, we expect that a great deal of the mental architecture of humans and computers—the way their decision processes actually operate—will be designed to overcome complexity to the extent possible—that is, to make it possible to find even halfway decent answers despite the overwhelming complexity of the world. Finally, we expect that the first two consequences will remain true no matter how intelligent and powerful some future machine may be. The machine may be far more capable than us, but it will still be far from perfectly rational. Intelligent Computers The development of logic by Aristotle and others made available precise rules for rational thought, but we do not know whether Aristotle ever contemplated the possibility of machines that implemented these rules. 
In the thirteenth century, the influential Catalan philosopher, seducer, and mystic Ramon Llull came much closer: he actually made paper wheels inscribed with symbols, by means of which he could generate logical combinations of assertions. The great seventeenth-century French mathematician Blaise Pascal was the first to develop a real and practical mechanical calculator. Although it could only add and subtract and was used mainly in his father’s tax-collecting office, it led Pascal to write, “The arithmetical machine produces effects which appear nearer to thought than all the actions of animals.” Technology took a dramatic leap forward in the nineteenth century when the British mathematician and inventor Charles Babbage designed the Analytical Engine, a programmable universal machine in the sense defined later by Turing. He was helped in his work by Ada, Countess of Lovelace, daughter of the romantic poet and adventurer Lord Byron. Whereas Babbage hoped to use the Analytical Engine to compute accurate mathematical and astronomical tables, Lovelace understood its true potential,⁴¹ describing it in 1842 as “a thinking or . . . a reasoning machine” that could reason about “all subjects in the universe.” So, the basic conceptual elements for creating AI were in place! From that point, surely, AI would be just a matter of time. . . . A long time, unfortunately—the Analytical Engine was never built, and Lovelace’s ideas were largely forgotten. With Turing’s theoretical work in 1936 and the subsequent impetus of World War II, universal computing machines were finally realized in the 1940s. Thoughts about creating intelligence followed immediately. Turing’s 1950 paper, “Computing Machinery and Intelligence,”⁴² is the best known of many early works on the possibility of intelligent machines. Skeptics were already asserting that machines would never be able to do X, for almost any X you could think of, and Turing refuted those assertions. He also proposed an operational test for intelligence, called the imitation game, which subsequently (in simplified form) became known as the Turing test. The test measures the behavior of the machine—specifically, its ability to fool a human interrogator into thinking that it is human. The imitation game serves a specific role in Turing’s paper—namely as a thought experiment to deflect skeptics who supposed that machines could not think in the right way, for the right reasons, with the right kind of awareness. Turing hoped to redirect the argument towards the issue of whether a machine could behave in a certain way; and if it did—if it was able, say, to discourse sensibly on Shakespeare’s sonnets and their meanings—then skepticism about AI could not really be sustained. Contrary to common interpretations, I doubt that the test was intended as a true definition of intelligence, in the sense that a machine is intelligent if and only if it passes the Turing test. Indeed, Turing wrote, “May not machines carry out something which ought to be described as thinking but which is very different from what a man does?” Another reason not to view the test as a definition for AI is that it’s a terrible definition to work with. And for that reason, mainstream AI researchers have expended almost no effort to pass the Turing test. The Turing test is not useful for AI because it’s an informal and highly contingent definition: it depends on the enormously complicated and largely unknown characteristics of the human mind, which derive from both biology and culture. 
There is no way to “unpack” the definition and work back from it to create machines that will provably pass the test. Instead, AI has focused on rational behavior, just as described previously: a machine is intelligent to the extent that what it does is likely to achieve what it wants, given what it has perceived. Initially, like Aristotle, AI researchers identified “what it wants” with a goal that is either satisfied or not. These goals could be in toy worlds like the 15-puzzle, where the goal is to get all the numbered tiles lined up in order from 1 to 15 in a little (simulated) square tray; or they might be in real, physical environments: in the early 1970s, the Shakey robot at SRI in California was pushing large blocks into desired configurations, and Freddy at the University of Edinburgh was assembling a wooden boat from its component pieces. All this work was done using logical problem-solvers and planning systems to construct and execute guaranteed plans to achieve goals.⁴³ By the 1980s, it was clear that logical reasoning alone could not suffice, because, as noted previously, there is no plan that is guaranteed to get you to the airport. Logic requires certainty, and the real world simply doesn’t provide it. Meanwhile, the Israeli-American computer scientist Judea Pearl, who went on to win the 2011 Turing Award, had been working on methods for uncertain reasoning based in probability theory.⁴⁴ AI researchers gradually accepted Pearl’s ideas; they adopted the tools of probability theory and utility theory and thereby connected AI to other fields such as statistics, control theory, economics, and operations research. This change marked the beginning of what some observers call modern AI. Agents and environments The central concept of modern AI is the intelligent agent—something that perceives and acts. The agent is a process occurring over time, in the sense that a stream of perceptual inputs is converted into a stream of actions. For example, suppose the agent in question is a self-driving taxi taking me to the airport. Its inputs might include eight RGB cameras operating at thirty frames per second; each frame consists of perhaps 7.5 million pixels, each with an image intensity value in each of three color channels, for a total of more than five gigabytes per second. (The flow of data from the two hundred million photoreceptors in the retina is even larger, which partially explains why vision occupies such a large fraction of the human brain.) The taxi also gets data from an accelerometer one hundred times per second, as well as GPS data. This incredible flood of raw data is transformed by the simply gargantuan computing power of billions of transistors (or neurons) into smooth, competent driving behavior. The taxi’s actions include the electronic signals sent to the steering wheel, brakes, and accelerator, twenty times per second. (For an experienced human driver, most of this maelstrom of activity is unconscious: you may be aware only of making decisions such as “overtake this slow truck” or “stop for gas,” but your eyes, brain, nerves, and muscles are still doing all the other stuff.) For a chess program, the inputs are mostly just the clock ticks, with the occasional notification of the opponent’s move and the new board state, while the actions are mostly doing nothing while the program is thinking, and occasionally choosing a move and notifying the opponent. 
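The phrase “a stream of perceptual inputs is converted into a stream of actions” has a direct computational reading. The sketch below is a generic skeleton of an agent program (the environment, percepts, and actions are invented placeholders, not any real system): a loop that repeatedly takes in a percept, updates some internal state, and emits an action.

```python
import random

class ToyEnvironment:
    """A stand-in environment: percepts are random numbers, actions get printed."""
    def observe(self):
        return random.random()
    def act(self, action):
        print("executing:", action)

class Agent:
    """Generic agent skeleton: percepts come in, actions go out."""
    def __init__(self):
        self.state = {}                       # internal memory of past percepts

    def update_state(self, percept):
        # A chess program could ignore history entirely; other agents need
        # richer memories of what they have perceived so far.
        self.state["last_percept"] = percept

    def choose_action(self):
        # Placeholder policy; a real agent chooses actions expected to
        # achieve its objective, given everything it has perceived.
        return "brake" if self.state["last_percept"] > 0.5 else "accelerate"

environment, agent = ToyEnvironment(), Agent()
for _ in range(3):                            # the perceive-act loop over time
    percept = environment.observe()
    agent.update_state(percept)
    environment.act(agent.choose_action())
```

For the self-driving taxi, the percept arriving at each iteration would be a slice of the roughly five gigabytes per second of camera data described above, and the action a fresh set of steering, brake, and accelerator signals.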
For a personal digital assistant, or PDA, such as Siri or Cortana, the inputs include not just the acoustic signal from the microphone (sampled forty-eight thousand times per second) and input from the touch screen but also the content of each Web page that it accesses, while the actions include both speaking and displaying material on the screen. The way we build intelligent agents depends on the nature of the problem we face. This, in turn, depends on three things: first, the nature of the environment the agent will operate in—a chessboard is a very different place from a crowded freeway or a mobile phone; second, the observations and actions that connect the agent to the environment—for example, Siri might or might not have access to the phone’s camera so that it can see; and third, the agent’s objective—teaching the opponent to play better chess is a very different task from winning the game. To give just one example of how the design of the agent depends on these things: If the objective is to win the game, a chess program need consider only the current board state and does not need any memory of past events.⁴⁵ The chess tutor, on the other hand, should continually update its model of which aspects of chess the pupil does or does not understand so that it can provide useful advice. In other words, for the chess tutor, the pupil’s mind is a relevant part of the environment. Moreover, unlike the board, it is a part of the environment that is not directly observable. The characteristics of problems that influence the design of agents include at least the following:⁴⁶
  1. The machine’s only objective is to maximize the realization of human preferences.
  2. The machine is initially uncertain about what those preferences are.
  3. The ultimate source of information about human preferences is human behavior. Before delving into more detailed explanations, it’s important to remember the broad scope of what I mean by preferences in these principles. Here’s a reminder of what I wrote in Chapter 2: if you were somehow able to watch two movies, each describing in sufficient detail and breadth a future life you might lead, such that each constitutes a virtual experience, you could say which you prefer, or express indifference. Thus, preferences here are all-encompassing; they cover everything you might care about, arbitrarily far into the future.⁵ And they are yours: the machine is not looking to identify or adopt one ideal set of preferences but to understand and satisfy (to the extent possible) the preferences of each person. The first principle: Purely altruistic machines The first principle, that the machine’s only objective is to maximize the realization of human preferences, is central to the notion of a beneficial machine. In particular, it will be beneficial to humans, rather than to, say, cockroaches. There’s no getting around this recipient-specific notion of benefit. The principle means that the machine is purely altruistic—that is, it attaches absolutely no intrinsic value to its own well-being or even its own existence. It might protect itself in order to continue doing useful things for humans, or because its owner would be unhappy about having to pay for repairs, or because the sight of a dirty or damaged robot might be mildly distressing to passersby, but not because it wants to be alive. Putting in any preference for self-preservation sets up an additional incentive within the robot that is not strictly aligned with human well-being. The wording of the first principle brings up two questions of fundamental importance. Each merits an entire bookshelf to itself, and in fact many books have already been written on these questions. The first question is whether humans really have preferences in a meaningful or stable sense. In truth, the notion of a “preference” is an idealization that fails to match reality in several ways. For example, we aren’t born with the preferences we have as adults, so they must change over time. For now, I will assume that the idealization is reasonable. Later, I will examine what happens when we give up the idealization. The second question is a staple of the social sciences: given that it is usually impossible to ensure that everyone gets their most preferred outcome—we can’t all be Emperor of the Universe—how should the machine trade off the preferences of multiple humans? Again, for the time being—and I promise to return to this question in the next chapter—it seems reasonable to adopt the simple approach of treating everyone equally. This is reminiscent of the roots of eighteenth-century utilitarianism in the phrase “the greatest happiness for the greatest numbers,”⁶ and there are many caveats and elaborations required to make this work in practice. Perhaps the most important of these is the matter of the possibly vast number of people not yet born, and how their preferences are to be taken into account. The issue of future humans brings up another, related question: How do we take into account the preferences of nonhuman entities? That is, should the first principle include the preferences of animals? (And possibly plants too?) This is a question worthy of debate, but the outcome seems unlikely to have a strong impact on the path forward for AI. 
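Treating everyone equally has a simple technical reading: when the machine must pick one outcome affecting several people, it can, as a first cut, weight each person’s predicted preferences equally and choose the outcome with the highest total. The sketch below is a deliberately crude illustration of that idea; the people, options, and utility numbers are all invented, and it ignores every caveat just mentioned, starting with the people not yet born.

```python
# Invented utilities: how strongly each person prefers each thermostat setting.
utilities = {
    "alice": {"18C": 2.0, "21C": 8.0, "24C": 5.0},
    "bob":   {"18C": 7.0, "21C": 6.0, "24C": 1.0},
    "carol": {"18C": 3.0, "21C": 7.0, "24C": 4.0},
}

def best_option(utilities):
    """Pick the option with the highest equally weighted sum of utilities."""
    options = next(iter(utilities.values())).keys()
    return max(options,
               key=lambda option: sum(person[option] for person in utilities.values()))

print(best_option(utilities))   # "21C", the best compromise under equal weighting
```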
For what it’s worth, human preferences can and do include terms for the well-being of animals, as well as for the aspects of human well-being that benefit directly from animals’ existence.⁷ To say that the machine should pay attention to the preferences of animals in addition to this is to say that humans should build machines that care more about animals than humans do, which is a difficult position to sustain. A more tenable position is that our tendency to engage in myopic decision making—which works against our own interests—often leads to negative consequences for the environment and its animal inhabitants. A machine that makes less myopic decisions would help humans adopt more environmentally sound policies. And if, in the future, we give substantially greater weight to the well-being of animals than we currently do—which probably means sacrificing some of our own intrinsic well-being—then machines will adapt accordingly. The second principle: Humble machines The second principle, that the machine is initially uncertain about what human preferences are, is the key to creating beneficial machines. A machine that assumes it knows the true objective perfectly will pursue it single-mindedly. It will never ask whether some course of action is OK, because it already knows it’s an optimal solution for the objective. It will ignore humans jumping up and down screaming, “Stop, you’re going to destroy the world!” because those are just words. Assuming perfect knowledge of the objective decouples the machine from the human: what the human does no longer matters, because the machine knows the goal and pursues it. On the other hand, a machine that is uncertain about the true objective will exhibit a kind of humility: it will, for example, defer to humans and allow itself to be switched off. It reasons that the human will switch it off only if it’s doing something wrong—that is, doing something contrary to human preferences. By the first principle, it wants to avoid doing that, but, by the second principle, it knows that’s possible because it doesn’t know exactly what “wrong” is. So, if the human does switch the machine off, then the machine avoids doing the wrong thing, and that’s what it wants. In other words, the machine has a positive incentive to allow itself to be switched off. It remains coupled to the human, who is a potential source of information that will allow it to avoid mistakes and do a better job. Uncertainty has been a central concern in AI since the 1980s; indeed the phrase “modern AI” often refers to the revolution that took place when uncertainty was finally recognized as a ubiquitous issue in real-world decision making. Yet uncertainty in the objective of the AI system was simply ignored. In all the work on utility maximization, goal achievement, cost minimization, reward maximization, and loss minimization, it is assumed that the utility function, the goal, the cost function, the reward function, and the loss function are known perfectly. How could this be? 
How could the AI community (and the control theory, operations research, and statistics communities) have such a huge blind spot for so long, even while embracing uncertainty in all other aspects of decision making?⁸ One could make some rather complicated technical excuses,⁹ but I suspect the truth is that, with some honorable exceptions,¹⁰ AI researchers simply bought into the standard model that maps our notion of human intelligence onto machine intelligence: humans have objectives and pursue them, so machines should have objectives and pursue them. They, or should I say we, never really examined this fundamental assumption. It is built into all existing approaches for constructing intelligent systems. The third principle: Learning to predict human preferences The third principle, that the ultimate source of information about human preferences is human behavior, serves two purposes. The first purpose is to provide a definite grounding for the term human preferences. By assumption, human preferences aren’t in the machine and it cannot observe them directly, but there must still be some definite connection between the machine and human preferences. The principle says that the connection is through the observation of human choices: we assume that choices are related in some (possibly very complicated) way to underlying preferences. To see why this connection is essential, consider the converse: if some human preference had no effect whatsoever on any actual or hypothetical choice the human might make, then it would probably be meaningless to say that the preference exists. The second purpose is to enable the machine to become more useful as it learns more about what we want. (After all, if it knew nothing about human preferences, it would be of no use to us.) The idea is simple enough: human choices reveal information about human preferences. Applied to the choice between pineapple pizza and sausage pizza, this is straightforward. Applied to choices between future lives and choices made with the goal of influencing the robot’s behavior, things get more interesting. In the next chapter I explain how to formulate and solve such problems. The real complications arise, however, because humans are not perfectly rational: imperfection comes between human preferences and human choices, and the machine must take into account those imperfections if it is to interpret human choices as evidence of human preferences. Not what I mean Before going into more detail, I want to head off some potential misunderstandings. The first and most common misunderstanding is that I am proposing to install in machines a single, idealized value system of my own design that guides the machine’s behavior. “Whose values are you going to put in?” “Who gets to decide what the values are?” Or even, “What gives Western, well-off, white male cisgender scientists such as Russell the right to determine how the machine encodes and develops human values?”¹¹ I think this confusion comes partly from an unfortunate conflict between the commonsense meaning of value and the more technical sense in which it is used in economics, AI, and operations research. In ordinary usage, values are what one uses to help resolve moral dilemmas; as a technical term, on the other hand, value is roughly synonymous with utility, which measures the degree of desirability of anything from pizza to paradise. The meaning I want is the technical one: I just want to make sure the machines give me the right pizza and don’t accidentally destroy the human race. 
(Finding my keys would be an unexpected bonus.) To avoid this confusion, the principles talk about human preferences rather than human values, since the former term seems to steer clear of judgmental preconceptions about morality. “Putting in values” is, of course, exactly the mistake I am saying we should avoid, because getting the values (or preferences) exactly right is so difficult and getting them wrong is potentially catastrophic. I am proposing instead that machines learn to predict better, for each person, which life that person would prefer, all the while being aware that the predictions are highly uncertain and incomplete. In principle, the machine can learn billions of different predictive preference models, one for each of the billions of people on Earth. This is really not too much to ask for the AI systems of the future, given that present-day Facebook systems are already maintaining more than two billion individual profiles. A related misunderstanding is that the goal is to equip machines with “ethics” or “moral values” that will enable them to resolve moral dilemmas. Often, people bring up the so-called trolley problems,¹² where one has to choose whether to kill one person in order to save others, because of their supposed relevance to self-driving cars. The whole point of moral dilemmas, however, is that they are dilemmas: there are good arguments on both sides. The survival of the human race is not a moral dilemma. Machines could solve most moral dilemmas the wrong way (whatever that is) and still have no catastrophic impact on humanity.¹³ Another common supposition is that machines that follow the three principles will adopt all the sins of the evil humans they observe and learn from. Certainly, there are many of us whose choices leave something to be desired, but there is no reason to suppose that machines who study our motivations will make the same choices, any more than criminologists become criminals. Take, for example, the corrupt government official who demands bribes to approve building permits because his paltry salary won’t pay for his children to go to university. A machine observing this behavior will not learn to take bribes; it will learn that the official, like many other people, has a very strong desire for his children to be educated and successful. It will find ways to help him that don’t involve lowering the well-being of others. This is not to say that all cases of evil behavior are unproblematic for machines—for example, machines may need to treat differently those who actively prefer the suffering of others. Reasons for Optimism In a nutshell, I am suggesting that we need to steer AI in a radically new direction if we want to retain control over increasingly intelligent machines. We need to move away from one of the driving ideas of twentieth-century technology: machines that optimize a given objective. I am often asked why I think this is even remotely feasible, given the huge momentum behind the standard model in AI and related disciplines. In fact, I am quite optimistic that it can be done. The first reason for optimism is that there are strong economic incentives to develop AI systems that defer to humans and gradually align themselves to user preferences and intentions. Such systems will be highly desirable: the range of behaviors they can exhibit is simply far greater than that of machines with fixed, known objectives. 
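The “humble machines” argument, that uncertainty about the objective gives the machine a positive incentive to defer and to allow itself to be switched off, can be turned into a small expected-value calculation. The numbers below are invented for illustration; the structure follows the informal reasoning above, with a human who switches the machine off exactly when its proposed action would be bad for her.

```python
# Invented numbers: the robot believes its proposed action is equally likely
# to be good for the human (worth +1 to her) or bad for her (worth -1).
p_good, value_good, value_bad = 0.5, +1.0, -1.0

# Option 1: act immediately, without consulting anyone.
act_now = p_good * value_good + (1 - p_good) * value_bad

# Option 2: defer. The human switches the robot off (value 0) exactly when the
# action would have been bad for her, and lets it proceed otherwise.
defer_to_human = p_good * value_good + (1 - p_good) * 0.0

print(act_now, defer_to_human)   # 0.0 0.5: deferring is never worse
```

If the robot were certain the action was good, the two values would coincide, which is exactly why assuming perfect knowledge of the objective removes the incentive to remain coupled to the human.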
They will ask humans questions or ask for permission when appropriate; they will do “trial runs” to see if we like what they propose to do; they will accept correction when they do something wrong. On the other hand, systems that fail to do this will have severe consequences. Up to now, the stupidity and limited scope of AI systems has protected us from these consequences, but that will change. Imagine, for example, some future domestic robot charged with looking after your children while you are working late. The children are hungry, but the refrigerator is empty. Then the robot notices the cat. Alas, the robot understands the cat’s nutritional value but not its sentimental value. Within a few short hours, headlines about deranged robots and roasted cats are blanketing the world’s media and the entire domestic-robot industry is out of business. The possibility that one industry player could destroy the entire industry through careless design provides a strong economic motivation to form safety-oriented industry consortia and to enforce safety standards. Already, the Partnership on AI, which includes as members nearly all the world’s leading technology companies, has agreed to cooperate to ensure that “AI research and technology is robust, reliable, trustworthy, and operates within secure constraints.” To my knowledge, all the major players are publishing their safety-oriented research in the open literature. Thus, the economic incentive is in operation long before we reach human-level AI and will only strengthen over time. Moreover, the same cooperative dynamic may be starting at the international level—for example, the stated policy of the Chinese government is to “cooperate to preemptively prevent the threat of AI.”¹⁴ A second reason for optimism is that the raw data for learning about human preferences—namely, examples of human behavior—are so abundant. The data come not just in the form of direct observation via camera, keyboard, and touch screen by billions of machines sharing data with one another about billions of humans (subject to privacy constraints, of course) but also in indirect form. The most obvious kind of indirect evidence is the vast human record of books, films, and television and radio broadcasts, which is almost entirely concerned with people doing things (and other people being upset about it). Even the earliest and most tedious Sumerian and Egyptian records of copper ingots being traded for sacks of barley give some insight into human preferences for different commodities. There are, of course, difficulties involved in interpreting this raw material, which includes propaganda, fiction, the ravings of lunatics, and even the pronouncements of politicians and presidents, but there is certainly no reason for the machine to take it all at face value. Machines can and should interpret all communications from other intelligent entities as moves in a game rather than as statements of fact; in some games, such as cooperative games with one human and one machine, the human has an incentive to be truthful, but in many other situations there are incentives to be dishonest. And of course, whether honest or dishonest, humans may be deluded in their own beliefs. There is a second kind of indirect evidence that is staring us in the face: the way we have made the world.¹⁵ We made it that way because—very roughly—we like it that way. (Obviously, it’s not perfect!) Now, imagine you are an alien visiting Earth while all the humans are away on holiday. 
As you peer inside their houses, can you begin to grasp the basics of human preferences? Carpets are on floors because we like to walk on soft, warm surfaces and we don’t like loud footsteps; vases are on the middle of the table rather than the edge because we don’t want them to fall and break; and so on—everything that isn’t arranged by nature itself provides clues to the likes and dislikes of the strange bipedal creatures who inhabit this planet. Reasons for Caution You may find the Partnership on AI’s promises of cooperation on AI safety less than reassuring if you have been following progress in self-driving cars. That field is ruthlessly competitive, for some very good reasons: the first car manufacturer to release a fully autonomous vehicle will gain a huge market advantage; that advantage will be self-reinforcing because the manufacturer will be able to collect more data more quickly to improve the system’s performance; and ride-hailing companies such as Uber would quickly go out of business if another company were to roll out fully autonomous taxis before Uber does. This has led to a high-stakes race in which caution and careful engineering appear to be less important than snazzy demos, talent grabs, and premature rollouts. Thus, life-or-death economic competition provides an impetus to cut corners on safety in the hope of winning the race. In a 2008 retrospective paper on the 1975 Asilomar conference that he co-organized—the conference that led to a moratorium on genetic modification of humans—the biologist Paul Berg wrote,¹⁶ There is a lesson in Asilomar for all of science: the best way to respond to concerns created by emerging knowledge or early-stage technologies is for scientists from publicly funded institutions to find common cause with the wider public about the best way to regulate—as early as possible. Once scientists from corporations begin to dominate the research enterprise, it will simply be too late. Economic competition occurs not just between corporations but also between nations. A recent flurry of announcements of multibillion-dollar national investments in AI from the United States, China, France, Britain, and the EU certainly suggests that none of the major powers wants to be left behind. In 2017, Russian president Vladimir Putin said, “The one who becomes the leader in [AI] will be the ruler of the world.”¹⁷ This analysis is essentially correct. Advanced AI would, as we saw in Chapter 3, lead to greatly increased productivity and rates of innovation in almost all areas. If not shared, it would allow its possessor to outcompete any rival nation or bloc. Nick Bostrom, in Superintelligence, warns against exactly this motivation. National competition, just like corporate competition, would tend to focus more on advances in raw capabilities and less on the problem of control. Perhaps, however, Putin has read Bostrom; he went on to say, “It would be strongly undesirable if someone wins a monopolist position.” It would also be rather pointless, because human-level AI is not a zero-sum game and nothing is lost by sharing it. On the other hand, competing to be the first to achieve human-level AI, without first solving the control problem, is a negative-sum game. The payoff for everyone is minus infinity. There’s only a limited amount that AI researchers can do to influence the evolution of global policy on AI. 
We can point to possible applications that would provide economic and social benefits; we can warn about possible misuses such as surveillance and weapons; and we can provide roadmaps for the likely path of future developments and their impacts. Perhaps the most important thing we can do is to design AI systems that are, to the extent possible, provably safe and beneficial for humans. Only then will it make sense to attempt general regulation of AI. PROVABLY BENEFICIAL AI If we are going to rebuild AI along new lines, the foundations must be solid. When the future of humanity is at stake, hope and good intentions—and educational initiatives and industry codes of conduct and legislation and economic incentives to do the right thing—are not enough. All of these are fallible, and they often fail. In such situations, we look to precise definitions and rigorous step-by-step mathematical proofs to provide incontrovertible guarantees. That’s a good start, but we need more. We need to be sure, to the extent possible, that what is guaranteed is actually what we want and that the assumptions going into the proof are actually true. The proofs themselves belong in journal papers written for specialists, but I think it is useful nonetheless to understand what proofs are and what they can and cannot provide in the way of real safety. The “provably beneficial” in the title of the chapter is an aspiration rather than a promise, but it is the right aspiration. Mathematical Guarantees We will want, eventually, to prove theorems to the effect that a particular way of designing AI systems ensures that they will be beneficial to humans. A theorem is just a fancy name for an assertion, stated precisely enough so that its truth in any particular situation can be checked. Perhaps the most famous theorem is Fermat’s Last Theorem, which was conjectured by the French mathematician Pierre de Fermat in 1637 and finally proved by Andrew Wiles in 1994 after 357 years of effort (not all of it by Wiles).¹ The theorem can be written in one line, but the proof is over one hundred pages of dense mathematics. Proofs begin from axioms, which are assertions whose truth is simply assumed. Often, the axioms are just definitions, such as the definitions of integers, addition, and exponentiation needed for Fermat’s theorem. The proof proceeds from the axioms by logically incontrovertible steps, adding new assertions until the theorem itself is established as a consequence of one of the steps. Here’s a fairly obvious theorem that follows almost immediately from the definitions of integers and addition: 1 + 2 = 2 + 1. Let’s call this Russell’s theorem. It’s not much of a discovery. On the other hand, Fermat’s Last Theorem feels like something completely new—a discovery of something previously unknown. The difference, however, is just a matter of degree. The truth of both Russell’s and Fermat’s theorems is already contained in the axioms. Proofs merely make explicit what was already implicit. They can be long or short, but they add nothing new. The theorem is only as good as the assumptions that go into it. That’s fine when it comes to mathematics, because mathematics is about abstract objects that we define—numbers, sets, and so on. The axioms are true because we say so. On the other hand, if you want to prove something about the real world—for example, that AI systems designed like so won’t kill you on purpose—your axioms have to be true in the real world. If they aren’t true, you’ve proved something about an imaginary world. 
Science and engineering have a long and honorable tradition of proving results about imaginary worlds. In structural engineering, for example, one might see a mathematical analysis that begins, “Let AB be a rigid beam. . . .” The word rigid here doesn’t mean “made of something hard like steel”; it means “infinitely strong,” so that it doesn’t bend at all. Rigid beams do not exist, so this is an imaginary world. The trick is to know how far one can stray from the real world and still obtain useful results. For example, if the rigid-beam assumption allows an engineer to calculate the forces in a structure that includes the beam, and those forces are small enough to bend a real steel beam by only a tiny amount, then the engineer can be reasonably confident that the analysis will transfer from the imaginary world to the real world. A good engineer develops a sense for when this transfer might fail— for example, if the beam is under compression, with huge forces pushing on it from each end, then even a tiny amount of bending might lead to greater lateral forces causing more bending, and so on, resulting in catastrophic failure. In that case, the analysis is redone with “Let AB be a flexible beam with stiffness K. . . .” This is still an imaginary world, of course, because real beams do not have uniform stiffness; instead, they have microscopic imperfections that can lead to cracks forming if the beam is subject to repeated bending. The process of removing unrealistic assumptions continues until the engineer is fairly confident that the remaining assumptions are true enough in the real world. After that, the engineered system can be tested in the real world; but the test results are just that. They do not prove that the same system will work in other circumstances or that other instances of the system will behave the same way as the original. One of the classic examples of assumption failure in computer science comes from cybersecurity. In that field, a huge amount of mathematical analysis goes into showing that certain digital protocols are provably secure—for example, when you type a password into a Web application, you want to be sure that it is encrypted before transmission so that someone eavesdropping on the network cannot read your password. Such digital systems are often provably secure but still vulnerable to attack in reality. The false assumption here is that this is a digital process. It isn’t. It operates in the real, physical world. By listening to the sound of your keyboard or measuring voltages on the electrical line that supplies power to your desktop computer, an attacker can “hear” your password or observe the encryption/decryption calculations that are occurring as it is processed. The cybersecurity community is now responding to these so-called side-channel attacks—for example, by writing encryption code that produces the same voltage fluctuations regardless of what message is being encrypted. Let’s look at the kind of theorem we would like eventually to prove about machines that are beneficial to humans. One type might go something like this: Suppose a machine has components A, B, C, connected to each other like so and to the environment like so, with internal learning algorithms l_(A), l_(B), l_(C) that optimize internal feedback rewards r_(A), r_(B), r_(C) defined like so, and [a few more conditions] . . . 
then, with very high probability, the machine’s behavior will be very close in value (for humans) to the best possible behavior realizable on any machine with the same computational and physical capabilities. The main point here is that such a theorem should hold regardless of how smart the components become—that is, the vessel never springs a leak and the machine always remains beneficial to humans. There are three other points worth making about this kind of theorem. First, we cannot try to prove that the machine produces optimal (or even near-optimal) behavior on our behalf, because that’s almost certainly computationally impossible. For example, we might want the machine to play Go perfectly, but there is good reason to believe that cannot be done in any practical amount of time on any physically realizable machine. Optimal behavior in the real world is even less feasible. Hence, the theorem says “best possible” rather than “optimal.” Second, we say “very high probability . . . very close” because that’s typically the best that can be done with machines that learn. For example, if the machine is learning to play roulette for us and the ball lands in zero forty times in a row, the machine might reasonably decide the table was rigged and bet accordingly. But it could have happened by chance; so there is always a small—perhaps vanishingly small—chance of being misled by freak occurrences. Finally, we are a long way from being able to prove any such theorem for really intelligent machines operating in the real world! There are also analogs of the side-channel attack in AI. For example, the theorem begins with “Suppose a machine has components A, B, C, connected to each other like so. . . .” This is typical of all correctness theorems in computer science: they begin with a description of the program being proved correct. In AI, we typically distinguish between the agent (the program doing the deciding) and the environment (on which the agent acts). Since we design the agent, it seems reasonable to assume that it has the structure we give it. To be extra safe, we can prove that its learning processes can modify its program only in certain circumscribed ways that cannot cause problems. Is this enough? No. As with side-channel attacks, the assumption that the program operates within a digital system is incorrect. Even if a learning algorithm is constitutionally incapable of overwriting its own code by digital means, it may, nonetheless, learn to persuade humans to do “brain surgery” on it—to violate the agent/environment distinction and change the code by physical means.² Unlike the structural engineer reasoning about rigid beams, we have very little experience with the assumptions that will eventually underlie theorems about provably beneficial AI. In this chapter, for example, we will typically be assuming a rational human. This is a bit like assuming a rigid beam, because there are no perfectly rational humans in reality. (It’s probably much worse, however, because humans are not even close to being rational.) The theorems we can prove seem to provide some insights, and the insights survive the introduction of a certain degree of randomness in human behavior, but it is as yet far from clear what happens when we consider some of the complexities of real humans. So, we are going to have to be very careful in examining our assumptions. 
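For readers who like to see such statements written out, here is one possible way to formalize the “very high probability . . . very close” template; the symbols are my own shorthand, not a settled definition, and everything hinges on the assumptions A being true of the real world.

```latex
% One reading of the theorem template: for every environment E consistent
% with the assumptions A, the designed machine M is, with probability at
% least 1 - delta, within epsilon of the best value achievable for humans
% by any machine M' with the same computational and physical capabilities.
\forall E \models A:\quad
\Pr\Big[\, V_{\mathrm{human}}(M, E) \;\ge\;
  \max_{M' \in \mathcal{M}} V_{\mathrm{human}}(M', E) \;-\; \epsilon \,\Big]
  \;\ge\; 1 - \delta
```

The claim in the text is then that ε and δ can be made small and, crucially, that the statement keeps holding no matter how capable the machine’s learned components become.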
When a proof of safety succeeds, we need to make sure it’s not succeeding because we have made unrealistically strong assumptions or because the definition of safety is too weak. When a proof of safety fails, we need to resist the temptation to strengthen the assumptions to make the proof go through—for example, by adding the assumption that the program’s code remains fixed. Instead, we need to tighten up the design of the AI system—for example, by ensuring that it has no incentive to modify critical parts of its own code. There are some assumptions that I call OWMAWGH assumptions, standing for “otherwise we might as well go home.” That is, if these assumptions are false, the game is up and there is nothing to be done. For example, it is reasonable to assume that the universe operates according to constant and somewhat discernible laws. If this is not the case, we will have no assurance that learning processes—even very sophisticated ones—will work at all. Another basic assumption is that humans care about what happens; if not, provably beneficial AI has no purpose because beneficial has no meaning. Here, caring means having roughly coherent and more-or-less stable preferences about the future. In the next chapter, I examine the consequences of plasticity in human preferences, which presents a serious philosophical challenge to the very idea of provably beneficial AI. For now, I focus on the simplest case: a world with one human and one robot. This case serves to introduce the basic ideas, but it’s also useful in its own right: you can think of the human as standing in for all of humanity and the robot as standing in for all machines. Additional complications arise when considering multiple humans and machines. Learning Preferences from Behavior Economists elicit preferences from human subjects by offering them choices.³ This technique is widely used in product design, marketing, and interactive e-commerce systems. For example, by offering test subjects choices among cars with different paint colors, seating arrangements, trunk sizes, battery capacities, cup holders, and so on, a car designer learns how much people care about various car features and how much they are willing to pay for them. Another important application is in the medical domain, where an oncologist considering a possible limb amputation might want to assess the patient’s preferences between mobility and life expectancy. And of course, pizza restaurants want to know how much more someone is willing to pay for sausage pizza than plain pizza. Preference elicitation typically considers only single choices made between objects whose value is assumed to be immediately apparent to the subject. It’s not obvious how to extend it to preferences between future lives. For that, we (and machines) need to learn from observations of behavior over time—behavior that involves multiple choices and uncertain outcomes. Early in 1997, I was involved in discussions with my colleagues Michael Dickinson and Bob Full about ways in which we might be able to apply ideas from machine learning to understand the locomotive behavior of animals. Michael studied in exquisite detail the wing motions of fruit flies. Bob was especially fond of creepy-crawlies and had built a little treadmill for cockroaches to see how their gait changed with speed. We thought it might be possible to use reinforcement learning to train a robotic or simulated insect to reproduce these complex behaviors. The problem we faced was that we didn’t know what reward signal to use. 
What were the flies and cockroaches optimizing? Without that information, we couldn’t apply reinforcement learning to train the virtual insect, so we were stuck. One day, I was walking down the road that leads from our house in Berkeley to the local supermarket. The road has a downhill slope, and I noticed, as I am sure most people have, that the slope induced a slight change in the way I walked. Moreover, the uneven paving resulting from decades of minor earthquakes induced additional gait changes, including raising my feet a little higher and planting them less stiffly because of the unpredictable ground level. As I pondered these mundane observations, I realized we had got it backwards. While reinforcement learning generates behavior from rewards, we actually wanted the opposite: to learn the rewards given the behavior. We already had the behavior, as produced by the flies and cockroaches; we wanted to know the specific reward signal being optimized by this behavior. In other words, we needed algorithms for inverse reinforcement learning, or IRL.⁴ (I did not know at the time that a similar problem had been studied under the perhaps less wieldy name of structural estimation of Markov decision processes, a field pioneered by Nobel laureate Tom Sargent in the late 1970s.⁵) Such algorithms would not only be able to explain animal behavior but also to predict their behavior in new circumstances. For example, how would a cockroach run on a bumpy treadmill that sloped sideways? The prospect of answering such fundamental questions was almost too exciting to bear, but even so it took some time to work out the first algorithms for IRL.⁶ Many different formulations and algorithms for IRL have been proposed since then. There are formal guarantees that the algorithms work, in the sense that they can acquire enough information about an entity’s preferences to be able to behave just as successfully as the entity they are observing.⁷ Perhaps the easiest way to understand IRL is this: the observer starts with some vague estimate of the true reward function and then refines this estimate, making it more precise, as more behavior is observed. Or, in Bayesian language:⁸ start with a prior probability over possible reward functions and then update the probability distribution on reward functions as evidence arrives.^(C) For example, suppose Robbie the robot is watching Harriet the human and wondering how much she prefers aisle seats to window seats. Initially, he is quite uncertain about this. Conceptually, Robbie’s reasoning might go like this: “If Harriet really cared about an aisle seat, she would have looked at the seat map to see if one was available rather than just accepting the window seat that the airline gave her, but she didn’t, even though she probably noticed it was a window seat and she probably wasn’t in a hurry; so now it’s considerably more likely that she either is roughly indifferent between window and aisle or even prefers a window seat.” The most striking example of IRL in practice is the work of my colleague Pieter Abbeel on learning to do helicopter aerobatics.⁹ Expert human pilots can make model helicopters do amazing things—loops, spirals, pendulum swings, and so on. Trying to copy what the human does turns out not to work very well because conditions are not perfectly reproducible: repeating the same control sequences in different circumstances can lead to disaster. Instead, the algorithm learns what the human pilot wants, in the form of trajectory constraints that it can achieve. 
This approach actually produces results that are even better than the human expert’s, because the human has slower reactions and is constantly making small mistakes and correcting for them. Assistance Games IRL is already an important tool for building effective AI systems, but it makes some simplifying assumptions. The first is that the robot is going to adopt the reward function once it has learned it by observing the human, so that it can perform the same task. This is fine for driving or helicopter piloting, but it’s not fine for drinking coffee: a robot observing my morning routine should learn that I (sometimes) want coffee, but should not learn to want coffee itself. Fixing this issue is easy—we simply ensure that the robot associates the preferences with the human, not with itself. The second simplifying assumption in IRL is that the robot is observing a human who is solving a single-agent decision problem. For example, suppose the robot is in medical school, learning to be a surgeon by watching a human expert. IRL algorithms assume that the human performs the surgery in the usual optimal way, as if the robot were not there. But that’s not what would happen: the human surgeon is motivated to have the robot (like any other medical student) learn quickly and well, and so she will modify her behavior considerably. She might explain what she is doing as she goes along; she might point out mistakes to avoid, such as making the incision too deep or the stitches too tight; she might describe the contingency plans in case something goes wrong during surgery. None of these behaviors make sense when performing surgery in isolation, so IRL algorithms will not be able to interpret the preferences they imply. For this reason, we will need to generalize IRL from the single-agent setting to the multi-agent setting—that is, we will need to devise learning algorithms that work when the human and robot are part of the same environment and interacting with each other. With a human and a robot in the same environment, we are in the realm of game theory—just as in the penalty shoot-out between Alice and Bob on this page. We assume, in this first version of the theory, that the human has preferences and acts according to those preferences. The robot doesn’t know what preferences the human has, but it wants to satisfy them anyway. We’ll call any such situation an assistance game, because the robot is, by definition, supposed to be helpful to the human.¹⁰ Assistance games instantiate the three principles from the preceding chapter: the robot’s only objective is to satisfy human preferences, it doesn’t initially know what they are, and it can learn more by observing human behavior. Perhaps the most interesting property of assistance games is that, by solving the game, the robot can work out for itself how to interpret the human’s behavior as providing information about human preferences. The paperclip game The first example of an assistance game is the paperclip game. It’s a very simple game in which Harriet the human has an incentive to “signal” to Robbie the robot some information about her preferences. Robbie is able to interpret that signal because he can solve the game, and therefore he can understand what would have to be true about Harriet’s preferences in order for her to signal in that way. [FIGURE 12: The paperclip game. Harriet the human can choose to make 2 paperclips, 2 staples, or 1 of each. Robbie the robot then has a choice to make 90 paperclips, 90 staples, or 50 of each.] 
The steps of the game are depicted in figure 12. It involves making paperclips and staples. Harriet’s preferences are expressed by a payoff function that depends on the number of paperclips and the number of staples produced, with a certain “exchange rate” between the two. For example, she might value paperclips at 45¢ and staples at 55¢ each. (We’ll assume the two values always add up to $1.00; it’s only the ratio that matters.) So, if 10 paperclips and 20 staples are produced, Harriet’s payoff will be 10 × 45¢ + 20 × 55¢ = $15.50. Robbie the robot is initially completely uncertain about Harriet’s preferences: he has a uniform distribution for the value of a paperclip (that is, it’s equally likely to be any value from 0¢ to $1.00). Harriet goes first and can choose to make two paperclips, two staples, or one of each. Then Robbie can choose to make 90 paperclips, 90 staples, or 50 of each.¹¹ Notice that if she were doing this by herself, Harriet would just make two staples, with a value of $1.10. But Robbie is watching, and he learns from her choice. What exactly does he learn? Well, that depends on how Harriet makes her choice. How does Harriet make her choice? That depends on how Robbie is going to interpret it. So, we seem to have a circular problem! That’s typical in game-theoretic problems, and that’s why Nash proposed the concept of equilibrium solutions. To find an equilibrium solution, we need to identify strategies for Harriet and Robbie such that neither has an incentive to change their strategy, assuming the other remains fixed. A strategy for Harriet specifies how many paperclips and staples to make, given her preferences; a strategy for Robbie specifies how many paperclips and staples to make, given Harriet’s action. It turns out there is only one equilibrium solution, and it looks like this:
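One way to see what that equilibrium must look like is to compute best responses directly from the payoffs just described: a uniform prior over the value Harriet places on a paperclip, her three possible signals, and Robbie's three possible responses. The short sketch below is my own numerical illustration, not the book's derivation; the discretization, the action encoding, and the best-response iteration are all assumptions of convenience.

```python
import numpy as np

# Discretized uniform prior over Harriet's value for a paperclip (in dollars);
# her value for a staple is 1 - v, as described above.
values = np.linspace(0.0, 1.0, 2001)

def harriet_make(v, a):          # Harriet makes a paperclips and 2 - a staples
    return a * v + (2 - a) * (1 - v)

def robbie_make(v, r):           # Robbie makes 90 clips, 50 of each, or 90 staples
    return {"clips": 90 * v, "mix": 50.0, "staples": 90 * (1 - v)}[r]

# Start from a guess for Harriet's strategy and alternate best responses.
harriet = np.where(values > 0.5, 2, 0)          # initial guess: signal only the extremes
for _ in range(100):
    # Robbie best-responds to each observed signal, using the posterior over v.
    robbie = {}
    for a in (0, 1, 2):
        post = values[harriet == a]
        post = post if post.size else values    # unreached signal: fall back to the prior
        robbie[a] = max(("clips", "mix", "staples"),
                        key=lambda r: np.mean(robbie_make(post, r)))
    # Harriet best-responds to Robbie's policy, value by value.
    harriet = np.array([max((0, 1, 2),
                            key=lambda a: harriet_make(v, a) + robbie_make(v, robbie[a]))
                        for v in values])

mixed_region = values[harriet == 1]
print(robbie)                                   # Robbie matches whatever Harriet signals
print(mixed_region.min(), mixed_region.max())   # roughly 0.446 and 0.554
```

With these numbers the iteration settles quickly: Robbie simply amplifies whatever Harriet signals, and Harriet makes one of each whenever her paperclip value lies between roughly 44.6¢ and 55.4¢, sacrificing a little immediate value in order to tell Robbie that she is close to indifferent.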
  1. The first edition of my textbook on AI, co-authored with Peter Norvig, currently director of research at Google: Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 1st ed. (Prentice Hall, 1995).
  2. Robinson developed the resolution algorithm, which can, given enough time, prove any logical consequence of a set of first-order logical assertions. Unlike previous algorithms, it did not require conversion to propositional logic. J. Alan Robinson, “A machine-oriented logic based on the resolution principle,” Journal of the ACM 12 (1965): 23–41.
  3. Arthur Samuel, an American pioneer of the computer era, did his early work at IBM. The paper describing his work on checkers was the first to use the term machine learning, although Alan Turing had already talked about “a machine that can learn from experience” as early as 1947. Arthur Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development 3 (1959): 210–29.
  4. The “Lighthill Report,” as it became known, led to the termination of research funding for AI except at the universities of Edinburgh and Sussex: Michael James Lighthill, “Artificial intelligence: A general survey,” in Artificial Intelligence: A Paper Symposium (Science Research Council of Great Britain, 1973).
  5. The CDC 6600 filled an entire room and cost the equivalent of $20 million. For its era it was incredibly powerful, albeit a million times less powerful than an iPhone.
  6. Following Deep Blue’s victory over Kasparov, at least one commentator predicted that it would take one hundred years before the same thing happened in Go: George Johnson, “To test a powerful computer, play an ancient game,” The New York Times, July 29, 1997.
  7. For a highly readable history of the development of nuclear technology, see Richard Rhodes, The Making of the Atomic Bomb (Simon & Schuster, 1987).
  8. A simple supervised learning algorithm may not have this effect, unless it is wrapped within an A/B testing framework (as is common in online marketing settings). Bandit algorithms and reinforcement learning algorithms will have this effect if they operate with an explicit representation of user state or an implicit representation in terms of the history of interactions with the user.
  9. Some have argued that profit-maximizing corporations are already out-of-control artificial entities. See, for example, Charles Stross, “Dude, you broke the future!” (keynote, 34th Chaos Communications Congress, 2017). See also Ted Chiang, “Silicon Valley is turning into its own worst fear,” Buzzfeed, December 18, 2017. The idea is explored further by Daniel Hillis, “The first machine intelligences,” in Possible Minds: Twenty-Five Ways of Looking at AI, ed. John Brockman (Penguin Press, 2019).
  10. For its time, Wiener’s paper was a rare exception to the prevailing view that all technological progress was a good thing: Norbert Wiener, “Some moral and technical consequences of automation,” Science 131 (1960): 1355–58.
CHAPTER 2
  11. Santiago Ramón y Cajal proposed synaptic changes as the site of learning in 1894, but it was not until the late 1960s that this hypothesis was confirmed experimentally. See Timothy Bliss and Terje Lomo, “Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path,” Journal of Physiology 232 (1973): 331–56.
  12. For a brief introduction, see James Gorman, “Learning how little we know about the brain,” The New York Times, November 10, 2014. See also Tom Siegfried, “There’s a long way to go in understanding the brain,” ScienceNews, July 25, 2017. A special 2017 issue of the journal Neuron (vol. 94, pp. 933–1040) provides a good overview of many different approaches to understanding the brain.
  13. The presence or absence of consciousness—actual subjective experience—certainly makes a difference in our moral consideration for machines. If ever we gain enough understanding to design conscious machines or to detect that we have done so, we would face many important moral issues for which we are largely unprepared.
  14. The following paper was among the first to make a clear connection between reinforcement learning algorithms and neurophysiological recordings: Wolfram Schultz, Peter Dayan, and P. Read Montague, “A neural substrate of prediction and reward,” Science 275 (1997): 1593–99.
  15. Studies of intracranial stimulation were carried out with the hope of finding cures for various mental illnesses. See, for example, Robert Heath, “Electrical self-stimulation of the brain in man,” American Journal of Psychiatry 120 (1963): 571–77.
  16. An example of a species that may be facing self-extinction via addiction: Bryson Voirin, “Biology and conservation of the pygmy sloth, Bradypus pygmaeus,” Journal of Mammalogy 96 (2015): 703–7.
  17. The Baldwin effect in evolution is usually attributed to the following paper: James Baldwin, “A new factor in evolution,” American Naturalist 30 (1896): 441–51.
  18. The core idea of the Baldwin effect also appears in the following work: Conwy Lloyd Morgan, Habit and Instinct (Edward Arnold, 1896).
  19. A modern analysis and computer implementation demonstrating the Baldwin effect: Geoffrey Hinton and Steven Nowlan, “How learning can guide evolution,” Complex Systems 1 (1987): 495–502.
  20. Further elucidation of the Baldwin effect by a computer model that includes the evolution of the internal reward-signaling circuitry: David Ackley and Michael Littman, “Interactions between learning and evolution,” in Artificial Life II, ed. Christopher Langton et al. (Addison-Wesley, 1991).
  21. Here I am pointing to the roots of our present-day concept of intelligence, rather than describing the ancient Greek concept of nous, which had a variety of related meanings.
  22. The quotation is taken from Aristotle, Nicomachean Ethics, Book III, 3, 1112b.
  23. Cardano, one of the first European mathematicians to consider negative numbers, developed an early mathematical treatment of probability in games. He died in 1576, eighty-seven years before his work appeared in print: Gerolamo Cardano, Liber de ludo aleae (Lyons, 1663).
  24. Arnauld’s work, initially published anonymously, is often called The Port-Royal Logic: Antoine Arnauld, La logique, ou l’art de penser (Chez Charles Savreux, 1662). See also Blaise Pascal, Pensées (Chez Guillaume Desprez, 1670).
  25. The concept of utility: Daniel Bernoulli, “Specimen theoriae novae de mensura sortis,” Proceedings of the St. Petersburg Imperial Academy of Sciences 5 (1738): 175–92. Bernoulli’s idea of utility arises from considering a merchant, Sempronius, choosing whether to transport a valuable cargo in one ship or to split it between two, assuming that each ship has a 50 percent probability of sinking on the journey. The expected monetary value of the two solutions is the same, but Sempronius clearly prefers the two-ship solution.
  26. By most accounts, von Neumann did not himself invent this architecture but his name was on an early draft of an influential report describing the EDVAC stored-program computer.
  27. The work of von Neumann and Morgenstern is in many ways the foundation of modern economic theory: John von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior (Princeton University Press, 1944).
  28. The proposal that utility is a sum of discounted rewards was put forward as a mathematically convenient hypothesis by Paul Samuelson, “A note on measurement of utility,” Review of Economic Studies 4 (1937): 155–61. If s₀, s₁, . . . is a sequence of states, then its utility in this model is U(s₀, s₁, . . .) = ∑ₜ γᵗR(sₜ), where γ is a discount factor and R is a reward function describing the desirability of a state. Naïve application of this model seldom agrees with the judgment of real individuals about the desirability of present and future rewards. For a thorough analysis, see Shane Frederick, George Loewenstein, and Ted O’Donoghue, “Time discounting and time preference: A critical review,” Journal of Economic Literature 40 (2002): 351–401.
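The model in this note is just a weighted sum; a tiny sketch (mine, with arbitrary illustrative numbers) makes the role of the discount factor concrete:

```python
def discounted_utility(rewards, gamma):
    # U(s0, s1, ...) = sum over t of gamma**t * R(s_t)
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_utility([1.0, 1.0, 1.0], gamma=0.9))   # 1 + 0.9 + 0.81 = 2.71
```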
  29. Maurice Allais, a French economist, proposed a decision scenario in which humans appear consistently to violate the von Neumann–Morgenstern axioms: Maurice Allais, “Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes de l’école américaine,” Econometrica 21 (1953): 503–46.
  30. For an introduction to non-quantitative decision analysis, see Michael Wellman, “Fundamental concepts of qualitative probabilistic networks,” Artificial Intelligence 44 (1990): 257–303.
  31. I will discuss the evidence for human irrationality further in Chapter 9. The standard references include the following: Allais, “Le comportement”; Daniel Ellsberg, Risk, Ambiguity, and Decision (PhD thesis, Harvard University, 1962); Amos Tversky and Daniel Kahneman, “Judgment under uncertainty: Heuristics and biases,” Science 185 (1974): 1124–31.
  32. It should be clear that this is a thought experiment that cannot be realized in practice. Choices about different futures are never presented in full detail, and humans never have the luxury of minutely examining and savoring those futures before choosing. Instead, one is given only brief summaries, such as “librarian” or “coal miner.” In making such a choice, one is really being asked to compare two probability distributions over complete futures, one beginning with the choice “librarian” and the other “coal miner,” with each distribution assuming optimal actions on one’s own part within each future. Needless to say, this is not easy.
  33. The first mention of a randomized strategy for games appears in Pierre Rémond de Montmort, Essay d’analyse sur les jeux de hazard, 2nd ed. (Chez Jacques Quillau, 1713). The book identifies a certain Monsieur de Waldegrave as the source of an optimal randomized solution for the card game Le Her. Details of Waldegrave’s identity are revealed by David Bellhouse, “The problem of Waldegrave,” Electronic Journal for History of Probability and Statistics 3 (2007).
  34. The problem is fully defined by specifying the probability that Alice scores in each of four cases: when she shoots to Bob’s right and he dives right or left, and when she shoots to his left and he dives right or left. In this case, these probabilities are 25 percent, 70 percent, 65 percent, and 10 percent respectively. Now suppose that Alice’s strategy is to shoot to Bob’s right with probability p and his left with probability 1 − p, while Bob dives to his right with probability q and left with probability 1 − q. The payoff to Alice is U_A = 0.25pq + 0.70p(1 − q) + 0.65(1 − p)q + 0.10(1 − p)(1 − q), while Bob’s payoff is U_B = −U_A. At equilibrium, ∂U_A/∂p = 0 and ∂U_B/∂q = 0, giving p = 0.55 and q = 0.60.
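For readers who want to check the arithmetic, here is a small sketch (my own, not from the book) that plugs the claimed equilibrium back into the payoff function; at a mixed equilibrium each player must be indifferent between their two pure options, which is exactly what the numbers show.

```python
def U_A(p, q):
    # Alice's probability of scoring, using the four probabilities in the note.
    return (0.25 * p * q + 0.70 * p * (1 - q)
            + 0.65 * (1 - p) * q + 0.10 * (1 - p) * (1 - q))

p_star, q_star = 0.55, 0.60
# Alice is indifferent between shooting right (p = 1) and left (p = 0)...
print(U_A(1, q_star), U_A(0, q_star))       # both approximately 0.43
# ...and Bob is indifferent between diving right (q = 1) and left (q = 0).
print(U_A(p_star, 1), U_A(p_star, 0))       # both approximately 0.43
```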
  35. The original game-theoretic problem was introduced by Merrill Flood and Melvin Dresher at the RAND Corporation; Tucker saw the payoff matrix on a visit to their offices and proposed a “story” to go along with it.
  36. Game theorists typically say that Alice and Bob could cooperate with each other (refuse to talk) or defect and rat on their accomplice. I find this language confusing, because “cooperate with each other” is not a choice that each agent can make separately, and because in common parlance one often talks about cooperating with the police, receiving a lighter sentence in return for cooperating, and so on.
  37. For an interesting trust-based solution to the prisoner’s dilemma and other games, see Joshua Letchford, Vincent Conitzer, and Kamal Jain, “An ‘ethical’ game-theoretic solution concept for two-player perfect-information games,” in Proceedings of the 4th International Workshop on Web and Internet Economics, ed. Christos Papadimitriou and Shuzhong Zhang (Springer, 2008).
  38. Origin of the tragedy of the commons: William Forster Lloyd, Two Lectures on the Checks to Population (Oxford University, 1833).
  39. Modern revival of the topic in the context of global ecology: Garrett Hardin, “The tragedy of the commons,” Science 162 (1968): 1243–48.
  40. It’s quite possible that even if we had tried to build intelligent machines from chemical reactions or biological cells, those assemblages would have turned out to be implementations of Turing machines in nontraditional materials. Whether an object is a general-purpose computer has nothing to do with what it’s made of.
  41. Turing’s breakthrough paper defined what is now known as the Turing machine, the basis for modern computer science. The Entscheidungsproblem, or decision problem, in the title is the problem of deciding entailment in first-order logic: Alan Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Proceedings of the London Mathematical Society, 2nd ser., 42 (1936): 230–65.
  42. A good survey of research on negative capacitance by one of its inventors: Sayeef Salahuddin, “Review of negative capacitance transistors,” in International Symposium on VLSI Technology, Systems and Application (IEEE Press, 2016).
  43. For a much better explanation of quantum computation, see Scott Aaronson, Quantum Computing since Democritus (Cambridge University Press, 2013).
  44. The paper that established a clear complexity-theoretic distinction between classical and quantum computation: Ethan Bernstein and Umesh Vazirani, “Quantum complexity theory,” SIAM Journal on Computing 26 (1997): 1411–73.
  45. The following article by a renowned physicist provides a good introduction to the current state of understanding and technology: John Preskill, “Quantum computing in the NISQ era and beyond,” arXiv:1801.00862 (2018).
  46. On the maximum computational ability of a one-kilogram object: Seth Lloyd, “Ultimate physical limits to computation,” Nature 406 (2000): 1047–54.
  47. For an example of the suggestion that humans may be the pinnacle of physically achievable intelligence, see Kevin Kelly, “The myth of a superhuman AI,” Wired, April 25, 2017: “We tend to believe that the limit is way beyond us, way ‘above’ us, as we are ‘above’ an ant. . . . What evidence do we have that the limit is not us?”
  48. In case you are wondering about a simple trick to solve the halting problem: the obvious method of just running the program to see if it finishes doesn’t work, because that method doesn’t necessarily finish. You might wait a million years and still not know if the program is really stuck in an infinite loop or just taking its time.
  49. The proof that the halting problem is undecidable is an elegant piece of trickery. The question: Is there a LoopChecker(P,X) program that, for any program P and any input X, decides correctly, in finite time, whether P applied to input X will halt and produce a result or keep chugging away forever? Suppose that LoopChecker exists. Now write a program Q that calls LoopChecker as a subroutine, with Q itself and X as inputs, and then does the opposite of what LoopChecker(Q,X) predicts. So, if LoopChecker says that Q halts, Q doesn’t halt, and vice versa. Thus, the assumption that LoopChecker exists leads to a contradiction, so LoopChecker cannot exist.
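A reader who prefers code to prose can see the same trick in a few lines. The sketch below is purely illustrative: loop_checker stands in for the hypothetical LoopChecker assumed to exist, and Q is constructed to defeat any such program.

```python
def loop_checker(program, x):
    """Hypothetical: assumed to return True exactly when program(x) halts."""
    raise NotImplementedError  # the argument shows no correct implementation can exist

def Q(x):
    if loop_checker(Q, x):   # if LoopChecker predicts that Q(x) halts...
        while True:          # ...then Q(x) runs forever;
            pass
    else:                    # if it predicts that Q(x) runs forever...
        return 0             # ...then Q(x) halts at once.
```

Whatever answer loop_checker gives about Q, that answer is wrong, which is the contradiction described in the note.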
  50. I say “appear” because, as yet, the claim that the class of NP-complete problems requires superpolynomial time (usually referred to as P ≠ NP) is still an unproven conjecture. After almost fifty years of research, however, nearly all mathematicians and computer scientists are convinced the claim is true.
  51. Lovelace’s writings on computation appear mainly in her notes attached to her translation of an Italian engineer’s commentary on Babbage’s engine: L. F. Menabrea, “Sketch of the Analytical Engine invented by Charles Babbage,” trans. Ada, Countess of Lovelace, in Scientific Memoirs, vol. III, ed. R. Taylor (R. and J. E. Taylor, 1843). Menabrea’s original article, written in French and based on lectures given by Babbage in 1840, appears in Bibliothèque Universelle de Genève 82 (1842).
  52. One of the seminal early papers on the possibility of artificial intelligence: Alan Turing, “Computing machinery and intelligence,” Mind 59 (1950): 433–60.
  53. The Shakey project at SRI is summarized in a retrospective by one of its leaders: Nils Nilsson, “Shakey the robot,” technical note 323 (SRI International, 1984). A twenty-four-minute film, SHAKEY: Experimentation in Robot Learning and Planning, was made in 1969 and garnered national attention.
  54. The book that marked the beginning of modern, probability-based AI: Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, 1988).
  55. Technically, chess is not fully observable. A program does need to remember a small amount of information to determine the legality of castling and en passant moves and to define draws by repetition or by the fifty-move rule.
  56. For a complete exposition, see Chapter 2 of Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. (Pearson, 2010).
  57. The size of the state space for StarCraft is discussed by Santiago Ontañon et al., “A survey of real-time strategy game AI research and competition in StarCraft,” IEEE Transactions on Computational Intelligence and AI in Games 5 (2013): 293–311. Vast numbers of moves are possible because a player can move all units simultaneously. The numbers go down as restrictions are imposed on how many units or groups of units can be moved at once.
  58. On human–machine competition in StarCraft: Tom Simonite, “DeepMind beats pros at StarCraft in another triumph for bots,” Wired, January 25, 2019.
  59. AlphaZero is described by David Silver et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv:1712.01815 (2017).
  60. Optimal paths in graphs are found using the A* algorithm and its many descendants: Peter Hart, Nils Nilsson, and Bertram Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Transactions on Systems Science and Cybernetics SSC-4 (1968): 100–107.
  61. The paper that introduced the Advice Taker program and logic-based knowledge systems: John McCarthy, “Programs with common sense,” in Proceedings of the Symposium on Mechanisation of Thought Processes (Her Majesty’s Stationery Office, 1958).
  62. To get some sense of the significance of knowledge-based systems, consider database systems. A database contains concrete, individual facts, such as the location of my keys and the identities of your Facebook friends. Database systems cannot store general rules, such as the rules of chess or the legal definition of British citizenship. They can count how many people called Alice have friends called Bob, but they cannot determine whether a particular Alice meets the conditions for British citizenship or whether a particular sequence of moves on a chessboard will lead to checkmate. Database systems cannot combine two pieces of knowledge to produce a third: they support memory but not reasoning. (It is true that many modern database systems provide a way to add rules and a way to use those rules to derive new facts; to the extent that they do, they are really knowledge-based systems.) Despite being highly constricted versions of knowledge-based systems, database systems underlie most of present-day commercial activity and generate hundreds of billions of dollars in value every year.
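To make the contrast concrete, here is a toy sketch (entirely my own; the parent/grandparent rule stands in for rules like the citizenship definition mentioned above). The stored facts behave like a database; the rule derives a fact that was never stored.

```python
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

# Database-style lookup: only explicitly stored facts can be retrieved.
print(("grandparent", "alice", "carol") in facts)        # False

# Knowledge-based inference: a general rule combines two stored facts
# to derive a third fact that was never stored.
derived = {("grandparent", a, c)
           for (r1, a, b) in facts if r1 == "parent"
           for (r2, b2, c) in facts if r2 == "parent" and b2 == b}
print(derived)                                           # {('grandparent', 'alice', 'carol')}
```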
  63. The original paper describing the completeness theorem for first-order logic: Kurt Gödel, “Die Vollständigkeit der Axiome des logischen Funktionenkalküls,” Monatshefte für Mathematik 37 (1930): 349–60.
  64. The reasoning algorithm for first-order logic does have a gap: if there is no answer—that is, if the available knowledge is insufficient to give an answer either way—then the algorithm may never finish. This is unavoidable: it is mathematically impossible for a correct algorithm always to terminate with “don’t know,” for essentially the same reason that no algorithm can solve the halting problem (this page).
  65. The first algorithm for theorem-proving in first-order logic worked by reducing first-order sentences to (very large numbers of) propositional sentences: Martin Davis and Hilary Putnam, “A computing procedure for quantification theory,” Journal of the ACM 7 (1960): 201–15. Robinson’s resolution algorithm operated directly on first-order logical sentences, using “unification” to match complex expressions containing logical variables: J. Alan Robinson, “A machine-oriented logic based on the resolution principle,” Journal of the ACM 12 (1965): 23–41.
  66. One might wonder how Shakey the logical robot ever reached any definite conclusions about what to do. The answer is simple: Shakey’s knowledge base contained false assertions. For example, Shakey believed that by executing “push object A through door D into room B,” object A would end up in room B. This belief was false because Shakey could get stuck in the doorway or miss the doorway altogether or someone might sneakily remove object A from Shakey’s grasp. Shakey’s plan execution module could detect plan failure and replan accordingly, so Shakey was not, strictly speaking, a purely logical system.
  67. An early commentary on the role of probability in human thinking: Pierre-Simon Laplace, Essai philosophique sur les probabilités (Mme. Ve. Courcier, 1814).
  68. Bayesian logic described in a fairly nontechnical way: Stuart Russell, “Unifying logic and probability,” Communications of the ACM 58 (2015): 88–97. The paper draws heavily on the PhD thesis research of my former student Brian Milch.
  69. The original source for Bayes’ theorem: Thomas Bayes and Richard Price, “An essay towards solving a problem in the doctrine of chances,” Philosophical Transactions of the Royal Society of London 53 (1763): 370–418.
  70. Technically, Samuel’s program did not treat winning and losing as absolute rewards; however, by fixing the value of material to be positive, the program generally tended to work towards winning.
  71. The application of reinforcement learning to produce a world-class backgammon program: Gerald Tesauro, “Temporal difference learning and TD-Gammon,” Communications of the ACM 38 (1995): 58–68.
  72. The DQN system that learns to play a wide variety of video games using deep RL: Volodymyr Mnih et al., “Human-level control through deep reinforcement learning,” Nature 518 (2015): 529–33.
  73. Bill Gates’s remarks on Dota 2 AI: Catherine Clifford, “Bill Gates says gamer bots from Elon Musk-backed nonprofit are ‘huge milestone’ in A.I.,” CNBC, June 28, 2018.
  74. An account of OpenAI Five’s victory over the human world champions at Dota 2: Kelsey Piper, “AI triumphs against the world’s top pro team in strategy game Dota 2,” Vox, April 13, 2019.
  75. A compendium of cases in the literature where misspecification of reward functions led to unexpected behavior: Victoria Krakovna, “Specification gaming examples in AI,” Deep Safety (blog), April 2, 2018.
  76. A case where an evolutionary fitness function defined in terms of maximum velocity led to very unexpected results: Karl Sims, “Evolving virtual creatures,” in Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (ACM, 1994).
  77. For a fascinating exposition of the possibilities of reflex agents, see Valentino Braitenberg, Vehicles: Experiments in Synthetic Psychology (MIT Press, 1984).
  78. News article on a fatal accident involving a vehicle in autonomous mode that hit a pedestrian: Devin Coldewey, “Uber in fatal crash detected pedestrian but had emergency braking disabled,” TechCrunch, May 24, 2018.
  79. On steering control algorithms, see, for example, Jarrod Snider, “Automatic steering methods for autonomous automobile path tracking,” technical report CMU-RI-TR-09-08, Robotics Institute, Carnegie Mellon University, 2009.
  80. Norfolk and Norwich terriers are two categories in the ImageNet database. They are notoriously hard to tell apart and were viewed as a single breed until 1964.
  81. A very unfortunate incident with image labeling: Daniel Howley, “Google Photos mislabels 2 black Americans as gorillas,” Yahoo Tech, June 29, 2015.
  82. Follow-up article on Google and gorillas: Tom Simonite, “When it comes to gorillas, Google Photos remains blind,” Wired, January 11, 2018.
CHAPTER 3
  83. The basic plan for game-playing algorithms was laid out by Claude Shannon, “Programming a computer for playing chess,” Philosophical Magazine, 7th ser., 41 (1950): 256–75.
  84. See figure 5.12 of Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 1st ed. (Prentice Hall, 1995). Note that the rating of chess players and chess programs is not an exact science. Kasparov’s highest-ever Elo rating was 2851, achieved in 1999, but current chess engines such as Stockfish are rated at 3300 or more.
  85. The earliest reported autonomous vehicle on a public road: Ernst Dickmanns and Alfred Zapp, “Autonomous high speed road vehicle guidance by computer vision,” IFAC Proceedings Volumes 20 (1987): 221–26.
  86. The safety record for Google (subsequently Waymo) vehicles: “Waymo safety report: On the road to fully self-driving,” 2018.
  87. So far there have been at least two driver fatalities and one pedestrian fatality. Some references follow, along with brief quotes describing what happened. Danny Yadron and Dan Tynan, “Tesla driver dies in first fatal crash while using autopilot mode,” Guardian, June 30, 2016: “The autopilot sensors on the Model S failed to distinguish a white tractor-trailer crossing the highway against a bright sky.” Megan Rose Dickey, “Tesla Model X sped up in Autopilot mode seconds before fatal crash, according to NTSB,” TechCrunch, June 7, 2018: “At 3 seconds prior to the crash and up to the time of impact with the crash attenuator, the Tesla’s speed increased from 62 to 70.8 mph, with no precrash braking or evasive steering movement detected.” Devin Coldewey, “Uber in fatal crash detected pedestrian but had emergency braking disabled,” TechCrunch, May 24, 2018: “Emergency braking maneuvers are not enabled while the vehicle is under computer control, to reduce the potential for erratic vehicle behavior.”
  88. The Society of Automotive Engineers (SAE) defines six levels of automation, where Level 0 is none at all and Level 5 is full automation: “The full-time performance by an automatic driving system of all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver.”
  89. Forecast of economic effects of automation on transportation costs: Adele Peters, “It could be 10 times cheaper to take electric robo-taxis than to own a car by 2030,” Fast Company, May 30, 2017.
  90. The impact of accidents on the prospects for regulatory action on autonomous vehicles: Richard Waters, “Self-driving car death poses dilemma for regulators,” Financial Times, March 20, 2018.
  91. The impact of accidents on public perception of autonomous vehicles: Cox Automotive, “Autonomous vehicle awareness rising, acceptance declining, according to Cox Automotive mobility study,” August 16, 2018.
  92. The original chatbot: Joseph Weizenbaum, “ELIZA—a computer program for the study of natural language communication between man and machine,” Communications of the ACM 9 (1966): 36–45.
  93. See physiome.org for current activities in physiological modeling. Work in the 1960s assembled models with thousands of differential equations: Arthur Guyton, Thomas Coleman, and Harris Granger, “Circulation: Overall regulation,” Annual Review of Physiology 34 (1972): 13–44.
  94. Some of the earliest work on tutoring systems was done by Pat Suppes and colleagues at Stanford: Patrick Suppes and Mona Morningstar, “Computer-assisted instruction,” Science 166 (1969): 343–50.
  95. Michael Yudelson, Kenneth Koedinger, and Geoffrey Gordon, “Individualized Bayesian knowledge tracing models,” in Artificial Intelligence in Education: 16th International Conference, ed. H. Chad Lane et al. (Springer, 2013).
  96. For an example of machine learning on encrypted data, see, for example, Reza Shokri and Vitaly Shmatikov, “Privacy-preserving deep learning,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (ACM, 2015).
  97. A retrospective on the first smart home, based on a lecture by its inventor, James Sutherland: James E. Tomayko, “Electronic Computer for Home Operation (ECHO): The first home computer,” IEEE Annals of the History of Computing 16 (1994): 59–61.
  98. Summary of a smart-home project based on machine learning and automated decisions: Diane Cook et al., “MavHome: An agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (IEEE, 2003).
  99. For the beginnings of an analysis of user experiences in smart homes, see Scott Davidoff et al., “Principles of smart home control,” in Ubicomp 2006: Ubiquitous Computing, ed. Paul Dourish and Adrian Friday (Springer, 2006).
  100. Commercial announcement of AI-based smart homes: “The Wolff Company unveils revolutionary smart home technology at new Annadel Apartments in Santa Rosa, California,” Business Insider, March 12, 2018.
  101. Article on robot chefs as commercial products: Eustacia Huen, “The world’s first home robotic chef can cook over 100 meals,” Forbes, October 31, 2016.
  102. Report from my Berkeley colleagues on deep RL for robotic motor control: Sergey Levine et al., “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research 17 (2016): 1–40.
  103. On the possibilities for automating the work of hundreds of thousands of warehouse workers: Tom Simonite, “Grasping robots compete to rule Amazon’s warehouses,” Wired, July 26, 2017.
  104. I’m assuming a generous one laptop-CPU minute per page, or about 10¹¹ operations. A third-generation tensor processing unit from Google runs at about 10¹⁷ operations per second, meaning that it can read a million pages per second, or about five hours for eighty million two-hundred-page books.
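The arithmetic in this note is easy to check; the constants below are the note's own rough estimates, not measurements.

```python
ops_per_page = 1e11          # one generous laptop-CPU minute per page
tpu_ops_per_second = 1e17    # third-generation tensor processing unit, roughly
pages = 80e6 * 200           # eighty million two-hundred-page books

print(tpu_ops_per_second / ops_per_page)                 # about 1e6 pages per second
print(pages * ops_per_page / tpu_ops_per_second / 3600)  # about 4.4 hours
```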
  105. A 2003 study on the global volume of information production by all channels: Peter Lyman and Hal Varian, “How much information?” sims.berkeley.edu/research/projects/how-much-info-2003.
  106. For details on the use of speech recognition by intelligence agencies, see Dan Froomkin, “How the NSA converts spoken words into searchable text,” The Intercept, May 5, 2015.
  107. Analysis of visual imagery from satellites is an enormous task: Mike Kim, “Mapping poverty from space with the World Bank,” Medium.com, January 4, 2017. Kim estimates eight million people working 24/7, which converts to more than thirty million people working forty hours per week. I suspect this is an overestimate in practice, because the vast majority of the images would exhibit negligible change over the course of one day. On the other hand, the US intelligence community employs tens of thousands of people sitting in vast rooms staring at satellite images just to keep track of what’s happening in small regions of interest; so one million people is probably about right for the whole world.
  108. There is substantial progress towards a global observatory based on real-time satellite image data: David Jensen and Jillian Campbell, “Digital earth: Building, financing and governing a digital ecosystem for planetary data,” white paper for the UN Science-Policy-Business Forum on the Environment, 2018.
  109. Luke Muehlhauser has written extensively on AI predictions, and I am indebted to him for tracking down original sources for the quotations that follow. See Luke Muehlhauser, “What should we learn from past AI forecasts?” Open Philanthropy Project report, 2016.
  110. A forecast of the arrival of human-level AI within twenty years: Herbert Simon, The New Science of Management Decision (Harper & Row, 1960).
  111. A forecast of the arrival of human-level AI within a generation: Marvin Minsky, Computation: Finite and Infinite Machines (Prentice Hall, 1967).
  112. John McCarthy’s forecast of the arrival of human-level AI within “five to 500 years”: Ian Shenker, “Brainy robots in our future, experts think,” Detroit Free Press, September 30, 1977.
  113. For a summary of surveys of AI researchers on their estimates for the arrival of human-level AI, see aiimpacts.org. An extended discussion of survey results on human-level AI is given by Katja Grace et al., “When will AI exceed human performance? Evidence from AI experts,” arXiv:1705.08807v3 (2018).
  114. For a chart mapping raw computer power against brain power, see Ray Kurzweil, “The law of accelerating returns,” Kurzweilai.net, March 7, 2001.
  115. The Allen Institute’s Project Aristo: allenai.org/aristo.
  116. For an analysis of the knowledge required to perform well on fourth-grade tests of comprehension and common sense, see Peter Clark et al., “Automatic construction of inference-supporting knowledge bases,” in Proceedings of the Workshop on Automated Knowledge Base Construction (2014), akbc.ws/2014.
  117. The NELL project on machine reading is described by Tom Mitchell et al., “Never-ending learning,” Communications of the ACM 61 (2018): 103–15.
  118. The idea of bootstrapping inferences from text is due to Sergey Brin, “Extracting patterns and relations from the World Wide Web,” in The World Wide Web and Databases, ed. Paolo Atzeni, Alberto Mendelzon, and Giansalvatore Mecca (Springer, 1998).
  119. For a visualization of the black-hole collision detected by LIGO, see LIGO Lab Caltech, “Warped space and time around colliding black holes,” February 11, 2016, youtube.com/watch?v=1agm33iEAuo.
  120. The first publication describing observation of gravitational waves: Ben Abbott et al., “Observation of gravitational waves from a binary black hole merger,” Physical Review Letters 116 (2016): 061102.
  121. On babies as scientists: Alison Gopnik, Andrew Meltzoff, and Patricia Kuhl, The Scientist in the Crib: Minds, Brains, and How Children Learn (William Morrow, 1999).
  122. A summary of several projects on automated scientific analysis of experimental data to discover laws: Patrick Langley et al., Scientific Discovery: Computational Explorations of the Creative Processes (MIT Press, 1987).
  123. Some early work on machine learning guided by prior knowledge: Stuart Russell, The Use of Knowledge in Analogy and Induction (Pitman, 1989).
  124. Goodman’s philosophical analysis of induction remains a source of inspiration: Nelson Goodman, Fact, Fiction, and Forecast (University of London Press, 1954).
  125. A veteran AI researcher complains about mysticism in the philosophy of science: Herbert Simon, “Explaining the ineffable: AI on the topics of intuition, insight and inspiration,” in Proceedings of the 14th International Conference on Artificial Intelligence, ed. Chris Mellish (Morgan Kaufmann, 1995).
  126. A survey of inductive logic programming by two originators of the field: Stephen Muggleton and Luc de Raedt, “Inductive logic programming: Theory and methods,” Journal of Logic Programming 19–20 (1994): 629–79.
  127. For an early mention of the importance of encapsulating complex operations as new primitive actions, see Alfred North Whitehead, An Introduction to Mathematics (Henry Holt, 1911).
  128. Work demonstrating that a simulated robot can learn entirely by itself to stand up: John Schulman et al., “High-dimensional continuous control using generalized advantage estimation,” arXiv:1506.02438 (2015). A video demonstration is available at youtube.com/watch?v=SHLuf2ZBQSw.
  129. A description of a reinforcement learning system that learns to play a capture-the-flag video game: Max Jaderberg et al., “Human-level performance in first-person multiplayer games with population-based deep reinforcement learning,” arXiv:1807.01281 (2018).
  130. A view of AI progress over the next few years: Peter Stone et al., “Artificial intelligence and life in 2030,” One Hundred Year Study on Artificial Intelligence, report of the 2015 Study Panel, 2016.
  131. The media-fueled argument between Elon Musk and Mark Zuckerberg: Peter Holley, “Billionaire burn: Musk says Zuckerberg’s understanding of AI threat ‘is limited,’” The Washington Post, July 25, 2017.
  132. On the value of search engines to individual users: Erik Brynjolfsson, Felix Eggers, and Avinash Gannamaneni, “Using massive online choice experiments to measure changes in well-being,” working paper no. 24514, National Bureau of Economic Research, 2018.
  133. Penicillin was discovered several times and its curative powers were described in medical publications, but no one seems to have noticed. See en.wikipedia.org/wiki/History_of_penicillin.
  134. For a discussion of some of the more esoteric risks from omniscient, clairvoyant AI systems, see David Auerbach, “The most terrifying thought experiment of all time,” Slate, July 17, 2014.
  135. An analysis of some potential pitfalls in thinking about advanced AI: Kevin Kelly, “The myth of a superhuman AI,” Wired, April 25, 2017.
  136. Machines may share some aspects of cognitive structure with humans, particularly those aspects dealing with perception and manipulation of the physical world and the conceptual structures involved in natural language understanding. Their deliberative processes are likely to be quite different because of the enormous disparities in hardware.
  137. According to 2016 survey data, the eighty-eighth percentile corresponds to $100,000 per year: American Community Survey, US Census Bureau, www.census.gov/programs-surveys/acs. For the same year, global per capita GDP was $10,133: National Accounts Main Aggregates Database, UN Statistics Division, unstats.un.org/unsd/snaama.
  138. If the GDP growth phases in over ten years or twenty years, it’s worth $9,400 trillion or $6,800 trillion, respectively—still nothing to sneeze at. On an interesting historical note, I. J. Good, who popularized the notion of an intelligence explosion (this page), estimated the value of human-level AI to be at least “one megaKeynes,” referring to the fabled economist John Maynard Keynes. The value of Keynes’s contributions was estimated in 1963 as £100 billion, so a megaKeynes comes out to around $2,200,000 trillion in 2016 dollars. Good pinned the value of AI primarily on its potential to ensure that the human race survives indefinitely. Later, he came to wonder whether he should have added a minus sign.
  139. The EU announced plans for $24 billion in research and development spending for the period 2019–20. See European Commission, “Artificial intelligence: Commission outlines a European approach to boost investment and set ethical guidelines,” press release, April 25, 2018. China’s long-term investment plan for AI, announced in 2017, envisages a core AI industry generating $150 billion annually by 2030. See, for example, Paul Mozur, “Beijing wants A.I. to be made in China by 2030,” The New York Times, July 20, 2017.
  140. See, for example, Rio Tinto’s Mine of the Future program at riotinto.com/australia/pilbara/mine-of-the-future-9603.aspx.
  141. A retrospective analysis of economic growth: Jan Luiten van Zanden et al., eds., How Was Life? Global Well-Being since 1820 (OECD Publishing, 2014).
  142. The desire for relative advantage over others, rather than an absolute quality of life, is a positional good; see Chapter 9.
CHAPTER 4
  143. Wikipedia’s article on the Stasi has several useful references on its workforce and its overall impact on East German life.
  144. For details on Stasi files, see Cullen Murphy, God’s Jury: The Inquisition and the Making of the Modern World (Houghton Mifflin Harcourt, 2012).
  145. For a thorough analysis of AI surveillance systems, see Jay Stanley, The Dawn of Robot Surveillance (American Civil Liberties Union, 2019).
  146. Recent books on surveillance and control include Shoshana Zuboff, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power (PublicAffairs, 2019) and Roger McNamee, Zucked: Waking Up to the Facebook Catastrophe (Penguin Press, 2019).
  147. News article on a blackmail bot: Avivah Litan, “Meet Delilah—the first insider threat Trojan,” Gartner Blog Network, July 14, 2016.
  148. For a low-tech version of human susceptibility to misinformation, in which an unsuspecting individual becomes convinced that the world is being destroyed by meteor strikes, see Derren Brown: Apocalypse, “Part One,” directed by Simon Dinsell, 2012, youtube.com/watch?v=o_CUrMJOxqs.
  149. An economic analysis of reputation systems and their corruption is given by Steven Tadelis, “Reputation and feedback systems in online platform markets,” Annual Review of Economics 8 (2016): 321–40.
  150. Goodhart’s law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” For example, there may once have been a correlation between faculty quality and faculty salary, so the US News & World Report college rankings measure faculty quality by faculty salaries. This has contributed to a salary arms race that benefits faculty members but not the students who pay for those salaries. The arms race changes faculty salaries in a way that does not depend on faculty quality, so the correlation tends to disappear.
  151. An article describing German efforts to police public discourse: Bernhard Rohleder, “Germany set out to delete hate speech online. Instead, it made things worse,” WorldPost, February 20, 2018.
  152. On the “infopocalypse”: Aviv Ovadya, “What’s worse than fake news? The distortion of reality itself,” WorldPost, February 22, 2018.
  153. On the corruption of online hotel reviews: Dina Mayzlin, Yaniv Dover, and Judith Chevalier, “Promotional reviews: An empirical investigation of online review manipulation,” American Economic Review 104 (2014): 2421–55.
  154. Statement of Germany at the Meeting of the Group of Governmental Experts, Convention on Certain Conventional Weapons, Geneva, April 10, 2018.
  155. The Slaughterbots movie, funded by the Future of Life Institute, appeared in November 2017 and is available at youtube.com/watch?v=9CO6M2HsoIA.
  156. For a report on one of the bigger faux pas in military public relations, see Dan Lamothe, “Pentagon agency wants drones to hunt in packs, like wolves,” The Washington Post, January 23, 2015.
  157. Announcement of a large-scale drone swarm experiment: US Department of Defense, “Department of Defense announces successful micro-drone demonstration,” news release no. NR-008-17, January 9, 2017.
  158. Examples of research centers studying the impact of technology on employment are the Work and Intelligent Tools and Systems group at Berkeley, the Future of Work and Workers project at the Center for Advanced Study in the Behavioral Sciences at Stanford, and the Future of Work Initiative at Carnegie Mellon University.
  159. A pessimistic take on future technological unemployment: Martin Ford, Rise of the Robots: Technology and the Threat of a Jobless Future (Basic Books, 2015).
  160. Calum Chace, The Economic Singularity: Artificial Intelligence and the Death of Capitalism (Three Cs, 2016).
  161. For an excellent collection of essays, see Ajay Agrawal, Joshua Gans, and Avi Goldfarb, eds., The Economics of Artificial Intelligence: An Agenda (National Bureau of Economic Research, 2019).
  162. The mathematical analysis behind this “inverted-U” employment curve is given by James Bessen, “Artificial intelligence and jobs: The role of demand” in The Economics of Artificial Intelligence, ed. Agrawal, Gans, and Goldfarb.
  163. For a discussion of economic dislocation arising from automation, see Eduardo Porter, “Tech is splitting the US work force in two,” The New York Times, February 4, 2019. The article cites the following report for this conclusion: David Autor and Anna Salomons, “Is automation labor-displacing? Productivity growth, employment, and the labor share,” Brookings Papers on Economic Activity (2018).
  164. For data on the growth of banking in the twentieth century, see Thomas Philippon, “The evolution of the US financial industry from 1860 to 2007: Theory and evidence,” working paper, 2008.
  165. The bible for jobs data and the growth and decline of occupations: US Bureau of Labor Statistics, Occupational Outlook Handbook: 2018–2019 Edition (Bernan Press, 2018).
  166. A report on trucking automation: Lora Kolodny, “Amazon is hauling cargo in self-driving trucks developed by Embark,” CNBC, January 30, 2019.
  167. The progress of automation in legal analytics, describing the results of a contest: Jason Tashea, “AI software is more accurate, faster than attorneys when assessing NDAs,” ABA Journal, February 26, 2018.
  168. A commentary by a distinguished economist, with a title explicitly evoking Keynes’s 1930 article: Lawrence Summers, “Economic possibilities for our children,” NBER Reporter (2013).
  169. The analogy between data science employment and a small lifeboat for a giant cruise ship comes from a discussion with Yong Ying-I, head of Singapore’s Public Service Division. She conceded that it was correct on the global scale, but noted that “Singapore is small enough to fit in the lifeboat.”
  170. Support for UBI from a conservative viewpoint: Sam Bowman, “The ideal welfare system is a basic income,” Adam Smith Institute, November 25, 2013.
  171. Support for UBI from a progressive viewpoint: Jonathan Bartley, “The Greens endorse a universal basic income. Others need to follow,” The Guardian, June 2, 2017.
  172. Chace, in The Economic Singularity, calls the “paradise” version of UBI the Star Trek economy, noting that in the more recent series of Star Trek episodes, money has been abolished because technology has created essentially unlimited material goods and energy. He also points to the massive changes in economic and social organization that will be needed to make such a system successful.
  173. The economist Richard Baldwin also predicts a future of personal services in his book The Globotics Upheaval: Globalization, Robotics, and the Future of Work (Oxford University Press, 2019).
  174. The book that is viewed as having exposed the failure of “whole-word” literacy education and launched decades of struggle between the two main schools of thought on reading: Rudolf Flesch, Why Johnny Can’t Read: And What You Can Do about It (Harper & Bros., 1955).
  175. On educational methods that enable the recipient to adapt to the rapid rate of technological and economic change in the next few decades: Joseph Aoun, Robot-Proof: Higher Education in the Age of Artificial Intelligence (MIT Press, 2017).
  176. A radio lecture in which Turing predicted that humans would be overtaken by machines: Alan Turing, “Can digital machines think?,” May 15, 1951, radio broadcast, BBC Third Programme. Typescript available at turingarchive.org.
  177. News article describing the “naturalization” of Sophia as a citizen of Saudi Arabia: Dave Gershgorn, “Inside the mechanical brain of the world’s first robot citizen,” Quartz, November 12, 2017.
  178. On Yann LeCun’s view of Sophia: Shona Ghosh, “Facebook’s AI boss described Sophia the robot as ‘complete b——t’ and ‘Wizard-of-Oz AI,’” Business Insider, January 6, 2018.
  179. An EU proposal on legal rights for robots: Committee on Legal Affairs of the European Parliament, “Report with recommendations to the Commission on Civil Law Rules on Robotics (2015/2103(INL)),” 2017.
  180. The GDPR provision on a “right to an explanation” is not, in fact, new: it is very similar to Article 15(1) of the 1995 Data Protection Directive, which it supersedes.
  181. Here are three recent papers providing insightful mathematical analyses of fairness: Moritz Hardt, Eric Price, and Nati Srebro, “Equality of opportunity in supervised learning,” in Advances in Neural Information Processing Systems 29, ed. Daniel Lee et al. (2016); Matt Kusner et al., “Counterfactual fairness,” in Advances in Neural Information Processing Systems 30, ed. Isabelle Guyon et al. (2017); Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan, “Inherent trade-offs in the fair determination of risk scores,” in 8th Innovations in Theoretical Computer Science Conference, ed. Christos Papadimitriou (Dagstuhl Publishing, 2017).
  182. News article describing the consequences of software failure for air traffic control: Simon Calder, “Thousands stranded by flight cancellations after systems failure at Europe’s air-traffic coordinator,” The Independent, April 3, 2018.

CHAPTER 5
  183. Lovelace wrote, “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths.” This was one of the arguments against AI that was refuted by Alan Turing, “Computing machinery and intelligence,” Mind 59 (1950): 433–60.
  184. The earliest known article on existential risk from AI was by Richard Thornton, “The age of machinery,” Primitive Expounder IV (1847): 281.
  185. “The Book of the Machines” was based on an earlier article by Samuel Butler, “Darwin among the machines,” The Press (Christchurch, New Zealand), June 13, 1863.
  186. Another lecture in which Turing predicted the subjugation of humankind: Alan Turing, “Intelligent machinery, a heretical theory” (lecture given to the 51 Society, Manchester, 1951). Typescript available at turingarchive.org.
  187. Wiener’s prescient discussion of technological control over humanity and a plea to retain human autonomy: Norbert Wiener, The Human Use of Human Beings (Riverside Press, 1950).
  188. The front-cover blurb from Wiener’s 1950 book is remarkably similar to the motto of the Future of Life Institute, an organization dedicated to studying the existential risks that humanity faces: “Technology is giving life the potential to flourish like never before . . . or to self-destruct.”
  189. An updating of Wiener’s views arising from his increased appreciation of the possibility of intelligent machines: Norbert Wiener, God and Golem, Inc.: A Comment on Certain Points Where Cybernetics Impinges on Religion (MIT Press, 1964).
  190. Asimov’s Three Laws of Robotics first appeared in Isaac Asimov, “Runaround,” Astounding Science Fiction, March 1942. The laws are as follows:
  191. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  192. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  193. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws. It is important to understand that Asimov proposed these laws as a way to generate interesting story plots, not as a serious guide for future roboticists. Several of his stories, including “Runaround,” illustrate the problematic consequences of taking the laws literally. From the standpoint of modern AI, the laws fail to acknowledge any element of probability and risk: the legality of robot actions that expose a human to some probability of harm—however infinitesimal—is therefore unclear.
  194. The notion of instrumental goals is due to Stephen Omohundro, “The nature of self-improving artificial intelligence” (unpublished manuscript, 2008). See also Stephen Omohundro, “The basic AI drives,” in Artificial General Intelligence 2008: Proceedings of the First AGI Conference, ed. Pei Wang, Ben Goertzel, and Stan Franklin (IOS Press, 2008).
  195. The objective of Johnny Depp’s character, Will Caster, seems to be to solve the problem of physical reincarnation so that he can be reunited with his wife, Evelyn. This just goes to show that the nature of the overarching objective doesn’t matter—the instrumental goals are all the same.
  196. The original source for the idea of an intelligence explosion: I. J. Good, “Speculations concerning the first ultraintelligent machine,” in Advances in Computers, vol. 6, ed. Franz Alt and Morris Rubinoff (Academic Press, 1965).
  197. An example of the impact of the intelligence explosion idea: Luke Muehlhauser, in Facing the Intelligence Explosion (intelligenceexplosion.com), writes, “Good’s paragraph ran over me like a train.”
  198. Diminishing returns can be illustrated as follows: suppose that a 16 percent improvement in intelligence creates a machine capable of making an 8 percent improvement, which in turn creates a 4 percent improvement, and so on. This process reaches a limit at about 36 percent above the original level. For more discussion on these issues, see Eliezer Yudkowsky, “Intelligence explosion microeconomics,” technical report 2013-1, Machine Intelligence Research Institute, 2013.
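One way to check this arithmetic is simply to compound the halving sequence of improvements; a minimal Python sketch (the 16 percent starting point and the halving schedule are taken from the example above):

```python
# Compound a sequence of halving improvements: 16%, 8%, 4%, 2%, ...
# The product converges to about 1.356, i.e., roughly 36 percent above
# the original level, as stated in the note.
level = 1.0
improvement = 0.16
for _ in range(60):            # far more iterations than needed to converge
    level *= 1.0 + improvement
    improvement /= 2.0
print(f"limiting capability: {level:.4f} times the original")   # ~1.3557
```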
  199. For a view of AI in which humans become irrelevant, see Hans Moravec, Mind Children: The Future of Robot and Human Intelligence (Harvard University Press, 1988). See also Hans Moravec, Robot: Mere Machine to Transcendent Mind (Oxford University Press, 2000).

CHAPTER 6
  200. A serious publication provides a serious review of Bostrom’s Superintelligence: Paths, Dangers, Strategies: “Clever cogs,” Economist, August 9, 2014.
  201. A discussion of myths and misunderstandings concerning the risks of AI: Scott Alexander, “AI researchers on AI risk,” Slate Star Codex (blog), May 22, 2015.
  202. The classic work on multiple dimensions of intelligence: Howard Gardner, Frames of Mind: The Theory of Multiple Intelligences (Basic Books, 1983).
  203. On the implications of multiple dimensions of intelligence for the possibility of superhuman AI: Kevin Kelly, “The myth of a superhuman AI,” Wired, April 25, 2017.
  204. Evidence that chimpanzees have better short-term memory than humans: Sana Inoue and Tetsuro Matsuzawa, “Working memory of numerals in chimpanzees,” Current Biology 17 (2007), R1004–5.
  205. An important early work questioning the prospects for rule-based AI systems: Hubert Dreyfus, What Computers Can’t Do (MIT Press, 1972).
  206. The first in a series of books seeking physical explanations for consciousness and raising doubts about the ability of AI systems to achieve real intelligence: Roger Penrose, The Emperor’s New Mind: Concerning Computers, Minds, and the Laws of Physics (Oxford University Press, 1989).
  207. A revival of the critique of AI based on the incompleteness theorem: Luciano Floridi, “Should we be afraid of AI?” Aeon, May 9, 2016.
  208. A revival of the critique of AI based on the Chinese room argument: John Searle, “What your computer can’t know,” The New York Review of Books, October 9, 2014.
  209. A report from distinguished AI researchers claiming that superhuman AI is probably impossible: Peter Stone et al., “Artificial intelligence and life in 2030,” One Hundred Year Study on Artificial Intelligence, report of the 2015 Study Panel, 2016.
  210. News article based on Andrew Ng’s dismissal of risks from AI: Chris Williams, “AI guru Ng: Fearing a rise of killer robots is like worrying about overpopulation on Mars,” Register, March 19, 2015.
  211. An example of the “experts know best” argument: Oren Etzioni, “It’s time to intelligently discuss artificial intelligence,” Backchannel, December 9, 2014.
  212. News article claiming that real AI researchers dismiss talk of risks: Erik Sofge, “Bill Gates fears AI, but AI researchers know better,” Popular Science, January 30, 2015.
  213. Another claim that real AI researchers dismiss AI risks: David Kenny, “IBM’s open letter to Congress on artificial intelligence,” June 27, 2017, ibm.com/blogs/policy/kenny-artificial-intelligence-letter.
  214. Report from the workshop that proposed voluntary restrictions on genetic engineering: Paul Berg et al., “Summary statement of the Asilomar Conference on Recombinant DNA Molecules,” Proceedings of the National Academy of Sciences 72 (1975): 1981–84.
  215. Policy statement arising from the invention of CRISPR-Cas9 for gene editing: Organizing Committee for the International Summit on Human Gene Editing, “On human gene editing: International Summit statement,” December 3, 2015.
  216. The latest policy statement from leading biologists: Eric Lander et al., “Adopt a moratorium on heritable genome editing,” Nature 567 (2019): 165–68.
  217. Etzioni’s comment that one cannot mention risks if one does not also mention benefits appears alongside his analysis of survey data from AI researchers: Oren Etzioni, “No, the experts don’t think superintelligent AI is a threat to humanity,” MIT Technology Review, September 20, 2016. In his analysis he argues that anyone who expects superhuman AI to take more than twenty-five years—which includes this author as well as Nick Bostrom—is not concerned about the risks of AI.
  218. A news article with quotations from the Musk–Zuckerberg “debate”: Alanna Petroff, “Elon Musk says Mark Zuckerberg’s understanding of AI is ‘limited,’” CNN Money, July 25, 2017.
  219. In 2015 the Information Technology and Innovation Foundation organized a debate titled “Are super intelligent computers really a threat to humanity?” Robert Atkinson, director of the foundation, suggests that mentioning risks is likely to result in reduced funding for AI. Video available at itif.org/events/2015/06/30/are-super-intelligent-computers-really-threat-humanity; the relevant discussion begins at 41:30.
  220. A claim that our culture of safety will solve the AI control problem without ever mentioning it: Steven Pinker, “Tech prophecy and the underappreciated causal power of ideas,” in Possible Minds: Twenty-Five Ways of Looking at AI, ed. John Brockman (Penguin Press, 2019).
  221. For an interesting analysis of Oracle AI, see Stuart Armstrong, Anders Sandberg, and Nick Bostrom, “Thinking inside the box: Controlling and using an Oracle AI,” Minds and Machines 22 (2012): 299–324.
  222. Views on why AI is not going to take away jobs: Kenny, “IBM’s open letter.”
  223. An example of Kurzweil’s positive views of merging human brains with AI: Ray Kurzweil, interview by Bob Pisani, June 5, 2015, Exponential Finance Summit, New York, NY.
  224. Article quoting Elon Musk on neural lace: Tim Urban, “Neuralink and the brain’s magical future,” Wait But Why, April 20, 2017.
  225. For the most recent developments in Berkeley’s neural dust project, see David Piech et al., “StimDust: A 1.7 mm³, implantable wireless precision neural stimulator with ultrasonic power and communication,” arXiv: 1807.07590 (2018).
  226. Susan Schneider, in Artificial You: AI and the Future of Your Mind (Princeton University Press, 2019), points out the risks of ignorance in proposed technologies such as uploading and neural prostheses: that, absent any real understanding of whether electronic devices can be conscious and given the continuing philosophical confusion over persistent personal identity, we may inadvertently end our own conscious existences or inflict suffering on conscious machines without realizing that they are conscious.
  227. An interview with Yann LeCun on AI risks: Guia Marie Del Prado, “Here’s what Facebook’s artificial intelligence expert thinks about the future,” Business Insider, September 23, 2015.
  228. A diagnosis of AI control problems arising from an excess of testosterone: Steven Pinker, “Thinking does not imply subjugating,” in What to Think About Machines That Think, ed. John Brockman (Harper Perennial, 2015).
  229. A seminal work on many philosophical topics, including the question of whether moral obligations may be perceived in the natural world: David Hume, A Treatise of Human Nature (John Noon, 1738).
  230. An argument that a sufficiently intelligent machine cannot help but pursue human objectives: Rodney Brooks, “The seven deadly sins of AI predictions,” MIT Technology Review, October 6, 2017.
  231. Pinker, “Thinking does not imply subjugating.”
  232. For an optimistic view arguing that AI safety problems will necessarily be resolved in our favor: Steven Pinker, “Tech prophecy.”
  233. On the unsuspected alignment between “skeptics” and “believers” in AI risk: Alexander, “AI researchers on AI risk.”

CHAPTER 7
  234. For a guide to detailed brain modeling, now slightly outdated, see Anders Sandberg and Nick Bostrom, “Whole brain emulation: A roadmap,” technical report 2008-3, Future of Humanity Institute, Oxford University, 2008.
  235. For an introduction to genetic programming from a leading exponent, see John Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992).
  236. The parallel to Asimov’s Three Laws of Robotics is entirely coincidental.
  237. The same point is made by Eliezer Yudkowsky, “Coherent extrapolated volition,” technical report, Singularity Institute, 2004. Yudkowsky argues that directly building in “Four Great Moral Principles That Are All We Need to Program into AIs” is a sure road to ruin for humanity. His notion of the “coherent extrapolated volition of humankind” has the same general flavor as the first principle; the idea is that a superintelligent AI system could work out what humans, collectively, really want.
  238. You can certainly have preferences over whether a machine is helping you achieve your preferences or you are achieving them through your own efforts. For example, suppose you prefer outcome A to outcome B, all other things being equal. You are unable to achieve outcome A unaided, and yet you still prefer B to getting A with the machine’s help. In that case the machine should decide not to help you—unless perhaps it can do so in a way that is completely undetectable by you. You may, of course, have preferences about undetectable help as well as detectable help.
  239. The phrase “the greatest good of the greatest number” originates in the work of Francis Hutcheson, An Inquiry into the Original of Our Ideas of Beauty and Virtue, In Two Treatises (D. Midwinter et al., 1725). Some have ascribed the formulation to an earlier comment by Wilhelm Leibniz; see Joachim Hruschka, “The greatest happiness principle and other early German anticipations of utilitarian theory,” Utilitas 3 (1991): 165–77.
  240. One might propose that the machine should include terms for animals as well as humans in its own objective function. If these terms have weights that correspond to how much people care about animals, then the end result will be the same as if the machine cares about animals only through caring about humans who care about animals. Giving each living animal equal weight in the machine’s objective function would certainly be catastrophic—for example, we are outnumbered fifty thousand to one by Antarctic krill and a billion trillion to one by bacteria.
  241. The moral philosopher Toby Ord made the same point to me in his comments on an early draft of this book: “Interestingly, the same is true in the study of moral philosophy. Uncertainty about moral value of outcomes was almost completely neglected in moral philosophy until very recently. Despite the fact that it is our uncertainty of moral matters that leads people to ask others for moral advice and, indeed, to do research on moral philosophy at all!”
  242. One excuse for not paying attention to uncertainty about preferences is that it is formally equivalent to ordinary uncertainty, in the following sense: being uncertain about what I like is the same as being certain that I like likable things while being uncertain about what things are likable. This is just a trick that appears to move the uncertainty into the world, by making “likability by me” a property of objects rather than a property of me. In game theory, this trick has been thoroughly institutionalized since the 1960s, following a series of papers by my late colleague and Nobel laureate John Harsanyi: “Games with incomplete information played by ‘Bayesian’ players, Parts I–III,” Management Science 14 (1967, 1968): 159–82, 320–34, 486–502. In decision theory, the standard reference is the following: Richard Cyert and Morris de Groot, “Adaptive utility,” in Expected Utility Hypotheses and the Allais Paradox, ed. Maurice Allais and Ole Hagen (D. Reidel, 1979).
  243. AI researchers working in the area of preference elicitation are an obvious exception. See, for example, Craig Boutilier, “On the foundations of expected expected utility,” in Proceedings of the 18th International Joint Conference on Artificial Intelligence (Morgan Kaufmann, 2003). Also Alan Fern et al., “A decision-theoretic model of assistance,” Journal of Artificial Intelligence Research 50 (2014): 71–104.
  244. A critique of beneficial AI based on a misinterpretation of a journalist’s brief interview with the author in a magazine article: Adam Elkus, “How to be good: Why you can’t teach human values to artificial intelligence,” Slate, April 20, 2016.
  245. The origin of trolley problems: Frank Sharp, “A study of the influence of custom on the moral judgment,” Bulletin of the University of Wisconsin 236 (1908).
  246. The “anti-natalist” movement believes it is morally wrong for humans to reproduce because to live is to suffer and because humans’ impact on the Earth is profoundly negative. If you consider the existence of humanity to be a moral dilemma, then I suppose I do want machines to resolve this moral dilemma the right way.
  247. Statement on China’s AI policy by Fu Ying, vice chair of the Foreign Affairs Committee of the National People’s Congress. In a letter to the 2018 World AI Conference in Shanghai, Chinese president Xi Jinping wrote, “Deepened international cooperation is required to cope with new issues in fields including law, security, employment, ethics and governance.” I am indebted to Brian Tse for bringing these statements to my attention.
  248. A very interesting paper on the non-naturalistic non-fallacy, showing how preferences can be inferred from the state of the world as arranged by humans: Rohin Shah et al., “The implicit preference information in an initial state,” in Proceedings of the 7th International Conference on Learning Representations (2019), iclr.cc/Conferences/2019/Schedule.
  249. Retrospective on Asilomar: Paul Berg, “Asilomar 1975: DNA modification secured,” Nature 455 (2008): 290–91.
  250. News article reporting Putin’s speech on AI: “Putin: Leader in artificial intelligence will rule world,” Associated Press, September 4, 2017.

CHAPTER 8
  251. Fermat’s Last Theorem asserts that the equation aⁿ = bⁿ + cⁿ has no solutions with a, b, and c being positive whole numbers and n being a whole number larger than 2. In the margin of his copy of Diophantus’s Arithmetica, Fermat wrote, “I have a truly marvellous proof of this proposition which this margin is too narrow to contain.” True or not, this guaranteed that mathematicians pursued a proof with vigor in the subsequent centuries. We can easily check particular cases—for example, is 7³ equal to 6³ + 5³? (Almost, because 7³ is 343 and 6³ + 5³ is 341, but “almost” doesn’t count.) There are, of course, infinitely many cases to check, and that’s why we need mathematicians and not just computer programmers.
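In the spirit of checking particular cases by computer, here is a minimal sketch that searches small cubes for exact solutions and near misses (the search range is arbitrary):

```python
# Check a^3 == b^3 + c^3 for small positive integers.  Fermat's Last Theorem
# says there are no exact solutions; 7^3 = 343 versus 6^3 + 5^3 = 341 is a
# near miss, but "almost" doesn't count.
for a in range(2, 13):
    for b in range(1, a):
        for c in range(1, b + 1):
            lhs, rhs = a**3, b**3 + c**3
            if lhs == rhs:
                print("counterexample!", a, b, c)        # never happens
            elif abs(lhs - rhs) <= 2:
                print(f"near miss: {a}^3 = {lhs}, {b}^3 + {c}^3 = {rhs}")
```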
  252. A paper from the Machine Intelligence Research Institute poses many related issues: Scott Garrabrant and Abram Demski, “Embedded agency,” AI Alignment Forum, November 15, 2018.
  253. The classic work on multiattribute utility theory: Ralph Keeney and Howard Raiffa, Decisions with Multiple Objectives: Preferences and Value Tradeoffs (Wiley, 1976).
  254. Paper introducing the idea of inverse RL: Stuart Russell, “Learning agents for uncertain environments,” in Proceedings of the 11th Annual Conference on Computational Learning Theory (ACM, 1998).
  255. The original paper on structural estimation of Markov decision processes: Thomas Sargent, “Estimation of dynamic labor demand schedules under rational expectations,” Journal of Political Economy 86 (1978): 1009–44.
  256. The first algorithms for IRL: Andrew Ng and Stuart Russell, “Algorithms for inverse reinforcement learning,” in Proceedings of the 17th International Conference on Machine Learning, ed. Pat Langley (Morgan Kaufmann, 2000).
  257. Better algorithms for inverse RL: Pieter Abbeel and Andrew Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the 21st International Conference on Machine Learning, ed. Russ Greiner and Dale Schuurmans (ACM Press, 2004).
  258. Understanding inverse RL as Bayesian updating: Deepak Ramachandran and Eyal Amir, “Bayesian inverse reinforcement learning,” in Proceedings of the 20th International Joint Conference on Artificial Intelligence, ed. Manuela Veloso (AAAI Press, 2007).
  259. How to teach helicopters to fly and do aerobatic maneuvers: Adam Coates, Pieter Abbeel, and Andrew Ng, “Apprenticeship learning for helicopter control,” Communications of the ACM 52 (2009): 97–105.
  260. The original name proposed for an assistance game was a cooperative inverse reinforcement learning game, or CIRL game. See Dylan Hadfield-Menell et al., “Cooperative inverse reinforcement learning,” in Advances in Neural Information Processing Systems 29, ed. Daniel Lee et al. (2016).
  261. These numbers are chosen just to make the game interesting.
  262. The equilibrium solution to the game can be found by a process called iterated best response: pick any strategy for Harriet; pick the best strategy for Robbie, given Harriet’s strategy; pick the best strategy for Harriet, given Robbie’s strategy; and so on. If this process reaches a fixed point, where neither strategy changes, then we have found a solution. The process unfolds as follows (a numerical sketch of the computation appears after these steps):
  263. Start with the greedy strategy for Harriet: make 2 paperclips if she prefers paperclips; make 1 of each if she is indifferent; make 2 staples if she prefers staples.
  264. There are three possibilities Robbie has to consider, given this strategy for Harriet:
  265. If Robbie sees Harriet make 2 paperclips, he infers that she prefers paperclips, so he now believes the value of a paperclip is uniformly distributed between 50¢ and $1.00, with an average of 75¢. In that case, his best plan is to make 90 paperclips with an expected value of $67.50 for Harriet.
  266. If Robbie sees Harriet make 1 of each, he infers that she values paperclips and staples at 50¢, so the best choice is to make 50 of each.
  267. If Robbie sees Harriet make 2 staples, then, by the same argument as in the paperclip case above, he should make 90 staples.
  268. Given this strategy for Robbie, Harriet’s best strategy is now somewhat different from the greedy strategy in step 1: if Robbie is going to respond to her making 1 of each by making 50 of each, then she is better off making 1 of each not just if she is exactly indifferent but if she is anywhere close to indifferent. In fact, the optimal policy is now to make 1 of each if she values paperclips anywhere between about 44.6¢ and 55.4¢.
  269. Given this new strategy for Harriet, Robbie’s strategy remains unchanged. For example, if she chooses 1 of each, he infers that the value of a paperclip is uniformly distributed between 44.6¢ and 55.4¢, with an average of 50¢, so the best choice is to make 50 of each. Because Robbie’s strategy is the same as in step 2, Harriet’s best response will be the same as in step 3, and we have found the equilibrium.
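For readers who want to reproduce these numbers, here is a minimal sketch of the iterated best response computation, under the assumptions implicit in the figures above: Harriet’s value p for a paperclip is uniformly distributed between $0 and $1, a staple is worth $1 − p, Harriet makes 2 items, and Robbie then makes 90 paperclips, 90 staples, or 50 of each.

```python
import numpy as np

# Signals Harriet can send with her 2 items:
#   "P" = 2 paperclips, "B" = 1 of each, "S" = 2 staples.
signals = ["P", "B", "S"]
grid = np.linspace(0.0, 1.0, 100001)           # candidate values of p (dollars)

# Robbie's three possible responses and their value to Harriet, given p.
robbie_actions = {"90 paperclips": lambda p: 90 * p,
                  "50 of each":    lambda p: 50 * p + 50 * (1 - p),
                  "90 staples":    lambda p: 90 * (1 - p)}

def harriet_own(p, signal):                    # value of Harriet's own 2 items
    return {"P": 2 * p, "B": 1.0, "S": 2 * (1 - p)}[signal]

# Step 1: Harriet's greedy strategy.
harriet = np.where(grid > 0.5, "P", np.where(grid < 0.5, "S", "B"))

for _ in range(10):
    # Robbie's best response: for each signal, maximize expected value
    # under the posterior over p given that signal.
    robbie = {}
    for s in signals:
        post = grid[harriet == s]
        if post.size == 0:                     # unused signal: fall back to prior
            post = grid
        robbie[s] = max(robbie_actions,
                        key=lambda a: robbie_actions[a](post).mean())
    # Harriet's best response to Robbie's strategy.
    values = np.stack([harriet_own(grid, s) + robbie_actions[robbie[s]](grid)
                       for s in signals])
    new_harriet = np.array(signals)[values.argmax(axis=0)]
    if np.array_equal(new_harriet, harriet):   # fixed point = equilibrium
        break
    harriet = new_harriet

middle = grid[harriet == "B"]
print(robbie)   # {'P': '90 paperclips', 'B': '50 of each', 'S': '90 staples'}
print(f"Harriet makes 1 of each for p in [{middle.min():.3f}, {middle.max():.3f}]")
# prints roughly [0.446, 0.554], matching the thresholds above
```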
  270. For a more complete analysis of the off-switch game, see Dylan Hadfield-Menell et al., “The off-switch game,” in Proceedings of the 26th International Joint Conference on Artificial Intelligence, ed. Carles Sierra (IJCAI, 2017).
  271. The proof of the general result is quite simple if you don’t mind integral signs. Let P(u) be Robbie’s prior probability density over Harriet’s utility for the proposed action a. Then the value of going ahead with a is

EU(a) = ∫_{−∞}^{0} P(u) u du + ∫_{0}^{∞} P(u) u du.

(We will see shortly why the integral is split up in this way.) On the other hand, the value of action d, deferring to Harriet, is composed of two parts: if u > 0, then Harriet lets Robbie go ahead, so the value is u, but if u < 0, then Harriet switches Robbie off, so the value is 0:

EU(d) = ∫_{−∞}^{0} P(u) · 0 du + ∫_{0}^{∞} P(u) u du = ∫_{0}^{∞} P(u) u du.

Comparing the expressions for EU(a) and EU(d), we see immediately that EU(d) ≥ EU(a) because the expression for EU(d) has the negative-utility region zeroed out. The two choices have equal value only when the negative region has zero probability—that is, when Robbie is already certain that Harriet likes the proposed action. The theorem is a direct analog of the well-known theorem concerning the non-negative expected value of information.
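A quick numerical illustration of this comparison, using a Gaussian prior for P(u) (any prior with some mass on u < 0 would do): deferring is worth the expectation of max(u, 0), which can never be less than the expectation of u.

```python
import numpy as np

# EU(a) = E[u]  (go ahead);  EU(d) = E[max(u, 0)]  (defer: Harriet zeroes out
# the cases where u < 0 by switching Robbie off).  EU(d) >= EU(a) always.
rng = np.random.default_rng(0)
for mean in (-1.0, 0.0, 0.5, 2.0):
    u = rng.normal(mean, 1.0, size=1_000_000)      # samples from the prior P(u)
    eu_act = u.mean()
    eu_defer = np.maximum(u, 0.0).mean()
    print(f"prior mean {mean:+.1f}:  EU(a) = {eu_act:+.3f},  EU(d) = {eu_defer:+.3f}")
```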
  272. Perhaps the next elaboration in line, for the one human–one robot case, is to consider a Harriet who does not yet know her own preferences regarding some aspect of the world, or whose preferences have not yet been formed.
  273. To see how exactly Robbie converges to an incorrect belief, consider a model in which Harriet is slightly irrational, making errors with a probability that diminishes exponentially as the size of error increases. Robbie offers Harriet 4 paperclips in return for 1 staple; she refuses. According to Robbie’s beliefs, this is irrational: even at 25¢ per paperclip and 75¢ per staple, she should accept 4 for 1. Therefore, she must have made a mistake—but this mistake is much more likely if her true value is 25¢ than if it is, say, 30¢, because the error costs her a lot more if her value for paperclips is 30¢. Now Robbie’s probability distribution has 25¢ as the most likely value because it represents the smallest error on Harriet’s part, with exponentially lower probabilities for values higher than 25¢. If he keeps trying the same experiment, the probability distribution becomes more and more concentrated close to 25¢. In the limit, Robbie becomes certain that Harriet’s value for paperclips is 25¢.
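A minimal simulation of this convergence, assuming a Boltzmann-style error model (one concrete way of making errors exponentially less likely as they become more costly) and a prior restricted to paperclip values between 25¢ and $1, while Harriet keeps refusing the trade:

```python
import numpy as np

# Robbie's hypothesis space for the paperclip value v excludes the truth:
# it runs from $0.25 to $1.00.  Harriet's gain from accepting 4 paperclips
# for 1 staple is 4v - (1 - v) = 5v - 1, so refusing is an "error" whose cost
# grows with v.  Under a Boltzmann error model, the probability of refusal
# falls off exponentially with that cost.
beta = 5.0                                       # assumed rationality parameter
grid = np.linspace(0.25, 1.00, 751)              # Robbie's candidate values (dollars)
posterior = np.ones_like(grid) / len(grid)       # uniform prior over the support

gain = 5 * grid - 1                              # Harriet's gain from accepting
p_refuse = 1.0 / (1.0 + np.exp(beta * gain))     # exponentially unlikely for large gain

for _ in range(100):                             # Harriet refuses 100 times in a row
    posterior *= p_refuse                        # Bayesian update on each refusal
    posterior /= posterior.sum()

print(f"most probable value: {grid[posterior.argmax()]:.2f}")      # 0.25
print(f"P(value < $0.30)   : {posterior[grid < 0.30].sum():.3f}")  # close to 1
```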
  274. Robbie could, for example, have a normal (Gaussian) distribution for his prior belief about the exchange rate, which stretches from −∞ to +∞.
  275. For an example of the kind of mathematical analysis that may be needed, see Avrim Blum, Lisa Hellerstein, and Nick Littlestone, “Learning in the presence of finitely or infinitely many irrelevant attributes,” Journal of Computer and System Sciences 50 (1995): 32–40. Also Lori Dalton, “Optimal Bayesian feature selection,” in Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing, ed. Charles Bouman, Robert Nowak, and Anna Scaglione (IEEE, 2013).
  276. Here I am rephrasing slightly a question by Moshe Vardi at the Asilomar Conference on Beneficial AI, 2017.
  277. Michael Wellman and Jon Doyle, “Preferential semantics for goals,” in Proceedings of the 9th National Conference on Artificial Intelligence (AAAI Press, 1991). This paper draws on a much earlier proposal by Georg von Wright, “The logic of preference reconsidered,” Theory and Decision 3 (1972): 140–67.
  278. My late Berkeley colleague has the distinction of becoming an adjective. See Paul Grice, Studies in the Way of Words (Harvard University Press, 1989).
  279. The original paper on direct stimulation of pleasure centers in the brain: James Olds and Peter Milner, “Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain,” Journal of Comparative and Physiological Psychology 47 (1954): 419–27.
  280. Letting rats push the button: James Olds, “Self-stimulation of the brain; its use to study local effects of hunger, sex, and drugs,” Science 127 (1958): 315–24.
  281. Letting humans push the button: Robert Heath, “Electrical self-stimulation of the brain in man,” American Journal of Psychiatry 120 (1963): 571–77.
  282. A first mathematical treatment of wireheading, showing how it occurs in reinforcement learning agents: Mark Ring and Laurent Orseau, “Delusion, survival, and intelligent agents,” in Artificial General Intelligence: 4th International Conference, ed. Jürgen Schmidhuber, Kristinn Thórisson, and Moshe Looks (Springer, 2011). One possible solution to the wireheading problem: Tom Everitt and Marcus Hutter, “Avoiding wireheading with value reinforcement learning,” arXiv:1605.03143 (2016).
  283. How it might be possible for an intelligence explosion to occur safely: Benja Fallenstein and Nate Soares, “Vingean reflection: Reliable reasoning for self-improving agents,” technical report 2015-2, Machine Intelligence Research Institute, 2015.
  284. The difficulty agents face in reasoning about themselves and their successors: Benja Fallenstein and Nate Soares, “Problems of self-reference in self-improving space-time embedded intelligence,” in Artificial General Intelligence: 7th International Conference, ed. Ben Goertzel, Laurent Orseau, and Javier Snaider (Springer, 2014).
  285. Showing why an agent might pursue an objective different from its true objective if its computational abilities are limited: Jonathan Sorg, Satinder Singh, and Richard Lewis, “Internal rewards mitigate agent boundedness,” in Proceedings of the 27th International Conference on Machine Learning, ed. Johannes Fürnkranz and Thorsten Joachims (2010), icml.cc/Conferences/2010/papers/icml2010proceedings.zip.

CHAPTER 9
  286. Some have argued that biology and neuroscience are also directly relevant. See, for example, Gopal Sarma, Adam Safron, and Nick Hay, “Integrative biological simulation, neuropsychology, and AI safety,” arxiv.org/abs/1811.03493 (2018).
  287. On the possibility of making computers liable for damages: Paulius Čerka, Jurgita Grigienė, and Gintarė Sirbikytė, “Liability for damages caused by artificial intelligence,” Computer Law and Security Review 31 (2015): 376–89.
  288. For an excellent machine-oriented introduction to standard ethical theories and their implications for designing AI systems, see Wendell Wallach and Colin Allen, Moral Machines: Teaching Robots Right from Wrong (Oxford University Press, 2008).
  289. The sourcebook for utilitarian thought: Jeremy Bentham, An Introduction to the Principles of Morals and Legislation (T. Payne & Son, 1789).
  290. Mill’s elaboration of his tutor Bentham’s ideas was extraordinarily influential on liberal thought: John Stuart Mill, Utilitarianism (Parker, Son & Bourn, 1863).
  291. The paper introducing preference utilitarianism and preference autonomy: John Harsanyi, “Morality and the theory of rational behavior,” Social Research 44 (1977): 623–56.
  292. An argument for social aggregation via weighted sums of utilities when deciding on behalf of multiple individuals: John Harsanyi, “Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility,” Journal of Political Economy 63 (1955): 309–21.
  293. A generalization of Harsanyi’s social aggregation theorem to the case of unequal prior beliefs: Andrew Critch, Nishant Desai, and Stuart Russell, “Negotiable reinforcement learning for Pareto optimal sequential decision-making,” in Advances in Neural Information Processing Systems 31, ed. Samy Bengio et al. (2018).
  294. The sourcebook for ideal utilitarianism: G. E. Moore, Ethics (Williams & Norgate, 1912).
  295. News article citing Stuart Armstrong’s colorful example of misguided utility maximization: Chris Matyszczyk, “Professor warns robots could keep us in coffins on heroin drips,” CNET, June 29, 2015.
  296. Popper’s theory of negative utilitarianism (so named later by Smart): Karl Popper, The Open Society and Its Enemies (Routledge, 1945).
  297. A refutation of negative utilitarianism: R. Ninian Smart, “Negative utilitarianism,” Mind 67 (1958): 542–43.
  298. For a typical argument for risks arising from “end human suffering” commands, see “Why do we think AI will destroy us?,” Reddit, reddit.com/r/Futurology/comments/38fp6o/why_do_we_think_ai_will_destroy_us.
  299. A good source for self-deluding incentives in AI: Ring and Orseau, “Delusion, survival, and intelligent agents.”
  300. On the impossibility of interpersonal comparisons of utility: W. Stanley Jevons, The Theory of Political Economy (Macmillan, 1871).
  301. The utility monster makes its appearance in Robert Nozick, Anarchy, State, and Utopia (Basic Books, 1974).
  302. For example, we can fix immediate death to have a utility of 0 and a maximally happy life to have a utility of 1. See John Isbell, “Absolute games,” in Contributions to the Theory of Games, vol. 4, ed. Albert Tucker and R. Duncan Luce (Princeton University Press, 1959).
  303. The oversimplified nature of Thanos’s population-halving policy is discussed by Tim Harford, “Thanos shows us how not to be an economist,” Financial Times, April 20, 2019. Even before the film debuted, defenders of Thanos began to congregate on the subreddit r/thanosdidnothingwrong/. In keeping with the subreddit’s motto, 350,000 of the 700,000 members were later purged.
  304. On utilities for populations of different sizes: Henry Sidgwick, The Methods of Ethics (Macmillan, 1874).
  305. The Repugnant Conclusion and other knotty problems of utilitarian thinking: Derek Parfit, Reasons and Persons (Oxford University Press, 1984).
  306. For a concise summary of axiomatic approaches to population ethics, see Peter Eckersley, “Impossibility and uncertainty theorems in AI value alignment,” in Proceedings of the AAAI Workshop on Artificial Intelligence Safety, ed. Huáscar Espinoza et al. (2019).
  307. Calculating the long-term carrying capacity of the Earth: Daniel O’Neill et al., “A good life for all within planetary boundaries,” Nature Sustainability 1 (2018): 88–95.
  308. For an application of moral uncertainty to population ethics, see Hilary Greaves and Toby Ord, “Moral uncertainty about population axiology,” Journal of Ethics and Social Philosophy 12 (2017): 135–67. A more comprehensive analysis is provided by Will MacAskill, Krister Bykvist, and Toby Ord, Moral Uncertainty (Oxford University Press, forthcoming).
  309. Quotation showing that Smith was not so obsessed with selfishness as is commonly imagined: Adam Smith, The Theory of Moral Sentiments (Andrew Millar; Alexander Kincaid and J. Bell, 1759).
  310. For an introduction to the economics of altruism, see Serge-Christophe Kolm and Jean Ythier, eds., Handbook of the Economics of Giving, Altruism and Reciprocity, 2 vols. (North-Holland, 2006).
  311. On charity as selfish: James Andreoni, “Impure altruism and donations to public goods: A theory of warm-glow giving,” Economic Journal 100 (1990): 464–77.
  312. For those who like equations: let Alice’s intrinsic well-being be measured by w_A and Bob’s by w_B. Then the utilities for Alice and Bob are defined as follows:

U_A = w_A + C_AB w_B
U_B = w_B + C_BA w_A.

Some authors suggest that Alice cares about Bob’s overall utility U_B rather than just his intrinsic well-being w_B, but this leads to a kind of circularity in that Alice’s utility depends on Bob’s utility which depends on Alice’s utility; sometimes stable solutions can be found but the underlying model can be questioned. See, for example, Hajime Hori, “Nonpaternalistic altruism and functional interdependence of social preferences,” Social Choice and Welfare 32 (2009): 59–77.
  313. Models in which each individual’s utility is a linear combination of everyone’s well-being are just one possibility. Much more general models are possible—for example, models in which some individuals prefer to avoid severe inequalities in the distribution of well-being, even at the expense of reducing the total, while other individuals would really prefer that no one have preferences about inequality at all. Thus, the overall approach I am proposing accommodates multiple moral theories held by individuals; at the same time, it doesn’t insist that any one of those moral theories is correct or should have much sway over outcomes for those who hold a different theory. I am indebted to Toby Ord for pointing out this feature of the approach.
  314. Arguments of this type have been made against policies designed to ensure equality of outcome, notably by the American legal philosopher Ronald Dworkin. See, for example, Ronald Dworkin, “What is equality? Part 1: Equality of welfare,” Philosophy and Public Affairs 10 (1981): 185–246. I am indebted to Iason Gabriel for this reference.
  315. Malice in the form of revenge-based punishment for transgressions is certainly a common tendency. Although it plays a social role in keeping members of a community in line, it can be replaced by an equally effective policy driven by deterrence and prevention—that is, weighing the intrinsic harm done when punishing the transgressor against the benefits to the larger society.
  316. Let E_AB and P_AB be Alice’s coefficients of envy and pride respectively, and assume that they apply to the difference in well-being. Then a (somewhat oversimplified) formula for Alice’s utility could be the following:

U_A = w_A + C_AB w_B − E_AB (w_B − w_A) + P_AB (w_A − w_B)
    = (1 + E_AB + P_AB) w_A + (C_AB − E_AB − P_AB) w_B.

Thus, if Alice has positive pride and envy coefficients, they act on Bob’s welfare exactly like sadism and malice coefficients: Alice is happier if Bob’s welfare is lowered, all other things being equal. In reality, pride and envy typically apply not to differences in well-being but to differences in visible aspects thereof, such as status and possessions. Bob’s hard toil in acquiring his possessions (which lowers his overall well-being) may not be visible to Alice. This can lead to the self-defeating behaviors that go under the heading of “keeping up with the Joneses.”
  317. On the sociology of conspicuous consumption: Thorstein Veblen, The Theory of the Leisure Class: An Economic Study of Institutions (Macmillan, 1899).
  318. Fred Hirsch, The Social Limits to Growth (Routledge & Kegan Paul, 1977).
  319. I am indebted to Ziyad Marar for pointing me to social identity theory and its importance in understanding human motivation and behavior. See, for example, Dominic Abrams and Michael Hogg, eds., Social Identity Theory: Constructive and Critical Advances (Springer, 1990). For a much briefer summary of the main ideas, see Ziyad Marar, “Social identity,” in This Idea Is Brilliant: Lost, Overlooked, and Underappreciated Scientific Concepts Everyone Should Know, ed. John Brockman (Harper Perennial, 2018).
  320. Here, I am not suggesting that we necessarily need a detailed understanding of the neural implementation of cognition; what is needed is a model at the “software” level of how preferences, both explicit and implicit, generate behavior. Such a model would need to incorporate what is known about the reward system.
  321. Ralph Adolphs and David Anderson, The Neuroscience of Emotion: A New Synthesis (Princeton University Press, 2018).
  322. See, for example, Rosalind Picard, Affective Computing, 2nd ed. (MIT Press, 1998).
  323. Waxing lyrical on the delights of the durian: Alfred Russel Wallace, The Malay Archipelago: The Land of the Orang-Utan, and the Bird of Paradise (Macmillan, 1869).
  324. A less rosy view of the durian: Alan Davidson, The Oxford Companion to Food (Oxford University Press, 1999). Buildings have been evacuated and planes turned around in mid-flight because of the durian’s overpowering odor.
  325. I discovered after writing this chapter that the durian was used for exactly the same philosophical purpose by Laurie Paul, Transformative Experience (Oxford University Press, 2014). Paul suggests that uncertainty about one’s own preferences presents fatal problems for decision theory, a view contradicted by Richard Pettigrew, “Transformative experience and decision theory,” Philosophy and Phenomenological Research 91 (2015): 766–74. Neither author refers to the early work of Harsanyi, “Games with incomplete information, Parts I–III,” or Cyert and de Groot, “Adaptive utility.”
  326. An initial paper on helping humans who don’t know their own preferences and are learning about them: Lawrence Chan et al., “The assistive multi-armed bandit,” in Proceedings of the 14th ACM/IEEE International Conference on Human–Robot Interaction (HRI), ed. David Sirkin et al. (IEEE, 2019).
  327. Eliezer Yudkowsky, in Coherent Extrapolated Volition (Singularity Institute, 2004), lumps all these aspects, as well as plain inconsistency, under the heading of muddle—a term that has not, unfortunately, caught on.
  328. On the two selves who evaluate experiences: Daniel Kahneman, Thinking, Fast and Slow (Farrar, Straus & Giroux, 2011).
  329. Edgeworth’s hedonimeter, an imaginary device for measuring happiness moment to moment: Francis Edgeworth, Mathematical Psychics: An Essay on the Application of Mathematics to the Moral Sciences (Kegan Paul, 1881).
  330. A standard text on sequential decisions under uncertainty: Martin Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, 1994).
  331. On axiomatic assumptions that justify additive representations of utility over time: Tjalling Koopmans, “Representation of preference orderings over time,” in Decision and Organization, ed. C. Bartlett McGuire, Roy Radner, and Kenneth Arrow (North-Holland, 1972).
  332. The 2019 humans (who might, in 2099, be long dead or might just be the earlier selves of 2099 humans) might wish to build the machines in a way that respects the 2019 preferences of the 2019 humans rather than pandering to the undoubtedly shallow and ill-considered preferences of humans in 2099. This would be like drawing up a constitution that disallows any amendments. If the 2099 humans, after suitable deliberation, decide they wish to override the preferences built in by the 2019 humans, it seems reasonable that they should be able to do so. After all, it is they and their descendants who have to live with the consequences.
  333. I am indebted to Wendell Wallach for this observation.
  334. An early paper dealing with changes in preferences over time: John Harsanyi, “Welfare economics of variable tastes,” Review of Economic Studies 21 (1953): 204–13. A more recent (and somewhat technical) survey is provided by Franz Dietrich and Christian List, “Where do preferences come from?,” International Journal of Game Theory 42 (2013): 613–37. See also Laurie Paul, Transformative Experience (Oxford University Press, 2014), and Richard Pettigrew, “Choosing for Changing Selves,” philpapers.org/archive/PETCFC.pdf.
  335. For a rational analysis of irrationality, see Jon Elster, Ulysses and the Sirens: Studies in Rationality and Irrationality (Cambridge University Press, 1979).
  336. For promising ideas on cognitive prostheses for humans, see Falk Lieder, “Beyond bounded rationality: Reverse-engineering and enhancing human intelligence” (PhD thesis, University of California, Berkeley, 2018).

CHAPTER 10
  337. On the application of assistance games to driving: Dorsa Sadigh et al., “Planning for cars that coordinate with people,” Autonomous Robots 42 (2018): 1405–26.
  338. Apple is, curiously, absent from this list. It does have an AI research group and is ramping up rapidly. Its traditional culture of secrecy means that its impact in the marketplace of ideas is quite limited so far.
  339. Max Tegmark, interview, Do You Trust This Computer?, directed by Chris Paine, written by Mark Monroe (2018).
  340. On estimating the impact of cybercrime: “Cybercrime cost $600 billion and targets banks first,” Security Magazine, February 21, 2018.

APPENDIX A
  341. The basic plan for chess programs of the next sixty years: Claude Shannon, “Programming a computer for playing chess,” Philosophical Magazine, 7th ser., 41 (1950): 256–75. Shannon’s proposal drew on a centuries-long tradition of evaluating chess positions by adding up piece values; see, for example, Pietro Carrera, Il gioco degli scacchi (Giovanni de Rossi, 1617).
  342. A report describing Samuel’s heroic research on an early reinforcement learning algorithm for checkers: Arthur Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development 3 (1959): 210–29.
  343. The concept of rational metareasoning and its application to search and game playing emerged from the thesis research of my student Eric Wefald, who died tragically in a car accident before he could write up his work; the following appeared posthumously: Stuart Russell and Eric Wefald, Do the Right Thing: Studies in Limited Rationality (MIT Press, 1991). See also Eric Horvitz, “Rational metareasoning and compilation for optimizing decisions under bounded resources,” in Computational Intelligence, II: Proceedings of the International Symposium, ed. Francesco Gardin and Giancarlo Mauri (North-Holland, 1990); and Stuart Russell and Eric Wefald, “On optimal game-tree search using rational meta-reasoning,” in Proceedings of the 11th International Joint Conference on Artificial Intelligence, ed. Natesa Sridharan (Morgan Kaufmann, 1989).
  344. Perhaps the first paper showing how hierarchical organization reduces the combinatorial complexity of planning: Herbert Simon, “The architecture of complexity,” Proceedings of the American Philosophical Society 106 (1962): 467–82.
  345. The canonical reference for hierarchical planning is Earl Sacerdoti, “Planning in a hierarchy of abstraction spaces,” Artificial Intelligence 5 (1974): 115–35. See also Austin Tate, “Generating project networks,” in Proceedings of the 5th International Joint Conference on Artificial Intelligence, ed. Raj Reddy (Morgan Kaufmann, 1977).
  346. A formal definition of what high-level actions do: Bhaskara Marthi, Stuart Russell, and Jason Wolfe, “Angelic semantics for high-level actions,” in Proceedings of the 17th International Conference on Automated Planning and Scheduling, ed. Mark Boddy, Maria Fox, and Sylvie Thiébaux (AAAI Press, 2007).

APPENDIX B
  347. This example is unlikely to be from Aristotle, but may have originated with Sextus Empiricus, who lived probably in the second or third century CE.
  348. The first algorithm for theorem-proving in first-order logic worked by reducing first-order sentences to (very large numbers of) propositional sentences: Martin Davis and Hilary Putnam, “A computing procedure for quantification theory,” Journal of the ACM 7 (1960): 201–15.
  349. An improved algorithm for propositional inference: Martin Davis, George Logemann, and Donald Loveland, “A machine program for theorem-proving,” Communications of the ACM 5 (1962): 394–97.
  350. The satisfiability problem—deciding whether a collection of sentences is true in some world—is NP-complete. The reasoning problem—deciding whether a sentence follows from the known sentences—is co-NP-complete, a class that is thought to be harder than NP-complete problems.
  351. There are two exceptions to this rule: no repetition (a stone may not be played that returns the board to a situation that existed previously) and no suicide (a stone may not be placed such that it would immediately be captured—for example, if it is already surrounded).
  352. The work that introduced first-order logic as we understand it today (Begriffsschrift means “concept writing”): Gottlob Frege, Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens (Halle, 1879). Frege’s notation for first-order logic was so bizarre and unwieldy that it was soon replaced by the notation introduced by Giuseppe Peano, which remains in common use today.
  353. A summary of Japan’s bid for supremacy through knowledge-based systems: Edward Feigenbaum and Pamela McCorduck, The Fifth Generation: Artificial Intelligence and Japan’s Computer Challenge to the World (Addison-Wesley, 1983).
  354. The US efforts included the Strategic Computing Initiative and the formation of the Microelectronics and Computer Technology Corporation (MCC). See Alex Roland and Philip Shiman, Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993 (MIT Press, 2002).
  355. A history of Britain’s response to the re-emergence of AI in the 1980s: Brian Oakley and Kenneth Owen, Alvey: Britain’s Strategic Computing Initiative (MIT Press, 1990).
  356. The origin of the term GOFAI: John Haugeland, Artificial Intelligence: The Very Idea (MIT Press, 1985).
  357. Interview with Demis Hassabis on the future of AI and deep learning: Nick Heath, “Google DeepMind founder Demis Hassabis: Three truths about AI,” TechRepublic, September 24, 2018.

APPENDIX C
  358. Pearl’s work was recognized by the Turing Award in 2011.
  359. Bayes nets in more detail: Every node in the network is annotated with the probability of each possible value, given each possible combination of values for the node’s parents (that is, those nodes that point to it). For example, the probability that Doubles₁₂ has value true is 1.0 when D₁ and D₂ have the same value, and 0.0 otherwise. A possible world is an assignment of values to all the nodes. The probability of such a world is the product of the appropriate probabilities from each of the nodes.
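A minimal sketch of this calculation for the two-dice network described here (assuming fair dice): the probability of a possible world is the product, over the nodes, of each node’s probability given its parents.

```python
import itertools

def p_die(value):                       # D1 and D2 have no parents: fair dice
    return 1.0 / 6.0

def p_doubles(doubles, d1, d2):         # Doubles12 given its parents D1, D2:
    return float(doubles == (d1 == d2)) # 1.0 if consistent, 0.0 otherwise

def world_probability(d1, d2, doubles):
    # Probability of a possible world = product of the per-node probabilities.
    return p_die(d1) * p_die(d2) * p_doubles(doubles, d1, d2)

print(world_probability(3, 3, True))    # 1/36, about 0.0278
print(world_probability(3, 4, True))    # 0.0: an inconsistent world
# Sanity check: the probabilities of all possible worlds sum to 1.
print(sum(world_probability(d1, d2, dbl)
          for d1, d2, dbl in itertools.product(range(1, 7), range(1, 7), (True, False))))
```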
  360. A compendium of applications of Bayes nets: Olivier Pourret, Patrick Naïm, and Bruce Marcot, eds., Bayesian Networks: A Practical Guide to Applications (Wiley, 2008).
  361. The basic paper on probabilistic programming: Daphne Koller, David McAllester, and Avi Pfeffer, “Effective Bayesian inference for stochastic programs,” in Proceedings of the 14th National Conference on Artificial Intelligence (AAAI Press, 1997). For many additional references, see probabilistic-programming.org.
  362. Using probabilistic programs to model human concept learning: Brenden Lake, Ruslan Salakhutdinov, and Joshua Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science 350 (2015): 1332–38.
  363. For a detailed description of the seismic monitoring application and associated probability model, see Nimar Arora, Stuart Russell, and Erik Sudderth, “NET-VISA: Network processing vertically integrated seismic analysis,” Bulletin of the Seismological Society of America 103 (2013): 709–29.
  364. News article describing one of the first serious self-driving car crashes: Ryan Randazzo, “Who was at fault in self-driving Uber crash? Accounts in Tempe police report disagree,” Republic (azcentral.com), March 29, 2017.

APPENDIX D
  365. The foundational discussion of inductive learning: David Hume, Philosophical Essays Concerning Human Understanding (A. Millar, 1748).
  366. Leslie Valiant, “A theory of the learnable,” Communications of the ACM 27 (1984): 1134–42. See also Vladimir Vapnik, Statistical Learning Theory (Wiley, 1998). Valiant’s approach concentrated on computational complexity, Vapnik’s on statistical analysis of the learning capacity of various classes of hypotheses, but both shared a common theoretical core connecting data and predictive accuracy.
  367. For example, to learn the difference between the “situational superko” and “natural situational superko” rules, the learning algorithm would have to try repeating a board position that it had created previously by a pass rather than by playing a stone. The results would be different in different countries.
  368. For a description of the ImageNet competition, see Olga Russakovsky et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision 115 (2015): 211–52.
  369. The first demonstration of deep networks for vision: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, ed. Fernando Pereira et al. (2012).
  370. The difficulty of distinguishing over one hundred breeds of dogs: Andrej Karpathy, “What I learned from competing against a ConvNet on ImageNet,” Andrej Karpathy Blog, September 2, 2014.
  371. Blog post on inceptionism research at Google: Alexander Mordvintsev, Christopher Olah, and Mike Tyka, “Inceptionism: Going deeper into neural networks,” Google AI Blog, June 17, 2015. The idea seems to have originated with J. P. Lewis, “Creation by refinement: A creativity paradigm for gradient descent learning networks,” in Proceedings of the IEEE International Conference on Neural Networks (IEEE, 1988).
  372. News article on Geoff Hinton having second thoughts about deep networks: Steve LeVine, “Artificial intelligence pioneer says we need to start over,” Axios, September 15, 2017.
  373. A catalog of shortcomings of deep learning: Gary Marcus, “Deep learning: A critical appraisal,” arXiv:1801.00631 (2018).
  374. A popular textbook on deep learning, with a frank assessment of its weaknesses: François Chollet, Deep Learning with Python (Manning Publications, 2017).
  375. An explanation of explanation-based learning: Thomas Dietterich, “Learning at the knowledge level,” Machine Learning 1 (1986): 287–315.
  376. A superficially quite different explanation of explanation-based learning: John Laird, Paul Rosenbloom, and Allen Newell, “Chunking in Soar: The anatomy of a general learning mechanism,” Machine Learning 1 (1986): 11–46.

Image Credits

Figure 2: (b) © The Sun / News Licensing; (c) Courtesy of Smithsonian Institution Archives.
Figure 4: © SRI International. creativecommons.org/licenses/by/3.0/legalcode.
Figure 5: (left) © Berkeley AI Research Lab; (right) © Boston Dynamics.
Figure 6: © The Saul Steinberg Foundation / Artists Rights Society (ARS), New York.
Figure 7: (left) © Noam Eshel, Defense Update; (right) © Future of Life Institute / Stuart Russell.
Figure 10: (left) © AFP; (right) Courtesy of Henrik Sorensen.
Figure 11: Elysium © 2013 MRC II Distribution Company L.P. All Rights Reserved. Courtesy of Columbia Pictures.
Figure 14: © OpenStreetMap contributors. OpenStreetMap.org. creativecommons.org/licenses/by/2.0/legalcode.
Figure 19: Terrain photo: DigitalGlobe via Getty Images.
Figure 20: (right) Courtesy of the Tempe Police Department.
Figure 24: © Jessica Mullen / Deep Dreamscope. creativecommons.org/licenses/by/2.0/legalcode.
Paul, 205 Gricean analysis, 205 halting problem, 37–38 hand construction problem, robots, 73 Hardin, Garrett, 31 hard takeoff scenario, 144 Harop (missile), 111 Harsanyi, John, 220, 229 Hassabis, Demis, 271–72, 293 Hawking, Stephen, 4, 153 health advances, 101 He Jiankui, 156 Herbert, Frank, 135 hierarchy of abstract actions, 87–90, 265–66 High-Level Expert Group on Artificial Intelligence (EU), 251 Hillarp, Nils-Åke, 17 Hinton, Geoff, 290 Hirsch, Fred, 230 Hobbes, Thomas, 246 Howard’s End (Forster), 254 Huffington Post, 4 human germline alteration, ban on, 155–56 human–machine teaming, 163–65 human preferences, 211–45 behavior, learning preferences from, 190–92 beneficial AI and, 172–77 changes in, over time, 240–45 different people, learning to make trade-offs between preferences of, 213–27 emotions and, 232–34 errors as to, 236–37 of experiencing self, 238–40 heterogeneity of, 212–13 loyal AI, 215–17 modification of, 243–45 of nice, nasty and envious humans, 227–31 of remembering self, 238–40 stupidity and, 232–34 transitivity of, 23–24 uncertainty and, 235–37 updates in, 241–42 utilitarian AI (See utilitarianism/utilitarian AI) utility theory and, 23–27 human roles, takeover of, 124–31 Human Use of Human Beings (Wiener), 137 humble AI, 175–76 Hume, David, 167, 287–88 IBM, 62, 80, 250 ideal utilitarianism, 219 IEEE (Institute of Electrical and Electronics Engineers), 250 ignorance, 52–53 imitation game, 40–41 inceptionism images, 291 inductive logic programming, 86 inductive reasoning, 287–88 inputs, to intelligent agents, 42–43 instinctive organisms, 18–19 Institute of Electrical and Electronics Engineers (IEEE), 250 instrumental goal, 141–42, 196 insurance underwriters, 119 intelligence, 13–61 action potentials and, 15 brains and, 16, 17–18 computers and, 39–61 consciousness and, 16–17 E. coli and, 14–15 evolutionary origins of, 14–18 learning and, 15, 18–20 nerve nets and, 16 practical reasoning and, 20 rationality and, 20–32 standard model of, 9–11, 13, 48–61, 247 successful reasoning and, 20 intelligence agencies, 104 intelligence explosions, 142–44, 208–9 intelligent agent, 42–48 actions generated by, 48 agent programs and, 48–59 defined, 42 design of, and problem types, 43–45 environment and, 43, 44, 45–46 inputs to, 42–43 multi-agent cooperation design, 94 objectives and, 43, 48–61 reflex, 57–59 intelligent computers. See artificial intelligence (AI) intelligent personal assistants, 67–71, 101 commonsense modeling and, 68–69 design template for, 69–70 education systems, 70 health systems, 69–70 personal finance systems, 70 privacy considerations, 70–71 shortcomings of early systems, 67–68 stimulus–response templates and, 67 understanding content, improvements in, 68 International Atomic Energy Agency, 249 Internet of Things (IoT), 65 interpersonal services as the future of employment, 122–24 algorithmic bias and, 128–30 decisions affecting people, use of machines in, 126–28 robots built in humanoid form and, 124–26 intractable problems, 38–39 inverse reinforcement learning, 191–93 Ishiguro, Hiroshi, 125 is-ought problem, 167 “it’s complicated” argument, 147–48 “it’s impossible” argument, 149–50 “it’s too soon to worry about it” argument, 150–52 jellyfish, 16 Jeopardy! 
(tv show), 80 Jevons, William Stanley, 222 JiaJia (robot), 125 jian ai, 219 Kahneman, Daniel, 238–40 Kasparov, Garry, 62, 90, 261 Ke Jie, 6 Kelly, Kevin, 97, 148 Kenny, David, 153, 163 Keynes, John Maynard, 113–14, 120–21, 122 King Midas problem, 136–40 Kitkit School (software system), 70 knowledge, 79–82, 267–72 knowledge-based systems, 50–51 Krugman, Paul, 117 Kurzweil, Ray, 163–64 language/common sense problem, 79–82 Laplace, Pierre-Simon, 54 Laser-Interferometer Gravitational-Wave Observatory (LIGO), 82–84 learning, 15 behavior, learning preferences from, 190–92 bootstrapping process, 81–82 culture and, 19 cumulative learning of concepts and theories, 82–87 data-driven view of, 82–83 deep learning, 6, 58–59, 84, 86–87, 288–93 as evolutionary accelerator, 18–20 from experience, 285–93 explanation-based learning, 294–95 feature engineering and, 84–85 inverse reinforcement learning, 191–93 reinforcement learning, 17, 47, 55–57, 105, 190–91 supervised learning, 58–59, 285–93 from thinking, 293–95 LeCun, Yann, 47, 165 legal profession, 119 lethal autonomous weapons systems (LAWS), 110–13 Life 3.0 (Tegmark), 114, 138 LIGO (Laser-Interferometer Gravitational-Wave Observatory), 82–84 living standard increases, and AI, 98–100 Lloyd, Seth, 37 Lloyd, William, 31 Llull, Ramon, 40 Lodge, David, 1 logic, 39–40, 50–51, 267–72 Bayesian, 54 defined, 267 first-order, 51–52, 270–72 formal language requirement, 267 ignorance and, 52–53 programming, development of, 271 propositional (Boolean), 51, 268–70 lookahead search, 47, 49–50, 260–61 loophole principle, 202–3, 216 Lovelace, Ada, 40, 132–33 loyal AI, 215–17 Luddism accusation, 153–54 machines, 33 “Machine Stops, The” (Forster), 254–55 machine translation, 6 McAfee, Andrew, 117 McCarthy, John, 4–5, 50, 51, 52, 53, 65, 77 malice, 228–29 malware, 253 map navigation, 257–58 mathematical proofs for beneficial AI, 185–90 mathematics, 33 matrices, 33 Matrix, The (film), 222, 235 MavHome project, 71 mechanical calculator, 40 mental security, 107–10 “merge with machines” argument, 163–65 metareasoning, 262 Methods of Ethics, The (Sidgwick), 224–25 Microsoft, 250 TrueSkill system, 279 Mill, John Stuart, 217–18, 219 Minsky, Marvin, 4–5, 76, 153 misuses of AI, 103–31, 253–54 behavior modification, 104–7 blackmail, 104–5 deepfakes, 105–6 governmental reward and punishment systems, 106–7 intelligence agencies and, 104 interpersonal services, takeover of, 124–31 lethal autonomous weapons systems (LAWS), 110–13 mental security and, 107–10 work, elimination of, 113–24 mobile phones, 64–65 monotonicity and, 24 Moore, G. E., 219, 221, 222 Moore’s law, 34–35 Moravec, Hans, 144 Morgan, Conway Lloyd, 18 Morgenstern, Oskar, 23 Mozi (Mozi), 219 multi-agent cooperation design, 94 Musk, Elon, 153, 164 “Myth of Superhuman AI, The” (Kelly), 148 narrow (tool) artificial intelligence, 46, 47, 136 Nash, John, 30, 195 Nash equilibrium, 30–31, 195–96 National Institutes of Health (NIH), 155 negative altruism, 229–30 NELL (Never-Ending Language Learning) project, 81 nerve nets, 16 NET-VISA, 279–80 Network Enforcement Act (Germany), 108, 109 neural dust, 164–65 Neuralink Corporation, 164 neural lace, 164 neural networks, 288–89 neurons, 15, 16, 19 Never-Ending Language Learning (NELL) project, 81 Newell, Allen, 295 Newton, Isaac, 85–86 New Yorker, The, 88 Ng, Andrew, 151, 152 Norvig, Peter, 2, 62–63 no suicide rule, 287 Nozick, Robert, 223 nuclear industry, 157, 249 nuclear physics, 7–8 Nudge (Thaler & Sunstein), 244 objectives, 11–12, 43, 48–61, 136–42, 165–69. 
See also goals off-switch game, 196–200 onebillion (software system), 70 One Hundred Year Study on Artificial Intelligence (AI100), 149, 150 OpenAI, 56 operations research, 10, 54, 176 Oracle AI systems, 161–63 orthogonality thesis, 167–68 Ovadya, Aviv, 108 overhypothesis, 85 overly intelligent AI, 132–44 fear and greed, 140–42 gorilla problem, 132–36 intelligence explosions and, 142–44, 208–9 King Midas problem, 136–40 paperclip game, 194–96 Parfit, Derek, 225 Partnership on AI, 180, 250 Pascal, Blaise, 21–22, 40 Passage to India, A (Forster), 254 Pearl, Judea, 54, 275 Perdix (drone), 112 Pinker, Steven, 158, 165–66, 168 Planet (satellite corporation), 75 Politics (Aristotle), 114 Popper, Karl, 221–22 Popular Science, 152 positional goods, 230–31 practical reasoning, 20 pragmatics, 204 preference autonomy principle, 220, 241 preferences. See human preferences preference utilitarianism, 220 Price, Richard, 54 pride, 230–31 Primitive Expounder, 133 prisoner’s dilemma, 30–31 privacy, 70–71 probability theory, 21–22, 273–84 Bayesian networks and, 275–77 first-order probabilistic languages, 277–80 independence and, 274 keeping track of not directly observable phenomena, 280–84 probabilistic programming, 54–55, 84, 279–80 programming language, 34 programs, 33 prohibitions, 202–3 Project Aristo, 80 Prolog, 271 proofs for beneficial AI assistance games, 184–210, 192–203 learning preferences from behavior, 190–92 mathematical guarantees, 185–90 recursive self-improvement and, 208–10 requests and instructions, interpretation of, 203–5 wireheading problem and, 205–8 propositional logic, 51, 268–70 Putin, Vladimir, 182, 183 “put it in a box” argument, 161–63 puzzles, 45 quantum computation, 35–36 qubit devices, 35–36 randomized strategy, 29 rationality Aristotle’s formulation of, 20–21 Bayesian, 54 critiques of, 24–26 expected value rule and, 22–23 gambling and, 21–23 game theory and, 28–32 inconsistency in human preferences, and developing theory of beneficial AI, 26–27 logic and, 39–40 monotonicity and, 24 Nash equilibrium and, 30–31 preferences and, 23–27 probability and, 21–22 randomized strategy and, 29 for single agent, 20–27 transitivity and, 23–24 for two agents, 27–32 uncertainty and, 21 utility theory and, 22–26 rational metareasoning, 262 reading capabilities, 74–75 real-world decision problem complexity and, 39 Reasons and Persons (Parfit), 225 Recombinant DNA Advisory Committee, 155 recombinant DNA research, 155–56 recursive self-improvement, 208–10 redlining, 128 reflex agents, 57–59 reinforcement learning, 17, 47, 55–57, 105, 190–91 remembering self, and preferences, 238–40 Repugnant Conclusion, 225 reputation systems, 108–9 “research can’t be controlled” arguments, 154–56 retail cashiers, 117–18 reward function, 53–54, 55 reward system, 17 Rise of the Robots: Technology and the Threat of a Jobless Future (Ford), 113 risk posed by AI, 145–70 deflection arguments, 154–59 denial of problem, 146–54 Robinson, Alan, 5 Rochester, Nathaniel, 4–5 Rutherford, Ernest, 7, 77, 85–86, 150 Sachs, Jeffrey, 230 sadism, 228–29 Salomons, Anna, 116 Samuel, Arthur, 5, 10, 55, 261 Sargent, Tom, 191 scalable autonomous weapons, 112 Schwab, Klaus, 117 Second Machine Age, The (Brynjolfsson & McAfee), 117 Sedol, Lee, 6, 47, 90, 91, 261 seismic monitoring system (NET-VISA), 279–80 self-driving cars, 65–67, 181–82, 247 performance requirements for, 65–66 potential benefits of, 66–67 probabilistic programming and, 281–82 sensing on global scale, 75 sets, 33 Shakey project, 52 Shannon, Claude, 4–5, 62 Shiller, 
Robert, 117 side-channel attacks, 187, 188 Sidgwick, Henry, 224–25 silence regarding risks of AI, 158–59 Simon, Herbert, 76, 86, 265 simulated evolution of programs, 171 SLAM (simultaneous localization and mapping), 283 Slate Star Codex blog, 146, 169–70 Slaughterbot, 111 Small World (Lodge), 1 Smart, R. N., 221–22 smart homes, 71–72 Smith, Adam, 227 snopes.com, 108 social aggregation theorem, 220–21 Social Limits to Growth, The (Hirsch), 230 social media, and content selection algorithms, 8–9 softbots, 64 software systems, 248 solutions, searching for, 257–66 abstract planning and, 264–66 combinatorial complexity and, 258 computational activity, managing, 261–62 15-puzzle and, 258 Go and, 259–61 map navigation and, 257–58 motor control commands and, 263–64 24-puzzle and, 258 “Some Moral and Technical Consequences of Automation” (Wiener), 10 Sophia (robot), 126 specifications (of programs), 248 “Speculations Concerning the First Ultraintelligent Machine” (Good), 142–43 speech recognition, 6 speech recognition capabilities, 74–75 Spence, Mike, 117 SpotMini, 73 SRI, 41–42, 52 standard model of intelligence, 9–11, 13, 48–61, 247 StarCraft, 45 Stasi, 103–4 stationarity, 24 statistics, 10, 176 Steinberg, Saul, 88 stimulus–response templates, 67 Stockfish (chess program), 47 striving and enjoying, relation between, 121–22 subroutines, 34, 233–34 Summers, Larry, 117, 120 Summit machine, 34, 35, 37 Sunstein, Cass, 244 Superintelligence (Bostrom), 102, 145, 150, 167, 183 supervised learning, 58–59, 285–93 surveillance, 104 Sutherland, James, 71 “switch it off” argument, 160–61 synapses, 15, 16 Szilard, Leo, 8, 77, 150 tactile sensing problem, robots, 73 Taobao, 106 technological unemployment. See work, elimination of Tegmark, Max, 4, 114, 138 Tellex, Stephanie, 73 Tencent, 250 tensor processing units (TPUs), 35 Terminator (film), 112, 113 Tesauro, Gerry, 55 Thaler, Richard, 244 Theory of the Leisure Class, The (Veblen), 230 Thinking, Fast and Slow (Kahneman), 238 thinking, learning from, 293–95 Thornton, Richard, 133 Times, 7, 8 tool (narrow) artificial intelligence, 46, 47, 136 TPUs (tensor processing units), 35 tragedy of the commons, 31 Transcendence (film), 3–4, 141–42 transitivity of preferences, 23–24 Treatise of Human Nature, A (Hume), 167 tribalism, 150, 159–60 truck drivers, 119 TrueSkill system, 279 Tucker, Albert, 30 Turing, Alan, 32, 33, 37–38, 40–41, 124–25, 134–35, 140–41, 144, 149, 153, 160–61 Turing test, 40–41 tutoring, 100–101 tutoring systems, 70 2001: A Space Odyssey (film), 141 Uber, 57, 182 UBI (universal basic income), 121 uncertainty AI uncertainty as to human preferences, principle of, 53, 175–76 human uncertainty as to own preferences, 235–37 probability theory and, 273–84 United Nations (UN), 250 universal basic income (UBI), 121 Universal Declaration of Human Rights (1948), 107 universality, 32–33 universal Turing machine, 33, 40–41 unpredictability, 29 utilitarian AI, 217–27 Utilitarianism (Mill), 217–18 utilitarianism/utilitarian AI, 214 challenges to, 221–27 consequentialist AI, 217–19 ideal utilitarianism, 219 interpersonal comparison of utilities, debate over, 222–24 multiple people, maximizing sum of utilities of, 219–26 preference utilitarianism, 220 social aggregation theorem and, 220 Somalia problem and, 226–27 utility comparison across populations of different sizes, debate over, 224–25 utility function, 53–54 utility monster, 223–24 utility theory, 22–26 axiomatic basis for, 23–24 objections to, 24–26 value alignment, 137–38 Vardi, Moshe, 202–3 Veblen,
Thorstein, 230 video games, 45 virtual reality authoring, 101 virtue ethics, 217 visual object recognition, 6 von Neumann, John, 23 W3C Credible Web group, 109 WALL-E (film), 255 Watson, 80 wave function, 35–36 “we’re the experts” argument, 152–54 white-collar jobs, 119 Whitehead, Alfred North, 88 whole-brain emulation, 171 Wiener, Norbert, 10, 136–38, 153, 203 Wilczek, Frank, 4 Wiles, Andrew, 185 wireheading, 205–8 work, elimination of, 113–24 caring professions and, 122 compensation effects and, 114–17 historical warnings about, 113–14 income distribution and, 123 occupations at risk with adoption of AI technology, 118–20 reworking education and research institutions to focus on human world, 123–24 striving and enjoying, relation between, 121–22 universal basic income (UBI) proposals and, 121 wage stagnation and productivity increases, since 1973, 117 “work in human–machine teams” argument, 163 World Economic Forum, 250 World Wide Web, 64 Worshipful Company of Scriveners, 109 Zuckerberg, Mark, 157 About the Author Stuart Russell is a professor of Computer Science and holder of the Smith-Zadeh Chair in Engineering at the University of California, Berkeley. He has served as the Vice-Chair of the World Economic Forum's Council on AI and Robotics and as an advisor to the United Nations on arms control. He is a Fellow of the American Association for Artificial Intelligence, the Association for Computing Machinery, and the American Association for the Advancement of Science. He is the author (with Peter Norvig) of the definitive and universally acclaimed textbook on AI, Artificial Intelligence: A Modern Approach.