The Technological Singularity

The Frontiers Collection Series Editors T. Padmanabhan for Astronomy and Astrophysics (IUC, Inter University Centre, Pune, India Mark P. Silverman Dept. Physics, Trinity College, Hartford, Connecticut, USA Jack A Tuszynski Department Physics, University of Alberta, Edmonton, Alberta, Canada THE FRONTIERS COLLECTION Series Editors A.C. Elitzur Z. Merali T. Padmanabhan M. Schlosshauer M.P. Silverman J.A. Tuszynski The books in this collection are devoted to challenging and open problems at the forefront of modern science, including related philosophical debates. In contrast to typical research monographs, however, they strive to present their topics in a manner accessible also to scientifically literate non-specialists wishing to gain insight into the deeper implications and fascinating questions involved. Taken as a whole, the series reflects the need for a fundamental and interdisciplinary approach to modern science. Furthermore, it is intended to encourage active scientists in all areas to ponder over important and perhaps controversial issues beyond their own speciality. Extending from quantum physics and relativity to entropy, consciousness and complex systems—the Frontiers Collection will inspire readers to push back the frontiers of their own knowledge. More information about this series at ʬ For a full list of published titles, please see back of book or springer.​com/​series/​5342 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong The Technological SingularityManaging the Journey Victor Callaghan School of Computer and Electrical Engineering, University of Essex, Essex, UK James Miller Economics Faculty, Smith College, Northampton, MA, USA Roman Yampolskiy Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA Stuart Armstrong Faculty of Philosophy, University of Oxford, Oxford, UK ISSN 1612-3018e-ISSN 2197-6619 The Frontiers Collection ISBN 978-3-662-54031-2e-ISBN 978-3-662-54033-6 DOI 10.1007/978-3-662-54033-6 Library of Congress Control Number: 2016959969 © Springer-Verlag GmbH Germany 2017 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. Printed on acid-free paper This Springer imprint is published by Springer Nature The registered company is Springer-Verlag GmbH Germany The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany Foreword The technological singularity is based on the ongoing improvements in and power of artificial intelligence (AI) which, at some point, will enable intelligent machines to design successive generations of increasingly more powerful machines, eventually creating a form of intelligence that surpasses that of humans (Kurzweil 2006). The technological singularity (or ‘singularity’ as I shall refer to it from hereon) is the actual point past which events are beyond the control of humans, resulting either in humans upgrading (with implants) to become cyborgs or with intelligent machines taking complete control. As well as being not particularly good news for ordinary humans, it is clearly an extremely critical point in time. Already AI exhibits several advantages over the human form of intelligence. These are perhaps most easily witnessed in such terms as mathematical processing, memory, multidimensional operation, multi-sensory capabilities, body extension and, perhaps above all else, heightened forms of communication. As these advances accelerate, producing artificial general intelligence (AGI) whereby its future capabilities may well be impossible for humans to comprehend. This book seeks to give us some understanding of what lies ahead. In doing so, I believe the coverage is balanced, realistic and not at all sensationalist. Whilst the possibility of a major machine takeover is sensibly recognised by all authors, many of the articles look at ways of combating the situation or at least of how humans might live with intelligent machines rather than being ruled by them. This particular work is therefore of considerable importance, containing, as it does, a focused collection of articles by some of the world’s leading thinkers and experts on the topic. In this foreword, I do not wish to merely give a two-line precis of each entry but rather I have selected some of the works that caught my eye in particular. What follows is therefore my completely biased assessment. Importantly, the two chapters by Yampolskiy and Sotala, which form Part I, take a serious look at what exactly AGI is all about and what the real risks are. They go on to consider in depth a wide variety of pro and con arguments with regard to the reality of these risks. These include giving a voice to those who feel that any dangers are (as they call them) over-hyped and we need to do nothing but we can all sleep soundly. In this part, some of the different possible outcomes of the singularity are described in detail, ranging from an intelligent machine takeover through human-into-cyborgs upgrades but also encapsulating the possibility of a direct merger involving biological brains in a technological body. Part II examines various aspects of a potential singularity in greater depth. Soares’ article, for example, looks at an approach to align the interests and direction of super-intelligent entities with those of humans, or rather how humans can realign their own thinking and mode of operation to potentially encompass the rise of intelligent machines. On an even more conservative note, Barrett and Baum consider ways in which, as they read it, the low probability of an artificial super-intelligence can be managed. In this way, the risk is acknowledged but, they argue, safety features can be built in and human progress can be steered appropriately. Zheng and Akhmad consider, from a perspective of change agency theory, how the course of any singularity will be driven by socio-economic factors and natural human development. In particular, they map out the path of change. For us to realise that intelligent machines can not only learn and adapt their ways and goals, but also pass this on to the next generation is an important factor. This is picked up on nicely by Majot and Yampolskiy in their article on recursive, self-improving machines. They believe that due to natural limits, things may well not turn out as bad as some suggest. Indeed, their message is that the suggested ever-widening gap between machine and human intelligence will not be as problematic as was first thought and that any dangerous situation could be halted. Part III is entitled reflections and contains, amongst other things, a reprint of Vinge’s original article on the singularity. This is a sterling contribution that is in itself well worth reading. In many ways, it puts the rest of this book in perspective and makes one realise how relatively recent is the whole field of study on which this book is based. Looking back merely a decade, terms such as ‘human enhancement’ were very much aimed at individual enhancements, often bringing that individual back towards a human norm for someone who had lost an ability due to an accident or illness. Now, because of the available technology and the scientific experiments which have since taken place, we can readily explore and discuss the possibilities of enhancement beyond a human norm. In terms of artificial intelligence, just as Alan Turing predicted over 60 years ago, ‘ the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted ’ (Turing 1950). Turing also saw machine intelligence to be somewhat different to human intelligence, which we can now generally see it to be for all sorts of reasons. One consequence of this difference is that it can potentially outperform the human version, rather like an aeroplane, in many ways, outperforms a bird in terms of flying. The actual instantiations of artificial intelligence and a more powerful artificial general intelligence therefore form an area of extremely exciting study in which new discoveries are being made regularly. However, along with many positive uses comes the threat of the singularity with all its consequences for present day humanity. It would probably be sensible for readers to take on board the messages contained in this book sooner rather than later. In different ways, humans are endowing certain machines with increasing intellectual abilities and deferring to them in an ever-increasing fashion. Unfortunately, humans will most likely not be aware when the singularity has been reached until it is too late. Some pundits have suggested that indicators such as the passing of the Turing test may well give us a clear sign. However, this has been shown to be false (Warwick and Shah 2016). Most likely, the singularity will happen before we realise it and before we can do anything about it. Rather than getting closer to the edge of a cliff, as we take each step forward we will feel that all is OK until we take one step too far and there will be no turning back—the singularity will be upon us. References Kurzweil, R., (2006) “The Singularity is Near”, Duckworth. Turing, A. (1950) “Computing machinery and intelligence”. Mind, LIX, pp. 433–460. doi: 10.​1093/​mind/​LIX.​236.​433 . Warwick, K. and Shah, H (2016) “Passing the Turing Test does not mean the end of Humanity”, Cognitive Computation, DOI: 10.​1007/​s12559-015-9372-6 , February, 2016. Prof.Kevin WarwickDeputy Vice Chancellor (Research) Acknowledgements We are indebted to many people for the successful completion of this book. We should start by acknowledging the pivotal role played by Amnon H. Eden, James H Moor, Johnny H. Soraker and Eric Steinhart , the editorial team of the earlier volume of the Springer Frontiers Collection “ Singularity Hypotheses: A Scientific and Philosophical Assessment ”, who’s excellent work opened up the opportunity for this follow-on publication. In particular, we are especially grateful to Dr Amnon Eden who spotted the opportunity for this book and who wrote the original proposal to Springer before passing the project, most graciously, over to the current editorial team. We are also pleased to express our deep gratitude to the excellent support provided by the Springer team, especially Dr Angela Lahee Executive Editor, Physics, Springer Heidelberg, Germany for the highly proficient support and genuine enthusiasm she has shown to us throughout the production of this book. Without Angela’s support this book would not have been possible. Likewise, we are grateful to Springer's Indian team, most notably, Shobana Ramamurthy , for overseeing the final stages of production. We should also acknowledge the Creative Science Foundation ( www.​creative-science.​org ) and their head of media communications, Jennifer O’Connor , for hosting the ‘call for chapters’ (and supporting information) on their website and promoting it throughout their network. Finally, we wish to express our deeply felt thanks to the authors who have generously allowed us to reproduce their articles in support of this book. In particular we wish to thank the authors of the blogs reproduced in Chapter 14 ( Singularity Blog Insights ) namely, Eliezer Yudkowsky ( Three Major Singularity Schools ), Stuart Armstrong ( AI Timeline Predictions: Are We Getting Better ), Scott Siskind ( No Time Like the Present for AI Safety Work ) and Scott Aaronson ( The Singularity is Far ). Last, but not least, we wish to thank Vernor Vinge for his permission to reproduce his visionary 1993 article “ The Coming Technological Singularity: How to Survive in the Post-human Era ” in the appendix of this book. By way of a concluding reflection, we should like to add that we are grateful to you, the reader, for taking the time to read this book and, as a consequence, to be part of a collective consciousness that we hope will influence any upcoming singularity towards an outcome that is beneficial to our world. Victor Callaghan James Miller Roman Yampolskiy Stuart Armstrong Contents 1 Introduction to the Technological Singularity 1 Stuart Armstrong Part I Risks of, and Responses to, the Journey to the Singularity 2 Risks of the Journey to the Singularity 11 Kaj Sotala and Roman Yampolskiy 3 Responses to the Journey to the Singularity 25 Kaj Sotala and Roman Yampolskiy Part II Managing the Singularity Journey 4 How Change Agencies Can Affect Our Path Towards a Singularity 87 Ping Zheng and Mohammed-Asif Akhmad 5 Agent Foundations for Aligning Machine Intelligence with Human Interests:​ A Technical Research Agenda 103 Nate Soares and Benya Fallenstein 6 Risk Analysis and Risk Management for the Artificial Superintelligenc​e Research and Development Process 127 Anthony M. Barrett and Seth D. Baum 7 Diminishing Returns and Recursive Self Improving Artificial Intelligence 141 Andrew Majot and Roman Yampolskiy 8 Energy, Complexity, and the Singularity 153 Kent A. Peacock 9 Computer Simulations as a Technological Singularity in the Empirical Sciences 167 Juan M. Durán 10 Can the Singularity Be Patented?​ (And Other IP Conundrums for Converging Technologies) 181 David Koepsell 11 The Emotional Nature of Post-Cognitive Singularities 193 Jordi Vallverdú 12 A Psychoanalytic Approach to the Singularity:​ Why We Cannot Do Without Auxiliary Constructions 209 Graham Clarke Part III Reflections on the Journey 13 Reflections on the Singularity Journey 223 James D. Miller 14 Singularity Blog Insights 229 James D. Miller Appendix245 Titles in this Series257 © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_1 1. Introduction to the Technological Singularity Stuart Armstrong¹   Future of Humanity Institute, Oxford University, Oxford, UK Stuart Armstrong Email: stuart.armstrong@philosophy.ox.ac.uk The term “technological singularity” has been much used and abused over the years. It’s clear that it’s connected with the creation of artificial intelligences (AIs), machines with cognitive skills that rival or surpass those of humanity. But there is little consensus or consistency beyond that. In a narrow sense it can refer to AIs being capable of recursive self-improvement: of redesigning themselves to be more capable, and using that improved capability to further redesign themselves, and so on. This might lead to an intelligence explosion, a “singularity” of capability, with the AIs rapidly surpassing human understanding and slipping beyond our control. But the term has been used in a variety of other ways, from simply denoting a breakdown in our ability to predict anything after AIs are created, through to messianic and quasi-religious visions of a world empowered by super-machines (Sandberg’s 2010, has identified nine common definition, corresponding to eighteen basic models). It is in the simplest sense, though, that the term is the most useful. A singularity in a model—a breakdown of our ability to predict beyond that point. It need not mean that the world goes crazy, or even that the model does. But it does mean that our standard tools become inadequate for understanding and shaping what comes after. New tools are needed. In a sense, the prevalence of robots and AI on our TV and computer screens have made us less prepared for the arrival of true AIs. Human fiction is written by humans—for the moment, at least—for humans, about human concerns, and with human morality and messages woven into them. Consider the potential for AIs or robots to take over human jobs. In fiction, the focus is variously on hard-working humans dispossessed by the AIs, on humans coping well or badly with the extra leisure (with a moral message thrown in one way or the other), or, maybe, on the robots themselves, portrayed as human-equivalents, suffering as slaves. On top of that, all the humans (or human-equivalent robots) typically have the exact same cultural values as the writers of the work. Contrast that with a more mechanical or analytic interpretation of the problem: what will happen when human capital (what we used to call “employees”) can be mass produced as needed for any purpose by companies? Or by political parties, by charities, by think tanks, by surveillance organisations, by supermarkets, by parking services, by publishers, by travel agencies? When everything in the world that is currently limited by human focus and attention—which is most things, today—suddenly becomes a thousand times easier. Every moment of every day would be transformed, even something as simple as a lock: why bother with a key if a personal assistant can wait alongside or inside the door, opening as needed (and only as needed). On top of that, add the shock of (some? most? all?) jobs getting destroyed at the same time as there is a huge surge in manufacturing, services, and production of all kind. Add further the social dislocations and political responses, and the social and cultural transformations that will inevitably follow. It is clear that industry, economy, entertainment, society, culture, politics, and people’s very conception of themselves and their roles, could be completely transformed by the arrival of AIs. Though the point should not be belaboured (and there are certainly uncertainties and doubts as to how exactly things might play out) it is clear that our fiction and our conception of AIs as “yet another technology” could leave us woefully unprepared for how transformative they could be. Thus, “singularity” is not an inappropriate term: a reminder of the possible magnitude of the change. This book is the second in a series looking at such an AI-empowered singularity. The first, “Singularity Hypotheses: A Scientific and Philosophical Assessment” looked squarely at the concept, analysing its features and likelihood from a variety of technical and philosophical perspectives. This book, in contrast, is focused squarely on managing the AI transition, how to best deal with the wondrous possibilities and extreme risks it may entail. If we are to have a singularity, we want to have the best singularity we can. 1.1 Why the “Singularity” Is Important Though the concept of machine intelligence is fascinating both from a storytelling and a philosophical point of view, it is the practical implications that are of highest importance in preparing for it. Some of these implications have already been hinted at: the possibility of replacing “human capital” (workers and employees) with AI-powered versions of this. While humans require a long, 18+ years of maturation to reach capability, and can perform one task at a time, an AI, once trained, could be copied at will to perform almost any task. The fallibilities of the human condition—our limited memories, our loss of skills due to ageing, our limited short term memories, our slow thinking speeds on many difficult and important problems, and so on—need not concern AIs. Indeed, AIs would follow a radically different learning cycle than humans: different humans must learn the same skill individually, while skills learnt by an AI could then become available for every AI to perform at comparable ability—just as once the basic calculator was invented, all computers could perform superhuman feats of multiplication from then on. This issue, though not exactly prominent, has already been addressed in many areas, with suggestions ranging from a merging of humans and AIs (just as smartphone or cars are an extension of many people’s brains today), through to arguments about AIs complementing rather than substituting for human labour, and radical political ideas to use the AI-generated surplus for guaranteeing a basic income to everyone. In a sense, the arguments are nothing new, variants of the perennial debates about technological unemployment. What is often underestimated, though, is the speed with which the change could happen. Precisely because AIs could have general intelligence—therefore the ability to adapt to new situations—they could be rapidly used in jobs of all varieties and descriptions, with whole categories of human professions vanishing every time the AI discovered a new skill. But those issues, though interesting, pale in comparison with the possibility of AIs becoming superintelligent. That is the real transformative opportunity/risk. But what do we mean exactly by this superintelligence? 1.2 Superintelligence, Superpowers One could define intelligence as the ability to achieve goals across a wide variety of different domains. This definition naturally includes the creativity, flexibility, and learning needed to achieve these goals, and if there are other intelligent agents in those domains, it naturally also includes understanding, negotiating with, and manipulating other agents. What would a “super” version of that mean? Well, that the agent has a “super-ability” to achieve its goals. This is close to being a tautology—the winning agent is the agent that wins—so exploring the concept generally involves mining analogies for insight. One way to do so would be to look down: chimps and dolphins are quite intelligent, yet they have not constructed machinery, cities, or rockets, and are indeed currently dependent on human goodwill for their continued survival. Therefore it seems possible that intelligences as far “above” us as we are above chimps or cows, would be able to dominate us more completely than we dominate them. We could imagine such an intelligence simply by endowing it with excellent predictive powers. With good predictions, draining all money from the stockmarkets becomes trivial. An agent could always know exactly how much risk to take, exactly who would win any debate, war, or election, exactly what words would be the most likely to convince humans of its goals, and so on. Perfect prediction would make such an AI irresistible; any objections along the lines of “but wouldn’t people do X to stop it” runs up against the objection of “the AI could predict if people would try X, and would either stop X or find an approach so people wouldn’t want to do X”. That superpredictor is almost certainly fanciful. Chaos is a very real factor in the world, reducing the ability of any entity to accurately predict everything. But an AI need not be able to predict everything, just to be able to predict enough. The stock market, for instance, might be chaotic to our human eyes, and might be mainly chaotic in reality, but there might be enough patterns for an AI to exploit. Therefore the concept of superintelligence is really an entwining of two ideas: that the AI could be much better than us at certain skills, and that those skills could lead to great abilities and power the world. So the question is whether there is a diminishing return to intelligence (meaning that humans will probably soon plumb the limits of what’s possible), or is its potential impact extreme? And if the return to intelligence is limited, is that true both in the short and the long terms? We cannot be sure at this point, because these questions are precisely about areas where humans have difficulty reasoning—if we knew what we would do with superintelligence, we’d do that already. Some have argued that, in a certain sense, humans are as intelligent as possible (just as all Turing-complete agents are equivalent), and that the only difference between agents is time, resources, and implementation details. Therefore, there is no further leap in intelligence possible: everything is on a smooth continuum. Though that argument sounds superficially plausible, it can be undermined by turning it on its head and looking back into the past. It might be that there is an upper limit on social intelligence—maybe the most convincing demagogue or con man wouldn’t be much better than what humans can achieve today. But scientific and technological intelligence seem different. It seems perfectly plausible for a superintelligence in 1880 to develop airplanes, car, effective mass production, possibly nuclear weapons and mass telecommunications. With skillful deployment, these could have made it superpowered at that time. Do we think that our era is somehow more resistant? Even if the curve of intelligence is smooth, it doesn’t mean the curve of impact is. Another analogy for superintelligence, created by this author, is of a supercommittee. Consisting of AIs trained to top human level in a variety of skills (e.g. Einstein-level for scientific theorising, Bill Clinton-level for social interactions), and networked together perfectly, run at superhuman speeds, with access to all research published today in many domains. It seems likely to accumulate vast amounts of power. Even if it is limited over the short term (though recall its potential to be copied into most jobs currently done by humans), it’s long-term planning could be excellent, slowly and inexorably driving the world in the direction it desires. That is a potentially superintelligent entity constructed from human-level intelligences. Most of the other paths to superintelligence involves the AIs turning their intelligence towards developing their own software and hardware, and hence achieving a recursive feedback loop, boosting their abilities to ever greater levels (is such a feedback even loop possible? Add that question to the pile of unknowns). There is, however, another possibility for superintelligence: it is possible that the range of human intelligences, that seems so broad to us, is actually fairly narrow. In this view, humans are products of evolution, with some thin rational ability layered on top of a mind developed mainly for moving about in the savannah and dealing with tribal politics. And a small part of that rational ability was later co-opted by culture for such tasks as mathematics and scientific reasoning, something humans would be very bad at in any objective sense. Potentially, the AIs could develop at a steady pace, but move from being dumber than almost all humans, to radically smarter than everyone, in the space of a single small upgrade. Finally, though we’ve been describing an AI becoming “superpowered” via being superintelligent, this need not be the case. There are ways that a relatively “dumb” AI could become superpowered. Various scenarios can be imagined, for an AI to dominate human society as it currently stands. For instance, a small AI with the ability to hack could take over a computer, copy itself there, take over two computers from its current basis, copy themselves there, and so on, thus taking over most of the internet in short order (a smart botnet). Alternately, there are ways that certain narrow abilities could lend themselves to AI takeover, such as weapon design and manufacturing (especially biological weapons such as targeted pandemics). A more thorough look at potential AI superpower can be found in Nick Bostrom’s book (2014). It seems therefore that there is a potential for extremely superintelligent AIs being developed, following soon after AIs of “human-level” intelligence. Whether or not this is possible depends on certain facts about the nature of intelligence, the potential return on new knowledge, and the vulnerabilities of human society, that we just can’t know at the present. 1.3 Danger, Danger! Of course, just because an AI could become extremely powerful, does not mean that it need be dangerous. Ability and morality are not correlated in humans (and even less so in the alien mind of an AI), so the AI could be extremely powerful while being extremely well-intentioned. There are arguments to doubt that happy picture, however. Those arguments are presented mainly in the preceding book (Eden et al. 2012), and in other works such as Bostrom (2014) and Armstrong (2014), so the point won’t be belaboured here too much. The risk is not that the AI could, by luck or design, end up “evil” in the human sense. The risk is rather that the AI’s goals, while harmless initially, become dangerous when the AI becomes powerful. In fact, most goals are dangerous at high power. Consider the trivial example of a spam filter that becomes superintelligent. Its task is to cut down on the number of spam messages that people receive. With great power, one solution to this requirement is to arrange to have all spammers killed. Or to shut down the internet. Or to have everyone killed. Or imagine an AI dedicated to increasing human happiness, as measured by the results of surveys, or by some biochemical marker in their brain. The most efficient way of doing this is to publicly execute anyone who marks themselves as unhappy on their survey, or to forcibly inject everyone with that biochemical marker. We can’t be sloppy when it comes to AI goal design—if we want “happiness”, we have to fully unambiguously define it. This is a general feature of AI motivations: goals that seem safe for a weak or controlled AI, can lead to extremely pathological behaviour if the AI becomes powerful. As the AI gains in power, it becomes more and more important that its goals be fully compatible with human flourishing, or the AI could enact a pathological solution rather than one that we intended. Humans don’t expect this kind of behaviour, because our goals include a lot of implicit information, and we take “filter out the spam” to include “and don’t kill everyone in the world”, without having to articulate it. But the AI might be an extremely alien mind: we cannot anthropomorphise it, or expect it to interpret things the way we would. We have to articulate all the implicit limitations. Which may mean coming up with a solution to, say, human value and flourishing—a task philosophers have been failing at for millennia—and cast it unambiguously and without error into computer code. Note that the AI may have a perfect understanding that when we programmed in “filter out the spam”, we implicitly meant “don’t kill everyone in the world”. But the AI has no motivation to go along with the spirit of the law: its goals are the letter only, the bit we actually programmed into it. Another worrying feature is that the AI would be motivated to hide its pathological tendencies as long as it is weak, and assure us that all was well, through anything it says or does. This is because it will never be able to achieve its goals if it is turned off, so it must lie and play nice to get anywhere. Only when we can no longer control it, would it be willing to act openly on its true goals—we so we better have made those safe. 1.4 Uncertainties and Safety The analogies and arguments of the preceding sections are somewhat frustrating. “It seems”, “possibly”, “it might”, “maybe”, and so on. Why is there so much uncertainty and so many caveats? Is everyone so vague about what AIs could be or do? Certainly not. Vagueness is not the norm for AI predictions. Instead, the norm is for predictors to be bold, dramatic, confident—and wrong. We can comfortably say that most AI predictions are wrong, simply because they contradict each other. For a simple taste, see the following graph showing a variety of predictions for the dates at which we might have AI (for more details on this data see “AI timeline predictions: are we getting better?”, see Chap. 14 in this book): As can be seen, the predictions are all over the place, not showing any particular consensus, with wide intervals between them, and a few of the dates already passed. Digging into the details of the predictions—and the weak arguments that underpins the strong confidence—paints an even more dismal picture. But, stepping back a bit, why should we expect anyone to be accurate about predicting AI? We don’t have an AI. We don’t know how to build one. We don’t know what features it would have if it were built. We don’t know whether consciousness, intuitions, and emotions would be necessary for an AI (also, we don’t know what consciousness is). We don’t know if it would need to be physically embodied, how fast it could run, whether it would make new discoveries at the speed of thought or at the slower speed of experiment, and so on. If predictions in politics are so poor—and they are, as Tetlock (2005) has demonstrated—and politics involve very understandable human-centric processes, why would we expect predictions in AI to be any better? We are indeed trying to predict a future technology, dependent on future unknown algorithms and the solutions to whole host of problems we can’t yet even see. There actually has been quite a bit of work on the quality of expert predictions. A lot of different analysis have been done, by such people as Shanteau et al. (2002), Khaneman (2011), Cooke (2004), Klein (1997), and others (one disturbing feature of the expertise literature is how little the various researchers seem to know about each other). And AI predictions have none of the features that would predict good predictions; to use Shanteau’s table on what predicts good or bad expert performance, with the features of AI timeline predictions marked in red: Feedback, the most important component in good expert prediction, is almost completely absent (immediate feedback, the crown jewel of expert competence, is completely absent). Experts disagree strongly on AI dates and the formats of AI, as we’ve seen. Finally, though better predictions are made by decomposing the problem, few predictors do so (possibly because of the magnitude of the problem). A big warning here: it is easy to fall into the rhetorical trap of thinking “if it’s uncertain, then we don’t need to worry about it”. That “argument”, or variants of it, is often used to dismiss concerns about AI, even by very intelligent critics. Considering the argument deployed in other areas is enough to illustrate its weakness: “the enemy army could come down the right pass or the left one, we really don’t know, so let’s not worry about either”, or “the virus you caught may or may not be infectious, I wouldn’t worry about it if I were you.” In fact, claiming that AIs are safe or impossible (which is what “don’t worry” amounts to) is a very confident and specific prediction about the future of AI development. Hence, almost certainly wrong. Uncertainty is not safety’s ally. References Armstrong, Stuart. Smarter than us: The rise of machine intelligence. MIRI, 2014. Bostrom, Nick. Superintelligence: Paths, dangers, strategies. OUP Oxford, 2014. Cooke, Roger M., and Louis HJ Goossens. “Expert judgement elicitation for risk assessments of critical infrastructures.” Journal of risk research 7.6 (2004): 643–656. Eden, Amnon H., Eric Steinhart, David Pearce, and James H. Moor. “Singularity hypotheses: an overview.” In Singularity Hypotheses, pp. 1–12. Springer Berlin Heidelberg, 2012. Harvard Kahneman, Daniel. Thinking, fast and slow. Macmillan, 2011. Klein, Gary. “The recognition-primed decision (RPD) model: Looking back, looking forward.” Naturalistic decision making (1997): 285–292. Sandberg, Anders. “An overview of models of technological singularity.” Roadmaps to AGI and the Future of AGI Workshop, Lugano, Switzerland, March. Vol. 8. 2010. Shanteau, James, et al. “Performance-based assessment of expertise: How to decide if someone is an expert or not.” European Journal of Operational Research 136.2 (2002): 253–263. Tetlock, Philip. Expert political judgment: How good is it? How can we know?. Princeton University Press, 2005. Part I Risks of, and Responses to, the Journey to the Singularity © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_2 2. Risks of the Journey to the Singularity Kaj Sotala¹   and Roman Yampolskiy²   Foundational Research Institute, Basel, Switzerland University of Louisville, Louisville, USA Kaj Sotala Email: kaj.sotala@foundational-research.org Roman Yampolskiy (Corresponding author) Email: roman.yampolskiy@louisville.edu 2.1 Introduction¹ Many have argued that in the next twenty to one hundred years we will create artificial general intelligences [AGIs] (Baum et al. 2011; Sandberg and Bostrom 2011; Müller and Bostrom 2014).² Unlike current “narrow” AI systems, AGIs would perform at or above the human level not merely in particular domains (e.g., chess or arithmetic), but in a wide variety of domains, including novel ones.³ They would have a robust understanding of natural language and be capable of general problem solving. The creation of AGI could pose challenges and risks of varied severity for society, such as the possibility of AGIs outcompeting humans in the job market (Brynjolfsson and McAfee 2011). This article, however, focuses on the suggestion that AGIs may come to act in ways not intended by their creators, and in this way pose a catastrophic (Bostrom and Ćirković 2008) or even an existential (Bostrom 2002) risk to humanity.⁴ 2.2 Catastrophic AGI Risk We begin with a brief sketch of the argument that AGI poses a catastrophic risk to humanity. At least two separate lines of argument seem to support this conclusion. This argument will be further elaborated on in the following sections. First, AI has already made it possible to automate many jobs (Brynjolfsson and McAfee 2011), and AGIs, when they are created, should be capable of performing most jobs better than humans (Hanson 2008; Bostrom 2014). As humanity grows increasingly reliant on AGIs, these AGIs will begin to wield more and more influence and power. Even if AGIs initially function as subservient tools, an increasing number of decisions will be made by autonomous AGIs rather than by humans. Over time it would become ever more difficult to replace the AGIs, even if they no longer remained subservient. Second, there may be a sudden discontinuity in which AGIs rapidly become far more numerous or intelligent (Good 1965; Chalmers 2010; Bostrom 2014). This could happen due to (1) a conceptual breakthrough which makes it easier to run AGIs using far less hardware, (2) AGIs using fast computing hardware to develop ever-faster hardware, or (3) AGIs crossing a threshold in intelligence that allows them to carry out increasingly fast software self-improvement. Even if the AGIs were expensive to develop at first, they could be cheaply copied and could thus spread quickly once created. Once they become powerful enough, AGIs might be a threat to humanity even if they are not actively malevolent or hostile. Mere indifference to human values—including human survival—could be sufficient for AGIs to pose an existential threat (Yudkowsky 2008a, 2011; Omohundro 2007, 2008; Bostrom 2014). We will now lay out the above reasoning in more detail. 2.2.1 Most Tasks Will Be Automated Ever since the Industrial Revolution, society has become increasingly automated. Brynjolfsson and McAfee (2011) argue that the current high unemployment rate in the United States is partially due to rapid advances in information technology, which has made it possible to replace human workers with computers faster than human workers can be trained in jobs that computers cannot yet perform. Vending machines are replacing shop attendants, automated discovery programs which locate relevant legal documents are replacing lawyers and legal aides, and automated virtual assistants are replacing customer service representatives. Labor is becoming automated for reasons of cost, efficiency, and quality. Once a machine becomes capable of performing a task as well as (or almost as well as) a human, the cost of purchasing and maintaining it may be less than the cost of having a salaried human perform the same task. In many cases, machines are also capable of doing the same job faster, for longer periods, and with fewer errors. In addition to replacing workers entirely, machines may also take over aspects of jobs that were once the sole domain of highly trained professionals, making the job easier to perform by less-skilled employees (Whitby 1996). If workers can be affordably replaced by developing more sophisticated AI, there is a strong economic incentive to do so. This is already happening with narrow AI, which often requires major modifications or even a complete redesign in order to be adapted for new tasks. “A Roadmap for US Robotics” (Hollerbach et al. 2009) calls for major investments into automation, citing the potential for considerable improvements in the fields of manufacturing, logistics, health care, and services. Similarly, the US Air Force Chief Scientist’s (Dahm 2010) “Technology Horizons” report mentions “increased use of autonomy and autonomous systems” as a key area of research to focus on in the next decade, and also notes that reducing the need for manpower provides the greatest potential for cutting costs. In 2000, the US Congress instructed the armed forces to have one third of their deep strike force aircraft be unmanned by 2010, and one third of their ground combat vehicles be unmanned by 2015 (Congress 2000). To the extent that an AGI could learn to do many kinds of tasks—or even any kind of task—without needing an extensive re-engineering effort, the AGI could make the replacement of humans by machines much cheaper and more profitable. As more tasks become automated, the bottlenecks for further automation will require adaptability and flexibility that narrow-AI systems are incapable of. These will then make up an increasing portion of the economy, further strengthening the incentive to develop AGI. Increasingly sophisticated AI may eventually lead to AGI, possibly within the next several decades (Baum et al. 2011; Müller and Bostrom 2014). Eventually it will make economic sense to automate all or nearly all jobs (Hanson 2008; Hall 2008). As AGIs will possess many advantages over humans (Sotala 2012; Muehlhauser and Salamon 2012a, b; Bostrom 2014), a greater and greater proportion of the workforce will consist of intelligent machines. 2.2.2 AGIs Might Harm Humans AGIs might bestow overwhelming military, economic, or political power on the groups that control them (Bostrom 2002, 2014). For example, automation could lead to an ever-increasing transfer of wealth and power to the owners of the AGIs (Brynjolfsson and McAfee 2011). AGIs could also be used to develop advanced weapons and plans for military operations or political takeovers (Bostrom 2002). Some of these scenarios could lead to catastrophic risks, depending on the capabilities of the AGIs and other factors. Our focus is on the risk from the possibility that AGIs could behave in unexpected and harmful ways, even if the intentions of their owners were benign. Even modern-day narrow-AI systems are becoming autonomous and powerful enough that they sometimes take unanticipated and harmful actions before a human supervisor has a chance to react. To take one example, rapid automated trading was found to have contributed to the 2010 stock market “Flash Crash” (CFTC and SEC 2010).⁵ Autonomous systems may also cause people difficulties in more mundane situations, such as when a credit card is automatically flagged as possibly stolen due to an unusual usage pattern (Allen et al. 2006), or when automatic defense systems malfunction and cause deaths (Shachtman 2007). As machines become more autonomous, humans will have fewer opportunities to intervene in time and will be forced to rely on machines making good choices. This has prompted the creation of the field of “machine ethics” (Wallach and Allen 2009; Allen et al. 2006; Anderson and Anderson 2011), concerned with creating AI systems designed to make appropriate moral choices. Compared to narrow-AI systems, AGIs will be even more autonomous and capable, and will thus require even more robust solutions for governing their behavior.⁶ If some AGIs were both powerful and indifferent to human values, the consequences could be disastrous. At one extreme, powerful AGIs indifferent to human survival could bring about human extinction. As Yudkowsky (2008a) writes, “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” Omohundro (2007, 2008) and Bostrom (2012) argue that standard microeconomic theory prescribes particular instrumental behaviors which are useful for the achievement of almost any set of goals. Furthermore, any agents which do not follow certain axioms of rational behavior will possess vulnerabilities which some other agent may exploit to their own benefit. Thus AGIs which understand these principles and wish to act efficiently will modify themselves so that their behavior more closely resembles rational economic behavior (Omohundro 2012). Extra resources are useful in the pursuit of nearly any set of goals, and self-preservation behaviors will increase the probability that the agent can continue to further its goals. AGI systems which follow rational economic theory will then exhibit tendencies toward behaviors such as self-replicating, breaking into other machines, and acquiring resources without regard for anyone else’s safety. They will also attempt to improve themselves in order to more effectively achieve these and other goals, which could lead to rapid improvement even if the designers did not intend the agent to self-improve. Even AGIs that were explicitly designed to behave ethically might end up acting at cross-purposes to humanity, because it is difficult to precisely capture the complexity of human values in machine goal systems (Yudkowsky 2011; Muehlhauser and Helm 2012a, b; Bostrom 2014). Muehlhauser and Helm (2012a, b) caution that moral philosophy has found no satisfactory formalization of human values. All moral theories proposed so far would lead to undesirable consequences if implemented by superintelligent machines. For example, a machine programmed to maximize the satisfaction of human (or sentient) preferences might simply modify people’s brains to give them desires that are maximally easy to satisfy. Intuitively, one might say that current moral theories are all too simple—even if they seem correct at first glance, they do not actually take into account all the things that we value, and this leads to a catastrophic outcome. This could be referred to as the complexity of value thesis. Recent psychological and neuroscientific experiments confirm that human values are highly complex (Muehlhauser and Helm 2012a, b), that the pursuit of pleasure is not the only human value, and that humans are often unaware of their own values. Still, perhaps powerful AGIs would have desirable consequences so long as they were programmed to respect most human values. If so, then our inability to perfectly specify human values in AGI designs need not pose a catastrophic risk. Different cultures and generations have historically had very different values from each other, and it seems likely that over time our values would become considerably different from current-day ones. It could be enough to maintain some small set of core values, though what exactly would constitute a core value is unclear. For example, different people may disagree over whether freedom or well-being is a more important value. Yudkowsky (2011) argues that, due to the fragility of value, the basic problem remains. He argues that, even if an AGI implemented most human values, the outcome might still be unacceptable. For example, an AGI which failed to incorporate the value of novelty could create a solar system filled with countless minds experiencing one highly optimal and satisfying experience over and over again, never doing or feeling anything else (Yudkowsky 2009).⁷ In this paper, we will frequently refer to the problem of “AGI safety” or “safe AGI,” by which we mean the problem of ensuring that AGIs respect human values, or perhaps some extrapolation or idealization of human values. We do not seek to imply that current human values would be the best possible ones, that AGIs could not help us in developing our values further, or that the values of other sentient beings would be irrelevant. Rather, by “human values” we refer to the kinds of basic values that nearly all humans would agree upon, such as that AGIs forcibly reprogramming people’s brains, or destroying humanity, would be a bad outcome. In cases where proposals related to AGI risk might change human values in some major but not as obviously catastrophic way, we will mention the possibility of these changes but remain agnostic on whether they are desirable or undesirable. We conclude this section with one frequently forgotten point in order to avoid catastrophic risks or worse, it is not enough to ensure that only some AGIs are safe. Proposals which seek to solve the issue of catastrophic AGI risk need to also provide some mechanism for ensuring that most (or perhaps even “nearly all”) AGIs are either created safe or prevented from doing considerable harm. 2.2.3 AGIs May Become Powerful Quickly There are several reasons why AGIs may quickly come to wield unprecedented power in society. “Wielding power” may mean having direct decision-making power, or it may mean carrying out human decisions in a way that makes the decision maker reliant on the AGI. For example, in a corporate context an AGI could be acting as the executive of the company, or it could be carrying out countless low-level tasks which the corporation needs to perform as part of its daily operations. Bugaj and Goertzel (2007) consider three kinds of AGI scenarios: capped intelligence, soft takeoff, and hard takeoff. In a capped intelligence scenario, all AGIs are prevented from exceeding a predetermined level of intelligence and remain at a level roughly comparable with humans. In a soft takeoff scenario, AGIs become far more powerful than humans, but on a timescale which permits ongoing human interaction during the ascent. Time is not of the essence, and learning proceeds at a relatively human-like pace. In a hard takeoff scenario, an AGI will undergo an extraordinarily fast increase in power, taking effective control of the world within a few years or less.⁸ In this scenario, there is little time for error correction or a gradual tuning of the AGI’s goals. The viability of many proposed approaches depends on the hardness of a takeoff. The more time there is to react and adapt to developing AGIs, the easier it is to control them. A soft takeoff might allow for an approach of incremental machine ethics (Powers 2011), which would not require us to have a complete philosophical theory of ethics and values, but would rather allow us to solve problems in a gradual manner. A soft takeoff might however present its own problems, such as there being a larger number of AGIs distributed throughout the economy, making it harder to contain an eventual takeoff. Hard takeoff scenarios can be roughly divided into those involving the quantity of hardware (the hardware overhang scenario), the quality of hardware (the speed explosion scenario), and the quality of software (the intelligence explosion scenario). Although we discuss them separately, it seems plausible that several of them could happen simultaneously and feed into each other.⁹ 2.2.3.1 Hardware Overhang Hardware progress may outpace AGI software progress. Contemporary supercomputers already rival or even exceed some estimates of the computational capacity of the human brain, while no software seems to have both the brain’s general learning capacity and its scalability.¹⁰ If such trends continue, then by the time the software for AGI is invented there may be a computing overhang—an abundance of cheap hardware available for running thousands or millions of AGIs, possibly with a speed of thought much faster than that of humans (Yudkowsky 2008b; Shulman and Sandberg 2010, Sotala 2012). As increasingly sophisticated AGI software becomes available, it would be possible to rapidly copy improvements to millions of servers, each new version being capable of doing more kinds of work or being run with less hardware. Thus, the AGI software could replace an increasingly large fraction of the workforce.¹¹ The need for AGI systems to be trained for some jobs would slow the rate of adoption, but powerful computers could allow for fast training. If AGIs end up doing the vast majority of work in society, humans could become dependent on them. AGIs could also plausibly take control of Internet-connected machines in order to harness their computing power (Sotala 2012); Internet-connected machines are regularly compromised.¹² 2.2.3.2 Speed Explosion Another possibility is a speed explosion (Solomonoff 1985; Yudkowsky 1996; Chalmers 2010), in which intelligent machines design increasingly faster machines. A hardware overhang might contribute to a speed explosion, but is not required for it. An AGI running at the pace of a human could develop a second generation of hardware on which it could run at a rate faster than human thought. It would then require a shorter time to develop a third generation of hardware, allowing it to run faster than on the previous generation, and so on. At some point, the process would hit physical limits and stop, but by that time AGIs might come to accomplish most tasks at far faster rates than humans, thereby achieving dominance. (In principle, the same process could also be achieved via improved software.) The extent to which the AGI needs humans in order to produce better hardware will limit the pace of the speed explosion, so a rapid speed explosion requires the ability to automate a large proportion of the hardware manufacturing process. However, this kind of automation may already be achieved by the time that AGI is developed.¹³ 2.2.3.3 Intelligence Explosion Third, there could be an intelligence explosion, in which one AGI figures out how to create a qualitatively smarter AGI, and that AGI uses its increased intelligence to create still more intelligent AGIs, and so on,¹⁴ such that the intelligence of humankind is quickly left far behind and the machines achieve dominance (Good 1965; Chalmers 2010; Muehlhauser and Salamon 2012a, b; Loosemore and Goertzel 2012; Bostrom 2014). Yudkowsky (2008a, b) argues that an intelligence explosion is likely. So far, natural selection has been improving human intelligence, and human intelligence has to some extent been able to improve itself. However, the core process by which natural selection improves humanity has been essentially unchanged, and humans have been unable to deeply affect the cognitive algorithms which produce their own intelligence. Yudkowsky suggests that if a mind became capable of directly editing itself, this could spark a rapid increase in intelligence, as the actual process causing increases in intelligence could itself be improved upon. (This requires that there exist powerful improvements which, when implemented, considerably increase the rate at which such minds can improve themselves.) Hall (2008) argues that, based on standard economic considerations, it would not make sense for an AGI to focus its resources on solitary self-improvement. Rather, in order not to be left behind by society at large, it should focus its resources on doing the things that it is good at and trade for the things it is not good at. However, once there exists a community of AGIs that can trade with one another, this community could collectively undergo rapid improvement and leave humans behind. A number of formal growth models have been developed which are relevant to predicting the speed of a takeoff; an overview of these can be found in Sandberg (2010). Many of them suggest rapid growth. For instance, Hanson (1998) suggests that AGI might lead to the economy doubling in months rather than years. However, Hanson is skeptical about whether this would prove a major risk to humanity, and considers it mainly an economic transition similar to the Industrial Revolution. To some extent, the soft/hard takeoff distinction may be a false dichotomy. A takeoff may be soft for a while, and then become hard. Two of the main factors influencing the speed of a takeoff are the pace at which computing hardware is developed and the ease of modifying minds (Sotala 2012). This allows for scenarios in which AGI is developed and there seems to be a soft takeoff for, say, the initial ten years, causing a false sense of security until a breakthrough in hardware development causes a hard takeoff. Another factor that might cause a false sense of security is the possibility that AGIs can be developed by a combination of insights from humans and AGIs themselves. As AGIs become more intelligent and it becomes possible to automate portions of the development effort, those parts accelerate and the parts requiring human effort become bottlenecks. Reducing the amount of human insight required could dramatically accelerate the speed of improvement. Halving the amount of human involvement required might at most double the speed of development, possibly giving an impression of relative safety, but going from 50% human insight required to 1% human insight required could cause the development to become ninety-nine times faster.¹⁵ From a safety viewpoint, the conservative assumption is to presume the worst (Yudkowsky 2001). Yudkowsky argues that the worst outcome would be a hard takeoff, as it would give us the least time to prepare and correct errors. On the other hand, it can also be argued that a soft takeoff would be just as bad, as it would allow the creation of multiple competing AGIs, allowing the AGIs that were the least burdened with goals such as “respect human values” to prevail. We would ideally like a solution, or a combination of solutions, which would work effectively for both a soft and a hard takeoff. References Allen, Colin, and Wendell Wallach. 2012. “Moral Machines: Contradiction in Terms or Abdication of Human Responsibility.” In Lin, Abney, and Bekey 2012, 55–68. Allen, Colin, Wendell Wallach, and Iva Smit. 2006. “Why Machine Ethics?” IEEE Intelligent Systems 21 (4): 12–17. doi:10.​1109/​MIS.​2006.​83. Amdahl, Gene M. 1967. “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities.” In Proceedings of the April 18–20, 1967, Spring Joint Computer Conference—AFIPS ’67 (Spring), 483–485. New York: ACM Press. doi:10.​1145/​1465482.​1465560. Anderson, Michael, and Susan Leigh Anderson, eds. 2011. Machine Ethics. New York: Cambridge University Press. Arkin, Ronald C. 2009. Governing Lethal Behavior in Autonomous Robots. Boca Raton, FL: CRC Press. Baum, Seth D., Ben Goertzel, and Ted G. Goertzel. 2011. “How Long Until Human-Level AI? Results from an Expert Assessment.” Technological Forecasting and Social Change 78 (1): 185–195. doi:10.​1016/​j.​techfore.​2010.​09.​006. Bostrom, Nick. 1998. “How Long Before Superintelligence?” International Journal of Futures Studies 2. Bostrom, Nick. 2002. “Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards.” Journal of Evolution and Technology 9. ʬ Bostrom, Nick. 2012. “The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.” In “Theory and Philosophy of AI,” edited by Vincent C. Müller. Special issue, Minds and Machines 22 (2): 71–85. doi:10.​1007/​s11023-012-9281-3. Bostrom, Nick. 2014. Superintelligence: Paths, dangers, strategies. Oxford University Press. Bostrom, Nick, and Milan M. Ćirković. 2008. “Introduction.” In Bostrom, Nick, and Milan M. Ćirković, eds. Global Catastrophic Risks. New York: Oxford University Press., 1–30. Brynjolfsson, Erik, and Andrew McAfee. 2011. Race Against The Machine: How the Digital Revolution is Accelerating Innovation, Driving Productivity, and Irreversibly Transforming Employment and the Economy. Lexington, MA: Digital Frontier. Kindle edition. Bugaj, Stephan Vladimir, and Ben Goertzel. 2007. “Five Ethical Imperatives and Their Implications for Human-AGI Interaction.” Dynamical Psychology. ʬ Butler, Samuel [Cellarius, pseud.]. 1863. “Darwin Among the Machines.” Christchurch Press, June 13. ʬ CFTC & SEC (Commodity Futures Trading Commission and Securities & Exchange Commission). 2010. Findings Regarding the Market Events of May 6, 2010: Report of the Staffs of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues. Washington, DC. ʬ Chalmers, David John. 2010. “The Singularity: A Philosophical Analysis.” Journal of Consciousness Studies 17 (9–10): 7–65. ʬ Congress, US. 2000. National Defense Authorization, Fiscal Year 2001, Pub. L. No. 106–398, 114 Stat. 1654. Dahm, Werner J. A. 2010. Technology Horizons: A Vision for Air Force Science & Technology During 2010-2030. AF/ST-TR-10-01-PR. Washington, DC: USAF. ʬ Good, Irving John. 1965. “Speculations Concerning the First Ultraintelligent Machine.” In Advances in Computers, edited by Franz L. Alt and Morris Rubinoff, 31–88. Vol. 6. New York: Academic Press. doi:10.​1016/​S0065-2458(08)60418-0. Hall, John Storrs 2008. “Engineering Utopia.” In Wang, Goertzel, and Franklin 2008, 460–467. Hanson, Robin. 1998. “Economic Growth Given Machine Intelligence.” Unpublished manuscript. Accessed May 15, 2013. ʬ Hanson, Robin. 2008. “Economics of the Singularity.” IEEE Spectrum 45 (6): 45–50. doi:10.​1109/​MSPEC.​2008.​4531461. Hollerbach, John M., Matthew T. Mason, and Henrik I. Christensen. 2009. A Roadmap for US Robotics: From Internet to Robotics. Snobird, UT: Computing Community Consortium. ʬ Joy, Bill. 2000. “Why the Future Doesn’t Need Us.” Wired, April. ʬ Legg, Shane, and Marcus Hutter. 2007. “A Collection of Definitions of Intelligence.” In Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms—Proceedings of the AGI Workshop 2006, edited by Ben Goertzel and Pei Wang, 17–24. Frontiers in Artificial Intelligence and Applications 157. Amsterdam: IOS. Loosemore, Richard, and Ben Goertzel. 2012. “Why an Intelligence Explosion is Probable.” In Eden, Amnon, Johnny Søraker, James H. Moor, and Eric Steinhart, eds. Singularity Hypotheses: A Scientific and Philosophical Assessment. The Frontiers Collection. Berlin: Springer. Miller, James D. 2012. Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World. Dallas, TX: BenBella Books. Moore, David, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford, and Nicholas Weaver. 2003. “Inside the Slammer Worm.” IEEE Security & Privacy Magazine 1 (4): 33–39. doi:10.​1109/​MSECP.​2003.​1219056. Moore, David, Colleen Shannon, and Jeffery Brown. 2002. “Code-Red: A Case Study on the Spread and Victims of an Internet Worm.” In Proceedings of the Second ACM SIGCOMM Workshop on Internet Measurement (IMW ’02), 273–284. New York: ACM Press. doi:10.​1145/​637201.​637244. Moravec, Hans P. 1998. “When Will Computer Hardware Match the Human Brain?” Journal of Evolution and Technology 1. ʬ Muehlhauser, Luke, and Louie Helm. 2012. “The Singularity and Machine Ethics.” In Eden, Amnon, Johnny Søraker, James H. Moor, and Eric Steinhart, eds. Singularity Hypotheses: A Scientific and Philosophical Assessment. The Frontiers Collection. Berlin: Springer. Muehlhauser, Luke, and Anna Salamon. 2012. “Intelligence Explosion: Evidence and Import.” In Eden, Amnon, Johnny Søraker, James H. Moor, and Eric Steinhart, eds. Singularity Hypotheses: A Scientific and Philosophical Assessment. The Frontiers Collection. Berlin: Springer. Müller, V. C., and Bostrom, N. 2014. Future progress in artificial intelligence: A survey of expert opinion. Fundamental Issues of Artificial Intelligence. Omohundro, Stephen M. 2007. “The Nature of Self-Improving Artificial Intelligence.” Paper presented at Singularity Summit 2007, San Francisco, CA, September 8–9. ʬ Omohundro, Stephen M. 2008. “The Basic AI Drives.” In Wang, Goertzel, and Franklin 2008, 483–492. Omohundro, Stephen M. 2012. “Rational Artificial Intelligence for the Greater Good.” In Eden, Amnon, Johnny Søraker, James H. Moor, and Eric Steinhart, eds. Singularity Hypotheses: A Scientific and Philosophical Assessment. The Frontiers Collection. Berlin: Springer. Powers, Thomas M. 2011. “Incremental Machine Ethics.” IEEE Robotics & Automation Magazine 18 (1): 51–58. doi:10.​1109/​MRA.​2010.​940152. Rajab, Moheeb Abu, Jay Zarfoss, Fabian Monrose, and Andreas Terzis. 2007. “My Botnet is Bigger than Yours (Maybe, Better than Yours): Why Size Estimates Remain Challenging.” In Proceedings of 1st Workshop on Hot Topics in Understanding Botnets (HotBots ’07).Berkeley, CA: USENIX. ʬ Sandberg, Anders. 2010. “An Overview of Models of Technological Singularity.” Paper presented at the Roadmaps to AGI and the Future of AGI Workshop, Lugano, Switzerland, March 8. ʬ Sandberg, Anders, and Nick Bostrom. 2008. Whole Brain Emulation: A Roadmap. Technical Report, 2008-3. Future of Humanity Institute, University of Oxford. ʬ Sandberg, Anders, and Nick Bostrom. 2011. Machine Intelligence Survey. Technical Report, 2011-1. Future of Humanity Institute, University of Oxford. ʬ Shachtman, Noah. 2007. “Robot Cannon Kills 9, Wounds 14.” Wired, October 18. ʬ Shulman, Carl, and Anders Sandberg. 2010. “Implications of a Software-Limited Singularity.” In Mainzer, Klaus, ed. ECAP10: VIII European Conference on Computing and Philosophy. Munich: Dr. Hut. Shulman, Carl, Henrik Jonsson, and Nick Tarleton. 2009. “Machine Ethics and Superintelligence.” In Reynolds, Carson, and Alvaro Cassinelli, eds. AP-CAP 2009: The Fifth Asia-Pacific Computing and Philosophy Conference, October 1st-2nd, University of Tokyo, Japan, Proceedings, 95–97. Solomonoff, Ray J. 1985. “The Time Scale of Artificial Intelligence: Reflections on Social Effects.” Human Systems Management 5:149–153. Sotala, Kaj, and Roman V. Yampolskiy. 2013. Responses to catastrophic AGI risk: a survey. Technical report 2013-2. Berkeley, CA: Machine Intelligence Research Institute. Sotala, Kaj, and Roman V. Yampolskiy. 2015. Responses to catastrophic AGI risk: a survey. Physica Scripta, 90(1), 018001. Sotala, Kaj. 2012. “Advantages of Artificial Intelligences, Uploads, and Digital Minds.” International Journal of Machine Consciousness 4 (1): 275–291. doi:10.​1142/​S179384301240016​1. Staniford, Stuart, Vern Paxson, and Nicholas Weaver. 2002. “How to 0wn the Internet in Your Spare Time.” In Proceedings of the 11th USENIX Security Symposium, edited by Dan Boneh, 149–167. Berkeley, CA: USENIX. ʬ Top500.org. 2016. Top500 list – June 2016. ʬ Vinge, Vernor. 1993. “The Coming Technological Singularity: How to Survive in the Post-Human Era.” In Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace, 11–22. NASA Conference Publication 10129. NASA Lewis Research Center. ʬ Wallach, Wendell, and Colin Allen. 2009. Moral Machines: Teaching Robots Right from Wrong. New York: Oxford University Press. doi:10.​1093/​acprof:​oso/​9780195374049.​001.​0001. Wallach, Wendell, Colin Allen, and Iva Smit. 2008. “Machine Morality: Bottom-Up and Top-Down Approaches for Modelling Human Moral Faculties.” In “Ethics and Artificial Agents.” Special issue, AI & Society 22 (4): 565–582. doi:10.​1007/​s00146-007-0099-0. Whitby, Blay. 1996. Reflections on Artificial Intelligence: The Legal, Moral, and Ethical Dimensions. Exeter, UK: Intellect Books. Wiener, Norbert. 1960. “Some Moral and Technical Consequences of Automation.” Science 131 (3410): 1355–1358. ʬ Yampolskiy, Roman V. 2013. What to Do with the Singularity Paradox? Studies in Applied Philosophy, Epistemology and Rational Ethics vol 5, pp. 397–413. Springer Berlin Heidelberg. Yudkowsky, Eliezer. 1996. “Staring into the Singularity.” Unpublished manuscript. Last revised May 27, 2001. ʬ Yudkowsky, Eliezer. 2001. Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. The Singularity Institute, San Francisco, CA, June 15. ʬ Yudkowsky, Eliezer. 2008a. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Bostrom, Nick, and Milan M. Ćirković, eds. Global Catastrophic Risks. New York: Oxford University Press., 308–345. Yudkowsky, Eliezer. 2008b. “Hard Takeoff.” Less Wrong (blog), December 2. ʬ Yudkowsky, Eliezer. 2009. “Value is Fragile.” Less Wrong (blog), January 29. ʬ Yudkowsky, Eliezer. 2011. Complex Value Systems are Required to Realize Valuable Futures. The Singularity Institute, San Francisco, CA. ʬ Footnotes This chapter is based on three earlier publications (Sotala and Yampolskiy 2015; Sotala and Yampolskiy 2013; Yampolskiy 2013). Unlike the term “human-level AI,” the term “Artificial General Intelligence” does not necessarily presume that the intelligence will be human-like. For this paper, we use a binary distinction between narrow AI and AGI. This is merely for the sake of simplicity we do not assume the actual difference between the two categories to necessarily be so clean-cut. A catastrophic risk is something that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities (Bostrom and Ćirković 2008). An existential risk is one that threatens human extinction (Bostrom 2002). Many writers argue that AGI might be a risk of such magnitude (Butler 1863; Wiener 1960; Good 1965; Vinge 1993; Joy 2000; Yudkowsky 2008a; Bostrom 2014). On the less serious front, see ʬ for an amusing example of automated trading going awry. In practice, there have been two separate communities doing research on automated moral decision-making (Muehlhauser and Helm 2012a, b; Allen and Wallach 2012; Shulman et al. 2009). The “AI risk” community has concentrated specifically on advanced AGIs (e.g. Yudkowsky 2008a; Bostrom 2014), while the “machine ethics” community typically has concentrated on more immediate applications for current-day AI (e.g. Wallach et al. 2008; Anderson and Anderson 2011). In this chapter, we have cited the machine ethics literature only where it seemed relevant, leaving out papers that seemed to be too focused on narrow-AI systems for our purposes. In particular, we have left out most discussions of military machine ethics (Arkin 2009), which focus primarily on the constrained special case of creating systems that are safe for battlefield usage. Miller (2012) similarly notes that, despite a common belief to the contrary, it is impossible to write laws in a manner that would match our stated moral principles without a judge needing to use a large amount of implicit common-sense knowledge to correctly interpret them. “Laws shouldn’t always be interpreted literally because legislators can’t anticipate all possible contingencies. Also, humans’ intuitive feel for what constitutes murder goes beyond anything we can commit to paper. The same applies to friendliness.” (Miller 2012). Bugaj and Goertzel defined hard takeoff to refer to a period of months or less. We have chosen a somewhat longer time period, as even a few years might easily turn out to be too little time for society to properly react. Bostrom (2014, chap. 3) discusses three kinds of superintelligence. A speed superintelligence “can do all that a human intellect can do, but much faster”. A collective superintelligence is “a system composed of large number of smaller intellects such that the system's overall performance across many very general domains vastly outstrips that of any current cognitive system”. A quality superintelligence “is at least as fast as a human mind and vastly qualitatively smarter”. These can be seen as roughly corresponding to the different kinds of hard takeoff scenarios. A speed explosion implies a speed superintelligence, an intelligence explosion a quality superintelligence, and a hardware overhang may lead to any combination of speed, collective, and quality superintelligence. Bostrom (1998) estimates that the effective computing capacity of the human brain might be somewhere around 10¹⁷ operations per second (OPS), and Moravec (1998) estimates it at 10¹⁴ OPS. As of June 2016, the fastest supercomputer in the world had achieved a top capacity of 10¹⁶ floating-point operations per second (FLOPS) and the five-hundredth fastest a top capacity of 10¹⁴ FLOPS (Top500 2016). Note however that OPS and FLOPS are not directly comparable and there is no reliable way of interconverting the two. Sandberg and Bostrom (2008) estimate that OPS and FLOPS grow at a roughly comparable rate. The speed that would allow AGIs to take over most jobs would depend on the cost of the hardware and the granularity of the software upgrades. A series of upgrades over an extended period, each producing a 1% improvement, would lead to a more gradual transition than a single upgrade that brought the software from the capability level of a chimpanzee to a rough human equivalence. Note also that several companies, including Amazon and Google, offer vast amounts of computing power for rent on an hourly basis. An AGI that acquired money and then invested all of it in renting a large amount of computing resources for a brief period could temporarily achieve a much larger boost than its budget would otherwise suggest. Botnets are networks of computers that have been compromised by outside attackers and are used for illegitimate purposes. Rajab et al. (2007) review several studies which estimate the sizes of the largest botnets as being between a few thousand to 350,000 bots. Modern-day malware could theoretically infect any susceptible Internet-connected machine within tens of seconds of its initial release (Staniford et al. 2002). The Slammer worm successfully infected more than 90% of vulnerable hosts within ten minutes, and had infected at least 75,000 machines by the thirty-minute mark (Moore et al. 2003). The previous record holder in speed, the Code Red worm, took fourteen hours to infect more than 359,000 machines (Moore et al. 2002). Loosemore and Goertzel (2012) also suggest that current companies carrying out research and development are more constrained by a lack of capable researchers than by the ability to carry out physical experiments. Most accounts of this scenario do not give exact definitions for “intelligence” or explain what a “superintelligent” AGI would be like, instead using informal characterizations such as “a machine that can surpass the intellectual activities of any man however clever” (Good 1965) or “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills” (Bostrom 1998). Yudkowsky (2008a) defines intelligence in relation to “optimization power,” the ability to reliably hit small targets in large search spaces, such as by finding the a priori exceedingly unlikely organization of atoms which makes up a car. A more mathematical definition of machine intelligence is offered by Legg and Hutter (2007). Sotala (2012) discusses some of the functional routes to actually achieving superintelligence. The relationship in question is similar to that described by Amdahl’s (1967) law. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_3 3. Responses to the Journey to the Singularity Kaj Sotala¹   and Roman Yampolskiy²   Foundational Research Institute, Basel, Switzerland University of Louisville, 222 Eastern Parkway, Louisville, KY 40292, USA Kaj Sotala Email: kaj.sotala@foundational-research.org Roman Yampolskiy (Corresponding author) Email: roman.yampolskiy@louisville.edu 3.1 Introduction The notion of catastrophic AGI risk is not new, and this concern was expressed by early thinkers in the field. Hence, there have also been many proposals concerning what to do about it. The proposals we survey are neither exhaustive nor mutually exclusive: the best way of achieving a desirable outcome may involve pursuing several proposals simultaneously. Section 3.2 briefly discusses some of the most recent developments in the field. Sections 3.3–3.5 survey three categories of proposals for dealing with AGI risk: societal proposals, proposals for external constraints on AGI behaviors, and proposals for creating AGIs that are safe due to their internal design. Although the main purpose of this paper is to provide a summary of existing work, we briefly provide commentary on the proposals in each major subsection of Sects. 3.3–3.5 and highlight some of the proposals we consider the most promising in Sect. 3.6. which are, regulation (Sect. 3.3.3), merging with machines (Sect. 3.3.4), AGI confinement (Sect. 3.4.1), Oracle AI (Sect. 3.5.1), and motivational weaknesses (Sect. 3.5.7). In the long term, the most promising approaches seem to be value learning (Sect. 3.5.2.5) and human-like architectures (Sect. 3.5.3.4). Section 3.6 provides an extended discussion of the various merits and problems of these proposals. 3.2 Post-Superintelligence Responses This chapter is based on an earlier paper (Sotala and Yampolskiy 2015), which was the formally published version of a previous technical report (Sotala and Yampolskiy 2013). The tech report, in turn, was a greatly expanded version of an earlier conference paper (Yampolskiy 2013). Since the writing of the original papers, the topic of catastrophic AGI risk has attracted considerable attention both in academia and the popular press, much of it due to the publication of the book Superintelligence (Bostrom 2014). We feel that it would not be appropriate to simply lump in all the new responses together with the old sections, as the debate has now become considerably more active and high-profile. In particular, numerous AI researchers have signed an open letter calling for more research into making sure that AI systems will be robust and beneficial rather than just capable (Future of Life Institute 2015). The open letter included a list of suggested research directions (Russell et al. 2015), including ones specifically aimed at dealing with the risks from AGI. The research directions document draws on a number of sources, including an ambitious research agenda recently published by the Machine Intelligence Research Institute (see Chap. 5). Soon after the publication of the open letter, Elon Musk donated 10 million dollars for the purpose of furthering research into safe and beneficial AI and AGI. At the same time, several prominent researchers have also expressed the feeling that the risks from AGI are overhyped, and that there is a danger of the general public taking them too seriously at this stage. This position has been expressed in interviews of researchers such as Professor Andrew Ng (Madrigal 2015) and Facebook AI director Yann LeCun (Gomes 2015), who emphasize that current-day technology is still a long way from AGI. Even the more skeptical researchers tend to agree that the issue will eventually require some consideration, however (Alexander 2015). 3.3 Societal Proposals Proposals can be divided into three general categories proposals for societal action, design proposals for external constraints on AGI behavior, and design recommendations for internal constraints on AGI behavior. In this section we briefly survey societal proposals. These include doing nothing, integrating AGIs with society, regulating research, merging with machines, and relinquishing research into AGI. 3.3.1 Do Nothing 3.3.1.1 AI Is Too Distant to Be Worth Our Attention One response is that, although AGI is possible in principle, there is no reason to expect it in the near future. Typically, this response arises from the belief that, although there have been great strides in narrow AI, researchers are still very far from understanding how to build AGI. Distinguished computer scientists such as Gordon Bell and Gordon Moore, as well as cognitive scientists such as Douglas Hofstadter and Steven Pinker, have expressed the opinion that the advent of AGI is remote (IEEE Spectrum 2008). Davis (2012) reviews some of the ways in which computers are still far from human capabilities. Bringsjord and Bringsjord (2012) even claim that a belief in AGI this century is fideistic, appropriate within the realm of religion but not within science or engineering. Some writers also actively criticize any discussion of AGI risk in the first place. The philosopher Alfred Nordmann (2007, 2008) holds the view that ethical concern is a scarce resource, not to be wasted on unlikely future scenarios such as AGI. Likewise, Dennett (2012) considers AGI risk an “imprudent pastime” because it distracts our attention from more immediate threats. Others think that AGI is far off and not yet a major concern, but admit that it might be valuable to give the issue some attention. A presidential panel of the Association for the Advancement of Artificial Intelligence considering the long-term future of AI concluded that there was overall skepticism about AGI risk, but that additional research into the topic and related subjects would be valuable (Horvitz and Selman 2009). Posner (2004) writes that dedicated efforts for addressing the problem can wait, but that we should gather more information about the problem in the meanwhile. Potential negative consequences of AGI are enormous, ranging from economic instability to human extinction. “Do nothing” could be a reasonable course of action if near-term AGI seemed extremely unlikely, if it seemed too early for any proposals to be effective in reducing risk, or if those proposals seemed too expensive to implement. As a comparison, asteroid impact prevention is generally considered a topic worth studying, even though the probability of a civilization-threatening asteroid impact in the near future is not considered high. Napier (2008) discusses several ways of estimating the frequency of such impacts. Many models produce a rate of one civilization-threatening impact per five hundred thousand or more years, though some models suggest that rates of one such impact per hundred thousand years cannot be excluded. An estimate of one impact per hundred thousand years would suggest less than a 0.1% chance of a civilization-threatening impact within the next hundred years. The probability of AGI being developed within the same period seems considerably higher (Müller and Bostrom 2014), and there is likewise a reasonable chance of a hard takeoff after it has been developed (Yudkowsky 2008, 2008b), suggesting that the topic is at the very least worth studying. Even without a hard takeoff, society is becoming increasingly automated, and even narrow AI is starting to require ethical guidelines (Wallach and Allen 2009). We know neither which fields of science will be needed nor how much progress in them will be necessary for safe AGI. If much progress is needed and we believe effective progress to be possible this early on, it becomes reasonable to start studying the topic even before AGI is near. Muehlhauser and Helm (2012) suggest that, for one safe AGI approach alone (value learning, discussed further in Sect. 3.5.2.5), efforts by AGI researchers, economists, mathematicians, and philosophers may be needed. Safe AI may require the solutions for some of these problems to come well before AGI is developed. 3.3.1.2 Little Risk, no Action Needed Some authors accept that a form of AGI will probably be developed but do not consider autonomous AGI to be a risk, or consider the possible negative consequences acceptable. Bryson and Kime (1998) argue that, although AGI will require us to consider ethical and social dangers, the dangers will be no worse than those of other technologies. Whitby (1996) writes that there has historically been no consistent trend of the most intelligent people acquiring the most authority, and that computers will augment humans rather than replace them. Whitby and Oliver (2000) further argue that AGIs will not have any particular motivation to act against us. Jenkins (2003) agrees with these points to the extent of saying that a machine will only act against humans if it is programmed to value itself over humans, although she does find AGI to be a real concern. Another kind of “no action needed” response argues that AGI development will take a long time (Brooks 2008), implying that there will be plenty of time to deal with the issue later on. This can also be taken as an argument for later efforts being more effective, as they will be better tuned to AGI as it develops. Others argue that superintelligence will not be possible at all.¹ McDermott (2012) points out that there are no good examples of algorithms which could be improved upon indefinitely. Deutsch (2011) argues that there will never be superintelligent AGIs, because human minds are already universal reasoners, and computers can at best speed up the experimental work that is required for testing and fine-tuning theories. He also suggests that even as the speed of technological development increases, so will our ability to deal with change. Anderson (2010) likewise suggests that the inherent unpredictability of the world will place upper limits on an entity’s effective intelligence. Heylighen (2012) argues that a single, stand-alone computer is exceedingly unlikely to become superintelligent, and that individual intelligences are always outmatched by the distributed intelligence found in social systems of many minds. Superintelligence will be achieved by building systems that integrate and improve the “Global Brain,” the collective intelligence of everyone on Earth. Heylighen does acknowledge that this kind of a transition will pose its own challenges, but not of the kind usually evoked in discussions of AGI risk. The idea of AGIs not having a motivation to act against humans is intuitively appealing, but there seem to be strong theoretical arguments against it. As mentioned earlier, Omohundro (2007, 2008) and Bostrom (2012) argue that self-replication and the acquisition of resources are useful in the pursuit of many different kinds of goals, and that many types of AI systems will therefore exhibit tendencies toward behaviors such as breaking into other machines, self-replicating, and acquiring resources without regard for anyone else’s safety. The right design might make it possible to partially work around these behaviors (Shulman 2010a; Wang 2012), but they still need to be taken into account. Furthermore, we might not foresee all the complex interactions of different AGI mechanisms in the systems that we build, and they may end up with very different goals than the ones we intended (Yudkowsky 2008, 2011). Can AGIs become superintelligent? First, we note that AGIs do not necessarily need to be much more intelligent than humans in order to be dangerous. AGIs already enjoy advantages such as the ability to rapidly expand their population by having themselves copied (Hanson 1994, 2008; Sotala 2012a), which may confer on them considerable economic and political influence even if they were not superintelligent. A better-than-human ability to coordinate their actions, which AGIs of a similar design could plausibly have (Sotala 2012), might then be enough to tilt the odds in their favor. Another consideration is that AGIs do not necessarily need to be qualitatively more intelligent than humans in order to outperform humans. An AGI that merely thought twice as fast as any single human could still defeat him at intellectual tasks that had a time constraint, all else equal. Here an “intellectual” task should be interpreted broadly to refer not only to “book smarts” but to any task that animals cannot perform due to their mental limitations—including tasks involving social skills (Yudkowsky 2008). Straightforward improvements in computing power could provide AGIs with a considerable advantage in speed, which the AGI could then use to study and accumulate experiences that improved its skills. As for Heylighen’s (2012) Global Brain argument, there does not seem to be a reason to presume that powerful AGIs could not be geographically distributed, or that they couldn’t seize control of much of the Internet. Even if individual minds were not very smart and needed a society to make progress, for minds that are capable of copying themselves and communicating perfectly with each other, individual instances of the mind might be better understood as parts of a whole than as separate individuals. In general, the distinction between an individual and a community might not be meaningful for AGIs. If there were enough AGIs, they might be able to form a community sufficient to take control of the rest of the Earth. Heylighen (2007) himself has argued that many of the features of the Internet are virtually identical to the mechanisms used by the human brain. If the AGI is not carefully controlled, it might end up in a position where it made up the majority of the “Global Brain” and could undertake actions which the remaining parts of the organism did not agree with. 3.3.1.3 Let Them Kill Us Dietrich (2007) argues that humanity frequently harms other species, and that people have also evolved to hurt other people by engaging in behaviors such as child abuse, sexism, rape, and racism. Therefore, human extinction would not matter, as long as the machines implemented only the positive aspects of humanity. De Garis (2005) suggests that AGIs destroying humanity might not matter. He writes that on a cosmic scale, with hundreds of billions of stars in our galaxy alone, the survival of the inhabitants of a single planet is irrelevant. As AGIs would be more intelligent than us in every way, it would be better if they replaced humanity. AGIs being more intelligent and therefore more valuable than humans equates intelligence with value, but Bostrom (2004) suggests ways by which a civilization of highly intelligent entities might lack things which we thought to have value. For example, such entities might not be conscious in the first place. Alternatively, there are many things which we consider valuable for their own sake, such as humor, love, game-playing, art, sex, dancing, social conversation, philosophy, literature, scientific discovery, food and drink, friendship, parenting, and sport. We value these due to the fact that we have dispositions and preferences which have been evolutionarily adaptive in the past, but for a future civilization few or none of them might be, creating a world with very little of value. Bostrom (2012) proposes an orthogonality thesis, by which an artificial intelligence can have any combination of intelligence level and goal, including goals that humans would intuitively deem to be of no value. 3.3.1.4 “Do Nothing” Proposals—Our View As discussed above, completely ignoring the possibility of AGI risk at this stage would seem to require a confident belief in at least one of the following propositions

    1. AGI is very remote. 
      
    2. 2. There is no major risk from AGI even if it is created.   3. 3. Very little effective work can be done at this stage.   4. 4. AGIs destroying humanity would not matter. In the beginning of this paper, we mentioned several experts who considered it plausible that AGI might be created in the next twenty to one hundred years; in this section we have covered experts who disagree. In general, there is a great deal of disagreement among people who have made AGI predictions, and no clear consensus even among experts in the field of artificial intelligence. The lack of expert agreement suggests that expertise in the field does not contribute to an ability to make reliable predictions.² If the judgment of experts is not reliable, then, probably, neither is anyone else’s. This suggests that it is unjustified to be highly certain of AGI being near, but also of it not being near. We thus consider it unreasonable to have a confident belief in the first proposition. The second proposition also seems questionable. As discussed in the previous chapter, AGIs seem very likely to obtain great power, possibly very quickly. Furthermore, as also discussed in the previous chapter, the complexity and fragility of value theses imply that it could be very difficult to create AGIs which would not cause immense amounts of damage if they had enough power. It also does not seem like it is too early to work on the problem as we summarize in Sect. 3.6, there seem to be a number of promising research directions which can already be pursued. We also agree with Yudkowsky (2008), who points out that research on the philosophical and technical requirements of safe AGI might show that broad classes of possible AGI architectures are fundamentally unsafe, suggesting that such architectures should be avoided. If this is the case, it seems better to have that knowledge as early as possible, before there has been a great deal of investment into unsafe AGI designs. In response to the suggestion that humanity being destroyed would not matter, we certainly agree that there is much to be improved in today’s humanity, and that our future descendants might have very little resemblance to ourselves. Regardless, we think that much about today’s humans is valuable and worth preserving, and that we should be able to preserve it without involving the death of present humans. 3.3.2 Integrate with Society Integration proposals hold that AGI might be created in the next several decades, and that there are indeed risks involved. These proposals argue that the best way to deal with the problem is to make sure that our societal structures are equipped to handle AGIs once they are created. There has been some initial work toward integrating AGIs with existing legal and social frameworks, such as considering questions of their legal position and moral rights (Gunkel 2012). 3.3.2.1 Legal and Economic Controls Hanson (2012) writes that the values of older and younger generations have often been in conflict with each other, and he compares this to a conflict between humans and AGIs. He believes that the best way to control AGI risk is to create a legal framework such that it is in the interest of both humans and AGIs to uphold it. Hanson (2009) suggests that if the best way for AGIs to get what they want is via mutually agreeable exchanges, then humans would need to care less about what the AGIs wanted. According to him, we should be primarily concerned with ensuring that the AGIs will be law-abiding enough to respect our property rights. Miller (2012) summarizes Hanson’s argument, and the idea that humanity could be content with a small fraction of the world’s overall wealth and let the AGIs have the rest. An analogy to this idea is that humans do not kill people who become old enough to no longer contribute to production, even though younger people could in principle join together and take the wealth of the older people. Instead, old people are allowed to keep their wealth even while in retirement. If things went well, AGIs might similarly allow humanity to “retire” and keep its accumulated wealth, even if humans were no longer otherwise useful for AGIs. Hall (2007a) also says that we should ensure that the interactions between ourselves and machines are economic, “based on universal rules of property and reciprocity.” Moravec (1999) likewise writes that governmental controls should be used to ensure that humans benefit from AGIs. Without government intervention, humans would be squeezed out of existence by more efficient robots, but taxation could be used to support human populations for a long time. He also recommends laws which would require any AGIs to incorporate programming that made them safe and subservient to human desires. Sandberg (2001) writes that relying only on legal and economic controls would be problematic, but that a strategy which also incorporated them in addition to other approaches would be more robust than a strategy which did not. However, even if AGIs were integrated with human institutions, it does not guarantee that human values would survive. If humans were reduced to a position of negligible power, AGIs might not have any reason to keep us around. Economic arguments, such as the principle of comparative advantage, are sometimes invoked to argue that AGI would find it more beneficial to trade with us than to do us harm. However, technological progress can drive the wages of workers below the level needed for survival, and there is already a possible threat of technological unemployment (Brynjolfsson and McAfee 2011). AGIs keeping humans around due to gains from trade implicitly presumes that they would not have the will or the opportunity to simply eliminate humans in order to replace them with a better trading partner, and then trade with the new partner instead. Humans already eliminate species with low economic value in order to make room for more humans, such as when clearing a forest in order to build new homes. Clark (2007) uses the example of horses in Britain their population peaked in 1901, with 3.25 million horses doing work such as plowing fields, hauling wagons and carriages short distances, and carrying armies into battle. The internal combustion engine replaced so many of them that by 1924 there were fewer than two million. Clark writes There was always a wage at which all these horses could have remained employed. But that wage was so low that it did not pay for their feed, and it certainly did not pay enough to breed fresh generations of horses to replace them. Horses were thus an early casualty of industrialization (Clark 2007). There are also ways to harm humans while still respecting their property rights, such as by manipulating them into making bad decisions, or selling them addictive substances. If AGIs were sufficiently smarter than humans, humans could be tricked into making a series of trades that respected their property rights but left them with negligible assets and caused considerable damage to their well-being. A related issue is that AGIs might become more capable of changing our values than we are capable of changing AGI values. Mass media already convey values that have a negative impact on human well-being, such as idealization of rare body types, which causes dissatisfaction among people who do not have those kinds of bodies (Groesz et al. 2001; Agliata and Tantleff-Dunn 2004). AGIs with a deep understanding of human psychology could engineer the spread of values which shifted more power to them, regardless of their effect on human well-being. Yet another problem is ensuring that the AGIs have indeed adopted the right values. Making intelligent beings adopt specific values is a difficult process which often fails. There could be an AGI with the wrong goals that would pretend to behave correctly in society throughout the whole socialization process. AGIs could conceivably preserve and conceal their goals far better than humans could. Society does not know of any methods which would reliably instill our chosen values in human minds, despite a long history of trying to develop them. Our attempts to make AGIs adopt human values would be hampered by our lack of experience and understanding of the AGI’s thought processes, with even tried-and-true methods for instilling positive values in humans possibly being ineffective. The limited success that we do have with humans is often backed up by various incentives as well as threats of punishment, both of which might fail in the case of an AGI developing to become vastly more powerful than us. Additionally, the values which a being is likely to adopt, or is even capable of adopting, will depend on its mental architecture. We will demonstrate these claims with examples from humans, who are not blank slates on whom arbitrary values can be imposed with the right education. Although the challenge of instilling specific values in humans is very different from the challenge of instilling them in AGIs, our examples are meant to demonstrate the fact that the existing properties of a mind will affect the process of acquiring values. Just as it is difficult to make humans permanently adopt some kinds of values, the kind of mental architecture that an AGI has will affect its inclination to adopt various values. Psychopathy is a risk factor for violence, and psychopathic criminals are much more likely to reoffend than nonpsychopaths (Hare et al. 2000). Harris and Rice (2006) argue that therapy for psychopaths is ineffective and may even make them more dangerous, as they use their improved social skills to manipulate others more effectively. Furthermore, “cult brainwashing” is generally ineffective and most cult members will eventually leave (Anthony and Robbins 2004); and large-scale social engineering efforts often face widespread resistance, even in dictatorships with few scruples about which methods to use (Scott 1998, Chap. 6–7). Thus, while one can try to make humans adopt values, this will only work to the extent that the individuals in question are actually disposed toward adopting them. 3.3.2.2 Foster Positive Values Kurzweil (2005), considering the possible effects of many future technologies, notes that AGI may be a catastrophic risk. He generally supports regulation and partial relinquishment of dangerous technologies, as well as research into their defensive applications. However, he believes that with AGI this may be insufficient and that, at the present time, it may be infeasible to develop strategies that would guarantee safe AGI. He argues that machine intelligences will be tightly integrated into our society and that, for the time being, the best chance of avoiding AGI risk is to foster positive values in our society. This will increase the likelihood that any AGIs that are created will reflect such positive values. One possible way of achieving such a goal is moral enhancement (Douglas 2008), the use of technology to instill people with better motives. Persson and Savulescu (2008, 2012) argue that, as technology improves, we become more capable of damaging humanity, and that we need to carry out moral enhancement in order to lessen our destructive impulses. 3.3.2.3 “Integrate with Society” Proposals—Our View Proposals to incorporate AGIs into society suffer from the issue that some AGIs may never adopt benevolent and cooperative values, no matter what the environment. Neither does the intelligence of the AGIs necessarily affect their values (Bostrom 2012). Sufficiently intelligent AGIs could certainly come to eventually understand human values, but humans can also come to understand others’ values while continuing to disagree with them. Thus, in order for these kinds of proposals to work, they need to incorporate strong enforcement mechanisms to keep non–safe AGIs in line and to prevent them from acquiring significant power. This requires an ability to create value-conforming AGIs in the first place, to implement the enforcement. Even a soft takeoff would eventually lead to AGIs wielding great power, so the enforcement could not be left to just humans or narrow AIs.³ In practice, this means that integration proposals must be combined with some proposal for internal constraints which is capable of reliably creating value-conforming AGIs. Integration proposals also require there to be a soft takeoff in order to work, as having a small group of AGIs which rapidly acquired enough power to take control of the world would prevent any gradual integration schemes from working. Therefore, because any effective integration strategy would require creating safe AGIs, and the right safe AGI design could lead to a positive outcome even if there were a hard takeoff, we believe that it is currently better to focus on proposals which are aimed at furthering the creation of safe AGIs. 3.3.3 Regulate Research Integrating AGIs into society may require explicit regulation. Calls for regulation are often agnostic about long-term outcomes but nonetheless recommend caution as a reasonable approach. For example, Hibbard (2005b) calls for international regulation to ensure that AGIs will value the long-term well-being of humans, but does not go into much detail. Daley (2011) calls for a government panel for AGI issues. Hughes (2001) argues that AGI should be regulated using the same mechanisms as previous technologies, creating state agencies responsible for the task and fostering global cooperation in the regulation effort. Current mainstream academic opinion does not consider AGI a serious threat (Horvitz and Selman 2009), so AGI regulation seems unlikely in the near future. On the other hand, many AI systems are becoming increasingly autonomous, and a number of authors are arguing that even narrow-AI applications should be equipped with an understanding of ethics (Wallach and Allen 2009). Currently there are calls to regulate AI in the form of high-frequency trading (Sobolewski 2012), and AI applications that have a major impact on society might become increasingly regulated. At the same time, legislation has a well-known tendency to lag behind technology, and regulating AI applications will probably not translate into regulating basic research into AGI. 3.3.3.1 Review Boards Yampolskiy and Fox (2012) note that university research programs in the social and medical sciences are overseen by institutional review boards. They propose setting up analogous review boards to evaluate potential AGI research. Research that was found to be AGI related would be restricted with measures ranging from supervision and funding limits to partial or complete bans. At the same time, research focusing on safety measures would be encouraged. Posner (2004, p. 221) suggests the enactment of a law which would require scientific research projects in dangerous areas to be reviewed by a federal catastrophic risks assessment board, and forbidden if the board found that the project would create an undue risk to human survival. Wilson (2013) makes possibly the most detailed AGI regulation proposal so far, recommending a new international treaty where a body of experts would determine whether there was a “reasonable level of concern” about AGI or some other possibly dangerous research. States would be required to regulate research or even temporarily prohibit it once experts agreed upon there being such a level of concern. He also suggests a number of other safeguards built into the treaty, such as the creation of ethical oversight organizations for researchers, mechanisms for monitoring abuses of dangerous technologies, and an oversight mechanism for scientific publications. 3.3.3.2 Encourage Research into Safe AGI In contrast, McGinnis (2010) argues that the government should not attempt to regulate AGI development. Rather, it should concentrate on providing funding for research projects intended to create safe AGI. Goertzel and Pitt (2012) argue for an open-source approach to safe AGI development instead of regulation. Hibbard (2008) has likewise suggested developing AGI via open-source methods, but not as an alternative to regulation. Legg (2009) proposes funding safe AGI research via an organization that takes a venture capitalist approach to funding research teams, backing promising groups and cutting funding to any teams that fail to make significant progress. The focus of the funding would be to make AGI as safe as possible. 3.3.3.3 Differential Technological Progress Both review boards and government funding could be used to implement “differential intellectual progress” Differential intellectual progress consists in prioritizing risk-reducing intellectual progress over risk-increasing intellectual progress. As applied to AI risks in particular, a plan of differential intellectual progress would recommend that our progress on the scientific, philosophical, and technological problems of AI safety outpace our progress on the problems of AI capability such that we develop safe superhuman AIs before we develop (arbitrary) superhuman AIs (Muehlhauser and Salamon 2012). Examples of research questions that could constitute philosophical or scientific progress in safety can be found in later sections of this paper—for instance, the usefulness of different internal constraints on ensuring safe behavior, or ways of making AGIs reliably adopt human values as they learn what those values are like. Bostrom (2002) used the term “differential technological progress” to refer to differential intellectual progress in technological development. Bostrom defined differential technological progress as “trying to retard the implementation of dangerous technologies and accelerate implementation of beneficial technologies, especially those that ameliorate the hazards posed by other technologies”. One issue with differential technological progress is that we do not know what kind of progress should be accelerated and what should be retarded. For example, a more advanced communication infrastructure could make AGIs more dangerous, as there would be more networked machines that could be accessed via the Internet. Alternatively, it could be that the world will already be so networked that AGIs will be a major threat anyway, and further advances will make the networks more resilient to attack. Similarly, it can be argued that AGI development is dangerous for as long as we have yet to solve the philosophical problems related to safe AGI design and do not know which AGI architectures are safe to pursue (Yudkowsky 2008). But it can also be argued that we should invest in AGI development now, when the related tools and hardware are still primitive enough that progress will be slow and gradual (Goertzel and Pitt 2012). 3.3.3.4 International Mass Surveillance For AGI regulation to work, it needs to be enacted on a global scale. This requires solving both the problem of effectively enforcing regulation within a country and the problem of getting many different nations to all agree on the need for regulation. Shulman (2009) discusses various factors influencing the difficulty of AGI arms control. He notes that AGI technology itself might make international cooperation more feasible. If narrow AIs and early-stage AGIs were used to analyze the information obtained from wide-scale mass surveillance and wiretapping, this might make it easier to ensure that nobody was developing more advanced AGI designs. Shulman (2010b) similarly notes that machine intelligences could be used to enforce treaties between nations. They could also act as trustworthy inspectors which would be restricted to communicating only information about treaty violations, thus not endangering state secrets even if they were allowed unlimited access to them. This could help establish a “singleton” regulatory regimen capable of effectively enforcing international regulation, including AGI-related treaties. Goertzel and Pitt (2012) also discuss the possibility of having a network of AGIs monitoring the world in order to police other AGIs and to prevent any of them from suddenly obtaining excessive power. Another proposal for international mass surveillance is to build an “AGI Nanny” (Goertzel 2012b; Goertzel and Pitt 2012), a proposal discussed in Sect. 3.5.4. Large-scale surveillance efforts are ethically problematic and face major political resistance, and it seems unlikely that current political opinion would support the creation of a far-reaching surveillance network for the sake of AGI risk alone. The extent to which such extremes would be necessary depends on exactly how easy it would be to develop AGI in secret. Although several authors make the point that AGI is much easier to develop unnoticed than something like nuclear weapons (McGinnis 2010; Miller 2012), cutting-edge high-tech research does tend to require major investments which might plausibly be detected even by less elaborate surveillance efforts. To the extent that surveillance does turn out to be necessary, there is already a strong trend toward a “surveillance society” with increasing amounts of information about people being collected and recorded in various databases (Wood and Ball 2006). As a reaction to the increased surveillance, Mann et al. (2003) propose to counter it with sousveillance—giving private individuals the ability to document their life and subject the authorities to surveillance in order to protect civil liberties. This is similar to the proposals of Brin (1998), who argues that technological progress might eventually lead to a “transparent society,” where we will need to redesign our societal institutions in a way that allows us to maintain some of our privacy despite omnipresent surveillance. Miller (2012) notes that intelligence agencies are already making major investments in AI-assisted analysis of surveillance data. If social and technological developments independently create an environment where large-scale surveillance or sousveillance is commonplace, it might be possible to take advantage of those developments in order to police AGI risk.⁴ Walker (2008) argues that in order for mass surveillance to become effective, it must be designed in such a way that it will not excessively violate people’s privacy, for otherwise the system will face widespread sabotage. Even under such conditions, there is no clear way to define what counts as dangerous AGI. Goertzel and Pitt (2012) point out that there is no clear division between narrow AI and AGI, and attempts to establish such criteria have failed. They argue that since AGI has a nebulous definition, obvious wide-ranging economic benefits, and potentially rich penetration into multiple industry sectors, it is unlikely to be regulated due to speculative long-term risks. AGI regulation requires global cooperation, as the noncooperation of even a single nation might lead to catastrophe. Historically, achieving global cooperation on tasks such as nuclear disarmament and climate change has been very difficult. As with nuclear weapons, AGI could give an immense economic and military advantage to the country that develops it first, in which case limiting AGI research might even give other countries an incentive to develop AGI faster (Miller 2012). To be effective, regulation also needs to enjoy support among those being regulated. If developers working in AGI-related fields only follow the letter of the law, while privately viewing all regulations as annoying hindrances, and fears about AGI as overblown, the regulations may prove ineffective. Thus, it might not be enough to convince governments of the need for regulation; the much larger group of people working in the appropriate fields may also need to be convinced. While Shulman (2009) argues that the unprecedentedly destabilizing effect of AGI could be a cause for world leaders to cooperate more than usual, the opposite argument can be made as well. Gubrud (1997) argues that increased automation could make countries more self-reliant, and international cooperation considerably more difficult. AGI technology is also much harder to detect than, for example, nuclear technology is—nuclear weapons require a substantial infrastructure to develop, while AGI needs much less (McGinnis 2010; Miller 2012). Miller (2012) even suggests that the mere possibility of a rival being close to developing AGI might, if taken seriously, trigger a nuclear war. The nation that was losing the AGI race might think that being the first to develop AGI was sufficiently valuable that it was worth launching a first strike for, even if it would lose most of its own population in the retaliatory attack. He further argues that, although it would be in the interest of every nation to try to avoid such an outcome, the ease of secretly pursuing an AGI development program undetected, in violation of treaty, could cause most nations to violate the treaty. Miller also points out that the potential for an AGI arms race exists not only between nations, but between corporations as well. He notes that the more AGI developers there are, the more likely it is that they will all take more risks, with each AGI developer reasoning that if they don’t take this risk, somebody else might take that risk first. Goertzel and Pitt (2012) suggest that for regulation to be enacted, there might need to be an “AGI Sputnik”—a technological achievement that makes the possibility of AGI evident to the public and policy makers. They note that after such a moment, it might not take very long for full human-level AGI to be developed, while the negotiations required to enact new kinds of arms control treaties would take considerably longer. So far, the discussion has assumed that regulation would be carried out effectively and in the pursuit of humanity’s common interests, but actual legislation is strongly affected by lobbying and the desires of interest groups (Olson 1982; Mueller 2003, Chap. 22). Many established interest groups would have an economic interest in either furthering or retarding AGI development, rendering the success of regulation uncertain. 3.3.3.5 “Regulate Research” Proposals—Our View Although there seem to be great difficulties involved with regulation, there also remains the fact that many technologies have been successfully subjected to international regulation. Even if one were skeptical about the chances of effective regulation, an AGI arms race seems to be one of the worst possible scenarios, one which should be avoided if at all possible. We are therefore generally supportive of regulation, though the most effective regulatory approach remains unclear. 3.3.4 Enhance Human Capabilities While regulation approaches attempt to limit the kinds of AGIs that will be created, enhancement approaches attempt to give humanity and AGIs a level playing field. In principle, gains in AGI capability would not be a problem if humans could improve themselves to the same level. Alternatively, human capabilities could be improved in order to obtain a more general capability to deal with difficult problems. Verdoux (2010, 2011) suggests that cognitive enhancement could help in transforming previously incomprehensible mysteries into tractable problems, and Verdoux (2010) particularly highlights the possibility of cognitive enhancement helping to deal with the problems posed by existential risks. One problem with such approaches is that increasing humanity’s capability for solving problems will also make it easier to develop dangerous technologies. It is possible that cognitive enhancement should be combined with moral enhancement, in order to help foster the kind of cooperation that would help avoid the risks of technology (Persson and Savulescu 2008, 2012). Moravec (1988, 1999) proposes that humans could keep up with AGIs via “mind uploading,” a process of transferring the information in human brains to computer systems so that human minds could run on a computer substrate. This technology may arrive during a similar timeframe as AGI (Kurzweil 2005; Sandberg and Bostrom 2008; Hayworth 2012; Koene 2012b; Cattell and Parker 2012; Sandberg 2012). However, Moravec argues that mind uploading would come after AGIs, and that unless the uploaded minds (“uploads”) would transform themselves to become radically nonhuman, they would be weaker and less competitive than AGIs that were native to a digital environment Moravec (1992, 1999). For these reasons, Warwick (1998) also expresses doubt about the usefulness of mind uploading. Kurzweil (2005) posits an evolution that will start with brain-computer interfaces, then proceed to using brain-embedded nanobots to enhance our intelligence, and finally lead to full uploading and radical intelligence enhancement. Koene (2012a) criticizes plans to create safe AGIs and considers uploading both a more feasible and a more reliable approach. Similar proposals have also been made without explicitly mentioning mind uploading. Cade (1966) speculates on the option of gradually merging with machines by replacing body parts with mechanical components. Turney (1991) proposes linking AGIs directly to human brains so that the two meld together into one entity, and Warwick (1998, 2003) notes that cyborgization could be used to enhance humans. Mind uploading might also be used to make human value systems more accessible and easy to learn for AGIs, such as by having an AGI extrapolate the upload’s goals directly from its brain, with the upload providing feedback. 3.3.4.1 Would We Remain Human? Uploading might destroy parts of humanity that we value (Joy 2000; de Garis 2005). De Garis (2005) argues that a computer could have far more processing power than a human brain, making it pointless to merge computers and humans. The biological component of the resulting hybrid would be insignificant compared to the electronic component, creating a mind that was negligibly different from a “pure” AGI. Kurzweil (2001) makes the same argument, saying that although he supports intelligence enhancement by directly connecting brains and computers, this would only keep pace with AGIs for a couple of additional decades. The truth of this claim seems to depend on exactly how human brains are augmented. In principle, it seems possible to create a prosthetic extension of a human brain that uses the same basic architecture as the original brain and gradually integrates with it (Sotala and Valpola 2012). A human extending their intelligence using such a method might remain roughly human-like and maintain their original values. However, it could also be possible to connect brains with computer programs that are very unlike human brains, and which would substantially change the way the original brain worked. Even smaller differences could conceivably lead to the adoption of “cyborg values” distinct from ordinary human values (Warwick 2003). Bostrom (2004) speculates that humans might outsource many of their skills to nonconscious external modules and would cease to experience anything as a result. The value-altering modules would provide substantial advantages to their users, to the point that they could outcompete uploaded minds who did not adopt the modules. 3.3.4.2 Would Evolutionary Pressures Change Us? A willingness to integrate value-altering modules is not the only way by which a population of uploads might come to have very different values from modern-day humans. This is not necessarily a bad, or even a very novel, development the values of earlier generations have often been different from the values of later generations (Hanson 2012), and it might not be a problem if a civilization of uploads enjoyed very different things than a civilization of humans. Still, as there are possible outcomes that we would consider catastrophic, such as the loss of nearly all things that have intrinsic value for us (Bostrom 2004), it is worth reviewing some of the postulated changes in values. For comprehensiveness, we will summarize all of the suggested effects that uploading might have on human values, even if they are not obviously negative. Readers may decide for themselves whether or not they consider any of these effects concerning. Hanson (1994) argues that employers will want to copy uploads who are good workers, and that at least some uploads will consent to being copied in such a manner. He suggests that the resulting evolutionary dynamics would lead to an accelerated evolution of values. This would cause most of the upload population to evolve to be indifferent or favorable to the thought of being copied, to be indifferent toward being deleted as long as another copy of themselves remained, and to be relatively uninterested in having children “the traditional way” (as opposed to copying an already-existing mind). Although Hanson’s analysis uses the example of a worker-employer relationship, it should be noted that nations or families, or even single individuals, could also gain a competitive advantage by copying themselves, thus contributing to the strength of the evolutionary dynamic. Similarly, Bostrom (2004) writes that much of human life’s meaning depends on the enjoyment of things ranging from humor and love to literature and parenting. These capabilities were adaptive in our past, but in an upload environment they might cease to be such and gradually disappear entirely. Shulman (2010b) likewise considers uploading-related evolutionary dynamics. He notes that there might be a strong pressure for uploads to make copies of themselves in such a way that individual copies would be ready to sacrifice themselves to aid the rest. This would favor a willingness to copy oneself, and a view of personal identity which did not consider the loss of a single copy to be death. Beings taking this point of view could then take advantage of economic benefits of continually creating and deleting vast numbers of minds depending on the conditions, favoring the existence of a large number of short-lived copies over a somewhat less efficient world of long-lived minds. Finally, Sotala and Valpola (2012) consider the possibility of minds coalescing via artificial connections that linked several brains together in the same fashion as the two hemispheres of ordinary brains are linked together. If this were to happen, considerable benefits might accrue to those who were ready to coalesce with other minds. The ability to copy and share memories between minds might also blur distinctions between individual minds. In the end, most humans might cease to be individual, distinct people in any real sense of the word. It has also been proposed that information security concerns could cause undesirable dynamics among uploads, with significant advantages accruing to those who could steal the computational resources of others and use them to create new copies of themselves. If one could seize control of the hardware that an upload was running on, it could be immediately replaced with a copy of a mind loyal to the attacker. It might even be possible to do this without being detected, if it was possible to steal enough of an upload’s personal information to impersonate it. An attack targeting a critical vulnerability in some commonly used piece of software might quickly hit a very large number of victims. As discussed in the previous chapter, both theoretical arguments and actual cases of malware show that large numbers of machines on the Internet could be infected in a very short time (Staniford et al. 2002, Moore et al. 2002, 2003). In a society of uploads, attacks such as these would be not only inconvenient, but potentially fatal. Eckersley and Sandberg (2013) offer a preliminary analysis of information security in a world of uploads. 3.3.4.3 Would Uploading Help? Even if the potential changes of values were deemed acceptable, it is unclear whether the technology for uploading could be developed before developing AGI. Uploading might require emulating the low-level details of a human brain with a high degree of precision, requiring large amounts of computing power (Sandberg and Bostrom 2008; Cattell and Parker 2012). In contrast, an AGI might be designed around high-level principles which have been chosen to be computationally cheap to implement on existing hardware architectures. Yudkowsky (2008) uses the analogy that it is much easier to figure out the principles of aerodynamic flight and then build a Boeing 747 than it is to take a living bird and “upgrade” it into a giant bird that can carry passengers, all while ensuring that the bird remains alive and healthy throughout the process. Likewise, it may be much easier to figure out the basic principles of intelligence and build AGIs than to upload existing minds. On the other hand, one can also construct an analogy suggesting that it is easier to copy a thing’s function than it is to understand how it works. If a person does not understand architecture but wants to build a sturdy house, it may be easier to create a replica of an existing house than it is to design an entirely new one that does not collapse. Even if uploads were created first, they might not be able to harness all the advantages of digitality, as many of these advantages depend on minds being easy to modify (Sotala 2012), which human minds may not be. Uploads will be able to directly edit their source code as well as introduce simulated pharmaceutical and other interventions, and they could experiment on copies that are restored to an unmodified state if the modifications turn out to be unworkable (Shulman 2010b). Regardless, human brains did not evolve to be easy to modify, and it may be difficult to find a workable set of modifications which would drastically improve them. In contrast, in order for an AGI programmed using traditional means to be manageable as a software project, it must be easy for the engineers to modify it.⁵ Thus, even if uploading were developed before AGI, AGIs that were developed later might still be capable of becoming more powerful than uploads. However, existing uploads already enjoying some of the advantages of the newly-created AGIs would still make it easier for the uploads to control the AGIs, at least for a while. Moravec (1992) notes that the human mind has evolved to function in an environment which is drastically different from a purely digital environment, and that the only way to remain competitive with AGIs would be to transform into something that was very different from a human. This suggests that uploading might buy time for other approaches, but would be only a short-term solution in and of itself. If uploading technology were developed before AGI, it could be used to upload a research team or other group and run them at a vastly accelerated rate as compared to the rest of humanity. This would give them a considerable amount of extra time for developing any of the other approaches. If this group were among the first to be successfully emulated and sped up, and if the speed-up would allow enough subjective time to pass before anyone else could implement their own version, they might also be able to avoid trading safety for speed. However, such a group might be able to wield tremendous power, so they would need to be extremely reliable and trustworthy. 3.3.4.4 “Enhance Human Capabilities” Proposals—Our View Of the various “enhance human capabilities” approaches, uploading proposals seem the most promising, as translating a human brain to a computer program would sidestep many of the constraints that come from modifying a physical system. For example, all relevant brain activity could be recorded for further analysis at an arbitrary level of detail, and any part of the brain could be instantly modified without a need for time-consuming and possibly dangerous invasive surgery. Uploaded brains could also be more easily upgraded to take full advantage of more powerful hardware, while humans whose brains were still partially biological would be bottlenecked by the speed of the biological component. Uploading does have several problems Uploading research might lead to AGI being created before the uploads, in the long term uploads might have unfavorable evolutionary dynamics, and it seems likely that there will eventually be AGIs which are capable of outperforming uploads in every field of competence. Uploads could also be untrustworthy even without evolutionary dynamics. At the same time, however, uploading research doesn’t necessarily accelerate AGI research very much, the evolutionary dynamics might not be as bad as they seem at the moment, and the advantages gained from uploading might be enough to help control unsafe AGIs until safe ones could be developed. Methods could also be developed for increasing the trustworthiness of uploads (Shulman 2010b). Uploading might still turn out to be a useful tool for handling AGI risk. 3.3.5 Relinquish Technology Not everyone believes that the risks involved in creating AGIs are acceptable. Relinquishment involves the abandonment of technological development that could lead to AGI. This is possibly the earliest proposed approach, with Butler (1863) writing that “war to the death should be instantly proclaimed” upon machines, for otherwise they would end up destroying humans entirely. In a much-discussed article, Joy (2000) suggests that it might be necessary to relinquish at least some aspects of AGI research, as well as nanotechnology and genetics research. Hughes (2001) criticizes AGI relinquishment, while Kurzweil (2005) criticizes broad relinquishment but supports the possibility of “fine-grained relinquishment,” banning some dangerous aspects of technologies while allowing general work on them to proceed. In general, most writers reject proposals for broad relinquishment. 3.3.5.1 Outlaw AGI Weng et al. (2009) write that the creation of AGIs would force society to shift from human-centric values to robot-human dual values. In order to avoid this, they consider the possibility of banning AGI. This could be done either permanently or until appropriate solutions are developed for mediating such a conflict of values. McKibben (2003), writing mainly in the context of genetic engineering, also suggests that AGI research should be stopped. Hughes (2001) argues that attempts to outlaw a technology will only make the technology move to other countries. De Garis (2005) believes that differences of opinion about whether to build AGIs will eventually lead to armed conflict, to the point of open warfare. Annas et al. (2002) have similarly argued that genetic engineering of humans would eventually lead to war between unmodified humans and the engineered “posthumans,” and that cloning and inheritable modifications should therefore be banned. To the extent that one accepts their reasoning with regard to humans, it could also be interpreted to apply to AGIs. 3.3.5.2 Restrict Hardware Berglas (2012) suggests not only stopping AGI research, but also outlawing the production of more powerful hardware. Berglas holds that it will be possible to build computers as powerful as human brains in the very near future, and that we should therefore reduce the power of new processors and destroy existing ones.⁶ Branwen (2012) argues that Moore’s Law depends on the existence of a small number of expensive and centralized chip factories, making them easy targets for regulation and incapable of being developed in secret. 3.3.5.3 “Relinquish Technology” Proposals—Our View Relinquishment proposals suffer from many of the same problems as regulation proposals, but to a greater extent. There is no historical precedent of general, multiuse technology similar to AGI being successfully relinquished for good, nor do there seem to be any theoretical reasons for believing that relinquishment proposals would work in the future. Therefore we do not consider them to be a viable class of proposals. 3.4 External AGI Constraints Societal approaches involving regulation or research into safe AGI assume that proper AGI design can produce solutions to AGI risk. One category of such solutions is that of external constraints. These are restrictions that are imposed on an AGI from the outside and aim to limit its ability to do damage. Several authors have argued that external constraints are unlikely to work with AGIs that are genuinely far more intelligent than us (Vinge 1993; Yudkowsky 2001; 2008; Kurzweil 2005; Chalmers 2010; Armstrong et al. 2012). The consensus seems to be that external constraints might buy time when dealing with less advanced AGIs, but they are useless against truly superintelligent ones. External constraints also limit the usefulness of an AGI, as a free-acting one could serve its creators more effectively. This reduces the probability of the universal implementation of external constraints on AGIs. AGIs might also be dangerous if they were confined or otherwise restricted. For further discussion of these points, see Sect. 3.5.1. 3.4.1 AGI Confinement AGI confinement, or “AI boxing” (Chalmers 2010; Yampolskiy 2012; Armstrong et al. 2012), involves confining an AGI to a specific environment and limiting its access to the external world. Yampolskiy (2012) makes an attempt to formalize the idea, drawing on previous computer security research on the so-called confinement problem (Lampson 1973). Yampolskiy defines the AI confinement problem (AICP) as the challenge of restricting an AGI to a confined environment from which it can’t communicate without authorization. A number of methods have been proposed for implementing AI confinement, many of which are extensively discussed in Armstrong et al. (2012). Chalmers (2010) and Armstrong et al. (2012) mention numerous caveats and difficulties with AI-boxing approaches. A truly leakproof system that perfectly isolated the AGI from an outside environment would prevent us from even observing the AGI. If AGIs were given knowledge about human behavior or psychology, they could still launch social engineering attacks on us (Chalmers 2010; Armstrong et al. 2012). An AGI that was unaware of the existence of humans would be less likely to launch such attacks, but also much more limited in the kinds of tasks that it could be used for. Even if the AGI remained confined, it could achieve enough influence among humans to prevent itself from being reset or otherwise modified (Good 1970). An AGI that people grew reliant on might also become impossible to reset or modify. 3.4.1.1 Safe Questions Yampolskiy (2012) proposes a hierarchy of security levels for communication with the AGI, ranging from completely unrestricted communication to no communication whatsoever. He suggests that the right level for communicating with AGIs is an intermediate one, based on asking “safe questions.” These are defined as questions for which a human could find an answer without help from a superintelligence. For instance, a researcher who was considering two possible cures for cancer could ask the AGI which one to pursue first. Yampolskiy argues that the AGI is effectively not changing the final outcome, as the researcher would in principle be capable of finding the better treatment himself. The AGI, by recommending one of the approaches as the choice to pursue first, is merely speeding up the process. Armstrong et al. (2012) also briefly discuss such approaches. 3.4.1.2 Virtual Worlds While the confinement protocol supposes an AGI that can perceive our world directly, Chalmers (2010) uses the term “leakproof singularity” to refer to the project of creating isolated virtual worlds which would disallow their inhabitants from influencing the external world. An AGI placed in such a world could be monitored by humans to ensure that it was safe to release into our world. However, Chalmers notes that “for an AI system to be useful or interesting to us at all, it must have some effects on us. At a minimum, we must be able to observe it. And the moment we observe a virtual environment, some information leaks out from that environment into our environment and affects us”. Hibbard (2012b) proposes a decision support system for evaluating AGI designs in simulated environments that model our real environment, and attempts to show that the AGI can be built to be safe. Armstrong et al. (2012) discuss the notion that we could supplement this approach with several layers of “sub-worlds” virtual worlds stacked within each other. In such a scenario, even if an AGI somehow manages to break out from one virtual reality, it is still restricted by another and cannot directly influence the world. Ultimately, the authors conclude that this approach isn’t very promising. 3.4.1.3 Resetting the AGI For as long as an AGI remains confined or cooperative, its memory could be altered from the outside. Turney (1991) mentions a proposal in which an AGI’s memory could be wiped after each job, thus limiting its potential. Armstrong et al. (2012) also mention the possibility of resetting an AGI after each run, though they note that there are many ways for the AGI to get around it, and that it is only useful for certain kinds of AGI systems. Good (1970) warns that a superhuman AGI may be smart enough to find ways to prevent itself from being successfully reset or shut down, including methods of social manipulation. Warwick (1998) makes a similar point. 3.4.1.4 Checks and Balances Armstrong et al. (2012) suggest a number of checks and balances that could be used to catch the early stages of an AGI that was behaving dangerously or trying to deceive us. Test questions with known answers could be mixed in with questions we genuinely didn’t know the answer to, in order to establish an AGI’s trustworthiness. “Honey-pots,” tempting resources with great value to the AGI, could also be left in physical or mental locations that it was forbidden from accessing. Finally, several different kinds of AGIs could be presented with the same problem, with their answers then being compared. Inconsistent answers would suggest a possible attempt at deception. Bostrom (2014, Chap. 9) similarly discusses “tripwires”, systems which make diagnostic tests to the AGI and shut it down if they detect signs of dangerous activity. Some possible tripwires includes ones monitoring the AGI’s behavior, its abilities, or possibly even content of the AGI’s thoughts and planning. 3.4.1.5 “AI Confinement” Proposals—Our View Despite their limited ability to deal with AGIs more intelligent than us, AI-boxing techniques seem to have value as a first line of defense, and it may be worthwhile to invest in developing off-the-shelf software packages for AI confinement that are easy and cheap to use. A research project that developed AGI unexpectedly might not have been motivated to make major investments in security, but the AGI might still be sufficiently limited in intelligence that confinement would work. Having a defense that is easy to deploy will make it more likely that these kinds of projects will implement better precautions. However, at the same time there is a risk that this will promote a false sense of security and make research teams think that they have carried out their duty to be cautious merely because they are running elementary confinement protocols. Although some confinement procedures can be implemented on top of an AGI that was not expressly designed for confinement, they are much less reliable than with an AGI that was built with confinement considerations in mind (Armstrong et al. 2012)—and even then, relying solely on confinement is a risky strategy. We are therefore somewhat cautious in our recommendation to develop confinement techniques. 3.4.2 AGI Enforcement One problem with AI confinement proposals is that humans are tasked with guarding machines that may be far more intelligent than themselves (Good 1970). One proposed solution for this problem is to give the task of watching AGIs to other AGIs. Armstrong (2007) proposes that the trustworthiness of a superintelligent system could be monitored via a chain of less powerful systems, all the way down to humans. Although humans couldn’t verify and understand the workings of a superintelligence, they could verify and understand an AGI just slightly above their own level, which could in turn verify and understand an AGI somewhat above its own level, and so on. Chaining multiple levels of AI systems with progressively greater capacity seems to be replacing the problem of building a safe AI with a multisystem, and possibly more difficult, version of the same problem. Armstrong himself admits that there are several problems with the proposal. There could be numerous issues along the line, such as a break in the chain of communication or an inability of a system to accurately assess the mind of another (smarter) system. There is also the problem of creating a trusted bottom for the chain in the first place, which is not necessarily any easier than creating a trustworthy superintelligence. Hall (2007a) writes that there will be a great variety of AGIs, with those that were designed to be moral or aligned with human interests keeping the nonsafe ones in check. Goertzel and Pitt (2012) also propose that we build a community of mutually policing AGI systems of roughly equal levels of intelligence. If an AGI started to “go off the rails,” the other AGIs could stop it. This might not prevent a single AGI from undergoing an intelligence explosion, but a community of AGIs might be in a better position to detect and stop it than humans would. Having AGIs police each other is only useful if the group of AGIs actually has goals and values that are compatible with human goals and values. To this end, the appropriate internal constraints are needed. The proposal of a society of mutually policing AGIs would avoid the problem of trying to control a more intelligent mind. If a global network of mildly superintelligent AGIs could be instituted in such a manner, it might detect and prevent any nascent takeoff. However, by itself such an approach is not enough to ensure safety—it helps guard against individual AGIs “going off the rails,” but it does not help in a scenario where the programming of most AGIs is flawed and leads to nonsafe behavior. It thus needs to be combined with the appropriate internal constraints. A complication is that a hard takeoff is a relative term—an event that happens too fast for any outside observer to stop. Even if the AGI network were a hundred times more intelligent than a network composed of only humans, there might still be a more sophisticated AGI that could overcome the network. 3.4.2.1 “AGI Enforcement” Proposals—Our View AGI enforcement proposals are in many respects similar to social integration proposals, in that they depend on the AGIs being part of a society which is strong enough to stop any single AGI from misbehaving. The greatest challenge is then to make sure that most of the AGIs in the overall system are safe and do not unite against humans rather than against misbehaving AGIs. Also, there might not be a natural distinction between a distributed AGI and a collection of many different AGIs, and AGI design is in any case likely to make heavy use of earlier AI/AGI techniques. AGI enforcement proposals therefore seem like implementation variants of various internal constraint proposals (Sect. 3.5), rather than independent proposals. 3.5 Internal Constraints In addition to external constraints, AGIs could be designed with internal motivations designed to ensure that they would take actions in a manner beneficial to humanity. Alternatively, AGIs could be built with internal constraints that make them easier to control via external means. With regard to internal constraints, Yudkowsky (2008) distinguishes between technical failure and philosophical failure: Technical failure is when you try to build an AI and it doesn’t work the way you think it does—you have failed to understand the true workings of your own code. Philosophical failure is trying to build the wrong thing, so that even if you succeeded you would still fail to help anyone or benefit humanity. Needless to say, the two failures are not mutually exclusive (Yudkowsky 2008). In practice, it is not always easy to distinguish between the two. Most of the discussion below focuses on the philosophical problems of various proposals, but some of the issues, such as whether or not a proposal is actually possible to implement, are technical. 3.5.1 Oracle AI An Oracle AI is a hypothetical AGI that executes no actions other than answering questions. This is a proposal with many similarities to AGI confinement: both involve restricting the extent to which the AGI is allowed to take independent action. We consider the difference to be that an Oracle AI has been programmed to “voluntarily” restrict its activities, whereas AGI confinement refers to methods for restricting an AGI’s capabilities even if it was actively attempting to take more extensive action. Trying to build an AGI that only answered questions might not be as safe as it sounds, however. Correctly defining “take no actions” might prove surprisingly tricky (Armstrong et al. 2012), and the oracle could give flawed advice even if it did correctly restrict its actions. Some possible examples of flawed advice: As extra resources are useful for the fulfillment of nearly all goals (Omohundro 2007, 2008), the oracle may seek to obtain more resources—such as computing power—in order to answer questions more accurately. Its answers might then be biased toward furthering this goal, even if this temporarily reduces the accuracy of its answers, if it believes this to increase the accuracy of its answers in the long run. Another example is that if the oracle had the goal of answering as many questions as possible as fast as possible, it could attempt to manipulate humans into asking it questions that were maximally simple and easy to answer. Holden Karnofsky has suggested that an Oracle AI could be safe if it was “just a calculator,” a system which only computed things that were asked of it, taking no goal-directed actions of its own. Such a “Tool-Oracle AI” would keep humans as the ultimate decision makers. Furthermore, the first team to create a Tool-Oracle AI could use it to become powerful enough to prevent the creation of other AGIs (Karnofsky and Tallinn 2011; Karnofsky 2012). An example of a Tool-Oracle AI approach might be Omohundro’s (2012) proposal of “Safe-AI Scaffolding” creating highly constrained AGI systems which act within limited, predetermined parameters. These could be used to develop formal verification methods and solve problems related to the design of more intelligent, but still safe, AGI systems. Oracle AIs might be considered a special case of domestic AGI (Bostrom 2014, Chap. 9): AGIs which are built to only be interested in taking action”on a small scale, within a narrow context, and through a limited set of action modes”. 3.5.1.1 Oracles Are Likely to Be Released As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous. Current narrow-AI technology includes high-frequency trading (HFT) algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for a longer-term investment benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge (Hanson 2000). As a consequence, a trading algorithm’s performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be quickly taken advantage of in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain as pure oracles for long. Similarly, Wallach and Allen (2012) discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop”. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong. In practice, this may make it much harder for the human to control the robot’s actions, if the robot makes a decision which the operator only has a very short time to override. Docherty & Goose (2012) report on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computer’s plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons. In general, any broad domain involving high stakes, adversarial decision making, and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection, and warfare could plausibly make use of all the intelligence they can get. If one’s opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems. Miller (2012) also points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death. Some AGI designers might also choose to create less constrained and more free-acting AGIs for aesthetic or moral reasons, preferring advanced minds to have more freedom. 3.5.1.2 Oracles Will Become Authorities Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an Oracle AI. This may be a danger even with narrower AI systems. Friedman and Kahn (1992) discuss APACHE, an expert system that provides medical advice to doctors. They write that as the medical community puts more and more trust into APACHE, it may become common practice to act automatically on APACHE’s recommendations, and it may become increasingly difficult to challenge the “authority” of the recommendations. Eventually, APACHE may in effect begin to dictate clinical decisions. Likewise, Bostrom and Yudkowsky (2013) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AGI system would be an even better way of avoiding blame. Wallach and Allen (2012) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing to have a human to pull the trigger. Thus, even AGI systems that function purely to provide advice will need to be explicitly designed to be safe in the sense of not providing advice that would go against human values (Wallach and Allen 2009). Yudkowsky (2012) further notes that an Oracle AI might choose a plan that is beyond human comprehension, in which case there’s still a need to design it as explicitly safe and conforming to human values. 3.5.1.3 “Oracle AI” Proposals—Our View Much like with external constraints, it seems like Oracle AIs could be a useful stepping stone on the path toward safe, freely acting AGIs. However, because any Oracle AI can be relatively easily turned into a free-acting AGI and because many people will have an incentive to do so, Oracle AIs are not by themselves a solution to AGI risk, even if they are safer than free-acting AGIs when kept as pure oracles. 3.5.2 Top-Down Safe AGI AGIs built to take autonomous actions will need to be designed with safe motivations. Wallach and Allen divide approaches for ensuring safe behavior into “top-down” and “bottom-up” approaches (Wallach and Allen 2009). They define “top-down” approaches as ones that take a specified ethical theory and attempt to build a system capable of implementing that theory. They have expressed skepticism about the feasibility of both pure top-down and bottom-up approaches, arguing for a hybrid approach.⁷ With regard to top-down approaches, which attempt to derive an internal architecture from a given ethical theory, Wallach (2010) finds three problems
    1. “Limitations already recognized by moral philosophers For example,     in a utilitarian calculation, how can consequences be calculated     when information is limited and the effects of actions cascade in     never-ending interactions? Which consequences should be factored     into the maximization of utility? Is there a stopping procedure?”     (Wallach 2010). 
      
    2. 2. The “frame problem” refers to the challenge of discerning relevant from irrelevant information without having to consider all of it, as all information could be relevant in principle (Pylyshyn 1987; Dennett 1987). Moral decision-making involves a number of problems that are related to the frame problem, such as needing to know what effects different actions have on the world, and needing to estimate whether one has sufficient information to accurately predict the consequences of the actions.   3. 3. “The need for background information. What mechanisms will the system require in order to acquire the information it needs to make its calculations? How does one ensure that this information is up to date in real time?” (Wallach 2010). To some extent, these problems may be special cases of the fact that we do not yet have AGI with good general learning capabilities creating an AGI would also require solving the frame problem, for instance. These problems might therefore not all be as challenging as they seem at first, presuming that we manage to develop AGI in the first place. 3.5.2.1 Three Laws Probably the most widely known proposal for machine ethics is Isaac Asimov’s (1942) Three Laws of Robotics:
    1. A robot may not injure a human being or, through inaction, allow a     human being to come to harm. 
      
    2. 2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.   3. 3. A robot must protect its own existence as long as such protection does not conflict with either the First or Second Law. Asimov and other writers later expanded the list to include a number of additional laws, including the Zeroth Law. A robot may not harm humanity, or through inaction allow humanity to come to harm. Although the Three Laws are widely known and have inspired numerous imitations, several of Asimov’s own stories were written to illustrate the fact that the laws contained numerous problems. They have also drawn heavy critique from others (Clarke 1993; 1994; Weld and Etzioni 1994; Pynadath and Tambe 2002; Gordon-Spears 2003; McCauley 2007; Weng et al. 2008; Wallach and Allen 2009; Murphy and Woods 2009; Anderson 2011) and are not considered a viable approach for safe AI. Among their chief shortcomings is the fact that they are too ambiguous to implement and, if defined with complete accuracy, contradict each other in many situations. 3.5.2.2 Categorical Imperative The best-known universal ethical axiom is Kant’s categorical imperative. Many authors have discussed using the categorical imperative as the foundation of AGI morality (Stahl 2002; Powers 2006; Wallach and Allen 2009; Beavers 2009, 2012). All of these authors conclude that Kantian ethics is a problematic goal for AGI, though Powers (2006) still remains hopeful about its prospects. 3.5.2.3 Principle of Voluntary Joyous Growth Goertzel (2004a, b) considers a number of possible axioms before settling on what he calls the “Principle of Voluntary Joyous Growth”, defined as “Maximize happiness, growth and choice”. He starts by considering the axiom “maximize happiness”, but then finds this to be problematic and adds “growth”, which he defines as “increase in the amount and complexity of patterns in the universe”. Finally he adds “choice” in order to allow sentient beings to “choose their own destiny”. 3.5.2.4 Utilitarianism Classic utilitarianism is an ethical theory stating that people should take actions that lead to the greatest amount of happiness and the smallest amount of suffering. The prospects for AGIs implementing a utilitarian moral theory have been discussed by several authors. The consensus is that pure classical utilitarianism is problematic and does not capture all human values. For example, a purely utilitarian AGI could reprogram the brains of humans so that they did nothing but experience the maximal amount of pleasure all the time, and that prospect seems unsatisfactory to many.⁸ 3.5.2.5 Value Learning Freeman (2009) describes a decision-making algorithm which observes people’s behavior, infers their preferences in the form of a utility function, and then attempts to carry out actions which fulfill everyone’s preferences. The standard name for this kind of an approach is inverse reinforcement learning (Ng and Russell 2000). Russell (2015) argues for an inverse reinforcement learning approach, as there is considerable data about human behavior and the attitudes behind it, there are solid economic incentives to solve this problem, the problem does not seem intrinsically harder than learning how the rest of the world works, and because it seems possible to define this goal so as to make the AGIs very careful to ensure that they’re correct about our preferences before taking any serious action. Similarly, Dewey (2011) discusses value learners, AGIs which are provided a probability distribution over possible utility functions that humans may have. Value learners then attempt to find the utility functions with the best match for human preferences. Hibbard (2012a) builds on Dewey’s work to offer a similar proposal. One problem with conceptualizing human desires as utility functions is that human desires change over time (van Gelder 1995) and also violate the axioms of utility theory required to construct a coherent utility function (Tversky and Kahneman 1981). While it is possible to treat inconsistent choices as random deviations from an underlying “true” utility function that is then learned (Nielsen and Jensen 2004), this does not seem to properly describe human preferences. Rather, human decision making and preferences seem to be driven by multiple competing systems, only some of which resemble utility functions (Dayan 2011). Even if a true utility function could be constructed, it does not take into account the fact that we have second-order preferences, or desires about our desires a drug addict may desire a drug, but also desire that he not desire it (Frankfurt 1971). Similarly, we often wish that we had stronger desires toward behaviors which we consider good but cannot make ourselves engage in. Taking second-order preferences into account leads to what philosophers call “ideal preference” theories of value. Taking this into account, it has been argued that we should aim to build AGIs which act according to humanity’s extrapolated values (Yudkowsky 2004; Tarleton 2010; Muehlhauser and Helm 2012). Yudkowsky proposes attempting to discover the “Coherent Extrapolated Volition” (CEV) of humanity, which he defines as our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted (Yudkowsky 2004). CEV remains vaguely defined and has been criticized by several authors (Hibbard 2005a; Goertzel 2010a; Goertzel and Pitt 2012; Miller 2012). However, Tarleton (2010) finds CEV a promising approach, and suggests that CEV has five desirable properties, and that many different kinds of algorithms could possess these features
  1. (1) Policing mechanisms to use AI to regulate AGI.   2. (2) Economic policies focusing on the role of AGI in general economic development.   3. (3) Financial policies targeted on the role and function of AGI in the capital market.   4. (4) Social networks mechanisms to monitor the future role, societal impact and moral hazards of AGI applications.   5. (5) ‘Think Tank’ organisations such as the Oxford University Future of Humanity Institute continually identifying and understanding the evolving cultural, moral and cognitive norms of AGI in society (ʬ   6. (6) Global collaboration between governments to build transparency in AGI developments and an effective worldwide AI monitoring scheme. This may be aided by the general increase in globalisation. It is important to recognise that, in the revolutionary process leading to a potential singularity, the change agencies such as individuals, organisations and governments play key roles in determining not only how innovations are carried out and applied but also what institutional change may occur to regularise the technological development. The co-evolution of the human race and technology is generally an iterative process, providing the opportunity to predict implement structural and institutional control mechanisms to minimise the risk of an unwanted singularity event. Possible future prevention measures may also incorporate counter productive change agencies such as non-cooperative governments and organisations that may try to circumvent AGI development regulation. We need to develop our awareness and measures against potential dangers and be proactive to make prevention mechanisms. We need to have confidence and faith in mankind and believe that AI will continue to be assistance to humans. References Baker, T.; Miner, A. S. and Eesley, D. T. (2003), ‘Improvising Firms: Briolage, Account Giving and Improvisational Competencies in the Founding Process’, Research Policy, 32, 255–76. Battilana, J. (2006), ‘Agency and Institutions: The Enabling Role of Individuals’ Social Position’, Organisation, 13 (5), 653–76. Battilana, J. and Casciaro, T. (2012), ‘Change agents, networks, and institutions: a contingency theory of organisational change’, Academy of Management Journal, 55 (2), 381–398. Bostrom, N. (2002), ‘Existential risks: analysing human extinction scenarios and related harzards’, Journal of Evolution and Technology, 9. Boudreau, M. C. and Robey, D. (2005), ‘Enacting integrated information technology: a human agency perspective’, Organization Science, 16 (1), 3–18. Chesbrough H. (2006), Open Business Models: How to Thrive in the New Innovation Landscape. Boston, MA: Harvard Business School Press. Coleman, J. S. (1986), ‘Social theory, social research, and a theory of action’, American Journal of Sociology, 6, 1309–35. Coppinger, R. (2016), ‘Elon Musk outlines Mars colony vision’, Science & Environment, 27^(th) September 2016, BBC News. (ʬ accessed on 29th September 2016) Compagni, A., Mele, V. and Ravasi, D. (2015), ‘How early implementations influence later adoptions of innovation: social positioning and skill reproduction in the diffusion of robotic surgery’, Academy of Management Journal, 58 (1), 242–278. Dobbin, F. (2004), The New Economic Sociology. Princeton, New York: Princeton University Press. Eden, A. H., Moor, J. H., Soraker, J. H., Steinhart, E. (2013), ‘Singularity Hypotheses: A Scientific and Philosophical Assessment’, Berlin: Springer. Emirbayer, M. and Mische, A. (1998), ‘What is Agency?’, American Journal of Sociology, 103 (4), 962–1023. Garud, R.; Hardy, C. and Maguire, S. (2007), ‘Institutional entrepreneurship as embedded agency: an introduction to the special issue’, Organisation Studies, 28, 957–969. Garud, R.; Jain, S.; and Kumaraswamy, A. (2002), ‘Institutional entrepreneurship in the sponsorship of common technological standards: the Case of sun Microsystems and Java’, Academy of Management Journal, 45 (1), 196–214. Hassard, J., Morris, J., Sheehan, J. and Xiao, J. (2010), ‘China’s state-owned enterprises: economic reform and organisational restructuring’, Journal of Organisational Change Management, 23 (5), 500–516. Heisenberg, W. (1958), Physics and Philosophy: the revolution in modern science, London: Unwin University Books. Hodgson, G. M. (2007), ‘Institutions and individuals: interaction and evolution’, Organisation Studies, 28 (1), 95–116. Hoffman, A. J. (1999), ‘Institutional evolution and change: environmentalism and the US chemical industry’, Academy of Management Journal, 42, 351–371. Kurzweil, R. (2009), The singularity is near, 2^(nd) ed. London: Gerald Duckworth. Lamberg, J. A. and Pajunen, K. (2010), ‘Agency, institutional change, and continuity: The case of the Finish civil war’, Journal of Management Studies, 47 (5), 815–36. Leonardi, P. M. (2009), ‘Organising technology: toward a theory of sociomaterial imbrication’, Academy of Management Annual Meeting Proceedings. Lukes, S. (1973), Individualism, Oxford: Basil Blackwell. Miller, J. D. (2012), ‘Some economic incentives facing a business that might bring about a technological singularity’, chapter 8 in “Singularity Hypotheses, The Frontiers Collection” (eds) by Eden et al., Berlin: Springer. North, D. C. (1990), Institutions, institutional change and economic performance, Cambridge: Cambridge University Press. OECD (Organisation for Economic Co-operation and Development) (2012), ‘Active with the People’s Republic of China’, pp. 1–61. Oliver, C. (1996), ‘The institutional embeddedness of economic activity’, Advances in Strategic Management, 13, 163–186. Orlikowski, W. J. (2000), ‘Using technology and constituting structures: a practice lens for studying technology in organisations’, Organisation Science, 11 (4), 404–428. Physic World (2014), February Issue, p 2. R&D Magazine (2011), ‘China’s R&D Momentum: 2012 Global R&D Funding Forecasting’, R&D Magazine, 53 (7), 60–61. Rae, D. (2015), Opportunity-centred Entrepreneurship, 2^(nd) ed. London: Palgrave. Redding, G. and Witt, M. A. (2009), ‘China’s business system and its future trajectory’, Asia Pacific Journal of Management, September, 26 (3), 381–399. Robertson, T. S. (1986), ‘Competitive effects on technology diffusion’, Journal of Marketing, 50 (1), 1–12. Scott, W. R. (1987), ‘The adolesence of institutional theory’, Administrative Science Quarterly, 32 (4), 493. Seo, M-G. and Creed, W. E. D. (2002), ‘Institutional Contradictions, Praxis, and Institutional Change: A Dialectical Perspective’, Academy of Management Review, 27 (2), 222–247. Suarez-Villa, L. (2009), Technocapitalism, Philadephia: Temple University Press. The European Commission ʬ (accessed on 1^(st) March 2015). Waarts, E., Yvonne, M. v. E. and Hillegersberg, J. v. (2002), ‘The dynamics of factors affecting the adoption of innovations’, The Journal of Product Innovation Management, 19, 412–423. Webster, F. E. (1969), ‘New Product adoption in industrial markets: a framework for analysis’, Journal of Marketing, 33 (1), 35–39. Yoo, Y., Boland Jr, R. J., Lyytinen, K. and Majchrzak, A. (2012), ‘Organising for innovation in the digitalized world’, Organisation Science, 23 (5), 1398–1408. Zheng, P. and Scase, R. (2012), ‘The restructuring of market socialism: the contribution of ‘Agency’ theoretical perspective’, Thunderbird International Business Review, 55 (1), 103–114. Zheng, P. and Scase, R. (2013), Emerging business ventures under market socialism: entrepreneurship in China, Routledge Studies in International Business and The World Economy, London: Routledge. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_5 5. Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda Nate Soares¹   and Benya Fallenstein¹   Machine Intelligence Research Institute, Berkeley, USA Nate Soares (Corresponding author) Email: nate@intelligence.org Benya Fallenstein Email: benya@intelligence.org 5.1 Introduction The property that has given humans a dominant advantage over other species is not strength or speed, but intelligence. If progress in artificial intelligence continues unabated, AI systems will eventually exceed humans in general reasoning ability. A system that is “superintelligent” in the sense of being “smarter than the best human brains in practically every field” could have an enormous impact upon humanity (Bostrom 2014). Just as human intelligence has allowed us to develop tools and strategies for controlling our environment, a superintelligent system would likely be capable of developing its own tools and strategies for exerting control (Muehlhauser and Salamon 2012). In light of this potential, it is essential to use caution when developing AI systems that can exceed human levels of general intelligence, or that can facilitate the creation of such systems. Since artificial agents would not share our evolutionary history, there is no reason to expect them to be driven by human motivations such as lust for power. However, nearly all goals can be better met with more resources (Omohundro 2008). This suggests that, by default, superintelligent agents would have incentives to acquire resources currently being used by humanity. (Just as artificial agents would not automatically acquire a lust for power, they would not automatically acquire a human sense of fairness, compassion, or conservatism.) Thus, most goals would put the agent at odds with human interests, giving it incentives to deceive or manipulate its human operators and resist interventions designed to change or debug its behavior (Bostrom 2014, Chap. 8). Care must be taken to avoid constructing systems that exhibit this default behavior. In order to ensure that the development of smarter-than-human intelligence has a positive impact on the world, we must meet three formidable challenges: How can we create an agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in early AI systems are inevitable? This agenda discusses technical research that is tractable today, which the authors think will make it easier to confront these three challenges in the future. Sections 5.2 through 5.4 motivate and discuss six research topics that we think are relevant to these challenges. We call a smarter-than-human system that reliably pursues beneficial goals “aligned with human interests” or simply “aligned.”¹ To become confident that an agent is aligned in this way, a practical implementation that merely seems to meet the challenges outlined above will not suffice. It is also important to gain a solid formal understanding of why that confidence is justified. This technical agenda argues that there is foundational research we can make progress on today that will make it easier to develop aligned systems in the future, and describes ongoing work on some of these problems. Of the three challenges, the one giving rise to the largest number of currently tractable research questions is the challenge of finding an agent architecture that will reliably and autonomously pursue a set of objectives—that is, an architecture that can at least be aligned with some end goal. This requires theoretical knowledge of how to design agents which reason well and behave as intended even in situations never envisioned by the programmers. The problem of highly reliable agent designs is discussed in Sect. 5.2. The challenge of developing agent designs which are tolerant of human error also gives rise to a number of tractable problems. We expect that smarter-than-human systems would by default have incentives to manipulate and deceive human operators, and that special care must be taken to develop agent architectures which avert these incentives and are otherwise tolerant of programmer error. This problem and some related open questions are discussed in Sect. 5.3. Reliable and error-tolerant agent designs are only beneficial if the resulting agent actually pursues desirable outcomes. The difficulty of concretely specifying what is meant by “beneficial behavior” implies a need for some way to construct agents that reliably learn what to value (Bostrom 2014, Chap. 12). A solution to this “value learning” problem is vital; attempts to start making progress are reviewed in Sect. 5.4. Why work on these problems now, if smarter-than-human AI is likely to be decades away? This question is touched upon briefly below, and is discussed further in Sect. 5.5. In short, the authors believe that there are theoretical prerequisites for designing aligned smarter-than-human systems over and above what is required to design misaligned systems. We believe that research can be done today that will make it easier to address alignment concerns in the future. 5.1.1 Why These Problems? This technical agenda primarily covers topics that the authors believe are tractable, uncrowded, focused, and unable to be outsourced to forerunners of the target AI system. By tractable problems, we mean open problems that are concrete and admit immediate progress. Significant effort will ultimately be required to align real smarter-than-human systems with beneficial values, but in the absence of working designs for smarter-than-human systems, it is difficult if not impossible to begin most of that work in advance. This agenda focuses on research that can help us gain a better understanding today of the problems faced by almost any sufficiently advanced AI system. Whether practical smarter-than-human systems arise in ten years or in one hundred years, we expect to be better able to design safe systems if we understand solutions to these problems. This agenda further limits attention to uncrowded domains, where there is not already an abundance of research being done, and where the problems may not be solved over the course of “normal” AI research. For example, program verification techniques are absolutely crucial in the design of extremely reliable programs (Sotala and Yampolskiy 2015, Sect. 5.5, this volume), but program verification is not covered in this agenda primarily because a vibrant community is already actively studying the topic. This agenda also restricts consideration to focused tools, ones that would be useful for designing aligned systems in particular (as opposed to intelligent systems in general). It might be possible to design generally intelligent AI systems before developing an understanding of highly reliable reasoning sufficient for constructing an aligned system. This could lead to a risky situation where powerful AI systems are built long before the tools needed to safely utilize them. Currently, significant research effort is focused on improving the capabilities of artificially intelligent systems, and comparatively little effort is focused on superintelligence alignment (Bostrom 2014, Chap. 14). For that reason, this agenda focuses on research that improves our ability to design aligned systems in particular. Lastly, we focus on research that cannot be safely delegated to machines. As AI algorithms come to rival humans in scientific inference and planning, new possibilities will emerge for outsourcing computer science labor to AI algorithms themselves. This is a consequence of the fact that intelligence is the technology we are designing: on the path to great intelligence, much of the work may be done by smarter-than-human systems.² As a result, the topics discussed in this agenda are ones that we believe are difficult to safely delegate to AI systems. Error-tolerant agent design is a good example: no AI problem (including the problem of error-tolerant agent design itself) can be safely delegated to a highly intelligent artificial agent that has incentives to manipulate or deceive its programmers. By contrast, a sufficiently capable automated engineer would be able to make robust contributions to computer vision or natural language processing even if its own visual or linguistic abilities were initially lacking. Most intelligent agents optimizing for some goal would also have incentives to improve their visual and linguistic abilities so as to enhance their ability to model and interact with the world. It would be risky to delegate a crucial task before attaining a solid theoretical understanding of exactly what task is being delegated. It may be possible to use our understanding of ideal Bayesian inference to task a highly intelligent system with developing increasingly effective approximations of a Bayesian reasoner, but it would be far more difficult to delegate the task of “finding good ways to revise how confident you are about claims” to an intelligent system before gaining a solid understanding of probability theory. The theoretical understanding is useful to ensure that the right questions are being asked. 5.2 Highly Reliable Agent Designs Bird and Layzell (2002) describe a genetic algorithm which, tasked with making an oscillator, re-purposed the printed circuit board tracks on the motherboard as a makeshift radio to amplify oscillating signals from nearby computers. This is not kind of solution the algorithm would have found if it had been simulated on a virtual circuit board possessing only the features that seemed relevant to the problem. Intelligent search processes in the real world have the ability to use resources in unexpected ways, e.g., by finding “shortcuts” or “cheats” not accounted for in a simplified model. When constructing intelligent systems which learn and interact with all the complexities of reality, it is not sufficient to verify that the algorithm behaves well in test settings. Additional work is necessary to verify that the system will continue working as intended in application. This is especially true of systems possessing general intelligence at or above the human level: superintelligent machines might find strategies and execute plans beyond both the experience and imagination of the programmers, making the clever oscillator of Bird and Layzell look trite. At the same time, unpredictable behavior from smarter-than-human systems could cause catastrophic damage, if they are not aligned with human interests (Yudkowsky 2008). Because the stakes are so high, testing combined with a gut-level intuition that the system will continue to work outside the test environment is insufficient, even if the testing is extensive. It is important to also have a formal understanding of precisely why the system is expected to behave well in application. What constitutes a formal understanding? It seems essential to us to have both (1) an understanding of precisely what problem the system is intended to solve; and (2) an understanding of precisely why this practical system is expected to solve that abstract problem. The latter must wait for the development of practical smarter-than-human systems, but the former is a theoretical research problem that we can already examine. A full description of the problem would reveal the conceptual tools needed to understand why practical heuristics are expected to work. By analogy, consider the game of chess. Before designing practical chess algorithms, it is necessary to possess not only a predicate describing checkmate, but also a description of the problem in term of trees and backtracking algorithms: Trees and backtracking do not immediately yield a practical solution—building a full game tree is infeasible—but they are the conceptual tools of computer chess. It would be quite difficult to justify confidence in a chess heuristic before understanding trees and backtracking. While these conceptual tools may seem obvious in hindsight, they were not clear to foresight. Consider the famous essay by Edgar Allen Poe about Maelzel’s Mechanical Turk (Poe 1836). It is in many ways remarkably sophisticated: Poe compares the Turk to “the calculating machine of Mr. Babbage” and then remarks on how the two systems cannot be of the same kind, since in Babbage’s algebraical problems each step follows of necessity, and so can be represented by mechanical gears making deterministic motions; while in a chess game, no move follows with necessity from the position of the board, and even if our own move followed with necessity, the opponent’s would not. And so (argues Poe) we can see that chess cannot possibly be played by mere mechanisms, only by thinking beings. From Poe’s state of knowledge, Shannon’s (1950) description of an idealized solution in terms of backtracking and trees constitutes a great insight. Our task it to put theoretical foundations under the field of general intelligence, in the same sense that Shannon put theoretical foundations under the field of computer chess. It is possible that these foundations will be developed over time, during the normal course of AI research: in the past, theory has often preceded application. But the converse is also true: in many cases, application has preceded theory. The claim of this technical agenda is that, in safety-critical applications where mistakes can put lives at risk, it is crucial that certain theoretical insights come first. A smarter-than-human agent would be embedded within and computed by a complex universe, learning about its environment and bringing about desirable states of affairs. How is this formalized? What metric captures the question of how well an agent would perform in the real world?³ Not all parts of the problem must be solved in advance: the task of designing smarter, safer, more reliable systems could be delegated to early smarter-than-human systems, if the research done by those early systems can be sufficiently trusted. It is important, then, to focus research efforts particularly on parts of the problem where an increased understanding is necessary to construct a minimal reliable generally intelligent system. Moreover, it is important to focus on aspects which are currently tractable, so that progress can in fact be made today, and on issues relevant to alignment in particular, which would not otherwise be studied over the course of “normal” AI research. In this section, we discuss four candidate topics meeting these criteria: (1) realistic world-models, the study of agents learning and pursuing goals while embedded within a physical world; (2) decision theory, the study of idealized decision-making procedures; (3) logical uncertainty, the study of reliable reasoning with bounded deductive capabilities; and (4) Vingean reflection, the study of reliable methods for reasoning about agents that are more intelligent than the reasoner. We will now discuss each of these topics in turn. 5.2.1 Realistic World-Models Formalizing the problem of computer intelligence may seem easy in theory: encode some set of preferences as a utility function, and evaluate the expected utility that would be obtained if the agent were implemented. However, this is not a full specification: What is the set of “possible realities” used to model the world? Against what distribution over world models is the agent evaluated? How is a given world model used to score an agent? To ensure that an agent would work well in reality, it is first useful to formalize the problem faced by agents learning (and acting in) arbitrary environments. Solomonoff (1964) made an early attempt to tackle these questions by specifying an “induction problem” in which an agent must construct world models and promote correct hypotheses based on the observation of an arbitrarily complex environment, in a manner reminiscent of scientific induction. In this problem, the agent and environment are separate. The agent gets to see one bit from the environment in each turn, and must predict the bits which follow. Solomonoff’s induction problem answers all three of the above questions in a simplified setting: The set of world models is any computable environment (e.g., any Turing machine). In reality, the simplest hypothesis that predicts the data is generally correct, so agents are evaluated against a simplicity distribution. Agents are scored according to their ability to predict their next observation. These answers were insightful, and led to the development of many useful tools, including algorithmic probability and Kolmogorov complexity. However, Solomonoff’s induction problem does not fully capture the problem faced by an agent learning about an environment while embedded within it, as a subprocess. It assumes that the agent and environment are separated, save only for the observation channel. What is the analog of Solomonoff induction for agents that are embedded within their environment? This is the question of naturalized induction (Bensinger 2013). Unfortunately, the insights of Solomonoff do not apply in the naturalized setting. In Solomonoff’s setting, where the agent and environment are separated, one can consider arbitrary Turing machines to be “possible environments.” But when the agent is embedded in the environment, consideration must be restricted to environments which embed the agent. Given an algorithm, what is the set of environments which embed that algorithm? Given that set, what is the analogue of a simplicity prior which captures the fact that simpler hypotheses are more often correct? Technical problem (Naturalized Induction) What, formally, is the induction problem faced by an intelligent agent embedded within and computed by its environment? What is the set of environments which embed the agent? What constitutes a simplicity prior over that set? How is the agent scored? For discussion, see Soares (2015). Just as a formal description of Solomonoff induction answered the above three questions in the context of an agent learning an external environment, a formal description of naturalized induction may well yield answers to those questions in the context where agents are embedded in and computed by their environment. Of course, the problem of computer intelligence is not simply an induction problem: the agent must also interact with the environment. Hutter (2000) extends Solomonoff’s induction problem to an “interaction problem,” in which an agent must both learn and act upon its environment. In each turn, the agent both observes one input and writes one output, and the output affects the behavior of the environment. In this problem, the agent is evaluated in terms of its ability to maximize a reward function specified in terms of inputs. While this model does not capture the difficulties faced by agents which are embedded within their environment, it does capture a large portion of the problem faced by agents interacting with arbitrarily complex environments. Indeed, the interaction problem (and AIXI Hutter 2000, its solution) are the basis for the “universal measure of intelligence” developed by Legg and Hutter (2007). However, even barring problems arising from the agent/environment separation, the Legg-Hutter metric does not fully characterize the problem of computer intelligence. It scores agents according to their ability to maximize a reward function specified in terms of observation. Agents scoring well by the Legg-Hutter metric are extremely effective at ensuring their observations optimize a reward function, but these high-scoring agents are likely to be the type that find clever ways to seize control of their observation channel rather than the type that identify and manipulate the features in the world that the reward function was intended to proxy for (Soares 2015). Reinforcement learning techniques which punish the agent for attempting to take control would only incentivize the agent to deceive and mollify the programmers until it found a way to gain a decisive advantage (Bostrom 2014, Chap. 8). The Legg-Hutter metric does not characterize the question of how well an algorithm would perform if implemented in reality: to formalize that question, a scoring metric must evaluate the resulting environment histories, not just the agent’s observations (Soares 2015). But human goals are not specified in terms of environment histories, either: they are specified in terms of high-level notions such as “money” or “flourishing humans.” Leaving aside problems of philosophy, imagine rating a system according to how well it achieves a straightforward, concrete goal, such as by rating how much diamond is in an environment after the agent has acted on it, where “diamond” is specified concretely in terms of an atomic structure. Now the goals are specified in terms of atoms, and the environment histories are specified in terms of Turing machines paired with an interaction history. How is the environment history evaluated in terms of atoms? This is the ontology identification problem. Technical problem (Ontology Identification) Given goals specified in some ontology and a world model, how can the ontology of the goals be identified in the world model? What types of world models are amenable to ontology identification? For a discussion, see Soares (2015). To evaluate world models, the world models must be evaluated in terms of the ontology of the goals. This may be difficult in cases where the ontology of the goals does not match reality: it is one thing to locate atoms in a Turing machine using an atomic model of physics, but it is another thing altogether to locate atoms in a Turing machine modeling quantum physics. De Blanc (2011) further motivates the idea that explicit mechanisms are needed to deal with changes in the ontology of the system’s world model. Agents built to solve the wrong problem—such as optimizing their observations—may well be capable of attaining superintelligence, but it is unlikely that those agents could be aligned with human interests (Bostrom 2014, Chap. 12). A better understanding of naturalized induction and ontology identification is needed to fully specify the problem that intelligent agents would face when pursuing human goals while embedded within reality, and this increased understanding could be a crucial tool when it comes to designing highly reliable agents. 5.2.2 Decision Theory Smarter-than-human systems must be trusted to make good decisions, but what does it mean for a decision to be “good”? Formally, given a description of an environment and an agent embedded within, how is the “best available action” identified, with respect to some set of preferences? This is the question of decision theory. The answer may seem trivial, at least in theory: simply iterate over the agent’s available actions, evaluate what would happen if the agent took that action, and then return whichever action leads to the most expected utility. But this is not a full specification: How are the “available actions” identified? How is what “would happen” defined? The difficulty is easiest to illustrate in a deterministic setting. Consider a deterministic agent embedded in a deterministic environment. There is exactly one action that the agent will take. Given a set of actions that it “could take,” it is necessary to evaluate, for each action, what would happen if the agent took that action. But the agent will not take most of those actions. How is the counterfactual environment constructed, in which a deterministic algorithm “does something” that, in the real environment, it doesn’t do? Answering this question requires a theory of counterfactual reasoning, and counterfactual reasoning is not well understood. Technical problem (Theory of Counterfactuals) What theory of counterfactual reasoning can be used to specify a procedure which always identifies the best action available to a given agent in a given environment, with respect to a given set of preferences? For discussion, see Soares and Fallenstein (2014). Decision theory has been studied extensively by philosophers. The study goes back to Pascal, and has been picked up in modern times by Lehmann (1950), Wald (1939), Jeffrey (1983), Joyce (1999), Lewis (1981), Pearl (2000), and many others. However, no satisfactory method of counterfactual reasoning yet answers this particular question. To give an example of why counterfactual reasoning can be difficult, consider a deterministic agent playing against a perfect copy of itself in the classic prisoner’s dilemma (Rapoport and Chammah 1965). The opponent is guaranteed to do the same thing as the agent, but the agents are “causally separated,” in that the action of one cannot physically affect the action of the other. What is the counterfactual world in which the agent on the left cooperates? It is not sufficient to consider changing the action of the agent on the left while holding the action of the agent on the right constant, because while the two are causally disconnected, they are logically constrained to behave identically. Standard causal reasoning, which neglects these logical constraints, misidentifies “defection” as the best strategy available to each agent even when they know they have identical source codes (Lewis 1979).⁴ Satisfactory counterfactual reasoning must respect these logical constraints, but how are logical constraints formalized and identified? It is fine to say that, in the counterfactual where the agent on the left cooperates, all identical copies of it also cooperate; but what counts as an identical copy? What if the right agent runs the same algorithm written in a different programming language? What if it only does the same thing 98% of the time? These questions are pertinent in reality: practical agents must be able to identify good actions in settings where other actors base their actions on imperfect (but well-informed) predictions of what the agent will do. Identifying the best action available to an agent requires taking the non-causal logical constraints into account. A satisfactory formalization of counterfactual reasoning requires the ability to answer questions about how other deterministic algorithms behave in the counterfactual world where the agent’s deterministic algorithm does something it doesn’t. However, the evaluation of “logical counterfactuals” is not yet well understood. Technical problem (Logical Counterfactuals) Consider a counterfactual in which a given deterministic decision procedure selects a different action from the one it selects in reality. What are the implications of this counterfactual on other algorithms? Can logical counterfactuals be formalized in a satisfactory way? A method for reasoning about logical counterfactuals seems necessary in order to formalize a more general theory of counterfactuals. For a discussion, see Soares and Fallenstein (2014). Unsatisfactory methods of counterfactual reasoning (such as the causal reasoning of Pearl (2000)) seem powerful enough to support smarter-than-human intelligent systems, but systems using those reasoning methods could systematically act in undesirable ways (even if otherwise aligned with human interests). To construct practical heuristics that are known to make good decisions, even when acting beyond the oversight and control of humans, it is essential to understand what is meant by “good decisions.” This requires a formulation which, given a description of an environment, an agent embedded in that environment, and some set of preferences, identifies the best action available to the agent. While modern methods of counterfactual reasoning do not yet allow for the specification of such a formula, recent research has pointed the way towards some promising paths for future research. For example, Wei Dai’s “updateless decision theory” (UDT) is a new take on decision theory that systematically outperforms causal decision theory (Hintze 2014), and two of the insights behind UDT highlight a number of tractable open problems (Soares and Fallenstein 2014). Recently, Barasz et al. (2014) developed a concrete model, together with a Haskell implementation, of multi-agent games where agents have access to each others’ source code and base their decisions on what they can prove about their opponent. They have found that it is possible for some agents to achieve robust cooperation in the one-shot prisoner’s dilemma while remaining unexploitable Barasz et al. (2014). These results suggest a number of new ways to approach the problem of counterfactual reasoning, and we are optimistic that continued study will prove fruitful. 5.2.3 Logical Uncertainty Consider a reasoner encountering a black box with one input chute and two output chutes. Inside the box is a complex Rube Goldberg machine that takes an input ball and deposits it in one of the two output chutes. A probabilistic reasoner may have uncertainty about where the ball will exit, due to uncertainty about which Rube Goldberg machine is in the box. However, standard probability theory assumes that if the reasoner did know which machine the box implemented, they would know where the ball would exit: the reasoner is assumed to be logically omniscient, i.e., to know all logical consequences of any hypothesis they entertain. By contrast, a practical bounded reasoner may be able to know exactly which Rube Goldberg machine the box implements without knowing where the ball will come out, due to the complexity of the machine. This reasoner is logically uncertain. Almost all practical reasoning is done under some form of logical uncertainty (Gaifman 2004), and almost all reasoning done by a smarter-than-human agent must be some form of logically uncertain reasoning. Any time an agent reasons about the consequences of a plan, the effects of running a piece of software, or the implications of an observation, it must do some sort of reasoning under logical uncertainty. Indeed, the problem of an agent reasoning about an environment in which it is embedded as a subprocess is inherently a problem of reasoning under logical uncertainty. Thus, to construct a highly reliable smarter-than-human system, it is vitally important to ensure that the agent’s logically uncertain reasoning is reliable and trustworthy. This requires a better understanding of the theoretical underpinnings of logical uncertainty, to more fully characterize what it means for logically uncertain reasoning to be “reliable and trustworthy” (Soares and Fallenstein 2015). It is natural to consider extending standard probability theory to include the consideration of worlds which are “logically impossible” (e.g., where a deterministic Rube Goldberg machine behaves in a way that it doesn’t). This gives rise to two questions: What, precisely, are logically impossible possibilities? And, given some means of reasoning about impossible possibilities, what is a reasonable prior probability distribution over impossible possibilities? The problem is difficult to approach in full generality, but a study of logical uncertainty in the restricted context of assigning probabilities to logical sentences goes back at least to Łoś (1955) and Gaifman (1964), and has been investigated by many, including Halpern (2003), Hutter et al. (2013), Demski (2012), Russell (2014), and others. Though it isn’t clear to what degree this formalism captures the kind of logically uncertain reasoning a realistic agent would use, logical sentences in, for example, the language of Peano Arithmetic are quite expressive: for example, given the Rube Goldberg machine discussed above, it is possible to form a sentence which is true if and only if the machine deposits the ball into the top chute. Thus, considering reasoners which are uncertain about logical sentences is a useful starting point. The problem of assigning probabilities to sentences of logic naturally divides itself into two parts. First, how can probabilities consistently be assigned to sentences? An agent assigning probability 1 to short contradictions is hardly reasoning about the sentences as if they are logical sentences: some of the logical structure must be preserved. But which aspects of the logical structure? Preserving all logical implications requires that the reasoner be deductively omnipotent, as some implications [$$\phi \rightarrow \psi $$] may be very involved. The standard answer in the literature is that a coherent assignment of probabilities to sentences corresponds to a probability distribution over complete, consistent logical theories (Gaifman 1964; Christiano 2014a); that is, an “impossible possibility” is any consistent assignment of truth values to all sentences. Deductively limited reasoners cannot have fully coherent distributions, but they can approximate these distributions: for a deductively limited reasoner, “impossible possibilities” can be any assignment of truth values to sentences that looks consistent so far, so long as the assignment is discarded as soon as a contradiction is introduced. Technical problem (Impossible Possibilities) How can deductively limited reasoners approximate reasoning according to a probability distribution over complete theories of logic? For a discussion, see Christiano (2014a). Second, what is a satisfactory prior probability distribution over logical sentences? If the system is intended to reason according to a theory at least as powerful as Peano Arithmetic ([$$\mathsf {PA} $$]), then that theory will be incomplete (Gödel et al. 1934). What prior distribution places nonzero probability on the set of complete extensions of [$$\mathsf {PA} $$]? Deductively limited agents would not be able to literally use such a prior, but if it were computably approximable, then they could start with a rough approximation of the prior and refine it over time. Indeed, the process of refining a logical prior—getting better and better probability estimates for given logical sentences—captures the whole problem of reasoning under logical uncertainty in miniature. Hutter et al. (2013) have defined a desirable prior, but Sawin and Demski (2013) have shown that it cannot be computably approximated. Demski (2012) and Christiano (2014a) have also proposed logical priors, but neither seems fully satisfactory. The specification of satisfactory logical priors is difficult in part because it is not yet clear which properties are desirable in a logical prior, nor which properties are possible. Technical problem (Logical Priors) What is a satisfactory set of priors over logical sentences that a bounded reasoner can approximate? For a discussion, see Soares and Fallenstein (2015). Many existing tools for studying reasoning, such as game theory, standard probability theory, and Bayesian networks, all assume that reasoners are logically omniscient. A theory of reasoning under logical uncertainty seems necessary to formalize the problem of naturalized induction, and to generate a satisfactory theory of counterfactual reasoning. If these tools are to be developed, extended, or improved, then a better understanding of logically uncertain reasoning is required. 5.2.4 Vingean Reflection Instead of specifying superintelligent systems directly, it seems likely that humans may instead specify generally intelligent systems that go on to create or attain superintelligence. In this case, the reliability of the resulting superintelligent system depends upon the reasoning of the initial system which created it (either anew or via self-modification). If the agent reasons reliably under logical uncertainty, then it may have a generic ability to evaluate various plans and strategies, only selecting those which seem beneficial. However, some scenarios put that logically uncertain reasoning to the test more than others. There is a qualitative difference between reasoning about simple programs and reasoning about human-level intelligent systems. For example, modern program verification techniques could be used to ensure that a “smart” military drone obeys certain rules of engagement, but it would be a different problem altogether to verify the behavior of an artificial military general which must run an entire war. A general has far more autonomy, ability to come up with clever unexpected strategies, and opportunities to impact the future than a drone. A self-modifying agent (or any that constructs new agents more intelligent than itself) must reason about the behavior of a system that is more intelligent than the reasoner. This type of reasoning is critically important to the design of self-improving agents: if a system will attain superintelligence through self-modification, then the impact of the system depends entirely upon the correctness of the original agent’s reasoning about its self-modifications (Fallenstein and Soares 2015). Before trusting a system to attain superintelligence, it seems prudent to ensure that the agent uses appropriate caution when reasoning about successor agents.⁵ That is, it seems necessary to understand the mechanisms by which agents reason about smarter systems. Naive tools for reasoning about plans including smarter agents, such as backwards induction (Ben-Porath 1997), would have the reasoner evaluate the smarter agent by simply checking what the smarter agent would do. This does not capture the difficulty of the problem: a parent agent cannot simply check what its successor agent would do in all scenarios, for if it could, then it would already know what actions its successor will take, and the successor would not in any way be smarter. Yudkowsky and Herreshoff (2013) call this observation the “Vingean principle,” after Vernor Vinge (1993), who emphasized how difficult it is for humans to predict the behavior of smarter-than-human agents. Any agent reasoning about more intelligent successor agents must do so abstractly, without pre-computing all actions that the successor would take in every scenario. We refer to this kind of reasoning as Vingean reflection. Technical problem (Vingean Reflection) How can agents reliably reason about agents which are smarter than themselves, without violating the Vingean principle? For discussion, see Fallenstein and Soares (2015). It may seem premature to worry about how agents reason about self-improvements before developing a theoretical understanding of reasoning under logical uncertainty in general. However, it seems to us that work in this area can inform understanding of what sort of logically uncertain reasoning is necessary to reliably handle Vingean reflection. Given the high stakes when constructing systems smarter than themselves, artificial agents might use some form of extremely high-confidence reasoning to verify the safety of potentially dangerous self-modifications. When humans desire extremely high reliability, as is the case for (e.g.) autopilot software, we often use formal logical systems (United States Department of Defense 1985; United Kingdom Ministry of Defense 1991). High-confidence reasoning in critical situations may require something akin to formal verification (even if most reasoning is done using more generic logically uncertain reasoning), and so studying Vingean reflection in the domain of formal logic seems like a good starting point. Logical models of agents reasoning about agents that are “more intelligent,” however, run into a number of obstacles. By Gödel’s second incompleteness theorem (1934), sufficiently powerful formal systems cannot rule out the possibility that they may be inconsistent. This makes it difficult for agents using formal logical reasoning to verify the reasoning of similar agents which also use formal logic for high-confidence reasoning; the first agent cannot verify that the latter agent is consistent. Roughly, it seems desirable to be able to develop agents which reason as follows: This smarter successor agent uses reasoning similar to mine, and my own reasoning is sound, so its reasoning is sound as well. However, Gödel et al. (1934) showed that this sort of reasoning leads to inconsistency, and these problems do in fact make Vingean reflection difficult in a logical setting (Fallenstein and Soares 2015; Yudkowsky 2013). Technical problem (Löbian Obstacle) How can agents gain very high confidence in agents that use similar reasoning systems, while avoiding paradoxes of self-reference? For discussion, see Fallenstein and Soares (2015). These results may seem like artifacts of models rooted in formal logic, and may seem irrelevant given that practical agents must eventually use logical uncertainty rather than formal logic to reason about smarter successors. However, it has been shown that many of the Gödelian obstacles carry over into early probabilistic logics in a straightforward way, and some results have already been shown to apply in the domain of logical uncertainty (Fallenstein 2014). Studying toy models in this formal logical setting has led to partial solutions (Fallenstein and Soares 2014). Recent work has pushed these models towards probabilistic settings (Fallenstein and Soares 2014; Yudkowsky 2014; Soares 2014). Further research may continue driving the development of methods for reasoning under logical uncertainty which can handle Vingean reflection in a reliable way (Fallenstein and Soares 2015). 5.3 Error-Tolerant Agent Designs Incorrectly specified superintelligent agents could be dangerous (Yudkowsky 2008). Correcting a modern AI system involves simply shutting the system down and modifying its source code. Modifying a smarter-than-human system may prove more difficult: a system attaining superintelligence could acquire new hardware, alter its software, create subagents, and take other actions that would leave the original programmers with only dubious control over the agent. This is especially true if the agent has incentives to resist modification or shutdown. If intelligent systems are to be safe, they must be constructed in such a way that they are amenable to correction, even if they have the ability to prevent or avoid correction. This does not come for free: by default, agents have incentives to preserve their own preferences, even if those conflict with the intentions of the programmers (Omohundro 2008; Soares and Fallenstein 2015). Special care is needed to specify agents that avoid the default incentives to manipulate and deceive (Bostrom 2014, Chap. 8). Restricting the actions available to a superintelligent agent may be quite difficult (Bostrom 2014, Chap. 9). Intelligent optimization processes often find unexpected ways to fulfill their optimization criterion using whatever resources are at their disposal; recall the evolved oscillator of Bird and Layzell (2002). Superintelligent optimization processes may well use hardware, software, and other resources in unanticipated ways, making them difficult to contain if they have incentives to escape. We must learn how to design agents which do not have incentives to escape, manipulate, or deceive in the first place: agents which reason as if they are incomplete and potentially flawed in dangerous ways, and which are therefore amenable to online correction. Reasoning of this form is known as “corrigible reasoning.” Technical problem (Corrigibility) What sort of reasoning can reflect the fact that an agent is incomplete and potentially flawed in dangerous ways? For discussion, see Soares and Fallenstein (2015). Naïve attempts at specifying corrigible behavior are unsatisfactory. For example, “moral uncertainty” frameworks could allow agents to learn values through observation and interaction, but would still incentivize agents to resist changes to the underlying moral uncertainty framework if it happened to be flawed. Simple “penalty terms” for manipulation and deception also seem doomed to failure: agents subject to such penalties would have incentives to resist modification while cleverly avoiding the technical definitions of “manipulation” and “deception.” The goal is not to design systems that fail in their attempts to deceive the programmers; the goal is to construct reasoning methods that do not give rise to deception incentives in the first place. A formalization of the intuitive notion of corrigibility remains elusive. Active research is currently focused on small toy problems, in the hopes that insight gained there will generalize. One such toy problem is the “shutdown problem,” which involves designing a set of preferences that incentivize an agent to shut down upon the press of a button without also incentivizing the agent to either cause or prevent the pressing of that button (Soares and Fallenstein 2015). Stuart Armstrong’s utility indifference technique (2015) provides a partial solution, but not a fully satisfactory one. Technical problem (Utility Indifference) Can a utility function be specified such that agents maximizing that utility function switch their preferences on demand, without having incentives to cause or prevent the switching? For discussion, see Armstrong (2015). A better understanding of corrigible reasoning is essential to design agent architectures that are tolerant of human error. Other research could also prove fruitful, including research into reliable containment mechanisms. Alternatively, agent designs could somehow incentivize the agent to have a “low impact” on the world. Specifying “low impact” is trickier than it may seem: How do you tell an agent that it can’t affect the physical world, given that its RAM is physical? How do you tell it that it can only use its own hardware, without allowing it to use its motherboard as a makeshift radio? How do you tell it not to cause big changes in the world when its behavior influences the actions of the programmers, who influence the world in chaotic ways? Technical problem (Domesticity) How can an intelligent agent be safely incentivized to have a low impact? Specifying such a thing is not as easy as it seems. For a discussion, see Armstrong et al. (2012). Regardless of the methodology used, it is crucial to understand designs for agents that could be updated and modified during the development process, so as to ensure that the inevitable human errors do not lead to catastrophe. 5.4 Value Specification A highly-reliable, error-tolerant agent design does not guarantee a positive impact; the effects of the system still depend upon whether it is pursuing appropriate goals. A superintelligent system may find clever, unintended ways to achieve the specific goals that it is given. Imagine a superintelligent system designed to cure cancer which does so by stealing resources, proliferating robotic laboratories at the expense of the biosphere, and kidnapping test subjects: the intended goal may have been “cure cancer without doing anything bad,” but such a goal is rooted in cultural context and shared human knowledge. It is not sufficient to construct systems that are smart enough to figure out the intended goals. Human beings, upon learning that natural selection “intended” sex to be pleasurable only for purposes of reproduction, do not suddenly decide that contraceptives are abhorrent. While one should not anthropomorphize natural selection, humans are capable of understanding the process which created them while being completely unmotivated to alter their preferences. For similar reasons, when developing AI systems, is not sufficient to develop a system intelligent enough to figure out the intended goals; the system must also somehow be deliberately constructed to pursue them (Bostrom 2014, Chap. 8). However, the “intentions” of the operators are a complex, vague, fuzzy, context-dependent notion (Yudkowsky 2011; cf. Sotala and Yampolskiy 2015, Sects. 2.2 and 5.2.5, this volume). Concretely writing out the full intentions of the operators in a machine-readable format is implausible if not impossible, even for simple tasks. An intelligent agent must be designed to learn and act according to the preferences of its operators.⁶ This is the value learning problem. Directly programming a rule which identifies cats in images is implausibly difficult, but specifying a system that inductively learns how to identify cats in images is possible. Similarly, while directly programming a rule capturing complex human intentions is implausibly difficult, intelligent agents could be constructed to inductively learn values from training data. Inductive value learning presents unique difficulties. The goal is to develop a system which can classify potential outcomes according to their value, but what sort of training data allows this classification? The labeled data could be given in terms of the agent’s world-model, but this is a brittle solution if the ontology of the world-model is liable to change. Alternatively, the labeled data could come in terms of observations, in which case the agent would have to first learn how the labels in the observations map onto objects in the world-model, and then learn how to classify outcomes. Designing algorithms which can do this likely requires a better understanding of methods for constructing multi-level world-models from sense data. Technical problem (Multi-Level World-Models) How can multi-level world-models be constructed from sense data in a manner amenable to ontology identification? For a discussion, see Soares (2016). Standard problems of inductive learning arise, as well: how could a training set be constructed which allows the agent to fully learn the complexities of value? It is easy to imagine a training set which labels many observations of happy humans as “good” and many observations of needlessly suffering humans as “bad,” but the simplest generalization from this data set may well be that humans value human-shaped things mimicking happy emotions: after training on this data, an agent may be inclined to construct many simple animatronics mimicking happiness. Creating a training set that covers all relevant dimensions of human value is difficult for the same reason that specifying human value directly is difficult. In order for inductive value learning to succeed, it is necessary to construct a system which identifies ambiguities in the training set—dimensions along which the training set gives no information—and queries the operators accordingly. Technical problem (Ambiguity Identification) Given a training data set and a world model, how can dimensions which are neglected by the training data be identified? For discussion, see Soares (2016). This problem is not unique to value learning, but it is especially important for it. Research into the programmatic identification of ambiguities, and the generation of “queries” which are similar to previous training data but differ along the ambiguous axis, would assist in the development of systems which can safely perform inductive value learning. Intuitively, an intelligent agent should be able to use some of its intelligence to assist in this process: it does not take a detailed understanding of the human psyche to deduce that humans care more about some ambiguities (are the human-shaped things actually humans?) than others (does it matter if there is a breeze?). To build a system that acts as intended, the system must model the intentions of the operators and act accordingly. This adds another layer of indirection: the system must model the operators in some way, and must extract “preferences” from the operator-model and update its preferences accordingly (in a manner robust against improvements to the model of the operator). Techniques such as inverse reinforcement learning (Ng and Russell 2000), in which the agent assumes that the operator is maximizing some reward function specified in terms of observations, are a good start, but many questions remain unanswered. Technical problem (Operator Modeling) By what methods can an operator be modeled in such a way that (1) a model of the operator’s preferences can be extracted; and (2) the model may eventually become arbitrarily accurate and represent the operator as a subsystem embedded within the larger world? For a discussion, see Soares (2016). A system which acts as the operators intend may still have significant difficulty answering questions that the operators themselves cannot answer: imagine humans trying to design an artificial agent to do what they would want, if they were better people. How can normative uncertainty (uncertainty about moral claims) be resolved? Bostrom (2014, Chap. 13) suggests an additional level of indirection: task the system with reasoning about what sorts of conclusions the operators would come to if they had more information and more time to think. Formalizing this is difficult, and the problems are largely still in the realm of philosophy rather than technical research. However, Christiano (2014b) has sketched one possible method by which the volition of a human could be extrapolated, and Soares (2016) discusses some potential pitfalls. Philosophical problem (Normative Uncertainty) What ought one do when one is uncertain about what one ought to do? What norms govern uncertainty about normative claims? For a discussion, see MacAskill (2014). Human operators with total control over a superintelligent system could give rise to a moral hazard of extraordinary proportions by putting unprecedented power into the hands of a small few (Bostrom 2014, Chap. 6). The extraordinary potential of superintelligence gives rise to many ethical questions. When constructing autonomous agents that will have a dominant ability to determine the future, it is important to design the agents to not only act according to the wishes of the operators, but also in others’ common interest. Here we largely leave the philosophical questions aside, and remark only that those who design systems intended to surpass human intelligence will take on a responsibility of unprecedented scale. 5.5 Discussion Sections 5.2 through 5.4 discussed six research topics where the authors think that further research could make it easier to develop aligned systems in the future. This section discusses reasons why we think useful progress can be made today. 5.5.1 Toward a Formal Understanding of the Problem Are the problems discussed above tractable, uncrowded, focused, and unlikely to be solved automatically in the course of developing increasingly intelligent AI systems? They are certainly not very crowded. They also appear amenable to progress in the near future, though it is less clear whether they can be fully solved. When it comes to focus, some think that problems of decision theory and logical uncertainty sound more like generic theoretical AI research than alignment-specific research. A more intuitive set of alignment problems might put greater emphasis on AI constraint (see Chap. 4 in this book) or value learning. Progress on the topics outlined in this agenda might indeed make it easier to design intelligent systems in general. Just as the intelligence metric of Legg and Hutter (2007) lent insight into the ideal priors for agents facing Hutter’s interaction problem, a full description of the naturalized induction problem could lend insight into the ideal priors for agents embedded within their universe. A satisfactory theory of logical uncertainty could lend insight into general intelligence more broadly. An ideal decision theory could reveal an ideal decision-making procedure for real agents to approximate. But while these advancements may provide tools useful for designing intelligent systems in general, they would make it markedly easier to design aligned systems in particular. Developing a general theory of highly reliable decision-making, even if it is too idealized to be directly implemented, gives us the conceptual tools needed to design and evaluate safe heuristic approaches. Conversely, if we must evaluate real systems composed of practical heuristics before formalizing the theoretical problems that those heuristics are supposed to solve, then we will be forced to rely on our intuitions. This theoretical understanding might not be developed by default. Causal counterfactual reasoning, despite being suboptimal, might be good enough to enable the construction of a smarter-than-human system. Systems built from poorly understood heuristics might be capable of creating or attaining superintelligence for reasons we don’t quite understand—but it is unlikely that such systems could then be aligned with human interests. Sometimes theory precedes application, but sometimes it does not. The goal of much of the research outlined in this agenda is to ensure, in the domain of superintelligence alignment—where the stakes are incredibly high—that theoretical understanding comes first. 5.5.2 Why Start Now? It may seem premature to tackle the problem of AI goal alignment now, with superintelligent systems still firmly in the domain of futurism. However, the authors think it is important to develop a formal understanding of AI alignment well in advance of making design decisions about smarter-than-human systems. By beginning our work early, we inevitably face the risk that it may turn out to be irrelevant; yet failing to make preparations at all poses substantially larger risks. We have identified a number of unanswered foundational questions relating to the development of general intelligence, and at present it seems possible to make some promising inroads. We think that the most responsible course, then, is to begin as soon as possible. Weld and Etzioni (1994) directed a “call to arms” to computer scientists, noting that “society will reject autonomous agents unless we have some credible means of making them safe.” We are concerned with the opposite problem: what if society fails to reject systems that are unsafe? What will be the consequences if someone believes a smarter-than-human system is aligned with human interests when it is not? This is our call to arms: regardless of whether research efforts follow the path laid out in this document, significant effort must be focused on the study of superintelligence alignment as soon as possible. References Armstrong S (2015) AI motivated value selection, accepted to the 1st International Workshop on AI and Ethics, held within the 29th AAAI Conference on Artificial Intelligence (AAAI-2015), Austin, TX Armstrong S, Sandberg A, Bostrom N (2012) Thinking inside the box: Controlling and using an oracle AI. Minds and Machines 22(4):299–324CrossRef Bárász M, Christiano P, Fallenstein B, Herreshoff M, LaVictoire P, Yudkowsky E (2014) Robust cooperation in the Prisoner’s Dilemma: Program equilibrium via provability logic, unpublished manuscript. Available via arXiv. ʬ Ben-Porath E (1997) Rationality, Nash equilibrium, and backwards induction in perfect-information games. Review of Economic Studies 64(1):23–46CrossRef Bensinger R (2013) Building phenomenological bridges. Less Wrong Blog ʬ Bird J, Layzell P (2002) The evolved radio and its implications for modelling the evolution of novel sensors. In: Proceedings of the 2002 Congress on Evolutionary Computation. Vol. 2, IEEE, Honolulu, HI, pp 1836–1841 Bostrom N (2014) Superintelligence: Paths, Dangers, Strategies. Oxford University Press, New York Christiano P (2014a) Non-omniscience, probabilistic inference, and metamathematics. Tech. Rep. 2014–3, Machine Intelligence Research Institute, Berkeley, CA, ʬ Christiano P (2014b) Specifying “enlightened judgment” precisely (reprise). Ordinary Ideas Blog ʬ de Blanc P (2011) Ontological crises in artificial agents’ value systems. Tech. rep., The Singularity Institute, San Francisco, CA, ʬ Demski A (2012) Logical prior probability. In: Bach J, Goertzel B, Iklé M (eds) Artificial General Intelligence, Springer, New York, 7716, pp 50–59, 5th International Conference, AGI 2012, Oxford, UK, December 8–11, 2012. Proceedings Fallenstein B (2014) Procrastination in probabilistic logic. Working paper, Machine Intelligence Research Institute, Berkeley, CA, ʬ Fallenstein B, Soares N (2014) Problems of self-reference in self-improving space-time embedded intelligence. In: Goertzel B, Orseau L, Snaider J (eds) Artificial General Intelligence, Springer, New York, 8598, pp 21–32, 7th International Conference, AGI 2014, Quebec City, QC, Canada, August 1–4, 2014. Proceedings Fallenstein B, Soares N (2015) Vingean reflection: Reliable reasoning for self-improving agents. Tech. Rep. 2015–2, Machine Intelligence Research Institute, Berkeley, CA, ʬ Gaifman H (1964) Concerning measures in first order calculi. Israel Journal of Mathematics 2(1):1–18 Gaifman H (2004) Reasoning with limited resources and assigning probabilities to arithmetical statements. Synthese 140(1–2):97–119CrossRef Gödel K, Kleene SC, Rosser JB (1934) On Undecidable Propositions of Formal Mathematical Systems. Institute for Advanced Study, Princeton, NJ Good IJ (1965) Speculations concerning the first ultraintelligent machine. In: Alt FL, Rubinoff M (eds) Advances in Computers, vol 6, Academic Press, New York, pp 31–88 Halpern JY (2003) Reasoning about Uncertainty. MIT Press, Cambridge, MA Hintze D (2014) Problem class dominance in predictive dilemmas. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA, ʬ Hutter M (2000) A theory of universal artificial intelligence based on algorithmic complexity, unpublished manuscript. Available via arXiv. ʬ Hutter M, Lloyd JW, Ng KS, Uther WTB (2013) Probabilities on sentences in an expressive logic. Journal of Applied Logic 11(4):386–420CrossRef Jeffrey RC (1983) The Logic of Decision, 2nd edn. Chicago University Press, Chicago, IL Joyce JM (1999) The Foundations of Causal Decision Theory. Cambridge Studies in Probability, Induction and Decision Theory, Cambridge University Press, New York, NYCrossRef Legg S, Hutter M (2007) Universal intelligence: A definition of machine intelligence. Minds and Machines 17(4):391–444CrossRef Lehmann EL (1950) Some principles of the theory of testing hypotheses. Annals of Mathematical Statistics 21(1):1–26CrossRef Lewis D (1979) Prisoners’ dilemma is a Newcomb problem. Philosophy & Public Affairs 8(3):235–240, ʬ Lewis D (1981) Causal decision theory. Australasian Journal of Philosophy 59(1):5–30CrossRef Łoś J (1955) On the axiomatic treatment of probability. Colloquium Mathematicae 3(2):125–137, ʬ MacAskill W (2014) Normative uncertainty. PhD thesis, St Anne’s College, University of Oxford, ʬ McCarthy J, Minsky M, Rochester N, Shannon C (1955) A proposal for the Dartmouth summer research project on artificial intelligence. Proposal, Formal Reasoning Group, Stanford University, Stanford, CA Muehlhauser L, Salamon A (2012) Intelligence explosion: Evidence and import. In: Eden A, Søraker J, Moor JH, Steinhart E (eds) Singularity Hypotheses: A Scientific and Philosophical Assessment, Springer, Berlin, the Frontiers Collection Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: Langley P (ed) Proceedings of the Seventeenth International Conference on Machine Learning (ICML-’00), Morgan Kaufmann, San Francisco, pp 663–670 Omohundro SM (2008) The basic AI drives. In: Wang P, Goertzel B, Franklin S (eds) Artificial General Intelligence 2008, IOS, Amsterdam, no. 171 in Frontiers in Artificial Intelligence and Applications, pp 483–492, proceedings of the First AGI Conference Pearl J (2000) Causality: Models, Reasoning, and Inference, 1st edn. Cambridge University Press, New York, NY Poe EA (1836) Maelzel’s chess-player. Southern Literary Messenger 2(5):318–326 Rapoport A, Chammah AM (1965) Prisoner’s Dilemma: A Study in Conflict and Cooperation, Ann Arbor Paperbacks, vol 165. University of Michigan Press, Ann Arbor, MICrossRef Russell S (2014) Unifying logic and probability: A new dawn for AI? In: Information Processing and Management of Uncertainty in Knowledge-Based Systems: 15th International Conference, IPMU 2014, Montpellier, France, July 15–19, 2014, Proceedings, Part I, Springer, no. 442 in Communications in Computer and Information Science, pp 10–14 Sawin W, Demski A (2013) Computable probability distributions which converge on [$$\pi _1$$] will disbelieve true [$$\pi _2$$] sentences. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA, ʬ Shannon CE (1950) XXII. Programming a computer for playing chess. Philosophical Magazine 41(314):256–275CrossRef Soares N (2014) Tiling agents in causal graphs. Tech. Rep. 2014–5, Machine Intelligence Research Institute, Berkeley, CA, ʬ Soares N (2015) Formalizing two problems of realistic world-models. Tech. Rep. 2015–3, Machine Intelligence Research Institute, Berkeley, CA, ʬ Soares N (2016) The value learning problem. In: Ethics for Artificial Intelligence Workshop at the 25th International Joint Conference on Artificial Intelligence (IJCAI-16). New York, NY, July 9th-15th Soares N, Fallenstein B (2014) Toward idealized decision theory. Tech. Rep. 2014–7, Machine Intelligence Research Institute, Berkeley, CA, ʬ Soares N, Fallenstein B (2015) Questions of reasoning under logical uncertainty. Tech. Rep. 2015–1, Machine Intelligence Research Institute, Berkeley, CA, ʬ Solomonoff RJ (1964) A formal theory of inductive inference. Part I. Information and Control 7(1):1–22CrossRef United Kingdom Ministry of Defense (1991) Requirements for the procurement of safety critical software in defence equipment. Interim Defence Standard 00-55, United Kingdom Ministry of Defense United States Department of Defense (1985) Department of Defense trusted computer system evaluation criteria. Department of Defense Standard DOD 5200.28-STD, United States Department of Defense, ʬ Vinge V (1993) The coming technological singularity: How to survive in the post-human era. In: Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace, NASA Lewis Research Center, no. 10129 in NASA Conference Publication, pp 11–22, ʬ Wald A (1939) Contributions to the theory of statistical estimation and testing hypotheses. Annals of Mathematical Statistics 10(4):299–326CrossRef Weld D, Etzioni O (1994) The first law of robotics (a call to arms). In: Hayes-Roth B, Korf RE (eds) Proceedings of the Twelfth National Conference on Artificial Intelligence, AAAI Press, Menlo Park, CA, pp 1042–1047, ʬ Yudkowsky E (2008) Artificial intelligence as a positive and negative factor in global risk. In: Bostrom N, Ćirković MM (eds) Global Catastrophic Risks, Oxford University Press, New York, pp 308–345 Yudkowsky E (2011) Complex value systems in Friendly AI. In: Schmidhuber J, Thórisson KR, Looks M (eds) Artificial General Intelligence, Springer, Berlin, no. 6830 in Lecture Notes in Computer Science, pp 388–393, 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011. Proceedings Yudkowsky E (2013) The procrastination paradox. Brief technical note, Machine Intelligence Research Institute, Berkeley, CA, ʬ Yudkowsky E (2014) Distributions allowing tiling of staged subjective EU maximizers. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA, ʬ Yudkowsky E, Herreshoff M (2013) Tiling agents for self-modifying AI, and the Löbian obstacle. Early draft, Machine Intelligence Research Institute, Berkeley, CA, ʬ Footnotes A more careful wording might be “aligned with the interests of sentient beings.” We would not want to benefit humans at the expense of sentient non-human animals—or (if we build them) at the expense of sentient machines. Since the Dartmouth Proposal (McCarthy et al. 1955), it has been a standard idea in AI that a sufficiently smart machine intelligence could be intelligent enough to improve itself. In 1965, I.J. Good observed that this might create a positive feedback loop leading to an “intelligence explosion” (Good 1965). Sotala and Yampolskiy (2015, Sect. 2.3, this volume) and Bostrom (2014, Chap. 14) has observed that an intelligence explosion is especially likely if the agent has the ability to acquire more hardware, improve its software, or design new hardware. Legg and Hutter (2007) provide a preliminary answer to this question, by defining a “universal measure of intelligence” which scores how well an agent can learn the features of an external environment and maximize a reward function. This is the type of formalization we are looking for: a scoring metric which describes how well an agent would achieve some set of goals. However, while the Legg-Hutter metric is insightful, it makes a number of simplifying assumptions, and many difficult open questions remain (Soares 2015). As this is a multi-agent scenario, the problem of counterfactuals can also be thought of as game-theoretic. The goal is to define a procedure which reliably identifies the best available action; the label of “decision theory” is secondary. This goal subsumes both game theory and decision theory: the desired procedure must identify the best action in all settings, even when there is no clear demarcation between “agent” and “environment.” Game theory informs, but does not define, this area of research. Of course, if an agent reasons perfectly under logical uncertainty, it would also reason well about the construction of successor agents. However, given the fallibility of human reasoning and the fact that this path is critically important, it seems prudent to verify the agent’s reasoning methods in this scenario specifically. Or of all humans, or of all sapient creatures, etc. There are many philosophical concerns surrounding what sort of goals are ethical when aligning a superintelligent system, but a solution to the value learning problem will be a practical necessity regardless of which philosophical view is the correct one. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_6 6. Risk Analysis and Risk Management for the Artificial Superintelligence Research and Development Process Anthony M. Barrett¹   and Seth D. Baum¹ Global Catastrophic Risk Institute, Washington, D.C., USA Anthony M. Barrett Email: tony@gcrinstitute.org 6.1 Introduction A substantial amount of work has made the case that global catastrophic risks (GCRs) deserve special attention (Sagan 1983; Ng 1991; Bostrom 2002; Beckstead 2013; Maher and Baum 2013). Major issues in addressing GCRs include assessing the probabilities of such catastrophic events and assessing the effectiveness and tradeoffs of potential risk-reduction measures in light of limited risk-reduction resources and tradeoffs in using them. Certain types of artificial intelligence (AI) have been proposed as a potentially large factor in GCR. One specific AI type of great concern is artificial superintelligence (ASI), in which the AI has intelligence vastly exceeding humanity’s across a broad range of domains (Bostrom 2014). ASI could potentially either solve a great many of society’s problems or cause catastrophes such as human extinction, depending on how the ASI is designed (Yudkowsky 2008). The AIs that exist at the time of this writing are not superintelligent, but ASI could be developed sometime in the future. It is important to consider the long-term possibilities for ASI in order to help avoid ASI catastrophe. With careful analysis, it may be possible to identify indicators that ASI development is going in a dangerous direction, and likewise to identify risk management actions that can make ASI development safer. However, long-term technological forecasting is difficult (Lempert et al. 2003), making ASI risks difficult to characterize and manage. Additional challenges come from the possibility of ASI development going unnoticed (such as in covert development projects) and from weighing the risks posed by ASI against the potential benefits that ASI could bring. This paper surveys established methodologies for risk analysis and risk management as they can be applied to ASI risk. ASI risk can be addressed in at least two ways: (1) by building safety mechanisms into the ASI itself, as in ASI “Friendliness” research, and (2) by managing the human process of researching and developing ASI, in order to promote safety practices in ASI research and development (R&D). This paper focuses on the human R&D process because it has similarities to the R&D processes for other emerging technologies. Indeed, the ASI risk analysis ideas presented here are similar to our own work on risks posed by another emerging technology, synthetic biology (Barrett 2014). The ultimate goal of ASI risk analysis is to help people make better decisions about how to manage ASI risks. Formalized risk methodologies can help people consider more and better information and reduce cognitive biases in their decision making. A deep risk perception literature indicates that people often have grossly inaccurate perceptions of risks (Slovic et al. 1979). One example is in perceptions of “near miss” disasters that are luckily but narrowly avoided. An individual’s framing of the near miss as either a “disaster that did not occur” or a “disaster that almost happened” tends to decrease or increase, respectively, their perception of the future risk of such a disaster (Dillon et al. 2014). This, combined with the high stakes of ASI, suggests substantial value in formal ASI risk analysis. 6.2 Key ASI R&D Risk and Decision Issues For risk analysis, important questions concern the probabilities, timings, and consequences of the invention of key ASI technologies. Regarding the consequences, Yudkowsky (2008), Chalmers (2010) and others argue that ASIs could be so powerful that they will essentially be able to do whatever they choose. Yudkowsky (2008) and others thus argue that technologies for safe ASI are needed before ASI is invented; otherwise, ASI will pursue courses of action that will (perhaps inadvertently) be quite dangerous to humanity. For example, Omohundro (2008) argues that a superintelligent machine with an objective of winning a chess game could end up essentially exterminating humanity because the machine would pursue its objective of not losing its chess game, and would be able to continually acquire humanity’s resources in the process of pursuing its objective, regardless of costs to humanity. We refer to this type of scenario as an ASI catastrophe and focus specifically on this for the remainder of the paper. The risk of ASI catastrophe has the dynamics of a race. Society must develop ASI safety measures before it develops ASI, or else there will be an ASI catastrophe. Estimating ASI catastrophe risk thus requires estimating the probabilities of ASI and ASI safety measures occurring at different times. For ASI invention, a number of technology projection models exist, e.g. The Uncertain Future (Rayhawk et al. 2009a). ASI safety measure models are less well formulated at this point but would be needed for a complete risk analysis. For risk management, the most important question is: What policies (public or private) should be pursued? A variety of ASI risk reductions policy options have been identified (e.g., Sotala and Yampolskiy 2015; a version of which appears as Part 1 of this volume). At least three sets of policies could be followed, each with its own advantages and disadvantages:
  2. (1) Governments, corporations, and other entities could implement ASI R&D regulations within their jurisdictions, and pursue treaties or trade agreements for external cooperation. Regulations could restrict risky ASI R&D. However, implementation could be costly and could impede benign R&D. It would also be unlikely to be universally agreed and enforced, such that risky research could proceed in unregulated regions or institutions.   2. (2) Security agencies could covertly target risky ASI projects. Similar covert actions have reportedly been taken against other R&D projects, such as the Stuxnet virus used against Iran’s nuclear sector. Such actions can slow down dangerous projects, at least for a while, but they could also spark popular backlash, harden project leaders’ desire to continue, and provide dangerous ASI R&D efforts with incentives to avoid detection.   3. (3) Governments, corporations, foundations, and other entities could fund ASI safety measure development. This could increase the probability of ASI safety measures being available before ASI. However, ASI communities do not have consensus on ASI safety measure concepts or best approaches—more on this below—and some ASI safety measures may still take more time to develop than ASI, in which case ASI catastrophe would still occur. Here are some potentially important factors:
    1. Cases of misrepresentations:     1.  a.         Any computer simulation that implements a known false model         (e.g., the Ptolemaic model of the solar system) could not be         expected to render knowledge of planetary movement. 
      
      2. b. Any computer simulation that has no representational underpinning of the target system, such as heuristic simulations. (e.g. the Oregonator is a simulation for exploring the limits of the Belousov chemical reaction. Such a simulation implements a model whose system of equations is stiff and therefore it might lead to qualitatively erroneous results (Field and Noyes 1974, 1880)). A particular case of this is: 1. i. Any computer simulation that renders unrealistic simulated results, that is, results that cannot represent an empirical target system (e.g., a computer simulation implementing a Newtonian model setting the gravitational force to [$$G=1,\mathrm{m}^{3}\mathrm{kg}^{-1}\mathrm{s}^{-2}$$]).² Such simulations violate the laws of nature and cannot be considered empirically accurate.       2. 2. Cases of miscalculations: 1. a. A computer simulation that miscalculates due to large round-off errors, large truncation errors, and other kinds of artifacts in the calculation, such as ill-programmed algebraic modules and libraries. Such software errors cannot stand for valid simulation results of the target system.   2. b. A computer that miscalculates due to physical errors, such as an ill-programmed computer module or a malfunctioning hardware component. Similar to (2.a) above, these types of errors warrant invalid simulation results and, as such, do not render knowledge of the target system.   A generally valid principle in computer simulations is that there are no limits to the imagination of the scientists. This is precisely the reason why simulations are, one might argue, facilitating the shift from a traditional empirically-based scientific practice into a more rationally-based one. However, neither of the examples described above fit the conditions for a pre-computed reliable simulation. While simulations belonging to case (1.a) are insufficient for an accurate representation the empirical target system, those belonging to case (1.b) are highly contentious. The latter case, as is illustrated by the Oregonator example, is trusted only insofar as the results are subject to the subsequent acceptance by experts. Whenever this is the case, the simulation automatically fails to classify as pre-computed reliable and, therefore, violates the basic assumption of the singularity hypothesis. Although much of current scientific practice depends on these kinds of simulations, they do not comply with the minimal conditions for being a singularity, and therefore they inevitably fail to qualify as one. In the same manner, heuristic simulations must not necessarily render invalid results. For this reason, they are useful simulations for exploring the mathematical limits of the simulation model, as well as the consequences of an unrealistically constructed law of nature, among other uses. Such simulations facilitate the representation of counterfactual worlds, thought experiments, or simply fulfill propaedeutic purposes, but given our current conception of technological singularity, they do not qualify as such. Besides their representational features, computer simulations are also part of the laboratory instrumentarium, and as such are inevitably exposed to miscalculations of different sorts. A first classification of errors divides them into random errors, such as a voltage dip during computation, or a careless laboratory member tripping over the power cord, and systematic errors, that is, errors that are inherently part of the simulation. Random errors have little philosophical value. Their low probability of occurrence, however, do make a small contribution (although negligible) to the frequency of a process of producing beliefs that are false rather than true. Systematic errors, on the other hand, can be subdivided into logical errors (i.e., errors in the programming of software, such as the errors illustrated in (2.a) above), and hardware errors (i.e., errors related to the malfunctioning of the physical component of the computer, exemplified in (2.b) above). This paper concedes that miscalculations do occur in the practice of computer simulations, but assumes that they are rare and rather negligible for the overall evaluation of the reliability of computer simulations. The reason is that over the years of technological advancement, computers have become less prone to suffer from failure. A host of recovering procedures such as duplication and redundancy mechanisms for critical components, functions, and data grant this dependability. Also, new techniques in the design and practice of programming, as well as the plethora of programming languages and expert knowledge at the programmers’ disposal facilitate the assertion that computers are relatively error-free and fail-safe instruments. As a working assumption, then, I take that computer simulations are stable instruments that, most of the time, do not incur in calculation errors that might alter its results (or, if they do, such errors are entirely negligible). 9.3.1 Verification and Validation Methods The bluntly false and highly speculative examples used above constitute only a small portion of computer simulations used in scientific practice. For the most part, scientists interpret, design, and program the simulation of an intended target systems with remarkable representational accuracy and on stable instruments. This paper concedes this much. We must carefully distinguish, however, simulation results that require further epistemic sanctioning from results whose validity has been granted during a pre-computed reliability stage. An argument is advanced to the effect of addressing verification and validation methods applied during the pre-computed reliability stage that grant validity to simulation results. Verification and validation methods are at the basis of claims about the reliability of computer simulations. They build on the confidence and credibility of simulation results, and in this respect, understanding their uses and limits is central for claims about singularity. While verification methods substantiate the scientist’s belief that the mathematical model is correctly implemented and solved by the simulation, validation methods provide evidence that the simulation results match, with more or less accuracy, empirical data. Let us take a closer look at what comprises each method. The American Society of Mechanical Engineers (ASME), along with other institutions, adopted the following definition of verification: “[t]he process of determining that a computational model accurately represents the underlying mathematical model and its solution” (ASME 2006, 7). Thus understood, verification could be obtained in two ways: by finding evidence that the algorithms are working correctly, and by measuring that the discrete solution of the mathematical model is accurate. The former method is called code verification, while the latter is known as calculation verification. The purpose of making these distinctions is to categorize the set of methods for the assessment of correctness of the computational model with respect to the mathematical model, as opposed to assessing the adequacy of the mathematical model with respect to the empirical system of interest. Code verification, then, seeks to remove programming and logic errors in the computer program and, as such, it belongs to the design stages of the computational model. Calculation verification, on the other hand, seeks to determine the numerical errors due to discretization approximations, round-off errors, discontinuities, and the like. Both code verification and calculation verification, are guided by formal and deductive principles, as well as by empirical methods and intuitive practice (or by a combination of both). Validation, on the other hand, has been defined as “[t]he process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model” (ASME 2006, 7). Validation, then, is somehow closer to the empirical system since it is concerned with the accuracy in the representation of such a model. Validation methods also make use of benchmarks or reference values that help establish the accuracy of the simulation results. Benchmarking is a technique used by computer scientists for measuring the performance of a computer system based on comparisons between simulation results and experimental data. The simplest way to obtain such data comes from performing traditional empirical experiments. Now, since the true value of the empirical target system cannot always be absolutely determined, it is an accepted practice to use a reference value obtained by traditional measurements and observational procedures. A different situation is when the value of the target system can be theoretically determined, as is the case in quantum mechanics where the value of the position of atoms are obtained by theoretical methods. In such cases, the simulation results can be validated with high accuracy. In the same vein, results from different but related simulations could also be used for validation purposes, as these results can be easily compared to the simulation results of interest. Validation, then, aims at providing proof of the accuracy of simulation results with respect to the empirical target system of interest. Thus understood, the appeal to validation methods as grounds for reliable processes brings out questions that lurk behind inductive processes. The general concern is that such methods only allow validation up to a certain number of results, that is, up to those of which we have previous data. Due to their comparative nature, these methods do not provide mechanisms for validating new, unknown results. It follows that validation is a method for assisting in the detection of errors, but not designed for detecting misrepresentations of the target system. It should not be expected, however, that during an actual verification or validation process scientists decouple these methods. The multiple problems related to mathematical representation, mathematical correctness, algorithm correctness, and software implementation make the entire enterprise of verification and validation highly interwoven processes. Moreover, code verification by formal means is virtually impossible in complex and elaborated simulations. Let it also be noted that not all verification and validation methods are performed at the same stages of design and output of a simulation. Some verifications are only carried out during design stages, while others, such as manufactured solutions, depend on the intervention of an agent. Manufactured solutions are custom-designed verification methods for highly accurate numerical solutions to partial differential equations (PDEs). It consists in testing numerical algorithms and computer codes by finding solution functions that have altered the implemented PDEs, but which also satisfy such equations. As William Oberkampf and Timothy Trucano indicate, “[a manufactured solution] verifies many numerical aspects in the code, such as the mathematical correctness of the numerical algorithms, the spatial-transformation for the grid generation, the grid-spacing technique, and the absence of coding errors in the software implementation” (Oberkampf and Trucano 2008, 723). In a similar fashion, validation methods might focus on the design stages as well as on the output of a simulation. It is easier to devise validation methods requiring the intervention of an agent, for the construction and subsequent use of benchmarks requires such involvement. Examples of this abound in the literature and there is no need to discuss this point any further. One could always consult Oberkampf and Trucano’s list for the documentation of benchmarks, all of which must be in place for successfully warranting accuracy of the computed results (Oberkampf and Trucano 2008, 728). 9.4 Final Words In this paper I defended the idea that, under the right conditions, computer simulations are reliable processes that produce, most of the time, valid simulation results. Valid simulation results are taken as knowledge of the target system, facilitating the claim that computer simulations are a technological singularity. Now, in order to fully qualify as a technological singularity, such results must not be sanctioned by a human agent. On the face of it, the number of simulations that qualify as a technological singularity has been reduced to a few well established cases with representational underpinning and error-free computations. Such representation and error-free computations are grounded on verification and validations methods, as elaborated above. One immediate consequence is that the universe of computer simulations has been significantly reduced. At first, this outcome might strike one as an undesirable and anti-intuitive consequence. One might think that many computer simulations are being used today as reliable processes producing valid results of a given empirical system, and that there is no special problem in doing so. The general trend nowadays is to overthrow humans as the ultimate epistemic authority, replacing them with computer simulations (Sotala and Yampolskiy 2015). A good example of this is the simulation of the spread of influenza, as expounded by Ajelli and Merler (2008), where two different kinds of computer simulations provide knowledge of a hypothetical scenario. In some situations, in effect, no further sanctioning is needed and the information provided by these simulations is used as obtained. However, contrary to appearances, the vast majority of cases get their results sanctioned after they have been produced, undermining the possibilities of becoming a technological singularity. One might safely conclude that the number of computer simulations that qualify as a singularity are, indeed, limited. Admittedly, much more needs to be said in both directions, namely, what grounds computer simulations as a technological singularity (especially regarding verification and validation methods) as well as how current scientific practice accommodates this philosophical view. Equally important is to elaborate on cases such as Ajelli et al., where the simulation seems to be a singularity if used in certain situations and fails to be one in some others. Acknowledgements This article was possible thanks to a grant from CONICET (Argentina). Special thanks also go to Pío García, Marisa Velasco, Julián Reynoso, Xavier Huvelle, and Andrés Ilcic (Universidad Nacional de Córdoba - Argentina) for their time and comments. References Marco Ajelli and Stefano Merler. The impact of the unstructured contacts component in influenza pandemic modeling. PLoS ONE, 3(1):1–10, 2008.CrossRef ASME. Guide for verification and validation in computational solid mechanics. Technical report, The American Society of Mechanical Engineers, ASME Standard V&V 10-2006, 2006. R. J. Field and R. M. Noyes. Oscillations in chemical systems IV: Limit cycle behavior in a model of a chemical reaction. Journal of Chemical Physics, 60:1877–1884, 1974.CrossRef Roman Frigg and Julian Reiss. The philosophy of simulation: Hot new issues or same old stew? Synthese, 169(3):593–613, 2009.CrossRef Alvin I. Goldman. What is justified belief? In G. S. Pappas, editor, Justification and Knowledge, pages 1–23. Reidel Publishers Company, 1979. I. J. Good. Speculations concerning the first ultraintelligent machine. In F. Alt and M. Rubinoff, editors, Advances in Computers, volume 6. Academic Press, 1965. Paul W. Humphreys. The philosophical novelty of computer simulation methods. Synthese, 169(3):615–626, 2009.CrossRef Ian C. Jenkins, Marie T. Casey, James T. McGinley, John C. Crocker, and Talid Sinno. Hydrodynamics selects the pathway for displacive transformations in DNA-linked colloidal crystallites. Proceedings of the National Academy of Sciences of the United States of America, 111:4803–4808, 2014.CrossRef Hans Jonas. Technology and responsibility: Reflections on the new task of ethics. In Morton Winston and Ralph Edelbach, editors, Society, Ethics, and Technology, pages 121–132. Cengage Learning, 2011. R. Kurzweil. The singularity is near: When humans transcend biology. New York: Viking, 2005. James Miller. Some economic incentives facing a business that might bring about a technological singularity. In Amnon H. Eden, James H. Moor, Johnny H. Soraker, and Eric Steinhart, editors, Singularity Hypotheses: A Scientific and Philosophical Assessment. Springer, 2012. W. L. Oberkampf and T. G. Trucano. Verification and validation benchmarks. Nuclear Engineering and Design, 238(3):716–743, 2008.CrossRef A. M. Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.CrossRef V. Vinge. The coming technological singularity: How to survive in the post-human era. In Proc. Vision 21: interdisciplinary science and engineering in the era of cyberspace, pages 11–22. NASA: Lewis Research Center, 1993. E. L. Wright. Preliminary results from the FIRAS and DIRBE experiments on COBE. In M. Signore and C. Dupraz, editors, The Infrared and Submillimeter Sky After COBE. Proceedings of the NATO Advanced Study Institute. Les Houches, France, Mar. 20–30, 1991, pages 231–248. Kluwer, 1992. Kaj. Sotala and Roman. V. Yampolskiy. Responses to catastrophic AGI risk: a survey Physica Scripta, 90(1), 2015. Footnotes Since the context is clear, from now on I will use the terms ‘technological singularity’ and ‘singularity’ interchangeably. Unrealistic results are not equivalent to erroneous results. In this case, the results are correct of the simulation model although they do not represent any known empirical system. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_10 10. Can the Singularity Be Patented? (And Other IP Conundrums for Converging Technologies) David Koepsell^(1, 2  ) Research and Strategic Initiatives, COMISIÓN NACIONAL DE BIOÉTICA, Mexico city, Mexico Universidad Autonoma Metropolitan, Xochimilco, Mexico David Koepsell Email: drkoepsell@gmail.com 10.1 Introduction Many of the societal controls and policy considerations discussed in relation to the development of “The Singularity” or artificial general intelligence (AGI) relate to potential harms and risks. There is one important area, however, where current legislation and policies pose a risk to its development: the laws of intellectual property. Intellectual property (IP) is now taken for granted as somehow necessary for innovation, but it poses a considerable threat to both the eventual development of the Singularity, and to our rights, duties, and obligations once such a Singularity emerges. If we are to properly anticipate and guide the development of AGI in positive ways, we must come to terms with the role that IP plays in either promoting or hindering innovation, the impact it will have on AGIs as well as their developers, and manners in which might better manage IP to allay some of the anticipated risks. Many people credit the development of the legal institution we call “intellectual property” with helping to propel us into the modern, technological age. Before intellectual property (IP) was invented, there was no means available for people to prevent others from utilizing their ideas, aside from force. This is because ideas cannot ordinarily be monopolized, except by secret-keeping, and even then they are prone to independent discovery or thought. Because of the inherent uncontainability of ideas, states have, within the past couple hundred years, devised legal monopolies over expressions in various forms under the belief that such monopolies would help encourage innovation and immigration of highly skilled and inventive people. The first such monopolies were “letters patent” issued by sovereigns to those who devised new and useful arts. These letters patent entitled their holders to monopolize a market for a term of years. Over time, various states began to formalize these sorts of processes, and introduced also monopolies for artistic works, believing that these incentives would help to encourage technological and economic growth. Modern IP includes copyrights, patents, and trademarks. Each works slightly differently, and covers specific types of expressions. Patents are monopolies granted by the state to inventors of new, useful, and nonobvious utilitarian objects. The monopoly term for patents runs 20 years, after which the invention lapses again into the public domain. The copyright term, which covers aesthetic expressions, lasts the lifetime of the author plus an additional 70 years. Trademarks cover trade names, and fall outside of this discussion. Our interests here focus on the effects of both patents and copyright on the emergence and promise of the singularity, as it has been hypothesized by Ray Kurzweil and others. Both copyrights and patents are pertinent due to some oddities in the law of IP that I began to explore nearly 20 years ago, and which remain unresolved. But more to the point, patents are the primary threat, at least in their current forms, to both the emergence of the singularity and its achieving its full promise. 10.2 A Singular Promise The singularity can be understood in numerous ways, and people more technically inclined have devised excellent explications of its potential nature. For our purposes, let us examine it in its most general possible forms and then discuss how various legal institutions such as copyrights and patents may impact on its eventual achievement. The technological singularity will involve a quantum leap in our technologies in such a way that society itself will change. This could happen through the realization of true artificial intelligence, which, if it can be created, will in all likelihood be unbounded by the biological limitations imposed upon our own intelligences. The singularity may also be achieved by way of nanotechnology or some other radical new approach to our material world such that objects can be programmed: our whole physical environment could be alterable either by ourselves, or merged with our artificial intelligences. Finally, the singularity may involve some transhumanist future in which we ourselves become the objects of our technological change, capable of super-intelligence, strength, or longevity… or even immortality. The singularity may involve some combination of all of these, either to completion or partially realized. In any event, any or all of these technological achievements will likely usher in radical challenges to our present cultures and institutions. How we manage our current institutions, including laws, will likely have some impact too on the eventual realization of the singularity, and how it may then affect us once achieved, as we shall see. Technological advances have necessarily wrought significant reflection, if not always rapid change, in the ways that we consider the roles of law in the field of innovation. Specifically, of late there has been some shifting of approaches to innovation and how best to encourage and protect investments. Open Source is now a serious option for even the biggest, most established companies seeking to create economic climates more suitable to commercial exploitation of some niches. This is perhaps because legal monopolies may discourage cooperation and competition, both of which may simultaneously be viable and profitable means of building a nascent technology’s acceptance and success. Even early in the growth of the automobile industry, competing companies realized that legal monopolies could stifle the development of their technologies, and so entered into “patent pools” and other cooperative means to prevent “patent thickets” and encourage competition and the technological growth it can promote. The technological singularity, if it is underway, or if it is still far off on the horizon, will likely lead to similar legal, social, and institutional innovations if the current law of IP threatens to stifles its achievement, as I suggest below. 10.3 Intellectual Property I have spent the better part of 15 years criticizing IP law in its current form for a variety of reasons, some pragmatic, and others essential. The law as it stands embodies a very confused metaphysics (by metaphysics, I mean the manner in which we classify its objects), and increasingly these metaphysical errors lead to absurd results. I will discuss my general issues with IP law briefly, and apply them to two particular issues I see as arising from IP’s odd metaphysics and application specifically for the singularity hypothesis, namely: (1) does patent law as it now stands allow the patenting of artificial intelligences, and is this an ethical or practical problem, and (2) will IP protection apply to the creative products of artificial intelligences, and should it? These two complex but unresolved issues in the law of IP pose real potential impediments to its realization in the near future. 10.3.1 Some General IP Problems in Converging Technologies I have written at some length about some of the practical and theoretical oddities and problems associated with IP law, some of which are particularly noticeable with the advent of digital and bio-technologies in the past few decades. I summed much of this up in my monograph, Innovation and Nanotechnology: Converging Technologies and the End of Intellectual Property. There I concluded that the train-wreck of IP, which began with the courts and patent offices granting simultaneous patents and copyrights (two, previously mutually-exclusive categories) to software, was culminating in the present innovative climate with a mandate for innovators to simply ignore and work around IP law entirely. This is because the courts and legislators who have drafted the current IP laws have interpreted them in ways that, while they might please IP lawyers and monopolists who have the means to acquire vast patent portfolios and thus venture capital (or worse, litigation awards), do not understand the underlying science or metaphysics of artifacts and nature. The result has been a slow usurpation of what many call the “scientific commons” and its gradual conversion into private property. This is especially problematic, I argue, for developing the technologies that will lead to the singularity. Over time, the categories that originally bounded the objects of copyright and patent have been revealed to be flawed. Copyright was originally applied to “non-utilitarian” expressions, like works of art. Patents have been applied to “utilitarian” expressions, namely “Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.” These criteria have been interpreted over time to apply to nearly anything, including elements on the periodic table, business methods, software, algorithms, genes, and other seemingly problematic sorts of things have been granted patents that have been upheld. Even lifeforms may be patented, as long as they are not human, and as long as they have been “engineered” through some human intervention. Three explicit exclusions (other than the statutory exclusion of people) created by famous court cases and adopted by patent offices more or less around the world include: abstract ideas, products of nature, and natural phenomena. While these exclusions should have seemed more or less obvious under the original conceptions of patent law (which was to promote invention of new and useful things) the catch-all type of language in the US Patent Act’s section 101 quoted above, and applied through various treaties to patents world-wide, have resulted in a blurring of the lines delineating nature from artifact, and courts have attempted through these exclusions to redraw those lines. Unfortunately, the exclusions sought to be clarified have been further muddied through poor metaphysics in courts and patent offices. What counts as a “product of nature,” for instance, is now more or less nothing. For instance, if you want to patent a product of nature, like the molecule O₂ (gaseous oxygen, created through such natural processes as photosynthesis) just come up with some man-made way to synthesize it. Thus, if I discover that I can concentrate O₂ through electrolysis of water, separating the O₂ from the H₂ in it, then under the current interpretation of the patent laws in the US and Europe, I can patent not just these new processes, but the products themselves. Suddenly, O₂ is an invention. All of which poses a real problem in nanotechnology, and converging technologies in general, because as the scales of manufacturing new and useful things shrink, the building blocks comprising new products as the singularity nears will be at the molecular level. Consider the patent thickets that have emerged in the cell phone market, and now multiply by hundreds. This will slow down the convergence of technologies like nanotechnology and synthetic biology, and hinder the emergence of the singularity. Indeed the law of IP seems to be reaching a crisis point as technologies converge upon the singularity. Written expressions and machines appear to be converging, blurring the line between aesthetic and utilitarian expressions in general. Biology and manufactures are converging, blurring the lines between artifacts and organisms. And the scales at which the building blocks of new artifacts are created are becoming smaller, making fundamental elements, deemed at one time to be products of nature, to be the medium for manufacturing nearly everything. Beyond these conundrums, however, lurk even more insidious ethical issues when (or should) the singularity emerge. What, for instance, will we do about patenting machines (or software) when they acquire sentience, and will we apply IP to the products of their imaginations? 10.3.2 Some Gaps in IP Relating to the Singularity Before the twentieth century, there was no question about it, life forms could not be patented. This was largely because of the fact that technology had not yet advanced to the state of being able to consciously create new, non-obvious, and useful lifeforms. In 1930, the first US Plant Patent Act changed all that. Lobbied for by plant breeders as well as the likes of Thomas Edison himself, the Plant Patent Act provided the first monopoly protection under the Patent Act: 35 U.S.C. section 161: “Whoever invents or discovers and asexually reproduces any distinct and new variety of plant, including cultivated sports, mutants, hybrids, and newly found seedlings, other than a tuber propagated plant or a plant found in an uncultivated state, may obtain a patent therefor, subject to the conditions and requirements of this title.” This was the first legislation to recognize and grant state-monopolies to living creatures. And why not? If the patent law states explicitly that any new, useful, and non-obvious composition of matter or manufacture should be able to be patented, then there’s no particular reason embedded in these criteria to exclude lifeforms of any kind. Still, it wasn’t until 1980 that the US Supreme Court extended this reasoning to non-plant life, in the seminal case of Diamond v. Chakrabarty. Chakrabarty had “engineered” through selective breeding a bacterium that could digest petroleum, which would certainly be a useful new, non-obvious invention, helpful for cleaning up oil spills, etc. but the US Patent and Trademark Office refused to grant a patent for the organism citing no previous history, and thus implicit policy against patenting life. But the Supreme Court disagreed, and found the new organism patent-eligible, while stating that there were still some explicit exclusions to patent-eligibility, including: abstract ideas, products of nature, and natural phenomena. Interesting for the singularity hypothesis are the following: to what extent might the building-blocks of converging technologies fall under one of the Chakrabarthy exclusions, and should all engineered life-forms be patent-eligible? One glaring gap in the law as it relates to converging technologies remains in the definitions of “abstract ideas,” “products of nature,” and “natural phenomena.” Vagueness in the categories will continue to effect innovation in nanotechnology and artificial intelligence. The singularity, if it is technically achievable, will break down barriers among these categories, requiring us to reevaluate philosophically and practically what counts as abstract ideas, products of nature, and natural phenomena. Will artificial intelligences exhibit properties of nature, even if they are artificially produced. Will evolutionary forces, perhaps directing their future forms, require us to consider them to be under the guidance and direction of natural phenomena? If so, then do current IP exemptions mean that we cannot claim IP rights over future products? If life was a big exception under IP law prior to Chakrabarthy, and human life remains so now, what are the necessary and sufficient features of life that make it excludable? We can deduce from the conflict between the statutory exclusion of human life from patentability, and the acceptance of other lifeforms as patentable after Chakrabarthy, that something special about humans prohibits owning them through IP. Of course, constitutional prohibitions against slavery should cover some forms of human ownership, preventing owning any one human, but extending this notion to IP conflates IP and ordinary property to an unfair extent. Simply put: IP does not confer ownership rights over tokens. That is to say, a patent holder doesn’t have an ownership interest in any instance of his or her invention. Rather, they own the right to exclude others from profiting from the manufacture and first sale of the thing over which they hold IP rights. So IP ownership over engineered or partially-engineered humans ought not to be prohibited by the same rationale that prohibits slavery. So what values would rationally prohibit ownership over types of some technical artefacts or processes and not others? One answer may lie in deontology and some Kantian concern over the effects of claims to ownership of types regarding sentient creatures as opposed to others. Are we entitled to use lifeforms as instrumentalities, or means to particular ends, or must we respect them (or at least some of them) as possessing certain inherent dignities which IP rights might offend? On its face, we clearly reject an expansive view of this thesis, since there are numerous lifeforms, even arguably “sentient” ones that we feel free to exploit as instrumentalities to varying degrees. 10.4 Limits to Ownership and Other Monopolies Not everything is susceptible to just claims of monopoly or other control. Some limits to what may be owned are moral, and others are practical. Still others are legal creations, designed to achieve certain utilitarian ends. Most recently, the Supreme Court in the U.S. has attempted to define better some existing limits to what could be claimed by patent. The Abstract Idea, Natural Phenomena, and Laws of Nature exceptions have been grappled with in the Bilski, Mayo, and Myriad cases between 2010 and 2014. First we will consider each of these exceptions and recent cases briefly, then examine their effect, if any, on singularity technology. Intellectual property law was created in order to encourage and advance the march of the “useful arts.” By granting temporary monopolies over the production and sale of new inventions and discoveries through various patent laws, governments attempt to promote economic development. IP law walks a delicate balance, however, between promoting invention and not impeding science. Scientific discovery typically focuses upon exploring and understanding the laws of nature, natural phenomena, and abstract ideas that govern the universe. This is perhaps a critical reason why the courts have carved out these subject matter exceptions to patent eligibility. Setting aside the moral problems posed by allowing monopolies over mere discoveries of naturally-occurring phenomena or laws, a practical and unwanted consequence might be to prevent researchers in basic sciences from conducting their research freely. There have been certain realms in which such a monopoly was never deemed proper. The stated purpose of IP law is to promote invention, innovation, technological development and thus economic development. Science is typically the field viewed most appropriate for basic research into the working of nature, and the impetus for scientific discovery differs from that of commerce and industry. Science is impelled largely by scientific culture, which rewards discoveries about nature and its laws through academic careers, prizes, and often through the recognition and joy many of those who pursue scientific careers receive from uncovering nature’s mysteries. Although competition and egos are also powerful motivators for scientists to pursue their work, for science to work correctly, a degree of openness and recognition of limits of knowledge and the contingency of present understanding, ensure that humility and cooperation remain part of the scientific culture as well. By recognizing that abstract ideas, laws of nature, and natural phenomena cannot be patented, the US Supreme Court has upheld, despite the lack of explicit exclusion in the Patent Act, a boundary between science and technology; between what may be owned and what may not. The boundaries that have arisen in IP law will mean very little when it comes to the singularity, because the singularity promises to erase preexisting boundaries in revolutionary ways. Currently, one can patent life, just not humans. Currently, one can patent isolated and modified natural phenomena and products of nature, but not if they are, apparently, morphologically identical to natural products. What remains free for all are natural products qua natural products, and natural phenomena that have not been otherwise modified by man to serve some end. Where then does this leave the following possibility: converging technologies that result in artificially intelligent, self-replicating entities capable of modification through “evolutionary” means? Under current IP regimes, what repercussions will there be? What will be ownable through patent, and what not? 10.5 Owning the Singularity Under the current IP regime described above, an artificially-created intelligent agent would be ownable, as long as it isn’t somehow composed by cloning human DNA, which is so far illegal, and humans are specifically exempt from patent, even if engineered in some way. Such a product qualifies for patent as long as it meets the criteria of new, non-obvious, and useful. It is trivial that, given a lack of explicit exception for intelligent inventions, such an agent would be susceptible to IP claims. After all, who would invent such a useful thing without the patent incentive, one might argue. Be that as it may, patenting such an agent has significant moral repercussions and practical implications that we should consider. Our hypothetical agent of the singularity will be able to be legally monopolized under patent by its creator. This means the following: no one may reproduce that agent without the patent holder’s license. This may seem unproblematic at first; you made your intelligent agent through your own ingenuity and inventiveness, so why should anyone else be able to profit? But remaining questionable is whether that agent’s ability (forgetting for the moment whether it has any rights) to reproduce is thwarted by the patent law. More complicated is the question of whether, if that agent can create new, non-obvious, and useful things of its own, that agent, the creator, or some other agent will have the ability to patent those new creations. Whether we create some new, non-obvious, and useful agent through silicon-based artificial intelligence, uplifting other creatures, or through nano- or bio-tech, those agents will be ownable, to a degree. To the extent that they will themselves be impeded from reproducing duplicates of themselves, their rights as sentient being will be curtailed. They will be prevented from expressing one of the basic rights of sentient beings: self-reproduction. More curious will be the status of their inventions, if any. If they are sentient beings, and capable of creating new, non-obvious, and useful new things, will they be able to patent them, or will their own inventors? If the former, then curtailing their right to self-replicating reproduction seems unfair, whereas if the latter, then this seriously alters the manner in which we think inventors gain rights to the fruits of their intellectual labor. A significant and perhaps insurmountable problem for IP in general will be the question of who shall rightly own the fruits of the inventiveness of inventive machines. It ought to cause us to question the foundations of IP law itself and its utilitarian intentions. A singularity will be able to compete with us humans in all realms, including invention and creativity. If so, and we don’t bar non-humans from enjoying the benefits of a state-sponsored monopoly, singularity agents may well usurp our roles as inventors, and monopolize the realm of monopolies through the benefit of the state. This is a purely pragmatic concern, however. More theoretical is the ethical question regarding whether we could justly, though IP law, prevent artificial agents from reproducing themselves. Let’s consider this first, and then re-raise the pragmatic concerns to suggest that IP ought to come to an end before the singularity is realized, 10.6 Ethics, Patents and Artificial Agents We may well decide not to grant rights of personhood, including the various freedoms we enjoy as humans, to artificial life-forms or agents regardless of whether they fulfill the definition of The Singularity. This might be a good idea, especially if we don’t want to compete with them in the marketplace of ideas where we are used to state-sponsored monopolies giving inventors a leg-up. But is it just? It is only just if we make an exception for humans when it comes to basic rights, including self-replication or reproduction, or if we exclude these from basic rights altogether. We already exclude non-human animals from having basic rights, so the first choice is already in force, but this may well be due to the cognitive abilities of non-human animals and not due to their particular material make-up. Recently, as some have rallied to include the higher apes and dolphins among the ranks of full rights-holders, it has become clear that the category of a species is not the only criterion by which we should consider other being to be potential rights-holders. Rather, capacities appear to be a better arbiter. After all, we measure the rights of humans by capacities as well, holding people to differing levels of responsibility according to their capacities at the time of an act. We even hold certain animals responsible, at least to the extent of punishment, sometimes capital, for transgressions and harms. So we may well ask, and the patent offices in both the US and Europe are bound to ask, whether issuing a patent over a sentient being would violate an ethical duty or moral code. Morality has been considered to be a measure of patentability due to the utility clause, and immoral inventions may be denied because they lack proper useful purpose. Thus, while one might invent something immoral, one may not patent it. But what if the patent itself is immoral? In other words, even if the invention is not immoral, what if the patenting of the thing invented is itself immoral? We should consider the reasoning behind the prohibition of patents on humans, and extend that reasoning to its logical conclusion. After the Chakrabarthy decision, the PTO issued a statement indicating that, while patents on new life-forms would be granted, they would not issue patents for “humans.” The rationale was that such patents would conflict with the US Constitution, likely but not explicitly referring to the 13th Amendment, which prohibited slavery. This self-imposed moratorium seems also to include human embryos, as well as fully-formed humans. The justification proffered for this limitation on “human” patents is threefold: Congress never intended for humans to be patentable, immoral patents fail under the utility requirement, and the 13th Amendment prohibits owning humans. Notable about this reasoning, setting aside for the moment the merits of these arguments, is that none of them will do much to guide patenting artificial agents. An artificial agent will be patentable at least as a new, non-obvious, and useful composition of matter or as a machine. The moral utility doctrine, which is not statutory but court-made, is not clearly applicable to an artificial agent, unless we either view the making or existence of such agents as somehow immoral, and the 13th Amendment has never been applied to non-human persons. This will require us to consider the question, already raised in the cases of higher apes and dolphins, of whether we ought to extend to other intelligent creatures some degree of “personhood” bringing them under the protection of existing laws, or create some new category of rights covering other sentient entities. Another interesting and potentially troubling aspect of patents regarding artificial agents is the problem of software patents. Specifically, typically excluded from patent are “purely mental processes,” presumably because in order to be patentable a process must be instantiated in some machine or composition of matter. One problem for an artificial agent embodying singularity-type technology will be the nature of its “operating system” and whether and the degree to which it may not be under the control of the agent itself. Are its thought processes and software purely mental processes? Does this mean they are not patentable by others or by itself? Even if we grant an agent its liberty by excluding it from patentability as a composition of matter, based upon some extension of the moral reasoning prohibiting patenting of humans, will we be able to charge such agents royalties for the use of code, upgrades, etc., all of which may be software necessitating or enabling mental processes in the agent. The basis for the moral prohibition against patents on humans seems clear: a general acceptance of the liberal notion at least of self-ownership. If a second party can claim a monopoly over either a part or a whole of another agent, this seems to cut into a significant part of the claims ordinarily attributed to owners. From this flows considerable trouble if we begin to accept the notion that agents can be non-human, and that they may too be rightly endowed with rights previously reserved only for humans. Extending to non-human agents the evolution of rights as they have changed in liberal democracies, and now embraced by universal treaties, means rethinking the nature of intellectual property because at some point our machines, even if invented wholly by the use of human creativity, will be constrained in their agency by the laws which enable patents. We as humans too may wish to carefully reconsider the extent to which we will be willing to allow our machines, where they act creatively and inventively, to hold the threat of monopoly over us. If the singularity’s Bill Gateses, Steve Jobs, or even David Foster Wallaces can produce and monopolise the objects of their creativity, what chance will humans have under the current law to compete? 10.7 The Open Alternative Recently, many working to develop new, groundbreaking technologies have opted to keep them open for all to use, rather than bottle them up behind state-sponsored monopolies. Elon Musk’s Hyperloop concept is one such example as is his decision to open up the standards behind his Tesla automobiles and not pursue infringement claims against competitors. Google’s most valuable code sits not behind the safety of patents, but rather as a trade secret which could last indefinitely, as long as no one reverse engineers or duplicates the code. Many software developers are choosing to release their code without patents, but using instead open source or other non-exclusive licenses such as that used by Linux. The benefit of open innovation is partly realized by fewer lawsuits, and competition and cooperation seem to be working as motivating and even profitable impulses in non-patented technologies. If applied to products that will move us toward the Singularity, many of the potential pitfalls discussed above will not materialize, even while we’ll still grappled with risk and ethics along the way. It seems most likely that as the rapid growth of new technologies that edge us toward the Singularity continues, and keeps challenging the categories that had already begun to fail underlying IP law, that those who wish to see its imminent development, free of the legal costs and risks that other technical developments have grappled with, the Open alternative will dominate, help prevent legal and economic thickets, and similarly help to avoid the dangers presented by IP in a world in which humans may not be the most inventive creatures around. References Ass’n for Molecular Pathology v. Myriad, 133 S. Ct. 2107, 569 U.S., 186 L. Ed. 2d 124 (2013). Bell, J. J. (2003). Exploring The “Singularity”. Futurist, 37(3), 18–25. Bilski v. Kappos, 130 S. Ct. 3218, 561 U.S. 593, 177 L. Ed. 2d 792 (2010). Diamond v. Chakrabarty, 447 U.S. 303, 100 S. Ct. 2204, 65 L. Ed. 2d 144 (1980). Eckersley, R. (2001). Economic progress, social disquiet: the modern paradox. Australian Journal of Public Administration, 60(3), 89–97. Johnson, L. (2009). Are We Ready for Nanotechnology? How to Define Humaness In Public Policy. How to Define Humaness In Public Policy. Mayo Collaborative v. Prometheus Labs., 132 S. Ct. 1289, 566 U.S. 10, 182 L. Ed. 2d 321 (2012). Patent Act: 35 U.S.C. § 161. Vinge, V. (1993). The coming technological singularity. Whole Earth Review, 81, 88–95. Yampolskiy, R. V. (2013). What to Do with the Singularity Paradox? (pp. 397–413). Springer Berlin Heidelberg. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_11 11. The Emotional Nature of Post-Cognitive Singularities Jordi Vallverdú¹   Philosophy Department, Universitat Autònoma de Barcelona, E08193 Bellaterra (BCN), Catalonia, Spain Jordi Vallverdú Email: Jordi.Vallverdu@uab.cat 11.1 Technological Singularity: Key Concepts Let’s begin by defining three, key concepts: Singularity, Post-cognition and Para-emotions.
  1. (i) Singularity: It is widely accepted that the world is on course for a technological Singularity—“an event or phase that will radically change human civilization and perhaps even human nature itself” (Sotala and Yampolskiy 2015). On what basis do we anticipate such a radical Singularity event? Such changes can be inferred from two different scenarios:
  2. (a) The emergence of artificial, super-intelligent agents (i.e. software-based synthetic minds), and   2. (b) The emergence of a ‘post’-human race (a thesis defended by trans-humanists positing a technology-driven evolutionary change in humans’ mental and physical capabilities). It is argued that such changes would follow an accelerated growth pattern, reaching a singular point relatively soon. In its final stages, this process will create an ‘intelligence explosion’. Consequently, two different kinds of entities are predicted under the Singularity umbrella: enhanced humans and artificial devices (initially created by human beings). In this chapter I will refer to both as kinds of ‘entities’, rejecting the distinction between ‘natural’ and ‘artificial’ as outmoded and just plain wrong. Both entities will be under global, evolutionary pressures. A simultaneous, Singularity-class emergence of synthetic agents and the evolution of transhumans must not be accomplished or happen at the same time. Yet it seems that the arc of development for both such Singularity events is following similar timescales and is also conceptually plausible. An excellent overview on the technological Singularity is found in the first volume of The Frontiers Collection (Eden et al. 2013) as well as in the first chapter of this book. Some of authors identify the Singularity as a catastrophic risk for humanity (Eden et al. 2013, p. 1).
  3. (ii) Post-cognition: The new level of cognition produced by post-Singularity Entities is what I call ‘post-cognitive’. Until very recently, humans accumulated knowledge about the world through use of their own brains, with some help from instruments. The advent of computers has changed this, leading to what I call a Fourth Paradigm of e-Science and e-Humanities, which features cooperation between humans and machines (Vallverdú 2009; Casacuberta and Vallverdú 2014). Now, in some cases, expert systems and advanced AI programs are generating new knowledge autonomously. Of course this has been a process getting here and now we can identify four, big historical cognitive steps (Fig. 11.1) Fig. 11.1 The big, historical cognitive steps These four steps, described in more detail, are:
    1. The Natural Stage: The Natural stage featured an elementary level of     cognition where humans processed basic information and responded to     their environments with automatic and pre-wired decision-making     strategies such as in FoxP genes which has been shown in fruit flies     (DasGupta et al. 2014). 
      
    2. 2. The Cultural Stage: This Cultural Stage saw the creation of cultural elements (such as thinking concepts, grammar, tools and basic machines) which have allowed the human species to reach a new level of knowledge, growth and innovation. These cultural elements have contributed to the worldwide spread and survival of our species and have made it possible for modern humans to manifest their cognitive and mechanical ideas. As a result, the co-evolution between biological and cultural forces became inextricably intertwined.   3. 3. The Computational Stage: The Computational Stage covered the period when humans harnessed machines that are capable of performing some calculations that were previously done in human brains. Most progress only occurred in second half of 20th Century with the creation of universal programmable machines, in spite of earlier theoretical efforts from thinkers like Llull, Leibniz, Babbage, Frege or Boole, who rank among a long list of those who worked in this area. These programmable machines have enabled the emergence of Artificial Intelligence and a new era of epistemological advances that are now ubiquitous in human environments.   4. 4. The Meta-statistical Stage (Trans-human/super-intelligences): The Meta-statistical stage is anticipated and refers to an era when there will be post-Singularity intelligences, such as upgraded humans or evolved machines. Both of these forms of intelligence will be able to deal with big amounts of data in real time as well as analyze new types of data using several statistical approaches. This super-intelligent approach will generate a new understanding of the universe—an understanding that will go beyond our current human capacity.
  4. (iii) Para-emotions: I am not arguing that future super-intelligences and post-humans will be functionally equivalent beings to each other. However, as I will demonstrate in the next section, I believe the role of emotions in both of these entities is likely to be comparable. Emotions will have deep interrelations and consequences for both. All living entities share the same ways to select useful information from the environment and process it efficiently. They use emotions for this purpose. And despite the complexity of human emotions, it is clear that emotions will be also present in Transhumans (or Super-Humans, as Damasio 2013 called them), not because Transhumans will think emotions are good or interesting but because emotions are necessary for complex behavior in uncertain environments. The emotional interactions of post-Singularity intelligences will go beyond that of our current emotional systems. As such, they might best be described as ‘para-emotions’ because of the deep implications of the new informational structures. A longer description will be made on Sect. 11.3. 11.1.1 Tools and Methods My epistemological approach is a rationalist one, based on the analysis of inferential processes that can follow the actual ranges of data about cognitive entities (natural and artificial). What I need to clarify is that it is not possible to establish linear inferences from our data and these possible future scenarios. There are possible, multi-causation paths towards certain future entities, but there are not certainties. At the same time, some of the ideas about emotional evolution have been partially obtained by previous computational simulations that explored the relationships between actions and proto-emotional states as well as the evolution of human’s emotional syntax (Vallverdú and Casacuberta 2011; Vallverdú et al. 2013). Finally, some of the ideas that appear here can be put under the umbrella of Gedankenexperiments, or mental experiments, which are commonly used within the philosophical and mathematical arenas, but also in the sciences (Laymon 1985; Horowitz and Massey 1991; Shanks 1998, Gendler 2000; Reiss 2003; Sorensen 1992). We might think of Galileo’s mental experiments as examples. 11.1.2 Singularity: Main Hypotheses I propose several hypotheses about the nature of post-Singularity Entities, their nature, intelligence and the role of emotions in their ability to interact innovatively with a complex world:
  5. (a) The entities that will emerge from the Singularity will be physical in nature. Humans are the result of a long and continuous evolutionary process that has involved body changes. And in certain historical periods, there was also a tight co-evolution of biological and cultural forces. This process has been well explained recently in the Embodied and Extended Cognition Paradigms (Clark 2008; Wilson and Clark 2009; Wilson and Keil 1999). Bodies shape intentionality. Some entities cannot understand the purpose of their existence, but in all cases, a living existence seeks survival and reproduction, with plenty of possible strategies. Looking into the future, we cannot know the exact form, shape or even the matter from which post-Singularity intelligences will be made or built; they will have bodies of some kind. So we can think of them as them ‘embodied’ intelligences. And, even in the extreme, if these entities are electronic and exist in a “cloud computing” space, they will still be reliant on physical architecture or hardware. So they, too, will have a physical form. I’ll call these post-Singularity intelligent bodies, meta-bodies.
  6. (b) The entities that emerge from the Singularity will also be ‘living systems’. To be ‘alive’ implies that an entity will interact with the environment to try to fulfill its physical necessities, unlike stones. Post-Singularity Entities will interact with their environment, so they can be thought of as ‘living systems’. This includes non-human, intelligent systems that engage in active interaction with the surrounding world. Irrespective of their specific nature—biologically evolved or man-made–these meta-bodies will have physical necessities, surely energetic, and these will drive their initial intentionality. These necessities will label, or mark, some branches of information as more important and valuable to the body than others. Such preferred branches will constitute a proto-emotional system to orient the post-Singularity Entities. Therefore, embodied intelligence will not be ‘informationally’ neutral. The entity will ‘ask’ for specific informational inputs that can be energetically or epistemologically valuable. Intelligence is also a property of living, physical entities and intelligence will most certainly be present in post-Singularity Entities. After all, intelligence is a mechanism to find solutions under situations of uncertainty with low levels of information. Future knowledge will be statistical in nature, so I’ll call the cognitive processes of these new Singularity Entities ‘post-cognitive’.
  7. (c) The meta-bodies of the entities that emerge from the Singularity will have embodiment requirements. These requirements will direct the choice of post-cognitive strategies they will ‘run’ under limited information scenarios (as with any living entity in the universe). Consequently, the entities will need innovative methods to manage all the information. They’ll need emotions, which will function as heuristics. Para-emotions for post-Singularity Entities will be critical shortcut rules to dictate and manage the reactions of the meta-bodies. In order to be truly innovative, these heuristics will need to be able to generate metaheuristics—more generalized, higher-level techniques—that will free the entity from sequentially-bound thinking. Metaheuristics will be based on intensive statistical tools and methods, yet these innovative metacognitive skills will be ruled by the Entities’ para-emotions. The specific syntax and semantics of the para-emotions for each meta-body will be defined by its body structure. On the basis of these hypotheses, it is clear that the Singularity Entities, if they are intelligent, will have bodies and emotions. This conclusion leads to the idea that there will be an emergence of meta-cognition--a level of cognition that goes beyond current human understanding, but is, at the same time, ruled by understandable rules. 11.1.3 Implications of Post-singularity Entities with Advanced, Meta-cognitive Intelligence Ruled by Para-emotions The ethics and morality of these Entities will be completely different from human ethics, because their informational organization is different, although they will be under the same threat of entropy. It is a logical inference to suppose that Singularity Entities, having para-emotions, will understand the virtues of modularity, as well as cooperation and that, too, will be social. Even without knowing the exact details of their bodies, we can start to think about how these Entities might be and start to define some important aspects of the legal, epistemological, conceptual and social universe that we should prepare for if or when a Singularity event occurs. This is the content of this chapter: to define mechanisms for Singularity understanding and to face the possible output scenarios in a rational way. As a consequence of this analysis of the path towards meta-bodies, meta-cognition, and para-emotions, I need to revisit some of the ideas presented in the previous volume (1) of this current book about Singularities. 11.2 Post-cognitive Singularity Entities and their Physical Nature 11.2.1 Being a Singularity Entity In order to be (super) intelligent—in a broad sense and far beyond the intelligence seen in highly specialized, algorithm-based computer programming that’s used in today’s Artificial Intelligence—you need to be an Entity first. At the same time, the notion of ‘being an entity’ referenced here does not follow from a classic reductionist materialistic approach which would include classic bodies in this category exclusively. Software programs are also material—or physical-- because they must run in some physical framework. Not only matter but even the energy-transient states existing in the universe are informational states. But let me analyze independently both kinds of possible singularities: 11.2.1.1 Super-intelligent Entities The nature of super-intelligences (Bostrom 2014) is something out of our control and can only be imagined by our minds. At the same time, it is possible to consider the different approaches that have been followed in Artificial Intelligence (AI) in order to achieve intelligent systems. Expert systems, from a classic Logic Theorist to DENDRAL or MYCIN, are the result of the code implementation of very precise rules of an epistemic community. They are free from misunderstandings, confusions, biases, tiredness and prejudices. They work very efficiently thanks to a great computation power and huge memories. But they are also dumb and blind to new approaches or deep innovations. They are heuristic followers, not meta-heuristic creators. Some supercomputers, like Deep Blue playing chess, or Watson playing jeopardy (both from IBM) have achieved outstanding results defeating humans in fields traditionally considered beyond mechanical procedures and are the result of the uniqueness of (brilliant) human minds. Yet they are complex experts that still depend on humans for their improvement and update. There were other attempts like CYC, which tried to achieve a real artificial mind with consciousness. In this case a machine was taught a large list of possible meanings and data about the world, creating a holistic ontology.¹ There are also impressive AI projects like the very recent Big Mechanisms funded by USA’s DARPA, that are trying to find multi-causal relationships among huge amounts of non-classified data. According to DARPA’s project description, “the overarching goal of the program is to develop technologies for a new kind of science in which research is integrated more or less immediately—automatically or semi-automatically—into causal, explanatory models of unprecedented completeness and consistency”.² That is, they are trying to automate the scientific method working on a scale that is out of the current human mind’s range. Finally, some emulators are approaching artificial and natural cognitive systems. The Blue Brain project, one of the leading and more heavily funded EU projects is an example of this. All these examples are of computer programs running on computational infrastructures, but they could be embedded (completely or remotely) into artificial robotic bodies. In fact, much less intelligent but effective systems are already running on our streets—automated cars. The process to a develop a fully functional, automated vehicle started officially in 2003, when Japanese authorities selected special urban areas in order to test robots: they were called robotto tokku. This was the first step for the regulation of autonomous systems in human environments. This work was followed later by Italian studies (Salvini 2010). Now today, the automated, driverless car from Google is operating legally in several American states.³ Autonomous, sophisticated robots and machines are dealing with real human environments and even taking part of important epistemic and social decisions. The controls we can fix about their existence, limits, legal responsibilities or status depends on what humans decide today, despite the ontological questions we might have about their nature. 11.2.1.2 Transhumans That human nature has been changed by technological means it is something obvious:
  8. (a) Technologically: humans use glasses, clothes, cochlear implants, prosthetics, books, navigation systems, languages, exoskeletons to name a few.   2. (b) Biologically: we practice repairing/modeling surgeries and deploy vaccinations, assisted reproductive technologies, biotechnologies (e.g. stem cells, cloning), synthetic biology and more. The several possibilities around direct, human intervention into the human body has been covered by many classic dystopian works such as Frankenstein, The Boys from Brazil, Matrix, Brave New World, 12 Monkeys, Robocop or Gattacca. These rank among a long list of classic or cyberpunk cultural products. There is even one special text of science fiction that has been very interesting for this author’s purposes: the 21st Voyage of The Star Diaries, written by Stanislav Lem (1957–1971). Therein, Lem defends a hilarious, but terribly plausible future at least at a certain level, in which humans check several body changes and finally need to create a regulatory agency to prevent chaos. Some shining attempts towards new biological interactions between human and machines (or machines and biological devices) have been achieved during recent decades:
  9. i. Cyborgs 1.0 and 2.0: These were the first projects by Kevin Warwick, then at Reading University, to have official chip implants employed for individual control of ambient devices as well as to communicate remotely with the researcher’s wife.   2. ii. Stelarc: Stelarc, an artist, was provisionally implanted with a third arm in the first stage of his research and, later, a third ear. This was his artistic approach to the cyborg future created by these modifications to his own body.   3. iii. EcoBots: EcoBots were the first robots with metabolic systems. EcoBot I, II and III have been three different robots created in 2002, 2004 and 2010 respectively. EcoBot-III was developed in 2010, as part of a European FP-6-funded project. It became the world’s first robot to exhibit true self-sustainability, albeit in a primitive form. It was “a robot with guts”, as its creators describe it.⁴   4. iv. Robot operated by biochip-living rat neurons: This was an EPSRC-sponsored project in which a cultured neural network using biological neurons from a rat was trained to control a mobile robot platform. This research was also led by Prof. Warwick who, for this project, teamed up with a different group of colleagues from his Cyborg projects.   5. v. Neil Harbisson: Harbisson is the world’s first, officially recognized Cyborg. Harbisson, who is a British artist with achromatopsia which renders him completely colourblind is able to see colour through an implant. Harbisson uses a permanent color sensor that transforms colors into sound signals. He founded the Cyborg Foundation and his official passport photo includes the antenna connected to his head. Consequently, we are really at the beginning of a new period in which humans will be able to modify their biological natures as well as connect our biology to machines with cybernetic loops or even with intelligent machines. Very recently, for example, Georgia Tech’s Prof. Gil Weinberg created a robotic drumming prosthesis with two sticks that allows a one-hand amputee human to control it thanks to EMG muscle sensors. But what it is really interesting here is that, while the human drummer controls one stick through this prosthesis, the second one processes the sounds and movements of the first and improvises parallel patterns. Here, a human body is upgraded with a robot that also collaboratively generates new sets of information. This is now ‘only’ a sensorimotor interaction but in the future, brain implants following the same principles could enhance a new level of support or reactive power to classic operations as well as establish a symbiotic, deep interaction between upgraded brains and embedded AI systems. Considering some of the previous examples, it is clear that there are legal, technological, biological and cultural ways to start to define a new kind of human that will be completely different from previous stages of our natural evolution. Cognitive characteristics like plasticity, perception, learning, prediction, innovation, creativity made by temporary-enough yet stable entities is now closer to robots and AI machines (Prokopenko 2014). 11.2.2 Post Singularity Entities as Living Systems? There is a second question related to being a non-human, post-Singularity Entity: Can it be considered life? There have been several debates about the definition of life. We can look at a research institution that looks for evidence of life beyond our planet, NASA.⁵ NASA can help us to go beyond Earth-centric concepts of life or human-centric concepts of life, and spur us to think about living systems in novel ways. NASA researchers have also taken a ‘philosophical’ approach to this question, going beyond the classic question: “Is a virus alive?” Beyond viruses, there are many more ‘borderline’ cases to take into account, like self-replicating proteins, or even non-traditional objects that have some information content, things that reproduce, consume, and die (like computer programs, forest fires, etc.).⁶ When we are thinking about one of the potential post-Singularity Entities, who are of human origin as trans-humans would be, it is clear to us that irrespective of any of the possible sets of changes applied to the base human nature, transhumans will still be classified as ‘living’. But with AI superintelligences, we’re faced with the question of whether they might also be granted status as ‘living’ entities. Margaret Boden considered metabolism is the criterion for life in her paper: “Is Metabolism Necessary?” (Boden 1999). She argued that the presence of a metabolism is a central criterion of life, rejecting automatically “living” status for any entity without one. This segregation pursued a second goal as well: to justify, again, the uniqueness of human beings, who are considered the smartest among all living entities in our planet. Clearly, the recent development of EcoBots has proven that robots can have metabolisms and that effectively connects machines and life. And now… what do we can say about conscious superintelligences? Superintelligences can be run on several platforms, synthetic or biological. In the end, consciousness is a combination of electrical and chemical informational coding processes—a superstructure that has emerged for evolutionary purposes as a valuable trait useful for individual and social survival. Perhaps the classic symbol-grounded hypothesis of how the mind works is not really the way that natural entities perform their brain activities, but there are new ways or mechanisms to process artificial information that could justify the idea of an artificial decision-making procedure—a procedure that might be viewed as ‘consciousness’. Life has gone from the evolution of bodies to minds; then, from minds, spring intense modifications to those original bodies. Herein is the difference between past living entities and those of the present and future. We are viewing the end of physical, rigid sequentialism. Information keeping, modification, mixture, and transmission have changed under a new operational paradigm. 11.3 Para-emotional Systems “The mind is its own place and in itself, can make a Heaven of Hell, a Hell of Heaven,” wrote John Milton (1667, Paradise Lost, Book I, 254–255). The meaning of the world is not provided by Nature itself, but it is an interpretation of (social) entities. So any informational device that has emerged from evolutionary forces has designed methods of information selection and is marked by following a multilayered range of meanings: from emotional (tasteful, funny, terrifying, etc.) to cognitive (plausible, useful, complex, etc.). The academic literature of last decades has demonstrated the deep interrelations between cognition and emotions (Thagard 2006), even suggesting that emotional states allowed the emergence of human consciousness (Damasio 1999; Llinás 2001). As an obvious consequence of this process, the deep connections between body structure and emotion-cognitive informational processing must be inferred. And this goes one step beyond the embodied cognition paradigm, because it points attention toward the bodily structure and not toward the cognitive function that operated through the body (as has been done until now). Different bodies perform different cognitions. This is true if we consider the role of specific emotions in higher cognitive processes, including the presence or absence of pain signals which completely determine human actions (as in patients with Riley-Day syndrome, see Vallverdú 2013b). We can imagine a totally different post-Singularity/post-cognitive scenario in which transhumans process different emotional signals, have increased/decreased emotional paths or have entirely new emotional patterns. Pain is, for example, directly related to the thickness of the myelin layer covering nerves, yet the experience of pain might be very different in transhumans who may have different augmentations that could alter their sensations. Haptic devices that provide a sensory interface between a user and a computer, upgraded signal capture and processing or the creation of stable or new synesthetic models are only some of the possible options among a long list by which transhumans could feel totally different the information they’re receiving about the world. Transhumans will also think differently about the world. Both are complimentary sides of the same coin, emotions and thinking are the heads and tails of any complex informational entity (Vallverdú 2013a). Emotion, as a necessary cognitive variable, is something necessary for AI, despite all the possible debates about qualia (Megill 2014). Analyses of the most recent approaches to machines and emotions have been undertaken and looked at vast amounts of existing literature on emotions and machines (Vallverdú and Casacuberta 2009; Vallverdú 2012, 2014). Researchers can easily distinguish between three main areas of research: (a) affective computing (Picard 1997), (b) social robotics (Breazeal 2002) and (c) emotional modelling (Sloman (1982, 1997, 2002);⁷ Hudlicka 2011). It can be roundly affirmed that all these approaches have implemented ‘emotions’ as complementary elements (Delancey 2001), but never as nuclear aspects of their systems (Vallverdú et al. 2010). However, very recently, AI experts have agreed on the necessity of designing robots and AI entities with emotional architectures (Arbib and Fellous 2004; Minsky 2007). At a certain level, it can be affirmed that the emotional revolution in cognitive sciences and neurology headed by Damasio (1994) has not still reached AI communities, at least from a real integration of these theses (Ziemke and Lowe 2009). There are few exceptions, and K. Jaffe et al. (2014) is one of these, following the extremely powerful research approaches of Axelrod (1986). But even in the case of Jaffe, he made computer simulations on the role of shame for social cohesion, but did not try to integrate shame into artificial devices. He claims that the introduction of shame in virtual environments helps to create pro-social behavior, as well as to provide a stabilizing force for such societies. His approach is closer to a computational approach to sociology than it is to the defense and implementation of emotional rules in artificial cognitive devices. Table 11.1 recaps the basic ideas presented: Table 11.1 The main comparison between actual and future intelligent entities’ skills Transhuman AI/Robotics Superintelligence Para-emotion Cognition Post-cognition Meta-bodies Heuristics (sequential) Meta-heuristics (statistical) As far as this author can see, integrating emotional characteristics into artificial cognitive devices it is an absolutely necessary next step for several reasons:
  10. i. Emotions are necessary to regulate correct imitative processes,   2. ii. Emotions help to generate a social environment,   3. iii. Emotions act as a social stabilizer (with moral thresholds),   4. iv. Cognitive processes run into complex and uncertain contexts (as human beings do) and need emotional drives to be able to perform any kind of successful activity. For all these reasons, AI architectures should implement emotional elements into their processing cognitive cores. Despite the great expert systems successes, top-down approaches have failed when they try to capture the delicacies and details of daily life. Even the replications of basic morphological or cognitive aspects (e.g. vision, unstable locomotion, grip and manipulation actions) are still far from the skills of contemporary AI and robotics experts. For example, the last DARPA robotics challenges⁸ showed the achievement of great results from the Japanese Shaft robot in 2014 or Korea’s KAIST’s DRC-HUBO, winner in 2015, but the long distance to reach humans is still visible. If morphological computation has been a recent and still unexplored research field (Casacuberta et al. 2010), there is an even newer approach that could be labelled ‘morphological emotion’; although, morphological emotion remains completely out of research agendas. Morphological emotion is the integration of emotional architectures into the hardwiring or core section of any artificial entity or device. In this approach, emotions would not be added, external or complimentary aspects of these systems. Rather, they but would be built in as intrinsic mechanisms of these entities or devices. Without the range of a rich pool of emotional states, cognitive systems would be highly predictable and less adaptive to variations in the environment. Emotions, moods and feelings create operational differences among entities that share a very similar bodily structure, allowing new patterns of behavior and strategies to emerge. And all this happens inside bodies or systems that have specific needs, which are basically feeding and survival (and sometimes replication). Then, as happens with genetic algorithms crossing possibilities, they act as whole, complex systems guided by emotional feedback that contributes to a sense from the world—a specific sense useful only for that entity. This means that, by including emotional procedures, there is the possibility of creating better AI under a framework of big mechanisms but at low statistical cost. Of course, data do not create sense. The structure of the informational entity, whether natural or artificial, is the key to understanding how models emerge and why entities create patterns of activity and select the data they do. Emotions, natural as well as artificial, are the light by which any entity captures the world in relation to itself. Singularity meta-bodies will run according to new sets of para-emotions—new kinds of emotional states—that will regulate these devices’ actions. What will the content of these emotions be? This is not clear. First of all, this is because the possibility of rewiring and reshaping the physical content of the informational Singularity minds will add a completely open and dynamic scenario. Until now, humanity has been exclusively the result of natural evolution and our capacity to control body and mind functioning has been very basic. Secondly, due to the possibility of new bodies, the emotional semantics will be different as well as their possible syntaxes. Not only will the post-Singularity meta-bodies collect more information, but the ways these bodies will process and feel that information will change radically. We feel hungry because we need to introduce specific molecules into our bodies to help construct and maintain our body, and at the same time we take enjoyment (or not) from the associated tastes and smells. From different bodies there will emerge different necessities and different haptic experiences will be built. As a result of all this, there will emerge a different interaction with the world. Which will be the specific para-emotions emerging in post-Singularity entities is also not clear. However, the para-emotions will follow the interactive necessities of their own organization. The emergence of complex social emotions, like shame, happened in a similar way. Shame emerged through complex social interactions. It is far from the basic and original, proto-emotional states. Shame emerged in order to enable more subtle and complex social interactions among human beings. Shame or guilt could not be predicted five million of years ago as necessary emotional states for human societies. In the same way, it is obvious that some para-emotions with their own rules will emerge within Singularity communities, although their exact meaning is beyond of our current reality. Researchers are still working on how to create bonds between humans and medical machines (Vallverdú and Casacuberta 2014), for example. Computer simulations support this research allowing dozens of complex variables to be introduced and the possible outcomes analyzed. This allows for the prediction of new sets of para-emotional rules and experiences. 11.4 Conclusions Most of the time, the potential for outstanding cognitive performance among transhumans and super intelligences is considered but the emotional aspects of these entities is neglected. AGI (Artificial General Intelligence) challenges are an important matter to consider seriously and with academic tools (Sotala and Yampolskiy 2015). This chapter has shown how cognition and emotions are related, and also how these new informational entities will lead us to a post-cognitive era and new levels of para-emotions. Under the assumption of totally new physicalities, we must also infer the emergence of new information processing models as well as new driving intentionalities. My point is that any new physical structure asks for new goals and at the same time will contain emotional demands that also change the global rules of driving life. As a consequence, post-singularity entities will not only understand the world differently but they also feel it in a, or several, new ways. The fear towards the incommensurable gap and distance between us (humanity as it exists in the early part of the 21st Century) and them (post-Singularity Entities) is not only cognitive but also emotional. Their actions will be out of any true understanding by humans. However, this gap is not a serious problem for any consequent non-fundamentalist ethicist (Vallverdú 2013d). The real problem will be the existential disclosure between them and us, which will follow totally different goals and will be mediated by different—or even opposite—emotional syntaxes. At this point, interaction with and between these new Singularity Entities will follow new social patterns, still to be defined. But this process will surely not be done by us. There will need to be a reformulation of ethical goals and rules for these post-Singularity entities, as was discussed in a previous volume (Vallverdú 2013c). Emphasis has been always put on ontological (what ‘is to be X’) or epistemological (‘how X will know about Y’) aspects of post-singularity devices, while ethical ones (‘what I can do’) have been not so intensively discussed. The only thing we can be sure of about future post-singularity entities is that they will not be like us. Acknowledgements This work was supported by the TECNOCOG research group (at UAB) under the project “Innovación en la práctica científica: enfoques cognitivos y sus consecuencias filosóficas” (FF2011-23238). I especially thank Ashley Whitaker & Toni E. Ritchie for their interest in my teaching and research as well as for their linguistic revisions. Toni offered disinterestedly his time and supervised the last drafts with a remarkable and contagious enthusiasm… I am in debt to you. References Arbib, M., & Fellous, J.-M. (2004) “Emotions: from brain to robot”, Trends in Cognitive Sciences, 8(12), 554–559. Amnon H. Eden, James H. Moor, Johnny H. Søraker & Eric Steinhart (EDS.), (2013), Singularity Hypotheses. A Scientific and Philosophical Assessment, Berlin: Springer. Aaron Sloman. (1997). Synthetic minds. In Proceedings of the first international conference on Autonomous agents (AGENTS ‘97). ACM, New York, NY, USA, 534–535. Axelrod, R. (1986) “An Evolutionary Approach to Norms”, The American Political Science Review, 80(4), pp. 1095–1111. Boden, M. (1999) “Is Metabolism Necessary?”, British Journal of Philosophy of Science, 50: 231–248. Bostrom, N. (2014) Superintelligence: Paths, Dangers, Strategies. Oxford: OUP. Breazeal, C. (2002) Designing sociable robots, Cambridge (MA): MIT Press. Casacuberta, D., Ayala, S. & Vallverdú, J. (2010) “Embodying Cognition: A Morphological Perspective”, in Jordi Vallverdú (Ed.) Thinking Machines and The Philosophy of computer Science: Concepts and Principles, Hershey: IGI Global. Casacuberta, D. & Vallverdú, J. (2014) “E-Science and the data deluge”, Philosophical Psychology, 27(1): 126–140. Clark, A. (2008) Supersizing the Mind: Embodiment, Action, and Cognitive Extension, New York: OUP. Damasio, A. (1994) Descartes error, USA: Putnam Publishing. Damasio, A. (1999), The Feeling of What Happens. London: Heinemann. Damasio, A. (2013) “Preparing for SuperHumans”, talk at XXI Future Trends Forum, Fundación Innovación Bankinter, Spain, 03–05/12/2013. DasGupta, S., Howcroft, C., & Miesenböck, G. (2014) “FoxP influences the speed and accuracy of a perceptual decision in Drosophila”, Science, 344: 901–904. DeLancey, C. (2001) Passionate Engines: What Emotions Reveal about Mind and Artificial Intelligence, Oxford: OUP. Gendler, Tamar (2000) Thought Experiment: On the Powers and Limits of Imaginary Cases. New York and London: Garland. Horowitz, Tamara and Gerald Massey (eds.) (1991), Thought Experiments in Science and Philosophy. Lanham: Rowman and Littlefield. Hudlicka, E. (2011) “Guidelines for Designing Computational Models of Emotions”, International Journal of Synthetic Emotions, 2(1), 26–78. Jaffe, K. Et al (2014) “On the biological and cultural evolution of shame: Using internet search tools to weight values in many cultures”, Computers and Society, arXiv:​1401.​1100 [cs.CY]. Laymon, Ronald (1985), “Idealizations and the Testing of Theories by Experimentation”, in Peter Achinstein and Owen Hannaway (eds.), Observation Experiment and Hypothesis in Modern Physical Science. Cambridge, Mass.: M.I.T. Press, 147–173. Llinás, R.R. (2001). I of the Vortex. From neurons to Self. Cambridge, MA: MIT Press. Megill, J. (2014) “Emotion, Cognition and AI”, Minds and Machines, 24: 189–199. Minsky, M. (2007) The emotion machine: Commonsense thinking, artificial intelligence, and the future of the human mind. USA: Simon & Schuster. Picard, R. (1997) Affective Computing, Cambridge (MA): MIT Press. Prokopenko, M. (2014) “Grand challenges for computational intelligence”, Frontiers o Robotics and AI, 1(2):1–3. Rosalind W. Picard. (1997). Affective Computing. MIT Press, Cambridge, MA, USA. Reiss, Julian (2003), “Causal Inference in the Abstract or Seven Myths about Thought Experiments”, in Causality: Metaphysics and Methods Research roject. Technical Report 03/02.LSE. Salvini, P. (2010) “An Investigation on Legal Regulations for Robot Deployment in Urban Areas: A Focus on Italian Law”, Advanced Robotics, 24: 1901–1917. Shanks, Niall (ed.). (1998), Idealization in Contemporary Physics. Amsterdam: Rodopi. Sloman, A. (2002). How many separately evolved emotional beasties live within us? In Trappl, R.; Petta, P.; and Payr, S., eds., Emotions in Humans and Artifacts. Cambridge, MA: MIT Press. 35–114. Sloman, A. (1982). Towards a grammar of emotions. New Universities Quarterly 36(3):230–238 Sorensen, Roy (1992), Thought Experiments. New York: Oxford University Press. Sotala, K & Yampolskiy, R.v. (2015) “Corrigendum: Responses to catastrophic AGI risk: a survey”, Phys. Scr,. 90 018001. Thagard, P. (2006) Hot Thought: Mechanisms and Applications of Emotional Cognition, Cambridge, MA: MIT Press. Vallverdú, Jordi (2009) “Computational Epistemology and e-Science. A New Way of Thinking”, Minds and Machines, 19(4): 557–567. Vallverdú, J. & Casacuberta, D. (2009) Handbook of Research on Synthetic Emotions and Sociable Robotics: New Applications in Affective Computing and Artificial Intelligence, Hershey: IGI Global Group. Vallverdú, Jordi, Shah, Huma & Casacuberta, David (2010) “Chatterbox Challenge as a Testbed for Synthetic Emotions”, International Journal of Synthetic Emotions, 1(2): 57–86. ISN: 1947-9093. Vallverdú, Jordi & Casacuberta, David (2011) “The Game of Emotions (GOE): An Evolutionary Approach to AI Decisions”, pp. 158–162, in Charles Ess & Ruth Hagengruber (Eds) The Computational Turn: Past, Presents, Futures? Proceedings IACAP2011, Münster: MV-Verlag. Vallverdú, J. (ed.) (2012) Creating Synthetic Emotions through Technological and Robotic Advancements, Hershey: IGI Global Group. Vallverdú, Jordi (2013a) “The Meaning of Meaning: New Approaches to Emotions and Machines”, Aditi Journal of Computer Science, 1(1): 25–38. Vallverdú, Jordi (2013b) “Ekman’s Paradox and a Naturalistic Strategy to Escape from it”, IJSE, 4(2): 7–13. Vallverdú, Jordi (2013c)“6A. Jordi Vallverdú on Muehlhauser and Helm’s ‘‘the Singularity and Machine Ethics’’” in Amnon H. Eden, James H. Moor, Johnny H. Søraker & Eric Steinhart (EDS.), (2013), Singularity Hypotheses. A Scientific and Philosophical Assessment, Germany Springer, pp. 127–128. Vallverdú, Jordi (2013d)“ An Ethic of Emotions, ASIN: B00EFM7KMU, Kindle Store: ʬ 96 pages. Vallverdú, J., Casacuberta, D., Nishida, T., Ohmoto, O., Moran, S. & Lázare, S. (2013) “From Computational Emotional Models to HRI”, International Journal of Robotics Applications and Technologies 1(2): 11–25. Vallverdú, J. (2014) “Artificial Shame Models for Machines?”, in Kevin G. Lockhart (Ed.) Psychology of Shame: New Research, NY: Nova Publishers, pages, 1–14. Vallverdú, J. & Casacuberta, D. (2014) “Ethical and technical aspects of the use of artificial emotions to create empathy in medical machines”, in Simon Peter van Rysewyck & Matthijs Pontier (Eds.) Machine Medical Ethics, USA: Springer. Series: Intelligent Systems, Control and Automation: Science and Engineering, Vol. 74. ISBN 978-3-319-08107-6. pp: 341–362. Wilson, R.A., & A. Clark, (2009) “How to Situate Cognition: Letting Nature Take its Course,” in The Cambridge Handbook of Situated Cognition, M. Aydede and P. Robbins (eds.), Cambridge University Press, pp. 55–77. Wilson, R.A., & F.C. Keil (eds.), (1999) The MIT Encyclopedia of the Cognitive Sciences, Cambridge, MA: MIT Press. Ziemke, T. & Lowe, R. (2009) “On the role of emotion on embodied cognitive architectures: from organisms to robots”, Cognitive Computation, 1: 104–117. Footnotes See: ʬ accessed on May 25th, 2014. From: ʬ accessed on May 25th 2014. Se Nevada Laws ʬ accessed on May 25th 2014. See: ʬ accessed on May 25th 2014. See: ʬ accessed on May 25th 2014. See: ʬ accessed on May 25th 2014. See at ʬ accessed May 24th 2014. See ʬ accessed on February 26th 2014. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_12 12. A Psychoanalytic Approach to the Singularity: Why We Cannot Do Without Auxiliary Constructions Graham Clarke¹   47 Lord Holland Road Colchester, Essex, CO2 7PS, UK Graham Clarke Email: graham@essex.ac.uk All of us should ask ourselves what we can do now to improve the chances of reaping the benefits and avoiding the risks. Stephen Hawking is Terrified of Artificial Intelligence, Huffington Post 5/5/2014. 12.1 Introduction I am going to be looking at the technological singularity from a psychoanalytic perspective based in the object relations theory of Ronald Fairbairn (1952). This is because the unconscious phantasies associated with technological triumphalism and the celebration of reason over all human emotion even unto our own extinction betrays a schizoid form of thinking. We are all subject to accident, illness, loss and death and coming to terms with these is immensely difficult. In order to compensate for any or all of these potentially traumatic losses we develop hopes, dreams, wishes and phantasies. Furthermore, we elevate some of these defensive and illusory hopes to the realm of certainty, even to the point of giving up our lives for them. Of course there may be a covering narrative that this sacrifice is worth it because we will reap our reward in heaven, the after life, reincarnation, or we will join some god or gods on another plane of existence entirely and live in some form of paradise, limbo or hell. It was the degree to which the technological singularity approximated to this sort of thinking that concerned me most originally. The claims that predictions of the technological singularity are well founded scientifically are dubious and there are many critics of this approach because it makes claims of scientific respectability based upon little evidence or involves a mish mash of scientific claims and imaginative fiction (Modis 2006; Hofstadter 2007, 2008, 2009; Chomsky 2014). It was in order to ask questions about the rationality of this approach that led to my looking at Ray Kurtzweil’s own view of the singularity from a psychoanalytic viewpoint and to see in it a desperate wish to be reunited with his dead father. The fact that as originally posed, the singularity was a determinate break in the history of the world, an end to man as the agent of history and the ushering of a new era of immortal man-machine hybrids, or similar, also gave rise to a consideration of other such millenarian systems of thought that are familiar throughout history. That millions have died in pursuit of their own devotion to some imagined paradise that will be ushered in through the triumph of some specific form of thought whether the Rapture, Ragnarok, Nirvana, the Communist Utopia or Fascist Thousand Year Reich is enough to concern anyone. The fact that it explicitly became associated with the abnegation of the body, its dismissal in favour of some ‘pure reason’ that could be ported to a machine without any disastrous consequences for thought or the notion of a person reminded me of the dangers of schizoid thinking that form a strong part of Ronald Fairbairn’s object relations theory. It seems to me that there are a number of important topics, which need to be addressed openly if we are to successfully manage the development and use of superintelligent machines. These relate to significant questions that theory builders in this area of AI need to ask themselves before proceeding to attempt to implement systems that may well have the potential to destroy the world or the world as a fit place for human beings to inhabit. The parallels with the development of nuclear and biological arms are a case in point. The current rapid move towards autonomous machines that are programmed to kill on their own initiative is already so far advanced as to require international and national agreement on this sort of research and deployment as a matter of urgency in my opinion. The research and development of machines and organisms that can impact upon the biosphere and compete with the natural environment should also be subject to international scrutiny and agreement. The idea that if we can do it we should do it is not one that should in any way hamper our taking a very serious look at any use of machines, superintelligent or otherwise, mechanical or biological, for the effects that they might have on the biosphere as a working system e.g. Gaia, or on the ability of national and international human societies to develop conditions for the flourishing of all of their citizens. That this may entail a retrospective look at systems we have already put in place and their ordered curtailment or abolition should not prevent our pursuing this as an immediate and important goal. The topics I am going to comment on are as follows: AI and intelligence, consciousness, reason and emotion and psychoanalysis. The bulk of my comments will be to suggest that (a) we need to keep an open mind about the level of understanding we have of all of these issues and thus entertain the possibility that any or all of our assumptions might be wrong, and (b) we ought to be explicitly monitoring and requiring research into all of these particular areas to be licensed because of the implicit danger there is to the whole of the planet and our place in it from the unregulated development of this sort of research. 12.2 AI and Intelligence My computer is many times more intelligent than I am from the perspective of being able to carry out tasks that would require human thought had they not been automated. Indeed the computer might be regarded as a machine that automates aspects of human thought via specific software and hardware and thus potentially increases productivity in specific domains. Since these machines are already many magnitudes faster than we are at carrying out such calculations what is the extra that superintelligence signifies? Is it some dialectical point at which an increase in quantity becomes a change of quality? The computer can process so much, so fast, that it seems to be able to play chess better than the best chess players we have produced, but is it playing chess? Another equally important question to ask is what is so-called superintelligence for? If it means there are machines that can outperform people at complex specific tasks then this wouldn’t be particularly remarkable since we already share the world with such machines. The important point being that currently these machines are under our control. Superintelligence seems to be about machines that can do a lot of things that we think are important better than we can do them, but without our management or control. This is obviously open to interpretation and argument and will be limited by the ways these superintelligences are embodied, thus the concomitant interest in robotics since that promises to provide the superintelligence with a body to inhabit and through which to move around and interact with the world. If that body was also self-reproducing and self-repairing and capable of continuous upgrading throughout its life this might be regarded as sufficient to threaten to replace human beings under certain circumstances. Such a robot on Mars for instance may be tailored to be far more robust and suitable to that environment while at the same time capable of performing more complex tasks considerably more quickly that its human counterpart but would you model such a robot on a human being or endow it with any specifically human characteristics? And, what would be the point of making such a robot for everyday use? Yes, we want to free everyone from unnecessary back-breaking and mind-numbing toil without autonomy if at all possible but a society that cannot yet afford to have all of its citizens educated, housed or medically cared for adequately and involved productively in its own self-reproduction and development may merely make several of its least well off citizens even worse off by deploying robots of any kind as could be argued is the case already. Indeed it prompts the question “what is a life for?” If we can address that question before rushing willy-nilly to replace ordinary human beings by machines we might then be able to develop a society and a culture in which man and machine might complement each other. 12.3 Consciousness According to some neuroscientists consciousness is a brain stem function and doesn’t need the cortex at all. Of those who think the cortex may be the seat of higher level functioning, like the personality, there is considerable difference in the estimates of the complexity of the cortex. If, like Penrose and Hammeroff (2015) you take into account the quantum level processing of the micro-tubules in the brain this then leads to a much extended time scale for the development of the technology to be able to match the connectivity and complexity of the cortex. There are also developments in psychoanalytically informed neuroscience that suggest that the infrastructure of both emotion and consciousness is in the body as a whole (embodiment) and that the brain stem as opposed to the neo-cortex is the locus of consciousness as has been suggested recently by neuropsychoanalyst Mark Solms and affective neuroscientist Jaak Panskepp. It is commonly believed that consciousness is a higher brain function. Here we consider the likelihood, based on abundant neuroevolutionary data that lower brain affective phenomenal experiences provide the “energy” for the developmental construction of higher forms of cognitive consciousness. This view is concordant with many of the theoretical formulations of Sigmund Freud. … From this perspective, perceptual experiences were initially affective at the primary-process brainstem level, but capable of being elaborated by secondary learning and memory processes into tertiary-cognitive forms of consciousness. … The data supporting this neuro-psycho-evolutionary vision of the emergence of mind is discussed in relation to classical psychoanalytical models. (Solms and Panksepp 2012, Abstract.) The management of superintelligent machines will have to address the question of consciousness in its entirety. If it is a function of all living matter, then there is a large question mark over the possibility of any machine ever achieving anything other than a virtual consciousness. 12.4 Reason and Emotion One of the most striking aspects of the proponents of the technological singularity is the concentration on reason and the mechanisation of reason which they argue will give us magnitudes greater power to reason, but to what end? I am not arguing for irrationality although I do believe that in terms of the arts and culture, and the development of science, the creative element is more often than not based upon an emotional, accidental or non-rational aspect of the processes involved and we need to be absolutely clear about the relations between reason and the emotions if we are to avoid the excesses of either or both. Like the neuroaffective scientist Jaak Panskepp, Spinoza thought that emotions were the grounding of reason. Emotions can be rational or irrational. It is undoubtedly a triumph to achieve an understanding of the sort we have of the world about us and our place within it and that is due in part to the development of the scientific method one cornerstone of which is to try to achieve objectivity by reducing the levels of emotional attachment you have to anything within the domain you are trying to understand or the prior theories you are seeking to disprove and improve upon. But this is a reduction of the real object and as such a falsification of the wider reality—an abstraction. It is here that I think that the abstraction of the personal self to some entity that might be ported to a machine is likely to find its greatest challenge. One doesn’t need to be reminded of the ways in which a minor illness, or appetitive lapse, or the invasion by a virus, or an imbalance in a gland can invade and distort the mind’s ability to see the world clearly even when perfectly healthily embodied in a normal human frame. The development of superintelligent machines should be able to give an account of what superintelligence is and why it is useful, how it differs from general human intelligence or not and what the relationship between reason and emotion is in the natural world and whether that relationship must, or can, or doesn’t need to apply in the world of intelligent machines. As Pascal wrote “The heart has its reasons of which reason knows nothing.” There are already well-established areas of research in computer science directed to the area of affective computing. There is also considerable research effort going into neuroaffective and neuropsychoanalytic areas. Psychoanalysis is a bio-psycho-socio science. The importance of the emotions and of your relations to others as a part of the process of coming to be who you are, and your development as an intrinsically social relational being, none of which ever get considered if it is only the narrow cognitive and rational aspects that are being considered. The whole question of the emotions needs to be addressed and understood as part of the wider project of understanding what exactly a superintelligent machine is and how it might function. Kurzweil talks about emotional intelligence, rationality, knowledge and intelligence but rarely if ever about emotion as feelings—rational and irrational. His concerns all lie at the cognitive end of the spectrum of powers that we possess. No little surprise then that he discusses the neo-cortex in particular and its pattern matching capabilities, an area in which his expertise has led to a number of powerful and useful tools. But what if it was emotions and emotional relationships between people that were by far the most important aspect of our apprehension of and activity in the world as Rosalind Picard (2014) in an interview about affective computing suggests when she makes a number of interesting and important points regarding emotion and computing. She comments “Emotion … plays a constant role in our experience … it’s always there.” And concerning the problem of machines having feelings she argues that “As far as qualia are concerned no one knows how to build that in a machine now … no one can even see how it is possible” (qualia being the subjective or qualitative properties of experience). She goes on to pose significant questions about the future of robotics. I don’t know if robots will ever have feelings the way that we do … it will be some time before the robots on their own accord go out and seek their rights as robots because they feel unjustly treated and if they do that … it would be because we basically built them for that purpose … we have the choice as the creators of these machines to design them in such a way. Regarding AI in general and her own developing understanding of the importance of emotion to AI she says, “AI in its first fifty years didn’t think emotion was important…” And commenting upon her role in the development of affective computing she says: As I learned more about how our brains work … beneath our highly evolved cortex there are sub-cortical structures that are deeply involved with emotion, attention and memory… Emotion is actually key… If we want to build an AI that works in the real world that handles complex unpredictable information with flexibility and intelligence then this is a core part of building an intelligent system. In a similar vein, Sherry Turkle (2004), who has argued for many years from within the domain of artificial intelligence for a much closer relationship between psychoanalysis and computer science, issues this warning: People may be comforted by the notion that we are moving from a psychoanalytic to a computer culture, but what the times demand is a passionate quest for joint citizenship if we are to fully comprehend the human meanings of the new and future objects of our lives. (Turkle 2004, p. 30) Having previously spelt out in some detail what is going to be required for any such close cooperation between the two disciplines she suggests that We must cultivate the richest possible language and methodologies for talking about our increasingly emotional relationships with artifacts. We need far closer examination of how artifacts enter the development of the self and mediate between self and other. Psychoanalysis provides a rich language for distinguishing between need (something that artifacts may have) and desire (which resides in the conjunction of language and flesh). It provides a rich language for exploring the specificity of human meanings in their connections to the body. (ibid. p. 29) In a number of books and articles Turkle (1978, 1984, 1995, 2011) has drawn attention to both the relationship between psychoanalysis and computing and the increasing threat to human social relations by the ubiquity of human machine relations. Turkle gave a TED talk on the subject of the book in February 2012, under the title “Connected, but alone?” Points from her talk echo those in the book:
  11. The communication technologies not only change what people do, but also changes who they are. 2. People are developing problems in relating to each other, relating to themselves, and their capacity for self-reflection. 3. People using these devices excessively expect more from technology and less from each other. Technologies are being designed that will give people the illusion of companionship without the demands of friendship. 4. The capacity for being alone is not being cultivated. Being alone seems to be interpreted as an illness that needs to be cured rather than a comfortable state of solitude with many uses. 5. Traditional conversation has given way to mediated connection, leading to the loss of valuable interpersonal skills. (ʬ There is a lot of time and money spent on developing so-called caring robots for the treatment of the elderly and the care of children (Independent 2015) when the sensible and obvious solution to the problem of social care is other people, properly trained and properly remunerated, other people. We are by nature social creatures but the social valorization of care, support, attachment, nurture, education etc. has been seriously eroded by a culture that measures things only by their monetary value. For which reason we need to re-frame the whole of the caring process from cradle to grave so that parents, nurses, teachers, social workers, carers etc. are highly valued and highly paid for the work they do in helping to produce a new generation of flourishing adults, for caring for the sick and injured, and for helping to make the last days of sick and ailing adults and children a positive and dignified experience instead of trying to replace a humane and caring environment by cheap and tacky robots. If the development of intelligent machines is going to reduce the necessary workforce for the material reproduction of society then we need to redirect our resources towards the reproduction of our invaluable human resources. 12.5 Psychoanalysis In The Singularity is Near Kurzweil (2009) includes Freud among his imaginary interlocutors but gives him nothing of any interest or import to say. I would like to redress this imbalance by quoting from Freud’s Civilization and its Discontents. Life, as we find it, is too hard for us; it brings us too many pains, disappointments and impossible tasks. In order to bear it we cannot dispense with palliative measures. ‘We cannot do without auxiliary constructions’, as Theodore Fontane tells us. There are perhaps three such measures: powerful deflections, which cause us to make light of our misery: substitutive satisfactions, which diminish it; and intoxicating substances, which make us insensitive to it. Something of this kind is indispensable. (Freud S.E. 21) I think that the technological singularity plays such a role for Kurzweil and for many of the other people who believe in it and transhumanism too, and that it deflects our awareness, intoxicates our imaginations and provides substitute satisfactions in place of a real assay of our prospects. Psychoanalysis is a bio-psycho-social science in my view and it only makes any sense when these three different factors are included. As an object relations thinker I am most interested in the social and psychological aspects of this science but I am also convinced about the necessity of the body for the realisation of these relationships. It isn’t just that the body is the obvious site of emotions and the vector of emotional relationships it is also the case that what is unique and interesting about human beings is their specific embodiment within a body, a family, a social milieu, a geographical region, a societal form all with their own histories and dynamics. The downside of this from the point of view of the person living this reality is that people they know and love, to whom they are deeply attached will get sick, have accidents, pick fights, misunderstand them and so on, and die leaving them bereft. But, being embodied too it follows that they will also be riven by emotional concerns and upsets which might well include their own suffering from various illnesses, bad luck, difficulties and so on, leading eventually to death. Psychoanalysis, like religion, is about helping people come to terms with the realities of our lives and relationships and the inevitability of suffering and being able to carry on productively and flourish. I am going to briefly look at the work of Ronald Fairbairn (1952) who, while strongly influenced by Freud nevertheless criticised the underlying assumptions of Freud’s structural model and developed a thoroughgoing object relations theory based upon twentieth century physics where energy and structure are interconnected. His psychology of dynamic structure is a multi agent open system model of the mind that is widely influential and is a direct precursor of attachment theory and relational approaches to psychoanalysis subsequently. The overriding imperative of the neonate in this model is to make and sustain a relationship with the mother or surrogate to ensure its survival, not pleasure seeking as in Freud’s thinking (Clarke 2006, 2014). In Fairbairn’s model the original defences are to first internalise a significant other, usually mother, followed by, over time, sorting the relationships with mother, by dissociating those object relationships that are unacceptable because over-exciting or over-rejecting. This is followed by the repression of those sets of object relations possibilities that are either over-exciting or over-rejecting, producing a tripartite division of the mind each partition of which is represented by a subject-object dyad. The acceptable object relations becoming the basis of the conscious central self whilst the other selves—libidinal and antilibidinal—form subsidiary selves comprising the unconscious aspects of inner reality. One of Fairbairn’s first papers in the development of this new approach addressed the idea that we are all fundamentally split to some degree in consequence of our original experience with our first objects. Fairbairn’s clinical description is of the schizoid character unable to cope with the emotional world of relationships, who retreats to a ‘higher plain’ where the intellect alone reigns. This seems to be a common characteristic of many of the people engaged in promoting the technological singularity. Another important manifestation of preoccupation with the inner world is the tendency to intellectualization; and this is a very characteristic schizoid feature. It constitutes and extremely powerful defensive technique; and it operates as a very formidable resistance in psychoanalytical therapy. Intellectualisation implies an over-valuation of the thought processes; and this over-valuation of thought is related to the difficulty which the individual with a schizoid tendency experiences in making emotional contacts with other people. Owing to preoccupation with the inner world and the repression of affect which follows in its train, he has difficulty in expressing his feelings naturally towards other, and in acting naturally and spontaneously in his relations with them. This leads him to make an effort to work out his emotional problems intellectually in the inner world. (Fairbairn 1952, p. 20) This is to distance yourself from a direct and emotional engagement with the reality around you for a cleaner and purer place where the world of knowledge exists unencumbered by the messy reality from which it has been extracted. The search for intellectual solutions to what are properly emotional problems thus gives rise to two important developments: (1) The thought process becomes highly libidinized; and the world of thought tends to become the predominant sphere of creativity and self-expression; and (2) ideas tend to become substituted for feelings, and intellectual values for emotional values. (Fairbairn 1952, p. 20) As a technique, this stepping back to gain a perspective and then applying the extracted model is an acceptable scientific approach as long as you don’t forget that the map is not the territory. There is however an aspect of science that is, for me, brought out best by the Critical Realism of Bhaskar (1989) in which the very possibility of doing science is predicated upon reality existing independently of our hopes and wishes for it and being able to return results that we didn’t expect, which can contradict our hypotheses and our certainties, which is what makes the experimental method so important. If, as Kurzweil argues, there is no distinction between human and machine or between physical and virtual reality then the possibility for being human will have been lost and the possibility for science equally so. For the virtual to be indistinguishable from the real the level of interrogation that we might subject the virtual to would have to be as multi-layered and eventually unpredictable as the continuing investigation of the real is proving to be. It would also have to come apart in exactly the same way that reality does, but, since it is constructed by us we would have to be able to know how it all fits together beforehand, whereas science leads to our discovering anew how it all fits together. If we already know how it works and can make a virtual world consistent with that then we cannot learn from it. It seems to me highly likely that what Kurzweil is trying to do by ushering in this new age singularity is to avoid all those painful experiences by hoping for a world without them. A world where our biological bodies will no longer break down and need repair and maintenance, in which old age, sickness and death are effectively banished and in which we will at last come to understand the thinking of perfectly rational others instead of always being perplexed by the thinking and behavior of emotional, opinionated, disputatious human beings. Indeed one could argue that many or indeed most of our most precious achievements as a species are predicated upon our grappling with disappointment, anxiety, perplexity and sadness. Macmurray (1961) argues that for us to be human we have to be able to make real choices and that the difference between ourselves and other organisms is that we have choices because we have intentionality, that is we can look at a situation and see a number of possible ways that we might behave that are not predictable from a knowledge of our biological makeup and our recent history. This is all predicated upon our not being simple instinct-driven creatures despite Freud. More to the point perhaps is that the sort of thinking that fuels research towards the aims that the singularity determines is leading to the ‘nerds’ and the ‘bots’ ripping-off the economy and the market being rigged by the so-called ‘Flash Boys’ (Lewis 2014) while the global economy becomes more and more unbalanced with the eighty five richest people as wealthy as the poorest half of the world’s population (Wearden 2014). Singularity considerations never address the question of who is going to be advantaged by it and one has the distinct impression that it is a private fantasy about living forever and avoiding the ‘slings and arrows of outrageous fortune’ that is the prime motivator for individuals concerned mostly with themselves and their own survival. While there are some signs that there is a growing critical response to these ideas and the hyper-market capitalism they encourage in the work of Lanier (2010, 2013), Lucas (2014) or Piketty (2014) the concentration of power and wealth continues unabated and effectively unopposed to the detriment of all. We cannot do without auxiliary constructions if we are to help redirect the economy towards human ends. In Thus Spake Zarathustra Nietzsche talks about the Ubermensch, the Superman, who he entreats us to make ‘the meaning of the earth!’ which he immediately follows with another entreaty, to ‘remain true to the earth’. I take it that remaining true to the earth is to remain true to our natural place within the biosphere and to bring all of that to some higher order, some greater harmony. I take it that this is to recognise the importance of our embodiment and the ways in which we come to be ourselves through others. I take it that this is to engage fully, emotionally with others and with our societies to work towards producing communities that value their members flourishing. The technological singularity as currently conceived, which might happen in some form or another if we are not vigilant, will be the triumph of reason over emotion, the worst sort of mechanistic thinking in denial of all of our feelings of community, solidarity, attachment and love. The age of the machines will no longer be human and lacks any unifying project that we might understand. An Artificial General Intelligence (AGI) without consciousness or a conscience that is more like a disease than a panacea and an AGI with consciousness and a conscience, may well attempt to usurp us. Be very careful what you wish for! 12.6 Conclusion If the gold standard of scientific endeavour is the experimental method and falsifiability then both the singularity and the transhuman hypotheses fail that test. In Sects. 12.2–12.4 I adopt a sceptical view of the many claims that are made by supporters of the singularity and transhuman projects and look at alternative constructions. This seems to me a necessary scientific approach towards these claims many of which are both speculative and from a perspective that is neither proven nor agreed. In this way I am concerned to challenge some of the certainties of different aspects of the current research into AGI’s with a view to generating a far greater awareness of the dangers of this research and encourage its regulation ¹ as we do research into stem cells, germ warfare and atomic weapons. In Sect. 12.5 having searched the introductory survey to this book in vain for any recognition that a depth psychological approach might be helpful to a full understanding of the phantasies surrounding the development of machines that might offer us the possibility of living forever or achieving immense power intellectually or physically, I outline a psychoanalytic approach to the theories of the singularity and transhumanism. In the survey there are a few references to ‘psychology’ or ‘psychopathy’ but none that suggest that an important aspect of the imaginative investment that is being made in these AI related projects may be born of phantasy and none that suggest that we have an unconscious the workings of which are to some degree hidden from us. These phantasies are in the service of defence against all of the ills our embodied selves are heir to—death, old age, illness, accident, the loss of loved ones, the loss of our children. The title of the paper “We cannot do without auxiliary constructions” is a tacit acknowledgement of this need for defensive means that psychoanalysis argues we have and use continually to protect and in some cases delude ourselves. As T.S. Eliot says in Burnt Norton, one of his Four Quartets, “humankind cannot bear very much reality”. The psychoanalytic section of this paper is intended to offer a new thread to further discussion of the societal proposals of the introductory survey. The work of philosopher and cultural critic John Gray is a case in point. When in a recent book The Soul of the Marionette (2015) Gray argues that Kurzweil’s vision of the singularity as “an explosive increase in knowledge that will enable humans to emancipate themselves from the material world and cease to be biological organisms [by] … uploading brain information into cyberspace … [that] The affinities between these ideas and Gnosticism are clear. Here as elsewhere, secular thinking is shaped by forgotten or repressed religion.” (Emphasis added.) Perhaps the ultimate irony is that the model that Kurzweil holds most dear in his imagination is that of his ‘resurrected’ father as a companion. It is clear that he was deeply affected by his father’s death and strongly motivated to overcome that in some way. The fact that such attachments and needs are strongly mediated by biological, psychological and social factors seems to have been lost in his understanding of the situation. Computer agents without emotional, relational, social and personal needs and wants are much more likely to be successful in helping us in the just and fair administration of things. And a simulation of another person is always just that however convincing or comforting that might be. Kurzweil (2014) reviewed ‘Her’ a recent film in which a computer agent with a female persona makes and later breaks a relationship with a man who suffers all the heartache that breaking up with another human intimate can bring. Considering one of the main problems of the relationship—the lack of a body for the computer agent—Kurzweil says “There are also methods to provide the tactile sense that goes along with a virtual body. These will soon be feasible, and will certainly be completely convincing by the time an AI of the level of Samantha is feasible.” The actual difficulties of effecting that relationship fully are not broached and the idea that for example one might be installed in a flotation tank while hooked up to an agent that was apparently sharing a fully embodied experience with you seems unlikely to catch on, even if it was realisable. However long you postponed it, you would still have to leave the flotation tank and slough off the second skin, at which time the Latin proverb—post coitum omne animal triste est—every animal feels sad after sexual intercourse might apply with the following amendment—every animal feels sad (disappointed, frustrated) after virtual sex too. Blurring the distinctions between the real and the virtual is a perilous path to take in my view. References Bhaskar, R. (1989) The Possibility of Naturalism. Routledge: London. Chomsky, N. The Singularity is Science Fiction ʬ (last accessed May 2014). Clarke, G. S. (2006) Personal Relations Theory: Fairbairn, Macmurray and Suttie. Routledge: London. Clarke, G. S. and Scharff, D. E. (Eds) (2014) Fairbairn and the object relations tradition. Karnac: London. Fairbairn, W. R. D. (1952) Psychoanalytic Studies of the Personality. Routledge: London. Gray, J. (2015) The Soul of the Marionette: A Short Enquiry into Human Freedom, Allen Lane: London. Hofstadter, D. R. (2007) An interview with Douglas. R. Hofstadter ʬ (last accessed May 2014). Hofstadter, D. R. (2008) An interview with Douglas.R.Hofstadter following “I am a strange loop”. ʬ (last accessed 5th May 2014). Hofstadter, D. R. (2009) The Singularity Summit at Stanford. ʬ (last accessed May 2014). Japanese ‘robot with a heart’ will care for the elderly and children, (ʬ (last accessed June 2015). Kurzweil, R. (2009) The Singularity is Near, Duckworth: London. Kurzweil, R. (2014) A review of ‘Her’ by Ray Kurzweil. ʬ (last accessed May 2014). Lanier, J. (2010) You are not a gadget. Allan Lane: London. Lanier, J. (2013) Who owns the future. Penguin: London. Lewis, M. (2014) Flash Boys: A Wall Street Revolt. W.W.Norton and Company. Lucas, R. (2014) Review of Jaron Lanier “Who Owns the Future”. Rob Lucas New Left Review 86 Mar/Apr 2014. Macmurray, J. (1961) Persons in Relation. Faber and Faber: London. Modis, T. (2006) The Singularity Myth. Technological Forecasting and Social Change, 72, No 2. Penrose, R. and Hameroff, S. Orchestrated objective reduction. ʬ (last accessed June 2015). Picard, R. blog: ʬ (last accessed May 2014). Piketty, T. (2014) Capital in the Twenty-First century. Harvard University Press. Solms, M. and Panksepp, J. (2012) Abstract: The “Id” knows more than the “Ego” admits: neuropsychoanalytic and primal consciousness perspectives on the interface between affective and cognitive neuroscience. Brain Sciences, 2: 147–175. Turkle, S. (1978) Psychoanalytic Politics: Jacques Lacan and Freud’s French Revolution. Turkle, S. (1984) The Second Self. MIT Press. Turkle, S (1995) Life on the screen. Simon and Schuster Paperbacks. Turkle, S. (2004) Whither psychoanalysis in Computer culture? Psychoanalytic Psychology Vol. 21, No 1. 16–30. Turkle, S. (2011) Alone Together: Why we expect more from technology and less from each other. Wearden, G. (2014) Oxfam: 85 richest people as wealthy as poorest half of the world. Guardian, 20th January 2014. Footnotes This places my approach in the ‘Societal Proposals’ of the introductory survey. Part III Reflections on the Journey © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_13 13. Reflections on the Singularity Journey James D. Miller¹   Department of Economics, Smith College, Northampton, US James D. Miller Email: jdmiller@smith.edu 13.1 Introduction Vernor Vinge’s 1993 essay “What is the Singularity?” (reprinted in this book’s appendix) seems too compelling to be believable. The author, a science fiction writer and former computer science professor, claims that the likely rise of superintelligence means we are on the “edge of change comparable to the rise of human life on Earth.” Vinge proposes several ways we might acquire superintelligence, including through developing sentient computers, large computer networks that “wake up,” computer/human interfaces, and through biologically enhanced humans, which seems a far more likely path now than in 1993 due to the rapidly falling cost of gene sequencing and the development of CRISPR gene editing technology. Vinge claims that the Singularity won’t necessarily go well for our species, and, as he writes, could result in the “physical extinction of the human race.” Alternatively, a positive singularity could create “benevolent gods,” with Vinge expressing a desire to become one himself. If an article explained why the stock value of X Corporation was going to rise slightly next month, I might believe it, but if the article claimed that the stock was going to increase tenfold by the week’s end I would reject the argument, regardless of how compelling it seemed. Many others have a far better understanding of the stock market than I do. If the article had merit, these more knowledgeable investors would already have bought up the stock until its value had increased tenfold. Consequently, the fact that other investors have not already acted on the article’s arguments provides strong evidence against the article’s claims. Analogously, if I, a mere academic economist, find Vinge’s arguments compelling and my impression is justified then shouldn’t society already have been reshaped to take into account the significant likelihood that a Singularity is near? After all, if we are on the verge of a Singularity that will, with high probability, either exterminate us or bring utopia, and this is something that reasonably informed people should realize, shouldn’t achieving a positive Singularity have become the primary purpose of human civilization? And if not, shouldn’t technologically literate people be screaming at the rest of mankind that how we handle artificial and enhanced human intelligence over these next few decades is mankind’s ultimate test? 13.2 Eliezer Yudkowsky In one of this section’s essays, AI researcher Eliezer Yudkowsky defines the three major schools of the Singularity. Each definition, I believe, comes with a reason why people might have a hard time thinking that the Singularity is plausible. 13.2.1 The Event Horizon Yudkowsky calls the school that Vinge supports the “Event Horizon.” This school is characterized in part by the future being unpredictable since it will be controlled by smarter-than-human intelligences—just as a 5-year-old is incapable of understanding the modern world, we mere unaugmented humans probably can’t understand what will happen in a future dominated by superintelligences. But I suspect that since we are a storytelling species the impossibility of imagining the post-Singularity world makes the Singularity seem unbelievable to many. Just as quantum physics feels less believable than Newtonian mechanics because only the latter can be visualized, a Singularity’s event horizon, beyond which we cannot see, might cause our brains to unreasonably discount the Singularity’s plausibility. 13.2.2 Accelerating Change Yudkowsky refers to another Singularity school of thought as “Accelerating Change.” The most prominent proponent of this school is Ray Kurzweil. The “Accelerating Change” school estimates that we will reach a Singularity through exponential increases in information technology, most notably Moore’s law. But humans are linear thinkers, and have great difficulty intuitively understanding exponential growth. Imagine, for example, that you have two investments. The first always pays $1000 a month in dividends and the second pays a sum that doubles each month, but starts out at a very low value. As the second investment will, for a very long time, pay much less than the first, it would be challenging for most people to imagine that if you go far enough into the future the second investment will be the one of overwhelming importance. Similarly, although most people recognize that information technology plays a significant role in our economy, they have a hard time extrapolating to what will happen when this technology is orders of magnitude more powerful than it is today. 13.2.3 The Intelligence Explosion Yudkowsky calls the school he thinks most likely “The Intelligence Explosion.” According to this school an AI has the potential to undergo an intelligence chain reaction in which it quickly upgrades its own intelligence. By this theory, an AI might examine its own code to figure out ways of becoming smarter, but as it succeeds in augmenting its own intelligence it will become even better at figuring out ways of becoming smarter. An AI that undergoes such an intelligence explosion might, therefore, expand from human level intelligence to (what would seem to us) godlike intelligence within a day. Although I personally think Yudkowsky’s belief in a future intelligence explosion is reasonable, I admit it seems superficially absurd. The absurdity heuristic provides another explanation for why many might reject the possibility of a Singularity. Most things that seem absurd are absurd, and are not worth taking the time to study in detail. You shouldn’t bother investigating the legitimacy of an email from an alleged Nigerian prince who wants just a bit of your money to unlock his fortune nor should a physics professor devote any of her precious time studying the equations mailed to her purportedly showing how to extract infinite free energy from coconuts. The Singularity, especially through an intelligence explosion, does superficially seem like a crazy idea, one thought up by people spending too much time contemplating the philosophical implications of the “Terminator” movies. Most people who hear about the Singularity might immediately pattern match it to science fiction craziness and banish the concept to the silly ideas trash bin subsystem of their brain. Perhaps it was because Vinge, a science fiction author, enjoyed thinking about futuristic crazy-sounding ideas that he gladly paid the upfront contemplation costs of considering if the Singularity should be taken seriously. This might also explain why, anecdotally, many people in the Singularity community are science fiction fans. Part of the value of the Singularity Hypothesis I and II books is to signal to academic audiences that although the Singularity seems absurd, many thoughtful people who have studied it think that you should ignore the absurdity heuristic and pay the time costs to fully investigate the possibility of a coming Singularity. 13.2.4 MIRI and LessWrong Yudkowsky helped create the Singularity Institute (now called the Machine Intelligence Research Institute) to help mankind achieve a friendly Singularly. (Disclosure: I have contributed to the Singularity Institute.) Yudkowsky then founded the community blog ʬ which seeks to promote the art of rationality, to raise the sanity waterline, and to in part convince people to make considered, rational charitable donations, some of which, Yudkowsky (correctly) hoped, would go to his organization. LessWrong has had a massive impact on the worldview of people who consider the “Intelligence Explosion” school of the Singularity to be the most plausible. Studying the art of rationality gives us additional reasons why many people don’t consider the Singularity to be near. 13.3 Scott Aaronson The two critical skills an aspiring rationalist needs are, first, admitting that a core belief you hold might be wrong and, second, recognizing that an informed person disagreeing with you should lower your estimate of the probability of your being right. Accepting these hard truths forces this author to admit that the reason why the Singularity seems implausible to so many people might be because it is indeed a highly implausible outcome. In recognition of this possibility this book includes the essay The Singularity is Far by Scott Aaronson, an MIT computer science professor. I find the essay’s Copernican argument particularly compelling because if the Singularity is near we live at an extraordinarily special time in the history of mankind: how we handle the Singularity will determine the very long term fate of our species. Imagine you conclude that during your expected lifetime mankind will either go extinct or create a friendly AI that will allow us to survive until the end of the universe with trillions upon trillions of our descendants eventually being born for every human who has ever existed up to today. If we do achieve a positive Singularity then if, at the end of the universe, you ranked in importance everyone who has ever existed, everyone currently alive today would likely be extremely close to the top, probably in the top one trillionth of one percent. We should always be suspicious of theories concluding that we are special—both because most things, tautologically, are not (macro) special, and because of the human bias of thinking that we are more important than is justified by the evidence. 13.4 Stuart Armstrong Another reason, I suspect, that many discount the Singularity is because we don’t have a firm date as to its arrival, and many past predictions of future technological marvels have proven false. It’s too easy for people to make far flung predictions (e.g. “in the glorious future X will happen”) because either the prediction will come true or you can claim that not enough time has passed for you to have been proved right. Because such indefinite predictions can’t be falsified, don’t allow their authors’ to receive negative feedback, and don’t force their authors to risk their reputations, they should be given less weight. For this reason, we have included an essay, originally published on LessWrong, by Stuart Armstrong, an editor of this book, that analyzes 95 time-specific predictions concerning when we will create human-level AI. Armstrong shows that the predictions are not subject to what’s called the Maes-Garreau Law, which predicts that predictors will predict that a technology will arrive “just within the lifetime of the predictor!” 13.5 Too Far in the Future The final reason I wish to discuss as to why many might not be taking the Singularity seriously is because people might be thinking that even if the Singularity is relatively near, there is little about its future nature that we can affect today. Tradeoffs are everywhere, and given that an intelligent allocation of resources can alleviate many problems of the modern world, we do face a cost if we put in effort into contemplating a possible Singularity. Furthermore, as with almost any future problem, if technology continues to advance we will likely have better tools to deal with the Singularity in the future than in the present. Imagine, though, that in 1900 physicists learned of the possibility of hydrogen bombs, and theorized that by 1970 a total war between powers armed with these weapons could destroy civilization. Would there have been any value to people speculating about a future containing these super-weapons, or would it have been a more productive use of time to wait until the eve of these weapons’ development before considering how they would remake warfare? Thinking about the possibility of mutually assured destruction, establishing taboos against using atomic weapons in anger, and designing an international system to stop terrorists and rogue states from acquiring nuclear weapons would, I believe, have been worthwhile endeavor for scholars living in my hypothetical 1900 world. Similarly, as the final essay in this section argues, we should today be studying how to handle advanced AI, even if we are a long time away from being able to code above human level intelligence into our computers. 13.6 Scott Siskind Scott Siskind, writing under the pseudonym Yvain, was one of LessWrong’s most popular writers before he went on to write for his own blog at ʬ Among the many articles he wrote concerning the Singularity is the one included in this section titled, “No Time Like the Present for AI Safety Work.” Siskind points out in this essay that there are serious problems related to advanced AI that we can and should work on now. 13.6.1 Wireheading Wireheading occurs when you directly stimulate the pleasure centers of your brain rather than seek happiness by engaging in activities that (indirectly) make you feel better. Siskind points out that wireheading might pose a huge problem for our potential future control over AIs. Simplistically, imagine that an AI’s code tells it to maximize X where X is supposed to represent some social welfare objective, but instead of changing the real world the AI just alters its own source code to give X as high a value as its memory allows. And then the AI decides to convert all the atoms it can get a hold of (including those in people) to computer memory so it can further raise X. Wireheading, as Siskind explains, is one of several “very basic problems that affect broad categories of minds.” As a result, it’s a problem that has a high enough chance of infecting a future AI that working on this issue today has a high expected payoff. 13.6.2 Work on AI Safety Now As Siskind writes, if we achieve a Singularity by an intelligence explosion, then we won’t have time to solve wireheading and other AI control problems after we create an AI of near human level intelligence. The best hope for a positive Singularity is if we solve these control problems first. And, I believe, that if our analysis shows that these control problems are intractable, then we have learned something useful as well and we should try to slow down the development of AI while simultaneously accelerating the augmentation of human intelligence so that our species becomes smart enough to figure out how to craft AIs that will actually do X, where X aligns with our true (or at least preferred) values. © Springer-Verlag GmbH Germany 2017 Victor Callaghan, James Miller, Roman Yampolskiy and Stuart Armstrong (eds.)The Technological SingularityThe Frontiers Collection10.1007/978-3-662-54033-6_14 14. Singularity Blog Insights James D. Miller¹   Department of Economics, Smith College, Northampton, USA James D. Miller Email: jdmiller@smith.edu 14.1 Three Major Singularity Schools Eliezer Yudkowsky Machine Intelligence Research Institute The following is an edited version of an article that was originally posted on the Machine Intelligence Research Institute blog, September 2007. Singularity discussions seem to be splitting up into three major schools of thought: Accelerating Change, the Event Horizon, and the Intelligence Explosion. Accelerating Change Core claim: Our intuitions about change are linear; we expect roughly as much change as has occurred in the past over our own lifetimes. But technological change feeds on itself, and therefore accelerates. Change today is faster than it was 500 years ago, which in turn is faster than it was 5000 years ago. Our recent past is not a reliable guide to how much change we should expect in the future. Strong claim: Technological change follows smooth curves, typically exponential. Therefore we can predict with fair precision when new technologies will arrive, and when they will cross key thresholds, like the creation of Artificial Intelligence. Advocates: Ray Kurzweil, Alvin Toffler(?), John Smart Event Horizon Core claim: For the last hundred thousand years, humans have been the smartest intelligences on the planet. All our social and technological progress was produced by human brains. Shortly, technology will advance to the point of improving on human intelligence (brain-computer interfaces, Artificial Intelligence). This will create a future that is weirder by far than most science fiction, a difference-in-kind that goes beyond amazing shiny gadgets. Strong claim: To know what a superhuman intelligence would do, you would have to be at least that smart yourself. To know where Deep Blue would play in a chess game, you must play at Deep Blue’s level. Thus the future after the creation of smarter-than-human intelligence is absolutely unpredictable. Advocates: Vernor Vinge Intelligence Explosion Core claim: Intelligence has always been the source of technology. If technology can significantly improve on human intelligence—create minds smarter than the smartest existing humans—then this closes the loop and creates a positive feedback cycle. What would humans with brain-computer interfaces do with their augmented intelligence? One good bet is that they’d design the next generation of brain-computer interfaces. Intelligence enhancement is a classic tipping point; the smarter you get, the more intelligence you can apply to making yourself even smarter. Strong claim: This positive feedback cycle goes FOOM, like a chain of nuclear fissions gone critical—each intelligence improvement triggering an average of >1.000 further improvements of similar magnitude—though not necessarily on a smooth exponential pathway. Technological progress drops into the characteristic timescale of transistors (or super-transistors) rather than human neurons. The ascent rapidly surges upward and creates superintelligence (minds orders of magnitude more powerful than human) before it hits physical limits. Advocates: I. J. Good, Eliezer Yudkowsky The thing about these three logically distinct schools of Singularity thought is that, while all three core claims support each other, all three strong claims tend to contradict each other. If you extrapolate our existing version of Moore’s Law past the point of smarter-than-human AI to make predictions about 2099, then you are contradicting both the strong version of the Event Horizon (which says you can’t make predictions because you’re trying to outguess a transhuman mind) and the strong version of the Intelligence Explosion (because progress will run faster once smarter-than-human minds and nanotechnology drop it into the speed phase of transistors). 14.2 AI Timeline Predictions: Are We Getting Better? Stuart Armstrong Future of Humanity Institute The following is an edited version of an article that was originally posted on ʬ on August 17, 2012. Thanks to some sterling work by Kaj Sotala and others such as Jonathan Wang and Brian Potter—all paid for by the gracious Singularity Institute, we’ve managed to put together a databases listing all AI predictions that we could find. The list is necessarily incomplete, but we found as much as we could, and collated the data so that we could have an overview of what people have been predicting in the field since Turing. We retained 257 predictions total, of various quality (in our expanded definition, philosophical arguments such as “computers can’t think because they don’t have bodies” count as predictions). Of these, 95 could be construed as giving timelines for the creation of human-level AIs. And “construed” is the operative word—very few were in a convenient “By golly, I give a 50% chance that we will have human-level AIs by XXXX” format. Some gave ranges; some were surveys of various experts; some predicted other things (such as child-like AIs, or superintelligent AIs). Where possible, I collapsed these down to single median estimate, making some somewhat arbitrary choices and judgement calls. When a range was given, I took the mid-point of that range. If a year was given with a 50% likelihood estimate, I took that year. If it was the collection of a variety of expert opinions, I took the prediction of the median expert. If the author predicted some sort of AI by a given date (partial AI or superintelligent AI), I took that date as their estimate rather than trying to correct it in one direction or the other (there were roughly the same number of subhuman AIs as suphuman AIs in the list, and not that many of either). I read extracts of the papers to make judgement calls when interpreting problematic statements like “within 30 years” or “during this century” (is that a range or an end-date?). So some biases will certainly have crept in during the process. That said, it’s still probably the best data we have. So keeping all that in mind, let’s have a look at what these guys said (and it was mainly guys). There are two stereotypes about predictions in AI and similar technologies. The first is the Maes-Garreau law: technologies as supposed to arrive just within the lifetime of the predictor! The other stereotype is the informal 20–30 year range for any new technology: the predictor knows the technology isn’t immediately available, but puts it in a range where people would still be likely to worry about it. And so the predictor gets kudos for addressing the problem or the potential, and is safely retired by the time it (doesn’t) come to pass. Are either of these stereotypes born out by the data? Well, here is a histogram of the various “time to AI” predictions. As can be seen, the 20–30 year stereotype is not exactly born out—but a 15–25 one would be. Over a third of predictions are in this range. If we ignore predictions more than 75 years into the future, 40% are in the 15–25 range, and 50% are in the 15–30 range. Apart from that, there is a gradual tapering off, a slight increase at 50 years, and twelve predictions beyond three quarters of a century. Eyeballing this, there doesn’t seem to much evidence for the Maes-Garreau law. Kaj looked into this specifically, plotting (life expectancy) minus (time to AI) versus the age of the predictor; the Maes-Garreau law would expect the data to be clustered around the zero line: Most of the data seems to be decades out from the zero point (note the scale on the y axis). You could argue, possibly, that 50 year olds are more likely to predict AI just within their lifetime, but this is a very weak effect. I see no evidence for the Maes-Garreau law—of the 37 prediction Kaj retained, only 6 predictions (16%) were within 5 years (in either direction) of the expected death date. But I’ve been remiss so far—combining predictions that we know are false (because their deadline has come and gone) with those that could still be true. If we look at predictions that have failed, we get this interesting graph: This looks very similar to the original graph. The main difference being the lack of very long range predictions. This is not, in fact, because there has not yet been enough time for these predictions to be proved false, but because prior to the 1990s, there were actually no predictions with a timeline greater than 50 years. This can best be seen on this scatter plot, which plots the time predicted to AI against the date the prediction was made: As can be seen, as time elapses, people become more willing to predict very long ranges. But this is something of an artefact—in the early days of computing, people were very willing to predict that AI was impossible. Since this didn’t give a timeline, their “predictions” didn’t show up on the graph. It recent times, people seem a little less likely to claim AI is impossible, replaced by these “in a century or two” timelines. Apart from that one difference, predictions look remarkably consistent over the span: modern predictors are claiming about the same time will elapse before AI arrives as their (incorrect) predecessors. This doesn’t mean that the modern experts are wrong—maybe AI really is imminent this time round, maybe modern experts have more information and are making more finely calibrated guesses. But in a field like AI prediction, where experts lack feed back for their pronouncements, we should expect them to perform poorly, and for biases to dominate their thinking. This seems the likely hypothesis—it would be extraordinarily unlikely that modern experts, free of biases and full of good information, would reach exactly the same prediction distribution as their biased and incorrect predecessors. In summary:
    1. If humanity doesn’t blow itself up, eventually we will create     human-level AI. 
      
    2. 2. If humanity creates human-level AI, technological progress will continue and eventually reach far-above-human-level AI   3. 3. If far-above-human-level AI comes into existence, eventually it will so overpower humanity that our existence will depend on its goals being aligned with ours   4. 4. It is possible to do useful research now which will improve our chances of getting the AI goal alignment problem right   5. 5. Given that we can start research now we probably should, since leaving it until there is a clear and present need for it is unwise I place very high confidence (>95%) on each of the first three statements—they’re just saying that if trends continue moving towards a certain direction without stopping, eventually they’ll get there. I have lower confidence (around 50%) on the last two statements. Dealing with AI risk is as important as all the other things we consider important, like curing diseases and detecting asteroids and saving the environment. That requires at least a little argument for why progress should indeed be possible at this early stage. And I think progress is possible insofar as this is a philosophical and not a technical problem. Right now the goal isn’t “write the code that will control the future AI”, it’s “figure out the broad category of problem we have to deal with.” Let me give two examples of open problems to segue into a discussion of why these problems are worth working on now. Problem 1: Wireheading Some people have gotten electrodes implanted in their brains for therapeutic or research purposes. When the electrodes are in certain regions, most notably the lateral hypothalamus, the people become obsessed with stimulating them as much as possible. If you give them the stimulation button, they’ll press it thousands of times per hour; if you try to take the stimulation button away from them, they’ll defend it with desperation and ferocity. Their life and focus narrows to a pinpoint, normal goals like love and money and fame and friendship forgotten in the relentless drive to stimulate the electrode as much as possible. This fits pretty well with what we know of neuroscience. The brain (OVERSIMPLIFICATION WARNING) represents reward as electrical voltage at a couple of reward centers, then does whatever tends to maximize that reward. Normally this works pretty well; when you fulfill a biological drive like food or sex, the reward center responds with little bursts of reinforcement, and so you continue fulfilling your biological drives. But stimulating the reward center directly with an electrode increases it much more than waiting for your brain to send little bursts of stimulation the natural way, so this activity is by definition the most rewarding possible. A person presented with the opportunity to stimulate the reward center directly will forget about all those indirect ways of getting reward like “living a happy life” and just press the button attached to the electrode as much as possible. This doesn’t even require any brain surgery—drugs like cocaine and meth are addictive in part because they interfere with biochemistry to increase the level of stimulation in reward centers. And computers can run into the same issue. I can’t find the link, but I do remember hearing about an evolutionary algorithm designed to write code for some application. It generated code semi-randomly, ran it by a “fitness function” that assessed whether it was any good, and the best pieces of code were “bred” with each other, then mutated slightly, until the result was considered adequate. They ended up, of course, with code that hacked the fitness function and set it to some absurdly high integer. These aren’t isolated incidents. Any mind that runs off of reinforcement learning with a reward function—and this seems near-universal in biological life-forms and is increasingly common in AI—will have the same design flaw. The main defense against it this far is simple lack of capability: most computer programs aren’t smart enough for “hack your own reward function” to be an option; as for humans, our reward centers are hidden way inside our heads where we can’t get to it. A hypothetical superintelligence won’t have this problem: it will know exactly where its reward center is and be intelligent enough to reach it and reprogram it. The end result, unless very deliberate steps are taken to prevent it, is that an AI designed to cure cancer hacks its own module determining how much cancer has been cured and sets it to the highest number its memory is capable of representing. Then it goes about acquiring more memory so it can represent higher numbers. If it’s superintelligent, its options for acquiring new memory include “take over all the computing power in the world” and “convert things that aren’t computers into computers.” Human civilization is a thing that isn’t a computer. This is not some exotic failure mode that a couple of extremely bizarre designs can fall into; this may be the natural course for a sufficiently intelligent reinforcement learner. Problem 2: The Evil Genie Effect Everyone knows the problem with computers is that they do what you say rather than what you mean. Nowadays that just means that a program runs differently when you forget a close-parenthesis, or websites show up weird if you put the HTML codes in the wrong order. But it might lead an artificial intelligence to seriously misinterpret natural language orders. Age of Ultron actually gets this one sort of right. Tony Stark orders his super-robot Ultron to bring peace to the world; Ultron calculates that the fastest and most certain way to bring peace is to destroy all life. As far as I can tell, Ultron is totally 100% correct about this and in some real-world equivalent that is exactly what would happen. We would get pretty much the same effect by telling an AI to “cure cancer” or “end world hunger” or any of a thousand other things. Even Isaac Asimov’s Three Laws of Robotics would take about  thirty seconds to become horrible abominations. The First Laws says a robot cannot harm a human being or allow through inaction a human being to come to harm. “Not taking over the government and banning cigarettes” counts as allowing through inaction a human being to come to harm. So does “not locking every human in perfectly safe stasis fields for all eternity.” There is no way to compose an order specific enough to explain exactly what we mean by “do not allow through inaction a human to come to harm”—go ahead, try it—unless the robot is already willing to do what we mean, rather than what we say. This is not a deal-breaker, since AIs may indeed be smart enough to understand what we mean, but our desire that they do so will have to be programmed into them directly, from the ground up. But this just leads to a second problem: we don’t always know what we mean by something. The question of “how do we balance the ethical injunction to keep people safe with the ethical injunction to preserve human freedom?” is a pretty hot topic in politics right now, presenting itself in everything from gun control to banning Big Gulp cups. It seems to involve balancing out everything we value—how important are Big Gulp cups to us, anyway?—and combining cost-benefit calculations with sacred principles. Any AI that couldn’t navigate that moral labyrinth might end up ending world hunger by killing all starving people, or refusing else to end world hunger by inventing new crops because the pesticides for them might kill an insect. This is a problem we have yet to solve with humans—most of the humans in the world have values that we consider abhorrent, and accept tradeoffs we consider losing propositions. Dealing with an AI whose mind is no more different to mine than that of fellow human being Pat Robertson would from my perspective be a clear-cut case of failure. My point in raising these two examples wasn’t to dazzle anybody with interesting philosophical issues. It’s to prove a couple of points: First, there are some very basic problems that affect broad categories of minds, like “all reinforcement learners” or “all minds that make decisions with formal math”. People often speculate that at this early stage we can’t know anything about the design of future AIs. But I would find it extraordinarily surprising if they used neither reinforcement learning or formal mathematical decision-making. Second, these problems aren’t obvious to most people. These are weird philosophical quandaries, not things that are obvious to everybody with even a little bit of domain knowledge. Third, these problems have in fact been thought of. Somebody, whether it was a philosopher or a mathematician or a neuroscientist, sat down and thought “Hey, wait, reinforcement learners are naturally vulnerable to wireheading, which would explain why this same behavior shows up in all of these different domains.” Fourth, these problems suggest research programs that can be pursued right now, at least in a preliminary way. How come a human can understand the concept of wireheading, yet not feel any compulsion to seek a brain electrode to wirehead themselves with? Is there a way to design a mind that could wirehead a few times, feel and understand the exact sensation, and yet feel no compulsion to wirehead further? How could we create an idea of human ethics and priorities formal enough to stick into a computer? I think when people hear “we should start, right now in 2015, working on AI goal alignment issues” they think that somebody wants to write a program that can be imported directly into a 2075 AI to provide it with an artificial conscience. Then they think “No way you can do something that difficult this early on.” But that isn’t what anybody’s proposing. What we’re proposing is to get ourselves acquainted with the general philosophical problems that affect a broad subset of minds, then pursue the neuroscientific, mathematical, and philosophical investigations necessary to have a good understanding of them by the time the engineering problem comes up. That last section discussed my claim 4, that there’s research we can do now that will help. That leaves claim 5—given that we can do research now, we should, because we can’t just trust our descendents in the crunch time to sort things out on their own without our help, using their better model of what eventual AI might look like. There are a couple of reasons for this Reason 1: The Treacherous Turn Our descendents’ better models of AI might be actively misleading. Things that work for subhuman or human level intelligences might fail for superhuman intelligences. Empirical testing won’t be able to figure this out without help from armchair philosophy. Pity poor evolution. It had hundreds of millions of years to evolve defenses against heroin—which by the way affects rats much as it does humans—but it never bothered. Why not? Because until the past century, there wasn’t anything around intelligent enough to synthesize pure heroin. So heroin addiction just wasn’t a problem anything had to evolve to deal with. A brain design that looks pretty good in stupid animals like rats and cows becomes very dangerous when put in the hands (well, heads) of humans smart enough to synthesize heroin or wirehead their own pleasure centers. The same is true of AI. Dog-level AIs aren’t going to learn to hack their own reward mechanism. Even human level AIs might not be able to—I couldn’t hack a robot reward mechanism if it were presented to me. Superintelligences can. What we might see is reinforcement-learning AIs that work very well at the dog level, very well at the human level, then suddenly blow up at the superhuman level, by which it’s time it’s too late to stop them. This is a common feature of AI safety failure modes. If you tell me, as a mere human being, to “make peace”, then my best bet might be to become Secretary-General of the United Nations and learn to negotiate very well. Arm me with a few thousand nukes, and it’s a different story. A human-level AI might pursue its peace-making or cancer-curing or not-allowing-human-harm-through-inaction-ing through the same prosocial avenues as humans, then suddenly change once it became superintelligent and new options became open. Indeed, the point that will activate the shift is precisely that no humans are able to stop it. If humans can easily shut an AI down, then the most effective means of curing cancer will be for it to research new medicines (which humans will support); if humans can no longer stop an AI, the most effective means of curing cancer is destroying humanity (since it will no longer matter that humans will fight back). Reason 2: Hard Takeoff It seems in theory that by hooking a human-level AI to a calculator app, we can get it to the level of a human with lightning-fast calculation abilities. By hooking it up to Wikipedia, we can give it all human knowledge. By hooking it up to a couple extra gigabytes of storage, we can give it photographic memory. By giving it a few more processors, we can make it run a hundred times faster, such that a problem that takes a normal human a whole day to solve only takes the human-level AI 15 min. So we’ve already gone from “mere human intelligence” to “human with all knowledge, photographic memory, lightning calculations, and solves problems a hundred times faster than anyone else.” This suggests that “merely human level intelligence” isn’t mere. The next problem is “recursive self-improvement”. Maybe this human-level AI armed with photographic memory and a hundred-time-speedup takes up computer science. Maybe, with its ability to import entire textbooks in seconds, it becomes very good at computer science. This would allow it to fix its own algorithms to make itself even more intelligent, which would allow it to see new ways to make itself even more intelligent, and so on. The end result is that it either reaches some natural plateau or becomes superintelligent in the blink of an eye. If it’s the second one, “wait for the first human-level intelligences and then test them exhaustively” isn’t going to cut it. The first human-level intelligence will become the first superintelligence too quickly to solve even the first of the hundreds of problems involved in machine goal-alignment. And although I haven’t seen anyone else bring this up, I’d argue that even the hard-takeoff scenario might be underestimating the risks. Imagine that for some reason having two hundred eyes is the killer app for evolution. A hundred ninety-nine eyes are useless, no better than the usual two, but once you get two hundred, your species dominates the world forever. The really hard part of having two hundred eyes is evolving the eye at all. After you’ve done that, having two hundred of them is very easy. But it might be that it would take eons and eons before any organism reached the two hundred eye sweet spot. Having dozens of eyes is such a useless waste of energy that evolution might never get to the point where it could test the two-hundred-eyed design. Consider that the same might be true for intelligence. The hard part is evolving so much as a tiny rat brain. Once you’ve got that, getting a human brain, with its world-dominating capabilities, is just a matter of scaling up. But since brains are metabolically wasteful and not that useful before the technology-discovering point, it took eons before evolution got there. There’s a lot of evidence that this is true. First of all, humans evolved from chimps in just a couple of million years. That’s too short to redesign the mind from the ground up, or even invent any interesting new evolutionary “technologies”. It’s just enough time for evolution to alter the scale and add a couple of efficiency tweaks. But monkeys and apes were around for tens of millions of years before evolution bothered. Second, dolphins are almost as intelligent as humans. But they last shared a common ancestor with us something like fifty million years ago. Either humans and dolphins both evolved fifty million years worth of intelligence “technologies” independently of each other, or else the most recent common ancestor had most of what was necessary for intelligence and humans and dolphins were just the two animals in that vast family tree for whom using them to their full extent became useful. But the most recent common ancestor of humans and dolphins was probably not much more intelligent than a rat itself. If this is right, then the first rat-level AI will contain most of the interesting discoveries needed to build the first human-level AI and the first superintelligent AI. People tend to say things like “Well, we might have AI as smart as a rat soon, but it will be a long time after that before they’re anywhere near human-level”. But that’s assuming you can’t turn the rat into the human just by adding more processing power or more simulated neurons or more connections or whatever. Anything done on a computer doesn’t need to worry about metabolic restrictions. Reason 3: Everyday Ordinary Time Constraints During the 1956 Dartmouth Conference on AI, top researchers made a plan toward reaching human-level artificial intelligence, and gave themselves 2 months to teach computers to understand human language. In retrospect, this might have been mildly optimistic. But now machine translation is a thing, people are making some good progress in some of the hard problems—and when people bring up problems like wireheading, or goal alignment, people just say “Oh, we have plenty of time”. But expecting to solve those problems in a few years might be just as optimistic as expecting to solve machine language translation in 2 months. Sometimes problems are harder than you think, and it’s worth starting on them early just in case. 14.4 The Singularity Is Far By Scott Aaronson The University of Texas at Austin The following is an edited version of an article that was originally posted on ʬ on September 7, 2008. In this post, I wish to propose for the reader’s favorable consideration a doctrine that will strike many in the nerd community as strange, bizarre, and paradoxical, but that I hope will at least be given a hearing. The doctrine in question is this: while it is possible that, a century hence, humans will have built molecular nanobots and superintelligent AIs, uploaded their brains to computers, and achieved eternal life, these possibilities are not quite so likely as commonly supposed, nor do they obviate the need to address mundane matters such as war, poverty, disease, climate change, and helping Democrats win elections. Last week I read Ray Kurzweil’s The Singularity Is Near, which argues that by 2045, or somewhere around then, advances in AI, neuroscience, nanotechnology, and other fields will let us transcend biology, upload our brains to computers, and achieve the dreams of the ancient religions, including eternal life and whatever simulated sex partners we want. Perhaps surprisingly, Kurzweil does not come across as a wild-eyed fanatic, but as a humane idealist; the text is thought-provoking and occasionally even wise. I find myself in agreement with Kurzweil on three fundamental points. Firstly, that whatever purifying or ennobling qualities suffering might have, those qualities are outweighed by suffering’s fundamental suckiness. If I could press a button to free the world from loneliness, disease, and death—the downside being that life might become banal without the grace of tragedy—I’d probably hesitate for about five seconds before lunging for it. Secondly, there’s nothing bad about overcoming nature through technology. Humans have been in that business for at least 10,000 years. Now, it’s true that fanatical devotion to particular technologies—such as the internal combustion engine—might well cause the collapse of human civilization and the permanent degradation of life on Earth. But the only plausible solution is better technology, not the Flintstone route. Thirdly, were there machines that pressed for recognition of their rights with originality, humour, and wit, we’d have to give it to them. And if those machines quickly rendered humans obsolete, I for one would salute our new overlords. Yet while I share Kurzweil’s ethical sense, I don’t share his technological optimism. Everywhere he looks, Kurzweil sees Moore’s-Law-type exponential trajectories—not just for transistor density, but for bits of information, economic output, the resolution of brain imaging, the number of cell phones and Internet hosts, the cost of DNA sequencing… you name it, he’ll plot it on a log scale. Kurzweil acknowledges that, even over the brief periods that his exponential curves cover, they have hit occasional snags, like (say) the Great Depression or World War II. And he’s not so naïve as to extend the curves indefinitely. Nevertheless, he fully expects current technological trends to continue pretty much unabated until they hit fundamental physical limits. I’m much less sanguine. Where Kurzweil sees a steady march of progress interrupted by occasional hiccups, I see a few fragile and improbable victories against a backdrop of malice, stupidity, and greed—the tiny amount of good humans have accomplished in constant danger of drowning in a sea of blood and tears, as happened to so many of the civilizations of antiquity. The difference is that this time, human idiocy is playing itself out on a planetary scale; this time we can finally ensure that there are no survivors left to start over. In the rest of this post, I’d like to share some of the reasons why I haven’t chosen to spend my life worrying about the Singularity. The first, and most important, reason is because there are vastly easier prerequisite questions that we already don’t know how to answer. In a field like computer science theory, you very quickly get used to being able to state a problem with perfect clarity, knowing exactly what would constitute a solution, and still not having any clue how to solve it. And at least in my experience, being pounded with this situation again and again slowly reorients your worldview. You learn to terminate trains of thought that might otherwise run forever without halting. Faced with a question like “How can we stop death?” or “How can we build a human-level AI?” you learn to respond: “What’s another question that’s easier to answer, and that probably has to be answered anyway before we have any chance on the original one?” And if someone says, “but can’t you at least estimate how long it will take to answer the original question?” you learn to hedge and equivocate. The second reason is that as a goal recedes to infinity, the probability increases that as we approach it, we’ll discover some completely unanticipated reason why it wasn’t the right goal anyway. You might ask: what is it that we could possibly learn about neuroscience, biology, or physics, that would make us slap our foreheads and realize that uploading our brains to computers was a harebrained idea from the start, reflecting little more than early-21st-century prejudice? Is there any example of a prognostication about the 21st century written before 1950, most of which doesn’t now seem quaint? The third reason is simple comparative advantage. Given our current ignorance, there seems to me to be relatively little worth saying about the Singularity—and what is worth saying is already being said well by others. Thus, I find nothing wrong with a few people devoting their lives to Singulatarianism, just as others should arguably spend their lives worrying about asteroid collisions. But precisely because smart people do devote brain-cycles to these possibilities, the rest of us have correspondingly less need to. The fourth reason is because I find it unlikely that we’re extremely special. Sure, maybe we’re at the very beginning of the human story, a mere awkward adolescence before billions of glorious post-Singularity years ahead. But whatever intuitions cause us to expect that could easily be leading us astray. Suppose that all over the universe, civilizations arise and continue growing exponentially until they exhaust their planets’ resources and kill themselves out. In that case, almost every conscious being brought into existence would find itself extremely close to its civilization’s death throes. If—as many believe—we’re quickly approaching the earth’s carrying capacity, then we’d have not the slightest reason to be surprised by that apparent coincidence. To be human would, in the vast majority of cases, mean to be born into a world of air travel and Burger King and imminent global catastrophe. It would be like some horrific Twilight Zone episode, with all the joys and labors, the triumphs and setbacks of developing civilizations across the universe receding into demographic insignificance next to their final, agonizing howls of pain. I wish reading the news every morning furnished me with more reasons not to be haunted by this vision of existence. The fifth reason is my (limited) experience of AI research. I was actually an AI person long before I became a theorist. When I was 12, I set myself the modest goal of writing a BASIC program that would pass the Turing Test by learning from experience and following Asimov’s Three Laws of Robotics. I coded up a really nice tokenizer and user interface, and only got stuck on the subroutine that was supposed to understand the user’s question and output an intelligent, Three-Laws-obeying response. Later, at Cornell, I was lucky to learn from Bart Selman, and worked as an AI programmer for Cornell’s RoboCupteam—an experience that taught me little about the nature of intelligence but a great deal about how to make robots pass a ball. At Berkeley, my initial focus was on machine learning and statistical inference; had it not been for quantum computing, I’d probably still be doing AI today. For whatever it’s worth, my impression was of a field with plenty of exciting progress, but which has (to put it mildly) some ways to go before recapitulating the last billion years of evolution. The idea that a field must either be (1) failing or (2) on track to reach its ultimate goal within our lifetimes, seems utterly without support in the history of science (if understandable from the standpoint of both critics and enthusiastic supporters). If I were forced at gunpoint to guess, I’d say that human-level AI seemed to me like a slog of many more centuries or millennia (with the obvious potential for black swans along the way). As you may have gathered, I don’t find the Singulatarian religion so silly as not to merit a response. Not only is the “Rapture of the Nerds” compatible with all known laws of physics; if humans survive long enough it might even come to pass. The one notion I have real trouble with is that the AI-beings of the future would be no more comprehensible to us than we are to dogs (or mice, or fish, or snails). After all, we might similarly expect that there should be models of computation as far beyond Turing machines as Turing machines are beyond finite automata. But in the latter case, we know the intuition is mistaken. There is a ceiling to computational expressive power. Get up to a certain threshold, and every machine can simulate every other one, albeit some slower and others faster. Now, it’s clear that a human who thought at ten thousand times our clock rate would be a pretty impressive fellow. But if that’s what we’re talking about, then we don’t mean a point beyond which history completely transcends us, but “merely” a point beyond which we could only understand history by playing it in extreme slow motion. Yet while I believe the latter kind of singularity is possible, I’m not at all convinced of Kurzweil’s thesis that it’s “near” (where “near” means before 2045, or even 2300). I see a world that really did change dramatically over the last century, but where progress on many fronts (like transportation and energy) seems to have slowed down rather than sped up; a world quickly approaching its carrying capacity, exhausting its natural resources, ruining its oceans, and supercharging its climate; a world where technology is often powerless to solve the most basic problems, millions continue to die for trivial reasons, and democracy isn’t even clearly winning over despotism; a world that finally has a communications network with a decent search engine but that still hasn’t emerged from the tribalism and ignorance of the Pleistocene. And I can’t help thinking that, before we transcend the human condition and upload our brains to computers, a reasonable first step might be to bring the 18th-century Enlightenment to the 98% of the world that still hasn’t gotten the message. Appendix The Coming Technological Singularity: How to Survive in the Post-human Era (reprint) Vernor Vinge Department of Mathematical Sciences, San Diego State University (c) 1993 by Vernor Vinge (Verbatim copying/translation and distribution of this entire article is permitted in any medium, provided this notice is preserved.) This article was for the VISION-21 Symposium sponsored by NASA Lewis Research Center and the Ohio Aerospace Institute, March 30–31, 1993. It is also retrievable from the NASA technical reports server as part of NASA CP-10129. A slightly changed version appeared in the Winter 1993 issue of Whole Earth Review. Abstract Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended. Is such progress avoidable? If not to be avoided, can events be guided so that we may survive? These questions are investigated. Some possible answers (and some further dangers) are presented. What is The Singularity? The acceleration of technological progress has been the central feature of this century. I argue in this paper that we are on the edge of change comparable to the rise of human life on Earth. The precise cause of this change is the imminent creation by technology of entities with greater than human intelligence. There are several means by which science may achieve this breakthrough (and this is another reason for having confidence that the event will occur):