It has been suggested that in a sufficiently rapid takeoff scenario, governance would not be useful, because the transition to superintelligence would be too fast for human actors (whether governments, corporations, or individuals) to respond to. This seems to imply that we only care about takeoff speed. And if that is the only relevant factor, the case for governance only applies if you believe slow takeoff is likely. Of course, it also matters how long we have until takeoff. Even so, I think this view leaves a fair amount on the table in terms of what governance could do, and I want to make the case that even in a fast-takeoff world, governance (still defined broadly[1]) is important, though in different ways.
The Easy/Hard Spectrum
To make the argument, I will lay out three possibilities about AI alignment which are orthogonal to takeoff speed and timing: alignment-by-default, prosaic alignment, and provable alignment. These are really points along a spectrum of alignment difficulty. In any case, for each possibility, governance needs to accomplish very different things in order to be successful, according to the above definition, and the relationship with takeoff speeds seems important, but not fully determinative.
The first possibility, alignment-by-default, is that if we train systems via reinforcement learning or similar, then even without particular effort to solve alignment, all systems which are successful end up learning policies and goals close enough to human values that they are beneficial and influenceable. In the slower takeoff case, governance initially looks a lot like human governance: making sure that actors, both human and AI, can cooperate and follow mutually understood and agreed-upon rules. Later, and in the faster takeoff case, our efforts towards governance become irrelevant as the AI systems replace human structures, or improve them.
The second possibility, prosaic alignment, is that alignment of artificial intelligence systems is somewhat difficult, but achievable via approaches which can be developed. So some systems will be aligned, but without oversight, unaligned systems are possible or likely. In this case, the key task of governance is to ensure that all early HLMI/PASTA/AGI systems undergo robust alignment procedures. Prior to the emergence of such systems, many tasks will be useful for ensuring this outcome, including monitoring progress, developing standards, and building norms about safety. But as above, later and/or in the faster takeoff cases, governance becomes less relevant. Note, however, that this shifts the emphasis toward pre-emergence and early-stage efforts, rather than eliminating the need for governance.
The final possibility is that the only way alignment can occur is via currently-impossible provable alignment. In this case, it may be that there are few potential ways to train safe AGI, and almost all earlier attempts are dangerous. Somewhat similar to the previous case, the key task is to prevent misaligned systems. In a fast takeoff case, the entirety of the usefulness of governance is prior to emergence, perhaps via intensive monitoring or limits on compute, while in a slow takeoff case, there is some chance that governance can prevent disaster while allowing work in AI, perhaps via some sort of policing, à la lsusr’s Bayeswatch.
Along the different spectra
There are now three different dimensions being discussed. The first is how long we have until takeoff begins, which determines how much time we have to solve the various problems. The second is the difficulty of alignment, which I argued above determines the key task of governance: whether it is to prevent unaligned systems, or to ensure that systems are aligned. And lastly, there is the speed of takeoff, which determines how much time governance has to act once takeoff begins.
In this model, along the latter two dimensions, as either speed or difficulty increases, the relative emphasis on pre-AGI governance increases, and the usefulness of governance during the transition decreases. This leaves us with effectively a single dimension, albeit still one that is orthogonal to when takeoff occurs. And while there is certainly a class of interventions which are helpful towards one end of the spectrum but harmful at the other[2], there is also the real possibility that we can find approaches which are beneficial in both cases.
As a few small examples of what these might look like, regardless of where on the spectrum we are, governance can reduce risks by 1) monitoring compute usage and capabilities to enable response, 2) vastly improving computer security for AI labs, which could prevent or slow at least some forms of takeoff, and 3) building norms around care taken in development, testing, and deployment of proto-AGI systems.
[1] Allan Dafoe has suggested that "AI governance concerns how humanity can best navigate the transition to a world with advanced AI systems." This seems broadly correct, and to add to it, he has suggested it concerns "norms and institutions shaping how AI is built and deployed, as well as the policy and research efforts to make it go well."
[2] This analysis implies that the vast majority of governance efforts matter in slow takeoff / relatively easy alignment worlds, but are irrelevant or in some cases even harmful in faster takeoff / harder alignment worlds. This is an issue, but the existence of such tradeoffs alone does not imply that these approaches should not be seriously considered or pursued.
Thanks to Allan Dafoe for very helpful feedback on an earlier version of this.
Comment
In the post, I wanted to distinguish between two things you’re now combining: how hard alignment is, and how long we have. And yes, combining these, we get the issue of how hard it will be to solve alignment in the time frame we have until we need to solve it. But they are conceptually distinct.
And neither of these directly relates to takeoff speed, which in the current framing is something like the time frame from when we have systems that are near-human until they hit a capability discontinuity. You said "First off, takeoff speed and timing are correlated: if you think HLMI is sooner, you must think progress towards HLMI will be faster, which implies takeoff will also be faster." This last implication might be true, or might not. I agree that there are many worlds in which they are correlated, but there are plausible counter-examples. For instance, we may continue with fast progress and get to HLMI and a utopian freedom from almost all work, but then hit a brick wall on scaling deep learning, and have another AI winter until we figure out how to make actual AGI which can then scale to ASI, and that new approach could lead to either a slow or a fast takeoff. Or we may have progress slow to a crawl due to the costs of scaling input and compute until we get to AGI, at which point self-improvement takeoff could be near-immediate, or could continue glacially.
And I agree with your claims about why Eliezer is pessimistic about prosaic alignment, but that’s not why he’s pessimistic about governance, which is a mostly unrelated pessimism.
Comment
Like I said in my first comment, the in-practice difficulty of alignment is obviously connected to timeline and takeoff speed.
But you’re right that you’re talking about the intrinsic difficulty of alignment vs. takeoff speed in this post, not the in-practice difficulty.
But those are also still correlated, for the reasons I gave: mainly that a discontinuity is an essential step in both Eliezer-style pessimism and fast takeoff views. I’m not sure how close this correlation is.
Do these views come apart in other possible worlds? I.e. could you believe in a discontinuity to a core of general intelligence but still think prosaic alignment can work?
I think that potentially you can, if you think there are still enough capabilities in pre-HLMI (pre-discontinuity) AI to help you do alignment research before dangerous HLMI shows up. But prosaic alignment seems to require more assumptions to be feasible given a discontinuity, such as that the discontinuity doesn’t occur before you have all the important capabilities you need to do good alignment research.
Comment
I’m not sure I agree with the compatibility of discontinuity and prosaic alignment, though you make a reasonable case, but I do think there is compatibility between slower governance approaches and discontinuity, if it is far enough away.
It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches, as well as effective interdisciplinary management. In my inside view, the highest-marginal-impact interventions involve making multiple different things go right simultaneously for the first AGIs, which is not trivial, and the stakes are astronomical.
Little clear progress has been made on provable alignment after over a decade of trying. My inside view is that it got privileged attention because the first people to take the problem seriously happened to be highly abstract thinkers. Then they defined the scope and expectations of the field, alienating other perspectives and creating a self-reinforcing trapped prior.
Comment
First, I think it’s ludicrous to say "Little clear progress has been made on provable alignment after over a decade of trying." The progress is actually quite amazing. Yes, we’re decades away from a solution to provable alignment, if one is possible at all, but not only has there been some really amazing and groundbreaking work coming out of MIRI, you also aren’t paying attention if you don’t see all of the contributions that work made to the questions which "prosaic alignment" is now trying to answer.
Second, "It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches," is correct, but several years too late, given that this is already the majority of the work being done.