Epistemic status: Written for Blog Post Day III. I don’t get to talk to people "in the know" much, so maybe this post is obsolete in some way.
I think that at some point at least one AI project will face an important choice between deploying and/or enlarging a powerful AI system, or holding back and doing more AI safety research. (Currently, AI projects face choices like this all the time, except they aren’t important in the sense I mean it, because the AI isn’t potentially capable of escaping and taking over large parts of the world, or doing something similarly bad.)
Moreover, I think that when this choice is made, most people in the relevant conversation will be insufficiently concerned/knowledgeable about AI risk. Perhaps they will think: "This new AI design is different from the classic models, so the classic worries don’t arise." Or: "Fear not, I did [insert amateur safety strategy]."
I think it would be very valuable for these conversations to end with "OK, we’ll throttle back our deployment strategy for a bit so we can study the risks more carefully," rather than with "Nah, we’re probably fine, let’s push ahead." This buys us time. Say it buys us a month. A month of extra time right after scary-powerful AI is created is worth a lot, because we’ll have more serious smart people paying attention, and we’ll have more evidence about what AI is like. I’d guess that a month of extra time in a situation like this would increase the total amount of quality-weighted AI safety and AI policy work by 10%. That’s huge.
One way to prepare for these conversations is to raise awareness about AI risk and technical AI safety problems, so that it’s more likely that more people in these conversations are more informed about the risks. I think this is great.
However, there’s another way to prepare, which I think is tractable and currently neglected:
1. Identify some people who might be part of these conversations, and who already are sufficiently concerned/knowledgeable about AI risk.
2. Help them prepare for these conversations by giving them resources, training, and practice, as needed:
   - 2a. Resources:
     - Perhaps it would be good to have an Official List of all the AI safety strategies, so that whatever rationale people give for why this AI is safe can be compared to the list. (See this prototype list.)
     - Perhaps it would be good to have an Official List of all the AI safety problems, so that whatever rationale people give for why this AI is safe can be compared to the list, e.g. "OK, so how does it solve outer alignment? What about mesa-optimizers? What about the malignity of the universal prior? I see here that your design involves X; according to the Official List, that puts it at risk of developing problems Y and Z..." (See this prototype list.)
     - Perhaps it would be good to have various important concepts and arguments re-written with an audience of skeptical and impatient AI researchers in mind, rather than the current audience of friends and LessWrong readers.
   - 2b. Training & practice: Maybe the person is shy, or bad at public speaking, or bad at keeping cool and avoiding fluster in high-stakes discussions. If so, some coaching and practice could go a long way. Maybe they have the opposite problems, frequently coming across as overconfident, arrogant, aggressive, or paranoid. If so, someone should tell them this and help them tone it down. In general it might be good to do some role-play exercises or something, to prepare for these conversations. As an academic, I’ve seen plenty of mock-dissertation-defense sessions and mock-job-talk-question-sessions, which seem to help. And maybe there are ways to get even more realistic practice, e.g. by trying to convince your skeptical friends that their favorite AI design might kill them if it worked.
Note that most of part 2 can be done without having done part 1. This is important in case we don’t know anyone who might be part of one of these conversations, which is true for many and perhaps most of us.
Why do I think this is tractable? Well, it seems like the sort of thing that people producing AI safety research can do on the margin, just by thinking more about their audience and maybe recording their work (or other people’s work) on some Official List. Moreover, people who don’t do (or even read) AI safety research can contribute to this, e.g. by reading the literature on how to practice for situations like this, and writing up the results.
Why do I think this is neglected? Well, maybe it isn’t. In fact I’d bet that some people are already thinking along these lines. It’s a pretty obvious idea. But just in case it is neglected, I figured I’d write this. Moreover, the Official Lists I mentioned don’t exist, and I think they would if people were taking this idea seriously. Finally, and this more than anything else is what caused me to write this post, I’ve heard one or two people explicitly call this out as something that they *don’t* think is an important use case for the alignment research they were doing. I disagreed with them, and here we are.
If this is a bad idea, I’d love to know why.
Hey Daniel, don’t have time for a proper reply right now but am interested in talking about this at some point soon. I’m currently in UK Civil Service and will be trying to speak to people in their Office for AI at some point soon to get a feel for what’s going on there, perhaps plant some seeds of concern. I think some similar things apply.
Sure, I’d be happy to talk. Note that I am nowhere near the best person to talk to about this; there are plenty of people who actually work at an AI project, who actually talk to AI scientists regularly, etc.
Planned summary for the Alignment Newsletter:
Sounds good. Thanks! My current opinion is basically not that different from yours.
Thanks for the thoughtful pushback! It was in anticipation of comments like this that I put hedging language in like "I think" and "perhaps." My replies:
> This seems a bit like writing the bottom line first? Like, AI fears in our community have come about because of particular arguments. If those arguments don’t apply, I don’t see why one should strongly assume that AI is to be feared, outside of having written the bottom line first.
1. Past experience has shown that even when particular AI risk arguments don’t apply, often an AI design is still risky; we just haven’t thought of the reasons why yet. So we should make a pessimistic meta-induction and conclude that even if our standard arguments for risk don’t apply, the system might still be risky, and we should think more about it.
2. I intended those two "perhaps..." statements to be things the person says, not necessarily things that are true. So yeah, maybe they say the standard arguments don’t apply. But maybe they are wrong. People are great at rationalizing, coming up with reasons to get to the conclusion they wanted. If the conclusion they want is "We finally did it and made a super powerful impressive AI, come on come on let’s take it for a spin!" then it’ll be easy to fool yourself into thinking your architecture is sufficiently different as to not be problematic, even when your architecture is just a special case of the architecture in the standard arguments.
Points 1 and 2 are each individually sufficient to vindicate my claims, I think.
Though the world this points at is pretty scary (a powerful AI system ready to go, only held back by the implementors buying safety concerns), the intervention does seem cheap and good. I wonder whether part 1 will be easy. I think it relies on the first AI systems being made by one of a small selection of easily-identifiable orgs.
Interesting. I’d love to hear more about the sorts of worlds conditioned on in your (b). For my part, the worlds I described in the original post seem both the most likely and also not completely hopeless—maybe with a month of extra effort we can actually come up with a solution, or else a convincing argument that we need another month, etc. Or maybe we already have a mostly-working solution by the time The Talk happens and with another month we can iron out the bugs.
I just wanted to say that this is a good question, but I’m not sure I know the answer yet. Worlds that appear most often in my musings (but I’m not sure they’re likely enough to count) are:
- An aligned group getting a decisive strategic advantage
- Safety concerns being clearly demonstrated and part of mainstream AI research
  - Perhaps general reasoning about agents and intelligence improves, and we can apply these techniques to AI designs
  - Perhaps things contiguous with alignment concerns cause failures in capable AI systems early on
- A more alignable paradigm overtaking ML
  - This seems like a fantasy
  - Could be because ML gets bottlenecked or a different approach makes rapid progress
Thanks, that was an illuminating answer. I feel like those three worlds are decently likely, but that if those worlds occur purchasing additional expected utility in them will be hard, precisely because things will be so much easier. For example, if safety concerns are part of mainstream AI research, then safety research won’t be neglected anymore.
You can purchase additional EU by pumping up their probability as well.
EDIT: I know I originally said to condition on these worlds, but I guess that’s not what I actually do. Instead, I think I condition on not-doomed worlds.
Ah, that sounds much better to me. Yeah, maybe the cheapest EU lies in trying to make these worlds more likely. I doubt we have much control over which paradigms overtake ML, and I think that the intervention I’m proposing might help make the first and second kinds of world more likely (because maybe with a month of extra time to analyze their system, the relevant people will become convinced that the problem is real).