Proposal for EA fund about x-risks

https://www.lesswrong.com/posts/DF7pJLfJvkiQwDoTb/proposal-for-ea-fund-about-x-risks

**Introduction**

Elon Musk seems to care a lot about x-risks, and particularly about misaligned AGI, but he does not seem to have updated toward shorter timelines so far and appears to have some misunderstandings about AGI risk. From what I see on this forum and from progress in ML, it seems that we have about 5-10 years until misaligned (maybe prosaic) AGI. It might be very helpful to ensure that somebody like Elon is informed and ready to take action if needed.

[Epistemic status: just my thoughts about possible low-hanging fruit. Also, I am somewhat positively biased about Elon Musk.]

For clarification, by "we" I mean "people who are interested in this kind of project and probably have very pessimistic AGI timelines".

**How much will this proposal cost?**

Around $50-100k. Something like: "Hello Elon, we are willing to pay you $50k for 4 hours of iterated conversations with us; it is about AGI." Or maybe there is someone in the EA community who can contact him directly. I don’t really know how this works, but I think it will be the easiest part.

To persuade Elon, I imagine somebody like Robert Miles, Scott Alexander, Gwern, maybe even Eliezer Yudkowsky himself. I know that he doesn’t like Musk because of the OpenAI episode. But Elon only co-founded it without really participating, it was 7 years ago, and it was originally planned to be an AI safety lab. In any case, each member of a team of 2-5 people would probably be happy with something like $10k each for that job, or even less. Again, I don’t know exactly how much it will cost, but I am sure that it is relatively cheap.

That’s it. Plus a couple of weeks discussing strategy, and in the end we might have a billionaire with a giant social network at his disposal aligned with the AI safety community. To me that sounds like a very good deal. Even if you think the actions I came up with are not going to work, I still think there is a lot of value in changing people’s minds, especially powerful ones. Elon just seems to be the easiest one to start with. Also, if it turns out that Elon doesn’t actually understand what the problem is, and/or is really misaligned, then I agree that it would be a bad idea to go further. So the first part is more like an experiment to figure out his intentions and models, and maybe nudge them in the right direction if possible. But I am fairly convinced by his words that he does actually care about humanity and more or less understands AI risk. The second part is about probable actions we can take with his help.

**Probable actions with the help of Elon Musk:**

1. For some people, the world is a mysterious place by default, and we probably can’t change anything about that. It is just not a viable strategy to try to explain something like the alignment problem to them. So maybe it would be smarter to accept their rules and create a social game for them to play. Before you write a comment about the bad things associated with "using social status", I want to remind you that we happen to live in a universe full of agents that still value it, and many of the agents responsible for adopting policies and making global decisions are no exception at all.

   This part is the hardest. We can develop a comprehensive exam called something like the "AI Alignment theoretical minimum". It will be much harder to make it as hard to game as Landau’s theoretical minimum in physics, but I assume it is worth a try.

   I imagine a two-step exam. The first part is the easiest, for initial screening: just a multiple-choice test with a few dozen questions. The second part has harder questions without answers to choose from and requires somebody to evaluate the answers; for this we will probably need to hire several AI safety researchers.

   Then Elon can create a unique badge on Twitter indicating those who have successfully passed the exam, or failed it while trying. The exam can be retaken, but only after a certain amount of time, like 6 months. The shiny mysterious badge, upon tapping or hovering a cursor over it, will explain the core ideas behind it and link to further materials and the exam itself. As a Schelling fence, it might be wise to precommit to creating only one such badge ever, so it will hopefully become a noticeable social status signal.

2. Financing "Convincing All Capability Researchers" and/or promoting it in a way that makes their expected failure as shameful and noticeable as possible.

3. We can secretly [REDACTED]

4. Your option for "how to convince a lot of people that we’re all going to die if nothing changes" might go here.

**Can I clearly tell a story about how this reduces x-risk?**

For the exam-and-badge story: if we managed to create a good social signal associated with AI-risk, it would be much easier to persuade policy makers to do something. More people may start to investigate the problem and change their minds. This is especially important for capability researchers. I imagine something like:

Yet another skeptical ML researcher (R):

R: Oh, look at these lunatics and their stupid badge. I am sure I can easily game their stupid exam and then post even more about what lunatics they are. Let’s see... hm, "if you fail the exam, you cannot retry it or remove the badge for the next 6 months". OK then, I am going to look at their stupid booklets and make sure that I pass the exam.

*several hours later*

R: Holy sh*t, they are not lunatics, we are really screwed.

[Good ending]

However, this can backfire if we create an exam that is easy to game, so my guess is that this is the most probable point of failure.

*several hours later*

R: I told you, a stupid easy exam that means nothing. Look at me, now I am an AI safety lunatic, our microwaves are going to kill us lol.

[Bad ending]

Of course, it is not about one person. It is about the percentage of "good endings" compared to "bad ones", as a function of the quality of the exam.

It may also backfire if, as more people start to speak about it, it generates an information cascade, resulting in an uncontrollable change in public opinion, which could be very harmful; but I don’t think that is probable. Big interventions will always carry some probability of a very bad outcome. But inaction seems to lead to catastrophe.

**What it is not about**

I don’t claim that we can simply buy opinions on this topic. But we can try to incentivise people to look into it, perhaps by publicly rewarding those who succeed with something interesting, not money.

I don’t claim that Elon Musk will figure it all out by himself and that we can simply outsource this to him. But we can use his help in order to probably achieve more.

I don’t think that I have a complete list of possible further actions. It may well be that somebody will come up with something better. This is more about ensuring alignment with powerful people.

It is not only about Elon Musk. I just think it would be easier with him. It would be great to explain why we are doomed to other billionaires and presidents and so on, but that seems much harder and more likely to backfire.

TL;DR:

1. Contact Elon, meet with him, and figure out what he thinks about AI risk.
2. If possible, change his mind about crucial points like instrumental convergence, plausible timelines, and the current impossibility of robustly aligning our goals with an AGI.
3. In case of success, we can think about what to do next. Probably something from my list, or something better.

P.S. English is not my primary language, so I am sorry in advance for stupid typos.

Comment

https://www.lesswrong.com/posts/DF7pJLfJvkiQwDoTb/proposal-for-ea-fund-about-x-risks?commentId=NqD7TjHYPXyrrPwLc

You make it sound like Elon Musk founded OpenAI without speaking to anyone in X-risk. That’s not the case. EAs talked with him back then. As far as my memory goes, at least Nick Bostrom did try to explain everything to Musk. Later, I remember someone speaking about talking to Musk during Neuralink’s existence and being disappointed with Musk’s inability to follow arguments related to why Neuralink is not a good plan to avoid AI risk.

  • If we managed to create a good social signal associated with AI-risk, it would be much easier to persuade policy makers to do something.

The goal isn’t to get people "to do something" but to get them to take effective action. If you push AI safety to be something that’s about signaling, you are unlikely to get effective action related to it.

Comment

https://www.lesswrong.com/posts/DF7pJLfJvkiQwDoTb/proposal-for-ea-fund-about-x-risks?commentId=CZfgPNdJ4vfFExewu

Thank you for the reply.

  • You make it sound like Elon Musk founded OpenAI without speaking to anyone in X-risk

I didn’t know about that; it was a good move from EA, so why not try it again? Again, I am not saying that we definitely need to make a badge on Twitter. First of all, we can try to change Elon’s models, and after that we can think about what to do next.

  • Musk’s inability to follow arguments related to why Neuralink is not a good plan to avoid AI risk.

Well, if it is conditional on "there are widespread concerns and regulations about AGI" and "Neuralink is working and can significantly enhance human intelligence", then I can clearly see how it would decrease AI risk. Imagine Yudkowsky with significantly enhanced capabilities, working with several other AI safety researchers and communicating at the speed of thought. Of course, it would mean that no one else gets their hands on that for a while, and we would need to build it before AGI becomes a thing. But it is still possible, and I can clearly see how anybody in 2016, being incapable of predicting current ML progress, would place their bets on something long-term like Neuralink.

  • If you push AI safety to be something that’s about signaling, you are unlikely to get effective action related to it.

If you can’t use the signal before you pass "a really good exam that shows your understanding of the topic", why would it be a bad signal? There are exams that haven’t fallen that badly to Goodhart’s law; for example, you can’t pass a test on calculating integrals without actually having good practical skill. My idea for the badge was more like "trick people into thinking it is easy and that they can get another social signal, then watch them realize the problem after investigating it".

And the whole idea of the post isn’t the "badge"; it’s "talk with powerful people to explain our models to them".