Review of ‘Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More’

https://www.lesswrong.com/posts/ZPEGLoWMN242Dob6g/review-of-debate-on-instrumental-convergence-between-lecun

I think that ‘robust instrumentality’ is a more apt name for ‘instrumental convergence.’ That said, for backwards compatibility, this post often uses the latter.

In the summer of 2019, I was building up a corpus of basic reinforcement learning theory. I wandered through a sun-dappled Berkeley, my head in the clouds, my mind bent on a single ambition: proving the existence of instrumental convergence. Somehow. I needed to find the right definitions first, and I couldn’t even imagine what the final theorems would say. The fall crept up on me… and found my work incomplete.

Let me tell you: if there’s ever been a time when I wished I’d been months ahead on my research agenda, it was September 26, 2019: the day when world-famous AI experts debated whether instrumental convergence was a thing, and whether we should worry about it.

The debate unfolded below the link-preview: an imposing robot staring the reader down, a title containing ‘Terminator’, a byline dismissive of AI risk:

> **Scientific American: Don’t Fear the Terminator**
>
> "Artificial intelligence never needed to evolve, so it didn’t develop the survival instinct that leads to the impulse to dominate others."

The byline seemingly denies the antecedent: "evolution $\implies$ survival instinct" does not imply "no evolution $\implies$ no survival instinct." (Even granting that evolution produces survival instincts, a survival instinct could arise another way: for example, because self-preservation is instrumentally useful for whatever objective the AI is given.) That said, the article raises at least one good point: we choose the AI’s objective, so why must that objective incentivize power-seeking?

I wanted to reach out, to say, "hey, here’s a paper formalizing the question you’re all confused by!" But it was too early. Now, at least, I can say what I wanted to say back then:

This debate about instrumental convergence is really, really confused.

I heavily annotated the play-by-play of the debate in a Google doc, mostly checking the local validity of claims. (**Most of this review’s object-level content is in that document, by the way.** Feel free to add comments of your own.)

This debate took place in the pre-theoretic era of instrumental convergence. Over the last year and a half, I’ve become a lot less confused about instrumental convergence. I think my formalisms provide great abstractions for understanding "instrumental convergence" and "power-seeking" (see the sketch at the end of this review). I think that this debate suffers for lack of formal grounding, and I wouldn’t dream of introducing someone to these concepts via this debate.

While the debate is clearly historically important, I don’t think it belongs in the LessWrong review. I don’t think people significantly changed their minds, I don’t think that the debate was particularly illuminating, and I don’t think it contains the philosophical insight I would expect from a LessWrong review-level essay.

Rob Bensinger’s nomination reads:

> May be useful to include in the review with some of the comments, or with a postmortem and analysis by Ben (or someone). I don’t think the discussion stands great on its own, but it may be helpful for: […]

Yann LeCun:

> … instrumental subgoals are much weaker drives of behavior than hardwired objectives. Else, how could one explain the lack of domination behavior in non-social animals, such as orangutans.

I’m glad that this debate happened, but I think it monkeys around too much to be included in the LessWrong 2019 review.
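For the curious, here is a minimal sketch of the formal abstraction I mentioned above, eliding the technical setup (finite MDPs, normalization details); the exact definitions and theorems live in the paper. Fix a discount rate $\gamma \in (0,1)$ and a distribution $\mathcal{D}$ over reward functions. Roughly, the power of a state is its normalized average optimal value:

$$\mathrm{POWER}_{\mathcal{D}}(s, \gamma) := \frac{1-\gamma}{\gamma}\, \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^*_R(s, \gamma) - R(s) \right]$$

where $V^*_R(s, \gamma)$ is the optimal value of state $s$ under reward function $R$. Informally, a state is powerful to the extent that it offers high optimal value across many possible goals. ‘Instrumental convergence’ (robust instrumentality) then becomes a theorem-shaped claim rather than a vibe: for most reward functions drawn from $\mathcal{D}$, optimal policies tend to steer toward high-$\mathrm{POWER}$ states, such as states which keep more options open.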

Comment

https://www.lesswrong.com/posts/ZPEGLoWMN242Dob6g/review-of-debate-on-instrumental-convergence-between-lecun?commentId=HqtD6eK5B3gF9xfYf

Yeah I agree. I think it’s useful to have a public record of it, and I’m glad that public conversation happened, but I don’t think it’s an important part of the ongoing conversation in the rationality community, and the conversation wasn’t especially insightful. I hope some day we’ll have better debates with more resources devoted by either side than a FB comment thread, and perhaps one day that will be good for the review.