Embedded Agents

https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents

(A longer text-based version of this post is also available on MIRI’s blog here, and the bibliography for the whole sequence can be found here)


https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents?commentId=eMPBrLqEQXZm5g3M3

I actually have some understanding of what MIRI’s Agent Foundations work is about

https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents?commentId=7o44mg7ym8ffbNtWx

This post (and the rest of the sequence) was the first time I had ever read something about AI alignment and thought it was actually asking the right questions. It is not about a sub-problem, nor about marginal improvements. Its goal is a gears-level understanding of agents, and it directly explains why that's hard. It's a list of everything that needs to be figured out in order to remove all the black boxes and Cartesian boundaries, and to understand agents as well as we understand refrigerators.

https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents?commentId=F4HAqe2ESrv4NSk5R

I nominate this post for two reasons.

One, it is an excellent example of providing supplemental writing about basic intuitions and thought processes, which is extremely helpful to me because I do not have a good enough command of the formal work to arrive at those intuitions myself.

Two, it is one of the few examples of experimenting with different kinds of presentation. I feel like this is underappreciated and under-utilized; better ways of communicating seem like a strong baseline requirement of the rationality project, and this post pushes in that direction.

https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents?commentId=q5D8HQkL9maM3mL37

This post has significantly changed my mental model of how to understand key challenges in AI safety, and it has also given me a clearer understanding of, and language for describing, why complex game-theoretic challenges are poorly specified or understood. The terms and concepts in this series of posts have become a key part of my basic intellectual toolkit.

https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents?commentId=PeeQx3QGBc4PQtyyA

This sequence was the first time I felt I understood MIRI's research. (Though I might prefer to nominate the text version that has the whole sequence in one post.)

https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents?commentId=ptM3SJTeewSmshptt

I read this sequence as research for my EA/rationality novel. It was really good and also pretty easy to follow despite my not having any technical background.