(A longer text-based version of this post is also available on MIRI’s blog here, and the bibliography for the whole sequence can be found here)
I actually have some understanding of what MIRI’s Agent Foundations work is about
This post (and the rest of the sequence) was the first time I had ever read something about AI alignment and thought that it was actually asking the right questions. It is not about a sub-problem, it is not about marginal improvements. Its goal is a gears-level understanding of agents, and it directly explains why that’s hard. It’s a list of everything which needs to be figured out in order to remove all the black boxes and Cartesian boundaries, and understand agents as well as we understand refrigerators.
I nominate this post for two reasons.
One, it is an excellent example of providing supplemental writing about basic intuitions and thought processes, which is extremely helpful to me because I do not have a good enough command of the formal work to intuit them.
Two, it is one of the few examples of experimenting with different kinds of presentation. I feel like this is underappreciated and under-utilized; better ways of communicating seem like a strong baseline requirement of the rationality project, and this post pushes in that direction.
This post has significantly changed my mental model of how to understand key challenges in AI safety, and has also given me a clearer understanding of, and language for describing, why complex game-theoretic challenges are poorly specified or understood. The terms and concepts in this series of posts have become a key part of my basic intellectual toolkit.
This sequence was the first time I felt I understood MIRI’s research. (Though I might prefer to nominate the text-version that has the whole sequence in one post.)
I read the sequence as research for my EA/rationality novel. It was really good, and also pretty easy to follow despite my not having any technical background.