Question: Why is AI alignment hard?
Answer: There's the "we never figure out how to reliably instill AIs with human-friendly goals" filter, which seems quite challenging given open problems like [https://www.youtube.com/watch?v=bJLcIBixGj8 inner alignment], specifying morality in a way that can actually be coded up, interpretability, etc.
There's the "race dynamics mean that even though we know how to build the thing safely the first group to cross the recursive self-improvement line ends up not implementing it safely" which is potentially made worse by the twin issues of "maybe robustly aligned AIs are much harder to build" and "maybe robustly aligned AIs are much less compute efficient".
There's the "we solved the previous problems but writing perfectly reliably code in a whole new domain is hard and there is some fatal bug which we don't find until too late" filter. The paper [https://arxiv.org/abs/1701.04739 The Pursuit of Exploitable Bugs in Machine Learning] explores this.
For a much more in-depth analysis, see [https://ai-alignment.com/ai-alignment-landscape-d3773c37ae38 Paul Christiano's AI Alignment Landscape] talk and [https://www.alignmentforum.org/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk The Main Sources of AI Risk?].