AI Alignment Using Reverse Simulation

https://www.lesswrong.com/posts/7iJWxMrzngoHiBaRk/ai-alignment-using-reverse-simulation

The term "simulation" is often used to describe a computer program imitating a physical process. In this post, I use the term "reverse simulation" to refer to the opposite scenario: a computer program being simulated using physics.

The approach I am suggesting here is meant to address one problem of AI alignment that belongs to the philosophy of mind: how can we prevent AGI from accidentally causing the suffering of minds, when we do not yet have an objective theory of minds? It is not evident to me that the suffering of minds can be assigned a utility. Of course, assuming one could assign a utility, there might be a correct decision theory. However, if the suffering of minds cannot be accurately described, then some philosophical unsoundness follows from that inaccurate description.

The basic idea is to use two steps for aligning AGI: