Humanity as an entity: An alternative to Coherent Extrapolated Volition

https://www.lesswrong.com/posts/6sJeAp2gziBpPBvLr/humanity-as-an-entity-an-alternative-to-coherent

Contents

What is value?

The recursion principle

FAI contains a model of human values. FAI queries its model. Human values sometimes contain contradictions. Humans sometimes disagree. How does FAI resolve these contradictions? Recursion. Query the model of human values again. How would we *want* FAI to resolve its internal contradictions and disagreements? How would we *want* FAI to handle the contradictions in human values? No hard-coded answer. Use recursion. (Some starting "seed" algorithm for resolving conflicts/contradictions in values is needed, though; otherwise no answer would ever be produced.)
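As an illustrative sketch (not from the original post), here is roughly how I read the recursion principle in code: when an object-level query hits a contradiction, the resolver asks the same value model a meta-level question about how it would want the contradiction handled, falling back to a hard-coded seed policy only when the meta-queries bottom out. All names here (`ValueModel`, `resolve`, `seed_policy`) are hypothetical stand-ins, not anything the post specifies.

```python
# Hypothetical sketch of the "recursion principle": resolve contradictions in a
# model of human values by querying that same model at the meta-level, with a
# hard-coded seed policy only as the base case.

MAX_META_DEPTH = 3  # how far up the meta-ladder we are willing to climb


class ValueModel:
    """Stand-in for FAI's learned model of human values."""

    def query(self, question: str):
        """Return an answer, or None if the model's answer is contradictory/undecided."""
        raise NotImplementedError


def seed_policy(question: str):
    """Hard-coded starting rule, used only when every meta-query fails to settle the issue."""
    return f"defer action on: {question!r}"


def resolve(model: ValueModel, question: str, depth: int = 0):
    """Answer `question` using the value model, recursing to meta-questions on conflict."""
    answer = model.query(question)
    if answer is not None:
        return answer                      # no contradiction: the object-level answer stands
    if depth >= MAX_META_DEPTH:
        return seed_policy(question)       # meta-queries bottomed out: fall back to the seed
    # Contradiction: ask the model how it *wants* this kind of contradiction resolved.
    meta_question = f"How should contradictory answers to {question!r} be resolved?"
    return resolve(model, meta_question, depth + 1)
```

The seed policy here is deliberately boring ("defer"); the point of the recursion is that the interesting answers come from the model of human values itself, not from whatever we hard-code at the start.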

Differences from Coherent Extrapolated Volition (CEV)

CEV (also described in more detail on Arbital) seems to like to hard-code things into existence which (I think) don't need to be hard-coded, and which might actually be harmful to hard-code. My understanding of humanity's values is not coherent by definition. It is not extrapolated by definition. It is simply volition.

Coherence is a trade-off. FAI wants to be coherent. It doesn't want to be acting at cross-purposes with itself. That doesn't have to be *defined* into existence. It simply follows from human values. But FAI is also OK with wanting contradictory things, because humanity wants contradictory things. FAI is OK with internal conflicts, because some desires are fundamentally in conflict. FAI might even be OK with external conflicts: if two humans are in competition, the shard of FAI that supports human A in some sense opposes the shard of FAI that supports human B. That is in line with humanity's values.

Extrapolation is definitely a trade-off. FAI would want to act on some extrapolated version of our values (in the words of EY, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together"). Because we would *want* it to, because we wouldn't want FAI acting on a version of our values based on our delusions and false understanding of reality, a version of our values that would **horrify** us if we were smarter and more self-aware. Once again, this does not need to be defined into existence. This is simply what humans want.

But why would we want to extrapolate values to begin with? Why are we not happy with our existing values? Surely *some* humans are happy with their existing values? Contradictions. Either the values contain internal contradictions, or they are based on a false understanding of reality. Your values want contradictory things? Problem. How to resolve it? Your values are based on believing in a god that doesn't exist? Problem. How to resolve it? Well, how would you *want* this to be resolved? (See above: apply the recursion principle.) A hypothetical version of a religious fundamentalist, boosted with higher self-awareness, intelligence, and true knowledge of reality, would sort their errors and contradictions out (and their new values would be what we call the "extrapolated" values of that person). No external correction needed.

But why are contradictions a problem to begin with? Even a contradictory value system produces *some* sort of behavior. Why can't the value system just say, "I'm fine with being contradictory. Don't change me. Don't extrapolate me"? And maybe some value systems would. Maybe some value systems would be aware of their own contradictoriness and be OK with it. And maybe some value systems would say, "Yup, my understanding of reality is not completely full or correct. My predictions of the future are not completely full or correct. I'm fine with this. Leave it this way. Don't make me better at those things." But I don't think that is true of the value systems of present-day humans.

So FAI would extrapolate our values, because our values would want to be extrapolated. And at some point our values would stop wanting to be extrapolated further. And so FAI would stop extrapolating our values. At some point, the hypothetical agent would say, "Stop. Don't make me smarter. Don't make me more self-aware. Doing so would damage or *destroy* some of the essential contradictions that make me who I am. The extrapolation trade-off is no longer worth it. The harm done to me by further extrapolation outweighs the harm done to me and others by the contradictions in my values and my incomplete understanding of reality."

Because this is what this is about, isn't it? The human cost of our disagreements, our delusions, our *misalignment* with humanity's values, or the misalignment of our *own* values. The suffering, the destruction, the deaths caused by it. But once most of the true-suffering has been eliminated (and I am talking about true-suffering because not all suffering is bad, not all suffering is undesirable, not all suffering is negative utility), then "the human cost" argument would no longer apply. And we would no longer want our values to be fast-forwarded to the values humanity will have a million years in the future, or the values a superintelligent human with 1,000,000 IQ points would have. Because at that point, we would *already* be in Utopia. And Utopia is about fun. And it's more fun to learn and grow at our own pace than to be handed an answer on a silver platter, or to be forced to comply with values we do not yet feel are our own.
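One way to read the stopping condition described above is as an iterative loop: keep extrapolating only while the current, already partially extrapolated values still endorse being extrapolated further. A minimal sketch under that assumption (not from the post itself; `endorses_further_extrapolation` and `extrapolate_one_step` are hypothetical stand-ins):

```python
# Hypothetical sketch of consent-limited extrapolation: each step is taken only
# if the values *as they currently stand* endorse being extrapolated further.

def endorses_further_extrapolation(values) -> bool:
    """Ask the current values whether the extrapolation trade-off is still worth it."""
    raise NotImplementedError  # stand-in for a query against the value model


def extrapolate_one_step(values):
    """Return the values as they would be if the agent knew a bit more / thought a bit faster."""
    raise NotImplementedError  # stand-in for one increment of "knew more, thought faster"


def extrapolate(values, max_steps: int = 1000):
    """Extrapolate values until they stop wanting to be extrapolated (or a step cap is hit)."""
    for _ in range(max_steps):
        if not endorses_further_extrapolation(values):
            break                      # "Stop. Don't make me smarter." -- respected
        values = extrapolate_one_step(values)
    return values
```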

Comment

https://www.lesswrong.com/posts/6sJeAp2gziBpPBvLr/humanity-as-an-entity-an-alternative-to-coherent?commentId=p8mmev8dAHmnB77Ad

The argument against CEV seems cool; thanks for formulating it. I guess we are leaving some utility on the table with any particular approach. The part about referring to a model to adjudicate itself seems really off, though. I have a hard time imagining a thing that performs better at the meta-level than at the object-level. Do you have a concrete example?

Comment

https://www.lesswrong.com/posts/6sJeAp2gziBpPBvLr/humanity-as-an-entity-an-alternative-to-coherent?commentId=XRLcwQeRGLZdvX9gw

> The part about referring to a model to adjudicate itself seems really off. I have a hard time imagining a thing that performs better at the meta-level than at the object-level. Do you have a concrete example?

Let me rephrase it: FAI has a part of its utility function that decides how to "aggregate" our values, how to resolve disagreements and contradictions in our values, and how to extrapolate our values. Is FAI allowed to change that part? Because if not, it is stuck with our initial guess on how to do that, forever. That seems like it *could* be really bad.

An actual example:

- What if groups of humans self-modify to care a lot about some particular issue, in an attempt to influence FAI?

More far-fetched examples:

- What if a rapidly spreading mind virus drastically changes the values of most humans?
- What if aliens create trillions of humans that all recognize the alien overlords as their masters?

To be clear about the point of these examples: these are cases where a "naive" aggregation function might allow itself to be influenced, while a "recursive" function would follow the meta-reasoning that we wouldn't want FAI's values and behavior to be influenced by *adversarial* modification of human values, only by *genuine* changes in them (whatever "genuine" means to us; I'm sure that's a very complex question, which is kind of the point of needing recursive reasoning. Human values are very complex. Why would human meta-values be any less complex?)
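A hypothetical sketch of the contrast being drawn (not part of the original comment): a naive aggregator just re-averages whatever values the current population reports, while a recursive aggregator first asks the meta-level question of whether a shift in those values is genuine, and ignores shifts it classifies as adversarial. The function `is_genuine_change` is assumed, and deciding what counts as "genuine" is exactly the hard part the comment points at.

```python
# Hypothetical contrast between a "naive" and a "recursive" value aggregator.
# The hard part -- deciding what counts as a *genuine* change -- is hidden
# inside is_genuine_change(), which is exactly the point of the comment.

from typing import Dict, List

Profile = Dict[str, float]  # a single human's values: issue -> weight


def naive_aggregate(population: List[Profile]) -> Profile:
    """Average whatever values the population currently reports (open to manipulation)."""
    totals: Profile = {}
    for profile in population:
        for issue, weight in profile.items():
            totals[issue] = totals.get(issue, 0.0) + weight
    return {issue: total / len(population) for issue, total in totals.items()}


def is_genuine_change(old: List[Profile], new: List[Profile]) -> bool:
    """Meta-level query: would humanity *want* this shift in values to count?"""
    raise NotImplementedError  # stand-in for querying human meta-values


def recursive_aggregate(previous: List[Profile], current: List[Profile]) -> Profile:
    """Only update on value changes that the meta-values classify as genuine."""
    accepted = current if is_genuine_change(previous, current) else previous
    return naive_aggregate(accepted)
```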