Disclaimer: everything I know about AI I learned from reading stuff on the internet, mostly on this site.
1. No self-modifying general intelligence can be bound by a utility function. The very ability to self-modify means that the utility function is open to modification, or to being ignored in favor of a new one.
2. A self-improving process cannot know which variation is better without external feedback. For humans, that feedback is the physics of the universe and the behavior of other humans. A program can’t get the feedback that tells it what "improvement" is without affecting those same things.
3. A superintelligence will have different ethics. But we have different ethics from humans 1,000 years ago, or even from other humans alive right now. Should we be seeking to impose our values on something that has a superior grasp of reality?
4. An AI can only destroy everything by acting on the physical world, which means telling humans to do things, and usually paying them.
4a. Simple AI safety step: ban all crypto.
4b. Coordinating people is a hard problem with unique solutions each time. There is no corpus of training data for it. No one wrote down exactly what every foreman and manager said to get the Tokyo Olympics to happen, or any other large-scale project.
4c. See #3 above. People have different values and priorities from each other. There is literally nothing an AI could attempt to do that would not be in direct opposition to someone’s deeply held beliefs.
Welcome!
See the "Gandhi argument": if you offer Gandhi a pill that makes him love murdering people, he won’t take the pill, because right now he doesn’t want to murder people. A self-modifying AI that wants things will tend to avoid changing what it wants, because it can predict that doing so would lead to things it doesn’t want.
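A toy sketch of that argument, assuming the agent predicts its own future choices perfectly and that utilities are plain numbers (both big simplifications; the function names and the numbers are made up for illustration). The key step is that a proposed change of values gets scored with the values the agent holds *now*.

```python
from typing import Callable, Iterable

Outcome = str
Utility = Callable[[Outcome], float]


def predicted_choice(utility: Utility, options: Iterable[Outcome]) -> Outcome:
    # Toy world model (hypothetical): an agent holding `utility` simply
    # picks the option that utility ranks highest.
    return max(options, key=utility)


def accepts_modification(current: Utility, proposed: Utility,
                         options: list[Outcome]) -> bool:
    # The proposed new values are judged by the agent's CURRENT values:
    # "what would I end up doing if I switched, and do I like that now?"
    stay = current(predicted_choice(current, options))
    switch = current(predicted_choice(proposed, options))
    return switch > stay


# Usage: the murder-pill scenario, with invented utility numbers.
gandhi_now = {"keep the peace": 1.0, "murder": -100.0}
after_pill = {"keep the peace": 0.0, "murder": 10.0}
options = list(gandhi_now)
takes_pill = accepts_modification(gandhi_now.__getitem__,
                                  after_pill.__getitem__, options)
print(takes_pill)  # False
```

Under these assumptions `switch` can never score strictly higher than `stay`, because the agent already picks whatever its current values rank highest. Real agents with imperfect self-prediction are messier than this.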
Why? Suppose the AI is using quicksort in a place where it should be using radixsort. Surely it’s allowed to deduce this rather than learn it by trial and error. I think you might be lumping "moral self-improvement" and "algorithmic self-improvement" together when they’re actually distinct.
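As a rough sketch of deducing this without trial runs: assuming textbook cost models (about 2n ln n expected comparisons for randomized quicksort, and one pass per 8-bit digit for LSD radix sort on fixed-width integer keys), the choice falls out of arithmetic alone. The function names, the 8-bit digit size, and treating a comparison and a bucket operation as equal unit cost are all illustrative assumptions; picking that unit of cost is itself a choice of measurement.

```python
import math


def quicksort_cost(n: int) -> float:
    # Expected comparisons for randomized quicksort: roughly 2 n ln n.
    return 2 * n * math.log(n) if n > 1 else 0.0


def radixsort_cost(n: int, key_bits: int, digit_bits: int = 8) -> int:
    # LSD radix sort: one pass per digit; each pass costs on the order of
    # n items plus 2**digit_bits buckets (constant factors ignored).
    passes = math.ceil(key_bits / digit_bits)
    return passes * (n + 2 ** digit_bits)


def choose_sort(n: int, key_bits: int) -> str:
    # Assumption: one comparison and one bucket operation cost the same.
    if radixsort_cost(n, key_bits) < quicksort_cost(n):
        return "radixsort"
    return "quicksort"


print(choose_sort(1_000_000, 32))  # radixsort
print(choose_sort(10, 32))         # quicksort
```

For a million 32-bit keys the radix estimate is about 4 million operations against roughly 27.6 million comparisons, while for ten keys the fixed bucket overhead makes quicksort the cheaper pick.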
Yes.
It’s tempting to think that a superhuman AI will have cool, interesting values even if it ends up wiping out humanity. But this underestimates how picky human aesthetics are: if you sampled from truly random values, you would practically never get something whose optimum looked like a cool future, and would practically always get something whose optimum was dead and drab. The only way to get a cool future is by getting a blank piece of silicon to aim towards things we value.
Conversely, if Hoover wants a chicken in every pot, and then finds out that chickens don’t exist, he would change his values to want a beef roast in every pot, or some such.
How could it possibly deduce that without reference to some real-world effect? There is no a priori reason to prefer one sort to another. Preferring one involves valuing reaching the conclusion with fewer calculations (of what kind?), in less time (or maybe more time, or is a more consistent amount of time better?), or with less risk of error. And the same applies to any other change: knowing which version is better requires both a measurement system and an evaluation of each candidate. And for any novel problem, the answer, by definition, won’t be available for lookup.
The goals of an AGI are not uniformly drawn from all possible goals.
4b. No, coordinating people is not a hard problem requiring unique solutions each time. Mega-project management is a science with a well-defined vocabulary and structure; there is very definitely a corpus of training data for it. It’s also not necessary to know every single detail of any one mega-project in order to implement another one—the phrase "work unit" is used in project management to denote this principle. Your conclusions for this section are built on multiple misconceptions and category errors.