Louie and I are sharing a draft of our chapter submission to The Singularity Hypothesis for feedback:
The Singularity and Machine Ethics
Thanks in advance.
Also, thanks to Kevin for suggesting in February that I submit an abstract to the editors. Seems like a lifetime ago, now.
**Edit:** As of 3/31/2012, the link above now points to a preprint.
Gack! I wish I had encountered this quote (from Gladwell) long ago, so I could include it in "The Singularity and Machine Ethics":
Lenat had developed an artificial-intelligence program that he called Eurisko, and he decided to feed his program the rules of the tournament. Lenat did not give Eurisko any advice or steer the program in any particular strategic direction. He was not a war-gamer. He simply let Eurisko figure things out for itself. For about a month, for ten hours every night on a hundred computers at Xerox PARC, in Palo Alto, Eurisko ground away at the problem, until it came out with an answer. Most teams fielded some version of a traditional naval fleet—an array of ships of various sizes, each well defended against enemy attack. Eurisko thought differently. "The program came up with a strategy of spending the trillion on an astronomical number of small ships like P.T. boats, with powerful weapons but absolutely no defense and no mobility," Lenat said. "They just sat there. Basically, if they were hit once they would sink. And what happened is that the enemy would take its shots, and every one of those shots would sink our ships. But it didn’t matter, because we had so many." Lenat won the tournament in a runaway.
The next year, Lenat entered once more, only this time the rules had changed. Fleets could no longer just sit there. Now one of the criteria of success in battle was fleet "agility." Eurisko went back to work. "What Eurisko did was say that if any of our ships got damaged it would sink itself—and that would raise fleet agility back up again," Lenat said. Eurisko won again.
...The other gamers were people steeped in military strategy and history… Eurisko, on the other hand, knew nothing but the rule book. It had no common sense… [But] not knowing the conventions of the game turned out to be an advantage.
[Lenat explained:] "What the other entrants were doing was filling in the holes in the rules with real-world, realistic answers. But Eurisko didn’t have that kind of preconception..." So it found solutions that were, as Lenat freely admits, "socially horrifying": send a thousand defenseless and immobile ships into battle; sink your own ships the moment they get damaged.
Or, an example of human Munchkinism:
Nice find! This will come in handy.
Sounds like the sort of strategy that evolution would invent. Or rather, already has, repeatedly — "build a lot of cheap little war machines and don’t mind the casualties" is standard operating procedure for a lot of insects.
But yeah, it’s an awesome lesson in "the AI optimizes for what you tell it to optimize for, not for what humans actually want."
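To make that lesson concrete, here is a minimal, purely illustrative Python sketch. Nothing in it comes from Eurisko or the actual tournament; the budget, ship types, stats, and scoring rule are all invented. It just shows how an optimizer handed only the stated objective (total firepower within budget, with survivability never mentioned) lands squarely on the "many cheap, defenseless ships" corner of the design space:

```python
# Toy illustration of "the AI optimizes for what you tell it to
# optimize for, not for what humans actually want".
# All ship stats, costs, and the budget are invented for this sketch;
# this is not how Eurisko or the actual wargame worked.

BUDGET = 1000  # abstract budget units

# (name, cost, firepower, armor) -- hypothetical ship types
SHIP_TYPES = [
    ("battleship", 200, 50, 40),
    ("destroyer",   50, 15, 10),
    ("pt_boat",      5,  3,  0),  # cheap, well armed, completely defenseless
]

def stated_objective(ship, count):
    """The objective we *told* the optimizer to maximize: total
    firepower fielded within budget.  Survivability, mobility, and
    'not losing your whole fleet' are never mentioned, so the
    optimizer never cares about them."""
    _name, _cost, firepower, _armor = ship
    return count * firepower

def best_fleet():
    # With a linear objective, the optimum is simply the ship type with
    # the most firepower per unit cost, bought as many times as possible.
    best = max(SHIP_TYPES, key=lambda s: s[2] / s[1])
    name, cost, _fire, _armor = best
    count = BUDGET // cost
    return name, count, stated_objective(best, count)

if __name__ == "__main__":
    name, count, score = best_fleet()
    print(f"optimal fleet: {count} x {name}, stated objective = {score}")
    # -> optimal fleet: 200 x pt_boat, stated objective = 600
    # Exactly the "socially horrifying" answer: nothing in the stated
    # objective penalizes every single ship getting sunk.
```

No amount of extra cleverness in the search fixes this; only a better objective does, and writing down that objective is the hard part.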
Overall, I thought it was very good. I agree that "super optimizer" is more likely to create the correct impression in the average person than "super intelligence", and will stop using the latter term.
The bit about the "golem genie" seems forced, though; I'm not sure it actually clarifies things. It seems like such a direct analogy that I'd expect people who understand "superoptimizer" won't need the analogy, and those who don't understand won't be helped by it. For the latter group of people, it might help to introduce the golem before talking about superoptimization at all. It's quite possible I'm wrong.
Reading this, I felt a strange sense of calm coming over me: we finally have a really good introductory article to the issue, and SingInst finally has people who can write such articles.
I feel like humanity’s future is in good hands, and that SI now has a realistic chance of attracting enough mainstream academic interest to make a difference.
Also, this paragraph:
made me feel like SI might now have a clue about how to usefully put extra money to use if they got it, something I was doubtful about before.
I like this—I feel it does a decent job of showing how your neuroscience posts fit into the FAI/intelligence explosion narrative. A few minor comments:
I like the "superoptimizer" terminology, but this sentence makes it sound like we can expect superintelligence to behave differently merely by calling it something different. I realise this isn’t what you mean—I just feel it would be better rephrased in terms of "this avoids bias-inducing loaded terminology".
Very minor point: it would be nice to add a citation here: someone who says that orgasmium is suboptimal or that most people think orgasmium is suboptimal.
What is it about this particular example that casts doubt on the homuncular "self"? I can believe that we have many cognitive modules that give competing answers to the crying baby dilemma, but how can I tell that just by reading it? (And doesn’t the same thing happen for every moral dilemma we read?)
You use the "Golem Genie" in an odd way (it figures in only a tiny portion of the paper). You introduce the thought experiment (to elicit a sense of urgency and concrete importance, I assume), and point out the analogy to superintelligence. With the exception of a few words on hedonistic utilitarianism, all the specific examples of moral theories resulting in unwanted consequences when implemented are talked about with reference to superintelligence, never mentioning the Genie again. If you want to keep the Genie part, I would keep it until you’ve gone through all the moral theories you discuss, and only at the end point out the analogy to superintelligence.
One very tiny nitpick:
Is "unpublished" the right term to use here? It hasn’t been published in a peer-reviewed source, but in much usage, published online does count as published.
Agreed—either way, give the URL.
Even smaller nitpicks concerning formatting and layout:
The footnotes are in a sans-serif font and larger than the main text; all the text is ragged-right; there are lonely section headings (e.g. "The Golem Genie"); and footnotes are split across pages.
The intro is basically a slightly expanded version of the abstract. That is common in academic publications, but not in good ones.
The paper seems to end rather than reach a conclusion. As with all the above, this is a criticism of form, not content.
Step 1, for many minds without too short a horizon, is to conquer the galaxy to make sure there are no aliens around that might threaten their entire value system. Tiling the local universe with tiny happy digital minds could easily turn out to be a recipe for long-term disaster, resulting in a universe whose happiness levels are dictated by others.
This part feels like it should have a cite to Omohundro’s Basic AI Drives paper, which contains these paragraphs:
Another approach to keeping systems from self-improving is to try to restrain them from the inside; to build them so that they don’t want to self-improve. For most systems, it would be easy to do this for any specific kind of self-improvement. For example, the system might feel a "revulsion" to changing its own machine code. But this kind of internal goal just alters the landscape within which the system makes its choices. It doesn’t change the fact that there are changes which would improve its future ability to meet its goals. The system will therefore be motivated to find ways to get the benefits of those changes without triggering its internal "revulsion". For example, it might build other systems which are improved versions of itself. Or it might build the new algorithms into external "assistants" which it calls upon whenever it needs to do a certain kind of computation. Or it might hire outside agencies to do what it wants to do. Or it might build an interpreted layer on top of its machine code layer which it can program without revulsion. There are an endless number of ways to circumvent internal restrictions unless they are formulated extremely carefully.
Might be good to link to some papers on problems with RL agents (horizon, mugging, and the delusion box): http://lesswrong.com/lw/7fl/link_report_on_the_fourth_conference_on/
The Golem Genie is not really explained: you write it as neither a positive nor a negative agent. It is not an evil genie or demon, and that sort of intuition pump should be avoided as an anthropomorphism.
And negative-sum games too, presumably, like various positional or arms races. (Don’t have any citations, I’m afraid.)
My impression: You spend a long time discussing the problem, but very little on what solutions need to look like. It just ends.
That seems a bit over-the-top to me—"superintelligence" is fine.
Update: This link now points to a preprint.
Update: This link now points to a 02-26-2012 draft of ‘The Singularity and Machine Ethics’.
So there are other reasons to expect that as well: more effective reputation systems seem likely to make all actors more moral, including the companies that make computers and robots. Machines are a component of society, and they will be for quite a while.