Louie and I are sharing a draft of our chapter submission to The Singularity Hypothesis for feedback:
The Singularity and Machine Ethics
Thanks in advance.
Also, thanks to Kevin for suggesting in February that I submit an abstract to the editors. Seems like a lifetime ago, now.
**Edit:** As of 3/31/2012, the link above now points to a preprint.
Gack! I wish I had encountered this quote (from Gladwell) long ago, so I could include it in "The Singularity and Machine Ethics":
Lenat had developed an artificial-intelligence program that he called Eurisko, and he decided to feed his program the rules of the tournament. Lenat did not give Eurisko any advice or steer the program in any particular strategic direction. He was not a war-gamer. He simply let Eurisko figure things out for itself. For about a month, for ten hours every night on a hundred computers at Xerox PARC, in Palo Alto, Eurisko ground away at the problem, until it came out with an answer. Most teams fielded some version of a traditional naval fleet—an array of ships of various sizes, each well defended against enemy attack. Eurisko thought differently. "The program came up with a strategy of spending the trillion on an astronomical number of small ships like P.T. boats, with powerful weapons but absolutely no defense and no mobility," Lenat said. "They just sat there. Basically, if they were hit once they would sink. And what happened is that the enemy would take its shots, and every one of those shots would sink our ships. But it didn’t matter, because we had so many." Lenat won the tournament in a runaway.
The next year, Lenat entered once more, only this time the rules had changed. Fleets could no longer just sit there. Now one of the criteria of success in battle was fleet "agility." Eurisko went back to work. "What Eurisko did was say that if any of our ships got damaged it would sink itself—and that would raise fleet agility back up again," Lenat said. Eurisko won again.
...The other gamers were people steeped in military strategy and history… Eurisko, on the other hand, knew nothing but the rule book. It had no common sense… [But] not knowing the conventions of the game turned out to be an advantage.
[Lenat explained:] "What the other entrants were doing was filling in the holes in the rules with real-world, realistic answers. But Eurisko didn’t have that kind of preconception..." So it found solutions that were, as Lenat freely admits, "socially horrifying": send a thousand defenseless and immobile ships into battle; sink your own ships the moment they get damaged.
Or, an example of human Munchkinism:
Nice find! This will come in handy.
Sounds like the sort of strategy that evolution would invent. Or rather, already has, repeatedly — "build a lot of cheap little war machines and don’t mind the casualties" is standard operating procedure for a lot of insects.
But yeah, it’s an awesome lesson in "the AI optimizes for what you tell it to optimize for, not for what humans actually want."
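To make that lesson concrete, here is a minimal, purely illustrative Python sketch. Nothing in it comes from Eurisko or the actual tournament; the budget, ship types, stats, and scoring rule are all invented. It just shows how an optimizer handed only the stated objective (total firepower within budget, with survivability never mentioned) lands squarely on the "many cheap, defenseless ships" corner of the design space:

```python
# Toy illustration of "the AI optimizes for what you tell it to
# optimize for, not for what humans actually want".
# All ship stats, costs, and the budget are invented for this sketch;
# this is not how Eurisko or the actual wargame worked.

BUDGET = 1000  # abstract budget units

# (name, cost, firepower, armor) -- hypothetical ship types
SHIP_TYPES = [
    ("battleship", 200, 50, 40),
    ("destroyer",   50, 15, 10),
    ("pt_boat",      5,  3,  0),  # cheap, well armed, completely defenseless
]

def stated_objective(ship, count):
    """The objective we *told* the optimizer to maximize: total
    firepower fielded within budget.  Survivability, mobility, and
    'not losing your whole fleet' are never mentioned, so the
    optimizer never cares about them."""
    _name, _cost, firepower, _armor = ship
    return count * firepower

def best_fleet():
    # With a linear objective, the optimum is simply the ship type with
    # the most firepower per unit cost, bought as many times as possible.
    best = max(SHIP_TYPES, key=lambda s: s[2] / s[1])
    name, cost, _fire, _armor = best
    count = BUDGET // cost
    return name, count, stated_objective(best, count)

if __name__ == "__main__":
    name, count, score = best_fleet()
    print(f"optimal fleet: {count} x {name}, stated objective = {score}")
    # -> optimal fleet: 200 x pt_boat, stated objective = 600
    # Exactly the "socially horrifying" answer: nothing in the stated
    # objective penalizes every single ship getting sunk.
```

No amount of extra cleverness in the search fixes this; only a better objective does, and writing down that objective is the hard part.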
Overall, I thought it was very good. I agree that "super optimizer" is more likely to create the correct impression in the average person than "super intelligence", and will stop using the latter term.
The bit about the "golem genie" seems forced, though; I'm not sure it actually clarifies things. It seems like such a direct analogy that I'd expect people who understand "superoptimizer" won't need the analogy, and those who don't understand won't be helped by it. For the latter group of people, it might help to introduce the golem before talking about superoptimization at all. It's quite possible I'm wrong.
Reading this, I felt a strange sense of calm coming over me: we finally have a really good introductory article to the issue, and SingInst finally has people who can write such articles.
I feel like humanity’s future is in good hands, and that SI now has a realistic chance of attracting enough mainstream academic interest to make a difference.
Also, this paragraph:
made me feel like SI might now have a clue about how to usefully put extra money to use if they got it, something I was doubtful about before.
I like this—I feel it does a decent job of showing how your neuroscience posts fit into the FAI/intelligence explosion narrative. A few minor comments:
I like the "superoptimizer" terminology, but this sentence makes it sound like we can expect superintelligence to behave differently merely by calling it something different. I realise this isn’t what you mean—I just feel it would be better rephrased in terms of "this avoids bias-inducing loaded terminology".
Very minor point: it would be nice to add a citation here: someone who says that orgasmium is suboptimal or that most people think orgasmium is suboptimal.
What is it about this particular example that casts doubt on the homuncular "self"? I can believe that we have many cognitive modules that give competing answers to the crying baby dilemma, but how can I tell that just by reading it? (And doesn’t the same thing happen for every moral dilemma we read?)
You use the "Golem Genie" in an odd way (it figures in only a tiny portion of the paper). You introduce the thought experiment (to elicit a sense of urgency and concrete importance, I assume), and point out the analogy to superintelligence. With the exception of a few words on hedonistic utilitarianism, all the specific examples of moral theories resulting in unwanted consequences when implemented are talked about with reference to superintelligence, never mentioning the Genie again. If you want to keep the Genie part, I would keep it until you’ve gone through all the moral theories you discuss, and only at the end point out the analogy to superintelligence.
One very tiny nitpick:
Is "unpublished" the right term to use here? It hasn’t been published in a peer-reviewed source, but in much usage, published online does count as published.
Agreed—either way, give the URL.
Even smaller nitpicks concerning formatting and layout:
The footnotes are in a sans-serif font and larger than the main text; all the text is ragged-right; there are lonely section headings (e.g. "The Golem Genie"); and footnotes are split across pages.
The intro is basically a slightly expanded version of the abstract. That is common in academic publications, but not in good ones.
The paper seems to end rather than reach a conclusion. As with all the above, this is a criticism of form, not content.
Step 1, for many minds without too short a horizon, is to conquer the galaxy to make sure there are no aliens around that might threaten their entire value system. Tiling the local universe with tiny happy digital minds could easily turn out to be a recipe for long-term disaster, resulting in a universe whose happiness levels are dictated by others.
This part feels like it should have a cite to Omohundro’s Basic AI Drives paper, which contains these paragraphs:
Another approach to keeping systems from self-improving is to try to restrain them from the inside; to build them so that they don’t want to self-improve. For most systems, it would be easy to do this for any specific kind of self-improvement. For example, the system might feel a "revulsion" to changing its own machine code. But this kind of internal goal just alters the landscape within which the system makes its choices. It doesn’t change the fact that there are changes which would improve its future ability to meet its goals. The system will therefore be motivated to find ways to get the benefits of those changes without triggering its internal "revulsion". For example, it might build other systems which are improved versions of itself. Or it might build the new algorithms into external "assistants" which it calls upon whenever it needs to do a certain kind of computation. Or it might hire outside agencies to do what it wants to do. Or it might build an interpreted layer on top of its machine code layer which it can program without revulsion. There are an endless number of ways to circumvent internal restrictions unless they are formulated extremely carefully.
Might be good to link to some papers on problems with RL agents (horizon, mugging, and the delusion box): http://lesswrong.com/lw/7fl/link_report_on_the_fourth_conference_on/
The Golem Genie is not really explained: you write it as neither a positive nor a negative agent. It is not an evil genie or demon, and that sort of intuition pump should be avoided as an anthropomorphism.
And negative-sum games too, presumably, like various positional or arms races. (Don’t have any citations, I’m afraid.)
My impression: You spend a long time discussing the problem, but very little on what solutions need to look like. It just ends.
That seems a bit over-the-top to me—"superintelligence" is fine.
Update: This link now points to a preprint.
Update: This link now points to a 02-26-2012 draft of ‘The Singularity and Machine Ethics’.
So there are other reasons to expect that as well: more effective reputation systems seem likely to make all actors more moral, including the companies that make computers and robots. Machines are a component of society, and they will be for quite a while.