Contents
- Psychological Twin Prisoner’s Dilemma
- The Smoking Lesion Problem
- Parfit’s Hitchhiker Problem
- The Transparent Newcomb Problem
- The Cosmic Ray Problem
- The XOR Blackmail
- Immunity from Adversarial Predictors

Cross-posted from my blog. Epistemic status: Probably discussed to death in multiple places, but people still make this mistake all the time. I am not well versed in UDT, but it seems along the same lines. Or maybe I am reinventing some aspects of Game Theory.

We know that physics does not support the idea of metaphysical free will. By metaphysical free will I mean the magical ability of agents to change the world by just making a decision to do so. To the best of our knowledge, we are all (probabilistic) automatons who think of themselves as agents with free choices.

A model compatible with the known laws of physics is that what we think of as modeling, predicting and making choices is actually learning which one of the possible worlds we live in. Think of it as being a passenger in a car and seeing new landscapes all the time. The main difference is that the car is invisible to us and we constantly update the map of the expected landscape based on what we see. We have a sophisticated updating and predicting algorithm inside, and it often produces accurate guesses. We experience those as choices made, as if we were the ones in the driver's seat, not just the passengers.

Realizing that decisions are nothing but updates, that making a decision is a subjective experience of discovering which of the possible worlds is the actual one, immediately adds clarity to a number of decision theory problems. For example, if you accept that you have no way to change the world, only to learn which of the possible worlds you live in, then Newcomb's problem with a perfect predictor becomes trivial: there is no possible world where a two-boxer wins. There are only two possible worlds, one where you are a one-boxer who wins, and one where you are a two-boxer who loses. Making a decision to either one-box or two-box is a subjective experience of learning what kind of person you are, i.e. what world you live in.
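The world-counting approach for Newcomb's problem fits in a few lines of code. This is my own illustrative sketch, not anything from the post or EYNS; the payoffs are the standard Newcomb values and the predictor is assumed perfect, so there are only two possible worlds to compare.

```python
# World enumeration for Newcomb's problem with a perfect predictor.
# Each possible world pairs an agent type with its payoff; a perfect
# predictor leaves no world where the action differs from the prediction.
possible_worlds = {
    "one-boxer": 1_000_000,  # predictor filled box B; agent takes only B
    "two-boxer": 1_000,      # predictor left B empty; agent takes both boxes
}

best_type = max(possible_worlds, key=possible_worlds.get)
print(best_type)  # -> one-boxer
```

Learning that you are a one-boxer is learning that you live in the higher-utility world; no causal reasoning is needed.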
This description, while fitting the observations perfectly, is extremely uncomfortable emotionally. After all, what's the point of making decisions if you are just a passenger spinning a fake steering wheel not attached to any actual wheels? The answer is the usual compatibilist one: we are compelled to behave as if we were making decisions by our built-in algorithm. The classic quote from Ambrose Bierce applies: "There's no free will," says the philosopher; "To hang is most unjust." "There is no free will," assents the officer; "We hang because we must." So, while uncomfortable emotionally, this model lets us make better decisions (the irony is not lost on me, but since "making a decision" is nothing but an emotionally comfortable version of "learning what possible world is actual", there is no contradiction).

An aside on quantum mechanics. It follows from the unitary evolution of the quantum state, coupled with the Born rule for observation, that the world is only predictable probabilistically at the quantum level, which, in our model of learning about the world we live in, puts limits on how accurate the world model can be. Otherwise the quantum nature of the universe (or multiverse) has no bearing on the perception of free will.

Let's go through the examples, some of which are listed as the numbered dilemmas in a recent paper by Eliezer Yudkowsky and Nate Soares, Functional decision theory: A new theory of instrumental rationality. From here on out we will refer to this paper as EYNS.
Psychological Twin Prisoner’s Dilemma
An agent and her twin must both choose to either "cooperate" or "defect." If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?

First we enumerate all the possible worlds, which in this case are just two, once we ignore the meaningless verbal fluff like "their decisions are causally independent, made in separate rooms without communication." This sentence adds zero information, because the "agent and the twin know that they reason the same way", so there is no way for them to make different decisions. These worlds are:
- Cooperate world: $1,000,000
- Defect world: $1,000

There is no possible world, factually or counterfactually, where one twin cooperates and the other defects, any more than there are possible worlds where 1 = 2. Well, we can imagine worlds where math is broken, but they do not usefully map onto observations. The twins would probably be smart enough to cooperate, at least after reading this post. Or maybe they are not smart enough and will defect. Or maybe they hate each other and would rather defect than cooperate, because it gives them more utility than money. If this were a real situation, we would wait and see which possible world they live in, the one where they cooperate, or the one where they defect. At the same time, subjectively to the twins in the setup it would feel like they are making decisions and changing their future.

The Absent-Minded Driver problem: An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them. There are three possible worlds here, A, B and C, with utilities 0, 4 and 1 correspondingly, and by observing the driver "making a decision" we learn which world they live in. If the driver is a classic CDT agent, they would turn and end up at A, despite it being the lowest-utility action. Sucks to be them, but that's their world.
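The three driver worlds can be enumerated directly. As a small extension not discussed in the post, the sketch below also sweeps over randomized policies ("CONTINUE with probability q at any intersection"); the classic planning-optimal answer is q = 2/3 with expected payoff 4/3.

```python
# Possible worlds for the Absent-Minded Driver, as enumerated above.
worlds = {"A": 0, "B": 4, "C": 1}

# A driver who CONTINUEs with probability q at any intersection reaches
# A with prob (1-q), B with prob q*(1-q), and C with prob q*q.
def expected_payoff(q):
    return (1 - q) * worlds["A"] + q * (1 - q) * worlds["B"] + q * q * worlds["C"]

# Sweep q over a fine grid to find the best randomized policy.
best_q = max((i / 1000 for i in range(1001)), key=expected_payoff)
print(best_q, expected_payoff(best_q))  # -> roughly 0.667 and 1.333
```

A deterministic "always EXIT" driver lands in world A with payoff 0, matching the CDT agent's fate described above.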
The Smoking Lesion Problem
An agent is debating whether or not to smoke. She knows that smoking is correlated with an invariably fatal variety of lung cancer, but the correlation is (in this imaginary world) entirely due to a common cause: an arterial lesion that causes those afflicted with it to love smoking and also (99% of the time) causes them to develop lung cancer. There is no direct causal link between smoking and lung cancer. Agents without this lesion contract lung cancer only 1% of the time, and an agent can neither directly observe, nor control whether she suffers from the lesion. The agent gains utility equivalent to $1,000 by smoking (regardless of whether she dies soon), and gains utility equivalent to $1,000,000 if she doesn't die of cancer. Should she smoke, or refrain?

The problem does not specify this explicitly, but it seems reasonable to assume that the agents without the lesion do not enjoy smoking and get 0 utility from it. There are 8 possible worlds here, with different utilities and probabilities. An agent who "decides" to smoke has higher expected utility than the one who decides not to, and this "decision" lets us learn which of the 4 possible worlds could be actual, and eventually, when she gets the test results, we learn which one is the actual world.

Note that the analysis would be exactly the same if there were a "direct causal link between desire for smoking and lung cancer", without any "arterial lesion". In the problem as stated there is no way to distinguish between the two, since there are no other observable consequences of the lesion. There is a 99% correlation between the desire to smoke and cancer, and that's the only thing that matters. Whether there is a "common cause", or cancer causes the desire to smoke, or the desire to smoke causes cancer, is irrelevant in this setup. It may become relevant if there were a way to affect this correlation, say, by curing the lesion, but there is not in the problem as stated.
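A minimal sketch of the eight-world computation, under the assumptions stated above: P(cancer | lesion) = 0.99, P(cancer | no lesion) = 0.01, and smoking is only enjoyed (worth $1,000) by agents with the lesion.

```python
# Eight possible worlds for the Smoking Lesion: (lesion, smoke, cancer).
def p_cancer(lesion):
    return 0.99 if lesion else 0.01

def utility(lesion, smoke, cancer):
    # $1,000 for enjoyed smoking (lesion only), $1,000,000 for surviving.
    return (1_000 if smoke and lesion else 0) + (0 if cancer else 1_000_000)

def expected_utility(lesion, smoke):
    # Average over the cancer outcome, the only remaining uncertainty.
    p = p_cancer(lesion)
    return p * utility(lesion, smoke, True) + (1 - p) * utility(lesion, smoke, False)

for lesion in (True, False):
    print(lesion, expected_utility(lesion, True), expected_utility(lesion, False))
# With the lesion: 11,000 (smoke) vs 10,000 (refrain).
# Without it: 990,000 either way.
```

Conditional on lesion status, smoking never hurts and helps only the lesion-havers, which is exactly why the "causal link" wording changes nothing.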
Some decision theorists tend to get confused over this because they think of this magical thing they call "causality," the qualia of your decisions being yours and free, causing the world to change upon your metaphysical command. They draw fancy causal graphs instead of listing and evaluating possible worlds.
Parfit’s Hitchhiker Problem
An agent is dying in the desert. A driver comes along who offers to give the agent a ride into the city, but only if the agent will agree to visit an ATM once they arrive and give the driver $1,000. The driver will have no way to enforce this after they arrive, but she does have an extraordinary ability to detect lies with 99% accuracy. Being left to die causes the agent to lose the equivalent of $1,000,000. In the case where the agent gets to the city, should she proceed to visit the ATM and pay the driver?

We note a missing piece in the problem statement: what are the odds of the agent lying about not paying, with the driver detecting the lie and giving her a ride anyway? It can be, for example, 0% (the driver does not bother to use her lie detector in this case) or the same 99% accuracy as in the case where the agent lies about paying. We assume the first case for this problem, as it makes more sense intuitively.

As usual, we draw possible worlds, partitioned by the "decision" made by the hitchhiker, and note the utility of each possible world. We do not know which world would be the actual one for the hitchhiker until we observe it ("we" in this case might denote the agent themselves, even though they feel like they are making a decision). So, while the highest-utility world is the one where the agent does not pay and the driver believes they would, the odds of this possible world being actual are very low, and the agent who will end up paying after the trip has higher expected utility before the trip. This is pretty confusing, because the intuitive CDT approach would be to promise to pay, yet refuse after. This is effectively thwarted by the driver's lie detector. Note that if the lie detector were perfect, then there would be just two possible worlds:
- Pay and survive.
- Do not pay and die.

Once the possible worlds are written down, it becomes clear that the problem is essentially isomorphic to Newcomb's. Another problem that is isomorphic to it is
The Transparent Newcomb Problem
Events transpire as they do in Newcomb's problem, except that this time both boxes are transparent, so the agent can see exactly what decision the predictor made before making her own decision. The predictor placed $1,000,000 in box B iff she predicted that the agent would leave behind box A (which contains $1,000) upon seeing that both boxes are full. In the case where the agent faces two full boxes, should she leave the $1,000 behind?

Once you are used to enumerating possible worlds, it does not matter whether the boxes are transparent or not. The decision whether to take one box or two is already made before the boxes are presented, transparent or not. The analysis of the conceivable worlds is identical to the original Newcomb's problem. To clarify: if you are in the world where you see two full boxes, wouldn't it make sense to two-box? Well, yes, it would, but if this is what you "decide" to do (and all decisions are made in advance, as far as the predictor is concerned, even if the agent is not aware of this), you will never (or very rarely, if the predictor is almost, but not fully, infallible) find yourself in this world. Conversely, if you one-box even when you see two full boxes, that situation always, or almost always, happens. If you think you pre-committed to one-boxing but then are capable of two-boxing, congratulations! You are in the rare world where you have successfully fooled the predictor!

From this analysis it becomes clear that the word "transparent" is yet another superfluous stipulation, as it contains no new information. Two-boxers will two-box, one-boxers will one-box, transparency or not. At this point it is worth pointing out the difference between world counting and EDT, CDT and FDT. The latter three tend to get mired in reasoning about their own reasoning, instead of reasoning about the problem they are trying to decide.
In contrast, we mindlessly evaluate probability-weighted utilities, unconcerned with the pitfalls of causality, retro-causality, counterfactuals, counter-possibilities, subjunctive dependence and other hypothetical epicycles. There are only recursion-free possible worlds of different probabilities and utilities, and a single actual world observed after everything is said and done. While reasoning about reasoning is clearly extremely important in the field of AI research, the dilemmas presented in EYNS do not require anything as involved. Simple counting does the trick better. The next problem is rather confusing in its original presentation.
The Cosmic Ray Problem
An agent must choose whether to take $1 or $100. With vanishingly small probability, a cosmic ray will cause her to do the opposite of what she would have done otherwise. If she learns that she has been affected by a cosmic ray in this way, she will need to go to the hospital and pay $1,000 for a check-up. Should she take the $1, or the $100?

A bit of clarification is in order before we proceed. What does "do the opposite of what she would have done otherwise" mean, operationally? Here let us interpret it in the following way: deciding and attempting to do X, but ending up doing the opposite of X and realizing it after the fact. Something like "OK, let me take $100… Oops, how come I took $1 instead? I must have been struck by a cosmic ray, gotta do the $1,000 check-up!" Another point is that here again there are two probabilities in play: the odds of taking $1 while intending to take $100, and the odds of taking $100 while intending to take $1. We assume these are the same, and denote the (small) probability of a cosmic ray strike as p.

The analysis of the dilemma is boringly similar to the previous ones: attempting to take $100 has a higher payoff as long as the "vanishingly small" probability of the cosmic ray strike is under 50%. Again, this is just a calculation of expected utilities, though an agent believing in metaphysical free will may take it as a recommendation to act a certain way. The following setup and analysis is slightly more tricky, but not by much.
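The 50% threshold falls out of two expected-utility expressions; a minimal sketch under the assumptions above (a flipped action is noticed and followed by the $1,000 check-up):

```python
# Expected utilities for the Cosmic Ray problem, with p the probability
# that the attempted action is flipped by a cosmic ray.
def eu_intend_100(p):
    # Usually take $100; with prob p take $1 and pay $1,000 for the check-up.
    return (1 - p) * 100 + p * (1 - 1_000)

def eu_intend_1(p):
    # Usually take $1; with prob p take $100 and pay $1,000 for the check-up.
    return (1 - p) * 1 + p * (100 - 1_000)

print(eu_intend_100(0.01) > eu_intend_1(0.01))  # -> True
print(eu_intend_100(0.5) == eu_intend_1(0.5))   # both equal -449.5 at p = 1/2
```

Setting the two expressions equal gives 100 - 1099p = 1 - 901p, i.e. p = 1/2, the crossover stated in the text.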
The XOR Blackmail
An agent has been alerted to a rumor that her house has a terrible termite infestation that would cost her $1,000,000 in damages. She doesn't know whether this rumor is true. A greedy predictor with a strong reputation for honesty learns whether or not it's true, and drafts a letter: "I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter." The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

The problem is called "blackmail" because those susceptible to paying the ransom receive the letter when their house doesn't have termites, while those who are not susceptible do not. The predictor has no influence on the infestation, only on who receives the letter. So, by pre-committing to not paying, one avoids the blackmail, and if they do receive the letter, it is basically an advance notification of the infestation, nothing more.

EYNS states "the rational move is to refuse to pay" assuming the agent receives the letter. This tentatively assumes that the agent has a choice in the matter once the letter is received. This turns the problem on its head and gives the agent a counterintuitive option of having to decide whether to pay after the letter has been received, as opposed to analyzing the problem in advance (and precommitting to not paying, thus preventing the letter from being sent, if you are the sort of person who believes in choice).

The possible worlds analysis of the problem is as follows. Let's assume that the probability of having termites is p, the greedy predictor is perfect, and the letter is sent to everyone "eligible", i.e.
to everyone with an infestation who would not pay, and to everyone without the infestation who would pay upon receiving the letter. We further assume that there are no paranoid agents who would pay "just in case" even without receiving the letter; in general, that case would have to be considered as a separate world. Now the analysis is quite routine: not paying is, not surprisingly, always better than paying, by the "blackmail amount" $1,000(1-p).

One thing to note is that the case where the would-pay agent has termites but does not receive a letter is easy to overlook, since it does not include receiving a letter from the predictor. However, this is a possible world contributing to the overall utility, even if it is not explicitly stated in the problem. Other dilemmas that yield to a straightforward analysis by world enumeration are Death in Damascus, regular and with a random coin, the Mechanical Blackmail and the Psychopath Button.

One final point I would like to address is that treating the apparent decision making as a self- and world-discovery process, not as an attempt to change the world, helps one analyze adversarial setups that stump the decision theories that assume free will.
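The routine analysis can be written out explicitly. This sketch assumes a perfect predictor and the letter policy stated above; note how the payer's loss in the termites-but-no-letter world enters the sum even though no letter is involved.

```python
# Possible worlds for XOR Blackmail with a perfect predictor, partitioned
# by agent type. p is the probability of termites ($1,000,000 in damages);
# the ransom is $1,000.
def expected_utility(agent_type, p):
    if agent_type == "payer":
        # Termites: neither (i) nor (ii) holds, so no letter arrives.
        # No termites: (i) holds, the letter arrives, and she pays.
        return p * (-1_000_000) + (1 - p) * (-1_000)
    else:
        # Termites: (ii) holds, the letter arrives as advance notice.
        # No termites: no letter, nothing happens.
        return p * (-1_000_000) + (1 - p) * 0

p = 0.1
print(expected_utility("non-payer", p) - expected_utility("payer", p))
# -> 1,000 * (1 - p) = 900, up to float rounding
```

Not paying beats paying by exactly the "blackmail amount" for any p, as claimed.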
Immunity from Adversarial Predictors
EYNS states in Section 9: "There is no perfect decision theory for all possible scenarios, but there may be a general-purpose decision theory that matches or outperforms all rivals in fair dilemmas, if a satisfactory notion of 'fairness' can be formalized," and later:

"There are some immediate technical obstacles to precisely articulating this notion of fairness. Imagine I have a copy of Fiona, and I punish anyone who takes the same action as the copy. Fiona will always lose at this game, whereas Carl and Eve might win. Intuitively, this problem is unfair to Fiona, and we should compare her performance to Carl's not on the 'act differently from Fiona' game, but on the analogous 'act differently from Carl' game. It remains unclear how to transform a problem that's unfair to one decision theory into an analogous one that is unfair to a different one (if an analog exists) in a reasonably principled and general way."

I note here that simply enumerating possible worlds evades this problem, as far as I can tell. Let's consider a simple "unfair" problem: if the agent is predicted to use a certain decision theory DT1, she gets nothing, and if she is predicted to use some other approach (DT2), she gets $100. There are two possible worlds here, one where the agent uses DT1, and the other where she uses DT2. So a principled agent who always uses DT1 is penalized. Suppose another time the agent might face the opposite situation, where she is punished for following DT2 instead of DT1. What is the poor agent to do, being stuck between Scylla and Charybdis? There are 4 possible worlds in this case:
- Agent uses DT1 always
- Agent uses DT2 always
- Agent uses DT1 when rewarded for using DT1 and DT2 when rewarded for using DT2
- Agent uses DT1 when punished for using DT1 and DT2 when punished for using DT2

World number 3 is where the agent wins, regardless of how adversarial or "unfair" the predictor is trying to be to her. Enumerating possible worlds lets us crystallize the type of agent that would always get the maximum possible payoff, no matter what. Such an agent would subjectively feel that they are excellent at making decisions, whereas they simply live in the world where they happen to win.
Comment
Comment
"The sage is one with causation." The same argument that "we" do not "do" things, also shows that there is no such thing as a jumbo jet, no such thing as a car, not even any such thing as an atom; that nothing made of parts exists. We thought protons were elementary particles, until we discovered quarks. But no: according to this view "we" did not "think" anything, because "we" do not exist and we do not "think". Nobody and nothing exists. All that such an argument does is redefine the words "thing" and "exist" in ways that no-one has ever used them and no-one ever consistently could. It fails to account for the fact that the concepts work. You say that agency is bugs and uncertainty, that its perception is an illusion stemming from ignorance; I say that agency is control systems, a real thing that can be experimentally detected in both living organisms and some machines, and detected to be absent in other things.
Comment
Comment
Comment
Will only reply to one part, to highlight our basic (ontological?) differences:
Comment
If choice and counterfactuals exist, then an action is something that can affect the future, while a thought is not. Of course, that difference no longer applies if your ontology doesn't feature choices and counterfactuals...
What your ontology should be is "nothing" or "mu". You are not keeping to your commitments.
Comment
We seem to have very different ontologies here, and not converging. Also, telling me what my ontology "should" be is less than helpful :) It helps to reach mutual understanding before giving prescriptions to the other person. Assuming you are interested in more understanding, and less prescribing, let me try again to explain what I mean.
Comment
And very different epistemologies. I am not denying the very possibility of knowing things about reality.
All I am doing is taking you at your word.
You keep saying that it is models all the way down, and there is no way to make true claims about reality. If I am not to take those comments literally, how am I to take them? How am I to guess the correct non-literal interpretation, out of the many possible ones?
That's an implicit claim about reality. Something can only be a mind projection if there is nothing in reality corresponding to it. It is not sufficient to say that it is in the head or the model; it also has to not be in the territory, or else it is a true belief, not a mind projection. To say that something doesn't exist in reality is to make a claim about reality as much as to say that something does.
Again "in the model" does not imply "not in the territory".
You seem happy enough with "not exist" as in "agents, counterfactuals and choices don’t exist"
If it is really possible for an agent to affect the future or steer themselves into alternative futures, then there is a lot of potential utility in it, in that you can end up in a higher-utility future than you would otherwise have. OTOH, if there are no counterfactuals, then whatever utility you gain is predetermined. So one cannot assess the usefulness, in the sense of utility gain, of models, in a way independent of the metaphysics of determinism and counterfactuals. What is useful, and how useful it is, depends on what is true.
It contradicts the "agents don't exist" thing and the "I never talk about existence" thing. If you only object to reductively inexplicable agents, that would be better expressed as "there is nothing nonreductive".
Although that still wouldn’t help you come to the conclusion that there is no choice and no counterfactuals, because that is much more about determinism than reductionism.
Comment
Comment
That’s a statement about the world. Care to justify it?
How do you know that the people who say "agents exist" don’t mean "some systems can be usefully modelled as agents"?
You are making a claim about reality, that counterfactuals don’t exist., even though you are also making a meta claim that you don’t make claims about reality.
If probabilistic agents[*] and counterfactuals are both useful models (and I don't see how you can consistently assert the former and deny the latter), then counterfactuals "exist" by your lights.
[*] Or automata, if you prefer. If someone builds a software gizmo that is probabilistic and acts without specific instruction, then it is an agent and an automaton all at the same time.
There is no full strength top-down determinism, but systems-level behaviour is enough to support a common-sense view of decision making.
Comment
I agree, the apparent emergent high-level structures look awfully like agents. That intentional stance tends to dissipate once we understand them more.
Comment
If intentionality just means seeking to pursue or maximise some goal, there is no reason an artificial system should not have it. But the answer is different if intentionality means having a ghost or homunculus inside. And neither is the same as the issue of whether an agent is deterministic, or capable of changing the future.
More precision is needed.
Even when the agent has more compute than we do? I continue to take the intentional stance towards agents I understand but can’t compute, like MCTS-based chess players.
Comment
What do you mean by taking the intentional stance in this case?
Comment
I would model the program as a thing that is optimizing for a goal. While I might know something about the program’s weaknesses, I primarily model it as a thing that selects good chess moves. Especially if it is a better chess player than I am. See: Goal inference as inverse planning.
This seems to cut through a lot of confusion present in decision theory, so I guess the obvious question to ask is why don’t we already work things this way instead of the way they are normally approached in decision theory?
Comment
To the extent that this approach is a decision theory, it is some variant of UDT (see this explanation). The problems with applying and formalizing it are the usual problems with applying and formalizing UDT:
How do you construct "policy counterfactuals", e.g. worlds where "I am the type of person who one-boxes" and "I am the type of person who two-boxes"? (This isn’t a problem if the environment is already specified as a function from the agent’s policy to outcome, but that often isn’t how things work in the real world)
How do you integrate this with logical uncertainty, such that you can e.g. construct "possible worlds" where the 1000th digit of pi is 2 (when in fact it isn’t)? If you don’t do this then you get wrong answers on versions of these problems that use logical pseudorandomness rather than physical randomness.
How does this behave in multi-agent problems, with other versions of itself that have different utility functions? Naively both agents would try to diagonalize against each other, and an infinite loop would result.
Comment
Those are excellent questions! Thank you for actually asking them, instead of simply stating something like "What you wrote is wrong because..." Let me try to have a crack at them, without claiming that "I have solved decision theory, everyone can go home now!"
Comment
Comment
Thank you for your patience explaining the current leading edge and answering my questions! Let me try to see if my understanding of what you are saying makes sense.
Comment
OK, I misinterpreted you as recommending a way of making decisions. It seems that we are interested in different problems (as I am trying to find algorithms for making decisions that have good performance in a variety of possible problems).

Re top-down causation: I am curious what you think of a view where there are both high and low level descriptions that can be true at the same time, and have their own parallel causalities that are consistent with each other. Say that at the low level, the state type is L and the transition function is t_l : L → L. At the high level, the state type is H and the nondeterministic transition function is t_h : H → Set(H), i.e. at a high level sometimes you don't know what state things will end up in. Say we have some function f : L → H for mapping low-level states to high-level states, so each low-level state corresponds to a single high-level state, but a single high-level state may correspond to multiple low-level states. Given these definitions, we could say that the high and low level ontologies are compatible if, for each low-level state l, it is the case that f(t_l(l)) ∈ t_h(f(l)), i.e. the high-level ontology's prediction for the next high-level state is consistent with the predicted next high-level state according to the low-level ontology and f. Causation here is parallel and symmetrical rather than top-down: both the high level and the low level obey causal laws, and there is no causation from the high level to the low level. In cases where things can be made consistent like this, I'm pretty comfortable saying that the high-level states are "real" in an important sense, and that high-level states can have other high-level states as a cause.

EDIT, regarding more minor points: Thanks for the explanation of the multi-agent games; that makes sense, although in this case the enumerated worlds are fairly low-fidelity, and making them higher-fidelity might lead to infinite loops.
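For small finite systems the compatibility condition can be checked mechanically. The concrete states and transition tables below are invented purely for illustration; only the condition f(t_l(l)) ∈ t_h(f(l)) comes from the comment above.

```python
# A toy check of the high/low-level compatibility condition, with two
# low-level states projecting onto each high-level state.
L_states = ["l0", "l1", "l2", "l3"]

t_l = {"l0": "l2", "l1": "l3", "l2": "l0", "l3": "l1"}  # low-level dynamics
f = {"l0": "h0", "l1": "h0", "l2": "h1", "l3": "h1"}    # projection L -> H
t_h = {"h0": {"h1"}, "h1": {"h0"}}                       # nondeterministic high-level dynamics

# The ontologies are compatible iff f(t_l(l)) is in t_h(f(l)) for every l.
compatible = all(f[t_l[l]] in t_h[f[l]] for l in L_states)
print(compatible)  # -> True
```

Breaking the check is equally easy: redirect any single low-level transition (say t_l["l0"] = "l0") and the condition fails, since f("l0") = "h0" is not in t_h("h0").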
In counterfactual mugging, you have to be able to enumerate both the world where the 1000th digit of pi is even and where the 1000th digit of pi is odd, and if you are doing logical inference on each of these worlds then that might be hard; consider the difficulty of imagining a possible world where 1+1=3.
Comment
Comment
My guess is that you, in practice, actually are interested in finding decision-relevant information and relevant advice, in everyday decisions that you make. I could be wrong but that seems really unlikely.
Re microstates/macrostates: it seems like we mostly agree about microstates/macrostates. I do think that any particular microstate can only lead to one macrostate.
By "low-fidelity" I mean the description of each possible world doesn’t contain a complete description of the possible worlds that the other agent enumerates. (This actually has to be the case in single-person problems too, otherwise each possible world would have to contain a description of every other possible world)
An issue with imagining a possible world where 1+1=3 is that it's not clear in what order to make logical inferences. If you make a certain sequence of logical inferences with the axiom 1+1=3, then you get 2=1+1=3; if you make a different sequence of inferences, then you get 2=1+1=(1+1-1)+(1+1-1)=(3-1)+(3-1)=4. (It seems pretty likely to me that, for this reason, logic is not the right setting in which to formalize logically impossible counterfactuals, and taking counterfactuals on logical statements is confused in one way or another.)
If we fix a particular mental model of this world, then we can answer questions about this model; part of the decision theory problem is deciding what the mental model of this world should be, and that is pretty unclear.
Comment
In other words, usefulness (which DT to use) depends on truth (which world model to use).
If there is indeterminism at the micro level, there is not the slightest doubt that it can be amplified to the macro level, because quantum mechanics as an experimental science depends on the ability to make macroscopic records of events involving single particles.
Comment
Amplifying microscopic indeterminism is definitely a thing. It doesn’t help the free choice argument though, since the observer is not the one making the choice, the underlying quantum mechanics does.
Comment
Macroscopic indeterminism is sufficient to establish real, not merely logical, counterfactuals.
Besides that, it would be helpful to separate the ideas of dualism, agency and free choice. If the person making the decision is not some ghost in the machine, then the only thing they can be is the machine, as a total system. In that case, the question becomes whether the system as a whole can choose, could have chosen otherwise, etc.
But you’re in good company: Sam Harris is similarly confused.
Comment
Comment
You need to argue for that claim, not just state it. The contrary claim is supported by a simple argument: if an event is indeterministic, it need not have happened, or need not have happened that way. Therefore, there is a real possibility that it did not happen, or happened differently, and that is a real counterfactual.
You need to argue for that claim as well.
Comment
Comment
Your comment has no relevance, because probabilistic laws automatically imply counterfactuals as well. In fact it’s just another way of saying the same thing. I could have shown it in modal logic, too.
Comment
Thank you, I am glad that I am not the only one for whom a causation-free approach to decision theory makes sense. UDT seems a bit like that.
Comment
Please propose a mechanism by which you can make an agent who enumerates the worlds seen as possible by every agent, no matter what their decision theory is, end up in a world with lower utility than some other agent.
Comment
Say you have an agent A who follows the world-enumerating algorithm outlined in the post. Omega makes a perfect copy of A and presents the copy with a red button and a blue button, while telling it the following: "I have predicted in advance which button A will push. (Here is a description of A; you are welcome to peruse it for as long as you like.) If you press the same button as I predicted A would push, you receive nothing; if you push the other button, I will give you $1,000,000. Refusing to push either button is not an option; if I predict that you do not intend to push a button, I will torture you for 3^^^3 years." The copy’s choice of button is then noted, after which the copy is terminated. Omega then presents the real agent facing the problem with the exact same scenario as the one faced by the copy. Your world-enumerating agent A will always fail to obtain the maximum $1,000,000 reward accessible in this problem. However, a simple agent B who chooses randomly between the red and blue buttons has a 50% chance of obtaining this reward, for an expected utility of $500,000. Therefore, A ends up in a world with lower expected utility than B. Q.E.D.
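dxu’s expected-utility arithmetic can be checked with a toy simulation. This is only a sketch: the button labels, and A’s particular deterministic pick, are arbitrary stand-ins for whatever A’s world-enumerating algorithm outputs.

```python
import random

REWARD = 1_000_000

def agent_A() -> str:
    # A's world-enumerating algorithm is deterministic, so Omega's
    # perfect prediction of A always matches A's actual choice.
    # "red" is an arbitrary stand-in for whatever A picks.
    return "red"

def agent_B() -> str:
    # B ignores the prediction and picks a button at random.
    return random.choice(["red", "blue"])

prediction = agent_A()  # Omega predicts A perfectly

# A always matches the prediction, so A never collects the reward.
payoff_A = 0 if agent_A() == prediction else REWARD

# B mismatches the prediction of A about half the time,
# for an expected payoff of roughly $500,000.
trials = 100_000
payoff_B = sum(REWARD for _ in range(trials)
               if agent_B() != prediction) / trials
```

The simulation just restates the comment’s arithmetic: A’s payoff is pinned at $0, while B’s average payoff converges to $500,000.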
Comment
Your scenario is somewhat ambiguous, but let me attempt to answer all versions of it that I can see.
First: does the copy of A (hereafter, A′) know that it’s a copy?
If yes, then the winning strategy is "red if I am A, blue if I am A′". (Or the reverse, of course; but whichever variant A selects, we can be sure that A′ selects the same one, being a perfect copy and all.)
If no, then indeed A receives nothing, but then of course this has nothing to do with any copies; it is simply the same scenario as if Omega predicted A’s choice, then gave A the money if A chose differently than predicted—which is, of course, impossible (Omega is a perfect predictor), and thus this, in turn, is the same as "Omega shows up, doesn’t give A any money, and leaves".
Or is it? You claim that in the scenario where Omega gives the money iff A chooses otherwise than predicted, A could receive the money with 50% probability by choosing randomly. But this requires us to reassess the terms of the "Omega, a perfect predictor" stipulation, as previously discussed by cousin_it. In any case, until we’ve specified just what kind of predictor Omega is, and how its predictive powers interact with sources of (pseudo-)randomness—as well as whether, and how, Omega’s behavior changes in situations involving randomness—we cannot evaluate scenarios such as the one you describe.
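The "if yes" branch above can be put in a toy sketch (the button assignment is arbitrary): a single shared algorithm can still produce different outputs when given different indexical inputs.

```python
def strategy(is_copy: bool) -> str:
    # Both A and its copy A' run this identical procedure; only the
    # indexical input differs, so their outputs can differ even
    # though the algorithm is the same.
    return "blue" if is_copy else "red"

copy_choice = strategy(is_copy=True)   # A' knows it is the copy
real_choice = strategy(is_copy=False)  # A knows it is not

# The two instances diverge despite sharing one algorithm, so they
# cannot both match a single prediction of "the button A pushes".
diverge = copy_choice != real_choice
```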
Comment
dxu did not claim that A could receive the money with 50% probability by choosing randomly. They claimed that a simple agent B that chose randomly would receive the money with 50% probability. The point is that Omega is only trying to predict A, not B, so it doesn’t matter how well Omega can predict B’s actions. The point can be made even more clear by introducing an agent C that just does the opposite of whatever A would do. Then C gets the money 100% of the time (unless A gets tortured, in which case C also gets tortured).
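The agent-C point is easy to make concrete (the button labels and A’s deterministic pick are again arbitrary stand-ins):

```python
def agent_A() -> str:
    # A's deterministic world-enumerating choice (whatever it is).
    return "red"

def agent_C() -> str:
    # C simply does the opposite of whatever A would do.
    return "blue" if agent_A() == "red" else "red"

# Omega's perfect prediction is of A, not of C.
prediction = agent_A()

# C mismatches that prediction every time, so C always wins.
c_wins = agent_C() != prediction
```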
Comment
This doesn’t make a whole lot of sense. Why, and on what basis, are agents B and C receiving any money?
Are you suggesting some sort of scenario where Omega gives A money iff A does the opposite of what Omega predicted A would do, and then also gives any other agent (such as B or C) money iff said other agent does the opposite of what Omega predicted A would do?
This is a strange scenario (it seems to be very different from the sort of scenario one usually encounters in such problems), but sure, let’s consider it. My question is: how is it different from "Omega doesn’t give A any money, ever (due to a deep-seated personal dislike of A). Other agents may, or may not, get money, depending on various factors (the details of which are moot)"?
This doesn’t seem to have much to do with decision theories. Maybe shminux ought to rephrase his challenge. After all—
… can be satisfied with "Omega punches A in the face, thus causing A to end up with lower utility than B, who remains un-punched". What this tells us about decision theories, I can’t rightly see.
Comment
Comment
I didn’t read shminux’s post as suggesting that his scheme allows an agent to avoid, say, being punched in the face apropos of nothing. (And that’s what all the "unfair" scenarios described in the comments here boil down to!) I think we can all agree that "arbitrary face-punching by an adversary capable of punching us in the face" is not something we can avoid, no matter our decision theory, no matter how we make choices, etc.
Comment
I am not sure how else to interpret the part of shminux’s post quoted by dxu. How do you interpret it?
A mind-reader looks to see whether this is an agent’s decision procedure, and then tortures them if it is. The point of unfair decision problems is that they are unfair.
Can you clarify this?
One interpretation is that you’re talking about an agent who enumerates every world that any agent sees as possible. But your post further down seems to contradict this, "the unpunched world is not a possible one for the world enumerator". And it’s not obvious to me that this agent can exist.
Another is that the agent enumerates only the worlds that every agent sees as possible, but that agent doesn’t seem likely to get good results. And it’s not obvious to me that there are guaranteed to be any worlds at all in this intersection.
Am I missing an interpretation?
Great post!
I have a question, though, about the "adversarial predictor" section. My question is: how is world #3 possible? You say:
However, the problem statement said:
Are we to suppose that the copy of Fiona that the adversarial predictor is running does not know that an adversarial predictor is punishing Fiona for taking certain actions, but that the actual-Fiona does know this, and can thus deviate from what she would otherwise do? If so, then what happens when this assumption is removed—i.e., when we do not inform Fiona that she is being watched (and possibly punished) by an adversarial predictor, or when we do inform copy-Fiona of same?
Comment
One would have to ask Eliezer and Nate what they really meant, since it is easy to end up in a self-contradictory setup or to ask a question about an impossible world, like asking what happens if, in Newcomb’s setup, the agent decided to switch to two-boxing after the perfect predictor had already put in the $1,000,000. My wild guess is that the FDT Fiona from the paper uses a certain decision theory DT1 that does not cope well with worlds containing adversarial predictors. She uses some kind of causal decision graph logic that leads her astray instead of putting her in the winning world. I also assume that Fiona makes her "decisions" while being fully informed about the predictor’s intentions to punish her, and just CDT-like throws her hands in the air and cries "unfair!"
Hey, noticed what might be errors in your lesion chart: No lesion, no cancer should give +1m utils in both cases. And your probabilities don’t add to 1. Including p(lesion) explicitly doesn’t meaningfully change the EV difference, so eh. However, my understanding is that the core of the lesion problem is recognizing that p(lesion) is independent of smoking; EYNS seems to say the same. Might be worth including it to make that clearer? (I don’t know much about decision theory, so maybe I’m just confused.)
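The independence point can be illustrated with made-up numbers (these payoffs are not the ones from the post’s chart; they are only illustrative): the EV difference between smoking and not smoking is the same for every value of p(lesion).

```python
# Illustrative EV arithmetic for the Smoking Lesion, with assumed
# payoffs: smoking is worth +1,000 utils, cancer costs 1,000,000
# utils, and the lesion (which alone causes cancer here) strikes
# with probability p_lesion.
U_SMOKE = 1_000
U_CANCER = -1_000_000

def ev(smokes: bool, p_lesion: float) -> float:
    # Cancer depends only on the lesion, which is independent of
    # the choice to smoke -- the crux of the problem.
    return (U_SMOKE if smokes else 0.0) + p_lesion * U_CANCER

# The EV difference in favor of smoking is +1,000 for any p(lesion),
# which is why including p(lesion) explicitly changes nothing.
diffs = [ev(True, p) - ev(False, p) for p in (0.01, 0.2, 0.99)]
```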
Assuming that an agent who doesn’t have the lesion gains no utility from smoking OR from having cancer changes the problem. But apart from that, this post is pretty good at explaining how to approach these problems from the perspective of Timeless Decision Theory. Worth reading about it if you aren’t familiar. Also, it is generally agreed that in a deterministic world we don’t really make decisions as per libertarian free will. The question is then how to construct the counterfactuals for the decision problem. I’m in agreement with you that TDT is much more consistent, as the counterfactuals tend to describe actually consistent worlds.
I’m slightly confused. Is it that we’re learning about which world we are in or, given that counterfactuals don’t actually exist, are we learning what our own decision theory is given some stream of events/worldline?
Comment
What is the difference between the two? The world includes the agent, and discovering more about the world implies self-discovery.
The compatibilist concept of free will is practical. It tells you under which circumstances someone can be held legally or ethically responsible. It does not require global assumptions about how the laws of the universe work. Only when compatibilist free will is asserted as being the only kind does it become a metaphysical claim, or rather an anti-metaphysical one. The existence of compatibilist free will isn’t worth arguing about: it’s designed to be compatible with a wide variety of background assumptions.
Magical, or "counter-causal", free will is designed to be absurd and impossible from the outset, and therefore is not worth worrying about either. (Incidentally, no proponent of libertarianism ever uses the term "counter-causal".)
What is worth worrying about is broadly naturalistic libertarian free will. That is, a conception of free will that, unlike the compatibilist one, has some defeasible requirements, such as indeterministic laws of physics, but only requirements which are logically and physically possible. The middle ground is where the action is. (Note that the magical notion of free will is often accused of needing some fundamental third alternative to determinism and chance, whereas naturalistic libertarianism only requires a mixture of the two structured in a certain way.)
Comment
Comment
Perhaps I should have been clearer that complete determinism versus indeterminism is an open question in science. But then maybe you knew, because you made a few references to indeterminism already. And maybe you knew because the issue is crucial to the correct interpretation of QM, which is discussed interminably here.
You hint very briefly at the idea that randomness doesn’t support libertarian FW, but that is an open question in philosophy. It has been given book-length treatments.
Which? Is indeterminism incapable of supporting FW as stated in the first quote, or capable as in the second?
But that is slightly beside the point, since you are arguing against counterfactuals, and the existence of counterfactuals follows tautologously from the absence of strict determinism, questions of free will aside.
If a probabilistic agent can make a decision that is not fully determined by previous events, then the consequences of that decision trace back to the agent, as a whole system, and no further. That seems to support a respectable enough version of "changing the future". "Magic" might mean being able to make any decision, or carry through any decision, or having a decision-making faculty with no moving parts. "Magic" is a term very worth tabooing.
Comment
Comment
Well, which? Iron chains of causality stretching back to infinity, or inherent randomness?
You may be taking it as obvious that both randomness and determinism exclude (some version of) free will, but that needs to be spelt out.
Comment
Scott Aaronson in The Ghost in the Quantum Turing Machine does a good job spelling all this out. There is no physical distinction between an agent and a non-agent.
Comment
Scott Aaronson in The Ghost in the Quantum Turing Machine uses the word "agent" 37 times. The building of agents is an engineering discipline. Much of the discussion on this board is about AIs which are agentive as well as intelligent.
You might mean there is no fundamental difference between an agent and a non-agent. But then you need to show that someone, somewhere has asserted that, rather than using the word "agent" merely as a "useful" way of expressing something non-fundamental.
More precision is needed.
Comment
Comment
My point was that intelligence corresponds to status in our world: calling the twins not smart means that you expect your readers to think less of them. If you don’t expect that, then I don’t understand why you wrote that remark. I don’t believe in libertarian free will either, but I don’t see the point of interpreting words like "recommending" "deciding" or "acting" to refer to impossible behavior rather than using their ordinary meanings. However, maybe that’s just a meaningless linguistic difference between us.
Comment
"My point was that intelligence corresponds to status in our world: calling the twins not smart means that you expect your readers to think less of them."
I can see why you would interpret it this way. That was not my intention. I don’t respect Forrest Gumps any less than Einsteins.
Comment
You don’t harbor any hopes that after reading your post, someone will decide to cooperate in the twin PD on the basis of it? Or at least, if they were already going to, that they would conceptually connect their decision to cooperate with the things you say in the post?