Contents
- Claim 1: impossibility of solving the game using independent randomness
- Claim 2: solving the game using shared randomness
- Conclusion and directions for further research
- Appendix: extension to arbitrary normal-form cooperative games
  - Players play according to \mathbf{a}^{g(u, R)} with high probability
  - \mathbf{a}^{g(u, R)} is near-optimal in expectation
  - Proving the result from these facts

Consider a simple coordination game. In this game, two players (player 1 and player 2) each simultaneously choose an action, X or Y. If they both choose X, they both get 1 utility. If they both choose Y, they both get u_y utility for some 0 \leq u_y \leq 2 known to both players. If they choose different actions, they both get 0 utility. Which action should they each choose?

Assume they get to communicate before knowing u_y, but not after knowing u_y. An optimal policy pair is for each to pick X if u_y < 1, and Y otherwise. Unfortunately, this policy pair can break down in the presence of even a small amount of noise. Assume neither player observes u_y, but instead each receives an independent observation V_i (player 1 sees V_1, player 2 sees V_2), each of which is drawn uniformly from the range [u_y - \epsilon, u_y + \epsilon] for some small \epsilon > 0. If both players follow the policy of choosing X if and only if their observation of u_y is less than 1, then if u_y is very close to 1, there is a significant likelihood that one player will receive an observation less than 1, while the other will receive one greater than 1. Thus, they have a significant chance of choosing different actions and receiving 0 utility. This issue is essentially the same as the one faced by Buridan’s ass: both options are equally good, so there is no way to effectively decide between them. In fact, as I will prove:

Claim 1: No pair of policies in which the players select their actions independently given their observations can simultaneously guarantee an expected utility greater than \max(1, u_y) - 0.5 for all u_y \in [0, 2].

But things change when a shared source of randomness is allowed. Then, as I will also prove:

Claim 2: Suppose that, after observing their observation of u_y, each player also gets to observe a random number R \sim \mathrm{Uniform}([0, 1]). Then there is a pair of policies that achieves expected utility at least \max(1, u_y) - 2\sqrt{\epsilon} - \epsilon regardless of u_y.

This solution method extends to arbitrary cooperative normal-form games where players observe a perturbed version of the true utility function. This is shown in the appendix.

Why is this important? I expect progress in decision theory to come from studying very simple decision problems like this one. If it is possible to show that any solution to some simple problem has certain properties, then this usefully constrains the design space of possible decision theories. In this case, the result suggests that something like shared randomness may be required for achieving provable near-optimality even in cooperative settings.
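To make the failure mode concrete, here is a minimal Monte Carlo sketch (not from the original post; the function name and the chosen \epsilon are illustrative assumptions) estimating the expected utility of the naive threshold policy under this observation noise:

```python
# Monte Carlo sketch of the naive "choose Y iff V_i >= 1" threshold policy.
# Illustrative code; names and parameters are assumptions, not the post's.
import random

def expected_utility_threshold(u_y, eps, trials=100_000):
    total = 0.0
    for _ in range(trials):
        v1 = u_y + random.uniform(-eps, eps)  # player 1's noisy observation
        v2 = u_y + random.uniform(-eps, eps)  # player 2's noisy observation
        y1 = v1 >= 1  # player 1 chooses Y iff its observation is at least 1
        y2 = v2 >= 1
        if y1 != y2:
            total += 0.0      # miscoordination: 0 utility
        elif y1:
            total += u_y      # both chose Y
        else:
            total += 1.0      # both chose X
    return total / trials

# Far from u_y = 1 the policy is fine, but near u_y = 1 the players disagree
# about half the time, so the expected utility drops toward 0.5:
print(expected_utility_threshold(0.5, eps=0.01))   # ~1.0
print(expected_utility_threshold(1.0, eps=0.01))   # ~0.5
```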
Claim 1: impossibility of solving the game using independent randomness
Fix \epsilon > 0. Let \pi_1, \pi_2 : \mathbb{R} \rightarrow [0, 1] be Lebesgue measurable functions representing the two players’ policies, which map V_i to the player’s probability of choosing action Y. Define q(u) := \mathrm{Uniform}([u-\epsilon,u+\epsilon]) to be the function mapping u_y to the resulting V_i distribution, and define \tau_i(u) := \mathbb{E}_{v \sim q(u)}[\pi_i(v)], so that \tau_i(u_y) equals player i’s overall probability of choosing action Y. For u_1, u_2 \in [0, 2], note that the total variation distance between q(u_1) and q(u_2) is at most \frac{|u_1 - u_2|}{\epsilon}. Thus, since \pi_i is bounded between 0 and 1, \tau_i is \frac{1}{\epsilon}-Lipschitz and therefore continuous.

I will now show that, for some u_y, the players achieve expected utility at most \max(1, u_y) - 0.5, by case analysis on \tau_1:
- If \tau_1(0) \geq 0.5, then at most 0.5 expected utility is achieved when u_y = 0 (since at most 1 utility is achieved when player 1 chooses action X, and no utility is achieved when player 1 chooses action Y).
- If \tau_1(2) \leq 0.5, then at most 1.5 expected utility is achieved when u_y = 2 (since at most 1 utility is achieved when player 1 chooses action X, and at most 2 utility is achieved when player 1 chooses action Y).
- If neither of the two above cases holds, then by continuity of \tau_1 and the intermediate value theorem, there exists some u \in [0, 2] with \tau_1(u) = 0.5. When u_y = u, since the two players choose actions independently, there is only a 0.5 probability that they select the same action, ensuring that they achieve at most \frac{\max(1, u_y)}{2} \leq \max(1, u_y) - 0.5 expected utility.

Thus, claim 1 is proven. This proof method bears resemblance to the problem faced by Buridan’s ass: in the third case, some u_y value is found so that the "policy" \tau_1 is equally compelled by both options, choosing between them with a 50/50 coin flip in a way that is disastrous for coordination.
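The barrier applies to smooth policies as well as threshold policies. The following hypothetical sketch (not from the original post; the logistic policy and parameter k are illustrative) shows that an independent-randomness policy whose Y-probability varies continuously with the observation still collapses to roughly 0.5 utility at some u_y:

```python
# Sketch: even a smooth policy with independent randomness fails near the
# crossing point tau_1(u_y) = 0.5.  Illustrative code, not from the post.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def expected_utility_logistic(u_y, eps, k=20.0, trials=100_000):
    total = 0.0
    for _ in range(trials):
        v1 = u_y + random.uniform(-eps, eps)
        v2 = u_y + random.uniform(-eps, eps)
        y1 = random.random() < sigmoid(k * (v1 - 1))  # independent randomness
        y2 = random.random() < sigmoid(k * (v2 - 1))
        total += 0.0 if y1 != y2 else (u_y if y1 else 1.0)
    return total / trials

# At u_y = 1 each player plays Y with probability ~0.5 independently, so they
# miscoordinate about half the time, matching the max(1, u_y) - 0.5 barrier.
print(expected_utility_logistic(1.0, eps=0.01))   # ~0.5
```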
Claim 2: solving the game using shared randomness
Define f_\epsilon(v) := \frac{v - (1 - \sqrt{\epsilon})}{2\sqrt{\epsilon}}. Consider the pair of policies in which player i chooses Y if R < f_\epsilon(V_i), and otherwise chooses X, where R \sim \mathrm{Uniform}([0, 1]) is shared between the players. Intuitively, each player’s probability of taking action Y increases smoothly with their observation of u_y, and the players correlate their randomness so that they are likely to choose the same action. (A simulation sketch of this policy pair appears after the proof below.) Note two properties of this pair of policies:
- For u_y \geq 1 + \sqrt{\epsilon} + \epsilon, it is guaranteed that V_1, V_2 \geq 1 + \sqrt{\epsilon}, so f_\epsilon(V_1), f_\epsilon(V_2) \geq 1 and both players always take action Y.
- Since f_\epsilon is \frac{1}{2\sqrt{\epsilon}}-Lipschitz and |V_1 - V_2| \leq 2\epsilon, we have |f_\epsilon(V_1) - f_\epsilon(V_2)| \leq \sqrt{\epsilon}; the players take different actions only when R falls between f_\epsilon(V_1) and f_\epsilon(V_2), so this happens with probability at most \sqrt{\epsilon}.

I will now show that the players’ expected utility is at least \max(1, u_y) - 2\sqrt{\epsilon} - \epsilon by case analysis:
- Suppose u_y \geq 1 + \sqrt{\epsilon} + \epsilon. Then by property 1, both players are guaranteed to take action Y, so they achieve expected utility u_y \geq \max(1, u_y) - 2\sqrt{\epsilon} - \epsilon.
- Suppose u_y < 1 + \sqrt{\epsilon} + \epsilon. There are three subcases. If u_y \leq 1 - \sqrt{\epsilon} - \epsilon, then V_1, V_2 \leq 1 - \sqrt{\epsilon}, so f_\epsilon(V_1), f_\epsilon(V_2) \leq 0 and both players choose X, achieving utility 1 = \max(1, u_y). If 1 - \sqrt{\epsilon} - \epsilon < u_y < 1, then whenever the players choose the same action they achieve utility at least u_y; by property 2 this happens with probability at least 1 - \sqrt{\epsilon}, so the expected utility is at least (1 - \sqrt{\epsilon})u_y \geq u_y - \sqrt{\epsilon} > 1 - 2\sqrt{\epsilon} - \epsilon = \max(1, u_y) - 2\sqrt{\epsilon} - \epsilon. Finally, if 1 \leq u_y < 1 + \sqrt{\epsilon} + \epsilon, then choosing the same action yields utility at least 1, so the expected utility is at least 1 - \sqrt{\epsilon} \geq u_y - 2\sqrt{\epsilon} - \epsilon = \max(1, u_y) - 2\sqrt{\epsilon} - \epsilon.

Thus, the claim is proven. By setting \epsilon sufficiently low, this pair of policies yields an expected utility arbitrarily close to \max(1, u_y) regardless of u_y.
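As a sanity check, here is a small simulation sketch of this policy pair (hypothetical code, not from the original post; names and the chosen \epsilon are assumptions), comparing the estimated expected utility against the \max(1, u_y) - 2\sqrt{\epsilon} - \epsilon bound:

```python
# Simulation sketch of the shared-randomness policy pair from Claim 2.
# Illustrative code; names and parameters are assumptions, not the post's.
import math
import random

def expected_utility_shared(u_y, eps, trials=100_000):
    s = math.sqrt(eps)
    f = lambda v: (v - (1 - s)) / (2 * s)    # f_eps from the definition above
    total = 0.0
    for _ in range(trials):
        r = random.random()                  # shared random number R
        v1 = u_y + random.uniform(-eps, eps)
        v2 = u_y + random.uniform(-eps, eps)
        y1 = r < f(v1)                       # player 1 chooses Y iff R < f_eps(V_1)
        y2 = r < f(v2)
        if y1 != y2:
            total += 0.0
        elif y1:
            total += u_y
        else:
            total += 1.0
    return total / trials

eps = 0.01
for u_y in [0.5, 0.95, 1.0, 1.05, 1.5]:
    bound = max(1, u_y) - 2 * math.sqrt(eps) - eps
    print(u_y, round(expected_utility_shared(u_y, eps), 3), ">=", round(bound, 3))
```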
Conclusion and directions for further research
Together, these claims show that there is a simple set of noisy coordination games (namely, the set of games described in the beginning of this post for all possible u_y values) that is impossible to solve with only independent randomness, but which can be solved using shared randomness. Some notes on this:
- The proof of claim 1 only used the fact that \tau_1 is continuous. So even without noise, if the players’ policies are a continuous function of the utility u_y, the same problem occurs.
- A pair of Bayesians playing this coordination game who have a prior over u_y have no need for randomization, independent or joint. The goal of achieving a high expected utility regardless of u_y most clearly makes sense if u_y is selected adversarially (to maximize regret). It also makes sense if the environment is hard to model in a way that makes selection of a policy with a low rate of "ties" difficult. The comparison between the shared randomness solution and the Bayesian solution is similar to the comparison between randomized Quicksort and a "Bayesian" variant of Quicksort that selects pivots based on some expected distribution of inputs. While the Bayesian solution works better than the randomized solution when the prior over the input distribution is accurate, the randomized solution is simpler, is amenable to proofs, and works well even under adversarial conditions.

Where to go from here? Roughly, my overall "plan" for formal decision theory consists of 3 steps:
1. Solve Cartesian cooperative decision theory problems where players reason about each other using something like reflective oracles.
2. Extend the solution in 1 to Cartesian multiplayer game theory problems where players have different utility functions and reason about each other using something like reflective oracles.
3. Extend the solution in 2 to a logically uncertain naturalized setting.

Step 1 has still not been solved. Previous writing on this includes this post, which studies a setting with a single memoryless player (which is similar to a set of players who have the same utility function). The post shows (redundantly with the paper introducing the absent-minded driver problem, as I later found out) that, if the player’s policy is globally optimal (i.e. it achieves the highest possible expected utility), then all actions that might be taken by that policy are CDT-optimal, assuming SIA probabilities. This second condition is a local optimality condition, so the post shows that global optimality implies local optimality. It would be highly desirable to find a similar but different local optimality property that implies a global optimality property. That would essentially be a way of deriving collective self-interest from individual self-interest when individuals have the same utility function and common priors.

As this current post shows, the globally optimal policy is discontinuous as a function of the utilities involved, and no continuous approximation always yields a near-optimal expected utility. I expect this to hinder attempts to derive global optimality from local optimality, as it implies there is a valley of bad policies between decent policies and good ones. Introducing shared randomness here may help by preserving continuity. So a natural next step is to find useful local optimality properties in a cooperative setting where players have shared randomness.
Appendix: extension to arbitrary normal-form cooperative games
The solution described in the section on claim 2 can be extended to arbitrary normal-form cooperative games where the players receive only noisy observations of the payoffs. Reading this appendix (which takes up the rest of this post) is unnecessary for understanding the main point of this post.

Let n \geq 1 be the number of players, \mathcal{A}_i be player i’s set of actions, and u : \prod_{i=1}^n \mathcal{A}_i \rightarrow \mathbb{R} be the shared unknown utility function over strategy profiles, whose minimum value is u_{\min} and whose maximum value is u_{\max}. Fix 0 < \epsilon \leq u_{\max} - u_{\min}. Let V_i : \prod_{i=1}^n \mathcal{A}_i \rightarrow \mathbb{R} be a "perturbed" version of u that player i observes; it must satisfy the property that for any strategy profile \mathbf{a}, |V_i(\mathbf{a}) - u(\mathbf{a})| \leq \epsilon.

To define the policies, we will first number the strategy profiles arbitrarily \mathbf{a}^1, \ldots, \mathbf{a}^m, where m = \prod_{i=1}^n |\mathcal{A}_i|. Let c > 0 be some number to be determined later. For j \in \{1, \ldots, m\}, define

f_j(v) := \frac{\exp(cv(\mathbf{a}^j))}{\sum_{j'=1}^m \exp(cv(\mathbf{a}^{j'}))}.

This defines, given any v : \prod_{i=1}^n \mathcal{A}_i \rightarrow \mathbb{R}, a probability distribution over strategy profiles, since \sum_{j=1}^m f_j(v) = 1. Specifically, the distribution is a softmax. This distribution is more likely to select strategy profiles that v considers better, but has some chance of selecting every strategy profile.

Players will sample from this distribution using a source of shared randomness, R \sim \mathrm{Uniform}([0, 1]). Define

g(v, r) := \min\left\{j \in \{1, \ldots, m\} \mid r \leq \sum_{j'=1}^j f_{j'}(v)\right\}.

Now note that P(g(v, R) = j) = f_j(v), i.e. g(v, R) is distributed according to the softmax distribution defined by v. Define policies \pi_i(v, r) := \mathbf{a}^{g(v,r)}_i. That is, player i will play their appropriate action for strategy profile number g(V_i, R). (A code sketch of this construction appears after the list below.) Roughly, we will now show 2 properties that are sufficient to establish that these policies are near-optimal:
- With high probability, the players play according to \mathbf{a}^{g(u, R)}, i.e. for all i, \pi_i(V_i, R) = \mathbf{a}_i^{g(u, R)}.
- The strategy profile \mathbf{a}^{g(u, R)} is near-optimal in expectation, i.e. \mathbb{E}[u(\mathbf{a}^{g(u, R)})] is close to u_{\max}.
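The following hypothetical sketch (not from the original post; names are illustrative assumptions) implements the softmax distribution f, the inverse-CDF index g, and the resulting policies, with every player reusing the same shared uniform draw:

```python
# Sketch of the appendix construction: softmax over strategy profiles,
# sampled via a shared uniform R so that players coordinate with high
# probability.  Illustrative code; names are assumptions, not the post's.
import math
import random

def softmax_over_profiles(values, c):
    # f_j(v) for v given as the list values[j] = v(a^j); shifting by the max
    # does not change the distribution and avoids overflow.
    top = max(values)
    weights = [math.exp(c * (x - top)) for x in values]
    z = sum(weights)
    return [w / z for w in weights]

def g(probs, r):
    # smallest j with r <= f_1(v) + ... + f_j(v)  (inverse-CDF sampling)
    cumulative = 0.0
    for j, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return j
    return len(probs) - 1  # guard against floating-point round-off

def joint_play(profiles, perturbed_utils, c, r):
    # perturbed_utils[i][j] = V_i(a^j); player i plays its action from the
    # profile indexed by g applied to its own perturbed softmax, same r.
    return tuple(profiles[g(softmax_over_profiles(v_i, c), r)][i]
                 for i, v_i in enumerate(perturbed_utils))

# Example: the two-player X/Y game with profiles (X,X),(X,Y),(Y,X),(Y,Y).
profiles = [("X", "X"), ("X", "Y"), ("Y", "X"), ("Y", "Y")]
u = [1.0, 0.0, 0.0, 1.2]                                    # true utilities
eps = 0.01
v1 = [x + random.uniform(-eps, eps) for x in u]             # player 1's view
v2 = [x + random.uniform(-eps, eps) for x in u]             # player 2's view
print(joint_play(profiles, [v1, v2], c=50.0, r=random.random()))
```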
Players play according to \mathbf{a}^{g(u, R)} with high probability
First, we will show that, for all i and j, f_j(V_i) is close to f_j(u). For all j \in \{1, \ldots, m\}, since |V_i(\mathbf{a}^j) - u(\mathbf{a}^j)| \leq \epsilon, we have

\exp(-c\epsilon) \leq \frac{\exp(cV_i(\mathbf{a}^{j}))}{\exp(cu(\mathbf{a}^{j}))} \leq \exp(c\epsilon).

Therefore, for all j \in \{1, \ldots, m\},

\frac{f_j(V_i)}{f_j(u)} = \frac{\exp(cV_i(\mathbf{a}^j))}{\exp(cu(\mathbf{a}^j))} \cdot \frac{\sum_{j'=1}^m \exp(cu(\mathbf{a}^{j'}))}{\sum_{j'=1}^m \exp(cV_i(\mathbf{a}^{j'}))} \leq \exp(2c\epsilon).

By identical logic, \frac{f_j(u)}{f_j(V_i)} \leq \exp(2c\epsilon). Now we will bound \left|\sum_{j'=1}^j f_{j'}(V_i) - \sum_{j'=1}^j f_{j'}(u)\right|:

\left|\sum_{j'=1}^j f_{j'}(V_i) - \sum_{j'=1}^j f_{j'}(u)\right| = \left|\sum_{j'=1}^j f_{j'}(V_i)\left(1 - \frac{f_{j'}(u)}{f_{j'}(V_i)}\right)\right| \leq \sum_{j'=1}^j f_{j'}(V_i)\left|1 - \frac{f_{j'}(u)}{f_{j'}(V_i)}\right| \leq \sum_{j'=1}^j f_{j'}(V_i) \max\{\exp(2c\epsilon)-1,\, 1-\exp(-2c\epsilon)\} \leq \sum_{j'=1}^j f_{j'}(V_i)\,(\exp(2c\epsilon)-1) \leq \exp(2c\epsilon)-1.

For it to be the case that g(u, R) \neq g(V_i, R), it must be the case that for some j,

R \leq \sum_{j'=1}^j f_{j'}(u) \not\leftrightarrow R \leq \sum_{j'=1}^j f_{j'}(V_i),

which implies

R \in \mathrm{ConvexHull}\left(\left\{\sum_{j'=1}^j f_{j'}(u),\, \sum_{j'=1}^j f_{j'}(V_i)\right\}\right).

Due to the bound on \left|\sum_{j'=1}^j f_{j'}(V_i) - \sum_{j'=1}^j f_{j'}(u)\right|, the measure of this hull is at most \exp(2c\epsilon)-1. Furthermore, there are m hulls of this form (one for each j value), so the total measure of these hulls is at most m(\exp(2c\epsilon)-1). So, player i chooses action \mathbf{a}^{g(u, R)}_i with probability at least 1-m(\exp(2c\epsilon)-1). Thus, by the union bound, the players jointly choose the actions \mathbf{a}^{g(u, R)} with probability at least 1-nm(\exp(2c\epsilon)-1).
\mathbf{a}^{g(u, R)} is near-optimal in expectation
First we will prove a lemma.
Softmax lemma: For any c > 0 and vector \mathbf{x} of length m,
\frac{\sum_{j=1}^m \exp(cx_j)x_j}{\sum_{j=1}^m \exp(cx_j)} \geq \max_{j \in \{1, \ldots, m\}} x_j - \frac{m}{ec}.
Proof:
Let x^* be the maximum of \mathbf{x}. Now:
\frac{\sum_{j=1}^m \exp(cx_j)(x^*-x_j)}{\sum_{j=1}^m \exp(cx_j)} \leq \frac{\sum_{j=1}^m \exp(cx_j)(x^*-x_j)}{\exp(cx^*)} = \sum_{j=1}^m \exp(-c(x^*-x_j))(x^*-x_j)
For any y > 0, we have y \geq \log y + 1. Therefore, for any y > 0, cy \geq \log(cy) + 1 = \log y + \log c + 1, which is equivalent to \log y - cy \leq -\log c - 1. Exponentiating both sides gives y\exp(-cy) \leq \frac{1}{ec} (and this bound also holds trivially when y = 0).
Applying this to the term from the previous inequality:
\sum_{j=1}^m \exp(-c(x^*-x_j))(x^*-x_j) \leq \sum_{j=1}^m \frac{1}{ec} = \frac{m}{ec}
At this point we have
\frac{\sum_{j=1}^m \exp(cx_j)x_j}{\sum_{j=1}^m \exp(cx_j)} = x^* - \frac{\sum_{j=1}^m \exp(cx_j)(x^*-x_j)}{\sum_{j=1}^m \exp(cx_j)} \geq x^* - \frac{m}{ec}.
\square
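Here is a quick numeric spot-check of the lemma (hypothetical code, not from the original post): for random vectors \mathbf{x}, the softmax-weighted average never falls more than \frac{m}{ec} below the maximum.

```python
# Numeric spot-check of the softmax lemma; illustrative, not from the post.
import math
import random

def softmax_weighted_mean(x, c):
    top = max(x)
    weights = [math.exp(c * (xi - top)) for xi in x]  # shift for stability
    return sum(w * xi for w, xi in zip(weights, x)) / sum(weights)

random.seed(0)
m, c = 10, 7.0
for _ in range(1000):
    x = [random.uniform(0.0, 2.0) for _ in range(m)]
    assert softmax_weighted_mean(x, c) >= max(x) - m / (math.e * c)
print("softmax lemma held on all sampled vectors")
```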
At this point, the implication is straightforward. Since P(g(u, R) = j) = f_j(u), applying the softmax lemma with x_j = u(\mathbf{a}^j) gives
\mathbb{E}[u(\mathbf{a}^{g(u,R)})] = \sum_{j=1}^m f_j(u) u(\mathbf{a}^j) = \frac{\sum_{j=1}^m \exp(cu(\mathbf{a}^j))u(\mathbf{a}^j)}{\sum_{j=1}^m \exp(cu(\mathbf{a}^j))} \geq u_{\max}-\frac{m}{ec} .
Proving the result from these facts
At this point we have:
- Players play according to \mathbf{a}^{g(u, R)} with probability at least 1 - nm(\exp(2c\epsilon)-1).
- \mathbb{E}[u(\mathbf{a}^{g(u, R)})] \geq u_{\max} - \frac{m}{ec}.

The first fact lets us quantify the expected difference between u(\mathbf{a}^{g(u, R)}) and u(\pi_1(V_1, R), \ldots, \pi_n(V_n, R)). Since these random variables are equal with probability at least 1 - nm(\exp(2c\epsilon)-1), and their difference is bounded by u_{\max} - u_{\min}, their expected difference is at most nm(\exp(2c\epsilon)-1)(u_{\max}-u_{\min}). Combining this with the second inequality:

\mathbb{E}[u(\pi_1(V_1, R), \ldots, \pi_n(V_n, R))] \geq u_{\max} - \frac{m}{ec} - nm(\exp(2c\epsilon)-1)(u_{\max}-u_{\min})

To approximately minimize \frac{m}{ec} + nm(\exp(2c\epsilon)-1)(u_{\max}-u_{\min}), set

c := \frac{1}{2\sqrt{en\epsilon (u_{\max}-u_{\min})}}.

This yields:

\frac{m}{ec} = 2m\sqrt{\frac{n\epsilon(u_{\max}-u_{\min})}{e}}

Now note that

2c\epsilon = \sqrt{\frac{\epsilon}{en(u_{\max}-u_{\min})}}.

Due to the fact that \epsilon \leq u_{\max} - u_{\min} and n \geq 1, we have 2c\epsilon \leq 1. Because \exp(x) - 1 \leq 2x for any 0 \leq x \leq 1, we have

nm(\exp(2c\epsilon)-1)(u_{\max}-u_{\min}) \leq 4nmc\epsilon(u_{\max}-u_{\min}) = 2m\sqrt{\frac{n\epsilon(u_{\max}-u_{\min})}{e}}.

By combining the inequalities and equalities so far:

\mathbb{E}[u(\pi_1(V_1, R), \ldots, \pi_n(V_n, R))] \geq u_{\max} - 4m\sqrt{\frac{n\epsilon(u_{\max}-u_{\min})}{e}}.

The expected suboptimality goes to 0 as \epsilon approaches 0.
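For concreteness, here is a small helper (hypothetical code, not from the original post; the example problem sizes are assumptions) that evaluates this choice of c and the resulting suboptimality bound:

```python
# Evaluate c := 1 / (2*sqrt(e*n*eps*(u_max - u_min))) and the final bound
# u_max - 4*m*sqrt(n*eps*(u_max - u_min)/e).  Illustrative helper code.
import math

def softmax_temperature_and_bound(n, m, eps, u_range):
    c = 1.0 / (2.0 * math.sqrt(math.e * n * eps * u_range))
    suboptimality = 4.0 * m * math.sqrt(n * eps * u_range / math.e)
    return c, suboptimality

# e.g. two players, four strategy profiles, utilities spanning 1.0:
for eps in [1e-2, 1e-4, 1e-6]:
    c, gap = softmax_temperature_and_bound(n=2, m=4, eps=eps, u_range=1.0)
    print(f"eps={eps:g}  c={c:.1f}  suboptimality bound={gap:.4f}")
```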
This doesn’t (I think) really have much to do with randomness as such. The relevant thing about R is that it’s shared information that a hypothetical adversary doesn’t get to see. If u_y isn’t chosen adversarially, then our players don’t care about pessimizing over u_y but about something like an average over u_y, and then R isn’t needed. Or, if they are ultra-cautious people who universally care about worst cases, then they don’t care about expectation w.r.t. R but about the worst case as R varies, and then R doesn’t help. R only helps them when u_y is chosen adversarially and R isn’t; or when they care about pessimizing w.r.t u_y but not w.r.t. R. So the real conclusion here is not "sometimes shared randomness helps", it’s "if some people are trying to coordinate in the presence of an adversary, anything they share and the adversary has no access to helps".
Comment
Sometimes you want to prove a theorem like "The algorithm works well." You generally need randomness if you want to find algorithms that work without strong assumptions on the environment, whether or not there is really an adversary (who knows what kinds of correlations exist in the environment, whether or not you call them an "adversary"). A Bayesian might not like this, because they’d prefer to prove theorems like "The algorithm works well on average for a random environment drawn from the prior the agents use," for which randomness is never useful. But specifying the true prior is generally hideously intractable. So a slightly wiser Bayesian might want to prove statements like "The algorithm works well on average for a random environment drawn from the real prior," where the "real prior" is some object that we can talk about but have no explicit access to. And now the wiser Bayesian is back to needing randomness.
Comment
It seems like a Bayesian can conclude that randomness is useful, if their prior puts significant weight on "the environment happens to contain something that iterates over my decision algorithm and returns its worst-case input, or something that’s equivalent to or approximates this" (which they should, especially after updating on their own existence). I guess right now we don’t know how to handle this in a naturalistic way (e.g., let both intentional and accidental adversaries fall out of some simplicity prior) and so are forced to explicitly assume the existence of adversaries (as in game theory and this post).
This seems basically right. As discussed in the conclusion, there are reasons to care about worst-case performance other than literal adversaries.
Comment
Sure, but non-adversarial cases (really, any cases where u is determined independently of strategies chosen) can just choose R as a fixed part of the strategy, rather than a random shared component determined later.
Comment
That’s right, but getting the worst-case guarantee requires this initial choice to be random.
Comment
Nope. Random choice gives a specific value for R each game. The outcome for that iteration is IDENTICAL to the outcome if that R was chosen intentionally. Randomness only has game value as a mechanism to keep information from an adversarial actor.
Comment
To be clear, by "worst-case guarantee" I mean "the expected utility is guaranteed to be pretty good regardless of u_y", which is unattainable without shared randomness (claim 1). I think you are either misunderstanding or disagreeing with a lot of the terminology on randomized algorithms and worst-case guarantees that is commonly used in CS and statistics. This article is a decent introduction to this topic.
I’m missing something (and I haven’t digested the math, so maybe it’s obvious but just missing from the narrative description). Is epsilon the same for both players, in that they see the same V, which just may not exactly match u? Or is it different for each player, meaning that for the same u, they have different V? From your analysis (risk of 0), it sounds like the latter. In that case, I don’t see how additional shared knowledge helps coordinate them, nor why it needs to be random rather than just a fixed value they agree on in advance. And certainly not why it matters if the additional random shared value is generated before or after the game starts.
If they don’t have this additional source of shared randomness, can they just decide in their pre-game discussion to use R=0.5? Why or why not?
Comment
\epsilon is the same for both players, but V_1 and V_2 (the players’ observations of u_y) are different, both sampled independently and uniformly from [u_y - \epsilon, u_y + \epsilon]. If they decide on R=0.5, then there exists some u_y value for which they get a bad expected utility (see Claim 1).
Comment
Sure, but that goes for a randomly-chosen R too. For every possible R, there is a u value for which they get bad outcomes. It doesn’t get better by randomly choosing R.
Comment
The assumption is that R is chosen after u_y. So for every u_y, the pair of policies gets a good expected utility. See the point on Bayesian algorithms in the conclusion for more on why "get a high expected utility regardless of u_y" might be a desirable goal.
Comment
If u_y is chosen after R then it might be chosen to depend on R in such a way that the algorithm gets bad performance, e.g. using the method in the proof of Claim 1.
Comment
Based on other comments, I realize I’m making an assumption about something you haven’t specified. How is u_y chosen? If it’s random and independent, then my assertion holds; if it’s selected by an adversary who knows the players’ full strategies somehow, then R is just a way of keeping a secret from the adversary—sequence doesn’t matter, but knowledge does.
Comment
Claim 1 says there exists some u_y value for which the algorithm gets high regret, so we might as well assume it’s chosen to maximize regret. Claim 2 says the algorithm has low regret regardless of u_y, so again we might as well assume it’s chosen to maximize regret.
u_y and R are independently chosen from well-defined distributions. Regardless of sequence, neither knows the other and CANNOT be chosen based on the other. I’ll see if I can find time tonight to figure out whether I’m saying your claim 1 is wrong (it dropped epsilon too soon from the floor value, but I’m not sure if it’s more fundamentally problematic than that) or that your claim 2 is misleading. My current expectation is that I’ll find that your claim 2 results are available in situation 1, by using your given function with a pre-agreed value rather than a random one.
Comment
The theorems are of the form "For all u_y, you get good outcomes" or "There exists a u_y that causes bad outcomes". When you want to prove statements of this form, u_y is chosen adversarially, so it matters whether it is chosen before or after R.
True, they will fail to cooperate for some R, but the values of such R have a low probability. (But yeah, it’s also required that u_y and R are chosen independently—otherwise an adversary could just choose either so that it results in the players choosing different actions.)
The smoothness comes in from marginalising a random R. The coordination comes from making R and ε common knowledge, so they cooperate using the correlation in their observations—an interesting phenomenon.
(How can I write LaTeX in the comments?)
Comment
If you assume a fixed probability distribution over possible u_y that both players know when coordinating, then they can set up the rules they choose to make sure that they probably win. The extra random information is only useful because of the implicit "for all u_y". If some malicious person had overheard their strategy, and was allowed to choose u_y, but didn’t have access to the random number source, then the random numbers are useful.
I expected that the Lamport paper would be mentioned, as it describes a known catastrophic failure mode for autonomous systems, connected with the Buridan’s ass problem and infinite recursion about predicting the future time of the problem solving. I think that this problem is underexplored for AI safety, despite a previous attempt to present it on LessWrong.
Is this a reinvention of correlated equilibrium?
Comment
This is not a reinvention of correlated equilibrium, although that is related.
Comment
Looks like I was wrong again, and shame on me. Correlated equilibrium can’t be useful in a purely cooperative game; it’s only useful if we’re trying to minimax over many cooperative games. You even spelled it out in the post, but my eyes skipped over it. Sorry again.
Yeah, sorry, I was hasty. Looks like you’re making a more interesting point: correlated equilibrium can be useful even in purely cooperative games. I wonder what would be the simplest example game...