Logical Inductor Lemmas

https://www.lesswrong.com/posts/5bd75cc58225bf067037556c/logical-inductor-lemmas

The final lemma in this sequence will be necessary for use in an upcoming post about logical inductors and correlated equilibria, so it will be proved here in order to not clutter up that post.

Much of the notation in this is reused from the previous post, Two Notions of Best Response

Let S_{1} be the set of actions available to player 1. |S_{1}|=m. Given a matrix M, M_{j} refers to the j-th row vector of M. Given some vector \vec{v}, |\vec{v}| is defined in the usual way. In this post, a_{j} is typically used to refer to some default action by player 1, and a_{j'} typically refers to some better action.

Consider the space \prod_{ij}\mathbb{R}. This would be a space of probabilities/​prices assigned to the various joint outcomes, by some logical inductor. It’s possible to use expressible features to pick out fuzzy subsets of this space, by taking the prices of the various sentences as input and mapping them to [0,1] (in practice, the prices will always be in a hypercube, but it’ll be convenient to not have to deal with the special case where the prices are on an edge of the hypercube)

The infinite sequence {p_{t}}_{t\in\mathbb{N^{+}}} of points that lie in this space can be identified with the sequence of prices of the logical inductor on day t.

Consider some fuzzy set/​\mathbb{P}-generable region in that space, bounded in [0,1], which we’ll denote by C.

Let \alpha_{C}(k) be the number in [0,1] that the expressible feature \alpha which implements C outputs when given the market prices of the day k. By a slight abuse of notation, \alpha_{C}(p) refers to what the expressible feature would output if the market prices were given by the point p.

C\in\mathcal{B}{a{j}} iff there’s a point p\in supp(C) such that p\in B_{a_{j}}. In words, there’s a point in C where a_{j} is an evidential best response.

Given some point p, where some action a_{j} has nonzero probability (for a well-defined expected utility), then an action a_{j'} dominates a_{j} by \delta if EU_{lower}(a_{j'})=EU(a_{j})+\delta.

#Lemma 1:

When a point p has |P_{j}|\ge\epsilon, all points q in an L_{1}-ball of size \delta around p where \delta<<\epsilon will have |EU_{q}(a_{j})-EU_{p}(a_{j})|\le 2\delta. (approximately)

Proof:

EU_{p}(a_{j})=\frac{U_{j}\cdot P_{j}}{|P_{j}|}=\frac{U_{j}\cdot P_{j}}{\epsilon}\simeq\frac{U_{j}\cdot P_{j}\pm\delta}{\epsilon\pm\delta}=EU_{q}(a_{j})

(here, \pm\delta should be interpreted to be any number in [-\delta,\delta], and it doesn’t necessarily have to be the same number on the top and bottom. Also, in the next line, X will be used as an abbreviation for U_{j}\cdot P_{j}, which is a number in [0,\epsilon])

\frac{U_{j}\cdot P_{j}\pm\delta}{\epsilon\pm\delta}\simeq\frac{(X\pm\delta)(\epsilon\pm\delta)}{\epsilon^{2}-\delta^{2}}\simeq\frac{\epsilon X\pm\delta X\pm\epsilon\delta}{\epsilon^{2}}=\frac{X\pm\delta\frac{X}{\epsilon}\pm\delta}{\epsilon}\simeq EU_{p}(a_{j})\pm 2\delta

QED.

Now let \max_{n}(a_{j}) be the n’th largest utility in the j-th row of the utility matrix.\max_{2}(a_{j}) refers to the highest utility less than the maximum utility, in the event that there are two actions with maximal utility.\min_{n} is defined similarly.

#Lemma 2:

For all points p, there exists a change in probabilities by \epsilon (according to the L_{1}-norm), for which the expected utility of the move a_{j} can be increased by (\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon}{2}, or pushed to the highest attainable utility, whichever one is lower.

The same argument will also work to establish that the probabilities can be lowered by (\min_{2}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon}{2}, or pushed to the lowest attainable utility, whichever one is higher.

Assume that the probability matrix P has P_{j}=\vec{0} Then there is an arbitrarily small change in probability that will have a_{j} have the highest possible expected utility, and it is proved.

Assume that the probability matrix P has P_{j} have less than \frac{\epsilon}{2} probability, total, on the i which correspond to non-maximal utilities in U_{j}. Then there is an \epsilon-change in probabilities (take all the probability mass from non-optimal outcomes and put it on the best outcome) that makes a_{j} have the highest possible expected utility, and it is proved.

Assume that the probability matrix P has P_{j} have \frac{\epsilon}{2} or greater probability, total, on the i which correspond to non-maximal utilities in U_{j}. Then by taking \frac{\epsilon}{2} probability mass from the non-optimal outcomes, and putting it on the optimal outcome, it is easy to verify from the expected utility equation that it will produce at least a (\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon}{2} increase in the expected utility.

#Lemma 3:

If there is a point p for which no action a_{j'} dominates a_{j} by a positive number, and there is a \mathbb{P}-generable fuzzy set C\not\in\mathcal{B}{a{j}}, then p must be on the boundary of supp(C), or on the exterior of supp(C).

Assume the opposite, that this point is on the interior of C. Due to the continuity of the fuzzy set, and the fact that a_{j} is not dominated by any positive number, then there is an arbitrarily small perturbation of the prices to make a_{j} an evidential best response. Now there is a point where a_{j} is an evidential best response in the support of C. However, we assumed that the support of C had no points where a_{j} was an evidential best response. Therefore, there is a contradiction, and p must either be on the boundary of supp(C), or the exterior of supp(C).

In both cases, due to continuity, this point must have \alpha_{c}(p)=0.

#Lemma 4:

Given a \mathbb{P}-generable fuzzy set C such that C\not\in\mathcal{M}{a{j}}, and some \epsilon, then let . Then all points in C' have some action a_{j'} that dominates a_{j} by \delta, for some positive constant delta.

To begin with, the function \alpha_{C} has some bounded lipschitz constant L. Let \epsilon':=\frac{\epsilon}{2L}.

Given a point p\in C', associate each action with EU_{lower,p} for that action, and select an arbitrary highest-value action to be a_{j'}.

Consider the function F(p) that uses the highest-value action for p as a_{j'}, and then outputs \min\left((\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon'}{2},(\min_{2}(a_{j'})-\min_{1}(a_{j'}))\frac{\epsilon'}{2}\right) when both of these terms are positive, and the term that is positive when the other term is 0. (we can ignore the case where both terms are 0, as we shall soon see)

To begin, assume that both terms are 0. That means that EU(a_{j'}) and EU(a_{j}) are both constants. If EU(a_{j'})>EU(a_{j}), then the theorem is proved. If they are equal, then one of two situations apply. In case 1, a_{j'} is never selected as one of the best actions, in which case we can ignore it. In case 2, a_{j'} is selected as one of the best actions somewhere in C', and then no action would dominate a_{j} by a positive number in the interior of C, which is impossible by Lemma 3.

Therefore, assuming the theorem hasn’t been trivially proved yet, F(p) is always defined for all p\in C'. Also, note that F(p) is bounded below by a constant (because there are only finitely many actions, all of which have their corresponding term be a constant, or 0)

Select an arbitrary point p, and the corresponding action, a_{j'} such that

EU_{lower,p}(a_{j'})-EU_{upper,p}(a_{j})<F(p)

If there isn’t any, we’re done (and \delta=argmin_{p}F(p).

Otherwise, by Lemma 2, if we look at the L_{1} ball of \epsilon'-sized perturbations, one of two events will have occurred. In case 1, there’s a point in that ball where the "domination gap" given by F(p) has been closed completely. In case 2, the maximum possible utility for a_{j} is less than the minimum possible utility for a_{j'}, and the ball of size \epsilon' has been ruled out, as all points in that ball have a_{j'} dominating a_{j} by a fixed constant \delta.

In case 1, because the point q where the utility gap closes is \epsilon'-away from a point p where \alpha_{C}(p)>\epsilon, and \epsilon'=\frac{\epsilon}{2L}, then \alpha_{C}(q)\ge\frac{\epsilon}{2} (due to the bounded lipschitz constant of \alpha_{C}) Then there’s a point where a_{j} is not dominated on the interior of C, which by Lemma 3, is impossible.

Therefore, case 2 must apply. However, because the worst-case utility of action a_{j'} dominates the best-case utility of action a_{j} by \delta, our assumption that the theorem was false is false.

#Lemma 5:

Given some point p where a_{j'} dominates a_{j} by \delta or more, and |P_{j}|>\epsilon>>\delta, and p is more than 2(m-1)\epsilon+\frac{\delta}{3} from a point that is in B_{a_{j}} (according to the L_{1} norm), then the sequence of prices {p_{t}}{t\in\mathbb{N}^{+}} will only finitely often be in the L{1}-ball of size ~\frac{\delta}{6} around p.

We will consider two cases, and prove them seperately.

In case 1, |P_{j'}|\ge\epsilon.

By Lemma 1, the change in expected utility for a_{j} and a_{j'} over the \frac{\delta}{6} L_{1} ball will be about, or less than, \frac{\delta}{3}. Because a_{j'} dominates a_{j} by \delta or more at p, then in this ball, a_{j'} always dominates a_{j} by \frac{\delta}{3} or more.

Now, there is a convex \mathbb{P}-generable fuzzy set C which covers this ball (and a tiny region outside of it, due to the continuous indicator functions, but this doesn’t affect anything, because the support of that region still has a_{j'} dominating a_{j} by about \frac{\delta}{3} or more). Assuming that there are infinitely many points in the ball (by contradiction), then by calibration, the empirical joint probabilities, summed over all time, will be within th support of C.

To recap the definition of expected value for a logical inductor,

\mathbb{E}{t}(U{t}|a_{t}=j)=\frac{1}{t}\sum_{i=0}^{i=t-1}\min\left(\frac{\mathbb{P}{t}("a{t}=j"\wedge "U_{t}>\frac{i}{t}")}{\mathbb{P}{t}(a{t}=j)},1\right)

By assumption, a_{t}=j has a positive probability around \epsilon or greater. By coherence, the logical inductor will eventually learn that the price on a_{t}=j should equal the sum of the prices over all the other joint outcomes, so in the limit, \mathbb{P}{t}(a{t}=j)\simeq_{t}|P_{j}|. A similar argument applies to learning the utilities of the various outcomes in the top of the fraction, so in the limit, \mathbb{E}{t}(U{t}|a_{t}=j)\simeq_{t}EU(a_{j}).

Now, since a_{j'} also has probability bounded away from 0, then the same argument applies to it as well. The expected utilities will eventually be learned. Because there’s a \frac{\delta}{3} or larger difference in expected utilities, a_{j} will be taken with limiting probability 0 when the prices are inside C. However, due to calibration, the probability of a_{j} must be bounded away from 0, so we have a contradiction, and the sequences of prices {p_{t}}{t\in\mathbb{N}^{+}} must have only finitely many elements in the L{1}-ball.

Now for case 2. Assume case 1 is false, so for the point p, for all a_{j'} which dominate a_{j} by \delta or more, |a_{j'}|<\epsilon.

Consider the L_{1}-ball of size \frac{\delta}{6} around p. There degree of improvement in EU(a_{j}) over the whole ball can be at most \frac{\delta}{3}, by Lemma 1. Let \delta' be the maximum increase over EU_{p}(a_{j}) for a_{j}, for the whole ball.

For each action a_{j'} that is not a_{j} and has EU_{lower,p}(a_{j'})>EU_{upper,p}(a_{j}), there is a perturbation of size 2\epsilon or less that will either drive their expected utility to EU_{upper,p}(a_{j})+\delta' or below, or drive their expected utility to the minimum possible.

If a_{j'} has a probability less than \epsilon (as all the a_{j'} that \delta-dominate a_{j} do), by taking \epsilon utility mass from that action (to drive it to probability 0) and putting it on a_{j} in a way that preserves expected value, that a_{j'} will be counted as having the lowest possible expected utility in the dominance calculation. If a_{j'} has a probability \ge\epsilon (and has an expected utility within \delta of a_{j}) Lemma 2 works to either drive the expected utility of a_{j'} below that of a_{j}, or drive it to the minimum possible expected utility.

If the worst-case utility for all those actions is equal to or less than EU_{p}(a_{j})+\delta', then there’s a point in B_{a_{j}} that’s within 2(m-1)\epsilon+\frac{\delta}{3} distance of p (apply the perturbations mentioned above to make all other actions lose), which violates one of the starting assumptions. Therefore, there must be some action a_{j'} that dominates a_{j} by more than \delta' at p, using the worst-case utility, so it dominates a_{j} by a fixed amount over the whole L_{1} ball.

Now, we shouldn’t necessarily assume that, over the whole ball, accurate utilities for a_{j'} will be learned, because the ball may have regions where the probability of a_{j'} is arbitrarily close to 0. However, density-zero exploration lets the inductor learn a lower bound on the utility of a_{j'}, as we’ll see.

Assume that the L_{1}-ball has infinitely many price points in it (for a contradiction) Then, we can again make a \mathbb{P}-generable convex set C with infinitely many points in it, and because of infinite exploration on divergent \mathbb{P}-generable subsequences (this is a slight strengthening of Scott’s density-zero exploration), the action a_{j'} will be taken infinitely often, and in finite time, the inductor will learn that \mathbb{E}{t}(U{t}|a_{t}=j) will be close to, or greater than, the worst-case utility for a_{j'} when the prices are in the set C.

Again, we get the logical inductor learning to never take action a_{j}, which produces a contradiction with the calibration property in the same way as before, because in this region, the price of a_{j} is bounded away from 0.

Therefore, in case 2, there are still only finitely many points in the ball around p, and the theorem has been proved.

#Lemma 6:

One of the following two properties applies to all convex \mathbb{P}-generable sets of prices over joint outcomes:

1: C\in\mathcal{B}{a{j}}

2: \lim_{t\to\infty}\frac{1}{t}\sum_{k\le t}\alpha_{C}(k)\cdot ind(a_{j},k)=0

(I suspect it’s possible to strengthen condition 2 a little bit, but this more than suffices for the upcoming proof in the next post. Also, ind(a_{j},k) is the function that is 0 when a_{j} is not taken on turn k, and 1 otherwise.)

Assume 1 is false. If it’s true, the proof is over. Also, assume there are infinitely many points in the interior of C, because otherwise 2 is true and the proof is over. The space of prices is composed entirely of the following three disjoint sets of points. Set \Gamma is composed of points where \alpha_{C}(p)<\epsilon. in sets \Delta and \Theta, this doesn’t hold (and by Lemma 4, there must always be some action a_{j'} that dominates a_{j} by \delta or more).

Set \Delta\cup\Theta is convex, because C is. Set \Delta is the (convex) set of points where a_{j}<\frac{\epsilon}{L}, and set \Theta is the (convex) set of points where a_{j}\ge\frac{\epsilon}{L}.

Set \Theta can have Lemma 5 applied to it to conclude that the sequence of prices will only be within it finitely often. Just identify \frac{\epsilon}{L} with 2(m-1)\epsilon+\frac{\delta}{3} in Lemma 5 to get the bounding away from B_{a_{j}}, because that’s the minimum distance to travel to get outside of supp(C), and identify \delta or something sufficiently smaller with the \delta in Lemma 5 to get the domination condition.

We can split up \lim_{t\to\infty}\frac{1}{t}\sum_{k\le t}\alpha_{C}(k)\cdot ind(a_{j},k) into three sequences (with some overlap, but it won’t matter, because it produces an overestimate)

In the first sequence, the prices for the day are in set \Gamma. In the second (fuzzy) sequence,the expected utility of a_{j} is less than \frac{\epsilon}{L} (plus a little bit, so an expressible feature can capture this), and in particular, this fuzzy sequence is \mathbb{P}-generable. In the third sequence, the prices for the day are in set \Theta. All days are in one of these three sequences.

Over the first sequence, the time-averaged sum must limit to \epsilon or less, because \alpha_{C}(k)<\epsilon, always.

Over the second (fuzzy) sequence, the long-term frequency of a_{j} being taken must limit to \frac{\epsilon}{L} or less, by calibration, so the time-averaged sum must limit to \frac{\epsilon}{L} or less.

Over the third sequence, because it has finitely many points in it, the time-averaged sum limits to 0.

Therefore, the time-averaged sum must limit to \epsilon(1+\frac{1}{L}) or less. Because L is a fixed constant, and \epsilon can be as small as we want, the time-averaged sum must converge to 0.