Optimal predictors and propositional calculus

https://www.lesswrong.com/posts/5bd75cc58225bf0670374fb3/optimal-predictors-and-propositional-calculus

This is a writeup of the stuff Benja and I thought up during the logical uncertainty workshop in May.

An optimal predictor for (D, \mu) allows assigning probabilities to logical sentences of the form x \in D, but in general it doesn't allow assigning probabilities to propositional formulae constructed from sentences of that form. Here we outline two approaches to defining such probabilities.

Let D be a language. Define B_D \subseteq \Words to be the set of words encoding propositional formulae whose variables are labeled by elements of \Words, such that the formula evaluates to true when each variable labeled by x is substituted by the truth value of x \in D. We can think of B_D as consisting of expressions like \Quote{0110 \in D} \land \neg \Quote{101 \in D} which represent true sentences.
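For concreteness, here is a minimal Python sketch of the evaluation that defines B_D. Everything concrete in it is an illustrative assumption rather than part of the formal setup: formulae are encoded as nested tuples, and the toy language D is taken to be the words of even length.

# Illustrative sketch of the evaluation defining B_D. The tuple encoding of
# formulae and the toy language D (words of even length) are assumptions
# made for this example only.

def chi_D(word: str) -> bool:
    """Characteristic function of the toy language D: words of even length."""
    return len(word) % 2 == 0

def evaluate(formula) -> bool:
    """Evaluate a propositional formula whose variables are labeled by words,
    substituting each variable by the truth value of (word in D).
    Formulae: ("var", word), ("not", f), ("and", f, g), ("or", f, g)."""
    tag = formula[0]
    if tag == "var":
        return chi_D(formula[1])
    if tag == "not":
        return not evaluate(formula[1])
    if tag == "and":
        return evaluate(formula[1]) and evaluate(formula[2])
    if tag == "or":
        return evaluate(formula[1]) or evaluate(formula[2])
    raise ValueError("unknown connective: " + str(tag))

# The expression from the text: "0110 in D" and not "101 in D".
# Here 0110 has even length and 101 does not, so the formula evaluates to
# true and its encoding belongs to B_D.
phi = ("and", ("var", "0110"), ("not", ("var", "101")))
assert evaluate(phi)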

Let \mu be a word ensemble and suppose P is an optimal predictor for (D, \mu). It would be nice to show that P satisfies something like the "coherence condition"

P^k(\phi) = P^k(\phi \land \psi) + P^k(\phi \land \neg \psi)

Obviously we cannot expect exact equality, since P is only defined up to similarity relative to \mu. Instead, we were able to show that under certain assumptions on \mu, the quantity (P^k(\phi) - P^k(\phi \land \psi) - P^k(\phi \land \neg \psi))^2 is "negligible on average" in a sense made precise below.

To see this, define the languages

D_+ = {(\phi, \psi) \mid \phi \land \psi \in B_D}
D_- = {(\phi, \psi) \mid \phi \land \neg \psi \in B_D}
D_0 = {(\phi, \psi) \mid \phi \in B_D}

All three languages have reductions to B_D:

f_+(\phi, \psi) := \phi \land \psi
f_-(\phi, \psi) := \phi \land \neg \psi
f_0(\phi, \psi) := \phi
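Unfolding the definitions, each map preserves membership (\chi denotes the characteristic function), which is the basic membership condition one expects of a reduction; the pseudo-invertibility required next is an additional assumption on the ensembles:

\chi_{D_+}(\phi, \psi) = \chi_{B_D}(f_+(\phi, \psi)) = \chi_{B_D}(\phi \land \psi)
\chi_{D_-}(\phi, \psi) = \chi_{B_D}(f_-(\phi, \psi)) = \chi_{B_D}(\phi \land \neg \psi)
\chi_{D_0}(\phi, \psi) = \chi_{B_D}(f_0(\phi, \psi)) = \chi_{B_D}(\phi)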

Assume there exists a word ensemble \nu such that f_+, f_- and f_0 are pseudo-invertible reductions of (D_+, \nu), (D_-, \nu) and (D_0, \nu) respectively to (B_D, \mu). Then by Theorem 6.1 of [1], f_+^{-1}(P) is an optimal predictor for (D_+, \nu), f_-^{-1}(P) is an optimal predictor for (D_-, \nu) and f_0^{-1}(P) is an optimal predictor for (D_0, \nu). Since D_0 = D_+ \sqcup D_-, Theorems 5.1 and 4.2 imply that f_0^{-1}(P) \overset{\nu}{\sim} f_+^{-1}(P) + f_-^{-1}(P). That is, E_{\nu^k}[(P^k(\phi) - P^k(\phi \land \psi) - P^k(\phi \land \neg \psi))^2] is negligible.
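To spell out the last step, assuming (as the expression above suggests) that the pullback is given by (f^{-1}(P))^k(x) := P^k(f(x)):

(f_+^{-1}(P))^k(\phi, \psi) = P^k(\phi \land \psi)
(f_-^{-1}(P))^k(\phi, \psi) = P^k(\phi \land \neg \psi)
(f_0^{-1}(P))^k(\phi, \psi) = P^k(\phi)

so the similarity f_0^{-1}(P) \overset{\nu}{\sim} f_+^{-1}(P) + f_-^{-1}(P) is exactly the statement that the expectation above is negligible in k.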

Another approach is to consider a language D equipped with a binary operation m: \Words \times \Words \rightarrow \Words such that \chi_D(m(x, y)) = \chi_D(x) \chi_D(y). If \mu is a word ensemble and P is an optimal predictor for (D, \mu), then P^k(m(x, y)) can be interpreted as the probability of \Quote{x \in D} \land \Quote{y \in D}. The probability of \Quote{x \in D} \land \neg \Quote{y \in D} can be taken to be \eta(P^k(x) - P^k(m(x, y))). Continuing in this manner, it is possible to assign probabilities to all propositional formulae of this sort.
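Here is a minimal Python sketch of how this recursive assignment can proceed. P_k and m are assumed to be given as black boxes (stand-ins for P^k and the operation m), and \eta is assumed, for the purposes of the example only, to clip its argument to [0, 1].

# Illustrative sketch only: P_k stands in for the predictor P^k, m for the
# operation with chi_D(m(x, y)) = chi_D(x) chi_D(y), and the clipping
# behaviour of eta is an assumption made for this example.

from functools import reduce
from itertools import combinations

def eta(p: float) -> float:
    """Clip a would-be probability to [0, 1]."""
    return min(max(p, 0.0), 1.0)

def prob_conj(words, P_k, m):
    """Probability that every word in the list belongs to D: m collapses a
    conjunction of atoms into a single word, and the empty conjunction has
    probability 1."""
    if not words:
        return 1.0
    return P_k(reduce(m, words))

def prob_literals(pos, neg, P_k, m):
    """Probability that every word in pos is in D and no word in neg is,
    computed by inclusion-exclusion over subsets of neg."""
    total = 0.0
    for r in range(len(neg) + 1):
        for subset in combinations(neg, r):
            total += (-1) ** r * prob_conj(list(pos) + list(subset), P_k, m)
    return eta(total)

# prob_literals([x], [y], P_k, m) evaluates to eta(P_k(x) - P_k(m(x, y))),
# the expression used in the text. A general propositional formula can then
# be handled by rewriting it as a disjunction of mutually exclusive
# conjunctions of literals and summing their probabilities.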

In this case, the coherence condition would hold automatically except for the presence of \eta. Define

D_+ = {(x, y) \mid m(x, y) \in D}
D_0 = {(x, y) \mid x \in D}

We can now apply the same method as above, combined with Lemma 4.3 (using the inclusion D_+ \subseteq D_0), to show approximate coherence.

It would be interesting to construct concrete examples in which these results are applicable.

[1] "A complexity theoretic approach to logical uncertainty (Draft)"