Improved error space for universal optimal predictor schemes

https://www.lesswrong.com/posts/5bd75cc58225bf067037502c/improved-error-space-for-universal-optimal-predictor-schemes

We construct an error space which is smaller than \Delta_{avg}^2 but admits analogous existence theorems for optimal predictor schemes.

Results

Construction

Given \phi \in \Phi we define \Delta_\phi^1 to be the set of functions \delta: \Nats \rightarrow \Reals^{\geq 0} s.t. \exists \epsilon > 0: \LimInf{k} \phi(k)^\epsilon \delta(k) = 0. It is easily seen that \Delta_\phi^1 is an error space.
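For concreteness (an illustration of ours, not part of the construction): take \phi(k) = k. Then \delta(k) = 2^{-\sqrt{k}} \in \Delta_\phi^1, since already for \epsilon = 1

\LimInf{k} k \cdot 2^{-\sqrt{k}} = 0

whereas \delta(k) = \frac{1}{\log k} \notin \Delta_\phi^1, since \frac{k^\epsilon}{\log k} \rightarrow \infty for every \epsilon > 0.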

Given \phi \in \Phi, denote t_\phi(k) := \Floor{2^{(\log k)^{\phi(k)}}}. We define \Delta_{ll,\phi}^2 to be the set of bounded functions \delta: \Nats^2 \rightarrow \Reals^{\geq 0} s.t. for any \phi' \in \Phi, if \phi' \leq \phi then

\frac{\sum\limits_{j=2}^{t_{\phi'}(k)-1} (\log \log (j+1) - \log \log j) \delta(k, j)}{\log \log t_{\phi'}(k)} \in \Delta_{\phi'}^1

We define \Delta_{ll}^2 := \bigcap_{\phi \in \Phi} \Delta_{ll,\phi}^2.
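Note that, taking logarithms to base 2 and ignoring rounding,

\log \log t_\phi(k) = \log \left( (\log k)^{\phi(k)} \right) = \phi(k) \log \log k

so the denominators \phi(k) \log \log k appearing in the Appendix are just \log \log t_\phi(k).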

Proposition 1

\Delta_{ll,\phi}^2 is an error space for any \phi \in \Phi. \Delta_{ll}^2 is an error space.

Proposition 2

Consider a polynomial q: \Nats^2 \rightarrow \Nats. There is a function \lambda_q: \Nats^3 \rightarrow [0,1] s.t.

(i) \forall k,j \in \Nats: \sum_{i \in \Nats} \lambda_q(k,j,i) = 1

(ii) For any function \epsilon: \Nats^2 \rightarrow [0,1] we have

\epsilon(k,j) - \sum_{i \in \Nats} \lambda_q(k,j,i) \, \epsilon(k,q(k,j)+i) \in \Delta_{ll}^2
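As a sanity check (our example): for q(k,j) = j one can take \lambda_q(k,j,0) = 1 and \lambda_q(k,j,i) = 0 for i > 0, so that the expression in (ii) vanishes identically and lies in every error space. The content of the proposition is that a suitable averaging over i works even when q grows polynomially.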

The proofs of Propositions 1 and 2 are given in the Appendix. The following results are proved exactly like the analogous statements for \Delta_{avg}^2, so we omit the proofs.

Lemma

Consider (f, \mu) a distributional estimation problem and (P,r,a), (Q,s,b) (poly,log)-predictor schemes. Suppose p: \Nats^2 \rightarrow \Nats is a polynomial and \delta \in \Delta_{ll}^2 is s.t.

\forall i,k,j \in \Nats: E[(P^{k,p(k,j)+i}-f)^2] \leq E[(Q^{kj}-f)^2] + \delta(k,j)

Then \exists \delta' \in \Delta_{ll}^2 s.t.

E[(P^{kj}-f)^2] \leq E[(Q^{kj}-f)^2] + \delta'(k,j)
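For orientation, a sketch of the omitted argument (it parallels the \Delta_{avg}^2 case): applying Proposition 2 to \epsilon(k,j) := E[(P^{kj}-f)^2] yields \delta'' \in \Delta_{ll}^2 with

E[(P^{kj}-f)^2] \leq \sum_{i \in \Nats} \lambda_p(k,j,i) \, E[(P^{k,p(k,j)+i}-f)^2] + \delta''(k,j)

and averaging the hypothesis over i with the weights \lambda_p(k,j,i) (which sum to 1) gives the conclusion with \delta' := \delta + \delta'', error spaces being closed under addition.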

Theorem 1

Consider (f,\mu) a distributional estimation problem. Define \Upsilon: \Nats^2 \times \Words^3 \xrightarrow{alg} [0,1] by

\Upsilon^{kj}(x,y,Q) := \beta(ev^j(Q,x,y))

Define \upsilon_{f,\mu}: \Nats^2 \rightarrow \Words by

\upsilon_{f,\mu}^{kj}:=\Argmin{|Q| \leq \log j} E_{\mu^k \times U^j}[(\Upsilon^{kj}(x,y,Q)-f(x))^2]

Then, (\Upsilon, j, \upsilon_{f,\mu}) is a \Delta_{ll}^2(poly, log)-optimal predictor scheme for (f,\mu).
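Unpacking the definitions (our paraphrase): the advice string \upsilon_{f,\mu}^{kj} is a program Q of length at most \log j minimizing the expected squared prediction error, and the scheme's prediction on (x,y) is

\Upsilon^{kj}(x,y,\upsilon_{f,\mu}^{kj}) = \beta(ev^j(\upsilon_{f,\mu}^{kj},x,y))

i.e. the best program of logarithmic length, evaluated with resource parameter j (the precise semantics of ev and \beta are those of the framework this post builds on).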

Theorem 2

There is an oracle machine \Lambda that accepts an oracle of signature SF: \Nats \times \Words \rightarrow \Words \times [0,1] and a polynomial r: \Nats \rightarrow \Nats, where the allowed oracle calls are SF^k(x) for |x|=r(k), and computes a function of signature \Nats^2 \times \Words^2 \rightarrow [0,1], s.t. for any \phi \in \Phi, any distributional estimation problem (f, \mu) and any corresponding \Delta_\phi^1(log)-generator G:=(S, F, r^S, a^S), \Lambda[G] is a \Delta_{ll,\phi}^2(poly,log)-optimal predictor scheme for (f,\mu).

Appendix

Proof of Proposition 1

The only slightly non-obvious condition is (v). Here E_{\kappa_\phi^k} denotes the weighted average over j from the definition of \Delta_{ll,\phi}^2, i.e. with weights proportional to \log \log (j+1) - \log \log j for 2 \leq j < t_\phi(k). For the relevant \alpha \in (0,1], Jensen's inequality for the concave function x \mapsto x^\alpha gives

\limsup_{k \rightarrow \infty} \phi(k)^{\epsilon \alpha} E_{\kappa_\phi^k}[\delta(k,j)^\alpha] \leq \limsup_{k \rightarrow \infty} \phi(k)^{\epsilon \alpha} E_{\kappa_\phi^k}[\delta(k,j)]^\alpha

\limsup_{k \rightarrow \infty} \phi(k)^{\epsilon \alpha} E_{\kappa_\phi^k}[\delta(k,j)^\alpha] \leq (\limsup_{k \rightarrow \infty} \phi(k)^\epsilon E_{\kappa_\phi^k}[\delta(k,j)])^\alpha

\lim_{k \rightarrow \infty} \phi(k)^{\epsilon \alpha} E_{\kappa_\phi^k}[\delta(k,j)^\alpha] = 0

Proof of Proposition 2

Given functions q_1,q_2: \Nats^2 \rightarrow \Nats s.t. q_1(k,j) \geq q_2(k,j) for k,j \gg 0, the proposition for q_1 implies the proposition for q_2 by setting

\lambda_{q_2}(k,j,i):=\begin{cases}\lambda_{q_1}(k,j,i-q_1(k,j)+q_2(k,j)) & \text{if } i-q_1(k,j)+q_2(k,j) \geq 0 \\ 0 & \text{if } i-q_1(k,j)+q_2(k,j) < 0 \end{cases}

Therefore, it is enough to prove the proposition for functions of the form q(k,j)=j^{m+n \log k} for m > 0.
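(A quick check, ours, that such functions eventually dominate every polynomial: for j \geq 2 and q(k,j) \leq C k^a j^b we have k^a = 2^{a \log k} \leq j^{a \log k} and C \leq j^{\log C}, whence q(k,j) \leq j^{(b + \log C + 1) + a \log k}, which has the required form.)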

Consider any \phi \in \Phi. We have

\LimInf{k} \phi(k)^{-\frac{1}{2}} = 0

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\log\log k}{\phi(k) \log\log k} = 0

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\log (m+n \log k)}{\phi(k) \log \log k} = 0

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\int\limits_{x=2}^{2^{m+n \log k}} d(\log \log x)}{\phi(k) \log \log k} = 0
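(Here the previous step uses \log(m+n \log k) = \log \log k + O(1), and, with logarithms to base 2 so that \log \log 2 = 0, the integral evaluates to \int_{x=2}^{2^{m+n \log k}} d(\log \log x) = \log(m+n \log k).)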

Since \epsilon takes values in [0,1]

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\int\limits_{x=2}^{2^{m+n \log k}} \epsilon(k,\Floor{x}) d(\log \log x)}{\phi(k) \log \log k} = 0

Similarly

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\int\limits_{x=t_\phi(k)}^{t_\phi(k)^{m+n \log k}} \epsilon(k,\Floor{x}) d(\log \log x)}{\phi(k) \log \log k} = 0

The last two equations imply that

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\int\limits_{x=2}^{t_\phi(k)} \epsilon(k,\Floor{x}) d(\log \log x) - \int\limits_{x=2^{m+n \log k}}^{t_\phi(k)^{m+n \log k}} \epsilon(k,\Floor{x}) d(\log \log x)}{\phi(k) \log \log k} = 0
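(This is because, for k large enough that 2^{m+n \log k} \leq t_\phi(k), the integrals telescope:

\int\limits_{x=2}^{t_\phi(k)} - \int\limits_{x=2^{m+n \log k}}^{t_\phi(k)^{m+n \log k}} = \int\limits_{x=2}^{2^{m+n \log k}} - \int\limits_{x=t_\phi(k)}^{t_\phi(k)^{m+n \log k}}

so the numerator above is exactly the difference of the numerators in the two preceding equations.)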

Raising x to a power is equivalent to adding a constant to \log \log x, therefore

\LimInf{k} \phi(k)^{\frac{1}{2}} \frac{\int\limits_{x=2}^{t_\phi(k)} \epsilon(k,\Floor{x}) d(\log \log x) - \int\limits_{x=2}^{t_\phi(k)} \epsilon(k,\Floor{x^{m+n \log k}}) d(\log \log x)}{\phi(k) \log \log k} = 0

\LimInf{k}\phi(k)^{\frac{1}{2}}\frac{\int\limits_{x=2}^{t_\phi(k)} (\epsilon(k,\Floor{x})-\epsilon(k,\Floor{x^{m+n \log k}})) d(\log \log x)}{\phi(k) \log \log k} = 0

Since \Floor{x^{m+n \log k}} \geq \Floor{x}^{m+n \log k} we can choose \lambda_q satisfying condition (i) so that

\int\limits_{x=j}^{j+1} \epsilon(k,\Floor{x^{m+n \log k}}) d(\log\log x) = (\log\log(j+1)-\log\log j) \sum_i \lambda_q(k,j,i) \, \epsilon(k,j^{m+n \log k}+i)
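Concretely (our reading of this choice): \lambda_q(k,j,i) is the normalized \log\log-measure of those x \in [j,j+1) with \Floor{x^{m+n \log k}} = j^{m+n \log k}+i, i.e.

\lambda_q(k,j,i) = \frac{1}{\log\log(j+1)-\log\log j} \int\limits_{\{x \in [j,j+1):\ \Floor{x^{m+n \log k}} = j^{m+n \log k}+i\}} d(\log \log x)

which is non-negative and sums to 1 over i, as condition (i) requires.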

It follows that

\int\limits_{x=j}^{j+1} \epsilon(k,\Floor{x^{m+n \log k}}) d(\log\log x) = \int\limits_{x=j}^{j+1} \sum_i \lambda_q(k,\Floor{x},i) \, \epsilon(k,\Floor{x}^{m+n \log k}+i) d(\log\log x)

\LimInf{k}\phi(k)^{\frac{1}{2}}\frac{\int\limits_{x=2}^{t_\phi(k)} (\epsilon(k,\Floor{x})-\sum_i \lambda_q(k,\Floor{x},i) \, \epsilon(k,\Floor{x}^{m+n \log k}+i)) d(\log \log x)}{\phi(k) \log \log k} = 0

\LimInf{k}\phi(k)^{\frac{1}{2}}\frac{\sum_{j=2}^{t_\phi(k)-1} (\log\log(j+1)-\log\log j)(\epsilon(k,j)-\sum_i \lambda_q(k,j,i) \, \epsilon(k,j^{m+n \log k}+i))}{\phi(k) \log \log k} = 0

Since \phi \in \Phi was arbitrary (and the condition for \phi' \leq \phi follows by applying the same argument to \phi' itself), this is exactly the condition defining

\epsilon(k,j) - \sum_{i \in \Nats} \lambda_q(k,j,i) \, \epsilon(k,q(k,j)+i) \in \Delta_{ll}^2