This post is part of the Asymptotic Logical Uncertainty series. In this post, I will talk more about the exact assumptions we need to make about the sequence of sentences in the Benford Test.
I will start by making a few changes to the previous notation. First, we no longer need the output of L to be computable in bounded time. From now on, we can just say that L is a Turing machine that on input N eventually accepts if \phi_N is provable, eventually rejects if \phi_N is disprovable, and runs forever otherwise.
We will also change the sequence of sentences in the Benford test. Instead of "The first digit of A(n) in base 10 is 1," we will use "The first digit of 3\uparrow^n 3 in base 10 is 1." This way, the reasonable answer to every question is \frac{1}{\log(10)}, whereas before the powers of 10 were an exception. This change does not make the Benford test weaker (the program we eventually present will also pass the original Benford test), but it does make many definitions less messy. As the sketch below illustrates, these sentences are also still well out of reach of direct computation for all but the smallest n.
Finally, we will switch to deterministic machines. Instead of making a machine which outputs 1 with the correct probability, we will have a machine that outputs a probability. This clearly makes the problem no easier.
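To see why direct computation is hopeless here, note that 3\uparrow^n 3 can only be evaluated for the first couple of values of n. A minimal sketch (the helper up_arrow is mine, just for illustration):

```python
def up_arrow(a, n, b):
    """Knuth's up-arrow a ^(n) b: a ^(1) b = a**b, and a ^(n) b is a
    right-associated chain of b copies of a combined with ^(n-1)."""
    if n == 1:
        return a ** b
    result = a
    for _ in range(b - 1):
        result = up_arrow(a, n - 1, result)
    return result

for n in (1, 2):
    value = up_arrow(3, n, 3)
    print(n, str(value)[0])  # first digit of 3 up^n 3
# n=1: 3^3 = 27, so the first digit is 2
# n=2: 3^27 = 7625597484987, so the first digit is 7
# n=3 is 3 upup (3^27), a power tower of 3s of height 7625597484987:
# far beyond anything we could ever evaluate directly.
```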
Here is the new version of the Benford test:
Let M be a Turing machine which on input N runs quickly and outputs a probability M(N), which represents the probability assigned to \phi_N. We say that M satisfies the Benford test if \lim_{n\rightarrow\infty}M(s_n)=\frac{1}{\log(10)}, where \phi_{s_n}="The first digit of 3\uparrow^n 3 is a 1."
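One remark: the test only constrains M on the indices s_n, so a machine that recognizes those indices and hard-codes the constant passes trivially. A minimal sketch, assuming a hypothetical helper is_benford_index that quickly checks whether \phi_N has the form above:

```python
import math

BENFORD_P = 1 / math.log(10)  # the target probability from the post

def M(N, is_benford_index):
    # If phi_N is one of the Benford sentences, output the constant
    # 1/log(10); then lim M(s_n) = 1/log(10) holds trivially.
    # On every other input this sketch outputs an uninformative 1/2.
    return BENFORD_P if is_benford_index(N) else 0.5
```

The real difficulty, taken up later in the series, is passing the test with a machine that is not special-cased to this particular sequence.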
In this post, we will not present a solution to the Benford test, but will be very explicit about the assumptions we need to make.
The fact that the best probability to assign to \phi_{s_n} is \frac{1}{\log(10)} depends not only on the fact that the frequency with which \phi_{s_n} is true tends to \frac{1}{\log(10)}, but also on the fact that the sequence of truth values of \phi_{s_n} contains no patterns that can be used to quickly compute a better probability on some subsequence. We therefore assume that this sequence of truth values is indistinguishable from a sequence produced by a coin that outputs "true" with probability \frac{1}{\log(10)}. Formally, we are assuming that S=\{s_n\mid n\in\mathbb{N}\} is an irreducible pattern with probability \frac{1}{\log(10)}, as defined below.
Fix a universal Turing machine UTM and an encoding scheme for machines, and let UTM(M,x) denote running the machine UTM to simulate M with input x.
Let S\subseteq\mathbb{N} be an infinite set such that \phi_N is provable or disprovable for all N\in S, and there exists a Turing machine Z which on input N runs in time T(N) and accepts N if and only if N\in S. We say that S is an irreducible pattern with probability p if there exists a constant c such that for every positive integer m\geq 3 and every quickly computable subset S'\subseteq S with at least m elements, we have |r(m,W)-p|<\frac{c\,k(W)\sqrt{\log\log m}}{\sqrt{m}}, where k(W) is the number of bits necessary to encode a Turing machine W such that, for N\in S, UTM(W,N) accepts in time T(N) if and only if N\in S', and r(m,W) is the probability that \phi_N is provable when N is chosen uniformly at random from the first m elements of S'.
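Nothing in the post depends on the following, but the inequality is concrete enough to check against data. A sketch, assuming truths is the sequence of truth values of \phi_N along the first elements of S', k_W is the bit-length of W, and c is a candidate constant (all names mine, not the post's):

```python
import math

def rhs(c, k_W, m):
    # Right-hand side of the definition: c * k(W) * sqrt(log log m) / sqrt(m).
    return c * k_W * math.sqrt(math.log(math.log(m))) / math.sqrt(m)

def violates(truths, p, c, k_W):
    """True iff |r(m,W) - p| >= rhs for some m >= 3, where r(m,W) is the
    fraction of provable sentences among the first m elements of S'."""
    provable = 0
    for m, t in enumerate(truths, start=1):
        provable += bool(t)
        if m >= 3 and abs(provable / m - p) >= rhs(c, k_W, m):
            return True
    return False
```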
This may seem like an unmotivated definition, but there is a good reason for it. It comes from the Law of the Iterated Logarithm. A coin that outputs true with probability p will pass this test with probability 1. The definition is tight in the sense that we cannot replace the right hand side with something that diminishes more quickly as m increases. It is also important to note that while we think that this is a good definition of the subsequence being quickly indistinguishable from a coin, we really only need it as a necessary condition, so that the sequence of Benford sentences is an irreducible pattern.
Theorem: If we replace provability in the definition of irreducible pattern with a random process, such that for each N\in S the sentence \phi_N is independently called "provable" with probability p, then S is almost surely an irreducible pattern with probability p.
Proof: Fix a Turing machine W. By the law of the iterated logarithm, there exists a constant c_1 such that \limsup_{m\rightarrow\infty} \frac{|mr(m,W)-mp|}{\sqrt{m\log\log m}}=c_1 almost surely. Therefore \sup_{m} \frac{|mr(m,W)-mp|}{\sqrt{m\log\log m}}<\infty almost surely. We will use \Phi(W) as shorthand for this supremum. Therefore, for any \varepsilon>0, there exists a c_2 such that \Phi(W)>c_2 with probability at most \varepsilon.
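The quantity \Phi(W) is easy to explore numerically. A sketch of one sample path of a p-coin (truncated at finitely many flips, so it only lower-bounds the true supremum; the function name and parameters are mine, not the post's):

```python
import math
import random

def phi_estimate(p, flips=10**6, seed=0):
    """Running sup over 3 <= m <= flips of |m*r(m) - m*p| / sqrt(m log log m)."""
    rng = random.Random(seed)
    heads, sup = 0, 0.0
    for m in range(1, flips + 1):
        heads += rng.random() < p
        if m >= 3:  # log log m is positive only from m = 3 onward
            sup = max(sup, abs(heads - m * p)
                      / math.sqrt(m * math.log(math.log(m))))
    return sup

print(phi_estimate(1 / math.log(10)))  # finite almost surely, by the LIL
```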
We now show that the probability that \Phi(W)>2c_2+1 is at most \varepsilon^2. Since \Phi(W)>c_2 has probability at most \varepsilon, it suffices to show that the probability of \Phi(W)>2c_2+1 given \Phi(W)>c_2 is at most \varepsilon. Let m_1 be the first m such that \frac{|mr(m,W)-mp|}{\sqrt{m\log\log m}}>c_2. It suffices to show that the probability that there exists an m_2>m_1 with \frac{|m_2r(m_2,W)-m_2p|}{\sqrt{m_2\log\log m_2}}-\frac{|m_1r(m_1,W)-m_1p|}{\sqrt{m_1\log\log m_1}}>c_2 is at most \varepsilon.
Observe that \frac{|m_2r(m_2,W)-m_2p|}{\sqrt{m_2\log\log m_2}}-\frac{|m_1r(m_1,W)-m_1p|}{\sqrt{m_1\log\log m_1}}\leq\frac{|m_2r(m_2,W)-m_1r(m_1,W)-(m_2-m_1)p|}{\sqrt{(m_2-m_1)\log\log (m_2-m_1)}}, and that, since the coin flips after the first m_1 elements are independent of the flips up to m_1, the probability that there exists an m_2>m_1 with \frac{|m_2r(m_2,W)-m_1r(m_1,W)-(m_2-m_1)p|}{\sqrt{(m_2-m_1)\log\log (m_2-m_1)}}>c_2 is the same as the probability that \Phi(W)>c_2, which is at most \varepsilon.
Iterating this argument shows that for every \varepsilon, there exists a constant c_3=c_2+1 such that the probability that \Phi(W)\geq2^{\ell}c_3 is at most \varepsilon^{2^\ell} for every \ell.
Partition the set of all Turing machines into sets \mathcal{W}_1,\mathcal{W}_2,\ldots, such that \mathcal{W}_\ell contains all Turing machines expressed in at least 2^\ell but fewer than 2^{\ell+1} bits. The probability that a Turing machine W in \mathcal{W}_\ell violates |r(m,W)-p|<\frac{c_3\,k(W)\sqrt{\log\log m}}{\sqrt{m}} (call this equation (\star)) for any m\geq 3 is at most \varepsilon^{2^\ell}, since k(W)\geq 2^\ell, so such a violation implies \Phi(W)\geq 2^\ell c_3. The number of Turing machines in \mathcal{W}_\ell is at most 2^{2^{\ell+1}}, so the probability that there is any W\in\mathcal{W}_\ell and m\geq 3 which violate (\star) is at most \varepsilon^{2^\ell}2^{2^{\ell+1}}.
Therefore, the probability that there is any Turing machine W and m\geq 3 which violate (\star) is at most \sum_{\ell\in\mathbb{N}} \varepsilon^{2^\ell}2^{2^{\ell+1}}=\sum_{\ell\in\mathbb{N}} (4\varepsilon)^{2^\ell}. For small enough \varepsilon this goes to 0, so for large enough c_3, the probability that (\star) holds for all W and m goes to 1. Therefore, with probability 1, there exists a c such that |r(m,W)-p|<\frac{c\,k(W)\sqrt{\log\log m}}{\sqrt{m}} for all m and W.
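To spell out why the sum vanishes: for \varepsilon<\frac{1}{4} and \ell\geq 1 we have 2^\ell\geq\ell+1, so \sum_{\ell\in\mathbb{N}}(4\varepsilon)^{2^\ell}\leq\sum_{\ell\geq 1}(4\varepsilon)^{\ell+1}=\frac{(4\varepsilon)^2}{1-4\varepsilon}, which goes to 0 as \varepsilon\rightarrow 0.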
It appears to me that there is a natural analogue of the concept of irreducible pattern in the language of average case complexity. Moreover, the calibration theorem for optimal predictor schemes implies they pass the Benford test in the associated sense. I’ll write it down carefully and post...
Btw, it seems that the definition makes sense for arbitrary promise problems; you don’t have to consider provability in particular.