Contents
- Updated list of proposals by complexity class
- Proofs
- Imitative amplification with weak HCH accesses EXP
- Approval-based amplification accesses EXP
- Recursive reward modeling accesses EXP
This post is a follow-up to my "Alignment proposals and complexity classes" post. Thanks to Sam Eisenstat for helping with part of the proof here.
Previously, I proved that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling access \text{PSPACE} while AI safety via market making accesses \text{EXP}. At the time, I wasn’t sure whether my market making proof would generalize to the others, so I just published it with the \text{PSPACE} proofs instead. However, I have since become convinced that the proof does generalize, and that it generalizes to all of the proposals I mentioned, such that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling all actually access \text{EXP}. This post attempts to prove that.
Updated list of proposals by complexity class
\text{P}: Imitation learning (trivial)
\text{PSPACE}: AI safety via debate (proof)
\text{EXP}: AI safety via market making (proof), Imitative amplification with weak HCH (proof below), Approval-based amplification (proof below), Recursive reward modeling (proof below)
\text{NEXP}: Debate with cross-examination (proof)
\text{R}: Imitative amplification with strong HCH (proof), AI safety via market making with pointers (proof)
Proofs
Imitative amplification with weak HCH accesses \text{EXP}
The proof here is similar in structure to my previous proof that weak HCH accesses \text{PSPACE}, so I’ll only explain where this proof differs from that one.
First, since l \in \text{EXP}, we know that for any x \in X, T_l(x) halts in O(2^{\text{poly}(n)}) steps where n = |x|. Thus, we can construct a function f_l(n) = c_1 + c_2 2^{c_3 n^{c_4}} such that for all x \in X, T_l(x) halts in at most f_l(|x|) steps, by picking c_1,~ c_2,~ c_3,~ c_4 large enough that c_3 n^{c_4} dominates the polynomial in the exponent for all n \in \mathbb N. Note that f_l is then computable in time polynomial in n.
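To see why f_l is cheap to compute even though its value is exponentially large, note that its binary representation only has about c_3 n^{c_4} bits. Here is a minimal sketch (the constants below are made up for illustration, not derived from any particular T_l):
```python
# Hypothetical constants standing in for the c_1..c_4 read off from T_l's
# O(2^poly(n)) time bound; they are illustrative only.
C1, C2, C3, C4 = 7, 3, 2, 3

def f_l(n: int) -> int:
    """Upper bound on how many steps T_l takes on inputs of length n."""
    return C1 + C2 * 2 ** (C3 * n ** C4)

# The value of f_l(n) is astronomically large, but writing it down in binary
# only takes roughly C3 * n**C4 bits, i.e. polynomially many bits in n.
for n in range(1, 5):
    print(n, f_l(n).bit_length())
```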
Second, let H’s new strategy be as follows (a code sketch of this recursion is given after the list):
- Given p, let s,~ x_\text{head} = M(p : f_l(|x|)). Then, return accept/reject based on whether s is an accept or reject state (it will always be one or the other).
- Given p : 0, return s_0,~ x_0 where s_0 is the starting state and x_0 is the empty tape symbol.
- Given p : i, let \begin{align*} & s,~ x = M(p : i - 1) \\ & x_{\text{left}} = M(p : i - 1 : -1) \\ & x_{\text{right}} = M(p : i - 1 : 1). \end{align*} Then, return s^\prime,~ x^\prime where s^\prime is the next state of T_l from state s over symbol x (where if we’re already in an accept/reject state we just stay there) and x^\prime is the new tape symbol at the head (which can be determined given s,~ x,~ x_\text{left},~ x_\text{right}).
- Given p : 0 : j, return x_0.
- Given p : i : j, let s,~ x_\text{head} = M(p : i - 1). Then, let h be the amount by which T_l changes the head position when run for one step from s,~ x_\text{head} (so that h \in \{-1, 0, 1\}). If j + h = 0, the cell at offset j from the new head position is the old head cell, so let x^\prime be the symbol that T_l writes there on that step (which is determined by s,~ x_\text{head}); otherwise, let x^\prime = M(p : i - 1 : j + h). Return x^\prime.
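To make the recursion concrete, here is a minimal, self-contained sketch of the strategy above with M’s answers replaced by direct recursive calls (which is what the equilibrium M(q) = H(q, M) licenses) and with memoization standing in for the fact that M only has to answer each distinct question once. The Turing machine interface (delta, the start/blank/accept/reject values) and the handling of the input tape are hypothetical stand-ins chosen for the sketch; in particular, the sketch writes x onto the initial tape rather than starting from an all-blank tape, and it is only meant to be run on toy machines, since unrolling a real T_l for f_l(|x|) steps directly would of course be exponentially expensive.
```python
# Illustrative sketch only: M's answers are modeled as recursive calls, and
# lru_cache stands in for M answering each distinct question once.

from functools import lru_cache

def weak_hch_decides(x, delta, start, blank, accept, reject, time_bound):
    """Decide whether T_l accepts x by unrolling it for time_bound steps.

    delta(state, symbol) -> (next_state, written_symbol, head_move in {-1, 0, +1}),
    with the written symbol going to the current head cell before the move.
    In the proof, time_bound would be f_l(|x|).
    """

    @lru_cache(maxsize=None)
    def head(i):
        # Question "p : i": (state, symbol under the head) after i steps.
        if i == 0:
            # Step 0: the sketch puts the input on the tape, head on cell 0.
            return start, (x[0] if x else blank)
        s, sym = head(i - 1)
        if s in (accept, reject):
            return s, sym                      # halted: configuration is frozen
        s2, written, move = delta(s, sym)
        # Symbol under the *new* head: the neighbour we moved onto, or the
        # freshly written symbol if the head stays put.
        new_sym = {-1: cell(i - 1, -1), 0: written, 1: cell(i - 1, 1)}[move]
        return s2, new_sym

    @lru_cache(maxsize=None)
    def cell(i, j):
        # Question "p : i : j": symbol at offset j from the head after i steps.
        if i == 0:
            return x[j] if 0 <= j < len(x) else blank
        s, sym = head(i - 1)
        if s in (accept, reject):
            return cell(i - 1, j)
        _, written, move = delta(s, sym)
        if j + move == 0:
            return written                     # offset j from the new head is the old head cell
        return cell(i - 1, j + move)

    # Question "p": after time_bound steps T_l has halted, so read off the state.
    final_state, _ = head(time_bound)
    return final_state == accept

# Toy usage: a machine that accepts iff its input contains a '1'.
def contains_one_delta(state, symbol):
    if symbol == "1":
        return "accept", symbol, 0
    if symbol == "_":                          # blank reached: no '1' seen
        return "reject", symbol, 0
    return "scan", symbol, 1                   # keep scanning right

print(weak_hch_decides("0010", contains_one_delta, "scan", "_", "accept", "reject", time_bound=16))  # True
print(weak_hch_decides("0000", contains_one_delta, "scan", "_", "accept", "reject", time_bound=16))  # False
```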
Note that this strategy precisely replicates the strategy used in my proof that market making accesses \text{EXP} for inputs p : i and p : i : j. Thus, I’ll just defer to that proof for why the strategy works and is polynomial time on those inputs. Note that this is where l \in \text{EXP} becomes essential, as it guarantees that i and j can be represented in polynomial space.
Then, the only difference between this strategy and the market making one is for the base p input. On p, given that M(p : i) works, M(p : f_l(|x|)) will always return a state after T_l has halted, which will always be an accept state if T_l accepts and a reject state if it rejects. Furthermore, since |M(p : f_l(|x|))| \in O(1) and f_l is computable in polynomial time, the strategy for p is polynomial, as desired.
Since this procedure works for all l \in \text{EXP}, we get that imitative amplification with weak HCH accesses \text{EXP}, as desired.
Approval-based amplification accesses \text{EXP}
The proof here is almost precisely the same as my previous proof that approval-based amplification accesses \text{PSPACE}, with the only modification being that we need to verify that |M(q)| + |H(q, M)| is still polynomial in size for the new imitative amplification strategy given above. However, the fact that i and j are bounded in absolute value by f_l(|x|) + 1 means that they are representable in polynomial space, as desired.
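Concretely, since every i and j that comes up is at most f_l(|x|) + 1 in absolute value, each can be written in binary using at most \log_2 f_l(|x|) + O(1) = c_3 |x|^{c_4} + O(1) bits, which is polynomial in |x|.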
Recursive reward modeling accesses \text{EXP}
Again, the proof here is almost precisely the same as my previous proof that recursive reward modeling accesses \text{PSPACE}, with the only modification being the same as in the above proof that approval-based amplification accesses \text{EXP}.