Supervise Process, not Outcomes

https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes

The spectrum

Supervising outcomes

Supervision of outcomes is what most people think about when they think about machine learning. Local components are optimized based on an overall feedback signal:
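To make this concrete, here is a minimal sketch of outcome-based training (the planner/executor names, shapes, and data are hypothetical stand-ins, not from the post). Two components are composed into one pipeline, and every parameter is updated from a single end-of-pipeline loss; nothing in the objective constrains what the intermediate representation means.

    # Outcome-based supervision: optimize everything against one global metric.
    import torch
    import torch.nn as nn

    planner = nn.Linear(16, 8)   # nominally "decompose the task"
    executor = nn.Linear(8, 1)   # nominally "carry out the plan"
    pipeline = nn.Sequential(planner, nn.ReLU(), executor)

    optimizer = torch.optim.SGD(pipeline.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    x = torch.randn(32, 16)              # inputs
    outcome_target = torch.randn(32, 1)  # the only feedback signal

    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(pipeline(x), outcome_target)  # global outcome metric
        loss.backward()    # gradients flow through both components
        optimizer.step()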

Supervising process

If you’re not optimizing based on how well something works empirically (outcomes), then the main way you can judge it is by looking at whether it’s structurally the right thing to do (process). For many tasks, we understand what pieces of work we need to do and how to combine them. We trust the result because of this reasoning, not because we’ve observed final results for very similar tasks:
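As a contrast to the previous sketch (again with hypothetical names and stand-in data), a process-based version supervises each component against a target for its own role, so each piece can be judged in isolation and no gradient crosses component boundaries.

    # Process-based supervision: each component is optimized for its local role.
    import torch
    import torch.nn as nn

    planner = nn.Linear(16, 8)
    executor = nn.Linear(8, 1)

    x = torch.randn(32, 16)
    reference_plans = torch.randn(32, 8)    # stand-in for "plans we endorse on inspection"
    reference_actions = torch.randn(32, 1)  # stand-in for "good executions of a given plan"

    opt_planner = torch.optim.SGD(planner.parameters(), lr=1e-2)
    opt_executor = torch.optim.SGD(executor.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for _ in range(100):
        # Judge the planner on whether its plan is structurally the right thing to do.
        opt_planner.zero_grad()
        loss_fn(planner(x), reference_plans).backward()
        opt_planner.step()

        # Judge the executor on reference plans, independently of the planner.
        opt_executor.zero_grad()
        loss_fn(executor(reference_plans), reference_actions).backward()
        opt_executor.step()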

In between process and outcomes

Many tasks can be approached in both ways, and in practice, most systems will likely end up somewhere in between. One example is a search engine:

It’s better to supervise process than outcomes

Why prefer supervision of process? If we don’t need to look at outcomes, then:

Differential capabilities: Supervising process helps with long-horizon tasks

We’d like to use AI to advance our collective capability at long-horizon tasks like:

Alignment: Supervising process is safety by construction

With outcome-based systems, we’ll eventually have AI that is incentivized to game the outcome evaluations. This could lead to catastrophes through AI takeover. (Perhaps obvious to most readers, but it seems worth making explicit: a big reason we care about alignment is that we think that, from our current vantage point, the world could look pretty crazy[1] in a few decades.)

What is the endgame for outcome-based systems? Because we can’t specify long-term objectives like "don’t cause side-effects we wouldn’t like if we understood them", we’re using proxy objectives that don’t fully distinguish "things seem good" from "things are good". As ML systems get smarter, eventually all of the optimization effort in the world is aimed at causing high evaluations on these proxies. If it’s easier to make evaluations high by compromising sensors, corrupting institutions, or taking other bad actions, this will eventually happen.

Suppose instead that we understood the role of each component, and that each component was constructed based on arguments that it will fulfill that role well, or that it was constructed and understood by something whose behavior we had, in turn, understood and constructed to fulfill its role. In that case, we may be able to avoid this failure mode. This is closely related to interpretability and to reducing risks from inner alignment failures:

In the long run, differential capabilities and alignment converge

Today, differential capabilities and alignment look different. Differential capabilities are starting to matter now. Alignment is a much less prominent issue because we don’t yet have AI systems that are good at gaming our metrics. In the crazy future, when automated systems are much more capable and make most decisions in the world, differential capabilities and alignment are two sides of the same coin:

Two attractors: The race between process- and outcome-based systems

Outcome-based optimization is an attractor

In some sense, you could almost always do better through end-to-end training, at least according to any one metric. You start with a meaningful task decomposition, track a global metric, and then backpropagate to make the system better along that metric. This messes with the meaning of the components, and soon they can’t be interpreted in isolation anymore. We expect that, at some point, there will be strong pressure to optimize the components of most of the digital systems we use against global metrics. The better we are at building process-based systems, the less pressure there will be.
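Continuing the hypothetical planner/executor sketch from above, this pull toward outcomes can be pictured as taking initially process-trained components and fine-tuning them end-to-end on a global metric; nothing in that objective keeps the intermediate "plan" close to anything a human reviewer would recognize.

    # End-to-end fine-tuning of previously process-trained components.
    import torch
    import torch.nn as nn

    planner, executor = nn.Linear(16, 8), nn.Linear(8, 1)
    pipeline = nn.Sequential(planner, executor)
    optimizer = torch.optim.SGD(pipeline.parameters(), lr=1e-2)

    x = torch.randn(32, 16)
    outcome_target = torch.randn(32, 1)
    reference_plans = torch.randn(32, 8)  # stand-in for human-meaningful plans

    for step in range(200):
        optimizer.zero_grad()
        nn.functional.mse_loss(pipeline(x), outcome_target).backward()
        optimizer.step()
        if step % 50 == 0:
            # The global metric never penalizes drift of the intermediate
            # representation away from the human-meaningful reference.
            drift = nn.functional.mse_loss(planner(x), reference_plans).item()
            print(f"step {step}: plan drift = {drift:.3f}")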

Process-based optimization could be an attractor, too

The good crazy future is one with an ecosystem of AIs made out of components with roles that are in principle human-understandable, with each component optimized based on how well it accomplishes its local role. Advanced process-based systems could self-regulate to remain process-based, which makes them a local attractor:

The state of the race

Today, process-based systems are ahead: most systems in the world don’t use much machine learning, and to the extent that they use it, it’s for small, independently meaningful, fairly interpretable steps like predictive search, ranking, or recommendation as part of much larger systems.

However, the history of machine learning is the bitter lesson of outcomes winning. Vision and NLP started with more structured systems, which were replaced with end-to-end systems. In these areas, the structured systems are much worse, and we don’t know how to make them competitive on standard benchmarks. DeepMind and OpenAI have better infrastructure for running RL on outcome-based metrics than for collecting process-based feedback. They tend towards a "research aesthetic" that favors outcome-based approaches even in cases where they work worse.

Overall, it’s up in the air which tasks will be solved in which way. Some parts of the AI community are leaning toward process, others toward outcomes. If we see impressive results from process-based feedback, institutional knowledge and research tastes may shift toward process-based systems. Future norms and laws, perhaps similar to existing algorithmic transparency laws, might strengthen this position.

We don’t need process-based systems to be a perfect attractor. If most systems are largely process-based around the time of transformative AI, with small amounts of outcome-based optimization, we’re likely in good shape.

Conclusion

If we run into trouble with early advanced AI systems, it will likely be clear that supervision of process would be better than supervision of outcomes. At that point, the question is whether we’re good enough at process-based systems that they’re a realistic option. If so, then for the most important and high-stakes use cases, people will likely switch. This requires that we develop the relevant know-how now.

Beyond AI, we view understanding how to build systems and institutions that make correct decisions even when outcomes aren’t available as part of a broader agenda of advancing reason and wisdom in the world. Making mistakes about the long-term consequences of our short-term decisions is one way we fall short of our potential. Making wise decisions in cases where we can’t easily learn from our failures is likely key to living up to it.

Acknowledgments

Thanks to Paul Christiano and Jon Uesato for relevant discussions, and Jon Uesato, Owain Evans, Ben Rachbach, and Luke Stebbing for feedback on a draft.

Comments

https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes?commentId=hjRPzo2qGwLzbLeoy

I don’t think I buy the argument for why process-based optimization would be an attractor. The proposed mechanism (an evaluator maintaining an "invariant that each component has a clear role that makes sense independent of the global objective") would definitely achieve this, but why would the system maintainers add such an invariant? In any concrete deployment of a process-based system, they would face strong pressure to optimize end-to-end for the outcome metric. I think the way process-based systems could actually win the race is something closer to "network effects enabled by specialization and modularity".

Let’s say you’re building a robotic arm. You could use a neural network optimized end-to-end to map input images into a vector of desired torques, or you could use a concatenation of a generic vision network and a generic action network, with a common object representation in between. The latter is likely to be much cheaper because the generic network training costs can be amortized across many applications (at least in an economic regime where training cost dominates inference cost). We see a version of this in NLP, where nobody outside the big players trains models from scratch, though I’m not sure how to think about fine-tuned models: do they have the safety profile of process-based systems or outcome-based systems?
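A rough sketch of the modular alternative described in this comment (all names and interfaces here are hypothetical illustrations, not the commenter's code): a generic vision model and a generic action model are composed through a shared object representation, so either half can be reused and its training cost amortized across tasks.

    # Modular pipeline: generic vision -> shared object representation -> generic action.
    from dataclasses import dataclass
    import torch
    import torch.nn as nn

    @dataclass
    class ObjectRepresentation:
        """Common interface between the vision and action components."""
        positions: torch.Tensor  # (num_objects, 3) estimated positions
        features: torch.Tensor   # (num_objects, d) appearance features

    class GenericVision(nn.Module):
        """Maps a camera image to the shared object representation."""
        def __init__(self, d: int = 32, num_objects: int = 4):
            super().__init__()
            self.num_objects, self.d = num_objects, d
            self.encoder = nn.Linear(64 * 64, num_objects * (3 + d))

        def forward(self, image: torch.Tensor) -> ObjectRepresentation:
            out = self.encoder(image.flatten()).view(self.num_objects, 3 + self.d)
            return ObjectRepresentation(positions=out[:, :3], features=out[:, 3:])

    class GenericAction(nn.Module):
        """Maps the shared object representation to joint torques."""
        def __init__(self, d: int = 32, num_objects: int = 4, num_joints: int = 7):
            super().__init__()
            self.policy = nn.Linear(num_objects * (3 + d), num_joints)

        def forward(self, objects: ObjectRepresentation) -> torch.Tensor:
            flat = torch.cat([objects.positions, objects.features], dim=1).flatten()
            return self.policy(flat)

    # Composition: either half can be swapped out or reused for another task,
    # unlike a single network trained end-to-end from pixels to torques.
    vision, action = GenericVision(), GenericAction()
    torques = action(vision(torch.randn(64, 64)))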

https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes?commentId=Emh8AKeL3JF7j2RDC

This approach reminds me of the Six Sigma manufacturing philosophy, which was very successful and impactful in improving the quality of manufactured products.

https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes?commentId=6onap4JszXkYnjoG6

Thanks for that pointer. It’s always helpful to have analogies in other domains to take inspiration from.

https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes?commentId=HXuAZH3LysQ8qvnD8

It’s not clear to me that as complexity increases, process-based systems are actually easier to reason about, debug, and render safe than outcome-based systems. If you tell me an ML system was optimized for a particular outcome in a particular environment, I can probably predict its behavior and failure modes much better than an equivalently performant human-written system involving 1000s of lines of code. Both types of systems can fail catastrophically with adversarially selected inputs, but it’s probably easier to automatically generate such inputs (and thus, to guard against them) for the ML system. So it’s still plausible to me that our limited budget of human supervision should be spent on specifying the outcome better, rather than on specifying and improving complex modular processes.