Link post

Copying the abstract of the paper:
The last decade has seen a significant increase of interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve. The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations. Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.

I also mention this in the latest Alignment Newsletter, but I think this is probably one of the best ways to get started on AI alignment from the empirical ML perspective: it will (hopefully) give you a sense of what it is like to work with algorithms that learn from human feedback, in a more realistic setting than Atari / MuJoCo, while still not requiring a huge amount of background or industry-level compute budgets.

Section 1.1 of the paper goes into more detail about the pathways to impact. At a high level, the story is that better algorithms for learning from human feedback will improve our ability to build AI systems that do what their designers intend them to do. This is straightforwardly improving on intent alignment (though it is not solving it), which in turn allows us to better govern our AI systems by enabling regulations like "your AI systems must be trained to do X" without requiring a mathematical formalization of X.
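To give a concrete flavor of the "imitation learning baseline" mentioned in the abstract, here is a minimal behavioral-cloning sketch. It is not the competition's actual baseline: the `demos` iterable, the discrete action set, and the 64x64 observation size are illustrative assumptions rather than the real MineRL interfaces.

```python
# Minimal behavioral-cloning sketch (illustrative only, not the official BASALT baseline).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps 64x64 RGB observations to logits over a discrete action set."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(n_actions)  # infers input size on first forward pass

    def forward(self, obs):
        return self.head(self.encoder(obs))

def behavioral_cloning(demos, n_actions=16, epochs=5, lr=3e-4):
    """Fit a policy to predict the human's action at each demonstration frame.

    `demos` is assumed to be an iterable of (obs, action) batches, where obs is a
    float tensor of shape (B, 3, 64, 64) and action is a long tensor of shape (B,).
    How you extract such batches from the BASALT demonstrations is up to you.
    """
    policy = PolicyNet(n_actions)
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, action in demos:
            loss = loss_fn(policy(obs), action)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```

The real environments expose richer observation and action spaces, and competitive entries will want more context than a single frame; the sketch is only meant to show the shape of learning from demonstrations rather than from a hand-written reward.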
Nice! If I had university CS affiliations I would send them this with unsubtle comments that it would be a cool project to get students to try :P In fact, now that I think about it, I do have one contact through the UIUC datathon. Or would you rather not have this sort of marketing?
Love this idea. From the linked post on the BAIR website, the idea of "prompting" a Minecraft task with e.g. a brief sequence of video frames seems especially interesting. Would you anticipate the benchmark version of this would ask participants to disclose metrics such as "amount of task-specific feedback or data used in training"? Or does this end up being too hard to quantify because you’re explicitly expecting folks to use a variety of feedback modalities to train their agents?
That makes sense, though I'd also expect that LfHF benchmarks like BASALT could turn out to be a better fit for superscale models in general. (E.g. a BASALT analogue might do a better job of capturing the flexibility of GPT-N or DALL-E type models than current benchmarks do, though you'd probably need to define a few hundred tasks for that to be useful. It's also possible this has already been done and I'm unaware of it.)
It’s not quite as interesting as I initially thought, since they allow handcrafted reward functions and heuristics. It would be more interesting if the designers did not know the particular task in advance, and the AI would be forced to learn the task entirely from demonstrations and/or natural language description.
Going from zero to "produce an AI that learns the task entirely from demonstrations and/or natural language description" is really hard for the modern AI research hive mind. You have to instead give it a shaped reward: breadcrumbs along the way that are easier (such as allowing handcrafted heuristics, and allowing knowledge of a particular target task) to get the hive mind started making progress.
To clarify, by hive mind, do you mean humans?
Yes. I'm being a bit whimsical here; I'm tickled by the analogy to training neural nets.
In general: a research AI that will train agents based on its understanding of described benchmarks does sound interesting, although how it would get ahold of things like 'human feedback', and craft setups for that, isn't clear.* Trying to create a setup so AIs can learn from other AIs: crafting rewards seems unlikely; expert demonstrations might be doable, though whether those would be more or less useful than a human demonstration, I'm not sure. (There might also be the option to ask 'what would _ do in this situation', if you can actually run _.)

Edited to add: *First I was imagining Skynet. Now I'm imagining a really weird license agreement: you may use this AI in exchange for [$ + billing schedule], but with the data it's trained on (even if seemingly totally unrelated), you must also have it try working on this AI benchmark (frequency based on usage) and publicly share the score (but not necessarily pictures of outputs: there's the risk of user data being leaked by being recreated in Minecraft, so the user retains full responsibility for the security of such data, and is encouraged but not required to run it 'offline', i.e. not on Minecraft servers).
It's not "from zero", though; I think we already have ML techniques that should be applicable here.
Right, but you're also going for tasks that are relatively simple and easy. In the sense that "MakeWaterfall" is something that I can, based on my own experience, imagine solving without any ML at all (though of course going to that extreme would require massive work). It might be that for such tasks, solutions using handcrafted rewards/heuristics would be viable but wouldn't scale to more complex tasks. If your task were e.g. "follow arbitrary natural language instructions", then I wouldn't care about the "lax" rules.
This is certainly good, but I wonder what the exact rules are here. Suppose the designer trains a neural network to recognize trees in Minecraft by getting the Minecraft engine to generate lots of images of trees, and the resulting network is then used as a hardcoded part of the agent architecture (a rough sketch of this scenario is below). Is that allowed? If not, how well can you enforce it? (I imagine something of the sort can be done in subtler ways.)
Not saying that what you’re doing is not useful, just pointing out a certain way in which the benchmark might diverge from its stated aim.
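To make the scenario above concrete, here is a hedged sketch of what that loophole could look like: a tree classifier trained on engine-generated images and then frozen into the agent as a handcrafted component. All function names, shapes, and the two-class setup are hypothetical; nothing here reflects the actual competition rules or starter code.

```python
# Hypothetical sketch of the rules question above: a tree classifier trained on
# engine-generated images, then frozen and used as a hardcoded agent component.
import torch
import torch.nn as nn

def train_tree_classifier(synthetic_images, labels, epochs=3, lr=1e-3):
    """Train on (image, is_tree) pairs rendered directly by the game engine,
    i.e. data obtained outside the competition's human demonstrations."""
    net = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(2),  # logits for {no tree, tree}
    )
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        loss = loss_fn(net(synthetic_images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return net.eval()  # frozen: from here on it acts as a fixed, handcrafted module

def tree_visible(classifier, obs):
    """Handcrafted heuristic inside the agent: does the current frame show a tree?"""
    with torch.no_grad():
        return classifier(obs.unsqueeze(0)).argmax(dim=1).item() == 1
```

Whether this counts as legitimate use of the simulator or as smuggled-in task-specific supervision is exactly the kind of edge case the question above is asking the rules to settle.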
Okay, regardless of what the AI safety community claims, I want to make that claim. (I think a substantial chunk of the AI safety community also makes that claim but I’m not interested in defending that here.)
The problem with this is that you need an AI whose task is "protect humanity from unaligned AIs", which is already very "general" in a way (i.e. it requires operating on large scales of space, time and strategy), unless you can effectively reduce this to many "narrow" tasks, which is probably not impossible but also not easy.
I don't think it's very easy to say "don't do general systems, do task-specific ones", because general ones might promise a lot of economic returns. A task like "Handle this Amazon customer query correctly" is already very general, as it includes a host of different long-tail issues about possible bugs that might appear (some of them unknown). If a customer faces an issue on a page that's likely a bug, a customer-service AI profits from understanding the code that produces the issue the customer has. Given the way economic pressures work, I see it as very probable that companies will just go ahead and do what's most efficient for their business goals.
It's not clear that a system which doesn't use reward doesn't have the same issue (relative to "99.9% of the time").