Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“When I read the ProcGen paper from OpenAI, one thing I was also thinking in terms of the ordering was what if we just thought about the environment as your opponent? So what if we thought about single-agent RL as more of a two-agent problem where basically you had an adversarial environment that acted as your opponent.”
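As a rough illustration of this framing (not from the episode), here is a toy sketch in which the "environment" acts as a second player that adversarially picks the level minimizing the agent's return. The level set, return function, and skill parameter are all invented for illustration:

```python
import numpy as np

# Toy setup: each "level" is a scalar difficulty, and the agent's policy is
# summarized by a scalar skill. Return falls off as difficulty moves away
# from the agent's skill. (All of this is hypothetical, for illustration.)
def agent_return(skill, difficulty):
    return np.exp(-(skill - difficulty) ** 2)

levels = np.linspace(0.0, 1.0, 11)
skill = 0.5

# Adversarial environment as the second player of a two-player game:
# it chooses the level that minimizes the agent's return.
adversarial_level = min(levels, key=lambda d: agent_return(skill, d))
```

In this toy case the adversary picks an extreme difficulty (0.0 or 1.0), the level farthest from the agent's skill; methods like PAIRED refine this idea by having the adversary maximize *regret* rather than simply minimize return, so it proposes challenging but solvable levels.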
“I believe in the value of model-based [RL]. I think that a lot of model-based work does focus on slightly toy settings. And it’s toy not because of the environment, it’s toy because of the premise of the studies, in the sense that a lot of times when you look at model-based papers, they’re essentially learning a model of the RL environment. But the RL environment, by assumption, is already a model — you already have a perfect model for that domain […] in the form of a reinforcement learning environment simulator.”
“One of the huge benefits of learning a model [is] that when you learn the model as a neural network you essentially get a differentiable simulator for free.”
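To make the "differentiable simulator for free" point concrete (this example is not from the episode), here is a minimal sketch: the learned dynamics model is a hand-set linear map so its gradient can be written out by hand, and we plan by gradient descent through imagined rollouts. A neural-network model gives the same property automatically via autodiff:

```python
# Hypothetical learned dynamics: s' = A*s + B*u, with A, B "fit to data"
# (hand-set here for illustration). Because the model is differentiable,
# we can backprop through a rollout to optimize the action sequence.
A, B = 0.9, 0.5
target, horizon = 1.0, 5
actions = [0.0] * horizon

def rollout(actions, s0=0.0):
    s = s0
    for u in actions:
        s = A * s + B * u   # one differentiable dynamics step
    return s

# Gradient of the squared final-state error w.r.t. each action, by hand:
# d s_T / d u_t = B * A**(horizon - 1 - t)
for _ in range(200):
    err = rollout(actions) - target
    for t in range(horizon):
        grad = 2 * err * B * A ** (horizon - 1 - t)
        actions[t] -= 0.1 * grad

final_state = rollout(actions)  # now close to the target
```

The same gradient-through-the-model trick is what lets approaches like Dreamer optimize policies inside a learned world model, and what a differentiable physics engine like Brax provides directly.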
“In supervised learning [unlike RL], there is an exploration problem (that’s how you got your data) but we just assume it’s already solved. We assume that there is an outside process that did the exploring and collected all the data.”
Referenced in this podcast
- OpenAI’s ProcGen
- Minqi’s paper Prioritized Level Replay (PLR)
- DeepMind’s AlphaStar
- Auto-DrAC paper by Roberta Raileanu et al.
- Protagonist Antagonist Induced Regret Environment Design (PAIRED) paper by Michael Dennis et al.
- Dreamer paper by Danijar Hafner et al.
- Google’s Brax
- Learning to Communicate with Deep Multi-Agent Reinforcement Learning by Jakob Foerster et al.
- Multi-agent PPO (MAPPO) paper
- When Do Curricula Work?
- Robust PLR
Thanks to Tessa Hall for editing the podcast.