Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“When I read the ProcGen paper from OpenAI, one thing I was also thinking in terms of the ordering was what if we just thought about the environment as your opponent? So what if we thought about single-agent RL as more of a two-agent problem where basically you had an adversarial environment that acted as your opponent.”
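As a rough illustration of this framing (not from the episode), here is a toy sketch in which the "environment" acts as a second player that adversarially picks the level minimizing the agent's return. The level set, return function, and skill parameter are all invented for illustration:

```python
import numpy as np

# Toy setup: each "level" is a scalar difficulty, and the agent's policy is
# summarized by a scalar skill. Return falls off as difficulty moves away
# from the agent's skill. (All of this is hypothetical, for illustration.)
def agent_return(skill, difficulty):
    return np.exp(-(skill - difficulty) ** 2)

levels = np.linspace(0.0, 1.0, 11)
skill = 0.5

# Adversarial environment as the second player of a two-player game:
# it chooses the level that minimizes the agent's return.
adversarial_level = min(levels, key=lambda d: agent_return(skill, d))
```

In this toy case the adversary picks an extreme difficulty (0.0 or 1.0), the level farthest from the agent's skill; methods like PAIRED refine this idea by having the adversary maximize *regret* rather than simply minimize return, so it proposes challenging but solvable levels.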
“I believe in the value of model-based [RL]. I think that a lot of model-based work does focus on slightly toy settings. And it’s toy not because of the environment, it’s toy because of the premise of the studies, in the sense that a lot of times when you look at model-based papers, they’re essentially learning a model of the RL environment. But the RL environment, by assumption, is already a model — you already have a perfect model for that domain […] in the form of a reinforcement learning environment simulator.”
“One of the huge benefits of learning a model [is] that when you learn the model as a neural network you essentially get a differentiable simulator for free.”
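To make the "differentiable simulator for free" point concrete (this example is not from the episode), here is a minimal sketch: the learned dynamics model is a hand-set linear map so its gradient can be written out by hand, and we plan by gradient descent through imagined rollouts. A neural-network model gives the same property automatically via autodiff:

```python
# Hypothetical learned dynamics: s' = A*s + B*u, with A, B "fit to data"
# (hand-set here for illustration). Because the model is differentiable,
# we can backprop through a rollout to optimize the action sequence.
A, B = 0.9, 0.5
target, horizon = 1.0, 5
actions = [0.0] * horizon

def rollout(actions, s0=0.0):
    s = s0
    for u in actions:
        s = A * s + B * u   # one differentiable dynamics step
    return s

# Gradient of the squared final-state error w.r.t. each action, by hand:
# d s_T / d u_t = B * A**(horizon - 1 - t)
for _ in range(200):
    err = rollout(actions) - target
    for t in range(horizon):
        grad = 2 * err * B * A ** (horizon - 1 - t)
        actions[t] -= 0.1 * grad

final_state = rollout(actions)  # now close to the target
```

The same gradient-through-the-model trick is what lets approaches like Dreamer optimize policies inside a learned world model, and what a differentiable physics engine like Brax provides directly.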
“In supervised learning [unlike RL], there is an exploration problem (that’s how you got your data) but we just assume it’s already solved. We assume that there is an outside process that did the exploring and collected all the data.”
Referenced in this podcast
- OpenAI’s ProcGen
- Minqi’s paper Prioritized Level Replay (PLR)
- DeepMind’s AlphaStar
- Auto-DrAC paper by Roberta Raileanu et al.
- Protagonist Antagonist Induced Regret Environment Design (PAIRED) paper by Michael Dennis et al.
- Dreamer paper by Danijar Hafner et al.
- Google’s Brax
- Learning to Communicate with Deep Multi-Agent Reinforcement Learning by Jakob Foerster et al.
- Multi-agent PPO (MAPPO) paper
- When Do Curricula Work?
- Robust PLR
Thanks to Tessa Hall for editing the podcast.