RSS · Spotify · Apple Podcasts · Pocket Casts
Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“Say you’re in a new situation and you need to adapt to this new situation. If you’re just doing a forward pass through your neural network that’s not trained on this new situation, that’s not going to work. But if you’re doing planning, that might work a little better. It’s still probably not going to work super well because your model was never trained on this situation. So it might not generalize to it, and then the plan that you come up with will not be adequate. But that’s okay. If you’re in a new situation, there’s no way for you to come up with the optimal solution. You just have to come up with the best possible solution.”
“I think that’s a huge divide between RL now and where RL needs to be, which is that it’s not even looking at the right problems. We somehow need to figure out: how do we actually get our datasets to the scale—and the scale here means diversity—that’s large enough so that we can actually see some of this magic. And simple things like ‘we can train on 50 different Atari games together’—well, that’s not going to give you generalization. You might generalize to the 51st Atari game, but maybe not—maybe you actually won’t. I highly suspect that 50 different examples won’t let you generalize to a 51st example. You probably need thousands of different examples. You would need to train this agent on thousands of different environments.”
“Videos fundamentally have dynamic information. Dynamic information can help you. What can it help you with? Well, maybe it can help you for planning. Maybe you can learn a dynamics model on videos from the internet. And if you are training on the entire internet, it will be a very generalizable dynamics model. It will tell you how the world evolves in all possible situations. You can train it on all of YouTube or you can train it on a dataset of all movies.”
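To make the planning idea in the quotes above a bit more concrete, here is a minimal sketch of model-predictive planning with a learned dynamics model: instead of acting from a single policy forward pass, the agent rolls candidate action sequences through the model and executes the first action of the best one. This is only an illustration, not code from the episode or the papers below; `dynamics_model`, `reward_fn`, and the toy point-mass example are assumptions standing in for whatever model you have learned.

```python
# Minimal random-shooting planner (MPC-style): roll out a learned dynamics
# model over many sampled action sequences and pick the best first action.
import numpy as np

def plan_action(state, dynamics_model, reward_fn,
                action_dim=2, horizon=12, num_candidates=1000, rng=None):
    """Return the first action of the best-scoring sampled action sequence."""
    rng = rng or np.random.default_rng()
    # Sample candidate action sequences: (num_candidates, horizon, action_dim).
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))

    best_return, best_first_action = -np.inf, None
    for actions in candidates:
        s, total = state, 0.0
        for a in actions:
            s = dynamics_model(s, a)   # predicted next state under the learned model
            total += reward_fn(s, a)   # predicted reward along the imagined rollout
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action  # execute this action, then replan at the next step

# Toy usage with made-up point-mass dynamics and a goal-reaching reward.
if __name__ == "__main__":
    goal = np.array([1.0, 1.0])
    dynamics = lambda s, a: s + 0.1 * a              # stand-in "learned" model
    reward = lambda s, a: -np.linalg.norm(s - goal)  # closer to the goal is better
    print("first planned action:", plan_action(np.zeros(2), dynamics, reward))
```

If the model was trained on diverse enough data, the same planning loop can produce reasonable (if not optimal) behavior in situations the policy itself was never trained on, which is the point being made in the first quote.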
Referenced in this podcast
- Andrew Jaegle, who overlapped with Oleh in Kostas Daniilidis’ lab
- Tim Lillicrap and Konrad Kording, and their perspectives as neuroscientists
- Oleh’s papers Plan2Explore and LEXA
- Cognitive Maps in Rats and Men by Tolman in 1948
- Go-Explore by Ecoffet et al. 2019
- OpenAI’s CLIP
- Oleh’s paper Learning what you can do before doing anything
- Residual Connections Encourage Iterative Inference by Jastrzębski et al. 2018
- Oleh’s paper Model-Based Reinforcement Learning via Latent-Space Collocation (LatCo)
- RL benchmarks Atari, DM Control, MineRL, Crafter
- RMA: Rapid Motor Adaptation for Legged Robots
- Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition
- Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World
Thanks to Tessa Hall for editing the podcast.