Oleh Rybkin, UPenn: On exploration and planning with world models


Oleh Rybkin is a Ph.D. student at the University of Pennsylvania and a student researcher at Google. He is advised by Kostas Daniilidis and Sergey Levine. Oleh’s research focuses on reinforcement learning, particularly unsupervised and model-based RL in the visual domain. In this episode, we discuss agents that explore and plan (and do yoga), how to learn world models from video, what’s missing from current RL research, and much more!

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“Say you’re in a new situation and you need to adapt to this new situation. If you’re just doing a forward pass through your neural network that’s not trained on this new situation, that’s not going to work. But if you’re doing planning, that might work a little better. It’s still probably not going to work super well because your model was never trained on this situation. So it might not generalize to it, and then the plan that you come up with will not be adequate. But that’s okay. If you’re in a new situation, there’s no way for you to come up with the optimal solution. You just have to come up with the best possible solution.”

“I think that’s a huge divide between RL now and where RL needs to be, which is that it’s not even looking at the right problems. We somehow need to figure out: how do we actually get our datasets to the scale (and the scale here means diversity) that’s large enough so that we can actually see some of this magic. And simple things like ‘can train on 50 different Atari games together’: well, that’s not going to give you generalization. You might generalize to the 51st Atari game, but maybe not, maybe actually you won’t. I highly suspect that 50 different examples won’t let you generalize to a 51st example. You probably need thousands of different examples. You would need to train this agent on a thousand different environments.”

“Videos fundamentally have dynamic information. Dynamic information can help you. What can it help you with? Well, maybe it can help you for planning. Maybe you can learn a dynamics model on videos from the internet. And if you are training on the entire internet, it will be a very generalizable dynamics model. It will tell you how the world evolves in all possible situations. You can train it on the entire YouTube or you can train it on a dataset of all movies.”

Referenced in this podcast

Andrew Jaegle, who overlapped with Oleh in Kostas Daniilidis’ lab
Tim Lillicrap and Konrad Kording and their perspectives as neuroscientists
Oleh’s papers Plan2Explore and LEXA
Cognitive Maps in Rats and Men by Tolman in 1948
Go-Explore by Ecoffet et al. 2019
OpenAI’s CLIP
Oleh’s paper Learning what you can do before doing anything
Residual Connections Encourage Iterative Inference by Jastrzębski et al. 2018
Oleh’s paper Model-Based Reinforcement Learning via Latent-Space Collocation (LatCo)
RL benchmarks: Atari, DM Control, MineRL, and Crafter
RMA: Rapid Motor Adaptation for Legged Robots
Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition
Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Thanks to Tessa Hall for editing the podcast.