Oleh Rybkin, UPenn: On exploration and planning with world models

July 11, 2022


Oleh Rybkin is a Ph.D. student at the University of Pennsylvania and a student researcher at Google. He is advised by Kostas Daniilidis and Sergey Levine. Oleh's research focuses on reinforcement learning, particularly unsupervised and model-based RL in the visual domain. In this episode, we discuss agents that explore and plan (and do yoga), how to learn world models from video, what's missing from current RL research, and much more!

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“Say you’re in a new situation and you need to adapt to this new situation. If you’re just doing a forward pass through your neural network that’s not trained on this new situation, that’s not going to work. But if you’re doing planning, that might work a little better. It’s still probably not going to work super well because your model was never trained on this situation. So it might not generalize to it, and then the plan that you come up with will not be adequate. But that’s okay. If you’re in a new situation, there’s no way for you to come up with the optimal solution. You just have to come up with the best possible solution.”

“I think that’s a huge divide between RL now and where RL needs to be, which is that it’s not even looking at the right problems. We somehow need to figure out: how do we actually get our datasets to the scale (and the scale here means diversity) that’s large enough so that we can actually see some of this magic. And simple things like ‘can train on 50 different Atari games together’? Well, that’s not going to give you generalization. You might generalize to the 51st Atari game, but maybe not, maybe actually you won’t. I highly suspect that 50 different examples won’t let you generalize to a 51st example. You probably need thousands of different examples. You would need to train this agent on a thousand different environments.”

“Videos fundamentally have dynamic information. Dynamic information can help you. What can it help you with? Well, maybe it can help you with planning. Maybe you can learn a dynamics model on videos from the internet. And if you are training on the entire internet, it will be a very generalizable dynamics model. It will tell you how the world evolves in all possible situations. You can train it on all of YouTube or you can train it on a dataset of all movies.”

Referenced in this podcast

Thanks to Tessa Hall for editing the podcast.