Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“If we see all the states we’ve seen so far and look at the representations, let’s imagine that those representations have a length of one so we can think about them as points on a sphere. Then after we put each of these points on the sphere, we can turn the sphere around and say, okay, where are most of the points and where are we missing points? And say, well, you’re missing points down near Antarctica or something. And then we can say, okay, let’s try to get down to Antarctica. And then we could, because we’re learning a goal-conditioned policy, say, okay, try to get here or try to get to a state that has this representation.”
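The idea in the quote above can be sketched in a few lines: normalize state representations onto the unit sphere, find the direction least covered by visited points (the “Antarctica”), and hand that direction to a goal-conditioned policy as a target. This is a minimal illustration, not the paper’s actual algorithm; the array names and the nearest-neighbor heuristic are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical representations of states visited so far (illustrative data).
reps = rng.normal(size=(500, 3))
reps /= np.linalg.norm(reps, axis=1, keepdims=True)  # project onto the unit sphere

# Sample candidate directions on the sphere.
candidates = rng.normal(size=(1000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

# Cosine distance from each candidate to its nearest visited representation.
nearest_dist = 1.0 - (candidates @ reps.T).max(axis=1)

# The candidate farthest from every visited point is the least-covered region.
goal = candidates[nearest_dist.argmax()]

# A goal-conditioned policy could then be commanded to reach it, e.g.:
# action = policy(state, goal)
```

Here `goal` plays the role of “try to get to a state that has this representation”: it is a point on the sphere far from everything seen so far.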
“One thing that I’m really excited about is thinking about how we can leverage this idea of connecting contrastive learning to reinforcement learning to make use of advances in contrastive learning in other domains like NLP and computer vision. In NLP, we’ve seen really great uses of contrastive learning for things like CLIP that can connect images with language using contrastive learning. And in our contrastive project, we saw how we can connect the states and the actions to the future states. As you might imagine, maybe there’s a way of plugging these components together, and indeed, you can see that mathematically there is. And so one thing I’m really excited about exploring is saying, well, ‘can we use this to specify tasks?’ Not in terms of images of what you would want to happen, but rather language descriptions.”
“I think one of the reasons why I’m particularly excited about these problems is that these language models are trained to maximize the likelihood of the next token, which draws a really strong connection to this way of treating reinforcement learning problems as predicting probabilities and as maximizing probabilities. And so I think that these tools are actually much, much more similar than they might seem on the surface.”
“I don’t know how controversial it is, but I would like to see more effort on taking even existing methods and applying them to new tasks, to real problems. I think part of this will require a shift in how we evaluate papers—evaluating them not so much on algorithmic novelty as on ‘did you actually solve some interesting problem?’”
Referenced in this podcast
- Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning by Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine
- Contrastive Learning as Goal-Conditioned Reinforcement Learning by Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, Sergey Levine
- Diversity Is All You Need: Learning Diverse Skills Without a Reward Function by Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
- The Information Geometry of Unsupervised Reinforcement Learning by Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
- Search on the Replay Buffer: Bridging Planning and Reinforcement Learning by Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
- RvS: What Is Essential For Offline RL via Supervised Learning? by Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine
- Imitating Past Successes Can Be Very Suboptimal by Benjamin Eysenbach, Soumith Udatha, Sergey Levine, Ruslan Salakhutdinov
Thanks to Tessa Hall for editing the podcast.