Below are some highlights from our conversation, as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“I was pretty interested in building the most general agent possible. And to me, reinforcement learning was maybe the only abstraction which offered the possibility for that, because you need interaction.”
“The only reason we have any hope for generalist agents is because we care about a small distribution of tasks in this entire world compared to all the possible tasks we could care about. And hopefully there’s some structure to be exploited across those tasks which can help us generalize to those things.”
“Humans learn to do something once and then just kind of keep doing it, even if that’s not the optimal thing to do. That’s exactly the idea here as well. If you know kind of how to do the task and suddenly you find yourself out of distribution, or I don’t know what to do, I kind of wanna get back to the states where I actually do know how to do it. And I just like reason my way there, whatever way possible. And then I’ll finish the task. So this is basically the same idea there as well, that you kind of drive the agent towards this part of the state space where you know how to solve the task.”
“Walking along a cliff seems naturally unsafe because it’s very close to a state where you cannot reverse it. So there’s a dynamic programming style effect where states close to irreversible states also become unsafe.”
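The “dynamic programming style effect” in the quote above can be sketched in a few lines of code. This is a minimal illustration (not from the episode, and the function name and discount choice are our own): unsafety starts at the irreversible states and propagates backward through the transition graph, discounted by distance, so states near a cliff inherit some of its danger.

```python
def propagate_unsafety(transitions, irreversible, gamma=0.9, iters=50):
    """Propagate unsafety backward from irreversible states.

    transitions: dict mapping each state to a list of successor states.
    irreversible: set of absorbing unsafe states (the "cliff").
    Returns a dict mapping each state to an unsafety score in [0, 1].
    """
    unsafety = {s: (1.0 if s in irreversible else 0.0) for s in transitions}
    for _ in range(iters):
        for s, succs in transitions.items():
            if s in irreversible or not succs:
                continue
            # A state is as unsafe as its most dangerous successor,
            # discounted by one step -- a Bellman-style backup.
            unsafety[s] = max(unsafety[s],
                              gamma * max(unsafety[n] for n in succs))
    return unsafety

# A chain of states where state 4 is the cliff edge: unsafety decays
# geometrically as we move away from it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: []}
scores = propagate_unsafety(chain, irreversible={4})
```

Here states 3, 2, 1, 0 end up with scores 0.9, 0.81, 0.729, and 0.6561: walking near the cliff is penalized even though only the cliff itself is truly irreversible.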
“There’s a lot of value in making an idea work. There’s obviously a lot of people who like to claim that ‘this is something I did back in the 90s,’ but I feel like those claims are frivolous at best because, I mean, what matters is if something works right now or not.”
Referenced in this podcast
- Yoshua Bengio
- Dynamics-Aware Unsupervised Discovery of Skills (DADS)
- Sergey Levine
- Follow-up to DADS with surprising robot behavior: Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning
- Autonomous Reinforcement Learning via Subgoal Curricula (NeurIPS 2021 paper)
- You Only Live Once: Single-Life Reinforcement Learning (QWALE)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning
- Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning
- Shixiang Shane Gu
- Steven Pinker hypothesis
- Chelsea Finn
- Juergen Schmidhuber
- Discriminator Augmented Model-Based Reinforcement Learning
- Minerva
Thanks to Tessa Hall for editing the podcast.