Archit Sharma, Stanford: On unsupervised and autonomous reinforcement learning

Archit Sharma is a Ph.D. student at Stanford advised by Chelsea Finn. His recent work focuses on autonomous deep reinforcement learning, that is, getting real-world robots to learn to deal with unseen situations without human intervention. Prior to this, he was an AI resident at Google Brain and interned with Yoshua Bengio at Mila. In this episode, we chat about unsupervised, non-episodic, autonomous reinforcement learning (and much more).

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“I was pretty interested in building the most general agent possible. And to me, reinforcement learning was maybe the only abstraction which offered the possibility for that, because you need interaction.”

“The only reason we have any hope for generalist agents is because we care about a small distribution of tasks in this entire world compared to all the possible tasks we could care about. And hopefully there’s some structure to be exploited across those tasks which can help us generalize to those things.”

“Humans learn to do something once and then just kind of keep doing it, even if that’s not the optimal thing to do. That’s exactly the idea here as well. If you know kind of how to do the task and suddenly you find yourself out of distribution, or I don’t know what to do, I kind of wanna get back to the states where I actually do know how to do it. And I just like reason my way there, whatever way possible. And then I’ll finish the task. So this is basically the same idea there as well, that you kind of drive the agent towards this part of the state space where you know how to solve the task.”

“Walking along a cliff seems naturally unsafe because it’s very close to a state where you cannot reverse it. So there’s a dynamic programming style effect where states close to our irreversible states also become unsafe.”

“There’s a lot of value in making an idea work. There’s obviously a lot of people who like to claim that ‘this is something I did back in the 90s,’ but I feel like those claims are frivolous at best because, I mean, what matters is if something works right now or not.”

Referenced in this podcast

Yoshua Bengio
Dynamics-Aware Unsupervised Discovery of Skills (DADS)
Sergey Levine
Follow-up to DADS with surprising robot behavior: Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning
Autonomous Reinforcement Learning via Subgoal Curricula (NeurIPS 2021 paper)
You Only Live Once: Single-Life Reinforcement Learning (QWALE)
A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning
Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning
Shixiang Shane Gu
Steven Pinker hypothesis
Chelsea Finn
Juergen Schmidhuber
Discriminator Augmented Model-Based Reinforcement Learning
Minerva

Thanks to Tessa Hall for editing the podcast.