Nicklas Hansen, UCSD: On long-horizon planning and why algorithms don't drive research progress

Nicklas Hansen is a Ph.D. student at UC San Diego, advised by Prof. Xiaolong Wang and Prof. Hao Su. He is also a student researcher at Meta AI. Nicklas' research focuses on developing machine learning systems, specifically neural agents, that can learn, generalize, and adapt over their lifetime. In this episode, we talk about long-horizon planning, adapting reinforcement learning policies during deployment, why algorithms don't drive research progress, and much more!

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“I think that was the first realization—that clearly we cannot train on all environments that exist because we cannot, like in the difficulty of training these algorithms, but also just the practicality of defining all of the things that we want to be robust to.”

“Like it’s a huge problem in RL research, I feel like one of the major bottlenecks is the lack of data sets, benchmarks, and environments where you can really explore all of these different directions of RL research, especially when it comes to generalization. We had to take an existing benchmark and artificially change the simulation to make it look different, but it’s still pretty limited how much diversity you can get from that.”

“You could provide a reward signal and intuitively that would be able to adapt as well using rewards. And we did actually do those experiments and tried to compare like how many samples do you need with self-supervision versus how many do you need with reward? And it turns out—I don’t recall the exact numbers—but it was something like a hundred episodes or something if you do reward-based fine-tuning versus self-supervision which was like one episode.”

Referenced in this podcast

Thanks to Tessa Hall for editing the podcast.