Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“I think that was the first realization: clearly we cannot train on all environments that exist, both because of the difficulty of training these algorithms and because of the impracticality of defining all of the things that we want to be robust to.”
“It’s a huge problem in RL research. I feel like one of the major bottlenecks is the lack of datasets, benchmarks, and environments where you can really explore all of these different directions of RL research, especially when it comes to generalization. We had to take an existing benchmark and artificially change the simulation to make it look different, but it’s still pretty limited how much diversity you can get from that.”
“You could provide a reward signal, and intuitively the policy would be able to adapt using rewards as well. We did actually run those experiments and compared how many samples you need with self-supervision versus how many you need with reward. It turns out (I don’t recall the exact numbers) it was something like a hundred episodes if you do reward-based fine-tuning, versus roughly one episode with self-supervision.”
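For context, the reward-free adaptation described above, as in the referenced paper “Self-supervised Policy Adaptation During Deployment,” updates the policy’s encoder at test time using an auxiliary self-supervised objective rather than rewards. The NumPy sketch below illustrates the idea with a toy linear encoder and an inverse-dynamics task (predict the action from consecutive observations); the environment, dimensions, and hyperparameters are all made up for illustration and are not the paper’s actual setup.

```python
import numpy as np

# Toy sketch of reward-free, self-supervised adaptation at deployment,
# in the spirit of "Self-supervised Policy Adaptation During Deployment".
# Everything here (linear encoder, scalar-action inverse-dynamics task,
# dimensions, learning rate) is a made-up illustration.
rng = np.random.default_rng(0)

obs_dim, feat_dim = 6, 4
u = rng.normal(size=obs_dim)
u /= np.linalg.norm(u)  # toy dynamics: the action pushes the state along u

W_enc = rng.normal(size=(feat_dim, 2 * obs_dim)) * 0.1  # encoder, adapted online
w_head = rng.normal(size=feat_dim) * 0.1                # self-supervised head, frozen

lr = 0.5
losses = []
for _ in range(500):
    # One transition from the "deployment" environment; no reward is used.
    obs = rng.normal(size=obs_dim)
    action = rng.normal()
    next_obs = obs + action * u + 0.01 * rng.normal(size=obs_dim)

    # Inverse-dynamics objective: predict the action from (obs, next_obs).
    x = np.concatenate([obs, next_obs])
    a_hat = w_head @ (W_enc @ x)
    loss = 0.5 * (a_hat - action) ** 2

    # One gradient step on the encoder only; the head stays fixed.
    W_enc -= lr * (a_hat - action) * np.outer(w_head, x)
    losses.append(loss)

# The self-supervised loss drops as the encoder adapts, with no reward signal.
print(round(float(np.mean(losses[:100])), 3),
      round(float(np.mean(losses[-100:])), 3))
```

The point of the sketch is the loop structure: each deployment step yields a transition, a self-supervised loss, and a single encoder update, which is why adaptation can happen within roughly one episode instead of the many episodes reward-based fine-tuning needs.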
Referenced in this podcast
- Ludwig Schmidt
- Do ImageNet Classifiers Generalize to ImageNet?
- DeepMind Control Suite
- Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
- Self-supervised Policy Adaptation During Deployment
- The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
- Catastrophic forgetting
- Yann LeCun Cake Analogy
- Generalization in Reinforcement Learning by Soft Data Augmentation
- Reinforcement Learning with Augmented Data
- Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
- SVEA: Stabilized Q-Value Estimation under Data Augmentation
- Do Vision Transformers See Like Convolutional Neural Networks?
- Temporal Difference Learning for Model Predictive Control
- Introducing Dreamer: Scalable Reinforcement Learning Using World Models
- UC San Diego Lab advised by Xiaolong Wang
- SAPIEN by UC San Diego
- ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations
- Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers
- A Survey of Generalisation in Deep Reinforcement Learning
Thanks to Tessa Hall for editing the podcast.