RSS · Spotify · Apple Podcasts · Pocket Casts
Some highlights from our conversation
“The way we train neural nets, the way we do supervised learning, it’s super convenient, and it’s gotten us very far. But the way we do it is so different from how humans learn. Neural nets are trained from scratch on IID images and one-hot labels. Humans learn on interactive, dynamic experience where their task is constantly changing and they’re always observing distribution shifts.”
“It’s difficult to think how academics can really contribute when they aren’t able to train at that kind of compute scale like Google or Facebook can. But if we study formal problems, where the problems can be studied at small scale, we can make progress.”
“The disentanglement definition, kind of what was put out by beta-VAE, was saying ‘we want each dimension to represent different information.’ But […] some things literally can’t be put into a single continuous latent. If I talk about 3D rotation, 3D rotation is correlated. So how am I going to put three-dimensional rotation into a single latent dimension?”
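For context on the disentanglement quote above, the beta-VAE objective from Higgins and coauthors is the standard VAE evidence lower bound with a weight β > 1 on the KL term; this is a reference sketch of the published formulation, not an equation discussed on the episode:

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$

Pressuring the approximate posterior toward the factorized prior $p(z)$ is what underlies the “each dimension represents different information” reading; Yash’s point is that some factors of variation, such as 3D rotation, are inherently multi-dimensional and cannot be captured by a single latent coordinate.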
Referenced in this podcast
- Key papers that inspired Yash’s interest in adversarial examples: Szegedy et al. 2014, Goodfellow et al. 2014, and Carlini & Wagner 2016
- Matthias Bethge’s lab, where Yash is doing his PhD work
- Bernhard Schölkopf and his work on causality
- The Book of Why by Judea Pearl
- Foundational work on disentangled representations by Irina Higgins and coauthors: beta-VAE and the more recent Towards a Definition of Disentangled Representations (2018)
- Yoshua Bengio and his work on “factors of variation”
- Locatello et al. 2018, which won best paper at ICML 2019
- Invariant Risk Minimization (IRM) by Arjovsky et al. and a follow-up paper, In Search of Lost Domain Generalization, by Gulrajani & Lopez-Paz
- A recent paper on non-linear IRM by Lu et al.
- AI-generating algorithms (AI-GAs) by Jeff Clune
Further discussions
- Yash brought up the dilemma of exploration vs. exploitation in research, explaining why he decided to switch his focus in grad school instead of continuing to build on his expertise in adversarial robustness. In particular, he noted that incoming grad students in an increasingly competitive admissions landscape are often expected to already have experience in whatever topic they plan to specialize in. How can new researchers optimally balance exploitation of previous experience with exploration of the broader field?
- We discussed how lack of robustness to adversarial examples provides a human-to-AI comparison that seems worth digging into, since it exposes a severe failure of out-of-distribution generalization. On the other hand, adversarial perturbations are not naturally occurring distribution shifts. Beyond its practical security implications, does robustness to adversarial examples have anything to teach us about intelligence?
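For readers new to the topic, here is a minimal sketch of the fast gradient sign method from the Goodfellow et al. 2014 paper referenced above, assuming a differentiable PyTorch image classifier `model`; it is included only to illustrate how an adversarial example is constructed, not as anything specific from the episode.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb a batch of inputs x (with labels y) to increase the classifier's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Take a single epsilon-sized step in the direction of the sign of the input gradient.
    x_adv = x + epsilon * x.grad.sign()
    # Keep pixel values in the valid [0, 1] range.
    return x_adv.clamp(0.0, 1.0).detach()
```

The perturbation is bounded by epsilon and is typically imperceptible to humans, which is what makes the human-to-AI comparison above interesting.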
Thanks to Tessa Hall for editing the podcast.