Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“I think if you looked at the structure of the masks on a simple dataset, like MNIST, you could actually see the structure, especially in the first layer; you can see that the weights that are masked are the weights that connect to empty space, usually right on the borders of the digit. That kind of analysis gets impossible as soon as you move away from MNIST. And I almost think—I’m pessimistic about taking a sparsity structure that you identify from lottery tickets to a general principle of how you should design architectures.”
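The mask-structure observation above can be sketched in code. This is a hypothetical illustration, not code from the paper: we fabricate a binary supermask for the first layer of an MNIST MLP (shape `n_hidden × 784`), zero out the border pixels where digits have no ink, and reshape the per-pixel survival rate into a 28×28 "importance image" — the kind of visualization that would reveal the structure described in the quote.

```python
import numpy as np

# Hypothetical first-layer supermask for an MNIST MLP:
# mask[i, j] == 0 means the weight from input pixel j to
# hidden unit i is pruned.
rng = np.random.default_rng(0)
n_hidden = 256
mask = rng.integers(0, 2, size=(n_hidden, 28 * 28))

# Mimic the observed structure: fully prune the border pixels,
# where MNIST digits are empty space (assumed 2-pixel border).
border = np.ones((28, 28), dtype=bool)
border[2:-2, 2:-2] = False
mask[:, border.ravel()] = 0

# Fraction of surviving weights per input pixel; reshaping to
# 28x28 gives an image of which pixels the mask keeps.
survival = mask.mean(axis=0).reshape(28, 28)

print(survival[0, 0])        # border pixel: everything pruned
print(survival[14, 14] > 0)  # center pixel: some weights survive
```

As the quote notes, this kind of direct inspection only works because MNIST inputs have a fixed spatial meaning per pixel; for larger datasets and deeper layers the mask has no such readable layout.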
“I have this speculation or hunch or hypothesis—people talk about system 1 and system 2, right? And they say deep learning does system 1 mostly. But for humans, we don’t always rely on system 2 to solve questions that are reasoning problems. […] You have to make a conscious decision when to do that. It takes energy. It’s not the default state. So my speculation is that these large pre-trained models also have both system 1 and system 2 capabilities, it’s just they may be in different proportions. And also the system 2 capabilities might be masked by system 1—by the more correlational and easier and probably stronger features in the model.”
“For humans, even though we don’t like the idea of forgetting, we think it’s a bad thing, it’s actually a very important mechanism in our brain that helps with learning, helps with processing information. So could it be the case that that same process could be beneficial to artificial neural networks?”
Referenced in this podcast
- The Lottery Ticket Hypothesis
- Hattie’s paper Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
- Supermasks in Superposition
- The “system 1 and system 2” concept from Thinking, Fast and Slow by Daniel Kahneman
- Coherent Gradients
- LCA: Loss Change Allocation for Neural Network Training
- Compositional Languages Emerge in a Neural Iterated Learning Model
- Knowledge Evolution in Neural Networks
- Hattie’s paper Fortuitous Forgetting in Connectionist Networks
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr
- The Primacy Bias in Deep Reinforcement Learning
- Chris Olah and his work on neural circuits
- Greg Yang and his work on neural tangent kernels
Thanks to Tessa Hall for editing the podcast.