Hattie Zhou, Mila: On supermasks, iterative learning, and fortuitous forgetting

October 14, 2022

Hattie Zhou is a Ph.D. student at Mila working with Hugo Larochelle and Aaron Courville. Her research focuses on understanding how and why neural networks work, starting with deconstructing why lottery tickets work and most recently exploring how forgetting may be fundamental to learning. Prior to Mila, she was a data scientist at Uber and did research with Uber AI Labs. In this episode, we chat about supermasks and sparsity, coherent gradients, iterative learning, fortuitous forgetting, and much more.

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“I think if you looked at the structure of the masks on a simple dataset, like MNIST, you could actually see the structure, especially in the first layer; you can see that the weights that are masked are the weights that connect to empty space, usually right on the borders of the digit. That kind of analysis gets impossible as soon as you move away from MNIST. And I almost think—I’m pessimistic about taking a sparsity structure that you identify from lottery tickets to a general principle of how you should design architectures.”

“I have this speculation or hunch or hypothesis—people talk about system 1 and system 2, right? And they say deep learning does system 1 mostly. But for humans, we don’t always rely on system 2 to solve questions that are reasoning problems. […] You have to make a conscious decision when to do that. It takes energy. It’s not the default state. So my speculation is that these large pre-trained models also have both system 1 and system 2 capabilities, it’s just they may be in different proportions. And also the system 2 capabilities might be masked by system 1—by the more correlational and easier and probably stronger features in the model.”

“For humans, even though we don’t like the idea of forgetting, we think it’s a bad thing, it’s actually a very important mechanism in our brain that helps with learning, helps with processing information. So could it be the case that that same process could be beneficial to artificial neural networks?”

Referenced in this podcast

Thanks to Tessa Hall for editing the podcast.