
RSS · Spotify · Apple Podcasts · Pocket Casts
Jacob Steinhardt (Google Scholar) (Website) is an assistant professor at UC Berkeley. His main research interest is in designing machine learning systems that are reliable and aligned with human values. Some of his specific research directions include robustness, reward specification and reward hacking, as well as scalable alignment. His most recent paper at ICLR 2021 proposes a new test to measure an NLP model’s accuracy on a wide variety of tasks, including mathematics, US history, law, and more. It provides a measurement tool to help researchers specify an important problem: while current models can achieve superhuman performance on benchmarks, they lack the ability to understand language as a whole. Another of Jacob’s papers at ICLR focuses on measuring a language model’s knowledge of basic concepts of morality. It shows that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Highlights from our conversation:
📜 “Test accuracy is a very limited metric.”
👨👩👧👦 “You might not be able to get lots of feedback on human values.”
📊 “I’m interested in measuring the progress in AI capabilities.”
Below are the show notes and full transcript. As always, please feel free to reach out with feedback, ideas, and questions!
[11:33] On the freedom of knowing how to communicate unusual ideas:
“But I had to learn how to write a good paper without having a template. I think it required me to learn to become a significantly better writer. And I think that helped later on, because it made me feel more comfortable pursuing unusual ideas. I knew I had the skills to present those ideas. As long as I believed in them, I could get other people to believe in them.”
[34:55] On learning hard phenomena from big data sets:
“People have historically been interested in these parts, like compositionality of objects and occlusion…but thinking about these complicated things directly is just not really the right way to go. You just want this very diverse distribution of things that are deeply ingrained in evolutionary history as opposed to being part of explicit reasoning.”
[21:10] Why measurement matters for AI safety:
“I’ve been really obsessed with this idea of measurement. First of all, test accuracy is a very limited metric. What are we trying to do with it? I’m kind of thinking in analogy with climate change as another field. For a while, there was a lot of climate skepticism or climate denial. At some point it becomes pretty clear, when there are regular heat waves, fires, and that sort of thing. You probably wanted to do something about it before that point. Having these more subtle measurements that you can look at is important. And the other thing is I think it actually laid the groundwork for the more extreme weather events to become a convincing signal.”
Thanks to Luke Cheng for writing drafts of this post and Tessa Hall for editing the podcast.