
Dylan Hadfield-Menell recently finished his PhD at UC Berkeley and is starting as an assistant professor at MIT. He works on designing AI algorithms that pursue the intended goals of their users, designers, and society in general, which is known as the value alignment problem. His most recent NeurIPS paper, Consequences of Misaligned AI, models this problem through a common situation in which the user’s true goals are expressed to an AI system only through proxies. Optimizing the proxy initially increases the user’s utility, but utility eventually declines and turns negative as the AI system over-optimizes for the proxy objective. The paper’s proposed solution is to let the user update their proxy goals over time, which raises utility again. The model offers a general look at the consequences of misalignment and at how AI recommender systems can be improved.
Dylan would love to hear any questions or comments on his paper, so feel free to reach out!
Highlights from our conversation:
👨👩👧👦 How to align AI to human values
📉 Consequences of misaligned AI -> bias & misdirected optimization
📱 Better AI recommender systems
Below are the show notes and full transcript. As always, please feel free to reach out with feedback, ideas, and questions!
[3:30] Dylan’s work on normative information about AI systems:
“Since then, my research has been, how to provide normative information about AI system behavior. We often talk about, in the world, the distinction between objective and subjective properties. Like predicting images from pixel to pixel is a fairly objective thing. There’s a clear well-defined right answer. You predict the right pixels or you don’t. For normative properties of the world, that’s not true.
When the right answer is not externally defined, you have to appeal to who built the system and what do they want it to do, in order to really answer that question. And I think most of my research is about trying to understand what are the channels by which we communicate that information. How do we make sure that system’s behavior aligns well enough with this subjective goal that we have.”
[18:25] The main result in Dylan’s 2020 NeurIPS paper:
“What we show is in this model, if you optimize for any fixed proxy utility function, eventually the overall utility is driven either towards a minimum at certain features or overall drives away unbounded if you don’t hit any environmental bounds. …
(Their solution:) We have a property of a proxy utility function and a true utility function such that local improvements in one lead to improvements in the other. And this implies that if you can update the features and your utility function fast enough, you can use proxy utility functions to maybe not define what you want in the long run, but to provide local directions of improvement for how your system should allocate its effort. And so this is another style of solution.”
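The dynamic Dylan describes can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions, not the model from the paper: the log utility, the fixed resource budget, the transfer rate, and the function names (`true_utility`, `step`, `run`) are all hypothetical choices made for illustration. A fixed proxy that only "sees" two of three features keeps pulling resources away from the third, so the true utility rises at first and then falls. Letting the user re-point the proxy at the currently scarcest features keeps the true utility from collapsing.

```python
import math

def true_utility(x):
    # Toy "true" utility: diminishing returns in every feature (assumption).
    return sum(math.log(v) for v in x)

def proxy_utility(x, proxied):
    # The proxy only measures the features listed in `proxied`.
    return sum(math.log(x[i]) for i in proxied)

def step(x, proxied, rate=0.2):
    """One greedy proxy-improving step under a fixed total budget:
    move a fraction of every unmeasured feature into the measured ones."""
    x = list(x)
    others = [i for i in range(len(x)) if i not in proxied]
    moved = 0.0
    for i in others:
        take = rate * x[i]
        x[i] -= take
        moved += take
    for i in proxied:
        x[i] += moved / len(proxied)
    return x

def run(x0, steps, proxied=(0, 1), update_proxy=False):
    x, history = list(x0), [true_utility(x0)]
    for _ in range(steps):
        if update_proxy:
            # Hypothetical user update: point the proxy at the two scarcest features.
            proxied = sorted(range(len(x)), key=lambda i: x[i])[:2]
        x = step(x, proxied)
        history.append(true_utility(x))
    return x, history

x0 = [0.2, 0.2, 2.6]
x_fixed, fixed = run(x0, steps=40)
x_updated, updated = run(x0, steps=40, update_proxy=True)

print("fixed proxy:   peak at step", fixed.index(max(fixed)), "final", round(fixed[-1], 2))
print("updated proxy: final", round(updated[-1], 2))
```

In this toy setup the updated proxy does not make the true utility increase at every step, but it keeps every feature from being drained, which is the qualitative point of the quote: a proxy can serve as a local direction of improvement as long as it is revised often enough.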
[31:33] Obvious discoveries:
“I think that is the ultimate dream as a researcher: things that you didn’t realize before, but then seem so obvious, you can’t imagine not thinking of them in hindsight. If I can have a couple ideas like that in my career, I will call it a big success.”
[33:22] Current AI systems as an analog to compilers:
“I’ve come to believe that a lot of ML as we study it right now will fill a role in the future that’s analogous to what compilers fill in AI systems today. … Over the course of a long period of time is the combination of decision theory and statistics to effectively build a compiler that allows us to represent now more intelligent behaviors. Really just behaviors defined on complex open-world inputs in a higher level representation that can then be compiled down. That higher level representation in the form of supervised learning is a labeled dataset. … It’s a representation of an objective. It’s a ranking of different possible behaviors where those behaviors are encoded as the weights of the neural net, the parameters of a policy, something of that nature. We do have something like the general purpose programming language today. I think the supervised learning dataset like ImageNet is, in this analogy, it looks like Python or C or something like that.”
Thanks to Luke Cheng for writing drafts of this post and Tessa Hall for editing the podcast.