RSS · Spotify · Apple Podcasts · Pocket Casts · YouTube
Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
On the paradigm shift of foundation models
I was spending a lot of time thinking about the robustness of machine learning, because there was a suspicion that deep learning methods were able to do really well on these benchmarks, but when you actually used them in real life, they would just fall apart. And this was true with adversarial examples, both in vision and in language. It seemed like these really high-performing, supposedly superhuman systems that top these leaderboards actually just fell apart when they were used out of domain.
So I did that for a while, and then foundation models happened. GPT-3 came out and it just blew my socks off in terms of the idea that you could train a language model, just next word prediction, and you could get a model that did way more than I could imagine. Zero-shot in-context learning and all these capabilities just emerged. It really suggested to me that there was a paradigm shift, and I think at that point I sort of said, “you know what, I could go on and break the system in all sorts of different ways, but I think that’s not where the action is — I think the action is really trying to understand these systems, harness them for applications, and understand the social impact.”
On the benefits of academia in improving AI capabilities
I think that academia has multiple functions. One is, as usual, constantly creating really novel ways of doing things and proving them out so that someone can scale them up. I think there is a difference between doing things at small scale with the intention of staying at small scale, and doing things at small scale with the intention of scaling up. […] FlashAttention was one of my favorite examples of something that came out of academia and is now everywhere in industry. So I think there's still space for producing these more fundamental changes to how model building works. Actually, another one is direct preference optimization (DPO): I think that's a really influential piece of work that you don't need that much compute to do, so there's a lot you can do on the method side.
Then there’s evaluation. We already talked about that, and about how being a sort of neutral third party and thinking deeply about evaluation is something that I think we’re just as good at as, if not better than, people with a larger compute budget. And then there’s the long-term stuff about how you do data attribution and how you retool the whole incentive system. I don’t think industry is going to touch that, because that’s really thinking at a societal level rather than an individual organization trying to build a model.
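As a rough illustration of why DPO is so compute-friendly: it replaces the separate reward model and RL loop of RLHF with a single classification-style loss over preference pairs. Below is a minimal sketch of that loss in PyTorch; the tensor names and batching are illustrative assumptions, not the paper's reference implementation.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed log-probabilities of the chosen
    or rejected response under the trainable policy or the frozen reference
    model (names are illustrative).
    """
    # Implicit rewards are the log-probability ratios between policy and reference.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin pushes the policy toward the preferred response.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

Because the loss only needs log-probabilities from the policy and a frozen reference model, training looks like ordinary supervised fine-tuning rather than reinforcement learning, which is what keeps the compute requirements modest.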
On using agents to simulate social dynamics
There are actually two types of agents, and we publish on both. The classical type of agent, exemplified by MLAgentBench, is where you basically have a language model wrapped in some sort of architecture with tool use, so it is able to do more things than just a raw LLM. This is what people typically think of as agents. Then there’s the other type of agent, exemplified by generative agents, where the idea is simulation. There’s no goal. […] The goal is just to simulate and see what happens. Say you have a city of 25 agents, each backed by an LLM, prompted to basically live their daily lives. They interact, and what you see are different types of emergent behaviors: social emergent behaviors between agents, not within a model. And I think that’s just really fascinating. One thing I think would be really interesting is what happens if you scale this up, which will require compute and fast inference. But if you could scale it up, maybe you would actually get some interesting social dynamics.
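To make the simulation idea concrete, here is a heavily simplified sketch of that kind of loop: each agent keeps a memory, is prompted with its persona and what it observes, and its action becomes an observation for the other agents on the next step. The `llm` callable and the `Agent` fields are placeholders of our own; the actual Generative Agents architecture also includes memory retrieval, reflection, and planning.

```python
class Agent:
    def __init__(self, name, persona):
        self.name = name
        self.persona = persona
        self.memory = []  # running log of what the agent has seen and done

    def step(self, observations, llm):
        # Condition the next action on the persona, recent memory, and observations.
        prompt = (
            f"You are {self.name}. {self.persona}\n"
            f"Recent memories: {self.memory[-10:]}\n"
            f"You observe: {observations}\n"
            "In one sentence, what do you do next?"
        )
        action = llm(prompt)  # placeholder call to any chat/completion model
        self.memory.append(action)
        return action


def simulate(agents, llm, steps=10):
    """Run the shared world: each agent's action becomes an observation for the others."""
    actions = {a.name: f"{a.name} wakes up." for a in agents}
    for _ in range(steps):
        next_actions = {}
        for agent in agents:
            others = [act for name, act in actions.items() if name != agent.name]
            next_actions[agent.name] = agent.step(others, llm)
        actions = next_actions
    return [a.memory for a in agents]
```

Even in this toy version, the interesting behavior lives in the interaction between agents rather than inside any single model call, which is the point being made above.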
On a fairer vision for training foundation models
Longer term, what I’m really excited about is a vision of how foundation models could be built. The current status quo is that you have all these people in the world who write books, write essays, take pictures, and create, essentially, content, which then gets scraped up into datasets that are used to train foundation models, which then serve people in products. And this has many structural problems. One is that the content producers don’t actually get any credit or pay. That’s why you see the many lawsuits that are happening. Another problem is that there’s a massive amount of centralization in determining these models’ behavior, and, again, a lack of transparency, so we don’t know what’s happening behind the scenes. And I just wonder: how could we do things differently? I don’t have the technical answer, just a kind of vision to paint. So, what if we were able to actually attribute predictions to the actual training source?
This is actually something I worked on seven years ago, but in a more limited fashion. If you could do data attribution, and you could do it reliably, then maybe you could actually set up a more economically viable system where you pay people for their contributions, and that maybe incentivizes better data quality. And at least there wouldn’t be the same lawsuits, because as long as people are getting paid, hopefully we’ll all be happier. That’s one kind of direction.
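As a toy sketch of the economic mechanism being described (not of any particular attribution method): suppose some attribution technique gives you a score for how much each training source contributed to a prediction; the corresponding payment could then be split in proportion to those scores. The function and score format below are hypothetical.

```python
def split_payment(attribution_scores, total_payment):
    """Split a payment among data contributors in proportion to attribution scores.

    `attribution_scores` maps contributor id -> attribution score for one
    prediction (hypothetical output of some data-attribution method);
    negative scores are treated as zero contribution.
    """
    positive = {k: max(v, 0.0) for k, v in attribution_scores.items()}
    total = sum(positive.values())
    if total == 0:
        return {k: 0.0 for k in positive}
    return {k: total_payment * v / total for k, v in positive.items()}


# Example: a $1.00 query fee split across three contributors.
print(split_payment({"alice": 0.6, "bob": 0.3, "carol": -0.1}, 1.00))
# alice gets ~0.67, bob gets ~0.33, carol gets 0.0
```

The hard research problem is producing reliable attribution scores in the first place; the payout rule itself is the easy part.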
The other direction is thinking about the values that these language models embody, which is something I think is really important to foreground and not just sweep under the umbrella of ‘we’re aligning to human values and we’re being safe,’ because that is such a complex construct, especially for a single organization to say, like, ‘Oh, don’t worry, we’ll handle it.’ It’s just not a viable way forward. So, how do you make this process more democratic? How can you elicit values, or how do you have a governance structure that is more participatory and gets you better representation, so that the values of a language model actually reflect what people want, rather than whatever a small set of people behind closed doors decided?
On the dangers of polarization
I do think that we live in this shared world, and if everyone has their own customized model, which really is a little virtual world that they live in, that’s basically how you get polarization. And I think that is a problem that we want to fight. Thinking about these language models in the future, a primary way that we’ll interact with the world, get information, and also take action in the world is probably going to be mediated by these models. So that had better be tethered to reality and not just based on some money-making ad scheme that gets people to basically believe whatever they want. And there needs to be some sort of shared reality, if nothing else because the real world demands it.
Referenced in this podcast
- Stanford Question Answering Dataset (SQuAD)
- Stanford Center for Research on Foundation Models
- Foundation Model Transparency Index
- MLCommons
- Generative Agents: Interactive Simulacra of Human Behavior by Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation by Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
- Adam: A Method for Stochastic Optimization by Diederik P. Kingma, Jimmy Ba
- Tengyu Ma
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness by Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- The Collective Intelligence Project
- Collective Constitutional AI: Aligning a Language Model with Public Input
Thanks to Tessa Hall for editing the podcast.