Noam Brown, FAIR: On achieving human-level performance in poker and Diplomacy, and the power of spending compute at inference time

February 9, 2023

RSS · Spotify · Apple Podcasts · Pocket Casts

Noam Brown is a research scientist at FAIR. During his Ph.D. at CMU, he made the first AI to defeat top humans in No Limit Texas Hold 'Em poker. More recently, he was part of the team that built CICERO, which achieved human-level performance in Diplomacy. In this episode, we extensively discuss ideas underlying both projects, the power of spending compute at inference time, and much more.

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“I think it’s actually particularly relevant today because you look at where research is with large language models, you’re seeing a big emphasis on scaling up training, scaling up model capacity, but the inference cost is still fixed. And I think there’s a big open question about how you can leverage extra computation at inference time to get better. And I think if you can do that, you can get huge, huge gains. I mean, the way that we’ve done this in poker and Go is always relatively domain specific. But if somebody can come up with a truly general way of being able to throw extra computation at inference time, then you can unlock a lot of potential.”
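The idea of spending extra compute at inference time can be illustrated with a toy sketch (not from the episode; the payoff numbers and noise level are made up for illustration): an agent facing a noisy one-step decision can buy more simulations per action before committing, and the bigger the simulation budget, the more reliably it picks the genuinely better action.

```python
import random

random.seed(0)

# Toy setup: two actions with unknown noisy payoffs; action 1 is slightly
# better on average. These numbers are illustrative, not from any real game.
TRUE_MEANS = [0.50, 0.55]

def simulate(action):
    """One noisy payoff sample for an action."""
    return TRUE_MEANS[action] + random.gauss(0, 0.3)

def decide(budget):
    """Spend `budget` simulations per action at inference time, then pick
    the action with the higher estimated value."""
    estimates = []
    for a in range(len(TRUE_MEANS)):
        samples = [simulate(a) for _ in range(budget)]
        estimates.append(sum(samples) / budget)
    return max(range(len(TRUE_MEANS)), key=lambda a: estimates[a])

def accuracy(budget, trials=2000):
    """How often a given inference-time budget finds the better action."""
    return sum(decide(budget) == 1 for _ in range(trials)) / trials

for budget in (1, 10, 100):
    # Accuracy climbs as the per-decision compute budget grows.
    print(budget, accuracy(budget))
```

The same trade-off drives search in poker and Go engines: the trained model is fixed, but each extra rollout at decision time sharpens the value estimates behind the move actually played.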

“You really want to better model humans in a more holistic sense. The best way I could describe it is imagine you’re trying to model a human driver, and you have two models: one that perfectly predicts the human 99% of the time but 1% of the time thinks they’re gonna drive off a bridge. And then you have a different model that perfectly predicts them 98% of the time but 2% of the time thinks they’re gonna use a different blinker. Which one is the better model of human behavior? I would say the one that is only accurate 98% of the time is the better model of human behavior.”

“Humans are really good at sniffing out weaknesses and finding exploits. Humans are really adaptive… you can’t just deploy a bot assuming that humans are gonna follow a very stationary policy, behave in a very stationary way. You have to account for the fact that people are going to try to adapt and try to exploit your system. That’s one of the lessons that I took away from the poker work: we approached it with the mindset that we’re trying to find an equilibrium that, even if the other players knew our policy, they would not be able to beat.”
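One standard way to find such an unexploitable policy is regret matching, a simple iterative procedure related to the counterfactual-regret methods used in poker research. A minimal self-play sketch on rock-paper-scissors (my own toy example, not code from the projects discussed) converges toward the uniform mixed equilibrium, which no opponent can beat even with full knowledge of the policy:

```python
import random

random.seed(0)

ACTIONS = 3  # rock, paper, scissors

def payoff(a, b):
    """Zero-sum payoff for action a vs action b: +1 win, 0 tie, -1 loss."""
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / ACTIONS] * ACTIONS

def train(iterations=50000):
    """Self-play regret matching; returns the average strategy."""
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        for i in range(ACTIONS):
            strategy_sum[i] += strat[i]
        # Sample both players' actions from the current strategy.
        a = random.choices(range(ACTIONS), weights=strat)[0]
        b = random.choices(range(ACTIONS), weights=strat)[0]
        # Regret for each alternative: how much better it would have done.
        for alt in range(ACTIONS):
            regrets[alt] += payoff(alt, b) - payoff(a, b)
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# The average strategy approaches the uniform (1/3, 1/3, 1/3) equilibrium.
print(train())
```

At the equilibrium, an adaptive opponent gains nothing from knowing the bot's policy, which is exactly the robustness property described in the quote.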

Referenced in this podcast

Thanks to Tessa Hall for editing the podcast.