RSS · Spotify · Apple Podcasts · Pocket Casts
Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
“I think it’s actually particularly relevant today, because if you look at where research is with large language models, you’re seeing a big emphasis on scaling up training, scaling up model capacity, but the inference cost is still fixed. And I think there’s a big open question about how you can leverage extra computation at inference time to get better results. And I think if you can do that, you can get huge, huge gains. I mean, the way that we’ve done this in poker and Go has always been relatively domain specific. But if somebody can come up with a truly general way of being able to throw extra computation at inference time, then you can unlock a lot of potential.”
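To make the inference-time-compute idea concrete, here is a minimal best-of-N search sketch in Python. Everything in it is illustrative rather than from the episode: the `TRUE_VALUE` table, the noise level, and the poker-flavored move names are all made-up assumptions. The point is only that spending more simulations per candidate move turns a noisy evaluator into a more reliable decision.

```python
import random

rng = random.Random(0)

# Made-up ground truth for illustration only; nothing here is from the episode.
TRUE_VALUE = {"fold": 0.0, "call": 0.4, "raise": 0.7}

def noisy_rollout(move, n_sims):
    # Averaging more simulations tightens the noisy value estimate.
    samples = (TRUE_VALUE[move] + rng.gauss(0, 0.5) for _ in range(n_sims))
    return sum(samples) / n_sims

def decide(legal_moves, sims_per_move):
    # Inference-time search: score every candidate move, play the argmax.
    return max(legal_moves, key=lambda m: noisy_rollout(m, sims_per_move))

moves = ["fold", "call", "raise"]
print(decide(moves, sims_per_move=1))    # tiny budget: often picks a bad move
print(decide(moves, sims_per_move=500))  # bigger budget: reliably picks "raise"
```

The poker and Go systems discussed in the episode use far more sophisticated, domain-specific search, but the underlying trade-off, more inference budget for better decisions, is the same one this toy exposes.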
“You really want to model humans in a more holistic sense. The best way I could describe it is: imagine you’re trying to model a human driver, and you have two models: one that perfectly predicts the human 99% of the time but 1% of the time thinks they’re gonna drive off a bridge. And then you have a different model that perfectly predicts them 98% of the time but 2% of the time thinks they’re gonna use a different blinker. Which one is the better model of human behavior? I would say the one that is only accurate 98% of the time is the better model of human behavior.”
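The asymmetry in that example is easy to quantify: raw accuracy treats all errors alike, while an expected-cost comparison does not. A back-of-the-envelope sketch, using the quote’s 99%/98% accuracies and cost numbers that are purely my own assumptions:

```python
# Assumed costs for the two error types; the accuracies are from the quote.
COST_BRIDGE = 1_000_000   # catastrophic error: predicts driving off a bridge
COST_BLINKER = 1          # benign error: predicts the wrong blinker

expected_cost_a = 0.01 * COST_BRIDGE   # model A: 99% accurate, rare catastrophe
expected_cost_b = 0.02 * COST_BLINKER  # model B: 98% accurate, benign mistakes

print(expected_cost_a)  # 10000.0
print(expected_cost_b)  # 0.02
```

Under any cost assignment where the rare error is far more severe, the “less accurate” model wins, which is the point of the quote.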
“Humans are really good at sniffing out weaknesses and finding exploits. Humans are really adaptive… you can’t just deploy a bot assuming that humans are gonna behave in a very stationary way. You have to account for the fact that people are going to try to adapt and try to exploit your system. That’s one of the lessons that I took away from the poker work: we approached it with the mindset that we’re trying to find an equilibrium such that, even if the other players knew our policy, they would not be able to beat it.”
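That “unbeatable even if known” property is what an equilibrium strategy gives you, and regret matching, the building block of the Counterfactual Regret Minimization work linked below, is a standard way to find one. A minimal self-play sketch for rock-paper-scissors (the toy game, iteration count, and code structure are my own illustration, not the actual poker bots):

```python
import random

ACTIONS = 3                                      # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]    # PAYOFF[mine][opponent's]

def strategy_from_regrets(regrets):
    # Play each action in proportion to its positive accumulated regret.
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations, rng=random.Random(0)):
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        for a in range(ACTIONS):
            strategy_sum[a] += strat[a]
        my_action = rng.choices(range(ACTIONS), weights=strat)[0]
        opp_action = rng.choices(range(ACTIONS), weights=strat)[0]  # self-play
        # Regret: how much better each alternative would have done this round.
        for a in range(ACTIONS):
            regrets[a] += PAYOFF[a][opp_action] - PAYOFF[my_action][opp_action]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]     # the *average* strategy

print(train(100_000))  # roughly [0.333, 0.333, 0.333]: the equilibrium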
Referenced in this podcast
- CICERO
- Annual Computer Poker Competition
- Imperfect information game
- Rise of the Poker Bots, VICE News
- Hanabi Learning Environment
- ReBeL: A general game-playing AI bot that excels at poker and more
- DeepMind’s AlphaZero
- Counterfactual Regret Minimization
- DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
- No-Press Diplomacy from Scratch
- No-Press Diplomacy: Modeling Multi-Agent Gameplay (Mila paper)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Google)
- MuZero: Mastering Go, chess, shogi and Atari without rules
Thanks to Tessa Hall for editing the podcast.