Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds (NeurIPS Spotlight Paper)


Last updated 15 Jun 2026

We’ve learned a lot from doing research in Avalon over the past year! Internally, we’ve extended it to handle multiple agents and added simple audio and linguistic inputs for agents. We’ve run a number of experiments on the benchmark tasks and found that, while well-tuned PPO approaches achieve reasonable performance on the more basic tasks, most RL algorithms struggle with the more complex, compositional tasks. These results pushed us toward agents that can incorporate more explicit reasoning. While we believe Avalon remains a useful tool for fundamental reinforcement learning research, we are currently more focused on building agents for text-based environments (e.g., code editors, browsers, and computer desktop environments). This means we are unlikely to develop significant new features for Avalon in the near future. See our blog post for more.

What Is Avalon?

Avalon is a benchmark for generalization in RL

Agents in Avalon must accomplish a wide range of tasks, all with the same sparse reward structure. Each task corresponds to a family of randomly generated worlds that require a particular skill.
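
To make the idea of many tasks sharing one sparse reward more concrete, here is a toy sketch. It is not Avalon's world generator or reward function: `World`, `generate_world`, `sparse_reward`, `TRAIN_SEEDS`, and `EVAL_SEEDS` are hypothetical names chosen for illustration. The sketch assumes each world is built deterministically from a (task, seed) pair and that the only reward is a single payoff for reaching a goal. It also assumes that holding out a disjoint range of seeds is how generalization to unseen worlds is measured.

```python
import random
from dataclasses import dataclass

# Hypothetical toy model, NOT Avalon's generator: a world is just a grid with a goal cell.


@dataclass(frozen=True)
class World:
    task: str
    seed: int
    size: int
    goal: tuple[int, int]  # goal cell on a size x size grid


def generate_world(task: str, seed: int, size: int = 8) -> World:
    """Deterministically build a toy world from a (task, seed) pair."""
    rng = random.Random(f"{task}:{seed}")  # str seeds are converted to an int deterministically
    goal = (rng.randrange(size), rng.randrange(size))
    return World(task=task, seed=seed, size=size, goal=goal)


def sparse_reward(world: World, agent_pos: tuple[int, int]) -> float:
    """One reward rule shared by every task: 1.0 at the goal, 0.0 everywhere else."""
    return 1.0 if agent_pos == world.goal else 0.0


# Assumed protocol: train on one seed range, evaluate on a disjoint, unseen range.
TRAIN_SEEDS = range(0, 1000)
EVAL_SEEDS = range(1000, 1100)
```

For example, `generate_world("eat", 7) == generate_world("eat", 7)` is `True`, so any world can be regenerated from its seed. Because the two seed ranges do not overlap, evaluation worlds are never seen during training.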

Twenty tasks test a variety of general skills

See the paper and presentation (below) for more details about the benchmark.

Avalon is a fast, easy-to-use 3D simulator for constructing RL environments

The Avalon benchmark is built on top of a simulator that we created specifically to suit the needs of RL researchers. Features:

Fast: can simulate up to 10,000 steps per second on a single GPU.
Fully open source: built on top of the free, open source Godot game engine.
Easy to use: includes a fully featured editor for creating and debugging new environments and game logic.
Vibrant community: Godot has thousands of online tutorials, great docs, and a large base of existing users.
Simple: the entire game engine is roughly 30MB including rendering, physics, etc., and Avalon is only a few thousand lines of additional code.
Baselines included: Avalon includes clean implementations of PPO, IMPALA, Dreamer v2, and BYOL-Explorer, most of which have been verified to replicate the original paper results.
Accessible: state-of-the-art agents can be trained with a single GPU.

Who Should Use Avalon?

If you want a more challenging RL benchmark

Avalon can be used as a largely drop-in replacement for Atari or other standard RL benchmarks; see the corresponding tutorial in our GitHub repository to get started.
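
Being a drop-in replacement mostly comes down to exposing the same reset/step interface that existing training code already calls. The sketch below illustrates that idea. It is not Avalon's API: `CoinFlipEnv` and `run_episode` are hypothetical. The `Env` protocol assumes the classic four-value Gym `step` return `(obs, reward, done, info)`, whereas newer Gym and Gymnasium releases return five values.

```python
import random
from typing import Any, Callable, Protocol


class Env(Protocol):
    """Assumed Gym-style interface: reset() -> obs, step(a) -> (obs, reward, done, info)."""

    def reset(self) -> Any: ...

    def step(self, action: Any) -> tuple[Any, float, bool, dict]: ...


class CoinFlipEnv:
    """Hypothetical stand-in environment: reward 1.0 when the action matches a coin flip."""

    def __init__(self, horizon: int = 10, seed: int = 0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t  # observation is just the step counter

    def step(self, action: int) -> tuple[int, float, bool, dict]:
        self.t += 1
        reward = 1.0 if action == self.rng.randrange(2) else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done, {}


def run_episode(env: Env, policy: Callable[[Any], Any]) -> float:
    """Touches only reset/step, so any env with the same interface can be swapped in."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total
```

For example, `run_episode(CoinFlipEnv(horizon=10), lambda obs: 0)` returns a score between 0 and 10. Replacing `CoinFlipEnv` with another environment that has the same methods requires no change to `run_episode`.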

If you want to do research on generalization in RL

Avalon is a good place to start: it includes highly tuned, easy-to-understand implementations of several popular, high-performance RL algorithms. See the tutorial in our GitHub repository to replicate our training.

If you want to develop your own RL tasks

Avalon is a fast, easy-to-use research platform built on top of the fully open source Godot game engine. See the tutorial in our GitHub repository to create a new environment from scratch.

Getting Started

Play in the environment

Avalon is easy to try: download the roughly 30MB build for your platform, unzip it, and run it.

Run experiments

To run experiments with Avalon, follow the setup instructions in the documentation on our GitHub repository; the commands can be run in your own notebook.

Tutorials

See the full documentation, source code, and tutorials on our GitHub repository.

Learn More

Presentation

Our paper on Avalon was published at NeurIPS 2022. We will be presenting it in person.

Paper

Check out our paper for all the details on the environment and tasks, along with human and RL baselines (PPO, IMPALA, and Dreamer v2).

To cite Avalon, please cite our NeurIPS 2022 paper, "Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds."

Contact Us

If you’re interested in using Avalon, please feel free to reach out and say hello! We’re excited to help the research community build on top of Avalon.