Our Work

We build AI agents: coding systems that can reason about our problems and build software to solve them.

We want to make it possible for every person to build bespoke software, so that our computers can help us do more of what matters to each of us.

Highlights

Training a 70B model from scratch: open-source tools, evaluation datasets, and learnings

Research · June 25, 2024

Earlier this year, we pre-trained and fine-tuned a 70B-parameter model that outperforms GPT-4o zero-shot on a range of reasoning and coding…


Imbue raises $200M to build AI systems that can reason and code

Company · September 7, 2023

We’re excited to announce our latest funding round, a $200M Series B at a valuation of over $1 billion, with participation from Astera…


Our Approach to Agents


We take a full-stack approach to building agents, from training foundation models to designing new product interfaces:

  • Models: We train models, doing everything from pre-training a 70B model from scratch to reinforcement learning to fine-tuning models for specific tasks like detecting ambiguity. Today's models, while impressive, still lack important reasoning capabilities, such as the ability to understand whether their outputs are correct. Much of our work focuses on "last-mile reasoning": executing a series of steps to perform robustly and reliably in real-world scenarios.

  • Evaluations: Evaluations are one of the most critical aspects of creating high-performing machine learning systems. We invest heavily in internal metrics and datasets to evaluate model and agent performance on realistic coding tasks. This allows us to precisely measure the effects of new research ideas, enabling us to predict the performance of larger models using experiments from much smaller models. We continuously develop and refine evaluation tasks to make them more relevant to real-world problems.

  • Product: Our goal is to create self-coding agents that enable anyone to build software. To get there, we believe in "serious use": using agents to solve real problems. We start by building for technical users who can read and write code. As self-coding capabilities improve over time, less technical users will also be able to create bespoke software. We design agents that work collaboratively with people and let us trust what's being created.

  • Infrastructure and tooling: We invest in building tools to speed up our iteration loop, from agent debugging interfaces to hyperparameter optimizers like CARBS. We've set up multi-thousand-GPU clusters from the ground up, and open-sourced some of the tools we developed to deploy and maintain them.

  • Theory: We pursue fundamental laws behind deep learning in order to create a robust foundation for agents. We believe that by deeply understanding these systems, we can collectively move to more engineering-driven, safe, and understandable methods for creating AI agents that can reason and code.

All Work