Towards guardrails, not guidelines: a policy framework for powerful AI systems

June 8, 2023

Incumbent interests disproportionately define today’s AI policy agenda. Their proposals are either distracting, like drafting letters of concern, or toothless, like proposing weak governance models. This needs to change. AI systems are too important to leave to their creators – we need laws and industry norms that protect people over profit.

If we don’t act, we risk repeating the errors that allowed surveillance advertising to become the prevailing business model for earlier technology companies. The history of the automobile provides a counterpoint: it felt as disruptive then as AI is today, but we have managed to make it relatively safe, predictable, and broadly available. The current state of the car is no utopia, but it offers concrete parallels for how to manage increasingly powerful AI systems.

A key lesson from the automobile is that control of a new technology should belong to society and that exercising that control requires us to use all the policy levers at our disposal. For AI, those levers fall across five distinct but interrelated domains: the values that we build into AI systems; their technical safety; our ability to recognize their harms; the consequences for causing or aiding those harms; and how broadly accessible AI will be for individuals.

We now have a choice to make, as we did with cars more than a century ago: take charge of our relationship with AI or accept a default path set by its creators.

Drivers wanted – no experience necessary: the road to traffic safety

Adoption was swift when the car arrived. By the end of the 1920s there were 23 million registered vehicles in the U.S., more than the number of households that had a radio. But their speed and range made them dangerous: road fatalities more than tripled between 1915 and 1925, sparking an era of fear and safety innovation.

The first U.S. speed limit came in 1901, but fines weren’t enough to restrain drivers who had trouble with the transition from animal-drawn carriages to self-propelled cars, and governments around the world were forced to establish safety frameworks that included driver education, traffic laws, and signal infrastructure. But this took decades, and safety features like seat belts and padded dashboards only became part of the U.S. federal standard in the 1960s.

Manufacturers would go on to create some of the most significant safety breakthroughs for cars, but that work only took off in the 1930s, when they realized they could sell safety as a differentiator. Yet, even as they promoted innovations like all-steel car bodies and hydraulic brakes, they insisted that governments bore the greater burden to make driving safer through better roads, licensing, and traffic management.

Governments sporadically got things spectacularly wrong. In 1865, the rail and horse-drawn carriage lobbies in the U.K. drummed up enough outrage to pass the Locomotive Act, whose most notable feature was an absurd requirement that a man walk in front of any self-propelled vehicle waving a red flag and blowing a horn. It stunted the U.K. automobile industry for years.

The parallels to AI are instructive. A framework for automobiles came into place over time that relied on a sweeping set of changes – from state laws to the formation of the federal Department of Transportation. These made it possible to make profit-motivated safety innovations broadly available, as well as to decide what types of vehicles deserved to be on the road, the qualifications necessary to drive them, and the conditions that make driving safe.

An AI policy framework that puts people first

Our approach to AI policy is characterized by two features: placing the interests of people over corporations (including ourselves) and using the full range of tools already built into the structure of modern society.

We intend to use a five-part framework – values, technical safety, evaluation, responsibility, and distribution – to identify where technical and policy work can improve the safety and societal benefit of AI systems. These are necessarily related. If, for example, we audit systems for bias but fail to punish the deployment of discriminatory systems, we’re not going to secure the benefits of AI for everyone.

Finally, even if we agree that AI poses unique challenges and governance issues, we should absolutely not build our interventions from scratch. We already have legal regimes, regulatory structures, and related-industry best practices – we need to use them to address AI’s challenges instead of spinning our wheels.

1. Values

Without guidance, AI will amplify the features of our current technology ecosystem, which include the permissibility of surveillance and a tolerance of bias, manipulation, and abuse.

Developers intentionally and unintentionally encode these values into the design and deployment of AI systems by deciding what the systems will or will not do. For example, engineering an AI system so it can’t replicate reflects a value of its own: prioritizing user control over the system’s purported interests.

We’re already seeing early attempts to train large language models (LLMs) on aspirational values like the Universal Declaration of Human Rights, but we need to ask deeper questions about the actual values we wish to encode (rather than mirror) in not only the language, but the function, dissemination, and governance of these systems. Foundational commitments, like limiting undue harms, can act as a starting point.

These commitments have real-world consequences. They allow people to appeal loss of access or set privacy boundaries that go further than before. They also have non-trivial consequences for business and operating models for developers, which challenges us to ask: which of these can we leave to companies to implement on their own, and which require mandates? It took decades for something as reasonable as a seat belt to become a standard in cars, costing innumerable lives. We don’t want to repeat a mistake like that with AI.

Decisions like these – deliberate or otherwise – can make the world worse off without destroying it; the values that anchor us can prevent that from happening.

2. Technical safety

AI systems need to be robust, trustworthy, and understandable to be as safe as possible. We are deeply concerned about large-scale AI risks (like helping develop bioweapons), but AI can still be unsafe, and very badly so, even if it doesn’t trigger the most extreme of our worries.

To build AI safely, we should look to other complicated systems – bridges, planes, chemical processing plants, etc. – that are extraordinarily safe because they’ve been engineered from the ground up to be that way and then backed by many layers of failsafes for when things go wrong. And when something is sufficiently critical, governments take an especially active role, as with air traffic control.

This requires a theoretical and practical understanding of the underlying principles of every component of the systems we deploy. It also means proactively deciding not to deploy systems that are poorly understood and investing heavily in efforts to shore up that understanding, sharing those findings, and cultivating an ecosystem of safety collaboration.

Some of this work can be done independently by AI developers, but other parts present a coordination challenge – like agreeing on deployment principles or evaluating, in an accountable way, models that are not ready for release – and that will require governments and third parties to step in.

There will always be reckless and irresponsible actors. As an industry, we should explore what it means to build defensive systems that can counteract the existence of dangerous ones (e.g., proactively detecting security vulnerabilities and patching them). There is enough commercial incentive for this to unfold on its own – consider the IT security industry – but public investment can speed it along.

We have done initial work to develop principles and practices that AI systems should follow. These include being easy to shut down, keeping complete and immutable logs, supporting comprehensive monitoring, requiring input when they are uncertain or the stakes escalate, and not being treated as persons. We will flesh these out, incorporate them into the designs of the systems we build, and share them with the community.

3. Evaluation

We need to know that AI systems are working as expected, but this still leaves open many important questions about what to do next. Suppose a developer red teams their system and finds something wrong. They fix as much as they can and then – what? How will the public know the risks and whether they can be mitigated? Are there clear lines for what simply isn’t acceptable, regardless of intent or regulatory compliance?

Even more challenging is the fact that most foundational AI systems are closed boxes.

Self-regulatory proposals, such as the National Institute of Standards and Technology AI Risk Management Framework (AI RMF), are powerful tools for good faith efforts to build AI systems responsibly. But self-regulation is no match when the interests of developers and the public diverge. After all, every major social media platform extensively tracks users despite paying lip service to privacy.

Requiring transparency might not completely solve the problem either, since mandated disclosure is easy to obfuscate and discretionary disclosure can be abused. This means that developers may have to provide open access to governments or third parties that can actually evaluate whether the systems work as intended.

We need standards and expectations for what systems can and cannot do, and to hold developers accountable regardless of their precautionary efforts. Otherwise, we’re simply gambling with society’s well-being.

4. Responsibility

At this stage, it’s hard to find a serious voice that opposes AI regulation. Yet the most concrete prescriptions are from industry incumbents.

We appear to be on a default path to risk regulation, which presupposes that regulation can cover all the risks of a given technology. But what happens when companies follow these rules but their products still cause harm? Risk regulations are usually overseen by expert agencies and focus on collective benefit, which means that individual recourse is much harder and the likelihood that a court can shut something down is much lower.

By contrast, consider compulsory car insurance, first mandated in the U.K. in 1930 and now a de facto standard throughout the world. Instead of putting the burden of paying or suing for the costs of an accident on individual drivers, we’ve mostly mitigated one of the largest financial risks of driving by having individuals insure their vehicles in advance.

As we come to better understand the harms of these systems and their mitigations, we need to ensure that our regulatory arsenal can similarly adapt.

Beyond insisting that systems need to be better, we need to ask what should or shouldn’t exist in the first place, and to do that we need to ask harder questions: Who bears responsibility when these systems act in ways that are harmful? As a society, do we have a way to make sense of those harms? And when we do pass rules, will they easily be circumvented by the deepest pockets?

We also need a dose of humility. Even outside of AI we still haven’t figured out how to mitigate diffuse harms, like privacy violations. Evaluating a wider set of regulatory options can help us get to where we need to be.

5. Distribution

There is a risk that the most powerful AI capabilities will be highly concentrated.

Large language models (LLMs) require access to compute resources and expertise that are expensive and increasingly scarce. And as developers create these systems, they can use them to improve future models, creating a flywheel where advantage compounds advantage. Open source models invite interesting possibilities, but they won’t break immediate compute dependencies, nor can we be sure that they’ll ever be able to compete with the self-advancement flywheel.

A foundational technology that might do everything from improving medical research to providing children with personalized tutoring may end up controlled by only a few companies. This can yield problems: safety research will be limited because it will be outside-in unless systems are forced to open up; market forces might determine who can use them and who can’t, with little recourse; and with gated access, the possibilities for what can be built with them will be limited.

Given this concentration, we should prevent AI systems from being governed exclusively by their creators.

The car teaches us that we can choose to control a new technology – even if it happens in fits and starts in a context of uncertainty – and that a broad set of actions must work together to support that control.

We now have this opportunity for AI. We insist that society’s interests must supersede those of the developers of AI systems, even if that means greater financial costs and difficulty bringing new products to market – societal benefit and safety must come first.

A framework like the one we outline is a useful starting point. It gives us an opportunity to think across the stages of AI development and to ask ourselves where we can be the most effective in getting the outcomes that we want. And in so doing, we just might lay the foundation for a future where we have safely unlocked human productivity.

Thanks to Abe Fetterman, Bas van Opheusden, Eric Gu, Grant Cohen, Jamie Simon, Josh Albrecht, Kanjun Qiu, Kevin Hartnett, Maksis Knutins, Martin Schmidbaur, and Michael Rosenthal for their feedback.