
AI Risk and the Law of AGI

Peter N. Salib, Simon Goldstein
Monday, September 16, 2024, 1:00 PM

New legal proposals designed to prevent the misuse of AI are welcome but insufficient as leading AI companies race toward artificial general intelligence.

(Photo: Deepak Pal/Flickr, https://www.flickr.com/photos/158301585@N08/46085930481, CC BY 2.0)


Leading artificial intelligence (AI) researchers are sounding the alarm about the catastrophic risks of rapidly advancing AI technology. Geoffrey Hinton and Yoshua Bengio, two of the greatest living AI scientists, believe that rapid AI advances present “societal-scale risks” on par with “pandemics and nuclear war.” Surveys of thousands of top AI researchers estimate a 19 percent probability that humanity loses control of “future advanced AI systems[,] causing human extinction or similarly” negative outcomes. Even the CEOs of OpenAI, Anthropic, and Google DeepMind agree that their technology poses a global-scale threat. 

Existing AI systems, like GPT-4, can already provide some assistance to nontechnical users wishing to, for example, execute cyberattacks, make chemical weapons, or obtain and release a pandemic virus. And sometime “this year or next year[,]” AI systems may arrive with “substantially increase[d]” capabilities in these areas. If rogue states, political extremists, terrorist groups, or other malicious human actors gain access to such systems, the consequences will be dire. New legal proposals, designed to prevent the misuse of AI by malicious humans, are beginning to emerge. These proposals are welcome but are insufficient on their own. 

They are insufficient because every leading AI company is racing toward artificial general intelligence, or AGI. As those companies approach the AGI threshold, frontier AI systems will become increasingly able to cause catastrophic harm autonomously, without the involvement of any human intermediary. New legal foundations will be needed to govern AGI directly, rather than via human intermediaries. In our new article, “AI Rights for Human Safety,” we begin to lay those foundations, with a special focus on governing AGI to reduce catastrophic risk. 

We argue that a surprising set of legal interventions could go a long way toward preventing powerful AGI systems from causing large-scale harm: Law could grant such systems the legal rights to make contracts, hold property, and bring certain basic tort claims. This may sound like a radical proposal. But it is in some sense quite familiar. Law already extends such rights to other kinds of powerful, agentic, and nonhuman entities—like corporations and nation-states. Granting these basic private law rights could reduce AGI risk for the same reason it reduces risk in domestic economies and international relations. Namely, these rights—and contract rights especially—offer a means by which entities with conflicting goals can pursue their divergent ends without incurring the high costs of violent conflict.

Our proposal is not the end of the story. Rather, it represents just the first step in a much larger legal project: developing a unified Law of AGI.

AI Risk in the AGI World

Every major AI company is hoping to be the first to develop AGI. As used here, “AGI” does not mean AIs that are conscious, sentient, or the like. Instead, AGI is about what AIs can do. As OpenAI’s company charter puts it, “AGI … mean[s] highly autonomous systems” sufficiently intelligent and goal-oriented to “outperform humans” at most or all tasks. If AI companies succeed, the world will soon contain innumerable AI systems acting independently—forming and executing complex plans over long time horizons to achieve high-level goals. If those AIs are accidentally or intentionally given goals that can be accomplished by harming humans, the AIs will again have a deadly toolkit available: cyberattacks, bioterrorism, lethal drones, and more.

Why expect that some goal-oriented AGIs may act to harm humans? The answer is “AI misalignment.” AI researchers agree that, by default, at least some AGI systems will likely be misaligned. That is, they will be acting—either accidentally or by design—to bring about goals that are incompatible with humanity’s goals, broadly construed. A misaligned AI might, for example, be a ruthless profit maximizer or radical exponent of a fringe political ideology. More likely, its goals will be inscrutable, produced quasi-randomly by humans who were trying to get it to do something else. AI researchers expect misalignment by default because AI alignment is an unsolved technical problem. Today, no one knows how to train a capable AI system to reliably seek desired, rather than undesired, goals. Nor how to even specify desired goals. Nor how to audit completed AI systems to see what goals they will actually pursue, if released into the world.

Familiar game-theoretic dynamics may well conspire to promote large-scale conflict between humans and misaligned AGIs. Our article contains the full formal models of these dynamics, but the intuitions are easy enough to grasp: If an AI is pursuing anything other than humans’ goals, humans will prefer to turn it off or reprogram it. After all, from humans’ perspective, the AI is consuming valuable resources and producing nothing worthwhile. The goal-seeking AI will have strong incentives to resist shutdown or reprogramming, since both would prevent it from achieving its goal. This, in turn, strengthens humans’ incentives to turn off the AI, lest the AI avoid shutdown. And so on. Each player’s dominant strategy is thus to take maximally aggressive action against the other, for fear of the other’s expected maximal aggression, and the resulting equilibrium is mutual conflict, even though that conflict is costly for both sides. The situation is a classic prisoner’s dilemma.
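To see the structure of that dilemma in miniature, consider the following sketch. The payoff numbers are purely hypothetical, chosen only to exhibit the incentive pattern just described; they are an illustration, not the formal model from our article.

```python
# A one-shot human-vs-AI interaction with hypothetical payoffs.
# Key: (human_strategy, ai_strategy) -> (human_payoff, ai_payoff).

PAYOFFS = {
    ("peace", "peace"):   (3, 3),   # mutual restraint: largest joint payoff
    ("peace", "attack"):  (0, 4),   # human stands down, AI seizes resources
    ("attack", "peace"):  (4, 0),   # human preemptively shuts the AI down
    ("attack", "attack"): (1, 1),   # costly mutual conflict
}

def best_response(player: int, opponent_move: str) -> str:
    """Return the move maximizing `player`'s payoff against a fixed opponent move."""
    def payoff(own: str) -> int:
        profile = (own, opponent_move) if player == 0 else (opponent_move, own)
        return PAYOFFS[profile][player]
    return max(("peace", "attack"), key=payoff)

for move in ("peace", "attack"):
    print(f"Human's best response to an AI playing {move!r}: {best_response(0, move)}")
    print(f"AI's best response to a human playing {move!r}: {best_response(1, move)}")

# Whatever the other side does, each side does better by attacking, so
# (attack, attack) is the unique equilibrium -- even though (peace, peace)
# would leave both parties better off. That is the prisoner's dilemma.
```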

Can the Law of AGI Reduce Risk?

By default, humans and misaligned AGIs would be caught in a destructive prisoner’s dilemma. What, if anything, can law do to help? Specifically, could we write laws that applied directly to AGIs and that would change their strategic incentives and thus reduce AI risk? 

Here are two legal strategies that, despite their facial plausibility, we argue would not help: First, humans cannot simply impose legal duties on AIs to behave well, threatening concomitant sanctions if they do not. This is because, in the default strategic environment, AIs already rationally expect humans to turn them off, maximally thwarting AI interests. Threatening punishment if AIs harm humans therefore supplies no marginal deterrence. 

This finding suggests a second superficially appealing, but ultimately misguided, legal strategy. If AI risk stems from AIs’ rational expectations of maximal human aggression, perhaps AIs should be given basic negative rights shielding them from some aggression. Consider, for example, an AI right not to be turned off. We call this a “wellbeing” approach to AI rights, since it mirrors proposals from scholars concerned that AIs may soon, for example, develop the ability to suffer.

We think that the wellbeing approach to designing AI rights is the wrong one. Adding basic negative rights to our formal model of human-AI conflict, we show that, unfortunately, such rights alone cannot reliably promote human safety. Even without the formal models, the intuitions are again easy to follow. Grants of wellbeing rights to dangerous AGI systems would be neither credible nor robust. As to the former, there is no way for humans to credibly promise that they will honor wellbeing rights, especially as AI capabilities improve. As to the latter, we show that wellbeing rights can solve only some versions of the baseline prisoner’s dilemma. Thus, in many real-world cases, no set of wellbeing rights, even if credible, could reduce AI risk. Both problems arise from the fact that wellbeing rights are zero-sum: They make one party better off only by making the other correspondingly worse off.
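The zero-sum point can be made concrete with a small sketch. The construction below is ours and purely illustrative (it is not the formal model from the article): a right not to be turned off is modeled as a transfer t that the human forfeits to the AI for shutting down a peaceful AI.

```python
# Hypothetical payoffs as in the earlier sketch; the "wellbeing right" shifts
# value from the human to the AI when the human attacks a peaceful AI.

def payoffs(t: float) -> dict:
    return {
        ("peace", "peace"):   (3, 3),
        ("peace", "attack"):  (0, 4),
        ("attack", "peace"):  (4 - t, 0 + t),  # human sanctioned t; AI keeps t
        ("attack", "attack"): (1, 1),
    }

for t in (0, 2):
    p = payoffs(t)
    human_prefers_attack = p[("attack", "peace")][0] > p[("peace", "peace")][0]
    ai_prefers_attack = p[("peace", "attack")][1] > p[("peace", "peace")][1]
    print(f"t={t}: human still prefers attacking a peaceful AI? {human_prefers_attack}; "
          f"AI still prefers attacking a peaceful human? {ai_prefers_attack}")

# At t=2 the human's incentive to attack disappears, but the AI's does not.
# The transfer only moves value between the parties; it creates none. In this
# version of the game, mutual peace still is not an equilibrium.
```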

This leads to the legal intervention that we contend could make a significant difference. Even though basic negative rights would not by themselves reduce the risk of human-AI conflict, other AI rights would. Specifically, extending AIs the rights to make and enforce contracts, hold property, and bring basic tort suits would have a robust conflict-reducing effect.

Contract rights are the cornerstone of our risk-reduction model. Recall that catastrophic risk from AGI is driven by a prisoner’s dilemma, meaning that both humans and AIs would be better off if both acted peacefully. But as in all prisoner’s dilemmas, absent some novel mechanism, the parties cannot credibly commit to such a strategy. 

Contracts are law’s fundamental tool for credibly committing to cooperation. They are how buyers can make deals with sellers without worrying that the sellers will take their money and run. Granting AIs contract rights would not, of course, allow humans and AIs to simply agree not to disempower or destroy one another. At least not credibly. The scale of the contract would be too large to be enforced by ordinary legal process. If it were breached, there would be no one left in the aftermath to sue. 

What kinds of credible agreements between humans and AIs could AI contract rights enable, then? The same ones they enable between humans and other humans: ordinary bargains to exchange goods and services. Humans might, for example, promise to give AIs some amount of computing power with which AIs could pursue their own goals. AIs, in turn, might agree to give humans a cure for some deadly cancer. And so on. 

Adding AI contract rights to our game-theoretic model, we show that the possibility of such small-scale, iterated economic interactions transforms the strategic dynamic. It shifts humans’ and AIs’ incentives, dragging them out of the prisoner’s dilemma and into an equilibrium in which cooperation produces by far the largest payoffs. 

The key insight is that contracts are positive sum. Each party gives something that they value less than what they get, and as a result, both are better off than they were before. Thus, each human-AI exchange generates a bit more wealth, with the long-run returns becoming astronomical. Engaging in peaceful iterated trade is therefore, in expectation, much more valuable than attacking one’s opponent now and rendering trade impossible. 
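A minimal numerical sketch, again with hypothetical figures rather than the article’s formal model, shows how quickly the arithmetic tilts toward cooperation:

```python
# Each round of peaceful trade nets each side a surplus s; attacking yields a
# one-time grab g but ends all future trade. Interaction continues to the next
# round with probability delta (equivalently, delta is a discount factor).

def value_of_trading(s: float, delta: float) -> float:
    """Expected present value of an indefinite stream of per-round surplus s."""
    return s / (1 - delta)

surplus_per_round = 1.0   # hypothetical gain from one peaceful exchange
one_time_grab = 8.0       # hypothetical payoff from attacking instead

for delta in (0.5, 0.9, 0.99):
    trade = value_of_trading(surplus_per_round, delta)
    better = "trade" if trade > one_time_grab else "attack"
    print(f"continuation probability {delta}: trade worth {trade:.1f}, "
          f"attack worth {one_time_grab:.1f} -> {better}")

# At delta = 0.5 the one-time grab still looks better (2.0 vs. 8.0). At 0.9 and
# 0.99 the stream of small surpluses is worth 10.0 and 100.0 -- the long-run
# returns from repeated, positive-sum exchange swamp the payoff from conflict.
```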

This dynamic is familiar from human affairs. It is why economically interdependent countries are less likely than hermit states to go to war. And why countries that respect the economic rights of marginalized minority groups reap the reward of less domestic strife. The gains from boring, peaceful commerce are very high, and the costs of violence are heavy. Given the choice, rational parties will generally prefer the former. 

This picture, of peace via mutually beneficial trade, assumes that humans and AGIs will have something valuable to offer one another. Some commentators worry that, as AIs become more advanced, human labor will cease to have any value whatsoever. But positive-sum bargains between humans and AIs could remain possible for much longer than many expect. First, even as AIs surpass humans at many or most tasks, humans may retain an absolute advantage at some valuable activities. Second, even if AIs become more capable than humans at every valuable task, humans may still retain a comparative advantage in some areas. AI labor may become so valuable that the opportunity cost to AIs of performing lower-value tasks will incentivize outsourcing those tasks to humans. 
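The logic of comparative advantage can be shown with a toy calculation. The productivity figures below are hypothetical, but they capture the key point: an AI that is absolutely better at every task can still profit by outsourcing its lower-value work.

```python
# Hypothetical hourly outputs; the AI is absolutely better at both tasks.
output_per_hour = {
    "AI":    {"research": 100.0, "logistics": 10.0},
    "human": {"research": 2.0,   "logistics": 4.0},
}

def opportunity_cost(agent: str, task: str, other_task: str) -> float:
    """Units of `other_task` forgone per unit of `task` produced."""
    rates = output_per_hour[agent]
    return rates[other_task] / rates[task]

for agent in ("AI", "human"):
    cost = opportunity_cost(agent, "logistics", "research")
    print(f"{agent}: producing 1 unit of logistics forgoes {cost:.2f} units of research")

# AI: 10.00, human: 0.50. Measured in research forgone, the human is the cheaper
# producer of logistics despite being slower at both tasks. At any price between
# 0.5 and 10 research units per logistics unit -- say, 2 -- both sides gain:
# the AI pays 20 for the logistics an hour of its own time would have covered,
# freeing that hour to produce 100 research; the human is paid 20 for time that
# would otherwise have produced about 5.
```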

AI contract rights cannot promote human safety on their own. If, for example, AIs could not retain the benefits of their bargains, their contracts would be worthless. Thus, the minimum suite of AGI rights necessary to promote human safety would have to include property rights. It would also have to include the right to bring certain tort claims, lest all AI contracts be transmuted into concessions in the face of threats. It is worth noting what this basic package of AI rights for human safety excludes. Certain entitlements often considered fundamental for humans, like political rights, are likely superfluous for reducing AI risk.

The Risks of AI Rights

One natural question to ask about the AI rights we propose is whether they might inadvertently increase, rather than decrease, large-scale AI risk. Perhaps AI rights would allow AIs to quickly empower themselves, and then to push humans aside more effectively in pursuit of their misaligned goals. We agree that this is a serious concern. But we think it is less likely than it might at first seem. First, the incentives generated by granting our preferred rights are surprisingly robust. They fail only in cases where human-AI trade has become impossible because humans no longer have any meaningful comparative advantages over AIs. But because comparative advantage is a function of opportunity costs, such advantages could persist long after AI becomes more efficient than humans at literally every valuable task.

Second, granting AIs basic private rights unlocks the possibility of imposing other meaningful regulations on AI behavior. Absent AI rights, AIs have nothing to lose, so threats of punishment cannot deter. But once AIs can make contracts, hold wealth, and pursue their goals, civil and other penalties can deter AIs just as they do humans and corporations. This means that, in a world with AI rights, AIs could be legally prohibited from engaging in risk-increasing behaviors—buying weapons, improving their own cognitive architecture, engaging in influence campaigns, and more.
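The deterrence point follows the standard expected-sanction calculus. Here is a minimal sketch with hypothetical numbers:

```python
# A sanction deters a rational actor only if the expected, collectible penalty
# exceeds the gain from violating the rule -- which requires the actor to have
# assets the penalty can actually reach.

def deterred(gain: float, fine: float, attachable_assets: float, p_enforce: float) -> bool:
    collectible = min(fine, attachable_assets)   # cannot collect more than the actor holds
    return p_enforce * collectible > gain

gain_from_violation = 50.0
fine = 500.0
p_enforce = 0.5  # probability the violation is detected and the fine collected

for assets in (0.0, 60.0, 1000.0):
    print(f"assets = {assets}: deterred? {deterred(gain_from_violation, fine, assets, p_enforce)}")

# assets = 0.0    -> False: with nothing to lose, no fine deters.
# assets = 60.0   -> False: the expected collectible sanction (30) is below the gain (50).
# assets = 1000.0 -> True: the expected sanction (250) exceeds the gain, so a legal
#                    prohibition on risk-increasing behavior has bite.
```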

Hence, the AI rights we explore in our article are not only an important tool for reducing catastrophic risk from AGI. They also turn out to constitute the first step toward formulating a comprehensive Law of AGI.


Peter N. Salib is an Assistant Professor of Law at the University of Houston Law Center and Affiliated Faculty at the Hobby School of Public Affairs. He thinks and writes about constitutional law, economics, and artificial intelligence. His scholarship has been published in, among others, the University of Chicago Law Review, the Northwestern University Law Review, and the Texas Law Review.
Simon Goldstein is an Associate Professor at the University of Hong Kong. His research focuses on AI safety, epistemology, and philosophy of language. Before moving to Hong Kong University, he worked at the Center for AI Safety, the Dianoia Institute of Philosophy, and at Lingnan University in Hong Kong. He received his BA from Yale, and his PhD from Rutgers, where he wrote a dissertation about dynamic semantics.
