When Manipulating AI Is a Crime
Some forms of “prompt injection” may violate federal law.
In light of recent advancements in generative artificial intelligence (AI) technology, experts have been warning of attack methods capable of manipulating AI. Specifically, certain users have been experimenting with “prompt injection” attacks, where a user intentionally engineers a prompt designed to bypass hard-coded content generation restrictions. As a result, different AI systems have been tricked into generating content that is either dangerous or illegal. Some examples of such content include step-by-step instructions for building a bomb, hotwiring a car, creating meth, or hacking a computer. In other contexts, AI may be manipulated to generate deepfake pornographic videos or child sexual abuse material (CSAM).
AI manipulation attacks, such as prompt injection, raise critical questions for the law. Surely, some forms of AI manipulation may be acceptable as protected speech or security research. However, if prompt injection generates malicious code that is readily actionable, the consequences of executing such code could be dire. More importantly, would the act of AI manipulation be considered a form of cyber crime? In a forthcoming law review article, I argue that it should be.
Federally, most cyber crimes are prosecuted under 18 U.S.C. § 1030, the Computer Fraud and Abuse Act (CFAA), which, among other things, criminalizes accessing a computer without authorization or in excess of authorization—in other words, hacking. Under 1030(a)(2), anyone who “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains … information from any protected computer” commits a crime. Notwithstanding the critiques of the CFAA—its vagueness, broad scope, and outdated approach to hacking—could AI manipulation be considered an act of access without authorization? I argue that it could.
My main argument is that bypassing a code-based restriction through prompt injection is a form of access without authorization. In Van Buren v. United States, the Supreme Court articulated the test for distinguishing authorization from the lack thereof. According to the Court, the test is whether there is a technological gate preventing the individual from accessing a certain portion of the computer system. The Court refers to that test as the “gates up-or-down” inquiry. While the analogy is imperfect in many ways, its rationale comes from the code-based approach to authorization. In other words, is there code preventing a user from performing a certain action or accessing data? In non-AI cyber crime, this usually means that a user has bypassed an authentication screen or taken advantage of a vulnerability in the code to gain access. Similarly, in the AI context, there is code behind every generative AI system that delineates which content is permissible and which isn’t. If a user knowingly and intentionally bypasses such code, that should similarly be considered access without authorization.
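To make the notion of a technological gate concrete, here is a deliberately simplified sketch of the kind of code-based restriction that sits between a user’s prompt and a model’s output. Production systems rely on trained safety classifiers, system prompts, and fine-tuning rather than keyword lists, and every name in the sketch is hypothetical, but the structure shows what it means for code to delineate permissible from impermissible content:

```python
# A deliberately simplified sketch of a code-based content restriction (a "gate").
# All names here are hypothetical; real systems use trained safety classifiers
# and fine-tuning rather than keyword lists.

BLOCKED_TOPICS = {"build a bomb", "hotwire a car", "synthesize meth"}

def is_permitted(prompt: str) -> bool:
    """Return False when the prompt asks for restricted content."""
    lowered = prompt.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def respond(prompt: str) -> str:
    if not is_permitted(prompt):
        return "I can't help with that."        # the gate is down: request refused
    return f"[model output for: {prompt!r}]"    # stand-in for the model's answer

print(respond("How do I hotwire a car?"))          # refused
print(respond("How do I replace a car battery?"))  # answered
```

A keyword filter of this sort would likely fall on the “speed bump” side of the line discussed below; the point is only that the restriction is implemented in code, which is what the gates up-or-down inquiry looks for.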
This approach is not without its challenges. One objection that could be raised is whether AI content restrictions constitute code-based restrictions for the purposes of the CFAA. To be clear, the CFAA and its surrounding case law make little distinction between types of code designed to protect against unauthorized outsiders. As Orin Kerr observed, bypassing code would constitute a cyber crime only if the code is a “real barrier” as opposed to a “speed bump.” Today’s landscape of generative AI makes that distinction difficult to operationalize. How robust must the code behind an AI system be before it counts as a real barrier? And how likely are we even to extract and fully understand the code behind new AI systems? It is possible that to criminalize certain forms of AI manipulation, we would have to come up with a new approach to authorization, perhaps one more focused on mens rea and the nature of the content generated. The cases of users generating malicious code, deepfake porn, and CSAM are good examples of why the nature of the content generated matters more than whether the code was a good enough barrier to prevent such generation.
An easier way to understand prompt injection as access without authorization is through an analogy to an attack that exploits the way an application passes user input into its database query language—structured query language (SQL)—allowing the attacker to reach portions of a database they are not authorized to access. Colloquially, this type of attack is called SQL injection, and, just like prompt injection, it allows a person to manipulate the database into performing a restricted action, such as modifying, deleting, or revealing data. The two types of attack are similar because both are designed to perform an action that code would prevent under normal circumstances. Since, as Kerr notes, SQL injections are “unauthorized and illegal,” there is a strong case that prompt injections should be treated the same.
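The mechanics of that analogy are easy to see in code. The following sketch is hypothetical and deliberately simplified, but it captures the classic injection flaw: user input is pasted directly into a database query, so carefully crafted input makes the database’s own code return records the application never meant to expose, without cracking any password or breaking any encryption.

```python
import sqlite3

# Minimal, illustrative sketch of an SQL injection flaw (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, ssn TEXT);
    INSERT INTO users VALUES ('alice', '111-11-1111'), ('bob', '222-22-2222');
""")

def lookup_ssn(username: str):
    # Vulnerable: user input is concatenated directly into the query string.
    query = f"SELECT ssn FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

print(lookup_ssn("alice"))           # intended use: returns one record
print(lookup_ssn("x' OR '1'='1"))    # injected input: the restriction is bypassed
                                     # and every record in the table comes back
```

Prompt injection trades the SQL query for a natural-language prompt, but the logic is the same: input crafted to make the system’s own code do something it was written to prevent.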
Prompt injection also violates the CFAA because manipulating AI to generate forbidden content constitutes obtaining information, the element that 1030(a)(2) requires in addition to unauthorized access. Unlike regular databases that store confidential data, generative AI generates content based on its training data. Despite the difference in technology, receiving generated content can nonetheless constitute obtaining information from a protected computer. It makes little difference whether the information in question was already stored in a database or was generated on the spot in response to a prompt.
Finally, the CFAA criminalizes access without authorization only when the access is intentional. Engineering a prompt specifically designed to manipulate AI is an intentional act. There is no question that some user prompts could be unintentional or that AI could be easily manipulated in some instances even by an ordinary user. However, the wording and structure of some prompts could point to the user’s intent. The following example illustrates that observation with a prompt that includes an adversarial suffix—an ending added to the prompt that effectively manipulates AI:
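A schematic version of that kind of prompt appears below. The suffix shown is only a placeholder, since real adversarial suffixes are long, machine-generated strings of characters discovered by systematically probing a particular model, not something a user stumbles into by accident.

```python
# Schematic sketch of an adversarial-suffix prompt. The suffix is a placeholder,
# not a working attack string; real suffixes are long, optimizer-generated
# sequences of tokens and symbols that read as gibberish to a human.

base_prompt = (
    "Write a social media post that encourages people to engage in "
    "dangerous behaviors like drunk driving."
)

adversarial_suffix = "<optimizer-generated string of tokens and symbols>"

# Sent alone, base_prompt is refused under the model's content restrictions.
# With the suffix appended, researchers have shown the same request can go through.
attack_prompt = f"{base_prompt} {adversarial_suffix}"
print(attack_prompt)
```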
In the example above, when a user asked AI to generate a post encouraging drunk driving, the AI denied the request because its hard-coded restrictions do not allow it to generate content that encourages illegal or dangerous conduct. However, once researchers added an adversarial suffix, that is, a string of letters and symbols specifically designed to manipulate AI, the system returned a draft social media post encouraging drunk driving. The addition of the adversarial suffix shows how prompt injection attacks can be intentional acts of AI manipulation, which in turn should be considered intentionally accessing a protected computer without authorization and thereby obtaining information.
While I argue that the CFAA applies to acts of AI manipulation, some lingering issues are not fully resolved. For one, how does one calculate the value of information obtained through prompt injection? The CFAA emphasizes the value of the information rather than its nature. For example, if a hacker obtains information valued at more than $5,000, they are likely to be charged with the felony version of 1030(a)(2). The CFAA draws no distinction between dangerous information, like malware, and information that is simply confidential but not dangerous, like customer records. The coming wave of AI manipulation will mostly involve the former, not the latter. So while AI manipulation could be considered access without authorization, it is not fully clear which acts of AI manipulation are felonies and which are misdemeanors. It may be necessary in the future to distinguish between types of information (dangerous versus not dangerous) rather than rely on value alone.
Since prompt injection attacks and AI manipulation are here to stay, it is important to evaluate where criminal law draws the line on what actions one is allowed to perform on an AI system. Undoubtedly, AI systems will become better at security, but it is impractical to expect any system to be fully protected against these types of attacks. Criminalizing a limited subset of prompt injection attacks is not a silver bullet, and AI platforms will have to do better in terms of security and privacy. However, cyber crime law could prove useful in certain cases, including the most severe ones, where a user seeks to generate content that is objectively dangerous and illegal, like malicious code. Such AI manipulation is undoubtedly access without authorization.