
AI-Generated Voice Evidence Poses Dangers in Court

Rebecca Wexler, Sarah Barrington, Emily Cooper, Hany Farid
Monday, March 10, 2025, 10:32 AM

In the age of AI, listener authentication of voice evidence should be permissive, not mandatory.


Audio cassette tapes (Paul Seling, https://www.pexels.com/photo/cassette-tapes-in-close-up-shot-12341037/, public domain)


Gary Schildhorn received a call that no parent wants to receive. When Schildhorn picked up the phone, the voice of his panicked son told him that he had been in a car accident and was in jail. A second call, moments later, purportedly from a lawyer, gave Schildhorn instructions on how to pay the $9,000 bond. Schildhorn was preparing payment when he received a call from his real son, who was not, in fact, in jail. Schildhorn nearly fell victim to the growing trend of artificial intelligence (AI)-powered voice scams. AI-generated voices are a problem not only for fraud but also for the legal system. Indeed, accusations of AI-generated voice clones have now made their way into the courts, and the way the courts deal with audio recording evidence needs to catch up.

Under the current Federal Rules of Evidence, someone trying to introduce an audio recording of a voice can satisfy the authentication standard for admissibility merely by putting a witness on the stand who says they are familiar with the person’s voice and the recording sounds like them. Specifically, Rule 901 states that the following evidence “satisfies the requirement [for admissibility]: ... An opinion identifying a person’s voice—whether heard firsthand or through mechanical or electronic transmission or recording—based on hearing the voice at any time under circumstances that connect it with the alleged speaker.” The rule presumes that this evidence will be “sufficient to support a finding that the item is what the proponent claims it is.”

In the age of artificial intelligence, this presumption is no longer tenable. The Evidence Rulemaking Committee should amend the rules to make the enumerated examples in Rule 901(b) permissive, not mandatory. The examples should illustrate circumstances that may satisfy the authentication requirement while still leaving judges discretion to exclude an item of evidence if there is other proof that it is a fake. 

Realism of AI-Powered Voice Clones 

Over the past few years, AI-powered voice synthesis and cloning have improved at an impressive clip, culminating this past year in dramatic breakthroughs. Perhaps most striking is the ability to convincingly clone a person’s voice from as little as 30 seconds of reference audio using easily accessible and low-cost commercial services.

Indeed, a recent suite of perceptual studies highlights the current realism of voice cloning. In a large-scale online study, we asked 300 people to listen to pairs of audio clips of people speaking. We then asked them a simple question: Were these clips from the same person, or a different person? People were actually quite good at performing this discrimination when presented with audio clips of real human beings. When the two clips came from the same person, listeners correctly detected this fact with a median accuracy of 100 percent. At the same time, they were fooled only about 10 percent of the time into thinking two similar-sounding voices from different identities were the same.

The issue, however, arises when listeners hear a pair of voices comprising one real person and an AI clone of that person (generated with ElevenLabs, a voice-cloning service that is easy for anyone to use). In this case, listeners judged the real person and their AI clone to be the same person about 80 percent of the time, with one in four participants tricked by every single AI clone used in the study.

The upshot is clear: People can no longer reliably distinguish between a person’s real voice and an AI clone of it. In a second study, we also asked listeners to explicitly make a real versus AI-generated judgment on audio clips. While they performed above chance (50 percent), the average performance across listeners (64 percent) was still well below what might be desired for definitive evidence. We are not the only researchers to find evidence of this deficiency. Others have reported similar issues, although performance can vary depending on the study details. For example, two other research groups recently reported real/AI discrimination accuracy superior to our findings (although still falling short of 100 percent, with rates around 70-80 percent). However, these studies did not employ current state-of-the-art AI voice-cloning technology, which is constantly improving.
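For readers curious about the arithmetic behind these percentages, the following is a minimal sketch, in Python, of how responses in a same/different listening task of this kind might be tallied. It is not the authors’ actual analysis code, and the trial data in it are hypothetical; it simply illustrates how per-condition rates like those quoted above (how often listeners judge a pair of clips to be the “same” person) are computed.

```python
# A minimal, self-contained sketch (not the study's actual analysis code) of how
# responses in a same/different voice-discrimination task might be tallied.
# All trial data below are hypothetical and for illustration only.
from collections import defaultdict

# Each trial records the kind of pair a listener heard and their judgment.
# pair_type: "same-real"      (two clips of one real person)
#            "different-real" (clips of two different real people)
#            "real-vs-clone"  (a real person paired with their AI clone)
# response:  "same" or "different"
trials = [
    ("same-real", "same"),
    ("same-real", "same"),
    ("different-real", "different"),
    ("different-real", "same"),     # fooled by two similar-sounding real voices
    ("real-vs-clone", "same"),      # fooled by the AI clone
    ("real-vs-clone", "different"),
]

# Count how often listeners judged each pair type to be the "same" person.
counts = defaultdict(lambda: {"same": 0, "total": 0})
for pair_type, response in trials:
    counts[pair_type]["total"] += 1
    if response == "same":
        counts[pair_type]["same"] += 1

for pair_type, c in counts.items():
    rate = c["same"] / c["total"]
    print(f'{pair_type}: judged "same" on {rate:.0%} of trials')
```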

There are, of course, caveats to these conclusions because no perceptual study perfectly recreates the real-world experience pertinent to criminal law. For example, the length and quality of a recording (in terms of audio compression rates or background noise) can affect people’s ability to discern the identity and naturalness of a voice. Similarly, while Schildhorn was taken in by his scammer, people may generally be better at this task for the voices of individuals they know.

Policy Recommendation

Given these technological developments, it should not be the case that parties are entitled to introduce a voice recording to a jury merely by calling a witness to the stand who says they can identify the speaker because they are familiar with the voice. Even a witness, like Gary Schildhorn—who really thinks they know the speaker—might be wrong. Yet, under the current, mandatory version of Federal Rule of Evidence 901(b)(5), even if the party opposing that evidence were to introduce reliable forensic proof that the audio is an AI-generated fake, the rules would arguably require the judge to admit the recording. That’s ridiculous.

The Evidence Rulemaking Committee should fix this problem by adding the word “may” to Rule 901(b) so that it reads: “The following are examples only—not a complete list—of evidence that may satisfy the requirement [of authenticity]” (emphasis added). This would make every enumerated example, including authenticating a person’s voice by calling a witness to the stand who says they recognize the speaker, a permissive route to admissibility rather than a mandatory one.

Other aspects of the authentication rule need not change for AI specifically. To be sure, one might criticize the low sufficiency standard that makes it easy to admit all kinds of physical evidence as long as you have some basis to think it is authentic, the lack of a distinct reliability analysis for expert “machine-generated evidence,” or the fact that—as with evidence law generally—it is the opposing party’s burden to object in a timely fashion or forever hold their peace. Yet, if these other aspects of the rules are problematic, then they are problematic for lots of physical evidence, not just AI-generated content.

What recent perceptual studies of AI-powered voice clones do show is that a mandatory route to authentication can quickly become outdated.

Hence, for authenticating all kinds of evidence, it would be better to give judges discretion to decide on a case-by-case basis whether the party offering the evidence has made a sufficient showing that it is what they claim it is. Judges would still apply the low sufficiency standard, so they would not be substituting their judgment for that of the jury, raising the burden on parties seeking to introduce evidence, or opening the floodgates to a morass of evidentiary disputes. But the rules would no longer force judges to admit evidence when there is compelling proof that the evidence is fake.

This is not a recommendation to future-proof the law: It is a call to present-proof it.


Rebecca Wexler is the Hoessel-Armstrong Professor of Law at Berkeley Law School.
Sarah Barrington is a PhD student at the UC Berkeley School of Information.
Emily Cooper is an associate professor at the University of California, Berkeley, in the Herbert Wertheim School of Optometry & Vision Science and the Helen Wills Neuroscience Institute.
Hany Farid is a Professor at the University of California, Berkeley with a joint appointment in Electrical Engineering & Computer Sciences and the School of Information. His research focuses on digital forensics, image analysis, and human perception. He received his undergraduate degree in Computer Science and Applied Mathematics from the University of Rochester in 1989, and his Ph.D. in Computer Science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in Brain and Cognitive Sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019. He is the recipient of an Alfred P. Sloan Fellowship, a John Simon Guggenheim Fellowship, and is a Fellow of the National Academy of Inventors.
