The international legal community saw a paradigm redefined at the beginning of this year. Judges in Colombia, for the first time, openly used generative artificial intelligence (GAI) to author parts of their judicial opinions. In the first case, a judge used GAI to help research and draft a holding on a petitioner's request for a waiver of medical fees for the treatment of her child with autism. In the second case, the court addressed how to conduct a virtual court appearance in the metaverse while citing GAI-based research.
These courtroom applications are the first two cases in which GAI has been used in a context that parses and interprets the law and carries the weight of state authority. These developments lead us to consider the new harms that emerging technology like GAI and extended reality (or XR, the combination of virtual reality and augmented reality) can usher into courts. The second case in Colombia, for instance, is one of just a handful in which national courts have tried to use the metaverse as a venue for hearings. On Sept. 23, 2022, a Chinese court held a pair of car accident trials in an XR venue, complete with online trial observers and digital displays of evidence and court transcripts, in a deliberate effort to promote the construction of "smart courts." It's no coincidence that both Colombian and Chinese courts used AI and XR in traffic-related cases: These cases tend to be relatively "easy" to the extent that applicable laws provide clear solutions and the evidence is not complex.
XR and GAI are tools that may prove helpful in certain limited circumstances. However, as society is in the early phases of mass adoption of this new technology, we should be careful about allowing applications where the consequences of mistakes could be dire. Today's GAI does not have the capacity to deeply understand contextual information and facts, which are the foundation of judicial discretion. XR still does not allow for high-resolution depictions of faces and body language, which are key to evaluating a witness's credibility. Therefore, while GAI and XR may be appropriate to help understand technical issues or optimize the administrative tasks that help courts run, they should not yet be applied in adjudications where significant life, liberty, and property interests are at stake.
Metaverse Courtrooms and GAI Opinions
In Colombia's first case using GAI, a minor diagnosed with autism spectrum disorder was prescribed therapy by his treating physician. His mother filed an acción de tutela, a constitutional remedy, claiming she lacked the financial means to cover the costs and transportation expenses necessary for her child's therapy. The trial judge ruled in her favor. An appellate judge, Juan Manuel Padilla, affirmed the lower court's decision and used ChatGPT to elaborate on the scope of the acción de tutela as a means of seeking access to the health care system. He argued that the recently enacted Law 2213/22 allows judges to use AI systems like ChatGPT as a tool to expedite judicial decision-making.
The second Colombian case, Siett v. Policia Nacional, involved disputed traffic tickets worth around $170,000. Both sides asked the judge to hold a virtual court appearance in the metaverse. In the hearing, which was livestreamed on YouTube, Judge Maria Victoria Quiñones described safeguards for a metaverse court, including a fallback to video conferencing in the event of a technology failure. She put specific prompts to ChatGPT, such as "what is an avatar" and "how to confirm whether an avatar is authentic." The court then elaborated on the legal basis for ChatGPT's suggestions and used them to set out the procedural rules for how the virtual court appearance would be conducted. Given that Law 2213/22 grants judges discretion to decide whether a trial unfolds in person or virtually, Quiñones argued that holding a virtual court appearance in XR, rather than over a standard video conferencing tool, could help her ensure lively interaction with the parties and the evidence involved in the trial.
Colombia has a civil law tradition with a hybrid system: The higher courts' precedents have controlling authority over constitutional and statutory matters. Nonetheless, lower courts do not always follow precedents because of a still-developing stare decisis culture. Judges usually don't have access to up-to-date case repositories, and when they do, it's very hard for them to identify controlling precedents given the sheer volume of cases (imagine four higher courts creating precedents on a daily basis). For example, the Constitutional Court reviewed around 600,000 acciones de tutela in 2022. According to the Colombian nonprofit organization Corporación Excelencia para la Justicia, Colombia's court system had over 1.8 million pending cases in December 2020. Needless to say, Colombian judges could use an extra hand, even if it is digitally powered.
Cautionary Tales About XR
Because some courts have opted for a metaverse courtroom, advocates need to understand how virtual reality hardware and the metaverse operate differently from the internet or even Zoom. The human brain interprets experiences in XR much as it does actual reality. Yet court proceedings often turn on highly sensitive sensory experiences that today's XR hardware is incapable of conveying.
Humans make judgments based on impressions of others' expressions and movements. Mouths and eyes are particularly important for interpreting whether someone is trustworthy or a threat. In avatars, the face is the last step in crossing the uncanny valley. This is why some virtual humans or digital resurrections of actors, like Carrie Fisher's posthumous Princess Leia, still feel "off," flat, or just creepy in an almost imperceptible way.
Currently, most XR avatars are cartoonish, with limited options for personalization. To avoid the uncanny valley, research suggests, a character's facial expressions must match its tone of voice. Similarly, an avatar's body movements must be responsive and reflect the user's intended emotional state. Special attention must also be paid to the avatar's face, because its features must convey the nuances of emotion. For example, a poorly modeled mouth could make an avatar appear to be wearing a fake smile, or to be aggressive, when the user is actually happy.
Until avatars can render the animations around the eyes, the intricate movement of the muscles around the forehead and mouth, and the subtle motions of the body, XR may lead users to presume trustworthiness or untrustworthiness based on technical limitations rather than the veracity of the witness.
Some of the challenges that video conferencing brought to courts, such as safeguarding fairness, the presumption of innocence, and the right to confront one's accuser, will also be present in XR. The Brennan Center's studies of Zoom trials and video testimony have found that virtual appearances lead to more negative sentencing outcomes, more unfavorable impressions of child witnesses testifying over video chat, and fewer defendants availing themselves of additional helpful resources in immigration court. Furthermore, high-end XR hardware with better rendering and image capture is still quite expensive and not even available in certain markets. Unless web-based XR is used or high-end hardware is made available to all parties, access to justice may be further limited by virtual courtrooms.
Overall, a judge or jury's potential negative reaction to avatars is a new avenue for introducing bias. This may be crucial for the use of XR in virtual court appearances, since the judicial process sometimes involves a psychological and emotional journey in which nonverbal language is key to adjudicating a dispute. While this could be mitigated somewhat with embedded video conferencing in XR courtrooms, if people's instinctual responses to XR avatars are negative, or if harsher judicial outcomes result from appearances in virtual courts, then XR trials risk creating new forms of societal harm.
Cautionary Tales About AI
Earlier experiments using purpose-built legal AI to inform courts have gone terribly wrong and impacted citizens in unjust ways. For example, an AI-based criminal risk assessment algorithm that provided sentencing guidance for U.S. judges was found to be biased against minorities. An article in the MIT Technology Review explained how the algorithm worked:
Risk assessment tools are designed to do one thing: take in the details of a defendant’s profile and spit out a recidivism score—a single number estimating the likelihood that he or she will reoffend. A judge then factors that score into a myriad of decisions that can determine what type of rehabilitation services particular defendants should receive, whether they should be held in jail before trial, and how severe their sentences should be. A low score paves the way for a kinder fate. A high score does precisely the opposite.
Advocates of these tools argued that, if one could predict whether an offender would commit future crimes, penal systems could allocate their resources more effectively. They theorized that data-driven recommendations, removed from human prejudice, could reduce bias. However, the underlying problem was the data set used to create the tools: Past criminal behavior does not predict future offenses for any given individual.
More concretely, the main failure of these types of predictive algorithms is that they entrench prejudice and mask it with data. AI identifies patterns in data sets. However, as over 100 civil rights organizations pointed out in a letter condemning pretrial risk assessments, this type of data analysis is correlational, not causal, especially when based on historical data. In practice, this means the mistakes of the past are given an opportunity to resurface anew.
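To make that critique concrete, the following is a minimal, hypothetical sketch in Python of the kind of scoring logic such tools implement. The feature names and weights are our own inventions for illustration, not those of any deployed system, but the structure mirrors the description above: a defendant's profile goes in, a single number comes out.

```python
# Hypothetical sketch of a pretrial risk assessment score.
# Feature names and weights are invented for illustration only.

def risk_score(profile: dict) -> float:
    # In real tools, weights like these are fit to historical arrest records.
    # Because arrests reflect where and whom police have policed in the past,
    # the score encodes those patterns: it is correlational, not causal.
    weights = {
        "prior_arrests": 0.6,
        "age_under_25": 0.3,
        "unstable_housing": 0.2,
    }
    return sum(weights[k] * profile.get(k, 0) for k in weights)

defendant = {"prior_arrests": 3, "age_under_25": 1, "unstable_housing": 0}
print(round(risk_score(defendant), 2))  # 2.1 -- a single number a judge may weigh heavily
```

Because the weights are derived from historical data, the score reproduces whatever patterns, including biased policing patterns, shaped that data every time the function is called.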
The need for consistency in precedent-based jurisdictions may suggest, on its face, that AI is well suited to identifying patterns. But outside of criminal risk assessments, GAI further muddies the waters. Currently, judges may research with online databases to construct their holdings. Some may even use Google as a research tool. But GAI is fundamentally different. Unlike AI solutions developed specifically for legal uses, GAI was created for, and has at least initially been used for, general purposes. There is a strong possibility that users will get different results from different GAI models, different training data, and differently worded prompts. Randomness and uniqueness are appreciated when performing creative tasks like generating digital art or writing a poem. But while we see evidence of the beauty of prompt craft in artistic and creative applications, GAI risks potentially harmful outcomes if it is not treated with due caution as a subjectively driven tool for parsing a data set. Stated another way, it's a fallacy to treat GAI outputs as an objective means to inform jurisprudence. It's less tabula rasa, and more caveat emptor.
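That variability is easy to demonstrate. Below is a minimal sketch assuming the OpenAI Python SDK (v1.x) and an API key in the environment; the model name and the question are illustrative only. Asking the same question twice at a nonzero sampling temperature can produce answers that differ in substance, not just phrasing.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# An illustrative legal question; any change in phrasing can shift the answer too.
question = "Is an acción de tutela the proper remedy to compel coverage of a minor's therapy?"

for attempt in range(2):
    # With temperature > 0 the model samples its output, so repeated calls
    # with the identical prompt may return materially different answers.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        temperature=1.0,
    )
    print(f"--- answer {attempt + 1} ---")
    print(response.choices[0].message.content)
```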
Courts will also need to decide whether applying GAI to judicial procedures, like criminal or civil procedure, is akin to an exercise of discretion. Is it akin to using an online search engine? Is it akin to using a dictionary? Should the analysis differ if, as in the Colombian traffic ticket case, GAI advises on judicial process or a definition rather than on an interpretation of facts?
A procedural call is not perfunctory if the decision could impact a litigant's life, liberty, or property interests. Overall, it is this impact that makes GAI a risky tool to apply to procedural law: Procedural calls can affect a citizen's access to justice. If judges are able to decide whether a case is admissible, whether an action is sanctionable, or whether subjective criteria written into a statute or rule apply, we should be wary of applying GAI.
Therefore, based on the need for nuance in judicial reasoning, and the interests of justice, court systems should not allow judges to apply GAI to interpret substantive law or judicial procedures. They should constrain its use to limited administrative circumstances or as one of several tools to help courts understand technical questions.
In Its Own Words
After writing this article, we asked ChatGPT two questions: "Should courts hold virtual court appearances in the metaverse?" and "Would you trust a decision about someone's life, liberty, or property to an AI system?" Reviewing the terms of service raised larger ethical questions for us: "You may not use the Services in a way that infringes, misappropriates or violates any person's rights." Given our assessment of potential legal harms, this raises the question of whether ChatGPT should refuse to generate legally oriented advice at all.
The GAI agreed with our calculus, including what we believed was ultimately the most important consideration in the matter, though it declined to offer an opinion of its own. To the question about AI, it responded:
As an AI language model, I do not have personal opinions, but I can provide information based on the current understanding and views of experts in the field.
Currently, most experts in the AI industry and the wider community of ethicists and legal scholars believe that AI systems should not be solely trusted with decisions that have significant impacts on someone’s life, liberty, or property. AI systems, even advanced ones like me, are still limited in their ability to understand and interpret human values, emotions, and context. This can result in decisions that are unjust, biased, or simply incorrect.
Moreover, AI systems can also be susceptible to manipulation and errors, which can lead to unintended consequences. Additionally, the lack of transparency and accountability in some AI systems makes it difficult to determine why a particular decision was made, making it challenging to hold the system responsible in the event of any harm.
Therefore, it is generally agreed that decisions regarding someone’s life, liberty, or property should be made through a human-led process that takes into account ethical and legal principles, rather than relying solely on AI systems.
Appropriate Limited Applications
Nothing the authors have said here should be taken to argue that societies should replace judges. Justice is human and must remain that way. Our view is aligned with what Ken Goldberg calls "multiplicity," in which "systems of diverse combinations of humans and machines work together to innovate and respond rapidly to complex problems."
As computer scientists work to overcome GAI's limited contextual comprehension, and engineers push toward more realistic XR avatars, judges may still be able to use these systems in limited ways. Courts may use new technology to understand technical issues. Technology can help optimize administrative tasks that are mechanical or repetitive in nature but critical to the administration of justice. Scheduling and transcription, for instance, consume a court system's resources but do not deal with the fundamentals of adjudication. Furthermore, courts need transparency and expediency to ensure defendants' right to a speedy trial and other procedural protections. In this way, used outside of substantive analysis, GAI and XR may yet be tools that help courts adhere to the rule of law.
Despite the somewhat panicked moments of social fascination with GAI and XR, we must determine these technologies' proper application to governance by turning to how the systems actually work. XR profoundly impacts our bodies and minds, unlike any other computing system. GAI may continue to improve, but it is technically impossible to assemble a complete data set containing solutions to every case that might arise. Understanding the distinctions between physical and digital justice will necessarily center on the specific affordances of emergent technologies, and on how these differences impact citizens around the world.