
The Lawfare Podcast: Contestability in Government AI Systems

Eugenia Lostri, Jim Dempsey, Ece Kamar, Jen Patja
Wednesday, April 3, 2024, 8:01 AM
What does meaningful contestability of AI systems look like in practice?

Published by The Lawfare Institute in cooperation with Brookings

The use of AI to make decisions about individuals raises the issue of contestability. When automated systems are used by governments to decide whether to grant or deny benefits, or to calculate medical needs, the affected person has a right to know why that decision was made and to challenge it. But what does meaningful contestability of AI systems look like in practice?

To discuss this question, Lawfare's Fellow in Technology Policy and Law Eugenia Lostri was joined by Jim Dempsey, Senior Policy Advisor at the Stanford Cyber Policy Center, and Ece Kamar, Managing Director of the AI Frontiers Lab at Microsoft. In January, they convened a workshop with stakeholders across disciplines to issue recommendations that could help governments embrace AI while enabling the contestability required by law. They talked about the challenges that the use of AI creates for contestability, how their recommendations align with recently published OMB guidelines, and how different communities can contribute to the responsible use of AI in government.

Please note that the following transcript was auto-generated and may contain errors.

Transcript

[Audio Excerpt]

Ece Kamar

There is a big responsibility here for the people who are making decisions about putting these systems into the world, and for the people who are designing these systems and deploying them in the world: there is a real human cost to not building the right processes in place for protecting the users, the people who are affected by these systems, so that they can understand how these decisions are being made and what they can do to contest them and ask the question of why.

[Main Podcast]

Eugenia Lostri

I am Eugenia Lostri, Lawfare’s Fellow in Technology Policy and Law, and this is the Lawfare Podcast, April 3rd, 2024. The use of AI to make decisions about individuals raises the issue of contestability. When automated systems are used by governments to decide whether to grant or deny benefits, or to calculate medical needs, the affected person has a right to know why that decision was made and to challenge it. But what does meaningful contestability of AI systems look like in practice?

To discuss this question I was joined by Jim Dempsey, Senior Policy Advisor at the Stanford Cyber Policy Center, and Ece Kamar, Managing Director of the AI Frontiers Lab at Microsoft. In January, they convened a workshop with stakeholders across disciplines to issue recommendations that could help governments embrace AI while enabling the contestability required by law. We talked about the challenges that the use of AI creates for contestability, how the recommendations align with recently published OMB guidelines, and how different communities can contribute to the responsible use of AI in government.

It's the Lawfare Podcast for April 3rd: Contestability in Government AI Systems.

So our conversation today is going to focus on some of the recent work that you've been doing on contestability for government AI systems. But before we get into all of that, I wanted to ask if you could start by framing this conversation that we're having. When we're using even the term AI in this conversation, what should our listener be thinking about? What are you including under the term? What are some of the applications, maybe, that they should be keeping in mind?

Ece Kamar

Sure. I'm happy to start on it. I'm an AI researcher, so I have been doing work in the AI space for the last 20 years or so. And one of the most contested questions, actually, is: what is artificial intelligence? How do we define artificial intelligence? The field of artificial intelligence has been around for at least 70 years, and there are numerous techniques that people have proposed, implemented, and experimented with over the years. But with respect to the AI systems that are in practice today, the systems we most commonly think of are systems that learn from statistical patterns that exist in data and use those statistical patterns to create outputs in the world. Those outputs may include language, for example, asking a question to an AI system and getting an answer back. And in some cases, that response could be a decision: posing a scenario to the system and asking for the system's best prediction, based on the statistical patterns the system has seen in the past, of what the most likely outcome or most likely intervention should be for that particular scenario.

So when we are thinking about AI systems, we are thinking about systems that recognize language, respond in language, recognize faces, do image recognition, and so forth. And we are also thinking about systems that can reflect on many human-centered scenarios we run into in our daily lives, like people asking for loans, banks making decisions about loans, or decisions in admissions or even in legal contexts. So there are many applications of these decision-support AI systems, which can actually make predictions about outcomes. And just as in many other domains of our lives, we are also seeing that these decision support systems, AI systems, are making their way into many government use cases as well. And that has been the focus of our study in this work.

Jim Dempsey

Building on what Ece said, the first thing to recognize is that AI is not just one thing. It's a set of technologies, as Ece suggested. Secondly, it's been around for a long time. And third, in our work looking at this question of due process or contestability, we actually didn't limit ourselves. We didn't use the word “AI” or the phrase “artificial intelligence” very much, partly because we didn't want to get hung up on the definitional question that Ece was referring to. We talked about advanced automated systems. Governments and the private sector have clearly been using a variety of techniques for many years to make decisions about individuals. Ece mentioned loan decisions, benefits decisions, employment decisions, where some data was used and some technology was applied to that data. And so, in thinking particularly about this question of due process or contestability, when government adopts technology to make decisions about people, it's important not to get too hung up on the strict definition of what is AI and what isn't AI. We're talking about using automated technologies to draw inferences from data or to make decisions about individuals.

Eugenia Lostri

Let's be a bit more specific here about some of those decisions that you were concerned about. What are the uses of advanced automated systems by governments that particularly concerned you, or that you think raise questions of contestability or due process?

Jim Dempsey

In December of last year, 2023, the GAO, the Government Accountability Office, issued a report in which it found that 23 agencies had reported a total of 1,200 current and planned uses of artificial intelligence, using some definition of artificial intelligence. I think 1,200 was probably low as a number. HHS, the Department of Health and Human Services, alone reported 163 uses in its inventory. Many of those, not that they're non-controversial, are not rights-impacting. NOAA, the National Oceanic and Atmospheric Administration, uses AI to analyze weather patterns. The U.S. Patent and Trademark Office uses AI to find relevant documents, known as prior art, which is a critical step in the patenting process to see if there's anything out there previously that had talked about the technique someone is claiming a patent for. The Department of Education uses natural language processing, one of the technologies that Ece referred to, to answer questions about financial aid; in a two-year period, they generated 11 million user messages, using a natural language processing system to answer common questions about financial assistance.

Any of those may pose issues around fairness or discrimination. But really where we were focused was around veterans’ benefits, loan decisions where the government plays a role in loan guarantees, criminal justice sentencing decisions, employment decisions, health disability benefits, calculating a person's health disability. All of those are subject in some way to, let's say, algorithm-based decision-making. Again, whether you would call it AI in its current use is maybe a debatable question. But those are the kinds of areas where a person will face individual, real-world consequences in terms of employment, education, access to housing, access to veterans’ benefits, or disability benefits. Those were the cases we were focused on.

Ece Kamar

And just to emphasize a few points here: these advanced decision support systems, decision-making systems, actually have a pretty large number of use cases, applicable in both private use and government use, and it is not possible to enumerate all of them. But by looking at the use cases we see, we can tell that some of them touch on people's lives and freedoms, and they matter for people's lives significantly. And because the use cases for such systems are so varied, the people who are affected by the decisions of these systems may not be aware that such systems are in action as these processes play out in their lives.

So one of the important considerations here is actually transparency that such systems are being used in these cases. Just to speak to awareness around government use cases, or consequential use cases: many of you may know a particular case that came to both academic and press attention in 2016, around recidivism use cases for decision support systems, where at the time predictive machine learning software was being used in U.S. courts to create risk profiles for defendants, making a prediction about a defendant's probability of recidivism. And there are studies about these recidivism predictors that anyone can go on the internet and read. ProPublica was the first source that actually did a detailed investigation, in particular into the fairness of such models with respect to demographic dimensions like race and gender.

But the important point I'd like to make about this case is that although the academic world, especially the FATE community in computer science (the fairness, accountability, transparency, and ethics community), was shocked by this use case, these systems had actually been active in U.S. courts for a decade at that point. So it is very important to create awareness that there are consequential use cases of these advanced decision support systems in the world, including government use cases. And where they are used and how they are impacting people's lives may not be transparent or easily observable from a user perspective. And that's something we should take some action about.

Eugenia Lostri

So what is it about these systems? What are the characteristics of them, and of the way they arrive at a decision, that make them concerning? What are some of the challenges that you face when government adopts the use of these, let's just call them AI systems or AI tools, for ease?

Ece Kamar

I can start by listing some of the challenges, and Jim can continue. First, one of the challenges with these advanced decision support systems that are built on statistical learning technology is that they learn from historical data, and they are trained on statistical patterns. Which means that to train many of these systems, what developers tap into is historical data or exemplary cases, most likely wherever they can find data at scale. We are talking about thousands and tens of thousands of previous cases that these systems can tap into to learn the major statistical patterns from. And this historical data, if these models are learning from it, actually carries a lot of patterns reflecting human biases. It includes a lot of historical issues and fairness issues as well. And systems that take this historical data as the golden truth, as the truth of the world, as a blueprint of how these decisions should be made, are actually learning all of those patterns that may not be ideal to learn from.

Also, the datasets that systems learn from may not be comprehensive. They may not include some of the really important characteristics of the cases or the individuals, and the systems will still need to correlate their decisions with the characteristics that are available in the dataset. And what these models end up learning are not, in most cases, causal relationships that really explain how those outcomes come about in the world. Most often, these models learn correlation patterns that only correlate the observable facts of the world, or the characteristics of the individual, with the outcome. And decisions made based on those correlated patterns may not be reliable for certain groups, sometimes because those characteristics are not included in the datasets, and sometimes because there may not be enough detail, enough data, about certain demographic groups in those datasets.
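Kamar's point here, that models trained on historical decisions pick up correlations rather than causes and reproduce whatever bias the data carries, can be made concrete with a minimal sketch. Everything below is hypothetical and purely illustrative: invented loan-approval data, an invented group attribute, and a simple scikit-learn model, not a description of any system discussed in this episode.

```python
# Hypothetical sketch: a model trained on biased historical decisions
# learns the bias as a correlation and carries it into new predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# One genuinely relevant feature ("ability to repay") and one group
# attribute that should be irrelevant to the decision.
ability = rng.normal(size=n)
group = rng.binomial(1, 0.2, size=n)  # 1 = smaller group, 20% of the data

# Historical approvals were partly driven by group membership (the bias).
historical_approval = (ability - 1.0 * group + rng.normal(scale=0.5, size=n)) > 0

X = np.column_stack([ability, group])
model = LogisticRegression().fit(X, historical_approval)

# Two applicants with identical ability, differing only in group membership,
# get different predicted approval probabilities from the learned model.
p_majority = model.predict_proba([[0.5, 0]])[0, 1]
p_minority = model.predict_proba([[0.5, 1]])[0, 1]
print(f"approval probability, majority-group applicant: {p_majority:.2f}")
print(f"approval probability, minority-group applicant: {p_minority:.2f}")
```

The model has no notion of why past approvals looked the way they did; it simply finds that group membership correlates with the outcome and keeps using it.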

And finally, when we are talking about these statistical models, we also accept that what these models produce are stochastic decisions. They will learn patterns that may apply to the majority, but there are going to be outlier cases, the one tenth of the group that this decision will not apply to. And it is very important to be aware that every decision of a statistical decision support system is going to be a stochastic one.

Jim Dempsey

What Ece is saying on that last point is critical, which is: if you have a probabilistic system, it is by definition going to be wrong some of the time. And the system may be pretty darn good, but what about that percentage of people for whom it's making a wrong decision? They need to have the ability to challenge the decision and say, “Hey, I don't fit the pattern, but I do deserve a loan or these benefits.”

And a critical point here is that everybody up and down the chain of command inside a government agency (for a minute, let's just focus on government agencies, which are obviously subject to constitutional principles of due process) needs to appreciate the kinds of things that Ece was describing in terms of what can go wrong with these systems. Even if these systems promise, and even if they can deliver, huge benefits in terms of efficiency, cost reduction, perhaps even greater accuracy than a human in some contexts. Whether it's the secretary of the department, or the head of the program for veterans’ benefits or health disability benefits or Medicare and Medicaid services, all the way down to the individual case officer, case agent, or case worker interfacing individually with the person who's applying for benefits, all of them need to understand at some level how these systems work and how they can go wrong. The last thing we want is what's known as automation bias, which is: oh, you're not eligible for veterans’ benefits. Why? The machine tells you; the computer says you're not eligible.

So up and down the levels, suitable to their position in the hierarchy, officials making decisions that, hey, we're going to automate, we're going to embrace AI, we're going to use AI, need to understand what can go wrong. And then they need to build in the checks and balances, what as a lawyer I would call due process. They need to build in the protections to make sure that individuals are not being unfairly treated.

Eugenia Lostri

So in this quest to embrace AI, we talk a lot about some of the principles around the use of AI, and you've mentioned some of these before, but terms that we hear fairly often include fairness, accountability, transparency. And I would actually be quite interested in hearing from both of you, because you come from two fairly distinct fields: What do those terms mean for each of your communities? Is there an alignment when it comes to using those words?

Jim Dempsey

It's my own view, actually, that those terms are overused and misunderstood. Fairness, accountability, safety. And in fact, my participation in this project was generated by my feeling that I'm over accountability or safety of AI as terms. Because these terms mean so much, they cover so much territory, that they don't really give you actionable insights or a path forward.

And I think, in a way, this is what the administration has been saying in its AI policy. Every company that makes AI is committed to accountability and safety. Everybody has already signed on to these general principles of AI safety or responsible AI or accountable AI. Okay, fine. It's come time to move on and figure out what those concepts, those high-level principles, mean in practice, which means drill down on bias, drill down on the cybersecurity implications of AI (neither of which were issues that we took on), drill down on due process and the right to contest a decision made about you. That's what we wanted to drill down on, because the time has come to put flesh on those bones, to spell out in actionable detail what that means. And that's what we tried to do in this project. And I think it's time to do that across the board, whether you're talking about anti-bias and discrimination or safety; again, a term like safety can mean almost anything. We need to get more concrete, and we're trying to get more concrete.

Ece Kamar

I can answer that question from a computer science perspective, as a computer scientist myself. These notions that we are talking about are really about having machine learning AI systems, these advanced automated systems, functioning in the real world. If you go to many machine learning papers and look into how machine learning systems, AI systems, are being evaluated, you are going to see aggregated numbers computed on some test set, where people are going to say that according to this test set, this model I have developed, this system I have developed, is correct 90 percent of the time, or 95 percent of the time.

What we are seeing, when we get into the real world, is that those numbers take on meaning in the context of the use case those systems are deployed in. Let's take a system that is 85 percent accurate. What are the cases where the system fails? What is happening in that 15 percent? When that 15 percent corresponds to some demographic group, when there are concentrated errors happening for a certain demographic group, then we cannot talk about these as only statistical errors. Those errors have a particular meaning in our world, and that meaning is that the system is not fair across the different demographic groups that it is working with.
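A minimal sketch of the disaggregation Kamar describes, again with entirely hypothetical data and a simple scikit-learn model, shows how a system can score well in aggregate while failing almost everyone in a small group:

```python
# Hypothetical sketch: aggregate accuracy can hide errors concentrated
# in an underrepresented group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 20_000
group = rng.binomial(1, 0.05, size=n)  # only 5% of records come from group 1

# The feature-outcome relationship is reversed for group 1, but the model
# is trained on the feature alone and mostly sees group 0.
x = rng.normal(size=n)
y = np.where(group == 0, x > 0, x < 0)

X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    x.reshape(-1, 1), y, group, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print(f"aggregate accuracy:       {accuracy_score(y_test, pred):.2f}")
print(f"accuracy on group 0 only: {accuracy_score(y_test[g_test == 0], pred[g_test == 0]):.2f}")
print(f"accuracy on group 1 only: {accuracy_score(y_test[g_test == 1], pred[g_test == 1]):.2f}")
```

The aggregate number looks strong only because the smaller group is a small share of the test set; reporting accuracy per group is what makes the concentrated errors visible.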

Or when we take a system and we see that system is not working very well under some lighting conditions, let's say it's a vision system, that would be a reliability consideration because I don't expect that system to work equally well across different use cases and environmental conditions. Or when we take a system and look into how that system is making decisions, is a human able to look into that system and understand the logic behind those decisions? Or is it just this box where something is going in and something is going out, but I have no idea how the system is coming to that conclusion? That concern is a concern of transparency.

The computer science community has been working on adding meaning to how we should think about the different considerations around how a system functions and how well it functions. I agree with Jim that listing these principles out is really not the same as understanding how they work in practice. And there's actually been a lot of work in the last decade, both in academia and industry, trying to put meat on the bone. But we also understand that there's so much to be done, both on the fundamental technologies and also in really diving into the use cases, understanding the considerations, and having a multidisciplinary dialogue where we are not looking at it only from a legal perspective, like Jim, or a computer science perspective, like me, but can actually bring these diverse perspectives together and ground them in particular use cases. That could be one way to bring more meat to the bone. That's one of the things that we have been trying to do in this project: really bring that multidisciplinary group together to reflect on these upcoming important issues.

Eugenia Lostri

So drilling down on contestability then, how does contestability connect to these principles? How do you see it affecting the conversation that we're having?

Jim Dempsey

So again, I look at it from a legal standpoint, and particularly in the governmental context. Everything the government does is subject to the Constitution, and the Constitution has in it the due process principle, which is that if a person is going to be deprived of life, liberty, or property, they are entitled to due process.

And what does due process mean? One, it means notice of what is happening to you and why. And second, a meaningful opportunity to challenge the outcome or decision. And both of those are implicated here. Again, the right to a government-guaranteed loan, the right to veterans’ benefits, or health disability benefits, or other governmental benefits, education, including by state-run, government-run schools, the criminal justice system, of course: in all of these contexts that right to due process applies. But what we had heard 10 years ago was, “Oh, the technology is so complicated. We can't actually explain it. Oh, the machine learning or some of these more advanced techniques, we can't explain it.” That argument, by the way, was made even about some of the relatively simplistic algorithms that had been developed in the welfare context and other contexts.

And one of the things that I wanted to do with this report was to push back at that. And as Ece said, we brought together lawyers, policymakers, advocates, and computer scientists, experts in machine learning and artificial intelligence. And to me, one of the takeaways was that not all AI is equal in this regard. Some systems, in fact, may be inscrutable, or it may be very difficult to explain their outcomes. Others can be explainable in a way that gives that individual the first principle of due process, which is the notice and explanation of what went on and why. Why did you get your health benefits reduced? Why were you denied veterans’ benefits? Why was a sentence of a certain number of years recommended for you?

This is a matter of conscious design choices. Technology is now at a state where it can be designed consciously to meet the needs of notice and a meaningful right to object. And that's the key choice, or one of the key choices, that government policymakers have to make, and that people up and down the chain of command in agencies need to make, when they're investigating artificial intelligence. Make the choice that understandability and contestability are part of the design criteria for the system. If you do, you can develop a system that does protect the rights of the individual. If you don't do that in the design phase, then you may be stuck with a system that is not explainable and does not give the individual the rights they're entitled to.
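One illustrative way to read the design-choice point: if the underlying model is interpretable, the notice itself can be generated from the model, giving the person something concrete to contest. The sketch below is entirely hypothetical, with invented feature names, synthetic data, and a simple scikit-learn linear model; it shows one possible shape of a per-decision explanation, not how any agency system actually works.

```python
# Hypothetical sketch: a per-decision explanation from an interpretable
# (linear) eligibility model, decomposed into per-feature contributions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000
features = ["documented_hours_of_care", "number_of_diagnoses", "prior_denials"]

X = rng.normal(size=(n, 3))
# Invented historical eligibility decisions used only for training the sketch.
y = (1.5 * X[:, 0] + 0.8 * X[:, 1] - 1.0 * X[:, 2]
     + rng.normal(scale=0.5, size=n)) > 0
model = LogisticRegression().fit(X, y)

def explain(applicant):
    """Turn the model's score into a notice the applicant can contest."""
    contributions = model.coef_[0] * applicant
    decision = "eligible" if model.predict([applicant])[0] else "not eligible"
    lines = [f"Decision: {decision}"]
    for name, value, contrib in zip(features, applicant, contributions):
        direction = "raised" if contrib > 0 else "lowered"
        lines.append(f"  {name} = {value:+.1f} {direction} your score by {abs(contrib):.2f}")
    return "\n".join(lines)

print(explain(np.array([-0.4, 1.2, 2.0])))
```

A notice built this way tells the person which recorded facts drove the outcome, so a challenge can target the specific input that is wrong or missing, which is the due process function Dempsey describes.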

So it's a matter of design choices. Don't throw up your hands and say, “Oh, it's AI. Forget about due process. Oh, it's AI. We can't understand it. We'll never... there's no AI that's explainable.” That's what I heard. Ece, right? You heard that as well, I think.

Ece Kamar

One of the most interesting things in our studies was listening to the people who are affected by such systems, and hearing about the difficult road in front of them in understanding how these decisions are being made and how to make the system work for them. And I think the most touching part of our meetings for me was hearing the human cost that is associated with not taking the responsibility that Jim is talking about, where people basically say, “Oh, it's AI, I don't have control over what these systems do.” There is a big responsibility here for the people who are making decisions about putting these systems into the world, and for the people who are designing these systems and deploying them in the world: there is a real human cost to not building the right processes in place for protecting the users, the people who are affected by these systems, so that they can understand how these decisions are being made and what they can do to contest them and ask the question of why.

So I really agree with Jim that there is this responsibility that people need to understand, and they also need to understand that there are certain decisions they can take. For example, what type of decision system do they want to deploy into the world? Are they okay with a system where they cannot look in and understand the logic? If they are choosing a particular system where that logic is not clear, are they implementing additional methodologies on top that could create a proxy explanation for the logic of that system? How are they educating the people who are going to be using the recommended decisions of these systems? Are those people aware of the limitations of these decision support systems? Do they have the right means to push back when those decisions are wrong? That's another set of design decisions that the developers and the deployers of these systems should be aware of. And in other cases, let's say when the decision is negative for a user, are there ways to explain and help the user figure out what they can do to get that decision reversed?

So what we are trying to say here is that what matters, at least as much as putting that decision support system into the world, is how you build the right processes and decision-making around it, such that you are taking responsibility for the decisions that those systems are supporting, you are supporting the people who will be using those systems in the way they make decisions, and you are enabling the people who are affected by the decisions of these systems to understand them and take the right actions, to be able to say, “You are wrong. I want my decision to be re-evaluated,” so that these systems can actually do some good in the world. For example, adding more efficiency into the system or helping people get to better decisions, but not putting people into these really tough spots where they are just trapped with an unfortunate decision that these systems may have made about them.

Eugenia Lostri

So I think this is the perfect entryway into talking in a little more detail about the workshop that you put together and all the different steps that you've taken in this work. You have a set of recommendations that you put forth precisely addressing this question. I would like to hear a little bit more about what you were hoping to achieve through the workshop, and about some of the recommendations, maybe not all of them, but the ones you think are low-hanging fruit, or the most important ones to achieve. What would you highlight out of all of this work that you've been doing?

Jim Dempsey

One way to do that is to match our recommendations to a recent memorandum from the Office of Management and Budget, which was issued to give direction to federal government agencies looking at the adoption of artificial intelligence.

And the administration has clearly had two goals, two principles, in mind. One, they want to promote the adoption of artificial intelligence within government. Secondly, they want to avoid the negative impacts that could come from poorly designed or poorly implemented artificial intelligence. So from the outset, the administration has said that artificial intelligence is both potentially very beneficial and simultaneously very risky.

And the president had issued an executive order last year, which launched the process, which is still ongoing in many respects. But one element of that was the issuance within the past week or so, by the Office of Management and Budget, of a directive to federal agencies. And it started where we did in some respects, by identifying that there are certain rights-affecting uses of artificial intelligence. And it identified them in substantial detail, and they're the ones we've been talking about: criminal justice, benefits, education, healthcare, language translation for government services, et cetera, image and facial recognition, other biometric systems.

And for example, it had one recommendation which was, in a way, partly based upon a recommendation we issued. We held our workshop in January of this year, and very quickly during the workshop, working with computer scientists, lawyers, advocates, civil society, academics, government officials, and corporate representatives, we developed a set of recommendations and immediately made them available to the administration. One of them, for example, was the importance of testing. You need to test your AI products in the real world, on real people, and see: are they working the way you thought they would? We started from the premise that due process is a constitutional right, and we recognized that there were challenges in ensuring that right when you use these automated systems. And so one of the recommendations, which the administration reflected in its OMB memo of last week, was: test for performance in the real world. Ece, do you want to cover a couple of others, or we can bounce back and forth?

Ece Kamar

Yeah. So what our recommendations were trying to do was think about the diverse set of stakeholders who both are affected by the outcomes of these systems and also have a role to play in making decisions about deploying such systems in real-world settings. We recognize that these stakeholders are diverse, and we cannot expect them to have the same background in terms of where this technology is. And what we were trying to do with these recommendations is create a blueprint of the general steps and considerations these stakeholders should think about as they're considering applying such systems in their applications. We tried to stay at a very general level so that, regardless of the use case, these recommendations would have some applicability.

One of the recommendations that matters a lot is an impact assessment at the design stage of an application of a system. What we mean by an impact assessment is that even before you make any decisions about which system to purchase, or how to train it, or where the data is coming from, you can actually think about a potential use case and imagine what can go wrong with it, and put yourself in the shoes of the people who will be affected by these systems. What can really go wrong with it? That is such an important part of the envisioning process when it comes to these risks, because even before you attach yourself to any technology, you are empathizing with the end user. You are putting yourself in their shoes. And thinking about what can go wrong becomes a template for the further evaluation and testing that Jim talked about. If we are not aware of what kinds of risks may arise for which groups of people, then we don't know what to evaluate for afterwards. So this recommendation, of being able to define the risks, imagine them, and then, once we have the system, quantifiably test for them in the real world, is a pipeline that any person making decisions about the deployment of these systems should really think carefully about.

Another recommendation we have is, for any use case, who are the experts you should have in the room so that you can make the right assessments about the risks? Sometimes we may not have the right people in the room, and some of the important insights may go unnoticed. Another observation is that people are making decisions about purchasing such services. How can they actually list the requirements that really matter for their applications, be informed about them from the earliest days, and create criteria, create a call for such systems, that actually take these considerations into account? For example, are there certain domains where you want an interpretable system by default, where you can understand the logic of the system because that will give you important benefits? Then that should be in the call for such a system, in the requirements for such systems. So we are trying to reach the people who are making decisions about purchasing these systems, we are providing some guidance for the people who are deploying them in the world, and we are also trying to create awareness that there are these additional important steps that we should all be taking as we put such systems into consequential scenarios in the world.

Jim Dempsey

One thing that came through loud and clear in our workshop, and then came through loud and clear in our recommendations, and I see it in the OMB memo, is that the individuals who will be directly affected by an advanced automated system must be involved or represented at all stages of its development and use. This can't just be the lawyers and the technologists and the program managers. As Ece said, at the workshop we had a woman who was very severely disabled, who needed very intensive medical care, and who had been fighting against her state, which administers the federal program, to get the healthcare that she needed, the home care that she needed, so she didn't have to be put into an institution, which of course is always more expensive anyhow. And just hearing from her, and hearing from her lawyer, about the struggles that she had gone through in responding to this automated system (it wasn't even really AI, but it was a relatively crude, yet still hard to understand, system) brought home to us the importance of talking to the people who are affected, because they have a perspective that can be very unique. And this point came out, I'm pleased to see, in the administration directive, the OMB memo from last week: agencies, the memo says, must consult with affected communities.

Another thing that we talked about was the importance of accessibility in terms of language. This again came through pretty loud and clear in the OMB memo: notices must be understandable, and they must be provided in appropriate languages. It's interesting, if you go back to the Supreme Court cases: for decades, the Supreme Court has made this concept of understandability part of the constitutional principle of due process. Government processes can be very complex, and government officials and advocates and lawyers and the public interest groups, the advocacy groups, can all talk in this very closed lingo. And the Supreme Court has said, and we shouldn't forget it, “Your notice must be able to be understood by the people who are actually affected.” And again, this came through, I think, in the OMB directive.

We can talk about other ones. Maybe both Ece and I should talk about the training and sort of talent issues here because doing all of this is not going to be easy.

Eugenia Lostri

Yeah, actually, that's where we were headed. But I did have a question, and I think, Ece, I would like to start with you. If the question of contestability is a matter of design choices made very early on, what are some of the levers that can be used to incentivize developers to make those choices from the get-go? How do we get them to make the right choice?

Ece Kamar

First of all, developers of such systems should be informed about their responsibility in their choices and how those choices will affect the end users of such systems. One of the things we observe in software engineering in general is that there can be different people involved in different stages of developing a product and putting it into the world. And sometimes you may not have all of them thinking together about what they are building and how it's actually going to affect people at the end. There might be this pipeline of people building independent components without understanding how they're going to come together and then do something in the world.

One of the things that's very important, and that's why I led with the first recommendation, the impact assessment, is that if you have a good understanding of what your system is going to do in the world and whom it's going to impact, that gives meaning to the different people who are involved in making decisions about these systems and in developing and deploying them in the world: there are certain requirements they should be bringing in for their own part, so that the best outcomes can happen at the end for the whole system. So that awareness, taking that responsibility and understanding that there is a set of choices different people are going to be making, and that together those choices are going to influence the impact those systems have in the world, is very important.

But I also understand that these can come across as very abstract concepts, and that's why it is important to have concrete guidance, recommendations, and sometimes regulations in place, so that they can create a blueprint for the developers of such systems to follow. For example, I work at Microsoft. I've been involved in the responsible AI practices at Microsoft for the last decade. And we observed the same thing happening in industry: people meant to build good, responsible products and put them in the world, but they didn't know where to start. They didn't know what the action space or the choice space was that they could consider to make sure that their systems had contestability built in, or even what, let's say, contestability meant for their systems.

So one of the things we did over the years was create a responsible AI standard at Microsoft that's now active for every AI product we develop. And that standard does not give a detailed step-by-step process for everybody to follow, but it gives a blueprint. It gives something to start from. And by also providing some resources on what techniques you can implement for contestability, what the vocabulary around it is, what the design choices are, and what the tradeoffs are, that can become a best-practice guideline for any system that deals with contestability. We can provide hands-on tools and guidance to the people who are developing these systems so that they can chart their way through it, and they can do that from a point of being very informed about the role they are playing in their users’ lives. And that's one of the things that we've been trying to do with this work as well, focusing on the government use cases.

Jim Dempsey

And a lot of this is going to come down to the nuts and bolts of government procurement, which is not in the OMB memo. I hope that there's going to be a follow-on effort by OMB because, of course, many of these systems have been developed, and will be developed, by private sector contractors. And the very nitty-gritty stuff around how the request for proposals, the solicitation, and then the contract are written to make contestability by design a contract requirement really matters, because if it's not in the contract, the developer is not going to build it. If it is in the contract, the government has an enforcement mechanism over the contractor. And the kinds of details that Ece was talking about matter here: again, if you don't tell the contractor that contestability is a requirement of the system, if you don't tell them that understandability is a requirement of the system, they're not going to build it.

This seemingly very obscure, off-to-the-side set of concerns around how you write government solicitations and government contracts, that's really where the rubber is going to hit the road. And that's where, agency by agency, the contracting officers need, again, to understand what can go wrong and what the different flavors of artificial intelligence are, and build their contracts to make contestability a contractual obligation.

Eugenia Lostri

Now, you've both been mentioning the need for awareness, the need for understanding so many different things at so many different levels. And this goes back, Jim, to the point that you were making before about how much talent, how much AI understanding, we're going to need in order to actually make these changes and ensure that we have safe, secure, trustworthy, accountable, transparent, all the nice things, automated systems. So, a two-part question: Do we have enough AI talent? And if not, how do we get there?

Jim Dempsey

Simple answer to question number one: no, nobody does. Private industry doesn't. And in fact, industry and government are competing with each other for talent. Second, how do we get there? Obviously, we want to encourage the educational system to do a better job of producing people with the necessary skills. But there's one recommendation that we made in our report, again, not reflected, unfortunately, in the OMB memo, but I think it still is viable and needs attention. The government is going to have to train its own AI talent, to some extent on the technical side, but even more on this question of getting the policy people and the technology folks to understand each other and to be able to communicate. And we recommended that the government should establish a centralized training facility. The executive order and the OMB memo put this on an agency-by-agency basis, put the responsibility for talent development on each agency, which is fine at some level. But I don't think that any agency is going to have the capability to do that sufficiently.

We have government-wide law enforcement training. We have government-wide language training. We have one or two centralized facilities for training people in the languages that our diplomats and other overseas personnel require. We have a centralized training facility for polygraphers. We have a centralized training facility on cryptography. I think we need to have a government AI governance training institute.

Ece Kamar

I would like to add one more thing. When we are talking about AI talent here, what we need is not necessarily only AI talent, not just adding more technical expertise into the loop. Of course, we need the technical expertise to be able to bring the right techniques into the picture, so that we have informed decisions about which technology we are implementing and when. At the same time, the kinds of considerations we are talking about are not only technical considerations; they are sociotechnical considerations. So the right talent to include in the loop, to be able to address the issues we are talking about in the right way, is not necessarily people with a stronger technical perspective; it is actually the people who can bring this important mixture of societal understanding and technical expertise, to be able to bridge the gap between the use cases and the actual deployment of these systems in the world.

And this type of talent is even harder to come by than general AI talent, because we are talking about people who are trained in a different way, who really understand the social considerations, the societal considerations, as well as the technical ones. So there is a call here also to the AI education that happens at institutions: when we are thinking about training the next generation of AI talent, we should not only be thinking about the algorithms, the models, the technical choices and their consequences, but we should train a generation that has a good understanding of the societal considerations, of the impact that AI, that these sophisticated decision support systems, are creating in the world, and that really understands that intersection well.

The other thing I'll add is that there is some training needed here for the people who are not building these systems, but making decisions around them. The people who are going to be using such systems need a certain education. For example, imagine there is a judge who will be using some advanced automated decision support system. What is the training they should be receiving so that they are affected by automation bias as little as possible? The people who are going to be making the procurement decisions around these systems, or writing the call for such systems, need to have some understanding of the technical and societal considerations as well. The amount of technical knowledge they need is not going to be the same as for somebody building these systems.

But one of the things that's really important is thinking about the large set of stakeholders that will matter for getting this right, and about what each role, each stakeholder, needs to know so that they can collectively make the right decisions.

Jim Dempsey

Amen to all that.

Eugenia Lostri

Absolutely. I think that's actually a pretty good point to end on. This conversation really spanned a lot of different issues, different topics, but I think this call to make sure that we're focusing on the intersection of all these issues is really a good wrapping point. So thank you both for joining me today.

Ece Kamar

Thank you.

Jim Dempsey

Thanks so much.

Eugenia Lostri

The Lawfare Podcast is produced in cooperation with the Brookings Institution. You can get an ad-free version of this and other Lawfare podcasts by becoming a Lawfare material supporter at patreon.com/lawfare. You'll also get access to special events and other content available only to our supporters. Please rate and review us wherever you get your podcasts.

Look out for our other podcasts, including Rational Security, Chatter, Allies, and The Aftermath, our latest Lawfare Presents podcast series on the government's response to January 6. Check out our written work at lawfaremedia.org.

The podcast is edited by Jen Patja Howell, and your audio engineer this episode was Noam Osband of Goat Rodeo. Our music is performed by Sophia Yan. As always, thank you for listening.


Eugenia Lostri is Lawfare's Fellow in Technology Policy and Law. Prior to joining Lawfare, she was an Associate Fellow at the Center for Strategic and International Studies (CSIS). She also worked for the Argentinian Secretariat for Strategic Affairs, and the City of Buenos Aires’ Undersecretary for International and Institutional Relations. She holds a law degree from the Universidad Católica Argentina, and an LLM in International Law from The Fletcher School of Law and Diplomacy.
Jim Dempsey is a lecturer at the UC Berkeley Law School and a senior policy advisor at the Stanford Program on Geopolitics, Technology and Governance. From 2012-2017, he served as a member of the Privacy and Civil Liberties Oversight Board. He is the co-author of Cybersecurity Law Fundamentals (IAPP, 2024).
Ece Kamar is the Managing Director of the AI Frontiers Lab, where she leads research and development towards pushing the frontiers of AI capabilities.
Jen Patja is the editor and producer of The Lawfare Podcast and Rational Security. She currently serves as the Co-Executive Director of Virginia Civics, a nonprofit organization that empowers the next generation of leaders in Virginia by promoting constitutional literacy, critical thinking, and civic engagement. She is the former Deputy Director of the Robert H. Smith Center for the Constitution at James Madison's Montpelier and has been a freelance editor for over 20 years.
