The Problem of Deepfakes and AI-Generated Evidence: Is it time to revise the rules of evidence? – Part One

September 19, 2024

Ralph Losey. Published September 19, 2024.

On April 19, 2024, the Advisory Committee on Evidence Rules for federal courts faced a critical question: Does AI-generated evidence, including deepfakes, demand new rules? The Committee’s surprising answer: “not yet.” Was that the right call? Will they change their minds when they meet again in November, right after the elections?

Part One analyzes the various rule change proposals. Chief among them is the proposal by Judge Paul Grimm (retired) and Professor Maura Grossman, who are well known to all legal tech readers. Several other interesting proposals were considered and discussed. You will hear a nerd’s inside view of the key driving facts at play here: the danger of deepfakes, the power of audio-video evidence, jury prejudice and the Liar’s Dividend. Part One also talks about why the Evidence Rules Committee chose not to act and why you should care.

Part Two will complete the story and look at what comes next with the meeting of November 8, 2024. It will also include a discussion of a second, slightly revised proposal by Paul Grimm and Maura Grossman that they just submitted and the latest article by Paul Grimm, Maura Grossman and six other experts: Deepfakes in Court: How Judges Can Proactively Manage Alleged AI-Generated Material in National Security Cases. They are all trying, once again, to push the Committee into action. Let us hope they succeed.

Summary of the Evidence Committee’s Decision and the Leadership of its Official Reporter

The Committee, under the strong leadership of its official Reporter for the last twenty-eight years, Daniel J. Capra, considered multiple proposals to amend the Rules of Evidence, but rejected them all. Professor Capra cited the need for further development. For now, courts must manage the significant new challenges of AI with existing rules.

The key segment of the Committee’s work is the 26-page memorandum found at Tab 1-A of the 358-page agenda book. It was written by Professor Daniel J. Capra, Fordham University School of Law and Adjunct Professor at Columbia Law. Dan Capra is a man almost my age, very powerful and respected in academic and judicial circles. He is a true legend in the fields of evidence, legal ethics and education, but he is no nerd. His comments and the transcript of his interaction with two of the top tech-nerds in law, Judge Paul Grimm (retired) and Professor Maura Grossman, make clear that Professor Capra lacks hands-on experience and deep understanding of generative AI.

That is a handicap to his leadership of the Committee on the AI issues. His knowledge is theoretical only, and AI is just one of many, many topics that he reads about. He does not teach AI and the law, as both Grimm and Grossman do. This may explain why he wanted to just wait things out, again. He recommended, and the Committee agreed, apparently with no dissenters, to reject the concerns of almost all of the hands-on nerds, including all of the legal experts proposing rule changes. They all warn that generative AI and deepfakes endanger our evidence-based system of justice. They may even make it impossible to protect our upcoming election from deepfake interference. Daniel Capra gives some consideration to the danger, but thinks the concerns are overblown and the Committee should continue to study and defer any action.

Evaluation of the Pending Danger of Deepfakes

For authority that the dangers of deepfakes are overblown, and so no rule changes are necessary, Professor Capra cites two articles. Professor Capra’s Memorandum to the Committee at pgs. 25-26 (pgs. 38-39 of 358). The first, which is unpersuasive, to say the least, is a 2019 article in The Verge: Deepfake Propaganda is not a Real Problem, THE VERGE (Mar. 15, 2019). The article was written by Russell Brandom, who claims expertise on “the web, the culture, the law, the movies, and whatever else seems interesting.”

The second article is better: Riana Pfefferkorn, Deepfakes in the Courtroom, 29 Public Interest Law Journal 245, 259 (2020). Still, it was written in 2020 and so is now way out of date. My research discovered that Riana Pfefferkorn has published a much more recent paper pertaining to deepfakes, Addressing Computer-Generated Child Sex Abuse Imagery: Legal Framework and Policy Implications (Lawfare, February 2024). In the Introduction at page 2 of this well-written paper she says:

Given the current pace of technological advancement in the field of generative ML, it will soon become significantly easier to generate images that are indistinguishable from actual photographic images depicting the sexual abuse of real children.

For Ms. Pfefferkorn, deepfakes are now a very real and urgent problem. At page 25 of the paper she asserts: “There is an urgent need, exacerbated by the breakneck pace of advancements in machine learning, for Congress to invest in solving this technical challenge.”

Professor Capra and the Committee see no “urgent need” to act. That is in part because of their belief that new technology will emerge (or already exists) that is able to detect deepfakes, and so this problem will just go away. Professor Capra has one expert to support that view, Grant Fredericks, the president of Forensic Video Solutions. I looked at the company website and see no claims to development or use of any new technologies. Capra relies on the vendor’s promises to detect fake videos and keep them out of evidence, “both because they can be discovered using the advanced tools of his (Fredericks) trade and because the video’s proponent would be unable to answer basic questions to authenticate it (who created the video, when, and with what technology).” Professor Capra’s Memorandum to the Committee at pg. 26 (pg. 39 of 358).

Capra’s memorandum to the Committee at first discusses why GenAI fraud detection is so difficult. He explains the cat and mouse competition between image generation software makers and fraud detection software companies. Oddly enough, his explanation seems correct to me, and so appears to impeach his later conclusion and the opinion of his expert, Fredericks. Here is the part of Capra’s memorandum that I agree with:

Generally speaking, there is an arms race between deepfake technology and the technology that can be employed to detect deepfakes. . . . any time new software is developed to detect fakes, deepfake creators can use that to their advantage in their discriminator models. A New York Times report reviewed some of the currently available programs that try to detect deepfakes. The programs varied in their accuracy. None were accurate 100 percent of the time.

Memorandum to the Committee at pg. 17 (pg. 30 of 358).

Professor Capra supports his statement that “none were accurate 100 percent of the time” by citing a NYT article, Another Side of the A.I. Boom: Detecting What A.I. Makes (NYT, May 19, 2023). I read the article and it states that there are now more than a dozen companies offering tools to identify whether something was made with artificial intelligence, including Sensity AI, Optic, Reality Defender and FakeCatcher. The article repeats Professor Capra’s arms race scenario, but adds how the detector software always lags behind. That is common in cybersecurity too, where the defender is always at a disadvantage. Here is a quote from the NYT article:

Detection tools inherently lag behind the generative technology they are trying to detect. By the time a defense system is able to recognize the work of a new chatbot or image generator, like Google Bard or Midjourney, developers are already coming up with a new iteration that can evade that defense. The situation has been described as an arms race or a virus-antivirus relationship where one begets the other, over and over.

That has always been my understanding too, which is why I cannot believe that new technology is around the corner to finally make detection foolproof or that Grant Fredericks has a magic potion. I think it is more likely that the spy versus spy race will continue and uncertainty will be with us for a long time. Still, I sincerely hope that Professor Capra is right, and the fake image dangers are overstated. That’s my hope, but reason and science tell me that’s a risky assumption and we should mitigate our risks by making some modest revisions to the rules now. I would start with the two short proposals of Grimm and Grossman (as slightly revised in September 2024 and explained in Part Two).

Professor Capra’s Discussion of the Proposed Rule Amendments

There were four rule change proposals before the Committee in April 2024: one by Professor Andrea Roth of the University of California, Berkeley, School of Law, a second by Professor Rebecca Delfino of Loyola Law School and a third by Judge Paul Grimm (retired) and Professor Maura Grossman, already well known to most of my readers. I omit discussion here of a fourth proposal by John LaMonaga in the interests of time, but you can learn about it in Professor Capra’s Memorandum to the Committee at pgs. 23-25 (pgs. 36-38 of 358). Also see John P. LaMonaga, A Break from Reality: Modernizing Authentication Standards for Digital Video Evidence in the Era of Deepfakes, 69 Am. U.L. Rev. 1945, 1984 (2020).

Professor Andrea Roth’s Rule Proposals

Professor Roth’s suggestions are nerdy, interesting and forward thinking. Her suggestions are found in Professor Capra’s Memorandum to the Committee at pgs. 10-13 (pgs. 23-26 of 358) and Capra’s critical comments on the proposals follow at pgs. 13-16 (pgs. 26-29 of 358). I urge interested readers to check out her proposals for themselves. Capra’s comments seem a bit overly critical and I look forward to hearing more from her in the future.

Here is Capra’s comment on one of her proposals, to add a new, independent subdivision to Rule 702, Testimony by Expert Witnesses:

The proposal addresses what could be thought to be a gap in the rules. Expert witnesses must satisfy reliability requirements for their opinions, but it is a stretch, to say the least, to call machine learning output an “opinion of an expert witness.”

Memorandum to the Committee at pg. 11 (pg. 24 of 358).

Oh really? A stretch, to say the least. Obviously Capra is not familiar with my work, and that of many others in AI, on the use of generative AI personas as experts. See, e.g., Panel of AI Experts for Lawyers; and Panel of Experts for Everyone About Anything. Also see: Du, Li, Torralba, Tenenbaum and Mordatch, Improving Factuality and Reasoning in Language Models through Multiagent Debate, (5/23/23).

For me, Andrea Roth’s proposals are not a stretch, to say the least, but common sense based on my everyday use of generative AI.

Andrea Roth also suggests that Rule 806, Attacking and Supporting the Declarant’s Credibility, be amended to allow opponents to “impeach” machine output in the same way as they would impeach hearsay testimony from a human witness. Professor Capra of course criticizes that too, but this time is more kind, saying at page 13 of his memo:

The goal here is to treat machine learning — which is thinking like a human — the same way that a human declarant may be treated. Thought must be given to whether all the forms of impeachment are properly applicable to machine learning. . . . The question is whether an improper signal is given by applying 806 wholesale to machine related evidence, when in fact not all the forms of impeachment are workable as applied to machines. That said, assuming that some AI-related rule is necessary, it seems like a good idea, eventually, to have a rule addressing the permitted forms of impeachment of machine learning evidence.

I thought Andrea Roth’s suggestion was a good one. I routinely cross-examine AI on their outputs and opinions. It is an essential prompt engineering skill to make sure their opinions are reliable and understand the sources of their opinions.

Due to concerns over the length of this article I must defer further discussion of Professor Andrea Roth’s work and proposals for another day.

Professor Rebecca Delfino’s Proposal to Remove Juries From Deepfake Authenticity Findings

Professor Rebecca Delfino of Loyola Law School is a member of the Committee’s expert panel. She is very concerned about the dangers of the powerful emotional impact of audiovisuals on jurors and the costs involved in authenticity determinations. Her recent writings on these issues include: Deepfakes on Trial: A Call to Expand the Trial Judge’s Gatekeeping Role To Protect Legal Proceedings from Technological Fakery, 74 HASTINGS L.J. 293 (2023); The Deepfake Defense—Exploring the Limits of the Law and Ethical Norms in Protecting Legal Proceedings from Lying Lawyers, Loyola Law School, 84 Ohio St. L.J., Issue 5 1068 [2024]; Pay-To-Play: Access To Justice In The Era Of AI And Deepfakes, 55 Seton Hall L.Rev., Book 3, __ (forthcoming 2025) (Abstract: “The introduction of deepfake and AI evidence in legal proceedings will trigger a failure of the adversarial system because the law currently offers no effective solution to secure access to justice to pay for this evidence for those who lack resources.”). Professor Rebecca Delfino argues that the danger of deepfakes demands that the judge decide authenticity, not the jury.

Sub-Issue on Jury Determinations and the Psychological Impact of Deepfakes

I am inclined to agree with Professor Delfino. The important oral presentation of Paul Grimm and Maura Grossman to the Committee shows that they do too. We have a transcript of that by Fordham Law Review, Daniel J. Capra, Deepfakes Reach the Advisory Committee on Evidence Rules, 92 Fordham L.R. 2491 (2024) at pgs. 2421-2437.

Paul and Maura make a formidable team of presenters, including several notable moments where Maura shows Capra and his Committee a few deepfakes she made. In the first she put Paul Grimm’s head on Dan Capra’s body, and vice versa, which caused Dan to quip “I think you lose in that trade, Paul.” Then she asked the panel to close their eyes and listen to what turned out to be a fake audio of President Biden directing the Treasury Department to make payment of $10,000 to Daniel Capra. Id. at pgs. 2427-2427.

I thought this was a great ploy. Maura then told the Committee she made it in seconds using free software on the internet and that with more work it would sound exactly like the President. Id. at 2426-2427. Professor Capra, who has been stung before by surprise audios, did not seem amused and his ultimate negative recommendations show he was not persuaded.

Here are excerpts of the transcript of the next section of their presentation to the Committee.

PROF. GROSSMAN. Because there are two problems that these deepfakes and that generative AI cause. One is we’re moving into a world where none of us are going to be able to tell what is real from not real evidence—which of these videos are real, which of these aren’t. And I’m very worried about the cynicism and the attitude that people are going to have if they can’t trust a single thing anymore because I can’t use any of my senses to tell reality.

And the other is what they call the liar’s dividend, is why not doubt everything, even if it’s in fact real, because now I can say, “How do you know it’s not a deepfake?”, and we saw a lot of that in the January 6 cases. Some of the defendants said, “That wasn’t me there” or “How do you know it was me?” Elon Musk used that defense already. So you’re going to have both problems: one where it really is fake, and now every case going to require an expert; and the other where it really is real evidence, and you don’t want to become so cynical that you don’t believe any of it.

Deepfakes Reach the Advisory Committee on Evidence Rules, supra at pgs. 2427-2428.

To quote an NPR article on the “Liar’s Dividend:”

When we entered this age of deepfakes, anybody can deny reality. … That is the classic liar’s dividend.

The liar’s dividend is a term coined by law professors Bobby Chesney and Danielle Citron in a 2018 paper laying out the challenges deepfakes present to privacy, democracy, and national security. The idea is, as people become more aware of how easy it is to fake audio and video, bad actors can weaponize that skepticism. “Put simply: a skeptical public will be primed to doubt the authenticity of real audio and video evidence,” Chesney and Citron wrote.

Shannon Bond, People are trying to claim real videos are deepfakes. The courts are not amused (NPR, 5/8/23).

Back to the transcript of the presentation of Grossman and Grimm to the Committee: Judge Grimm went on to explain why, under the current rules, the jury may often have to make the final determination of authenticity. They emphasize that even if the jury decides it is inauthentic, the jury will still be tainted by the process, as they cannot unsee what they have seen. Instructions from a judge to disregard the video seen will be ineffective.

JUDGE GRIMM: Now there’s one monkey wrench in the machinery: When you’re dealing with authentication, you’re dealing with conditional relevance if there’s a challenge to whether or not the evidence is authentic. And so, if you’re going to have a factual situation where one side comes in and says, “This is the voice recording on my voicemail, this is the threatening message that was left on my voicemail, that’s Bill, I’ve known Bill for 10 years, I am familiar with Bill’s voice, that is plausible evidence from which a reasonable factfinder could find that it was Bill.”

If Bill comes in and says, “That was left at 12:02 PM last Saturday, at 12:02 PM I have five witnesses who will testify that I was at some other place doing something else where I couldn’t possibly have left that,” that is plausible evidence that it was not Bill.

And when that occurs, the judge doesn’t make the final determination under Rule 104(a). The jury does. And that’s a concern because the jury gets both versions now. It gets the plausible version that it is; it gets the plausible version that it’s not. The jury has to resolve that factual dispute before they know whether they can listen to that voicemail and take it into consideration as Bill’s voice in determining the outcome of the case.

PROF. GROSSMAN: Can I add just one thing? Two studies you should know about. One is jurors are 650 percent more likely to believe evidence if it’s audiovisual, so if that comes in and they see it or hear it, they are way more likely to believe it.211 (Rebecca A. Delfino, Deepfakes on Trial: A Call to Expand the Trial Judge’s Gatekeeping Role to Protect Legal Proceedings from Technological Fakery, 74 HASTINGS L.J. 293, 311 fn.101–02 (2023)).

And number two, there are studies that show that a group of you could play a card game. I could show you a video of the card game, and in my video it would be a deepfake, and I would have one of you cheating. Half of you would be willing to swear to an affidavit that you actually saw the cheating even though you didn’t because that video—that audio/video, the deepfake stuff—is so powerful as evidence that it almost changes perception.212 (See Wade, Green & Nash, Can Fabricated Evidence Induce False Eyewitness Testimony?, 24 APPLIED COGNITIVE PSYCH. 899 (2010)).

CHAIR SCHILTZ: But why would judges be any more resistant to the power of this than jurors?

JUDGE GRIMM: Well, for the same reason that we believe that in a bench trial that the judge is going to be able to distinguish between the admissible versus the non-admissible.

CHAIR SCHILTZ: I know, but it is often fictional, right? There are certain things that I really am no better at than a juror is, like telling a real picture from an unreal picture, or deciding which of these two witnesses to believe—between the witness who says, “That’s his voice,” and the witness who said, “It couldn’t have been me.” Why am I any better at that than a juror?

JUDGE GRIMM: You might be better than a juror because you, as the judicial officer, can have it set up so that you have a hearing beforehand, which is a hearing on admissibility that the jury is not going to hear; and you have the witnesses come in, and you hear them; or you have a certificate under Rule 902(13). Also, you will be a repeat player.

PROF. GROSSMAN: Right. And you would at least know the questions to ask: How was this algorithm trained? Was it tested? What was it tested on? Who did the testing? Were they arm’s length? What’s the error rate?

JUDGE GRIMM: And order the discovery that the other side can have to be able to have the opportunity to challenge it by bringing that in.

CHAIR SCHILTZ: Yes, I get that part.

Deepfakes Reach the Advisory Committee on Evidence Rules, supra at pgs. 2427-2428.

The Chair, Hon. Patrick J. Schiltz, asks good questions here and understands the issue. Anyone should be far more comfortable having a judge, especially one like Judge Schiltz, making the hard calls instead of a room of randomly called jurors. There is no question in my mind that judges are far better qualified than jurors to make these determinations. All three experts, Paul Grimm, Maura Grossman and Rebecca Delfino, were making that point.

Back to Professor Rebecca Delfino’s Proposal

Here is Professor Capra’s explanation to the Committee of how Professor Delfino’s proposed rule changes would work. Unfortunately I have not found any argument from her on her proposal, just Capra’s explanation, and he ultimately rejected it.

Professor Rebecca Delfino argues that the danger of deepfakes demands that the judge decide authenticity, not the jury. She contends that “[c]ountering juror skepticism and doubt over the authenticity of audiovisual images in the era of fake news and deepfakes calls for reallocating the fact finding authority to determine the authenticity of audiovisual evidence.” She contends that jurors cannot be trusted to fairly analyze whether a video is a deepfake, because deepfakes appear to be genuine, and “seeing is believing.” Professor Delfino suggests that Rule 901 should be amended to add a new subdivision (c), which would provide:

901(c). Notwithstanding subdivision (a), to satisfy the requirement of authenticating or identifying an item of audiovisual evidence, the proponent must produce evidence that the item is what the proponent claims it is in accordance with subdivision (b). The court must decide any question about whether the evidence is admissible.

She explains that the new Rule 901(c) “would relocate the authenticity of digital audiovisual evidence from Rule 104(b) to the category of relevancy in Rule 104(a)” and would “expand the gatekeeping function of the court by assigning the responsibility of deciding authenticity issues solely to the judge.”

The proposed rule would operate as follows: After the pretrial hearing to determine the authenticity of the evidence, if the court finds that the item is more likely than not authentic, the court admits the evidence. The court would instruct the jury that it must accept as authentic the evidence that the court has determined is genuine. The court would also instruct the jury not to doubt the authenticity, simply because of the existence of deepfakes. This new rule would take the . . .

Memorandum to the Committee at pgs. 22-23 (pgs. 35-36 of 358).

This proposal sounds feasible to me. It could help reduce the costs of expert battles and counter the Liar’s Dividend and CSI Effect. Professor Capra made a few helpful comments as to how Professor Delfino’s language would benefit from a few minor changes. But those are moot points because he respectfully declined to endorse the proposal, noting that: “Given the presence of deepfakes in society, it may well be that jurors will do their own assessment, regardless of the instruction.” He seems to miss the point of minimizing the psychological impact on jurors by keeping deepfake videos and audios out of the jury room.

Paul Grimm and Maura Grossman’s Two Rule Proposals

Two rule change proposals were made in early 2024 by Paul Grimm and Maura Grossman. (They were revised slightly and resubmitted in September 2024 as explained in Part Two of this article.) Paul and Maura are both well known to my readers as progressive leaders in law and technology. They have been working on these evidentiary issues for years. See, e.g., The GPTJudge: Justice in a Generative AI World, 23 Duke Law & Technology Review 1-34 (2023).

They were invited to present their proposals to the Committee to modify Rule 901(b)(9) for AI evidence and add a new Rule 901(c) for “Deepfake Evidence.” The transcript of their presentation was referred to previously. Deepfakes Reach the Advisory Committee on Evidence Rules, 92 Fordham L.R. 2491 (2024) at pgs. 2421-2437. I recommend you read this in full.

Here are the two rule changes Paul and Maura proposed:

901(b) Examples. The following are examples only—not a complete list—of evidence that satisfies the requirement [of Rule 901(a)]:
(9) Evidence about a Process or System. For an item generated by a process or system:
(A) evidence describing it and showing that it produces [deleted: an accurate] a valid and reliable result; and
(B) if the proponent concedes that the item was generated by artificial intelligence, additional evidence that:
(i) describes the software or program that was used; and
(ii) shows that it produced valid and reliable results in this instance.

Proposed New Rule 901(c) to address “Deepfakes”:

901(c): Potentially Fabricated or Altered Electronic Evidence. If a party challenging the authenticity of computer-generated or other electronic evidence demonstrates to the court that it is more likely than not either fabricated, or altered in whole or in part, the evidence is admissible only if the proponent demonstrates that its probative value outweighs its prejudicial effect on the party challenging the evidence.
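For readers who think in code, the two-step structure of this proposed rule can be sketched as a small Python function. This is only an illustration under loud assumptions: the numeric scores, the 0.5 threshold standing in for “more likely than not,” and every name in it (Challenge, proposed_901c_outcome) are hypothetical and appear nowhere in the proposal. Courts weigh these factors qualitatively, not with numbers.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    # All three fields are hypothetical stand-ins for a judge's qualitative findings.
    likelihood_fabricated: float  # 0.0 to 1.0: court's view that the item is fabricated or altered
    probative_value: float        # arbitrary illustrative scale
    prejudicial_effect: float     # same arbitrary scale


def proposed_901c_outcome(c: Challenge) -> str:
    """Walk through the two steps of the proposed Rule 901(c) quoted above."""
    # Step 1: the challenger must show the item is more likely than not
    # fabricated or altered. If not, the proposed 901(c) is never triggered.
    if c.likelihood_fabricated <= 0.5:
        return "901(c) not triggered"
    # Step 2: the burden shifts to the proponent, who must show that
    # probative value outweighs prejudicial effect.
    if c.probative_value > c.prejudicial_effect:
        return "admissible"
    return "excluded"


print(proposed_901c_outcome(Challenge(0.3, 5.0, 9.0)))  # 901(c) not triggered
print(proposed_901c_outcome(Challenge(0.7, 9.0, 5.0)))  # admissible
print(proposed_901c_outcome(Challenge(0.7, 5.0, 9.0)))  # excluded
```

The point of the sketch is the ordering: the balancing test in step two is reached only after the challenger carries the burden in step one.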

As you can see, their proposed new rule 901(c) makes it clear that a judge may take the jury out of the “fake or real” determination in close questions, and in so doing take away most of the potential prejudicial impact upon jurors. The burden of possible unconscious prejudice and emotional impact from viewing inadmissible deepfake media would be borne solely by the judge. As discussed, the judge is better trained for that and will have the benefit of pretrial hearings and expert testimony. The jury retains its traditional power over all other determinations of justiciable facts. Note that this proposal does not go as far as Professor Delfino’s in taking determinations away from the jury and expanding the gatekeeper role of the judge. More on 901(c) in general will follow, but first the proposed revisions to Rule 901(b)(9).

Accuracy v. Reliability and Validity

Professor Capra killed both of the Grimm and Grossman proposals after asking for input from only one expert on his panel, the one who happened to be the only one on the panel proposing a competing rule change, Professor Rebecca Wexler. You might expect her to oppose Grimm and Grossman’s proposal, lobbying instead for her own rival proposals. To her credit she did not. Instead, in Capra’s own words, she “supported the proposals but suggested that they should be extended beyond AI.” Memorandum to the Committee at pgs. 9-10 (22-23 of 358). As to the amendment to Rule 901(b)(9) Professor Wexler said:

Re: the first Grimm/Grossman proposal, it may well be that the standard for authenticating system/process evidence should require a showing that the system/process produces “valid” and “reliable” results, rather than merely accurate results. . . .

I can understand the push to add a reliability requirement to 901(b)(9). It’s true that ML systems could rely on an opaque logic that gives accurate results most of the time but then sometimes goes off the rails and creates some seemingly illogical output. But manually coded systems can do the same thing. They could be deliberately or mistakenly programmed to fail in unexpected conditions, or even once every hundred runs on the same input data. So if reliability is important, why not make it a broader requirement?

Memorandum to the Committee at pg. 9 (22 of 358)
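Professor Wexler’s point about a system that is right most of the time but fails “once every hundred runs on the same input data” can be made concrete with a few lines of Python. This is a minimal sketch, not anything from the memorandum: flaky_system, accuracy and is_repeatable are hypothetical names, and I am assuming that “reliable” is read in its measurement sense of giving the same result every time on the same input.

```python
def flaky_system(x: int, run: int) -> int:
    """Hypothetical manually coded system: doubles x, but fails on every 100th run."""
    if run % 100 == 0:
        return -1  # the programmed failure, independent of the input
    return x * 2


def accuracy(outputs: list[int], expected: int) -> float:
    """Fraction of runs whose output matches the known correct answer."""
    return sum(o == expected for o in outputs) / len(outputs)


def is_repeatable(outputs: list[int]) -> bool:
    """True only if every run on the same input produced the same output."""
    return len(set(outputs)) == 1


# Feed the same input (21) to the system 1,000 times.
outputs = [flaky_system(21, run) for run in range(1, 1001)]

print(accuracy(outputs, expected=42))  # 0.99
print(is_repeatable(outputs))          # False
```

A 99 percent accuracy score looks reassuring, yet the same input does not always yield the same output, which is the gap a reliability requirement is meant to expose.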

Still, Capra seemed to give little weight to her input and stuck with his objection. He continued to insist that the use of the words “valid and reliable” instead of “accurate” in Rule 901(b)(9) is an unnecessary and confusing complication. It appears that he does not fully understand the nerdy, AI-based technical reasons behind this change. Notice that Capra once again relies on a vendor, Evidently AI, to try to support his attempt to get technical. Professor Capra says in his Memorandum to the Committee at page 7 (20 of 358):

The proposal (on Rule 901(b)(9)) distinguishes the terms “validity,” “reliability,” and “accuracy.” That is complicated and perhaps may be unnecessary for a rule of evidence. . . . As to “accuracy”, the proposal rejects the term, but in fact there is a good deal of material on machine learning that emphasizes “accuracy.” See, e.g., https://www.evidentlyai.com/classification-metrics/accuracy-precision-recall  . . . The whole area is complicated enough without adding distinctions that may not make a difference.

Too complicated, really? Meaningless distinctions? Maura Grossman and Paul Grimm, who have extensive experience actually using these evidence rules in court, and are both bona fide nerds (especially Maura), were not, to my knowledge, given an opportunity to respond to these criticisms. I have not talked to them about this but would imagine they were not pleased.

Obviously fake Image of Judge Grimm as an unhappy robot by Ralph Losey using Visual Muse

To be continued … Part Two of this article will complete the analysis of the Grimm – Grossman rule proposals and look at what comes next with the Rule Committee meeting of November 8, 2024. It will also include a discussion of a second, slightly revised proposal by Paul Grimm and Maura Grossman that they just submitted and discussion of the new article by Judge Paul Grimm (retired), Professor Maura Grossman and six other experts: Deepfakes in Court: How Judges Can Proactively Manage Alleged AI-Generated Material in National Security Cases. They are all trying, once again, to push the Committee into action. Let us hope they succeed. Don’t look up, but an election is coming.

Ralph Losey Copyright 2024 — All Rights Reserved



DefCon Chronicles: Where Tech Elites, Aliens and Dogs Collide – Series Opener

August 21, 2023

Ralph Losey. Published August 21, 2023.

From Boris to Bots: Our First Dive into the DefCon Universe. This begins a series of blogs chronicling the infamous DefCon event in Las Vegas. The next installment will cover President Biden’s unprecedented request for hackers to attend DefCon to hack AI by breaking existing AI software in an open contest, and the hackers’ enthusiastic response, including that of reporter-AI-hacker Ralph Losey. In addition, nearly all of the top cybersecurity leadership of the White House and Department of Homeland Security personally attended DefCon, including the Homeland Security Department Secretary himself, Alejandro Mayorkas. They came to help officially open the conference and stayed to give multiple policy statements and answer all hacker questions. It was a true breakthrough moment in cyber history.

Boris seems unimpressed by his official DefCon Dog award

I attended DefCon 31, on August 10-15, 2023, as independent Press, accompanied by my co-reporter daughter, a former lobbyist with an English Lit background, and her dog, Boris. Our press status with a special green badge had a high price tag, but it gave us priority access to everything. It also facilitated our interaction with notable figures, from the White House Science Advisor, Arati Prabhakar, to DefCon’s enigmatic founder, Dark Tangent.

DefCon is the world’s largest tech hacker “conference” – more like an inter-dimensional portal at the Caesars Forum. When we first checked in, we happened to meet the leader of DefCon Press and P.R. She fell for little Boris in a handbag, and declared him the official DefCon 31 dog! What an honor. Way to go Boris, who everyone thinks is a Chihuahua, but is really a Russian Terrier. Nothing is as it seems at DefCon. The guy you see walking around in shorts, who looks like a bearded punk rocker, may actually be a senior NSA fed. We will tell you why the NSA was there later in this series.

At DefCon, we immersed ourselves in a diverse crowd of over 24,000 elite tech experts from across the globe. This included renowned names in Cybersecurity, notably the formidable red team professionals. Most of these hackers are law-abiding entrepreneurs, as well as members of top corporate and federal red and blue teams. Several thousand were there just to answer President Biden’s call for hackers everywhere to come to DefCon to compete to break AI. Such a request had never been made before. Much more on this later, including my joining in the AI competition.

The tech experts, hackers all, came together for the thirty-first year of DefCon. We were drawn to participate in, and in our case also report on, the hundreds of large and small lectures and other educational events, demonstrations and vendor exhibitions. In addition, the really big draw was, as usual, the dazzling array of hacker challenges and competitions. Some of these are quite serious, with major prizes and rep at stake, and require pre-qualifications and success in entry rounds. But most were open to all who showed up.

Picture walking into a football stadium, but in place of athletes, you’re surrounded by the world’s tech elite, each donning distinctive hacker attire. As we flooded in by the thousands, it was a blend of seasoned pros and enthusiastic fans. I counted myself among the fans, yet I eagerly took on several challenges, such as the AI red team event. The sheer diversity and expertise of all participants was impressive.

The entrance boasted a towering, thirty-foot neon sparkling mural that caught my eye immediately. I’ve refined the photo to focus on the mural, removing the surrounding crowds. And, just for fun, there’s an alien addition.

Ralph entering Defcon 31

The open competitions came in all shapes and sizes: hacker vs. computers and machines of all types, including voting machines, satellites and cars; hacker vs. hacker contests; and hacker teams against hacker teams in capture the flag type contests. An article will be devoted to these many competitions, not just the hacker vs. AI contest that I entered.

There was even a writing contest before the event to compete for the best hacker-themed short story, with the winner announced at DefCon. I did not win, but had fun trying. My story followed the designated theme, was set in part at DefCon, and was a kind of sci-fi, cyber dystopia involving mass shootings with AI and gun control to the rescue. The DefCon rules did not allow illustrations, just text, but, of course, I later had to add pictures, one of which is shown below. I’ll write another article on that fiction writing contest too. There were many submissions, and most were farther-out and better than my humble effort. After submission, I was told that most seemed to involve AI in some manner. It’s in the air.

Operation Veritas - short story by R. Losey
Illustration by Ralph for his first attempt at writing fiction, submitted for judging in the DefCon 31 writing competition.

So many ideas and writing projects are now in our heads from these four days in Vegas. One of my favorite lectures, which I will certainly write about, was by a French hacker, who shared that he is in charge of cybersecurity for a nuclear power plant. He presented in a heavy French accent to a large crowd on a study he led on Science Fiction. It included statistical analysis of genres, and how often sci-fi predictions come true. All of DefCon seemed like a living sci-fi novel to us, and I am pretty sure there were multiple aliens safely mingling with the crowd.

We provide this first DefCon 31 chronicle as an appetizer for many more blogs to come. This opening provides just a glimpse of the total mind-blowing experience. The official DefCon 31 welcome trailer does a good job of setting the tone for the event. Enlarge to full screen and turn up the volume for best effect!

DefCon 31 official welcome video

Next is a brief teaser description and image of our encounter with the White House Science Advisor, Dr. Arati Prabhakar. She and her government cyber and AI experts convinced President Biden to issue a call for hackers to come to DefCon, to try to break (hack) the new AI products. This kind of red team effort is needed to help keep us all safe. The response from tech experts worldwide was incredible: over a thousand hackers waited in a long line every day for a chance to hack the AI, myself included.

We signed a release form and were then led to one of fifty or more restricted computers. There we read the secret contest instructions, started the timer, and tried to jailbreak the AI in multiple scenarios. In quiet solo efforts, with no outside tools allowed and constant monitoring to prevent cheating, we tried to prompt ChatGPT-4 and other software to say or do something wrong, to make errors and hallucinate. I had one success. The testing of AI vulnerabilities is very helpful to AI companies, including OpenAI. I will write about this in much greater detail in a later article, as AI and Policy were my favorite of the dozens of tracks at DefCon.

A lot of walking was required to attend the event, and a large chill-out room provided a welcome reprieve. They played music there with DJs, usually as a quiet background. There were a hundred decorated tables where you could sit down, relax, and if you felt like it, chat, eat and drink. The company was good, and everyone was courteous to me, even though I was press. The food was pretty good too. I also had the joy of someone “paying it forward” in the food line, which was a first for me. Here is a glimpse of the chill-out scene from the official video by DefCon Arts and Entertainment. Feel it. As the song says, “no one wants laws on their body.” Again, go full screen with volume up for this great production.

Defcon 31 Chill Out room, open all day, with video by Defcon Arts and Entertainment, DefConMusic.org

As a final teaser for our DefCon chronicles, check out my AI-enhanced photo of Arati Prabhakar, whose official title is Director of the Office of Science and Technology Policy. She is a close advisor of the President and member of the Cabinet. Yes, that means she has seen all of the still top secret UFO files. In her position, and with her long DOD history, she knows as much as anyone in the world about the very real dangers posed by ongoing cyber-attacks and the seemingly MAD race to weaponize AI. Yet, somehow, she keeps smiling and projects an aura of restrained confidence, although she did seem somewhat skeptical at times of her bizarre surroundings at DefCon, and who knows what other sights she has been privy to. Some of the questions she was asked about AI did seem strange and alien to me.

Arati Prabhakar speaking on artificial intelligence, its benefits and dangers, Photoshop, beta version, enhancements by Ralph Losey

Stay tuned for more chronicles. Our heads are exploding with new visuals, feelings, intuitions and ideas. They are starting to come together as new connections are made in our brains’ neural networks. Even a GPT-5 could not predict exactly what we will write and illustrate next. All we know for certain is that these ongoing chronicles will include video tapes of our interviews and of presentations attended, including two mock trials of hackers, as well as our transcripts, notes, impressions and many more AI-enhanced photos. All videos and photos will, of course, have full privacy protection for other participants who do not consent, which the strict rules of DefCon require. If you are a human, AI or alien, and feel that your privacy rights have been violated by any of this content, please let us know and we will fuzz you out fast.

DefCon 31 entrance photo by Def Con taken before event started

Ralph Losey Copyright 2023 (excluding the two videos, photo and mural art, which are Def Con productions).


REAL OR FAKE? New Law Review Article Provides a Good Framework for Judges to Make the Call

June 13, 2023

Ralph Losey. Published June 13, 2023.

Losey Midjourney Image of a GPT Judge

The GPTJUDGE: Justice in a Generative AI World article will be published in October by Duke Law & Technology Review. The authors are Maura Grossman, Paul Grimm, Daniel Brown and Molly Xu. In addition to suggesting a legal framework for judges to determine if proffered evidence is Real or Fake, the article provides good background on generative AI. It does so in an entertaining way, touching on a wide variety of issues.

The evidentiary issues raised by generative type AI and deep fakes, and analysis of federal rules, are the parts of their article that interest me the most. Their proposed legal framework for adjudication of authenticity is excellent. It deserves attention by judges and arbitrators everywhere. ‘Real or Fake’ is not just a meme, it is an important issue of the day, both in the law and general culture. Justice depends on Truth, on true facts, reality. Justice is difficult, perhaps impossible to attain when lies and fakes confuse the courtroom.

Before I go into the article, let’s play the ‘Real or Fake’ game now sweeping the Internet with some pictures of the two lead authors, both public figures, Paul Grimm and Maura Grossman. What do you think, which of these pictures are Real and which are Fake? There will be more tests like this to come. Leave a comment with your best guesses or use your AI to get in touch with my AI.

Introduction

The GPTJUDGE: Justice in a Generative AI World will be published in October in Vol. 23, Iss. 1 of Duke Law & Technology Review (Oct. 2023). Maura Grossman was kind enough to provide me with an author’s advance version. My good colleague Doug Austin of E-Discovery Today has already written a good summary of the entire twenty-six page article. The GPT Judge is a very ambitious article that covers A to Z on Generative AI and law. I recommend you read Doug’s article, or better yet, read the whole GPTJudge article for yourself.

Unlike Doug’s article, mine will, as mentioned, focus on only one part of the article. This is the part of The GPTJudge, found at pages 12-18, which addresses the thorny evidentiary issues concerning LLM AI. Is the evidence offered real or fake? What rules govern these issues? And what should a judge, or in my case an arbitrator, do when these issues arise?

Although Doug Austin’s article is a real article, not a fake, it appears to me that Doug did have a wee bit of help in writing from the devil itself here, namely ChatGPT-4. He carefully reviewed and edited the Generative AI’s work, I am sure. It is a fine article, but has a familiar ring. Parts of my article will also have that familiar generative AI tone. It is a real Ralph Losey writing, not a fake. I am pretty certain of that. But, truth be told, I too use GPT AI, namely ChatGPT-4, to help me write this article. My own tiny human brain needs all of the AI help it can get to accomplish the ambitious task I have set myself here of summarizing this complex corner of the Duke Law Review Article. ChatGPT is a good writing tool, and so is the WordPress software that I also use to create these blogs. I now also use another tool to craft my blogs, a generative AI program called Midjourney. Here, for instance, Midjourney helped me create some pretty cool, but fake, images of Grimm and Grossman. Another ‘Real or Fake’ test will be coming soon, but first some more background on the lead authors, whom I know well. This is not to slight the fellow professor with Maura Grossman at the David R. Cheriton School of Computer Science at the University of Waterloo, who is a co-author, Daniel G. Brown, Ph.D., nor Professor Brown and Professor Grossman’s undergraduate student who helped, Molly (Yiming) Xu.

Lead Authors, the Very Real Paul Grimm and Maura Grossman

One of the lead authors of GPT Judge was a real District Court Judge in Baltimore until just recently, Paul Grimm. He is now a Professor at Duke Law and Director of the Bolch Judicial Institute, and, as all who know him will agree, a truly outstanding, very real person. Paul Grimm is without question one of the top judicial scholars of our time, especially when it comes to evidentiary issues. I have had the privilege of listening to him speak many times and even teaching with him a few times. He was even kind enough to write the Foreword to one of my books.

Now back to ‘Real or Fake’ starring Paul Grimm. You be the judge.

The lead author of The GPTJUDGE: Justice in a Generative AI World is a top expert in law and technology and a friend, Maura Grossman, PhD. She is now a practicing attorney, with her own law and consulting firm, Maura Grossman Law, in Buffalo, NY, a Special Master for Courts throughout the country, a Research Professor in the School of Computer Science at the University of Waterloo, an Adjunct Professor at Osgoode Hall Law School at York University and an affiliate faculty member of the Vector Institute in Ontario. Phew! How does she do it all? I suspect substantial help from AI. Now it is Maura’s turn to be the subject of my ‘Real or Fake’ quiz, then we will get into the article proper.

What do you think, ‘Real or Fake’ Maura Grossman?

One more thing before I begin: Judge Grimm and Maura Grossman have recently written another article that you should also put on your must-read pile, or better yet, click the link, read it now and bookmark it: Paul W. Grimm, Maura R. Grossman, and Gordon V. Cormack, Artificial Intelligence as Evidence, 19 Nw. J. Tech. & Intell. Prop. 9, 84 (2021). The well-known information scientist, and Maura’s husband, Gordon Cormack, is also an author. So you know the technical AI details in all of these articles are top notch.

Summary of the Segment of the Law Review Article, The GPTJudge, Covering Evidentiary Issues

Judges will soon be required to routinely face the issues raised by evidence created by GPT AI and any evidence alleged to be fake. This will force judges to assess the authenticity and admissibility of evidence challenged as inaccurate or as potential deepfakes. The existing Federal Rules of Evidence and their state counterparts can, the authors contend, be flexibly applied to accommodate the emerging AI technology. Although not covered by the article, in my personal opinion the rules governing arbitrations are also flexible enough. The authors contend, and I generally agree, that it is infeasible to amend these rules for every new technological development, such as deepfakes, due to the time-consuming revision process required to amend federal rules. Paul Grimm should know, as he used to be on the rules committee that revises the Federal Rules of Civil Procedure.

The admissibility of AI evidence hinges on several key areas under the Federal Rules of Evidence: relevance (401), authenticity (901 and 902), the judge’s gatekeeper role in evaluating evidence (104(a)), the jury’s role in determining the authenticity of contested evidence (104(b)), and the need to exclude prejudicial evidence, even if relevant (403).

Judges should adapt the rules to allow for their application to new technologies like AI, without rigidly adhering to them, to promote the development of evidence law. Federal Rule of Evidence 702, which pertains to scientific, technical, and specialized evidence, requires judges to ensure such evidence is based on sufficient facts, reliable methodology, and has been applied accurately to the case.

These evidence rules provide judges and lawyers with enough guidance. Arbitrators probably do not need special new rules either, especially because there are no juries in arbitration.

It is important when assessing potential GPT AI evidence, or alleged deepfake evidence, that judges pay close attention to the rule requiring the exclusion of evidence that could lead to unfair prejudice (Rule 403). This stresses the importance of ensuring that such evidence is both valid and reliable before being presented to the jury. This rule obviously has no direct application to arbitration, but still, arbitrators must take care they are not fooled by fakes.

When evaluating the authenticity of potential evidence, including deepfakes or other disputed evidence, judges should refer to Federal Rule of Evidence 702 and the Daubert factors to assess the evidence’s validity and reliability. Careful consideration of the potential for unfair prejudice that can occur with the introduction of unreliable technical evidence is of prime importance. The authors stress that admissibility should not solely hinge on whether the evidence is more likely than not to be genuine (the preponderance standard), but also should depend on the potential risks or negative outcomes if the evidence is proven fake, or insufficiently valid and reliable. Evidence should be excluded if the authenticity is doubtful and the risk of unfair or incorrect outcomes is high. Again, the presence or absence of a jury tempers this risk.

So how are judges supposed to make the call on ‘Real or Fake’? Judge Paul Grimm and Maura Grossman recommend that judges follow three steps to make these determinations:

  1. Scheduling Order: Judges should set a deadline in their scheduling order for parties intending to introduce potential GPT AI evidence to disclose this to the opposing party and the court well in advance. This allows the opposing counsel time to decide whether they want to challenge the admissibility of the evidence and seek necessary discovery.
  2. The Hearing: When there’s a challenge to the admissibility of the evidence as AI-generated or deepfake, judges should set an evidentiary hearing on the testimony and other evidence needed to rule on the admissibility. This hearing should be scheduled significantly ahead of the trial to allow the judge enough time to evaluate the evidentiary record and make a ruling.
  3. The Ruling: Following the hearing, judges should carefully consider the evidence and arguments presented during the hearing and issue a ruling. This ruling should assess whether the proponent of the evidence has adequately authenticated it. The judge should address the relevance, authenticity, and prejudice arguments. Special attention should be given to the validity and reliability of the challenged evidence, weighing its relevance against the risk of an unfair or excessively prejudicial outcome.

This is good advice for any judge facing these issues, arbitrators too, as well as attorneys and litigants. Everyone should walk these three simple steps to escape the fake traps that Generative AI can create.

Conclusion

I commend the entire law review article for your reading, but especially the section on evidentiary issues. This section, pgs. 12-18, was, for me, particularly interesting and helpful. Also see the earlier article by Grimm, Grossman and Cormack, Artificial Intelligence as Evidence. Judges and arbitrators will all soon be facing many challenges regarding the authenticity and admissibility of evidence related to AI. ‘Real or Fake’ may be a key question of our times.

The authors insist, and I somewhat reluctantly agree, that it’s not practical to amend the existing Federal Rules of Evidence for every new technological development. Instead, as the authors also point out, and I again agree, these rules provide a flexible framework. Our rules already allow judges to evaluate factors like relevance, authenticity, and potential prejudice of AI-generated evidence. Rule 702 in particular is crucial because it requires that scientific, technical, or specialized evidence be based on ample facts, reliable methodology, and a sound application to the case at hand. The same situation applies to arbitrations, although arbitrations are typically more informal and there are no juries. Still, arbitrators should be on the lookout for fake or unreliable evidence and look for general guidance from the federal evidence rules.

Judge Grimm and Dr. Grossman propose a three-step process to guide judges when handling potential GPT AI or deepfake evidence. I like the simple three-step procedure proposed. There is more to it than described in this summary, of course. You need to read the whole article – THE GPTJUDGE: JUSTICE IN A GENERATIVE AI WORLD.

I urge other lawyers, arbitrators and judges to try out the three steps proposed when they face these issues. The just, speedy and inexpensive resolution of disputes must remain the polestar of all our dispute resolution. These suggestions, if employed in a reasoned and prudent manner, can help us to do that. Ensuring the reliability of evidence is important because Justice arises out of truth, not lies and fakes.

Copyright Ralph Losey 2023, to text and fake images only.