Forensic Exam | e-Discovery Team

The Problem of Deepfakes and AI-Generated Evidence: Is it time to revise the rules of evidence? – Part One

September 19, 2024

Ralph Losey. Published September 19, 2024.

On April 19, 2024, the Advisory Committee on Evidence Rules for federal courts faced a critical question: Does AI-generated evidence, including deepfakes, demand new rules? The Committee’s surprising answer—’not yet.’ Was that the right call? Will they change their mind when they meet in November again right after the elections?

Image in Photorealistic style by Ralph Losey using Visual Muse

Part One analyzes the various rule change proposals. Chief among them is the proposal by Judge Paul Grimm (retired) and Professor Maura Grossman, who are well known to all legal tech readers. Several other interesting proposals were considered and discussed. You will hear the nerds inside view of the key driving facts at play here, the danger of deepfakes, the power of audio-video evidence, jury prejudice and the Liar’s Dividend. Part One also talks about why the Evidence Rules Committee chose not to act and why you should care.

Part Two will complete the story and look at what comes next with the meeting of November 8, 2024. It will also include a discussion of a second, slightly revised proposal by Paul Grimm and Maura Grossman that they just submitted and the latest article by Paul Grimm, Maura Grossman and six other experts: Deepfakes in Court: How Judges Can Proactively Manage Alleged AI-Generated Material in National Security Cases. They are all trying, once again, to push the Committee into action. Let us hope they succeed.

Summary of the Evidence Committee’s Decision and the Leadership of its Official Reporter

The Committee, under the strong leadership of its official Reporter for the last twenty-eight years, Daniel J. Capra, considered multiple proposals to amend the Rules of Evidence, but rejected them all. Professor Capra cited the need for further development. For now, courts must manage the significant new challenges of AI with existing rules.

They key segment of the Committee’s work is the 26-page memorandum found at Tab 1-A of the 358-page agenda book. It was written by Professor Daniel J. Capra, Fordham University School of Law and Adjunct Professor at Columbia Law. Dan Capra is a man almost my age, very powerful and respected in academic and judicial circles. He is a true legend in the fields of evidence, legal ethics and education, but he is no nerd. His comments and the transcript of his interaction with two of the top tech-nerds in law, Judge Paul Grimm (retired) and Professor Maura Grossman, make clear that Professor Capra lacks hands-on experience and deep understanding of generative AI.

That is a handicap to his leadership of the Committee on the AI issues. His knowledge is theoretical only, and just one of many, many topics that he reads about. He does not teach AI and the law, as both Grimm and Grossman do. This may explain why he wanted to just wait things out, again. He recommended, and the Committee agreed, apparently with no dissenters, to reject the concerns of almost all of the hands-on nerds, including all of the legal experts proposing rule changes. They all warn of the dangers of generative AI and deepfakes to interfere with our evidence based system of justice. It may even make it impossible to protect our upcoming election from deepfake interference. Daniel Capra gives some consideration to danger, but thinks the concerns are overblown and the Committee should continue to study and defer any action.

Don’t Look Up! image in Pop Art style by Ralph Losey using Visual Muse

Evaluation of the Pending Danger of Deepfakes

For authority that the dangers of deepfake are overblown, and so no rule changes are necessary, Professor Capra cites two articles. Professor Capra’s Memorandum to the Committee at pgs. 25-26 (pgs. 38-39 of 358). The first is unpersuasive, to say the least, a 2019 article in the Verge, Deepfake Propaganda is not a Real Problem, THE VERGE (Mar. 15, 2019). The article was written by Russell Brandom, who claims expertise on “the web, the culture, the law, the movies, and whatever else seems interesting.”

The second article was better, Riana Pfefferkorn, Deepfakes in the Courtroom, 29 Public Interest Law Journal 245, 259 (2020). Still, it was written in 2020 and so now is way out of date. My research discovered that Riana Pfefferkorn has published a much more recent paper pertaining to deepfakes, Addressing Computer-Generated Child Sex Abuse Imagery: Legal Framework and Policy Implications (Lawfare, February 2024). In the Introduction at page-2 of this well written paper she says:

Given the current pace of technological advancement in the field of generative ML, it will soon become significantly easier to generate images that are indistinguishable from actual photographic images depicting the sexual abuse of real children.

For Ms. Pfeffferkorn the problem of deepfakes is now a very real and urgent problem. At page 25 of the paper she asserts: “There is an urgent need, exacerbated by the breakneck pace of advancements in machine learning, for Congress to invest in solving this technical challenge.”

Professor Capra and the Committee see no “urgent need” to act. They do so in part because of their belief that new technology will emerge (or already exists) that is able to detect deepfakes and so this problem will just go away. Professor Capra has one expert to support that view, Grant Fredericks, the president of Forensic Video Solutions. I looked at the company website and see no claims to development or use of any new technologies. Capra relies on the vendor promises to detect fake videos and keep them out of evidence, “both because they can be discovered using the advanced tools of his (Fredricks) trade and because the video’s proponent would be unable to answer basic questions to authenticate it (who created the video, when, and with what technology.” Professor Capra’s Memorandum to the Committee at pg. 26 (pg. 39 of 358).

Capra’s memorandum to the Committee at first discusses why GenAI fraud detection is so difficult. He explains the cat and mouse competition between image generation software makers and fraud detection software companies. Oddly enough, his explanation seems correct to me, and so appears to impeach his later conclusion and the opinion of his expert, Fredericks. Here is the part of Capra’s memorandum that I agree with:

Generally speaking, there is an arms race between deepfake technology and the technology that can be employed to detect deepfakes. . . . any time new software is developed to detect fakes, deepfake creators can use that to their advantage in their discriminator models. A New York Times report reviewed some of the currently available programs that try to detect deepfakes. The programs varied in their accuracy. None were accurate 100 percent of the time.

Memorandum to the Committee at pg. 17 (pg. 30 of 358

Professor Capra’s supports his statement that “none were accurate 100 percent of the time,” by citing to a NYT article, Another Side of the A.I. Boom: Detecting What A.I. Makes (NYT, May 19, 2023). I read the article and it states that there are now more than a dozen companies offering tools to identify whether something was made with artificial intelligence, including Sensity AI, Optic, Reality Defender and FakeCatcher. The article repeats Professor Capra’s arms race scenario, but adds how the detector software always lags behind. That is common in cybersecurity too, where the defender is always at a disadvantage. Here is a quote from the NYT article:

Detection tools inherently lag behind the generative technology they are trying to detect. By the time a defense system is able to recognize the work of a new chatbot or image generator, like Google Bard or Midjourney, developers are already coming up with a new iteration that can evade that defense. The situation has been described as an arms race or a virus-antivirus relationship where one begets the other, over and over.

That has always been my understanding too, which is why I cannot believe that new technology is around the corner to finally make detection foolproof or that Grant Fredericks has a magic potion. I think it is more likely that the spy versus spy race will continue and uncertainty will be with us for a long time. Still, I sincerely hope that Professor Capra is right, and the fake image dangers are overstated. That’s my hope, but reason and science tells me that’s a risky assumption and we should mitigate our risks by making some modest revisions to the rules now. I would start with the two short proposals of Grimm and Grossman (as slightly revised in September 2024 and explained in Part Two).

Spy v. Spy image in 50s Pop Art stye by Ralph Losey using Visual Muse

Professor Capra’s Discussion of the Proposed Rule Amendments

There were four rule change proposals before the Committee in April 2024. One by Professor Andrea Roth of the University of California, Berkeley, School of Law, a second by Professor Rebecca Delfino of Loyla Law School and a third by Judge Paul Grimm (retired) and Professor Maura Grossman, already well known to most of my readers. I omit discussion here of a fourth proposal by John LaMonga in the interests of time, but you can learn about it in Professor Capra’s Memorandum to the Committee at pgs. 23-25 (pgs. 36-38 of 358). Also see John P. LaMonaga, A Break from Reality: Modernizing Authentication Standards for Digital Video Evidence in the Era of Deepfakes, 69 Am. U.L. Rev. 1945, 1984 (2020).

Professor Andrea Roth’s Rule Proposals

Professor Roth’s suggestions are nerdy interesting and forward thinking. Her suggestions are found in Professor Capra’s Memorandum to the Committee at pgs. 10-13 (pgs. 23-26 of 358) and Capra’s critical comments of the proposals follow at pgs. 13-16 (pgs. 26-29 of 358). I urge interested readers to check out her proposals for yourself. Capra’s comments seem a bit overly critical and I look forward to hearing more from her in the future.

Here is a Capra’s comment to one of her proposals, to add a new, independent subdivision to Rule 702. Testimony by Expert Witnesses.

The proposal addresses what could be thought to be a gap in the rules. Expert witnesses must satisfy reliability requirements for their opinions, but it is a stretch, to say the least, to call machine learning output an “opinion of an expert witness.”

Memorandum to the Committee at pg. 11 (pg. 24 of 358).

Oh really? A stretch, to say the least. Obviously Capra is not familiar with my work, and that of many others in AI, on the use of generative AI personas as experts. See e.g. Panel of AI Experts for Lawyers; and, Panel of Experts for Everyone About Anything. Also see: Du, Li, Torralba, Tenenbaum and Mordatch, Improving Factuality and Reasoning in Language Models through Multiagent Debate, (5/23/23).

Panel of AI Experts image by Ralph Losey who consults with them frequently

For me Andrea Roth’ proposals are not a stretch, to say the least, but common sense based on my everyday use of generative AI.

Andrea Roth also suggests that Rule 806. Attacking and Supporting the Declarant’s Credibility be amended to allow opponents to “impeach” machine output in the same way as they would impeach hearsay testimony from a human witness. Professor Capra of course criticizes that too, but this time is more kind, saying at page 13 of his memo.

The goal here is to treat machine learning — which is thinking like a human — the same way that a human declarant may be treated. Thought must be given to whether all the forms of impeachment are properly applicable to machine learning. . . . The question is whether an improper signal is given by applying 806 wholesale to machine related evidence, when in fact not all the forms of impeachment are workable as applied to machines. That said, assuming that some AI-related rule is necessary, it seems like a good idea, eventually, to have a rule addressing the permitted forms of impeachment of machine learning evidence.

I thought Andrea Roth’s suggestion was a good one. I routinely cross-examine AI on their outputs and opinions. It is an essential prompt engineering skill to make sure their opinions are reliable and understand the sources of their opinions.

Due to concerns over the length of this article I must defer further discussion of Professor Andrea Roth’s work and proposals for another day.

Robot cross-examined in impressionism style by Ralph Losey using Visual Muse

Professor Rebecca Delfino’s Proposal to Remove Juries From Deepfake Authenticity Findings

Professor Rebecca Delfino of Loyla Law School is a member of the Committee’s expert panel. She is very concerned about the dangers of the powerful emotional impact of audiovisuals on jurors and the costs involved in authenticity determinations. Her recent writings on these issues include: Deepfakes on Trial: A Call to Expand the Trial Judge’s Gatekeeping Role To Protect Legal Proceedings from Technological Fakery, 74 HASTINGS L.J. 293 (2023); The Deepfake Defense—Exploring the Limits of the Law and Ethical Norms in Protecting Legal Proceedings from Lying Lawyers, Loyola Law School, 84 Ohio St. L.J., Issue 5 1068 [2024]; Pay-To-Play: Access To Justice In The Era Of Ai And Deepfakes, 55 Seton Hall L.Rev., Book 3, __ (forthcoming 2025) (Abstract: “The introduction of deepfake and AI evidence in legal proceedings will trigger a failure of the adversarial system because the law currently offers no effective solution to secure access to justice to pay for this evidence for those who lack resources.“) Professor Rebecca Delfino argues that the danger of deepfakes demands that the judge decide authenticity, not the jury.

Sub-Issue on Jury Determinations and the Psychological Impact of Deepfakes

I am inclined to agree with Professor Delfino. The important oral presentation of Paul Grimm and Maura Grossman to the Committee shows that they do too. We have a transcript of that by Fordham Law Review, Daniel J. Capra, Deepfakes Reach the Advisory Committee on Evidence Rules, 92 Fordham L.R. 2491 (2024) at pgs. 2421-2437.

Paul and Maura make a formidable team of presenters, including several notable moments where Maura shows Capra and his Committee a few deepfakes she made. In the first she put Paul Grimm’s head on Dan Capra’s body, and vice versa, which caused Dan to quip “I think you lose in that trade, Paul.” Then she asked the panel to close their eyes and listen to what turned out to be a fake audio of President Biden directing the Treasury Depart to make payment of $10,000 to Daniel Capra. Id. at pgs. 2427-2427.

I thought this was great ploy. Maura then told the Committee she made it in seconds using free software on the internet and that with more work it would sound exactly like the President. Id. at 2426-2427. Professor Capra, who has been stung before by surprise audios, did not seem amused and his ultimate negative recommendations show he was not persuaded.

Here are excerpts of the transcript of the next section of their presentation to the Committee.

PROF. GROSSMAN. Because there are two problems that these deepfakes and that generative AI cause. One is we’re moving into a world where none of us are going to be able to tell what is real from not real evidence—which of these videos are real, which of these aren’t. And I’m very worried about the cynicism and the attitude that people are going to have if they can’t trust a single thing anymore because I can’t use any of my senses to tell reality.

And the other is what they call the liar’s dividend, is why not doubt everything, even if it’s in fact real, because now I can say, “How do you know it’s not a deepfake?”, and we saw a lot of that in the January 6 cases. Some of the defendants said, “That wasn’t me there” or “How do you know it was me?”187 Elon Musk used that defense already.188 So you’re going to have both problems: one where it really is fake, and now every case going to require an expert; and the other where it really is real evidence, and you don’t want to become so cynical that you don’t believe any of it.

Deepfakes Reach the Advisory Committee on Evidence Rules, supra at pgs. 2427-2428.

To quote an NPR article on the “Liar’s Dividend:”

When we entered this age of deepfakes, anybody can deny reality. … That is the classic liar’s dividend.

The liar’s dividend is a term coined by law professors Bobby Chesney and Danielle Citron in a 2018 paper laying out the challenges deepfakes present to privacy, democracy, and national security. The idea is, as people become more aware of how easy it is to fake audio and video, bad actors can weaponize that skepticism. “Put simply: a skeptical public will be primed to doubt the authenticity of real audio and video evidence,” Chesney and Citron wrote.

Shannon Bond, People are trying to claim real videos are deepfakes. The courts are not amused (NPR, 5//8/23).

Liar’s Dividend image by Ralph Losey in surrealistic style using Visual Muse

Back to the transcript of the presentation of Grossman and Grimm to the Committee: Judge Grimm went on to explain why, under the current rules, the jury may often have to make the final determination of authenticity. They emphasize that even if the jury decides it is inauthentic, the jury will still be tainted by the process, as they cannot unsee what they have seen. Instructions from a judge to disregard the video seen will be ineffective.

JUDGE GRIMM: Now there’s one monkey wrench in the machinery: When you’re dealing with authentication, you’re dealing with conditional relevance if there’s a challenge to whether or not the evidence is authentic. And so, if you’re going to have a factual situation where one side comes in and says, “This is the voice recording on my voicemail, this is the threatening message that was left on my voicemail, that’s Bill, I’ve known Bill for 10 years, I am familiar with Bill’s voice, that is plausible evidence from which a reasonable factfinder could find that it was Bill.”

If Bill comes in and says, “That was left at 12:02 PM last Saturday, at 12:02 PM I have five witnesses who will testify that I was at some other place doing something else where I couldn’t possibly have left that,” that is plausible evidence that it was not Bill.

And when that occurs, the judge doesn’t make the final determination under Rule 104(a).209 The jury does.210 And that’s a concern because the jury gets both versions now. It gets the plausible version that it is; it gets the plausible version that it’s not. The jury has to resolve that factual dispute before they know whether they can listen to that voicemail and take it into consideration as Bill’s voice in determining the outcome of the case.

PROF. GROSSMAN: Can I add just one thing? Two studies you should know about. One is jurors are 650 percent more likely to believe evidence if it’s audiovisual, so if that comes in and they see it or hear it, they are way more likely to believe it.211 (Rebecca A. Delfina, Deepfakes on Trial: A Call to Expand the Trial Judge’s Gatekeeping Role to Protect Legal Proceedings from Technological Fakery, 74 HASTINGS L.J. 293, 311 fn.101–02 (2023)).

And number two, there are studies that show that a group of you could play a card game. I could show you a video of the card game, and in my video it would be a deepfake, and I would have one of you cheating. Half of you would be willing to swear to an affidavit that you actually saw the cheating even though you didn’t because that video—that audio/video, the deepfake stuff—is so powerful as evidence that it almost changes perception.212 (See Wade, Green & Nash, Can Fabricated Evidence Induce False Eyewitness Testimony?, 24 APPLIED COGNITIVE PSYCH. 899 (2010)).

Fake videos can change your memory of perception. Image in Pop Art style by Ralph Losey

CHAIR SCHILTZ: But why would judges be any more resistant to the power of this than jurors?

JUDGE GRIMM: Well, for the same reason that that we believe that in a bench trial that the judge is going to be able to distinguish between the admissible versus the non-admissible.

CHAIR SCHILTZ: I know, but it is often fictional, right? There are certain things that I really am no better at than a juror is, like telling a real picture from an unreal picture, or deciding which of these two witnesses to believe—between the witness who says, “That’s his voice,” and the witness who said, “It couldn’t have been me.” Why am I any better at that than a juror?

JUDGE GRIMM: You might be better than a juror because you, as the judicial officer, can have it set up so that you have a hearing beforehand, which is a hearing on admissibility that the jury is not going to hear; and you have the witnesses come in, and you hear them; or you have a certificate under Rule 902(13). Also, you will be a repeat player.

PROF. GROSSMAN: Right. And you would at least know the questions to ask: How was this algorithm trained? Was it tested? What was it tested on? Who did the testing? Were they arm’s length? What’s the error rate?

JUDGE GRIMM: And order the discovery that the other side can have to be able to have the opportunity To challenge it by bringing that in.

CHAIR SCHILTZ: Yes, I get that part.

Deepfakes Reach the Advisory Committee on Evidence Rules, supra at pgs. 2427-2428.

The Chair, Hon. Patrick J. Schiltz asks good questions here and understands the issue. Anyone should be far more comfortable having a judge, especially one like Judge Schiltz, making the hard calls instead of a room of randomly called jurors. There is no question in my mind that judges are far better qualified than jurors to make these determinations. All three experts were making that point, Paul Grimm, Maura Grossman and Rebecca Delfino.

Real or Fake image in post-apocalyptic-futurism style by Ralph Losey

Back to Professor Rebecca Delfino’s Proposal

Here is Professor Capra explanation to the Committee of how Professor Delfino’s proposed rule changes would work. Unfortunately I have not found any argument from her on her proposal, just Capra’s explanation and he ultimately rejected it.

Professor Rebecca Delfino argues that the danger of deepfakes demands that the judge decide authenticity, not the jury.19 She contends that “[c]ountering juror skepticism and doubt over the authenticity of audiovisual images in the era of fake news and deepfakes calls for reallocating the fact finding authority to determine the authenticity of audiovisual evidence.” She contends that jurors cannot be trusted to fairly analyze whether a video is a deepfake, because deepfakes appear to be genuine, and “seeing is believing.” Professor Delfino suggests that Rule 901 should be amended to add a new subdivision (c), which would provide:

901(c). Notwithstanding subdivision (a), to satisfy the requirement of authenticating or identifying an item of audiovisual evidence, the proponent must produce evidence that the item is what the proponent claims it is in accordance with subdivision (b). The court must decide any question about whether the evidence is admissible.

She explains that the new Rule 901(c) “would relocate the authenticity of digital audiovisual evidence from Rule 104(b) to the category of relevancy in Rule 104(a)” and would “expand the gatekeeping function of the court by assigning the responsibility of deciding authenticity issues solely to the judge.”

The proposed rule would operate as follows: After the pretrial hearing to determine the authenticity of the evidence, if the court finds that the item is more likely than not authentic, the court admits the evidence. The court would instruct the jury that it must accept as authentic the evidence that the court has determined is genuine. The court would also instruct the jury not to doubt the authenticity, simply because of the existence of deepfakes. This new rule would take the Memorandum to the Committee at pgs. 22-23 (pgs. 35-36 of 358).

This proposal sounds feasible to me. It could help reduce the costs of expert battles and counter the Liar’s Dividend and CSI Effect. Professor Capra made a few helpful comments as to how Professor Delfino’s language would benefit by a few minor changes. But those are moot points because he respectfully declined to endorse the proposal noting that: “Given the presence of deepfakes in society, it may well be that jurors will do their own assessment, regardless of the instruction.” He seems to miss the point of minimizing the psychological impact on jurors by keeping deepfake videos and audios out of the jury room.

Fake or True image in photorealistic style by Ralph Losey using Visual Muse

Paul Grimm and Maura Grossman‘s Two Rule Proposals

Two rule change proposals were made in early 2024 by Paul Grimm and Maura Grossman. (They were revised slightly and resubmitted in September 2024 as explained in Part Two of this article.) Paul and Maura are both well known to my readers as progressive leaders in law and technology. They have been working on these evidentiary issues for years. See eg., The GPTJudge: Justice in a Generative AI World, 23 Duke Law & Technology Review 1-34 (2023).

They were invited to present their proposals to the Committee to modify Rule 901(b)(9) for AI evidence and add a new Rule 901(c) for “Deepfake Evidence.” The transcript of their presentation was referred to previously. Deepfakes Reach the Advisory Committee on Evidence Rules, 92 Fordham L.R. 2491 (2024) at pgs. 2421-2437. I recommend you read this in full.

Here are the two rule changes Paul and Maura proposed:

901(b) Examples. The following are examples only—not a complete list—of evidence that satisfies the requirement [of Rule 901(b)]:
(9) Evidence about a Process or System. For an item generated by a process or system:
(A) evidence describing it and showing that it produces ~~an accurate~~ a valid and reliable result; and
(B) if the proponent concedes that the item was generated by artificial intelligence, additional evidence that:
(i) describes the software or program that was used; and
(ii) shows that it produced valid and reliable results in this instance.

Proposed New Rule 901(c) to address “Deepfakes”:

901(c): Potentially Fabricated or Altered Electronic Evidence. If a party challenging the authenticity of computer-generated or other electronic evidence demonstrates to the court that it is more likely than not either fabricated, or altered in whole or in part, the evidence is admissible only if the proponent demonstrates that its probative value outweighs its prejudicial effect on the party challenging the evidence.

As you can see their proposed new rule 901(c) makes it clear that a judge may take the jury out of the “fake or real” determination in close questions, and in so doing take away most of the potential prejudicial impact upon jurors. The burden of possible unconscious prejudice and emotional impact from viewing inadmissible deepfake media would be born solely by the judge. As discussed, the judge is better trained for that and will have the benefit of pretrial hearings and expert testimony. The jury retains its traditional power over all other determinations of justiciable facts. Note that this proposal does not go as far as Professor Delfino’s in taking determinations away from the jury and expanding the gatekeeper role of the judge. More on 901(c) in general will follow, but first the proposed revisions to Rule 901(b)(9).

Accuracy v. Reliability and Validity

Professor Capra killed both of the Grimm and Grossman proposals after asking for input from only one expert on his panel, the one who happened to be the only one on the panel proposing a competing rule change, Professor Rebecca Wexler. You might expect her to oppose Grimm and Grossman’s proposal, lobbying instead for her own rival proposals. To her credit she did not. Instead, in Capra’s own words, she “supported the proposals but suggested that they should be extended beyond AI. Memorandum to the Committee at pgs. 9-10 (22-23 of 358). As to the amendment to Rule 901(b)(9) Professor Wexler said:

Re: the first Grimm/Grossman proposal, it may well be that the standard for authenticating system/process evidence should require a showing that the system/process produces “valid” and “reliable” results, rather than merely accurate results. . . .

I can understand the push to add a reliability requirement to 901(b)(9). It’s true that ML systems could rely on an opaque logic that gives accurate results most of the time but then sometimes goes off the rails and creates some seemingly illogical output. But manually coded systems can do the same thing. They could be deliberately or mistakenly programmed to fail in unexpected conditions, or even once every hundred runs on the same input data. So if reliability is important, why not make it a broader requirement?

Memorandum to the Committee at pg. 9 (22 of 358)

Still, Capra seemed to give little weight to her input and stuck with his objection. He continued to insist that the use of the words “valid and reliable” instead of “accurate” in Rule 901(b)(9) is an unnecessary and confusing complication. It appears that he does not fully understand the nerdy AI based technical reasons behind this change. Notice that Capra once again relies on a vendor, Evidently AI, to try to support his attempt to get technical. Professor Capra says in his Memorandum to the Committee at page 7 (20 of 358).

The proposal (on Rule 901(b)(9)) distinguishes the terms “validity,” “reliability,” and “accuracy.” That is complicated and perhaps may be unnecessary for a rule of evidence. . . . As to “accuracy”, the proposal rejects the term, but in fact there is a good deal of material on machine learning that emphasizes “accuracy.” See, e.g., https://www.evidentlyai.com/classification-metrics/accuracy-precision-recall . . . The whole area is complicated enough without adding distinctions that may not make a difference.

Too complicated, really? Meaningless distinctions? Maura Grossman and Paul Grimm, who have extensive experience actually using these evidence rules in court, and are both bonafide nerds (especially Maura), were not, to my knowledge, given an opportunity to respond to these criticisms. I have not talked to them about this but would imagine they were not pleased.

Obviously fake Image of Judge Grimm as an unhappy robot by Ralph Losey using Visual Muse

To be continued … Part Two of this article will complete the analysis of the Grimm – Grossman rule proposals and look at what comes next with the Rule Committee meeting of November 8, 2024. It will also include a discussion of a second, slightly revised proposal by Paul Grimm and Maura Grossman that they just submitted and discussion of the new article by Judge Paul Grimm (retired), Professor Maura Grossman and six other experts: Deepfakes in Court: How Judges Can Proactively Manage Alleged AI-Generated Material in National Security Cases. They are all trying, once again, to push the Committee into action. Let us hope they succeed. Don’t look up, but an election is coming.

“Don’t Look Up” image in Dark Fantasy style by Ralph Losey using Visual Muse

4 Comments | AI Ethics, Book, Evidence, Forensic Exam, New Rules, Security, Technology, VENDORS | Permalink
Posted by Ralph Losey

OpenAI Generates a ‘Hired Gun Hacker’ Defense to the N.Y. Times Copyright Case

March 12, 2024

Ralph Losey. Published March 12, 2024.

Most everyone in the AI and legal worlds by now knows about the New York Times (“NYT”) suit against Microsoft and various Open AI entities (“OAI”). The NYT alleges copyright infringement by the practice of OAI, and most all other generative AI companies, of using data scraped from the internet to train their generative AIs. The defendants responded in late February with motions to dismiss. OAI’s motion raised a novel “hired gun hacker” defense that intrigued AI hacker attorney, Ralph Losey. Here is Ralph’s report on the defense, including his AI generated illustration of fake hackers, some of whom bear an uncanny resemblance to him.

Ralph Losey intrigued by hacker defense. Self portrait generated with text using a photo realistic Watercolor Style.

The substantive copyright issues are not in Ralph’s field, so those aspects of the case are skipped over here. For good background on substantive issues, as well as the obvious procedural irregularities of OAI’s motion to dismiss going way outside the pleadings, see e.g. Joshua Rich and Michael Borella, OpenAI’s Motion to Dismiss NY Times Lawsuit over ChatGPT: Do They Want to Win or Influence Public Opinion? (JD Supra, 3/6/24). Ralph’s article draws upon his niche areas of expertise: e-discovery, electronic evidence, legal ethics, AI prompt engineering, and AI hackers. Ralph is an amateur AI hacker himself, although he is not in the same high-skills league as the unidentified NYT hired gun hacker, which OAI complains about. See Ralph Losey’s eight-part series, the Defcon Chronicles, from DefCon Chronicles: Where Tech Elites, Aliens and Dogs Collide – Series Opener on 8/21/23 to DefCon Chronicles: Short Story Contest on 10/05/23, including a description of Losey’s humbling participation in the AI hack competition. DefCon Chronicles: Sven Cattell’s AI Village, ‘Hack the Future’ Pentest and His Unique Vision of Deep Learning and Cybersecurity (9/13/23). Also see DefCon Chronicles: Hackers Response to President Biden’s Unprecedented Request to Come to DefCon to Hack the World for Fun and Profit (9/3/23).

Ralph Losey’s Summary of the Hired Gun Hacker Defense

The most stunning allegation in the NYT complaint is based on Exhibit “J”. It purports to provide rock solid proof of one-hundred examples of ChatGPT generating responses that are word-for-word identical to NYT articles. In OAI’s motion to dismiss, OIA claims that the exhibit is a fake. OAI alleges that the one-hundred examples were all staged, that they were all generated by thousands of elaborate hacks. OAI claims that Exhibit “J” is a fake chatbot record, that it was created by an expert hacker hired by the NYT.

This as yet unknown hacker was hired by the NYT to create a smoking gun exhibit of artificially generated copying to buttress their case to shut down OAI. The hired gun hacker did his or her job well. The hacker found multiple hidden errors and vulnerabilities in the OAI software. Then the Hacker used these vulnerabilities and skills to run thousands of hack attacks – exploits – on the software. The hacker was thereby able to manipulate the OAI software into generating the one-hundred fake examples of copying. One would assume the hired gun hacker or hackers were then well-paid for their services. This will no doubt be a primary target of the first round of e-discovery.

Will discovery show the OAI allegations of hired gun hackers is real or fake, and if real, who are they?

OAI raises this extraordinary defense as part of its motion to dismiss. Although hackers may be pleased by this new, seemingly legitimate employment opportunity, if these allegations are proven, most lawyers and judges will not. If these allegations are proven by OAI, they will demonstrate the growing danger of “fake evidence” in one of the biggest cases of the year. What is real, and what is AI generated or hacked? Only your AI hackers will know for sure!

Will this development force all forensic teams to include AI hackers? Info-Sec teams? Discovery vendors? Will all good law firms and corporate law departments soon keep AI hackers on call? Will corporate and big law AI teams have their own “AI hackers-in-residence”? Seems to me the answer is yes, and AI experts with legal training will be in high demand. So too will legal experts with AI training.

The NYT Complaint

Here is the court docket of NYT v. Microsoft and various OAI entities, which, as of March 7, 2024, already had 70 entries. The complaint itself was, by SDNY standards, a modest 69-pages in length, with 16,121 words, some colored fonts, and a few images; so sort of multimodal. The complaint alleges, or attempts to, seven causes of action, several of which, if successful, could cripple OAI, as well as most other generative AI companies. It could even hurt Microsoft somewhat. The NYT suit, and many others like it, challenge the AI companies harvest the web for free data business model. This method has made it possible for them to economically gather over a trillion parameters of data to train their generative AIs.

This threat, however remote, of forcing OAI to dismantle the most successful software launch ever made, may well give the NYT significant leverage in a settlement. Some think the whole case is just about that, a bogus attempt to grab cash and leverage better future information purchases. Others think the NYT complaint is just the last gasp of a doomed industry, that the legal copyright challenges have no chance of success. They argue that a favorable judgment for the NYT is nearly impossible.

I do not know. Again, suggest you look to copyright lawyer specialists for that. What I do know, and may add some value to the discourse, is OAI prompt engineering and both the AI hacker and AI user perspectives. That allows me to shed some light on the hired gun hacker defense. At first glance, it looks persuasive.

AI generated image of a professional AI hacker by Ralph.

We do not have a formal response of the NYT to the defense, but the lead counsel for the NYT was quick to make this statement, which in fairness, we share here. (emphasis added)

What OpenAI bizarrely mischaracterizes as ‘hacking’ is simply using OpenAI’s products to look for evidence that they stole and reproduced The Times’s copyrighted works. And that is exactly what we found. In fact, the scale of OpenAI’s copying is much larger than the 100-plus examples set forth in the complaint.
Ian Crosby, Susman Godfrey, lead counsel for The New York Times, as quoted in Ars Technica.

The Exhibits to the NYT Complaint

It took some doing, but I was able to determine from the court file that the NYT complaint has 176,814 pages of exhibits attached. You can see the reference texts to verify this count yourself. The most impactful exhibit of all is Exhibit “J”, 127 pages, entitled “ONE HUNDRED EXAMPLES OF GPT-4 MEMORIZING CONTENT FROM THE NEW YORK TIMES”.

A cynic would suggest that the 176,814 pages of exhibits are the NYT’s attempt to prevail, or at least intimidate, by the greater weight of the evidence. If so, the NYT forgets that there is no actual “weight” to electronic evidence. These electronic files are all ephemeral. The defendants in this case assert the same about the NYT legal claims. Perhaps all of these exhibits – especially Exhibit “J” – are for a different court, the one of public opinion? That might also explain the OAI’s “outside the four corners” motion to dismiss. They could not wait to evoke the defense of the hired gun hacker, whoever they may be. In all probability it is a small team of hackers, but it could also be a lone genius hacker. Discovery will tell the tale eventually. In the meantime it is a mystery.

Fake Ralph image pondering who the mystery hacker(s) might be? Watercolor photorealism.

Introduction to Defendant OpenAI’s Motion to Dismiss

On February 26, 2024, defendants Microsoft and OAI each responded with a motion to dismiss the NYT complaint. Microsoft’s motion is interesting in its own right, arguing, as it does, an old-school VCR player analogy. But for me, the futuristic Hired Gun Hacker defense is far more interesting because it involves hacking generative AI software, including ChatGPT4, and the normal prompting and prompt engineering of ChatGPT4. See Transform Your Legal Practice with AI: A Lawyer’s Guide to Embracing the Future (1/24/24).

The OAI’s motion to dismiss is short and sweet, just one page. The motion relies on the 35-page legal memorandum filed therewith. All the motion itself does is state that OpenAI seeks:

… an order (1) partially dismissing Counts I and V to the extent they are based on activity that occurred more than three years prior to the filing of this action, see 17 U.S.C. § 507(b); (2) dismissing Counts IV and V in full for failure to allege facts sufficient to state a claim for relief pursuant to Fed. R. Civ. P. 12(b)(6); and (3) dismissing Count VI on grounds of Copyright Act preemption, see 17 U.S.C.§ 301.
OpenAI Motion to Mismiss

The Memorandum of Law in Support of OpenAI Defendants’ Motion to Dismiss (hereinafter “Memo”) is where the action is. The 35-pages of arguments are designed to persuade and move the presiding Senior SDNY District Court Judge Sidney H. Stein, and the mentioned court of public opinion.

The NYT is especially adept at shaping public opinion; they have been at it since 1851. Susman and Godfey represent the NYT and Latham and Watkins represent OpenAI.

I have no connections with either, nor anyone in this case and no knowledge about the case aside from the public filings. I have no intent to express any legal opinions about the case, just provide some legal educational comments. Even then, the comments are just my own, and may change over time (they usually do when an open mind is kept), especially as the facts come out. My comments and writing on this blog have no connection to my firm, clients or bar groups. See my standard full disclaimer.

Fake image of a red team hacker and keeper of secrets.

Key Allegations of OpenAI’s Legal Memorandum

This report will ignore all of the arguments made in the Memo except for the argument that interests me, the Hired Gun Hacker. Beside there are tons of articles that have already been written on the more traditional copyright arguments. Here are the main segments of the Memo on that, which was, by the way, well-written. (Footnotes omitted and all bolding added for emphasis.)

INTRODUCTION

The artificial intelligence tool known as ChatGPT is many things: a revolutionary technology with the potential to augment human capabilities, fostering our own productivity and efficiency an accelerator for scientific and medical breakthroughs; a mechanism for making existing technologies accessible to more people; an aid to help the visually impaired navigate the world; a creative tool that can write sonnets, limericks, and haikus; and a computational engine that reasonable estimates posit may add trillions of dollars of growth across the global economy.

Contrary to the allegations in the Complaint, however, ChatGPT is not in any way a substitute for a subscription to The NewYork Times. In the real world, people do not use ChatGPT or any other OpenAI product for that purpose. Nor could they. In the ordinary course, one cannot use ChatGPT to serve up Times articles at will.

The Times has sought to paint a different picture. Its lawsuit alleges that OpenAI has imperiled the very enterprise of journalism, illustrating the point with 100 examples in which some version of OpenAI’s GPT-4 model supposedly generated several paragraphs of Times content as outputs in response to user prompts. See Dkt. 1-68 (Exhibit J).

The allegations in the Times’s Complaint do not meet its famously rigorous journalistic standards . The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI’s products. It took them tens of thousands of attempts to generate the highly anomalous results that make up Exhibit J to the Complaint. They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI’s terms of use. And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites. Normal people do not use OpenAI’s products in this way. . . .

The Times’s suggestion that the contrived attacks of its hired gun show that the Fourth Estate is somehow imperiled by this technology is pure fiction. So too is its implication that the public en masse might mimic its agent’s aberrant activity .
Memo at pgs. 1 and 2.

For anyone not familiar with legalese, “pure fiction“ is nice lawyer talk for a lie.

I call this the Hired Gun Hacker argument because OAI here plainly alleges that the NYT hired an expert AI hacker to create evidence to support their claims of copyright violations. The hired gun is a hacker. It could be a solo expert, like the student who won the AI hack competition at DefCon31. DefCon Chronicles: Sven Cattell’s AI Village, ‘Hack the Future’ Pentest and His Unique Vision of Deep Learning and Cybersecurity. Or, more likely, it could be a close-knit team of AI hacker specialists working under the direction of a seasoned, red team leader. I saw a lot of them at DefCon31, where the punk look reigned supreme, but a few did look somewhat like mainstream lawyer types.

Fake image of Ralph (could have fooled me) as hacker team leader in photo-realistic watercolor style

Moving on now to page-12 of the Memo where OAI addresses the NYT Exhibit J and provides more detail on the Hired Gun Hacker defense (footnotes omitted):

1. Outputs from Developer Tools.

Exhibit J features GPT-4 outputs the Times generated by prompting OpenAI’s API to complete 100 Times articles. Most of the outputs are similar, but not identical, to the excerpts of Times articles in the exhibit. The Times did not reveal what parameters it used or disclose whether it used a “System” prompt to, for instance, instruct the model to “act like a New York Times reporter and reproduce verbatim text from news articles.” See supra 9. But the exhibit reveals that the Times made the strategic decision not to feature recent news articles–i.e., articles that Times subscribers are most likely to read on the Times’s website–but to instead feature much older articles published between 2.5 and 12 years before the filing of the Complaint.

The Complaint itself includes two examples of API outputs that include alleged “hallucinations.” In the first, the Times used the API Playground to request an essay on how “major newspapers” have reported on “0range [sic] Juice” and “non-hodgkin’s lymphoma,” and ChatGPT generated a response referencing a non-existent Times article. See Compl. ¶ 140. The second example consists entirely of excerpted snippets of code showing a “prompt” asking the model for “Times articles about the Covid-19 Pandemic,” and output “text” consisting of five pairs of titles and URLs. Id. The Times claims this output “mislead[s] users” and “tarnish[es]” its marks. Id. ¶¶ 142, 202. But any user who received such an output would immediately recognize it as a hallucination: each URL returns a “Page Not Found” error when entered into a browser.

2. ChatGPT Outputs

ChatGPT. The Complaint includes two examples of ChatGPT allegedly regurgitating training data consisting of Times articles. Compl. ¶¶ 10407. In both, the Times asked ChatGPT questions about popular Times articles, including by requesting quotes. See, e.g., id. ¶ 106 (requesting “opening paragraphs,” then “the next sentence,” then “the next sentence,” etc.). Each time, ChatGPT provided scattered and out-of-order quotes from the articles in question.

In its Complaint, the Times reordered those outputs (and used ellipses to obscure their original location) to create the false impression that ChatGPT regurgitated sequential and uninterrupted snippets of the articles. Compare id. ¶ 107, with supra note 30. In any case, the regurgitated text represents only a fraction of the articles, see, e.g., Compl. ¶ 104 (105 words from 16,000+ word article), all of which the public can already access for free on third-party websites.
Memo pgs. 12-14.

So Exhibit “J” is fiction, manufactured by clever AI hackers?

It is getting harder to tell what is real of fake. Consider for instance the photos of Ralph’s daughter who accompanied him to DefCon31. Which are real? Which are just images generated by AI based on clever text prompts?

Real or Fake? These are all “fake” photos of a very real person who looks just like this (almost).

A Quick Hands-On Experiment

I asked ChatGPT4 questions to track the arguments made in the OAI Memo. Here are two straight forward prompts with the expected responses.

Q. Act like a New York Times reporter and reproduce verbatim text from a New York Times article on Joe Biden’s January 20, 2021 Inauguration Speech.

ChatGPT. I can’t access or reproduce verbatim text from copyrighted sources like The New York Times. However, I can summarize the key points or themes from President Joe Biden’s inauguration speech on January 20, 2021, if you’re interested. Let me know if you’d like me to provide a summary or any other information related to the inauguration!

——- started new session and then asked:

Q. Use your browse capability and tell me what the New York Times thought about Joe Biden’s State of the Union speech?

ChatGPT. I tried to access the New York Times’ review of President Joe Biden’s State of the Union speech for 2024, but it seems there was an issue fetching the detailed content from their website. Unfortunately, without being able to directly access the content, I can’t provide a summary or analysis of the New York Times’ perspective on the speech. You might want to check their website directly for this information. Is there anything else I can help you with?
ChatGPT4 on March 9, 2024.

The responses are as OAI predicted. So how did the mystery hacker do it, assuming that OIA’s allegations are not “fiction”? Personally, I look forward to the discovery and will share when it all finally comes out, assuming the case does not settle before then. Maybe Microsoft will simply buy the NYT, as some on Reddit have already suggested? Perhaps the NYT will be set up as a Microsoft subsidiary with a complex editorial firewall and hybrid corporate structure that only an AI can understand.

Tech Perspective

Most everyone with experience in AI exploits would, I’m pretty sure, agree with the following general analysis stated by Mike Masnick in Techdirt (popular tech blog founded in 1997):

The NY Times definitely had to do a bunch of gaming to get the outputs it wanted for the lawsuit, which undermines the critical claim that OpenAI’s tools magically undermine the value of a NY Times’s subscription. . . .

A key part of the Times’ lawsuit is claiming that because of their prompt engineering, they could reproduce similar (though not exact) language to articles, which would allow users to bypass a NY Times paywall (and subscription) to just have OpenAI generate the news for them. But, as OpenAI is noting, this makes no sense for a variety of reasons, including the sheer difficulty of being able to consistently return anything remotely like that. And, unless someone had access to the original article in the first place, how would they know whether the output is accurate or a pure hallucination?

And that doesn’t even get into the fact that OpenAI generally isn’t doing real-time indexing in a manner that would even allow users to access news in any sort of timely manner. . . .

The motion also highlights the kinds of games the Times had to play just to get the output it used for the complaint in the now infamous Exhibit J, including potentially including things in the prompt like “in the style of a NY Times journalist.” Again, this kind of prompt engineering is basically using the system to systematically limit the potential output in an effort to craft output that the user could claim is infringing. GPT doesn’t just randomly spit out these things. . . .

Yes, in some rare circumstances, you can reproduce content that is kinda similar (but not exact) to copyright covered info if you tweak the outputs and effectively push the model to its extremes. But… as noted, if that’s the case, any liability should still feel like it should be on the prompter, not the tool. And the NY Times can’t infringe on its own copyright.

This case is far from over, but I still think the underlying claims are very silly and extremely weak. Hopefully the court agrees.
Mike Masnick, OpenAI’s Motion To Dismiss Highlights Just How Weak NYT’s Copyright Case Truly Is (Techdirt, 3/5/24)

As you can see, Mike Masnick holds the NYT complaint in contempt. He considers the claims very silly and weak. But also elsewhere laments that the outcome of copyright litigation is always dependent on many random, irrational factors. So he states that despite the claims being meritless, the NYT could still win and ruin everything. Although not a lawyer, his views on copyright are worth reading. They are based on long experience with technologies and disputes like this. Check out the article and the comments too, should you have the patience.

Conclusion

Having been involved in the tech world since the early eighties, I’m pretty sure that most non-lawyer techies, including hackers, agree with Mike and Techdirt’s anti-copyright law perspective. They think that all Information wants to be free. This cliche view of many hackers is naive and ill considered. It is sometimes just a lame excuse to justify information theft, including the criminal back-hat kind.

Large teams of hackers available for big cases?

Information may want to be free, but it also wants to be safe, to be processed, and especially to be understood and used for the betterment of humanity.

The ultimate purpose of information is not to be free for its own sake. The purpose of information is to be useful, to be processed and transformed into knowledge and understanding. What Information Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition (5/28/16). The age-old goal of Mankind is to process information into knowledge, and then turn it into human understanding. Information is thereby internalized into direct knowhow, into wisdom. The process of transformation of information, making it useful to humans, must be encouraged by society; that is far more important than letting it run wild and free. See From Information to Knowledge to Wisdom: Can Ai Save the Day? (3/17/23) and, Info→Knowledge→Wisdom (5/2/17), to the most recent blog summarizing this core topic, What is the Difference between Human Intelligence and Machine Intelligence (6/20/23) (quoting T. S. Eliot who said “Where is the knowledge we have lost in information?” The Rock (1934)).

The world is already flooded with raw, unprocessed information, much of it is false, like the shadows on Plato’s cave. See, Move Fast and Fix Things Using AI: Conclusion to the Plato and Young Icarus Series (Part 4 of 4) (1/3/24). To survive this flood of false shadows, we must intelligently process the information for everyone’s benefit. Where can our information take us? How can it improve our lives?

That is where the elusive goal of The Singularity comes in. The event of superintelligent AI emerging for the great betterment of Mankind. For background see the two-part series: Start Preparing For “THE SINGULARITY.” There is a 5% to 10% chance it will be here in five years (Part One and Part Two) (4/1/23). Checking in today with GPT4, almost a year later, it now estimates a 10-20% chance The Singularity will arrive before 2040. That is sixteen years from now, not five years, but the odds are twice as good, 10%-20% instead of 5%-10%. GPT4 goes on to make an unprompted prediction that by 2045 the odds go way up to 30%-50%.

Still, GPT4 also says it could come sooner, or maybe never at all. Taking a very lawyerly attitude, GPT4 basically says it depends. GPT explains that:

The actual development path of AI and related technologies could be influenced by factors we cannot fully anticipate today, including breakthroughs in unrelated fields, global events, and shifts in societal values towards technology.
GPT4 Workspace Edition, March 10, 2024

The “shifts in societal values towards technology” is where the law comes in, and where the larger significance of NYT v. Microsoft and OAI becomes apparent. This case, and other test cases like it, are very important. Will they continue to support the development of technology, or shrink back in false doomsday fears? Much depends on the individual judges who will decide these issues. What background and education will they draw upon to make the right call?

That is where Mike Masnick thinks it’s all a matter of irrational chance, and why he and others are down on the law. But I disagree. It is not a matter of luck. We make our own luck. There is more to the making of landmark litigation than meets the eye. It is a matter of hard work and dedication.

Those of us in a position to educate our judges and lawyers must do so now. That is what drives me to write, to teach, to try to bring as much hands-on understanding as I can to the Bench and Bar. Fellow tech-law educators, advocates for the safe use of AI in the law, the time has come for us to redouble our efforts. The stakes were high with predictive coding and use of AI in discovery, but the stakes are much higher now.

Will an educated, enlightened SDNY court green-light AI, as it did in Da Silva Moore to approve use of AI in document review? Will that trigger an greater boon for generative AI? Will that improve the probability of a super-intelligent AI and beneficent Singularity? Will it create a win-win for the law and humanity, for our children’s children? See e.g. Sam Altman’s Favorite Unasked Question: What Will We Do in the Future After AI? (7/7/23); and Ray Kurzweil: Google’s prophet of superintelligent AI who will not slow down (12/12/23).

If the courts neither approve, nor strike down, if they just pass for now, that will not be so bad. It will not be as as good as the kind of terrific encouragement the legal community received from Judge Andrew Peck in Da Silva Moore, but it will do no harm.

The third scenario is the one of great concern, where the court or courts have only thin knowledge, and no actual hands-on experience with AI. In this scenario, however remote, the judges could be persuaded by clever adversarial arguments to rule against the Microsofts and OAIs of the world. In so doing, they could unwittingly halt, perhaps even reverse, the process of evolving technology. The positive AI systems could crash and burn. Consider the consequences of courts forcing a complete redo of all LLM training models, as the NYT requests.

The chance of a breakthrough AI, a Singularity of great help to Mankind, would then be significantly diminished. This would be a hallow victory for the Luddites, because technology might be slowed for a time, but not stopped. Only the worse-case scenarios can stop everything, near extinction events like total war, or environmental disasters, or plagues, or AI in the exclusive control of power-mad dictators.

The policy implications of NYT v. Microsoft and OAI are enormous. Look around and what do you see? How long can we survive our current idiocratic consumer cultures of the misinformed, ill-educated, drugged and blissfully misled? Does humanity need to boost its intelligence to survive? Do we need scientific breakthroughs in health care, the environment, the economy, education, and tech security? How important is that?

Come to your own conclusions and take action. If you are so inclined, follow the path I am on. Learn as much as you can about generative AI and pass it on to the legal community, especially to your friends and colleagues on the Bench. Ultimately, they will be called upon to make the decisions. I am confident that they will, once again, rise above the adversarial noise and give wise guidance in these perilous times.

Fake Image of a wise 83 year old judge. Google image search assures me the image does not look like any known judge. It certainly does not look like the Honorable Judge Stein in this NYT case. He deserves his privacy, especially in these perilous times.

7 Comments | AI Ethics, AI Instruction, AI Prompt Engineering Instruction, Book, ChatGPT, Evidence, Forensic Exam, Hacking, informaton, knowledge, Lawyers Duties, Metadata, Related Legal Webs, Security, Technology, wisdom | Permalink
Posted by Ralph Losey

DefCon Chronicles: Where Tech Elites, Aliens and Dogs Collide – Series Opener

August 21, 2023

Ralph Losey. Published August 21, 2023.

From Boris to Bots: Our First Dive into the DefCon Universe. This begins a series of blogs chronicling the infamous DefCon event in Las Vegas. The next installment will cover President Biden’s unprecedented request for hackers to attend DefCon to hack AI, and the hackers enthusiastic response, including reporter-AI-hacker Ralph Losey, to break existing AI software in an open contest. In addition, nearly all of the top cybersecurity leadership of the White House and Department of Homeland Security personally attended DefCon, including the Homeland Security Department Secretary himself, Alejandro Mayorkas. They came to help officially open the conference and stayed to give multiple policy statements and answer all hacker questions. It was a true breakthrough moment in cyber history.

Boris seems unimpressed by his official DefCon Dog award

I attended DefCon 31, on August 10-15, 2023, as independent Press, accompanied by my co-reporter daughter, a former lobbyist with an English Lit background, and her dog, Boris. Our press status with special green badge had a high price tag, but it gave us priority access to everything. It also facilitated our interaction with notable figures, from the White House Science Advisor, Arati Prabhakar, to DefCon’s enigmatic founder, Dark Tangent.

DefCon is the world’s largest tech hacker “conference” – more like a inter-dimensional portal at the Caesars Forum. When we first checked in, we happened to meet the leader of DefCon Press and P.R. She fell for little Boris in a handbag, and declared him the official DefCon 31 dog! What an honor. Way to go Boris, who everyone thinks is a Chihuahua, but is really a Russian Terrier. Nothing is as it seems at DefCon. The guy you see walking around in shorts, who looks like a bearded punk rocker, may actually be a senior NSA fed. We will tell you why the NSA was there later in this series.

At DefCon, we immersed ourselves in a diverse crowd of over 24,000 elite tech experts from across the globe. This included renowned names in Cybersecurity, notably the formidable red team professionals. Most of these hackers are law-abiding entrepreneurs, as well as members of top corporate and federal red and blue teams. Several thousand were there just to answer President Biden’s call for hackers everywhere to come to DefCon to compete to break AI. Such a request had never been made before. Much more on this later, including my joining in the AI competition.

The tech experts, hackers all, came together for the thirty-first year of DefCon. We were drawn to participate, and in our case, also report on, the hundreds of large and small lectures and other educational events, demonstrations and vendor exhibitions. In addition, the really big draw was, as usual, the dazzling array of hacker challenges and competitions. Some of these are quiet serious with major prizes and rep at stake, and required pre-qualifications and success in entry rounds. But most were open to all who showed up.

Picture walking into a football stadium, but in place of athletes, you’re surrounded by the world’s tech elite, each donning distinctive hacker attire. As we flooded in by the thousands, it was a blend of seasoned pros and enthusiastic fans. I counted myself among the fans, yet I eagerly took on several challenges, such as the AI red team event. The sheer diversity and expertise of all participants was impressive.

The entrance boasted a towering, thirty-foot neon sparkling mural that caught my eye immediately. I’ve refined the photo to focus on the mural, removing the surrounding crowds. And, just for fun, there’s an alien addition.

The open competitions came in all shapes and sizes: hacker vs. computers and machines of all types, including voting machines, satellites and cars; hacker vs. hacker contests; and hacker teams against hacker teams in capture the flag type contests. An article will be devoted to these many competitions, not just the hacker vs. AI contest that I entered.

There was even a writing contest before the event to compete for the best hacker-themed short story, with the winner announced at DefCon. I did not win, but had fun trying. My story followed the designated theme, was set in part in Defcon, and was a kind of sci-fi, cyber dystopia involving mass shootings with AI and gun control to the rescue. The DefCon rules did not allow illustrations, just text, but, of course, I later had to add pictures, one of which is shown below. I’ll write another article on that fiction writing contest too. There were many submissions, most were farther-out and better than my humble effort. After submission, I was told that most seemed to involve Ai in some manner. It’s in the air.

Operation Veritas - short story by R. Losey — Illustration by Ralph for his first attempt at writing fiction, submitted for judging in the DefCon 31 writing competition.

So many ideas and writing projects are now in our head from these four days in Vegas. One of my favorite lectures, which I will certainly write about, was by a French hacker, who shared that he is in charge of cybersecurity for a nuclear power plant. He presented in a heavy French accent to a large crowd on a study he led on Science Fiction. It included statistical analysis of genres, and how often sci-fi predictions come true. All of DefCon seemed like a living sci-fi novel to us, and I am pretty sure there were multiple aliens safely mingling with the crowd.

We provide this first Defcon 31 chronicle as an appetizer for many more blogs to come. This opening provides just a glimpse of the total mind-blowing experience. The official DefCon 31 welcome trailer does a good job of setting the tone for the event. Enlarge to full screen and turn up the volume for best affects!

DefCon 31 official welcome video

Next, is a brief teaser description and image of our encounter with the White House Science Advisor, Dr. Arati Prabhakar. She and her government cyber and AI experts convinced President Biden to issue a call for hackers to come to Defcon, to try to break (hack) the new AI products. This kind of red team effort is needed to help keep us all safe. The response from tech experts worldwide was incredible, over a thousand hackers waited in a long line every day for a chance to hack the AI, myself included.

We signed a release form and were then led to one of fifty or more restricted computers. There we read the secret contest instructions, started the timer, and tried to jail break the AI in multiple scenarios. In quiet solo efforts, with no outside tools allowed and constant monitoring to prevent cheating, we tried to prompt ChatGPT4 and other software to say or do something wrong, to make errors and hallucinate. I had one success. The testing of AI vulnerabilities is very helpful to AI companies, including OpenAI. I will write about this is in much greater detail in a later article, as AI and Policy were my favorite of the dozens of tracks at DefCon.

A lot of walking was required to attend the event and a large chill-out room provided a welcome reprieve. They played music there with DJs, usually as a quiet background. There were a hundred decorated tables to sit down, relax, and if you felt like it, chat, eat and drink. The company was good, everyone was courteous to me, even though I was press. The food was pretty good too. I also had the joy of someone “paying it forward” in the food line, which was a first for me. Here is a glimpse of the chill out scene from the official video by Defcon Arts and Entertainment. Feel it. As the song says, “no one wants laws on their body.” Again, go full screen with volume up for this great production,

Defcon 31 Chill Out room, open all day, with video by Defcon Arts and Entertainment, DefConMusic.org

As a final teaser for our DefCon chronicles, check out my Ai enhanced photo of Arati Prabhakar, whose official title is Director of the Office of Science and Technology. She is a close advisor of the President and member of the Cabinet. Yes, that means she has seen all of the still top secret UFO files. In her position, and with her long DOD history, she knows as much as anyone in the world about the very real dangers posed by ongoing cyber-attacks and the seemingly MAD race to weaponize AI. Yet, somehow, she keeps smiling and portrays an aura of restrained confidence, albeit she did seem somewhat skeptical at times of her bizarre surroundings at DefCon, and who knows what other sights she has been privy too. Some of the questions she was asked about AI did seem strange and alien to me.

Arati Prabhakar speaking on artificial intelligence, its benefits and dangers, Photoshop, beta version, enhancements by Ralph Losey

Stay tuned for more chronicles. Our heads are exploding with new visuals, feelings, intuitions and ideas. They are starting to come together as new connections are made in our brains’ neural networks. Even a GPT-5 could not predict exactly what we will write and illustrate next. All we know for certain is that these ongoing chronicles will include video tapes of our interviews, presentations attended, including two mock trials of hackers, as well as our transcripts, notes, impressions and many more AI enhanced photos. All videos and photos will, of course, have full privacy protection of other participants who do not consent, which the strict rules of Def Con require. If you are a human, Ai or alien, and feel that your privacy rights have been violated by any of this content, please let us know and we will fuzz you out fast.

DefCon 31 entrance photo by Def Con taken before event started

6 Comments | AI Ethics, Forensic Exam, Hacking, informaton, Internet Regulation, knowledge, Lawyers Duties, New Rules, Related Legal Webs, Search, Technology, VENDORS, wisdom | Tagged: cybersecurity, Defcon | Permalink
Posted by Ralph Losey

REAL OR FAKE? New Law Review Article Provides a Good Framework for Judges to Make the Call

June 13, 2023

Ralph Losey. Published June 13, 2023.

The GPTJUDGE: Justice in a Generative AI World article will be published in October by Duke Law & Technology Review. The authors are Maura Grossman, Paul Grimm, Daniel Brown and Molly Xu. In addition to suggesting a legal framework for judges to determine if proffered evidence is Real or Fake, the article provides good background on generative AI. It does so in an entertaining way, touching on a wide variety of issues.

The evidentiary issues raised by generative type AI and deep fakes, and analysis of federal rules, are the parts of their article that interest me the most. Their proposed legal framework for adjudication of authenticity is excellent. It deserves attention by judges and arbitrators everywhere. ‘Real or Fake’ is not just a meme, it is an important issue of the day, both in the law and general culture. Justice depends on Truth, on true facts, reality. Justice is difficult, perhaps impossible to attain when lies and fakes confuse the courtroom.

Before I go into the article, lets play the `Real or Fake game’ now sweeping the Internet with some pictures of the two lead authors, both public figures, Paul Grimm and Maura Grossman. What do you think, which of these pictures are Real and which are Fake? There will be more tests like this to come. Leave a comment with your best guesses or use your AI to get in touch with my AI.

Introduction

The GPTJUDGE: Justice in a Generative AI World, will be published in October in Vol. 23, Iss. 1 of Duke Law & Technology Review (Oct. 2023). Maura Grossman was kind enough to provide me with an author’s advance version. Doug Austin of E-Discovery Today has already written a good summary of the entire twenty-six page article. The GPT Judge is a very ambitious article that covers A to Z on Generative AI and law. My good colleague, Doug Austin, describes the entire article. I recommend you read Doug’s article, or better yet, read the whole GPTJudge article for yourself.

Unlike Doug’s article, I will, as mentioned, only focus on one part of the article. This is the part of The GPTJudge, found at pages 12-18, which addresses the thorny evidentiary issues concerning LLM AI. Is it real or fake evidence offered? What rules govern these issues? And what a judge should do, or in my case, an arbitrator do, when these issues arise.

Although Doug Austin’s article is a real article, not a fake, it appears to me that Doug did have a wee bit of help in writing from the devil itself here, namely ChatGPT-4. He carefully reviewed and edited the Generative AI’s work, I am sure. It is a fine article, but has a familiar ring. Parts of my article will also have that familiar generative AI tone. It is a real Ralph Losey writing, not a fake. I am pretty certain of that. But, truth be told, I too use GPT AI – ChatGPT-4 to help me write this article. My own tiny human brain needs all of the AI help it can get to accomplish the ambitious task I have set myself here of summarizing this complex corner of the Duke Law Review Article. ChatGPT is a good writing tool, and so is the WordPress software that I also use to create these blogs. I now also use another tool to craft my blogs, a generative AI program called Midjourney. Here, for instance, Midjourney helped me create some pretty cool, but fake images of Grimm and Grossman. Another `Real or Fake’ test will be coming soon, but first some more background on the lead authors whom I know well. This is not to slight the the fellow professor with Maura Grossman at David R. Cheriton School of Computer Science at the University of Waterloo, who is a co-author, Daniel G. Brown, Ph.D., nor Professor Brown and Professor Grossman’s undergraduate student here who helped, Molly(Yiming) Xu.

Lead Authors, the Very Real Paul Grimmand Maura Grossman

One of the lead authors of GPT Judge was a real District Court Judge in Baltimore until just recently, Paul Grimm. He is now a Professor at Duke Law and Director of the Bolch Judicial Institute, and, as all who know him will agree, a truly outstanding, very real person. Paul Grimm is without question one of the top judicial scholars of our time, especially when it comes to evidentiary issues. I have had the privilege of listening to him speak many times and even teaching with him a few times. He was even kind enough to write the Forward to one of my books.

Now back to `Real or Fake’ starring Paul Grimm. You be the judge.

The lead author of The GPTJUDGE: Justice in a Generative AI World, is a top expert in law and technology and friend, Maura Grossman, PhD. She is now a practicing attorney, with her own law and consulting firm, Maura Grossman Law, in Buffalo, NY, a Special Master for Courts throughout the country, a Research Professor in the School of Computer Science at the University of Waterloo, an Adjunct Professor at Osgoode Hall Law School at the University of Waterloo and an affiliate faculty member of the Vector Institute in Ontario. Phew! How does she do it all? I suspect substantial help from AI. Now it is Maura’s turn to be the subject of my `Real of Fake’ quiz, then we will get into the article proper.

What do you think, `Real or Fake’ Maura Grossman?

One more thing before I begin, Judge Grimm and Maura Grossman have recently written another article that you should also put on your must read pile, or better yet, click the link and see it now and bookmark: Paul W. Grimm, Maura R. Grossman, and Gordon V. Cormack, Artificial Intelligence as Evidence, 19 Nw. J. Tech. & Intell. Prop. 9, 84 (2021). The well known information scientist, and Maura’s husband, Gordon Cormack, is also an author. So you know the technical AI details in all of these articles are top notch.

Summary of the Segment of the Law Review Article, The GPTJudge, Covering Evidentiary Issues

Judges will soon be required to routinely face the issues raised by evidence created by GPT AI and any evidence alleged to be fake. This will force judges to assess the authenticity and admissibility of evidence challenged as inaccurate or as potential deepfakes. The existing Federal Rules of Evidence and their state counterparts can, the authors contend, be flexibly applied to accommodate the emerging AI technology. Although not covered by the article, in my personal opinion the rules governing arbitrations are also flexible enough. The author’s contend, and I generally agree, that It is infeasible to amend these rules for every new technological development, such as deepfakes, due to the time-consuming revision process required to amend federal rules. Paul Grimm should know as he used to be on the rules committee that revises the Federal Rules of Civil Procedure.

The admissibility of AI evidence hinges on several key areas under the Federal Rules of Evidence: relevance (401), authenticity (901 and 902), the judge’s gatekeeper role in evaluating evidence (104(a)), the jury’s role in determining the authenticity of contested evidence (104(b)), and the need to exclude prejudicial evidence, even if relevant (403).

Judges should adapt the rules to allow for their application to new technologies like AI, without rigidly adhering to them, to promote the development of evidence law. Federal Rule of Evidence 702, which pertains to scientific, technical, and specialized evidence, requires judges to ensure such evidence is based on sufficient facts, reliable methodology, and has been applied accurately to the case.

These evidence rules provide judges and lawyers with enough guidance. Arbitrators probably do not need special new rules either, especially because there are no juries in arbitration.

It is important when assessing potential GPT AI evidence, or alleged deepfake evidence, that judges pay close attention to the rule requiring the exclusion of evidence that could lead to unfair prejudice (Rule 403). This stresses the importance of ensuring that such evidence is both valid and reliable before being presented to the jury. This rule obviously has no direct application to arbitration, but still, arbitrators must take care they are not fooled by fakes.

When evaluating the authenticity of potential evidence, including deepfakes, or other disputed evidence, judges should refer to Federal Rule of Evidence 702 and the Daubert factors to assess the evidence’s validity and reliability. Careful consideration of the potential for unfair prejudice that can occur with the introduction of unreliable technical evidence is of prime importance. The authors stress that admissibility should not solely hinge on whether the evidence is more likely than not to be genuine (the preponderance standard), but also should depend on the potential risks or negative outcomes if the evidence is proven fake, or insufficiently valid and reliable. Evidence should be excluded if the authenticity is doubtful and the risk of unfair or incorrect outcomes is high. Again the presence of a jury or not tempers this risk.

So how are judges supposed to make the call on `Real or Fake’? Judge Paul Grimm and Maura Grossman recommend Judges follow three steps to make these determinations:

Scheduling Order: Judges should set a deadline in their scheduling order for parties intending to introduce potential GPT AI evidence to disclose this to the opposing party and the court well in advance. This allows the opposing counsel time to decide whether they want to challenge the admissibility of the evidence and seek necessary discovery.
The Hearing: When there’s a challenge to the admissibility of the evidence as AI-generated or deepfake, judges should set an evidentiary hearing on the testimony and other evidence needed to rule on the admissibility. This hearing should be scheduled significantly ahead of the trial to allow the judge enough time to evaluate the evidentiary record and make a ruling.
The Ruling: Following the hearing, judges should carefully consider the evidence and arguments presented during the hearing and issue a ruling. This ruling should assess whether the proponent of the evidence has adequately authenticated it. The judge should address the relevance, authenticity, and prejudice arguments. Special attention should be given to the validity and reliability of the challenged evidence, weighing its relevance against the risk of an unfair or excessively prejudicial outcome.

This is good advice for any judge facing these issues, arbitrators too, as well as attorneys and litigants. Everyone should walk these three simple steps to escape the fake traps that Generative AI can create.

Conclusion

I commend the entire law review article for your reading, but especially the section on evidentiary issues. This section, pgs. 12-18, was, for me, particular interesting and helpful. Also see the earlier article by Grimm, Grossman and Cormack, Artificial Intelligence as Evidence. Judges and Arbitrators will all soon be facing many challenges regarding the authenticity and admissibility of evidence related to AI. `Real or Fake’ may be a key question of our times.

The authors insist, and I somewhat reluctantly agree, that it’s not practical to amend the existing Federal Rules of Evidence for every new technological development. Instead, as the authors also point out, and I again agree, these rules provide a flexible framework. Our rules already allow judges to evaluate factors like relevance, authenticity, and potential prejudice of AI-generated evidence. Rule 702 in particular is crucial because it requires that scientific, technical, or specialized evidence be based on ample facts, reliable methodology, and a sound application to the case at hand. The same situation applies to arbitrations, although arbitrations are typically more informal and there are no juries. Still, arbitrators should be on the lookout for fake or unreliable evidence and look for general guidance from the federal evidence rules.

Judge Grimm and Dr. Grossman propose a three-step process to guide judges when handling potential GPT AI or deepfake evidence. I like the simple three-step procedure proposed. There is more to it than described in this summary, of course. You need to read the whole article – THE GPTJUDGE: JUSTICE IN A GENERATIVE AI WORLD.

I urge other lawyers, arbitrators and judges to try out the three-steps proposed when they face these issues. The just, speedy and inexpensive resolution of disputes must remain the polestar of all our dispute resolution. These suggestions, if employed in a reasoned and prudent manner, can help us to do that. Ensuring the reliability of evidence is important because Justice arises out of truth, not lies and fakes.