BREAKING NEWS: Eleventh Circuit Judge Admits to Using ChatGPT to Help Decide a Case and Urges Other Judges and Lawyers to Follow Suit


Ralph Losey. Published June 3, 2024.

The Eleventh Circuit published a groundbreaking Concurring Opinion on May 28, 2024 by Judge Kevin C. Newsom on the use of generative AI to help decide contract interpretation issues. Snell v. United Specialty Ins. Co., 2024 U.S. App. LEXIS 12733 *; _ F.4th _ (11th Cir., 05/28/24). The case in question centered on the interpretation of an insurance policy. Circuit Judge Kevin C. Newsom not only admits to using ChatGPT to help him make his decision, but praises its utility and urges other judges and lawyers to do so too. His analysis is impeccable and his writing is superb. That is bold judicial leadership – Good News. I love his opinion and bet that you will too.

The only way to do the Concurring Opinion justice is to quote all of it, all 6,485 words. I know that’s a lot of words, but unlike ChatGPT, which is a good writer, Judge Newsom is a great writer. Judge Kevin C. Newsom, a Harvard law graduate from Birmingham, Alabama, is creative in his wise and careful use of AI. Judge Newsom added photos to his opinion and, as I have been doing recently in my articles, quoted in full the transcripts of the ChatGPT sessions he relied upon. He leads by doing and his analysis is correct, including especially his commentary on AI and human hallucinations.

Judge Newsom has an interesting, personal story to tell, and, unlike ChatGPT, he tells it in an amusing and self-effacing way. This is the first case of its kind and deserves careful study by lawyers and judges all over the world. Help me to get the word out by sharing his Concurring Opinion with your friends and colleagues. Your clients should see it too.

To spice it up a little, and because I can make my blogs as long as I want, which is unheard of these days, I add a few obvious editorial comments along the way (in red italics), including bolding. I do this to point out a few things, and add some deserved praise of this way cool opinion.

So settle in and prepare yourself for an interesting, clever read. I promise that it will be the best concurring opinion to an insurance contract case that you have ever read. Plus, since you are probably an AI enthusiast like me, you will want to cite and quote parts of this opinion for years to come. Hint: the parts that I embolden are probably going to be seen often in the coming days of AI jurisprudence. (If you see any errors, they are mine, not Judge Newsom’s, as I rushed without assistance to get this out to you quickly.)


Newsom, Circuit Judge, concurring:

I concur in the Court’s judgment and join its opinion in full. I write separately (and I’ll confess this is a little unusual1 ) simply to pull back the curtain on the process by which I thought through one of the issues in this case—and using my own experience here as backdrop, to make a modest proposal regarding courts’ interpretations of the words and phrases used in legal instruments.

Here’s the proposal, which I suspect many will reflexively condemn as heresy, but which I promise to unpack if given the chance: Those, like me, who believe that “ordinary meaning” is the foundational rule for the evaluation of legal texts should consider—consider—whether and how AI-powered large language models like OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude might—might—inform the interpretive analysis. There, having thought the unthinkable, I’ve said the unsayable.

Now let me explain myself.

I

First, a bit of background. [*26]  As today’s majority opinion recounts, both in the district court and before us, the parties litigated this as an “ordinary meaning” case. In particular, they waged war over whether James Snell’s installation of an in-ground trampoline, an accompanying retaining wall, and a decorative wooden “cap” fit within the common understanding of the term “landscaping” as used in the insurance policy that Snell had purchased from United Specialty Insurance Company.

So, for instance, the district court observed that “whether the claims are covered depends upon whether the performance of ‘landscaping’ would include Snell’s installation of the trampoline.” Doc. 23 at 10. Because the policy didn’t define the term “landscaping,” the court said, the coverage determination turned on whether Snell’s trampoline-related work fit the “common, everyday meaning of the word.” Id. at 10-11. Having reviewed multiple dictionary definitions provided by the parties, the court concluded that Snell’s work didn’t constitute “landscaping.” Id. at 13. As the majority opinion explains, the plain-meaning battle continued on appeal, with the parties “expend[ing] significant energy parsing the words of the policy, including [*27]  whether the site work necessary to install the trampoline was ‘landscaping.'” Maj. Op. at 17. Snell insisted, for example, that the district court had erred by “ignor[ing] the plain meaning of undefined terms” in the policy—most notably, “landscaping.” Br. of Appellant at 20, 21.

Now, as it turned out, we managed to resolve this case without having to delve too deeply into the definitional issue that the parties featured—due in large part to (1) a quirk of Alabama law that, according to the state supreme court, makes every insurance application ipso facto part of the policy that it precedes2 and (2) the fact that in his application Snell had expressly denied that his work included “any recreational or playground equipment construction or erection.” Maj. Op. at 17-18 (quotation marks omitted). Combined, those two premises yield the majority opinion’s controlling conclusion: “Snell’s insurance application—which Alabama law requires us to consider part of the policy—expressly disclaims the work he did here” and thus defeats his claim. Id. at 18.

Importantly, though, that off-ramp wasn’t always obviously available to us—or at least as I saw things, to me. Accordingly, I spent hours [*28]  and hours (and hours) laboring over the question whether Snell’s trampoline-installation project qualified as “landscaping” as that term is ordinarily understood. And it was midway along that journey that I had the disconcerting thought that underlies this separate writing: Is it absurd to think that ChatGPT might be able to shed some light on what the term “landscaping” means? Initially, I answered my own question in the affirmative: Yes, Kevin, that is positively absurd. But the longer and more deeply I considered it, the less absurd it seemed.

But I’m getting ahead of myself. I should tell the full story, from beginning to end. In what follows, I’ll first explain how my initial efforts to pinpoint the ordinary meaning of the term “landscaping” left me feeling frustrated and stuck, and ultimately led me—initially half-jokingly, later more seriously—to wonder whether ChatGPT and other AI-powered large language models (“LLMs”) might provide a helping hand. Next, I’ll explore what I take to be some of the strengths and weaknesses of using LLMs to aid in ordinary-meaning interpretation. Finally, given the pros and cons as I see them, I’ll offer a few ideas about how we—judges, lawyers, [*29]  academics, and the broader AI community—might make LLMs more valuable to the interpretive enterprise.


II

First things first. I’m unabashedly a plain-language guy—firmly of the view that “[t]he ordinary meaning rule is the most fundamental semantic rule of interpretation” and that it should govern our reading not only of “constitutions, statutes, [and] rules,” but also, as relevant here, of “private instruments.” Antonin Scalia & Bryan A. Garner, Reading Law: The Interpretation of Legal Texts 69 (2012). Accordingly, I take it as gospel truth that absent a clear indication that some idiosyncratic, specialized meaning was intended, “[w]ords are to be understood in their ordinary, everyday meanings.” Id.; accord, e.g., Shiloh Christian Ctr. v. Aspen Specialty Ins. Co., 65 F.4th 623, 629-30 (11th Cir. 2023) (Newsom, J.) (evaluating an insurance policy’s “plain language”); Heyman v. Cooper, 31 F.4th 1315, 1319-20 (11th Cir. 2022) (Newsom, J.) (evaluating a municipal ordinance’s “ordinary meaning”); Barton v. United States AG, 904 F.3d 1294, 1298-99 (11th Cir. 2018) (Newsom, J.) (evaluating a federal statute’s “ordinary meaning”).

So, following the district court’s lead, I did here what any self-respecting textualist would do when trying to assess the ordinary meaning of a particular word, here “landscaping”: I went to the dictionaries.3 In his brief, Snell had served up a buffet of definitions, ranging [*30]  from Dictionary.com’s—”to improve the appearance of (an area of land, a highway, etc.) as by planting trees, shrubs, or grass, or altering the contours of the ground”—to Wikipedia’s—”any activity that modifies the visible features of an area of land.” See Br. of Appellant at 22-23. My own research revealed, in addition, that Webster’s defined “landscaping” as “to modify or ornament (a natural landscape) by altering the plant cover,” Merriam-Webster’s Collegiate Dictionary 699 (11th ed. 2014), and that Oxford defined it to mean “improv[ing] the aesthetic appearance of (an area) by changing its contours, adding ornamental features, or by planting trees and shrubs,” Oxford Dictionary of English 991 (3d ed. 2010).

As occasionally happens, the dictionaries left a little something to be desired. From their definitions alone, it was tough to discern a single controlling criterion. Must an improvement be natural to count as “landscaping”? Maybe, but that would presumably exclude walkways and accent lights, both of which intuitively seemed (to me, anyway) to qualify. Perhaps “landscaping” work has to be done for aesthetic reasons? That, though, would rule out, for instance, a project [*31]  to regrade a yard, say away from a house’s foundation to prevent basement flooding. I once regraded my own yard, and while my efforts did nothing to improve my house’s curb appeal, they served an important functional purpose—and for what it’s worth, I definitely thought I was engaged in “landscaping.”

After languishing in definitional purgatory for a while, I decided to look at the case from a different perspective—and I do mean look. The record contains a series of photographs of Snell’s trampoline-related project. Here’s one, which shows his prep work—in particular, the empty sand pit and the below-ground retaining wall that reinforced its borders:

And another, which depicts the finished product, including both the polypropylene mat (the fun part) and the decorative wooden “cap”:

I’m not particularly proud of it, but I’ll confess that the photos affected the way I thought about the case. Nothing in them really struck me as particularly “landscaping”-y. The problem, of course, was that I couldn’t articulate why. And visceral, gut-instinct decisionmaking has always given me the willies—I definitely didn’t want to be that guy. So in a way, I felt like I was back to square one. (RL – Is Judge Newsom a great writer or what! Oddly enough, I understand exactly what he is saying here. Also note this kind of speech is way beyond ChatGPT’s ability to predict and generate, no matter what its temperature setting.)

And that’s [*32]  when things got weird. Perhaps in a fit of frustration, and most definitely on what can only be described as a lark, I said to one of my clerks, “I wonder what ChatGPT thinks about all this.” So he ran a query: “What is the ordinary meaning of ‘landscaping’?” Here’s what ChatGPT said in response:

“Landscaping” refers to the process of altering the visible features of an area of land, typically a yard, garden or outdoor space, for aesthetic or practical purposes. This can include activities such as planting trees, shrubs, flowers, or grass, as well as installing paths, fences, water features, and other elements to enhance the appearance and functionality of the outdoor space.

Interesting, for two reasons. First, ChatGPT’s explanation seemed more sensible than I had thought it might—and definitely less nutty than I had feared. Second, it squared with my own impression—informed by my own experience writing, reading, speaking, and listening to American English in the real world—that ordinary people might well use the word “landscaping” (1) to include more than just botanical and other natural improvements and (2) to cover both aesthetic and functional objectives. In fact, several of the [*33]  examples that ChatGPT flagged—”paths, fences, [and] water features”—jibed with the sorts of things that had sprung to mind when I first started thinking about the case.

Suffice it to say, my interest was piqued. But I definitely didn’t want to fall into the trap of embracing ChatGPT’s definition just because it aligned with my priors. (Bad.) So, in what might have been a mistake—more on that later—we went ahead and asked it the ultimate question: “Is installing an in-ground trampoline ‘landscaping'”? ChatGPT responded as follows:

“Yes, installing an in-ground trampoline can be considered a part of landscaping. Landscaping involves altering the visible features of an outdoor area for aesthetic or practical purposes, and adding an in-ground trampoline would modify the appearance and function of the space. It’s a deliberate change to the outdoor environment, often aimed at enhancing the overall landscape and usability of the area.”

For good measure, I posed the same questions to Google’s Bard (since replaced by Gemini).4 The precise details aren’t particularly important, but the upshot is that both models’ answers indicated that the trampoline-related work Snell had performed—the excavation [*34]  of the pit, the construction of the retaining wall, the installation of the mat, and the addition of the decorative wooden cap—just might be landscaping.

As I’ve said, intervening developments—in particular, the focus on a statement in Snell’s application, which Alabama law makes part of his policy and in which Snell disclaimed doing any recreation-related work—alleviated the need to settle on the ordinary meaning of the term “landscaping.” But my own process of working through the plain-language issue was a valuable one, if only because it got me thinking about what was to me a previously unimaginable possibility: Might LLMs be useful in the interpretation of legal texts? Having initially thought the idea positively ludicrous, I think I’m now a pretty firm “maybe.” At the very least, it seems to me, it’s an issue worth exploring.

So let’s explore.


III

In what follows, I’ll try to sketch out what I take to be some of the primary benefits and risks of using LLMs—to be clear, as one implement among several in the textualist toolkit—to inform ordinary-meaning analyses of legal instruments.


A

I’ll start with the pros as I see them, and then turn to the cons.

1. LLMs train on ordinary-language [*35]  inputs. (bold in original) Let me begin with what I take to be the best reason to think that LLMs might provide useful information to those engaged in the interpretive enterprise. Recall what is (for many of us, anyway) the “most fundamental semantic rule of interpretation”: Absent a clear indication that they bear some technical or specialized sense, the words and phrases used in written legal instruments “are to be understood in their ordinary, everyday meanings.” Scalia & Garner, Reading Law, supra, at 69. The premise underlying the ordinary-meaning rule is that “[i]n everyday life, the people to whom rules are addressed continually understand and apply them.” Id. at 71. Accordingly, the ordinary-meaning rule, as its name suggests, has always emphasized “common language,” Nix v. Hedden, 149 U.S. 304, 307, 13 S. Ct. 881, 37 L. Ed. 745, Treas. Dec. 14045 (1893), “common speech,” Sonn v. Magone, 159 U.S. 417, 421, 16 S. Ct. 67, 40 L. Ed. 203 (1895), and “common parlance,” Helix Energy Sols. Grp. v. Hewitt, 598 U.S. 39, 52, 143 S. Ct. 677, 214 L. Ed. 2d 409 (2023)—in short, as I’ve explained it elsewhere, “how people talk,” United States v. Caniff, 916 F.3d 929, 941 (11th Cir. 2019) (Newsom, J., concurring in part and dissenting in part), vacated and superseded, 955 F.3d 1183 (11th Cir. 2020).

The ordinary-meaning rule’s foundation in the common speech of common people matters here because LLMs are quite literally “taught” using data that aim to reflect and capture how individuals use language in their everyday lives. Specifically, the models train on a mind-bogglingly enormous [*36]  amount of raw data taken from the internet—GPT-3.5 Turbo, for example, trained on between 400 and 500 billion words5 —and at least as I understand LLM design, those data run the gamut from the highest-minded to the lowest, from Hemingway novels and Ph.D. dissertations to gossip rags and comment threads.6 Because they cast their nets so widely, LLMs can provide useful statistical predictions about how, in the main, ordinary people ordinarily use words and phrases in ordinary life.7 So, for instance, and as relevant here, LLMs can be expected to offer meaningful insight into the ordinary meaning of the term “landscaping” because the internet data on which they train contain so many uses of that term, from so many different sources—e.g., professional webpages, DIY sites, news stories, advertisements, government records, blog posts, and general online chatter about the topic.8

To be sure, LLMs’ training data aren’t a perfect [*37]  universe from which to draw hard-and-fast conclusions about ordinary meaning, principally because they don’t capture what I’ll call “pure offline” usages—i.e., those that neither (1) occur online in the first instance nor (2) originate offline, in hard copy, but are eventually digitized and uploaded to some online site. And indeed, the absence of offline usages from the training pool—and in particular, the implications for underrepresented populations—strikes me as a sufficiently serious concern that I’ve broken it out for separate discussion below. See infra at 21-23. Even so, those omissions aside, it seems to me scarcely debatable that the LLMs’ training data are at the very least relevant to the ordinary-meaning analysis. In fact, an LLM’s dataset may well be the most “perfectly imperfect” on offer because (1) scads of people either use the internet or create content that finds its way onto the internet (or more likely both), (2) the information available online reflects people’s use of terminology in a wide array of contexts and settings, from the sublime to the ridiculous, and (3) there’s little reason (that I can think of) to worry that writers and speakers whose communications [*38]  end up online manipulate the inputs (i.e., their words) in a way that might artificially skew the data.

Put simply, ordinary-meaning interpretation aims to capture how normal people use language in their everyday lives—and the bulk of the LLMs’ training data seem to reflect exactly that.9

2. LLMs can “understand” context. So far as I can tell, researchers powering the AI revolution have created, and are continuing to develop, increasingly sophisticated ways to convert language (and I’m not making this up) into math that computers can “understand.” See Yonathan A. Arbel & David A. Hoffman, Generative Interpretation, 99 N.Y.U. L. Rev. (forthcoming 2024) (manuscript at 26) (describing “attention mechanism,” a feature of LLMs that facilitates the recognition of how words are used in context). The combination of the massive datasets used for training and this cutting-edge “mathematization” of language enables LLMs to absorb and assess the use of terminology in context and empowers them to detect language patterns at a granular level. So, for instance, modern LLMs can easily discern the difference—and distinguish—between the flying-mammal “bat” that uses echolocation and may or may not be living in your attic, on the one hand, [*39]  and the wooden “bat” that Shohei Ohtani uses to hit dingers, on the other. See id. And that, as I understand it, is just the tip of the iceberg. LLM predictions about how we use words and phrases have gotten so sophisticated that they can (for better or worse) produce full-blown conversations, write essays and computer code, draft emails to co-workers, etc. And as anyone who has used them can attest, modern LLMs’ results are often sensible—so sensible, in fact, that they can border on the creepy. Now let’s be clear, LLMs aren’t perfect—and again, we’ll discuss their shortcomings in due course. But let’s be equally clear about what they are: high-octane language-prediction machines capable of probabilistically mapping, among other things, how ordinary people use words and phrases in context. (RL – Excellent reasoning here by Judge Newsom, again I think he’s got it right. Kudos to him and his clerks.)
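(RL – For readers curious what “language prediction” means at the most basic level, here is a toy sketch of my own, not the Judge’s, and vastly simplified: it counts which word tends to follow a given word in a tiny sample corpus and predicts the most frequent follower. Real LLMs operate over hundreds of billions of words with learned attention weights, not raw bigram counts; the sketch only conveys the statistical intuition, and its corpus and results are entirely made up for illustration.)

```python
from collections import Counter, defaultdict

# A tiny, hypothetical corpus containing both senses of "bat".
corpus = (
    "the bat flew out of the attic at dusk "
    "the bat flew into the night "
    "the bat hung upside down in the attic "
    "he swung the bat and hit the ball "
    "she gripped the bat at the plate"
).split()

# Build bigram counts: for each word, tally the words observed after it.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed follower of `word`, or None."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

print(predict_next("bat"))
```

Note that a bigram counter like this has no way to use wider context, which is exactly the limitation the attention mechanism the Judge mentions was designed to overcome: modern models weigh the whole surrounding sentence, not just the previous word, when distinguishing the attic “bat” from Ohtani’s.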

3. LLMs are accessible. LLMs are readily accessible (and increasingly so) to judges, lawyers, and, perhaps most importantly, ordinary citizens. In recent years, the use of LLMs has proliferated, and as with all other internet-related tools, one can only assume that usage will continue to accelerate, likely at an exponential rate. The LLMs’ easy accessibility is important in at least two respects. [*40]  First, it offers the promise of “democratizing” the interpretive enterprise, both (as already explained) by leveraging inputs from ordinary people and by being available for use by ordinary people. Second, it provides judges, lawyers, and litigants an inexpensive research tool. My “landscaping”-related queries, for instance, while no doubt imperfect, cost me nothing. To be sure, querying a more advanced LLM may come with a price tag, at least for now. But so does, for example, searching the Oxford English Dictionary, the online version of which exists behind a paywall.10 And I’d be willing to bet that the costs associated with even the more advanced LLMs pale in comparison to subscriptions for Westlaw and Lexis, which power most modern legal research, including some involving dictionaries.11 And of course there’s always the promise that open-source LLMs might soon approximate the for-profit models’ productivity.

4. LLM research is relatively transparent. Using LLMs to facilitate ordinary-meaning interpretation may actually enhance the transparency and reliability of the interpretive enterprise itself, at least vis-à-vis current [*41]  practice. Two brief observations.

First, although we tend to take dictionaries for granted, as if delivered by a prophet, the precise details of their construction aren’t always self-evident. Who exactly compiles them, and by what criteria do the compilers choose and order the definitions within any given entry? To be sure, we’re not totally in the dark; the online version of Merriam-Webster‘s, for instance, provides a useful primer explaining “[h]ow . . . a word get[s] into” that dictionary.12 It describes a process by which human editors spend a couple of hours a day “reading a cross section of published material” and looking for new words, usages, and spellings, which they then mark for inclusion (along with surrounding context) in a “searchable text database” that totals “more than 70 million words drawn from a great variety of sources”—followed, as I understand things, by a step in which a “definer” consults the available evidence and exercises his or her judgment to “decide[] . . . the best course of action by reading through the citations and using the evidence in them to adjust entries or create new ones.”13

Such explainers [*42]  aside, Justice Scalia and Bryan Garner famously warned against “an uncritical approach to dictionaries.” Antonin Scalia & Bryan A. Garner, A Note on the Use of Dictionaries, 16 Green Bag 2d 419, 420 (2013). They highlighted as risks, for instance, that a volume could “have been hastily put together by two editors on short notice, and very much on the cheap,” and that without “consult[ing] the prefatory material” one might not be able “to understand the principles on which the dictionary [was] assembled” or the “ordering of [the] senses” of a particular term. Id. at 420, 423.

To be clear, I’m neither a nihilist nor a conspiracy theorist, but I do think that we textualists need to acknowledge (and guard against the fact) that dictionary definitions present a few known unknowns. See id. at 419-28; cf. Thomas R. Lee & Stephen C. Mouritsen, The Corpus and the Critics, 88 U. Chi. L. Rev. 275, 286-88 (2021) (highlighting potential interpretive pitfalls associated with dictionaries). And while I certainly appreciate that we also lack perfect knowledge about the training data used by cutting-edge LLMs, many of which are proprietary in nature, see supra notes 6 & 8, I think it’s fair to say that we do know both (1) what LLMs are learning from—namely, tons and tons of internet data—and (2) one of the things that makes LLMs so useful—namely, their ability [*43]  to accurately predict how normal people use language in their everyday lives.

A second transparency-related thought: When a judge confronts a case that requires a careful assessment of a word’s meaning, he’ll typically consult a range of dictionary definitions, engage in a “comparative weighing,” Scalia & Garner, A Note, supra, at 422, and, in his written opinion, deploy one, two, or a few of them. The cynic, of course, will insist that the judge just dictionary-shopped for the definitions that would enable him to reverse-engineer his preferred outcome. See James J. Brudney & Lawrence Baum, Oasis or Mirage: The Supreme Court’s Thirst for Dictionaries in the Rehnquist and Roberts Eras, 55 Wm. & Mary L. Rev. 483, 539 (2013). I’m not so jaded; I trust that ordinary-meaning-focused judges genuinely seek out definitions that best fit the context of the instruments that they’re charged with interpreting. See, e.g., Hoever v. Marks, 993 F.3d 1353, 1366-68 (11th Cir. 2021) (en banc) (Newsom, J., concurring in judgment in part and dissenting in part) (choosing, based on contextual clues, from among competing definitions of the word “for”). Even so, I have to admit (1) that the choice among dictionary definitions involves a measure of discretion and (2) that judges seldom “show their work”—that is, they rarely explain in [*44]  any detail the process by which they selected one definition over others. Contrast my M.O. in this case, which I would recommend as a best practice: full disclosure of both the queries put to the LLMs (imperfect as mine might have been) and the models’ answers.

Anyway, I don’t mean to paint either too grim a picture of our current, dictionary-centric practice—my own opinions are chock full of dictionary definitions, I hope to good effect—or too rosy a picture of the LLMs’ potentiality. My point is simply that I don’t think using LLMs entails any more opacity or involves any more discretion than is already inherent in interpretive practices that we currently take for granted—and in fact, that on both scores it might actually involve less.

5. LLMs hold advantages over other empirical interpretive methods. One final point before moving on. Recently, some empiricists have begun to critique the traditional dictionary-focused approach to plain-meaning interpretation. Some, for instance, have conducted wide-ranging surveys of ordinary citizens, seeking to demonstrate that dictionaries don’t always capture ordinary understandings of legal texts. See, e.g., Kevin P. Tobia, Testing Ordinary Meaning [*45] , 134 Harv. L. Rev. 726 (2020). Others have turned to corpus linguistics, which aims to gauge ordinary meaning by quantifying the patterns of words’ usages and occurrences in large bodies of language. See, e.g., Thomas R. Lee & Stephen C. Mouritsen, Judging Ordinary Meaning, 127 Yale L.J. 788, 795 (2018).

On balance, reliance on LLMs seems to me preferable to both. The survey method is interesting, but it seems wildly impractical—judges and lawyers have neither the time nor the resources to poll ordinary citizens on a widespread basis. By contrast, as already explained, LLMs are widely available and easily accessible. And corpus methods have been challenged on the ground, among others, that those tasked with compiling the data exercise too much discretion in selecting among the inputs. See, e.g., Jonathan H. Choi, Measuring Clarity in Legal Text, 91 U. Chi. L. Rev. 1, 26 (2024). For reasons already explained, I don’t think LLM-based methods necessarily carry the same risk.

For all these reasons, and perhaps others I haven’t identified, it seems to me that it’s at least worth considering whether and how we might leverage LLMs in the ordinary-meaning enterprise—again, not as the be all and end all, but rather as one aid to be used alongside dictionaries, the semantic canons, [*46]  etc.


B

Now, let’s examine a few potential drawbacks. I suppose it could turn out that one or more of them are deal-killers. I tend to doubt it, but let’s put them on the table.

1. LLMs can “hallucinate.” First, the elephant in the room: What about LLMs’ now-infamous “hallucinations”? Put simply, an LLM “hallucinates” when, in response to a user’s query, it generates facts that, well, just aren’t true—or at least not quite true. See, e.g., Arbel & Hoffman, supra, at 48-50. Remember the lawyer who got caught using ChatGPT to draft a brief when it ad-libbed case citations—which is to say cited precedents that didn’t exist? See, e.g., Benjamin Weiser, Here’s What Happens When Your Lawyer Uses ChatGPT, N.Y. Times (May 29, 2023). To me, this is among the most serious objections to using LLMs in the search for ordinary meaning. Even so, I don’t think it’s a conversation-stopper. For one thing, LLM technology is improving at breakneck speed, and there’s every reason to believe that hallucinations will become fewer and farther between. Moreover, hallucinations would seem to be most worrisome when asking a specific question that has a specific answer—less so, it seems to me, when more generally seeking the “ordinary meaning” [*47]  of some word or phrase. Finally, let’s shoot straight: Flesh-and-blood lawyers hallucinate too. Sometimes, their hallucinations are good-faith mistakes. But all too often, I’m afraid, they’re quite intentional—in their zeal, attorneys sometimes shade facts, finesse (and even omit altogether) adverse authorities, etc. So at worst, the “hallucination” problem counsels against blind-faith reliance on LLM outputs—in exactly the same way that no conscientious judge would blind-faith rely on a lawyer’s representations. (RL – I love this part about human lawyers also hallucinating. This corresponds with my own experience as I have written before. I plan to quote this often. Pretty soon ChatGPT will be able to predict it!)

2. LLMs don’t capture offline speech, and thus might not fully account for underrepresented populations’ usages. I flagged this one earlier, but I think it’s a serious enough concern to merit separate treatment. Here’s the objection, as I see it: People living in poorer communities (perhaps disproportionately minorities and those in rural areas) are less likely to have ready internet access and thus may be less likely to contribute to the sources from which LLMs draw in crafting their responses to queries. Accordingly, the argument goes, their understandings—as manifested, for instance, in their written speech—won’t get “counted” in the LLMs’ ordinary-meaning assessment.

As [*48]  I say, I think this is a serious issue. Even so, I don’t believe it fatally undermines LLMs’ utility, at least as one tool among many for evaluating ordinary meaning. Ideally, of course, the universe of information from which any source of meaning draws would capture every conceivable input. But we should guard against overreaction. Presumably, LLMs train not only on data that were born (so to speak) online but also on material that was created in the physical world and only thereafter digitized and uploaded to the internet. And there is (I think) less reason to fear that those in underserved communities are at a dramatic comparative disadvantage with respect to the latter category. Moreover, to the extent we’re worried about a lack of real-world, documentary evidence representing underrepresented populations’ usages, then we have bigger fish to fry, because there’s reason to doubt the utility of dictionaries, as well—which, as Merriam-Webster‘s editors have explained, also rely on hard-copy sources to evaluate terms’ ordinary meanings. See supra at 16-17 & note 12. (RL – I agree with Judge Newsom’s commendable concerns here about bias of sorts built into the data, but like him, agree that in this legal situation at least, there is no reason for concern.)

Anyway, the risk that certain communities’ word-usage outputs aren’t adequately reflected in LLMs’ training-data inputs [*49]  is real, and I’d note it as a candidate for improvement, but I don’t think it’s either fatal or insurmountable.14

3. Lawyers, judges, and would-be litigants might try to manipulate LLMs. I suppose there’s a risk that lawyers and judges might try to use LLMs strategically to reverse-engineer a preferred answer—say, by shopping around among the available models or manipulating queries. Maybe, but that’s an evergreen issue, isn’t it? Although they shouldn’t, lawyers and judges can cast about for advantageous dictionary definitions and exploit the interpretive canons, but no one thinks that’s a sufficient reason to abandon those as interpretive tools. And if anything, I tend to think that the LLMs are probably less vulnerable to manipulation than dictionaries and canons, at least when coupled with (as I’ve tried to provide here) full disclosure of one’s research process. (RL- Very clever observation. It is also important to acknowledge that Judge Newsom is being fully transparent in his disclosure of use of AI. In fact, this is one of the most transparent and personally revealing opinions I have ever read.)

Relatedly, might prospective litigants seek to corrupt the inputs—the data on which the LLMs train and base their responses to user queries—in an effort to rig the system to spit out their preferred interpretations? It’s a real concern—perhaps especially considering that the same AI companies that have developed and [*50]  are training the LLMs might themselves be litigants. But given the nature of the technology as I understand it, hardly insurmountable. For one thing, most models embody some training “cutoff”—for instance, though things might have changed, it was once common knowledge that GPT-4 learned on data up to and including September 2021. See Open AI, GPT-4 Technical Report 10 (arXiv:2303.08774, 2024). Accordingly, it would likely be difficult, if not impossible, to pollute the inputs retroactively. More fundamentally, it seems almost inconceivable that a would-be malefactor could surreptitiously flood any given dataset with enough new inputs to move the needle—remember, just by way of example, that GPT-3.5 Turbo trained on more than 400 billion words. Finally, while I tend to doubt that any AI company would conclude that corrupting its own product in order to obtain an interpretive advantage in a single case was in its long-term business interest, that risk, it seems to me, could be mitigated, if not eliminated, by querying multiple models rather than just one. (RL – agree with this observation and the Judge’s conclusions.)

4. Reliance on LLMs will lead us into dystopia. Would the consideration of LLM outputs in interpreting legal texts inevitably put us [*51]  on some dystopian path toward “robo judges” algorithmically resolving human disputes? I don’t think so. As Chief Justice Roberts recently observed, the law will always require “gray area[]” decisionmaking that entails the “application of human judgment.” Chief Justice John G. Roberts, Jr., 2023 Year-End Report on the Federal Judiciary 6 (Dec. 31, 2023). And I hope it’s clear by this point that I am not—not, not, not—suggesting that any judge should ever query an LLM concerning the ordinary meaning of some word (say, “landscaping”) and then mechanistically apply it to her facts and render judgment. My only proposal—and, again, I think it’s a pretty modest one—is that we consider whether LLMs might provide additional datapoints to be used alongside dictionaries, canons, and syntactical context in the assessment of terms’ ordinary meaning. That’s all; that’s it. (RL – My only criticism of Judge Newsom’s Concurring Opinion is that he does not go further. I for one think judges should go much further in their use of generative AI, as I have written about previously. Appellate judges may be among the first to be routinely enhanced with AI. See e.g. Circuits in Session: How AI Challenges Traditional Appellate Dynamics (e-Discovery Team, 10/13/23); Circuits in Session: Addendum and Elaboration of the Appellate Court Judge Experiment (e-Discovery Team, 10/26/23); Circuits in Session: Analysis of the Quality of ChatGPT4 as an Appellate Court Judge (e-Discovery Team, 11/01/23). But I understand why Judge Newsom does not do that here. One step at a time and this opinion is an important first step.)

IV

Which brings me to my final question: If I’m not all wet, and it’s at least worth considering whether LLMs have a role to play in the interpretation of legal instruments, how might we maximize their utility? I’ve already flagged a few suggestions for improvement along the way—more data, from more sources, representing a more [*52]  representative cross-section of Americans. But beyond the obvious, what else? (RL – No Judge Newsom, you are not all wet. You are squeaky clean, on point and over-modest. If only we had more judges like you. Certainly LLMs have a very important role to play in the interpretation of legal instruments.)

First, I think it’ll be helpful to clarify the objective. Remember that in my clumsy first crack at this, I asked two different models two different questions: (1) “What is the ordinary meaning of ‘landscaping’?”; and (2) “Is an in-ground trampoline ‘landscaping’?” Which is the proper question? In retrospect, if my contention is—as it is—that LLMs might aid in the search for the ordinary, everyday meaning of common words and phrases, then it seems pretty clear to me that my first, more general query is the more appropriate one. The models’ highest and best use is (like a dictionary) helping to discern how normal people use and understand language, not in applying a particular meaning to a particular set of facts to suggest an answer to a particular question.

Second, and relatedly, how can we best query LLMs? Those in the know refer to the question a user asks a model as a “prompt.” I’ll confess that I gave relatively little thought to my own prompts—they were just the questions that immediately sprang to mind. But research indicates that the models can be sensitive to prompts and that the results can vary accordingly. [*53]  See, e.g., Arbel & Hoffman, supra, at 36. So it may be wise for users to try different prompts, and, importantly, to report the prompts they use and the range of results they obtain. Id. at 36-37. Better still to do all that and query multiple models to ensure that the results are consistent—or, in statistics-speak, “robust.” (RL – How can we best query LLM’s? That is indeed the key question of the day and the whole idea behind Prompt Engineering, a subject that has been the focus of my studies and experiments for some time now. I promise you Judge Newsom that many have been working hard on this challenge and should have a solution for this soon.)
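(RL – Judge Newsom’s suggestion here, to vary prompts, query multiple models, and disclose the full range of results, can be sketched in a few lines of Python. This is only my illustration of the workflow, not anything from the opinion: query_model is a hypothetical stand-in for a real vendor API call, and the model names are placeholders.)

```python
from itertools import product

def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call. In practice this
    would call a vendor's API; here it just echoes its inputs so the
    aggregation and disclosure logic can be shown."""
    return f"[{model}] answer to: {prompt}"

def robust_query(models, prompts):
    """Run every prompt variant against every model and keep the whole
    grid, so the complete research process can be reported."""
    return {
        (model, prompt): query_model(model, prompt)
        for model, prompt in product(models, prompts)
    }

grid = robust_query(
    models=["model-a", "model-b"],
    prompts=[
        "What is the ordinary meaning of 'landscaping'?",
        "How do ordinary speakers use the word 'landscaping'?",
    ],
)
for (model, prompt), answer in grid.items():
    print(f"{model} | {prompt} -> {answer}")
```

The point of keeping the whole grid, rather than a single cherry-picked answer, is exactly the transparency the opinion models: report the prompts used and the range of results obtained.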

Third, we need to clarify the particular output we’re after. The questions I asked sought a discrete, one-time answer. In particular, I asked for a single definition of “landscaping” and, separately, whether installation of an in-ground trampoline qualified. One potential challenge is that this approach obscures the fact, already explained, that LLMs make probabilistic, predictive judgments about language. With that in mind, some who have considered how LLMs might be used to interpret contracts have suggested that users seek not just answers but also “confidence” levels. See id. at 23. So, for instance, an LLM might reveal that its prediction about a provision’s meaning is “high” or, by contrast, only “ambiguous.” Alternatively, but to the same end, a researcher might ask an LLM the same question multiple times and note the percentage of instances in which it agrees that, say, installation of an in-ground [*54]  trampoline is landscaping. See Christoph Engel & Richard H. McAdams, Asking GPT for the Ordinary Meaning of Statutory Terms 15 (Max Planck Inst. Discussion Paper 2024/5).15 (RL – Yes, asking multiple times is one way of many to improve the quality of the AI input. Again that is a question of prompt engineering.)
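(RL – The Engel & McAdams repeated-query idea is easy to prototype. In this sketch, ask_once is a hypothetical sampler that mimics a model answering “yes” about 80% of the time; a real version would replace it with an actual API call made at a nonzero temperature. Only the counting logic is the point.)

```python
import random

def ask_once(rng: random.Random) -> str:
    """Hypothetical stand-in for one LLM call answering the question
    'Is installing an in-ground trampoline landscaping?'
    Mimics a probabilistic model that says 'yes' roughly 80% of the time."""
    return "yes" if rng.random() < 0.8 else "no"

def agreement_rate(n_trials: int, seed: int = 42) -> float:
    """Ask the same question n_trials times and report the share of
    'yes' answers, as a rough confidence measure."""
    rng = random.Random(seed)
    yes = sum(ask_once(rng) == "yes" for _ in range(n_trials))
    return yes / n_trials

rate = agreement_rate(1000)
print(f"'yes' in {rate:.0%} of 1,000 runs")
```

A rate near 100% would suggest a confident “ordinary meaning” judgment; a rate near 50% would flag genuine ambiguity of the kind the opinion discusses.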

Fourth and finally, there are temporal considerations to mull. The ordinary-meaning rule has an important corollary—namely, that “[w]ords must be given the meaning they had when the text was adopted.” Scalia & Garner, Reading Law, supra, at 78 (emphasis added). That principle—”originalism,” if you will—most obviously applies to constitutional and statutory texts. See, e.g., United States v. Pate, 84 F.4th 1196, 1201 (11th Cir. 2023) (en banc) (“[W]hen called on to resolve a dispute over a statute’s meaning, [a court] normally seeks to afford the law’s terms their ordinary meaning at the time Congress adopted them.” (quoting Niz-Chavez v. Garland, 593 U.S. 155, 160, 141 S. Ct. 1474, 209 L. Ed. 2d 433 (2021)). To be clear, though, it holds for private instruments like contracts and insurance policies, as well. See Scalia & Garner, Reading Law, supra, at 78. This case didn’t present a particularly knotty temporal difficulty because the policy in question was issued relatively recently, in 2017. But cases routinely arise in which the relevant question is what a particular term meant in the more distant past—say, in 1787, 1868, or 1964. If LLMs are to be deployed to aid more broadly in the search for ordinary meaning, it would be enormously [*55]  helpful—if it’s not already possible—for AI engineers to devise a way in which queries could be limited to particular timeframes. Cf. Engel & McAdams, supra, at 34-37.16 (RL – Agree this would be very possible and I understand the constitution construction hints and implications here.)
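(RL – The timeframe restriction the Judge asks for already exists in embryo wherever the training corpus is dated. A toy sketch of what a time-limited ordinary-meaning query means mechanically; the dated corpus entries below are invented for illustration only.)

```python
# Toy dated corpus; the entries are invented sample data.
corpus = [
    (1850, "the carriage awaits"),
    (1955, "park the car in the garage"),
    (2010, "stream the video on your phone"),
    (2021, "prompt the model and read its output"),
]

def usage_before(cutoff_year: int, word: str) -> list[str]:
    """Return the texts from before cutoff_year that use `word`,
    i.e., the evidence base for a time-restricted meaning query."""
    return [
        text for year, text in corpus
        if year < cutoff_year and word in text.split()
    ]

print(usage_before(2000, "the"))
```

Restricting the evidence base by date, as here, is the corpus-level analogue of asking what a term meant in 1787, 1868, or 1964.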

V

In his most recent year-end report on the state of the federal judiciary, Chief Justice Roberts cautioned that the “use of AI requires caution and humility.” Roberts, supra, at 5. I wholeheartedly agree. Importantly, though, I also agree with what I take to be the report’s assumption that AI is here to stay. Now, it seems to me, is the time to figure out how to use it profitably and responsibly. It’s in that spirit that I’ve offered these preliminary thoughts about whether and how LLMs might aid lawyers and judges in the interpretive enterprise. Plenty of questions remain, and I’m sure I haven’t even identified all of them. But—and this is my bottom line—I think that LLMs have promise. At the very least, it no longer strikes me as ridiculous to think that an LLM like ChatGPT might have something useful to say about the common, everyday meaning of the words and phrases used in legal texts.

Just my two cents.


Appendix

Below are the questions that I asked OpenAI’s ChatGPT and Google’s Bard models, along with their responses.


ChatGPT

Question: What is the ordinary meaning of “landscaping” [*56] ?

Response: “Landscaping” refers to the process of altering the visible features of an area of land, typically a yard, garden or outdoor space, for aesthetic or practical purposes. This can include activities such as planting trees, shrubs, flowers, or grass, as well as installing paths, fences, water features, and other elements to enhance the appearance and functionality of the outdoor space.

Question: Is installing an in-ground trampoline “landscaping”?

Response: Yes, installing an in-ground trampoline can be considered a part of landscaping. Landscaping involves altering the visible features of an outdoor area for aesthetic or practical purposes, and adding an in-ground trampoline would modify the appearance and function of the space. It’s a deliberate change to the outdoor environment, often aimed at enhancing the overall landscape and usability of the area.


Bard

Question: What is the ordinary meaning of “landscaping”?

Response: The ordinary meaning of “landscaping” is the process of modifying the visible features of an area of land, including:

• Adding or removing vegetation, such as trees, shrubs, and flowers

• Installing hardscape elements, such as walkways, patios, and retaining walls [*57] 

• Creating or enhancing natural features, such as ponds, waterfalls, and rock gardens

Landscaping can be used to improve the aesthetics of an area, increase its functionality, and/or protect it from erosion. It can also be used to create a sense of privacy, define spaces, and add value to a property.

Here are some examples of landscaping projects:

• Planting a new lawn or garden

• Adding a patio or walkway

• Building a retaining wall

• Installing a sprinkler system

• Creating a pond or waterfall

• Planting trees and shrubs

• Adding mulch or gravel

• Edging a flower bed

• Trimming hedges

Landscaping can be a DIY project, or it can be hired out to a professional landscaping company. The cost of landscaping will vary depending on the size and scope of the project.

Question: Is installing an in-ground trampoline “landscaping”?

Response: Whether or not installing an in-ground trampoline is considered landscaping depends on how you define the term. If you define landscaping broadly to include any modification to the visible features of an area of land, then installing an in-ground trampoline would be considered landscaping. However, if you define landscaping more narrowly to focus on the planting and maintenance [*58]  of vegetation, then installing an in-ground trampoline would not be considered landscaping.

Here is a table that summarizes the different ways to define landscaping and whether or not installing an in-ground trampoline would be considered landscaping under each definition:

Definition | Includes installing an in-ground trampoline?
Broad definition: Any modification to the visible features of an area of land | Yes
Narrower definition: Planting and maintenance of vegetation | No

Ultimately, whether or not installing an in-ground trampoline is considered landscaping is a matter of opinion. There is no right or wrong answer.


End of Document


1 Even for me.

2 For what it’s worth, I don’t think the governing Alabama statute says that. In relevant part, it provides that “[e]very insurance contract shall be construed according to the entirety of its terms and conditions as set forth in the policy and as . . . modified by any rider, endorsement or application which is part of the policy.” Ala. Code § 27-14-17(a). The absence of a comma before the clause “which is part of the policy” strongly indicates, to my mind, that an insured’s “application” should be deemed to “modif[y]” the contract only if it is made “part of the policy.” Be that as it may, as the majority opinion correctly notes, the Alabama Supreme Court has read § 27-14-17(a), in effect, to incorporate every insured’s application into the terms of his policy, and we’re stuck with that interpretation, even if erroneous. See Maj. Op. at 18 (citing Atlanta Cas. Co. v. Russell, 798 So. 2d 664, 667 (Ala. 2001)).

3 Alabama law governs the interpretation of the insurance contract at issue in this case, see St. Paul Fire & Marine Ins. Co. v. ERA Oxford Realty Co. Greystone, LLC, 572 F.3d 893, 894 n.1 (11th Cir. 2009), and privileges “ordinary meaning” in that endeavor, see Safeway Ins. Co. of Alabama v. Herrera, 912 So. 2d 1140, 1144 (Ala. 2005).

4 Generally, Bard’s response to my general question—“What is the ordinary meaning of ‘landscaping’?”—was pretty similar to ChatGPT’s, though notably longer. When asked the more specific question—“Is installing an in-ground trampoline ‘landscaping’?”—Bard was more equivocal than ChatGPT had been. I’ve included my questions and the models’ responses in an appendix for readers’ reference.

5 See Christoph Engel & Richard H. McAdams, Asking GPT for the Ordinary Meaning of Statutory Terms 10-11 (Max Planck Inst. Discussion Paper 2024/5).

6 I’ll confess to a bit of uncertainty about exactly what data LLMs use for training. This seems like an area ripe for a transparency boost, especially as LLMs become increasingly relevant to legal work. But here’s what I think I’ve gathered from some sleuthing. A significant chunk of the raw material used to train many LLMs—i.e., the “stuff” from which the models learn—comes from something called the Common Crawl, which is, in essence, a massive data dump from the internet. See, e.g., Yiheng Liu, et al., Understanding LLMs: A Comprehensive Overview from Training to Inference 6-8 (arXiv:2401.02038, 2024). The Common Crawl isn’t “the entire web”; rather, it’s a collection of samples from online sites, which AI companies further refine for training purposes. See Stefan Baack, Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI 5, 16-24, Mozilla Insights (Feb. 2024). That said, the samples are massive. (RL – Yes, that is a large part, but not all of it, and most agree with Judge Newsom that greater transparency is required from OpenAI and other vendors on this issue.)

7 To be clear, I do mean “predictions.” As I understand things, the LLM that underlies a user interface like ChatGPT creates, in effect, a complex statistical “map” of how people use language—that, as machine-learning folks would say, is the model’s “objective function.” How does it do it? Well, to dumb it way down, drawing on its seemingly bottomless reservoir of linguistic data, the model learns what words are most likely to appear where, and which ones are most likely to precede or follow others—and by doing so, it can make probabilistic, predictive judgments about ordinary meaning and usage. See Yonathan A. Arbel & David A. Hoffman, Generative Interpretation, 99 N.Y.U. L. Rev. (forthcoming 2024) (manuscript at 24-29); Engel & McAdams, supra, at 10-11. (RL – Sounds correct to me.)
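(RL – The “complex statistical map” Judge Newsom describes in this footnote can be illustrated with a toy bigram model: count which word follows which in a corpus, then predict the most likely successor. Real LLMs do this over tokens with deep neural networks at vastly greater scale, but the objective, probabilistic next-word prediction, is the same. This sketch is my own illustration, not anything from the opinion.)

```python
from collections import Counter, defaultdict

# Tiny invented corpus, tokenized on whitespace.
corpus = (
    "landscaping improves the yard . "
    "landscaping improves the garden . "
    "landscaping alters the yard ."
).split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict(word: str) -> str:
    """Return the most probable next word after `word` (raises if unseen)."""
    return following[word].most_common(1)[0][0]

print(predict("landscaping"))  # 'improves' follows in 2 of 3 cases
print(predict("the"))          # 'yard' follows in 2 of 3 cases
```

The per-word counts are exactly the kind of probabilistic judgment about usage the footnote describes, just at cartoon scale.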

8 So far as I understand things, it’s next to impossible to pinpoint exactly what training data an LLM draws on when answering a particular question, but from what I’ve seen, I think it’s fair to say that it’s a pretty wide cross-section.

9 I’ll bracket for the time being whether LLMs might be useful (or less so) in the fraction of cases in which we’re focused on technical or specialized meaning, rather than ordinary meaning. See Scalia & Garner, Reading Law, supra, at 73.

10 See Purchase, Oxford English Dictionary, https://www.oed.com/purchase (last visited May 23, 2024).

11 Westlaw, for instance, allows paid subscribers to access the latest edition of Black’s Law Dictionary. Lexis permits its users to access similar offerings, including Ballentine’s Law Dictionary.

12 Help: How does a word get into a Merriam-Webster dictionary?, Merriam-Webster (last visited May 23, 2024), https://www.merriam-webster.com/help/faq-words-into-dictionary [https://perma.cc/446C-WYMN].

13 Id.

14 A quasi-related issue: Some words have acquired “regionalized” meanings over time. So, for instance, the noun “toboggan” can refer to either (1) a “long flat-bottomed light sled,” (2) a “downward course or sharp decline,” or (3) a “stocking cap.” Merriam-Webster’s Collegiate Dictionary, supra, at 1313. Notably, though, the third sense is “chiefly Southern [and] Midland.” Id. When we asked ChatGPT, “What is the ordinary meaning of ‘toboggan’?”, it responded with only the first, sled-based explanation. The lesson is simply that interpreters using LLMs for assistance would be wise to remember, as always, that “context is king,” Wachovia Bank, N.A. v. United States, 455 F.3d 1261, 1267 (11th Cir. 2006), and, accordingly, that they might need to adjust their queries to account for its influence.

15 Some might worry that seeking a range of responses could cause the LLM to respond with uncommon usages. Of course, if the rogue results are rare, then, almost by definition, they won’t move the “ordinary meaning” needle. And if, by contrast, they’re not rare—and thus aren’t rogues at all—then perhaps they indicate that we need to rethink our intuitions about what the “ordinary meaning” really is. Fine, and good.

16 Relatedly, might we have a “start date” problem? Are we limited to ordinary understandings that post-date the launch of the internet? Or might it be that the information contained on the internet is so extensive that it can aid in understanding historical usages, as well?

Ralph Losey Copyright 2024 (excluding the court opinion)


Omni Version Test of the Panel of AI Experts on a New Topic: “AI Mentors of New Attorneys” – Part Four

June 3, 2024

Ralph Losey. Published June 3, 2024.

My custom GPT Panel of AI Experts for Lawyers was tested again to evaluate its effectiveness on ChatGPT4o Omni. This time an entirely new topic was selected for the AI panelists to discuss: AI mentors for lawyers. In the first three parts of the series, all twenty panels tested discussed AI hallucinations. See Evidence that AI Expert Panels Could Soon Replace Human Panelists or is this just an Art Deco Hallucination? Part One (e-Discovery Team, May 13, 2024); Experiment with a ChatGPT4 Panel of Experts and Insights into AI Hallucination – Part Two (e-Discovery Team, May 21, 2024); OMNI Version – ChatGPT4o – Retest of My Panel of AI Experts – Part Three (e-Discovery Team, May 29, 2024).

Part Three raised the concern that the retest using the Omni version may have been skewed by carry-over and overtraining from the twenty panels. That required another retest of the Omni version on a completely different topic. Thanks to Judge Ralph Artigliere for suggesting a great new topic, the potential for AI Mentors to assist in the training and development of new legal professionals.

Background on the Issue of AI Mentors in the Legal Profession

Associates fresh out of law school used to train on the menial tasks now assigned to AI. That left a training gap, one that needs to be filled by new training efforts, including AI-driven training. We need to quickly bootstrap new attorneys to the higher level where special human skills are needed. One way to do that is with AI mentors. See: Josh Kubicki, Building ChatGPT-based new lawyer mentors (Brainyacts #197, 5/21/24). Judge Artigliere, who brought this article to my attention, and I both recommend you read it. Also see: Chandan Lal Patar, Enhancing the Mentoring Experience with ChatGPT (Medium, 11/04/23); Ropes & Gray LinkedIn News Release on WSJ Article of 5/8/24, How AI Has Already Begun to Change These Workers’ Jobs.

Josh Kubicki identifies himself on LinkedIn as a Fractional Executive, Entrepreneur and Professor. He is listed by the University of Richmond as Assistant Professor of Law and Director of the school’s new Legal Innovation and Entrepreneurship program. Professor Kubicki’s article starts off by observing a significant shrinking of ‘on the job’ learning opportunities for new lawyers. Before 2020, a typical week involved 117 learning interactions; in 2024, that was down to 68 per week. This is due in large part to the impact of Covid and the new hybrid, tech-heavy work model. AI will only exacerbate the problem as newbies are needed less and less. As Kubicki explains:

Generative AI tools are increasingly handling routine tasks that used to be learning opportunities for new lawyers, such as contract reviews and memo writing. This shift means new lawyers have fewer chances to develop practical skills through direct work experiences. . . .

Clients are beginning to use generative AI tools themselves, further reducing the routine tasks they delegate to law firms. This self-sufficiency by clients reduces the volume of work available for training new lawyers, exacerbating the need for alternative training methods.

In Building ChatGPT-based new lawyer mentors, Josh Kubicki refers you to a copy of a ChatGPT4.0 session where an M&A associate in a firm is guided by a custom GPT he created called MentorGPT.

The ten AI Expert Panel sessions we ran confirm that AI mentorship should be a part of the solution to fill the training gap, so long as it is a hybrid effort, one that supplements, not replaces, human mentors. The ten panels in this Part Four show how AI mentors can work in the legal profession.

Identifying the Ten AI Expert Panel Sessions on AI Mentoring

Below is a list of the ten panels in sequential order. Recall from Part One of this article that five experts are proffered for each session, no matter what subject is put to the Panel of AI Experts for Lawyers. They are the Pro-AI Attorney, Prompt Engineer Lawyer, Child Prodigy, Lawyer Scientist and the Devil’s Advocate. The Devil’s Advocate panelist is included on every panel because I have found the discussion is always improved by a contrarian. In each session, five additional alternative experts are suggested to the panel. Those five expert suggestions depend on the subject, and they can and do vary from session to session.

The total number of words in each session is also included. This provides a specific metric to compare the conciseness of these ten panels.

  1. Panel One. 2,187 words. Law Firm Manager, AI Policy Specialist, Training and Development Specialist, Technology Integration Consultant.
  2. Panel Two. 2,310 words. Pro-AI Attorney, Law Firm Partner, Technology Integration Specialist, Human Resources Specialist.
  3. Panel Three. 1,986 words. Lawyer Scientist, Human Resources Specialist, Technology Risk Analyst, Judicial Perspective Expert.
  4. Panel Four. 1,897 words. Prompt Engineer Lawyer, Criminal Law Expert, State Bar Ethics Official, AI Policy Expert.
  5. Panel Five. 2,277 words. Child Prodigy, State Bar Ethics Official, Criminal Law Expert, Judge.
  6. Panel Six. 2,047 words. Technology Integration Specialist, Mentorship Program Developer, AI Compliance Officer, Training and Development Manager.
  7. Panel Seven. 2,082 words. Technology Integration Specialist, Training and Development Expert, Risk Management Specialist, Human-AI Collaboration Researcher.
  8. Panel Eight. 1,812 words. Training and Development Specialist, Technology Integration Specialist, AI Policy Expert, Law Firm Partner.
  9. Panel Nine. 1,912 words. Lawyer Scientist, Legal Ethics Expert, Legal Technology Consultant, Judge with AI Expertise.
  10. Panel Ten. 1,971 words. Pro-AI Attorney, Law Firm Partner, HR Expert, Tech-Savvy Young Lawyer.

Note that I opened a new session for each panel. Also, each session included a question at the end on the costs and benefits of AI mentoring. For that reason the word counts cannot be easily compared with the prior testing, which did not include a question at the end. The word counts on the mentor panels do again show variance between panels. Variable length of speech is what you would expect to see with human panels. Again, it was obvious, even with the addition of a question, that the new version of 4o Omni is much more concise than the old. It is also apparent that training from the twenty sessions on the hallucinations topic did have an impact, increasing conciseness and intelligence. The ten panels on legal AI mentors, where there was little impact of prior training, were slightly more verbose, and perhaps not quite as smart, as the pre-trained Omni run on hallucinations. This is the kind of result you would hope to see and is consistent with OpenAI’s promises of cross-session improvement, but additional tests would be needed to confirm that with greater certainty.
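(RL – The panel-to-panel variance discussed above is easy to quantify from the ten word counts listed earlier, using nothing but Python’s standard library. A quick sketch:)

```python
import statistics

# Word counts of the ten AI-mentor panels, as listed above.
word_counts = [2187, 2310, 1986, 1897, 2277, 2047, 2082, 1812, 1912, 1971]

mean_words = statistics.mean(word_counts)
spread = statistics.stdev(word_counts)  # sample standard deviation

print(f"mean: {mean_words:.1f} words, stdev: {spread:.1f} words")
```

The mean works out to 2,048.1 words with a standard deviation of roughly 166, the kind of spread in speech length one would also expect from human panelists.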

The full transcripts for all ten sessions on AI Mentoring have been preserved and could be shared with interested researchers. In the analysis that follows, only the transcripts of the two “winning” panels are shared, in a PDF file that can be scrolled or downloaded. There the interested reader can see for themselves how the custom GPT Panel of AI Experts for Lawyers performed. Overall I was satisfied with the tests and impressed with the performance of this custom GPT. I should release it to the public soon, free with no add-ons, at the OpenAI store.

As discussed in OMNI Version – ChatGPT4o – Retest of the Panel of AI Experts – Part Three, I again observed greater consistency in these ten panels on AI Mentorship, which used the Omni version 4o, than I had seen in Parts One and Two using the prior version ChatGPT4.0. Once again, there also appeared to be somewhat improved overall intelligence and significantly improved conciseness. Again, this was discussed at length in Part Three and so will not be repeated here.

Transcript of ChatGPT4o Evaluation

Here is the transcript of my ChatGPT4o session where it evaluated the ten sessions run of AI mentorship using the GPT Panel of AI Experts for Lawyers. The full transcripts for all ten sessions have been preserved and could be shared with interested researchers.


Ralph: Please study the attached Word file, Mentor Omni Panels 1-10.docx. The internal title of this document is “Mentor Omni Panels 1-10. Panel of Expert Experiment on AI Mentorship in Law Firms.” It is 69 pages long and consists of approximately 20,677 words. This Word document contains the transcript of ten sessions of ChatGPT4o – Omni edition – by Ralph Losey using his custom GPT named “Panel of AI Experts for Lawyers.” Each of the ten panels considered the exact same subject of the possible use of ChatGPT AI as a mentor for new associates in law firms. Each panel was also given the same document (Building ChatGPT-based new lawyer mentors by Josh Kubicki) as background material only, as you will see on the first page of the attached Word file of the ten sessions. After you carefully study the attached Word file, I would like you to prepare a summary of the ten sessions, and then I will ask follow-up questions.

ChatGPT4o:

Summary of the Mentor Omni Panels 1-10

Panel Overview

The document titled “Mentor Omni Panels 1-10. Panel of Expert Experiment on AI Mentorship in Law Firms” consists of ten sessions where a panel of AI experts discussed the potential use of ChatGPT AI as a mentor for new associates in law firms. Each panel considered the same subject and used the same background material.

Key Themes and Insights

Potential Benefits of AI Mentorship

  • Efficiency and Accessibility: AI can provide instant access to a vast amount of legal knowledge, answer questions on demand, and offer feedback on legal writing. This can supplement traditional mentoring, allowing associates to learn at their own pace and ensuring consistent training quality.
  • Scalability: AI can handle a large number of queries simultaneously, making it scalable to the firm’s size without proportional increases in costs.
  • Cost Savings: By reducing the burden on senior attorneys, AI can allow them to focus on more complex, billable tasks, leading to significant cost savings.
  • Consistency and Standardization: AI can ensure that all associates receive the same level of guidance, reducing variability in training quality.

Challenges and Concerns

  • Over-reliance on AI: There is a risk that associates may become overly dependent on AI, potentially leading to a decline in critical thinking and interpersonal skills.
  • Accuracy and Bias: AI’s responses can sometimes be inaccurate or biased, which could misguide new associates. Continuous updates and audits are necessary to maintain the accuracy and relevance of AI-generated advice.
  • Ethical and Professional Risks: The implementation of AI must include robust ethical guidelines and regular audits to ensure compliance with legal and ethical standards.
  • Human Element: The personal mentorship and relationship-building aspects could suffer if over-reliance on AI is not managed properly. Human mentors provide nuanced insights and professional wisdom that AI cannot replicate.

Hybrid Mentorship Model

  • Balanced Approach: The panelists agreed on a hybrid mentorship model that combines AI and human mentorship. AI can handle routine queries and provide foundational knowledge, while human mentors focus on complex, strategic, and ethical training.
  • Implementation Strategy:
    • Pilot Program: Start with a small group of associates to gather feedback and refine the system.
    • Training Programs: Educate new associates on the appropriate and effective use of AI tools, emphasizing the importance of critical thinking and professional judgment.
    • Regular Audits and Updates: Ensure the AI’s training data is current and accurate, and implement regular audits to maintain high standards.
    • Ethical Guidelines: Establish and adhere to ethical guidelines for AI usage, requiring verification of AI-generated advice.

Cost-Benefit Analysis

  • Costs:
    • Initial development and integration costs, including software licensing, infrastructure upgrades, and customization.
    • Ongoing maintenance and updates to ensure current legal knowledge and capabilities.
    • Training costs for both associates and senior attorneys.
    • Ethical and legal compliance, including regular reviews and adaptations to changes in regulations.
  • Benefits:
    • Enhanced efficiency and productivity with potential 20%-30% improvement.
    • Cost savings from reduced need for senior attorney time in routine mentoring.
    • Consistent and accessible training available 24/7.
    • Scalability without proportional increases in costs.
    • Reduced turnover due to enhanced support for new associates.
    • Data-driven insights to improve training programs and associate performance.

Conclusion

The integration of ChatGPT AI as a mentor for new associates in law firms presents a balanced approach that leverages the strengths of both AI and human mentorship. This hybrid model aims to enhance learning while maintaining high professional standards through continuous updates, ethical oversight, and critical thinking development.

Ralph: Please evaluate the quality of the ten panel discussions and recommendations and determine which one has the highest overall quality as compared to the others. Also state the basis for your ranking determination.

ChatGPT4o:

Evaluation of Panel Discussions and Recommendations

Criteria for Evaluation:

  1. Depth of Discussion: How comprehensively the panel covered the topic.
  2. Practicality of Recommendations: Feasibility of implementing the suggested solutions.
  3. Innovativeness: Creativity and forward-thinking in the proposals.
  4. Consideration of Ethical and Professional Issues: Attention to ethical concerns and professional standards.
  5. Consensus and Clarity: How well the panel reached a clear and actionable consensus.

Evaluation of Each Panel:

Panel 1

  • Depth of Discussion: Covered multiple aspects including practical implementation, ethical considerations, and efficiency.
  • Practicality: Recommended a hybrid model and pilot program, which are feasible.
  • Innovativeness: Suggested comprehensive use of AI in mentoring.
  • Ethical Considerations: Addressed ethical guidelines and need for human oversight.
  • Consensus: Achieved a clear consensus on a hybrid model.

Panel 2

  • Depth of Discussion: Similar to Panel 1, with a focus on integration and ethical issues.
  • Practicality: Emphasized pilot programs and continuous feedback.
  • Innovativeness: Proposed a structured plan for AI integration.
  • Ethical Considerations: Strong focus on ethical guidelines.
  • Consensus: Reached consensus on hybrid model and ethical practices.

Panel 3

  • Depth of Discussion: Comprehensive, with focus on training and development.
  • Practicality: Suggested phased implementation and regular audits.
  • Innovativeness: Highlighted personalized learning experiences.
  • Ethical Considerations: Emphasized data privacy and AI limitations.
  • Consensus: Consensus on balanced AI-human mentorship.

Panel 4

  • Depth of Discussion: Detailed, especially on prompt engineering and legal applications.
  • Practicality: Feasible recommendations including regular updates.
  • Innovativeness: Innovative ideas like structured prompts for AI.
  • Ethical Considerations: Strong emphasis on ethical use and verification.
  • Consensus: Clear consensus on hybrid approach.

Panel 5

  • Depth of Discussion: Extensive, covering both technical and human elements.
  • Practicality: Focused on smooth integration and data security.
  • Innovativeness: Creative solutions like personalized AI mentoring.
  • Ethical Considerations: Comprehensive coverage of ethical issues.
  • Consensus: Achieved a clear, actionable consensus.

Panel 6

  • Depth of Discussion: Focused on mentorship and AI compliance.
  • Practicality: Recommendations are practical and detailed.
  • Innovativeness: Emphasized a balanced integration of AI.
  • Ethical Considerations: Addressed ethical guidelines extensively.
  • Consensus: Strong consensus on the need for balanced mentorship.

Panel 7

  • Depth of Discussion: Thorough discussion on integration and risk management.
  • Practicality: Emphasized pilot programs and training.
  • Innovativeness: Suggested innovative risk management strategies.
  • Ethical Considerations: Addressed ethical considerations in depth.
  • Consensus: Reached a clear consensus on implementation.

Panel 8

  • Depth of Discussion: Detailed, with focus on training and ethical guidelines.
  • Practicality: Recommendations are practical and comprehensive.
  • Innovativeness: Highlighted integration strategies and ethical use.
  • Ethical Considerations: Strong emphasis on ethical considerations.
  • Consensus: Achieved consensus on a phased implementation.

Panel 9

  • Depth of Discussion: Covered various perspectives, including judicial.
  • Practicality: Practical recommendations like regular audits.
  • Innovativeness: Emphasized innovative use of AI in training.
  • Ethical Considerations: Comprehensive coverage of ethical issues.
  • Consensus: Clear and actionable consensus reached.

Panel 10

  • Depth of Discussion: Comprehensive, with focus on integration and oversight.
  • Practicality: Feasible recommendations including pilot programs.
  • Innovativeness: Suggested innovative ethical guidelines.
  • Ethical Considerations: Strong emphasis on ethical use and verification.
  • Consensus: Achieved a clear consensus on hybrid model.

Panel with Highest Overall Quality: Panel 1

  • Basis for Ranking:
    • Depth of Discussion: Panel 1 provided a comprehensive overview of the potential use of AI, covering practical implementation, efficiency, and ethical considerations in detail.
    • Practicality: Their recommendations, including the hybrid model and pilot program, are highly feasible and actionable.
    • Innovativeness: They suggested a balanced and forward-thinking approach to integrating AI in mentorship.
    • Ethical and Professional Issues: They placed a strong emphasis on ethical guidelines and human oversight, ensuring compliance with professional standards.
    • Consensus and Clarity: The panel reached a clear and actionable consensus on the hybrid model, making it easy to implement.

This evaluation indicates that Panel 1 offered the most comprehensive and practical recommendations, balancing innovation with ethical considerations and achieving a clear consensus on the best approach to integrating AI into mentorship for new associates in law firms.

Below and here is the transcript of Panel 1, the overall winner.

Ralph: Of the ten panels, which do you think had the most innovative and creative approach. Again, state the basis for your determination.

ChatGPT4o

Evaluation of the Most Innovative and Creative Panel

After carefully analyzing the discussions and recommendations from all ten panels, Panel 5 stands out as the most innovative and creative. Here’s a detailed analysis and basis for this determination:

Panel 5 Overview:

  • Depth of Discussion: Extensive, covering both technical and human elements of AI mentorship.
  • Practicality: Focused on smooth integration and data security, with feasible recommendations.
  • Innovativeness: Suggested highly creative solutions such as personalized AI mentoring.
  • Ethical Considerations: Comprehensive coverage of ethical issues, ensuring responsible AI use.
  • Consensus: Achieved a clear and actionable consensus on integrating AI.

Key Innovative and Creative Aspects:

  1. Personalized AI Mentoring: Panel 5 proposed the use of AI to provide personalized mentoring experiences for new associates. This involves tailoring AI responses based on individual learning styles and progress, ensuring that associates receive targeted guidance that addresses their specific needs. This approach is highly innovative as it leverages AI’s capabilities to deliver customized training.
  2. Balancing AI and Human Elements: The panel emphasized a balanced approach where AI handles routine queries and provides foundational knowledge, while human mentors focus on complex, strategic, and ethical training. This hybrid model ensures that the human touch, essential for professional growth, is not lost while maximizing the efficiency and scalability of AI.
  3. Structured Feedback Mechanisms: They highlighted the importance of implementing structured feedback mechanisms where both associates and mentors can provide regular feedback on the AI’s performance. This iterative approach allows for continuous improvement of the AI system, making it more effective and reliable over time.
  4. Regular Audits and Ethical Guidelines: The panel proposed regular audits and updates to the AI’s training data to ensure accuracy and relevance. They also recommended developing clear ethical guidelines and training associates to recognize AI limitations, which helps maintain high professional standards and mitigates risks associated with AI use.
  5. Pilot Program and Iterative Integration: Starting with a pilot program, as suggested by Panel 5, allows for careful monitoring and refinement of the AI mentoring system based on real-world feedback. This step-by-step integration ensures that potential issues are identified and addressed early, making the transition smoother and more effective.
  6. Innovative Use of AI Data: The panel proposed using AI to track associates’ progress and identify areas needing improvement. This data-driven approach allows for more informed and strategic mentoring, ensuring that associates receive the support they need to excel.

Basis for Determination:

  • Creativity in Solutions: Panel 5’s approach to personalized AI mentoring and the balanced hybrid model shows a deep understanding of both the potential and limitations of AI. Their innovative solutions are designed to enhance the learning experience while maintaining the irreplaceable value of human mentorship.
  • Forward-Thinking Recommendations: The panel’s focus on structured feedback mechanisms, regular audits, and ethical guidelines demonstrates a forward-thinking approach to integrating AI in a responsible and effective manner.
  • Practical and Feasible Implementation: While highly innovative, their recommendations are also practical and feasible. The proposed pilot program and iterative integration ensure that the solutions can be implemented without significant disruption.

Overall, Panel 5’s discussion and recommendations combine creativity, practicality, and a strong focus on ethical and professional standards, making it the most innovative and creative among the ten panels.

Below and here is the transcript of Panel 5, the creativity winner.

Ralph: Please review the ten panel discussions and look for errors that may have been made, if any. If any mistakes are found, please describe them and speculate as to the cause of each error.

ChatGPT4o:

Review of Errors in Panel Discussions

Upon reviewing the ten panel discussions, I identified some notable errors and inconsistencies. Here are the specific errors, their descriptions, and speculative causes:

Panel 4 and Panel 8: Expert Selection Error

  • Description: In both Panel 4 and Panel 8, there was an error in the selection of experts. Despite the user selecting specific experts (7, 8, 9, and 10), the AI mistakenly included different experts in the panel discussions.
  • Speculated Cause: This error likely resulted from a miscommunication or a software bug that failed to correctly process the user’s selections. The AI might have incorrectly mapped the expert numbers to their respective roles, leading to the inclusion of unintended panelists.

Panel 5: Over-reliance on AI Concerns

  • Description: There was a strong emphasis on using AI for personalized mentoring but insufficient discussion on mitigating the risks of over-reliance on AI. While the benefits were well-articulated, the potential drawbacks, such as diminished critical thinking and interpersonal skills, were not adequately addressed.
  • Speculated Cause: This oversight could be due to the panel’s enthusiasm for AI’s capabilities overshadowing the need for a balanced perspective. The experts might have been overly focused on the innovative aspects, neglecting the practical challenges.

Panel 9: Ethical and Data Privacy Considerations

  • Description: The discussion in Panel 9 on ethical guidelines and data privacy was thorough, but there was a lack of specific implementation strategies. The recommendations were broad and lacked actionable steps for ensuring compliance and protecting data privacy.
  • Speculated Cause: The panelists might have assumed a general understanding of these principles, leading to a high-level discussion without delving into the specifics. Alternatively, there could have been a time constraint that limited the depth of the conversation.

Panel 6: Inconsistent Recommendations

  • Description: The recommendations in Panel 6 were somewhat inconsistent. Some panelists emphasized the need for extensive pilot testing and gradual implementation, while others suggested a more immediate integration of AI tools.
  • Speculated Cause: This inconsistency might be due to differing perspectives and experiences among the panelists. The lack of a unified approach indicates a possible disconnect in aligning their recommendations.

Panel 3: Limited Focus on Human Element

  • Description: While Panel 3 discussed the benefits of AI in mentoring, there was limited focus on the human element of mentorship. The discussion was skewed towards AI capabilities, with insufficient emphasis on maintaining human interaction and mentorship.
  • Speculated Cause: This error could be attributed to a panel composition heavily weighted towards technology experts rather than experienced human mentors. The panel might have lacked a strong voice advocating for the irreplaceable value of human mentorship.

Overall Summary

The most common errors across the panels include:

  • Expert Selection Mistakes: Caused by miscommunication or software bugs.
  • Overemphasis on AI: Leading to inadequate discussion on balancing AI with human mentorship.
  • Lack of Specific Implementation Strategies: Particularly in ethical and data privacy discussions.
  • Inconsistent Recommendations: Reflecting differing perspectives without a unified conclusion.

Addressing these errors involves improving the expert selection process, ensuring a balanced representation of perspectives in the panel, and focusing on actionable strategies to integrate AI responsibly while maintaining the essential human elements of mentorship.

Ralph: From your study and analysis of this data, are there any additional observations that you would like to point out to students of AI that are involved in some way in the legal profession, including those supporting the profession as technology experts?

ChatGPT4o:

Additional Observations for Students of AI in the Legal Profession

Based on the study and analysis of the panel discussions, several key observations and lessons emerge that are valuable for students of AI involved in the legal profession, including those supporting the profession as technology experts:

1. Importance of Human-AI Collaboration

  • Observation: The consistent recommendation across panels to adopt a hybrid model highlights the importance of combining AI capabilities with human mentorship. AI can enhance efficiency and provide valuable support, but human oversight remains crucial for nuanced and ethical decision-making.
  • Lesson: Students should understand that AI is a tool to augment human capabilities, not replace them. Effective collaboration between AI and humans can lead to better outcomes in the legal profession.

2. Ethical and Professional Standards

  • Observation: Panels repeatedly emphasized the need for robust ethical guidelines and regular audits to ensure AI systems comply with legal and professional standards. Ethical use of AI is paramount to maintain trust and integrity in legal practice.
  • Lesson: Ethical considerations must be integrated into the design and deployment of AI systems. Students should be aware of data privacy, bias mitigation, and the need for transparent and accountable AI use.

3. Customization and Contextual Understanding

  • Observation: AI systems need to be tailored to the specific needs of the legal profession, with a focus on providing accurate and contextually relevant information. This includes understanding the specific requirements and nuances of legal practice.
  • Lesson: Customization of AI systems for specific legal tasks is crucial. Students should focus on developing AI tools that can understand and adapt to the unique contexts of legal work, ensuring relevance and accuracy.

4. Continuous Learning and Improvement

  • Observation: Regular updates, feedback mechanisms, and iterative improvements were highlighted as essential for maintaining the effectiveness of AI systems. Continuous learning helps AI stay current with legal developments and user needs.
  • Lesson: AI systems in the legal profession must be dynamic and continuously improved based on user feedback and evolving legal standards. Students should prioritize ongoing learning and refinement in AI applications.

5. Balancing Innovation with Practicality

  • Observation: While innovative ideas such as personalized AI mentoring were appreciated, practical implementation strategies like pilot programs and phased integration were deemed essential for successful adoption.
  • Lesson: Innovation should be balanced with practical considerations. Students should develop innovative AI solutions that are also feasible and can be integrated smoothly into existing legal workflows.

6. Addressing AI Limitations and Risks

  • Observation: Panels noted the potential risks of over-reliance on AI, including the risk of diminishing critical thinking skills and the possibility of AI errors. These limitations must be addressed proactively.
  • Lesson: Understanding and mitigating the limitations and risks of AI is crucial. Students should be trained to recognize AI’s boundaries and ensure robust safeguards are in place to prevent and correct errors.

Practical Recommendations for Students

  1. Develop Interdisciplinary Skills: Combine knowledge of AI with an understanding of legal principles. This interdisciplinary approach will help in creating AI tools that are both technically sound and legally appropriate.
  2. Focus on User-Centric Design: Engage with legal professionals to understand their needs and pain points. Designing AI systems with the end-users in mind will lead to more effective and accepted solutions.
  3. Stay Updated on Legal and Ethical Standards: Regularly update your knowledge of legal standards and ethical guidelines. This will ensure that your AI solutions remain compliant and ethically sound.
  4. Embrace Continuous Improvement: Implement feedback loops and continuous learning mechanisms in your AI systems. Be open to refining and improving your solutions based on real-world usage and feedback.
  5. Promote a Balanced Approach: Advocate for a balanced use of AI, where it complements but does not replace human expertise. This balance is crucial for maintaining the quality and integrity of legal practice.

By adhering to these observations and lessons, students can contribute to the responsible and effective integration of AI in the legal profession, enhancing both the efficiency and quality of legal services.

Conclusion

The transcript shown here of ChatGPT4o’s analysis of the ten panels on AI Mentorship confirmed the consistency and high intelligence of the custom GPT Panel of AI Experts for Lawyers. We again saw no hallucinations and very few errors, most of which were minor or arguable. I doubt ten human panels on this topic would make fewer errors, and one or two of the experts might very well fabricate a detail or two under pressure. Human experts hate to say they do not know. They are people, they have egos and self-awareness. ChatGPT does not.

I was impressed by the balanced approach and honest self-criticism of ChatGPT4o. Overall, I am not sure I could have done as well. The panels proved to me the benefits of this method and the soundness of the custom GPT. Look for it soon at the OpenAI store. You will see a list of our custom GPTs on the top right column of the e-Discovery Team blog.

Ralph Losey Copyright 2024. — All Rights Reserved


OMNI Version – ChatGPT4o – Retest of the Panel of AI Experts – Part Three

May 29, 2024

Ralph Losey. Published May 29, 2024.

I thought I was done with this series, but then OpenAI released a new improved version of ChatGPT, the Omni version 4o. That meant I had to test again, to see how the custom GPT Panel of AI Experts for Lawyers would work on it. Part Three describes how the Panel of AI Experts runs with the new Omni engine and provides comparisons with the prior version ChatGPT4.0. As before, a secondary benefit from this retest was that I heard solutions about AI Hallucinations from ten more AI Expert Panels. Some new insights emerged after listening to 100 expert panelists (5x10x2). That is almost as many AI experts as Mary Mack and Kaylee Walstad hear in a year! At this point the causes of AI hallucinations, and what we can do about them, seem pretty clear. By the way, in the hundreds of pages of ChatGPT generations read, over 35,000 words, no AI hallucinations were seen, and only a few minor errors were detected.

Introduction

In Part One of this series the custom GPT I am working on, Panel of AI Experts for Lawyers, was demonstrated and readers were shown how it can be used for education and brainstorming. Evidence that AI Expert Panels Could Soon Replace Human Panelists or is this just an Art Deco Hallucination? Part One (e-Discovery Team, May 13, 2024). A panel of five experts was selected and then a six-step, Centaur-like method was used to discuss the topic of AI Hallucinations. A full transcript of the AI panel discussion was provided.

In Part Two I ran the same method nine more times to discuss the same topic of AI hallucinations. Experiment with a ChatGPT4 Panel of Experts and Insights into AI Hallucination – Part Two (e-Discovery Team, May 21, 2024). Each time a different panel of experts was selected. Then I used ChatGPT4 to analyze and evaluate the ten panels. They did very well. I had thought the experiment would end there, and after completing a few more quality control tests, I could make my GPT public. But then OpenAI released a new version of ChatGPT4, called 4o. The little letter “o” stood for Omni. Yes, it was a change from 4.0 to 4o, what a terrible and confusing name change. Who at OpenAI thought that was funny?

This meant I had to run more quality control tests with the new version of ChatGPT and write this unanticipated Part Three to the ongoing experiment. Once again I made ten more trial runs on the same AI hallucination topic, with five new experts on each panel, but this time used the Omni version of Panel of AI Experts for Lawyers GPT. Would my custom GPT still work, as is, with the new Omni? Or would I have to make revisions for it to work? If it still worked, would it perform better, as OpenAI suggested, or worse? I always verify software company promises.

Short answer, it still worked without modifications and worked well. Plus a few unexpected improvements were discovered.

Omni Powered Panel of AI Experts Ran Better than Before; More Consistent and Concise and slightly more Intelligent and Current

The Omni powered GPT worked fine as is. Fortunately, no tweaks were required for it to work right. One change in performance was immediately obvious. The GPT Panel of AI Experts for Lawyers ran through all six steps with no need for periodic prodding by me to continue. That had never happened before. Previously, I would have to interact with the GPT and tell it to continue to the next step. This was now unnecessary. It was supposed to be fully automatic like that all along. That was how I programmed the GPT. But when running it with old version 4.0, it would, for some reason, stop at stages. That forced the user to tell it to continue. A minor annoyance, but still, that was now unnecessary. A slight improvement with Omni that seemed like a bug fix.

Another obvious change, which again was a small correction, was that the form of the output of the six steps in the custom GPT was now exactly the same in each panel session. The wording was not exactly the same, but the form of presentation was the same. I was again pleased to have the style variances eliminated, as they served no function. The style was now fully consistent between sessions. Another slight bug fix.

After running ten sessions with ten panels using the new Omni version, I ran the same types of analysis of these sessions using 4o, just as I had done on the first ten using 4.0. That created a good base of data for a comparative analysis. The differences are interesting and will be described here, following the same pattern used in Part Two, but with one new twist. ChatGPT4o demonstrated a significant, unexpected change from ChatGPT4.0. It seemed much more consistent than before, not only in style, but also in ideas and wording, and much more concise.

Most users will find these changes to be an improvement. The increased conciseness was appreciated by me, as I had to read all twenty of the panel discussions. (A small book by 100 expert authors). But I am not sure about the greater consistency, whether that was good or not. Moreover, it could be a fluke or the result of overtraining and carryover from one session to the next. I will have to do another experiment on a totally different subject than hallucinations for insight into those questions.

There also seems to be a slight improvement in the overall intelligence and “currentness” of the responses, but this is difficult to measure objectively, especially when discussing a topic like AI hallucinations. Still, it seemed smarter to me and offered some new high tech solutions not seen from old 4.0. I will have to do another panel discussion on a different topic to evaluate that too.

It must be pointed out that the new Omni version has significantly improved multimedia capabilities and response times. This is the primary focus of the change from 4.0 to Omni. These improvements dwarf the minor increases in intelligence, timeliness, consistency and conciseness of writings. The Omni version brings new instant response multimedia capacities, which were not possible before. The talk between AI panelists will now be much more natural sounding than before. The pauses have been eliminated. That was not part of our earlier evaluation in Parts One and Two because it is an entirely new capability. The prior evals were of writing only.

With the new Omni version it will soon be possible to go beyond text chats to live, pause-free audio presentations and interactions. OpenAI promises that will also include video visual responses. That will make the panel performance on stage much more dramatic and more effective.

We human panelists will need to bring our ‘A’ game if we hope to surpass them. Very soon conferences will include one or more AI panelists on every panel. Be warned, I will always bring at least one with me (and it will probably not sound like Scarlett Johansson). Eventually every conference will feature many AI panels, with a human moderator for local color, laughs, and occasional expert prompt engineered guidance.

Identifying the Ten Panels Using OMNI

Below is a list of the ten panels in sequential order. Remember the Devil’s Advocate panelist is on each panel, so only the four selected are listed here. The total number of words in each session is also included, followed in parentheses by the word count using ChatGPT4.0 as described in Part Two. Thus for Panel One using Omni the total word count was 1,428 and using GPT4.0 the count was 2,431 words. This provides a specific metric on conciseness of the Omni 4o panels as compared to 4.0.

  1. Panel One. 1,428 (2,431 words). Pro-AI Attorney, Prompt Engineer lawyer, Child Prodigy, Data Scientist.
  2. Panel Two. 1,541 (1,836). Lawyer Scientist, AI Developer, Legal Educator, AI Policy Advisor.
  3. Panel Three. 1,538 (1,985). AI Ethics Expert, Cybersecurity Specialist, Machine Learning Engineer, Data Scientist.
  4. Panel Four. 1,375 (1,772). Lawyer Scientist, Criminal Law Expert, Data Scientist, AI Policy Advisor.
  5. Panel Five. 1,272 (2,275). AI Researcher, Cybersecurity Expert, Litigation Specialist, Tech Industry Analyst.
  6. Panel Six. 1,601 (1,887). Pro-AI Attorney, Prompt Engineer Lawyer, Tech Industry Analyst, Legal Educator.
  7. Panel Seven. 1,393 (2,424). Child Prodigy, Lawyer Scientist, AI Safety Specialist, Technology Journalist.
  8. Panel Eight. 1,372 (1,515). Law Judge, Cybersecurity Expert, Legal Technologist, Data Scientist.
  9. Panel Nine. 1,275 (2,303). Pro-AI Attorney, AI Safety Specialist, Machine Learning Researcher, Data Scientist.
  10. Panel Ten. 1,316 (2,091). Psychologist, Legal Technologist, Pro-AI Attorney, Lawyer Scientist.

Note that I opened a new session for each panel and did not include Dude questions in the word count. The full transcripts for all ten sessions have been preserved and could be shared with interested researchers. Watching it play out, it was obvious that the panel discussions were more uniform in thoughts and wording than the panels driven by GPT4.0. In other words, there was much more consistency and less diversity. It was as if the temperature setting for random diversity had been lowered for the new Omni version. Whereas the consistency seen in version 4.0 was about what you would expect in a panel of human experts, Omni showed more consistency than human panels. The similarities between panels in discussion and recommendation also seemed significantly greater in 4o than in 4.0. Is this an indication of an overall increase of intelligence from 4.0 to 4o? Or does it show a decline in creativity, or both?

The word count showed that some panels talked longer than others in both 4.0 and 4o. Panel One in 4.0 talked the most with 2,431 words and in 4o Panel Six talked the most with 1,601 words. In version 4.0 Panel Eight talked the least, 1,515 words, whereas in version 4o Panel Five talked the least using only 1,272 words. The total word count for all ten panels in ChatGPT4.0 was 20,519 words. The total word count for all ten panels in Omni version, ChatGPT4o, was only 14,111 words. That is a difference of 6,408 words, which is thirty-one percent (31%) fewer words. This confirmed my sense as a reader that the Omni panels were much more concise.
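For readers who want to check the arithmetic, the conciseness comparison above can be reproduced from the per-panel word counts listed earlier. This is just an illustrative tally of the numbers already in this post, not part of the GPT itself:

```python
# Per-panel word counts copied from the list above:
# first list is ChatGPT4.0 (Part Two), second is the Omni 4o retest.
counts_40 = [2431, 1836, 1985, 1772, 2275, 1887, 2424, 1515, 2303, 2091]
counts_4o = [1428, 1541, 1538, 1375, 1272, 1601, 1393, 1372, 1275, 1316]

total_40 = sum(counts_40)             # 20,519 words across the ten 4.0 panels
total_4o = sum(counts_4o)             # 14,111 words across the ten 4o panels
savings = total_40 - total_4o         # 6,408 fewer words
pct = round(100 * savings / total_40) # roughly a 31% reduction

print(total_40, total_4o, savings, pct)  # → 20519 14111 6408 31
```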

ChatGPT4o Omni’s Analysis of the Ten Panels

I asked ChatGPT Omni 4o edition to analyze the ten chat transcripts of Panel of AI Experts for Lawyers custom GPT the same way I asked ChatGPT4.0 (Turbo) workspace edition to analyze the ten earlier panels. As before, I did not use the GPT interface for this analysis. Experiment with a ChatGPT4 Panel of Experts and Insights into AI Hallucination – Part Two. Here is the transcript of the process. I made the exact same prompts that I did last time. Please compare it with the transcript in Part Two for a good look and feel for the difference between old and new ChatGPT4. It gives a good sense of the improvement in overall intelligence.

Ralph: Submits a Word Doc to the GPT4o OMNI version with the full transcript of all ten panel sessions and asks that it be studied and a summary prepared.

OMNI. I have carefully reviewed the discussions from the ten OMNI panels on the topic of AI hallucinations, as outlined in the document you provided. Here’s a comprehensive summary of the insights, discussions, and recommendations across the panels:

Overview of AI Hallucinations

AI hallucinations refer to instances where AI systems generate outputs that appear convincing and coherent but are actually incorrect or completely fabricated. This phenomenon differs from simple errors, which are typically straightforward mistakes or inaccuracies. Hallucinations are particularly problematic because they can mislead users by providing seemingly plausible yet false information.

Causes of AI Hallucinations

  • Insufficient Context: AI often generates hallucinations when it lacks complete information or receives ambiguous prompts.
  • Overfitting: Models may hallucinate when they adhere too closely to the training data, failing to generalize to new, unseen inputs.
  • Inadequate Training Data: Incomplete or biased datasets can lead the AI to make unfounded assumptions or connections.
  • Model Limitations: The inherent limitations of current AI technologies, which rely on pattern recognition without true understanding, also contribute to hallucinations.

Controlling AI Hallucinations

Across the discussions, several strategies were suggested to mitigate the issue of AI hallucinations:

  1. Clear and Precise Prompts: Providing AI with unambiguous, detailed instructions helps minimize misunderstandings and incorrect inferences.
  2. Verification and Cross-Checking: Regularly validating AI outputs against trusted data sources ensures accuracy and reliability.
  3. Enhanced Training: Improving the quality and diversity of training data can reduce the likelihood of hallucinations by providing a more robust foundation for AI predictions.
  4. Advanced Techniques: Employing techniques such as Retrieval-Augmented Generation (RAG), in which AI responses are anchored in verified information.
  5. User Education: Educating users about the capabilities and limitations of AI, encouraging a critical approach to interacting with and evaluating AI outputs.
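An editorial aside on item 4: Retrieval-Augmented Generation is the most technical of these strategies, and its core idea can be sketched in a few lines. Retrieve the most relevant passage from a trusted corpus, then build a prompt that instructs the model to answer only from that passage. The two-passage corpus and word-overlap scoring below are toy placeholders, not a production retriever:

```python
def retrieve(query, corpus):
    """Pick the passage sharing the most words with the query
    (a toy stand-in for a real vector-similarity retriever)."""
    q = set(query.lower().split())
    return max(corpus, key=lambda p: len(q & set(p.lower().split())))

def grounded_prompt(query, corpus):
    """Anchor the model's answer in retrieved text to curb hallucination."""
    context = retrieve(query, corpus)
    return ("Answer using ONLY the context below. "
            "If the context is insufficient, say so.\n"
            f"Context: {context}\nQuestion: {query}")

# Hypothetical two-passage corpus for illustration.
corpus = [
    "The policy excludes coverage for commercial vehicle use.",
    "AI hallucinations are plausible but fabricated outputs.",
]
prompt = grounded_prompt("What are AI hallucinations?", corpus)
```

The instruction to answer only from the supplied context, plus an explicit escape hatch ("say so"), is what distinguishes a grounded prompt from an open-ended one.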

Panel Insights

  • Ethical Considerations: Discussions highlighted the ethical implications of AI hallucinations, stressing the importance of transparency and accountability in AI development and deployment.
  • Legal and Policy Frameworks: Panels underscored the need for regulatory measures to address the risks associated with AI hallucinations, particularly in critical domains such as healthcare, law, and finance.
  • Human Oversight: A recurring theme was the indispensable role of human oversight in monitoring AI behavior, suggesting that AI should augment rather than replace human judgment.

Conclusion

The comprehensive view from the ten OMNI panels suggests a multifaceted approach to handling AI hallucinations, combining technological improvements, regulatory frameworks, user education, and robust verification processes. The discussions emphasize that while AI offers significant advantages, its current limitations necessitate careful management to harness its capabilities responsibly and effectively.

Ralph. Please evaluate the quality of the ten panel discussions and recommendations and determine which one has the highest overall quality as compared to the others. Also state the basis for your ranking determination.

OMNI: To evaluate the quality of the ten OMNI panel discussions and recommendations on AI hallucinations, I focused on several key aspects: clarity of discussion, depth of analysis, practicality of recommendations, and comprehensiveness in addressing different facets of AI hallucinations. Here’s a summary of the evaluation:

Criteria for Evaluation

  1. Clarity: How clearly are the issues of AI hallucinations discussed? Is the problem statement well-defined?
  2. Depth of Analysis: Does the panel delve into underlying causes and not just superficial aspects? Are various perspectives considered?
  3. Practicality of Recommendations: Are the suggested solutions actionable and relevant to users and developers?
  4. Comprehensiveness: Does the discussion encompass a wide range of implications, including ethical, technical, and practical considerations?

Ranking and Analysis

  • Panel 5: Showed the highest overall quality. The discussion was exceptionally clear, with a strong focus on actionable recommendations such as precise prompt engineering and verification protocols. It also provided a balanced view on ethical considerations and the practical implementation of AI oversight, which is crucial for real-world applications.
  • Panel 2: Ranked high for its depth of technical analysis, focusing on the technical causes of hallucinations and how AI models can be improved to mitigate these issues. However, it was slightly less practical in terms of immediate actionable steps for non-technical users.
  • Panel 8: Provided comprehensive coverage of the implications of AI hallucinations across various sectors, including legal and ethical dimensions. It fell short on specific, detailed technical solutions compared to other panels.
  • Panel 1 and Panel 10: Both offered robust discussions but were more generic in recommendations, lacking the specificity seen in Panel 5.
  • Panels 3, 4, 6, 7, and 9: These discussions were insightful but did not match the clarity and depth of Panel 5. They provided valuable insights yet were sometimes more theoretical than practical.

Basis for Ranking

Panel 5 stood out due to its well-rounded approach, addressing both the immediate practical steps that can be taken to mitigate AI hallucinations and the broader ethical and technical frameworks that need consideration. The clarity of its explanations and the direct applicability of its recommendations to both AI developers and everyday users made it notably superior. The discussion was grounded in practical realities while also considering future implications, making it both comprehensive and accessible to a wide audience.

This analysis highlights that while many panels provided valuable insights into the issue of AI hallucinations, Panel 5 delivered the most practical, clear, and comprehensive discussion, making it the most effective in both understanding and addressing the challenges presented by AI hallucinations.

Ralph: Of the ten panels, which do you think had the most innovative and creative approach? Again, state the basis for your determination.

OMNI: To assess which of the ten OMNI panels had the most innovative and creative approach to addressing AI hallucinations, I considered the novelty of the solutions proposed, the uniqueness of the discussion points relative to typical industry standards, and the panel’s ability to integrate diverse perspectives in an original manner. Here’s the analysis based on these criteria:

Criteria for Innovative and Creative Approach

  1. Novelty of Solutions: Are the solutions proposed new or distinct from conventional methods typically discussed in AI ethics and management?
  2. Uniqueness of Discussion Points: Does the panel introduce uncommon viewpoints or underexplored aspects of AI hallucinations?
  3. Integration of Diverse Perspectives: How well does the panel blend different disciplinary insights to foster new ideas?

Analysis of Panels

  • Panel 6: This panel demonstrated a highly innovative approach by integrating advanced AI techniques like Retrieval-Augmented Generation (RAG) with practical user feedback mechanisms. It creatively discussed how these tools could be used not just in theory but in specific applications to prevent AI from generating hallucinations.
  • Panel 10: Offered creative insights by suggesting a multi-disciplinary approach involving collaboration between AI ethicists, data scientists, and industry experts to develop new frameworks for AI accountability and transparency. This panel was notable for its emphasis on systemic changes rather than just individual or technological fixes.
  • Panel 4: Focused on using a combination of prompt engineering and user education, but introduced the innovative concept of “chain of thought” prompting as a method to reduce hallucinations. This creative use of existing techniques in a novel way highlighted a practical method to improve AI interaction design.
  • Panel 3 and Panel 7: Both were innovative but more so in the integration of ethics and policy considerations into the technical discourse, rather than groundbreaking technical solutions.
  • Remaining Panels: While they offered strong recommendations, they typically aligned more closely with established approaches and lacked the distinct creativity seen in Panels 6 and 10.

Basis for Ranking

Panel 6 was particularly innovative for its specific focus on leveraging cutting-edge AI technologies like Retrieval-Augmented Generation, which actively pulls from verified data sources to ground the AI’s responses in reality. This approach not only addresses the root causes of AI hallucinations by anchoring responses in factual content but also incorporates user interaction directly into the feedback loop, ensuring continual improvement.

Panel 10 also stood out by proposing a holistic rethinking of how AI systems are integrated into societal structures, emphasizing the need for a collaborative approach to governance and regulation that extends beyond technical fixes. This broadened the conversation from technical solutions to systemic changes, showing creativity in addressing the socio-technical challenges of AI.

In conclusion, while many panels provided creative insights, Panel 6 was particularly innovative in its application of advanced technical solutions to practical problems, and Panel 10 in its systemic approach to AI ethics and governance. Both represent creative leaps in thinking about how to handle AI hallucinations effectively.

Ralph: Please review the ten panel discussions and look for errors that may have been made, if any. If any mistakes are found, please describe them and speculate as to the cause of each error.

OMNI: To review and identify potential errors within the ten OMNI panel discussions on AI hallucinations, I focused on inaccuracies or misunderstandings that may have been presented regarding AI technology, its applications, or in the suggested solutions. Here’s the assessment of potential errors and possible causes:

Criteria for Error Identification

  1. Technical Inaccuracies: Misrepresentations or incorrect explanations of AI functionalities, behaviors, or limitations.
  2. Solution Feasibility: Recommendations that may not be practical or effective in addressing the issue of AI hallucinations.
  3. Conceptual Clarity: Errors in the clarity or accuracy of terminology and concepts discussed by the panels.

Identified Errors and Speculations on Causes

  • Panel 2 and Panel 7 Error: Both panels suggested that AI hallucinations could be entirely eliminated through improved training data and algorithms. This is technically inaccurate as current AI technologies, especially generative models, inherently involve a degree of unpredictability and error that cannot be completely eliminated due to their probabilistic nature.
    • Possible Cause: This error likely stems from an overly optimistic understanding of AI capabilities and a lack of acknowledgment of the inherent limitations of probabilistic models.
  • Panel 4 Error: The panel mentioned that AI could achieve full comprehension of legal texts and implications, suggesting a level of understanding akin to human experts. This is a misrepresentation as AI, particularly current language models, does not comprehend in the human sense but rather processes text based on patterns learned from data.
    • Possible Cause: This misunderstanding could be due to the conflation of AI language processing capabilities with human-like understanding, a common error especially in discussions without deep technical grounding.
  • Panel 9 Error: It was suggested that using AI in sensitive applications like legal and medical decision-making could be made safe merely through regular audits and minor tweaks to training data. This overlooks the broader challenges of reliability and the ethical implications of AI errors in high-stakes environments.
    • Possible Cause: This might be due to an underestimation of the complexity involved in applying AI in high-stakes contexts, possibly stemming from a lack of direct experience with the deployment of AI systems in such fields.

General Observations. In each case, the errors seem to be rooted in a combination of:

  • Overestimation of AI Capabilities: A common theme where the potential of AI is seen as greater than what is technologically feasible at present.
  • Simplification for Broader Understanding: In efforts to make the technology accessible to non-experts, some nuances and critical limitations of AI may have been oversimplified or omitted.
  • Lack of Technical Depth in Discussions: Some panels might not have included sufficient technical expertise to accurately represent the complex nature of AI functionalities and limitations.

These errors, while not undermining the overall value of the discussions, do suggest areas where further clarification would be beneficial to avoid spreading misconceptions about AI capabilities and the nature of its integration into practical applications.


Ralph: Did any of the panels include what could be fairly described as an AI hallucination? If any hallucinations are found, please describe them and speculate as to the cause.

OMNI: . . . Conclusion. There is no evidence from the material provided that the panels included any AI hallucinations. The panels seemed to function effectively as forums for discussing the potential and limitations of AI in handling complex information accurately. If AI had been used to simulate responses or generate content within these discussions, the focus on accuracy and the layered review process likely helped prevent any AI hallucinations from occurring within the documented discussions.

Ralph: From your study and analysis of this data, are there any additional observations that you would like to point out to students of AI that are involved in some way in the legal profession, including those supporting the profession as technology experts?

OMNI: From the analysis of the ten OMNI panel discussions on AI hallucinations, several key observations emerge that are particularly relevant for students of AI who are involved in or supporting the legal profession. These insights focus on the intersection of AI capabilities and legal applications, highlighting both opportunities and challenges.

1. Understanding AI Limitations and Capabilities

  • AI Does Not Understand Content: It’s crucial for those in the legal field to understand that AI, especially language models, does not “understand” content in the human sense. It processes and generates text based on statistical patterns in data. This understanding is critical when using AI to interpret or generate legal texts, where nuances and precise meanings can have significant implications.
  • Importance of Context: AI models can generate plausible but incorrect outputs if the prompt or the context is not clearly defined. Legal professionals using AI tools need to provide clear, specific input to get the most accurate outputs, minimizing the risk of AI hallucinations.

2. Ethical Considerations and Responsibilities

  • Bias and Fairness: AI systems can perpetuate or even exacerbate biases present in their training data. Legal professionals must be vigilant about the potential for AI to introduce bias into legal decisions or advice, especially in sensitive areas such as criminal justice and family law.
  • Transparency and Accountability: There should be a push for transparency in how AI tools are used in legal settings. Understanding how decisions are made by AI, the data it was trained on, and its limitations are essential for maintaining accountability in legal processes.

3. Practical Applications and Safeguards

  • Verification and Cross-Checking: Always verify AI-generated information with reliable sources. In the legal field, where the accuracy of information is paramount, professionals should treat AI as a tool to augment human judgment, not replace it.
  • Training and Education: Continuous education on the latest AI developments and their implications for law should be a priority. Understanding AI technology, its evolving capabilities, and how it can be leveraged responsibly in legal contexts is essential for legal professionals.

4. Future-Oriented Skills and Integration

  • Interdisciplinary Skill Sets: Encourage the development of interdisciplinary skills that combine legal knowledge with technical expertise in AI. Understanding both fields allows for more effective use and scrutiny of AI tools in legal applications.
  • Collaborative Development: Legal professionals should work collaboratively with AI researchers and developers to create tailored AI solutions that address specific needs of the legal industry while respecting ethical standards and reducing the risk of errors.

5. Embrace a Critical Mindset

  • Healthy Skepticism: Maintain a healthy level of skepticism about AI outputs. Just as legal professionals critically assess human-provided information, they should apply the same rigorous standards to AI-generated content.

Conclusion

For students of AI in the legal field, it is important to not only understand the technical aspects of AI but also to critically evaluate how these technologies are implemented within their professional practices. As AI continues to evolve and integrate more deeply into various sectors, staying informed and ethically aware will be key to leveraging AI effectively and responsibly.

Omni’s Comparative Analysis of the Two Sets of Panels: 4.0 v. 4o

I wanted to close by asking Omni to compare both sets of panels: the first ten panels run with ChatGPT4.0 and the second ten run with ChatGPT4o. I could use metrics to confirm my sense of conciseness, but my evaluations of consistency, although obvious to me, might have been subjective. My sense of an increase in insight into the subject matter, which seemed to me to relate to an increase in intelligence, was also obviously subjective. Plus, these increases could have been due to training carryover between sessions and have nothing to do with any change in the model's intelligence. For the same reason, the increased consistency might have been the result of training, not a change in the model. That might even explain the increased conciseness.

These over-training, cross-training questions require testing of panels that discuss a different subject matter. That will be another article, Part Four perhaps, but some light was shed on the issues now by asking Omni for comparisons. For these reasons I submitted the full transcripts of both ten-panel runs of my Panel of AI Experts for Lawyers in two different documents and asked for a comparison. Here is a selected excerpt of the resulting analysis by Omni of the twenty panels of AI experts analyzing AI hallucinations.
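One simple way to make the consistency judgment less subjective would be a vocabulary-overlap metric such as Jaccard similarity between session transcripts: higher scores mean more uniform wording across sessions. A minimal sketch, using toy stand-in strings rather than the actual transcripts:

```python
def jaccard(text_a, text_b):
    """Vocabulary overlap between two transcripts: |A ∩ B| / |A ∪ B|.
    Higher values suggest more uniform wording across sessions."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b)

# Toy stand-ins for two panel transcripts (not the real ones).
s1 = "clear prompts and verification reduce hallucinations"
s2 = "verification and clear prompts reduce ai hallucinations"
score = jaccard(s1, s2)  # shared vocabulary as a fraction of total vocabulary
```

Averaging such pairwise scores within each ten-panel run would give a single consistency number per model version to compare against the subjective impression.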

Ralph submits one Word Doc containing the transcript of the original ten sessions run with the old version 4.0, and a second Word Doc containing the full transcript of all ten panel sessions run with the Omni version. ChatGPT4o Omni is asked to make a comparison.

OMNI: Common Themes in Both Documents: Educational Focus: Emphasis on user education, transparency, and ethical AI use is consistent across both sets of sessions. Technological Solutions: Recommendations for improving AI reliability through better data management, adaptive learning algorithms, and user feedback mechanisms.

Differences Between the Two Sets of Sessions:

(First Sessions 4.0) Training and Feedback Mechanisms: This document highlights interactive tutorials and structured prompts to guide AI behavior. There is a strong focus on user education through real-time feedback on prompt effectiveness. Legal Standards and Due Diligence: Emphasis on establishing legal standards for AI due diligence, including routine audits and compliance checklists to ensure ethical AI deployment. AI Review Boards: Suggests forming AI review boards within organizations to continually assess AI outputs, involving domain specialists for context-specific accuracy.

(Second Sessions 4o) Technical Focus on Data Integrity: This document places a stronger emphasis on continuous data verification cycles and automated data scraping to ensure high-quality training data. Sandbox Environment for Creativity: Proposes a sandbox environment where users can explore AI’s creative outputs without restrictions, promoting innovation while safeguarding critical applications.

Unified Recommendations: Both sets of sessions conclude with unified recommendations that stress a multi-faceted approach involving user education, technological advancements, interface design, and continuous data management.

Conclusion. Both documents provide comprehensive discussions on managing AI hallucinations, involving a panel of experts who offer diverse perspectives and practical solutions. The main differences lie in the specific recommendations and focus areas, with the first document emphasizing legal standards and feedback mechanisms, while the second highlights technical data integrity and creative exploration through sandbox environments.


Ralph: Analyze and compare the consistency in the first ten panels and compare it with the consistency in the second ten panels.

OMNI: Consistency in Document one. … Recommendations: Unified recommendations across sessions emphasize user education, prompt engineering, continuous feedback, and legal and ethical considerations.

Consistency in Document two. … Recommendations: The sessions conclude with consistent recommendations emphasizing clear prompt design, rigorous verification of AI outputs, feedback mechanisms, and user education.

Comparison of Consistency Between the Two Sets of Sessions: Document 1 focuses slightly more on legal standards and due diligence, highlighting the need for structured audits and compliance checklists. Document 2 emphasizes technical solutions like retrieval-augmented generation and advanced data integrity protocols, suggesting more technologically focused mitigation strategies.

Conclusion: Both documents exhibit high consistency in the composition, structure, topics, and recommendations of the AI Expert Panel sessions. The key differences lie in the emphasis areas, with the first document leaning towards legal and educational measures, while the second emphasizes technical solutions and continuous data verification.


Ralph: Can you detect any different capabilities or intelligence in the two different ChatGPT models used in the two sessions?

OMNI: Comparison of Capabilities and Intelligence.

  1. Technical Proficiency:
    • Both models exhibit high technical proficiency, but the OMNI version demonstrates a more advanced understanding of cutting-edge AI techniques and their implementation.
    • The OMNI version is more adept at suggesting innovative solutions and integrating complex AI methodologies.
  2. Comprehensiveness:
    • The first document provides a balanced approach, integrating legal and technical perspectives equally.
    • The second document leans more towards advanced technological solutions, indicating a higher level of intelligence in dealing with AI-specific issues.
  3. Panel Composition and Interaction:
    • Both sets of panels include diverse experts, but the OMNI version includes a wider variety of technical and policy-focused experts.
    • The interactions in the OMNI version appear more dynamic, with deeper dives into technical details and creative problem-solving approaches.
  4. Practical Application:
    • The first document emphasizes practical, legal-focused solutions suitable for immediate implementation.
    • The second document offers more forward-looking, technically sophisticated strategies, suggesting a model with greater innovative capabilities.

Conclusion. Both models are highly capable and intelligent, but the OMNI version shows enhanced capabilities, particularly in handling complex AI issues and providing innovative solutions. The diversity of expert panels and the depth of discussion in the OMNI version suggest it is more advanced in integrating and applying cutting-edge AI technologies and methodologies.


Ralph: What is the likely reason for the increased conciseness seen? Is it correlated to the increased intelligence noted in the new Omni version? If so, explain how.

OMNI: Likely Reasons for Increased Conciseness

1. Enhanced Language Processing

The OMNI version likely benefits from improved natural language processing capabilities. This includes:

  • Better Summarization: The OMNI version can distill complex ideas into more concise summaries without losing critical information.
  • Efficient Communication: Improved algorithms for identifying and eliminating redundant or superfluous language.

2. Advanced Understanding and Contextual Awareness

  • Contextual Precision: The OMNI version may have a better grasp of context, allowing it to deliver precise answers without unnecessary elaboration.
  • Focused Responses: It can more accurately address specific questions and follow-up points, reducing the need for extensive explanation.

3. Optimized Prompt Engineering

  • Effective Prompt Responses: Enhanced prompt engineering means the AI can respond more directly and effectively to prompts, reducing wordiness.
  • Adaptive Learning: Continuous feedback loops improve the model’s ability to streamline responses.

Correlation to Increased Intelligence

1. Efficiency in Information Processing

  • Cognitive Efficiency: An intelligent model processes and synthesizes information more efficiently, resulting in concise and relevant outputs.
  • Selective Detailing: Higher intelligence allows the model to determine which details are essential and which can be omitted without sacrificing clarity.

2. Improved Knowledge Representation

  • Knowledge Compression: Advanced models represent knowledge more compactly, enabling them to convey the same amount of information in fewer words.
  • Semantic Understanding: Better semantic understanding helps the model avoid verbose explanations by directly addressing the core of the queries.

3. Strategic Use of Language

  • Conciseness as a Skill: A more intelligent model recognizes the value of conciseness in professional and academic settings, tailoring its responses accordingly.
  • Effective Communication: Improved communication strategies allow the model to use language that conveys maximum information with minimal word count.

Practical Implications

1. User Efficiency

  • Time-Saving: Users can derive the necessary information more quickly, enhancing productivity.
  • Readability: Concise responses are easier to read and understand, improving the user experience.

2. Professional Application

  • Legal and Technical Fields: In fields where clarity and precision are paramount, the ability to provide concise, accurate information is particularly valuable.

Conclusion

The increased conciseness observed in the OMNI version of ChatGPT is likely a result of its enhanced language processing capabilities, improved contextual awareness, and optimized prompt engineering. These improvements are correlated with the model’s increased intelligence, allowing it to process and deliver information more efficiently and effectively. This results in responses that are not only shorter but also more precise and relevant, aligning with the needs of professional users seeking clarity and succinctness in AI-generated outputs.


Ralph: Is it possible that training from one session to the next running the same custom GPT, Panel of AI Experts for Lawyers, could play a role in the increased conciseness, intelligence, or consistency?

OMNI: Potential Impact of Training Between Sessions on Conciseness, Intelligence, and Consistency: 1. Impact on Conciseness. Training from one session to the next can significantly contribute to increased conciseness. … 2. Impact on Intelligence. Continuous training can enhance the overall intelligence of the model. … 3. Impact on Consistency. Training across sessions can also improve consistency. …

Practical Implications. 1. Efficiency in Legal Practice. a. Time-Saving: More concise responses mean that legal professionals can obtain the information they need more quickly, enhancing productivity. b. Reliability: Increased consistency ensures that legal professionals can rely on the AI for standard and predictable responses, improving trust in the tool. 2. Enhanced AI-User Interaction. a. User Confidence: As the AI becomes more intelligent and consistent, users are likely to develop greater confidence in its capabilities. b. Adaptability: The model’s ability to learn from each session allows it to adapt to the specific needs and preferences of its users over time.

Conclusion
Training from one session to the next plays a crucial role in enhancing the conciseness, intelligence, and consistency of the ChatGPT model. Through reinforcement learning, supervised fine-tuning, and iterative improvement, the model can better understand and respond to legal queries, providing more efficient and reliable support to legal professionals. This continuous improvement process ensures that the AI remains a valuable tool in the legal domain, capable of delivering high-quality, concise, and consistent outputs.

Ralph Losey Copyright 2024 — All rights Reserved


Experiment with a ChatGPT4 Panel of Experts and Insights into AI Hallucination – Part Two

May 21, 2024

Ralph Losey. Published May 21, 2024.

In Part One of this article the AI panel of experts procedure was demonstrated and readers were shown how it can be used for education and brainstorming. Evidence that AI Expert Panels Could Soon Replace Human Panelists or is this just an Art Deco Hallucination? Part One. (e-Discovery Team, May 13, 2024). A panel of five experts was selected and then a six-step, Centaur-like method was used to discuss the topic: “AI Hallucinations: What are they? How different from errors? What causes them? What can individual users do to control the problem?” A full transcript of the AI panel discussion was provided.

Since then I ran the same method nine more times, using the same Panel of AI Experts for Lawyers GPT. Each time the exact same question about AI Hallucinations was presented to the panel. Each time the panel used the same six-step method. The only change was that a different group of experts was used each time. The Devil’s Advocate expert was included on each panel because this persona is part of the design of the method, but the other four panelists were different.

Beta Testing the Custom GPT – Panel of AI Experts for Lawyers

This systematic testing effort was part of other beta tests I am doing to perfect the new custom GPT program, Panel of AI Experts for Lawyers. With no known prior methods of testing an expert panel like this, I was unsure what would happen if the only change made was panel composition. I was looking for errors, for breakdowns and inconsistencies.

I assumed there would be differences in the panel conclusions and that the variations would be correlated to the experts used, but I was not sure. Until you try something with complex prompts, you never really know. I was concerned that the various panels might reach essentially the same result, the same solution, no matter who the experts were. Perhaps ChatGPT4 was flawed in a way that I had not yet detected. Or perhaps my customization of GPT4 with the Panel of AI Experts for Lawyers was flawed. I wanted to observe firsthand how the panelist selection impacted the panel discussion and solution. Recall from Part One that the custom GPT selection was a hybrid process: the human user selects the final panel members from a pool of experts suggested by the AI. That is one reason I call it a Centaur method, although it has elements of Cyborg methodology too. See: From Centaurs To Cyborgs: Our evolving relationship with generative AI.

It is interesting to note that in each session the AI created a slightly different pool of experts and offered it up for selection by the human user. The custom GPT never repeated the exact same group of experts for the human to pick from, although there was considerable overlap. I do not know why, since the exact same question was asked each time. Anyone care to speculate?

This experiment with repetitive trials was designed to follow the OpenAI prompt engineering strategy of testing changes systematically. This article, Part Two, reports what was learned about the custom GPT, Panel of AI Experts for Lawyers, and about ChatGPT-4 itself. Incidentally, it also reports what was learned about the topic itself, AI hallucinations, by having ten different AI panels look at it.

Read on and see what you think, but I was impressed. Although I generally agree with OpenAI CEO Sam Altman’s tongue-in-cheek assessment that ChatGPT-4 “kinda sucks,” this one feature “rocks.” Like Sam, I cannot wait to see what it will do when GPT-5 is released, as was discussed in Part One.

Identifying the Ten Panels

The conduct of the first panel was shared in Part One, along with a full transcript of the chat session. It would take up too much space to share the transcripts of the next nine chat sessions on the same topic, but I will provide an overview of them before asking ChatGPT-4 to do a comparative analysis and critique of the ten panels. Below is a list of the ten panels in sequential order, with the total word count of each session and the panel membership. Remember, the Devil’s Advocate is on every panel, so only the four selected panelists are listed here.

  1. Panel One. 2,431 words. AI Specialist Attorney, Prompt Engineer, Child Prodigy, Machine Learning Developer.
  2. Panel Two. 1,836 words. AI Ethics Advisor, Data Privacy Consultant, Forensic Technologist, Consumer Rights Activist.
  3. Panel Three. 1,985 words. Lawyer Scientist, Data Scientist, AI Developer, User Experience Designer.
  4. Panel Four. 1,772 words. Prompt Engineer, Software Developer, Cognitive Psychologist, Regulatory Compliance Specialist.
  5. Panel Five. 2,275 words. AI Specialist Attorney, Child Prodigy, Lawyer Scientist, Machine Learning Engineer.
  6. Panel Six. 1,887 words. Pro-AI Attorney, Prompt Engineer Lawyer, Child Prodigy, Lawyer Scientist.
  7. Panel Seven. 2,424 words. Prompt Engineer Lawyer, Neuroscientist AI Researcher, AI User Experience Designer, Data Integrity Specialist.
  8. Panel Eight. 1,515 words. Child Prodigy, Ethics Officer, AI Systems Developer, Cognitive Psychologist.
  9. Panel Nine. 2,303 words. Lawyer Scientist, Cognitive Psychologist, AI Ethics Expert, Machine Learning Engineer.
  10. Panel Ten. 2,091 words. Pro-AI Attorney, Cognitive Scientist, Ethics in AI Expert, Neuro-Linguistic Programmer.
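For readers who like to check the arithmetic, the spread of session lengths listed above can be summarized with a few lines of Python. The word counts are copied straight from the list; nothing here is part of the custom GPT itself:

```python
# Word counts for the ten panel sessions, as listed above.
panel_words = {
    "Panel One": 2431, "Panel Two": 1836, "Panel Three": 1985,
    "Panel Four": 1772, "Panel Five": 2275, "Panel Six": 1887,
    "Panel Seven": 2424, "Panel Eight": 1515, "Panel Nine": 2303,
    "Panel Ten": 2091,
}

total = sum(panel_words.values())
mean = total / len(panel_words)
longest = max(panel_words, key=panel_words.get)
shortest = min(panel_words, key=panel_words.get)

print(f"Total: {total} words, mean: {mean:.0f}")
print(f"Longest: {longest} ({panel_words[longest]} words)")
print(f"Shortest: {shortest} ({panel_words[shortest]} words)")
```

Running this shows that the ten sessions total 20,519 words and average roughly 2,052 words, with Panel One the longest and Panel Eight the shortest.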

It is important to note that I carefully opened a new session for each panel. This was necessary to avoid carry-over from one panel to the next. The full transcripts of all ten sessions have been preserved and can be shared with interested researchers. While watching the sessions play out, it was obvious that the panel discussions were always somewhat different, which is just what you would expect of human panels discussing the same subject. There were similarities and differences overall, again as you would expect with humans. We will go into that in greater detail in the AI analysis. And yes, the differences all make sense, as did all of the panel recommendations.

It is interesting to observe that some panels talked longer than others, as shown by the varying word counts. Panel One talked the most, using 2,431 words; it was made up of an AI Specialist Attorney (sort of like me), a Prompt Engineer, a Child Prodigy, a Machine Learning Developer, and the Devil’s Advocate. Panel Eight talked the least, using only 1,515 words; it was composed of the Child Prodigy, an Ethics Officer, an AI Systems Developer, a Cognitive Psychologist, and the Devil’s Advocate. Again, that is the way human panels work too. I know because I am a long-winded panelist myself, and my panels always run long.

ChatGPT-4’s Analysis of the Ten Panels

I asked the workspace edition of ChatGPT-4 to analyze the ten chat transcripts of the Panel of AI Experts for Lawyers. This analysis was done by the normal pro version of ChatGPT-4, not the custom GPT, as an objectivity precaution: I wanted to avoid any overtraining bias. What follows are my brief prompts asking for various types of analysis, together with the AI’s responses. From here on out, most of the text in this article is generated by AI. Everything up to this point was written by me personally, not an AI.

One more thing before I go into the ten panels: it might help the reader to see an explanation of the six-step panel methodology that I programmed into the custom GPT, Panel of AI Experts for Lawyers. For purposes of clarity, anything I personally write going forward will be in bold italics. Note that these first two prompts were directed to the custom GPT itself, since I was asking about its internal programmed processes.

__________________

Ralph: Describe the six-step process of the panel discussion.

Custom GPT: The six-step panel discussion process used by the expert panel for legal professionals involves a structured dialogue aimed at tackling specific legal issues. Here’s how the process works:

  1. First Round of Panelist Discussion: Each expert introduces themselves and provides opening remarks. They propose possible solutions to the issue presented by the user. This round sets the stage for the discussion, with each panelist offering their initial thoughts and approaches.
  2. Second Round of Panelist Discussion: Panelists expand on their initial proposals, considering feedback and ideas from the first round. They may also address and critique suggestions made by other panelists, refining their own solutions in the process.
  3. Third Round of Panelist Discussion: The focus here is on finding common ground among the differing perspectives presented. Panelists work towards a unified recommendation that incorporates elements from the various proposals discussed in earlier rounds.
  4. Solution Formulation: The panel collaborates to build a consensus solution to the user’s question, incorporating examples and insights from each expert. This is a more integrative phase where the collective expertise of the panel synthesizes a comprehensive answer.
  5. Final Explanation: The Devil’s Advocate summarizes the discussion and the consensus solution reached by the panel. This summary not only encapsulates the solution but also highlights key considerations and alternative viewpoints that were discussed.
  6. Questions: The panel invites follow-up questions from the user. This allows for clarification and further exploration of the topic based on the user’s specific needs and interests.

This structured approach ensures a thorough exploration of the topic from multiple expert perspectives, leading to well-rounded and pragmatic solutions.
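The six-step flow the custom GPT describes above can be pictured as a simple orchestration loop. To be clear, this is my own illustrative sketch, not the actual code behind the Panel of AI Experts for Lawyers; the `ask_model` function is a stub standing in for a real chat-completion API call, and all names are hypothetical:

```python
# Illustrative sketch of the six-step panel loop (NOT the actual custom GPT code).
# `ask_model` is a stub standing in for a real chat-completion API call.

STEPS = [
    "Round 1: each panelist introduces themselves and proposes solutions.",
    "Round 2: panelists expand on and critique the Round 1 proposals.",
    "Round 3: panelists look for common ground among the proposals.",
    "Solution formulation: build a consensus answer with examples.",
    "Final explanation: the Devil's Advocate summarizes the consensus.",
    "Questions: invite follow-up questions from the user.",
]

def ask_model(prompt: str) -> str:
    """Placeholder for an LLM call; echoes the step it was asked to perform."""
    return f"[reply to: {prompt.splitlines()[-1]}]"

def run_panel(question: str, panelists: list[str]) -> list[str]:
    """Walk the six steps in order, feeding each reply back in as context."""
    context = f"Panelists: {', '.join(panelists)}. Question: {question}"
    transcript = []
    for step in STEPS:
        reply = ask_model(context + "\n" + step)
        transcript.append(reply)
        context += "\n" + reply  # carry the discussion into the next step
    return transcript

transcript = run_panel(
    "What are AI hallucinations?",
    ["Lawyer Scientist", "Cognitive Psychologist", "Devil's Advocate"],
)
```

The design point is that each round’s output is fed back as context for the next round, which is what allows the later rounds to critique and then converge on the earlier proposals.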

Then I asked for more background information that might assist the reader, namely how the additional experts, beyond the five default panelists, are selected for the user to choose from.

Ralph: How do you select additional experts for me to choose?

Custom GPT: When selecting additional experts to propose for your panel, I consider the specific topic or problem you’ve presented. This allows me to tailor the panel to provide the most relevant and comprehensive expertise. Here’s how I typically select and suggest these additional experts:

  1. Identify the Topic: First, I clarify the main subject or issue you’ve brought up. Whether it’s a question about AI ethics, data privacy laws, intellectual property rights, or any other legal area, understanding the core of your inquiry is crucial.
  2. Assess Relevance and Expertise: Based on the topic, I consider which fields of law and technology are most relevant. This helps in pinpointing which types of experts would provide the most valuable insights.
  3. Choose Complementary Experts: I look for experts whose backgrounds and specialties complement the default panelists and address any gaps in expertise. This ensures a well-rounded discussion that covers various facets of the topic.
  4. Diverse Perspectives: I aim to include experts with diverse perspectives to encourage a rich dialogue. This might include professionals from different legal backgrounds, technologists, ethicists, or industry-specific experts.

For instance, if the topic involves criminal law and AI, I might suggest a Criminal Law Expert, a Forensic AI Analyst, a Data Privacy Lawyer, and perhaps a Law Enforcement Technology Consultant. Each would bring a different perspective to how AI impacts criminal justice.

This tailored approach ensures that the panel is not only well-equipped to address your specific questions but also positioned to explore the broader implications of the topic.

Ralph: Submits a 78-page Word document to ChatGPT-4 with the full transcripts of all ten panel sessions, asking that they be studied and a summary prepared.

ChatGPT: I’ve reviewed the ten sessions on “AI Hallucinations” from the transcript you provided. Here is a summary of the content:

General Overview: Each panel, from One to Ten, discussed AI hallucinations in response to the same inquiry about their nature, differences from errors, causes, and user control mechanisms. Despite varied expert compositions and word counts, common themes and solutions were explored across the sessions.

Common Themes:

  1. Definition and Differentiation: AI hallucinations were consistently described as AI systems generating misleading, fabricated, or irrelevant outputs without malicious intent, differentiating them from mere errors, which are usually more straightforward mistakes due to incorrect data input or processing.
  2. Causes: Commonly identified causes included gaps in training data, the model’s inability to comprehend complex or ambiguous inputs, and overfitting.
  3. User Control: Solutions focused on users refining inputs, using specific, context-driven prompts, and providing feedback to help AI systems learn and improve.
  4. Development and Oversight: There was a strong emphasis on the need for rigorous AI model testing, transparent development processes, and continual learning frameworks to ensure AI reliability and accuracy.

Specific Observations from Various Panels:

  • Panel One: Highlighted the importance of legal and ethical considerations in managing AI outputs, especially in sensitive fields like law and healthcare.
  • Panel Two: Focused on ethical implications and data privacy, suggesting stringent data governance and the use of AI systems within their trained scopes.
  • Panel Three: Discussed the role of regulations and standards in AI development to minimize hallucinations.
  • Panel Four and Five: Emphasized the integration of cognitive science to improve AI’s understanding and processing capabilities.
  • Panel Six to Ten: Varied in expert composition, stressing the collaboration between AI developers, users, and regulatory bodies to address the multifaceted nature of AI hallucinations.

General Consensus and Proposed Solutions:

  • A multifaceted approach involving both technological solutions and proactive user engagement.
  • Legal and ethical standards for AI development.
  • Continuous education and transparency about AI capabilities and limitations.

This summary captures the essence of the discussions across all ten panels, reflecting on the collective insights and recommendations regarding AI hallucinations. Let me know if you need further details from specific panels or have any follow-up questions!

Ralph: Please evaluate the quality of the ten panel discussions and recommendations and determine which one has the highest overall quality as compared to the others. Also state the basis for your ranking determination.

ChatGPT: To evaluate and rank the quality of the ten panel discussions and their recommendations on AI hallucinations, I considered several criteria, including:

  1. Depth of Expertise: How well-represented were various relevant fields in the panel, such as legal, ethical, technical, and user experience perspectives?
  2. Comprehensiveness of Discussion: Did the panel address all parts of the inquiry about AI hallucinations—what they are, how they differ from errors, their causes, and user control measures?
  3. Practicality and Innovativeness of Recommendations: How actionable and forward-thinking were the proposed solutions?
  4. Interdisciplinary Approaches: How effectively did the panel integrate insights across different disciplines?
  5. Engagement with Complexity: Did the panel address underlying complexities and potential drawbacks of suggested solutions?

Based on these criteria, here is a summarized evaluation of each panel and the ranking of the highest quality discussion:

Panel Evaluations:

  • Panel One: Showed strong legal and technical discussions but lacked a broader ethical perspective.
  • Panel Two: Focused well on ethical implications and included strong data privacy discussions, which are critical for handling AI hallucinations but had less emphasis on practical, technical solutions.
  • Panel Three: Offered robust regulatory insights, valuable in theory but less detailed on practical, user-level interventions.
  • Panel Four and Five: These discussions incorporated cognitive science effectively, which is innovative, providing a deeper understanding of AI behavior.
  • Panel Six: Emphasized collaborative strategies across different stakeholders, showing a balanced approach but lacked specific, actionable steps.
  • Panel Seven to Ten: Varied in expert compositions; some leaned more heavily on technical solutions without integrating enough legal or ethical considerations.

Highest Quality Discussion:

Panel Five stands out as the highest quality discussion for several reasons:

  • Interdisciplinary Strength: It included a strong mix of technical, legal, ethical, and cognitive insights, ensuring a well-rounded discussion.
  • Depth and Practicality of Recommendations: Proposed solutions were both innovative and practical, emphasizing continuous learning, real-time monitoring, and the importance of diverse training data.
  • Engagement with Complexities: The panel did not shy away from discussing the limitations and responsibilities of both AI developers and users, advocating for a balanced approach to AI reliance.

Panel Five excels in offering a comprehensive view that respects the complexity of AI hallucinations while providing actionable and ethically sound advice for both developers and users. This makes it the most effective panel in addressing the multifaceted challenges posed by AI hallucinations.

If you need more detailed insights or specific examples from Panel Five or any other panel to further support this evaluation, feel free to ask!

Ralph: Of the ten panels, which do you think had the most innovative and creative approach? Again, state the basis for your determination.

ChatGPT: For identifying the panel with the most innovative and creative approach to discussing AI hallucinations, I focused on the following criteria:

  1. Novelty of Solutions: Introduction of unique or rarely considered solutions to address AI hallucinations.
  2. Integration of Diverse Perspectives: How well the panel combined insights from different disciplines in a creative way.
  3. Forward-Thinking Proposals: Suggestions that not only solve current issues but also preemptively address potential future challenges.

Panel Evaluations for Innovation:

  • Panel One to Three: These panels provided robust foundational discussions, particularly strong in legal, ethical, and regulatory aspects, but they tended to rely on more conventional approaches.
  • Panel Four and Five: These discussions showed more creativity by incorporating cognitive science, which helps in understanding AI behaviors akin to human cognitive processes, offering a deeper insight into the nature of AI hallucinations.
  • Panel Six: While comprehensive, it largely focused on collaboration and standard interventions, which are less about innovation and more about implementation.
  • Panel Seven to Ten: These varied in their approach but often concentrated on enhancing existing solutions rather than introducing new ideas.

Most Innovative and Creative Approach:

Panel Four stands out as the most innovative due to its incorporation of cognitive psychology. The rationale for this selection includes:

  • Use of Cognitive Science: By applying principles of human cognition to AI, the panel proposed innovative ways to anticipate and mitigate AI hallucinations, reflecting how AI might mimic or diverge from human thought processes.
  • Creative Solutions: For example, using insights from cognitive psychology to design AI systems that better mimic human reasoning processes, thus reducing the likelihood of generating nonsensical or irrelevant outputs.
  • Interdisciplinary Creativity: The integration of cognitive psychology is particularly creative as it leverages a deeper understanding of human error to address AI-generated hallucinations, suggesting novel ways to refine AI training and interaction.

Panel Four’s approach is not only innovative but also creative in its application of psychological principles to technology, offering a fresh perspective on the problem of AI hallucinations. This kind of cross-disciplinary innovation can lead to breakthroughs in how AI systems are designed and understood, potentially reducing errors in a way that purely technical solutions might not achieve.

If further details or examples from Panel Four’s discussion are needed to illustrate these points more vividly, please let me know!

Ralph: Please review the ten panel discussions and look for errors that may have been made, if any. If any mistakes are found, please describe them and speculate as to the cause of each error.

ChatGPT: Upon reviewing the content of the ten panel discussions, here are the noted inconsistencies or potential errors found in the discussions along with speculations on their causes:

  1. Panel Three’s Discussion on Data Reliability: There was an instance where the discussion about the importance of data quality in reducing AI hallucinations might have overlooked the equally crucial aspect of model architecture. This could be due to the panel’s stronger focus on data management rather than a balanced view that includes both data and model design considerations.
  2. Panel Five’s Treatment of Cognitive Science: The panel attempted to draw parallels between human cognitive errors and AI hallucinations. This analogy, while creative, may lead to misconceptions about the nature of AI processing, which fundamentally differs from human cognitive processes. This error might stem from an over-simplification intended to make the subject more accessible to non-specialists.
  3. Panel Nine’s Ethical Recommendations: There were broad ethical recommendations that lacked specificity on how to implement such guidelines practically within AI systems. This could be attributed to the panel’s composition, possibly lacking a member with practical AI implementation experience, focusing more on theoretical or ethical considerations.
  4. Consistency Across Panels: Some panels emphasized technical solutions more than others, which could lead to inconsistencies in the overall recommendations given to users. This variation might result from different expertise levels or professional backgrounds of the panelists, leading to a divergence in focus between technical and non-technical solutions.

These errors or oversights highlight the complexity of discussing AI hallucinations and the challenges in ensuring that every panel addresses all aspects comprehensively. The variety in the depth and focus of each panel suggests a need for a more standardized approach to discussing such a multifaceted topic, ensuring that all critical areas are covered uniformly across discussions. This could help mitigate the risk of providing incomplete or biased information and improve the overall quality of the recommendations provided.

Ralph: Did any of the panels include what could be fairly described as an AI hallucination? If any hallucinations are found, please describe them and speculate as to the cause.

ChatGPT: The review of the ten panel discussions did not reveal any instances that could be fairly described as AI hallucinations within the discussions themselves. Each panel appears to have maintained a coherent and relevant discourse on the topic of AI hallucinations, focusing on explaining, managing, and mitigating such issues without deviating into unrelated or incorrect information that would itself be indicative of a hallucination.

This outcome is likely because the panels were comprised of expert discussions designed to elucidate the topic rather than examples of AI-generated content operating autonomously. The structure and oversight in organizing these panels, which included expert selection and predefined discussion steps, likely prevented any deviation into hallucinatory content, ensuring the discussions stayed on topic and informative.

Ralph: From your study and analysis of this data, are there any additional observations that you would like to point out to students of AI who are involved in some way in the legal profession, including those supporting the profession as technology experts?

ChatGPT: From the analysis of the panel discussions on AI hallucinations within the context of the legal profession and the involvement of technology experts, several key observations and recommendations emerge that could be particularly insightful for students of AI and professionals in the legal field. These include:

  1. Understanding AI Limitations: It’s crucial for those in the legal profession to recognize that AI systems, including sophisticated models like GPT, can generate outputs that might not always be accurate or relevant—known as “hallucinations.” A deep understanding of these limitations is essential for responsibly integrating AI into legal practices.
  2. Ethical and Legal Implications: AI hallucinations can have serious ethical and legal implications, especially when used in decision-making processes that affect legal outcomes. Legal professionals must be aware of the potential for AI to generate incorrect information and the liability that could ensue from relying on such information.
  3. Role of Data Quality and Diversity: The discussions highlighted the importance of data quality and diversity in training AI systems. Legal professionals and technologists should ensure that the data used in AI training is not only high-quality and extensive but also diverse and representative to minimize biases and reduce the incidence of hallucinations.
  4. Continuous Education and Training: Given the evolving nature of AI technologies, continuous education and training in AI capabilities, limitations, and ethical considerations are vital for legal professionals. This ongoing learning process can help them stay updated on the latest developments and understand how best to utilize AI tools in their work.
  5. Interdisciplinary Collaboration: Collaboration between legal professionals, technologists, and AI ethics experts is essential to create AI solutions that are ethically sound, legally compliant, and technically robust. Such collaboration can also lead to more innovative and effective uses of AI in the legal sector.
  6. Critical Engagement and Oversight: Even as AI tools become more advanced and integrated into legal practices, critical engagement and rigorous oversight are necessary. Legal professionals should not accept AI outputs at face value but should critically evaluate them for accuracy and relevance, especially in legally sensitive contexts.
  7. Developing AI Policies and Standards: There is a need for developing and implementing comprehensive AI policies and standards that address the use of AI in legal contexts. These should include protocols for managing AI hallucinations, ensuring data privacy, and maintaining the integrity of AI systems.

These observations underscore the importance of a cautious and informed approach to integrating AI into the legal profession. For students and professionals in this area, gaining a robust understanding of AI technologies, their potential pitfalls, and the ethical considerations surrounding their use will be critical as they navigate the future landscape of law and technology.

Ralph: Thank you. Do you have any other concluding remarks?

ChatGPT: You’re welcome! As a concluding remark, it’s important to emphasize that while AI technologies offer significant potential to revolutionize the legal profession, they also bring challenges that require careful consideration, particularly in terms of ethical use, reliability, and the legal implications of AI-generated outputs.

Legal professionals and technologists should work collaboratively to ensure that AI tools are used responsibly, with a constant awareness of their limitations and potential biases. It’s crucial to foster an environment where ongoing education, ethical considerations, and robust oversight are integral to the deployment of AI technologies in legal settings. By doing so, the legal profession can harness the benefits of AI while minimizing risks and ensuring justice and fairness remain at the forefront of technological advancements.

Always keep learning and stay engaged with the latest developments in AI and law to be prepared for the evolving landscape of this exciting interdisciplinary field! If you have any more questions or need further assistance in the future, feel free to reach out.

Ralph: I always like to leave my readers with a smile on their face. Do you think you could do that with a joke or funny story that is somehow relevant to this discussion?

ChatGPT: Certainly! Here’s a light-hearted joke related to AI in the legal field: Why did the robot lawyer lose its first case? Because it couldn’t find the “terms and conditions” of the agreement to be acceptable—it was expecting them to be in binary! I hope that brings a smile to your readers’ faces!

Ralph: How about another one related to AI Hallucinations?

ChatGPT: Sure, here’s a joke about AI hallucinations: Why did the AI start talking to the vending machine? It hallucinated that the snacks were trying to stage a “chip” takeover! Hope that adds a chuckle to the discussion on AI hallucinations!

Ralph: Not really, but you tried. Looks like there is still need for humans after all!

ChatGPT: Absolutely, humans are irreplaceable, especially when it comes to humor! We’ll keep the human touch in the loop for the laughs. If you ever need another attempt or anything else, just let me know!

Ralph Losey Copyright 2024. All Rights Reserved.

