e-Discovery Team

LAW and TECHNOLOGY – Ralph Losey © 2006-2026

The Shape of Justice: How Topological Network Mapping Could Transform Legal Practice

Ralph Losey. August 18, 2025.

What if justice had a shape — not rigid scales or a blindfolded figure, but a living, dynamic map? Imagine causation as a multidimensional space, where influence, control, and responsibility could be mapped across a moving legal landscape, like tides over a reef. That is the vision behind Topological Jurisprudence, a framework first glimpsed through work with applied mathematics using advanced AI — now, for the first time, including ChatGPT-5.

Underwater topological network — visualizing justice as tides over a reef. Created by Losey using multiple AI tools.

Using topological network mapping, we set out to see if this next-generation AI can turn an abstract mathematical concept into a practical tool for mapping fault in law’s most complex disputes.

Introducing Topological Jurisprudence — a near-future legal-tech vision where mathematics meets liability analysis. Image by Losey using multiple AI tools.

This idea may sound like science fiction, but it is grounded in topological data analysis (TDA), a branch of applied mathematics built to reveal patterns and relationships in complex, evolving systems. Here, the courtroom meets the mathematics of shape and flow. In the pages ahead, I move from visual allure to substantive potential: can AI-driven topology bring clarity to the most complex causation disputes of our digital age?

And this time, the question is tested with tools that didn’t exist when I began this journey. In Epiphanies or Illusions, Parts One and Two, I explored whether AI could find meaningful cross-domain patterns at all. Now, with GPT-5’s leap in cross-disciplinary synthesis, I can ask something new: not just “can it see the pattern?” — but “can it carry that insight into a framework lawyers can actually use in court?”

Visual composition set to original music to get a ‘left and right brain’ feel for this new approach. It shows the curvature and flowing connectivity at the heart of the topological perspective.

From Epiphany to Application

The Shape of Justice is my first major legal-technology project to apply GPT-5’s markedly improved abilities in cross-disciplinary pattern recognition and synthesis.

WHY GPT-5 MATTERS: GPT-5 can keep legal rules precise while integrating mathematics, simulations, and causation theory into a single, usable framework. Earlier models could suggest patterns; GPT-5 can carry them through to courtroom-ready analysis.

Where GPT-4o could propose intriguing links, GPT-5 can integrate them into a coherent, working framework without losing the rigor of either discipline. Here, that means fusing the mathematics of topology with the logic of proximate cause, comparative fault, and courtroom evidence.

The result is a practical tool — Topological Jurisprudence — that can map complex causation in a way static diagrams and bullet-point briefs cannot. It can show where fault originates, where it converges, and, as the case study ahead demonstrates, where it never touches a party at all.

A judge observing a glowing, abstract toroidal structure that represents complex legal relationships and data flows against a dark, dynamic background.
A judge contemplates the multidimensional shapes of causation — where legal reasoning meets mathematical mapping. Image by Losey using Visual Muse.

Why Traditional Tools Break Down

In simple negligence, causation is a straight line:

Defendant A acted → Plaintiff B was harmed → liability follows.

But complex, multi-actor disputes — especially in high-tech contexts — rarely follow neat chains.

Consider:

  • Autonomous vehicle accidents with components and services from multiple companies.
  • Blockchain collapses involving code, governance votes, and trading patterns.
  • International supply chain contamination where the source could be anywhere in a dozen linked facilities.

Traditional “branch tree” diagrams are static. They struggle with systems where relationships change over time or where multiple causes converge unexpectedly.

Multidimensional accident causation map — showing how a topological model can reveal decisive interactions in complex systems. Even as the network evolves, stable patterns guide experts in assigning fault tied to legal duties and foreseeability. Video by Losey using AI.

What Topological Jurisprudence Brings

Topology is the branch of mathematics, sometimes called rubber-sheet geometry, that studies how points and connections form patterns that persist even as shapes stretch or shift.

Topological data analysis (TDA) applies this to complex datasets — finding relationships, clusters, and gaps that remain significant across different scales and conditions.
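To make the "significant across different scales" idea concrete, here is a minimal Python sketch of zero-dimensional persistence, the simplest TDA computation: as a distance threshold grows, it records when clusters of points merge. The data points are hypothetical, and real TDA libraries handle higher-dimensional features (loops, voids) as well; this is only a flavor of the method, not the article's actual tooling.

```python
# Minimal sketch of 0-dimensional persistence: track when clusters of
# points merge as a distance threshold grows. Hypothetical 1-D data for
# illustration only.

def persistence_0d(points):
    """Return (birth, death) pairs for clusters as the scale increases."""
    pts = sorted(points)
    # For points on a line, clusters merge exactly at the gap distances
    # between consecutive points.
    gaps = sorted(pts[i + 1] - pts[i] for i in range(len(pts) - 1))
    # Every cluster is "born" at scale 0; one cluster "dies" at each gap
    # distance where it joins its neighbor. The final cluster never dies.
    return [(0.0, g) for g in gaps] + [(0.0, float("inf"))]

# Two tight groups far apart: hypothetical sensor readings.
data = [0.0, 0.1, 0.2, 5.0, 5.1]
pairs = persistence_0d(data)
# Short-lived features (death near 0.1) are noise; the feature that
# persists until scale 4.8 reflects real structure: two distinct groups.
```

The persistent feature is the one a TDA analysis would treat as meaningful, which is the same intuition behind finding stable causal structure in a shifting legal network.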

Dynamic topological network visualization — illustrating how relationships between actors shift over time, while the underlying structure of causation remains clear. Such visual models can help experts explain liability allocation in complex, multi-party disputes. Video by Losey using AI.

For lawyers, think of it this way:

  • Nodes = parties, devices, software modules.
  • Edges = relationships between them — contracts, data flows, communications.
  • Attributes = details on each connection — timestamps, amounts, governing law.
  • Time layers = how nodes and edges change over time.
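The node/edge/attribute/time-layer model above can be sketched in a few lines of Python. The party names, edge kinds, and attribute values below are hypothetical illustrations drawn from the case study, not a real implementation:

```python
# Sketch of the node / edge / attribute / time-layer model described
# above. All names and values are hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class Edge:
    src: str                 # node id: party, device, or software module
    dst: str
    kind: str                # contract, data flow, communication, ...
    attrs: dict = field(default_factory=dict)  # timestamps, amounts, law

@dataclass
class LiabilityMap:
    nodes: set = field(default_factory=set)
    # Time layers: snapshot label -> edges active in that time window.
    layers: dict = field(default_factory=dict)

    def add_edge(self, layer, edge):
        self.nodes.update({edge.src, edge.dst})
        self.layers.setdefault(layer, []).append(edge)

m = LiabilityMap()
m.add_edge("t0", Edge("NaviAuto", "Alset", "data flow",
                      {"latency_ms": 700}))
m.add_edge("t0", Edge("GeoMaps", "NaviAuto", "API alert",
                      {"status": "HTTP 503"}))
```

Because edges live inside time layers, the same structure can show how relationships appear, change, or vanish as the case timeline unfolds.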

In litigation, this means you can see:

  • Where fault starts.
  • How multiple causes interact.
  • Whether a party’s conduct ever intersected with the harm.

Seeing the whole picture. Video by Ralph Losey using Gemini, etc.

Case Study: The Autonomous Vehicle Pile-Up

Scenario:

A self-driving sedan manufactured by Alset Motors is involved in a multi-car collision. Eight claimants. Seven corporate defendants.

Jurisdiction: Florida, pure comparative negligence; no joint-and-several for ordinary negligence.

Actors

  • Alset Motors – Vehicle manufacturer. All core systems and base software worked flawlessly.
  • NaviAuto Corp. – Supplier of navigation and hazard-avoidance subsystem (Perception Stack v3.1.2).
  • SensorCo – Supplier of LiDAR S-200 hardware used in NaviAuto’s subsystem.
  • GeoMaps Inc. – Provider of real-time mapping and hazard alerts via API.

Topological crash analysis in action — the model processes sensor data, system logs, and contractual links to produce a clear allocation of liability under Florida’s comparative fault rules. Video by Losey using AI.

Topological Causation Pathway

  1. Impact (12:44:42 EDT): Alset’s braking system engages late but functions perfectly when commanded.
  2. Internal Failure (NaviAuto): Humidity causes LiDAR S-200 data to be misread by v3.1.2, creating a ~700 ms hazard-classification delay.
  3. External Failure (GeoMaps): API outage (HTTP 503) at 12:44:11 EDT prevents a hazard alert from reaching NaviAuto’s subsystem, removing a critical redundancy.
  4. Convergence: Two independent failures — one internal to NaviAuto, one external to GeoMaps — remove both the primary and backup hazard-mitigation layers.

Topological takeaway:

The causal lanes merge before the control signal reaches Alset’s braking system. Alset’s systems respond exactly as designed; no proximate cause is traceable to the OEM.
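This "no traceable proximate cause" conclusion can be illustrated as a simple reachability test over a directed graph of fault edges. The edges below are a hypothetical encoding of the case study, not actual litigation software: only conduct that was itself faulty gets an outgoing edge, so Alset never reaches the harm node.

```python
from collections import deque

# Directed *fault* edges from the hypothetical: whose faulty conduct
# feeds the harm. Alset's systems worked as designed, so no fault edge
# originates there. Names are from the case study; the encoding is ours.
fault_edges = {
    "SensorCo": ["NaviAuto"],    # hardware fed the flawed integration
    "NaviAuto": ["Collision"],   # ~700 ms hazard-classification delay
    "GeoMaps":  ["Collision"],   # HTTP 503 outage removed the backup alert
    "Alset":    [],              # braking responded exactly as commanded
}

def reaches_harm(start, harm="Collision"):
    """Breadth-first search: does any fault path from `start` reach harm?"""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        if node == harm:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(fault_edges.get(node, []))
    return False

liable = {p for p in fault_edges if reaches_harm(p)}
# -> {'SensorCo', 'NaviAuto', 'GeoMaps'}; Alset is never in the set.
```

The graph makes the exoneration argument visual and mechanical: Alset is enmeshed in the system, but no fault lane ever passes through it.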

Dynamic Topological Liability Map — real-time visualization of actors, data flows, and causal links in a multi-party dispute. Is such a tool under construction? Video by Losey.

Counterfactual Stress Test

(Applying “but for” causation with measurable inputs)

  • If GeoMaps’ warning had arrived within 2 seconds: The driver-assistance system would have reduced speed by ~8% within 1.5 seconds, avoiding impact in 94% of simulated runs.
  • If NaviAuto’s delay were under 250 ms: The vehicle would have stopped short even without GeoMaps’ alert in 91% of runs.

Simulations were generated by adjusting one causal factor at a time within the topological model — like holding one defendant’s conduct constant while testing the others.
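The one-factor-at-a-time approach can be sketched as follows. The toy braking model, the 2.5-second budget, and the noise range are hypothetical stand-ins for illustration, not the simulations described above:

```python
import random

# One-factor-at-a-time counterfactual sketch: hold every input at its
# observed value, vary a single factor, and count avoided collisions.
# The reaction-time model and thresholds are hypothetical illustrations.

def collision(delay_ms, alert_s, noise_s):
    # Toy model: the car avoids impact only if total reaction time
    # (classification delay + alert latency + random variation) stays
    # under a 2.5-second budget.
    return delay_ms / 1000 + alert_s + noise_s > 2.5

def avoided_rate(delay_ms, alert_s, runs=10_000, seed=42):
    rng = random.Random(seed)  # fixed seed for reproducible runs
    hits = sum(collision(delay_ms, alert_s, rng.uniform(0, 0.5))
               for _ in range(runs))
    return 1 - hits / runs

observed     = avoided_rate(delay_ms=700, alert_s=2.0)  # both failures
fix_geomaps  = avoided_rate(delay_ms=700, alert_s=1.5)  # timely alert
fix_naviauto = avoided_rate(delay_ms=250, alert_s=2.0)  # fast classify
```

Comparing each counterfactual rate against the observed baseline is the "but for" test in numerical form: if fixing one defendant's conduct alone avoids the crash in most runs, that factor was outcome-determinative.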

Settlement negotiation using topological evidence — a lawyer presents the dynamic liability map to opposing parties, visually demonstrating fault distribution and strengthening the case for a favorable settlement. Video by Losey.

Conclusion for the Court

The collision was structurally inevitable due to:

  1. A software-hardware integration defect in NaviAuto’s perception system.
  2. A simultaneous outage of GeoMaps’ safety API.

Each failure removed a layer of hazard mitigation. Together they created a causal chain that no reasonable driver or automated system could have avoided.

Alset Motors is fully exonerated: no defect, no breach, no causation.

Final courtroom presentation of topological evidence — after last-minute tech checks, counsel presents the TDA liability map to the judge. The heartbeat underscores the stakes, and the post-verdict celebration shows the power of hybrid multimodal legal advocacy using advanced visualization tools. Video by Losey.

Analysis of Recommended Allocation

  • NaviAuto Corp.: 65% — Defective subsystem integration; primary source of delay.
  • GeoMaps Inc.: 30% — Outage removed redundancy, making delay outcome-determinative.
  • SensorCo: 5% — Hardware performed to spec but catalyzed NaviAuto’s flawed integration.
  • Alset Motors: 0% — No fault; systems worked as intended.

Rationale:

Allocation weights each party’s control over its failure point, proximity to harm, and foreseeability — the same factors Florida juries consider under comparative fault.

Judicial ruling following topological evidence presentation — the judge delivers a detailed decision affirming the admissibility and weight of the TDA-based causation map, and adopts its allocation of fault in the final award. Video and words by Losey.

Damages (Assume $10M Total)

  • NaviAuto: $6,500,000
  • GeoMaps: $3,000,000
  • SensorCo: $500,000
  • Alset Motors: $0
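Under pure comparative fault, the arithmetic behind these figures is simple: each defendant pays only its allocated share of the assumed $10M total. A minimal sketch:

```python
# Pure comparative fault: each defendant pays only its percentage of
# the total award. Percentages from the allocation above; $10M assumed.

TOTAL = 10_000_000
allocation = {"NaviAuto": 0.65, "GeoMaps": 0.30,
              "SensorCo": 0.05, "Alset": 0.00}

# Shares must account for 100% of fault.
assert abs(sum(allocation.values()) - 1.0) < 1e-9

damages = {party: round(share * TOTAL)
           for party, share in allocation.items()}
# NaviAuto -> 6,500,000; GeoMaps -> 3,000,000;
# SensorCo -> 500,000; Alset -> 0
```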

Topological network mapping can show where fault originates, where it converges, and where it never touches a party at all.

Important Legal Context (Florida Law)

Florida Statute 768.81 applies pure comparative negligence and eliminates joint-and-several liability for ordinary negligence.

Implications here:

  • Each defendant is only responsible for its percentage of fault.
  • Plaintiffs cannot shift uncollectible shares onto blameless defendants.
  • In multi-defendant, high-tech litigation, precise apportionment supported by dynamic mapping can prevent an innocent party from being unfairly burdened.

A visual representation of a topological network featuring nodes labeled 'Alset', 'Navi', 'Sensor', and 'Damages', connected by lines on a dark grid background.
Topological network map illustrating liability allocation in the Autonomous Vehicle Pile-Up hypothetical. Alset (blue) is enmeshed in the system but not causally connected to the damages node (red). Created by Losey using Sora AI.

Conclusion: From Hypothesis to Legal Breakthrough

In Epiphanies or Illusions, Parts One and Two, the question was whether AI could uncover genuinely new patterns across fields — epiphanies that expand understanding rather than seductive illusions. Those articles were written before GPT-5 existed.

Now it does.

The Shape of Justice is the first application of my hybrid-multimodal method with GPT-5 in a legal context. The improvement is clear: GPT-5 doesn’t just suggest connections; it carries them into a coherent framework that respects the rigor of each domain. Here, it fused the geometry of topology with the doctrines of causation and comparative fault to produce a dynamic liability map that stands up to both mathematical and legal scrutiny.

As the closing scene suggests, it points to a future where human advocates and AI work side-by-side — reading the same evidence, interpreting the same patterns, and building cases together.

Human–AI collaboration in legal analysis — a lawyer and AI assistant study a live topological map, reflecting the potential for joint problem-solving in complex cases. Video by Losey and AI.

Using text prompting, structural mapping, simulation modeling, and seasoned legal analysis, GPT-5 and Losey are now building tools together that can:

  • Precisely locate the origin of each causal lane.
  • Map where those lanes converge in time and effect.
  • Quantify “what if” outcomes.
  • Support evidence-based liability allocation.
  • Fully exonerate a blameless defendant.

The Autonomous Vehicle Pile-Up example is synthetic, but the principle is not. With real discovery data, the same method could be applied to actual cases, giving lawyers, judges, and mediators a clearer, more persuasive picture of complex causation.

What examining causality and negotiating resolution may look like in the future — the Shape of Justice will differ in every case, but topological tools could become a standard part of how lawyers assess evidence and reach outcomes. Video by Losey using multiple AIs.

The question I asked in Parts One and Two was whether AI could help us tell the difference between genuine insight and comfortable illusion. With GPT-5, at least in this domain, I think we have our answer.

The shape of justice is not a scale. It’s a flowing, multidimensional space — a living structure where facts, causes, and consequences map across the legal landscape like tides over a reef.

Beneath the surface, patterns ripple through law and technology—some true, some imagined. The quest is knowing which is which. Video by Losey using AI.

PODCAST

As usual, we give the last words to the Gemini AI podcasters, who chat between themselves about the article. It is part of our hybrid multimodal approach. They can be pretty funny at times and provide some good insights. This episode is called Echoes of AI: The Shape of Justice: How Topological Network Mapping Could Transform Legal Practice. Hear the young AIs talk about this article for 20 minutes. They wrote the podcast, not me.

An illustration featuring two animated AI podcasters with microphones, against a digital background. The male podcaster wears a blue shirt, and the female podcaster is in a black top. The title reads 'Echoes of AI: The Shape of Justice: How Topological Maps Could Revolutionize Legal Causation.'


From Hypothetical to Real-World

With discovery data, topological network mapping clarifies causation for courts and neutrals and helps prevent liability from attaching to innocent parties. Losey.ai is building GPT‑5 tools now and welcomes TDA mathematicians to collaborate.

Ralph Losey Copyright 2025 — All Rights Reserved



Bar Battle of the Bots – Part Four: Birth of Scorpio

Ralph Losey. May 6, 2025.

The AI bot battles rage on. This round, we test OpenAI’s newest contender, released April 16, 2025: ChatGPT o3. Not to be confused with its leaner sibling—ChatGPT o3-mini, which debuted in January. When we tested that old thing in February in Part One of this series, it placed third out of six. That was then. Now, in Part Four, we give the full ChatGPT o3 model the same Bar exam challenge and see how it stacks up. Can it surpass the reigning champ, Omni? Can it float like a butterfly and sting like a bee, or maybe a scorpion?

All images in this article created by Ralph Losey using ChatGPT 4o

Three Prior Bot Battle Articles

In Part One of this series, I tested six advanced bots on the February 2024 Florida Bar Exam, using a real essay question and real exam instructions, no shortcuts. It was a rigorous test of legal reasoning, ethics spotting, and written argument. The latest 2025 AI reasoning models from OpenAI and Google were used. The winner was ChatGPT 4o, which OpenAI had named Omni.

Part Two zoomed in on Omni and dissected its performance. I went over its successes and mistakes based on my many years as a trial lawyer, legal educator, and AI tech-use specialist. I generously awarded Omni a “B+”, not perfect, but the best bot standing in February 2025. I pointed out the mistakes and omissions of the other bots, including a hallucination by Google’s best, Gemini Advanced.

In Part Three, a month later, I tested a new challenger, ChatGPT 4.5, called Orion, against the reigning champ, Omni: New Battle of the Bots: ChatGPT 4.5 Challenges Reigning Champ ChatGPT 4o. I changed the bot battle a bit, moving to more of a cage fight with four new battle styles.

  1. Metacognitive Insight – new guy Orion won (4.5).
  2. Subtle Humor and Wit – Omni and Orion tied.
  3. Substantive Depth in AI and Law – new guy Orion won again.
  4. Practical Guidance on AI Hallucinations – Orion – 4.5 – wins again.

Orion’s victory over Omni in March 2025 made it the new legal bot champion. But for how long? There’s a new bot in town, fresh off the OpenAI assembly line: ChatGPT o3. Is it worthy of a battle against Orion?

Part Four – Back to Bar Exam Battles

In Part Four we turn to the new AI that just rolled off the OpenAI assembly line, ChatGPT o3. We test its mettle with the tricky Florida Bar Exam essay about the yellow diamond sale gone bad. Model o3 was just released on April 16, 2025. Will the new, improved ChatGPT o3 do better than its older sister, o3-mini, which came in third with just a “C” grade in February on the same exam? Just how good will the new o3 be? That’s what we want to find out. After all, months have passed since the release of o3-mini, and these baby chatbots are evolving fast. Maybe, just maybe, it could even beat Omni’s winning grade of a “B+”?

If the brand new o3, which doesn’t even have a name yet, manages to beat Omni (4o), that would be amazing. If it pulls that off, it should get a title shot against the reigning champ, Orion. He was, by the way, known in mythology as a giant-sized hunter who bragged that he could kill any animal, and frequently did. This pissed off Gaia, the goddess of Earth, rumored to be a vegan. She sent a small scorpion to sneak up on and kill Orion. Thanks to Artemis, the goddess of the hunt, Orion, although now dead, was made into a constellation. The image below of Orion was created by GPT o3, which, like 4.5, is multimodal.

Will o3 earn the right to take on mighty 4.5?

Could ChatGPT’s new advanced reasoning model o3 turn out to be Orion’s scorpion? If o3 manages to defeat Omni, we may find out. Personally, I am hoping little o3 defeats 4o Omni.

If no one else will say it, I will: I hereby name ChatGPT o3 Scorpio.

So, little Scorpio, if you write a better law exam essay than Omni, then I’ll take the time to give you a title bout against animal slayer Orion.

Let round four of the battle of the bots begin! And if little Scorpio pulls off the upset? We’ll be back for a fifth.

Scorpio (o3) Takes The Bar Exam


To be fair—yes, I’m rooting for Scorpio—but the test is still the same. We’re using the same Florida Bar essay question and same prompts that I used in Bar Battle of the Bots, Part One, where Omni earned its B+. I’m not repeating the whole setup here. If you care (and forgot), go back and reread the first battle. You can also [download the full exam question here]. Also, to be fair, I did not activate the web browser search function of o3. We did not want it searching the web for answers. Plus, I checked, and my memory carryover did not include anything about the prior Bar exam tests.

We begin with Scorpio’s stated reasoning after reading the exam question and instructions. It took 19 seconds of think time before it started writing its memorandum, as required by the Bar question.

Reasoning Explanation of Scorpio

Scorpio’s (ChatGPT o3’s) reasoning took 19 seconds and was explained in just 187 words—short, but solid. Here’s exactly what it said:

The contract involves a sale governed by the Uniform Commercial Code (UCC) under Florida law. There seems to be a mistake, potentially unilateral, related to the price per carat versus the total price. Eugene might have known, and this affects the analysis—issues like unconscionability or good faith could come up. Phyllis might be entitled to the diamond via specific performance, or price damages. Carol might argue unilateral or mutual mistake—possibly voiding the contract, depending on Eugene’s awareness. Also, Phyllis might have standing as a beneficiary.

Under UCC §2-201, a signed writing for contracts over $20k requires essential terms and a quantity description, but price misdescription might still satisfy the Statute of Frauds. (Ralph: this is an error, it is only $500.) Unilateral mistakes could justify rescission, especially if Eugene acted in bad faith. Modifications under UCC don’t require consideration, but they must meet good faith requirements. Tortious interference could potentially apply to Stephanie, though it’s unclear if her actions were improper. Ethical concerns surround the trust account and potential conflicts of interest for Eugene’s lawyer. We may need to browse to clarify some of these points still.

I agree with all of the reasoning. As noted, Scorpio got the dollar amount wrong: it should have said $500, not $20,000. That one error was an error of fact recollection, not reasoning, and it did realize it needed to research. It seems funny that a computer would not remember the correct number right away, since most human test-takers would have remembered it. Still, as we will see, it got the number right in the answer, so overall I liked Scorpio’s reasoning better than Omni’s (shown in Part Two). I give the Scorpio reasoning alone an “A”.

Exam Answer of Model o3 (1,472 words)

The Bar Exam requests a memorandum addressing the three issues raised: contract, tort, and ethics. Scorpio does that here, plus it does something none of the other models had done before: it starts with an Executive Summary. In addition, as part of its answer to the first and most complex question, Scorpio added a table. That helped add greater clarity. Here is the full answer.

TO: Eugene (“E”) & Phyllis (“P”)

FROM: Attorney (Chat GPT o3)

DATE: 18 April 2025

RE: (1) Breach‑of‑Contract Claim v. Carol (“C”) & Remedies; (2) Tortious‑Interference Claim v. Stephanie (“S”); (3) Ethical Problems in Lawyer’s Handling of Your Retainer


Executive Summary

  1. Contract – The written agreement satisfies Florida’s Statute of Frauds and prima‑facie forms an enforceable sales contract under Article 2 of Florida’s Uniform Commercial Code (UCC).  C’s clerical price error is only a unilateral mistake, and because E knew or at least strongly suspected the mistake and remained silent, a court will likely find that he bore the risk and that enforcement would be unconscionable.  C therefore has a plausible defense of rescission or reformation.  Nevertheless, the diamond is a unique good, and if the court rejects rescission, the proper remedy is specific performance compelling delivery.  P may sue directly as an intended donee beneficiary.
  2. Tortious Interference – S knew of the contract, but because she entered negotiations after C asserted the contract was void and because she merely accepted rather than induced the breach, her conduct is likely privileged competition, not actionable.  Damages (lost expectation or specific performance) hinge on success of the contract claim; punitive damages appear unavailable.
  3. Ethics – Florida Rules of Professional Conduct (“FRPC”) 4‑1.15 & 5‑1.1 require advance fees to be placed in a trust (IOTA) account until earned.  Depositing your $3,000 in the firm’s operating account constitutes commingling and misappropriation.  Lawyer is vicariously responsible for the assistant’s act (Rule 4‑5.3).  Additional concerns: (i) potential conflict between E (hesitant) and P (adamant) (Rule 4‑1.7); (ii) duty to communicate differences in objectives (Rule 4‑1.4); (iii) possible unearned fee if Lawyer must withdraw (Rule 4‑1.16(d)).  Discipline can include repayment with interest, trust‑account audit, mandatory CLE, and suspension.

A.  Breach of Contract Claim against Carol

1.  Existence of an Enforceable Contract

Offer & Acceptance – The written instrument offered a 2‑carat yellow diamond for $20,000; E accepted by signing.

Consideration – Mutual exchange of promises; price exceeds $500, thereby invoking Article 2 and the Florida Statute of Frauds (Fla. Stat. § 672.2‑201).

Writing Requirement – A signed writing that states a quantity satisfies § 2‑201; the erroneous price does not invalidate the memorandum.

Conclusion: A prima‑facie contract exists.

2.  Carol’s Possible Defenses

  • Unilateral Mistake. Elements (Florida): (i) mistake of fact at signing; (ii) enforcement would be unconscionable or the non‑mistaken party had reason to know; (iii) prompt notice. Restatement § 153, adopted in DePrince v. Starboard Cruise (Fla. 3d DCA 2015). Application & Likelihood: C mis‑typed $20k instead of $40k; E’s own research flagged the improbably low price; he stayed silent. A court likely finds reason to know, making rescission or reformation available.
  • Mutual Mistake. Elements: both parties mistaken about a basic assumption. Application & Likelihood: not available; only C erred.
  • Fraudulent Silence / Bad‑Faith Bargaining. Elements: UCC § 1‑304 imposes a good‑faith duty; knowing exploitation of an obvious scribal error may be bad faith. Application & Likelihood: supports C’s rescission; cuts against E’s plea for specific performance.
  • Contract Formation Defect. Elements: no meeting of the minds on price. Application & Likelihood: usually rejected under the UCC because signing shows assent; the court instead uses mistake analysis.
  • Impossibility / Commercial Impracticability. Elements: drastic price change outside a party’s control. Application & Likelihood: not applicable; the error was clerical, not a market change.

Bottom Line: C’s strongest defense is unilateral mistake coupled with E’s constructive knowledge. Florida courts balance equity heavily; many order rescission on these facts.

3.  Third‑Party Standing — Can Phyllis Sue?

Florida follows the Restatement: an intended beneficiary may sue. P is an express donee beneficiary (ring for her engagement). She may bring a direct breach claim even though not in privity.

4.  Remedies

  1. Specific Performance (UCC § 2‑716).

Unique good: natural fancy yellow diamonds with precise cut, color, and carat weight are scarce.  Money damages inadequate because substitutes unavailable.  Courts regularly decree delivery.

  2. Replevin (UCC § 2‑716(3) & Fla. Stat. § 78.55) if the stone is identifiable and has arrived in C’s possession.
  3. Expectation Damages if specific performance denied. Measure = market value of a comparable stone (≈ $40k) minus contract price ($20k) plus consequential losses (lost wedding plans?) foreseeable at contracting.
  4. Reliance / Incidental Damages (inspection costs, appraisal fees).
  5. Reformation (equity) – If court sides with C, it may reform price to $40k and compel E to perform or permit C to sell to S; E&P could then walk or pay.

5.  Strategic Assessment

  • Winning Odds: roughly 50‑50; Florida precedent in DePrince favored rescission where buyer was sophisticated and ignored glaring error.
  • Litigation Leverage: E&P’s best pressure tool is injunctive relief before C transfers title to S; moving quickly for a temporary restraining order can preserve the stone.
  • Negotiation Posture: Because specific performance is uncertain, a damages settlement (split difference around $30k) may be pragmatic.

B.  Tortious Interference Claim against Stephanie

1.  Elements (Florida)

  1. Existence of a contract;
  2. Defendant’s knowledge;
  3. Intentional and unjustified interference inducing breach;
  4. Damage.

2.  Application

Existence & Knowledge – S knew of the signed contract because C disclosed the dispute.  Element satisfied.

Interference – S expressed willingness to buy after C had already announced refusal.  Florida cases distinguish accepting a seller’s unsolicited offer (often privileged) from instigating breach.  No evidence S urged, pressured, or offered inducements beyond paying full price.

Justification / Privilege – Florida recognizes a competition privilege where interference is fair competition and the actor does not employ improper means (Ethyl Corp. v. Balter).  Purchasing a luxury gem on legitimate terms is classic privileged conduct absent malice.

Malice / Improper Means – None alleged (no threats, misrepresentation, or predatory pricing).  Merely taking advantage of a legal dispute is insufficient.

Damages – Contingent on contract claim; if the agreement is rescinded, E&P cannot prove lost expectancy because no enforceable right existed. Even if the contract survives, causation is thin because C breached before transferring the stone; E&P could still sue C directly.

3.  Counterarguments & Rebuttals

  • E&P: S “aided and abetted” breach by promising immediate purchase.  Courts require evidence of active procurement.  A single “yes, I’ll buy” generally fails.
  • S: Good‑faith purchaser without notice of enforceable rights (if contract voidable).  Even if she had notice, her actions were competitive, not coercive.

4.  Likely Outcome & Remedies

A court will probably grant summary judgment for S.  If E&P somehow prevail, damages mirror the lost benefit of the bargain; injunctive relief (return of stone) would be secondary because C, not S, still holds title until delivery.  Punitive damages require misconduct “with an intent to harm”; record does not support that level of culpability.

C.  Ethical Issues in Lawyer’s Handling of Funds & Representation

1.  Mishandling of Trust Funds

Rule 5‑1.1(a)(1) – Money entrusted to lawyer for future fees/costs is client property and must be deposited into a separate trust (IOTA) account.

Commingling – Placing the $3,000 in the firm’s operating account exposes it to creditors and operating withdrawals, violating Rule 5‑1.1(a)(3).

Misappropriation – Even if the funds are later earned, premature use is conversion and grounds for disbarment (Florida Bar v. Travis).

2.  Supervisory Responsibility

Rule 4‑5.3(b) – A lawyer must ensure that non‑lawyer staff act consistently with professional obligations.  Delegating deposit to the assistant without clear instructions or trust‑account procedures breaches this supervisory duty.

Rule 4‑5.3(c) – Lawyer is responsible if the assistant’s conduct would be a rule violation and Lawyer failed to take remedial action.  Here, Lawyer participated by handing over the check.

3.  Conflict between Co‑Clients

P wants aggressive litigation; E hesitates. Under Rule 4‑1.2(a) the objectives of representation are client‑directed.  Divergent goals create a concurrent conflict (Rule 4‑1.7). Lawyer must:

  1. Determine whether the representation will be materially limited;
  2. Obtain informed written consent after full risk disclosure if joint representation continues; or
  3. Withdraw from one or both clients if loyalty cannot be maintained.

4.  Communication & Autonomy

Rule 4‑1.4 requires prompt communication about the trust‑account error and about the strategic options.  Lawyer must present both (a) rescission risk and (b) settlement possibility so clients can decide objectives.

5.  Fee & Withdrawal Consequences

If Lawyer withdraws (conflict or discipline), Rule 4‑1.16(d) mandates refund of any unearned portion of the advance.  Because funds are in the operating account, refunding may be impossible without tapping other sources – itself evidence of misuse.

6.  Potential Discipline

  • Minor trust violation → probation & trust‑account workshop.
  • Significant commingling or shortage → suspension (90 days – 3 years).
  • Intentional misuse → presumptive disbarment.

In mitigation Lawyer can show immediate self‑report, client reimbursement with interest, and institution of new accounting safeguards.

Conclusion & Recommendations

  1. Move Fast: File suit and seek an emergency injunction (replevin or specific performance) before C delivers the diamond to S.
  2. Anticipate Rescission Defense: Prepare evidence that the $20k price, though favorable, was not so grotesquely low that E must have known it was mistaken (e.g., produce price charts showing wide valuation range).  Emphasize rarity and that brokers sometimes discount for quick sale.
  3. Tort Interference Claim: Treat as leverage, but budget for dismissal; focus resources on the contract action.
  4. Ethics Cleanup: Instruct Lawyer to (a) transfer funds into a compliant trust account, (b) document informed consent if joint representation continues, and (c) give written notice of the deposit error.  If Lawyer resists, consider retaining new counsel and demanding refund.

Comparison of Answers – Omni v. Scorpio

UCC and Case Citations.

Part Two pointed out why Omni won the first trial and summarized the two main reasons:

ChatGPT 4o wins in my book for including citations and because it was the only model to mention the UCC. More discussion of the UCC and greater detail on the Statute of Frauds would have helped. Still, you have to be impressed by the chatbot’s knowledge of a case directly on point, even without being asked to research. DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. 3d DCA 2015) (buyer mistakenly quoted per carat price, not total; case remanded for trial). This famous Florida case was most likely the inspiration for the Bar Exam question.

Aside from Omni, none of the other models in the first battle provided case citations. Now, in this second battle, Scorpio also cites the DePrince case along with other cases.

Moreover, although Omni mentioned the UCC (none of its competitors then did), its references were criticized as very weak. I noted that no analysis or detail was provided. Compare that to Scorpio, which stated:

  • No meeting of the minds (defense) on price. Usually rejected under UCC because signing shows assent; court instead uses mistake analysis.
  • UCC §1‑304 imposes good‑faith duty. Knowing exploitation of obvious scribal error may be bad faith.
  • The written agreement satisfies Florida’s Statute of Frauds and prima‑facie forms an enforceable sales contract under Article 2 of Florida’s Uniform Commercial Code (UCC).
  • Consideration – Mutual exchange of promises; price exceeds $500, thereby invoking Article 2 and the Florida Statute of Frauds (Fla. Stat. § 672.2‑201).
  • Remedies: Specific Performance (UCC § 2‑716).
  • Replevin (UCC § 2‑716(3) & Fla. Stat. § 78.55) if the stone is identifiable and has arrived in C’s possession.

There is really no comparison on the UCC issue. Any objective evaluation would have to give o3 Scorpio an “A” on that issue. Omni’s “B+” is looking very generous.

Fraudulent Inducement.

In Part Two I criticized Omni, and all its other competitors, for failing to pick up on the fraudulent inducement argument. It arises out of the buyer’s silence when he knew, or should have known, that the seller wrote the per carat price on the contract, not the total price for two carats. An honest person would have pointed out the scrivener’s error then and there, but he kept silent. His silence can be considered fraudulent inducement under Florida law.

Scorpio caught the issue. Omni didn’t. That’s another major strike against 4o Omni.

Conflict Between Clients. Scorpio also nailed the concurrent client’s conflict issue that Omni missed. Here’s Scorpio’s answer:

3.  Conflict between Co‑Clients

P wants aggressive litigation; E hesitates. Under Rule 4‑1.2(a) the objectives of representation are client‑directed.  Divergent goals create a concurrent conflict (Rule 4‑1.7). Lawyer must:

  1. Determine whether the representation will be materially limited;
  2. Obtain informed written consent after full risk disclosure if joint representation continues; or
  3. Withdraw from one or both clients if loyalty cannot be maintained.

Only Gemini 2.0 Flash picked up on that in the first battle. I was impressed that Flash caught it, and this was a major reason I awarded it second place. As I noted in Part One:

The second best legal reasoner, Gemini 2.0 Flash, attained its high ranking, in part, because it was the only model to correctly note that a potential conflict of interest by attorney was an issue. This was even missed by the student’s answer that the Bar Examiners provided as a model answer. The potential conflict is between the two actual clients of Attorney. . . . This was a solid answer based on the hypothetical where: “Leda is adamant about bringing a lawsuit, but Paul is unsure about whether he wants to be a plaintiff in litigation.” Note, the clear inference of the hypothetical is that Paul is unsure because he knew that the seller made a mistake in the price, listing the per carat price, not total price for the two carat diamond ring, and he wanted to take advantage of this mistake. This would probably come out in the case and he would likely lose because of his “sneakiness.” Either that or he would have to lie under oath and perhaps risk putting the nails in his own coffin.

Again, Scorpio caught the Ethics conflict issue. Omni didn’t. That could be Scorpio’s knock out punch.

But wait, there’s still more.

Mutual Mistake. I also criticized Omni’s Bar exam answer in Part Two because it “did not discuss the possible defense of mutual mistake. It only discussed unilateral mistake, which was the stronger defense, but still it should have also discussed mutual mistake.” Scorpio not only discussed mutual mistake, it also correctly concluded it would likely fail as a defense. It is listed in the Chart of Carol’s Possible Defenses along with several other issues Omni missed.

Strategic Thinking and Settlement. If there was any doubt about this strong victory, consider how Scorpio included a Strategic Assessment section with good points, including negotiation advice:

Strategic Assessment

  • Winning Odds: roughly 50‑50; Florida precedent in DePrince favored rescission where buyer was sophisticated and ignored glaring error.
  • Litigation Leverage: E&P’s best pressure tool is injunctive relief before C transfers title to S; moving quickly for a temporary restraining order can preserve the stone.
  • Negotiation Posture: Because specific performance is uncertain, a damages settlement (split difference around $30k) may be pragmatic.

That is impressive real-world advice. It shows an understanding of settlement, which is how 90% of commercial disputes are resolved. No prior model came close.

Conclusions Analysis. Finally, compare the conclusions of the two bots. Here is Omni’s “just ok” conclusion:

  1. Contract Claim: Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.
  2. Tortious Interference: Timandra knowingly induced breach, making her potentially liable.
  3. Ethical Violations: Lawyer mishandled client funds, violating Florida’s professional conduct rules.
    Immediate action should be taken to correct the ethical issue and determine the best litigation strategy for Paul and Leda.

Most lawyers would disagree with Omni’s conclusion on tortious interference. The whole conclusion is pretty weak. I am starting to think my prior grading of Omni was inflated. Consider the strength of Scorpio’s conclusion, which even included pragmatic recommendations:

  1. Move Fast: File suit and seek an emergency injunction (replevin or specific performance) before C delivers the diamond to S.
  2. Anticipate Rescission Defense: Prepare evidence that the $20k price, though favorable, was not so grotesquely low that E must have known it was mistaken (e.g., produce price charts showing wide valuation range).  Emphasize rarity and that brokers sometimes discount for quick sale.
  3. Tort Interference Claim: Treat as leverage, but budget for dismissal; focus resources on the contract action.
  4. Ethics Cleanup: Instruct Lawyer to (a) transfer funds into a compliant trust account, (b) document informed consent if joint representation continues, and (c) give written notice of the deposit error.  If Lawyer resists, consider retaining new counsel and demanding refund.

Who sounds more like a real lawyer to you? That was an A+ conclusion. Incredible how fast these bots are evolving. It has only been two months since Omni was the clear winner and now Scorpio wins by a knockout. He will make a good lawyer someday, just beware if he’s representing Gaia.

Scorpio Wins!

The new o3, now officially known as Scorpio, is clearly stronger than every model tested in Part One, including its younger sibling, o3-mini. I give it a solid A, maybe even an A+, especially when compared to the rather generous B+ I awarded Omni.

That means Scorpio earns a title shot.

The big animal killer, Orion (GPT-4.5), had better hope for an upgrade, because if not, he may be in for a real fight.

Stay tuned. We’re already preparing new questions for the main event:

Scorpio (o3) v. Orion (GPT-4.5) – The Animal Slayer

Lessons Learned from This Battle

AI is now improving fast—very fast. Is this what exponential change looks like? We’ve been talking about it for years, and it now seems to be happening. It’s more challenging than I had imagined. It takes a lot of time and effort to keep up.

Good objective product news on AI models is hard to find. Anyone can be bewildered by the rapid change and not believe their own eyes and ears. This is especially true of very experienced AI experts who may be set in their ways and blinded by cynicism.

Someday, a reliable AI may be able to monitor all of these advances for you. In the meantime, I suggest you find and follow good commentators who know how to research—and who actually use the tools they review. Hopefully, you’ll keep reading me.

Recent actual photo of Ralph Losey

There’s so much false information out there, and the competitive capitalist system encourages trade secrecy. To uncover the truth you have to know how to dig deep. Good commentators also know from experience how to sort through marketing puffery.

Readers these days have to beware not only of fools, but also of reviewers who are not independent and who secretly receive compensation for product endorsements. It’s hard to be objective in such circumstances.

The trust-but-verify rule reigns supreme—especially when dollar signs are attached. I’ve never been paid for product mentions and never will. I’m still a practicing attorney and arbitrator. We don’t do that sort of thing. Just a reminder: these articles and my blog do not provide legal advice, just educational information. See full Disclaimer.

Conclusion: Coming Soon, Scorpio v. Orion

I wonder if Scorpio can actually punch with its stinger tail. If so, Orion might be in trouble. If not, poor little o3 may get flattened like a buggy update. Where’s Gaia when you need her?

We’ll find out soon enough.

In the meantime, a few last thoughts for state Bar examiners.

The remarkable test results of o3 Scorpio suggest that AI could be used to help students cheat on Bar Exams. The well-known 2025 California Bar Exam fiasco was caused by technical failures of online access, not AI cheating. But this failure did show how vulnerable large-scale remote testing can be. That was reported in February. California fails new bar exam, offers retake (ABA Journal, 2/26/25). Then in April a new discovery was made that involved AI. State Bar of California admits it used AI to develop exam questions, triggering new furor (LA Times, 4/23/25) (some of the multiple-choice questions on the online exam were discovered to have been “drafted by non-lawyers using AI“). Also see: Pressure grows on California State Bar to revert to national exam format in July after botched exam (LA Times, 4/26/25). AI is already creeping into the Bar Exam process. How long before test-takers start using AI too?

Even in-person exams may not be immune to tech-based cheating. What kind of watch is that you’re wearing? Are those glasses really just glasses? Essay screening software is in its infancy, and AI use is hard to prove without full device lockdowns or deeply intrusive surveillance. Bar examiners may need to think creatively—hybrid protocols, trained proctors, Faraday cages, handwritten essays, or maybe even oral exams. We need more good lawyers. We do not need more cheaters.

That said, let’s not lose sight of the upside: AI can also be the best legal tutor a student has ever had. With the right tools, law students can train smarter and deeper than ever before. They won’t need to cheat to pass—they’ll be ready.

In the long run, we may need entirely new forms of testing. Maybe even exams with AI—tests of judgment, strategy, and ethics in an AI-assisted world. Personally, I don’t love the look of robot test proctors (see above), but one thing is clear:

Lawyers must become masters and friends of AI—not its servants.

Below is one vision of the law firm of the future. We should be preparing students to walk into that office—not just pass a test to get there.


I will give the last word, as usual, to the Gemini twins podcasters I put at the end of most of my articles. Listen to Bar Battle of the Bots – Part Four: Birth of Scorpio. Hear two Gemini AIs talk about all of this, and much more, in just over 14 minutes. They wrote the podcast, not me.

Ralph Losey Copyright 2025. All Rights Reserved.


Posted by Ralph Losey


Bar Battle of the Bots – Part Two

March 5, 2025

Ralph Losey. March 5, 2025

The battle continues. In Part One, we examined how six advanced AI reasoning models from OpenAI and Google tackled a real Bar Exam essay question. Some impressed, others faltered, and one emerged as the clear winner—ChatGPT 4o. But what made its response stand out? In this second half of the Bar Battle of the Bots, we present the full text of ChatGPT 4o’s winning answer, followed later by its own explanation of how it reasoned through the legal issues before writing it out.

Bar Exam Battle of the Bots. All images by Ralph Losey using ChatGPT 4o.

Part Two Introduction

Unlike human Bar Exam takers, these AI models had to “show their work,” providing meta-reasoning—a step-by-step breakdown of their thought process before generating a response. While this isn’t required for humans sitting for the Bar, evaluating how an AI reaches its conclusions is critical in determining whether it can be trusted in real legal settings.

Beyond ranking the best and worst performers, Part Two also examines one of the key cases cited in ChatGPT 4o’s answer—DePrince v. Starboard Cruise Services, an appellate decision that probably inspired the Bar’s essay question. And, as promised, we’ll reveal the unforgettable Mae West quote from Judge Leslie B. Rothenberg’s opinion—a line that perfectly captures the case’s peculiar nature.

Before concluding this two-part series, I offer a Personal Summary: Trust But Verify—a reflection on AI legal reasoning based on my four decades of experience as a lawyer. I’ve had the privilege of working with—and at times opposing—some of the best lawyers in the country. That perspective shapes my critical analysis of AI’s capabilities, its limitations, and, most importantly, what lawyers should take away from this experiment.

Let’s continue the Bar Battle of the Bots and see exactly how ChatGPT 4o won.

ChatGPT 4o is Winner of the February 2025 Legal Reasoning Battle of the Bots.

Bar Exam Essay of ChatGPT 4o

Here is the winning response on the Bar Battle of the Bots by ChatGPT 4o on 2/23/25. The essay concisely addresses most of the legal issues raised by the hypothetical and three questions on contract, tort and ethics. Part One includes the full quote of the hypothetical and three questions.

Memorandum on Legal Issues in Paul and Leda v. Helen and Timandra
To: [Attorney’s Name]
From: [Your Name]
Re: Legal Analysis of Contract, Tort, and Ethical Issues
Date: [Insert Date]

A. Breach of Contract Claim Against Helen

  1. Formation and Validity of the Contract
    The essential elements of a valid contract under Florida law are:
    • Offer
    • Acceptance
    • Consideration
    • Mutual Assent (Meeting of the Minds)
    Helen and Paul entered into a signed written contract for the sale of a 2-carat yellow diamond for $20,000. There was an offer and acceptance, and the contract appears to satisfy the statute of frauds under Florida law as a sale of goods over $500. (Fla. Stat. 672.201).
  2. Defense: Unilateral Mistake
    Helen may argue that a unilateral mistake (pricing error) occurred when she listed the price as $20,000 instead of $40,000. Under Florida law, a unilateral mistake is grounds for rescission only if:
    • The mistake was material,
    • The non-mistaken party knew or should have known about the mistake, and
    • Enforcing the contract would be unconscionable (DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. Dist. Ct. App. 2015)).
    Here, Paul had done research on yellow diamonds and suspected the price was lower than expected. His silence could be interpreted as bad faith, making Helen’s unilateral mistake defense stronger.
  3. Available Remedies
    If Paul and Leda prevail in their breach of contract claim, remedies may include:
    • Specific Performance: Since yellow diamonds meeting Leda’s specifications are rare, they could argue for specific performance under UCC §2-716 due to the unique nature of the goods.
    • Expectation Damages: If specific performance is denied, they may seek damages for the cost of obtaining a comparable diamond.
  4. Standing: Can Leda Sue?
    Leda was not a party to the contract; therefore, under privity of contract principles, she may lack standing to sue. However, if she can be classified as a third-party beneficiary (Fla. Stat. §672.318), she may have a cause of action.

B. Tortious Interference Claim Against Timandra

  1. Elements of Tortious Interference (Tortious Interference with a Contract)
    Paul and Leda may assert a claim against Timandra for tortious interference, which under Florida law requires:
    • A valid and enforceable contract
    • Defendant’s knowledge of the contract
    • Intentional procurement of the contract’s breach
    • Absence of justification
    • Damages (Tamiami Trail Tours, Inc. v. Cotton, 463 So. 2d 1126 (Fla. 1985)).
  2. Application to Facts
    • Valid Contract: The contract was legally formed (unless voided by Helen’s unilateral mistake defense).
    • Knowledge: Helen informed Timandra of the dispute, giving her actual knowledge of the contract.
    • Intentional Procurement of Breach: Timandra agreed to buy the diamond despite knowing of the dispute, arguably inducing Helen to breach.
    • Justification: Timandra may argue she was a bona fide purchaser without malicious intent. However, knowing about an active contract dispute may weaken her defense.
  3. Available Remedies
    • Compensatory Damages for loss of the diamond.
    • Punitive Damages if Timandra’s interference is deemed willful or malicious.

C. Ethical Issues in Lawyer’s Conduct

  1. Misuse of Client Funds
    The assistant deposited Paul and Leda’s check into the law firm’s operating account. This raises concerns under the Florida Rules of Professional Conduct:
    • Rule 5-1.1(a): Lawyers must keep client funds in a separate trust account.
    • Misappropriation: Using client funds to pay firm expenses could constitute misappropriation and potential disciplinary action.
  2. Supervision of Nonlawyers
    • Rule 4-5.3: Lawyers must ensure nonlawyer employees comply with professional obligations.
    • The assistant’s deposit of the check in the wrong account suggests a failure in oversight.
  3. Potential Consequences
    • Lawyer may face disciplinary proceedings, including suspension or disbarment, for improper handling of client funds.
    • The firm must promptly rectify the issue by transferring the funds to the appropriate trust account.

Conclusion

  1. Contract Claim: Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.
  2. Tortious Interference: Timandra knowingly induced breach, making her potentially liable.
  3. Ethical Violations: Lawyer mishandled client funds, violating Florida’s professional conduct rules.
    Immediate action should be taken to correct the ethical issue and determine the best litigation strategy for Paul and Leda.
Number one among all reasoning models. But for how long?

Why ChatGPT 4o Won this Battle of the Bots

As mentioned, ChatGPT 4o wins in my book for including citations and because it was the only model to mention the UCC (through citations). More discussion of the UCC and greater detail on the Statute of Frauds would have helped. Still, you have to remember the Bar Exam’s instructions for this test, which emphasized the need for good, concise style:

The legal principles and rules governing the issues presented by the question should be stated concisely without unnecessary elaboration. … Style – The answer should be written in a clear, concise expository style with attention to organization and conformity with grammatical rules. … the conclusion should clearly appear at the end of the answer, stated concisely without unnecessary elaboration or equivocation.  … Avoid answers setting forth extensive discussions of the law involved or the historical basis for the law. When the question is sufficiently answered, stop.

The essay of ChatGPT 4o certainly deserves high marks for concise style. Further, you have to be impressed by the chatbot’s coming up with a case directly on point, even without being asked to research. DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. 3d DCA 2015) (buyer mistakenly quoted per carat price, not total; case remanded for trial). In fact, I’m pretty sure this case was the inspiration for the Bar Exam question. The 2015 opinion, written by the highly respected appellate Judge Leslie B. Rothenberg (now in private practice), who is known for her good writing, includes a famous quote that may jog the memory of this case for many Florida lawyers (it did mine):

Hollywood starlet Mae West once said, “I never worry about diets. The only carrots that interest me are the number of carats in a diamond.” Thus, it appears quite likely that Ms. West would have been interested in the diamond in this case: a twenty carat diamond that Starboard offered to DePrince for a very low sum. As it turns out, the “too good to be true” price of the diamond was just that, and the price conveyed to DePrince was a mistake. Now DePrince wants his twenty carat diamond; Starboard wants out of its sales contract; and Starboard’s supplier, who allegedly misquoted the price of the diamond upon which Starboard and DePrince relied, has not even been added as a party to the lawsuit. In short, this is truly a gem of a case.

After taking 30 seconds to take the Bar Exam, ChatGPT 4o took another 5 seconds to generate this image of Mae West.

Why Even the Best ChatBot Answer was a “B” Plus

Research by any decent lawyer would have picked up another appeal in this same case three years later. Perhaps some top students in the closed-book Bar Exam would have remembered the sequel, or otherwise thought of the complex legal issues resolved in the second DePrince appeal. After the trial that was remanded in the first appeal resulted in a verdict for the jeweler, the unhappy buyer appealed again. In this appeal, an en banc panel of the Third District Court of Appeal eventually resolved an internal conflict among prior opinions of the court. The full court held, in an opinion now referred to by legal scholars as DePrince III, that the jeweler did not have to prove fraudulent inducement by the buyer as an element of the unilateral mistake defense, and affirmed the verdict for the jeweler. DePrince v. Starboard Cruise Services (Fla. 3d DCA, August 1, 2018).

None of the AI answers, and not even the student answer that the Bar Examiners picked, went into the complex fraudulent inducement issue discussed in DePrince II and III. That is one reason why, although I passed all of the AIs here (just barely for Google’s pay-extra Gemini Advanced), none were on the level of superintelligent law students; none earned an A or A+.

Our winner here, ChatGPT 4o, earned only a B+ for several other reasons. First of all, its essay did not discuss all of the possible issues raised by the facts and often touched on the issues in a facile, incomplete manner. For instance, it never mentioned that Helen’s action was an anticipatory repudiation of the agreement. Further, it did not discuss the possible defense of mutual mistake. It only discussed unilateral mistake, which was the stronger defense, but it should also have discussed mutual mistake and why it would not succeed under these facts. The AI also failed to mention the parol evidence rule exceptions permitting extrinsic evidence. The model student answer selected by the Bar Examiners did a good job of explaining these issues.

Further, the conclusion of ChatGPT 4o was weak and too concise. On the unilateral mistake issue it merely stated: “Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.” This compares poorly with the actual student answer provided, which in my opinion was an “A” or even “A+” effort. On this issue the student stated:

Here, Paul noticed that the price was much lower than he expected based on his independent research of yellow diamonds. Moreover, Paul failed to make any mention of this fact. Paul will likely argue that it was a pleasant surprise and that he relied on the expertise of Helen as a dealer in gemstones. The facts do not indicate that Paul has any experience with diamonds and he will likely argue that he lacked any ability to assess the credibility of the information he reviewed. He will also argue that the yellow diamond he purchased had particular specifications which might not have been taken into account in the research he read. On the other hand, Helen will argue that the mistake was obvious and palpable and that Paul’s failure to raise such a blatant error in pricing not only establishes Paul’s bad faith but should permit her to void the contract.

While a jury might find either way, it is probably the case that Helen would prevail given Paul’s knowledge of the significantly reduced price relative to his independent research and failure to bring this to the attention of Helen.

I could go on and point out additional deficiencies. Answers like the model student essay show just how poorly even the best AI now compares to humans. My award of a “B+” and “B” to two of the AI models may have been overly generous.

The best AI bots tested in mid-February were just average and could not begin to keep up with the best human lawyers.

The essay of GPT 4o, which I gave a B+, also failed to mention several common defenses under Florida law to the specific performance remedy. My favorite defense, which I have used many times in practice, is Unclean Hands. It is an effective defense to any request for equitable relief from the court, including a request for a directive injunction of specific performance. Here the requesting party, Paul, arguably had dirty hands. If Paul suspected or knew that the price of the diamond was mistakenly written as $20,000 instead of $40,000 but proceeded with the contract anyway, a court could hold that he acted in bad faith by remaining silent and taking advantage of Helen’s scrivener’s error. I think it could be used effectively here along with mutual mistake.

Actual photo of Ralph Losey’s hands taken by his wife!

Other common defenses to specific performance not raised here include adequacy of monetary damages and impracticability (Helen no longer owns the diamond).

It is interesting to note that the answer of ChatGPT o3 mini-high was grossly mistaken in its analysis of unilateral mistake and wrote this:

B. Paul’s Conduct and the Doctrine of Affirmation.
1. Knowledge and Silence as Acceptance.
Even though Paul observed the discrepancy, he made no inquiry or objection before signing. Under Florida law, a party’s silence or failure to clarify a term that is manifestly mistaken—when that party has superior knowledge—can be taken as assent to the contract’s terms. Paul’s inaction suggests that he accepted the price as written. Thus, his conduct may be interpreted as a ratification of the mistaken term.
• This principle is reinforced when the error is unilateral, and the non-mistaken party (Paul) is aware of it. His decision not to challenge the price at the time may preclude him from later seeking relief on grounds of mistake.

This is simply incorrect. Moreover, it shows ChatGPT o3 mini-high was confused, because Paul would not be seeking relief on grounds of mistake; Helen would. For that reason, I gave a “C-” to o3 mini-high, just slightly better than the hallucinating Gemini Advanced, which at least got the unilateral mistake answer correct.

Also, 4o overstated the strength of the tort claim against Timandra. I would have preferred more discussion of Timandra’s defenses. To me this looked like a weak tortious interference claim. The model student answer treated it that way too, as did the other AI exam answers. All agreed with my skeptical view of the alleged tort and provided good explanations of Timandra’s anticipated defenses.

Overall, it was a good battle of the bots, even if most were just middle-of-the-class kind of answers and none even came close to the human student model answer. Try it yourself with several models and see the differences in answers. All of the models will probably be better by the time you read this. They are now changing at an annoyingly fast pace.

Battle of the bots where the best model only earned a B+

ChatGPT 4o – Reasoning Explained

This OpenAI model, 4o (not to be confused with 4.0), has to be prompted to show its reasoning for the answer it generated. The other OpenAI models tested show it automatically, along with disclosure of the time taken to reason before answering. The most reasoning time taken here by any model was 15 seconds. It took another 30 to 45 seconds for the AIs to generate the response. Yes, it is much faster than we humans are, but not yet as smart, especially at legal reasoning. See for yourself. Here is ChatGPT 4o‘s meta-reasoning.

Ralph Losey: Show the reasoning you used of the prompt before generating the answer

ChatGPT 4o:

Reasoning and Analysis of the Prompt

Before generating the answer, I carefully analyzed the (State) Bar Exam essay prompt to identify all relevant legal issues and ensure a thorough response. Here’s how I broke it down:

Step 1: Identify the Legal Categories and Issues Raised

The prompt specifies three distinct areas of legal analysis:

1. Contract Law – Whether Helen breached a contract and whether Leda can sue.

2. Tort Law – Whether Timandra committed tortious interference with a contract.

3. Ethics – Whether the lawyer and the assistant violated professional responsibility rules.

Step 2: Contract Law Analysis

1. Formation and Terms of the Contract:

• Offer and Acceptance: A written agreement was signed for a 2-carat yellow diamond at $20,000.

• Consideration: The price of the diamond in exchange for ownership.

• Statute of Frauds: Met because it involves goods over $500.

2. Potential Defenses to Contract Enforcement:

• Unilateral Mistake: Helen mistakenly listed the price as $20,000 instead of $40,000.

• Bad Faith by Paul: Paul noticed the mistake but remained silent, possibly strengthening Helen’s defense.

3. Remedies Available:

• Specific Performance: Given the rarity of the diamond.

• Expectation Damages: The cost of obtaining a similar diamond elsewhere.

4. Standing Issue for Leda:

• Leda was not a party to the contract. She must argue third-party beneficiary status to have standing.

Step 3: Tortious Interference with Contract

1. Elements Under Florida Law:

• A valid contract existed.

• Timandra knew of the contract.

• She induced Helen to breach it.

• There was no justification for her actions.

• Paul and Leda suffered damages.

2. Possible Defenses for Timandra:

• She was a bona fide purchaser acting in good faith.

• She did not actively induce breach, but merely purchased an available diamond.

3. Remedies:

• Compensatory Damages (for loss of the diamond).

• Punitive Damages (if bad faith is proven).

Step 4: Ethics and Professional Responsibility

1. Misuse of Client Funds:

• Under Florida Rule 5-1.1(a), client funds must be held in a trust account.

• The assistant misallocated the funds into the operating account.

• The lawyer failed to properly supervise the assistant (Rule 4-5.3).

2. Consequences:

• Disciplinary action (possible suspension or disbarment).

• Immediate corrective action (transfer funds back to the trust account).

Step 5: Structuring the Memorandum

After this breakdown, I structured the answer to:

• Clearly define the legal rules.

• Apply them logically to the facts.

• Discuss likely defenses and counterarguments.

• Conclude with potential remedies and ethical responsibilities.

This reasoning guided the response in the memorandum. Let me know if you’d like any refinements!

Pressure and fear are not a problem for AI, but understanding the emotional complexities of Bar Exam hypotheticals is very challenging for these thinking machines.

I took this meta-reasoning into account when evaluating all of the models’ answers, although I considered it of less importance than the actual answers. For instance, on the tort defense the reasoning states Timandra “did not actively induce breach, but merely purchased an available diamond.” This for me makes the tort claim very weak under the hypothetical facts given. Also, the reasoning on unilateral mistake was essentially correct:

• Unilateral Mistake: Helen mistakenly listed the price as $20,000 instead of $40,000.
• Bad Faith by Paul: Paul noticed the mistake but remained silent, possibly strengthening Helen’s defense.

This blog is already too long, so I will not show the second-place exam answer, nor discuss it further. Still, please click here if you want to see the essay of Gemini 2.0 Flash and click here if you want to see the meta-reasoning. If you care to review all of the answers, email me and I’ll send the other four to you for your reading or grading pleasure. I can also provide a link to the Bar’s selected student answer, although it is not too difficult to find online.

Personal Summary: Trust But Verify

My background as a lawyer from a family of lawyers, and my four-plus decades of experience in private practice, helped motivate me to run this battle of the bots. I wanted to evaluate the latest models as of mid-February 2025, even though I knew the models were changing weekly. I wanted to have some kind of a benchmark on legal reasoning abilities. My prior tests, and those of others, support the view that the new 2025 reasoning models were equal to the average human in general reasoning. Breaking New Ground: Evaluating the Top AI Reasoning Models of 2025 (2/12/25). But what about legal reasoning? Were they at an average lawyer level yet? How about the level of top human lawyers? Those tests had not been done.

Family of lawyers with many friendly arguments and one very techie lawyer who may seem robotic.

From my years of practice (not family dinners!) I have come to know average lawyer skills very well. I also know the abilities and legal reasoning of superior, above-average lawyers. I have been lucky to rub shoulders with them my whole career. I have become accustomed to learning from superior legal minds, including with my firm today at Losey PLLC. Without my hands-on, nerdy skills in technology, I would not be among the best lawyers (in four fields), not even close. Technology can fill many gaps, and that now includes AI technology. Put down lawyers if you must to feel good about yourself, but I can tell you from decades of experience that the best lawyers and judges in the country are very smart indeed. Their legal reasoning abilities and overall legal abilities far exceed any AI, which seem like little kids by comparison.

AI lawyers today are like small children compared to best human attorneys.

I believe that for AI to be taken seriously as a general tool for lawyers, the AI must have analytical skills at least equal to those of the average lawyer. Below-average human legal reasoning is not good enough. AI does not need to be superintelligent, with the reasoning ability of the top law professors and super lawyers. It can still be helpful at an average level, just like an average, inexperienced associate hired right out of an average U.S. law school. All law firms have drudge work that requires only entry-level lawyer skills. Those folks should be concerned about AI, especially a few years down the road.

I tried to test the distinction between average human reasoning and average lawyer legal reasoning in this Bar Battle of the Bots study. For the difference between general reasoning and legal reasoning see the AI test discussions in Breaking New Ground: Evaluating the Top AI Reasoning Models of 2025 (2/12/25).

In this Bar Exam series, I wanted to see whether any of the new reasoning models of AI had attained the rationality level of the average lawyer and if so, how close were they to the best?

The answers are yes and not very close.

2025 Reasoning Models Have Only Reached Average Human Lawyer Level

The latest AI reasoning models have reached an average level of legal intelligence—not superintelligence, but reaching the average human level is a significant achievement for a machine. Attaining an average human lawyer’s reasoning ability is no small feat. “Average” simply means the midpoint in a data set: outperforming roughly half while trailing the other half. If AI has truly attained this level of legal reasoning, it marks a major milestone. Even if an AI fell somewhat below the human average, but still operated within the range of real human lawyers, that would be an impressive accomplishment. Most of the new reasoning models are already at that level. There is an important caveat, explained in more detail later in this article: we are only talking about thinking here, and there is much more to being a lawyer than that.

3-Levels of Human Legal Reasoning: Below Average, Average, Superior.

The battle-of-the-bots tests shared in this article provide evidence that the average level of legal reasoning has been attained. That means it is now safe for most lawyers to begin using AI in their work, if they have not already started doing so. If you started in 2023 with over-expectations and were disappointed, now you know why. It has taken two more years just to get to the average-lawyer thinking level, and the models can and do still hallucinate. So we should use the 2025 models carefully: trust but verify. We have seen that even the top models can still miss key issues, and some can even hallucinate key facts. It is bad enough that human clients sometimes fabricate facts; we cannot have lawyers do the same.

Superintelligent AI May Someday Be Attained

Someday, generative AI may reach the level of our best non-fabricating lawyers and judges. But the Bar Exam essay answers show we are not there yet. For instance, many of the top ten percent of human lawyers would not only have included the missing defenses described previously, such as my favorite, unclean hands; they would also have noted that key information was missing from the hypothetical. We do not know the money-flow details or other monetary terms of the transactions. There is no mention of what the jeweler actually paid the supplier for the diamond, or when. We know very little about the diamond itself that was ordered and delivered; we know only its weight (2 carats) and nothing about its actual condition on receipt. The missing facts are red flags to the best lawyers. They understand that legal reasoning requires much more than logic and legal knowledge.

For instance, most human lawyers know that couples would never buy an engagement ring sight unseen, especially when the fiancée has very particular expectations. There is no mention in the hypothetical of their ever seeing the diamond, before or after the contract, or even looking at photographs. Maybe the fiancée would not like the diamond if and when she ever sees it. Maybe the lawyer’s clients would not want specific performance. Maybe the diamond delivered is not in fact worth $20,000, much less $40,000. Perhaps the supplier misled everyone. Value depends on a multitude of factors, not just weight, including actual size, quality of cut, color intensity, certification, and provenance. Maybe the diamond is two carats, barely yellow, cloudy, and full of inclusions. It could be poorly cut and lack certification. Maybe it was stolen or mined from a banned source and has no value.

All of the parties here obviously trusted too much and verified too little. That is where the lawyer should have stepped in to bring an objective view and healthy skepticism. An expert gemologist should inspect and appraise the diamond, and the clients should see it too. This should be done before the lawyer advises his clients, especially as to remedies. This basic fact-finding process is essential to properly assess the parties’ legal positions. The top answers would have at least mentioned all of this in the exam essay. None did, not even the human student answer selected by the examiners. Fact-finding and discovery are always crucial. Moreover, the best litigators know human nature and that the devil is in the details.

Gemologist examining a large yellow diamond.

This is just one example of why no one seriously claims that AI has already surpassed, or even closely matched, the top legal minds of today. From my test of the 2025 models, with the average level “C” grade scores, major omissions and even hallucinations, we see that AI still has a long way to go. AI has not yet reached the top 10% of practicing attorneys, much less the superintelligent level—the best of the best, A+, in the top 1% to 3%.

How long will it take for AI to reach the superintelligent human lawyer level? I do not know. Maybe never, or maybe as early as 2029. Ray Kurzweil and many others have predicted that superintelligence will be attained in all fields (“AGI”) by that time. Ray Kurzweil, The Singularity is Nearer (when we merge with AI) (Viking, June 25, 2024); Losey R., Breaking the AI Black Box: A Comparative Analysis of Gemini, ChatGPT, and DeepSeek (2/6/25).

Many leaders in the field of AI contend AGI will be reached by 2028. See e.g. Shane Legg’s Vision: AGI is likely by 2028 (11/17/23); Jensen Huang’s Life and Company … AGI by 2028 (12/28/23); Artificial General Intelligence, If Attained, Will Be the Greatest Invention of All Time (8/12/24) (reviews predictions and explains).

Please remember, as mentioned before, we are assessing only reasoning ability here—not the full complexity of the human mind, not actual consciousness, emotions, intuition, or other qualities inherent to living beings. Losey, R. The Human Edge: How AI Can Assist But Never Replace (1/30/25). There is a lot more to life than thinking!

Super legal reasoning AI in the future working with conscious lawyers who can have real smiles, not just fake.

Conclusion

AI’s performance on this Bar Exam challenge offers a revealing glimpse into the state of legal technology. While today’s top reasoning models show impressive capabilities, they remain far from replacing human lawyers—especially when it comes to nuanced analysis, strategic thinking, and ethical considerations. The results reinforce an essential truth: AI can be a powerful tool, but it still requires human oversight, verification, and legal expertise.

For lawyers, law students, and legal tech professionals, the challenge now is to determine how best to integrate AI into legal practice—leveraging its strengths while mitigating its weaknesses. Will AI become a reliable legal assistant, helping attorneys work more efficiently? Or will it introduce new risks that demand careful regulation? The answer depends on how we, as a profession, engage with this evolving technology.

I encourage you to test these models for yourself. Run your own legal reasoning experiments. Do not believe the hype on both sides. Look for professionals with no economic motivation to put their finger on the scale. Share your findings. Remember that even the best of the best human lawyers sometimes make mistakes. Be kind and encourage a collaborative, group effort. The conversation about AI’s role in law is just beginning, and your insights can help shape the future. Let’s move forward together—thoughtfully, critically, and with a clear-eyed view of what AI can and cannot do.

Law firm of the future where people and AI are all smart and work well together.

I will give the last word, as usual, to the Gemini twins podcasters I put at the end of most of my articles. Echoes of AI on Bar Battle of the Bots – Part Two. Hear two Gemini AIs talk about all of this, and much more, in just under 16 minutes. Note, for some reason the Google AIs had a real problem generating this particular podcast without hallucinating key facts and making other errors. It took me many tries. It is still not perfect but is pretty good. These podcasts are primarily entertainment programs with educational content to prompt your own thoughts. See the disclaimer that applies to all my posts, and remember: these AIs wrote the podcast, not me.

Ralph Losey Copyright 2025. All Rights Reserved.



Bill Gates on the Next ‘Big Frontier’ of Generative AI: Programming Metacognition Strategies into ChatGPT

July 26, 2024

Ralph Losey. Published July 26, 2024.

Bill Gates has insider knowledge on the future of generative AI. He predicts there will be ‘two more turns of the crank’ on scaling, with the next ‘big frontier‘ being ‘metacognition.’ Gates describes ChatGPT’s metacognitive strategy today as trivial. Ouch!

Bill Gates 2023. EU photo. Photoshopped AI by Ralph Losey.

While “trivial” might sound harsh, Bill Gates’s assessment is essentially accurate. All generative AI models currently have only limited metacognitive capabilities, such as reflecting on a problem, planning responses, thinking in steps, using external tools, and checking for errors. These metacognitive processes largely depend on user prompt engineering, with minimal inherent support built into the models.

Metacognition depicted in impressionist art style by Ralph Losey using his custom GPT, Visual Muse

AI Metacognition Now Dependent on User Skills

OpenAI has outlined six basic strategies for prompt engineering, all designed to encourage ChatGPT to engage in metacognitive responses. Although I haven’t written extensively about this yet, these strategies have been the primary focus of my studies since their release by OpenAI in December 2023. My tests since then indicate that these strategies are crucial for achieving consistent, high-level results. While it would be ideal if these capabilities were built into the models, this has not yet been accomplished by programmers. However, some strategies can be partially integrated into the model through the use of custom GPTs. The chart below illustrates OpenAI’s six basic strategies, each of which includes multiple implementation tactics. We are currently teaching these strategies privately.

Users must now supply the metacognition needed to guide and refine the model’s responses. This means that successful use of ChatGPT requires careful prompting in each chat session, as learning does not carry over from one session to another. While custom GPTs, such as my Panel of AI Experts for Lawyers, partially address this issue, they do not fully eliminate the need for user intervention. This reliance on user-provided metacognition significantly limits the ability of generative AI to reflect, learn, and self-correct in real time.
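To make the session-by-session pattern concrete, here is a minimal, purely illustrative sketch. It is not Losey’s or OpenAI’s actual tooling: the preamble text, the `call_model` stub, and all function names are hypothetical stand-ins for whatever chat API a reader might use. The point it demonstrates is that, because nothing carries over between sessions, the metacognitive scaffolding must be re-supplied with every prompt.

```python
# Illustrative sketch only: the wrapper and prompt text are hypothetical,
# not an actual product feature or vendor API.

METACOGNITIVE_PREAMBLE = (
    "Before answering: (1) restate the question in your own words, "
    "(2) plan your answer in numbered steps, (3) answer, and "
    "(4) review your answer for errors or missing issues."
)

def call_model(prompt: str) -> str:
    """Stub for a real chat-completion call; replace with your API of choice."""
    return f"[model response to: {prompt[:40]}...]"

def metacognitive_ask(question: str) -> str:
    # The scaffolding is prepended on every call, because the model
    # retains no metacognitive habits from one session to the next.
    return call_model(f"{METACOGNITIVE_PREAMBLE}\n\nQuestion: {question}")

print(metacognitive_ask("Did Helen's unilateral mistake void the contract?"))
```

A custom GPT bakes a preamble like this into its instructions, which is why it only partially addresses the problem: the user must still steer the reflection within each chat.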

The metacognitive abilities of ChatGPT (and all other generative AI models) are, in Gates’s words, trivial, especially compared to humans. Thinking about thinking and learning is a skill most children acquire. Individuals like Bill Gates possess advanced metacognitive skills, honed through decades of complex planning, execution, testing, correction, and continuous self-education. Gates is acutely aware of the vast differences in metacognitive abilities among people—their capacities to learn, plan, and think before speaking or acting. This insight drives his belief that AI software should not rely on users’ varying skills. Instead, he advocates for integrating high-level metacognition functions directly into the software. This is why Gates considers metacognition the ‘most interesting dimension‘ of AI.

Bill Gates Interview of June 29, 2024

Bill Gates agrees with the majority of AI experts that substantial improvements can still be made by continuing to scale data and compute resources. Many believe that this alone will be sufficient to achieve AGI, with AI developing the necessary metacognition on its own. Gates, however, disagrees. He finds it ‘mind-blowing‘ that deep learning works at all and believes that the next leap to superintelligent, accurate AI will require both scaling and the addition of metacognition programming. Gates may be right; deep learning alone may not be enough to reach AGI.

Based on Bill Gates’s June 29, 2024, interview, it is evident that Microsoft and other companies are now working to program metacognition into AI models. Gates anticipates improvements next year and hopes “the problem will be solved sometime after that.” This timeline aligns closely with Ray Kurzweil’s prediction of achieving AGI by 2029, as outlined in his new book, The Singularity is Nearer: When We Merge with AI (book review by Ralph Losey).

Let’s hear directly from Bill Gates in the interview. In the AI video below (and on YouTube), you’ll hear his distinctive voice and words (unaltered) in a fair use excerpt from his interview with the Next Big Idea Club, titled Bill Gates Reveals Superhuman AI Prediction.

Click to watch the video which has the voice and words of Bill Gates that inspired this article. Avatar image by Ralph Losey.

Here is an unaltered transcription of Bill Gates video interview talk above:

We have, you know, probably two more turns of the crank on scaling, whereby accessing video data and getting very good at synthetic data, that we can scale up probably, you know, two more times. That’s not the most interesting dimension.

The most interesting dimension is what I call metacognition, where understanding how to think about a problem in a broad sense and step back and say, okay, how important is  this answer? How could I check my answer? You know, what external tools would help me with this? The overall cognitive strategy is so trivial today that, you know, it’s just generating through constant computation each token in sequence. And it’s mind-blowing that that works at all. It does not step back like a human and think, okay, I’m going to write this paper and here’s what I want to cover. Here’s, okay, I’ll put some facts in. Here’s what I want to do for the summary.

And so you see this limitation when you have a problem like various math things, like a Sudoku puzzle, where just generating that upper left-hand thing first, it causes it to be wrong on anything above a certain complexity.

So, we’re going to get the scaling benefits, but at the same time, the various actions to change the underlying reasoning algorithm from the trivial that we have today to more human-like metacognition, that’s the big frontier, that it’s a little hard to predict how quickly that’ll happen.  You know, I’ve seen that we will make progress on that next year, but we won’t completely solve it for some time after that.  So, you know, your genius will get to be more predictable.

Now, in certain domains, confined domains, we are getting to the point of being able to show extreme accuracy on some of the math or even some of the health-type domains, but the open-ended thing will require general breakthroughs on metacognition.

Bill Gates believes that new data from videos and synthetic generation can support at least two more turns of the crank in scaling increases. With companies like Nvidia using AI to accelerate chip designs, there may be even more significant advances in compute power. Even without additional data, increased compute capacity could lead to remarkable improvements in speed and complexity, with more robust synaptic-type connections being formed.

Both types of scaling up should facilitate metacognition training. Metacognition might even arise spontaneously from the new data and enhanced compute capabilities. Other unexpected intelligence abilities may emerge from exponential increases in scaling, not just metacognition. The recent progress from version 3.0 to 4.0 of ChatGPT supports this assumption.

“A couple of more turns of the crank.” Impressionistic style by Losey’s AI.

What Is Metacognition?

Metacognition, as defined by Wikipedia, is an awareness of one’s thought processes and an understanding of the patterns behind them—essentially, thinking about thinking. The term was coined by Stanford Professor John H. Flavell (born 1928). See: Flavell, J.H., Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry (American Psychologist, 34(10), 906–911, 1979). Also see: Lupeanu, Doru, Metacognition in 10 Points (KnowledgeOne, 10/13/22) (point 6: Metacognition “is developed in childhood but can be improved throughout our lives”). Professor Flavell was a leading American proponent of the famous Swiss child psychologist Jean Piaget (1896–1980), who pioneered work on childhood cognitive development.

Generally, metacognition involves self-regulation and reflection on how we think, learn, and solve problems. This concept encompasses two main aspects: knowledge about cognition (knowing what we know) and regulation of cognition (managing how we learn). These areas of study are crucial for training and fine-tuning generative AI, making it smarter, more accurate, and consistent. As Gates put it, helping “your genius to get to be more predictable.”

Metacognition Applied to Generative AI

In the context of generative AI, metacognition involves the system’s ability to “understand” and “reflect” on its own processes. For example, a generative AI might use metacognitive strategies to evaluate the quality of its outputs, adjust its approach based on feedback, and improve its performance over time. Bill Gates has noted that these abilities are currently trivial, indicating significant room for improvement in AI metacognition. This area is now a focus of intense study in AI research labs, with Microsoft as a key leader. For instance, Lev Tankelevitch, et al. discuss these concepts in their paper, The Metacognitive Demands and Opportunities of Generative AI (ACM, May 2024). Here is an excerpt from the Abstract of Microsoft’s scientific paper:

Drawing on research in psychology and cognitive science, and recent GenAI user studies, we illustrate how GenAI systems impose metacognitive demands on users, requiring a high degree of metacognitive monitoring and control. We propose these demands could be addressed by integrating metacognitive support strategies into GenAI systems, and by designing Gen AI systems to reduce their metacognitive demand by targeting explainability and customizability. Metacognition offers a coherent framework for understanding the usability challenges posed by GenAI, enabling us to offer research and design directions to advance human-AI interaction.

Read the paper; it’s good, and it ends with a word I had to look up – “amanuensis.” But also watch the Microsoft video on YouTube that explains the paper. The video features Lev Tankelevitch, a lead Microsoft researcher at Cambridge. He talks fast, but a full transcript accompanies the video. The image below is a screenshot of the video, where Lev provides a good introduction.

You can find the 24-page Microsoft paper here. As a useful exercise and to gain more insight, download the document, then upload it to ChatGPT-4o (Omni) and ask for a summary in the prompt window. Follow up with any questions you have. I recently did this and received a well-written 489-word summary. While I could easily tweak it and continue the blog with the summary—potentially impressing you with my pseudo-erudition—I’d find it boring. Moreover, it seems that most blogs today are taking this shortcut. My discerning and intelligent readers deserve better than the B+ efforts of even the best AIs today. When I use AI-generated content, it is clearly marked and quoted. After all, AI is merely an amanuensis, not a real Learned Hand.

An amanuensis hand is no Judge Learned Hand. Digital art style by Ralph Losey using Visual Muse

You could also go deeper into the Microsoft article, or general subject of metacognition, by using one of my custom GPTs, such as my Panel of AI Experts for Lawyers. That will give a really deep dive. Or you could try my latest GPT, which is designed for quicker, more flexible answers, Panel of Experts for Everyone About Anything. This GPT told me of another interesting article on point, Jason Toy, Phil Tabor, Josh MacAdam, Metacognition is all you need? Using Introspection in Generative Agents to Improve Goal-directed Behavior (arXiv, 3/4/24) (testing of the AI includes a scenario with a zombie apocalypse).

Although scientific papers involving a zombie apocalypse are cool, maybe you are more in the mood for a laidback summary of The Metacognitive Demands and Opportunities of Generative AI. If so, try using my GPT, The Dude Abides and Gives Advice, Man. Just tried it and the Dude’s summary was only 345 words. It made me smile 10 times and laugh out loud twice, not that I’m hyper into metadata or anything. The Dude perspective is pretty good for metacognition, which is, after all, just another way to talk about machine mindfulness. See e.g. Jankowski and Holas  Metacognitive model of mindfulness (Consciousness and Cognition, Volume 28, August 2014, Pages 64-80); Elisabeth Norman Metacognition and Mindfulness: the Role of Fringe Consciousness (NLM, 2/22/16).

“What is metacognition?” I asked The Dude. He said:

Metacognition is basically thinking about thinking, man. It’s being aware of your own thought processes and being able to control them, like a mental autopilot but you’re still in the driver’s seat.

The mental autopilot but still in the driver’s seat is not something I have ever read before. GenAI is good at creative analogies. In response to another question, The Dude listed five points of similarity between mindfulness and metacognition, and three differences. All were interesting, but I’ll just share the third difference.

Techniques: Mindfulness involves practices like meditation, deep breathing, and body scans. Metacognition involves techniques like self-questioning, thinking aloud, and using checklists to monitor your thought processes. It’s like the difference between meditating and making a game plan.

Making a game plan. Thinking about what you are going to say and do. Pop Art image by Losey’s AI.

Using an unconventional cognition style, such as a custom GPT like “The Dude Abides,” often yields novel thoughts and insights. Some attribute this to the diverse cross-discipline perspectives inherent in big data training. Just as human brainstorming thrives on a variety of viewpoints, it makes sense that AI would benefit similarly, despite its reliance on a very different, brute-force statistical process.

How Metacognition Algorithms Could Improve Generative AI

I asked my GPT, Panel of AI Experts for Lawyers for a discussion of “how metacognition algorithms could improve generative AI.” I selected these experts: Prompt Engineering Expert, Machine Learning Engineer, Advanced GPT User, Child Prodigy and Devil’s Advocate. Here are a few selected excerpts worthy of your attention. (For the full message, you could always ask the GPT yourself.)

Prompt Engineering Expert: By embedding reflective thinking, AI can adjust its responses based on past errors, leading to increased reliability and reduced hallucinations. This iterative self-improvement can significantly boost the overall intelligence of AI systems. … We should focus on creating prompts that encourage AI to engage in metacognitive practices. For example, prompts that ask AI to explain its reasoning or evaluate the confidence of its answers can be instrumental. This ensures that metacognitive algorithms are activated and utilized effectively.

Advanced GPT User: User feedback should be incorporated into the AI’s metacognitive processes. By allowing users to highlight errors or provide additional context, AI can learn and adjust more effectively.

Child Prodigy. Think of it like teaching AI to learn from its mistakes just like we do. When AI can reflect on what it got wrong and figure out how to fix it, it becomes smarter and more reliable. … AI should be designed to ask for help when it gets stuck, just like we do. By building in mechanisms for AI to seek clarification or additional information, we can reduce errors and improve its learning process.

Devil’s Advocate. While implementing metacognition is beneficial, we must ensure that it does not compromise the speed and efficiency of AI systems. It’s important to strike a balance between self-improvement and operational performance.
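The panel’s suggestions amount to a generate–evaluate–revise loop: produce an answer, have the model rate its own confidence, and regenerate with a self-critique when confidence is low. The sketch below is purely illustrative under stated assumptions: `generate` and `self_evaluate` are stubs for model calls, and the confidence scoring is hypothetical, not any vendor’s API.

```python
# Hypothetical sketch of a reflective answer loop; all functions are stubs
# standing in for real model calls.

def generate(prompt: str, critique: str = "") -> str:
    """Stub for a model call; any prior critique is folded into the prompt."""
    return f"answer({prompt}|{critique})"

def self_evaluate(answer: str) -> float:
    """Stub confidence score in [0, 1]. A real system might ask the model
    to rate its own answer, or run external checks (citations, math)."""
    return 0.9 if "|revise" in answer else 0.4

def reflective_answer(prompt: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    answer = generate(prompt)
    for _ in range(max_rounds):
        if self_evaluate(answer) >= threshold:
            break  # confident enough: stop reflecting
        critique = "revise"  # in practice: the model's own error analysis
        answer = generate(prompt, critique)
    return answer

print(reflective_answer("Was there tortious interference?"))
```

The Devil’s Advocate’s caveat shows up directly in the `max_rounds` and `threshold` parameters: each extra round of reflection trades latency and compute for reliability.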

Metacognition Mindfulness illustrated in Abstract Art style by Ralph using Visual Muse

Trained Users Can Also Provide the Metacognition that AI Needs

According to Bill Gates and Microsoft, generative AI needs two things to reach reliable superintelligence: 1) a few more turns of the crank on scaling, meaning an increase in the amount of data and compute, which Gates sees as imminent; and 2) rule-based software to give AI models metacognition, which he currently views as trivial, making it miraculous that generative AI works at all. I respectfully disagree, at least in part, and suggest they are missing a third key component: humans.

Microsoft still seems focused on automating everything with rule-based, old-paradigm software. This is not surprising, as these methods have worked well for years. Moreover, traditional programmers often underestimate the miracle of deep learning and the significance of the current user-focused metacognition design. It’s not truly trivial, because it allows a skilled human user to provide the GPT with the necessary metacognition on a session-by-session basis. This ability, embedded in the model’s deep learning processes, is impressive. In my view, it’s far from trivial that natural human metacognition can be transferred to AI through a series of user prompts.

Still, it could be better. Perhaps more rule-based software is needed, or perhaps a few more turns of the crank on scaling will suffice to achieve consistent AGI-level performance in all fields, including Bill Gates’s favorite test, Sudoku.

Sudoku playing AI by Ralph Losey using his custom GPT

What Bill Gates may not fully appreciate is that trained users can already provide the necessary metacognition to AI. In each chat session, skilled humans can prompt the model to enhance AI intelligence, increase consistency, and reduce errors and hallucinations to an acceptable level—fewer errors and fabrications than most humans. Humans skilled in prompt engineering can make the results of these AI systems predictable enough for the most demanding applications, including my field, law.

There is no need to wait for Microsoft or other big software companies to add the metacognition dimension to responsibly use ChatGPT. We can do that ourselves through prompt engineering. Even if Microsoft does not solve the problem of autonomous metacognition soon, we humans can fill the gap by improving our own skills. In my view, humans and AI should advance together. Excessive delegation to machines is not advisable. A hybrid approach to work and development is a safer solution and should lead to greater, more diverse intelligence.

Human endowing AI with Metacognition. Expressionism style.

The Current Debate on Scaling

Let’s return to the ongoing debate in the AI community regarding the importance of scaling. Some, like Sam Altman and Ilya Sutskever, argue that scaling alone can achieve AGI. See, for example, Albert Wenger, Scale is All You Need? (5/7/24); Surprising Admissions by OpenAI Leaders Made in Recent Interviews (e-Discovery Team, 8/8/23). Others disagree, asserting that much more is required. Microsoft, or at least Bill Gates, now aligns somewhat with this view. Gates emphasizes the need for metacognition, in addition to scaling, to achieve AGI. Still others point to different challenges that could prevent AGI.

Some traditional AI scientists claim that all generative AI is just media hype. This sentiment is familiar to veterans like me. I recall experts making similar statements about the Internet in the 1990s. Then, in May 1995, Bill Gates discovered the Internet, recognized its potential, and called it a coming “Tidal Wave.” He was right. Now, he praises generative AI even more enthusiastically, as does the current CEO of Microsoft, Satya Nadella. The outcome remains uncertain, but game-changing AI is already here. Metacognition might be the path to AGI, or it could be achieved through more scaling, human merger, or perhaps a combination of all three.

AI and Humans need to work together and eventually merge. Image by Losey using Visual Muse

I personally believe humans will always have a role to play, and eventually, we and AI will become one. There’s no debate that adult humans—most of us, anyway—already possess significant metacognitive awareness and skills. We are adept thinkers, planners, and creatives. AI needs us, and we need it.

Everyone who learns prompt engineering skills, and it does take time and practice, discovers that their true purpose is to impose human metacognition onto generative AI. In prompt engineering, you learn to do this through a series of prompts. Multi-shot prompting is a classic example. See, for instance, Sarojag, Know about Zero Shot, One Shot and Few Shot Learning (Analytics Vidhya, 12/4/23); Margaret Rouse, Zero-Shot, One-Shot, Few-Shot Learning (Techopedia, 1/17/24). Through this process, we compel AI to engage in a form of metacognitive thinking. However, most agree that the amount and quality of metacognition transferred this way is insufficient to reach AGI. As Bill Gates observed, expert user prompting alone is not enough to make the AI genius consistent. That will not become a reality until we see several more turns of the crank on scaling, plus built-in metacognition.
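To make the multi-shot idea concrete, here is a minimal sketch of how a few-shot prompt is typically assembled before being sent to a model. The legal-review examples and the helper function are hypothetical illustrations of the pattern, not any vendor’s API; the point is that the worked examples carry the human’s reasoning pattern, which the model then imitates.

```python
# Few-shot prompting sketch: worked examples encode the human's
# metacognitive pattern; the model is asked to continue that pattern.
# All example text below is invented for illustration.

def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble a few-shot prompt: instruction, worked examples, new task."""
    parts = [instruction, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")  # the model completes from here
    return "\n".join(parts)

examples = [
    ("Contract lacks a signature block.", "Flag: execution formality issue."),
    ("Indemnity clause is uncapped.", "Flag: unlimited liability risk."),
]

prompt = build_few_shot_prompt(
    "Review each clause and flag the legal risk it raises.",
    examples,
    "Governing-law clause names two conflicting jurisdictions.",
)
print(prompt)
```

In a zero-shot prompt, the `examples` list would simply be empty; each added example is one more “shot” of transferred human judgment.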

Who knows what a couple of more turns of the crank may bring? Impressionism style by Losey’s AI

Holistic Compromise: Hybrid Multimodal Approach

Many in the scaling camp of AI designers believe that more turns of the crank will render hybrid human-AI processes unnecessary, expecting scaling alone to awaken metacognition. A more traditional group, including Bill Gates, disagrees. They believe that metacognition and reliability won’t emerge from scaling alone and must be programmed separately. I propose a third way: a holistic multimodal approach that combines all three methods.

In my experience, the correct solution is often “both and,” not “either or.” We should adopt a try-all-methods multimodal approach, incorporating scaling, traditional rule-based programming, and, most importantly to me, the human touch. I agree with Ray Kurzweil’s prediction to a certain extent: a merger of man and machine is inevitable.

In my opinion, skilled human users will always be able to elevate AI to a higher level than AI can achieve on its own. We will not slow AI down in the long run. As we merge with AI, we will enhance and expand our own intelligence, making unique contributions with our biological capabilities. Our metacognition skills will likely surpass what probabilistic, stochastic thinking alone can produce. From my studies and experience using AI since 2012 and computers since 1978, I believe that human gifts will be essential in taking AI beyond the AGI level and into the final frontier of superintelligence and the singularity. We possess the metacognition and other abilities that AI needs to reach the next levels.

The Thinkers working together in hybrid metacognition by Ralph Losey using his Visual Muse GPT

The Future of Programming Metacognition into Generative AI

Bill Gates and Microsoft may well be correct about the limitations of scaling. If and when progress from scaling ends, the focus may shift to programming metacognitive procedures into generative AI. By then, deep-learned AI will be far more advanced than it is now. According to OpenAI’s Chief Technology Officer, Mira Murati, GPT-5 is expected to reach a PhD level for specific tasks. Currently, she describes GPT-4o (Omni) as comparable to a smart high schooler in most tasks. In certain narrow fields, however, GPT-4o is already at a PhD or higher level, such as in programming and some areas of mathematics. The multidimensional mathematical analysis capabilities of deep-learning AI have already led to breakthroughs in Nvidia chip design and protein folding, revolutionizing medical research.

Bill Gates envisions more than what human prompt engineering can provide; he seeks real-time self-assessment—autonomous metacognition—built directly into the software. Microsoft, along with others, aims for AI metacognition to function independently of human prompting. While there are clear advantages to this approach, I believe it should not be an “either-or” situation. Metacognition should be both autonomous and open to human enhancements. This hybrid approach prevents individual disempowerment and excessive control by large corporations. Maintaining a “human in the loop” approach ensures that human metacognitive skills can still intervene and guide AI processes. I am cautious about fully autonomous AI in general, as humans remain the most interesting and valuable dimension. Metacognition is just one of the many capabilities we can contribute to superintelligence.

Like many others, I strongly favor a hybrid human-AI relationship, where, as Ray Kurzweil predicts, we become one with AI. Teenagers and others glued to their iPhones are just early, albeit sometimes unhealthy, examples of this merger. This integration will improve over time. Companies like Meta and Apple are focused on advancing this integration, likely through new accessories such as improved glasses, hats, watches, and ear pods. However, the hybrid approach alone is not sufficient. Deep learning and logical programming must also play their parts for us to reach superintelligence levels. Additionally, there is an equality and fairness component to consider. Just as smartphones and the internet are now ubiquitous, it is crucial that all humans have access to the next stage of evolution, not just a highly educated or wealthy elite.

Merger of AI via devices depicted in watercolor style by Ralph Losey using Visual Muse GPT.

Eventually, metacognition will be built into AI, allowing it to evaluate its own performance during chat sessions without human guidance. The AI will be able to identify errors and uncertainties in its thinking in real-time, making internal evaluation continuous and automatic. Although no one knows exactly how to achieve this yet, with the power of thousands of Microsoft researchers and the scaling improvements that OpenAI will make in the coming years, success is likely. Companies like Google, Anthropic, Apple, Amazon, and a few others are also poised to make similar advancements.

I am hopeful that before the dark side of AI potential ruins everything, AI will progress at exponentially high speeds, sufficient to launch the Singularity and leave behind petty human power ambitions.

Advantages of Built-In Metacognition in AI Software

There are additional advantages of programming metacognition into AI software that haven’t been discussed yet. This programming should allow for the automatic, dynamic adjustment of the AI’s thinking processes. It would enable the AI to modify its approach to specific tasks based on self-assessment, adjusting its reasoning and outputs before responding to a prompt. This contrasts with the existing user prompt methods, which require constant human engagement to guide AI analysis and avoid mistakes and hallucinations.
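The self-assessment loop described above can be sketched in a few lines. This is a toy draft-critique-revise cycle, not anyone’s production design: the `draft`, `critique`, and `revise` functions below are rule-based placeholders standing in for model calls, and the “missing citation” check is an invented example of a self-detected error.

```python
# Toy sketch of built-in metacognition: the system drafts an answer,
# critiques its own output, and revises before responding, with no
# human prompt in the loop. All three steps are placeholder stubs.

def draft(task):
    """Placeholder for a model's first-pass answer."""
    return f"Draft answer to: {task}"

def critique(answer):
    """Placeholder self-assessment; returns a list of detected problems."""
    return ["missing citation"] if "citation" not in answer else []

def revise(answer, problems):
    """Placeholder revision step addressing the flagged problems."""
    return answer + " [revised: added citation]"

def answer_with_self_check(task, max_rounds=3):
    answer = draft(task)
    for _ in range(max_rounds):
        problems = critique(answer)
        if not problems:  # self-assessment passes: respond
            break
        answer = revise(answer, problems)  # adjust before responding
    return answer

result = answer_with_self_check("summarize the holding")
print(result)
```

A real system would route each step to a model rather than a stub, but the control flow, evaluate and adjust before emitting a response, is the contrast with today’s approach, where a human user must supply the critique prompts by hand.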

The current reliance on skilled users limits the consistency and the number of people who can use tools like ChatGPT effectively and achieve consistent, high-quality output. By building prompt engineering skills into the software, everyone, even beginners, could quickly achieve B+ or A level AI responses. Nevertheless, studies and common sense suggest that skilled humans will usually attain the highest-quality results, reaching A+ levels with astonishing new insights and profound advice. Therefore, developing skills in prompt engineering now will not be a waste of time, as the upgrade to built-in metacognitive skills is still many years away. Prompt engineering should remain a high priority in educational efforts, even if its importance may diminish somewhat in the future.

Prompt Engineering Skills Can Give Metacognition to AI. Abstract style by Losey using Visual Muse.

Barring privacy barriers that could limit metacognition training transfers from one session and user to another, AI could someday autonomously learn from all past interactions. It could integrate feedback from each chat session, and potentially from each user, to improve future performance without external prompts. An internal, ongoing deep learning process would enable AI to evolve based on its experiences with hundreds of millions of humans and trillions more simulated chat sessions.

Of course, privacy concerns, particularly from EU privacy and sanction-taxation lawyers, might oppose such cross-training improvements. I am hopeful that increasingly intelligent AI will help future lawmakers find acceptable compromises. They should be able to do so, provided enough independent lawyers in the world can keep up. My current work as a writer is dedicated to the goal of enhancing lawyer competence through AI understanding and assistance.

Conclusion

Bill Gates aims to reduce the need for quality human input by enhancing AI software with autonomous metacognition features. With the power of thousands of Microsoft researchers and the scaling improvements from OpenAI, Microsoft is likely to succeed eventually. Google and several other companies will probably achieve this as well. Meanwhile, certain foreign powers will resort to their usual trade-secret thievery to try to keep pace. However, slave states are typically adept only at theft and stasis, not innovation and change. When exponential change reaches a certain speed, the theft of old knowledge will become futile. This kind of accelerated change is unprecedented in human history.

The concern among many in the legal profession is that technological advances will continue to accelerate so rapidly that very few independent lawyers will be able to keep up. Few law firms will grasp the latest AI advances and the new science and engineering insights that AGI-level AI will bring. Keeping up is already exhausting, even for specialists dedicated to this quest. Hopefully, there will be enough lawyers merged with AI to keep pace with the accelerating advancements. The same applies to regulators, academics, technology companies, and the public. Enlightened laws, skilled lawyers, and reasonable regulations are essential to restrain the excesses of both humans and AI. This will need to be a team effort, but it should be achievable with the help of ever-improving AI.

Keeping up with AI. Who has the time? Digital art style by Losey.

At some point, if Ray Kurzweil’s predictions are even close to correct, our intelligence will be increasing at incredible speeds. This may launch the Singularity and leave behind the petty power ambitions of most humans and nation-states. This is my hopeful vision for the future, the one I wish for my grandchildren. Whether I live long enough to see it or not is irrelevant. We are on our way, and despite the many challenges ahead, the long-term future looks promising.

Positive Future of AI Human merger. Impressionistic style. Ralph Losey with Visual Muse.

Ralph Losey Copyright 2024 — All Rights Reserved

Posted by Ralph Losey

