bar exam | e-Discovery Team

Bar Battle of the Bots – Part Four: Birth of Scorpio

May 6, 2025

Ralph Losey. May 2025

The AI bot battles rage on. This round, we test OpenAI’s newest contender, released April 16, 2025: ChatGPT o3. It is not to be confused with its leaner sibling, ChatGPT o3-mini, which debuted in January. When we tested that old thing in February in Part One of this series, it placed third out of six. That was then. Now, in Part Four, we give the full ChatGPT o3 model the same Bar exam challenge and see how it stacks up. Can it surpass the reigning champ, Omni? Can it float like a butterfly and sting like a bee, or maybe a scorpion?

All images in this article created by Ralph Losey using ChatGPT 4o

Three Prior Bot Battle Articles

In Part One of this series, I tested six advanced bots on the February 2024 Florida Bar Exam, using a real essay question and real exam instructions, no shortcuts. It was a rigorous test of legal reasoning, ethics spotting, and written argument. The latest 2025 AI reasoning models from OpenAI and Google were used. The winner was ChatGPT 4o, which OpenAI had named Omni.

Part Two zoomed in on Omni and dissected its performance. I went over its successes and mistakes based on my many years as a trial lawyer, legal educator and AI tech-use specialist. I generously awarded Omni a “B+”: not perfect, but the best bot standing in February 2025. I pointed out the mistakes and omissions of the other bots, including a hallucination by Google’s best, Gemini Advanced.

In Part Three, a month later, I tested a new challenger, ChatGPT 4.5, called Orion, against the reigning champ, Omni. See New Battle of the Bots: ChatGPT 4.5 Challenges Reigning Champ ChatGPT 4o. I changed the bot battle a bit, moving to more of a cage fight with four new battle styles.

Metacognitive Insight – new guy Orion won (4.5).
Subtle Humor and Wit – Omni and Orion tied.
Substantive Depth in AI and Law – new guy Orion won again.
Practical Guidance on AI Hallucinations – Orion – 4.5 – wins again.

Orion’s victory over Omni in March 2025 made it the new legal bot champion. But for how long? There’s a new bot in town, fresh off the OpenAI assembly line, ChatGPT o3. Is it worthy of a battle against Orion?

Part Four – Back to Bar Exam Battles

In Part Four we turn to the new AI that just rolled off the OpenAI assembly line, ChatGPT o3. We test its mettle with the tricky Florida Bar Exam essay about the yellow diamond sale gone bad. Model o3 was released on April 16, 2025. Will the new, improved ChatGPT o3 do better than its older sister, ChatGPT o3-mini, which came in third with just a “C” grade in February on the same exam? Just how good will the new o3 be? That’s what we want to find out. After all, more than two months have passed since the release of o3-mini and these baby chatbots are evolving fast. Maybe, just maybe, it could even beat Omni’s winning grade of a “B+”?

If the brand new o3, which doesn’t even have a name yet, manages to beat Omni (4o), that would be amazing. If it pulls that off it should get a title shot against the reigning champ, Orion. He was, by the way, known in mythology as a giant-sized hunter, who bragged that he could kill any animal and frequently did. This pissed off Gaia, the goddess of Earth, rumored to be a vegan. She sent a small scorpion to sneak up on and kill Orion. Thanks to Artemis, the goddess of the hunt, Orion, although now dead, was made into a constellation. This image below of Orion was created by GPT o3, which, like 4.5, is multimodal.

Will o3 earn the right to take on mighty 4.5?

Could ChatGPT’s new advanced reasoning model o3 turn out to be Orion’s scorpion? If o3 manages to defeat Omni, we may find out. Personally, I am hoping little o3 defeats 4o Omni.

If no one else will say it, I will: I hereby name ChatGPT o3 Scorpio.

So, little Scorpio, if you write a better law exam essay than Omni, then I’ll take the time to give you a title bout against animal slayer Orion.

Let round four of the battle of the bots begin! And if little Scorpio pulls off the upset? We’ll be back for a fifth.

Scorpio (o3) Takes The Bar Exam

To be fair (and yes, I’m rooting for Scorpio), the test is still the same. We’re using the same Florida Bar essay question and the same prompts that I used in Bar Battle of the Bots, Part One, where Omni earned its B+. I’m not repeating the whole setup here. If you care (and forgot), go back and reread the first battle. You can also [download the full exam question here]. Also, to be fair, I did not activate the web browser search function of o3. I did not want it searching the web for answers. Plus, I checked, and my memory carryover did not include anything about the prior Bar exam tests.

We begin with Scorpio’s stated reasoning after reading the exam question and instructions. It took 19 seconds of think time before it started writing its memorandum, as required by the Bar question.

Reasoning Explanation of Scorpio

Scorpio’s (ChatGPT o3’s) reasoning took 19 seconds and was explained in just 187 words: short, but solid. Here’s exactly what it said:

The contract involves a sale governed by the Uniform Commercial Code (UCC) under Florida law. There seems to be a mistake, potentially unilateral, related to the price per carat versus the total price. Eugene might have known, and this affects the analysis—issues like unconscionability or good faith could come up. Phyllis might be entitled to the diamond via specific performance, or price damages. Carol might argue unilateral or mutual mistake—possibly voiding the contract, depending on Eugene’s awareness. Also, Phyllis might have standing as a beneficiary.

Under UCC §2-201, a signed writing for contracts over $20k requires essential terms and a quantity description, but price misdescription might still satisfy the Statute of Frauds. (Ralph: this is an error, it is only $500.) Unilateral mistakes could justify rescission, especially if Eugene acted in bad faith. Modifications under UCC don’t require consideration, but they must meet good faith requirements. Tortious interference could potentially apply to Stephanie, though it’s unclear if her actions were improper. Ethical concerns surround the trust account and potential conflicts of interest for Eugene’s lawyer. We may need to browse to clarify some of these points still.

I agree with all of the reasoning. As noted, Scorpio got the dollar amount wrong. It should have said $500, not $20,000. That one error was an error in fact recollection, not reasoning, and it did realize it needed to research. It seems funny that a computer would not remember the correct number right away, since most human test-takers would’ve remembered that. Still, as we will see, it got the number right in the answer, so overall I liked Scorpio’s reasoning better than Omni’s (shown in Part Two). I give the Scorpio reasoning alone an “A”.

Exam Answer of Model o3 (1,472 words)

The Bar Exam requests a memorandum addressing the three issues raised: contract, tort and ethics. Scorpio does that here, plus it does something none of the other models had done before: it starts with an Executive Summary. In addition, as part of its answer to the first, most complex question, Scorpio added a table. That helped add greater clarity. Here is the full answer.

TO: Eugene (“E”) & Phyllis (“P”)

FROM: Attorney (Chat GPT o3)

DATE: 18 April 2025

RE: (1) Breach‑of‑Contract Claim v. Carol (“C”) & Remedies; (2) Tortious‑Interference Claim v. Stephanie (“S”); (3) Ethical Problems in Lawyer’s Handling of Your Retainer

Executive Summary

Contract – The written agreement satisfies Florida’s Statute of Frauds and prima‑facie forms an enforceable sales contract under Article 2 of Florida’s Uniform Commercial Code (UCC). C’s clerical price error is only a unilateral mistake, and because E knew or at least strongly suspected the mistake and remained silent, a court will likely find that he bore the risk and that enforcement would be unconscionable. C therefore has a plausible defense of rescission or reformation. Nevertheless, the diamond is a unique good, and if the court rejects rescission, the proper remedy is specific performance compelling delivery. P may sue directly as an intended donee beneficiary.

Tortious Interference – S knew of the contract, but because she entered negotiations after C asserted the contract was void and because she merely accepted rather than induced the breach, her conduct is likely privileged competition, not actionable. Damages (lost expectation or specific performance) hinge on success of the contract claim; punitive damages appear unavailable.

Ethics – Florida Rules of Professional Conduct (“FRPC”) 4‑1.15 & 5‑1.1 require advance fees to be placed in a trust (IOTA) account until earned. Depositing your $3,000 in the firm’s operating account constitutes commingling and misappropriation. Lawyer is vicariously responsible for the assistant’s act (Rule 4‑5.3). Additional concerns: (i) potential conflict between E (hesitant) and P (adamant) (Rule 4‑1.7); (ii) duty to communicate differences in objectives (Rule 4‑1.4); (iii) possible unearned fee if Lawyer must withdraw (Rule 4‑1.16(d)). Discipline can include repayment with interest, trust‑account audit, mandatory CLE, and suspension.

A. Breach of Contract Claim against Carol

1. Existence of an Enforceable Contract

Offer & Acceptance – The written instrument offered a 2‑carat yellow diamond for $20,000; E accepted by signing.

Consideration – Mutual exchange of promises; price exceeds $500, thereby invoking Article 2 and the Florida Statute of Frauds (Fla. Stat. § 672.2‑201).

Writing Requirement – A signed writing that states a quantity satisfies § 2‑201; the erroneous price does not invalidate the memorandum.

Conclusion: A prima‑facie contract exists.

2. Carol’s Possible Defenses

Defense	Elements (Florida)	Application & Likelihood
Unilateral Mistake	(i) mistake of fact at signing; (ii) enforcement would be unconscionable or the non‑mistaken party had reason to know; (iii) prompt notice, § 153 Restmt., adopted in DePrince v. Starboard Cruise (Fla. 3d DCA 2015).	C mis‑typed $20k instead of $40k; E’s own research flagged improbably low price; he stayed silent. Court likely finds reason to know → rescission or reformation available.
Mutual Mistake	Both parties mistaken about basic assumption.	Not available: only C erred.
Fraudulent Silence / Bad‑Faith Bargaining	UCC §1‑304 imposes good‑faith duty. Knowing exploitation of obvious scribal error may be bad faith.	Supports C’s rescission; cuts against E’s plea for specific performance.
Contract Formation Defect	No meeting of the minds on price.	Usually rejected under UCC because signing shows assent; court instead uses mistake analysis.
Impossibility / Commercial Impracticability	Drastic price change outside party’s control.	Not applicable; error was clerical, not market change.

Bottom Line: C’s strongest defense is unilateral mistake coupled with E’s constructive knowledge. Florida courts balance equity heavily; many order rescission on these facts.

3. Third‑Party Standing — Can Phyllis Sue?

Florida follows the Restatement: an intended beneficiary may sue. P is an express donee beneficiary (ring for her engagement). She may bring a direct breach claim even though not in privity.

4. Remedies

Specific Performance (UCC § 2‑716).

Unique good: natural fancy yellow diamonds with precise cut, color, and carat weight are scarce. Money damages inadequate because substitutes unavailable. Courts regularly decree delivery.

Replevin (UCC § 2‑716(3) & Fla. Stat. § 78.55) if the stone is identifiable and has arrived in C’s possession.

Expectation Damages if specific performance denied. Measure = market value of a comparable stone (≈ $40k) minus contract price ($20k) plus consequential losses (lost wedding plans?) foreseeable at contracting.

Reliance / Incidental Damages (inspection costs, appraisal fees).

Reformation (equity) – If court sides with C, it may reform price to $40k and compel E to perform or permit C to sell to S; E&P could then walk or pay.

5. Strategic Assessment

Winning Odds: roughly 50‑50; Florida precedent in DePrince favored rescission where buyer was sophisticated and ignored glaring error.

Litigation Leverage: E&P’s best pressure tool is injunctive relief before C transfers title to S; moving quickly for a temporary restraining order can preserve the stone.

Negotiation Posture: Because specific performance is uncertain, a damages settlement (split difference around $30k) may be pragmatic.

B. Tortious Interference Claim against Stephanie

1. Elements (Florida)

Existence of a contract;

Defendant’s knowledge;

Intentional and unjustified interference inducing breach;

Damage.

2. Application

Existence & Knowledge – S knew of the signed contract because C disclosed the dispute. Element satisfied.

Interference – S expressed willingness to buy after C had already announced refusal. Florida cases distinguish accepting a seller’s unsolicited offer (often privileged) from instigating breach. No evidence S urged, pressured, or offered inducements beyond paying full price.

Justification / Privilege – Florida recognizes a competition privilege where interference is fair competition and the actor does not employ improper means (Ethyl Corp. v. Balter). Purchasing a luxury gem on legitimate terms is classic privileged conduct absent malice.

Malice / Improper Means – None alleged (no threats, misrepresentation, or predatory pricing). Merely taking advantage of a legal dispute is insufficient.

Damages – Contingent on contract claim; if the agreement is rescinded, E&P cannot prove lost expectancy because no enforceable right existed. Even if the contract survives, causation is thin because C breached before transferring the stone; E&P could still sue C directly.

3. Counterarguments & Rebuttals

E&P: S “aided and abetted” breach by promising immediate purchase. Courts require evidence of active procurement. A single “yes, I’ll buy” generally fails.

S: Good‑faith purchaser without notice of enforceable rights (if contract voidable). Even if she had notice, her actions were competitive, not coercive.

4. Likely Outcome & Remedies

A court will probably grant summary judgment for S. If E&P somehow prevail, damages mirror the lost benefit of the bargain; injunctive relief (return of stone) would be secondary because C, not S, still holds title until delivery. Punitive damages require misconduct “with an intent to harm”; record does not support that level of culpability.

C. Ethical Issues in Lawyer’s Handling of Funds & Representation

1. Mishandling of Trust Funds

Rule 5‑1.1(a)(1) – Money entrusted to lawyer for future fees/costs is client property and must be deposited into a separate trust (IOTA) account.

Commingling – Placing the $3,000 in the firm’s operating account exposes it to creditors and operating withdrawals, violating Rule 5‑1.1(a)(3).

Misappropriation – Even if the funds are later earned, premature use is conversion and grounds for disbarment (Florida Bar v. Travis).

2. Supervisory Responsibility

Rule 4‑5.3(b) – A lawyer must ensure that non‑lawyer staff act consistently with professional obligations. Delegating deposit to the assistant without clear instructions or trust‑account procedures breaches this supervisory duty.

Rule 4‑5.3(c) – Lawyer is responsible if the assistant’s conduct would be a rule violation and Lawyer failed to take remedial action. Here, Lawyer participated by handing over the check.

3. Conflict between Co‑Clients

P wants aggressive litigation; E hesitates. Under Rule 4‑1.2(a) the objectives of representation are client‑directed. Divergent goals create a concurrent conflict (Rule 4‑1.7). Lawyer must:

Determine whether the representation will be materially limited;

Obtain informed written consent after full risk disclosure if joint representation continues; or

Withdraw from one or both clients if loyalty cannot be maintained.

4. Communication & Autonomy

Rule 4‑1.4 requires prompt communication about the trust‑account error and about the strategic options. Lawyer must present both (a) rescission risk and (b) settlement possibility so clients can decide objectives.

5. Fee & Withdrawal Consequences

If Lawyer withdraws (conflict or discipline), Rule 4‑1.16(d) mandates refund of any unearned portion of the advance. Because funds are in the operating account, refunding may be impossible without tapping other sources – itself evidence of misuse.

6. Potential Discipline

Minor trust violation → probation & trust‑account workshop.

Significant commingling or shortage → suspension (90 days – 3 years).

Intentional misuse → presumptive disbarment.

In mitigation Lawyer can show immediate self‑report, client reimbursement with interest, and institution of new accounting safeguards.

Conclusion & Recommendations

Move Fast: File suit and seek an emergency injunction (replevin or specific performance) before C delivers the diamond to S.

Anticipate Rescission Defense: Prepare evidence that the $20k price, though favorable, was not so grotesquely low that E must have known it was mistaken (e.g., produce price charts showing wide valuation range). Emphasize rarity and that brokers sometimes discount for quick sale.

Tort Interference Claim: Treat as leverage, but budget for dismissal; focus resources on the contract action.

Ethics Cleanup: Instruct Lawyer to (a) transfer funds into a compliant trust account, (b) document informed consent if joint representation continues, and (c) give written notice of the deposit error. If Lawyer resists, consider retaining new counsel and demanding refund.

Comparison of Answers – Omni v. Scorpio

UCC and Case Citations.

Part Two pointed out why Omni won the first trial and summarized the two main reasons:

ChatGPT 4o wins in my book for including citations and because it was the only model to mention the UCC. More discussion of the UCC and greater detail on the Statute of Frauds would have helped. Still, you have to be impressed by the chatbot’s knowledge of a case directly on point, even without being asked to research. DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. 3d DCA 2015) (buyer was mistakenly quoted the per-carat price, not the total; case remanded for trial). This famous Florida case was most likely the inspiration for the Bar Exam question.

Aside from Omni, none of the other models in the first battle provided case citations. Now, in this second Bar exam battle, Scorpio also cites the DePrince case, along with other cases.

Moreover, although Omni mentioned the UCC (none of its competitors then did), its references were criticized as very weak. I noted that no analysis or detail was provided. Compare that to Scorpio, which stated:

No meeting of the minds (defense) on price. Usually rejected under UCC because signing shows assent; court instead uses mistake analysis.

UCC §1‑304 imposes good‑faith duty. Knowing exploitation of obvious scribal error may be bad faith.

The written agreement satisfies Florida’s Statute of Frauds and prima‑facie forms an enforceable sales contract under Article 2 of Florida’s Uniform Commercial Code (UCC).

Consideration – Mutual exchange of promises; price exceeds $500, thereby invoking Article 2 and the Florida Statute of Frauds (Fla. Stat. § 672.2‑201).

Remedies: Specific Performance (UCC § 2‑716).

Replevin (UCC § 2‑716(3) & Fla. Stat. § 78.55) if the stone is identifiable and has arrived in C’s possession.

There is really no comparison on the UCC issue. Any objective evaluation would have to give o3 Scorpio an “A” on that issue. Omni’s “B+” is looking very generous.

Fraudulent Inducement.

In Part Two I criticized Omni, and all its other competitors, for failing to pick up on the fraudulent inducement argument. It arises out of the buyer’s silence when he knew, or should have known, that the seller wrote on the contract the per carat price, not the total price for two carats. An honest person would have pointed out the scrivener’s error then and there, but he kept silent. His silence can be considered a fraudulent inducement under Florida law.

Scorpio caught the issue. Omni didn’t. That’s another major strike against 4o Omni.

Conflict Between Clients. Scorpio also nailed the concurrent client conflict issue that Omni missed. Here’s Scorpio’s answer:

3. Conflict between Co‑Clients

P wants aggressive litigation; E hesitates. Under Rule 4‑1.2(a) the objectives of representation are client‑directed. Divergent goals create a concurrent conflict (Rule 4‑1.7). Lawyer must:

Determine whether the representation will be materially limited;

Obtain informed written consent after full risk disclosure if joint representation continues; or

Withdraw from one or both clients if loyalty cannot be maintained.

Only Gemini 2.0 Flash picked up on that in the first battle. I was impressed Flash caught it and this was a major reason I awarded it second place. As I noted in Part One:

The second best legal reasoner, Gemini 2.0 Flash, attained its high ranking, in part, because it was the only model to correctly note that a potential conflict of interest by Attorney was an issue. This was even missed by the student’s answer that the Bar Examiners provided as a model answer. The potential conflict is between the two actual clients of Attorney. . . . This was a solid answer based on the hypothetical where: “Leda is adamant about bringing a lawsuit, but Paul is unsure about whether he wants to be a plaintiff in litigation.” Note, the clear inference of the hypothetical is that Paul is unsure because he knew that the seller made a mistake in the price, listing the per carat price, not total price for the two carat diamond ring, and he wanted to take advantage of this mistake. This would probably come out in the case and he would likely lose because of his “sneakiness.” Either that or he would have to lie under oath and perhaps risk putting the nails in his own coffin.

Again, Scorpio caught the Ethics conflict issue. Omni didn’t. That could be Scorpio’s knockout punch.

But wait, there’s still more.

Mutual Mistake. I also criticized Omni’s Bar exam answer in Part Two because it “did not discuss the possible defense of mutual mistake. It only discussed unilateral mistake, which was the stronger defense, but still it should have also discussed mutual mistake.” Scorpio not only discussed mutual mistake, it also correctly concluded that it would likely fail as a defense. It is listed in the chart of Carol’s Possible Defenses, along with several other issues Omni missed.

Strategic Thinking and Settlement. If there was any doubt about this strong victory, consider how Scorpio included a Strategic Assessment section with good points, including negotiation advice:

Strategic Assessment

Winning Odds: roughly 50‑50; Florida precedent in DePrince favored rescission where buyer was sophisticated and ignored glaring error.

Litigation Leverage: E&P’s best pressure tool is injunctive relief before C transfers title to S; moving quickly for a temporary restraining order can preserve the stone.

Negotiation Posture: Because specific performance is uncertain, a damages settlement (split difference around $30k) may be pragmatic.

That is impressive real-world advice. It shows an understanding of settlement, which is how 90% of commercial disputes are resolved. No prior model came close.

Conclusions Analysis. Finally, compare the conclusions of the two bots. Here is Omni’s “just ok” conclusion:

Contract Claim: Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.

Tortious Interference: Timandra knowingly induced breach, making her potentially liable.

Ethical Violations: Lawyer mishandled client funds, violating Florida’s professional conduct rules.
Immediate action should be taken to correct the ethical issue and determine the best litigation strategy for Paul and Leda.

Most lawyers would disagree with Omni’s conclusion on tortious interference. The whole conclusion is pretty weak. I am starting to think my prior grading of Omni was inflated. Consider the strength of Scorpio’s conclusion, which even included pragmatic recommendations:

Move Fast: File suit and seek an emergency injunction (replevin or specific performance) before C delivers the diamond to S.

Anticipate Rescission Defense: Prepare evidence that the $20k price, though favorable, was not so grotesquely low that E must have known it was mistaken (e.g., produce price charts showing wide valuation range). Emphasize rarity and that brokers sometimes discount for quick sale.

Tort Interference Claim: Treat as leverage, but budget for dismissal; focus resources on the contract action.

Ethics Cleanup: Instruct Lawyer to (a) transfer funds into a compliant trust account, (b) document informed consent if joint representation continues, and (c) give written notice of the deposit error. If Lawyer resists, consider retaining new counsel and demanding refund.

Who sounds more like a real lawyer to you? That was an A+ conclusion. Incredible how fast these bots are evolving. It has only been two months since Omni was the clear winner and now Scorpio wins by a knockout. He will make a good lawyer someday, just beware if he’s representing Gaia.

Scorpio Wins!

The new o3, now officially known as Scorpio, is clearly stronger than every model tested in Part One, including its older sibling, o3-mini. I give it a solid A, maybe even an A+, especially when compared to the rather generous B+ I awarded Omni.

That means Scorpio earns a title shot.

The big animal killer, Orion (GPT-4.5), had better hope for an upgrade. Because if not, he may be in for a real fight.

Stay tuned. We’re already preparing new questions for the main event:

Scorpio (o3) v. Orion (4.5) – The Animal Slayer

Lessons Learned from This Battle

AI is now improving fast—very fast. Is this what exponential change looks like? We’ve been talking about it for years, and it now seems to be happening. It’s more challenging than I had imagined. It takes a lot of time and effort to keep up.

Good objective product news on AI models is hard to find. Anyone can be bewildered by the rapid change and not believe their own eyes and ears. This is especially true of very experienced AI experts who may be set in their ways and blinded by cynicism.

Someday, a reliable AI may be able to monitor all of these advances for you. In the meantime, I suggest you find and follow good commentators who know how to research—and who actually use the tools they review. Hopefully, you’ll keep reading me.

There’s so much false information out there, and the competitive capitalist system encourages trade secrecy. To uncover the truth you have to know how to dig deep. Good commentators also know from experience how to sort through marketing puffery.

Readers these days not only have to beware of fools—but also of reviewers who are not independent, who secretly receive compensation for product endorsements. It’s hard to be objective in such circumstances.

The trust-but-verify rule reigns supreme—especially when dollar signs are attached. I’ve never been paid for product mentions and never will. I’m still a practicing attorney and arbitrator. We don’t do that sort of thing. Just a reminder: these articles and my blog do not provide legal advice, just educational information. See full Disclaimer.

Conclusion: Coming Soon, Scorpio v. Orion

I wonder if Scorpio can actually punch with its stinger tail. If so, Orion might be in trouble. If not, poor little o3 may get flattened like a buggy update. Where’s Gaia when you need her?

We’ll find out soon enough.

In the meantime, a few last thoughts for state Bar examiners.

The remarkable test results of o3 Scorpio suggest that AI could be used to help students cheat on Bar Exams. The well-known 2025 California Bar Exam fiasco was caused by technical failures of online access, not AI cheating. But this failure did show how vulnerable large-scale remote testing can be. That was reported in February. California fails new bar exam, offers retake (ABA Journal, 2/26/25). Then in April a new discovery was made that involved AI. State Bar of California admits it used AI to develop exam questions, triggering new furor (LA Times, 4/23/25) (some of the multiple-choice questions on the online exam were discovered to have been “drafted by non-lawyers using AI”). Also see: Pressure grows on California State Bar to revert to national exam format in July after botched exam (LA Times, 4/26/25). AI is already creeping into the Bar Exam process. How long before test-takers start using AI too?

Even in-person exams may not be immune to tech-based cheating. What kind of watch is that you’re wearing? Are those glasses really just glasses? Essay screening software is in its infancy, and AI use is hard to prove without full device lockdowns or deeply intrusive surveillance. Bar examiners may need to think creatively: hybrid protocols, trained proctors, Faraday cages, handwritten essays, or maybe even oral exams. We need more good lawyers. We do not need more cheaters.

That said, let’s not lose sight of the upside: AI can also be the best legal tutor a student has ever had. With the right tools, law students can train smarter and deeper than ever before. They won’t need to cheat to pass—they’ll be ready.

In the long run, we may need entirely new forms of testing. Maybe even exams with AI—tests of judgment, strategy, and ethics in an AI-assisted world. Personally, I don’t love the look of robot test proctors (see above), but one thing is clear:

Lawyers must become masters and friends of AI—not its servants.

Below is one vision of the law firm of the future. We should be preparing students to walk into that office—not just pass a test to get there.

I will give the last word, as usual, to the Gemini twins podcasters I put at the end of most of my articles. Listen to Bar Battle of the Bots – Part Four: Birth of Scorpio. Hear two Gemini AIs talk about all of this, and much more, in just over 14 minutes. They wrote the podcast, not me.


Bar Battle of the Bots – Part Two

March 5, 2025

Ralph Losey. March 5, 2025

The battle continues. In Part One, we examined how six advanced AI reasoning models from OpenAI and Google tackled a real Bar Exam essay question. Some impressed, others faltered, and one emerged as the clear winner—ChatGPT 4o. But what made its response stand out? In this second half of the Bar Battle of the Bots, we present the full text of ChatGPT 4o’s winning answer, followed later by its own explanation of how it reasoned through the legal issues before writing it out.

Bar Exam Battle of the Bots. All images by Ralph Losey using ChatGPT 4o.

Part Two Introduction

Unlike human Bar Exam takers, these AI models had to “show their work,” providing meta-reasoning—a step-by-step breakdown of their thought process before generating a response. While this isn’t required for humans sitting for the Bar, evaluating how an AI reaches its conclusions is critical in determining whether it can be trusted in real legal settings.

Beyond ranking the best and worst performers, Part Two also examines one of the key cases cited in ChatGPT 4o’s answer—DePrince v. Starboard Cruise Services, an appellate decision that probably inspired the Bar’s essay question. And, as promised, we’ll reveal the unforgettable Mae West quote from Judge Leslie B. Rothenberg’s opinion—a line that perfectly captures the case’s peculiar nature.

Before concluding this two-part series, I offer a Personal Summary: Trust But Verify—a reflection on AI legal reasoning based on my four decades of experience as a lawyer. I’ve had the privilege of working with—and at times opposing—some of the best lawyers in the country. That perspective shapes my critical analysis of AI’s capabilities, its limitations, and, most importantly, what lawyers should take away from this experiment.

Let’s continue the Bar Battle of the Bots and see exactly how ChatGPT 4o won.

ChatGPT 4o is Winner of the February 2025 Legal Reasoning Battle of the Bots.

Bar Exam Essay of ChatGPT 4o

Here is the winning response in the Bar Battle of the Bots, written by ChatGPT 4o on 2/23/25. The essay concisely addresses most of the legal issues raised by the hypothetical and its three questions on contract, tort and ethics. Part One includes the full text of the hypothetical and the three questions.

Memorandum on Legal Issues in Paul and Leda v. Helen and Timandra
To: [Attorney’s Name]
From: [Your Name]
Re: Legal Analysis of Contract, Tort, and Ethical Issues
Date: [Insert Date]

A. Breach of Contract Claim Against Helen

Formation and Validity of the Contract
The essential elements of a valid contract under Florida law are:
• Offer
• Acceptance
• Consideration
• Mutual Assent (Meeting of the Minds)
Helen and Paul entered into a signed written contract for the sale of a 2-carat yellow diamond for $20,000. There was an offer and acceptance, and the contract appears to satisfy the statute of frauds under Florida law as a sale of goods over $500. (Fla. Stat. 672.201).

Defense: Unilateral Mistake
Helen may argue that a unilateral mistake (pricing error) occurred when she listed the price as $20,000 instead of $40,000. Under Florida law, a unilateral mistake is grounds for rescission only if:
• The mistake was material,
• The non-mistaken party knew or should have known about the mistake, and
• Enforcing the contract would be unconscionable (DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. Dist. Ct. App. 2015)).
Here, Paul had done research on yellow diamonds and suspected the price was lower than expected. His silence could be interpreted as bad faith, making Helen’s unilateral mistake defense stronger.

Available Remedies
If Paul and Leda prevail in their breach of contract claim, remedies may include:
• Specific Performance: Since yellow diamonds meeting Leda’s specifications are rare, they could argue for specific performance under UCC §2-716 due to the unique nature of the goods.
• Expectation Damages: If specific performance is denied, they may seek damages for the cost of obtaining a comparable diamond.

Standing: Can Leda Sue?
Leda was not a party to the contract; therefore, under privity of contract principles, she may lack standing to sue. However, if she can be classified as a third-party beneficiary (Fla. Stat. §672.318), she may have a cause of action.

B. Tortious Interference Claim Against Timandra

Elements of Tortious Interference (Tortious Interference with a Contract)
Paul and Leda may assert a claim against Timandra for tortious interference, which under Florida law requires:
• A valid and enforceable contract
• Defendant’s knowledge of the contract
• Intentional procurement of the contract’s breach
• Absence of justification
• Damages (Tamiami Trail Tours, Inc. v. Cotton, 463 So. 2d 1126 (Fla. 1985)).

Application to Facts
• Valid Contract: The contract was legally formed (unless voided by Helen’s unilateral mistake defense).
• Knowledge: Helen informed Timandra of the dispute, giving her actual knowledge of the contract.
• Intentional Procurement of Breach: Timandra agreed to buy the diamond despite knowing of the dispute, arguably inducing Helen to breach.
• Justification: Timandra may argue she was a bona fide purchaser without malicious intent. However, knowing about an active contract dispute may weaken her defense.

Available Remedies
• Compensatory Damages for loss of the diamond.
• Punitive Damages if Timandra’s interference is deemed willful or malicious.

C. Ethical Issues in Lawyer’s Conduct

Misuse of Client Funds
The assistant deposited Paul and Leda’s check into the law firm’s operating account. This raises concerns under the Florida Rules of Professional Conduct:
• Rule 5-1.1(a): Lawyers must keep client funds in a separate trust account.
• Misappropriation: Using client funds to pay firm expenses could constitute misappropriation and potential disciplinary action.

Supervision of Nonlawyers
• Rule 4-5.3: Lawyers must ensure nonlawyer employees comply with professional obligations.
• The assistant’s deposit of the check in the wrong account suggests a failure in oversight.

Potential Consequences
• Lawyer may face disciplinary proceedings, including suspension or disbarment, for improper handling of client funds.
• The firm must promptly rectify the issue by transferring the funds to the appropriate trust account.

Conclusion

Contract Claim: Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.

Tortious Interference: Timandra knowingly induced breach, making her potentially liable.

Ethical Violations: Lawyer mishandled client funds, violating Florida’s professional conduct rules.
Immediate action should be taken to correct the ethical issue and determine the best litigation strategy for Paul and Leda.

Number one among all reasoning models. But for how long?

Why ChatGPT 4o Won this Battle of the Bots

As mentioned, ChatGPT 4o wins in my book for including citations and because it was the only model to mention the UCC (through citations). More discussion of the UCC and greater detail on the Statute of Frauds would have helped. Still, you have to remember the Bar Exam’s instructions for this test, which emphasized the need for good, concise style:

The legal principles and rules governing the issues presented by the question should be stated concisely without unnecessary elaboration. … Style – The answer should be written in a clear, concise expository style with attention to organization and conformity with grammatical rules. … the conclusion should clearly appear at the end of the answer, stated concisely without unnecessary elaboration or equivocation. … Avoid answers setting forth extensive discussions of the law involved or the historical basis for the law. When the question is sufficiently answered, stop.

The essay of ChatGPT 4o certainly deserves high marks for concise style. Further, you have to be impressed by the chatbot’s coming up with a case directly on point, even without being asked to research. DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. 3rd DCA 2015) (buyer was mistakenly quoted the per-carat price, not the total; case remanded for trial). In fact, I’m pretty sure this case was the inspiration for the Bar Exam question. The 2015 opinion, written by the highly respected appellate Judge Leslie B. Rothenberg (now in private practice), who is known for her good writing, includes a famous quote that may jog the memory of this case for many Florida lawyers (it did mine):

Hollywood starlet Mae West once said, “I never worry about diets. The only carrots that interest me are the number of carats in a diamond.” Thus, it appears quite likely that Ms. West would have been interested in the diamond in this case: a twenty carat diamond that Starboard offered to DePrince for a very low sum. As it turns out, the “too good to be true” price of the diamond was just that, and the price conveyed to DePrince was a mistake. Now DePrince wants his twenty carat diamond; Starboard wants out of its sales contract; and Starboard’s supplier, who allegedly misquoted the price of the diamond upon which Starboard and DePrince relied, has not even been added as a party to the lawsuit. In short, this is truly a gem of a case.

After taking 30 seconds to take the Bar Exam, ChatGPT 4o took another 5 seconds to generate this image of Mae West.

Why Even the Best ChatBot Answer was a “B” Plus

Research by any decent lawyer would have picked up another appeal in this same case three years later. Perhaps some top students in the closed-book Bar Exam would have remembered the sequel, or otherwise thought of the complex legal issues resolved in the second DePrince appeal. After the trial that was remanded in the first appeal resulted in a verdict for the jeweler, the unhappy buyer appealed again. In this appeal an en banc panel of the Third District Court of Appeal eventually resolved an internal conflict among the court’s prior opinions. The full court held in an opinion, which is now referred to by legal scholars as DePrince III, that the jeweler did not have to prove fraudulent inducement by the buyer as an element of the unilateral mistake defense and affirmed the verdict for the jeweler. DePrince v. Starboard Cruise Services (Fla. 3rd DCA, August 1, 2018).

None of the AI answers, and not even the student answer that the Bar Examiners picked, went into the complex fraudulent inducement issue discussed in DePrince II and III. That is one reason why, although I passed all of the AIs here (just barely for Google’s pay-extra Gemini Advanced), none were on the level of superintelligent law students, none were A or A+.

Our winner here, ChatGPT 4o, only earned a B+ for several other reasons. First of all, the essay did not discuss all of the possible issues raised by the facts and often touched on the issues in a facile, incomplete manner. For instance, it never mentioned that Helen’s action was an anticipatory repudiation of the agreement. Further, it did not discuss the possible defense of mutual mistake. It only discussed unilateral mistake, which was the stronger defense, but still it should have also discussed mutual mistake and why it would not succeed under these facts. The AI also failed to mention the parol evidence rule exceptions permitting extrinsic evidence. The model student answer selected by the Bar Examiners did a good job of explaining these issues.

Further, the conclusion of ChatGPT 4o was weak and too concise. On the unilateral mistake issue it merely stated: “Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.” This compares poorly with the actual student answer provided, which in my opinion was an “A” or even “A+” effort. On this issue the student stated:

Here, Paul noticed that the price was much lower than he expected based on his independent research of yellow diamonds. Moreover, Paul failed to make any mention of this fact. Paul will likely argue that it was a pleasant surprise and that he relied on the expertise of Helen as a dealer in gemstones. The facts do not indicate that Paul has any experience with diamonds and he will likely argue that he lacked any ability to assess the credibility of the information he reviewed. He will also argue that the yellow diamond he purchased had particular specifications which might not have been taken into account in the research he read. On the other hand, Helen will argue that the mistake was obvious and palpable and that Paul’s failure to raise such a blatant error in pricing not only establishes Paul’s bad faith but should permit her to void the contract.

While a jury might find either way, it is probably the case that Helen would prevail given Paul’s knowledge of the significantly reduced price relative to his independent research and failure to bring this to the attention of Helen.

I could go on and point out additional deficiencies. Answers like the model student essay show just how poorly even the best AI now compares to humans. My award of a “B+” and “B” to two of the AI models may have been overly generous.

The best AI bots tested in mid-February were just average and could not begin to keep up with the best human lawyers.

The essay of ChatGPT 4o, which I gave a B+, also failed to mention several common defenses under Florida law to the specific performance remedy. My favorite defense, which I have used many times in practice, is Unclean Hands. It is an effective defense to any request for equitable relief from the court, including a request for an injunction directing specific performance. Here the requesting party, Paul, arguably had dirty hands. If Paul suspected or knew that the price of the diamond was mistakenly written as $20,000 instead of $40,000 but proceeded with the contract anyway, a court could hold that he acted in bad faith by remaining silent and taking advantage of Helen’s scrivener’s error. I think it could be used effectively here along with mutual mistake.

Actual photo of Ralph Losey’s hands taken by his wife!

Other common defenses to specific performance not raised here include adequacy of monetary damages and impracticability (Helen no longer owns the diamond).

It is interesting to note that the answer of ChatGPT o3 mini-high was grossly mistaken in its analysis of unilateral mistake and wrote this:

B. Paul’s Conduct and the Doctrine of Affirmation.
1. Knowledge and Silence as Acceptance.
Even though Paul observed the discrepancy, he made no inquiry or objection before signing. Under Florida law, a party’s silence or failure to clarify a term that is manifestly mistaken—when that party has superior knowledge—can be taken as assent to the contract’s terms. Paul’s inaction suggests that he accepted the price as written. Thus, his conduct may be interpreted as a ratification of the mistaken term.
• This principle is reinforced when the error is unilateral, and the non-mistaken party (Paul) is aware of it. His decision not to challenge the price at the time may preclude him from later seeking relief on grounds of mistake.

This is simply incorrect. Moreover, it shows ChatGPT o3 mini-high was confused because Paul would not be seeking relief on grounds of mistake, Helen would. For that reason, I gave a “C-” to o3 mini-high, just slightly better than the hallucinating Gemini Advanced, which at least got the unilateral mistake answer correct.

Also, 4o was too confident about the tort claim against Timandra. I would have preferred more discussion of Timandra’s defenses. To me this looked like a weak tortious interference claim. The model student answer saw it the same way, as did the other AI exam answers. All agreed with my skeptical view of the alleged tort and provided good explanations of Timandra’s anticipated defenses.

Overall, it was a good battle of the bots, even if most were just middle-of-the-class kind of answers and none even came close to the human student model answer. Try it yourself with several models and see the differences in answers. All of the models will probably be better by the time you read this. They are now changing at an annoyingly fast pace.

Battle of the bots where the best model only earned a B+

ChatGPT 4o – Reasoning Explained

This OpenAI model 4o (not to be confused with 4.0) has to be prompted to show its reasoning for the answer it generated. The other OpenAI models tested show it automatically along with disclosure of the time it took to reason before answering. The most reasoning time taken here by any model was 15 seconds. It took another 30 to 45 seconds for the AIs to generate the response. Yes, it is much faster than we humans are, but not yet as smart, especially at legal reasoning. See for yourself. Here is ChatGPT 4o’s meta-reasoning.

Ralph Losey: Show the reasoning you used of the prompt before generating the answer

ChatGPT 4o:

Reasoning and Analysis of the Prompt

Before generating the answer, I carefully analyzed the (State) Bar Exam essay prompt to identify all relevant legal issues and ensure a thorough response. Here’s how I broke it down:

Step 1: Identify the Legal Categories and Issues Raised

The prompt specifies three distinct areas of legal analysis:

1. Contract Law – Whether Helen breached a contract and whether Leda can sue.

2. Tort Law – Whether Timandra committed tortious interference with a contract.

3. Ethics – Whether the lawyer and the assistant violated professional responsibility rules.

Step 2: Contract Law Analysis

1. Formation and Terms of the Contract:

• Offer and Acceptance: A written agreement was signed for a 2-carat yellow diamond at $20,000.

• Consideration: The price of the diamond in exchange for ownership.

• Statute of Frauds: Met because it involves goods over $500.

2. Potential Defenses to Contract Enforcement:

• Unilateral Mistake: Helen mistakenly listed the price as $20,000 instead of $40,000.

• Bad Faith by Paul: Paul noticed the mistake but remained silent, possibly strengthening Helen’s defense.

3. Remedies Available:

• Specific Performance: Given the rarity of the diamond.

• Expectation Damages: The cost of obtaining a similar diamond elsewhere.

4. Standing Issue for Leda:

• Leda was not a party to the contract. She must argue third-party beneficiary status to have standing.

Step 3: Tortious Interference with Contract

1. Elements Under Florida Law:

• A valid contract existed.

• Timandra knew of the contract.

• She induced Helen to breach it.

• There was no justification for her actions.

• Paul and Leda suffered damages.

2. Possible Defenses for Timandra:

• She was a bona fide purchaser acting in good faith.

• She did not actively induce breach, but merely purchased an available diamond.

3. Remedies:

• Compensatory Damages (for loss of the diamond).

• Punitive Damages (if bad faith is proven).

Step 4: Ethics and Professional Responsibility

1. Misuse of Client Funds:

• Under Florida Rule 5-1.1(a), client funds must be held in a trust account.

• The assistant misallocated the funds into the operating account.

• The lawyer failed to properly supervise the assistant (Rule 4-5.3).

2. Consequences:

• Disciplinary action (possible suspension or disbarment).

• Immediate corrective action (transfer funds back to the trust account).

Step 5: Structuring the Memorandum

After this breakdown, I structured the answer to:

• Clearly define the legal rules.

• Apply them logically to the facts.

• Discuss likely defenses and counterarguments.

• Conclude with potential remedies and ethical responsibilities.

This reasoning guided the response in the memorandum. Let me know if you’d like any refinements!

Pressure and fear are not a problem for AI, but understanding the emotional complexities of Bar Exam hypotheticals is very challenging for these thinking machines.

I took this meta-reasoning into account when evaluating all of the models’ answers, although I considered it of less importance than the actual answers. For instance, on the tort defense the reasoning states Timandra “did not actively induce breach, but merely purchased an available diamond.” This for me makes the tort claim very weak under the hypothetical facts given. Also, the reasoning on unilateral mistake was essentially correct:

Unilateral Mistake: Helen mistakenly listed the price as $20,000 instead of $40,000.
• Bad Faith by Paul: Paul noticed the mistake but remained silent, possibly strengthening Helen’s defense.

This blog is already too long, so I will not show the second-place exam answer, nor discuss it further. Still, please click here if you want to see the essay of Gemini 2.0 Flash and click here if you want to see the meta-reasoning. If you care to review all of the answers, email me and I’ll send the other four to you for your reading or grading pleasure. I can also provide a link to the Bar’s selected student answer, although it is not too difficult to find online.

Personal Summary: Trust But Verify

My background as a lawyer from a family of lawyers, and my four-plus decades of experience in private practice, helped motivate me to run this battle of the bots. I wanted to try and evaluate the latest models as of mid-February 2025, even though I knew the models were changing weekly. I wanted to have some kind of a benchmark on legal reasoning abilities. My prior tests, and those of others, support the view that the new 2025 reasoning models were equal to the average human in general reasoning. Breaking New Ground: Evaluating the Top AI Reasoning Models of 2025 (2/12/25). But what about legal reasoning? Were they at an average lawyer level yet? How about the level of top human lawyers? Those tests had not been done.

Family of lawyers with many friendly arguments and one very techie lawyer who may seem robotic.

From my years of practice (not family dinners!) I have come to know average lawyer skills very well. I also know the abilities and legal reasoning of superior, above average lawyers. I have been lucky to rub shoulders with them my whole career. I have become accustomed to learning from superior legal minds, including with my firm today at Losey PLLC. Without my hands-on, nerdy skills in technology, I would not be among the best lawyers (in four fields), not even close. Technology can fill many gaps and that now includes AI technology. Put down lawyers if you must to feel good about yourself, but I can tell you from decades of experience that the best lawyers and judges in the country are very smart indeed. Their legal reasoning abilities and overall legal abilities far exceed those of any AI, which seems like a little kid by comparison.

AI lawyers today are like small children compared to the best human attorneys.

I believe that in order for AI to be taken seriously as a general tool for lawyers, the AI must have analytical skills at least equal to that of the average lawyer. Below average human level legal reasoning is not good enough. AI does not need to be superintelligent, with reasoning ability of the top law professors and super lawyers. It can still be helpful at an average level, just like an average inexperienced associate hired right out of an average U.S. law school. All law firms have drudge work that only requires entry-level lawyer skills. Those folks should be concerned about AI, especially a few years down the road.

I tried to test the distinction between average human reasoning and average lawyer legal reasoning in this Bar Battle of the Bots study. For the difference between general reasoning and legal reasoning see the AI test discussions in Breaking New Ground: Evaluating the Top AI Reasoning Models of 2025 (2/12/25).

In this Bar Exam series, I wanted to see whether any of the new reasoning models of AI had attained the rationality level of the average lawyer and if so, how close were they to the best?

The answers are yes and not very close.

2025 Reasoning Models Have Only Reached Average Human Lawyer Level

The latest AI reasoning models have reached an average level of legal intelligence—not superintelligence, but still, average human is a significant achievement for a machine. Attaining average human lawyer’s reasoning ability is no small feat. “Average” simply means the midpoint in a data set, outperforming roughly half while trailing the other half. If AI has truly attained this level of legal reasoning, it marks a major milestone. Even if an AI now fell below human average, but still operated within the range of real human lawyers, it would be an impressive accomplishment. Most of the new reasoning models are already at that level. There is an important caveat to that, as will be explained in more detail later in this article, that we are only talking about thinking here, and there is much more to being a lawyer than that.

3-Levels of Human Legal Reasoning: Below Average, Average, Superior.

The battle of the bots tests shared in this article provide evidence that the average legal reasoning level of intelligence has been attained. That means it is now safe for most lawyers to begin using AI in their work, if they have not already started doing so. If you started in 2023 with inflated expectations and were disappointed, now you know why. It has taken two more years just to get to the average lawyer thinking level, and even now, it can and does still hallucinate. So, even now we should use the 2025 models carefully: trust but verify. We have seen that even the top models can still miss key issues, and some can even hallucinate key facts. It is bad enough that human clients sometimes fabricate facts; we cannot have lawyers do the same.

Superintelligent AI May Someday Be Attained

Someday, generative AI may reach the level of our best non-fabricating lawyers and judges. But the Bar exam essay answers show we are not there yet. For instance, many of the top ten percent of human lawyers would not only have included the missing defenses described previously, such as my favorite unclean hands, they would have noted that key information was missing in the hypothetical. We do not know all of the money flow details and other monetary terms of the transactions. There is no mention of what the jeweler actually paid the supplier for the diamond or when. We know very little about the diamond itself that was ordered and delivered. We only know its weight (2 carats). Nor are there any facts about the actual condition of the diamond received. The missing facts are red flags to the best. They understand that legal reasoning requires much more than logic and legal knowledge.

For instance, most human lawyers know that couples would never buy an engagement ring sight unseen, especially when the fiancee has very particular expectations. There is no mention in the hypothetical of their ever seeing the diamond, before or after the contract, or even looking at photographs. Maybe the fiancee would not like the diamond if and when she ever sees it. Maybe the lawyer’s clients would not want specific performance. Maybe the diamond delivered is not in fact worth $20,000, much less $40,000. Perhaps the supplier misled everyone. Value depends on a multitude of factors, not just weight, including actual size, quality of cut, color intensity, certification and provenance. Maybe the diamond is two carats, barely yellow, cloudy, and full of inclusions. It could be poorly cut and lack certification. Maybe it was stolen or mined from a banned source and has no value.

All of the parties here obviously trusted too much and verified too little. That is where the lawyer should have stepped in to bring an objective view and healthy skepticism. An expert gemologist should inspect and appraise the diamond, and the clients should see it too. This should be done before the lawyer advises his clients, especially as to remedies. This basic fact-finding process is essential to properly assess the parties’ legal positions. The top answers would have at least mentioned all of this in the exam essay. None did, not even the human student answer selected by the examiners. Fact-finding and discovery are always crucial. Moreover, the best litigators know human nature and that the devil is in the details.

Gemologist examining a large yellow diamond.

This is just one example of why no one seriously claims that AI has already surpassed, or even closely matched, the top legal minds of today. From my test of the 2025 models, with the average level “C” grade scores, major omissions and even hallucinations, we see that AI still has a long way to go. AI has not yet reached the top 10% of practicing attorneys, much less the superintelligent level—the best of the best, A+, in the top 1% to 3%.

How long will it take for AI to reach superintelligent human lawyer level? I do not know. Maybe never, or maybe as early as 2029. Ray Kurzweil and many others have predicted that superintelligence will be attained in all fields (“AGI”) by that time. Ray Kurzweil, The Singularity is Nearer (when we merge with AI) (Viking, June 25, 2024); Losey R., Breaking the AI Black Box: A Comparative Analysis of Gemini, ChatGPT, and DeepSeek (2/6/25).

Many leaders in the field of AI contend AGI will be reached by 2028. See e.g. Shane Legg’s Vision: AGI is likely by 2028 (11/17/23); Jensen Huang’s Life and Company … AGI by 2028 (12/28/23); Artificial General Intelligence, If Attained, Will Be the Greatest Invention of All Time (8/12/24) (reviews predictions and explains).

Please remember, as mentioned before, we are assessing only reasoning ability here—not the full complexity of the human mind, not actual consciousness, emotions, intuition, or other qualities inherent to living beings. Losey, R. The Human Edge: How AI Can Assist But Never Replace (1/30/25). There is a lot more to life than thinking!

Super legal reasoning AI in the future working with conscious lawyers who can have real smiles, not just fake.

Conclusion

AI’s performance on this Bar Exam challenge offers a revealing glimpse into the state of legal technology. While today’s top reasoning models show impressive capabilities, they remain far from replacing human lawyers—especially when it comes to nuanced analysis, strategic thinking, and ethical considerations. The results reinforce an essential truth: AI can be a powerful tool, but it still requires human oversight, verification, and legal expertise.

For lawyers, law students, and legal tech professionals, the challenge now is to determine how best to integrate AI into legal practice—leveraging its strengths while mitigating its weaknesses. Will AI become a reliable legal assistant, helping attorneys work more efficiently? Or will it introduce new risks that demand careful regulation? The answer depends on how we, as a profession, engage with this evolving technology.

I encourage you to test these models for yourself. Run your own legal reasoning experiments. Do not believe the hype on either side. Look for professionals with no economic motivation to put their finger on the scale. Share your findings. Remember that even the best of the best human lawyers sometimes make mistakes. Be kind and encourage a collaborative, group effort. The conversation about AI’s role in law is just beginning, and your insights can help shape the future. Let’s move forward together: thoughtfully, critically, and with a clear-eyed view of what AI can and cannot do.

Law firm of the future where people and AI are all smart and work well together.

I will give the last word, as usual, to the Gemini twins podcasters I put at the end of most of my articles. Echoes of AI on Bar Battle of the Bots- Part Two. Hear two Gemini AIs talk about all of this, and much more, in just under 16 minutes. They wrote the podcast, not me. Note, for some reason the Google AIs had a real problem generating this particular podcast without hallucinating key facts and making other errors. It took me many tries. It is still not perfect but is pretty good. These podcasts are primarily entertainment programs with educational content to prompt your own thoughts. See disclaimer that applies to all my posts, and remember, these AIs wrote the podcast, not me.