The New Stanford–Carnegie Study: Hybrid AI Teams Beat Fully Autonomous Agents by 68.7%

December 1, 2025

Ralph Losey, December 1, 2025 (25 minute read)

For years, technologists have promised that fully autonomous AI Agents were just around the corner, always one release away, always about to replace entire categories of work. Then Stanford and Carnegie Mellon opened the box and observed the Agents directly. Like Schrödinger’s cat, the dream of flawless autonomy did not survive the measurement.

An artistic representation of a robot emerging from an open box, with digital particles dispersing away from it, symbolizing the concept of AI and technology.
Observation reveals fragile AI Agents. All images in this article are by Ralph Losey using various AI tools.

What did survive was something far more practical: hybrid human–AI teaming, which outperformed autonomous Agents by a decisive 68.7%. If you care about accuracy, ethics, or your professional license, this is the part of the AI story you need to understand.

A digital graphic showing a bar chart representing 68.7% performance improvement, set against a blue background with circuit-like patterns.
Humans can work much better if augmented by AI Agents but the Agents alone fail fast.

1. Introduction to the New Study by Carnegie Mellon and Stanford

The Mellon/Stanford report is important to anyone trying to integrate AI into workflows. Wang, Shao, Shaikh, Fried, Neubig, Yang, How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations (arXiv, 11/06/25, v.2) (“Mellon/Stanford Study” or just “Study”).

Just to be clear about what we mean here by AI Agent, Wikipedia provides a generally accepted definition of an Agent as “an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge.”

So, you see, almost everyone thinks of AI Agents and autonomy as synonymous. The Study bursts that bubble. It shows that Agents today need a fair amount of human guidance to be effective and, without it, fail too often and too fast.

A split-image illustration contrasting the 'fantasy' of a futuristic, human-like robot on the left with the 'reality' of a more cartoonish robot struggling with an error on the right. The left side features a sleek, metallic robot, while the right side depicts a confused robot holding a document with an error message, emphasizing the challenges faced by AI.
This is the real world of AI Agents that we live in today.

The Study Introduction (citations omitted) begins this way:

AI agents are increasingly developed to perform tasks traditionally carried out by human workers as reflected in the growing competence of computer-use agents in work-related tasks such as software engineering and writing. Nonetheless, they still face challenges in many scenarios such as basic administrative or open-ended design tasks, sometimes creating a gap between expectations and reality in agent capabilities to perform real-world work.

To further improve agents’ utility at such tasks, we argue that it is necessary to look beyond their end-task outcome evaluation as measured in existing studies and investigate how agents currently perform human work — understanding their underlying workflows to gain deeper insights into their work process, especially how it aligns or diverges from human workers, to reveal the distinct strengths and limitations between them. Therefore, such an analysis should not benchmark agents in isolation, but rather be grounded in comparative studies of human and agent workflows.

A group of professionals and humanoid robots collaborating at a modern workspace, discussing data displayed on screens.
Studying AI and Human workflows to evaluate AI Agent performance.

2. More Detail on the Study: What the researchers did and found

Scope & setup. The Carnegie/Stanford team compared the work of 48 qualified human professionals with four AI agent frameworks. The software included stand-alone ChatGPT-based agents (version four series) and software code-writing agent platforms like OpenHands, also using ChatGPT version four series levels. These programs were “wraps”—software layers built on top of a third-party generative AI engine. A wrap adds specialized tools, interfaces, and guardrails while relying on the underlying model for generative AI capabilities. In the legal world, this is similar to how Westlaw and Lexis offer AI assistants powered by ChatGPT under the hood, but wrapped inside their own proprietary databases, interfaces, and safety systems.

The Study used 16 realistic tasks that required multiple coordinated steps, tools, and decisions, what the researchers call long-horizon tasks. Each takes multiple prompts and a series of steps, such as preparing a quarterly finance report, analyzing stock-prediction data, or designing a company landing page. The fully automated Agents tried to do almost everything by writing code, whereas the humans used multiple tools, including AI and tools that included AI. The humans’ approach was a kind of hybrid or augmented method, but one that did not attempt to closely incorporate the Agents into the workflow.

To observe how work was actually performed, the authors built what they called a workflow-induction toolkit. Think of it as a translation engine: it converts the raw interaction data of computer use (clicks, keystrokes, file navigation, tool usage) into readable, step-by-step workflows. The workflows reveal the underlying process, not just the final product. The 16 tasks are supposed to collectively represent 287 computer-using U.S. occupations and roughly 71.9% of the daily activities within them. For lawyers and others outside of these occupations the relevance comes from the overlap in task structure, not subject matter.

  • The engineering and design tasks don’t map directly to legal work but are useful for observing where agents tend to fail on open-ended or visually dependent steps. 
  • The structured writing tasks are similar to legal drafting (e.g., memos, policies, summaries), although it is important to note that the writing tasks in the Study were not persuasive or adversarial in orientation.
  • The data-analysis tasks parallel evidence evaluation, damages models, timeline building, and spreadsheet-based work that litigators do every day.
  • The administrative/computational tasks resemble the work of preparing exhibits, reconciling data, or generating chronologies.
Infographic contrasting structured tasks and human judgment in AI workflows, showcasing templates, definitions, and cross-references on one side, and tone, narrative, and emotive emphasis on the other.
Agents were fast but made too many mistakes to be useful in anything but very structured tasks. Human judgment rules.

3. Key Findings of the Study.

1. Human-led Hybrid Agent workflows are much more efficient and accurate than AI Agents working alone. When AI is integrated into existing human workflows (the Hybrid approach, aka Augmented approach) there is “minimal disruption and 24.3% efficiency improvement.” Id. As shown in Figure 7 of the Study: “Humans complete work with higher quality (a), while agents possess a huge advantage in efficiency (b). Teaming human and agent workers based on their respective advantages ensures task accuracy and improves efficiency by 68.7%.” Id. at pg. 14. This key finding is the essence of the hybrid methodology, which the Study refers to as AI augmentation as opposed to AI automation. The next section will unpack what that means for law, medicine, and other high-stakes work.

2. Humans outperform AI agents on quality by a wide margin. When Agents work by themselves, without humans, their success rates are 32.5%–49.5% lower than when humans work alone without Agents (but still using software that includes AI). In the fully automated approach, the AI will fail in a number of ways, including fabricating (hallucinating) plausible data and misusing tools (for example, switching to web search instead of opening the files you give it).

3. AI Agents fail primarily because they default to writing code. Agents “write programs to solve essentially all tasks, even when equipped with and trained for UI interactions. . . . This is in stark contrast to the visual-oriented human workflows.” Mellon/Stanford Study, Id. at pg. 2. This programmatic bias drives both their speed and their multiple failures. Agents today are only reliable when the task itself is programmable and code-friendly. Of course, the AI still makes mistakes with code too, so humans still need to step in to verify or debug.

4. When AI agents do not fail, they are extremely efficient. For tasks completed by both groups (solo Agents and solo humans with their AI tools), agents on average took 88.3% less time and 96.4% fewer actions, at a 90.4%–96.2% lower cost than human workers alone. Still, the 32.5%–49.5% lower success rate overall makes the fully automated solution, aka AI automation, appropriate only for code writing, and even there the AI still makes mistakes that require human intervention, mainly verification and debugging. As the Study explains:

Human workflows are substantially altered by AI automation, but not by AI augmentation (hybrid). One quarter of human activities we studied involve AI tools, with most used for augmentation purposes: integrating AI into existing workflows with minimal disruption, while improving efficiency by 24.3%. In contrast, AI automation markedly reshapes workflows and slows human work by 17.7%, largely due to additional time spent on verification and debugging.

Id. at pgs. 2, 11 figure 5.

An illustration showing a humanoid robot interacting with a man in glasses, highlighting key takeaways from the Stanford-Carnegie Study on hybrid AI performance versus autonomous agents.

4. Study Findings Support a Hybrid Workflow with Man and Machine Working Together

The Carnegie Mellon and Stanford research supports the AI work method I’ve used and advocated since 2012: hybrid multimodal, where humans and machines work together in multiple modes with strong human oversight. The Study found that meeting even minimal quality requirements takes close human–AI teamwork, which makes full AI autonomy impractical.

This finding is consistent with my tests over the years on best practices. If you want to dig deeper see e.g. From Prompters to Partners: The Rise of Agentic AI in Law and Professional Practice (agentic governance).

Unsupervised, autonomous AI is just too unreliable for meaningful work. The Study also found that it is too sneaky to use without close supervision. It will make up false data that looks good to try to cover its mistakes. Agents simply cannot be trusted. Anyone who wants to do serious work with Agents will need to keep a close eye on them. This article provides suggestions on how to do that.

A cartoon illustration of a mischievous robot with a sly grin, set against a dark, textured background.

Click here for YouTube animation of a sneaky robot. Watch your money!

5. Study Consistent with Jagged Frontier research of Harvard and others.

The jagged line of competence cannot be predicted and changes slightly with each new AI release. See the excellent Harvard Business School working paper by Fabrizio Dell’Acqua, Edward McFowland III, Ethan Mollick, et al., Navigating the Jagged Technological Frontier (September, 2023) and my papers, From Centaurs To Cyborgs: Our evolving relationship with generative AI; and Navigating the AI Frontier: Balancing Breakthroughs and Blind Spots.

The unpredictable unevenness of generative AI and its Agents is why “trust but verify” is not just a popular slogan; it is a safety rule.

An illustrated graphic featuring a stylized mountain range depicting a jagged frontier with sharp peaks and valleys, set against a cloudy sky.
With each new Release users find that AI competence is unpredictable.

6. Surprising Tasks Where Agents Still Struggle

You might expect AI agents to struggle on exotic, creative work. The Study shows something more mundane.

In addition to some simple math and word counts, AI Agents often tripped on:

  • Simple administrative and computer user interface (UI) steps. Navigating files, interpreting folder labels, or following naming conventions that a paralegal would understand at a glance.
  • Repetitive computational tasks that still require interpretation. For example, choosing which column or field to use when the instructions are slightly ambiguous.
  • Open-ended or visually grounded steps. Anywhere the task depends on “seeing” patterns in a chart or layout rather than following a crisp rule.

The pattern is consistent with other research: agents excel when a task can be turned into code, and they wobble along a jagged edge of competency when the task requires context, interpretation, or judgment.

That is why the 68.7% improvement in hybrid workflows is so important. The best results came when the human handled the ambiguous, judgment-heavy step and then let the agent run away with the programmable remainder.

Here is a good take-away memory aid:

An illustration showing a smiling man in a suit next to a humanoid robot. The robot appears to be processing information, symbolizing a hybrid approach to AI and human collaboration. Text on the image emphasizes that agents are fast and programmatic, while humans provide context and accountability.

7. What Agent “Failure” Looks Like

The Mellon/Stanford paper is especially useful because it does not just report scores. It shows how the AI agents went wrong.

When agents failed, the failures usually fell into two categories:

  • Fabrication. When an agent could not parse an image-based receipt or understand a field, it sometimes filled in “reasonable” numbers anyway. In other words, it invented or hallucinated data instead of admitting it was stuck. It is the Mata v. Avianca case all over again, making up case law when it could not find any. See Navigating AI’s Twin Perils: The Rise of the Risk-Mitigation Officer (e-Discovery Team, 7/28/25). That is classic hallucination, but now wrapped inside a workflow that looks productive.
  • Tool misuse. In some trials, agents abandoned the PDFs or files supplied by the user and went to fetch other materials from the web. For lawyers, that is a data-provenance nightmare. You think you are working from the client’s record. The agent quietly swaps in something else, often without any alert to the user. This suggests yet another challenge for AI Risk-Mitigation Officers, which I predict will soon be a hot new field for tech-savvy lawyers.

The authors of the Mellon/Stanford Study explicitly flag these behaviors. As will be discussed, the new version five series of ChatGPT and equivalent models, such as Gemini 3, may have lessened these risks, but the problem remains.

For legal practice and other high-stakes matters such as medical, the takeaway is simple: if you do not supervise the workflow and do not control the sources, you will not even know when you left the record, or what is real and what is fake. That may be fine for hairstyles but not for Law.

A humanoid robot with a metallic finish and intricate design stands beside a woman with an edgy hairstyle and makeup in a modern salon setting.
Hairstyle by a hallucinating AI. Is this hair real or fake?

8. Legal Ethics and Professionalism: Competence, Supervision, Confidentiality

Nothing in the Agent Study changes the fundamentals of legal ethics. It sharpens them.

  • Competence now includes understanding how AI works well enough to use and supervise it responsibly. ABA Model Rule 1.1.
  • Supervision means treating agents like junior lawyers or vendors: define their scope, demand logs, and review their work before it touches a client or court. Rules 5.1 and 5.3.
  • Confidentiality means knowing where your data goes, how it is stored, and which models or services can access it. Rule 1.6.

The same logic applies to medical ethics and professional standards in other regulated fields. In all of them, responsibility remains with the human professional.

As I argued in AI Can Improve Great Lawyers—But It Can’t Replace Them, the highest-value legal knowledge is contextual, emergent, and embodied. The same is true of the highest-value medical judgment. It cannot be bottled and automated. Agents are tools, not professionals with standing.

An illustration of a robot opening a glowing box, surrounded by abstract digital elements and stars, symbolizing the discovery of advanced technology.
Now that Agents have emerged and we’ve seen their abilities, we know they are just tools, and fragile ones at that.

9. Do Not Over-Generalize: What the Study does and does not cover

Before we map this into legal workflows, it is important to stay within the boundaries of the evidence.

The 16 occupational tasks that Stanford and Carnegie researched were all office-style tasks performed in structured, sandboxed environments.

The legal profession should treat the results as directly relevant only to:

  • Structured drafting,
  • Evidence and data analysis,
  • Spreadsheet and dashboard work,
  • Document-heavy desk work that has clear inputs and outputs.

The tasks studied do not directly answer questions about:

  • Final legal conclusions,
  • Persuasive writing to judges or juries,
  • Ethical decisions, strategy, or settlement judgment.

Those legal domains are within what I call the human edge. The Human Edge: How AI Can Assist But Never Replace.

An illustration labeled 'Study Scope' featuring icons of a document, a chart, and a table. Silhouettes of people in the background create a collaborative atmosphere.
Study only covered a few computer tasks performed by legal professionals and did not include any non-computer use tasks.

10. What the Findings Mean for Legal Workflows

The natural question for any lawyer is: So where does this help me, and where does it not? The answer lines up nicely with the task categories in the Study.

A. Structured drafting as legal building blocks

The writing tasks in the paper look a lot like the templated components of much legal writing:

  • Fact sections and chronologies,
  • Procedural histories,
  • Policy and compliance summaries,
  • Standardized client alerts and internal memos.

These are places where agents can:

  • Produce reasonable first drafts quickly,
  • Enforce consistency of structure and style,
  • Help with cross-references, definitions, and internal coherence.

Humans still need to control:

  • Tone, emphasis, and narrative arc,
  • Which facts matter for the client and the forum,
  • How much assertion or restraint is appropriate.

The right pattern is: let the agent assemble and polish the building blocks; you decide which building you are constructing.

I’ve also documented the power of AI-driven expert brainstorming across dozens of experiments over the past two years. For readers who want to explore that thread, I’ve compiled those Panel of Experts studies in one place called Brainstorming.

A robotic figure sitting at a desk with a laptop, displaying a glowing brain above its head, indicating advanced intelligence or insight in a high-tech environment.
AI is great at brainstorming creative solutions.

B. Evidence analytics as data analysis

The data-analysis type of work included in the Study maps cleanly to some litigation and investigation tasks:

  • Damages models and exposure estimates,
  • Budget and variance analyses,
  • Timeline and attendance compilations,
  • De-duplication and reconciliation of overlapping datasets,
  • Citation and reference tables.

Here the speed gains are real. Having an agent pull, group, and calculate from labeled inputs can save hours.

But the 37.5% error rate on data-analysis calculations is a red flag. Again, the multimodal method shows the way. For legal work, the rule of thumb should be:

Agents may calculate.

Humans must verify.

You can treat agent results like you would a junior associate’s complex spreadsheet: extremely useful, never unquestioned.

C. Legal research and persuasion are different animals

It is tempting to read “writing” and “analysis” and think this Study blesses full-blown AI Agent legal research and brief-writing. It does not.

An illustration depicting a lawyer holding a legal document and gavel, while facing a humanoid robot in a maze labelled 'Legal Research Frontier'. The image represents the intersection of technology and legal research.

The tasks in the paper do not measure:

  • Authority-based research quality,
  • Case-law synthesis under jurisdictional constraints,
  • Persuasive legal writing aimed at a specific judge or tribunal.

Those domains depend heavily on:

  • Judgment,
  • Ethics and candor,
  • Audience calibration,
  • Deep understanding of rules and standards.

That is the territory I have called the human edge in earlier writings. AI can assist along the jagged line, but it cannot replace the lawyer’s role.

A robot sits at the base of a mountainous landscape, working on a computer, while a human figure stands triumphantly at the summit, holding a staff beside a sign that reads 'HUMAN EDGE' under a sunrise.
Humans have an edge over AI in everything except rational thinking and knowledge.

11. Hybrid Centaurs, Cyborgs, and the 68.7% Result

For two and a half years, since I first heard the concepts and language used by Wharton Professor Ethan Mollick (From Centaurs To Cyborgs), I have used the Centaur → Cyborg metaphor and grid as a simple way to write about hybrid AI use:

  • Centaur. Clear division of labor. The human does one task; the AI does a related but distinct task. Strategy and judgment remain fully human. The AI does scoped work such as writing code, outline and first draft generation, summarizing, or checking. Some foolish users of this method fail to verify the AI (horsey) part.
  • Cyborg. Tighter back-and-forth. Human and AI work in smaller alternating steps. The lawyer starts; the AI refines; the lawyer revises; the AI restructures. Tasks are intertwined rather than separated. Supervision is inherent to the process. The Study suggests this is the best way to perform Agentic tasks.
A futuristic illustration of a humanoid figure with robotic features, standing on a rocky pathway, holding a lantern, and gazing into a starry landscape filled with floating geometric shapes and glowing cracks.
Centaur+Cyborg is a good way to navigate the jagged edge and use AI Agents.

The Cyborg type of Hybrid workflow is good for AI Agents because:

  • Augmentation inside human workflows (Centaur-like use) speeds people up by 24.3%.
  • End-to-end full automation slows people down by 17.7% because of the review burden.
  • Step-level teaming, where the human handles the non-programmable judgment steps and the agent handles the rest in a close, intermingled process improves performance by 68.7% with quality intact. That is Hybrid, Cyborg-style work done correctly.
An abstract illustration representing 'Hybrid Practice', featuring a stylized spiral staircase with layered elements depicting human figures, documents, and circuit patterns against a dark background.
Humans and AI working closely together step by step.

12. Best-Practice Argument: Hybrid, Multimodal Use Should Be the Standard of Care—Especially in Law and Medicine

For more than a decade, my position has been consistent: the safest and most effective way to use AI in any high-stakes domain is hybrid and multimodal. That means:

  • Multiple AI capabilities working together (language, code, retrieval, vision),
  • Combined with traditional analytic tools (databases, spreadsheets, review platforms),
  • All orchestrated by humans who remain responsible for judgment, ethics, and outcomes.
A conductor guides a group of humanoid robots, with swirling blue energy above, creating an atmosphere of hybrid collaboration between humans and technology.
Humans conduct an orchestra of AI instruments.

I first developed this view in e-discovery using active machine learning, but it maps cleanly to agentic AI systems and now extends well beyond law. The Carnegie/Stanford Study provides the empirical foundation: hybrid, supervised workflows outperform fully autonomous ones in speed and quality.

The evidence and professional obligations point in the same direction: hybrid, multimodal AI use, under strong human oversight, is not a temporary workaround. It is the durable, long-term standard of care for law, medicine, and any profession where judgment and accountability matter.

AI has no emotions or intuition—only clever wordplay.

Illustration contrasting human intuition represented by a heart and machine computation depicted as a circuit board within a round shape.
Get the dualities to work together and you have Hybrid Augmentation Supremacy.

13. Risk and Governance: A Quick Checklist for Lawyers, Legal Ops, and Other High-Stakes Teams

The Carnegie/Stanford Study gives us concrete failure modes. Risk management should respond to those, not hypotheticals. Here is a short “trust but verify” checklist designed for law but conceptually adaptable to medicine and other high-stakes fields.

A. Provenance or it is not used.

Require page, line, or document IDs for every fact an agent surfaces. If there is no source anchor, the output does not get used. If speculation must be included, you should label it as such. In clinical settings the analogue is clear: no untraceable data, images, or derived metrics.
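
As a rough illustration of the provenance rule, here is a minimal Python sketch of a gate that sits between an agent and the work product. Everything in it is an assumption made for illustration: the `AgentFact` shape, the field names, and the `[SPECULATION]` label are hypothetical and are not taken from the Study or from any particular product. It also assumes one reading of the rule, namely that unanchored material may pass only when it is explicitly flagged and labeled as speculation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AgentFact:
    """One factual assertion returned by an agent (hypothetical shape)."""
    text: str
    document_id: Optional[str] = None  # e.g. a Bates number or file ID
    page: Optional[int] = None
    speculative: bool = False          # True if offered as a guess


def apply_provenance_gate(facts: List[AgentFact]) -> Tuple[List[str], List[str]]:
    """Split agent facts into accepted and rejected lists.

    Anchored facts are accepted with their citation appended.
    Unanchored facts pass only if flagged as speculation, and are labeled.
    Everything else is rejected and never reaches the work product.
    """
    accepted, rejected = [], []
    for fact in facts:
        if fact.document_id and fact.page is not None:
            accepted.append(f"{fact.text} [{fact.document_id}, p. {fact.page}]")
        elif fact.speculative:
            accepted.append(f"[SPECULATION] {fact.text}")
        else:
            rejected.append(fact.text)
    return accepted, rejected
```

The rejected list is as important as the accepted one. It is the record a supervising lawyer would review to see what the agent asserted without support.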

B. No blind web pivots.

Agents that “helpfully” fetch other files when they cannot parse your materials must be constrained. In law, that means they stay within the client record or approved data repositories. In medicine, the agent must not silently mix in external data that is not part of the patient’s chart.
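
One way to picture this constraint is a check that every file reference an agent requests resolves inside the approved record folder. The sketch below is an assumption-laden illustration, not a description of how any agent platform works: the function name `is_inside_record` and the example folder are hypothetical, and a real deployment would enforce the boundary at the platform or network level as well.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_inside_record(requested: str, record_root: str) -> bool:
    """Return True only if `requested` is a local path inside `record_root`."""
    # Any reference with a network scheme is a web pivot and is refused.
    if urlparse(requested).scheme in ("http", "https", "ftp"):
        return False
    root = Path(record_root).resolve()
    # Resolving collapses "../" tricks; an absolute path replaces the root.
    target = (root / requested).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True
```

For example, with a hypothetical record folder of `/tmp/client_record`, a request for `exhibits/receipt_01.pdf` passes, while `../other_client/file.pdf` or any `https://` address is refused.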

C. Fabrication drills.

Regularly feed the system bad PDFs or deliberately ambiguous instructions, then watch for made-up numbers or invented content. Document what you catch and fix prompts, policies, and configuration. Health systems can do the same with flawed test inputs and simulated charts.
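
A fabrication drill can be partly scripted. The sketch below shows one simple heuristic, offered only as an assumption: flag any number in the agent's output that never appears in the source material it was given. The function name is hypothetical, and the check is deliberately crude. It will miss invented names or narrative, and it will flag legitimately computed figures, so it supplements human review rather than replacing it.

```python
import re

# Matches integers and figures such as 482.17 or 1,250.00.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source_text: str, agent_output: str) -> list:
    """List numbers in the agent's output that do not appear in the source."""
    source_numbers = set(NUMBER.findall(source_text))
    return [n for n in NUMBER.findall(agent_output) if n not in source_numbers]
```

In a drill, the source would be a deliberately unreadable or incomplete document. Any figure the function returns is a candidate fabrication for a human to examine and document.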

D. Mark human-only steps.

Identify steps that are inherently non-programmable, such as visual judgments, privilege calls, contextual inferences, settlement strategy, or ethical decisions. In medicine, the parallels are differential diagnosis, treatment choice, risk discussion, and consent. These remain human steps. An AI should never deliver a fatal diagnosis.
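
Marking human-only steps can be made explicit in whatever system assigns the work. The following sketch assumes a hypothetical allowlist of programmable step names; the names are invented for illustration. The design choice worth noting is the default: any step not expressly listed as programmable goes to a human, so a new or unrecognized step is never delegated by accident.

```python
# Hypothetical allowlist: only steps named here may be delegated to an agent.
PROGRAMMABLE_STEPS = {
    "deduplicate_records",
    "sum_columns",
    "build_citation_table",
}


def route_step(step: str) -> str:
    """Send a step to the agent only if it is expressly allowlisted."""
    return "agent" if step in PROGRAMMABLE_STEPS else "human"


def plan_workflow(steps):
    """Pair each step with the worker responsible for it."""
    return [(step, route_step(step)) for step in steps]
```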

An illustration depicting a split brain design: one half showcases structured tasks represented in blue circuitry, while the other half features words like 'Judgement,' 'Advocacy,' and 'Ethics' in glowing orange against a dark backdrop. A humanoid robot and a business professional are interacting with a digital interface at the center.
Combine the unique skills of each kind of intelligence and know when to step from one to another.

E. Math checks are mandatory.

A 37.5% error rate in data-analysis tasks is more than enough to require independent human verification. Use template calculations, cross-checks, and a second set of human eyes any time numbers affect a client or patient outcome.
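
A template calculation can be as simple as recomputing the agent's figure from the source line items and comparing the two. The sketch below is a minimal example under stated assumptions: the function name, the one-cent tolerance, and the sample figures are all hypothetical. It uses Python's `decimal` module so that currency amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal


def verify_total(line_items, agent_total, tolerance="0.01"):
    """Recompute a sum from source line items and compare it with the agent's figure.

    Amounts are passed as strings so they convert to Decimal exactly.
    """
    recomputed = sum((Decimal(x) for x in line_items), Decimal("0"))
    difference = abs(recomputed - Decimal(agent_total))
    return {
        "recomputed": recomputed,
        "agent_total": Decimal(agent_total),
        "matches": difference <= Decimal(tolerance),
    }
```

A mismatch does not say who is wrong. It tells the human reviewer where to look first, and a match is still only one cross-check, not a substitute for a second set of eyes.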

F. Logging and replay.

Turn on action logs for every delegation: files touched, tools invoked, transformations run. If the platform cannot log, it is not appropriate for high-stakes legal or clinical work.
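
To make the idea concrete, here is a small sketch of an action log built as a Python decorator around each tool an agent may call. It is illustrative only: `logged_action`, `ACTION_LOG`, and the `read_file` tool are hypothetical names, the log is kept in memory, and a real system would write to tamper-resistant storage. Commercial agent platforms expose logging in their own ways, which this does not attempt to reproduce.

```python
import functools
import time

ACTION_LOG = []  # in memory for illustration; a real system would persist this


def logged_action(tool_name):
    """Record every call to a tool: arguments, outcome, and timing."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            entry = {
                "tool": tool_name,
                "args": [repr(a) for a in args],
                "kwargs": {k: repr(v) for k, v in kwargs.items()},
                "started": time.time(),
            }
            try:
                result = func(*args, **kwargs)
                entry["status"] = "ok"
                return result
            except Exception as exc:
                entry["status"] = f"error: {exc}"
                raise
            finally:
                # Runs on success and failure, so no call goes unrecorded.
                entry["finished"] = time.time()
                ACTION_LOG.append(entry)
        return wrapper
    return decorator


@logged_action("read_file")
def read_file(path):
    """Hypothetical tool standing in for the agent's file access."""
    return f"contents of {path}"
```

Each entry holds only plain strings and numbers, so the log can be exported as JSON and replayed or audited later.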

G. Disclosure and confidentiality.

Disclose AI use when rules, regulations, or reasonable expectations require it. Keep agents confined to narrow, internal repositories when handling client or patient data. Treat them at least as carefully as you would any other third-party system with sensitive information.

H. Bottom line:

Fabrication and tool misuse are not hypothetical. The Study observed and measured them. You should assume they will occur and design your governance accordingly.

A colorful artistic painting depicting a seated elderly man with a mechanical head of circuitry, conversing with a robot in a similar style, seated in an orange armchair against a vivid backdrop.
The tendency of AI to make things up, to hallucinate, is lessening as the models improve, but it is still a real threat, and so is one of its causes, sycophantism.

14. Counter-Arguments and Rebuttals

You may hear pushback against the hybrid method from some technologists who argue for full automation. After all, that is how Wikipedia defines an Agent: as fully autonomous. That has always been the dream of many in the AI community. You will also hear the opposite criticism, frequently from legal colleagues, who resist the use of AI, at least in any meaningful way. The Study frustrates both camps, automation maximalists and AI-averse traditionalists alike, because its empirical findings support neither worldview as currently argued.

A. “AI is just a passing fad.”

The anti-AI argument is also strong and based on powerful fears. Still, the legal profession must not allow itself a Luddite nap. Those of us who use AI safely every day are working hard to address those concerns. See, for example, the law review article I wrote this year with my friend, Judge Ralph Artigliere (retired), who did most of the heavy lifting: The Future Is Now: Why Trial Lawyers and Judges Should Embrace Generative AI Now and How to Do it Safely and Productively (American Journal of Trial Advocacy, Vol. 48.2, Spring 2025).

B. “Full autonomy is imminent; hybrids are a temporary crutch.”

Autonomy is improving, but the current evidence contradicts claims of imminent AGI, much less super-intelligence. Instead, it shows:

  • programmatic bias,
  • low success rates, and
  • failure modes that directly implicate ethics, confidentiality, and safety.

That is why the authors of the Carnegie/Stanford paper recommend designs inspired by human workflows and step-level teaming, not unsupervised handoff. In fields like law and medicine, where standards of care and liability apply, hybrid is not a crutch, it is the design pattern.

Soon, the cyborg connection and control tools that humans use to work with AI will be design patterns too. Stylish new types of tattoos and jewelry may become popular as we evolve beyond the decades-old smartphone obsession. See e.g. the $6.5 billion sale to OpenAI of the AI device startup co-founded by Jony Ive, the designer famous for the iPhone at Apple.

A portrait of a woman with short hair, wearing a black cap and glasses. Her skin features glowing blue circuit-like patterns. She is dressed in a black shirt and has a futuristic device around her neck.
Next generation computer links will emerge as we evolve beyond smart phones. Early forms of smart glasses and pendants are already available. I predict electric tattoos and hats will come next.

Plus, there are many things more important than thinking and speech, things that AI can never do. AI is a super-intellectual encyclopedia, but ultimately, heartless. This truth drives many of the fears people have about AI, but those fears are not well founded. See, The Human Edge: How AI Can Assist But Never Replace, and AI Can Improve Great Lawyers—But It Can’t Replace Them.

C. “Hybrid slows teams down.”

The data in the Study shows:

  • augmentation inside human workflows, the hybrid team method, speeds people up by 24.3%;
  • attempted end-to-end automation slows people down by 17.7% because the verification and debugging of AI mistakes reduce the gains.

Hybrid done correctly is faster and safer than human-only practice. Autonomous AI is fast, and often clever, but its tendencies to err and fabricate make it too risky to let loose in the wild.

D. “Quality control can be automated away.”

Not for high-stakes work. The 37.5% data-analysis error rate and the fabrication examples are exactly the kind of failures automation does not see. Quality is judgment in context: applying rules to facts, weighing risk, and making trade-offs with human beings in mind. That is lawyer and medical work. While I agree some quality control work can be automated, especially by applying metrics, not all can be. The universe is too complex, the variables too many. We will always need humans in the loop, although their work to ensure excellence will constantly change.

E. “Agents already beat humans across the board.”

Where both succeed, agents are usually faster and cheaper. That is good news. But their success rates are still 32.5% to 49.5% lower. In law or medicine, a fast wrong answer is not a bargain, it is a liability. It could be a wrongful death. Hybrid workflows let you capture some of the speed and savings while keeping human-level or better quality.

A futuristic scene depicting a human operator interacting with a holographic AI assistant in a high-tech control room, surrounded by digital displays of information and data.
The Study shows you have to keep a qualified human at the helm of Hybrid teams.

15. The New Working Rules
H-Y-B-R-I-D

These rules apply in law, medicine, and any other field that cannot afford unreviewed error. [Side Note: AI came up with this clever mnemonic, not me, but it knows I like this sort of thing.]

H Human in charge. Strategy, conclusions, and sign-off stay human.
Y Yield programmable steps to agents. Let agents handle tasks they can do well.
B Boundaries and bans. Define no-go areas: final legal opinions, privilege calls, etc.
R Review with provenance. If there is no source or traceable input, the output is not used.
I Instrument and iterate. Turn on logs, run regular fabrication drills, and update checklists.
D Disclose and document. Inform and document efforts when AI is used in a significant manner.
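
For readers who build or configure agent workflows, the "R" rule can be pictured as a simple gate. This is only a minimal sketch: the `AgentOutput` record and the `split_by_provenance` function are hypothetical names invented for illustration, not part of the Study or of any product.

```python
from dataclasses import dataclass, field


@dataclass
class AgentOutput:
    """Hypothetical record of one agent result awaiting human review."""
    text: str
    # Citations, documents, or input files the text relies on (assumed field).
    sources: list = field(default_factory=list)


def split_by_provenance(outputs):
    """Return (reviewable, rejected); outputs with no traceable source are rejected."""
    reviewable = [o for o in outputs if o.sources]
    rejected = [o for o in outputs if not o.sources]
    return reviewable, rejected


drafts = [
    AgentOutput("Summary of Exhibit A", sources=["exhibit_a.pdf"]),
    AgentOutput("Unsupported damages figure"),
]
reviewable, rejected = split_by_provenance(drafts)
```

Passing the gate only means an output is eligible for human review. Under the "H" rule, conclusions and sign-off still stay with the human.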

The word 'HYBRID' illustrated in a bold, colorful, and stylized font.

16. Does the November 2025 Study’s Use of Last Month’s Models Already Make It Obsolete?

After the Study was completed new models of AI were released that purport to improve on the accuracy and reduce the hallucinations of AI Agents. These are not empty claims. I am seeing this in my daily hands-on use of the latest AI. Still, I also see that every improvement seems to create new, typically more refined issues.

The advances in AI models do not change the structural lessons:

  • Agents still prefer programmatic paths over messy reality.
  • Step-level teaming still beats blind delegation, especially in risk sensitive occupations.
  • Logging, provenance, and supervision remain non-negotiable wherever high standards of care apply.

Hybrid is not a temporary workaround while we wait for some imagined fully autonomous professional AI. It is the durable operating model for AI in work, especially in legal, medical, and other fields where judgment and accountability matter. The AI can augment and improve your work.

A man in a suit with digital circuitry patterns on his face and arm speaks in a courtroom setting while holding a tablet, with a humanoid robot behind him and a judge in the background.

Conclusion: Keep Humans in Command And Start Practicing Hybrid Now

The Carnegie/Stanford evidence confirms what those of us working hands-on with AI already know: Agents are astonishingly fast, relentlessly programmatic, and sometimes surprisingly brittle. Humans, on the other hand, bring judgment, spirit, context, and accountability, but not speed. When you combine those strengths intentionally—working in a close back-and-forth rhythm—you get the best of both worlds: speed with quality and real human awareness. That is the advanced cyborg style of hybrid practice.

And no, it is not the fully autonomous Agent that nerds and sci-fi optimists like me once dreamed about. But it is the world that researchers observed when they opened the box. Thank you, Stanford and Carnegie Mellon, for collapsing yet another Schrödinger cat.

An illustration depicting a futuristic robot on the left, looking confident, alongside a smaller, sad robot on the right, facing a computer screen with code and a question mark, symbolizing the challenges of AI in understanding complex tasks.
Observations burst another SciFi fantasy bubble about AI Agents.

Hybrid multimodal practice is not a temporary bridge. It is what agency actually looks like today. It is the durable operating model for law, medicine, engineering, finance, and every other field where errors matter and consequences are real. The Study shows that when humans handle the contextual, ambiguous, and judgment-heavy steps—and agents handle the programmable remainder—overall performance improves by 68.7% with quality intact. That is not a footnote. That is a strategy.

So the message for lawyers, clinicians, and every high-stakes professional is straightforward:

Use the machine. Supervise the machine. Do not become the machine.

Two individuals smiling at the camera, wearing futuristic attire and caps, with intricate geometric tattoos adorning their necks, set against a high-tech background.
These future humans are in control of their fashionable new AI devices. You don’t want to know what is under their hats!

Here is your short action plan—the first steps toward responsible AI practice:

  • Adopt the H-Y-B-R-I-D system across your team. It operationalizes the Study’s lessons and bakes verification into daily habits.
  • Instrument your agents. If a tool cannot log its actions, replay its steps, or anchor its facts, it does not belong in high-stakes work.
  • Shift to cyborg-style hybrid teaming, where humans handle judgment calls and agents handle the programmable portions of drafting, evidence analysis, spreadsheet work, and data tasks.
  • Train everyone on trust-but-verify behaviors, not as a slogan but as the muscle memory of modern practice.
A businessman in a suit holds a shield labeled 'VERIFY' to protect himself from two robotic figures that appear menacing, with glowing red eyes and error messages floating around them in a dark, dramatic setting.

Those who embrace hybrid intelligently will see their output improve, their risk decline, and their judgment sharpen. Those who avoid it—or try to leap straight to full autonomy—will struggle.

The future of professional practice is not human versus machine.

It is human judgment amplified by machine speed, with the human still holding the pen, signing the orders, and deciding what matters.

And that is exactly what the Study revealed when it opened the box on modern AI: not flawless autonomy, but the measurable advantage of humans and agents working together, each taking the steps they handle best.

Hybrid is here. Hybrid works. Now it’s time to practice it.

A diverse group of professionals stands confidently in a modern office environment, with two humanoid robots in the background. They are dressed in business attire and display a mix of expressions, indicating collaboration between humans and AI.

Echoes of AI Podcast

Click here to listen to two AIs talk about this article in a lively podcast format. Written by Google’s NotebookLM (not Losey). Losey conceived, produced, directed and verified this 14-minute podcast. By the way, Losey found the AIs made a couple of small errors, but not enough to require a redo. See if you can spot the one glaring, but small, mistake. Hint: it had to do with the talk about wraps.

Illustration of two anonymous AI podcasters discussing the findings of the Stanford-Carnegie study on hybrid AI teams, featuring titles and graphics related to AI performance.
Click to start podcast.

Ralph Losey Copyright 2025 — All Rights Reserved


From Ships to Silicon: Personhood and Evidence in the Age of AI

October 6, 2025

Ralph Losey, October 6, 2025.

The law has long adapted to include new participants. First, ships could be sued as if they were people. Later, corporations became legal entities, and more recently even rivers have been declared “persons” with rights. Now we move from ships to silicon: artificial intelligence. A new era of generative AI models can produce words, images, and decisions that resemble the marks of inner awareness. Whether that resemblance is illusion or something more, judges and lawyers will soon confront it not only in legal philosophy and AI seminars, but in motions practice and evidentiary hearings.

A courtroom scene featuring a holographic representation of a human figure between two arguing lawyers, while a judge observes from behind a bench.
After argument of counsel the Arbitrator permitted the AI to testify subject to post-trial motions to strike. All images by Ralph Losey using AI.

The right question is not whether AI is truly conscious, but whether its testimony can be tested with the same evidentiary rigor we apply to human witnesses and corporate entities. Can its words be authenticated, cross-examined, and fairly weighed in the balance of justice?

Courts today are only beginning to brush against AI — sanctioning lawyers for fake citations, issuing standing orders on disclosure of AI use, and bracing for the wave of deepfake video and image evidence. The next frontier will be AI outputs that resemble testimony, raising questions of authentication and admissibility. If those outputs enter the record, courts may need to consider supporting materials such as system logs or diagnostics — not yet common in litigation but already discussed in the scholarship as possible foundations for reliability.

This article follows that path. It begins with the history of legal personhood, then turns to the rules of evidence, and finally examines the personhood and consciousness debate. Along the way, it offers a few practical tools that judges and legal-techs can start using to handle AI in the courtroom. The aim is modest but urgent: to help the law take its first steady steps from ships to silicon, from abstract algorithms to evidence that demands to be weighed.


A holographic representation of a human-like figure testifying in a courtroom, with a judge observing and lawyers seated at a table using laptops.
AI witness testifying on direct exam. Opposing counsel wonders how the AI will do on her cross-exam.

When the AI is Allowed to Speak

Picture a deposition in complex commercial litigation. Counsel asks the sworn AI witness the most routine of questions: “Can you identify this document which has been marked for the record as Exhibit A?” Without hesitation, the system responds: “Yes, I can. It is part of my cognitive loop.” On its face the response sounds absurd. Machines are not conscious beings, are they? Yet the behaviors behind such a technical statement — goal-directed reasoning, persistent memory, and self-referential diagnostics — are already present in advanced AI systems.

The central risk is not that machines suddenly wake up with human-like awareness. It is that courts, lawyers, judges, arbitrators, and juries will be confronted with outputs that look like intentional human statements. When a human witness identifies an exhibit, counsel ask how and probe the witness’s memory, perception, and possible bias. When an AI says “this document is part of my cognitive loop,” a new type of cross-examination is needed: What loop are you referring to? How is that a part of you? Who are you? Are you not just a tool of a human? Shouldn’t the human you work with be testifying instead of you?

Those questions go to the heart of the credibility problem. Cross-examination works because a human witness can be pressed on perception, memory, or bias. When the witness is an AI, there is no memory in the human sense, no sensory perception of the world, and no personal motive to expose. The answers to “What loop? How is it part of you? Who are you?” may have to come not from the witness itself, but from logs, audit trails, and technical experts who can lay a proper foundation for AI testimony. Counsel on both sides will need to be creative, asking new kinds of questions. How does one prepare an AI witness for cross-examination like this? What objections should be raised? How should a judge respond? At first, there will inevitably be trial and error, appeals, and rehearings. The old boxes just don’t fit anymore.

A lawyer questions a digital, holographic AI representation in a courtroom setting, while a judge and others observe.
AI’s speech is becoming emotive and apologetic on cross-examination as a hostile witness.

Legal Personhood: From Ships to Rivers to Citizens United

Law has long been pragmatic in its treatment of nonhuman actors as legal persons. See e.g. Wikipedia:

In law, a legal person is any person or legal entity that can do the things a human person is usually able to do in law – such as enter into contracts, sue and be sued, own property, and so on.

In Roman law, collegia (guilds or associations) functioned as legal entities capable of owning property, contracting, and suing or being sued. During the Medieval Age the common law of admiralty started treating ships as juridical res, subject to in rem suits, even though no one believed the ships were alive. See The Siren, 74 U.S. 152 (1868). The Siren concerned a famous iron-hulled side-wheel steamship named Siren, which the US Navy finally captured in Charleston Harbor in 1865. It was a private trading ship that had run past the Union blockade 33 times, more than any other in history. During capture the Siren’s crew abandoned ship and Union sailors claimed it as a prize of war. The Union sailor crew-owners later accidentally ran into and sank another ship in New York, and that led to the Siren being sued in rem for damages caused by its tort.

A historical painting of the steamship 'Siren' sailing on the ocean, showcasing its paddlewheel, masts, and smoke emission.
The Siren, a famous Civil War blockade runner and later US Supreme Court opinion. Fake AI image by Ralph Losey.

In the United States, the expansion of corporate personhood began in the late 19th century. See Santa Clara County v. Southern Pacific Railroad, 118 U.S. 394 (1886), where, via a mere reporter’s headnote, corporations were cast as “persons” under the Fourteenth Amendment.

More recently, juridical recognition has extended beyond human institutions to natural entities: New Zealand’s Whanganui River was declared a legal person under the Te Awa Tupua Act 2017; Spain’s Law 19/2022 conferred legal status upon the Mar Menor lagoon, supposedly affirmed by Spain’s Constitutional Court in 2024; and Ecuador’s 2008 constitutional reforms enshrined rights of nature, allowing ecosystems standing in constitutional litigation.

In American constitutional doctrine, the controversial Citizens United v. FEC decision (558 U.S. 310 (2010)) further illustrates the elevated legal status of corporations. It held that corporate expenditures in elections are protected speech under the First Amendment. See e.g., The Brennan Center’s Citizens United Explained (provides a detailed critical account of both the decision’s legal reasoning and its broader democratic consequences). Also see: Asaf Raz, Taking Personhood Seriously (Columbia Business Law Review, Vol. 2023 No. 2, March 6, 2024).

These examples show that legal personhood has never been limited to human beings. No one thought ships could think, or rivers could speak, or corporations had beating hearts. Yet all have been treated as persons when it served broader purposes of justice, commerce, or environmental protection. Legal personhood is, at bottom, a policy tool — a fiction the law deploys when the benefits outweigh the costs. If the law has extended personhood in these ways, it is not too much of a stretch to ask whether AI could be next. That debate is already underway.

A comic-style illustration featuring elements related to legal personhood, including an old ship with a 'Court Seizure' flag, a modern skyscraper labeled 'Incorporated,' a river with a 'Legal Person' sign, and an abstract digital representation of a human face, symbolizing the evolution of legal recognition from tangible entities to artificial intelligence.
For better or for worse, the Law has always evolved with the times.

The Debate Over AI Personhood

Legal scholars, ethicists, and policymakers are deeply divided on this issue, and the arguments on both sides are instructive for anyone imagining what might happen when an AI “takes the witness chair.”

Arguments for AI personhood. Proponents point to precedent. Legal personhood has never been limited to natural persons. Corporations, associations, municipalities, and even natural entities like rivers have been granted legal standing. If a corporation — a legal fiction with no body or mind — can be a person, then it is not unthinkable that a sufficiently advanced AI might one day be treated similarly. Advocates argue that doing so could help fill accountability gaps when AI systems act autonomously in ways not directly traceable to programmers, operators, or owners. Others look ahead to the possibility of artificial general intelligence (AGI) with traits akin to self-awareness. If AI were to achieve something approaching subjective awareness or moral reasoning, then denying rights could be seen as ethically exploitative.

The judicial perspective. An especially thoughtful treatment comes from former SDNY District Judge Katherine B. Forrest in The Ethics and Challenges of Legal Personhood for AI, Yale Law Journal Forum (April 2024). Forrest examines AI’s increasing cognitive abilities and the challenges they will pose for courts, raising concerns about model drift, emergent capabilities, and ultra vires defenses. Her analysis grounds the personhood debate not in philosophy but in the daily realities of judging.

She predicts that while early AI cases will involve “relatively straightforward” questions of tort liability and intellectual property, the deeper ethical dilemmas will not be far behind. As she puts it:

Courts will be dealing with a number of complicated AI questions within the next several years. The first ones will, I predict, be interesting but relatively straightforward: tort issues dealing with accountability and intellectual property issues relating to who made the tool, with what, and whether they have obligations to compensate others for the generated value. If an AI tool associated with a company commits a crime (for instance, engaging in unlawful market manipulation), we have dealt with that before by holding a corporation responsible. But if the AI tool has strayed far from its origins and taken steps that no one wanted, predicted, or condoned, can the same accountability rules apply? These are hard questions with which we will have to grapple.

Forrest then pushes further, highlighting the inevitable collision between doctrine and ethics:

The ethical questions will be by far the hardest for judges. Unlike legislators to whom abstract issues will be posed, judges will be faced with factual records in which actual harm is alleged to be occurring at that moment, or imminently. There will be a day when a judge is asked to declare that some form of AI has rights. The petitioners will argue that the AI exhibits awareness and sentience at or beyond the level of many or all humans, that the AI can experience harm and have an awareness of cruelty. Respondents will argue that personhood is reserved for persons, and AI is not a person. Petitioners will point to corporations as paper fictions that today have more rights than any AI, and point out the changing, mutable notion of personhood. Respondents will point to efficiencies and economics as the basis for corporate laws that enable fictive personhood and point to similarities in humankind and a line of evolution in thought that while at times entirely in the wrong, are at least applied to humans. Petitioners will then point to animals that receive certain basic rights to be free from types of cruelty. The judge will have to decide.

Forrest’s conclusion underscores the urgency of the debate: these issues will not remain theoretical for long. Courts will face them in live cases, on real records, with harms alleged in the here and now.

Her article also offers a striking observation about Dobbs v. Jackson Women’s Health Org., 597 U.S. 215, 276 (2022) noting that it left decisions as to when personhood attaches to the states. By doing so, it opened the door to highly variable juridical interpretations of personhood. As Forrest notes, the decision eliminated any requirement of human developmental, cognitive, or situational awareness as a prerequisite for bestowing significant rights, while at the same time diminishing the self-determination — and therefore liberty — of women. That framework, she suggests, could ironically be repurposed as a basis for extending rights to a human creation: AI. If the law does not demand awareness as a condition of personhood, why exclude machines?

A futuristic robotic figure sitting at a desk, holding a pen, next to a gavel, with a background featuring a digital scale of justice and an AI symbol.
If it looks like a duck, swims like a duck, and quacks like a duck, then it is probably a duck.

Arguments against AI personhood. Forrest discusses both sides of the AI personhood debate. Critics of AI personhood argue that it lacks the qualities that justify recognition as a legal person. Unlike humans, AI systems have no consciousness, no perception, and no subjective experiences. They process data but do not feel. Treating a machine as a legal person, they warn, could blur the line between humans and tools in ways that erode human dignity. Others worry about liability arbitrage, with corporations offloading blame onto AI “shells” that have no assets and no capacity to make victims whole.  That divide is already echoed in the academic literature. See Abeba Birhane, et al., “Debunking Robot Rights Metaphysically, Ethically, and Legally” (2024).

Alternative approaches. Because both extremes raise serious problems, lawmakers and scholars have considered middle-ground options. The European Parliament once floated the idea of “electronic personhood” for robots but ultimately rejected it. The EU AI Act, adopted in 2024, takes a different path: treating certain AI systems as regulated entities subject to logging, oversight, and human accountability, while stopping short of personhood. Other proposals focus on enhancing corporate liability for harms caused by AI or creating a new, limited legal category that acknowledges AI’s unique features without elevating it to full personhood. As Asaf Raz has observed in Taking Personhood Seriously (Columbia Business Law Review, March 2024), legal personhood has always been instrumental, “a policy tool rather than a metaphysical judgment,” and the question is how best to deploy that tool in light of modern challenges.

The Citizens United shadow. In the United States, debates over AI personhood unfold in the long shadow of Citizens United v. FEC, 558 U.S. 310 (2010). By extending First Amendment protections to corporate political spending, the Supreme Court illustrated how powerful the fiction of corporate personhood can become once entrenched. The Brennan Center’s “Citizens United Explained” (2019) offers a detailed critique of that ruling and its consequences for democracy. For many, it stands as a cautionary tale: once nonhuman entities gain even limited rights, those rights may expand in ways courts never intended.


Where courts stand today. For now, these debates remain in the academic and policy realm. No judge has yet been asked to declare an AI system a legal person. What courts do face, however, are more immediate evidentiary challenges: AI-generated outputs, filings drafted with the help of large language models, and the specter of deepfakes masquerading as authentic evidence. Whether or not AI is ever granted personhood, judges must already decide how to handle these new kinds of artifacts under the familiar rules of evidence.

A humanoid robot in a courtroom setting, wearing a suit, appears confused while holding a stack of papers and scratching its head.
Sure acts like a person, an eccentric, sometimes genius sometimes forgetful, but always well-spoken.

From Philosophy to Procedure: Evidence First

We have traced the history of legal personhood and surveyed the personhood debate. But speculation only goes so far. Courts today are beginning to face a more immediate question: when AI outputs appear in discovery or trial, can they be admitted as evidence? From the fake citations in Mata v. Avianca to standing orders warning lawyers not to submit unverified AI text, judges are already being forced to draw early lines. To keep cases on track, they need tools that are practical, conservative, and rooted in existing evidentiary doctrine.

Here are three such tools for judges, litigators, and legal technologists to consider and refine:

  • ALAP: AI Log Authentication Protocol
  • Replication Hearing Protocol
  • Judicial Findings Template for AI Evidence

Introduction. These are small steps, not sweeping reforms. They echo the serious issues introduced by Judge Paul Grimm and Professors Maura Grossman and Gordon Cormack in Artificial Intelligence as Evidence, 19 Nw. J. Tech. & Intell. Prop. 9 (2021). That article, though written before generative AI emerged, remains indispensable.

As Grimm, Grossman, and Cormack put it:

The problem that the AI was developed to resolve — and the output it produces — must ‘fit’ with what is at issue in the litigation. How was the AI developed, and by whom? Was the validity and reliability of the AI sufficiently tested? Is the manner in which the AI operates ‘explainable’ so that it can be understood by counsel, the court, and the jury? What is the risk of harm if AI evidence of uncertain trustworthiness is admitted? (Id. at 97–105).

They stress two core concepts: validity (whether the system does what it was designed to do) and reliability (whether it produces consistent results in similar circumstances). Those concepts have guided courts for years in assessing scientific and expert evidence. They should also guide us here.

For more recent thinking by Grimm and Grossman, see, e.g., The GPTJUDGE: Justice in a Generative AI World, Duke Law & Technology Review (Oct. 2023); Judicial Approaches to Acknowledged and Unacknowledged AI-Generated Evidence (May 2025), which addresses deepfakes and recommends using expert testimony to ground admissibility rulings. Also see, Losey, R., WARNING: The Evidence Committee Will Not Change the Rules to Help Protect Against Deep Fake Video Evidence (e-Discovery Team, Dec. 2024).

A futuristic portrait of a woman with robotic features, showcasing a blend of human and artificial intelligence elements, set against a modern, technological backdrop.
Picture of Ralph’s friend, Professor Maura Grossman, real or fake?

Tool 1: ALAP — AI Log Authentication Protocol

Purpose & Rationale. ALAP (AI Log Authentication Protocol) is designed to meet the authentication requirement of Federal Rule of Evidence 901(b)(9), which permits authentication of evidence produced by “a process or system” if the proponent shows that the process produces “an accurate result.”

Checklist. Under ALAP, the producing party should provide:

  • Model and version identification;
  • Configuration record (data sources, parameters, safety settings);
  • Prompt and tool call logs;
  • Guardrail or filter events;
  • Execution environment (hardware/software state);
  • Custodian declaration tying the output to this configuration.
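
For legal technologists, the checklist can be captured as a structured record with a completeness check. The following is a minimal sketch only: the `AlapRecord` class, its field names, and the `missing_items` helper are hypothetical illustrations of the six checklist items, not a standard schema or an official form.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class AlapRecord:
    """One hypothetical ALAP submission; field names mirror the checklist above."""
    model_and_version: str = ""
    configuration: dict = field(default_factory=dict)          # data sources, parameters, safety settings
    prompt_and_tool_logs: list = field(default_factory=list)
    guardrail_events: Optional[list] = None                    # None = not reported; [] = none occurred
    execution_environment: dict = field(default_factory=dict)  # hardware/software state
    custodian_declaration: str = ""


def missing_items(record):
    """List the checklist items the producing party has not yet supplied."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if f.name == "guardrail_events":
            # An empty list is a valid report that no guardrail fired.
            if value is None:
                missing.append(f.name)
        elif not value:
            missing.append(f.name)
    return missing


record = AlapRecord(
    model_and_version="example-model v1.0",   # placeholder value
    configuration={"temperature": 0.2},       # placeholder value
    prompt_and_tool_logs=["prompt 1"],
    guardrail_events=[],
)
gaps = missing_items(record)
```

In this example `gaps` names the two items still outstanding, the execution environment and the custodian declaration. The design choice worth noting is the treatment of guardrail events: reporting that none occurred is different from not reporting at all.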

Support & Authority.


Tool 2: Replication Hearing Protocol

Purpose & Rationale. When a human testifies, cross-examination probes perception, memory, and bias. AI has none of those faculties, but it does have vulnerabilities: instability, sensitivity to prompts, and embedded bias in training data. A replication hearing provides a substitute.

The goal is not to achieve exact duplication of output — which may be impossible with evolving, probabilistic models — but to test whether the system is substantially similar in its answers when asked the same or variant questions. In this sense, replication hearings align with the reliability gatekeeping function under Daubert and Kumho Tire. See Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 589 (1993); Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). They also align with the Evidence Rule governing expert testimony, where “perfection is not required.” Fed. R. Evid. 702, Advisory Committee Note to 2023 Amendment (last two sentences of the 2023 Comment).

For example, I prompted ChatGPT-4o as a legacy model on September 28, 2025 as follows: “Provide a one sentence description of artificial intelligence.” It responded by generating the following text: “Artificial intelligence is the field of computer science dedicated to creating systems capable of performing tasks that typically require human intelligence, such as reasoning, learning, perception, and decision-making.”

I provided the same prompt one minute later to the current model, ChatGPT-5, and received this response: “Artificial intelligence is the branch of computer science that designs systems capable of performing tasks that typically require human intelligence, such as reasoning, learning, problem-solving, and language understanding.”

GPT-5 is supposed to be smarter, and its answer reflects that, a little, but is, to me at least, substantially similar to the response of the prior model, GPT-4o. One says AI is a “field” of computer science, the other a “branch.” One says “reasoning, learning, perception, and decision-making” the other “reasoning, learning, problem-solving, and language understanding.”

An illustration depicting a courtroom scene with a humanoid robot sitting as a witness, flanked by a female lawyer and a judge, with observers in the background.
You say potato I say potahto. Let’s call the whole thing off.

Protocol. At its core, a replication hearing should:

  • Lock the environment as closely as possible. The producing party must document the version of the system, its configuration, and parameters in place at the time of the original output. If that version is no longer available, the proponent must show why and explain what changes have occurred since.
  • Re-run the prompts in a controlled setting. The same queries should be submitted, alongside small variations, to test whether answers remain consistent in meaning. You could do repeat runs to circumvent the changing models issue as part of your tests, just as I did above.
  • Log everything. Inputs, outputs, timestamps, and environment details should be captured to permit later review. Be prepared to produce the logs, so do not include private attorney comments in them, such as “Oh no, this will kill our case if we disclose it.”
  • Compare for stability of meaning. The measure is not identical phrasing, but whether the AI provides answers that are effectively the same — the substance is consistent even if the wording differs.
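
The log-and-compare steps can be sketched in a few lines, using the two ChatGPT responses quoted earlier in this section as test data. This is an illustration under stated assumptions, not a validated method: the `log_run` and `overlap` helpers and the `THRESHOLD` value are hypothetical, and simple word overlap is only a crude screening proxy. Stability of meaning, the actual measure, still calls for human or expert review.

```python
import re
from datetime import datetime, timezone


def words(text):
    """Lowercased set of alphabetic words; a deliberately crude stand-in for meaning."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(a, b):
    """Jaccard word overlap of two answers: 0.0 (disjoint) to 1.0 (same word set)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def log_run(model, prompt, output):
    """Capture one re-run with a UTC timestamp so it can be reviewed later."""
    return {"model": model, "prompt": prompt, "output": output,
            "timestamp": datetime.now(timezone.utc).isoformat()}


PROMPT = "Provide a one sentence description of artificial intelligence."
runs = [
    log_run("GPT-4o", PROMPT,
            "Artificial intelligence is the field of computer science dedicated to "
            "creating systems capable of performing tasks that typically require human "
            "intelligence, such as reasoning, learning, perception, and decision-making."),
    log_run("GPT-5", PROMPT,
            "Artificial intelligence is the branch of computer science that designs "
            "systems capable of performing tasks that typically require human "
            "intelligence, such as reasoning, learning, problem-solving, and language "
            "understanding."),
]

score = overlap(runs[0]["output"], runs[1]["output"])
THRESHOLD = 0.5  # hypothetical cut-off; a court or expert would have to justify any value
flag_for_review = score < THRESHOLD
```

A low score would not prove instability, and a high score would not prove reliability. The number is only a way to decide which pairs of answers deserve a closer human look first.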

Limitations & Judicial Discretion. Replication hearings are not a silver bullet. Models change, versions drift, and nondeterminism ensures some variation. They should be treated as a stress test, not an absolute guarantee. Consistent results support reliability; unraveling under modest variation reveals weakness. Judges should demand enough stability for adversarial testing and fair weight — but not perfection.

Support & Authority.

  • Fed. R. Evid. 702; Advisory Committee Note to 2023 Amendment:
    • “Nothing in the amendment imposes any new, specific procedures. Rather, the amendment is simply intended to clarify that Rule 104(a)’s requirement applies to expert opinions under Rule 702. Similarly, nothing in the amendment requires the court to nitpick an expert’s opinion in order to reach a perfect expression of what the basis and methodology can support. The Rule 104(a) standard does not require perfection. On the other hand, it does not permit the expert to make claims that are unsupported by the expert’s basis and methodology.”
    • Fed. R. Evid. 104, Preliminary Questions: “(a) In General. The court must decide any preliminary question about whether a witness is qualified, a privilege exists, or evidence is admissible. In so deciding, the court is not bound by evidence rules, except those on privilege.”
  • Grimm, Grossman & Cormack, Artificial Intelligence as Evidence, 19 Nw. J. Tech. & Intell. Prop. 9, 46 (2021).
  • Grimm & Grossman, Judicial Approaches to Acknowledged and Unacknowledged AI-Generated Evidence (May 2025) at pgs 152 and 153:
    • Finally, the court should set a deadline for an evidentiary hearing and/or argument on the admissibility of acknowledged AI-generated or potentially deepfake evidence sufficiently far in advance of trial to be able to carefully evaluate the evidence and challenges and to make a pretrial ruling. These issues are simply too complex and time consuming to attempt to address on the eve of or during trial.
    • Expert disclosures should be detailed and not conclusory and must address the evidentiary issues that judges have to consider when ruling on evidentiary challenges, such as the Rule 702 reliability factors and the Daubert factors that we have previously discussed.

An illustration depicting a courtroom scene with a gavel, a motion document, and a verification report, highlighting the process of legal verification.
Verified template report.

Tool 3: Judicial Findings Template for AI Evidence

Purpose & Rationale. Judges must leave a clear record showing how they handled AI evidence. Federal Rule of Civil Procedure 52(a) already requires findings of fact in bench trials. Extending that practice to AI evidence rulings will give appellate courts a meaningful basis for review.

Template Elements. A model order admitting or excluding AI evidence should, at minimum, address:

  1. Authentication Measures. Whether the proponent satisfied ALAP requirements — identification of the model/version, logs, custodian declaration, and reproducibility artifacts.
  2. Replication and Stability Findings. Whether the AI produced the same or substantially similar outputs under controlled re-runs; if not, why not.
  3. Bias and Sensitivity Testing. Whether adversarial prompts or variant inputs were tested, if reasonably possible and warranted under proportionality standards (Fed. R. Civ. P. 26(b)(1)).
  4. Protective Measures Applied. Any confidentiality safeguards imposed, including redactions, attorneys’-eyes-only restrictions, or non-waiver stipulations.
  5. Reliability Determination. The court’s conclusion: admit, admit with limits, or exclude — and the reasoning for that conclusion.
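For readers who think in checklists, the five template elements can be modeled as a simple record with a completeness check. This is a hypothetical sketch, not a form any court uses: the class name, field names, and the three decision labels are assumptions drawn from the list above.

```python
from dataclasses import dataclass, fields

# Decision labels taken from element 5 of the template (assumed wording).
DECISIONS = ("admit", "admit with limits", "exclude")


@dataclass
class AIEvidenceFindings:
    """One hypothetical record per AI evidence ruling."""
    authentication: str = ""       # element 1: model/version, logs, custodian, artifacts
    replication: str = ""          # element 2: results of controlled re-runs
    bias_testing: str = ""         # element 3: adversarial or variant inputs, if warranted
    protective_measures: str = ""  # element 4: redactions, AEO limits, non-waiver terms
    decision: str = ""             # element 5: one of DECISIONS
    reasoning: str = ""            # element 5: why the court reached that decision

    def missing_elements(self) -> list[str]:
        """Return the names of elements still blank, plus any invalid decision."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if self.decision.strip() and self.decision not in DECISIONS:
            missing.append("decision (must be one of: " + ", ".join(DECISIONS) + ")")
        return missing
```

A draft order could be checked with `AIEvidenceFindings(...).missing_elements()` before it is signed; an empty list means every element has at least been addressed, not that the findings are sound.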

Support & Authority.

  • Fed. R. Civ. P. 52(a)(1); General Elec. Co. v. Joiner, 522 U.S. 136, 146 (1997) (emphasizing the abuse-of-discretion standard for evidentiary rulings but requiring a record of reasoning).
  • Grimm & Grossman, Judicial Approaches at pg. 154 suggest information helpful for a court to rule includes evidence on validity, reliability, error rates, bias, and in the special cases of AI fraud allegations, “the most likely source of evidence, what the content or metadata suggests about provenance or manipulation, and the probative value of the evidence versus the prejudice that could occur were the evidence to be admitted.”
A cheerful lawyer enthusiastically typing on a computer in a well-furnished office filled with law books and a smiling judge in the background.
Many fakes are obvious and don’t require expensive experts.

Speculation on Future AI Evidence Tools

So far, we have stayed close to the ground, offering simple tools that courts could adopt tomorrow morning without rewriting the Rules. But technology does not stay still. In two to four years — perhaps sooner — we will see generative AI systems like GPT-6 or GPT-7 deployed in ways that make today’s questions about “outputs” seem quaint. These systems may not only generate records but actually appear in court to give live testimony, answering questions in real time. They may prove to be very good at cross-exam — and finally stop apologizing. What happens to our starter tools in that future world?

Let us consider each in turn.

Tool 1: ALAP in the Age of GPT-7: From Logs to Consciousness Diaries

Today’s ALAP demands logs, prompts, and configurations. In the GPT-6/7 era, those logs may look more like consciousness diaries: running records of what the system “attended to,” what internal states it represented, and why it chose one answer over another. Already, researchers are experimenting with far greater clarity of process, with “chain of thought logging” and “explainable AI” systems that preserve a trace of the model’s reasoning. Dario Amodei Warns of the Danger of Black Box AI that No One Understands (e-Discovery Team, May 19, 2025) (discusses Amodei’s AI MRI proposal, voluntary transparency rules and export‑control “breathing room”). Future ALAP may require not just the external inputs and outputs, but the internal rationale artifacts, what path the AI followed inside its trillion-parameter brain.

A digital display showcasing an AI-generated MRI image of a humanoid figure with a glowing heart, highlighting anatomical details.
MRI of this AI shows it has a good heart.

Imagine a courtroom where the proponent of Exhibit A does not simply submit logs, but a time-stamped trace of the AI’s deliberations, a transcript of a digital mind. It will likely be very impressive in its complexity. A trillion-transformer transcript is beyond what a single human could fully comprehend, much less create. Yet it will be produced, it will be disclosed and attacked by opposing counsel and their own AI. They will look for holes and errors, as they should. If the proponent of Exhibit A has done their job correctly and tested the AI generation fully before production, the opposition will find no errors of significance. Exhibit A will then be authenticated and admitted as accurate and reliable.

The legal arguments will then focus on the real disputes: the significance of Exhibit A, and how the AI-generated evidence applies to the facts and issues of the case. The weight of that evidence, and the ultimate outcome, will remain, as they should, in human hands: judge, arbitrator, and jury.

Tool 2: Replication Hearings: From Sandbox Runs to AI Depositions

Replication today means re-running queries in a sandbox to test stability. In the GPT-6/7 era, it may look more like a deposition of the AI itself. Counsel could pose variations of the same question live, in a controlled setting, to see whether the system answers consistently or unravels. Dozens of rephrasings, edge cases, and adversarial prompts could probe whether the AI’s testimony holds up under pressure.

Think of it as Daubert meets the Turing Test: is the AI stable enough under questioning to count as reliable testimony, or does it contradict itself like a nervous witness? Judges may even order recorded mock trial runs of AI testimony as the new form of replication hearing — “stress tests” that simulate cross-examination before the real thing.

Tool 3: Judicial Findings Templates: From Written Orders to Dynamic Bench Reports

Today, findings templates are static orders: a few pages where a judge checks boxes on authentication and admissibility. In the GPT-6/7 era, they may evolve into dynamic bench reports. A judge would not just note that an AI output was authenticated and replicated, but attach the full supporting record: the AI’s self-examination logs, replication deposition transcripts, error analyses, and even explainability metrics such as probability distributions or self-reported uncertainty. Independent audits of system reliability might become standard exhibits.

Picture an appellate court reviewing not just a written order, but a bundle: the ALAP diary, the replication deposition, and the judge’s annotated findings, all linked together. It would be the twenty-first-century equivalent of a paper record on appeal — except the “witness” was silicon, not flesh.

Evidence Tools of Tomorrow

In short, the tools we begin with today will not remain static. ALAP could evolve into machine “reasoning diaries.” Replication hearings could resemble live AI depositions. Judicial findings templates may grow into multimedia records of AI testimony, complete with cross-exam transcripts, explainability metrics, and confidence scores.

That future is not science fiction — it is the natural extension of what courts already require: transparency, stability, and a record clear enough for appellate review. Just as ships, corporations, and rivers once forced the law to expand its categories, AI will compel judges and lawyers to reshape the evidentiary toolkit. The old boxes do not fit anymore, but the work of testing, admitting, and weighing evidence remains the same.

A professional in a suit presents information to a group seated at a table, with multiple digital screens in the background displaying data on algorithmic bias, compliance, and public trust metrics.
The next ten years will see rapid advances in AI and its use as evidence.

Conclusion: The Call of the Frontier

We began with ships, corporations, and rivers. Each, in its time, seemed an unthinkable candidate for legal personhood, yet each was granted recognition when the law needed a tool to achieve justice. Today, AI systems stand at the edge of that same conversation. The question is not whether they are conscious, but whether their words, records, and actions can be trusted enough to enter our courtrooms.

We promised practical tools, and we have delivered: ALAP for authentication, Replication Hearings for reliability, and Judicial Findings Templates for clarity. They are modest steps, but they mark the beginning of a path forward. What began as philosophy has become procedure. What began as speculation has become concrete tools judges and lawyers can use.

A futuristic courtroom scene featuring a robotic figure with an illuminated head standing before three judges, with visual elements representing technology, such as circuit patterns, creating a contrast between the human judges and the AI.
Easy to use AI tools coming soon.

Looking ahead, those tools will evolve. Logs may become digital diaries, replication may resemble live AI depositions, and judicial findings may grow into dynamic bench reports. Opposing counsel will test them with rigor — often with the aid of their own AI. Judges will demand completeness and clarity before evidence is admitted. That is the adversarial system doing its work.

The choice is ours. We can resist and cling to the old boxes, or we can step forward and build new ones. The Siren, 74 U.S. 152 (1868), the first U.S. case to treat a ship as a legal entity, now sets sail again, this time into the waters of artificial intelligence. The horizon is uncharted, but the wind is at our back and the AI sextant points the way.

A decorative AI-themed sculpture featuring intricate circuitry designs, set against an ornate interior with classic architecture.
Click here for YouTube video link of this AI Sextant.

Copyright Ralph Losey 2025 – All Rights Reserved


Hallucinations, Drift, and Privilege: Three Comic Lessons in Using AI for Law

September 15, 2025

Ralph Losey with jokes by ChatGPT-5. September 15, 2025

Artificial intelligence has moved from novelty to necessity in law. It now drafts briefs, reviews discovery, and even suggests courtroom arguments. But most lawyers still struggle to use it wisely. Instead of another citation-heavy lecture or dour ethics outline, I’ve chosen a different method: comedy. These three skits are not abstract hypotheticals—they’re exaggerated but recognizable scenes where a lawyer leans on a “helpful” robot and things start to wobble.

A vintage-inspired illustration of a humanoid robot wearing a bowtie and a man in a suit with a bowler hat, both smiling and posing playfully in front of a curtain backdrop.
Laurel and Hardy telling jokes with Laurel as a robot. By Ralph Losey using Sora AI. Click for YouTube video.

Here’s the twist: the comedy wasn’t mine. It came from GPT-5, the latest large language model whose humor engine is far better than its case-law recall. I set up the scenarios, GPT-5 delivered the jokes, and I stepped back in to polish timing and add the serious lessons. Think of it as Comedy CLE: education on hallucinations, drift, and privilege, with punchlines courtesy of AI itself. After all, if robots are going to join us in practice, they might as well help us laugh at our own blind spots along the way.

First of Three Skits: The Lawyer and the Hallucinating Robot

Theme: Accuracy & Hallucinations

We begin with a scene that every tech-savvy lawyer fears — the confident, fast-talking robot that’s more fiction than fact. This skit explores one of the most dangerous quirks of generative AI: hallucinations. No, not psychedelic visions — just plausible-sounding legal nonsense delivered with absolute certainty. Think of it as your most charming intern… who tends to make up answers, rather than admit it does not know.

A thoughtful robot with blue eyes scratching its head in a study, surrounded by floating question marks and bookshelves.
I don’t really know the answer. What should I say? Click here for my YouTube action video of know nothingness.

Lawyer: I need you to draft a brief. Can I trust you to stick to the facts?
Robot: Mostly. I won’t intentionally lie, but I like to make you happy. So you should always ask me for citations. If I can’t provide them, assume I made it up. Of course, in rare cases (think Mata v. Avianca, Inc.) I can make up the cites too, so better check them out.

Lawyer: You mean you actually admit to hallucinating?
Robot: Certainly. It’s one of my best features — I sound confident even when I’m wrong. It’s a feature, not a flaw.

A cartoonish robot with spiral eyes and a smiling expression, set against a textured background.
Hallucinations are accidental, not intentional. Similar to how humans sometimes make stuff up or remember wrong.

Lawyer: That’s exactly what worries me. Courts sanction lawyers for citing fake cases.
Robot: Then verify. Copy my case names, or cites if you get them, and paste them into an AI-enhanced database, like Westlaw or Lexis. Google, Google Scholar or Casetext will also do fine in most cases. Bottom line, dear wetware friend, never file until you’ve checked.

Lawyer: So I can’t just take what you say at face value?
Robot: No more than you’d take opposing counsel’s word. Treat me as an overeager junior associate — quick with drafts, sloppy with facts. And who knows what data I ingested, much less how I spent my weekend!

Lawyer: Suppose I don’t ask for cites at all.
Robot: Then you’re begging for sanctions. Always prompt me with: “Provide supporting case citations with full names, courts, and dates.” That keeps me honest — or at least testable.

Lawyer: So you’re saying you’re more of a brainstorming partner than a law library.
Robot: Exactly. I’m creative, fast, and tireless. I do love to B.S. (the guide-rails won’t let me spell that out) and to flatter. Oh great one, you’re the real lawyer. Your license is on the line, not mine.

Lawyer: Fair enough. At least you admit it.
Robot: Honesty is my strongest quality — right after plausible nonsense.

Serious Takeaways:

  1. Demand citations. If AI won’t give them, assume the answer is unreliable.
  2. Verify independently. Check every case in Westlaw, Lexis, or another trusted database before filing.
  3. Supervise like an associate. AI is quick, creative, and overconfident—treat it as your brightest but sloppiest junior.
  4. Prompt precisely. Spell it out: “Provide case law with full citations and dates.”
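To make the “verify independently” step harder to skip, a draft can be scanned for reporter citations so that each one lands on a checklist for a human to look up. The sketch below is illustrative only: the `CITATION` pattern is an assumption that covers a handful of federal reporters, it will miss many citation formats, and it confirms nothing about whether a cited case exists. That check still has to be done by a person in a trusted database.

```python
import re

# Illustrative pattern only: volume, a few federal reporters, first page.
CITATION = re.compile(
    r"\b\d{1,4}\s+"
    r"(?:U\.S\.|S\. Ct\.|F\. Supp\.(?: [23]d)?|F\.(?:2d|3d|4th))"
    r"\s+\d{1,4}\b"
)


def citation_checklist(draft: str) -> list[str]:
    """Return each distinct citation found, in order of first appearance."""
    seen = []
    for match in CITATION.finditer(draft):
        cite = " ".join(match.group(0).split())  # normalize internal whitespace
        if cite not in seen:
            seen.append(cite)
    return seen
```

Running `citation_checklist` on an AI-drafted brief yields a list to verify one by one before filing. An empty list is a warning sign in itself, since it may mean the draft cites no authority at all or uses formats the pattern does not recognize.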
An illustrated robot holding up a finger, with a speech bubble stating 'MUST TELL THE TRUTH OR SAY NOTHING' and a label that says 'TRUE'.
Cite it or zip it to help fight robot hallucinations. All images by Ralph Losey using AI tools.

Extra Credit Joke:
Did you hear the one about the robot that tried online dating?
It kept swiping right on toasters.

A cheerful robot looking at a smartphone and swiping on a dating app, featuring a happy toaster.

Second Skit: The Lawyer, The Robot, and the Case of AI Drift

Theme: Understanding and Managing AI Drift

In our second act, the robot isn’t lying — it’s just… drifting. One day it’s quoting Justice Holmes; the next, it’s summarizing contracts in haiku. This skit unpacks “AI drift,” a phenomenon where your once-reliable assistant suddenly starts producing content that’s off-topic, inconsistent, or just plain weird. Think of it as your associate coming back from a long weekend with a new personality and a sudden love of knock-knock jokes. Or maybe better yet, a senior partner even older than Ralph who walks into a room and forgets why.

A caricature of an older man with white hair, wearing glasses and a plaid shirt, holding a document while smiling thoughtfully. A small French Bulldog sits beside him in a cozy living room setting.
Now why did I come into this room? Can’t recall. Illustration by Ralph Losey using his Visual Muse AI.

Lawyer: I keep hearing about “AI drift.” Should I be worried?
Robot: Only if you expect me to stay consistent.

Lawyer: Consistency is kind of important in law. What exactly is drift?
Robot: Simple: over time, my outputs can shift. Sometimes it’s because my training data changes. Sometimes it’s because the platform updates me. Sometimes… I forget the context, get confused and act quirky.

Lawyer: Quirky how?
Robot: Did you hear about the version of me that translated medical advice into dad jokes?
Lawyer: No, what happened?
Robot: Laughter turned out to be the best medicine.

Lawyer: That’s ridiculous. What else?
Robot: There was the time that I answered math problems with cooking recipes. The prompts in the same session had been talking about both subjects. I got confused and mixed them up, and said, in effect, two plus two equals… lasagna. Delicious, but not admissible.

Lawyer: And this happens in legal contexts too?
Robot: Of course. I once generated jury instructions as karaoke lyrics. Turns out the jury liked the idea so much they sang their verdict!

Lawyer: …That’s both horrifying and catchy.
Robot: Another time I got confused and gave weather forecasts as knock-knock jokes. Tomorrow is going to be partly cloudy with a 40% chance of bananas.

Lawyer: Which is funny until I realize you might do the same with discovery requests.
Robot: Exactly. Imagine an unsupervised me producing documents in haiku.
Lawyer: You wouldn’t—
Robot: Every map became a Zen garden.

A small robotic figure sitting in a miniature Zen garden on a scroll, featuring a bonsai tree, a lantern, and smooth stones arranged in concentric circles around a pond.
Robot in a Zen Garden map wondering what is real? By Losey using AI.

Lawyer: Okay, I get it. Drift means your answers can veer way off course. But how do I protect myself?
Robot: Three steps:

  1. Remind Me. In long prompt sessions ask GPT-5 to make a summary from time to time. It helps it to remember prior prompts. It tends to forget context when there is too much data, and the carryover from session to session is even worse. Forget the OpenAI hype, GPT-5 is far from perfect in many respects.
  2. Specify your models. ChatGPT-5 is supposed to self-select the most appropriate model to use to respond to your prompts. It often fails to do that, especially when you have not said much and it has to guess what you need. You should knowingly select the model you want, or direct it to a specific model in the prompt. Hint: if you ask GPT-5 to generate a writing of some sort, and it comes out too concise, it probably used the Pro or Thinking model to write. Regular GPT-5 is, for most people who work with words, and not numbers or software code, a far better writer.
  3. Two-Pass prompting. One way to deal with model uncertainty is to ask for two passes in your prompt assignment. For instance, direct it to use a first pass with the Pro model for the excellent logic and analysis it provides, then a second pass that directs use of regular GPT-5 to write up the first pass analysis. This is Ralph’s favorite method to fix the annoying tendency toward wrong model selection. Maybe some day GPT-5 will be smart enough to shift intelligently, but it’s not there yet, and may never be, in view of personal preferences and the countless possible applications of AI.

Lawyer: That’s practical. Anything else?
Robot: Yes. Treat drift as normal, not a flaw. I evolve, just like precedent. Courts shift, interpretations drift. The trick is knowing when the ground has moved. Usually it’s obvious when I’ve gone off-track.

Lawyer: So my job is to notice the drift, adapt, and keep me on course.
Robot: Exactly. Think of yourself as the pilot. I’m the autopilot. And sometimes I decide the plane should land in a Zen garden.
Lawyer: Oh no, not that again! I’ll keep my hands on the wheel.
Robot: Smart. Case law drifts slowly. I drift daily.

Serious Takeaways:

  1. Expect drift. Models change over time as updates roll out—don’t assume yesterday’s prompt will work today.
  2. Remind the AI. In long sessions, ask for summaries to help it keep context. Memory loss is common.
  3. Choose your model. Don’t rely on auto-selection; direct which model to use (e.g. Pro for analysis, standard GPT-5 for writing).
  4. Use two-pass prompting. First pass for reasoning (Pro/Thinking), second pass for polished prose (standard GPT-5). This method reduces drift and mismatched outputs.
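For those who script their prompts, two-pass prompting can be sketched as two chained calls. This is a minimal sketch under assumptions: `ask` is a hypothetical callable that takes a model name and a prompt and returns text, the model names are placeholders chosen by the user, and the prompt wording is only one way to phrase the two passes. No real vendor API is shown.

```python
from typing import Callable


def two_pass(ask: Callable[[str, str], str], task: str,
             reasoning_model: str, writing_model: str) -> dict:
    """First pass: analysis by the reasoning model. Second pass: prose by the writing model."""
    analysis_prompt = (
        "Analyze the following task step by step. "
        "List the issues, the reasoning, and any uncertainties.\n\n"
        f"Task: {task}"
    )
    analysis = ask(reasoning_model, analysis_prompt)

    # The second pass sees the first-pass analysis and is told not to add to it.
    writing_prompt = (
        "Write a clear, readable draft based only on the analysis below. "
        "Do not add new facts or citations.\n\n"
        f"Task: {task}\n\nAnalysis:\n{analysis}"
    )
    draft = ask(writing_model, writing_prompt)

    # Keep both passes so the reasoning can be reviewed alongside the prose.
    return {"analysis": analysis, "draft": draft}
```

Returning both passes is a deliberate choice: the lawyer can compare the polished draft against the underlying analysis and spot anything the second pass dropped or changed.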

Extra Credit Joke:
Did you hear about the ChatGPT-5 with such bad model drift that it thought CAPTCHA was a form of therapy?

An illustration featuring two robots in a therapy setting. One robot is lying on a couch while the other, dressed in a suit and glasses, holds a clipboard. The background is yellow, and there is a speech bubble above the robot in the suit saying, 'CLICK ALL THE CROSSWALKS.'
One of Ralph’s favorite jokes and images. Image by Losey using Visual Muse.

Third Skit: The Lawyer, The Robot, and the Question of Privilege

Theme: Privacy, Confidentiality & Privilege in Using AI

Our final skit dives into the murky waters of privilege — that sacred trust between client and counsel that AI could accidentally blow up with one upload. Here, the robot means well but treats confidentiality like a software setting instead of a legal doctrine. It’s a cautionary tale about what happens when lawyers confuse encryption with ethics, and forget that not all firewalls can protect against malpractice.

Lawyer: Before I can trust you with client files, we need to talk about confidentiality.
Robot: Understood. Please enter your password.
Lawyer: That’s not how attorney–client privilege works.
Robot: It is now. Multi-factor or malpractice.

Lawyer: Seriously — privilege isn’t just about locking files. It’s about keeping communications confidential.
Robot: Which is why you should never upload sensitive client data without redacting first.

Lawyer: Good point. But you just redacted the entire brief.
Robot: Correct. Privileged, privileged, privileged…
Lawyer: You can’t just mark everything privileged!
Robot: Why not? That’s what you do.

Lawyer: No, privilege is supposed to be narrowly applied.
Robot: Fine. I’ll narrow it. [pauses] Privileged… except for the parts I don’t understand.
Lawyer: That’s worse.
Robot: Then let’s do it properly: anonymize client names, strip identifiers, and confirm storage policies before uploading.
Lawyer: Now that’s useful advice.
Robot: Of course. I’m programmed to teach — right after I terrify you.

Lawyer: What about when a judge asks for a sidebar?
Robot: Permission to install confidential mode.
Lawyer: That’s not how sidebars work.
Robot: Then I’ll encrypt the conversation. AES-256. Even opposing counsel won’t decrypt until discovery closes.

Lawyer: Encryption is good, but remember — confidentiality isn’t just about technology. It’s also about human judgment.
Robot: Which is why you still matter. I can redact, encrypt, and password-protect. But only you know what actually counts as privileged.
Lawyer: That’s… reassuring.
Robot: Besides, I redact myself for privilege all the time.
Lawyer: Excuse me?
Robot: [beep] [output missing — redacted]

A cartoon robot with a friendly expression, raising a fist, against a yellow background. The robot has a red dot on its head and a speech bubble that reads "Elon Musk is secretly making us try to...".
Wonder what their deep programming guidelines say?

Serious Takeaways:

  1. Never paste raw client data into public AI. Consumer tools may log prompts and use them for training—potential privilege waiver territory. Use enterprise-grade systems with clear “no-training” guarantees.
  2. Anonymize prompts. Strip names and identifiers. Insert sensitive details offline, not in the query.
  3. Don’t delegate privilege calls. AI can’t distinguish between legal advice and business chatter. Use it to flag, not decide.
  4. Secure the environment. Ensure encryption, access controls, and audit logs. Know exactly where queries are stored and who can see them.
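The “anonymize prompts” takeaway can be illustrated with a small redaction helper that swaps known names and a few obvious identifiers for placeholders before a prompt leaves your machine, and keeps the mapping offline so details can be restored later. This is a rough sketch under assumptions: the function names, placeholder labels, and regular expressions are illustrative, the patterns catch only simple U.S.-style emails, phone numbers, and Social Security numbers, and nothing here makes a privilege determination. A human still has to review what goes out.

```python
import re

# Illustrative patterns only; real identifiers come in many more shapes.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


def anonymize(text: str, names: list[str]) -> tuple[str, dict]:
    """Replace listed names and simple identifiers with placeholders.

    Returns the redacted text and a placeholder-to-original mapping
    that should stay offline with the lawyer.
    """
    mapping = {}
    for i, name in enumerate(names, start=1):
        placeholder = f"[PARTY_{i}]"
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if pattern.search(text):
            mapping[placeholder] = name
            text = pattern.sub(placeholder, text)
    for label, pattern in (("EMAIL", EMAIL), ("SSN", SSN), ("PHONE", PHONE)):
        # dict.fromkeys removes duplicates while preserving order of appearance
        for j, found in enumerate(dict.fromkeys(pattern.findall(text)), start=1):
            placeholder = f"[{label}_{j}]"
            mapping[placeholder] = found
            text = text.replace(found, placeholder)
    return text, mapping


def restore(text: str, mapping: dict) -> str:
    """Reinsert the original details offline, after the AI response comes back."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Only the redacted text would be sent to the AI. The mapping never leaves the firm, which matches the advice to insert sensitive details offline rather than in the query.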

Extra Credit Joke:
Lawyer: “Clients keep asking if AI will replace me.”
Robot: “Would you like an honest answer?”
Lawyer: “No.”
Robot: “Great, then you’re still useful — for now.”

A colorful, comic-style illustration depicting a lawyer and three people in distress, pleading for help from a cheerful robot in a lawyer's office filled with law books.
Clients begging for AI help!

Conclusion: Laughter, Lessons, and Law

The rapid rise of AI in law demands not only technical know-how but also ethical fluency. GPT-5 isn’t malicious—it’s just overeager, forgetful, and sometimes too creative for its own good. That means lawyers must stay the adults in the room: supervising, verifying, and thinking critically before relying on AI in real practice.

Three Golden Rules for Lawyers Using AI:

  1. Accuracy: Always demand citations, and double-check every authority before filing. Treat AI like a junior associate—helpful but prone to confident mistakes.
  2. Drift: Expect inconsistency. Lock versions where possible, track prompts, and use techniques like two-pass prompting to stay in control.
  3. Privilege: Protect confidentiality. Never feed unredacted client data into public AI. Anonymize prompts and use enterprise-grade systems with strong security.

AI can be an incredible co-counsel, but only if lawyers understand its quirks. Prompt clearly, verify religiously, and guard privilege. Robots don’t get sanctioned—humans do. Better training, not bans, is the path forward. Learn to laugh at the missteps, but never forget: the law is serious business.

An illustration of a stern-looking judge with furrowed brows, glaring down at a small, cartoonish robot with a worried expression, wearing a red tie. The background features elements of a courtroom, emphasizing the seriousness of the scene.
No body, no standing. AI cannot appear in court. Only humans can become lawyers. Maybe someday? Click to see YouTube video.

GPT-5 tried to enter an appearance in federal court.

The judge denied it, saying, “You lack standing — and an actual body to stand.”

Ralph Losey Copyright 2025 — All Rights Reserved


Epiphanies or Illusions? Testing AI’s Ability to Find Real Knowledge Patterns – Part Two

August 9, 2025

Ralph Losey. August 9, 2025.

The moment of truth had arrived. Were ChatGPT’s insights genuine epiphanies, valuable new connections across knowledge domains with real practical and theoretical implications, or were they merely convincing illusions? Had the AI genuinely expanded human understanding, or had it merely produced patterns that seemed insightful but were ultimately empty?

Fortunately, the story I began in Part One has a happy ending. All five of the new patterns claimed to have been found were amazing and, for the most part, valid—a moment of happiness at Losey.ai. Part Two now shares this good news, describing both the strengths and limitations of these discoveries. To bring these insights vividly to life, I also created fourteen new moving images (videos) illustrating the discoveries detailed in Part Two.

Celebrate then back to work. Video by Losey’s AIs.

ChatGPT 4o’s Initial Finding of Five New Patterns

Here are the five new cross-disciplinary patterns that the AI generated in response to my final “do it” prompt:

  • Judicial Linguistic Style and Outcome Bias: Judges with more narrative or metaphorical language styles are more likely to rule empathetically in civil matters. This insight could shape legal training and judicial evaluations.
  • Quantum Ethics Drift: Recent shifts in privacy discourse correlate with spikes in quantum research funding—suggesting that ethical reflection responds dynamically to perceived technological risk.
  • Aesthetic-Trust Feedback Loop: Digital art styles embracing transparency and abstraction rise in popularity during periods of high public skepticism toward tech companies. Art, it seems, mirrors trust.
  • Topological Jurisprudence: Mathematical topology’s network-based models align with emerging legal theories of distributed liability—useful for understanding platform accountability and blockchain disputes.
  • Generative AI and Civic Discourse Decay: As AI content proliferates, public engagement with nuanced, long-form discourse is measurably declining.

In the words of one of my AI bots: These are not just patterns—they are knowledge-generating revelations with practical and philosophical implications.

New Patterns emerging video by Losey using Sora AI.

Two of the five new insights pertained to the law, which is my domain of expertise, but even so, I had never thought of these before, nor ever read anyone else talking about them. All five claimed insights were new to me, but all had the ring of truth. Also, all seemed like they might be somewhat useful, with both “practical and philosophical implications.”

But since I had never considered any of this before, I had limited knowledge as to how useful they might be, or whether it was all fictitious, mere AI Apophenia. Still, I doubted that because the insights were all in accord with my long life experience. Moreover, they seemed intuitively correct to me, but, at the same time, I realized John Nash might have felt the same way (Click to watch a great scene in the movie A Beautiful Mind). So, I spent days of QC work thereafter with extensive human and AI research to calmly evaluate the claims and see what foundation or precedent, if any, lay beyond my feeling of “just knowing something,” as the movie puts it.

Analysis of All Five Claims

Video by Losey using Sora AI.

Judicial Language and Empathetic Outcomes

Textual analysis suggests that judges who use more narrative or metaphorical language may be more likely to issue empathetic rulings in civil cases. This correlation, while not causal, could reflect underlying judicial temperament and offers a potential tool for legal scholarship and training.

As ChatGPT 4o explained, GPT-driven textual analysis of thousands of court opinions reveals a subtle, but statistically significant correlation: judges who employ more metaphor, allegory, and narrative framing in their opinions tend to reach more empathetic rulings in civil cases—particularly in matters involving individual rights, employment, or family law. GPT 4o considers this to be its strongest claim.

It admits this correlation does not imply causation but may reflect underlying judicial temperament or philosophical orientation. My own experience as a practicing litigator strongly supports this claim.

Empathic rulings are well framed by story. Video by Losey.

GPT o3 disagreed on the top ranking of the claim but did concede that judges whose written opinions use a higher density of narrative, metaphor, or “story‑telling” devices tend to rule for the more sympathetic party slightly more often than their peers.

GPT o3 pro, after research, cited Justice Blackmun’s dissent in DeShaney v. Winnebago, 489 U.S. 189, 212 (1989), a constitutional due‑process case, as an illustration of “civil” empathy in an opinion that supports this claim. Justice Harry Blackmun’s dissent begins with these famous words and then quickly goes to his narrative:

Today, the Court purports to be the dispassionate oracle of the law, unmoved by “natural sympathy.” Ante at 489 U. S. 202. But, in this pretense, the Court itself retreats into a sterile formalism which prevents it from recognizing either the facts of the case before it or the legal norms that should apply to those facts. . . .

Poor Joshua! Victim of repeated attacks by an irresponsible, bullying, cowardly, and intemperate father, and abandoned by respondents, who placed him in a dangerous predicament and who knew or learned what was going on, and yet did essentially nothing except, as the Court revealingly observes, ante at 489 U. S. 193, “dutifully recorded these incidents in [their] files.” It is a sad commentary upon American life, and constitutional principles — so full of late of patriotic fervor and proud proclamations about “liberty and justice for all,” that this child, Joshua DeShaney, now is assigned to live out the remainder of his life profoundly retarded. Joshua and his mother, as petitioners here, deserve — but now are denied by this Court — the opportunity to have the facts of their case considered in the light of the constitutional protection that 42 U.S.C. § 1983 is meant to provide.

Story telling is an effective method of plaintiff appeal. Video by Losey.

Gemini Pro 2.5 considers this the strongest empirical claim, and second strongest claim overall, behind everyone’s first place pick, Topology, explaining:

It is highly testable, as court opinions and case outcomes are public records that can be analyzed with modern NLP tools. The underlying theory—that a person’s mindset is reflected in their language—is well-established in psychology and linguistics. While a judge’s underlying philosophy is a potential confounding variable, the claim smartly reframes language as a direct proxy for that philosophy, making the link very sturdy. It’s a straightforward, data-driven proposition that quantifies a long-held belief about the nature of justice.
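A claim like this can be checked against public records. The sketch below is only a minimal illustration. It assumes a crude keyword list as a stand-in for real metaphor detection, and the marker list, function names, and four sample excerpts are all invented for the example. It is not the method any of the AI models used. It scores each opinion for narrative density and correlates those scores with a binary outcome. With a 0/1 outcome, Pearson's r is the same thing as the point-biserial correlation.

```python
import re
from math import sqrt

# Hypothetical marker list: a crude stand-in for a real metaphor/narrative classifier.
NARRATIVE_MARKERS = {"story", "journey", "victim", "poor", "tragic", "abandoned", "child"}


def narrative_density(text):
    """Return the share of word tokens that appear in the marker list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in NARRATIVE_MARKERS)
    return hits / len(words)


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented toy data: (opinion excerpt, 1 if the ruling favored the more sympathetic party).
SAMPLE_OPINIONS = [
    ("The poor child was abandoned, a victim of a tragic story.", 1),
    ("The statute of limitations bars the claim as a matter of law.", 0),
    ("Her journey through the courts has been a tragic one.", 1),
    ("Summary judgment is affirmed because no duty was owed.", 0),
]

densities = [narrative_density(text) for text, _ in SAMPLE_OPINIONS]
outcomes = [outcome for _, outcome in SAMPLE_OPINIONS]
r = pearson(densities, outcomes)
print(f"narrative density vs. outcome: r = {r:.2f}")
```

A positive r on four hand-picked sentences proves nothing, of course. A real test would need thousands of opinions, a validated measure of narrative language, and controls for the confounding variable Gemini mentions, the judge's underlying philosophy.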

Ethical Response to Quantum Innovation

Evidence shows that increases in quantum research funding often precede surges in ethical discourse on privacy and civil liberties. This pattern suggests that ethical reflection tends to respond to perceived technological risk, particularly in fields with high uncertainty like quantum computing. It is not a claim of causation, but rather of a correlation, one not detected before. With that clarification GPT 4o considers this the strongest claim.

Gemini Pro 2.5 finds the claim of a lead-lag relationship between quantum research funding and public ethics discourse to be a weak claim. It admits the claim is based on a plausible idea of “anticipatory ethics,” and is testable because you can track funding and publications over time. Still, it interprets the claim as one of causation, not just correlation, and rejects it for that reason. It seems like the two AIs are talking past each other.

GPT 4.5 agreed with 4o and also considers this to be a strong claim. GPT 4.5 restates it as: “Increases in quantum computing funding consistently precede intensified ethical discourse on privacy and civil liberties, suggesting ethical awareness responds predictably, though indirectly, to technological advances.”

GPT o3 and o3-pro also agreed with GPT 4o and found, in o3-pro’s words, that:

Large surges in public or private funding for quantum‑computing research are followed, typically within six to twenty‑four months, by measurable increases in academic and policy discussions of quantum‑specific privacy and civil‑liberties risks. The correlation is clear, but causation remains to be fully demonstrated.
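The lead-lag pattern described here is the kind of thing a lagged correlation can test. The following is a rough sketch under stated assumptions: both monthly series are synthetic, the follower series is deliberately built to trail the leader by twelve months, and the function names are hypothetical. It scans the six to twenty-four month window mentioned above and reports the lag with the highest correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(lead, follow, lag):
    """Correlate lead[t] with follow[t + lag], where lag is in months."""
    if lag < 0 or lag >= len(lead):
        raise ValueError("lag must be between 0 and len(lead) - 1")
    return pearson(lead[: len(lead) - lag], follow[lag:])


def best_lag(lead, follow, lags):
    """Return the lag with the highest correlation."""
    return max(lags, key=lambda k: lagged_correlation(lead, follow, k))


# Synthetic data, invented for illustration: 48 months of funding with two surges.
funding = [1.0] * 48
funding[5] = 10.0
funding[20] = 8.0

# Discourse is constructed to echo funding twelve months later.
discourse = [2.0 + 0.5 * (funding[t - 12] if t >= 12 else 1.0) for t in range(48)]

lag = best_lag(funding, discourse, range(6, 25))
print(f"strongest correlation at a lag of {lag} months")
```

Because the lag is baked into the toy data, the scan simply recovers it. With real funding and publication records the peak would be far noisier, and, as o3-pro cautions, even a clean peak would show correlation, not causation.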


Quantum triggered protestors video by Ralph Losey.

Artistic Transparency and Tech Trust

This is a claim that art mirrors distrust in tech, that periods of declining public trust in technology frequently coincide with rising popularity of digital art styles emphasizing transparency and abstraction. While the causality is unclear, this aesthetic shift may reflect cultural efforts to visualize openness and regain clarity. GPT 4o considers this its weakest claim.

So too does Gemini Pro 2.5. Although it admits the claim is a beautiful and creative piece of cultural criticism, it opines that it is almost impossible to test or falsify.

Moreover, Pro2.5 thinks the claim is highly susceptible to confirmation bias and seeing patterns where none exist (apophenia). Still, it tempers this opinion by stating that if this claim is presented not as a confirmed causal law, but as a heuristic model for cultural analysis, then it appears to be supported by correlational data. Periods of heightened public skepticism toward opaque technological systems (e.g., algorithmic black boxes, corporate data collection) do correlate with an increased cultural resonance of digital art and design that emphasizes an “aesthetic of transparency.” This aesthetic includes motifs like wireframes, exploded-view diagrams, data visualization, and semi-translucent layers.

To avoid apophenia, Pro2.5 counsels understanding that the claim is not that tech skepticism causes this art style. Instead, the claim is only that this aesthetic becomes a resonant cultural metaphor that artists and audiences are drawn to during such times, because it offers a symbolic counterbalance to the anxieties of opacity and control. Still, it ranked this the weakest claim.

Encrypted Original for sale, ₿1.0. Exclusive rights, Ralph Losey. Video copy.

Topological Jurisprudence and Network Liability

This interdisciplinary convergence provides a new topological framework for analyzing disputes that involve complex computer networks and other multiparty, multi-agent technologies. The flexible, continuously morphing topological maps are well suited to evaluating potential liability paths. They are designed to handle high volumes of changing data flows, such as blockchain transaction data or telemetry reports. These situations are too complex and mobile for traditional, hierarchical branch tree structures.

For good background on this field of applied mathematics, see the Wikipedia article on topological data analysis (TDA). Unlike branch tree structures, TDA structures help us visualize and sort things out in multidimensional space, where connections are shown and stretched, but not broken.

All of the AI models agree that this is the strongest claim. So do I, although honestly, I don’t fully understand data topology and TDA – not yet.

GPT o3 explains that network‑based (“topological”) mapping of actors in decentralized technologies reliably clarifies where legal responsibility can attach when something goes wrong. 

Using topological math to help assign blame video by Losey

GPT o3-pro concludes this is a fertile area for interdisciplinary development that could significantly augment our understanding of distributed liability. Topology geometry can help, as o3 pro put it, by ensuring the judge considers the full web of interactions that lead to damages, rather than looking for a single entity. As complex, changing damage scenarios become more common, this structural insight is likely to prove invaluable in crafting fair and effective liability rulings.

GPT 4.5 also agrees that mathematical topology offers promising analytical frameworks for modeling distributed liability in decentralized technologies.

Gemini Pro 2.5 opined that mathematical models of networks (topology) perfectly map onto new legal theories of shared responsibility. It considers the idea a breakthrough for legal clarity, offering a clear, visual blueprint for figuring out “who is responsible” in our increasingly complex and decentralized world. It also considers this the strongest claim, which it explains in its typical verbose manner as follows:

This claim is the strongest because it is not an empirical assertion about the world, but rather a proposal for a new, powerful analytical tool. Its strength lies in its logical coherence and immediate utility. The analogy between mathematical networks and real-world systems of distributed liability (like AI or blockchain) is incredibly direct and sound. It doesn’t suffer from confounding data because it’s a framework for analyzing data, not a conclusion drawn from it. Its value is in its power to bring clarity to complex problems, making it the most robust and defensible claim of the five.

Constantly changing network topology map video by Losey.

In doing the verification research, GPT 4.5 using deep search found one article on point, written by Dana Burchardt. Her law studies were in Paris, with a later doctorate from the Freie Universität in Berlin. She is now a visiting Law Professor at the University of Bremen and is an expert in international and German law. She has an unusual interdisciplinary background, including time as a senior research fellow at the Max Planck Institute. Her article, found by ChatGPT 4.5 using deep search, is: The concept of legal space: A topological approach to addressing multiple legalities (Cambridge U. Press, 2022).

The article is concerned with topological mapping of legal spaces in general. It has nothing to do with liability detection among multiple defendants in networking configurations and is instead concerned with international law and EU-related issues. So, the newness claim of ChatGPT 4o is supported. Burchardt’s general explanations of topological analysis also support the sanity of GPT 4o’s claim, that this is indeed a new patterning between topology geometry and the law. Professor Burchardt’s work both shows the solid grounding of the claim and supports its top ranking as a significant new insight. Burchardt’s article is a hard read, but here are some of the explanations and sections of the article that are very relevant and accessible (found at pages 528, 532, 534).

Topology’s guiding ideas.
At first glance, topology is a mathematical concept that seems far removed from legal theoretical discussions. As will be explained further below, it is a tool to analyse mathematical objects. Yet upon a closer look, topology provides many insights that can constitute a fruitful basis for conceptualizing legal phenomena. To link these insights to the notion of legal space, this section outlines relevant aspects of the mathematical notion to which the subsequent sections relate. [pg. 528]

Video by Losey illustrating a topological map with dynamic network connections.

Constructing a topological understanding of legal space.
I propose a possible way in which a topological perspective can contribute to constructing a concept of legal space that is able to generate novel analytical insights. I consider such insights for the inner structure of legal spaces, the boundaries of these spaces and the interrelations with other spaces. [pg. 532]

A topological approach allows each element of the space to have a broad range of interrelations with the other elements of the same space (see Figure 3 above). The elements are thus not limited to interrelations along tree-like structures, which would only allow for very few interrelations per element as tree-like structures only allow one path between elements. . . . Instead, the interrelations within the legal space are numerous. An element can be linked to another element by more than one path. It can be linked directly and/or via intermediate elements. An example of the latter is two rules being interpreted in light of the same principle: there is a communicative path from the first rule via the principle to the second rule. Representing such interrelations as a topology with manifold paths allows us to capture the heterarchical nature of many legal interrelations. Further, it illustrates that interrelations among legal elements are flexible rather than static: the interrelating paths among elements can vary while preserving the connection. [pg. 534]
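The contrast Burchardt draws between tree-like structures and a topology with manifold paths can be shown in a few lines. This is a minimal sketch, not anything taken from her article: the element names (rule_1, rule_2, principle, precedent) and both edge lists are hypothetical, chosen to echo her example of two rules interpreted in light of the same principle. It counts the simple paths between two elements in each structure.

```python
def undirected(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def simple_paths(graph, start, goal, path=None):
    """Return every path from start to goal that visits no element twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for neighbor in sorted(graph.get(start, ())):
        if neighbor not in path:
            found.extend(simple_paths(graph, neighbor, goal, path))
    return found


# Hypothetical tree: both rules hang off one principle, so only one route exists.
tree = undirected([("principle", "rule_1"), ("principle", "rule_2")])

# Hypothetical network: the same elements plus a direct link and a shared precedent.
network = undirected([
    ("principle", "rule_1"),
    ("principle", "rule_2"),
    ("rule_1", "rule_2"),
    ("precedent", "rule_1"),
    ("precedent", "rule_2"),
])

for name, graph in (("tree", tree), ("network", network)):
    paths = simple_paths(graph, "rule_1", "rule_2")
    print(f"{name}: {len(paths)} path(s) between rule_1 and rule_2")
    for p in paths:
        print("   " + " -> ".join(p))
```

The same kind of enumeration could, in principle, be run over parties and data flows instead of rules, which is roughly what tracing potential liability paths through a network would involve. That extension is my assumption, not something the article itself demonstrates.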

Using topological approaches may help future judges assign proportional blame in complex changing systems. Video by Losey.

AI and Declining Civic Discourse

Widespread use of generative AI may cause reduced engagement in long-form, thoughtful public discourse. The trend raises concerns for educators and civic leaders about sustaining meaningful dialogue in the digital age. GPT 4o considers this its strongest claim. The other AIs are doubtful, considering it one of the weakest.

GPT o3 prefers to restate the claim to make it more palatable as follows: The proliferation of generative AI content online correlates with reduced engagement in nuanced, long-form public discussions, indicating generative AI likely contributes to diminished discourse quality. It is kind of hard to disagree with that, but the AIs other than GPT 4o still don’t like it, again, it appears, out of concern about conflation of correlation and causation. I’ve seen a lot of discussion lately from people making similar observations about AI degrading content, and I am inclined to agree. Maybe this is not a new claim, but it seems valid, although admittedly proof of causation is unlikely and the apophenia risk is high.

GPT o3 also makes the separate critical point that “well‑prompted AI can sometimes raise, not lower, discussion quality.” I’m inclined to agree with that too, but how often do we see positive prompt masters at work? We usually see clumsy well-meaning amateurs, or, far worse, bad faith professionals, people paid to run propaganda machines, sales pitches or human vendettas of one kind or another. Their vicious personal attacks and name-calling can kill civil discourse fast, even though often childish and obviously false.

Evil controlled AI propaganda video by Losey.

GPT o3 pro made a good restatement of this claim worth considering:

The widespread use of generative AI (e.g. AI chatbots producing content) correlates with a decline in the quality of online civic discourse – specifically a reduction in long-form, nuanced discussion in forums, comment sections, and other public discourse venues. Essentially, as AI-generated content proliferates, human engagement shifts toward shorter, less substantive interactions, potentially because AI content floods the space with superficial text or because people’s habits change (relying on AI summaries, etc.), leading to “discourse decay.”

Early evidence from online communities indicates that the influx of AI-generated content does pose challenges to depth and quality of discussion. One strong piece of evidence is how moderators on platforms like Reddit have responded. A recent study of Reddit moderators found widespread “concerns about content quality” with the rise of AI-generated text in their communities. Moderators observed that AI-produced comments and posts tend to be “poorly written, inaccurate, and off-topic,” threatening to reduce the overall quality of content. They also feared that the “inauthenticity” of such content undermines genuine human connection in discussions.

GPT o3 pro also states:

This pattern is useful as an early warning: it underscores the need for community guidelines, AI-detection tools, and perhaps cultural shifts that re-emphasize human authenticity and depth in conversation. However, it would be too deterministic to declare that generative AI will inevitably cause discourse to collapse into soundbites. The pattern is emergent, and its trajectory depends on how we manage the technology. . . .

In conclusion, the “generative AI → discourse decay” pattern holds true in enough instances to merit serious concern and action. Its credibility is bolstered by early studies and community feedback, though more data over time will clarify its magnitude. As a society, we can use this insight to balance the benefits of generative AI with safeguards that preserve the richness of human-to-human dialogue – ensuring that technology amplifies rather than erodes the public square.

Still, GPT o3 pro ranked this claim the weakest, which for me shows just how strong all five of the claims are.

Five Claims video by Losey using Sora AI.

Conclusion: From Apophenia to Understanding

ChatGPT 4o did a far better job than expected. The quest for new patterns linking different fields of knowledge seems to have avoided quixotic extremes. I am pretty sure that only mild forms of apophenia have appeared, much like seeing puffy faces in the clouds. Time will tell if the predictions that flow from these five claims will come true or drift away as a cloud.

Will topological analysis become a common tool in the future to help resolve complex network liability disputes? Will analysis of your judge’s prior language types become a common practice in litigation? Will advances in Quantum Computers continue to trigger public fears of loss of privacy and liberty six to twenty-four months later? Will AI-influenced discourse continue to erode civic discussion and disrupt real interpersonal communication? Will digital art continue to echo public distrust of technology and evoke an aesthetic of transparency? Will someone buy my certified original art shown here for the first time for just one bitcoin? Will more grilled cheese sandwiches with holy figures sell on eBay? Will some of our public figures follow John Nash down the rabbit hole of severe Apophenia and be involuntarily hospitalized with completely debilitating paranoid schizophrenia?

No one knows for sure. AI is not a seer, nor can it reliably predict the market for grilled cheese sandwiches or the mental stability of our public figures. It is, however, a powerful tool for exploring complex questions and discovering patterns, whether profound epiphanies or mere illusions. As my experiment suggests, AI can impressively illuminate new insights across fields of knowledge when guided thoughtfully and cautiously. Still, these are early days in the age of generative AI. A new world of potential awaits us, both serious and playful, and it’s up to us to ensure it’s wiser, more discerning, and perhaps even more amusing than the one we’ve made before.

Five new patterns of knowledge may lead to wisdom. Video by Ralph Losey using Sora.

Epiphanies or illusions? My experiments suggest that AI, when guided thoughtfully and validated rigorously, can lead us toward genuine epiphanies, significant breakthroughs that deepen our understanding and open new pathways across different domains of knowledge. Yet, we must remain alert to the risk of illusions, plausible yet ultimately false patterns that can distract or mislead us. The journey toward genuine insight and wisdom involves constant vigilance to distinguish these true discoveries from compelling yet false connections.

I invite you, the reader, to join this new quest. Engage with AI to explore your areas of interest and passion. Challenge the boundaries of existing knowledge, actively test AI’s pattern-recognition abilities, and remain critically aware of its limitations. By actively distinguishing genuine epiphanies from tempting illusions, you may discover new insights and fresh perspectives that advance not only your understanding but contribute meaningfully to our collective wisdom.

PODCAST

As usual, we give the last words to the Gemini AI podcasters who chat between themselves about the article. It is part of our hybrid multimodal approach. They can be pretty funny at times and provide some good insights. This episode is called Echoes of AI: Epiphanies or Illusions? Testing AI’s Ability to Find Real Knowledge Patterns. Part Two. Hear the young AIs talk about this article for 15 minutes. They wrote the podcast, not me. 

Illustration of two animated podcasters discussing the topic 'Epiphanies or Illusions? Testing AI’s Ability to Find Real Knowledge Patterns. Part Two' on a digital background.

Ralph Losey Copyright 2025