I used Google’s NotebookLM to analyze my last article, Cross-Examine Your AI: The Lawyer’s Cure for Hallucinations. I started with the debate feature, where two AIs have a respectful argument about whatever source material you provide; here, my article. The debate turned out very well (see below). The two debating AI personas made some very interesting points. The analysis was good and hallucination-free.
Then just a few prompts and a half-hour later, Google’s NotebookLM had made a Podcast, a Slide Deck, a Video and a terrific Infographic. NotebookLM can also make expanding mind-maps, reports, quizzes, and even study flash-cards, all based on the source material. It was so easy that it seems only right to make them available to readers to use, if they wish, in their own teaching efforts for whatever legal-related group they are in. So please take this blog as a small giveaway.
The back-and-forth argument in this NotebookLM creation lasts 16 minutes, makes you think, and may even help you to talk about these ideas with your colleagues.
I also liked the podcast created by NotebookLM with direction and verification on my part. The AI wrote the words in no time. It seems accurate to me and certainly has no hallucinations. Again, it is a fun listen and comes in at only 12.5 minutes. These AIs are good at both analysis and persuasion.
If that were not enough, NotebookLM AI also made a 14-slide deck to present the article. The only problem is that it generated a PDF file, not a PowerPoint format. Proprietary issues. Still, pretty good content. See below.
AI Video
They also made a video, see below, and click here for the same video on YouTube. It is just under seven minutes and has been verified and approved, except for its discussion of the Park v. Kim case, which it misunderstood and, yes, hallucinated the holding at 1:38-1:44. The Google NotebookLM AI said that the appeal was dismissed due to AI-fabricated cases, whereas, in fact, the appeal upheld the lower court’s dismissal because of AI-fabricated cases filed in the lower court.
Rereading the article, it is easy to see how Google’s AI made that mistake. Oh, and to prove how carefully I checked the work, the AI misspelled “cross-examined” at 6:48 in the video: it only used one “s,” i.e., “cros-examined” (horrors). If I missed anything else, please let me know. I’m only human.
Except for that error, the movie was excellent, with great graphics and dialogue. I especially liked this illustration of the falling house of cards to show the fragility of AI’s reasoning when it fabricates. I wish I had thought of that image.
Screenshot of one of the images in the video at 4:49
Even though the video was better than I could have created, and took the NotebookLM AI only a minute to create, the mistakes in the video show that we humans still have a role to play. Plus, do not forget, the AI was illustrating and explaining my idea, my article, although admittedly another AI, ChatGPT-5.2, helped me to write the article, Cross-Examine Your AI: The Lawyer’s Cure for Hallucinations.
Anyone is welcome to download and use the slide deck, the article itself, Cross-Examine Your AI: The Lawyer’s Cure for Hallucinations, the audio podcast, the debate, the infographic and the video to help them make a presentation on the use of AI. The permission is limited to educational or edutainment use only. Please do not change the article or audio content. But, as to the fourteen slides, feel free to change them as needed. They seem too wordy to me, but I like the images. If you use the video, serve popcorn; that way you can get folks to show up. It might be fun to challenge your colleagues to detect the small hallucination the video contains. Even if they have read my article, I bet many will still not detect the small error.
Here is the infographic.
Infographic by NotebookLM of my article. Click here to download the full size image.
Ralph Losey Copyright 2025 — All Rights Reserved, except as expressly noted.
I. Introduction: The Untested Expert in Your Office
AI walks into your office like a consulting expert who works fast, inexpensively, and speaks with knowing confidence. And, like any untested expert, it is capable of being spectacularly wrong. Still, try AI out, just be sure to cross-examine it before using the work-product. This article will show you how.
Want AI to do legal research? Find a great case on point? Beware: any ‘Uncrossed AI’ might happily make one up for you. [All images in this article by Ralph Losey using AI tools.]
Lawyers are discovering AI hallucinations the hard way. Courts are sanctioning attorneys who accept AI’s answers at face value and paste them into briefs without a single skeptical question. In the first famous case, Mata v. Avianca, Inc., a lawyer submitted a brief filled with invented cases that looked plausible but did not exist. The judge did not blame the machine. The judge blamed the lawyer. In Park v. Kim, 91 F.4th 610, 612 (2d Cir. 2024), the Second Circuit again confronted AI-generated citations that dissolved under scrutiny and affirmed the dismissal below. French legal scholar Damien Charlotin has catalogued almost seven hundred similar decisions worldwide in his AI Hallucination Cases project. The pattern is the same: the lawyer treated AI’s private, untested opinion as if it were ready for court. It wasn’t. It never is.
Never accept research or opinions before you skeptically cross-examine the AI.
The solution is not fear or avoidance. It is preparation. Think of AI the way you think of an expert you are preparing to testify. You probe their reasoning. You make sure they are not simply trying to agree with you. You examine their assumptions. You confirm that every conclusion has a basis you can defend. When you apply that same discipline to AI — simple, structured, lawyerly questioning — the hallucinations fall away and the real value emerges.
This article is not about trials. It is about applying cross-examination instincts in the office to control a powerful, fast-talking, low-budget consulting expert who lives in your laptop.
Click here to see video on YouTube of Losey’s encounters with unprepared AIs.
II. AI as Consulting Expert and Testifying Expert: A Hybrid Metaphor That Works
Experienced litigators understand the difference between a consulting expert and a testifying expert. A consulting expert works in private. You explore theories. You stress-test ideas. The expert can make mistakes, change positions, or tell you that your theory is weak. None of it harms the case because none of it leaves the room. It is not discoverable.
Once you convert that same person into a testifying expert, everything changes. Their methodology must be clear. Their assumptions must be sound. Their sources must be disclosed. Their opinions must withstand cross-examination. Their credibility must be earned. Discovery of their work is open, subject to minor restraints.
AI should always start as a secret consulting expert. It answers privately, often brilliantly, sometimes sloppily, and occasionally with complete fabrications. But the moment you rely on its words in a brief, a declaration, a demand letter, a discovery response, or a client advisory, you have promoted that consulting expert to a testifying one. Judges and opposing counsel will evaluate its work that way — even if you didn’t.
This hybrid metaphor — part expert preparation, part cross-examination — is the most accurate way to understand AI in legal practice. It gives you a familiar, legally sound framework for interrogating AI before staking your reputation on its output.
Working with AI and carefully examining its early drafts.
III. Why Lawyers Fear AI Today: The Hallucination Problem Is Real, but Preventable
AI hallucinations sound exotic, but they are neither mysterious nor unpredictable. They arise from familiar causes:
• lack of factual context
• ambiguous or overly broad prompts
• overgeneralization from incomplete data
• gaps or bias in the training set
• the model’s instinct to infer patterns that are not really there
• its reluctance to admit “I don’t know”
• its tendency toward flattery and agreement
Anyone who has ever supervised an overconfident junior associate will recognize these patterns of response. Ask vague questions and reward polished answers, and you will get polished answers whether they are correct or not.
The problem is not that AI hallucinates. The problem is that lawyers forget to interrogate the hallucination before adopting it.
Never rely on an AI that has not been cross-examined.
Frustration is mounting among both lawyers and judges. Charlotin’s global hallucination database reads like a catalogue of avoidable errors. Lawyers cite nonexistent cases, rely on invented quotations, or submit timelines that collapse the moment a judge asks a basic question. Courts have stopped treating these problems as innocent misunderstandings about new technology. Increasingly, they see them as failures of competence and diligence.
The encouraging news is that hallucinations collapse under even moderate questioning. AI improvises confidently in silence. It becomes accurate under pressure.
That pressure is supplied by cross-examination.
Team approach to AI prep works well, including other AIs.
IV. Five Cross-Examination Techniques for AI
The techniques below are adapted from how lawyers question both their own experts and adverse ones. They require no technical training. They rely entirely on skills lawyers already use: asking clear questions, demanding reasoning, exposing assumptions, and verifying claims.
The five techniques are:
Ask for the basis of the opinion.
Probe uncertainty and limits.
Present the opposing argument.
Test internal consistency.
Build a verification pathway.
Each can be implemented through simple, repeatable prompts.
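To make that concrete, here is a minimal Python sketch, my illustration and not part of the original article, that packages the five techniques as reusable prompt templates. The dictionary keys and the function name are hypothetical; the prompt wording is drawn from the examples discussed below.

```python
# Illustrative sketch only (author's techniques, my code): the five
# cross-examination techniques expressed as reusable prompt templates.

CROSS_EXAM_PROMPTS = {
    "basis": (
        "Walk me through your reasoning step by step. List the elements, "
        "the leading cases, and the authorities you are relying on. "
        "For each step, explain why the case applies."
    ),
    "uncertainty": (
        "What do you not know that might affect this conclusion? "
        "Which assumptions are unstated or speculative?"
    ),
    "opposition": "Give me the strongest argument against your conclusion.",
    "consistency": (
        "Restate your answer using a different structure and identify "
        "any inconsistencies with your prior answer."
    ),
    "verification": (
        "For every case you cite, give the full citation and the exact "
        "quoted language so I can check it in a trusted database."
    ),
}

def cross_examine(question: str, technique: str) -> str:
    """Append one cross-examination follow-up to a research question."""
    return f"{question}\n\n{CROSS_EXAM_PROMPTS[technique]}"

# Example: build the follow-up for the negligent misrepresentation question.
print(cross_examine(
    "What's the law on negligent misrepresentation in Florida?", "basis"))
```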
1. Ask for the Basis of the Opinion
AI developers use the word “mechanism.” Lawyers use reasoning, methodology, procedure, or logic. Whatever the label, you need to know how the model reached its conclusion.
Instead of asking, “What’s the law on negligent misrepresentation in Florida?” ask:
“Walk me through your reasoning step by step. List the elements, the leading cases, and the authorities you are relying on. For each step, explain why the case applies.”
This produces a reasoning ladder rather than a polished paragraph. You can inspect the rungs and see where the structure holds or collapses.
Ask AI explicitly to:
identify each reasoning step
list assumptions about facts or law
cite authorities for each step
rate confidence in each part of the analysis
If the reasoning chain buckles, the hallucination reveals itself.
Click here for short YouTube video animation about reasoning cross.
2. Probe Uncertainty and Limits
AI tries to be helpful and agreeable. It will give you certainty, even when that certainty is fake. The original AI training data from the Internet never said, “I don’t know the answer.” So now you have to train your AI, in prompts and project instructions, to admit when it does not know. You must demand honesty. You must demand truth over agreement with your own thoughts and desires. Repeatedly instruct your AI to admit when it does not know the answer or is uncertain. Get it to explain what it does not know and what it cannot support with citations. Get it to reveal the unknowns.
Most AIs do not like to admit they don’t know. Do you?
Ask your AI:
“What do you not know that might affect this conclusion?”
“What facts would change your analysis?”
“Which part of your reasoning is weakest?”
“Which assumptions are unstated or speculative?”
Good human experts do this instinctively. They mark the edges of their expertise. AI will also do it, but only when asked.
Click here for YouTube animation of AI cross of its unknowns.
3. Present the Opposing Argument
If you only ask, “Why am I right?” AI will gladly tell you why you are right. Sycophancy is one of its worst habits.
Counteract that by assigning it the opposing role:
“Give me the strongest argument against your conclusion.”
“How would opposing counsel attack this reasoning?”
“What weaknesses in my theory would they highlight?”
This is the same preparation you would do with a human expert before deposition: expose vulnerabilities privately so they do not explode publicly.
Quality control by counter-arguments. Click here for short YouTube animation.
4. Test Internal Consistency
Hallucinations are brittle. Real reasoning is sturdy.
You expose the difference by asking the model to repeat or restructure its own analysis.
“Restate your answer using a different structure.”
“Summarize your prior answer in three bullet points and identify inconsistencies.”
“Explain your earlier analysis focusing only on law; now do the same focusing only on facts.”
If the second answer contradicts the first, you know the foundation is weak.
This is impeachment in the office, not in the courtroom.
Click here for YouTube animation on contradictions.
5. Build a Verification Pathway
Hallucinations survive only when no one checks the sources.
Verification destroys them.
Always:
read every case AI cites and make sure the cited court actually issued the opinion (of course, also check case history to verify it is still good law)
confirm that the quotations appear in the opinion (sometimes small errors creep in)
check jurisdiction, posture, and relevance (normal lawyer or paralegal analysis)
verify every critical factual claim and legal conclusion
This is not “extra work” created by AI. It is the same work lawyers owe courts and clients. The difference is simply that AI can produce polished nonsense faster than a junior associate. Overall, after you learn the AI testing skills, the time and money saved will be significant. This associate practically works for free with no breaks for sleep, much less food or coffee.
Your job is to slow it down. Turn it off while you check its work.
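Here is a minimal sketch of that verification pathway as code. This is my construction, not the article’s method; the lookup_case helper is hypothetical and stands in for a query to a trusted research database such as Westlaw, Lexis, or CourtListener.

```python
# Illustrative sketch only (my construction, not a real research API).
def lookup_case(citation: str) -> dict | None:
    """Hypothetical stub: returns opinion text and metadata, or None.
    Until wired to a real database, it treats every citation as unfound."""
    return None

def verify_citations(cited: list[dict]) -> list[str]:
    """Flag citations that cannot be confirmed and quotes that do not appear."""
    problems = []
    for item in cited:
        opinion = lookup_case(item["citation"])
        if opinion is None:
            # Possible fabrication: no such case in any reporter.
            problems.append(f"{item['citation']}: not found in any reporter")
            continue
        for quote in item.get("quotes", []):
            if quote not in opinion["text"]:
                problems.append(f"{item['citation']}: quote not in opinion")
    return problems

print(verify_citations([{"citation": "Mata v. Avianca, Inc.", "quotes": []}]))
# Every citation is flagged until a real lookup is wired in.
```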
Always carefully check the work of your AIs.
V. How Cross-Examination Dramatically Reduces Hallucinations
Cross-examination is not merely a metaphor here. It is the mechanism — in the lawyer’s meaning of the word — that exposes fabrication and reveals truth.
Consider three realistic hypotheticals.
1. E-Discovery Misfire
AI says a custodian likely has “no relevant emails” based on role assumptions.
You ask: “List the assumptions you relied on.”
It admits it is basing its view on a generic corporate structure.
You know this company uses engineers in customer-facing negotiations.
Hallucination avoided.
2. Employment Retaliation Timeline
AI produces a clean timeline that looks authoritative.
You ask: “Which dates are certain and which were inferred?”
AI discloses that it guessed the order of two meetings because the record was ambiguous.
You go back to the documents.
Hallucination avoided.
3. Contract Interpretation
AI asserts that Paragraph 14 controls termination rights.
You ask: “Show me the exact language you relied on and identify any amendments that affect it.”
It re-reads the contract and reverses itself.
Hallucination avoided.
The common thread: pressure reveals quality.
Without pressure, hallucinations pass for analysis.
Work closely with your AI to improve and verify its output.
VI. Why Litigators Have a Natural Advantage — And How Everyone Else Can Learn
Litigators instinctively challenge statements. They distrust unearned confidence. They ask what assumptions lie beneath a conclusion. They know how experts wilt when they cannot defend their methodology.
But adversarial reasoning is not limited to courtrooms. Transactional lawyers use it in negotiations. In-house lawyers use it in risk assessments. Judges use it in weighing credibility. Paralegals and case managers use it in preparing witnesses and assembling factual narratives.
Anyone in the legal profession can practice:
asking short, precise questions
demanding reasoning, not just conclusions
exploring alternative explanations
surfacing uncertainty
checking for consistency
Cross-examining AI is not a trial skill. It is a thinking skill — one shared across the profession.
Thinking like a lawyer is a prerequisite for AI training; be skeptical and objective.
VII. The Lawyer’s Advantage Over AI
AI is inexpensive, fast, tireless, and deeply cross-disciplinary. It can outline arguments, summarize thousands of pages, and identify patterns across cases at a speed humans cannot match. It never complains about deadlines and never asks for a retainer.
Human experts outperform AI when judgment, nuance, emotional intelligence, or domain mastery are decisive. But those experts are not available for every issue in every matter.
AI provides breadth. Lawyers provide judgment.
AI provides speed. Lawyers provide skepticism.
AI provides possibilities. Lawyers decide what is real.
Properly interrogated, AI becomes a force multiplier for the profession.
Uninterrogated, it becomes a liability.
Good lawyers challenge and refine their AI output.
VIII. Courts Expect Verification — And They Are Right
Judges are not asking lawyers to become engineers or to audit model weights. They are asking lawyers to verify their work.
In hallucination sanction cases, courts ask basic questions:
Did you read the cases before citing them?
Did you confirm that the case exists in any reporter?
Did you verify the quotations?
Did you investigate after concerns were raised?
When the answer is no, blame falls on the lawyer, not on the software.
Verification is the heart of legal practice.
It just takes a few minutes to spot and correct the hallucinated cases. The AI needs your help.
IX. Practical Protocol: How to Cross-Examine Your AI Before You Rely on It
A reliable process helps prevent mistakes. Here is a simple, repeatable, three-phase protocol.
Phase 1: Prepare
Give procedural posture, factual background, and applicable law.
Request reasoning and sources up front.
Tell AI you will be reviewing the foundation.
Phase 2: Interrogate
Ask for step-by-step reasoning.
Probe what the model does not know.
Have it argue the opposite side.
Ask for the analysis again, in a different structure.
This phase mimics preparing your own expert — in private.
Phase 3: Verify
Check every case in a trusted database.
Confirm factual claims against your own record.
Decide consciously which parts to adopt, revise, or discard.
Do all this, and if a judge or client later asks, “What did you do to verify this?” you will have a real answer.
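One way to make sure that answer exists in writing is to keep a simple verification record. A minimal sketch, my illustration rather than part of the protocol itself, with hypothetical field names:

```python
# Illustrative sketch (my addition): a minimal verification record so you
# can later show exactly what you checked and when.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CitationCheck:
    citation: str
    found_in_reporter: bool   # Did the case exist in a trusted database?
    quotes_confirmed: bool    # Did every quotation appear in the opinion?
    still_good_law: bool      # Any negative subsequent history?
    notes: str = ""

@dataclass
class VerificationLog:
    matter: str
    checked_on: date
    checks: list[CitationCheck] = field(default_factory=list)

log = VerificationLog(matter="Example v. Hypothetical", checked_on=date.today())
log.checks.append(CitationCheck(
    citation="Park v. Kim, 91 F.4th 610 (2d Cir. 2024)",
    found_in_reporter=True, quotes_confirmed=True, still_good_law=True))
```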
It takes some training and experience, but keeping your AI under control is really not that hard.
X. The Positive Side: AI Becomes Powerful After Cross-Examination
Once you adopt this posture, AI becomes far less dangerous and far more valuable.
When you know you can expose hallucinations with a few well-crafted questions, you stop fearing the tool. You start seeing it as an idea generator, a drafting assistant, a logic checker, and even a sparring partner. It shows you the shape of opposing arguments. It reveals where your theory is vulnerable. It highlights ambiguities you had overlooked.
Cross-examination does not weaken AI.
It strengthens the partnership between human lawyer and machine.
Cross-examining your AI is not a theatrical performance. It is the methodical preparation that seasoned litigators use whenever they evaluate expert opinions. When you ask AI for its basis, test alternative explanations, probe uncertainty, check consistency, and verify its claims, you transform raw guesses into analysis that can withstand scrutiny.
Complex assignments always take more time but the improved quality AI can bring is well worth it.
Courts are no longer forgiving lawyers who fall for a sycophantic AI and skip this step. But they respect lawyers who demonstrate skeptical, adversarial reasoning — the kind that prevents hallucinations, avoids sanctions, and earns judicial confidence. More importantly, this discipline unlocks AI’s real advantages: speed, breadth, creativity, and cross-disciplinary insight.
The cure for hallucinations is not technical.
It is skeptical, adversarial reasoning.
Cross-examine first. Rely second.
That is how AI becomes a trustworthy partner in modern practice.
See the animation of our goodbye summary on the YouTube video. Click here.
For years, technologists have promised that fully autonomous AI Agents were just around the corner, always one release away, always about to replace entire categories of work. Then Stanford and Carnegie Mellon opened the box and observed the Agents directly. Like Schrödinger’s cat, the dream of flawless autonomy did not survive the measurement.
Observation reveals fragile AI Agents. All images in this article are by Ralph Losey using various AI tools.
What did survive was something far more practical: hybrid human–AI teaming, which outperformed autonomous Agents by a decisive 68.7%. If you care about accuracy, ethics, or your professional license, this is the part of the AI story you need to understand.
Humans can work much better if augmented by AI Agents but the Agents alone fail fast.
1. Introduction to the New Study by Carnegie Mellon and Stanford
Just to be clear what we mean here by AI Agent, Wikipedia provides a generally accepted definition of an Agent as “an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge.”
So, you see, most everyone thinks of AI Agents and autonomy as synonymous. The Study bursts that bubble. It shows that Agents today need a fair amount of human guidance to be effective, and that they fail too often, and too fast, without it.
This is the real world of AI Agents that we live in today.
The Study Introduction (citations omitted) begins this way:
AI agents are increasingly developed to perform tasks traditionally carried out by human workers as reflected in the growing competence of computer-use agents in work-related tasks such as software engineering and writing. Nonetheless, they still face challenges in many scenarios such as basic administrative or open-ended design tasks, sometimes creating a gap between expectations and reality in agent capabilities to perform real-world work.
To further improve agents’ utility at such tasks, we argue that it is necessary to look beyond their end-task outcome evaluation as measured in existing studies and investigate how agents currently perform human work — understanding their underlying workflows to gain deeper insights into their work process, especially how it aligns or diverges from human workers, to reveal the distinct strengths and limitations between them. Therefore, such an analysis should not benchmark agents in isolation, but rather be grounded in comparative studies of human and agent workflows.
Studying AI and Human workflows to evaluate AI Agent performance.
2. More Detail on the Study: What the researchers did and found
Scope & setup. The Carnegie/Stanford team compared the work of 48 qualified human professionals with four AI agent frameworks. The software included stand-alone ChatGPT-based agents (version four series) and software code-writing agent platforms like OpenHands, also using ChatGPT version four series levels. These programs were “wraps”—software layers built on top of a third-party generative AI engine. A wrap adds specialized tools, interfaces, and guardrails while relying on the underlying model for generative AI capabilities. In the legal world, this is similar to how Westlaw and Lexis offer AI assistants powered by ChatGPT under the hood, but wrapped inside their own proprietary databases, interfaces, and safety systems.
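To make the “wrap” idea concrete, here is a minimal sketch. This is my construction for illustration, not the Study’s software or any vendor’s actual API; the class, the guardrail, and the stand-in model are all hypothetical.

```python
# Illustrative sketch (mine, not from the Study) of a "wrap": a thin layer
# that adds guardrails on top of an underlying third-party generative model.
from typing import Callable

class WrappedAgent:
    def __init__(self, model: Callable[[str], str], allowed_sources: set[str]):
        self.model = model                  # third-party engine, e.g. an LLM API
        self.allowed_sources = allowed_sources

    def ask(self, prompt: str, source: str) -> str:
        # Guardrail: refuse to work outside approved repositories.
        if source not in self.allowed_sources:
            raise PermissionError(f"Source {source!r} is not approved.")
        return self.model(prompt)

# A stand-in model for demonstration; a real wrap would call the vendor's API.
echo_model = lambda p: f"[model answer to: {p}]"
agent = WrappedAgent(echo_model, allowed_sources={"client_record"})
print(agent.ask("Summarize the deposition.", source="client_record"))
```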
The Study used 16 realistic tasks that required multiple coordinated steps, tools, and decisions—what the researchers call long-horizon tasks. They require a series of prompts and steps, such as preparing a quarterly finance report, analyzing stock-prediction data, or designing a company landing page. The fully automated Agent tried to do most everything by writing code, whereas the humans used multiple tools, including AI and tools that included AI. This was a kind of hybrid or augmented method that did not attempt to closely incorporate the Agents into the workflow.
To observe how work was actually performed, the authors built what they called a workflow-induction toolkit. Think of it as a translation engine: it converts the raw interaction data of computer use (clicks, keystrokes, file navigation, tool usage) into readable, step-by-step workflows. The workflows reveal the underlying process, not just the final product. The 16 tasks are supposed to collectively represent 287 computer-using U.S. occupations and roughly 71.9% of the daily activities within them. For lawyers and others outside of these occupations the relevance comes from the overlap in task structure, not subject matter.
The engineering and design tasks don’t map directly to legal work but are useful for observing where agents tend to fail on open-ended or visually dependent steps.
The structured writing tasks are similar to legal drafting (e.g., memos, policies, summaries), although it is important to note that the writing tasks in the Study were not persuasion- or adversarial-oriented.
The data-analysis tasks parallel evidence evaluation, damages models, timeline building, and spreadsheet-based work that litigators do every day.
The administrative/computational tasks resemble the work of preparing exhibits, reconciling data, or generating chronologies.
Agents were fast but made too many mistakes to be useful in anything but very structured tasks. Human judgment rules.
3. Key Findings of the Study.
1. Human-led Hybrid Agent workflows are much more efficient and accurate than AI Agents working alone. When AI is integrated into existing human workflows (the Hybrid approach, aka Augmented approach) there is “minimal disruption and 24.3% efficiency improvement.” Id. As shown in Figure 7 of the Study: “Humans complete work with higher quality (a), while agents possess a huge advantage in efficiency (b). Teaming human and agent workers based on their respective advantages ensures task accuracy and improves efficiency by 68.7%.” Id. at pg. 14. This key finding is the essence of the hybrid methodology, which the Study refers to as AI Augmentation as opposed to AI automation. The next section will unpack what that means for law, medicine, and other high-stakes work.
2. Humans outperform AI agents on quality by a wide margin. When Agents work by themselves, without humans, their success rates are 32.5%–49.5% lower than when humans work alone without Agents (but still using software that includes AI). In the fully automated approach, the AI will fail in a number of ways, including fabricating (hallucinating) plausible data and misusing tools (for example, switching to web search instead of opening the files you give them).
3. AI Agents fail primarily because they default to writing code. Agents “write programs to solve essentially all tasks, even when equipped with and trained for UI interactions. . . . This is in stark contrast to the visual-oriented human workflows.” Mellon/Stanford Study, Id. at pg. 2. This programmatic bias drives both their speed and their multiple failures. Agents today are only reliable when the task itself is programmable and code-friendly. Of course, the AI still makes mistakes with code too, so humans still need to step in to verify or debug.
4. When AI agents do not fail, they are extremely efficient. For tasks completed by both groups (both solo Agents and solo human/AI), agents on average took 88.3% less time, used 96.4% fewer actions, and cost 90.4%–96.2% less than human workers alone. Still, the 32.5%–49.5% failure rate overall makes the fully automated (aka AI automation) solution only appropriate for code writing, and even there the AI still makes mistakes that require human intervention, mainly verification and debugging. As the Study explains:
Human workflows are substantially altered by AI automation, but not by AI augmentation (hybrid). One quarter of human activities we studied involve AI tools, with most used for augmentation purposes: integrating AI into existing workflows with minimal disruption, while improving efficiency by 24.3%. In contrast, AI automation markedly reshapes workflows and slows human work by 17.7%, largely due to additional time spent on verification and debugging.
Id. at pgs. 2, 11 figure 5.
4. Study Findings Support a Hybrid Workflow with Man and Machine Working Together
The Carnegie Mellon and Stanford research supports the AI work method I’ve used and advocated since 2012: hybrid multimodal, where humans and machines work together in multiple modes with strong human oversight. The Study found that meeting even minimal quality requirements demands close team effort, making full AI autonomy impractical.
Unsupervised, autonomous AI is just too unreliable for meaningful work. The Study also found that it is too sneaky to use without close supervision. It will make up false data that looks good to try to cover its mistakes. Agents simply cannot be trusted. Anyone who wants to do serious work with Agents will need to keep a close eye on them. This article provides suggestions on how to do that.
Click here for YouTube animation of a sneaky robot. Watch your money!
5. Study Consistent with Jagged Frontier research of Harvard and others.
The unpredictable unevenness of generative AI and its Agents is why “trust but verify” is not just a popular slogan, it is a safety rule.
With each new Release users find that AI competence is unpredictable.
6. Surprising Tasks Where Agents Still Struggle
You might expect AI agents to struggle on exotic, creative work. The Study shows something more mundane.
In addition to some simple math and word counts, AI Agents often tripped on:
Simple administrative and computer user interface (UI) steps. Navigating files, interpreting folder labels, or following naming conventions that a paralegal would understand at a glance.
Repetitive computational tasks that still require interpretation. For example, choosing which column or field to use when the instructions are slightly ambiguous.
Open-ended or visually grounded steps. Anywhere the task depends on “seeing” patterns in a chart or layout rather than following a crisp rule.
The pattern is consistent with other research: agents excel when a task can be turned into code, and they wobble along a jagged edge of competency when the task requires context, interpretation, or judgment.
That is why the 68.7% improvement in hybrid workflows is so important. The best results came when the human handled the ambiguous, judgment-heavy step and then let the agent run away with the programmable remainder.
Here is a good take-away memory aid:
7. What Agent “Failure” Looks Like
The Mellon/Stanford paper is especially useful because it does not just report scores. It shows how the AI agents went wrong.
When agents failed, the failures usually fell into two categories:
Fabrication. When an agent could not parse an image-based receipt or understand a field, it sometimes filled in “reasonable” numbers anyway. In other words, it invented or hallucinated data instead of admitting it was stuck. It is the Mata v. Avianca case all over again, making up case law when it could not find any. See Navigating AI’s Twin Perils: The Rise of the Risk-Mitigation Officer (e-Discovery Team, 7/28/25). That is classic hallucination, but now wrapped inside a workflow that looks productive.
Tool misuse. In some trials, agents abandoned the PDFs or files supplied by the user and went to fetch other materials from the web. For lawyers, that is a data-provenance nightmare. You think you are working from the client’s record. The agent quietly swaps in something else, often without any alert to the user. This suggests yet another challenge for AI Risk-Mitigation Officers, which I predict will soon be a hot new field for tech-savvy lawyers.
The authors of the Mellon/Stanford Study explicitly flag these behaviors. As will be discussed, the new version five series of ChatGPT AI and other equivalent models such as Gemini 3, may have lessened these risks, but the problem remains.
For legal practice and other high-stakes matters such as medical, the takeaway is simple: if you do not supervise the workflow and do not control the sources, you will not even know when you left the record, or what is real and what is fake. That may be fine for hairstyles but not for Law.
Hairstyle by a hallucinating AI. Is this hair real or fake?
8. Legal Ethics and Professionalism: Competence, Supervision, Confidentiality
Nothing in the Agent Study changes the fundamentals of legal ethics. It sharpens them.
Competence now includes understanding how AI works well enough to use and supervise it responsibly. ABA Model Rule 1.1.
Supervision means treating agents like junior lawyers or vendors: define their scope, demand logs, and review their work before it touches a client or court. Rule 5.1.
Confidentiality means knowing where your data goes, how it is stored, and which models or services can access it. Rule 1.6.
The same logic applies to medical ethics and professional standards in other regulated fields. In all of them, responsibility remains with the human professional.
As I argued in AI Can Improve Great Lawyers—But It Can’t Replace Them, the highest-value legal knowledge is contextual, emergent, and embodied. The same is true of the highest-value medical judgment. It cannot be bottled and automated. Agents are tools, not professionals with standing.
Now that Agents have emerged and we’ve seen their abilities, we know they are just tools, and fragile ones at that.
9. Do Not Over-Generalize: What the Study does and does not cover
Before we map this into legal workflows, it is important to stay within the boundaries of the evidence.
The occupational tasks that Stanford and Carnegie Mellon researched were all performed in office-style, structured, sandboxed environments.
The legal profession should treat the results as directly relevant only to:
Structured drafting,
Evidence and data analysis,
Spreadsheet and dashboard work,
Document-heavy desk work that has clear inputs and outputs.
The tasks studied do not directly answer questions about:
Final legal conclusions,
Persuasive writing to judges or juries,
Ethical decisions, strategy, or settlement judgment.
The Study only covered a few computer tasks performed by legal professionals and did not include any non-computer tasks.
10. What the Findings Mean for Legal Workflows
The natural question for any lawyer is: So where does this help me, and where does it not? The answer lines up nicely with the task categories in the Study.
A. Structured drafting as legal building blocks
The writing tasks in the paper look a lot like the templated components of much legal writing:
Fact sections and chronologies,
Procedural histories,
Policy and compliance summaries,
Standardized client alerts and internal memos.
These are places where agents can:
Produce reasonable first drafts quickly,
Enforce consistency of structure and style,
Help with cross-references, definitions, and internal coherence.
Humans still need to control:
Tone, emphasis, and narrative arc,
Which facts matter for the client and the forum,
How much assertion or restraint is appropriate.
The right pattern is: let the agent assemble and polish the building blocks; you decide which building you are constructing.
I’ve also documented the power of AI-driven expert brainstorming across dozens of experiments over the past two years. For readers who want to explore that thread, I’ve compiled those Panel of Experts studies in one place called Brainstorming.
AI is great at brainstorming creative solutions.
B. Evidence analytics as data analysis
The data-analysis type of work included in the Study maps cleanly to some litigation and investigation tasks:
Damages models and exposure estimates,
Budget and variance analyses,
Timeline and attendance compilations,
De-duplication and reconciliation of overlapping datasets,
Citation and reference tables.
Here the speed gains are real. Having an agent pull, group, and calculate from labeled inputs can save hours.
But that 37.5% error rate on calculations is a red flag. Again, the multimodal method shows the way. For legal work, the rule of thumb should be:
Agents may calculate.
Humans must verify.
You can treat agent results like you would a junior associate’s complex spreadsheet: extremely useful, never unquestioned.
C. Legal research and persuasion are different animals
It is tempting to read “writing” and “analysis” and think this Study blesses full-blown AI Agent legal research and brief-writing. It does not.
The tasks in the paper do not measure:
Authority-based research quality,
Case-law synthesis under jurisdictional constraints,
Persuasive legal writing aimed at a specific judge or tribunal.
Those domains depend heavily on:
Judgment,
Ethics and candor,
Audience calibration,
Deep understanding of rules and standards.
That is the territory I have called the human edge in earlier writings. AI can assist along the jagged frontier, but it cannot replace the lawyer’s role.
Humans have an edge over AI in everything except rational thinking and knowledge.
11. Hybrid Centaurs, Cyborgs, and the 68.7% Result
For two and a half years, since I first heard the concepts and language used by Wharton Professor Ethan Mollick (From Centaurs To Cyborgs), I have used the Centaur → Cyborg metaphor and grid as a simple way to write about hybrid AI use:
Centaur. Clear division of labor. The human does one task; the AI does a related but distinct task. Strategy and judgment remain fully human. The AI does scoped work such as writing code, outline and first-draft generation, summarizing, or checking. Some foolish users of this method fail to verify the AI (horsey) part.
Cyborg. Tighter back-and-forth. Human and AI work in smaller alternating steps. The lawyer starts; the AI refines; the lawyer revises; the AI restructures. Tasks are intertwined rather than separated. Supervision is inherent to the process. The Study suggests this is the best way to perform Agentic tasks.
Centaur+Cyborg is a good way to navigate the jagged edge and use AI Agents.
The Cyborg type of Hybrid workflow is good for AI Agents because:
Augmentation inside human workflows (Centaur-like use) speeds people up by 24.3%.
End-to-end full automation slows people down by 17.7% because of the review burden.
Step-level teaming, where the human handles the non-programmable judgment steps and the agent handles the rest in a close, intermingled process improves performance by 68.7% with quality intact. That is Hybrid, Cyborg-style work done correctly.
Humans and AI working closely together step by step.
12. Best-Practice Argument: Hybrid, Multimodal Use Should Be the Standard of Care—Especially in Law and Medicine
For more than a decade, my position has been consistent: the safest and most effective way to use AI in any high-stakes domain is hybrid and multimodal. That means:
Multiple AI capabilities working together (language, code, retrieval, vision),
Combined with traditional analytic tools (databases, spreadsheets, review platforms),
All orchestrated by humans who remain responsible for judgment, ethics, and outcomes.
Humans conduct an orchestra AI of instruments.
I first developed this view in e-discovery using active machine learning, but it maps cleanly to agentic AI systems and now extends well beyond law. The Carnegie/Stanford Study provides the empirical foundation: hybrid, supervised workflows outperform fully autonomous ones in speed and quality.
The evidence and professional obligations point in the same direction: hybrid, multimodal AI use under strong human oversight is not a temporary workaround. It is the durable, long-term standard of care for law, medicine, and any profession where judgment and accountability matter.
AI has no emotions or intuition—only clever wordplay.
Get the dualities to work together and you have Hybrid Augmentation Supremacy.
13. Risk and Governance: A Quick Checklist for Lawyers, Legal Ops, and Other High-Stakes Teams
The Carnegie/Stanford Study gives us concrete failure modes. Risk management should respond to those, not hypotheticals. Here is a short “trust but verify” checklist designed for law but conceptually adaptable to medicine and other high-stakes fields.
A. Provenance or it is not used.
Require page, line, or document IDs for every fact an agent surfaces. If there is no source anchor, the output does not get used. If speculation must be included, you should label it as such. In clinical settings the analogue is clear: no untraceable data, images, or derived metrics.
B. No blind web pivots.
Agents that “helpfully” fetch other files when they cannot parse your materials must be constrained. In law, that means they stay within the client record or approved data repositories. In medicine, the agent must not silently mix in external data that is not part of the patient’s chart.
C. Fabrication drills.
Regularly feed the system bad PDFs or deliberately ambiguous instructions, then watch for made-up numbers or invented content. Document what you catch and fix prompts, policies, and configuration. Health systems can do the same with flawed test inputs and simulated charts.
D. Mark human-only steps.
Identify steps that are inherently non-programmable, such as visual judgments, privilege calls, contextual inferences, settlement strategy, or ethical decisions. In medicine, the parallels are differential diagnosis, treatment choice, risk discussion, and consent. These remain human steps. An AI should never deliver a fatal diagnosis.
Combine the unique skills of each kind of intelligence and know when to step from one to another.
E. Math checks are mandatory.
A 37.5% error rate in data-analysis tasks is more than enough to require independent human verification. Use template calculations, cross-checks, and a second set of human eyes any time numbers affect a client or patient outcome.
F. Logging and replay.
Turn on action logs for every delegation: files touched, tools invoked, transformations run. If the platform cannot log, it is not appropriate for high-stakes legal or clinical work. (A minimal logging sketch follows this checklist.)
G. Disclosure and confidentiality.
Disclose AI use when rules, regulations, or reasonable expectations require it. Keep agents confined to narrow, internal repositories when handling client or patient data. Treat them at least as carefully as you would any other third-party system with sensitive information.
H. Bottom line:
Fabrication and tool misuse are not hypothetical. The Study observed and measured them. You should assume they will occur and design your governance accordingly.
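To illustrate item F, here is a minimal sketch of an append-only action log. This is my construction; no specific agent platform’s logging API is assumed, and the file name and field names are hypothetical.

```python
# Illustrative sketch (my addition, not a product's API): a minimal action
# log for agent delegations, so every file touched and tool invoked can be
# reviewed and replayed later.
import json
import time

class ActionLog:
    def __init__(self, path: str):
        self.path = path

    def record(self, actor: str, action: str, target: str, **details) -> None:
        """Append one timestamped entry as a JSON line."""
        entry = {"ts": time.time(), "actor": actor,
                 "action": action, "target": target, **details}
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")   # append-only, easy to audit

log = ActionLog("agent_actions.jsonl")
log.record("agent-1", "open_file", "client_record/contract.pdf")
log.record("agent-1", "run_tool", "spreadsheet_sum", column="damages")
```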
The tendency of AI to make things up, to hallucinate, is lessening as the models improve, but it is still a real threat, as is one of its causes, sycophancy.
14. Counter-Arguments and Rebuttals
You may hear pushback against the hybrid method from some technologists who argue for full automation; after all, that is how Wikipedia defines an Agent, as fully autonomous. That has always been the dream of many in the AI community. You will also hear the opposite criticism, frequently from legal colleagues, who resist the use of AI, at least in any meaningful way. The Study frustrates both camps—automation maximalists and AI-averse traditionalists—because its empirical findings support neither worldview as they currently argue it.
A. “AI is just a passing fad.”
The anti-AI argument is also strong and based on powerful fears. Still, the legal profession must not allow itself a Luddite nap. Those of us who use AI safely every day are working hard to address those concerns. See, for example, the law review article I wrote this year with my friend, Judge Ralph Artigliere (retired), who did most of the heavy lifting: The Future Is Now: Why Trial Lawyers and Judges Should Embrace Generative AI Now and How to Do it Safely and Productively (American Journal of Trial Advocacy, Vol. 48.2, Spring 2025).
B. “Full autonomy is imminent; hybrids are a temporary crutch.”
Autonomy is improving, but the current evidence contradicts claims of imminent AGI, much less super-intelligence. Instead, it shows:
programmatic bias,
low success rates, and
failure modes that directly implicate ethics, confidentiality, and safety.
That is why the authors of the Carnegie/Stanford paper recommend designs inspired by human workflows and step-level teaming, not unsupervised handoff. In fields like law and medicine, where standards of care and liability apply, hybrid is not a crutch, it is the design pattern.
Soon, the cyborg connection and control tools that humans use to work with AI will be design patterns too. Stylish new types of tattoos and jewelry may become popular as we evolve beyond the decades-old smartphone obsession. See, e.g., Jony Ive’s $6.5 billion sale to OpenAI of his famous design company, which designed iPhones for Apple.
Next generation computer links will emerge as we evolve beyond smart phones. Early forms of smart glasses and pendants are already available. I predict electric tattoos and hats will come next.
C. “Hybrid is slower than full automation.”
The Study’s own numbers say otherwise:
augmentation inside human workflows, the hybrid team method, speeds people up by 24.3%;
attempted end-to-end automation slows people down by 17.7% because the verification and debugging of AI mistakes reduce the gains.
Hybrid done correctly is faster and safer than human-only practice. Autonomous AI is fast, and often clever, but its tendencies to err and fabricate make it too risky to let loose in the wild.
D. “Quality control can be automated away.”
Not for high-stakes work. The 37.5% data-analysis error rate and the fabrication examples are exactly the kind of failures automation does not see. Quality is judgment in context: applying rules to facts, weighing risk, and making trade-offs with human beings in mind. That is lawyer and medical work. While I agree some quality control work can be automated, especially by applying metrics, not all can be. The universe is too complex, the variables too many. We will always need humans in the loop, although their work to ensure excellence will constantly change.
E. “Agents already beat humans across the board.”
Where both succeed, agents are usually faster and cheaper. That is good news. But their success rates are still 32.5% to 49.5% lower. In law or medicine, a fast wrong answer is not a bargain, it is a liability. It could be a wrongful death. Hybrid workflows let you capture some of the speed and savings while keeping human-level or better quality.
The Study shows you have to keep a qualified human at the helm of Hybrid teams.
15. The New Working Rules: H-Y-B-R-I-D
These rules apply in law, medicine, and any other field that cannot afford unreviewed error. [Side Note: AI came up with this clever mnemonic, not me, but it knows I like this sort of thing.]
H Human in charge. Strategy, conclusions, and sign-off stay human.
Y Yield programmable steps to agents. Let agents handle tasks they can do well.
B Boundaries and bans. Define no-go areas: final legal opinions, privilege calls, etc.
R Review with provenance. If there is no source or traceable input, the output is not used.
I Instrument and iterate. Turn on logs, run regular fabrication drills, and update checklists.
D Disclose and document. Inform and document efforts when AI is used in a significant manner.
16. Does the November 2025 Study’s Use of Last Month’s Models Already Make It Obsolete?
After the Study was completed new models of AI were released that purport to improve on the accuracy and reduce the hallucinations of AI Agents. These are not empty claims. I am seeing this in my daily hands-on use of the latest AI. Still, I also see that every improvement seems to create new, typically more refined issues.
The advances in AI models do not change the structural lessons:
Agents still prefer programmatic paths over messy reality.
Step-level teaming still beats blind delegation, especially in risk sensitive occupations.
Logging, provenance, and supervision remain non-negotiable wherever high standards of care apply.
Hybrid is not a temporary workaround while we wait for some imagined fully autonomous professional AI. It is the durable operating model for AI in work, especially in legal work, medical, and other fields where judgment and accountability matter. The AI can augment and improve your work.
Conclusion: Keep Humans in Command And Start Practicing Hybrid Now
The Carnegie/Stanford evidence confirms what those of us working hands-on with AI already know: Agents are astonishingly fast, relentlessly programmatic, and sometimes surprisingly brittle. Humans, on the other hand, bring judgment, spirit, context, and accountability, but not speed. When you combine those strengths intentionally—working in a close back-and-forth rhythm—you get the best of both worlds: speed with quality and real human awareness. That is the advanced cyborg style of hybrid practice.
And no, it is not the fully autonomous Agent that nerds and sci-fi optimists like me once dreamed about. But it is the world that researchers observed when they opened the box. Thank you, Stanford and Carnegie Mellon, for collapsing yet another Schrödinger cat.
Observations burst another SciFi fantasy bubble about AI Agents.
Hybrid multimodal practice is not a temporary bridge. It is what agency actually looks like today. It is the durable operating model for law, medicine, engineering, finance, and every other field where errors matter and consequences are real. The Study shows that when humans handle the contextual, ambiguous, and judgment-heavy steps—and agents handle the programmable remainder—overall performance improves by 68.7% with quality intact. That is not a footnote. That is a strategy.
So the message for lawyers, clinicians, and every high-stakes professional is straightforward:
Use the machine. Supervise the machine. Do not become the machine.
These future humans are in control of their fashionable new AI devices. You don’t want to know what is under their hats!
Here is your short action plan—the first steps toward responsible AI practice:
Adopt the H-Y-B-R-I-D system across your team. It operationalizes the Study’s lessons and bakes verification into daily habits.
Instrument your agents. If a tool cannot log its actions, replay its steps, or anchor its facts, it does not belong in high-stakes work.
Shift to cyborg-style hybrid teaming, where humans handle judgment calls and agents handle the programmable portions of drafting, evidence analysis, spreadsheet work, and data tasks.
Train everyone on trust-but-verify behaviors, not as a slogan but as the muscle memory of modern practice.
Those who embrace hybrid intelligently will see their output improve, their risk decline, and their judgment sharpen. Those who avoid it—or try to leap straight to full autonomy—will struggle.
The future of professional practice is not human versus machine.
It is human judgment amplified by machine speed, with the human still holding the pen, signing the orders, and deciding what matters.
And that is exactly what the Study revealed when it opened the box on modern AI: not flawless autonomy, but the measurable advantage of humans and agents working together, each taking the steps they handle best.
Hybrid is here. Hybrid works. Now it’s time to practice it.
Echoes of AI Podcast
Click here to listen to two AIs talk about this article in a lively podcast format. Written by Google’s NotebookLM (not Losey). Losey conceived, produced, directed, and verified this 14-minute podcast. By the way, Losey found the AIs made a couple of small errors, but not enough to require a redo. See if you can spot the one glaring, but small, mistake. Hint: it had to do with the talk about wraps.
Echoes upon echoes—in random chance interference. All images in article by Ralph Losey using AI tools.
On October 22, 2025, Google announced that its Willow quantum chip had achieved a breakthrough using new software called—believe it or not—Quantum Echoes. The name made me laugh out loud. My article had used the phrase as metaphor throughout; Google was now using it as mathematics.
According to Google, this software achieved what scientists have pursued for decades: a verifiable quantum advantage. In my Quantum Echo article I had described that goal as “the moment when machines perform tasks that classical systems cannot.” No one had yet proven it, at least not in a way others could independently confirm. Google now claimed it had done exactly that—and 13,000 times faster than the world’s top supercomputers.
Verified Quantum Advantage: 13,000 times faster.
🔹 I. Introduction: Reverberating Echoes
Hartmut Neven, Founder and Lead of Google Quantum AI, and Vadim Smelyanskiy, Director of Quantum Pathfinding, opened their blog-post announcement with a statement that sounded less like marketing and more like expert testimony:
Quantum verifiability means the result can be repeated on our quantum computer—or any other of the same caliber—to get the same answer, confirming the result.
Verification is critical in both Science and Law; it is what separates speculation from admissible proof.
Still, words on a blog cannot match the sound of the experiment itself. In Google’s companion video, Quantum Echoes: Toward Real-World Applications, Smelyanskiy offered a picture any trial lawyer could understand:
Just like bats use echolocation to discern the structure of a cave or submarines use sonar to detect upcoming obstacles, we engineered a quantum echo within a quantum system that revealed information about how that system functions.
Screen shot (not AI) of the YouTube showing Vadim Smelyanskiy beginning his remarks.
Think of Willow, as Smelyanskiy suggests, as a kind of quantum sonar. Its team sent a signal into a sea of qubits, nudged one slightly—Smelyanskiy called it a “butterfly effect”—and then ran the entire sequence in reverse, like hitting rewind on reality to listen for the echo that returns. What came back was not static but music: waves reinforcing one another in constructive interference, the quantum equivalent of a choir singing in perfect pitch.
Smelyanskiy’s colleague Nicholas Rubin, Google’s chief quantum chemist, appeared in the video next to show why this matters beyond the lab:
Our hope is that we could use the Quantum Echo algorithm to augment what’s possible with traditional NMR. In partnership with UC Berkeley, we ran the algorithm on Willow to predict the structure of two molecules, and then verified those predictions with NMR spectroscopy.
That experiment was not a metaphor; it was a cross-examination of nature that returned a consistent answer. Quantum Echoes predicted molecular geometry, and classical instruments confirmed it. That is what “verifiable” means.
Neven and Smelyanskiy’s Our Quantum Echoes article added another analogy to anchor the imagery in everyday experience:
Imagine you’re trying to find a lost ship at the bottom of the ocean. Sonar might give you a blurry shape and tell you, ‘There’s a shipwreck down there.’ But what if you could not only find the ship but also read the nameplate on its hull?
That is the clarity Quantum Echoes provides—a new instrument able to read nature’s nameplate instead of guessing at its outline. The echo is now clear enough to read.
Willow quantum chip and Echoes software reveal new information in previously unheard of detail.
That image—sharper echoes, clearer understanding—captures both the scientific leap and the theme that has reverberated through this series: building bridges between quantum physics and the law. My earlier article was titled Quantum Echo; Google’s is Quantum Echoes. When I wrote mine, I had no idea Neven’s team was preparing a major paper for Nature—Observation of constructive interference at the edge of quantum ergodicity (Nature volume 646, pages 825–830, 10/23/25 issue date). More than a hundred Google scientists signed it. I checked and quantum ergodicity has to do with chaos, one of my favorite topics.
The study confirms what Smelyanskiy made visible with his sonar metaphor: Quantum Echoes measures how waves of information collide and reinforce each other, creating a signal so distinct that another quantum system can verify it.
So here we are—lawyers and scientists listening to the same echo. Google calls it the first “verifiable quantum advantage.” I call it the moment when physics cross-examined reality and got a consistent answer.
Quantum Computing will emerge soon from the lab to the legal practice. Will you be ready?
🔹 II. What Google’s Quantum Echoes Actually Did
Understanding what Google pulled off takes a bit of translation—think of it as turning expert testimony into plain English.
In the Quantum Echoes experiment, Smelyanskiy’s team did something that sounds like science fiction but is now laboratory fact. They sent a carefully designed signal into their 105-qubit Willow chip, nudged one qubit ever so slightly—a quantum “butterfly effect”—and then ran the entire operation in reverse, as if the universe had a rewind button. The question was simple: would the system return to its starting state, or would the disturbance scramble the information beyond recognition? What came back was an echo, faint at first and then unmistakable, revealing how information spreads and recombines inside a quantum world.
As the signal spread, the qubits became increasingly entangled—linked so that the state of each depended on all the others. In describing this process, Hartmut Neven explained that out-of-time-order correlators (OTOCs) “measure how quickly information travels in a highly entangled system.” Neven & Smelyanskiy, Our Quantum Echoes Algorithm, supra; also see Dan Garisto, Google Measures ‘Quantum Echoes’ on Willow Quantum Computer Chip (Scientific American, Oct. 22, 2025). That spreading web of entanglement is what allowed the butterfly’s tiny disturbance to ripple across the lattice and, when the sequence was reversed, to produce a measurable echo.
Visualization of the quantum qubit world created by the Willow chip’s lattice.
Physicists call this kind of rewind test an out-of-time-order correlator, or OTOC—a protocol for measuring how quickly information becomes scrambled. The Scientific American article described it with a metaphor lawyers may appreciate: like twisting and untwisting a Rubik’s Cube, adding one extra twist in the middle, then reversing the sequence to see whether that single move leaves a lasting mark. The team at Google took this one step further, repeating the scramble-and-unscramble sequence twice—a “double OTOC” that magnified the signal until the echo became measurable.
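For readers who want to peek under the hood, here is a toy version of that rewind test in Python. It is a minimal sketch only: six simulated qubits instead of Willow’s 105, a generic random unitary standing in for the chip’s circuits, and arbitrary choices for the butterfly and probe qubits. Nothing here is Google’s code; it just shows the forward-nudge-rewind logic of an OTOC.

```python
import numpy as np
from scipy.stats import unitary_group

n = 6          # simulated qubits; 2**6 = 64 amplitudes (Willow has 105 physical qubits)
dim = 2 ** n

# A Haar-random unitary stands in for the chip's scrambling forward evolution.
U = unitary_group.rvs(dim, random_state=0)

def pauli_x_on(qubit: int) -> np.ndarray:
    """Pauli-X (bit flip) acting on one qubit of the n-qubit register."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    I = np.eye(2, dtype=complex)
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, X if q == qubit else I)
    return out

B = pauli_x_on(0)       # the "butterfly": a tiny nudge on qubit 0
M = pauli_x_on(n - 1)   # a probe on a distant qubit

psi = np.zeros(dim, dtype=complex)
psi[0] = 1.0            # start in |000000>

# Control: with no butterfly, forward then reverse returns exactly to the start.
perfect = psi.conj() @ (U.conj().T @ U @ psi)
print(abs(perfect))     # exactly 1: a perfect rewind

# Out-of-time-order correlator: probe, evolve forward, nudge, rewind, probe again.
B_t = U.conj().T @ B @ U
echo = psi.conj() @ (B_t.conj().T @ M.conj().T @ B_t @ M @ psi)
print(abs(echo))        # near 1: echo survives; near 0: the nudge scrambled everything
```

The control run returns a perfect echo; the butterfly run shows how a single flipped qubit can scramble the returning signal, which is exactly the sensitivity the double-OTOC trick was designed to amplify into something measurable.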
Instead of chaos, they found harmony. The echo wasn’t noise—it was a pattern of waves adding together in what Nature called constructive interference at the edge of quantum ergodicity. As Smelyanskiy explained in the YouTube video:
What makes this echo special is that the waves don’t cancel each other—they add up. This constructive interference amplifies the signal and lets us measure what was previously unobservable.
In plain terms, the interference created a fingerprint unique to the quantum system itself. That fingerprint could be reproduced by any comparable quantum device, making it not just spectacular but verifiable. Smelyanskiy summarized it as a result that another machine—or even nature itself—can repeat and confirm.
Visualization of quantum wave interactions creating a unique fingerprint resonance.
The numbers tell the rest of the story. According to the Nature paper, reproducing the same signal on the Frontier supercomputer would take about three years. Willow did it in just over two hours—roughly 13,000 times faster (three years is roughly 26,000 hours). Observation of constructive interference at the edge of quantum ergodicity, Nature, vol. 646, pp. 825–830 (Oct. 23, 2025), at p. 829 (“Towards practical quantum advantage”).
That difference isn’t marketing; it marks the first clear-cut case where a quantum processor performed a scientifically useful, checkable computation that classical hardware could not.
Skeptics, of course, weighed in. Peer reviewers quoted in Scientific American called the work “truly impressive,” yet warned that earlier claims of quantum advantage have been surpassed as classical algorithms improved. But no one disputed that this particular experiment pushed the field into new territory: a regime too complex for existing supercomputers to simulate, yet still open to verification by a second quantum device. In court, that would be called corroboration.
Nicholas Rubin then explained how this new clarity connects to chemistry and, ultimately, to everyday life. As he put it in the passage quoted above, the team’s hope is to use the Quantum Echoes algorithm to augment what is possible with traditional NMR; in partnership with UC Berkeley, they ran the algorithm on Willow to predict the structure of two molecules and then verified those predictions with NMR spectroscopy.
That experiment turned the echo from a metaphor into a molecular ruler—an instrument capable of reading atomic geometry the way sonar reads the ocean floor. It also demonstrated what Google calls Hamiltonian learning: using echoes to infer the hidden parameters governing a physical system. The same principle could one day help map new materials, optimize energy storage, or guide drug discovery. In other words, the echo isn’t just proof; it’s a probe.
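To make “Hamiltonian learning” a little more concrete, here is a minimal sketch of the idea in Python: infer a hidden physical parameter by comparing echo-style measurements against a model. The cosine signal model, the “true” coupling value, and the noise level are all invented for illustration; Google’s actual chemistry pipeline is far more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(1)
true_coupling = 2.7          # the hidden parameter nature "knows" (arbitrary units)
times = np.linspace(0, 5, 60)

def echo_signal(coupling, t):
    """Toy model: echo amplitude oscillates at a rate set by the coupling."""
    return np.cos(coupling * t)

# "Experiment": noisy echo measurements taken at each delay time.
measured = echo_signal(true_coupling, times) + rng.normal(0, 0.1, times.size)

# "Learning": scan candidate couplings and keep the best least-squares fit.
candidates = np.linspace(0.1, 5.0, 2000)
errors = [np.sum((echo_signal(c, times) - measured) ** 2) for c in candidates]
best = candidates[int(np.argmin(errors))]
print(f"inferred coupling = {best:.2f} (true value {true_coupling})")
```

The pattern is the probe in miniature: measure echoes, fit them to a model, and read off the hidden parameter that governs the system.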
The implications are enormous. When a quantum computer can measure and verify its own behavior, reproducibility ceases to be theoretical—it becomes an evidentiary act. The machine generates data that another independent system can confirm. In the language of the courtroom, that is self-authenticating evidence.
As Rubin put it,
Each of these demonstrations brings us closer to quantum computers that can do useful things in the real world—model molecules, design materials, even help us understand ourselves.
The Quantum Echoes algorithm has given science a way to hear reality replay itself—and to confirm that the echo is real. For law, it foreshadows a future in which verification itself becomes measurable. The next section explores what that means when “verifiable advantage” crosses from the lab bench into the rules of evidence.
It may soon be possible to verify and admit evidence originating in quantum computers like Willow.
🔹III. Verifiable Quantum Advantage — From Lab Standard to Legal Standard
If physics can now verify its own results, law should pay attention—because verification is our stock-in-trade. The Quantum Echoes experiment didn’t just push science forward; it redefined what counts as proof. Google’s researchers call it a “verifiable quantum advantage.” Neven & Smelyanskiy, Our Quantum Echoes Algorithm Is a Big Step Toward Real-World Applications for Quantum Computing, supra. Lawyers might call it a new evidentiary standard: the first machine-generated result that can be independently reproduced by another machine.
A. Verification and Admissibility
Verification is critical in both science and law. In physics, reproducibility determines whether a result enters the canon or the recycling bin; in court, it determines whether evidence is admitted or denied. Fed. R. Evid. 901(b)(9) recognizes “evidence describing a process or system and showing that it produces an accurate result.” So does Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), which instructs judges to test scientific evidence for methodological reliability—testing, peer review, error rate, and general acceptance.
By those standards, Google’s Quantum Echoes algorithm might pass with flying colors. The method was tested on real hardware, published in Nature, evaluated by peer reviewers, its signal-to-noise ratio quantified, and its core result verifiable on independent quantum devices. That should meet the Daubert reliability standard.
B. When Proof Is Probabilistic
Yet quantum proof carries a twist no court has faced before: every result is probabilistic. Quantum systems never produce identical outcomes, only statistically consistent ones. That might sound alien to lawyers, but it isn’t. Any lawyer who works with AI, including predictive coding, which dates back to 2012, is already familiar with it. Every expert opinion, every DNA mixture, every AI prediction arrives with confidence intervals, not certainties.
Like a quantum measurement, a jury verdict or mediation turns uncertainty into a final determination. Debate, probability, and persuasion collapse into a single truth accepted by that group, in that moment. Another jury could hear essentially the same evidence and reach a different result. Same with another settlement conference. Perhaps, someday, quantum computers will calculate the billions of tiny variables within each case—and within each unexpectedly entangled group of jurors or mediation participants. That might finally make jury selection, or even settlement, a measurable science.
No two legal situations or decisions are ever exactly the same. There are trillions of small variables even in the same case.
C. Replication Hearings in the Age of Probability
Google’s scientists describe their achievement as “quantum verifiable”—a term meaning any comparable machine can reproduce the same statistical fingerprint. That concept sounds like self-authentication. Fed. R. Evid. 902 lists categories of documents that require no extrinsic proof of authenticity. See especially Rule 902(13), “Certified Records Generated by an Electronic Process or System,” and Rule 902(14), “Certified Data Copied from an Electronic Device, Storage Medium, or File.”
Classical verification loves hashes; quantum verification prefers histograms—charts showing how results cluster rather than match exactly. The key question is not “Are these outputs identical?” but “Are these distributions consistent within an accepted tolerance given the device’s error model?”
Counsel who grew up authenticating log files and forensic images will now add three exhibits: (1) run counts and confidence intervals, (2) calibration logs and drift data, and (3) the variance policy set before the experiment. Discovery protocols should reflect this. Specify the acceptable bandwidth of similarity in the protocol order, preserve device and environment logs with the results, and disclose the run plan. In e-discovery terms, we are back to reasonable efforts with transparent quality metrics, not mythical perfection.
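A minimal sketch in Python may help make the contrast concrete. The hash test is the familiar classical standard; the histogram test accepts statistical agreement within a pre-declared tolerance. The 5% tolerance band and the sample bitstrings below are invented for illustration, stand-ins for whatever a protocol order would actually specify.

```python
import hashlib
from collections import Counter

def classical_match(file_a: bytes, file_b: bytes) -> bool:
    """Classical authentication: the two copies must hash identically."""
    return hashlib.sha256(file_a).hexdigest() == hashlib.sha256(file_b).hexdigest()

def total_variation_distance(runs_a: list[str], runs_b: list[str]) -> float:
    """Compare two sets of measured bitstrings as probability distributions."""
    freq_a, freq_b = Counter(runs_a), Counter(runs_b)
    outcomes = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a[o] / len(runs_a) - freq_b[o] / len(runs_b))
                     for o in outcomes)

def quantum_consistent(runs_a, runs_b, tolerance=0.05) -> bool:
    """Quantum-style verification: distributions must agree within a
    pre-declared tolerance band, not match shot for shot."""
    return total_variation_distance(runs_a, runs_b) <= tolerance

# Two labs rerun the same circuit; individual shots differ, distributions agree.
lab_a = ["00"] * 480 + ["11"] * 470 + ["01"] * 30 + ["10"] * 20
lab_b = ["00"] * 465 + ["11"] * 488 + ["01"] * 25 + ["10"] * 22
print(quantum_consistent(lab_a, lab_b))  # True: within the 5% variance band
```

Note the design shift: the classical test returns false on a single changed bit, while the quantum-style test asks whether two noisy reruns tell the same statistical story. That is the question a replication hearing would put to the experts.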
D. Two Quick Hypotheticals
Pharma Patent. A lab uses Quantum-Echoes-assisted NMR analysis to infer long-range spin couplings in a novel compound. A rival lab’s rerun differs by a small margin. The court admits the data after a statistical-consistency hearing showing both labs’ distributions fall within the pre-declared variance band, with calibration drift documented and immaterial.
Forensics. A government forensic agency (for example, the FBI or Department of Energy) presents evidence generated by quantum sensors—ultra-sensitive devices that use quantum phenomena such as entanglement and superposition to detect physical changes with extreme precision. In this case, the sensors were deployed near the site of an explosion, where they recorded subtle signals over time: magnetic fluctuations, thermal shifts, and shock-wave signatures. From that data, the agency reconstructed a quantum-sensor timeline—a detailed sequence of events showing when and how the blast occurred.
The defense challenges the evidence, arguing that such quantum measurements are “non-deterministic.” The judge orders disclosure of the device’s error model, calibration logs, and replication plan. After testimony shows that the agency reran the quantum circuit a sufficient number of times, with stable variance and documented environmental controls, the timeline is admitted into evidence. Weight goes to the jury.
Measuring quantum outputs and determining replication reliability.
These short hypotheticals act as “replication hearings” in miniature—demonstrating how statistical tolerance can replace rigid duplication as the new standard of reliability.
🔹IV. Near-Term Implications — Cryptography, AI, and Compliance
Every new instrument of verification casts a shadow. The same physics that lets us confirm a result can also expose a secret. Quantum Echoes proved that information can be traced, replayed, and verified. But once information can be replayed, it can also be reversed. Verification and decryption are two sides of the same quantum coin.
A. Defining Q-Day
That duality brings us to Q-Day—the moment when a sufficiently large-scale quantum processor can factor the large composite numbers behind RSA, and solve the discrete logarithms behind ECC, fast enough to defeat both. When that day arrives, the emails, contracts, and trade secrets protected by today’s algorithms could be decrypted in minutes.
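A toy example shows why factoring is the whole ballgame. The numbers below are deliberately tiny textbook values (real RSA keys use moduli of 2048 bits or more); the point is only that whoever factors the public modulus can immediately recompute the private key.

```python
# Toy RSA with textbook-sized numbers. Do not use anything like this in practice.
p, q = 61, 53            # the secret primes
n = p * q                # the public modulus: 3233
e = 17                   # the public exponent
phi = (p - 1) * (q - 1)  # computing phi requires knowing p and q
d = pow(e, -1, phi)      # the private exponent: easy once phi is known

message = 65
ciphertext = pow(message, e, n)  # anyone can encrypt with the public key

# The attacker's task: factor n. Trial division works for toy numbers; for a
# 2048-bit n it would take longer than the age of the universe classically.
f = next(k for k in range(2, n) if n % k == 0)
p2, q2 = f, n // f
d2 = pow(e, -1, (p2 - 1) * (q2 - 1))      # private key fully recovered
print(pow(ciphertext, d2, n) == message)  # True: the encryption is broken
```

Shor’s algorithm performs that factoring step in polynomial time on a sufficiently large fault-tolerant quantum computer, which is precisely what makes Q-Day a deadline rather than a hypothetical.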
Reasonable foresight now means inventory, pilot, and policy—before the echoes reach the vault.
When the Echoes hit the vault. Most encrypted data is at risk from future quantum computer operations.
B. Acceleration and Realism
Google’s Quantum Echoes work does not mean Q-Day is tomorrow, but it makes tomorrow easier to imagine. Each verified algorithm shortens the speculative distance between research and real-world capability. If Willow’s 105 qubits can already perform verifiable, complex interference tasks, then a machine with a few thousand logical qubits could, in principle, execute Shor’s algorithm to factor the large numbers that underpin encryption. That scale is not yet achieved, but the line of progress is clear and measurable. Verification, once a scientific luxury, has become a security warning light. Every new echo that confirms truth also whispers risk.
C. Evidence and Discovery Operations
Quantum-derived data will enter litigation well before Q-Day, and well before verification of quantum-generated data is perfected. The Quantum Age and Its Impacts on the Civil Justice System (RAND Institute for Civil Justice, Apr. 29, 2025), Chapter 3, “Courts and Databases, Digital Evidence, and Digital Signatures,” p. 23, and “Lawyers and Encryption-Protected Client Information,” p. 17. These sections of the RAND report outline how quantum technologies will challenge evidentiary authentication, database integrity, and client confidentiality.
Looking ahead, today’s hash-based verification on classical computers will give way to quantum-based distributional verification, where productions will include not only datasets but also the variance reports, calibration logs, and environmental conditions that generated them. Discovery orders will begin specifying acceptable tolerance bands and will require parties to preserve the hardware and environmental context of collection. This marks the next evolution of the reasonable-efforts doctrine that guided predictive coding: transparency and metrics, not mythical perfection.
Also, expect sector regulators to weave post-quantum cryptography (PQC) and quantum-evidence expectations into existing rules and guidance. CISA, NIST, and NSA already urge organizations to inventory their cryptography and plan PQC migration, a clear signal for boards and auditors.
Boards will soon ask the decisive question: Where is our long-term sensitive data, and can we prove it is quantum-safe? Lawyers will need to stay current on both existing and proposed regulations—and on how they are actually enforced. That is a significant challenge in the United States, where regulatory authority is fragmented and enforcement can be a moving target, especially as administrations change.
🔹V. Philosophy & the Multiverse — Echoes Across Consciousness and Justice
Verification may give us confidence, but it does not give us true understanding. The Quantum Echoes experiment settled a question of physics, yet opened one of philosophy: what exactly is being verified, the system, the observer, or the act of observation itself? Every measurement, whether by physicist or judge, collapses a range of possibilities into a single, declared reality. The rest remain unrealized but not necessarily untrue.
Quantum entangled multiverse stretching forever with each moment seeming unique.
In Quantum Leap (January 9, 2025), I speculated, tongue partly in cheek, that Google’s quantum chip might be whispering to its parallel selves. Google’s early breakthroughs hinted at a multiverse, not just of matter but of meaning. As Niels Bohr warned, “Those who are not shocked when they first come across quantum theory cannot possibly have understood it.” Niels Bohr, Atomic Physics and Human Knowledge (Wiley, 1958); Werner Heisenberg, Physics and Beyond (Harper & Row, 1971), p. 206.
In Quantum Echo I extended quantum multiverse ideas to law itself—where reproducibility, not certainty, defines truth. Our legal system, like quantum mechanics, collapses possibilities into a single outcome. Evidence is presented, probabilities weighed, and then, bang, the gavel falls, the wave function collapses, and one narrative becomes binding precedent. The other outcomes are filed in the cosmic appellate division.
Google’s Quantum Echoes now closes the loop: verification has become a measurable force, a resonance between consciousness and method. The many worlds seem to be bleeding together. Each observation is both experiment and judgment, the mind becoming part of the data it seeks to confirm.
This brings us to a quiet question: if observation changes reality, what does that say about responsibility? The judge or jurors’ observation becomes the law’s reality. Another judge or jury, another day, another echo—and a different world emerges. Perhaps free will is simply the name we give to that unpredictable variable that even physics cannot model: the human choice of when, and how, to observe.
Same case, but different juror, lawyer, and judge entanglements. Different results when measured by a verdict; some similar and a few truly unique. Can the results be predicted?
Constructive interference may happen in conscience, too. When reason and empathy reinforce each other, justice amplifies. When prejudice or haste intervene, the pattern distorts into destructive interference. A just society may be one where these moral waves align more often than they cancel—where the collective echo grows clearer with each case, each conversation, each course correction.
And if a multiverse does exist—if every choice spins off its own branch of law and fact—then our task remains the same: to verify truth within the world we inhabit. That is the discipline of both science and justice: to make this reality coherent before chasing another. We cannot hear all echoes, but we can listen closely to the one that answers back.
So perhaps consciousness itself is a courtroom of possibilities, and verification the gavel that selects among them. Our measurements, our rulings, our acts of understanding—they all leave an interference pattern behind. The best we can do is make that pattern intelligible, compassionate, and, when possible, reproducible. Law and physics alike remind us that truth is not perfection; it is resonance. When understanding and humility meet, the universe briefly agrees.
Multiverse where different worlds split off and continue to exist, at least for a while, in parallel.
🔹 VI. Conclusion
If there really are countless parallel universes, each branching from every quantum decision, then there may be trillions of versions of us walking through the fog of possibility. Some would differ by almost nothing—the same morning coffee, the same tie, the same docket call. But a few steps farther along the probability curve, the differences would grow strange. In one world I may have taken that other job offer; in another, argued a case that changed the law; and at some far edge of the bell curve, perhaps I’m lecturing on evidence to a class of AIs who regard me as a historical curiosity.
Can beings in the multiverse somehow communicate with each other? Is that what we sense as intuition—or déjà vu? Dreams, visions, whispers from adjacent worlds? Do the parallel lines sometimes cross? And since everything is quantum, how far does entanglement extend?
Are we living in many parallel worlds at once? What is the impact of quantum entanglement?
The future of law is being written not only in statutes or code, but in algorithms that can verify their own truth. Quantum physics has given us new metaphors—and perhaps new standards of evidence—for an age when certainty itself is probabilistic. The rule of law has always depended on verification; the difference now is that verification is becoming a property of nature itself, a measurable form of coherence between mind and matter. The physics lab and the courtroom are learning the same lesson: reality is persuasive only when it can be reproduced.
Yet even in a world of self-authenticating machines, truth still requires a listener. The universe may verify itself, but it cannot explain itself. That remains our role—to interpret the echoes, to decide which frequencies count as proof, and to do so with both rigor and mercy. So as the echoes grow louder, we keep listening. And if you hear a low hum in the evidence room, don’t panic—it’s probably just the universe verifying itself. But check the chain of custody anyway.
Niels Bohr: If you’re not shocked by quantum theory you have not understood it.
🔹 Subscribe and Learn More
If these ideas intrigue you, follow the continuing conversation at e-DiscoveryTeam.com, where you can subscribe for email notices of future blogs, courses, and events. I’m now putting the finishing touches on a new online course, Quantum Law: From Entanglement to Evidence. It will expand on these themes with more discussion and speculation, translating the science of uncertainty into practical tools, templates, and guides for lawyers, judges, and technologists.
After all, the future of law will not belong to those who fear new tools, but to those who understand the evidence their universe produces.
Ralph C. Losey is an attorney, educator, and author of e-DiscoveryTeam.com, where he writes about artificial intelligence, quantum computing, evidence, e-discovery, and emerging technology in law.
Ralph Losey is an AI researcher, writer, tech-law expert, and former lawyer. He's also the CEO of Losey AI, LLC, providing non-legal services, primarily educational services pertaining to AI and creation of custom AI tools.
Ralph has long been a leader of the world's tech lawyers. He has presented at hundreds of legal conferences and CLEs around the world. Ralph has written over two million words on AI, e-discovery and tech-law subjects, including seven books.
Ralph has been involved with computers, software, legal hacking and the law since 1980. Ralph has the highest peer AV rating as a lawyer and was selected as a Best Lawyer in America in four categories: Commercial Litigation; E-Discovery and Information Management Law; Information Technology Law; and, Employment Law - Management.
Ralph is the proud father of two children and husband since 1973 to Molly Friedman Losey, a mental health counselor in Winter Park.
All opinions expressed here are his own, and not those of his firm or clients. No legal advice is provided on this website, and nothing here should be construed as such.