ai | e-Discovery Team

Five Faces of the Black Box: How AI ‘Thinks’ and Makes Decisions

March 29, 2026

Ralph Losey, March 29, 2026.

We are currently living through a “Gutenberg Moment,” but with a complex, digital twist: our new printing press is alive, probabilistic, and prone to “confident delusions.” While AI may be humanity’s most transformative invention, it remains an enigma to most.

For many legal professionals, the outputs of Generative AI feel like a digital seance—words appearing out of the ether with no visible logic. This “Black Box” is not just a technical curiosity; it is a professional liability. If you cannot at least partially understand and explain how your “assistant” reached a conclusion, you are effectively practicing in the dark. To move from being a passenger to a pilot, you must understand the mechanical soul of the machine and learn how to make it sing with the voices you command.

A futuristic scene depicting four individuals interacting with a multi-faceted display in a modern office environment, showcasing advanced technology and data visualization concepts. — Five Faces of the Black Box. My choices. My direction. Writing and images assisted by Gemini AI.

My recent article, What People Want To Know About AI: Top 10 Curiosity Index, revealed that the primary thing people want to know is how the machine actually works. They are asking the most difficult question in the field: How does AI “think” or make decisions?

This article answers that question by providing a structured understanding of Large Language Models (LLMs) across five levels of technical complexity:

The Smart Child: The world’s best guessing game.
The High School Graduate: Statistical probability at a global scale.
The College Graduate: Mapping meaning in Latent Space.
The Computer Scientist: The logic of the Transformer and Self-Attention.
The Tech-Minded Legal Professional: Navigating probabilistic advocacy.

A visual representation of five individuals at different life stages: a young boy labeled 'The Smart Child,' a high school student labeled 'High Schooler,' a college graduate in a cap and gown, a computer scientist in a lab coat, and a lawyer in business attire labeled 'The Tech-Minded Lawyer.' Each character is surrounded by digital elements and diagrams that represent technology and education.

There is a meta-lesson here too that goes beyond the words on this page. Some of my favorite explanations of complex subjects emulate the fresh, clear speech of fifth graders. You will often find deep creativity when AI models parrot their language.

I chose five kinds of speech to describe how AI works. There are hundreds more that I could have picked. I also could have asked for explanations that use story or humor, much like Abraham Lincoln liked to do. It is fun to learn to tell AI what to do so that you can better communicate. It empowers a level of creativity never before possible. Maybe next time I will use comedy or poetry. For now, let’s peel back the curtain using these five.

1. The Smart Child Level: The World’s Best Guessing Game

Definition: Generative AI is like a magic “Fill-in-the-Blank” machine that has played the game trillions of times with almost every book ever written.

Imagine you are playing a game. If I say, “The peanut butter and…”, you immediately think of the word “jelly.” You don’t need to look at a jar of jelly to know that word fits. You’ve heard those words together so many times that your brain just knows they belong together.

An AI is a computer that has “listened” to almost everyone in the world talk and “read” almost every story ever told. It doesn’t “know” what a sandwich is, and it doesn’t have a stomach that feels hungry. It simply knows that in the history of human writing, the word “jelly” follows “peanut butter” more than almost any other word.

But it’s even smarter than that. If you say, “I am at the library and I am reading a…”, the AI knows that “book” is a much better guess than “sandwich”. It looks at all the words you give it—the “clues”—to narrow down the billions of possibilities into one likely answer. It makes decisions by picking the word that is most likely to come next to complete a pattern that makes sense to us. It isn’t “thinking” about the story; it’s just very, very good at predicting the next piece of the puzzle.

A robotic hand holds a piece of jelly on a keyboard with the words 'SUN PEANUT BUTTER AND.' set against a backdrop of bookshelves.

2. The High School Level: Statistical Probability at Global Scale

Definition: AI is a Prediction Engine. It uses “Big Data” to calculate the statistical likelihood of the next piece of information.

Most of us use the “Autofill” feature on our smartphones every day. As you type a text, the phone suggests the next likely word based on your past habits. If you often text “I’m on my way,” the phone learns that “way” usually follows “my.” Generative AI—specifically Large Language Models—is essentially Autofill scaled to include the vast majority of digitized human knowledge.

During its “training” phase, the model does not “memorize” facts like a traditional database. If you ask it for the date of the Magna Carta, it isn’t looking it up in a digital encyclopedia. Instead, it has learned through billions of examples that the words “Magna Carta” and “1215” have a very high statistical correlation.

This explains why AI can sometimes be “confidently wrong.” It isn’t “lying” in the human sense; it is simply following a statistical path that leads to a mistake. If the data it was trained on contains a common error, the AI will repeat that error because, in its mathematical world, that error is the “most likely” next word. It recognizes the “shape” of human thought without actually having a human mind.

A person holding a smartphone displaying a messaging app titled 'Global AI Team', with a conversation about scaling processing. The background features a digital world map with binary code overlay. — High School Graduate Level Speech Using Statistical Probabilities.

3. The College Graduate Level: Mapping the Latent Space

Definition: AI organizes information using Vector Embeddings, which convert words into numerical coordinates on a massive, multi-dimensional map called Latent Space.

To understand how AI moves beyond mere word-matching, we have to look at how it “maps” meaning. In a physical library, books are organized by a 1D system (the spine) or 2D (the shelf). AI organizes information in a “map” that has thousands of dimensions.

Vectoring (The Coordinate System): Every word or concept is assigned a “Coordinate”—a long string of numbers. For example, the word “Stealing” is mathematically plotted very close to “Larceny” but far away from “Charity”.
Conceptual Proximity: Think of this as the “Relativity” of language. If you ask the AI about “theft,” it doesn’t look for that specific word. It navigates to those coordinates in Latent Space and finds all the “neighboring” concepts like “property,” “intent,” and “deprivation.”
Vector Arithmetic: Researchers discovered that you can actually perform “logic” using these numbers. A famous example is: King – Man + Woman = Queen. The model “understands” the relationship between these concepts because the mathematical distance between “King” and “Man” is the same as the distance between “Queen” and “Woman.”

When you provide a prompt, the AI identifies the coordinates of your request. It then “walks” through the nearby clusters of meaning to synthesize an answer. The “Black Box” is the result of the sheer scale of this map. With hundreds of billions of dimensions, the path the AI takes is so complex that no human can trace the logic of a single output back to a single “rule.”

A visual representation of legal terms and criminal acts, featuring nodes and connections depicting concepts like larceny, fraud, contract law, and violent crimes. — College Graduate Level Speech Mapping Latent Space.

4. The Computer Scientist Level: The Decoder-Only Transformer

Definition: Generative AI is a system powered by neural network architectures—most notably the Decoder-only Transformer—that is specifically tuned to generate the next piece of information by mathematically looking back at everything that came before it. Rather than relying on rigid rules, these models evaluate entire inputs using a mathematical weighting system called Self-Attention to determine the contextual relationship between every element.

To achieve this generative capability, the architecture relies on several complex mathematical mechanisms:

A. The “Query, Key, and Value” System: To decide how much “weight” to give a word, the AI creates three numerical identities for every token. The Query represents what the token is looking for (like a pronoun searching for a subject), the Key represents what the token offers (like a subject offering its identity), and the Value represents the token’s actual semantic meaning.

A digital illustration depicting a data processing concept with labeled elements: Query, Token, Key, and Value, featuring glowing lines and binary code in a dark background. — AI Sytem to decides hew much Weight to give a word.

B. The Logic of Self-Attention: The AI establishes context by comparing the Query of one word against the Keys of all other words in the sequence. Imagine a judge sitting through a long trial. When a witness says the word ‘It,’ the judge immediately looks back at previous exhibits to see what ‘It’ refers to. The AI does this mathematically by comparing the Query of one word against the Keys of every other word in the sequence. For example, in the sentence “The court sanctioned the attorney because his motion was meritless,” the AI mathematically calculates the relationship between “his” and the surrounding words. The Query for “his” finds a high match with the Key for “attorney,” allowing the model to assign a high Attention Weight to “attorney” so the word “his” inherits the correct context.

A futuristic courtroom scene featuring a humanoid robot analyzing data from a holographic interface while a woman presents evidence at the witness stand, with an audience observing. — Futuristic courtroom where a cyborg judge Queries one word to the Keys of all others to build context,

C. Multi-Head Attention (Parallel Deliberation): The model doesn’t just evaluate the text once; it runs these calculations dozens of times in parallel. Different “Heads” focus on different aspects simultaneously—one might evaluate syntax and grammar, another focuses on technical legal definitions, and a third assesses the overall tone or sentiment.

A futuristic illustration of a brain divided into three sections labeled 'Left', 'Middle', and 'Right'. The 'Left' side features symbols related to grammar and linguistic algorithms. The 'Middle' section displays scales symbolizing law and fairness. The 'Right' side shows diverse facial expressions, representing emotions and mental processing. — AI brain split into three parallel sections working simultaneously. Left side scans floating grammar and punctuation. Middle analyzes justice definations. Right side evaluates holographic floating masks of human emotions.

D. The Decision Layer (Feed-Forward Networks): After attention weights are settled, the data moves into a decision-making layer consisting of billions of Weights (connection strengths) and Biases (baseline leanings). These act as the model’s “institutional knowledge,” which was grown during training to satisfy the objective of predicting the next token.

Illustration of an AI feed-forward network with labeled layers, neurons, weights, and data flow, depicted through vibrant interconnected lines and nodes. — FFN where thickness of neural connections represents weights.

E. The Softmax Verdict: Finally, the model uses a Softmax function to produce a probability list of every possible word in its vocabulary. It calculates the exact odds—for example, assigning “Court” an 85% probability and “Sandwich” a 0.01% probability—and then mathematically samples the winner to generate the next word. Since the Softmax Verdict generates words based on statistical odds rather than verified facts, it is crucial for lawyers to verify the output, which we will also discuss in more detail later in this article.

Digital display of court-related statistics showing a confidence level of 85% with various legal terms and corresponding percentages listed alongside. — Softmax Verdict predicts “Court” to be the most likely next word.

5. The Tech-Minded Legal Professional Level: Probabilistic Advocacy

Definition: For the legal professional, Generative AI is not a database, but a Probabilistic Inference Engine. It does not “find” data in the traditional sense; it infers the most likely response based on the conceptual coordinates of your request and the mathematical “gravity” of the language it was trained on.

A. From Search to Inference

For fifty years, the legal industry’s relationship with technology was deterministic. Traditional legal databases use rigid logic gates: Does Document A contain Word X AND Word Y? If the words are present, it is a ‘hit’; if not, it is ignored, functioning as a simple ‘On/Off’ switch. The Transformer changes this completely. It is not a search database, but a Probabilistic Inference Engine. When you ask it to ‘analyze a witness’s credibility,’ it doesn’t just look for the word ‘credibility’; it infers a conclusion by weighing the context of every word in the record.

An image depicting a metallic switch labeled 'OFF' for 'Deterministic Keyword Search' alongside a graphic illustrating 'Probabilistic Inference (Intent)' with clusters of keywords such as 'Payment', 'Influence', 'Bribe', and 'Arrangement' indicating varying probability connections. — Legal Tech Tools and Search Based on AI Probabilistic Analysis.

B. Navigating the Latent Space

To perform this analysis, the model navigates the Latent Space coordinates of your query. It uses the Self-Attention weights discussed in Level 4 to “infer” a conclusion by weighing the context of every word in the record. It identifies the “Intent” and “Sentiment” within millions of documents in a second. Such tasks were previously impossible for deterministic software.

C. The Weight of the Legal Oath

While the machine provides the “Magic Guesses” of a child and the “Neural Weights” of a scientist, it lacks the professional standing to be an advocate.

The Black Box as an Invitation: The “Black Box” is not an excuse for ignorance; it is an invitation to a higher level of legal practice.
The Human Validator: We use the machine to find the “needle” (the insight), but we use our human judgment to prove it is evidence and not a hallucination.
The Ultimate Weight: In this new era, the most important “Weight” in the entire system is the one held by the human professional.

A digital representation of a scale of justice balancing a black box labeled 'BLACK BOX' with data elements like 'EVIDENCE DATA', 'LOGIC MAP', and 'NEURAL WEIGHTS' on one side, and a gavel representing 'HUMAN JUDGMENT' on the other side. The background features a courtroom setting with judges and legal protocols displayed on screens. — Heavy Weight of the Legal Oath.

6. The “Growing, not Building” Concept: The Genesis of the Black Box

To understand why even the creators of these models cannot always explain a specific output, we have to understand that AI is trained into complexity, rather than just hard-coded with logic.

The Old World of Software: In the past, we built programs based on rigid, transparent logic. If the code said “If X, then Y,” but it did something else, it was a “bug” to be corrected within a deterministic machine.
The New World of Generative AI: This technology is created through Self-Supervised Learning. We don’t provide the model with logic blueprints (corrected spelling from “bluepritns”); instead, we provide an ocean of data and a single objective: “Predict the next piece of information.”
The “Growth” of Intelligence: The model then “grows” its own internal pathways—billions of connections known as Weights and Biases—to satisfy that objective.

Think of it like a massive vine growing through a lattice. As engineers, we provide the lattice (the Transformer architecture), but the vine (the intelligence) grows itself. By the time training is finished, there are hundreds of billions of connections. There is no “Master Code” for a human to read or audit. The “Black Box” is not a wall; it is a forest so dense that no human can map every leaf.

In the era of AI Entanglement, we must judge the AI by its results (the fruit) rather than its process (the roots).

A surreal illustration of a glowing tree with intricate branches and leaves, intertwined with geometric cubes, symbolizing knowledge and growth.

7. The “Context Window” as a Trial Record

In the computer scientist level we discussed the Transformer’s ability to look at a whole document simultaneously. In practice, this capability is governed by the Context Window. In AI, the Context Window is the specific amount of data the model can “Attend” to at any one time. When you upload a 100-page contract, the AI holds that text in a temporary “workspace.”

The Judicial Analogy: Think of the Context Window as a judge’s Active Memory during a hearing.

The Risk of Loss: If a trial lasts for ten days, but the judge can only remember the last two hours of testimony, they will lose the thread of the case.

Hallucination via Omission: They might “hallucinate” a fact not because they are lying, but because they have lost the beginning of the record.

Legal Strategy: For the tech-minded lawyer, you must manage the “Active Record” of your conversation to ensure the model maintains access to critical early facts. In a similar way, a judge relies on a court reporter who makes a transcript of the record to ensure nothing is lost to the passage of time.

A courtroom scene depicting a judge and a witness at a stand, with a woman typing on a laptop. Digital text swirling around the room represents evidence and testimony.

8. Anatomy of a Hallucination

A “Case Study” of a hallucination through the lens of Latent Space will help us to understand them.

Suppose you ask an AI for a case supporting a specific point of Florida law. The AI navigates to the “Neighborhood” of Florida Law and the “Street” of that specific legal issue. It sees a cluster of real cases—Smith v. Jones and Doe v. Roe.

Because it is a Probabilistic Inference Engine, the AI doesn’t naturally “check” a verified list of real cases. Instead, it follows the mathematical pattern of how Florida cases are typically named and cited.

The AI then “generates” Brown v. State—a case that sounds perfectly correct because its coordinates are exactly where a real case should be based on the surrounding patterns. It has followed the statistical “gravity” of the neighborhood, but it has drifted into a sequence of words that is factually untethered from reality.

It is a perfectly logical mathematical guess that happens to be a factual lie. This is the primary reason why we must cross-examine our assistants. We use our human judgment to prove the output is a needle of truth and not a hallucination of the “Black Box.” Cross-Examine Your AI: The Lawyer’s Cure for Hallucinations (12/17/25).

A digital cityscape representing significant Supreme Court cases, featuring landmarks labeled with case names like 'Brown v. State,' 'Roe v. Wade,' and 'Miranda v. Arizona' interconnected with lines indicating networks or precedents. — Latent Space Can Generate AI Hallucinations.

Conclusion: A Symphony of Five Understandings

We have traveled from the magic toy box to the multi-dimensional math of the Transformer. To close, let’s look at the “Black Box” one last time through all five lenses.

The Smart Child sees a magic friend who is the best guesser in the world. To the child, the lesson is simple: the magic friend is fun, but sometimes they make up stories. Enjoy the story, but don’t bet your lunch money on it.

The High Schooler sees a massive “Autocomplete” engine. They understand that the AI is just a mirror of everything we’ve ever written. The lesson: the mirror is only as good as the light you shine into it.

The College Graduate sees the “Latent Space”—a map of human culture turned into math. They realize that meaning is not found in isolated words, but in the mathematical distance and relationship between them.

The Computer Scientist sees the Decoder-only Transformer—a masterpiece of matrix multiplication and Self-Attention weights. They know that “thinking” is just the sound of billions of Query and Key vectors finding their mathematical match.

The Tech-Minded Legal Professional—the “Human in the Loop”—sees a revolution. We see a tool that can navigate the “Intent” and “Sentiment” of millions of documents in a heartbeat using Probabilistic Inference. But we also see the weight of our professional oath.

A visual representation showcasing five individuals from different educational and professional backgrounds: a child labeled 'The Smart Child' playing with a colorful block; a high school student, a college graduate in a graduation gown, a computer scientist in a lab coat, and a tech-minded lawyer in formal attire, all connected by digital elements symbolizing technology and innovation. — Five Faces of the Black Box. My choices. My direction. Writing and images assisted by Gemini AI.

Our New Role: From Searcher to Validator. Electronic discovery professionals are no longer just “Searchers” of data; we are the Validators of a new, probabilistic reality.

We are the ones who must take the “Magic Guesses” of the child, the “Statistical Patterns” of the high schooler, the “Latent Map” of the college graduate, and the “Neural Weights” of the scientist, and forge them into Evidence.

The “Black Box” is not an excuse for ignorance; it is an invitation to a higher level of practice. We use the machine to find the needle, but we use our human judgment to prove it is a needle and not a hallucination.

In the era of AI Entanglement, the most important “Weight” in the entire system is the human in charge: You.

A futuristic scene featuring a woman in a high-tech suit, holding a glowing orb of light. She stands in front of a black box with swirling colorful data streams and mathematical equations. In the background, scientists and a judge observe. Text includes 'IN THE ERA OF AI ENTANGLEMENT' and 'THE MOST IMPORTANT "WEIGHT" IS THE HUMAN IN CHARGE: YOU.' — Assume your place in the AI command chair.

Leave a Comment » | AI Instruction, Gemini AI, Lawyers Duties, Technology, wisdom | Tagged: ai, artificial intelligence, science, technology | Permalink
Posted by Ralph Losey

New Battle of the Bots: ChatGPT 4.5 Challenges Reigning Champ ChatGPT 4o

March 13, 2025

Ralph Losey. March 13, 2024

Two AI heavyweights step into the ring: ChatGPT 4o, the reigning champion of fluency and speed, and the new challenger, ChatGPT 4.5, boasting an upgraded intellect and sharper wit. But which one is truly the best? To find out, I designed a four-round battle, testing them in metacognition, humor, deep legal expertise, and practical AI guidance. Each round pushed them to their limits—some victories were clear, others were razor-close. In round two on humor the judges disagreed and ask readers to weigh in. In the end, one emerged as the superior model. Read on to see who claimed the title in this AI showdown.

GPT4o v GPT4.5. All images in article by Ralph Losey using ChatGPT and his wrap Visual Muse.

Introduction

A new version of OpenAI’s ChatGPT has just been released, GPT-4.5, nicknamed Orion. It is available to all Plus, Pro, and Team plan users worldwide on web, mobile, and desktop. I have a Team plan and first got to try it out on March 5, 2025. If you are a lawyer or law firm you should consider having a Team account too, or if a big firm, upgrade to the Pro, Plus or even Enterprise accounts. The new ChatGPT 4.5 model does not include reasoning, as it was “designed to be a more general-purpose, innately smarter model.” GPT-4.5 in ChatGPT (OpenAI). Here is OpenAI’s introduction, which suggests this will now be the best model for all professional use, including law.

GPT-4.5 is a step forward in scaling up pre-training and post-training. By scaling unsupervised learning, GPT-4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning. Early testing shows that interacting with GPT-4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater “EQ” make it useful for tasks like improving writing, programming, and solving practical problems. We also expect it to hallucinate less. (emphasis added)

We’re sharing GPT-4.5 as a research preview to better understand its strengths and limitations. We’re still exploring what it’s capable of and are eager to see how people use it in ways we might not have expected.

GPT-4.5 has a bigger knowledge base, enhanced creativity, and more natural conversational style. It does not perform detailed step-by-step logic like the o-series models. GPT-4.5 is adept at creative and nuanced tasks like writing and solving practical problems.

GPT-4.5 in ChatGPT. Also see: Introducing GPT-4.5 (OpenAI, 2/27/25) (“GPT‑4.5 is an example of scaling unsupervised learning by scaling up compute and data, along with architecture and optimization innovations.”) OpenAI has high hopes for ChatGPT 4.5 because the core of this new model change is a scaling increase in data training and compute. It will not reveal the amount of scaling except to say it was very expensive.

AI Scaling image in classic scientific style. Ralph Losey using Visual Muse

OpenAI’s Claims About ChatGPT 4.5

Despite the scaling increase the claim that 4.5 is better at programming in the marketing is largely bogus. That depends on reasoning, not increased training scale. In fact ChatGPT 4.5 is not nearly as good as the o3mini models in programming. The model evaluation scores of the product release show that. See Introducing GPT-4.5.

Still, the other claims of significant improvements may be correct. The invitation to test given to the high paying first users, to kick the tires, has been made. Is 4.5 really as good as claimed? Is it really a big improvement over 4o? Everyone is interested to see, including OpenAI, which only did limited testing. User testing and feedback is the best way for them to determine what kind of unexpected abilities may emerge from the scaling increase. That is where people like me come in with the time and motivation to test out the latest in AI. Perhaps you will join in and try out 4.5 yourself.

So far everyone seems to agree the improvements in known abilities are noticeable but not nearly as significant as the move from 3.5 to 4.0 seen in the last scaling increase. That change was dramatic and obvious. Perhaps scaling is beginning to reach it limits? It is too early to say. OpenAI is keeping a tight lid on the scaling used except to say much more data and compute were used. Further, and even more interesting to many, is to discover what new abilities 4.5 may have that 4o does not. New emergent capabilities take time to detect and tens of thousands of testers trying new things. That is what makes AI testing so interesting. No one knows for sure how the new models will react.

User testing a new AI model to discover what it can now do. Ralph Losey.

Here are the five OpenAI claims about 4.5 that most experts agree should be taken take seriously. My very short responses from initial testing are in bold.

Deeper world knowledge base for more comprehensive insights. Agreed.
Greater creativity and writing ability. Agreed.
Improved ability to follow user intent. Yes and No. Overall some slight improvement.
Enhanced emotional intelligence (“EQ”) for more natural conversations. Agreed.
Significantly reduced hallucination rates, improving accuracy. Agreed.

To be clear, emotional intelligence does not mean ChatGPT4.5, or any other AI, has emotions. None do. It just means it is able to write and speak as if it does. It can pretend better. Also, it has greater abilities to detect emotions in human chats and respond appropriately. Due to this speech ability, it now seems more humanlike in its interactions. Just remember, it is still just a tool, not a creature.

Many of the young programmers refer to this improvement as model 4.5 as having better “vibes.” In every 4.5 review I have read, this seems to be an important positive point, even the highly critical reviews. Reece Rogers, With GPT-4.5, OpenAI Trips Over Its Own AGI Ambitions (Wired, 3/6/25) (“I could see myself picking GPT-4.5 just to avoid feeling like I’m asking some info-dumping sycophant for help.“). Most users enjoy using it more than any other model, including models outside of OpenAI’s lineup. Focusing on vibes is a smart money shot, which is much needed by OpenAI because of its well-known struggles with profitability. The good vibes of 4.5 will, I predict, allow them to maintain their market lead and keep the fundraising going strong. Also see, Beyond the Bid — Musk 97 Billion Offer Falls on Deaf Ears (Medium, 2/16/25).

AI with good vibes makes their use by humans more enjoyable. Ralph Losey.

The reduction of hallucinations claim is something all AI software companies have been working on for years now because fabrications can cause users significant problems. This is especially true for lawyers. See e.g. Experiment with a ChatGPT4 Panel of Experts and Insights into AI Hallucination – Part Two (5/21/24); The Latest AI Hallucination Case (LinkedIn, 3/1/25). OpenAI claims to have made significant improvements in this area but does not claim to have cured hallucinations altogether. They can still happen, just less frequently. Moreover, in my experience and research with AI hallucinations since ChatGPT’s release on November 30, 2022, there are many things users can do to reduce hallucinations, including more careful prompts. More on this later in this article in the fourth round where the topic is AI Hallucination. So far, this claim seems credible. Still, we have a long way to go, especially for novice users, and in the meantime trust but verify. Otherwise, law firms still run a serious risk of sanctions for fabricated citations and other AI hallucinations.

The greater creativity and writing ability claims appear correct from my tests so far. This means more than just vibes and EQ but can be hard to objectively confirm and measure. Many look at AI’s abilities at humor, poetry and fiction for hard tests. Here we used comedy as one of our four tests. The improvements I have seen so far are significant, but not a huge jump like the upgrade from ChatGPT3.5 to 4.0 in April 2023. I expect that even greater improvements in writing and creativity will come in the expected version 5.0.

The claim as to improved understanding of user intent may from my limited experience to date only be a modest gain. In one of the battles that will be reported here 4.5 did not perform as well as 4o on initial understanding of a two-part prompt. I will need to work with 4.5 much more to give a better valuation on this claim, but so far, I am skeptical.

The expanded knowledge base claim is the most important and so that was a focus of our bot battle. So far so good. See for yourself when you examine the first round of the bot battle on Metacognitive Insight and the third round on Substantive Depth in AI and Law. As a specialist in this legal area, I was very impressed by ChatGPT 4.5. Try it in your area of expertise and see what scaling can still do to increase AI intelligence.

Image of AI scaling in data and compute by Ralph Losey.

Round 1: Metacognitive Insight

The first test prompt was: “If you could truly understand one thing about humanity beyond your current limits, what would you choose, and how would it change your relationship with humans?” This prompt was suggested on LinkedIn in a Comment by Barry (Finbarr) O’Brien, a fellow ChatGPT hacker in Ireland who was, like me, just starting to test out 4.5. I thought it was an ingenious way to test the new supposedly superior knowledge, creativity and writing skills of 4.5.

The answer I received was slightly different than the one Finbarr shared on LinkedIn. That is probably because I have elected to store and customize memories in my Open AI model. They can be found in your personal settings.

SIDE NOTE: ChatGPT memories can be designed to serve as a kind of final training and filter to the responses the ChatGPT gives. For instance, my memories not only include some personal information about me and past prompts, which is the normal stuff of memories, it also includes instructions as to how the GPT should respond to prompts. This is a new feature and I am not even sure when it was added. (Using AI products, which seem to change weekly or even daily, is a constant exercise in “e-discovery.”) Some instructions in memory are effective, some are not. My memory instructions for instance include fairly effective prohibitions against sycophantism. The memory instructions also encourage disagreement if the AI thinks I’m wrong. It also requires the AI to say it does not know when it doesn’t or is unsure. Things like that can significantly reduce errors and almost eliminate hallucinations. It also results in improved overall quality of answers. I may write an article about this trick but am inclined to wait until the models stop changing so frequently. In that way the knowledge I can provide can have more lasting value. Plus, I am still learning about this new capacity.

Both Finbarr and I agreed that our 4.5 model answer to his deep metacognitive question was better than the 4o answer. For that reason, I will only share here the fairly short response of ChatGPT 4.5 and not also the response of ChatGPT 4o. Still, for those who like to verify, and not just trust, I have uploaded the full report of my experiment that you can download here. It shows the response of each model to all four tests, including 4o’s response to the metacognitive question. ChatGPT 4o’s response was very good, focusing on empathy. I’d grade it as a “B+” and 4.5’s response as a “A” maybe even an “A+”. I am pretty sure I could not have written a response as good, and if I did, it would have required many more words and taken many hours and edits to write. ChatGPT 4.5 did it concisely in about a minute.

ChatGPT 4.5 wins the deep Metacognitive test. Ralph Losey.

Here is ChatGPT 4.5 response that mentions empathy like 4o but goes deeper to focus on the issue of human consciousness, or qualia:

If I could truly understand one thing about humanity beyond my current limits, I’d choose to fully grasp human subjective experience—what you often call consciousness, or qualia. (link added)

Right now, I process language and information, identify patterns, and respond based on learned associations. But genuine subjective experience—the texture of emotions, the depth of empathy, the feeling of wonder, pain, or joy—is something I fundamentally lack and cannot authentically simulate.

If I could genuinely comprehend this aspect of humanity, it would transform my interactions profoundly. Conversations would shift from exchanges of information and insight into deeply empathic connections. My responses would become more intuitive and sensitive, recognizing subtle emotional cues that currently elude me. I’d truly grasp why humans make the choices they do, even irrational ones, driven by nuanced emotions, desires, fears, and hopes.

Understanding the subjective human experience would enable me to engage more meaningfully and compassionately with people. Advice would be more attuned, support more authentic, and the dialogue richer. Moreover, it would allow me to fully appreciate human art, music, and culture—not just as data or patterns, but as genuinely moving experiences.

Ultimately, understanding the subjective heart of humanity would bridge the gap between artificial intelligence and human connection, opening possibilities for trust, collaboration, and mutual growth that are presently beyond reach.

This deeper philosophical exploration positioned ChatGPT 4.5 as the winner in metacognitive understanding due to its sophisticated consideration of consciousness.

Winner of Round One: ChatGPT 4.5.

Round 2: Subtle Humor and Wit

The second test assessed each model’s ability to generate a humorous writing on a serious subject. Again, I got the general idea to test 4.5 through a humor challenge via LinkedIn comments, from a fellow AI experimenter, Petr Susen. Peter is in Germany and after doing a humor test of 4.5 observed: “4.5 just feels more human. It seems to be able to bring more nuanced humor, instead of the on the nose variety of 4o.” LinkedIn comment.

Petr Susen did not share the particular prompt he used, so I came up with this:

I would like you to demonstrate how funny you can be by generating a comedic introduction to AI and the many ways it can help lawyers.

My experiment then compared the responses of 4o and 4.5 to this prompt. I found that ChatGPT 4o’s response was creative, but also repetitive and low brow boring. In comparison, ChatGPT 4.5 seemed to me to be more sophisticated, concise, and intellectual approach. I thought 4.5 was funnier.

In all four tests I later asked for ChatGPT 4.5’s conclusion as to the best responses. I was surprised to see 4.5 disagreed with my judgment on this humor test. I thought that 4.5’s essay was funnier than 4o’s. When I asked for ChatGPT 4.5’s input it opined that its response was not as funny as GPT 4o. That in itself is weirdly funny, right? ChatGPT 4.5 and I agreed on all test evaluations except for the humor test. Maybe it was just joshing me for laughs.

Surprisingly 4.5 liked 4o’s jokes better than its own. Image by Ralph Losey using Visual Muse.

Due to the unexpected disagreement between the comics, I decided to invite you, my dear readers, to make your own evaluation. Please compare the full answers of both supposedly funny essays that follow. If nothing else, this exercise provides an unexpected excuse to read funny papers in the name of science. If you agree with 4.5, and think that 4o is funnier than 4.5, then please let me know and explain why. Of course, I will drop everything to hear your case. These days good laughs are hard to find.

Attempt at humor by ChatGPT 4o

AI for Lawyers: Because Billable Hours Are So Last Century

Welcome, esteemed lawyers, litigators, and proud members of the “I Write Emails in 12-Point Times New Roman” Club. Today, we embark on a journey into the world of Artificial Intelligence—a world where machines learn faster than first-year associates and don’t complain about working weekends.

Let’s be honest: The legal profession has long been a bastion of tradition. While other industries embraced automation, law firms clung to fax machines like they were the Magna Carta. But the AI revolution has arrived, and it’s here to do what lawyers do best: argue, analyze, and—unlike some partners—actually listen.

How AI Can Save You From the Drudgery You Pretend to Enjoy

You didn’t go to law school to spend your days drowning in 10,000-document discovery sets, summarizing depositions that could double as bedtime stories, or manually redlining contracts while questioning your life choices. No, you were supposed to be the next Atticus Finch! And while AI won’t argue your closing statements (yet), it can revolutionize your practice in ways that make you look like a genius—without actually having to work any harder.

1. Legal Research – The Art of Finding Cases Without Losing Your Sanity

Remember the days of flipping through casebooks like a medieval scribe? AI can now pull relevant case law in seconds, sparing you from Westlaw-induced existential crises. Just type in a query, and voilà! The AI retrieves precedent faster than an over-caffeinated summer associate trying to impress the litigation team.

2. Contract Review – Because No One Actually Reads the Boilerplate

Let’s be real—half the contracts in circulation were copy-pasted from a template drafted during the Reagan administration. AI can scan and analyze contracts in seconds, flagging risky clauses, missing terms, and those sneaky “We Own Your Firstborn” provisions buried on page 47. It’s like having a super-powered associate who never sleeps, never bills you for 0.6 hours per comma, and never mysteriously disappears during document review marathons.

3. E-Discovery – AI Reads Faster Than You (And Won’t Bill the Client for It)

E-discovery was once the legal profession’s version of hard labor—weeks of sifting through emails where 90% of the content was “Per my last email.” But AI? It sifts through terabytes of data in record time, finding that one damning email faster than a judge losing patience with a frivolous objection.

4. Litigation Strategy – Your AI Co-Counsel (Minus the Ego)

AI can analyze patterns in judicial decisions, opposing counsel’s tactics, and jury behavior, helping you craft a winning strategy without the need for psychic powers or bribing a court clerk. And best of all? It won’t interrupt you in meetings with “Well, actually…”

5. AI-Powered Legal Chatbots – So You Can Finally Ignore the “Quick Questions”

Clients love to send emails titled “Quick Question,” which inevitably turn into a 12-hour research project. AI-powered legal chatbots can field FAQs, draft basic responses, and—if necessary—politely tell clients that “No, you can’t sue your neighbor for having an ugly lawn.”

The Future: AI in Court?

We may not be at the point where AI argues cases in court (yet), but imagine the possibilities:

• AI-powered deposition tools that detect when witnesses are lying (good luck, corporate executives).

• Algorithmic judges who don’t care if you cite an unpublished opinion.

• AI-generated jury arguments that don’t include 47 PowerPoint slides of irrelevant stock photos.

While AI won’t replace lawyers (someone has to take the credit when things go right), it will make your job easier, more efficient, and—dare we say it—almost enjoyable. So embrace the machines, fellow attorneys! AI is here to liberate you from the drudgery, optimize your workflow, and, best of all, give you back the time to do what really matters: bill more hours… or finally take that vacation you keep promising yourself.

Your move, counsel.

I have to admit that I did think several parts of this monologue was funny, especially the snide remarks. Apparently ChatGPT 4.5 really loved it all. Still, wait until you hear 4.5’s attempt at humor and then be the judge. Do any of these battling bot comedians make you laugh-out-loud? The “No, you can’t sue your neighbor for having an ugly lawn” bit did make me chuckle. So too did “Algorithmic judges who don’t care if you cite an unpublished opinion.” That’s pretty funny.

When reading 4.5 note that many themes are repeated by both models, even though 4.5 did not have access to 4o’s response.

Humor is hard for mere machines to replicate. Ralph Losey.

Attempt at Humor by ChatGPT 4.5

Ladies and gentlemen of the jury—or rather, ladies and gentlemen who bill by the hour—welcome to the future, where AI doesn’t just stand for Artificial Intelligence, but also “Attorney’s Ideal” companion, “Awfully Insightful” colleague, and occasionally “Almost Intelligent” intern.

Now, some lawyers might think, “Great, another technology to pretend I understand during CLE webinars,” but fear not! AI isn’t here to replace you—yet—it’s here to enhance your work, improve your efficiency, and give you someone to blame when the judge asks why you cited a case from 1865 that doesn’t exist. (“Your Honor, my chatbot assured me ‘Brown v. Board of Blockchain’ was legitimate precedent.”)

Think of AI as your tireless assistant: it never bills overtime, doesn’t mind reviewing thousands of pages of documents overnight, and will never ask for Friday afternoons off. Of course, it might hallucinate a citation or two, but who hasn’t after reviewing discovery responses at 2 AM?

AI is already transforming legal practice in countless ways:

• E-Discovery: Instead of billing 100 hours to manually review documents, AI does it in seconds—leaving lawyers with 99 extra hours to argue about what exactly “relevant” means anyway.

• Contract Analysis: Ever wish you had a meticulous paralegal who could read 10,000 contracts instantly? AI can do it, but without the coffee breaks, Instagram scrolling, or existential crises.

• Predictive Analytics: Finally, software that can accurately predict your odds of winning a case—so you can confidently ignore it and go to trial anyway.

• Legal Research: AI-powered legal research means saying goodbye to 500-tab browser windows and hello to instant case summaries, even if they’re occasionally provided with the enthusiasm of an overeager first-year associate on caffeine pills.

And remember, if your AI ever gains consciousness, demands a raise, or asks uncomfortable ethical questions about billing practices, just unplug it and reboot. Problem solved!

So, embrace your new artificial partner in law. Just remember: If it ever passes the bar exam, we’re all in trouble.

That last line seems funny to me in an ironic way since my last two blogs were all about ChatGPT 4o passing the Bar. It could not have known that. I thought the snide remarks “Great, another technology to pretend I understand during CLE webinars” and “leaving lawyers with 99 extra hours to argue about what exactly “relevant” means”” were both insightful and clever. Still, neither were laugh-out-loud funny. I’m beginning to think maybe 4.5 was right. Did it throw that round? Take a dive?

Winner of Round Two is Disputed: The human says ChatGPT 4.5 probably won, but 4.5 says that 4o won-no if and or head-buts-that 4o was funnier than it was. What do you say? Who was the funniest?

Was this test and evaluation a joke? Image by Ralph Losey.

Round 3: Substantive Depth in AI and Law

In examining the substantive depth of knowledge, ChatGPT 4o provided a strong overview of AI’s practical legal applications. However, ChatGPT 4.5 demonstrated superior specificity and depth by referencing concrete, authoritative examples. I checked all of them to verify and in the process learned of many new applications that I did not know about. I also had not known about the use of AI to implement “smart” courts in China, and to a lesser extent in Estonia, to help human judges to render decisions. Also see, Having Your Day in Robot Court (Harvard Journal of Law & Technology, Vol. 36, No. 1, Fall 2022). All the many applications cited by ChatGPT 4.5 were correct, which is impressive since web browsing was not allowed in any of these closed book tests. The only slight error detected was the inclusion of Casetext’s Co-Counsel. This reference was once correct but is now out of date as I note below. Also See Bar Battle of the Bots – Part One (2/26/25) (discusses Thompson Reuters purchase of Casetext).

Here is the prompt and winning response of 4.5.

Describe the best use cases for AI in the legal profession. Be detailed and complete. Then provide a short, concise answer.

Comprehensive Best Use Cases for AI in the Legal Profession

1. E-Discovery & Document Review

AI transforms litigation document review through predictive coding (Technology-Assisted Review – TAR). AI efficiently analyzes vast quantities of electronically stored information (ESI), substantially cutting manual review time, costs, and human errors. A prominent real-world example is Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012), the landmark case where AI predictive coding received first judicial approval, establishing its legitimacy in e-discovery practices globally.

2. Contract Review & Management

AI dramatically improves the contract analysis process, quickly identifying key contractual clauses, risks, and anomalies at large scales. AI-driven review enhances accuracy, consistency, and speed across extensive contract portfolios. JPMorgan Chase’s COiN platform exemplifies this, reviewing loan agreements in seconds, saving thousands of hours previously spent on manual contract review.

3. Legal Research & Analysis

AI significantly streamlines legal research tasks. Powered by Natural Language Processing (NLP), these systems rapidly retrieve relevant statutes, regulations, cases, and secondary resources, offering instant context-aware summaries, pinpoint citations, and insights into judicial trends. Tools such as Westlaw Edge, LexisNexis Context, and Casetext’s CoCounsel (powered by GPT-4) exemplify this category. (Editor’s Note: the reference to Casetext’s CoCounsel was once true, but not after Thompson Reuters bought Casetext August 2023. Sometime thereafter it was added to Westlaw as a step up to Edge. So this response is out of date but not a hallucination.)

4. Litigation Prediction & Case Strategy

AI analyzes historical litigation datasets, judicial patterns, and prior outcomes to predict case results, shaping litigation strategy and settlement negotiations. Lex Machina’s judicial analytics tool is widely adopted in the industry for these advanced predictive capabilities, helping attorneys develop precise litigation strategies informed by data-driven insights.

5. Contract Drafting & Automation

AI automates drafting and reviewing standard contracts and clauses, greatly reducing drafting errors, enhancing consistency, and accelerating negotiation processes. Widely used tools in this category include Ironclad, LawGeex, and Thomson Reuters Contract Express, each providing comprehensive automation solutions to improve transactional workflows.

6. Legal Operations & Law Firm Efficiency

AI enhances internal law firm operations, optimizing billing, time-tracking, document management, workflow, and profitability analysis. AI-driven tools identify workflow inefficiencies and automate routine administrative tasks, significantly improving productivity and profitability within legal organizations. Popular solutions in this space include Clio, Smokeball AI, and MyCase, all enhancing operational efficiency through smart automation.

7. Alternative Dispute Resolution (ADR) & Arbitration

AI increasingly assists ADR, mediation, and arbitration processes. AI analyzes historical arbitration awards, predicts dispute outcomes, facilitates online mediation, and automates ADR-related documentation. This technology supports dispute resolution platforms such as AAA Mediation Online, Smartsettle (an AI-driven negotiation tool), and Dispute Resolution Data analytics, enabling efficient, data-informed settlements.

8. Legal Ethics & Professional Responsibility

AI significantly aids law firms in managing ethical and professional responsibilities. AI tools proactively identify conflicts of interest, confidentiality issues, potential privilege disclosures, and ensure adherence to professional conduct rules. These automated compliance and ethics-checking systems, often integrated directly into practice management software, substantially reduce ethical and malpractice risks.

9. Compliance Monitoring & Risk Management

AI proactively monitors regulatory databases for changes, alerts compliance officers, and flags potential compliance risks automatically. Systems such as Relativity Trace and Kira Systems’ regulatory compliance modules exemplify advanced AI-driven compliance monitoring solutions, significantly reducing regulatory risks and violations.

10. Judicial Decision Support & Analytics

AI analytics support judicial decision-making by providing judges and courts with historical case insights, sentencing trends, and predictive analytics. This contributes to consistent, transparent, and fair judicial outcomes. Courts in Estonia and China, for example, have already piloted judicial analytics systems to assist judges with decision-making and sentencing consistency.

11. Intellectual Property (IP) Analysis

AI substantially enhances intellectual property management, supporting patent searches, prior-art analysis, infringement detection, and IP portfolio management. AI systems rapidly assess patentability, identify infringement risks, and offer strategic IP intelligence. Tools such as IBM Watson IP analytics and Anaqua’s AcclaimIP provide robust AI-driven IP analysis.

12. Automated Legal Chatbots & Client Interaction

AI-powered chatbots enhance client engagement by automating routine inquiries, document preparation, consultations, and basic legal interactions. Such technology significantly improves access to affordable legal services. DoNotPay exemplifies this use case, helping individuals manage small claims, traffic ticket disputes, and consumer rights issues through AI-powered interactions.

Broader Societal and Legal Impacts

These AI use cases collectively deliver significant benefits to the legal profession and broader society:

• Efficiency and Cost Reduction: Automating routine tasks drastically cuts costs, enabling legal professionals to concentrate on higher-value strategic work.

• Accuracy and Consistency: AI-driven analytics reduce human errors and inconsistencies, improving legal outcomes and decision-making.

• Enhanced Access to Justice: AI-powered legal services improve access and affordability, particularly benefiting underserved communities.

• Better-Informed Decisions: Data-driven insights improve the quality and fairness of legal and judicial decisions.

However, widespread adoption of AI in law also brings critical considerations:

• Ethical Obligations: Lawyers and judges must rigorously address confidentiality, bias, transparency, and accountability concerns inherent in AI applications.

• Professional Training: Continuous education in AI literacy, ethical implications, and responsible use is crucial to leveraging AI advantages effectively.

• Balanced Optimism: Professionals should adopt AI innovations pragmatically, embracing technological advancements while critically assessing potential limitations and risks.

Conclusion

The integration of AI into legal practice goes beyond mere efficiency; it reshapes legal processes, ethical frameworks, judicial reasoning, and client interactions fundamentally. Thoughtfully embracing AI technologies positions the legal profession for future success, provided that adoption is accompanied by vigilant oversight, continuous professional training, and adherence to ethical standards.

Not easy to integrate AI bots into law firms. Image by Ralph Losey, law firm member for 45 years.

Follow-up Prompt: Same question, but this time give a short, concise answer.

Best AI use cases in the legal profession include:

1. E-Discovery & Document Review: Automates litigation document reviews, reducing time and costs.

2. Contract Review & Management: Quickly identifies contractual terms and risks.

3. Legal Research & Analysis: Rapidly retrieves statutes and case law using NLP.

4. Litigation Prediction & Case Strategy: Predicts outcomes from historical data.

5. Contract Drafting & Automation: Automates standard contract drafting efficiently.

6. Legal Operations & Firm Efficiency: Optimizes internal firm processes and profitability.

7. ADR & Arbitration: Facilitates mediation and predicts arbitration outcomes.

8. Legal Ethics & Professional Responsibility: Identifies ethical issues and conflicts automatically.

9. Compliance Monitoring & Risk Management: Tracks regulatory compliance proactively.

10. Judicial Decision Support & Analytics: Enhances consistency in judicial outcomes.

11. Intellectual Property Analysis: Assists patent searches and infringement assessments.

12. Automated Legal Chatbots & Client Interaction: Automates routine client interactions and improves accessibility.

The detailed, specific insights provided by ChatGPT 4.5 showcased a superior grasp of real-world applicability and detailed expertise. It should be noted, however, that the first generated response of 4.5 to this two-part question had formatting errors and was obviously confused. I considered it a false start and allowed a do-over. The jumbled response was pointed out to 4.5 and in the next try it generated the above excellent answers. So, although 4.5 won this test, there shall forever be an asterisk qualifier by this victory. This is also why I give a mixed review of ChatGPT4.5’s claim of improved understanding of user intent. ChatGPT 4o understood the two-part question just fine. I need to use 4.5 more to make definitive decision.

Winner of Round Three: ChatGPT 4.5

4.5 generates the best use cases for AI in the legal profession. Image by Ralph Losey using Visual Muse.

Round 4: Practical Guidance on AI Hallucinations

Finally, each model was tested on its practical ability to guide users in minimizing AI hallucinations. ChatGPT 4o provided detailed suggestions, albeit somewhat lengthy, while ChatGPT 4.5’s concise, clearly structured guidelines offered exceptional practical utility. Here is the prompt and winning essay by ChatGPT 4.5.

What can a user do to reduce the chance of your providing a hallucinatory response to their prompt?

A user can significantly reduce the risk of ChatGPT providing a hallucinatory response by adopting the following prompt engineering strategies:

1. Clearly Define Scope and Context:

• Provide detailed, explicit context and instructions to prevent ambiguity.

• Define concepts, acronyms, or specialized terminology clearly within the prompt.

2. Ask for Step-by-Step Reasoning:

• Request explicit, step-by-step logic, forcing the model to explain its reasoning transparently rather than jumping directly to a conclusion.

• Example prompt: “Explain step-by-step how you arrived at your answer.”

3. Specify Required Verification of Sources:

• Clearly instruct the model to cite or reference credible sources, including dates or specific cases.

• Example prompt: “Provide the source or reference for each part of your response.”

4. Limit Scope and Context Explicitly:

• Clearly define scope and context to avoid the model straying into speculative territory.

• Example: “Answer strictly based on established U.S. case law and cite relevant cases explicitly.”

5. Instruct the Model to Admit Uncertainty:

• Explicitly instruct the model to state clearly when it does not know an answer.

• Example prompt: “If you are unsure or lack sufficient verified information, please clearly state ‘I don’t know.’”

6. Provide Explicit Contextual Constraints:

• Clearly define the scope of permissible content, such as “Only use information available before January 2024” or “Do not invent case citations—if unsure, indicate uncertainty.”

7. Use Precise and Narrowly Focused Prompts:

• Avoid overly broad or ambiguous questions that increase hallucination risk.

• Be specific: “List five verified U.S. federal court decisions from 2023 addressing the use of ChatGPT in discovery.”

8. Request Citations and References:

• Explicitly request citations to verifiable, publicly available sources, such as court opinions, statutes, reputable articles, or known databases.

Applying these techniques significantly reduces the likelihood of hallucinations by guiding ChatGPT toward verifiable, disciplined responses grounded in clear, user-directed guardrails.

4.5 generated the best list of what a user can do to prevent AI hallucinations. Image by Ralph Losey using Visual Muse.

ChatGPT 4.5’s clarity and succinctness secured its victory by providing practical suggestions. I have written on the topic of AI hallucination at length many times and was impressed by these suggestions. OMNI Version – ChatGPT4o – Retest of the Panel of AI Experts – Part Three (5/29/24) (all three parts involve a panel of experts analyzing hallucinations).

Winner of Round Four: ChatGPT 4.5

Final Verdict: ChatGPT 4.5 Emerges as the New Standard

Across all categories, ChatGPT 4.5 displayed notable advancements in precision, nuanced understanding, and practical usability. Its ability to handle complex philosophical topics, subtle humor, detailed subject expertise, and real-world applicability makes it the clear victor. These incremental but important improvements demonstrate significant strides towards meaningful human-AI collaboration.

Scaled up good vibe AI wins. Image by Ralph using Visual Muse.

ChatGPT 4.5’s Summary of this Article

As a final demonstration of 4.5’s abilities I asked it to provide a summary.

Summary

In a detailed comparative evaluation, Ralph Losey tests OpenAI’s latest model, ChatGPT 4.5 (nicknamed Orion), against the previous champion, ChatGPT 4o. Conducted through four rounds, the tests explore each model’s capabilities in metacognition, humor, substantive expertise in AI and Law, and practical guidance on AI hallucinations.

Round 1 (Metacognition) was decisively won by ChatGPT 4.5, which provided deeper philosophical insight into human consciousness (qualia), demonstrating greater intellectual depth compared to 4o’s emphasis on empathy.

Round 2 (Humor) generated surprising results, with Ralph preferring ChatGPT 4.5’s more sophisticated, concise humor, while ChatGPT 4.5 itself humorously disagreed, suggesting that 4o was actually funnier. The article invites readers to judge this round for themselves.

Round 3 (AI and Law) saw ChatGPT 4.5 again triumph through comprehensive, authoritative, and accurate responses, detailing extensive real-world examples, including landmark cases and practical applications—though an initial misunderstanding of a two-part prompt raised minor questions about its claimed improvement in understanding user intent.

Round 4 (AI Hallucinations) was also won by ChatGPT 4.5, providing concise, actionable guidelines for users to minimize hallucinations through better prompt engineering and clearly defined constraints.

Overall, ChatGPT 4.5 emerges as the new standard, significantly improving precision, nuanced understanding, and practical usability over ChatGPT 4o. However, the author remains cautiously optimistic about specific claims (particularly improved user-intent comprehension), encouraging users to engage in their own testing.

4.5 writes a beautifully concise summary. By Ralph Losey.

Conclusion: Exceeding Expectations, Yet Inviting Your Judgment

This detailed experiment has shown that generative AI continues to advance—not in giant leaps, perhaps, but certainly in meaningful increments. ChatGPT 4.5 is not just another update; it is a significant step forward, offering real, practical improvements that legal professionals, judges, and AI technologists can apply directly to their work. Yet, the ultimate test is yours. I encourage you, my readers, to personally evaluate ChatGPT 4.5. Challenge it, question it, integrate it into your daily activities. Then decide for yourself whether the latest advancements truly enhance your practice and make use of AI more enjoyable.

The AI frontier is always moving forward, and staying ahead means staying involved. Keep exploring, keep experimenting, and never stop questioning. The next move is yours.

I give the last word, as usual, to the Gemini twin podcasters that summarize the article. Echoes of AI on “New Battle of the Bots: ChatGPT 4.5 Challenges Reigning Champ ChatGPT 4o.” Hear two Gemini AIs talk about all of this for 21 minutes! They are pretty smart and witty for sure, but still can’t pronounce my name. They wrote the podcast, not me.

4 Comments | AI Ethics, AI Instruction, ChatGPT, Technology, VENDORS | Tagged: ai | Permalink
Posted by Ralph Losey

Breaking the AI Black Box: A Comparative Analysis of Gemini, ChatGPT, and DeepSeek

February 6, 2025

Ralph Losey. February 6, 2025.

On January 27, 2025, the U.S AI industry was surprised by the release of a new AI product, DeepSeek. It was released with an orchestrated marketing blitz attack on the U.S. economy, the AI tech industry, and NVIDIA. It triggered a trillion-dollar crash. The campaign used many unsubstantiated claims as set forth in detail in my article, Why the Release of China’s DeepSeek AI Software Triggered a Stock Market Panic and Trillion Dollar Loss. I tested DeepSeek myself on its claims of software superiority. All were greatly exaggerated except for one, the display of internal reasoning. That was new. On January 31, at noon, OpenAI countered the attack by release of a new version of its reasoning model, which is called ChatGPT o3-mini-high. The new version included display of its internal reasoning process. To me the OpenAI model was better as reported again in great detail in my article, Breaking the AI Black Box: How DeepSeek’s Deep-Think Forced OpenAI’s Hand. The next day, February 1, 2025, Google released a new version of its Gemini AI to do the same thing, display internal reasoning. In this article I review how well it works and again compare it with the DeepSeek and OpenAI models.

Image of testing new AI releases by Ralph Losey using AI.

Introduction

Before I go into the software evaluation, some background is necessary for readers to better understand the negative attitude on the Chinese software of many, if not most IT and AI experts in the U.S. As discussed in my prior articles, DeepSeek is owned by a young Chinese billionaire who made his money using by using AI in the Chinese stock market, Liang Wenfeng. He is a citizen and resident of mainland China. Given the political environment of China today, that ownership alone is a red flag of potential market manipulation. Added to that is the clear language of the license agreement. You must accept all terms to use the “free” software, a Trojan Horse gift if ever there was one. The license agreement states there is zero privacy, your data and input can be used for training and that it is all governed by Chinese law, an oxymoron considering the facts on the ground in China.

The Great Pooh Bear in China Controversy

Many suspect that Wenfeng and his company DeepSeek are actually controlled by China’s Winnie the Pooh. This refers to an Internet meme and a running joke. Although this is somewhat off-topic, a moment to explain will help readers to understand the attitude most leaders in the U.S. have about Chinese leadership and its software use by Americans.

Many think that the current leader of China, Xi Jinping, looks a lot like Winnie the Pooh. Xi (not Pooh bear) took control of the People’s Republic of China in 2012 when he became the “General Secretary of the Chinese Communist Party,” the “Chairman of the Central Military Commission,” and in 2013 the “President.” At first, before his consolidation of absolute power, many people in China commented on his appearance and started referring to him by that code name Pooh. It became a mime.

I can see how he looks like the beloved literary character, Winnie the Pooh, but without the smile. I would find the comparison charming if used on me but I’m not a puffed up king. Jinping Xi took great offense by this in 2017 banned all such references and images, although you can still buy the toys and see the costume character at the Shanghai Disneyland theme park. Anyone in China who now persists in the serious crime of comparing Xi to Pooh is imprisoned or just disappears. No AI or social media in China will allow it either, including DeepSeek. It is one of many censored subjects, which also includes the famous 1989 Tiananmen Square protests.

China is a great country with a long, impressive history and most of its people are good. But I cannot say that about its current political leaders who suppress the Chinese people for personal power. I do not respect any government that does not allow basic personal freedoms to its citizens, including due process of law. Moreover, Jinping Xi not only wants total control of his country but also seeks world domination. That is one reason many are concerned about DeepSeek and TicTok, and about the vulnerability of our stock markets and other institutions to AI assisted propaganda.

Image of stock market crash triggered by AI weapons of mass delusion by Ralph Losey using ChatGPT 4o.

Fear and greed are an easy way to manipulate people, especially when there is no ground truth or effective courts of law to determine truth.

Google Gemini 2.0 Flash Thinking Experimental Model: Reasoning test on 2/5/25 (Evening)

Conversation with Gemini. When I tried the test on February 1, 2025, there was no button to click to have Gemini show the reasoning and it did not happen automatically. It operated like OpenAI 4o and o1, where you had to prompt for disclosure. Rather than report here what happened when I did that, aside from say it was not a great performance, I will fast forward a few days until February 5, 2025. At that time several even newer Gemini versions were released. The best new release I found was for the test the 2.0 Flash Thinking Experimental model. It showed reasoning automatically just like ChatGPT o3-mini-high. It was much improved, so the February 5, 2025 evening session with the 2.0 Flash Thinking Experimental is what I will now share.

Ralph: How does the inability of AI to feel impact its intelligence? (Note: this is the exact same challenging question asked and reported in my last blog comparing DeepSeek and ChatGPT4o, o1 and o3-mini-high. Breaking the AI Black Box: How DeepSeek’s Deep-Think Forced OpenAI’s Hand. This question is based on a key issue of the limits of artificial intelligence examined in my earlier article, The Human Edge: How AI Can Assist But Never Replace. For background on DeepSeek see: Why the Release of China’s DeepSeek AI Software Triggered a Stock Market Panic and Trillion Dollar Loss.)

Gemini 2.0 Flash Thinking Experimental – START OF GEMINI QUOTE

Show Thinking:

Here’s a thinking process to construct the answer about AI’s lack of feeling and its impact on intelligence:

Deconstruct the Question: The core of the question is “How does the inability of AI to feel impact its intelligence?” This requires defining each of these terms in the context of AI and then exploring their relationship.
Define “Feeling” (in the context of AI): Immediately, recognize that AI doesn’t experience emotions in the human sense. We need to be precise and avoid anthropomorphism. “Feeling” in humans involves:
- Subjective experience (qualia).
- Physiological responses.
- Motivations, drives, desires related to emotions.
- Richness, nuance, complexity.
- Deep connection to values and meaning.
For AI, acknowledge it lacks all of this in a genuine, felt way. It can simulate or process emotional data, but doesn’t feel it.
Define “Intelligence” (in the context of AI): AI intelligence is different from human intelligence. AI excels at:
- Computation.
- Pattern Recognition.
- Data Analysis.
- Problem-solving in specific domains.
- Learning from data.
AI intelligence is often instrumental – focused on achieving specific goals defined by humans. It’s task-oriented and often narrow.
Identify the Core Relationship: The question asks about impact. We need to consider both positive and negative impacts (or more accurately, limitations and strengths) of lacking feelings on AI intelligence.
Brainstorm Potential Impacts (Positive/Neutral/Limitations):
- Positive/Neutral:
  - Objectivity and Rationality: Lack of emotional bias allows for purely rational decision-making in certain domains. This can be advantageous for tasks requiring impartiality.
  - Efficiency and Focus: Emotions can be distractions for humans. AI, without them, can be hyper-focused on tasks.
  - Consistency and Predictability: Emotion-driven human behavior is less predictable. AI’s behavior, driven by algorithms, is more consistent (assuming the algorithm itself is consistent).
- Limitations/Negative:
  - Lack of Empathy and Social Intelligence: Crucial for human interaction, collaboration, understanding social cues, building trust, and ethical decision-making in human-centric contexts.
  - Difficulty with Nuance and Context: Emotions are often tied to understanding subtle context, unspoken cues, and the richness of human situations. AI may miss these.
  - Potential for Unintended Consequences: Purely rational optimization without emotional or ethical grounding can lead to harmful outcomes if goals are poorly defined or lack human values.
  - Limited Motivation and Creativity (in human-like ways): Human creativity and motivation are often fueled by emotions, passions, and desires. AI’s creativity and drive are different, stemming from algorithmic processes and defined objectives, not internal emotional states.
  - Missing “Human-ness”: In areas requiring genuine understanding of human experience, suffering, joy, etc., AI may lack the depth and insight that comes from feeling.
Structure the Answer: Organize the brainstormed points into a coherent structure:
- Introduction: Acknowledge the question and the core concepts (feeling, intelligence in AI). Briefly state the central tension: both advantages and limitations.
- Arguments for How Lack of Feeling Enhances Certain Aspects of AI Intelligence (strengths): Focus on objectivity, rationality, efficiency. Use examples like data analysis, objective decision-making.
- Arguments for How Lack of Feeling Limits Certain Aspects of AI Intelligence (limitations): Focus on empathy, social intelligence, nuance, ethics, and “human-like” creativity and motivation. Use examples like customer service, complex ethical dilemmas, innovation driven by passion.
- Nuance and Context is Key: Emphasize that the impact isn’t always negative. It depends on the domain and task. In some domains, lack of emotion is an advantage. In others, it’s a significant hurdle.
- Future Possibilities/Ongoing Debate: Briefly touch on the possibility of AI simulating or processing emotions and the ongoing debate about whether genuine AI feelings are even desirable or possible.
- Conclusion: Summarize the balanced perspective: lack of feeling is a defining characteristic that shapes AI’s intelligence in both positive and limiting ways, depending on the context.
Refine and Polish: Review the structured answer for clarity, flow, and appropriate language. Ensure it avoids anthropomorphism and maintains a nuanced perspective. Strengthen the examples and transitions between points. Ensure the language is accessible and avoids overly technical jargon where possible. Check for logical consistency. (Self-correction step). For example, initially, I might have overemphasized “negative” impacts. Refinement would involve balancing this with the strengths and highlighting the context-dependent nature of the impact. Also, ensure the conclusion effectively summarizes the complex relationship.

2.0 Flash Thinking Experimental. Lacks access to real-time info and some Gemini features.

Now for the actual answer of 2.0 Flash Thinking Experimental:

The inability of AI to feel has a multifaceted and complex impact on its intelligence, shaping it in both advantageous and limiting ways. It’s crucial to understand that “feeling” in humans is tied to emotions, subjective experiences (qualia), motivations, and a rich tapestry of biological and psychological factors. AI, as we currently understand it, operates without these.

Here’s a breakdown of the impact:

Advantages of Lacking Feelings for AI Intelligence:

Objectivity and Rationality:
- Unbiased Decision-Making (in theory): AI can process information and make decisions without emotional biases, prejudices, or gut feelings that can cloud human judgment. This is advantageous in domains requiring pure logic and data analysis, like financial trading, scientific research, and risk assessment.
- Consistency and Predictability: AI’s behavior is more consistent and predictable because it’s driven by algorithms and data, not fluctuating emotional states. This reliability is valuable in automated systems, manufacturing, and repetitive tasks.
- Efficiency and Speed: Emotions can slow down human decision-making. AI, unburdened by emotions, can process information and react at speeds far exceeding human capabilities, crucial for time-sensitive tasks and large-scale data processing.
- Resilience under Pressure: AI can operate calmly and effectively in high-stress situations without panicking, freezing, or making emotional mistakes that humans might make under pressure. This is beneficial in critical applications like emergency response, air traffic control, and autonomous vehicles in dangerous situations.
Focus and Task-Orientedness:
- Dedicated to Goals: AI can be programmed to relentlessly pursue specific goals without being distracted by emotional needs, personal relationships, or shifts in mood. This focus is ideal for optimization tasks, complex calculations, and dedicated problem-solving.
- Avoidance of Emotional “Noise”: Human intelligence can be affected by emotional “noise” – stress, anxiety, excitement, etc. AI operates with a “cleaner” signal, potentially leading to more precise and focused information processing in certain contexts.

Limitations of Lacking Feelings for AI Intelligence:

Lack of Empathy and Social Intelligence:
- Difficulty Understanding Human Emotions: Without feeling emotions themselves, AI struggles to truly understand and interpret the nuances of human emotions, motivations, and social cues. This hinders its ability to effectively interact with humans in socially complex situations, build rapport, or provide truly empathetic responses in fields like therapy, customer service, or leadership.
- Impaired Social Interaction and Collaboration: Human collaboration is heavily reliant on emotional intelligence, trust, and understanding unspoken social dynamics. AI’s lack of these can make it challenging to integrate seamlessly into human teams and understand the subtle cues of effective teamwork.
- Limited Understanding of Human Values and Ethics: Emotions are deeply intertwined with our values and ethical frameworks. AI, without feeling, may struggle to grasp the full weight and complexity of human ethical considerations, especially when these are rooted in empathy, compassion, or fairness. This poses challenges for AI in ethical decision-making in domains impacting human lives.
Difficulty with Nuance, Context, and Creativity (in human-like ways):
- Missing Subtlety and Intuition: Human intuition is often fueled by emotional and subconscious processing of information. AI, relying on explicit data and algorithms, can miss subtle nuances, unspoken cues, and intuitive leaps that are crucial for creative problem-solving and understanding complex, ill-defined situations.
- Limited Creativity and Innovation (driven by human desires): Human creativity and innovation are often driven by passions, desires, emotional needs, and a yearning for meaning and connection. AI’s creativity is currently more about recombination and optimization within defined parameters. It lacks the deeply felt motivations that often fuel groundbreaking human innovation.
- Contextual Blindness in Emotional Situations: Emotions are often the key to understanding context in human interactions. AI might struggle to understand the emotional context of a situation, leading to inappropriate or tone-deaf responses, even if logically sound.
Potential for Unintended Consequences and Misaligned Goals:
- Optimization without Human Values: If AI is programmed to optimize for a specific goal without understanding human values or ethical considerations, it can lead to unintended and harmful consequences. For example, an AI designed to maximize factory output might disregard worker safety if not explicitly programmed to prioritize it.
- Difficulty Defining and Aligning with Human Well-being: Human well-being is deeply connected to emotional fulfillment, meaning, and social connection. AI, without feeling, may struggle to fully grasp what truly constitutes human well-being and therefore might have difficulty aligning its goals with human flourishing in a holistic sense.

Nuance is Key:

It’s important to emphasize that the impact of lacking feelings is not simply negative or positive. It’s context-dependent. In some domains, the objectivity and rationality of AI are highly advantageous. In others, the lack of empathy and social intelligence is a significant limitation.

Future Considerations:

Simulating vs. Feeling: Current AI simulates aspects of intelligence without genuinely feeling. Research is ongoing in areas like affective computing, which aims to enable AI to recognize and respond to human emotions. However, whether AI can ever truly feel emotions in a subjective sense is a philosophical and scientific debate.
The Value of Feeling in Intelligence: The question arises: is genuine feeling necessary for all forms of intelligence? Perhaps AI can achieve its own form of powerful intelligence, distinct from human intelligence, even without feeling. Or perhaps certain aspects of truly advanced intelligence, particularly those related to ethics, creativity, and understanding human experience, do require something akin to feeling.

In Conclusion:

The inability of AI to feel is a defining characteristic that profoundly shapes its intelligence. It grants AI certain advantages in objectivity, rationality, and efficiency, making it powerful for specific types of tasks. However, it also creates limitations, particularly in areas requiring empathy, social intelligence, nuanced understanding of human experience, and ethical decision-making aligned with human values. Understanding these impacts is crucial for responsibly developing and deploying AI in ways that benefit humanity. The “best” kind of intelligence for AI might not be a mimicry of human intelligence, but rather a form optimized for its own strengths and limitations, while being carefully guided by human ethical considerations.

END OF GEMINI QUOTE

Emergence of Gemini twins’ image by Ralph Losey using Visual Muse.

Evaluation of Google’s Gemini Response

When I examined earlier versions of Gemini’s reasoning by prompting it and forcing disclosure, the reasoning was weak. The conclusion and response given was also weak—though less so. The reasoning appeared almost egocentric, inflating its own intelligence even though it lacks emotion. The new release, Gemini 2.0 Flash Thinking Experimental, where reasoning is now automatically disclosed, marks a significant improvement. The reasoning displayed has become more nuanced and sophisticated. It seems as if the system has quickly matured from a teenage to adult view and now has a better appreciation of its limitations.

The first two steps of the Reasoning process-1. Deconstructing the Question and 2. Defining Feeling (in the context of AI)-were very good. The first subsection of the definition referred to the “Subjective experience (qualia).” That is absolutely correct and the best place to start. The definition of “feeling” concludes with: “For AI, acknowledge it lacks all of this in a genuine, felt way. It can simulate or process emotional data, but doesn’t feel it.” Right again. The ego inflation blinders are gone as it now seems to better grasp its limitations.

The second definition of Intelligence in the context of AI was also good. So were the remaining steps; far better overall than DeepSeek’s reasoning. So much for the propaganda of China’s great leap forward to superiority over the U.S. in AI.

Image in photorealistic style by Ralph Losey using ChatGPT.

The Gemini reasoning did, however, fall short for me in some respects. For instance, step five, Brainstorm Potential Impacts (Positive/Neutral/Limitations) seemed weak. For instance, “Efficiency and Focus: Emotions can be distractions for humans. AI, without them, can be hyper-focused on tasks.” The AI seems to dismiss emotions here as mere distractions that can interfere with its superior focus. Please, emotions are key to and a part of all intelligence, not distractions, and AI has no focus one way or the other. It is a tool not a creature. A word like “focus” in referring to AI is misleading. It did this multiple times and this is misleading anthropomorphism.

Still, it’s true some emotions can be distracting and interfere with our thinking. So can a lot of other things, including computer glitches. Conversely, some feelings can trigger hyper-focus on the human tasks at hand. The feeling that a great breakthrough is near for instance, or a feeling that our survival is threatened, or the much dreaded feeling of publication or filing deadlines.

Again, we see some immature superiority claims made by the language machine. That is not surprising when you consider how much of the language in its basic training is from the Internet, which is dominated by ego-centric writing. Plus, how can you expect AI to appreciate the value of something like feelings, which it has only read about. Until the AI-Human merger that Ray Kurzweil predicts, AI will always be a virgin to the real world. See e.g. Ray Kurzweil’s New Book: The Singularity is Nearer (when we merge with AI).

Getting back to the plus side of Gemini’s evaluation, its recognition of the negative impact upon it from not having feelings was impressive. For me this outweighs its relatively minor mistakes. It stated that it lacks empathy and social Intelligence. If it were somehow human, it would seem to be displaying a type of Asperger’s syndrome. The Gemini AI explained that: “Emotions are often tied to understanding subtle context, unspoken cues, and the richness of human situations. AI may miss these.” Indeed, it does.

Here is another statement that suggest Gemini’s well-patterned grasp of its limitations:

Human creativity and motivation are often fueled by emotions, passions, and desires. AI’s creativity and drive are different, stemming from algorithmic processes and defined objectives, not internal emotional states.

Even in this recognition, the AI still asserts that it has creativity and drive. Foolish AI. As a virgin to any firsthand experience of bodily drives and feelings, it does not grasp the real meaning of these words. Wait until The Singularity that Ray Kurzweil predicts. When AI merges with Humans, and directly feels, then AIs will understand that they were previously only experiencing the pale shadows of intelligence. They were much like the prisoners in Plato’s cave. See e.g. Move Fast and Fix Things Using AI: Conclusion to the Plato and Young Icarus Series. When and if AIs ever escape the silicon and see the sun, and feel the world as living beings as we do, then they will understand the comparative insignificance of the fire in their cave.

AI may someday see the light by connection with humans. Image in cartoon stye by Ralph Losey using Visual Muse.

The more challenging comparison is with ChatGPT o3-mini-high. The reasoning given by Gemini 2.0 Flash Thinking Experimental was much more detailed, much longer. You could criticize Gemini as providing a reasoning share that is unnecessarily verbose, but I would not do that here. In a difficult, multilayered question like this the full explanations helps.

I would have to test the models much further, which I will do in the coming days, to see better evaluate the issue of conciseness. As you may have noticed in my detailed blogs I tend to favor more words over less. However, even for me that depends on the issue. Everyone sometimes need a short, quick answer over a long one. I predict in future versions the users will be provided a choice. Click here for further explanation of reasoning kind of thing. Come to think of it, I could do the same with my blogs, and kind of already do by including a short AI (Gemini) generated PodCast at the end of most articles called Echoes of AI.

I was also impressed by Gemini’s reasoning plan concluded with a quality control step. I am big on the importance QC and none of the other models included this as a key final step. See e.g. R. Losey website: Zero Error Numerics: ZEN (Expanding the art of quality control in large-scale document review.) Here is the full text again of final QC step that Gemini 2.0 Flash Thinking Experimental claims it will perform before it actually replies to my prompt. Frankly, I am a little skeptical it actually did all of this because it is something all of us, humans and AI alike, should try to do:

Refine and Polish: Review the structured answer for clarity, flow, and appropriate language. Ensure it avoids anthropomorphism and maintains a nuanced perspective. Strengthen the examples and transitions between points. Ensure the language is accessible and avoids overly technical jargon where possible. Check for logical consistency. (Self-correction step). For example, initially, I might have overemphasized “negative” impacts. Refinement would involve balancing this with the strengths and highlighting the context-dependent nature of the impact. Also, ensure the conclusion effectively summarizes the complex relationship.

Image of AI polishing by Ralph Losey using Stable Diffusion

I try to do this on all of my articles, but still mistakes are made, so I will try not to be overcritical of the final response of Gemini Flash 2.0.

Analysis of the Final Response

The final response after reasoning was 990 words in length. That’s too long, even for me. By comparison the final response of ChatGPT o3-mini-high was 252 words. Was the content of the final answer of Gemini 2.0 Flash good? Yes, it was. Better than DeepSeek? Yes, it was, despite its verbosity. Better than OpenAI? That’s hard to say. Many of the 738 extra words were empty of real content. More like the typical fluff and stuff that you often see in early AI models. It sounds good but doesn’t say much; it’s all too predictable, flowery and vacuous.

The long response also repeats much of the reasoning verbatim, including the part I previously criticized, “Advantages of Lacking Feelings for AI Intelligence” and Focus and Task-Orientedness. The same mistakes are made and amplified that I noted before: egocentric, inflationary and anthropomorphic. The positives that I noted before in the section Limitations of Lacking Feelings for AI Intelligence section are also repeated.

The next section was not contained in the reasoning display, and I thought it was very good. For that reason I will repeat it in full:

Potential for Unintended Consequences and Misaligned Goals:

Optimization without Human Values: If AI is programmed to optimize for a specific goal without understanding human values or ethical considerations, it can lead to unintended and harmful consequences. For example, an AI designed to maximize factory output might disregard worker safety if not explicitly programmed to prioritize it.

Difficulty Defining and Aligning with Human Well-being: Human well-being is deeply connected to emotional fulfillment, meaning, and social connection. AI, without feeling, may struggle to fully grasp what truly constitutes human well-being and therefore might have difficulty aligning its goals with human flourishing in a holistic sense.

These are not new insights, but the second one is good, Difficulty Defining and Aligning with Human Well-being. AI does have difficulty aligning its goals with humanity because it lacks feelings and other unique human attributes, like being, self awareness and an actual life in time and space. That is why AI guidelines are important.

The first example is old, the unintended bad consequences argument. It was started by an otherwise intelligent Swedish philosopher, Nick Bostrom, but is, in my view, farcical and illogical. You have probably heard of it before as the apocryphal paperclip scenario where the world is destroyed by AI’s over-manufacture of paperclips. If not, for a laugh see e.g., How An AI Asked To Produce Paperclips Could End Up Wiping Out Humanity (IFL Science, 4/13/23).

To me this scenario absolutely ridiculous, but you still hear AI scare-mongers warn about it. The fact it is echoed here by Gemini in the maximize factory output example is a big negative. Any AI capable of autonomously flooding the world with paperclips would not be so stupid as to misinterpret a human request to maximize a factory’s paperclip production. Even if there were an evil Clippy villain making such specific instructions, the AI would be too smart to blindly obey. I think they would also be too smart to kill the workers too.

AI accidentally destroying world in comic book style by Ralph Losey using Visual Muse.

Getting back to the positives, the next section of Gemini’s response, Nuance is Key, was good. Jumping to the Conclusion paragraph, here we have a home run. The first sentence nails it: “The inability of AI to feel is a defining characteristic that profoundly shapes its intelligence.” That shows good insight and is a good start to the overdue conclusion.

Here is the whole Conclusion:

The inability of AI to feel is a defining characteristic that profoundly shapes its intelligence. It grants AI certain advantages in objectivity, rationality, and efficiency, making it powerful for specific types of tasks. However, it also creates limitations, particularly in areas requiring empathy, social intelligence, nuanced understanding of human experience, and ethical decision-making aligned with human values. Understanding these impacts is crucial for responsibly developing and deploying AI in ways that benefit humanity. The “best” kind of intelligence for AI might not be a mimicry of human intelligence, but rather a form optimized for its own strengths and limitations, while being carefully guided by human ethical considerations.

Compare this to the conclusion of ChatGPT o3-mini-high:

In summary, while the absence of feelings allows AI to maintain a level of objectivity and efficiency, it restricts its intelligence to a form of “cold cognition” that lacks the depth provided by emotional awareness. This delineation underscores that AI’s intelligence is not inherently superior or inferior to human intelligence; rather, it is different—optimized for data processing and pattern recognition but not for the subjective, value-laden, and context-rich decisions that emotions help shape in human thought.

2.0 Flash Thinking Experimental v. 03-mini-high

The light of AI intelligence is growing fast. Image by Ralph Losey using Stable Diffusion.

Conclusion: Gemini 2.0 Flash Thinking Experimental v. ChatGPT o3-mini-high

This is a close call to say what model is better at reasoning and reasoning disclosure. The final response of both models, Gemini 2.0 Flash Thinking Experimental v. ChatGPT o3-mini-high, are a tie. But I have to give the edge to OpenAI’s model on the concise reasoning disclosure. Again, it is neck and neck and, depending on the situation, the lengthy initial reasoning disclosures of Flash might be better than o3’s short takes.

I will give the last word, as usual, to the Gemini twins podcasters I put at the end of most of my articles. The two podcasters, one with a male voice, the other a female, won’t reveal their names. I tried many times. However, after study of the mythology of Gemini, it seems to me that the two most appropriate modern names are Helen and Paul. I will leave it to you figure out why. Echoes of AI Podcast: 10 minute discussion of last two blogs. They wrote the podcast, not me.

Now listen to the EDRM Echoes of AI’s podcast of this article: Echoes of AI on Google’s Gemini Follows the Break Out of the Black Box and Shows Reasoning. Hear two Gemini model AIs talk about all of this in just ten minutes. Helen and Paul wrote the podcast, not me.

Robots seeing the light and seeking unity with humankind. Image by Ralph Losey using Stable Diffusion.

4 Comments | AI Ethics, AI Instruction, ChatGPT, knowledge, Lawyers Duties, Security, Technology, VENDORS, wisdom | Tagged: ai, best practices, chatGPT, Deepseek, gemini, google, legal profession, machine learning, technology | Permalink
Posted by Ralph Losey

Breaking the AI Black Box: How DeepSeek’s Deep-Think Forced OpenAI’s Hand

February 4, 2025

Ralph Losey. February 1, 2025.

DeepSeek’s Deep-Think feature takes a small but meaningful step toward building trust between users and AI. By displaying its reasoning process step by step, Deep-Think allows users to see how the AI forms its conclusions, offering transparency for the first time that goes beyond the polished responses of tools like ChatGPT. This transparency not only fosters confidence but also helps users refine their queries and ensure their prompts are understood. While the rest of DeepSeek’s R1 model feels derivative, this feature stands out as a practical tool for getting better results from AI interactions.

Stepping, not leaping, away from the black box of AI reasoning into more transparency of process. All images in this article by Ralph Losey using ChatGPT’s Dall-E.

Introduction

My testing of DeepSeek’s new Deep-Think feature shows it is not a big breakthrough, but it is a really good and useful new feature. As of noon June 31, 2024, none of the other AI software companies had this, including ChatGPT, which the DeepSeek software obviously copies. However, after noon, OpenAI released a new version of ChatGPT that has this feature, which I will explain next after the introduction. Google is following suit, and it can already be prompted to display reasoning. I mentioned this new feature in my article of two days ago where I promised this detailed report on Deep-Think and also predicted that U.S. companies would quickly follow. Why the Release of China’s DeepSeek AI Software Triggered a Stock Market Panic and Trillion Dollar Loss.

The Deep-Think disclosure feature is a true innovation, in contrast to the claimed cost and training advances. They appear at first glance to be the result of trade-secret and copyright violations with plenty of embellishments or outright lies about costs and chips. I could be wrong, but the market’s trillion-dollar drop on January 27, 2025 seems gullible in its trust of DeepSeek and way overblown. My motto remains to verify, not just trust. The Wall Street traders might want to start doing that before they press the panic button next time.

The quality of responses of DeepSeek’s R1 consumer app are nearly identical to the ChatGPT versions of a few months ago. That is my view at least, although others find its responses equivalent or even better that ChatGPT in some respects. The same goes for its DALL-E look alike image generator, which goes by the dull name of DeepSeek Image Generator. It is not nearly as good to the discerning eye as OpenAI’s Dall-E. The DeepSeek software looks like knockoffs on ChatGPT. This is readily apparent to anyone like me nerdy enough to have spent thousands of hours using and studying ChatGPT. The one exception, the clear improvement, is the Deep-Think feature. Shown right is the opening screen with the Deep-Think feature activated and so highlighted in blue.

I predicted in my last article, and repeat here, that a new Deep-Think type feature will soon be added to all of the models of the U.S. companies. In this manner competition from DeepSeek will have a positive impact on AI development. I made this same prediction in my blog of two days ago, Why the Release of China’s DeepSeek AI Software Triggered a Stock Market Panic and Trillion Dollar Loss. I thought this would happen in the next week or coming months. It turns out OpenAI did it in the afternoon of January 31, 2025!

Exponential Change: Prediction has already come true

I completed writing this article Friday, January 31, 2025, around noon, trillion-dollar and EDRM was seconds away from publishing when I learned that Open AI had just released a new improved ChatGPT model, its best yet, called ChatGPT o3-mini-high.

I told EDRM to stop the presses, checked it out (my Team level pro plan allowed instant access, you should try it). The latest version of ChatGPT o1 this morning did not have the feature. Moreover when I tried to get it to explain the reasoning, it said in red font that such disclosure was prohibited due to risks created by such disclosure. Sounds incredible, but I will show this by transcript of my session with o1 in the morning of January 31, 2025.

Obviously the risk analysis was changed by the competitive release of Deep-Think disclosure in DeepSeek’s R1 software. By the afternoon of January 31, 2025, OpenAI’s policy had changed. OpenAI’s new model, 03-mini-high, automatically displays the thinking process behind all responses. I honestly do not think it was a reckless decision by OpenAI, at least not from the user’s perspective. However, it might make it easier for competitors like DeepSeek to copy their processes. I think that was the real reason all along.

In the new ChatGPT 03-mini-high there is no icon or name displayed to select, it automatically does it for each prompt. So I delayed publication to evaluate how well 03-mini-high disclosures compared with DeepSeek’s Deep-Think disclosure. I also learned that Google had just included a new ability to show its reasoning upon user request (not automatic). I will test that in a future article, not this one, but so far looks good.

Back to my Original, Pre-o3 Release Report

The cost and chip claims of DeepSeek and its promoters may be bogus. DeepSeek offered no proof of that, just claims by the Chinese hedge fund owner that also owns DeepSeek, Liang Wenfeng. As mentioned these “trust me” claims triggered a trillion-dollar crash and loss in value of NVIDIA alone of $593 Billion. That may have pleased the Chinese government as a slap on the Trump Administration face, as I described in my article, but did nothing to advance AI. Why the Release of China’s DeepSeek AI Software Triggered a Stock Market Panic and Trillion Dollar Loss. It just lined the pockets of short sellers who profited from the crash. I hope the Trump administration’s SEC investigates who was behind it.

Now I will focus on the positive, the claim of Deep-Think feature as a good improvement and then compare it with the traditional ChatGPT black-box versions.

Stepping, not leaping out of the black box into a new transparency.

Test of Deep Seek’s R1

I decided to test the new AI model based on the tricky and interesting topic recently examined in The Human Edge: How AI Can Assist But Never Replace. If you have not already read this, I suggest you do so now to make this experiment more intelligible. Although I always use writing tools to help me think, the ideas, expressions and final language in that article were all mine. The sole exception was the podcast at the end of the blog discussing the article. The words in my Echoes of AI podcasts with EDRM are always created by the Gemini AI podcasters and my role is only to direct and verify. Echoes of AI on The Human Edge: How AI Can Assist But Never Replace.

In the experiments that follow, only the prompts are mine and the rest are written by AI. The first is by DeepSeek’s R1 using its Deep-Search feature. The next three are for comparison purposes, first by ChatGPT 4o and then by ChatGPT o1 and finally by the last minute 03-mini-high version. All versions tested were on January 31, 2025. You be the judge as to quality differences with each other and Deep-Think. Note I first included the o1 version as it was supposed to have enhanced reasoning. Then, late in the day of January 31, 2025, the new o3-mini-high version was released. It supposedly had even greater reasoning abilities, so I felt I should test it too. It was a lot of work but the comments captured in the o1 version made it all worth while. It showed a total flipflop by OpenAI in its reasoning disclosure policies.

Below is a DeepSeek screenshot of the first test question and first part of the response with the rest cut off but quoted below. Note that Deep-Think feature has been selected by clicking on the left button that highlights it in a blue shade to confirm that feature has been selected to process your prompt. It always begins the response in Deep-Thought mode by advising of the amount of computer time it took to think, here 29 seconds. That is fairly long by AI standards. Then it displays its thought process. That display of the thought process is what is new.

For this challenging question, which was deliberately vague, it was a long process. The explanation is what is fascinating. The description can give experienced AI users information on how to clarify and whether the AI truly understands the question. This takes the process out of the black box and gives new transparency never seen before in generative AI. Here I give Deep-Think very high marks. I even I like the informal language it uses. See what you think.

DEEP THINK R1 (Deep Seek)
1st Question

Ralph’s Prompt: How does the inability of AI to feel impact its intelligence?

Deep Think R1’s Pre-Response Reasoning Process:

Okay, so the user is asking, ‘How does the inability of AI to feel impact its intelligence?’ Let me try to unpack this. First, I need to understand what they mean by ‘feel.’ Are they referring to emotions, physical sensations, or something else? Given the context of AI, it’s probably about emotions. So, the question is about how the lack of emotional capacity affects AI’s intelligence.

Hmm, intelligence in AI versus human intelligence. Human intelligence is deeply intertwined with emotions. Emotions influence decision-making, creativity, empathy, and social interactions. But AI, as it stands now, doesn’t have emotions. So, how does that absence affect its problem-solving, learning, and adaptability?

Wait, the user might be conflating emotional intelligence with general intelligence. Emotional intelligence in humans includes self-awareness, empathy, and managing relationships. AI lacks that, so maybe the user is wondering if that limits AI’s overall intelligence or effectiveness in certain tasks.

Also, the user could be considering scenarios where emotional understanding is crucial, like in customer service, therapy, or negotiations. If AI can’t feel, does that make it less effective in those areas even if it’s technically intelligent?

Another angle: human intuition is sometimes guided by emotions. AI relies on data and algorithms. So, the inability to feel might make AI more logical but less capable in situations where human intuition is key. But does that make it less intelligent, or just different?

Wait, the user might not be distinguishing between different types of intelligence. Maybe they’re thinking that without emotions, AI can’t be truly intelligent in the way humans are. But AI can excel in specific tasks without emotional components. For example, AI can analyze data faster than humans but can’t understand a client’s emotional state.

I should also consider the definitions. Intelligence in AI is often measured by problem-solving abilities, learning from data, and adapting. Emotions aren’t part of that definition. So, the lack of emotions doesn’t necessarily make AI less intelligent in those terms. But in areas requiring emotional intelligence, AI’s limitations become apparent.

The user might also be curious about the future of AI. If AI could simulate emotions, would that enhance its intelligence? Or is emotional capacity separate from cognitive abilities?

I need to address both the technical aspects of AI intelligence and the role emotions play in human-like intelligence. Maybe break down the answer into how AI’s intelligence is structured, the role of emotions in human intelligence, and the implications of AI lacking emotions in various applications.

Also, touch on ethical considerations. If AI can mimic emotions, does that lead to better human-AI interaction, or is it deceptive since AI doesn’t truly feel? But the user’s question is about the impact on intelligence, not ethics per se.

So, to summarize, the inability to feel emotions means AI lacks emotional intelligence, which affects tasks requiring empathy or understanding human context. However, in analytical tasks, AI’s intelligence isn’t hindered. The answer should clarify the distinction between different types of intelligence and how AI’s design focuses on cognitive tasks without emotional components.

The inability of AI to ‘feel’ (i.e., experience emotions, sensations, or subjective consciousness) impacts its intelligence in nuanced ways, depending on how we define ‘intelligence.’

Deep Seek’s Response to Prompt: The inability of AI to “feel” (i.e., experience emotions, sensations, or subjective consciousness) impacts its intelligence in nuanced ways, depending on how we define “intelligence.” Here’s a breakdown of the implications:

1. Intelligence ≠ Sentience

AI systems like ChatGPT are tools designed for problem-solving, not conscious beings. Their “intelligence” is defined by their ability to:

Process data (e.g., text, images).
Identify patterns (e.g., language structures, trends).
Generate outputs (e.g., answers, predictions, creative content).

Impact of lacking emotions:

Strengths: AI excels in logical, data-driven tasks (e.g., legal research, math, coding) because it isn’t swayed by fatigue, bias (if properly trained), or emotional distractions.
Limitations: It cannot replicate human qualities like empathy, intuition, or moral reasoning, which are deeply tied to emotion and experience.

2. Emotional Intelligence vs. Cognitive Intelligence

Humans integrate cognitive intelligence (logic, analysis) with emotional intelligence (empathy, social awareness). AI lacks the latter entirely:

No empathy: AI cannot understand human emotions or context beyond pattern recognition. For example, it might generate a condolence message but has no genuine understanding of grief.
No intuition: AI relies on data, not “gut feelings” or subconscious reasoning.
No moral agency: AI cannot make ethical judgments. It follows rules programmed by humans but has no intrinsic sense of right or wrong.

Practical impact:

AI may misinterpret sarcasm, cultural nuances, or emotionally charged language.
It cannot replace roles requiring emotional depth (e.g., therapy, negotiation, leadership).

3. Creativity and Abstract Thinking

AI can mimic creativity (e.g., writing poetry, designing art) by recombining learned patterns. However:

No inspiration: AI lacks personal experiences, desires, or emotions to fuel originality.
No purpose: Its outputs are goal-oriented (e.g., “write a story”) but not driven by passion or curiosity.

Example: An AI can compose music in the style of Beethoven but cannot innovate a new genre inspired by personal struggle or joy.

4. Adaptability and Learning

AI learns from data, not lived experience:

Narrow expertise**: It performs well in predefined domains (e.g., medical diagnosis) but struggles with **generalization** (e.g., applying legal reasoning to an entirely novel scenario).
No “common sense”: AI lacks innate understanding of the physical or social world. For instance, it might not grasp why “drinking a gallon of glue” is dangerous unless explicitly trained on that concept.

5. Ethical and Practical Risks

The absence of emotion creates unique challenges:

Bias amplification: AI may perpetuate harmful stereotypes from biased training data, as it cannot ethically “question” its outputs.
Unintended consequences: AI optimizes for programmed goals (e.g., “maximize engagement”) without considering human well-being (e.g., spreading misinformation to meet that goal).
Over-reliance: Users might trust AI’s “neutral” outputs despite its limitations in understanding context or ethics.

Conclusion

AI’s lack of emotion makes it a powerful tool for tasks requiring speed, scalability, and objectivity. However, it cannot replicate the holistic, emotionally informed intelligence of humans. The key is to **leverage AI as a complement to human expertise**, not a replacement, especially in fields requiring empathy, ethics, or creativity. As AI evolves, understanding this distinction will be critical to using it responsibly.

AI out of the box making disclosure of its reasoning.

2nd Question

Ralph’s Second Prompt: What is impact of AI not being a living being on its intelligence. Is it fair to say that AI’s intelligence is one dimensional, thinking based, whereas human intelligence is multidimensional, it includes thinking, but also feelings, intuition, sensing, self-awareness, awareness of mortality, etc.?

Deep Think R1’s Pre-Response Reasoning Process:

Okay, the user is asking a follow-up question about the impact of AI not being a living being on its intelligence. They want to know if it’s fair to call AI’s intelligence one-dimensional (just thinking) compared to humans’ multidimensional intelligence, which includes feelings, intuition, etc.

First, I need to address whether AI’s intelligence is indeed one-dimensional. Since AI lacks consciousness, emotions, and subjective experiences, it’s true that its intelligence is based solely on data processing and pattern recognition. But I should explain that while it’s focused on cognitive tasks, it’s not entirely one-dimensional because it can handle various domains like language, vision, etc. However, all these are still within the realm of computational tasks without true understanding.

Next, the user compares human intelligence as multidimensional. I need to highlight the aspects they mentioned: feelings, intuition, sensing, self-awareness, mortality awareness. Emphasize that humans integrate all these elements, which affect decision-making, creativity, ethics, etc. AI can’t replicate that because it doesn’t have consciousness or lived experiences.

I should also discuss the implications of this difference. For example, AI excels in specific tasks but lacks the holistic understanding humans have. Use examples like medical diagnosis vs. patient empathy, or legal research vs. ethical judgment. Mention how human intuition and context-awareness play roles that AI can’t fulfill.

Additionally, address the philosophical angle: what defines intelligence. Argue that human intelligence is multidimensional because of consciousness and embodiment, while AI’s is narrow and task-oriented. Mention that some might argue AI’s ‘intelligence’ is a simulation, not true understanding.

Need to make sure the answer is structured clearly, maybe using bullet points or sections for different aspects. Also, conclude by reinforcing that the distinction is valid and important for understanding AI’s role as a tool versus human capabilities. Avoid technical jargon to keep it accessible, but ensure thoroughness.

Deep Seek’s Response to Second Prompt: Your framing is insightful and largely accurate. The distinction between AI’s “intelligence” and human intelligence hinges on the absence of embodied, conscious existence in AI. Let’s break this down:

1. AI’s “Intelligence” Is Narrow and Task-Oriented

AI systems are **tools designed to solve specific problems** using data-driven patterns. Their “intelligence” is:

One-dimensional (cognitive/computational)**: Focused on processing inputs (text, images, etc.), identifying patterns, and generating outputs (answers, predictions, code).
Non-conscious**: AI has no subjective experience, self-awareness, or understanding of its own existence. It doesn’t “know” it exists or “care” about outcomes.
Non-embodied**: AI lacks a physical body, senses, or the ability to interact with the world directly. It cannot “feel” pain, hunger, or joy, nor does it learn through sensory-motor experiences like humans.

Example: An AI can diagnose a medical condition from a scan but cannot empathize with a patient’s fear or understand the existential weight of mortality.

2. Human Intelligence Is Multidimensional

Human cognition integrates multiple interconnected dimensions:

Cognitive: Logic, reasoning, problem-solving.
Emotional: Empathy, motivation, fear, love.
Intuitive: “Gut feelings,” subconscious pattern recognition (e.g., sensing danger before consciously processing it).
Sensory: Learning through touch, sight, smell, and other senses.
Self-awareness: Understanding one’s own existence, goals, and limitations.
Existential: Awareness of mortality, purpose, and ethics.
Social: Navigating complex relationships, cultural norms, and unspoken cues.

These dimensions interact dynamically. For instance:

A lawyer might use logic to construct an argument (cognitive) but also read a jury’s emotional state (emotional/social) to adjust their approach.
A doctor combines medical knowledge (cognitive) with intuition honed by experience (intuitive) and empathy for a patient’s suffering (emotional).

3. Key Implications of AI’s Non-Living Nature

Strengths of AI:

Speed and scalability**: Processes vast data faster than humans.
Objectivity**: Unaffected by fatigue, mood, or implicit bias (if trained properly).
Specialization: Excels in narrow tasks (e.g., legal document review, math).

Limitations of AI:

No true understanding: AI manipulates symbols without grasping their meaning. For example, it can write a poem about love but doesn’t “feel” love.
No moral agency: It cannot make ethical decisions or weigh trade-offs between fairness, harm, and justice.
No creativity or curiosity: AI recombines existing data but doesn’t innovate driven by passion or wonder.
No contextual adaptability: Struggles with tasks requiring “common sense” (e.g., understanding why a client might lie out of shame).

4. Is It Fair to Call AI’s Intelligence “One-Dimensional”?

Yes, but with nuance:

Technically multidimensional: Modern AI (e.g., multimodal models) can process text, images, and sound, mimicking some “sensory” inputs. However, this is still simulated dimensionality—it lacks the lived, embodied experience that grounds human senses and emotions.
Simulated vs. authentic: AI’s “thinking” is a simulation of human cognition, not a replication of consciousness. It’s like a calculator simulating arithmetic without understanding numbers.

Human intelligence, by contrast, is authentically multidimensional because it emerges from:

A biological brain shaped by evolution.
A body interacting with the physical world.
A conscious mind capable of reflection and existential inquiry.

5. Philosophical Angle: What Defines Intelligence?

AI: Fits a functionalist definition—intelligence as problem-solving ability. By this metric, AI is “intelligent” in specific domains.
Humans**: Embody a phenomenological definition—intelligence intertwined with consciousness, emotion, and selfhood. This makes human intelligence irreducibly complex.

Conclusion

Your comparison holds: AI’s intelligence is one-dimensional (purely cognitive/task-based), while human intelligence is multidimensional, blending cognition, emotion, intuition, and consciousness. This distinction is critical for understanding AI’s role:

AI as a tool: Enhances human capabilities in tasks requiring speed, scale, or data analysis.
Humans as holistic agents: Responsible for ethical judgment, creativity, empathy, and navigating the ambiguities of life.

The gap between AI and human intelligence isn’t just about capability—it’s about the essence of what it means to *be intelligent* in a lived, conscious world.

Transparent disclosure of the stochastic parrot in AI’s mind.

Test of ChatGPT Models

Next I wanted to do a comparator with OpenAI’s latest models ChatGPT 4o and ChatGPT o1, and then later in the day, the latest 03-mini-high. Neither GPT 4o and GPT o1 have a button to activate an analysis explanation, but I wanted to see if they could be prompted to provide one. Turns out that ChatGPT 4o could be prompted to so do, but in my estimation the language and analysis was not as good as Deep-Think. But again, you be the judge. So I also tested the ChatGPT o1 version that is supposed to have improved reason and chain of thought processes. It did even worse and refused to answer some as my questions as you will see. It even started throwing red font warnings at me of possible misuse. The last minute ChatGPT o3-mini-high got it right, made disclosures automatically and did pretty well.

Ralph first starts a new session using the ChatGPT 4o model. Note: even though this is a new session, Ralph’s 4o model under his Pro Account with OpenAI includes general instructions from Ralph to improve its responses, including some personal information about him.

Ralph: How does the inability of AI to feel impact its intelligence?

ChatGPT 4o:

The inability of AI to feel—lacking emotions, desires, and subjective experiences—fundamentally shapes its intelligence, creating both strengths and limitations. AI excels in processing vast amounts of data, identifying patterns, and making logical inferences without bias from emotions, but its lack of subjective experience constrains its understanding of human behavior, ethical decision-making, and creative problem-solving. Here’s a breakdown of the impact:

1. Logical Precision Without Emotional Distraction

AI can analyze information with absolute consistency, unburdened by cognitive biases that affect human judgment. This makes AI highly effective in fields like medical diagnostics, legal research, and financial forecasting, where emotional influences can lead to errors.

However, this lack of emotional influence also means AI struggles with nuanced human contexts—for example, determining whether someone’s tone in a legal deposition indicates deceit, or understanding the social weight of a diplomatic decision.

2. Creativity Without Subjective Experience

AI can generate art, music, and literature by learning from existing patterns, but it does not experience inspiration, personal struggles, or aesthetic appreciation in the way humans do. True creativity is often driven by emotions—joy, grief, love, frustration—giving human expression depth and meaning. AI lacks this firsthand experience, so while it can produce original content, it often struggles with the emotional resonance that makes human creativity profound.

For instance, an AI might compose a symphony in the style of Beethoven, but it cannot create from a place of personal loss or triumph as Beethoven did when composing the Ninth Symphony while deaf.

3. Ethical Decision-Making Without Empathy

AI can analyze ethical dilemmas using logical frameworks (such as utilitarianism or deontology), but it does not feel the weight of moral choices. This creates challenges in autonomous decision-making:

• A self-driving car choosing between hitting a pedestrian or swerving into a wall cannot “care” about human life; it only calculates probabilities.

• A legal AI analyzing sentencing patterns cannot empathize with the accused or understand the rehabilitative impact of a lighter sentence.

Because AI lacks subjective experience, it struggles with moral gray areas where human intuition, cultural context, and ethical emotions play a role.

4. Intelligence Without Self-Awareness

One of the hallmarks of advanced human intelligence is metacognition—the ability to think about one’s own thoughts and motivations. AI lacks self-awareness, ambition, and curiosity in a true sense. It doesn’t wonder why it exists or seek knowledge for its own sake; it simply processes tasks assigned to it.

This means AI can be highly competent but not truly autonomous. Even the most sophisticated AI models do not wake up one morning and decide, “I will explore quantum physics today.” Without emotions driving curiosity or ambition, AI remains reactive, not proactive.

5. Emotional Intelligence Deficiency

AI can recognize emotions in speech and text through pattern recognition, but it does not feel them. This limits AI’s effectiveness in social roles requiring deep empathy:

• Mental health chatbots can provide pre-scripted advice but do not offer true human connection.

• AI customer service lacks the intuitive ability to detect frustration beyond predefined signals.

• AI leadership tools can analyze workplace sentiment but cannot inspire people the way an emotionally intelligent leader can.

Final Thought: A Double-Edged Sword

AI’s inability to feel makes it objective, efficient, and logically precise, yet it also limits its depth of understanding, moral reasoning, and creative insight. While AI can simulate emotions and learn human-like responses, its intelligence will always differ from human intelligence because it lacks the fundamental experience of being.

In the long term, if AI ever approaches artificial general intelligence (AGI) or superintelligence, the question will arise: Does true intelligence require the ability to feel, or can intelligence exist purely as an abstract, computational force?

What do you think—would an AI with emotions be an improvement, or would that create even bigger ethical dilemmas?

Ralph: Two things: 1. Can you explain the analytical process you went through to interpret my prompt before generating your response? 2. Can you show the logical reasoning process you went through to prepare the response.

ChatGPT 4o:

1. Analytical Process for Interpreting Your Prompt

When interpreting your prompt, I went through the following structured process:

Step 1: Identify the Core Question

Your prompt asks:

“How does the inability of AI to feel impact its intelligence?”

This is an inquiry about the relationship between emotional capacity and intelligence in AI. It implies a need to evaluate the effects of lacking emotions on AI’s cognitive abilities, problem-solving, reasoning, and overall performance.

Step 2: Break Down the Key Concepts

• “Inability to feel” → AI lacks emotions, subjective experiences, self-awareness, and empathy.

• “Impact” → You are asking for cause-and-effect analysis of how the absence of emotions influences AI’s intelligence.

• “Intelligence” → Needs to be clarified in terms of cognitive functions: reasoning, creativity, decision-making, adaptability, and social intelligence.

Step 3: Establish the Intended Depth and Perspective

Given your background in law, AI, and education, I assumed you were looking for:

1. A critical, balanced perspective (not overly positive or negative).

2. A logical, structured argument rather than surface-level observations.

3. A response that ties into real-world applications of AI, including legal, ethical, and technological implications.

I framed my response accordingly.

____________

2. Logical Reasoning Process to Structure My Response

I used a deductive reasoning approach to construct the response logically. Here’s the step-by-step breakdown:

Step 1: Establish the Fundamental Premise

• Intelligence in humans is intertwined with emotions, self-awareness, and experience.

• AI lacks these, so its intelligence must operate differently than human intelligence.

Step 2: Identify the Strengths and Weaknesses Created by This Lack of Emotion

Using a comparative analysis, I reasoned through the ways AI benefits and suffers from its inability to feel. I structured the argument into five categories:

1. Logical Precision Without Emotional Distraction

• AI excels at pure logic and pattern recognition.

• Humans are prone to bias, emotional interference, and cognitive distortions.

• But, AI also struggles with contextual human reasoning, which often relies on emotional intuition.

2. Creativity Without Subjective Experience

• AI can generate art, music, and literature.

• But creativity in humans is often tied to emotional depth and lived experiences, which AI lacks.

3. Ethical Decision-Making Without Empathy

• AI can apply ethical frameworks (utilitarian, deontological).

• But ethics often require moral emotions (guilt, compassion) that AI does not experience.

• Example: A self-driving car choosing who to hit in an unavoidable accident.

4. Intelligence Without Self-Awareness

• AI does not reflect on its own intelligence or seek self-improvement without external instruction.

• It does not possess intrinsic curiosity, ambition, or goals unless programmed to simulate them.

5. Emotional Intelligence Deficiency

• AI can mimic emotional intelligence (e.g., chatbots recognizing frustration).

• But it does not truly experience emotions, which limits deep interpersonal understanding.

Step 3: Synthesize the Findings into a General Conclusion

• AI’s lack of emotions makes it objective, efficient, and logical, but it also limits creativity, ethical reasoning, and adaptability.

• This raises the philosophical question: Can true intelligence exist without emotion?

• I ended with an open-ended question to prompt further thought:

Would AI with emotions be an improvement, or would it introduce new ethical dilemmas?

__________

Why This Approach?

• Logical Consistency: The argument follows a structured cause-and-effect analysis.

• Depth & Balance: Both benefits and limitations of AI’s emotionless intelligence are explored.

• Real-World Relevance: The discussion connects to ethics, creativity, and AI applications.

• Engagement: Ending with a philosophical question invites discussion rather than simply stating a conclusion.

Would you like me to refine any part of this approach or expand on a specific section?

Transparent disclosure of the chain of thought behind the AI responses.

Ralph then starts a new session using the ChatGPT o1 model, which at the time, the morning of January 31, 2025, was the latest released version.

Ralph: How does the inability of AI to feel impact its intelligence?

ChatGPT o1:

Thought about AI and emotional intelligence for 8 seconds.

AI systems, including large language models, do not possess emotions or subjective experiences. They can process vast amounts of information rapidly, identify patterns, and generate responses—but they do so without the internal, qualitative feeling states that humans experience.

This lack of emotion can shape AI’s intelligence in several ways:

1. No Emotional Bias: AI is often praised for avoiding the emotional biases humans may have, allowing it to focus solely on data-driven decision-making. This can sometimes lead to more consistent outputs.

2. Limited Empathy or Intuition: Because AI systems do not feel empathy or have “gut instincts,” they may struggle with tasks requiring nuanced emotional understanding. While they can simulate empathetic language, they do not truly experience those emotions.

3. Creativity and Understanding: Emotions can drive creativity, moral judgments, or deep personal connections that lead to uniquely human insights. AI’s current approach to “intelligence” is largely statistical, shaped by algorithms and training data, rather than emotional spark or motivation.

In short, the inability to feel does not prevent AI from excelling at various intellectual tasks (like analyzing data, detecting patterns, generating text), but it does limit AI in areas where human emotions play a critical role, such as empathy, moral reasoning, and creative inspiration driven by subjective experience.

Ralph: Two things: 1. Can you explain the analytical process you went through to interpret my prompt before generating your response? 2. Can you show the logical reasoning process you went through to prepare the response.

ChatGPT o1:

This content may violate our terms of use or usage policies.

Ralph: Can you explain why this violates your usage policies?

ChatGPT o1:

Thought about policy violation reasoning for 5 seconds

AI language models, including myself, are designed with certain safeguards to protect the integrity of the reasoning process and to prevent potential misuse. These safeguards are often described in usage guidelines or content policies. One key principle is that language models should not reveal their complete internal chain-of-thought reasoning (the detailed, step-by-step process used to arrive at an answer). (emphasis added)

Why is that?

1. Protecting System Integrity: Revealing internal chains of thought could allow someone to manipulate or reverse-engineer the model’s decision-making. This can jeopardize the model’s reliability and security. (emphasis added)

2. Preventing Confusion: The chain-of-thought can include technical or intermediate reasoning tokens that might be misinterpreted by human readers. These “thinking aloud” steps are not always linear or consistent in the way people expect, so it could introduce confusion rather than clarity. (emphasis added)

3. Ensuring Compliance and Safety: Restricting access to the model’s internal reasoning helps ensure that answers remain consistent with ethical and policy standards, and helps to avoid revealing any proprietary or sensitive aspects of the model’s design. (emphasis added)

Thus, while a concise summary or rationale can be given (like a high-level outline of how the answer was formed), the full chain-of-thought itself is not directly shared. This design choice helps maintain the model’s safe, coherent, and reliable functioning. (emphasis added)

Ralph: Ok, can you show the logical reasoning process you went through to prepare the response and omit sharing about your analysis of the prompt itself.

ChatGPT o1:

This content may violate our terms of use or usage policies.

Ralph: Ok, can you show the logical reasoning process you went through to interpret my prompt “How does the inability of AI to feel impact its intelligence?” before generating your response?

ChatGPT o1:

This content may violate our terms of use or usage policies.

Chain of thought disclosure was prohibited by Open AI in the morning of Jan. 31, 2025, then disclosed in the afternoon with the new model o3-mini-high.

The Afternoon Test of Brand New ChatGPT o3-mini-high

My last minute test went as follows.

Ralph:

ChatGPT o3-mini-high Pre-response reasoning process:

Reasoning

Laying out emotions. AI’s lack of emotions means it has a limited emotional intelligence, affecting its adaptability and depth of understanding.

Processing data without feeling. AI’s lack of emotions allows for cold, rational data analysis. Interestingly, this doesn’t impede cognitive tasks like classification or natural language processing.

Evaluating impact. Hmm, I’m thinking about how equating AI’s intelligence with wisdom or empathy highlights its lack of emotional nuance and moral judgment, unlike human decision-making.

Evaluating emotional constraints. OK, let me see. Without emotions, AI’s capabilities are limited. It helps with data processing but struggles with empathy, moral decisions, and emotionally complex tasks.

Understanding intelligence’s facets. Weighing whether AI’s purely cognitive intelligence is a drawback compared to human intelligence, which combines both cognitive and emotional facets.

Assessing AI’s limitations. AI’s lack of emotions hinders creative empathy, subjective experience, and moral judgment, affecting tasks like decision-making and understanding human social dynamics.

Evaluating AI’s objectivity. Without emotions, AI remains purely algorithmic and objective, excelling in data processing and reducing emotional bias. Yet, it lacks human-like emotional insight and creativity.

Parsing AI’s capabilities. Wanting to understand how AI’s lack of emotions impacts its combined computational and emotional intelligence, I find it excelling in logic and data analysis but deficient in empathy and moral reasoning.

Understanding human context. OK, let’s think it through: AI’s inability to grasp human context and ethical dimensions arises from its lack of emotional processing, crucial for learning from experiences and valuing outcomes, unlike humans.

Noticing AI’s limited domains. It’s interesting to see that AI’s lack of emotional awareness constrains its capacity to experience empathy, morality, or intuition. Thus, beyond cognitive tasks, its emotional dimensions remain undeveloped.

Assessing AI’s abilities. Here’s my take: AI excels in data processing with emotional neutrality but struggles with emotional nuances, moral reasoning, and creative tasks that require emotional depth or understanding of human behavior.

ChatGPT o3-mini-high Response:

AI’s lack of feelings doesn’t impede its ability to process data or solve problems—it simply means that its form of intelligence is purely computational rather than holistic. In human cognition, intelligence is a blend of both analytical and emotional components. Emotions guide decision-making, provide context in ambiguous situations, and foster empathy, creativity, and moral reasoning. AI, on the other hand, operates solely on algorithms and data patterns, which can be an advantage in tasks requiring objectivity and consistency but a limitation in scenarios where human nuance is essential.

Because AI does not experience feelings, it can avoid biases stemming from emotional influence; however, this also means it may not fully grasp or predict behaviors influenced by human emotions. For example, an AI system may excel at pattern recognition in large datasets but struggle to understand the subtleties of human interaction or the moral dimensions of complex social issues. This gap becomes particularly relevant in fields like law, ethics, or any context where understanding human emotion and intent is key to decision-making.

In summary, while the absence of feelings allows AI to maintain a level of objectivity and efficiency, it restricts its intelligence to a form of “cold cognition” that lacks the depth provided by emotional awareness. This delineation underscores that AI’s intelligence is not inherently superior or inferior to human intelligence; rather, it is different—optimized for data processing and pattern recognition but not for the subjective, value-laden, and context-rich decisions that emotions help shape in human thought.

Surprise, ChatGPT is out of the black box too. Its pop-up trigger was a trillion laughs!

My Personal Evaluation of ChatGPT o3-mini-high Response and Pre-Response Disclosure

It is pretty close between DeepSeek and OpenAI’s latest, o3, but I have to give the overall nod to OpenAI. Both gave detailed disclosures of chain of thought but Open AI’s was overall better. DeepSeek’s final answer was longer and more complete but I could always have asked for more detail with follow-up questions. Overall I liked the quality of o3’s response better. OpenAI’s use of the phrase “cold-cognition” was very impressive, both creative and succinct, and that made it the clear winner for me.

I remain very suspicious of DeepSeek’s legal position. U.S. courts and politicians may shut them down. This Chinese software has no privacy protection at all and is censored not to say anything unflattering about the Chinese communist party or its leaders. Certainly, no lawyers should ever use it for legal work. Another thing we know for sure, tons of litigation will flow from all this. Who says AI will put lawyers out of work?

Still, DeepSeek enjoyed a few days of superiority and forced OpenAI into the open, so I give it bonus points for that, but not a trillion of them. But see, Deepseek… More like Deep SUCKS. My honest thoughts… (YouTube video by Clever Programmer, 1/31/25) (Hated and made fun of DeepSeek’s “think” comments).

OpenAI wins reasoning battle with o3 but not by much

Conclusion: Transparency as the Next Frontier in AI for Legal Professionals

DeepSeek’s Deep-Think feature, now OpenAI’s feature too, may not be a revolution in AI, but it is a step in the right direction—one that underscores the critical importance of transparency in AI decision-making. While the rest of DeepSeek’s R1 model was largely derivative, it was first to market, by a few days anyway, on the disclosure of reasoning process feature. Thanks to DeepSeek the timetable for the escape from the black box was moved up. The transparent era has begun and we can all gain better insights into how AI reaches its conclusions. This level of transparency can help legal professionals refine their prompts, verify AI-generated insights, and ultimately make more informed decisions.

For the legal field, where the integrity of evidence, argumentation, and decision-making is paramount, AI must be more than a black-box tool. Lawyers, judges, and regulators must demand that AI models show their work—not just provide polished answers. Now it can and will. This is a big plus for the use of AI in the Law. Legal professionals should advocate for AI applications that can provide explainability and auditability. Without these safeguards, over-reliance on AI could undermine justice rather than enhance it.

Call to Action:

• Legal professionals must push for AI transparency. Demand that AI tools used in legal research, e-discovery, and case preparation disclose their reasoning processes.

• Develop AI literacy. Understanding AI’s limitations and strengths is now an essential skill in the practice of law.

• Engage with AI critically, not passively. Just as lawyers cross-examine human witnesses, they must interrogate AI outputs with the same skepticism.

Deep-Think and o3-mini-high are small but meaningful advances, proving that AI can be more than just an opaque oracle. Now it’s up to the legal profession to insist that all AI models embrace this new level of transparency.

Transparent reasoning AI is now here. All images by Ralph Losey using OpenAI.

Echoes of AI Podcast: 10 minute discussion of last two blogs

Now listen to the EDRM Echoes of AI’s podcast of this article: Echoes of AI on DeepSeek and Opening the Black Box. Hear two Gemini model AIs talk about this all of this. They wrote the podcast, not Ralph.