Ralph Losey, March 29, 2026.
We are currently living through a “Gutenberg Moment,” but with a complex, digital twist: our new printing press is alive, probabilistic, and prone to “confident delusions.” While AI may be humanity’s most transformative invention, it remains an enigma to most.
For many legal professionals, the outputs of Generative AI feel like a digital seance—words appearing out of the ether with no visible logic. This “Black Box” is not just a technical curiosity; it is a professional liability. If you cannot at least partially understand and explain how your “assistant” reached a conclusion, you are effectively practicing in the dark. To move from being a passenger to a pilot, you must understand the mechanical soul of the machine and learn how to make it sing with the voices you command.

My recent article, What People Want To Know About AI: Top 10 Curiosity Index, revealed that the primary thing people want to know is how the machine actually works. They are asking the most difficult question in the field: How does AI “think” or make decisions?
This article answers that question by providing a structured understanding of Large Language Models (LLMs) across five levels of technical complexity:
- The Smart Child: The world’s best guessing game.
- The High School Graduate: Statistical probability at a global scale.
- The College Graduate: Mapping meaning in Latent Space.
- The Computer Scientist: The logic of the Transformer and Self-Attention.
- The Tech-Minded Legal Professional: Navigating probabilistic advocacy.

There is a meta-lesson here too that goes beyond the words on this page. Some of my favorite explanations of complex subjects emulate the fresh, clear speech of fifth graders. You will often find deep creativity when AI models parrot their language.
I chose five kinds of speech to describe how AI works. There are hundreds more that I could have picked. I also could have asked for explanations that use story or humor, much like Abraham Lincoln liked to do. It is fun to learn to tell AI what to do so that you can better communicate. It empowers a level of creativity never before possible. Maybe next time I will use comedy or poetry. For now, let’s peel back the curtain using these five.

1. The Smart Child Level: The World’s Best Guessing Game
Definition: Generative AI is like a magic “Fill-in-the-Blank” machine that has played the game trillions of times with almost every book ever written.
Imagine you are playing a game. If I say, “The peanut butter and…”, you immediately think of the word “jelly.” You don’t need to look at a jar of jelly to know that word fits. You’ve heard those words together so many times that your brain just knows they belong together.
An AI is a computer that has “listened” to almost everyone in the world talk and “read” almost every story ever told. It doesn’t “know” what a sandwich is, and it doesn’t have a stomach that feels hungry. It simply knows that in the history of human writing, the word “jelly” follows “peanut butter” more than almost any other word.
But it’s even smarter than that. If you say, “I am at the library and I am reading a…”, the AI knows that “book” is a much better guess than “sandwich”. It looks at all the words you give it—the “clues”—to narrow down the billions of possibilities into one likely answer. It makes decisions by picking the word that is most likely to come next to complete a pattern that makes sense to us. It isn’t “thinking” about the story; it’s just very, very good at predicting the next piece of the puzzle.
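The child's guessing game can be sketched in a few lines of Python using nothing but word counts. The four-line "corpus" below is invented for illustration; real models train on trillions of words, but the principle of picking the most frequent completion is the same:

```python
from collections import Counter

# A toy version of the fill-in-the-blank game. The four-line "corpus"
# is invented for illustration; real models see trillions of words.
corpus = [
    "peanut butter and jelly",
    "peanut butter and jelly",
    "peanut butter and jelly",
    "peanut butter and crackers",
]

# Count which word has followed "peanut butter and" most often.
next_words = Counter(line.split()[-1] for line in corpus)
best_guess = next_words.most_common(1)[0][0]
print(best_guess)  # prints "jelly" — the most frequent completion
```

The machine never tastes the sandwich; it just counts that "jelly" showed up three times and "crackers" only once.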

2. The High School Level: Statistical Probability at Global Scale
Definition: AI is a Prediction Engine. It uses “Big Data” to calculate the statistical likelihood of the next piece of information.
Most of us use the “Autofill” feature on our smartphones every day. As you type a text, the phone suggests the next likely word based on your past habits. If you often text “I’m on my way,” the phone learns that “way” usually follows “my.” Generative AI—specifically Large Language Models—is essentially Autofill scaled to include the vast majority of digitized human knowledge.
During its “training” phase, the model does not “memorize” facts like a traditional database. If you ask it for the date of the Magna Carta, it isn’t looking it up in a digital encyclopedia. Instead, it has learned through billions of examples that the words “Magna Carta” and “1215” have a very high statistical correlation.
This explains why AI can sometimes be “confidently wrong.” It isn’t “lying” in the human sense; it is simply following a statistical path that leads to a mistake. If the data it was trained on contains a common error, the AI will repeat that error because, in its mathematical world, that error is the “most likely” next word. It recognizes the “shape” of human thought without actually having a human mind.
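This statistical behavior can be illustrated with a toy bigram model — a minimal sketch assuming an invented handful of training sentences, one of which contains a deliberate error, to show how a repeated mistake in the data shifts the odds:

```python
from collections import Counter, defaultdict

# A miniature "prediction engine": estimate P(next word | previous word)
# from raw bigram counts, much as a phone's autofill does. The training
# sentences are invented; one contains a deliberate wrong date to show
# how errors in the data become part of the statistics.
sentences = [
    "magna carta 1215", "magna carta 1215", "magna carta 1215",
    "magna carta 1512",  # a repeated error would shift these odds
]

bigrams = defaultdict(Counter)
for s in sentences:
    words = s.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

counts = bigrams["carta"]
total = sum(counts.values())
probs = {word: c / total for word, c in counts.items()}
print(probs)  # {'1215': 0.75, '1512': 0.25} — the majority pattern wins
```

If the erroneous date appeared more often than the true one in the training data, the same arithmetic would confidently produce the wrong answer.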

3. The College Graduate Level: Mapping the Latent Space
Definition: AI organizes information using Vector Embeddings, which convert words into numerical coordinates on a massive, multi-dimensional map called Latent Space.
To understand how AI moves beyond mere word-matching, we have to look at how it “maps” meaning. In a physical library, books are organized in one dimension (the order of call numbers along the spines) or two (the grid of the shelves). AI organizes information in a “map” that has thousands of dimensions.
- Vectoring (The Coordinate System): Every word or concept is assigned a “Coordinate”—a long string of numbers. For example, the word “Stealing” is mathematically plotted very close to “Larceny” but far away from “Charity”.
- Conceptual Proximity: Think of this as the “Relativity” of language. If you ask the AI about “theft,” it doesn’t look for that specific word. It navigates to those coordinates in Latent Space and finds all the “neighboring” concepts like “property,” “intent,” and “deprivation.”
- Vector Arithmetic: Researchers discovered that you can actually perform “logic” using these numbers. A famous example is: King – Man + Woman = Queen. The model “understands” the relationship between these concepts because the direction and distance from “Man” to “King” is nearly the same as from “Woman” to “Queen.”
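A hand-built example makes the arithmetic concrete. The two-dimensional coordinates below are invented toys (real embeddings use hundreds or thousands of dimensions), but they let you verify King – Man + Woman = Queen by hand:

```python
import math

# A hand-built 2-D "latent space." Real models use hundreds or
# thousands of dimensions; these toy coordinates are invented so the
# arithmetic is easy to check. Dimension 0 ≈ gender, dimension 1 ≈ royalty.
vectors = {
    "king":  (1.0, 1.0),
    "man":   (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "queen": (-1.0, 1.0),
}

def nearest(target, vocab):
    """Return the word whose coordinates are closest to `target`."""
    return min(vocab, key=lambda w: math.dist(vocab[w], target))

# King - Man + Woman, computed coordinate by coordinate
k, m, w = vectors["king"], vectors["man"], vectors["woman"]
result = (k[0] - m[0] + w[0], k[1] - m[1] + w[1])
print(nearest(result, vectors))  # prints "queen"
```

Subtracting “Man” removes the gender coordinate from “King” while keeping the royalty coordinate; adding “Woman” supplies the opposite gender, landing exactly on “Queen.”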
When you provide a prompt, the AI identifies the coordinates of your request. It then “walks” through the nearby clusters of meaning to synthesize an answer. The “Black Box” is the result of the sheer scale of this map. With thousands of dimensions and billions of learned connections, the path the AI takes is so complex that no human can trace the logic of a single output back to a single “rule.”

4. The Computer Scientist Level: The Decoder-Only Transformer
Definition: Generative AI is a system powered by neural network architectures—most notably the Decoder-only Transformer—that is specifically tuned to generate the next piece of information by mathematically looking back at everything that came before it. Rather than relying on rigid rules, these models evaluate entire inputs using a mathematical weighting system called Self-Attention to determine the contextual relationship between every element.
To achieve this generative capability, the architecture relies on several complex mathematical mechanisms:
A. The “Query, Key, and Value” System: To decide how much “weight” to give a word, the AI creates three numerical identities for every token. The Query represents what the token is looking for (like a pronoun searching for a subject), the Key represents what the token offers (like a subject offering its identity), and the Value represents the token’s actual semantic meaning.

B. The Logic of Self-Attention: The AI establishes context by comparing the Query of one word against the Keys of all other words in the sequence. Imagine a judge sitting through a long trial. When a witness says the word ‘It,’ the judge immediately looks back at previous exhibits to see what ‘It’ refers to. For example, in the sentence “The court sanctioned the attorney because his motion was meritless,” the AI mathematically calculates the relationship between “his” and the surrounding words. The Query for “his” finds a high match with the Key for “attorney,” allowing the model to assign a high Attention Weight to “attorney” so the word “his” inherits the correct context.
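The Query, Key, and Value mechanics above can be sketched in plain Python. The vectors below are invented toy numbers, not real model weights, and this shows a single attention head rather than a full Transformer:

```python
import math

# A minimal sketch of scaled dot-product Self-Attention for a single
# head. The three tokens' Query/Key/Value vectors are invented toy
# numbers, not real model weights.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys    = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values  = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values, dim=2):
    # 1. Compare the Query against every Key (scaled dot product).
    scores = [dot(q, k) / math.sqrt(dim) for k in keys]
    # 2. Softmax turns raw scores into attention weights that sum to 1.
    weights = softmax(scores)
    # 3. Blend the Values according to those weights.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

output = attend(queries[0], keys, values)
print(output)  # a weighted mix of all three Value vectors
```

The first token’s Query matches the first Key most strongly, so that token’s Value dominates the blend — the numerical equivalent of the judge resolving what ‘It’ refers to.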

C. Multi-Head Attention (Parallel Deliberation): The model doesn’t just evaluate the text once; it runs these calculations dozens of times in parallel. Different “Heads” focus on different aspects simultaneously—one might evaluate syntax and grammar, another focuses on technical legal definitions, and a third assesses the overall tone or sentiment.

D. The Decision Layer (Feed-Forward Networks): After attention weights are settled, the data moves into a decision-making layer consisting of billions of Weights (connection strengths) and Biases (baseline leanings). These act as the model’s “institutional knowledge,” which was grown during training to satisfy the objective of predicting the next token.

E. The Softmax Verdict: Finally, the model uses a Softmax function to produce a probability list of every possible word in its vocabulary. It calculates the exact odds—for example, assigning “Court” an 85% probability and “Sandwich” a 0.01% probability—and then mathematically samples the winner to generate the next word. Since the Softmax Verdict generates words based on statistical odds rather than verified facts, it is crucial for lawyers to verify the output, which we will also discuss in more detail later in this article.
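A minimal sketch of that final step, using invented logits (raw scores) for three candidate words:

```python
import math
import random

# A sketch of the "Softmax Verdict": invented raw scores (logits) for
# three candidate next words are converted into probabilities, and the
# next word is then sampled according to those odds.
logits = {"court": 5.0, "judge": 2.0, "sandwich": -4.0}

exps = {word: math.exp(score) for word, score in logits.items()}
total = sum(exps.values())
probs = {word: e / total for word, e in exps.items()}

random.seed(0)  # fixed seed so the sample is reproducible
choice = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs)   # "court" dominates; "sandwich" is vanishingly unlikely
print(choice)
```

Note that even a 0.01% word is never impossible — the sampler can, on rare occasions, pick it. That residual randomness is part of why outputs vary from run to run.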

5. The Tech-Minded Legal Professional Level: Probabilistic Advocacy
Definition: For the legal professional, Generative AI is not a database, but a Probabilistic Inference Engine. It does not “find” data in the traditional sense; it infers the most likely response based on the conceptual coordinates of your request and the mathematical “gravity” of the language it was trained on.
A. From Search to Inference
For fifty years, the legal industry’s relationship with technology was deterministic. Traditional legal databases use rigid logic gates: Does Document A contain Word X AND Word Y? If the words are present, it is a ‘hit’; if not, it is ignored, functioning as a simple ‘On/Off’ switch. The Transformer changes this completely. It is not a search database, but a Probabilistic Inference Engine. When you ask it to ‘analyze a witness’s credibility,’ it doesn’t just look for the word ‘credibility’; it infers a conclusion by weighing the context of every word in the record.
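That deterministic ‘On/Off switch’ is easy to sketch. Note that the second invented document below is plainly about credibility, yet the Boolean test misses it because the literal word never appears:

```python
# A sketch of the deterministic "logic gate": a document is a hit only
# if it contains both literal words. The two documents are invented;
# the second is plainly about credibility but never says the word.
docs = [
    "The witness lacked credibility on cross examination.",
    "The witness hesitated and contradicted earlier testimony.",
]

def boolean_hit(doc, word_x, word_y):
    words = doc.lower().split()
    return word_x in words and word_y in words   # a pure On/Off switch

hits = [d for d in docs if boolean_hit(d, "witness", "credibility")]
print(len(hits))  # prints 1 — the second document is invisible to Boolean search
```

An inference engine, by contrast, would recognize that hesitation and contradiction sit near “credibility” in Latent Space, and surface both documents.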

B. Navigating the Latent Space
To perform this analysis, the model navigates the Latent Space coordinates of your query. It uses the Self-Attention weights discussed in Level 4 to “infer” a conclusion by weighing the context of every word in the record. It identifies the “Intent” and “Sentiment” within millions of documents in a second. Such tasks were previously impossible for deterministic software.
C. The Weight of the Legal Oath
While the machine provides the “Magic Guesses” of a child and the “Neural Weights” of a scientist, it lacks the professional standing to be an advocate.
- The Black Box as an Invitation: The “Black Box” is not an excuse for ignorance; it is an invitation to a higher level of legal practice.
- The Human Validator: We use the machine to find the “needle” (the insight), but we use our human judgment to prove it is evidence and not a hallucination.
- The Ultimate Weight: In this new era, the most important “Weight” in the entire system is the one held by the human professional.

6. The “Growing, not Building” Concept: The Genesis of the Black Box
To understand why even the creators of these models cannot always explain a specific output, we have to understand that AI is trained into complexity, rather than just hard-coded with logic.
- The Old World of Software: In the past, we built programs based on rigid, transparent logic. If the code said “If X, then Y,” but it did something else, it was a “bug” to be corrected within a deterministic machine.
- The New World of Generative AI: This technology is created through Self-Supervised Learning. We don’t provide the model with logic blueprints; instead, we provide an ocean of data and a single objective: “Predict the next piece of information.”
- The “Growth” of Intelligence: The model then “grows” its own internal pathways—billions of connections known as Weights and Biases—to satisfy that objective.
Think of it like a massive vine growing through a lattice. As engineers, we provide the lattice (the Transformer architecture), but the vine (the intelligence) grows itself. By the time training is finished, there are hundreds of billions of connections. There is no “Master Code” for a human to read or audit. The “Black Box” is not a wall; it is a forest so dense that no human can map every leaf.
In the era of AI Entanglement, we must judge the AI by its results (the fruit) rather than its process (the roots).

7. The “Context Window” as a Trial Record
In the computer scientist level we discussed the Transformer’s ability to look at a whole document simultaneously. In practice, this capability is governed by the Context Window. In AI, the Context Window is the specific amount of data the model can “Attend” to at any one time. When you upload a 100-page contract, the AI holds that text in a temporary “workspace.”
The Judicial Analogy: Think of the Context Window as a judge’s Active Memory during a hearing.
- The Risk of Loss: If a trial lasts ten days, but the judge can only remember the last two hours of testimony, they will lose the thread of the case.
- Hallucination via Omission: They might “hallucinate” a fact not because they are lying, but because they have lost the beginning of the record.
- Legal Strategy: For the tech-minded lawyer, you must manage the “Active Record” of your conversation to ensure the model maintains access to critical early facts. In a similar way, a judge relies on a court reporter who makes a transcript of the record to ensure nothing is lost to the passage of time.
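One practical sketch of such “Active Record” management, assuming a hypothetical `build_context` helper: critical facts are pinned so they always stay in the window, and the remaining budget is filled with the newest messages. Tokens are approximated by word counts, and every message is invented for illustration:

```python
# A sketch of "Active Record" management: critical facts are pinned so
# they always stay in the window, and the remaining budget is filled
# with the most recent messages. Tokens are approximated by words, and
# every message here is invented for illustration.
def build_context(pinned, history, budget):
    def tokens(msg):
        return len(msg.split())

    used = sum(tokens(m) for m in pinned)
    kept = []
    for msg in reversed(history):          # walk newest to oldest
        if used + tokens(msg) > budget:
            break                          # budget exhausted
        kept.append(msg)
        used += tokens(msg)
    return pinned + kept[::-1]             # restore chronological order

pinned = ["Key fact: the contract was signed on day one."]
history = [f"exhibit {i} discussed" for i in range(1, 6)]
context = build_context(pinned, history, budget=20)
print(context)  # the pinned fact plus the newest exhibits that fit
```

The oldest exhibits fall away as the budget fills, but the pinned fact never does — the code equivalent of the court reporter’s transcript.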

8. Anatomy of a Hallucination
Examining a “case study” of a hallucination through the lens of Latent Space will help us understand how they happen.
Suppose you ask an AI for a case supporting a specific point of Florida law. The AI navigates to the “Neighborhood” of Florida Law and the “Street” of that specific legal issue. It sees a cluster of real cases—Smith v. Jones and Doe v. Roe.
Because it is a Probabilistic Inference Engine, the AI doesn’t naturally “check” a verified list of real cases. Instead, it follows the mathematical pattern of how Florida cases are typically named and cited.
The AI then “generates” Brown v. State—a case that sounds perfectly correct because its coordinates are exactly where a real case should be based on the surrounding patterns. It has followed the statistical “gravity” of the neighborhood, but it has drifted into a sequence of words that is factually untethered from reality.
It is a perfectly logical mathematical guess that happens to be a factual lie. This is the primary reason why we must cross-examine our assistants. We use our human judgment to prove the output is a needle of truth and not a hallucination of the “Black Box.” Cross-Examine Your AI: The Lawyer’s Cure for Hallucinations (12/17/25).
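The lesson of this case study can be reduced to a few lines: pattern-matching can confirm that a citation has the right shape, but only a check against a verified source (here, an invented two-case list) can confirm it is real:

```python
# A sketch of the verification gap: a name can match the statistical
# pattern of a citation without existing. The "verified" list and all
# case names are invented for illustration.
verified_cases = {"Smith v. Jones", "Doe v. Roe"}

def looks_like_case(name):
    # Checks only the *shape* of a citation, not its truth.
    return " v. " in name

candidate = "Brown v. State"           # pattern-perfect, but untethered
print(looks_like_case(candidate))      # True  — the shape is right
print(candidate in verified_cases)     # False — the case is not on the list
```

The model’s internal check is the first function; the lawyer’s duty is the second.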

Conclusion: A Symphony of Five Understandings
We have traveled from the magic toy box to the multi-dimensional math of the Transformer. To close, let’s look at the “Black Box” one last time through all five lenses.
The Smart Child sees a magic friend who is the best guesser in the world. To the child, the lesson is simple: the magic friend is fun, but sometimes they make up stories. Enjoy the story, but don’t bet your lunch money on it.
The High Schooler sees a massive “Autocomplete” engine. They understand that the AI is just a mirror of everything we’ve ever written. The lesson: the mirror is only as good as the light you shine into it.
The College Graduate sees the “Latent Space”—a map of human culture turned into math. They realize that meaning is not found in isolated words, but in the mathematical distance and relationship between them.
The Computer Scientist sees the Decoder-only Transformer—a masterpiece of matrix multiplication and Self-Attention weights. They know that “thinking” is just the sound of billions of Query and Key vectors finding their mathematical match.
The Tech-Minded Legal Professional—the “Human in the Loop”—sees a revolution. We see a tool that can navigate the “Intent” and “Sentiment” of millions of documents in a heartbeat using Probabilistic Inference. But we also see the weight of our professional oath.

Our New Role: From Searcher to Validator. Electronic discovery professionals are no longer just “Searchers” of data; we are the Validators of a new, probabilistic reality.
We are the ones who must take the “Magic Guesses” of the child, the “Statistical Patterns” of the high schooler, the “Latent Map” of the college graduate, and the “Neural Weights” of the scientist, and forge them into Evidence.
The “Black Box” is not an excuse for ignorance; it is an invitation to a higher level of practice. We use the machine to find the needle, but we use our human judgment to prove it is a needle and not a hallucination.
In the era of AI Entanglement, the most important “Weight” in the entire system is the human in charge: You.

Ralph Losey Copyright 2026 — All Rights Reserved
There is a battle in the legal tech world between Information Governance and Search. It reflects a larger conflict in IT and all of society. Last year I came to believe that Information Governance’s preoccupation with classification, retention, and destruction of information was a futile pursuit. I challenged these activities as inefficient and doomed to failure in the age of information explosion. Instead of classify and kill, I embraced the googlesque approach of save and search. 


Where are the rights to both privacy and security in the challenge of too-much-information? I am a strong proponent of privacy, and so are many in the IG world. I am also a strong proponent of cybersecurity. I think it is possible to have both. In both the Search and IG camps there are people who agree with me on these points, and others who disagree. Many see it as one or the other, especially people in government. They take extreme views favoring either security or privacy. Many in both tech and government simply dismiss the importance of privacy, and say just get over it. Advocacy for individual privacy is a separate battle in both worlds, IG and Search. The same is true over cybersecurity. I favor a balanced approach, and so do many in the IG world.
The traditionalists in the IG world whom I continue to oppose, the ones who are glorified records managers, have another five years, at best, before complete obsolescence. The classify and control lock-down approach of records management is contrary to the times. It cannot withstand the continuing exponential growth of data, nor the basic entropy forces aligned against all attempts to govern by all-too-human rules and compliance. Records managers are caterpillars waiting to be reborn. They should withdraw into a cocoon and embrace the change.
My prediction is that within five years the traditional records management activities, specifically the classification, filing and obsessive deletion of data, will no longer be worth the effort. (I concede that some deletion is necessary and will continue.) It will be far more efficient to rely on advanced Search, than classify and kill. This five-year projection assumes continued exponential growth and complexity of ESI. Breakthroughs in search in the next five years would be nice too, but my prediction does not depend on that. It assumes instead a slow, steady improvement of search technologies. They are already awesome, when used properly. The caterpillar record managers will grow big and fly high with search if they will only allow themselves to have new eyes.

Stewart Brand of Whole Earth Catalog fame is credited with originating the phrase information wants to be free, but in fact his quote is taken out of context.
In the meantime we have records managers running around who serve like heroic bomb squads. Some know that it is just a noble quest, doomed to failure. Most do not.

My understanding and experiences with Big Data analytics over the last few years have led me to understand that more data can mean more intelligence, and that it does not necessarily mean more trouble and expense. I understand that more and bigger data has its own unique values, so long as it can be analyzed and searched effectively.
The point is, with the never-ending uncertainties of tomorrow, you can never know for sure which information is valueless and should be destroyed, and which has value and should be saved. There may be an unimaginably large haystack of information, and you may think it only has a few valuable needles. But you never really know. Today’s irrelevant straw could be tomorrow’s relevant needle. With the AI-based search capacities we already have, capacities that are sure to improve, when you need to find a needle in these near-infinite stacks, you will be able to. The cost of storage itself has become so low as to be a negligible factor for most large corporations. Why destroy data when you can effectively search it and mine it for value? That is the butterfly view.



The records life-cycle ideas all made perfect sense in the world of paper information. It cost a lot of money to save and store paper records. Everyone with a monthly Iron Mountain paper records storage bill knows that. Even after the computer age began, it still cost a fair amount of money to save and store ESI. The computers needed to buy and maintain digital storage used to be very expensive. Finding the ESI you needed quickly on a computer was still very difficult and unreliable. All we had at first was keyword search, and that was very ineffective.
The New York Times, in a late-2014 opinion editorial titled Our Machine Masters, discussed recent breakthroughs in Artificial Intelligence and speculated on the alternative futures this could create.

Conclusion
Preservation is far less difficult when you are saving everything forever anyway. With this approach the challenging task remaining in e-discovery is really just search. That is why I say, only slightly tongue in cheek, that Information Governance is actually a subset of Search, not vice versa. Insofar as e-discovery is concerned, that is true; but IG is a concern that goes beyond e-discovery.
In the IG now emerging – IG 2.0 – Information Governance serves as a kind of umbrella organization for all things information. It is not just a hyped up version of records management. It is a center of a high-tech wheel built around information. That image has traction for Search advocates such as myself, just so long as search is not considered to be just another spoke in the Wheel. Search has a much more important position. It is the tire around the wheel, where the rubber meets the road. In today’s world you are likely to get lost without it.