This is an article on Information Theory, the Law, e-Discovery, Search and the evolution of our computer technology culture from Information → Knowledge → Wisdom. The article as usual assumes familiarity with writings on AI and the Law, especially active machine learning types of Legal Search. The article also assumes some familiarity with the scientific theory of Information as set forth in James Gleick’s book, The Information: a history, a theory, a flood (2011). I will begin the essay with several good instructional videos on Gleick’s book and Information Theory, including a bit about the life and work of the founder of Information Theory, Claude Shannon. Then I will provide my personal recapitulation of this theory and explore the application to two areas of my current work:
- The search for needles of relevant evidence in large, chaotic, electronic storage systems, such as email servers and email archives, in order to find the truth, the whole truth, and nothing but the truth needed to resolve competing claims of what happened – the facts – in the context of civil and criminal law suits and investigations.
- The articulation of a coherent social theory that makes sense of modern technological life, a theory that I summarize with the phrase: Information → Knowledge → Wisdom. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers and the more recent, How The 12 Predictions Are Doing That We Made In “Information → Knowledge → Wisdom.”
I essentially did the same thing in my blog last week applying Chaos Theories. What Chaos Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition. This essay will, to some extent, build upon the last and so I suggest you read it first.
Gleick’s The Information: a history, a theory, a flood covers the history of cybernetics, computer science, and the men and women involved with Information Theory over the last several decades. Gleick explains how these information scientists today think that everything is ultimately information. The entire Universe, matter and energy, life itself, is made up of information. Information in turn is ultimately binary, zeros and ones, on and off, yes and no. It is all bits and bytes.
Here are three videos, including two interviews of James Gleick, to provide a refresher on Information Theory for those who have not read his book recently. Information Wants to Have Meaning. Or Does It? (3:40, Big Think, 2014).
The Story of Information (3:47, 4th Estate Books, 2012).
The generally accepted Father of Information Theory is Claude Shannon (1916-2001). He is a great visionary engineer whose ideas and inventions led to our current computer age. Among other things, he coined the word Bit in 1948 as the basic unit of information. He was also one of the first MIT hackers, in the original sense of the word as a tinkerer, who was always building new things. The following is a half-hour video by University of California Television (2008) that explains his life’s work and theories. It is worth taking the time to watch it.
Shannon was an unassuming genius, and like Mandelbrot, very quirky and interested in many different things in a wide variety of disciplines. Aside from being a great mathematician, Bell Labs engineer, and MIT professor, Shannon also studied game theory. He went beyond theory and devised several math based probability methods to win at certain games of chance, including card counting at blackjack. He collaborated with a friend at MIT, another mathematician, Edward Thorp, who became a professional gambler.
Shannon, his wife, and Thorp travelled regularly to Las Vegas for a couple of years in the early sixties where they constantly won at the tables using their math tricks, including card counting. Shannon wanted to beat the roulette wheel too, but the system he and Thorp developed to do that required probability calculations beyond what he could do in his head. To solve this problem in 1961 he invented a small, concealable computer, the world’s first wearable computer, to help him calculate the odds. It was the size of a cigarette pack. His Law Vegas exploits became the very loose factual basis for a 2008 movie “21“, where Kevin Spacey played Shannon. (Poor movie, not worth watching.)
Shannon made even more money by applying his math abilities in the stock market. The list of his eclectic genius goes on and on, including his invention in 1950 of an electromechanical mouse named Theseus that could teach itself how to escape from a maze. Shannon’s mouse appears to have been the first artificial learning device. All that, and he was also an ardent juggler and builder/rider of little bitty unicycles (you cannot make this stuff up). Here is another good video of his life, and yet another to celebrate 2016 as the 100th year after his birth, The Shannon Centennial: 1100100 years of bits by the IEEE Information Theory Society.
For a different view loosely connected with Information Theory I recommend that you listen to an interesting Google Talk by Gleick.“The Information: A History, a Theory, a Flood” – Talks at Google (53:45, Google, 2011). It pertains to news and culture and the tension between a humanistic and mechanical approach, a difference that mirrors the tension between Information and Knowledge. This is a must read for all news readers, especially NY Times readers, and for everyone who consumes, filters, creates and curates Information (a Google term). This video has a good dialogue concerning modern culture and search.
As you can see from the above Google Talk, a kind of Hybrid Multimodal approach seems to be in use in all advanced search. At Google they called it a “mixed-model.” The search tools are designed to filter identity-consonance in favor of diverse-harmonies. Crowd sourcing and algorithms function as curation authority to facilitate Google search. This is a kind of editing by omission that human news editors have been doing for centuries.
The mixed-model approach implied here has both human and AI editors working together to create new kinds of interactive search. Again, good search depends upon a combination of AI and human intelligence. Neither side should work alone and commercial interests should not be allowed to take control. Both humans and machines should create bits and transmit them. People should use AI software to refine their own searches as an ongoing process. This should be a conversation, an interactive Q&A. This should provide a way out of Information to Knowledge.
Personal Interpretation of Information Theory
My takeaway from the far out reaches of Information theories is that everything is information, even life. All living entities are essentially algorithms of information, including humans. We are intelligent programs capable of deciding yes or no, capable of conscious, intelligent action, binary code. Our ultimate function is to transform information, to process and connect otherwise cold, random data. That is the way most Information Theorists and their philosophers see it, although I am not sure I agree.
Life forms like us are said to stand as the counter-pole to the Second Law of Thermodynamics. The First Law you will remember is that energy cannot be created or destroyed. The Second Law is that the natural tendency of any isolated system is to degenerate into a more disordered state. The Second Law is concerned with the observed one-directional nature of all energy processes. For example, heat always flows spontaneously from hotter to colder bodies, and never the reverse, unless external work is performed on the system. The result is that entropy always increases with the flow of time.
The Second Law is causality by multiplication, not a zig-zag Mandelbrot fractal division. See my last blog on Chaos Theory. Also see: the work of the Austrian Physicist, Ludwig Boltzmann (1844–1906) on gas-dynamical equations, and his famous H-theorem: the entropy of a gas prepared in a state of less than complete disorder must inevitably increase, as the gas molecules are allowed to collide. Boltzman’s theorem-proof assumed “molecular chaos,” or, as he put it, the Stosszahlansatz, where all particle velocities were completely uncorrelated, random, and did not follow from Newtonian dynamics. His proof of the Second Law was attacked based on the random state assumption and the so called Loschmidt’s paradox. The attacks from pre-Chaos, Newtonian dominated scientists, many of whom still did not even believe in atoms and molecules, contributed to Boltzman’s depression and, tragically, he hanged himself at age 62.
My personal interpretation of Information Theory is that humans, like all of life, counter-act and balance the Second Law. We do so by an organizing force called negentropy that balances out entropy. Complex algorithms like ourselves can recognize order in information, can make sense of it. Information can have meaning, but only by our apprehension of it. We hear the falling tree and thereby make it real.
This is what I mean by the transition from Information to Knowledge. Systems that have ability to process information, to bring order out of chaos, and attach meaning to information, embody that transition. Information is essentially dead, whereas Knowledge is living. Life itself is a kind of Information spun together and integrated into meaningful Knowledge.
We humans have the ability to process information, to find connections and meaning. We have created machines to help us to do that. We now have information systems – algorithms – that can learn, both on their own and with our help. We humans also have the ability find things. We can search and filter to perceive the world in such a way as to comprehend its essential truth. To see through appearances, It is an essential survival skill. The unseen tiger is death. Now, in the Information Age, we have created machines to help us find things, help us see the hidden patterns.
We can create meaning, we can know the truth. Our machines, our robot friends, can help us in these pursuits. They can help us attain insights into the hidden order behind chaotic systems of otherwise meaningless information. Humans are negentropic to a high degree, probably more so than any other living system on this planet. With the help of our robot friends, humans can quickly populate the world with meaning and move beyond a mere Information Age. We can find order, process the binary yes-or-no choices and generate Knowledge. This is similar is the skilled editor’s function discussed in Gleick’s Talks at Google (53:45, Google, 2011), but one whose abilities are greatly enhanced by AI analytics and crowdsourcing. The arbitration of truth as they put it in the video is thereby facilitated.
With the help of computers our abilities to create Knowledge are exploding. We may survive the Information flood. Some day our Knowledge may evolve even further, into higher-level integrations – into Wisdom.
When James Gleick was interviewed by Publishers Weekly in 2011 about his book, The Information: a history, a theory, a flood, he touched upon the problem with Information:
By the technical definition, all information has a certain value, regardless of whether the message it conveys is true or false. A message could be complete nonsense, for example, and still take 1,000 bits. So while the technical definition has helped us become powerful users of information, it also instantly put us on thin ice, because everything we care about involves meaning, truth, and, ultimately, something like wisdom. And as we now flood the world with information, it becomes harder and harder to find meaning. That paradox is the final tension in my book.
Application of Information Theory to e-Discovery and Social Progress
In responding to lawsuits we must search through information stored in computer systems. We are searching for information relevant to a dispute. This dispute always arises after the information was created and stored. We do not order and store information according to issues in a dispute or litigation that has not yet happened. This means that for purposes of litigation all information storage systems are inherently entropic, chaotic. They are always inadequately ordered, as far as the lawsuit is concerned. Even if the ESI storage is otherwise well-ordered, which in practice is very rare (think random stored PST files and personal email accounts), it is never well-ordered for a particular lawsuit.
As forensic evidence finders we must always sort through meaningless, irrelevant noise to find the meaningful, relevant information we need. The information we search is usually not completely random. There is some order to it, some meaning. There are, for instance, custodian and time parameters that assist our search for relevance. But the ESI we search is never presented to us arranged in an order that tracks the issues raised by the new lawsuit. The ESI we search is arranged according to other logic, if any at all.
It is our job to bring order to the chaos, meaning to the information, by separating the relevant information from the irrelevant information. We search and find the documents that have meaning for our case. We use sampling, metrics, and iteration to achieve our goals of precision and recall. Once we separate the relevant documents from the irrelevant, we attain some knowledge of the total dataset. We have completed First Pass Review, but our work is not finished. All of the relevant information found in the First Pass is not produced.
Additional information refinement is required. More yes-no decisions must be made in what is called Second Pass Review. Now we consider whether a relevant document is privileged and thus excluded from production, or whether portions of it must be redacted to protect confidentiality.
Even after our knowledge is so further enhanced by confidentiality sorting, and a production set is made, the documents produced, our work is still incomplete. There is almost always far too much information in the documents produced for them to be useful. The information must be further processed. Relevancy itself must be ranked. The relevant documents must be refined down to the 7 +/- 2 documents that will persuade the judge and jury to rule our way, to reach the yes or no decision we seek. The vast body of knowledge, relevant evidence, must become wisdom, must become persuasive evidence.
In a typical significant lawsuit the metrics of this process are as follows: from trillions, to thousands, to a handful. (You can change the numbers if you want to fit the dispute, but what counts here are the relative proportions.)
In a typical lawsuit today we begin with an information storage system that contains trillions of computer files. A competent e-discovery team is able to reduce this down to tens of thousands of files, maybe less, that are relevant. The actual count depends on many things, including issue complexity, cooperation and Rule 26(b)(1) factors. The step from trillions of files, to tens of thousands of relevant files, is the step from information to knowledge. Many think this is what e-discovery is all about: find the relevant evidence, convert Information to Knowledge. But it is not. It is just the first step: from 1 to 2. The next step, 2 to 3, the Wisdom step, is more difficult and far more important.
The tens of thousands of relevant evidence, the knowledge of the case, is still too vast to be useful. After all, the human brain can, at best, only keep seven items in mind at a time. Miller, The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review 63 (2): 81–97. Tens of thousands of documents, or even thousands of documents, are not helpful to jurors. It may all be relevant, but is not all important. All trial lawyers will tell you that trials are won or lost by only five to nine documents. The rest is just noise, or soon forgotten foundation. Losey, Secrets of Search – Part III (5th secret).
The final step of information processing in e-discovery is only complete when the tens of thousands of files are winnowed down to 5 or 9 documents, or less. That is the final step of Information’s journey, the elevation from Knowledge to Wisdom.
Our challenge as e-discovery team members is to take raw information and turn it into wisdom – the five to nine documents with powerful meaning that will produce the favorable legal rulings that we seek. Testimony helps too of course, but without documents, it is difficult to test memory accuracy, much less veracity. This evidence journey mirrors the challenge of our whole culture, to avoid drowning in too-much-information, to rise above, to find Knowledge and, with luck, a few pearls of Wisdom.
From trillions to a handful, from mere information to practical wisdom — that is the challenge of our culture today. On a recursive self-similar level, that is also the challenge of justice in the Information Age, the challenge of e-discovery. How to meet the challenges? How to self-organize from out of the chaos of too much information? The answer is iterative, cooperative, interactive, interdisciplinary team processes that employ advanced hybrid, multimodal technologies and sound human judgment. See What Chaos Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition.
The micro-answer for cyber-investigators searching for evidence is fast becoming clear. It depends on a balanced hybrid application of human and artificial intelligence. What was once a novel invention, TAR, or technology assisted review, is rapidly becoming an obvious solution accepted in courts around the world. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015); Pyrrho Investments v MWB Property, EWHC 256 (Ch) (2/26/16). That is how information works. What was novel one day, even absurd, can very quickly become commonplace. We are creating, transmitting and processing information faster than ever before. The bits are flying at a rate that even Claude Shannon would never have dreamed possible.
The pace of change quickens as information and communication grows. New information flows and inventions propagate. The encouragement of negentropic innovation – ordered bits – is the basis of our property laws and commerce. The right information at the right time has great value.
Just ask a trial lawyer armed with five powerful documents — five smoking guns. These essential core documents are what make or break a case. The rest is just so much background noise, relevant but unimportant. The smoking hot Wisdom is what counts, not Information, not even Knowledge, although they are, of course, necessary prerequisites. There is a significant difference between inspiration and wisdom. Real wisdom does not just appear out of thin air. It arises out of True Information and Knowledge.
The challenge of Culture, including Law and Justice in our Information Age, is to never lose sight of this fundamental truth, this fundamental pattern: Information → Knowledge → Wisdom. If we do, we will get lost in the details. We will drown in a flood of meaningless information. Either that, or we will progress, but not far enough. We will become lost in knowledge and suffer paralysis by analysis. We will know too much, know everything, except what to do. Yes or No. Binary action. The tree may fall, but we never hear it, so neither does the judge or jury. The power of the truth is denied,
There is deep knowledge to be gained from both Chaos and Information Theories that can be applied to the challenges. Some of the insights can be applied in legal search and other cyber investigations. Others can be applied in other areas. As shown in this essay, details are important, but never lose sight of the fundamental pattern. You are looking for the few key facts. Like the Mandelbrot Set they remain the same, or at least similar, over different scales of magnitude, from the small county court case, to the largest complex multinational actions. Each case is different, yet the same. The procedures ties them all together.
Meaning is the whole point of Information. Justice is whole point of the Law.
You find the truth of a legal controversy by finding the hidden order that ties together all of the bits of evidence together. You find the hidden meaning behind all of the apparent contradictory clues, a fractal link of the near infinite strings of bits and bytes.
What really happened? What is the just response, the equitable remedy? That is the ultimate meaning of e-discovery, to find the few significant, relevant facts in large chaotic systems, the facts that make or break your case, so that judges and juries can make the right call. Perhaps this is the ultimate meaning of many of life’s challenges? I do not have the wisdom yet to know, but, as Cat Stevens says, I’m on the road to find out.