Ruminations on James Gleick’s New Book “The Information,” Chaos, Life and e-Discovery

I had a chance this week to think about a book I read a few weeks ago by James Gleick: The Information: a history, a theory, a flood. I like Gleick. His first book, Chaos: making a new science (1987), was inspirational for me. It started me on an intellectual journey to try to understand these scientific discoveries. I was hoping for more of the same from The Information. Unfortunately, I did not get it. But with effort I was able to find some meaning in his new book, and applications to my own somewhat related field, e-discovery, which I will share in this blog.

The Influence of the Science of Chaos on My Ideas About e-Discovery

Before I go into Gleick’s latest book, and why it was disappointing, I have to go into his first book, and why, for me at least, it was so good. Chaos was a clear and stimulating book. It was filled with big ideas that came together at the end and made sense. Gleick’s book helped motivate me, a lawyer with little or no scientific training, to read more to try to understand the new science of chaos. I started studying math and geometry, including some far-out stuff like complex and imaginary numbers and infinite recursive geometries. That was a big stretch for a liberal-arts-based lawyer like myself.

Much of the science went over my head, especially the advanced math, but still my long experience with computers allowed me to understand how Chaos theories emerged from their information processing power. Apparently many other people were interested in Chaos theories too, and how they might apply to their life, as the number of science and math books on the subject in the stores exploded to meet the new demand. The butterfly’s wings had flapped and many of us have never looked at the world the same again.

When I read Chaos, it led, among other things, to further study and appreciation of the great French mathematician, Benoit Mandelbrot, who spent most of his career employed by IBM. That led to my in-depth study of fractals, especially the famous computer-generated fractal that Mandelbrot discovered and that now bears his name. This fractal demonstrated the hidden order behind chaos. It taught me about recursive self-similarity over scales of magnitude.
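To make the idea of order emerging from simple iteration concrete, here is a minimal Python sketch of the standard escape-time test for the Mandelbrot set: repeat z -> z*z + c and see whether z stays bounded. The function names, the iteration cap of 50, and the viewing window are illustrative assumptions, not anything taken from Gleick's or Mandelbrot's own work.

```python
def escape_time(c: complex, max_iter: int = 50) -> int:
    """Count iterations of z -> z*z + c before |z| exceeds 2.

    Returns max_iter if the point never escapes within the cap,
    which is treated here as 'probably inside the set'.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width: int = 60, height: int = 24, max_iter: int = 50) -> str:
    """Draw a coarse ASCII picture of the set on an assumed viewing window."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)      # imaginary axis, top to bottom
        row = ""
        for i in range(width):
            x = -2.1 + 3.0 * i / (width - 1)  # real axis, left to right
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

The whole picture comes from one line of arithmetic applied over and over, which is the sense in which a simple iterative process can reveal hidden order.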

The ability to bring order out of chaos by iteration and simple mathematical processes is a key insight of contemporary science and math. This insight later inspired my own work in electronic discovery. It is the basis of my later invention of several new legal methodologies. These are methods designed to find relevant information through iterative processes, sampling, cost projections, and communications between counsel; new methods designed to find the needle in the haystack without breaking the bank.

The latest research in TREC Legal Track seems to be confirming the validity of this iterative, sampling-based, communicative approach. Baron, J.R., Law in the Age of Exabytes: Some Further Thoughts on “Information Inflation” and Current Issues in e-Discovery Search, Richmond Journal of Law and Technology, Vol. XVII, Issue 3 (Spring 2011) at Fn. 92. Advanced concept search software with latent semantic indexing, and the like, certainly helps with the key problems of search. Id. at Fns. 123, 124. But software and other new technologies alone will not work without new processes and methods, entirely new methods of human judgment that use iteration, cooperation, sampling, and quality controls. Search depends on both advanced software algorithms and processes controlled and designed by lawyers. There seems to be a consensus emerging on this. Id. at para [7]. Sigler, S., Search Method Validation or eDiscovery Standards? What is Really Needed for eDiscovery Search and Retrieval to be Successful?, e-Discovery Team (3/27/2011); Baron, J., In Search of Quality: Is It Time for E-Discovery Search Process Quality Standards?, e-Discovery Team (3/13/2011).

Maura Grossman and Gordon Cormack explained the most recent findings of TREC Legal Track in the Conclusion to their article, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3 (Spring 2011):

The particular processes found to be superior in this study are both interactive, employing a combination of computer and human input. While these processes require the review of orders of magnitude fewer documents than exhaustive manual review, neither entails the naïve application of technology absent human judgment.

Human judgment, combined with technology and fractal iterative sampling processes, is the most efficient way to sort through large volumes of information and find the key information needed. Modern thought on e-discovery is, to me at least, a natural outgrowth of the chaos theories that have been driving science since the 1970s.

Unfulfilled Expectations

Since Gleick had inspired me before with his book, Chaos, I was hoping for the same experience with The Information. I was hoping it would show a way to really understand the intellectual foundations of computer science and information science. I thought maybe The Information would have insights and meaning that would make me a better e-discovery lawyer, maybe even a better person. (Yes, I have high hopes for books!) The Information offered the promise of explaining the latest science and math behind the Information Age, the explosion of information that drives electronic discovery law today. The Information offered the promise of new substance and meaning.

My high hopes and history help explain my disappointment with The Information. It has some good moments. I learned things that I did not know before about the history of cybernetics, computer science, math, and the men and women involved. I got more information, but little meaning. Overall the book seems hollow, too technical, and, unlike Chaos, it did not all come together in the end. It was a struggle to read and there was no real pay-off. I finished this book of more than 500 pages tired and disappointed. I expected inspiration and meaning. I expected knowledge, perhaps even wisdom from Gleick. But instead, all I got was information, too much information (TMI).

The book, in my opinion, contributes to the problem that it points out: that raw information is a flood, a deluge, that is leading us to more chaos and entropy. As an e-discovery lawyer I know all too well about TMI, about the challenges of finding select information in a chaotic system of large corporate enterprises. Now, with The Information, I have more TMI about TMI. For a full and proper book review see the NY Times article. My role is not to review books, but to look for their meaning and impact on the law.

Since Gleick’s The Information did not really provide answers, I was forced to try to figure it out for myself, to find the meaning of the information presented in the book. This was my way to fight the entropy the book had created in my mind. The alternative was just to forget the whole thing, which, I suspect, is how most readers will react. Still, I had invested a lot of time in reading The Information, so I figured a few more hours to try to sort it out was worth the effort.

My Interpretation of the Ideas in The Information

Here is what I think the book is saying. Anyway, it is how I make sense of the information in The Information.

Science is finding that everything is ultimately information. The entire Universe, matter and energy, life itself, is made up of information. Information in turn is ultimately binary, zeros and ones, on and off, yes and no. It is all bits. It’s turtles all the way down.

The Universe is a giant computer and we are all self-evolving algorithms, one of a near infinite number of intelligent programs capable of deciding yes or no, capable of conscious action. The Universe tends towards disorganized information, to cold unconscious data that is not processed, not connected. It follows the Second Law of Thermodynamics in physics. The Universe tends to entropy. There is a black hole in the center of each Galaxy. Still, the Universe is also capable of negentropy. It is capable of order, of processed information. Systems can find order in the information, can make sense of information, can find connections, meaning. Systems that have abilities to process information, to bring order out of chaos, tend to be considered living. Life is the ability to maintain order in the chaos, to be negentropic and meaning generative.

We humans have the ability to process information, to find connections and meaning. We can find things. We can create meaning. We can have insights into the hidden order behind entropic systems that move towards chaos, what we usually think of as inert matter, non-living things. We humans are negentropic to a high degree, probably more so than any other living system on this planet. We can find the order, process the information, and generate knowledge. We do not just see information, we see meaning. Knowledge grows from meaning. We can process this knowledge even further into self-evolving actions, into wisdom.

I think this is what Gleick is saying in The Information. But who knows, I could be wrong. My summary of the information in The Information could be misinformation, but for me it has meaning.

When asked about all of this in an interview by Publishers Weekly, Gleick did his best to explain:

By the technical definition, all information has a certain value, regardless of whether the message it conveys is true or false. A message could be complete nonsense, for example, and still take 1,000 bits. So while the technical definition has helped us become powerful users of information, it also instantly put us on thin ice, because everything we care about involves meaning, truth, and, ultimately, something like wisdom. And as we now flood the world with information, it becomes harder and harder to find meaning. That paradox is the final tension in my book.
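Gleick's point that the technical measure ignores meaning can be shown with a small Python sketch. It compares a meaningful sentence with a random string of the same length, using two measures: raw storage size in bits and the Shannon entropy of the character frequencies. The helper names and the sample strings are assumptions made for this illustration only.

```python
import math
import random
import string
from collections import Counter


def size_in_bits(message: str) -> int:
    """Storage size of the message as UTF-8, in bits."""
    return 8 * len(message.encode("utf-8"))


def entropy_bits_per_char(message: str) -> float:
    """Shannon entropy of the character frequencies, in bits per character."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


meaningful = "the right information at the right time has great value"
rng = random.Random(0)  # fixed seed so the nonsense string is reproducible
alphabet = string.ascii_lowercase + " "
nonsense = "".join(rng.choice(alphabet) for _ in meaningful)

# Same length, so the same number of bits, whatever the content says.
print(size_in_bits(meaningful), size_in_bits(nonsense))
print(round(entropy_bits_per_char(meaningful), 2),
      round(entropy_bits_per_char(nonsense), 2))
```

Neither number knows whether the message is true, false, or gibberish. By the same arithmetic, any 125-character ASCII message takes 1,000 bits, which matches the figure in the quotation.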

Chaos Theories, Information, Search, and e-Discovery

So what is this all supposed to mean to electronic discovery? In responding to lawsuits we must search through information stored in computer systems. We are searching for information relevant to a dispute. This dispute necessarily developed and took final form after the information was created and stored, and well after the information storage systems were designed. Information is not stored by anyone or any organization according to a future order of relevance that is unknown at the time of storage. For purposes of our litigation, of finding information relevant to the issues in our case, information storage systems are always too entropic. They are always inadequately ordered, as far as the lawsuit is concerned, even if they are otherwise well-ordered, which in practice is very rare (think randomly stored PST files and personal email accounts). Since time is an impenetrable barrier, for our purposes as evidence finders we are always dealing with inadequately ordered information.

Information can only be stored and ordered according to what is known. Lawsuits before filing are just latent future events whose contours and order are never fully known, even if their potential is recognized and precautions taken to avoid or minimize litigation risks. It is sort of like quantum mechanics. The exact positions of electrons and photons are just probabilities. Perhaps someday quantum computers will overcome this difficulty. (Google and others are working on limited versions of it now.) Perhaps someday we will be able to store information so that it can be easily retrieved for a dispute that has not yet materialized, for a relevancy not yet formed, not yet observed a la Heisenberg. But, with the limited capacities of today’s finite based, non-quantum computers, we cannot organize information for near infinite variables, for purposes of uncertain events that have not yet occurred. We cannot store information according to issues in a dispute or litigation that is not yet at hand.

Due to the limitations of time, and the complexity (the chaos) inherent in possible future events, we lawyers are always essentially dealing with disordered information. We search through information that has a high degree of entropy and meaninglessness to our case. The information we search through is usually not completely random. There is some order to it, some meaning. There are, for instance, custodian and time parameters that assist our search for relevance. But the ESI we search is never presented to us arranged in an order that tracks issues that were just raised by a new lawsuit. The ESI we search is arranged according to other orders. Sometimes the order behind the ESI we search is very weak, like most email systems, and sometimes very strong, like databases. But it is always disordered for purposes of relevancy. It is our job to find the hidden order, to bring order to the chaos by separating the relevant information from the irrelevant information. We search and find the documents that have meaning for our case. We use sampling, metrics, and iteration to achieve our goals of precision and recall.
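For readers unfamiliar with the two metrics just named, here is a minimal Python sketch of how precision and recall are computed for one search pass. The document IDs are hypothetical, and the sketch assumes the full set of relevant documents is known; in a real matter that set is unknown and has to be estimated, typically by sampling.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search pass.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
relevant = {"doc01", "doc02", "doc03", "doc04"}
first_pass = {"doc01", "doc07", "doc08", "doc09"}
second_pass = {"doc01", "doc02", "doc03", "doc07"}

print(precision_recall(first_pass, relevant))   # (0.25, 0.25)
print(precision_recall(second_pass, relevant))  # (0.75, 0.75)
```

The second pass stands in for a refined search after one round of iteration: both numbers improve, which is the kind of measurable progress that sampling and metrics are meant to show.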

Once we have separated the relevant from the irrelevant, which in large ESI collections is a process that iterates until budgetary constraints are reached, we have moved from information to knowledge. We have added meaning to the raw bits and bytes. But our work is not finished. Not all relevant information is produced, much less useful. Further knowledge refinement is required. More yes-no decisions must be made. Is this piece of information privileged and thus excluded from production?

Even after the knowledge is further enhanced, and a production set is made, our work is still incomplete. In litigation, sorting and gathering relevant producible information, the evidence, is not enough. There is almost always far too much of this knowledge to be useful. The knowledge must be further processed. Relevancy itself must be ranked. The relevant documents must be refined down to the seven or fewer documents that will persuade the judge and jury to rule our way, to reach the yes or no decision we seek. The vast body of knowledge, relevant evidence, must become wisdom, persuasive evidence.

Metrics of Meaning in e-Discovery

In a typical significant lawsuit, the metrics of this process are as follows: from trillions, to thousands, to a handful. (You can change the numbers if you want to fit the dispute, but what counts here are the relative proportions.) In a typical case today an enterprise stores from three trillion to seven trillion computer files in its computers (3,000,000,000,000 – 7,000,000,000,000). A competent e-discovery team is able to reduce this down to thirty thousand to seventy thousand files that are relevant (30,000 – 70,000). (Maybe the e-discovery team can do even better than this, and reduce to 3,000 to 7,000 files. It depends on many things, including primarily cooperation.) This is the knowledge of the lawsuit gathered from the raw information. Many think this is what e-discovery is all about: find the relevant evidence, convert information to knowledge. But it is not. It is just the first step: from 1 to 2. The next step, 2 to 3, is more difficult and far more important.

The relevant evidence, the knowledge of the case, is still too vast in today’s trillion-file world. The human brain can, at best, only keep seven items in mind at a time. Tens of thousands of documents, or even thousands of documents, are not helpful to human jurors. It may all be relevant. But it is not all important. All trial lawyers will tell you that trials are won or lost on only three to seven documents. The rest is just noise, or soon forgotten foundation.

So the final step of information processing in e-discovery is only complete when the 30,000 – 70,000 files are winnowed down to 3 to 7 files. That is the distillation from knowledge to wisdom. Our challenge as e-discovery team members is to take TMI and turn it into wisdom: the three to seven documents with powerful meaning that will produce the yes or no decisions we seek.
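The relative proportions in this funnel are easy to work out. The short Python sketch below uses the low-end figures from the text (three trillion files, 30,000 relevant files, three key documents); the stage labels and the function name are shorthand invented for the sketch.

```python
import math

# Figures taken from the text; the labels are shorthand for this sketch.
stages = [
    ("information (files stored)", 3_000_000_000_000),
    ("knowledge (relevant files)", 30_000),
    ("wisdom (key documents)", 3),
]


def reductions(stages):
    """Yield (from_label, to_label, ratio, orders_of_magnitude) per step."""
    for (a_name, a), (b_name, b) in zip(stages, stages[1:]):
        ratio = a // b
        yield a_name, b_name, ratio, round(math.log10(ratio))


for a_name, b_name, ratio, orders in reductions(stages):
    print(f"{a_name} -> {b_name}: 1 in {ratio:,} ({orders} orders of magnitude)")
```

On these figures the first step keeps 1 file in 100,000,000 and the second keeps 1 in 10,000. Numerically the first cut is far larger, yet the second, smaller cut is the one described above as more difficult and more important, because it depends on judgment about persuasiveness rather than relevance alone.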

Conclusion

From Three Trillion to Three, from just information to practical wisdom: that is the challenge of chaos and entropy in the law today. That is the challenge of justice in the Information Age. How to meet that challenge? How to self-organize the needed order from the chaos of TMI? Iterative, cooperative, communication processes that employ advanced technologies, sampling, metrics, and sound human judgment. The answer is fast becoming clear to every specialist. What was once a novel invention is rapidly becoming an obvious solution. That is how information works. What was novel one day, even absurd, can very quickly become commonplace and established. We are processing information faster than before.

The pace of change quickens as information and communication grow. New information flows and inventions propagate. The encouragement of such negentropic innovation is the basis of our patent laws, the basis of our commerce. The right information at the right time has great value. Just ask any trial lawyer armed with four powerful documents, four smoking guns. They are what make or break a case. The rest is just so much background noise, relevant but unimportant. Wisdom is what counts, not information, not even knowledge. The challenge of law and justice in our Information Age is to never lose sight of this fundamental truth. If we do, we get lost in the details. We drown in a flood of meaningless information. We lose the big picture. We lose the case.

4 Responses to Ruminations on James Gleick’s New Book “The Information,” Chaos, Life and e-Discovery

  1. Ralph Artigliere says:

    Ralph, you provoke reflection and interesting paths of information and knowledge, for which I thank you. As a lawyer and a judge, I had precious little time for reflection, so your blog would have been valuable to me then as it is now when I have a bit more time to think about things other than existing cases. Anyway: here is a thought. You say:
    “For purposes of our litigation, of finding information relevant to the issues in our case, information storage systems are always too entropic. They are always inadequately ordered, as far as the law suit is concerned, even if they are otherwise well-ordered, which in practice is very rare (think random stored PST files and personal email accounts).”

    What about the added factor of deception, hiding, analog or digital man-created disorder? That makes finding and processing discoverable ESI much more difficult in my opinion. Just a thought.
