Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow – Part Five

October 9, 2016

This is the fifth installment of the article explaining the e-Discovery Team's latest enhancements to electronic document review using Predictive Coding. Here are Parts One, Two, Three and Four. This series explains the nine insights behind the latest upgrade to version 4.0 and the slight revisions these insights triggered to the eight-step workflow. We have already covered five of the nine insights. In this installment we will cover the remaining four: GIGO & QC (Garbage In, Garbage Out; Quality Control); SME (Subject Matter Expert); Method (for electronic document review); and Software (for electronic document review). The last three – SME, Method and Software – are all parts of Quality Control.

GIGO & QC – Garbage In, Garbage Out & Quality Control

Garbage In, Garbage Out is one of the oldest sayings in the computer world. You put garbage into the computer and it will spit it back at you in spades. It is almost as true today as it was in the 1980s when it was first popularized. Smart technology that recognizes and corrects for some mistakes has tempered GIGO somewhat, but it still remains a controlling principle of computer usage.


The GIGO Wikipedia entry explains that:

GIGO in the field of computer science or information and communications technology refers to the fact that computers, since they operate by logical processes, will unquestioningly process unintended, even nonsensical, input data (“garbage in”) and produce undesired, often nonsensical, output (“garbage out”). … It was popular in the early days of computing, but applies even more today, when powerful computers can produce large amounts of erroneous information in a short time.

Wikipedia also pointed out an interesting new expansion of the GIGO Acronym, Garbage In, Gospel Out:

It is a sardonic comment on the tendency to put excessive trust in “computerized” data, and on the propensity for individuals to blindly accept what the computer says.

Now as to our insight: GIGO in electronic document review, especially review using predictive coding, is largely the result of human error on the part of the Subject Matter Expert. Of course, garbage can also be created by poor methods, where too many mistakes are made, and by poor software. But to really mess things up, you need a clueless SME. These same factors create garbage (poor results) with any document review technique. When the subject matter expert is not good, when he or she does not have a good grasp of what is relevant and what is important for the case, then all methods fail. Keywords and active machine learning both depend on reliable attorney expertise. Quality control must start at the top of any electronic document review project. It must start with the SME.


If your attorney expert, your SME, has no clue, the input they provide is essentially garbage. With that kind of bad input, you will inevitably get bad output. This happens with all uses of a computer, but especially with predictive coding. The computer learns what you teach it. Teach it garbage and that is what it will learn. It will hit a target all right, just not the right target. Documents will be produced, just not the ones needed to resolve the disputed issues. A poor SME makes too many mistakes and misses too many relevant documents because they do not know what is relevant and what is not.

A smart AI can correct for some human errors (perfection is not required). The algorithms can correct for some inconsistent calls by an SME, and by the rest of the review team, but not many. In machine learning for document review the legal review robot starts as a blank slate, with no knowledge of the law or the case. It depends on the SME to teach it. Someday that may change. We may see smart robots that know the law and relevance, but we are not even near there yet. For now our robots are more like small children. They only know what you tell them, but they can spot inconsistencies in your message and they never forget.
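The "teach it garbage" point can be made concrete with a toy sketch. This is not any vendor's actual algorithm – just a hypothetical bag-of-words scorer invented for illustration – but it shows how completely the machine's output depends on the quality of the SME's labels:

```python
from collections import defaultdict

def train(labeled_docs):
    """Learn a per-word relevance score from SME-labeled documents.

    labeled_docs: list of (set_of_words, is_relevant) pairs.
    Returns a dict mapping each word to the fraction of its
    labeled documents that the SME marked relevant.
    """
    seen = defaultdict(int)
    relevant = defaultdict(int)
    for words, is_relevant in labeled_docs:
        for w in words:
            seen[w] += 1
            if is_relevant:
                relevant[w] += 1
    return {w: relevant[w] / seen[w] for w in seen}

def score(model, words):
    """Average the learned word scores; 0.5 (unknown) for unseen vocabulary."""
    known = [model[w] for w in words if w in model]
    return sum(known) / len(known) if known else 0.5

# A competent SME labels consistently...
good_labels = [
    ({"kudzu", "invasive", "vine"}, True),
    ({"manatee", "budget", "memo"}, False),
]
# ...while a clueless SME flips the very same calls.
bad_labels = [(words, not r) for words, r in good_labels]

test_doc = {"kudzu", "vine"}
print(score(train(good_labels), test_doc))  # 1.0 – correctly ranked relevant
print(score(train(bad_labels), test_doc))   # 0.0 – same document, garbage training
```

Real predictive coding software uses far richer models than this, but the dependence on the trainer is identical: the ranking is only as good as the relevance calls fed into it.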

Subject Matter Expert – SME

The predictive coding method can fail spectacularly with a poor expert, but so can keyword search. The converse of both propositions is also true. In all legal document review projects the SME needs to be an expert in scope of relevance, what is permitted discovery, what is relevant and what is not, what is important and what is not. They need to know the legal rules governing relevance backwards and forwards. They also need to have a clear understanding of the probative value of evidence in legal proceedings. This is what allows an attorney to know the scope of discoverable information.


If the attorney in charge does not understand the scope of discoverable information, does not understand probative value, then the odds of finding the documents important to a case are significantly diminished. You could look at a document with high probative value and not even know that it is relevant. This is exactly the concern of many requesting parties: that the responding party's attorney will not understand relevance and discoverability the same way they do. That is why the first step in my recommended workflow is to Talk, which I also call Relevance Dialogues.

The kind of ESI communication needed with opposing counsel is not whining accusation or aggressive posturing. I will go into good talk versus bad talk in some detail when I explain the first step of our eight-step method. The point of the talking that should begin any document review project is to reach a common understanding of the scope of discoverable information. What is the exact scope of the request for production? Don't agree that the scope is proportionate? That's fine. Agree to disagree and Talk some more, this time to the judge.

We have seen firsthand in the TREC experiments the damage that can be done by a poor SME with no judge to keep them in line. Frankly, it has been something of a shock, or wake-up call, as to the dangers of poor SME relevance calling. Most of the time I am quite lucky in my firm of super-specialists (all we do is employment law matters) to have terrific SMEs. But I have been a lawyer for a long time, and I have seen some real losers in this capacity in the past 36 years. I myself was a poor SME in some of the 2015 TREC experiments. An example that comes to mind is when I had to be the SME on the subject of CAPTCHA in a collection of forum messages by hackers. It ended up being on-the-job training. I saw for myself how little I could do to guide the project. Weak SMEs make bad leaders in the world of technology and law.


There are two basic ways that discovery SMEs fail. First, there is the kind who does not really know what he or she is talking about. They do not have expertise in the subject matter of the case, or, let's be charitable, their expertise is insufficient. A bullshit artist makes a terrible SME. They may fool the client (and they often do), but they do not fool the judge or any real experts. The second kind of weak SME has some expertise, but lacks experience. In my old firm we used to call them baby lawyers. They have knowledge, but not wisdom. They lack the practical experience and skills that can only come from grappling with these relevance issues in many cases.

That is one reason why boutique law firms like my own do so well in today's competitive environment. They have the knowledge and the wisdom that come from specialization. They have seen it before and know what to do.

An SME with poor expertise has a very difficult time knowing if a document is relevant or not. For instance, a person not living in Florida might have a very different understanding than a Floridian of what non-native plants and animals threaten the Florida ecosystem. This was Topic 408 in the TREC 2016 Total Recall Track. A native Floridian is in a better position to know the important invasive species, even ones like vines that have been in the state for over a hundred years. A non-expert with only limited information may not know, for instance, that Kudzu vines are an invasive plant from Japan and China. (They are also rumored to be the home of small, vicious Kudzu monkeys!) What is known for sure is that Kudzu, Pueraria montana, smothers all other vegetation around it, including tall trees (shown below). A native Floridian hates Kudzu as much as they love Manatees.

kudzu

A person who has just visited Florida a few times would not know what a big deal Kudzu was in Florida during the Jeb Bush administration, especially in Northern Florida. (It still is.) They have probably never heard of it at all. They could see an email with the term and have no idea what the email meant. It is obvious the native SME would know more, and thus be better positioned than a fake SME, to determine the relevance of Jeb Bush email to non-native plants and animals that threaten the Florida ecosystem. By the way, all native Floridians especially hate pythons, and a python eating one of our gators as shown below is an abomination.

python

Expertise is obviously needed for anyone to be a subject matter expert and know the difference between relevant and irrelevant. But there is more to it than information and knowledge. It also takes experience. It takes an attorney who has handled these kinds of cases many times before. Preferably they have tried a case like the one you are working on. They have seen the impact of this kind of evidence on judge and jury. An attorney with both theoretical knowledge and practical experience makes the best SME. Your ability to contribute subject matter expertise is limited when you have no practical experience. You might think certain ESI is helpful when, in fact, it is not; it has only weak probative value. A document might technically be relevant, but an SME without the experience and wisdom will not know that it is practically irrelevant anyway.

It goes without saying that any SME needs a good review team to back them up, to properly, consistently implement their decisions. In order for good leadership to be effective, there must also be good project management. Although this insight discussion features the role of the SME member of the review team, that is only because the importance of the SME was recently emphasized to us in our TREC research. In actuality all team members are important, not just the input from the top. Project management is critical, which is an insight already well-known to us and, we think, the entire industry.

Corrupt SMEs


Beware evil SMEs

Of course, no SME can be effective, no matter what their knowledge and experience, if they are not fair and honest. The SME must impartially seek and produce documents that are both pro and con. This is an ethics issue in all types of document review, not just predictive coding. In my experience corrupt SMEs are rare. But it does happen occasionally, especially when a corrupt client pressures their all too dependent attorneys. It helps to know the reputation for honesty of your opposing counsel. See: Five Tips to Avoid Costly Mistakes in Electronic Document Review Part 2 that contains my YouTube video, E-DISCOVERY ETHICS (below).

Also see: Lawyers Behaving Badly: Understanding Unprofessional Conduct in e-Discovery, 60 Mercer L. Rev. 983 (Spring 2009); Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.).

If I were a lawyer behaving badly in electronic document review, like, for instance, the Qualcomm lawyers who hid thousands of highly relevant emails from Broadcom, I would not use predictive coding. If I wanted not to find evidence harmful to my case, I would use negotiated keyword search, the Go Fish kind. See Part Four of this series.


I would also use linear review and throw an army of document review attorneys at it, with no one really knowing what the others were doing (or coding). I would subtly encourage project mismanagement. I would not pay attention. I would not supervise the rest of the team. I would not involve an AI entity, i.e., active machine learning. I would not use an attorney with search expertise, nor would I use a national e-discovery vendor. I would throw a novice at the task and use a local or start-up vendor who would just do what they were told and not ask too many questions.


A corrupt hide-the-ball attorney would not want to use a predictive coding method like ours. They would not want the relevant documents produced or logged, because those disclose the training documents they used. This is true in any continuous training process, not just ours. We do not produce irrelevant documents; the law prevents that and protects our client's privacy rights. But we do produce relevant documents, usually in phases, so you can see what the training documents are.

A Darth Vader type hide-the-ball attorney would also want to avoid using a small, specialized, well-managed team of contract review lawyers to assist on a predictive coding review project. They would instead want to work with a large, distant army of contract lawyers. A small team of contract review attorneys cannot easily be brought in on the con, especially if they are working for a good vendor. Even if you handicap them with a bad SME, and poor methods and software, they may still find a few of the damaging documents you do not want to produce. They may ask questions when they learn their coding has been changed from relevant to irrelevant. I am waiting for the next Qualcomm or Victor Stanley type case where a contract review lawyer blows the whistle on corrupt review practices. Qualcomm Inc. v. Broadcom Corp., No. 05-CV-1958-B(BLM) Doc. 593 (S.D. Cal. Aug. 6, 2007) (one honest low-level engineer testifying at trial blew the whistle on Qualcomm's massive fraud to hide critical email evidence). I have heard stories from contract review attorneys, but the law provides them too little protection, and so far at least, they remain behind the scenes with their horror stories.

One protection against both a corrupt SME, and an SME with too little expertise and experience, is for the SME to be the attorney in charge of the trial of the case, or at least one who works closely with that attorney so as to get their input when needed. The job of the SME is to know relevance. In the law that means you must know how the ultimate arbiter of relevance will rule – the judge assigned to your case. The judge determines truth. An SME's own personal opinion is important, but ultimately of secondary importance to that of the judge. For that reason a good SME will often err on the side of over-expansive relevance, because they know from history that this is what the judge is likely to allow in this type of case.

This is a key point. The judges, not the attorneys, ultimately decide close relevance and related discoverability issues. The head trial attorney interfaces with the judge and opposing counsel, and should have the best handle on what is or is not relevant or discoverable. A good SME can predict the judge's rulings and, even if not perfect, can gain the judicial guidance needed in an efficient manner.

If the judge detects unethical conduct by the attorneys before them, including the attorney signing the Rule 26(g) response, they can and should respond harshly to punish the attorneys. See, e.g.: Victor Stanley, Inc. v. Creative Pipe, Inc., 269 F.R.D. 497, 506 (D. Md. 2010). The Darth Vaders of the world can be defeated. I have done it many times with the help of the presiding judge. You can too. You can win even if they personally attack both you and the judge. Been through that too.

Three Kinds of SMEs: Best, Average & Bad

When your project has a good SME, one with both high knowledge levels and experience, with wisdom from having been there before, and knowing the judge's views, then your review project is likely to succeed. That means you can attain both high recall of the relevant documents and also high precision. You do not waste much time looking at irrelevant documents.

When an SME has only medium expertise or experience, or both, the expert tends to err on the side of over-inclusion. They tend to call grey area documents relevant because they do not know they are unimportant. They may also not understand the new Federal Rules of Civil Procedure governing discoverability. Since they do not know, they err on the side of inclusion. True experts know, and so tend to be more precise than rookies. The medium-level SMEs may, with diligence, also attain high recall, but it takes them longer to get there and their precision is poor. That means wasted money reviewing documents of no value to the case, documents of only marginal relevance that would not survive any rational scrutiny under Rule 26(b)(1).

When the SME lacks knowledge and wisdom, then both recall and precision can be poor, even if the software and methods are otherwise excellent. A bad SME can ruin everything. They may miss most of the relevant documents and end up producing garbage without even knowing it. That is the fault of the person in charge of relevance, the SME, not the fault of predictive coding, nor the fault of the rest of the e-discovery review team.
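Recall and precision, as used throughout this discussion, are simple ratios, and it is worth keeping the arithmetic in view. A short sketch, with counts that are entirely hypothetical, chosen only to mirror the three SME scenarios:

```python
def recall(true_pos, false_neg):
    """Share of the truly relevant documents the review actually found."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Share of the documents called relevant that really were relevant."""
    return true_pos / (true_pos + false_pos)

# Suppose 1,000 truly relevant documents exist in the collection.
# A strong SME finds 900 of them while pulling in only 100 false hits.
print(recall(900, 100), precision(900, 100))    # 0.9 recall, 0.9 precision

# A middling, over-inclusive SME may still reach high recall,
# but drags in 2,100 irrelevant documents along the way.
print(recall(900, 100), precision(900, 2100))   # 0.9 recall, 0.3 precision

# A bad SME misses most of what matters and still produces garbage.
print(recall(300, 700), precision(300, 1700))   # 0.3 recall, 0.15 precision
```

The middle case is the expensive one the text describes: the team eventually finds the documents, but pays to read three irrelevant ones for every relevant one.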


If the SME assigned to a document review project, especially a project using active machine learning, is a high-quality SME, then they will have a clear grasp of relevance. They will know what types of documents the review team is looking for. They will understand the probative value of certain kinds of documents in this particular case. Their judgments on the Rule 26(b)(1) criteria as to discoverability will be consistent, well-balanced and in accord with those of the governing judge. They will instruct the whole team, including the machine, on what is truly relevant, on what is discoverable and what is not. With this kind of top SME, if the software, the methods, including project management, and the rest of the review team are also good, then high recall and precision are very likely.

If the SME is just average, and is not sure about many grey area documents, then they will not have a clear grasp of relevance. It will be foggy at best. They will not know exactly what types of documents the review team is looking for. SMEs like this think that any arrow that hits a target is relevant, not knowing that only the red circle in the center is truly relevant. They will not understand the probative value of certain kinds of documents in this particular case. Their judgments on the Rule 26(b)(1) criteria as to discoverability will not be perfectly consistent, will end up either too broad or too narrow, and may not be in accord with those of the governing judge. They will instruct the whole team, including the machine, on what might be relevant and discoverable in an unfocused, vague and somewhat inconsistent manner. With this kind of SME, if the software and methods, including project management, and the rest of the review team are also good, and everyone is very diligent, high recall is still possible, but precision is unlikely. Either way, the project will be unnecessarily expensive.

The bad SME has multiple possible targets in mind. They just search without really knowing what they are looking for. They will instruct the whole team, including the machine, on what might be relevant and discoverable in a confused, constantly shifting and often contradictory manner. Their obtuse explanations of relevance have little to do with the law or the case at hand. They probably have a very poor grasp of Rule 26(b)(1) of the Federal Rules of Civil Procedure. Their judgments on the 26(b)(1) criteria as to discoverability, if any, will be inconsistent, imbalanced and sometimes irrational. This kind of SME probably does not even know the judge's name, much less the history of their relevance rulings in this type of case. With this kind of SME, even if the software and methods are otherwise good, there is little chance that high recall or precision will be attained. An SME like this does not know when their search arrow has hit the center of the target. In fact, it may hit the wrong target entirely. Their thought-world looks like this.

poor_sme

A document review project governed by a bad SME runs a high risk of having to be redone because important information is missed. That can be a very costly disaster. Worse, a document important to the producing party's case can be missed and the case lost because of that error. In any event, the recall and precision will both be low. The costs will be high. The project will be confused and inefficient. Projects like this are hard to manage, no matter how good the rest of the team. In projects like this there is also a high risk that privileged documents will accidentally be produced. (There is always some risk of this in today's high-volume ESI world, even with a top-notch SME and review team. A Rule 502(d) Order should always be entered for the protection of all parties.)

Method and Software

The SME and his or her implementing team are just one part of the quality triangle. The other two are the Method of electronic document review and the Software used for electronic document review.


Obviously the e-Discovery Team takes Method very seriously. That is one reason we are constantly tinkering with and improving our methods. We released the breakthrough Predictive Coding 3.0 last year, following our 2015 TREC research, and this year, after TREC 2016, we released version 4.0. You could fairly say we are obsessed with the topic. We also focus on the importance of good project management and communications. No matter how good your SME, and how good your software, if your methods are poor, your results will be poor in most of your projects. How you go about a document review project, how you manage it, is all-important to the quality of the end product, the production.

The same holds true for software. For instance, if your software does not have active machine learning capabilities, then it cannot do predictive coding. The method is beyond the reach of the software. End of story. The most popular software in the world right now for document review does not have that capability. Hopefully that will change soon and I can stop talking around it.

Even among the software that has active machine learning, some products are better than others. It is not my job to rank and compare software. I do not go around asking for demos and the opportunity to test other software. I am too busy for that. Everyone knows that I currently prefer to use EDR. It is the software by Kroll Ontrack that I use every day. I am not paid to endorse them and I do not. (Unlike almost every other e-discovery commentator out there, no vendors pay me a dime.) I just share my current preference and pass along cost savings to my clients.

I will just mention that the only other e-discovery vendor to participate with us at TREC is Catalyst. As most of my readers know, I am a fan of the founder and CEO, John Tredennick. There are several other vendors with good software too. Look around and be skeptical. But whatever you do, be sure the software you use is good. Even a great carpenter with the wrong tools cannot build a good house.

One thing I have found, and it is just plain common sense, is that with good software and good methods, including good project management, you can overcome many weaknesses in SMEs, except for dishonesty or repeated gross negligence. The same holds true for all three corners of the quality triangle. Strength in one can, to a certain extent, make up for weaknesses in another.

To be continued …


What Information Theory Tells Us About e-Discovery and the Projected 'Information → Knowledge → Wisdom' Transition

May 28, 2016

This is an article on Information Theory, the Law, e-Discovery, Search and the evolution of our computer technology culture from Information → Knowledge → Wisdom. The article, as usual, assumes familiarity with my writings on AI and the Law, especially the active machine learning types of Legal Search. It also assumes some familiarity with the scientific theory of Information as set forth in James Gleick's book, The Information: a history, a theory, a flood (2011). I will begin the essay with several good instructional videos on Gleick's book and Information Theory, including a bit about the life and work of the founder of Information Theory, Claude Shannon. Then I will provide my personal recapitulation of this theory and explore its application to two areas of my current work:

  1. The search for needles of relevant evidence in large, chaotic, electronic storage systems, such as email servers and email archives, in order to find the truth, the whole truth, and nothing but the truth needed to resolve competing claims of what happened – the facts – in the context of civil and criminal law suits and investigations.
  2. The articulation of a coherent social theory that makes sense of modern technological life, a theory that I summarize with the phrase: Information → Knowledge → Wisdom. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers and the more recent, How The 12 Predictions Are Doing That We Made In “Information → Knowledge → Wisdom.”

I essentially did the same thing in my blog last week applying Chaos Theory. What Chaos Theory Tells Us About e-Discovery and the Projected 'Information → Knowledge → Wisdom' Transition. This essay will, to some extent, build upon the last, and so I suggest you read it first.

Information Theory

Gleick's The Information: a history, a theory, a flood covers the history of cybernetics, computer science, and the men and women involved with Information Theory over the last several decades. Gleick explains how these information scientists today think that everything is ultimately information. The entire Universe, matter and energy, life itself, is made up of information. Information in turn is ultimately binary, zeros and ones, on and off, yes and no. It is all bits and bytes.
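Those bits can be quantified. Shannon measured the information of a source as its entropy in bits, the average surprise of its messages: H = Σ p · log₂(1/p). A small illustration in Python (the examples are mine, not Gleick's):

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: H = sum(p * log2(1/p)) over the outcomes."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0 – a fair coin flip carries exactly one bit
print(entropy_bits([1.0]))        # 0.0 – a certain outcome carries no information
print(entropy_bits([0.25] * 4))   # 2.0 – one of four equal outcomes takes two bits
```

The less predictable a source is, the more bits each message carries, which is why a highly compressed file looks like random noise: all the predictability has been squeezed out.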

Here are three videos, including two interviews of James Gleick, to provide a refresher on Information Theory for those who have not read his book recently. Information Wants to Have Meaning. Or Does It? (3:40, Big Think, 2014).

The Story of Information (3:47, 4th Estate Books, 2012).

The generally accepted Father of Information Theory is Claude Shannon (1916-2001). He was a visionary engineer whose ideas and inventions led to our current computer age. Among other things, he coined the word Bit in 1948 as the basic unit of information. He was also one of the first MIT hackers, in the original sense of the word as a tinkerer, who was always building new things. The following is a half-hour video by University of California Television (2008) that explains his life's work and theories. It is worth taking the time to watch it.

Shannon was an unassuming genius and, like Mandelbrot, very quirky and interested in many different things in a wide variety of disciplines. Aside from being a great mathematician, Bell Labs engineer, and MIT professor, Shannon also studied game theory. He went beyond theory and devised several math-based probability methods to win at certain games of chance, including card counting at blackjack. He collaborated with a friend at MIT, another mathematician, Edward Thorp, who became a professional gambler.

Shannon, his wife, and Thorp travelled regularly to Las Vegas for a couple of years in the early sixties, where they consistently won at the tables using their math tricks, including card counting. Shannon wanted to beat the roulette wheel too, but the system he and Thorp developed to do that required probability calculations beyond what he could do in his head. To solve this problem, in 1961 he invented a small, concealable computer, the world's first wearable computer, to help him calculate the odds. It was the size of a cigarette pack. These Las Vegas exploits became part of the very loose factual basis for the 2008 movie "21", in which Kevin Spacey played a professor running a card-counting team. (Poor movie, not worth watching.)

Shannon made even more money by applying his math abilities in the stock market. The list of his eclectic genius goes on and on, including his invention in 1950 of an electromechanical mouse named Theseus that could teach itself how to escape from a maze. Shannon’s mouse appears to have been the first artificial learning device. All that, and he was also an ardent juggler and builder/rider of little bitty unicycles (you cannot make this stuff up). Here is another good video of his life, and yet another to celebrate 2016 as the 100th year after his birth, The Shannon Centennial: 1100100 years of bits by the IEEE Information Theory Society.


_______

For a different view loosely connected with Information Theory, I recommend that you listen to an interesting Google Talk by Gleick: "The Information: A History, a Theory, a Flood" – Talks at Google (53:45, Google, 2011). It pertains to news and culture and the tension between a humanistic and a mechanical approach, a difference that mirrors the tension between Information and Knowledge. This is a must-watch for all news readers, especially NY Times readers, and for everyone who consumes, filters, creates and curates Information (a Google term). This video has a good dialogue concerning modern culture and search.

As you can see from the above Google Talk, a kind of Hybrid Multimodal approach seems to be in use in all advanced search. At Google they called it a "mixed-model." The search tools are designed to filter identity-consonance in favor of diverse-harmonies. Crowdsourcing and algorithms function as a curation authority to facilitate Google search. This is a kind of editing by omission that human news editors have been doing for centuries.

The mixed-model approach implied here has both human and AI editors working together to create new kinds of interactive search. Again, good search depends upon a combination of AI and human intelligence. Neither side should work alone, and commercial interests should not be allowed to take control. Both humans and machines should create bits and transmit them. People should use AI software to refine their own searches as an ongoing process. This should be a conversation, an interactive Q&A. This should provide a way from Information to Knowledge.


Personal Interpretation of Information Theory

My takeaway from the far-out reaches of Information theories is that everything is information, even life. All living entities are essentially algorithms of information, including humans. We are intelligent programs capable of deciding yes or no, capable of conscious, intelligent action, binary code. Our ultimate function is to transform information, to process and connect otherwise cold, random data. That is the way most Information Theorists and their philosophers see it, although I am not sure I agree.

Life forms like us are said to stand as the counter-pole to the Second Law of Thermodynamics. The First Law, you will remember, is that energy cannot be created or destroyed. The Second Law is that the natural tendency of any isolated system is to degenerate into a more disordered state. The Second Law is concerned with the observed one-directional nature of all energy processes. For example, heat always flows spontaneously from hotter to colder bodies, and never the reverse, unless external work is performed on the system. The result is that entropy always increases with the flow of time.

The Second Law is causality by multiplication, not a zig-zag Mandelbrot fractal division. See my last blog on Chaos Theory. Also see the work of the Austrian physicist Ludwig Boltzmann (1844–1906) on gas-dynamical equations, and his famous H-theorem: the entropy of a gas prepared in a state of less than complete disorder must inevitably increase as the gas molecules are allowed to collide. Boltzmann’s theorem-proof assumed “molecular chaos,” or, as he put it, the Stosszahlansatz, where all particle velocities are completely uncorrelated, random, and do not follow from Newtonian dynamics. His proof of the Second Law was attacked based on the random-state assumption and the so-called Loschmidt’s paradox. The attacks from pre-Chaos, Newtonian-dominated scientists, many of whom still did not even believe in atoms and molecules, contributed to Boltzmann’s depression and, tragically, he hanged himself at age 62.
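For readers who want the notation, Boltzmann’s entropy relation, engraved on his tombstone, and the H-theorem he used to argue for the Second Law can be stated as:

```latex
% Boltzmann's entropy: S grows with the number W of microstates
S = k_B \ln W

% H-theorem: for a dilute gas with velocity distribution f(v,t),
% under the molecular-chaos (Stosszahlansatz) assumption,
H(t) = \int f(v,t) \ln f(v,t)\, d^3v, \qquad \frac{dH}{dt} \le 0
```

Since entropy varies inversely with H, a falling H is a rising entropy, which is exactly what Loschmidt’s reversibility objection targeted.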

My personal interpretation of Information Theory is that humans, like all of life, counteract and balance the Second Law. We do so by an organizing force called negentropy that balances out entropy. Complex algorithms like ourselves can recognize order in information, can make sense of it. Information can have meaning, but only through our apprehension of it. We hear the falling tree and thereby make it real.

This is what I mean by the transition from Information to Knowledge. Systems that have the ability to process information, to bring order out of chaos and attach meaning to information, embody that transition. Information is essentially dead, whereas Knowledge is living. Life itself is a kind of Information spun together and integrated into meaningful Knowledge.

We humans have the ability to process information, to find connections and meaning. We have created machines to help us do that. We now have information systems – algorithms – that can learn, both on their own and with our help. We humans also have the ability to find things. We can search and filter to perceive the world in such a way as to comprehend its essential truth, to see through appearances. It is an essential survival skill: the unseen tiger is death. Now, in the Information Age, we have created machines to help us find things, to help us see the hidden patterns.

We can create meaning; we can know the truth. Our machines, our robot friends, can help us in these pursuits. They can help us attain insights into the hidden order behind chaotic systems of otherwise meaningless information. Humans are negentropic to a high degree, probably more so than any other living system on this planet. With the help of our robot friends, humans can quickly populate the world with meaning and move beyond a mere Information Age. We can find order, process the binary yes-or-no choices and generate Knowledge. This is similar to the skilled editor’s function discussed in Gleick’s Talks at Google (53:45, Google, 2011), but one whose abilities are greatly enhanced by AI analytics and crowdsourcing. The “arbitration of truth,” as they put it in the video, is thereby facilitated.

With the help of computers our abilities to create Knowledge are exploding. We may survive the Information flood. Some day our Knowledge may evolve even further, into higher-level integrations – into Wisdom.

When James Gleick was interviewed by Publishers Weekly in 2011 about his book, The Information: A History, a Theory, a Flood, he touched upon the problem with Information:

By the technical definition, all information has a certain value, regardless of whether the message it conveys is true or false. A message could be complete nonsense, for example, and still take 1,000 bits. So while the technical definition has helped us become powerful users of information, it also instantly put us on thin ice, because everything we care about involves meaning, truth, and, ultimately, something like wisdom. And as we now flood the world with information, it becomes harder and harder to find meaning. That paradox is the final tension in my book.
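Gleick’s point, that Shannon’s measure is blind to meaning, is easy to demonstrate. Here is a minimal sketch; the two strings and the empirical per-character entropy model are my own illustration, not Gleick’s:

```python
from collections import Counter
from math import log2

def shannon_bits(message: str) -> float:
    """Total bits to encode a message, using the empirical per-character
    entropy H = -sum(p * log2(p)) of the message's own symbol frequencies."""
    counts = Counter(message)
    n = len(message)
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    return h * n

meaningful = "the quick brown fox jumps over the lazy dog " * 5
nonsense = ("xqz jvw kpf qzx wvj fpk zxq wjv kfp qxz vwj pkf " * 5)[:len(meaningful)]

# Shannon's measure is indifferent to meaning: a sensible sentence and
# equally long gibberish cost a comparable number of bits to transmit.
print(round(shannon_bits(meaningful)), "bits vs", round(shannon_bits(nonsense)), "bits")
```

A uniform stream of one repeated character would score zero bits, while meaningful text and nonsense of similar symbol variety score about the same, which is exactly the paradox Gleick describes.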

Application of Information Theory to e-Discovery and Social Progress

In responding to lawsuits we must search through information stored in computer systems. We are searching for information relevant to a dispute, and that dispute always arises after the information was created and stored. We do not order and store information according to the issues in a dispute or litigation that has not yet happened. This means that for purposes of litigation all information storage systems are inherently entropic, chaotic. They are always inadequately ordered as far as the lawsuit is concerned. Even if the ESI storage is otherwise well-ordered, which in practice is very rare (think randomly stored PST files and personal email accounts), it is never well-ordered for a particular lawsuit.

As forensic evidence finders we must always sort through meaningless, irrelevant noise to find the meaningful, relevant information we need. The information we search is usually not completely random. There is some order to it, some meaning. There are, for instance, custodian and time parameters that assist our search for relevance. But the ESI we search is never presented to us arranged in an order that tracks the issues raised by the new lawsuit. The ESI we search is arranged according to other logic, if any at all.

It is our job to bring order to the chaos, meaning to the information, by separating the relevant information from the irrelevant information. We search and find the documents that have meaning for our case. We use sampling, metrics, and iteration to achieve our goals of precision and recall. Once we separate the relevant documents from the irrelevant, we attain some knowledge of the total dataset. We have completed First Pass Review, but our work is not finished. Not all of the relevant information found in the First Pass is produced.
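The two core metrics mentioned above can be sketched in a few lines. The review counts below are hypothetical, chosen only to illustrate the calculation:

```python
def precision_recall(produced_relevant: int, produced_total: int,
                     relevant_total: int) -> tuple[float, float]:
    """Precision: share of retrieved documents that are relevant.
    Recall: share of all relevant documents that were retrieved."""
    precision = produced_relevant / produced_total
    recall = produced_relevant / relevant_total
    return precision, recall

# Hypothetical first-pass numbers: 9,000 relevant documents found in a
# 10,000-document retrieved set, out of an estimated 12,000 relevant overall.
p, r = precision_recall(9000, 10000, 12000)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=90% recall=75%
```

In practice the denominator for recall, the total number of relevant documents in the collection, is never known exactly and must itself be estimated by sampling, which is why the iteration matters.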

Additional information refinement is required. More yes-no decisions must be made in what is called Second Pass Review. Now we consider whether a relevant document is privileged and thus excluded from production, or whether portions of it must be redacted to protect confidentiality.

Even after our knowledge is further enhanced by confidentiality sorting, a production set is made and the documents are produced, our work is still incomplete. There is almost always far too much information in the documents produced for them to be useful. The information must be further processed. Relevancy itself must be ranked. The relevant documents must be refined down to the 7 +/- 2 documents that will persuade the judge and jury to rule our way, to reach the yes-or-no decision we seek. The vast body of knowledge, the relevant evidence, must become wisdom, must become persuasive evidence.


In a typical significant lawsuit the metrics of this process are as follows: from trillions, to thousands, to a handful. (You can change the numbers if you want to fit the dispute, but what counts here are the relative proportions.)

In a typical lawsuit today we begin with an information storage system that contains trillions of computer files. A competent e-discovery team is able to reduce this down to tens of thousands of relevant files, maybe fewer. The actual count depends on many things, including issue complexity, cooperation and the Rule 26(b)(1) factors. The step from trillions of files to tens of thousands of relevant files is the step from Information to Knowledge. Many think this is what e-discovery is all about: find the relevant evidence, convert Information to Knowledge. But it is not. It is just the first step: from 1 to 2. The next step, from 2 to 3, the Wisdom step, is more difficult and far more important.

The tens of thousands of relevant documents, the knowledge of the case, are still too vast to be useful. After all, the human brain can, at best, keep only seven items in mind at a time. Miller, The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review 63 (2): 81–97. Tens of thousands of documents, or even thousands of documents, are not helpful to jurors. It may all be relevant, but it is not all important. Any trial lawyer will tell you that trials are won or lost by only five to nine documents. The rest is just noise, or soon-forgotten foundation. Losey, Secrets of Search – Part III (5th secret).

The final step of information processing in e-discovery is complete only when the tens of thousands of files are winnowed down to five to nine documents, or fewer. That is the final step of Information’s journey, the elevation from Knowledge to Wisdom.

Our challenge as e-discovery team members is to take raw information and turn it into wisdom – the five to nine documents with powerful meaning that will produce the favorable legal rulings that we seek. Testimony helps too of course, but without documents, it is difficult to test memory accuracy, much less veracity. This evidence journey mirrors the challenge of our whole culture, to avoid drowning in too-much-information, to rise above, to find Knowledge and, with luck, a few pearls of Wisdom.

Conclusion

From trillions to a handful, from mere information to practical wisdom — that is the challenge of our culture today. On a recursive self-similar level, that is also the challenge of justice in the Information Age, the challenge of e-discovery. How to meet the challenges? How to self-organize from out of the chaos of too much information? The answer is iterative, cooperative, interactive, interdisciplinary team processes that employ advanced hybrid, multimodal technologies and sound human judgment. See What Chaos Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition.

The micro-answer for cyber-investigators searching for evidence is fast becoming clear. It depends on a balanced hybrid application of human and artificial intelligence. What was once a novel invention, TAR, or technology assisted review, is rapidly becoming an obvious solution accepted in courts around the world. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015); Pyrrho Investments v MWB Property, EWHC 256 (Ch) (2/26/16). That is how information works. What was novel one day, even absurd, can very quickly become commonplace. We are creating, transmitting and processing information faster than ever before. The bits are flying at a rate that even Claude Shannon would never have dreamed possible.

The pace of change quickens as information and communication grows. New information flows and inventions propagate. The encouragement of negentropic innovation – ordered bits – is the basis of our property laws and commerce. The right information at the right time has great value.

Just ask a trial lawyer armed with five powerful documents — five smoking guns. These essential core documents are what make or break a case. The rest is just so much background noise, relevant but unimportant. The smoking hot Wisdom is what counts, not Information, not even Knowledge, although they are, of course, necessary prerequisites. There is a significant difference between inspiration and wisdom. Real wisdom does not just appear out of thin air. It arises out of True Information and Knowledge.

The challenge of Culture, including Law and Justice in our Information Age, is to never lose sight of this fundamental truth, this fundamental pattern: Information → Knowledge → Wisdom. If we do, we will get lost in the details. We will drown in a flood of meaningless information. Either that, or we will progress, but not far enough. We will become lost in knowledge and suffer paralysis by analysis. We will know too much, know everything, except what to do. Yes or No. Binary action. The tree may fall, but we never hear it, so neither does the judge or jury. The power of the truth is denied.

There is deep knowledge to be gained from both Chaos and Information Theories that can be applied to these challenges. Some of the insights can be applied in legal search and other cyber investigations; others can be applied in other areas. As shown in this essay, details are important, but never lose sight of the fundamental pattern. You are looking for the few key facts. Like the Mandelbrot Set, they remain the same, or at least similar, over different scales of magnitude, from the small county court case to the largest complex multinational actions. Each case is different, yet the same. The procedures tie them all together.

Meaning is the whole point of Information. Justice is the whole point of the Law.

You find the truth of a legal controversy by finding the hidden order that ties all of the bits of evidence together. You find the hidden meaning behind all of the apparently contradictory clues, a fractal link among the near-infinite strings of bits and bytes.

What really happened? What is the just response, the equitable remedy? That is the ultimate meaning of e-discovery, to find the few significant, relevant facts in large chaotic systems, the facts that make or break your case, so that judges and juries can make the right call. Perhaps this is the ultimate meaning of many of life’s challenges? I do not have the wisdom yet to know, but, as Cat Stevens says, I’m on the road to find out.

