Elusion Random Sample Test Ordered Under Rule 26(g) in a Keyword Search Based Discovery Plan

August 26, 2018

There is a new case out of Chicago that advances the jurisprudence of my sub-specialty, Legal Search. City of Rockford v. Mallinckrodt ARD Inc., 2018 WL 3766673, Case 3:17-cv-50107 (N.D. Ill., Aug. 7, 2018). This discovery order was written by U.S. Magistrate Judge Iain Johnston who entitled it: “Order Establishing Production Protocol for Electronically Stored Information.” The opinion is both advanced and humorous, destined to be an oft-cited favorite for many. Thank you Judge Johnston.

In City of Rockford an Elusion random sample quality assurance test was required as part of the parties' discovery plan to meet the reasonable efforts requirements of Rule 26(g). The random sample procedure proposed was found to impose only a proportional, reasonable burden under Rule 26(b)(1). What makes this holding particularly interesting is that an Elusion test is commonly employed in predictive coding projects, but here the parties had agreed to a keyword search based discovery plan. Also see: Tara Emory, PMP, Court Holds that Math Matters for eDiscovery Keyword Search, Urges Lawyers to Abandon their Fear of Technology (Driven, August 16, 2018) (“party using keywords was required to test the search effectiveness by sampling the set of documents that did not contain the keywords.”)

The Known Unknowns and Unknown Unknowns

Judge Johnston begins his order in City of Rockford with a famous quote by Donald Rumsfeld, a two-time Secretary of Defense.

“[A]s we know there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. . .”
Donald Rumsfeld

For those not familiar with this famous Known Knowns quip, here is a video of the original:

Here the knowledge logic is spelled out in a chart, since I know we all love that sort of thing. Deconstructing Rumsfeld: Knowledge and Ignorance in the Age of Innovation (Inovo 5/114).

Anybody who does complex investigations is familiar with this problem. Indeed, you can argue this insight is fundamental to all of science and experimental method. Logan, David C. (March 1, 2009). “Known knowns, known unknowns, unknown unknowns and the propagation of scientific enquiry”, Journal of Experimental Botany 60 (3). pp. 712–4. [I have always wanted to quote a botany journal.]

How do you deal with the known unknowns and the unknown unknowns, the information that we don’t even know that we don’t know about? The deep, hidden information that is both obscure and rare. Information that is hard to retrieve and harder still to prove does not exist at all. Are you chasing something that might not exist? Something unknown because nonexistent? Such as an overlooked Highly Relevant document? (The stuff of nightmares!) Are you searching for nothing? Zero? If you find it, what does that mean? What can be known and what can never be known? Scientists, investigators and the Secretary of Defense alike all have to ponder these questions, and all want to use the best tools and best people possible to do so. See: Deconstructing Rumsfeld: Knowledge and Ignorance in the Age of Innovation (Inovo 5/114).

Seeking Knowledge of the Unknown Elusion Error Rate

These big questions, though interesting, are not why Judge Johnston started his opinion with the Rumsfeld quote. Instead, he used the quote to emphasize that new e-discovery methods, namely random sampling and statistical analysis, can empower lawyers to know what they never did before. A technical way to know the known unknowns. For instance, a way to know the number of relevant documents that will be missed and not produced: the documents that elude retrieval.

As the opinion and this blog will explain, you can do that, know that, by using an Elusion random sample of the null set. The statistical analysis of the sample transforms the unknown quantity into a known one (subject to statistical probabilities and ranges). It allows lawyers to know, at least within a range, the number of relevant documents that have not been found. This is a very useful quality assurance method that relies on objective measurements to demonstrate the success of your project, which here is information retrieval. This and other random sampling methods also allow for the calculation of Recall, meaning the percentage of the total relevant documents that were found, another math-based quality assurance tool in the field of information retrieval.
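To make the arithmetic concrete, here is a minimal sketch in Python of the point estimates an Elusion sample supports. All of the document counts and sample results below are hypothetical, chosen only to illustrate the formulas; a real project would also report confidence intervals around these estimates.

```python
# All numbers below are hypothetical, chosen only to illustrate the formulas.
null_set_size = 900_000       # documents NOT retrieved and NOT produced
produced_relevant = 40_000    # relevant documents found and produced

sample_size = 2_400           # random sample drawn from the null set
relevant_in_sample = 12       # sample documents coded Relevant on review

# Elusion: the fraction of the null set that is actually relevant.
elusion_rate = relevant_in_sample / sample_size

# Point estimate of the relevant documents missed (the False Negatives).
estimated_missed = elusion_rate * null_set_size

# Point estimate of Recall: relevant found / (relevant found + relevant missed).
estimated_recall = produced_relevant / (produced_relevant + estimated_missed)

print(f"Elusion rate:     {elusion_rate:.2%}")       # 0.50%
print(f"Estimated missed: {estimated_missed:,.0f}")  # 4,500
print(f"Estimated recall: {estimated_recall:.1%}")   # 89.9%
```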

One of the main points Judge Johnston makes in his order is that lawyers should embrace this kind of technical knowledge, not shy away from it. As Tara Emory said in her article, Court Holds that Math Matters for eDiscovery Keyword Search:

A producing party must determine that its search process was reasonable. In many cases, the best way to do this is with objective metrics. Producing parties often put significant effort into brainstorming keywords, interviewing witnesses to determine additional terms, negotiating terms with the other party, and testing the documents containing their keywords to eliminate false positives. However, these efforts often still fail to identify documents if important keywords were missed, and sampling the null set is a simple, reasonable way to test whether additional keywords are needed. …

It is important to overcome the fear of technology and its related jargon, which can help counsel demonstrate the reasonableness of search and production process. As Judge Johnston explains, sampling the null set is a process to determine “the known unknown,” which “is the number of the documents that will be missed and not produced.” Judge Johnston disagreed with the defendants’ argument “that searching the null set would be costly and burdensome.” The Order requires Defendants to sample their null set at a 95% +/-2% margin of error (which, even for a very large set of documents, would be about 2,400 documents to review). By taking these measures—either with TAR or with search terms, counsel can more appropriately represent that they have undertaken a “reasonable inquiry” for relevant information within the meaning of FRCP 26(g)(1).

Small Discovery Dispute in an Ocean of Cooperation

Judge Johnston was not asked to solve the deep mysteries of knowing and not knowing in City of Rockford. The parties came to him instead with an interesting, esoteric discovery dispute. They had agreed on a great number of things, for which the court profusely congratulated them.

The attorneys are commended for this cooperation, and their clients should appreciate their efforts in this regard. The Court certainly does. The litigation so far is a solid example that zealous advocacy is not necessarily incompatible with cooperation. The current issue before the Court is an example of that advocacy and cooperation. The parties have worked to develop a protocol for the production of ESI in this case, but have now reached an impasse as to one aspect of the protocol.

The parties disagreed on whether to include a document review quality assurance test in the protocol. The Plaintiffs wanted one and the Defendants did not. Too burdensome, they said.

To be specific, the Plaintiffs wanted a test where the efficacy of any party's production would be tested by use of an Elusion type of Random Sample of the documents not produced. The Defendants opposed any specific test. Instead, they wanted the discovery protocol to say that if the receiving party had concerns about the adequacy of the producing party’s efforts, then they would have a conference to address the concerns.

Judge Johnston ruled for the Plaintiffs in this dispute and ordered a random elusion sample to be taken after the Defendants stopped work and completed production. In this case it was a good decision, but it should not be routinely required in all matters.

The Stop Decision and Elusion Sample

One of the fundamental problems in any investigation is to know when you should stop the investigation because it is no longer worth the effort to carry on. When has a reasonable effort been completed? Ideally this happens after all of the important documents have already been found. At that point you should stop the effort and move on to a new project. Alternatively, perhaps you should keep on going and look for more? Should you stop or not?

In Legal Search we call this the “Stop Decision.” Should you conclude the investigation, or continue with further AI training rounds and other searches? As explained in the e-Discovery Team TAR Course:

The all important stop decision is a legal, statistical decision requiring a holistic approach, including metrics, sampling and over-all project assessment. You decide to stop the review after weighing a multitude of considerations. Then you test your decision with a random sample in Step Seven.

See: TAR Course: 15th Class – Step Seven – ZEN Quality Assurance Tests.

If you want to go deeper into this, then listen in on this TAR Course lecture on the Stop decision.

____________

Once a decision is made to Stop, then a well-managed document review project will use different tools and metrics to verify that the Stop decision was correct. Judge Johnston in City of Rockford used one of my favorite tools, the Elusion random sample of the null set, which I teach in the e-Discovery Team TAR Course.

Judge Johnston ordered an Elusion type random sample of the null set in City of Rockford. The sample would determine the range of relevant documents that likely eluded retrieval. These are called False Negatives: documents presumed Irrelevant and withheld that were in fact Relevant and should have been produced. The Elusion sample is designed to give you information on the total number of Relevant documents that were likely missed, unretrieved, unreviewed and not produced or logged. The fewer the number of False Negatives, the better the Recall of True Positives. The goal is to find, to retrieve, all of the Relevant ESI in the collection.

Another way to say the same thing is to say that the goal is Zero False Negatives. You do not miss a single relevant file. Every file designated Irrelevant is in fact not relevant. They are all True Negatives. That would be Total Recall: “the Truth, the Whole Truth …” But that is very rare; some False Negatives are expected in every large information retrieval project. Some relevant documents will almost always be missed, so the goal is to make the False Negatives inconsequential and keep the Elusion rate low.

Here is how Judge Iain Johnston explained the random sample:

Plaintiffs propose a random sample of the null set. (The “null set” is the set of documents that are not returned as responsive by a search process, or that are identified as not relevant by a review process. See Maura R. Grossman & Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Cts. L. Rev. 1, 25 (2013). The null set can be used to determine “elusion,” which is the fraction of documents identified as non-relevant by a search or review effort that are, in fact, relevant. Elusion is estimated by taking a random sample of the null set and determining how many or what portion of documents are actually relevant. Id. at 15.) FN 2

Judge Johnston’s Footnote Two is interesting for two reasons. One, it attempts to calm lawyers who freak out when hearing anything having to do with math or statistics, much less information science and technology. Two, it does so with a reference to Fizbo the clown.

The Court pauses here for a moment to calm down litigators less familiar with ESI. (You know who you are.) In life, there are many things to be scared of, including, but not limited to, spiders, sharks, and clowns – definitely clowns, even Fizbo. ESI is not something to be scared of. The same is true for all the terms and jargon related to ESI. … So don’t freak out.

Accept on Zero Error for Hot Documents

Although this is not addressed in the court order, in my personal view, no False Negatives, i.e. overlooked documents, are acceptable when it comes to Highly Relevant documents. If even one document like that is found in the sample, one Highly Relevant Document, then the Elusion test has failed. You must conclude that the Stop decision was wrong and training and document review must recommence. That is called an Accept on Zero Error test for any hot documents found. Of course my personal views on best practice here assume the use of AI ranking, and the parties in City of Rockford only used keyword search. Apparently they were not doing machine training at all.
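A rough sketch of that Accept on Zero Error check, again with purely hypothetical sample results, might look like this: a single Highly Relevant document in the Elusion sample fails the test and sends the project back for more training and review.

```python
# Hypothetical coding results from attorney review of the Elusion sample.
sample_coding = ["Irrelevant"] * 2_395 + ["Relevant"] * 4 + ["Highly Relevant"] * 1

hot_docs_missed = sample_coding.count("Highly Relevant")

if hot_docs_missed > 0:
    print(f"FAIL: {hot_docs_missed} Highly Relevant document(s) eluded retrieval.")
    print("Reject the Stop decision; resume training and review.")
else:
    print("PASS: no Highly Relevant documents found in the Elusion sample.")
```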

The odds of finding False Negatives in a modest sized random sample are very low, assuming that only a few exist (very low prevalence) and the database is large. With very low prevalence of relevant ESI the test can be of limited effectiveness. That is an inherent problem with low prevalence and random sampling. That is why statistics alone are not enough and should be considered just one part of a total quality control program. See Zero Error Numerics: ZEN. Math matters, but so too does good project management and communications.

The inherent problem with random sampling is that the only way to narrow the error interval is to increase the size of the sample. For instance, to decrease the margin of error to only 2% either way, a total spread of 4%, a random sample size of around 2,400 documents is needed. Even then there is still the separate uncertainty of the Confidence Level, here 95%. Still, it is usually not worth the effort to review the much larger sample needed to raise that to a 99% Confidence Level.
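For readers who want to check the arithmetic, here is a minimal sketch in Python of where the roughly 2,400-document figure comes from, using the standard sample size formula for a proportion with the conservative 50% prevalence assumption. The low-prevalence illustration at the end uses a hypothetical prevalence figure, not anything from the case.

```python
import math

def sample_size(z: float, margin_of_error: float, p: float = 0.5) -> int:
    """Classic sample size formula for estimating a proportion."""
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# A 95% confidence level (z ~ 1.96), a +/- 2% margin of error, and the
# worst-case 50% prevalence assumption give roughly 2,400 documents.
n_95 = sample_size(1.96, 0.02)     # 2,401
n_99 = sample_size(2.576, 0.02)    # 4,148 at a 99% confidence level
print(n_95, n_99)

# The low-prevalence problem: if only 1 in 10,000 null set documents is
# relevant (a hypothetical figure), the 2,401-document sample will usually
# contain none of them at all.
prevalence = 0.0001
chance_of_zero_hits = (1 - prevalence) ** n_95
print(f"Chance the sample finds no relevant documents: {chance_of_zero_hits:.0%}")
```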

Random sampling has limitations in low prevalence datasets, which are typical in e-discovery, but sampling can still be very useful. Due to this rarity issue, and the care that producing parties take to attain high Recall, any documents found in an Elusion random sample should be carefully studied to see if they are of any significance. We look very carefully at any new documents found that are of a kind not seen before. That is unusual. Typically any relevant documents found by random sample of the elusion set are of a type that have been seen before, often many, many times before. These “same old, same old” type of documents are of no importance to the investigation at this point.

Most email related datasets are filled with duplicative, low value data. It is not exactly irrelevant noise, but it is not a helpful signal either. We do not care if we get all of that kind of merely relevant data. What we really want are the Hot Docs, the high value Highly Relevant ESI, or at least Relevant and of a kind not seen before. That is why the Accept On Zero Error test is so important for Highly Relevant documents.

The Elusion Test in City of Rockford 

In City of Rockford Judge Johnston considered a discovery stipulation where the parties had agreed to use a typical keyword search protocol, but disagreed on a quality assurance protocol. Judge Johnston held:

With key word searching (as with any retrieval process), without doubt, relevant documents will be produced, and without doubt, some relevant documents will be missed and not produced. That is a known known. The known unknown is the number of the documents that will be missed and not produced.

Back to the False Negatives again, the known unknown. Judge Johnston continues his analysis:

But there is a process by which to determine that answer, thereby making the known unknown a known known. That process is to randomly sample the null set. Karl Schieneman & Thomas C. Gricks III, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 239, 273 (2013) (“[S]ampling the null set will establish the number of relevant documents that are not being produced.”). Consequently, the question becomes whether sampling the null set is a reasonable inquiry under Rule 26(g) and proportional to the needs of this case under Rule 26(b)(1).

Rule 26(g) Certification

Judge Johnston takes an expansive view of the duties placed on counsel of record by Rule 26(g), but concedes that perfection is not required:

Federal Rule of Civil Procedure 26(g) requires all discovery requests be signed by at least one attorney (or party, if proceeding pro se). Fed. R. Civ. P. 26(g)(1). By signing the response, the attorney is certifying that to the best of counsel’s knowledge, information, and belief formed after a reasonable inquiry, the disclosure is complete and correct at the time it was made. Fed. R. Civ. P. 26(g)(1)(A). But disclosure of documents need not be perfect. … If the Federal Rules of Civil Procedure were previously only translucent on this point, it should now be clear with the renewed emphasis on proportionality.

Judge Johnston concludes that Rule 26(g) on certification applies to require the Elusion sample in this case.

Just as it is used in TAR, a random sample of the null set provides validation and quality assurance of the document production when performing key word searches. Magistrate Judge Andrew Peck made this point nearly a decade ago. See William A. Gross Constr. Assocs., 256 F.R.D. at 135-6 (citing Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 262 (D. Md. 2008)); In re Seroquel Products Liability Litig., 244 F.R.D. 650, 662 (M.D. Fla. 2007) (requiring quality assurance).

Accordingly, because a random sample of the null set will help validate the document production in this case, the process is reasonable under Rule 26(g).

Rule 26(b)(1) Proportionality

Judge Johnston considered as a separate issue whether it was proportionate under Rule 26(b)(1) to require the elusion test requested. Again, the court found that it was in this large case on the pricing of prescription medication and held that it was proportional:

The Court’s experience and understanding is that a random sample of the null set will not be unreasonably expensive or burdensome. Moreover and critically, Defendants have failed to provide any evidence to support their contention. Mckinney/Pearl Rest. Partners, L.P. v. Metro. Life Ins. Co., 322 F.R.D. 235, 242 (N.D.Tex. 2016) (party required to submit affidavits or offer evidence revealing the nature of the burden).

Once again we see a party seeking protection from having to do something because it is supposedly too burdensome, and then failing to present actual evidence of that burden. We see this a lot lately. See: Responding Party’s Complaints of Financial Burden of Document Review Were Unsupported by the Evidence, Any Evidence (e-Discovery Team, 8/5/18).

Judge Johnston concludes his “Order Establishing Production Protocol for Electronically Stored Information” with the following:

The Court adopts the parties’ proposed order establishing the production protocol for ESI with the inclusion of Plaintiffs’ proposal that a random sample of the null set will occur after the production and that any responsive documents found as a result of that process will be produced. Moreover, following that production, the parties should discuss what additional actions, if any, should occur. If the parties cannot agree at that point, they can raise the issue with the Court.

Conclusion

City of Rockford is important because it is the first case to hold that a quality control procedure should be used to meet the reasonable efforts certification requirements of Rule 26(g). The procedure here required was a random sample Elusion test with related, limited data sharing. If this interpretation of Rule 26(g) is followed by other courts, then it could have a big impact on legal search jurisprudence. Tara Emory, in her article Court Holds that Math Matters for eDiscovery Keyword Search, goes so far as to conclude that City of Rockford stands for the proposition that “the testing and sampling process associated with search terms is essential for establishing the reasonableness of a search under FRCP 26(g).”

The City of Rockford holding could persuade other judges and encourage courts to be more active and impose specific document review procedures on all parties, including requiring the use of sampling and artificial intelligence. The producing party cannot always have a free pass under Sedona Principle Six. Testing and sampling may well be routinely ordered in all “large” document review cases in the future.

It will be very interesting to watch how other attorneys argue City of Rockford. It will continue a line of cases examining methodology and procedures in document review. See, e.g., William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134 (S.D.N.Y. 2009) (“wake-up call” for lawyers on keyword search); Winfield v. City of New York (S.D.N.Y. Nov. 27, 2017), where the court considers methodologies and quality controls of the active machine learning process. Also see Special Master Maura Grossman’s Order Regarding Search Methodology for ESI, a validation protocol for the Broiler Chicken antitrust cases.

The validation procedure of an Elusion sample in City of Rockford is just one of many possible review protocols that a court could impose under Rule 26(g). There are dozens more, including whether predictive coding should be required. So far, courts have been reluctant to order that, as Judge Peck explained in Hyles:

There may come a time when TAR is so widely used that it might be unreasonable for a party to decline to use TAR. We are not there yet.

Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP), 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016).

Like a kid in the backseat of the car, I cannot help but ask, are we there yet? Hyles was published over two years ago now. Maybe some court, somewhere in the world, has already ordered a party to do predictive coding against their will, but not to our knowledge. That is a known unknown. Still, we are closer to “There” with the City of Rockford’s requirement of an Elusion test.

When we get “there,” and TAR is finally ordered in a case, it will probably arise in a situation like City of Rockford where a joint protocol applicable to all parties is involved. That is easier to sell than a one-sided protocol. The court is likely to justify the order by Rule 26(g), and hold that it requires all parties in the case to use predictive coding. Otherwise, they will not meet the reasonable effort burdens of Rule 26(g). Other rules will be cited too, of course, including Rule 1, but Rule 26(g) is likely to be key.


What Information Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition

May 28, 2016

This is an article on Information Theory, the Law, e-Discovery, Search and the evolution of our computer technology culture from Information → Knowledge → Wisdom. The article as usual assumes familiarity with writings on AI and the Law, especially active machine learning types of Legal Search. The article also assumes some familiarity with the scientific theory of Information as set forth in James Gleick’s book, The Information: a history, a theory, a flood (2011). I will begin the essay with several good instructional videos on Gleick’s book and Information Theory, including a bit about the life and work of the founder of Information Theory, Claude Shannon. Then I will provide my personal recapitulation of this theory and explore the application to two areas of my current work:

  1. The search for needles of relevant evidence in large, chaotic, electronic storage systems, such as email servers and email archives, in order to find the truth, the whole truth, and nothing but the truth needed to resolve competing claims of what happened – the facts – in the context of civil and criminal law suits and investigations.
  2. The articulation of a coherent social theory that makes sense of modern technological life, a theory that I summarize with the phrase: Information → Knowledge → Wisdom. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers and the more recent, How The 12 Predictions Are Doing That We Made In “Information → Knowledge → Wisdom.”

I essentially did the same thing in my blog last week applying Chaos Theories. What Chaos Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition. This essay will, to some extent, build upon the last and so I suggest you read it first.

Information Theory

Gleick’s The Information: a history, a theory, a flood covers the history of cybernetics, computer science, and the men and women involved with Information Theory over the last several decades. Gleick explains how these information scientists today think that everything is ultimately information. The entire Universe, matter and energy, life itself, is made up of information. Information in turn is ultimately binary, zeros and ones, on and off, yes and no. It is all bits and bytes.

Here are three videos, including two interviews of James Gleick, to provide a refresher on Information Theory for those who have not read his book recently. Information Wants to Have Meaning. Or Does It? (3:40, Big Think, 2014).

The Story of Information (3:47, 4th Estate Books, 2012).

The generally accepted Father of Information Theory is Claude Shannon (1916-2001). He was a great visionary engineer whose ideas and inventions led to our current computer age. Among other things, he coined the word Bit in 1948 as the basic unit of information. He was also one of the first MIT hackers, in the original sense of the word as a tinkerer, who was always building new things. The following is a half-hour video by University of California Television (2008) that explains his life’s work and theories. It is worth taking the time to watch it.

Shannon was an unassuming genius, and like Mandelbrot, very quirky and interested in many different things in a wide variety of disciplines. Aside from being a great mathematician, Bell Labs engineer, and MIT professor, Shannon also studied game theory. He went beyond theory and devised several math based probability methods to win at certain games of chance, including card counting at blackjack. He collaborated with a friend at MIT, another mathematician, Edward Thorp, who became a professional gambler.

Shannon, his wife, and Thorp travelled regularly to Las Vegas for a couple of years in the early sixties where they constantly won at the tables using their math tricks, including card counting. Shannon wanted to beat the roulette wheel too, but the system he and Thorp developed to do that required probability calculations beyond what he could do in his head. To solve this problem in 1961 he invented a small, concealable computer, the world’s first wearable computer, to help him calculate the odds. It was the size of a cigarette pack. His Las Vegas exploits became the very loose factual basis for a 2008 movie “21“, where Kevin Spacey played Shannon. (Poor movie, not worth watching.)

Shannon made even more money by applying his math abilities in the stock market. The list of his eclectic genius goes on and on, including his invention in 1950 of an electromechanical mouse named Theseus that could teach itself how to escape from a maze. Shannon’s mouse appears to have been the first artificial learning device. All that, and he was also an ardent juggler and builder/rider of little bitty unicycles (you cannot make this stuff up). Here is another good video of his life, and yet another to celebrate 2016 as the 100th year after his birth, The Shannon Centennial: 1100100 years of bits by the IEEE Information Theory Society.


_______

For a different view loosely connected with Information Theory, I recommend that you listen to an interesting Google Talk by Gleick: “The Information: A History, a Theory, a Flood” – Talks at Google (53:45, Google, 2011). It pertains to news and culture and the tension between a humanistic and mechanical approach, a difference that mirrors the tension between Information and Knowledge. This is a must-watch for all news readers, especially NY Times readers, and for everyone who consumes, filters, creates and curates Information (a Google term). This video has a good dialogue concerning modern culture and search.

As you can see from the above Google Talk, a kind of Hybrid Multimodal approach seems to be in use in all advanced search. At Google they called it a “mixed-model.” The search tools are designed to filter identity-consonance in favor of diverse-harmonies. Crowd sourcing and algorithms function as curation authority to facilitate Google search. This is a kind of editing by omission that human news editors have been doing for centuries.

The mixed-model approach implied here has both human and AI editors working together to create new kinds of interactive search. Again, good search depends upon a combination of AI and human intelligence. Neither side should work alone and commercial interests should not be allowed to take control. Both humans and machines should create bits and transmit them. People should use AI software to refine their own searches as an ongoing process. This should be a conversation, an interactive Q&A. This should provide a way out of Information to Knowledge.


Personal Interpretation of Information Theory

My takeaway from the far out reaches of Information theories is that everything is information, even life. All living entities are essentially algorithms of information, including humans. We are intelligent programs capable of deciding yes or no, capable of conscious, intelligent action, binary code. Our ultimate function is to transform information, to process and connect otherwise cold, random data. That is the way most Information Theorists and their philosophers see it, although I am not sure I agree.

Life forms like us are said to stand as the counter-pole to the Second Law of Thermodynamics. The First Law, you will remember, is that energy cannot be created or destroyed. The Second Law is that the natural tendency of any isolated system is to degenerate into a more disordered state. The Second Law is concerned with the observed one-directional nature of all energy processes. For example, heat always flows spontaneously from hotter to colder bodies, and never the reverse, unless external work is performed on the system. The result is that entropy always increases with the flow of time.

The Second Law is causality by multiplication, not a zig-zag Mandelbrot fractal division. See my last blog on Chaos Theory. Also see: the work of the Austrian physicist Ludwig Boltzmann (1844–1906) on gas-dynamical equations, and his famous H-theorem: the entropy of a gas prepared in a state of less than complete disorder must inevitably increase, as the gas molecules are allowed to collide. Boltzmann’s theorem-proof assumed “molecular chaos,” or, as he put it, the Stosszahlansatz, where all particle velocities were completely uncorrelated, random, and did not follow from Newtonian dynamics. His proof of the Second Law was attacked based on the random state assumption and the so-called Loschmidt’s paradox. The attacks from pre-Chaos, Newtonian-dominated scientists, many of whom still did not even believe in atoms and molecules, contributed to Boltzmann’s depression and, tragically, he hanged himself at age 62.

My personal interpretation of Information Theory is that humans, like all of life, counter-act and balance the Second Law. We do so by an organizing force called negentropy that balances out entropy. Complex algorithms like ourselves can recognize order in information, can make sense of it. Information can have meaning, but only by our apprehension of it. We hear the falling tree and thereby make it real.

This is what I mean by the transition from Information to Knowledge. Systems that have ability to process information, to bring order out of chaos, and attach meaning to information, embody that transition. Information is essentially dead, whereas Knowledge is living. Life itself is a kind of Information spun together and integrated into meaningful Knowledge.

We humans have the ability to process information, to find connections and meaning. We have created machines to help us to do that. We now have information systems – algorithms – that can learn, both on their own and with our help. We humans also have the ability to find things. We can search and filter to perceive the world in such a way as to comprehend its essential truth, to see through appearances. It is an essential survival skill. The unseen tiger is death. Now, in the Information Age, we have created machines to help us find things, help us see the hidden patterns.

We can create meaning, we can know the truth. Our machines, our robot friends, can help us in these pursuits. They can help us attain insights into the hidden order behind chaotic systems of otherwise meaningless information. Humans are negentropic to a high degree, probably more so than any other living system on this planet. With the help of our robot friends, humans can quickly populate the world with meaning and move beyond a mere Information Age. We can find order, process the binary yes-or-no choices and generate Knowledge. This is similar to the skilled editor’s function discussed in Gleick’s Talks at Google (53:45, Google, 2011), but one whose abilities are greatly enhanced by AI analytics and crowdsourcing. The arbitration of truth, as they put it in the video, is thereby facilitated.

With the help of computers our abilities to create Knowledge are exploding. We may survive the Information flood. Some day our Knowledge may evolve even further, into higher-level integrations – into Wisdom.

When James Gleick was interviewed by Publishers Weekly in 2011 about his book, The Information: a history, a theory, a flood, he touched upon the problem with Information:

By the technical definition, all information has a certain value, regardless of whether the message it conveys is true or false. A message could be complete nonsense, for example, and still take 1,000 bits. So while the technical definition has helped us become powerful users of information, it also instantly put us on thin ice, because everything we care about involves meaning, truth, and, ultimately, something like wisdom. And as we now flood the world with information, it becomes harder and harder to find meaning. That paradox is the final tension in my book.

Application of Information Theory to e-Discovery and Social Progress

In responding to lawsuits we must search through information stored in computer systems. We are searching for information relevant to a dispute. This dispute always arises after the information was created and stored. We do not order and store information according to issues in a dispute or litigation that has not yet happened. This means that for purposes of litigation all information storage systems are inherently entropic, chaotic. They are always inadequately ordered, as far as the lawsuit is concerned. Even if the ESI storage is otherwise well-ordered, which in practice is very rare (think randomly stored PST files and personal email accounts), it is never well-ordered for a particular lawsuit.

As forensic evidence finders we must always sort through meaningless, irrelevant noise to find the meaningful, relevant information we need. The information we search is usually not completely random. There is some order to it, some meaning. There are, for instance, custodian and time parameters that assist our search for relevance. But the ESI we search is never presented to us arranged in an order that tracks the issues raised by the new lawsuit. The ESI we search is arranged according to other logic, if any at all.

It is our job to bring order to the chaos, meaning to the information, by separating the relevant information from the irrelevant information. We search and find the documents that have meaning for our case. We use sampling, metrics, and iteration to achieve our goals of precision and recall. Once we separate the relevant documents from the irrelevant, we attain some knowledge of the total dataset. We have completed First Pass Review, but our work is not finished. Not all of the relevant information found in the First Pass is produced.

Additional information refinement is required. More yes-no decisions must be made in what is called Second Pass Review. Now we consider whether a relevant document is privileged and thus excluded from production, or whether portions of it must be redacted to protect confidentiality.

Even after our knowledge is further enhanced by this confidentiality sorting, a production set is made and the documents are produced, our work is still incomplete. There is almost always far too much information in the documents produced for them to be useful. The information must be further processed. Relevancy itself must be ranked. The relevant documents must be refined down to the 7 +/- 2 documents that will persuade the judge and jury to rule our way, to reach the yes or no decision we seek. The vast body of knowledge, relevant evidence, must become wisdom, must become persuasive evidence.


In a typical significant lawsuit the metrics of this process are as follows: from trillions, to thousands, to a handful. (You can change the numbers if you want to fit the dispute, but what counts here are the relative proportions.)

In a typical lawsuit today we begin with an information storage system that contains trillions of computer files. A competent e-discovery team is able to reduce this down to tens of thousands of files, maybe less, that are relevant. The actual count depends on many things, including issue complexity, cooperation and Rule 26(b)(1) factors. The step from trillions of files, to tens of thousands of relevant files, is the step from information to knowledge. Many think this is what e-discovery is all about: find the relevant evidence, convert Information to Knowledge. But it is not. It is just the first step: from 1 to 2. The next step, 2 to 3, the Wisdom step, is more difficult and far more important.

The tens of thousands of documents of relevant evidence, the knowledge of the case, are still too vast to be useful. After all, the human brain can, at best, only keep seven items in mind at a time. Miller, The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review 63 (2): 81–97. Tens of thousands of documents, or even thousands of documents, are not helpful to jurors. It may all be relevant, but it is not all important. All trial lawyers will tell you that trials are won or lost by only five to nine documents. The rest is just noise, or soon forgotten foundation. Losey, Secrets of Search – Part III (5th secret).

The final step of information processing in e-discovery is only complete when the tens of thousands of files are winnowed down to 5 or 9 documents, or less. That is the final step of Information’s journey, the elevation from Knowledge to Wisdom.

Our challenge as e-discovery team members is to take raw information and turn it into wisdom – the five to nine documents with powerful meaning that will produce the favorable legal rulings that we seek. Testimony helps too of course, but without documents, it is difficult to test memory accuracy, much less veracity. This evidence journey mirrors the challenge of our whole culture, to avoid drowning in too-much-information, to rise above, to find Knowledge and, with luck, a few pearls of Wisdom.

Conclusion

From trillions to a handful, from mere information to practical wisdom — that is the challenge of our culture today. On a recursive self-similar level, that is also the challenge of justice in the Information Age, the challenge of e-discovery. How to meet the challenges? How to self-organize from out of the chaos of too much information? The answer is iterative, cooperative, interactive, interdisciplinary team processes that employ advanced hybrid, multimodal technologies and sound human judgment. See What Chaos Theory Tell Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition.

The micro-answer for cyber-investigators searching for evidence is fast becoming clear. It depends on a balanced hybrid application of human and artificial intelligence. What was once a novel invention, TAR, or technology assisted review, is rapidly becoming an obvious solution accepted in courts around the world. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015); Pyrrho Investments v MWB Property, EWHC 256 (Ch) (2/26/16). That is how information works. What was novel one day, even absurd, can very quickly become commonplace. We are creating, transmitting and processing information faster than ever before. The bits are flying at a rate that even Claude Shannon would never have dreamed possible.

The pace of change quickens as information and communication grows. New information flows and inventions propagate. The encouragement of negentropic innovation – ordered bits – is the basis of our property laws and commerce. The right information at the right time has great value.

Just ask a trial lawyer armed with five powerful documents — five smoking guns. These essential core documents are what make or break a case. The rest is just so much background noise, relevant but unimportant. The smoking hot Wisdom is what counts, not Information, not even Knowledge, although they are, of course, necessary prerequisites. There is a significant difference between inspiration and wisdom. Real wisdom does not just appear out of thin air. It arises out of True Information and Knowledge.

The challenge of Culture, including Law and Justice in our Information Age, is to never lose sight of this fundamental truth, this fundamental pattern: Information → Knowledge → Wisdom. If we do, we will get lost in the details. We will drown in a flood of meaningless information. Either that, or we will progress, but not far enough. We will become lost in knowledge and suffer paralysis by analysis. We will know too much, know everything, except what to do. Yes or No. Binary action. The tree may fall, but we never hear it, so neither does the judge or jury. The power of the truth is denied.

There is deep knowledge to be gained from both Chaos and Information Theories that can be applied to the challenges. Some of the insights can be applied in legal search and other cyber investigations. Others can be applied in other areas. As shown in this essay, details are important, but never lose sight of the fundamental pattern. You are looking for the few key facts. Like the Mandelbrot Set they remain the same, or at least similar, over different scales of magnitude, from the small county court case, to the largest complex multinational actions. Each case is different, yet the same. The procedures tie them all together.

Meaning is the whole point of Information. Justice is the whole point of the Law.

You find the truth of a legal controversy by finding the hidden order that ties all of the bits of evidence together. You find the hidden meaning behind all of the apparently contradictory clues, a fractal link of the near infinite strings of bits and bytes.

What really happened? What is the just response, the equitable remedy? That is the ultimate meaning of e-discovery, to find the few significant, relevant facts in large chaotic systems, the facts that make or break your case, so that judges and juries can make the right call. Perhaps this is the ultimate meaning of many of life’s challenges? I do not have the wisdom yet to know, but, as Cat Stevens says, I’m on the road to find out.


Five Reasons You Should Read the ‘Practical Law’ Article by Maura Grossman and Gordon Cormack called “Continuous Active Learning for TAR”

April 11, 2016

There is a new article by Gordon Cormack and Maura Grossman that stands out as one of their best and most accessible. It is called Continuous Active Learning for TAR (Practical Law, April/May 2016). The purpose of this blog is to get you to read the full article by enticing you with some of the information and knowledge it contains. But before we go into the five reasons, we will examine the purpose of the article, which aligns with our own, and touch on the differences between their trademarked TAR CAL method and our CAR Hybrid Multimodal method. Both of our methods use continuous, active learning, the acronym for which, CAL, they now claim as a Trademark. Since they clearly did invent the acronym, CAL, we for one will stop using it – CAL – as a generic term.

The Legal Profession’s Remarkable Slow Adoption of Predictive Coding

The article begins with the undeniable point of the remarkably slow adoption of TAR by the legal profession, in their words:

Adoption of TAR has been remarkably slow, considering the amount of attention these offerings have received since the publication of the first federal opinion approving TAR use (see Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012)).

I remember getting that landmark ruling in our Da Silva Moore case, a ruling that pissed off plaintiffs’ counsel, because, despite what you may have heard to the contrary, they were strenuously opposed to predictive coding. Like most other lawyers at the time who were advocating for advanced legal search technologies, I thought Da Silva would open the flood gates, that it would encourage attorneys to begin using the then new technology in droves. In fact, all it did was encourage the Bench, but not the Bar. Judge Peck’s more recent ruling on the topic contains a good summary of the law. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015). There was a flood of judicial rulings approving predictive coding all around the country, and lately, around the world. See, e.g., Pyrrho Investments v MWB Property, EWHC 256 (Ch) (2/26/16).

The rulings were followed in private arbitration too. For instance, I used the Da Silva Moore ruling a few weeks after it was published to obtain what was apparently the first ruling by an arbitrator in AAA approving use of predictive coding. The opposition to our use of cost-saving technology in that arbitration case was again fierce, and again included personal attacks, but the arguments for use in arbitration are very compelling. Discovery in arbitration is, after all, supposed to be constrained and expedited.

After the Da Silva Moore opinion, Maura Grossman and I upped our speaking schedule (she far more than me), and so did several tech-minded judges, including Judge Peck (although never at the same events as me, until the cloud of false allegations created by a bitter plaintiff’s counsel in Da Silva Moore could be dispelled). At Legal Tech for the next few years Predictive Coding was all anybody wanted to talk about. Then IG, Information Governance, took over as the popular tech-child of the day. In 2015 we had only a few predictive coding panels at Legal Tech, but they were well attended.

The Grossman-Cormack article speculates that the cause of the remarkably slow adoption is:

The complex vocabulary and rituals that have come to be associated with TAR, including statistical control sets, stabilization, F1 measure, overturns, and elusion, have dissuaded many practitioners from embracing TAR. However, none of these terms, or the processes with which they are associated, are essential to TAR.

We agree. The vendors killed what could have been their golden goose with all this control set nonsense and their engineers’ love of complexity and misunderstanding of legal search. I have ranted about this before. See Predictive Coding 3.0. I will not go into that again here, except to say the statistical control set nonsense that had large sampling requirements was particularly toxic. It was not only hard and expensive to do, it led to mistaken evaluations of the success or failure of projects because it ignored the reality of the evolving understanding of relevance, so-called concept drift. Another wrong turn involved the nonsense of using only random selection to find training documents, a practice that Grossman and I opposed vigorously. See Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One,  Part Two,  Part Three, and Part Four. Grossman and Cormack correctly criticize these old vendor driven approaches in Continuous Active Learning for TAR. They call them SAL and SPL protocols (a couple of acronyms that no one wants to trademark!).

Bottom line, the tide is changing. Over the last several years the few private attorneys who specialize in legal search, but are not employed by a vendor, have developed simpler methods. Maura and I are just the main ones writing and speaking about it, but there are many others who agree. Many have found that it is counter-productive to use control sets, random input, non-continuous training with its illogical focus on the seed set, and misleading recall point projections.

We do so in defiance of the vendor establishment and other self-proclaimed pundits in this area who benefitted from such over-complexity. Maura and Gordon, of course, have their own software (Gordon’s creation), and so never needed any vendors to begin with. Not having a world renowned information scientist like Professor Cormack as my life partner, I had no choice but to rely on vendors for their software. (Not that I’m complaining, mind you. I’m married to a mental health counselor, and it does not get any better than that!)

After a few years I ultimately settled on one vendor, Kroll Ontrack, but I continue to try hard to influence all vendors. It is a slow process. Even Kroll Ontrack’s software, which I call Mr. EDR, still has control set functions built in. Thanks to my persistence, it is easy to turn off these settings and do things my way, with no secret control sets and false recall calculations. Hopefully soon that will be the default setting. Their eyes have been opened. Hopefully all of the other major vendors will soon follow suit.

All of the Kroll Ontrack experts in predictive coding are now, literally, a part of my Team. They are now fully trained and believers in the simplified methods, methods very similar to those of Grossman and Cormack, albeit, as I will next explain, slightly more complicated. We proved how well these methods worked at TREC 2015 when the Kroll Ontrack experts and I did 30 review projects together in 45 days. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF), and (web page with short summary). Also see – Mr. EDR with background information on the Team’s participation in the TREC 2015 Total Recall Track.

We Agree to Disagree with Grossman and Cormack on One Issue, Yet We Still Like Their Article

We are fans of Maura Grossman and Gordon Cormack’s work, but not sycophants. We are close, but not the same; colleagues, but not followers. For those reasons we think our recommendation for you to read this article means more than a typical endorsement. We can be critical of their writings, but, truth is, we liked their new article, although we continue to dislike the name TAR (not important, but we prefer CAR). Also, and this is of some importance, my whole team continues to disagree with what we consider the somewhat over-simplified approach they take to finding training documents, namely reliance on the highest ranking documents alone.

Despite what some may think, the high-ranking approach does eventually find a full diversity of relevant documents. All good predictive coding software today pretty much uses some type of logistic regression based algorithms that are capable of building out probable relevance in that way. That is one of the things we learned by rubbing shoulders with text retrieval scientists from around the world at TREC when participating in the 2015 Total Recall Track that Grossman and Cormack helped administer. This regression type of classification system works well to avoid the danger of over-training on a particular relevancy type. Grossman and Cormack have proven that before to our satisfaction (so have our own experiments), and they again make a convincing case for this approach in this article.

Still, we disagree with their approach of only using high-ranking documents for training, but we do so on the grounds of efficiency and speed, not effectiveness. The e-Discovery Team continues to advocate a Hybrid Multimodal approach to active machine learning. We use what I like to call a four-cylinder type of CAR search engine, instead of one-cylinder, like they do (a rough sketch of the four document sources follows the list below).

  1. High-ranking documents;
  2. Mid-level, uncertain documents;
  3. A touch, a small touch, of random documents; and,
  4. Human ingenuity found documents, using all types of search techniques (multimodal) that seem appropriate to the search expert in charge, including keyword, linear, similarity (including chains and families), and concept (including passive machine learning, clustering type search).

See: Predictive Coding 3.0, where the method is described as an eight-part work flow (Step 6 – Hybrid Active Training).
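Here is a minimal sketch of how a single training batch might be assembled from those four sources. The function, the batch proportions and the data structures are hypothetical illustrations of the idea only, not the actual workflow of any particular review platform.

```python
import random

def assemble_training_batch(ranked_docs, human_found, batch_size=200):
    """Hypothetical sketch: draw one training batch from the four sources.

    ranked_docs -- list of (doc_id, probability_of_relevance) pairs,
                   sorted from highest to lowest probability
    human_found -- doc_ids located by attorney-driven searches
                   (keyword, linear, similarity, concept)
    """
    all_ids = [d for d, _ in ranked_docs]

    high_ranking = all_ids[: batch_size // 2]                                         # cylinder 1
    uncertain    = [d for d, p in ranked_docs if 0.4 <= p <= 0.6][: batch_size // 4]  # cylinder 2
    random_picks = random.sample(all_ids, min(len(all_ids), batch_size // 10))        # cylinder 3
    from_humans  = list(human_found)[: batch_size // 5]                               # cylinder 4

    # De-duplicate while preserving order; reviewers code the batch and the
    # classifier is then retrained, continuously, on their judgments.
    batch, seen = [], set()
    for doc_id in high_ranking + uncertain + random_picks + from_humans:
        if doc_id not in seen:
            seen.add(doc_id)
            batch.append(doc_id)
    return batch
```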

The latest Grossman and Cormack versions of CAL (their trademark) use only the highest-ranking documents for active training. Still, in spite of this difference, we liked their article and recommend you read it.

The truth is, we also emphasize the high-probability relevant documents for training. The difference between us is that we use the three other methods as well. On that point we agree to disagree. To be clear, we are not talking about continuous training or not; we agree on that. We are not talking about active training or not (passive); we agree on that. We are not talking about using what they call SAL or SPL protocols (read their article for details); we agree with them that these protocols are ineffective relics invented by misguided vendors. We are only talking about a difference in methods to find documents to use to train the classifier. Even that is not a major disagreement, as we agree with Grossman and Cormack that high-ranking documents usually make the best trainers, just not in the first seed set. There are also points in a search, depending on the project, where the other methods can help you get to the relevant documents in a fast, efficient manner. The primary difference between us is that we do not limit ourselves to that one retrieval method like Grossman and Cormack do in their trademarked CAL methodology.

Cormack and Grossman emphasize simplicity, ease of use, and reliance on the software algorithms as another way to try to overcome the Bar’s continued resistance to TAR. The e-Discovery Team has the same goal, but we do not think it is necessary to go quite that far for simplicity sake. The other methods we use, the other three cylinders, are not that difficult and have many advantages. e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF and web page with short summary). Put another way, we like the ability of fully automatic driving from time to time, but we want to keep an attorney’s learned hand at or near the wheel at all times. See Why the ‘Google Car’ Has No Place in Legal Search.

Accessibility with Integrity: The First Reason We Recommend the Article

Professor Gordon Cormack

Here’s the first reason we like Grossman & Cormack’s article, Continuous Active Learning for TAR: you do not have to be one of Professor Cormack’s PhD students to understand it. Yes. It is accessible, not overly technical, and yet still has scientific integrity, still has new information, accurate information, and still has useful knowledge.

It is not easy to do both. I know because I try to make all of my technical writings that way, including the 57 articles I have written on TAR, which I prefer to call Predictive Coding, or CAR. I have not always succeeded in getting the right balance, to be sure. Some of my articles may be too technical, and perhaps some suffer from breezy information over-load and knowledge deficiency. Hopefully none are plain wrong, but my views have changed over the years. So have my methods. If you compare my latest work-flow (below) with earlier ones, you will see some of the evolution, including the new emphasis over the past few years with continuous training.

[Predictive coding work flow diagram]

The Cormacks and I are both trying hard to get the word out to the Bar as to the benefits of using active machine learning in legal document review. (We all agree on that term, active machine learning, and all agree that passive machine learning is not an acceptable substitute.) It is not easy to write on this subject in an accurate, yet still accessible and interesting manner. There is a constant danger that making a subject more accessible and simple will lead to inaccuracies and misunderstandings. Maura and Gordon’s latest article meets this challenge.

Take, for example, the first description in the article of their continuous active training search method using highest-ranking documents:

At the outset, CAL resembles a web search engine, presenting first the documents that are most likely to be of interest, followed by those that are somewhat less likely to be of interest. Unlike a typical search engine, however, CAL repeatedly refines its understanding about which of the remaining documents are most likely to be of interest, based on the user’s feedback regarding the documents already presented. CAL continues to present documents, learning from user feedback, until none of the documents presented are of interest.

That is a good way to start an article. The comparison with a Google search that continually refines itself based on user feedback is well thought out; simple, yet accurate. It represents a description honed by literally hundreds of presentations on the topic by Maura Grossman. No one has talked more on this topic than she has, and I for one intend to start using this analogy.
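For readers who like to see the mechanics, here is a minimal Python sketch of the kind of feedback loop the quoted passage describes, built from generic scikit-learn parts. It is our illustration only, not the code behind the Grossman-Cormack tool; the label_document reviewer callback, the keyword seed that starts things off, and the batch size are all assumptions we added for the example.

```python
# A minimal sketch of a continuous active learning loop, for illustration only.
# The label_document() callback stands in for the human reviewer; it is a
# hypothetical placeholder, as is the simple keyword seed used to get started.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def continuous_active_learning(documents, label_document, seed_query, batch_size=10):
    """Present the highest-ranked unreviewed documents, learn from the
    reviewer's feedback, and stop when a whole batch comes back irrelevant."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(documents)
    labels = {}  # document index -> True (relevant) or False (not relevant)

    # Start like a web search: review the first documents matching the seed query.
    seed_hits = [i for i, d in enumerate(documents) if seed_query.lower() in d.lower()]
    for i in seed_hits[:batch_size]:
        labels[i] = label_document(documents[i])

    while True:
        reviewed = list(labels)
        unreviewed = [i for i in range(len(documents)) if i not in labels]
        # Need at least one relevant and one irrelevant example to train on.
        if not unreviewed or len(set(labels.values())) < 2:
            break
        model = LogisticRegression(max_iter=1000)
        model.fit(X[reviewed], [labels[i] for i in reviewed])
        scores = model.predict_proba(X[unreviewed])[:, 1]
        batch = [unreviewed[j] for j in np.argsort(scores)[::-1][:batch_size]]
        batch_labels = [label_document(documents[i]) for i in batch]
        labels.update(zip(batch, batch_labels))
        if not any(batch_labels):
            break  # none of the documents presented were of interest; stop
    return labels
```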

Rare Description of Algorithm Types – Our Second Reason to Recommend the Article

Another reason our Team liked Continuous Active Learning for TAR is the rare description of search algorithm types that it includes. Here we see the masterful touch of one of the world’s leading academics on text retrieval, Gordon Cormack. First, the article makes clear the distinction between effective analytic algorithms that truly rank documents using active machine learning, and a few other popular programs now out there that use passive learning techniques and call it advanced analytics.

The supervised machine-learning algorithms used for TAR should not be confused with unsupervised machine-learning algorithms used for clustering, near-duplicate detection, and latent semantic indexing, which receive no input from the user and do not rank or classify documents.

These other, older, unsupervised search methods are what I call concept search. It is not predictive coding. It is not advanced analytics, no matter what some vendors may tell you. It is yesterday’s technology: helpful, but far from state-of-the-art. We still use concept search as part of multimodal, just like any other search tool, but we place our primary reliance on active machine learning to properly rank documents.
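To make the distinction concrete, here is a toy contrast in Python between an unsupervised method (clustering, which takes no relevance input from the reviewer) and a supervised learner (which learns relevance from coded examples and ranks the rest). The tiny corpus and coding decisions are invented for the illustration; no vendor’s product is being modeled here.

```python
# Toy contrast: unsupervised clustering vs. supervised relevance ranking.
# The four documents and the reviewer's coding are made up for the example.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "memo about the product launch budget",
    "lunch plans for friday",
    "budget forecast for the launch",
    "fantasy football picks",
]
X = TfidfVectorizer().fit_transform(docs)

# Unsupervised: groups similar documents, but has no idea what "relevant" means.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: learns relevance from coded examples and ranks every document.
coded = {0: 1, 1: 0}  # reviewer codes doc 0 relevant, doc 1 not relevant
model = LogisticRegression().fit(X[list(coded)], list(coded.values()))
relevance_scores = model.predict_proba(X)[:, 1]
print(clusters, relevance_scores)
```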

The Cormack-Grossman article goes further than pointing out this important distinction; it also explains the various types of bona fide active machine learning algorithms. Again, some are better than others. First, Professor Cormack explains the types that have been found to be effective by extensive research over the past ten years or so.

Supervised machine-learning algorithms that have been shown to be effective for TAR include:

–  Support vector machines. This algorithm uses geometry to represent each document as a point in space, and deduces a boundary that best separates relevant from not relevant documents.

– Logistic regression. This algorithm estimates the probability of a document’s relevance based on the content and other attributes of the document.

Conversely, Cormack explains:

Popular, but generally less effective, supervised machine-learning algorithms include:

– Nearest neighbor. This algorithm classifies a new document by finding the most similar training document and assuming that the correct coding for the new document is the same as its nearest neighbor.

– Naïve Bayes (Bayesian classifier). This algorithm estimates the probability of a document’s relevance based on the relative frequency of the words or other features it contains.

Ask your vendor which algorithms its software includes. Prepare yourself for double-talk.
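While you wait for a straight answer, you can see the difference for yourself: all four algorithm families quoted above have standard open-source implementations, so you can plug them into the same ranking routine and compare the results on sample data. The sketch below is a generic experiment harness we put together, assuming binary 0/1 relevance labels; it is not a description of any vendor’s software.

```python
# Compare the four supervised learners named above on the same ranking task.
# Generic sketch for experimentation; assumes train_labels use 0 (not relevant)
# and 1 (relevant).
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_documents(train_texts, train_labels, candidate_texts, learner):
    """Fit the chosen learner on coded documents, then rank the candidates
    from most to least likely relevant."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_cand = vectorizer.transform(candidate_texts)
    model = learner.fit(X_train, train_labels)
    if hasattr(model, "decision_function"):   # SVM and logistic regression
        scores = model.decision_function(X_cand)
    else:                                     # nearest neighbor, naive Bayes
        scores = model.predict_proba(X_cand)[:, 1]
    return sorted(zip(candidate_texts, scores), key=lambda pair: pair[1], reverse=True)

learners = {
    "support vector machine": LinearSVC(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "nearest neighbor": KNeighborsClassifier(n_neighbors=1),
    "naive Bayes": MultinomialNB(),
}
```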

Hot-or-Not

If you try out your vendor’s software and the Grossman-Cormack CAL method does not work for you, and even the e-Discovery Team’s slightly more diverse Hybrid Multimodal method does not work, then your software may be to blame. As Grossman and Cormack put it, where the phrase “TAR tool” means software:

[I]t will yield the best possible results only if the TAR tool incorporates a state-of-the-art learning algorithm.

That means software that uses a type of support vector machine and/or logistic regression.

Teaching by Example – Our Third Reason to Recommend the Article

The article uses a long example involving a search of Jeb Bush email to show you how their CAL method works. This is an effective way to teach. We think they did a good job with this. Rather than spoil the read with quotes and further explanation, we urge you to check out the article and see for yourself. Yes, it is an oversimplification (after all, this is a short article), but it is a good one, and it is still accurate.

 Quality Control Suggestions – Our Fourth Reason to Recommend the Article

Another reason we like the article is the quality control suggestions it includes. They essentially speak of using other search methods, which is exactly what we do in Hybrid Multimodal. Here are their words:

To increase counsel’s confidence in the quality of the review, they might:

Review an additional 100, 1,000, or even more documents.

Experiment with additional search terms, such as “Steve Jobs,” “iBook,” or “Mac,” and examine the most-likely relevant documents containing those terms.

Invite the requesting party to suggest other keywords for counsel to apply.

Review a sample of randomly selected documents to see if any other documents of interest are identified.

We like this because it shows that the differences are small between the e-Discovery Team’s Hybrid Multimodal method (hey, maybe I should claim trademark rights to Hybrid Multimodal, but then again, no vendors are using my phrase to sell their products) using continuous active training, and the Grossman-Cormack trademarked CAL method. We also note that their section on Measures of Success essentially mirrors our own thoughts on metric analysis and ei-Recall. Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search, Part One, Part Two and Part Three.
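The last suggestion on their list, reviewing a random sample, is easy to operationalize. Here is a short sketch of the basic arithmetic, assuming a hypothetical review_document() reviewer callback and a placeholder sample size; a real project would also calculate a confidence interval around the point estimate, as we do with ei-Recall.

```python
# A simple random-sample quality check: estimate how many relevant documents
# are slipping past the review by sampling the documents you do not plan to
# produce. The sample size and review_document() callback are placeholders.
import random

def estimate_elusion(discard_pile, review_document, sample_size=385, seed=42):
    """Return the fraction of a random sample of the discard pile that the
    reviewer codes as relevant (a point estimate of the elusion rate)."""
    if not discard_pile:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(discard_pile, min(sample_size, len(discard_pile)))
    relevant_found = sum(1 for doc in sample if review_document(doc))
    return relevant_found / len(sample)

# Example: an estimated elusion rate of 0.01 on a 200,000-document discard pile
# suggests roughly 2,000 relevant documents may have been missed.
```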

Article Comes With an Online “Do it Yourself” CAL Trial Kit – Our Fifth Reason to Recommend the Article

We are big believers in learning by doing. That is especially true in legal tasks that seem complicated in the abstract. I can write articles and give presentations that provide explanations of AI-Enhanced Review. You may get an intellectual understanding of predictive coding from these, but you still will not know how to do it. On the other hand, if we have a chance to show someone an entire project, have them shadow us, then they will really learn how it is done. It is like teaching a young lawyer how to try a case. For a price, we will be happy to do so (assuming conflicts clear).

Maura and Gordon seem to agree with us on that learn-by-doing point and have created an online tool that anyone can use to try out their method. It allows for a search of the Jeb Bush email, the same set of 290,099 emails that we used in ten of the thirty topics at TREC 2015. In their words:

There is no better way to learn CAL than to use it. Counsel may use the online model CAL system to see how quickly and easily CAL can learn what is of interest to them in the Jeb Bush email dataset. As an alternative to throwing up their hands over seed sets, control sets, F1 measures, stabilization, and overturns, counsel should consider using their preferred TAR tool in CAL mode on their next matter.

You can try out their method with their online tool, or in a real project using your vendor’s tool. By the way, we did that as part of our TREC 2015 experiments, and the Kroll Ontrack software worked about the same as theirs, even when we used their one-cylinder, high ranking only, CAL (their trademark) method.

Here is where you can find their CAL testing tool: cormack.uwaterloo.ca/cal. Those of you who are still skeptical can see for yourself how it works. You can follow the example given in the article about searching for documents relevant to Apple products, to verify their description of how that works. For even more fun, you can dream up your own searches.

President George W. Bush. Photo by Eric Draper, White House.

Perhaps, if you try hard enough, you can find some example searches where their high-ranking-only method, which is built into the test software, does not work well. For example, try finding all emails that pertain to, or in any way mention, the then-President, George W. Bush. Try entering George Bush in the demo test and see for yourself what happens.

It becomes a search for George + Bush in the same document, and then goes from there based on your coding of the highest-ranked documents presented as either relevant or non-relevant. You will see that you quickly end up in a TAR pit. The word Bush is in every email (I think), so you are served up with every email where George is mentioned, and believe me, there are many Georges, even if there is only one President George Bush. Here is the screen shot of the first document presented after entering George Bush. I called it relevant.

Screen shot of the first document presented after the George Bush search

These kinds of problem searches do not discredit TAR, or even the Grossman-Cormack one-cylinder search method. If this happened to you in a real search project, you could always use our Hybrid Multimodal™ method for the seed set (1st training), or start over with a different keyword or keywords to begin the process. You could, for instance, search for President Bush, or President within five of George, or “George Bush.” There are many ways, some faster and more effective than others.
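For those who want to tinker, the alternative seed searches just mentioned are easy to prototype. The rough sketch below shows how an AND search, a within-five-words proximity search, and an exact phrase search differ on a single toy email; it is our own illustration, and the demo tool’s actual query handling may well differ.

```python
# Rough sketches of three seed-search styles: AND, proximity, and exact phrase.
# Illustration only; not the demo tool's actual query logic.
import re

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def contains_all(text, *terms):
    """AND search: every term appears somewhere in the document."""
    toks = set(words(text))
    return all(t.lower() in toks for t in terms)

def within_n_words(text, term_a, term_b, n=5):
    """Proximity search: term_a occurs within n words of term_b."""
    toks = words(text)
    positions_a = [i for i, t in enumerate(toks) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(toks) if t == term_b.lower()]
    return any(abs(a - b) <= n for a in positions_a for b in positions_b)

def contains_phrase(text, phrase):
    """Exact phrase search, e.g. "George Bush"."""
    return phrase.lower() in text.lower()

email = "The President met George and the rest of the Bush family."
print(contains_all(email, "George", "Bush"))         # True: broad, TAR-pit seed
print(within_n_words(email, "President", "George"))  # True: tighter proximity seed
print(contains_phrase(email, "George Bush"))         # False: tightest seed
```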

Even using the single-method approach, if you decided to use the keywords “President + Bush”, then the search would go quicker than with “George + Bush.” Even just using the term “President” works better than George + Bush, but it still seems like a TAR pit, and not a speeding CAR. It will probably get you to the same destination, high recall, but the journey is slightly longer and, at first, more tedious. This high-recall result was verified at TREC 2015 by our Team, and by a number of universities that participated in the fully automatic half of the Total Recall Track, including Gordon’s own team. This was all done without any manual review by the fully automatic participants because there was instant feedback of relevant or irrelevant based on a prejudged gold standard. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF), and (web page with short summary). With this instant feedback protocol, all of the teams attained high recall and good precision. Amazing but true.

You can criticize this TREC experiment protocol, which we did in our report, as unrealistic to legal practice because:

(1) there is no SME who works like that (and there never will be, until legal knowledge itself is learned by an AI); and,

(2) the searches presented as tasks were unrealistically over-simplistic. Id.

But you cannot fairly say that CAL (their trademark) does not work. The glass is most certainly not half empty. Moreover, the elixir in this glass is delicious and fun, especially when you use our Hybrid Multimodal™ method. See Why I Love Predictive Coding: Making document review fun with Mr. EDR and Predictive Coding 3.0.

Conclusion

Active machine learning (predictive coding) using support vector or logistic regression algorithms, and a method that employs continuous active training, using either one cylinder (their CAL) or four (our Hybrid Multimodal), really works, and it is not that hard to use. Try it out and see for yourself. Also, read the Grossman-Cormack article; it only takes about 30 minutes. Continuous Active Learning for TAR (Practical Law, April/May 2016). Feel free to leave any comments below. I dare say you can even ask questions of Grossman or Cormack here. They are avid readers and will likely respond quickly.

