We lengthened the TAR Course again by adding a video focusing on the three iterated steps in the eight-step workflow of predictive coding. Those are steps four, five and six: Training Select, AI Document Ranking, and Multimodal Review. Here is the new video introducing these steps. It is divided into two parts.
This video was added to the thirteenth class of the TAR Course. The course has sixteen classes altogether, which we continue to update and announce on this blog. There were also multiple revisions to the text in this class.
Unintended Negative Consequences of Da Silva Moore
Predictive coding methods have come a long way since Judge Peck first approved predictive coding in our Da Silva Moore case. The method Brett Anders and I used back then, including disclosure of irrelevant documents in the seed set, was primarily derived from the vendor whose software we used, Recommind, and from Judge Peck himself. We had a good intellectual understanding, but it was the first use for all of us, except the vendor. I had never done a predictive coding review before, nor, for that matter, had Judge Peck. As far as I know Judge Peck still has not ever actually used predictive coding software to do document review, although you would be hard pressed to find anyone else in the world with a better intellectual grasp of the issues.
I call the methods we used in Da Silva Moore Predictive Coding 1.0. See: Predictive Coding 3.0 (October 2015) (explaining the history of predictive coding methods). Now, more than five years later, my team is on version 4.0. That is what we teach in the TAR Course. What surprises me is that the rest of the profession is still stuck in our first method, our first ideas of how to best use the awesome power of active machine learning.
This failure to move past the Predictive Coding 1.0 methods of Da Silva Moore is, I suspect, one of the major reasons that predictive coding has never really caught on. In fact, the most successful document review software developers since 2012 have ignored predictive coding altogether.
Looking back now at the 1.0 methods we used in Da Silva Moore, I cannot help but cringe. It is truly unfortunate that the rest of the legal profession still uses these methods. The free TAR Course is my attempt to make amends, to help the profession move on from the old methods. Mea Culpa.
In my presentation in Manhattan last month I humorously quipped that my claim to fame, Da Silva Moore, was also my claim to shame. We never intended for the methods in Da Silva Moore to be the last word. It was the first word, writ large, to be sure, but in pencil, not stone. It was like a billboard that was supposed to change, but never did. Who knew what we did back in 2012 would have such unintended negative consequences?
In Da Silva Moore we all considered the way we used machine learning to be something of an experiment. That is what happens when you are the first at anything. We assumed that the methods we came up with would quickly mature and evolve in other cases. They certainly did for us. Yet the profession has mostly been silent about methods since the first version 1.0 was explained. (I could not take part in these early explanations by other “experts” as the case was ongoing and I was necessarily silenced from all public comment about it.) From what I have been told by a variety of sources, many, perhaps even most, attorneys and vendors are using the same methods that we used back in 2012. No wonder predictive coding has not caught on like it should. Again, sorry about that.
Why the Silence?
Still, it is hardly all my fault. I have been shouting about methods ever since 2012, even if I was muzzled from talking about Da Silva Moore. Why is no one else talking about the evolution of predictive coding methods? Why is mine the only TAR Course?
There is some discussion of methods going on, to be sure, but most of it is rehashed, or so high-level and intellectual as to be superficial and worthless. The discussions and analysis do not really go into the nitty-gritty of what to do. Why are we not talking about the subtleties of the “Stop decision”? About the ins and outs of selecting training documents? About the respective merits of CAL versus IST? I would welcome dialogue on this with other practicing attorneys or vendor consultants. Instead, all I hear is silence and old issues.
The biggest topic still seems to be the old one of whether to filter documents with keywords before beginning machine training. That is a big, no duh, don’t do it, unless lack of money or some other circumstance forces you to, or unless the filtering is incidental and minor, to cull out obviously irrelevant documents. See, e.g., Stephanie Serhan, Calling an End to Culling: Predictive Coding and the New Federal Rules of Civil Procedure, 23 Rich. J.L. & Tech. 5 (2016). Referring to the 2015 Rule Amendments, Serhan, a law student, concludes:
Considering these amendments, predictive coding should be applied at the outset on the entire universe of documents in a case. The reason is that it is far more accurate, and is not more costly or time-consuming, especially when the parties collaborate at the outset.
Also see, e.g., William Webber’s analysis of the Biomet case, where this kind of keyword filtering was used before predictive coding began. What is the maximum recall in re Biomet?, Evaluating e-Discovery (4/24/13). Webber, an information scientist, showed back in 2013 that when keyword filtering was used in the Biomet case, it filtered out over 40% of the relevant documents. This doomed the second-filter predictive coding review to a maximum possible recall of 60%, even if that review was perfect, meaning it would otherwise have attained 100% recall, which (almost) never happens. I have never seen a cogent rebuttal of this analysis, aside, again, from proportionality and cost arguments.
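The arithmetic behind that ceiling is simple enough to sketch. What follows is a minimal illustration, not Webber's own calculation: it assumes recall simply multiplies across the two filters, and the function name and the 80% second-stage figure are hypothetical.

```python
def end_to_end_recall(keyword_recall: float, review_recall: float) -> float:
    """Overall recall when a keyword cull runs before predictive coding.

    Both arguments are fractions between 0 and 1. The names are
    illustrative and do not come from Webber's analysis.
    """
    for value in (keyword_recall, review_recall):
        if not 0.0 <= value <= 1.0:
            raise ValueError("recall must be between 0 and 1")
    # Documents lost to the keyword filter never reach the second stage,
    # so the two recall figures multiply.
    return keyword_recall * review_recall


# Keyword cull keeps 60% of the relevant documents (the Biomet figure above).
ceiling = end_to_end_recall(0.60, 1.00)    # a perfect second-stage review
realistic = end_to_end_recall(0.60, 0.80)  # 0.80 is a hypothetical review recall
print(f"ceiling: {ceiling:.0%}, with 80% review recall: {realistic:.0%}")
```

Under these assumptions, even a strong second-stage review that finds 80% of what it is given ends up with less than half of the relevant documents in the original collection.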
There was discussion for a while on another important, yet sort of no-brainer issue, whether to keep on machine training or not, which Grossman and Cormack called Continuous Active Learning (CAL). We did not do that in Da Silva Moore, but we were using Predictive Coding 1.0 as explained by our vendor. We have known better than that now for years. In fact, later in 2012, during my two public ENRON document review experiments with predictive coding I did not follow the two-step procedure of version 1.0. Instead, I just kept on training until I could not find any more relevant documents. A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents. (Part One); Comparative Efficacy of Two Predictive Coding Reviews of 699,082 Enron Documents. (Part Two); Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron (in PDF form and the blog introducing this 82-page narrative, with second blog regarding an update); Borg Challenge: Report of my experimental review of 699,082 Enron documents using a semi-automated monomodal methodology (a five-part written and video series comparing two different kinds of predictive coding search methods).
Of course you keep training. I have never heard any viable argument to the contrary. Train then review, which is the protocol in Da Silva Moore, was the wrong way to do it. Clear and simple. The right way to do machine training is to keep training until you are done with the review. This is the main thing that separates Predictive Coding 1.0 from 2.0. See: Predictive Coding 3.0 (October 2015). I switched to version 2.0 right after Da Silva Moore in late 2012 and started using continuous training on my own initiative. It seemed obvious once I had some experience under my belt. Still, I do credit Maura Grossman and Gordon Cormack with the terminology and scientific proof of the effectiveness of CAL, a term which they have now trademarked for some reason. They have made important contributions to methods and are tireless educators of the profession. But where are the other voices? Where are the lawyers?
The Grossman and Cormack efforts are scientific and professorial. To me this is just work. This is what I do as a lawyer to make a living. This is what I do to help other lawyers find the key documents they need in a case. So I necessarily focus on the details of how to actually do active machine learning. I focus on the methods, the work-flow. Aside from Professors Cormack and Grossman, and myself, almost no one else is talking about predictive coding methods. Lawyers mostly just do what the vendors recommend, like I did back in the Da Silva Moore days. Yet almost all of the vendors are stagnant. (The new KrolLDiscovery and Catalyst are two exceptions, and even the former still has some promised software revisions to make.)
From what I have seen of the secret sauce that leaks out in predictive coding software demos of most vendors, they are stuck in the old version 1.0 methods. They know nothing, for instance, of the nuances of double-loop learning taught in the TAR Course. The vendors are instead still using the archaic methods that I thought were good back in 2012. I call these methods Predictive Coding 1.0 and 2.0. See: Predictive Coding 3.0 (October 2015).
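For readers who think better in code, the difference between train then review and continuous training comes down to where the training sits relative to the review loop. Here is a toy sketch of the continuous version. Everything in it is an assumption made for illustration: the word-count ranker stands in for real machine learning software, the stop rule (quit when a batch turns up nothing relevant) is far cruder than a real Stop decision, and the documents are invented.

```python
from collections import Counter


def rank(unreviewed, labeled):
    """Toy ranker: words seen in relevant documents add to a score,
    words seen in irrelevant documents subtract from it."""
    weights = Counter()
    for text, relevant in labeled:
        for word in set(text.split()):
            weights[word] += 1 if relevant else -1
    def score(doc):
        return sum(weights[w] for w in set(doc.split()))
    return sorted(unreviewed, key=score, reverse=True)


def continuous_review(docs, seed, is_relevant, batch_size=2):
    """Re-rank after every reviewed batch instead of training once up front."""
    labeled = [(d, is_relevant(d)) for d in seed]
    unreviewed = [d for d in docs if d not in seed]
    while unreviewed:
        ranked = rank(unreviewed, labeled)        # retrain on all coding so far
        batch = ranked[:batch_size]               # reviewer sees top-ranked docs
        coded = [(d, is_relevant(d)) for d in batch]
        labeled.extend(coded)                     # new coding feeds the next round
        unreviewed = ranked[batch_size:]
        if not any(relevant for _, relevant in coded):
            break                                 # crude stand-in for the Stop decision
    return [d for d, relevant in labeled if relevant]


# Invented collection; the lambda plays the part of the human reviewer.
docs = [
    "merger price secret memo",
    "lunch menu friday",
    "secret merger talks",
    "price of merger deal",
    "friday lunch order",
    "parking lot notice",
]
found = continuous_review(
    docs,
    seed=["merger price secret memo", "lunch menu friday"],
    is_relevant=lambda d: "merger" in d.split(),
)
print(found)
```

A version 1.0 protocol would call `rank` once, after the seed set, and review the resulting list without ever feeding the reviewer's coding back in. The point of the loop is that every coding decision keeps improving the ranking for the documents still to come.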
Aside from the question of whether to train continuously, most of those methods still use nonsensical random control sets that ignore concept drift, a fact of life in every large review project. Id. Moreover, the statistical analysis in 1.0 and 2.0 that they use for recall does not survive close scrutiny. Most vendors routinely ignore the impact of Confidence Intervals on range and the impact on low prevalence data-sets. They do not even mention binomial calculations designed to deal with low prevalence. Id. Also see: ZeroErrorNumerics.com.
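To show why the interval matters so much at low prevalence, here is a minimal sketch of one exact binomial calculation, the Clopper-Pearson interval, written with only the Python standard library. The choice of Clopper-Pearson is my assumption for illustration and is not necessarily the calculation used in the TAR Course or at ZeroErrorNumerics.com; the sample and collection sizes are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(k: int, n: int, confidence: float = 0.95):
    """Clopper-Pearson interval for a proportion: k hits in a sample of n."""
    alpha = 1 - confidence

    def solve(cdf_at, target):
        # The binomial CDF falls as p rises, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical project: 15 relevant documents in a random sample of 1,500,
# drawn from a collection of 1,000,000.
lower, upper = exact_interval(15, 1500)
collection = 1_000_000
print(f"point estimate: {15 / 1500:.2%} prevalence")
print(f"95% interval:   {lower:.2%} to {upper:.2%}")
print(f"relevant documents: {lower * collection:,.0f} to {upper * collection:,.0f}")
```

The point estimate here is 1% prevalence, or about 10,000 relevant documents, but the interval around it is wide relative to that number. A recall figure computed against the point estimate alone, without stating the range, claims more precision than the sample supports.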
The e-Discovery Team will keep on writing and teaching, satisfied that at least some of the other leaders in the field are doing essentially the same thing. You know who you are. We hope that someday others will experiment with the newer methods. The purpose of the TAR Course is to provide the information and knowledge needed to try these methods. If you have tried predictive coding before, and did not like it, we hear you. We agree. I would not like it either if I still had to use the antiquated methods of Da Silva Moore.
We try to make amends for the unintended consequences of Da Silva Moore by offering this TAR Course. Predictive coding really is breakthrough technology, but only if used correctly. Come back and give it another try, but this time use the latest methods of Predictive Coding 4.0.
Machine learning is based on science, but the actual operation is an art and craft. So few writers in the industry seem to understand that. Perhaps that is because they are not hands-on. They do not step in. (Stepping-In is discussed in Davenport and Kirby, Only Humans Need Apply, and by Dean Gonsowski.) Even most vendor experts have never actually done a document review project of their own. And the software engineers, well, forget about it. They know very little about the law (and what they think they know is often wrong) and very little about what really goes on in a document review project.
Knowledge of the best methods for machine learning, for AI, does not come from thinking and analysis. It comes from doing, from practice, from trial and error. This is something all lawyers understand because most difficult tasks in the profession are like that.
The legal profession needs to stop taking legal advice from vendors on how to do AI-enhanced document review. Vendors are not supposed to be giving legal advice anyway. They should stick to what they do best, creating software, and leave it to lawyers to determine how to best use the tools they make.
My message to lawyers is to get on board the TAR train. Even though Da Silva Moore blew the train whistle long ago, the train is still in the station. The tracks ahead are clear of all legal obstacles. The hype and easy money phase has passed. The AI review train is about to get moving in earnest. Try out predictive coding, but by all means use the latest methods. Take the TAR Course on Predictive Coding 4.0 and insist that your vendor adjust their software so you can do it that way.