e-Discovery Team’s 2016 TREC Report: Once Again Proving the Effectiveness of Our Standard Method of Predictive Coding

February 24, 2017

Team_TRECOur Team’s Final Report of its participation in the 2016 TREC ESI search Conference has now been published online by NIST and can be found here and the final corrected version can be found here.  TREC stands for Text Retrieval Conference. It is co-sponsored by a group within the National Institute of Standards and Technology (NIST), which is turn is an agency of the U.S. Commerce Department. The stated purpose of the annual TREC conference is to encourage research in information retrieval from large text collections.

The other co-sponsor of TREC is the United States Department of Defense. That’s right, the DOD is the official co-sponsor of this event, although TREC almost never mentions that. Can you guess why the DOD is interested? No one talks about it at TREC, but I have some purely speculative ideas. Recall that the NSA is part of the DOD.

We participated in one of several TREC programs in both 2015 and 2016, the one closest to legal search, called the Total Recall Track. The leaders, administrators of this Track were Professors Gordon Cormack and Maura Grossman. They also participated each year in their own track.

One of the core purposes of all of the Tracks is to demonstrate the robustness of core retrieval technology. Moreover, one of the primary goals of TREC is:

[T]o speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems.

Our participation in TREC in 2015 and 2016 has demonstrated substantial improvements in retrieval methodologies. That is what we set out to do. That is the whole point of the collaboration between the Department of Commerce and Department of Defense to establish TREC.

clinton_emailThe e-Discovery Team has a commercial interest in participation in TREC, not a defense or police interest. Although from what we saw with the FBI’s struggles to search email last year, the federal government needs help. We were very unimpressed by the FBI’s prolonged efforts to review the Clinton email collection. I was one of the few e-discovery lawyers to correctly call the whole Clinton email server “scandal” a political tempest in a teapot. I still do and I am still outraged by how her email review was handled by the FBI, especially with the last-minute “revelations.”

prism_nsaThe executive agencies of the federal government have been conspicuously absent from TREC. They seem incapable of effective search, which may well be a good thing. Still, we have to believe that the NSA and other defense agencies are able to do a far better job at large-scale search than the FBI. Consider their ongoing large-scale metadata and text interception efforts, including the once Top Secret PRISM operation. Maybe it is a good thing the NSA doe not share it abilities with the FBI, especially these days. Who knows? We certainly will not.

Mr_EDRThe e-Discovery Team’s commercial interest is to transfer Predictive Coding technology from our research labs into commercial products, namely transfer our Predictive Coding 4.0 Method using KrolL Discovery EDR software to commercial products. In our case at the present time “commercial products” means our search methods, time and consultations. But who knows, it may be reduced to a robot product someday like our Mr. EDR.

The e-Discovery Team method can be used on other document review platforms as well, not just Kroll’s, but only if they have strong active machine learning features. Active machine learning is what everyone at TREC was testing, although we appear to have been the only participant to focus on a particular method of operation. And we were the only team led by a practicing attorney, not an academic or software company. (Catalyst also fielded a team in 2015 and 2106 headed by Information Science Ph.D., Jeremy Pickens.)

Olympics-finish-line-Usain-Bolt-winsThe e-Discovery Team wanted to test the hybrid multimodal software methods we use in legal search to demonstrate substantial improvements in retrieval methodologies on real-world problems. We have now done so twice; participating in both the 2015 and 2016 Total Recall Tracks. The results in 2016 were even better than 2015. We obtained remarkable results in document review speed, recall and precision; although, as we admit, the search challenges presented at TREC 2016 were easier than most projects we see in legal discovery. Still, to use the quaint language of TREC, we have demonstrated the robustness of our methods and software.

These demonstrations, and all of the reporting and analysis involved, have taken hundreds of hours of our time, but there was no other venue around to test our retrieval methodologies on real-world problems. The demonstrations are now over. We have proven our case. Our standard Predictive Coding method has been tested and its effectiveness demonstrated. No one else has tested and proven their predictive coding methods as we have done. We have proven that our hybrid multimodal method of AI-Enhanced document review is the gold standard. We will continue to make improvements in our method and software, but we are done with participation in federal government programs to prove our standard, even one run by the National Institute of Standards and Technology.


To prove our point that we have now demonstrated substantial improvements in retrieval methodologies, we quote below Section 5.1 of our official TREC report, but we urge you to read the whole thing. It is 164 pages. This section of our report covers our primary research question only. We investigated three additional research questions not included below.


Section 5.1 First and Primary Research Question

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team’s Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR).

Again, as in the 2015 Total Recall Track, the Team attained very good results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team did so even though it only used five of the eight steps in its usual methodology, intentionally severely constrained the amount of human effort expended on each topic and worked on a dataset stripped of metadata. The Team’s enthusiasm for the record-setting results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult and the TREC relevance judgments had to be corrected in most topics.  …

This next chart uses the corrected standard. It is the primary reference chart we use to measure our results. Unfortunately, it is not possible to make any comparisons with BMI standards because we do not know the order in which the BMI documents were submitted.



The average results obtained across all thirty-four topics at the time of reasonable call using the corrected standard are shown below in bold. The average scores using the uncorrected standard are shown for comparison in parentheses.

  • 91.57% Recall (75.46%)
  • 65.90% Precision (57.12%)
  • 76.64% F1 (57.69%)
  • 124 Docs Reviewed Effort (124)

At the time of reasonable call the Team had recall scores greater than 90% in twenty-two of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 95% was attained in fourteen topics. These Recall scores under the corrected standard are shown in the below chart. The results are far better than we anticipated, including six topics with total recall – 100%, and two topics with both total recall and perfect precision, topic 417 Movie Gallery and topic 434 Bacardi Trademark.


At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 75% in three more topics. Precision of greater than 95% was attained in nine topics. These Precision scores under the corrected standard are shown in the below chart. Again, the results were, in our experience, incredibly good, including three topics with perfect precision at the time of the reasonable call.


At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 75% in two more. F1 of greater than 90% was attained in eight topics. These F1 scores under the corrected standard are shown in the below chart. Note there were two topics with a perfect score, Movie Gallery (100%) and Bacardi Trademark (100%) and three more that were near perfect: Felon Disenfranchisement (98.5%), James V. Crosby (97.57%), and Elian Gonzalez (97.1%).


We were lucky to attain two perfect scores in 2016 (we attained one in 2015), in topic 417 Movie Gallery and topic 434 Bacardi Trademark. The perfect score of 100% F1 was obtained in topic 417 by locating all 5,945 documents relevant under the corrected standard after reviewing only 66 documents. This topic was filled with form letters and was a fairly simple search.

The perfect score of 100% F1 was obtained in topic 434 Bacardi Trademark by locating all 38 documents relevant under the corrected standard after reviewing only 83 documents. This topic had some legal issues involved that required analysis, but the reviewing attorney, Ralph Losey, is an SME in trademark law so this did not pose any problems. The issues were easy and not critical to understand relevance. This was a simple search involving distinct language and players. All but one of the 38 relevant documents were found by tested, refined keyword search. One additional relevant document was found by a similarity search. Predictive coding searches were run after the keywords searches and nothing new was uncovered. Here machine learning merely performed a quality assurance role to verify that all relevant documents had indeed been found.

The Team proved once again, as it did in 2015, that perfect recall and perfect precision is possible, albeit rare, using the Team’s methods and fairly simple search projects.

The Team’s top ten projects attained remarkably high scores with an average Recall of 95.66%, average Precision of 97.28% and average F-Measure: 96.42%. The top ten are shown in the chart below.


In addition to Recall, Precision and F1, the Team per TREC requirements also measured the effort involved in each topic search. We measured effort by the number of documents that were actually human-reviewed prior to submission and coded relevant or irrelevant. We also measured effort by the total human time expended for each topic. Overall, the Team human-reviewed only 6,957 documents to find all the 34,723 relevant documents within the overall corpus of 9,863,366 documents. The total time spent by the Team to review the 6,957 documents, and do all the search and analysis and other work using our Hybrid Multimodal Predictive Coding 4.0 method, was 234.25 hours. reviewed_data_pie_chart_2016

It is typical in legal search to try to measure the efficiency of a document review by the number of documents classified by an attorney in an hour. For instance, a typical contract review attorney can read and classify an average of 50 documents per hour. The Team classified 9,863,366 documents by review of 6,957 documents taking a total time of 234.25 hours. The Team’s overall review rate for the entire corpus was thus 42,106 files per hour (9,863,366/234.25).

In legal search it is also typical, indeed mandatory, to measure the costs of review and bill clients accordingly. If we here assume a high attorney hourly rate of $500 per hour, then the total cost of the review of all 34 Topics would be $117,125. That is a cost of just over $0.01 per document. In a traditional legal review, where a lawyer reviews one document at a time, the cost would be far higher. Even if you assume a low attorney rate of $50 per hour, and review speed of 50 files per hour, the total cost to review every document for every issue would be $9,863,366. That is a cost of $1.00 per document, which is actually low by legal search standards.13

Analysis of project duration is also very important in legal search. Instead of the 234.25 hours expended by our Team using Predictive Coding 4.0, traditional linear review would have taken 197,267 hours (9,863,366/50). In other words, the review of thirty-four projects, which we did in our part-time after work in one Summer, would have taken a team of two lawyers using traditional methods, 8 hours a day, every day, over 33 years! These kinds of comparisons are common in Legal Search.

Detailed descriptions of the searches run in all thirty-four topics are included in the Appendix.


We also reproduce below Section 1.0, Summary of Team Efforts, from our 2016 TREC Report. For more information on what we learned in the 2016 TREC see alsoComplete Description in 30,114 Words and 10 Videos of the e-Discovery Team’s “Predictive Coding 4.0” Method of Electronic Document ReviewNine new insights that we learned in the 2016 research are summarized by the below diagram more specifically described in the article.



Excerpt From Team’s 2016 Report

1.1 Summary of Team’s Efforts. The e-Discovery Team’s 2016 Total Recall Track Athome project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic the Team classified 9,863,366 documents in thirty-four review projects.

The topics searched in 2016 and their issue names are shown in the chart below. Also included are the first names of the e-Discovery Team member who did the review for that topic, the total time spent by that reviewer and the number of documents manually reviewed to find all of the relevant documents in that topic. The total time of all reviewers on all projects was 234.25 hours. All relevant documents, totaling 34,723 by Team count, were found by manual review of 6,957 documents. The thirteen topics in red were considered mandatory by TREC and the remaining twenty-one were optional. The e-Discovery Team did all topics.


They were all one-person, solo efforts, although there was coordination and communications between Team members on the Subject Matter Expert (SME) type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for many of these topics. A detailed description of the search for each topic is contained in the Appendix.

In each topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as Irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance, determinations that took into consideration and gave some deference to the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there was sometimes consultations and discussion between Team members as to the unexpected TREC opinions on relevance.

This contrasts sharply with participants in the Sandbox division. They never make any effort to determine where their software made errors in predicting relevance, or for any other reasons. They accept as a matter of faith the correctness of all TREC’s prior assessment of relevance. To these participants, who were all academic institutions, the ground truth itself as to relevance or not, was of no relevance. Apparently, that did not matter to their research.

All thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically face as attorneys leading legal document review projects. (If the Bush email had not been altered by omission of metadata, the searches would have been even easier.) The details of the searches performed in each of the thirty-four topics are included in the Appendix. The search challenges presented by these topics were roughly equivalent to the most simplistic challenges that the e-Discovery Team might face in projects involving relatively simple legal disputes. A few of the search topics in 2016 included quasi legal issues, more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed some, albeit very limited testing of legal judgment and analysis in determination of true relevance in these topics. In legal search relevancy, legal analysis skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

At Home participants were asked to track and report their manual efforts. The e-Discovery Team did this by recording the number of documents that were human reviewed and classified prior to submission. More were reviewed after submission as part of the Team’s TREC relevance checking. Virtually all documents human reviewed were also classified, although all documents classified were not used for active training of the software classifier. The Team also tracked effort by number of attorney hours worked as is traditional in legal services. Although the amount of time varied somewhat by topic, the average time spent per topic was only 6.89 hours. The average review and classification speed for each project was 42,106 files per hour (9,863,366/234.25).

Again, for the full picture and complete details of our work please see the complete 164 page report to TREC of the e-Discovery Team’s Participation in the 2016 Total Recall Track.





WHY I LOVE PREDICTIVE CODING: Making Document Review Fun Again with Mr. EDR and Predictive Coding 4.0

December 3, 2017

Many lawyers and technologists like predictive coding and recommend it to their colleagues. They have good reasons. It has worked for them. It has allowed them to do e-discovery reviews in an effective, cost efficient manner, especially the big projects. That is true for me too, but that is not why I love predictive coding. My feelings come from the excitement, fun, and amazement that often arise from seeing it in action, first hand. I love watching the predictive coding features in my software find documents that I could never have found on my own. I love the way the AI in the software helps me to do the impossible. I really love how it makes me far smarter and skilled than I really am.

I have been getting those kinds of positive feelings consistently by using the latest Predictive Coding 4.0 methodology (shown right) and KrolLDiscovery’s latest eDiscovery.com Review software (“EDR”). So too have my e-Discovery Team members who helped me to participate in TREC 2015 and 2016 (the great science experiment for the latest text search techniques sponsored by the National Institute of Standards and Technology). During our grueling forty-five days of experiments in 2015, and again for sixty days in 2016, we came to admire the intelligence of the new EDR software so much that we decided to personalize the AI as a robot. We named him Mr. EDR out of respect. He even has his own website now, MrEDR.com, where he explains how he helped my e-Discovery Team in the 2015 and 2015 TREC Total Recall Track experiments.

Bottom line for us from this research was to prove and improve our methods. Our latest version 4.0 of Predictive Coding, Hybrid Multimodal IST Method is the result. We have even open-sourced this method, well most of it, and teach it in a free seventeen-class online program: TARcourse.com. Aside from testing and improving our methods, another, perhaps even more important result of TREC for us was our rediscovery that with good teamwork, and good software like Mr. EDR at your side, document review need never be boring again. The documents themselves may well be boring as hell, that’s another matter, but the search for them need not be.

How and Why Predictive Coding is Fun

Steps Four, Five and Six of the standard eight-step workflow for Predictive Coding 4.0 is where we work with the active machine-learning features of Mr. EDR. These are its predictive coding features, a type of artificial intelligence. We train the computer on our conception of relevance by showing it relevant and irrelevant documents that we have found. The software is designed to then go out and find all other relevant documents in the total dataset. One of the skills we learn is when we have taught enough and can stop the training and complete the document review. At TREC we call that the Stop decision. It is important to keep down the costs of document review.

We use a multimodal approach to find training documents, meaning we use all of the other search features of Mr. EDR to find relevant ESI, such as keyword searches, similarity and concept. We iterate the training by sample documents, both relevant and irrelevant, until the computer starts to understand the scope of relevance we have in mind. It is a training exercise to make our AI smart, to get it to understand the basic ideas of relevance for that case. It usually takes multiple rounds of training for Mr. EDR to understand what we have in mind. But he is a fast learner, and by using the latest hybrid multimodal IST (“intelligently spaced learning“) techniques, we can usually complete his training in a few days. At TREC, where we were moving fast after hours with the Ã-Team, we completed some of the training experiments in just a few hours.

After a while Mr. EDR starts to “get it,” he starts to really understand what we are after, what we think is relevant in the case. That is when a happy shock and awe type moment can happen. That is when Mr. EDR’s intelligence and search abilities start to exceed our own. Yes. It happens. The pupil then starts to evolve beyond his teachers. The smart algorithms start to see patterns and find evidence invisible to us. At that point we sometimes even let him train himself by automatically accepting his top-ranked predicted relevant documents without even looking at them. Our main role then is to determine a good range for the automatic acceptance and do some spot-checking. We are, in effect, allowing Mr. EDR to take over the review. Oh what a feeling to then watch what happens, to see him keep finding new relevant documents and keep getting smarter and smarter by his own self-programming. That is the special AI-high that makes it so much fun to work with Predictive Coding 4.0 and Mr. EDR.

It does not happen in every project, but with the new Predictive Coding 4.0 methods and the latest Mr. EDR, we are seeing this kind of transformation happen more and more often. It is a tipping point in the review when we see Mr. EDR go beyond us. He starts to unearth relevant documents that my team would never even have thought to look for. The relevant documents he finds are sometimes completely dissimilar to any others we found before. They do not have the same keywords, or even the same known concepts. Still, Mr. EDR sees patterns in these documents that we do not. He can find the hidden gems of relevance, even outliers and black swans, if they exist. When he starts to train himself, that is the point in the review when we think of Mr. EDR as going into superhero mode. At least, that is the way my young e-Discovery Team members likes to talk about him.

By the end of many projects the algorithmic functions of Mr. EDR have attained a higher intelligence and skill level than our own (at least on the task of finding the relevant evidence in the document collection). He is always lighting fast and inexhaustible, even untrained, but by the end of his training, he becomes a search genius. Watching Mr. EDR in that kind of superhero mode is what makes Predictive Coding 4.0 a pleasure.

The Empowerment of AI Augmented Search

It is hard to describe the combination of pride and excitement you feel when Mr. EDR, your student, takes your training and then goes beyond you. More than that, the super-AI you created then empowers you to do things that would have been impossible before, absurd even. That feels pretty good too. You may not be Iron Man, or look like Robert Downey, but you will be capable of remarkable feats of legal search strength.

For instance, using Mr. EDR as our Iron Man-like suits, my e-discovery Ã-Team of three attorneys was able to do thirty different review projects and classify 17,014,085 documents in 45 days. See 2015 TREC experiment summary at Mr. EDR. We did these projects mostly at nights, and on weekends, while holding down our regular jobs. What makes this crazy impossible, is that we were able to accomplish this by only personally reviewing 32,916 documents. That is less than 0.2% of the total collection. That means we relied on predictive coding to do 99.8% of our review work. Incredible, but true.

Using traditional linear review methods it would have taken us 45 years to review that many documents! Instead, we did it in 45 days. Plus our recall and precision rates were insanely good. We even scored 100% precision and 100% recall in one TREC project in 2015 and two more in 2016. You read that right. Perfection. Many of our other projects attained scores in the high and mid nineties. We are not saying you will get results like that. Every project is different, and some are much more difficult than others. But we are saying that this kind of AI-enhanced review is not only fast and efficient, it is effective.

Yes, it’s pretty cool when your little AI creation does all the work for you and makes you look good. Still, no robot could do this without your training and supervision. We are a team, which is why we call it hybrid multimodal, man and machine.

Having Fun with Scientific Research at TREC 2015 and 2016

During the 2015 TREC Total Recall Track experiments my team would sometimes get totally lost on a few of the really hard Topics. We were not given legal issues to search, as usual. They were arcane technical hacker issues, political issues, or local news stories. Not only were we in new fields, the scope of relevance of the thirty Topics was never really explained. (We were given one to three word explanations in 2015, in 2016 we got a whole sentence!) We had to figure out intended relevance during the project based on feedback from the automated TREC document adjudication system. We would have some limited understanding of relevance based on our suppositions of the initial keyword hints, and so we could begin to train Mr. EDR with that. But, in several Topics, we never had any real understanding of exactly what TREC thought was relevant.

This was a very frustrating situation at first, but, and here is the cool thing, even though we did not know, Mr. EDR knew. That’s right. He saw the TREC patterns of relevance hidden to us mere mortals. In many of the thirty Topics we would just sit back and let him do all of the driving, like a Google car. We would often just cheer him on (and each other) as the TREC systems kept saying Mr. EDR was right, the documents he selected were relevant. The truth is, during much of the 45 days of TREC we were like kids in a candy store having a great time. That is when we decided to give Mr. EDR a cape and superhero status. He never let us down. It is a great feeling to create an AI with greater intelligence than your own and then see it augment and improve your legal work. It is truly a hybrid human-machine partnership at its best.

I hope you get the opportunity to experience this for yourself someday. The TREC experiments in 2015 and 2016 on recall in predictive coding are over, but the search for truth and justice goes on in lawsuits across the country. Try it on your next document review project.

Do What You Love and Love What You Do

Mr. EDR, and other good predictive coding software like it, can augment our own abilities and make us incredibly productive. This is why I love predictive coding and would not trade it for any other legal activity I have ever done (although I have had similar highs from oral arguments that went great, or the rush that comes from winning a big case).

The excitement of predictive coding comes through clearly when Mr. EDR is fully trained and able to carry on without you. It is a kind of Kurzweilian mini-singularity event. It usually happens near the end of the project, but can happen earlier when your computer catches on to what you want and starts to find the hidden gems you missed. I suggest you give Predictive Coding 4.0 and Mr. EDR a try. To make it easier I open-sourced our latest method and created an online course. TARcourse.com. It will teach anyone our method, if they have the right software. Learn the method, get the software and then you too can have fun with evidence search. You too can love what you do. Document review need never be boring again.


One note of caution: most e-discovery vendors, including the largest, do not have active machine learning features built into their document review software. Even the few that have active machine learning do not necessarily follow the Hybrid Multimodal IST Predictive Coding 4.0 approach that we used to attain these results. They instead rely entirely on machine-selected documents for training, or even worse, rely entirely on random selected documents to train the software, or have elaborate unnecessary secret control sets.

The algorithms used by some vendors who say they have “predictive coding” or “artificial intelligence” are not very good. Scientists tell me that some are only dressed-up concept search or unsupervised document clustering. Only bona fide active machine learning algorithms create the kind of AI experience that I am talking about. Software for document review that does not have any active machine learning features may be cheap, and may be popular, but they lack the power that I love. Without active machine learning, which is fundamentally different from just “analytics,” it is not possible to boost your intelligence with AI. So beware of software that just says it has advanced analytics. Ask if it has “active machine learning“?

It is impossible to do the things described in this essay unless the software you are using has active machine learning features.  This is clearly the way of the future. It is what makes document review enjoyable and why I love to do big projects. It turns scary to fun.

So, if you tried “predictive coding” or “advanced analytics” before, and it did not work for you, it could well be the software’s fault, not yours. Or it could be the poor method you were following. The method that we developed in Da Silva Moore, where my firm represented the defense, was a version 1.0 method. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012). We have come a long way since then. We have eliminated unnecessary random control sets and gone to continuous training, instead of train then review. This is spelled out in the TARcourse.com that teaches our latest version 4.0 techniques.

The new 4.0 methods are not hard to follow. The TARcourse.com puts our methods online and even teaches the theory and practice. And the 4.0 methods certainly will work. We have proven that at TREC, but only if you have good software. With just a little training, and some help at first from consultants (most vendors with bona fide active machine learning features will have good ones to help), you can have the kind of success and excitement that I am talking about.

Do not give up if it does not work for you the first time, especially in a complex project. Try another vendor instead, one that may have better software and better consultants. Also, be sure that your consultants are Predictive Coding 4.0 experts, and that you follow their advice. Finally, remember that the cheapest software is almost never the best, and, in the long run will cost you a small fortune in wasted time and frustration.


Love what you do. It is a great feeling and sure fire way to job satisfaction and success. With these new predictive coding technologies it is easier than ever to love e-discovery. Try them out. Treat yourself to the AI high that comes from using smart machine learning software and fast computers. There is nothing else like it. If you switch to the 4.0 methods and software, you too can know that thrill. You can watch an advanced intelligence, which you helped create, exceed your own abilities, exceed anyone’s abilities. You can sit back and watch Mr. EDR complete your search for you. You can watch him do so in record time and with record results. It is amazing to see good software find documents that you know you would never have found on your own.

Predictive coding AI in superhero mode can be exciting to watch. Why deprive yourself of that? Who says document review has to be slow and boring? Start making the practice of law fun again.

Here is the PDF version of this article, which you may download and distribute, so long as you do not revise it or charge for it.



TAR Course Expands Again: Standardized Best Practice for Technology Assisted Review

February 11, 2018

The TAR Course has a new class, the Seventeenth Class: Another “Player’s View” of the Workflow. Several other parts of the Course have been updated and edited. It now has Eighteen Classes (listed at end). The TAR Course is free and follows the Open Source tradition. We freely disclose the method for electronic document review that uses the latest technology tools for search and quality controls. These technologies and methods empower attorneys to find the evidence needed for all text-based investigations. The TAR Course shares the state of the art for using AI to enhance electronic document review.

The key is to know how to use the document review search tools that are now available to find the targeted information. We have been working on various methods of use since our case before Judge Andrew Peck in Da Silva Moore in 2012. After we helped get the first judicial approval of predictive coding in Da Silva, we began a series of several hundred document reviews, both in legal practice and scientific experiments. We have now refined our method many times to attain optimal efficiency and effectiveness. We call our latest method Hybrid Multimodal IST Predictive Coding 4.0.

The Hybrid Multimodal method taught by the TARcourse.com combines law and technology. Successful completion of the TAR course requires knowledge of both fields. In the technology field active machine learning is the most important technology to understand, especially the intricacies of training selection, such as Intelligently Spaced Training (“IST”). In the legal field the proportionality doctrine is key to the  pragmatic application of the method taught at TAR Course. We give-away the information on the methods, we open-source it through this publication.

All we can transmit by online teaching is information, and a small bit of knowledge. Knowing the Information in the TAR Course is a necessary prerequisite for real knowledge of Hybrid Multimodal IST Predictive Coding 4.0. Knowledge, as opposed to Information, is taught the same way as advanced trial practice, by second chairing a number of trials. This kind of instruction is the one with real value, the one that completes a doc review project at the same time it completes training. We charge for document review and throw in the training. Information on the latest methods of document review is inherently free, but Knowledge of how to use these methods is a pay to learn process.

The Open Sourced Predictive Coding 4.0 method is applied for particular applications and search projects. There are always some customization and modifications to the default standards to meet the project requirements. All variations are documented and can be fully explained and justified. This is a process where the clients learn by doing and following along with Losey’s work.

What he has learned through a lifetime of teaching and studying Law and Technology is that real Knowledge can never be gained by reading or listening to presentations. Knowledge can only be gained by working with other people in real-time (or near-time), in this case, to carry out multiple electronic document reviews. The transmission of knowledge comes from the Q&A ESI Communications process. It comes from doing. When we lead a project, we help students to go from mere Information about the methods to real Knowledge of how it works. For instance, we do not just make the Stop decision, we also explain the decision. We share our work-product.

Knowledge comes from observing the application of the legal search methods in a variety of different review projects. Eventually some Wisdom may arise, especially as you recover from errors. For background on this triad, see Examining the 12 Predictions Made in 2015 in “Information → Knowledge → Wisdom” (2017). Once Wisdom arises some of the sayings in the TAR Course may start to make sense, such as our favorite “Relevant Is Irrelevant.” Until this koan is understood, the legal doctrine of Proportionality can be an overly complex weave.

The TAR Course is now composed of eighteen classes:

  1. First Class: Background and History of Predictive Coding
  2. Second Class: Introduction to the Course
  3. Third Class:  TREC Total Recall Track, 2015 and 2016
  4. Fourth Class: Introduction to the Nine Insights from TREC Research Concerning the Use of Predictive Coding in Legal Document Review
  5. Fifth Class: 1st of the Nine Insights – Active Machine Learning
  6. Sixth Class: 2nd Insight – Balanced Hybrid and Intelligently Spaced Training (IST)
  7. Seventh Class: 3rd and 4th Insights – Concept and Similarity Searches
  8. Eighth Class: 5th and 6th Insights – Keyword and Linear Review
  9. Ninth Class: 7th, 8th and 9th Insights – SME, Method, Software; the Three Pillars of Quality Control
  10. Tenth Class: Introduction to the Eight-Step Work Flow
  11. Eleventh Class: Step One – ESI Communications
  12. Twelfth Class: Step Two – Multimodal ECA
  13. Thirteenth Class: Step Three – Random Prevalence
  14. Fourteenth Class: Steps Four, Five and Six – Iterative Machine Training
  15. Fifteenth Class: Step Seven – ZEN Quality Assurance Tests (Zero Error Numerics)
  16. Sixteenth Class: Step Eight – Phased Production
  17. Seventeenth Class: Another “Player’s View” of the Workflow (class added 2018)
  18. Eighteenth Class: Conclusion

With a lot of hard work you can complete this online training program in a long weekend, but most people take a few weeks. After that, this course can serve as a solid reference to consult during complex document review projects. It can also serve as a launchpad for real Knowledge and eventually some Wisdom into electronic document review. TARcourse.com is designed to provide you with the Information needed to start this path to AI enhanced evidence detection and production.


%d bloggers like this: