TAR Course: 17th Class

Seventeenth Class: Another “Player’s View” of the Workflow

In this class we take a look at the workflow in a complex ESI Review Project from the “Players” perspective, the people involved. In the process of teaching this new perspective we will go a little deeper into the activities of each team member, especially those of the AI Trainer. This is the same basic eight-step workflow we have already discussed, but this time we focus on the iterative training steps four, five and six and the latest approach to doing this training in a cost-efficient manner. We also provide more information here on teamwork in first pass review as led by the AI Trainer. The primary purpose of the TAR Course is to prepare legal technologists to fulfill the AI Trainer role. In this class the importance of metadata to the AI Trainer will also be explained. The graphic below illustrates the new perspective.

AI Trainers Need Only Look at Metadata

As you can see from the diagram, in the role of AI Trainer you do not actually have to look at any documents, just the ESI metadata, such as the number of documents with certain keywords or the documents that already have relevance classifications. The metadata is all the information that an AI Trainer needs in order to search, make machine training selection decisions and select documents for reviewers to actually look at and classify (multimodal review). These are steps four, five and six. This new Key Players diagram is another way of describing the iterative process that makes up the core of AI-enhanced document review today.

An AI Trainer could not complete their role in this process with their own limited human capacities and intelligence. They must rely heavily on the input of the machine intelligence, the ranking metadata created by the AI (step-five in the iterated steps 4, 5 and 6 shown above). This is a Hybrid process, Man and Machine, working together using a variety of search techniques.

The highest ranking documents are almost always included in the documents that an AI Trainer selects to batch out to reviewers to examine and code. (Step One in the Key Players work flow.) This ranking is part of the metadata that the AI adds to the ESI. (Step Four in the Key Players work flow.)

Infinity Loop

The iterative training workflow when described in terms of the Key Players forms a figure-eight, an infinity loop, as shown in blue in the diagram below. Step One in the Key Players work flow is the AI Trainer led work of multimodal search and choosing ESI for the Review Attorneys to read and code. Step Two is the work of the Review Attorneys to code the documents. This changes the metadata by adding classifications to the documents they review. In Step Three the AI Trainer studies the classification metadata created by the Reviewers, and other metadata, and uses this information to choose the ESI to Train the AI. In Step Four the AI re-ranks the ESI again, changing the metadata again. That brings us back to Step One again and the AI Trainer’s work to study the new metadata, create searches and batch out more documents for the Document Reviewers. The highest ranking documents are almost always included in the documents that the AI Trainer selects to batch out to reviewers to examine and code.
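For readers who think in code, the four Key Player steps above can be sketched as a small simulation. This is only a toy illustration under stated assumptions, not how any review software actually works: the Doc class, the keyword_hits field, the smoothed-frequency ranker and the rule of stopping when a batch turns up nothing relevant are all hypothetical stand-ins. In a real project the ranking comes from the software's active machine learning, and the Stop Decision is a judgment call by the AI Trainer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Doc:
    doc_id: int
    keyword_hits: int             # hypothetical metadata field
    truly_relevant: bool          # ground truth, seen only by the toy reviewers
    coded: Optional[bool] = None  # classification metadata added in Step Two
    rank: float = 0.5             # ranking metadata added in Step Four


def make_corpus() -> List[Doc]:
    """Toy corpus: 100 documents, relevant only when keyword_hits == 4."""
    return [Doc(i, i % 5, i % 5 == 4) for i in range(100)]


def select_batch(docs: List[Doc], size: int) -> List[Doc]:
    """Step One: the AI Trainer batches out the highest-ranked uncoded docs."""
    uncoded = [d for d in docs if d.coded is None]
    return sorted(uncoded, key=lambda d: d.rank, reverse=True)[:size]


def review(batch: List[Doc]) -> int:
    """Step Two: reviewers code the batch; returns how many were relevant."""
    for d in batch:
        d.coded = d.truly_relevant
    return sum(d.coded for d in batch)


def choose_training(docs: List[Doc]) -> List[Doc]:
    """Step Three: the AI Trainer picks training docs (here, all coded docs)."""
    return [d for d in docs if d.coded is not None]


def rerank(docs: List[Doc], training: List[Doc]) -> None:
    """Step Four: toy 'AI' re-ranks by the smoothed relevance rate per keyword count."""
    seen = {}
    for d in training:
        rel, total = seen.get(d.keyword_hits, (0, 0))
        seen[d.keyword_hits] = (rel + int(d.coded), total + 1)
    for d in docs:
        rel, total = seen.get(d.keyword_hits, (0, 0))
        d.rank = (rel + 1) / (total + 2)


def run_loop(docs: List[Doc], batch_size: int = 10,
             max_rounds: int = 20) -> Tuple[int, int, int]:
    """Repeat Steps One to Four; returns (rounds, docs reviewed, relevant found)."""
    rounds = 0
    while rounds < max_rounds:
        batch = select_batch(docs, batch_size)
        if not batch:
            break
        rounds += 1
        found = review(batch)
        rerank(docs, choose_training(docs))
        if found == 0:  # hypothetical stand-in for the Stop Decision
            break
    reviewed = sum(d.coded is not None for d in docs)
    relevant = sum(d.coded is True for d in docs)
    return rounds, reviewed, relevant


if __name__ == "__main__":
    print(run_loop(make_corpus()))
```

On this toy corpus the loop stops after four rounds, having batched out 40 of the 100 documents and found all 20 relevant ones. The point is only the shape of the loop: each pass changes the metadata, and the changed metadata drives the next selection.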

This figure-eight workflow keeps repeating until all of the responsive documents (ESI) required by the project have been found. That should include all Highly Relevant documents and most, if not close to all, of the merely relevant. The iterative infinity loop comes to an end when the AI Trainer determines that reasonable, proportional efforts have been made and makes a Stop Decision. Then we move on to testing that decision with a random sample, which is part of Step Seven, Zero Error Numerics (ZEN) Quality Control.
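The arithmetic behind a random sample test of a Stop Decision can be sketched as follows. This is a minimal sketch that assumes the sample is drawn from the documents left unreviewed (often called the null set) and that the share of relevant documents in the sample is simply projected onto the whole null set. The function name, parameters and figures are hypothetical, and the sketch ignores confidence intervals, which matter a great deal in practice.

```python
from typing import Tuple


def elusion_estimate(sample_size: int, relevant_in_sample: int,
                     null_set_size: int,
                     found_relevant: int) -> Tuple[float, float, float]:
    """Point estimates from a random sample of the unreviewed documents.

    Returns (elusion rate, projected relevant docs missed, estimated recall).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Share of the sample that reviewers coded relevant.
    elusion_rate = relevant_in_sample / sample_size
    # Project that share onto every unreviewed document.
    projected_missed = relevant_in_sample * null_set_size / sample_size
    total_relevant = found_relevant + projected_missed
    recall = found_relevant / total_relevant if total_relevant else 1.0
    return elusion_rate, projected_missed, recall


if __name__ == "__main__":
    # Hypothetical figures: 1,500 sampled, 3 relevant, 100,000 unreviewed,
    # 1,800 relevant documents already found.
    print(elusion_estimate(1500, 3, 100_000, 1800))
```

With these made-up figures, 3 relevant documents in a sample of 1,500 projects to about 200 missed documents among the 100,000 left unreviewed, for an estimated recall of about 90%. Whether that is good enough depends on what was missed: finding even one Highly Relevant document in the sample is a strong signal to keep going.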

Limiting the Time of High Billing Rate Attorneys

The metadata, and an AI Trainer’s monitoring of all communications in a project, provide all of the information that an AI Trainer needs to help supervise first-pass quality control. (AI Trainers rarely get involved in second pass, final production work because of their higher billing rates and because the project managers have that well under control.) Since AI Trainers do not spend much time looking at documents, their time on a project, which is at a relatively high billing rate, is very limited. Clients like that. So do most AI Trainers, since they just get to do what they enjoy the most.

With the help of an AI Trainer’s document review attorneys, most of whom are contract attorneys who specialize in review, an AI Trainer can complete an entire project without ever reading a single document! The review attorneys do it for you and, if you have a good team, they should do a great job at it. (Pay more to only use the best.) The bulk of the work in complex projects like this is now performed by these document reviewers, either contract review attorneys or Junior SME (Subject Matter Expert) attorneys. They are the ones that put in the hours. The senior SMEs do not have to spend much time on the review at all, which is good, because most of them do not like this kind of work.

The AI Trainer’s time on the high-level meta-functions is also constrained and very limited compared to that of the reviewers. But a good AI Trainer will like what they do and find it challenging, even fun. See WHY I LOVE PREDICTIVE CODING: Making Document Review Fun Again with Mr. EDR and Predictive Coding 4.0. People like Ralph Losey, who have done this since 2012, when his Da Silva Moore case with Judge Peck kicked off the predictive coding frenzy, are able to do this pretty fast. That is why a good AI Trainer is still a great deal for a client even though their billing rates are high, oftentimes even higher than those of the Senior SME on a project.

Better Recall and Precision

If an AI Trainer does their job right, and the AI probability ranking algorithm in the software works correctly, then the amount of time needed by the reviewers to do theirs will be far less. That is primarily because they will have to look at far fewer irrelevant documents. Moreover, we will be able to find far more of the relevant documents by using the AI-enhanced, iterated figure-eight methods.

Our tests, both formal and ad hoc, have shown that the number of relevant documents that we miss is far less in complex projects when we have the help of machine learning. In other words, our recall is higher when we use machine learning search features to help us to find documents. See e-Discovery Team’s 2016 TREC Report: Once Again Proving the Effectiveness of Our Standard Method of Predictive Coding; and, official TREC report for 2015, published on February 20, 2016, found on the NIST website at http://trec.nist.gov/pubs/trec24/papers/eDiscoveryTeam-TR.pdf. We can do without it in simple and small projects, but when the going gets tough, we need the help of machine learning. It is a game changer.

Better, faster, cheaper. This is not a myth or idle vendor promise. If you tried it before and it did not work for you, your software might have been poor or your methods wrong. If you used a method with large random samples at the start and secret control sets, your method was wrong. If you used a method where you trained and then reviewed, instead of continuously training, your method was wrong too. Try it again with the latest methods and software with actual active machine learning features, not just passive analytics. Active machine learning is what makes the difference. Software quality is important. So too are the proper methods of using the technology.

With this technology and these methods it is now possible to have both improved precision and better recall. It is possible to find more relevant documents at a lower cost, even considering the relatively high billing rates of the AI-Trainers and Senior SMEs.
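As a reminder of what the two measures mean, here is a minimal sketch of the standard definitions: precision is the share of reviewed documents that turn out to be relevant, and recall is the share of all relevant documents that were found. The function name and the figures are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Tuple


def precision_recall(relevant_found: int, docs_reviewed: int,
                     total_relevant: int) -> Tuple[float, float]:
    """Standard definitions of precision and recall for a review project.

    precision = relevant docs found / docs the reviewers looked at
    recall    = relevant docs found / all relevant docs in the collection
    """
    if docs_reviewed <= 0 or total_relevant <= 0:
        raise ValueError("counts must be positive")
    precision = relevant_found / docs_reviewed
    recall = relevant_found / total_relevant
    return precision, recall


if __name__ == "__main__":
    # Hypothetical project: reviewers read 1,200 documents and found 900 of
    # the 1,000 relevant documents in the collection.
    print(precision_recall(900, 1200, 1000))
```

In this made-up project precision is 75% and recall is 90%. Higher precision means reviewers waste less time on irrelevant documents, which lowers cost. Higher recall means fewer relevant documents are missed.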

The system shown in the Key Player diagrams allows us to limit the time of these two key players. The Senior SMEs just supervise the work of their junior counterparts and the skilled legal searchers, the AI Trainers, just look at metadata, not the documents. Using this method a good AI Trainer graduate of this TAR Course can supervise and serve as AI-Trainer in multiple projects at the same time.

AI Trainers Need Not Be Involved in SME Issues

As you can see from the gears work flow, as an AI Trainer you do not get involved in subject matter expert issues, such as scope of relevance/responsiveness. Although an AI Trainer may occasionally still do this, especially ones that have experience as a Senior SME, it is not necessary. Most of the time an AI Trainer is not involved. The senior trial attorney and their number-two know a lot more about the subject and case than the AI Trainer. The only issue may be the quality of their expertise, or the good faith of the execution. See Ninth Class: 7th, 8th and 9th Insights – GIGO, QC, SME, Method, Software (GIGO training issues and the problem of the negligent or corrupt SME).

The SME Team

On the question of competent, good faith activities of the SMEs, any negligent or intentional twisting of relevance to hide evidence would be revealed by the metadata. It would also be exposed by the AI and the Review Attorneys. In other words, the AI would notice, the AI Trainer would notice and so would the Review Attorneys. If an unscrupulous attorney were to attempt to hide evidence, it would have to be done before the review by excluding the ESI from the collection. (That is one reason many AI Trainers also like to be involved in collection and insist on bulk collections.) If the ESI is not in the review database to begin with, then it will not be found. But even then, tell-tale traces of the omitted documents may be noticed, such as gaps in email chains. Bottom line, the team approach described here makes the kind of corrupt practices described in Waymo v. Uber far more difficult. See Waymo v. Uber, Hide-the-Ball Ethics and the Special Master Report of December 15, 2017.

It would be much easier for a corrupt attorney to get away with hiding evidence in an older system, without AI Trainers and document review specialists. Unethical behavior thrives in the dark. It is near impossible to pull off in open group team work. Everyone would have to be in on it. For that reason, in most projects an AI Trainer can be comfortable leaving most relevance decisions to the SMEs, even when they are not “your attorneys.” But that does not mean you should ever turn a blind eye or ignore red flags. Speak up when you have concerns. Ethics is everyone’s business.

SME in e-Discovery

AI Trainers are e-discovery specialists. They rarely attempt to stay current in other substantive areas of the law. It is hard enough to stay current with e-discovery, both case law and new technology. Assuming a project has good communications (and a good AI Trainer should help out with that), there is no reason for an AI Trainer to know much more than the basics about a case. Also, as discussed in the Ninth Class, we have multiple built-in safeguards for quality control. They catch and help correct mistakes and inconsistencies in relevance judgment. Such mistakes are inevitable in any complex project. The understanding of relevance naturally evolves as more ESI is reviewed. That is the main reason the first methods of predictive coding often worked poorly. They used large, random secret control sets that incorrectly assumed that relevance was fixed. We fixed that problem and stopped using control sets long ago. See: First Class: Background and History of Predictive Coding.


Under our current methods, the document reviewers themselves (and the AI) are the ones who have to closely understand and follow the SMEs, not you. That means an AI Trainer can work on almost any type of document review project without personal expertise in the type of law involved or in the relevance rules that develop. It is more important that an AI Trainer know the Reviewers and the type of ESI involved. It is especially important for an AI Trainer to understand the metadata created by the reviewers as they code the documents and the metadata created by the software algorithm, the AI, as it ranks the documents. It is more important for an AI Trainer to know the software than the subject matter of the case.

Iterated Four-Step Work Flow for Active Machine Training

Note that in each round of training, including the first, document reviewers are only looking at a few hundred to, at most, a few thousand documents. A typical complex project takes about ten rounds of machine training to find all of the relevant documents required.

More on the SME Team

The Project Manager



Go on to the Eighteenth class.

Or pause to do this suggested “homework” assignment for further study and analysis.

SUPPLEMENTAL READING: Study Losey’s 2018 Power Point presentation: The Team Players and Play in a Complex Document Review Project: Past, Present and Future.

EXERCISES: Consider that someday SMEs will be largely supplemented, if not replaced altogether, by Artificial Intelligence. The AI would know the law and know what evidence would be relevant and discoverable based on the pleadings. See if you can figure out how the workflow would work if and when this happens. The diagrams below on Future and Far Future work flows are two possible scenarios.



Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!


e-Discovery Team LLC COPYRIGHT 2018




