TAR Course: 4th Class

Fourth Class: Introduction to the Nine Insights Concerning the Use of Predictive Coding in Legal Document Review

The diagram below summarizes the nine basic insights that have come out of our scientific research in 2015 and 2016. These are the key concepts that we now think are important to understand and implement.

The nine insights are:

  1. Active Machine Learning (aka Predictive Coding)
  2. Concept & Similarity Searches (aka Passive Learning)
  3. Keyword Search (tested, Boolean, parametric)
  4. Focused Linear Search (key dates & people)
  5. GIGO & QC (Garbage In, Garbage Out) (Quality Control)
  6. Balanced Hybrid (man-machine balance with IST)
  7. SME (Subject Matter Expert, typically trial counsel)
  8. Method (for electronic document review)
  9. Software (for electronic document review)

They will be explained in detail throughout the rest of this course, as will the restated 8-Step Workflow. These are not new concepts. Most have been at least mentioned on this blog before. I am confident that all readers will be able to follow along as I set forth the new nuances we learned at TREC.

Although these concepts are all familiar, some of our deepened understanding of them may surprise you. Some of it surprised us. These insights include several changes in thinking on our part, and some of the research results we saw were unexpected. But we follow the data. Our opinions are always held lightly; I have argued both sides of a legal issue too many times as a lawyer to fall into the trap of clinging to a position. Our thinking follows the evidence, not our preconceptions. That is, after all, the whole point of research. Schedule permitting, we are also happy to provide in-person or online presentations that explain these concept-summary diagrams. If retained, you can also see our methods in action.

Although our insights and experiments used KrolLDiscovery EDR software, they could have been run on five or six other software configurations available today (early 2017). Our methods require a full-featured document review platform, one that includes bona fide active machine learning abilities, aka Predictive Coding. As all experts in this field know, many of the most popular document review platforms do not have these features, even those that claim to use Analytics.

Active Machine Learning is very different from, and far more advanced than, Analytics, whose early forms were called Concept Search. Analytics is a passive type of machine learning; it is not predictive coding. It has its place in any multimodal system such as ours, and there is no doubt it can be a powerful feature to improve search and review. But such software is incomplete unless you can add active machine learning.

Sorry, dear start-up vendors and others, but that’s the truth. Consumers, you get what you pay for. You know that. Not sure? Get the help of an independent expert advisor, and do this before you make substantial investments in e-discovery software or choose a vendor for a major project. Also, if you have tried predictive coding, or what you were told was advanced TAR, and it did not work well, do not blame yourself. The problem could be the software, the method used, or both. There is a lot of bullshit out there. There always has been when it comes to new technology. It does, however, seem especially prevalent in the legal technology field. Perhaps vendors think we lawyers are naive and technologically gullible. Do not be fooled. Again, look to an independent consultant if you get confused by vendor claims.

Contrary to what some vendors will tell you (typically the ones without robust active machine learning), Predictive Coding 3.0, and now 4.0, methods are not rocket science. You do not have to be a TAR-whisperer, or do nothing but search like my A-team for TREC. With good software it is not really that hard at all.

These methods do, however, require an attorney knowledgeable in e-discovery and comfortable with software. This is not for novices. But every law firm should have attorneys with special training and experience in technology and e-discovery anyway. For instance, if you practice in the Northern District of California, an e-discovery liaison with such expertise is required in most cases. See Guidelines for the Discovery of Electronically Stored Information. Almost half of the Bar Associations in the U.S. require basic technology competence as an ethical imperative. See, e.g., ABA Model Rule 1.1, Comment [8] and Robert Ambrogi’s list of 23 states, and counting, that now require such competence. (My own law firm has had an e-discovery liaison program in place since 2010, which I lead and train. I am proud to say that after six years of work it is now a great success.)

So no, you do not have to be a full-time specialist, like the members of my TREC e-Discovery team, to successfully use our method of AI-enhanced review. This is especially true when you work with vendors like Kroll, Catalyst and others that have teams of special consultants to guide you. You just have to pick your vendors wisely.

The nine insights are based on our experience and research. Some of them may be complicated, especially our lead insight on Active Machine Learning and what we call IST, Intelligently Spaced Training. We consider IST the smart, human-empowering alternative to CAL (Continuous Active Learning). If I write these insights up correctly, their obviousness should come through. They are all simple in essence. The insights and methods of Predictive Coding 4.0 document review are partially summarized in the chart below.


Our method is Multimodal in that it uses all kinds of document search tools. Although we emphasize active machine learning, we do not rely on that method alone. Our method is also Hybrid in that we use both machine judgments and human (lawyer) judgments. Moreover, in our method the lawyer is always in charge. We may take our hand off the wheel and let the machine drive for a while, but under our versions of Predictive Coding, we watch carefully and remain ready to take over at a moment’s notice. We do not rely on one brain to the exclusion of another. See, e.g., Why the ‘Google Car’ Has No Place in Legal Search (cautioning against overreliance on fully automated methods of active machine learning). Of course, the converse is also true: we never rely on our human brain alone. It has too many limitations. We enhance our brain with predictive coding algorithms, adding artificial intelligence to our own natural intelligence. The perfect balance between the two, the Balanced Hybrid, is another of the insights that we will discuss later.
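To make the difference between passive and active machine learning, and the role of the lawyer in a hybrid review, a little more concrete, here is a toy sketch in Python. It is a hypothetical illustration only, not the algorithm used by KrolLDiscovery EDR or by any other review platform, and not our IST method. Documents are reduced to simple word counts. The function passive_similarity_search ranks documents by cosine similarity to a single seed document and learns nothing from reviewer decisions. The function active_learning_round retrains a simple perceptron classifier on every relevance call the lawyer has made so far, including the "not relevant" calls, and returns the highest-ranked unreviewed documents for human review. The perceptron, the top-ranked batch selection, the sample documents, and all function names are assumptions chosen for brevity.

```python
import math
from collections import Counter


def vectorize(text):
    # Toy bag-of-words representation: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def passive_similarity_search(seed, docs):
    # Passive learning: rank by similarity to one example document.
    # Nothing here changes when a lawyer codes a document.
    seed_vec = vectorize(seed)
    return sorted(docs, key=lambda d: cosine(seed_vec, vectorize(d)), reverse=True)


def train_perceptron(labeled, epochs=10):
    # Fit a linear model to the lawyer's coding (1 = relevant, 0 = not).
    # On each mistake, word weights move toward the correct label.
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for text, label in labeled:
            x = vectorize(text)
            score = bias + sum(weights[w] * c for w, c in x.items())
            predicted = 1 if score > 0 else 0
            if predicted != label:
                step = 1 if label == 1 else -1
                for w, c in x.items():
                    weights[w] += step * c
                bias += step
    return weights, bias


def relevance_score(model, text):
    # Higher scores mean the model predicts the document is more likely relevant.
    weights, bias = model
    return bias + sum(weights[w] * c for w, c in vectorize(text).items())


def active_learning_round(labeled, unreviewed, batch_size=2):
    # Active learning: retrain on all human coding so far, then send the
    # highest-ranked unreviewed documents to the lawyer for review.
    model = train_perceptron(labeled)
    ranked = sorted(unreviewed, key=lambda d: relevance_score(model, d), reverse=True)
    return ranked[:batch_size]
```

In a hybrid workflow the lawyer would code the returned batch, add those decisions to the labeled list, and run another round. The human, not the machine, decides when training is good enough, and remains free to switch to keyword, linear, or similarity searches at any point.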

Go on to Class Five.

Or pause to do this suggested “homework” assignment for further study and analysis.

SUPPLEMENTAL READING: Read the Predictive Coding 3.0 article in full, both part one and part two. Part one describes the history and part two describes the method. This is still good background, especially for understanding why so many attorneys who used the old 1.0 and 2.0 methods ended up not liking predictive coding. Many vendors are still stuck in these old, inefficient methods. Still, most of them can probably use the latest 4.0 methods with some minor modifications to their software’s built-in workflows. Also be sure to read the Google Car article referenced in this class. As general advice, you should try to take time to read all articles and other materials that are hyperlinked in any of our classes. That is why we linked them.

EXERCISES: Multiple suggested exercises for you:

  1. Search and find some online articles, including vendor papers, that discuss document review software analytics, but do not use the term predictive coding. Why do you think that is? (Hint – there are several reasons, not just one.)
  2. Search for other articles that specifically explain that TAR is active machine learning. Any idea why there are so few, relatively speaking?
  3. TAR seems to be the acronym that has caught on. Ideas why? What other terms were popular? Any ideas why it is that TAR won out?
  4. This class suggests getting the help of an independent expert advisor before deciding upon predictive coding software. Aside from Losey, what other experts can you find who fit the bill?

Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!


e-Discovery Team LLC COPYRIGHT 2017



