TAR Course: 12th Class

March 17, 2017

Twelfth Class: Step Two – Multimodal ECA

Multimodal Early Case Assessment – ECA – summarizes the second step in our 8-step workflow. We used to call the second step “Multimodal Search Review.” It is still the same activity, but we tweaked the name to emphasize the ECA significance of this step. After we have an idea of what we are looking for from ESI Communications in step one, we start to use every tool at our disposal to try to find the relevant documents. Every tool, that is, except for active machine learning. Our first look at the documents is our look, not the machine’s. That is not because we do not trust the AI’s input. We do. It is because there is no AI yet. The predictive coding only begins after you feed training documents into the machine. That happens in step four.

predictive_coding_4-0_web

Our Multimodal ECA step two does not take that long, so the delay in bringing in our AI is usually short. In our experiments at TREC in 2015 and 2016 under the auspices of NIST, where we skipped steps three and seven to save time, and necessarily had little in the way of ESI Communications in step one, we would often complete simple document reviews of several hundred thousand documents in just a few hours. We cannot match these results in real-life legal document review projects because the issues in lawsuits are usually much more complicated than the issues presented by most topics at TREC. Also, in a real legal project we cannot take the kinds of risks, and accept the kinds of mistakes, that we did in an academic event like TREC.

Again, the terminology revision to say Multimodal ECA is more a change of style than substance. We have always worked in this manner. The name change is just to better convey the idea that we are looking for the low-hanging fruit, the easy-to-find documents. We are getting an initial assessment of the data by using all of the tools of the search pyramid except for the top tier, active machine learning. The AI comes into play soon enough in steps four and five, sometimes as early as the same day.

search_pyramid_revised

I have seen projects where key documents are found during the first ten minutes of looking around. Usually the secrets are not revealed so easily, but it does happen. Step two is the time to get to know the data and run some obvious searches, including any keyword requests from opposing counsel. You use the relevant and irrelevant documents you find in step two as the documents you select in step four to train the AI.

In the process of this initial document review you start to get a better understanding of the custodians, their data and relevance. This is what early case assessment is all about. You will find the rest of the still hidden relevant documents in the iterated rounds of machine training and other searches that follow. Here is my video description of step two.

______

_______

Although we speak of searching for relevant documents in step two, it is important to understand that many irrelevant documents are also incidentally found and coded in that process. Active machine learning does not work by training on relevant documents alone. It must also include examples of irrelevant documents. For that reason we sometimes actively search for certain kinds of irrelevant documents to use in training. One of our current research experiments with Kroll Ontrack is to determine the best ratios between relevant and irrelevant documents for effective document ranking. See TREC reports at Mr. EDR as updated from time to time. At this point we have that issue nailed.
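
To make the training-ratio question concrete, here is a minimal Python sketch of a relevance-ranking model trained on a mix of relevant and irrelevant examples. It uses the generic, open-source scikit-learn library, not Mr. EDR or any Kroll Ontrack product, and the sample emails, labels and class_weight setting are invented for illustration only.

```python
# Hypothetical sketch: how the mix of relevant vs. irrelevant training documents
# can influence a ranking model. Generic scikit-learn, not Mr. EDR.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training set coded by the review team (1 = relevant, 0 = irrelevant).
train_texts = [
    "fraudulent transfer of reserves to an offshore entity",    # relevant
    "please approve the attached special purpose entity memo",  # relevant
    "lunch order for the 14th floor on friday",                 # irrelevant
    "fantasy football league standings, week six",              # irrelevant
    "parking garage closed for maintenance this weekend",       # irrelevant
    "holiday party rsvp reminder",                               # irrelevant
]
train_labels = [1, 1, 0, 0, 0, 0]  # a 1:2 relevant-to-irrelevant ratio here

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# class_weight="balanced" compensates for a skewed training ratio; comparing
# rankings with and without it is one simple way to study the ratio question.
model = LogisticRegression(class_weight="balanced")
model.fit(X_train, train_labels)

# Rank two unreviewed documents by predicted probability of relevance.
unreviewed = [
    "transfer of reserves approved by the offshore entity",
    "reminder about the garage maintenance schedule",
]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```

Changing how many irrelevant examples appear in train_texts, or dropping the class_weight setting, and re-running the ranking is a crude stand-in for the kind of ratio experiments described above.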

The multimodal ECA review in step two is carried out under the supervision of the Subject Matter Experts on the case. They make final decisions where there is doubt concerning the relevance of a document or document type. The SME role is typically performed by a team, including the partner in charge of the case – the senior SME – senior associates, and the e-Discovery specialist attorney(s) assigned to the case. It is, or should be, a team effort, at least in most large projects. As previously described, the final call on scope is made by the senior SME, who in turn acts as a predictor of the court’s views. The final, final authority is always the Judge. The chart below summarizes the analysis of the SME and judge on the discoverability of any document.

relevance_scope_2016

When I do a project, acting as the e-Discovery specialist attorney for the case, I listen carefully to the trial lawyer SME as he or she explains the case. By extensive Q&A the members of the team understand what is relevant. We learn from the SME. It is not exactly a Vulcan mind-meld, but it can work pretty well with a cohesive team.  Most trial lawyers love to teach and opine on relevance and their theory of the case.

Helmuth Karl Bernhard von Moltke

General Moltke

Although a good SME team communicates and plans well, they also understand, typically from years of experience, that the intended relevance scope is like a battle plan before the battle. As the famous German military strategist General Moltke the Elder said, no battle plan ever survives contact with the enemy. So too, no relevance scope plan ever survives contact with the corpus of data. The understanding of relevance will evolve as the documents are studied, the evidence is assessed, and understanding of what really happened matures. If not, someone is not paying attention. In litigation that is usually a recipe for defeat. See Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three.

Army of One: Multimodal Single-SME Approach To Machine Learning

The SME team trains and supervises the document review specialists, aka contract review attorneys, who usually then do a large part of the manual review (step six), and few if any searches. Working with review attorneys is a constant iterative process where communication is critical. Although I sometimes use an army-of-one approach where I do everything myself (that is how I did the EDI Oracle competition and most of the TREC topics), my preference now is to use two or three reviewers to help with the document review. With good methods, including culling methods, and good software, it is rarely necessary to use more reviewers than that. With the help of strong AI, such as that included in Mr. EDR, we can easily classify a million or so documents for relevance with a team of that size. More reviewers than that may well be needed for complex redaction projects and other production issues, but not for a well-designed first-pass relevance search.

One word of warning when using document reviewers: it is very important for all members of the SME team to have direct and substantial contact with the actual documents, not to leave that contact to the reviewers alone. For instance, everyone involved in the project should see all hot documents found in any step of the process. It is especially important for the SME trial lawyer at the top of the expert pyramid to see them, but that is rarely more than a few hundred documents, often just a few dozen. Beyond that, the top SME need only see the novel and grey area documents that are encountered, where it is unclear on which side of the relevance line they should fall in accord with the last instructions. Again, the burden on the senior, and often technologically challenged, SME attorneys is fairly light under these Version 4.0 procedures.

The SME team relies on a primary SME, who is typically the trial lawyer in charge of the whole case, including all communications on relevance to the judge and opposing counsel. Thereafter, the head SME is sometimes only consulted on an as-needed basis to answer questions and make specific decisions on the grey area documents. There are always a few uncertain documents that need elevation to confirm relevance, but as the review progresses, their number usually decreases, and so the time and attention of the senior SME decreases accordingly.

Go on to the Thirteenth Class.

Or pause to do this suggested “homework” assignment for further study and analysis.

SUPPLEMENTAL READING: This is a very important step and class. To really understand how this second step works you need to see it in action several times. It is never the exact same process, but instead depends on the circumstances of the search. The basic idea is to use every search tool you have, except of course AI ranking, to locate the easy-to-find documents, the low-hanging fruit. In some projects this step only takes a few hours; in others it may take days. In TREC 2016, where we were looking to set speed records, we would usually take less than an hour, but we were searching a known dataset, the Jeb Bush email. For background reading on step two we suggest you look at articles on e-discovery and early case assessment. Early case assessment is one of those ambiguous terms in e-discovery that can mean many different things and many different activities, many of them outside of our step two. It is good to have a familiarity with all of them. Predictive coding experts must also become early case assessment experts.

If you have not already done so, read Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three. In step two your first ideas of relevance meet the data you intend to classify. If this meeting does not somehow alter your ideas of relevance, at least somewhat, then chances are you are not paying attention.

EXERCISES: Consider how much more effective 26(f) conferences would be if both sides had completed this second step before the meeting. Try doing that in your next case. Leave us a comment on how this changed the meeting and discovery plan process. Also try doing all of the first seven steps before the conferences with opposing counsel. Notice how much easier the process becomes. Do you not have the authority to accelerate the document review process to try these exercises? Plead your case to whomever is in charge. Try describing this as an experiment to improve your methods and abilities. Perhaps offer it to the client as an introductory reduced-price (or free) service to show them the benefits.

As a separate exercise, ask your preferred vendor to help you better understand one of the search tools on their platform that you have not used very much before or are otherwise not that familiar with. Ask for some free training just on that feature from their best expert. If your e-discovery vendor does not offer such guidance and free training, you should consider a new vendor. In addition to training from experts, just try out a new search feature on your own. See how it works for yourself. Ask questions. Most good software today has a multitude of search features. If yours does not, you should consider using different software. The days of one-trick ponies are long gone. The best approach is variety.

Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!

_

e-Discovery Team LLC COPYRIGHT 2017

ALL RIGHTS RESERVED

_


Legal Search Science

December 15, 2016

Legal Search Science is an interdisciplinary field concerned with the search, review, and classification of large collections of electronic documents to find information for use as evidence in legal proceedings, for compliance to avoid litigation, or for general business intelligence. See Computer Assisted Review. Legal Search Science as practiced today uses software with artificial intelligence features to help lawyers find electronic evidence in a systematic, repeatable, and verifiable manner. See: TAR Training Course (a sixteen-class course that teaches our latest insights and methods of Predictive Coding 4.0) and the over sixty articles on the subject that I have written since mid-2011. The hybrid method of human-computer interaction with AI developed in this field will inevitably have a dramatic impact on the future practice of law. Lawyers will never be replaced entirely by robots embodying AI search algorithms, but the productivity of some lawyers using AI will allow them to do the work of dozens, if not hundreds, of lawyers.

My own experience (Ralph Losey) provides an example. I participated in a study in 2013 where I searched and reviewed over 1.6 million documents by myself, with only the assistance of one computer – one robot, so to speak – running AI-enhanced software by Kroll Ontrack. I was able to do so more accurately and faster than large teams of lawyers working without artificial intelligence software. I was even able to work faster and more accurately than all other teams of lawyers and vendors that used AI-enhanced software but did not use the science-based search methods described here. I do not attribute my success to my own intelligence, or to any special gifts or talents. (They are very moderate.) I was able to succeed by applying the established scientific methods described here and in more detail in our TAR Training Course. They allowed me to augment my own small intelligence with that of the machine. If I have any special skill, it is in human-computer interaction and legal search intuition, based on my long experience with evidence in the law (over 35 years) and my experience over the last few years using predictive coding software.

Legal Search Science as I understand it is a combination and subset of three fields of study: Information Science, the legal field of Electronic Discovery, and the engineering field concerned with the design and creation of Search Software. Its primary concern is with information retrieval and the unique problems faced by lawyers in the discovery of relevant evidence.

Most specialists in legal search science use a variety of search methods when searching large datasets. The use of multiple methods of search is referred to here as a multimodal approach. Although many search methods are used at the same time, the primary, or controlling, search method in large projects is typically what is known as supervised or semi-supervised machine learning. Semi-supervised learning is a type of artificial intelligence (AI) that uses an active learning approach. I refer to this as AI-enhanced review or AI-enhanced search. In information science it is often referred to as active machine learning, and in legal circles as Predictive Coding.

For reliable introductory information on Legal Search Science see the works of attorney, Maura Grossman, and her information scientist partner, Professor Gordon Cormack, including:

The Grossman-Cormack Glossary explains that in machine learning:

Supervised Learning Algorithms (e.g., Support Vector Machines, Logistic Regression, Nearest Neighbor, and Bayesian Classifiers) are used to infer Relevance or Non-Relevance of Documents based on the Coding of Documents in a Training Set. In Electronic Discovery generally, Unsupervised Learning Algorithms are used for Clustering, Near-Duplicate Detection, and Concept Search.
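
For readers who want to see the unsupervised side of that list in code, here is a minimal Python sketch of clustering and near-duplicate detection over TF-IDF vectors. It uses generic, open-source scikit-learn tools, not any particular e-discovery platform, and the sample documents and the 0.6 similarity threshold are invented for illustration.

```python
# Hypothetical sketch of two Unsupervised Learning tasks from the Glossary:
# clustering and near-duplicate detection. Generic scikit-learn, invented data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "board meeting minutes for the third quarter",
    "minutes of the board meeting, third quarter (final)",
    "wire transfer instructions for the cayman account",
    "instructions for the wire transfer to the cayman account",
    "company picnic rescheduled to saturday",
]

X = TfidfVectorizer().fit_transform(docs)

# Clustering: group the documents by topic with no human labels at all.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", list(clusters))

# Near-duplicate detection: flag pairs whose cosine similarity is high.
similarity = cosine_similarity(X)
THRESHOLD = 0.6  # illustrative cutoff, not a recommended standard
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if similarity[i, j] > THRESHOLD:
            print(f"possible near-duplicates ({similarity[i, j]:.2f}): doc {i} and doc {j}")
```

No supervised training set is involved here; that is the difference between these tools and the predictive coding (TAR) process defined below.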

For another perspective see the over sixty articles on the subject that I have written since mid-2011. They are listed in rough chronological order, with the most recent on top. The most important of these articles is Predictive Coding 4.0.

Multimodal search uses both machine learning algorithms and unsupervised learning search tools (clustering, near-duplicates and concept), as well as keyword search and even some limited use of traditional linear search. This is further explained in the section below entitled Hybrid Multimodal Bottom Line Driven Review. The hybrid multimodal aspects described represent the consensus view among information search scientists. The bottom line driven aspects represent my legal overlay on the search methods. All of these components together make up what I call Legal Search Science. It represents a synthesis of knowledge and search methods from science, law, and software engineering.

The key definition of the Glossary is for Technology Assisted Review, their term for AI-enhanced review.

Technology-Assisted Review (TAR): A process for Prioritizing or Coding a Collection of Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. Some TAR methods use Machine Learning Algorithms to distinguish Relevant from Non-Relevant Documents, based on Training Examples Coded as Relevant or Non-Relevant by the Subject Matter Expert(s), …. TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness.

The Grossman-Cormack Glossary makes clear the importance of Subject Matter Experts (SMEs) by including their use as the document trainer into the very definition of TAR. Nevertheless, experts agree that good predictive coding software is able to tolerate some errors made in the training documents. For this reason experiments are being done on ways to minimize the central role of the SMEs, to see if lesser-qualified persons could also be used in document training, at least to some degree. See Webber & Pickens, Assessor Disagreement and Text Classifier Accuracy (SIGIR, 2013); John Tredennick, Subject Matter Experts: What Role Should They Play in TAR 2.0 Training? (2013). These experiments are of special concern to software developers and others who would like to increase the utilization of AI-enhanced software because, at the current time, very few SMEs in the law have the skills or time necessary to conduct AI-enhanced searches. This is one reason that predictive coding is still not widely used, even though it has been proven effective in multiple experiments and adopted by several courts.

Professor Oard

For in-depth information on key experiments already performed in the field of Legal Search Science, see the TREC Legal Track reports, whose home page is maintained by a leader in the field, information scientist Doug Oard. Professor Oard is a co-founder of the TREC Legal Track. Also see the research and reports of Herb Roitblat and the Electronic Discovery Institute, and my papers on TREC (and otherwise as listed below): Analysis of the Official Report on the 2011 TREC Legal Track – Part One, Part Two and Part Three; and Secrets of Search: Parts One, Two, and Three.

For general legal background on the field of Legal Search Science see the works of the attorney co-founder of TREC Legal Track, Jason R. Baron, including:

As explained in Baron and Freeman’s Quick Peek at the Math, and my blog introduction thereto, the supervised learning algorithms behind predictive coding operate in a hyper-dimensional space. Each document in the dataset, including its metadata, is mapped as a point in this multi-dimensional space, which goes beyond ordinary Cartesian geometry and is divided by so-called hyper-planes. The documents are separated according to a multi-dimensional dividing line between relevant and irrelevant. The important document ranking feature of predictive coding works by measuring how far from that dividing line a particular document lies. Each time a training session is run the line moves, and the ranking fluctuates in accordance with the new information provided. The diagram below attempts to portray this hyperplane division and document placement. The points shown in red designate irrelevant documents and the blue points relevant documents. The dividing line runs through multiple dimensions, not just the usual two of a Cartesian graph, which is depicted in the diagram by folding fields. For more, read the entire Quick Peek article.

hyperplanes3d_2
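
The geometry described above can be sketched in a few lines of Python. This is a generic linear classifier from the open-source scikit-learn library, not the proprietary algorithm inside any commercial predictive coding tool, and the training documents and labels are invented. The point is simply that the signed distance from the separating hyperplane, exposed here by decision_function, is what supplies the relevance ranking.

```python
# Hypothetical sketch of hyperplane-based document ranking. Each document
# becomes a point in a high-dimensional TF-IDF space; a linear model learns a
# separating hyperplane; the signed distance from that hyperplane ranks documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

training_docs = [
    "memo regarding the disputed royalty calculation",    # coded relevant
    "royalty calculation spreadsheet, revised figures",    # coded relevant
    "cafeteria menu for next week",                        # coded irrelevant
    "system maintenance window this weekend",              # coded irrelevant
]
training_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = LinearSVC()  # learns a separating hyperplane in the TF-IDF space
model.fit(vectorizer.fit_transform(training_docs), training_labels)

# Rank unreviewed documents: a larger positive distance means more likely relevant.
collection = [
    "draft letter disputing the royalty figures",
    "weekend maintenance reminder for all staff",
    "question about the revised royalty spreadsheet",
]
distances = model.decision_function(vectorizer.transform(collection))
for dist, doc in sorted(zip(distances, collection), reverse=True):
    print(f"{dist:+.3f}  {doc}")
```

Each new round of training refits the model, which is the moving dividing line and rank fluctuation described in the paragraph above.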

For a scientific and statistical view of Legal Search Science that is often at least somewhat intelligible to lawyers and other non-scientists, see the blog of information scientist and consultant William Webber, Evaluating e-Discovery. Also see the many judicial opinions approving and encouraging the use of predictive coding.

AI-Enhanced Search Methods

AI-enhanced search represents an entirely new method of legal search, which requires a completely new approach to large document reviews. Below is the diagram of the latest Predictive Coding 4.0 workflow I use in a typical predictive coding project.

predictive_coding_4-0_web

For a full description of the eight steps, take our free sixteen-class online training program. See: TAR Training Course.

I have found that proper AI-enhanced review is the most interesting and exciting activity in electronic discovery law. Predictive coding is the tool that we have all been waiting for. When used properly, i.e., with the Version 4.0 methods, good AI-enhanced software such as Mr. EDR allows attorneys to find the information they need in vast stores of ESI, and to do so in an effective and affordable manner.

Hybrid Human Computer Information Retrieval

human-and-robots

Further, in contradistinction to Borg approaches, where the machine controls the learning process, I advocate a hybrid approach where Man and Machine work together. In my hybrid search and review projects the expert reviewer remains in control of the process, and their expertise is leveraged for greater accuracy and speed. The human intelligence of the SME is a key part of the search process. In the scholarly literature of information science this hybrid approach is known as Human–computer information retrieval (HCIR). (My thanks to information scientist Jeremy Pickens for pointing out this literature to me.)

The classic text in the area of HCIR, which I endorse, is Information Seeking in Electronic Environments (Cambridge 1995) by Gary Marchionini, Professor and Dean of the School of Information and Library Sciences of U.N.C. at Chapel Hill. Professor Marchionini speaks of three types of expertise needed for a successful information seeker:

  1. Domain Expertise. This is equivalent to what we now call SME, subject matter expertise. It refers to a domain of knowledge. In the context of law the domain would refer to the particular type of lawsuit or legal investigation, such as antitrust, patent, ERISA, discrimination, trade-secrets, breach of contract, Qui Tam, etc. The knowledge of the SME on the particular search goal is extrapolated by the software algorithms to guide the search. If the SME also has the next described System Expertise and Information Seeking Expertise, they can run the search project themselves. That is what I like to call the Army of One approach. Otherwise, they will need a chauffeur or surrogate with such expertise, one who is capable of learning enough from the SME to recognize the relevant documents.
  2. System Expertise. This refers to expertise in the technology system used for the search. A system expert in predictive coding would have a deep and detailed knowledge of the software they are using, including the ability to customize the software and use all of its features. In computer circles a person with such skills is often called a power-user. Ideally a power-user would have expertise in several different software systems. They would also be an expert in one or more particular method of search.
  3. Information Seeking Expertise. This is a skill that is often overlooked in legal search. It refers to general cognitive skills related to information seeking. It is based on both experience and innate talents. For instance, “capabilities such as superior memory and visual scanning abilities interact to support broader and more purposive examination of text.” Professor Marchionini goes on to say that: “One goal of human-computer interaction research is to apply computing power to amplify and augment these human abilities.” Some lawyers seem to have a gift for search, which they refine with experience, broaden with knowledge of different tools, and enhance with technologies. Others do not.

Id. at pgs.66-69, with the quotes from pg. 69.

All three of these skills are required for a legal team to have real expertise in legal search today, which is one reason I find this new area of legal practice so interesting and exciting. See: TAR Training Course.

Predictive_coding_triangles

It is not enough to be an SME, or a power-user, or have a special knack for search. You need a team that has it all, and great software. However, studies have shown that of the three skill-sets, System Expertise, which in legal search primarily means mastery of the particular software used (Power User), is the least important. Id. at 67. The SMEs are more important, those who have mastered a domain of knowledge. In Professor Marchionini’s words:

Thus, experts in a domain have greater facility and experience related to information-seeking factors specific to the domain and are able to execute the subprocesses of information seeking with speed, confidence, and accuracy.

Id. That is one reason that the Grossman-Cormack Glossary quoted above builds the role of SMEs into its base definition of technology assisted review. Glossary at pg. 21, defining TAR.

According to Marchionini, Information Seeking Expertise, much like Subject Matter Expertise, is also more important than specific software mastery. Id. This may seem counter-intuitive in the age of Google, where an illusion of simplicity is created by typing in words to find websites. But legal search of user-created data is a completely different type of search task than looking for information from popular websites. In the search for evidence in a litigation, or as part of a legal investigation, special expertise in information seeking is critical, including especially knowledge of multiple search techniques and methods. Again quoting Professor Marchionini:

Expert information seekers possess substantial knowledge related to the factors of information seeking, have developed distinct patterns of searching, and use a variety of strategies, tactics and moves.

Id. at 70.

In the field of law this kind of information seeking expertise includes the ability to understand and clarify what the information need is, in other words, to know what you are looking for, and articulate the need into specific search topics. This important step precedes the actual search, but is an integral part of the process. As one of the basic texts on information retrieval written by Gordon Cormack, et al, explains:

Before conducting a search, a user has an information need, which underlies and drives the search process. We sometimes refer to this information need as a topic …

Büttcher, Clarke & Cormack, Information Retrieval: Implementing and Evaluating Search Engines (MIT Press, 2010) at pg. 5. The importance of pre-search refining of the information need is stressed in the first step of the above diagram of my methods, ESI Discovery Communications. It seems very basic, but is often underappreciated, or overlooked entirely, in the litigation context, where information needs are often vague and ill-defined, lost in overly long requests for production and adversarial hostility.

Hybrid Multimodal Bottom Line Driven Review

I have a long name for what Marchionini calls the variety of strategies, tactics and moves that I have developed for legal search: Hybrid Multimodal. See: TAR Training Course, a sixteen-class course that teaches our latest insights and methods of Predictive Coding 4.0. I refer to it as a multimodal method because, although the predictive coding type of searches predominate (shown on the below diagram as AI-enhanced review – AI), I also use the other modes of search, including the mentioned Unsupervised Learning Algorithms (clustering and concept), keyword search, and even some traditional linear review (although usually very limited). As described, I do not rely entirely on random documents or computer-selected documents for the AI-enhanced searches, but use a three-cylinder approach that also includes human judgment sampling and AI document ranking. The various types of legal search methods used in a multimodal process are shown in this search pyramid.

search_pyramid_revised

Most information scientists I have spoken to agree that it makes sense to use multiple methods in legal search and not just rely on any single method, even the best AI method. UCLA Professor Marcia J. Bates first advocated using multiple search methods back in 1989, an approach she called berrypicking. Bates, Marcia J., The Design of Browsing and Berrypicking Techniques for the Online Search Interface, Online Review 13 (October 1989): 407-424. As Professor Bates explained in 2011 in Quora:

An important thing we learned early on is that successful searching requires what I called “berrypicking.” … Berrypicking involves 1) searching many different places/sources, 2) using different search techniques in different places, and 3) changing your search goal as you go along and learn things along the way. This may seem fairly obvious when stated this way, but, in fact, many searchers erroneously think they will find everything they want in just one place, and second, many information systems have been designed to permit only one kind of searching, and inhibit the searcher from using the more effective berrypicking technique.

This berrypicking approach, combined with HCIR, is what I have found from practical experience works best with legal search. They are the Hybrid Multimodal aspects of my AI-Enhanced, Bottom Line Driven Review method.

Why AI-Enhanced Search and Review Is Important

I focus on this sub-niche area of e-discovery because I am convinced that it is critical to the advancement of the law in the 21st Century. The new search and review methods that I have developed from my studies and experiments in legal search science allow a skilled attorney using readily available predictive coding type software to review documents at remarkable speed and low cost. Review rates are more than 250 times faster than traditional linear review, at less than a tenth of the cost. See, e.g., Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron, and the report by the Rand Corporation, Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery.

Thanks to the new software and methods, what was considered impossible, even absurd, just a few short years ago, namely one attorney accurately reviewing over a million documents by him or herself in 14 days, is attainable by many experts. I have done it. That is when I came up with the Army of One motto and realized that we were at a John Henry moment in Legal Search. Maura tells me that she once did a seven-million document review by herself. Maura and Gordon were correct to refer to TAR as a disruptive technology in the Preface to their Glossary. Technology that can empower one skilled lawyer to do the work of hundreds of unskilled attorneys is certainly a big deal, one for which we have Legal Search Science to thank.

Ralph and some of his computers at one of his law offices

More Information On Legal Search Science

For further information on Legal Search Science see all of the articles cited above, along with the over sixty articles on the subject that I have written since mid-2011. Also enroll in our free 16-class TAR Training Course, which teaches our latest insights and methods of Predictive Coding 4.0. Most of my articles were written for the general reader; some are highly technical, but still accessible with study. All have been peer-reviewed on my blog by many of the founders of this field, who are regular readers, and by thousands of other readers.

I am especially proud of the legal search experiments I have done using AI-enhanced search software provided to me by Kroll Ontrack to review the 699,082 public Enron documents, and of my reports on these reviews: Comparative Efficacy of Two Predictive Coding Reviews of 699,082 Enron Documents (Part Two); A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents (Part One). I have been told by scientists in the field that my over 100 hours of search, consisting of two fifty-hour search projects using different methods, is the largest search project by a single reviewer that has ever been undertaken, not only in Legal Search, but in any kind of search. I do not expect this record to last for long, as others begin to understand the importance of Information Science in general, and Legal Search Science in particular.


Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow – Part Six

October 16, 2016

This is the sixth installment of the article explaining the e-Discovery Team’s latest enhancements to electronic document review using Predictive Coding. Here are Parts One, Two, Three, Four and Five. This series explains the nine insights behind the latest upgrade to version 4.0 and the slight revisions these insights triggered to the eight-step workflow. We have already covered the nine insights. Now we will begin to review the revised eight-step workflow.

predictive_coding_4-0_web

The eight-step chart provides a model of the Predictive Coding 4.0 methods. (You may download and freely distribute this chart without further permission, so long as you do not change it.) The circular flows depict the iterative steps specific to the predictive coding features. Steps four, five and six iterate until the active machine training reaches satisfactory levels, and thereafter final quality control and productions are done.

Although presented as sequential steps for pedagogical purposes, Predictive Coding 4.0 is highly adaptive to circumstances and does not necessarily follow a rigid linear order. For instance, some of the quality control procedures are used throughout the search and review, and rolling productions can begin at any time.

CULLING.filters_SME_only_review

To fully understand the 4.0 method, it helps to see how it fits into an overall Dual-Filter Culling process. See License to Cull – The Two-Filter Document Culling Method (2015) (see illustrative diagram right). Still more information on predictive coding and electronic document review can be found in the over sixty articles published here on the topic since 2011. Reading helps, but we have found that the most effective way to teach this method, like any other legal method, is by hands-on guidance. Our eight-step workflow can be taught to any legal professional who already has experience with document review by the traditional second-chair type of apprenticeship training.

This final segment of our explanation of Predictive Coding 4.0 will include some of the videos that I made earlier this year describing our document review methods. See Document Review and Predictive Coding: an introductory course with 7 videos and 2,982 words. The first video below introduces the eight-step method. Once you get past my attempt at Star Wars humor in the opening credits of the video, you will hear my seven-minute talk. It begins with why I think predictive coding and other advanced technologies are important to the legal profession and how we are now at a critical turning point of civilization.

_______

_______

Step One – ESI Communications

Good review projects begin with ESI Communications; they begin with talking. You need to understand and articulate the disputed issues of fact. If you do not know what you are looking for, you will never find it. That does not mean you know of specific documents. If you knew that, it would not be much of a search. It means you understand what needs to be proven at trial and what documents will have impact on the judge and jury. It also means you know the legal bounds of relevance, including especially Rule 26(b)(1).

relevance_scope_2016

ESI Communications begin and end with the scope of the discovery, relevance and related review procedures. The communications are not only with opposing counsel or other requesting parties, but also with the client and the e-discovery team assigned to the case. These Talks should be facilitated by the lead e-Discovery specialist attorney assigned to the case. But they should include the active participation of the whole team, including trial lawyers who are not otherwise very involved in the ESI review.

The purpose of all of this Talk is to give everyone an idea as to the documents sought and the confidentiality protections and other special issues involved. Good lines of communication are critical to that effort. This first step can sometimes be difficult, especially if there are many new members to the group. Still, a common understanding of relevance, the target searched, is critical to the successful outcome of the search. This includes the shared wisdom that the understanding of relevance will evolve and grow as the project progresses.

We need to Talk to understand what we are looking for. What is the target? What is the information need? What documents are relevant? What would a hot document look like? A common understanding of relevance by a review team, of what you are looking for, requires a lot of communication. Silent review projects are doomed to failure. They tend to stagnate and do not enjoy the benefits of Concept Drift, where a team’s understanding of relevance is refined and evolves as the review progresses. Yes, the target may move, and that is a good thing. See: Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three.

Review projects are also doomed where the communications are one-way, lecture-down projects where only the SME talks. The reviewers must talk back, must ask questions. The input of reviewers is key. Their questions and comments are very important. Dialogue and active listening are required for all review projects, including ones with predictive coding.

You begin with analysis and discussions with your client, your internal team, and then with opposing counsel, as to what it is you are looking for and what the requesting party is looking for. The point is to clarify the information sought, the target. You cannot just stumble around and hope you will know it when you find it (and yet this happens all too often). You must first know what you are looking for. The target of most searches is the information relevant to disputed issues of fact in a case or investigation. But what exactly does that mean? If you encounter unresolvable disputes with opposing counsel on the scope of relevance, which can happen during any stage of the review despite your best efforts up-front, you may have to include the Judge in these discussions and seek a ruling.

Here is my video explaining the first step of ESI Communications.

________

______

“ESI Discovery Communications” is about talking to your review team, including your client and key witnesses; it is about talking to opposing counsel; and, eventually, if need be, talking to the judge at hearings. Friendly, informal talk is a good method to avoid the tendency to polarize and demonize “the other side,” to build walls and be distrustful and silent.

The amount of distrust today between attorneys is at an all-time high. This trend must be reversed. Mutually respectful talk is part of the solution. Slowing things down helps too. Do not respond to a provocative text or email until you calm down. Take your time to ponder any question, even if you are not upset. Take your time to research and consult with others first. This point is critical. The demand for instant answers is never justified, nor required under the rules of civil procedure. Think first and never respond out of anger. We are all entitled to mutual respect. You have a right to demand that. So do they.

This preference for not actually speaking with people in real time, in person, or by phone or video, is, to some extent, generational. Many younger attorneys seem to have an inherent loathing of the phone and speaking out loud. They let their thumbs do the talking. (This is especially true in e-discovery, where the professionals involved tend to be very computer oriented, not people oriented. I know because I am like that.) Meeting in person in real time is distasteful to many, not just Gen X. Many of us prefer to put everything in emails and texts and tweets and posts, etc. That may make it easier to pause to reflect, especially if you are loath to say in person that you do not know and will need to get back to them on that. But real-time talking is important to full communication. You may need to force yourself into real-time interpersonal interactions. Some people are better at real-time talk than others, just like some are better at fast comprehension of documents than others. It is often a good idea for a team to have a designated talker, especially when it comes to speaking with opposing counsel or the client.

In e-discovery, where the knowledge levels are often extremely different, with one side knowing far more about the subject than the other, the first step of ESI Communications or Talk usually requires patient explanations. ESI Communications often require some amount of educational effort by the attorneys with greater expertise. The trick is to do that without being condescending or too pedantic, and, in my case at least, without losing your patience.

predictive_coding_4-0_8-steps_ist

Some object to the whole idea of helping opposing counsel by educating them, but the truth is, this helps your clients too. You are going to have to explain everything when you take a dispute to the judge, so you might as well start upfront. It helps save money and moves the case along. Trust building is a process best facilitated by honest, open talk.

I use the term Talk to invoke the term listen as well. That is one reason we also refer to the first step as “Relevance Dialogues,” because that is exactly what it should be, a back and forth exchange. Top down lecturing is not intended here. Even when a judge talks, where the relationship is truly top down, the judge always listens before rendering his or her decision. You are given the right to be heard at a hearing, to talk and be listened to. Judges listen a lot and usually ask many questions. Attorneys should do the same. Never talk just to hear the sound of your own voice. As Judge David Waxse likes to say, talk to opposing counsel as if the judge were listening.

The same rules apply when communicating about discovery with the judge. I personally prefer in-person hearings, or at least telephonic ones, as opposed to just throwing memos back and forth. This is especially true when the memorandums have very short page limits. Dear Judges: e-discovery issues are important and can quickly spiral out of control without your prompt attention. Please give us the hearings and time needed. Issuing easy orders that just split the baby will do nothing but pour gas on a fire.

In my many years of lawyering I have found that hearings and meetings are much more effective than exchanging papers. Dear brothers and sisters in the BAR: stop hating, stop distrusting and vilifying, and start talking to each other. That means listening too. Understand the other-side. Be professional. Try to cooperate. And stop taking extreme positions that assume the judge will just split the baby. 

It bears emphasis that by Talk in this first step we intend dialogue, a true back and forth. We do not intend argument, nor winners and losers. We do intend mutual respect. That includes respectful disagreement, but only after we have heard each other out and understood our respective positions. Then, if our talks with the other side have reached an impasse, at least on some issues, we request a hearing from the judge and set out the issues for the judge to decide. That is how our system of justice and discovery is designed to work. If you fail to talk, you not only doom the document review project, you doom the whole case to unnecessary expense and frustration.

Richard Braman

This dialogue method is based on a Cooperative approach to discovery that was promoted by the late, great Richard Braman of The Sedona Conference. Cooperation is not only a best practice, but is, to a certain extent, a minimum standard required by the rules of professional ethics and civil procedure. The primary goal of these dialogues for document review purposes is to obtain a common understanding of the e-discovery requests and reach agreement on the scope of relevancy and production.

ESI Communications in this first step may, in some cases, require disclosure of the actual search techniques used, which are traditionally protected by work product. The disclosures may also sometimes include limited disclosure of some of the training documents used, typically just the relevant documents. See Judge Andrew Peck’s 2015 ruling on predictive coding, Rio Tinto v. Vale, 2015 WL 872294 (March 2, 2015, SDNY). In Rio Tinto Judge Peck wisely modified somewhat his original views on the issue of disclosure stated in Da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012) (approved and adopted in Da Silva Moore v. Publicis Groupe, 2012 WL 1446534, at *2 (S.D.N.Y. Apr. 26, 2012)). Judge Peck no longer thinks that parties should necessarily disclose any training documents, and may instead:

… insure that training and review was done appropriately by other means, such as statistical estimation of recall at the conclusion of the review as well as by whether there are gaps in the production, and quality control review of samples from the documents categorized as non-responsive. See generally Grossman & Cormack, Comments, supra, 7 Fed. Cts. L.Rev. at 301-12.

The Court, however, need not rule on the need for seed set transparency in this case, because the parties agreed to a protocol that discloses all non-privileged documents in the control sets. (Attached Protocol, ¶¶ 4(b)-(c).) One point must be stressed — it is inappropriate to hold TAR to a higher standard than keywords or manual review. Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.

Id. at *3. Also see Rio Tinto v. Vale, Stipulation and Order Re: Revised Validation and Audit Protocols for the use of Predictive Coding in Discovery, 14 Civ. 3042 (RMB) (AJP), (order dated 9/2/15 by Maura Grossman, Special Master, and adopted and ordered by Judge Peck on 9/8/15).

Judge Peck here follows the current prevailing view on disclosure, which I also endorse: disclose the relevant documents used in active machine learning, but not the irrelevant documents used in training. If there are borderline, grey area documents classified as irrelevant, you may need to disclose these types of documents by description, not actual production. Again, talk to the requesting party about where you are drawing the line. Talk about the grey area documents that you encounter. If they disagree, ask for a ruling before your training is complete.

grey_area_disclosure

The goals of Rule 1 of the Federal Rules of Civil Procedure (just, speedy and inexpensive) are impossible to attain, in all phases of litigation, not just discovery, unless attorneys communicate with each other. The parties may hate each other and refuse to talk. That sometimes happens. But the attorneys must be above the fray. That is a key purpose and function of an attorney in a dispute. It is sad that so many attorneys do not seem to understand that. If you are faced with such an attorney, my best advice is to lead by example, document the belligerence and seek the help of your presiding judge.

Although Talk with opposing counsel is important, even more important is talk within the team. It is an important method of quality control and efficient project management. Everyone needs to be on the same page of relevance and discoverability. Work needs to be coordinated. Internal team Talk needs to be very close. Although a Vulcan mind meld might be ideal, it is not really necessary. Still, during a project a steady flow of talk, usually in the form of emails or chats, is normal and efficient. Clients should never complain about time spent communicating to manage a document review project. It can save a tremendous amount of money in the long run, so long as it is focused on the task at hand.

Step Two – Multimodal ECA

Multimodal Early Case Assessment – ECA – summarizes the second step in our 8-step workflow. We used to call the second step “Multimodal Search Review.” It is still the same activity, but we tweaked the name to emphasize the ECA significance of this step. After we have an idea of what we are looking for from ESI Communications in step one, we start to use every tool at our disposal to try to find the relevant documents. Every tool, that is, except for active machine learning. Our first look at the documents is our look, not the machine’s. That is not because we do not trust the AI’s input. We do. It is because there is no AI yet. The predictive coding only begins after you feed training documents into the machine. That happens in step four.

predictive_coding_4-0_web

Our Multimodal ECA step two does not take that long, so the delay in bringing in our AI is usually short. In our experiments at TREC in 2015 and 2016 under the auspices of NIST, where we skipped steps three and seven to save time, and necessarily had little in the way of ESI Communications in step one, we would often complete simple document reviews of several hundred thousand documents in just a few hours. We cannot match these results in real-life legal document review projects because the issues in lawsuits are usually much more complicated than the issues presented by most topics at TREC. Also, in a real legal project we cannot take the kinds of risks, and accept the kinds of mistakes, that we did in an academic event like TREC.

Again, the terminology revision to say Multimodal ECA is more a change of style than substance. We have always worked in this manner. The name change is just to better convey the idea that we are looking for the low-hanging fruit, the easy-to-find documents. We are getting an initial assessment of the data by using all of the tools of the search pyramid except for the top tier, active machine learning. The AI comes into play soon enough in steps four and five, sometimes as early as the same day.

search_pyramid_revised

I have seen projects where key documents are found during the first ten minutes of looking around. Usually the secrets are not revealed so easily, but it does happen. Step two is the time to get to know the data and run some obvious searches, including any keyword requests from opposing counsel. You use the relevant and irrelevant documents you find in step two as the documents you select in step four to train the AI.

In the process of this initial document review you start to get a better understanding of the custodians, their data and relevance. This is what early case assessment is all about. You will find the rest of the still hidden relevant documents in the iterated rounds of machine training and other searches that follow. Here is my video description of step two.

______

_______

Although we speak of searching for relevant documents in step two, it is important to understand that many irrelevant documents are also incidentally found and coded in that process. Active machine learning does not work by training on relevant documents alone. It must also include examples of irrelevant documents. For that reason we sometimes actively search for certain kinds of irrelevant documents to use in training. One of our current research experiments with Kroll Ontrack is to determine the best ratios between relevant and irrelevant documents for effective document ranking. See TREC reports at Mr. EDR as updated from time to time. At this point we have that issue nailed.

The multimodal ECA review in step two is carried out under the supervision of the Subject Matter Experts on the case. They make final decisions where there is doubt concerning the relevance of a document or document type. The SME role is typically performed by a team, including the partner in charge of the case – the senior SME – senior associates, and the e-Discovery specialist attorney(s) assigned to the case. It is, or should be, a team effort, at least in most large projects. As previously described, the final call on scope is made by the senior SME, who in turn acts as a predictor of the court’s views. The final, final authority is always the Judge. The chart below summarizes the analysis of the SME and judge on the discoverability of any document. See Predictive Coding 4.0, Part Five.

relevance_scope_2016

When I do a project, acting as the e-Discovery specialist attorney for the case, I listen carefully to the trial lawyer SME as he or she explains the case. By extensive Q&A the members of the team understand what is relevant. We learn from the SME. It is not exactly a Vulcan mind-meld, but it can work pretty well with a cohesive team.  Most trial lawyers love to teach and opine on relevance and their theory of the case.

Helmuth Karl Bernhard von Moltke

General Moltke

Although a good SME team communicates and plans well, they also understand, typically from years of experience, that the intended relevance scope is like a battle plan before the battle. As the famous German military strategist General Moltke the Elder said, no battle plan ever survives contact with the enemy. So too, no relevance scope plan ever survives contact with the corpus of data. The understanding of relevance will evolve as the documents are studied, the evidence is assessed, and understanding of what really happened matures. If not, someone is not paying attention. In litigation that is usually a recipe for defeat. See Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three.

Army of One: Multimodal Single-SME Approach To Machine Learning

The SME team trains and supervises the document review specialists, aka contract review attorneys, who usually then do a large part of the manual review (step six), and few if any searches. Working with review attorneys is a constant iterative process where communication is critical. Although I sometimes use an army-of-one approach where I do everything myself (that is how I did the EDI Oracle competition and most of the TREC topics), my preference now is to use two or three reviewers to help with the document review. With good methods, including culling methods, and good software, it is rarely necessary to use more reviewers than that. With the help of strong AI, such as that included in Mr. EDR, we can easily classify a million or so documents for relevance with a team of that size. More reviewers than that may well be needed for complex redaction projects and other production issues, but not for a well-designed first-pass relevance search.

One word of warning when using document reviewers: it is very important for all members of the SME team to have direct and substantial contact with the actual documents, not to leave that contact to the reviewers alone. For instance, everyone involved in the project should see all hot documents found in any step of the process. It is especially important for the SME trial lawyer at the top of the expert pyramid to see them, but that is rarely more than a few hundred documents, often just a few dozen. Beyond that, the top SME need only see the novel and grey area documents that are encountered, where it is unclear on which side of the relevance line they should fall in accord with the last instructions. Again, the burden on the senior, and often technologically challenged, SME attorneys is fairly light under these Version 4.0 procedures.

The SME team relies on a primary SME, who is typically the trial lawyer in charge of the whole case, including all communications on relevance to the judge and opposing counsel. Thereafter, the head SME is sometimes only consulted on an as-needed basis to answer questions and make specific decisions on the grey area documents. There are always a few uncertain documents that need elevation to confirm relevance, but as the review progresses, their number usually decreases, and so the time and attention of the senior SME decreases accordingly.

Step Three – Random Prevalence

There has been no change in this step from Version 3.0 to Version 4.0. The third step, which is not necessarily chronological, is essentially a computer function with statistical analysis. Here you create a random sample and analyze the results of expert review of the sample. Some review is thus involved in this step, and you have to be very careful that it is correctly done. This sample is taken for statistical purposes to establish a baseline for quality control in step seven. Typically prevalence calculations are made at this point. Some software also uses this random sampling selection to create a control set. As explained at length in Predictive Coding 3.0, we do not use a control set because it is so unreliable. It is a complete waste of time and money and does not produce reliable recall estimates. Instead, we take a random sample near the beginning of a project solely to get an idea of Prevalence, meaning the approximate number of relevant documents in the collection.

predictive_coding_4-0_web

Unless we are in a very rushed situation, such as in the TREC projects, where we would do a complete review in a day or two, or sometimes just a few hours, we like to take the time for the sample and prevalence estimate.

It is all about getting a statistical idea as to the range of relevant documents that likely exist in the data collected. This is very helpful for a number of reasons, including proportionality analysis (importance of the ESI to the litigation and cost estimates) and knowing when to stop your search, which is part of step seven. Knowing the number of relevant documents in your dataset can be very helpful, even if that number is a range, not exact. For example, you can know from a random sample that there are between four thousand and six thousand relevant documents. You cannot know there are exactly five thousand relevant documents. See: In Legal Search Exact Recall Can Never Be Known. Still, knowledge of the range of relevant documents (red in the diagram below) is helpful, albeit not critical to a successful search.

Prevalence_binomial_gaussian

In step three an SME is only needed to verify the classifications of any grey area documents found in the random sample. The random sample review should be done by one reviewer, typically your best contract reviewer. They should be instructed to code as Uncertain any documents that are not obviously relevant or irrelevant based on their instructions and step one. All relevance codings should be double checked, as well as Uncertain documents. The senior SME is only consulted on an as-needed basis.

Document review in step three is limited to the sample documents. Aside from that, this step is a computer function and mathematical analysis. Pretty simple after you do it a few times. If you do not know anything about statistics, and your vendor is also clueless on this (rare), then you might need a consulting statistician. Most of the time this is not necessary and any competent Version 4.0 vendor expert should be able to help you through it.

It is not important to understand all of the math, just that random sampling produces a range, not an exact number. If your sample size is small, then the range will be very wide. If you want to cut that range, known in statistics as the confidence interval, in half, you have to roughly quadruple your sample size. This is a general rule of thumb that I explained in tedious mathematical detail several years ago in Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be Using Random Sampling By 2022. Our Team likes to use a fairly large sample size of about 1,533 documents, which creates a confidence interval of plus or minus 2.5%, subject to a confidence level of 95% (meaning the true value will lie within that range 95 times out of 100). More information on sample size is summarized in the graph below. Id.
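For readers who want to see the arithmetic, here is a minimal sketch of the standard sample-size formula for a proportion, n = z²·p(1−p)/E², written in Python. The function name and example calls are mine, not from any vendor's software. With the most conservative assumption of 50% prevalence it yields about 1,537 documents for a plus or minus 2.5% interval at 95% confidence, close to the 1,533 figure used above, and it illustrates the quadrupling rule of thumb.

```python
import math

def sample_size(interval=0.025, z=1.96, prevalence=0.5):
    """Standard sample-size formula for a proportion: n = z^2 * p(1-p) / E^2.
    interval is the half-width (0.025 means +/- 2.5%), z is the score for the
    confidence level (1.96 for 95%), prevalence of 0.5 is the most conservative."""
    return math.ceil(z**2 * prevalence * (1 - prevalence) / interval**2)

print(sample_size(0.05))    # ~385 documents for a +/- 5% interval
print(sample_size(0.025))   # ~1,537 documents -- halving the interval roughly
                            # quadruples the required sample size
```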

random_size_graph

The picture below this paragraph illustrates a data cloud where the yellow dots are the sampled documents from the grey dot total, and the hard to see red dots are the relevant documents found in that sample. Although this illustration is from a real project we had, it shows a dataset that is unusual in legal search because the prevalence here was high, between 22.5% and 27.5%. In most data collections searched in the law today, where the custodian data has not been filtered by keywords, the prevalence is far less than that, typically less than 5%, maybe even less than 0.5%. Low prevalence increases the range size and the uncertainties, and requires a binomial calculation adjustment to determine the statistically valid confidence interval, and thus the true document range.

data-visual_RANDOM_2

For example, in a typical legal project with a few percent prevalence range, it would be common to see a range between 20,000 and 60,000 relevant documents in a 1,000,000 collection. Still, even with this very large range, we find it useful to at least have some idea of the number of relevant documents that we are looking for. That is what the Baseline step can provide to you, nothing more nor less.
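To make that concrete, here is a minimal sketch, using illustrative numbers of my own, of one common binomial adjustment, the Wilson score interval, that turns a sample result into a document range. This is not any vendor's implementation, just the general idea of how a low-prevalence sample becomes a range of relevant documents.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a sample proportion -- one common binomial
    adjustment that behaves better than the normal approximation at low
    prevalence. Returns (low, high) bounds on the true proportion."""
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

collection_size = 1_000_000
hits, sample = 61, 1_533            # illustrative: 61 relevant documents in the sample
low, high = wilson_interval(hits, sample)
print(f"prevalence range: {low:.2%} to {high:.2%}")
print(f"document range:   {int(low * collection_size):,} to {int(high * collection_size):,}")
```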

As mentioned, your vendor can probably help you with these statistical estimates. Just do not let them tell you that it is one exact number. It is always a range. The one number approach is just a shorthand for the range. It is simply a point projection near the middle of the range. The one number point projection is the top of the typical probability bell curve shown right, which illustrates a 95% confidence level distribution. The top is just one possibility, albeit slightly more likely than the end points. The true value could be anywhere in the blue range.

To repeat, the step three prevalence baseline number is always a range, never just one number. Going back to the relatively high prevalence example, the bell curve below shows a point projection of 25% prevalence, with a range of 22.5% to 27.5%, creating a range of from 225,000 to 275,000 relevant documents. This is shown below.

25_bell-curve-Standard_deviation_diagram

The important point that many vendors and other “experts” often forget to mention is that you can never know exactly where within that range the true value may lie. Plus, there is always a small possibility, 5% when using a sample size based on a 95% confidence level, that the true value may fall outside of that range. The collection may, for example, contain only 200,000 relevant documents. This means that even in a high prevalence project where the dataset approaches 50% prevalence (meaning half of the documents are relevant), you can never know that there are exactly 250,000 relevant documents just because that is the mid-point or point projection. You can only know that there are between 225,000 and 275,000 relevant documents, and even that range may be wrong 5% of the time. Those uncertainties are inherent limitations of random sampling.

Shame on the vendors who still perpetuate that myth of certainty. Lawyers can handle the truth. We are used to dealing with uncertainties. All trial lawyers talk in terms of probable results at trial, and risks of loss, and often calculate a case’s settlement value based on such risk estimates. Do not insult our intelligence with a simplification of statistics that is plain wrong. Reliance on such erroneous point projections alone can lead to incorrect estimates of the level of recall attained in a project. We do not need to know the math, but we do need to know the truth.

The short video that follows will briefly explain the Random Baseline step, but does not go into the technical details of the math or statistics, such as the use of the binomial calculator for low prevalence. I have previously written extensively on this subject. See for instance:

Byte and Switch

If you prefer to learn stuff like this by watching cute animated robots, then you might like: Robots From The Not-Too-Distant Future Explain How They Use Random Sampling For Artificial Intelligence Based Evidence Search. But be careful, their view is version 1.0 as to control sets.

Thanks again to William Webber and other scientists in this field who helped me out over the years to understand the Bayesian nature of statistics (and reality).

_______

_____

To be continued …


Predictive Coding 3.0 – The method is here described as an eight-part work flow

October 18, 2015

This is the second part of a two-part article. Part One of Predictive Coding 3.0 described the errors in Predictive Coding 1.0 and 2.0, errors that are corrected by 3.0. The primary error addressed was the fallacy of the secret control set. The control set is very much accepted dogma among most e-discovery vendors and their hired experts. Still, after Part One came out, a few well-known experts spoke publicly in support of my anti-vendor-establishment critique. Many others have written to me privately to say they agree that control sets are b.s., and that they have never used them, but few want to wade into the controversy. That never stopped me, especially where the attainment of just legal processes is concerned. Still, criticisms are easy. The articulation of positive replacements is the real challenge, and that is what this Part Two addresses.

____________

This concluding segment describes the Predictive Coding 3.0 methodology in terms of an eight-step work flow. Steps four, five and six iterate until the active machine training reaches satisfactory levels, and thereafter final quality control and productions are done. Although presented as sequential steps for teaching purposes, Predictive Coding 3.0 is highly adaptive to circumstances and does not necessarily follow a rigid linear order. For instance, some of the quality control procedures are used throughout the search and review, and rolling productions can begin at any time. Also, the truth is, the work flow is far easier to do than it is to put into words. I have only rarely been the smartest guy in the room (and they were usually small rooms in rural Florida where I live and went to school) and so, if I can do all of this, then you can too. It is easier than it looks. It just takes some practice and experience. A good guide is also very helpful at first.

  ______________

Eight-Step Work Flow of Predictive Coding 3.0

predictive_coding_3.0

The eight-step chart provides a model of the Predictive Coding 3.0 methodology. The circular flows depict the iterative steps specific to the predictive coding features. (You may download and freely distribute this chart without further permission, so long as you do not change it.) For background on how to plan for a complex predictive coding document review project, see Form Plan of a Predictive Coding Project. The plan consists of a detailed Outline for the project. To understand the 3.0 method, you also need to understand how it fits into an overall Dual-Filter Culling process. See License to Cull: The Two-Filter Document Culling Method (2015).

The overall process is not nearly as complicated as versions 1.0 and 2.0, which Grossman and Cormack criticized in their patent claims. See the end of Part One of Predictive Coding 3.0 where this is discussed. I have found that it can be taught to any experienced lawyer in a second-chair type of hands-on training. Mere intellectual descriptions, as I am doing here, and have done before in the more than fifty articles on the subject, can serve as a good preparation for effective apprenticeship training. The following is a full description of the work flow. It should look very familiar to prior readers of my articles on predictive coding. It is consistent with these prior articles, but has several important refinements and improvements that have emerged from my ongoing research and legal practice experience.

Step One: ESI Discovery Communications 

The process starts with ESI Discovery Communications, not only with opposing counsel or other requesting parties, but also with the client and within the e-discovery team assigned to the case. Analysis of the scope of the discovery, and clear communications on relevance and other review procedures, are critical to all successful project management. The ESI Discovery Communications should be facilitated by the lead e-Discovery specialist attorney assigned to the case. But they must include the active participation of the whole team, including all trial lawyers not otherwise very involved in the ESI review. These communications are facilitated by a master plan, the details of which are refined in these initial communications. See, e.g., Form Plan of a Predictive Coding Project. Since nobody seems to have Spock’s Vulcan mind-meld abilities, this first step can sometimes be difficult, especially if there are many new members to the group. Still, a common understanding of relevance, the target searched, is critical to the successful outcome of the search. This includes the shared wisdom that this understanding will evolve and grow as discussed in Part One of this essay.

You begin with analysis and discussions with your client and your internal team, and then with opposing counsel, as to what you are looking for and what the requesting party is looking for. The point is to clarify the information sought, the target. You cannot just stumble around and hope you will know it when you find it (and yet this happens all too often in legal search). You must first know what you are looking for. The target of most searches is the information relevant to disputed issues of fact in a case or investigation. But what exactly does that mean? If you encounter unresolvable disputes with opposing counsel on the scope of relevance, which can happen during any stage of the review despite your best efforts up-front, you may have to include the Judge in these discussions and seek a ruling.

This dialogue approach is based on a Cooperative approach to discovery that was popularized by the late, great Richard Braman of the Sedona Conference. Cooperation is not only a best practice, but is, to a certain extent at least, a minimum standard required by the rules of professional ethics and civil procedure. The primary goal of these dialogues for Predictive Coding purposes is to obtain a common understanding of the e-discovery requests, and to reach agreement on the scope of relevancy and production. Additional conferences on other e-discovery issues are also key to attaining the now strongly rule-endorsed doctrine of proportionality.

The dialogues in this first step may, in some cases, require disclosure of the actual search techniques used, which is traditionally protected by work product. The disclosures may also sometimes include limited disclosure of some of the training documents used, both relevant and irrelevant. Nothing in the rules requires disclosure of irrelevant ESI, but if adequate privacy protections are provided, it may be in the best interests of all parties to do so. Such discretionary disclosures may be advantageous as risk mitigation and efficiency tactics. If an agreement on search protocol is reached by the parties, or imposed by the court, the parties are better protected from the risk of expensive motion practice and repetitions of discovery search and production. Agreement on search protocols can also be used to implement bottom-line-driven proportional review practices. See, e.g., the first case approving predictive coding search protocols by Judge Andrew Peck: Da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012) (approved and adopted in Da Silva Moore v. Publicis Groupe, 2012 WL 1446534, at *2 (S.D.N.Y. Apr. 26, 2012)), and the many cases thereafter that followed Da Silva.

Also see Judge Andrew Peck’s more recent ruling on predictive coding, especially concerning disclosures: Rio Tinto v. Vale, 2015 WL 872294 (S.D.N.Y. March 2, 2015). Here Judge Peck wisely modifies somewhat his original views stated in Da Silva on the issue of disclosure. He no longer thinks that parties should necessarily disclose training documents, and may instead:

… insure that training and review was done appropriately by other means, such as statistical estimation of recall at the conclusion of the review as well as by whether there are gaps in the production, and quality control review of samples from the documents categorized as non-responsive. See generally Grossman & Cormack, Comments, supra, 7 Fed. Cts. L.Rev. at 301-12.

The Court, however, need not rule on the need for seed set transparency in this case, because the parties agreed to a protocol that discloses all non-privileged documents in the control sets. (Attached Protocol, ¶¶ 4(b)-(c).) One point must be stressed — it is inappropriate to hold TAR to a higher standard than keywords or manual review. Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.

Id. at *3. Also see Rio Tinto v. Vale, Stipulation and Order Re: Revised Validation and Audit Protocols for the use of Predictive Coding in Discovery, 14 Civ. 3042 (RMB) (AJP), (order dated 9/2/15 by Maura Grossman, Special Master, and adopted and ordered by Judge Peck on 9/8/15).

Judge Peck here follows the current prevailing view on disclosure that I also endorse, a view entirely in accord with Predictive Coding 3.0. Note that the review of quality control samples is specified in Step Seven, ZEN Quality Assurance Tests, of the 3.0 methodology. The clear trend today is away from full disclosure, especially for irrelevant documents. Counsel is advised to offer translucency, not transparency, and to run quality control tests of the efficacy of their work. The cooperative approach to discovery may sometimes require partial disclosure of relevant documents used for training, but only partial or otherwise limited disclosure of irrelevant documents used in training. Still, the polestar remains cooperation, a goal totally consistent with the protection of client rights and interests. Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.).

Step Two: Multimodal Search Review

In this step all types of search methods are used to try to find as many relevant documents as possible for the training rounds. In version 3.0 the documents found by the multimodal search methods in Step Two are selected by human judgment, not by random samples. The selections are made with the help of various software search features, including parametric Boolean keyword searches, similarity searches, concept searches, and even strategic linear reviews of select custodians and date ranges. Documents outside of the dataset, such as subpoenas or complaints, may be included for training purposes too; even synthetic documents may be used as ideal exemplars.

All types of searches are used in Step Two except for predictive coding based searches. The only reason they are not used here in Step Two is that you have not yet started the predictive coding training. Step Two is the search for the initial training set. It can be a long process, or a very short one. The same multimodal search process is carried out in Step Six, Hybrid Active Training, but by then predictive coding is also used. So in that sense Step Six is where full multimodal search comes into play, including interaction with the AI you are training (that is the Hybrid part).
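As a purely illustrative aside, here is one minimal sketch of what a similarity search can look like under the hood: a TF-IDF cosine-similarity ranking of a toy corpus against a seed exemplar. This is not Kroll Ontrack's or any other vendor's implementation, and every document and name in it is made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative only: a handful of toy "documents" standing in for the corpus.
corpus = [
    "Q3 revenue projections attached, please keep confidential",
    "lunch menu for the office party next Friday",
    "revised revenue forecast per our call, numbers still confidential",
    "fantasy football league standings",
]
exemplar = "confidential revenue forecast spreadsheet"   # e.g., drawn from the complaint

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(corpus)
scores = cosine_similarity(vectorizer.transform([exemplar]), doc_matrix)[0]

# Rank the corpus by similarity to the exemplar and batch the best out for review.
for score, text in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.2f}  {text}")
```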

Although we speak of searching for relevant documents in Steps Two and Six, it is important to understand that many irrelevant documents are also incidentally found and coded in that process. Active machine learning does not work by training on relevant documents alone. It must also include examples of irrelevant documents. For that reason we sometimes actively search in Steps Two and Six for certain kinds of irrelevant documents to use in training. One of my current research experiments with Kroll Ontrack is to determine the best ratios between relevant and irrelevant documents for effective document ranking. See TREC reports at Mr. EDR as updated from time to time. This is one area where experience, art and skill now come into play, but we are working on standardizing that.

The multimodal search review in Steps Two and Six is carried out under the very general, second level supervision of the Subject Matter Experts on the case. They make final decisions where there is doubt concerning the relevance of a document or document type. The SME role is typically performed by a team, including the partner in charge of the case – the senior SME – and senior associates, and e-Discovery specialist attorney(s) assigned to the case. It is, or should be, a team effort, at least in most large projects.

The old-fashioned Predictive Coding 1.0 and 2.0 notions that a senior partner must work alone as the sole SME, and that he or she has to toil for days reviewing thousands of documents, including random junk files in a supposed control set, are not part of Predictive Coding 3.0. With no control set there is no need for such an inefficient process. Under my system a well-managed project has no SME time-demand problem. When I do a project, acting as the e-Discovery specialist attorney for the case, I listen carefully to the trial lawyer SME as he or she explains the case. By extensive Q&A the members of the team come to understand what is relevant. We learn from the SME. It is not exactly a Vulcan mind-meld, but it can work pretty well with a cohesive team. Most trial lawyers love to teach and opine on relevance and their theory of the case.

Although a good SME team communicates and plans well, they also understand, typically from years of experience, that the intended relevance scope is like a battle plan before the battle: no battle plan ever survives contact with the enemy. So too, no relevance scope plan ever survives contact with the corpus of data. The understanding of relevance will evolve as the documents are studied, the evidence is assessed, and the understanding of what really happened matures. If not, someone is not paying attention. In litigation that is usually a recipe for defeat.

The SME team trains and supervises the document review specialists, aka, contract review attorneys, who usually then do a large part of the manual reviews (Step-Five), and few if any searches. Working with review attorneys is a constant iterative process where communication is critical. Although contract reviewers can be used for efficiency and money-saving purposes, instead of an army-of-one approach that I have also used, I typically use only a few reviewers, say from one to three. With good methods, including culling methods, and good software, it is rarely necessary to have more than that. With the help of strong AI, say that included in Mr. EDR, no more attorneys than that are needed to classify a million or so documents for relevance. More reviewers than that may well be needed for complex redaction projects and other production issues, but not for a well-designed relevance search.

When reviewers are used in relevance culling, it is very important for all members of the SME team to have direct and substantial contact with the actual documents, not just the reviewers. For instance, everyone involved in the project should see all hot documents found in any step of the process. It is especially important for the SME trial lawyer at the top of the expert pyramid to see them, but that is rarely more than a few hundred documents, often just a few dozen. Otherwise, the top SME need only see the novel and grey area documents that are encountered, where it is unclear on which side of the relevance line they should fall in accord with the last instructions. Again, the burden on the senior, and often technologically challenged, SME attorneys is fairly light under these Version 3.0 procedures.

The hands-on involvement of the entire SME team is especially needed in the second step, Multimodal Search, and its echo, Step Six, but is otherwise limited. The SME involvement up-front is needed to ensure that proper expertise is provided on relevance and the expected story to be told at trial. In some projects, at least one contract lawyer is brought in at Step Two to assist the SME team, and then later helps in the training of additional reviewers when they are included in Step Five. The e-Discovery specialist with expertise and experience with search, the Experienced Searcher, along with an expert on the software being used, the Power-User, should be involved in all stages of the project. Often these two roles (Power User and Experienced Searcher) are performed by one search expert, but rarely is that person also the sole SME of the legal issues. (I performed all three roles in the EDI Oracle experiment, but that was a rare exception.) In most real-world projects a team approach to the SME function is used. Still, the Experienced Searcher should always be a part of that SME team, if for no other reason than to ensure that the full communications outlined in Step One are maintained throughout the project.

The SME team relies on a primary SME, who is typically the trial lawyer in charge of the whole case, including all arguments of relevance to the judge and opposing counsel, at the start of the review. Thereafter, the head SME is only consulted on an as-needed basis to answer questions and make specific decisions on the grey area documents, again, typically in the echo Step Six, Hybrid Active Training, and Step Five, Document Review, as questions are raised by reviewers. There are always uncertain documents that need elevation to confirm relevance, but as the review progresses, their number usually decreases, and so the time and attention of the senior SME decreases accordingly.

The first round of machine training is also sometimes called the initial Seed Set Build, but under 3.0 there is nothing special about it. The following training rounds are identified by number (assuming you even keep track of them at all), such as the second round of training, the third, etc. The only thing special about the first round of training is that it cannot include rank-based document searches because no predictive coding ranking has yet occurred. The ranking of documents according to probable relevance is only established by the machine training. So, of course, it cannot be used before the first training begins. It is instead used in Step Six, Hybrid Active Training.

Personally, I like to keep track of and control when the training happens, as opposed to having training running continuously in the background. That is where art and experience again come in. It is also where the man-machine hybrid aspects of my search methods come in. I like to see the impact on ranking of particular training documents. I like to see how it impacts the learning of Mr. EDR. If it is always on, you cannot really see it on a granular, document by document level. The conscious knowledge of training rounds is not a mandatory aspect of Predictive Coding 3.0, but it does help me to maintain a close hybrid relationship with the AI in the software, the ghost in the machine. This is one of the things, for me at least, that makes predictive coding so much fun. Working with Mr. EDR can be a real blast. I hope to explain this a little better later in this essay, and in other essays that I plan to write in the future on the joys of predictive coding.

Step Three: Random Baseline

The third step, which is not necessarily chronological, is essentially a computer function with statistical analysis. Here you create a random sample and analyze the results of expert review of the sample. Some review is thus involved in this step and you have to be very careful that it is correctly done. This sample is taken for statistical purposes to establish a baseline for quality control purposes in Step Seven. Typically prevalence calculations are made at this point. Some software also uses this random sampling selection for purposes of control set creation. As explained in Part One, Predictive Coding 3.0 does not use a control set, because it is so unreliable. In version 3.0 the sole purpose of the sample is to determine prevalence. Also see: In Legal Search Exact Recall Can Never Be Known. This can help guide your review and help you decide when to stop training and move from the last iterative cycle of Step Six into Step Seven – ZEN Quality Assurance Tests.

In Step Three an SME is only needed to verify the classifications of any grey area documents found in the random sample. The random sample review should be done by one reviewer, typically your best contract reviewer. They should be instructed to code as Uncertain any documents that are not obviously relevant or irrelevant based on their instructions and Step One. All relevance codings should be double checked, as well as Uncertain documents. The senior SME is only consulted on an as-needed basis.

Document review in Step Three is limited to the sample documents. Aside from that, this step is a computer function and mathematical analysis. Pretty simple after you do it a few times. In complex cases a consulting statistician or scientist might be needed for a short consult, especially if you want to go beyond simple random sampling and do stratification, or some other complex variation. Most of the time this is not necessary and any competent version 3.0 vendor expert should be able to help you through it.

Step Four: AI Predictive Ranking

This is the Auto Coding Run where the software’s predictive coding calculations are performed. The software I use, at least most of the time, is Kroll Ontrack’s Mr. EDR. In the Fourth Step the software does all of the work. It applies all of the training provided by the lawyers to sort the data corpus according to their instructions. In Step Four the human trainers can take a coffee break while Mr. EDR ranks all of the documents for us according to probable relevance, or whatever other category we request. For instance, I usually like to train and rank on Highly Relevant and Privilege at the same time as plain Relevant – Irrelevant.

The first training run used to be called the seed set training. Step Four repeats, with Steps Five and Six, in an iterative process, which is also known as Continuous Active Learning (CAL). The first repetition of the training is known as the second round of training, the next the third round, etc. These iterations continue until the training is complete within the proportional constraints of the case. At that point the attorney in charge of the search may declare the search complete and ready for the next quality assurance test in Step Seven.

It is important to understand that this diagram is just a linear two-dimensional representation of Predictive Coding 3.0 for teaching purposes. These step descriptions are also a simplified explanation, at least to some extent. Step Four can take place just as soon as a single document has been coded. You could have continuous, ongoing machine training, all the time, if you wanted. That is what CAL means. Although it would be inefficient, you could in theory have as many rounds of training as there are documents reviewed and classified. In my TREC experiments with Mr. EDR, we would sometimes have over fifty rounds of training, and still complete the Topic review in just over a day.

As mentioned, I personally do not like the machine to train at arbitrarily set time intervals, which is the way most continuous training CAL 2.0 software does it (e.g., every fifteen minutes). I like to be in control and to tell the machine exactly when and if to train. I do that to improve communication and understanding of the software ranking. It helps me to have a better intuitive understanding of the machine processes. It allows me to see for myself how a particular document, or usually a particular group of documents, impacts the overall ranking. This is an important part of the Hybrid aspects of the Predictive Coding 3.0 Hybrid Multimodal Method.

Step Four in the eight-step workflow is a purely algorithmic function. The ranking of a million documents may take as long as an hour, or even more, depending on the complexity, the number of documents, and the software. Or it might just take a few minutes. This depends on the circumstances and tasks presented. From the human trainer perspective Step Four is just a slight break to relax and keep the mind clear, while the computer does all of the work.

The predictive coding software in this step analyzes all of the document categorizations made in Steps Two and Three for the initial run, the seed set. Thereafter, in all subsequent training rounds when Step Four repeats, the Machine, for me Mr. EDR, not only uses the input from Steps Two and Three, but also the new documents reviewed in Step Five, and those found and selected for training in Step Six. Note that skilled searchers rarely use all coded documents as training documents, and that is where the art and experience of search come in again. The concern is to avoid over-training on any one document type and thus lowering recall and missing a key black-swan document. There is also the question of the ideal relevance/irrelevance ratio for effective document ranking.

All documents selected for training are included in this Step Four computer processing. The software studies the documents marked for training, and then scans all of the data uploaded onto the review platform (aka, the corpus). It then ranks all of the documents according to probable relevance (and, as mentioned, according to other categories too, such as Highly Relevant and Privilege, and it does all of these categories at the same time, but for simplicity purposes here we will just consider the relevance rankings). It essentially assigns a probability value of from 0.01% to 99.9% probable relevance to each document in the corpus. (Note, some software uses different ranking values, but this is essentially what it is doing.) A value of 99.9% represents the highest probability that the document matches the category trained, such as relevant, or highly relevant, or privileged. A value of 0.01% means no likelihood of matching. A probability ranking of 50% represents equal likelihood; the machine is uncertain as to the document classification.
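Mr. EDR's internal algorithm is proprietary, so the following is only a generic sketch of the idea: any probabilistic text classifier can be trained on coded examples and then asked to score every remaining document with a probability of relevance. All of the documents, labels, and names below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative seed set from Steps Two and Three: text plus a 1/0 relevance code.
training = [
    ("revenue forecast attached, still confidential", 1),
    ("revised confidential projections per our call", 1),
    ("office party menu for Friday", 0),
    ("fantasy football standings", 0),
]
corpus = ["draft confidential revenue numbers", "parking garage closed Monday"]

texts, labels = zip(*training)
vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Score every unreviewed document with a probability of relevance (0.0 to 1.0).
for doc, prob in zip(corpus, model.predict_proba(vectorizer.transform(corpus))[:, 1]):
    print(f"{prob:.1%}  {doc}")
```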

The first few times this AI-Ranking step is run, the software predictions as to a document’s categorization are often wrong, sometimes wildly so. It depends on the kind of search and data involved, and the number of documents already classified and included for training. That is why spot-checking and further training are always needed for predictive coding to work properly.

Predictive Ranking at this point in AI development is necessarily an iterative process where human feedback is provided throughout the process. Analytic software in the future may be far less dependent on human involvement in the iterative process, but for now it is critical. That is where the next two Steps Five and Six come in, Document Review and Hybrid Active Training.

Step Five: Document Review

This is the step where most of the actual document review is done, where the documents are seen and classified by human reviewers. Note that I also sometimes refer to this step as Multimodal Search Review to emphasize that more than review takes place here. All types of search may also be conducted in this step and the next to find and batch out documents for human review and machine training. This step thus parallels Step Two, except that documents are also found by ranking of probable relevance. This is not yet possible in Step Two because Step Four, AI Predictive Ranking, has not yet occurred.

In my experience, the human document review can take as little as one second per document, assuming your software is good and fast and it is an obvious document, to as long as a half-hour. The lengthy time to review a document is rare and only occurs where you have to fast-read a long document to be sure of its classification. Step Five is the human-time-intensive part of Predictive Coding 3.0 and can take most of the time. Although, when I do a review, I usually spend more than half of the time in the other steps, sometimes considerably more. The TREC experiment was a good example of that, as was the Oracle EDI experiment.

4-5-6-only_predictive_coding_3.0

Depending on the classification during Step Five Document Review, a document is either produced, if relevant and not privileged, or not produced if irrelevant. If relevant and privileged, then it is logged, but not produced. If relevant, not privileged, but confidential for some reason, then it is either redacted and/or specially labeled before production. The special labeling performed is typically to prominently affix the word CONFIDENTIAL on the TIFF image production, or the phrase CONFIDENTIAL – ATTORNEYS EYES ONLY. The actual wording of the legends depends upon the parties’ confidentiality agreement or court order.

When redaction is required, the total time to review a document can sometimes go way up. The same goes for the double and triple checking of privileged documents that sometimes infect document collections in large numbers. In my TREC and Oracle experiments redactions and privilege double-checking were not required. The time-consuming redactions are often deferred to Step Eight – Productions. The equally time-consuming privilege double-checking efforts can also be deferred to Step Seven – Quality Assurance, and again for a third check in Step Eight.

When reviewing a document not already manually classified, the reviewer is usually presented with a document that the expert searcher running the project has determined is probably relevant. Typically this means it has higher than a 50% probable relevant ranking. The reviewer may, or may not, know the ranking. Whether you disclose that to a reviewer depends on a number of factors. Since I usually only use highly skilled reviewers, I trust them with disclosure. But sometimes you may not want to disclose the ranking.

During the review many documents predicted to be relevant will not be. The reviewers will code them correctly, as they see them. If they are in doubt, they should consult the SME team. Furthermore, special quality controls in the form of second reviews, on a random or judgmental selection basis, may be imposed on man-machine disagreements. They often involve close questions and the ultimate results of the resolved conflicts are typically used in the next round of training. That is a decision made in Step Six. Prediction error corrections can be the focus of special searches in Step Six that look for such conflicts. Most quality version 3.0 software, such as Mr. EDR, has built-in search functions designed to locate all such conflicts. Reviewers then review and correct the computer errors by a variety of methods, or change their own prior decisions. This typically requires SME team involvement, but only very rarely senior level SMEs.
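The mechanics behind such a conflict search are simple. Here is a minimal, purely illustrative sketch that compares human codings against the machine's latest probability rankings; the field names, threshold, and data are hypothetical, not any vendor's actual schema.

```python
# Minimal sketch: each entry is (document id, human coding, machine probability).
reviewed = [
    ("DOC-001", "relevant",   0.12),   # human says relevant, machine disagrees
    ("DOC-002", "irrelevant", 0.91),   # machine says probably relevant
    ("DOC-003", "relevant",   0.88),   # agreement -- no follow-up needed
]

def conflicts(rows, threshold=0.5):
    """Return documents where the human call and the machine ranking disagree."""
    return [
        (doc, label, prob) for doc, label, prob in rows
        if (label == "relevant") != (prob >= threshold)
    ]

for doc, label, prob in conflicts(reviewed):
    print(f"{doc}: coded {label}, ranked {prob:.0%} -- route to the SME team for second review")
```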

The predictive coding software learns from all of the corrections to its predictive rankings. Steps Four and Five then repeat as shown in the diagram. This iterative process is a positive feedback loop that continues until the computer predictions are accurate enough to satisfy the proportional demands of the case.

Step Six: Hybrid Active Training

In this step new documents are selected for review in the next iteration of Step Five. Moreover, in Step Six decisions are made as to which documents to include in training in the next round of Step Four, AI Predictive Ranking. Step Six is much like Step Two, Multimodal Search Review, except that now new types of document ranking search are possible. Since the documents are now all probability ranked in Step Four, you can use this ranking to select documents for the next round of document review (Step Five). For instance, the research of Cormack and Grossman has shown that selection of the highest ranked documents can be a very effective method to continuously find and train relevant documents. Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014, at pg. 9. Also see Latest Grossman and Cormack Study Proves Folly of Using Random Search for Machine Training – Parts One, Two, Three and Four. Another popular method, also tested and reported on by Grossman and Cormack, is to select mid-ranked documents, the ones the computer is uncertain about.

The preferred active learning process in the iterative machine learning steps of Predictive Coding 3.0 is now four-fold. How you mix and match the four methods is a matter of personal preference. Here are my team’s current preferences; a rough code sketch showing how three of these selection methods can be combined follows the list.

1. High Ranked Documents. My team will almost always look to see what the highest unreviewed ranked documents are after AI Predictive Ranking, Step Four. We may review them on a document by document basis, or only by spot-checking them. In the latter, more common spot-checking scenario, a quick review of a certain probable relevant range, say all documents ranked between 95% and 99.9% (Mr. EDR has no 100%), may show that they all seem obviously relevant. We may then bulk code all documents in that range as relevant without actually reviewing them. This is a very powerful and effective method with Mr. EDR, and other software (so I’ve heard), so long as care is used not to over-extend the probability range. In other situations, we may only select the 99%+ probable relevant set for checking and bulk coding without review. The safe range typically changes as the review evolves and your latest conception of relevance is successfully imprinted on the computer.

In our cases the most enjoyable part of the review project comes when we see that Mr. EDR has understood our training and gone beyond us. He starts to see patterns that we cannot. He amazingly unearths documents that my team never thought to look for. The relevant documents he finds are sometimes dissimilar to any others found. They do not have the same keywords, or even contain the same known concepts. Still, Mr. EDR sees patterns in these documents that we do not. He finds the hidden gems of relevance, even outliers and black swans. That is when we think of Mr. EDR as going into superhero mode. At least that is the way my e-Discovery Team likes to talk about him.

By the end of most projects Mr. EDR attains a much higher intelligence and skill level than our own (at least on the task of finding the relevant evidence in the document collection). He is always lightning fast and inexhaustible, even untrained, but by the end of his education, he becomes a genius. Definitely smarter than any human as to this one task. Mr. EDR in that kind of superhero mode is what makes Predictive Coding 3.0 so much fun.

Watching AI with higher intelligence than your own, intelligence which you created by your training, is exciting. More than that, the AI you created empowers you to do things that would have been impossible before, absurd even. For instance, using Mr. EDR, my e-Discovery Team of three attorneys was able to do 30 review projects and classify 16,576,820 documents in 45 days. See the TREC experiment summary at Mr. EDR. This is a very gratifying feeling of empowerment and augmentation of our own abilities. The high-AI experience comes through very clearly in the ranking of Mr. EDR near the end of the project, or really anytime before that, when he catches on to what you want and starts to find the hidden gems. I urge you all to give Predictive Coding 3.0 a try so you can have this same kind of advanced AI hybrid excitement.

2. Mid-Ranked Uncertain Documents. We often choose to allow the machine, in our case Mr. EDR, to select the documents for review in the next iteration of Step Five. We listen to what Mr. EDR tells us are the documents he wants to see. These are documents where the software classifier is uncertain of the correct classification. They are usually in the 40% to 60% probable relevant range. Human guidance on these documents as to their relevance helps the machine to learn by adding diversity to the documents presented for review. This in turn also helps to locate outliers of a type the initial judgmental searches in Steps Two and Five may have missed.

3. Random. We may also select some documents at random, either by proper computer random sampling or, more often, by informal random selection, including spot-checking. This again helps maximize recall and avoid premature focus on the types of relevant documents initially retrieved. Random samples taken in Steps Three and Seven are typically also all included for training, and, of course, are always very carefully reviewed. The use of random selection for training purposes alone is minimized in Predictive Coding 3.0.

4. Multimodal Human Search. Most of the time, when not following the machine’s high ranked selection, we are using whatever search method we can to try to find relevant documents in Step Six. It is a multimodal search process, except this time we can also use a variety of document ranking based searches. As mentioned, the ranked searches are not available in Step Two because the active machine learning has not yet begun. The searches may include some linear review of selected custodians or dates, parametric Boolean keyword searches, similarity searches of all kinds, concept searches, as well as several unique predictive coding probability searches. We call that a multimodal approach. Again, you need not limit these searches to ESI in the original dataset, but can also use outside documents such as subpoenas or complaints; even synthetic documents may be used as ideal exemplars.
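As promised above, here is a rough, purely illustrative sketch of how the first three selection methods might be combined into a next-round review batch. The cut-offs, batch sizes, and document names are all invented; the fourth method, multimodal human search, happens outside of any script.

```python
import random

# Illustrative ranking from the last Step Four run: {document id: probable relevance}.
ranking = {f"DOC-{i:05d}": random.random() for i in range(10_000)}
already_reviewed = set()          # ids coded in earlier rounds would go here

unreviewed = {d: p for d, p in ranking.items() if d not in already_reviewed}
by_rank = sorted(unreviewed, key=unreviewed.get, reverse=True)

high_ranked = [d for d in by_rank if unreviewed[d] >= 0.95][:200]          # method 1
uncertain = [d for d in by_rank if 0.40 <= unreviewed[d] <= 0.60][:100]    # method 2
spot_check = random.sample(list(unreviewed), 50)                           # method 3

next_batch = list(dict.fromkeys(high_ranked + uncertain + spot_check))     # de-duplicate
print(f"{len(next_batch)} documents batched out for Step Five review")
```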

Step Seven: ZEN Quality Assurance Tests

ZEN here stands for Zero Error Numerics. Predictive Coding 3.0 requires quality control activities in all steps, but the efforts peak in this Step Seven. For more on the ZEN approach to quality control in document review see ZeroErrorNumerics.com. In Step Seven a random sample is taken to try to evaluate the recall range attained in the project. The method currently favored is described in detail in Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search, Part One, Part Two and Part Three. Also see: In Legal Search Exact Recall Can Never Be Known.

ZENumerics

The ei-Recall test is based on a random sample of all documents to be excluded from the Final Review for possible production. Unlike the ill-fated control set of Predictive Coding 1.0 methodologies, the sample here is taken at the end of the project. At that time the relevance conceptions have evolved to their final form and therefore much more accurate projections of recall can be made from the sample. The documents sampled can be based on documents excluded by category prediction (i.e., probable irrelevant) and/or by probable ranking of documents with proportionate cut-offs. The focus is on a search for any false negatives (i.e., relevant documents incorrectly predicted to be irrelevant) that are Highly Relevant or otherwise of significance.
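The full ei-Recall method is laid out in the articles cited above; the following is only a simplified sketch of the general idea, with illustrative numbers of my own: sample the excluded (null) set, put binomial bounds around the false negatives found, project those bounds onto the whole null set, and state recall as a range.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Binomial (Wilson score) bounds on a sample proportion, as in the prevalence sketch."""
    p, denom = hits / n, 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Illustrative numbers only -- see the cited ei-Recall articles for the full method.
true_positives = 9_000                  # relevant documents found and verified in the review
null_set_size = 900_000                 # documents to be excluded from production
sample_size, fn_in_sample = 1_533, 3    # false negatives found in a sample of the null set

fn_low_rate, fn_high_rate = wilson_interval(fn_in_sample, sample_size)
fn_low, fn_high = fn_low_rate * null_set_size, fn_high_rate * null_set_size

recall_low = true_positives / (true_positives + fn_high)
recall_high = true_positives / (true_positives + fn_low)
print(f"estimated recall range: {recall_low:.0%} to {recall_high:.0%}")
```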

Total 100% recall of all relevant documents is said by the professors to be scientifically impossible (unless you produce all documents, 0% precision), a myth that I predict will soon be shattered. In any event, be it either impossible or very rare, total recall of all relevant documents is legally unnecessary. The legal requirement is reasonable, proportional efforts to find the ESI that is important to resolve the key disputed issues of fact in the case. The goal is to avoid all false negatives of Highly Relevant documents. If this error is encountered, one or more additional iterations of Steps Four, Five and Six are required.

In Step Seven you also make and test the decision to stop the training (the repetition of Steps Four, Five and Six). This decision is evaluated by the random sample, but determined by a complex variety of factors that can be case specific. Typically it is determined by when the software has attained a highly stratified distribution of documents. See License to Kull: Two-Filter Document Culling and Visualizing Data in a Predictive Coding ProjectPart One, Part Two and Part Three, and Introducing a New Website, a New Legal Service, and a New Way of Life / Work; Plus a Postscript on Software Visualization.

When the stratification has stabilized you will see very few new documents found as predicted relevant that have not already been human reviewed and coded as relevant. You essentially run out of documents for Step Five review. Put another way, your Step Six no longer uncovers new relevant documents. This exhaustion marker may in many projects mean that the rate of newly found documents has slowed, but not stopped entirely. I have written about this quite a bit, primarily in Visualizing Data in a Predictive Coding Project, Part One, Part Two and Part Three. The distribution ranking of documents in a mature project that has likely found all relevant documents of interest will typically look something like the diagram below. We call this the upside down champagne glass, with red relevant documents on top and irrelevant on the bottom.

data-visual_Round_5

Also see Postscript on Software Visualization where even more dramatic stratifications are encountered and shown.

Another key determinant of when to stop is the cost of further review. Is it worth it to continue on with more iterations of Steps Four, Five and Six? See Predictive Coding and the Proportionality Doctrine: a Marriage Made in Big Data, 26 Regent U. Law Review 1 (2013-2014). Another criterion in the stop decision is whether you have found the information needed. If so, what is the purpose of continuing the search? Again, the law never requires finding all relevant documents, only reasonable efforts to find the relevant documents needed to decide the important fact issues in the case. This last point is often overlooked by inexperienced lawyers.

Step Eight: Phased Production

This last step is where the documents are actually produced. Technically, it has nothing to do with a predictive coding protocol, but for completeness’ sake, I wanted to include it in the work flow. This final step may also include document redaction, document labeling, and a host of privilege review issues, including double and triple checking of privilege protocols. These are tedious functions where contract lawyers can be a big help. The actual identification of privileged documents from among the relevant ones should have been part of the prior seven steps.

The production of electronic documents to the requesting party is done after a last quality control check of the media on which the production is made, typically CDs or DVDs. If you have to do an FTP production to meet a tight deadline, I suggest also producing the same documents again the next day on tangible media to keep a permanent record of the production. Always use a WORM medium for the production, meaning write once, read many times. That means the data you produced cannot be altered. This might be helpful later for forensic purposes, along with hash values, to confirm ESI authenticity and detect any changes.
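Recording those hash values is straightforward. Here is a minimal sketch that prints a SHA-256 digest for every file in a hypothetical production folder (the folder name is made up), so the production volume can be re-verified later if authenticity is ever questioned.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical production folder -- adjust the path to your own volume.
for file in sorted(Path("PROD001").rglob("*")):
    if file.is_file():
        print(f"{sha256_of(file)}  {file}")
```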

The format of the production should be a non-issue. This is supposed to be discussed at the initial Rule 26(f) conference. Still, you might want to check again with the requesting party before you select the final production format and metadata fields. Remember, cooperation should be your benchmark and courtesy to opposing counsel on these small issues can go a long way. The existence of a clawback agreement and order, including a Rule 502(d) Order, should also be routinely verified before the production is made. Again, this should be a non-issue. The forms used should be worked out as part of the initial 26(f) meet and greet.

The final work included here is to prepare a privilege log. All good vendor review software should make this into a semi-automated process, and thus slightly less tedious. The logging is typically delayed until after production. Check with local rules on this and talk to the requesting party to let them know it is coming. Also, production is usually done in rolling stages as review is completed in order to buy more time and good will. As mentioned before, production of at least some documents can begin very early in the process and does not have to wait until the last step. Waiting to produce all of your documents at once is rarely a good idea, but is sometimes necessary.

predictive_coding_3.0

Conclusion

After talking to many scientists in the information retrieval world I have found that they all agree it is a good idea to find relevant documents for training in any way you can. It makes no sense to limit yourself to any one search method. They agree that multimodal is the way to go, even if they do not use that language (after all, I did make up the term). They also all agree that effective text retrieval searches today should use some type of active machine learning (what we in the legal world call predictive coding), and not just rely on the old search methods of keyword, similarity and concept. The multimodal use of all of the old methods to find training documents for the new method of active machine learning is clearly the way to go. This hybrid approach exemplifies man and machine working together in an active partnership, a union where the machine augments human search abilities, not replaces them.

The Hybrid Multimodal Predictive Coding 3.0 approach described here is still not followed by most e-discovery vendors, including several prominent software vendors. They instead rely entirely on machine selected documents for training, or even worse, rely entirely on randomly selected documents to train the software. Others use all search methods except for predictive coding, primarily just keyword searches. They do so, they say, to try to keep it simple. It may be simple, but the power and speed given up for that simplicity are not worth it.

The users of the old software and old-fashioned methods will never know the genuine thrill that most search lawyers using AI experience when watching really good AI in action. The good times roll when you see that the AI you have been training has absorbed your lessons. When you see the advanced intelligence that you helped create kick in to complete the project for you. When you see your work finished in record time and with record results. It is sometimes amazing to see the AI find documents that you know you would never have found on your own. Predictive coding AI in superhero mode can be exciting to watch.

My entire e-Discovery Team had a great time watching Mr. EDR do his thing in the thirty Recall Track TREC Topics in 2015. We would sometimes be lost, and not even understand what the search was for anymore. But Mr. EDR knew, he saw the patterns hidden to us mere mortals. In those cases we would just sit back and let him do the driving, occasionally cheering him on. That is when my Team decided to give Mr. EDR a cape and superhero status. He never let us down. It is a great feeling to see your own intelligence augmented and save you like that. It was truly a hybrid human-machine partnership at its best. I hope you get the opportunity soon to see this in action for yourself.

