e-Discovery Team’s 2016 TREC Report: Once Again Proving the Effectiveness of Our Standard Method of Predictive Coding

February 24, 2017

Our Team’s Final Report of its participation in the 2016 TREC ESI search Conference has now been published online by NIST and can be found here. TREC stands for Text Retrieval Conference. It is co-sponsored by a group within the National Institute of Standards and Technology (NIST), which in turn is an agency of the U.S. Commerce Department. The stated purpose of the annual TREC conference is to encourage research in information retrieval from large text collections.

The other co-sponsor of TREC is the United States Department of Defense. That’s right, the DOD is the official co-sponsor of this event, although TREC almost never mentions that. Can you guess why the DOD is interested? No one talks about it at TREC, but I have some purely speculative ideas. Recall that the NSA is part of the DOD.

We participated in one of several TREC programs in both 2015 and 2016, the one closest to legal search, called the Total Recall Track. The leaders and administrators of this Track were Professors Gordon Cormack and Maura Grossman. They also participated each year in their own track.

One of the core purposes of all of the Tracks is to demonstrate the robustness of core retrieval technology. Moreover, one of the primary goals of TREC is:

[T]o speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems.

Our participation in TREC in 2015 and 2016 has demonstrated substantial improvements in retrieval methodologies. That is what we set out to do. That is the whole point of the collaboration between the Department of Commerce and Department of Defense to establish TREC.

The e-Discovery Team has a commercial interest in participation in TREC, not a defense or police interest. Although from what we saw with the FBI’s struggles to search email last year, the federal government needs help. We were very unimpressed by the FBI’s prolonged efforts to review the Clinton email collection. I was one of the few e-discovery lawyers to correctly call the whole Clinton email server “scandal” a political tempest in a teapot. I still do and I am still outraged by how her email review was handled by the FBI, especially with the last-minute “revelations.”

The executive agencies of the federal government have been conspicuously absent from TREC. They seem incapable of effective search, which may well be a good thing. Still, we have to believe that the NSA and other defense agencies are able to do a far better job at large-scale search than the FBI. Consider their ongoing large-scale metadata and text interception efforts, including the once Top Secret PRISM operation. Maybe it is a good thing the NSA does not share its abilities with the FBI, especially these days. Who knows? We certainly will not.

The e-Discovery Team’s commercial interest is to transfer Predictive Coding technology from our research labs into commercial products, namely to transfer our Predictive Coding 4.0 Method using Kroll Discovery EDR software into commercial products. In our case at the present time “commercial products” means our search methods, time and consultations. But who knows, it may be reduced to a robot product someday like our Mr. EDR.

The e-Discovery Team method can be used on other document review platforms as well, not just Kroll’s, but only if they have strong active machine learning features. Active machine learning is what everyone at TREC was testing, although we appear to have been the only participant to focus on a particular method of operation. And we were the only team led by a practicing attorney, not an academic or software company. (Catalyst also fielded a team in 2015 and 2016 headed by Information Science Ph.D. Jeremy Pickens.)

The e-Discovery Team wanted to test the hybrid multimodal software methods we use in legal search to demonstrate substantial improvements in retrieval methodologies on real-world problems. We have now done so twice, participating in both the 2015 and 2016 Total Recall Tracks. The results in 2016 were even better than 2015. We obtained remarkable results in document review speed, recall and precision; although, as we admit, the search challenges presented at TREC 2016 were easier than most projects we see in legal discovery. Still, to use the quaint language of TREC, we have demonstrated the robustness of our methods and software.

These demonstrations, and all of the reporting and analysis involved, have taken hundreds of hours of our time, but there was no other venue around to test our retrieval methodologies on real-world problems. The demonstrations are now over. We have proven our case. Our standard Predictive Coding method has been tested and its effectiveness demonstrated. No one else has tested and proven their predictive coding methods as we have done. We have proven that our hybrid multimodal method of AI-Enhanced document review is the gold standard. We will continue to make improvements in our method and software, but we are done with participation in federal government programs to prove our standard, even one run by the National Institute of Standards and Technology.

[Diagram: Predictive Coding 4.0 eight-step workflow]

To prove our point that we have now demonstrated substantial improvements in retrieval methodologies, we quote below Section 5.1 of our official TREC report, but we urge you to read the whole thing. It is 164 pages. This section of our report covers our primary research question only. We investigated three additional research questions not included below.

__________

Section 5.1 First and Primary Research Question

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team’s Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

Again, as in the 2015 Total Recall Track, the Team attained very good results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team did so even though it only used five of the eight steps in its usual methodology, intentionally severely constrained the amount of human effort expended on each topic and worked on a dataset stripped of metadata. The Team’s enthusiasm for the record-setting results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult and the TREC relevance judgments had to be corrected in most topics.  …

This next chart uses the corrected standard. It is the primary reference chart we use to measure our results. Unfortunately, it is not possible to make any comparisons with BMI standards because we do not know the order in which the BMI documents were submitted.

[Chart: revised results for all topics, TREC 2016]

The average results obtained across all thirty-four topics at the time of reasonable call using the corrected standard are shown below in bold. The average scores using the uncorrected standard are shown for comparison in parentheses.

  • 88.17% Recall (75.46%)
  • 64.94% Precision (57.12%)
  • 69.15% F1 (57.69%)
  • 124 Docs Reviewed Effort (124)
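For readers who like to check the math, here is a short Python sketch (using made-up per-topic numbers, not our actual TREC scores) of how F1 is calculated for a single topic as the harmonic mean of Precision and Recall, and why an average of per-topic F1 scores is not the same number as the F1 of the averaged Recall and Precision.

```python
# Illustration only, with hypothetical per-topic numbers (not the actual
# TREC topics). F1 is the harmonic mean of precision and recall for a
# single topic. Averaging per-topic F1 scores across topics is not the
# same as taking the F1 of the averaged recall and precision.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall for one topic."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

topics = [
    {"precision": 1.00, "recall": 1.00},   # a perfect topic
    {"precision": 0.55, "recall": 0.90},
    {"precision": 0.40, "recall": 0.75},
]

avg_p = sum(t["precision"] for t in topics) / len(topics)
avg_r = sum(t["recall"] for t in topics) / len(topics)
avg_f1 = sum(f1(t["precision"], t["recall"]) for t in topics) / len(topics)

print(f"average precision:       {avg_p:.2%}")
print(f"average recall:          {avg_r:.2%}")
print(f"average of per-topic F1: {avg_f1:.2%}")       # what gets averaged per topic
print(f"F1 of the averages:      {f1(avg_p, avg_r):.2%}")  # a different number
```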

At the time of reasonable call the Team had recall scores greater than 90% in twenty-two of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 95% was attained in fourteen topics. These Recall scores under the corrected standard are shown in the below chart. The results are far better than we anticipated, including six topics with total recall – 100%, and two topics with both total recall and perfect precision, topic 417 Movie Gallery and topic 434 Bacardi Trademark.

[Chart: Recall scores under the corrected standard, 2016]

At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 75% in three more topics. Precision of greater than 95% was attained in nine topics. These Precision scores under the corrected standard are shown in the below chart. Again, the results were, in our experience, incredibly good, including three topics with perfect precision at the time of the reasonable call.

[Chart: Precision scores under the corrected standard, 2016]

At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 75% in two more. F1 of greater than 95% was attained in eight topics. These F1 scores under the corrected standard are shown in the below chart. Note there were two topics with a perfect score, Movie Gallery (100%) and Bacardi Trademark (100%), and three more that were near perfect: Felon Disenfranchisement (98.5%), James V. Crosby (97.57%), and Elian Gonzalez (97.1%).

[Chart: F1 scores under the corrected standard, 2016]

We were lucky to attain two perfect scores in 2016 (we attained one in 2015), in topic 417 Movie Gallery and topic 434 Bacardi Trademark. The perfect score of 100% F1 was obtained in topic 417 by locating all 5,945 documents relevant under the corrected standard after reviewing only 66 documents. This topic was filled with form letters and was a fairly simple search.

The perfect score of 100% F1 was obtained in topic 434 Bacardi Trademark by locating all 38 documents relevant under the corrected standard after reviewing only 83 documents. This topic had some legal issues that required analysis, but the reviewing attorney, Ralph Losey, is an SME in trademark law, so this did not pose any problems. The issues were easy and not critical to understanding relevance. This was a simple search involving distinct language and players. All but one of the 38 relevant documents were found by tested, refined keyword search. One additional relevant document was found by a similarity search. Predictive coding searches were run after the keyword searches and nothing new was uncovered. Here machine learning merely performed a quality assurance role to verify that all relevant documents had indeed been found.

The Team proved once again, as it did in 2015, that perfect recall and perfect precision are possible, albeit rare, using the Team’s methods on fairly simple search projects.

The Team’s top ten projects attained remarkably high scores, with an average Recall of 95.66%, an average Precision of 97.28% and an average F-Measure of 96.42%. The top ten are shown in the chart below.

[Chart: top ten project results]

In addition to Recall, Precision and F1, the Team per TREC requirements also measured the effort involved in each topic search. We measured effort by the number of documents that were actually human-reviewed prior to submission and coded relevant or irrelevant. We also measured effort by the total human time expended for each topic. Overall, the Team human-reviewed only 6,957 documents to find all the 34,723 relevant documents within the overall corpus of 9,863,366 documents. The total time spent by the Team to review the 6,957 documents, and do all the search and analysis and other work using our Hybrid Multimodal Predictive Coding 4.0 method, was 234.25 hours.

[Chart: documents reviewed as a fraction of the total corpus]

It is typical in legal search to try to measure the efficiency of a document review by the number of documents classified by an attorney in an hour. For instance, a typical contract review attorney can read and classify an average of 50 documents per hour. The Team classified 9,863,366 documents by review of 6,957 documents taking a total time of 234.25 hours. The Team’s overall review rate for the entire corpus was thus 42,106 files per hour (9,863,366/234.25).

In legal search it is also typical, indeed mandatory, to measure the costs of review and bill clients accordingly. If we here assume a high attorney hourly rate of $500 per hour, then the total cost of the review of all 34 Topics would be $117,125. That is a cost of just over $0.01 per document. In a traditional legal review, where a lawyer reviews one document at a time, the cost would be far higher. Even if you assume a low attorney rate of $50 per hour, and a review speed of 50 files per hour, the total cost to review every document for every issue would be $9,863,366. That is a cost of $1.00 per document, which is actually low by legal search standards.

Analysis of project duration is also very important in legal search. Instead of the 234.25 hours expended by our Team using Predictive Coding 4.0, traditional linear review would have taken 197,267 hours (9,863,366/50). In other words, the review of thirty-four projects, which we did part-time, after work, in one summer, would have taken a team of two lawyers using traditional methods, working 8 hours a day, every day, over 33 years! These kinds of comparisons are common in legal search.
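The arithmetic behind these comparisons is easy to reproduce. The short sketch below uses the corpus size, review counts and hours reported above, together with the same assumed hourly rates and 50-files-per-hour linear review speed used in the text.

```python
# Back-of-the-envelope check of the effort figures above. The corpus
# size, documents reviewed and hours come from the report; the $500 and
# $50 hourly rates and the 50 docs/hour linear review speed are the
# same assumptions used in the text.

corpus_size     = 9_863_366   # total documents across all 34 topics
docs_reviewed   = 6_957       # documents actually human-reviewed
team_hours      = 234.25      # total attorney hours
attorney_rate   = 500         # $/hour, high-end assumption
linear_rate     = 50          # docs/hour for traditional linear review
linear_attorney = 50          # $/hour, low-end assumption for linear review

overall_speed = corpus_size / team_hours        # about 42,106 files/hour
team_cost     = team_hours * attorney_rate      # about $117,125
cost_per_doc  = team_cost / corpus_size         # just over $0.01/document

linear_hours = corpus_size / linear_rate        # about 197,267 hours
linear_cost  = linear_hours * linear_attorney   # about $9,863,366
linear_years = linear_hours / (2 * 8 * 365)     # two lawyers, 8 hrs/day, every day

print(f"overall review rate: {overall_speed:,.0f} files/hour")
print(f"team cost: ${team_cost:,.0f} (${cost_per_doc:.3f}/doc)")
print(f"linear review: {linear_hours:,.0f} hours, ${linear_cost:,.0f}, "
      f"roughly {linear_years:.0f} years for two lawyers")
```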

Detailed descriptions of the searches run in all thirty-four topics are included in the Appendix.

___________

We also reproduce below Section 1.0, Summary of Team Efforts, from our 2016 TREC Report. For more information on what we learned in the 2016 TREC see also Complete Description in 30,114 Words and 10 Videos of the e-Discovery Team’s “Predictive Coding 4.0” Method of Electronic Document Review. Nine new insights that we learned in the 2016 research are summarized by the below diagram and more specifically described in that article.

[Diagram: the nine insights from the 2016 TREC research]

_________

Excerpt From Team’s 2016 Report

1.1 Summary of Team’s Efforts. The e-Discovery Team’s 2016 Total Recall Track At Home project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic, the Team classified 9,863,366 documents in thirty-four review projects.

The topics searched in 2016 and their issue names are shown in the chart below. Also included are the first name of the e-Discovery Team member who did the review for each topic, the total time spent by that reviewer and the number of documents manually reviewed to find all of the relevant documents in that topic. The total time of all reviewers on all projects was 234.25 hours. All relevant documents, totaling 34,723 by Team count, were found by manual review of 6,957 documents. The thirteen topics in red were considered mandatory by TREC and the remaining twenty-one were optional. The e-Discovery Team did all topics.

[Chart: the 2016 topics, reviewers, hours spent and documents reviewed]

They were all one-person, solo efforts, although there was coordination and communications between Team members on the Subject Matter Expert (SME) type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for many of these topics. A detailed description of the search for each topic is contained in the Appendix.

In each topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance, determinations that took into consideration and gave some deference to the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussions between Team members as to the unexpected TREC opinions on relevance.

This contrasts sharply with participants in the Sandbox division. They never make any effort to determine where their software made errors in predicting relevance, or for any other reasons. They accept as a matter of faith the correctness of all TREC’s prior assessment of relevance. To these participants, who were all academic institutions, the ground truth itself as to relevance or not, was of no relevance. Apparently, that did not matter to their research.

All thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically faces as attorneys leading legal document review projects. (If the Bush email had not been altered by omission of metadata, the searches would have been even easier.) The details of the searches performed in each of the thirty-four topics are included in the Appendix. The search challenges presented by these topics were roughly equivalent to the most simplistic challenges that the e-Discovery Team might face in projects involving relatively simple legal disputes. A few of the search topics in 2016 included quasi-legal issues, more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed some, albeit very limited, testing of legal judgment and analysis in the determination of true relevance in these topics. In legal search, legal analysis skills are obviously very important to relevance determinations. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

At Home participants were asked to track and report their manual efforts. The e-Discovery Team did this by recording the number of documents that were human-reviewed and classified prior to submission. More were reviewed after submission as part of the Team’s TREC relevance checking. Virtually all documents human-reviewed were also classified, although not all documents classified were used for active training of the software classifier. The Team also tracked effort by the number of attorney hours worked, as is traditional in legal services. Although the amount of time varied somewhat by topic, the average time spent per topic was only 6.89 hours. The average review and classification speed for each project was 42,106 files per hour (9,863,366/234.25).

Again, for the full picture and complete details of our work please see the complete 164 page report to TREC of the e-Discovery Team’s Participation in the 2016 Total Recall Track.

 

 

 

 



Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow – Part Seven

November 1, 2016

This is the seventh and last installment of the article explaining the e-Discovery Team’s latest enhancements to electronic document review using Predictive Coding. Here are Parts One, Two, Three, Four, Five and Six. This series explains the nine insights behind the latest upgrade to version 4.0. It also explains the slight revisions these insights triggered to the eight-step workflow. We have already covered the nine insights and the first three steps in our slightly revised eight-step workflow. We will now cover the remaining five steps.

[Diagram: Predictive Coding 4.0 eight-step workflow]

Steps Four, Five and Six – Training Select, AI Document Ranking and Multimodal Review

These are the three iterated steps that are the heart of our active machine learning process. The description of steps four, five and six constitutes the most significant change, although the content of what we actually do has not changed much. We have changed the order of the iterated steps by making a new step four – Training Select. We have also changed somewhat the descriptions in Predictive Coding Version 4.0. This was all done to better clarify and simplify what we are doing. This is our standard workflow; our old description now seems somewhat confusing. As Steve Jobs famously said:

You have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because once you get there, you can move mountains.

In our case it can help you to move mountains of data by proper use of active machine learning.

[Diagram: iterated steps four, five and six of Predictive Coding 4.0]

In version 3.0 we called these three iterated steps: AI Predictive Ranking (step 4), Document Review (step 5), and Hybrid Active Training (step 6). The AI Predictive Ranking step, now called AI Document Ranking, was moved from step four to step five. This is to clarify that the task of selecting documents for training always comes before the training itself. We also made Training Select a separate step to emphasize the importance of this task. This is something that we have come to appreciate more fully over the past year.

The AI Document Ranking step is where the computer does its thing. It is where the algorithm goes into action and ranks all of the documents according to the training documents selected by the humans. It is the unique AI step. The black box. There are no human efforts in step five at all. All we do is wait on the machine analysis. When it is done, all documents have been ranked (the first time) or reranked (all training rounds after the first). We slightly tweaked the name here to be AI Document Ranking, instead of AI Predictive Ranking, as that is, we think, a clearer description of what the machine is doing. It is ranking all documents according to probability of relevance, or whatever other binary training you are doing. For instance, we usually also rank all documents according to probable privilege and also according to high relevance.

Our biggest change here in version 4.0 is to make this AI step number five, instead of four, and, as mentioned, to add a new step four called Training Select. The new step four – Training Select – is the human function of deciding what documents to use to train the machine. (This used to be included in iterated step six, which was, we now see, somewhat confusing.) Unlike other predictive coding methods, we empower humans to make this selection in step four, Training Select. We do not, like some methods, create automatic rules for selection of training documents. For example, the Grossman Cormack CAL method (their trademark) only uses a predetermined number of the top-ranked documents for training. In our method, we could also select these top-ranked documents, or we could include other documents we have found to be relevant by other methods.

The freedom and choices that our method provides to the humans in charge are another reason our method is called Hybrid, in that it features natural human intelligence. It is not all machine controlled. In Predictive Coding 4.0 we use artificial intelligence to enhance or augment our own natural intelligence. The machine is our partner, our friend, not our competitor or enemy. We tell our tool, our computer algorithm, what documents to train on in step four, and when, and the machine implements in step five.

Typically in step four, Training Select, we will include all documents that we have previously coded as relevant as training documents, but not always. Sometimes, for instance, we may defer including very long relevant documents in the training, especially large spreadsheets, until the AI has a better grasp of our relevance intent. Skilled searchers rarely use every coded document as a training document, but sometimes do. The same reasoning may apply to excluding a very short message, such as a one-word message saying “call,” although we are more likely to leave that in. This selection process is where the art and experience of search come in. The concern is to avoid over-training on any one document type and thus lowering recall and missing a key black-swan document.

Also, we now rarely include all irrelevant documents in training, but instead use a balanced approach. Otherwise we tend to see incorrectly low rankings across the board. The 50%-plus dividing line can then be an inaccurate indicator of probable relevance. It may instead go down to 40%, or even lower. We also find the balanced approach allows the machine to learn faster. Information scientists we have spoken with on this topic say this is typical with most types of active machine learning algorithms. It is not unique to our Mr. EDR, an active machine learning algorithm that uses a logistic regression method.
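To make the Training Select step a little more concrete, here is a minimal Python sketch of the kind of selection logic just described. The document structure, thresholds and function name are hypothetical illustrations, not Kroll Ontrack’s API or a fixed rule; the human searcher always retains the final say.

```python
import random

# Hypothetical sketch of step-four Training Select: include documents
# already coded relevant (optionally deferring very long ones until a
# later round), and balance the irrelevant examples rather than feeding
# in every irrelevant document. Illustrative assumptions only.

def select_training_docs(coded_docs, defer_long_docs=True,
                         long_doc_tokens=50_000, neg_to_pos_ratio=1.0,
                         seed=42):
    relevant   = [d for d in coded_docs if d["label"] == "relevant"]
    irrelevant = [d for d in coded_docs if d["label"] == "irrelevant"]

    if defer_long_docs:
        # Hold back very long documents (e.g. large spreadsheets) until
        # the classifier has a better grasp of the relevance intent.
        relevant = [d for d in relevant if d["tokens"] <= long_doc_tokens]

    # Balanced approach: cap the irrelevant training examples relative
    # to the relevant ones instead of training on every irrelevant doc.
    k = min(len(irrelevant), int(len(relevant) * neg_to_pos_ratio))
    random.Random(seed).shuffle(irrelevant)

    return relevant + irrelevant[:k]
```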

The sixth step of Multimodal Review is where we find new relevant or irrelevant documents for the next round of training. This is the step where most of the actual document review is done, where the documents are seen and classified by human reviewers. It is thus like step two, multimodal ECA. But now in step six we can also perform ranking searches, such as find all documents ranked 90% probable relevant or higher. Usually we rely heavily on such ranking searches.

We then human-review all of those documents, which can often include very fast skimming and bulk coding. In addition to these ranked searches for new documents to review and code, we can use any other type of search we deem appropriate. This is the multimodal approach. Typically keyword and concept searches are used less often after the first round of training, but similarity searches of all kinds are often used throughout a project to supplement ranking-based searches. Sometimes we may even use a linear search, the expert manual review at the base of the search pyramid, if a new hot document is found. For instance, it might be helpful to see all communications that a key witness had on a certain day. The two-word stand-alone “call me” email, when seen in context, can sometimes be invaluable to proving your case.

[Diagram: the multimodal search pyramid]

Step six is much like step two, Multimodal ECA, except that now new types of document ranking search are possible. Since the documents are now all probability ranked in step five, you can use this ranking to select documents for the next round of document review (step four). For instance, the research of Professors Cormack and Grossman has shown that selection of the highest ranked documents can be a very effective method to continuously find and train relevant documents. Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014, at pg. 9. Also see Latest Grossman and Cormack Study Proves Folly of Using Random Search for Machine Training – Parts One, Two, Three and Four. Another popular method, also tested and reported on by Grossman and Cormack, is to select mid-ranked documents, the ones the computer is uncertain about. They are less fond of that method, and we are too, but we will sometimes use it.

The e-Discovery Team’s preferred active learning process in the iterative machine learning steps of Predictive Coding 4.0 is still four-fold, just as it was in version 3.0. It is multimodal. How you mix and match the search methods is a matter of personal preference and educated response to the data searched. Here are my team’s current preferences for most projects. Again, the weight for each depends upon the project. The only constant is that more than one method is always used.

1. High Ranked Documents. My team will almost always look to see what the highest-ranked unreviewed documents are after AI Ranking, step five. We agree with Cormack and Grossman that this is a very effective search. We may review them on a document-by-document basis, or only by spot-checking some of them. In the latter spot-checking scenario, a quick review of a certain probable relevant range, say all documents ranked between 95% to 99.9% (Mr. EDR has no 100%), may show that they all seem obviously relevant. We may then bulk code all documents in that range as relevant without actually reviewing them. This is a very powerful and effective method with Mr. EDR, and other software, so long as care is used not to over-extend the probability range. In other situations, we may only select the 99%+ probable relevant set for checking and bulk coding with limited review. The safe range typically changes as the review evolves and your latest conception of relevance is successfully imprinted on the computer.

Note that when we say a document is selected without individual review – meaning no human actually read the document – that is only for purposes of training selection and identifying relevant documents for production. We sometimes call that first pass review. In real world projects for clients we always review each document found in steps four, five and six, that has not been previously reviewed by a human, before we produce the document. (This is not true in our academic or scientific studies for TREC or EDI/Oracle.) That takes place in the last step – step eight, Productions. To be clear, in legal practice we do not produce without human verification and review of each and every document produced. The stakes if an error is made are simply too high.

In our cases the most enjoyable part of the review project comes when we see from this search method that Mr. EDR has understood our training and has started to go beyond us. He starts to see patterns that we cannot. He amazingly unearths documents that our team never thought to look for. The relevant documents he finds are sometimes dissimilar to any others found. They do not have the same key words, or even the same known concepts. Still, Mr. EDR sees patterns in these documents that we do not. He finds the hidden gems of relevance, even outliers and black swans. That is when we think of Mr. EDR as going into superhero mode. At least that is the way my e-Discovery Team likes to talk about him.

By the end of most projects Mr. EDR attains a much higher intelligence and skill level than our own (at least on the task of finding the relevant evidence in the document collection). He is always lightning fast and inexhaustible, even untrained, but by the end of his education, he becomes a genius. Definitely smarter and faster than any human as to this one production review task. Mr. EDR in that kind of superhero mode is what makes Predictive Coding so much fun. See Why I Love Predictive Coding.

Watching AI with higher intelligence than your own, intelligence which you created by your training, is exciting. More than that, the AI you created empowers you to do things that would have been impossible before, absurd even. For instance, using Mr. EDR, my e-Discovery Team of three attorneys was able to do 30 review projects and classify 16,576,820 documents in 45 days. See the TREC 2015 experiment summary at Mr. EDR. This was a very gratifying feeling of empowerment, speed and augmentation of our own abilities. The high-AI experience comes through very clearly in the ranking of Mr. EDR near the end of the project, or really anytime before that, when he catches on to what you want and starts to find the hidden gems. I urge you all to give Predictive Coding a try so you can have this same kind of advanced AI hybrid excitement.

2. Mid-Ranked Uncertain Documents. We sometimes choose to allow the machine, in our case Mr. EDR, to select the documents for review in the sense that we review some of the mid-range ranked documents. These are documents where the software classifier is uncertain of the correct classification. They are usually in the 40% to 60% probable relevant range. Human guidance on these documents as to their relevance will sometimes help the machine to learn by adding diversity to the documents presented for review. This in turn also helps to locate outliers of a type the initial judgmental searches in steps two and six may have missed. If a project is going well, we may not need to use this type of search at all.

3. Random and Judgmental Sampling. We may also select some documents at random, either by proper computer random sampling or, more often, by informal random selection, including spot-checking. The latter is sometimes called judgmental sampling. These sampling techniques can help maximize recall by avoiding a premature focus on the relevant documents initially retrieved. Random samples taken in steps three and six are typically also all included for training, and, of course, are always very carefully reviewed. The use of random selection for training purposes alone was minimized in Predictive Coding 3.0 and remains of lower importance in version 4.0. With today’s software, and using the multimodal method, it is not necessary. We did all of our TREC research without random sampling. We very rarely see the high-ranking searches become myopic without it. Plus, our multimodal approach guards against such over-training throughout the process.

4. Ad Hoc Searches Not Based on Document Ranking. Most of the time we supplement the machine’s ranking-based searches with additional search methods using non-AI-based analytics. The particular search supplements we use depend on the relevant documents we find in the ranked document searches. The searches may include some linear review of selected custodians or dates, parametric Boolean keyword searches, similarity searches of all kinds, and concept searches. We use every search tool available to us. Again, we call that a multimodal approach.
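Pulling these four methods together, here is a hypothetical sketch of how a step-six review batch might be assembled by mixing high-ranked documents, mid-ranked uncertain documents, a small random sample and ad hoc search hits. The data structures, cut-offs and batch sizes are illustrative assumptions only; in practice the mix varies by project and is a judgment call by the humans in charge.

```python
import random

# Hypothetical sketch of assembling the next step-six review batch from
# the four selection methods described above. The rankings dictionary
# (doc_id -> probable relevance) and all thresholds are assumptions,
# not any particular software's API.

def next_review_batch(rankings, reviewed_ids, ad_hoc_hits,
                      high_cutoff=0.90, uncertain_band=(0.40, 0.60),
                      n_high=200, n_uncertain=50, n_random=25, seed=7):
    unreviewed = {doc_id: p for doc_id, p in rankings.items()
                  if doc_id not in reviewed_ids}

    # 1. Highest-ranked unreviewed documents.
    high = sorted((d for d, p in unreviewed.items() if p >= high_cutoff),
                  key=lambda d: -unreviewed[d])[:n_high]

    # 2. Mid-ranked documents the classifier is uncertain about.
    lo, hi = uncertain_band
    uncertain = [d for d, p in unreviewed.items() if lo <= p <= hi][:n_uncertain]

    # 3. A small random sample of the unreviewed pool.
    rng = random.Random(seed)
    pool = list(unreviewed)
    sample = rng.sample(pool, min(n_random, len(pool)))

    # 4. Ad hoc (keyword, similarity, concept) hits not yet reviewed.
    ad_hoc = [d for d in ad_hoc_hits if d in unreviewed]

    # Deduplicate while keeping the high-ranked documents first.
    batch, seen = [], set()
    for doc_id in high + ad_hoc + uncertain + sample:
        if doc_id not in seen:
            seen.add(doc_id)
            batch.append(doc_id)
    return batch
```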

More on Step Six – Multimodal Review

As seen, all types of search may be conducted in step six to find and batch out documents for human review and machine training. This step thus parallels step two, ECA, except that documents are also found by ranking of probable relevance. This is not yet possible in step two because step five, AI Document Ranking, has not yet occurred.

It is important to emphasize that although we do searches in step six, steps six and eight are the steps where most of the actual document review is also done, where the documents are seen and classified by human reviewers. Search is used in step six to find the documents that human reviewers should review next. In my experience (and timed tests) the human document review can take as little as one second per document, assuming your software is good and fast and it is an obvious document, to as long as a half-hour. The lengthy time to review a document is very rare and only occurs where you have to fast-read a long document to be sure of its classification.

Step six is the human time-intensive part of Predictive Coding 4.0 and can take most of the time in a project, although when our top team members do a review, such as in TREC, we often spend more than half of the time in the other steps, sometimes considerably more.

Depending on the classifications made during step six, Multimodal Review, a document is either set for production, if relevant and not privileged, or, if coded irrelevant, it is not set for production. If relevant and privileged, then it is logged but not produced. If relevant, not privileged, but confidential for some reason, then it is either redacted and/or specially labeled before production. The special labeling performed is typically to prominently affix the word CONFIDENTIAL on the TIFF image production, or the phrase CONFIDENTIAL – ATTORNEYS EYES ONLY. The actual wording of the legends depends upon the parties’ confidentiality agreement or court order.
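The routing logic just described can be reduced to a few lines. The sketch below is a simplified, hypothetical illustration; the actual confidentiality legends depend on the parties’ agreement or the court’s order.

```python
# Simplified sketch of the production routing decision described above.
# The labels and legends are illustrative assumptions only.

def route_document(relevant, privileged, confidential, attorneys_eyes_only=False):
    if not relevant:
        return "do not produce"
    if privileged:
        return "withhold and log on privilege log"
    if confidential:
        legend = ("CONFIDENTIAL - ATTORNEYS EYES ONLY"
                  if attorneys_eyes_only else "CONFIDENTIAL")
        return f"redact if needed, label '{legend}', then produce"
    return "produce"
```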

When many redactions are required, the total time to review a document can sometimes go way up. The same goes for the double and triple checking of privileged documents that are sometimes found in document collections in large numbers. In our TREC and Oracle experiments, redactions and privilege double-checking were not required. The time-consuming redactions are usually deferred to step eight – Productions. The equally time-consuming privilege double-checking efforts can also be deferred to step seven – Quality Assurance, and again for a third check in step eight.

When reviewing a document not already manually classified, the reviewer is usually presented with a document that the expert searcher running the project has determined is probably relevant. Typically this means that it has a higher than 50% probable relevance ranking. The reviewer may or may not know the ranking. Whether you disclose that to a reviewer depends on a number of factors. Since I usually only use highly skilled reviewers, I trust them with disclosure. But sometimes you may not want to disclose the ranking.

During the review many documents predicted to be relevant will not be. The reviewers will code them correctly, as they see them. Our reviewers can and do disagree with and overrule the computer’s predictions. The “Sorry Dave” phrase of the HAL 9000 computer in 2001: A Space Odyssey is not possible.

If a reviewer is in doubt, they consult the SME team. Furthermore, special quality controls in the form of second reviews may be imposed on Man-Machine disagreements (the computer says a document should be relevant, but the human reviewer disagrees, and vice versa). They often involve close questions, and the ultimate results of the resolved conflicts are typically used in the next round of training.

Sometimes the Machine will predict that a document is relevant, maybe even with 99.9% certainty, even though you have already coded the document as irrelevant. It does so even though you have already told the Machine to train on it as irrelevant. The Machine does not care about your feelings! Or your authority as chief SME. It considers all of the input, all of the documents input in step four. If the cold, hard logic of its algorithms tells it that a document should be relevant, that is what it will report, in spite of how the document has already been coded. This is an excellent quality control tool.

I cannot tell you how impressed I was when that first happened to me. I was skeptical, but I went ahead and reread the long document anyway, this time more carefully. Sure enough, I had missed a paragraph near the end that made the document relevant. That was a Eureka moment for me. I have been a strong proponent of predictive coding ever since. Software does not get tired like we do. If the software is good, it reads the whole document and is not front-loaded like we usually are. That does not mean Mr. EDR is always right. He is not. Most of the time we reaffirm the original coding, but not without a careful double-check. Usually we can see where the algorithm went wrong. Sometimes that influences our next iteration of step four, selection of training documents.

Prediction error corrections such as this can be the focus of special searches in step six. Most quality version 4.0 software, such as Mr. EDR, has built-in search functions designed to locate all such conflicts between document ranking and classification. Reviewers then review and correct the computer errors by a variety of methods, or change their own prior decisions. This often requires SME team involvement, but only very rarely the senior-level SME.

The predictive coding software learns from all of the corrections to its prior predictive rankings. Steps 4, 5 and 6 then repeat as shown in the diagram. This iterative process is a positive feedback loop that continues until the computer predictions are accurate enough to satisfy the proportional demands of the case. In almost all cases that means you have found more than enough of the relevant documents needed to fairly decide the case. In many cases it is far better than that. It is routine for us to attain recall levels of 90% or higher. In a few cases you may find almost all of the relevant documents.
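For those who think in code, the iterated steps can be summarized in a short skeleton like the one below. The four callables stand in for the human and machine work of steps four, five and six and for the stop decision; they are placeholders for illustration, not a real product API.

```python
# Skeleton of the iterated steps 4-5-6 described in this series. The
# four callables are placeholders for the human and machine work; this
# is a conceptual sketch, not any vendor's implementation.

def predictive_coding_loop(corpus, coded_docs,
                           select_training, rank_documents,
                           review_batch, stop_decision):
    """Repeat steps four, five and six until the stop decision is made."""
    while True:
        training = select_training(coded_docs)              # step 4: human choice
        rankings = rank_documents(corpus, training)          # step 5: machine ranking
        newly_coded = review_batch(corpus, rankings, coded_docs)  # step 6: human review
        if not newly_coded:
            break                                            # nothing new was found
        coded_docs.extend(newly_coded)
        if stop_decision(rankings, coded_docs):              # proportionality call
            break
    return coded_docs, rankings
```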

General Note on Ease of Version 4.0 Methodology and Attorney Empowerment

The machine training process for document review has become easier over the last few years as we have tinkered with and refined the methods. (Tinkering is the original and still only true meaning of hacking. See: HackerLaw.org) At this point of the predictive coding life cycle it is, for example, easier to learn how to do predictive coding than to learn how to do a trial – bench or jury. Interestingly, the most effective instruction method for both legal tasks is similar – second chair apprenticeship, watch and learn. It is the way complex legal practices have always been taught. My team can teach it to any smart tech lawyer by having them second chair a couple of projects.

It is interesting to note that medicine uses the same method to teach surgeons how to do complex robotic surgery with a da Vinci surgical system, or the like. Whenever a master surgeon operates with robotics, there are always several doctors watching and assisting, more than are needed. In this photo they are the ones around the patient. The master surgeon who is actually controlling the tiny knives in the patient is the guy on the far left sitting down with his head in the machine. He is looking at a magnified video picture of what is happening inside the patient’s body and moving the tiny knives around with a joystick.

The hybrid human-robot system augments the human surgeon’s abilities. The surgeon has his hands on the wheel at all times. The other doctors may watch dozens, and if they are younger, maybe even hundreds of surgeries before they are allowed to take control of the joystick and do the hard stuff themselves. The predictive coding steps four, five and six are far easier than this; besides, if you screw up, nobody dies.

More on Step Five – AI Document Ranking

More discussion on step five may help clarify all three iterated steps. Again, step five is the AI Document Ranking step where the machine takes over and does all of the work. We have also called this the Auto Coding Run because this is where the software’s predictive coding calculations are performed. The software we use is Kroll Ontrack’s Mr. EDR. In the fifth step the software applies all of the training documents we selected in step four to sort the data corpus. In step five the human trainers can take a coffee break while Mr. EDR ranks all of the documents according to probable relevance or other binary choices.

The first time the document ranking algorithm executes is sometimes called the seed set run. The first repetition of the ranking step five is known as the second round of training, the next, the third round, etc. These iterations continue until the training is complete within the proportional constraints of the case. At that point the attorney in charge of the search may declare the search complete and ready for the next quality assurance test in Step Seven. That is called the Stop decision.

It is important to understand that this entire eight-step workflow diagram is just a linear two-dimensional representation of Predictive Coding 4.0 for teaching purposes. These step descriptions are also a simplified explanation. Step Five can take place just as soon as a single document has been coded. You could have continuous, ongoing machine training at any time that the humans in charge decide to do so. That is the meaning of our team’s IST (Intelligently Spaced Training), as opposed to Grossman and Cormack’s trademarked CAL method, where the training always goes on without any human choice. This was discussed at length in Part Two of this series.

We space the training times ourselves to improve our communication and understanding of the software ranking. It helps us to have a better intuitive grasp of the machine processes. (Yes, such a thing is possible.) It allows us to observe for ourselves how a particular document, or usually a particular group of documents, impacts the overall ranking. This is an important part of the Hybrid aspects of the Predictive Coding 4.0 Hybrid IST Multimodal Method. We like to be in control and to tell the machine exactly when and if to train, not the other way around. We like to understand what is happening and not just delegate everything to the machine. That is one reason we like to say that although we promote a balanced hybrid human-machine process, we are pro-human and tip the scales in our favor.

As stated, step five in the eight-step workflow is a purely algorithmic function. The ranking of a few million documents may take as long as an hour, depending on the complexity, the number of documents, software and other factors. Or it might just take a few minutes. This depends on the circumstances and tasks presented.

All documents selected for training in step four are included in step five’s computer processing. The software studies the documents marked for training, and then analyzes all of the data uploaded onto the review platform. It then ranks all of the documents according to probable relevance (and, as mentioned, according to other binary categories too, such as Highly Relevant and Privileged; it does all of these categories at the same time, but for simplicity purposes here we will just speak of the relevance rankings). It essentially assigns a probable relevance value of from 0.01% to 99.9% to each document in the corpus. (Note, some software uses different ranking values, but this is essentially what it is doing.) A value of 99.9% represents the highest probability that the document matches the category trained, such as relevant, or highly relevant, or privileged. A value of 0.01% means no likelihood of matching. A probability ranking of 50% represents equal likelihood, unless there has been careless over-training on irrelevant documents or other errors have been made. In the middle probability rankings the machine is said to be uncertain as to the document classification.
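As a generic illustration of this kind of probability ranking, the sketch below trains a simple logistic regression text classifier with scikit-learn and ranks a toy corpus by probable relevance. It shows the concept only; it is not the actual algorithm, feature set or software inside Mr. EDR, and the toy texts are invented.

```python
# Generic sketch of probability ranking with a logistic regression text
# classifier, using scikit-learn. Conceptual illustration only; not the
# actual algorithm or features inside Kroll Ontrack's EDR.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: texts the humans selected in step four, with labels.
train_texts  = ["price fixing meeting agenda", "fantasy football pool",
                "draft settlement of fraud claim", "lunch menu for friday"]
train_labels = [1, 0, 1, 0]          # 1 = relevant, 0 = irrelevant

# The corpus to be ranked in step five.
corpus = ["notes from the pricing meeting", "football scores",
          "memo on the fraud settlement", "cafeteria hours"]

vectorizer = TfidfVectorizer()
X_train  = vectorizer.fit_transform(train_texts)
X_corpus = vectorizer.transform(corpus)

model = LogisticRegression()
model.fit(X_train, train_labels)

# predict_proba returns [P(irrelevant), P(relevant)] for each document.
scores = model.predict_proba(X_corpus)[:, 1]
for text, p in sorted(zip(corpus, scores), key=lambda x: -x[1]):
    print(f"{p:.1%}  {text}")
```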

The first few times the AI Ranking step is run, the software predictions as to document categorization are often wrong, sometimes wildly so. It depends on the kind of search and data involved and on the number of documents already classified and included for training. That is why spot-checking and further training are always needed for predictive coding to work properly. That is why predictive coding is always an iterative process.

Step Seven: ZEN Quality Assurance Tests

There has been no change in this step from Version 3.0 to Version 4.0. If you already know 3.0 well, skip to the conclusion. ZEN here stands for Zero Error Numerics. Predictive Coding 4.0 requires quality control activities in all steps, but the efforts peak in this Step Seven. For more details than provided here on the ZEN approach to quality control in document review see ZeroErrorNumerics.com.

In Step Seven a random sample is taken to try to evaluate the recall range attained in the project. The method currently favored is described in detail in Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search, Part One, Part Two and Part Three. Also see: In Legal Search Exact Recall Can Never Be Known.


The ei-Recall test is based on a random sample of all documents to be excluded from the Final Review for possible production. Unlike the ill-fated control set of Predictive Coding 1.0 methodologies, the sample here is taken at the end of the project. At that time the final relevance conceptions have evolved to their final form and therefore much more accurate projections of recall can be made from the sample. The documents sampled can be based on documents excluded by category prediction (i.e. probable irrelevant) and/or by probable ranking of documents with proportionate cut-offs. The focus is on a search for any false negatives (i.e., relevant documents incorrectly predicted to be irrelevant) that are Highly Relevant or otherwise of significance.
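The basic idea can be illustrated with a simplified sketch: sample the discard pile, count the false negatives found in the sample, project that elusion rate onto the whole discard pile, and convert the projection into a recall range. This is only a conceptual illustration with hypothetical numbers and a crude interval, not a full implementation of the ei-Recall method described in the cited articles.

```python
import math

# Conceptual sketch of an elusion-style recall estimate from a random
# sample of the discard pile (documents that will NOT be produced).
# Hypothetical numbers and a simple normal-approximation interval; not
# a full implementation of ei-Recall.

def estimate_recall(true_positives, null_set_size, sample_size,
                    false_negatives_in_sample, z=1.96):
    # Observed elusion rate: fraction of the sampled discard pile that
    # turned out to be relevant (false negatives).
    p = false_negatives_in_sample / sample_size
    half = z * math.sqrt(p * (1 - p) / sample_size)
    lo, hi = max(0.0, p - half), min(1.0, p + half)

    def recall(projected_fn):
        return true_positives / (true_positives + projected_fn)

    # More projected false negatives means lower recall.
    return recall(hi * null_set_size), recall(p * null_set_size), recall(lo * null_set_size)

# Hypothetical project: 9,000 relevant documents found, a discard pile of
# 900,000 documents, and 2 false negatives found in a sample of 1,500.
low, point, high = estimate_recall(9_000, 900_000, 1_500, 2)
print(f"estimated recall about {point:.0%}, in a range of roughly {low:.0%} to {high:.0%}")
```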

Total 100% recall of all relevant documents is said by the professors to be scientifically impossible (unless you produce all documents, which means 0% precision), a myth that the e-Discovery Team shattered in TREC 2015 and again in 2016 in our Total Recall Track experiments. Still, it is very rare, and only happens in relatively simple search and review projects, akin to a straightforward single-plaintiff employment case with clear relevance. In any event, total recall of all relevant documents is legally unnecessary. Perfection – zero error – is a good goal, but never a legal requirement. The legal requirement is reasonable, proportional efforts to find the ESI that is important to resolve the key disputed issues of fact in the case. The goal is to avoid all false negatives of Highly Relevant documents. If this error is encountered, one or more additional iterations of Steps 4, 5 and 6 are required.

In step seven you also test the decision made at the end of step six to stop the training. This decision is evaluated by the random sample, but determined by a complex variety of factors that can be case specific. Typically it is determined by when the software has attained a highly stratified distribution of documents. See License to Kull: Two-Filter Document Culling and Visualizing Data in a Predictive Coding Project, Part One, Part Two and Part Three, and Introducing a New Website, a New Legal Service, and a New Way of Life / Work; Plus a Postscript on Software Visualization.

When the stratification has stabilized you will see very few new documents found as predicted relevant that have not already been human-reviewed and coded as relevant. You essentially run out of documents for step six review. Put another way, your step six no longer uncovers new relevant documents. This exhaustion marker may, in many projects, mean that the rate of newly found documents has slowed, but not stopped entirely. I have written about this quite a bit, primarily in Visualizing Data in a Predictive Coding Project, Part One, Part Two and Part Three. The distribution ranking of documents in a mature project, one that has likely found all relevant documents of interest, will typically look something like the diagram below. We call this the upside down champagne glass, with red relevant documents on top and irrelevant on the bottom.

[Chart: document ranking distribution, round five]

Also see Postscript on Software Visualization where even more dramatic stratifications are encountered and shown.
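One simple way to picture the exhaustion marker is to count, after each round of ranking, how many documents the machine still predicts as probably relevant that no human has yet reviewed. The sketch below is a hypothetical illustration of that check; the data structures are assumptions, not any particular software’s API.

```python
# Hypothetical sketch of the exhaustion check: how many documents does
# the latest ranking still predict as probably relevant that no human
# has yet reviewed? When this count stops shrinking toward zero round
# over round, the stop decision is near. Illustrative assumptions only.

def unreviewed_predicted_relevant(rankings, reviewed_ids, cutoff=0.50):
    return sum(1 for doc_id, p in rankings.items()
               if p >= cutoff and doc_id not in reviewed_ids)

def drop_since_last_round(history):
    """history: counts from successive rounds; how many fewer this round?"""
    if len(history) < 2:
        return None
    return history[-2] - history[-1]
```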

Another key determinant of when to stop is the cost of further review. Is it worth it to continue on with more iterations of steps four, five and six? See Predictive Coding and the Proportionality Doctrine: a Marriage Made in Big Data, 26 Regent U. Law Review 1 (2013-2014) (note: the article was based on the earlier version 2.0 of our methods, where the training was not necessarily continuous). Another criterion in the stop decision is whether you have found the information needed. If so, what is the purpose of continuing a search? Again, the law never requires finding all relevant documents, only reasonable efforts to find the relevant documents needed to decide the important fact issues in the case. This last point is often overlooked by inexperienced lawyers.

Another important quality control technique, one used throughout a project, is the avoidance of all dual tasking, replaced by learned, focused concentration, a flow-state, like an all-absorbing video game, movie, or meditation. Here is a short video I did on the importance of focus in document review.

________

____

Speaking of a relaxed, thought-free flow state, did you know that United States Supreme Court Justice Stephen Breyer is a regular meditator? In a CNN reporter interview in 2011 he said:

For 10 or 15 minutes twice a day I sit peacefully. I relax and think about nothing or as little as possible. … And really I started because it’s good for my health. My wife said this would be good for your blood pressure and she was right. It really works. I read once that the practice of law is like attempting to drink water from a fire hose. And if you are under stress, meditation – or whatever you choose to call it – helps. Very often I find myself in circumstances that may be considered stressful, say in oral arguments where I have to concentrate very hard for extended periods. If I come back at lunchtime, I sit for 15 minutes and perhaps another 15 minutes later. Doing this makes me feel more peaceful, focused and better able to do my work.

Apparently Steve Breyer also sometimes meditates with friends, including legendary Public Interest Lawyer, Professor and meditation promoter, Charles Halpern. Also see Halpern, Making Waves and Riding the Currents (2008) (his interesting autobiography); Charles Halpern on Empathy, Meditation, Barack Obama, Justice and Law (YouTube interview in 2011 with interesting thoughts on judicial selection).

Document review is not as stressful as a Supreme Court oral argument, but it does go on far longer. Everybody needs to relax with a clear mind, and with focused attention, to attain their peak level of performance. That is the key to all quality control. How you get there is your business. For me, in addition to frequent breaks, headphones with music help me get there and stay undistracted and focused. So, sit comfortably, spine erect, and enjoy this moment of ZEN.

_________

For more details on step seven see ZeroErrorNumerics.com.

____

Step Eight: Phased Production

There has been no change in this step from Version 3.0 to Version 4.0. If you already know 3.0 well, skip to the conclusion. This last step is where the relevant documents are reviewed again and actually produced. This step is also sometimes referred to as Second Pass Review. Technically, it has nothing to do with a predictive coding protocol, but for completeness’ sake we needed to include it in the workflow. This final step may also include document redaction, document labeling, and a host of privilege review issues, including double-checking and triple-checking of privilege protocols. These are tedious functions where contract lawyers can be a big help. The actual identification of privileged documents from among the relevant should have been part of the prior seven steps.

Always think of production in e-discovery as phased production. Do not think of making one big document dump. That is old-school paper production style. Start with a small test document production after you have a few documents ready. That will get the bugs out of the system for both you, the producer, and also for the receiving party. Make sure it is in the format they need and they know how to open it. Little mistakes and re-dos in a small test production are easy and inexpensive to fix. Getting some documents to the requesting party also gives them something to look at right away. It can buy you time and patience for the remaining productions. It is not uncommon for a large production to be done in five or more smaller stages. There is no limit so long as the time delay is not overly burdensome.

Multiple productions are normal and usually welcomed by the receiving party. Just be sure to keep them informed of your progress and what remains to be done. Again, step one – Talk – is supposed to continue throughout a project. Furthermore, production of at least some documents can begin very early in the process. It does not have to wait until the last step. It can, for instance, begin while you are still in the iterated steps four, five and six. Just make sure you apply your quality controls and final second-pass reviews to all documents produced. Very early productions during the intensive document training stages may help placate a still distrustful requesting party. It allows them to see for themselves that you are in fact using good relevant documents for training and they need not fear GIGO.

The format of the production should always be a non-issue. This is supposed to be discussed at the initial Rule 26(f) conference. Still, you might want to check again with the requesting party before you select the production format and metadata fields. More and more we see requesting parties that want a PDF format. That should not be a problem. Remember, cooperation should be your benchmark. Courtesy to opposing counsel on these small issues can go a long way. The existence of a clawback agreement and order, including a Rule 502(d) Order, and also a confidentiality agreement and order in some cases, should also be routinely verified before any production is made. This is critical and we cannot overstate its importance. You should never make a production without a 502(d) Order in place, or at least requested from the court. Again, this should be a non-issue. The forms used should be worked out as part of the initial 26(f) meet and greet.

Here is my short, five-minute video summary of this step.

_________

____

After the second-pass review is completed there is still one more inspection, a short third pass. Before delivery of electronic documents we perform yet another quality control check. We inspect the media on which the production is made, typically CDs or DVDs, and do a third review of a few of the files themselves. This is an important quality control check, the last one, done just before the documents are delivered to the requesting party. You do not inspect every document, of course, but you do a very limited spot-check based on judgmental sampling. You especially want to verify that critical privileged documents you previously identified as privileged have in fact been removed, and that redactions have been properly made. Trust but verify. Also check to verify that the order of production is what you expected. You also verify little things that you would do for any paper production, like verifying that the document legends and Bates stamping are done the way you wanted. Even the best vendors sometimes make mistakes, and so too does your team.

You need to be very diligent in protecting your client’s confidential information. It is an ethical duty of all lawyers. It weighs heavily in what we consider a properly balanced, proportional approach. That is why you must take time to do the Production step correctly and should never let yourself be rushed. Here is a short video on my philosophy of proportional balance in legal services, including a discussion of the mentioned final spot check of production CDs.

[Embedded video]

The final work included here is to prepare a privilege log. All good vendor review software should make this into a semi-automated process, and thus slightly less tedious. The logging is typically delayed until after production. Check with local rules on this and talk to the requesting party to let them know it is coming.

One final comment on the e-Discovery Team’s methods. We are very hyper about time management throughout a project, but especially in the last step. Never put yourself in a time bind. Be Proactive. Stay ahead of the curve. Mistakes are made when you have to rush to meet tight production deadlines. You must avoid this. Ask for an extension and move the court if you have to. Better that than make a serious error. Again, produce what you have ready and come back for the rest.

Here is a video I prepared on the importance of good time management to any document review project.

[Embedded video]

[Image: Predictive Coding 4.0 eight-step workflow chart]

Conclusion

Every search expert I have ever talked to agrees that it is just good common sense to find relevant information by using every search method that you can. It makes no sense to limit yourself to any one search method. They agree that multimodal is the way to go, even if they do not use that language (after all, I did make up the term), and even if they do not publicly promote that protocol (they may be promoting software or a method that does not use all methods). All of the scientists I have spoken with about search also agree that effective text retrieval should use some type of active machine learning (what we in the legal world call predictive coding), and not just rely on the old search methods of keyword, similarity and concept type analytics. The combined multimodal use of the old and new methods is the way to go. This hybrid approach exemplifies man and machine working together in an active partnership, a union where the machine augments human search abilities, not replaces them.

The Hybrid IST Multimodal Predictive Coding 4.0 approach described here is still not followed by most e-discovery vendors, including several prominent software vendors. Instead, they rely on just one or two methods to the exclusion of the others. For instance, they may rely entirely on machine-selected documents for training, or even worse, rely entirely on randomly selected documents. They say they do so to keep it simple. It may be simple, but the power and speed given up for that simplicity is not worth it. Others have all types of search, including concept search and related analytics, but they still do not have active machine learning. You probably know who they are by now. This problem will probably be solved soon, so I will not belabor the point.

The users of the old software and old-fashioned methods will never know the genuine thrill known by most search lawyers using AI enhanced methods like Predictive Coding 4.0. The good times roll when you see that the AI you have been training has absorbed your lessons. When you see the advanced intelligence that you helped create kick in to complete the project for you. When you see your work finished in record time and with record results. It is sometimes amazing to see the AI find documents that you know you would never have found on your own. Predictive coding AI in superhero mode can be exciting to watch.

My entire e-Discovery Team had a great time watching Mr. EDR do his thing in the thirty Total Recall Track TREC Topics in 2015. We would sometimes be lost, and not even understand what the search was for anymore. But Mr. EDR knew; he saw the patterns hidden to us mere mortals. In those cases we would just sit back and let him do the driving, occasionally cheering him on. That is when my Team decided to give Mr. EDR a cape and superhero status. He never let us down. It is a great feeling to see your own intelligence augmented and saving you like that. It was truly a hybrid human-machine partnership at its best. I hope you get the opportunity soon to see this in action for yourself.

Our experience in TREC 2016 was very different, but still made us glad to have Mr. EDR around. This time most of the search projects were simple enough to find the relevant documents without his predictive coding superpowers. As mentioned, we verified in test conditions that the skilled use of Tested, Parametric Boolean Keyword Search is very powerful. Keyword search, when done by experts using hands-on testing, and not simply blind Go Fish keyword guessing, is very effective. We proved that in the 2016 TREC search projects. As explained in Part Four of this series, the keyword appropriate projects are those where the data is simple, the target is clear and the SME is good. Still, even then, Mr. EDR was helpful as a quality control assistant. He verified that we had found all of the relevant documents.

Bottom line for the e-Discovery Team at this time is that the use of all methods is appropriate in all projects, even in simple searches where predictive coding is not needed to find all relevant documents. You can still use active machine learning in simple projects as a way to verify the effectiveness of your keyword and other searches. It may not be necessary in the simple cases, but it is still a good search to add to your tool chest. When the added expense is justified and proportional, the use of predictive coding can help assure you, and the other side, that a high quality effort has been made.

[Image: predictive coding chart]

The multimodal approach is the most effective method of search. All search tools should be used, not only Balanced Hybrid – IST active machine learning searches, but also concept and similarity searches, keyword search and, in some instances, even focused linear review. By using some or all search methods, depending on the project and challenges presented, you can maximize recall (the truth, the whole truth) and precision (nothing but the truth). That is the goal of search: effective and efficient. Along the way we must exercise caution to avoid the errors of Garbage In, Garbage Out that can be caused by poor SMEs. We must also guard against the errors and omissions, low recall and low precision, that can arise from substandard software and methods. In our view the software must be capable of all search methods, including active machine learning, and the methods used should take advantage of them all.
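
For readers who want those two measures pinned down, here is a minimal sketch, in Python and with made-up counts, of how recall and precision are computed. It is illustrative only; it is not part of our workflow or of any review platform.

```python
# Illustrative only: recall and precision computed from hypothetical counts.
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all relevant documents that the review actually found."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Share of the documents retrieved that are actually relevant."""
    return true_positives / (true_positives + false_positives)

# Hypothetical project: 9,000 relevant documents found, 1,000 relevant missed,
# and 500 irrelevant documents swept into the production.
print(f"recall    = {recall(9000, 1000):.0%}")    # 90%
print(f"precision = {precision(9000, 500):.2%}")  # 94.74%
```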


Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow – Part Six

October 16, 2016

This is the sixth installment of the article explaining the e-Discovery Team’s latest enhancements to electronic document review using Predictive Coding. Here are Parts One, Two, Three, Four and Five. This series explains the nine insights behind the latest upgrade to version 4.0 and the slight revisions these insights triggered to the eight-step workflow. We have already covered the nine insights. Now we will begin to review the revised eight-step workflow.

The eight-step chart provides a model of the Predictive Coding 4.0 methods. (You may download and freely distribute this chart without further permission, so long as you do not change it.) The circular flows depict the iterative steps specific to the predictive coding features. Steps four, five and six iterate until the active machine training reaches satisfactory levels and thereafter final quality control and productions are done.

Although presented as sequential steps for pedagogical purposes, Predictive Coding 4.0 is highly adaptive to circumstances and does not necessarily follow a rigid linear order. For instance, some of the quality control procedures are used throughout the search and review, and rolling productions can begin at any time.

To fully understand the 4.0 method, it helps to see how it fits into an overall Dual-Filter Culling process. See License to Cull: The Two-Filter Document Culling Method (2015) (see illustrative diagram right). Still more information on predictive coding and electronic document review can be found in the over sixty articles published here on the topic since 2011. Reading helps, but we have found that the most effective way to teach this method, like any other legal method, is by hands-on guidance. Our eight-step workflow can be taught to any legal professional who already has experience with document review by the traditional second-chair type of apprenticeship training.

This final segment of our explanation of Predictive Coding 4.0 will include some of the videos that I made earlier this year describing our document review methods. See Document Review and Predictive Coding: an introductory course with 7 videos and 2,982 words. The first video below introduces the eight-step method. Once you get past my attempt at Star Wars humor in the opening credits of the video you will hear my seven-minute talk. It begins with why I think predictive coding and other advanced technologies are important to the legal profession and how we are now at a critical turning point of civilization.

[Embedded video]

Step One – ESI Communications

Good review projects begin with ESI Communications; they begin with talking. You need to understand and articulate the disputed issues of fact. If you do not know what you are looking for, you will never find it. That does not mean you know of specific documents. If you knew that, it would not be much of a search. It means you understand what needs to be proven at trial and what documents will have an impact on the judge and jury. It also means you know the legal bounds of relevance, including, especially, Rule 26(b)(1).

[Image: 2016 relevance scope chart]

ESI Communications begin and end with the scope of the discovery, relevance and related review procedures. The communications are not only with opposing counsel or other requesting parties, but also with the client and the e-discovery team assigned to the case. These Talks should be facilitated by the lead e-Discovery specialist attorney assigned to the case. But they should include the active participation of the whole team, including all trial lawyers not otherwise very involved in the ESI review.

The purpose of all of this Talk is to give everyone an idea as to the documents sought and the confidentiality protections and other special issues involved. Good lines of communication are critical to that effort. This first step can sometimes be difficult, especially if there are many new members to the group. Still, a common understanding of relevance, the target searched, is critical to the successful outcome of the search. This includes the shared wisdom that the understanding of relevance will evolve and grow as the project progresses.

We need to Talk to understand what we are looking for. What is the target? What is the information need? What documents are relevant? What would a hot document look like? A common understanding of relevance by a review team, of what you are looking for, requires a lot of communication. Silent review projects are doomed to failure. They tend to stagnate and do not enjoy the benefits of Concept Drift, where a team’s understanding of relevance is refined and evolves as the review progresses. Yes, the target may move, and that is a good thing. See: Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three.

Review projects are also doomed where the communications are one way, lecture-down projects where only the SME talks. The reviewers must talk back, must ask questions. The input of reviewers is key. Their questions and comments are very important. Dialogue and active listening are required for all review projects, including ones with predictive coding.

You begin with analysis and discussions with your client, your internal team, and then with opposing counsel, as to what it is you are looking for and what the requesting party is looking for. The point is to clarify the information sought, the target. You cannot just stumble around and hope you will know it when you find it (and yet this happens all too often). You must first know what you are looking for. The target of most searches is the information relevant to disputed issues of fact in a case or investigation. But what exactly does that mean? If you encounter unresolvable disputes with opposing counsel on the scope of relevance, which can happen during any stage of the review despite your best efforts up-front, you may have to include the Judge in these discussions and seek a ruling.

Here is my video explaining the first step of ESI Communications.

[Embedded video]

“ESI Discovery Communications” is about talking to your review team, including your client and key witnesses; it is about talking to opposing counsel; and, eventually, if need be, talking to the judge at hearings. Friendly, informal talk is a good method to avoid the tendency to polarize and demonize “the other side,” to build walls and be distrustful and silent.

The amount of distrust today between attorneys is at an all-time high. This trend must be reversed. Mutually respectful talk is part of the solution. Slowing things down helps too. Do not respond to a provocative text or email until you calm down. Take your time to ponder any question, even if you are not upset. Take your time to research and consult with others first. This point is critical. The demand for instant answers is never justified, nor required under the rules of civil procedure. Think first and never respond out of anger. We are all entitled to mutual respect. You have a right to demand that. So do they.

This point about not actually speaking with people in real time, in person, or by phone or video, is, to some extent, generational. Many younger attorneys seem to have an inherent loathing of the phone and speaking out loud. They let their thumbs do the talking. (This is especially true in e-discovery, where the professionals involved tend to be very computer oriented, not people oriented. I know because I am like that.) Meeting in person in real time is distasteful to many, not just Gen X. Many of us prefer to put everything in emails and texts and tweets and posts, etc. That may make it easier to pause to reflect, especially if you are loath to say in person that you do not know and will need to get back to them on that. But real-time talking is important to full communication. You may need to force yourself into real-time interpersonal interactions. Many people are better at real-time talk than others, just like many are better at fast comprehension of documents than others. It is often a good idea for a team to have a designated talker, especially when it comes to speaking with opposing counsel or the client.

In e-discovery, where the knowledge levels are often extremely different, with one side knowing more about the subject than the other, the first step of ESI Communications or Talk usually requires patient explanations. ESI Communications often require some amount of educational effort by the attorneys with greater expertise. The trick is to do that without being condescending or too pedantic, and, in my case at least, without losing your patience.

[Image: Predictive Coding 4.0 eight-step workflow chart]

Some object to the whole idea of helping opposing counsel by educating them, but the truth is, this helps your clients too. You are going to have to explain everything when you take a dispute to the judge, so you might as well start upfront. It helps save money and moves the case along. Trust building is a process best facilitated by honest, open talk.

I use the term Talk to invoke the term listen as well. That is one reason we also refer to the first step as “Relevance Dialogues,” because that is exactly what it should be, a back and forth exchange. Top-down lecturing is not intended here. Even when a judge talks, where the relationship is truly top down, the judge always listens before rendering his or her decision. You are given the right to be heard at a hearing, to talk and be listened to. Judges listen a lot and usually ask many questions. Attorneys should do the same. Never just talk to hear the sound of your own voice. As Judge David Waxse likes to say, talk to opposing counsel as if the judge were listening.

The same rules apply when communicating about discovery with the judge. I personally prefer in-person hearings, or at least telephonic, as opposed to just throwing memos back and forth. This is especially true when the memorandums have very short page limits. Dear Judges: e-discovery issues are important and can quickly spiral out of control without your prompt attention. Please give us the hearings and time needed. Issuing easy orders that just split the baby will do nothing but pour gas on a fire.

In my many years of lawyering I have found that hearings and meetings are much more effective than exchanging papers. Dear brothers and sisters in the BAR: stop hating, stop distrusting and vilifying, and start talking to each other. That means listening too. Understand the other side. Be professional. Try to cooperate. And stop taking extreme positions that assume the judge will just split the baby.

It bears emphasis that by Talk in this first step we intend dialogue. A true back and forth. We do not intend argument, nor winners and losers. We do intend mutual respect. That includes respectful disagreement, but only after we have heard each other out and understood our respective positions. Then, if our talks with the other side have reached an impasse, at least on some issues, we request a hearing from the judge and set out the issues for the judge to decide. That is how our system of justice and discovery are designed to work. If you fail to talk, you not only doom the document review project, you doom the whole case to unnecessary expense and frustration.

This dialogue method is based on a Cooperative approach to discovery that was promoted by the late, great Richard Braman of The Sedona Conference. Cooperation is not only a best practice, but is, to a certain extent, a minimum standard required by rules of professional ethics and civil procedure. The primary goal of these dialogues for document review purposes is to obtain a common understanding of the e-discovery requests and reach agreement on the scope of relevancy and production.

ESI Communications in this first step may, in some cases, require disclosure of the actual search techniques used, which is traditionally protected by work product. The disclosures may also sometimes include limited disclosure of some of the training documents used, typically just the relevant documents. See Judge Andrew Peck’s 2015 ruling on predictive coding, Rio Tinto v. Vale, 2015 WL 872294 (S.D.N.Y. March 2, 2015). In Rio Tinto Judge Peck wisely modified somewhat his original views on the issue of disclosure stated in Da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012) (approved and adopted in Da Silva Moore v. Publicis Groupe, 2012 WL 1446534, at *2 (S.D.N.Y. Apr. 26, 2012)). Judge Peck no longer thinks that parties should necessarily disclose any training documents, and may instead:

… insure that training and review was done appropriately by other means, such as statistical estimation of recall at the conclusion of the review as well as by whether there are gaps in the production, and quality control review of samples from the documents categorized as non-responsive. See generally Grossman & Cormack, Comments, supra, 7 Fed. Cts. L.Rev. at 301-12.

The Court, however, need not rule on the need for seed set transparency in this case, because the parties agreed to a protocol that discloses all non-privileged documents in the control sets. (Attached Protocol, ¶¶ 4(b)-(c).) One point must be stressed — it is inappropriate to hold TAR to a higher standard than keywords or manual review. Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.

Id. at *3. Also see Rio Tinto v. Vale, Stipulation and Order Re: Revised Validation and Audit Protocols for the use of Predictive Coding in Discovery, 14 Civ. 3042 (RMB) (AJP), (order dated 9/2/15 by Maura Grossman, Special Master, and adopted and ordered by Judge Peck on 9/8/15).

Judge Peck here follows the current prevailing view on disclosure, which I also endorse. Disclose the relevant documents used in active machine learning, but not the irrelevant documents used in training. If there are borderline, grey area documents classified as irrelevant, you may need to disclose these types of documents by description, not actual production. Again, talk to the requesting party about where you are drawing the line. Talk about the grey area documents that you encounter. If they disagree, ask for a ruling before your training is complete.

[Image: grey area disclosure chart]

The goals of Rule 1 of the Federal Rules of Civil Procedure (just, speedy and inexpensive) are impossible to attain in any phase of litigation, not just discovery, unless attorneys communicate with each other. The parties may hate each other and refuse to talk. That sometimes happens. But the attorneys must be above the fray. That is a key purpose and function of an attorney in a dispute. It is sad that so many attorneys do not seem to understand that. If you are faced with such an attorney, my best advice is to lead by example, document the belligerence and seek the help of your presiding judge.

Although Talk to opposing counsel is important, even more important is talking within the team. It is an important method of quality control and efficient project management. Everyone needs to be on the same page of relevance and discoverability. Work needs to be coordinated. Internal team Talk needs to be very close. Although a Vulcan mind meld might be ideal, it is not really necessary. Still, during a project a steady flow of talk, usually in the form of emails or chats, is normal and efficient. Clients should never complain about time spent communicating to manage a document review project. It can save a tremendous amount of money in the long run, so long as it is focused on the task at hand.

Step Two – Multimodal ECA

Multimodal Early Case Assessment – ECA – summarizes the second step in our 8-step work flow. We used to call the second step “Multimodal Search Review.” It is still the same activity, but we tweaked the name to emphasize the ECA significance of this step. After we have an idea of what we are looking for from ESI Communications in step one, we start to use every tool at our disposal to try to find the relevant documents. Every tool that is, except for active machine learning. Our first look at the documents is our look, not the machine’s. That is not because we do not trust the AI’s input. We do. It is because there is no AI yet. The predictive coding only begins after you feed training documents into the machine. That happens in step four.

[Image: Predictive Coding 4.0 eight-step workflow chart]

Our Multimodal ECA step two does not take that long, so the delay in bringing in our AI is usually short. In our experiments at TREC in 2015 and 2016 under the auspices of NIST, where we skipped steps three and seven to save time, and necessarily had little ESI Communications in step one, we would often complete simple document reviews of several hundred thousand documents in just a few hours. We cannot match these results in real-life legal document review projects because the issues in lawsuits are usually much more complicated than the issues presented by most topics at TREC. Also, we cannot take the risks of making mistakes in a real legal project that we could take in an academic event like TREC.

Again, the terminology revision to say Multimodal ECA is more a change of style than substance. We have always worked in this manner. The name change is just to better convey the idea that we are looking for the low hanging fruit, the easy to find documents. We are getting an initial assessment of the data by using all of the tools of the search pyramid except for the top tier active machine learning. The AI comes into play soon enough in steps four and five, sometimes as early as the same day.

[Image: revised search pyramid]

I have seen projects where key documents are found during the first ten minutes of looking around. Usually the secrets are not revealed so easily, but it does happen. Step two is the time to get to know the data, run some obvious searches, including any keyword requests from opposing counsel. You use the relevant and irrelevant documents you find in step two as the documents you select in step four to train the AI.
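
To make the idea of a tested, parametric Boolean keyword search concrete, here is a toy sketch in Python. It is not the search feature of any particular review platform; the documents, terms and simple AND/OR/NOT parameters are all hypothetical, and a real platform would search an index rather than scan text.

```python
# A toy sketch of a tested, parametric Boolean keyword search over a small
# in-memory collection. Real review platforms use indexes and richer syntax;
# this only illustrates the AND / OR / NOT logic being tested.
from typing import Dict, List, Sequence

def boolean_search(docs: List[Dict], all_of: Sequence[str] = (),
                   any_of: Sequence[str] = (),
                   none_of: Sequence[str] = ()) -> List[Dict]:
    """Return the documents whose text satisfies the Boolean parameters."""
    hits = []
    for doc in docs:
        text = doc["text"].lower()
        if (all(term in text for term in all_of)
                and (not any_of or any(term in text for term in any_of))
                and not any(term in text for term in none_of)):
            hits.append(doc)
    return hits

# Hypothetical documents and search terms, purely for illustration.
docs = [
    {"id": 1, "text": "Draft settlement agreement re: the pricing dispute"},
    {"id": 2, "text": "Fantasy football picks for the office pool"},
]
print(boolean_search(docs, all_of=["settlement"], none_of=["football"]))
# -> [{'id': 1, 'text': 'Draft settlement agreement re: the pricing dispute'}]
```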

In the process of this initial document review you start to get a better understanding of the custodians, their data and relevance. This is what early case assessment is all about. You will find the rest of the still hidden relevant documents in the iterated rounds of machine training and other searches that follow. Here is my video description of step two.

[Embedded video]

Although we speak of searching for relevant documents in step two, it is important to understand that many irrelevant documents are also incidentally found and coded in that process. Active machine learning does not work by training on relevant documents alone. It must also include examples of irrelevant documents. For that reason we sometimes actively search for certain kinds of irrelevant documents to use in training. One of our current research experiments with Kroll Ontrack is to determine the best ratios between relevant and irrelevant documents for effective document ranking. See TREC reports at Mr. EDR as updated from time to time. At this point we have that issue nailed.
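
The sketch below illustrates that general point with generic, off-the-shelf tools: a simple text classifier trained on both relevant and irrelevant examples and then used to rank unreviewed documents by probability of relevance. It is not Kroll's software, our training ratios, or our protocol; the documents and labels are invented for the example.

```python
# A generic illustration (scikit-learn) of training on both relevant and
# irrelevant examples, then ranking unreviewed documents by predicted
# probability of relevance. All documents and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "pricing agreement discussed with competitor",    # coded relevant
    "meeting notes on market allocation proposal",    # coded relevant
    "holiday party catering menu and headcount",      # coded irrelevant
    "IT ticket: password reset request",              # coded irrelevant
]
train_labels = [1, 1, 0, 0]   # 1 = relevant, 0 = irrelevant

vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(train_texts), train_labels)

# Rank the unreviewed collection, highest probability of relevance first.
unreviewed = ["draft pricing agreement with competitor", "Friday lunch order"]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```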

The multimodal ECA review in step two is carried out under the supervision of the Subject Matter Experts on the case. They make the final decisions where there is doubt concerning the relevance of a document or document type. The SME role is typically performed by a team, including the partner in charge of the case – the senior SME – senior associates, and the e-Discovery specialist attorney(s) assigned to the case. It is, or should be, a team effort, at least in most large projects. As previously described, the final arbiter on scope is the senior SME, who in turn is acting as the predictor of the court’s views. The final, final authority is always the Judge. The chart below summarizes the analysis of the SME and judge on the discoverability of any document. See Predictive Coding 4.0, Part Five.

[Image: 2016 relevance scope chart]

When I do a project, acting as the e-Discovery specialist attorney for the case, I listen carefully to the trial lawyer SME as he or she explains the case. By extensive Q&A the members of the team understand what is relevant. We learn from the SME. It is not exactly a Vulcan mind-meld, but it can work pretty well with a cohesive team.  Most trial lawyers love to teach and opine on relevance and their theory of the case.

[Image: Helmuth Karl Bernhard von Moltke (General Moltke the Elder)]

Although a good SME team communicates and plans well, they also understand, typically from years of experience, that the intended relevance scope is like a battle plan before the battle. As the famous German military strategist, General Moltke the Elder said: No battle plan ever survives contact with the enemy. So too no relevance scope plan ever survives contact with the corpus of data. The understanding of relevance will evolve as the documents are studied, the evidence is assessed, and understanding of what really happened matures. If not, someone is not paying attention. In litigation that is usually a recipe for defeat. See Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three.

The SME team trains and supervises the document review specialists, aka contract review attorneys, who usually then do a large part of the manual reviews (step six), and few if any searches. Working with review attorneys is a constant iterative process where communication is critical. Although I sometimes use an army-of-one approach where I do everything myself (that is how I did the EDI Oracle competition and most of the TREC topics; see Army of One: Multimodal Single-SME Approach To Machine Learning), my preference now is to use two or three reviewers to help with the document review. With good methods, including culling methods, and good software, it is rarely necessary to use more reviewers than that. With the help of strong AI, say that included in Mr. EDR, we can easily classify a million or so documents for relevance with that size team. More reviewers than that may well be needed for complex redaction projects and other production issues, but not for a well-designed first-pass relevance search.

One word of warning when using document reviewers: it is very important for all members of the SME team to have direct and substantial contact with the actual documents, not just the reviewers. For instance, everyone involved in the project should see all hot documents found in any step of the process. It is especially important for the SME trial lawyer at the top of the expert pyramid to see them, but that is rarely more than a few hundred documents, often just a few dozen. Beyond that, the top SME need only see the novel and grey area documents that are encountered, where it is unclear on which side of the relevance line they should fall in accord with the last instructions. Again, the burden on the senior, and often technologically challenged, SME attorneys is fairly light under these Version 4.0 procedures.

The SME team relies on a primary SME, who is typically the trial lawyer in charge of the whole case, including all communications on relevance to the judge and opposing counsel. Thereafter, the head SME is sometimes only consulted on an as-needed basis to answer questions and make specific decisions on the grey area documents. There are always a few uncertain documents that need elevation to confirm relevance, but as the review progresses, their number usually decreases, and so the time and attention of the senior SME decreases accordingly.

Step Three – Random Prevalence

There has been no change in this step from Version 3.0 to Version 4.0. The third step, which is not necessarily chronological, is essentially a computer function with statistical analysis. Here you create a random sample and analyze the results of expert review of the sample. Some review is thus involved in this step and you have to be very careful that it is correctly done. This sample is taken for statistical purposes to establish a baseline for quality control in step seven. Typically prevalence calculations are made at this point. Some software also uses this random sampling selection to create a control set. As explained at length in Predictive Coding 3.0, we do not use a control set because it is so unreliable. It is a complete waste of time and money and does not produce reliable recall estimates. Instead, we take a random sample near the beginning of a project solely to get an idea on Prevalence, meaning the approximate number of relevant documents in the collection.

[Image: Predictive Coding 4.0 eight-step workflow chart]

Unless we are in a very rushed situation, such as in the TREC projects, where we would do a complete review in a day or two, or sometimes just a few hours, we like to take the time for the sample and prevalence estimate.

It is all about getting a statistical idea as to the range of relevant documents that likely exist in the data collected. This is very helpful for a number of reasons, including proportionality analysis (importance of the ESI to the litigation and cost estimates) and knowing when to stop your search, which is part of step seven. Knowing the number of relevant documents in your dataset can be very helpful, even if that number is a range, not exact. For example, you can know from a random sample that there are between four thousand and six thousand relevant documents. You cannot know there are exactly five thousand relevant documents. See: In Legal Search Exact Recall Can Never Be Known. Still, knowledge of the range of relevant documents (red in the diagram below) is helpful, albeit not critical to a successful search.

[Image: prevalence estimate with binomial and Gaussian distributions]
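
As a rough sketch of the arithmetic, the Python below turns a hypothetical random-sample review into a prevalence range and a document-count range using the simple normal approximation. The sample size, hit count and collection size are invented for the example, and, as discussed further below, a binomial-based calculation is preferable when prevalence is very low.

```python
# A minimal sketch: estimating a prevalence range from a random sample using
# the simple normal approximation. All numbers are hypothetical.
import math

collection_size = 1_000_000   # documents collected (hypothetical)
sample_size = 1_533           # documents randomly sampled and reviewed
sample_hits = 38              # documents coded relevant in the sample

p_hat = sample_hits / sample_size          # point projection of prevalence
z = 1.96                                   # 95% confidence level
margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)

low, high = max(0.0, p_hat - margin), p_hat + margin
print(f"prevalence: {p_hat:.2%}, roughly {low:.2%} to {high:.2%}")
print(f"relevant documents: about {int(low * collection_size):,} "
      f"to {int(high * collection_size):,} of {collection_size:,}")
# With these made-up numbers: roughly 1.70% to 3.26% prevalence,
# or about 17,000 to 32,600 relevant documents.
```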

In step three an SME is only needed to verify the classifications of any grey area documents found in the random sample. The random sample review should be done by one reviewer, typically your best contract reviewer. They should be instructed to code as Uncertain any documents that are not obviously relevant or irrelevant based on their instructions and step one. All relevance codings should be double checked, as well as Uncertain documents. The senior SME is only consulted on an as-needed basis.

Document review in step three is limited to the sample documents. Aside from that, this step is a computer function and mathematical analysis. Pretty simple after you do it a few times. If you do not know anything about statistics, and your vendor is also clueless on this (rare), then you might need a consulting statistician. Most of the time this is not necessary and any competent Version 4.0 vendor expert should be able to help you through it.

It is not important to understand all of the math, just that random sampling produces a range, not an exact number. If your sample size is small, then the range will be very wide. If you want to cut your range, known in statistics as the confidence interval, in half, you have to quadruple your sample size. This is a general rule of thumb that I explained in tedious mathematical detail several years ago in Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be Using Random Sampling By 2022. Our Team likes to use a fairly large sample size of about 1,533 documents, which creates a confidence interval of plus or minus 2.5%, subject to a confidence level of 95% (meaning the true value will lie within that range 95 times out of 100). More information on sample size is summarized in the graph below. Id.

[Image: sample size vs. confidence interval graph]
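
Here is a minimal sketch of that square-root rule of thumb, using the standard worst-case formula (95% confidence, 50% prevalence assumed). It reproduces the roughly plus or minus 2.5% interval for a sample of about 1,533 documents and shows the interval halving when the sample is quadrupled.

```python
# The rule of thumb: the worst-case confidence interval half-width shrinks
# with the square root of the sample size, so quadrupling the sample roughly
# halves the interval. Assumes 95% confidence and worst-case 50% prevalence.
import math

def worst_case_margin(sample_size: int, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval at 50% prevalence."""
    return z * math.sqrt(0.25 / sample_size)

for n in (383, 1_533, 6_132):   # roughly quadrupling the sample each time
    print(f"n = {n:>5}: +/- {worst_case_margin(n):.2%}")
# n =   383: +/- 5.01%
# n =  1533: +/- 2.50%
# n =  6132: +/- 1.25%
```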

The picture below this paragraph illustrates a data cloud where the yellow dots are the sampled documents from the grey dot total, and the hard to see red dots are the relevant documents found in that sample. Although this illustration is from a real project we had, it shows a dataset that is unusual in legal search because the prevalence here was high, between 22.5% and 27.5%. In most data collections searched in the law today, where the custodian data has not been filtered by keywords, the prevalence is far less than that, typically less than 5%, maybe even less than 0.5%. The low prevalence increases the range size, the uncertainties, and requires a binomial calculation adjustment to determine the statistically valid confidence interval, and thus the true document range.

[Image: data cloud showing the random sample (yellow dots) and the relevant documents found in it (red dots)]

For example, in a typical legal project with a prevalence of a few percent, it would be common to see a range of between 20,000 and 60,000 relevant documents in a 1,000,000 document collection. Still, even with this very large range, we find it useful to at least have some idea of the number of relevant documents that we are looking for. That is what the Baseline step can provide to you, nothing more, nothing less.
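
For the low prevalence situation just described, here is a minimal sketch of the kind of binomial-based adjustment mentioned above, using the Wilson score interval. The sample numbers are hypothetical and are not meant to reproduce the figures in the text; note that the resulting interval is no longer symmetric around the point projection.

```python
# A minimal sketch of a binomial-based adjustment: the Wilson score interval,
# which behaves better than the simple normal approximation when prevalence
# is very low. Sample numbers are hypothetical, not any project's results.
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple:
    """95% confidence interval for a proportion, Wilson score method."""
    p = hits / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

collection_size = 1_000_000
low, high = wilson_interval(hits=8, n=1_533)   # point projection about 0.52%
print(f"prevalence range: {low:.2%} to {high:.2%}")        # ~0.26% to ~1.03%
print(f"relevant documents: roughly {int(low * collection_size):,} "
      f"to {int(high * collection_size):,}")               # ~2,600 to ~10,300
```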

As mentioned, your vendor can probably help you with these statistical estimates. Just do not let them tell you that it is one exact number. It is always a range. The one number approach is just a shorthand for the range. It is simply a point projection near the middle of the range. The one number point projection is the top of the typical probability bell curve of a 95% confidence level distribution. The top is just one possibility, albeit slightly more likely than the end points. The true value could be anywhere within the range.

To repeat, the step three prevalence baseline number is always a range, never just one number. Going back to the relatively high prevalence example, the bell curve below shows a point projection of 25% prevalence, with a range of 22.5% to 27.5%, creating a range of between 225,000 and 275,000 relevant documents. This is shown below.

[Image: bell curve showing a 25% point projection with standard deviations]

The important point that many vendors and other “experts” often forget to mention is that you can never know exactly where within that range the true value may lie. Plus, there is always a small possibility, 5% when using a sample size based on a 95% confidence level, that the true value may fall outside of that range. The collection may, for example, contain only 200,000 relevant documents. This means that even with a high prevalence project with datasets that approach the Normal Distribution of 50% (here meaning half of the documents are relevant), you can never know that there are exactly 250,000 relevant documents just because that is the mid-point or point projection. You can only know that there are between 225,000 and 275,000 relevant documents, and even that range may be wrong 5% of the time. Those uncertainties are inherent limitations to random sampling.

Shame on the vendors who still perpetuate that myth of certainty. Lawyers can handle the truth. We are used to dealing with uncertainties. All trial lawyers talk in terms of probable results at trial, and risks of loss, and often calculate a case’s settlement value based on such risk estimates. Do not insult our intelligence by a simplification of statistics that is plain wrong. Reliance on such erroneous point projections alone can lead to incorrect estimates as to the level of recall that we have attained in a project. We do not need to know the math, but we do need to know the truth.

The short video that follows will briefly explain the Random Baseline step, but does not go into the technical details of the math or statistics, such as the use of the binomial calculator for low prevalence. I have previously written extensively on this subject. See for instance:

Byte and Switch

If you prefer to learn stuff like this by watching cute animated robots, then you might like: Robots From The Not-Too-Distant Future Explain How They Use Random Sampling For Artificial Intelligence Based Evidence Search. But be careful, their view is version 1.0 as to control sets.

Thanks again to William Webber and other scientists in this field who helped me out over the years to understand the Bayesian nature of statistics (and reality).

[Embedded video]

To be continued …

