Here is the e-Discovery Team’s second most interesting e-discovery opinion of 2016: Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, No. 2685-11, 8393-12, 2016 WL 4204067 (T.C. July 13, 2016). It was written by Ronald L. Buch, Judge of the United States Tax Court.
SECOND – Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue
The Dynamo Holdings case is of great interest to the e-Discovery Team because it is all about our favorite topic, the thing we do best, Predictive Coding. Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, No. 2685-11, 8393-12, 2016 WL 4204067 (T.C. July 13, 2016). Judge Ronald Buch’s opinion contains an excellent discussion of the law and application of predictive coding.
Recall that this is the second opinion by Judge Buch on predictive coding in this case. In 2014, in what we call Dynamo One, the Court approved the use of predictive coding, making the Tax Court one of the first courts to do so after Judge Peck’s Da Silva Moore decision. Dynamo Holdings Limited P’ship v. Commissioner, 143 T.C. 183 (2014). In Dynamo One, Judge Buch first stated his views on predictive coding:
Predictive coding is an expedited and efficient form of computer-assisted review that allows parties in litigation to avoid the time and costs associated with the traditional, manual review of large volumes of documents.
Id. at 190. In Dynamo One, Judge Buch granted the Commissioner’s motion and compelled the plaintiffs to produce the backup tapes, but also granted the plaintiffs’ request for permission to use predictive coding. Id. at 194. In Dynamo Two, Judge Buch considered objections to the plaintiffs’ predictive coding work and denied the Commissioner’s motion to make the plaintiffs redo the document review using keyword search. Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, No. 2685-11, 8393-12, 2016 WL 4204067 (T.C. July 13, 2016).
With this background, Dynamo Holdings Two had a good chance at being the top-ranked case of 2016, but for two things. First, the top of the opinion contains this statement: “Pursuant to Tax Court Rule 50(f), orders shall not be treated as precedent, except as otherwise provided.” Not the kind of statement you want to see on a great opinion like this, but there it is.
The second reason Dynamo Two has to settle for number two is that the predictive coding methodology used by the parties in Dynamo was totally bonkers. This is not the judge’s fault, of course. Indeed, the author of the second most interesting e-Discovery opinion of 2016 did a good job of explaining the crazy, random-based predictive coding protocol the attorneys in this case came up with. Also see Judge Buch’s Order Concerning ESI Discovery (Dec. 11, 2015), which sets forth their ill-informed compromise protocol in more detail.
The e-Discovery Team’s interest in Dynamo Holdings is in spite of our misgivings concerning the way predictive coding was used in this case back in 2014. We cannot get past the pathetic random-based methods for selecting training documents, not to mention the old-fashioned version 1.0 methods they used. Suffice it to say that predictive coding can be done far, far better than it was here, and that the Team, like the Defendant in Dynamo Holdings, has serious misgivings about the predictive coding based document review done by the Plaintiffs. So do many others. See: Tredennick & Gricks, Discussion About Dynamo Holdings: Is 43% Recall Enough?
Our objection is not to the predictive coding software, or to the idea of predictive coding; it is to the specific method of use. The Dynamo Holdings method had very little in common with today’s state of the art methodology. See PREDICTIVE CODING 4.0. Still, putting this objection aside, it is the second most interesting e-discovery opinion of 2016.
The Plaintiffs in this case, called Petitioners in Tax Court lingo, did the document production using a predictive coding method that the Defendant, the Commissioner of the Internal Revenue Service, obviously had a hand in developing in a semi-cooperative fashion. (Reminds me of the old saying that a camel is a horse designed by a committee.) The attorneys for both sides were obviously responsible for the mess of a compromise method they came up with. If they had any actual predictive coding experts to advise them, and I assume they did, it is obvious that their advice was not followed. See, e.g.: Predictive Coding 3.0 article, part one and part two (part one describes the history and part two describes the method; superseded by the Predictive Coding 4.0 article, but still good background); Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One, Part Two, Part Three, and Part Four.
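To make the difference between random training selection and active machine learning selection concrete, here is a minimal Python sketch of three ways to pick the next batch of training documents from a hypothetical dict of model scores (each document’s estimated probability of relevance): purely random selection, relevance-feedback selection of the highest-ranked documents, and uncertainty selection of the documents nearest the 0.5 boundary. The function names, document IDs and scores are made up for illustration. This is not the protocol used in Dynamo Holdings, and it is not the full Predictive Coding 4.0 workflow, which involves far more than a selection rule.

```python
import random

# Hypothetical model output: document ID -> estimated probability of relevance.
Scores = dict[str, float]


def select_random(scores: Scores, k: int, seed: int = 0) -> list[str]:
    """Random training selection: every unreviewed document is equally likely."""
    rng = random.Random(seed)
    return rng.sample(sorted(scores), k)


def select_top_ranked(scores: Scores, k: int) -> list[str]:
    """Relevance-feedback selection: review the documents the model ranks highest."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_uncertain(scores: Scores, k: int) -> list[str]:
    """Uncertainty selection: review the documents closest to the 0.5 decision boundary."""
    return sorted(scores, key=lambda doc: abs(scores[doc] - 0.5))[:k]


if __name__ == "__main__":
    scores = {"d1": 0.95, "d2": 0.10, "d3": 0.55, "d4": 0.45, "d5": 0.80, "d6": 0.02}
    print("random:     ", select_random(scores, 2))
    print("top-ranked: ", select_top_ranked(scores, 2))
    print("uncertain:  ", select_uncertain(scores, 2))
```

With these made-up scores, the top-ranked rule picks d1 and d5 and the uncertainty rule picks d3 and d4, while the random rule picks whatever the seed dictates, whether or not the model has learned anything.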
When the Petitioners completed the document review with predictive coding, the Defendant basically objected to the work and said it was bad, unreasonably inadequate, and “incomplete.” The Commissioner argued that the Petitioners’ attorneys should keep on looking for relevant documents, but this time use keyword search, not predictive coding (as if the two were mutually exclusive, which they are not). We agree with the defense that the Petitioners’ work was bad due to a poor choice of method. It probably was an incomplete response, as the Commissioner put it. But we also agree with Judge Buch’s refusal to order the Petitioners to redo or supplement the project. The Commissioner’s attorney helped create this error, so we think some estoppel applies here. But the main reason we agree with Judge Buch’s denial of any relief is the Commissioner’s failure to prove that the review was inadequate, or to propose an alternative method that would be better.
Here is how Judge Ronald Buch, obviously a very intelligent, well-informed judge, described the dispute resolved by this opinion:
The quality of that response [by petitioners] is now before us. Using a process described in more detail below, petitioners responded to the discovery requests by using predictive coding. The Commissioner, believing the response to be incomplete, served petitioners with a new discovery request asking for all documents containing any of a series of search terms. (Those same search terms had been used in a Boolean search during the predictive coding process to identify how many documents in the electronic records had each term.) Petitioners objected to this new discovery request as duplicative of the previous discovery responses made through the use of predictive coding. On June 17, 2016, the Commissioner filed a motion under Tax Court Rules 72(b)(2) to compel the production of documents responsive to the Boolean search that were not produced through the use of predictive coding. The petitioners object.
Now we get into the excellent language used by Judge Buch to describe the new technology, language that we fully endorse, but with a caveat that methods differ and some, as we see in Dynamo Holdings, are downright defective.
When responding to a document request, technology has rendered the traditional approach to document review impracticable. The traditional method is labor intensive, with people reviewing documents to discern what is (or is not) responsive, with the responsive documents then reviewed for privilege, and with the responsive and non-privileged documents being produced. When reviewing documents in the dozens, hundreds, or low thousands, this worked fine. But with the advent of electronic recordkeeping, documents no longer number in the mere thousands, and various electronic search methods have developed.
When electronic records are involved, perhaps the most common technique that is employed is to begin with keyword searches or Boolean searches to a defined universe of documents. Then, the responding party typically reviews the results of those searches to identify what, in fact, is responsive to the request.
Implicit in this approach is the fact that some of the documents that are responsive to the word or Boolean search are responsive, while others are not.
An emerging approach, and the approach authorized in this case in our Opinion at 143 T.C. 183, is to use predictive coding to identify those documents that are responsive. A few key points of that Opinion are worth highlighting.
First, the Court authorized the responding party (petitioners) to use predictive coding, but the Court did not, in either its Opinion or its subsequent Order of September 17, 2014, mandate how the parties proceed from that point. …
Second, the Court held open the issue of whether the resulting document production would be sufficient, expressly stating “If, after reviewing the results, respondent believes that the response to the discovery request is incomplete, he may file a motion to compel at that time.” Id. at 189, 194. To state the obvious, (1) it is the obligation of the responding party to respond to the discovery, and (2) if the requesting party can articulate a meaningful shortcoming in that response, then the requesting party can seek relief. We turn now to those two points.
The opinion then goes into the tedious task of describing the way predictive coding was used here. As mentioned, there is no educational value in reviewing that, except to say: do not use this so-called method. It was all wrong, and the discussion by both sides on recall and precision is incorrect and does not follow bona fide statistical analysis. See, e.g.: Predictive Coding 3.0 article, part one and part two (part one describes the history and part two describes the method; superseded by the Predictive Coding 4.0 article, but still good background); Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search – Part One, Part Two and Part Three; In Legal Search Exact Recall Can Never Be Known; Concept Drift and Consistency: Two Keys To Document Review Quality, part one, part two and part three.
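The basic idea behind stating recall as a range rather than a point can be sketched in a few lines of Python. This minimal sketch is in the spirit of the ei-Recall approach cited above, but it is not a reproduction of it. It assumes the number of relevant documents produced is known, draws a random sample from the withheld documents (the “null set”), puts a Wilson score interval (our choice here; other binomial intervals could be used) on the elusion rate, and converts the low and high elusion rates into a recall range. All function names and numbers are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95% confidence."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def recall_range(produced_relevant: int, null_set_size: int,
                 sample_size: int, relevant_in_sample: int) -> tuple[float, float]:
    """Recall = TP / (TP + FN), with FN bracketed by the elusion interval on the null set."""
    low_rate, high_rate = wilson_interval(relevant_in_sample, sample_size)
    fn_low = low_rate * null_set_size    # fewest relevant documents plausibly left behind
    fn_high = high_rate * null_set_size  # most relevant documents plausibly left behind
    # More missed documents means lower recall, so the high FN bound gives the low recall bound.
    return (produced_relevant / (produced_relevant + fn_high),
            produced_relevant / (produced_relevant + fn_low))


if __name__ == "__main__":
    # Hypothetical: 9,000 relevant documents produced; 400,000 documents withheld;
    # a random sample of 1,500 withheld documents turns up 6 relevant ones.
    low, high = recall_range(9_000, 400_000, 1_500, 6)
    print(f"recall somewhere between {low:.0%} and {high:.0%}")
```

With those made-up inputs the sketch reports recall of roughly 72% to 92%. A single “recall” figure quoted from the same project would hide that twenty-point spread.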
Judge Buch’s bottom line on the argument in Dynamo Two was correct, and in line with that of the e-Discovery Team, when he assumed that the predictive coding done in this case was “flawed,” but held that “the question remains whether any relief should be afforded.” It was a flawed method all right, but the Defendant failed to show good cause to justify the relief sought: an expensive do-over using keyword search.
Judge Buch correctly surmised that the Defendant’s request for relief was predicated upon two false premises, which he colorfully calls “myths.”
Respondent’s motion is predicated on two myths.
The first is the myth of human review. As noted in The Sedona Conference Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery: “It is not possible to discuss this issue without noting that there appears to be a myth that manual review by humans of large amounts of information is as accurate and complete as possible – perhaps even perfect – and constitutes the gold standard by which all searches should be measured.” 15 Sedona Conf. J. 214, 230 (2014). This myth of human review is exactly that: a myth. Research shows that human review is far from perfect. Several studies are summarized in Nicholas M. Pace & Laura Zakaras, RAND Corp., Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012) at 55. To summarize even further, if two sets of human reviewers review the same set of documents to identify what is responsive, research shows that those reviewers will disagree with each [other] on more than half of the responsiveness claims. As the RAND report concludes:
Taken together, this body of research shows that groups of human reviewers exhibit significant inconsistency when examining the same set of documents for responsiveness under conditions similar to those in large-scale reviews. Is the high level of disagreement among reviewers with similar backgrounds and training reported in all of these studies simply a function of the fact that determinations of responsiveness or relevance are so subjective that reasonable and informed people can be expected to disagree on a routine basis? Evidence suggests that this is not the case. Human error in applying the criteria for inclusion, not a lack of clarity in the document’s meaning or ambiguity in how the scope of the production demand should be interpreted, appears to be the primary culprit. In other words, people make mistakes, and, according to the evidence, they make them regularly when it comes to judging relevance and responsiveness.
Id. at 58. (Indeed, even keyword searches are flawed. One study summarized in Moore v. Publicis Groupe & MSL Grp., 287 F.R.D. 182, 191 (S.D.N.Y. 2012), found that the average recall rate based on a keyword review was only 20%.)
The second myth is the myth of a perfect response. The Commissioner is seeking a perfect response to his discovery request, but our Rules do not require a perfect response. Instead, the Tax Court Rules require that the responding party make a “reasonable inquiry” before submitting the response. Specifically, Rule 70(f) requires the attorney to certify, to the best of their knowledge formed after a “reasonable inquiry,” that the response is consistent with our Rules, not made for an improper purpose, and not unreasonable or unduly burdensome given the needs of the case. Rule 104(d) provides that “an evasive or incomplete * * * response is to be treated as a failure to * * * respond.” But when the responding party is signing the response to a discovery demand, he is not certifying that he turned over everything, he is certifying that he made a reasonable inquiry and to the best of his knowledge, his response is complete.
Likewise, “the Federal Rules of Civil Procedure do not require perfection.” Moore, 287 F.R.D. at 191. Like the Tax Court Rules, the Federal Rule of Civil Procedure 26(g) only requires a party to make a “reasonable inquiry” when making discovery responses.
The fact that a responding party uses predictive coding to respond to a request for production does not change the standard for measuring the completeness of the response. Here, the words of Judge Peck, a leader in the area of e-discovery, are worth noting:
One point must be stressed – it is inappropriate to hold TAR [technology assisted review] to a higher standard than keywords or manual review. Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.
Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 129 (S.D.N.Y. 2015).
We agree with this holding in Dynamo Two and with the black letter rule that only reasonable efforts are required, not perfection. That applies to all human endeavors, including every duty imposed by the law. The real issue is whether the predictive coding method was so bad here that its use was per se unreasonable. Proof of unreasonable efforts in any document review project requires proof that actual relevant documents were overlooked and not produced. The defense did not do that; it just said there were documents with keywords in them that were missed. It did not go on to say that these documents were highly relevant. Apparently the defense here did not grasp a basic fact: the mere presence of a keyword in a document does not automatically make that document relevant.
The defense also failed to use experts to attack the dreadful method used here to incorporate active machine learning into the document search. We suspect that is because the IRS Commissioner’s attorneys were part of the problem. It was a practical estoppel.
The opinion by Judge Buch in Dynamo Two is important, and very interesting, because it shows that any method used, be it one that uses predictive coding or keywords, or a method like the e-Discovery Team’s, where all methods – multimodal – are employed, including active machine learning (predictive coding), will be presumed reasonable until the challenging party proves otherwise. Again, proof of unreasonableness would, in our view, require two things: proof that relevant documents were missed, namely highly relevant documents whose omission would matter, and expert proof that the method used was flawed, including statistical evidence, with suggestions for a reasonable effort. None of this evidence and testimony was provided in Dynamo Holdings.
That argument is an outgrowth of what specialists living in the Sedona bubble sometimes call the “dancing ponies” argument. Relevant documents could be chosen by any method, even by a system where every document that dancing ponies happen to step on is considered relevant. (The related image is one where a large stack of paper documents, typically exams, is tossed in the air at the top of a stairway, and the papers that reach the bottom of the stairs are deemed relevant (or, if exams, graded with an “A”). Many students today suspect their professors of using this grading method.) Dancing ponies, or stair tossing, are truly ridiculous methods, irrational in the extreme, but if they work, so the argument goes, then the method does not matter. The dancing ponies argument then goes on to assume that whether a method works or not can be determined by statistical methods. Tredennick & Gricks, Discussion About Dynamo Holdings: Is 43% Recall Enough? (Gricks: “As long as validation evinces (sic – I think he meant evidences) a reasonable production, that should be the end of the inquiry.”). The latter assumption, however, is predicated on the proper use of statistics for validation, which requires recall to be stated as a probability range, not a point, and even then it is suspect. See In Legal Search Exact Recall Can Never Be Known (e-Discovery Team, Dec. 2014).
Even when the range and uncertainties inherent in random sampling are understood, the use of statistics alone to verify the reasonableness of efforts is questionable. There is just too much uncertainty, especially when low prevalence datasets are involved. Examination of the exact methods used must be included in any defense of process. Dancing pony methods must be attacked and shown for what they are: pseudoscience. At the same time, proof that key or highly relevant documents were missed should be required. That is true, or should be, for all methods, including keyword search and linear review, and also predictive coding. Judge Peck and Judge Buch are right, “it is inappropriate to hold TAR [technology assisted review] to a higher standard than keywords or manual review.” Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, No. 2685-11, 8393-12, 2016 WL 4204067 (T.C. July 13, 2016).
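To see why low prevalence makes the sampling uncertainty so much worse, here is a minimal Python sketch, assuming a hypothetical collection of 1,000,000 documents and a simple random sample of 1,500. It uses the Wilson score interval (one common choice among several binomial intervals) to turn the number of relevant documents found in the sample into a range for the whole collection. The function names and figures are illustrative only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% confidence at z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def relevant_count_range(hits: int, sample_size: int, population: int) -> tuple[float, float]:
    """Project the sample interval onto the whole collection."""
    low, high = wilson_interval(hits, sample_size)
    return low * population, high * population


if __name__ == "__main__":
    # Hypothetical prevalence of about 10%, 1% and 0.13% in a 1,000,000-document collection.
    for hits in (150, 15, 2):
        low, high = relevant_count_range(hits, 1_500, 1_000_000)
        print(f"{hits:>3} relevant in sample -> {low:,.0f} to {high:,.0f} overall "
              f"(high/low ratio {high / low:.1f})")
```

With the same sample size, the high end of the range is only about 1.4 times the low end at 10% prevalence, but more than ten times the low end when just two relevant documents turn up in the sample.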
Still, inherent in this equality statement is the underlying premise that all document review methods should be held to a standard of reasonable efforts. In an important case, where proportionality factors favor it, the standard of care should be high. The courts have, in general, been somewhat lax in this department so far, but we think that Dynamo Two is a herald of things to come. In the future we will see predictive coding cases, along with keyword and linear review cases, where the court holds the standard of proportional, reasonable efforts has not been met.
Document review is the essence of all e-discovery. It must be done right for the truth, the whole truth and nothing but the truth to be found. Just because you use some predictive coding in your document review does not mean that the review was done right. There are just as many ways to screw up predictive coding as there are to screw up keyword search or linear review. We conclude our review of Dynamo Holdings Two with the inverse of the statement made by Judge Andrew Peck and approved by Judge Buch:
It is inappropriate to hold TAR [technology assisted review] to a lower standard than keywords or manual review. Doing so discourages parties from using TAR properly, with expert advice, active machine learning training methods that have been proven to be effective, and statistically correct sample verifications.
Predictive coding is not a magic pill. It is a powerful algorithm that brings active machine learning, a form of artificial intelligence, into legal document review. This requires a high degree of skill to be done properly. Someday this point will be made in a case like Dynamo Holdings where unreasonable, unskilled predictive coding methods are used. We foresee that the objecting party will not complain about the use of predictive coding, but rather about HOW it was used. They will likely suggest an alternative predictive coding method, such as the e-Discovery Team’s Predictive Coding 4.0, including quality control and quality assurance tests, sampling, and statistical analysis of Recall, Precision and F1. If they make this argument, and buttress it with a showing that important ESI was missed by the first amateur attempt, then we think the objection will be sustained and a redo required.
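For readers new to these metrics, the standard definitions of Recall, Precision and F1 fit in a few lines of Python. The counts below are hypothetical, and in a real project each count would itself be an estimate subject to the sampling uncertainty discussed above.

```python
from dataclasses import dataclass


@dataclass
class ReviewCounts:
    true_positives: int   # relevant documents produced
    false_positives: int  # irrelevant documents produced
    false_negatives: int  # relevant documents missed


def precision(c: ReviewCounts) -> float:
    """Share of the produced documents that are relevant."""
    return c.true_positives / (c.true_positives + c.false_positives)


def recall(c: ReviewCounts) -> float:
    """Share of all relevant documents that were produced."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def f1(c: ReviewCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    counts = ReviewCounts(true_positives=800, false_positives=200, false_negatives=400)
    print(f"precision {precision(counts):.2f}, recall {recall(counts):.2f}, F1 {f1(counts):.2f}")
```

With these made-up counts the sketch prints precision 0.80, recall 0.67 and F1 0.73. F1 is pulled toward the weaker of the two numbers, which is why a production with decent precision but poor recall still scores poorly.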
We suspect that such a ruling is still a few years down the road. In the meantime we should all exercise constant vigilance against bad science and even worse methods. When opposing counsel says they plan to use predictive coding, the appropriate response is: Great, what type of method will you use? When I am asked that, I hand them my 97-page description, Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow. Others may use that as well. You can call it the standard e-Discovery Team method. Slight variations are expected to fit the particular assignment. In the future I expect many experts will come up with explanations and detailed descriptions of the methods they use. Right now I am the only one putting it out there. Still, I talk to lots of these folks, and if they ever do get around to writing it up, it will not be very different from our Hybrid Multimodal method. Among the top experts in the field, including scientists and professors, there is widespread agreement on basic methods.