This is part two of this blog. I suggest you read part one first, or this will not make much sense.
What Happened When Plaintiff Started Carrying Out Its Keyword Agreement
I am sure you can guess what happened in Progressive after plaintiff’s counsel started trying to follow the keyword agreement. We have all seen this many times before. In Judge Leen’s own words, here is what happened:
Progressive and the FDIC–R agreed to search terms to be applied to ESI documents. Applying the search terms to the 1.8 million documents collected reduced potentially responsive documents culled by the search terms to approximately 565,000 documents. Progressive’s counsel began manually reviewing these 565,000 documents, but quickly determined that manual review was too time intensive and expensive. As a result, Progressive began exploring alternative approaches to the review and production of ESI.
In other words, Plaintiff’s counsel finally figured out that the keywords they had agreed to only served to reduce the documents to be manually reviewed from 1,800,000 to 565,000. I suspect this large number of hits came as a surprise to them. Had they only sampled and tested, they would have known better than to agree to this protocol, so I assume they did not.
After plaintiff’s counsel began the manual review of the 565,000 documents, they no doubt realized how long review takes. My guess is that the reviewers could only review about 75 files per hour. At that rate it would take about 7,533 attorney hours to complete the review. Assuming a fairly low contract review lawyer rate of only $40 per hour, that means a cost for manual review of over $300,000, which does not include the substantial additional law firm attorney fees for supervision of this review. My guess is that the estimated cost to complete the manual review after keyword culling was between $350,000 and $400,000, possibly far more if the review speed was slower or the rates higher. Every project is different for a variety of reasons.
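The back-of-the-envelope estimate above can be sketched in a few lines of Python. The review speed (75 files per hour) and the $40 hourly rate are my guesses from the paragraph above, not figures from the opinion, and the function name is purely illustrative.

```python
def manual_review_cost(num_docs, docs_per_hour, hourly_rate):
    """Return (hours, cost) for a straight linear manual review.

    All inputs are assumptions: the opinion does not state the actual
    review speed or the contract lawyer rate.
    """
    hours = num_docs / docs_per_hour
    return hours, hours * hourly_rate


# Guessed inputs from the text: 565,000 keyword hits, 75 files/hour, $40/hour.
hours, cost = manual_review_cost(565_000, 75, 40.0)
print(f"{hours:,.0f} attorney hours, about ${cost:,.0f} before supervision fees")

# Sensitivity check: a slower review speed drives the cost up quickly.
slow_hours, slow_cost = manual_review_cost(565_000, 50, 40.0)
print(f"at 50 files/hour: {slow_hours:,.0f} hours, about ${slow_cost:,.0f}")
```

Changing either guess moves the total substantially, which is why the range I give is only an estimate.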
Expert Arrives to Try to Save the Day
If I were a client hearing those costs for the first time, I would be upset. I would demand that my attorney do something. I suspect that is why the expert was finally brought into this case, to help reduce the projected costs. The expert, of course, recommended predictive coding, no doubt as part of a multimodal search effort. I would assume that after the client and plaintiff’s counsel heard the expert, they were happy to learn that the use of predictive coding would greatly reduce the projected expenses. I am sure they were also pleased to learn this method would be more effective at finding relevant documents and privileged documents. There was only one little problem: they had already agreed to manual review of the documents with keywords, and this agreement had been reduced to a court order.
Here is how Judge Leen describes what happened:
Progressive began utilizing predictive coding techniques to review ESI without the Defendants’ agreement to amend the parties’ stipulated ESI protocol Order (Dkt.# 67), and without seeking leave of the court to amend the ESI order.
Judge Peggy Leen says they began using predictive coding techniques, and makes a big point that they did so without the court’s permission. But if you dig deeper into the opinion, it is apparent that plaintiff never actually got into full predictive coding training and review. They just began the process, and then went to the court for approval to carry it out.
Of course they had to begin the process in order to gather data as to how effective it might be and how it would work. How could they otherwise present their argument to change the protocol? In fact, all that plaintiff’s counsel did, no doubt under the supervision of their new expert, was to perform the initial random samples to determine prevalence and set up a control set of test documents. Somehow this point seems to have been lost in the opinion, which makes it seem like they did the entire predictive coding review before coming to the court. That appears to be a misunderstanding on the part of the court. In my standard protocol, shown in the diagram below, plaintiff had only done the first preliminary steps one, two, and three. They had only just begun. See LegalSearchScience.com and my Computer Assisted Review page.
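For readers unfamiliar with what an initial random sample for prevalence involves, here is a minimal sketch. The sample size and the number of relevant documents found are hypothetical, since the opinion does not report them, and the interval uses the simple normal approximation rather than whatever method the expert actually applied.

```python
import math


def estimate_prevalence(sample_size, relevant_in_sample, z=1.96):
    """Point estimate and approximate 95% interval for prevalence.

    Uses the normal (Wald) approximation to the binomial; z=1.96
    corresponds to roughly 95% confidence.
    """
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# HYPOTHETICAL sample: 1,500 randomly drawn documents, 150 coded relevant.
p, low, high = estimate_prevalence(1_500, 150)

# Project the sample result onto the 565,000 keyword-hit collection.
collection_size = 565_000
projected = [round(collection_size * x) for x in (low, p, high)]
print(f"prevalence {p:.1%} (about {low:.1%} to {high:.1%})")
print(f"projected relevant documents: {projected[0]:,} to {projected[2]:,}")
```

A sample like this is cheap to review, which is why it makes sense to run it before asking a court to change the protocol.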
The plaintiff in Progressive, according to my reading of the technical facts stated in the opinion, had not even started the iterative machine training, which is the heart of all predictive coding. They were far away from the all-important step seven of second pass final review. If they had done the whole predictive coding process, all eight steps, and then come to the court for approval, I could understand why the request would be denied as too late. But, as it is, the denial seems contrary to the dictates of Rule One to always aim for just, speedy and inexpensive discovery. Rule 1, FRCP.
You could not come to a court to seek a new plan without providing concrete facts that the old plan was not working, and at least an outline of a new, better plan. I assume that is why the plaintiff’s expert in Progressive performed step two, and brought in a subject matter expert to review the samples. This allowed them to determine the prevalence of the keyword culled collection. According to the court’s opinion, Plaintiff’s new experts found that only approximately 55,765 documents were likely relevant out of the total 565,000. That shows that the keyword search was a total disaster. It means that about 90% of the documents with keywords were irrelevant. That is a pathetic 10% precision rate.
That does not surprise me. Indeed, any search Dilbert could have predicted that. Ten percent precision is bad, but I have seen worse. I once had a case much like this one, where I was brought in after the fact. I then found that almost 98% of the documents with keywords were irrelevant. This kind of evidence helped persuade the court to adopt a new plan. But that did not work here, and the only real clues we have are somewhat vague allegations in the Court order that plaintiff was not transparent enough, and did not cooperate. To me this smacks of a successful smoke and mirrors spin by counsel. That is a constant danger in highly technical areas like predictive coding unless you are very good at simplified, yet accurate explanations.
The expert showed opposing counsel, and later the court, that the old plan was not working. He showed how imprecise the keyword search was, how expensive it would be to complete the search, and presented an alternative plan using predictive coding. Opposing counsel did not care about plaintiff’s expense, nor precision. They just said a deal’s a deal and refused any compromise. Then the issue of predictive coding went to the court, where, as we know, it was promptly shot down.
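The precision figure is simple arithmetic on the two numbers reported in the opinion. A short sketch, with an illustrative function name of my own, makes the calculation explicit:

```python
def precision(relevant_hits, total_hits):
    """Fraction of retrieved (keyword-hit) documents that are relevant."""
    return relevant_hits / total_hits


# Figures from the opinion: about 55,765 likely relevant out of 565,000 hits.
relevant_hits = 55_765
total_hits = 565_000

p_keyword = precision(relevant_hits, total_hits)
irrelevant_hits = total_hits - relevant_hits

print(f"precision: {p_keyword:.1%}")
print(f"irrelevant hits: {irrelevant_hits:,} ({1 - p_keyword:.1%})")
```

Note that this measures precision only. Nothing in these two numbers tells us the recall of the keyword search, because that would require knowing how many relevant documents were left behind in the rest of the 1.8 million.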
Alleged Defects of Proposed Predictive Coding Plan
Here is Judge Peggy Leen’s explanation as to why she refused to allow plaintiff to switch to predictive coding:
Had the parties worked with their e-discovery consultants and agreed at the onset of this case to a predictive coding based ESI protocol, the court would not hesitate to approve a transparent mutually agreed upon ESI protocol. However, this is not what happened.
Aside from the too late rationale, Judge Leen suggests that although she favors predictive coding, the details of the particular proposal made by plaintiff’s expert were somehow deficient. The opinion does not really spell out these deficiencies, at least not to my satisfaction. Judge Leen states:
Progressive’s proposal would relieve it of the burden of manual review of ESI according to the ESI protocol it stipulated to and allow it to adopt a predictive coding alternative to a small subset of the universe of ESI collected. Its proposal would also give its counsel exclusive responsibility for training the predictive coding software, and fails to comply with all of the best practices recommended by the Equivio software program. Progressive proposes a “do-over” of its own invention that lacks transparency and cooperation regarding the search methodologies applied.
Failure to comply with all of a software provider’s best practices? Please! They make software, not legal methods, not best practices. They are not lawyers. They do not set legal standards. They are engineers and businessmen. They make and sell software for profit. They cannot provide legal opinions on best practices. The expert retained by plaintiff in Progressive, by contrast, was a lawyer. In fact, he was one of the top experts in the field. What does an Israeli software company like Equivio have to do with legal standards in the U.S. justice system? Software is just a tool for legal search, much like a judge has a gavel as a tool. If a gavel came with recommendations on how to use it, would a judge’s call to order be invalid because she did it her own way? Her way might be far better than the default instructions.
So called “best practices” by non-lawyer vendors are nothing more than suggestions for beginners. No vendor would ever purport to provide legal advice, nor opine that its software can only effectively be used in one way. I doubt very much that Equivio does so. They are too sophisticated for that. They understand full well the importance of legal experts. In fact, best practice number one of Equivio’s 2013 paper, Ten Essential Best Practices in Predictive Coding, is: “Choose the expert with due consideration.” The Equivio Best Practices paper wisely concludes by saying:
In conclusion, it should be noted that predictive coding is a dynamic, rapidly developing arena. The best practices described here will undoubtedly continue to evolve. In documenting them as they have taken shape over the past year or so, the intention is not to define a universal textbook template for the generic predictive coding project, but to provide a platform from which it will be possible to develop, refine and create new and better practices as the e-discovery industry continues to assimilate the game-changing technology of predictive coding.
Equivio’s article on Best Practices was written by Warwick Sharp, the Vice President of Marketing and Business Development for this software corporation based in Rosh Haayin, Israel. Mr. Sharp has a master’s degree in Political Thought from Tel-Aviv University. He has it right in stating that his company builds a platform, and others, namely legal and scientific experts, develop, refine and create new and better practices to use this game-changing technology. If details had been provided in Judge Leen’s opinion as to the alleged failures to comply with all of Equivio’s best practices, then perhaps we could understand what this comment was supposed to mean, but there were none. Again I suspect clever spin doctor arguments at work here by opposing counsel.
As to Judge Leen’s “lack of transparency” comment, we are again left guessing at what this means. It seemed like a pretty detailed disclosure to me. I am just guessing here, but usually the big issue in predictive coding is whether to disclose irrelevant training documents. My thinking on this has evolved since the Da Silva Moore case protocol, where all irrelevant training documents were to be disclosed. Based on my own research, and that of others, only grey area irrelevant documents need to be disclosed for the process to work. These borderline documents are the only ones that make a difference in the training, as I have discussed in detail before. See, e.g., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part Three at the subheadings Disclosure of Irrelevant Training Documents and Conclusions Regarding Inconsistent Reviews.
Probably the experts disagreed on this point. There is still not widespread consensus on the issue of disclosure of irrelevant training documents. No doubt defense counsel wanted to look at everything, even though the science does not support that. Maybe that was the key issue, maybe not. The opinion does not really say. If that was the issue, then I would have preferred the court to rule on it and allow predictive coding to be used according to what the judge thought was a reasonable disclosure. Give the proponent the choice to go forward on that basis or not.
The only specific objection made by defense counsel that I could see was that the predictive coding was based on a keyword filtered subset of 565,000 documents, and not the full collection of 1,800,000. This objection by the defense was noted in Judge Leen’s opinion, but it was not analyzed, nor included as a specific basis for the denial. Judge Leen just used the dispute on this issue as an example of plaintiff’s alleged failure to cooperate. This two-filter culling approach is controversial primarily because the first filter, keyword, usually does such a poor job of recall. It filters out too much. The only real justification for this approach is cost. Most experts agree a filter of some kind should be used for the pool of documents to be subject to predictive coding, but it should usually not be a simplistic keyword filter as proposed here. For instance, non-text documents should be filtered out since predictive coding now uses text analytics. Other types of documents should be filtered out too for best results.
Again, I would have preferred a ruling on this two-filter issue, perhaps after hearing evidence, or at least a detailed argument with proffers. Then the proponent should be given the opportunity to go forward on the basis of the ruling or not. But that did not happen. Instead we get an order like this:
Progressive is unwilling to engage in the type of cooperation and transparency that its own e-discovery consultant has so comprehensibly and persuasively explained is needed for a predictive coding protocol to be accepted by the court or opposing counsel as a reasonable method to search for and produce responsive ESI. Progressive is also unwilling to apply the predictive coding method it selected to the universe of ESI collected. The method described does not comply with all of Equivio’s recommended best practices. The court agrees with the FDIC–R that approving Progressive’s predictive coding proposal, or for that matter, the FDIC-r’ competing predictive coding protocol, will only result in more disputes. It will also further delay completion of discovery in this 2011 filed case.
Court’s Punitive Order in Progressive
The court ordered the plaintiff to produce all of the 565,000 “hit” documents within fourteen days without further review. The court justifies this seemingly harsh result with the following palliations:
Under these circumstances, the court will require Progressive to produce the “hit” documents to the FDIC–R within fourteen days without further review. The court recognizes that requiring production of all of the “hit” documents will likely result in the production of documents not responsive to the FDIC–R’s discovery requests. However, the parties’ stipulated ESI protocol adopted this approach as one of two alternatives for Progressive’s production. Progressive elected and then abandoned the second option-to manually review and produce responsive ESI documents. It abandoned the option it selected unilaterally, without the FDICR’s acquiescence or the court’s approval and modification of the parties’ stipulated ESI protocol. Adopting the FDIC’s proposal of producing the “hit” documents will shift the cost of review to the FDIC–R. The FDIC–R has committed to devote the resources required to review the documents as expeditiously as possible and estimates the process could be completed in about a month by tapping into the resources of the numerous firms involved in these related actions who also have a substantially similar ESI protocol entered in their actions. It will allow discovery, which has been stalled for many months while this dispute is pending, to move forward, and reduce future disputes about Progressive’s ESI production.
But what about the right of plaintiff to protect its privileged communications in the forced production of 565,000 documents, you might well ask? Judge Leen seems mindful of the apparent harshness of her order, and offers this as a consolation:
Progressive may apply privilege filters to the 565,000 documents retrieved by use of the search terms and withhold more likely privileged documents identified by use of privilege filters. Progressive shall serve a privileged document log which fully complies with Rule 26(b)(5)(A) for any documents withheld or redacted as privileged or subject to protection as trial-preparation material. To reduce the burden of preparation of a privileged document log, the court will permit Progressive to serve three privileged documents logs for one-third of the documents withheld or redacted at thirty day, sixty day, and ninety day intervals.
No word yet on whether plaintiff has appealed this order. I suspect they will. But in the meantime you can bet plaintiff’s counsel are working feverishly on privilege filters to try to protect their client’s privileged communications. Clawback orders can never un-ring a bell.
Progressive shows the dangers of not involving expert search counsel at the very beginning of a case. This may seem a little self-serving on my part, but the message in Progressive is loud and clear: hire a predictive coding search consultant as soon as you realize you have a big data case. If you are going to have to review several hundred thousand documents, then the case is probably big enough to use predictive coding. Bring in a legal search expert. If there are more than a million documents, as in Progressive, then you obviously should use AI-enhanced review. Progressive shows that it is ill-advised to wait to hire a legal search consultant. Do not be penny wise and pound foolish. Bring one of us in right away. If you wait too long, it may be too late for us to help you.
Progressive also shows, once again, that keyword search alone does not work to find targeted documents. It is not a magic incantation. In Progressive only one out of ten of the documents with keywords was at all relevant. That is very imprecise. It creates a huge waste of reviewer time doing second review. Every predictive coding project I have ever worked on has had far better precision than that. My experience is not unique. Other tests show the recall is also better when you include active machine learning. It is better than using keyword and manual review alone. I might also add that similarity and concept type searches are also good. Use all of the tools provided by modern search and review software.
Finally, Progressive shows that, timeliness issues aside, judges understand the efficacy of predictive coding. Judge Leen said she believed that predictive coding was better than the keyword protocol adopted. She said she would have approved it, if only it had been brought to her earlier.
Do not make the same mistake as plaintiff’s counsel in Progressive. Hire a predictive coding expert sooner rather than later, and so avoid the “if only” Vegas blues.