U.S. Magistrate Judge G. Michael Harvey is the successor on the court to the position held by Magistrate Judge John Facciola, who is famous in e-discovery circles (shown here). Judge Harvey’s latest Order on e-Discovery is his best so far. Oxbow Carbon & Minerals LLC v. Union Pacific Railroad Company, No. 11-cv-1049 (PLF/GMH), 2017 WL 4011136 (D.D.C. Sept. 11, 2017).
Oxbow contains an especially good analysis and discussion of the six proportionality factors in revised Rule 26(b)(1), FRCP. It also contains useful metrics on cost that establish a new comparative benchmark for simple email discovery. It is a must-read in the growing list of cases on proportionality and document review.
Background on the Koch Antitrust Case
Oxbow is a very large antitrust case brought by five related companies that mine and sell coal and petroleum coke. The suit is against the Union Pacific and BNSF Railway Company. The complaint alleges a conspiracy by the railroads to fix the prices of shipping coal. Over $1.5 Billion in damages are sought, including treble damages under the Sherman Antitrust Act.
Oxbow is a big case by anyone’s standards. For that reason it should not be surprising that the plaintiffs’ opposition to the discovery request on proportionality grounds was rejected. The final projected cost of the disputed discovery was $85,000. Still, the ruling in Oxbow is interesting for at least two reasons.
First, the scholarly opinion by Judge Harvey contains an excellent collection and discussion of case law on proportionality and discovery. You will want to keep this case handy as a research starter on a number of issues, especially if your case is in D.D.C.
Judge Harvey, shown here in a courtroom sketch as a young DOJ trial attorney, continues the scholarly writing tradition of his predecessor, Judge John Facciola. This has long been one of the top courts in the country. The DOJ is also usually known for its high-quality lawyers. We are in good hands with judges like G. Michael Harvey.
Immediately prior to Michael Harvey becoming a magistrate judge, he served for over seven years in the U.S. Attorney’s National Security Section “where he prosecuted high-profile cases involving espionage, terrorism, export violations, alien smuggling, unauthorized disclosure of classified information, threats against high-ranking officials, and cybercrime.” Impressive. His current term as Magistrate Judge will expire on February 12, 2023.
The second reason the e-Discovery Team likes the Oxbow opinion is because it shares some of the usually hidden costs involved in a big document review project. The producing party here used what is currently the most popular search method among non-e-discovery lawyers, one that is antiquated for a project of this size, namely negotiated keyword filtering.
This method hearkens back to the earliest days of electronic document review in the late eighties and early nineties. In most projects today by e-discovery experts, if keyword filtering is used at all to exclude data from review, it is based on tested and refined keywords, not mere negotiated keywords. Typically negotiated keywords are tested and refined on the unfiltered data of one or two custodians before being used to filter out the rest.
Careful study of the facts in the opinion allows us to calculate certain metrics on costs and effectiveness, from both the economic and the information science perspectives. We can not only calculate costs, but also the relevance prevalence rates and precision. We can also make some educated guesses on recall. These metrics can assist in quality control. As we will see, the producing plaintiffs’ team actually did quite well on Precision – 65%, but we can only guess on the Recall rate. We can watch that unfold as the case progresses. It is still in its early discovery stages.
The information distilled from Oxbow provides a current benchmark for your own doc review efforts. I will do the analysis for you to facilitate comparisons. That can get tricky. (Even some of the numbers stated in this opinion do not add up.) Comparisons can help you to prove that your own efforts were reasonable, or to challenge the reasonability of the responding party’s efforts.
In Oxbow the parties successfully completed phase one email discovery on the email of nineteen custodians from among the plaintiffs’ employees. The Order here concerns a requested twentieth custodian, what I would call the beginning of phase two discovery. That new custodian was William I. Koch, Oxbow’s founder, CEO, and principal owner.
Yes, Bill Koch is one of the infamous billionaire conservative activist Koch brothers, although he keeps a much lower political profile than his brothers Charles and David Koch. According to news reports the 2017 Oxbow employee Christmas party was held at Mar-a-Lago, where Koch is a longtime member, and President Trump made an appearance at the party. Need I say more?
Plaintiffs argued that the email request from William Koch was one step too far, that Koch had nothing of value or interest to say on the subjects in the antitrust litigation. This position was unsupported by the facts.
[U]nder the amended Rule 26, discovery must be relevant and “proportional to the needs of the case.” Fed. R. Civ. P. 26(b)(1). To determine whether a discovery request is proportional, courts weigh the following six factors: “(1) the importance of the issues at stake in this action; (2) the amount in controversy; (3) the parties’ relative access to relevant information; (4) the parties’ resources; (5) the importance of the discovery in resolving the issues; and (6) whether the burden or expense of the proposed discovery outweighs its likely benefit.” Williams v. BASF Catalysts, LLC, Civ. Action No. 11-1754, 2017 WL 3317295, at *4 (D.N.J. Aug. 3, 2017) (citing Fed. R. Civ. P. 26(b)(1)); Arrow Enter. Computing Solutions, Inc. v. BlueAlly, LLC, No. 5:15-CV-37-FL, 2017 WL 876266, at *4 (E.D.N.C. Mar. 3, 2017); FTC v. Staples, Inc., Civ. Action No. 15-2115 (EGS), 2016 WL 4194045, at *2 (D.D.C. Feb. 26, 2016).
“[N]o single factor is designed to outweigh the other factors in determining whether the discovery sought is proportional,” and all proportionality determinations must be made on a case-by-case basis. Williams, 2017 WL 3317295, at *4 (internal citations omitted); see also Bell v. Reading Hosp., Civ. Action No. 13-5927, 2016 WL 162991, at *2 (E.D. Pa. Jan. 14, 2016). To be sure, however, “the amendments to Rule 26(b) do not alter the basic allocation of the burden on the party resisting discovery to—in order to successfully resist a motion to compel—specifically object and show that . . . a discovery request would impose an undue burden or expense or is otherwise objectionable.” Mir v. L-3 Commc’ns Integrated Sys., L.P., 319 F.R.D. 220, 226 (N.D. Tex. 2016). …
The Court is unpersuaded by Oxbow’s arguments. In its briefing, Oxbow declines to address any of the other proportionality factors highlighted in Rule 26—namely, the importance of the issues at stake in this action, the amount in controversy, the parties’ relative access to relevant information, the parties’ resources, or the importance of the discovery in resolving the issues in this case, see Fed. R. Civ. P. 26(b)(1)—stressing only that the burden and cost of complying with Defendants’ request would outweigh its likely benefit. Id. Weighing the six Rule 26 proportionality factors, however, demonstrates that adding Koch as a custodian of documents to be searched for material responsive to Defendants’ discovery requests in this matter will be neither unduly burdensome nor unreasonably expensive in light of the facts of this case. Likewise, the Court finds that the instant circumstances do not warrant shifting the costs of doing so to Defendants. Accordingly, Defendants’ motion to compel will be granted and Plaintiffs shall be ordered to produce all remaining responsive documents from Koch’s file, the cost of which Plaintiffs shall bear.
Judge Harvey then reviewed each of the six factors and explained how each applied in this case. In commenting on the fifth factor, The Importance of the Discovery in Resolving the Issues, Judge Harvey held:
The Court appreciates that Koch’s files do not appear to contain as a high a proportion of responsive documents as the files of custodians who dealt exclusively with Oxbow’s coal and petcoke business, but it strains reason to suggest that the principal owner and CEO of a company, who has publicly commented on the importance and magnitude of litigation to which his company is a party and in which the financial health of his company is at issue, see Def. Ex. 22 [Dkt. 105-23] at 2-3, would have no unique information relevant to that litigation in his possession. While it may be too early in the production process to determine exactly how significant Koch’s records are, the categories of relevant documents identified by Defendants after reviewing the approximately 1,300 documents produced from Koch’s files indicates to the Court that Defendants’ discovery request has merit and is not intended to be the first strike in a war of attrition or a coercion tactic. Accordingly, the Court concludes that this factor favors granting Defendants’ proposed discovery.
Most of the opinion is focused on the sixth factor, Whether the Burden or Expense of the Proposed Discovery Outweighs its Likely Benefit, because that is all plaintiffs’ counsel argued. You can hardly fault them for omitting the other factors because they are all obviously adverse to excluding this Koch discovery. Here is Judge Harvey’s final conclusion and order compelling discovery of the Koch email:
Oxbow rests its argument entirely on this final factor, asserting that it is the most important of the Rule 26 proportionality factors and counsels against granting Defendants’ proposed discovery. …
The Court disagrees. The cost of reviewing and producing Koch’s documents does not strike the undersigned as unduly burdensome or disproportionate, especially given the discovery conducted to date and the damages that Oxbow seeks in this action. Plaintiffs’ counsel explained at the second hearing in this matter that Oxbow has spent $1.391 million to date on reviewing and producing approximately 584,000 documents from its nineteen other custodians and Oxbow’s email archive. See 8/24/17 TR. at 44:22-45:10. And again, Oxbow seeks tens of millions of dollars from Defendants. Through that lens, the estimated cost of reviewing and producing Koch’s responsive documents—even considering the total approximate cost of $142,000 for that effort, which includes the expense of the sampling effort—while certainly high, is not so unreasonably high as to warrant rejecting Defendants’ request out of hand. See Zubulake v. UBS Warburg, LLC, 217 F.R.D. 309, 321 (S.D.N.Y. 2003) (explaining, in the context of a cost-shifting request, that “[a] response to a discovery request costing $100,000 sounds (and is) costly, but in a case potentially worth millions of dollars, the cost of responding may not be unduly burdensome”); Xpedior Creditor Trust v. Credit Suisse First Boston (USA), Inc., 309 F. Supp. 2d 459, 466 (S.D.N.Y. 2003) (finding no “undue burden or expense” to justify cost-shifting where the requested discovery cost approximately $400,000 but the litigation involved at least $68.7 million in damages). Moreover, based on the parties’ representations at the second hearing in this matter, the projected number of responsive and unique documents in Koch’s files—approximately 10,000—is largely consistent with the number of responsive and unique documents produced by the other Oxbow custodians, and the responsiveness rate of Koch’s documents—11.67 percent—while low, is not the lowest among Oxbow’s custodians.
In light of the above analysis—including the undersigned’s assessment of each of the Rule 26 proportionality factors, all of which weigh in favor of granting Defendants’ motion—the Court is unwilling to find that the burden of reviewing the remaining 65,000 responsive documents for a fraction of the cost of discovery to date should preclude Defendants’ proposed request. See BlueAlly, 2017 WL 876266, at *5 (“This [last Rule 26] factor may combine all the previous factors into a final analysis of burdens versus benefits.” (citing Fed. R. Civ. P. 26 advisory committee’s notes)). For all of the reasons stated above, and absent any evidence establishing that Defendants are using the discovery of Koch’s records to wage a war of attrition or as a device to coerce Oxbow, the Court finds that Defendants’ motion must be granted.
Phase One Keyword Search with Linear Review
Oxbow not only provides good legal analysis, it contains interesting disclosures of the costs involved in a large electronic document review project. The project in question was managed by plaintiffs’ counsel for Oxbow. They used the now popular, negotiated keyword search and linear hit review method. The keyword filtering used mere negotiated keywords that were not tested and refined on actual data. The linear review was limited to documents with a hit on one or more of the negotiated keywords. There were 584,000 such hit-documents. The rest of the documents of the nineteen custodians, the ones that did not have a keyword in them, were not reviewed. They were all presumed irrelevant.
Simple. But not necessarily the most effective or efficient method of document review. Certainly most e-discovery professionals would handle a large project like this in a different manner, especially considering the size of the case itself. Keyword search would be used, but only as part of a multimodal process that used additional search methods, such as concept searches, similarity searches and the all-powerful (if done right) predictive coding searches (AI). I have described this popular but wrong method before and called it the Distorted Search Pyramid. See, e.g., TAR Course: 8th Class (Keyword and Linear Review).
The parties completed phase one discovery on nineteen custodians with more than a few difficult negotiations and hearings concerning keyword negotiation. But they eventually got through it and the production was complete. After that, the defendants decided they wanted Koch’s relevant documents too. The defendants proposed using the same keyword filter on his data as was used on the first nineteen custodians.
The motion and Order here concern the Koch email. Plaintiffs argued that adding this custodian was overly burdensome, that they had already spent enough on document review, and that Koch’s emails would have nothing of value or interest to say on the subjects in the antitrust litigation. At least nothing unique that had not already been seen in the first nineteen custodians’ email. This position was unsupported by the facts. In fact, it is hard to imagine any set of facts in which this position would be persuasive in an antitrust case like this, especially with a CEO and owner like Koch. To quote Judge Harvey at pg. 12:
[I]t strains reason to suggest that the principal owner and CEO of a company, who has publicly commented on the importance and magnitude of litigation to which his company is a party and in which the financial health of his company is at issue, see Def. Ex. 22 [Dkt. 105-23] at 2–3, would have no unique information relevant to that litigation in his possession.
Cost of Phase One Keyword Search and Review
I like to determine the costs of a project based on the number of files reviewed, which would exclude files that had been keyword filtered out of the collection. The cost per file of phase one discovery in Oxbow of ESI from the first nineteen custodians was $2.38 per file. This is derived from the following information provided by Judge Harvey at page 13 of the Opinion:
Plaintiffs’ counsel explained at the second hearing in this matter that Oxbow has spent $1.391 million to date on reviewing and producing approximately 584,000 documents from its nineteen other custodians and Oxbow’s email archive.
Does this seem expensive? Does the fact that we are talking about a volume of 584,000 computer files affect your view? (It should.)
The total cost of the phase one review was $1,391,000. Some may say yes, $2.38 per document is too expensive, and others may say no, $2.38 is very reasonable. Most experts would think that prices in this range are to be expected and that costs vary according to a number of factors, including market conditions and data complexity. In my view the only way to lower this price further is by using a full multimodal Predictive Coding Hybrid method, preferably one that uses IST instead of other kinds of continuous training.
In Oxbow the data appears to have been a fairly simple Outlook Exchange collection. The 584,000 documents referenced in the Opinion is the total keyword filter hit count for the first nineteen custodians. The files all appear to be ordinary vanilla emails and attachments. I wish we had the total custodian data count, in other words, the total number of deduplicated files for these nineteen custodians before the keyword culling, but we don’t. I would estimate the count to be between one and three million files, probably in the 1.5 million area.
Phase Two Keyword Search with Linear Review
The defendants asked for plaintiffs to review the Koch files the same way they reviewed the others. Screen out all files that did not have a keyword on their negotiated list and review for relevance all of the others, namely all of the documents with hits and their families. Unlike the first nineteen custodians, the Opinion tells us the total data universe count for Koch. It was 467,614. Apparently, almost all of the 467,614 Koch files were emails and attachments.
Before the Koch files were processed, including deduping and keyword filtering, the parties could only guess at the total number of unique files that would contain a keyword term. The plaintiffs estimated that there would be about 214,000 Koch files with a keyword hit. That would be the number of files that they would have to review for relevance. That 214,000 files estimate was highly speculative. There does not appear to have been any sampling done. It turns out the estimate was way off, way too high, and with it, the cost estimate to review the Koch files.
Cost of Phase Two Keyword Search and Review
The plaintiffs at first estimated the cost for review of the Koch email to be $250,000. That is $1.17 per file ($250,000/214,000). This is about half the $2.38 per file cost of review of the first nineteen custodians. One wonders why they thought the cost per file would go down? Certainly the privilege concerns would be heightened by review of the owner’s files and that would drive the costs up. One wonders if they made a mistake and were not tracking metrics properly. (Or perhaps I am making a mistake? Please let me know if you see any error in my cost analysis.)
Once the plaintiffs actually processed the 467,614 Koch files and ran the keyword filter, there were only 45,639 files with a hit. That is 9.76%. When you add families of these documents to the review, which was their protocol, the total number of files to review increased to 82,000. After learning of the actual hit count plaintiffs were forced to reduce their cost estimate from $250,000 to $142,000. The new estimated cost of review is $1.73 per file ($142,000/82,000). That is higher than the $1.17 per file rate implied by their first estimate, when they expected 214,000 files to review, but still lower than their actual cost for the first nineteen custodians, $2.38.
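The culling numbers in this paragraph are easy to check. Here is a minimal sketch in Python, using only the counts reported in the opinion; the variable names and the idea of a "family expansion factor" are my own labels for illustration, not terms used by the Court or the parties.

```python
# Counts for the Koch mailbox as reported in the opinion.
TOTAL_KOCH_FILES = 467_614       # files collected for Koch
KEYWORD_HITS = 45_639            # files with at least one keyword hit
REVIEW_SET = 82_000              # hits plus family members (approximate)
ESTIMATED_REVIEW_SET = 214_000   # plaintiffs' pre-processing guess


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the mailbox that hit on a negotiated keyword.
hit_rate = percent(KEYWORD_HITS, TOTAL_KOCH_FILES)

# How much the review set grows once families are pulled in (my label).
family_factor = REVIEW_SET / KEYWORD_HITS

# Share of the whole mailbox that actually goes to human review.
review_share = percent(REVIEW_SET, TOTAL_KOCH_FILES)

# How far off the original volume guess was.
overestimate = ESTIMATED_REVIEW_SET / REVIEW_SET

print(f"Keyword hit rate:         {hit_rate:.2f}%")
print(f"Family expansion factor:  {family_factor:.2f}x")
print(f"Share of mailbox reviewed: {review_share:.2f}%")
print(f"Volume overestimate:      {overestimate:.2f}x")
```

Running a quick check like this on a sample of the data before making representations to the court is one way to avoid a volume guess that misses by a wide margin.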
The next thing that happened is that plaintiffs’ counsel reviewed a ten percent (10%) random sample of the 82,000 total. The ten percent sample of hit documents and families ended up being 12,074 documents total. Plaintiffs’ counsel reviewed these 12,074 documents and found that approximately 1,300 documents (11.67 percent of them) were actually relevant. These 1,300 documents were then produced or logged. Plaintiffs then reported to Judge Harvey that it cost them $57,197.95 to review these 12,074 documents. That is a cost of $4.74 per file, way higher than they had projected. But the higher cost is not surprising when you consider the heightened privilege expenses that you can expect when reviewing CEO emails.
Based on this experience plaintiffs’ attorneys told Judge Harvey that it would likely cost another $85,000 to review the remaining 69,926 files (82,000-12,074). That would bring the total cost of the Phase Two review of Koch email, including the sample documents, to approximately $142,000. Judge Harvey points out several times in the opinion that that is “significantly less than Oxbow’s original estimate of $250,000.” The new $85,000 projected cost to review the last 69,926 files creates a rate of $1.22 per file. How plaintiffs’ counsel keeps coming up with different costs per file is not explained. That final number, $1.22 per file, shows that their last projection of $85,000 (and $142,000) is likely too low. Unless they change their review method, I do not see how or why the cost per file would come down like that, especially since the sample review of the Koch emails cost $4.74 per file.
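To see why the $85,000 projection looks low, compare it with a straight-line extrapolation of the actual sample cost. The sketch below is only an illustration. It assumes, and this is my assumption, not anything counsel or the Court said, that the remaining files would cost the same per file as the sample did. In practice the sample cost may have included one-time setup work, so treat the extrapolated figure as an upper-end reference point rather than a prediction.

```python
# Figures reported in the opinion.
REVIEW_SET = 82_000                   # Koch hits plus families (approximate)
SAMPLE_FILES = 12_074                 # files actually reviewed in the sample
SAMPLE_COST = 57_197.95               # actual cost of the sample review
PROJECTED_REMAINING_COST = 85_000.00  # plaintiffs' projection for the rest

remaining_files = REVIEW_SET - SAMPLE_FILES

# Actual per-file rate observed in the sample.
sample_rate = SAMPLE_COST / SAMPLE_FILES

# Per-file rate implied by plaintiffs' projection for the remainder.
projected_rate = PROJECTED_REMAINING_COST / remaining_files

# ASSUMPTION: the remainder costs the same per file as the sample did.
extrapolated_remaining_cost = sample_rate * remaining_files
extrapolated_total = SAMPLE_COST + extrapolated_remaining_cost

print(f"Remaining files:             {remaining_files:,}")
print(f"Sample rate (actual):        ${sample_rate:.2f} per file")
print(f"Projected rate (estimate):   ${projected_rate:.2f} per file")
print(f"Remainder at sample rate:    ${extrapolated_remaining_cost:,.0f}")
print(f"Koch total at sample rate:   ${extrapolated_total:,.0f}")
```

The gap between the two per-file rates is the point: a projection that assumes the rate will fall to roughly a quarter of what was just experienced needs an explanation.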
Plaintiffs’ counsel argued that there was not that much benefit from review of Koch files because only 11.67% were relevant. A technical way to say this is by use of the term prevalence, meaning the percentage of relevant documents. The prevalence of the Koch docs was 11.67%. They said that is too low a prevalence rate to justify the $142,000 expense under proportionality standards. It shows that there is not much benefit. The fallacies of this argument should be obvious. Koch is, after all, the CEO and owner of all plaintiff businesses. His unique relevant documents are obviously of more importance than those of the nineteen others who work for him. It is his company and he is the plaintiff in a major antitrust case. Of course he will be deposed and his documents reviewed.
Further, as Judge Harvey points out in Footnote Five of the Opinion, although Koch’s 11.67% rate was low, one of the nineteen earlier custodians had a much lower rate than that, 3.03%, and another had a 13.36% prevalence, which is only slightly higher than Koch’s 11.67%. For that reason Judge Harvey was not impressed by the fact that the average Prevalence rate of the first nineteen custodians was 65%.
That 65% prevalence rate is, in my experience, very high for this kind of keyword guessing approach. We often see prevalence rates of less than 20%, or even 10% or less, using this method. That in turn suggests that the keywords negotiated in the instant case were not overly burdensome on the responding party. This high a Precision, 65%, which is another way to look at it, means that there must be some trade-off in Recall. Typically with keyword search a high precision rate, one greater than 50%, is at the expense of some loss in Recall, meaning missing relevant documents. The 65% suggests that many relevant documents were likely left on the table, in other words, filtered out of the review because they did not contain one of the keywords. That is one reason why we try to never rely on keyword search alone, but to take a multimodal approach. The mono-modal approach of just using negotiated keywords is too unreliable, too risky. It will sometimes miss key evidence that used unexpected text. That is why we teach Multimodal Culling.
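For readers new to these terms, here is a minimal sketch of how Precision and Recall are computed and why the opinion lets us calculate one but not the other. Precision needs only the reviewed set. Recall needs the number of relevant documents that were filtered out, which nobody in Oxbow measured. The "missed" counts below are purely hypothetical numbers chosen for illustration; they are not estimates of what actually happened in this case.

```python
def precision(relevant_reviewed: int, reviewed: int) -> float:
    """Fraction of reviewed documents that turned out to be relevant."""
    return relevant_reviewed / reviewed


def recall(relevant_found: int, relevant_total: int) -> float:
    """Fraction of all relevant documents that the search actually found."""
    return relevant_found / relevant_total


# Phase one: 584,000 keyword hits reviewed at a reported 65% precision.
PHASE_ONE_REVIEWED = 584_000
PHASE_ONE_PRECISION = 0.65
relevant_found = round(PHASE_ONE_REVIEWED * PHASE_ONE_PRECISION)

# HYPOTHETICAL counts of relevant documents left behind by the keyword filter.
# The opinion gives no data on this, so recall can only be explored, not known.
hypothetical_missed = (50_000, 150_000, 400_000)
recall_scenarios = {
    missed: recall(relevant_found, relevant_found + missed)
    for missed in hypothetical_missed
}

print(f"Relevant documents found in phase one: {relevant_found:,}")
for missed, value in recall_scenarios.items():
    print(f"If {missed:>7,} relevant docs were missed, recall = {value:.0%}")

# Koch sample, using the rounded counts in the opinion (about 1,300 of 12,074).
koch_sample_precision = precision(1_300, 12_074)
print(f"Koch sample precision from rounded counts: {koch_sample_precision:.2%}")
```

Note that the rounded Koch counts work out a little below the 11.67 percent stated in the opinion, since "approximately 1,300" is not an exact figure. The only way to turn the recall guesses into a measurement is to sample the documents that the keyword filter excluded.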
Successful arguments on motions to compel require hard evidence. To meet your burden of proof you must present credible estimates of the costs of document review. This requires experience, market knowledge and complete data about the files (metadata).
The arguments on both sides of a motion to compel ESI production should begin with reliable metrics and statistics concerning the ESI that the requesting party wants the responding party to review. This requires disclosures by the responding party, but these data disclosures are limited and probably do not matter to anyone else in any event. Who cares how many files are in Mr. X’s Outlook Mailbox?
In the motions counsel should explain the costs and billings involved in the proposed document review project and, ideally, also provide a comparative reference to other projects. This helps a judge to determine reasonability. Arguments can be based on your general experience with document review costs and on the specific costs already incurred in the instant case. Emphasis should be put on current project costs and prevalence rates.
Remember to do the cost per file calculations in every motion. They are great metrics for comparisons. Apparently that was never done in Oxbow, because the cost estimates of plaintiffs’ counsel varied so much. A per file cost analysis shows that they missed a key metric, one that could have helped their position. The cost ranged from an actual cost of $2.38 per file for the first 584,000, to a $1.17 per file estimate to review 214,000 Koch files, to an estimate of $1.73 per file to review 82,000 Koch files, to an actual cost of $4.74 per file to review 12,074 Koch files, to another estimate of $1.22 per file to review the remaining 69,926 Koch files. The actual costs are way higher than their estimated costs.
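The five per file figures above can be laid side by side with a few lines of Python. This is just a sketch of the kind of comparison table I would want in a memorandum. The dollar amounts and file counts come from the opinion; the class name, labels, and layout are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewFigure:
    """One cost figure from the opinion, either incurred or projected."""
    label: str
    cost: float
    files: int
    actual: bool  # True if the cost was actually incurred

    @property
    def per_file(self) -> float:
        return self.cost / self.files


FIGURES = [
    ReviewFigure("Phase one, nineteen custodians", 1_391_000.00, 584_000, True),
    ReviewFigure("Koch, first estimate", 250_000.00, 214_000, False),
    ReviewFigure("Koch, revised estimate", 142_000.00, 82_000, False),
    ReviewFigure("Koch, sample actually reviewed", 57_197.95, 12_074, True),
    ReviewFigure("Koch, remaining files", 85_000.00, 69_926, False),
]

for figure in FIGURES:
    kind = "actual" if figure.actual else "estimate"
    print(f"{figure.label:<32} {kind:<9} ${figure.per_file:.2f} per file")

# Compare the range of actual rates with the range of estimated rates.
actual_rates = [f.per_file for f in FIGURES if f.actual]
estimate_rates = [f.per_file for f in FIGURES if not f.actual]
print(f"Lowest actual rate:    ${min(actual_rates):.2f}")
print(f"Highest estimate rate: ${max(estimate_rates):.2f}")
```

When every estimate sits below every actual rate, as it does here, a judge is entitled to be skeptical of the next estimate.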
Finally, in these motions counsel should mention all six factors under Rule 26(b)(1), not just cost burden. Still, in many cases like Oxbow Carbon & Minerals LLC v. Union Pacific Railroad Company, it may be appropriate to focus your memorandum on the cost estimates. Realistic concessions are always appreciated by the judiciary and help to narrow the issues.