The researchers at IT-Lex have uncovered a previously unknown predictive coding case out of the SDNY, Federal Housing Finance Agency v. JP Morgan Chase & Co., Inc. et al. The et al here includes just about every other major bank in the world, each represented by one of the top 25 mega law firms in the country. The interesting orders approving predictive coding were entered in 2012, yet, until now, no one has ever talked about FHFA v JP Morgan. That is amazing considering the many players involved.
The two main orders in the case pertaining to predictive coding are here (order dated July 24, 2012), and here (order dated July 31, 2012). I have highlighted the main passages in these long transcripts. These are ore tenus orders, but orders nonetheless. The Pacer file is huge, so IT-Lex may have missed others, but we doubt it. The two key memoranda underlying the orders are by the defendant JP Morgan’s attorneys, Sullivan & Cromwell, dated July 20, 2012, and by the plaintiff FHFA’s lawyers, Quinn Emanuel Urquhart & Sullivan, dated July 23, 2012.
The fact these are ore tenus rulings on predictive coding explains how they have remained under the radar for so long. The orders show the mastery, finesse, and wisdom of the presiding District Court Judge Denise Cote. She was hearing her first predictive coding issue and handled it beautifully. Unfortunately, judging from the transcripts, the trial lawyers arguing pro and con did not hold up as well. Still, they appear to have been supported by good e-discovery lawyer experts behind the scenes. It all seems to have turned out relatively well in the end, as a recent Order dated February 14, 2014 suggests. Predictive coding was approved, and court ordered cooperation resulted in a predictive coding project that appears to have gone pretty well.
Defense Wanted To Use Predictive Coding
The case starts with the defense, primarily JP Morgan, wanting to use predictive coding and the plaintiff, FHFA, objecting. The FHFA wanted the defendant banks to review everything. Good old tried and true linear review. The plaintiff also had fallback objections to the way the defense proposed to conduct the predictive coding.
The letter memorandum by Sullivan & Cromwell for JP Morgan is only three pages in length, but has 63 pages of exhibits attached. The letter relies heavily on the then new Da Silva Moore opinion by Judge Peck. The exhibits include the now famous 2011 Grossman and Cormack law review article on TAR, a letter from plaintiff’s counsel objecting to predictive coding, and a proposed stipulation and order. Here are key segments of Sullivan & Cromwell’s arguments:
According to Plaintiff, it will not agree to JPMC’s use of any Predictive Coding unless JPMC agrees to manually review each and every one of the millions of documents that JPMC anticipates collecting. As Plaintiff stated: “FHFA’s position is straightforward. In reviewing the documents identified by the agreed-upon search terms, the JPM Defendants should not deem a document nonresponsive unless that document has been reviewed by an attorney.”
Plaintiff’s stated position, and its focus on “non-responsive” documents, necessitates this request for prompt judicial guidance. Predictive Coding has been recognized widely as a useful, efficient and reliable tool precisely because it can help determine whether there is some subset of documents that need not be manually reviewed, without sacrificing the benefit, if any, gained from manual review. Predictive Coding can also aid in the prioritization of documents that are most likely to be responsive. As a leading judicial opinion as well as commentators have warned, the assumption that manual review of every document is superior to Predictive Coding is “a myth” because “statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review.” Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350, at *28 (S.D.N.Y. Feb. 24, 2012) (Peck, Mag. J.) …
JPMC respectfully submits that this is an ideal case for Predictive Coding or “machine learning” to be deployed in aid of a massive, expedited document production. Plaintiff’s claims in this case against JPMC concern more than 100 distinct securitizations, issued over a several year period by three institutions that were entirely separate until the end of that period, in 2008 (i.e., JPMorgan Chase, Bear Stearns & Co., and Washington Mutual). JPMC conservatively predicts that it will have to review over 2.5 million documents collected from over 100 individual custodians. Plaintiff has called upon JPMC to add large numbers of custodians, expand date ranges, and otherwise augment this population, which could only expand the time and expense required. Computer assisted review has been approved for use on comparable volumes of material. See, e.g., DaSilva Moore, 2012 U.S. Dist. LEXIS 23350, at *40 (noting that the manual review of 3 million emails is “simply too expensive.”).
The plaintiff federal government agency, FHFA, filed its own three page response letter with 11 pages of exhibits. The response objects to the use of predictive coding and, alternatively, to the defendant’s proposed methodology. Here is the core of their argument:
First, JPMC’s proposal is the worst of both worlds, in that the set of documents to which predictive coding is to be applied is already narrowed through the use of search terms designed to collect relevant documents, and predictive coding would further narrow that set of documents without attorney review,1 thereby eliminating potentially responsive documents. …
Finally, because training a predictive coding program takes a considerable amount of time,2 the truncated timeframe for production of documents actually renders these Actions far from “ideal” for the use of predictive coding.
The first objection on keyword search screening is good, but the second, that training would take too long, shows that the FHFA needed better experts. The machine learning training time is usually far less than the document review time, especially in a case like this, and the overall time savings from using predictive coding are dramatic. So the second objection was a real dog.
Still, FHFA made one more objection to method that was well placed, namely that there had been virtually no disclosure as to how Sullivan & Cromwell intended to conduct the process. (My guess is, they had not really worked that all out yet. This was all new then, remember.)
[I]t has similarly failed to provide this Court with any details explaining (i) how it intends to use predictive coding, (ii) the methodology or computer program that will be used to determine responsiveness, or (iii) any safeguards that will ensure that responsive documents are not excluded by the computer model. Without such details, neither FHFA nor this Court can meaningfully assess JPMC’s proposal. See Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350, at *23 (S.D.N.Y. Feb. 24, 2012) (“[Defendant’s] transparency in its proposed ESI search protocol made it easier for the Court to approve the use of predictive coding.”).4 JPMC’s proposed order sets forth an amorphous proposal that lacks any details. In the absence of such information, this Court’s authorization of JPMC’s use of predictive coding would effectively give JPMC carte blanche to implement predictive coding as it sees fit.
Hearing of July 24, 2012
Judge Denise Cote came into the hearing having read the briefs and Judge Peck’s then recent landmark ruling in Da Silva Moore. It was obvious from her initial comments that her mind was made up that predictive coding should be used. She understood that this mega-size case needed predictive coding to meet the time deadlines and not waste a fortune on e-document review. Here are Judge Cote’s words at pages 8-9 of the transcript:
It seems to me that predictive coding should be given careful consideration in a case like this, and I am absolutely happy to endorse the use of predictive coding and to require that it be used as part of the discovery tools available to the parties. But it seems to me that the reliability and utility of predictive coding depends upon the process that takes place in the initial phases in which there is a pool of materials identified to run tests against, and I think that some of the documents refer to this as the seed — S-E-E-D — set of documents, and then there are various rounds of further testing to make sure that the code becomes smart with respect to the issues in this case and is sufficiently focused on what needs to be defined as a responsive document. And for this entire process to work, I think it needs transparency and cooperation of counsel.
I think ultimately the use of predictive coding is a benefit to both the plaintiff and the defendants in this case. I think there’s every reason to believe that, if it’s done correctly, it may be more reliable — not just as reliable but more reliable than manual review, and certainly more cost effective — cost effective for the plaintiff and the defendants.
To plaintiff’s counsel’s credit, she quickly shifted her arguments from whether to how. Defense counsel also fell all over herself about how cooperative she had been and would continue to be, all the while implying that the other side was a closet non-cooperator.
As it turns out, very little actual conversation had occurred between the two lead counsel before the hearing, as both had preferred snarly emails and paper letters. At the hearing Judge Cote ordered the attorneys to talk first, rather than shoot off more letters, and to call her if they could not agree.
I strongly suggest you read the whole transcript of the first hearing to see the effect a strong judge can have on trial lawyers. Page 24 is especially instructive as to just how active a bench can be. For the second hearing of July 31, 2012, I suggest you read the transcript at pages 110-111 to get an idea as to just how difficult those attorneys meetings proved to be.
As a person obsessed with predictive coding I find the transcripts of the two hearings to be kind of funny in a perverse sort of way. The best way for me to share my insights is by using the format of a lawyer joke.
Two Lawyers Walked Into A Bar
One e-discovery lawyer walks into a Bar and nothing much happens. Two e-discovery lawyers walk into a Bar and an interesting discussion ensues about predictive coding. One trial lawyer walks into a Bar and the volume of the whole place increases. Two trial lawyers walk into a Bar and an argument starts.
The 37 lawyers who filed appearances in the FHFA case walk into a Bar and all hell breaks loose. There are arguments everywhere. Memos are written, motions are filed, and the big bank clients are billed a million or more just talking about predictive coding.
Then United States District Court Judge Denise Cote walks into the Bar. All the trial lawyers immediately shut up, stand up, and start acting real agreeable, nice, and polite. Judge Cote says she has read all of the letters and they should all talk less, and listen more to the two e-discovery specialists still sitting in the bar bemused. Everything becomes a cooperative love-fest thereafter, at least as far as predictive coding and Judge Cote are concerned. The trial lawyers move on to fight and bill about other issues more within their ken.
Substantive Disputes in FHFA v. JP Morgan
The biggest substantive issues in the first hearing of July 24, 2012 had to do with disclosure and keyword filtering before machine training. Judge Cote was prepared on the disclosure issue from having read the Da Silva Moore protocol, and so were the lawyers. The judge easily pressured defense counsel to disclose both relevant and irrelevant training documents to plaintiff’s counsel, with the exception of privileged documents.
As to the second issue of keyword filtering, the defense lawyers had been told by the experts behind the scenes that JP Morgan should be allowed to keyword filter the custodians ESI before running predictive coding. Judge Peck had not addressed that issue in Da Silva Moore, since the defense had not asked for that, so Judge Cote was not prepared to rule on that then new and esoteric issue. The trial lawyers were not able to articulate much on the issue either.
Judge Cote asked trial counsel if they had previously discussed this issue, not just traded memos, and they admitted no. So she ordered them to talk about it. It is amazing how much easier it is to cooperate and reach agreement when you actually speak, and have experts with you guiding the process. So Judge Cote ordered them to discuss the issue, and, as it turns out from the second order of July 31, 2012, they reached agreement. There would be no keyword filtering.
Although we do not know all of the issues discussed by attorneys, we do know they managed to reach agreement, and we know from the first hearing what a few of the issues were. They were outlined by plaintiff’s counsel who complained that they had no idea as to how defense counsel was going to handle the following issues at page 19 of the first hearing transcript:
What is the methodology for creating the seed set? How will that seed set be pulled together? What will be the number of documents in the seed set? Who will conduct the review of the seed set documents? Will it be senior attorneys or will it be junior attorneys? Whether the relevant determination is a binary determination, a yes or no for relevance, or if there’s a relevance score or scale in terms of 1 to 100. And the number of rounds, as your Honor noted, in terms of determining whether the system is well trained and stable.
So it seems likely all these issues and more were later discussed and accommodations reached. At the second hearing of July 31, 2012, we get a pretty good idea as to how difficult the attorneys meetings must have been. At pages 110-111 of the second hearing transcript we see how counsel for JP Morgan depicted these meetings and the quality of input received from plaintiff’s counsel and experts:
We meet every day with the plaintiff to have a status report, get input, and do the best we can to integrate that input. It isn’t always easy, not just to carry out those functions but to work with the plaintiff.
The suggestions we have had so far have been unworkable and by and large would have swamped the project from the outset and each day that a new suggestion gets made. But we do our best to explain that and keep moving forward.
Defense counsel then goes into what most lawyers would call “suck-up” mode to the judge and says:
We very much appreciate that your Honor has offered to make herself available, and we would not be surprised if we need to come to you with a dispute that hasn’t been resolved by moving forward or that seems sufficiently serious to put the project at risk. But that has not happened yet and we hope it will not.
After that, plaintiff’s counsel complained that defense counsel had not agreed to allow deposition transcripts and witness statements to be used as training documents. That’s right. The plaintiff wanted to include congressional testimony, depositions, and other witness statements that they found favorable to their position as part of the training documents used to find relevant documents in the store of custodian information.
Judge Cote was not about to be tricked into making a ruling on the spot, but instead wisely told them to go back and talk some more and get real expert input on the advisability of this approach. She is a very quick study as the following exchange at page 114 of the transcript with defense counsel after hearing the argument of plaintiff’s counsel illustrates:
THE COURT: Good. We will put those over for another day. I’m learning about predictive coding as we go. But a layperson’s expectation, which may be very wrong, would be that you should train your algorithm from the kinds of relevant documents that you might actually uncover in a search. Maybe that’s wrong and you will all educate me at some other time. I expect, Ms. Shane, if a deposition was just shot out of this e-discovery search, you would produce it. Am I right?
MS. SHANE: Absolutely, your Honor. But your instinct that what they are trying to train the system with are the kinds of documents that would be found within the custodian files as opposed to a batch of alien documents that will only confuse the computer is exactly right.
It is indeed a very interesting issue, but we cannot see a report in the case on Pacer that shows how the issue was resolved. I suspect the transcripts were all excluded, unless they were within a custodian’s account.
2014 Valentine’s Day Hearing
The only other order we found in the case mentioning predictive coding is here (dated February 14, 2014). Most of the Valentine’s Day transcript pertains to trial lawyers arguing about perjury, and complaining that some key documents were missed in the predictive coding production by JP Morgan. But the fault appears to be due to the failure to include a particular custodian in the search, an easy mistake to have happen. That has nothing to do with the success or failure of the predictive coding.
Judge Cote handled that well, stating that no review is “perfect” and she was not about to have a redo at this late date. Her explanation at pages 5-6 of the February 14, 2014 transcript provides a good wrap up for FHFA v. JP Morgan:
Parties in litigation are required to be diligent and to act in good faith in producing documents in discovery. The production of documents in litigation such as this is a herculean undertaking, requiring an army of personnel and the production of an extraordinary volume of documents. Clients pay counsel vast sums of money in the course of this undertaking, both to produce documents and to review documents received from others. Despite the commitment of these resources, no one could or should expect perfection from this process. All that can be legitimately expected is a good faith, diligent commitment to produce all responsive documents uncovered when following the protocols to which the parties have agreed, or which a court has ordered.
Indeed, at the earliest stages of this discovery process, JP Morgan Chase was permitted, over the objection of FHFA, to produce its documents through the use of predictive coding. The literature that the Court reviewed at that time indicated that predictive coding had a better track record in the production of responsive documents than human review, but that both processes fell well short of identifying for production all of the documents the parties in litigation might wish to see.
There are many unpublished decisions out there approving and discussing predictive coding. I know of several more. Many of them, especially the ones that first came out and pretty much blindly followed our work in Da Silva Moore, call for complete transparency, including disclosure of irrelevant documents used in training. That is what happened in FHFA v. JP Morgan and the world did not come to an end. Indeed, the process seemed to go pretty well, even with a plaintiff’s counsel who, in the words of Sullivan & Cromwell, made suggestions every day that were unworkable and by and large would have swamped the project … but we do our best to explain that and keep moving forward. Pages 110-111 of the second hearing transcript. So it seems cooperation can happen, even when one side is clueless, and even if full disclosure has been ordered.
Since the days of 2011 and 2012, when our Da Silva Moore protocol was developed, we have had much more experience with predictive coding. We have more information on how the training actually functions with a variety of chaotic email datasets, including the new Oracle ESI collection, and even more testing with the Enron dataset.
Based on what we know now, I do not think it is necessary to make disclosure of all irrelevant documents used in training. The only documents that have a significant impact on machine learning are the borderline, grey area documents. These are the ones whose relevancy is a close call, often a matter of opinion depending on how you view the case. Only these grey area irrelevant documents need to be disclosed to protect the integrity of the process.
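For readers who like to see the mechanics, the grey area idea can be sketched in a few lines of code. This is a hypothetical illustration of the concept, not any vendor’s actual algorithm: the classifier assigns each document a predicted probability of relevance, and only documents scoring near the middle of the range are borderline enough to matter for training and disclosure. The function name and the 0.4–0.6 cutoffs are my own assumptions for the sketch.

```python
# Hypothetical sketch of "grey area" selection in predictive coding.
# Documents scored near 0.5 are the borderline cases; documents the
# classifier is confident about (near 0.0 or 1.0) add little to the
# machine training and, on this view, need not be disclosed.

def grey_area(scored_docs, low=0.4, high=0.6):
    """Return documents whose predicted probability of relevance
    falls inside the borderline zone [low, high]."""
    return [doc for doc, score in scored_docs if low <= score <= high]

# Example: (document id, predicted probability of relevance)
scores = [("doc1", 0.95), ("doc2", 0.52), ("doc3", 0.08), ("doc4", 0.45)]
print(grey_area(scores))  # only doc2 and doc4 are grey area documents
```

On this approach, only the grey area documents marked irrelevant by the reviewing attorneys would be disclosed to opposing counsel.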
The science and other data behind this have to do with Jaccard Index classification inconsistencies, as well as the importance of mid-range ranked documents to most predictive coding algorithmic analysis. See, e.g., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part Three at the subheadings Disclosure of Irrelevant Training Documents and Conclusions Regarding Inconsistent Reviews. When you limit disclosure to grey area training documents, and relevant documents, the process can become even more efficient without any compromise in quality or integrity. This of course assumes honest evaluations of grey area documents and forthright communications between counsel. But then so does all discovery in our system of justice. So this is really nothing new, nor out of the ordinary.
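The Jaccard Index mentioned above is a simple measure of agreement between two reviewers’ relevance calls: the number of documents both marked relevant, divided by the number either marked relevant. A minimal sketch, with made-up document IDs:

```python
def jaccard(set_a, set_b):
    """Jaccard index: |A intersect B| / |A union B|.
    1.0 means the two reviewers agreed perfectly; low values
    reflect the classification inconsistencies discussed above."""
    if not set_a and not set_b:
        return 1.0  # trivially perfect agreement on empty sets
    return len(set_a & set_b) / len(set_a | set_b)

# Two reviewers each mark the documents they consider relevant.
reviewer_1 = {"d1", "d2", "d3", "d4"}
reviewer_2 = {"d2", "d3", "d4", "d5"}
print(jaccard(reviewer_1, reviewer_2))  # 3 shared / 5 total = 0.6
```

Studies of human review consistency routinely report surprisingly low Jaccard scores, which is part of the argument that disagreement concentrates in the grey area documents.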
All discovery depends on the integrity and trustworthiness of the attorneys for the parties. Fortunately, almost all attorneys honorably fulfill these duties, except perhaps for the duty of technology competence. That is the greatest ethical challenge of the day for all litigators.