Analysis of Important New Case on Predictive Coding by a Rising New Judicial Star: “Winfield v. City of New York”

December 10, 2017

There is an important new predictive coding opinion in the SDNY by a new judge in town. Winfield, Stewart, Noel v. City of New York, Case No. 15-CV-05236, 2017 U.S. Dist. LEXIS 194413 (S.D.N.Y., Nov. 27, 2017). The opinion is by United States Magistrate Judge Katharine H. Parker. I have carefully studied this opinion and it looks like we have a new star in the making in our world of e-discovery and predictive coding. With all of the recent and forthcoming retirements on the bench, this is very welcome news.

Katharine H. Parker became a judge in November 2016 and before that was a partner in Proskauer’s Labor and Employment Law practice. See her interesting, humorous farewell message to her firm in Above The Law that took the form of a mock order that, among other things, said: Don’t let me take myself too seriously after I become a judge. 

Judge Parker obviously brings with her to the bench not only wit and wisdom, but also experience and knowledge of e-discovery, including legal search. As soon as she was on the bench she issued standard orders that include a Joint Electronic Discovery Submission and [Proposed] Order. It has a good e-discovery checklist built in, albeit less complete than the new one out of Miami that I have been writing about. Good New 33-Point e-Discovery Checklist From Miami (e-Discovery Team, October 1, 2017). Item Six of her Joint Electronic Discovery Submission and [Proposed] Order is entitled “Search and Review” and states:

The parties have discussed methodologies or protocols for the search and review of electronically stored information, as well as the disclosure of techniques to be used. Some of the approaches that may be considered include: the use and exchange of keyword search lists, “hit reports,” and/or responsiveness rates; concept search; machine learning, or other advanced analytical tools; limitations on the fields or file types to be searched; date restrictions; limitations on whether back-up, archival, legacy, or deleted electronically stored information will be searched; testing; sampling; etc. To the extent the parties have reached agreement as to search and review methods, provide details below:

Way to go Judge Parker! That is a very good checklist for a Rule 26(f) conference and Case Management Report to get lawyers talking about the key legal issue in e-discovery, legal search. I especially like the reference to many different (“multimodal”) types of search and the use of the technically correct term, “machine learning.”

Transparency and GIGO

Judge Parker’s Order in Winfield v. City of New York (15-CV-05236) resolved a discovery dispute, which, among other things, challenged the Defendant City’s predictive coding process, in other words, its machine learning search. That challenge involved issues of transparency and relevance. Experts on predictive coding were apparently involved (although it looks like they were not retained until fairly late in the dispute). At that point the issues became much more technical and sophisticated. The plaintiffs wanted to know more about the defendant’s machine learning process, and the defendant did not want to share that. Instead, defendant invoked the attorney work-product secrecy cloak and good old, and I mean old, Sedona Principle Six. Protecting the Fourteen Crown Jewels of the Sedona Conference in the Third Revision of its Principles (my article requesting that Principle Six be updated).

Judge Parker then went hands-on, as a good judge should, and required in camera submissions. Then, after she was fully informed, she rendered this decision. Textbook procedure for how to resolve complex e-discovery issues. Note especially that she did not need expert testimony, nor a mini-trial, nor extensive discovery about discovery, to render her decision. These are all things that many naysayers on predictive coding said would happen. Aside from the ill-fated expert testimony for days on end by Dave Lewis and Dan Regard in Kleen Products back in 2012, it hasn’t happened. Kleen Products, LLC, et al. v. Packaging Corp. of Amer., et al., Case: 1:10-cv-05711, Document #412 (N.D. Ill., Sept. 28, 2012).

Of course, I am ready to testify at any time to explain my now open-source methodology, Predictive Coding 4.0. See, e.g., TARcourse.com. Most good experts are. But, so far, there have been no challenges, nor any reason for disputes. After Da Silva Moore, where my firm represented the defendants, and the series of opinions on predictive coding by Judge Andrew Peck, and others, things have settled down. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012); Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP), 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016). Time has proven that the pro-technology lawyers, myself included, were right. The cost savings realized by use of AI technology have not been robbed by the expense of expert litigation, as many predicted. Moreover, there have been very few challenges of any kind to predictive coding methodology since Da Silva Moore, which is why the instant case of Winfield v. City of New York is so interesting.

Plaintiffs’ challenge and demand for greater transparency in predictive coding in Winfield was based primarily on plaintiffs’ contention that the defendant was too narrowly interpreting relevance, that it was miscoding relevant documents as irrelevant and not producing them. In other words, they had a disagreement on True Relevance. Plaintiffs argued that since defendant was miscoding documents, it must also have been incorrectly training documents in its machine learning. The logic is impeccable. If you prove the first (a misconception of relevance), the second (incorrect training) necessarily follows.

It is the oldest truism in computers, and remains a valid principle even in active machine learning, albeit mitigated somewhat by good software and good methods: GIGO – Garbage In, Garbage Out. The active machine learning is based on the input by the SME (“subject matter expert”). If the SME misdefines relevance, then so will the AI. It is, after all, just a creation of the SME, his or her leveraged servant. Still, any good machine learning software can handle some bad input; it is a question of degree and methods. These are precisely the questions raised and answered, at least in a general way, in Winfield.

The artificial intelligence that your active machine training creates will only look for what you tell it to look for. It has no intelligence of its own. It is a type of specialized AI that starts out as a blank slate. It knows nothing. Someday our legal AI may know the law and know what is relevant, but we are not there yet. Right now it only knows what you tell it. If you tell it to look for garbage, that is what it will find. It cannot hit a bullseye of relevance if you have trained it to find a different relevance target.
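To make the GIGO point concrete, here is a toy sketch with entirely hypothetical data, in which the “model” is nothing more than a word-count centroid of the documents coded relevant. If the SME miscodes the policy documents as irrelevant, the target moves and the model stops recognizing policy material:

```python
# Toy sketch of GIGO in relevance training (hypothetical data). The "model"
# is just a word-count centroid of the documents the SME coded relevant, so
# a bad coding decision literally moves the target the search aims at.
from collections import Counter
from math import sqrt

def centroid(docs):
    """Combine the word counts of all documents coded relevant."""
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    num = sum(a[w] * b[w] for w in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

policy_docs = ["community preference policy memo",
               "community preference policy justification"]
other_docs = ["cafeteria menu holiday schedule"]

good_model = centroid(policy_docs)  # SME correctly codes policy docs relevant
bad_model = centroid(other_docs)    # SME miscodes: policy docs excluded

query = Counter("draft community preference policy analysis".split())
print(cosine(good_model, query) > cosine(bad_model, query))  # True
```

Real predictive coding software is far more sophisticated than this, of course, but the dependence on the SME’s coding decisions is exactly the same.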

The problem here is that one person’s treasure is another’s trash. In other words, the lead counsel on both sides did not agree on the target; they did not agree on relevance. They did not even agree on Judge Parker’s rulings on relevance. This necessarily means they did not agree on the documents that should be used in the training. I have written on this at great length at TARcourse.com. See especially the Ninth Class: GIGO, QC, SME, Method, Software.

The SMEs in Winfield Did Not Understand Judge Parker’s Rulings on Relevance

The first step in my open-sourced eight-step workflow is designed to address this issue, ESI Communications. Eleventh Class: Step One – ESI Communications. Some consider this the hardest step. The talks include communications with opposing counsel on relevance, and communications with the judge where you cannot agree. Elsewhere I define an SME, the subject matter expert in charge of determining the target relevance, as the lawyer best able to predict the judge’s rulings on relevance. In litigation the ultimate SME is the judge. The relevance target is what she says it is. This is covered in Class Nine of the TAR Course where I explain:

If the attorney in charge (SME) does not understand the scope of discoverable information, does not understand probative value, then the odds of finding the documents important to a case are significantly diminished. You could look at a document with high probative value and not even know that it is relevant. This is exactly the concern of many requesting parties, that the responding party’s attorney will not understand relevance and discoverability the same way they do. That is why the first step in my recommended work flow is to Talk, which I also call Relevance Dialogues. . . .

The job of the SME is to know relevance. In the law that means you must know how the ultimate arbiter of relevance will rule – the judge assigned to your case. They determine truth. An SME’s own personal opinion is important, but ultimately of secondary importance to that of the judge. For that reason a good SME will often err on the side of over-expansive relevance because they know from history that is what the judge is likely to allow in this type of case.

This is a key point. The judges, not the attorneys, ultimately decide on close relevance and related discoverability issues. The head trial attorney interfaces with the judge and opposing counsel, and should have the best handle on what is or is not relevant or discoverable. A good SME can predict the judge’s rulings and, even if not perfect, can gain the judicial guidance needed in an efficient manner.

In Winfield the SMEs on both sides got it wrong. Even though Judge Parker made several rulings on relevance, the attorneys did not properly interpret the rulings. In the case of defense counsel, the SME interpreted the relevance rulings too narrowly. For this reason their reviewers mismarked relevant documents as irrelevant and so did not properly train the system. Here is Judge Parker’s ruling on relevance, which sets the stage for her other rulings on predictive coding.

In sum, the City must expand its search for documents responsive to Plaintiffs’ document requests as it construed this Court’s prior ruling too narrowly. If the City finds additional documents in the DCP/Banks review population regarding the primary issues and defenses in this case responsive to Plaintiffs’ other document requests—such as documents concerning the Community Preference Policy and the City’s justification for the same, including the Policy’s role in overcoming community opposition—such documents should be produced, absent privilege or other objection for which the City has a good-faith basis to withhold the document. If, after review of the City’s production of responsive documents from the DCP/Banks review population, Plaintiffs believe that the City impermissibly withheld documents responsive to specific requests, Plaintiffs shall meet-and-confer with the City and thereafter may file a motion to compel.

Relevance is the threshold issue. As seen, defense counsel did not get it right; they did not properly construe Judge Parker’s rulings on relevance, including such topics as the Community Preference Policy. For that reason, they also got the machine training wrong on that issue. The defense SME told the machine, and the human document reviewers, that, as to that issue at least, black was white, relevant was irrelevant. That is confusing to anyone, even a smart AI.

Of course, I have done that myself, many times. Any SME has. Perfection in relevance construction is nearly impossible in a complex case. What separates the good SMEs from the mediocre or bad ones is the number of errors made and the significance of those errors. As it turns out in Winfield, Judge Parker did not consider the errors made by the defendant’s SME to be that serious. That evaluation tempered her ruling and ultimately limited the remedies granted to plaintiffs.

Usually the AI points out obvious errors to you, the oops kind of errors, not systemic ones. If the software is good, it will still predict documents to be relevant if logic suggests they must be so in order to be consistent with other documents coded relevant. It will do so in spite of the fact that you have already coded the documents as irrelevant. (If the software you are using does not do that, find other software!) If the mistake is systemic and widespread, or across a whole class, it may not do so, but still, even then, the error is usually not hard to correct. You just change the coding on the disputed documents and retrain the system. Good software will usually adjust rankings easily, if you can find and correct all, or nearly all, of the incorrectly coded ESI. One round of training should do it. No big deal. Mistake fixed. Again, if your software cannot do that, then your software is no good and out of date. Get new software. See: WHY I LOVE PREDICTIVE CODING: Making Document Review Fun Again with Mr. EDR and Predictive Coding 4.0.
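That fix can be sketched in miniature with hypothetical data (this stands in for what real TAR software does internally, not for any vendor’s actual product): flip the coding on the disputed document, retrain, and the ranking of the formerly suppressed material recovers in a single round.

```python
# Toy sketch (hypothetical data) of fixing a systemic coding error: correct
# the disputed documents, retrain, and the rankings adjust in one round.
from collections import Counter

def train(coded):
    """'Train' by pooling the words of every document coded relevant."""
    model = Counter()
    for text, is_relevant in coded:
        if is_relevant:
            model.update(text.split())
    return model

def score(model, text):
    """Rank a document by its word overlap with the relevance model."""
    return sum(model[w] for w in text.split())

corpus = {
    "d1": "community preference policy memo",
    "d2": "housing lottery application",
    "d3": "holiday party schedule",
}
target = "community preference policy analysis"

# Initial, mistaken coding: the policy memo is marked irrelevant.
coded = [(corpus["d1"], False), (corpus["d2"], True), (corpus["d3"], False)]
before = score(train(coded), target)

# Correct the disputed document and retrain -- one round, mistake fixed.
coded[0] = (corpus["d1"], True)
after = score(train(coded), target)
print(before, after)  # 0 3
```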

Procedural Background in Winfield

Here is Judge Parker’s summary of the parties’ convoluted discovery disputes.

Plaintiffs have sought wide-ranging discovery, which the City has resisted vigorously. This Court has issued various rulings consistent with Federal Rule of Civil Procedure 26(b)(1) to ensure that discovery is focused on information that is relevant and proportional to the needs of this case, considering “the importance of the issues at stake in the action, the amount in controversy, the parties’ relative access to relevant information, the parties’ resources, the importance of the discovery in resolving the issues, and whether the burden or expense of the proposed discovery outweighs its likely benefit.” Fed. R. Civ. P. 26(b)(1). This included directing phased discovery and limiting some of Plaintiffs’ discovery demands.  . . .

Notice how Judge Parker, fresh out of private practice, well understands new Rule 26(b)(1) and proportional relevance. She also understands the need to phase discovery.

The history of the parties’ e-discovery dispute on document search and review is all too common. Both sides start off playing the Go Fish keyword guessing game, just as if they were still in the nineties. Apparently they all slept through Judge Peck’s wake-up call in Gross Construction. William A. Gross Constr. Assocs., Inc. v. Am. Mutual Mfrs. Ins. Co., 256 F.R.D. 134, 134 (S.D.N.Y. 2009); Losey, R., Adventures in Electronic Discovery (West 2011); Child’s Game of ‘Go Fish’ is a Poor Model for e-Discovery Search; National Day Laborer Organizing Network v. US Immigration and Customs Enforcement Agency, 877 F. Supp. 2d 87 (S.D.N.Y. 2012) (J. Scheindlin) (“As Judge Andrew Peck — one of this Court’s experts in e-discovery — recently put it: ‘In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of “Go Fish” … keyword searches usually are not very effective.’” (FN 113)).

Here is Judge Parker’s charitable description of the legal search efforts of the attorneys in this case at that point.

Pursuant to the Court’s directives, the City began its document review process with HPD and the Mayor’s Office. As it pertained to ESI, the parties disagreed as to the search terms that would be applied to the electronic documents collected from these custodians. Ultimately, the City applied a set of search terms that was heavily negotiated by the parties, with some guidance from the Court, to the custodians from HPD and the Mayor’s Office. The City then began reviewing the electronic documents from these custodians.

Plaintiffs lodged numerous complaints about the pace of discovery and document review, which initially involved only manual linear review of documents. Accordingly, given the volume of documents collected, this Court directed the City to complete linear review as to certain custodians and begin using Technology Assisted Review (“TAR”) software (also commonly referred to as “predictive coding”) to hasten the identification, review, and production of documents responsive to Plaintiffs’ document requests.

Keyword negotiation and linear review! Can you believe it? No wonder the City had already reviewed “over 100,000 documents at a cost of over $350,000.” Yes. It pays to be computer illiterate and practice law as if you were still in the nineties. I wonder if they used Summation? Judge Parker properly threw up her hands at this and cajoled counsel to jump into the Twenty-First Century TAR pits with the rest of us.

Her description is pretty good, although it still uses the old-fashioned seed-set term and does not take into consideration that machine training these days continues throughout the review as an iterated process (steps 4, 5 and 6 in the 8-step method of Predictive Coding 4.0). See the TAR Course, Fourteenth Class: Steps Four, Five and Six – Iterative Machine Training. The particular flavor of continuous training that I include in my open source methodology is called Intelligently Spaced Training or IST. It is explained in the TAR Course: Sixth Class: 2nd Insight – Balanced Hybrid and Intelligently Spaced Training (IST).

TAR allows parties to prioritize and/or categorize documents for purposes of document review and has been shown to produce more accurate results than manual review. See Rio Tinto PLC v. Vale S.A., No. 14-cv-3042 (RMB) (AJP), 2015 WL 872294, at *3 (S.D.N.Y. Mar. 2, 2015); Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012) (noting that linear manual review was too expensive when there were over three million emails to review and that computer-assisted review can save both parties significant time and money in document review and is at least as, if not more, accurate as manual review). For TAR to work properly, the producing party must prepare a training, or seed set, of responsive and non-responsive documents to train the computer system how to distinguish between them. For purposes of this review, the City trained its TAR system using documents that it had already reviewed during its linear review, as well as by completing several training rounds of additional documents that had not yet been reviewed. Once the computer system is trained, it segregates the potential review population into responsive and non-responsive documents and prioritizes them in terms of relevancy (based on similarity to documents in the seed set) so that trained document reviewers can focus on documents that are likely to be most relevant first among the documents classified as responsive. Best practices also dictate that the producing party validate the results of its trained TAR system using certain metrics, such as a recall rate that measures the effectiveness of the software in finding responsive documents.  . . .
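The validation metric the court mentions can be sketched with simple arithmetic. All of the numbers below are hypothetical, and the sketch makes the simplifying assumption that every produced document is in fact responsive; the estimate of what was missed comes from a human-reviewed random sample of the null set (the documents the system left unproduced):

```python
# Hypothetical recall estimate from a random sample of the null set.
total_docs = 10_000
produced = 1_500                   # documents coded responsive and produced
null_set = total_docs - produced   # 8,500 documents left unproduced
sample_size = 400                  # random null-set sample reviewed by humans
responsive_in_sample = 6           # responsive documents found in that sample

# Project the sample rate across the whole null set, then compute recall.
est_missed = responsive_in_sample / sample_size * null_set
recall = produced / (produced + est_missed)
print(round(recall, 3))  # 0.922
```

A recall estimate like this is only as good as its sample size and its reviewers, which is why the size and methodology of such samples are themselves negotiated, as the plaintiffs’ proposal in this case illustrates.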

Even with this good guidance from Judge Parker to go beyond mere keyword search and use active machine learning, aka TAR, counsel for both sides resisted. Instead, they spent most of their time doing what they know best, arguing about keywords. I do not think they had retained their outside experts yet. Defendant finally agreed to some stupid list, but said it would also use active machine learning – TAR – to review only the keyword hits with the highest ranking. Yes. Finally. Of course, the Plaintiffs’ attorneys would not agree to that because they did not trust defendant’s ability to do TAR properly.

Requesting Parties Demand for Quality and Transparency

At this point, with experts now likely on board for both sides, plaintiffs’ counsel finally got smart and started arguing about relevance and quality of process. Here is how Judge Parker explained it.

After further negotiation and a modest reduction in Plaintiffs’ number of search terms, the City agreed that it would accept all of Plaintiffs’ modified proposed terms, but that it would use its predictive coding system to reduce the DCP/Banks review population to a more manageable number of documents. Plaintiffs, in turn, have indicated that the City’s proposal is not acceptable because they are concerned about the reliability of the City’s predictive coding processes. Specifically, Plaintiffs contend that the City has over-designated documents as privileged and non-responsive, including by applying an impermissibly narrow view of responsiveness during its review process. As a result of this alleged practice, Plaintiffs assert that the City’s TAR software was improperly trained on what constitutes a responsive and nonresponsive document, leading to a predictive coding system that is unable to recognize documents that are truly responsive to the issues in this case.

Now we finally have a valid issue, one that goes beyond last century’s keyword-search arguments. Did the City properly train the system? Or did it train the algorithm to avoid certain types of relevant ESI, such as documents related to the Community Preference Policy? Did we have GIGO in play here or not? If so, how serious is this mistake to the overall discovery process? And what should be done now to correct the mistake? These are legitimate predictive coding quality issues. Here we see a new judge, one who has practiced this century and knows her way around e-discovery, succeed in elevating the dispute. Back to Judge Parker’s narrative.

In response to Plaintiffs’ concerns, this Court has issued several orders. It required the City to submit a letter for in camera review describing its predictive coding process and training for document reviewers. The City asserted that these materials are core work product, but nevertheless provided this information to the Court on September 8 and October 5, 2017.  . . .

After studying the material submitted by defense counsel, or rather, I would hope and expect, material submitted by defense counsel’s e-discovery experts (Maura Grossman perhaps?), Judge Parker was in a position to rule on both the relevance and transparency issues. Here is her description of the dispute at this point.

To increase transparency and confidence in the City’s document review, Plaintiffs have proposed that the Court direct the City to provide for review by Plaintiffs’ counsel random samples of each of the following (size and methodology to be negotiated by counsel and their e-discovery experts, or determined by the Court in the absence of agreement):

  • non-privileged documents collected from DCP and Banks that were subject to TAR that are above the predictive-coding ranking cut-off used by the City, but where the City ultimately determined that the documents were non-responsive;
  • non-privileged documents collected from DCP and Banks that were subject to TAR that are just below the predictive-coding ranking cut-off used by the City;
  • non-privileged documents collected from HPD and the Mayor’s Office that were subject to TAR that were above the predictive-coding cut-off, but that the City ultimately designated as non-responsive through its attorney review; and
  • non-privileged documents collected from HPD and the Mayor’s Office that were subject to TAR that were just below the predictive-coding cut-off used by the City.

Plaintiffs also seek:

  • information about the ranking system used (i.e., what cut-off was used, and how many documents were deemed responsive and unresponsive at each ranking);
  • all materials submitted by the City for the Court’s in camera review relating to predictive coding, other than materials that are in fact privileged, on an attorneys’ and experts’ eyes-only basis;
  • an order requiring the City to search for documents responsive to all of Plaintiffs’ document requests other than those specifically limited or stricken by the Court at prior court conferences (or those that had been withdrawn by Plaintiffs);
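The first part of the proposal, sampling the documents ranked just above and just below the producing party’s cut-off, can be sketched like this. The scores, cut-off, band width, and sample sizes are all hypothetical placeholders for what counsel and their experts would negotiate:

```python
# Hypothetical sketch of the proposed sampling around the ranking cut-off.
import random

random.seed(1)
# A made-up ranked review population: (doc_id, predictive-coding score).
ranked = [(f"doc{i}", random.random()) for i in range(5000)]
cutoff = 0.50  # the producing party's ranking cut-off (illustrative)
band = 0.05    # how far "just below" the cut-off the sample reaches

above = [d for d, s in ranked if s >= cutoff]
just_below = [d for d, s in ranked if cutoff - band <= s < cutoff]

# Random samples for review by opposing counsel (sizes to be negotiated).
sample_above = random.sample(above, min(100, len(above)))
sample_below = random.sample(just_below, min(100, len(just_below)))
print(len(sample_above), len(sample_below))
```

Sampling the band just below the cut-off is a sensible check, because that is exactly where borderline responsive documents are most likely to hide.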

This is a pretty sophisticated proposal by plaintiffs’ counsel. I have to assume they finally have, or are finally following, the advice of outside e-discovery experts, probably a vendor advising them on predictive coding. (Anybody in a position to share with me who that was? I assume it is probably a matter of public record in the court file somewhere.)

Judge Parker Rules

Judge Katharine Parker begins her ruling in traditional fashion by reciting the governing law. Here is the key part.

Collection, review, and production of ESI presents special challenges and requires “cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI.” William A. Gross Const. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 256 F.R.D. 134, 136 (S.D.N.Y. 2009). Courts have recognized that keyword searches used to search for and collect documents for review must be carefully crafted. Id. Here, Plaintiffs have provided 665 additional search terms to be applied to the DCP/Banks review population. The City has stated that the supplemental search would require review of 90,000 additional documents at a cost of approximately $248,000. Nevertheless, it has stated that it is willing to use all of Plaintiffs’ proposed search terms and use TAR, leveraging the training already done on the software. (Doc. No. 203.) Plaintiffs object to the City’s proposal because they believe that the City’s TAR processes are flawed insofar as it results in the over-designation of documents as nonresponsive. . . .

Judge Parker then rules against the defendant, at least in part, on the relevance issue. They had been construing her prior rulings on relevance too narrowly. I have previously quoted this part of her order a little out of sequence. Recall that she held that defense counsel had construed this Court’s prior ruling too narrowly . . . by, for instance, excluding documents concerning the Community Preference Policy and the City’s justification for the policy.

That means appropriate training documents had not been used, at least as to certain issues. Plaintiffs were right to challenge the machine training. Here is how Judge Parker then rules on the key issue of process.

C. Plaintiffs’ Challenges To The City’s TAR Processes and Responsiveness Designations

Finally, Plaintiffs object to the City’s continued use of its TAR system. Plaintiffs contend that the system is improperly trained because the City’s human document reviewers overdesignated documents as non-responsive during both the linear review and during the TAR training stages. As a result, Plaintiffs claim, the TAR software is unable to recognize and properly categorize responsive documents.

As I have previously stated, good predictive coding software today corrects and mitigates such errors. Now that the errors have been pointed out by Judge Parker, it should not be too hard for the defense to retrain the TAR software so that it is able to recognize and properly categorize responsive documents. Plaintiffs’ claim is an exaggeration, assuming, as we must, that defendant’s predictive coding experts are competent.

Judge Parker goes on with a very long paragraph reciting the law to date on process, and the duty of the responding party to make reasonable, good-faith efforts.

As courts have noted, the producing party is in the best position to “evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.” Hyles v. New York City, No. 10-cv-3119 (AT) (AJP), 2016 WL 4077114, at *3 (S.D.N.Y. Aug. 1, 2016) (citing Principle 6 of the Sedona Conference). Traditionally, courts have not micro-managed parties’ internal review processes for a number of reasons. First, attorneys, as officers of the court, are expected to comply with Rules 26 and 34 in connection with their search, collection, review and production of documents, including ESI. Second, internal attorney ESI work processes may reveal work product, litigation tactics, and trial strategy. See generally Disability Rights Council of Greater Wash. v. Wash. Metro. Transit Auth., 242 F.R.D. 139, 142-43 (D.D.C. 2007) (holding that a compilation of documents culled from a larger production is protectable as attorney work product). Third, as noted above, the producing party is better equipped than the court to identify and utilize the best process for producing their own ESI consistent with their obligations under the Federal Rules of Civil Procedure. See Hyles, 2016 WL 4077114, at *3 (citing Principle 6 of the Sedona Conference). Fourth, perfection in ESI discovery is not required; rather, a producing party must take reasonable steps to identify and produce relevant documents. See HM Elecs., Inc. v. R.F. Techs., Inc., No. 12-cv-2884 (BAS) (MDD), 2015 WL 471498, at *12 (S.D. Cal. Aug. 7, 2015), vacated in part on other grounds, 171 F. Supp. 3d 1020 (S.D. Cal. 2016); Pension Comm. of the Univ. of Montreal Pension Plan v. Banc of Am. Secs., LLC, 685 F. Supp. 2d 456, 461 (S.D.N.Y. 2010); see also Advisory Committee Notes, 2015 Amendments to Fed. R. Civ. P. 37(e). 
However, “‘[p]arties cannot be permitted to jeopardize the integrity of the discovery process by engaging in halfhearted and ineffective efforts to identify and produce relevant documents.’” HM Elecs., Inc., 2015 WL 4714908 at *12 (quoting Bratka v. Anheuser-Busch Co., Inc., 164 F.R.D. 448, 463 (S.D. Ohio 1995)). “Litigation is not a game. It is the time-honored method of seeking the truth, finding the truth, and doing justice.” Id. (citation and quotations omitted). In keeping with these principles, this Court is of the view that there is nothing so exceptional about ESI production that should cause courts to insert themselves as super-managers of the parties’ internal review processes, including training of TAR software, or to permit discovery about such process, in the absence of evidence of good cause such as a showing of gross negligence in the review and production process, the failure to produce relevant specific documents known to exist or that are likely to exist, or other malfeasance.

Judge Parker then addresses the transparency issue.

Courts are split as to the degree of transparency required by the producing party as to its predictive coding process. See Rio Tinto PLC., 306 F.R.D. at 128 (citing John M. Facciola & Philip J. Favro, Safeguarding the Seed Set: Why Seed Set Documents May Be Entitled To Work Product Protection, 8 Fed. Cts. L. Rev. 1 (2015)). In some cases, parties have agreed to—or courts have ordered—transparency into the TAR or predictive coding processes. See, e.g., id. at 129 (“[t]he Court, however, need not rule on the need for seed set transparency in this case, because the parties agreed to a protocol that discloses all non-privileged documents in the control sets.”); In re Actos Prods. Liab. Litig., No. 6:11-md-299, 2012 WL 7861249, at *4-5 (W.D. La. July 27, 2012) (the parties’ protocol had “experts” from each side simultaneously reviewing and coding the seed set); Bridgestone Ams., Inc. v. Int’l Bus. Machs. Corp., No. 3:13-cv-1196, 2014 WL 4923014, at *1 (M.D. Tenn. July 22, 2014) (party offered to provide responsive and non-responsive seed set documents—an offer endorsed by the court). In other cases, courts have not required production of seed set documents. See In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-md-2391, 2013 WL 6405156, at *1-2 (N.D. Ind. Aug. 21, 2013) (refusing to direct party to produce seed set, but encouraging transparency); Aurora Coop. Elev. Co. v. Aventine Renewable Energy-Aurora W., LLC, No. 4:12-cv-230, 2015 WL 10550240, at *2 (D. Neb. Jan. 6, 2015) (encouraging the parties to work cooperatively in developing a TAR process, but observing that the Federal Rules of Civil Procedure do not require a party to disclose information that is not relevant to any party’s claim or defense).

While it is true that Plaintiffs here do not have clear insight into the City’s predictive coding process and training, this Court has required the City to provide in camera submissions addressing these subjects. These in camera submissions reveal that the City appropriately trained and utilized its TAR system. The City’s seed set included over 7,200 documents that were reviewed by the City’s document review team and marked as responsive or nonresponsive in order to train the system. Its seed set included randomly selected documents, as well as pre-coded example documents such as the pleadings, Plaintiffs’ document requests, and other relevant, privileged and non-privileged documents. At the conclusion of the TAR training phase, which included five full training rounds, the City conducted a validation process.

Judge Parker’s hands-on review of the process provides reasonable safeguards to the requesting party, here the plaintiffs. This represents a good working compromise on the transparency issue. Disclosure was made to the judge, but not to the opposing party or its counsel. The producing party’s confidentiality rights were protected. The disclosure made Judge Parker comfortable that the errors made in training to date, the improper relevance classifications, were not too serious and could be corrected. As she put it, “these in camera submissions reveal that the City appropriately trained and utilized its TAR system,” including a “validation process.”

Judge Parker goes on to describe other disclosures made to her by the responding party, here the defendant City, that convinced her that its efforts were very reasonable. You would be well advised to incorporate these same quality control procedures into all of your document review projects.

Moreover, the City provided detailed training to its document review team as to the issues in the case. The document review team also was provided with all of Plaintiffs’ document requests to use in connection with their review and designation of documents as responsive and non-responsive. In sum, the City’s training and review processes and protocols present no basis for finding that the City engaged in gross negligence in connection with its ESI discovery – far from it.

Judge Parker then addresses head-on the important relevance issue, GIGO (garbage in, garbage out). Based on what she saw, she did not think the responding party’s mistakes were that serious. The number of misclassified documents used in the training was relatively small.

Plaintiffs’ objections to the City’s use of TAR largely stems from their beliefs that the City has over-designated documents as non-responsive. In support of this claim, Plaintiffs rely on a few documents that were inadvertently produced by the City that they contend are responsive to their discovery requests and should have been produced. Plaintiffs also point to the fact that the City has flip-flopped on its designation of documents when challenged. For example, the City has admitted that the two “slip-sheeted” documents initially marked as non-responsive are in fact “arguably responsive.” Although the City has not provided an explanation for the additional three “slip-sheeted” documents that Plaintiffs have identified, these documents also appear to be at least “arguably responsive” to Plaintiffs’ discovery requests.

Additionally, Plaintiffs emphasize that the City’s document review has resulted in many documents being categorized as responsive, but ultimately withheld on privilege grounds. As mentioned above, the City has produced a sample of 80 of these supposedly privileged documents for in camera review. And, as noted above, the City has withdrawn its claim of deliberative process privilege as to 59 of the documents initially designated as privileged (out of the 80-document sample set), but also has changed its responsiveness designation as to 15 of those documents to non-responsive and, as a result, has withheld these 15 documents as non-responsive to Plaintiffs’ document requests (the “NR Documents”). This Court has reviewed these 15 NR Documents in camera and determined that nearly all of them are responsive to Plaintiffs’ document requests to some degree, though most are only of marginal relevance.

In sum, the City incorrectly categorized at least five “slip-sheeted” documents as nonresponsive during the electronic review process and correctly categorized 15 other documents as responsive during the electronic review process but later labeled them as non-responsive. However, this Court does not find the labeling of these 20 documents, only 5 of which were “incorrectly” categorized as non-responsive during the initial ESI review—out of the 100,000 documents that have been reviewed thus far in this case—sufficient to question the accuracy and reliability of the City’s TAR process as a whole. In any ESI review, “the Federal Rules of Civil Procedure do not require perfection.” Moore, 287 F.R.D. at 191. Instead, the proper inquiry is whether the “search results are reasonable and proportional.” Hyles, 2016 WL 4077114, at *3 (citing Fed. R. Civ. P. 26(g)(1)(B)).

Here, neither this Court nor Plaintiffs have identified anything in the TAR process itself that is inherently defective; rather, Plaintiffs’ objections are premised upon human error in categorizing a small subset of documents as responsive or non-responsive. The City in this case has produced over 12,500 documents that were all designated as responsive. Moreover, it seems unlikely that the misdesignations identified by Plaintiffs would have affected the City’s TAR processes in any meaningful way, given that the seed set was comprised of over 7,000 documents. The City’s validation process, which was described to this Court in the City’s in camera submission, further supports this conclusion.

Still, as described above, the City’s SME did make mistakes in relevance classification and thus in machine training. For that reason a limited remedy was provided to the requesting party, the plaintiffs: limited disclosure of non-privileged sample documents was ordered on an attorneys’ and experts’ eyes-only basis.

While the Court disagrees with Plaintiffs’ assertions that the TAR process as a whole is defective, it nevertheless finds that Plaintiffs have presented sufficient evidence to justify their request for sample sets of non-privileged documents from the documents pulled from the 50 custodians. In particular, this Court agrees that the sample sets will increase transparency, a request that is not unreasonable in light of the volume of documents collected from the custodians, the low responsiveness rate of documents pulled for review by the TAR software, and the examples that Plaintiffs have presented, which suggest there may have been some human error in categorization that may have led to gaps in the City’s production.

Thus, this Court will grant Plaintiffs’ request in part as follows:

  • The City is directed to produce the five “slip-sheeted” documents and the 15 NR
    Documents within one week of the date of this order.
  • The City is directed to provide to Plaintiffs a sample of 300 non-privileged
    documents in total from the HPD custodians and the Mayor’s Office. These
    documents should be randomly pulled from the corpus of non-responsive
    documents. The City shall provide these documents to Plaintiffs’ counsel on an
    attorneys’ and experts’ eyes-only basis. The City shall be required to produce the
    300 non-privileged documents from the HPD custodians and the Mayor’s Office
    by December 22, 2017.
  • The City shall apply the Plaintiffs’ search terms to the DCP/Banks review
    population and shall leverage its existing TAR process to segregate the additional
    documents into responsive and non-responsive categories and begin producing
    any responsive documents, subject to its objections and this Court’s orders
    limiting Plaintiffs’ discovery requests. It then shall provide Plaintiffs with a
    random sample of 100 non-privileged, non-responsive documents in total from
    the DCP/Banks review population. These too shall be provided to Plaintiffs’
    counsel on an attorneys’ and experts’ eyes-only basis. The City shall be required
    to produce the 100 non-privileged documents from the DCP/Banks review
    population by January 12, 2018.
  • To the extent Plaintiffs contend that the sampling process described above
    suggests that there are documents within the HPD/Mayor’s Office or DCP/Banks
    review populations that are responsive and relevant, but have not been
    produced, they shall meet and confer with the City to determine whether
    additional training and review is necessary with the understanding that
    reasonableness and proportionality, not perfection and scorched-earth, must be
    their guiding principles. See Rio Tinto PLC, 306 F.R.D. at 129 (“[o]ne point must
    be stressed—it is inappropriate to hold TAR to a higher standard than keywords
    or manual review. Doing so discourages parties from using TAR for fear of
    spending more in motion practice than the savings from using TAR for review.”).

This seems to me to be a fair compromise. The requesting party had asked for even more disclosure, namely information on the predictive coding ranking system. It looks like their experts were asking for the probability cut-off metric, in other words, how far down in the ranking the review extended. As you can see from the ruling below, the court denied this, but at the same time Judge Parker encouraged the City to share such information with Plaintiffs. Personally, I would do it. It is not a big deal. The rational cut-off point is typically 51% probable relevance, confirmed by a validation process, typically a random sample of the null-set. The null-set refers to the documents ranked below 51%, or whatever the cut-off point is, that will not be reviewed. I would not produce the documents submitted in camera. That would defeat the whole in camera process. Trust the judge. She seems pretty terrific to me.
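The null-set validation described above reduces to simple sampling arithmetic. Here is a minimal sketch in Python; the corpus size, cut-off, sample size, and review result are all hypothetical stand-ins, not figures from the case:

```python
import random

random.seed(42)

# Hypothetical ranked corpus: (doc_id, predicted probability of relevance)
corpus = [(i, random.random()) for i in range(100_000)]

CUTOFF = 0.51  # typical rational cut-off: 51% probable relevance

# The null-set: documents ranked below the cut-off that will not be reviewed
null_set = [doc for doc, p in corpus if p < CUTOFF]

# Validation: draw a random sample of the null-set for human review
SAMPLE_SIZE = 300
sample = random.sample(null_set, SAMPLE_SIZE)

# Suppose human review of the sample finds this many relevant documents
relevant_found = 3  # hypothetical review result

# Elusion: the estimated rate of relevant documents left in the null-set
elusion = relevant_found / SAMPLE_SIZE
estimated_missed = elusion * len(null_set)
print(f"Elusion rate: {elusion:.1%}")
print(f"Estimated relevant documents remaining in null-set: {estimated_missed:.0f}")
```

A low elusion rate on the sample is what lets the producing party argue, as the City did here, that its results are reasonable and proportional even though no review is perfect.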

This Court denies Plaintiffs’ request for information about the ranking system used by the City (i.e., what cut-off was used, and how many documents deemed responsive and unresponsive are at each ranking) and for all materials submitted by defendant in camera relating to predictive coding. Plaintiffs have failed to explain why information about the City’s ranking system is needed, particularly given that Plaintiffs’ request to sample non-responsive documents from the review populations is being granted. It is also unclear how this information is even potentially relevant to the claims and defenses in this litigation, as required under Federal Rule of Civil Procedure 26. However, in the interests of transparency and cooperation in the discovery process, the City is encouraged to share such information with Plaintiffs. With respect to the City’s in camera submissions, this Court views this information as being protected by the work product privilege and, accordingly, is not subject to disclosure.

Conclusion

Winfield v. City of New York (S.D.N.Y., Nov. 27, 2017) is a next-generation predictive coding case that gets into the methodologies and quality controls of the active machine learning process. It assumes that of course you should do predictive coding and moves on to the practical issues of method: specifically, how to deal with relevance training errors, quality controls, and transparency. You would be well advised to study this opinion carefully and cite it often.

With the recent retirement of so many great e-discovery judges, including soon the predictive coding champion, Judge Andrew Peck, it is encouraging to see new judges like Katharine H. Parker come onto the scene. Electronic discovery and the sub-specialty of Legal Search need the continued encouragement and guidance of skilled judges. My congratulations to Judge Parker on her hands-on, proactive approach in Winfield and her good grasp of the sticky subject of TAR. Yes, it is fair to say that with Winfield a star is born. Long may she shine.

 


WHY I LOVE PREDICTIVE CODING: Making Document Review Fun Again with Mr. EDR and Predictive Coding 4.0

December 3, 2017

Many lawyers and technologists like predictive coding and recommend it to their colleagues. They have good reasons. It has worked for them. It has allowed them to do e-discovery reviews in an effective, cost-efficient manner, especially on big projects. That is true for me too, but that is not why I love predictive coding. My feelings come from the excitement, fun, and amazement that often arise from seeing it in action, first hand. I love watching the predictive coding features in my software find documents that I could never have found on my own. I love the way the AI in the software helps me to do the impossible. I really love how it makes me far smarter and more skilled than I really am.

I have been getting those kinds of positive feelings consistently by using the latest Predictive Coding 4.0 methodology and KrolLDiscovery’s latest eDiscovery.com Review software (“EDR”). So too have the e-Discovery Team members who helped me participate in TREC 2015 and 2016 (the great science experiment on the latest text search techniques sponsored by the National Institute of Standards and Technology). During our grueling forty-five days of experiments in 2015, and again for sixty days in 2016, we came to admire the intelligence of the new EDR software so much that we decided to personalize the AI as a robot. We named him Mr. EDR out of respect. He even has his own website now, MrEDR.com, where he explains how he helped my e-Discovery Team in the 2015 and 2016 TREC Total Recall Track experiments.

The bottom line of this research for us was to prove and improve our methods. Our latest version 4.0 of Predictive Coding, the Hybrid Multimodal IST Method, is the result. We have even open-sourced this method, well, most of it, and teach it in a free seventeen-class online program: TARcourse.com. Aside from testing and improving our methods, another, perhaps even more important, result of TREC for us was our rediscovery that with good teamwork, and good software like Mr. EDR at your side, document review need never be boring again. The documents themselves may well be boring as hell, but that’s another matter; the search for them need not be.

How and Why Predictive Coding is Fun

Steps Four, Five, and Six of the standard eight-step workflow for Predictive Coding 4.0 are where we work with the active machine-learning features of Mr. EDR. These are its predictive coding features, a type of artificial intelligence. We train the computer on our conception of relevance by showing it relevant and irrelevant documents that we have found. The software is then designed to go out and find all other relevant documents in the total dataset. One of the skills we learn is recognizing when we have taught enough and can stop the training and complete the document review. At TREC we call that the Stop decision. It is important for keeping down the costs of document review.

We use a multimodal approach to find training documents, meaning we use all of the other search features of Mr. EDR to find relevant ESI, such as keyword, similarity, and concept searches. We iterate the training with sample documents, both relevant and irrelevant, until the computer starts to understand the scope of relevance we have in mind. It is a training exercise to make our AI smart, to get it to understand the basic ideas of relevance for that case. It usually takes multiple rounds of training for Mr. EDR to understand what we have in mind. But he is a fast learner, and by using the latest hybrid multimodal IST (“intelligently spaced training“) techniques, we can usually complete his training in a few days. At TREC, where we were moving fast after hours with the A-Team, we completed some of the training experiments in just a few hours.
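The iterative training loop described above can be sketched in a few lines of code. This is only a toy illustration on synthetic vectors, not Mr. EDR’s actual algorithm: the features, labels, batch sizes, and five-round schedule are all hypothetical, and a human reviewer (here simulated by the true labels) would code each round’s documents:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical document vectors: 2,000 docs, 20 features; in this toy
# world a document is truly relevant when its first feature is large.
X = rng.normal(size=(2000, 20))
true_labels = (X[:, 0] > 1.0).astype(int)

# Seed set: a small mix of relevant and irrelevant examples, standing in
# for documents found by keyword, similarity, and concept searches.
relevant_ids = np.where(true_labels == 1)[0]
irrelevant_ids = np.where(true_labels == 0)[0]
labeled = list(rng.choice(relevant_ids, 10, replace=False))
labeled += list(rng.choice(irrelevant_ids, 40, replace=False))

model = LogisticRegression(max_iter=1000)

for training_round in range(5):                  # iterate training rounds
    model.fit(X[labeled], true_labels[labeled])  # train on coded documents
    probs = model.predict_proba(X)[:, 1]         # rank the whole collection
    # Feed the top-ranked uncoded documents back as the next training batch
    # (a reviewer would code them; the true label stands in here).
    coded = set(labeled)
    top_uncoded = [i for i in np.argsort(-probs) if i not in coded]
    labeled += top_uncoded[:50]

probs = model.predict_proba(X)[:, 1]
predicted_relevant = int((probs >= 0.51).sum())
print(f"Documents ranked 51%+ probable relevant: {predicted_relevant}")
```

Each round the ranking improves, so each new batch of top-ranked documents is richer in relevant material, which is the continuous-training dynamic the essay describes.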

After a while Mr. EDR starts to “get it,” he starts to really understand what we are after, what we think is relevant in the case. That is when a happy shock and awe type moment can happen. That is when Mr. EDR’s intelligence and search abilities start to exceed our own. Yes. It happens. The pupil then starts to evolve beyond his teachers. The smart algorithms start to see patterns and find evidence invisible to us. At that point we sometimes even let him train himself by automatically accepting his top-ranked predicted relevant documents without even looking at them. Our main role then is to determine a good range for the automatic acceptance and do some spot-checking. We are, in effect, allowing Mr. EDR to take over the review. Oh what a feeling to then watch what happens, to see him keep finding new relevant documents and keep getting smarter and smarter by his own self-programming. That is the special AI-high that makes it so much fun to work with Predictive Coding 4.0 and Mr. EDR.

It does not happen in every project, but with the new Predictive Coding 4.0 methods and the latest Mr. EDR, we are seeing this kind of transformation happen more and more often. It is a tipping point in the review when we see Mr. EDR go beyond us. He starts to unearth relevant documents that my team would never even have thought to look for. The relevant documents he finds are sometimes completely dissimilar to any others we found before. They do not have the same keywords, or even the same known concepts. Still, Mr. EDR sees patterns in these documents that we do not. He can find the hidden gems of relevance, even outliers and black swans, if they exist. When he starts to train himself, that is the point in the review when we think of Mr. EDR as going into superhero mode. At least, that is the way my young e-Discovery Team members like to talk about him.

By the end of many projects the algorithmic functions of Mr. EDR have attained a higher intelligence and skill level than our own (at least on the task of finding the relevant evidence in the document collection). He is always lightning fast and inexhaustible, even untrained, but by the end of his training he becomes a search genius. Watching Mr. EDR in that kind of superhero mode is what makes Predictive Coding 4.0 a pleasure.

The Empowerment of AI Augmented Search

It is hard to describe the combination of pride and excitement you feel when Mr. EDR, your student, takes your training and then goes beyond you. More than that, the super-AI you created then empowers you to do things that would have been impossible before, absurd even. That feels pretty good too. You may not be Iron Man, or look like Robert Downey, but you will be capable of remarkable feats of legal search strength.

For instance, using Mr. EDR as our Iron Man-like suits, my e-discovery A-Team of three attorneys was able to do thirty different review projects and classify 17,014,085 documents in 45 days. See the 2015 TREC experiment summary at MrEDR.com. We did these projects mostly at night and on weekends, while holding down our regular jobs. What makes this seem crazy, impossible even, is that we accomplished it while personally reviewing only 32,916 documents. That is less than 0.2% of the total collection. That means we relied on predictive coding to do 99.8% of our review work. Incredible, but true.

Using traditional linear review methods it would have taken us 45 years to review that many documents! Instead, we did it in 45 days. Plus our recall and precision rates were insanely good. We even scored 100% precision and 100% recall in one TREC project in 2015 and two more in 2016. You read that right. Perfection. Many of our other projects attained scores in the high and mid nineties. We are not saying you will get results like that. Every project is different, and some are much more difficult than others. But we are saying that this kind of AI-enhanced review is not only fast and efficient, it is effective.
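The recall and precision scores cited above reduce to simple ratios of counts. A quick sketch, using hypothetical numbers rather than actual TREC figures:

```python
# Hypothetical review results for one project
relevant_in_collection = 500  # total relevant docs that truly exist
produced = 520                # docs the review marked relevant and produced
correct_hits = 490            # truly relevant docs among those produced

# Recall: what share of all relevant documents did the review find?
recall = correct_hits / relevant_in_collection

# Precision: what share of the produced documents are actually relevant?
precision = correct_hits / produced

print(f"Recall: {recall:.1%}")     # -> 98.0%
print(f"Precision: {precision:.1%}")
```

A score of 100% on both, as in the TREC projects mentioned, means every produced document was relevant and no relevant document was missed.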

Yes, it’s pretty cool when your little AI creation does all the work for you and makes you look good. Still, no robot could do this without your training and supervision. We are a team, which is why we call it hybrid multimodal, man and machine.

Having Fun with Scientific Research at TREC 2015 and 2016

During the 2015 TREC Total Recall Track experiments my team would sometimes get totally lost on a few of the really hard Topics. We were not given legal issues to search, as usual. They were arcane technical hacker issues, political issues, or local news stories. Not only were we in new fields, but the scope of relevance of the thirty Topics was never really explained. (We were given one- to three-word explanations in 2015; in 2016 we got a whole sentence!) We had to figure out the intended relevance during the project based on feedback from the automated TREC document adjudication system. We would have some limited understanding of relevance based on our suppositions about the initial keyword hints, and so we could begin to train Mr. EDR with that. But, in several Topics, we never had any real understanding of exactly what TREC thought was relevant.

This was a very frustrating situation at first, but, and here is the cool thing, even though we did not know, Mr. EDR knew. That’s right. He saw the TREC patterns of relevance hidden to us mere mortals. In many of the thirty Topics we would just sit back and let him do all of the driving, like a Google car. We would often just cheer him on (and each other) as the TREC systems kept saying Mr. EDR was right, the documents he selected were relevant. The truth is, during much of the 45 days of TREC we were like kids in a candy store having a great time. That is when we decided to give Mr. EDR a cape and superhero status. He never let us down. It is a great feeling to create an AI with greater intelligence than your own and then see it augment and improve your legal work. It is truly a hybrid human-machine partnership at its best.

I hope you get the opportunity to experience this for yourself someday. The TREC experiments in 2015 and 2016 on recall in predictive coding are over, but the search for truth and justice goes on in lawsuits across the country. Try it on your next document review project.

Do What You Love and Love What You Do

Mr. EDR, and other good predictive coding software like it, can augment our own abilities and make us incredibly productive. This is why I love predictive coding and would not trade it for any other legal activity I have ever done (although I have had similar highs from oral arguments that went great, or the rush that comes from winning a big case).

The excitement of predictive coding comes through clearly when Mr. EDR is fully trained and able to carry on without you. It is a kind of Kurzweilian mini-singularity event. It usually happens near the end of the project, but can happen earlier, when your computer catches on to what you want and starts to find the hidden gems you missed. I suggest you give Predictive Coding 4.0 and Mr. EDR a try. To make it easier, I open-sourced our latest method and created an online course, TARcourse.com. It will teach anyone our method, if they have the right software. Learn the method, get the software, and then you too can have fun with evidence search. You too can love what you do. Document review need never be boring again.

Caution

One note of caution: most e-discovery vendors, including the largest, do not have active machine learning features built into their document review software. Even the few that do have active machine learning do not necessarily follow the Hybrid Multimodal IST Predictive Coding 4.0 approach that we used to attain these results. They instead rely entirely on machine-selected documents for training, or, even worse, rely entirely on randomly selected documents to train the software, or use elaborate, unnecessary secret control sets.

The algorithms used by some vendors who say they have “predictive coding” or “artificial intelligence” are not very good. Scientists tell me that some are only dressed-up concept search or unsupervised document clustering. Only bona fide active machine learning algorithms create the kind of AI experience that I am talking about. Software for document review that does not have any active machine learning features may be cheap, and may be popular, but it lacks the power that I love. Without active machine learning, which is fundamentally different from mere “analytics,” it is not possible to boost your intelligence with AI. So beware of software that just says it has advanced analytics. Ask if it has “active machine learning.”

It is impossible to do the things described in this essay unless the software you are using has active machine learning features. This is clearly the way of the future. It is what makes document review enjoyable and why I love to do big projects. It turns scary into fun.

So, if you tried “predictive coding” or “advanced analytics” before, and it did not work for you, it could well be the software’s fault, not yours. Or it could be the poor method you were following. The method that we developed in Da Silva Moore, where my firm represented the defense, was a version 1.0 method. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012). We have come a long way since then. We have eliminated unnecessary random control sets and moved to continuous training, instead of train-then-review. This is all spelled out in TARcourse.com, which teaches our latest version 4.0 techniques.

The new 4.0 methods are not hard to follow. TARcourse.com puts our methods online and even teaches the theory and practice. And the 4.0 methods certainly will work, as we have proven at TREC, but only if you have good software. With just a little training, and some help at first from consultants (most vendors with bona fide active machine learning features will have good ones to help), you can have the kind of success and excitement that I am talking about.

Do not give up if it does not work for you the first time, especially in a complex project. Try another vendor instead, one that may have better software and better consultants. Also, be sure that your consultants are Predictive Coding 4.0 experts, and that you follow their advice. Finally, remember that the cheapest software is almost never the best, and, in the long run will cost you a small fortune in wasted time and frustration.

Conclusion

Love what you do. It is a great feeling and sure fire way to job satisfaction and success. With these new predictive coding technologies it is easier than ever to love e-discovery. Try them out. Treat yourself to the AI high that comes from using smart machine learning software and fast computers. There is nothing else like it. If you switch to the 4.0 methods and software, you too can know that thrill. You can watch an advanced intelligence, which you helped create, exceed your own abilities, exceed anyone’s abilities. You can sit back and watch Mr. EDR complete your search for you. You can watch him do so in record time and with record results. It is amazing to see good software find documents that you know you would never have found on your own.

Predictive coding AI in superhero mode can be exciting to watch. Why deprive yourself of that? Who says document review has to be slow and boring? Start making the practice of law fun again.

 

 

