Elusion Random Sample Test Ordered Under Rule 26(g) in a Keyword Search Based Discovery Plan

August 26, 2018

There is a new case out of Chicago that advances the jurisprudence of my sub-specialty, Legal Search. City of Rockford v. Mallinckrodt ARD Inc., 2018 WL 3766673, Case 3:17-cv-50107 (N.D. Ill., Aug. 7, 2018). This discovery order was written by U.S. Magistrate Judge Iain Johnston who entitled it: “Order Establishing Production Protocol for Electronically Stored Information.” The opinion is both advanced and humorous, destined to be an oft-cited favorite for many. Thank you Judge Johnston.

In City of Rockford an Elusion random sample quality assurance test was required as part of the parties’ discovery plan to meet the reasonable efforts requirements of Rule 26(g). The random sample procedure proposed was found to impose only a proportional, reasonable burden under Rule 26(b)(1). What makes this holding particularly interesting is that an Elusion test is commonly employed in predictive coding projects, but here the parties had agreed to a keyword search based discovery plan. Also see: Tara Emory, PMP, Court Holds that Math Matters for eDiscovery Keyword Search, Urges Lawyers to Abandon their Fear of Technology (Driven, August 16, 2018) (“party using keywords was required to test the search effectiveness by sampling the set of documents that did not contain the keywords.”)

The Known Unknowns and Unknown Unknowns

Judge Johnston begins his order in City of Rockford with a famous quote by Donald Rumsfeld, a two-time Secretary of Defense.

“[A]s we know there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. . .”
Donald Rumsfeld

For those not familiar with this famous Known Knowns quip, here is a video of the original:

Here the knowledge logic is spelled out in a chart, since I know we all love that sort of thing. Deconstructing Rumsfeld: Knowledge and Ignorance in the Age of Innovation (Inovo 5/114).

Anybody who does complex investigations is familiar with this problem. Indeed, you can argue this insight is fundamental to all of science and experimental method. Logan, David C. (March 1, 2009). “Known knowns, known unknowns, unknown unknowns and the propagation of scientific enquiry”, Journal of Experimental Botany 60 (3). pp. 712–4. [I have always wanted to quote a botany journal.]

How do you deal with the known unknowns and the unknown unknowns, the information that we don’t even know that we don’t know about? The deep, hidden information that is both obscure and rare. Information that is hard to retrieve and harder still to prove does not exist at all. Are you chasing something that might not exist? Something unknown because nonexistent? Such as an overlooked Highly Relevant document? (The stuff of nightmares!) Are you searching for nothing? Zero? If you find it, what does that mean? What can be known and what can never be known? Scientists, investigators and the Secretary of Defense alike all have to ponder these questions and all want to use the best tools and best people possible to do so. See: Deconstructing Rumsfeld: Knowledge and Ignorance in the Age of Innovation (Inovo 5/114).

Seeking Knowledge of the Unknown Elusion Error Rate

These big questions, though interesting, are not why Judge Johnston started his opinion with the Rumsfeld quote. Instead, he used the quote to emphasize that new e-discovery methods, namely random sampling and statistical analysis, can empower lawyers to know what they never did before. A technical way to know the known unknowns. For instance, a way to know the number of relevant documents that will be missed and not produced: the documents that elude retrieval.

As the opinion and this blog will explain, you can do that, know that, by using an Elusion random sample of the null-set. The statistical analysis of the sample transforms the unknown quantity to a known (subject to statistical probabilities and range). It allows lawyers to know, at least within a range, the number of relevant documents that have not been found. This is a very useful quality assurance method that relies on objective measurements to demonstrate success of your project, which here is information retrieval. This and other random sampling methods allow for the calculation of Recall, meaning the percent of total relevant documents found. This is another math-based, quality assurance tool in the field of information retrieval.

One of the main points Judge Johnston makes in his order is that lawyers should embrace this kind of technical knowledge, not shy away from it. As Tara Emory said in her article, Court Holds that Math Matters for eDiscovery Keyword Search:

A producing party must determine that its search process was reasonable. In many cases, the best way to do this is with objective metrics. Producing parties often put significant effort into brainstorming keywords, interviewing witnesses to determine additional terms, negotiating terms with the other party, and testing the documents containing their keywords to eliminate false positives. However, these efforts often still fail to identify documents if important keywords were missed, and sampling the null set is a simple, reasonable way to test whether additional keywords are needed. …

It is important to overcome the fear of technology and its related jargon, which can help counsel demonstrate the reasonableness of search and production process. As Judge Johnston explains, sampling the null set is a process to determine “the known unknown,” which “is the number of the documents that will be missed and not produced.” Judge Johnson disagreed with the defendants’ argument “that searching the null set would be costly and burdensome.” The Order requires Defendants to sample their null set at a 95% +/-2% margin of error (which, even for a very large set of documents, would be about 2,400 documents to review).[4] By taking these measures—either with TAR or with search terms, counsel can more appropriately represent that they have undertaken a “reasonable inquiry” for relevant information within the meaning of FRCP 26(g)(1).

Small Discovery Dispute in an Ocean of Cooperation

Judge Johnston was not asked to solve the deep mysteries of knowing and not knowing in City of Rockford. The parties came to him instead with an interesting, esoteric discovery dispute. They had agreed on a great number of things, for which the court profusely congratulated them.

The attorneys are commended for this cooperation, and their clients should appreciate their efforts in this regard. The Court certainly does. The litigation so far is a solid example that zealous advocacy is not necessarily incompatible with cooperation. The current issue before the Court is an example of that advocacy and cooperation. The parties have worked to develop a protocol for the production of ESI in this case, but have now reached an impasse as to one aspect of the protocol.

The parties disagreed on whether to include a document review quality assurance test in the protocol. The Plaintiffs wanted one and the Defendants did not. Too burdensome, they said.

To be specific, the Plaintiffs wanted a test where the efficacy of any party’s production would be tested by use of an Elusion type of Random Sample of the documents not produced. The Defendants opposed any specific test. Instead, they wanted the discovery protocol to say that if the receiving party had concerns about the adequacy of the producing party’s efforts, then they would have a conference to address the concerns.

Judge Johnston ruled for the Plaintiffs in this dispute and ordered a random Elusion sample to be taken after the Defendants stopped work and completed production. In this case it was a good decision, but such a test should not be routinely required in all matters.

The Stop Decision and Elusion Sample

One of the fundamental problems in any investigation is to know when you should stop the investigation because it is no longer worth the effort to carry on. When has a reasonable effort been completed? Ideally this happens after all of the important documents have already been found. At that point you should stop the effort and move on to a new project. Alternatively, perhaps you should keep on going and look for more? Should you stop or not?

In Legal Search we call this the “Stop Decision.” Should you conclude the investigation or continue further AI training rounds and other search? As explained in the e-Discovery Team TAR Course:

The all important stop decision is a legal, statistical decision requiring a holistic approach, including metrics, sampling and over-all project assessment. You decide to stop the review after weighing a multitude of considerations. Then you test your decision with a random sample in Step Seven.

See: TAR Course: 15th Class – Step Seven – ZEN Quality Assurance Tests.

If you want to go deeper into this, then listen in on this TAR Course lecture on the Stop decision.

____________

Once a decision is made to Stop, then a well managed document review project will use different tools and metrics to verify that the Stop decision was correct. Judge Johnston in City of Rockford used one of my favorite tools, the Elusion random sample, which I teach in the e-Discovery Team TAR Course.

Judge Johnston ordered an Elusion type random sample of the null set in City of Rockford. The sample would determine the range of relevant documents that likely eluded you. These are called False Negatives. Documents presumed Irrelevant and withheld that were in fact Relevant and should have been produced. The Elusion sample is designed to give you information on the total number of Relevant documents that were likely missed, unretrieved, unreviewed and not produced or logged. The fewer the number of False Negatives the better the Recall of True Positives. The goal is to find, to retrieve, all of the Relevant ESI in the collection.

Another way to say the same thing is to say that the goal is Zero False Negatives. You do not miss a single relevant file. Every file designated Irrelevant is in fact not relevant. They are all True Negatives. That would be Total Recall: “the Truth, the Whole Truth …” But that is very rare and some error, some False Negatives, are expected in every large information retrieval project. Some relevant documents will almost always be missed, so the goal is to make the False Negatives inconsequential and keep the Elusion rate low.
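To make the arithmetic concrete, here is a minimal sketch in Python of how an Elusion sample can be turned into an estimated range of False Negatives and a Recall figure. Every number in it is hypothetical and none comes from City of Rockford. The function names are illustrative only, and the range uses a simple normal approximation, which is rough at very low prevalence (an exact binomial calculation would be the more careful choice).

```python
import math

def elusion_estimate(null_set_size, sample_size, relevant_in_sample, z=1.96):
    """Estimate how many relevant documents remain in the null set.

    z=1.96 corresponds to a 95% confidence level. The interval is a
    plain normal approximation (an assumption made for brevity).
    """
    rate = relevant_in_sample / sample_size          # elusion rate in the sample
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    low = max(0.0, rate - margin)
    high = min(1.0, rate + margin)
    return {
        "elusion_rate": rate,
        "missed_low": math.floor(low * null_set_size),
        "missed_point": round(rate * null_set_size),
        "missed_high": math.ceil(high * null_set_size),
    }

def recall(true_positives, false_negatives):
    """Recall = relevant documents found / all relevant documents."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical project: 500,000 documents in the null set, 2,400 sampled,
# 12 of the sampled documents turn out to be relevant.
est = elusion_estimate(null_set_size=500_000, sample_size=2_400, relevant_in_sample=12)
print(est)

# Hypothetical: 20,000 relevant documents were found and produced.
print(recall(true_positives=20_000, false_negatives=est["missed_point"]))
```

The point estimate of missed documents feeds directly into Recall, which is why the low and high ends of the range should be run through the same formula before anyone claims a particular Recall level.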

Here is how Judge Iain Johnston explained the random sample:

Plaintiffs propose a random sample of the null set. (The “null set” is the set of documents that are not returned as responsive by a search process, or that are identified as not relevant by a review process. See Maura R. Grossman & Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Cts. L. Rev. 1, 25 (2013). The null set can be used to determine “elusion,” which is the fraction of documents identified as non-relevant by a search or review effort that are, in fact, relevant. Elusion is estimated by taking a random sample of the null set and determining how many or what portion of documents are actually relevant. Id. at 15.) FN 2

Judge Johnston’s Footnote Two is interesting for two reasons. One, it attempts to calm lawyers who freak out when hearing anything having to do with math or statistics, much less information science and technology. Two, it does so with a reference to Fizbo the clown.

The Court pauses here for a moment to calm down litigators less familiar with ESI. (You know who you are.) In life, there are many things to be scared of, including, but not limited to, spiders, sharks, and clowns – definitely clowns, even Fizbo. ESI is not something to be scared of. The same is true for all the terms and jargon related to ESI. … So don’t freak out.

Accept on Zero Error for Hot Documents

Although this is not addressed in the court order, in my personal view, no False Negatives, i.e. overlooked documents, are acceptable when it comes to Highly Relevant documents. If even one document like that is found in the sample, one Highly Relevant Document, then the Elusion test has failed in my view. You must conclude that the Stop decision was wrong and training and document review must recommence. That is called an Accept on Zero Error test for any hot documents found. Of course my personal views on best practice here assume the use of AI ranking, and the parties in City of Rockford only used keyword search. Apparently they were not doing machine training at all.

The odds of finding False Negatives, assuming that only a few exist (very low prevalence) and the database is large, are very low in a modest sized random sample. With very low prevalence of relevant ESI the test can be of limited effectiveness. That is an inherent problem with low prevalence and random sampling. That is why statistics have only limited effectiveness and should be considered part of a total quality control program. See Zero Error Numerics: ZEN. Math matters, but so too does good project management and communications.
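A short Python sketch shows just how low those odds can be. It computes the exact chance that a simple random sample, drawn without replacement, contains at least one of a handful of missed hot documents. The collection size, the number of hot documents and the function name are all hypothetical choices for illustration.

```python
def prob_sample_catches_any(population, hot_docs, sample_size):
    """Probability that a random sample (without replacement) of
    `sample_size` documents from `population` contains at least one
    of `hot_docs` specific documents (hypergeometric, exact)."""
    p_none = 1.0
    for i in range(hot_docs):
        # Chance that the i-th hot document also falls outside the sample.
        p_none *= (population - sample_size - i) / (population - i)
    return 1.0 - p_none

# Hypothetical: 5 hot documents hiding in a 500,000 document null set,
# tested with a 2,400 document sample.
p = prob_sample_catches_any(population=500_000, hot_docs=5, sample_size=2_400)
print(f"{p:.1%}")
```

Under these assumed numbers the sample will usually contain none of the five documents, which is why a clean Elusion sample alone cannot prove that no hot documents were missed.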

The inherent problem with random sampling is that the only way to reduce the error interval is to increase the size of the sample. For instance, to decrease the margin of error to only 2% either way, a total error of 4%, a random sample size of around 2,400 documents is needed. Even though that narrows the error rate to 4%, there is still another error factor of the Confidence Level, here at 95%. Still, it is not worth the effort to review even more sample documents to reduce that to a 99% Level.
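The figure of around 2,400 documents can be checked with the standard sample size formula for a proportion, n = z² · p(1 − p) / e², using the most conservative assumption of p = 0.5. The sketch below is illustrative only: the function name is mine, and it ignores the finite population correction, which would shrink the number slightly for small collections.

```python
import math
from statistics import NormalDist

def sample_size(confidence, margin, prevalence=0.5):
    """Documents needed for a given confidence level and margin of error.

    prevalence=0.5 is the worst case and gives the largest sample.
    """
    # Two-sided z-score, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z**2 * prevalence * (1 - prevalence) / margin**2)

n95 = sample_size(confidence=0.95, margin=0.02)
n99 = sample_size(confidence=0.99, margin=0.02)
print(n95, n99)
```

At 95% confidence the formula gives 2,401 documents. Moving to a 99% Confidence Level with the same 2% margin pushes the sample to roughly 4,150 documents, which is the extra review effort referred to above.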

Random sampling has limitations in low prevalence datasets, which is typical in e-discovery, but still sampling can be very useful. Due to this rarity issue, and the care that producing parties always take to attain high Recall, any documents found in an Elusion random sample should be carefully studied to see if they are of any significance. We look very carefully at any new documents found that are of a kind not seen before. That is unusual. Typically any relevant documents found by random sample of the elusion set are of a type that have been seen before, often many, many times before. These “same old, same old” type of documents are of no importance to the investigation at this point.

Most email related datasets are filled with duplicative, low value data. It is not exactly irrelevant noise, but it is not a helpful signal either. We do not care if we get all of that kind of merely relevant data. What we really want are the Hot Docs, the high value Highly Relevant ESI, or at least Relevant and of a kind not seen before. That is why the Accept On Zero Error test is so important for Highly Relevant documents.

The Elusion Test in City of Rockford 

In City of Rockford Judge Johnston considered a discovery stipulation where the parties had agreed to use a typical keyword search protocol, but disagreed on a quality assurance protocol. Judge Johnston held:

With key word searching (as with any retrieval process), without doubt, relevant documents will be produced, and without doubt, some relevant documents will be missed and not produced. That is a known known. The known unknown is the number of the documents that will be missed and not produced.

Back to the False Negatives again, the known unknown. Judge Johnston continues his analysis:

But there is a process by which to determine that answer, thereby making the known unknown a known known. That process is to randomly sample the nullset. Karl Schieneman & Thomas C. Gricks III, The Implications of Rule26(g) on the Use of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 239, 273 (2013)(“[S]ampling the null set will establish the number of relevant documents that are not being produced.”). Consequently, the question becomes whether sampling the null set is a reasonable inquiry under Rule 26(g) and proportional to the needs of this case under Rule 26(b)(1).

Rule 26(g) Certification

Judge Johnston takes an expansive view of the duties placed on counsel of record by Rule 26(g), but concedes that perfection is not required:

Federal Rule of Civil Procedure 26(g) requires all discovery requests be signed by at least one attorney (or party, if proceeding pro se). Fed. R. Civ. P. 26(g)(1). By signing the response, the attorney is certifying that to the best of counsel’s knowledge, information, and belief formed after a reasonable inquiry, the disclosure is complete and correct at the time it was made. Fed. R. Civ. P. 26(g)(1)(A). But disclosure of documents need not be perfect. … If the Federal Rules of Civil Procedure were previously only translucent on this point, it should now be clear with the renewed emphasis on proportionality.

Judge Johnston concludes that Rule 26(g) on certification applies to require the Elusion sample in this case.

Just as it is used in TAR, a random sample of the null set provides validation and quality assurance of the document production when performing key word searches.  Magistrate Judge Andrew Peck made this point nearly a decade ago. See William A. Gross Constr. Assocs., 256 F.R.D. at 135-6 (citing Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 262 (D. Md. 2008)); In re Seroquel Products Liability Litig., 244 F.R.D. 650, 662 (M.D. Fla. 2007) (requiring quality assurance).

Accordingly, because a random sample of the null set will help validate the document production in this case, the process is reasonable under Rule 26(g).

Rule 26(b)(1) Proportionality

Judge Johnston considered as a separate issue whether it was proportionate under Rule 26(b)(1) to require the elusion test requested. Again, the court found that it was proportional in this large case on the pricing of prescription medication:

The Court’s experience and understanding is that a random sample of the null set will not be unreasonably expensive or burdensome. Moreover and critically, Defendants have failed to provide any evidence to support their contention. Mckinney/Pearl Rest. Partners, L.P. v. Metro. Life Ins. Co., 322 F.R.D. 235, 242 (N.D.Tex. 2016) (party required to submit affidavits or offer evidence revealing the nature of the burden).

Once again we see a party seeking protection from having to do something because it is so burdensome, then failing to present actual evidence of burden. We see this a lot lately. Responding Party’s Complaints of Financial Burden of Document Review Were Unsupported by the Evidence, Any Evidence (e-Discovery Team, 8/5/18).

Judge Johnston concludes his “Order Establishing Production Protocol for Electronically Stored Information” with the following:

The Court adopts the parties’ proposed order establishing the production protocol for ESI with the inclusion of Plaintiffs’ proposal that a random sample of the null set will occur after the production and that any responsive documents found as a result of that process will be produced. Moreover, following that production, the parties should discuss what additional actions, if any, should occur. If the parties cannot agree at that point, they can raise the issue with the Court.

Conclusion

City of Rockford is important because it is the first case to hold that a quality control procedure should be used to meet the reasonable efforts certification requirements of Rule 26(g). The procedure here required was a random sample Elusion test with related, limited data sharing. If this interpretation of Rule 26(g) is followed by other courts, then it could have a big impact on legal search jurisprudence. Tara Emory in her article, Court Holds that Math Matters for eDiscovery Keyword Search, goes so far as to conclude that City of Rockford stands for the proposition that “the testing and sampling process associated with search terms is essential for establishing the reasonableness of a search under FRCP 26(g).”

The City of Rockford holding could persuade other judges and encourage courts to be more active and impose specific document review procedures on all parties, including requiring the use of sampling and artificial intelligence. The producing party cannot always have a free pass under Sedona Principle Six. Testing and sampling may well be routinely ordered in all “large” document review cases in the future.

It will be very interesting to watch how other attorneys argue City of Rockford. It will continue a line of cases examining methodology and procedures in document review. See, e.g., William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134 (S.D.N.Y. 2009) (“wake-up call” for lawyers on keyword search); Winfield v. City of New York (SDNY, Nov. 27, 2017), where Judge Andrew Peck considers methodologies and quality controls of the active machine learning process. Also see Special Master Maura Grossman’s Order Regarding Search Methodology for ESI, a validation Protocol for the Broiler Chicken antitrust cases.

The validation procedure of an Elusion sample in City of Rockford is just one of many possible review protocols that a court could impose under Rule 26(g). There are dozens more, including whether predictive coding should be required. So far, courts have been reluctant to order that, as Judge Peck explained in Hyles:

There may come a time when TAR is so widely used that it might be unreasonable for a party to decline to use TAR. We are not there yet.

Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP), 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016).

Like a kid in the backseat of the car, I cannot help but ask, are we there yet? Hyles was published over two years ago now. Maybe some court, somewhere in the world, has already ordered a party to do predictive coding against their will, but not to our knowledge. That is a known unknown. Still, we are closer to “There” with the City of Rockford’s requirement of an Elusion test.

When we get “there,” and TAR is finally ordered in a case, it will probably arise in a situation like City of Rockford where a joint protocol applicable to all parties is involved. That is easier to sell than a one-sided protocol. The court is likely to justify the order by Rule 26(g), and hold that it requires all parties in the case to use predictive coding. Otherwise, they will not meet the reasonable effort burdens of Rule 26(g). Other rules will be cited too, of course, including Rule 1, but Rule 26(g) is likely to be key.

____________


Dumb and Dumber Strike Again: New case out of California provides a timely lesson in legal search stupidity

February 18, 2018

An interesting, albeit dumb, case out of California provides some good cautionary instruction for anybody doing discovery. Youngevity Int’l Corp. v. Smith, 2017 U.S. Dist. LEXIS 210386 (S.D. Cal. Dec. 21, 2017). Youngevity is essentially an unfair competition dispute that arose when some multi-level nutritional marketing sales types left one company to form their own. Yup, multi-level nutritional sales; the case has sleaze written all over it. The actions of the Plaintiff in this case are, in my opinion, and that of the judge, especially embarrassing. In fact both sides remind me of the classic movie Dumb and Dumber. It has a line in it favored by all students of statistics: So you’re telling me there’s a chance.

One in a million is about the chance that the Plaintiff’s discovery plan in Youngevity had of succeeding in federal court in front of the smart United States Magistrate Judge assigned to the case, Jill L. Burkhardt.

Dumb and Dumber

So what did the Plaintiff do that is so dumb? So timely? They confused documents that have a “hit” in them with documents that are relevant. As if having a keyword in a document could somehow magically make it relevant under the rules, or responsive to a request for relevant information under the rules. Not only that, and here is the dumber part, the Plaintiff produced 4.2 Million pages of such “hit” documents to defendant without reviewing them. They produced the documents without review, but tried to protect their privilege by designating them all “Attorney Eyes Only.” Dumb and dumber. But, in fairness to Plaintiff’s counsel, not something I am especially known for doing, I know, but still, in fairness to the eight attorneys of record for the plaintiffs, this is something that clients sometimes make their attorneys do as a “cost saving” maneuver.

Fellow Blogger Comment

As my bow-tied friend put it in his blog on this case:

Just because ESI is a hit to a search term, does NOT mean that data is responsive to any discovery request. Moreover, designating all ESI as Attorney-Eyes Only should not be done as a tactic to avoid conducting document review. …

Responding to discovery requests should not ignore requests for production. Parties often get lost in search terms, focusing on document review as process independent of the claims of the lawsuit. Lawyers should resist that quagmire and focus document review to respond to the requests for production. Developing searches is the first step in responding, however, a search strategy should not simply be keywords. Searches should be built with the requests, including date ranges, messages sent between individuals, and other methods to focus on the merits of the case, not document review for the sake of document review.

The occurrence of a keyword term in a paper document, or a computer file, or any other ESI does not make the file relevant. An ESI file is relevant depending on the overall content of the file, not just one word.

Procedural Background

Here is Judge Jill L. Burkhardt’s concise explanation of the factual, procedural background of the keyword dispute (citations to the record omitted).

On May 9, 2017, Wakaya emailed Youngevity to discuss the use of search terms to identify and collect potentially responsive electronically-stored information (ESI) from the substantial amount of ESI both parties possessed. Wakaya proposed a three-step process by which: “(i) each side proposes a list of search terms for their own documents; (ii) each side offers any supplemental terms to be added to the other side’s proposed list; and (iii) each side may review the total number of results generated by each term in the supplemented lists (i.e., a ‘hit list’ from our third-party vendors) and request that the other side omit any terms appearing to generate a disproportionate number of results.” On May 10, 2017, while providing a date to exchange search terms, Youngevity stated that the “use of key words as search aids may not be used to justify non-disclosure of responsive information.” On May 15, 2017, Youngevity stated that “[w]e are amenable to the three step process described in your May 9 e-mail….” Later that day, the parties exchanged lists of proposed search terms to be run across their own ESI. On May 17, 2017, the parties exchanged lists of additional search terms that each side proposed be run across the opposing party’s ESI.

The plaintiffs never produced their hit list as promised and as demanded by Defendants several times after the agreement was reached. Instead, they produced all documents on the hit list, some 4.2 Million pages, and labeled them all AEO. The defendants primarily objected to the plaintiffs’ labeling all documents Attorneys Eyes Only, instead of Confidential. The complaint about the production defect of producing all documents with hits, instead of all documents that were responsive, seems like an afterthought.
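For readers who have never seen one, a hit list is nothing more than a count of documents per search term. Here is a minimal sketch in Python of the third step of the agreed process, where each side reviews the counts and asks the other to drop terms that generate a disproportionate number of results. The sample documents, the search terms, the 50% threshold and the function names are all invented for illustration; real vendors use indexed search engines, not the naive substring matching assumed here.

```python
from collections import Counter

def hit_list(documents, terms):
    """Count how many documents contain each search term (case-insensitive)."""
    counts = Counter()
    for text in documents.values():
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:   # naive substring match (assumption)
                counts[term] += 1
    return counts

def disproportionate_terms(counts, total_docs, threshold=0.5):
    """Flag terms that hit more than `threshold` of the collection."""
    return [term for term, n in counts.items() if n / total_docs > threshold]

# Hypothetical mini-collection and terms.
docs = {
    "DOC-1": "Reminder: please clean out the office refrigerator by Friday.",
    "DOC-2": "Your UPS tracking number for the office order is attached.",
    "DOC-3": "Distributor agreement draft for the new office launch.",
    "DOC-4": "Quarterly distributor commission report.",
}
terms = ["office", "distributor", "commission"]

counts = hit_list(docs, terms)
flagged = disproportionate_terms(counts, total_docs=len(docs))
print(dict(counts))
print(flagged)
```

In this toy example the term "office" hits three of the four documents, including the refrigerator reminder, and is flagged. A hit only means the word appears; whether the document is responsive still has to be decided by review.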

Keyword Search Was New in the 1980s

The focus in this case on keyword search alone, instead of using a Hybrid Multimodal approach, is how a majority of ill-informed lawyers still handle legal search today. I think keywords are an acceptable way to start a conversation, and begin a review, but to use keyword search alone hearkens back to the dark ages of document review, the mid-nineteen eighties. That is when lawyers first started using keyword search. Remember the Blair & Maron study of the San Francisco subway litigation document search? The study was completed in 1985. It found that when the lawyers and paralegals thought they had found over 75% of the relevant documents using keyword search, they had in fact only found 20%. Blair, David C., & Maron, M. E., An evaluation of retrieval effectiveness for a full-text document-retrieval system; Communications of the ACM Volume 28, Issue 3 (March 1985).

The Blair Maron study is thirty-three years old and yet today we still have lawyers using keyword search alone, like it was the latest and greatest. The technology gap in the law is incredibly large. This is especially true when it comes to document review where the latest AI enhanced technologies are truly great. WHY I LOVE PREDICTIVE CODING: Making Document Review Fun Again with Mr. EDR and Predictive Coding 4.0. Wake up, lawyers. We have come a long way since the 1980s and keyword search.

Judge Burkhardt’s Ruling

Back to the Dumb and Dumber story in Youngevity as told to us by the smartest person in that room, by far, Judge Burkhardt:

The Court suggested that a technology-assisted review (TAR) may be the most efficient way to resolve the myriad disputes surrounding Youngevity’s productions.

Note this suggestion seems to have been ignored by both sides. Are you surprised? At least the judge tried. Now back to the rest of the Dumb and Dumber story:

designated as AEO. Youngevity does not claim that the documents are all properly designated AEO, but asserts that this mass designation was the only way to timely meet its production obligations when it produced documents on July 21, 2017 and August 22, 2017. It offers no explanation as to why it has not used the intervening five months to conduct a review and properly designate the documents, except to say, “Youngevity believes that the parties reached an agreement on de-designation of Youngevity’s production which will occur upon the resolution of the matters underlying this briefing.” Why that de-designation is being held up while this motion is pending is not evident.

Oh yeah. Try to BS the judge. Another dumb move. Back to the story:

Wakaya argues that Youngevity failed to review any documents prior to production and instead provided Wakaya with a “document dump” containing masses of irrelevant documents, including privileged information, and missing “critical” documents. Youngevity’s productions contain documents such as Business Wire news emails, emails reminding employees to clean out the office refrigerator, EBay transaction emails, UPS tracking emails, emails from StubHub, and employee file and benefits information. Youngevity argues that it simply provided the documents Wakaya requested in the manner that Wakaya instructed. …

Wakaya demanded that Youngevity review its production and remove irrelevant and non-responsive documents.

The poor judge is now being bothered by motions and phone calls as the many lawyers for both sides bill like crazy and ask for her help. Judge Burkhardt again does the smart thing and pushes the attorneys to use TAR and, since it is obvious they are clueless, to hire vendors to help them do it.

[T]he Court suggested that conducting a TAR of Youngevity’s productions might be an efficient way to resolve the issues. On October 5, 2017, the parties participated in another informal discovery conference with the Court because they were unable to resolve their disputes relating to the TAR process and the payment of costs associated with TAR. The Court suggested that counsel meet and confer again with both parties’ discovery vendors participating. Wakaya states that on October 6, 2017, the parties participated in a joint call with their discovery vendors to discuss the TAR process. The parties could not agree on who would bear the costs of the TAR process. Youngevity states that it offered to pay half the costs associated with the TAR process, but Wakaya would not agree that TAR alone would result in a document production that satisfied Youngevity’s discovery obligations. Wakaya argued that it should not have to bear the costs of fixing Youngevity’s improper productions. On October 9, 2017, the parties left a joint voicemail with the Court stating that they had reached a partial agreement to conduct a TAR of Youngevity’s production, but could not resolve the issue of which party would bear the TAR costs. In response to the parties’ joint voicemail, the Court issued a briefing schedule for the instant motion.

Makes you want to tear your hair out just to read it, doesn’t it? Yet the judge has to deal with junk like this every day. Patience of a saint.

More from Judge Burkhardt, who does a very good survey of the relevant law, starting at page four of the opinion (I suggest you read it). Skipping to the Analysis segment of the opinion at pages five through nine, here are the highlights, starting with a zinger against all counsel concerning the Rule 26(g) arguments:

Wakaya fails to establish that Youngevity violated Rule 26(g). Wakaya does not specifically claim that certificates signed by Youngevity or its counsel violate Rule 26(g). Neither party, despite filing over 1,600 pages of briefing and exhibits for this motion, provided the Court with Youngevity’s written discovery responses and certification. The Court declines to find that Youngevity improperly certified its discovery responses when the record before it does not indicate the content of Youngevity’s written responses, its certification, or a declaration stating that Youngevity in fact certified its responses. See Cherrington Asia Ltd. v. A & L Underground, Inc., 263 F.R.D. 653, 658 (D. Kan. 2010) (declining to impose sanctions under Rule 26(g) when plaintiffs do not specifically claim that certificates signed by defendant’s counsel violated the provisions of Rule 26(g)(1)). Accordingly, Wakaya is not entitled to relief under Rule 26(g).

Wow! Over 1,600 pages of memos and nobody provided the Rule 26(g) certification to the court that plaintiffs’ counsel allegedly violated. Back to the Dumb and Dumber story as told to us by Judge Burkhardt:

Besides establishing that Youngevity’s production exceeded Wakaya’s requests, the record indicates that Youngevity did not produce documents following the protocol to which the parties agreed.  … Youngevity failed to produce its hit list to Wakaya, and instead produced every document that hit upon any proposed search term. Had Youngevity provided its hit list to Wakaya as agreed and repeatedly requested, Wakaya might have proposed a modification to the search terms that generated disproportionate results, thus potentially substantially reducing the number of documents requiring further review and ultimate production. …

Second, Youngevity conflates a hit on the parties’ proposed search terms with responsiveness.[11] The two are not synonymous. Youngevity admits that it has an obligation to produce responsive documents. Youngevity argues that because each document hit on a search term, “the documents Youngevity produced are necessarily responsive to Wakaya’s Requests.” Search terms are an important tool parties may use to identify potentially responsive documents in cases involving substantial amounts of ESI. Search terms do not, however, replace a party’s requests for production. See In re Lithium Ion Batteries Antitrust Litig., No. 13MD02420 YGR (DMR), 2015 WL 833681, at *3 (N.D. Cal. Feb. 24, 2015) (noting that “a problem with keywords ‘is that they often are over inclusive, that is, they find responsive documents but also large numbers of irrelevant documents’”) (quoting Moore v. Publicis Groupe, 287 F.R.D. 182, 191 (S.D.N.Y. 2012)). UPS tracking emails and notices that employees must clean out the refrigerator are not responsive to Wakaya’s requests for production solely because they hit on a search term the parties’ agreed upon.

It was nice to see my Da Silva Moore case quoted on keyword defects, not just approval of predictive coding. The quote refers to what is now known as the lack of PRECISION in using untested keyword search. One of the main advantages of active machine learning is to improve precision and keep lawyers from wasting their time reading messages about refrigerator cleaning.
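For readers who like to see the math, here is a minimal sketch of what a tested hit list could look like: for each search term, the number of documents hit and the precision of those hits, meaning the share of hits that a reviewer actually found responsive. Everything in it is hypothetical and for illustration only, including the five sample documents, the responsiveness calls, the search terms, and the hit_report function name. It uses naive substring matching, not the search syntax of any real review platform, and it is not the protocol the parties agreed to in Youngevity.

```python
# Hypothetical mini-collection: (document text, reviewer's responsiveness call).
# In a real project the calls would come from attorney review of a sample of hits.
documents = [
    ("Distributor agreement terms for the new product line", True),
    ("Reminder: please clean out the office refrigerator by Friday", False),
    ("UPS tracking update for your office supply order", False),
    ("Purchase order placed under the distributor agreement", True),
    ("StubHub order confirmation", False),
]

# Hypothetical search terms, chosen only to show high and low precision.
search_terms = ["distributor", "office", "order"]


def hit_report(docs, terms):
    """Return {term: (hits, responsive_hits, precision)}.

    Matching is a case-insensitive substring test, a stand-in for real
    search syntax. Precision is None when a term hits nothing.
    """
    report = {}
    for term in terms:
        calls = [responsive for text, responsive in docs
                 if term.lower() in text.lower()]
        hits = len(calls)
        responsive_hits = sum(calls)
        precision = responsive_hits / hits if hits else None
        report[term] = (hits, responsive_hits, precision)
    return report


report = hit_report(documents, search_terms)
for term, (hits, responsive_hits, precision) in report.items():
    print(f"{term}: {hits} hits, {responsive_hits} responsive, precision={precision}")
```

In this toy set the term "office" hits two documents and neither is responsive, which is exactly the refrigerator problem. A report like this, exchanged early, is what lets the requesting party drop or narrow the terms that generate disproportionate junk.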

Now Judge Burkhardt is ready to rule:

The Court is persuaded that running proposed search terms across Youngevity’s ESI, refusing to honor a negotiated agreement to provide a hit list which Wakaya was to use to narrow its requested search terms, and then producing all documents hit upon without reviewing a single document prior to production or engaging in any other quality control measures, does not satisfy Youngevity’s discovery obligations. Further, as is discussed below, mass designation of every document in both productions as AEO clearly violates the Stipulated Protective Order in this case. Youngevity may not frustrate the spirit of the discovery rules by producing a flood of documents it never reviewed, designate all the documents as AEO without regard to whether they meet the standard for such a designation, and thus bury responsive documents among millions of produced pages. See Queensridge Towers, LLC v. Allianz Glob. Risks US Ins. Co. , No. 2:13-CV-00197-JCM, 2014 WL 496952, at *6-7 (D. Nev. Feb. 4, 2014) (ordering plaintiff to supplement its discovery responses by specifying which documents are responsive to each of defendant’s discovery requests when plaintiff responded to requests for production and interrogatories by stating that the answers are somewhere among the millions of pages produced). Youngevity’s productions were such a mystery, even to itself, that it not only designated the entirety of both productions as AEO, but notified Wakaya that the productions might contain privileged documents. Accordingly, Wakaya’s request to compel proper productions is granted, as outlined below. See infra Section IV.

Judge Jill Burkhardt went on to award fees and costs to be taxed against the plaintiffs.

Conclusion

A document is never responsive, never relevant, just because it has a keyword in it. As Judge Burkhardt put it, that conflates a hit on the parties’ proposed search terms with responsiveness. In some cases, but not this one, a request for production may explicitly demand production of all documents that contain certain keywords. If such a request is made, then you should object. We are seeing more and more improper requests like this. The rules do not allow for a request to produce documents with certain keywords regardless of the relevance of the documents. (The reasonably calculated phrase was killed in 2015 and is no longer good law.) The rules and case law do not define relevance in terms of keywords. They define relevance in terms of proportional probative value to the claims and defenses raised. Again, as Judge Burkhardt put it, search terms do not … replace a party’s requests for production.

I agree with Josh Gilliland, who said parties often get lost in search terms, focusing on document review as a process independent of the claims of the lawsuit. The first step in my TAR process is ESI communications or Talk. This includes speaking with the requesting party to clarify the documents sought. This should mean discussion of the claims of the lawsuit and what the requesting party hopes to find. Keywords are just a secondary byproduct of this kind of discussion. Keywords are not an end in themselves. Avoid that quagmire, as Josh says, and focus on clarifying the requests for production. Focus on Rule 26(b)(1) relevance and proportionality.

Another lesson: do not get stuck with just using keywords. We have come up with many other search tools since the 1980s. Use them. Use all of them. Go Multimodal. In a big complex case like Youngevity Int’l Corp. v. Smith, be sure to go Hybrid too. Be sure to use the most powerful search tool of all, predictive coding. See TAR Course for detailed instruction on Hybrid Multimodal. The robots will eat your keywords for lunch.

The AI power of active machine learning was the right solution available to the plaintiffs all along. Judge Burkhardt tried to tell them. Plaintiffs did not have to resort to dangerous production without review just to avoid paying their lawyers to read about their refrigerator cleanings. Let the AI read about all of that. It reads at near the speed of light and never forgets. If you have a good AI trainer, which is my specialty, the AI will understand what is relevant and find what you are looking for.


TAR Course Expands Again: Standardized Best Practice for Technology Assisted Review

February 11, 2018

The TAR Course has a new class, the Seventeenth Class: Another “Player’s View” of the Workflow. Several other parts of the Course have been updated and edited. It now has Eighteen Classes (listed at end). The TAR Course is free and follows the Open Source tradition. We freely disclose the method for electronic document review that uses the latest technology tools for search and quality controls. These technologies and methods empower attorneys to find the evidence needed for all text-based investigations. The TAR Course shares the state of the art for using AI to enhance electronic document review.

The key is to know how to use the document review search tools that are now available to find the targeted information. We have been working on various methods of use since our case before Judge Andrew Peck in Da Silva Moore in 2012. After we helped get the first judicial approval of predictive coding in Da Silva, we began a series of several hundred document reviews, both in legal practice and scientific experiments. We have now refined our method many times to attain optimal efficiency and effectiveness. We call our latest method Hybrid Multimodal IST Predictive Coding 4.0.

The Hybrid Multimodal method taught at TARcourse.com combines law and technology. Successful completion of the TAR Course requires knowledge of both fields. In the technology field, active machine learning is the most important technology to understand, especially the intricacies of training selection, such as Intelligently Spaced Training (“IST”). In the legal field, the proportionality doctrine is key to the pragmatic application of the method taught at TAR Course. We give away the information on the methods; we open-source it through this publication.

All we can transmit by online teaching is information, and a small bit of knowledge. Knowing the Information in the TAR Course is a necessary prerequisite for real knowledge of Hybrid Multimodal IST Predictive Coding 4.0. Knowledge, as opposed to Information, is taught the same way as advanced trial practice, by second chairing a number of trials. This kind of instruction is the one with real value, the one that completes a doc review project at the same time it completes training. We charge for document review and throw in the training. Information on the latest methods of document review is inherently free, but Knowledge of how to use these methods is a pay to learn process.

The Open Sourced Predictive Coding 4.0 method is applied to particular applications and search projects. There are always some customizations and modifications to the default standards to meet the project requirements. All variations are documented and can be fully explained and justified. This is a process where the clients learn by doing and following along with Losey’s work.

What he has learned through a lifetime of teaching and studying Law and Technology is that real Knowledge can never be gained by reading or listening to presentations. Knowledge can only be gained by working with other people in real-time (or near-time), in this case, to carry out multiple electronic document reviews. The transmission of knowledge comes from the Q&A ESI Communications process. It comes from doing. When we lead a project, we help students to go from mere Information about the methods to real Knowledge of how it works. For instance, we do not just make the Stop decision, we also explain the decision. We share our work-product.

Knowledge comes from observing the application of the legal search methods in a variety of different review projects. Eventually some Wisdom may arise, especially as you recover from errors. For background on this triad, see Examining the 12 Predictions Made in 2015 in “Information → Knowledge → Wisdom” (2017). Once Wisdom arises some of the sayings in the TAR Course may start to make sense, such as our favorite “Relevant Is Irrelevant.” Until this koan is understood, the legal doctrine of Proportionality can be an overly complex weave.

The TAR Course is now composed of eighteen classes:

  1. First Class: Background and History of Predictive Coding
  2. Second Class: Introduction to the Course
  3. Third Class: TREC Total Recall Track, 2015 and 2016
  4. Fourth Class: Introduction to the Nine Insights from TREC Research Concerning the Use of Predictive Coding in Legal Document Review
  5. Fifth Class: 1st of the Nine Insights – Active Machine Learning
  6. Sixth Class: 2nd Insight – Balanced Hybrid and Intelligently Spaced Training (IST)
  7. Seventh Class: 3rd and 4th Insights – Concept and Similarity Searches
  8. Eighth Class: 5th and 6th Insights – Keyword and Linear Review
  9. Ninth Class: 7th, 8th and 9th Insights – SME, Method, Software; the Three Pillars of Quality Control
  10. Tenth Class: Introduction to the Eight-Step Work Flow
  11. Eleventh Class: Step One – ESI Communications
  12. Twelfth Class: Step Two – Multimodal ECA
  13. Thirteenth Class: Step Three – Random Prevalence
  14. Fourteenth Class: Steps Four, Five and Six – Iterative Machine Training
  15. Fifteenth Class: Step Seven – ZEN Quality Assurance Tests (Zero Error Numerics)
  16. Sixteenth Class: Step Eight – Phased Production
  17. Seventeenth Class: Another “Player’s View” of the Workflow (class added 2018)
  18. Eighteenth Class: Conclusion

With a lot of hard work you can complete this online training program in a long weekend, but most people take a few weeks. After that, this course can serve as a solid reference to consult during complex document review projects. It can also serve as a launchpad for real Knowledge and eventually some Wisdom into electronic document review. TARcourse.com is designed to provide you with the Information needed to start this path to AI enhanced evidence detection and production.

 


True Confessions: The Importance of Project Managers and Details of Second Pass Review in a Complex ESI Review Project

January 21, 2018

 

 

 

See the full article on all members and activities of a complex document review team, which this video supplements:  The Key Players and Play of an e-Discovery Team in a Complex AI-Enhanced Document Review Project.

Also see the video on the four-step method. Iterated Four-Step Work Flow for Active Machine Training to Help Attorneys Locate Relevant Evidence

Also see the video The SME Team Members in a Complex AI-Enhanced Document Review Project.

 

