An interesting, albeit dumb, case out of California provides some good cautionary instruction for anybody doing discovery. Youngevity Int’l Corp. v. Smith, 2017 U.S. Dist. LEXIS 210386 (S.D. Cal. Dec. 21, 2017). Youngevity is essentially an unfair competition dispute that arose when some multi-level nutritional marketing sales types left one company to form their own. Yup, multi-level nutritional sales; the case has sleaze written all over it. The actions of the Plaintiff in this case are, in my opinion, and that of the judge, especially embarrassing. In fact both sides remind me of a classic movie Dumb and Dumber. It has a line in it favored by all students of statistics: So you’re telling me there’s a chance.
One in a million is about the chance that the Plaintiff’s discovery plan in Youngevity had of succeeding in federal court in front of the smart United States Magistrate Judge assigned to the case, Jill L. Burkhardt.
Dumb and Dumber
So what did the Plaintiff do that is so dumb? So timely? They confused documents that have a “hit” in them with documents that are relevant. As if having a keyword in a document could somehow magically make it relevant under the rules, or responsive to a request for relevant information under the rules. Not only that, and here is the dumber part, the Plaintiff produced 4.2 Million pages of such “hit” documents to defendant without reviewing them. They produced the documents without review, but tried to protect their privilege by designating them all “Attorney Eyes Only.” Dumb and dumber. But, in fairness to Plaintiff’s counsel, not something I am especially known for doing, I know, but still, in fairness to the eight attorneys of record for the plaintiffs, this is something that clients sometimes make their attorneys do as a “cost saving” maneuver.
Fellow Blogger Comment
As my bow-tied friend put it in his blog on this case:
Just because ESI is a hit to a search term, does NOT mean that data is responsive to any discovery request. Moreover, designating all ESI as Attorney-Eyes Only should not be done as a tactic to avoid conducting document review. …
Responding to discovery requests should not ignore requests for production. Parties often get lost in search terms, focusing on document review as process independent of the claims of the lawsuit. Lawyers should resist that quagmire and focus document review to respond to the requests for production. Developing searches is the first step in responding, however, a search strategy should not simply be keywords. Searches should be built with the requests, including date ranges, messages sent between individuals, and other methods to focus on the merits of the case, not document review for the sake of document review.
The occurrence of a keyword term in a paper document, or a computer file, or any other ESI does not make the file relevant. An ESI file is relevant depending on the overall content of the file, not just one word.
Here is Judge Jill L. Burkhardt’s concise explanation of the factual and procedural background of the keyword dispute (citations to the record omitted).
On May 9, 2017, Wakaya emailed Youngevity to discuss the use of search terms to identify and collect potentially responsive electronically-stored information (ESI) from the substantial amount of ESI both parties possessed. Wakaya proposed a three-step process by which: “(i) each side proposes a list of search terms for their own documents; (ii) each side offers any supplemental terms to be added to the other side’s proposed list; and (iii) each side may review the total number of results generated by each term in the supplemented lists (i.e., a ‘hit list’ from our third-party vendors) and request that the other side omit any terms appearing to generate a disproportionate number of results.” On May 10, 2017, while providing a date to exchange search terms, Youngevity stated that the “use of key words as search aids may not be used to justify non-disclosure of responsive information.” On May 15, 2017, Youngevity stated that “[w]e are amenable to the three step process described in your May 9 e-mail….” Later that day, the parties exchanged lists of proposed search terms to be run across their own ESI. On May 17, 2017, the parties exchanged lists of additional search terms that each side proposed be run across the opposing party’s ESI.
The plaintiffs never produced their hit list as promised and as demanded by Defendants several times after the agreement was reached. Instead, they produced all documents on the hit list, some 4.2 Million pages, and labeled them all AEO. The defendants primarily objected to the plaintiffs’ labeling all documents Attorneys Eyes Only, instead of Confidential. The complaint about the production defect of producing all documents with hits, instead of all documents that were responsive, seems like an afterthought.
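To make the idea of a hit list concrete, here is a minimal sketch in Python of the kind of per-term count the parties agreed to exchange in step three. It is only an illustration under stated assumptions: the function names, the sample documents, the whole-word matching rule and the 50% cutoff for a "disproportionate" term are all hypothetical, and nothing here reflects what either side's vendor actually ran.

```python
import re
from collections import Counter


def hit_list(documents, terms):
    """Count how many documents each search term hits.

    Assumption: a "hit" is a case-insensitive, whole-word match.
    Real review platforms offer many other matching options.
    """
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for text in documents.values() if pattern.search(text))
    return counts


def disproportionate_terms(counts, total_docs, max_share=0.5):
    """Return terms hitting more than max_share of the collection.

    The 0.5 threshold is an arbitrary, hypothetical cutoff for illustration.
    """
    if total_docs == 0:
        return []
    return [term for term, n in counts.items() if n / total_docs > max_share]


# Hypothetical four-document collection.
docs = {
    "doc1": "Please clean out the office refrigerator by Friday.",
    "doc2": "Your UPS package is out for delivery to the office.",
    "doc3": "Draft distributor agreement attached for review.",
    "doc4": "The distributor asked about the office move.",
}
terms = ["distributor", "office", "out"]

counts = hit_list(docs, terms)
for term in terms:
    print(term, counts[term])
print("Candidates to narrow or drop:", disproportionate_terms(counts, len(docs)))
```

A report like this is what lets the other side spot an over-broad term before millions of pages go out the door. Note that a hit count says nothing about whether any given document is responsive.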
Keyword Search Was New in the 1980s
The focus in this case on keyword search alone, instead of using a Hybrid Multimodal approach, is how a majority of ill-informed lawyers still handle legal search today. I think keywords are an acceptable way to start a conversation, and begin a review, but to use keyword search alone hearkens back to the dark ages of document review, the mid-nineteen eighties. That is when lawyers first started using keyword search. Remember the Blair & Maron study of the San Francisco subway litigation document search? The study was completed in 1985. It found that when the lawyers and paralegals thought they had found over 75% of the relevant documents using keyword search, they had in fact only found 20%. Blair, David C., & Maron, M. E., An evaluation of retrieval effectiveness for a full-text document-retrieval system; Communications of the ACM Volume 28, Issue 3 (March 1985).
The Blair Maron study is thirty-three years old and yet today we still have lawyers using keyword search alone, like it was the latest and greatest. The technology gap in the law is incredibly large. This is especially true when it comes to document review where the latest AI enhanced technologies are truly great. See WHY I LOVE PREDICTIVE CODING: Making Document Review Fun Again with Mr. EDR and Predictive Coding 4.0. Wake up lawyers. We have come a long way since the 1980s and keyword search.
Judge Burkhardt’s Ruling
Back to the Dumb and Dumber story in Youngevity as told to us by the smartest person in that room, by far, Judge Burkhardt:
The Court suggested that a technology-assisted review (TAR) may be the most efficient way to resolve the myriad disputes surrounding Youngevity’s productions.
Note this suggestion seems to have been ignored by both sides. Are you surprised? At least the judge tried. Now back to the rest of the Dumb and Dumber story:
designated as AEO. Youngevity does not claim that the documents are all properly designated AEO, but asserts that this mass designation was the only way to timely meet its production obligations when it produced documents on July 21, 2017 and August 22, 2017. It offers no explanation as to why it has not used the intervening five months to conduct a review and properly designate the documents, except to say, “Youngevity believes that the parties reached an agreement on de-designation of Youngevity’s production which will occur upon the resolution of the matters underlying this briefing.” Why that de-designation is being held up while this motion is pending is not evident.
Oh yeah. Try to BS the judge. Another dumb move. Back to the story:
Wakaya argues that Youngevity failed to review any documents prior to production and instead provided Wakaya with a “document dump” containing masses of irrelevant documents, including privileged information, and missing “critical” documents. Youngevity’s productions contain documents such as Business Wire news emails, emails reminding employees to clean out the office refrigerator, EBay transaction emails, UPS tracking emails, emails from StubHub, and employee file and benefits information. Youngevity argues that it simply provided the documents Wakaya requested in the manner that Wakaya instructed. …
Wakaya demanded that Youngevity review its production and remove irrelevant and non-responsive documents.
The poor judge is now being bothered by motions and phone calls as the many lawyers for both sides bill like crazy and ask for her help. Judge Burkhardt again does the smart thing and pushes the attorneys to use TAR and, since it is obvious they are clueless, to hire vendors to help them do it.
[T]he Court suggested that conducting a TAR of Youngevity’s productions might be an efficient way to resolve the issues. On October 5, 2017, the parties participated in another informal discovery conference with the Court because they were unable to resolve their disputes relating to the TAR process and the payment of costs associated with TAR. The Court suggested that counsel meet and confer again with both parties’ discovery vendors participating. Wakaya states that on October 6, 2017, the parties participated in a joint call with their discovery vendors to discuss the TAR process. The parties could not agree on who would bear the costs of the TAR process. Youngevity states that it offered to pay half the costs associated with the TAR process, but Wakaya would not agree that TAR alone would result in a document production that satisfied Youngevity’s discovery obligations. Wakaya argued that it should not have to bear the costs of fixing Youngevity’s improper productions. On October 9, 2017, the parties left a joint voicemail with the Court stating that they had reached a partial agreement to conduct a TAR of Youngevity’s production, but could not resolve the issue of which party would bear the TAR costs. In response to the parties’ joint voicemail, the Court issued a briefing schedule for the instant motion.
Makes you want to tear your hair out just to read it, doesn’t it? Yet the judge has to deal with junk like this every day. Patience of a saint.
More from Judge Burkhardt, who does a very good survey of the relevant law, starting at page four of the opinion (I suggest you read it). Skipping to the Analysis segment of the opinion at pages five through nine, here are the highlights, starting with a zinger against all counsel concerning the Rule 26(g) arguments:
Wakaya fails to establish that Youngevity violated Rule 26(g). Wakaya does not specifically claim that certificates signed by Youngevity or its counsel violate Rule 26(g). Neither party, despite filing over 1,600 pages of briefing and exhibits for this motion, provided the Court with Youngevity’s written discovery responses and certification. The Court declines to find that Youngevity improperly certified its discovery responses when the record before it does not indicate the content of Youngevity’s written responses, its certification, or a declaration stating that Youngevity in fact certified its responses. See Cherrington Asia Ltd. v. A & L Underground, Inc., 263 F.R.D. 653, 658 (D. Kan. 2010) (declining to impose sanctions under Rule 26(g) when plaintiffs do not specifically claim that certificates signed by defendant’s counsel violated the provisions of Rule 26(g)(1)). Accordingly, Wakaya is not entitled to relief under Rule 26(g).
Wow! Over 1,600 pages of memos and nobody provided the Rule 26(g) certification to the court that plaintiffs’ counsel allegedly violated. Back to the Dumb and Dumber story as told to us by Judge Burkhardt:
Besides establishing that Youngevity’s production exceeded Wakaya’s requests, the record indicates that Youngevity did not produce documents following the protocol to which the parties agreed. … Youngevity failed to produce its hit list to Wakaya, and instead produced every document that hit upon any proposed search term. Had Youngevity provided its hit list to Wakaya as agreed and repeatedly requested, Wakaya might have proposed a modification to the search terms that generated disproportionate results, thus potentially substantially reducing the number of documents requiring further review and ultimate production. …
Second, Youngevity conflates a hit on the parties’ proposed search terms with responsiveness. The two are not synonymous. Youngevity admits that it has an obligation to produce responsive documents. Youngevity argues that because each document hit on a search term, “the documents Youngevity produced are necessarily responsive to Wakaya’s Requests.” Search terms are an important tool parties may use to identify potentially responsive documents in cases involving substantial amounts of ESI. Search terms do not, however, replace a party’s requests for production. See In re Lithium Ion Batteries Antitrust Litig., No. 13MD02420 YGR (DMR), 2015 WL 833681, at *3 (N.D. Cal. Feb. 24, 2015) (noting that “a problem with keywords ‘is that they often are over inclusive, that is, they find responsive documents but also large numbers of irrelevant documents’”) (quoting Moore v. Publicis Groupe, 287 F.R.D. 182, 191 (S.D.N.Y. 2012)). UPS tracking emails and notices that employees must clean out the refrigerator are not responsive to Wakaya’s requests for production solely because they hit on a search term the parties’ agreed upon.
It was nice to see my Da Silva Moore case quoted on keyword defects, not just approval of predictive coding. The quote refers to what is now known as the lack of PRECISION in using untested keyword search. One of the main advantages of active machine learning is to improve precision and keep lawyers from wasting their time reading messages about refrigerator cleaning.
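For readers who want the two measures spelled out, here is a small Python sketch of precision and recall. The document IDs and counts are invented for illustration only; they are not figures from Youngevity or from the Blair and Maron study, and the function name is my own. Precision asks what share of the documents you retrieved are relevant. Recall asks what share of the relevant documents you retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: eight keyword hits, only two of them relevant,
# and eight more relevant documents that no keyword touched.
keyword_hits = {f"d{i}" for i in range(1, 9)}                      # d1..d8
truly_relevant = {"d1", "d2"} | {f"d{i}" for i in range(9, 17)}    # 10 documents

precision, recall = precision_recall(keyword_hits, truly_relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this made-up collection three out of four hits are refrigerator-type noise (low precision) and most of the relevant documents were never found (low recall). Producing every hit without review does nothing to fix either number.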
Now Judge Burkhardt is ready to rule:
The Court is persuaded that running proposed search terms across Youngevity’s ESI, refusing to honor a negotiated agreement to provide a hit list which Wakaya was to use to narrow its requested search terms, and then producing all documents hit upon without reviewing a single document prior to production or engaging in any other quality control measures, does not satisfy Youngevity’s discovery obligations. Further, as is discussed below, mass designation of every document in both productions as AEO clearly violates the Stipulated Protective Order in this case. Youngevity may not frustrate the spirit of the discovery rules by producing a flood of documents it never reviewed, designate all the documents as AEO without regard to whether they meet the standard for such a designation, and thus bury responsive documents among millions of produced pages. See Queensridge Towers, LLC v. Allianz Glob. Risks US Ins. Co., No. 2:13-CV-00197-JCM, 2014 WL 496952, at *6-7 (D. Nev. Feb. 4, 2014) (ordering plaintiff to supplement its discovery responses by specifying which documents are responsive to each of defendant’s discovery requests when plaintiff responded to requests for production and interrogatories by stating that the answers are somewhere among the millions of pages produced). Youngevity’s productions were such a mystery, even to itself, that it not only designated the entirety of both productions as AEO, but notified Wakaya that the productions might contain privileged documents. Accordingly, Wakaya’s request to compel proper productions is granted, as outlined below. See infra Section IV.
Judge Jill Burkhardt went on to award fees and costs to be taxed against the plaintiffs.
A document is never responsive, never relevant, just because it has a keyword in it. As Judge Burkhardt put it, that conflates a hit on the parties’ proposed search terms with responsiveness. In some cases, but not this one, a request for production may explicitly demand production of all documents that contain certain keywords. If such a request is made, then you should object. We are seeing more and more improper requests like this. The rules do not allow for a request to produce documents with certain keywords regardless of the relevance of the documents. (The reasonably calculated phrase was killed in 2015 and is no longer good law.) The rules and case law do not define relevance in terms of keywords. They define relevance in terms of proportional probative value to claims and defenses raised. Again, as Judge Burkhardt put it, search terms do not …replace a party’s requests for production.
I agree with Josh who said parties often get lost in search terms, focusing on document review as process independent of the claims of the lawsuit. The first step in my TAR process is ESI communications or Talk. This includes speaking with the requesting party to clarify the documents sought. This should mean discussion of the claims of the lawsuit and what the requesting party hopes to find. Keywords are just a secondary byproduct of this kind of discussion. Keywords are not an end in themselves. Avoid that quagmire as Josh says and focus on clarifying the requests for production. Focus on Rule 26(b)(1) relevance and proportionality.
Another lesson, do not get stuck with just using keywords. We have come up with many other search tools since the 1980s. Use them. Use all of them. Go Multimodal. In a big complex case like Youngevity Int’l Corp. v. Smith, be sure to go Hybrid too. Be sure to use the most powerful search tool of all, predictive coding. See TAR Course for detailed instruction on Hybrid Multimodal. The robots will eat your keywords for lunch.
The AI power of active machine learning was the right solution available to the plaintiffs all along. Judge Burkhardt tried to tell them. Plaintiffs did not have to resort to dangerous production without review just to avoid paying their lawyers to read about their refrigerator cleanings. Let the AI read about all of that. It reads at near the speed of light and never forgets. If you have a good AI trainer, which is my specialty, the AI will understand what is relevant and find what you are looking for.
I agree that a keyword hit doesn’t define relevance *unless* the parties have explicitly agreed to indulge that loose equation. I wrote about this disconnect five years ago at https://ballinyourcourt.wordpress.com/2013/03/22/are-documents-containing-agreed-upon-keywords-responsive-per-se-2/
But, shouldn’t we be as troubled by the alternate equation; one which producing parties accept in as “dumb” a way as you criticize? That is, the ABSENCE of a keyword hit is not conclusive of an item being non-responsive; yet, producing parties routinely dismiss huge numbers of responsive documents from scrutiny because they didn’t get “hit” by a keyword. Double standard? Of course! But, a different ox is being gored, so we don’t talk about how equally “dumb” this is.
If keywords don’t work as reliable proxies for relevance and irrelevance, then let’s discontinue their use entirely rather than using them principally to benefit producing party interests to the detriment of requesting parties.
Agree, kind of. Of course the absence of a keyword is not definitive of irrelevance. I do not favor keyword filtering, although, I admit, such filtering is sometimes required in small cases under proportionality analysis. If you go that route, you do so knowing that it will eliminate a certain percentage of relevant documents, but how many and how important? What are the odds? Hard to say, especially without facts. Guess work alone is inadequate. That is why I want to test and refine keywords in an iterative process using parameters and Boolean logic anytime keywords are used, either to cull out or tee-up for review. It is a matter of judgment based on experience.
Most analysis is probability based, including legal search. Although my goal is always perfection, that is just me. The Law only requires reasonable efforts. Total recall is not required and only rarely attained, and even then, in only simple cases with some luck, as my research in TREC showed. Same thing goes for forensic analysis I’m sure. I know you forensic guys rely heavily on keyword search for your work.
As to stop using keywords entirely, that would be stupid too. You are not being serious here I know, just continuing that tired old P. v D. thing we both grew up with. A quagmire (I could say Tar Pit) that many are still stuck in. Best practices are best practices regardless of which side of the V. you are on. I still do plenty of plaintiffs work in non-compete and trade-secret cases. So do most employment litigation lawyers.
Keywords are a powerful search tool, especially in skilled hands. It does not matter whether you are prosecuting or defending. You use it with all other search methods in your tool chest. Just know its limits; don’t use pliers on a screw. Use all of your tools and know the limits and abilities of each. That is what I’ve been saying for over a decade now, use a MULTIMODAL approach. Use all search tools, even keyword.
In my view, while it is certainly not necessary (or wise or cost-effective) to run a second search of all non-responsive documents, what I regularly do while a review team is up and running and even if TAR is being used, is run an independent search across all documents of key concepts, dates, personnel exchanges, etc., looking for communications about topics that I would expect to pop relating to the claims and events in the case. Likewise, for example, if the main review has identified a significant email exchange, I will search around that communication to see if others became involved shortly thereafter because that is when an email to another person sent five minutes after the relevant/important email saying only “I just spoke to Bob — call me” — can suddenly take on significance. In short, to test both the search parameters and the results I do a separate “senior level” review on top of the main review. And, very rarely has this not resulted in identifying other documents or search paths worth following.
Very good. This is what I call multimodal. Foley is very lucky to have a search expert like yourself running the important electronic evidence reviews.
Thank you Ralph, Craig and Josh “Bowtie” Gilliand for wrting about this. TAR can be challenging so it’s educational to start in reality and a truly basement level example highlighting the terms of art: relevence and responsive. In terms of importance to this matter, would one of the main metrics needed here be how many relevant docs are in the entire collection? A recall estimate based on Youngevity’s 4.2 Million pages of “hit” documents?
Pardon the typos. No spellcheck apparently.
I never saw a document count in the opinion, just a page count. That is odd, but ultimately not too important to the judge’s decision. Estimating recall is important. It begins with understanding “prevalence.” Some random sampling should have been done here. See my step three, Random Prevalence, in the eight-step work flow.
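For anyone curious what a random prevalence sample looks like in practice, here is a rough Python sketch. It is an illustration under assumptions, not a description of any particular review platform or of my own workflow tools: the function names are hypothetical, the `is_relevant` callback stands in for a human reviewer coding each sampled document, and the Wilson score interval at roughly 95% confidence (z = 1.96) is simply one common choice for a binomial range.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_prevalence(doc_ids, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents from a simple random sample.

    doc_ids must be a list; is_relevant is a hypothetical stand-in for
    a reviewer's relevance call on each sampled document.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(doc_ids, sample_size)
    relevant = sum(1 for doc_id in sample if is_relevant(doc_id))
    low, high = wilson_interval(relevant, sample_size)
    return relevant / sample_size, low, high


# Hypothetical collection of 10,000 documents in which every tenth one is relevant.
doc_ids = list(range(10_000))
point, low, high = estimate_prevalence(doc_ids, lambda d: d % 10 == 0, sample_size=500)
print(f"estimated prevalence {point:.1%} (range {low:.1%} to {high:.1%})")
print(f"projected relevant documents: {int(low * len(doc_ids))} to {int(high * len(doc_ids))}")
```

The projected range of relevant documents is what a later recall estimate gets measured against. That is also why a document count matters more than a page count: the sample is drawn document by document.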