Three cases came out recently on proportionality, the key legal doctrine in discovery, based on Federal Rules 26(b)(2)(C), 26(b)(2)(B), and 26(g)(1)(B)(iii). I-Med Pharma Inc. v. Biomatrix, 2011 WL 6140658 (D.N.J. Dec. 9, 2011) (GOOD); U.S. ex rel McBride v. Halliburton Co., 272 F.R.D. 235 (D.D.C. 2011) (BETTER); DCG Sys., Inc. v. Checkpoint Techs, LLC, 2011 WL 5244356 (N.D. Cal. Nov. 2, 2011) (BEST). Since all three cases embody proportionality, they are all good. But some are better looking than others.
The quality of the application of the doctrine in these cases is directly tied to the parties’ timing. In the best case the issue was raised fast, even before discovery. It was raised in the 26(f) conference and 16(b) hearing. DCG Sys., Inc. v. Checkpoint Techs, LLC, 2011 WL 5244356 (N.D. Cal. Nov. 2, 2011). Counsel followed the Facebook good-hacker credo, as I explained in Impactful, Fast, Bold, Open, Values: Guidance of the “Hacker Way.” They were fast, and their bold use of proportionality will likely have a big impact on the case. With the guidance of the wise judge supervising the parties’ discovery, U.S. Magistrate Judge Paul S. Grewal, the application of the proportionality doctrine should lead to open, yet cost-controlled discovery. This should allow both parties in the patent case to focus on the merits. It should help avoid wasteful email reading endurance contests. Is that not a positive value we should all endorse? Answer: Yes! See Rule 1 and Judge Waxse’s article, Cooperation—What Is It and Why Do It?, as discussed in last week’s blog, Judge David Waxse on Cooperation and Lawyers Who Act Like Spoiled Children.
The Good Case: I-Med Pharma
Before I get into the California case that shows the right way to raise proportionality – at the beginning of the case – let’s look at the other two cases. The responding parties in both cases raised proportionality, but they came to the party a tad late. The worst of these good cases illustrates the saying better late than never. I-Med Pharma Inc. v. Biomatrix, 2011 WL 6140658 (D.N.J. Dec. 9, 2011). The party responding to discovery, the plaintiffs, finally raised the doctrine when their backs were against the wall fighting sanctions for failing to comply with a discovery order they had stipulated to.
Yup. You read that right. The plaintiff, I-Med Pharma Inc., was seeking relief on the basis of disproportionate burden from a discovery order they had agreed to. Kind of makes you wonder. But no explanation is provided in this opinion as to why plaintiff’s counsel in this case agreed to review and produce all non-privileged files that matched a ridiculously long list of keywords from opposing counsel. This was a contract dispute and the keyword list dreamed up by defense counsel, who apparently engaged in a rousing game of “Go Fish,” included such zingers as: contract*, loss, profit*, credit, refund, revenue, CL, HS*, return, claim, FDA, HA. I could go on, but you get the picture. (Side note: when I say keyword search sucks, as I did in Secrets of Search: Part One, this is the kind of search I am referring to: the blind guessing, Go Fish, linear kind with no quality controls. I am certainly not referring to the kind of iterative keyword search on steroids that we see in Kleen Products, which appears to be almost as good as predictive coding based hybrid multimodal methods. But I digress.)
In I-Med Pharma the attorneys not only used Go Fish keyword search, the kind that sucks, they agreed to have the search run by an outside forensic expert with no limits placed on target custodians. It was a search of the plaintiff’s entire corporate computer system. Not only that, there were no time limits placed on the search. To make matters worse, they not only agreed to search the active files with word matches, they agreed to search the slack space too. That is the so-called “unallocated space files” recovered by a forensic exam of plaintiff’s computer system.
Yup. You read all that right again. No wonder the wise judge presented with this conundrum, Senior U.S. District Court Judge Dickinson R. Debevoise, began his opinion with these words:
This case highlights the dangers of carelessness and inattention in e-discovery.
Boy did Judge Debevoise get that right!
Plaintiff’s counsel finally woke up and discovered proportionality (here is where we get to the better late than never part), when the forensic expert searched the unallocated space of their client’s computer system and found 64,382,929 hits covering the equivalent of 95 Million pages of documents! Based on the complete failure to limit the search to custodians, or date, and the Go Fish type list of keywords, this result, in Judge Debevoise’s own words, should come as no surprise. Id. at pg. 5.
The opinion does not say how many pages of documents with hits were found in the allocated spaces of the system, the active files, but it was probably millions more. I-Med Pharma, Inc., apparently did not oppose the privilege review and production of these documents. No doubt they paid millions in vendor costs and attorney fees to comply with this portion of the stipulation.
Since plaintiff’s counsel by now probably had a pretty good idea of what a privilege review of another 95 Million pages of mostly gibberish from slack space might cost, and since at this point the client probably did not want to pay for more, plaintiff’s counsel said no. They asked defense counsel to give them a break on their prior agreement. Defense counsel said no to that, a deal’s a deal, and perhaps sensing victory, refused to modify the prior stipulation. Then plaintiff moved for a protective order asking relief from the prior stipulation on discovery that had, as a matter of course, been converted to an order.
Here is where plaintiff raised the doctrine of proportionality and suggested that the costs and burdens to review 64,382,929 hits covering the equivalent of 95 Million pages of documents from slack space would exceed any possible benefit from that exercise. The Magistrate assigned to hear the dispute, Judge Shipp, quickly agreed, and the defendant, having little to lose (except perhaps credibility), appealed the decision to Judge Debevoise.
Judge Debevoise, of course, affirms his magistrate, which is why, after all, I call this comedy of errors a good decision. Judge Debevoise is a master of understatement and notes that:
A privilege review of 65 million documents is no small undertaking. Even if junior attorneys are engaged, heavily discounted rates are negotiated, and all parties work diligently and efficiently, even a cursory review of that many documents will consume large amounts of attorney time and cost millions of dollars.
Id. at pg. 10.
Judge Debevoise granted a hearing on the defendant’s appeal, which I imagine would have been interesting to observe, since this judge really seems to have a good understanding of e-discovery. At the hearing defense counsel argued that plaintiff’s obligation to review 95 Million pages was not really that burdensome. As footnote six of the opinion explains, Judge Debevoise responded by asking defense counsel how they would do a privilege review of that many documents. That’s where it gets fun, as defendants’ counsel said they would simply run a search for the word privilege and only review the documents with that word. Uh, huh. Sure. Judge Debevoise observed in footnote six:
In spite of the answer given, it is difficult to believe that lawyers from xxxxxxx and xxxxxxxx regularly disclose large quantities of information from their client’s files without examining it.
Id. at pg. 10-11, FN 6. (self censored, hey it could happen to any otherwise good firm.)
So, Judge Debevoise lets plaintiff’s counsel off the hook and relieves them of their prior e-discovery agreement. But he has some choice words for them too (and again I will not name plaintiff’s counsel), and provides good advice for all on a better way to do keyword search, a way that goes far beyond the simple guessing game the attorneys in this case had apparently been playing:
While the precise number of hits produced was not known in advance and Plaintiff argues that it could not have predicted the volume of material that the search would uncover, it should have exercised more diligence before stipulating to such broad search terms, particularly given the scope of the search. In evaluating whether a set of search terms are reasonable, a party should consider a variety of factors, including: (1) the scope of documents searched and whether the search is restricted to specific computers, file systems, or document custodians; (2) any date restrictions imposed on the search; (3) whether the search terms contain proper names, uncommon abbreviations, or other terms unlikely to occur in irrelevant documents; (4) whether operators such as “and”, “not”, or “near” are used to restrict the universe of possible results; (5) whether the number of results obtained could be practically reviewed given the economics of the case and the amount of money at issue.
While Plaintiff should have known better than to agree to the search terms used here, the interests of justice and basic fairness are little served by forcing Plaintiff to undertake an enormously expensive privilege review of material that is unlikely to contain non-duplicative evidence.
Id. at pgs. 11-12.
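To put factor (5) in concrete terms, here is a rough back-of-the-envelope sketch in Python. Only the 65 million document count comes from Judge Debevoise’s opinion. The review speed of 100 documents per hour and the $50 hourly rate are my own illustrative assumptions, and the function name is hypothetical. Your numbers will differ, but the arithmetic is the point.

```python
def estimate_review_cost(num_documents, docs_per_hour, hourly_rate):
    """Return (attorney_hours, dollars) for a linear, document-by-document review."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    hours = num_documents / docs_per_hour
    return hours, hours * hourly_rate


# Assumed inputs: 100 documents per hour and a heavily discounted $50 per hour.
hours, cost = estimate_review_cost(65_000_000, docs_per_hour=100, hourly_rate=50.0)
print(f"{hours:,.0f} attorney hours, about ${cost:,.0f}")
# prints: 650,000 attorney hours, about $32,500,000
```

Run the same estimate against the amount in controversy before you stipulate to a keyword list, not after the hit count comes back.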
I-Med Pharma Inc. v. Biomatrix, is a good case, not only for proportionality, but also for search. It is very telling to note that even though the case embodies the doctrine of proportionality, the keyword itself is never used – even Rule 26(b)(2)(C) is never referred to. This once again demonstrates the limits of keyword search. I would not have found this case but for the genius of a non-linear, protodigital, artificial intelligence agent I know, a mobile Siri of e-discovery, Kenneth J. Withers. Thank you Ken. (Actually, truth be known, he found all three of these cases.)
The Better Case: U.S. ex rel McBride
The next case demonstrating proportionality is U.S. ex rel McBride v. Halliburton Co., 272 F.R.D. 235 (D.D.C. 2011). It is a better case than I-Med Pharma because proportionality was applied in a slightly more timely fashion. True, discovery had already closed, but at least protection was sought before stipulation to an order. It is also a good case because it was written by one of my favorite judges, a master wordsmith and e-discovery guru, John M. Facciola. He’s got a proportional look too.
Judge Facciola is a strong advocate for proportionality. By the way, that does not mean the judge has prejudged anything, or favors responding parties over requesting parties, or anything like that. It is the same thing as saying a judge favors reasonability, or motherhood, or apple pie, or, dare I say it, modern efficient search and review methods.
Judge Facciola was one of my keynote speakers in a CLE event that I co-chaired in 2010 with Maura Grossman. The sole topic of that day-long event was proportionality. It was one of the best CLEs I have ever done for that reason. There was a thorough examination of one issue from some of the top minds in e-discovery, including especially the top judges who specialize in the subject. Judge Facciola started us off by speaking of proportionality in the law and art and music where the principle is called the golden mean or golden ratio. He even played a few lines of Bach that demonstrated the golden ratio in music. I followed his presentation with my own where I spoke and showed slides on proportionality in the law and art, paintings and design. I showed famous paintings, buildings (Parthenon) and other golden ratio based designs (iPods) and examined how this ratio would apply to e-discovery costs. It seemed like part two of what Judge Facciola had started.
Judge Facciola and I were both blown away by the eerie coincidence of our art based approaches to explaining proportionality in the law. I am sure everyone there thought this was a well orchestrated presentation. But the truth is, I had no idea he was going to use the same art based approach to proportionality. Neither of us had ever talked to the other about what we were going to say. (You don’t need to have prep talks with a pro like him.) The truth is, there was no need for us to talk and prepare in order to have harmonious presentations. There is a high degree of consensus and similar thinking among judges and lawyers who specialize in this area. We have all read each other’s writings and heard each other speak a number of times. We think about this stuff, a lot. We tend to reach the same conclusions. There is a general consensus on most issues. What’s wrong with that? But I digress.
Back to U.S. ex rel McBride v. Halliburton Co. Although Judge Facciola is an expert and strong proponent of proportionality, my keyword search of the opinion shows that he never once uses the word in this opinion. He cites Rule 26(b)(2)(C) several times to be sure, but never says proportionality, proving once again the limits of that old keyword technology. For more on new search alternatives to 1940s era software methods, new technologies that favor the truth, not one side of the Bar or another, see (in reverse chronological order) my following recent articles (yes, I’m obsessed with this):
- Predictive Coding Based Legal Methods for Search and Review;
- New Methods for Legal Search and Review;
- Perspective on Legal Search and Document Review;
- LegalTech Interview of Dean Gonsowski on Predictive Coding and My Mission to Make Predictive Coding Software More Affordable;
- My Impromptu Video Interview at NY LegalTech on Predictive Coding and Some Hopeful Thoughts for the Future;
- The Legal Implications of What Science Says About Recall;
- Reply to an Information Scientist’s Critique of My “Secrets of Search” Article;
- Bottom Line Driven Proportional Review;
- Secrets of Search: Parts One, Two, and Three;
- Information Scientist William Webber Posts Good Comment on the Secrets of Search Blog;
- Judge Peck Calls Upon Lawyers to Use Artificial Intelligence and Jason Baron Warns of a Dark Future of Information Burn-Out If We Don’t;
- We Are at the Dawn of a Golden Age of Justice.
So Judge Facciola implements proportionality without ever saying the word, and grants the defendant’s motion for protective order. Here Halliburton in this qui tam action had already reviewed and produced relevant emails of 230 custodians. Yup, you read that right, 230! Discovery closed and what does the plaintiff want? They want Halliburton to search and produce still more email from an additional 35 custodians. These additional custodians were now targeted by the plaintiff, McBride, because they were shown as CCs on emails transmitting relevant documents that were already produced as part of the 230. No other reason provided. Let’s listen to how the Sage from D.C., who loves the mathematical proportionality of Johann Sebastian Bach (not to mention Bruce Springsteen), deals with such a dissonant request:
In excruciating, but highly educational and useful, detail, she (one of Halliburton’s e-discovery compliance managers) explains how the process and collection of data from a current or former employee takes place, from what may be many different sources … Once the data is found, it must be copied in a forensically appropriate manner to preserve its metadata and prevent its alteration. … The data must then be processed to be rendered searchable by the review tool being used, a process that can overwhelm the computer’s capacity and require that the data be processed by batch, as opposed to all at once. …
All discovery, even if otherwise permitted by the Federal Rules of Civil Procedure because it is likely to yield relevant evidence, is subject to the court’s obligation to balance its utility against its cost. Fed.R.Civ.P. 26(b)(2)(C). More specifically, the court is obliged to consider whether (1) the discovery sought is unreasonably cumulative or duplicative, or obtainable from a cheaper and more convenient source; (2) the party seeking the discovery has had ample opportunity to obtain the sought information by earlier discovery; or (3) the burden of the discovery *241 outweighs its utility. Id. The latter requires the court to consider (1) the needs of the case; (2) the amount in controversy; (3) the parties’ resources; (4) the importance of the issues at stake in the action; and (5) the importance of the discovery in resolving the issues. Fed.R.Civ.P. 26(b)(2)(C)(iii).
While the present record does not permit a precise conclusion, I can presume, given the numbers of hours for which the defendants billed and the period of time at issue, that the amount in controversy is great and that the defendants’ resources are greater than the relator’s. Claims of fraud in providing services to military personnel raise important, vital issues of governmental supervision and public trust. Thus, these factors might weigh in favor of the discovery sought.
On the other hand, the defendants protest, and relator does not deny, that they have already spent a king’s ransom on discovery in this case–$650,000–without the addition of attorneys’ fees. They have produced more than two million paper documents, thousands of spreadsheets, and over a half a million e-mails.
Id. at pgs. 240-241 (record citations omitted).
Judge Facciola goes on, and note once again that he never says proportion* or any variation thereof:
Given the discovery that relator has had, what defendants have already spent, and the detailed showing made of how much more time and money will likely have to spent to search an additional thirty-five custodians, surely relator has to make a showing that the e-mails not produced are crucial to her proof. She has not made such a showing, and they are not. First, she has the LOGREP reports, and no one is pretending that defendants are (or could) be asserting that, without the transmitting e-mails, she cannot establish the submission of a false claim. To the contrary, the defendants conceded that point. Thus, the transmitting e-mails seem to be hopelessly insignificant.
In this context, it is telling that relator does not show from the e-mails she has received that there is good reason to believe that the ones she claims are missing are highly probative of some fact. Indeed, there is no showing whatsoever from what has been produced that those e-mails not produced will make the existence of some crucial fact more likely than not. It is, after all, unlikely that a transmitting e-mail will do any more than transmit attached information and, by copy, alert others of that transmittal.
Without any showing of the significance of the non-produced e-mails, let alone the likelihood of finding the “smoking gun,” the search relator demands cannot possibly be justified when one balances its cost against its utility. The motion will be denied.
Id. at 241.
Well put, Judge Facciola. This is music to the ears of any e-discovery devotee. I am reminded of my favorite, the Goldberg Variations. There are additional melodic words in this opinion about a party not losing their right to proportionate discovery because of preservation failures, but I suggest you read the opinion yourself for that last variation. Id. at 241-242.
The Best Case: DCG Systems
End of Part One. Stay tuned for the exciting conclusion, where you will hear the happy ending to this tale of three cases with the story of the best case, DCG Systems, more case-law on proportionality, and an opinion-riddled conclusion that will leave your jaws gaping.
As a sneak preview on the ending I will say this much. Although I agree with Siri’s explanation of DCG Systems, err, I mean, Ken Withers’ explanation: DCG Systems is a case that “proactively cuts the Gordian Knot by adopting Judge Rader’s Model Patent Discovery Order,” (yes, he actually talks like that, and with links too), I nevertheless have some criticisms of this case and the Model Order on which it is based. Its reliance on five keywords is flawed. Still, given the cost of most vendors’ predictive coding software these days, and the weak understanding most lawyers have of legal search, this reliance on outdated technology and search methods is to be expected.
Now for something completely different, I leave you with a few lyrics from Judge Facciola’s favorite musician-poet, Bruce Springsteen:
You know that flag flying over the courthouse
Means certain things are set in stone
Who we are, what we’ll do and what we won’t.
“Long Walk Home”
God have mercy on the man who doubts what he’s sure of.
“Brilliant Disguise”
But it’s a sad man my friend who’s livin’ in his own skin
And can’t stand the company.
Every fool’s got a reason for feelin’ sorry for himself
And turn his heart to stone.
Tonight this fool’s halfway to heaven and just a mile outta hell
And I feel like I’m comin’ home.
“Better Days”
So let’s take the good times as they go and I’ll meet you further on up the road…
“Further On (Up the Road)”
Posted by Ralph Losey
Predictive coding type algorithms are designed to leverage the expertise of human input, preferably attorneys who are subject matter experts of the case at hand. A classification of one document by an attorney results in a recommended classification of hundreds, if not thousands of other documents that the computer identifies as similar. The computer examines the entire data set, the corpus, and predicts the probability of each document therein fitting within the same classification. The expert then tests and corrects the predictions in an iterative process. The cost savings in this approach are obvious, particularly considering the high expense of attorney review time. Reviewers can break through the current linear review speed barrier of approximately 100 files per hour, to 1,000, or even 10,000 files per hour. These supersonic review speeds are what it takes to handle e-discovery today in an effective and economical manner.
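For readers who want to see the train, predict, and correct loop in miniature, here is a toy Python sketch. It uses a plain naive Bayes word-count classifier, which I have swapped in purely for illustration. Commercial predictive coding tools use their own, often proprietary, algorithms, and nothing here describes any vendor’s product. The sample documents and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """labeled_docs: list of (text, is_relevant) pairs coded by the attorney."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, label in labeled_docs:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def probability_relevant(model, text):
    """Naive Bayes estimate of the probability that a document is relevant."""
    word_counts, doc_counts, vocab = model
    total_docs = doc_counts[True] + doc_counts[False]
    log_score = {}
    for label in (True, False):
        # Add-one (Laplace) smoothing on the class prior and on word likelihoods.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_score[label] = score
    # Turn the two log scores into a probability without overflow.
    top = max(log_score.values())
    exp_true = math.exp(log_score[True] - top)
    exp_false = math.exp(log_score[False] - top)
    return exp_true / (exp_true + exp_false)


# Round one: the attorney codes a tiny seed set (hypothetical documents).
seed_set = [
    ("refund owed under the distribution contract", True),
    ("lunch menu for the office party", False),
]
corpus = [
    "contract refund dispute with distributor",
    "office party parking reminder",
    "revenue loss from the contract breach",
]
model = train(seed_set)
for doc in sorted(corpus, key=lambda d: probability_relevant(model, d), reverse=True):
    print(f"{probability_relevant(model, doc):.2f}  {doc}")

# Round two: the attorney reviews a prediction, codes it, and the model is retrained.
seed_set.append(("revenue loss from the contract breach", True))
model = train(seed_set)
```

Each new attorney call feeds back into the next round of predictions, which is why a mistake in the seed set can be corrected later rather than baked in.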
The input of human experts to train the artificial intelligence algorithms is essential and has a direct impact upon the quality of performance of the predictive coding. If, for instance, the expert is inconsistent in making initial calls on the relevance of documents, then the computer extrapolations based on these calls will also be inconsistent. It is the perennial computer situation of garbage in, garbage out. The quality of the predictions made by the computer is, in large part, based on the quality of the training, the input, provided by the human experts. Quality is, of course, also impacted by other factors, including the nature of the ESI under review, the number of iterative human expert input cycles, and the quality of the software itself. But any deficiencies in these quality factors can be detected and corrected by statistical random sampling of the final predictions. This is ultimately why the black box itself, i.e., exactly how the software works, is not really that important. The final quality controls of random sampling protect against any errors, including software errors.
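Here is a minimal Python sketch of what a random sample quality check on final predictions might look like. The function name, the simulated data, and the 1,500 document sample size are all my own assumptions for illustration. The interval uses the simple normal approximation, which gets unreliable when the error rate is very close to zero, so treat it as a starting point and consult a statistician for the real thing.

```python
import math
import random


def sample_error_rate(predictions, human_review, sample_size, seed=None, z=1.96):
    """Estimate the error rate of predictions from a simple random sample.

    predictions:  list of (doc_id, predicted_relevant) pairs
    human_review: callable returning the reviewer's call for a doc_id
    Returns (rate, low, high); z=1.96 gives a roughly 95% normal-approximation interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(predictions, sample_size)
    errors = sum(1 for doc_id, predicted in sample if human_review(doc_id) != predicted)
    rate = errors / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Simulated data: the "true" calls, and predictions that are wrong on every 20th document.
truth = {doc_id: doc_id % 3 == 0 for doc_id in range(10_000)}
predictions = [
    (doc_id, truth[doc_id] if doc_id % 20 else not truth[doc_id])
    for doc_id in range(10_000)
]

rate, low, high = sample_error_rate(predictions, truth.get, sample_size=1_500, seed=42)
print(f"estimated error rate {rate:.1%} (interval {low:.1%} to {high:.1%})")
```

Note that the check never looks inside the software. It only compares the final calls against a human reviewer on a random draw, which is why it works no matter what is in the black box.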
Do not simply use in an unthinking manner the standard work flow methods that vendors set up as defaults for software. Also, if a vendor offers advice on legal methods and best practices (and good vendors know better than to do that), take it all with a big grain of salt. Vendors are not lawyers. E-discovery vendors are not allowed to provide legal services or legal advice. They are prohibited from doing so, even if the expert has a law degree and is licensed to practice law. It is even worse when a tech expert, one who has never even gone to law school, provides opinions on legal methods and what is reasonable and what is not. Vendors and techs are permitted to be advisers on technology and software functions, not the law. Legal methods must be determined by lawyers, not techs. Legal advice may only be provided by practicing attorneys in firms, or acting as solo practitioners. Still, unless you are already an expert in a particular kind of software or technology, it is a very good idea to consult with a software expert or bona fide information scientist in order to formulate your legal opinions. The multi-disciplinary team approach is required in e-discovery work.
The inability of attorneys to uncover documents requested does not necessarily mean that the attorneys are unskilled or the software is defective (although it might). The template for the smoking gun document might be accurate, but nothing like it is found for the simple reason that none exist. This is a key point that should never be forgotten. It is also one reason that skill alone is not enough to know whether you have proven a negative, or have made a mistake. Art and experience are important in this area of the law, just like in any other area of the law. If there is no needle in the haystack to begin with, then no amount of skills, quality controls, or repetition in search will find one.
Even the most highly skilled attorneys will make mistakes from time to time. All experienced attorneys know that. That is why the iterative processes of predictive coding are so valuable. When mistakes in input are made by the subject matter expert attorneys who create the first seed set in the predictive coding process, it is not a one-time, all or nothing procedure.


Predictive coding is a powerful new tool, but technology alone is not enough. We must also have new legal methods. Technology and law have to work together, grounded in science, to create a new gold standard. The law, driven as it is to stop the run-away costs of e-discovery, is now ready to adopt these new ideas and methods. It is ready to change from a linear, confrontative, one-dimensional, mostly-manual, Bates stamp approach to discovery, to a multidimensional, cooperative, iterative, largely-automated, hash value approach.
New systems of e-discovery are emerging that are designed for today’s digital world. Unlike most existing e-discovery systems, they are not mere adaptations of old paper discovery ways. The new methods use an entirely new collaborative approach and technologies, exemplified by predictive coding software. Although this paradigm shift in discovery is just starting, many of the contours of the new methods are already apparent.
Many of our leading jurists, information scientists, academics, scholars, writers, and legal practitioners recognize that the old methods and attitudes that worked for paper no longer work for ESI. 

Discovery of evidence and the legal analysis of relevancy and privilege determinations are at the heart of our legal system. They are essential to the common law evidence based system of justice. The methods and tools used in paper discovery did not work with the vast stores of digital information ubiquitous in the Twenty-First Century. That is why e-discovery became so expensive and riddled with mistakes. That is why completely new methods and tools emerged for digital discovery, which, in 2012, finally caught on. The last impediment of no judicial approval has been destroyed. The way is now clear.
The old-school, linear, confrontative, one-dimensional, largely manual, Bates stamp approach to e-discovery has already been abandoned by all of the top experts in the field. The old ways have been replaced by a new nine-step process I described in 
For millennia, writings were on paper. For centuries the legal profession depended upon writings, referred to in the law as documents, as the key evidence to resolve disputes in a fair and just manner. Losey, Mathematical Formula for Justice Proves the Importance of ESI in Civil Litigation, Chapter 4 of Electronic Discovery (West 2010). Paper documents were well-known and mastered by every lawyer and judge who swore an oath to uphold the law.
Many see this as a much more profound cultural revolution than that precipitated by Gutenberg, which took centuries to play out, not decades. George L. Paul and Jason R. Baron,
The legal profession has been severely stressed by the rapid, ever-accelerating advances in technology. The changes in writing and resulting information explosion have been the key stressors. ESI is not only changing and evolving into new forms every year, but is now multiplying at an exponential rate that is almost beyond comprehension. See, e.g., Rowe Entm’t, Inc. v. William Morris Agency, Inc., 205 F.R.D. 421, 429 (S.D.N.Y. 2002) (explaining that electronic data is so voluminous because, unlike paper documents, “the costs of storage are virtually nil. Information is retained not because it is expected to be used, but because there is no compelling reason to discard it”), aff’d, 2002 WL 975713 (S.D.N.Y. May 9, 2002); Data, Data Everywhere (The Economist, March 2010); Baron and Losey
These old paper-based legal search and review methods are one dimensional and linear in nature. They typically follow a sequential Bates stamp organizational model created in the 1890s. The simple paper evidence discovery processes worked pretty well for decades before computers. It should be noted, however, that even before technology moved away from paper typing machines to computers in the 1980s, the discovery processes were already severely taxed by the growing volumes of paper documents generated from the 1960s forward. The increase in paper volume was caused by another technological innovation, the photocopy machine, and by ever more complex transactions. Still, the legal profession coped somehow for the rest of the Twentieth Century. Lawyers added more numbers to the Bates stamps and used larger teams of lawyers and paralegals to manage the additional papers. They were still on familiar ground.
The old linear review methods involved serial culling of documents down to a final production set. The process generally required multiple reviews of the same document for different purposes. It was inefficient. It was expensive. Moreover, the quality control of human eyes on paper did not work with high volumes of documents. This is shown by the latest scientific experiments where the agreement rate among professional legal reviewers was found to be just less than 50%. Cormack, Grossman, Hedin, Oard, Overview of TREC 2010 Legal Track (February 21, 2012).
I want to make predictive coding software an affordable, everyday item. Now that I’ve helped to open the door, I want as many people as possible to be able to walk through. With the right methods to use this tool, a new world of affordable e-discovery awaits.