A Big Data Example
Let us change the scenario somewhat for a final quick hypothetical. Assume 10,000,000 documents after bulk culling for your SMEs to review for relevance. Assume sampling by an SME showed a prevalence of 10% (again somewhat high), and a first-pass review rate of 10,000 files per hour (somewhat slow for Big Data reviews). This means you can assume that only around 1,000,000 documents will need final second pass protection review.
More sampling shows the contract reviewers will, using computer-assisted techniques (smart routing, etc.), be able to review them at 100 files per hour. The improvement in protection review from the small case example, where a speed of only 20 files per hour was assumed, is possible when you have larger document collection sizes like this (one million documents), high quality review software, and experienced review teams.
With this information from sampling you can now estimate a total first pass review cost of $1,000,000.00 ($1,000.00 x 1,000 hours), which compares pretty well to the $2,829,349.10 charge in Gabriel. Next you can estimate a total final pass Protection review cost of $500,000.00 ($50 x 10,000 hours).
Let us also assume you know from experience that your other supervision fees and software costs are likely to total another $150,000.00. Your estimate for the project would be $1,650,000.00. That is 16.5 cents a document ($0.165).
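The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not a tool from any review platform; the function name and parameter names are made up for the example, and the inputs are the hypothetical figures from this scenario.

```python
def estimate_review_cost(total_docs, prevalence,
                         first_pass_files_per_hour, first_pass_rate_usd,
                         second_pass_files_per_hour, second_pass_rate_usd,
                         other_costs_usd):
    """Return a dollar cost breakdown for a two-pass review (illustrative only)."""
    # First pass: the SME reviews the whole post-culling collection.
    first_pass_hours = total_docs / first_pass_files_per_hour
    first_pass_cost = first_pass_hours * first_pass_rate_usd

    # Second pass: only the documents expected to be relevant go on
    # to protection review by the contract reviewers.
    second_pass_docs = total_docs * prevalence
    second_pass_hours = second_pass_docs / second_pass_files_per_hour
    second_pass_cost = second_pass_hours * second_pass_rate_usd

    total = first_pass_cost + second_pass_cost + other_costs_usd
    return {
        "first_pass_cost": round(first_pass_cost, 2),
        "second_pass_cost": round(second_pass_cost, 2),
        "total": round(total, 2),
        "per_document": round(total / total_docs, 4),
    }


# Hypothetical inputs taken from the Big Data example in the text.
estimate = estimate_review_cost(
    total_docs=10_000_000,
    prevalence=0.10,
    first_pass_files_per_hour=10_000,
    first_pass_rate_usd=1_000,
    second_pass_files_per_hour=100,
    second_pass_rate_usd=50,
    other_costs_usd=150_000,
)
print(estimate)
```

With these inputs the sketch reproduces the figures above: $1,000,000.00 for first pass, $500,000.00 for second pass, and $1,650,000.00 in total, or 16.5 cents a document. In practice every input should come from your own sampling, not from assumed averages.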
Too high you say? Not proportionate to the value of the case? Not proportionate to the expected probative value in this case from these 10,000,000 documents, which is something your sampling can tell you and provide evidence to support? Then use the document ranking feature of AI enhanced document review to further reduce the second pass protection costs.
If the SME’s identification of 1,000,000 likely relevant documents was based on a 50% plus probability ranking using predictive coding, then try a higher-ranking cut off. Again, this is not too difficult to do with sampling and good software. Maybe a 75% plus ranking cut off brings the count down from 1,000,000 to 250,000. Or maybe you just arbitrarily decide to use the top ranked 250,000 documents because that is all you can afford, or think is proportionate for this data and this case. Either way, you are now passing the strongest document sets along for second pass review. You will probably be producing the most relevant documents, the ones likely to have the highest probative value to the case.
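The two selection strategies just described, a probability cut off and a "what we can afford" cap, can be sketched as follows. This is a minimal illustration under stated assumptions: the document identifiers and scores are invented, the function names are hypothetical, and real review software exposes ranking in its own way. The 50 cents per document figure is simply the $50 per hour reviewer rate divided by the assumed 100 files per hour.

```python
def select_by_cutoff(ranked, cutoff):
    """Keep documents whose predicted relevance probability meets the cutoff."""
    return [doc_id for doc_id, score in ranked if score >= cutoff]


def affordable_count(budget_cents, cost_per_doc_cents):
    """How many documents a second pass budget covers (integer cents)."""
    return budget_cents // cost_per_doc_cents


def select_by_budget(ranked, budget_cents, cost_per_doc_cents):
    """Keep as many of the top ranked documents as the budget allows."""
    limit = affordable_count(budget_cents, cost_per_doc_cents)
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ordered[:limit]]


# Invented toy data: (document id, predicted probability of relevance).
scores = [
    ("doc-a", 0.92),
    ("doc-b", 0.55),
    ("doc-c", 0.81),
    ("doc-d", 0.76),
    ("doc-e", 0.61),
]

cutoff_set = select_by_cutoff(scores, 0.75)
budget_set = select_by_budget(scores, budget_cents=100, cost_per_doc_cents=50)

# At 50 cents a document, a $125,000.00 budget covers 250,000 documents.
docs_covered = affordable_count(12_500_000, 50)
print(cutoff_set, budget_set, docs_covered)
```

Either route hands the highest ranked documents to second pass review; the difference is only whether the stopping point is set by a probability threshold or by the budget.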
Using the higher cut off, assume that the cost for second pass Protection review would then be 25% of what it was, reduced from $500,000.00 for review of 1,000,000 documents, to $125,000.00 to review 250,000 documents. Your other fees and costs also drop in your experience by 50%, from $150,000.00 to $75,000.00. Your total estimate is now $1,200,000.00, instead of $1,650,000.00. You have gone down to twelve cents a document. Assume this $1,200,000.00 number is now within your legal judgment to be proportional to this document request in this very large lawsuit. It is now within your budget.
If this were still not proportional in your judgment, or in the client’s opinion, there are many other things you could try, but I would focus on the bulk culling before the SME first pass relevancy review. For instance, change the custodian count, or date range, but do not keyword filter like they did in Gabriel. Bring the initial 10,000,000 documents down to 5,000,000 documents, then do the math (500,000+250,000+37,500). You are now talking about around $800,000.00, back to sixteen cents per document. Is that within the budget? Is that an amount that a court is likely to force you to spend anyway?
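To compare the alternatives side by side, the three sets of figures used in this example can be put into a small table and checked against a budget. The sketch below uses only the dollar amounts stated in the text; the scenario labels, the function names, and the $1,250,000 budget are assumptions added for illustration.

```python
# Each scenario: (documents reviewed in first pass, first pass cost,
# second pass cost, other fees and costs), all in whole dollars.
scenarios = {
    "baseline (50% plus ranking)": (10_000_000, 1_000_000, 500_000, 150_000),
    "higher ranking cut off": (10_000_000, 1_000_000, 125_000, 75_000),
    "culled to 5,000,000 documents": (5_000_000, 500_000, 250_000, 37_500),
}


def summarize(scenarios):
    """Return (name, total dollars, cents per document) for each scenario."""
    rows = []
    for name, (docs, first, second, other) in scenarios.items():
        total = first + second + other
        rows.append((name, total, total * 100 / docs))
    return rows


def within_budget(rows, budget):
    """Names of the scenarios whose total does not exceed the budget."""
    return [name for name, total, _ in rows if total <= budget]


rows = summarize(scenarios)
for name, total, cents in rows:
    print(f"{name}: ${total:,.2f} total, {cents:.2f} cents per document")

# Hypothetical budget, chosen only to show the comparison.
print(within_budget(rows, 1_250_000))
```

The culled scenario totals $787,500.00, which is the "around $800,000.00" figure above, or 15.75 cents per document.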
The More Bang for the Buck Bottom Line Ranked Approach is Good for Both the Requesting Party and the Producing Party
When a producing party uses advanced computer review software with predictive coding ranking, they can limit their review and production to the documents with the highest relevancy ranking. This gives everyone the most truth for the dollar. For that reason this is the best approach, not only for the producing party, who can thereby budget and avoid disproportionate burdens, but also for the requesting party. The requesting party can likewise limit the expense of reviewing the many thousands of documents (sometimes millions of documents) produced to them. They can focus on the documents with the highest value. They can be reasonably assured that AI enhanced processes have probably found the documents that really matter to the resolution of the case. The plaintiffs understood this in Kleen Products, a large class action pending in District Court in Chicago, which is one reason they tried to force the defendants to use predictive coding in their productions.
In spite of the Kleen Products precedent, not all plaintiffs’ counsel are as sophisticated as counsel in that large class action. A producing party, especially in a David and Goliath type case, will often have to sell the benefits of these new methods to the smaller requesting party. The requesting party will be more likely to cooperate if they understand the value to them of these methods. This will often require the producing party to provide some reasonable degree of transparency into the proposed review processes. For instance, if you have retained a well-known expert to direct the review, tell the requesting party. Share his or her qualifications and experience. Engage in frank and full relevancy dialogues with the requesting party. Make sure you are training the machine to find the documents that they really want. Clarify the target. If need be, share some examples early on of the relevant documents you will use in the training. Invite them to provide you with documents they consider relevant to use in the training. Try to make it a collaborative approach. If they refuse, document your efforts and take any disputes to the judge as early as possible.
A collaborative approach is an effective way to convince a requesting party that your estimate is reasonable and that you are not still stuck in the old paradigm of hide-the-ball discovery games. I cannot overstate how important it is to develop trust between opposing counsel on discovery. Often the only way to do that is through some level of transparency. You do not have to disclose all of your search secrets, but you may have to keep the requesting party informed and involved in the process. That is what cooperation looks like.
All Review Projects Are Different
In order to be able to make a valid estimate for bottom line driven proportional review you must closely study the case and review the project goals. It is not enough to have handy per file cost estimates. This move to actual examination of the electronically stored information at issue, and study of the specific review tasks, is equivalent to the move in construction estimation from rough estimates based on average per square foot prices, to a site visit with inspection and measurements of all relevant conditions. No builder would bid on a project without first doing the detailed real world estimation work. We lawyers must do the same for this method to succeed.
Even in the same organization, and just dealing with email, the variances between document custodians can be tremendous. Some, for instance, may have large amounts of privileged communications. This kind of email takes the most time to review, and, if relevant, to log. High percentages of confidential documents, especially partially confidential, can also significantly drive up the costs of the second pass review. All of the many unique characteristics of electronic document collections can affect the speed of review and total costs of review. That is why you have to look at your data and test sample the emails in your collection to make accurate predictions. Estimation in the blind is never adequate. It would be like bidding on a construction project without first reading the plans and specifications.
Even when you have dealt with a particular client’s email collection before, a repeat customer so to speak, the estimates can still vary widely depending on the type of lawsuit, the issues, and the amount of money in controversy or general importance of the case.
Although this may seem counter-intuitive, the truth is that complex, big-ticket cases are the easiest ones in which to do e-discovery, especially if your goal is to do so in a proportional manner. If there is a billion dollars at issue, a reasonable budget for document review is large. On the other hand, proportional e-discovery in small cases is a real challenge, no matter how simple they supposedly are. Many cases that are small in monetary value are still very complex. And complex or not, all cases today have a lot of electronically stored information.
The medium size to small cases are where my bottom line driven proportional review has the highest application for cost control and the greatest promise to bring e-discovery to the masses. Give it a try, with or without AI enhanced predictive coding. For large cases, you should always employ AI enhanced methods. The ranking feature makes it easier to implement a proportional approach. Bottom line driven proportional review in big cases can save the greatest amount of money and provide much needed certainty and risk control.
The future of litigation involves new methods of technology-assisted discovery where Man and Machine work together to find the core truth. This day will come; in fact it is already here. As the science fiction writer William Gibson said: “The future is already here – it’s just not evenly distributed yet.” The key facts you need to try a case and do justice can be found in any size case, big and small, at an affordable price. But you have to embrace change and adopt new legal and technical methodologies. The Bottom Line Driven Proportional Review method is part of that answer; so too is advanced review software at affordable prices. It is already working for some today, and it can work for you too.
 A random sample is step three in my recommended eight-step methodology for AI enhanced review, aka predictive coding. See, e.g., my blog at https://e-discoveryteam.com/car/ and the EDBP at http://www.edbp.com/search-review/predictive-coding/. For more on random sampling see Losey, R., Comparative Efficacy of Two Predictive Coding Reviews of 699,082 Enron Documents (Part Two) found at https://e-discoveryteam.com/2013/06/17/comparative-efficacy-of-two-predictive-coding-reviews-of-699082-enron-documents/; and Losey, R., Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be Using Random Sampling By 2022 found at https://e-discoveryteam.com/2012/05/06/random-sample-calculations-and-my-prediction-that-300000-lawyers-will-be-using-random-sampling-by-2022/.
 Losey, R., Relevancy Ranking is the Key Feature of Predictive Coding Software found at https://e-discoveryteam.com/2013/08/25/relevancy-ranking-is-the-key-feature-of-predictive-coding-software/. Relevancy ranking only works well now with the best software on the market and requires proper, accurate machine training. Even then it is not an exact measure of relevancy weight, and testing and quality controls should always be applied to know when and to what degree the ranking strata are reliable.
 The costs of review have come way, way down in the past few years for those who are using advanced methods. For some context on the fourteen cents a document number used in this example, back in 2007 the Department of Justice spent $9.09 per document for review in the Fannie Mae case, even though it used contract lawyers for the review work. In re Fannie Mae Securities Litig., 552 F.3d 814, 817 (D.C. Cir. 2009) ($6,000,000/660,000 emails). There were no comments by the court that this price was excessive when the government later came back and sought cost shifting. At about the same time Verizon paid $6.09 per document for a massive second review project that enjoyed large economies of scale and, again, utilized contract review lawyers. Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010 ($14,000,000 to review 2.3 million documents in four months). In 2011 I was still seeing an average cost of $5.00 per file for tested keyword multimodal reviews before advanced predictive coding engines started to become available.
 See: Losey, R., Why a Receiving Party Would Want to Use Predictive Coding? found at https://e-discoveryteam.com/2013/08/12/why-a-receiving-party-would-want-to-use-predictive-coding/.
 Kleen Products, LLC, et al. v. Packaging Corp. of Amer., et al., Case: 1:10-cv-05711, Document #412 (N.D. Ill. Sept. 28, 2012).
 Relevancy dialogues, which I also call ESI Discovery Communications, are step one in my standard eight-step methodology for predictive coding. See the EDBP description found at: http://www.edbp.com/search-review/predictive-coding/.
 Kleen Products, LLC, et al. v. Packaging Corp. of Amer., et al., 2012 WL 4498465 (N.D. Ill. Sept. 28, 2012). Also see: Losey, R., Attorneys Admonished by Judge Nolan Not to “Confuse Advocacy with Adversarial Conduct” and Instructed on the Proportionality Doctrine.