Editor’s Preface: This article is an almost complete rewrite and update of an earlier blog post I wrote about a year and a half ago. I consider this bottom line, proportional approach to be critical to modern e-discovery practice, and for that reason took the time to update and improve it, and incorporate predictive coding type methods. I even included the new Gabriel Techs opinion where a fee of $2,829,349.10 for a first pass CAR relevancy review of 1,000,000 documents was approved as reasonable. (Heck, I would have done it for a mere million:)
I hope you will in turn take the time to read this new version and tell your colleagues about it. Also, I look forward to your comments. For instance, do you think my fourteen cents per document review cost estimate in the big data example I provide is way off, or about right?
I have also created a PDF version of this article, here attached, which you are free to circulate and use, so long as you do not change anything or charge anything for its distribution. Please help me to get the word out there by using this with your firm, company, school, and CLE events.
Bottom Line Driven Proportional Review
I have been working on the problem of out-of-control e-discovery costs since 2006. At that time I phased out my general trial practice, went full-time e-discovery, and started the e-Discovery Team® blog. I focused on the expense side because it was obvious that crazy high e-discovery costs were a core problem of civil litigation. They still are. The high price of e-discovery, and the uncertainty of these costs, are the main reasons most attorneys still avoid e-discovery like the plague.
The primary expense of e-discovery comes from the document search and review process; most estimate that it constitutes from 60% to 80% of the total. See Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (corporate survey found 73%).
Two Different Review Tasks
The review costs in turn arise from two different activities: identification of the likely responsive or relevant documents (aka first pass review), and then study of these selected documents to determine which must be withheld, logged, redacted, and/or labeled to protect a client’s confidential information. (It can also include specific issue tagging work unrelated to confidentiality concerns.) This second pass, final review primarily for confidentiality protection is an enormous problem in litigation for a variety of reasons, especially as it concerns attorney-client privileges. See Anonymous, An Open Letter to the Judiciary – Can We Talk? Parts One and Two.
I call this final stage of document review Protections and allocate a separate module for this legal work in my standard workflow for e-discovery legal work. I include the binary relevancy identification work in step seven, called C.A.R. for computer assisted review. The Protections work is step number eight in this best practices workflow.
My thinking and experiments since 2006 have focused on how to control both the cost of the initial relevancy review, and the cost of the final protection review for privilege and confidentiality.
At the present time, in most cases, you cannot simply dispense with final manual review of documents and rely solely on automated software for Protections. The likelihood of error is simply still too high for this to be an acceptable risk in most cases for most clients. The damage caused by disclosure of some privileged communications cannot be fully repaired by clawback agreements. See: Another Clawback Enforcement Order Shows the Importance of the Selection of Quality Vendors.
As I explained in my series Secrets of Search, Parts One, Two and Three, the software is not yet good enough to allow for a fully automated approach to Protections. I have asked around and no law firms do that now, except in unusual circumstances. Few experts advocate that approach either, even the most extreme advocates for automation (of which I’m one). The only exception I have heard of is in non-litigation circumstances, such as second reviews with production to the government, or in cases where the data under review is very unlikely to contain confidential information that the client cares anything about, such as old data of a company that it acquired.
Although automated review is not yet good enough to go it alone on the Protections step, you should use predictive coding to do first pass relevancy review, part of CAR. You should also use CAR to speed up the final manual review for Protections. But, as of today at least, in 2013, few trust the coding software alone, without human verification, to protect client confidences. Of course, it is the client’s data and their secrets, so if they really do not care that much about disclosure in any particular data set, and want to rely on clawback orders alone, they may tell you to skip this step; but you had better confirm that in writing with disclosure of the risks involved.
The Idea of Bottom Line Driven Review
After two years of thinking about the problem of high review costs to find relevant documents and protect confidential ones, I came up with an idea in early 2008 that looked promising. I have been testing and refining this invention ever since with several e-discovery teams. I have also talked about it with many other attorneys, and used this new method in many lawsuits, big and small. I call it Bottom Line Driven Proportional Review and Production. A more technical description for it, the one I used in a legal methods patent application, is: System and Method for Establishing, Managing, and Controlling the Time, Cost, and Quality of Information Retrieval and Production in Electronic Discovery. But I usually just call it Bottom Line Driven Review.
I am going to explain the idea here and urge you to try it out. The more attorneys that use this method, the more accepted it will be by judges. Right now most are hearing it from my teams for the first time, and, like anything new, it takes some explaining and getting used to. But, once understood, it appears obvious, and I expect all thinking clients will demand that their attorneys use this approach. It saves money.
The Bottom Line of Productions
The bottom line in e-discovery production is what it costs. Believe me, clients care about that … a lot! In Bottom Line Driven Proportional Review everything starts with the bottom line. What is the production going to cost? Despite what some lawyers and vendors may tell you, that is not an impossible question to answer. It takes an experienced lawyer’s skill to answer, but after a while, you can get quite good at such estimation. It is basically a matter of man-hours estimation. With my method it becomes a reliable art that you can count on. It may never be exact, but the ranges can usually be predicted, subject of course to the target changing after the estimate is given. If the complaint is amended, or different evidence becomes relevant, then a change order may be required for the new specifications.
Price estimation is second nature to me, and an obvious thing to do before you begin work on any big project. I think that is primarily because I worked as a construction estimator out of college to save up money for law school back in the seventies. Estimating legal review costs is basically the same thing, projecting materials and labor costs. In construction you come up with prices per square foot. In e-discovery you estimate prices per file, as I will explain in detail later.
My new strategy and methodology is based on the bottom line. It is based on projected review costs, defensible culling, and best-practices of computer assisted review. Under this method the producing party determines the number of documents to be subjected to costly final review by calculating backwards from the bottom line of what they are willing, or required, to pay for the production.
Setting a Budget Proportional to the Case
The process begins by the producing party calculating the maximum amount of money appropriate to spend on ESI production. A budget. This requires not only an understanding of the ESI production requests, but also a careful evaluation of the merits of the case. This is where the all important proportionality element comes in.
The amount selected for the budget should be proportional to the monies and issues in the case. Any more than that is unduly burdensome and prohibited under Rule 26(b)(2)(C), Federal Rules of Civil Procedure and other rules that underlie what is now generally known as the Proportionality Principle. See Rule 1, Rule 26(b)(2)(C), Rule 26(b)(2)(B), and Rule 26(g), Federal Rules of Civil Procedure; Commentary on Proportionality in Electronic Discovery, 11 SEDONA CONF. J. 289 (2010); Oot, Kershaw & Roitblat, Mandating Reasonableness in a Reasonable Inquiry, Denver University Law Review, 87:2, 522-559 (2010); also see Rule 403 of the Federal Rules of Evidence (exclusion of needlessly cumulative evidence). For a larger collection of authorities, including numerous cases applying proportionality in e-discovery, see Good, Better, Best: a Tale of Three Proportionality Cases – Part Two.
The budget becomes the bottom line that drives the review and keeps the costs proportional. The producing party seeks to keep the total costs within that budget. The budget should either be by agreement of the parties, or at least without objection, or by court order. The failure to estimate and project future costs, and to decide in advance to conduct the review so as to stay within the budget, is the primary reason that e-discovery costs are so high. Once you spend the money, it is very hard to have costs shifted to the requesting party. But if you raise objections and argue proportionality before the spend, then you will have a good chance.
Under my recommended Bottom Line Driven Proportional approach, after analysis of the case merits, and determination of the maximum expense for production proportional to a case, the responding party makes a good faith estimate of the likely maximum number of documents that can be reviewed within that budget. The document count represents the number of documents that you estimate can be reviewed for final decisions of relevance, confidentiality, privilege and other issues, and still remain within your budget. The review costs you estimate must be based on best practices and be accurate (no puffing).
The producing party then uses predictive coding CAR techniques and quality controls to find the documents most likely to be responsive within the number of documents that the budget allows. Predictive coding CAR is based on document relevancy ranking. That capacity makes it the perfect tool to facilitate bottom line driven review.
Any good predictive coding driven CAR evaluates the strength of relevance and irrelevance of every document in the data set analyzed. That is one reason I am especially pleased to see the price of predictive coding software finally come down. It can be done without predictive coding ranking to be sure (I did so for years), but it is harder to be accurate, especially with recall. Using best methods with CAR and relevancy ranking allows you to get the most bang for your buck, the core truth. That in turn helps persuade the requesting party or court to go along with your budgetary limits.
Small Case Example
An example may help clarify how it works. If you set a proportional cost for a case of $100,000, and estimate based on sampling and other hard facts that it will cost you $2.50 per file for both the automated and manual review before production of the ESI at issue (steps seven and eight of the Electronic Discovery Best Practices flow chart), then you can review no more than 40,000 documents and stay within budget. It is basically that simple. No higher math is required.
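For readers who like to see the arithmetic spelled out, the budget-to-document-count calculation can be sketched in a few lines of Python. This is purely illustrative; the function name is mine, and the figures are just the ones from this example:

```python
def max_reviewable_docs(budget, cost_per_file):
    """Work backwards from the bottom line: how many documents
    can be reviewed without exceeding the proportional budget?"""
    return int(budget // cost_per_file)

# The small case example: $100,000 budget at $2.50 per reviewed file.
print(max_reviewable_docs(100_000, 2.50))  # 40000
```

Change the budget or the per-file estimate and the reviewable document count changes with it. That is the whole method in miniature: the bottom line drives the review, not the other way around.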
The most difficult part is the legal analysis to determine a budget proportional to the real merits of the case. But that is nothing new. What is the golden mean in litigation expense? How to balance just, with speedy and inexpensive? The essence of the ideal proportionality question has preoccupied lawyers for decades. It has also preoccupied scientists, mathematicians, and artists for centuries. They claim to have found an answer that they call the golden mean or golden ratio.
In law this is the perennial Goldilocks question. How much is too much? Too little? Just right? How much is an appropriate spend to produce documents? The issue is old. I have personally been dealing with this problem for over thirty-three years. What’s new is applying that legal analysis to a modern-day high-volume-ESI search and review plan. Unfortunately, unlike art and math, there is no accepted golden ratio in the law, so it has to be recalculated and reargued for each case.
Side Note: If the golden ratio were accepted in law as an ideal proportionality, the number is 1.61803399, aka Phi. That would mean 38% is the perfect proportion. I have argued that when applied to litigation that means the total cost of litigation should never exceed 38% of the amount at issue. In turn, the total cost of discovery should not exceed 38% of the total litigation cost, and the cost of document production should not exceed 38% of the total costs of discovery (as opposed to our current 73% reality). (It’s like Russian nesting dolls that get proportionally smaller.) Thus for a $1 million case you should not spend more than $54,872 for document productions (1,000,000 → 380,000 → 144,400 → 54,872). See Losey, R., Beware of the ESI-discovery-tail wagging the poor old merits-of-the-dispute dog. But I digress too far.
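If you want to play with the nesting-doll arithmetic yourself, here is a small sketch. It illustrates my digression, not a rule of law; the function name and the repeated 38% ratio are mine:

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618; the smaller golden
                          # segment is ~38.2% of the whole

def nested_golden_budgets(amount_at_issue, levels=3, ratio=0.38):
    """Apply the 38% 'Russian nesting doll' proportion repeatedly:
    litigation cost -> discovery cost -> document production cost."""
    budgets = []
    for _ in range(levels):
        amount_at_issue *= ratio
        budgets.append(round(amount_at_issue))
    return budgets

# The $1 million case from the side note.
print(nested_golden_budgets(1_000_000))  # [380000, 144400, 54872]
```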
Estimation for bottom line driven review is essentially a method for marshaling evidence to support an undue burden argument under Rule 26(b)(2)(C). It is basically the same thing we have been doing to support motions for protective orders in the paper production world for over sixty years. The only difference is that now the facts are technological, the numbers and variety of documents are enormous, sometimes astronomical, and the methods of review are very complex and not yet standardized.
Estimate of Projected Costs
The calculation of projected cost per file to review can be quite complicated, and is frequently misunderstood, or is not based on best practices. Still, in essence this cost projection is also fairly simple. You basically project how long it will take to do the review and the total cost of the time. (You may also have to factor in the materials costs, i.e., software usage fees.)
Thus, for example (and this is an over-simplification), assume again our review project of 40,000 documents. Note it probably started as 100,000 or 200,000 documents, but you bulk-culled it down (Culling is step six in the EDBP) before beginning review by making such legal decisions as custodian ranking and phasing, date ranges, and file types.
Your next step is to identify the relevant documents from the 40,000 remaining after bulk culling. This is often called first pass relevancy review. It sets the stage for the Protections review, where documents coded relevant, and only those documents, are then re-reviewed for privilege and confidentiality, and redacted, labeled, and logged as needed. They are often also issue tagged at this stage for the later use and convenience of trial lawyers. Mistakes in first pass relevancy review are also corrected.
The first pass relevancy review used to be done (and still is by out-of-date lawyers and review companies) by having a lawyer actually look at, meaning skim or read, each of the 40,000 documents. Using low paid contract lawyers, this kind of first-pass relevancy review typically goes at a rate of between 50 and 100 files per hour. But by using CAR, a skilled search expert, who must also be a subject matter expert (SME) for predictive coding to work, can attain speeds in excess of 10,000 files per hour for first pass review.
Yes, a good SME can use machine training – predictive coding techniques – and determine file relevancy at a speed more than 100 times faster than a contract lawyer, and can do so far more accurately. That is why the SME with good software can charge forty times as much as a contract lawyer, if not more, and still do the first pass review at a fraction of the cost. For an example of this cost-savings approach, where a team of SMEs doing first pass review were paid $2,829,349.10 for their efforts, which was upheld as reasonable in a subsequent hearing on a fee award, see: Gabriel Techs., Corp. v. Qualcomm, Inc., No. 08CV1992 AJB (MDD), 2013 WL 410103 (S.D. Cal. Feb. 1, 2013) (keyword bulk culled 12,000,000 docs down to 1,000,000 (not a best practice) and then used predictive coding type CAR (rule based, not machine learning) algorithms to determine relevancy of the remaining 1,000,000 documents).
I do not know what speeds the experts in Gabriel achieved for their CAR review of a million documents, but in my multimodal, machine learning type of predictive coding review of 699,082 ENRON documents for evidence concerning involuntary employee terminations, a fairly simple relevancy determination, the first pass review was completed at an average speed of 13,444 files per hour. See Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. Speeds such as this are common in the type of employment law issues I have been dealing with in the past couple of years, but similar speeds are attainable in other types of cases too. For instance, I recently completed another more complex review of over a million and a half documents at an average speed of 35,831 files per hour. I did this review myself, an Army of One, since I happened to be the only SME available for this project involving non-employment law legal issues.
Back to the example, let us assume a first pass CAR review speed of 10,000 files per hour. That means an SME could complete the review in only four hours. (Typically SMEs drive CARs in cases with far more documents than this to review; still, I have done it in small cases with only 40,000 documents before. The math and cost savings still work if the predictive coding software cost is not too high.) It would probably take the SME that much time to become an expert in the particular factual issues in the case, so let us budget a total time of eight hours, and a review rate of $1,000 per hour. That means an expense for first pass review (excluding software charges) of $8,000.00, which is 40% less than manual review.
Under a traditional contract lawyer review, where we assume a very fast speed (for them) of 75 files per hour, and a very low unmotivated lawyer rate of $25.00 per hour, you have a projected fee of $13,333 (a 66% higher cost). Even though the $1,000 rate is 40 times higher than the contract lawyers’ rate, since the SME is 133 times faster, the net savings is still 40%. That’s because it would take the contract lawyers 533.33 hours to complete the project, and, this is important, they would necessarily do so with a far lower accuracy rate. That’s right, they are likely to find far fewer relevant documents than the automated SME approach. This makes clear the power and importance of SMEs driving CARs, and why, along with their current scarcity, they are now in such demand. See David Cowen, Job Market Heating Up for e-Discovery Technologists, Managers, and Attorneys.
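To make the comparison concrete, the two first-pass approaches can be computed side by side. This is a sketch only; the speeds, rates, and the assumed four hours of SME ramp-up time are the figures from the example above:

```python
def first_pass_cost(doc_count, files_per_hour, hourly_rate):
    """Cost of first pass relevancy review: hours of labor times rate."""
    hours = doc_count / files_per_hour
    return hours * hourly_rate

# Contract lawyer review: 75 files/hour at $25/hour.
contract = first_pass_cost(40_000, 75, 25)

# SME-driven CAR: 10,000 files/hour at $1,000/hour,
# plus an assumed 4 hours to learn the factual issues of the case.
sme = first_pass_cost(40_000, 10_000, 1_000) + 4 * 1_000

print(round(contract))  # 13333
print(round(sme))       # 8000
```

The SME costs less in absolute dollars despite the 40-times-higher hourly rate, because speed dominates the calculation.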
Side Note: To back up the likely far better accuracy claim for an SME driving a CAR, please see the following scientific studies: Grossman & Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Rich. J.L. & Tech., Spring 2011. Grossman & Cormack, Inconsistent Responsiveness Determination; Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010; Voorhees,Variations in relevance judgments and the measurement of retrieval effectiveness, 36:5 Information Processing & Management 697, 701 (2000).
Back to the example, after the first pass review comes the slower protections review. Now the highly skilled SMEs are no longer required. The lower-paid contract lawyers can do the review on the documents the SMEs have determined to be relevant. Assume that the first pass review found that 10,000 of the 40,000 documents were relevant. This means that 10,000 documents are subject to confidentiality Protections review (step eight of EDBP). Let us assume this work goes at an average rate of 50 files per hour. This means a final pass review should be completed in 200 hours at a cost of $5,000.00. So our base minimum review cost for both passes is $13,000.00.
I say base minimum because you have additional expenses beyond just contract reviewer time, including the expense of partner and senior associate management time, direct supervision of contract lawyers, quality control reviews, etc., plus software costs, which, depending on the vendor and your deal, can sometimes be very high. Let us assume that is another $7,000.00 cost here, for a total expense of $20,000. You would then have completed your review of 40,000 documents at a cost of fifty cents per document. That is pretty good. But in larger projects, where you are talking about millions of documents, and more realistic prevalence rates, frequently less than 5%, the savings are even higher, and the per document rate even lower, sometimes, much lower.
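Putting both passes and the overhead together, the whole small case estimate reduces to a few lines. Again this is illustrative; the prevalence, rates, ramp-up hours, and overhead are the assumed figures from this example, and the per-document cost is spread over the full document set reviewed in the first pass:

```python
def review_estimate(doc_count, prevalence, sme_files_hr, sme_hourly,
                    sme_ramp_hours, pass2_files_hr, pass2_hourly, overhead):
    """Two-pass bottom line estimate: SME first pass over all documents,
    contract-lawyer Protections review over the likely relevant subset,
    plus supervision and software overhead. Returns (total, cost per doc)."""
    first = (doc_count / sme_files_hr + sme_ramp_hours) * sme_hourly
    second = (doc_count * prevalence / pass2_files_hr) * pass2_hourly
    total = first + second + overhead
    return total, total / doc_count

# 40,000 docs, 25% relevant, SME pass then 50 files/hr Protections review.
total, per_doc = review_estimate(40_000, 0.25, 10_000, 1_000, 4, 50, 25, 7_000)
print(round(total), per_doc)  # 20000 0.5
```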
You could estimate all of these costs in advance by having a bank of experience to draw upon, knowing the likely costs per file range. But remember, even in the world of repeat litigation, like employment law claims, all projects are different. All document sets are different. You have to, as I like to say, get your hands dirty in the digital mud. You have to know your ESI collection. Even in just one type of ESI, the one most common in e-discovery today, email and attachments, the variances in email collections can be tremendous.
The review speeds and thus review costs depend on the density of the documents and difficulty of the determinations. That is where the art of estimation comes in and depends on your understanding of the project. You have to understand the blueprints and specs of a project before you can make a valid estimation.
This is especially true of the SME work. You need to do some sampling to see what review rates apply. How long will it take these particular SMEs or contract reviewers to do the tasks assigned to them in this case with this data. Sampling is the only reliable way to do that, especially when it comes to the all important prevalence calculations.
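Sampling for prevalence is itself simple to sketch. In the toy illustration below, the `judge` callable stands in for the SME's document-by-document relevance call, and the synthetic collection is invented purely for demonstration; real prevalence estimation would of course run against your actual ESI:

```python
import random

def estimate_prevalence(collection, judge, sample_size=1_000, seed=42):
    """Estimate prevalence (fraction of relevant documents) by having
    an SME judge a simple random sample of the collection."""
    random.seed(seed)
    sample = random.sample(collection, sample_size)
    hits = sum(1 for doc in sample if judge(doc))
    return hits / sample_size

# Synthetic collection where roughly 10% of documents are 'relevant'.
docs = ["relevant" if i % 10 == 0 else "irrelevant" for i in range(100_000)]
print(estimate_prevalence(docs, lambda d: d == "relevant"))
```

The estimate will come back near 0.10, with sampling error shrinking as the sample grows. That prevalence number is what lets you project the size, and therefore the cost, of the second pass review.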
A Big Data Example
Let us change the scenario somewhat for a final quick example. Assume 10,000,000 documents after culling for your SMEs to review. Assume sampling by an SME showed a prevalence of 10% (somewhat high), and a CAR review rate of 10,000 files per hour (somewhat slow for Big Data reviews). This means you can assume that only around 1,000,000 documents will need final Protection review.
More sampling shows the contract reviewers will, using CAR techniques (smart routing, etc), be able to review them at 100 files per hour.
With this information from sampling you can now estimate a total first pass review cost of $1,000,000 ($1,000 x 1,000 hours), which compares pretty well to the $2,829,349.10 charge in Gabriel. Next you can estimate a total final pass Protection review cost of $250,000 ($25 x 10,000 hours).
Let us also assume you know from experience that your other supervision fees and software costs are likely to total another $150,000. Your estimate for the project would be $1,400,000. That is fourteen cents a document.
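The big data arithmetic is the same calculation at scale. Here is a sketch using the example's assumed figures:

```python
# Big data example: 10,000,000 documents remaining after culling.
docs = 10_000_000
first_pass = (docs / 10_000) * 1_000    # SME at 10,000 files/hr, $1,000/hr
second_pass = (docs * 0.10 / 100) * 25  # 10% prevalence; contract review
                                        # at 100 files/hr, $25/hr
overhead = 150_000                      # assumed supervision and software

total = first_pass + second_pass + overhead
print(int(total), total / docs)  # 1400000 0.14
```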
Too high you say? Not proportionate to the value of the case? Not proportionate to expected probative value in this case from these 10,000,000 documents, which is something your sampling can tell you and provide evidence to support? Then use ranking.
If the SME’s identification of 1,000,000 likely relevant documents was based on a 50% plus probability ranking using predictive coding, then try a higher ranking cut off. Again, not too difficult to do with sampling and good software. Maybe a 75% plus ranking cut off brings the count down from 1,000,000 to 250,000. Or maybe you just arbitrarily decide to use the top ranked 250,000 documents because that is all you can afford, or think is proportionate for this data and this case. Either way, you are now only passing the strongest documents along for second pass review.
Using the higher cut off, the cost for second pass Protection review would then be 25% of what it was, reduced from $250,000 for review of 1,000,000 documents, to $62,500 to review 250,000 documents. Assume from experience that your other fees and costs also drop by 50%, from $150,000 to $75,000. Your total estimate is now $1,137,500, instead of $1,400,000. You have gone down to just over eleven cents a document. Assume this $1,137,500 number is now within your legal judgment to be proportional to this document request. It is now within your budget. You are done.
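The effect of raising the ranking cut off can be sketched the same way. This is illustrative only; the 25% keep-fraction and the halved overhead are the assumptions from this example:

```python
def cutoff_scenario(first_pass, relevant_docs, keep_fraction,
                    pass2_files_hr, pass2_hourly, overhead):
    """Re-estimate after raising the predictive coding ranking cut off:
    only the top-ranked fraction goes on to Protections review."""
    kept = relevant_docs * keep_fraction
    second = (kept / pass2_files_hr) * pass2_hourly
    return first_pass + second + overhead

# 75%+ cut off keeps 250,000 of the 1,000,000 likely relevant documents;
# other fees and costs assumed to fall from $150,000 to $75,000.
print(int(cutoff_scenario(1_000_000, 1_000_000, 0.25, 100, 25, 75_000)))
# 1137500
```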
If it was still not proportional in your judgment, or the client’s opinion, there are many other things you could try, but I would focus on the bulk culling before the SME first pass relevancy review. Change the custodian count, or date range (but please, do not keyword search filter like they did in Gabriel). Bring the initial 10,000,000 documents down to 5,000,000 documents, then do the math. You are now talking about around $700,000, back to fourteen cents per document. Is that within the budget? Is that an amount that a court is likely to force you to spend anyway?
Side Note: The costs of review have come way, way down in the past few years for those who are using CAR methods. For some context on the fourteen cents a document number used in this example, back in 2007 the Department of Justice spent $9.09 per document for review in the Fannie Mae case, even though it used contract lawyers for the review work. In re Fannie Mae Securities Litig., 552 F.3d 814, 817 (D.C. Cir. 2009) ($6,000,000/660,000 emails). There were no comments by the court that this price was excessive when the government later came back and sought cost shifting. At about the same time Verizon paid $6.09 per document for a massive second review project that enjoyed large economies of scale and, again, utilized contract review lawyers. Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010 ($14,000,000 to review 2.3 million documents in four months). In 2011 I was still seeing an average cost of $5.00 per file for reviews before CARs with advanced predictive coding engines started to become available.
The More Bang for the Buck Bottom Line Ranked Approach is Good for Both the Requesting Party and the Producing Party
The use of CARs and predictive coding ranking so that you only review and produce the best documents, the ones with the highest relevancy ranking, gives everyone the most truth for the dollar. For that reason this is the best approach, not only for the producing party, who can thereby budget and avoid disproportionate burdens, but also for a requesting party. The plaintiffs in Kleen Products realized that, which is why they tried to force the defendants to use predictive coding in their productions. Kleen Products, LLC, et al. v. Packaging Corp. of Amer., et al., No. 1:10-cv-05711, Dkt. No. 412 (N.D. Ill. Sept. 28, 2012).
In spite of the Kleen Products precedent, a producing party will often need to sell the benefits of these new methods to the requesting party. The requesting party will be more likely to cooperate if they understand the value to them of these methods. This will often require the producing party to provide some reasonable degree of transparency into the proposed review processes. For instance, if you’ve got a hot SME lined up to drive the CAR, tell them. Share his or her qualifications and experience. Engage in relevancy dialogues with them. Make sure you are training the machine to find the documents that they really want. Clarify the target. If need be, share some examples early on of the relevant documents you will use in the training. Invite them to provide you with documents they consider relevant to use in the training. Try to make it at least a somewhat collaborative approach. If they refuse, document your efforts and take any disputes to the judge as early as possible.
An at least quasi-collaborative approach is how you can convince a requesting party that the estimate is reasonable and that you are not still stuck in the old paradigm of hide-the-ball discovery games. I cannot overstate how important it is to develop trust between opposing counsel on discovery. Often the only way to do that is through some level of transparency. You do not have to disclose all of your search secrets, but you may have to keep the requesting party somewhat informed and involved in the process. That is what cooperation looks like.
All Review Projects Are Different
In order to be able to make a valid estimate for bottom line driven proportional review you must closely study the case and review project goals. It is not enough to have handy per file cost estimates. This move to actual examination of the ESI at issue, and study of the specific review tasks, is equivalent to the move in construction estimation from rough estimates based on average per square foot prices, to a careful study of the building’s plans and specifications, and a site visit with inspection and measurements of all relevant conditions. No builder would bid on a project without first doing the detailed real world estimation work. We lawyers must do the same for this method to succeed.
Even in the same organization, and just dealing with email, the variances between custodians can be tremendous. Some for instance may have large amounts of privileged communications. This kind of email takes the most time to review, and if relevant, to log. High percentages of confidential documents, especially partially confidential, can also significantly drive up the costs of the second pass review. All of the many unique characteristics of ESI collections can affect the speed of review and total costs of review. That is why you have to look at your data and test sample the emails in your collection to make accurate predictions. Estimation in the blind is never adequate. It would be like bidding on a building without first reading the plans and specs.
Even when you have dealt with a particular client’s email collection before, a repeat customer so to speak, the estimates can still vary widely depending on the type of lawsuit, the issues, and on the amount of money in controversy or general importance of the case.
Although this may seem counter-intuitive, the truth is, complex, big-ticket cases are the easiest in which to do e-discovery, especially if your goal is to do so in a proportional manner. If there is a billion dollars at issue, a reasonable budget for ESI review is pretty big. On the other hand, proportional e-discovery in small cases is a real challenge, no matter how simple they supposedly are. Many cases that are small in monetary value are still very complex. And complex or not, all cases today have a lot of ESI.
The medium size to small cases are where my bottom line driven proportional review has the highest application for cost control and the greatest promise to bring e-discovery to the masses. Give it a try, with or without predictive coding.
The future of discovery involves new methods of technology-assisted discovery where Man and Machine work together to find the core truth. This day will come; in fact it is already here. As William Gibson said: The future is already here – it’s just not evenly distributed yet. The key facts you need to try a case and to do justice can be found in any size case, big and small, at an affordable price. But you have to embrace change and adopt new legal and technical methodologies. The Bottom Line Driven Proportional Review method is part of that answer; so too is advanced review software at affordable prices. It is already working for some today; it can work for you too.