Editor’s Preface: This article is an almost complete rewrite and update to an earlier blog I wrote about a year and a half ago. I consider this bottom line, proportional approach to be critical to modern e-discovery practice, and for that reason took the time to update and improve it, and incorporate predictive coding type methods. I even included the new Gabriel Techs opinion where a fee of $2,829,349.10 for a first pass CAR relevancy review of 1,000,000 documents was approved as reasonable. (Heck, I would have done it for a mere million:)
I hope you will in turn take the time to read this new version and tell your colleagues about it. Also, I look forward to your comments. For instance, do you think my fourteen cents per document review cost estimate in the big data example I provide is way off, or about right?
I have also created a PDF version of this article, here attached, which you are free to circulate and use, so long as you do not change anything or charge anything for its distribution. Please help me to get the word out there by using this with your firm, company, school, and CLE events.
__________
Bottom Line Driven Proportional Review

I have been working on the problem of out-of-control e-discovery costs since 2006. At that time I phased out my general trial practice, went full-time e-discovery, and started the e-Discovery Team® blog. I focused on the expense side because it was obvious that crazy high e-discovery costs were a core problem of civil litigation. It still is. The high price of e-discovery, and the uncertainty of these costs, is the main reason most attorneys still avoid e-discovery like the plague.
The primary expense of e-discovery comes from the document search and review process; most estimate that it constitutes from 60% to 80% of the total. See Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (corporate survey found 73%).
Two Different Review Tasks
The review costs in turn arise from two different activities: identification of the likely responsive or relevant documents (aka first pass review), and then study of these selected documents to determine which must be withheld, logged, redacted, and/or labeled to protect a client’s confidential information. (It can also include specific issue tagging work unrelated to confidentiality concerns.) This second pass, final review primarily for confidentiality protection is an enormous problem in litigation for a variety of reasons, especially as it concerns attorney-client privileges. See Anonymous, An Open Letter to the Judiciary – Can We Talk? Parts One and Two.
I call this final stage of document review Protections and allocate a separate module for this legal work in my standard workflow for e-discovery legal work. I include the binary relevancy identification work in step seven, called C.A.R. for computer assisted review. The Protections work is step number eight in this best practices workflow.
My thinking and experiments since 2006 have focused on how to control both the cost of the initial relevancy review, and the cost of the final protection review for privilege and confidentiality.
At the present time, in most cases, you cannot simply dispense with final manual review of documents and rely solely on automated software for Protections. The likelihood of error is simply still too high for this to be an acceptable risk in most cases for most clients. The damage caused by disclosure of some privileged communications cannot be fully repaired by clawback agreements. See: Another Clawback Enforcement Order Shows the Importance of the Selection of Quality Vendors.
As I explained in my series Secrets of Search, Parts One, Two and Three, the software is not yet good enough to allow for a fully automated approach to Protections. I have asked around and no law firms do that now, except in unusual circumstances. Few experts advocate that approach either, even the most extreme advocates for automation (of which I’m one). The only exception I have heard of is in non-litigation circumstances, such as second reviews with production to the government, or in cases where the data under review is very unlikely to contain confidential information that the client cares anything about, such as old data of a company that it acquired.
Although automated review is nowhere near good enough, yet, to go it alone on the Protection step, you should use predictive coding to do first pass relevancy review, part of CAR. You should also use CAR to speed up the final manual review for Protections. But, as of today at least, in 2013, few trust the coding software alone, without human verification, to protect client confidences. Of course, it is the client’s data and their secrets, so if they really do not care that much about disclosure in any particular data set, and want to rely on clawback orders alone, they may tell you to skip this step; but you had better confirm that in writing with disclosure of the risks involved.
The Idea of Bottom Line Driven Review

After two years of thinking about the problem of high review costs to find relevant documents and protect confidential ones, I came up with an idea in early 2008 that looked promising. I have been testing and refining this invention ever since with several e-discovery teams. I have also talked about it with many other attorneys, and used this new method in many lawsuits, big and small. I call it Bottom Line Driven Proportional Review and Production. A more technical description for it, the one I used in a legal methods patent application, is: System and Method for Establishing, Managing, and Controlling the Time, Cost, and Quality of Information Retrieval and Production in Electronic Discovery. But I usually just call it Bottom Line Driven Review.
I am going to explain the idea here and urge you to try it out. The more attorneys that use this method, the more accepted it will be by judges. Right now most are hearing it from my teams for the first time, and, like anything new, it takes some explaining and getting used to. But, once understood, it appears obvious, and I expect all thinking clients will demand that their attorneys use this approach. It saves money.
The Bottom Line of Productions
The bottom line in e-discovery production is what it costs. Believe me, clients care about that …. a lot! In Bottom Line Driven Proportional Review everything starts with the bottom line. What is the production going to cost? Despite what some lawyers and vendors may tell you, that is not an impossible question to answer. It takes an experienced lawyer’s skill to answer, but after a while, you can get quite good at such estimation. It is basically a matter of man-hours estimation. With my method it becomes a reliable art that you can count on. It may never be exact, but the ranges can usually be predicted, subject of course to the target changing after the estimate is given. If the complaint is amended, or different evidence becomes relevant, then a change order may be required for the new specifications.
Price estimation is second nature to me, and an obvious thing to do before you begin work on any big project. I think that is primarily because I worked as a construction estimator out of college to save up money for law school back in the seventies. Estimating legal review costs is basically the same thing, projecting materials and labor costs. In construction you come up with prices per square foot. In e-discovery you estimate prices per file, as I will explain in detail later.
My new strategy and methodology is based on the bottom line. It is based on projected review costs, defensible culling, and best-practices of computer assisted review. Under this method the producing party determines the number of documents to be subjected to costly final review by calculating backwards from the bottom line of what they are willing, or required, to pay for the production.
Setting a Budget Proportional to the Case
The process begins by the producing party calculating the maximum amount of money appropriate to spend on ESI production. A budget. This requires not only an understanding of the ESI production requests, but also a careful evaluation of the merits of the case. This is where the all important proportionality element comes in.
The amount selected for the budget should be proportional to the monies and issues in the case. Any more than that is unduly burdensome and prohibited under Rule 26(b)(2)(C), Federal Rules of Civil Procedure and other rules that underlie what is now generally known as the Proportionality Principle. See Rule 1, Rule 26(b)(2)(C), Rule 26(b)(2)(B), and Rule 26(g), Federal Rules of Civil Procedure; Commentary on Proportionality in Electronic Discovery, 11 SEDONA CONF. J. 289 (2010); Oot, Kershaw & Roitblat, Mandating Reasonableness in a Reasonable Inquiry, Denver University Law Review, 87:2, 522-559 (2010); also see Rule 403 of the Federal Rules of Evidence (exclusion of needlessly cumulative evidence). For a larger collection of authorities, including numerous cases applying proportionality in e-discovery, see Good, Better, Best: a Tale of Three Proportionality Cases – Part Two.
The budget becomes the bottom line that drives the review and keeps the costs proportional. The producing party seeks to keep the total costs within that budget. The budget should either be by agreement of the parties, or at least without objection, or by court order. The failure to estimate and project future costs, and to decide in advance to conduct the review so as to stay within the budget, is the primary reason that e-discovery costs are so high. Once you spend the money, it is very hard to have costs shifted to the requesting party. But if you raise objections and argue proportionality before the spend, then you will have a good chance.
Under my recommended Bottom Line Driven Proportional approach, after analysis of the case merits, and determination of the maximum expense for production proportional to a case, the responding party makes a good faith estimate of the likely maximum number of documents that can be reviewed within that budget. The document count represents the number of documents that you estimate can be reviewed for final decisions of relevance, confidentiality, privilege and other issues, and still remain within your budget. The review costs you estimate must be based on best practices and be accurate (no puffing).
The producing party then uses predictive coding CAR techniques and quality controls to find the documents most likely to be responsive within the number of documents that the budget allows. Predictive coding CAR is based on document relevancy ranking. That capacity makes it the perfect tool to facilitate bottom line driven review.
Any good predictive coding driven CAR evaluates the strength of relevance and irrelevance of every document in the data set analyzed. That is one reason I am especially pleased to see the price of predictive coding software finally come down. Bottom line driven review can be done without predictive coding ranking to be sure (I did so for years), but it is harder to be accurate, especially with recall. Using best methods with CAR and relevancy ranking allows you to get the most bang for your buck, the core truth. That in turn helps persuade the requesting party or court to go along with your budgetary limits.
Small Case Example
An example may help clarify how it works. If you set a proportional cost for a case of $100,000, and estimate based on sampling and other hard facts that it will cost you $2.50 per file for both the automated and manual review before production of the ESI at issue (steps seven and eight of the Electronic Discovery Best Practices flow chart), then you can review no more than 40,000 documents and stay within budget. It is basically that simple. No higher math is required.
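The backwards calculation from budget to document count can be sketched in a few lines. This is only an illustration of the arithmetic in the example above; the function name and parameter names are my own labels, not part of any standard tool.

```python
def max_reviewable_documents(budget: float, cost_per_file: float) -> int:
    """Largest whole number of documents that can be reviewed within the budget."""
    if cost_per_file <= 0:
        raise ValueError("cost_per_file must be positive")
    return int(budget // cost_per_file)


# Figures from the small case example: $100,000 budget, $2.50 per file
# for both the automated and manual review.
small_case_cap = max_reviewable_documents(budget=100_000, cost_per_file=2.50)
print(f"Documents reviewable within budget: {small_case_cap:,}")
```

The hard part is not this division. It is defending the two inputs: the proportional budget and the sampled cost per file.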
The most difficult part is the legal analysis to determine a budget proportional to the real merits of the case. But that is nothing new. What is the golden mean in litigation expense? How to balance just, with speedy and inexpensive? The essence of the ideal proportionality question has preoccupied lawyers for decades. It has also preoccupied scientists, mathematicians, and artists for centuries. They claim to have found an answer that they call the golden mean or golden ratio.
In law this is the perennial Goldilocks question. How much is too much? Too little? Just right? How much is an appropriate spend to produce documents? The issue is old. I have personally been dealing with this problem for over thirty-three years. What’s new is applying that legal analysis to a modern-day high-volume-ESI search and review plan. Unfortunately, unlike art and math, there is no accepted golden ratio in the law, so it has to be recalculated and reargued for each case.
Side Note: If the golden ratio were accepted in law as an ideal proportionality, the number is 1.61803399, aka Phi. Dividing a whole by Phi leaves a larger part of about 62% and a smaller part of about 38%, so 38% would be the perfect proportion. I have argued that when applied to litigation that means the total cost of litigation should never exceed 38% of the amount at issue. In turn, the total cost of discovery should not exceed 38% of the total litigation cost, and the cost of document production should not exceed 38% of the total costs of discovery (as opposed to our current 73% reality). (It’s like Russian nesting dolls that get proportionally smaller.) Thus for a $1 million case you should not spend more than $54,872 for document productions (1,000,000 x 38% = 380,000; 380,000 x 38% = 144,400; 144,400 x 38% = 54,872). See Losey, R., Beware of the ESI-discovery-tail wagging the poor old merits-of-the-dispute dog. But I digress too far.
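For readers who want to replay the nesting-doll arithmetic, here is a minimal sketch. It assumes the rounded 38% figure used in the side note (the unrounded value is closer to 38.2%), and the function name is just an illustrative label.

```python
def nested_caps(amount_at_issue: float, ratio: float = 0.38, levels: int = 3) -> list:
    """Apply the same ratio repeatedly: litigation, then discovery, then production."""
    caps = []
    current = amount_at_issue
    for _ in range(levels):
        current = current * ratio
        caps.append(round(current, 2))
    return caps


# $1 million at issue, using the rounded 38% from the side note.
litigation_cap, discovery_cap, production_cap = nested_caps(1_000_000)
print(f"Total litigation cap:    ${litigation_cap:,.0f}")
print(f"Total discovery cap:     ${discovery_cap:,.0f}")
print(f"Document production cap: ${production_cap:,.0f}")
```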
Estimation for bottom line driven review is essentially a method for marshaling evidence to support an undue burden argument under Rule 26(b)(2)(C). It is basically the same thing we have been doing to support motions for protective orders in the paper production world for over sixty years. The only difference is that now the facts are technological, the numbers and variety of documents are enormous, sometimes astronomical, and the methods of review are very complex and not yet standardized.
Estimate of Projected Costs
The calculation of projected cost per file to review can be quite complicated, and is frequently misunderstood, or is not based on best practices. Still, in essence this cost projection is also fairly simple. You basically project how long it will take to do the review and the total cost of the time. (You may also have to factor in the materials costs, i.e., software usage fees.)
Thus, for example (and this is an over-simplification), assume again our review project of 40,000 documents. Note it probably started as 100,000 or 200,000 documents, but you bulk-culled it down (Culling is step six in the EDBP) before beginning review by making such legal decisions as custodian ranking and phasing, date ranges, and file types.
Your next step is to identify the relevant documents from the 40,000 remaining after bulk culling. This is often called first pass relevancy review. It sets the stage for the Protections review, where documents coded relevant, and only those documents, are then re-reviewed for privilege and confidentiality, and then redacted, labeled, and logged as needed. They are often also issue tagged at this stage for the later use and convenience of trial lawyers. Mistakes in first pass relevancy review are also corrected.
The first pass relevancy review used to be done (and still is by out-of-date lawyers and review companies) by having a lawyer actually look at, meaning skim or read, each of the 40,000 documents. Using low paid contract lawyers, this kind of first-pass relevancy review typically goes at a rate of between 50 and 100 files per hour. But by using CAR, a skilled search expert, who must also be a subject matter expert (SME) for predictive coding to work, can attain speeds in excess of 10,000 files per hour for first pass review.
Yes, a good SME can use machine training (predictive coding techniques) and determine file relevancy at a speed more than 100 times faster than a contract lawyer, and can do so far more accurately. That is why the SME with good software can charge twenty times as much as a contract lawyer, if not more, and still do the first pass review at a fraction of the cost. For an example of this cost-savings approach, where a team of SMEs doing first pass review were paid $2,829,349.10 for their efforts, which was upheld as reasonable in a subsequent hearing on a fee award, see: Gabriel Techs., Corp. v. Qualcomm, Inc., No. 08CV1992 AJB (MDD), 2013 WL 410103 (S.D. Cal. Feb. 1, 2013) (keyword bulk culled 12,000,000 docs down to 1,000,000 (not a best practice) and then used predictive coding type CAR (rule based, not machine learning) algorithms to determine relevancy of the remaining 1,000,000 documents).
I do not know what speeds the experts in Gabriel achieved for their CAR review of a million documents, but in my multimodal, machine learning type of predictive coding review of 699,082 ENRON documents for evidence concerning involuntary employee terminations, a fairly simple relevancy determination, the first pass review was completed at an average speed of 13,444 files per hour. See Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. Speeds such as this are common in the type of employment law issues I have been dealing with in the past couple of years, but similar speeds are attainable in other types of cases too. For instance, I recently completed another more complex review of over a million and a half documents at an average speed of 35,831 files per hour. I did this review myself, an Army of One, since I happened to be the only SME available for this project involving non-employment law legal issues.
Back to the example, let us assume a first pass CAR review speed of 10,000 files per hour. That means an SME could complete the review in only four hours. (Typically SMEs drive CARs in cases with far more documents than this to review; still, I have done it in small cases with only 40,000 documents before. The math and cost savings still work if the predictive coding software cost is not too high.) It would probably take the SME that much time to become an expert in the particular factual issues in the case, so let us budget a total time of eight hours, and a review rate of $1,000 per hour. That means an expense for first pass review (excluding software charges) of $8,000.00, which is 40% less than manual review.
Under a traditional contract lawyer review, where we assume a very fast speed (for them) of 75 files per hour, and a very low unmotivated lawyer rate of $25.00 per hour, you have a projected fee of $13,333 (a 66% higher cost). Even though the $1,000 rate is 40 times higher than the contract lawyers’ rate, since the SMEs are 133.33 times faster, the net savings is still 40%. That’s because it would take the contract lawyers 533.33 hours to complete the project, and, this is important, they would necessarily do so with a far lower accuracy rate. That’s right, they are likely to find far fewer relevant documents than the automated SME approach. This makes clear the power and importance of SMEs driving CARs, and why, along with their current scarcity, they are now in such demand. See David Cowen, Job Market Heating Up for e-Discovery Technologists, Managers, and Attorneys.
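The first pass comparison is easy to check for yourself. The sketch below uses only the speeds and rates assumed in this example; the helper function and its parameter names are illustrative labels. It reports the gap both ways, since "percent higher" and "percent lower" are measured against different baselines.

```python
def review_cost(documents, files_per_hour, hourly_rate, extra_hours=0.0):
    """Labor cost of a review pass: review hours plus any preparation hours."""
    hours = documents / files_per_hour + extra_hours
    return hours * hourly_rate


docs = 40_000

# SME driving a CAR: 4 hours of review plus 4 hours to learn the case, at $1,000/hour.
car_cost = review_cost(docs, files_per_hour=10_000, hourly_rate=1_000, extra_hours=4)

# Contract lawyers: 75 files per hour at $25/hour.
manual_cost = review_cost(docs, files_per_hour=75, hourly_rate=25)

savings_vs_manual = (manual_cost - car_cost) / manual_cost  # CAR measured against manual
premium_vs_car = (manual_cost - car_cost) / car_cost        # manual measured against CAR

print(f"CAR first pass:    ${car_cost:,.2f}")
print(f"Manual first pass: ${manual_cost:,.2f}")
print(f"CAR is {savings_vs_manual:.0%} cheaper than manual review")
print(f"Manual review costs {premium_vs_car:.0%} more than CAR")
```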
Side Note: To back up the likely far better accuracy claim for an SME driving a CAR, please see the following scientific studies: Grossman & Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Rich. J.L. & Tech., Spring 2011; Grossman & Cormack, Inconsistent Responsiveness Determination; Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review, Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010; Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, 36:5 Information Processing & Management 697, 701 (2000).
Back to the example, after the first pass review comes the slower protections review. Now the highly skilled SMEs are no longer required. The lower-paid contract lawyers can do the review on the documents the SMEs have determined to be relevant. Assume that the first pass review found that 10,000 of the 40,000 documents were relevant. This means that 10,000 documents are subject to confidentiality Protections review (step eight of EDBP). Let us assume this work goes at an average rate of 50 files per hour. This means a final pass review should be completed in 200 hours at a cost of $5,000.00. So our base minimum review cost for both passes is $13,000.00.
I say base minimum because you have additional expenses beyond just contract reviewer time, including the expense of partner and senior associate management time, direct supervision of contract lawyers, quality control reviews, etc., plus software costs, which, depending on the vendor and your deal, can sometimes be very high. Let us assume that is another $7,000.00 cost here, for a total expense of $20,000. You would then have completed your review of 40,000 documents at a cost of fifty cents per document. That is pretty good. But in larger projects, where you are talking about millions of documents, and more realistic prevalence rates, frequently less than 5%, the savings are even higher, and the per document rate even lower, sometimes much lower.
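Putting both passes and the overhead together for the small case, the tally can be sketched as follows. All figures are the assumptions stated in this example; the variable names are just labels.

```python
total_documents = 40_000   # documents remaining after bulk culling
relevant_documents = 10_000  # found relevant in first pass review

# First pass: 8 SME hours (4 review + 4 preparation) at $1,000 per hour.
first_pass_cost = 8 * 1_000

# Protections pass: contract lawyers at 50 files per hour and $25 per hour.
protections_hours = relevant_documents / 50
protections_cost = protections_hours * 25

# Supervision, quality control, and software, assumed at $7,000 here.
overhead_cost = 7_000

total_cost = first_pass_cost + protections_cost + overhead_cost
cost_per_document = total_cost / total_documents

print(f"Protections hours: {protections_hours:,.0f}")
print(f"Base review cost:  ${first_pass_cost + protections_cost:,.2f}")
print(f"Total cost:        ${total_cost:,.2f}")
print(f"Cost per document: ${cost_per_document:.2f}")
```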
You could estimate all of these costs in advance by having a bank of experience to draw upon, knowing the likely costs per file range. But remember, even in the world of repeat litigation, like employment law claims, all projects are different. All document sets are different. You have to, as I like to say, get your hands dirty in the digital mud. You have to know your ESI collection. Even in just one type of ESI, the one most common in e-discovery today, email and attachments, the variances in email collections can be tremendous.
The review speeds and thus review costs depend on the density of the documents and difficulty of the determinations. That is where the art of estimation comes in and depends on your understanding of the project. You have to understand the blueprints and specs of a project before you can make a valid estimation.
This is especially true of the SME work. You need to do some sampling to see what review rates apply. How long will it take these particular SMEs or contract reviewers to do the tasks assigned to them in this case with this data? Sampling is the only reliable way to find out, especially when it comes to the all important prevalence calculations.
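As a rough sketch of what a prevalence calculation from a random sample can look like, consider the following. The sample size and counts are hypothetical, and the simple normal-approximation interval is my choice for illustration only; a statistician may well prefer a different interval method for small samples or very low prevalence.

```python
import math


def estimate_prevalence(sample_labels, z=1.96):
    """Point estimate and approximate 95% interval for the share of relevant documents.

    sample_labels: one True/False relevance call per randomly sampled document.
    Uses the normal approximation to the binomial (an assumption of this sketch).
    """
    n = len(sample_labels)
    if n == 0:
        raise ValueError("sample is empty")
    p = sum(sample_labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical sample: an SME reviews 1,500 random documents and codes 150 relevant.
labels = [True] * 150 + [False] * 1350
prevalence, low, high = estimate_prevalence(labels)

collection_size = 10_000_000  # hypothetical collection size after culling
print(f"Estimated prevalence: {prevalence:.1%} (roughly {low:.1%} to {high:.1%})")
print(f"Projected relevant documents: {low * collection_size:,.0f} to {high * collection_size:,.0f}")
```

The range matters as much as the point estimate, because the number of documents sent on to Protections review drives the second pass budget.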
A Big Data Example
Let us change the scenario somewhat for a final quick example. Assume 10,000,000 documents after culling for your SMEs to review. Assume sampling by an SME showed a prevalence of 10% (somewhat high), and a CAR review rate of 10,000 files per hour (somewhat slow for Big Data reviews). This means you can assume that only around 1,000,000 documents will need final Protection review.
More sampling shows the contract reviewers will, using CAR techniques (smart routing, etc.), be able to review them at 100 files per hour.
With this information from sampling you can now estimate a total first pass review cost of $1,000,000 ($1,000 x 1,000 hours), which compares pretty well to the $2,829,349.10 charge in Gabriel. Next you can estimate a total final pass Protection review cost of $250,000 ($25 x 10,000 hours).
Let us also assume you know from experience that your other supervision fees and software costs are likely to total another $150,000. Your estimate for the project would be $1,400,000. That is fourteen cents a document.
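The big data estimate follows the same steps as the small case, just with larger inputs. Here is the arithmetic as a short sketch using the assumptions of this example; the variable names are illustrative.

```python
corpus = 10_000_000   # documents after culling
prevalence = 0.10     # from SME sampling

# First pass: SME driving a CAR at 10,000 files per hour, $1,000 per hour.
first_pass_hours = corpus / 10_000
first_pass_cost = first_pass_hours * 1_000

# Protections pass: contract reviewers at 100 files per hour, $25 per hour.
likely_relevant = round(corpus * prevalence)
protections_hours = likely_relevant / 100
protections_cost = protections_hours * 25

# Supervision and software, assumed from experience.
overhead_cost = 150_000

total_cost = first_pass_cost + protections_cost + overhead_cost
cost_per_document = total_cost / corpus

print(f"First pass:  {first_pass_hours:,.0f} hours, ${first_pass_cost:,.0f}")
print(f"Protections: {protections_hours:,.0f} hours, ${protections_cost:,.0f}")
print(f"Total:       ${total_cost:,.0f} (${cost_per_document:.2f} per document)")
```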
Too high you say? Not proportionate to the value of the case? Not proportionate to expected probative value in this case from these 10,000,000 documents, which is something your sampling can tell you and provide evidence to support? Then use ranking.
If the SME’s identification of 1,000,000 likely relevant documents was based on a 50% plus probability ranking using predictive coding, then try a higher ranking cut off. Again, not too difficult to do with sampling and good software. Maybe a 75% plus ranking cut off brings the count down from 1,000,000 to 250,000. Or maybe you just arbitrarily decide to use the top ranked 250,000 documents because that is all you can afford, or think is proportionate for this data and this case. Either way, you are now only passing the strongest documents along for second pass review.
Using the higher cut off, the cost for second pass Protection review would then be 25% of what it was, reduced from $250,000 for review of 1,000,000 documents, to $62,500 to review 250,000 documents. Your other fees and costs also drop in your experience by 50%, from $150,000 to $75,000. Your total estimate is now $1,137,500, instead of $1,400,000. You have gone down to just over eleven cents a document. Assume this $1,137,500 number is now within your legal judgment to be proportional to this document request. It is now within your budget. You are done.
If it was still not proportional in your judgment, or the client’s opinion, there are many other things you could try, but I would focus on the bulk culling before the SME first pass relevancy review. Change the custodian count, or date range (but please, do not keyword search filter like they did in Gabriel). Bring the initial 10,000,000 documents down to 5,000,000 documents, then do the math. You are now talking about around $700,000, back to fourteen cents per document. Is that within the budget? Is that an amount that a court is likely to force you to spend anyway?
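These what-if adjustments are easy to compare side by side. The sketch below reruns the estimate for the higher ranking cut off and for the halved collection. The function and scenario names are illustrative, and the halved scenario assumes that prevalence stays at 10% and that overhead also drops by half, which is my reading of the "around $700,000" figure rather than something stated outright.

```python
def estimate(corpus, docs_to_protect, overhead,
             car_speed=10_000, sme_rate=1_000,
             reviewer_speed=100, reviewer_rate=25):
    """Return (total cost, cost per document) for a two-pass review."""
    first_pass = corpus / car_speed * sme_rate
    protections = docs_to_protect / reviewer_speed * reviewer_rate
    total = first_pass + protections + overhead
    return total, total / corpus


scenarios = {
    "baseline, 50% plus cut off": estimate(10_000_000, 1_000_000, 150_000),
    "75% plus cut off": estimate(10_000_000, 250_000, 75_000),
    # Assumption: same 10% prevalence and half the overhead after heavier culling.
    "collection culled to 5,000,000": estimate(5_000_000, 500_000, 75_000),
}

for name, (total, per_doc) in scenarios.items():
    print(f"{name}: ${total:,.0f} total, ${per_doc:.4f} per document")
```

Raising the cut off trims only the second pass and overhead, so the per document cost falls. Culling the collection cuts every line item roughly in proportion, so the total falls while the per document cost stays about the same.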
Side Note: The costs of review have come way, way down in the past few years for those who are using CAR methods. For some context on the fourteen cents a document number used in this example, back in 2007 the Department of Justice spent $9.09 per document for review in the Fannie Mae case, even though it used contract lawyers for the review work. In re Fannie Mae Securities Litig., 552 F.3d 814, 817 (D.C. Cir. 2009) ($6,000,000/660,000 emails). There were no comments by the court that this price was excessive when the government later came back and sought cost shifting. At about the same time Verizon paid $6.09 per document for a massive second review project that enjoyed large economies of scale and, again, utilized contract review lawyers. Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010 ($14,000,000 to review 2.3 million documents in four months). In 2011 I was still seeing an average cost of $5.00 per file for reviews before CARs with advanced predictive coding engines started to become available.
The More Bang for the Buck Bottom Line Ranked Approach is Good for Both the Requesting Party and the Producing Party
The use of CARs and predictive coding ranking so that you only review and produce the best documents, the ones with the highest relevancy ranking, gives everyone the most truth for the dollar. For that reason this is the best approach, not only for the producing party, who can thereby budget and avoid disproportionate burdens, but also for a requesting party. The plaintiffs in Kleen Products realized that, which is why they tried to force the defendants to use predictive coding in their productions. Kleen Products, LLC, et al. v. Packaging Corp. of Amer., et al., Case: 1:10-cv-05711, Document #412 (N.D. Ill. Sept. 28, 2012).
In spite of the Kleen Products precedent, a producing party will often need to sell the benefits of these new methods to the requesting party. The requesting party will be more likely to cooperate if they understand the value to them of these methods. This will often require the producing party to provide some reasonable degree of transparency into the proposed review processes. For instance, if you’ve got a hot SME lined up to drive the CAR, tell them. Share his or her qualifications and experience. Engage in relevancy dialogues with them. Make sure you are training the machine to find the documents that they really want. Clarify the target. If need be, share some examples early on of the relevant documents you will use in the training. Invite them to provide you with documents they consider relevant to use in the training. Try to make it at least a somewhat collaborative approach. If they refuse, document your efforts and take any disputes to the judge as early as possible.
An at least quasi-collaborative approach is how you can convince a requesting party that the estimate is reasonable and that you are not still stuck in the old paradigm of hide-the-ball discovery games. I cannot overstate how important it is to develop trust between opposing counsel on discovery. Often the only way to do that is through some level of transparency. You do not have to disclose all of your search secrets, but you may have to keep the requesting party somewhat informed and involved in the process. That is what cooperation looks like.
All Review Projects Are Different
In order to be able to make a valid estimate for bottom line driven proportional review you must closely study the case and review project goals. It is not enough to have handy per file cost estimates. This move to actual examination of the ESI at issue, and study of the specific review tasks, is equivalent to the move in construction estimation from rough estimates based on average per square foot prices, to a careful study of the building’s plans and specifications, and a site visit with inspection and measurements of all relevant conditions. No builder would bid on a project without first doing the detailed real world estimation work. We lawyers must do the same for this method to succeed.
Even in the same organization, and just dealing with email, the variances between custodians can be tremendous. Some for instance may have large amounts of privileged communications. This kind of email takes the most time to review, and if relevant, to log. High percentages of confidential documents, especially partially confidential, can also significantly drive up the costs of the second pass review. All of the many unique characteristics of ESI collections can affect the speed of review and total costs of review. That is why you have to look at your data and test sample the emails in your collection to make accurate predictions. Estimation in the blind is never adequate. It would be like bidding on a building without first reading the plans and specs.
Even when you have dealt with a particular client’s email collection before, a repeat customer so to speak, the estimates can still vary widely depending on the type of lawsuit, the issues, and on the amount of money in controversy or general importance of the case.
Although this may seem counter-intuitive, the truth is that complex, big-ticket cases are the easiest ones in which to do e-discovery, especially if your goal is to do so in a proportional manner. If there is a billion dollars at issue, a reasonable budget for ESI review is pretty big. On the other hand, proportional e-discovery in small cases is a real challenge, no matter how simple they supposedly are. Many cases that are small in monetary value are still very complex. And complex or not, all cases today have a lot of ESI.
The medium size to small cases are where my bottom line driven proportional review has the highest application for cost control and the greatest promise to bring e-discovery to the masses. Give it a try, with or without predictive coding.
Conclusion
The future of discovery involves new methods of technology-assisted discovery where Man and Machine work together to find the core truth. This day will come; in fact it is already here. As William Gibson said: The future is already here – it’s just not evenly distributed yet. The key facts you need to try a case and to do justice can be found in any size case, big and small, at an affordable price. But you have to embrace change and adopt new legal and technical methodologies. The Bottom Line Driven Proportional Review method is part of that answer; so too is advanced review software at affordable prices. It is already working for some today, and it can work for you too.

Ralph, this current post and your recent Search series have raised the bar higher for all of us out there on the edisco speaking circuit, as you have done a wonderful job of educating the legal community on the present gap between a wave of technology-assisted review overhype, and the reality that we, as lawyers, still have important roles to play in any e-discovery review process. Anyone who thinks technology-assisted review is a “solved problem” for the legal profession needs to go back to re-read your columns!
As a point of privilege, I’d like to just say that I am not quite as dystopian as you may have suggested to your faithful readers. It is not so much that we are drowning in information that concerns me (although I do dream of e-mail free Fridays to actually accomplish some work). Rather, the exponentially growing amount of information — especially in the public sector — amounts to a vast, dark, and de facto inaccessible universe, not practically reachable by means of e-discovery, FOIA, or otherwise. That is why I have been on a quest for the past decade or so to find more powerful search techniques than mere reliance on keywords, so as to be able to properly categorize and extract knowledge from the vast e-haystacks of records that are accumulating all around us, as efficiently as possible. I actually am hopeful that in an “arms race” between on the one hand, preserving the newly created exabytes of record information and beyond, and on the other, employing new forms of AI to extract meaning from big data sets, that the AI will in the end win.
I can’t think of anything more exciting for a young lawyer to contemplate than thinking about how to apply state-of-the-art technology in continuing to tackle the volume and complexity issues posed by e-discovery. IMHO, your recent columns should be required reading in every civil procedure course taught in US law schools. Doug Oard and I actually have your search columns already listed on our syllabus for the e-discovery course we’re teaching this Spring to PhD and Masters candidates at U. Maryland’s School of Information Studies.
Thanks Jason. As usual, you were right about what we need next in e-discovery, namely standards. I am once again following in your footsteps by working on a foundation for these standards. Also, thanks for tolerating my false emphasis (I would rather say poetic license) of your information dystopia comment to serve as my foil. Dear readers, Jason is not really a prophet of doom. In fact, he is one of the funniest people I know.
All of us in the eDiscovery field are seeking to tame the “crazy high” costs of document review, as you stated. We share the same dream. For in-house counsel such as myself who are actually writing the “check” for those services, the quest for standards is as pressing as it will ever be. The cost of review, however, is just one piece of the puzzle. I would like to go one step further and try to come up with estimates for the total cost of eDiscovery or total cost of document production. The challenge in estimating the total cost of production lies in that we have to take into account all that will be collected, processed, reviewed and, ultimately, the small set that will be produced. It is the same challenge as estimating a billable attorney’s rate, when you have to take into account all the overhead in a practice. In order to estimate the total cost of production, from an in-house perspective, one has to account for:
1) collection: costs of the software (if collection is done in-house), or costs of collection (if done by a 3rd party)
2) processing: costs of culling the data and costs of hosting the dataset plus access to reviewers
3) review: costs of contract attorneys for 1st pass, costs of QC, costs of law firm for secondary review and privilege, confidentiality coding/logging/redacting
4) production: costs of tiffing, converting data to media for production, costs of hardware.
All those costs have to be factored in when you are trying to determine the total cost per document ultimately produced.
Therein lies the challenge. Sorry to add to your already herculean effort to create standards but, to the extent that we are seeking standardization, I believe we ought to aim for creating processes that will give clients complete cost predictability and visibility. Thanks.
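The total-cost idea in the comment above can be sketched in a few lines. This is an illustration only, under stated assumptions: the four cost buckets mirror the commenter's list, but the class name `ProductionCosts`, the function `cost_per_document_produced`, and all dollar figures and document counts are hypothetical and do not come from any real matter.

```python
from dataclasses import dataclass


@dataclass
class ProductionCosts:
    """The four cost buckets from the list above, in dollars."""
    collection: float  # in-house software or third-party collection fees
    processing: float  # culling, hosting, reviewer access
    review: float      # first pass, QC, second pass, privilege work
    production: float  # tiffing, media conversion, hardware

    def total(self) -> float:
        return self.collection + self.processing + self.review + self.production


def cost_per_document_produced(costs: ProductionCosts, docs_produced: int) -> float:
    """Spread every upstream cost over the small set ultimately produced."""
    if docs_produced <= 0:
        raise ValueError("docs_produced must be positive")
    return costs.total() / docs_produced


# Hypothetical figures for illustration only.
costs = ProductionCosts(
    collection=5_000.0, processing=10_000.0, review=80_000.0, production=5_000.0
)
per_doc = cost_per_document_produced(costs, docs_produced=10_000)
print(f"Total: ${costs.total():,.2f}; per document produced: ${per_doc:,.2f}")
```

Dividing by documents produced rather than documents reviewed is the point of the comment: every document collected and culled along the way adds to the cost of the ones that finally go out the door.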
There are yet additional costs, such as the outside vendor’s hosting charges for the data collected. Vendors typically charge for this on a gigabyte basis and those costs can be quite high in a large volume case, e.g., $40,000-$50,000 a month. To the extent that a case and its discovery phase go on for an extended period of time, those high monthly costs to keep the data “alive” and available for subsequent searches to use in deposition preparation or for motions purposes can really mount. And, let’s not forget that there will be hosting charges for the data that is produced from the other side, which the receiving party will want to have available for the same types of searches and factual document research as the case progresses. (Query: Why is it that none of us in the e-discovery world ever speaks about the parallel search and review cost (and hosting fees) associated with examining the documents produced by the opposing party or parties? . . . All the steps and costs associated with producing documents in response to discovery are only one half of the story. Granted, with documents produced to us, we don’t have to bear the preservation costs, but basically all the rest remains. Just wondering . . . .)
In my experience, most litigation is asymmetrical, in that one party (usually the plaintiff or a government agency) is the recipient of a far greater number of documents than the responding party. So the responding party only has to worry about hosting its own documents, and the requestor has to worry about hosting the docs produced to them.
Even if the litigation is symmetrical, the production volume is about 10% of the volume collected by the other side, so the relative added burden to the recipient is not that great.
Ralph,
I just finished reading your blog entry about Bottom Line Driven Proportional Review, and my initial response is that I am not sure if I buy into it.
I certainly agree that the bottom line is important and that any review should be proportional to the amount at issue in the case, but I find something inherently discomforting about deciding how many documents to review based upon a predetermined budget. If there are truly 30,000 relevant documents in a matter, how can I say I will only look at, and potentially produce 10,000 of them? Is this not a partial truth? If so, how is that really justice?
Of course proportionality must have its place. My concern with your method is that it may place too much emphasis on being able to choose what you will review and produce based on budget alone (ignoring the reality of how many responsive documents there are), and provide lawyers and clients an easy out that allows them to artificially truncate document review. To use a simplified (and perhaps somewhat antiquated) analogy, if you had a case about a motorcycle accident, I would argue it would be difficult to not use the search term “motorcycle accident” simply because it returned more hits than you budgeted for.
I think you attempt to address my issue by cautioning that you have to get your hands dirty and understand the data before creating a budget, and ideally, this would help. Unfortunately, in practice, too often the budget is created without input from e-discovery experts, and/or at times, the people creating the budget are not allowed access to the data until after they have submitted a budget and then been engaged for the work.
As I said, I just read this, so I need to think about it more, but I am left with the feeling that I have to choose between proportionality and the “complete” truth, and furthermore, that Bottom Line Driven Proportional Review may become a tool for abuse and misuse. Perhaps I am being cynical and need to have more faith in my fellow colleagues. Regardless, I do appreciate that you are trying to solve a significant problem, and therefore, whether or not I ultimately agree with you, what you wrote certainly has value to me.
Thanks,
Brandon
“If there are truly 30,000 relevant documents in a matter, how can I say I will only look at, and potentially produce 10,000 of them? Is this not a partial truth?”
Because you can’t know you have 30,000 relevant documents in a matter going in. In fact, you can never know how many relevant documents there are even after you’re done with the review! This is because you don’t know how many were missed by culling or by the reviewers.
It’s a world where a multiplication of imperfections adds up to reasonableness. You are going in blind, but you’re coming out with only somewhat better vision. How much better? You’ll never know, unless the universe of documents can fit in a stack of paper on your desk.
“Complete truth” is not absolute. It’s the complete truth within the universe that makes sense given what’s at stake in the case. That’s your proportionality.
Hallelujah. I believe in your dream and I think it is essential that it come true.
This article is required reading for the “Budgeting, Risk Management, and Cost Control in E-Discovery” class taught by Scott Cohen at Bryan University. Many of us in the e-discovery program there subscribe to your blog and are better educated for it. Thank you.