“Where The Money Goes” – a Report by the Rand Corporation

The Rand Corporation is a well-known and prestigious non-profit institution. Its stated charitable purpose is to improve policy and decision-making through research and analysis. It has recently turned its attention to electronic discovery. Rand concluded, as I and many others have, that the primary problem in e-discovery is the high cost of document review, which it found constitutes 73% of the total cost of e-discovery. For that reason, Rand focused its first report on electronic discovery on this topic, with side comments on the issue of preservation. The study was written by Nicholas M. Pace and Laura Zakaras and is entitled Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery. Both a summary and the full report (131 pages) can be downloaded for free, and a nicely bound paper version can be purchased for a modest fee of $20.

The full report is actually much better than the summary, in no small part because it shows the degree of care the authors used and the honest disclaimers they make concerning the research. The disclaimers are needed because the study was based on input from only eight corporations. Still, it is a well-written report with excellent analysis. I suggest you make time to read the full report.

The Rand Corporation Confirms Our Own Analysis and Makes The Same Bold Recommendations

The report not only analyzes the problem, it recommends a solution. Basically it says what I have been saying now for years: be bold, and take forward-thinking action now to fight the high-cost problem head on. See Impactful, Fast, Bold, Open, Values: Guidance of the “Hacker Way.” As I have said before, the words lawyer and timid are not supposed to go together. Yet that is what we have when it comes to the Bar’s use of advanced technologies, even when it is in the clients’ best interests. The Rand report recognizes the widespread timidity of many in the legal community, and makes the following recommendation at page 83, one that I strongly endorse:

To truly open the doors to more-efficient ways of conducting large-scale reviews in the face of ever-increasing volumes of digital information, litigants that have complained in the past about the high costs of e-discovery will have to take some very bold steps.

What action does the Rand study recommend as the core solution to the high costs of review? Again, it is the same mantra that most everyone in the field of e-discovery has been saying: fight the problems caused by technology (i.e., too much information) with the intelligent use of even more technology. By intelligent we mean use the technology as part of a valid legal methodology, one based on the law. Do not just use technology on its own, for its own sake. The technology has to be run by lawyers, not techs. Sorry, my tech friends, but lawyers have to drive the CAR, computer-assisted review.

The legal method I promote for CAR is called Bottom Line Driven Proportional Review. It is based on the well-established legal doctrine of proportionality. See, e.g., Good, Better, Best: a Tale of Three Proportionality Cases, Part One and Part Two. Of course, my way is not the only way for the CAR highway. There are many other valid legal methods for using advanced technologies, and many other reasonable applications in use by other respected attorneys in the field. The focus on budgeting, estimation, transparency, cooperation, and proportionality is just my particular method, one that I encourage others to follow.

The Rand Report does more than just recommend the use of advanced technology; it actually endorses one particular type of technology, my friend Predictive Coding. That’s right, this prestigious, non-profit, independent group has reached the same conclusions that I, and many, many others, have (in fact, you would be hard pressed to find any bona fide expert who argues against the idea of predictive coding). It is now official. Predictive coding is the best answer we have to the problem of the high costs of e-discovery. Of course, there will be good faith debates for years to come on the best methods for using this new technology, and on the cases in which it is appropriate. The Rand report discusses all of these considerations.

The conclusion of the report states at pages 97-99:

The most promising alternative available today for large-scale reviews is the use of predictive coding and other computerized categorization strategies that can rank electronic documents by the likelihood that they are relevant, responsive, or privileged. Eyes-on review is still required but only for a much smaller set of documents determined to be the most-likely candidates for production. Empirical research suggests that predictive coding is at least as accurate as humans in traditional large-scale review. Moreover, there is evidence that the number of hours of attorney time that would be required in a large-scale review could be reduced by as much as three-fourths, depending on the nature of the documents and other factors, which would make predictive coding one answer to the critical need of significantly reducing review costs. …

Despite the apparent promise of predictive coding and other computerized categorization techniques, however, the legal world has been reluctant to embrace the new technology. … the key reason is the absence of widespread judicial approval of the methodology, specifically regarding any acknowledgment of the adequacy of the results in actual cases or whether the process was a reasonable way to prevent inadvertent privilege waiver. Without clear signs from the bench that the use of computer-categorized review tools should be considered in the same light as eyes-on review or keyword searching, litigants involved in large-scale reviews are unlikely to employ the technologies on a routine basis. …

The use of computerized categorization techniques, such as predictive coding, will likely become the norm for large-scale reviews in the future, given the likelihood of increasing societal acceptance of artificial intelligence technologies that might have seemed like improbable science fiction only a few decades ago. The problem is that considerable sums of money are being spent unnecessarily today while attitudes slowly change over time. New court rules might move the process forward, but the best catalyst for more-widespread use of predictive coding would be well-publicized instances of successful implementation in cases in which the process has received close judicial scrutiny. It will be up to forward-thinking litigants to make that happen.

Again, I join the call to all forward-thinking litigants to, in the words of Star Trek, boldly go where no man has gone before. See, e.g., Predictive Coding Based Legal Methods for Search and Review; and New Methods for Legal Search and Review. I am reminded once again of the words of a famous Indian lawyer turned saint: Be the change that you wish to see in the world. Mahatma Gandhi.

By the way, even though this report basically affirms my own analysis and blogs, I had absolutely no involvement in the research or preparation of this report. I am not sure I have even met Nicholas Pace and Laura Zakaras. But I note that two of the top experts in our field did help out the Rand newcomers, namely Thomas Y. Allman and Jason R. Baron. I am of course influenced by their many excellent writings, just as I will henceforth be influenced by the Rand report of Pace and Zakaras. That is how knowledge always advances in every field of law, technology, and science. As my readers well know, my opinions are an amalgamation of the thinking of all of the leaders in the field. Only a few of my thoughts are truly original. If I occasionally appear to be smart and far-seeing, it is only because I am standing on the shoulders of giants. It has always been so.

Rand Describes Predictive Coding

Pace and Zakaras not only recommend predictive coding, they venture deeply into the who, what, when, where and why of the new technology. For instance, they do a nice job of describing how predictive coding works at page 59 of the report:

Predictive coding, sometimes referred to as suggestive coding, is a process by which the computer does the heavy lifting in deciding whether documents are relevant, responsive, or privileged. This process is not to be confused with keyword-based Boolean searches or the similarity detection technologies described in Chapter Four. Near-duplication techniques, clustering, and email threading can help provide organizational structure to the corpus of documents requiring review but do not reduce the document set that has to be reviewed by attorneys for specific aspects, such as responsiveness or privilege. Predictive coding, on the other hand, takes the very substantial next step of automatically assigning a rating (or proximity score) to each document to reflect how close it is to the concepts and terms found in examples of documents attorneys have already determined to be relevant, responsive, or privileged. This assignment becomes increasingly accurate as the software continues to learn from human reviewers about what is, and what is not, of interest. This score and the self-learning function are the two key characteristics that set predictive coding apart from less robust analytical techniques.

They go on to point out at page 61 what they call an ironic feature of predictive coding, which, by the way, I now sometimes also like to call Intelligent Review or Probabilistic Review:

As should be clear from this description, predictive coding does not take humans out of the review loop. It requires intensive attorney support throughout the process in order to advance machine learning. Ironically, for a technique that could substantially reduce discovery expenses, the best results will be achieved if the attorneys most closely involved in the case select the seed documents and review sampled extracts, effectively precluding the use of lower cost contract attorneys or LPO vendors for these particular tasks. Moreover, attorney judgment continues to loom large in the process after the application has completed its work, with eyes on review required, for example, to check documents of unknown relevance and responsiveness or look for privileged communications.

Advanced technologies like predictive coding do not replace lawyers. Instead they require better educated lawyers. Still, the days of vast armies of minimally skilled contract lawyers are numbered. Fewer lawyers will be needed for intelligent review, but they will have to be better trained about the case and the technology. They will need to be SMEs – subject matter experts – and technophiles. I know that most contract lawyers will be quite happy about this change, as they have only been willing to suffer through the drudgery of never-ending email reading because of the economy. I predict that many of these lawyers will rise to the occasion and become the best SMEs of the future.

Rand Dares to Mention the Elephant in the Room

The Rand report discusses many resistance factors working against the widespread adoption of predictive coding technologies. They even touch on the one that most analysts dare not mention: the vested financial interests of certain companies and law firms in continuing expensive over-review of documents. Here is how Pace and Zakaras describe it at page 76:

Resistance of External Counsel

Another barrier to the widespread use of predictive coding could well be resistance to the idea of outside counsel motivated not so much by accuracy issues as by the potential loss of a historical revenue stream. Some interviewees reported grumblings from outside counsel when their companies decided to directly handle a fraction of the overall review process or to markedly reduce what was shipped out for review through the use of additional data processing.

My applause to the Rand Corporation for this bold statement of the obvious. I hope they have been warned, as I was when I stood next to the elephant in the picture, not to touch him. If he steps on your toes, your whole foot will be crushed.

Vendor Cost

I always include in my essays on predictive coding a call for vendors to bring down the prices of these advanced software features. The high prices are a serious impediment to adoption by even brave attorneys and forward-thinking litigants. The prices most vendors charge today usually restrict the use of predictive coding to big cases. The Rand report once again validates my complaints at page 98:

Moreover, computer applications for conducting review are unlikely to be economically viable options when dealing with smaller document sets, in which any savings in attorney hours might be overwhelmed by vendor costs and machine-training requirements. Existing approaches, such as deduplication, cluster analysis, and email threading, may provide a more practical answer in these situations.

By the way, predictive coding is not a replacement for all other search methods; it is a supplement. It is the current crown jewel of search, to be sure, but it is still just one of many methods, one tool in an arsenal of weapons. That is why I call my search method multimodal. It features predictive coding, but includes other types of review too, including keyword search and human eyes-on review. See Predictive Coding Based Legal Methods for Search and Review.

As the Rand report indicates, predictive coding is not yet economically viable in cases with smaller document sets. But when vendors do finally heed my call and lower prices, predictive coding will be economically viable for many more cases. Then the full arsenal of truth-seeking missiles can be used in even medium-sized cases.

Preservation Woes

The Rand report also looks into corporate complaints of the high cost of preservation. This topic is something of an add-on to the primary topic of review, but it is still well worth reading. Preservation expenses are, after all, present in every case, which is not necessarily true of expensive review costs. The survey showed that preservation has become a significant financial burden for many companies, with many explanations of why, but nobody seemed to have good metrics on the burdens. Rand recommends that corporations begin to systematically track costs in this area. Uncertainty and conflicts in the law of preservation were also discussed, but no recommendations were made. For a new case finding gross negligence in preservation, but only awarding monetary sanctions, see Telecom, Inc. v. Global Crossing Bandwidth, Inc., No. 05-CV-6734T (W.D.N.Y. Mar. 22, 2012). Compare with Aviva USA Corp. v. Vazirani, No. 11-0369 (D. Ariz. Jan. 10, 2012), where monetary sanctions and an adverse inference were granted. Compare both with Spanish Peaks Lodge, LLC v. Keybank National Assoc., No. 10-453 (W.D. Pa. Mar. 15, 2012), where no sanctions were granted. Compare all of these with United Factory Furniture Corp. v. Alterwitz, No. 2:12-cv-00059-KJD-VCF, 2012 WL 1155741 (D. Nev. Apr. 6, 2012), where mirror imaging was ordered for preservation.


Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery is a must-read that is within everyone’s budget. Both a summary and the full report (131 pages) can be downloaded for free, but I recommend you read the full report. Although I disagree with a few points in the report, they are not worth examining here. For the most part they got it right. It will be interesting to see what companies, if any, heed the call for forward-thinking litigants to take bold steps to use predictive coding. Regardless, kudos to the Rand Corporation, the RAND Institute for Civil Justice, and the authors of the report, Nicholas M. Pace and Laura Zakaras, for a job well done.

14 Responses to “Where The Money Goes” – a Report by the Rand Corporation

  1. craigball says:

    Hi Ralph:

    Interesting post. The Rand Study recites: “Near-duplication techniques, clustering, and email threading can help provide organizational structure to the corpus of documents requiring review but do not reduce the document set that has to be reviewed by attorneys for specific aspects, such as responsiveness or privilege.”

    I don’t agree. Near-duplication techniques, when properly applied, permit a reviewer to look at but a single iteration of an item, excluding from review other items defensibly deemed to be so closely duplicative as to obviate the need for individualized review. Likewise, when properly implemented, e-mail threading enables a reviewer to look at each component of a thread just once, instead of running into iteration-after-iteration under parent messages.

    Yes, these techniques DO provide improved structure, but they also eliminate significant volume without sacrificing quality. That is, they *do* reduce the document set that has to be reviewed by minimizing the incidence of senseless repetitive review so widespread today.

    We have yet to see the full flush of benefits from broadbased implementation of even these simpler and less-costly techniques. Drive the CAR, if you’ve got the gas money; but, we needn’t run bikes off the road. There are lots of ways to get where we’re going in e-discovery, and plenty of potholes to pave, all along the way.

  2. (from Nick Pace at RAND…)

    The choice of words I used in the report to describe the potential impact of near-dupe detection, clustering, and threading was, alas, unfortunate, a fact that leapt out at me as I read Mr. Ball’s comments. What I had hoped to get across was that these techniques, in and of themselves, do not have the same type of effect in reducing the sheer number of documents for the review as do culling, de-duplication, de-NISTing, and similar processes. Twenty versions of the same memo would still be 20 individual documents after being identified as closely related by a near-duplication search; 30 reports that essentially cover the same topic would still be 30 reports after being grouped together by a clustering application; and 40 emails all involving the same conversation would still be 40 individual messages after being linked by a threading process. The big savings in time (and therefore money) would come only if the reviewer was comfortable in applying the decision he or she made on the first of the 20 near-dupes to the other 19 (or on the most representative document in the cluster to the other 29, or on the first message in the thread to the other 39…). If this sort of bulk decision isn’t possible, then presumably good practice would require looking at each of the individual memos, reports, or emails. Such a review will undoubtedly be more efficient because the document set is far better organized, but unless bulk decisions can be made on a routine basis, the vendor claims that we read of effective rates of 300, 500, and even 1000 documents reviewed per hour simply aren’t realistic.

    But even with computer-categorized document review, it still makes sense to apply these sorts of leveraging analytics. At the end of the day, attorneys are still needed to look at the results of the process. Having that residual set organized in a logical way will reduce the amount of time spent by those attorneys. Our recommendation in the conclusions chapter that litigants strongly consider using computer-categorized techniques assumes “that best practices have been followed throughout the e-discovery life cycle”, which would obviously include Mr. Ball’s suggestion of “simpler and less-costly techniques” such as deduplication, cluster analysis, and email threading.

    As for the comment that “We have yet to see the full flush of benefits from broadbased implementation” of leveraging analytics, I couldn’t agree more. It was very difficult to identify clearly transparent examples of what these techniques have accomplished across a considerable number of document productions. I would have loved to run across a study that reported on a dozen or so very large productions, where near-duplication saved X hours of review time, clustering saved Y hours, and threading saved Z hours on average. If the reason for the lack of reported numbers here is that these techniques are not used as often as they should be, then asking why that’s the case seems like an obvious question for the e-discovery community.


    • Ralph Losey says:


      Again, congrats on the good work. I closely studied your report and perfectly understood your intent in the phrase in question. Perfect parsing is impossible, especially in a text of this length.

      We would all like to know if there is another project in the works?


      • “We would all like to know if there is another project in the works?”

        Let me change the direction of the question: Where are the most troubling gaps in information when it comes to e-discovery? How can independent research organizations such as RAND be most helpful to litigants, lawyers, judges, vendors, and policymakers in this area? We certainly understand that the topic is of vital interest to just about everyone in the civil justice system, but we’d like to hear from that same community to learn where we should focus our limited resources. If your readers have suggestions, please send them along.

  3. […] “Where The Money Goes” – a Report by the Rand Corporation […]

  4. […] “Where The Money Goes” – a Report by the Rand Corporation […]

  5. […] There are many fine software programs out there with predictive coding type features. These advanced features, which I call artificial intelligence type features, are now critical to moving the law forward. The recent Rand Report agrees. Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012); also see my blog on this report. […]

  6. […] Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012); also see my blog on the Rand Report. I wonder when insurers are going to catch on to […]

  7. […] The Money Goes” – A Report By The Rand Corporation | eDiscovery Team – bit.ly/JyxAVs (Ralph […]

  8. […] Litigant Expenditures for Producing Electronic Discovery (Rand Corporation, 2012); also see my article on the Rand Report. Use of proper predictive coding search techniques is imperative in today’s age of too […]

  9. Carrie Mallen says:

    You are indeed a gift to all litigators who look to you for your excellent advice and expertise on eDiscovery and informing the “Village”.
    I would like to slightly correct the record in this case, as the Rand paper on costs started in mid-2008. I was having discussions (I was whining more than discussing) with Steve Lilley about costs related to eDiscovery and frustration on how to unravel this very complex issue for my GC at the Fortune 12 company I had been working for at the time. We had suffered through 9 years of a massive Securities matter in addition to an already robust litigation practice, and I had become intimate with cost breakdown because of the way I had to submit the bills to the insurance companies for payment of legal fees.
    Steve introduced me to Jim Dertouzos who headed up the legal division for the Institute of Civil Justice at Rand, and we (Nick, Jim & me) had multiple conversations about the topic. They hired me as a consultant and we continued meeting and conversing over the next couple of years. I also called Judge Grimm and tried to get him involved. He was busy in a trial and suggested Tom Allman. I called Tom and he was receptive. I am not sure where Jason came in, but it was after me, and he has been a major contributor in the practice of eDiscovery.
    My role was simply to provide info on business process and data on costs per segment of the EDRM – most of which were not quantified by empirical data because of the exceedingly broad descriptions in the UTBMS codes. Law firms did not break out costs on billing for this matter, so I requested this of them. I had to in order for the legal bills to get paid. Insurers demanded the costs broken out. It was an onerous task that became a very telling commentary on litigation costs tracked for a decade. Cost at that time was primarily in review. We had a pretty sophisticated setup at the time (2001-2008): Lextranet, Attenex, and we used DiscoverReady for review. It was tight as far as costs were concerned, averaging $2.00 a doc.
    Jim became far too busy with his responsibilities with Rand. Nic was immersed in the project and really got what needed to be studied and the focus of the study.
    My small consulting company at the time, “eDiscovery Response Team” had just been formed and needed my time. Nick and I met a couple of times and Nick cleverly took the study to a more granular level by using active data provided by corporations and vendors over a period of time.
    George Socha and I, along with another “village”, committed to work on the modification of the UTBMS codes to reflect actual eDiscovery costs. George and I submitted a long version and a short version. The short version was adopted because change comes slowly. Clearly transparency in legal billing is antithetical to firm profitability.
    I must say your site and the eDiscovery rulings by Judges like Grimm, Facciola, Scheindlin, Peck and good work by Nic and Laura + many more have contributed tremendously to forcing transparency of the costs associated with litigation, particularly review.
    The bottom line is that there are many levels of knowledge and sophistication in this area of law and much inconsistency in the practice of same. Information transfer is necessary for improvement. It takes a Village.
    Carrie Mallen

  10. […] is up to the providers, however, to make these technologies affordable.  Ralph Losey said as much back in 2012 when commenting on the RAND report that touted the transformational potential of […]
