My Basic Plan for Document Reviews: The “Bottom Line Driven” Approach – Part Three

This is part three of the series by Ralph Losey explaining his document review strategy. Please read part one and part two first.

Relative Speeds of Review

Assuming your document review project is large enough, and you can use AI enhanced software and good SMEs, you can determine file relevancy at speeds hundreds of times faster than linear review by contract lawyers, and can do so far more accurately. That is why an SME with good software can charge ten to twenty times as much as a contract lawyer and still do the first pass review at a fraction of the cost. There is a recent example of this cost-savings approach in Gabriel Techs. Corp. v. Qualcomm, Inc., a 2013 patent case in District Court in California that considered the reasonableness of a fee award to a prevailing party.[20] A fee of $2,829,349.10 was awarded for the work of a team of SMEs to review one million documents for possible relevance.

Although I cannot discuss the actual cases I have handled, in 2012 I did a public experiment using what I call a multimodal AI enhanced type of review of 699,082 ENRON documents. I was looking for evidence in this public ENRON dataset concerning involuntary employee terminations, a fairly simple relevancy determination. I completed the first pass review at an average speed of 13,444 files per hour.[21] Speeds such as this are common in the type of employment law issues I have been dealing with in the past couple of years, but similar speeds are attainable in other types of cases too. For instance, I recently completed another more complex first-pass review of over a million and a half documents at an average speed of 35,831 files per hour. I did this review myself, since I happened to be the only SME available for this project involving non-employment law legal issues. There were significant time pressures for this review and I completed the first-pass review of over a million and a half documents in only two weeks.

Back to the Small Case Example

Back to the small case example of only 25,000 documents, let us assume a modest AI enhanced first-pass review speed of 2,000 files per hour. This means an SME could complete the review in 12.5 hours. (Typically SMEs using AI enhanced software review far more than 25,000 documents and so are able to attain faster speeds. The math and cost savings still work if the cost of using the predictive coding software and other transaction costs are not too high.) It would probably take the SME about 2.5 hours to master the particular factual issues in the case, so let us assume a total time of 15 hours, and a billing rate for this SME of $500.00 per hour. (In a small case like this, SMEs at relatively low rates like this are common, whereas in larger cases the SME rates can be much higher, but the speed of review and savings realized can also be much larger.) That means an expense for first pass review (excluding software charges) of $7,500.00, which is less than half the cost of traditional manual review.

Under a traditional contract lawyer review, where we assume a very fast speed (for them) of 75 files per hour (producing a total of 333.33 hours of review time), and a very low unmotivated lawyer rate of $50.00 per hour, you have a projected fee of $16,666.50. Even though the $500 per hour rate of the SME is ten times higher than the contract lawyers' rate, since the SME is 26.67 times faster, the SME AI enhanced review still costs less than half as much. That is because it would take the contract lawyers 333.33 hours to complete the project, and, this is important, they would necessarily do so with a far lower accuracy rate. The contract lawyers working without the assistance of AI augmentation are likely to find far fewer relevant documents than the automated SME approach.[22] This makes clear the power and importance of SMEs driving predictive coding review, and why, along with their current scarcity, attorneys with both subject matter expertise and advanced search skills are now in such demand.[23]
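The comparison above can be checked with a few lines of arithmetic. What follows is only a minimal sketch in Python using the figures assumed in this hypothetical; the function name review_cost and its parameters are my own labels for illustration, not part of any review software.

```python
def review_cost(documents, files_per_hour, hourly_rate, prep_hours=0.0):
    """Return (hours, cost) for one review pass.

    prep_hours is time spent learning the case before review starts
    (an illustrative parameter; 2.5 hours in the hypothetical above).
    """
    hours = documents / files_per_hour + prep_hours
    return hours, hours * hourly_rate


DOCUMENTS = 25_000

# SME with AI enhanced software: 2,000 files/hour at $500.00/hour.
sme_hours, sme_cost = review_cost(DOCUMENTS, 2_000, 500.00, prep_hours=2.5)

# Contract lawyers, linear review: 75 files/hour at $50.00/hour.
contract_hours, contract_cost = review_cost(DOCUMENTS, 75, 50.00)

speed_ratio = 2_000 / 75                # how much faster the SME reviews
cost_ratio = sme_cost / contract_cost   # SME cost as a share of manual cost

print(f"SME:      {sme_hours:.2f} hours, ${sme_cost:,.2f}")
print(f"Contract: {contract_hours:.2f} hours, ${contract_cost:,.2f}")
print(f"Speed ratio: {speed_ratio:.2f}x, cost ratio: {cost_ratio:.0%}")
```

Note that the unrounded contract lawyer fee comes to about $16,666.67. The $16,666.50 figure in the text results from rounding the hours to 333.33 before multiplying by the rate.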

In our hypothetical, after the first pass review of 25,000 documents the slower protection review begins. Now the highly skilled SMEs are no longer required. Lower-paid contract lawyers, or associates and paralegals, can do the review of the documents the SMEs have determined to be relevant. (There is still, of course, expert attorney supervision required of the contract reviewers, associates and paralegals.) Let us assume that the first pass review found that 2,500 of the 25,000 documents were relevant. (The ten percent relevance ratio, aka “prevalence rate,” is a relatively high rate.) This means that only 2,500 documents are subject to second pass review for confidentiality. Let us assume this work goes at an average rate of 20 files per hour. (Remember this review includes incidental related activities such as preparing a privilege log.) The 2,500 documents could be reviewed for protections in 125 hours, which at $50.00 per hour is a cost of $6,250.00. So our base minimum review cost for both passes is $13,750.00.

I say base minimum because you have additional expenses beyond just contract reviewer time, or junior associate and paralegal time, including the expense of partner, senior associate, and project manager time to supervise other attorneys and paralegals, perform quality control reviews, etc., plus software and hosting costs. Let us assume that is another $7,000.00 cost here, for a total expense of $20,750.00.

Congratulations! You came in under your $25,000.00 budget. You completed your review of 25,000 documents at a cost of $0.83 per document instead of the $1.00 per document you had estimated. For smaller projects of fewer than 25,000 documents, where AI enhanced review would probably not be cost effective, the per document estimate would be higher. Still, a multimodal tested keyword approach can achieve substantial savings and better quality than a brute force linear “review everything” approach.
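For readers who want to plug in their own numbers, the two-pass budget in this hypothetical can be laid out as a short Python sketch. All variable names are illustrative, and the $50.00 second-pass rate is the one implied by $6,250.00 over 125 hours.

```python
TOTAL_DOCUMENTS = 25_000
BUDGET = 25_000.00

# First pass: SME relevance review (15 hours at $500.00/hour).
first_pass_cost = 15 * 500.00

# Second pass: protection review of the relevant documents only.
relevant_documents = 2_500            # found relevant in first pass
prevalence = relevant_documents / TOTAL_DOCUMENTS
second_pass_hours = relevant_documents / 20   # 20 files per hour
second_pass_cost = second_pass_hours * 50.00  # implied $50.00/hour rate

# Supervision, quality control, software and hosting (assumed above).
overhead = 7_000.00

base_minimum = first_pass_cost + second_pass_cost
total_cost = base_minimum + overhead
cost_per_document = total_cost / TOTAL_DOCUMENTS

print(f"Prevalence:        {prevalence:.0%}")
print(f"Base minimum:      ${base_minimum:,.2f}")
print(f"Total cost:        ${total_cost:,.2f}")
print(f"Cost per document: ${cost_per_document:.2f}")
print(f"Under budget:      {total_cost <= BUDGET}")
```

Changing relevant_documents shows how sensitive the total is to prevalence, since every relevant document must go through the slower second pass.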

That $0.83 per document in this small case hypothetical is pretty good. But in larger projects, where you are talking about millions of documents to review, and more realistic prevalence rates, where frequently less than five percent of the total documents are relevant and thus need a second review, the savings can be even greater. With larger document volumes the review speeds are much higher, and per document review costs are lower, sometimes much lower.

Estimation Caveats

You could estimate all of these costs in advance by having a bank of experience to draw upon, knowing the likely costs per file range. But remember, even in the world of repeat litigation, like employment law claims, all projects are different. All document sets are different. You have to, as I like to say, get your hands dirty in the digital mud. You have to know your ESI collection before you make an estimate.

Even in just one type of electronic document, the one most common in e-discovery today, email and attachments, the variances in email collections can be tremendous. With those variables come different review speeds and confidentiality concerns. The review speeds, and thus the review costs, depend on the density of the documents themselves and difficulty of the document classifications. That is where the art of estimation comes in and depends on your understanding of the project. You have to understand the blueprints and specs of a project before you can make a valid estimation.

This is especially true of the SMEs’ first-pass work. You need to do some sampling to see what review rates apply. How long will it take these particular SMEs to do the tasks assigned to them in this case with this data? Sampling is the only reliable way to find out, especially when it comes to the all-important prevalence calculations. As a general rule of thumb I have found that the higher the prevalence, the lower the review speed and thus the higher the cost. Random sampling to determine prevalence is not only a routine part of my normal eight-step AI-enhanced review process to ensure quality control,[24] but also an important step to give reliable cost estimates.
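To illustrate how a random sample can feed a cost estimate, here is a minimal Python sketch. It is an assumption-laden illustration, not a description of the eight-step process: the sample size and count are hypothetical, the interval uses the simple normal approximation for a proportion (my choice for brevity), and the function names are invented for this example.

```python
import math


def estimate_prevalence(sample_size, relevant_in_sample, z=1.96):
    """Point estimate and approximate 95% interval for prevalence.

    Uses the normal approximation for a binomial proportion, which is
    an assumption of this sketch and is weak for very small samples
    or very low prevalence.
    """
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


def second_pass_cost(documents, prevalence, files_per_hour=20,
                     hourly_rate=50.00):
    """Projected protection-review cost for the relevant documents."""
    return documents * prevalence / files_per_hour * hourly_rate


# Hypothetical sample: 1,500 documents drawn at random, 150 coded relevant.
p, low, high = estimate_prevalence(1_500, 150)

for label, rate in (("low", low), ("point", p), ("high", high)):
    cost = second_pass_cost(25_000, rate)
    print(f"{label:>5}: prevalence {rate:.1%}, second pass ${cost:,.2f}")
```

Reporting the low and high ends alongside the point estimate gives the client a cost range rather than a single number, which is usually the more honest way to present an estimate built on a sample.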

To Be Continued …..

[20] Gabriel Techs. Corp. v. Qualcomm, Inc., No. 08CV1992 AJB (MDD), 2013 WL 410103 (S.D. Cal. Feb. 1, 2013) (keyword bulk culled 12,000,000 documents down to 1,000,000 (not a best practice) and then used predictive coding type rule based algorithms (not AI) to determine relevancy of the remaining 1,000,000 documents).

[21] See Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron found at http://e-discoveryteam.com/wp-content/uploads/2013/04/predictive-coding-narrative_corrrected_3-21-13.pdf

[22] To back up the likely far better accuracy claim for an SME driving a CAR, please see the following scientific studies: Grossman & Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Rich. J.L. & Tech., Spring 2011; Grossman & Cormack, Inconsistent Responsiveness Determination; Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review, Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010; Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, 36:5 Information Processing & Management 697, 701 (2000); Losey, R., A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents, found in two parts at http://e-discoveryteam.com/2013/06/11/a-modest-contribution-to-the-science-of-search-report-and-analysis-of-inconsistent-classifications-in-two-predictive-coding-reviews-of-699082-enron-documents/, and http://e-discoveryteam.com/2013/06/17/comparative-efficacy-of-two-predictive-coding-reviews-of-699082-enron-documents/.

[23] See David Cowen, Job Market Heating Up for e-Discovery Technologists, Managers, and Attorneys; Losey, R., Reinventing the Wheel: My Discovery of Scientific Support for “Hybrid Multimodal” Search found at http://e-discoveryteam.com/2013/04/21/reinventing-the-wheel-my-discovery-of-scientific-support-for-hybrid-multimodal-search/.