Corrections and Refinements to Predictive Coding Narrative

A careful reader, William H. George of Tanenholz & Ash LLP in Baltimore, caught a mathematical error in my Narrative. He also provided some good suggestions on how to improve the comparative analysis with linear review. Of course, I would correct any error noted, especially in math, but I was happy to see this was an error in my favor. I am not sure how the simple math error happened, but my analysis understated how much better my multimodal predictive coding method was than traditional linear review by over a factor of ten!

For that reason, please discard the prior version of my Narrative and download and use this updated version instead: Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. (I have also corrected it on my prior blog.) Not only is the math now correct, but I also made other slight improvements to the analysis sections at pages 66-71. Note that, to be fair, I changed my hypothetical billing rate for predictive coding multimodal work to $1,000 per hour, and reduced the contract lawyer billing rate to $50 per hour. Even with the hourly rate adjustment, the corrected math shows that my fee would still have been 92.6% less.

Here are excerpts of the paragraphs with the now edited statements, starting on page 67:

A contract review team performing a linear review at a rate of 50 documents per hour would take 13,982 hours to complete the project (699,082/50=13,982). As you have seen in this narrative, I completed the project in 52 hours. I did so by relying in a hybrid manner on my computer to work with me, under my direct supervision and control, to review most of the documents for me.

The comparison shows that manual review is two-hundred and sixty-nine (269) times slower than hybrid multimodal (13,982/52=269).
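The arithmetic in the two paragraphs above can be checked with a short Python sketch. The variable and function names are illustrative only; the figures (699,082 documents, 50 documents per hour, 52 hours) come straight from the text, and rounding up to a whole hour is an assumption that matches the 13,982 figure.

```python
import math

TOTAL_DOCS = 699_082        # documents in the review set (from the text)
LINEAR_DOCS_PER_HOUR = 50   # assumed linear review speed (from the text)
MULTIMODAL_HOURS = 52       # hours reported for the hybrid multimodal review


def linear_review_hours(total_docs: int, docs_per_hour: int) -> int:
    """Hours needed for a full linear review, rounded up to a whole hour."""
    return math.ceil(total_docs / docs_per_hour)


linear_hours = linear_review_hours(TOTAL_DOCS, LINEAR_DOCS_PER_HOUR)
speed_ratio = linear_hours / MULTIMODAL_HOURS

print(f"Linear review: {linear_hours:,} hours")
print(f"Linear review takes about {speed_ratio:.0f} times as long")
```

Changing `LINEAR_DOCS_PER_HOUR` shows how sensitive the comparison is to the assumed manual review speed.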

So much for linear review, especially when testing shows that such manual review over large scales is not more accurate. See, e.g., Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review, Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010. In fact, the Roitblat et al. study showed that a second set of professional human reviewers agreed with the first set of reviewers of a large collection of documents only 28% of the time, suggesting error rates with manual review of 72%!

Saving 93% (even with a billing rate twenty times as high)

Consider the costs of these CAR rides, which are central to my bottom-line-driven proportional review approach. It would be unfair to do a direct comparison and conclude that a linear review costs 269 times more than a predictive coding review. Or, put another way, that the state-of-the-art predictive coding CAR costs 269 times less than the old-fashioned Model-T linear review method. It is an unfair comparison because the billing rate of an attorney skilled in predictive coding would not be the same as that of a linear document reviewer, and the software costs would be higher.

Still, even if you assumed the skilled reviewer charged twenty times as much, the predictive coding review would still cost over thirteen times less.

Let’s put some dollars on this to make it more real. For an old-fashioned linear review utilizing a team of contract attorneys (often billed out anywhere from $45 to $80 per hour, depending on the market and complexity of the review), let’s assume they were billed out at $50 per hour for their services. At 13,982 hours that would generate a fee of $699,100. On the other hand, at a twenty-times higher billing rate of $1,000 per hour, my 52 hours of work would cost the client $52,000. That represents a savings of $647,100. Not to mention the time savings: 13,982 hours would take a 20-person contract attorney team working 40 hours per week roughly 17 weeks to complete, a 40-person team working 40-hour weeks close to 9 weeks, etc. You get the idea.

My multimodal review utilizing predictive coding, even if billing at $1,000 per hour, still cost only 7.4% of what a team of contract review attorneys would have cost undertaking an old fashioned linear review. That is a 92.6% savings.
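For readers who want to check the dollar figures, here is a minimal Python sketch of the fee comparison. The hours and hourly rates are the hypothetical ones stated above; the function names and the 40-hour work week default are illustrative assumptions, not part of any billing system.

```python
LINEAR_HOURS = 13_982     # linear review hours (from the text)
MULTIMODAL_HOURS = 52     # multimodal review hours (from the text)
CONTRACT_RATE = 50        # hypothetical contract attorney rate, $/hour
EXPERT_RATE = 1_000       # hypothetical predictive coding rate, $/hour


def review_fee(hours: float, hourly_rate: float) -> float:
    """Total fee for a review billed by the hour."""
    return hours * hourly_rate


def weeks_to_complete(hours: float, team_size: int, hours_per_week: int = 40) -> float:
    """Calendar weeks a team needs if the work is split evenly."""
    return hours / (team_size * hours_per_week)


linear_fee = review_fee(LINEAR_HOURS, CONTRACT_RATE)
multimodal_fee = review_fee(MULTIMODAL_HOURS, EXPERT_RATE)
savings = linear_fee - multimodal_fee
savings_pct = 100 * savings / linear_fee
cost_ratio = linear_fee / multimodal_fee

print(f"Linear fee:     ${linear_fee:,.0f}")
print(f"Multimodal fee: ${multimodal_fee:,.0f}")
print(f"Savings:        ${savings:,.0f} ({savings_pct:.1f}%)")
print(f"Linear review costs {cost_ratio:.1f} times as much")
print(f"20-person team: {weeks_to_complete(LINEAR_HOURS, 20):.1f} weeks")
print(f"40-person team: {weeks_to_complete(LINEAR_HOURS, 40):.1f} weeks")
```

Under these assumptions the multimodal fee comes to about 7.4% of the linear fee, which is where the 92.6% savings and the "over thirteen times less" figures come from.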

This is significantly more than the estimate of a 75% savings made in the Rand Report, but in the same dramatic-savings neighborhood. See Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012); see also my blog on the Rand Report. I wonder when insurers are going to catch on to this.

. . .

That is the bottom line: seven cents per document versus six dollars and nine cents per document. That is the power of predictive culling and precision. It is the difference between a hybrid, predictive coding, targeted approach with high precision, and a keyword search, gas-guzzler, shotgun approach with very low precision. The recall rates are also, I suggest, at least as good, and probably better, when using far more precise predictive coding, instead of keywords. Hopefully my lengthy narrative here of a multimodal approach, including predictive coding, has helped to show that. Also see the studies cited above and my prior trilogy Secrets of Search: Parts One, Two, and Three.
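The seven-cent figure can be reproduced from numbers already given in this post, as in the sketch below. The names are illustrative. The $6.09 figure depends on material in the elided portion of the excerpt, so it is not recomputed here.

```python
TOTAL_DOCS = 699_082      # documents in the review set (from the text)
MULTIMODAL_FEE = 52_000   # 52 hours at the hypothetical $1,000 per hour


def cost_per_document(total_fee: float, total_docs: int) -> float:
    """Average review cost per document in dollars."""
    return total_fee / total_docs


per_doc = cost_per_document(MULTIMODAL_FEE, TOTAL_DOCS)
print(f"Multimodal review: ${per_doc:.2f} per document")
```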

93% Savings Is Not Possible Under Real World Conditions

In future articles I may opine at length on how my review of the Enron database was able to achieve such dramatic cost savings, 93% ($52,000 vs. $699,100). For one thing, you would hope that attorneys would not review the entire set of documents, even though it had already been technically culled by deduplication, deNisting, and custodian limits. You would hope they would look for further culling alternatives to reduce the total file count. But I am told by review companies that this kind of full linear review of full data sets still happens every day. So it is not far-fetched to assume a full review of all 699,082 documents for comparison purposes. Even assuming the same number of documents are reviewed, I still do not think that this kind of 93% savings will often be possible in real world conditions; 50%-75% is more realistic.

Putting aside the question of software costs, the 50%-75% savings assumes a modicum of cooperation between the parties. My review was done with maximum system efficiency, and thus resulted in maximum savings, because I was the requesting party, the responding party, the reviewer, the judge, and appeals court all rolled into one. There was no friction in the system. No vendor costs. No transaction costs or delays. No carrying costs. No motion costs. No real disagreements, just dialogue (and inner dialogue at that).

. . .

These transaction costs, including especially the friction inherent in the adversarial system, explain the difference between a 93% savings in an ideal world, and a 75%-50% savings in a real world, under good conditions, or perhaps no savings at all under bad conditions. Still, as the software improves, and our review techniques improve, so will the review speeds, the average files per hour. For that reason the savings may continue to increase in spite of the transaction costs.

Even if we increase file review speeds, we must still address the transaction costs that arise out of the adversarial system. Many of these costs arise from unnecessary friction between opposing counsel. . . .

This entry was posted on Thursday, March 21st, 2013 at 7:15 am and is filed under Review, Search, Technology.

10 Responses to Corrections and Refinements to Predictive Coding Narrative

March 30th weekend “Top 20+” e-discovery compendium > “ENOUGH with computer assisted review, already!!” | The Electronic Discovery Reading Room says:

March 29, 2013 at 1:35 pm

[…] Corrections and Refinements to Predictive Coding Narrative – http://bit.ly/14eexvW (@RalphLosey) […]

There Can Be No Justice Without Truth, And No Truth Without Search | e-Discovery Team ® says:

March 31, 2013 at 6:44 pm

[…] of Enron in PDF form for easy distribution and the blog introducing this 82-page narrative, with second blog regarding an […]

Predictive Coding Primer Part One: Estimating Cost Savings says:

April 18, 2013 at 10:22 am

[…] Losey’s Corrections and Refinements to the Predictive Coding Narrative, he tweaks his conclusion and acknowledges that 93% savings may not be possible in real world […]

Predictive Coding Primer Part Two: Key Variables says:

May 8, 2013 at 12:49 pm

[…] review in a case study reviewing 699,082 Enron emails and attachments. However, in Losey’s “Predictive Coding Narrative,” the richness rate was approximately one tenth of one percent – extremely low. The average […]

Technology-Assisted Review: From Expert Mentions to Mainstream Coverage | @ComplexD says:

May 30, 2013 at 10:55 am

[…] Corrections and Refinements to Predictive Coding Narrative – http://bit.ly/14eexvW (@RalphLosey) […]

$3.1 Million e-Discovery Vendor Fee Was Reasonable in a $30 Million Case | e-Discovery Team ® says:

August 4, 2013 at 9:46 pm

[…] which was $0.07 per document, not page, and assumed an SME billing rate of $1,000 per hour. Corrections and Refinements to Predictive Coding Narrative. Now you see why experts are all psyched up about the potential of predictive coding? The cost […]

Legal Search Science | e-Discovery Team ® says:

November 10, 2013 at 11:09 pm

[…] Enron in PDF form for easy distribution and the blog introducing this 82-page narrative, with second blog regarding an […]

depo.com | There Can Be No Justice Without Truth and No Truth Without Search says:

February 14, 2014 at 6:36 am

[…] of Enron in PDF form for easy distribution and the blog introducing this 82-page narrative, with second blog regarding an […]

Latest Grossman and Cormack Study Proves Efficacy of Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part Two | e-Discovery Team ® says:

July 20, 2014 at 3:32 pm

[…] Relevance in the Ashes of Enron (PDF), plus the blog introducing this 82-page narrative, with second blog regarding an […]
