Day Ten of a Predictive Coding Narrative: A post hoc test of my hypothesis of insignificant false negatives

This is the seventh in a series of narrative descriptions of my predictive coding search of 699,082 Enron emails. My legal search methodology is predictive coding dominant, but includes the four other basic types of search in a process I call hybrid multimodal. The five elements of hybrid multimodal search are shown below using the Olympic rings symbol in honor of the great XXX Olympics concluding now in London.

The preceding narratives are:

In this seventh installment I continue my description, this time covering day ten of the project.

Post Hoc Analysis

In Day Ten I subject myself to another quality control check, another hurdle, to evaluate my decision in day eight to close the search. My decision to stop the search in day eight after five rounds of predictive coding was based on the hypothesis that I had already found all significant relevant evidence. In my opinion the only relevant documents that I had not found, which in information science would be called false-negatives, were not important to the case. They would have some probative value, but not much, certainly not enough to continue the search project.

Put another way, my supposition was that the only documents not found and produced would be technically relevant only, and of no real value. They would certainly not be highly relevant (one of my coding categories). Further, the relevant documents remaining were probably of a type that I had seen before. They were cumulative in nature and thus not worth the extra time, money and effort required to unearth them. See my Secrets of Search, Part III, where I expound on the two underlying principles at play here: Relevant Is Irrelevant and 7±2.

This tenth day exercise was a post hoc test because I had already concluded my search based on my hypothesis that all significant relevant documents had been discovered. I confirmed this hypothesis to my satisfaction in the previously described Day Nine elusion quality control test. This was a random sample test with a 99.9% accuracy finding. (This is in no way intended to imply 99.9% recall. The elusion test is not intended to calculate recall.) In the elusion test I did a random sample test of all unreviewed documents to search for significant relevant evidence. Only one false negative out of a random sample of 1,065 was found and it was not significant. So I passed the test that was built into my quality control system. But would I now pass this additional post hoc test for significant false negatives?
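For readers who want to check the arithmetic behind the elusion figure, here is a minimal sketch in Python. It uses only the two numbers reported above (one false negative in a random sample of 1,065). The function names are hypothetical, and the one-sided 95% upper bound is an illustrative addition of a standard exact binomial calculation, not something computed in the original project.

```python
from math import comb


def elusion_rate(false_negatives: int, sample_size: int) -> float:
    """Share of the sampled, unreviewed documents that turned out to be relevant."""
    return false_negatives / sample_size


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of seeing k or fewer relevant documents in n draws at true rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def elusion_upper_bound(k: int, n: int, confidence: float = 0.95) -> float:
    """One-sided exact binomial upper bound on the true elusion rate.

    Illustrative assumption: the narrative reports only the point estimate.
    Found by bisection, since binom_cdf decreases as p grows.
    """
    alpha = 1 - confidence
    lo, hi = k / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # rate still too plausible, move the bound up
        else:
            hi = mid
    return (lo + hi) / 2


rate = elusion_rate(1, 1065)
accuracy_pct = round((1 - rate) * 100, 1)  # the "99.9% accuracy" figure
upper = elusion_upper_bound(1, 1065)

print(f"elusion rate: {rate:.4%}")
print(f"accuracy: {accuracy_pct}%")
print(f"95% upper bound on elusion rate: {upper:.4%}")
```

The point estimate says how clean the sample was; the upper bound is a reminder that the true rate in the unreviewed set could be somewhat higher than what one sample shows. Neither number is a recall figure.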

Day Ten: 3 Hours

I start the day by initiating another round of predictive coding, the sixth round. It only takes a minute to start the process.

As I write this I am now waiting on Inview to do its thing and re-rank all 699,082 documents according to the new input I provided after the last session. This new input was described in Days Seven and Eight. It included my manual review and coding of two sets of 100 computer-selected training documents (total 200), plus review of all 51% plus predicted relevant documents.

At the end of day eight I had attained a count of 659 confirmed relevant documents and decided that enough was enough. I decided that any further rounds of predictive coding would likely just uncover redundant relevant documents of no real importance. I decided to stop the search, at least temporarily, to see if I would pass a random sample elusion test for false negatives that I described in Day Nine.

As you know, I passed the test in Day Nine and so the project ended. And yet, here I am again, subjecting myself to yet another test. This Day Ten exercise is the result of my ethical wranglings described at the end of Day Nine.

Mea Culpa

I am still waiting on Inview to give me more information, but whatever the findings, when I now look back on day eight, it seems to me that I made a mistake in stopping the search when I did. Even if I pass this latest self-imposed test, and the decision is proven to be correct, it was still a mistake to have stopped there. Hopefully, just a slight mistake, but a mistake just the same. I had already trained 200 documents. I had found one new Highly Relevant document. I had provided new training information for the computer. Why not just take a couple more hours to see what impact this would have?

The lesson I learned from this, which I pass on to you here, is never to stop a project until you see the last report and results of training documents. Why guess that nothing of importance will come of the next training when it is easy enough to just run another round and find out? The answer, of course, is time and money, but here I guessed that only a few new relevant documents would be found, so the costs of the extra effort would be negligible. In retrospect, I think I was too confident and should have trusted my instincts less and my software more. But I will soon see for myself if this was harmless error.

Moment of Truth

The moment of truth came soon enough on a Sunday morning as I logged back on to Inview to see the results of the Sixth Round. I began by running a search for all 51%+ predicted relevant documents. The search took an unusually long time to run. In the meantime I stared at this screen.

Call me over-dramatic if you will, but I was getting nervous. What if I made a bad call in stopping before the sixth round?

Finally it completed. There were 566 documents found. So far so good. Slight sigh of relief. If it were thousands I would have been screwed. Remember, I had already coded 659 documents as relevant. The computer’s predicted relevant numbers were less than my last actuals.

After determining the count, I sorted by categorization to see how many of the predicted relevant had not previously been categorized. In other words, how many of the 566 were new documents that I had not looked at before? Another slight sigh of relief. The answer was 51. These were 51 new documents that I would now need to look at for the final test. So far, this is all as predicted. But now to see if any of them were significant relevant. (Remember, I had predicted that some relevant would be left, just not significant relevant.)

I noticed right away that 1 of the 51 documents had already been reviewed, but not categorized. I frequently did that for irrelevant documents of a type I had seen before. It was an Excel spreadsheet with voluntary termination payout calculations. I still thought it was irrelevant. Now on to the 50 documents that I had not reviewed before.

The 50 New Documents

Four of the fifty were the same email with the subject Bids Open for Enron Trading Unit. They had a 71.3% prediction of relevance. It was an AP news article. It had to do with an upcoming bankruptcy sale of Enron contracts. It included discussion of employees complaining about Enron’s employee termination policy. Here is the relevant excerpt for your amusement. Note the reference to protesters carrying Moron signs.

It might be relevant, or might not. It was a newspaper article, nothing more. No comments by any Enron employees about it. I guess I would have to call it marginally relevant, but unimportant. There were now only 46 documents left to worry about.

The next document I looked at was a three-page Word document named Retention program v2.doc. It had to do with the payment of bonuses to keep employees from leaving during the Enron collapse. It had a 59.3% probable relevant prediction. I considered it irrelevant. There were several others like that.

Another document was an email dated November 15, 2001 concerning a rumor that Andy Fastow was entitled to a nine million dollar payout due to change in control of Enron. I remembered seeing this same email before. I checked, and I had seen and marked several copies or versions of this email before as marginally relevant. Nothing new at all in this email. There were several more document examples like that, about 25 altogether, documents that I had seen before in the exact same or similar form. Yes, they were relevant, but again duplicative or cumulative. It was a complete waste of time to look at these documents again.

I also ran into a few documents that were barely predicted relevant that had to do with voluntary termination and payment of severance for voluntary termination. The software was still having trouble making the differentiation between irrelevant voluntary and relevant involuntary. It was understandable in view of the circumstances. It was a grey area, but bottom line, none of these borderline documents presented were deemed relevant by me during this last quality control review.

One new relevant document was found, a two page spreadsheet named Mariner events.xls bearing control number 1200975. It had an agenda for a mass termination of employees on August 23, 2001. It apparently pertained to a subsidiary or office named Mariner. I had seen agendas like this before, but not this particular one for this particular office. I had called the other agendas relevant, so I would have to consider this one relevant too. But again, there was nothing especially important or profound about it.

In that same category as a new relevant document, but not important, I would include an email dated November 20, 2001, from Jim Fallon, bearing control number 11815873, who was trying to get his employment agreement changed to, among other things, provide benefits in case of termination.

The last document I considered seemed to address involuntary terminations and tax consequences of some kind concerning a so-called clickathome program. Frankly, I did not really understand what this was about from this email chain. The last date in the chain was June 15, 2001. The subject line is Clickathome – proposed Treatment for Involuntary Terminations – Business reorganizations. It has control number 15344649 and is three pages long. It was predicted 66.9% likely relevant. The emails look like they pertain to employees who are transferred from one entity to another, and do not really involve employment termination at all. I cannot be sure, but it certainly is not important in my mind. Here is a portion of the first page.

I was kind of curious as to what the clickathome program was that the emails referred to, so I Googled it. At page two I found an Enron document that explained:

clickathome is Enron’s new program that gives eligible employees a computer and Internet connection (including broadband connectivity where available through program-approved vendors) for use at home.

Now I understood the reference in the email to a “PC forfeiture penalty.” I suppose maybe this email chain worrying about tax consequences of PC forfeiture in the clickathome program might be technically relevant, but again, of no importance. Just to be sure I was not missing anything, I also keyword searched the Enron database for clickathome and found 793 hits. I looked around and saw that many emails and documents pertaining to the clickathome program, under which an Enron employee could get a free PC from Dell, had been reviewed before and classified as irrelevant. I was now comfortable that this email chain was also unimportant.

Hypothesis Tested and Validated

This meant that I was done. The second quality control test was over. Although I found 32 technically relevant documents as described above, no major relevant documents had been found. I had passed another test. (If you are still keeping score, the above additional review means I found a total of 691 relevant documents (659+4+25+1+1+1) out of my yield point projection at the beginning of the project of 928 likely relevant. That means a score of almost 75%. Not bad.)
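The score-keeping in the parenthetical can be reproduced with a short tally. This is only a sketch: the variable names and the labels attached to each count are my reading of the review described above, not categories from the review software, and the 928 figure is the yield point projection from the start of the project, so the resulting ratio is an estimate rather than a measured recall.

```python
# Counts taken from the narrative; labels are illustrative assumptions.
confirmed_at_day_eight = 659

round_six_relevant = {
    "AP news article (four copies)": 4,
    "duplicative or cumulative documents": 25,
    "Mariner events.xls agenda": 1,
    "Jim Fallon employment agreement email": 1,
    "clickathome email chain": 1,
}

projected_relevant = 928  # yield point projection, itself an estimate

new_relevant = sum(round_six_relevant.values())
total_relevant = confirmed_at_day_eight + new_relevant
estimated_recall = total_relevant / projected_relevant

print(f"new technically relevant documents: {new_relevant}")
print(f"total relevant found: {total_relevant}")
print(f"share of projected yield: {estimated_recall:.1%}")
```

Keeping the per-category counts in one place makes it easy to see that the 32 new documents and the 691 total come from the same five numbers.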

It all went pretty much as expected and predicted at the end of Day Eight. I had wasted yet another perfectly good Sunday afternoon, but at least now I knew for sure that the sixth round was not necessary. My hypothesis that only marginally relevant documents would turn up in another round had been tested and validated.

I suppose I should feel happy or vindicated or something, but actually, tired and bored are the more accurate adjectives to describe my current mood. At least I am not feeling embarrassed, as I was concerned might happen.

By the way, the three hours that this last day took would have gone faster but for the many Internet disconnects I experienced while working from home. My three hours of reported time did not include the substantial write-up time, nor time waiting for the computer to train. Sigh. The testing and writing are over. Time to jump in the pool!

Conclusion: Come On In, The Water’s Fine

I hope this longer than intended narrative fulfills its purpose and encourages more lawyers to jump in and use predictive coding and other advanced technologies. The water is fine. True, there are sharks in some pools, but they are outside the pool too. They are a fact of life in litigation today. Discovery As Abuse is a systemic problem, inherent in the adversarial model of justice. The abuses are by both sides, including requesters who make intentionally over-broad demands and drive up the costs every chance they get, and responders who play hide-the-ball. Predictive coding will not cure the systemic flaws, but it will lessen the bite.

The multimodal hybrid CAR with a predictive coding search engine can mitigate your risks and your expenses. More often than not, it can save you anywhere from 50% to 75% in review costs and improve recall. The new technology is a win-win for both requesting parties and responding parties. I urge everyone to give it a try.

When you go in and swim please remember the five rules of search safety. They were explained in my Secrets of Search trilogy in parts One, Two, and Three and are shown below in another version of the Olympic rings.

These five, when coupled with the five Olympic rings of multimodal search shown at the top of this essay, provide a blueprint for effective legal search. These ten, shown as one large symbol below, are a kind of seed set of best-practices principles. The legal profession can use them as a beginning to develop full peer-reviewed standards for reasonable legal search. I join with Jason R. Baron and others in a call for these new standards.
