Days Three and Four of a Predictive Coding Narrative: Where I find that the computer is free to disagree

This is the third in a series of detailed descriptions of a legal search project. The project was an academic training exercise for Jackson Lewis e-discovery liaisons conducted in May and June 2012. I searched a set of 699,082 Enron emails and attachments for possible evidence pertaining to involuntary employee terminations. The first day of search is described in Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. The second day is described in Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane.

The description of day two was short, but it was preceded by a long explanation of my review plan and search philosophy, along with a rant in favor of humanity and against over-dependence on computer intelligence. Here I will just stick to the facts of what I did on days three and four of my search using Kroll Ontrack’s (KO) Inview software.

Third Day of Review (4 Hours)

I continued to search for more documents, relevant or irrelevant, that would be useful in training for the next round of predictive coding. I started the day by running concept searches based on the first twenty-six documents that I had already identified as relevant. I reviewed from 20 to 50 of the most similar documents per search (one search was especially good, so I reviewed the top 100 similar documents). I only bothered to mark the relevant ones, most of which I also marked to train. I took the time to mark only a few irrelevant documents, and I marked those to train as well. These documents were close to relevant, and I wanted to try to teach Inview the distinctions.
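For readers curious about what a concept search does under the hood: this narrative does not describe how Inview actually computes similarity, so the sketch below swaps in the simplest version of the idea. It ranks every document by bag-of-words cosine similarity to a seed document and keeps the top k for review. The function names and the sample documents are hypothetical, and commercial concept search tools use more sophisticated models than this.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lowercase word tokens in text (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed: str, corpus: dict[str, str], k: int) -> list[tuple[str, float]]:
    """Rank corpus documents by similarity to the seed text and return the top k."""
    seed_vec = term_vector(seed)
    scored = [(doc_id, cosine(seed_vec, term_vector(text))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical mini-collection; IDs and text are made up for illustration.
corpus = {
    "doc-a": "severance paid on termination of employee",
    "doc-b": "gas trading desk daily report",
    "doc-c": "termination letter",
}

for doc_id, score in most_similar("employee termination severance", corpus, k=2):
    print(doc_id, round(score, 3))
```

Here doc-a shares all three seed terms and ranks first, doc-c shares one and ranks second, and the trading report shares none, so it falls outside the top 2. Reviewing "the top 20 to 50" is just a larger k.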

This exercise of reviewing about 1,500 docs took about four hours. Review is faster when you don’t have to mark (code) every document. I attained an average speed of 375 files per hour, even though a few documents required real thought and a careful look to determine whether the termination was voluntary or not, or whether they otherwise met the relevancy scope for this assignment. As a result of this exercise, I have now found a total of 55 relevant docs, plus 8 more highly relevant docs (total 63).
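The day-three arithmetic is simple, but writing it down makes it easy to reuse when planning a review budget. A minimal sketch using the figures above; the helper names are hypothetical.

```python
def review_rate(docs_reviewed: int, hours: float) -> float:
    """Average review speed in documents per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return docs_reviewed / hours


def hours_needed(doc_count: int, docs_per_hour: float) -> float:
    """Hours required to review doc_count documents at a given speed."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return doc_count / docs_per_hour


rate = review_rate(1500, 4)  # day three: about 1,500 docs in about 4 hours
print(rate)                  # 375.0
print(55 + 8)                # relevant plus highly relevant found so far: 63
print(hours_needed(1500, rate))  # 4.0
```

Keep in mind that the 375-per-hour pace reflects a pass where most documents were not coded; a pass that codes every document would likely run slower.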

I did not do any of the IRT (“Intelligent Review Technology”) ranking-based reviews at this time because I wanted to train the system more before investing time in them. I did not think Inview had enough relevant documents to train on from the first session, only 2, so I did not want to spend time on ranking-driven reviews yet.

The Inview system gives you the flexibility to hold off on those reviews; some other predictive coding software does not. Still, for academic record-keeping purposes, I ran a search for all docs ranked 50% or higher probable relevant. It is interesting to observe that Inview agreed with me: it did not rank any docs (-0-) as probably relevant (51% or more).

Fourth Day of Review (8 Hours) 

I began the fourth day by attempting to expand upon the Highly Relevant docs found yesterday, and went from 8 to 14 hot documents. I did this by right-clicking on each of the six hot documents (yes, you can do that on a Mac too) that I started with. I then selected Add Associated Documents from the drop-down menu. That opens seven more menu selections (Family Members, Threads and Attachments, Duplicates, Near-Duplicates, Core Near-Duplicates, Duplicate Family Members, and Contextual Duplicates).

I ran through most of them for all six docs (and for the 8 additional docs this exercise found) to find the additional highly relevant docs. This took about three hours. I am sure I could do it faster next time. By doing, and from later instructions by KO experts, I learned several new tricks for doing this kind of expansion work more quickly. As with anything, the more you work with new software, the faster you can go. My thousands of hours playing video games helped too. The next generations of gamers will be able to go even faster, I’m sure.

At this point I initiated another learning session (KO’s term), and so we are now finally starting round two of predictive coding. I figured the KO mainframe computers might take an hour or more to run the learning session, so I signed off and did something else for a few hours. You do not have to stay connected for your commands to execute on the mainframes.

Second Round of Predictive Coding

The machine learning, a/k/a auto coding, completed, so at this point I had run two seed set trainings and two auto coding iterations. Note that this iterative design follows standard project management protocol for creative IT processes. See the diagrams below, adapted from standard Microsoft project management illustrations.

At the end of the second iteration of seed set builds, the total number of training documents, by coincidence, was exactly 2,000 documents. 1,507 of the documents were selected at random by the computer (a/k/a “system identified”) in the first session, and another 493 were selected by me (“trainer identified”) and marked for training before the second session.
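A trivial tally confirms the seed set arithmetic. The string labels are hypothetical stand-ins for however the software records each training document's origin; only the counts come from this narrative.

```python
from collections import Counter

# Hypothetical per-document labels; only the counts come from the narrative.
training_docs = ["system identified"] * 1507 + ["trainer identified"] * 493

by_source = Counter(training_docs)
total = sum(by_source.values())

print(by_source["system identified"], by_source["trainer identified"], total)  # 1507 493 2000
print(f"trainer-identified share: {by_source['trainer identified'] / total:.2%}")
```

The split shows that roughly a quarter of the training set came from my own judgmental selections, with the rest from the computer's random sample.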

I then ran what Inview calls an iC Relationships search, looking for an iCategory Probability of Relevance of 51% or higher. It returned 162 docs. Recall that after the first round, based on only 2 relevant documents with the rest marked irrelevant, there were zero (-0-) docs with probable (51%+) relevance predictions. Now, after the second round, with 55 marked relevant in the seed set, the computer returns 162 as probably relevant and assigns a likelihood of relevance to each. In other words, it found 111 new docs that it thought were likely relevant. Their probabilities ranged from 99.9% down to 54.3%.

The computer keeps going and assigns probabilities below 51% too, all the way down to 1%, but I did not include those in this search and review. I only wanted to see the documents the computer predicted would “more likely than not” be relevant. You could instead look at only the 90%-plus docs, or whatever probability range you wanted, depending on your purposes, including proportionality, i.e., the number of docs you could afford to have reviewed. You would just adjust the search parameters.

Inview gives you a lot of flexibility in the ways you can look at the probability sets of the total collection. Still, I was not satisfied and complained about the workflow involved. (Personal note: I have never met a piece of software that I did not think needed improvement, nor a program I could not crash.) I have been promised that a future version of Inview (coming soon) will make the display ranking easier to see. It will include a new column in the general display with a probability percentage ranking. Then all you will have to do is click on the probability % column to sort the total documents display in either descending or ascending order. You can do something like that now by displaying the IP Relevance score, but that is too complicated to explain here, and anyway it is awkward and, to me at least, does not work as well.
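Inview handles this through its own search screens. The following is only a hedged sketch of the underlying logic: keep the documents whose predicted probability falls in a chosen range, then sort them highest first, which is what the promised probability column would let you do with one click. The control numbers and relevance scores are ones discussed in this post; the type and function names are hypothetical.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    control_number: str
    probability: float  # predicted probability of relevance, 0.0 to 1.0


def in_probability_range(preds, low, high=1.0):
    """Keep predictions with low <= probability <= high, sorted highest first."""
    hits = [p for p in preds if low <= p.probability <= high]
    return sorted(hits, key=lambda p: p.probability, reverse=True)


# Control numbers and relevance scores mentioned in this post.
predictions = [
    Prediction("12005925", 0.648),
    Prediction("12007393", 0.775),
    Prediction("8400149", 0.545),
    Prediction("12000474", 0.747),
    Prediction("12005730", 0.501),
]

# "More likely than not" cut, as used in this round.
for p in in_probability_range(predictions, low=0.51):
    print(p.control_number, f"{p.probability:.1%}")

# A stricter 90%-plus cut, e.g. for a tighter proportionality budget.
print(in_probability_range(predictions, low=0.90))
```

With a 51% floor, the four documents at 54.5% and above come back, highest first, and 12005730 at 50.1% drops out. None of these five samples clears a 90% floor, so the second query returns an empty list.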

Training Inview on Close Distinctions

One document worth pointing out, control number 12005925, is representative of many others. It is predicted to be 64.8% likely relevant. It is a one-page employee memo agreement concerning payment of an end-of-year performance bonus. It mentions termination of employment as grounds for forfeiture of the bonus.

I know that it looks and reads a lot like similar documents used for pay-outs when an employee is terminated, such as a short release. I suppose that is why the computer thought it was probably relevant, but I knew that it was not. So I marked it irrelevant and marked it to train, continuing the effort to teach the computer to distinguish between this irrelevant document and similar ones that are relevant. In my experience with this and other predictive coding software, training on such fine distinctions may take several rounds of instruction. (The same holds true with humans, by the way.)

Computer is Free to Disagree with Me

Another interesting document to consider is control #12007393. I had already marked this as irrelevant after the first iteration. I took the time to add a yellow sticky note commenting that it was a close question, but I decided to call it irrelevant.

It is a 2-page form waiver and release of claims agreement. It could be used in the case of a termination, or maybe not; it is a form. The computer’s analysis was that this document was 77.5% likely relevant. The computer did not change my coding of the document as irrelevant, but it did not let my prior marking stand in the way of its own analysis that the document was probably relevant. Interesting, eh? The computer is free to disagree with me.

This experience contradicts, or rather refines, the whole GIGO theory (garbage in, garbage out). You can make mistakes, and the computer will push back and suggest corrections. I looked carefully at the document again based on its input, but I stuck to my guns and stayed with the irrelevant call. With another document, though, I might be persuaded. Once you get used to the computer disagreeing with you, you start to realize what a cool feature it is. That is especially true when you consider the fuzziness of relevance, and how it can change over time during a review, and compare that with the complete consistency of computer code. Yes, I am a believer in hybrid, but not Borg, where the computer makes the final decisions.
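The disagreements described in this section can be surfaced systematically. A minimal sketch, assuming each document carries both the human coding and the machine's relevance probability, and using the same 51% "more likely than not" cut as above. The names are hypothetical and this is not an Inview feature; in keeping with the hybrid approach, it only lists candidates for a second look and never changes a coding decision. The control numbers and scores are the ones discussed in this post.

```python
from typing import NamedTuple


class CodedDoc(NamedTuple):
    control_number: str
    human_relevant: bool        # the reviewer's call, which stays final
    machine_probability: float  # predicted probability of relevance


def flag_disagreements(docs, threshold=0.51):
    """Return docs whose machine call (probability >= threshold) differs from the human call.

    Flagged docs are only candidates for re-review; nothing is recoded here.
    """
    return [
        d for d in docs
        if (d.machine_probability >= threshold) != d.human_relevant
    ]


docs = [
    CodedDoc("12005925", False, 0.648),
    CodedDoc("12007393", False, 0.775),
    CodedDoc("8400149", False, 0.545),
    CodedDoc("12000474", True, 0.747),
    CodedDoc("12006691", True, 0.803),
    CodedDoc("12005730", True, 0.501),
]

for d in flag_disagreements(docs):
    print(d.control_number, "human:", d.human_relevant, "machine:", d.machine_probability)
```

Three of the flags are documents I coded irrelevant that the machine scored above 51%; the fourth, 12005730, is a hot document the machine scored at only 50.1% relevant.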

For another example of the computer disagreeing, slightly, see the document with control number 8400149. I had marked this letter irrelevant, but the computer gave it a 54.5% chance of relevance. It was a letter referencing consultation with a lawyer about an employment agreement, and what happens if employment is terminated. So I can understand the confusion, but again I am pretty certain I am right about that one.

For one more close call, see 12000474. By the way, to the right is a screen shot of how to quickly find any document using the Inview control numbers. Each document has its own number, and if it is multiple pages, it has a beginning and end control number.

Again, I had placed a sticky note in my prior review saying that this document was a close question. The computer agreed with me that it was relevant, but rated it only 74.7% likely. It seems to understand that it is a close question too, even though I cannot share anything more with the computer than yes or no. (In future versions of IRT we hope to see ranked relevancy, but that may take some time to perfect, and no one has that level of sophistication yet.)

I also searched for documents with a 51% or higher probability of being Highly Relevant. I found 12 docs, only one of which was new, control number 12005880. It was a near-duplicate of another email already marked highly relevant; I had not seen it before, or possibly had seen it but missed the connection (as it is not obvious). Recall that we had found only 14 Highly Relevant documents before and trained on 13. So it is not surprising that it returned only one new one, as it had so little to go on.

What is somewhat surprising is that the computer essentially disagreed with me on two documents that I had marked as hot (Highly Relevant). On control number 12006691, Inview gave only a 30.4% probability of hot, but did agree that it was relevant, with a probability score of 80.3%. This was a form letter for termination only, and it included payment of severance. It looks very similar to forms used for all-purpose employee departures. But recall that we are defining relevance as only involuntary terminations, not voluntary ones. This can be a subtle distinction when it comes to documentation. Do you agree with Inview or with me on this? Hot or Not? (Hmm, might be a good name for a website?)

The other Inview disagreement with me on hot docs is on control #12005730. Inview gave only a 35.5% probability of Hot, and just 50.1% that it was even relevant. Jeesh! This is a short agenda document with three scenarios. Agenda A employees are told they are important and are not fired. Agenda B employees get fired in person. Poor Agenda C employees get fired by phone. (Wonder if they left a message if no one answered?) Clearly Inview has not yet been trained on the emotional impact (probative value) of certain kinds of documents.

Finally, let’s look at the one doc that I marked as hot but decided not to train on, because I thought it might throw Inview off track. See control number 12006686. It is just an email transmittal that says “please use the following documents.” The only thing hot about it is the attachment.

Inview said it was 97.1% sure it was irrelevant, with a 1% chance it was relevant and a 1% chance it was hot. (The probabilities do not have to add up to 100%.) This is what I expected, and it confirms my decision not to mark this stand-alone email as a Trainer in the first place. In fact, if Inview had served this document up to me for review in the initial random sample, the so-called “System Identified” documents, where everything you code automatically becomes a trainer document for Inview auditing purposes, I would have marked it irrelevant, even if I knew it to be hot due to its association with the attachment. Its relevance was purely derivative.
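Why might the category probabilities not sum to 100%? This post does not say how Inview computes them, so the following is only one plausible explanation, offered as an assumption: if each category (Irrelevant, Relevant, Hot) is scored by its own independent yes/no model, as in a one-vs-rest design, nothing forces the scores to sum to 1. A single softmax over mutually exclusive categories, by contrast, always sums to 1. The raw scores below are hypothetical, picked so the independent probabilities land near the 97.1% / 1% / 1% figures above.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1), independently of other categories."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into mutually exclusive probabilities that sum to 1."""
    exps = {label: math.exp(s) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical raw scores for one document, one per category.
raw_scores = {"irrelevant": 3.5, "relevant": -4.6, "hot": -4.6}

independent = {label: sigmoid(s) for label, s in raw_scores.items()}
exclusive = softmax(raw_scores)

print({label: round(p, 3) for label, p in independent.items()})
print(round(sum(independent.values()), 3))  # not forced to equal 1.0
print(round(sum(exclusive.values()), 3))    # 1.0
```

Under the independent design, the three numbers here total about 99.1%, just as 97.1% + 1% + 1% does. Other scores could just as easily total well above or below 100%.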

Do not forget, there is no risk of under-production from doing this, because we produce the parent email whenever one of its attachments is marked for production. No orphan productions. My general rule is that the parent must always accompany the child, unless the requesting party or court does not want them. That is my standard, and I demand the same from parties producing to us. No orphans, please. Think of the poor little email children! But, as you will see later, I am not averse to separating siblings. However, I am having second thoughts about that too.
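The no-orphans rule, parents always but not necessarily siblings, is easy to state as code. A minimal sketch, assuming a hypothetical parent_of map from each attachment to its parent email; real review platforms track document families in their own ways, and the IDs below are made up.

```python
def production_set(marked: set[str], parent_of: dict[str, str]) -> set[str]:
    """Add every ancestor of each marked document so no child is produced as an orphan.

    marked:    IDs coded for production.
    parent_of: maps a child (attachment) ID to its parent ID.
    Siblings are deliberately NOT added: producing one attachment pulls in
    its parent email, but not the other attachments.
    """
    produced = set(marked)
    for doc_id in marked:
        parent = parent_of.get(doc_id)
        # Walk up the chain; an attachment to an attachment has a grandparent.
        while parent is not None and parent not in produced:
            produced.add(parent)
            parent = parent_of.get(parent)
    return produced


# Hypothetical family: email E1 carries attachments A1 and A2.
family = {"A1": "E1", "A2": "E1"}
print(sorted(production_set({"A1"}, family)))  # ['A1', 'E1']
```

Producing A1 brings along its parent E1 but leaves the sibling A2 behind. If the sibling policy changed, this is the one place where the rule would need to add a step.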

This was a long day, with a total of 8 hours of search and review work. That does not include the analysis time, nor the time it took me to write this up. (The first draft of this narrative was written contemporaneously and shared daily with trainees who were invited to follow along.)

To be continued . . . .

8 Responses to Days Three and Four of a Predictive Coding Narrative: Where I find that the computer is free to disagree

  1. David Sharpe says:

    Ralph –
    I’m really enjoying this series. Thanks very much. One question: do you intend to run some analysis/QC to determine the rate of false negatives (docs that InView is coding as Irrelevant (Relevance score of <50%) that are in fact Relevant)?

  2. Ralph Losey says:

    David – Reading your question carefully, the answer is no. I did not attempt to make careful measurements of Inview’s prediction errors while the system was training, where it predicted relevance but I confirmed the document was irrelevant. In hindsight, that would have been an interesting thing to do. But I can tell you anecdotally that it was rare for it to predict a document to be relevant that I had already marked as irrelevant. The few like that which I did notice I wrote up here.

    I’m pretty sure the kind of metric analysis you ask about is possible using some kind of built-in-report, or search, while the project progresses, but I just did not do it during this project. Now all that would be possible is analysis of the final results, which I may take the time to do later.

    Not sure it would be that helpful in the early rounds of training anyway, as it takes a couple of rounds for the machine to get it, as you will see described in this narrative. I don’t think any software can be an effective searcher after just one round of training. Indeed, I have seen some wild claims by vendors, but that is not one of them.
