Concept Drift and Consistency: Two Keys To Document Review Quality – Part Three

January 29, 2016

This is Part Three of this blog series. Please read Part One and Part Two first.

Mitigating Factors to Human Inconsistency

When you consider all of the classifications of documents, both relevant and irrelevant, my consistency rate in the two Enron reviews jumps to about 99% (1% inconsistent). Compare this with the Grossman and Cormack study of the 2009 TREC experiments, where agreement on all non-relevant adjudications, assuming all non-appealed decisions were correct, was 97.4 percent (2.6% inconsistent). My guess is that most well-run CAR review projects today are in fact attaining high overall consistency rates. The existing technologies for duplication, similarity, concept and predictive ranking are very good, especially when used together. When you consider both relevant and irrelevant coding, consistency should be in the 90s for sure, probably the high nineties. Hopefully, by using today's improved software and the latest, fairly simple 8-step methods, we can reduce the relevance inconsistency problem even further. Further scientific research is, however, needed to test these hopes and suppositions. My results in the Enron studies could be a black swan, but I doubt it. I think my inconsistency is consistent.
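To see how the arithmetic can produce such different numbers for overall agreement versus relevance-only agreement, here is a minimal sketch, with invented document counts, that computes both figures from two coding passes over the same collection. It is my own illustration, not the calculation used in either study.

```python
# Toy illustration: overall agreement vs. relevant-only overlap (Jaccard index)
# between two coding passes over the same document collection.

def consistency_metrics(pass_one: set, pass_two: set, total_docs: int):
    """pass_one and pass_two are the sets of document IDs coded relevant in each pass."""
    both = len(pass_one & pass_two)                 # coded relevant both times
    either = len(pass_one | pass_two)               # coded relevant at least once
    disagreements = either - both                   # coded relevant in only one pass
    overall_agreement = (total_docs - disagreements) / total_docs
    relevant_only_overlap = both / either if either else 1.0
    return overall_agreement, relevant_only_overlap

# Invented numbers: 100,000 documents; the two passes share 800 relevant calls
# and each adds 200 the other missed.
overall, overlap = consistency_metrics(
    pass_one=set(range(1000)),
    pass_two=set(range(200, 1200)),
    total_docs=100_000,
)
print(f"overall agreement: {overall:.1%}")                 # ~99.6%, dominated by the irrelevant mass
print(f"relevant-only overlap (Jaccard): {overlap:.1%}")   # ~66.7%, far lower
```

Because irrelevant documents vastly outnumber relevant ones, even sizeable disagreements about relevance barely dent the overall agreement figure, which is how a 99% overall rate and a much lower relevance-only rate can both be true at once.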

Even though overall inconsistencies may be small, the much higher inconsistency in relevance calls alone remains a continuing problem. It is a fact of life of all human document review, as Voorhees showed years ago. The inconsistency problem must continue to be addressed by a variety of ongoing quality controls, including the use of predictive ranking and post hoc quality assurance tests such as ei-Recall. The research to date shows that duplicate, similarity and predictive coding ranking searches can help mitigate the inconsistency problem (the overlap has increased from the 30% range to the 70% range), but not eliminate it entirely. By 2012 I was able to use these features to get the relevant-only disagreement rate down to 23%, and even then, the 63 inconsistently coded relevant documents were all unimportant. I suspect, but do not know, that my rates are now lower with improved quality controls. Again, further research is required before any blanket statements like that can be made authoritatively.
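For readers who want to see roughly how a post hoc recall test of this kind works, here is a simplified, hypothetical sketch in the spirit of ei-Recall: sample the null set (the documents you are not producing), project an interval of false negatives from the sample, and convert that into a recall range. The numbers are invented, and I use a crude normal-approximation binomial interval for brevity; consult the ei-Recall materials for the actual method and its interval calculations.

```python
import math

def recall_range(true_positives: int, null_set_size: int,
                 sample_size: int, false_negatives_in_sample: int,
                 z: float = 1.96) -> tuple[float, float]:
    """Sketch only: project false negatives from an elusion sample of the null set,
    then express them as a recall interval. The published ei-Recall method uses
    exact binomial intervals; the normal approximation here is for illustration."""
    p = false_negatives_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    fn_low = max(0.0, p - margin) * null_set_size
    fn_high = min(1.0, p + margin) * null_set_size
    low = true_positives / (true_positives + fn_high)
    high = true_positives / (true_positives + fn_low) if fn_low > 0 else 1.0
    return low, high

# Invented example: 900 relevant documents found, ~500,000 documents withheld,
# 1,500 of them sampled, 2 relevant documents (false negatives) found in the sample.
low, high = recall_range(true_positives=900, null_set_size=500_000,
                         sample_size=1_500, false_negatives_in_sample=2)
print(f"estimated recall somewhere between {low:.0%} and {high:.0%}")
```

With only a couple of false negatives in the sample the resulting range is wide, which is exactly why these tests reward large samples and careful interval math.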

Our quest for quality legal search requires that we keep the natural human weakness of inconsistency front and center. Only computers are perfectly consistent. To help keep human reviewers as consistent as possible, and so mitigate any damage that inconsistent coding may cause, a whole panoply of quality control and quality assurance methods should be used, not just improved search methods. See, e.g., ZeroErrorNumerics.com.


The Zero Error Numerics (ZEN) quality methods include:

  • predictive coding analytics, a type of artificial intelligence, actively managed by skilled human analysts in a hybrid approach;
  • data visualizations with metrics to monitor progress;
  • flow-state of human reviewer concentration and interaction with AI processes;
  • quiet, uninterrupted, single-minded focus (dual tasking during review is prohibited);
  • disciplined adherence to a scientifically proven set of search and review methods including linear, keyword, similarity, concept, and predictive coding;
  • repeated tests for errors, especially retrieval omissions;
  • objective measurements of recall, precision and accuracy ranges;
  • judgmental and random sampling and analysis such as ei-Recall;
  • active project management and review-lawyer supervision;
  • small team approach with AI leverage, instead of large numbers of reviewers;
  • recognition that the merely relevant is irrelevant;
  • recognition of the importance of simplicity under the 7±2 rule;
  • multiple fail-safe systems for error detection of all kinds, including reviewer inconsistencies;
  • use of only the highest quality, tested e-discovery software and vendor teams under close supervision and teamwork;
  • use of only experienced, knowledgeable Subject Matter Experts for relevancy guidance, either directly or by close consultation;
  • extreme care taken to protect client confidentiality; and,
  • high ethics – our goal is to find and disclose the truth in compliance with local laws, not win a particular case.

That is my quality play book. No doubt others have come up with their own methods.

Conclusion

High quality, effective legal search depends in part on recognition of the common document review phenomena of concept drift and inconsistent classifications. Although you want to avoid inconsistencies, concept drift is a good thing. It should appear in all complex review projects. Think Bob Dylan – He not busy being born is busy dying. Moreover, you should have a standard protocol in place to both encourage and efficiently deal with such changes in relevance conception. If coding does not evolve, if relevance conceptions do not shift through conversation and analysis, there could be a quality issue. It is a warning flag and you should at least investigate.

Very few projects follow a straight line known from the beginning. Most reviews are not like a simple drag race. There are many curves. If you do not see a curve in the road and you keep going straight, a spectacular wreck can result. You could fly off the track. This can happen all too easily if the SME in charge of defining relevance has lost track of what the reviewers are doing. You have to keep your eyes on the road and your hands on the wheel.


Good drivers of CARs – Computer Assisted Reviews – can see the curves. They expect them, even when driving a new course. When they come to a curve, they are not surprised; they know how to speed through the curves. They can do a power drift through any corner. Change in relevance should not be a speed-bump. It should be an opportunity to do a controlled skid, an exciting drift with tires burning. Speed drifts help keep a document review interesting, even fun, much like a race track. If you are not having a good time with large scale document review, then you are obviously doing something wrong. You may be driving an old car using the wrong methods. See: Why I Love Predictive Coding: Making document review fun with Mr. EDR and Predictive Coding 3.0.

Concept shift makes it harder than ever to maintain consistency. When the contours of relevance are changing, at least somewhat, as they should, then you have to be careful to be sure all of your prior codings are redone and made consistent with the latest understanding. Your third step of a baseline random sample should, for instance, be constantly revisited. All of the prior codings should be corrected to be consistent with the latest thinking. Otherwise your prevalence estimate could be way off, and with it all of your rough estimates of recall, as the sketch below illustrates. The concern with consistency may slow you down a bit, and make the project cost a little more, but the benefits in quality are well worth it.
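Here is a small, hypothetical sketch of why a stale baseline sample skews everything downstream: the sample's prevalence estimate is what turns your count of documents found into a rough recall figure, so a sample still coded under an outdated relevance conception quietly distorts the recall estimate. All numbers are invented.

```python
# Hypothetical sketch: how re-coding the baseline random sample under the
# current relevance conception changes the prevalence estimate, the projected
# yield of relevant documents, and therefore the rough recall estimate.

def projected_yield(collection_size: int, sample_size: int, relevant_in_sample: int) -> float:
    prevalence = relevant_in_sample / sample_size
    return prevalence * collection_size

collection_size = 700_000
sample_size = 1_500          # baseline random sample (step three), invented size
documents_found = 650        # relevant documents actually located and coded

# Sample as coded under the ORIGINAL conception of relevance: 2 relevant hits.
stale = projected_yield(collection_size, sample_size, relevant_in_sample=2)
# Same sample re-coded after the conception drifted: 3 relevant hits.
fresh = projected_yield(collection_size, sample_size, relevant_in_sample=3)

print(f"stale sample: projected yield {stale:.0f}, rough recall {documents_found / stale:.0%}")
print(f"re-coded sample: projected yield {fresh:.0f}, rough recall {documents_found / fresh:.0%}")
```

A single extra relevant call in a small sample swings the projected yield by hundreds of documents, so leaving the sample coded to an obsolete conception can make recall look far better, or far worse, than it really is.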

If you are foolish enough to still use secret control sets, you will not be able to make these changes at all. When the drift hits, as it almost always does, your recall and precision reports based on this control set will be completely worthless. Worse, if the driver does not know this, they will be misled by the software reports of precision and recall based on the secret control set. That is one reason I am so adamantly opposed to the use of secret control sets and have called for all software manufacturers to remove them. See Predictive Coding 3.0 article, part one.

If you do not go back and correct for changes in conception, then you risk withholding a relevant document that you initially coded irrelevant. It could be an important document. There is also the chance that the inconsistent classifications can impact the active machine learning by confusing the algorithmic classifier. Good predictive coding software can handle some errors, but you may slow things down, or if it is extreme, mess them up entirely. Quality controls of all kinds are needed to prevent that.

All types of quality controls are needed to address the inevitability of errors in reviewer classifications. Humans, even lawyers, will make some mistakes from time to time. We should expect that and allow for it in the process. Use of duplicate and near-duplicate guides, email strings, and other similarity searches, concept searches and probability rankings can mitigate the fact that no human will ever attain perfect machine-like consistency. So too can a variety of additional quality control measures, primary among them being the use of as few human reviewers as possible. This is in accord with the general review principle that I call less is more. See: Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part One and Part Two. That is not a problem if you are driving a good CAR, one with the latest predictive coding search engines. More than a couple of reviewers in a CAR like that will just slow you down. But it’s alright, Ma, it’s life, and life only.

________________

Since I invoked the great Bob Dylan and It’s Alright, Ma earlier in this blog, I thought I owed it to you to share the full lyrics, plus a video of young Bob’s performance. It could be his all-time best song-poem. What do you think? Feeling creative? Leave a poem below that paraphrases Dylan to make one of the points in this blog.

______________________

 “It’s Alright, Ma (I’m Only Bleeding)”


Bob Dylan

Darkness at the break of noon
Shadows even the silver spoon
The handmade blade, the child’s balloon
Eclipses both the sun and moon
To understand you know too soon
There is no sense in trying.
Pointed threats, they bluff with scorn
Suicide remarks are torn
From the fool’s gold mouthpiece
The hollow horn plays wasted words
Proved to warn
That he not busy being born
Is busy dying.
Temptation’s page flies out the door
You follow, find yourself at war
Watch waterfalls of pity roar
You feel to moan but unlike before
You discover
That you’d just be
One more person crying.
So don’t fear if you hear
A foreign sound to your ear
It’s alright, Ma, I’m only sighing.
As some warn victory, some downfall
Private reasons great or small
Can be seen in the eyes of those that call
To make all that should be killed to crawl
While others say don’t hate nothing at all
Except hatred.
Disillusioned words like bullets bark
As human gods aim for their marks
Made everything from toy guns that spark
To flesh-colored Christs that glow in the dark
It’s easy to see without looking too far
That not much
Is really sacred.
While preachers preach of evil fates
Teachers teach that knowledge waits
Can lead to hundred-dollar plates
Goodness hides behind its gates
But even the President of the United States
Sometimes must have
To stand naked.
An’ though the rules of the road have been lodged
It’s only people’s games that you got to dodge
And it’s alright, Ma, I can make it.
Advertising signs that con you
Into thinking you’re the one
That can do what’s never been done
That can win what’s never been won
Meantime life outside goes on
All around you.
You lose yourself, you reappear
You suddenly find you got nothing to fear
Alone you stand without nobody near
When a trembling distant voice, unclear
Startles your sleeping ears to hear
That somebody thinks
They really found you.
A question in your nerves is lit
Yet you know there is no answer fit to satisfy
Insure you not to quit
To keep it in your mind and not forget
That it is not he or she or them or it
That you belong to.
Although the masters make the rules
For the wise men and the fools
I got nothing, Ma, to live up to.
For them that must obey authority
That they do not respect in any degree
Who despite their jobs, their destinies
Speak jealously of them that are free
Cultivate their flowers to be
Nothing more than something
They invest in.
While some on principles baptized
To strict party platforms ties
Social clubs in drag disguise
Outsiders they can freely criticize
Tell nothing except who to idolize
And then say God Bless him.
While one who sings with his tongue on fire
Gargles in the rat race choir
Bent out of shape from society’s pliers
Cares not to come up any higher
But rather get you down in the hole
That he’s in.
But I mean no harm nor put fault
On anyone that lives in a vault
But it’s alright, Ma, if I can’t please him.
Old lady judges, watch people in pairs
Limited in sex, they dare
To push fake morals, insult and stare
While money doesn’t talk, it swears
Obscenity, who really cares
Propaganda, all is phony.
While them that defend what they cannot see
With a killer’s pride, security
It blows the minds most bitterly
For them that think death’s honesty
Won’t fall upon them naturally
Life sometimes
Must get lonely.
My eyes collide head-on with stuffed graveyards
False gods, I scuff
At pettiness which plays so rough
Walk upside-down inside handcuffs
Kick my legs to crash it off
Say okay, I have had enough
What else can you show me?
And if my thought-dreams could be seen
They’d probably put my head in a guillotine
But it’s alright, Ma, it’s life, and life only.

 



“The Hacker Way” – What the e-Discovery Industry Can Learn From Facebook’s Management Ethic

August 18, 2013

Facebook’s regulatory filing for its initial public stock offering included a letter to potential investors by 27-year-old billionaire Mark Zuckerberg. The letter describes the culture and approach to management that he follows as CEO of Facebook. Zuckerberg calls it the Hacker Way. Mark did not invent this culture. In a way, it invented him. It molded him and made him and Facebook what they are today. This letter reveals the secrets of Mark’s success and establishes him as the current child prodigy of the Hacker Way.

Too bad most of the CEOs in the e-discovery industry have not read the letter, much less understood how Facebook operates. They are clueless about the management ethic it takes to run a high-tech company.

An editorial in Law Technology News explains why I think most of the CEOs in the e-discovery software industry are just empty suits. They do not understand modern software culture. They think the Hacker Way is a security threat. They are incapable of creating insanely great software. They cannot lead with the kind of inspired genius that the legal profession now desperately needs from its software vendors to survive the data deluge. From what I have seen, most of the pointy-haired management types that now run e-discovery software companies should be thrown out. They should be replaced with Hacker-savvy management before their once proud companies go the way of the Blackberry. The LTN article has more details on the slackers in silk suits: Vendor CEOs: Stop Being Empty Suits & Embrace the Hacker Way. This essay, a partial rerun from a prior blog, gives you the background on Facebook’s Hacker Way.

Hacker History

The Hacker Way tradition and way of thinking has been around since at least the sixties. It has little or nothing to do with illegal computer intrusions. Moreover, to be clear, NSA leaker Edward Snowden is no hacker. All he did was steal classified information, put it on a thumb drive, meet the press, and then flee the country, to communist dictatorships no less. That has nothing to do with the Hacker Way and everything to do with politics.

The Hacker Way – often called the hacker ethic – has nothing to do with politics. It did not develop in government like the Internet did, but in the hobby of model railroad building and MIT computer labs. This philosophy is well-known and has influenced many in the tech world, including the great Steve Jobs (who never fully embraced its openness doctrines), and Steve’s hacker friend, Steve Wozniak, the laughing Yoda of the Hacker Way. The Hacker approach is primarily known to software coders, but can apply to all kinds of work. Even a few lawyers know about the hacker work ethic and have been influenced by it.

Who is Mark Zuckerberg?

We have all seen a movie version of Mark Zuckerberg in The Social Network. The real Zuckerberg, by the way, will still own 56.9% voting control of Facebook after the public offering later this year. But who is Mark Zuckerberg really? His Facebook page may reveal some of his personal life and ideas, but how did he create a hundred-billion-dollar company so fast?

How did he change the world at such a young age? There are now over 850 million people on Facebook with over 100 billion connections. On any one day there are over 500 million people using Facebook. These are astonishing numbers. How did this kind of creative innovation and success come about? What drove Mark and his hacker friends to labor so long, and so well? The letter to investors that Mark published gives us a glimpse into the answer and a glimpse into the real Mark Zuckerberg. Do I have your full attention yet?

The Hacker Way philosophy described in the investor letter explains the methods used by Mark Zuckerberg and his team to change the world. Regardless of who Mark really is, greedy guy or saint (or, like Steve Jobs, perhaps a strange combination of both), Mark’s stated philosophy is very interesting. It has applications to anyone who wants to change the world, including those of us trying to change the law and e-discovery.

Hacker Culture and Management

Mark’s letter to investors explains the unique culture and approach to management inherent in the Hacker Way that he and Facebook have adopted.

As part of building a strong company, we work hard at making Facebook the best place for great people to have a big impact on the world and learn from other great people. We have cultivated a unique culture and management approach that we call the Hacker Way.

The word “hacker” has an unfairly negative connotation from being portrayed in the media as people who break into computers. In reality, hacking just means building something quickly or testing the boundaries of what can be done. Like most things, it can be used for good or bad, but the vast majority of hackers I’ve met tend to be idealistic people who want to have a positive impact on the world.

The Hacker Way is an approach to building that involves continuous improvement and iteration. Hackers believe that something can always be better, and that nothing is ever complete. They just have to go fix it — often in the face of people who say it’s impossible or are content with the status quo.

Hackers try to build the best services over the long term by quickly releasing and learning from smaller iterations rather than trying to get everything right all at once. To support this, we have built a testing framework that at any given time can try out thousands of versions of Facebook. We have the words “Done is better than perfect” painted on our walls to remind ourselves to always keep shipping.

Hacking is also an inherently hands-on and active discipline. Instead of debating for days whether a new idea is possible or what the best way to build something is, hackers would rather just prototype something and see what works. There’s a hacker mantra that you’ll hear a lot around Facebook offices: “Code wins arguments.”

Hacker culture is also extremely open and meritocratic. Hackers believe that the best idea and implementation should always win — not the person who is best at lobbying for an idea or the person who manages the most people.

To encourage this approach, every few months we have a hackathon, where everyone builds prototypes for new ideas they have. At the end, the whole team gets together and looks at everything that has been built. Many of our most successful products came out of hackathons, including Timeline, chat, video, our mobile development framework and some of our most important infrastructure like the HipHop compiler.

To make sure all our engineers share this approach, we require all new engineers — even managers whose primary job will not be to write code — to go through a program called Bootcamp where they learn our codebase, our tools and our approach. There are a lot of folks in the industry who manage engineers and don’t want to code themselves, but the type of hands-on people we’re looking for are willing and able to go through Bootcamp.

So sayeth Zuckerberg. Hands-on is the way.

Application of the Hacker Way to e-Discovery

E-discovery needs that same hands-on approach. E-discovery lawyers need to go through bootcamp too, even if they primarily just supervise others. Even senior partners should go, at least if they purport to manage and direct e-discovery work. Partners should, for example, know how to use the search and review software themselves, and from time to time, do it, not just direct junior partners, associates, and contract lawyers. You cannot manage others at a job unless you can actually do the job yourself. That is the hacker key to successful management.

Also, as I often say, to be a good e-discovery lawyer, you have to get your hands dirty in the digital mud. Look at the documents, don’t just theorize about them or what might be relevant. Bring it all down to earth. Test your keywords, don’t just negotiate them. Prove your search concept by the metrics of the search results. See what works. When it doesn’t, change the approach and try again. Plus, in the new paradigm of predictive coding, where keywords are just a start, the SMEs must get their hands dirty. They must use the software to train the machine. That is how the artificial intelligence aspects of predictive coding work. The days of hands-off theorists are over. Predictive coding work is the quintessential example of code wins arguments.

Iteration is king of ESI search and production. Phased production is the only way to do e-discovery productions. There is no one final, perfect production of ESI. As Voltaire said, perfect is the enemy of the good. For e-discovery to work properly it must be hacked. It needs lawyer hackers. It needs SMEs that can train the machine on what is relevant, on what evidence must be found to do justice. Are you up to the challenge?

Mark’s Explanation to Investors of the Hacker Way of Management

Mark goes on to explain in his letter to investors how the Hacker Way translates into the core values for Facebook management.

The examples above all relate to engineering, but we have distilled these principles into five core values for how we run Facebook:

Focus on Impact

If we want to have the biggest impact, the best way to do this is to make sure we always focus on solving the most important problems. It sounds simple, but we think most companies do this poorly and waste a lot of time. We expect everyone at Facebook to be good at finding the biggest problems to work on.

Move Fast

Moving fast enables us to build more things and learn faster. However, as most companies grow, they slow down too much because they’re more afraid of making mistakes than they are of losing opportunities by moving too slowly. We have a saying: “Move fast and break things.” The idea is that if you never break anything, you’re probably not moving fast enough.

Be Bold

Building great things means taking risks. This can be scary and prevents most companies from doing the bold things they should. However, in a world that’s changing so quickly, you’re guaranteed to fail if you don’t take any risks. We have another saying: “The riskiest thing is to take no risks.” We encourage everyone to make bold decisions, even if that means being wrong some of the time.

Be Open

We believe that a more open world is a better world because people with more information can make better decisions and have a greater impact. That goes for running our company as well. We work hard to make sure everyone at Facebook has access to as much information as possible about every part of the company so they can make the best decisions and have the greatest impact.

Build Social Value

Once again, Facebook exists to make the world more open and connected, and not just to build a company. We expect everyone at Facebook to focus every day on how to build real value for the world in everything they do.

________

Applying the Hacker Way of Management to e-Discovery


Focus on Impact

Law firms, corporate law departments, and vendors need to focus on solving the most important problems, the high costs of e-discovery and the lack of skills. The cost problem primarily arises from review expenses, so focus on that. The way to have the biggest impact here is to solve the needle in the haystack problem. Costs can be dramatically reduced by improving search. In that way we can focus and limit our review to the most important documents. This incorporates the search principles of Relevant Is Irrelevant and 7±2 that I addressed in Secrets of Search, Part III. My own work has been driven by this hacker focus on impact and led to my development of Bottom Line Driven Proportional Review and multimodal predictive coding search methods. Other hacker oriented lawyers and technologists have developed their own methods to give clients the most bang for their buck.

The other big problem in e-discovery is that most lawyers do not know how to do it, and so they avoid it altogether. This in turn drives up the costs for everyone because it means the vendors cannot yet realize large economies of scale. Again, many lawyers and vendors understand that lack of education and skill sets is a key problem and are focusing on it.

Move Fast

This is an especially challenging dictate for lawyers and law firms because they are overly fearful of making mistakes, of breaking things as Facebook puts it. They are afraid of looking bad and of malpractice suits. But the truth is, professional malpractice suits are very rare in litigation. Such suits happen much more often in other areas of the law, like estates and trusts, property, and tax. As far as looking bad goes, they should be more afraid of the bad publicity from not moving fast enough, which is a much more common problem, one that we see daily in sanctions cases. Society is changing fast; if you aren’t too, you’re falling behind.

The problem of slow adoption also afflicts the bigger e-discovery vendors, who often drown in bureaucracy and are afraid to make big decisions. That is why you see individuals like me starting an online education program, while the big boys keep on debating. I have already changed my e-Discovery Team Training program six times since it went public almost two years ago. “Code wins arguments.” Lawyers must be especially careful of the thinking man’s disease, paralysis by analysis, if they want to remain competitive.

A few lawyers and e-discovery vendors understand this hacker maxim and do move fast. A few vendors appreciate the value of getting there first, but fewer law firms do. It seems hard for most law firm management to understand that the risks of lost opportunities are far more dangerous and certain than the risks of making a few mistakes along the way. The slower, too conservative law firms are already starting to see their clients move business to the innovators, the few law firms who are moving fast. These firms have more than just puffed-up websites claiming e-discovery expertise; they have dedicated specialists and, in e-discovery at least, they are now far ahead of the rest of the crowd. Will the slow and timid ever catch up, or will they simply dissolve like Heller Ehrman, LLP?

Be Bold

This is all about taking risks and believing in your visions. It is directly related to moving fast and embracing change; not for its own sake, but to benefit your clients. Good lawyers are experts in risk analysis. There is no such thing as zero risk, but there is certainly a point of diminishing returns for every litigation activity that is designed to control risks. Good lawyers know when enough is enough and constantly consult with their clients on cost-benefit analysis. Should we take more depositions? Should we do another round of document checks for privilege? Often lawyers err on the side of caution without consulting with their clients on the costs involved. They follow an overly cautious approach wherein the lawyers profit from more fees. Who are they really serving when they do that?

The adoption of predictive coding provides a perfect example of how some firms and vendors understand technology and are bold, and others do not and are timid. The legal profession is like any other industry: it rewards the bold, the innovators who create new legal methods and law for the benefit of their clients. What client wants a wimpy lawyer who is over-cautious and just runs up bills? They want a bold lawyer, who at the same time remains reasonable, and involves them in the key risk-reward decisions inherent in any e-discovery project.

Be Open

In the world of e-discovery this is all about transparency and strategic lowering of the wall of work product. Transparency is a proven method for building trust in discovery. Select disclosure is what cooperation looks like. It is what is supposed to happen at Rule 26(f) conferences, but seldom does. The attorneys that use openness as a tool are saving their clients needless expense and disputes. They are protecting them from dreaded redos, where a judge finds that you did a review wrong and requires you to do it again, usually under very short timelines. There are limits to openness of course, and lawyers have an inviolate duty to preserve their client’s secrets. But that still leaves room for disclosure of information on your own methods of search and review when doing so will serve your client’s interests.

Build Social Value 

The law is not a business. It is a profession. Lawyers and law firms exist to do justice. That is their social value. We should never lose sight of that in our day-to-day work. Vendors who serve the legal profession must also support these lofty goals in order to provide value. In e-discovery we should serve the prime directive, the dictates of Rule 1, for just, speedy, and inexpensive litigation. We should focus on legal services that provide that kind of social value. Profits to the firm should be secondary. As Zuckerberg said in the letter to potential investors:

Simply put: we don’t build services to make money; we make money to build better services.

This social value model is not naive; it works. It eventually creates huge financial rewards, as a number of e-discovery vendors and law firms are starting to realize. But that should never be the main point.

Conclusion

Facebook and Mark Zuckerberg should serve as an example to everyone, including e-discovery lawyers and vendors. I admit it is odd that we should have to turn to our youth for management guidance, but you cannot argue with success. We should study Zuckerberg’s 21st Century management style and Hacker Way philosophy. We can learn from its tremendous success. Zuckerberg and Facebook have proven that these management principles work in the digital age. It is true if it works. That is the pragmatic tradition of American philosophy. We live in fast changing times. Embrace change that works. As the face of Facebook says: “The riskiest thing is to take no risks.”


Day Ten of a Predictive Coding Narrative: A post hoc test of my hypothesis of insignificant false negatives

August 12, 2012

This is the seventh in a series of narrative descriptions of my predictive coding search of 699,082 Enron emails. My legal search methodology is predictive coding dominant, but includes the four other basic types of search in a process I call hybrid multimodal. The five elements of hybrid multimodal search are shown below using the Olympic rings symbol in honor of the great XXX Olympics concluding now in London.

The preceding narratives are:

In this seventh installment I continue my description, this time covering day ten of the project.

Post Hoc Analysis

In Day Ten I subject myself to another quality control check, another hurdle, to evaluate my decision in Day Eight to close the search. My decision to stop the search in Day Eight after five rounds of predictive coding was based on the hypothesis that I had already found all significant relevant evidence. In my opinion the only relevant documents that I had not found, which in information science would be called false negatives, were not important to the case. They would have some probative value, but not much, certainly not enough to justify continuing the search project.

Put another way, my supposition was that the only documents not found and produced would be technically relevant only, and of no real value. They would certainly not be highly relevant (one of my coding categories). Further, the relevant documents remaining were probably of a type that I had seen before. They were cumulative in nature and thus not worth the extra time, money and effort required to unearth them. See my Secrets of Search, Part III, where I expound on the two underlying principles at play here: Relevant Is Irrelevant and 7±2.

This tenth day exercise was a post hoc test because I had already concluded my search based on my hypothesis that all significant relevant documents had been discovered. I confirmed this hypothesis to my satisfaction in the previously described Day Nine elusion quality control test. This was a random sample test with a 99.9% accuracy finding. (This is in no way intended to imply 99.9% recall. The elusion test is not intended to calculate recall.) In the elusion test I did a random sample of all unreviewed documents to search for significant relevant evidence. Only one false negative out of a random sample of 1,065 was found, and it was not significant. So I passed the test that was built into my quality control system. But would I now pass this additional post hoc test for significant false negatives?
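To make the Day Nine numbers concrete, here is the simple arithmetic behind that elusion result, sketched in a few lines. The null set size is my rough assumption (approximately the collection minus what had been reviewed); only the sample size and the single false negative come from the narrative. It also shows why a 99.9% accuracy figure is not a recall figure.

```python
# Back-of-the-envelope arithmetic for the Day Nine elusion test.
sample_size = 1_065
false_negatives_found = 1
null_set_size = 698_000        # assumption: unreviewed documents, roughly the collection minus reviewed

elusion_rate = false_negatives_found / sample_size     # share of sampled null-set docs that were relevant
sample_accuracy = 1 - elusion_rate                      # the "99.9% accuracy" figure
projected_misses = elusion_rate * null_set_size         # point estimate of relevant docs still in the null set

print(f"observed elusion rate: {elusion_rate:.3%}")      # ~0.094%
print(f"sample accuracy: {sample_accuracy:.1%}")         # ~99.9%
print(f"projected relevant documents remaining: ~{projected_misses:.0f}")
```

Even a tiny elusion rate projects to hundreds of documents across a large null set, which is why the accuracy number says little about recall, and why the real question is the importance, not just the existence, of whatever was missed.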

Day Ten: 3 Hours

I start the day by initiating another round of predictive coding, the sixth round. It only takes a minute to start the process.

As I write this I am now waiting on Inview to do its thing and re-rank all 699,082 documents according to the new input I provided after the last session. This new input was described in Days Seven and Eight. It included my manual review and coding of two sets of 100 computer-selected training documents (200 total), plus review of all 51%+ predicted relevant documents.

At the end of Day Eight I had attained a count of 659 confirmed relevant documents and decided that enough was enough. I decided that any further rounds of predictive coding would likely just uncover redundant relevant documents of no real importance. I decided to stop the search, at least temporarily, to see if I would pass the random sample elusion test for false negatives that I described in Day Nine.

As you know, I passed the test in Day Nine and so the project ended. And yet, here I am again, subjecting myself to yet another test. This Day Ten exercise is the result of my ethical wranglings described at the end of Day Nine.

Mea Culpa

I am still waiting on Inview to give me more information, but whatever the findings, when I now look back on Day Eight, it seems to me like I made a mistake to stop the search when I did. Even if I pass this latest self-imposed test, and the decision is proven to be correct, it was still a mistake to have stopped there. Hopefully, just a slight mistake, but a mistake just the same. I had already trained 200 documents. I had found one new Highly Relevant document. I had provided new training information for the computer. Why not just take a couple more hours to see what impact this would have?

The lesson I learned from this, which I pass on to you here, is never to stop a project until you see the last report and results of training documents. Why guess that nothing of importance will come of the next training when it is easy enough to just run another round and find out? The answer, of course, is time and money, but here I guessed that only a few new relevant documents would be found, so the costs of the extra effort would be negligible. In retrospect, I think I was too confident and should have trusted my instincts less and my software more. But I will soon see for myself if this was harmless error.

Moment of Truth

The Moment of truth came soon enough on a Sunday morning as I logged back on to Inview to see the results of the Sixth Round. I began by running a search for all 51%+ predicted relevant documents. The search took an unusually long time to run. In the meantime I stared at this screen.

Call me over-dramatic if you will, but I was getting nervous. What if I made a bad call in stopping before the sixth round?

Finally it completed. There were 566 documents found. So far so good. Slight sigh of relief. If it were thousands I would have been screwed. Remember, I had already coded 659 documents as relevant. The computer’s predicted relevant numbers were less than my last actuals.

After determining the count, I sorted by categorization to see how many of the predicted relevant documents had not previously been categorized. In other words, how many of the 566 were new documents that I had not looked at before? Another slight sigh of relief. The answer was 51. These were 51 new documents that I would now need to look at for the final test. So far, this was all as predicted. But now to see if any of them were significant relevant documents. (Remember, I had predicted that some relevant documents would be left, just not significant ones.)

I noticed right away that one of the 51 documents had already been reviewed, but not categorized. I frequently did that for irrelevant documents of a type I had seen before. It was an Excel spreadsheet with voluntary termination payout calculations. I still thought it was irrelevant. Now on to the 50 documents that I had not reviewed before.

The 50 New Documents

Four of the fifty were the same email with the subject Bids Open for Enron Trading Unit. They had a 71.3% prediction of relevance. It was an AP news article. It had to do with an upcoming bankruptcy sale of Enron contracts. It included discussion of employees complaining about Enron’s employee termination policy. Here is the relevant excerpt for your amusement. Note the reference to protesters carrying Moron signs.

It might be relevant, or might not. It was a newspaper article, nothing more. No comments by any Enron employees about it. I guess I would have to call it marginally relevant, but unimportant. There were now only 46 documents left to worry about.

The next document I looked at was a three page word document named Retention program v2.doc. It had to do with the payment of bonuses to keep employees from leaving during the Enron collapse. It had a 59.3% probable relevant prediction. I considered it irrelevant. There were several others like that.

Another document was an email dated November 15, 2001 concerning a rumor that Andy Fastow was entitled to a nine million dollar payout due to the change in control of Enron. I remembered seeing this same email before. I checked, and I had seen and marked several copies or versions of this email before as marginally relevant. Nothing new at all in this email. There were several more document examples like that, about 25 altogether, documents that I had seen before in the exact same or similar form. Yes, they were relevant, but again duplicative or cumulative. It was a complete waste of time to look at these documents again.

I also ran into a few documents that were barely predicted relevant that had to do with voluntary termination and payment of severance for voluntary termination. The software was still having trouble differentiating between irrelevant voluntary and relevant involuntary terminations. That was understandable in view of the circumstances. It was a grey area, but bottom line, none of these borderline documents were deemed relevant by me during this last quality control review.

One new relevant document was found, a two page spreadsheet named Mariner events.xls bearing control number 1200975. It had an agenda for a mass termination of employees on August 23, 2001. It apparently pertained to a subsidiary or office named Mariner. I had seen agendas like this before, but not this particular one for this particular office. I had called the other agendas relevant, so I would have to consider this one relevant too. But again, there was nothing especially important or profound about it.

In that same category as a new relevant document, but not important, I would include an email dated November 20, 2001, from Jim Fallon, bearing control number 11815873, who was trying to get his employment agreement changed to, among other things, provide benefits in case of termination.

The last document I considered seemed to address involuntary terminations and tax consequences of some kind concerning a so-called clickathome program. Frankly, I did not really understand what this was about from this email chain. The last date in the chain was June 15, 2001. The subject line is Clickathome – proposed Treatment for Involuntary Terminations – Business reorganizations. It has control number 15344649 and is three pages long. It was predicted 66.9% likely relevant. The emails look like they pertain to employees who are transferred from one entity to another, and do not really involve employment termination at all. I cannot be sure, but it certainly is not important in my mind. Here is a portion of the first page.

I was kind of curious as to what the clickathome program was that the emails referred to, so I Googled it. At page two I found an Enron document that explained:

clickathome is Enron’s new program that gives eligible employees a computer and Internet connection (including broadband connectivity where available through program-approved vendors) for use at home.

Now I understood the reference in the email to a “PC forfeiture penalty.” I suppose maybe this email chain worrying about tax consequences of PC forfeiture in the clickathome program might be technically relevant, but again, of no importance. Just to be sure I was not missing anything, I also keyword searched the Enron database for clickathome and found 793 hits. I looked around and saw that many emails and documents pertaining to the clickathome program, where an Enron employee could get a free PC from Dell, had been reviewed before and classified as irrelevant. I was now comfortable that this email chain was also unimportant.

Hypothesis Tested and Validated

This meant that I was done. The second quality control test was over. Although I found 32 technically relevant documents as described above, no major relevant documents had been found. I had passed another test. (If you are still keeping score, the above additional review means I found a total of 691 relevant documents (659+4+25+1+1+1) out of my yield point projection at the beginning of the project of 928 likely relevant. That means a score of almost 75%. Not bad.)
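For anyone checking the tally, the arithmetic works out as follows; a trivial sketch using only the figures already given above.

```python
# Verifying the running tally from the paragraph above.
relevant_found = 659 + 4 + 25 + 1 + 1 + 1          # documents confirmed relevant, in total
yield_projection = 928                              # likely relevant documents projected at the start
print(relevant_found)                               # 691
print(f"{relevant_found / yield_projection:.1%}")   # 74.5%, i.e. "almost 75%"
```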

It all went pretty much as expected and predicted at the end of Day Eight. I had wasted yet another perfectly good Sunday afternoon, but at least now I knew for sure that the sixth round was not necessary. My hypothesis that only marginally relevant documents would turn up in another round had been tested and validated.

I suppose I should feel happy or vindicated or something, but actually, tired and bored are the more accurate adjectives to describe my current mood. At least I am not feeling embarrassed, as I was concerned might happen.

By the way, the three hours that this last day took would have gone faster but for the many Internet disconnects I experienced while working from home. My three hours of reported time did not include the substantial write-up time, nor the time waiting for the computer to train. Sigh. Testing and writing are over. Time to jump in the pool!

Conclusion: Come On In, The Water’s Fine

I hope this longer than intended narrative fulfills its purpose and encourages more lawyers to jump in and use predictive coding and other advanced technologies. The water is fine. True, there are sharks in some pools, but they are outside the pool too. They are a fact of life in litigation today. Discovery As Abuse is a systemic problem, inherent in the adversarial model of justice. The abuses are by both sides, including requesters who make intentionally over-broad demands and drive up the costs every chance they get, and responders who play hide-the-ball. Predictive coding will not cure the systemic flaws, but it will lessen the bite.

The multimodal hybrid CAR with a predictive coding search engine can mitigate your risks and your expenses. More often than not, it can save you anywhere from 50% to 75% in review costs and improve recall. The new technology is win win for both requesting parties and responding parties. I urge everyone to give it a try.

When you go in and swim please remember the five rules of search safety. They were explained in my Secrets of Search trilogy in parts One, Two, and Three and are shown below in another version of the Olympic rings.

These five, when coupled with the five Olympic rings of multimodal search shown at the top of this essay, provide a blueprint for effective legal search. These ten, shown as one large symbol below, are a kind of seed set of best-practices principles. The legal profession can use them as a beginning to develop full peer-reviewed standards for reasonable legal search. I join with Jason R. Baron and others in a call for these new standards.

