TAR Course: 8th Class

March 17, 2017

Eighth Class: Keyword and Linear Review

This class covers Tested, Parametric Boolean Keyword Search and Linear Review.

Introduction to Keyword Search

The e-Discovery Team has extensive knowledge of how electronic document review is conducted by lawyers in the U.S. It comes with the territory of Losey’s work as an attorney in private practice with a large national law firm. He works with over fifty offices around the country dealing with e-discovery. He has seen it all. So too have the search experts that he works with at KrolLDiscovery. They handle even more cases. Bottom line, for most lawyers today keyword search is still king. They think that multimodal search means linear and keyword. Sad but true. It is as if the profession was stuck in the nineties. By taking this course you are joining an elite group.

The average lawyer in the U.S. knows only a little about legal search. The paucity of knowledge and skills is especially prevalent in lawyers in small to medium size cases and in lawyers who specialize in the representation of plaintiffs. Our knowledge of the practice of lawyers outside of the U.S. shows that things are pretty much the same world-wide.

We often must deal with opposing counsel who are mired in keywords, thinking they are the be-all and end-all of legal search. Moreover, they usually want to go about doing it without any testing. (We will go over some of the testing that they should be doing in this class.) Instead, they think they are geniuses who can just dream up good searches out of thin air. They cannot.

No one can, no matter what their intelligence. We know we cannot. Not unless we are already very familiar with the data-set in question through many prior reviews of that set. That kind of experience can give you the linguistic insight needed, at least for simple search projects. But even then, we know the limitations of keywrods (yes, intentional).

The inexperienced lawyers think they can guess right in every case, even without study of the data. They think they can guess right simply because they know what their legal complaint is about. They assume this knowledge somehow gives them special insights into what keywords were used by the witnesses in all relevant documents. This is delusional. They think they are state-of-the-art, but in fact they are using old search tools.

Knowledge of the case and law is not the same thing as knowledge of the documents. Moreover, inexperienced search lawyers have no idea as to the many limitations of keyword search. They are unacquainted with the scientific studies showing the poor recall of keyword search used alone. Blair, David C. & Maron, M. E., An evaluation of retrieval effectiveness for a full-text document-retrieval system, Communications of the ACM, Volume 28, Issue 3 (March 1985) (The study involved a 40,000-document case (350,000 pages). The lawyers, who were experts in keyword search, estimated that the Boolean searches they ran uncovered 75% of the relevant documents. In fact, they had only found 20%.). Also see: Grossman and Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, CoRR abs/1504.06868 (2015) at pgs. 2-3.

To put that into the standard scientific language shown in the Search Quadrant below: many of the documents that the lawyers in the Blair & Maron study retrieved were not relevant; they were False Positives that the lawyers believed to be relevant. More importantly, their Recall was only 20%, which necessarily means that their False Negative rate was 80%. False Negatives are the relevant documents that their keyword searches never found.
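For readers who want to see how the quadrant counts translate into the standard metrics, here is a minimal worked example in code. The counts are hypothetical, chosen only so that Recall comes out at the 20% figure Blair & Maron reported in a 40,000-document collection; they are not the study's actual tallies.

```python
# Minimal sketch of the Search Quadrant metrics discussed above.
# The quadrant counts are hypothetical illustrations, not the
# actual tallies from the Blair & Maron study.

def search_quadrant_metrics(true_pos, false_pos, false_neg, true_neg):
    """Compute the standard retrieval metrics from the four quadrants."""
    recall = true_pos / (true_pos + false_neg)            # share of relevant docs found
    precision = true_pos / (true_pos + false_pos)         # share of retrieved docs that are relevant
    f1 = 2 * precision * recall / (precision + recall)
    false_neg_rate = false_neg / (true_pos + false_neg)   # relevant docs never found
    return recall, precision, f1, false_neg_rate

# Hypothetical example: 1,000 relevant documents exist in a 40,000-document
# collection; the keyword searches retrieve 200 of them plus 300 irrelevant ones.
recall, precision, f1, fn_rate = search_quadrant_metrics(
    true_pos=200, false_pos=300, false_neg=800, true_neg=38_700)

print(f"Recall:    {recall:.0%}")     # 20% -- the Blair & Maron finding
print(f"FN rate:   {fn_rate:.0%}")    # 80% of relevant documents missed
print(f"Precision: {precision:.0%}")
print(f"F1:        {f1:.2f}")
```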

Few keyword-search-obsessed attorneys have considered the substantial problem of false positives, meaning documents with the keywords that are not relevant. I cannot tell you how many times I see the word “complaint” in their keyword list. They also underestimate the problem of misspellings, odd language, special acronyms, nicknames and slang; not to mention intentional obfuscation. Here is a quick explanation of the Search Quadrant along with a war story of a document review project Losey was involved with.

____

The guessing involved in blind negotiated keyword legal search has always reminded me of the child’s game of Go Fish. I wrote about this in 2009 and the Go Fish phrase caught on after Judge Peck and others started citing that article, which later became a chapter in my book, Adventures in Electronic Discovery, 209-211 (West 2011). The Go Fish analogy appears to be the third most popular reference in predictive coding case law, after the huge Da Silva Moore case in 2012, for which Judge Peck and I are best known.

[Image: predictive_coding_chart-law]

From our experience with thousands of lawyers in real world cases there is no doubt in our minds that keyword search is still the dominant method used by most attorneys. It is especially true in small to medium-sized firms, but also in larger firms that have no bona fide e-discovery search expertise. Many attorneys and paralegals who use a sophisticated, full-featured document review platform such as KrolLDiscovery’s EDR still only use keyword search. They do not use the many other powerful search techniques of EDR, even though they are readily available to them. The Search Pyramid to them looks more like this, which I call a Dunce Hat.

[Image: distorted_search_pyramid]

The AI at the top, standing for Predictive Coding, is, for average lawyers today, still just a far-off remote mountain top; something they have heard about, but never tried. Or if they have tried it, it was with the early, poorly designed methods, Predictive Coding 1.0 or 2.0. Those methods were flawed in many ways, as I have detailed in the Predictive Coding 3.0 article, part one. The use of a control set, which required SME review of thousands of irrelevant documents, was a big waste of time that did not work. The required disclosure of irrelevant documents was also flawed. We have now fixed these early mistakes and others. For that reason, even though AI-enhanced legal search is my specialty, I am not worried about the slow development. I am confident that this will all change soon. Our new, easier to use methods will help, and so too will ever-improving software by the few vendors left standing. I continue to try to push them, but it is like steering a battleship.

The judges are already doing their part. No judge has ever disapproved the use of predictive coding, although so far they do refuse to require it. Hyles v. New York City, No. 10 Civ. 3119 (AT) (AJP), 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016). They refuse to force predictive coding largely because of an old principle of the Sedona Conference, Principle Six:

Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.

Principle Six presumes that the responding party always knows best. So the producing party can refuse to use modern tools and effective techniques for search and review if they want to. They can instead just use linear review, or use guessed keywords. This Principle is on shaky ground these days, to say the least, especially when applied to legal search. See Protecting the Fourteen Crown Jewels of the Sedona Conference in the Third Revision of its Principles; and Sedona Principle Six: Overdue for an Overhaul (Ball in Your Court, October 2014). Also see: Ross-Williams, derivatively, on behalf of Sprint Nextel Corp. v. Sprint Nextel, Civil Action No. 11-cv-00890 (D. Kan. 11/22/16) (Plaintiff’s counsel apparently chose linear review to run up a bill. One contract lawyer, Alexander Silow, spent 6,905 hours reviewing 48,443 documents at a charge of $1.5 million. The presiding Judge James Vano called the bill “Unbelievable!”).

In spite of these obstacles, we are nonetheless confident that change will come. Soon the profession’s unhealthy obsession with keyword search will end. The profession will eventually embrace the higher levels of the search pyramid, analytics and active machine learning. High-tech propagation is an inevitable result of the next generation of lawyers assuming leadership positions in law firms and legal departments. The old-timey paper lawyers around the world are finally retiring in droves. The aging out of current leadership is a good thing. Their over-reliance on untested keyword search to find evidence is holding back our whole justice system. The law must keep up with technology and lawyers must not fear math, science and AI. They must learn to keep up with technology. This is what will allow the legal profession to remain a bedrock of contemporary culture. It will happen. Positive disruptive change is just below the horizon and will soon rise.

In the meantime we encounter opposing counsel every day who think e-discovery means to dream up keywords and demand that every document that contains their keywords be produced. The more sophisticated of this confederacy of dunces understand that we do not have to produce them, that they might not all be per se relevant, but they demand that we review them all and produce the relevant ones. Fortunately we have the revised rules to protect our clients from these kinds of disproportionate, unskilled demands. All too often this is nothing more than discovery as abuse.

This still dominant approach to litigation is really nothing more than an artifact of the old-timey paper lawyers’ use of discovery as a weapon. Let me speak plainly. This is nothing more than adversarial bullshit discovery with no real intent by the requesting party to find out what really happened. They just want to make the process as expensive and difficult as possible for the responding party because, well, that’s what they were trained to do. That is what they think smart, adversarial discovery is all about. Just another tool in their negotiate and settle, extortion approach to litigation. It is the opposite of the modern cooperative approach. That is one reason why so many lawyers still support Principle Six, even though it seems irrational as applied to legal search.

I cannot wait until these dinosaurs retire so we can get back to the original intent of discovery, a cooperative pursuit of the facts. Fortunately, a growing number of our opposing counsel do get it. We are able to work very well with them to get things done quickly and effectively. That is what discovery is all about. Both sides save their powder for when it really matters, for arguments over the meaning of the facts, the governing law, and how the facts apply to this law for the result desired.

Tested, Parametric Boolean Keyword Search

The biggest surprise from our 2016 TREC research was just how well sophisticated, test-based keyword search can perform under the right circumstances. We are talking about hands-on, tested keyword search. This is not naive, Go Fish keyword guessing in the blind, although it can start that way. It is based on looking at documents and testing a variety of keywords. It is based on human document review and human file scanning, by which we mean very quick review of portions of files, for instance, just of subject lines. It is based on sampling, usually judgmental sampling, not random sampling. It is also based on strong keyword search software that has parametric and Boolean features.
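To make the idea concrete, here is a minimal sketch of what a tested, parametric Boolean search looks like when reduced to code. Everything in it is hypothetical, including the document fields, the example terms, the custodian names and the date range; real review platforms such as KrolLDiscovery’s EDR expose equivalent features through their own query builders. The point is only the shape of the technique: Boolean term logic, metadata parameters, and a test step that counts hits and samples them for quick human scanning before anyone commits to a full review.

```python
# Illustrative sketch only. The fields, terms, custodians and dates below are
# hypothetical, not drawn from any real project or review platform.
from dataclasses import dataclass
from datetime import date
import random

@dataclass
class Doc:
    doc_id: str
    custodian: str
    sent: date
    subject: str
    body: str

def is_hit(doc: Doc) -> bool:
    """Boolean logic -- (termination OR "non-renewal") AND NOT newsletter --
    limited by metadata parameters (a custodian set and a date range)."""
    text = (doc.subject + " " + doc.body).lower()
    boolean = (("termination" in text) or ("non-renewal" in text)) \
              and ("newsletter" not in text)
    parametric = (doc.custodian in {"snyder", "metrick"}
                  and date(2012, 1, 1) <= doc.sent <= date(2013, 12, 31))
    return boolean and parametric

def test_search(corpus: list[Doc], sample_size: int = 25) -> None:
    """The 'tested' part: report the hit count, then pull a small judgmental
    sample of hits for quick subject-line scanning before full review."""
    hits = [d for d in corpus if is_hit(d)]
    print(f"{len(hits)} hits out of {len(corpus)} documents")
    for d in random.sample(hits, min(sample_size, len(hits))):
        print(d.doc_id, d.sent, d.subject[:60])
```

In practice the terms and parameters would be revised after each round of scanning, which is what distinguishes this approach from blind Go Fish guessing.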

In some projects tested Keyword Search works great.

When keyword search is done with skill and is based on the evidence seen, typically in a refined series of keyword searches, very high levels of Precision, Recall and F1 are sometimes attainable. Again, the dataset and other conditions must be just right for it to be that effective, as explained in the diagram: simple data, clear target and good SME. Sometimes keywords are the best way to find clear targets like names and dates.

In those circumstances the other search forms may not be needed to find the relevant documents, or at least to find almost all of the relevant documents. These are cases where the hybrid balance is tipped heavily towards the human searchers. All the AI does in these circumstances, when the human using keyword search is on a roll, is double-check and verify that it agrees that all relevant documents have been located. It is always nice to get a free second opinion from Mr. EDR. This is an excellent quality control and quality assurance application from our legal robot friends.

[Image: MrEdr_Caped]

We are not going to try to go through all of the ins and outs of tested keyword search in this TAR Course. There are many variables and features available in most document review platforms today that make it easy to construct effective keyword searches and otherwise find similar documents. This is the kind of thing that Kroll and Losey teach to the e-discovery liaisons in Losey’s firm and other attorneys and paralegals handling electronic document reviews. The passive learning software features can be especially helpful, and so too can simple indexing and clustering. Most software programs have important features to improve keyword search and make it more effective. All lawyers should learn the basic tested keyword search skills.

There is far more to effective keyword search than a simple Google approach. (Google is concerned with finding websites, not recall of relevant evidence.) Still, in the right case, with the right data and easy targets, keywords can open the door to both high recall and precision. But even then, for keyword search to work, even in those simple projects, it must be tested, use metadata parameters and apply Boolean logic. Naive keyword search, the untested Go Fish variety, does not work, even with simple projects. That is one of the things we tested and proved in our post-hoc analysis of TREC 2016. See MrEDR.com for the TREC reports.

Moreover, we found in the 2015 and 2016 TREC that keyword search, even tested and sophisticated, does not work well in complex cases or with dirty data. It certainly has its limits and there is a significant danger in over-reliance on keyword search. It is typically very imprecise and can all too easily miss unexpected word usage and misspellings. That is one reason that the e-Discovery Team always supplements keyword search with a variety of other search methods, including predictive coding.

Focused Linear Search – Key Dates & People

In Abraham Lincoln’s day all a lawyer had to do to prepare for a trial was talk to some witnesses, talk to his client and review all of the documents the client had that could possibly be relevant. All of them. One right after the other. In a big case that might take an hour. Flash forward one hundred years to the post-photocopier era of the 1960s and document review, linear style reviewing them all, might take a day. By the 1990s it might take weeks. With the data volume of today such a review would take years.

All document review was linear up until the 1990s. Until that time almost all documents and evidence were paper, not electronic. The records were filed in accordance with an organization wide filing system. They were combinations of chronological files and alphabetical ordering. If the filing was by subject then the linear review conducted by the attorney would be by subject, usually in alphabetical order. Otherwise, without subject files, you would probably take the data and read it in chronological order. You would certainly do this with the correspondence file. This was done by lawyers for centuries to look for a coherent story for the case. If you found no evidence of value in the papers, then you would smile knowing that your client’s testimony could not be contradicted by letters, contracts and other paperwork.

Clarence Darrow and William Jennings Bryan

This kind of investigative, linear review still goes on today. But with today’s electronic document volumes the task is carried out in warehouses by relatively low paid, document review contract lawyers. By itself it is a fool’s errand, but it is still an important part of a multimodal approach.

[Image: Document_reviewers]

There is nothing wrong with Focused Linear Search when used in moderation. And there is nothing wrong with document review contract-lawyers, except that they are underpaid for their services, especially the really good ones. I am a big fan of document review specialists.

Large linear review projects can be expensive and difficult to manage. Moreover, they typically have only limited use. Linear review breaks down entirely when large teams are used because human review is so inconsistent in document analysis. Losey, R., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” (Parts One, Two and Three) (December 8, 2013, e-Discovery Team). When review of large numbers of documents is involved, the consistency rate among multiple human reviewers is dismal. Also see: Roitblat, Predictive Coding with Multiple Reviewers Can Be Problematic: And What You Can Do About It (4/12/16).
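One simple way to put a number on that inconsistency is the mutual overlap (Jaccard index) between two reviewers’ sets of relevance calls, a measure often used in studies of reviewer agreement. The sketch below is illustrative only; the document IDs are invented and the cited papers report their own measures and datasets.

```python
# Minimal sketch of mutual overlap between two reviewers' relevance calls.
# The document IDs below are hypothetical.

def overlap(reviewer_a: set[str], reviewer_b: set[str]) -> float:
    """Jaccard overlap: documents both reviewers coded relevant, divided by
    documents either reviewer coded relevant."""
    if not reviewer_a and not reviewer_b:
        return 1.0
    return len(reviewer_a & reviewer_b) / len(reviewer_a | reviewer_b)

a = {"DOC-001", "DOC-002", "DOC-005", "DOC-009"}               # reviewer A
b = {"DOC-002", "DOC-005", "DOC-011", "DOC-014", "DOC-020"}    # reviewer B

print(f"Overlap: {overlap(a, b):.0%}")  # 2 agreed of 7 coded relevant, about 29%
```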

Still, linear review can be very helpful for limited time spans and in the reconstruction of a quick series of events, especially communications. Knowing what happened one day in the life of a key custodian can sometimes give you a great defense or a great problem. Either is rare. Most of the time Expert Manual Review is helpful, but not critical. That is why Expert Manual Review is at the base of the Search Pyramid that illustrates our multimodal approach.

[Image: search_pyramid_revised]

An attorney’s knowledge, wisdom and skill are the foundation of all that we do, with or without AI. The information that an attorney holds is also of value, especially information about the latest technology, but the human information roles are diminishing. Instead the trend is to delegate mere information-level services to automated systems. The legal robots would not be permitted to go beyond information fulfillment roles and provide legal advice based on human knowledge and wisdom. Their function would be constrained to information processing and reports. The metrics and technology tools they provide can make it easier for the human attorneys to build a solid evidentiary foundation for trial.

Go on to Class Nine.

Or pause to do this suggested “homework” assignment for further study and analysis.

SUPPLEMENTAL READING: Read the Go Fish article and then look for other articles and cases that mention it. Can you find any defenses at all of this still very common approach to legal search? Consider what other games might serve as a good analogy for the untested, guessing-based approach to locating evidence. Also read the articles cited in this class on Sedona Principle Six. See if you can find contra articles that defend Principle Six as it pertains to legal search. Read them and come to your own conclusion on this controversy. Also review the cited articles on the limits of keyword search and Blair and Maron’s work in the 1980s. Finally, if you have not already done so, read the latest, revised version of the e-Discovery Team’s Final Report for 2016 TREC. Study the findings and discussion on keyword search.

EXERCISES: Speculate as to why the guessing approach still seems so popular in the legal profession as a method to find evidence. How does the success of Google search play a part in lawyer preoccupation with keyword guessing? Why do you think the vast majority of lawyers still prefer the Dunce Hat approach to legal search where keyword and linear search are king?

On the issue of the Sedona Principles, consider your own position on Principle Six. Speculate on why this Principle, unlike the others, has never changed. Try to understand both sides of this issue. Consider especially the inherent tension between Rule 26(b)(1) proportionality, which makes review costs central, and the responding party’s decision to use expensive, ineffective methods to conduct the review. Check out the details of the post-settlement hearings in Ross-Williams, derivatively, on behalf of Sprint Nextel Corp. v. Sprint Nextel, Civil Action No. 11-cv-00890 (D. Kan. 11/22/16) (Plaintiff’s counsel apparently chose linear review to run up a bill).

Finally, on linear review, consider why we keep this ancient method on the search pyramid and still treat it as a useful tool in a multimodal tool belt. Have you ever tried looking at a day in the life of a custodian’s email? That means looking at all of their email in and out on a particular day. What lessons did you learn about the issues and the custodian? It can be a good way to get into a custodian’s head, to see what they are like and how they operate.

Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!

_

e-Discovery Team LLC COPYRIGHT 2017

ALL RIGHTS RESERVED

_


Lawyers’ Job Security in a Near Future World of AI, the Law’s “Reasonable Man Myth” and “Bagley Two” – Part One

January 15, 2017

Does the inevitable triumph of AI robots over human reason and logic mean that the legal profession is doomed? Will Watson be the next generation’s lawyer of choice? I do not think so and have written many articles on why, including two last year: Scientific Proof of Law’s Overreliance On Reason: The “Reasonable Man” is Dead and the Holistic Lawyer is Born; and The Law’s “Reasonable Man,” Judge Haight, Love, Truth, Justice, “Go Fish” and Why the Legal Profession Is Not Doomed to be Replaced by Robots. In the Reasonable Man article I discussed how reasonability is the basis of the law, but that it is not objective. It depends on many subjective factors, on psychology. In the Scientific Proof article I continued the argument:

The Law’s Reasonable Man is a fiction. He or she does not exist. Never has, never will. All humans, including us lawyers, are much more complex than that. We need to recognize this. We need to replace the Law’s reliance on reason alone with a more realistic multidimensional holistic approach.

Scientific Proof Article

To help make my argument in the Scientific Proof article I relied on the analysis of Thomas H. Davenport and Julia Kirby in Only Humans Need Apply: Winners and Losers in the Age of Smart Machines (Harper 2016) and on the scientific work of Dan Ariely, a Professor of Psychology and Behavioral Economics at Duke University.

I cite to Only Humans Need Apply: Winners and Losers in the Age of Smart Machines to support my thesis:

Although most lawyers in the profession do not know it yet, the non-reasoning aspects of the Law are its most important parts. The reasoning aspects of legal work can be augmented. That is certain. So will other aspects, like reading comprehension. But the other aspects of our work, the aspects that require more than mere reason, are what makes the Law a human profession. These job functions will survive the surge of AI.

If you want to remain a winner in future Law, grow these aspects. Only losers will hold fast to reason. Letting go of the grip of the Reasonable Man, by which many lawyers are now strangled, will make you a better lawyer and, at the same time, improve your job security.

Also see Dean Gonsowski, A Clear View or a Short Distance? AI and the Legal Industry; and, Gonsowski, A Changing World: Ralph Losey on “Stepping In” for e-Discovery, (Relativity Blog).

Professor Ariely has found from many experiments that We’re All Predictably Irrational. In my article, Scientific Proof, I point my readers to his many easily accessible video talks on the subject. I consider the implications of Professor Ariely’s research for the law:

Our legal house needs a new and better foundation than reason. We must follow the physicists of a century ago. We must transcend Newtonian causality and embrace the more complex, more profound truth that science has revealed. The Reasonable Man is a myth that has outlived its usefulness. We need to accept the evidence, and move on. We need to develop new theories and propositions of law that conform to the new facts at hand. Reason is just one part of who we are. There is much more to us than that: emotion, empathy, creativity, aesthetics, intuition, love, strength, courage, imagination, determination – to name just a few of our many qualities. These things are what make us uniquely human; they are what separate us from AI. Logic and reason may end up being the least of our abilities, although they are still qualities that I personally cherish. …

Since human reason is now known to be so unreliable, and is only a contributing factor to our decisions, on what should we base our legal jurisprudence? I believe that the Reasonable Man, now that he is known to be an impossible dream, should be replaced by the Whole Man. Our jurisprudence should be based on the reality that we are not robots, not mere thinking machines. We have many other faculties and capabilities beyond just logic and reason. We are more than math. We are living beings. Reason is just one of our many abilities.

So I propose a new, holistic model for the law. It would still include reason, but add our other faculties. It would incorporate our total self, all human attributes. We would include more than logic and reason to judge whether behavior is acceptable or not, to consider whether a resolution of a dispute is fair or not. Equity would regain equal importance.

A new schema for a holistic jurisprudence would thus include not just human logic, but also human emotions, our feelings of fairness, our intuitions of what is right and just, and multiple environmental and perceptual factors. I suggest a new model start simple and use a four-fold structure like this, and please note I keep Reason on top, as I still strongly believe in its importance to the Law.

[Image: 4-levels-Holistic_Law_pyramid]

My Scientific Proof article included a call to action, the response to which has been positive:

The legal profession needs to take action now to reduce our over-reliance on the Myth of the Reasonable Man. We should put the foundations of our legal system on something else, something more solid, more real than that. We need to put our house in order before it starts collapsing around us. That is the reasonable thing to do, but for that very reason we will not start to do it until we have better motivation than that. You cannot get people to act on reason alone, even lawyers. So let us engage the other more powerful motivators, including the emotions of fear and greed. For if we do not evolve our work to focus on far more than reason, then we will surely be replaced.

[Image: cyborg-lawyer]

AI can think better and faster, and ultimately at a far lower cost. But can AI reassure a client? Can it tell what a client really wants and needs? Can AI think out of the box to come up with new, creative solutions? Can AI sense what is fair? Beyond application of the rules, can it attain the wisdom of justice? Does it know when rules should be bent and how far? Does it know, like any experienced judge knows, when rules should be broken entirely to attain a just result? Doubtful.

I go on to make some specific suggestions, just to start the dialogue, and then close with the following:

We must move away from over-reliance on reason alone. Our enlightened self-interest in continued employment in the rapidly advancing world of AI demands this. So too does our quest to improve our system of justice, to keep it current with the rapid changes in society.

Where we must still rely on reason, we should at the same time realize its limitations. We should look for new technology based methods to impose more checks and balances on reason than we already have. We should create new systems that will detect and correct the inevitable errors in reason that all humans make – lawyers, judges and witnesses alike. Bias and prejudice must be overcome in all areas of life, but especially in the justice system.

Computers, especially AI, should be able to help with this and also make the whole process more efficient. We need to start focusing on this, to make it a priority. It demands more than talk and thinking. It demands action. We cannot just think our way out of a prison of thought. We need to use all of our faculties, especially our imagination, creativity, intuition, empathy and good faith.

Reasonable Man Article

To help make my argument in the earlier blog, The Law’s “Reasonable Man,” Judge Haight, Love, Truth, Justice, “Go Fish” and Why the Legal Profession Is Not Doomed to be Replaced by Robots, I quoted extensively from an Order Denying Defendant’s Motion for Protective Order. The order arose out of a routine employment discrimination case. Bagley v. Yale, Civil Action No. 3:13-CV-1890 (CSH) (Doc. 108) (order dated April 27, 2015). The Order examined the “reasonability” of ESI accessibility under Rule 26(b)(2)(B) and the “reasonable” efforts requirements under Rule 26(b). I used the language of that Bagley Order to help support my argument that there is far more to The Law than mere reason and logic. I also argued that this is a very good thing, for otherwise lawyers could easily be replaced by robots.

Another e-discovery order was entered in Bagley on December 22, 2016. Ruling On Plaintiff’s Motion To Compel. Bagley v. Yale, Civil Action No. 3:13-CV-1890 (CSH). Bagley Two again provokes me to write on this key topic. This second order, like the first, was written by Senior District Judge Charles S. Haight, Jr. The eighty-six-year-old Judge Haight is becoming one of my favorite legal scholars because of his excellent analysis and his witty, fairly transparent writing style. This double Yale graduate has a way with words, especially when issuing rulings adverse to his alma mater. He is also one of the few judges of whom I have been unable to locate an online photo, so use your imagination, which, by the way, is another powerful tool that separates us from AI-juiced robots.

I pointed out in the Reasonable Man article, and it bears repetition, that I am no enemy of reason and rationality. It is a powerful tool in legal practice, but it is hardly our only tool. It is one of many. The “Reasonable Man” is one of the most important ideas of Law, symbolized by the balance scales, but it is not the only idea. In fact, it is not even the most important idea for the Law. That honor goes to Justice. Lady Justice holding the scales of reason is the symbol of the Law, not the scales alone. She is usually depicted with a blindfold on, symbolizing the impartiality of justice, not dependent on the social status or position of the litigants.

My view is that lawyer reasoning should continue in all future law, but should be augmented by artificial intelligence. With machines helping to rid us of hidden biases in all human reason, and making that part of our evaluation easier and more accurate, we are free to put more emphasis on our other lawyer skills, on the other factors that go into our evaluation of the case. These include our empathy, intuition, emotional intelligence, feelings, humor, perception (including lie detection), imagination, inventiveness and sense of fairness and justice. Reason is only one of many human capacities involved in legal decision making.

In the Reasonable Man article I analyzed the first Bagley Order to help prove that point:

Bagley shows that the dividing line between what is reasonable and thus acceptable efforts, and what is not, can often be difficult to determine. It depends on a careful evaluation of the facts, to be sure, but this evaluation in turn depends on many subjective factors, including whether one side or another was trying to cooperate. These factors include all kinds of prevailing social norms, not just cooperativeness. It also includes personal values, prejudices, education, intelligence, and even how the mind itself works, the hidden psychological influences. They all influence a judge’s evaluation in any particular case as to which side of the acceptable behavior line a particular course of conduct falls.

In close questions the subjectivity inherent in determinations of reasonability is obvious. This is especially true for the attorneys involved, the ones paid to be independent analysts and objective advisors. People can, and often do, disagree on what is reasonable and what is not. They disagree on what is negligent and what is not. On what is acceptable and what is not.

All trial lawyers know that certain tricks of argument and appeals to emotion can have a profound effect on a judge’s resolution of these supposedly reason-based disagreements. They can have an even more profound effect on a jury’s decision. (That is the primary reason that there are so many rules on what can and cannot be said to a jury.)

In spite of practical knowledge by the experienced, the myth continues in our profession that reasonability exists in some sort of objective, platonic plane of ideas, above all subjective influences. The just decision can be reached by deep, impartial reasoning. It is an article of faith in the legal profession, even though experienced trial lawyers and judges know that it is total nonsense, or nearly so. They know full well the importance of psychology and social norms. They know the impact of cognitive biases of all kinds, including, for example, hindsight bias. See Roitblat, The Schlemiel and the Schlimazel and the Psychology of Reasonableness (Jan. 10, 2014, LTN) (link is to republication by a vendor without attribution) (“tendency to see events that have already occurred as being more predictable than they were before they actually took place“); Also see Rimkus v Cammarata, 688 F. Supp. 2d 598 (S.D. Tex. 2010) (J. Rosenthal) (“It can be difficult to draw bright-line distinctions between acceptable and unacceptable conduct in preserving information and in conducting discovery, either prospectively or with the benefit (and distortion) of hindsight.” emphasis added); Pension Committee of the University of Montreal Pension Plan, et al. v. Banc of America Securities, LLC, et al., 685 F. Supp. 2d 456 (S.D.N.Y. Jan. 15, 2010 as amended May 28, 2010) at pgs. 463-464 (J. Scheindlin) (“That is a judgment call that must be made by a court reviewing the conduct through the backward lens known as hindsight.” emphasis added).

In my conclusion to the Reasonable Man article I summarized my thoughts and tried to kick off further discussion of this topic:

The myth of objectivity and the “Reasonable Man” in the law should be exposed. Many naive people still put all of their faith in legal rules and the operation of objective, unemotional logic. The system does not really work that way. Outsiders trying to automate the law are misguided. The Law is far more than logic and reason. It is more than the facts, the surrounding circumstances. It is more than evidence. It is about people and by people. It is about emotion and empathy too. It is about fairness and equity. Its prime directive is justice, not reason.

That is the key reason why AI cannot automate law, nor legal decision making. Judge Charles (“Terry”) Haight could be augmented and enhanced by smart machines, by AI, but never replaced. The role of AI in the Law is to improve our reasoning, minimize our schlemiel biases. But the robots will never replace lawyers and judges. In spite of the myth of the Reasonable Man, there is far more to law than reason and facts. I for one am glad about that. If it were otherwise the legal profession would be doomed to be replaced by robots.

Bagley Two

Now let us see how Judge Haight once again helps prove the Reasonable Man points by his opinion in Bagley Two. Ruling On Plaintiff’s Motion To Compel (December 22, 2016), Bagley v. Yale, Civil Action No. 3:13-CV-1890 (CSH). In this opinion the reasonability of defendant Yale’s preservation efforts was considered in the context of a motion to compel discovery. His order again reveals the complexity and inherent subjectivity of all human reason. It shows that there are always multiple factors at work in any judge’s decision beyond just thought and reason, including an instinct born out of long experience for fairness and justice. Once again I will rely primarily on Judge Haight’s own words. I do so because I like the way he writes and because you need to read his original words to appreciate what I am talking about. But first, let me set the stage.

Yale sent written preservation notices to sixty-five different people, which I know from thousands of matters is a very large number of custodians to put on hold in a single-plaintiff discrimination case. But Yale did so in stages, starting on March 1, 2013 and ending on August 7, 2014. Eight different times over this period they kept adding people to their hold list. The notices were sent by Jonathan Clune, a senior associate general counsel of Yale University. The plaintiff argued that they were too late in adding some of the custodians and otherwise attacked the reasonability of Yale’s efforts.

The plaintiff was not yet seeking sanctions for the suspected unreasonable efforts; they were seeking discovery from Yale as to the details of those efforts. Specifically they sought production of: (1) the actual litigation hold notices; (2) the completed document preservation computer survey forms that were required to be returned to the Office of General Counsel by each Litigation Hold Recipient; and (3) an affidavit detailing the retention and production for all non-ESI documents collected from each of the Litigation Hold Recipients.

Yale opposed this discovery, claiming any more information as to its preservation efforts was protected from discovery under the attorney-client privilege and attorney work product protection. Yale also argued that even if the privileges did not apply here, the discovery should still be denied because to obtain such information a party must first provide convincing proof that spoliation in fact occurred. Yale asserted that the plaintiff failed to provide sufficient proof, or even any proof, that spoliation had in fact occurred.

Here is the start of Judge Haight’s evaluation of the respective positions:

Mr. Clune’s litigation hold notices stressed that a recipient’s failure to preserve pertinent documents could “lead to legal sanctions” against Yale. Clune was concerned about a possible sanction against Yale for spoliation of evidence. While Clune’s notices did not use the term, “spoliation” is a cardinal litigation vice, known by that name to trial lawyers and judges, perhaps unfamiliar to academics unable to claim either of those distinctions. Clune’s notices made manifest his concern that a trial court might sanction Yale for spoliation of evidence relevant to the University SOM’s decision not to reappoint Bagley to its faculty.

Note the jab at academics. By the way, in my experience his observation is correct about the cluelessness of most law professors when it comes to e-discovery. But why does Judge Haight take the time here to point that out? This case did not involve the Law School. It involved the business school professors and staff (as you would expect). It is important to know that Judge Haight is a double Yale graduate, both undergraduate and law school. He graduated from Yale Law in 1955. He was even a member of Yale’s infamous Skull and Bones society. (What does 322 really mean? Eulogia?) Perhaps there are some underlying emotions here? Judge Haight does seem to enjoy poking Yale, but he may do that in all his cases with Yale out of an eccentric kind of good humor, like a friendly shoulder punch. But I doubt it.

To be continued … 


The Law’s “Reasonable Man,” Judge Haight, Love, Truth, Justice, “Go Fish” and Why the Legal Profession Is Not Doomed to be Replaced by Robots

June 29, 2016

Reasonability is a core concept of the law and foundation of our system of justice. Reason, according to accepted legal doctrine, is how we judge the actions of others and determine right from wrong. We do not look to Truth and Love for Justice, we look to Truth and Reason. If a person’s actions are reasonable, then, as a general matter, they are good and should not be punished, no matter what the emotional motives behind the actions. It is an objective standard. Actions judged as unreasonable are not good, no matter the emotional motive (think mercy killing).

Irrational actions are discouraged by law, and, if they cause damages, they are punished. The degree of punishment slides according to how unreasonable the behavior was and the extent of damages caused. Bad behavior ranges from the barely negligent – a close question – to intentionally bad, scienter. Analysis of reasonability in turn always depends on the facts and circumstances surrounding the actions being judged.

Reasonability Depends on the Circumstances

Whenever a lawyer is asked a legal question they love to start the answer by pointing out that it all depends. We are trained to see both sides, to weigh the evidence. We dissect, assess and evaluate degrees of reasonability according to the surrounding circumstances. We deal with reason, logic and cold hard facts. Our recipe for justice is simple: add reason to facts and stir well.

The core concept of reasonability not only permeates negligence and criminal law, it underlies discovery law as well. We are constantly called upon to evaluate the reasonability of efforts to save, find and produce electronically stored information. This evaluation of reasonability always depends on the facts. It requires more than information. It requires knowledge of what the information means.

Perfect efforts are not required in the law, but reasonable efforts are. Failure to make such efforts can be punished by the court, with the severity of the punishment contingent on the degree of unreasonability and extent of damages. Again, this requires knowledge of the true facts of the efforts, the circumstances.

In discovery litigants and their lawyers are not permitted to make anything less than reasonable efforts to find the information requested. They are not permitted to make sub-standard, negligent efforts, and certainly not grossly negligent efforts. Let us not even talk about intentionally obstructive or defiant efforts. The difference between good enough practice – meaning reasonable efforts – and malpractice is where the red line of negligence is drawn.

Bagley v. Yale

Professor Constance Bagley

One of my favorite district court judges – 86-year-old Charles S. Haight – pointed out the need to evaluate the reasonability of e-discovery efforts in a well-known, at this time still ongoing employment discrimination case. Bagley v. Yale, Civil Action No. 3:13-CV-1890 (CSH). See, e.g., Bagley v. Yale University, 42 F. Supp. 3d 332 (D. Conn. 2014). On April 27, 2015, Judge Haight considered Defendant’s Motion for Protective Order.

The plaintiff, Constance Bagley, wanted her former employer, Yale University, to look through the emails of more witnesses to respond to her request for production. The defendant, Yale University, said it had already done enough, that it had reviewed the emails of several custodians, and should not be required to do more. Judge Haight correctly analyzed this dispute as requiring his judgment on the reasonability of Yale’s efforts. He focused on Rule 26(b)(2)(B) involving the “reasonable accessibility” of certain ESI and the reasonable efforts requirements under then Rule 26(b)(2)(C) (now 26(b)(1) – proportionality factors under the 2015 Rules Amendments). In the judge’s words:

Yale can — indeed, it has — shown that the custodians’ responsive ESI is not readily accessible. That is not the test. The question is whether this information is not reasonably accessible: a condition that necessarily implies some degree of effort in accessing the information. So long as that creature of the common law, the reasonable man,[6] paces the corridors of our jurisprudence, surrounding circumstances matter.

[6] The phrase is not gender neutral because that is not the way Lord Coke spoke.

Bagley v. Yale, Ruling on Defendant’s Motion for Protective Order (Doc. 108) (April 27, 2015) (emphasis added).

The Pertinent e-Discovery Facts of Bagley v. Yale

Judge Haight went on to deny the motion for protective order by defendant Yale University, his alma mater, by evaluation of the facts and circumstances. Here the plaintiff originally wanted the defendant to review the ESI of 24 custodians that contained certain search terms for relevant documents. The parties later narrowed the list of terms and reduced the custodian count from 24 to 10. The defendant began a linear review of each and every document. (Yes, their plan was to have a paralegal or attorney look at each and every document with a hit, instead of using more sophisticated approaches, e.g., concept search or predictive coding.) Here is Judge Haight’s description:

Defendants’ responsive process began when University staff or attorneys commandeered — a more appropriate word than seized — the computer of each of the named custodians. The process of ESI identification and production then “required the application of keyword searches to the computers of these custodians, extracting the documents containing any of those keywords, and then reading every single document extracted to determine whether it is responsive to any of the plaintiff’s production requests and further to determine whether the document is privileged.” Defendants’ Reply Brief [Doc. 124], at 2-3. This labor was performed by Yale in-house paralegals and lawyers, and a third-party vendor the University retained for the project.

It appears from the opinion that Yale was a victim of a poorly played game of Go Fish where each side tries to find relevant documents by guessing keywords without study of the data, much less other search methods. Losey, R., Adventures in Electronic Discovery (West 2011); Child’s Game of ‘Go Fish’ is a Poor Model for e-Discovery Search. This is a very poor practice, as I have often argued, and frequently results in surprise burdens on the producing party.

This is what happened here. As Judge Haight explained, Yale did not complain of these keywords and custodian count (ten instead of five), until months later when the review was well underway:

[I]t was not until the parties had some experience with the designated custodians and search terms that the futility of the exercise and the burdens of compliance became sufficiently apparent to Defendants to complain of them.

Too bad. If they had tested the keywords first before agreeing to review all hits, instead of following the Go Fish approach, none of this would have happened. National Day Laborer Organizing Network v. US Immigration and Customs Enforcement Agency, 877 F.Supp.2d 87 (SDNY, 2012) (J. Scheindlin) (“As Judge Andrew Peck — one of this Court’s experts in e-discovery — recently put it: “In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish’ … keyword searches usually are not very effective.” FN 113“); Losey, R., Poor Plaintiff’s Counsel, Can’t Even Find a CAR, Much Less Drive One (9/1/13).

After reviewing the documents of only three custodians, following the old-fashioned, buggy-whip method of looking at one document after another (linear review), the defendant complained to the judge about the futility of the effort. They alleged that the effort:

… required paralegals and lawyers to review approximately 13,393 files, totaling 4.5 gigabytes, or the equivalent of about 450,000 pages of emails. Only 6% of this data was responsive to Plaintiff’s discovery request: about 300 megabytes, or about 29,300 pages of emails. In excess of 95% of this information, while responsive to the ESI request, has absolutely nothing to do with any of the issues in this case. Thus, defendants’ lawyers and paralegals reviewed approximately 450,000 pages of material in order to produce less than 1,500 pages of information which have any relationship whatsoever to this dispute; and the majority of the 1,500 pages are only marginally relevant.

I do not doubt that at all. It is typical in cases like this. What do you expect from blind negotiated keyword search and linear review? For less effort try driving a CAR instead of walking. As Judge Scheindlin said in National Day Laborer back in 2012:

There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere.[114] There is a “need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or `keywords’ to be used to produce emails or other electronically stored information.”[115] And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.[116] Through iterative learning, these methods (known as “computer-assisted” or “predictive” coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies’ unsupported assertions that their lay custodians have designed and conducted a reasonable search.

National Day Laborer Organizing Network, supra 877 F.Supp.2d at pgs. 109-110.

Putting aside the reasonability of search and review methods selected, an issue never raised by the parties and not before the court, Judge Haight addressed whether the defendant should be required to review all ten custodians in these circumstances. Here is Judge Haight’s analysis:

Prior to making this motion, Yale had reviewed the ESI of a number of custodians and produced the fruits of those labors to counsel for Bagley. Now, seeking protection from — which in practical terms means cessation of — any further ESI discovery, the University describes in vivid, near-accusatory prose the considerable amount of time and treasure it has already expended responding to Bagley’s ESI discovery requests: an exercise which, in Yale’s non-objective and non-binding evaluation, has unearthed no or very little information relevant to the lawsuit. Yale’s position is that given those circumstances, it should not be required to review any additional ESI with a view toward producing any additional information in discovery. The contention is reminiscent of a beleaguered prizefighter’s memorable utterance some years ago: “No mas!” Is the University entitled to that relief? Whether the cost of additional ESI discovery warrants condemnation of the total as undue, thereby rendering the requested information not reasonably accessible to Yale, presents a legitimate issue and, in my view, a close question.

Judge Charles Haight (“Terry” to his friends) analyzed the facts and circumstances to decide whether Yale should continue its search and review of four more custodians. (It was five more, but Yale reviewed one while the motion was pending.) Here is his summary:

Defendants sum up the result of the ESI discovery they have produced to Plaintiff to date in these terms: “In other words, of the 11.88 gigabytes of information[3] (which is the equivalent of more than 1 million pages of email files) that has so far been reviewed by the defendant, only about 8% of that information has been responsive and non-privileged. Furthermore, only a small percentage of those documents that are responsive and non-privileged actually have any relevance to the issues in this lawsuit.” Id., at 4-5.  . . .

[3] 11.88 gigabytes is the total of 4.5 gigabytes (produced by review of the computers of Defendant custodians Snyder, Metrick and Rae) and 7.38 gigabytes (produced by review of the computers of the additional five custodians named in text).

Defendants assert on this motion that on the basis of the present record, “the review of these remaining documents will amount to nothing more than a waste of time and money. This Court should therefore enter a protective order relieving the defendant[s] from performing the requested ESI review.” Id.  . . .

Ruling in Bagley v. Yale

Judge Haight, a wise senior judge who has seen and heard it all before, found that under these facts Yale had not yet made a reasonable effort to satisfy their discovery obligations in this case. He ordered Yale to review the email of four more custodians. That, he decided, would be a reasonable effort. Here is Judge Haight’s explanation of his analysis of reasonability, which, in my view, is unaffected by the 2015 Rule Amendments, specifically the change to Rule 26(b)(1).

In the case at bar, the custodians’ electronically stored information in its raw form was immediately accessible to Yale: all the University had to do was tell a professor or a dean to hand over his or her computer. But Bagley’s objective is to discover, and Defendants’ obligation is to produce, non-privileged information relevant to the issues: Yale must review the custodians’ ESI and winnow it down. That process takes time and effort; time and effort can be expensive; and the Rule measures the phrase “not reasonably accessible” by whether it exposes the responding party to “undue cost.” Not some cost: undue cost, an adjective Black’s Law Dictionary (10th ed. 2014 at 1759) defines as “excessive or unwarranted.” . . .

In the totality of circumstances displayed by the case at bar, I think it would be an abuse of discretion to cut off Plaintiff’s discovery of Defendants’ electronically stored information at this stage of the litigation. Plaintiff’s reduction of custodians, from the original 24 targeted by Defendants’ furiously worded Main Brief to the present ten, can be interpreted as a good-faith effort by Plaintiff to keep the ESI discovery within permissible bounds. Plaintiff’s counsel say in their Opposing Brief [Doc. 113] at 2: “Ironically, this last production includes some of the most relevant documents produced to date.” While relevance, like beauty, often lies in the eyes of the beholder, and Defendants’ counsel may not share the impressions of their adversaries, I take the quoted remark to be a representation by an officer of the Court with respect to the value and timing of certain evidence which has come to light during this discovery process. The sense of irritated resignation conveyed by the familiar aphorism — “it’s like looking for a needle in a haystack” — does not exclude the possibility that there may actually be a needle (or two or three) somewhere in the haystack, and sharp needles at that. Plaintiff is presumptively entitled to search for them.

As Judge Haight understood when he said that the “Plaintiff is presumptively entitled to search for them,” the search effort is actually upon the defendant, not the plaintiff. The law requires the defendant to expend reasonable efforts to search for the needles in the haystack that the plaintiff would like to be found. Of course, if those needles are not there, no amount of effort can find them. Still, no one knows in advance whether there are hot documents left to be found (although probabilities can be calculated), so reasonable efforts are often required to show they are not there. This can be difficult, as any e-discovery lawyer well knows.

Faced with this situation most e-discovery specialists will tell you the best solution is to cooperate, or at least try. If your cooperative efforts fail and you seek relief from the court, it needs to be clear to the judge that you did try. If the judge thinks you are just another unreasonable, over-assertive lawyer, your efforts are doomed. This is apparently part of what was driving Judge Haight’s analysis of “reasonable” as the following colorful, one might say “tasty,” quote from the opinion shows:

A recipe for a massive and contentious adventure in ESI discovery would read: “Select a large and complex institution which generates vast quantities of documents; blend as many custodians as come to mind with a full page of search terms; flavor with animosity, resentment, suspicion and ill will; add a sauce of skillful advocacy; stir, cover, set over high heat, and bring to boil. Serves a district court 2-6 motions to compel discovery or for protection from it.”

[Image: Yale_pot_boiling]

You have got to love a judge with wit and wisdom like that. My only comment is that truly skillful advocacy here would include cooperation, and lots of it. The sauce added in that case would be sweet and sour, not just hot and spicy. It should not give a judge any indigestion at all, much less six motions. That is one reason why Electronic Discovery Best Practices (EDBP.com) puts such an emphasis on skillful cooperation.

EDBP.com — You are free to use this chart in any manner so long as you do not change it.

What is Reasonable?

Bagley shows that the dividing line between what counts as reasonable, and thus acceptable, effort and what does not can often be difficult to determine. It depends on a careful evaluation of the facts, to be sure, but this evaluation in turn depends on many subjective factors, including whether one side or the other was trying to cooperate. These factors include all kinds of prevailing social norms, not just cooperativeness. They also include personal values, prejudices, education, intelligence, and even how the mind itself works, the hidden psychological influences. All of them shape a judge’s evaluation in any particular case of which side of the acceptable-behavior line a particular course of conduct falls on.

In close questions the subjectivity inherent in determinations of reasonability is obvious. This is especially true for the attorneys involved, the ones paid to be independent analysts and objective advisors. People can, and often do, disagree on what is reasonable and what is not. They disagree on what is negligent and what is not. On what is acceptable and what is not.

All trial lawyers know that certain tricks of argument and appeals to emotion can have a profound effect on a judge’s resolution of these supposedly reason-based disagreements. They can have an even more profound effect on a jury’s decision. (That is the primary reason there are so many rules on what can and cannot be said to a jury.)

Study of Legal Psychology

Every good student of the law knows this, but how many attempt to study the psychological dynamics of persuasion? How many attempt to study perceptions of reasonability? Of cognitive bias? Not many, and there are good reasons for this.

First and foremost, few law professors have this kind of knowledge. The only attorneys I know of who have it are experienced trial lawyers and experienced judges. They know quite a lot about this, but not from any formal or systematic study. They pick up information, and eventually knowledge, of the psychological underpinnings of justice over many long years of practice. They learn about the psychology of reasonability through thousands of test cases. They learn what is reasonable by involvement in thousands of disputes. Whatever I know of the subject was learned that way, although I have also read numerous books and articles on the psychology of legal persuasion written by still more senior trial lawyers.

That is not to say that experience, trial and error, is the quickest or best way to learn these insights. Perhaps there is an even quicker and more effective way? Perhaps we could turn to psychologists and see what they have to say about the psychological foundations of perception of reasonability. After all, this is, or should be, a part of their field.

Up until now, not very much has been said by psychologists about law and reasonability, at least not to my knowledge. There are a few books on the psychology of persuasion. I made a point in my early years as a litigator of studying them to try to become a better trial lawyer. But in fact the field is surprisingly thin. There is not much there. It turns out that the fields of Law and Psychology have not overlapped much, at least not in that way.

Perhaps this is because so few psychologists have been involved with legal arguments over reasonability. When psychologists are in the legal system, they are usually focused on legal issues of sanity, not negligence, or on cases involving issues of medical diagnoses.

The blame for the wide gulf between the two fields falls on both sides. Most psychologists, especially research psychologists, have not been interested in the law and the legal process. Or when they have been, it has involved criminal law, not civil. See, e.g., Tunnel Vision in the Criminal Justice System (May 2010, Psychology Today). The disinterest has been reciprocal. Most lawyers and judges are not really interested in hearing what psychologists have to say about reasonability. They consider their work to be above such subjective vagaries.

Myth of Objectivity

Lawyers and judges consider reasonability of conduct to be an exclusively legal issue. Most lawyers and judges like to pretend that reasonability exists in some sort of objective, platonic plane of ideas, above all subjective influences. The just decision can be reached by deep, impartial reasoning. This is the myth of objectivity. It is an article of faith in the legal profession.

The myth continues to this day in legal culture, even though all experienced trial lawyers and judges know it is total nonsense, or nearly so. They know full well the importance of psychology and social norms. They know the impact of cognitive biases of all kinds, even transitory ones. As trial lawyers like to quip – What did the judge have for breakfast?

Experienced lawyers take advantage of these biases to win cases for their clients. They know how to push the buttons of judge and jury. See Cory S. Clements, Perception and Persuasion in Legal Argumentation: Using Informal Fallacies and Cognitive Biases to Win the War of Words, 2013 BYU L. Rev. 319 (2013). Justice is sometimes denied as a result. But this does not mean judges should be replaced by robots. No indeed. There is far more to justice than reason. Still, a little help from robots is surely part of the future we are making together.

More often than not the operation of cognitive biases happens unconsciously, without any puppet masters intentionally pulling the strings. There is more to this than just rhetoric and sophistry. Justice is hard. So is objective ratiocination.

Even assuming that the lawyers and judges in the know could articulate their knowledge of decisional bias, they have little incentive to do so. (The very few law professors with such knowledge do have an incentive, as we see in Professor Clements’ article cited above, but these articles are rare and too academic.) Moreover, most judges and lawyers are incapable of explaining these insights in a systematic manner. They lack the vocabulary of psychology to do so, and, since they learned by long, haphazard experience, that is their style of teaching as well.

Shattering the Myth

One psychologist I know has studied these issues and shared his insights. They are myth shattering to be sure, and thus will be unwelcome to some idealists. But for me this is a much-needed analysis. The psychologist who has dared to expose the myth, to lift the curtain, has worked with lawyers for over a decade on discovery issues. He has even co-authored a law review article on reasonability with two distinguished lawyers. Oot, Kershaw, Roitblat, Mandating Reasonableness in a Reasonable Inquiry, Denver University Law Review, 87:2, 522-559 (2010).

I am talking about Herbert L. Roitblat, who has a PhD in psychology. Herb did research and taught psychology for many years at the University of Hawaii. Only after a distinguished career as a research psychologist and professor did Herb turn his attention to computer search in general, and then ultimately to law and legal search. He is also a great admirer of dolphins.

Schlemiel and Schlimazel

Herb has written a small gem of a paper on law and reasonability that is a must read for everyone, especially those who do discovery. The Schlemiel and the Schlimazel and the Psychology of Reasonableness (Jan. 10, 2014, LTN) (link is to republication by a vendor without attribution). I will not spoil the article by telling you Herb’s explanation of the Yiddish terms, Schlemiel and Schlimazel, nor what they have to do with reasonability and the law, especially the law of spoliation and sanctions. Only a schmuck would do that. It is a short article; be a mensch and go read it yourself. I will, however, tell you the Huffington Post definition:

A Schlemiel is an inept clumsy person and a Schlimazel is a very unlucky person. There’s a Yiddish saying that translates to a funny way of explaining them both. A schlemiel is somebody who often spills his soup and a schlimazel is the person it lands on.

This is folk wisdom for what social psychologists today call attribution error. It is the tendency to blame your own misfortune on outside circumstances beyond your control (the schlimazel) and blame the misfortune of others on their own negligence (the schlemiel). Thus, for example, when I make a mistake, it is in spite of my reasonable efforts, but when you make a mistake it is because of your unreasonably lame efforts. It is a common bias that we all have. The other guy is often unreasonable, whereas you are not.

Herb Roitblat’s article should be required reading for all judges and lawyers, especially new ones. Understanding the many inherent vagaries of reasonability could, for instance, lead to a much more civil discourse on the subject of sanctions. Who knows, it could even lead to cooperation, instead of the theatre and politics we now see everywhere.

Hindsight Bias

Roitblat’s article contains a two-paragraph introduction to another important psychological factor at work in many evaluations of reasonability: Hindsight Bias. This has to do with the fact that most legal issues concern past decisions and actions that have gone bad. The law almost never considers good decisions, much less great decisions with terrific outcomes. Instead it focuses on situations gone bad, where it turns out that wrong decisions were made. But were they necessarily negligent decisions?

The mere fact that a decision led to an unexpected, poor outcome does not mean that the decision was negligent. But when we examine the decision with the benefit of 20/20 hindsight, we are naturally inclined towards a finding of negligence. In the same way, if the results prove to be terrific, the hindsight bias is inclined to perceive most any crazy decision as reasonable.

Due to hindsight bias, we all have, in Roitblat’s words:

[A] tendency to see events that have already occurred as being more predictable than they were before they actually took place. We over-estimate the predictability of the events that actually happened and under-estimate the predictability of events that did not happen.  A related phenomenon is “blame the victim,” where we often argue that the events that occurred should have been predicted, and therefore, reasonably avoided.

Hindsight bias is well known among experienced lawyers and you will often see it argued, especially in negligence and sanctions cases. Every good lawyer defending such a charge will try to cloak all of the mistakes as seemingly reasonable at the time, and any counter-evaluation as merely the result of hindsight bias. They will argue, for instance, that while it may now seem obvious that wiping the hard drives would delete relevant evidence, that is only because of the benefit of hindsight, and that it was not at all obvious at the time.

Good judges will also sometimes mention the impact of 20/20 hindsight, either on their own initiative, or in response to defense argument. See for instance the following analysis by Judge Lee H. Rosenthal in Rimkus v Cammarata, 688 F. Supp. 2d 598 (S.D. Tex. 2010):

These general rules [of spoliation] are not controversial. But applying them to determine when a duty to preserve arises in a particular case and the extent of that duty requires careful analysis of the specific facts and circumstances. It can be difficult to draw bright-line distinctions between acceptable and unacceptable conduct in preserving information and in conducting discovery, either prospectively or with the benefit (and distortion) of hindsight. Whether preservation or discovery conduct is acceptable in a case depends on what is reasonable, and that in turn depends on whether what was done–or not done–was proportional to that case and consistent with clearly established applicable standards. [FN8] (emphasis added)

Judge Shira A. Scheindlin also recognized the impact of hindsight in Pension Committee of the University of Montreal Pension Plan, et al. v. Banc of America Securities, LLC, et al., 685 F. Supp. 2d 456 (S.D.N.Y. Jan. 15, 2010, as amended May 28, 2010) at pgs. 463-464:

While many treatises and cases routinely define negligence, gross negligence, and willfulness in the context of tortious conduct, I have found no clear definition of these terms in the context of discovery misconduct. It is apparent to me that these terms simply describe a continuum. FN9 Conduct is either acceptable or unacceptable. Once it is unacceptable the only question is how bad is the conduct. That is a judgment call that must be made by a court reviewing the conduct through the backward lens known as hindsight. It is also a call that cannot be measured with exactitude and might be called differently by a different judge. That said, it is well established that negligence involves unreasonable conduct in that it creates a risk of harm to others, but willfulness involves intentional or reckless conduct that is so unreasonable that harm is highly likely to occur. (emphasis added)

The relatively well-known backward lens of hindsight can impact anyone’s evaluation of reasonability. But there are many other, less obvious psychological factors that can alter a judge’s or jury’s perception. Herb Roitblat mentions a few more, such as the overconfidence effect, where people tend to inflate their own knowledge and abilities, and framing, an example of cognitive bias where the outcome of questions is impacted by the way they are asked. The latter is one reason that trial lawyers fight so hard over jury instructions and jury interrogatories.

Conclusion

Many lawyers are interested in this law-psych intersection and the benefits that might be gained by cross-pollination of knowledge. I have a life-long interest in psychology, and so do many others, some with advanced degrees. That includes my fellow predictive coding expert, Maura R. Grossman, an attorney who also has a Ph.D. in Clinical/School Psychology. A good discovery team can use all of the psychological insights it can get.

The myth of objectivity and the “Reasonable Man” in the law should be exposed. Many naive people still put all of their faith in legal rules and the operation of objective, unemotional logic. The system does not really work that way. Outsiders trying to automate the law are misguided. The Law is far more than logic and reason. It is more than the facts and the surrounding circumstances. It is more than evidence. It is about people and by people. It is about emotion and empathy too. It is about fairness and equity. Its prime directive is justice, not reason.

That is the key reason why AI cannot automate law, nor legal decision making. Judge Charles (“Terry”) Haight could be augmented and enhanced by smart machines, by AI, but never replaced. The role of AI in the Law is to improve our reasoning and minimize our schlemiel biases. But the robots will never replace lawyers and judges. In spite of the myth of the Reasonable Man, there is far more to law than reason and facts. I for one am glad about that. If it were otherwise the legal profession would be doomed to be replaced by robots.


Concept Drift and Consistency: Two Keys To Document Review Quality – Part Two

January 24, 2016

This is Part Two of this blog. Please read Part One first.

Concept Freeze

In most complex review projects the understanding of relevance evolves over time, especially at the beginning of a project. This is concept drift. It evolves as the lawyers’ understanding evolves. It evolves as the facts unfold in the documents reviewed and in other sources, including depositions. The concept of relevance shifts as the case unfolds with new orders and pleadings. This is a good thing. Its opposite, concept freeze, is not.

The natural shift in relevance understanding is well-known in the field of text retrieval. Consider for instance the prior cited classic study by Ellen M. Voorhees, the computer scientist at the National Institute of Standards and Technology in charge of TREC, where she noted:

Test collections represent a user’s interest as a static set of (usually binary) decisions regarding the relevance of each document, making no provision for the fact that a real user’s perception of relevance will change as he or she interacts with the retrieved documents, or for the fact that “relevance” is idiosyncratic.

Voorhees, Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, 36 Info. Processing & Mgmt 697 (2000) at page 714 (emphasis added). (The somewhat related term query drift in information science refers to a different phenomenon in machine learning. In query drift the concept of document relevance unintentionally changes through the use of indiscriminate pseudo-relevance feedback. Cormack, Buttcher & Clarke, Information Retrieval: Implementation and Evaluation of Search Engines (MIT Press 2010) at pg. 277. This can lead to severe negative relevance feedback loops.)

In concept drift the concept of what is relevant changes as a result of:

  1. Trying to apply the abstract concepts of relevance to the particular documents reviewed, and
  2. Changes in the case itself over time from new evidence, stipulations and court orders.

The word drift is somewhat inappropriate here. It suggests inadvertence, a boat at the mercy of a river’s current, drifting out of control. That is misleading. The kind of concept drift here intended is an intentional drift. The change is under the full conscious control of the legal team. The change must also be implemented in a consistent manner by all reviewers, not just one or two. As discussed, this includes retroactive corrections to prior document classifications. Concept drift is more like a racing car’s controlled drift around a corner. That is the more appropriate image.

In legal search relevance should change, should evolve, as the full facts unfold. Although concept drift is derived from a scientific term, it is a phenomenon well-known to trial lawyers. If a lawyer’s concept of relevance does not change at all, if it stays frozen, then they are either in a rare black swan type of case, or the document review project is being mismanaged. It is usually the latter. The concept of relevance has stagnated. It has not evolved or been refined. It is instead static, dead. Sometimes this is entirely the fault of the SME, for a variety of reasons. But typically the poor project management is a group effort. Proper execution of the first step in the eight-step work flow for document review, the communication step, will usually prevent concept freeze. Although this is naturally the first step in a work flow, communication should continue throughout a project.

[Diagram: the Predictive Coding 3.0 eight-step document review work flow]

The problem of concept freeze is, however, inherent in all large document review projects, not just ones accelerated by predictive coding. In fact, projects using predictive coding are somewhat protected from this problem. Good machine learning software that makes suggestions, including suggestions that disagree with prior human coding, can sometimes prevent relevance stagnancy by forcing human re-conceptions.

No matter what the cause or type of search methods used, a concept freeze at the beginning of a review project, the most intense time for relevance development, is a big red flag. It should trigger a quality control audit. An early concept freeze suggests that the reviewers, the people who manage and supervise them, and SMEs, may not be communicating well, or may not be studying the documents closely enough. It is a sign of a project that has never gotten off the ground, an apathetic enterprise composed of people just going through the motions. It suggests a project dying at the time it should be busy being born. It is a time of silence about relevance when there should be many talks between team members, especially with the reviewers. Good projects have many, many emails circulating with questions, analysis, debate, decisions and instructions.

All of this reminds me of Bob Dylan’s great song, It’s Alright, Ma (I’m Only Bleeding):

To understand you know too soon
There is no sense in trying …

The hollow horn plays wasted words,
Proves to warn
That he not busy being born
Is busy dying. …

An’ though the rules of the road have been lodged
It’s only people’s games that you got to dodge
And it’s alright, Ma, I can make it.

This observation of the need for relevance refinement at the beginning of a project is based on long experience. I have been involved with searching document collections for evidence for possible use at trial for thirty-six years. This includes both the paper world and electronically stored information. I have seen this in action thousands of times. Since I like Dylan so much, here is my feeble attempt to paraphrase:

Relevance is rarely simple or static,
Drift is expected,
Complexities of law and fact arise and
Are work product protected.

An’ though the SME’s rules of relevance have been lodged
They must surely evolve, improve or be dodged
And it’s alright, Shira, I can make it.

My message here is that the absence of concept shift – concept freeze – is a warning sign. It is an indicator of poor project management, typically derived from inadequate communication or dereliction of duty by one or more of the project team members. There are exceptions to this general rule, of course, especially in simple cases, or ones where the corpus is well known. Plus, sometimes you do get it right the first time, just not very often.

The Wikipedia article on concept drift notes that such change is inherent in all complex phenomena not governed by fixed laws of nature, but rather by human activity …. Therefore periodic retraining, also known as refreshing, of any model is necessary. I agree.

Determination of relevance in the law is a very human activity. In most litigation it is a very complex phenomenon. As the relevance concept changes, the classifications need to be refreshed and documents retrained according to the latest relevance model. This means that reviewers need to go back and change the prior classifications of documents. The classifications need to be corrected for uniformity. Here the quality factor of consistency comes into play. It is time-consuming to go back and make corrections, but important. Without these corrections and consistency efforts, the impact of concept drift can be very disruptive, and can result in decreased recall and precision. Important documents can be missed: documents that you need to defend or prosecute a case, or ones that the other side needs. In egregious situations the last kind of error can even be sanctionable.

Here is a quick example of the retroactive correction work in action. Assume that one type of document, say Spreadsheet X, has been found to be irrelevant for the first several days, such that there are now hundreds, perhaps thousands, of various documents coded irrelevant with information pertaining to Spreadsheet X. Assume that a change is made, and the SME now determines that a new type of this document is relevant. The SME realizes, or is told, that there are many other documents on Spreadsheet X that will be impacted by the decision on this new form. A conscious, proportional decision is then made to change the coding on all of the previously coded documents impacted by this decision. In this hypothetical the scope of relevance expanded. In other cases the scope of relevance might tighten. It takes time to go back and make such corrections in prior coding, but it is well worth it as a quality control effort. Concept drift should not be allowed to breed inconsistency.
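To make the mechanics concrete, here is a minimal sketch of that kind of bulk correction. It assumes, purely for illustration, that documents are held in memory as dicts with hypothetical doc_id, text and coding fields, and that the affected family can be found with a simple text match. A real review platform would use its own database, near-duplicate and concept search rather than a bare substring test, and would route the changed batch to the SME for sign-off.

```python
# Illustrative sketch only: field names and the substring test are assumptions,
# not any particular review platform's API.

def recode_for_new_relevance(documents, marker, new_code="relevant"):
    """Flip prior coding on documents that touch the newly relevant topic,
    keeping an audit trail and flagging the batch for SME confirmation."""
    changed = []
    for doc in documents:
        if marker.lower() in doc["text"].lower() and doc["coding"] != new_code:
            doc["prior_coding"] = doc["coding"]    # preserve the earlier call
            doc["coding"] = new_code               # apply the SME's new decision
            doc["needs_sme_confirmation"] = True   # SME reviews the corrected batch
            changed.append(doc)
    return changed

# Example: the SME decides "Spreadsheet X" material is now relevant.
corpus = [
    {"doc_id": "ENR-0001", "text": "Q3 figures in Spreadsheet X attached", "coding": "irrelevant"},
    {"doc_id": "ENR-0002", "text": "Lunch on Friday?", "coding": "irrelevant"},
]
corrected = recode_for_new_relevance(corpus, "Spreadsheet X")
print([d["doc_id"] for d in corrected])  # ['ENR-0001']
```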

A static understanding by document reviewers of relevance, especially at the beginning of a project, is a red flag of mismanagement. It suggests that the subject matter expert (“SME”), the lawyer or lawyers in charge of determining what is relevant to the particular issues in the case, is not properly supervising the attorneys who are actually looking at the documents, the reviewers. If SMEs are not properly supervising the review, if they do not do their job, the net result is a loss of quality. This is the kind of quality loss where key documents can be overlooked. In this situation reviewers are forced to make their own decisions on relevance when new kinds of documents are encountered. This exacerbates the natural inconsistencies of human reviewers (more on that later). Moreover, it forces the reviewers to try to guess what the expert in charge of the project might consider to be relevant. When in doubt, the tendency of reviewers is to guess on the broad side. Over-extended notions of relevance are often the result.

A review project of any complexity that does not run into some change in relevance at the beginning is probably poorly managed and making many other mistakes. The cause may not be the SME at all. It may be the fault of the document reviewers or of mid-level management. The reviewers may not be asking questions when they should, or may not be sharing their analysis of grey area documents. They may not care, or talk at all. The target may be vague and elusive. No one may have a good idea of relevance, much less a common understanding.

This must be a team effort. If audits show that any reviewers or management are deficient, they should be quickly re-educated or replaced. If there are other quality control measures in place, then the potential damage from such mismanagement may be limited. In other review projects, however, this kind of mistake can go undetected and be disastrous. It can lead to an expensive redo of the project and even court sanctions for failure to find and produce key documents.

SMEs must closely follow the document review progress. They must supervise the reviewers, at least indirectly. Both the law and legal ethics require that. SMEs should not only instruct reviewers at the beginning of a project on relevancy, they should be consulted whenever new document types are seen. This should ideally happen in near real time, but at least on a daily basis, with coding on that document type suspended until the SME decisions are made.

With a proper surrogate SME agency system in place, this need not be too burdensome for the senior attorneys in charge. I have worked out a number of different solutions for that SME burdensomeness problem. One way or another, SME approval must be obtained during the course of a project, not at the end. You simply cannot afford to wait until the end to verify relevance concepts. Then the job can become overwhelming, and the risks of errors and inefficiencies too high.

Even if consistency of reviewers is assisted, as it should be, by using similarity search methods, the consistent classification may still be wrong. The production may well reflect what the SME thought months earlier, before the review started, whereas what matters is what the SME thinks at the time of production. A relevance concept that does not evolve over time, that does not drift toward the truth, is usually wrong. A document review project that ties all document classification to the SME’s initial ideas of relevance is usually doomed to failure. These initial SME concepts are typically formed at the beginning of the case, after only a few relevant documents have been reviewed. Sometimes they are formed completely in the abstract, with the SME having seen no documents at all. These initial ideas are only very rarely one hundred percent right. Moreover, even if the ideas, the concepts, are completely right from the beginning and do not change, the application of those concepts to the documents seen will change. Modifications and shifts of some sort, and to some degree, are almost always required as the documents reveal what really happened and how. Modifications can also be driven by demands of the requesting party and, most importantly, by rulings of the court.

Consistency

Consistency, as described before, refers to the coding of the same or similar types of documents in the same manner. This means that:

  1. A single reviewer determines relevance in a consistent manner throughout the course of a review project.
  2. Multiple reviewers determine relevance in a consistent manner with each other.

As mentioned, the best software now makes it possible to identify many of these inconsistencies, at least the easy ones involving near duplicates. Actual, exact duplicates are rarely a problem, as they are so easy to detect, but not all software is good at detecting near duplicates, email threads, and the like. Consistency in adjudications of relevance is a quality control feature that I consider indispensable. Ask your vendor how their software can help you find and correct all obvious inconsistencies, and mitigate the rest. The real challenge, of course, is not in near duplicates, but in documents that have the same meaning in very different forms.
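The easy case, though, is simple enough to sketch. The fragment below is illustrative only, with hypothetical doc_id, text and coding fields: it groups documents by a hash of their normalized text, so it catches exact and near-exact duplicates, and flags any group whose members were coded differently. Real platforms layer near-duplicate, thread and concept grouping on top of a check like this.

```python
import hashlib
from collections import defaultdict

def flag_inconsistent_duplicates(documents):
    """Group documents by a hash of their whitespace-normalized text and
    return the groups whose members carry different relevance codes.
    (Exact duplicates only; near-duplicate detection is left to the platform.)"""
    groups = defaultdict(list)
    for doc in documents:
        normalized = " ".join(doc["text"].lower().split())
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        groups[key].append(doc)
    return [g for g in groups.values() if len({d["coding"] for d in g}) > 1]

# Hypothetical example: the same document coded two different ways.
docs = [
    {"doc_id": "A-1", "text": "Board minutes, March 2001", "coding": "relevant"},
    {"doc_id": "A-2", "text": "Board  minutes,  March 2001", "coding": "irrelevant"},
]
for group in flag_inconsistent_duplicates(docs):
    print("inconsistent group:", [d["doc_id"] for d in group])  # ['A-1', 'A-2']
```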


Scientific research has shown that inconsistency in relevance adjudications is inherent in all human review, at least in large document review projects requiring complex analysis. For authority I refer again to the prior cited study by Ellen M. Voorhees, the computer scientist at the National Institute of Standards and Technology in charge of TREC. Voorhees found that the average rate of agreement between two human experts on documents determined to be relevant was only 43%. She called that measure overlap. This means that two manual reviewers disagreed with each other on document relevance 57% of the time. Voorhees, Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, supra at pages 700-701.

Note that the reviewers in this study were all experts, all retired intelligence officers skilled in document analysis. Like litigation lawyers, they all had similar backgrounds and training. When the relevance determinations of a third reviewer were added in this study, the average overlap rate dropped to 30%. That means the three experts disagreed in their independent analysis of document relevance 70% of the time. The 43% and 30% overlap they attained was higher than in earlier TREC studies of inconsistency. The overlap rates are shown in Table 1 of her paper at page 701.
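The overlap measure itself is simple: the number of documents everyone judged relevant divided by the number of documents anyone judged relevant, which is the Jaccard index of the reviewers’ relevant sets discussed further below. A minimal sketch, using made-up document IDs rather than any real judgment data:

```python
def overlap(*relevant_sets):
    """Overlap of reviewers' relevance calls: documents all of them coded
    relevant divided by documents any of them coded relevant (Jaccard index)."""
    union = set().union(*relevant_sets)
    intersection = set(relevant_sets[0]).intersection(*relevant_sets[1:])
    return len(intersection) / len(union) if union else 1.0

# Hypothetical reviewer judgments (IDs of documents each reviewer coded relevant).
reviewer_a = {"d1", "d2", "d3", "d4", "d5"}
reviewer_b = {"d3", "d4", "d5", "d6"}
reviewer_c = {"d4", "d5", "d7"}

print(round(overlap(reviewer_a, reviewer_b), 2))              # 0.5  (two reviewers)
print(round(overlap(reviewer_a, reviewer_b, reviewer_c), 2))  # 0.29 (three reviewers)
```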


Voorhees concluded that this data was evidence of the variability of relevance judgments. Id.

A 70% inconsistency rate in relevance classifications among three experts is troubling, and it shows the need to check and correct for human errors, especially when expert decisions are required, as is the case with all legal search. I assume that agreement rates would be much higher in a simple search matter, such as finding all articles in a newspaper collection relevant to a particular news event. That does not require expert legal analysis. It requires very little analysis at all. For that reason I would expect human reviewer consistency rates to be much higher with such simple searches. But that is not the world of legal search, where complex analysis of legal issues requiring special training is the norm. So for us, where document reviews are usually done by teams of lawyers, consistency of human reviewers is a real quality control problem that must be carefully addressed.

The Voorhees study was borne out by a later study on a legal search project by Herbert L. Roitblat, PhD, Anne Kershaw and Patrick Oot. Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Journal of the American Society for Information Science and Technology, 61 (2010). Here a total of 1,600,047 documents were reviewed by contract attorneys in a real-world linear second review. A total of 225 attorneys participated in the review. The attorneys spent about 4 months, working 7 days a week, and 16 hours per day on this review.

A few years after the Verizon review, two re-review teams of professional reviewers (Team A and Team B) were retained by the Electronic Discovery Institute (EDI), which sponsored the study. They found that the overlap (agreement in relevance coding) between Team A and the original production was 16.3%, and the overlap between Team B and the original production was 15.8%. That means an inconsistency rate on relevance of roughly 84%. The overlap between the two re-review Teams A and B was a little better at 28.1%, an inconsistency rate of roughly 72%. Better, but still terrible, and once again demonstrating how unreliable human review alone is without the assistance of computers, especially without active machine learning and the latest quality controls. Their study reaffirmed an important point about inconsistency in manual linear review, especially when the review requires complex legal analysis. It also showed the incredible cost savings readily available by using advanced search techniques to filter documents, instead of linear review of everything.

The total cost of the original Verizon merger review was $13,598,872.61, or about $8.50 per document. Apparently M&A has bigger budgets than Litigation. Compare that cost to the 2015 e-Discovery Team effort at TREC, which reviewed seventeen million documents at an average review speed of 47,261 files per hour. The Team’s average cost per document was very low, although that cost is not yet attainable in real-world projects for a variety of reasons. Still, it is illustrative of the state of the art. It shows what is next in legal practice. Examining what we did at TREC: if you assume a billing rate of $500 per hour for the e-Discovery Team attorneys, then the cost per document for first-pass attorney review would have been about a penny a document. Compare that to $8.50 per document for linear review without active machine learning, concept search, and parametric Boolean keyword searches.
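The arithmetic behind those two figures is easy to check. This small sketch simply reproduces the back-of-the-envelope numbers quoted above; the $500 per hour billing rate is the same assumption made in the text, not a reported fee.

```python
# Back-of-the-envelope arithmetic from the figures quoted above.
verizon_cost = 13_598_872.61          # total cost of the original linear review
verizon_docs = 1_600_047              # documents reviewed
print(round(verizon_cost / verizon_docs, 2))   # 8.5 -> about $8.50 per document

trec_docs_per_hour = 47_261           # 2015 e-Discovery Team average review speed
assumed_billing_rate = 500.0          # assumed $/hour, as stated in the text
print(round(assumed_billing_rate / trec_docs_per_hour, 3))  # 0.011 -> about a penny per document
```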

The conclusions are obvious, and yet there are many still ill-informed corporate clients that sanction the use of horse-and-buggy linear reviews, along with their rich drivers, just like in the old days of 2008. Many in-house counsel still forgo the latest CARs with AI-enhanced drivers. Most do not know any better. They have not read the studies, even the widely publicized EDI studies. Too bad, but that does spell opportunity for the corporate legal counsel who do keep up. More and more of the younger ones do get it, and have the authority to make sweeping changes. The next generation will be all about active machine learning, lawyer augmentation, and super-fast smart robots, with and without mobility.

Clients still paying for large linear review projects are not only wasting good money and getting poor results in the process, but no one is having any fun in such slow, boring reviews. I will not do it, no matter what the law firm profit potential from such price gouging. It is a matter of professional pride and ethics, plus work enjoyment. Why would anyone other than the hopelessly greedy, or incompetent, mosey along at a snail’s pace when you could fly, when you could get there much faster and overall do a better job, finding more relevant documents?

The gullibility of some in-house counsel to keep paying for large-scale linear reviews by armies of lawyers is truly astounding. Insurance companies are waking up to this fact. I am helping some of them to clamp down on the rip offs. It is only a matter of time before everyone leaves the horse behind and gets a robot driven CAR. You can delay such progress, we are seeing that, but you can never stop it.


By the way, since my search method is Hybrid Multimodal, it follows that my Google CAR has a steering wheel to allow a human to drive. That is the Hybrid part. The Multimodal part means the car has a stick shift, with many gears and search methods, not just AI alone. All of my robots, including the car, will have an on-off switch and manufacturer certifications of compliance with Isaac Asimov’s “Three Laws of Robotics.”

Back to the research on consistency: the next study that I know about was by Gordon Cormack and Maura Grossman, Inconsistent Assessment of Responsiveness in E-Discovery: Difference of Opinion or Human Error?, 32 Pace L. Rev. 267 (2012). It considered data from the TREC 2009 Legal Track Interactive Task. It attempts to rebut the conclusion by Voorhees that the inconsistencies she noted are the result of inherently subjective relevance judgments, as opposed to human error.

As to the seven topics considered at TREC in 2009, Cormack and Grossman found that the average agreement for documents coded responsive by the first-pass reviewers was 71.2 percent (28.8% inconsistent), while the average agreement for documents coded non-responsive by the first-pass reviewers was 97.4 percent (2.6% inconsistent). Id. at 274 (parentheticals added). Over the seven topics studied in 2009 there was a total overlap in relevance determinations of 71.2%. Id. at 281. This is a big improvement, but it still means inconsistent calls on relevance occurred 29% of the time, and this was using the latest circa 2009 predictive coding methods. Also, these scores are in the context of a TREC protocol that allowed participants to appeal TREC relevance calls with which they disagreed. The 71% overlap for two reviewers’ relevance calls in the Grossman and Cormack study holds only if you assume all unappealed decisions were correct. If you consider only the appealed decisions, the agreement rate was just 11%.

Grossman and Cormack concluded in this study that only 5% of the inconsistencies in determinations of document relevance were attributable to differences in opinion, and that 95% were attributable to human error. They concluded that most inconsistent categorizations were caused by carelessness, such as not following instructions, and were not caused by differences in subjective evaluations. I would point out that carelessness also impacts analysis, so I do not see a bright line, as they apparently do, between “differences of opinion” and “human error.” Additional research into this area should be undertaken. But regardless of the primary cause, the inconsistencies again noted by Cormack and Grossman highlight once more the need for quality controls to guard against such human errors.

The final study with new data on reviewer inconsistencies was mine: A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents (2013). In this experiment I reviewed 699,082 Enron documents by myself, twice, in two review projects conducted about six months apart. The projects were exactly the same: same issues, same relevance standards, same documents. The only difference between the two projects was the type of predictive coding method used. Because the projects were more than six months apart, I had little or no recollection of the documents from one review to the next.

In a post hoc analysis of these two reviews I discovered that I had made 63 inconsistent relevance determinations on the same documents. Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part Two (12/2/13). Yes, human error at work, with no quality controls in play to contain such inconsistency errors. I think it was error in analysis, not simply checking the wrong box by accident, or something like that.

In the first, multimodal review project I read approximately 2,500 individual documents to categorize the entire set of 699,082 Enron emails. I found 597 relevant documents. In the second, monomodal project, the one I called the Borg experiment, I read 12,000 documents to find 376 relevant documents. After removal of duplicate documents, which were all coded consistently thanks to simple quality controls employed in both projects, there were a total of 274 different documents coded relevant by one or both methods.

Of the 274 overlapping relevant categorizations, 63 were inconsistent. In the first (multimodal) project I found 31 documents to be irrelevant that I determined to be relevant in the second project. In the second (monomodal) project I found 32 documents to be irrelevant that I had determined to be relevant in the first project. Inconsistent coding of 63 out of 274 relevant documents represents an inconsistency rate of 23%. This was using the same predictive coding software by Kroll Ontrack and the quality control similarity features included in the software back in 2012. The software has improved since then, and I have added more quality controls, but I am still the same reviewer with the same all too human reading comprehension and analysis skills. I am, however, happy to report that even without my latest quality controls all of my inconsistent calls on relevance pertained to unimportant relevant documents, what I consider “more of the same” grey area types. No important document was miscoded.

My re-review of the 274 documents, where I made the 63 errors, creates an overlap or Jaccard index of 77% (211/274), which, while embarrassing, as most reports of error are, is still the best on record. See Grossman Cormack Glossary, Ver. 1.3 (2012) (defines the Jaccard index and goes on to state that expert reviewers commonly achieve Jaccard Index scores of about 50%, and scores exceeding 60% are very rare.) This overlap or Jaccard index for my two Enron reviews is shown by the Venn diagram below.

[Venn diagram: overlapping and unique relevant documents from the two Enron reviews]

By comparison, the Jaccard index in the Voorhees studies was only 43% (two reviewers) and 30% (three reviewers). The Jaccard index in the Roitblat, Kershaw and Oot study was only 16% (multiple reviewers).
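For convenience, here is the same comparison tabulated from the figures reported above; the 77% Enron figure is just 211 agreed calls out of 274 relevant documents, and the other values are taken directly from the studies as quoted:

```python
# Overlap / Jaccard figures as reported in the studies discussed above.
reported_overlap = {
    "Voorhees (2 reviewers)": 0.43,
    "Voorhees (3 reviewers)": 0.30,
    "EDI / Verizon re-review (Team A vs. original)": 0.163,
    "EDI / Verizon re-review (Team B vs. original)": 0.158,
    "EDI / Verizon re-review (Team A vs. Team B)": 0.281,
    "Grossman & Cormack, TREC 2009 (responsive-call agreement)": 0.712,
    "Losey Enron reviews, multimodal vs. monomodal": 211 / 274,
}
for study, score in reported_overlap.items():
    print(f"{score:6.1%}  {study}")
```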


This is the basis for my less-is-more postulate and why I always use as few contract review attorneys as possible in a review project. Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part Three. This helps pursue the quality goal of perfect consistency. Sorry, contract lawyers, your days are numbered. Most of you can and will be replaced. You will not be replaced by robots exactly, but by other AI-enhanced human reviewers. See Why I Love Predictive Coding (The Empowerment of AI Augmented Search).

To be continued …

