The Law’s “Reasonable Man,” Judge Haight, Love, Truth, Justice, “Go Fish” and Why the Legal Profession Is Not Doomed to be Replaced by Robots

June 29, 2016

Reasonability is a core concept of the law and a foundation of our system of justice. Reason, according to accepted legal doctrine, is how we judge the actions of others and determine right from wrong. We do not look to Truth and Love for Justice; we look to Truth and Reason. If a person’s actions are reasonable, then, as a general matter, they are good and should not be punished, no matter what the emotional motives behind the actions. It is an objective standard. Actions judged as unreasonable are not good, no matter the emotional motive (think mercy killing).

Irrational actions are discouraged by law, and, if they cause damages, they are punished. The degree of punishment slides according to how unreasonable the behavior was and the extent of damages caused. Bad behavior ranges from the barely negligent – a close question – to intentionally bad, scienter. Analysis of reasonability in turn always depends on the facts and circumstances surrounding the actions being judged.

Reasonability Depends on the Circumstances

Whenever a lawyer is asked a legal question they love to start the answer by pointing out that it all depends. We are trained to see both sides, to weigh the evidence. We dissect, assess and evaluate degrees of reasonability according to the surrounding circumstances. We deal with reason, logic and cold hard facts. Our recipe for justice is simple: add reason to facts and stir well.

The core concept of reasonability not only permeates negligence and criminal law, it underlies discovery law as well. We are constantly called upon to evaluate the reasonability of efforts to save, find and produce electronically stored information. This evaluation of reasonability always depends on the facts. It requires more than information. It requires knowledge of what the information means.

Perfect efforts are not required in the law, but reasonable efforts are. Failure to make such efforts can be punished by the court, with the severity of the punishment contingent on the degree of unreasonability and extent of damages. Again, this requires knowledge of the true facts of the efforts, the circumstances.

In discovery, litigants and their lawyers are not permitted to make anything less than reasonable efforts to find the information requested. They are not permitted to make sub-standard, negligent efforts, and certainly not grossly negligent efforts. Let us not even talk about intentionally obstructive or defiant efforts. The difference between good enough practice – meaning reasonable efforts – and malpractice is where the red line of negligence is drawn.

Bagley v. Yale

Professor Constance Bagley

One of my favorite district court judges – 86-year-old Charles S. Haight – pointed out the need to evaluate the reasonability of e-discovery efforts in a well-known, at this time still ongoing employment discrimination case. Bagley v. Yale, Civil Action No. 3:13-CV-1890 (CSH). See, e.g., Bagley v. Yale University, 42 F. Supp. 3d 332 (D. Conn. 2014). On April 27, 2015, Judge Haight considered Defendant’s Motion for Protective Order.

The plaintiff, Constance Bagley, wanted her former employer, Yale University, to look through the emails of more witnesses to respond to her request for production. The defendant, Yale University, said it had already done enough, that it had reviewed the emails of several custodians, and should not be required to do more. Judge Haight correctly analyzed this dispute as requiring his judgment on the reasonability of Yale’s efforts. He focused on Rule 26(b)(2)(B), involving the “reasonable accessibility” of certain ESI, and the reasonable efforts requirements under then-Rule 26(b)(2)(C) (now Rule 26(b)(1) – proportionality factors under the 2015 Rules Amendments). In the judge’s words:

Yale can — indeed, it has — shown that the custodians’ responsive ESI is not readily accessible. That is not the test. The question is whether this information is not reasonably accessible: a condition that necessarily implies some degree of effort in accessing the information. So long as that creature of the common law, the reasonable man,[6] paces the corridors of our jurisprudence, surrounding circumstances matter.

[6] The phrase is not gender neutral because that is not the way Lord Coke spoke.

Bagley v. Yale, Ruling on Defendant’s Motion for Protective Order (Doc. 108) (April 27, 2015) (emphasis added).

The Pertinent e-Discovery Facts of Bagley v. Yale

Judge Haight went on to deny the motion for protective order by defendant Yale University, his alma mater, by evaluation of the facts and circumstances. Here the plaintiff originally wanted the defendant to review for relevant documents the ESI of 24 custodians that contained certain search terms. The parties later narrowed the list of terms and reduced the custodian count from 24 to 10. The defendant began a linear review of each and every document. (Yes, their plan was to have a paralegal or attorney look at each and every document with a hit, instead of using more sophisticated approaches, e.g., concept search or predictive coding.) Here is Judge Haight’s description:

Defendants’ responsive process began when University staff or attorneys commandeered — a more appropriate word than seized — the computer of each of the named custodians. The process of ESI identification and production then “required the application of keyword searches to the computers of these custodians, extracting the documents containing any of those keywords, and then reading every single document extracted to determine whether it is responsive to any of the plaintiff’s production requests and further to determine whether the document is privileged.” Defendants’ Reply Brief [Doc. 124], at 2-3. This labor was performed by Yale in-house paralegals and lawyers, and a third-party vendor the University retained for the project.

It appears from the opinion that Yale was a victim of a poorly played game of Go Fish, where each side tries to find relevant documents by guessing keywords without study of the data, much less other search methods. Losey, R., Adventures in Electronic Discovery (West 2011); Child’s Game of ‘Go Fish’ is a Poor Model for e-Discovery Search. This is a very poor practice, as I have often argued, and frequently results in surprise burdens on the producing party.

This is what happened here. As Judge Haight explained, Yale did not complain about the keywords and custodian count (ten instead of five) until months later, when the review was well underway:

[I]t was not until the parties had some experience with the designated custodians and search terms that the futility of the exercise and the burdens of compliance became sufficiently apparent to Defendants to complain of them.

Too bad. If they had tested the keywords first before agreeing to review all hits, instead of following the Go Fish approach, none of this would have happened. National Day Laborer Organizing Network v. US Immigration and Customs Enforcement Agency, 877 F.Supp.2d 87 (SDNY, 2012) (J. Scheindlin) (“As Judge Andrew Peck — one of this Court’s experts in e-discovery — recently put it: “In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of `Go Fish’ … keyword searches usually are not very effective.” FN 113“); Losey, R., Poor Plaintiff’s Counsel, Can’t Even Find a CAR, Much Less Drive One (9/1/13).

After reviewing the documents of only three custodians, following the old-fashioned, buggy-whip method of looking at one document after another (linear review), the defendant complained to the judge about the futility of the effort. They alleged that the effort:

… required paralegals and lawyers to review approximately 13,393 files, totaling 4.5 gigabytes, or the equivalent of about 450,000 pages of emails. Only 6% of this data was responsive to Plaintiff’s discovery request: about 300 megabytes, or about 29,300 pages of emails. In excess of 95% of this information, while responsive to the ESI request, has absolutely nothing to do with any of the issues in this case. Thus, defendants’ lawyers and paralegals reviewed approximately 450,000 pages of material in order to produce less than 1,500 pages of information which have any relationship whatsoever to this dispute; and the majority of the 1,500 pages are only marginally relevant.

I do not doubt that at all. It is typical in cases like this. What do you expect from blind negotiated keyword search and linear review? For less effort try driving a CAR instead of walking. As Judge Scheindlin said in National Day Laborer back in 2012:

There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere.[114] There is a “need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or `keywords’ to be used to produce emails or other electronically stored information.”[115] And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.[116] Through iterative learning, these methods (known as “computer-assisted” or “predictive” coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies’ unsupported assertions that their lay custodians have designed and conducted a reasonable search.

National Day Laborer Organizing Network, supra 877 F.Supp.2d at pgs. 109-110.

Putting aside the reasonability of search and review methods selected, an issue never raised by the parties and not before the court, Judge Haight addressed whether the defendant should be required to review all ten custodians in these circumstances. Here is Judge Haight’s analysis:

Prior to making this motion, Yale had reviewed the ESI of a number of custodians and produced the fruits of those labors to counsel for Bagley. Now, seeking protection from — which in practical terms means cessation of — any further ESI discovery, the University describes in vivid, near-accusatory prose the considerable amount of time and treasure it has already expended responding to Bagley’s ESI discovery requests: an exercise which, in Yale’s non-objective and non-binding evaluation, has unearthed no or very little information relevant to the lawsuit. Yale’s position is that given those circumstances, it should not be required to review any additional ESI with a view toward producing any additional information in discovery. The contention is reminiscent of a beleaguered prizefighter’s memorable utterance some years ago: “No mas!” Is the University entitled to that relief? Whether the cost of additional ESI discovery warrants condemnation of the total as undue, thereby rendering the requested information not reasonably accessible to Yale, presents a legitimate issue and, in my view, a close question.

Judge Charles Haight (“Terry” to his friends) analyzed the facts and circumstances to decide whether Yale should continue its search and review of four more custodians. (It was five more, but Yale reviewed one while the motion was pending.) Here is his summary:

Defendants sum up the result of the ESI discovery they have produced to Plaintiff to date in these terms: “In other words, of the 11.88 gigabytes of information[3](which is the equivalent of more than 1 million pages of email files) that has so far been reviewed by the defendant, only about 8% of that information has been responsive and non-privileged. Furthermore, only a small percentage of those documents that are responsive and non-privileged actually have any relevance to the issues in this lawsuit.” Id., at 4-5.  . . .

[3] 11.88 gigabytes is the total of 4.5 gigabytes (produced by review of the computers of Defendant custodians Snyder, Metrick and Rae) and 7.38 gigabytes (produced by review of the computers of the additional five custodians named in text).

Defendants assert on this motion that on the basis of the present record, “the review of these remaining documents will amount to nothing more than a waste of time and money. This Court should therefore enter a protective order relieving the defendant[s] from performing the requested ESI review.” Id.  . . .

Ruling in Bagley v. Yale

Judge Haight, a wise senior judge who has seen and heard it all before, found that under these facts Yale had not yet made a reasonable effort to satisfy its discovery obligations in this case. He ordered Yale to review the email of four more custodians. That, he decided, would be a reasonable effort. Here is Judge Haight’s explanation of his analysis of reasonability, which, in my view, is unaffected by the 2015 Rule Amendments, specifically the change to Rule 26(b)(1).

In the case at bar, the custodians’ electronically stored information in its raw form was immediately accessible to Yale: all the University had to do was tell a professor or a dean to hand over his or her computer. But Bagley’s objective is to discover, and Defendants’ obligation is to produce, non-privileged information relevant to the issues: Yale must review the custodians’ ESI and winnow it down. That process takes time and effort; time and effort can be expensive; and the Rule measures the phrase “not reasonably accessible” by whether it exposes the responding party to “undue cost.” Not some cost: undue cost, an adjective Black’s Law Dictionary (10th ed. 2014 at 1759) defines as “excessive or unwarranted.” . . .

In the totality of circumstances displayed by the case at bar, I think it would be an abuse of discretion to cut off Plaintiff’s discovery of Defendants’ electronically stored information at this stage of the litigation. Plaintiff’s reduction of custodians, from the original 24 targeted by Defendants’ furiously worded Main Brief to the present ten, can be interpreted as a good-faith effort by Plaintiff to keep the ESI discovery within permissible bounds. Plaintiff’s counsel say in their Opposing Brief [Doc. 113] at 2: “Ironically, this last production includes some of the most relevant documents produced to date.” While relevance, like beauty, often lies in the eyes of the beholder, and Defendants’ counsel may not share the impressions of their adversaries, I take the quoted remark to be a representation by an officer of the Court with respect to the value and timing of certain evidence which has come to light during this discovery process. The sense of irritated resignation conveyed by the familiar aphorism — “it’s like looking for a needle in a haystack” — does not exclude the possibility that there may actually be a needle (or two or three) somewhere in the haystack, and sharp needles at that. Plaintiff is presumptively entitled to search for them.

As Judge Haight understood when he said that the “Plaintiff is presumptively entitled to search for them,” the search effort actually falls upon the defendant, not the plaintiff. The law requires the defendant to expend reasonable efforts to search for the needles in the haystack that the plaintiff would like to be found. Of course, if those needles are not there, no amount of effort can find them. Still, no one knows in advance whether there are hot documents left to be found (although probabilities can be calculated), so reasonable efforts are often required to show they are not there. This can be difficult, as any e-discovery lawyer well knows.

Faced with this situation most e-discovery specialists will tell you the best solution is to cooperate, or at least try. If your cooperative efforts fail and you seek relief from the court, it needs to be clear to the judge that you did try. If the judge thinks you are just another unreasonable, over-assertive lawyer, your efforts are doomed. This is apparently part of what was driving Judge Haight’s analysis of “reasonable” as the following colorful, one might say “tasty,” quote from the opinion shows:

A recipe for a massive and contentious adventure in ESI discovery would read: “Select a large and complex institution which generates vast quantities of documents; blend as many custodians as come to mind with a full page of search terms; flavor with animosity, resentment, suspicion and ill will; add a sauce of skillful advocacy; stir, cover, set over high heat, and bring to boil. Serves a district court 2-6 motions to compel discovery or for protection from it.”


You have got to love a judge with wit and wisdom like that. My only comment is that truly skillful advocacy here would include cooperation, and lots of it. The sauce added in that case would be sweet and sour, not just hot and spicy. It should not give a judge any indigestion at all, much less six motions. That is one reason why Electronic Discovery Best Practices (EDBP.com) puts such an emphasis on skillful cooperation.

EDBP.com – You are free to use this chart in any manner so long as you do not change it.

What is Reasonable?

Bagley shows that the dividing line between what are reasonable, and thus acceptable, efforts and what are not can often be difficult to determine. It depends on a careful evaluation of the facts, to be sure, but this evaluation in turn depends on many subjective factors, including whether one side or another was trying to cooperate. These factors include all kinds of prevailing social norms, not just cooperativeness. They also include personal values, prejudices, education, intelligence, and even how the mind itself works, the hidden psychological influences. They all influence a judge’s evaluation in any particular case as to which side of the acceptable behavior line a particular course of conduct falls.

In close questions the subjectivity inherent in determinations of reasonability is obvious. This is especially true for the attorneys involved, the ones paid to be independent analysts and objective advisors. People can, and often do, disagree on what is reasonable and what is not. They disagree on what is negligent and what is not. On what is acceptable and what is not.

All trial lawyers know that certain tricks of argument and appeals to emotion can have a profound effect on a judge’s resolution of these supposedly reason-based disagreements. They can have an even more profound effect on a jury’s decision. (That is the primary reason that there are so many rules on what can and cannot be said to a jury.)

Study of Legal Psychology

Every good student of the law knows this, but how many attempt to study the psychological dynamics of persuasion? How many attempt to study perceptions of reasonability? Of cognitive bias? Not many, and there are good reasons for this.

First and foremost, few law professors have this kind of knowledge. The only attorneys I know of with this knowledge are experienced trial lawyers and experienced judges. They know quite a lot about this, but not from any formal or systematic study. They pick up information, and eventually knowledge, on the psychological underpinnings of justice through many long years of practice. They learn about the psychology of reasonability through thousands of test cases. They learn what is reasonable by involvement in thousands of disputes. Whatever I know of the subject was learned that way, although I have also read numerous books and articles on the psychology of legal persuasion written by still more senior trial lawyers.

That is not to say that experience, trial and error, is the quickest or best way to learn these insights. Perhaps there is an even quicker and more effective way? Perhaps we could turn to psychologists and see what they have to say about the psychological foundations of perception of reasonability. After all, this is, or should be, a part of their field.

Up until now, not very much has been said by psychologists on law and reasonability, at least not to my knowledge. There are a few books on the psychology of persuasion. I made a point in my early years as a litigator to study them to try to become a better trial lawyer. But in fact, the field is surprisingly thin. There is not much there. It turns out that the fields of Law and Psychology have not overlapped much, at least not in that way.

Perhaps this is because so few psychologists have been involved with legal arguments on reasonability. When psychologists are in the legal system, they are usually focused on legal issues of sanity, not negligence, or on cases involving issues of medical diagnoses.

The blame for the wide gulf between the two fields falls on both sides. Most psychologists, especially research psychologists, have not been interested in the law and legal process. Or when they have, it has involved criminal law, not civil. See, e.g.: Tunnel Vision in the Criminal Justice System (May 2010, Psychology Today). This disinterest has been reciprocal. Most lawyers and judges are not really interested in hearing what psychologists have to say about reasonability. They consider their work to be above such subjective vagaries.

Myth of Objectivity

Lawyers and judges consider reasonability of conduct to be an exclusively legal issue. Most lawyers and judges like to pretend that reasonability exists in some sort of objective, platonic plane of ideas, above all subjective influences. The just decision can be reached by deep, impartial reasoning. This is the myth of objectivity. It is an article of faith in the legal profession.

The myth continues to this day in legal culture, even though all experienced trial lawyers and judges know it is total nonsense, or nearly so. They know full well the importance of psychology and social norms. They know the impact of cognitive biases of all kinds, even transitory ones. As trial lawyers like to quip – What did the judge have for breakfast?

Experienced lawyers take advantage of these biases to win cases for their clients. They know how to push the buttons of judge and jury. See Cory S. Clements, Perception and Persuasion in Legal Argumentation: Using Informal Fallacies and Cognitive Biases to Win the War of Words, 2013 BYU L. Rev. 319 (2013). Justice is sometimes denied as a result. But this does not mean judges should be replaced by robots. No indeed. There is far more to justice than reason. Still, a little help from robots is surely part of the future we are making together.

More often than not the operation of cognitive biases happens unconsciously, without any puppet masters intentionally pulling the strings. There is more to this than just rhetoric and sophistry. Justice is hard. So is objective ratiocination.

Even assuming that the lawyers and judges in the know could articulate their knowledge of decisional bias, they have little incentive to do so. (The very few law professors with such knowledge do have an incentive, as we see in Professor Clements’ article cited above, but these articles are rare and too academic.) Moreover, most judges and lawyers are incapable of explaining these insights in a systematic manner. They lack the vocabulary of psychology to do so, and, since they learned by long, haphazard experience, that is their style of teaching as well.

Shattering the Myth

One psychologist I know has studied these issues and shared his insights. They are myth shattering to be sure, and thus will be unwelcome to some idealists. But for me this is a much-needed analysis. The psychologist who has dared to expose the myth, to lift the curtain, has worked with lawyers for over a decade on discovery issues. He has even co-authored a law review article on reasonability with two distinguished lawyers. Oot, Kershaw, Roitblat, Mandating Reasonableness in a Reasonable Inquiry, Denver University Law Review, 87:2, 522-559 (2010).

I am talking about Herbert L. Roitblat, who has a PhD in psychology. Herb did research and taught psychology for many years at the University of Hawaii. Only after a distinguished career as a research psychologist and professor did Herb turn his attention to computer search in general and then ultimately to law and legal search. He is also a great admirer of dolphins.

Schlemiel and Schlimazel

Herb has written a small gem of a paper on law and reasonability that is a must read for everyone, especially those who do discovery. The Schlemiel and the Schlimazel and the Psychology of Reasonableness (Jan. 10, 2014, LTN) (link is to republication by a vendor without attribution). I will not spoil the article by telling you Herb’s explanation of the Yiddish terms, Schlemiel and Schlimazel, nor what they have to do with reasonability and the law, especially the law of spoliation and sanctions. Only a schmuck would do that. It is a short article; be a mensch and go read it yourself. I will, however, tell you the Huffington Post definition:

A Schlemiel is an inept clumsy person and a Schlimazel is a very unlucky person. There’s a Yiddish saying that translates to a funny way of explaining them both. A schlemiel is somebody who often spills his soup and a schlimazel is the person it lands on.

This is folk wisdom for what social psychologists today call attribution error. It is the tendency to blame your own misfortune on outside circumstances beyond your control (the schlimazel) and blame the misfortune of others on their own negligence (the schlemiel). Thus, for example, when I make a mistake, it is in spite of my reasonable efforts, but when you make a mistake it is because of your unreasonably lame efforts. It is a common bias that we all have. The other guy is often unreasonable, whereas you are not.

Herb Roitblat’s article should be required reading for all judges and lawyers, especially new ones. Understanding the many inherent vagaries of reasonability could, for instance, lead to a much more civil discourse on the subject of sanctions. Who knows, it could even lead to cooperation, instead of the theatre and politics we now see everywhere instead.

Hindsight Bias

Roitblat’s article contains a two-paragraph introduction to another important psychological factor at work in many evaluations of reasonability: Hindsight Bias. This has to do with the fact that most legal issues concern past decisions and actions that have gone bad. The law almost never considers good decisions, much less great decisions with terrific outcomes. Instead it focuses on situations gone bad, where it turns out that wrong decisions were made. But were they necessarily negligent decisions?

The mere fact that a decision led to an unexpected, poor outcome does not mean that the decision was negligent. But when we examine the decision with the benefit of 20/20 hindsight, we are naturally inclined towards a finding of negligence. In the same way, if the results prove to be terrific, the hindsight bias is inclined to perceive most any crazy decision as reasonable.

Due to hindsight bias, we all have, in Roitblat’s words:

[A] tendency to see events that have already occurred as being more predictable than they were before they actually took place. We over-estimate the predictability of the events that actually happened and under-estimate the predictability of events that did not happen.  A related phenomenon is “blame the victim,” where we often argue that the events that occurred should have been predicted, and therefore, reasonably avoided.

Hindsight bias is well known among experienced lawyers and you will often see it argued, especially in negligence and sanctions cases. Every good lawyer defending such a charge will try to cloak all of the mistakes as seemingly reasonable at the time, and any counter-evaluation as merely the result of hindsight bias. They will argue, for instance, that while it may now seem obvious that wiping the hard drives would delete relevant evidence, that is only because of the benefit of hindsight, and that it was not at all obvious at the time.

Good judges will also sometimes mention the impact of 20/20 hindsight, either on their own initiative or in response to defense argument. See for instance the following analysis by Judge Lee H. Rosenthal in Rimkus v. Cammarata, 688 F. Supp. 2d 598 (S.D. Tex. 2010):

These general rules [of spoliation] are not controversial. But applying them to determine when a duty to preserve arises in a particular case and the extent of that duty requires careful analysis of the specific facts and circumstances. It can be difficult to draw bright-line distinctions between acceptable and unacceptable conduct in preserving information and in conducting discovery, either prospectively or with the benefit (and distortion) of hindsight. Whether preservation or discovery conduct is acceptable in a case depends on what is reasonable, and that in turn depends on whether what was done–or not done–was proportional to that case and consistent with clearly established applicable standards.  [FN8] (emphasis added)

Judge Shira A. Scheindlin also recognized the impact of hindsight in Pension Committee of the University of Montreal Pension Plan, et al. v. Banc of America Securities, LLC, et al., 685 F. Supp. 2d 456 (S.D.N.Y. Jan. 15, 2010, as amended May 28, 2010) at pgs. 463-464:

While many treatises and cases routinely define negligence, gross negligence, and willfulness in the context of tortious conduct, I have found no clear definition of these terms in the context of discovery misconduct. It is apparent to me that these terms simply describe a continuum. FN9 Conduct is either acceptable or unacceptable. Once it is unacceptable the only question is how bad is the conduct. That is a judgment call that must be made by a court reviewing the conduct through the backward lens known as hindsight. It is also a call that cannot be measured with exactitude and might be called differently by a different judge. That said, it is well established that negligence involves unreasonable conduct in that it creates a risk of harm to others, but willfulness involves intentional or reckless conduct that is so unreasonable that harm is highly likely to occur. (emphasis added)

The relatively well-known backward lens of hindsight can impact anyone’s evaluation of reasonability. But there are many other less obvious psychological factors that can alter a judge or jury’s perception. Herb Roitblat mentions a few more, such as the overconfidence effect, where people tend to inflate their own knowledge and abilities, and framing, an example of cognitive bias where the outcome of questions is impacted by the way they are asked. The latter is one reason that trial lawyers fight so hard over jury instructions and jury interrogatories.

Conclusion

Many lawyers are interested in this law-psych intersection and the benefits that might be gained by cross-pollination of knowledge. I have a life-long interest in psychology, and so do many others, some with advanced degrees. That includes my fellow predictive coding expert, Maura R. Grossman, an attorney who also has a Ph.D. in Clinical/School Psychology. A good discovery team can use all of the psychological insights it can get.

The myth of objectivity and the “Reasonable Man” in the law should be exposed. Many naive people still put all of their faith in legal rules and the operation of objective, unemotional logic. The system does not really work that way. Outsiders trying to automate the law are misguided. The Law is far more than logic and reason. It is more than the facts, the surrounding circumstances. It is more than evidence. It is about people and by people. It is about emotion and empathy too. It is about fairness and equity. Its prime directive is justice, not reason.

That is the key reason why AI cannot automate law, nor legal decision making. Judge Charles (“Terry”) Haight could be augmented and enhanced by smart machines, by AI, but never replaced. The role of AI in the Law is to improve our reasoning, to minimize our schlemiel biases. But the robots will never replace lawyers and judges. In spite of the myth of the Reasonable Man, there is far more to law than reason and facts. I for one am glad about that. If it were otherwise the legal profession would be doomed.


What Chaos Theory Tells Us About e-Discovery and the Projected ‘Information → Knowledge → Wisdom’ Transition

May 20, 2016

Gleick & Losey meeting sometime in the future

This article assumes a general, non-technical familiarity with the scientific theory of Chaos. See James Gleick’s book, Chaos: Making a New Science (1987). This field of study is not usually discussed in the context of “The Law,” although there is a small body of literature outside of e-discovery. See: Chen, Jim, Complexity Theory in Legal Scholarship (Jurisdynamics 2006).

The article begins with a brief, personal recapitulation of the basic scientific theories of Chaos. I buttress my own synopsis with several good instructional videos. My explanation of the Mandelbrot Set and Complex numbers is a little long, I know, but you can skip over that and still understand all of the legal aspects. In this article I also explore the application of the Chaos theories to two areas of my current work:

  1. The search for needles of relevant evidence in large, chaotic, electronic storage systems, such as email servers and email archives, in order to find the truth, the whole truth, and nothing but the truth needed to resolve competing claims of what happened – the facts – in the context of civil and criminal law suits and investigations.
  2. The articulation of a coherent social theory that makes sense of modern technological life, a theory that I summarize with the words/symbols: Information → Knowledge → Wisdom. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers and the more recent, How The 12 Predictions Are Doing That We Made In “Information → Knowledge → Wisdom.”

Introduction to the Science of Chaos

Gleick’s book on Chaos provides a good introduction to the science of chaos and, even though written in 1987, is still a must read. For those who, like me, read it long ago, here is a good, short (3:53) refresher video, James Gleick on Chaos: Making a New Science (Open Road Media, 2011), below:

A key leader in the Chaos Theory field is the late great French mathematician, Benoit Mandelbrot (1924-2010) (shown right). Benoit, a math genius who never learned the alphabet, spent most of his adult life employed by IBM. He discovered and named the natural phenomena of fractals. He discovered that there is a hidden order to any complex, seemingly chaotic system, including economics and the price of cotton. He also learned that this order was not causal and could not be predicted. He arrived at these insights by study of geometry, specifically the rough geometric shapes found everywhere in nature and mathematics, which he called fractals. The quintessential fractal he discovered now bears his name, the Mandelbrot Fractal, shown in the computer photo below, and explained further in the video that follows.

Mandelbrot set

Look here for thousands of additional videos of fractals with zoom magnifications. You will see the recursive nature of self-similarity over varying scales of magnitude. The patterns repeat with slight variations. The complex patterns at the rough edges continue infinitely without repetition, much like Pi. They show the unpredictable element and the importance of initial conditions played out over time. The scale of the in-between dimensions can be measured. Metadata remains important in all investigations, legal or otherwise.


The Mandelbrot is based on a simple mathematical formula involving feedback and Complex Numbers: z ⇔ z² + c. The ‘c’ in the formula stands for any Complex Number. Unlike all other numbers, such as the natural numbers one through nine – 1, 2, 3, 4, 5, 6, 7, 8, 9 – the Complex Numbers do not exist on a horizontal number line. They exist only on an x-y coordinate plane where regular numbers on the horizontal grid combine with so-called Imaginary Numbers on the vertical grid. A complex number is shown as c = a + bi, where a and b are real numbers and i is the imaginary number.

A complex number can be visually represented as a pair of numbers (a, b) forming a vector on a diagram called an Argand diagram, representing the complex plane. “Re” is the real axis, “Im” is the imaginary axis, and i is the imaginary number. And that is all there is to it. Mandelbrot calls the formula embarrassingly simple. That is the Occam’s razor beauty of it.

To understand the full dynamics of all of this, remember what Imaginary Numbers are. They are a special class of numbers where a negative times a negative creates a negative, not a positive, as is the rule with all other numbers. In other words, with imaginary numbers, -2i times -2i = -4, not +4. Imaginary numbers are formally defined by i² = −1.

Thus, the formula z ⇔ z² + c can be restated as z ⇔ z² + (a + bi).

The Complex Numbers when iterated according to this simple formula – subject to constant feedback – produce the Mandelbrot set.
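For readers who want to see this arithmetic in action, here is a minimal sketch in Python (my own illustration, not Mandelbrot’s code), using Python’s built-in complex type, where the imaginary unit i is written 1j:

```python
# Python has complex numbers built in; the imaginary unit i is written 1j
i = 1j
print(i * i)            # (-1+0j): i squared equals -1, by definition

c = complex(-0.5, 0.3)  # c = a + bi, here with a = -0.5 and b = 0.3
z = 0                   # the Mandelbrot iteration always starts at z = 0
for _ in range(5):
    z = z * z + c       # one pass of the feedback loop: z <=> z^2 + c
    print(z)            # the orbit of z under constant feedback
```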


The value for z in the iteration always starts with zero. The ⇔ symbol stands for iteration, meaning the formula is repeated in a feedback loop. The end result of the last calculation becomes the beginning constant of the next: z² + c becomes the z in the next repetition. z always begins at zero, and the iteration is run with different values for c. When you repeat the simple multiplication and addition formula millions of times, and plot it on a Cartesian grid, the Mandelbrot shape is revealed.

When iteration of a squaring process is applied to non-complex numbers the results are always known and predictable. For instance, when any non-complex number greater than one is repeatedly squared, it quickly approaches infinity: 1.1 × 1.1 = 1.21; 1.21 × 1.21 = 1.4641; 1.4641 × 1.4641 = 2.14358…; and after ten iterations the number created is 2.43… × 10⁴², which written out is 2,430,000,000,000,000,000,000,000,000,000,000,000,000,000. A number so large as to dwarf even the national debt. Mathematicians say of a number this size that it is approaching infinity.

The same is true for any non-complex number less than one, but in reverse; it quickly goes to the infinitely small, toward zero. For example, with .9: .9 × .9 = .81; .81 × .81 = .6561; .6561 × .6561 = .43046…; and after only ten iterations it becomes 1.39… × 10⁻⁴⁷, which written out is .0000000000000000000000000000000000000000000000139…, a very small number indeed.

With non-complex numbers, such as real, rational or natural numbers, the squaring iteration must always run off to the infinitely large or the infinitely small, unless the starting number is one. No matter how many times you square one, it will still equal one. But just the slightest bit more or less than one and the iteration of squaring will attract it to the infinitely large or small. The same behavior holds true for complex numbers: numbers just outside of the circle z = 1 on the complex plane will jump off into the infinitely large, while complex numbers just inside z = 1 will quickly square down to zero.
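A few lines of Python can verify this runaway behavior (again my own sketch, just restating the arithmetic above):

```python
x, y, one = 1.1, 0.9, 1.0
for _ in range(10):                     # ten rounds of repeated squaring
    x, y, one = x * x, y * y, one * one
print(x)    # about 2.43e+42: racing off toward the infinitely large
print(y)    # about 1.39e-47: collapsing toward the infinitely small
print(one)  # 1.0: one is the only starting value that never moves
```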

The magic comes by adding the constant c (a complex number) to the squaring process and starting from z at zero: z ⇔ z² + c. Then stable iterations – a set attracted to neither the infinitely small or infinitely large – become possible. The potentially stable Complex numbers lie both outside and inside of the circle of z = 1; specifically on the complex plane they lie between -2.4 and .8 on the real number line, the horizontal x grid, and between -1.2 and +1.2 on the imaginary line, the vertical y grid. These numbers are contained within the black of the Mandelbrot fractal.


In the Mandelbrot formula z ⇔ z² + c, where you always start the iterative process with z equals zero and c equaling any complex number, an endless series of seemingly random or chaotic numbers is produced. Like the weather, the stock market and other chaotic systems, negligible changes in quantities, coupled with feedback, can produce unexpected chaotic effects. The behavior of the complex numbers thus mirrors the behavior of the real world, where Chaos is obvious or lurks behind the most ordered of systems.

With some values of ‘c’ the iterative process immediately begins to increase exponentially, running off into infinity. These numbers are completely outside of the Mandelbrot set. With other values of ‘c’ the iterative process is stable for a number of repetitions, and only later in the dynamic process are they attracted to infinity. These are the unstable strange attractor numbers just on the outside edge of the Mandelbrot set. They are shown in computer graphics with colors or shades of grey according to the number of stable iterations. The values of ‘c’ which remain stable, repeating as a finite number forever, never attracted to infinity, and thus within the Mandelbrot set, are plotted as black.


Some complex numbers, like 1 − 1i, run off into infinity from the start, just like all of the real numbers. Other complex numbers are always stable, like −1 + 0i. Other complex numbers stay stable for many iterations, and only further into the process do they unpredictably begin to increase or decrease exponentially (for example, .37 + .4i stays stable for a dozen iterations). These are the numbers on the edge of inclusion of the stable numbers shown in black.

Chaos enters into the iteration because, out of the potentially infinite number of complex numbers in the window of -2.4 to .8 along the horizontal real number axis and -1.2 to 1.2 along the vertical imaginary number axis, there is an infinite subset of such numbers on the edge, and they cannot be predicted in advance. All that we know about these edge numbers is that if the z produced by any iteration lies outside of a circle with a radius of 2 on the complex plane, then the subsequent z values will go to infinity, and there is no need to continue the iteration process.

By using a computer you can escape the normal limitations of human time. You can try a very large number of different complex numbers and iterate them to see what kind they may be, finite or infinite. Under the Mandelbrot formula you start with z equals zero and then try different values for c. When a particular value of c is attracted to infinity – produces a value of z whose absolute value is greater than 2 – then you stop that iteration, go back to z equals zero again, and try another c, and so on, over and over again, millions and millions of times, as only a computer can do.
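Here is a minimal escape-time sketch of that trial-and-error process (my own illustration of the standard algorithm, not Mandelbrot’s original program). It relies on the fact noted above: once the absolute value of z exceeds 2, the orbit is guaranteed to run off to infinity, so the iteration can stop:

```python
def escape_count(c, max_iter=100):
    """Return the iteration at which z escapes the circle |z| = 2,
    or None if c still looks like a member of the Mandelbrot set."""
    z = 0
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:      # outside radius 2: attracted to infinity
            return n
    return None             # still bounded after max_iter: plot it black

# Sweep a coarse grid over the window described above:
# real axis -2.4 to 0.8, imaginary axis -1.2 to 1.2
for im in range(12, -13, -2):
    row = ""
    for re in range(-24, 9):
        c = complex(re / 10, im / 10)
        row += "#" if escape_count(c) is None else "."
    print(row)              # a crude ASCII picture of the Mandelbrot set
```

Run it and a rough silhouette of the familiar cardioid and bulbs appears; a finer grid and color-coded escape counts give the famous computer images.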

Mandelbrot was the first to discover that by using zero as the base z for each iteration, and trying a large number of the possible complex numbers with a computer on a trial and error basis, he could define the set of stable complex numbers graphically by plotting their location on the complex plane. This is exactly what the Mandelbrot figure is. Along with this discovery came the surprise realization of the beauty and fractal recursive nature of these numbers when displayed graphically.

The following Numberphile video by Holly Krieger, an NSF postdoctoral fellow and instructor at MIT, gives a fairly accessible, almost cutesy, yet still technically correct explanation of the Mandelbrot set.

Fractals and the Mandelbrot set are key parts of the Chaos theories, but there is much more to it than that. Chaos Theory impacts our basic Newtonian, cause-effect, linear world view of reality as a machine. For a refresher on the big picture of the Chaos insights and how the old linear, Newtonian, machine view of reality is wrong, look at this short summary: Chaos Theory (4:48)

Another Chaos Theory instructional video, applying the insights to psychology, is also worth your time: The Science and Psychology of the Chaos Theory (8:59, 2008). It suggests the importance of spontaneous actions in the moment, the so-called flow state.

Also see High Anxieties – The Mathematics of Chaos (59:00, BBC 2008) concerning Chaos Theories, Economics and the Environment, and Order and Chaos (50:36, New Atlantis, 2015).

Application of Chaos Theories to e-Discovery

The use of feedback, iteration and algorithmic processes is central to work in electronic discovery. For instance, my search methods to find relevant evidence in chaotic systems follow iterative processes, including continuous, interactive, machine learning methods. I use these methods to find hidden patterns in the otherwise chaotic data. An overview of the methods I use in legal search is summarized in the following chart. As you can see, steps four, five and six iterate. These are the steps where human-computer interactions take place.
predictive_coding_3.0

My methods place heavy reliance on these steps and on human-computer interaction, which I call a Hybrid process. Like Maura Grossman and Gordon Cormack, I rely heavily on high-ranking documents in this Hybrid process. The primary difference in our methods is that I do not begin to place a heavy reliance on high-ranking documents until after completing several rounds of other training methods. I call this four-cylinder multimodal training. This is all part of the sixth step in the 8-step workflow chart above. The four search-engine cylinders are: (1) high ranking, (2) midlevel ranking or uncertain, (3) random, and (4) multimodal (including all types of search, such as keyword) directed by humans. A hypothetical sketch of one such training round follows.
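To make the idea concrete, here is a hypothetical Python sketch of how one training round might draw from all four cylinders. Every name and number in it is my invention for illustration; real review platforms each have their own interfaces:

```python
import random

def select_training_batch(ranked, reviewed, human_picks, batch=200):
    """Hypothetical sketch of four-cylinder multimodal selection (step six).

    ranked:      (doc_id, relevance_probability) pairs from the classifier
                 in step four, sorted highest probability first
    reviewed:    doc_ids already coded by human reviewers in step five
    human_picks: doc_ids found by attorney-directed multimodal searches
                 (keyword, concept search, etc.) -- cylinder four
    """
    pool = [(d, p) for d, p in ranked if d not in reviewed]
    k = batch // 4
    high = [d for d, _ in pool[:k]]                                        # (1) high ranking
    mid = [d for d, _ in sorted(pool, key=lambda x: abs(x[1] - 0.5))[:k]]  # (2) uncertain
    rnd = [d for d, _ in random.sample(pool, min(k, len(pool)))]           # (3) random
    return set(high + mid + rnd + list(human_picks)[:k])                   # (4) multimodal

# Toy usage with fabricated rankings
ranked = [(f"doc{i:03d}", 1 - i / 100) for i in range(100)]
batch = select_training_batch(ranked, reviewed=set(), human_picks=["doc042"], batch=40)
print(len(batch))
```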

Analogous Application of Similar Mandelbrot Formula For Purposes of Expressing the Importance of the Creative Human Component in Hybrid 

4-5-6-only_predictive_coding_3.0

Recall Mandelbrot’s formula: z ⇔ z² + c, which is the same as z ⇔ z² + (a + bi). I have something like that going on in my steps four, five and six. If you plugged the numbers of the steps into the Mandelbrot formula it would read something like this: 4 ⇔ 4² + (5 + 6i). The fourth step is the key AI Predictive Ranking step, where the algorithm ranks the probable relevance of all documents. The fourth step of computer ranking is the whole point of the formula, so AI Ranking here I will call ‘z‘, and it represents the left side of the formula. The fifth step is where humans read documents to determine relevance; let’s call that ‘r‘. The sixth step is where humans train the computer, ‘t‘. This is the Hybrid Active Training step, where the four-cylinder multimodal training methods are used to select documents to train the whole set. The documents in steps five and six, r and t, are added together for relevance feedback, (r + ti).

Thus, z ⇔ z² + c, which is the same as z ⇔ z² + (a + bi), becomes under my system z ⇔ z + (r + ti). (Note: I took out the squaring, z², because there is no such exponential function in legal search; it’s all addition.) What, you might ask, is the i in my version of the formula? This is the critical part in my formula, just as it is in Mandelbrot’s. The imaginary number – i – in my formula version represents the creativity of the human conducting the training.

The Hybrid Active Training step is not fully automated in my system. I do not simply use the highest-ranking documents to train, especially in the early rounds of training, as do some others. I use a variety of methods in my discretion, especially the multimodal search methods such as keywords, concept search, and the like. In text retrieval science this use of human discretion, human creativity and judgment, is called an ad hoc search. It contrasts with fully automated search, where the text retrieval experts try to eliminate the human element. See Mr EDR for more detail on the 2016 TREC Total Recall Track, which had both ad hoc and fully automated sections.

My work with legal search engines, especially predictive coding, has shown that new technologies do not work with the old methods and processes, such as linear review or keyword alone. New processes are required that employ new ways of thinking, new methods that link creative human judgments (i) with the computer’s amazing abilities at text reading speed, consistency, analysis, learning and ranking (z).

My latest processes, Predictive Coding 3.0, are variations of Continuous Active Training (CAT), where steps four, five and six iterate until the project is concluded. Grossman & Cormack call this Continuous Active Learning or CAL, and they claim Trademark rights to CAL. I respect their right to do so (no doubt they grow weary of vendor rip-offs) and will try to avoid the acronym henceforth. My use of the acronym CAT essentially takes the view of the other side, the human side that trains, not the machine side that learns. In both Continuous Active Learning and CAT the machine keeps learning with every document that a human codes. Continuous Active Learning or Training makes the linear seed-set method obsolete, along with the control set and random training documents. See Losey, Predictive Coding 3.0.

In my typical implementation of Continuous Active Training I do not automatically include every coded document as a training document. This is the sixth training step (‘t‘ in the prior formula). Instead of automatically using every document that has been coded relevant or irrelevant to train, I select the particular documents that I decide to use to train. This, in addition to multimodal search in step six, Hybrid Active, is another way in which the equivalent of Imaginary Numbers comes into my formula, the uniquely human element (ti). I typically use most every relevant document coded in step five, the ‘r‘ in the formula, as a training document, but not all. z ⇔ z + (r + ti)

I exercise my human judgment and experience to withhold certain training documents. (Note: I never withhold hot trainers, i.e., highly relevant documents.) I do this if my experience (I am tempted to say ‘my imagination‘) suggests that including them as training documents will likely slow down or confuse the algorithm, even if temporarily. I have found that this improves efficiency and effectiveness. It is one of the techniques I used to win document review contests.
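Continuing the same hypothetical sketch, that human veto might look like this; which documents to withhold is precisely the judgment call, the ‘i‘, that the code itself cannot make:

```python
def choose_trainers(coded, withhold):
    """Hypothetical sketch of selective training (the 't' in the formula).

    coded:    (doc_id, label) pairs coded by humans in step five ('r');
              labels are assumed to be 'hot', 'relevant' or 'irrelevant'
    withhold: doc_ids the attorney excludes because, in his or her judgment,
              they would likely slow down or confuse the algorithm
    Hot trainers (highly relevant documents) are never withheld.
    """
    return [(d, label) for d, label in coded
            if label == "hot" or d not in withhold]

# Toy usage
coded = [("doc001", "hot"), ("doc002", "relevant"), ("doc003", "irrelevant")]
print(choose_trainers(coded, withhold={"doc001", "doc002"}))
# doc001 stays because it is hot; doc002 is withheld
```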

This kind of intimate machine communication is possible because I carefully observe the impact of each set of training documents on the classifying algorithm, and carry over lessons – iterate – from one project to the next. I call this keeping a human in the loop and the attorney in charge of relevance scope adjudications. See Losey, Why the ‘Google Car’ Has No Place in Legal Search. We humans provide experienced observation, new feedback, different approaches, empathy, play and emotion. We also add a whole lot of other things too. The AI-Robot is the Knowledge fountain. We are the Wisdom fountain. That is why we should strive to progress into and through the Knowledge stage as soon as possible. We will thrive in the end-goal Wisdom state.

Application of Chaos Theory to Information→Knowledge→Wisdom

The first Information stage of the post-computer society in which we live is obviously chaotic. It is like the disconnected numbers that lie completely outside of the Mandelbrot set. It is pure information with only haphazard meaning. It is often just misinformation, just exponential growth. There is an overwhelming deluge of such raw information, raw data, that spirals off into an infinity of dead-ends. It leads nowhere and is disconnected. The information is useless. You may be informed, but to no end. That is modern life in the post-PC era.

The next stage of society we seek, a Knowledge-based culture, is geometrically similar to the large black blobs that unite most of the figure. This is the finite set of numbers that provides all connectivity in the Mandelbrot set. Analogously, this will be a time when many loose ends will be discarded, false theories abandoned, and consensus will arise.

In the next stage we will not only be informed, we will be knowledgeable. The information will all be processed. The future Knowledge Society will be static, responsible, serious and well fed. People will be brought together by common knowledge. There will be large-scale agreements on most subjects. A tremendous amount of diversity will likely be lost.

After a while a knowledgeable world will become boring. Ask any professor or academic. The danger of the next stage will be stagnation, complacency, self-satisfaction. The smug complacency of a know-it-all world. This may be just as dangerous as the pure-chaos Information world in which we now live.

If society is to continue to evolve after that, we will need to move beyond mere Knowledge. We will need to challenge ourselves to attain new, creative applications of Knowledge. We will need to move beyond Knowledge into Wisdom.

I am inclined to think that if we ever do progress to a Wisdom-based society, it will be a place and time much like the unpredictable fractal edges of the Mandelbrot: stable to a point, but ultimately unpredictable, constantly changing, evolving. The basic patterns of our truth will remain the same, but they will constantly evolve and be refined. The deeper we dig, the more complex and beautiful it will be. The dry sameness of a Knowledge-based world will be replaced by an ever-changing flow, by more and more diversity and individuality. Our social cohesion will arise from recursivity and similarity, not sameness and conformity. A Wisdom-based society will be filled with fractal beauty. It will live ever zigzagging between the edge of the known and unknown. It will also necessarily have to be a time when people learn to get along together and share in prosperity and health, both physical and mental. It will be a time when people are accustomed to ambiguities and comfortable with them.

In Wisdom World knowledge itself will be plentiful, but will be held very lightly. It will be subject to constant reevaluation. Living in Wisdom will be like living on the rough edge of the Mandelbrot. It will be a culture that knows infinity firsthand. An open, peaceful, ecumenical culture that knows everything and nothing at the same time. A culture where most of the people, or at least a strong minority, have attained a certain level of personal Wisdom.

Conclusion

Back to our times, where we are just now discovering what machine learning can do, we are just beginning to pattern our investigations, our search for truth, in the Law and elsewhere, on new information gleaned from the Chaos theories. Active machine learning, Predictive Coding, is a natural outgrowth of Chaos Theory and the Mandelbrot Set. The insights of hidden fractal order that can only be seen by repetitive computer processes are prevalent in computer based culture. These iterative, computer assisted processes have been the driving force behind thousands of fact investigations that I have conducted since 1980.

I have been using computers to help me in legal investigations since 1980. The reliance on computers at first increased slowly, but steadily. Then from about 2006 to 2013 the increase accelerated and peaked in late 2013. The shift is beginning to level off. We are still heavily dependent on computers, but now we understand that human methods are just as important as software. Software is limited in its capacities without human additive, especially in legal search. Hybrid, Man and Machine, that is the solution. But remember that the focus should be on us, human lawyers and search experts. The AIs we are creating and training should be used to Augment and Enhance our abilities, not replace them. They should complement and complete us.

The converse realization of Chaos Theory, that disorder underlies all apparent order, that if you look closely enough you will find it, also informs our truth-seeking investigatory work. There are no smooth edges. It is all rough. If you look close enough, the border of any coastline is infinite.

The same is true of the complexity of any investigation. As every experienced lawyer knows, there is no black and white, no straight line. It always depends on so many things. Complexity and ambiguity are everywhere. There is always a mess, always rough edges. That is what makes the pursuit of truth so interesting. Just when you think you have it, the turbulent echo of another butterfly’s wings knocks you about.

The various zigs and zags of e-discovery, and other investigative, truth-seeking activities, are what make them fascinating. Each case is different, unique, yet the same patterns are seen again and again with recursive similarity. Often you begin a search only to have it quickly burn out. No problem, try again. Go back to square one, back to zero, and try another complex number, another clue. Pursue a new idea, a new connection. You chase down all reasonable leads, understanding that many of them will lead nowhere. Even failed searches rule out negatives and so help in the investigation. Lawyers often try to prove a negative.

The fractal story that emerges from Hybrid Multimodal search is often unexpected. As the search matures you see a bigger story, a previously hidden truth. A continuity emerges that connects previously unrelated facts. You literally connect the dots. The complex numbers c = a + bi that belong to the Mandelbrot set, the ones whose iterations do not spiral off to infinity, do in fact connect to each other when you look closely enough at the spaces between them.

z → z² + c, where c = a + bi
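For the technically inclined, the whole test fits in a few lines of Python. This is only a sketch of the classic escape-time iteration, nothing more:

```python
# A minimal escape-time test: c = a + bi is in the Mandelbrot set if z
# stays bounded under repeated application of z -> z**2 + c.
def in_mandelbrot(c: complex, max_iter: int = 1000) -> bool:
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:        # once |z| exceeds 2 the orbit must escape
            return False
    return True               # still bounded; treat as inside the set

print(in_mandelbrot(-1 + 0j))   # True: the orbit cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1 + 0j))    # False: 0, 1, 2, 5, 26, ... escapes
```

Points on the set's rough edge flip between bounded and unbounded with the slightest change in c, which is exactly the kind of sensitivity discussed above.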

I am no Sherlock, but I know how to find ESI using computer processes. It requires an iterative sorting process, a hybrid multimodal process, using the latest computers and software. This process allows you to harness the infinite patience, analytics and speed of a machine to enhance your own intelligence, to augment your own abilities. You let the computer do the boring bits, the drudgery, while you do the creative parts.

The strength comes from the hybrid synergy. It comes from exploring the rough edges of what you think you know about the evidence. It does not come from linear review, nor simple keyword cause-effect. Evidence is always complex, always derived from chaotic systems. A full multimodal selection of search tools is needed to find this kind of dark data.

The truth is out there, but sometimes you have to look very carefully to find it. You have to dig deep and keep on looking to find the missing pieces, to move from Information → Knowledge → Wisdom.

_______

(Animation: blue Mandelbrot fractal zoom, looking ever deeper into the details)


e-Discovery Team’s Best Practices Education Program

May 8, 2016


EDBP  |  Mr. EDR  |  Predictive Coding 3.0  |  59 TAR Articles  |  Doc Review Videos

_______


e-Discovery Team Training

Information → Knowledge → Wisdom

Education is the clearest path from Information to Knowledge in all fields of contemporary culture, including electronic discovery. The above links take you to the key components of the best-practices teaching program I have been working on since 2006. It is my hope that these education programs will help move the Law out of the dangerous information flood, where it is now drowning, to a safer refuge of knowledge. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers; and How The 12 Predictions Are Doing That We Made In “Information → Knowledge → Wisdom.” For more of my thoughts on e-discovery education, see the e-Discovery Team School Page.

The best practices and general educational curriculum that I have developed over the years focus on the legal services provided by attorneys. The non-legal engineering and project management practices of e-discovery vendors are only collaterally mentioned. They are important too, but students have the EDRM and other commercial organizations and certifications for that. Vendors are part of any e-Discovery Team, but the programs I have developed are intended for law firms and corporate law departments.

The e-Discovery Team program, both the general educational curriculum and the legal best practices, is online and available 24/7. It uses lots of imagination, creative mixes, symbols, photos, hyperlinks, interactive comments, polls, tweets, posts, news, charts, drawings, videos, video lectures, slide lectures, video skits, video slide shows, music, animations, cartoons, humor, stories, cultural themes and analogies, inside-baseball references, rants, opinions, bad jokes, questions, homework assignments, word-clouds, links for further research, a touch of math, and every lawyer’s favorite tools: words (lots of them), logic, arguments, case law and precedent.

All of this is designed to take the e-Discovery Team approach from mere information to knowledge. In spite of these efforts, most of the legal community still does not know e-discovery very well. What they do know is often misinformation. Scenes like the following in a law firm lit-support department are all too common.

The e-Discovery Team’s education program has an emphasis on document review. That is because the fees for lawyers reviewing documents are by far the most expensive part of e-discovery, even when contract lawyers are used. The lawyer review fees, and review supervision fees, including SME fees, have always been much more costly than all vendor costs and expenses put together. Still, the latest AI technologies, especially active machine learning using our Predictive Coding 3.0 methods, are now making it possible to significantly reduce review fees. We believe this is a critical application of best practices. The three steps we identify for this area in the EDBP chart are shown in green, to signify money. The reference to C.A. Review is to Computer Assisted Review, or CAR, using our Hybrid Multimodal methods.

(Image: EDBP work-flow chart detail)

____

Predictive Coding 3.0 Hybrid Multimodal Document Search and Review

Our new version 3.0 techniques for predictive coding make it far easier than ever before to include AI in a document review project. The secret control set has been eliminated; so too have the seed set and the practice of SMEs wasting their time reviewing random samples of mostly irrelevant junk. It is a much simpler technique now, although we still call it Hybrid Multimodal.

Hybrid is a reference to the Man/Machine interactive nature of our methods. A skilled attorney uses a type of continuous active learning to train an AI to help find the documents they are looking for. This Hybrid method greatly augments the speed and accuracy of the human attorneys in charge, which leads to cost savings and improved recall. A lawyer with an AI helper at their side is far more effective than lawyers working on their own. This means that every e-discovery team today could use a robot like Kroll Ontrack’s Mr. EDR to help them do document review.

Multimodal is a reference to the use of a variety of search methods to find target documents, including, but not limited to, predictive coding type ranked searches. We encourage humans in the loop running a variety of searches of their own invention, especially at the beginning of a project. This always makes for a quick start in finding relevant and hot documents. See Why the ‘Google Car’ Has No Place in Legal Search. The multimodal approach also makes for precise, efficient reviews with broad scope. The latest active machine learning software, when fully integrated with a full suite of other search tools, is attaining higher levels of recall than ever before. That is one reason Why I Love Predictive Coding.

I have found that Kroll Ontrack’s EDR software is ideally suited for these Hybrid Multimodal techniques. Try using it on your next large project and see for yourself. The Kroll Ontrack consultant specialists in predictive coding, Jim and Tony, have been trained in this method (and many others). They are well qualified to assist you every step of the way, and their rates are reasonable. With you calling the shots on relevancy, they can do most of the search work for you and still save your client money. If the matter is big and important enough, then, if I have a time opening and it clears my firm’s conflicts, I can also be brought in for a full turn-key operation. Whether you want to include extra time for training your best experts is your option, but it is our preference.


__________

Embrace e-Discovery Team Education to Escape Information Overload

____


Five Reasons You Should Read the ‘Practical Law’ Article by Maura Grossman and Gordon Cormack called “Continuous Active Learning for TAR”

April 11, 2016

There is a new article by Gordon Cormack and Maura Grossman that stands out as one of their best and most accessible. It is called Continuous Active Learning for TAR (Practical Law, April/May 2016). The purpose of this blog is to get you to read the full article by enticing you with some of the information and knowledge it contains. But before we get to the five reasons, we will examine the purpose of the article, which aligns with our own, and touch on the differences between their trademarked TAR CAL method and our CAR Hybrid Multimodal method. Both methods use continuous active learning, the acronym for which, CAL, they now claim as a trademark. Since they clearly did invent the acronym, we for one will stop using CAL as a generic term.

The Legal Profession’s Remarkable Slow Adoption of Predictive Coding

The article begins with the undeniable point of the remarkably slow adoption of TAR by the legal profession, in their words:

Adoption of TAR has been remarkably slow, considering the amount of attention these offerings have received since the publication of the first federal opinion approving TAR use (see Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012)).

I remember getting that landmark ruling in our Da Silva Moore case, a ruling that pissed off plaintiffs’ counsel because, despite what you may have heard to the contrary, they were strenuously opposed to predictive coding. Like most other lawyers at the time who were advocating for advanced legal search technologies, I thought Da Silva would open the flood gates, that it would encourage attorneys to begin using the then-new technology in droves. In fact, all it did was encourage the Bench, not the Bar. Judge Peck’s more recent ruling on the topic contains a good summary of the law. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015). There was a flood of judicial rulings approving predictive coding all around the country, and lately, around the world. See, e.g., Pyrrho Investments Ltd v MWB Property Ltd, [2016] EWHC 256 (Ch).

The rulings were followed in private arbitration too. For instance, I used the Da Silva Moore ruling a few weeks after it was published to obtain what was apparently the first ruling by an AAA arbitrator approving the use of predictive coding. The opposition to our use of cost-saving technology in that arbitration case was again fierce, and again included personal attacks, but the arguments for use in arbitration are very compelling. Discovery in arbitration is, after all, supposed to be constrained and expedited.

After the Da Silva Moore opinion, Maura Grossman and I upped our speaking schedule (she far more than I), and so did several tech-minded judges, including Judge Peck (although never at the same events as me, until the cloud of false allegations created by a bitter plaintiff’s counsel in Da Silva Moore could be dispelled). At Legal Tech for the next few years, Predictive Coding was all anybody wanted to talk about. Then IG, Information Governance, took over as the popular tech-child of the day. In 2015 we had only a few predictive coding panels at Legal Tech, but they were well attended.

Grossman and Cormack speculate that the cause of the remarkably slow adoption is:

The complex vocabulary and rituals that have come to be associated with TAR, including statistical control sets, stabilization, F1 measure, overturns, and elusion, have dissuaded many practitioners from embracing TAR. However, none of these terms, or the processes with which they are associated, are essential to TAR.

We agree. The vendors killed what could have been their golden goose with all this control set nonsense, their engineers’ love of complexity, and their misunderstanding of legal search. I have ranted about this before. See Predictive Coding 3.0. I will not go into that again here, except to say the statistical control set nonsense, with its large sampling requirements, was particularly toxic. It was not only hard and expensive to do, it led to mistaken evaluations of the success or failure of projects because it ignored the reality of the evolving understanding of relevance, so-called concept drift. Another wrong turn involved the nonsense of using only random selection to find training documents, a practice that Grossman and I opposed vigorously. See Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One, Part Two, Part Three, and Part Four. Grossman and Cormack correctly criticize these old vendor-driven approaches in Continuous Active Learning for TAR. They call them SAL and SPL protocols (a couple of acronyms that no one wants to trademark!).

Bottom line, the tide is changing. Over the last several years the few private attorneys who specialize in legal search, but are not employed by a vendor, have developed simpler methods. Maura and I are just the main ones writing and speaking about it, but there are many others who agree. Many have found that it is counter-productive to use control sets, random input, non-continuous training with its illogical focus on the seed set, and misleading recall point projections.

We do so in defiance of the vendor establishment and other self-proclaimed pundits in this area who benefited from such over-complexity. Maura and Gordon, of course, have their own software (Gordon’s creation), and so never needed any vendors to begin with. Not having a world-renowned information scientist like Professor Cormack as my life partner, I had no choice but to rely on vendors for their software. (Not that I’m complaining, mind you. I’m married to a mental health counselor, and it does not get any better than that!)

After a few years I ultimately settled on one vendor, Kroll Ontrack, but I continue to try hard to influence all vendors. It is a slow process. Even Kroll Ontrack’s software, which I call Mr. EDR, still has control set functions built in. Thanks to my persistence, it is easy to turn off these settings and do things my way, with no secret control sets and no false recall calculations. Hopefully soon that will be the default setting. Their eyes have been opened. Hopefully all of the other major vendors will soon follow suit.

All of the Kroll Ontrack experts in predictive coding are now, literally, a part of my Team. They are fully trained and believers in the simplified methods, methods very similar to those of Grossman and Cormack, albeit, as I will next explain, slightly more complicated. We proved how well these methods worked at TREC 2015, when the Kroll Ontrack experts and I did 30 review projects together in 45 days. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF and web page with short summary). Also see Mr. EDR for background information on the Team’s participation in the TREC 2015 Total Recall Track.

We Agree to Disagree with Grossman and Cormack on One Issue, Yet We Still Like Their Article

We are fans of Maura Grossman and Gordon Cormack’s work, but not sycophants. We are close, but not the same; colleagues, but not followers. For those reasons we think our recommendation that you read this article means more than a typical endorsement. We can be critical of their writings, but, truth is, we liked their new article, although we continue to dislike the name TAR (not important, but we prefer CAR). Also, and this is of some importance, my whole team continues to disagree with what we consider the somewhat over-simplified approach they take to finding training documents, namely reliance on the highest-ranking documents alone.

Despite what some may think, the high-ranking approach does eventually find a full diversity of relevant documents. All good predictive coding software today uses some type of logistic regression based algorithm that is capable of building out probable relevance in that way. That is one of the things we learned by rubbing shoulders with text retrieval scientists from around the world when participating in the TREC 2015 Total Recall Track that Grossman and Cormack helped administer. This regression type of classification system works well to avoid the danger of over-training on a particular relevancy type. Grossman and Cormack have proven that before to our satisfaction (so have our own experiments), and they again make a convincing case for this approach in this article.
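For readers who want to peek under the hood, the logistic function at the heart of these algorithms is simple enough to state in one line. The probability that a document x is relevant is estimated as

p(relevant | x) = 1 / (1 + e^−(w · x + b))

where x is the document’s feature vector (its words, weighted), w is the weight vector the machine learns from the training documents, and b is a learned offset. Each round of training adjusts w and b, which is how the ranking continuously refines itself as you code more documents.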

Still, we disagree with their approach of only using high-ranking documents for training, but we do so on the grounds of efficiency and speed, not effectiveness. The e-Discovery Team continues to advocate a Hybrid Multimodal approach to active machine learning. We use what I like to call a four-cylinder type of CAR search engine, instead of one cylinder, like they do, drawing training documents from four sources (see the sketch after this list):

  1. High-ranking documents;
  2. Mid-level, uncertain documents;
  3. A touch, a small touch, of random documents; and,
  4. Human ingenuity found documents, using all types of search techniques (multimodal) that seem appropriate to the search expert in charge, including keyword, linear, similarity (including chains and families), and concept (including passive machine learning, clustering-type search).
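To make the four-cylinder idea concrete, here is a rough Python sketch of how one training batch might be assembled from all four sources. The function, the proportions and the names are my own illustration, not any vendor’s actual software:

```python
import random

def four_cylinder_batch(ranked, multimodal_hits, reviewed, batch_size=100):
    """Assemble one training batch from the four 'cylinders'.

    ranked: list of (doc_id, prob) sorted by predicted relevance, descending.
    multimodal_hits: doc_ids found by the human searcher (keyword, similarity,
                     concept, linear review) - cylinder 4.
    reviewed: set of doc_ids already judged, to be skipped.
    The mix below (roughly 50/20/10/20) is only an illustration."""
    pool = [(d, p) for d, p in ranked if d not in reviewed]
    batch = [d for d, _ in pool[: batch_size // 2]]                 # 1. high-ranking
    uncertain = [d for d, p in pool if 0.4 <= p <= 0.6]
    batch += uncertain[: batch_size // 5]                           # 2. mid-level, uncertain
    batch += random.sample([d for d, _ in pool],
                           min(batch_size // 10, len(pool)))        # 3. a small touch of random
    batch += [d for d in multimodal_hits
              if d not in reviewed][: batch_size // 5]              # 4. human-found documents
    return list(dict.fromkeys(batch))                               # de-duplicate, keep order
```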

See Predictive Coding 3.0, where the method is described as an eight-part work flow (Step 6 – Hybrid Active Training).

The latest Grossman and Cormack’s versions of CAL (their trademark) only uses the highest-ranking documents for active training. Still, in spite of this difference, we liked their article and recommend you read it.

The truth is, we also emphasize the high-probable-relevant documents for training. The difference is that we use the three other methods as well. On that point we agree to disagree. To be clear, we are not talking about continuous training or not; we agree on that. We are not talking about active training versus passive; we agree on that. We are not talking about using what they call SAL or SPL protocols (read their article for details); we agree with them that these protocols are ineffective relics invented by misguided vendors. We are only talking about a difference in methods to find documents to use to train the classifier. Even that is not a major disagreement, as we agree with Grossman and Cormack that high-ranking documents usually make the best trainers, just not in the first seed set. There are also points in a search, depending on the project, where the other methods can help you get to the relevant documents in a fast, efficient manner. The primary difference between us is that we do not limit ourselves to that one retrieval method, as Grossman and Cormack do in their trademarked CAL methodology.

Cormack and Grossman emphasize simplicity, ease of use, and reliance on the software algorithms as another way to try to overcome the Bar’s continued resistance to TAR. The e-Discovery Team has the same goal, but we do not think it is necessary to go quite that far for simplicity’s sake. The other methods we use, the other three cylinders, are not that difficult and have many advantages. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF and web page with short summary). Put another way, we like the option of fully automatic driving from time to time, but we want to keep an attorney’s learned hand at or near the wheel at all times. See Why the ‘Google Car’ Has No Place in Legal Search.

Accessibility with Integrity: The First Reason We Recommend the Article

Professor Gordon Cormack

Here’s the first reason we like Grossman & Cormack’s article, Continuous Active Learning for TAR: you do not have to be one of Professor Cormac’s PhD students to understand it. Yes. It is accessible, not overly technical, and yet still has scientific integrity, still has new information, accurate information, and still has useful knowledge.

It is not easy to do both. I know because I try to make all of my technical writings that way, including the 57 articles I have written on TAR, which I prefer to call Predictive Coding, or CAR. I have not always succeeded in getting the right balance, to be sure. Some of my articles may be too technical, and perhaps some suffer from breezy information overload and knowledge deficiency. Hopefully none are plain wrong, but my views have changed over the years. So have my methods. If you compare my latest work-flow (below) with earlier ones, you will see some of the evolution, including the new emphasis over the past few years on continuous training.

(Image: Predictive Coding 3.0 eight-step work-flow)

Grossman, Cormack and I are all trying hard to get the word out to the Bar as to the benefits of using active machine learning in legal document review. (We all agree on that term, active machine learning, and all agree that passive machine learning is not an acceptable substitute.) It is not easy to write on this subject in an accurate, yet still accessible and interesting manner. There is a constant danger that making a subject more accessible and simple will lead to inaccuracies and misunderstandings. Maura and Gordon’s latest article meets this challenge.

Take, for example, the article’s first description of their continuous active training search method using highest-ranking documents:

At the outset, CAL resembles a web search engine, presenting first the documents that are most likely to be of interest, followed by those that are somewhat less likely to be of interest. Unlike a typical search engine, however, CAL repeatedly refines its understanding about which of the remaining documents are most likely to be of interest, based on the user’s feedback regarding the documents already presented. CAL continues to present documents, learning from user feedback, until none of the documents presented are of interest.

That is a good way to start an article. The comparison with a Google search having continued refinement based on user feedback is well thought out: simple, yet accurate. It represents a description honed by literally hundreds of presentations on the topic by Maura Grossman. No one has talked more on this topic than she has, and I for one intend to start using this analogy.
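To see how simple the described loop really is, here is a bare-bones Python sketch of it, assuming scikit-learn and a logistic regression classifier. This is my own illustration of the idea, not Grossman and Cormack’s actual tool:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_review(documents, judge, seed_labels, batch_size=10):
    """documents: list of document texts.
    judge: callable, judge(text) -> 1 (relevant) or 0 (not), standing in
           for the human reviewer giving feedback.
    seed_labels: {doc_index: 0 or 1} from a keyword-style starter search;
                 must contain at least one example of each class."""
    X = TfidfVectorizer().fit_transform(documents)
    labels = dict(seed_labels)
    while True:
        reviewed = sorted(labels)
        model = LogisticRegression(max_iter=1000)
        model.fit(X[reviewed], [labels[i] for i in reviewed])
        scores = model.predict_proba(X)[:, 1]           # probability of relevance
        pool = [i for i in range(len(documents)) if i not in labels]
        if not pool:
            break
        # present the highest-ranking unreviewed documents next
        batch = sorted(pool, key=lambda i: scores[i], reverse=True)[:batch_size]
        new = {i: judge(documents[i]) for i in batch}    # human feedback
        labels.update(new)
        if not any(new.values()):    # a whole batch of non-relevant: stop
            break
    return [i for i, y in labels.items() if y == 1]
```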

Rare Description of Algorithm Types – Our Second Reason to Recommend the Article

Another reason our Team liked Continuous Active Learning for TAR is the rare description of search algorithm types that it includes. Here we see the masterful touch of one of the world’s leading academics on text retrieval, Gordon Cormack. First, the article makes clear the distinction between effective analytic algorithms that truly rank documents using active machine learning, and a few other popular programs now out there that use passive learning techniques and call it advanced analytics.

The supervised machine-learning algorithms used for TAR should not be confused with unsupervised machine-learning algorithms used for clustering, near-duplicate detection, and latent semantic indexing, which receive no input from the user and do not rank or classify documents.

These older, unsupervised search methods are what I call concept search. It is not predictive coding. It is not advanced analytics, no matter what some vendors may tell you. It is yesterday’s technology: helpful, but far from state-of-the-art. We still use concept search as part of multimodal, just like any other search tool, but our primary reliance for properly ranking documents is placed on active machine learning.

The Cormack-Grossman article goes further than pointing out this important distinction; it also explains the various types of bona fide active machine learning algorithms. Again, some are better than others. First, Professor Cormack explains the types that have been found to be effective by extensive research over the past ten years or so.

Supervised machine-learning algorithms that have been shown to be effective for TAR include:

–  Support vector machines. This algorithm uses geometry to represent each document as a point in space, and deduces a boundary that best separates relevant from not relevant documents.

– Logistic regression. This algorithm estimates the probability of a document’s relevance based on the content and other attributes of the document.

Conversely, Cormack explains:

Popular, but generally less effective, supervised machine-learning algorithms include:

– Nearest neighbor. This algorithm classifies a new document by finding the most similar training document and assuming that the correct coding for the new document is the same as its nearest neighbor.

– Naïve Bayes (Bayesian classifier). This algorithm estimates the probability of a document’s relevance based on the relative frequency of the words or other features it contains.

Ask your vendor which algorithms its software includes. Prepare yourself for double-talk.


If you try out your vendor’s software and the Grossman-Cormack CAL method does not work for you, and even the e-Discovery Team’s slightly more diverse Hybrid Multimodal method does not work, then your software may be to blame. As Grossman and Cormack put it (where the phrase “TAR tool” means software):

[I]t will yield the best possible results only if the TAR tool incorporates a state-of-the-art learning algorithm.

That means software that uses a type of support vector machine and/or logistic regression.
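If you want to test that claim yourself, all four algorithm families named above are available in open-source form. Here is a small scikit-learn sketch, using a made-up toy corpus about Apple products, that compares them side by side; it is an illustration only, not a benchmark:

```python
# The four algorithm families named in the article, tried side by side.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC                     # support vector machine
from sklearn.linear_model import LogisticRegression   # logistic regression
from sklearn.neighbors import KNeighborsClassifier    # nearest neighbor
from sklearn.naive_bayes import MultinomialNB         # naive Bayes
from sklearn.model_selection import cross_val_score

docs = ["email about the iPhone launch", "new MacBook pricing memo",
        "iPad order for the office", "Apple store opening notes",
        "iPod sales figures attached", "Mac OS upgrade schedule",
        "budget meeting agenda", "school board election results",
        "hurricane preparedness plan", "highway construction update",
        "education funding request", "state park event permit"]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # 1 = relevant to Apple products

X = TfidfVectorizer().fit_transform(docs)
for name, clf in [("Support vector machine", LinearSVC()),
                  ("Logistic regression", LogisticRegression()),
                  ("Nearest neighbor", KNeighborsClassifier(n_neighbors=3)),
                  ("Naive Bayes", MultinomialNB())]:
    f1 = cross_val_score(clf, X, labels, cv=3, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.2f}")
```

On a real labeled sample from your own collection, a comparison like this is one way to check whether your vendor’s claims about its learning algorithm hold up.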

Teaching by Example – Our Third Reason to Recommend the Article

The article uses a long example involving a search of Jeb Bush’s email to show how their CAL method works. This is an effective way to teach, and we think they did a good job with it. Rather than spoil the read with quotes and further explanation, we urge you to check out the article and see for yourself. Yes, it is an oversimplification (after all, this is a short article), but it is a good one, and still accurate.

 Quality Control Suggestions – Our Fourth Reason to Recommend the Article

Another reason we like the article is the quality control suggestions it includes. They essentially speak of using other search methods, which is exactly what we do in Hybrid Multimodal. Here are their words:

To increase counsel’s confidence in the quality of the review, they might:

– Review an additional 100, 1,000, or even more documents.

– Experiment with additional search terms, such as “Steve Jobs,” “iBook,” or “Mac,” and examine the most-likely relevant documents containing those terms.

– Invite the requesting party to suggest other keywords for counsel to apply.

– Review a sample of randomly selected documents to see if any other documents of interest are identified.

We like this because it shows that the differences are small between the e-Discovery Team’s Hybrid Multimodal method (hey, maybe I should claim trademark rights to Hybrid Multimodal, but then again, no vendors are using my phrase to sell their products) using continuous active training and the Grossman-Cormack trademarked CAL method. We also note that their section on Measures of Success essentially mirrors our own thoughts on metric analysis and ei-Recall. See Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search, Part One, Part Two and Part Three.
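For those who want to see the arithmetic, here is a bare-bones Python sketch of the ei-Recall range calculation as described in those articles: review a random elusion sample of the null set, project total false negatives from the exact binomial confidence interval, and report recall as a low-high range. The numbers in the example are made up:

```python
from scipy.stats import beta

def ei_recall(tp, null_size, sample_size, sample_fn, confidence=0.95):
    """tp: relevant documents found and verified (true positives).
    null_size: documents withheld as irrelevant (the null set).
    sample_size: size of the random elusion sample from the null set.
    sample_fn: relevant documents (false negatives) found in the sample."""
    a = 1 - confidence
    # Clopper-Pearson exact binomial interval for the elusion rate
    lo = beta.ppf(a / 2, sample_fn, sample_size - sample_fn + 1) if sample_fn else 0.0
    hi = beta.ppf(1 - a / 2, sample_fn + 1, sample_size - sample_fn)
    fn_low, fn_high = lo * null_size, hi * null_size   # projected false negatives
    return tp / (tp + fn_high), tp / (tp + fn_low)     # (lowest, highest) recall

# Made-up example: 9,000 relevant produced; 290,000 docs in the null set;
# elusion sample of 1,534 with 2 false negatives found.
low, high = ei_recall(9000, 290000, 1534, 2)
print(f"recall between {low:.0%} and {high:.0%}")
```

Reporting the range, rather than a false point projection, is the whole point of the method.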

Article Comes With an Online “Do it Yourself” CAL Trial Kit – Our Fifth Reason to Recommend the Article

We are big believers in learning by doing. That is especially true in legal tasks that seem complicated in the abstract. I can write articles and give presentations that provide explanations of AI-Enhanced Review. You may get an intellectual understanding of predictive coding from these, but you still will not know how to do it. On the other hand, if we have a chance to show someone an entire project, have them shadow us, then they will really learn how it is done. It is like teaching a young lawyer how to try a case. For a price, we will be happy to do so (assuming conflicts clear).

Maura and Gordon seem to agree with us on that learn-by-doing point and have created an online tool that anyone can use to try out their method. It allows for a search of the Jeb Bush email, the same set of 290,099 emails that we used in ten of the thirty topics at TREC 2015. In their words:

There is no better way to learn CAL than to use it. Counsel may use the online model CAL system to see how quickly and easily CAL can learn what is of interest to them in the Jeb Bush email dataset. As an alternative to throwing up their hands over seed sets, control sets, F1 measures, stabilization, and overturns, counsel should consider using their preferred TAR tool in CAL mode on their next matter.

You can try out their method with their online tool, or in a real project using your vendor’s tool. By the way, we did that as part of our TREC 2015 experiments, and the Kroll Ontrack software worked about the same as theirs, even when we used their one-cylinder, high ranking only, CAL (their trademark) method.

Here is where you can find their CAL testing tool: cormack.uwaterloo.ca/cal. Those of you who are still skeptical can see for yourself how it works. You can follow the example given in the article about searching for documents relevant to Apple products, to verify their description of how that works. For even more fun, you can dream up your own searches.

President George W. Bush. Photo by Eric Draper, White House.

Perhaps, if you try hard enough, you can find some example searches where their high-ranking-only method, which is built into the test software, does not work well. For example, try finding all emails that pertain to, or in any way mention, the then President, George Bush. Try entering George Bush in the demo test and see for yourself what happens.

It becomes a search for George + Bush in the same document, and then goes from there based on your coding of the highest-ranked documents presented as either relevant or non-relevant. You will see that you quickly end up in a TAR pit. The word Bush is in every email (I think), so you are served up with every email where a George is mentioned, and believe me, there are many Georges, even if there is only one President George Bush. Here is the screen shot of the first document presented after entering George Bush. I called it relevant.

(Image: screen shot of the first document presented)

These kinds of problem searches do not discredit TAR, or even the Grossman-Cormack one-cylinder search method. If this happened to you in a real search project, you could always use our Hybrid Multimodal™ method for the seed set (first training), or start over with a different keyword or keywords to begin the process. You could, for instance, search for President Bush, or President within five words of George, or “George Bush.” There are many ways, some faster and more effective than others.

Even using the single-method approach, if you decided to use the keywords “President + Bush,” the search would go more quickly than “George + Bush.” Even just using the term “President” works better than George + Bush, but it still feels like a TAR pit, not a speeding CAR. It will probably get you to the same destination, high recall, but the journey is slightly longer and, at first, more tedious. This high recall result was verified at TREC 2015 by our Team, and by a number of universities that participated in the fully automatic half of the Total Recall Track, including Gordon’s own team. This was all done without any manual review by the fully automatic participants, because there was instant feedback of relevant or irrelevant based on a prejudged gold standard. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF and web page with short summary). With this instant feedback protocol, all of the teams attained high recall and good precision. Amazing but true.

You can criticize this TREC experiment protocol, which we did in our report, as unrealistic to legal practice because:

(1) there is no SME who works like that (and there never will be, until legal knowledge itself is learned by an AI); and,

(2) the searches presented as tasks were unrealistically over-simplistic. Id.

But you cannot fairly say that CAL (their trademark) does not work. The glass is most certainly not half empty. Moreover, the elixir in this glass is delicious and fun, especially when you use our Hybrid Multimodal™ method. See Why I Love Predictive Coding: Making document review fun with Mr. EDR and Predictive Coding 3.0.

Conclusion

Active machine learning (predictive coding) using support vector or logistic regression algorithms, with a method that employs continuous active training, using either one cylinder (their CAL) or four (our Hybrid Multimodal), really works, and is not that hard to use. Try it out and see for yourself. Also, read the Grossman Cormack article; it only takes about 30 minutes. Continuous Active Learning for TAR (Practical Law, April/May 2016). Feel free to leave any comments below. I dare say you can even ask questions of Grossman or Cormack here. They are avid readers and will likely respond quickly.

