HEARTBLEED: A Lawyer’s Perspective On Cyber Liability and the Biggest Programming Error in History

April 22, 2014

The Negligence of One Open Source Software Contributor Put Millions of Internet Users’ Confidential Data At Risk.

Is Open Source Software Appropriate for Major Security Applications?

In law we usually do not know who is to blame for a mistake. With Heartbleed we know exactly who made the error. It was a German programmer, Robin Seggelmann, who was a PhD student at the time. Assuming he is telling the truth, that this error was a mistake and not an intentional act of sabotage, Seggelmann now apparently has the dubious distinction of having made the biggest computer programming error in history. Some journalists are calling Seggelmann the man who broke the Internet. That is an exaggeration, but I cannot think of a programming error that has ever had a bigger impact.

It was a small oversight. Seggelmann forgot to add a single line of code limiting the size of memory access in a feature called heartbeat (thus the nickname for the bug, Heartbleed). Oops. These things can happen. Easy to understand. Hey, it was, after all, one minute before midnight on New Year’s Eve 2011 when he submitted his work. I kid you not. Seggelmann knew that another expert was going to check his work anyway, so why should he be too concerned? Too bad the supervising expert missed the error too. Oops again. Oh well, that’s open source for you. Seggelmann did not get paid for his work, and there may be no legal consequences for his gift to the world, a gift that many security experts call the worst thing to ever happen to Internet security.
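For readers who want to see what such a missing line looks like, here is a hypothetical, greatly simplified sketch in C of a heartbeat-style handler, together with the kind of one-line length check that was missing. This is illustrative only, not the actual OpenSSL source; the function names, parameters, and buffer sizes are my own inventions for the sake of the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A heartbeat request carries a payload plus a field stating the
 * payload's length; the reply is supposed to echo the payload back.
 * The flaw: trusting the claimed length instead of the actual one. */

/* Vulnerable handler: copies 'claimed_len' bytes even if the request
 * actually supplied fewer, so adjacent memory leaks into the reply. */
size_t heartbeat_reply_vulnerable(const uint8_t *payload,
                                  uint16_t claimed_len, uint8_t *out) {
    memcpy(out, payload, claimed_len);   /* no bounds check at all */
    return claimed_len;
}

/* Fixed handler: the sort of one-line length validation that was
 * missing. A request whose claimed length exceeds the bytes actually
 * received is simply discarded. */
size_t heartbeat_reply_fixed(const uint8_t *payload, size_t actual_len,
                             uint16_t claimed_len, uint8_t *out) {
    if ((size_t)claimed_len > actual_len)
        return 0;                        /* discard malformed request */
    memcpy(out, payload, claimed_len);
    return claimed_len;
}
```

In this toy version, a request that supplies a 3-byte payload but claims 32 bytes gets back those 3 bytes plus 29 bytes of whatever happens to sit next to them in memory, which is the essence of a buffer over-read; the fixed handler simply discards the malformed request.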

Bruce Schneier, a leading digital security analyst whom I follow, says that “‘Catastrophic’ is the right word. On the scale of one to 10, this is an 11.” Brean, How a programmer’s small error created Heartbleed — a secret back door to supposedly secure sites (National Post, 4/11/14). For more of Schneier’s thoughts on Heartbleed, see the Harvard Business Review interview of him by Scott Berinato.

Rusty Foster wrote in The New Yorker that: “In the worst-case scenario, criminal enterprises, intelligence agencies, and state-sponsored hackers have known about Heartbleed for more than two years, and have used it to systematically access almost everyone’s encrypted data. If this is true, then anyone who does anything on the Internet has likely been affected by the bug.” Forbes cybersecurity columnist Joseph Steinberg wrote, “Some might argue that [Heartbleed] is the worst vulnerability found (at least in terms of its potential impact) since commercial traffic began to flow on the Internet.”

For details on Heartbleed see HeartBleed Bug Explained – 10 Most Frequently Asked Questions (Hacker News, 4/14/14). Hacker News also includes a good video explanation of the details by Fierce Outlaws.

Bottom line, this little programming error, which experts classify as a buffer over-read bug, has huge implications for the security of the Internet, and of Android phones, which use the same protocol. It could affect almost every Internet user in the world, depending on who else knew about the mistake, and for how long. At this point, nobody knows. So far only one 19-year-old Canadian hacker has been arrested for exploiting the bug, and one online railroad payment system in Russia has discovered it had been hacked. If all that were not bad enough, the security firm Mandiant claims to have found evidence of Heartbleed-based attacks on one of its clients’ virtual private networks (outside of the Internet). The Heartbleed vulnerability was used to get past the firewall and gain access to the VPN.

The Heartbleed catastrophe has dramatically revealed that our current system of Internet security is primarily based on open source software called OpenSSL. Heartbleed has shown that the whole security of the Internet can depend on one unpaid volunteer like Seggelmann, a lone PhD student in Münster, Germany, who had nothing better to do on New Year’s Eve than finish a freebie software coding project. No doubt he thought it would help his résumé. Bad decision.

Something is terribly wrong when the whole Internet is vulnerable due to the mistake of one math student. This mistake should be a wake-up call to change the system. I conclude this blog with a call for dialogue among security experts, open source experts, white-hat hackers, lawyers, the FBI, consumer advocates, and others, to come up with serious reforms to our current Internet security infrastructure, including especially reforms of OpenSSL as an organization, and to do so by the end of this year. The public trust in the security of the Internet cannot withstand another Heartbleed, especially if it turns out that thousands, perhaps millions, have been injured. (We already have reports that the hack of the Russian railroad website allowed 10,000 credit card accounts to be stolen.)


Seggelmann claims it was just a mistake. In his words, a trivial error. He seems kind of blasé about it in his only interview to date, a short talk with an Australian journalist. The interview is quoted at length below. In fairness, I do not think Seggelmann realized the implications of his error at the time he spoke. (He has stopped talking now, no doubt on advice of legal counsel, and his current employer, Deutsche Telekom.)

Seggelmann denies that he was paid to do this by the NSA or anyone else. Most of the articles written on him to date just take his word for it. Oddly enough, most writers even express sympathy for Seggelmann. The response you see in the Huffington Post is typical: “You could blame the author, but he did this work for free, for the community, and with the best of intentions.” Oh really. How do you know that? Because he said so? I am tempted to say something about naive bleeding heart liberals, but I have been accused of being one myself, and besides, it is a bad pun, so I will not.

I hope Robin Seggelmann is telling the truth too, but I have been a lawyer far too long to believe anything a person in his position now says. Plus, the circumstance of posting such important code just a minute before New Year’s is clearly indicative of carelessness. If the NSA was behind it, and Bloomberg has reported that the agency has known about the defect for years, then I expect we will know that soon enough from Snowden. If someone else was, we may never know.

Are Seggelmann or OpenSSL Liable for any Damages that Heartbleed May Cause? 

Even if this was just a mistake, not fraud, major errors like this have consequences, but for whom? Innocent users of websites operating this code for over two years may already have been victimized. We do not know yet what damages this mistake may cause, but we already have had reports of one arrest in Canada, and one theft of credit card numbers in Russia.

That is just the tip of the iceberg. Who will be responsible for the damages caused to so many? Will it be Seggelmann himself, or perhaps the not-for-profit open source group, OpenSSL, that he did this work for as an unpaid volunteer? Although I have no sympathy for a person whose negligence has caused such havoc, I doubt Seggelmann will ever be forced to reimburse anyone for the harm he has caused. 

Perhaps the operators of websites who told their users that their website was secure? Probably not them either, but it may be a closer question. This may be a rare situation where there is no remedy for people damaged by another’s negligence. It will depend on the facts, the as yet unknown details. But, rest assured, much more of the truth will come out in due time. I fully expect some lawyer, somewhere, will file suit when damaged victims appear, or maybe even before.

It will probably be difficult to hold OpenSSL liable for a number of reasons. First of all, who or what is OpenSSL? It appears to be the type of legal entity that we would call in the U.S. an unincorporated association, which is often treated something like a partnership in U.S. law.

According to the Washington Post, OpenSSL‘s headquarters — to the extent one exists at all — is the home of the group’s only employee, a part-timer at that, located on Sugarloaf Mountain, Maryland. He lives and works amid racks of servers and an industrial-grade Internet connection. Craig Timberg, Heartbleed bug puts the chaotic nature of the Internet under the magnifying glass (Washington Post, 4/9/14).

You cannot make up stuff like this. Truth is always stranger than fiction. Timberg’s article also reports that the software that serves as the backbone for security on the Internet has, due to OpenSSL’s lack of personnel and funds, never been through a security audit, a meticulous process that involves testing the software for vulnerabilities.

Here is what OpenSSL has to say about themselves:

The OpenSSL Project is a collaborative effort to develop a robust, commercial-grade, full-featured, and Open Source toolkit implementing the Secure Sockets Layer … managed by a worldwide community of volunteers that use the Internet to communicate, plan, and develop the OpenSSL toolkit and its related documentation.  . . .  [Y]ou are free to get and use it for commercial and non-commercial purposes.  . . .

The OpenSSL project is volunteer-driven. We do not have any specific requirement for volunteers other than a strong willingness to really contribute while following the projects goal. The OpenSSL project is formed by a development team, which consists of the current active developers and other major contributors. Additionally a subset of the developers form the OpenSSL core team which globally manages the OpenSSL project.

This is the way most open source software works. As open source software goes, OpenSSL is one of the most successful projects in the world (well, it was, until this whole catastrophe thing). Its product, OpenSSL, was, and still is, the world’s most popular open source cryptographic library. It is used to encrypt most of the traffic on the Internet. About two-thirds of websites with an “S” at the end of “HTTP” in their address use this freebie software.

Seggelmann had an accomplice of sorts at OpenSSL, although I do not mean to imply any type of conspiracy by use of this word. There is no evidence of that. But I am sure people will look into that possibility, not only government investigators, but also private eyes, especially if and when they are motivated by the kind of mixed greed and fear incentives that only lawsuits can bring. The appearances now all suggest that the double-checker just happened to miss the trivial error too. OpenSSL, like most good open source projects, has quality control procedures. Proposed code contributions are double-checked for mistakes by a senior contributor to OpenSSL before they are accepted.

In this case Seggelmann’s work was checked by Stephen Henson, a freelance crypto consultant (his words) based in the U.K. who has a PhD in Mathematics. He is still listed by OpenSSL as one of only four core team members of this open source group. That’s right, four people; one of them the part-time employee who works out of his home on a mountain that serves as the group’s headquarters.

So after two people looked over the new code contribution, way too casually as we now know, the code was approved. Soon thereafter millions of websites started using it and so made themselves susceptible to attack.

I could only find one legal disclaimer on the OpenSSL website, but I bet this changes soon as the academics in charge of this non-profit association start to wake up to legal realities:


I suspect the enforceability of this language may get tested in some court somewhere in the world, probably in the U.S., but to what end? I doubt OpenSSL has any assets, much less insurance, and, even if you could prove proximate causation, how deep could the pockets of Seggelmann, Henson, and other contributors be? It is more likely the primary targets for restitution will be the companies that used the defective open source software on their servers, thus exposing their users’ confidential information.

Was it negligence for commercial sites to rely on Open Source software?

Was it negligence for commercial sites to rely on Open Source software? Free software donated by all-too-human experts? I do not think so. I will not go that far, but I’m willing to bet some lawyer somewhere will. With a mistake this size it is almost inevitable that a class action suit will eventually get filed against somebody.

With the facts I have seen to date I do not think there was adequate notice to the adopters of this free software as to its unreliability to support a cause of action against them for negligent use of OpenSSL. But can the same be said now after the Heartbleed disaster has come to light? Now that we know the mistakes of only two men can put everyone at risk? Maybe not.

Is this the Beginning of the End for Open Source?

This may spell the beginning of the end of widespread commercial adoption of free software, at least for security purposes. After all, it took a for-profit company, Google, to discover the error in software that many of its own websites were using too. What we do not know is how many hackers, government sponsored or freelance, had previously discovered this mistake, or how many had previously exploited this flaw to discover the supposedly secret passwords of the hundreds of millions of persons potentially impacted. What makes this even worse is that we will never know. The biggest programming error in history made it possible for hackers to steal your data without ever leaving a trace.

Many believe the NSA has been exploiting this flaw for years. Who knows what criminal enterprises and foreign governments may have done the same?

There are two rational responses to this open source security scandal. One, stop using the Internet for anything that you want to keep secure, like all financial information. Or two, stop using Open Source, and instead use paid software, software with real safeguards, and with an entity or entities who will stand behind their products, and insurers who will stand behind them.

Society today relies heavily on the Internet. Commerce relies heavily on the Internet. If security is at risk, our current way of life is at risk. It is that important. So the first alternative is out.

This means we have to stop relying on Open Source software for security, at least the way it is run now. We need the safety of big corporations that will have a direct economic incentive to take responsibility for their work. We need paid employees, not volunteers; ones who will get paid bonuses for doing great work, and fired if they make Heartbleed-type errors that put us all at risk. Either that, or we need major reform of this open source non-profit so that it is accountable. We are way beyond the hobbyist beginnings of the Internet, a time I remember well, and yet we still delegate major Internet responsibilities to small, unregulated, independent associations.

The Heartbleed disaster shows that reliance on open source software for commerce is a risky proposition when it comes to security. It may save users some money, but the risk of error may be too high. Consumers will demand that companies pay up and protect their personal data security. As Chris Williams put it in his article for The Register, OpenSSL Heartbleed: Bloody nose for open-source bleeding hearts:

Open source or open sores? The crux of the matter is that OpenSSL is used by millions and millions of people, even if they don’t know it, but this vital encryption software – used to secure online shopping and banking, mobile apps, VPNs and much more – has a core developer team of just four volunteers who rely on donations and sponsorship. The library code is free and open source, and is used in countless products and programs, but Seggelmann and others point out that the project receives little help.

The use of open source software for everything was a fine experiment, an idealistic one based on the notion that crowdsourcing provided a better alternative to free enterprise, that capitalism could be replaced by a volunteer society of dedicated altruists. Personally, I was always skeptical. I think that competition is a good thing and helps build better products. Heartbleed confirms the skepticism was warranted. Heartbleed has exposed the dark side of crowdsourcing, the inherent weaknesses of volunteerism. The dark side of crowdsourcing is that the crowds will not come, or will stop coming. Here the crowd that checked a critical update to the code consisted of two people only, Robin Seggelmann and Stephen Henson. Two is never a crowd. In fact, a jury may some day be called upon to decide whether it was reasonable to release security code after only two people looked at it.

Robin Seggelmann’s Side of the Story

Ben Grubb is the only journalist so far to get an interview with Robin Seggelmann, published as Man who introduced serious ‘Heartbleed’ security flaw denies he inserted it deliberately (Sydney Morning Herald, 4/11/14). Here are the key excerpts and quotes from Grubb’s article, but I suggest you read the entire article and Grubb’s follow-up articles too. He has an interesting perspective, including criticism of Google’s handling of the release of information about the bug’s discovery.

Dr. Seggelmann, of Münster in Germany, said the bug which introduced the flaw was “unfortunately” missed by him and a reviewer when it was introduced into the open source OpenSSL encryption protocol over two years ago. “I was working on improving OpenSSL and submitted numerous bug fixes and added new features,” he said. “In one of the new features, unfortunately, I missed validating a variable containing a length.” After he submitted the code, a reviewer “apparently also didn’t notice the missing validation”, Dr Seggelmann said, “so the error made its way from the development branch into the released version.” Logs show that reviewer was Dr Stephen Henson. Dr Seggelmann said the error he introduced was “quite trivial”, but acknowledged that its impact was “severe”.

Conspiracy theories. A number of conspiracy theorists have speculated the bug was inserted maliciously. Dr Seggelmann said it was “tempting” to assume this, especially after the disclosure by Edward Snowden of the spying activities conducted by the US National Security Agency and others. “But in this case, it was a simple programming error in a new feature, which unfortunately occurred in a security relevant area,” he said. “It was not intended at all, especially since I have previously fixed OpenSSL bugs myself, and was trying to contribute to the project.” Despite denying he put the bug into the code intentionally, he said it was entirely possible intelligence agencies had been making use of it over the past two years. “It is a possibility, and it’s always better to assume the worst than best case in security matters, but since I didn’t know [about] the bug until it was released and [I am] not affiliated with any agency, I can only speculate.”

Benefits of discovery. If anything had been demonstrated by the discovery of the bug, Dr Seggelmann said it was awareness that more contributors were needed to keep an eye over code in open source software. “It’s unfortunate that it’s used by millions of people, but only very few actually contribute to it,” he said. “The benefit of open source software is that anyone can review the code in the first place. The more people look at it, the better, especially with a software like OpenSSL.”

Future Heartbleed prevention. Asked how OpenSSL would make sure something like Heartbleed didn’t happen in the future, OpenSSL core team member Ben Laurie, who just happens to work at Google, said no promises could be made. “No one knows how to write completely secure code,” he said, speaking on behalf of OpenSSL. “However, a better job could be done of reducing the risk. For example, code audit, more review of changes. These things take more manpower, which can either come from donated time or donated money.”

Call for Dialogue and Reforms

From Ben Grubb’s article it seems that even OpenSSL agrees that some change is now needed to open source Internet security code. Not surprisingly, their answer is to give them more money, a lot more. According to a NY Times Bits article, OpenSSL has only been able to raise $2,000 per year. Nicole Perlroth, OpenSSL and Linux: A Tale of Two Open-Source Projects (NYT Bits, 4/18/14). Sorry, but that is beyond pathetic. Is a catastrophe really a good fundraising strategy? I think much more fundamental reforms are now required to protect the security of the Internet. Heartbleed has proven that.

I do not have the answers. But I do have a proposal. I call for real dialogues between security experts, and a broad range of other interested parties, to come up with ideas for serious Internet security reform, and then to act on them. This should be completed before the end of this year, 2014.

I suggest that Jim Zemlin, the executive director of the Linux Foundation, the organization behind the highly successful open source project Linux, assume at least part of the lead on this, but not take control, and not limit the agenda to open source. The NYT Bits article by Perlroth suggests that Zemlin, and other open source leaders, think that better funding for OpenSSL is all that is needed to fix the problem and reassure the public after the Heartbleed catastrophe.

I think they are wrong about this. The public does not care at all about the survival of open source. All they care about is the survival and security of the Internet. After all, their bank account and refrigerator are connected to the Internet today; tomorrow it could be their pacemaker. It is a key part of their life. They do not care if Microsoft or other companies profit from keeping it secure. They want their personal data secure from criminals. They do not want their bank accounts drained or their identity stolen. They want security. They want insurance.

I hope that Zemlin and other open source leaders get this, and will consider other, deeper reforms than better open source fundraising. The input of security experts not tied to the open source movement, including its commercial competitors, should be considered. This is not an open source problem; this is a security problem. Opponents of the open source movement should also be invited so all sides can be heard, and so should open source neutrals and outsiders, which, for the record, is my position. I am not an open source fanboy, but, on the other hand, I do use open source software, WordPress, for my blog and most websites. Others who should be invited to the conference include all shades of security experts, white-hat hackers, lawyers, consumer advocates, and others. Even the government, including the FBI and NSA. They should all be invited to dialogue and come up with serious reforms to our current Internet security infrastructure, including especially reforms of OpenSSL, and to do so by the end of this year.

Like it or not, the views of law and lawyers must be considered. Lawsuits are not the answer. But still, they will come. Proposed reforms should take legal consequences into consideration. Real people, innocent people, may already have been harmed by these security errors. It will take years to find out what damages have been caused by OpenSSL‘s major blooper. Some courts may find that those harmed are entitled to restitution.

The Internet is not a no-man’s-land of irresponsibility. It has laws and is subject to laws. I first pointed that out in my 1996 book for MacMillan, Your Cyber Rights and Responsibilities: The Law of the Internet, Chapter 3 of Que’s Special Edition Using the Internet. Persons committing crimes on the Internet must and will be prosecuted no matter where their bodies are located. The same goes for negligent actors, be they human, corporate, or robot. Responsibility for our actions must always be considered in any human endeavor, even online. Not-for-profit status is not a get-out-of-jail-free card. That is one reason why lawyers must have a seat at the table and participate in the Internet security dialogue. Law and cyber liability issues must be considered.

From my perspective as a lawyer I expect that any real reform of Internet security will include the development of new rules, likely focused on mandatory procedures to safeguard quality. The rules will try to prevent the recurrence of another major screwup like Heartbleed. For instance, if there is no bona fide crowdsourcing, say a minimum of 10 to 20 experts reviewing each line of code, not just two, then other safeguards should be required. In that event, perhaps deep-pocket corporations should be hired to audit everything. They should be made to vouch for the code, to stand behind it.

All alternatives should be considered, not just better fundraising and publicity for OpenSSL. (Frankly, I think it is too late for publicity to ever help OpenSSL.) Maybe private enterprise should take over OpenSSL, at least in part? Or maybe some kind of quasi-governmental entity should get involved in Internet security. For example, maybe it should be a part of ICANN’s duties?

Maybe private or public insurance should be required for any software like this, to spread the risk among all users. This may offend open source fanatics, but the reality is, as Heartbleed proves, that free is not necessarily a good thing when you are looking for quality. Perhaps providers should pay for at least part of all Open Source; most are, after all, profiting from it in one way or another. Although I hate to say it, since most politicians are technically clueless, perhaps new laws should also be considered? Laws that place incentives on quality, that impose both carrot and stick consequences. I would put everything on the table for discussion. More of the same is too risky.

I invite this dialogue to begin here and now. Email me or leave a comment below.  If that dialogue is already happening elsewhere, please let me know. In any event, feel free to forward this call for dialogue. I will report on it all here, no matter where and how it occurs, so long as it is real dialogue, people really listening to what each other have to say, and not just posturing and win/lose debate.

If this happens, I will report on the parts that I can understand, the aspects that are not overly technical, and the aspects that are somewhat legal in nature. If someone or some organization wants to volunteer to convene a Congress to conclude the dialogue and facilitate consensus decisions, then I will assist in publicity and report on that too. I will also be happy to attend, if at all possible. If I have anything to say on the issues, I will do that too, and not just report. But for now, aside from the few general suggestions already provided here, my message at this time is to sound an alarm on the need to take action, and to suggest that the action be preceded by dialogue. I would like to know what you think about all of this.

Lawyers as Legal-Fortune Tellers

March 30, 2014

Most lawyers predict the future as part of their everyday work. The best lawyers are very good at it. Indeed, the top lawyers I have worked with have all been great prognosticators, at least when it comes to predicting litigation outcomes. That is why concepts of predictive coding come naturally to them. Since they already do probability analysis as part of their work, it is easy for them to accept the notion that new software can extend these forward-looking skills. They are not startled by the ability of predictive analytics to discover evidence.

Although these lawyers may not know how to operate predictive coding software, nor understand the many intricacies of computer-assisted search, they will quickly understand the concepts of probability-based relevance predictions. This deep intuitive ability is found in all good transactional and litigation attorneys. Someday soon AI and data analytics, perhaps in the form of Watson as a lawyer, will significantly enhance all lawyers’ abilities. It will not only help them to find relevant evidence, but also to predict case outcomes.

Transactional Lawyers and Future Projections

A good contract lawyer is also a good prognosticator. Contract lawyers try to imagine all of the problems and opportunities that may arise from a new deal. The lawyer will help the parties foresee issues that he or she thinks are likely to arise in the future, so that the parties can address those issues in advance. The lawyers include provisions in the agreement to implement the parties’ intent. They predict events that may, or may not, ever come to pass. Even if it is a new type of deal, one that has never been done before, they try to predict what is likely to happen. I recall doing this when I helped create some of the first Internet hosting agreements in the mid-nineties. (We started off making them like shopping center agreements and used real estate analogies.)

Contract lawyers become very good at predicting the many things that might go wrong and provide specific remedies for them. Many of the contractual provisions based on possible future events are fairly routine. For instance, what happens if a party does not make a payment? Others are creative and pertain to specific conduct in the agreement. Like what happens if any party loses any shared information? What disclosure obligations are triggered? What other curative actions? Who pays for it?

Most transactional lawyers focus on the worst case scenario. They write contract provisions that try to protect their clients from major damages if bad things happen. Many become very good at that. Litigators like myself come to appreciate that soothsaying gift. When a deal goes sour, and a litigator is then brought in to try to resolve a dispute, the first thing we do is read the contract. If we find a contract provision that is right on point, our job is much easier.

Litigation Lawyers and Future Projections

In litigation the prediction of probable outcomes is a constant factor in all case analysis. Every litigator has to dabble in this kind of future prediction. The most basic prediction, of course, is will you win the case? What are the probabilities of prevailing? What will have to happen in order to win the case? How much can you win or lose? What is the probable damage range? What is the current settlement value of the case? If we prevail on this motion, how will that impact settlement value? What would be the best time for mediation? How will the judge rule on various issues? How will the opposing counsel respond to this approach? How will this witness hold up under the pressure of deposition?

All litigation necessarily involves near constant probability analysis. The best litigators in the world become very good at this kind of future projection. They can very accurately predict what is likely to happen in a case. Not only that, they can provide pretty good probability ranges for each major future event. It becomes a part of their everyday practice.

Clients rely on this analysis and come to expect their lawyers to be able to accurately predict what will happen in court. Trust develops as they see their lawyer’s predictions come true. Eventually clients become true believers in their legal oracles. They even accept it when they are told from time to time that no reasonable prediction is possible, that anything might happen. They also come to accept that there are no certainties. They get used to probability ranges, and so do the soothsaying lawyers.

Good lawyers quickly understand the limits of all predictions. A successful lawyer will never say that anything will certainly happen, well almost never. Instead the lawyer almost always speaks in terms of probabilities. For instance, they rarely say we cannot lose this motion, only that loss is highly unlikely. That way they are almost never wrong.

Insightful or Wishful

Professor Jane Goodman-Delahunty, JD, PhD, Australia.

An international team of law professors has looked into the fortune-telling side of lawyering and litigation. Goodman-Delahunty, Granhag, Hartwig, Loftus, Insightful or Wishful: Lawyers’ Ability to Predict Case Outcomes (Psychology, Public Policy, and Law, 2010, Vol. 16, No. 2, 133–157). This is the introduction to their study:

In the course of regular legal practice, judgments and meta-judgments of future goals are an important aspect of a wide range of litigation-related decisions. (English & Sales, 2005). From the moment when a client first consults a lawyer until the matter is resolved, lawyers must establish goals in a case and estimate the likelihood that they can achieve these goals. The vast majority of lawyers recognize that prospective judgments are integral features of their professional expertise. For example, a survey of Dutch criminal lawyers acknowledged that 90% made predictions of this nature in some or all of their real-life cases (Malsch, 1990). The central question addressed in the present study was the degree of accuracy in lawyers’ forecasts of case outcomes. To explore this question, we contacted a broad national sample of U.S. lawyers who predicted their chances of achieving their goals in real-life cases and provided confidence ratings in their predictions.

Assoc. Professor Maria Hartwig, PhD, Psychology & Law, Sweden

Prediction of success is of paramount importance in the system for several reasons. In the course of litigation, lawyers constantly make strategic decisions and/or advise their clients on the basis of these predictions. Attorneys make decisions about future courses of action, such as whether to take on a new client, the value of a case, whether to advise the client to enter into settlement negotiations, and whether to accept a settlement offer or proceed to trial. Thus, these professional judgments by lawyers are influential in shaping the cases and the mechanisms selected to resolve them. Clients’ choices and outcomes therefore depend on the abilities of their counsel to make reasonably accurate forecasts concerning case outcomes. For example, in civil cases, after depositions of key witnesses or at the close of discovery, the parties reassess the likelihood of success at trial in light of the impact of these events.

Professor Pär Anders Granhag, PhD, Psychology, Sweden

In summary, whether lawyers can accurately predict the outcome of a case has practical consequences in at least three areas: (a) the lawyer’s professional reputation and financial success; (b) the satisfaction of the client; and (c) the justice environment as a whole. Litigation is risky, time consuming, and expensive. The consequences of judgmental errors by lawyers can be costly for lawyers and their clients, as well as an unnecessary burden on an already overloaded justice system. Ultimately, a lawyer’s repute is based on successful calculations of case outcome. A lawyer who advises clients to pursue litigation without delivering a successful outcome will not have clients for long. Likewise, a client will be most satisfied with a lawyer who is accurate and realistic when detailing the potential outcomes of the case. At the end of the day, it is the accurate predictions of the lawyer that enable the justice system to function smoothly without the load of cases that were not appropriately vetted by the lawyers.

Elizabeth F. Loftus, Professor of Social Ecology, Law, and Cognitive Science, PhD (Stanford University), California

The law professors found that a lawyer’s prognostication ability does not necessarily come from experience. This kind of legal fortune-telling appears to be a combination of special gift, knowledge, and learned skill. It certainly requires more than just age and experience.

The law professors’ survey showed two things: (1) lawyers as a whole tend to be overconfident in their predictions of favorable outcomes, and (2) experienced lawyers do not, on average, do a better job of predicting outcomes than inexperienced lawyers. Insightful or Wishful (“Overall, lawyers were over-confident in their predictions, and calibration did not increase with years of legal experience”). The professors also found that women lawyers tend to be better at future projection than men, as do specialists compared to generalists.

Experience should make lawyers better prognosticators, but it does not. Their egos get in the way. The average lawyer does not get better at predicting case outcomes with experience because lawyers grow overconfident with experience. They remember the victories and rationalize the losses. They delude themselves into thinking that they can control things more than they can.
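The overconfidence finding in Insightful or Wishful is, at bottom, a calibration measurement: stated confidence compared against the actual win rate. A toy sketch shows the idea; the prediction data here is invented for illustration:

```python
# A minimal sketch of the calibration measure behind the study's finding:
# compare lawyers' stated confidence with how often they actually prevailed.
# The data is invented.

predictions = [  # (stated confidence of winning, actually won?)
    (0.9, True), (0.9, False), (0.8, True), (0.8, False),
    (0.7, True), (0.9, False), (0.8, True), (0.7, False),
]

avg_confidence = sum(c for c, _ in predictions) / len(predictions)
win_rate = sum(1 for _, won in predictions if won) / len(predictions)

# Positive gap = overconfidence: here lawyers said about 81% on average
# but actually won only 50% of the time.
overconfidence = avg_confidence - win_rate
print(avg_confidence, win_rate, overconfidence)  # 0.8125 0.5 0.3125
```

A well-calibrated lawyer would show a gap near zero: cases rated 80% likely to win would be won about 80% of the time.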

I have seen this happen in legal practice time and time again. Indeed, as a young lawyer I remember surprising senior attorneys I went up against. They were confident, but wrong. My son is now having the same experience. The best lawyers do not fall into the overconfidence trap with age. They encourage their team to point out issues and problems, and to challenge them on strategy and analysis. The best lawyers I know tend to err on the side of caution. They are typically glass-half-empty types. They remember the times they have been wrong.

How Lawyers Predict The Future

Accurate prediction of future events by lawyers, or anyone for that matter, requires deep understanding of process and rules, and objective analysis. Deep intuitive insight into the people involved also helps. Experience assists too, but only in providing a deep understanding of process and rules, and knowledge of relevant facts past and present. Experience alone does not necessarily assist in analysis, for the reasons discussed. Effective analysis has to be objective. It has to be uncoupled from personal perspectives and ego inflation.

The best lawyers understand all this, even if they may not be able to articulate it. That is how they are able to consistently and accurately calibrate case outcomes, including, when appropriate, probable losses. They do not take it personally. Accurate future vision requires not only knowledge, but also objectivity, humility, and freedom from ulterior motives. Since most lawyers lack these qualities, especially male lawyers, they end up simply engaging in wishful thinking.

The Insightful or Wishful study seems to have proven this point. (Note my use of the word seems, a typical weasel word that lawyers are trained to use. It is indicative of probability, as opposed to certainty, and protects me from ever being wrong. That way I can maintain my illusion of omnipotence.)

The best lawyers inspire confidence, but are not deluded by it. They are knowledgeable and guided by hard reason, coupled with deep intuition into the person or persons whose decisions they are trying to predict. That is often the judge, sometimes a jury too, if the process gets that far (less than 1% of cases go to trial). It is often opposing counsel or opposing parties, or even individual witnesses in the case.

All of these players have emotions. Unlike Watson, the human lawyers can directly pick up on these emotions. The top lawyers understand the non-verbal flows of energy, the irrational motivations. They can participate in them and influence them.

If lawyers with these skills can also maintain objective reason, then they can become the best in their field. They can become downright uncanny in their ability to both influence and forecast what is likely to happen in a lawsuit. Too bad so few lawyers are able to attain that kind of extremely high skill level. I think most are held back by an incapacity to temper their emotions with objective ratiocination. The few who can rarely also have the empathic, intuitive skills.

Watson as Lawyer Will be a Champion Fortune Teller

Is Watson coming to Legal Jeopardy?

The combination of impartial reason and intuition can be very powerful, but, as the law professor study shows, impartial reason is a rarity reserved for the top of the profession. These are the attorneys who understand both reason and emotion. They know that the reasonable man is a myth. They understand the personality frailties of being human. Scientific Proof of Law’s Overreliance On Reason: The “Reasonable Man” is Dead, Long Live the Whole Man, Parts One, Two and Three; and The Psychology of Law and Discovery.

I am speaking about the few lawyers who have human empathy, and are able to overcome their human tendencies towards overconfidence, and are able to look at things impartially, like a computer. Computers lack ego. They have no confidence, no personality, no empathy, no emotions, no intuitions. They are cold and empty, but they are perfect thinking machines. Thus they are the perfect tool to help lawyers become better prognosticators.

This is where Watson the lawyer comes in. Someday soon, say the next ten years, maybe sooner, most lawyers will have access to a Watson-type lawyer in their office. It will provide them with objective data analysis. It will provide clear rational insights into likely litigation outcomes. Then human lawyers can add their uniquely human intuitions, empathy, and emotional insights to this (again ever mindful of overconfidence).

The AI-enhanced analysis will significantly improve legal prognostications. It will level the playing field and up everyone’s game in the world of litigation. I expect it will also have the same quality improvement impact on contract and deal preparations. The use of data analytics to predict the outcome in patent cases is already enjoying remarkable success with a project called Lex Machina. The CEO of Lex Machina, Josh Becker, calls his data analytics company the moneyball of IP litigation. Tam Harbert, Supercharging Patent Lawyers With AI. Here is the Lex Machina description of services:

We mine litigation data, revealing insights never before available about judges, lawyers, parties, and patents, culled from millions of pages of IP litigation information.

Many corporations are already using Lex Machina’s analytics to help them select the litigation counsel most likely to do well in particular kinds of patent cases, and with particular courts and judges. Law firms are mining the past case data for similar reasons.
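The core idea behind this kind of litigation analytics can be sketched in a few lines: aggregate past outcomes by docket attributes such as judge or case type. This toy example is not how Lex Machina actually works; the docket data, field names, and simplistic frequency model are all invented for illustration:

```python
# A deliberately tiny sketch of litigation data mining: estimate plaintiff
# win rates grouped by a case attribute. Real systems use far richer
# features and models; this data is invented.

from collections import defaultdict

past_cases = [
    {"judge": "Smith", "type": "patent", "plaintiff_won": True},
    {"judge": "Smith", "type": "patent", "plaintiff_won": True},
    {"judge": "Smith", "type": "patent", "plaintiff_won": False},
    {"judge": "Jones", "type": "patent", "plaintiff_won": False},
    {"judge": "Jones", "type": "patent", "plaintiff_won": False},
    {"judge": "Jones", "type": "patent", "plaintiff_won": True},
]

def win_rate_by(key):
    """Plaintiff win rate grouped by a case attribute such as 'judge'."""
    wins, totals = defaultdict(int), defaultdict(int)
    for case in past_cases:
        totals[case[key]] += 1
        wins[case[key]] += case["plaintiff_won"]  # True counts as 1
    return {k: wins[k] / totals[k] for k in totals}

print(win_rate_by("judge"))  # Smith wins about 2/3, Jones about 1/3
```

Even this crude frequency count hints at why such analytics aid forum and counsel selection: the base rates differ sharply by judge.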


Here is my prediction for the future of the legal profession. In just a few more years, perhaps longer, the linear, keyword-only evidence searchers will be gone. They will be replaced by multi-modal, predictive coding based evidence searchers. In just a decade, perhaps longer (note the weasel-word qualifier), all lawyers who are not using the assistance of artificial intelligence and data analytics for general litigation analysis will be obsolete.

Lawyers in the future who overcome their arrogance and overconfidence, and accept the input and help of Watson-type robot lawyers, will surely succeed. Those who do not will surely go the way of linear, keyword-only searchers in discovery today. These dinosaurs are already being replaced by AI-enhanced searchers and AI-enhanced reviewers. I could be overconfident, but that is what I am starting to see. It appears to me to be an inevitable trend pulled along by larger forces of technological change. If you think I am missing something, please leave a comment below.

This rapid forced evolution is a good thing for the legal profession. It is good because the quality of legal practice will significantly improve as the ability of lawyers to make more accurate predictions improves. For instance, the justice system will function much more smoothly when it does not have to bear the load of cases that have not been appropriately vetted by lawyers. Fewer frivolous and marginal cases will be filed that have no chance of success, except in the deluded minds of second-rate attorneys. (Yes, that is what I really think.) These poor prognosticators will be aided by robots to finally recognize a hopeless case. That is not to say that good lawyers will avoid taking any high-risk cases. I think they should take them, and I believe they will. But the cases will be appropriately vetted with realistic risk-reward analysis. The clients will not be seduced into them with false expectations.


With data analytics, unnecessary motions and depositions will be reduced for the same reason. The parties will instead focus on the real issues, the areas where there is bona fide dispute and uncertainty. The Watson-type legal robots will help the judges as well. With data analytics and AI, more and more lawyers and judges will be able to follow Rule 1 of the Federal Rules of Civil Procedure. Then just, speedy, and inexpensive litigation will be more than a remote ideal. The AI law robots will make lawyers and judges smart enough to run the judicial system properly.

Artificial intelligence and big data analytics will enable all lawyers to become excellent outcome predictors. It will allow all lawyers to move their everyday practice from art to science, much like predictive coding has already done for legal search.

Best Practices in e-Discovery for Handling Unreviewed Client Data

March 16, 2014

Big data security, hackers, and data breaches are critical problems facing the world today, including the legal profession. That is why I have focused on development of best practices for law firms to handle large stores of client data in e-discovery. The best practice I have come up with is simple. Do not do it. Outsource.

Attorneys should only handle evidence. Law firms should not take possession of large, unprocessed, unreviewed stores of client data, the contents of which are typically unknown. They should not even touch it. They should stay out of the chain of custody. Instead, lawyers should rely on professional data hosting vendors that have special expertise and facilities designed for data security. In today’s world, rife as it is with hackers and data breaches, hosting is too dangerous and complex a business for law firms. The best practice is to delegate to security professionals the hosting of large stores of unreviewed client data.

Although it is still a best practice for knowledgeable lawyers to control the large stores of client data collected for preservation and review, they should limit actual possession of client data. Only after electronic evidence has been reviewed and identified as relevant, or probably relevant, should a law firm take possession. Before that, all large stores of unidentified client data, such as a custodian’s email box, should only be handled by the client’s IT experts and professional data hosting companies, typically e-discovery vendors. The raw data should go directly from the client to the vendor. Alternatively, the client should never let the data leave its own premises. It should host the data on site for review by outside counsel. Either way, the outside law firm should not touch it, and certainly should not host it on the law firm’s computer systems. Instead, lawyers should search and review the data by secure online connections.

This outsourcing arrangement is, in my opinion, the best practice for law firm handling of large stores of unreviewed client data. I know that many private law firms, especially their litigation support departments, will strongly disagree.

Law firms should stick to providing legal services, a position I have stated several times before. Losey, R., Five Reasons to Outsource Litigation Support (LTN, Nov. 2, 2012); WRECK-IT RALPH: Things in e-discovery that I want to destroy!; Going “All Out” for Predictive Coding and Vendor Cost Savings. Data hosting is a completely different line of work, and is very hazardous in today’s world of hacking and data breaches.

Best Practice: Full Control, But Limited Possession

Again, to be clear, law firms must have actual possession of evidence, including original client documents. Lawyers cannot do their job without that. But lawyers do not need possession of vast hoards of unidentified, irrelevant data. The best practice is for law firms to control such client data, but to do so without taking possession. Attorneys should limit possession to the evidence.

Only after the large stores of the client’s raw data have been searched, and evidence identified, should the digital evidence be transferred to the law firm for hosting and use in the pending legal matter. In other words, lawyers and law firms only need the signal, they do not need the noise. The noise – the raw data not identified as evidence or possible evidence – should be returned to the client, or destroyed. Typically this return or destruction is delayed pending the final outcome of the matter, just in case the raw data must be searched again.

I know this is a very conservative view. My law firm may well be the only AmLaw 100 firm that now has this rule. This hands-off rule as to all large stores of ESI is a radical departure from the status quo. But even if no other large law firm in the world now does this, that does not mean such outsourcing is wrong. It just means we are the first.


Remember the T.J. Hooper, the tugboat whose valuable barges sank at sea because the tugs were not equipped with radios to warn them of an approaching storm? The case involving this loss is required reading in every law school torts class in the country. The T.J. Hooper, 60 F.2d 737 (2d Cir. 1932) (L. Hand, J.).

Sometimes a whole profession can lag behind the times. There is no safety in numbers. The only safety is in following best practices that make sense in today’s environment. Although law firm hosting of large stores of client data once made sense, I no longer think it does. The volume of data and the severity of security threats in today’s environment make it too risky for me to continue to accept this as a best practice.

Current Practice of Most Law Firms

Most of the legal profession today, including most private attorneys and their law firms, collect large stores of ESI from their clients when litigation hits. This is especially true in significant cases. They do so for preservation purposes and in the hopes they may someday find relevant evidence. The law firms take delivery of the data from their clients. They hold the entire haystack, even though they only need the few needles they hope are hidden within. They insert themselves into the chain of custody. This needs to stop.

Corporate counsel often make the same mistake. The data may go from the client IT, to the client legal department, and then to the outside counsel. Three hands, at least, have by then already touched the data. Sometimes the metadata changes and sanctions motions follow.

It gets even worse from there, much worse. When the data arrives at the law firm, the firm typically keeps the data. The data is sent by the client on CDs, DVDs, thumb drives, or portable USB drives. Sometimes FTP transfer is used. It is received by the outside attorney, or their assistant, or paralegal, or law firm office manager, or tech support person. We are talking about receipt of giant haystacks of information, remember, not just a few hundred, or few thousand documents, but millions of documents and other computer files. The exact contents of these large collections are unknown. Who knows, they might contain critical trade secrets of the company. They almost certainly contain some protected information. Perhaps a lot of protected information. Regardless, all of it must be treated as confidential and protected from disclosure, except by due process in the legal proceeding.

After the law firm receives the client’s confidential data, one of three things typically happens:

1.  The law firm forwards the data to a professional data processing and hosting company and deletes all copies from its system, and, for example, does not keep a copy of the portable media on which the large stores of ESI were received. This is not a perfect best practice because the law firm is in the chain of custody, but it is far better than the next two alternatives, which are what usually happen in most firms.

2.  The law firm again forwards the data to a professional data processing and hosting company, but does not delete all copies from its system, and, for example, keeps a copy of the portable media on which the large stores of ESI were received. This is a very common practice. Many attorneys think this is a good practice because that way they have a backup copy, just in case. (The backup should be kept by the client IT as part of the collection and forwarding, not by the law firm.) I used to do this kind of thing for years, until one day I realized how it was all piling up. I realized the risk from holding thousands of PST files and other raw unprocessed client data collections. I was literally holding billions of emails in little storage devices in my office or in subdirectories of one of my office computers. Trillions more were on our firm’s litigation support computers, which brings us to the third, worst case scenario, where the data is not forwarded to a vendor.

3.  In this third alternative, which is the most common practice in law firms today, and the most dangerous, the law firm keeps the data. All it does is transfer the data from the receiving attorney (or secretary) to another department in the law firm, typically called Litigation Support. The Litigation Support Department, or whatever name the law firm may choose to call it, then holds the billions of computer files, contents unknown, on law firm computers and in storage closets, hopefully locked. Copies are placed on law firm servers, so that some attorneys and paralegals in the firm can search them for evidence. Then they often multiply by backups and downloads. They stay in the firm’s IT systems until the case is over.

At that time, in theory at least, the data is either returned to the client or destroyed. But in truth this often never happens, and raw data tends to live on and on in law firm computers, backup tapes, personal hard drives, DVDs, etc. Some people call that dark data. Most large law firms have tons of client dark data like that. It is a huge hidden liability. Dark or not, it is subject to subpoena. Law firms can be forced to search and produce from these stores of client data. I know of one firm forced to spend over a million dollars to review such data for privilege before production to the government. The client was insolvent and could not pay, but still the firm had to spend the money to protect the privileged communications.

Dangers of Data Intrusions of Law Firms

These practices are unwise and pose a serious risk to client data security, a risk that grows bigger each day. The amount of data in the world doubles every two years, so this problem is getting worse as the amount of data held for litigation grows at an exponential rate. The sophistication of data thieves is also growing. The firewall that law firms think protects their clients’ data is child’s play to some hackers. The security is an illusion. It is only a matter of time before disaster strikes and a large store of client data is stolen. The damages from even an average-sized data breach can be extensive, as the below chart shows.
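The exponential arithmetic behind that doubling rate is worth making concrete. A quick sketch, with an assumed one-terabyte starting store:

```python
# If data doubles every two years, a store of client ESI held for a
# decade grows 2**(10/2) = 32-fold. The starting size is illustrative.

terabytes_today = 1.0
years_held = 10
doubling_period_years = 2

terabytes_later = terabytes_today * 2 ** (years_held / doubling_period_years)
print(terabytes_later)  # 32.0
```

Every terabyte a firm quietly holds today is on track to become tens of terabytes of exposure if the practice continues unchanged.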


Client data is usually held by law firms on their servers so that their attorneys can search and review the data as part of e-discovery. As IT security experts know, servers are the ultimate target at the end of a lateral kill chain that advanced persistent threat (APT)-type attackers pursue. Moreover, servers are the coveted prize of bot herders seeking persistent access to high-capacity computing. Application control and comprehensive vulnerability management are essential to breaking the lateral kill chain of attackers. You do not follow all of this? Never seen a presentation titled Keeping Bot Herders Off Your Servers and Breaking the Lateral Kill Chain of Today’s Attackers? Of course not. I do not really understand this either. IT security has become a very specialized and complex field. That is one of my key points here.

Law firms are the soft underbelly of corporate data security. More and more bad hackers are realizing the vulnerability of law firms and beginning to exploit it. So many lawyers are technically naive. They do not yet see the danger of hacking, nor the severity and complexity of issues surrounding data security.

Sharon Nelson, President of the Virginia State Bar and a well-known expert in this area, has been warning about this threat to law firms for years. In 2012 her warnings were echoed by the FBI. FBI Again Warns Law Firms About the Threat From Hackers. Mary Galligan, the special agent in charge of cyber and special operations for the FBI’s New York Office, is reported by Law Technology News as saying: We have hundreds of law firms that we see increasingly being targeted by hackers. Bloomberg’s Business Week quoted Galligan as saying: As financial institutions in New York City and the world become stronger, a hacker can hit a law firm and it’s a much, much easier quarry. China-Based Hackers Target Law Firms to Get Secret Deal Data (Bloomberg 1/31/12).

If lawyers are in a big firm, their clients’ data may already have been hacked and they were never told about it. According to Sharon Nelson’s report on a survey done in 2013, 70% of large-firm lawyers do not know if their firm has ever been breached. The same survey reported that 15% of the law firms have experienced a security breach. That’s right. Fifteen percent of the law firms surveyed admitted to having discovered a computer security intrusion of some kind.

Sharon said that the survey confirmed what her company Sensei Enterprises already knew from decades of experience with lawyers and data security. She reports that most law firms never tell their attorneys when there has been a breach. Your law firm may already have been hacked multiple times. You just do not know about it. Sharon, never an attorney to mince words, went on to say in her excellent blog, Ride the Lightning:

We often hear “we have no proof that anything was done with client data” in spite of the fact that the intruders had full access to their network. Our encounters with these breaches indicate that if law firms can keep the breach quiet, they will.

They will spend the money to investigate and remediate the breach, but they will fail to notify clients under state data breach laws and they won’t tell their own lawyers for fear the data breach will become public. Is that unethical? Probably. Unlawful? Probably. But until there is a national data breach law with teeth, that approach to data breaches is unlikely to change.

Someday a breach will go public. A big data breach and loss by just one law firm could quickly make the whole profession as conservative as me when it comes to big data and confidentiality. All it would take is public disclosure of one large data breach of one large law firm, especially if the ESI lost or stolen included protected information requiring widespread remedial action. Then everyone will outsource hosting to specialists.

What if a law firm happened to have credit card information and it was stolen from the law firm? Or worse yet, what if the client data was lost when a lawyer misplaced his briefcase with a portable hard drive? This would be a nightmare for any law firm, even if it did not get publicized. Why take that risk? That is my view. I am sounding the alarm now on big data security so that the profession can change voluntarily without the motivation of crisis.

Outsource To Trusted Professionals

I have never seen a law firm with even close to the same kind of data security protocols that I have seen with the top e-discovery vendors. Law firms do not have 24/7 in-person human monitoring of all computer systems. They do not have dozens of video cameras recording all spaces where data is maintained. They do not have multiple layers of secured clean rooms, with iris scans and fingerprint scans, and other super high-tech security systems. You have seen this kind of thing in movies I’m sure, but not in your law firm.

Some vendors have systems like that. I know. I have seen them. As part of my due diligence for my firm’s selection of Kroll Ontrack, I visited their secure data rooms (well, some of them; others I was not allowed in). These were very cold, very clean, very secure rooms where the client data is stored. I am not even permitted to disclose the general location of these secure rooms. They are very paranoid about the whole thing. I like that. So do our clients. This kind of data security does not come cheap, but it is money well spent. The cheapest vendor is often a false bargain.

Have you seen your vendor’s secure rooms? Does your law firm have anything like that? How many technical experts in data security does your firm employ? Again, I am not referring to legal experts, but to computer engineers who specialize in hacker defenses. The ones who know about the latest intrusion detection systems, viruses, bot herders, and breaking the lateral kill chain of attackers. Protecting client data is a serious business and should be treated seriously.

Any data hosting company that you choose should at least have independent certifications of security and other best practices based on audits. The ones I know about are the ISO/IEC 27000 series and the SSAE 16 SOC 2 certification. Is your law firm so certified? Your preferred vendor?

The key question in choosing vendors is: do you know where your client’s data is? In the clouds somewhere within your vendor’s control is not an acceptable answer, at least not for anyone who takes data security seriously. As a best practice you should know, and you should have multiple assurances, including third-party certifications and input from security experts. In large matters, or when selecting a preferred vendor, you should also make a personal inspection, and you should verify adequate insurance coverage. You want to see cyber liability insurance. Remember, even the NSA screws up from time to time. Are you covered if this happens?

Client data security should be job number one for all e-discovery lawyers. I know it is for me, which is why I take this conservative hands-off position.

Most Law Firms Do a Poor Job of Protecting Client Data

From what I have seen, very few law firms have highly secure client data hosting sites. Most do not even have reliable, detailed data accounting for checking client data in and out. The few that do rarely enforce it. They rarely (never?) audit attorneys and inspect their offices and equipment to verify that they do not have copies of client data on their hard drives and DVDs, etc. In most law firms a person posing as a janitor could rummage through any office undisturbed, and probably even gain access to confidential computers. Have you ever seen all the sticky notes with passwords around the monitors of many (most?) attorneys?

Attorneys and law firms can and should be trusted to handle evidence, even when that may sometimes include hundreds of thousands of electronic and paper files. But they should not be over-burdened with the duty to also host large volumes of raw unprocessed data. Most are simply not up to the task. That is not their skill set. It is not part of the legal profession. It is not a legal service at all. Secure data hosting is a highly specialized computer engineering service, one that requires an enormous capital investment and constant diligence to do correctly. I do not think law firms have made that kind of investment, nor do I think they should. Again, it is beyond our core competence. We provide legal services, not data hosting.

Even data hosting by the best professionals is not without its risks. Just ask the NSA about the risks of rogue employees like Snowden. Are law firms equipped to mitigate these risks? Are they even adequately insured to deal with losses if client data is lost or stolen? I doubt it, and yet only a few more sophisticated clients even think to ask.

Is your law firm ready? Why even put yourself in that kind of risky position? Do you really make that much money in e-discovery charges to your clients? Is that profit worth the risk?

Ethical Considerations

This issue also has ethical implications. We are talking about protecting the confidentiality of client data. When it comes to issues like this I think the best practice is to take a very conservative view. The governing ethical rule for lawyers is Rule 1.6 of the ABA Model Rules of Professional Conduct. Subsection (c) of this rule applies here:

(c)  A lawyer shall make reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client.

Again we are faced with the difference between reasonable efforts and best practices. The ABA and most lawyers agree that Rule 1.6 allows a law firm to take possession of the raw, unreviewed client data, no matter what the size, so long as certain minimum “reasonable efforts” are made to safeguard the data. I do not disagree with this. I am certainly not attempting to create a new, higher standard for professional malpractice. It is not negligent for a law firm to possess large stores of unreviewed client data, although it could be, if rudimentary safeguards were not in place. My position is that it is no longer a best practice to do so. The best practice is now to outsource to reliable professionals who specialize in this sort of thing.


Law firms are in the business of providing legal services, not data hosting. They need to handle and use evidence, not raw data. Lawyers and law firms are not equipped to maintain and inventory terabytes of unknown client data. Some firms have petabytes of client data and seem to be very pleased with themselves about it. They brag about it. They seem oblivious to the risks. Or, at the very least, they are overconfident. That’s something that bad hackers look for. Take a conservative view like I do and outsource this complex task. That is the best practice in e-discovery for handling large stores of unreviewed client data.

I sleep well at night knowing that if Anonymous or some other hacker group attacks my firm and penetrates our high-security defenses, as hackers often do even against the best-defended military security systems, they will not get a treasure trove of client data.

This does not mean law firms should be lax in handling their own data and communications. They must be hyper-vigilant in this too. Security and hacker defense is everyone’s concern. Law firms should focus on defense of their own information. Firms should not compound their problems by vastly increasing the size and value of their targets. Law firms are the soft underbelly of corporate data security because of the corporate client information most of them hold.

Although some hackers may be hired by litigants for purposes of illegal discovery of privileged communications and work product, most are not. They are after money and valuable trade secrets. The corporate stashes are the real target. If these potential treasure troves of data must leave a corporation’s possession, be sure they are in the hands of professional big data security experts. Do not set yourself up to be the next hacker victim.

IT-Lex Discovers a Previously Unknown Predictive Coding Case: “FHFA v. JP Morgan, et al”

March 2, 2014

The researchers at IT-Lex have uncovered a previously unknown predictive coding case out of the SDNY, Federal Housing Finance Agency v. JP Morgan Chase & Co., Inc. et al. The et al here includes just about every other major bank in the world, each represented by one of the top 25 mega law firms in the country. The interesting orders approving predictive coding were entered in 2012, yet, until now, no one has ever talked about FHFA v. JP Morgan. That is amazing considering the many players involved.

The two main orders in the case pertaining to predictive coding are here (order dated July 24, 2012), and here (order dated July 31, 2012). I have highlighted the main passages in these long transcripts. These are ore tenus orders, but orders nonetheless. The Pacer file is huge, so IT-Lex may have missed others, but we doubt it. The two key memoranda underlying the orders are by the defendant JP Morgan’s attorneys, Sullivan & Cromwell, dated July 20, 2012, and by the plaintiff FHFA’s lawyers, Quinn Emanuel Urquhart & Sullivan, dated July 23, 2012.

The fact these are ore tenus rulings on predictive coding explains how they have remained under the radar for so long. The orders show the mastery, finesse, and wisdom of the presiding District Court Judge Denise Cote. She was hearing her first predictive coding issue and handled it beautifully. Unfortunately, judging from the transcripts, the trial lawyers arguing pro and con did not hold up as well. Still, they appear to have been supported by good e-discovery lawyer experts behind the scenes. It all seems to have turned out relatively well in the end, as a recent Order dated February 14, 2014 suggests. Predictive coding was approved and court-ordered cooperation resulted in a predictive coding project that appears to have gone pretty well.

Defense Wanted To Use Predictive Coding

The case starts with the defense, primarily JP Morgan, wanting to use predictive coding and the plaintiff, FHFA, objecting. The FHFA wanted the defendant banks to review everything. Good old tried and true linear review. The plaintiff also had fallback objections to the way the defense proposed to conduct the predictive coding.

The letter memorandum by Sullivan & Cromwell for JP Morgan is only three pages in length, but has 63 pages of exhibits attached. The letter relies heavily on the then new Da Silva Moore opinion by Judge Peck. The exhibits include the now famous 2011 Grossman and Cormack law review article on TAR, a letter from plaintiff’s counsel objecting to predictive coding, and a proposed stipulation and order. Here are key segments of Sullivan and Cromwell’s arguments:

According to Plaintiff, it will not agree to JPMC’s use of any Predictive Coding unless JPMC agrees to manually review each and every one of the millions of documents that JPMC anticipates collecting. As Plaintiff stated: “FHFA’s position is straightforward. In reviewing the documents identified by the agreed-upon search terms, the JPM Defendants should not deem a document nonresponsive unless that document has been reviewed by an attorney.”

Plaintiff’s stated position, and its focus on “non-responsive” documents, necessitates this request for prompt judicial guidance. Predictive Coding has been recognized widely as a useful, efficient and reliable tool precisely because it can help determine whether there is some subset of documents that need not be manually reviewed, without sacrificing the benefit, if any, gained from manual review. Predictive Coding can also aid in the prioritization of documents that are most likely to be responsive. As a leading judicial opinion as well as commentators have warned, the assumption that manual review of every document is superior to Predictive Coding is “a myth” because “statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review.” Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350, at *28 (S.D.N.Y. Feb. 24, 2012) (Peck, Mag. J.) …

JPMC respectfully submits that this is an ideal case for Predictive Coding or “machine learning” to be deployed in aid of a massive, expedited document production. Plaintiff’s claims in this case against JPMC concern more than 100 distinct securitizations, issued over a several year period by three institutions that were entirely separate until the end of that period, in 2008 (i.e., JPMorgan Chase, Bear Stearns & Co., and Washington Mutual). JPMC conservatively predicts that it will have to review over 2.5 million documents collected from over 100 individual custodians. Plaintiff has called upon JPMC to add large numbers of custodians, expand date ranges, and otherwise augment this population, which could only expand the time and expense required. Computer assisted review has been approved for use on comparable volumes of material. See, e.g., Da Silva Moore, 2012 U.S. Dist. LEXIS 23350, at *40 (noting that the manual review of 3 million emails is “simply too expensive.”).

Plaintiff’s Objections


The plaintiff federal government agency, FHFA, filed its own three-page response letter with 11 pages of exhibits. The response objects to the use of predictive coding and to the defendant’s proposed methodology. Here is the core of their argument:

First, JPMC’s proposal is the worst of both worlds, in that the set of documents to which predictive coding is to be applied is already narrowed through the use of search terms designed to collect relevant documents, and predictive coding would further narrow that set of documents without attorney review, thereby eliminating potentially responsive documents. …

Finally, because training a predictive coding program takes a considerable amount of time, the truncated timeframe for production of documents actually renders these Actions far from “ideal” for the use of predictive coding.

The first objection on keyword search screening is good, but the second, that training would take too long, shows that the FHFA needed better experts. The machine learning training time is usually far less than the document review time, especially in a case like this, and the overall time savings from using predictive coding are dramatic. So the second objection was a real dog.

Still, FHFA made one more well-placed objection to the method, namely that there had been virtually no disclosure as to how Sullivan and Cromwell intended to conduct the process. (My guess is they had not really worked that all out yet. This was all new then, remember.)

[I]t has similarly failed to provide this Court with any details explaining (i) how it intends to use predictive coding, (ii) the methodology or computer program that will be used to determine responsiveness, or (iii) any safeguards that will ensure that responsive documents are not excluded by the computer model. Without such details, neither FHFA nor this Court can meaningfully assess JPMC’s proposal. See Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350, at *23 (S.D.N.Y. Feb. 24, 2012) (“[Defendant’s] transparency in its proposed ESI search protocol made it easier for the Court to approve the use of predictive coding.”). JPMC’s proposed order sets forth an amorphous proposal that lacks any details. In the absence of such information, this Court’s authorization of JPMC’s use of predictive coding would effectively give JPMC carte blanche to implement predictive coding as it sees fit.

Hearing of July 24, 2012

Judge Denise Cote came into the hearing having read the briefs and Judge Peck’s then recent landmark ruling in Da Silva Moore. It was obvious from her initial comments that her mind was made up that predictive coding should be used. She understood that this mega-size case needed predictive coding to meet the time deadlines and not waste a fortune on e-document review. Here are Judge Cote’s words at pages 8-9 of the transcript:

It seems to me that predictive coding should be given careful consideration in a case like this, and I am absolutely happy to endorse the use of predictive coding and to require that it be used as part of the discovery tools available to the parties. But it seems to me that the reliability and utility of predictive coding depends upon the process that takes place in the initial phases in which there is a pool of materials identified to run tests against, and I think that some of the documents refer to this as the seed — S-E-E-D — set of documents, and then there are various rounds of further testing to make sure that the code becomes smart with respect to the issues in this case and is sufficiently focused on what needs to be defined as a responsive document. And for this entire process to work, I think it needs transparency and cooperation of counsel.

I think ultimately the use of predictive coding is a benefit to both the plaintiff and the defendants in this case. I think there’s every reason to believe that, if it’s done correctly, it may be more reliable — not just as reliable but more reliable than manual review, and certainly more cost effective — cost effective for the plaintiff and the defendants.

To plaintiff’s counsel’s credit, she quickly shifted her arguments from whether to how. Defense counsel also fell all over herself describing how cooperative she had been and would continue to be, all the while implying that the other side was a closet non-cooperator.

As it turns out, very little actual conversation had occurred between the two lead counsel before the hearing, as both had preferred snarly emails and paper letters. At the hearing Judge Cote ordered the attorneys to talk first, rather than shoot off more letters, and to call her if they could not agree.

I strongly suggest you read the whole transcript of the first hearing to see the effect a strong judge can have on trial lawyers. Page 24 is especially instructive as to just how active a bench can be. For the second hearing of July 31, 2012, I suggest you read the transcript at pages 110-111 to get an idea as to just how difficult those attorney meetings proved to be.

As a person obsessed with predictive coding I find the transcripts of the two hearings to be kind of funny in a perverse sort of way. The best way for me to share my insights is by using the format of a lawyer joke.

Two Lawyers Walked Into A Bar

One e-discovery lawyer walks into a Bar and nothing much happens. Two e-discovery lawyers walk into a Bar and an interesting discussion ensues about predictive coding. One trial lawyer walks into a Bar and the volume of the whole place increases. Two trial lawyers walk into a Bar and an argument starts.

The 37 lawyers who filed appearances in the FHFA case walk into a Bar and all hell breaks loose. There are arguments everywhere. Memos are written, motions are filed, and the big bank clients are billed a million or more just talking about predictive coding.

Then United States District Court Judge Denise Cote walks into the Bar. All the trial lawyers immediately shut up, stand up, and start acting real agreeable, nice, and polite. Judge Cote says she has read all of the letters and they should all talk less, and listen more to the two e-discovery specialists still sitting in the bar bemused. Everything becomes a cooperative love-fest thereafter, at least as far as predictive coding and Judge Cote are concerned. The trial lawyers move on to fight and bill about other issues more within their ken.

Substantive Disputes in FHFA v. JP Morgan

The biggest substantive issues in the first hearing of July 24, 2012 had to do with disclosure and keyword filtering before machine training. Judge Cote was prepared on the disclosure issue from having read the Da Silva Moore protocol, and so were the lawyers. The judge easily pressured defense counsel to disclose both relevant and irrelevant training documents to plaintiff’s counsel, with the exception of privileged documents.

As to the second issue of keyword filtering, the defense lawyers had been told by the experts behind the scenes that JP Morgan should be allowed to keyword filter the custodians ESI before running predictive coding. Judge Peck had not addressed that issue in Da Silva Moore, since the defense had not asked for that, so Judge Cote was not prepared to rule on that then new and esoteric issue. The trial lawyers were not able to articulate much on the issue either.

Judge Cote asked trial counsel if they had previously discussed this issue, not just traded memos, and they admitted they had not. So she ordered them to talk about it. It is amazing how much easier it is to cooperate and reach agreement when you actually speak, and have experts with you guiding the process. As it turns out from the second order of July 31, 2012, they reached agreement: there would be no keyword filtering.

Although we do not know all of the issues discussed by the attorneys, we do know they managed to reach agreement, and we know from the first hearing what a few of the issues were. They were outlined by plaintiff’s counsel, who complained that they had no idea how defense counsel was going to handle the following issues (at page 19 of the first hearing transcript):

What is the methodology for creating the seed set? How will that seed set be pulled together? What will be the number of documents in the seed set? Who will conduct the review of the seed set documents? Will it be senior attorneys or will it be junior attorneys? Whether the relevant determination is a binary determination, a yes or no for relevance, or if there’s a relevance score or scale in terms of 1 to 100. And the number of rounds, as your Honor noted, in terms of determining whether the system is well trained and stable.
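Those questions track the mechanics of a typical machine-learning review. Purely as an illustration, and emphatically not the protocol the parties actually used in FHFA v. JP Morgan, here is a toy sketch in Python of a seed set, relevance scoring on a 1-to-100-style scale, and iterative training rounds. All terms, documents, and numbers are fabricated.

```python
import random

# Fabricated corpus: each "document" is a set of terms. In a real matter
# these would be millions of custodian emails; everything here is made up.
random.seed(42)
RELEVANT_TERMS = {"securitization", "loan", "underwriting"}
NOISE_TERMS = {"lunch", "golf", "schedule", "invoice", "memo"}

def make_doc(relevant):
    terms = set(random.sample(sorted(NOISE_TERMS), 3))
    if relevant:
        terms |= set(random.sample(sorted(RELEVANT_TERMS), 2))
    return terms

corpus = [make_doc(i % 3 == 0) for i in range(60)]

def train(examples):
    """Naive per-term weights learned from attorney-labeled examples."""
    weights = {}
    for doc, label in examples:
        for t in doc:
            weights[t] = weights.get(t, 0.0) + (1.0 if label else -1.0)
    return weights

def score(doc, weights):
    """Relevance on a 0-100 scale (the 'binary or scale' question above)."""
    hits = sum(weights.get(t, 0.0) for t in doc)
    return max(0.0, min(100.0, 50.0 + 10.0 * hits))

# Round 0: attorneys review and label a small seed set.
labeled = [(i, bool(corpus[i] & RELEVANT_TERMS)) for i in range(10)]

# Further rounds: train, rank the unreviewed pool, and send the
# top-ranked documents back to the attorneys for labeling.
for rnd in range(3):
    weights = train((corpus[i], lab) for i, lab in labeled)
    reviewed = {i for i, _ in labeled}
    ranked = sorted((i for i in range(len(corpus)) if i not in reviewed),
                    key=lambda i: score(corpus[i], weights), reverse=True)
    labeled += [(i, bool(corpus[i] & RELEVANT_TERMS)) for i in ranked[:10]]

# Final pass: classify the whole corpus and measure recall against
# the (here, known) ground truth.
weights = train((corpus[i], lab) for i, lab in labeled)
predicted = [i for i in range(len(corpus)) if score(corpus[i], weights) > 50]
truth = [i for i in range(len(corpus)) if corpus[i] & RELEVANT_TERMS]
recall = len(set(predicted) & set(truth)) / len(truth)
print(f"reviewed {len(labeled)} of {len(corpus)} docs, recall {recall:.2f}")
```

The point of the sketch is only that each of plaintiff’s counsel’s questions — who labels the seed set, binary versus scaled relevance, how many rounds, when the system is “stable” — corresponds to a concrete design decision in the loop above.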

So it seems likely all these issues and more were later discussed and accommodations reached.  At the second hearing of July 31, 2012, we get a pretty good idea as to how difficult the attorneys meetings must have been. At pages 110-111 of the second hearing transcript we see how counsel for JP Morgan depicted these meetings and the quality of input received from plaintiff’s counsel and experts:

We meet every day with the plaintiff to have a status report, get input, and do the best we can to integrate that input. It isn’t always easy, not just to carry out those functions but to work with the plaintiff.

The suggestions we have had so far have been unworkable and by and large would have swamped the project from the outset and each day that a new suggestion gets made. But we do our best to explain that and keep moving forward.

Defense counsel then goes into what most lawyers would call “suck-up” mode to the judge and says:

We very much appreciate that your Honor has offered to make herself available, and we would not be surprised if we need to come to you with a dispute that hasn’t been resolved by moving forward or that seems sufficiently serious to put the project at risk. But that has not happened yet and we hope it will not.

After that, plaintiff’s counsel complains that defense counsel has not agreed to allow deposition transcripts and witness statements to be used as training documents. That’s right. The plaintiff wanted to include congressional testimony, depositions, and other witness statements that they found favorable to their position as part of the training documents used to find relevant documents in the custodians’ information.

Judge Cote was not about to be tricked into making a ruling on the spot, but instead wisely told them to go back, talk some more, and get real expert input on the advisability of this approach. She is a very quick study, as the following exchange with defense counsel at page 114 of the transcript, after hearing the argument of plaintiff’s counsel, illustrates:

THE COURT: Good. We will put those over for another day. I’m learning about predictive coding as we go. But a layperson’s expectation, which may be very wrong, would be that you should train your algorithm from the kinds of relevant documents that you might actually uncover in a search. Maybe that’s wrong and you will all educate me at some other time. I expect, Ms. Shane, if a deposition was just shot out of this e-discovery search, you would produce it. Am I right?

MS. SHANE: Absolutely, your Honor. But your instinct that what they are trying to train the system with are the kinds of documents that would be found within the custodian files as opposed to a batch of alien documents that will only confuse the computer is exactly right.

It is indeed a very interesting issue, but we cannot see a report in the case on Pacer that shows how the issue was resolved. I suspect the transcripts were all excluded, unless they were within a custodian’s account.

2014 Valentine’s Day Hearing

The only other order we found in the case mentioning predictive coding is here (dated February 14, 2014). Most of the Valentine’s Day transcript pertains to trial lawyers arguing about perjury, and complaining that some key documents were missed in the predictive coding production by JP Morgan. But the fault appears to be due to the failure to include a particular custodian in the search, an easy mistake to have happen. That has nothing to do with the success or failure of the predictive coding.

Judge Cote handled that well, stating that no review is “perfect” and she was not about to have a redo at this late date. Her explanation at pages 5-6 of the February 14, 2014 transcript provides a good wrap up for FHFA v. JP Morgan:

Parties in litigation are required to be diligent and to act in good faith in producing documents in discovery. The production of documents in litigation such as this is a herculean undertaking, requiring an army of personnel and the production of an extraordinary volume of documents. Clients pay counsel vast sums of money in the course of this undertaking, both to produce documents and to review documents received from others. Despite the commitment of these resources, no one could or should expect perfection from this process. All that can be legitimately expected is a good faith, diligent commitment to produce all responsive documents uncovered when following the protocols to which the parties have agreed, or which a court has ordered.

Indeed, at the earliest stages of this discovery process, JP Morgan Chase was permitted, over the objection of FHFA, to produce its documents through the use of predictive coding. The literature that the Court reviewed at that time indicated that predictive coding had a better track record in the production of responsive documents than human review, but that both processes fell well short of identifying for production all of the documents the parties in litigation might wish to see.


There are many unpublished decisions out there approving and discussing predictive coding. I know of several more. Many of them, especially the ones that first came out and pretty much blindly followed our work in Da Silva Moore, call for complete transparency, including disclosure of irrelevant documents used in training. That is what happened in FHFA v. JP Morgan and the world did not come to an end. Indeed, the process seemed to go pretty well, even with a plaintiff’s counsel who, in the words of Sullivan and Cromwell, made suggestions every day that were unworkable and by and large would have swamped the project … but we do our best to explain that and keep moving forward. Pages 110-111 of the second hearing transcript. So it seems cooperation can happen, even when one side is clueless, and even if full disclosure has been ordered.

Since the days of 2011 and 2012, when our Da Silva Moore protocol was developed, we have had much more experience with predictive coding. We have more information on how the training actually functions with a variety of chaotic email datasets, including the new Oracle ESI collection, and even more testing with the Enron dataset.

Based on what we know now, I do not think it is necessary to make disclosure of all irrelevant documents used in training. The only documents that have a significant impact on machine learning are the borderline, grey area documents. These are the ones whose relevancy call is close, and often a matter of opinion, depending on how you view the case. Only these grey area irrelevant documents need to be disclosed to protect the integrity of the process.


The science and other data behind that have to do with Jaccard Index classification inconsistencies, as well as the importance of mid-range ranked documents to most predictive coding algorithmic analysis. See, e.g., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part Three, at the subheadings Disclosure of Irrelevant Training Documents and Conclusions Regarding Inconsistent Reviews. When you limit disclosure to grey area training documents, and relevant documents, the process can become even more efficient without any compromise in quality or integrity. This of course assumes honest evaluations of grey area documents and forthright communications between counsel. But then so does all discovery in our system of justice. So this is really nothing new, nor out of the ordinary.
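To make those two ideas concrete, here is a minimal sketch of the Jaccard index as a measure of agreement between two reviewers’ relevance calls, and of selecting only the grey area, borderline-scored documents for disclosure. The document numbers and scores are hypothetical, not data from any real matter.

```python
# Two attorneys' relevance calls on the same ten documents (hypothetical ids).
reviewer_a = {1, 2, 3, 5, 8}   # doc ids attorney A marked relevant
reviewer_b = {2, 3, 5, 8, 9}   # doc ids attorney B marked relevant

def jaccard(x, y):
    """Jaccard index: |intersection| / |union|; 1.0 means perfect agreement."""
    return len(x & y) / len(x | y) if (x | y) else 1.0

agreement = jaccard(reviewer_a, reviewer_b)
print(f"Jaccard agreement: {agreement:.2f}")  # 4 shared / 6 total = 0.67

# Grey area documents: those whose machine-predicted relevance probability
# falls near the 0.5 borderline. Under the approach described above, only
# these would need to be disclosed, not every irrelevant training document.
predictions = {10: 0.95, 11: 0.08, 12: 0.55, 13: 0.48, 14: 0.30, 15: 0.52}
grey_area = sorted(d for d, p in predictions.items() if 0.4 <= p <= 0.6)
print("grey-area docs to disclose:", grey_area)  # [12, 13, 15]
```

The 0.4–0.6 band is an arbitrary illustration; where to draw the borderline is exactly the kind of judgment call that requires honest evaluation and forthright communication between counsel.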

All discovery depends on the integrity and trustworthiness of the attorneys for the parties. Fortunately, almost all attorneys honorably fulfill these duties, except perhaps for the duty of technology competence. That is the greatest ethical challenge of the day for all litigators.

