Is the IRS’s Inability to Find Emails the Result of Unethical Behavior? New Opinion by U.S. Tax Court Provides Some Clues – Part 2

October 5, 2014

This is Part Two of the essay where I go into the specifics of the holding in Dynamo. Please read Part One first: Is the IRS’s Inability to Find Emails the Result of Unethical Behavior? New Opinion by U.S. Tax Court Provides Some Clues – Part One. There I pointed out that the IRS attitude towards email discovery, particularly predictive coding, shows that they belong to the unethical category I call The Clueless. Yes, the IRS is clueless, not in an affable Pink Panther Inspector Clouseau way, but in the arrogant, super-know-it-all way of egomaniac types. It is wonderfully personified in Ms. Lerner’s face during her Congressional non-testimony. Like Congress did to Lerner, the Tax Court in Dynamo properly cut down the IRS attorneys and rejected all of the IRS’s anti-predictive-coding nonsense arguments.

Dynamo Holdings Opinion

Dynamo Holdings, Ltd. vs. Commissioner, 143 T.C. No. 9 (Sept. 17, 2014) is a very well written opinion of the United States Tax Court by Judge Ronald L. Buch. I highly recommend that you study and cite this opinion. It is so good that I have decided to devote the rest of this blog to quoting the portions of it that pertain to predictive coding.

I cannot refrain from providing some comments too, of course; otherwise what would be the point of doing more than provide a link? But for the sake of clarity, and purity, although I will intermix my [side bar comments] along with the quotes, I will do so with blue font and italics, so you will not mistake the court’s words for my own. Yes, I know, that is not how you do things in law review articles, and that this is way too creative. So what? It will be a lot more interesting for you to read it that way, and quicker too. So damn the old rules of legal writing; here goes.

[P]etitioners request that the Court let them use predictive coding, a technique prevalent in the technological industry but not yet formally sanctioned by this Court, to efficiently and economically identify the nonprivileged information responsive to respondent’s discovery request. [The Petitioners are the defendants, and the Respondent is the plaintiff, the IRS. The IRS sued to collect tax on certain transfers between business entities, alleging they were disguised gifts to the owners of Dynamo. Seems like a pretty clear cut issue to me, and I cannot see why it was necessary to look at millions of emails to find out what happened. The opinion does not explain that. The merits of the case are not addressed and a detailed proportionality analysis is not provided.]

Respondent [IRS] opposes petitioners’ request to use predictive coding because, he states, predictive coding is an “unproven technology”. Respondent adds that petitioners need not devote their claimed time or expense to this matter because they can simply give him access to all data on the two tapes and preserve the right (through a “clawback agreement”) to later claim that some or all of the data is privileged information not subject to discovery.2 [This is the disingenuous part I referred to previously.]

FN 2 – We understand respondent’s use of the term “clawback agreement” to mean that the disclosure of any privileged information on the tapes would not be a waiver of any privilege that would otherwise apply to that information.

The Court held an evidentiary hearing on respondent’s motion. [It looks like the Tax Court followed Judge David Waxse on this often debated issue as to whether an evidentiary hearing should be provided, but only went part way. As you will see, a full scale Daubert type hearing was not provided. Instead, Judge Buch treated their testimony as informal input. Most judges agree that this is appropriate, even if they do not agree with Judge Waxse's position that Daubert type rulings are appropriate in a mere discovery dispute. Most judges I have talked to think that Evidence Rule 702 does not apply, since there is no evidence or trial, and no presentation to the jury to protect; there is just a dispute as to discovery search methods.]

[W]e hold that petitioners must respond to respondent’s discovery request but that they may use predictive coding in doing so. [The defendants had argued they should not have to search two backup tapes for email at all, and the use of predictive coding was a fall back argument. The decision did not provide any detailed explanation as to necessity, and I get the impression that it was not really pushed, that the main focus of the briefs was on predictive coding.]

Petitioners ask the Court to let them use predictive coding to efficiently and economically help identify the nonprivileged information that is responsive to respondent’s discovery request. More specifically, petitioners want to implement the following procedure to respond to the request: [I have omitted the first four reasons as not terribly interesting.] … 5. Through the implementation of predictive coding, review the remaining data using search criteria that the parties agree upon to ascertain, on the one hand, information that is relevant to the matter, and on the other hand, potentially relevant information that should be withheld as privileged or confidential information.

[T]he Court is not normally in the business of dictating to parties the process that they should use when responding to discovery. [This is a very important point. See Sedona Principle Six. The defendants did not really need the plaintiff's approval to use predictive coding. Judge Buch is suggesting that this whole permission motion is an unnecessary waste of time, but he will indulge them anyway and address it. I for one am glad that he did.] If our focus were on paper discovery, we would not (for example) be dictating to a party the manner in which it should review documents for responsiveness or privilege, such as whether that review should be done by a paralegal, a junior attorney, or a senior attorney. Yet that is, in essence, what the parties are asking the Court to consider–whether document review should be done by humans or with the assistance of computers. [These are all very good points.] Respondent fears an incomplete response to his discovery. [Parties in litigation always fear that. The U.S. employs a "trust based" system of discovery that relies on the honesty of the parties, and especially relies on the honesty and cooperativeness of the attorneys who conduct the discovery. There are alternatives, like having judges control discovery. Most of the world has such judge controlled discovery, but lawyers in the U.S. do not want that, and it is doubtful that taxpayers would want to fund an alternative court based approach.] If respondent believes that the ultimate discovery response is incomplete and can support that belief, he can file another motion to compel at that time. Nonetheless, because we have not previously addressed the issue of computer-assisted review tools, we will address it here.

Each party called a witness to testify at the evidentiary hearing as an expert. Petitioners’ witness was James R. Scarazzo. Respondent’s witness was Michael L. Wudke. [I added these links. Scarazzo is with the well known vendor, FTI, in Washington D.C., and Wudke is with another vendor in N.Y., Transperfect Legal Solutions. He used to be with Deloitte.] The Court recognized the witnesses as experts on the subject matter at hand. We may accept or reject the findings and conclusions of the experts, according to our own judgment.

Predictive coding is an expedited and efficient form of computer-assisted review that allows parties in litigation to avoid the time and costs associated with the traditional, manual review of large volumes of documents. Through the coding of a relatively small sample of documents, computers can predict the relevance of documents to a discovery request and then identify which documents are and are not responsive. The parties (typically through their counsel or experts) select a sample of documents from the universe of those documents to be searched by using search criteria that may, for example, consist of keywords, dates, custodians, and document types, and the selected documents become the primary data used to cause the predictive coding software to recognize patterns of relevance in the universe of documents under review. The software distinguishes what is relevant, and each iteration produces a smaller relevant subset and a larger set of irrelevant documents that can be used to verify the integrity of the results. [That is not technically correct, at least not in most cases. The relevant subset does not get smaller and smaller. The probability predictions do, however, get more accurate. True predictive coding as used by most vendors today is active machine learning. It ranks all documents by their probability of relevance. See, e.g., AI-EnhancedReview.com.] Through the use of predictive coding, a party responding to discovery is left with a smaller set of documents to review for privileged information, resulting in a savings both in time and in expense. [Now the judge is back on track and this is an essential truth.] The party responding to the discovery request also is able to give the other party a log detailing the records that were withheld and the reasons they were withheld. [Judge Buch is referring to the privilege log, or in some cases, also a confidentiality log.]
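For readers who want to see the mechanics behind the court’s description, below is a minimal sketch, in Python with scikit-learn, of the iterative train-rank-review loop that active machine learning uses. The tiny corpus, the seed coding, and the simulated reviewer decisions are all invented for illustration only; the commercial tools at issue in cases like this use their own ranking engines, but the basic workflow is similar.

# A minimal active learning sketch: a reviewer codes a small seed set, the
# software learns from it, ranks every document by probability of relevance,
# and asks for more human decisions where it is least certain.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

corpus = [
    "transfer of funds between the partnership entities",
    "disguised gift to the owners of the company",
    "lunch menu for the office holiday party",
    "wire transfer memo regarding an intercompany loan",
    "fantasy football league standings and scores",
    "year-end gift tax analysis for the principals",
]
coded = {0: 1, 2: 0}  # reviewer's seed coding: 1 = relevant, 0 = not relevant

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)   # turn the text into features
model = LogisticRegression()

for training_round in range(3):
    ids = list(coded)
    model.fit(X[ids], [coded[i] for i in ids])     # learn from all coding so far
    probabilities = model.predict_proba(X)[:, 1]   # relevance probability for every document
    uncoded = [i for i in range(len(corpus)) if i not in coded]
    if not uncoded:
        break
    # Uncertainty sampling: ask the reviewer about the document the model is
    # least sure about, then fold that decision back into the training set.
    pick = min(uncoded, key=lambda i: abs(probabilities[i] - 0.5))
    # A simple keyword rule stands in here for the human reviewer's judgment.
    coded[pick] = 1 if ("gift" in corpus[pick] or "transfer" in corpus[pick]) else 0

print({i: round(float(p), 2) for i, p in enumerate(probabilities)})

Run it and you will see every document receive a probability ranking, which is the point my comment above makes: the software does not shrink a relevance pile so much as it continuously re-ranks the whole collection.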

Magistrate Judge Andrew Peck published a leading, oft-cited article on predictive coding which is helpful to our understanding of that method. [Of course Judge Peck's photograph is not in the opinion.] See Andrew Peck, “Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-Assisted Coding?”, L. Tech. News (Oct. 2011). The article generally discusses the mechanics of predictive coding and the shortcomings of manual review and of keyword searches. The article explains that predictive coding is a form of “computer-assisted coding”, which in turn means “tools * * * that use sophisticated algorithms to enable the computer to determine relevance, based on interaction with (i.e., training by) a human reviewer.” Id. at 29. The article explains that:

Unlike manual review, where the review is done by the most junior staff, computer-assisted coding involves a senior partner (or team) who review and code a “seed set” of documents. [Judge Peck wrote this back in 2011. I believe his understanding of the “senior partner” level of skill needed for training has since evolved. I can elaborate, but it would take us too far astray. Let’s just say what is needed is a single expert, or at least a very small team of real experts, on the relevance facts at issue in the case. See, e.g., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part One, Part Two, Part Three.] The computer identifies properties of those documents that it uses to code other documents. As the senior reviewer continues to code more sample documents, the computer predicts the reviewer’s coding. (Or, the computer codes some documents and asks the senior reviewer for feedback.)

When the system’s predictions and the reviewer’s coding sufficiently coincide, the system has learned enough to make confident predictions for the remaining documents. Typically, the senior lawyer (or team) needs to review only a few thousand documents to train the computer. [The number depends, of course. For some projects, tens of thousands of documents may be needed over multiple iterations to adequately train the computer. Some projects are much harder than others, despite the skills of the search designers involved. Yes, it takes a great deal of skill and experience to properly design a large predictive coding search and review project. It also takes good predictive coding software that ranks all document probabilities.]

Some systems produce a simple yes/no as to relevance, while others give a relevance score (say, on a 0 to 100 basis) that counsel can use to prioritize review. For example, a score above 50 may produce 97% of the relevant documents, but constitutes only 20% of the entire document set. [All good software today ranks all documents, typically 0 to 100% probability, rather than give a simplistic yes/no ranking.]

Counsel may decide, after sampling and quality control tests, that documents with a score of below 15 are so highly likely to be irrelevant that no further human review is necessary. Counsel can also decide the cost-benefit of manual review of the documents with scores of 15-50. [Typically the cut off point is set well above 15% probability. I have no idea where that number came from. A more logical and common cut off is 50%, so that everything below it is more likely irrelevant than relevant.]

Id.
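To make the article’s numbers concrete, here is a small, hypothetical sketch of how 0-to-100 relevance scores might be bucketed for review. The score distribution is randomly generated, and the 15 and 50 cutoffs are simply the figures quoted above; as my comments note, a real project would set its own cutoffs based on sampling and quality control tests.

# Bucket hypothetical relevance scores into review tiers, using the 15 and 50
# cutoffs discussed in the Peck article quoted above.
import random

random.seed(42)
# Pretend scores (percent probability of relevance) for 10,000 documents.
scores = [random.betavariate(0.5, 3) * 100 for _ in range(10_000)]

review_first = [s for s in scores if s >= 50]        # high probability of relevance
judgment_call = [s for s in scores if 15 <= s < 50]  # weigh cost against benefit
likely_irrelevant = [s for s in scores if s < 15]    # sample to confirm, then set aside

print(f"review first: {len(review_first)}, "
      f"cost-benefit zone: {len(judgment_call)}, "
      f"likely irrelevant: {len(likely_irrelevant)}")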

The substance of the article was eventually adopted in an opinion that states: “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.” Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012), adopted sub nom. Moore v. Publicis Groupe SA, No. 11 Civ. 1279 (ALC)(AJP), 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012).

Respondent asserts that predictive coding should not be used in these cases because it is an “unproven technology”. We disagree. [The alternative methods, keyword search and linear human review, are the "unproven technologies," not predictive coding. Indeed, the science proves that keyword and linear review are unreliable. See, e.g., LEGAL SEARCH SCIENCE. The new gold standard is active machine learning, aka predictive coding, not hundreds of low paid contract lawyers sitting in cubicles all day.] Although predictive coding is a relatively new technique, and a technique that has yet to be sanctioned (let alone mentioned) by this Court in a published Opinion, the understanding of e-discovery and electronic media has advanced significantly in the last few years, thus making predictive coding more acceptable in the technology industry than it may have previously been. In fact, we understand that the technology industry now considers predictive coding to be widely accepted for limiting e-discovery to relevant documents and effecting discovery of ESI without an undue burden.10 [Excellent point. Plus it is not really all that "new" by today's standards. It has been around in academic circles since the 1990s.]

FN 10 – Predictive coding is so commonplace in the home and at work that most (if not all) individuals with an email program use predictive coding to filter out spam email. See Moore v. Publicis Groupe, 287 F.R.D. 182, n.2 (S.D.N.Y. 2012), adopted sub nom. Moore v. Publicis Groupe SA, No. 11 Civ. 1279 (ALC)(AJP), 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012).

See Progressive Cas. Ins. Co. v. Delaney, No. 2:11-cv-00678-LRH-PAL, 2014 WL 3563467, at *8 (D. Nev. July 18, 2014) (stating with citations of articles that predictive coding has proved to be an accurate way to comply with a discovery request for ESI and that studies show it is more accurate than human review or keyword searches); F.D.I.C. v. Bowden, No. CV413-245, 2014 WL 2548137, at *13 (S.D. Ga. June 6, 2014) (directing that the parties consider the use of predictive coding). See generally Nicholas Barry, “Man Versus Machine Review:  The Showdown between Hordes of Discovery Lawyers and a Computer-Utilizing Predictive-Coding Technology”, 15 Vand. J. Ent. & Tech. L. 343 (2013); Lisa C. Wood, “Predictive Coding Has Arrived”, 28 ABA Antitrust J. 93 (2013). The use of predictive coding also is not unprecedented in Federal litigation. See, e.g., Hinterberger v. Catholic Health Sys., Inc., No. 08-CV-3805(F), 2013 WL 2250603 (W.D.N.Y. May 21, 2013); In Re Actos, No. 6:11-md-2299, 2012 WL 7861249 (W.D. La. July 27, 2012); Moore, 287 F.R.D. 182. Where, as here, petitioners reasonably request to use predictive coding to conserve time and expense, and represent to the Court that they will retain electronic discovery experts to meet with respondent’s counsel or his experts to conduct a search acceptable to respondent, we see no reason petitioners should not be allowed to use predictive coding to respond to respondent’s discovery request. Cf. Progressive Cas. Ins. Co., 2014 WL 3563467, at *10-*12 (declining to allow the use of predictive coding where the record lacked the necessary transparency and cooperation among counsel in the review and production of ESI responsive to the discovery request).

Mr. Scarazzo’s expert testimony supports our opinion. He testified that11 discovery of ESI essentially involves a two-step process.

FN 11 – Mr. Wudke did not persuasively say anything to erode or otherwise undercut Mr. Scarazzo’s testimony. [This is to the credit of Mr. Wudke, an honest expert.]

First, the universe of data is narrowed to data that is potentially responsive to a discovery request. Second, the potentially responsive data is narrowed down to what is in fact responsive. He also testified that he was familiar with both predictive coding and keyword searching, two of the techniques commonly employed in the first step of the two-step discovery process, and he compared those techniques by stating:

[K]ey word searching is, as the name implies, is a list of terms or terminologies that are used that are run against documents in a method of determining or identifying those documents to be reviewed. What predictive coding does is it takes the type of documents, the layout, maybe the whispets of the documents, the format of the documents, and it uses a computer model to predict which documents out of the whole set might contain relevant information to be reviewed.

So one of the things that it does is, by using technology, it eliminates or minimizes some of the human error that might be associated with it. [Note the proper use of the word "some": it eliminates some of the human error. Human error cannot be eliminated entirely.] Sometimes there’s inefficiencies with key word searching in that it may include or exclude documents, whereas training the model to go back and predict this, we can look at it and use statistics and other sampling information to pull back the information and feel more confident that the information that’s being reviewed is the universe of potentially responsive data.

He concluded that the trend was in favor of predictive coding because it eliminates human error and expedites review. [The modifier "some" before "eliminates human error" is missing here, and thus this is a slight overstatement.]
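Here is a rough illustration of the kind of statistical sampling check Mr. Scarazzo alludes to: draw a random sample from the documents the software predicts are irrelevant, review that sample by hand, and estimate how many relevant documents would be left behind (often called an elusion test). Every count below is hypothetical.

# Estimate how many relevant documents remain in the predicted-irrelevant pile
# based on a manually reviewed random sample drawn from that pile.
discard_pile_size = 500_000        # documents the model predicts are irrelevant (assumed)
sample_size = 1_500                # randomly drawn documents a human actually checks
relevant_found_in_sample = 6       # result of that manual spot check (assumed)

elusion_rate = relevant_found_in_sample / sample_size
estimated_missed = elusion_rate * discard_pile_size
print(f"estimated elusion rate: {elusion_rate:.2%}")                          # 0.40%
print(f"estimated relevant documents left behind: {estimated_missed:,.0f}")   # 2,000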

In addition, Mr. Scarazzo opined credibly and without contradiction that petitioners’ approach to responding to respondent’s discovery request is the most reasonable way for petitioners to comply with that request. Petitioners asked Mr. Scarazzo to analyze and to compare the parties’ dueling approaches in the setting of the data to be restored from Dynamo’s backup tapes and to opine on which of the approaches is the most reasonable way for petitioners to comply with respondent’s request. Mr. Scarazzo assumed as to petitioners’ approach that the restored data would be searched using specific criteria, that the resulting information would be reviewed for privilege, and that petitioners would produce the nonprivileged information to respondent. He assumed as to respondent’s approach that the restored data would be searched for privileged information without using specific search criteria, that the resulting privileged information would be removed, and that petitioners would then produce the remaining data to respondent. As to both approaches, he examined certain details of Dynamo’s backup tapes, interviewed the person most knowledgeable on Dynamo’s backup process and the contents of its backup tapes (Dynamo’s director of information technology), and performed certain cost calculations.

Mr. Scarazzo concluded that petitioners’ approach would reduce the universe of information on the tapes using criteria set by the parties to minimize review time and expense and ultimately result in a focused set of information germane to the matter. He estimated that 200,000 to 400,000 documents would be subject to review under petitioners’ approach at a cost of $80,000 to $85,000, while 3.5 million to 7 million documents would be subject to review under respondent’s approach at a cost of $500,000 to $550,000. [This is a huge reduction, and shows the importance of predictive coding. It is a reduction of between roughly 3.3 million and 6.6 million documents. That seems credible to me, but the actual cost saving quoted here seems off, or at least incomplete. For instance, if you assume 300,000 documents, the mid-point of the estimated document count using predictive coding, and a projected cost of $85,000, that is only $0.28 per document. That is a valid number for the predictive coding culling process, but not for the actual review of the documents for confidentiality and privilege, and to confirm the privilege predictions.]
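For what it is worth, here is the quick arithmetic behind my per-document comment, using rough midpoints of Mr. Scarazzo’s ranges. The midpoints are my own rounding, not figures from the opinion.

# Per-document cost under each approach, using midpoints of the estimates above.
pc_docs, pc_cost = 300_000, 85_000               # predictive coding approach
manual_docs, manual_cost = 5_250_000, 525_000    # review-everything approach

print(f"predictive coding: ${pc_cost / pc_docs:.2f} per document")           # about $0.28
print(f"full manual review: ${manual_cost / manual_docs:.2f} per document")  # about $0.10

Ten cents per document is far too low for actual human review of millions of documents, which underscores my point that the quoted cost figures look incomplete.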

Our Rules, including our discovery Rules, are to “be construed to secure the just, speedy, and inexpensive determination of every case.” Rule 1(d). Petitioners may use predictive coding in responding to respondent’s discovery request. If, after reviewing the results, respondent believes that the response to the discovery request is incomplete, he may file a motion to compel at that time. See Rule 104(b), (d).


Is the IRS’s Inability to Find Emails the Result of Unethical Behavior? New Opinion by U.S. Tax Court Provides Some Clues – Part One

September 28, 2014

The IRS seems inherently incapable of finding emails. The most famous incident that everyone has heard about, and many have complained about, is the loss of emails of key witnesses in a Congressional investigation of the IRS tea party targeting scandal. In June 2014 the IRS admitted that it could not find many of the emails of the key witness, Lois Lerner. Lerner is the first IRS official to admit that agents had improperly scrutinized tax exempt applications. When she was subpoenaed to testify before Congress, she pleaded the Fifth Amendment and refused to answer any questions. She is not the only IRS official under investigation whose email has gone missing. As of September 5, 2014, eighteen of the 82 people questioned “had some type of technical computer issue” and at least five have lost all emails.

An opinion of the United States Tax Court by Judge Ronald L. Buch shows that email search incompetence continues at the IRS. Dynamo Holdings, Ltd. vs. Commissioner, 143 T.C. No. 9 (Sept. 17, 2014). In Dynamo IRS lawyers wanted to force linear review of millions of emails stored on backup tapes covering hundreds of custodians. Everyone knows that this outdated approach is the least likely way to find relevant evidence in big data searches.

The IRS lawyers’ position in Dynamo shows a complete lack of understanding of legal search. It is almost like they are hard wired not to find emails, or any other ESI for that matter. The real scandal at the IRS goes beyond targeting political groups for special treatment; it includes the general lack of technical competence of the whole agency. I dare say this Nineteenth Century paper mentality is not unique to the IRS. It infects many federal agencies, most of which still file emails by printing them to paper.

Fortunately, Tax Court Judge Buch is a lot more savvy than the IRS lawyers who practice before him. He forcefully rejected the attempts of IRS lawyers to prevent the defendant from using predictive coding. His well written opinion now stands as a significant contribution to the growing jurisprudence of legal search.

Even though it is clear to most everyone in the field of e-discovery that predictive coding technology is the best way to find needles of probative value in a haystack of emails, the IRS lawyers opposed the defendant’s use of predictive coding. Instead, the IRS wanted to have individual, all too human, reviewers look at everything to determine relevance and privilege. Of course, the IRS also argued, quite disingenuously I think, that the defendant could, if it wanted, simply turn over all the contents of the backup tapes without any review at all. The IRS would do that for them. How nice. Reminds me of that famous phrase: “Hello. I’m from the Federal Government and I’m here to help.”

Of course, the defendant would have none of that phony cooperation approach of “just give me everything and we’ll protect you with a clawback.” Instead, the defendant moved the Tax Court for permission to use predictive coding over the objection of the IRS.

Although I am sure the motion practice and hearing, which included testimony of experts on predictive coding, was an expensive exercise, the results are fortuitous. The opinion by Judge Buch should be of great value to other litigants and judges when faced with dinosaurs like the IRS.

Insights Into Unethical Behavior of the IRS

The kind of incompetence we see here by IRS lawyers is also revealing as to what kind of evil may lie behind the IRS’s destruction of emails in the Tea Party Targeting scandal. Many do not believe the IRS claim that the destruction of key email was an accident. They believe it was intentional: a criminal act of destruction of evidence. They argue that the emails were destroyed to protect a powerful government agency and many of its high ranking officials, bureaucrats all. They call this email loss an egregious abuse of power. I used to think that was possible, as I have had some personal experiences with the non-profit group at the IRS myself, and I know just how arrogant and abusive they can be. Power often corrupts. But I am not so sure this is what happened here. There may be an unintentional form of unethical behavior at work here.

To explain, I must harken back to the typology of unethical lawyers set forth in my last blog, What Can Happen When Lawyers Over Delegate e-Discovery Preservation and Search to a Client, and Three Kinds of “Ethically Challenged” Lawyers: “Slimy Weasels,” “Gutless,” and “Clueless”.  There I outlined three basic types of ethically challenged lawyers, and added three subcategories for one of those types, the Clueless, for a total of five kinds of unethical lawyers. As to why lawyers are often ethically challenged, another subject I have tried to understand for years, see my article: Lawyers Behaving Badly, 60 Mercer L. Rev. 983 (Spring 2009).

Under my typology the theory that the IRS destroyed the emails on purpose would make the IRS officials involved Slimy Weasel types. They did bad, they knew it, and they did not care. They wanted to cover up the truth and they did not care how many laws they broke in the process.

This could well be true, but the Dynamo Holdings, Ltd. vs. Commissioner case suggests that the IRS is not actually a bunch of Slimy Weasel types, at least when it comes to emails. It instead suggests that we are dealing with the Clueless, who, as my last blog spells out, come in three types:

  1. Egomaniac Clueless. People who are so full of themselves that they have no idea they are clueless; they think they know it all.
  2. Open Clueless. People who have some slight idea that they might possibly be clueless and so might be open to learning some new things.
  3. Arrogant Clueless. These are the people who actually brag about how clueless they are. You know, the lawyers who bash all things technological. They still think (pray) that email and this whole ESI discovery thing may be a fad. They brag about sticking to good old paper discovery.

The positions the IRS lawyers took in Dynamo Holdings suggest they are among the Egomaniac Clueless, which are, as everyone knows, among the most obnoxious kind of ethically challenged lawyers. Some might say it is a bit harsh to refer to clueless lawyers as unethical; after all, they have done nothing wrong on purpose, they just did not know any better. Being clueless is not unethical, they may say, but taking on a case, and serving a client incompetently, certainly is unethical. The duty of competence is spelled out in the Rules of Professional Conduct of every Bar Association in the country. They all require competence of their lawyer members as an Ethical Duty, not just as good advice on how to avoid malpractice. In fact, under the ABA Model Rules of Professional Conduct that most states follow, it is Rule 1.1:

A lawyer shall provide competent representation to a client. Competent representation requires the legal knowledge, skill, thoroughness, and preparation reasonably necessary for the representation.

As the ABA says, Competence: Acquire it or Hire it. Competence cannot and should not be ignored, much less faked. It is not ok to handle e-discovery and be clueless about it at the same time. It is unethical. Either spend the time to learn, which is a good idea for any litigator under fifty, or bring in another attorney who knows how, in order to help you. I might add that it also helps to listen to what they say and actually follow their advice!

In Part Two of this blog I will go into the actual holdings of Dynamo Holdings, Ltd. vs. Commissioner, 143 T.C. No. 9 (Sept. 17, 2014) by providing extensive quotations and my commentary. Yes, you can rest assured, I will tell you what I really think, even if this does trigger yet another audit.


What Can Happen When Lawyers Over Delegate e-Discovery Preservation and Search to a Client, and Three Kinds of “Ethically Challenged” Lawyers: “Slimy Weasels,” “Gutless,” and “Clueless”

September 21, 2014
Sergeant Schultz of Hogan's Heroes

“I see nothing, NOTHING!” Sergeant Schultz

Bad things tend to happen when lawyers delegate e-discovery responsibility to their clients. As all informed lawyers know, lawyers have a duty to actively supervise their client’s preservation. They cannot just turn a blind eye; just send out written notices and forget it. Lawyers have an even higher duty to manage discovery, including search and production of electronic evidence. They cannot just turn e-discovery over to a client and then sign the response to the request for production. The only possible exception proves the rule. If a client has in-house legal counsel, and if they appear of record in the case, and if the in-house counsel signs the discovery response, then, and only then, is outside counsel (somewhat) off the hook. Then they can lie back, a little bit, but, trust me, this almost never happens.

To see a few of the bad things that can happen when lawyers over delegate e-discovery, you have only to look at a new district court opinion in Ohio. Brown v. Tellermate Holdings Ltd., No. 2:11-cv-1122 (S.D. Ohio July 1, 2014) (2014 WL 2987051 ). Severe sanctions were entered against the defendant because its lawyers were too laid back. The attorneys were personally sanctioned too, and ordered to pay the other side’s associated fees and costs.

The attorneys were sanctioned because they did not follow one of the cardinal rules of attorney-client relations in e-discovery, the one I call the Ronald Reagan Rule, as it is based on his famous remark concerning the nuclear arms treaty with the USSR: “Trust, but verify.”

The sanctioned attorneys in Brown trusted their client’s representations to them that they had fully preserved, that they had searched for the evidence. Do not get me wrong. There is nothing wrong with trusting your client, and that is not why they were sanctioned. They were sanctioned because they failed to go on to verify. Instead, they just accepted everything they were told with an uncritical eye. According to the author of the Brown opinion, U.S. Magistrate Judge Terence P. Kemp:

… significant problems arose in this case for one overriding reason: counsel fell far short of their obligation to examine critically the information which Tellermate [their client] gave them about the existence and availability of documents requested by the Browns. As a result, they did not produce documents in a timely fashion, made unfounded arguments about their ability and obligation to do so, caused the Browns to file discovery motions to address these issues, and, eventually, produced a key set of documents which were never subject to proper preservation. The question here is not whether this all occurred – clearly, it did – but why it occurred, and what, in fairness, the Court needs to do to address the situation which Tellermate and its attorneys have created.

Id. at pgs. 2-3 (emphasis added).

What is the Worst Kind of Lawyer?

Taking reasonable steps to verify can be a sticky situation for some lawyers. This is especially true for ethically challenged lawyers. In my experience lawyers like this generally come in three different varieties, all repugnant. Sometimes the lawyers just do not care about ethics. They are the slimy weasels among us. They can be more difficult to detect than you might think. They sometimes talk the talk, but never walk it, especially when the judge is not looking, or they think they can get away with it. I have run into many slimy weasel lawyers over the years, but still, I like to think they are rare.

Other lawyers actually care about ethics. They know what they are doing is probably wrong, and it bothers them, at least somewhat. They understand their ethical duties, and they also understand Rule 26(g), Federal Rules of Civil Procedure, but they just do not have the guts to fulfill their duties. They know it is wrong to simply trust the client’s response of “no, we do not have that,” but they do it anyway. They are gutless lawyers.

Often the gutless suffer from a combination of weak moral fibre and pocketbook pressures. They lack the economic independence to do the right thing. This is especially true in smaller law firms that are dependent on only a few clients to survive, or for siloed lawyers in a big firm without proper management. Such gutless lawyers may succumb to client pressures to save on fees and just let the client handle e-discovery. I have some empathy for such cowardly lawyers, but no respect. They often are very successful; almost as successful as the slimy weasel types that do not care at all about ethics.

There is a third kind of lawyer, the ones who do not even know that they have a personal duty as an officer of the court to supervise discovery. They do not know that they have a personal duty in litigation to make reasonable, good faith efforts to try to ensure that evidence is properly preserved and produced. They are clueless lawyers. There are way too many of these brainless scarecrows in our profession.

I do not know which attorneys are worse. The clueless ones who are blissfully ignorant and do not even know that they are breaking bad by total reliance on their clients? Or the ones who know and do it anyway? Among the ones who know better, I am not sure who is worse either. Is it the slimy weasels, who put all ethics aside when it comes to discovery and are not too troubled about it? Or is it the gutless lawyers, who know better and do it anyway out of weak moral fortitude, usually amplified by economic pressures? All three of these lawyer types are dangerous, not only to themselves, and their clients, but to the whole legal system. So what do you think? Please fill out the online poll below and tell us which kind of lawyer you think is the worst.

 

I will not tell you how I voted, but I will share my personal message to each of the three types. There are not many slimy weasels who read my blog, but I suspect there may be a few. Be warned. I do not care how powerful and protected you think you are. If I sniff you out, I will come after you. I fear you not. I will expose you and show no mercy. I will defeat you. But, after the hearing, I will share a drink with some of you. Others I will avoid like the plague. Evil comes in many flavors and degrees too. Some slimy weasel lawyers are charming social engineers, and not all bad. The admissions they sometimes make to try to gain your trust can be especially interesting. I protect the confidentiality of their off-the-record comments, even though I know they would never protect mine. Those are the rules of the road in dancing with the devil.


As to the gutless, and I am pretty sure that a few of my readers fall into that category, although not many. To you I say: grow a spine. Find your inner courage. You cannot take money and things with you when you die. So what if you fail financially? So what if you are not a big success? It is better to sleep well. Do the right thing and you will never regret it. Your family will not starve. Your children will respect you. You will be proud to have them follow in your footsteps, not ashamed. I will not have drinks with gutless lawyers.

As to the clueless, and none of my readers by definition fall into that category, but I have a message for you nonetheless: wake up, your days are numbered. There are at least three kinds of clueless lawyers and my attitude towards each is different. The first kind is so full of themselves that they have no idea they are clueless. I will not have drinks with these egomaniacs. The second type has some idea that they may need to learn more about e-discovery. They may be clueless, but they are starting to realize it. I will share drinks with them. Indeed I will try very hard to awaken them from their ethically challenged slumber. The third kind is like the first, except that they know they are clueless and they are proud of it. They brag about not knowing how to use a computer. I will not have drinks with them. Indeed, I will attack them and their stone walls almost as vigorously as the weasels.

Judges Dislike the Clueless, Gutless, and Slimy Weasels

Judges dislike all three kinds of ethically challenged lawyers. That is why I was not surprised by Judge Kemp’s sanction in Brown of both the defendant and their attorneys. (By the way, I know nothing about defense counsel in this case and have no idea which category, if any, they fall into.) Here is how Judge Kemp begins his 47 page opinion.

There may have been a time in the courts of this country when building stone walls in response to discovery requests, hiding both the information sought and even the facts about its existence, was the norm (although never the proper course of action). Those days have passed. Discovery is, under the Federal Rules of Civil Procedure, intended to be a transparent process. Parties may still resist producing information if it is not relevant, or if it is privileged, or if the burden of producing it outweighs its value. But they may not, by directly misrepresenting the facts about what information they have either possession of or access to, shield documents from discovery by (1) stating falsely, and with reckless disregard for the truth, that there are no more documents, responsive or not, to be produced; or (2) that they cannot obtain access to responsive documents even if they wished to do so. Because that is the essence of what occurred during discovery in this case, the Court has an obligation to right that wrong, and will do so in the form of sanctions authorized by Fed. R. Civ. P. 37.

Take these words to heart. Make all of the attorneys in your firm read them. There are probably a few old school types in your firm where you should post the quote on their office wall, no matter which type they are.

Brown v. Tellermate Holdings Ltd.

The opinion in Brown v. Tellermate Holdings Ltd., No. 2:11-cv-1122 (S.D. Ohio July 1, 2014) (2014 WL 2987051) by U.S. Magistrate Judge Terence Kemp in Columbus, Ohio, makes it very clear that attorneys are obligated to verify what clients tell them about ESI. Bottom line – the court held that defense counsel in this single plaintiff, age discrimination case:

… had an obligation to do more than issue a general directive to their client to preserve documents which may be relevant to the case. Rather, counsel had an affirmative obligation to speak to the key players at [the defendant] so that counsel and client together could identify, preserve, and search the sources of discoverable information.

Id. at pg. 35.

In Brown the defense counsel relied on representations from their client regarding the existence of performance data within a www.salesforce.com database and the client’s ability to print summary reports. The client’s representations were incorrect and, according to the court, had counsel properly scrutinized the client’s representations, they would have uncovered the inaccuracies.

As mentioned, both defendant and its counsel were sanctioned. The defendant was precluded from using any evidence that would tend to show that the plaintiffs were terminated for performance-related reasons. This is a very serious sanction, which is, in some ways, much worse than an adverse inference instruction. In addition, both the defendant and its counsel were ordered to jointly reimburse plaintiffs the fees and costs they incurred in filing and prosecuting multiple motions to compel various forms of discovery. I hope it is a big number.

The essence of the mistake made by defense counsel in Brown was to trust, but not verify. They simply accepted their client’s statements. They failed to do their own due diligence. Defense counsel aggravated their mistake by a series of over aggressive discovery responses and argumentative positions, including such things as over-designation of AEO confidentiality, a document dump, failure to timely log privileged ESI withheld, and refusal to disclose search methods used.

The missteps of defense counsel are outlined in meticulous detail in this 47 page opinion by Judge Terence Kemp. In addition to the great quotes above, I bring the following quotes to your attention. Still, I urge you to read the whole opinion, and more importantly, to remember its lessons the next time a client does not want you to spend the time and money to do your job and verify what the client says. This opinion is a reminder for all of us to exercise our own due diligence and, at the same time, to cooperate in accord with our professional duties. An unsophisticated client might not always appreciate that approach, but it is in their best interests, and besides, as lawyers and officers of the court, we have no choice.

[when e-discovery is involved] Counsel still have a duty (perhaps even a heightened duty) to cooperate in the discovery process; to be transparent about what information exists, how it is maintained, and whether and how it can be retrieved; and, above all, to exercise sufficient diligence (even when venturing into unfamiliar territory like ESI) to ensure that all representations made to opposing parties and to the Court are truthful and are based upon a reasonable investigation of the facts.

 Id. at Pg. 3.

As this Opinion and Order will explain, Tellermate’s counsel:

- failed to uncover even the most basic information about an electronically-stored database of information (the “salesforce.com” database);

- as a direct result of that failure, took no steps to preserve the integrity of the information in that database;

- failed to learn of the existence of certain documents about a prior age discrimination charge (the “Frank Mecka matter”) until almost a year after they were requested;

- and, as a result of these failures, made statements to opposing counsel and in oral and written submissions to the Court which were false and misleading, and which had the effect of hampering the Browns’ ability to pursue discovery in a timely and cost-efficient manner (as well as the Court’s ability to resolve this case in the same way).

These are serious matters, and the Court does not reach either its factual or its legal conclusions in this case lightly.

Id. at pg. 4.

In addition to the idea that discovery is broad and is designed to permit parties to obtain enough evidence either to prove their claims or disprove the opposing party’s claim, discovery under the Federal Rules of Civil Procedure has been designed to be a collaborative process. As one Court observed,

It cannot seriously be disputed that compliance with the “spirit and purposes” of these discovery rules requires cooperation by counsel to identify and fulfill legitimate discovery needs, yet avoid seeking discovery the cost and burden of which is disproportionally large to what is at stake in the litigation. Counsel cannot “behave responsively” during discovery unless they do both, which requires cooperation rather than contrariety, communication rather than confrontation.

Mancia v. Mayflower Textile Servs. Co., 253 F.R.D. 354, 357-58 (D. Md. 2008). Such a collaborative approach is completely consistent with a lawyer’s duty to represent his or her client zealously. See Ruiz-Bueno v. Scott, 2013 WL 6055402, *4 (S.D. Ohio Nov. 15, 2013). It also reflects a duty owed to the court system and the litigation process.

Id. at pgs. 28-29. Also see: Losey, R. Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.).

Tellermate, as an entity, knew that every statement it made about its control over, and ability to produce, the salesforce.com records was not true when it was made. It had employees who could have said so – including its salesforce.com administrators – had they simply been asked. Its representations were illogical and were directly contradicted by the Browns, who worked for Tellermate, had salesforce.com accounts, and knew that Tellermate could access those accounts and the information in them. And yet Tellermate’s counsel made these untrue statements repeatedly, in emails, letters, briefs, and during informal conferences with the Court, over a period of months, relenting only when the Court decided that it did not believe what they were saying. This type of behavior violated what has been referred to as “the most fundamental responsibility” of those engaged in discovery, which is “to provide honest, truthful answers in the first place and to supplement or correct a previous disclosure when a party learns that its earlier disclosure was incomplete or incorrect.” Lebron v. Powell, 217 F.R.D. 72, 76 (D.D.C. 2003). “The discovery process created by the Federal Rules of Civil Procedure is premised on the belief or, to be more accurate, requirement that parties who engage in it will truthfully answer their opponents’ discovery requests and  consistently correct and supplement their initial responses.” Id. at 78. That did not happen here.

Id. at pg. 31.

But it is not fair to place the entire blame on Tellermate, even if it must shoulder the ultimate responsibility for not telling counsel what, collectively, it knew or should have known to be the truth about its ability to produce the salesforce.com information. As this Court said in Bratka, in the language quoted above at page 3, counsel cannot simply take a client’s representations about such matters at face value. After all, Rule 26(g) requires counsel to sign discovery responses and to certify their accuracy based on “a reasonable inquiry” into the facts. And as Judge Graham (who is, coincidentally, the District Judge presiding over this case as well, and whose views on the obligations of counsel were certainly available to Ms. O’Neil and Mr. Reich), said in Bratka, 164 F.R.D. at 461:

The Court expects that any trial attorney appearing as counsel of record in this Court who receives a request for production of documents in a case such as this will formulate a plan of action which will ensure full and fair compliance with the request. Such a plan would include communicating with the client to identify the persons having responsibility for the matters which are the subject of the discovery request and all employees likely to have been the authors, recipients or custodians of documents falling within the request. The plan should ensure that all such individuals are contacted and interviewed regarding their knowledge of the existence of any documents covered by the discovery request, and should include steps to ensure that all documents within their knowledge are retrieved. All documents received from the client should be reviewed by counsel to see whether they indicate the existence of other documents not retrieved or the existence of other individuals who might have documents, and there should be appropriate follow up. Of course, the details of an appropriate document search will vary, depending upon the circumstances of the particular case, but in the abstract the Court believes these basic procedures should be employed by any careful and conscientious lawyer in every case.

 Id. at pgs. 32-33.

Like any litigation counsel, Tellermate’s counsel had an obligation to do more than issue a general directive to their client to preserve documents which may be relevant to the case. Rather, counsel had an affirmative obligation to speak to the key players at Tellermate so that counsel and client together could identify, preserve, and search the sources of discoverable information. See Cache La Poudre Feeds, LLC v. Land O’ Lakes, Inc., 244 F.R.D. 614, 629 (D. Colo. 2007). In addition, “counsel cannot turn a blind eye to a procedure that he or she should realize will adversely impact” the search for discovery. Id. Once a “litigation hold” is in place, “a party cannot continue a routine procedure that effectively ensures that potentially relevant and readily available information is no longer ‘reasonably accessible’ under Rule 26(b)(2)(B).” Id.

Id. at pg. 35.

As noted above, Tellermate and its counsel also made false representations to opposing counsel and the Court concerning the existence of documents relating to the Frank Mecka matter. Indeed, at the hearing on the pending motions, Tellermate’s counsel stated that she was unaware of the existence of the great majority of the Frank Mecka documents until almost a year after they were requested. Once again, it is not sufficient to send the discovery request to a client and passively accept whatever documents and information that client chooses to produce in response. See Cache La Poudre Feeds, 244 F.R.D. at 629.

 Id. at pg. 37 (emphasis added).

There are two distinct but related problems with trying to remedy Tellermate’s failings concerning these documents. The first is the extremely serious nature of its, and counsel’s, strenuous efforts to resist production of these documents and the strident posture taken with both opposing counsel and the Court. Perhaps the most distressing aspect of the way in which this was litigated is how firmly and repeatedly counsel represented Tellermate’s inability to produce these documents coupled with the complete absence of Tellermate’s compliance with its obligation to give counsel correct information, and counsel’s complete abdication of the responsibilities so well described by this Court in Bratka. At the end of the day, both Tellermate’s and its counsel’s actions were simply inexcusable, and the Court has no difficulty finding that they were either grossly negligent or willful acts, taken in objective bad faith.

Id. at pg. 43.

The only realistic solution to this problem is to preclude Tellermate from using any evidence which would tend to show that the Browns were terminated for performance-related reasons. … This sanction is commensurate with the harm caused by Tellermate’s discovery failures, and is also warranted to deter other similarly-situated litigants from failing to make basic, reasonable inquiries into the truth of representations they make to the Court, and from failing to take precautions to prevent the spoliation of evidence. It serves the main purposes of Rule 37 sanctions, which are to prevent parties from benefitting from their own misconduct, preserving the integrity of the judicial process, and deterring both the present litigants, and other litigants, from engaging in similar behavior.

Id. at pg. 45.

Of course, it is also appropriate to award attorneys’ fees and costs which the Browns have incurred in connection with moving to compel discovery concerning the salesforce.com documents and the Mecka documents, and those fees and expenses incurred in filing and prosecuting the motion for sanctions and the motion relating to the attorneys-eyes-only documents. … Finally, Tellermate and its counsel shall pay, jointly, the Browns’ reasonable attorneys’ fees and costs incurred in the filing and prosecution of those two motions as well as in the filing of any motions to compel discovery relating to the salesforce.com and Frank Mecka documents.

Id. at pgs. 45-46.

So sayeth the Court.

 Conclusion

The defendant’s law firm here did a disservice to their clients by not pushing back, and by instead simply accepting their clients’ report on what relevant ESI they had, or did not have. Defense counsel cannot do that. We have a responsibility to supervise discovery, especially complex e-discovery, and be proactive in ESI preservation. This opinion shows what happens when a firm chooses not to be diligent. The client loses and the lawyers are sanctioned.

Our obligation as attorneys of record does not end with sending the client a litigation hold notice. If a client tells us something regarding the existence, or more pointedly, the non-existence, of electronically stored information that does not make sense, or seemingly is contradicted by other evidence, it is critical for an attorney to investigate further. The client may not want you to do that, but it is in the client’s best interests that you do so. The case could depend upon it. So could your license to practice law, not to mention your reputation as a professional. It is never worth it. It is far better to sleep well at night with a clear conscience, even if it sometimes means you lose a client, or are generally not as successful, or rich, as the few ethically challenged lawyers who appear to get away with it.


Caveat Emptor – Beware of Vendor Trickery

September 18, 2014

In a crowd of e-Discovery Vendors, where each claims to have the Best Software

HOW CAN YOU KNOW WHO IS TELLING THE TRUTH?

Watch this short video animation below for one answer to that question, and yes, this is somewhat self-promotional, but still true.



Only trust independent expert commentators and peer reviewed scientific experiments.

 _____

A full blog on lawyer ethics and an important new case on diligence is coming on this blog soon.


Guest Blog: Talking Turkey

September 7, 2014

EDITOR’S NOTE: This is a guest blog by Gordon V. Cormack, Professor, University of Waterloo, and Maura R. Grossman, Of Counsel, Wachtell, Lipton, Rosen & Katz. The views expressed herein are solely those of the authors and should not be attributed to Maura Grossman’s law firm or its clients.

This guest blog constitutes the first public response by Professor Cormack and Maura Grossman, J.D., Ph.D., to articles published by one vendor, and others, that criticize their work. In the Editor’s opinion the criticisms are replete with misinformation and thus unfair. For background on the Cormack Grossman study in question, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014, and the Editor’s views on this important research, see Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One, Part Two, and Part Three. After remaining silent for some time in the face of constant vendor potshots, Professor Cormack and Dr. Grossman feel that a response is now necessary. They choose to speak at this time in this blog because, in their words:

We would have preferred to address criticism of our work in scientifically recognized venues, such as academic conferences and peer-reviewed journals. Others, however, have chosen to spread disinformation and to engage in disparagement through social media, direct mailings, and professional meetings. We have been asked by a number of people for comment and felt it necessary to respond in this medium.

___________________

Guest Blog:  TALKING TURKEY

OrcaTec, the eDiscovery software company started by Herbert L. Roitblat, attributes to us the following words at the top of its home page: “Not surprisingly, costs of predictive coding, even with the use of relatively experienced counsel for machine-learning tasks, are likely to be substantially lower than the costs of human review.” These words are not ours. We neither wrote nor spoke them, although OrcaTec attributes them to our 2011 article in the Richmond Journal of Law and Technology (“JOLT article”).

[Ed. Note: The words were removed from the OrcaTec home page shortly after this blog was published.]

[Screenshot: OrcaTec home page attributing the above quotation to the Grossman and Cormack JOLT article]

A series of five OrcaTec blog posts (1, 2, 3, 4, 5) impugning our 2014 articles in SIGIR and Federal Courts Law Review (“2014 FCLR article”) likewise misstates our words, our methods, our motives, and our conclusions. At the same time, the blog posts offer Roitblat’s testimonials—but no scientific evidence—regarding the superiority of his, and OrcaTec’s, approach.

As noted in Wikipedia, “a straw man is a common type of argument and is an informal fallacy based on the misrepresentation of an opponent’s argument. To be successful, a straw man argument requires that the audience be ignorant or uninformed of the original argument.” First and foremost, we urge readers to avoid falling prey to Roitblat’s straw man by familiarizing themselves with our articles and what they actually say, rather than relying on his representations as to what they say. We stand by what we have written.

Second, we see no reason why readers should accept Roitblat’s untested assertions, absent validation through the scientific method and peer review. Roitblat makes a number of claims regarding the superiority of his approach without providing any scientific support.

These claims are testable hypotheses, the formulation of which is the first step in distinguishing science from pseudo-science; but Roitblat declines to take the essential step of putting his hypotheses to the test in controlled studies.

Overall, Roitblat’s OrcaTec blog posts represent a classic example of truthiness. In the following paragraphs, we outline some of the misstatements and fallacious arguments that might leave the reader with the mistaken impression that Roitblat’s conclusions have merit.

With Us or Against Us?

Our JOLT article, which OrcaTec cites approvingly, concludes:

Overall, the myth that exhaustive manual review is the most effective—and therefore, the most defensible—approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.  Of course, not all technology-assisted reviews (and not all manual reviews) are created equal. The particular processes found to be superior in this study are both interactive, employing a combination of computer and human input.  While these processes require the review of orders of magnitude fewer documents than exhaustive manual review, neither entails the naïve application of technology absent human judgment. Future work may address which technology-assisted review process(es) will improve most on manual review, not whether technology-assisted review can improve on manual review (emphasis added; original emphasis in bold).

The particular processes shown to be superior, based on analysis of the results of the Interactive Task of the TREC 2009 Legal Track, were an active learning method employed by the University of Waterloo, and a rule-based method employed by H5. Despite the fact that OrcaTec chose not to participate in TREC, and their method—which employs neither active learning nor a rule base—is not one of those shown by our study to be superior, OrcaTec was quick to promote TREC and our JOLT article as scientific evidence for the effectiveness of their method.

[Screenshot: OrcaTec marketing material citing TREC 2009 and our JOLT article as evidence of the effectiveness of its method]

In his OrcaTec blog posts following the publication of our SIGIR and 2014 FCLR articles, however, Roitblat espouses a different view. In Daubert, Rule 26(g) and the eDiscovery Turkey, he states that the TREC 2009 data used in the JOLT and SIGIR studies “cannot be seen as independent in any sense, in that the TREC legal track was overseen by Grossman and Cormack.” Notwithstanding his argumentum ad hominem, the coordinators of the TREC 2009 Legal Track included neither of us. Cormack was a TREC 2009 participant, who directed the Waterloo effort, while Grossman was a “Topic Authority,” who neither knew Cormack at the time, nor had any role in assessing the Waterloo effort. It was not until 2010 that Cormack and Grossman became Legal Track coordinators.


Roitblat’s change of perspective perhaps owes to the fact that our SIGIR article is critical of random training for technology-assisted review (“TAR”), and our 2014 FCLR article is critical of “eRecall,” both methods advanced by Roitblat and employed by OrcaTec. But nothing about TREC 2009 or our JOLT study has changed in the intervening years, and the OrcaTec site continues—even at the time of this writing—to (mis)quote our work as evidence of OrcaTec’s effectiveness, despite Roitblat’s insistence that OrcaTec bears no resemblance to anything we have tested or found to be effective. The continuous active learning (“CAL”) system we tested in our SIGIR study, however, does resemble the Waterloo system shown to be more effective than manual review in our JOLT study. If OrcaTec bears no resemblance to the CAL system—or indeed, to any of the others we have tested—on what basis has OrcaTec cited TREC 2009 and our JOLT study in support of the proposition that their TAR tool works?

Apples v. Oranges

[Graph from the first Roitblat blog post: three dots superimposed on four gain curves from our SIGIR article]

Contrary to the aphorism, “you can’t compare apples to oranges,” you certainly can, provided that you use a common measure like weight in pounds, price in dollars per pound, or food energy in Calories. Roitblat, in comparing his unpublished results to our peer-reviewed results, compares the shininess of an apple in gloss units with the sweetness of an orange in percent sucrose equivalent. The graph above, reproduced from the first of the five Roitblat blogs, shows three dots placed by Roitblat over four “gain curves” from our SIGIR article. Roitblat states (emphasis added):

The x-axis shows the number of training documents that were reviewed. The y-axis shows the level of Recall obtained.

This may be true for Roitblat’s dots, but for our gain curves, on which his dots are superimposed, the x-axis shows the total number of documents reviewed, including both the training and review efforts combined.  Dots on a graph reflecting one measure, placed on top of curves reflecting a different measure, convey no more information than paintball splats.

For OrcaTec’s method, the number of training documents is tiny compared to the number of documents identified for subsequent review. Small wonder the dots are so far to the left. For a valid comparison, Roitblat would have to move his dots way to the right to account for the documents subject to subsequent review, which he has disregarded. Roitblat does not disclose the number of documents identified for review in the matters reflected by his three dots. We do know, however, that in the Global Aerospace case, OrcaTec was reported to achieve 81% recall with 5,000 training documents, consistent with the placement of Roitblat’s green dot. We also know that roughly 173,000 documents were identified for second-pass review. Therefore, in an apples-to-apples comparison with CAL, a dot properly representing Global Aerospace would be at the same height as the green dot, but 173,000 places farther to the right—far beyond the right edge of Roitblat’s graph.
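
[Ed. Note: Here is a minimal sketch of the arithmetic behind this apples-to-apples point, using only the publicly reported Global Aerospace figures quoted above. The point is simply that, on the gain-curve x-axis, effort means the total number of documents reviewed (training plus second-pass review), not the number of training documents alone.]

```python
# Editor's illustrative sketch (not the authors' code or data, beyond the
# Global Aerospace figures quoted in the guest blog above).
training_reviewed = 5_000       # documents reviewed to train the tool
second_pass_reviewed = 173_000  # documents identified for second-pass review
reported_recall = 0.81          # reported recall for the matter

x_as_plotted = training_reviewed                               # where the green dot was placed
x_apples_to_apples = training_reviewed + second_pass_reviewed  # where it belongs on the gain-curve axis

print(f"Dot as plotted:   x = {x_as_plotted:,} documents, recall = {reported_recall:.0%}")
print(f"Apples-to-apples: x = {x_apples_to_apples:,} documents, recall = {reported_recall:.0%}")
```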

Of course, even if one were to compare using a common measure, there would be little point, due to the number of uncontrolled differences between the situations from which the dots and gain curves were derived. Only a valid, controlled comparison can convey any information about the relative effectiveness of the two approaches.

Fool’s Gold?

In The Science of Comparing Learning Protocols—Blog Post II on the Cormack & Grossman Article, Roitblat seeks to discredit our SIGIR study so as to exempt OrcaTec from its findings. He misrepresents the context of our words in the highlighted quote below, claiming that they pertain to the “gold standard” we used for evaluation:

Here I want to focus on how the true set, the so-called “gold standard” was derived for [four of the eight] matters [Cormack and Grossman] present. They say that for the “true” responsiveness values “for the legal-matter-derived tasks, we used the coding rendered by the first-pass reviewer in the course of the review. Documents that were never seen by the first-pass reviewer (because they were never identified as potentially responsive) were deemed to be coded as non-responsive” (emphasis added).

As may be seen from our SIGIR article at page 155, the words quoted above do not refer to the gold standard at all, but to a deliberately imperfect “training standard” used to simulate human review. Our gold standard used a statistical sampling technique applied to the entire collection, known as the Horvitz-Thompson estimator, a technique that has gained widespread acceptance in the scientific community since its publication, in 1952, in the Journal of the American Statistical Association.
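
[Ed. Note: For readers unfamiliar with it, here is a minimal statement of the Horvitz-Thompson estimator in standard textbook form; it is the editor's addition, not a formula reproduced from the SIGIR article. If a sample is drawn so that each document has a known, nonzero probability of being included, the estimated number of relevant documents in the entire collection is:]

```latex
% Horvitz-Thompson estimator (standard textbook form, editor's addition):
%   S     = the sample of documents that were assessed
%   y_i   = 1 if document i is relevant, 0 otherwise
%   \pi_i = the known probability that document i was included in the sample
\hat{Y}_{HT} \;=\; \sum_{i \in S} \frac{y_i}{\pi_i}
```

[Because each sampled judgment is weighted by the inverse of its inclusion probability, the estimate is unbiased even when some parts of the collection are sampled far more heavily than others.]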

Apparently, to bolster his claims, Roitblat also provides a column of numbers titled “Precision,” on the right side of the table reproduced below.

[Table reproduced from Roitblat’s blog post, including the added “Precision” column]

We have no idea where these numbers came from—since we did not report precision in our SIGIR article—but if these numbers are intended to reflect the precision achieved by the CAL process at 90% recall, they are simply wrong. The correct numbers may be derived from the information provided in Table 1 (at page 155) and Figure 1 (at page 157) of our SIGIR article.
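
[Ed. Note: For readers new to these measures, the standard definitions, stated in the editor's words: recall is the fraction of all relevant documents in the collection that the review finds, and precision is the fraction of the documents the review identifies as relevant that actually are relevant. In the usual notation:]

```latex
% Standard information-retrieval definitions (editor's addition):
%   TP = relevant documents retrieved
%   FN = relevant documents missed
%   FP = non-relevant documents retrieved
\text{recall} = \frac{TP}{TP + FN}
\qquad
\text{precision} = \frac{TP}{TP + FP}
```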

While we make no claim that our study is without limitations (see Section 7.5 at page 161 of our SIGIR article), Roitblat’s special pleading regarding the real or imagined limitations of our study provides no support for his claim that random training (using the OrcaTec tool in particular) achieves superior results to active learning. If Roitblat believes that a different study would show a contrary result to ours, he should conduct such a study, and submit the results for peer review.

Although we have been described by Roitblat as “CAR vendors” with a “vested interest in making their algorithm appear better than others,” we have made freely available our TAR Evaluation Toolkit, which contains the apparatus we used to conduct our SIGIR study, including the support vector machine (“SVM”) learning algorithm, the simulation tools, and four of the eight datasets. Researchers are invited to reproduce our results—indeed, we hope, to improve on them—by exploring other learning algorithms, protocols, datasets, and review tasks. In fact, in our SIGIR article at page 161, we wrote:

There is no reason to presume that the CAL results described here represent the best that can be achieved. Any number of feature engineering methods, learning algorithms, training protocols, and search strategies might yield substantive improvements in the future.

Roitblat could easily use our toolkit to test his claims, but he has declined to do so, and has declined to make the OrcaTec tool available for this purpose. We encourage other service providers to use the toolkit to evaluate their TAR tools, and we encourage their clients to insist that they do, or to conduct or commission their own tests. The question of whether Vendor X’s tool outperforms the free software we have made available is a hypothesis that may be tested, not only for OrcaTec, but for every vendor.

Since SIGIR, we have expanded our study to include the 103 topics of the RCV1-v2 dataset, with prevalences ranging from 0.0006% (5 relevant documents in 804,414) to 47.4% (381,000 relevant documents in 804,414). We used the SVMlight tool and word-based tf-idf tokenization strategy that the RCV1-v2 authors found to be most effective. We used the topic descriptions, provided with the dataset, as keyword “seed queries.” We used the independent relevance assessments, also provided with the dataset, as both the training and gold standards. The results—on 103 topics—tell the same story as our SIGIR paper, and will appear—once peer reviewed—in a forthcoming publication.
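
[Ed. Note: To make the protocol under discussion more concrete for non-technical readers, here is a minimal sketch of a continuous active learning loop of the general kind the authors describe: seed the process with a keyword query, train a learner on tf-idf features, have a human review the top-ranked unreviewed documents, add those judgments to the training set, re-rank the collection, and repeat. This is the editor's illustration only; it is not the authors' TAR Evaluation Toolkit and not SVMlight, and the functions load_collection(), review(), and stopping_criterion() are hypothetical placeholders.]

```python
# Editor's sketch of a generic continuous active learning (CAL) loop.
# Assumptions: load_collection(), review(), and stopping_criterion() are
# hypothetical placeholders; scikit-learn stands in for SVMlight.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs, seed_query = load_collection()      # hypothetical: document texts and a keyword seed query
vectors = TfidfVectorizer(sublinear_tf=True).fit_transform(docs + [seed_query])
X, seed_vec = vectors[:-1], vectors[-1]

labels = {}                               # document index -> True (relevant) / False
# Seed step: rank the collection by tf-idf similarity to the keyword query.
ranking = np.asarray((X @ seed_vec.T).todense()).ravel().argsort()[::-1]

BATCH = 100
while True:
    batch = [i for i in ranking if i not in labels][:BATCH]
    if not batch:
        break                             # every document has been reviewed
    for i in batch:
        labels[i] = review(docs[i])       # hypothetical human relevance judgment
    if len(set(labels.values())) < 2:
        continue                          # need at least one relevant and one non-relevant example
    train_idx = list(labels)
    model = LinearSVC().fit(X[train_idx], [labels[i] for i in train_idx])
    ranking = model.decision_function(X).argsort()[::-1]   # re-rank the entire collection
    if stopping_criterion(labels):        # hypothetical stopping rule
        break
```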

Straw Turkey

We were dumbfounded by Roitblat’s characterization of our 2014 FCLR article:

Schieneman and Gricks argue that one should measure the outcome of eDiscovery efforts to assess their reasonableness, and Grossman and Cormack argue that such measurement is unnecessary under certain conditions.

What we actually wrote was:

[Schieneman and Gricks’] exclusive focus on a particular statistical test, applied to a single phase of a review effort, does not provide adequate assurance of a reasonable production, and may be unduly burdensome. Validation should consider all available evidence concerning the effectiveness of the end-to-end review process, including prior scientific evaluation of the TAR method, its proper application by qualified individuals, and proportionate post hoc sampling for confirmation purposes (emphasis added).

Roitblat doubles down on his straw man, asserting that we eschew all measurement, and insisting that our metaphor of cooking a turkey is inconsistent with his false characterization of our position. We have never said—nor do we believe—that measurement is unnecessary for TAR. In addition to pointing out the necessity of ensuring that the method is sound and is properly applied by qualified individuals, we state (at page 312 of our 2014 FCLR article) that it is necessary to ensure “that readily observable evidence—both statistical and non-statistical—is consistent with the proper functioning of the method.”

The turkey-cooking metaphor appears at pages 301-302 of our 2014 FCLR article:

When cooking a turkey, one can be reasonably certain that it is done, and hence free from salmonella, when it reaches a temperature of at least 165 degrees throughout. One can be reasonably sure it has reached a temperature of at least 165 degrees throughout by cooking it for a specific amount of time, depending on the oven temperature, the weight of the turkey, and whether the turkey is initially frozen, refrigerated, or at room temperature. Alternatively, when one believes that the turkey is ready for consumption, one may probe the turkey with a thermometer at various places. Both of these approaches have been validated by biological, medical, and epidemiological evidence. Cooking a turkey requires adherence, by a competent cook, to a recipe that is known to work, while observing that tools like the oven, timer, and thermometer appear to behave properly, and that the appearance, aroma, and texture of the turkey turn out as expected. The totality of the evidence—vetting the method in advance, competently and diligently applying the method, and monitoring observable phenomena following the application of the method—supports the reasonable conclusion that dinner is ready.

Roitblat reproduces our story, and then argues that it is inconsistent with his mischaracterization of our position:

They argue that we do not need to measure the temperature of the turkey in order to cook it properly, that we can be reasonably sure if we roast a turkey of a specific weight and starting temperature for a specific time at a specific oven temperature. This example is actually contrary to their position. Instead of one measure, using a meat thermometer to assess directly the final temperature of the meat, their example calls on four measures: roasting time, oven temperature, turkey weight, and the bird’s starting temperature to guess at how it will turn out. . . .  To be consistent with their argument, they would have to claim that we would not have to measure anything, provided that we had a scientific study of our oven and a qualified chef to oversee the cooking process.

In our story, the turkey chef would need to ensure—through measurement and other observations—that the turkey was properly cooked, in order to avoid the risk of food poisoning. The weight of most turkeys sold in the U.S. is readily observable on the FDA label because it has been measured by the packer, and it is reasonable to trust that information. At the same time, a competent chef could reasonably be expected to notice if the label information were preposterous; for example, six pounds for a full-sized turkey. If the label were missing, nothing we have ever said would even remotely suggest that the chef should refrain from weighing the turkey with a kitchen scale—assuming one were available—or even a bathroom scale, if the alternative was for everyone to go hungry. Similarly, if the turkey were taken from a functioning refrigerator, and were free of ice, a competent chef would know the starting temperature with a margin of error that is inconsequential to the cooking time. Any functioning oven has a thermostat that measures and regulates its temperature. It is hard to imagine our chef having no ready access to some sort of timepiece with which to measure cooking time. Moreover, many birds come with a built-in gizmo that measures the turkey’s temperature and pops up when the temperature is somewhat more than 165 degrees. It does not display the temperature at all, let alone with a margin of error and confidence level, but it can still provide reassurance that the turkey is done. We have never suggested that the chef should refrain from using the gizmo, but if it pops up after one hour, or the turkey has been cooking for seven hours and it still has not popped up, they should not ignore the other evidence.

And, if the gizmo is missing when the turkey is unwrapped, our chef can still cook dinner without running out to buy a laboratory thermometer. The bottom line is that there are many sources of evidence—statistical and otherwise—that can tell us whether a TAR process has been reasonable.

Your Mileage May Vary

Roitblat would have us believe that science has no role to play in determining which TAR methods work, and which do not. In his fourth blog post, Daubert, Rule 26(g) and the eDiscovery Turkey, he argues that there are too many “[s]ources of variability in the eDiscovery process”; that every matter and every collection is different, and that “[t]he system’s performance in a ‘scientific study’ provides no information about any of these sources of variability. . . .” The same argument could be made about crash testing or EPA fuel economy ratings, since every accident, every car, every road, and every driver is also different.

The EPA’s infamous disclaimer, “your mileage may vary,” captures the fact that it is impossible to predict with certainty the fuel consumption of a given trip. But it would be very difficult indeed to find a trip for which a Toyota Prius consumed more fuel than a Hummer H1. And it would be a very good bet that, for your next trip, you would need less gas if you chose the Prius.

Manufacturers generally do not like controlled comparisons, because there are so few winners and so many also-rans. So it is with automobiles, and so it is with eDiscovery software. On the other hand, controlled comparisons help consumers and the courts to determine which TAR tools are reliable.

We have identified more than 100 instances—using different data collections with different prevalences, different learning algorithms, and different feature engineering methods—in which controlled comparison demonstrates that continuous active learning outperforms simple passive learning, and none in which simple passive learning prevails. Neither Roitblat, nor anyone else that we are aware of, has yet identified an instance in which OrcaTec prevails, in a controlled comparison, over the CAL implementation in our toolkit.

Illusion

In his fifth blog post, Daubert, Rule 26(g) and the eDiscovery Turkey: Tasting the eDiscovery Turkey, Part 2, Roitblat first claims that “[g]ood estimates of Recall can be obtained by evaluating a few hundred documents rather than the many thousands that could be needed for traditional measures of Recall,” but later admits that eRecall is a biased estimate of recall, “like a clock that runs a little fast or slow.” Roitblat further admits, “eRecall has a larger confidence interval than directly measured Recall because it involves the ratio of two random samples.” Roitblat then wonders “why [we] think that it is necessary to assume that the two measures [eRecall and the “direct method” of estimating recall] have the same confidence interval [(i.e., margin of error)].”

Our assumption came from representations made by Roitblat in Measurement in eDiscovery—A Technical White Paper:

Rather than exhaustively assessing a large random sample of thousands of documents [as required by the direct method], with the attendant variability of using multiple reviewers, we can obtain similar results by taking advantage of the fact that we have identified putatively responsive and putatively non-responsive documents. We use that information and the constraints inherent in the contingency table to evaluate the effectiveness of our process. Estimating Recall from Elusion can be called eRecall (emphasis added).
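
[Ed. Note: An elusion-based recall estimate of this kind is commonly written out along the following lines. This is the editor's restatement in generic notation, not a formula quoted from Roitblat's white paper: estimate elusion as the proportion of relevant documents in a random sample of the discard (putatively non-responsive) set, estimate prevalence as the proportion of relevant documents in a random sample of the whole collection, and combine them as follows.]

```latex
% Editor's restatement of an elusion-based recall estimate (generic notation):
%   \hat{e}    = estimated elusion (from a sample of the discard set D)
%   \hat{\rho} = estimated prevalence (from a sample of the collection C)
%   Estimated false negatives: \hat{e}\,|D|
%   Estimated total relevant documents: \hat{\rho}\,|C|
\widehat{\mathrm{eRecall}}
  \;=\;
  1 \;-\; \frac{\hat{e}\,\lvert D \rvert}{\hat{\rho}\,\lvert C \rvert}
```

[Because the estimate divides one small-sample quantity by another, its error is driven by both samples, which is the point about confidence intervals taken up next.]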

Our “mistake” was in taking Roitblat’s use of “similar results” to imply that an estimate of recall using eRecall would have a similar accuracy, margin of error, and confidence level to one obtained by the direct method; that is, unbiased, with a margin of error of ±5%, and a confidence level of 95%.

eRecall misses this mark by a long shot. If you set the confidence level to 95%, the margin of error achieved by eRecall is vastly larger than ±5%. Alternatively, if you set the margin of error to ±5%, the confidence level is vastly inferior to 95%, as illustrated below.
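
[Ed. Note: A rough sense of why, using only the standard textbook margin of error for a sampled proportion; this is the editor's addition, not a formula from the FCLR article.]

```latex
% Approximate 95% margin of error for a proportion p estimated from a
% simple random sample of n documents (standard binomial approximation):
\mathrm{MOE}_{95\%} \;\approx\; 1.96 \sqrt{\frac{p(1-p)}{n}}
```

[With 1% prevalence and roughly 75% recall, the discard set contains only a fraction of a percent of relevant documents, so a sample of a few hundred documents drawn from it can be expected to contain only one or two relevant ones. The relative error of the elusion estimate is therefore enormous, and eRecall inherits error from both of the samples on which it depends.]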

Table 2 at page 309 of our 2014 FCLR article (reproduced below) shows the result of repeatedly using eRecall, the direct method, and other methods to estimate recall for a review known to have achieved 75% recall and 83% precision, from a collection with 1% prevalence.

[Table 2, reproduced from page 309 of our 2014 FCLR article]

To achieve a margin of error of ±5%, at the 95% confidence level, the estimate must fall between 70% and 80% (±5% of the true value) at least 95% of the time. From the fourth column of the table one can see that the direct method falls within this range 97.5% of the time, exceeding the standard for 95% confidence. eRecall, on the other hand, falls within this range a mere 8.9% of the time. If the recall estimate had been drawn at random from a hat containing all estimates from 0% to 100%, the result would have fallen within the required range 10% of the time—more often than eRecall. Therefore, for this review, eRecall provides an estimate that is no better than chance.

How large does the margin of error need to be for eRecall to achieve a 95% confidence level? The fifth and sixth columns of the table show that one would need to enlarge the target range to include all values between 0% and 100%, for eRecall to be able to hit the target 95% of the time. In other words, eRecall provides no information whatsoever about the true recall of this review, at the 95% confidence level. On the other hand, one could narrow the target range to include only the values between 70.6% and 79.2%, and the direct method would still hit it 95% of the time, consistent with a margin of error slightly better than ±5%, at the 95% confidence level.
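
[Ed. Note: The flavor of this comparison can be reproduced with a short simulation. The sketch below is the editor's, not the authors'; the collection size and sample sizes are assumptions chosen for illustration and are not the parameters behind Table 2, so the percentages it prints will only roughly resemble those reported in the table.]

```python
# Editor's Monte Carlo sketch comparing the direct method with eRecall for a
# review with true recall 75%, precision 83%, and prevalence 1%. Collection
# and sample sizes below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000                       # assumed collection size
R = int(0.01 * N)                   # 1% prevalence -> relevant documents
TP = int(0.75 * R)                  # true recall 75% -> relevant documents retrieved
retrieved = int(TP / 0.83)          # precision 83% -> size of the production
discard = N - retrieved             # putatively non-responsive documents
FN = R - TP                         # relevant documents left in the discard set

DIRECT_SAMPLE = 38_500              # assumed: large enough to contain ~385 relevant documents
ELUSION_SAMPLE = PREV_SAMPLE = 500  # assumed: "a few hundred documents" each

hits_direct = hits_erecall = 0
TRIALS = 10_000
for _ in range(TRIALS):
    # Direct method: sample the collection, compute recall among the sampled relevant documents.
    rel_in_sample = rng.hypergeometric(R, N - R, DIRECT_SAMPLE)
    retrieved_in_sample = rng.hypergeometric(TP, R - TP, rel_in_sample) if rel_in_sample else 0
    direct = retrieved_in_sample / rel_in_sample if rel_in_sample else 0.0

    # eRecall: estimate elusion from a discard-set sample and prevalence from a
    # collection sample, then recall ~= 1 - (elusion * |discard|) / (prevalence * N).
    elusion = rng.hypergeometric(FN, discard - FN, ELUSION_SAMPLE) / ELUSION_SAMPLE
    prevalence = rng.hypergeometric(R, N - R, PREV_SAMPLE) / PREV_SAMPLE
    if prevalence > 0:
        erecall = 1 - (elusion * discard) / (prevalence * N)
    else:
        erecall = 0.0

    hits_direct += abs(direct - 0.75) <= 0.05
    hits_erecall += abs(erecall - 0.75) <= 0.05

print(f"Direct method within +/-5 points of true recall: {hits_direct / TRIALS:.1%}")
print(f"eRecall within +/-5 points of true recall:       {hits_erecall / TRIALS:.1%}")
```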

In short, the direct method provides a valid—albeit burdensome—estimate of recall, and eRecall does not.

Summary

Roitblat repeatedly puts words in our mouths, attacking positions we do not hold, in order to advance his view that one should employ OrcaTec’s software and accept—without any scientific evidence—an unsound estimate of its effectiveness. Ironically, one of the positions that Roitblat falsely attributes to us is that one should not measure anything. Yet, we have spent the better part of the last five years doing quantitative research—measuring—TAR methods.

The Future

We are convinced that sound quantitative evaluation is essential to inform the choice of tools and methods for TAR, to inform the determination of what is reasonable and proportionate, and to drive improvements in the state of the art. We hope that our studies so far—and our approach, as embodied in our TAR Evaluation Toolkit—will inspire others, as we have been inspired, to seek even more effective and more efficient approaches to TAR, and better methods to validate those approaches through scientific inquiry.

Our next steps will be to expand the range of datasets, learning algorithms, and protocols we investigate, as well as to investigate the impact of human factors, stopping criteria, and measures of success. We hope that information retrieval researchers, service providers, and consumers will join us in our quest, by using our toolkit, by allowing us to evaluate their efforts using our toolkit, or by conducting scientific studies of their own.

