Over 500 people attended Georgetown’s Advanced eDiscovery Institute last week, smashing all CLE attendance records. The agenda of this event was long and impressive. Most of the time, three different presentations were running concurrently. Although many CLEs are labeled as advanced, this is the first one outside of Sedona that I have attended that really did have mostly sophisticated attendees. It was much better than in prior years when I taught at the same Georgetown event. What a pleasure. Plus this year’s Georgetown Institute included a veritable who’s who of e-discovery on the faculty list. The only familiar expert I did not see there was Judge Waxse.
In next week’s blog I will cover a couple of the many excellent presentations made. But for this week I am going to focus on the most popular subject.
The Event’s Hot Topic
Can you guess the most popular topic? Yup, computer assisted review, which many have taken to calling TAR, including Georgetown. It seems that most advanced practitioners in the country are like me and want to improve their CAR driving skills. Lessons were presented in three parts, with three different panels. I was on the third mop-up panel. Here are their titles:
Defining Technology Assisted Review and Considerations in Developing a TAR Protocol: Seed Sets, Training Sets, and Iterations – Oh My!
Measuring, Validating, and Defending Your Technology Assisted Review (TAR): What Metrics and Measurements are Necessary or Appropriate?
Technology Assisted Review, Part 3, Selling TAR to Opposing Counsel and the Court: How Much Disclosure Is Necessary?
Here are the course descriptions and panelists:
The successful application of Technology Assisted Review (TAR) requires the correct combination of people, technology, and process to achieve the desired results. Part One of this three-part series will begin with an explanation of relevance-ranking and machine-learning algorithms. This panel will delve into the nitty-gritty details of the TAR process and will consider different TAR approaches currently in use.
Moderator: Scott A. Kane
Panelists: Emily A. Cobb, Gordon V. Cormack, Tom C. Gricks, Amir Milo
In Part Two of this three-part series you will learn how sampling and other tools can be used to assess and defend the quality of TAR results and will grapple with the critical questions of what constitutes a “reasonable effort” and how perfect the production has to be. This session will cover issues such as: what tests are available and can be used to verify that the TAR process has worked as expected; what confidence intervals, confidence levels, and sampling sizes are appropriate under what circumstances; and what impact the prevalence of responsive documents in a collection has on statistical measures.
Moderator: Maura R. Grossman
Panelists: David D. Lewis, Herbert L. Roitblat, Ian Soboroff, Ian Wilson
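To make the Part Two description a little more concrete, here is a minimal sketch of how confidence level, margin of error, and prevalence interact in a simple random sample. It is my own illustration, not anything the panel presented. It assumes the standard normal-approximation formula for estimating a proportion, and the function names `sample_size` and `expected_relevant` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Documents to sample to estimate a proportion within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    p = 0.5 is the conservative worst case when prevalence is unknown.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def expected_relevant(n: int, prevalence: float) -> float:
    """Expected number of responsive documents in a random sample of n."""
    return n * prevalence


n = sample_size(confidence=0.95, margin=0.02)
print(n)                           # documents to draw
print(expected_relevant(n, 0.01))  # responsive documents expected at 1% prevalence
```

The second function shows why prevalence matters: in a low-prevalence collection, even a large random sample contains only a handful of responsive documents, so any recall estimate built on that handful is far less precise than the headline margin of error suggests.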
Part Three of this three-part series will consider TAR protocols in actual cases to date and will explore the thorny issue of whether and how much of the TAR process should be negotiated or disclosed. This panel is sure to set off some fireworks as we consider such issues as: whether the use of TAR must be disclosed to opposing counsel in the first place; what opportunities for cooperation are available (i.e., joint seed set development, joint determination of when to stop training), and considerations in deciding whether to agree to them; whether a party should have to share the seed or training documents that were used; whether a party should be required to show its adversary non-responsive documents identified by the system; and whether, and to what degree, technical experts need to be involved in designing or explaining the process to adversaries or the court.
Moderator: Conor R. Crowley
Panelists: Wendy Butler Curtis, Brian E. Calla, Conrad J. Jacoby, Ralph C. Losey
Disclosure Arguments at a 16(b) Hearing
We started our Part Three panel with a discussion of the key cases for predictive coding, with only five minutes or less for each case. This was, after all, an advanced-level CLE, and we figured most everyone would have heard of them already. We were right. Anyway, here are the five cases we talked about.
- DaSilva Moore – Conrad Jacoby
- Kleen Products – Ralph Losey
- Global Aerospace – Wendy Curtis
- Actos – Brian Calla
- EORHB (Hooters) – Wendy Curtis
Our panel then started a mock 16(b) hearing, which lasted an hour. Our fearless leader, Conor Crowley, played the Judge. At the end he asked all attendees to vote on the nine issues we argued. As a joke he ruled according to the vote. He basically split the baby and required some, but not all, of the disclosures that the pushy plaintiff’s counsel argued for.
I got to play the part of the pushy show-all plaintiff’s lawyer. Conrad acted as the representative of the vendor that I dragged along with me to the hearing. I called my vendor, Droll OnTruck, which drew quite a few laughs. But I was outdone by Brian who played the defendant’s vendor, and named his company, Calamind, and claimed that his company not only invented predictive coding, but also the Internet. Wendy played the role of a very aggressive defense counsel, which seemed to fit her perfectly.
The hypo assumed that both sides were using predictive coding and the only issue was disclosure. Wendy had to advocate for the position that my side had to make full disclosure, whereas hers had to make none. Basically I was asked to open my kimono and show her everything, whereas she would not even give me her basic measurements. Too bad, most of the audience seemed interested in that. Amazingly, in the vote at the end some of the audience actually voted for the totally one-sided disclosure position, a testament I suppose to her persuasive abilities.
For my argument to support disclosure I stressed the need for the requesting party to know about the responding party’s planned Quality Control efforts, which pertains to the process, and planned Quality Assurance, which pertains to the tests at the end. It was part of my request for reasonable assurances that the predictive coding processes would actually work to find the relevant documents requested, and that reasonable efforts were made.
Although I did not get a chance to state this during our presentation, in my opinion if you are using good predictive coding software and make reasonable efforts, then the efforts will work to find the relevant documents and will do so quickly, efficiently, and at less cost. But not all software advertised as having predictive coding features is necessarily good. They are not all created equal. So if you make reasonable efforts, and are disappointed with the results, and I had several people tell me this, then it may well be the software, not you. Try some other brands. We are still in the very early days of implementation of predictive coding for legal search. Perhaps you should ask your firm for a new CAR for Christmas?
Back to my presentation, I cited all of the cases we had discussed and pointed out that all of them had required the kind of disclosure we were debating. I specifically tied my arguments to Judge Nan Nolan’s order in Kleen Products where she quotes and explains the doctrine of Cooperation, including a lawyer’s duty to be both diligent and candid. I emphasized an attorney’s duty of candor as an officer of the court. I also tried to neutralize and explain Judge Nolan’s reliance on Sedona Principle Six, which I expected would be used against me in the mock exercise. It states that:
Responding parties are best situated to evaluate the procedures, methodologies, and techniques appropriate for preserving and producing their own electronically stored information.
Right after quoting this principle in Kleen Products Judge Nolan went on and:
urged the parties to consider whether there was a method to refine or supplement Defendants’ Boolean search so that Plaintiffs had a reasonable assurance that they were receiving a high percentage of responsive documents without completely scrapping Defendants’ search methodology.
I emphasized the reasonable assurance language, that we were not trying to dictate what defendants did, just receive disclosure sufficient to provide us with reasonable assurances.
In addition to the quality assurances, I argued the position, not necessarily my own, that we needed to see the seed sets to be sure the requesting party’s relevance conception was understood by the responding party. That is the garbage in, garbage out argument. The argument assumes predictive coding is special in needing a clear common conception of relevance. That is not really correct, however, or is at least an exaggeration, as good predictive coding software will be self-correcting of initial errors to some degree. My esteemed opposing counsel correctly pointed out that we do not require this kind of disclosure with other search methods, nor do the rules require it.
I did not have a chance to raise this issue during the panel, but the reality is all search methods require such relevancy communications to work effectively, not just predictive coding. I call this ESI discovery communications, which should be the first step in the process of predictive coding as shown below.
The nine disclosure issues that Wendy and I debated are described below. I actually made many of the disclosures of my planned quality controls and assurance methods during the hearing, for educational purposes and to enhance my argument against defendants. Wendy’s assignment was to argue that she would never have to show me hers, even though I was showing her mine as part of the argument. This kind of argument drew quite a few laughs, as Wendy, who is both attractive and clever, was good at delivering punch lines. That helped to keep everyone awake in our early morning presentation, which, amazingly, was still a full house. Of course, I could be delusional. We’ll see what the evaluations say.
The nine disclosure topics we chose to argue were:
- Size of document corpus.
- Size of seed set. (I did not seek to dictate it. I just wanted to know what it was.)
- Seed set selection method. (Of course, I said I was going to use a multimodal seed set selection method. I was going to use a judgmental sample found by various search methods and a random sample. I wanted disclosure of what the defendant was doing for many reasons. One of which is that I wanted to know if defendants were using the dreaded Borg approach, the reliability of which I suspect. See, e.g., Analysis of the Official Report on the 2011 TREC Legal Track – Part Two.)
- Contents of seed set. (I asked to see relevant, which would have to be produced anyway, and irrelevant. I pointed out that no privileged documents would be produced. Not only that, I would agree to withholding and logging merely sensitive documents. I would also agree to any levels of confidentiality requested, and of course, all irrelevant documents would be promptly returned. Despite all these concessions, the audience overwhelmingly voted against this position.)
- Experts selected to review seed set. (I wanted to know their qualifications. I stated I would object to unqualified persons training the machine. That is an important protection against GIGO.)
- Methodology for analysis of seed set deemed relevant. (I wanted to see what kind of quality controls they were going to use. I also wanted to be consulted on close questions so we could have a common understanding of relevancy. That was also to address the GIGO issue.)
- Sampling methodology with respect to documents deemed relevant/non-relevant. (Again, I did not seek to dictate it. I just wanted to know what it was.)
- Sample of documents deemed not relevant by TAR. (I proposed an accept-on-zero-error protocol, under which the sample would fail if any of the false negatives were found to be highly relevant. I did not care if they were just more of the same. We all agreed on the importance of proportionality.)
- Levels of precision and recall achieved and the confidence level for precision/recall metrics. (We all agreed that a set number was not required, but I did want to at least know what levels were attained and reserve my rights to object.)
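
The last two topics lend themselves to a short illustration. What follows is a minimal sketch, not the protocol argued at the hearing. It assumes reviewers label a random sample of the documents the software coded as not relevant, and that only highly relevant misses fail the test, as I proposed. The label strings and function names are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from counts of true positives, false
    positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def accept_on_zero_error(sample_labels: Iterable[str]) -> bool:
    """Pass only if the sample of machine-rejected documents contains
    no highly relevant document. Merely relevant misses ("more of the
    same") are tolerated under this assumed version of the protocol."""
    return all(label != "highly_relevant" for label in sample_labels)


def chance_of_clean_sample(rate: float, n: int) -> float:
    """Probability that a random sample of n documents contains zero
    highly relevant misses when they occur at the given true rate."""
    return (1 - rate) ** n


print(precision_recall(tp=80, fp=20, fn=20))
print(accept_on_zero_error(["irrelevant", "relevant", "irrelevant"]))
print(chance_of_clean_sample(rate=0.001, n=1500))
```

The last function is a reminder that a clean sample is evidence, not proof: the requesting party can use it to judge how large a sample gives reasonable assurance before agreeing to the test.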
CLE events like Georgetown’s Advanced eDiscovery Institute are not only a great way to stay current with the latest ideas and best practices, but also a good space to see old friends and make new ones. Georgetown and Sedona are two good events for lawyers who are advanced and ready for more complicated information. An ABA event in Tampa January 23 – 25, 2013 also looks promising: The 2013 E-Discovery and Information Governance National Institute. Check out the agenda and faculty in the brochure for the event.