GUEST BLOG: by Jason R. Baron.
This is the prepared written text for Jason R. Baron’s opening keynote talk for the Seventh Circuit Electronic Discovery Workshop on Computer-Assisted Review. The event, which was previously described here, was held in the Ceremonial Courtroom of the Everett M. Dirksen Courthouse in downtown Chicago on June 14, 2012. Jason notes that he liberally departed from this prepared text at various points during his speech. A separate link to a video of the entire day’s presentations will be provided here when available.
Boldly Going Where Few Judges Have Gone Before: The Emerging Case Law on Software-Assisted Document Review and Our Next 5 Year Mission
I wish to thank Chief Judge Holderman and Judge Nolan for this very special invitation to speak. And I also wish to thank Sean Byrne for pulling this whole thing off. It’s an honor and more than a little humbling to be speaking in front of the veritable College of Cardinals of e-discovery assembled here today, but I will attempt to do my best to be just provocative enough so that no one goes to sleep right after lunch.
My late Dad was an aeronautics professor at MIT, and when I was eight he brought me to meet Carl Sagan at the Harvard Astrophysical Observatory — so I can’t help it if I give talks with space age puns in the title, or make allusions to same.
There is no truth however to the rumor that I co-founded the “TREC” Legal Track with William Shatner in mind. (Nor is “enterprise search” the same thing as searching the U.S.S. Enterprise, but I digress.) Somewhere along the line here I promise to get to predictive coding, supervised learning, and the like — but first I would appreciate your indulgence in my going off on the following tangent.
Robert H. Goddard (who lived from 1882 – 1945) was an American professor, physicist, and inventor, who built the world’s first liquid-fueled rocket — he launched that rocket on March 16, 1926, in the town of Auburn, Massachusetts.
Forty-three years after Goddard’s rocket launch, on July 20, 1969, Neil Armstrong and Buzz Aldrin landed on the Moon — something Goddard accurately had predicted would happen, as reported in a front page New York Times article of January 1920, with the headline that a scientist “Believes Rocket Can Reach Moon.”
So it took 43 years to go from Goddard to Armstrong … and another 43 years after Neil Armstrong, we find ourselves at the present juncture here in 2012. I like the symmetry in that. It’s fun in a kind of cosmic way to think about dates that are halfway between other dates.
Here’s another one: in 1938 the Federal Rules of Civil Procedure came into being; and here we are in 2012, where the federal judiciary is now contemplating further rules changes on preservation. Did the practice of law change more between 1938 and 1975 (the midpoint), or between 1975 and 2012? I won’t ask for a show of hands, but I’d be surprised if many present here would opt for the first choice above. I started law school at about that midpoint, in 1977 to be exact, passed the bar in 1980, and have worked as a trial lawyer for 32 years. I have stood up on the order of 100 times in federal courts, with the privilege of being able to say I represented the United States. I have had the further privilege of acting as lead or co-lead Justice Department counsel in three major, complex trials in the 1990s. Conducting discovery in those lawsuits was quite simply living and being in a different world than the one we’re now in. It is obvious to me, as it should be to many here today, that the pace of change, driven by technological innovation, accelerated during the first decade of the 21st century and has continued to do so into the second — and that much of this acceleration is tied to the 2006 amendments to the Federal Rules, which included the now ubiquitous term “ESI.”
So now that 5+ years have elapsed since the rules amendments in December 2006, where are we and where are we going in the legal profession in the next 5+ years? My crystal ball is as cloudy as that of any other preacher on the e-discovery conference circuit, but in the time I have I’d like to sketch out three issues, in the form of questions, concerning the present and future state of e-discovery search that I trust may be of interest at least to some. First, in light of emerging case law, what does cooperation truly mean in an age of software-assisted review and the technical protocols that optimize it? Second, with the recent demise (or at least hiatus) of the TREC Legal Track, is there a need for further research to evaluate search efficacy? And third, are we finally at the point where we should be contemplating voluntary “quality” standards in the e-discovery arena?
But first: today’s event wouldn’t be taking place without the leadership that Chief Judge Holderman and Judge Nolan have shown in advancing the Seventh Circuit Pilot Project. What has emerged over the past few years with Phases I and II of the Seventh Circuit Pilot Project is truly remarkable: we are all front row participants in an ongoing experiment where we get to see periodic survey results — results that include documentation of the emergence of cooperative behavior in the discovery setting. Imagine that! As someone who long ago was banned at home from playing cutthroat Monopoly with my wife and daughter, I know all about trial lawyers’ competitive streaks and their wish to “win.” Nevertheless, I have been heartened to see in the past several years the adoption by many on the bench of The Sedona Conference® Cooperation Proclamation, as well as the fact that in this Circuit, Principle 1.02 on Cooperation is embedded front and center in the Seventh Circuit’s rule set.
E-discovery is often asymmetric, in multiple ways, including in terms of the financial positions of the parties, how much relevant information is being held by each party, and, of course, the competence of counsel. In terms of search requests, requesting parties, as Ralph Losey has noted, have been engaged for decades in a game of “Go Fish” — they are often blindly guessing, through requests to produce and more recently through proposed keywords, at where relevant evidence is to be found. I have previously described the “game theoretic” aspects of how search protocols end up being negotiated, subject to many different variations. But what does cooperation with respect to search protocols really mean in the brave new world we’re now in with software-assisted review?
As we all know, in the last four months civil discovery has undergone the equivalent of a seismic quake. In a few select cases, Judge Nolan and others have been significantly advancing the path of the law by placing front and center the issue of whether advanced search analytics (by whatever name) shall be the “new normal” for our profession. I will not further discuss the Kleen Products case in any depth, as I don’t wish to cause discomfort here in Chicago, so let me instead speak for a bit about a simple, quiet, uncontroversial case that has garnered little attention in New York, namely: da Silva Moore v. Publicis Groupe SA (my notes say: pause for expected laughter here.)
In point of fact, the da Silva Moore case before Judge Peck in the Southern District of New York is truly remarkable, not for the circus atmosphere that has surrounded it, but because of its “joint protocol.” As I assume everyone in this room knows, the joint protocol appended to Judge Peck’s February 24, 2012 opinion has at the end a paragraph that says “Plaintiffs object to this ESI protocol in its entirety” — so, so much for cooperation! Nevertheless, in my view the protocol represents a watershed moment for the profession, due not only to the nature of the technology being used in the case but, more importantly I think, to the process employed to get at relevant evidence.
The da Silva Moore protocol — a document of many pages that contains a cookbook of instructions on software-assisted review technology in fine detail — provides, first, for the construction through various means of seed sets of documents, categorized by issue tags roughly corresponding to the main causes of action and/or important issues in the case; second, for an iterative process repeated up to seven times, where counsel for the responding party shares both responsive and nonresponsive documents with opposing counsel, in order to train the classifier algorithm being used; third, for sampling, both on the front end, and on the back end to account for the possibility of false negatives (i.e., documents missed); and fourth, for a nominal cap on production (which may or may not go into effect) — recognizing in theory that there are limits to what the parties should have to deal with in terms of reviewing documents of marginal relevance in a given litigation.
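For readers who want to see the general shape of such a workflow in more concrete terms, here is a minimal sketch in Python of an iterative train-review-sample loop of the kind the protocol describes. It is an illustration only — the function names, the seven-round cap, the sample size, and the choice of a linear support vector machine are my own assumptions for the example, not terms drawn from the da Silva Moore protocol itself.

```python
# A purely illustrative sketch of an iterative "train, review, sample" loop.
# Names, the seven-round cap, and the sample size are assumptions for this
# example; they are not the actual terms of the da Silva Moore protocol.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def attorney_codes(doc_text):
    """Stand-in for human review: in practice, counsel codes the document."""
    return 0  # placeholder label

def iterative_review(documents, seed_labels, max_rounds=7, sample_size=100):
    """documents: list of strings; seed_labels: dict {doc_index: 1 responsive / 0 not}."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(documents)
    coded = dict(seed_labels)  # everything counsel has coded so far (initially the seed set)
    clf = LinearSVC()

    for _ in range(max_rounds):
        idx = sorted(coded)
        clf.fit(X[idx], [coded[i] for i in idx])          # retrain on all coded documents

        remaining = [i for i in range(len(documents)) if i not in coded]
        if not remaining:
            break
        preds = clf.predict(X[remaining])

        # Back-end check for false negatives: draw a random sample from the
        # documents the classifier predicts are NOT responsive and have them
        # reviewed; the newly coded documents feed the next training round.
        predicted_nonresponsive = [i for i, p in zip(remaining, preds) if p == 0]
        sample = random.sample(predicted_nonresponsive,
                               min(sample_size, len(predicted_nonresponsive)))
        for i in sample:
            coded[i] = attorney_codes(documents[i])

    return clf, coded
```

In practice, of course, the protocol contemplates that counsel for the producing party shares both responsive and nonresponsive training documents with opposing counsel at each round — a point I return to below.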
In our 2007 Richmond law review article, “Information Inflation: Can the Legal System Adapt?,” George L. Paul and I discussed how to construct what we then termed “virtuous cycle iterative feedback loops,” by urging parties to meet and confer multiple times, for the purpose of reviewing sample sets of documents, in iterative fashion, to home in on more narrowly tailored searches for relevant documents. This protocol is now a reality. Although we didn’t anticipate the possibility that support vector machines and their kin would be optimized by being fed both relevant and irrelevant documents, it has become clear that this is the case when one uses these new forms of predictive algorithms and systems.
George and I also talked about how the inflationary expansion of the ESI universe was changing the way lawyers have to practice law. In the coming world of “Law in the Age of Exabytes,” it is absurd to expect lawyers to be required to find “any and all” documents using standard Rule 34 boilerplate, where the vast majority of relevant documents are really of only marginal relevance to the litigation. Instead, we all wish to find hot documents — the very few of obvious materiality — in the cause of trying cases, settling cases, and responding to discovery requests in optimum fashion. Similarly, judges don’t want to look at thousands of exhibits; and they certainly don’t want to look at tens of thousands of documents on privilege logs. So why do we keep collectively buying into the notion of and opting for monster productions that fail to filter out information of merely tangential relevance to the case?
Let me be clear: just as technology largely created the inflationary ESI universe we now live in, I believe technology exists to help solve our TMI (too much information) problem — if lawyers are willing to reach the higher levels of transparency and cooperation in litigation exhibited in the da Silva Moore protocol. That means training systems not just on the easy documents (the fantasy football docs everyone would agree are out, and the hot documents material to the case that everyone would agree are in). It means training systems on the hard cases in the seed set as well, where optimally lawyers must agree on a level of transparency of the process that facilitates decisions on what is deemed “irrelevant.” And that may in fact mean drawing a line (or, in more technical language, drawing a hyperplane in the multidimensional space created by algorithms such as support vector machines) that treats documents of marginal relevance as irrelevant for purposes of further training of the classifier. That is where the mathematics of information retrieval has taken us. In other words, cooperation in 2012 means agreeing to treat certain documents as irrelevant — and this is simply necessary in order to train the system to achieve optimal results. But everything in our judicially constructed rule sets, including both the Federal Rules of Civil Procedure and the Seventh Circuit’s Principles, is of course aimed at the production of relevant evidence, not irrelevant documents. Moreover, at the margins, there are ethical considerations to be taken into account as well, in not gratuitously revealing client confidences in violation of Model Rule 1.6. This is all novel terrain which we will be confronting in future joint protocols and future cases over the next five years.
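To make the hyperplane metaphor a bit more tangible, here is a small, hypothetical sketch of a linear classifier learning a separating boundary from documents coded relevant or irrelevant. The texts and labels are invented for illustration; note that the document of merely marginal relevance is deliberately coded into the irrelevant class for training purposes — the line drawing described above.

```python
# An invented, minimal example: a linear SVM learns a separating hyperplane from
# documents coded relevant (1) or irrelevant (0). The marginally relevant
# document is deliberately coded 0 for training purposes. Texts are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_docs = [
    "draft merger agreement with revised pricing terms",             # clearly relevant
    "board minutes discussing approval of the acquisition",          # clearly relevant
    "fantasy football league standings and trash talk",              # clearly irrelevant
    "lunch order for the office holiday party",                      # clearly irrelevant
    "routine status email that mentions the deal only in passing",   # marginal -> coded 0
]
train_labels = [1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_docs)

clf = LinearSVC().fit(X, train_labels)   # clf.coef_ defines the learned hyperplane

new_doc = ["email attaching the revised merger pricing schedule"]
prediction = clf.predict(vectorizer.transform(new_doc))
print(prediction)   # 1 = classified as responsive, 0 = classified as not responsive
```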
Now a side note on judicial rule sets: in a rapidly changing world of technology, the elegance of the Seventh Circuit model is that it is not generally weighed down by overspecificity — the judiciary’s version of overfitting to the training set!
A good example is Principle 2.02’s call for the e-discovery liaison to be knowledgeable, or to find someone who is knowledgeable, about “relevant information retrieval technology, including search methodology.” That is a good generic description that would embrace both what has occurred in Kleen Products and the world that is knocking on all judicial doors. In contrast, consider Principle 2.05, where in identifying ESI at the meet and confer the Seventh Circuit calls for topics including “keyword searching, mathematical or thesaurus-based topic or concept clustering, or other advanced culling technologies.” This phrasing was of course very advanced for 2009 and 2010, but if it were being revisited from scratch in 2012 it would, I am sure, use different words to embrace the supervised learning techniques discussed here. Of course, I’m not advocating that any combination of specific words, like “predictive coding” (or “content based advanced analytics”), be added! That would miss my point entirely. Instead, I wish to gently suggest that future tweaks of the Seventh Circuit’s protocol should assume that the pace of change in technology will necessarily outrun any attempt to capture it (like lightning in a bottle) — and that judicial rules that wish to embrace technological change should always be biased towards generic formulations rather than being overly specific.
Not every judicial protocol follows the same model. For example, in the case of the Federal Circuit Advisory Council’s proposed Model Order issued in October 2011, paragraph 11 limits email production requests to “five search terms per custodian per party” and goes on to discuss what constitutes conjunctive and disjunctive terms and other narrowing search criteria. In a world of software-assisted review, attempting to codify what constitutes best practices in keyword searching is, I think, kind of missing the point. It is all the more amusing, to this observer, to see parties citing Model Orders as establishing “standards” in search, when in no sense can judicially crafted rules be viewed as proper vehicles for establishing technical standards.
The second general theme I wish to touch on today is to renew a call for more and greater research to be conducted over the next five years. It was one of the greatest privileges of my life to be able to found the Text REtrieval Conference (TREC) Legal Track, together with Professor Doug Oard and someone who is in our audience today, Dr. David Lewis. I also would be remiss not to give a shout out to Maura Grossman, who has done so much to advance the TREC Legal Track, first as a topic authority and then as co-coordinator starting in 2010, and who continues to plow fertile fields in the cause of research. The TREC Legal Track has not been the only effort in the space — Herb Roitblat, also here today, and his colleagues at the Electronic Discovery Institute have been similarly engaged in productive research. These efforts have all contributed to greater knowledge and awareness on the part of the e-discovery community of the importance of certain metrics, such as recall and precision, that have now made their way into opinions and even into evidentiary proceedings.
The findings of the TREC Legal Track have generated a bucket load of surprises. It certainly came as a surprise to me in the early years of the TREC Legal Track that it was so difficult for any one method to consistently match the results of a well-constructed Boolean string, and yet, paradoxically, that only something like 22% of relevant docs were found by a Boolean baseline (see the 2008 Overview paper discussing results from the second year of the track). In more recent years, the TREC Legal Track was able to establish that interactive hybrid models with topic authorities in the loop, as well as supervised learning methods, could do better in many cases than relying on simple Boolean solutions. It has been pointedly noted that rates of recall found in the TREC Legal Track have been highly variable, depending on the particular topic under study. This variability continues to be of great interest to me, as do the limits that have been established on what constitutes good recall and good precision. As I said, I am very happy that the legal community is waking up to basic information retrieval terms such as recall and precision and discussing their importance in particular litigation. However, to this casual observer, there do seem to be more than the usual number of misimpressions, mischaracterizations, and misuses of research findings floating around.
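For those less familiar with these metrics, here is a small worked illustration of how recall, precision, and F1 are computed from review counts. The numbers are invented purely for the arithmetic — they are not TREC results — though the hypothetical Boolean search below happens to land at the 22% recall figure mentioned above.

```python
# A worked illustration of the basic retrieval metrics discussed above.
# The counts are invented solely for the arithmetic; they are not TREC results.
def recall_precision_f1(true_positives, false_positives, false_negatives):
    recall = true_positives / (true_positives + false_negatives)     # share of all relevant docs found
    precision = true_positives / (true_positives + false_positives)  # share of retrieved docs that are relevant
    f1 = 2 * precision * recall / (precision + recall)               # harmonic mean of the two
    return recall, precision, f1

# Hypothetical: a Boolean search retrieves 35,000 documents, of which 11,000 are
# relevant, out of 50,000 relevant documents in the whole collection.
r, p, f1 = recall_precision_f1(true_positives=11_000,
                               false_positives=24_000,
                               false_negatives=39_000)
print(f"recall={r:.2f}  precision={p:.2f}  F1={f1:.2f}")   # recall=0.22  precision=0.31  F1=0.26
```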
The TREC Legal Track was never intended to be a competition, and it was never intended to be a standards-making exercise as such. It was an information retrieval evaluation exercise where academics and vendors in the legal industry came together to test hypotheses and see how well their assumptions stacked up in an approximate legal setting. There were academics participating who literally bid against themselves — tying their hands behind their backs, as it were, to unleash variable search methods with known limitations, where the goal was to see differences rather than to “win” at achieving the best F1 score.
Nor did the TREC Legal Track purport to establish that there is a particular metric or benchmark that must be met in litigation: say, 80% recall. Every litigation is different, and I never contemplated that what would emerge is a standard of that kind. It is enough that each litigation be transparent about what recall and precision have actually been achieved, as measured through sampling. If the TREC Legal Track helped to establish that foundation, I will declare victory.
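As one illustration of what that kind of transparency might look like, here is a hedged sketch of a common way to estimate achieved recall by coding a simple random sample of the collection. All of the counts are invented, the interval uses a crude normal approximation, and a real protocol would be negotiated by the parties and would likely use more careful statistics.

```python
# A hedged sketch of estimating achieved recall from a simple random sample of
# the collection. All counts are invented, and the interval below uses a crude
# normal approximation purely for illustration.
import math

collection_size    = 1_000_000   # documents collected
sample_size        = 2_000       # simple random sample coded by reviewers
relevant_in_sample = 100         # sample documents coded relevant
relevant_produced  = 38_000      # relevant documents actually identified and produced

prevalence = relevant_in_sample / sample_size            # estimated richness (here, 5%)
estimated_relevant_total = prevalence * collection_size  # roughly 50,000 relevant docs
estimated_recall = relevant_produced / estimated_relevant_total

# Rough 95% interval on prevalence, propagated to recall
se = math.sqrt(prevalence * (1 - prevalence) / sample_size)
low_prev, high_prev = prevalence - 1.96 * se, prevalence + 1.96 * se
recall_low = relevant_produced / (high_prev * collection_size)
recall_high = relevant_produced / (low_prev * collection_size)

print(f"estimated recall ~ {estimated_recall:.2f} "
      f"(roughly {recall_low:.2f} to {recall_high:.2f})")
```

Even this toy example shows why reported recall figures can vary so widely: a modest sample of a low-prevalence collection yields a fairly wide band of plausible recall values.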
In the scientific community, a degree of healthy skepticism exists over results obtained. Albert Einstein once famously declared: “Whoever undertakes to set himself up as a judge of Truth and Knowledge is shipwrecked by the laughter of the gods.” We should all strive to be humble when it comes to scientific proof (I would venture to guess that being humble is something even more difficult for some of us than being cooperative). We should all recognize that fair criticism is the norm in the scientific community — we should embrace this kind of criticism and learn from it. No one law review article or research project can possibly answer all questions definitively; rather, it is the cumulative impact of many such studies that contributes to the growth of knowledge, to the point that members of the judiciary can take some kind of notice of what are well-accepted findings in a particular area of scientific expertise.
The RAND Institute’s publication “Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery” — an important study of costs in document review and of the state of research and technology in advanced search — should be prime reading for everyone here. After undertaking a comprehensive review of the research projects that exist to date, Nicholas M. Pace and Laura Zakaras, writing for RAND, make two important points:
First, that “although no experimental setting to assess the relative qualities of human or computer-categorized review can be completely free of ‘unrealism and artificiality,’ the empirical evidence that is currently available does suggest that similar results in large-scale reviews would be achieved with either approach.” (RAND report, p. 66.)
And second, that with respect to how cost-effective predictive coding and similar technologies are, the RAND study says that “the answer is not entirely clear at the moment.”
Let me paraphrase what the study goes on to add: predictive coding is a nascent technology in the context of legal discovery, and there simply are not many data points to use when comparing this process with the way litigants currently conduct review of electronic documents. It is almost certainly true that predictive coding will save money over the classic model of well-paid associates carefully reading each page in document after document in large-scale productions. But any assessment of relative costs must compare predictive coding with enhanced eyes-on review — that is, human review in a contemporary document production that is assisted by near-duplicate detection, better management of workflow with quality-control procedures, email threading, pre-review sampling, the use of experienced vendors specializing in large-scale review, clustering analysis, and so on. See RAND, pp. 66-67.
However, the RAND report’s bottom line is that software-assisted review holds the potential for significant cost savings without compromising quality as compared with more traditional forms of human review.
Again let me be clear: I remain a strong cheerleader and advocate for advanced forms of search and document review. But there are dozens of open questions that remain for further research in this area, and I would caution against judicial short-cuts to findings that this area is essentially a “solved problem” for lawyers. We ain’t there yet. Some of the kinds of questions I have in mind:
- In what kinds of cases do software-assisted review methods work best?
- What kind of seed set process is optimum — random sampling or some kind of judgmental sampling based on keywords, using hot docs and privileged docs?
- How good are such software-assisted methods at stabilizing subsets of documents by issue tags?
- How good is the software in dealing with cryptic docs?
- How good can software-assisted methods be at really homing in on the hot documents of the most material nature to a given litigation?
These are hard problems in information retrieval, due to the intractability of language and the squishiness of the legal notion of relevance, but I have high confidence that they can at least be partially and successfully tackled through further research in the next five years.
My final theme for today is to renew a call for true standards — quality process standards — in this area, including through consideration of the adoption of a specific ISO 9001 standard or code of practice for e-discovery.
As discussed in The Sedona Conference Commentary on Achieving Quality in E-Discovery (2009), there is an emerging consensus that quality in e-discovery is fostered by such elements as sound project management of the process, coupled with the use of an array of statistical and industrial techniques grounded in quality assurance and quality control, including testing and sampling of results so as to ensure defensibility. However, there is no widely agreed-upon set of standards or code of best practices for how to conduct a reasonable e-discovery search or how to use technological methods generally to find relevant evidence. The Sedona Conference and others have, however, called on industry and academia to assist in the further development of standards for what constitutes “best practices” in the area of performing searches of electronic evidence.
In initiating a discussion about standards for what constitutes a “quality” process in e-discovery search, the DESI IV workshop held in Pittsburgh in June 2011 succeeded in bringing together academia and industry around the development of standards in this area. In a 2011 article I co-authored, “Evaluation of Information Retrieval in E-Discovery,” in the journal Artificial Intelligence and Law’s special issue on E-Discovery, my co-authors and I suggested that:
One broad class of approaches that has gained currency in recent years … is known as ‘process quality.’ Essentially … the important thing is that we agree on how each performer of E-discovery services should design measures to gain insight into the quality of the results achieved by their particular process. The design of their process, and of their specific measures, is up to each performer. Of course, some performers might benefit from economies of scale by adopting measures designed by others, but because the measures must fit the process and because process innovation should be not just accommodated but encouraged, forced convergence on specific measures can be counterproductive. So process quality approaches seek to certify the way in which the measurement process is performed rather than what specifically is measured.
The DESI IV workshop was intended to provide a platform for discussion of an open standard governing the elements of a state-of-the-art search for electronic evidence in the context of civil discovery. The dialog at the workshop ranged from a straightforward discussion of how to measure and improve upon the “quality” of existing search processes; to discussion of capability maturity models for e-discovery and the creation of a nationally or internationally recognized standard on what constitutes a “quality process” when undertaking e-discovery searches; to an even more ambitious discussion concerning creation of a type of standards “authority” or certification entity that could certify compliance with whatever standard emerges. Issues that were discussed included the potential benefits of a standard (e.g., reducing the need for evidentiary hearings on the reasonableness of a process) versus its potential costs (e.g., the risk of inhibiting innovation through early convergence on rigid standards).
I remain hopeful that we do not have to have endless evidentiary hearings concerning the use of new search and document review technologies. Indeed, my dream: future litigation in which proactive judges work with sophisticated litigants to ensure that parties do not engage in resource-intensive evidentiary disputes about search protocols and methods. Rather, parties will be able to reach agreement on a search protocol that measures up to an agreed-upon quality standard.
I have been carrying the torch of DESI IV in Pittsburgh toward a productive end. I am presently working towards the goal of forming a workgroup accredited by ANSI (the American National Standards Institute), aimed at codifying an e-discovery quality standard or code of practice for purposes of audits. I intend to work with representatives of ARMA, AIIM, The Sedona Conference, the ABA Science and Technology Section’s E-Discovery and Digital Evidence committee, and other interested groups, including one or more members of the Seventh Circuit pilot project, to ensure that such a code of practice is thoroughly thought through.
In the meantime, you are all invited to the next international DESI (Discovery of ESI) workshop, to be held as part of the AI and Law conference ICAIL 2013 in Rome, Italy, on Monday, June 10, 2013, where the subject is slated to involve standards for forms of supervised learning in the e-discovery and e-recordkeeping space. I also have it on good authority that The Sedona Conference WG6 meeting takes place the same week in Europe … so it’s a two-fer for everyone able to convince their firms and corporate types that they really, really are going to Europe to learn about standards in predictive coding and the like. Vi aspettiamo tutti in Italia l’anno prossimo! (See you all in Italy next year!)
I still consider myself a “seeker” in the area of designing better searches, but my views continue to evolve as to the difficulty of the problem presented and what constitutes a good question to ask. As I have previously said, I have come to learn that the wrong question to ask is whether a given method “beats Boolean.” The right questions: How does one go about designing an optimal process that produces a quality result? And are there ways to regularize or standardize that process so as to “certify” the result in a way that is defensible? This is as true of software-assisted methods as of anything else that has come before, or that will be developed in the next five years . . . let alone 43 years into the future (which, by the way, will be the year 2055, when I will be 99. We will see if I am still working at the National Archives by then.)
Thanks again so very much for the opportunity to share my views today with all of you.