Guest Blog by Jason R. Baron
Ralph has once again turned over his blog to me to report on what transpired at the DESI V Workshop with the latest iteration of this workshop series held in Rome on June 14, 2013, as part of the 14th International Conference on Artificial Intelligence and Law (ICAIL 2013). DESI V was focused on standards for using predictive coding, machine learning, and other advanced search and review methods in e-discovery, with a high degree of tolerance on the part of the organizers in accepting interesting e-discovery papers across a range of nominally related topics.
Over the past several weeks Ralph has very kindly re-published (with his own extensive commentary) two papers first presented at the workshop, including a paper written by Jesse Freeman and myself on support vector machines, and a paper by Larry Chapin, Simon Attfield, and Efeosasere Okoro, on predictive coding, storytelling and God. I will leave it to close readers of Ralph’s blog to decide which is the more difficult to decipher: God’s mysteries or the mysteries of support vector machines, but I digress. In any event, all of the workshop papers can be found at this link, and I encourage readers of Ralph’s blog to go to the original source documents to take in the full measure of the proceedings.
When Doug Oard and I conceived of the DESI (Discovery of ESI) series of workshops in Palo Alto in 2007, I had in mind a forum that would essentially bring Picasso and Einstein together for dinner: i.e., academics with PhDs in Information Retrieval and Artificial Intelligence mixing it up with lawyers and legal service provider reps, all in an effort to advance the ball on how to introduce more advanced and sophisticated search and review methods into the ediscovery space. Doug and I were initially invited to do so as part of the ICAIL series of conferences, by Jack Conrad of Thomson Reuters, and I remain in Jack’s debt for inviting the DESI crowd to these biennial proceedings.
Since 2007, we have been fortunate to travel to Barcelona, Pittsburgh and now Rome as part of the ICAIL conference, with a side trip along the way for DESI II in London (hosted by Simon Attfield, then at University College London). Doug and I were also two of the organizers of a SIRE workshop as part of SIGIR 2011 in Beijing. For the DESI series we have been assisted by other committed co-organizers along the way, including this year Jack Conrad, Dave Lewis, Debra Logan, and Fabrizio Sebastiani. Fabrizio is a highly respected senior researcher at the Institute for the Science and Technologies of Information in Pisa (a part of the National Research Council of Italy), who gave a keynote talk and otherwise ably functioned as our all-around local host.
I am happy to say that the combined oeuvre of DESI workshop research and position papers now tops 85 or so over five iterations (see the DESI V home page for links back to prior workshops). A number of the research contributions have been cited in such places as the RAND Study by Nicholas M. Pace and Laura Zakaras, entitled “Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery” (2012). (In my opinion the RAND Study continues to constitute one of the best general survey introductions to the subject of predictive coding.) We also produced a special issue of the Artificial Intelligence and Law journal on e-discovery with several fleshed-out versions of papers first presented at the DESI workshops. It is certainly my hope that the workshop series continues on as part of ICAIL, as it continues to provide an outlet for creative thinking in the ediscovery space.
One example of such thinking has already been highlighted by Ralph in last week’s guest blog on the role of narrative, in the aforementioned paper by Chapin, Attfield and Okoro. I am in complete agreement with Ralph (and with the authors) that a key skill in trial lawyering is learning to tell a better, more compelling story than your opposing counsel. The interesting question for trial lawyer types remains however: in a world of big data getting only bigger, how does one reliably and efficiently utilize the resources at hand to tease out that story, that narrative? I am convinced that we as lawyers are on the cusp of utilizing vastly more complex visual analytics to assist us in constructing those narratives, as well as getting at the truly most important documents that function as the signal through all the data noise out there.
What follows is an incomplete and somewhat breezy summary of just a few of the papers presented at the workshop. My aim is a modest one: to whet the reader’s appetite sufficiently so that she or he peruses the original papers themselves for more in-depth study.
At the suggestion of Jack Conrad, we had two keynotes: a technical one from a PhD’s perspective, and a lawyerly one reporting on developments in case law and practice. The opening keynote by Fabrizio Sebastiani was entitled “Utility Theory, Minimum Effort and Predictive Coding.” The central thesis: that human-in-the-loop methods of predictive coding can be optimized more effectively by approaching the task from the perspective of utility theory. In Fabrizio’s words (which I have cobbled together from his PowerPoint slides):
We need methods that, given a desired level of accuracy, minimize the assessor’s effort necessary to achieve it; alternatively, given an available amount of human assessors’ effort, maximize the accuracy that can be obtained through it. This can be achieved by ranking the automatically classified documents in such a way that, by starting the inspection from the top of the ranking, the cost-effectiveness of the annotators’ work is maximized. We call the task of generating such a ranking Semi-Automated Text Classification (SATC). Previous work has addressed SATC via techniques developed for ‘active learning.’ In [these] cases, the automatically classified documents are ranked with the goal of having the human annotator start inspecting/correcting from the top; however, in active learning the goal is providing new training examples, [whereas] in SATC the goal is increasing the overall accuracy of the classified set.
Fabrizio went on to provide two intuitive insights that he wished to test in further research: first, that documents that have a higher probability of being misclassified should be ranked higher; and second, that documents that, if corrected, bring about a higher gain (i.e., have a bigger impact on the overall recall and precision metric being used to evaluate results) should also be ranked higher. Or, using concepts borrowed from utility theory, documents that have a higher utility (based on the probability of gain in impact) should be ranked higher. He reported what he termed “promising” experimental results based on these insights for maximizing error reduction (i.e., the increase in the accuracy of coding) given a fixed amount of annotation effort. Note: Fabrizio’s keynote was based on a recent technical paper he wrote with Giacomo Berardi and Andrea Esuli, A Utility-Theoretic Ranking Method for Semi-Automated Text Classification.
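For the technically inclined, the core ranking intuition can be sketched in a few lines of Python. This is my own illustrative gloss, not the authors’ actual algorithm: it assumes each document carries a classifier confidence score and a precomputed estimate of the gain from correcting its label, both of which are hypothetical inputs here.

```python
# Illustrative sketch only: ranking auto-classified documents for
# human inspection by estimated utility. The 'p_pos' and 'gain'
# fields are assumed, precomputed inputs, not part of any real API.

def utility_rank(docs):
    """Order documents so the most cost-effective ones to inspect
    come first: utility = P(misclassified) * gain-if-corrected."""
    def utility(doc):
        p = doc['p_pos']  # classifier's estimated probability of relevance
        # A document near the decision boundary (p ~ 0.5) is the
        # most likely to have been misclassified.
        p_error = 1.0 - abs(p - 0.5) * 2.0
        return p_error * doc['gain']
    return sorted(docs, key=utility, reverse=True)

docs = [
    {'id': 'a', 'p_pos': 0.95, 'gain': 1.0},  # confident call: low utility
    {'id': 'b', 'p_pos': 0.55, 'gain': 1.0},  # borderline: high utility
    {'id': 'c', 'p_pos': 0.50, 'gain': 0.2},  # borderline but little impact
]
ranked = utility_rank(docs)
```

The borderline, high-impact document rises to the top of the inspection queue, which is the whole point: the reviewer’s fixed budget of attention goes where correcting an error pays off most.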
What Fabrizio may not have anticipated is that some of the lawyers in the room were loaded for bear, with a lively question period following. Notwithstanding the cross-examination, I do hope that he and his colleagues in Europe continue to think about ways in which their own research intersects with the interests of the greater e-discovery community, as their work will be viewed as important and very much welcomed.
Bill Butterfield and Conor Crowley presented a second keynote, talking about their paper Reality Bites: Why TAR’s Promises Have Yet to Be Fulfilled. In their paper, Bill and Conor provide a comprehensive summary of the case law on predictive coding to date, and as one would expect from the current Chair and Vice Chair of Working Group 1 of The Sedona Conference, they believe the approaches taken by parties in such cases as Da Silva Moore and In Re Actos:
…strongly suggest that, despite the deference to the producing parties’ choices of review tools and procedures noted in [such cases] . . . transparency and cooperation are more likely to promote both judicial and party acceptance of TAR [technology assisted review]. Both cases predicated their approval of TAR on significant transparency – to the court and the requesting party – and on significant involvement of the requesting party in the TAR process.
Bill and Conor’s paper goes on to acknowledge and discuss at some length the objections to transparency that continue to exist in the profession, based on privilege doctrines and what they call “litigation realities.” These include the notions that revealing methods ends up revealing protected work-product; that discovery about discovery is beyond the scope of Rule 26; that revealing non-responsive documents exposes parties to unnecessary litigation risk; and that no greater assurances are required or warranted of producing parties once they have conducted a reasonable search.
The paper takes on these objections and discusses some of the strategies for nevertheless achieving a greater level of transparency than under older models. Bill and Conor’s bottom line:
As lawyers and judges become better educated about the processes needed to employ TAR effectively, agreement about the specifics of TAR protocols should become easier to achieve. This should reduce the costs associated with reaching agreement on TAR protocols and encourage wider adoption of TAR in civil discovery.
A side note: remarkably, Bill appeared at the workshop — as did Amanda Jones later in the day — after having travelled on a plane overnight, and after weather delays cancelled earlier flights. My hat’s off to both Bill and Amanda (and others who experienced travel issues) for being such troopers in making a supreme effort to come to the workshop, at what I know was some personal cost to their health and welfare!
Manfred Gabriel presented a research paper, co-written with Chris Paskach and David Sharpe, The Challenge and Promise of Predictive Coding for Privilege. In the words of Amir Milo (who had been tapped to comment on the various research papers as part of the panel), the paper:
… does a good service to the community by presenting the problem, explaining the complexity of it, presenting a direction to a solution using machine-learning technology and providing some preliminary results of implementation.
Readers of this blog will benefit from the many practical tips given in the paper when approaching the problem of coding for privilege.
Another research paper, Soft Labeling for Multi-Pass Document Review, presented by Jean-Michel Renders on behalf of himself and fellow authors Jianlin Cheng, Amanda Jones, and Caroline Privault, addressed two problems: (i) binary tagging (relevant vs. non-relevant) is too strict, and (ii) humans make mistakes in performing tagging. The authors propose to use soft labeling (some factor between 0 and 1) to compensate for these types of errors. Amir commented that the first problem is very hard to solve: for example, how would you mark a long document with a single “smoking gun” paragraph versus a short document that is slightly relevant? And how would you tag a document that some reviewers consider “very” relevant versus a document that all reviewers see as “somewhat” relevant? Amir went on to say that, on the other hand, the second problem of human error could truly benefit from the proposed approach, and that he would be interested to see more research and testing done on it, beyond what has already come out of the TREC Legal Track and other studies.
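To make the soft-labeling idea concrete, here is a minimal sketch of my own, not the authors’ method: rather than forcing a hard yes/no, several reviewers’ binary tags are blended into a label between 0 and 1, which preserves the disagreement (and hedges against any single reviewer’s mistake) when the document is later used for training.

```python
# Illustrative sketch of soft labeling; function name and scheme
# are my own, not drawn from the paper.

def soft_label(reviewer_tags):
    """Blend several reviewers' binary tags (0 or 1) into a soft
    label in [0, 1], instead of forcing a single hard tag."""
    return sum(reviewer_tags) / len(reviewer_tags)

# Three reviewers split 2-1 on a document: the soft label records
# the disagreement rather than discarding the minority view.
label = soft_label([1, 1, 0])
```

A single mistaken tag among several reviewers nudges the label rather than flipping it outright, which is exactly the human-error problem Amir thought this approach could help with.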
Thomas Barnett presented a research paper entitled Similar Document Detection and Electronic Discovery: So Many Documents, So Little Time, authored by Michael Sperling, Rong Jin, Illya Rayvych, Jianghong Li and Jinfeng Yi. As Amir summarized it, the paper presents a novel solution to the problem of finding near-duplicate documents in a large collection. Their approach is based on a vector with 62 entries (all letters and digits) that serves as a document “fingerprint,” paired with a sophisticated method for finding “close” vectors; the approach is very efficient and performs well. Amir went on to say that he saw two lines of further research raised by the paper: first, the approach is most effective when looking for documents that are very similar (95% or above), but the desired threshold in e-discovery can be much lower, in the 60% range or so. Second, the approach works for languages with relatively small character sets, like English (with 26 letters), but some languages (e.g., Japanese, Chinese, Korean) have much larger character sets.
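For readers curious what a 62-entry fingerprint looks like in practice, here is a toy sketch of my own: the vector simply counts occurrences of each letter and digit, and a cosine measure scores how “close” two fingerprints are. The paper’s actual technique for efficiently finding close vectors across a large collection is more sophisticated than the brute-force comparison shown here.

```python
# Illustrative sketch only: character-frequency fingerprints and a
# brute-force cosine comparison, not the authors' actual method.
import string
from math import sqrt

ALPHABET = string.ascii_letters + string.digits  # 52 letters + 10 digits = 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def fingerprint(text):
    """Build a 62-entry vector counting each letter/digit in the text."""
    vec = [0] * len(ALPHABET)
    for ch in text:
        if ch in INDEX:
            vec[INDEX[ch]] += 1
    return vec

def cosine_similarity(a, b):
    """Score closeness of two fingerprints: 1.0 = identical profile."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Two near-duplicate documents produce nearly identical character profiles and score close to 1.0, while unrelated documents score much lower; Amir’s point about Japanese, Chinese, or Korean is that the equivalent vector there would need thousands of entries rather than 62.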
Dan Regard gave us a provocative take on the Blair and Maron study, the “ur” document that has been repeatedly cited by courts and commentators alike as showing that lawyers find only about 20% of relevant documents through traditional keyword approaches. In A Re-Examination of Blair & Maron (1985), Dan teases out the point that the information task that formed the basis of the study was different in kind from the task that we are presently trying to perform in connection with a standard ediscovery search: what Blair and Maron reported were results from lawyers essentially conducting the equivalent of early case review, by searching for documents that they believed would be useful or relevant – with the key point being that the initial effort on the part of the lawyers was not in response to a pending discovery request. I agree with Dan that this is an important distinction worth taking into account, and I believe his paper helps to get our historical perspective on the study right. (David Blair passed away a year or so ago, to the sorrow of us all. I had a chance to talk to him once a few years back at the urging of Macyl Burke, and so had a moment where I shared the initial TREC Legal Track results crediting him with helping to inspire a next generation of efforts to quantify ediscovery.)
Given the title of the workshop, I would be remiss in not calling out for attention two additional position papers which bravely tackled issues concerning the creation of “standards” for predictive coding head on. Both Gilbert S. Keteltas in his paper, Bridging the Technical and Legal Divide: Information Retrieval Process Quality Standards for Counsel, and Dan Brassil, on behalf of co-authors Bruce Hedin and Christopher Hogan, in the paper “Toward A Meaningful E-Discovery Standard: On the Place of Measurement in an E-Discovery Standard,” explore at further length themes first presented at the DESI IV workshop in Pittsburgh. These include the question of whether standards, including a possible international standard (ISO), represent a worthy endeavor in the e-discovery space.
In Gil’s view:
Search process quality standards will not gain wide acceptance if they are not understood by, and part of the work flow of, the day-to-day participants in the legal process. Standards focused on the certification of technology and technology providers are needed, but should be paired with standards that define a defensible path to satisfying [counsel’s] professional obligations to conduct a reasonable, comprehensive search in response to a Rule 34 request for documents and ESI. (internal quotations omitted).
Gil goes on to make the lively point that:
Technology and service providers, you are eager to have your products and services certified. Lawyers worldwide are breathless at the prospect of newly-minted marketing brochures, email signatures and press releases appropriately publicizing, communicating and promoting your certification. But search process quality standards that give counsel – the users of e-discovery technology and services – a path to defensibility, will do far more “to facilitate adoption of e-discovery products and services and accelerate the development of still more effective ones” (quoting the Hedin, Brassil, and Hogan paper at the workshop).
Gil’s paper goes on to discuss how we can build the kind of bridges needed between the legal and technology communities so as to contemplate the development of appropriate standards.
Dan Brassil, on behalf of his fellow H5 coauthors, also presented on the subject of standards. His paper argues that:
[T]he potential benefits of a standard can be realized only if the standard addresses the central question potential consumers have when evaluating an e-discovery product or service: how accurate are the results? (Emphasis in original.) That question will be addressed only if the standard makes provision for the statistically sound measurement of the effectiveness of the review/retrieval function of an e-discovery system.
Dan went on to make clear in his talk and in the paper that he was not arguing for any specific level of recall and precision in any given proceeding, just that the parties adopt a metric that is defensible.
I have given short shrift to these papers, and have omitted other important ones, in the interest of space and time, but I urge readers to dive more deeply. The subjects that I have chosen to emphasize above are among the ones I personally find of the most professional interest, but I have only scratched the surface of all that was presented.
And what about the pizza, you say? The camaraderie at the DESI workshops was fostered by two dinners, pre- and post-workshop, that we arranged for interested participants. Jack Conrad’s photo below captures the mood of some of the pre-workshop crowd.
Many of us were able to extend our time in Italy for any number of days, with yours truly visiting Florence, Pisa, Lucca, Siena and San Gimignano, to name just a few stops enjoyed with family. Others in the crowd happily went on to Zurich to take part in the Sedona WG6 meeting.
ICAIL conferences alternate between non-European venues (e.g., in recent years US east & west coasts) and European ones, and so we may well be somewhere back in the US in June 2015 (or not). I hope that I have made the DESI forum sound sufficiently intriguing that at least some of Ralph’s thousands of readers & fans (not to mention followers on Twitter), will come join the next iteration of the workshop. Maybe we can even get Ralph himself to come next time!