Reports from the DESI III Global E-Discovery/E-Disclosure Workshop at ICAIL 2009 and The Sedona Conference® International Programme on Cross Border E-Discovery and Privacy
Ralph has once again graciously provided me with this forum to report on two e-discovery-related events held in the past month in Barcelona: the “DESI III” Global E-Discovery/E-Disclosure workshop at ICAIL 2009, and The Sedona Conference® International Programme on Cross Border E-Discovery and Data Privacy. Barcelona is of course home to the Modernist, Art Nouveau architect Antoni Gaudi (1852-1926), whose highly imaginative works are reflected throughout this posting along with scenes from Barcelona itself.
As faithful readers of this space know through Ralph’s various past blogs, I have been intensely focused for the past few years in fostering greater awareness on the part of lawyers on the subject of what constitutes a “reasonable search” for ESI in the context of civil litigation. What was obvious to me in 2003 remains obvious today: that a great need exists in the legal community to understand the problems and limitations of conducting keyword searches by traditional means, given the built-in inefficiencies involved, and that old ways of thinking about how to find relevant documents amongst an exponentially growing sea of ESI must be replaced by a new legal paradigm. Two of my chosen venues for “advancing the ball” in thinking about these problems have been through The Sedona Conference®, in volunteering as an Editor-in-Chief of the Best Practices Commentary on the Use of Search and Information Retrieval in E-Discovery (2007) and Commentary on Achieving Quality in E-Discovery (2009); and through founding the TREC Legal Track, now in its fourth year.
To a similar end, DESI III was the third workshop in a series of international gatherings aimed at bringing together lawyers and academics of various stripes, all interested in new ways of thinking about the problem of “search” and “information retrieval” in the legal domain. I am grateful to Ralph that he previously allowed me an opportunity to report on the DESI II workshop held in London in 2008 (see his past blog, “A Tale of Two ESI Forums”).
This year’s workshop was held in the Casa Convalescència, part of a still-functioning hospital complex known as the Hospital de la Santa Creu i Sant Pau, which is designated a UNESCO World Heritage site for the beauty of its architecture, especially its interior tile work. Leading off was a warm opening welcome by Dr. Hugo Zaragoza, of Yahoo! Research, who posed the question: why is it that linguistic knowledge has so little effect on relevance ranking in search engines? He went on to make the simple but profound point (repeated throughout the day) that the scientific community still finds it a very hard problem to find “meaning” in plaintext – even with everything accomplished to date in the world of information retrieval, natural language processing, and semantics. Simply put, computers do not “intuitively” know what to make of ASCII strings of text like “Pablo Picasso was born in Malaga, Spain” (which, if you think about it for a moment, is, at bottom, simply a sequence of ones and zeroes to the software that is trying to interpret it). Moreover, the information retrieval (IR) and AI crowds have a ways to go in coming up with reliable strategies for parsing the semantics of what we say. In my view, lawyers are best served when they are reminded of this underlying scientific reality.
Those of my fellow e-discovery travelers who chose to venture to Catalonia and to this lovely venue were then treated to what many agreed was a world-class keynote given by Dr. David Evans, of JustSystems Evans Research, out of Pittsburgh. We invited David Evans in light of his well-known ability to synthesize scientific concepts in a fashion intelligible to mere mortals (like lawyers), and he did not disappoint. The major themes of his talk, titled “Linguistic, Cultural, and Behavioral Dimensions of E-Discovery,” can be gleaned from a study of his detailed and extensive PowerPoints. Starting with a brief recounting of the findings of the TREC Legal Track that Boolean searching has been shown to miss substantial numbers of relevant documents, at least in certain contexts and queries, he went on to show the limitations of the “art” of searching in present-day e-discovery, which include lower than hoped for “recall” (i.e., the percentage of all responsive documents in the collection that a search actually finds); failure to rank items by higher or lesser relevance; and incomplete classification of ‘privileged’ documents. On top of all of that, in most cases there is little or no attempt to analyze foreign-language material, except in a quite superficial fashion.
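For readers who like to see the arithmetic, “recall” as defined above can be written out in a few lines of Python. This is only a minimal sketch: the document IDs and both sets are invented for illustration, and “precision” (the share of retrieved documents that are actually responsive) is added here as the usual companion measure rather than something taken from Dr. Evans’ slides.

```python
def recall(retrieved: set, responsive: set) -> float:
    """Share of all responsive documents that the search actually found."""
    if not responsive:
        raise ValueError("recall is undefined when nothing is responsive")
    return len(retrieved & responsive) / len(responsive)


def precision(retrieved: set, responsive: set) -> float:
    """Share of the retrieved documents that are actually responsive."""
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved & responsive) / len(retrieved)


# Hypothetical document IDs, for illustration only.
responsive_docs = {"d1", "d2", "d3", "d4", "d5"}  # what a perfect review would find
retrieved_docs = {"d1", "d2", "d9"}               # what a keyword search returned

print(recall(retrieved_docs, responsive_docs))     # 2 of 5 responsive documents found
print(precision(retrieved_docs, responsive_docs))  # 2 of 3 retrieved documents responsive
```

In practice the full responsive set is never known in advance, so recall has to be estimated, typically by reviewing a sample.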
Dr. Evans believes that we need to use realistic approaches to accommodate the “uncertainty” that exists in foreign-language translation exercises, and that Boolean expressions are necessarily limited in what they can do. He believes the bench and bar should be better educated about the problems involved in searching, and that they should have greater awareness of what he terms the “special needs” of cross-language information retrieval. He shared certain ideas he had about how to adjust one’s search strategy to take into account differing “information densities” in texts and other forms of ESI produced in foreign languages and cultures. His bottom line: “e-discovery practices should take account of the linguistic and cultural-behavioral contexts of companies and individual workers.” I haven’t done justice to all of what David Evans spoke about, but urge a look at his PowerPoints and would hope that he will be doing more speaking and writing on these themes in the months and years to come.
The workshop continued with a morning “research papers” session, with speakers reporting out the results of some form of original research, modeling, or the use of new tools and techniques. Representing two teams of researchers at Xerox and Xerox Europe, Svetlana Godjevac and Caroline Privault summarized their papers on “Machine Learning Classification for Document Review” and the desktop categorization and visualization product known as “DISCO.” Their combined emphasis was on greater use of clustering techniques and attorney coding and categorization, as a way to cut the ESI volume problem down to size and to produce more accurate results. Greater emphasis on clustering technologies as part of early case assessment appears to pay big dividends; based on anecdotal surveys of my private sector colleagues, however, my sense is that these techniques are only now coming into some fair measure of use by law firms. (For a modest paper providing a checklist of tips on the subject of early case assessment search, clustering and filtering tools, I refer the reader to a short piece that Ronni D. Solomon, a rising e-discovery star at King & Spalding in Atlanta, and I did up for The Sedona Institute earlier this year. Ralph has been kind to cite it previously in this space even if he didn’t care for our cute title.)
Next up were research papers by Dr. Hans Henseler of the Amsterdam University of Applied Sciences, on “Network-based filtering for large email collections in E-Discovery,” and Dr. Maria Esteva of the Texas Advanced Computing Center at the University of Texas, Austin, who submitted two papers for the workshop on behalf of herself and her colleagues, and who presented on “Finding Narratives of activities through archival bond in electronically stored information (ESI).” Hans Henseler, an expert in forensic technologies and techniques, and Maria Esteva, based on her background in archival theory and information science, each provided welcome additional perspectives on how to attack the problem of finding meaning in large collections of ESI, including through the use of novel forms of statistical methods and what is known as “social network analysis” (SNA). As defined by Dr. Henseler, SNA constitutes “the mapping and measuring of relationships and flows between people, groups, organizations, computers, web sites, and other information/knowledge processing entities.” His paper set out how e-mail filtering technologies using straightforward statistical methods may be married up with SNA in powerful ways to understand underlying patterns seen in the ESI data repository.
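To give a flavor of what “mapping and measuring relationships” can mean for an e-mail collection, here is a deliberately small Python sketch. It is not Dr. Henseler’s method: the names and messages are made up, and the only “measure” computed is how many distinct correspondents each person has, which is about the simplest SNA statistic there is.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipients) pairs standing in for parsed e-mail headers.
messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["alice", "dave"]),
    ("alice", ["bob"]),
]


def build_edge_counts(msgs):
    """Count how many messages flowed along each sender -> recipient edge."""
    edges = Counter()
    for sender, recipients in msgs:
        for recipient in recipients:
            edges[(sender, recipient)] += 1
    return edges


def correspondent_counts(edges):
    """Number of distinct people each person exchanged mail with, in either direction."""
    neighbours = defaultdict(set)
    for sender, recipient in edges:
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return {person: len(others) for person, others in neighbours.items()}


edge_counts = build_edge_counts(messages)
degree = correspondent_counts(edge_counts)

# List the best-connected people first; these might be reviewed earliest.
for person, count in sorted(degree.items(), key=lambda item: (-item[1], item[0])):
    print(person, count)
```

Even a crude count like this can suggest which custodians sit at the center of a conversation, and so where a filter might usefully be pointed first.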
For her part, Maria Esteva demonstrated how archival theory could be brought to bear on e-discovery problems, especially in her use of the term “archival bond,” a “fundamental concept in archival theory and practice” which “describes relationships between documents as an essential property of documents.” Dr. Esteva went on to report on her experiments in finding relationships in and across documents using a statistical method of “paragraph alignment,” inspired by bioinformatics, as well as what she called a graphical user interface (GUI) providing links in a similar manner to SNA as described above. I am grateful to both Hans Henseler and Maria Esteva for coming to the workshop, and given the importance of their collective work I will wish to collaborate with each in the future on ways to best utilize their techniques in connection with the real world of e-discovery issues that readers of this column face every day.
Rounding out the morning, Doug Oard and I presented a paper we worked on whose very promising lead author, Feng (Charlie) Zhao, unfortunately couldn’t join us at the workshop — as he was committed to finishing up his first year of law school back in Seattle (a second avocation he has pursued after already receiving a Ph.D. in Information Science)! The research experiments for the paper, “Improving Search Effectiveness in the Legal E-Discovery Process Using Relevance Feedback,” constitute, so far as I am aware, the first known attempt to actually quantify the efficacy of sampling, iteration and feedback loops in connection with lawyers’ exchanging and refining their Boolean search queries as part of the “meet and confer” process.
Our paper’s conclusion: it appears that statistically significant gains can be had if one adopts at least a two-time meet and confer process, where there is asymmetric knowledge of the actual contents of the ESI repository being searched (e.g., the paradigmatic case, where one side knows a great deal or is in a position to know about its ESI data set, and the other party is playing a version of “go fish” with much more limited knowledge). Assuming that parties are prepared to come to the table a first time to hash out the parameters of a search protocol, based on the responding party’s sampling results, the research – which concededly had to be conducted on a very limited universe of documents — showed that a second meet and confer to renegotiate (refine) the queries being used may well result in substantial gains in “recall.” But the research also found that further iterations were not as successful – that an asymptotic limit of sorts is reached after one re-negotiation round. Thus, our experiments do not support endless iterative loops – which should be music to the ears of judges who wish to put limits on the discovery process, and to opposing litigants (and their counsel) who really don’t wish to spend more time with each other anyway!
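The idea of a second, better-informed negotiation round can be illustrated with a toy Python sketch. To be clear, this is not the experimental setup or the data from our paper: the five “documents,” the responsive set, and both queries are invented, and the Boolean query is reduced to a simple OR over terms.

```python
# Hypothetical five-document collection, for illustration only.
docs = {
    "d1": "merger agreement draft for review",
    "d2": "notes on the acquisition timeline",
    "d3": "lunch menu for friday",
    "d4": "takeover bid discussed with counsel",
    "d5": "merger press release",
}
responsive = {"d1", "d2", "d4", "d5"}  # assumed ground truth


def or_query(terms):
    """Return IDs of documents containing at least one of the terms."""
    wanted = set(terms)
    return {doc_id for doc_id, text in docs.items() if wanted & set(text.split())}


def recall(retrieved):
    """Share of the responsive documents that a query retrieved."""
    return len(retrieved & responsive) / len(responsive)


# Round 1: the requesting party guesses at terms without seeing the data.
round1 = or_query(["merger"])

# Round 2: after the responding party shares sampling results, both sides
# agree to add the synonyms that the sample turned up.
round2 = or_query(["merger", "acquisition", "takeover"])

print(recall(round1))  # half of the responsive documents
print(recall(round2))  # all of them, in this toy collection
```

The jump between rounds here is built into the toy data; the point is only to show the mechanics of refining a query once someone with knowledge of the collection has weighed in.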
The findings of our paper provide an empirical basis to support more anecdotal observations that I and others have been making for some time about the importance of exploiting sampling, iteration, and feedback loops – starting with my co-authored article with George L. Paul, “Information Inflation: Can The Legal System Adapt?,” 13 Rich. J. Law & Tech. 10 (2007) (as also recounted in one of Ralph’s prior blogs), and most recently discussed at greater length in the previously mentioned Sedona Commentary on Achieving Quality in E-Discovery. These findings also dovetail with the increasing chorus of judicial voices (e.g., Judge Grimm’s decision in Victor Stanley v. Creative Pipe; Judge Peck’s in William A. Gross Construction; the D.C. Circuit’s now infamous In re Fannie Mae Litigation case, Judge Facciola’s O’Keefe and Equity Analytics decisions, etc.) who have been at various turns nudging, prompting, and urging parties to talk to each other about the parameters of the search of ESI to be conducted – and scolding them when they have failed to do so (or been oblivious to the prospect of doing so). And, of course, negotiating search protocols is one of the major planks of The Sedona Conference® Cooperation Proclamation, about which I will have more to say below.
Lunch consisted of a lively discussion of themes from the day, carried out so as to mix lawyers, vendor types, and academics at each lunch table, with reports back from each of the table facilitators at the end of the day on how we can all best make progress in the area of search and information retrieval. My thanks to Conor Crowley of Daley Crowley LLP, Jan Puzicha of Recommind Inc., Stuart Rennie, Esq. (our sole Canadian representative, from Vancouver), Johannes Scholtes, ZyLab North America LLC, and Sonya Sigler, Cataphora, for assisting in leading those discussions and participating in the end-of-day formal panel.
Space does not permit a detailed recitation of the afternoon sessions’ papers, except to say that I urge readers of this piece to go to the DESI III page and follow the link to “Agenda, Papers, and Bios,” where the full range of 25 or so accepted workshop papers, many accompanied by PowerPoints from presenters, are available online. I do, however, wish to go out of my way to thank additional colleagues who were so gracious in finding the time to work up papers and presentations and come the long way over to Spain, including K. Krasnow Waterman of LawTechIntersect LLC, Simon Attfield of University College London (co-organizer of DESI II), Bill Butterfield of Hausfeld LLP, David Chaplin of Kroll Ontrack, Jorge Román and Shelly Spearing of the Los Alamos National Laboratory, Bob Bauer and Chris Hogan, both of H5 (who together submitted two papers that were accepted at ICAIL and the DESI III workshop), and last but not least, my colleague, Doug Oard, at the University of Maryland, who graced us with some further remarks on cross-language search for e-discovery.
The organizers would be remiss in not also thanking Dr. Giovanni Sartor, Marie-Curie Professor of Legal Informatics and Legal Theory, at the European University Institute of Florence, for agreeing to provide some brief remarks as a representative of the AI+Law community. The workshop was a great success through the combined efforts of all of the named individuals above, plus those who submitted papers which were accepted but who could not be present (including a special mention to Marissa Caylor, Class of 2010 at Boston U. Law School, my law school alma mater, who agreed to co-author with me a short piece on “Searching for ESI in the EU: Some Rules of the Road for the European Data Controller”).
Especially warm thanks are due to my fellow DESI III workshop co-organizers Jack Conrad and Dr. Marc Light, both of Thomson Reuters Research & Development in Minneapolis; Dr. Kevin Ashley, Professor of Law and Intelligent Systems at the University of Pittsburgh School of Law; and Debra Logan, of Gartner Inc. (who resides in and blogs from London on e-discovery), for all of their behind-the-scenes work necessary to put on a successful workshop, including reviewing this year’s research and position paper submissions. I would like to take the opportunity here to single out Jack Conrad for fostering the connection with ICAIL – he both initially encouraged me to propose the DESI I workshop in Palo Alto as part of ICAIL 2007, and in turn spent considerable time and effort this year in ensuring the Barcelona workshop’s success as part of ICAIL 2009. I am also enormously grateful to Kevin Ashley, both for all of his past support and efforts, as well as for his announcement from the podium at this year’s workshop that he will be inviting selected participants from the DESI workshops to revise and expand upon their remarks, to be part of a special issue of the Artificial Intelligence and Law Journal coming out in 2010 devoted to e-discovery. His own co-authored position paper at the workshop on “Emerging AI+Law Approaches to Automating Analysis and Retrieval of ESI in Discovery Proceedings” is a mini-gem, well worth spending the time to read, and one which I trust will form a kind of compass piece to the issues expected to be taken up in the AI+Law Journal special issue.
The Sedona Conference on Cross-Border and Privacy
After a day off touring the Mediterranean coast north of Barcelona with the family (I recommend to anyone a visit to the outrageous Salvador Dali Museum in Figueres, as well as a side-trip to the gorgeous seaside town of Cadaques), I returned to Barcelona in time for The Sedona Conference’s two-day program on international cross-border e-discovery and privacy, held at the Hotel Arts (next to a giant sculptured whale).
I first wish to say that I am so very grateful to Richard Braman and Ken Withers, as well as to Jim Daley, who acted as chair of the Sedona event, for electing to co-locate Sedona’s international programme in Barcelona during the same week as the DESI III workshop. Doing so certainly fostered and encouraged the attendance at both of many of my friends and colleagues, themselves e-discovery luminaries, including, in addition to Ken Withers himself, Maura Grossman of Wachtell, Lipton, Rosen & Katz; Jeane Thomas of Crowell Moring; Ted Barassi of Symantec; Deborah Baron, Autonomy; Chris May, ieDiscovery; Adam Bendell of FTI Consulting; and Macyl Burke, ACT Litigation, among others. I also wish to say that I cannot do justice in this space to reporting on the full richness of the papers and presentations set out at the conference, but only wish to highlight a few points.
The Sedona Conference programme included invitations to a number of high profile members of the EU and UK judiciary, including Dr. Alexander Dix, Commissioner for Data Protection & Freedom of Information in Berlin, Germany, who gave the Day 1 Keynote: “Cross-Border Discovery Conflicts – A Way Forward.” In his remarks, Dr. Dix went out of his way to emphasize how The Sedona Conference® has been a leading light in the development of commentaries and conferences aimed at clarifying the difficult issues involved. Participants also had the opportunity to hear from, among others, Dr. Waltraut Kotschy, Chairwoman of the Austrian Data Protection Commission, Dr. Hiroshi Myashita, Office of Personal Information Protection, Tokyo, Japan, and the Hon. Simon Brown, QC, Specialist Mercantile Judge, Birmingham Civil Justice Centre, Birmingham, England, to name just a few of a wide variety of speakers. I also would be remiss in failing to note what a “hoot” it is to be a participant in a room where the lively Sandra Potter, of Potter Farelly & Associates, of Melbourne, Australia, is moderating a panel.
Master Whitaker, Senior Master, Queen’s Bench Division, London, England, gave the Day 2 Keynote entitled “UK eDisclosure Developments As A Study in Contrast With US E-Discovery.” The title of his talk notwithstanding, I was struck again, as I was last year hearing him at the 2008 MIS International Conference on Digital Evidence chaired by Stephen Mason, by how similar the experiences of US and UK lawyers and judges actually are, at least in terms of what it evidently takes to be “getting up to speed” on the subject of constructing a reasonable search protocol. Indeed, Master Whitaker began his talk by putting forward the rhetorical proposition that for many lawyers in the U.K., it would seem perfectly acceptable to target one’s search of ESI using search terms of one’s own devising, without any prior discussion of one’s opponent’s thoughts. He went on to say that many would sign on to a second proposition, more or less, that a “reasonable” search is whatever one actually does in carrying out a search, as instructed by the client.
But, as he went on at great length to point out, in light of Digicel (St. Lucia) Ltd & Ors v. Cable & Wireless & Ors,  EWHC 2522 (Ch), those propositions simply do not hold any longer. Digicel, in which the U.K. judge parsed virtually every search term in a Boolean query to decide whether it or a variant should be included in the proposed search, is a watershed kind of decision, much like Victor Stanley and others in the U.S., after which it can no longer be said that the bench and bar are without notice of the limitations of search. I know that Chris Dale has spent considerable time on his own blog discussing the implications of Digicel, and I would commend his efforts for further reading. Master Whitaker spent the remainder of his time discussing a questionnaire that he has had a role in developing, which, much akin to local rules requiring special formatting of Rule 16 or 26 joint reports or submissions in the U.S., plays the role of directing parties to think about a variety of ESI-related issues in their cases.
I will end this blog with a thought based on a question I posed to Master Whitaker, which was in turn picked up as a theme addressed by the Hon. David Waxse, US District Court Magistrate Judge in the District of Kansas, near the end of the programme: To what extent would it make sense for the bench and bar in the U.K., or in the international legal community as a whole, to take up the cudgel in “signing on” to The Sedona Conference® Cooperation Proclamation? I understand that the current document may be U.S.-centric, but it could easily be “internationalized” in a more generic form.
It was obvious to some of us sitting in the conference room in Barcelona that, at least in some places around the globe, pushing this particular idea could be seen as both foreign and gratuitous. (We all were made to understand, for example, that talking to the other side during a legal proceeding in Spanish courts is practically unheard of!) And yet, in light of Digicel and the overall tenor of Master Whitaker’s remarks, I can’t help but think that fostering greater cooperation is one of the keys to being smart in the digital age – that the nature, volume, and complexity of ESI fairly demand it, no matter what country one is in and which judicial system may control. I would certainly applaud efforts to “internationalize” the Cooperation Proclamation along these lines. Some tapas for thought. Anyone interested in continuing the discussion of why greater cooperation amongst lawyers can work in other countries, feel free to comment in this space, or drop me a separate line (firstname.lastname@example.org).