By: Jason R. Baron

Ralph has given me the honor of filling in for him this week with my report from two fascinating ESI-related forums I had the pleasure to be a part of while in London during the last week of June. As a charter member of the Ralph Losey e-discovery team fan club, let me say right off the bat that I am delighted to be a guest blogger here; it is certainly my hope that, in agreeing to fill in, I have provided Ralph with a much-needed July 4 break from his self-imposed, Atlas-like workload of voluntarily turning out, week in and week out, a year-round e-discovery blog of such unsurpassed quality!


On June 25, 2008, the Second International Workshop on Supporting Search and Sensemaking for Electronically Stored Information In Discovery Proceedings (affectionately referred to here as DESI II) was held at University College London, hosted by UCL Interaction Centre, a leading center for research into human-computer interaction. As described on the DESI II workshop main page, the workshop grew out of the first DESI Workshop at the ICAIL 2007 conference last June in Palo Alto, where participants from five continents and from multiple research disciplines for the first time gathered together to discuss challenges presented by e-discovery, with a focus on dealing with large collections of electronically stored information (ESI) during litigation and regulatory investigations.

This year, the aim of the second DESI workshop was to bring together legal practitioners, service providers and researchers to develop the research agenda further, to elaborate a European perspective on these issues, and to consider how developments in areas such as information retrieval, artificial intelligence, and human-computer interaction might be brought to bear on key problems faced in a variety of legal settings in the U.S. and around the globe.

I am ever so grateful to Dr. Simon Attfield, Senior Research Fellow at UCL, for agreeing to take on all of the administrative heavy lifting for running this year’s workshop, and to Dr. Ann Blandford, Professor of Human-Computer Interaction and Director of the UCL Interaction Centre, for her role in facilitating the hosting of the workshop at UCL. My thanks as well to my colleague Dr. Douglas W. Oard, Associate Dean of Research at the University of Maryland’s College of Information Studies (and one of the co-founders of the TREC Legal Track), and to Stephen Mason, a London barrister and Associate Senior Research Fellow at the Institute of Advanced Legal Studies, whose original invitation to have me speak at his International Conference on Digital Evidence (see below) inspired me to propose holding a second workshop in the first place!

The format of the past two DESI workshops has been truly unique: bring together 40 to 50 expert individuals from a cross-spectrum of disciplines, including information retrieval, artificial intelligence, computer science, linguistics, legal technology, and psychology, and have them interact in a classroom setting for an entire day to discuss how to go about improving search and information retrieval as practiced in legal settings, including e-discovery. (Imagine inviting Picasso, Freud, and Einstein to a dinner party, and stepping back to see what happens.) Thanks to a format heavily inspired by Doug Oard, the workshops have been highly interactive: for each of two or more panel sessions, speakers are routinely limited to 10- or 15-minute presentations of their submitted papers; a “discussant” comments on the papers that have been presented in each of the panel sessions; and a moderator then encourages audience feedback. At lunch, all participants break into small groups (this year consisting of language processing, information retrieval, sensemaking, vendors, and barristers), with group leaders assigned in advance to foster discussion and report back at the end of the day what the consensus views of the group turned out to be.

This year’s workshop featured 10 submitted position papers from academics and legal service providers from the U.S., U.K., and Japan (all available on the DESI II workshop page), plus an opening keynote from Jeane A. Thomas, a partner at Crowell & Moring and head of the firm’s e-discovery group in their Washington, D.C. offices. Jeane, whom I know well through The Sedona Conference®, the Georgetown Advanced E-Discovery Institute, and other forums we have appeared in, graciously accepted an invitation to provide an overview of the problems lawyers face in the area of conducting searches. Jeane started off her keynote using almost the same words as I used when I introduced myself to Doug Oard a few years ago, namely: I’m a lawyer, I’m not an expert in technology, and I have problems. Jeane proceeded to summarize the ESI challenge facing lawyers in the U.S. and arguably around the world, namely: an increasing volume and complexity of potentially relevant documents and evidence that need to be accounted for; the need for better tools through future advances in technology to adequately carry out the search function; and the fact that technology by itself is not the answer – that lawyers themselves need to think more strategically about how to conduct reasonable discovery. Jeane went on to emphasize the increasingly international dimension to her antitrust and complex litigation practice.

Following Jeane, Doug gave an overview of this year’s TREC Legal Track, and I provided some brief remarks updating the audience on the emergence in the past year of U.S. case law recognizing the importance of search and information retrieval in e-discovery, including Judge Facciola’s decisions in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 (D.D.C. 2007), and United States v. O’Keefe, 537 F.Supp.2d 14 (D.D.C. 2008), and Judge Grimm’s recent decision in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008). Also new in the past year is the publication of The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, which was cited in both the O’Keefe and Creative Pipe cases as providing guidance to practitioners in strategically thinking about search issues.

Although I could go on at length about many of the very good papers presented about common search issues of interest, I was particularly struck by the remarks of Bob Bauer from H5, as well as by my workshop co-organizer Simon Attfield, on the relevance of “sensemaking” research to e-discovery. Until these DESI workshops, I frankly had been oblivious to the considerable body of academic research that exists on the subject of “sensemaking,” or how that research could be applied in a practical way to the e-discovery process. As Bob Bauer quotes from a recent paper by three Google academics presented in turn at a recent CHI workshop in Florence, “Sensemaking is simple – it’s the way people go about their process of collecting, organizing and creating representations of complex information sets, all centered around some problem they need to understand.”

The thesis Bob presents through his jointly authored paper, entitled “Automated Legal Sensemaking: The Centrality of Relevance and Intentionality,” is straightforward: 15 years of sensemaking research should be brought to bear to establish a minimum acceptable basis for conducting document review involving ESI. In doing so, of paramount importance is recognition of the fact that “[t]here is no substitute for a rigorous characterization of the explicit and tacit goals of the senior litigator,” i.e., the person “responsible for developing and fully understanding all nuances of their client’s legal strategy.” Thus, “[w]hat is required is a system (people, process, and technology) that replicates and automates the senior litigator’s human judgment.” In Bob’s view, “any method that depends primarily on human review fails to transfer properly the requisite knowledge of the senior litigator’s sensemaking into a consistent, reproducible document review.” His paper goes on to make a fascinating connection with cognitive task research involving intelligence analysts that has been conducted over the past 10 years, where it has been reported that senior and junior analyst behaviors are nearly the opposite of each other, in terms of taking “bottom up” or “top down” approaches to hypothesis building, and where

[t]he crucial insight based on sensemaking research is that in [ ] e-Discovery, senior litigators are NOT reviewing the literal content of text (i.e., bottom-up), but rather the overarching aspects of the situation and the author’s intent (i.e., top-down). (capitalization as in original).

The bottom line, paraphrasing Bob’s paper: future advances in efficiently performing document review must replicate the sensemaking of senior litigators as a top-down, automated process of searching for and encoding case-topic-specific questions, and any scalable e-discovery process must establish explicit criteria for determining “relevance” as viewed through the lens of the senior litigator, while at the same time taking into account the multiplicity of communities that the original authors of documents work in, with all of the idiosyncrasies of language and linguistic terminology used by those authors. One last thing: “execution of the sensemaking approach,” in Bob’s view, also involves “rigorous measurement and statistically valid, in-process quality control.”

In a similar vein, Simon Attfield’s co-authored paper with Ann Blandford, entitled “E-discovery viewed as integrated human-computer sensemaking: the challenge of ‘frames’,” reports on the results of an interview field study with lawyers working on a large regulatory investigation in the U.K., with the purpose of extrapolating from that study how further research into better understanding “evidence review and analysis in e-discovery” could be conducted. In particular, the authors introduce the notion of making sense of information through the application of particular types of “frames,” which in the e-discovery context would mean how lawyers go about choosing to aggregate documents with the aim of determining relevance, how decisions with respect to relevance are cued, and how decisions are in turn elaborated upon and validated. Simon’s paper states his belief that the Enron collection (a publicly available data set of email and other documents) is a candidate data set for further research into the sensemaking activities of lawyers, and I welcome his further research. The results of such a research program might yet find their way into “checklists” covering how relevance determinations are to be optimized in future complex document reviews.

I took away from the workshop the palpable sense that the information retrieval researcher crowd, represented by Mark Sanderson, a Reader at the University of Sheffield, Ian Ruthven, a Reader at the University of Strathclyde, Yunhyong Kim, a researcher at the University of Glasgow, and Mounia Lalmas, Professor at Queen Mary, University of London, were collectively intrigued by the practical problems lawyers (and barristers and solicitors) all face in this area, especially in light of the fact that, as Mark Sanderson suggested in his remarks, it isn’t every day that IR researchers actually talk to the users of their information retrieval software designs. It might be too much to wish for, but a research project parallel to the TREC Legal Track in the U.S. might be something that a European-based grant-making agency would be interested in.

From the U.K., I wish to thank in particular Reza Alexander, Litigation & Practice Support Manager at DLA Piper UK, for his fine introductory remarks, and Chris Dale of The e-Disclosure Information Project, both for his presentation on “CaseMap issue linking in UK civil proceedings” and for favoring us with the attendance of His Honour Judge Simon Brown, a Designated Mercantile Judge; the two of them were also appearing at still another e-discovery conference that day in London. I also wish to thank Jacki O’Neill of Xerox Research Centre Europe, and Ian Black of Autonomy Group in the U.K., for their papers and remarks. Getting the award for traveling the furthest distance was Professor Frank Bennett, Jr., of the Faculty of Law at Nagoya University in Japan, who spoke on unique OCR search issues when faced with reconstructing financial statements in Japanese.

Lastly, from the U.S., we were blessed to have with us an expert contingent of lawyers, academics, and technologists giving short talks and comments, including Craig Carpenter, Recommind; Kelly (KJ) Kutcha, Forensics Consulting Solutions; Carsten Gorg, Georgia Institute of Technology; David Chaplin, Kroll Ontrack; and Chris May, IE Discovery.

Judging from the views of the participants and audience, the workshop was successful in achieving the goal of raising awareness of the separate “island universes” represented by academics and lawyers on the matter of improving searches conducted in e-discovery. From the early returns, it would seem that interest remains high in holding a DESI III somewhere in the world next year. One possibility is for would-be organizers to put in a proposal for a workshop to be held as part of ICAIL 2009 in Barcelona next June. Readers of this column interested in participating in a future DESI III (or just interested in any excuse to go to an e-discovery related event in Barcelona) should certainly feel free to contact me at jason.baron@nara.gov.

MIS International Conference on Digital Evidence

I also wish to report briefly on the follow-on International Conference on Digital Evidence, held over two days on June 26-27, 2008 at historic Vintners’ Hall in London. Billed by MIS as “The First Conference to Treat Digital Evidence on a Global Platform,” the conference lived up to its name and was well worth attending. The opening speech and chairman’s introduction were given by the above-mentioned Stephen Mason, who among many published works is the author of Electronic Signatures in Law (Tottel, 2d ed, 2007) and International Electronic Evidence (British Institute of International and Comparative Law, 2008) as well as general editor of the Digital Evidence and Electronic Signature Law Review (among other publications).

The MIS conference featured a remarkably broad panoply of speakers from around the world, discussing how their respective legal systems and cultures deal with problems of digital evidence. My own modest role at this conference was limited to giving a 30-minute talk on search and information retrieval issues from a U.S. perspective, including findings from the first two years of the TREC Legal Track, as part of a panel chaired by Iain G. Mitchell, QC.

I cannot possibly do justice here to the two dozen or so presentations given at the conference, so I will limit my observations to a truly remarkable pair of panel presentations that took place on the afternoon of Day 1, involving Senior Master Steven Whitaker, of the Supreme Court of England and Wales, Queen’s Bench Division, Royal Courts of Justice & the Queen’s Remembrancer, and Judge Francis M. Allegra of the U.S. Court of Federal Claims in Washington, D.C. Judge Allegra has penned several e-discovery cases of note, including United Medical Supply Co., Inc. v. United States, 77 Fed. Cl. 257 (Fed. Cl. 2007).

Let me set out the following remarks from one of the judges, and I invite the readership here to guess which judge made these statements:

  • Virtually every case we’re seeing involves electronic data of some form.
  • E-discovery decisions are generally of an interlocutory nature and therefore do not work their way into reported opinions, especially at the appellate level.
  • There is little or no training of judges with respect to ESI issues.
  • Digital voice recordings are becoming an evidentiary concern.
  • Judges should encourage litigants to meet and confer regarding ESI issues, and to cooperate on issues including with respect to determining what custodians hold ESI of importance to the case, what time frames should be used to limit searches for evidence, what types of documents and storage formats exist, and in general what search methods are to be used.
  • Search issues, including what keywords to use, are not always adequately handled by lawyers.
  • Judges should review e-discovery issues with proportionality and fairness principally in mind – smaller cases do not merit huge expenditures of time and effort in terms of the fanciest automated methodologies being utilized or required.

If you guessed that the American judge made the above remarks, you would be wrong! While Judge Allegra gave a very fine talk describing the U.S. perspective on all of the above matters, the eye-opening remarks by Senior Master Whitaker cemented my already considerably developed bias from the DESI II workshop that the problems of ESI are now global in dimension, and that judges and lawyers everywhere – or at least, by this example, in the U.K. as well as the U.S. – find themselves essentially together, all in the same soup. We are all struggling to find reasonable ways of dealing with increasing volumes of complex ESI.

I found one additional aspect of Senior Master Whitaker’s remarks to be particularly intriguing, where he de-emphasized the importance of the finding of mere “relevance” in terms of producing documents to an opposing side as the result of a document review, while emphasizing instead the view that under Part 31 of the Civil Procedure Rules (CPR) in the U.K., parties should be concentrating on self-identifying those specific documents of importance to the case at hand (either favoring their position or not), and turning those over to the other side. In other words, in a world brimming with millions of potentially relevant documents, why not cooperate with the other side (or have judges force such collaboration) in producing – at least first and foremost – only those documents of obvious materiality to the issues involved in a specific action? I now see that Chris Dale has written at considerable length on this topic as well.

So let me pose here the obvious question to a U.S. audience: would there be a benefit to effecting a revision to Rule 26 of the Federal Rules of Civil Procedure, so as to expand upon the existing set of “initial disclosure” rules and meet and confer requirements to require (or allow) parties to concentrate their efforts on developing a search protocol aimed at finding the most specific, core, material documents that will be at issue in litigation – rather than engaging in the practice of massive “data dumps” of merely potentially relevant evidence? I am sure this would be viewed by some as controversial. But the reality of ESI over the coming decades makes this seem imperative to me. We are facing billions (if not trillions) of potentially relevant files, yet as human beings trying cases we can only use a minuscule fraction of that potentially relevant evidence during depositions and at trial. Why not then focus our principal aim on making reasonable efforts to find the truly “hot” documents of a material nature, rather than having our rules be written in terms of seeking out the merely “relevant” (or the even larger universe under Rule 26(b)(1) of documents that might themselves not be admissible but are reasonably calculated to lead to the discovery of admissible evidence)?

I pose this merely as food for thought coming out of the London conference, and of course expect that Ralph and others will have something more to say on this topic in some future column. But it remains an intriguing thought: perhaps, on this Fourth of July weekend, British barristers and judges have something to teach us in America about how to go about procedural reform aimed at cutting down to size the enormous and ever-increasing ESI/digital evidence problem.

  2. Jared Selengut says:

    It seems to me that effective searching is an iterative process that relies on feedback from earlier searches. Without the senior strategist understanding the technicalities of the search method and experiencing quick turnaround, I don’t see how they can find what they want.

    I don’t see why people bother to sift haystacks; when I’m looking for something hard to find on the internet, I only look at the first page for the first N searches, while I refine my search terms.

    I don’t see the point of using a thousand keywords and reviewing ten thousand documents. If you have a huge number of search terms, then you really don’t know what you’re looking for, which guarantees you’ll never find it.

    In summary, more searches, fewer terms, more senior searchers.


  3. Jason, As expected, an interesting commentary. I hope you are well.

    The more search terms the better. No one can ever be certain what exactly they will find. There are never guarantees, just calculated guesstimates of what exists. I believe keywords are the most important aspect of any electronic search. A reasonable and well-thought-out data sift is likely worth the effort. One’s haystack is another’s pot of gold.

    Laura Zubulake


