Analysis of the Official Report on the 2011 TREC Legal Track – Part One

This blog provides my description and analysis of the official report of the 2011 TREC Legal Track search project. I urge you to read the original report that was published in July 2012: Overview of the TREC 2011 Legal Track. The report was written by the 2011 coordinators of TREC Legal Track: Maura R. Grossman, Gordon V. Cormack, Bruce Hedin, and Douglas W. Oard. I assume my readers already have a pretty good idea of what TREC is all about, but still, I think a short reintroduction may help put the 2011 report in perspective.

What is TREC Legal Track?

TREC stands for the Text REtrieval Conference. (If you ask me that is a stretch for an acronym, revealing that the founders were obviously Star Trek fans.) The stated purpose of this annual conference is to encourage research in information retrieval from large text collections. The TREC conference series is co-sponsored by a group within the National Institute of Standards and Technology (NIST), which in turn is an agency of the U.S. Commerce Department. The other co-sponsor of TREC is the U.S. Department of Defense. This is very much a government-run event where academia is encouraged to research.

The Legal Track part of TREC is just one of many different research areas (Tracks), and is a relative newcomer at that. TREC was started in 1992 to support research within the information retrieval community. The Legal Track was started in 2006 by Jason R. Baron, David D. Lewis and Douglas W. Oard to consider the unique problems of legal search. The stated goal of the Legal Track is to develop search technology that meets the needs of lawyers to engage in effective discovery in digital document collections.

One of the core purposes of all of the Tracks, including the Legal Track, is to demonstrate the robustness of core retrieval technology. Moreover, one of the primary goals of TREC is to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems. Despite this reference to commercial products, and much to the frustration of some, the Legal Track is not a bake-off for e-discovery vendors to see who has the best search software. (Bake-off is a term that Jason Baron likes to use, although it seems a bit soft to describe the harsh world of competition among e-discovery vendors today that Craig Ball recently commented on.)

Despite the commercial products and demonstration of robustness language, and the comparisons of technology and methods inherent in these events, this TREC was not built for enterprise. It is a decidedly academic research event. It is sponsored by federal governmental agencies and has strict rules prohibiting participants from telling people how well they did or did not do. See the TREC Statement on Product Testing and Advertising.

This creates no problem for the universities and tenured professors who primarily participate, but creates real tension for the few e-discovery vendors and private service professionals who participate. Unlike the universities and government agencies, these vendors have real skin in the game, as we say. They offer software products and services having to do with information retrieval from large text collections. They are businesses and, in prior years, have included attorneys like me who are focused on legal search and want to test their methods and strategies.

Disclosure of Slight Conflict of Interest

At this point I feel I should make a personal disclosure, much like a broker who analyzes a stock should disclose whether they have a position in that stock. They make that disclosure because it indicates their analysis and recommendations may not be completely unbiased or disinterested. Try as they might, they have skin in the game. This creates at least an indirect conflict of interest that might impact the impartiality of their analysis.

Since the Spring of 2012 I have been in a similar position when it comes to TREC because I signed up at that time to be a participant in the 2012 Legal Track. (It was later postponed to 2013.) I am participating as my own independent group, not connected with my university, law firm, or a vendor. My group, or as I prefer to call it, e-discovery team, is led by me, but will include other independent professionals. I have a few other experts in mind. We plan to use Kroll Ontrack’s Inview software, since I know it so well, and they have graciously allowed me to use the software without charge for this purpose. Aside from my use of their tool, my participation in TREC will be completely independent from Kroll Ontrack. None of their personnel will be part of my team or provide assistance. To the contrary, Kroll Ontrack could enter their own competitive team, which I for one hope they do.

My team will test out the hybrid multimodal approach to information retrieval from large text collections that you have seen described here in many prior articles. The hybrid multimodal method can be done with a number of predictive coding software tools now on the market. It is my contention that the carpenter is more important to the building of a house than the tools he uses, but still, like any carpenter, I like to use the latest power tools. Anyway, that is my thesis, and part of what I hope to test out in the next TREC Legal Track.

Based on my having some skin in the game, I may have a pro-TREC bias and otherwise be somewhat chilled in my write-up. Still, since I was not a participant in the 2011 Legal Track, and had no involvement in the official report, the impact of my participation in 2012 should be minimal.

My enrollment in the 2012 Legal Track (now postponed to 2013) does, however, create one legal encumbrance that I should also disclose. Like all other 2012 TREC Legal Track participants, I have signed an agreement concerning dissemination of TREC results. The agreement contains non-disclosure requirements. You are not allowed to participate in the Legal Track of TREC unless you sign this agreement.

This obligation, which I of course take seriously and will strictly follow, should not, however, impact this article or analysis because it does not apply to the 2011 study. It is not ex post facto. Again, I was not a participant in the 2011 Legal Track. I have no real inside knowledge of the 2011 event to disclose. All of my information has come from public sources. Still, my commitment to participate in 2012-13 TREC Legal Track may influence my analysis somewhat, and so you may want to look to other commentators for the full story and complete critical analysis.

2011 Participants

There were ten participants in 2011; six universities from around the globe, and four private companies:

  1. Beijing University of Posts and Telecommunications
  2. Helioid, Inc.
  3. Indian Statistical Institute
  4. OpenText Corporation
  5. Recommind, Inc.
  6. Technology Concepts & Design, Inc.
  7. University of Melbourne
  8. University of South Florida
  9. University of Waterloo
  10. Ursinus College

I must admit, of the four private enterprises participating, I had only heard of Recommind, Inc. before, but my hat is off to all of the commercial entrants. We should get to know them better, and I provide links to their websites for that purpose. They are all very interesting companies. For instance, did you know Helioid has a free Internet search engine, somewhat like Google, but it allows for interactive narrowing of your search results? It looks very interesting. I have added that to my bookmarks and will try it out.

The vendors who participate in scientific research like TREC are serious about search. They are unlikely to make the kind of promises implied in the video below of a new product that my e-discovery team is about to release. We were thinking of calling it the needle finder, but apparently that name is already taken. So, perhaps we will call it the e-Haystack Smasher? Watch the movie and let me know if you have any better ideas for a name. Look for our booth at Legal Tech.

But seriously, I encourage more vendors to participate in the next TREC. Contrary to rumor, participation does not require you to disclose any of your valuable trade secrets (although you may have to muzzle your marketing department). What are you waiting for? Perhaps you are waiting for an e-Discovery Olympics where they will have clear winners and award gold medals? That could happen some day, and I am personally not averse to it, but not at TREC. It is not too late to sign up for the next TREC event, which, as mentioned, has been delayed to 2013.

Summary of the 2011 Search Challenge

Each year the Legal Track has a slightly different series of search tasks to try out and analyze. In 2011 there was only one task, which they called a learning task. It was designed to test the ability of predictive coding type search software and techniques to learn from instruction, from small seed sets, to properly rank a much larger data set. Each participant was provided with expert input for a seed set of up to 1,000 documents, and from there, the participants were to rank each document in a corpus of 685,592 documents. (TREC used the same Enron database put together by EDRM that I used in my seven-part search narrative, only my count was 699,082 documents, not 685,592, for reasons unknown. See Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron.)

To make the project more challenging, the ranking was done according to three topics, not just one. In other words, they had to make a binary determination, relevant or irrelevant, on three different issues. The ranking task further required participants to state a specific probability as to each document, i.e., 0% to 100%, on each issue. This ranking and probability scoring is a feature inherent in all predictive coding software that I have seen. I described this ranking with probability scores and how that works in practice in my Search Narrative where I searched on only one issue.

The 2011 Report abstract describes the task of participants as follows:

Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability. Participants were permitted to request up to 1,000 responsiveness determinations from a Topic Authority for each topic. Participants elected either to use only these responsiveness determinations in preparing automatic submissions, or to augment these determinations with their own manual review in preparing technology assisted submissions.
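To make the ranking requirement concrete, here is a minimal sketch of what a submission boils down to: for each topic, every document gets a probability estimate, and the documents are sorted from most to least likely responsive. The document IDs, topic labels, and the function name `rank_by_topic` are hypothetical, made up for illustration; the report does not prescribe any particular code or file format in this way.

```python
from typing import Dict, List, Tuple

def rank_by_topic(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, List[Tuple[str, float]]]:
    """Turn per-document probability estimates into one ranking per topic.

    `scores` maps a document ID to {topic label: estimated probability of
    responsiveness}. Both the IDs and the labels are hypothetical here.
    """
    topics = sorted({topic for per_doc in scores.values() for topic in per_doc})
    rankings: Dict[str, List[Tuple[str, float]]] = {}
    for topic in topics:
        pairs = [
            (doc_id, per_doc[topic])
            for doc_id, per_doc in scores.items()
            if topic in per_doc
        ]
        # Highest probability first; ties are broken by document ID so the
        # ordering is deterministic.
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
        rankings[topic] = pairs
    return rankings

# Three toy documents scored against three toy topics.
example_scores = {
    "doc-a": {"topic-1": 0.92, "topic-2": 0.10, "topic-3": 0.40},
    "doc-b": {"topic-1": 0.15, "topic-2": 0.85, "topic-3": 0.40},
    "doc-c": {"topic-1": 0.55, "topic-2": 0.30, "topic-3": 0.05},
}
rankings = rank_by_topic(example_scores)
for topic, ranked in rankings.items():
    print(topic, [doc_id for doc_id, _ in ranked])
```

Note that the same document can sit at the top of one topic's ranking and near the bottom of another's, which is why the three topics amount to three separate ranking problems over the same corpus.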

There were many subtleties to the task assigned that would take too long to describe, but I will mention one that was particularly interesting to me. It addressed one of my often stated hypotheses concerning effective search strategy, namely that the approach should be hybrid. By this I mean that the expert attorney input is just as important as the computer, that both man and machine should work together. I contrast this with what I call the Borg approach where the role of humans is limited and the artificial intelligence of the software algorithms dominates. See, e.g., Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane (subsections Some Vendors and Experts Disagree with Hybrid Multimodal and Fighting for the Rights of Human Lawyers).

I happen to know from appearing on many panels with one of the coordinators, Maura Grossman, that she agrees with me on this issue. But I also know that there are other experts out there who are pro-Borg and think that attorneys just muck things up. They try to limit human input as much as possible and instead prefer random input. They are usually not attorneys and do not have a particularly high opinion of attorneys for reasons that I can well understand, but not condone.

Apparently, some of the smartest-guys-in-the-room types on Wall Street are also pro-Borg. On August 31, 2012, their misguided faith in computers and keyword counting caused some of the biggest brokerage houses to lose millions in just a few minutes. They relied on computers to read Ben Bernanke’s speech for them and sold off thinking it was bad. In reality, the prepared text for the speech was good. The sell-off bounced back in minutes. The brokers who used a hybrid approach, and had experts actually skim read Bernanke’s speech and supplement the computer analysis, made millions when they bought on the Borg flash-crash. This is all shown in the below chart of the market, and discussed at greater length in a Law Technology News commentary I wrote, Resistance Is Not Futile, and on my public Facebook page.

Since the Hybrid approach is central to my search strategy, and is such a hot item in the news today, I was glad to see a variable in the 2011 TREC task that addressed this controversy. See page five of the report, section 4.2 on Participation Categories. Participants could decide to use either a hybrid approach or Borg approach for each of the defined search reports. TREC, despite its sci-fi inspired name, did not use such colorful language. They called the Borg approach automatic, and the hybrid approach, technology assisted. The participants who elected to use the hybrid approach were asked to report the amount of human time expended. The effort expended varied considerably and is shown in Table 3 of the report. I note that one commercial participant spent more human time than all of the others using the technology assisted approach and that they scored very well in search accuracy.

For a list of which participants used which approach, automatic or tech assist, in other words, Borg or Hybrid, see the Legal Track Results document. Would you be surprised to know the Chinese researchers participating from Beijing University adopted the Borg approach? But so too did the University of Waterloo in Canada, and Ursinus College in Pennsylvania. The university participant from my home state, the University of South Florida, went all out for Hybrid.

Details of the Legal Search Task

The task was designed to model a request for production in civil litigation. Remember this is a model, not an actual replica of an actual legal search project. (More on this later.) A more exact replica would be possible, but too expensive for TREC, which has little to no funding. TREC is dependent on many volunteers giving generously of their time, especially the coordinators. Still, even though just a model, it is fairly accurate, with a few exceptions, some of which I will point out.

In the TREC model a senior attorney, whom they call a Topic Authority, interprets a request dreamed up by the TREC coordinators. The authority then communicates his or her interpretation to the participants. The participants then in turn search, review, code, rank, and produce the responsive documents to a requesting party (the coordinators). The coordinators then use sampling and outside reviewers to try to determine how well each participant did: their precision, their recall, and the harmonic mean of the two, the F1 score. Unlike the real world, the requesting party does not then complain, file countless motions, nor appeal to a much higher topic authority, namely a judge, and then an appellate judge. I am quite fine without those complications, thank you. But the point is, the real world is much messier than any scientific experiment. This does not negate the value of TREC, but it should temper our analysis.
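For readers who have not worked with these measures, here is a minimal sketch of how precision, recall, and F1 relate to one another. It assumes you already know the exact counts of correct and incorrect calls, which is a simplification: as noted above, the coordinators estimated these figures from samples. The function name and the example counts are hypothetical and are not taken from the report.

```python
from typing import Tuple

def precision_recall_f1(
    true_positives: int, false_positives: int, false_negatives: int
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    true_positives:  responsive documents that were produced
    false_positives: non-responsive documents that were produced
    false_negatives: responsive documents that were missed
    """
    produced = true_positives + false_positives
    responsive = true_positives + false_negatives
    # Precision: what share of the production was actually responsive?
    precision = true_positives / produced if produced else 0.0
    # Recall: what share of the responsive documents was produced?
    recall = true_positives / responsive if responsive else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical production: 100 documents produced, 80 of them responsive,
# while 120 responsive documents were left behind.
precision, recall, f1 = precision_recall_f1(80, 20, 120)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this made-up example precision is high (0.80) but recall is low (0.40), and F1 lands at about 0.53, closer to the weaker number. That is the point of using a harmonic mean: a production cannot score well by doing well on only one of the two measures.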

The process begins with a Topic Authority’s study of the request, a background complaint, and a study of a sample of potentially responsive documents given to them by the TREC coordinators. There is a different Topic Authority for each of the three issues, and like the coordinators, they are unpaid attorney volunteers. In 2011 the volunteers were Kevin F. Brady, Brendan M. Schulman, and Robert Singleton. They have done us all a service by volunteering their time for this important task.

The Topic Authorities then prepare a set of coding guidelines. The production request and the guidelines are then provided to the participants. Next, there is an initial kick-off phone conference where participants can, if they wish, ask the Topic Authorities questions about their interpretation of the request for production. I do not know if there was one call for all three Topic Authorities, or three different calls.

Thereafter, in the course of several weeks, each participant can request additional limited feedback from the Topic Authorities on particular documents. Each participant can submit up to 1,000 documents per topic for a yes or no relevancy determination, for a total maximum of 3,000 queries. The purpose of this is to improve the seed set training, and there are built-in timing limits that affect when the queries can be submitted. Each Topic Authority is only permitted to advise whether he or she considers the document to be responsive or not. They cannot provide an explanation of why they thought a document was relevant, or not, nor can they engage in any dialogue.

A Model is Not the Real Thing

None of this is exactly the way it is in real civil litigation, in real legal search, but the last point is especially aberrant. Senior attorneys in civil litigation have often been known to give elaborate explanations on documents. They explain in detail their thinking about the probative value of particular documents, their interpretation of the request for production, and their theory of the case. I know I have.

In my experience the give and take between the Topic Authority experts, the senior counsel, or team of senior attorneys in charge of a project, and reviewers, including especially the head reviewer or project manager, can significantly improve the quality of relevancy determinations. It improves the work of the reviewers, who get the benefit of the experts’ thinking. But just as important, and often overlooked, it improves the work of the Topic Authorities, the senior counsel like me who get the benefit of the reviewers’ greater familiarity with the documents. The learning that goes on is, ideally, two ways. The Topic Authority talks and teaches, but also listens and learns. This kind of interaction sets up a virtuous feedback loop that should not be underestimated. The way the tasks are set up in TREC, the efficacy of such positive dialogue cannot be tested.

Perhaps someday, when and if there is significant funding for TREC Legal Track, the model can become more realistic. For instance, the mock authorities in the TREC panel were all volunteers. Their time was limited and, of course, was not compensated. It would be very burdensome and time-consuming for them to fully dialogue with all TREC participants. This could change if TREC could afford to hire senior attorneys whose full-time job for several months was to play the role of Topic Authorities for all participants. This would improve the accuracy of the model and allow for additional metrics of the role of experts in legal discovery. It may be myopic on my part, but I happen to think their role is critical to effective legal search. I would like to see some experiments that address this.

Still, the 2011 TREC task set-up represents an improvement from prior years. The Topic Authorities were available to participants to provide substantial input, even if it was just yes or no. The total maximum of 3,000 queries is more than adequate, and, if all participants had availed themselves of the opportunity to receive this input (I understand many did not), this yes or no input alone would have been very burdensome to the volunteer Topic Authorities, i.e. 3,000 x 10. I look forward to subsequent analysis of the 2011 data that examines the impact of the queries. I expect that the participants who took full advantage of the query opportunity did better than those who did not.
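The burden on the volunteers is simple arithmetic, sketched below using only the limits already stated: up to 1,000 determinations per topic, three topics, and ten participants. The variable names are mine, and the figures are ceilings rather than actual usage, since many participants did not use their full allowance.

```python
# Limits as described in the task: these are maximums, not actual usage.
MAX_REQUESTS_PER_TOPIC = 1_000
TOPICS = 3
PARTICIPANTS = 10

# Most determinations a single participant could request across all topics.
per_participant = MAX_REQUESTS_PER_TOPIC * TOPICS

# Most determinations all participants together could request.
all_participants = per_participant * PARTICIPANTS

# Each topic had its own Topic Authority, so the ceiling for any one
# volunteer is the per-topic limit times the number of participants.
per_topic_authority = MAX_REQUESTS_PER_TOPIC * PARTICIPANTS

print(per_participant, all_participants, per_topic_authority)
```

So the worst case is 30,000 yes or no calls in total, or 10,000 for each of the three volunteer Topic Authorities.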

To be continued . . . 

Responses to Analysis of the Official Report on the 2011 TREC Legal Track – Part One

  1. Michael Duffy says:

    Just one quick comment regarding your reference to the importance of the feedback loop between the TA or senior attorney in charge of the review and the reviewers. This feedback is not only good to have, it is essential to a properly conducted review. As you know, the identification of a properly responsive document is not a black and white decision but mostly grey and dependent upon many factors including case strategy, appetite for risk and knowledge of the subject matter. It is also an evolving determination and often can change 180 degrees from what was considered responsive on a particular issue at the beginning of a project. The decision maker, be they a TA or the senior attorney on the project, is learning the documents and adjusting their decision on responsiveness based on the feedback they are receiving. Without the ability to communicate their reasoning and even their change of opinion in some cases, there is no hope the calls being made by the front line reviewers will be accurate or consistent.
