e-Discovery Team’s 2015 TREC Report

Here the e-Discovery Team will explain what they did, and how, in the 2015 Text Retrieval Conference (TREC) conducted by the National Institute of Standards and Technology (NIST). In 2015 TREC had eight different types of text retrieval experiments going on; each one is called a Track. Our Team participated in one of them, the Total Recall Track, using our Hybrid Multimodal method of CAR (Computer Assisted Review) and Kroll Ontrack software.

The Total Recall Track Coordinators were Adam Roegiest, University of Waterloo; Gordon V. Cormack, University of Waterloo; Maura R. Grossman, Wachtell Lipton Rosen & Katz; and, Charles L. A. Clarke, University of Waterloo. Our team was composed of Ralph Losey, Jackson Lewis; Jim Sullivan, Kroll Ontrack; Tony Reichenberger, Kroll Ontrack; and Mr. EDR. We played in all three Total Recall AtHome games, the only participant to do so in 2015:

  1. AtHome1 – Jeb Bush Email with a collection of 290,099 emails;
  2. AtHome2 – Black Hat World Forums with a collection of 465,149 forum posts (use caution in visiting these sites); and,
  3. AtHome3 – Local News Reports with a collection of 902,434 articles.

We did not play in the fully automated TREC Recall Sandbox this year, but maybe next year.

The primary purpose of the e-Discovery Team’s participation in TREC 2015 Total Recall Track was to test the effectiveness of our legal search methods and software. As to software, we used Kroll Ontrack’s EDR. As to methods, we used a modified form of Ralph Losey’s hybrid multimodal. Due to the way the Total Recall Track tests were set up we could not use our standard eight-step hybrid multimodal workflow. This is the method we normally use in lawsuits to find evidence. It is shown in the animation below.

We had to depart from this standard workflow in order to fit the experiments we were given. Plus, like all of the other participants in the Total Recall Track, we were working under very tight time limitations. We had 45 days to complete 30 different document reviews involving a total of 16,576,820 documents. Some of the variations from our standard workflow were made just to save time. That is, for instance, the primary reason we skipped Step Three – Random Baseline and Step Seven – ZEN Quality Assurance Tests. We had to skip our usual Step One – ESI Discovery Communications because of the way the experiments were set up. There was no one to communicate with. It was all automated. The standard eight-step workflow, marking the three steps we omitted entirely in TREC, is shown next.

Predictive Coding Search diagram by Ralph Losey

Our usual first step of ESI Discovery Communications normally takes hours to learn and refine the scope of relevance of the search project. In the Total Recall Track there were no communications as to what we were searching for except for two or three word descriptions of what was relevant given at the beginning of a search. That was it! But we were then supposed to learn and deduce the scope of relevance from instant feedback given to our document submissions to TREC. This was a kind of micro-feedback process that we never see in real search, especially legal search, but it did make for an interesting experiment, especially for the TREC Recall Sandbox participants. (We did not participate in that.) They used fully automated processes and never even looked at any documents.

By omitting our standard Step Three, the upfront Random Sample step that we normally use to determine prevalence, the Team saved a great deal of time. This omission did, however, make it harder to estimate when the Team thought they had found 90% of the relevant documents — 90% recall. Our ultimate goal for each of the thirty topics was to get as high a recall point as possible, with 90% representing an ideal, before determining that we made what we considered to be a reasonable effort (the point where we would normally stop a real-world legal review and begin quality assurance tests, including ei-Recall). It was a pleasant surprise for our Team to see how often we would exceed this 90% recall goal, even under the new and difficult conditions TREC presented, and even without our normal steps One, Three and Seven.
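To illustrate what the skipped random baseline contributes, here is a minimal sketch of how a random sample can be turned into a prevalence estimate and a rough recall figure. It is not the Team's ei-Recall procedure. The Wilson score interval, the function names, and the sample numbers are all assumptions chosen for illustration.

```python
import math


def estimate_relevant_docs(sample_size, relevant_in_sample, collection_size, z=1.96):
    """Project a random-sample prevalence onto the whole collection.

    Uses a Wilson score interval (an assumption, not the Team's method);
    z=1.96 corresponds to roughly 95% confidence.
    """
    p = relevant_in_sample / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2)
    ) / denom
    low = max(0.0, center - half)
    high = min(1.0, center + half)
    return {
        "point": round(p * collection_size),
        "low": math.floor(low * collection_size),
        "high": math.ceil(high * collection_size),
    }


def recall_estimate(found_relevant, estimated_total_relevant):
    """Recall = relevant documents found / relevant documents that exist."""
    if estimated_total_relevant == 0:
        return 0.0
    return found_relevant / estimated_total_relevant


if __name__ == "__main__":
    # Hypothetical sample: 1,500 random documents, 30 coded relevant,
    # drawn from a collection the size of the Jeb Bush Email set.
    est = estimate_relevant_docs(1500, 30, 290_099)
    print(est)
    # Hypothetical count of relevant documents found so far.
    print(recall_estimate(5222, est["point"]))
```

Without such a baseline, the denominator in the recall calculation is unknown, which is why the stopping decision was harder to make in these experiments.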

Again, to save time and be able to do thirty large document review projects in only forty-five days, we skipped most of the Zero Error Numerics procedures too. We typically follow this Seventh Step for quality control purposes, including a random sampling analysis, ei-Recall. It is risky to omit these procedures, which is why we never cut these corners when doing real legal work for clients. But we had to do so in order to timely review 16,576,820 documents in 45 days. That is an average of 368,374 documents every day.

Although the e-Discovery Team robot, Mr. EDR, was only down for an hour for maintenance, the humans on the team had to sleep several hours each day. They had other tasks to perform too, namely their day jobs, so most of their work was done at night and on weekends. As a legal review team we are used to hard work, deadlines and project management challenges. Still, completing 30 projects in 45 days was the biggest challenge any of us had ever faced. The entire Team, including support staff, was given awards like the one shown below when they completed the challenge.


Even without all of the quality control procedures we normally use, and the random baseline in step three, we did very well. We made some mistakes, especially at first. But after we got a better idea as to how the games should be played from the first few topics we tackled, we sped through the remaining topics with only minor errors in a few. It was possible because only the top, most experienced players on the e-Discovery Team worked on the TREC games. That included e-Discovery Team coach, Ralph Losey, who led 17 topics, and three of Kroll Ontrack’s best experts, Team Captain Jim Sullivan (aka Mr. Perfect), who led 8 topics, Tony Reichenberger, who led 4, and Joe White, who led 1, plus a host of other supporting staff. The help of Kroll Ontrack team players made a huge difference, plus, so Coach says, did Mr. EDR, the Kroll Ontrack software.

Here is the Poster we prepared and presented at the TREC event. Most every participant included a poster like this to summarize their experiments. As you can see the poster summarizes our achievements and our main experiment.

Yes, you read the poster right. In one of the thirty topics we did, Topic 3484, we got a perfect score of 100% Recall and 100% Precision! We are pretty proud of that one. (And now you know how Jim Sullivan got his Mr. Perfect nickname.) More information on what we did will be forthcoming soon.

We cannot compare how we did with the other participants until after TREC makes its full, final report in early 2016. Those are the TREC guidelines. At that time we will put together our official Final Report to let everyone know the full metrics, analysis of how we did, comparisons with the scores of others, the final answers to our three research questions, and provide a full description of our search efforts on all thirty topics. In the meantime, here is more information that we can provide at this time as to what the e-Discovery Team did in TREC 2015.

e-Discovery Team Members

The Team is made up of three attorneys (Ralph Losey, Jim Sullivan, and Tony Reichenberger) and one “robot,” Mr. EDR. The Team members are all attorneys who specialize in legal search and review of Electronically Stored Information (ESI). They are experts in the search for evidence in large unorganized data collections, such as corporate email, text messages, and social media. Their work as search attorneys is typically performed in the context of legal proceedings. Their work includes the search, review, analysis, classification, production and admission of ESI as evidence in courts in the United States and other countries.


The Team leader is Ralph C. Losey, J.D., a practicing attorney, shareholder and National e-Discovery Counsel of Jackson Lewis P.C., a U.S. law firm with over 800 attorneys and fifty-five offices. He has over 35 years of legal experience, including document reviews. Losey is also a speaker and blogger at e-DiscoveryTeam.com where he has written over a million words on e-discovery. The past four years Losey has participated in multiple public and private experiments testing predictive coding methods and has written over fifty articles on the subject.

Jim Sullivan, J.D. (shown left above) and Tony Reichenberger, J.D. (shown right above) are attorney search and review specialists employed by Kroll Ontrack, Inc. That is the primary e-discovery vendor used by Losey and his law firm. Kroll Ontrack is a global e-Discovery software, processing and project management company (eDiscovery.com). The Team robot, Mr. EDR, is the Team’s personalization of Kroll Ontrack’s software, eDiscovery.com Review (EDR). The human Team members relied heavily on Mr. EDR’s active machine learning features, including his ranking and probable relevance predictions. They think of him as a fifth team member.

Research Questions Considered at TREC 2015 Recall Track

Background to questions considered: It is generally accepted in the legal search community that the use of predictive coding type search algorithms can improve the search and review of documents in legal proceedings.[1] The use of predictive coding has also been approved, and even encouraged, by various courts around the world, including numerous courts in the U.S.[2]

Although there is agreement on use of predictive coding, there is controversy and disagreement as to the most effective methods of use.[3] There are, for instance, proponents for a variety of different methods to find training documents for predictive coding. Some advocate for the use of chance selection alone, others for the use of top ranked documents alone, others for a combination of top ranked and mid-level ranked documents where classification is unsure, and still others, including Losey, call for the use of a combination of all three of these selection processes and more.[4]

Many attorneys and predictive coding software vendors advocate the use of predictive coding search methods alone, forgoing other search methods such as keyword search, concept searches, similarity searches and linear review. e-Discovery Team members reject that approach and instead advocate for a hybrid multimodal approach they call Predictive Coding 3.0.[5]

The e-Discovery Team approach includes all types of search methods, with primary reliance placed on predictive coding. The Team also uses a multimodal selection of methods within the context of predictive coding to locate suitable training documents, including high ranking documents, some mid-level ranked uncertain documents, and all other search methods, keyword search, etc. 
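The idea of drawing training documents from several selection methods at once can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the batch sizes, the 0.5 uncertainty midpoint, and the score dictionary are hypothetical, and nothing here reflects how EDR actually ranks or selects documents.

```python
import random


def select_training_batch(scores, keyword_hits=(), n_top=3, n_uncertain=2,
                          n_random=1, seed=0):
    """Pick the next documents for human review from several sources.

    scores: dict mapping document id -> predicted probability of relevance
            (0.0 to 1.0) for documents not yet reviewed.
    keyword_hits: ids found by other search methods, such as keyword search.
    """
    rng = random.Random(seed)
    remaining = dict(scores)
    chosen = []

    def take(doc_ids):
        # Move the selected ids out of the pool so no document is picked twice.
        for doc_id in doc_ids:
            chosen.append(doc_id)
            del remaining[doc_id]

    # 1. Documents surfaced by other search methods (keyword, concept, etc.).
    take([d for d in keyword_hits if d in remaining])

    # 2. Highest ranked documents: most likely relevant.
    take(sorted(remaining, key=lambda d: remaining[d], reverse=True)[:n_top])

    # 3. Mid-level ranked documents: the classifier is least sure about these.
    take(sorted(remaining, key=lambda d: abs(remaining[d] - 0.5))[:n_uncertain])

    # 4. A few chance selections from whatever is left.
    pool = sorted(remaining)
    take(rng.sample(pool, min(n_random, len(pool))))

    return chosen


if __name__ == "__main__":
    # Hypothetical relevance scores for seven unreviewed documents.
    demo_scores = {"a": 0.95, "b": 0.90, "c": 0.52, "d": 0.48,
                   "e": 0.10, "f": 0.05, "g": 0.80}
    print(select_training_batch(demo_scores, keyword_hits=["e"],
                                n_top=2, n_uncertain=2, n_random=1))
```

In a real review the proportions between these sources would be a matter of attorney judgment for each project, not fixed parameters.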

Three Research Questions:

  1. Primary Question: What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all 30 Topics using the Team’s Predictive Coding 3.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?
  2. How will the Team’s results using its semi-automated, supervised learning method compare with other AtHome Recall Track participants?
  3. What are the ideal ratios, if any, for relevant and irrelevant training examples to maximize effectiveness of active machine learning with EDR?

Overview Of Team Participation in 2015 TREC Recall Track

The e-Discovery Team participated in all thirty of the Recall Track AtHome topics. It did not participate in the fully automated TREC Recall Sandbox. The thirty topics were divided into three sets, each searching a different collection of documents.

The first set of AtHome Topics searched a corpus of emails of Florida Governor Jeb Bush. There were 290,099 emails in the Jeb Bush Email collection. Almost all metadata had been stripped from these emails and their associated attachments and images, and the content converted to pure text files. The ten topics searched, and their names, which were the only guidance on relevance initially provided, are shown below.

Topic 100 School and Preschool Funding
Topic 101 Judicial Selection
Topic 102 Capital Punishment
Topic 103 Manatee Protection
Topic 104 New Medical Schools
Topic 105 Affirmative Action
Topic 106 Terri Schiavo
Topic 107 Tort Reform
Topic 108 Manatee County
Topic 109 Scarlet Letter Law

Ralph Losey personally searched each of these ten Topics. In all of these Topics he was at least somewhat assisted by a single “contract review attorney.”[6] The contract attorneys in most of these ten Topics did a majority of the document review, but had only limited involvement in initial keyword searches, and no involvement in predictive coding searches or related decisions.

All participants in the 2015 Recall Track were required to complete all ten of the Bush Email Topics. Completion of the other twenty Topics in two other data collections was optional.

The second set of AtHome Topics searched a corpus of posts taken from Black Hat World Forums. There were 465,149 posts in that collection. Again, almost all metadata had been stripped from these posts and their associated images, and the content converted to pure text files. The ten topics searched, and their names, which again were the only guidance initially provided on relevance, are shown below.

Topic 2052 Paying for Amazon Book Reviews
Topic 2108 CAPTCHA Services
Topic 2129 Facebook Accounts
Topic 2130 Surely Bitcoins can be Used
Topic 2134 PayPal Accounts
Topic 2158 Using TOR for Anonymous Internet Browsing
Topic 2225 Rootkits
Topic 2322 Web Scraping
Topic 2333 Article Spinner Spinning
Topic 2461 Offshore Host Sites

The third set of AtHome Topics searched a corpus of 902,434 online Local News Articles, again in text-only format. The ten topics searched, and their names, which again were the only guidance initially provided on relevance, are shown below.

Topic 3089 Pickton Murders
Topic 3133 Pacific Gateway
Topic 3226 Traffic Enforcement Cameras
Topic 3290 Rooster Turkey Chicken Nuisance
Topic 3357 Occupy Vancouver
Topic 3378 Rob McKenna Gubernatorial Candidate
Topic 3423 Rob Ford Cut the Waist
Topic 3431 Kingston Mills Lock Murders
Topic 3481 Fracking
Topic 3484 Paul and Cathy Lee Martin

The Team found the News Articles less difficult to work with than our typical legal search of corporate ESI.

Summary of the Team’s 45-Day Effort

The e-Discovery Team’s 2015 Recall Track AtHome project started July 14, 2015, and concluded on August 28, 2015. Using both one- and two-attorney review teams, the e-Discovery Team classified 17,014,085 documents in 30 different review projects. That is an average of 378,091 documents classified per day for 45 days, with the average project completed in 1.5 days.
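The averages quoted above follow directly from the project dates and document totals. A short check of the arithmetic, using only the figures stated in this report (the variable names are ours):

```python
from datetime import date

# Project dates and totals as stated in the report.
start = date(2015, 7, 14)
end = date(2015, 8, 28)
total_documents = 17_014_085
review_projects = 30

project_days = (end - start).days
documents_per_day = total_documents / project_days
days_per_project = project_days / review_projects

print(f"Project length: {project_days} days")
print(f"Average documents classified per day: {documents_per_day:,.0f}")
print(f"Average days per review project: {days_per_project}")
```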

Ten of the 30 topics were performed by Ralph Losey with a single assistant, a Kroll Ontrack contract review attorney. The other 20 topics were performed by a single attorney Team member, either Ralph Losey, Jim Sullivan, Tony Reichenberger, or Joe White.

All attorneys used the e-Discovery Team’s Predictive Coding 3.0 hybrid multimodal search techniques and were assisted by Mr. EDR. They relied on active machine learning and other search techniques to find relevant documents and effective training documents. The various types of searches included in the Team’s multimodal approach are shown in the search pyramid, below.

Multimodal Search Pyramid

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.[7] To meet the time requirements of completing every review project in 1.5 days, the standard steps Three and Seven were omitted or severely constrained as will be further explained.

Predictive Coding Search diagram by Ralph Losey

The first step of ESI Discovery Communications also had to be drastically changed to fit the format of the Recall Track AtHome experiments. The only communication under the TREC protocol was a very short, often just two-word description of relevance, plus instant feedback in the form of yes or no responses as to whether particular documents submitted were relevant. In the e-Discovery Team’s typical workflow, discovery communications involve: (1) detailed requests for information contained in court documents such as subpoenas or Requests For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and, (3) dialogues with the party requesting the production of documents, other parties, and the client to clarify the search target.

The Team never receives a request for production with just two or three word descriptions, as encountered in the TREC experiments. When the Team does receive longer, but still vague requests, which are common, the Team seeks clarification in discussions (Step One). In practice if there is disagreement as to relevance between the parties, which is also common, the presiding judge is brought in by motion practice to make final relevance rulings. Again, none of this was possible in the TREC experiments.

All of our usual practices in Step One had to be adjusted to the submissions format of the 30 AtHome Topics. The most profound impact of these adjustments is that the four attorneys on the Team often lacked a clear understanding as to the intended scope of relevance and the rationale behind the automated TREC relevance rulings on particular documents. These protocol changes had the impact of minimizing the importance of the SME role in the active machine learning process. Instead, this role was often shifted almost entirely to the analytics of Mr. EDR, who could often see patterns, and correctly predict relevance, beyond the grasp of the human attorney reviewers. This is not common in legal search, where the attorney reviewers always have some sort of understanding of relevance and the role of the SME is important.

Ralph Losey was in charge of all e-Discovery Team efforts in all 30 Topics. Losey personally did the search and review work on 17 Topics, including all 10 of the mandatory AtHome1 Bush Email Topics. Losey was assisted in the 10 Bush Email Topics by a single Kroll Ontrack contract review attorney. Ann Bottolene, a contract attorney, assisted in 4 of these Topics, and David Jensen, Review Team Manager, Kroll Ontrack, acted as a contract attorney to assist in 6 Topics. The remaining 13 AtHome Topics were completed by Kroll Ontrack’s Jim Sullivan, who did the review work on 8 Topics; Tony Reichenberger on 4; and, Joe White on 1.

Short Answers to Research Questions

Research Question # 1 (Primary Question): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all 30 Topics using the Team’s Predictive Coding 3.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

Short Answer to Primary Question: The TREC measured results demonstrated high levels of Recall and Precision with relatively little human review efforts using the e-Discovery Team’s methods and Mr. EDR. Four attorneys and one robot were able to review and classify 17,014,085 documents in 45 days under TREC test conditions. They attained total Recall of all relevant documents in all 30 Topics by human review of only 32,916 documents. They did so with two-attorney teams in the 10 Bush Email Topics, and one-attorney teams in the 20 other Topics. In Topic 3484, 100% Recall was attained with 100% Precision. This is something considered impossible in a large search project, such as this one, where Topic 3484 searched a collection of 902,434 Local News Articles. On many other Topics the Team attained near perfection scores. In total, very high scores were recorded in 18 of the 30 topics with good results obtained in all, especially when considering the low human efforts involved in the supervised learning. This initial report will include a Full Answer on this first research question.
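For readers less familiar with the two metrics, Recall and Precision are computed from counts of correct and incorrect classifications. The sketch below uses the standard definitions; the function names and the document counts are hypothetical and are not TREC's actual figures for any Topic.

```python
def recall(true_positives, false_negatives):
    """Share of all relevant documents that the review actually found."""
    total_relevant = true_positives + false_negatives
    return true_positives / total_relevant if total_relevant else 0.0


def precision(true_positives, false_positives):
    """Share of the documents called relevant that really were relevant."""
    total_called_relevant = true_positives + false_positives
    return true_positives / total_called_relevant if total_called_relevant else 0.0


if __name__ == "__main__":
    # Hypothetical perfect result: every relevant document found (no misses)
    # and nothing irrelevant submitted as relevant (no false hits).
    print(recall(true_positives=23, false_negatives=0))
    print(precision(true_positives=23, false_positives=0))

    # Hypothetical imperfect result: 90 found, 10 missed, 30 false hits.
    print(recall(true_positives=90, false_negatives=10))
    print(precision(true_positives=90, false_positives=30))
```

A score of 100% on both, as in Topic 3484, means zero missed relevant documents and zero irrelevant documents submitted as relevant.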

Research Question # 2: How will the Team’s results using its semi-automated, supervised learning method compare with other AtHome Recall Track participants?

Short Answer: We are still studying the results. The Final Report will include an analysis of these results.

Research Question # 3: What are the ideal ratios, if any, for relevant and irrelevant training examples to maximize effectiveness of active machine learning with EDR?

Short Answer: We are still studying the results and performing further experiments to answer questions on the relative efficacy of various training ratios, primarily the balance between positive and negative training documents. The Final Report will include an analysis of this question. What we can say at the current time is that training using Mr. EDR does not function effectively if only relevant documents are used. A certain number of irrelevant training documents must be included in the training.



[1] Predictive Coding is defined by The Grossman-Cormack Glossary of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 7 (January 2013) (Grossman-Cormack Glossary) as: “An industry-specific term generally used to describe a Technology Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s) Coding of a Training Set of Documents.” A Technology Assisted Review process is defined as: “A process for Prioritizing or Coding a Collection of electronic Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. … TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness.” Also see: Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011).

[2] Da Silva Moore v. Publicis Groupe 868 F. Supp. 2d 137 (SDNY 2012) and numerous cases later citing to and following this landmark decision by Judge Andrew Peck.

[3] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014; Grossman & Cormack, Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review”, 7 Federal Courts Law Review 286 (2014); Herbert Roitblat, series of five OrcaTec blog posts (1, 2, 3, 4, 5), May-August 2014; Herbert Roitblat, Daubert, Rule 26(g) and the eDiscovery Turkey, OrcaTec blog, August 11th, 2014; Hickman & Schieneman, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 7 FED. CTS. L. REV. 239 (2013); Losey, R., Predictive Coding 3.0, part one (e-Discovery Team 10/11/15).

[4] Id.; Webber, Random vs active selection of training examples in e-discovery (Evaluating e-Discovery blog, 7/14/14).

[5] Losey, R., Predictive Coding 3.0, part two (e-Discovery Team, 10/18/15).

[6] “Contract review attorney,” or simply “contract attorney,” is a term now in common parlance in the legal profession to refer to licensed attorneys who do document review on a project-by-project basis. Their pay under a project contract is usually by the hour and is at a far lower rate than attorneys in a law firm, typically only $55 to $75 per hour. Their only responsibility is to review documents under the direct supervision of law firm attorneys who have much higher billing rates.

[7] Losey, R., Predictive Coding 3.0, part two (e-Discovery Team, 10/18/15) contains a complete description of all eight steps.

