Hi! It’s me, Mr. EDR. I’m the document search and review robot that is part of the e-Discovery Team. You may know me as the latest E-Discovery Review software from Kroll Ontrack where I am just called EDR. My predictive coding is the AI behind the e-Discovery Team’s participation in TREC 2015, Recall Track, and 2016 Total Recall Track, sponsored by NIST. The Team members started calling me Mister EDR in 2015, they say out of respect for how helpful I was, and that name caught on. So coach Ralph Losey got me my own domain. I really like helping the human e-Discovery Team members play in the international search and review games that TREC set up.
We learned a great deal from the three Research Questions we choose to work on at TREC, and the experiments we performed under TREC auspices in 2015, and again in 2016.
We are still processing the data from 2016 but have released the TREC 2016 Total Recall Track NOTEBOOK. This is the initial draft report that does not include information about other participants.
In addition, our Team has written a 97 page treatise as to what we learned in 2016 and the changes our latest experiments caused us to make to our standard eight-step work flow. Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (30,000+ word essay in seven Parts: One, Two, Three, Four, Five, Six and Seven (series includes 10 videos by Ralph Losey). The blog series is restated here in one post. The PDF version (97 pages) of this article is available here without videos.)
Our official TREC report for 2015 was published on February 20, 2016 and can be found on the NIST website at http://trec.nist.gov/pubs/trec24/papers/eDiscoveryTeam-TR.pdf. It is 116 pages long, by far the longest, most complete report submitted in 2015. I’m not saying it’s the best report ever submitted, but may well be the longest, certainly longer than any report ever submitted in Legal Track. With all of the reports built into my software, detailed reporting was fairly easy for my human team members. Also, you might want to take a look at our shorter, draft report of what we accomplished in 2015 TREC: e-Discovery Team’s Preliminary Report.
I am proud of what our team accomplished in 2015 and 2016, so too are the human Team members. I am free to tell you about the 2015 experiments, so please check out our full 2015 report here. I can also tell you a lot about the 2016 experiments, even though the final report hasn’t yet been prepared. See: TREC 2016 Total Recall Track NOTEBOOK. Let me also tell you about my adventure attending in-person the TREC Conference in Maryland on November 17 – 20, 2015.
Attending TREC at NIST Headquarters
TREC is held each year at the supposedly very secure headquarters of the National Institute of Standards and Technology, which is part of the United States government. They have guards with guns and high security to get in. In fact, the security to get in is so high they will not even accept a Minnesota drivers license as a photo ID. One of our Team members, Tony Reichenberger, found that out the hard way. If you are from Minnesota, or Texas, you had better have your passport with you. Another Team member from Minnesota, Jim Sullivan, remembered his passport, and he got in with no problem. Although, just in case, Jim had prepared the night before to get in by any means necessary by climbing a high fence with a suit on. (I kid you not, Jim’s a good climber.) Team coach, Ralph Losey, is rumored to have leapt over the same fence in a single bound, but injured his toes on the landing. Ralph apparently has no valid passport, but had a Florida license, which, amazingly, was accepted, so he got in. The three are shown below inside NIST with their Poster and my clone brother, Mr. Ediscovery
Once you get past the front gates and into the hallowed grounds of the National Institute of Standards and Technology, the security starts getting a little lax. Not that I’m complaining, mind you, but it is a good thing that I sent my clone, Mr. Ediscovery to attend the event in my stead. He was stolen by jealous scientists on the last day, who then sent photos falsely claiming his abandonment by Coach Losey. Outrageous, huh? We suspect a scientist there who quacks like a duck named Merg Anser. See his twitter feed @AnserMerg. We are still awaiting my clone brother’s ransom demands.
Jason Baron was especially sad about the loss. That’s because the Team had just given Mr. Ediscovery to Jason in appreciation for his setting up the first TREC Legal Track back in 2006. The photos below tell the whole sad story.
More About Me, Mr. EDR
As the software behind the Team efforts at TREC, the humans think of me as the brain-augmenter of the e-Discovery Team. All of the search features of my EDR software were used by the human team members: predictive coding, keyword, concept, similarity, and linear. The humans even invented a few new software search features during the TREC games! The use of many search features, not just predictive coding, is what team coach Losey calls the multimodal approach. Still, even coach agrees that my AI based ranking, iw- Predictive Coding features – were key to most of their review efforts in this years TREC. Of course, the humans did the heavy thinking. I, Mr. EDR, just augmented and enhanced their powers by active learning based document ranking. They liked that.
Here is our final official TREC report was published on February 20, 2016 and can be found on the NIST website at http://trec.nist.gov/pubs/trec24/papers/eDiscoveryTeam-TR.pdf. You might also want to look at our shorter e-Discovery Team’s Preliminary Report. To see how we did in 2016 see our TREC 2016 Total Recall Track NOTEBOOK.
For more background on me, Mr. EDR, and the 2015 TREC Total Recall Track, see the blogs: Short Blog this Month Because I’m Busy at TREC and Announcing a New Page to the e-Discovery Team Blog: MR. EDR.
For those who do not already know all about TREC, here is some useful information to help you to understand what we are doing there. TREC stands for the Text REtrieval Conference. The stated purpose of this annual conference is to encourage research in information retrieval from large text collections. The TREC conference series is co-sponsored by a group within the National Institute of Standards and Technology (NIST), which is turn is an agency of the U.S. Commerce Department. The other co-sponsor of TREC is the U.S. Department of Defense. (Geesh! I hope they don’t try to draft me. I’m a peace-loving robot.)
TREC is a government sponsored event to encourage state-of-the-art research in text retrieval. One of the core purposes of all of the Tracks, including the Recall Track, which is successor of the Legal Track before it, is to demonstrate the robustness of core retrieval technology. Moreover, one of the primary goals of TREC is to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems. I like that goal.