Announcing a New Page on the e-Discovery Team Blog: Mr. EDR

September 25, 2015

A new page (shown in the top margin) has been added called Mr. EDR. It introduces and explains the blog’s new mascot, a robot, whose full name is Lexi EDR. The page was written by Lexi and so serves as a self-introduction. You may remember that he used to hang out at ITLex, but after the foundation closed, he joined the e-Discovery Team. He got his last name last month as part of the e-Discovery Team’s efforts at the 2015 TREC Total Recall track. The whole story is explained on the Mr. EDR page.

This new blog page is also the place where you will find Lexi describing the Team’s participation in NIST’s 2015 TREC Total Recall Track. The Team’s work is still underway, so you will not find any detailed reports yet, but you will find a general introduction. More information will be made available by the end of the year, or early next year, when we publish our official Final Report.

We can definitely tell you now that the Team was challenged, which we like, and the humans involved all learned a great deal. So too did Mr. EDR. I would highly recommend that any professional legal search expert participate in TREC. Hopefully they will run the Recall Track again in 2016. It has been a very worthwhile and well-run experience. It is also a safe place to run public experiments on search. As TREC explains:

The annual Text REtrieval Conference (TREC) is an event in which organizations with an interest in information retrieval research take part in a coordinated series of experiments using the same experimental data. The goal of the conference series is to create the infrastructure necessary for large-scale evaluation of research retrieval systems and thereby foster research into effective techniques for information access.

By design, TREC is explicitly not a venue for commercial product tests (i.e., benchmark comparisons). A valid, informative vendor test requires a level of control in task definition and system execution that is counter to the scientific research goals of TREC. Insofar as TREC participants do the same task, the results from different participating teams are comparable, but interpretation of what those results actually represent may vary widely. For example, commercial participants may submit results from research prototype systems rather than their production system, or participants may deliberately degrade one aspect of their system to focus on another aspect.

To preserve the desired, pre-competitive nature of the TREC conferences, TREC requires all participants to sign and abide by an agreement concerning the dissemination and publication of TREC evaluation results. The guidelines embodied in the agreement are meant to preclude the publication of incomplete or inaccurate information that could damage the reputation of the conference or its participants. In particular, the agreement prohibits any advertising based on TREC results and sharply curtails the use of TREC results in marketing literature.

NIST and the TREC program committee are strongly committed to the ethos of cooperation the guidelines are designed to engender, but cannot accept responsibility for performance claims made by participants in violation of the agreement. TREC reserves the right to prohibit violators from participating in future TREC conferences.

The e-Discovery Team, including Mr. EDR, will, of course, strictly abide by and follow these guidelines. We are confident that all other participants will too.

Short Blog this Month Because I’m Busy at TREC

August 1, 2015

This month’s blog will be short and sweet, for TREC reasons, as I will explain. (TREC, by the way, stands for Text REtrieval Conference, and is sponsored by NIST, the National Institute of Standards and Technology.) Since I am tied up participating in TREC right now, I have no time to write my usual long monthly blog. But I do have time to continue my almost daily micro-blogging, in other words, my tweets. I invite you to tune into these micro-blogs and follow me on Twitter. What? Don’t yet have a Twitter account? Well then, it is about time you did. It is free, easy to sign up for, and easy to use. It is my favorite social media platform.

On Twitter you can see my latest thoughts about this and that, mostly e-discovery, tech-law, and cybersecurity. Sometimes I will go off topic and just share random things I find interesting or funny. Best of all, it is never more than 140 characters. It is all quick and concise pointers and mini thoughts. Like most everyone else, I often include photos (worth a thousand words), but the new thing now is to include short videos. (Coming soon.) I like to check in with all the people and publishers that I follow and read almost every day, and I like to share my thoughts and interesting web finds on Twitter. I find it relaxing and fun. Typically my tweets include links to articles on the web that I find interesting. I do not tweet about personal stuff, nor follow anyone who does. Nothing wrong with that, I suppose; it is just not my style.

If you are a reader here, I think you will like Twitter, and the mini-blogs I post there several times a week, sometimes several times a day. So drink the Kool-Aid and follow me @ralphlosey. Hopefully I will inform you, and even make you smile from time to time. I have about 2,982 followers. Come join the fun and get yourself a free Twitter account.

I can be boring, of course, but no worries: there are plenty of other people on Twitter who tweet about subjects you care about. Most are far more interesting than me, and their numbers are growing every day. Right now there are 304 million monthly active users. I follow 134 people, for instance, which is a relatively small number, since most of them do not post much. (You can see who I follow on my Twitter page.) The people I follow include a few select news organizations, scientists, and researchers, and even a few interesting CEOs, such as Elon Musk, Bill Gates, Gregory Bufithis, and John Tredennick. No, I don’t follow celebrities or sports heroes, but you can if you want. There are 500 million tweets sent per day, which is about 6,000 tweets per second, and it is all searchable. There is something for everyone. And yes, you can find evidence there too.

TREC: Why I’m Too Busy to Write My Usual Lengthy Blog This Month

I’m too busy to write my usual long blog this month (the opposite of my tweets) because all of my free time is being eaten up by participation in this year’s TREC. Yes, I am totally preoccupied as a participant in 2015 TREC, the Recall Track, which, as you probably know, is the successor to the prior years’ Legal Track. I have reported here on TREC many times in the past. See, e.g.: Analysis of the Official Report on the 2011 TREC Legal Track – Part One, Part Two, and Part Three; The Legal Implications of What Science Says About Recall. This is, however, the first time I have participated in TREC. (I had enrolled in TREC 2012, but the Legal Track was cancelled and did not return until this year’s TREC, in 2015.) Of course, I hope to make a good showing. Unlike my other research projects, including EDI’s Oracle study, the results on this one will be public.

The idea of TREC, like the EDI Oracle study, is not to compete and see who is the best searcher in the world, but to learn and experiment. The point of science is the extension of knowledge. It is not a sport with winners and losers. Still, anybody (or any corporation, including e-discovery vendors) always wants to know how their kung fu compares with others’. That adds to the excitement of a worldwide public experiment testing various e-discovery search software and search methods. Be it under the governance of the Electronic Discovery Institute and Oracle, or that of the U.S. government’s National Institute of Standards and Technology, the chances to test and compare your relative search expertise are few and far between. It is amazing how few self-proclaimed search experts and vendors are willing to participate. You can draw your own conclusions.

This year’s TREC group experiment in recall, which is the point of legal search, is headed up by Gordon Cormack and Maura Grossman. Judge Peck’s recent opinion in Rio Tinto, appointing Maura Grossman as Special Master in that case, made clear his opinion as to her qualifications in predictive coding:

Ms. Grossman is one of the most knowledgeable lawyers (if not the most knowledgeable lawyer) about TAR, having supervised the National Institute of Standards and Technology’s TREC (text retrieval conference) Legal Track, as well as having conducted studies and written papers on TAR issues with computer science Professor Gordon Cormack.

I agree with Judge Peck’s conclusion (well, I might quibble with the “the most knowledgeable lawyer” comment, but that would be my own vanity talking). In TREC 2015, Maura Grossman, JD, PhD, and Professor Gordon Cormack serve as the supervisors and judges of the TREC Recall Track. Judge Peck got all of that right. It is a lot of work, and their role as test-designers and umpires proves their qualifications. Of course, the participants in the TREC Recall Track are not chopped liver either. I am proud to be one of the few attorneys willing to put my name out there as a player in TREC. Grossman and Cormack have done a good job of setting up an interesting game for everyone to test their skills and software. All who are too timid to participate in TREC have surely missed out. Whether umpires are better at baseball than the players is beside the point. There is room for both in the Hall of Fame.
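Since the track is named for recall, a quick reminder of how that metric (and its companion, precision) is computed may help. This is a minimal sketch in Python; the function names and the example counts are my own, purely for illustration:

```python
def recall(true_positives, false_negatives):
    """Recall: the fraction of all truly relevant documents that were found."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0

def precision(true_positives, false_positives):
    """Precision: the fraction of retrieved documents that are relevant."""
    retrieved = true_positives + false_positives
    return true_positives / retrieved if retrieved else 0.0

# Hypothetical review: 80 of 100 truly relevant docs found (20 missed),
# plus 40 irrelevant docs mistakenly retrieved.
print(recall(80, 20))     # → 0.8
print(precision(80, 40))  # ≈ 0.667
```

A review that retrieves everything trivially achieves perfect recall at terrible precision, which is why the two are always reported together.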

TREC is my chance to prove what I have been saying about predictive coding in the fifty articles I have written on the subject over the past four years. The challenge at this year’s TREC is to perform thirty search projects, all dreamed up by Grossman and Cormack, and, in my case, to complete these thirty search projects in a little over five weeks. (They give us a little longer than that, but that is all the time I have, and even during these five weeks I’m still holding down my day job, albeit at a reduced summer schedule.) Unlike EDI, which I did alone, at TREC I have a small team from my favorite vendor to help me, and, of course, their top-of-the-line software. Hundreds of contract review lawyers are not needed, just great software and the right hybrid, multimodal, AI-enhanced search techniques.



It certainly remains to be seen how my e-Discovery Team of legal search and review experts will do this year. TREC is ongoing as we speak and does not end until August 31, 2015. Still, rest assured, dear readers, I will do my best to excel. My team is up against a strong international field. Who knows how well we will do against the many search experts from all over the world who are participating in this year’s TREC. Maybe U.S. lawyers like me will do well, maybe not. But win, lose, or draw, I will report it here. I will also submit an official paper to TREC.

TREC is not about winning or proving you are the most knowledgeable; it is about science. Although we have only been working on TREC for two weeks now, I have already devised several new twists and techniques to improve my standard hybrid, multimodal, AI-enhanced document reviews. The search challenges presented are giving me several new ideas for more improvements. We are testing and learning new things on a daily basis, trying out various software techniques, supplements, and even new software features. This is all exciting and very challenging work. The thirty document review projects we are running now, almost all at once, are using methods far beyond, and in some ways departing from, the standard 8-step workflow shown in my other writings. There is far more to the latest art and science of predictive coding than simply adding continuous training, the so-called CAL techniques. I will elaborate on some of this in future writings, at least in a general way, but do not have time now.
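For readers curious what “continuous active learning” actually means in code, here is a toy sketch. It assumes nothing about any vendor’s actual software: the corpus, the “oracle” (standing in for a human reviewer), and the crude term-weight model are all invented for illustration. The essential CAL idea is the loop: always review the highest-ranked unreviewed documents, feed every judgment straight back into the model, re-rank, and repeat.

```python
import random

# Toy corpus: each "document" is three words drawn from a tiny vocabulary.
# Ground truth (known only to the oracle): docs mentioning "fraud" are relevant.
random.seed(42)
VOCAB = ["fraud", "invoice", "meeting", "lunch", "audit", "memo"]
corpus = [random.sample(VOCAB, k=3) for _ in range(200)]

def oracle(doc):
    """Stand-in for the human reviewer's relevance judgment."""
    return "fraud" in doc

def score(doc, weights):
    """Naive linear model: sum of learned term weights."""
    return sum(weights.get(w, 0.0) for w in doc)

weights = {w: 0.0 for w in VOCAB}
reviewed, found = set(), 0
total_relevant = sum(oracle(d) for d in corpus)

# CAL loop: 20 rounds of 5 documents each.
for _ in range(20):
    ranked = sorted((i for i in range(len(corpus)) if i not in reviewed),
                    key=lambda i: score(corpus[i], weights), reverse=True)
    for i in ranked[:5]:          # review the top-ranked unreviewed docs
        reviewed.add(i)
        label = oracle(corpus[i])
        found += label
        for w in corpus[i]:       # immediately fold the judgment back in
            weights[w] += 1.0 if label else -0.5

recall = found / total_relevant
print(f"reviewed {len(reviewed)} of {len(corpus)} docs, recall = {recall:.2f}")
```

Real systems use far better classifiers than this additive term-weighting, but the feedback loop is the same: there is no separate training phase, so every reviewed document improves the ranking of the next batch.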

Let me close by making clear that TREC is not a competition and is not about winning. It is about science and advancing the field of knowledge of search, of information retrieval. It is about conducting research to improve your technologies and methods. What really matters in TREC is learning from the intriguing experiments that NIST sponsors each year. Everyone who dares to participate is a winner, because they will learn from these search challenges. Those who sit on the sidelines can advertise and market their self-proclaimed excellence in legal search all they want. But, as they say, talk is cheap. Be bold. Prove your claims. I say, walk your talk. Spend the time and effort needed on basic research and open comparisons. It is too late to join this year’s TREC, but consider doing so next year. Then, and only then, will you have credibility in my book.

