This month’s blog will be short and sweet, for TREC reasons, as I will explain. (TREC, by the way, stands for Text REtrieval Conference, and is sponsored by NIST, the National Institute of Standards and Technology.) Since I am tied up participating in TREC right now, I have no time to write my usual long monthly blog. But I do have time to continue my almost daily micro-blogging, in other words, my tweets. I invite you to tune into these micro-blogs and click to follow me on Twitter. What? Don’t yet have a Twitter account? Well then, it is about time you did. It is free, easy to sign up for, and easy to use. It is my favorite social media.
On Twitter you can see my latest thoughts about this and that, mostly e-discovery, tech-law, and cybersecurity. Sometimes I will go off topic and just share random things I find interesting or funny. Best of all, a tweet is never more than 140 characters. It is all quick and concise pointers and mini thoughts. Like almost everyone else, I often include photos (worth a thousand words), but the new thing now is to include short videos. (Coming soon.) I like to check in with all the people and publishers that I follow and read almost every day, and I like to share my thoughts and interesting web-finds on Twitter. I find it relaxing and fun. Typically my tweets will include links to articles on the web that I find interesting. I do not tweet about personal stuff, nor follow anyone who does. Nothing wrong with that, I suppose; it is just not my style.
If you are a reader here, I think you will like Twitter, and the mini-blogs I post there several times a week, sometimes several times a day. So drink the Kool-Aid and follow me @ralphlosey. Hopefully I will inform you and even make you smile from time to time. I have 2,982 followers. Come join the fun and get yourself a free Twitter account.
I can get boring of course, but no worries, there are plenty of other people on Twitter who tweet about subjects you care about. Most are far more interesting than me and their numbers are growing every day. Right now there are 304 million monthly active users. I follow 134 people, for instance, which is a relatively small number since most of them do not post much. (You can see who I follow by going to my page at Twitter.com/RalphLosey.) The people I follow include a few select news organizations, scientists, and researchers, and even a few interesting CEOs, such as Elon Musk, Bill Gates, Gregory Bufithis and John Tredennick. No, I don’t follow celebrities, or sports heroes, but you can if you want. There are 500 million tweets sent per day, which is about 6,000 tweets per second, and it is all searchable. There is something for everyone. And yes, you can find evidence there too.
TREC: Why I’m Too Busy to Write My Usual Lengthy Blog This Month
I’m too busy to write my usual long blog this month (the opposite of my tweets) because all of my free time is being eaten up by participation in this year’s TREC. Yes, I am totally preoccupied as a participant in the 2015 TREC Recall Track, which, as you probably know, is the successor to the prior years’ Legal Track. I have reported here on TREC many times in the past. See, e.g.: Analysis of the Official Report on the 2011 TREC Legal Track – Part One, Part Two, and Part Three; The Legal Implications of What Science Says About Recall. This is, however, the first time I have participated in TREC. (I was enrolled in TREC 2012, but the Legal Track was cancelled until this year’s TREC, in 2015.) Of course, I hope to make a good showing. Unlike my other research projects, including EDI’s Oracle study, the results of this one will be public.
The idea of TREC, like the EDI Oracle study, is not to compete and see who is the best searcher in the world, but to learn and experiment. The point of science is the extension of knowledge. It is not a sport with winners and losers. Still, everybody (and every corporation, including e-discovery vendors) wants to know how well their kung-fu compares with that of others. That adds to the excitement of a world-wide public experiment with various e-discovery search software and search methods. Be it under the governance of the Electronic Discovery Institute and Oracle, or that of the U.S. government’s National Institute of Standards and Technology, the chances to test and compare your relative search expertise are few and far between. It is amazing how few self-proclaimed search experts and vendors are willing to participate. You can draw your own conclusions.
This year’s TREC group experiment in Recall, which is the point of legal search, is headed up by Gordon Cormack and Maura Grossman. Judge Peck’s recent opinion in Rio Tinto, appointing Maura Grossman as Special Master in that case, made clear his view of her qualifications in predictive coding:
Ms. Grossman is one of the most knowledgeable lawyers (if not the most knowledgeable lawyer) about TAR, having supervised the National Institute of Standards and Technology’s TREC (text retrieval conference) Legal Track, as well as having conducted studies and written papers on TAR issues with computer science Professor Gordon Cormack.
I agree with Judge Peck’s conclusion (well, I might quibble with the “the most knowledgeable lawyer” comment, but that would be my own vanity talking). In TREC 2015, Maura Grossman, JD, PhD, and Professor Gordon Cormack serve as the supervisors and judges of the TREC Recall Track. Judge Peck got all of that right. It is a lot of work, and their role as test-designers and umpires proves their qualifications. Of course, the participants in the TREC Recall Track are not chopped liver either. I am proud to be one of a few attorneys willing to put my name out there as a player in TREC. Grossman and Cormack have done a good job of setting up an interesting game for everyone to test their skills and software. All who were too timid to participate in TREC have surely missed out. Whether umpires are better at baseball than the players is beside the point. There is room for both in the Hall of Fame.
TREC is my chance to prove what I have been saying about predictive coding in the fifty articles I have written on the subject over the past four years. The challenge at this year’s TREC is to perform thirty search projects, all dreamed up by Grossman and Cormack, and, in my case, to complete these thirty search projects in a little over five weeks. (They give us a little longer than that, but that is all the time I have, and even in these five weeks, I’m still holding down my day job, albeit on a reduced summer schedule.) Unlike the EDI study, which I did alone, at TREC I have a small team from my favorite vendor to help me, and, of course, their top-of-the-line software. Hundreds of contract review lawyers are not needed, just great software and the right hybrid, multimodal AI-enhanced search techniques.
It certainly remains to be seen how my e-Discovery Team of legal search and review experts will do this year. TREC is ongoing as we speak, and does not end until August 31, 2015. Still, rest assured, dear readers, I will do my best to excel. My team is up against a strong international field. Who knows how well we will do against the many search experts from all over the world who are participating in this year’s TREC? Maybe U.S. lawyers like me will do well, maybe not. But win, lose, or draw, I will report it here. I will also submit an official paper to TREC.
TREC is not about winning or proving you are the most knowledgeable, or not; it is about science. Although we have only been working on TREC for two weeks now, I have already devised several new twists and techniques to improve my standard hybrid, multimodal AI-enhanced document reviews. The search challenges presented are giving me several new ideas for more improvements. We are testing and learning new things on a daily basis, trying out various software techniques, supplements, and even new software features. This is all exciting and very challenging work. The thirty document review projects we are running now, almost all at once, are using methods far beyond, and in some ways departing from, the standard 8-step workflow shown above and in my other writings. There is far more to the latest art and science of predictive coding than simply adding continuous training, the so-called CAL techniques. I will elaborate on some of this in future writings, at least in a general way, but do not have time now.
Let me close by making clear that TREC is not a competition and is not about winning. It is about science and advancing the field of knowledge of search, of information retrieval. It is about conducting research to improve your technologies and methods. What really matters in TREC is learning from the intriguing experiments that NIST sponsors each year. Everyone who dares to participate is a winner because they will learn from these search challenges. Those who sit on the sidelines can advertise and market their self-proclaimed excellence in legal search all they want. But, as they say, talk is cheap. Be bold. Prove your claims. I say, walk your talk. Spend the time and effort needed in basic research and open comparisons. It is too late to join this year’s TREC, but consider doing so next year. Then, and only then, will you have credibility in my book.