Team’s TAR Course has been Updated and Expanded

May 21, 2017

I was in Manhattan at the Marriott on Times Square this week presenting on advanced TAR with Jim Sullivan. We provided an overview on the Team’s latest methods, Hybrid Multimodal IST Predictive Coding 4.0. Presentations like this allow you to interact with students and refine your approach. It was a good group, as is typical in NYC, of lawyers and litigation support experts.

I left N.Y. on Wednesday night, well before the madman drove his car into Times Square on Thursday. Jim stayed overnight and walked out of Times Square just minutes before this horrible massacre. My condolences to the family of the nineteen year old girl who was killed and the twenty-two other pedestrians who were injured.

When I got home from NYC I condensed our eighty-five minute presentation into a thirty-seven minute video. It now serves as the core video introduction to the e-Discovery Team’s free TAR Course. It is found in the first of the sixteen classes in the Course. I also revised and improved the wording in the Welcome Page of the course and made it a stand alone entry point. To have a little more fun with all of this I also created a new graphic, shown below. It provides a visualization of the core content of the TAR Course. Click on it to see a larger view.

This TAR Course welcome page now has its own written and video content. That material used to be combined with the first class. So the net result is an expansion of the TAR Course from sixteen to seventeen modules. At this point the first eleven classes have “Homework Assignments” at the end with suggested supplemental readings and analytic challenges. We will be adding homework to the last five classes in the next month.

If you have already studied the first class, I urge you to go back and reread and re-view the writings and new three-part video. I also reproduce the same new video below. I call it an overview video, a first tell, but it contains advanced materials and some of my latest thinking.

_____


 

Go to the TAR Course.



Announcing the e-Discovery Team’s TAR Training Program: 16 Classes, All Online, All Free – The TAR Course

March 19, 2017

We launch today a sixteen class online training program on Predictive Coding: the e-Discovery Team TAR Course. This is a “how to” course on predictive coding. We have a long descriptive name for our method, Hybrid Multimodal IST Predictive Coding 4.0. By the end of the course you will know exactly what that means. You will also understand the seventeen key things you need to know to do predictive coding properly, shown in this diagram.


Hands-on hacking of predictive coding document reviews has been my obsession since Da Silva went viral. Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012). That is the case where I threw Judge Peck the softball opportunity to approve predictive coding for the first time. See: Judge Peck Calls Upon Lawyers to Use Artificial Intelligence and Jason Baron Warns of a Dark Future of Information Burn-Out If We Don’t.

Alas, because of my involvement in Da Silva I could never write about it, but I can tell you that none of the thousands of commentaries on the case have told the whole nasty story, including the outrageous “alternate fact” attacks by plaintiff’s counsel on Judge Andrew Peck and me. I guess I should just take the failed attempts to knock me and the Judge out of the case as flattery, but it still leaves a bad taste in my mouth. A good judge like Andy Peck did not deserve that kind of treatment. 

At the time of Da Silva, 2012, my knowledge of predictive coding was mostly theoretical, informational. But now, after “stepping-in” for five years to actually make the new software work, it is practical. For what “stepping-in” means see the excellent book on artificial intelligence and future employment by Professor Thomas Davenport and Julia Kirby, titled Only Humans Need Apply (HarperBusiness, 2016). Also see: Dean Gonsowski, A Clear View or a Short Distance? AI and the Legal Industry, and, Gonsowski, A Changing World: Ralph Losey on “Stepping In” for e-Discovery (Relativity Blog). 

If you are looking to craft a speciality in the law that rides the new wave of AI innovations, then electronic document review with TAR is a good place to start. See Part Two of my January 22, 2017 blog, Lawyers’ Job Security in a Near Future World of AI. This is where the money will be.

 

Our TAR Course is designed to teach this practical, stepping-in based knowledge. The link to the course will always be shown on this blog at the top of the page. The TAR page next to it has related information.

Since Da Silva we have learned a lot about the actual methods of predictive coding. This is hands-on learning through actual cases and experiments, including sixty-four test runs at TREC in 2015 and 2016.

We have come to understand very well the technical details, the ins and outs of legal document review enhanced by artificial intelligence, AI-enhanced review. That is what TAR and predictive coding really mean, the use of active machine learning, a type of specialized artificial intelligence, to find the key documents needed in an investigation. In the process I have written over sixty articles on the subject of TAR, predictive coding and document review, most of them focused on what we have learned about methods.

The TAR Course is the first time we have put all of this information together in a systematic training program. In sixteen classes we cover all seventeen topics, and much more. The result is an online instruction program that can be completed in one long weekend. After that it can serve as a reference manual. The goal is to help you to step-in and improve your document review projects.

The TAR Course has sixteen classes listed below. Click on some and check them out. All free. We do not even require registration. No tests either, but someday soon that may change. Stay tuned to the e-Discovery Team. This is just the first step, dear readers, of my latest hack of the profession. Change we must, and not just gradual, but radical. That is the only way the Law can keep up with the accelerating advances in technology. Taking the TAR Course is a minimum requirement and will get you ready for the next stage.

  1. First Class: Introduction
  2. Second Class: TREC Total Recall Track
  3. Third Class: Introduction to the Nine Insights Concerning the Use of Predictive Coding in Legal Document Review
  4. Fourth Class: 1st of the Nine Insights – Active Machine Learning
  5. Fifth Class: Balanced Hybrid and Intelligently Spaced Training
  6. Sixth Class: Concept and Similarity Searches
  7. Seventh Class: Keyword and Linear Review
  8. Eighth Class: GIGO, QC, SME, Method, Software
  9. Ninth Class: Introduction to the Eight-Step Work Flow
  10. Tenth Class: Step One – ESI Communications
  11. Eleventh Class: Step Two – Multimodal ECA
  12. Twelfth Class: Step Three – Random Prevalence
  13. Thirteenth Class: Steps Four, Five and Six – Iterate
  14. Fourteenth Class: Step Seven – ZEN Quality Assurance Tests
  15. Fifteenth Class: Step Eight – Phased Production
  16. Sixteenth Class: Conclusion

This course is not about the theory or law of predictive coding. You can easily get that elsewhere. It is about learning the latest methods to do predictive coding. It is about learning how to train an AI to find the ESI evidence you want. The future looks bright for attorneys with both legal knowledge and skills and software knowledge and skills. The best and brightest will also be able to work with various kinds of specialized AI to do a variety of tasks, including AI-enhanced document review. If that is your interest, then jump onto the TAR Course and start your training today. Who knows where it may take you?

________


 

 


e-Discovery Team’s 2016 TREC Report: Once Again Proving the Effectiveness of Our Standard Method of Predictive Coding

February 24, 2017

Our Team’s Final Report of its participation in the 2016 TREC ESI search Conference has now been published online by NIST and can be found here; the final corrected version can be found here. TREC stands for Text Retrieval Conference. It is co-sponsored by a group within the National Institute of Standards and Technology (NIST), which in turn is an agency of the U.S. Commerce Department. The stated purpose of the annual TREC conference is to encourage research in information retrieval from large text collections.

The other co-sponsor of TREC is the United States Department of Defense. That’s right, the DOD is the official co-sponsor of this event, although TREC almost never mentions that. Can you guess why the DOD is interested? No one talks about it at TREC, but I have some purely speculative ideas. Recall that the NSA is part of the DOD.

We participated in one of several TREC programs in both 2015 and 2016, the one closest to legal search, called the Total Recall Track. The leaders, administrators of this Track were Professors Gordon Cormack and Maura Grossman. They also participated each year in their own track.

One of the core purposes of all of the Tracks is to demonstrate the robustness of core retrieval technology. Moreover, one of the primary goals of TREC is:

[T]o speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems.

Our participation in TREC in 2015 and 2016 has demonstrated substantial improvements in retrieval methodologies. That is what we set out to do. That is the whole point of the collaboration between the Department of Commerce and Department of Defense to establish TREC.

The e-Discovery Team has a commercial interest in participation in TREC, not a defense or police interest. Although from what we saw with the FBI’s struggles to search email last year, the federal government needs help. We were very unimpressed by the FBI’s prolonged efforts to review the Clinton email collection. I was one of the few e-discovery lawyers to correctly call the whole Clinton email server “scandal” a political tempest in a teapot. I still do and I am still outraged by how her email review was handled by the FBI, especially with the last-minute “revelations.”

The executive agencies of the federal government have been conspicuously absent from TREC. They seem incapable of effective search, which may well be a good thing. Still, we have to believe that the NSA and other defense agencies are able to do a far better job at large-scale search than the FBI. Consider their ongoing large-scale metadata and text interception efforts, including the once Top Secret PRISM operation. Maybe it is a good thing the NSA does not share its abilities with the FBI, especially these days. Who knows? We certainly will not.

The e-Discovery Team’s commercial interest is to transfer Predictive Coding technology from our research labs into commercial products, namely transfer our Predictive Coding 4.0 Method using Kroll Discovery EDR software to commercial products. In our case at the present time “commercial products” means our search methods, time and consultations. But who knows, it may be reduced to a robot product someday like our Mr. EDR.

The e-Discovery Team method can be used on other document review platforms as well, not just Kroll’s, but only if they have strong active machine learning features. Active machine learning is what everyone at TREC was testing, although we appear to have been the only participant to focus on a particular method of operation. And we were the only team led by a practicing attorney, not an academic or software company. (Catalyst also fielded a team in 2015 and 2016 headed by Information Science Ph.D., Jeremy Pickens.)

The e-Discovery Team wanted to test the hybrid multimodal software methods we use in legal search to demonstrate substantial improvements in retrieval methodologies on real-world problems. We have now done so twice, participating in both the 2015 and 2016 Total Recall Tracks. The results in 2016 were even better than 2015. We obtained remarkable results in document review speed, recall and precision; although, as we admit, the search challenges presented at TREC 2016 were easier than most projects we see in legal discovery. Still, to use the quaint language of TREC, we have demonstrated the robustness of our methods and software.

These demonstrations, and all of the reporting and analysis involved, have taken hundreds of hours of our time, but there was no other venue around to test our retrieval methodologies on real-world problems. The demonstrations are now over. We have proven our case. Our standard Predictive Coding method has been tested and its effectiveness demonstrated. No one else has tested and proven their predictive coding methods as we have done. We have proven that our hybrid multimodal method of AI-Enhanced document review is the gold standard. We will continue to make improvements in our method and software, but we are done with participation in federal government programs to prove our standard, even one run by the National Institute of Standards and Technology.


To prove our point that we have now demonstrated substantial improvements in retrieval methodologies, we quote below Section 5.1 of our official TREC report, but we urge you to read the whole thing. It is 164 pages. This section of our report covers our primary research question only. We investigated three additional research questions not included below.

__________

Section 5.1 First and Primary Research Question

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team’s Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR).

Again, as in the 2015 Total Recall Track, the Team attained very good results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team did so even though it only used five of the eight steps in its usual methodology, intentionally severely constrained the amount of human effort expended on each topic and worked on a dataset stripped of metadata. The Team’s enthusiasm for the record-setting results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult and the TREC relevance judgments had to be corrected in most topics.  …

This next chart uses the corrected standard. It is the primary reference chart we use to measure our results. Unfortunately, it is not possible to make any comparisons with BMI standards because we do not know the order in which the BMI documents were submitted.

Last_TREC_Chart_2016

 

The average results obtained across all thirty-four topics at the time of reasonable call using the corrected standard are shown below in bold. The average scores using the uncorrected standard are shown for comparison in parentheses.

  • 91.57% Recall (75.46%)
  • 65.90% Precision (57.12%)
  • 76.64% F1 (57.69%)
  • 124 Docs Reviewed Effort (124)
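For readers who want to check the arithmetic, F1 is the harmonic mean of recall and precision. The minimal sketch below recomputes it from the average recall and precision reported above under the corrected standard. The function name `f1_score` is our own, chosen for illustration; it is not taken from the TREC report or from any review software. Keep in mind that the harmonic mean of two averages is not in general the same as the average of per-topic F1 scores, so this is only a rough check.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision.

    Both inputs must use the same scale (both percentages or both fractions).
    """
    if recall + precision == 0:
        return 0.0  # avoid division by zero when nothing relevant was found
    return 2 * recall * precision / (recall + precision)


# Average scores under the corrected standard, as listed above (percentages).
avg_recall = 91.57
avg_precision = 65.90

print(f"F1 from averaged recall and precision: {f1_score(avg_recall, avg_precision):.2f}%")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two scores, which is why it sits closer to the precision figure than to the recall figure.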

At the time of reasonable call the Team had recall scores greater than 90% in twenty-two of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 95% was attained in fourteen topics. These Recall scores under the corrected standard are shown in the below chart. The results are far better than we anticipated, including six topics with total recall – 100%, and two topics with both total recall and perfect precision, topic 417 Movie Gallery and topic 434 Bacardi Trademark.

recall-scores-amended-2016

At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 75% in three more topics. Precision of greater than 95% was attained in nine topics. These Precision scores under the corrected standard are shown in the below chart. Again, the results were, in our experience, incredibly good, including three topics with perfect precision at the time of the reasonable call.

precision-scores-amended-2016

At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 75% in two more. F1 of greater than 90% was attained in eight topics. These F1 scores under the corrected standard are shown in the below chart. Note there were two topics with a perfect score, Movie Gallery (100%) and Bacardi Trademark (100%) and three more that were near perfect: Felon Disenfranchisement (98.5%), James V. Crosby (97.57%), and Elian Gonzalez (97.1%).

f1-scores-amended_2016

We were lucky to attain two perfect scores in 2016 (we attained one in 2015), in topic 417 Movie Gallery and topic 434 Bacardi Trademark. The perfect score of 100% F1 was obtained in topic 417 by locating all 5,945 documents relevant under the corrected standard after reviewing only 66 documents. This topic was filled with form letters and was a fairly simple search.

The perfect score of 100% F1 was obtained in topic 434 Bacardi Trademark by locating all 38 documents relevant under the corrected standard after reviewing only 83 documents. This topic had some legal issues involved that required analysis, but the reviewing attorney, Ralph Losey, is an SME in trademark law so this did not pose any problems. The issues were easy and not critical to understand relevance. This was a simple search involving distinct language and players. All but one of the 38 relevant documents were found by tested, refined keyword search. One additional relevant document was found by a similarity search. Predictive coding searches were run after the keyword searches and nothing new was uncovered. Here machine learning merely performed a quality assurance role to verify that all relevant documents had indeed been found.

The Team proved once again, as it did in 2015, that perfect recall and perfect precision are possible, albeit rare, using the Team’s methods and fairly simple search projects.

The Team’s top ten projects attained remarkably high scores with an average Recall of 95.66%, average Precision of 97.28% and average F-Measure: 96.42%. The top ten are shown in the chart below.

top-10_results

In addition to Recall, Precision and F1, the Team per TREC requirements also measured the effort involved in each topic search. We measured effort by the number of documents that were actually human-reviewed prior to submission and coded relevant or irrelevant. We also measured effort by the total human time expended for each topic. Overall, the Team human-reviewed only 6,957 documents to find all the 34,723 relevant documents within the overall corpus of 9,863,366 documents. The total time spent by the Team to review the 6,957 documents, and do all the search and analysis and other work using our Hybrid Multimodal Predictive Coding 4.0 method, was 234.25 hours.

It is typical in legal search to try to measure the efficiency of a document review by the number of documents classified by an attorney in an hour. For instance, a typical contract review attorney can read and classify an average of 50 documents per hour. The Team classified 9,863,366 documents by review of 6,957 documents taking a total time of 234.25 hours. The Team’s overall review rate for the entire corpus was thus 42,106 files per hour (9,863,366/234.25).

In legal search it is also typical, indeed mandatory, to measure the costs of review and bill clients accordingly. If we here assume a high attorney hourly rate of $500 per hour, then the total cost of the review of all 34 Topics would be $117,125. That is a cost of just over $0.01 per document. In a traditional legal review, where a lawyer reviews one document at a time, the cost would be far higher. Even if you assume a low attorney rate of $50 per hour, and review speed of 50 files per hour, the total cost to review every document for every issue would be $9,863,366. That is a cost of $1.00 per document, which is actually low by legal search standards.

Analysis of project duration is also very important in legal search. Instead of the 234.25 hours expended by our Team using Predictive Coding 4.0, traditional linear review would have taken 197,267 hours (9,863,366/50). In other words, the review of thirty-four projects, which we did in our part-time after work in one Summer, would have taken a team of two lawyers using traditional methods, 8 hours a day, every day, over 33 years! These kinds of comparisons are common in Legal Search.
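The speed, cost and duration comparisons in the last three paragraphs are simple arithmetic, and it can help to see them in one place. The sketch below recomputes them from the figures quoted in the report. The constant and function names are our own, for illustration only, and the hourly rates and the linear review speed of 50 files per hour are the assumptions stated in the text, not measured values.

```python
# Figures quoted in the report excerpt above.
CORPUS_DOCS = 9_863_366   # documents classified across all thirty-four topics
TEAM_HOURS = 234.25       # total attorney hours expended by the Team

# Assumptions stated in the text (illustrative, not measured).
HIGH_RATE = 500.0         # dollars per hour, "high" attorney rate
LOW_RATE = 50.0           # dollars per hour, "low" attorney rate
LINEAR_SPEED = 50.0       # files per hour for one-at-a-time review


def years_of_work(total_hours: float, reviewers: int, hours_per_day: float) -> float:
    """Calendar years needed if each reviewer works every day of the year."""
    return total_hours / (reviewers * hours_per_day * 365)


team_rate = CORPUS_DOCS / TEAM_HOURS            # files classified per hour
team_cost = TEAM_HOURS * HIGH_RATE              # total cost at the high rate
team_cost_per_doc = team_cost / CORPUS_DOCS     # cost per document classified

linear_hours = CORPUS_DOCS / LINEAR_SPEED       # hours to read every document
linear_cost = linear_hours * LOW_RATE           # total cost at the low rate
linear_years = years_of_work(linear_hours, reviewers=2, hours_per_day=8)

print(f"Team rate: {team_rate:,.0f} files per hour")
print(f"Team cost: ${team_cost:,.0f} (${team_cost_per_doc:.4f} per document)")
print(f"Linear review: {linear_hours:,.0f} hours, ${linear_cost:,.0f}")
print(f"Two lawyers, 8 hours a day, every day: {linear_years:.1f} years")
```

Changing the assumed rates or the linear review speed shifts the dollar figures, but the gap between the two approaches stays several orders of magnitude wide under any plausible inputs.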

Detailed descriptions of the searches run in all thirty-four topics are included in the Appendix.

___________

We also reproduce below Section 1.0, Summary of Team Efforts, from our 2016 TREC Report. For more information on what we learned in the 2016 TREC see also Complete Description in 30,114 Words and 10 Videos of the e-Discovery Team’s “Predictive Coding 4.0” Method of Electronic Document Review. Nine new insights that we learned in the 2016 research are summarized by the below diagram and more specifically described in the article.

predictive_coding_six-three-2

_________

Excerpt From Team’s 2016 Report

1.1 Summary of Team’s Efforts. The e-Discovery Team’s 2016 Total Recall Track Athome project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic the Team classified 9,863,366 documents in thirty-four review projects.

The topics searched in 2016 and their issue names are shown in the chart below. Also included are the first names of the e-Discovery Team member who did the review for that topic, the total time spent by that reviewer and the number of documents manually reviewed to find all of the relevant documents in that topic. The total time of all reviewers on all projects was 234.25 hours. All relevant documents, totaling 34,723 by Team count, were found by manual review of 6,957 documents. The thirteen topics in red were considered mandatory by TREC and the remaining twenty-one were optional. The e-Discovery Team did all topics.

trec-2016-topics

They were all one-person, solo efforts, although there was coordination and communications between Team members on the Subject Matter Expert (SME) type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for many of these topics. A detailed description of the search for each topic is contained in the Appendix.

In each topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as Irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance, determinations that took into consideration and gave some deference to the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussion between Team members as to the unexpected TREC opinions on relevance.

This contrasts sharply with participants in the Sandbox division. They never make any effort to determine where their software made errors in predicting relevance, or for any other reasons. They accept as a matter of faith the correctness of all TREC’s prior assessment of relevance. To these participants, who were all academic institutions, the ground truth itself as to relevance or not, was of no relevance. Apparently, that did not matter to their research.

All thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically face as attorneys leading legal document review projects. (If the Bush email had not been altered by omission of metadata, the searches would have been even easier.) The details of the searches performed in each of the thirty-four topics are included in the Appendix. The search challenges presented by these topics were roughly equivalent to the most simplistic challenges that the e-Discovery Team might face in projects involving relatively simple legal disputes. A few of the search topics in 2016 included quasi legal issues, more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed some, albeit very limited testing of legal judgment and analysis in determination of true relevance in these topics. In legal search relevancy, legal analysis skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

At Home participants were asked to track and report their manual efforts. The e-Discovery Team did this by recording the number of documents that were human reviewed and classified prior to submission. More were reviewed after submission as part of the Team’s TREC relevance checking. Virtually all documents human reviewed were also classified, although all documents classified were not used for active training of the software classifier. The Team also tracked effort by number of attorney hours worked as is traditional in legal services. Although the amount of time varied somewhat by topic, the average time spent per topic was only 6.89 hours. The average review and classification speed for each project was 42,106 files per hour (9,863,366/234.25).

Again, for the full picture and complete details of our work please see the complete 164 page report to TREC of the e-Discovery Team’s Participation in the 2016 Total Recall Track.

 

 

 

 

