More Enhancements to the TAR Course with New Videos on the Importance of Keyword Search, Blair Maron, the Search Quadrant and a Similarity Search Tip

May 28, 2017

Many new enhancements were made to the TAR Course this weekend, including additions and revisions to the written materials, new graphics, new homework (for the first time) for the Twelfth Class (Random Prevalence), along with two new videos, one for the Sixth Class (Similarity Searches) and a longer one for the Seventh Class on the Search Quadrant and the classic Blair Maron research. The videos are reproduced below for the convenience of those who have already gone through the course or otherwise may be curious about my latest thoughts on legal search.

The Seventh Class is entitled Keyword and Linear Review. The new video gives background on legal search in general, and Keyword search in particular, including its known limitations. It is shown in two parts. I start off simply, explaining the basic terminology, but eventually get to some more nuanced points, including discussion of the Search Quadrant and the Blair and Maron study.


In spite of the limits of keyword search, we still use a sophisticated form of keyword search in every project, especially at the beginning of a project. We use tested, Boolean Parametric keyword search to find the low hanging fruit. That is part of Step Two of our eight-part method. It is also part of Step Six. We feed the documents we find by this, and all other methods, into our training matrix for our machine learning. That is part of Step Four. The eight steps in our Predictive Coding 4.0 method are covered in Classes Nine through Fifteen of the sixteen-class TAR Course.

One of the things we learned at our 2016 experiments at TREC was that keyword search is more valuable than we had originally thought, when done right and when done in a relatively simple search project. But still, when keyword search is done in a naive Go Fish manner, it is very poor at Recall and Precision, even in simple cases. In complex projects even sophisticated keyword search needs to be supplemented with the more powerful machine learning algorithms. Even the best forms of keyword search can only work well alone in projects with simple data, a clear target and a good SME. The war story in part two of my video above demonstrated that.
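For readers new to the two metrics named above, here is a minimal sketch of how Recall and Precision are calculated for a single search. The function name and the document ID numbers are hypothetical and chosen only for illustration. They are not taken from the TREC experiments or from any review software.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for one search.

    retrieved_ids: documents the search returned.
    relevant_ids:  documents that are actually relevant (in practice
                   this is only estimated, for example by sampling).
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    true_positives = len(retrieved & relevant)

    # Precision: what share of the retrieved documents are relevant?
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant documents were retrieved?
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: a keyword search returns four documents,
    # two of which are among the eight truly relevant documents.
    hits = [1, 2, 3, 4]
    truly_relevant = [3, 4, 5, 6, 7, 8, 9, 10]
    p, r = precision_recall(hits, truly_relevant)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up example half of what the search returned is relevant, yet it found only a quarter of the relevant documents. That is the kind of gap between apparent and actual success that the Blair and Maron study made famous.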

The second new video is a short one providing a search tip on one way to use Similarity Searches. It was added to the Sixth Class.



Here is one of the new graphics I added. It uses a photo of the Compact Muon Solenoid (CMS) detector in the Large Hadron Collider. That is the famous seventeen mile long particle accelerator that straddles the border of Switzerland and France. It is the largest machine in the world and was built by the European Organization for Nuclear Research (CERN).

This photo of a key component of the world’s most sophisticated electronic tool is shown with a lift in place. The lift allows engineers to step-in and keep the technology in good working order. (Stepping-In is discussed in Davenport and Kirby, Only Humans Need Apply, and by Dean Gonsowski, A Clear View or a Short Distance? AI and the Legal Industry, and A Changing World: Ralph Losey on “Stepping In” for e-Discovery. Also see: Losey, Lawyers’ Job Security in a Near Future World of AI, Part Two.) The lift in the Hadron photo illustrates the importance of humans to maintain and operate all of the new technologies we are creating. It is truly a Man-Machine hybrid relationship, just like predictive coding, where we lawyers need to step-in and enhance our evidence finding by working with our own new technology tools.

I chose the CERN CMS because it is the ultimate technology tool now existing to enhance human capabilities, in this case to see elementary particles. The tool makes and records forty million measurements per second of high energy particle collisions. To understand my enthusiasm for the Compact Muon Solenoid in the Large Hadron Collider, the beauty of the design and boldness of the experiments, check out a few instructional videos. Start with this one by the BBC, then, if you are interested, watch a few more. The one below allows for a 360 view that you control.


Back to stepping-in and double-loop IST training: this is taught in the Fifth Class of the TAR Course. That class is called Balanced Hybrid and Intelligently Spaced Training. We use IST, Intelligently Spaced Training, a form of continuous active learning, as part of our process to select documents to use for machine training. This allows us to set up a Double Feedback Loop, where we both teach and learn to better understand the machine’s training needs. IST and double-loop training are advanced concepts and techniques taught throughout the TAR Course, but featured in the Fifth Class. The writing in this class was also slightly improved and expanded. Here is one of the new graphics for that class. The class now explains that the extra control provided by the IST method provides more wiggle-room for human creativity and innovation. (This next graphic is not a GIF animation. It is an optical illusion based on the work of the Japanese experimental psychologist, Akiyoshi Kitaoka. The image itself is static.)

Another photo of the CERN collider without the lift is shown below. This graphic was added to the Second Class, on TREC Total Recall Track, 2015 and 2016. It illustrates the importance of experiments and research to the e-Discovery Team’s current understanding of the three primary quality controls in TAR: (1) Method, (2) Software and (3) SME.

These three QC process factors are explained in the Eighth Class, SME, Method, Software; the Three Pillars of Quality Control. In this class we discuss the debate between AI leading to automation versus IA, intelligence augmentation. We advocate for enhancement and empowerment of attorneys by technology, including quality controls and fraud detection. We oppose delegation of control to the machine for document review. See Why the ‘Google Car’ Has No Place in Legal Search.

This delegation to automated methods will not stop fraud as the full-automation side argues. The SMEs are still programming relevance input. But it will decrease precision and so drive up the costs of review. It will also result in too many lost black swans when a bad stop decision is made. There are other more effective ways to guard against a crooked attorney than trying to remove the human attorney from the equation. Experienced lawyers can already detect omissions, especially when using ranking based searches.

I predict the attorney fraud issue will be further addressed soon by specialized software designed to look for fraud indicators in ESI productions. That will further empower attorneys to detect the few trying to rig the system, and many more just operating out of some level of negligence. The abusers of discovery must and will be stopped.

Software should come on the market soon to help everyone police the ethics and competence of all counsel involved in discovery. Such software would also help in quality control of productions.

I think creation of this software is one of the biggest technology opportunities in the law today. Think you already have the solution? Want to put together, or further fund, a start-up to create new products in this area? Then contact me.

Finally, I also added new writings and some challenging homework assignments for the Twelfth Class. This class covers Step Three – Random Prevalence, of the Team’s standard eight-step workflow. In this step a little math is required, so I added some more explanations and detailed exercises. This should make it easier to learn this new knowledge. Now only the fourteenth, fifteenth and sixteenth classes do not have homework assignments. They will be added soon enough. Consider this a rolling production.
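To give a flavor of the kind of math involved in a random prevalence estimate, here is a minimal sketch. It is not one of the homework exercises from the Twelfth Class. It assumes a simple random sample and uses the textbook normal-approximation interval, which is only one of several ways to do the calculation. The function name, the sample counts and the collection size are all hypothetical.

```python
import math


def estimate_prevalence(sample_size, relevant_in_sample, collection_size, z=1.96):
    """Estimate prevalence from a simple random sample.

    Uses the normal-approximation (Wald) interval; z=1.96 corresponds
    to roughly 95% confidence. All inputs here are illustrative.
    """
    if sample_size <= 0 or not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("sample counts are inconsistent")

    # Point estimate: share of the sample coded relevant.
    p = relevant_in_sample / sample_size
    # Margin of error under the normal approximation.
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - margin)
    high = min(1.0, p + margin)

    return {
        "prevalence": p,
        "interval": (low, high),
        # Project the range onto the whole collection.
        "estimated_relevant": (
            round(low * collection_size),
            round(p * collection_size),
            round(high * collection_size),
        ),
    }


if __name__ == "__main__":
    # Hypothetical project: 1,500 randomly sampled documents, 150 coded
    # relevant, drawn from a collection of 1,000,000 documents.
    result = estimate_prevalence(1500, 150, 1_000_000)
    print(result)
```

With these made-up numbers the point estimate is ten percent prevalence, with a margin of error of roughly one and a half percentage points either way. The normal approximation becomes unreliable when prevalence is very low or the sample is small, so other interval methods may be preferable in those cases.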








Team’s TAR Course has been Updated and Expanded

May 21, 2017

I was in Manhattan at the Marriott on Times Square this week presenting on advanced TAR with Jim Sullivan. We provided an overview on the Team’s latest methods, Hybrid Multimodal IST Predictive Coding 4.0. Presentations like this allow you to interact with students and refine your approach. It was a good group, as is typical in NYC, of lawyers and litigation support experts.

I left N.Y. on Wednesday night, well before the madman drove his car into Times Square on Thursday. Jim stayed overnight and walked out of Times Square just minutes before this horrible massacre. My condolences to the family of the nineteen year old girl who was killed and the twenty-two other pedestrians who were injured.

When I got home from NYC I condensed our eighty-five minute presentation into a thirty-seven minute video. It now serves as the core video introduction to the e-Discovery Team’s free TAR Course. It is found in the first of the sixteen classes in the Course. I also revised and improved the wording in the Welcome Page of the course and made it a stand alone entry point. To have a little more fun with all of this I also created a new graphic, shown below. It provides a visualization of the core content of the TAR Course. Click on it to see a larger view.

This TAR Course welcome page now has its own written and video content. That material used to be combined with the first class. So the net result is an expansion of the TAR Course from sixteen to seventeen modules. At this point the first eleven classes have “Homework Assignments” at the end with suggested supplemental readings and analytic challenges. We will be adding homework to the last five classes in the next month.

If you have already studied the first class, I urge you to go back and reread and re-view the writings and new three-part video. I also reproduce the same new video below. I call it an overview video, a first tell, but it contains advanced materials and some of my latest thinking.






Go to the TAR Course.
