More Enhancements to the TAR Course with New Videos on the Importance of Keyword Search, Blair Maron, the Search Quadrant and a Similarity Search Tip

Many new enhancements were made to the TAR Course this weekend, including additions and revisions to the written materials, new graphics, new homework (for the first time) for the Twelfth Class (Random Prevalence), along with two new videos, one for the Sixth Class (Similarity Searches) and a longer one for the Seventh Class on the Search Quadrant and the classic Blair Maron research. The videos are reproduced below for the convenience of those who have already gone through the course or otherwise may be curious about my latest thoughts on legal search.

The Seventh Class is entitled Keyword and Linear Review. The new video gives background on legal search in general, and Keyword search in particular, including its known limitations. It is shown in two parts. I start off simple explaining the basic terminology but eventually get to some more nuanced points, including discussion of the Search Quadrant and the Blair and Maron study.


In spite of the limits of keyword search, we still use a sophisticated form of keyword search in every project, especially at the beginning of a project. We use  tested, Boolean Parametric keyword search to find the low hanging fruit. That is part of Step Two of our eight-part method. It is also part of Step Six. We feed the documents we find by this, and all other methods, into our training matrix for our machine learning.That is part of Step-Four. The eight steps in our Predictive Coding 4.0 method are covered in Classes Nine through Fifteen of the sixteen class TAR Course.

One of the things we learned at our 2016 experiments at TREC was that keyword search is more valuable than we had originally thought, when done right and when done in a relatively simple search project. But still, when keyword search is done in a naive Go Fish manner, it is very poor at Recall and Precision, even in simple cases. In complex projects even sophisticated keyword search needs to be supplemented with the more powerful machine learning algorithms. Even the best forms of keyword search can only work well alone in projects with simple data, a clear target and a good SME. The war story in part two of my video above demonstrated that.

The second new video is a short one providing a search tip on one way to use Similarity Searches. it was added to the Sixth Class.



Here is one of the new graphics I added. It uses a photo of the Compact Muon Solenoid (CMS) detector in the Large Hadron Collider. That is the famous seventeen mile long particle accelerator that straddles the border of Switzerland and France. It is the largest machine in the world and was built by the European Organization for Nuclear Research (CERN).

This photo of a key component of the world’s most sophisticated electronic tool is shown with a lift in place. The lift allows engineers to step-in and keep the technology in good working order. (Stepping-In is discussed in Davenport and Kirby, Only Humans Need Apply, and by Dean Gonsowski, A Clear View or a Short Distance? AI and the Legal Industry, and A Changing World: Ralph Losey on “Stepping In” for e-Discovery. Also see: Losey, Lawyers’ Job Security in a Near Future World of AI, Part Two. The lift in the Hadron photo illustrates the importance of humans to maintain and operate all of the new technologies we are creating. It is truly a Man-Machine hybrid relationship, just like predictive coding, where we lawyers need to step-in and enhance our evidence finding by working with our own new technology tools.

I chose  the CERN CMS because it is the ultimate technology tool now existing to enhance human capabilities. In this case to see elementary particles. The tool makes and records forty million measurements per second of high energy particle collisions. To understand my enthusiasm for the Compact Muon Solenoid in the Large Hadron Collider, the beauty of the design and boldness of the experiments, check out a few instructional videos. Start with this one by the BBC, then, if you are interested, watch a few more. The one below allows for a 360 view that you control.


Back to the stepping-in, double loop IST training, this is taught in the fifth class of the TAR Course. That class is called Balanced Hybrid and Intelligently Spaced Training. We use IST, Intelligently Spaced Training, a form of continuous active learning, as part of our process to select documents to use for machine training. This allows us to set up a Double Feedback Loop, where we both teach and learn to better understand the machine’s training needs. IST and double-loop training are advanced concepts and techniques taught throughout the TAR Course, but featured in the Fifth Class. The writing in this class was also slightly improved and expanded. Here is one of the new graphics for that class. The class now explains that the extra control provided by the IST method provides more wiggle-room for human creativity and innovation. (This next graphic is not a giff animation. It is an optical illusion based on work of the Japanese experimental psychologist, Akiyoshi Kitaoka. The image itself is static.)

Another photo of the CERN collider without the lift is shown below. This graphic was added to the Second Class, on TREC Total Recall Track, 2015 and 2016. It illustrates the importance of experiments and research to the e-Discovery Team’s current understanding of the three primary quality controls in TAR: (1) Method, (2) Software and (3) SME.

These three QC process factors are explained in the Eighth Class, SME, Method, Software; the Three Pillars of Quality Control. In this class we discuss the debate between AI leading to automation, versus, IA, intelligence augmentation. We advocate for enhancement and empowerment of attorneys by technology, including quality controls and fraud detection. We oppose delegation of control to the machine for document review. See Why the ‘Google Car’ Has No Place in Legal Search.

This delegation to automated methods will not stop fraud as the full-automation side argues. The SMEs are still programing relevance input. But it will decrease precision and so drive up the costs of review. It will also result in too many lost black swans when a bad stop decision is made. There are other more effective ways to guard against a crooked attorney then trying to remove the human attorney from the equation. Experienced lawyers can already detect omissions, especially when using ranking based searches.

Finally, I also added new writings and some challenging homework assignments for the Twelfth Class. This class covers Step Three – Random Prevalence, of the Team’s standard eight-step workflow. In this step a little math is required, so I added some more explanations and detailed exercises. This should make it easier to learn this new knowledge.  Now only the fourteenth, fifteenth and sixteenth classes do not have homework assignments. They will be added soon enough. Consider this a rolling production.








Leave a Reply