Linked here, as a single document, is a slightly shortened version of my prior seven-part blog, Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. This narrative, which has been whittled down to only 82 pages, describes a multimodal legal search project using predictive coding. The 52-hour search project examined 699,082 Enron emails and attachments. I was looking for evidence of involuntary employee terminations. The report includes tons of screen shots and document examples, along with quality assurance analysis and explanations of my every move. Anyone may use and redistribute this narrative, so long as they do not change anything or charge anything for it.
Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron remains the only narrative of a search project ever written. But look for a video sequel soon, where I essentially do it all again using a different, monomodal approach.
The Predictive Coding Search Narrative is a blow-by-blow description of a search project with occasional editorial comments. I try to describe the review process in an accurate and detailed manner, yet still keep it interesting. That is a challenge, believe me, but I got a lot of positive feedback on the effort, especially from other specialists in the area. Still, keeping to the truth can be somewhat limiting, so for a snazzier, science-fiction effort that I wrote after this, see Journey into the Borg Hive. There I describe a legal search project set in the not-too-distant future where two competing predictive coding search methods battle against each other. The fictional effort is primarily designed to teach about quality assurance methods and different types of predictive coding search.
I am admittedly obsessed with predictive coding and its potential to change everything we know and do in legal search. I now teach it on a regular basis, as well as perform machine learning enhanced searches for clients. I am also doing research on various predictive coding methods, thanks to a generous license grant from my favorite vendor.
From my instructional activities on the subject, and feedback from other lawyers, I know that detailed narratives describing predictive coding enhanced searches are helpful. I also get positive feedback from information scientists and engineers who like to see how the new software is actually being used. For these reasons I will keep the narratives coming. My hope is that this work will help advance the profession to the next level of legal search, a level where Computer Assisted Review (“CAR,” or “TAR” for Technology Assisted Review) is intelligently used to find far more of the truth in large, complex, disorganized databases (like email) than ever before, and to do so at a fraction of the cost.
I know, it sounds too good to be true, but attorneys like me have now proven it time and time again. The main obstacle to widespread acceptance of such breakthrough technology, what some call disruptive technology, is that it is a very new approach to legal search, a very different approach. So far at least, not that many lawyers and vendors know how to do it right. (See the Preamble to the Grossman-Cormack Glossary on Technology-Assisted Review for more on how TAR is disruptive technology.) Another problem is that the prices some vendors charge are a bit too high, but prices seem to be coming down now as more and more of the top law firms in the world begin to use it. This raises the question: can you be a top law firm in litigation and not use some kind of predictive coding? Personally, I think not.
The attached narrative describes one of the two CAR search methods that are now competing against each other in the marketplace. It is the multimodal method, where machine learning is supplemented with other forms of search, such as keyword, similarity, and concept. These other search methods are used to help find relevant documents to train the machine. It is an active learning process involving both man and machine, but with man firmly in control. The contrary approach, which I will report on soon in a series of videos, only uses predictive coding and omits the other forms of search. It is monomodal: the machine rules and takes over the search.
Stay tuned for coming videos, accompanied by detailed notes, where I describe the monomodal approach to searching the same 699,082-document Enron database. In the meantime, please feel free to share the attached Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron.