This month my blog is not just an article, but a whole new website and Internet domain.
The new website introduces a new legal service, ZEN Document Review. It includes three short videos of me talking and, as usual for me, lots of words and graphics. This new service is part of the social transition that I wrote about last month: Information → Knowledge → Wisdom: Progression of Society in the Age of Computers. It represents a post-information approach to legal work, specifically document review, that goes beyond the first level of Information services. ZEN Document Review is instead a service based on Knowledge and Wisdom. It is here now, but represents the future.
On this new website, Zero Error Numerics, I share for the first time some of the inner side of how I work. I also share more about the quality control procedures that I have developed for predictive coding based document review.
Go to ZeroErrorNumerics.com and see what I mean, including especially A Word About Zen Meditation. Unlike Steve Jobs, I am not a Zen Buddhist, but, like Steve and many others, I am a life-long meditator. I am also a lawyer and futurist with certain ideas as to what the next two stages of society should look like. I talked about this in Information → Knowledge → Wisdom. My major creation this month, Zero Error Numerics, implements these ideas in the field of work that I know.
Zero Error Numerics represents a knowledge and wisdom based approach to legal search and document review. It is not as weird as you might think. As I point out in A Word About Zen Meditation, about 25% of mainstream corporate America now encourages meditation at work for its physical and mental health benefits. It creates a good vibe to control stress and get things done.
Postscript to Data Visualization with Thanks to Kroll Ontrack
On another subject entirely, I have a postscript on my prior blog, Visualizing Data in a Predictive Coding Project – Parts One, Two and Three. I wrote that series in November 2014. You may recall my challenge to all vendors to include a probability distribution graphic along the lines I described in my blog. I asked for software developers to include such a graphics feature in future versions. I wanted to have a visual display of the relevance ranking of all documents in a predictive coding project. I made no special calls, nor even once asked Kroll Ontrack, my firm’s preferred vendor, to step up to the plate and do it. Yet, being the company that they are, they quietly added that feature in the version they released this Spring and waited to see when I would notice. It is an early, simple version, but it is there and it works well. That’s the way KO rolls, on track and ahead of the pack.
You may also recall that I shared a graphic in the Visualizing Data blog to show a probability distribution visualization. At the time, during the first three quarters of 2014, I was often seeing in my mind’s eye the kind of rankings that looked like an upside down champagne glass, shown right. I would typically see such a distribution at or near the end of active machine learning. I wanted a software feature that would take it out of my mind’s eye, my imagination, and put it onto the computer screen. My projects then would typically shape the data so that most documents were either highly ranked irrelevant, which I visualized as near the bottom of a vertical array in blue, or highly ranked relevant, which I visualized on the top in red. I was also finding documents in between, with a more gradual slope of irrelevance at the bottom than of relevance at the top. That is why it had an upside down champagne glass look.
If you take my graphic and turn it 90° clockwise, so it goes left to right, from irrelevant to relevant, and then flatten it out, it would look like this.
Kroll Ontrack met my challenge and implemented a visualization of data ranking by using a horizontal bar graph approach. Thanks and kudos to my software development friends at Kroll Ontrack for such a quick response. You never let me down. There is good reason that Kroll Ontrack was chosen by National Law Journal readers in 2014 as the leading predictive coding technology in the industry.
The latest version of KO’s software, which they call EDR, for EDiscovery.com Review, includes a cool graphics tool that does the job of visualizing probability data ranking. It is included in the Technology Assisted Review Metrics page under Probability Distribution. They basically added a spreadsheet bar graph display. You can also see the probability distribution in numeric table form for exact metrics of the probability distributions. You also have the choice to see the probability graph in increments of 5%, 10% or 25%. The screen shot below shows 10% increments. The bar graph display shows the probability ranking from left to right, irrelevant to relevant. Here is a screen shot from a recent project after training was complete. You can click on the graph to see a larger version.
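To make the bucketing idea concrete, here is a minimal sketch of my own (not Kroll Ontrack’s actual code, and EDR’s internals are not public): it groups each document’s relevance probability into buckets of a chosen increment, the way the Probability Distribution graph groups them into 5%, 10% or 25% bands.

```python
# Illustrative sketch only; the function name, data, and logic are my own
# assumptions, not EDR's implementation.
from collections import Counter

def probability_distribution(scores, increment=10):
    """Count documents per probability bucket (0-9%, 10-19%, ... with increment=10)."""
    counts = Counter()
    for p in scores:
        # A score of exactly 1.0 falls into the top bucket.
        bucket = min(int(p * 100) // increment * increment, 100 - increment)
        counts[bucket] += 1
    return [(b, counts.get(b, 0)) for b in range(0, 100, increment)]

# Hypothetical data shaped like a polarized, "well-formed" ranking:
# most documents near 0% or 100%, very few in the uncertain middle.
scores = [0.02] * 90 + [0.5] * 2 + [0.97] * 8
for bucket, n in probability_distribution(scores):
    print(f"{bucket:3d}%-{bucket + increment_end if False else bucket + 9:3d}%  " + "#" * n)
```

Printed sideways like this, the counts form the same left-to-right bar graph, irrelevant on the left and relevant on the right.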
This project had about a 4% prevalence of relevant documents, so it made sense for the relevant half to be far smaller. But what is striking about the data stratification is how polarized the groupings are. This means the ranking distribution separation, relevant and irrelevant, is very well-formed. There are extremely few documents where the AI is unsure of classification. The slow curving shape of irrelevant probability on the left (or the bottom of my upside down champagne glass) is gone.
The visualization shows a much clearer and complete ranking at work than I had ever seen before. The AI is much more certain about what documents are irrelevant. Below is a screenshot of the table form display of this same project in 5% increments. It shows the exact numerics of the probability distribution in place when the machine training was completed. This is the most pronounced polar separation I have ever seen, which shows that my training on relevancy has been well understood by the machine.
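One way to put a number on this polar separation, again a hypothetical gauge of my own rather than any EDR feature, is the share of documents ranked outside an uncertain middle band, say between 25% and 75% probability. The higher that share, the more confident the machine is about its classifications.

```python
# Hypothetical polarization gauge; the thresholds and sample data are
# my own assumptions for illustration, not an EDR metric.
def polarization(scores, low=0.25, high=0.75):
    """Fraction of documents ranked confidently, i.e. below low or above high."""
    confident = sum(1 for p in scores if p < low or p > high)
    return confident / len(scores)

# Same hypothetical polarized ranking as before: 4%-range prevalence,
# nearly everything pushed to the extremes.
scores = [0.02] * 90 + [0.5] * 2 + [0.97] * 8
print(f"{polarization(scores):.0%} of documents confidently classified")  # prints "98% ..."
```

A distribution with the old champagne-glass slope would score noticeably lower on this gauge, because more documents would sit in the middle band.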
I am unsure of the reason for this significant change in probability distribution from what I routinely saw last year. It could just be a chance event. Time will tell. It could also just be a peculiarity of this data and search project, but it did seem typical to me, and certainly a prevalence of just over 4% is common. It could also be a result of some of the latest enhancements to the predictive coding functions in Kroll Ontrack’s EDR. The distribution attained might be more pronounced because the software is smarter. They are always working to make it better. That is how you stay number one.
The better results shown here might even be explained by improvements in my methods and my team’s performance. Maybe we are more relaxed and in the flow now than ever before. Who knows. It could also be some combination of these factors. I will keep a careful eye on the probability distributions in the future to see if this is the new normal, or just a lucky fluke.
Either way, in my experience the active machine learning, aka predictive coding, functions of Kroll Ontrack’s EDR software are working very well. It is a powerful and sophisticated tool. Like a top race car, it is hard to beat when driven correctly. Still, if you do not know how to drive, the best race car in the world will never win. If you combine both a bad car and poor driver, you may well get the world’s largest manual review project. I am told this kind of disaster happens all too often.
What passes as a good faith use of predictive coding by some law firms is a disgrace. Of course, if hide-the-ball is still your real game of choice, then all of the good software in the world will not make any difference. Keep breaking the law like that and someday you are bound to crash and burn. See, e.g., my prior articles: Discovery As Abuse, There Can Be No Justice Unless Lawyers Maintain High Ethical Standards, and E-Discovery Gamers: Join Me In Stopping Them.