In predictive coding the computer classifies and ranks an entire dataset based on an extrapolation of training documents selected by reviewers. The intelligence of the computer arises from the documents selected as input for training. Although good predictive coding software is designed to allow for, and adjust for, a certain number of inconsistencies in the training documents, the fact remains that the computer bases its analysis on the input provided. The computer has no innate knowledge of, or expertise in, the issues in the case.
There are natural mathematical limits on how well any software can smooth out the inconsistencies of input. Both accuracies and errors in judgment in document selection will be extrapolated and propagated across the entire database. What predictive coding does is incredible, but not magic. It is a useful tool, the best we now have to cope with information inflation, but nothing more. It enhances the human intelligence fed into it, and applies the judgment extrapolations consistently at lightning speed. It even recognizes and ignores some inconsistencies in the information provided, inconsistencies in document classifications. The GIGO rule, garbage in – garbage out, can be mitigated, but not eliminated. The software can detect and correct for inconsistencies, and that in itself is impressive. But there are limits to the correction capacities, and trade-offs, and in the end Bayesian probabilities based on metrics must prevail. In that sense GIGO is like the Second Law of Thermodynamics (entropy): it can be smoothed out, but never eliminated.
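To make the GIGO point concrete, here is a minimal sketch in Python. It is a toy simulation under stated assumptions, not how any vendor's predictive coding software actually works: each hypothetical document is reduced to a single relevance score, a document is truly relevant when that score is above 0.5, reviewer mistakes are modeled as random label flips, and the "software" is a bare nearest-neighbor classifier that copies the coding of the closest training document. All of the function names and numbers are invented for illustration.

```python
import random

def make_documents(n, rng):
    """Hypothetical documents: one score in [0, 1]; truly relevant if score > 0.5."""
    scores = [rng.random() for _ in range(n)]
    return [(s, s > 0.5) for s in scores]

def flip_labels(docs, error_rate, rng):
    """Model reviewer mistakes: flip each training label with probability error_rate."""
    return [(s, (not y) if rng.random() < error_rate else y) for s, y in docs]

def predict(training, score):
    """1-nearest-neighbor: copy the label of the closest training document."""
    nearest = min(training, key=lambda d: abs(d[0] - score))
    return nearest[1]

def accuracy(training, test):
    """Fraction of test documents whose predicted label matches the true label."""
    correct = sum(predict(training, s) == y for s, y in test)
    return correct / len(test)

def run(error_rate, seed=0):
    """Train on 200 documents coded with the given error rate, test on 2000 clean ones."""
    rng = random.Random(seed)
    training = flip_labels(make_documents(200, rng), error_rate, rng)
    test = make_documents(2000, rng)
    return accuracy(training, test)

if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3):
        print(f"training error rate {rate:.0%}: accuracy {run(rate):.3f}")
```

In this toy model every wrong call in the training set is copied onto all of the nearby documents in the larger collection, so accuracy falls as the training error rate rises. Real predictive coding software smooths over some of this noise, which is the mitigation described above, but it cannot recover what the training never told it.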
For these reasons it is a fundamental law of predictive coding that when it comes to document training and number of reviewers, less is more. Predictive coding will still work with five reviewers selecting the training documents, but it will work even better with four. Three reviewers are even better, and better still is training done by only two humans. But the ideal, the gold standard, is one expert doing all of the training by himself or herself. All other things being equal, including expertise and search skills, one expert will beat two any day, and the competition of one against five will not even be close. Forget about one against fifty. The gold standard for predictive coding training is one. This is an inconvenient truth for vendors, indeed for most of the legal profession, but, like it or not, that is the way it is.
Why Less is More is an Inconvenient Truth to the Establishment
If you want to understand the legal profession, and why it does the things it does, or does not do the things you would expect it to, you have to look at more than just logic and justice. Although lawyers and their vendors are part of the justice system, and in theory law is a profession, not a business, lawyers and law firms are still part of the economy. With vendors that is crystal clear. Moreover, professionals or not, lawyers are all too human. To understand the legal profession you have to look not only at the lofty goals, which do drive many, if not most, lawyers; you must also look at money and power. Like it or not, that is also the way it is. It is naive to pretend otherwise.
Why, for instance, hasn’t the legal profession already adopted predictive coding? It has been proven faster and cheaper, and has been approved by many courts. It has also been widely publicized. So, why are the vast majority of law firms sticking to linear review or keyword search? Yes, technophobia and ignorance are part of the answer, but only part. After all, lawyers as a group are very smart, albeit not usually tech-smart. To understand any human and organizational problem, including this, you must understand the money flows and the ego power-trips involved. Truth and justice are strong forces, but never underestimate the dark side, the strong motivators of fear and greed.
The Rand Report, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, was brave and honest enough to point out that one of the major reasons law firms are reluctant to adopt predictive coding is law firm greed. Here is how the authors of the Rand Report, Nicholas M. Pace and Laura Zakaras, describe it at page 76:
Resistance of External Counsel. Another barrier to the widespread use of predictive coding could well be resistance to the idea of outside counsel motivated not so much by accuracy issues as by the potential loss of a historical revenue stream. Some interviewees reported grumblings from outside counsel when their companies decided to directly handle a fraction of the overall review process or to markedly reduce what was shipped out for review through the use of additional data processing.
The less is more principle in machine training is a threat to existing economic models in the legal profession, especially to contract review groups both in and outside of law firms. Although, as I will explain, the less is more principle does not mean the end of the contract review business, it does mean it will be scaled-down dramatically. Money will be lost. Empires destroyed. This will happen as predictive coding becomes more widely used, and properly used. Proper use requires consistency in training, which is where less is more comes in. The more reviewers, the more inconsistency and noise in the machine training, the lower the effectiveness.
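A little arithmetic shows why adding reviewers adds inconsistency. The sketch below is a simplified model of my own, not a result from the Rand Report or any study: it assumes every reviewer makes a yes-or-no relevance call, is right with the same probability, and errs independently of the others. The function name and the 90% figure are hypothetical.

```python
def unanimous_agreement(p_correct, n_reviewers):
    """Chance that n independent reviewers all make the same yes/no call on one
    document, when each is right with probability p_correct (an assumed value).
    They agree only if all of them are right or all of them are wrong."""
    return p_correct ** n_reviewers + (1 - p_correct) ** n_reviewers

if __name__ == "__main__":
    # Hypothetical reviewers who each code correctly 90% of the time.
    for n in (1, 2, 5, 50):
        print(f"{n:>2} reviewers: chance all agree = {unanimous_agreement(0.9, n):.3f}")
```

Under these assumptions the chance that the whole team codes a given document the same way shrinks with every reviewer added, even though no individual reviewer got any worse. A single reviewer is at least consistent with himself or herself.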
Proper use also requires use of bona fide experts in the legal subjects and issues in the case. The necessity for SMEs is a related, but separate issue, well known to experts in search. I will address this important issue more extensively in another article. There is some interesting research going on in this area as law firms and software companies struggle with this requirement too.
Suffice it to say in this article that proper use of predictive coding technology requires the machine to be trained by bona fide SMEs, real experts with years of experience, not just inexperienced lawyers who will work cheaply. Once again certain vested interests that serve our profession are threatened by this truth. The review companies are rolling in money right now from the sweat of contract lawyers and will make a concerted effort to hide this truth, as well as the truth of less is more. The same can be said of many large and powerful law firms. Watch as storm troopers and P.R. machines are rolled out in the coming months to oppose and obfuscate this message and other truths that threaten the status quo. Understand the big picture of what is really going on here.
Do not be concerned about me, I have already endured many attacks. I expect more. Comes with the turf. But no matter what they do, I will not be stopped. More importantly, there are many others who agree and feel as I do, many others out there boldly speaking the truth of technology, including generations of young lawyers lined up behind us. Together the defeat of the dark forces is inevitable. Do not be cowed by the strength of the big law firms and other vested interests. History is on your side. Speak truth to power. Many generations of brave Americans have sacrificed to give you these First Amendment rights. Use them. Speak out the next time a law firm wants to use non-SMEs, or worse, armies of contract lawyers to do machine training.
Most of the law firms and vendors in the world will not hear the call. They will go the obstructionist Luddite way, ignorant of the lessons of technology history. Most, but not all. A few law firms and review companies have already seen the writing on the wall. A few more will likely follow soon, rather than oppose us. They get it. They are already preparing for the next generation of technology workers. They are not wasting time trying to silence an inconvenient truth. They are embracing the disruptive technologies. They are positioning themselves to profit from the inevitable carnage of the existing systems.
Savants like Gregory Bufithis come to mind. He is the Founder and CEO of the well known The Posse List LLC and Project Counsel SCS, a contract lawyer staffing services company starting to work more and more with SMEs. Project Counsel serves Europe, the Persian Gulf, and the Asia-Pacific region as described in the YouTube video below. Greg, a former NY City intellectual property lawyer, now runs a growing empire from Brussels. He has already positioned his many companies for the coming changes, including Project Counsel.
Greg, who rarely seems to sleep, is at the same time making a deep-dive study of artificial intelligence and document review. I for one am looking forward to the release of his e-book IBM: a culture of innovation and analytics, which is nearing completion. Greg understands and is acting in accord with the Hacker Ethic: Fast, Bold, Impactful, Open and Social Values. Visionary entrepreneurs like Greg Bufithis are pushing change, not fighting it. They understand that what is an inconvenient truth for most can be an opportunity for some.
Many in the U.S. legal world, even though they mean well, are simply not as quick and forward thinking as Greg Bufithis. For example, I have now heard from several sources that many law firms experimenting with predictive coding software are doing so without the active leadership of SMEs. They instead continue their old systems. They use teams of reviewers to do the machine training. The reviewers may have had some training time with an SME, or read a manual, but that is it. The firms may also continue to keyword screen before they begin predictive coding because that’s the way they have always done things.
Then these same law firms wonder why predictive coding did not work very well for them! Not only are too many cooks spoiling the pot (the less is more principle), but none of the cooks are that good to begin with. You cannot crowd-source expertise; all you get is a general consensus and the lowest common denominator. That is not a meal I would want to eat. Yet too many clients today are given little choice but to eat such junk foods.
Current Shortage of Skilled Labor
For a predictive coding process to work, it must be done by small teams of SMEs, ideally one, that is, if you can find one with the time and availability to do it. That is another very real problem here. Quite often firms are unable to find any SME with the time and capability to do machine training, much less one with extensive search skills. Since they lack the talent to do predictive coding properly, these firms have no choice but to make do, and settle with whatever talent they have. They do not use SMEs for training, they use document reviewers with some level of supervision by knowledgeable attorneys who are not bona fide SMEs, but may have access to one. Most of the SMEs in most law firms are my age, which means generally clueless when it comes to technology, much less advanced analytics.
If you are a law firm following this approach, do not be surprised if your predictive coding results are disappointing. The vendor's software may be fine, but without the SME touch, you are just not using it right (despite all of the reassuring words the vendor may give you). Your results are poor because you have been hit by the GIGO problem: you have been derailed by inconsistent, all-too-often wrong input.
That is why the less is more principle, coupled with the bona fide SME requirement, creates such an inconvenient truth. It is both a disruptive model to existing monetary interests and power structures and a difficult model to follow. It is difficult primarily because of the current shortage of trained search experts who are also SMEs. There are simply not enough legal subject matter experts who are also able to use the software and have general search skills.
As discussed in LegalSearchScience.com, Professor Gary Marchionini, a founder of Human–computer information retrieval, and others, conclude that three types of expertise are needed for a successful information seeker: subject matter expertise, system expertise (here the particular predictive coding software used), and information seeking expertise (cognitive search skills based on both experience and innate talents). How many lawyers with such knowledge and skill sets are alive on the planet today? How many do you know, or have you heard of? I am talking hands-on reviewers here, not simply theorists or management. Five, ten, maybe twenty? There are certainly far more legal subjects requiring SMEs for machine training than there are qualified experts. This has got to change for the paradigm shift to kick in. Obviously there are vested interests opposed to such change. You know who they are.
This shortage of qualified SMEs is a known problem. I am working on this, and so are a few others. My approach is training, and also a kind of surrogate SME set up. I am still experimenting with this. It is a difficult problem, but I think I am almost ready to teach it, and may roll out a few new services based on it soon. Others are looking into software solutions with even higher error tolerance, and still others are working with extensive quality control procedures.
I know from the many young lawyers I have taught over the last seven years that this shortage of skilled labor is a temporary problem, a generational one. In ten years there should be an ample supply of capable SMEs, but for now it is a critical problem. This is a familiar societal problem, and is in no way unique to the legal profession. Technology has created this problem repeatedly over the last several generations, and society has always solved the problem with new generations of workers that somehow gain the necessary skills. Of course, you will not get it in most law schools. When it comes to technology most of our law schools are irrelevant. Many big law firms, the clueless ones, and those profiting from contract lawyers, have a vested interest in keeping it that way.
This shortage of skilled attorneys capable of accurate machine training is a problem for everyone who is trying to control e-discovery costs and still maintain quality. It is also a problem for e-discovery vendors trying to sell software tools that are too sophisticated for most law firms to use properly. Still, history tells us that this problem will be solved, and the Luddites and buggy whip manufacturers will be defeated. Now that the creative spark is out of the bottle, all that the dark establishment forces can do is delay the inevitable.
A Continuing Role for Contract Lawyers
The move to predictive coding will displace contract lawyers, especially as trained SMEs emerge in the coming years, but it will not replace them entirely. There is still a place for contract reviewers in final review of the documents identified as probably relevant and selected for possible production. Multiple reviewers can still be used for final relevancy determinations (with quality controls by SMEs), and for protection work (step 8 in the EDBP), just not for machine training in predictive coding. A team of reviewers will, for instance, work just fine in redaction and privilege logging work. The value of time savings of many reviewers, and availability of such reviewers, will still have a strong utilitarian function.
Of course, I am probably wrong about this in the long term. After ten years, or so, contract review lawyers who are not also SMEs may be about as prevalent as horses and buggies are today. As lawyers love to say at the end of long demand letters, PLEASE BE GOVERNED ACCORDINGLY.
To be continued …. Stay tuned for Part Two where I will discuss the essential reasons that less is more in large scale document reviews and the scientific experiments that support this view.