In predictive coding the computer classifies and ranks an entire dataset based on an extrapolation of training documents selected by reviewers. The intelligence of the computer arises from the documents selected as input for training. Although good predictive coding software is designed to tolerate, and adjust for, a certain number of inconsistencies in the training documents, the fact remains that the computer bases its analysis on the input provided. The computer has no innate knowledge or expertise in the issues of the case.
There are natural mathematical limits on how well any software can smooth out inconsistencies in its input. Both sound judgments and errors in document selection will be extrapolated and propagated out over the entire database. What predictive coding does is incredible, but it is not magic. It is a useful tool, the best we now have to cope with information inflation, but nothing more. It enhances the human intelligence fed into it and applies the judgment extrapolations consistently at lightning speed. Good software even recognizes and ignores some inconsistencies in document classifications, and that in itself is impressive. But there are limits to these correction capacities, and trade-offs, and in the end Bayesian probabilities based on metrics must prevail. The GIGO rule, garbage in, garbage out, can be mitigated, but never eliminated. It is like the Second Law of Thermodynamics (entropy).
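A toy sketch can make the GIGO point concrete. The example below is illustrative only: synthetic relevance scores and a deliberately simple threshold classifier, not any vendor's actual algorithm. It shows how a single inconsistent training label shifts the learned decision boundary enough to lose a borderline relevant document.

```python
# Toy GIGO illustration (synthetic data, not any real predictive coding engine).
# The "model" is just the midpoint between the mean scores of the two classes.

def train_threshold(examples):
    """examples: list of (score, label), label in {'R', 'I'}.
    Returns the decision threshold: midpoint between the class means."""
    rel = [s for s, lab in examples if lab == "R"]
    irr = [s for s, lab in examples if lab == "I"]
    return (sum(rel) / len(rel) + sum(irr) / len(irr)) / 2

def classify(threshold, score):
    """Documents scoring at or above the threshold are called relevant."""
    return "R" if score >= threshold else "I"

# Consistent training by one reviewer.
clean = [(8, "R"), (9, "R"), (10, "R"), (1, "I"), (2, "I"), (3, "I")]

# The same documents, but a second reviewer mislabeled the borderline doc (8).
noisy = [(8, "I"), (9, "R"), (10, "R"), (1, "I"), (2, "I"), (3, "I")]

t_clean = train_threshold(clean)   # (9 + 2) / 2 = 5.5
t_noisy = train_threshold(noisy)   # (9.5 + 3.5) / 2 = 6.5

# A genuinely relevant but borderline document with score 6:
print(classify(t_clean, 6))  # prints R -- found with consistent training
print(classify(t_noisy, 6))  # prints I -- lost to one inconsistent label
```

One bad label out of six moved the threshold a full point, and the borderline relevant document fell on the wrong side of it. Real algorithms are far more robust than this caricature, but the direction of the effect is the same.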
For these reasons it is a fundamental law of predictive coding that when it comes to document training and the number of reviewers, less is more. Predictive coding will still work with five reviewers selecting the training documents, but it will work better with four. Three reviewers are better still, and better yet is training done by only two humans. But the ideal, the gold standard, is one expert doing all of the training by him or herself. All other things being equal, including expertise and search skills, one expert will beat two any day, and the competition of one against five will not even be close. Forget about one against fifty. The gold standard for predictive coding training is one. This is an inconvenient truth for vendors, indeed for most of the legal profession, but, like it or not, that is the way it is.
Why Less is More is an Inconvenient Truth to the Establishment
If you want to understand the legal profession, and why it does the things it does, or does not do the things you would expect it to, you have to look at more than just logic and justice. Although lawyers and their vendors are part of the justice system, and in theory law is a profession, not a business, lawyers and law firms are still part of the economy. With vendors that is crystal clear. Moreover, professionals or not, lawyers are all too human. To understand the legal profession you must look not only at the lofty goals, which do drive many, if not most, lawyers; you must also look at money and power. Like it or not, that is also the way it is. It is naive to pretend otherwise.
Why, for instance, hasn’t the legal profession already adopted predictive coding? It has been proven faster and cheaper, and has been approved by many courts. It has also been widely publicized. So why are the vast majority of law firms still sticking to linear review or keyword search? Yes, technophobia and ignorance are part of the answer, but only part. After all, lawyers as a group are very smart, albeit not usually tech-smart. To understand any human and organizational problem, including this one, you must understand the money flows and the ego power-trips involved. Truth and justice are strong forces, but never underestimate the dark side, the strong motivators of fear and greed.
The Rand Report was brave and honest enough to point out that one of the major reasons law firms are reluctant to adopt predictive coding is law firm greed. See Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery. Here is how the authors of the Rand Report, Nicholas M. Pace and Laura Zakaras, describe it at page 76:
Resistance of External Counsel. Another barrier to the widespread use of predictive coding could well be resistance to the idea of outside counsel motivated not so much by accuracy issues as by the potential loss of a historical revenue stream. Some interviewees reported grumblings from outside counsel when their companies decided to directly handle a fraction of the overall review process or to markedly reduce what was shipped out for review through the use of additional data processing.
The less is more principle in machine training is a threat to existing economic models in the legal profession, especially to contract review groups both in and outside of law firms. Although, as I will explain, the less is more principle does not mean the end of the contract review business, it does mean that business will be scaled down dramatically. Money will be lost. Empires destroyed. This will happen as predictive coding becomes more widely used, and properly used. Proper use requires consistency in training, which is where less is more comes in. The more reviewers, the more inconsistency and noise in the machine training, and the lower the effectiveness.
Proper use also requires use of bona fide experts in the legal subjects and issues in the case. The necessity for SMEs is a related, but separate issue, well known to experts in search. I will address this important issue more extensively in another article. There is some interesting research going on in this area as law firms and software companies struggle with this requirement too.
Suffice it to say in this article that proper use of predictive coding technology requires the machine to be trained by bona fide SMEs, real experts with years of experience, not just inexperienced lawyers who will work cheaply. Once again certain vested interests that serve our profession are threatened by this truth. The review companies are rolling in money right now from the sweat of contract lawyers and will make a concerted effort to hide this truth, as well as the truth of less is more. The same can be said of many large and powerful law firms. Watch as storm troopers and P.R. machines are rolled out in the coming months to oppose and obfuscate this message and other truths that threaten the status quo. Understand the big picture of what is really going on here.
Do not be concerned about me; I have already endured many attacks. I expect more. Comes with the turf. But no matter what they do, I will not be stopped. More importantly, there are many others who agree and feel as I do, many others out there boldly speaking the truth of technology, including generations of young lawyers lined up behind us. Together the defeat of the dark forces is inevitable. Do not be cowed by the strength of the big law firms and other vested interests. History is on your side. Speak truth to power. Many generations of brave Americans have sacrificed to give you these First Amendment rights. Use them. Speak out the next time a law firm wants to use non-SMEs, or worse, armies of contract lawyers, to do machine training.
Most of the law firms and vendors in the world will not hear the call. They will go the obstructionist Luddite way, ignorant of the lessons of technology history. Most, but not all. A few law firms and review companies have already seen the writing on the wall. A few more will likely follow soon, rather than oppose us. They get it. They are already preparing for the next generation of technology workers. They are not wasting time trying to silence an inconvenient truth. They are embracing the disruptive technologies. They are positioning themselves to profit from the inevitable carnage of the existing systems.
Savants like Gregory Bufithis come to mind. He is the Founder and CEO of the well-known Posse List LLC and Project Counsel SCS, a contract lawyer staffing services company starting to work more and more with SMEs. Project Counsel serves Europe, the Persian Gulf, and the Asia-Pacific region. Greg, a former New York City intellectual property lawyer, now runs a growing empire from Brussels. He has already positioned his many companies for the coming changes, including Project Counsel.
Greg, who rarely seems to sleep, is at the same time making a deep-dive study of artificial intelligence and document review. I for one am looking forward to the release of his e-book IBM: a culture of innovation and analytics, which is nearing completion. Greg understands and is acting in accord with the Hacker Ethic: Fast, Bold, Impactful, Open and Social Values. Visionary entrepreneurs like Greg Bufithis are pushing change, not fighting it. They understand that what is an inconvenient truth for most can be an opportunity for some.
Many in the U.S. legal world, even though they mean well, are simply not as quick and forward thinking as Greg Bufithis. For example, I have now heard from several sources that many law firms experimenting with predictive coding software are doing so without the active leadership of SMEs. They instead continue their old systems. They use teams of reviewers to do the machine training. The reviewers may have had some training time with an SME, or read a manual, but that is it. The firms may also continue to keyword screen before they begin predictive coding because that’s the way they have always done things.
Then these same law firms wonder why predictive coding did not work very well for them! Not only are too many cooks spoiling the pot (the less is more principle), but none of the cooks are that good to begin with. You cannot crowd-source expertise; all you get is a general consensus and the lowest common denominator. That is not a meal I would want to eat. Yet too many clients today are given little choice but to eat such junk food.
Current Shortage of Skilled Labor
For a predictive coding process to work, it must be done by small teams of SMEs, ideally just one, that is, if you can find one with the time and availability to do it. That is another very real problem here. Quite often firms are unable to find any SME with the time and capability to do machine training, much less one with extensive search skills. Since they lack the talent to do predictive coding properly, these firms have no choice but to make do, and settle for whatever talent they have. They do not use SMEs for training; they use document reviewers with some level of supervision by knowledgeable attorneys who are not bona fide SMEs, but may have access to one. Most of the SMEs in most law firms are my age, which generally means clueless when it comes to technology, much less advanced analytics.
If you are a law firm following this approach, do not be surprised if your predictive coding results are disappointing. The vendor’s software may be fine, but without the SME touch, you are just not using it right (despite all of the reassuring words the vendor may give you). Your results are poor because you have been hit by the GIGO problem; you have been derailed by inconsistent, all too often wrong, input.
That is why the less is more principle, coupled with the bona fide SME requirement, creates such an inconvenient truth. It is not only a disruptive model to existing monetary interests and power structures, it is also a difficult model to follow. It is difficult primarily because of the current shortage of trained search experts who are also SMEs. There are simply not enough legal subject matter experts who are also able to use the software and who have general search skills.
As discussed in LegalSearchScience.com, Professor Gary Marchionini, a founder of Human–computer information retrieval, and others, conclude that three types of expertise are needed for a successful information seeker: subject matter expertise, system expertise (here the particular predictive coding software used), and information seeking expertise (cognitive search skills based on both experience and innate talents). How many lawyers with such knowledge and skill sets are alive on the planet today? How many do you know, or have you heard of? I am talking hands-on reviewers here, not simply theorists or management. Five, ten, maybe twenty? There are certainly far more legal subjects requiring SMEs for machine training than there are qualified experts. This has got to change for the paradigm shift to kick in. Obviously there are vested interests opposed to such change. You know who they are.
This shortage of qualified SMEs is a known problem. I am working on this, and so are a few others. My approach is training, and also a kind of surrogate SME set up. I am still experimenting with this. It is a difficult problem, but I think I am almost ready to teach it, and may roll out a few new services based on it soon. Others are looking into software solutions with even higher error tolerance, and still others are working with extensive quality control procedures.
I know from the many young lawyers I have taught over the last seven years that this shortage of skilled labor is a temporary problem, a generational one. In ten years there should be an ample supply of capable SMEs, but for now it is a critical problem. This is a familiar societal problem, and is in no way unique to the legal profession. Technology has created this problem repeatedly over the last several generations, and society has always solved it with new generations of workers who somehow gain the necessary skills. Of course, you will not get those skills in most law schools. When it comes to technology most of our law schools are irrelevant. Many big law firms, the clueless ones, and those profiting from contract lawyers, have a vested interest in keeping it that way.
This shortage of skilled attorneys capable of accurate machine training is a problem for everyone who is trying to control e-discovery costs and still maintain quality. It is also a problem for e-discovery vendors trying to sell software tools that are too sophisticated for most law firms to use properly. Still, history tells us that this problem will be solved; the Luddites and buggy whip manufacturers will be defeated. Now that the creative spark is out of the bottle, all that the dark establishment forces can do is delay the inevitable.
A Continuing Role for Contract Lawyers
The move to predictive coding will displace contract lawyers, especially as trained SMEs emerge in the coming years, but it will not replace them entirely. There is still a place for contract reviewers in final review of the documents identified as probable relevant and selected for possible production. Multiple reviewers can still be used for final relevancy determinations (with quality controls by SMEs), and for protection work (step 8 in the EDBP), just not for machine training in predictive coding. A team of reviewers will, for instance, work just fine in redaction and privilege logging work. The time savings from many reviewers working in parallel, and the ready availability of such reviewers, will still serve a strong utilitarian function.
Of course, I am probably wrong about this in the long term. After ten years, or so, contract review lawyers who are not also SMEs may be about as prevalent as horse and buggies are today. As lawyers love to say at the end of long demand letters, PLEASE BE GOVERNED ACCORDINGLY.
________________
To be continued …. Stay tuned for Part Two where I will discuss the essential reasons that less is more in large scale document reviews and the scientific experiments that support this view.
I’m not sure how you can train SMEs in a wide enough range of law to be just SMEs. Each case is usually different enough to require its own SME. If you are talking about a new type of position in law firms, even in really big firms (such as the firm I work for) there really isn’t enough work to require one person to get their hands around cases that involve predictive coding. And there are still enough open-ended aspects to litigation that would make it impossible for someone to become, say, an SME in class action FLSA cases, or product liability litigation.
Granted, search skills are something people can excel in. But that doesn’t make them an SME, at least not in my understanding of an SME.
There cannot be an SME in law in general. It is too complex. So we agree. After 34 years I’ve only managed to become an SME in about a dozen legal subject matters, among the hundreds out there. Also, in general I don’t consider my expertise in search to be a subject matter of law. SME refers to substantive areas of law. So, unless the issue in a case is the reasonability or not of search efforts, perhaps in connection with a sanctions issue, search is not a subject matter. It is more like a procedure, and to some extent, a field of science. So I think we pretty much agree on that too.
As to there not being enough work to just do predictive coding: that will all change soon, is changing now. Technology cannot be stopped.
The SME solution seems to be a kind of surrogate action, which is what I’m working on now. The greater experience you have in the law (advantage to old lawyers), the easier it is for you to get the necessary knowledge from SMEs, who typically have no search skills, nor software skills. The e-discovery lawyer of tomorrow will have to be what we used to call a three-letter man. They will have to have software skills, general search skills, and SME skills. The latter requirement will either be fulfilled personally – they are actually an SME in the subject – or by some sort of surrogate action for a bona fide SME. I try to exemplify this in the triangle diagram below. It is my hope that it will not be as impossible to achieve as the Penrose triangle itself!
The main support for these “army of one” claims seems to be that the biggest bugaboo for CAR training is consistency.
We could argue (and hopefully with data) what effect consistency has on the entire CAR process, but I’d like to offer a different take on this whole discussion: The Long Tail.
I remain less convinced that a small amount of training will be enough to deal with the long tail, especially if there are interesting probative documents in that tail. And by small amount of training, I mean the amount of training that can be done with just one person, no matter whether that person is an SME or not.
This has less to do with whether or not the SME is getting the judgments “right” or “consistent”, and more to do with the fact that any single one person, whether or not that person is an SME, simply can’t get to the diversity and variety that a larger team of people can.
And my sense is that diversity is much more important when it comes to finding the probative documents.
Ralph,
There are all kinds of technologies that lawyers could use to do e-discovery more effectively and more economically but as you point out that might diminish their revenue stream.
Predictive coding is quite a weapon. Nonetheless, there are many things that predictive coding will not do well, like documents with substantial numerical data, documents with very little textual data, and of course image documents, just to name a few.
The economic benefits that are often touted about predictive coding are overhyped. The amazing cost savings typically appear when it is compared to absurdly stupid manual review, and not to other computer assisted review and search techniques.
Your point that less is more is absolutely correct, and not surprising at all either. It is entirely consistent with the findings of other researchers about the inconsistency of document review by human reviewers. The problem will not be solved by predictive coding even when one reviewer is used, because the other side’s reviewer will likely have an entirely different “subjective” evaluation of relevance or responsiveness. In fact, it could be argued, just as one of your previous posters has done, that many reviewers could expand the relevancy range of the training sample such that it would likely have a greater chance of including documents believed relevant by the opposing side.
If slight differences could trigger motions to compel, predictive coding will not reduce that prospect. In fact, it might actually increase it for several reasons. First, there is no single technology or algorithm being used by predictive coding vendors. Thus, if the two sides have different algorithms, those could produce different result sets, even though there is likely to still be considerable overlap in the results. Also, predictive coding is sample based. So, not only is there the possibility that the actual result is somewhere within the stated precision and confidence level, there is also the chance of sampling error, where the actual result is not at all similar to what has been produced.
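The sampling math behind that caveat is straightforward. Here is a hedged sketch (illustrative figures, the standard normal-approximation formula for estimating a proportion, not any vendor's proprietary method) of how many documents a random sample needs for a given confidence level and margin of error:

```python
# Standard sample-size formula for estimating a proportion:
#   n = z^2 * p * (1 - p) / e^2
# with p = 0.5 as the worst case (maximum variance). Illustrative only.
import math

Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}  # common z-scores

def sample_size(confidence=0.95, margin=0.02, p=0.5):
    """Documents to sample so the estimated prevalence is within
    +/- margin at the given confidence level (normal approximation)."""
    z = Z[confidence]
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

print(sample_size(0.95, 0.02))  # 2401 documents for +/-2% at 95% confidence
print(sample_size(0.95, 0.05))  # 385 documents for +/-5% at 95% confidence
```

Note what the formula does and does not promise: even a perfectly drawn sample of 2,401 documents only bounds the estimate within the stated margin at the stated confidence level; one sample in twenty will still fall outside it, which is exactly the sampling-error risk described above.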
I find it hard to believe that the same people who can’t get keyword search technology to work properly have the discipline or the statistical sampling proficiency to do well with predictive coding. A big concern should be that giving any technology, much less predictive coding, to lawyers is like giving a machine gun to an aborigine—they are likely to use it as a club, and that is if they don’t shoot themselves and everyone else around them first. Shooting themselves may be exactly what they do with predictive coding, because once the opposition takes apart the other side’s failures they will be able to use that same data to compute a scientific measure of incompetence.
Aborigines with machine guns is a striking image. Clever, although, IMO, unnecessarily insulting to us lawyers, who are otherwise a fairly sophisticated and intelligent group and can learn when given proper training. (Same could be said of aborigines too, I might add.) Still, aside from the attitude thing, most of your points are valid.
I prefer the traditional American John Henry image. The tech shift behind this classic American myth has now happened in history many, many times. In the John Henry story plenty of people (the techs of their day – late 1800s) eventually learned to use a steam hammer, and the muscle men (John Henry foremost among them) who were hammering the rail spikes, were all put out of business. The manual laborers were replaced by men with tech skills enough to use the new technologies, in this case, steam hammers. Same is true with lawyers and predictive coding and other advanced tech. The answer is training; learning how to use the tools. It will happen. Just a matter of time, and historic inevitability, and a condescending attitude towards lawyers will not help. (Still, I know where you are coming from. We lawyers can be frustrating to deal with!)
If I had been alive in John Henry’s day I would have tried to teach John to use a steam hammer, then perhaps he would not have died trying to stop the inevitable tide of technological advancement. What a great “steel driving man” he would then have been!
As to the different conceptions of relevance, like any experienced attorney, I’m very aware of this problem. All lawyers are, even the tech-virgins. Having spent over thirty years as a practicing attorney, I know full well the differences in conception of relevance. I also know these differences get resolved, although sometimes not until trial. Please study my methods. You’ll see the predictive coding process, when done properly, begins with “relevancy dialogues,” which I also call “ESI Discovery Communications,” and, if need be, court rulings where agreement is not achieved. Routine. Same process in all legal search methods: linear, keyword, whatever. Nothing new here. Yes, the differences will get leveraged and magnified in the extrapolation process, but the same is true for other methods that are otherwise light-years behind in accuracy and ahead in cost. Bottom line, you are correct that this issue should be addressed up front. That is why I make that step number one in my eight-step process.
Ralph:
I will have to disagree with your assessment that “one person” is the ideal number of people for predictive coding. The problem is that people can contradict each other (differences in relevancy judgments) (see Voorhees, 1999), and many predictive coding teams don’t know how to handle this.
If two reviewers only agree 40-50% of the time, then given two or three equally motivated and equally educated reviewers, which one do you go with? Your solution seems to be – it doesn’t matter, as long as you go with one instead of both or all three.
Therefore, rather than trying to reconcile different tagging by different reviewers, your recommendation simply avoids the problem. This, however, creates a new problem.
The new problem is the uncertainty that you chose reviewer #1, but reviewer #2 might have made better decisions. If not on all of the documents, then certainly on some of the documents.
If you could solve the inconsistency issue, however, then you COULD use two, or three, or four, or more reviewers and thereby your training could go faster, subsequent privilege review could go faster, without compromising accuracy. In fact, you could improve accuracy. You could get closer to the ephemeral “gold standard” of “the truth”.
Well, fortunately there is a way to reconcile the decisions of multiple reviewers in a way that resolves collisions and reinforces correctness. We developed it and it works. Therefore you no longer have to avoid the problem of relevancy decisions collisions and you can effectively work with larger TAR teams.
Human review is not bad. Moreover, every technology currently in use relies upon it. The management of human review, however, has been historically bad. That’s the problem that needs to be solved.
Just a little food for thought,
Dan
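The reviewer-disagreement figures discussed in this thread (two reviewers agreeing only 40–50% of the time) are usually quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A small sketch with hypothetical relevance calls, for illustration only:

```python
# Cohen's kappa: chance-corrected agreement between two reviewers.
# The document counts below are hypothetical, not from any real review.

def cohens_kappa(labels_a, labels_b):
    """Agreement beyond chance for two reviewers' binary relevance calls
    (1 = relevant, 0 = not relevant). Returns a value in [-1, 1]."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Raw observed agreement rate.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each reviewer's marginal relevance rate.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Hypothetical calls on ten documents by two reviewers.
reviewer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reviewer_2 = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.4
```

Here the reviewers agree on 7 of 10 documents (70% raw agreement), yet kappa is only 0.4, because half of that agreement is what chance alone would produce given their marginal rates. This is why raw overlap percentages understate how much reviewer inconsistency a training process must absorb.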