19 Responses to Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part One

  1. Kyle Simonson says:

    I’m not sure how you can train SMEs in a wide enough range of law to be just SMEs. Each case is usually different enough to require its own SME. If you are talking about a new type of position in law firms, even in really big firms (such as the firm I work for) there really isn’t enough work to require one person to get their hands around cases that involve predictive coding. And there are still enough open-ended aspects to litigation that would make it impossible for someone to become, say, an SME in class action FLSA cases, or product liability litigation.

    Granted, search skills are something people can excel in. But that doesn’t make them an SME, at least not in my understanding of an SME.

    • Ralph Losey says:

      There cannot be an SME in law in general. It is too complex. So we agree. After 34 years I’ve only managed to become an SME in about a dozen legal subject matters, among the hundreds out there. Also, in general I don’t consider my expertise in search to be a subject matter of law. SME refers to substantive areas of law. So, unless the issue in a case is the reasonableness of search efforts, perhaps in connection with a sanctions issue, search is not a subject matter. It is more like a procedure, and to some extent, a field of science. So I think we pretty much agree on that too.

      As to there not being enough work to just do predictive coding: that will all change soon, and is changing now. Technology cannot be stopped.

      The SME solution seems to be a kind of surrogate action, which is what I’m working on now. The greater experience you have in the law (advantage to old lawyers), the easier it is for you to get the necessary knowledge from SMEs, who typically have no search skills, nor software skills. The e-discovery lawyer of tomorrow will have to be what we used to call three-letter-men. They will have to have software skills, general search skills, and SME skills. The latter requirement will either be fulfilled personally – they are actually an SME in the subject – or by some sort of surrogate action for a bona fide SME. I try to exemplify this in the triangle diagram below. It is my hope that it will not be as impossible to achieve as the Penrose triangle itself!

  2. Jeremy Pickens says:

    The main support for these “army of one” claims seems to be that the biggest bugaboo for CAR training is consistency.

    We could argue (and hopefully with data) what effect consistency has on the entire CAR process, but I’d like to offer a different take on this whole discussion: The Long Tail.

    I remain less convinced that a small amount of training will be enough to deal with the long tail, especially if there are interesting probative documents in that tail. And by small amount of training, I mean the amount of training that can be done with just one person, no matter whether that person is an SME or not.

    This has less to do with whether or not the SME is getting the judgments “right” or “consistent”, and more to do with the fact that any single one person, whether or not that person is an SME, simply can’t get to the diversity and variety that a larger team of people can.

    And my sense is that diversity is much more important when it comes to finding the probative documents.

  3. […] e-Discovery Team: Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – … […]

  4. Greg Fordham says:


    There are all kinds of technologies that lawyers could use to do e-discovery more effectively and more economically but as you point out that might diminish their revenue stream.

    Predictive coding is quite a weapon. Nonetheless, there are many things that predictive coding will not do well, like documents with substantial numerical data, documents with very little textual data, and of course image documents, just to name a few.

    The economic benefits that are often touted about predictive coding are overhyped. The amazing cost savings typically appear when it is compared to the absurdly stupid manual review, and not to other computer-assisted review and search techniques.

    Your point that less is more is absolutely correct and not surprising at all either. It is entirely consistent with other researchers’ findings about the inconsistency of document review by human reviewers. The problem will not be solved by predictive coding even when one reviewer is used, because the other side’s reviewer will likely have an entirely different “subjective” evaluation of relevance or responsiveness. In fact, it could be argued, just as one of your previous posters has done, that many reviewers could expand the relevancy range of the training sample such that it would likely have a greater chance of including documents believed relevant by the opposing side.

    If slight differences could trigger motions to compel, predictive coding will not reduce that prospect. In fact, it might actually increase it for several reasons. First, there is no single technology or algorithm being used by predictive coding vendors. Thus, if the two sides have different algorithms, those could produce different result sets, even though there is likely to still be considerable overlap in the results. Also, predictive coding is sample based. So, not only is there the possibility that the actual result lies somewhere within the stated precision at the stated confidence level, there is also the chance of sampling error, where the actual result is not at all similar to what has been produced.
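    The sampling-error point can be made concrete with a short sketch (the numbers below are hypothetical, chosen only for illustration): a prevalence estimate from a random sample carries a margin of error, and at a 95% confidence level roughly one sample in twenty will still produce an interval that misses the true value entirely.

    ```python
    import math

    def normal_ci(hits, n, z=1.96):
        """Normal-approximation 95% confidence interval for a sampled proportion."""
        p = hits / n
        margin = z * math.sqrt(p * (1 - p) / n)
        return p, max(0.0, p - margin), min(1.0, p + margin)

    # Hypothetical: 15 relevant documents found in a 1,500-document random sample.
    p, lo, hi = normal_ci(15, 1500)
    print(f"estimated prevalence {p:.3f}, 95% CI [{lo:.4f}, {hi:.4f}]")
    ```

    Even under this textbook calculation, the interval only bounds the estimate most of the time; it never guarantees that the produced result matches the true population, which is the sampling-error risk described above.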

    I find it hard to believe that the same people who can’t get keyword search technology to work properly have the discipline or the statistical sampling proficiency to do well with predictive coding. A big concern should be that giving any technology, much less predictive coding, to lawyers is like giving a machine gun to an aborigine—they are likely to use it as a club, and that is if they don’t shoot themselves and everyone else around them first. Shooting themselves may be exactly what they do with predictive coding, because once the opposition takes apart the other side’s failures, they will be able to use that same data to compute a scientific measure of incompetence.

    • Ralph Losey says:

      Aborigines with machine guns is a striking image. Clever, although, IMO, unnecessarily insulting to us lawyers, who are otherwise a fairly sophisticated and intelligent group and can learn when given proper training. (Same could be said of aborigines too, I might add.) Still, aside from the attitude thing, most of your points are valid.

      I prefer the traditional American John Henry image. The tech shift behind this classic American myth has now happened in history many, many times. In the John Henry story plenty of people (the techs of their day – late 1800s) eventually learned to use a steam hammer, and the muscle men (John Henry foremost among them) who were hammering the rail spikes, were all put out of business. The manual laborers were replaced by men with tech skills enough to use the new technologies, in this case, steam hammers. Same is true with lawyers and predictive coding and other advanced tech. The answer is training; learning how to use the tools. It will happen. Just a matter of time, and historic inevitability, and a condescending attitude towards lawyers will not help. (Still, I know where you are coming from. We lawyers can be frustrating to deal with!)

      If I had been alive in John Henry’s day I would have tried to teach John to use a steam hammer, then perhaps he would not have died trying to stop the inevitable tide of technological advancement. What a great “steel driving man” he would then have been!

      As to the different conceptions of relevance, like any experienced attorney, I’m very aware of this problem. All lawyers are, even the tech-virgins. Having spent over thirty years as a practicing attorney, I know full well the differences in conception of relevance. I also know these differences get resolved, although sometimes not until trial. Please study my methods. You’ll see the predictive coding process, when done properly, begins with “relevancy dialogues,” which I also call “ESI Discovery Communications,” and, if need be, court rulings where agreement is not achieved. Routine. Same process in all legal search methods: linear, keyword, whatever. Nothing new here. Yes, the differences will get leveraged and magnified in the extrapolation process, but the same is true for other methods that are otherwise light-years behind in accuracy and ahead in cost. Bottom line, you are correct that this issue should be addressed up front. That is why I make that step number one in my eight-step process.

  5. Dan says:


    I will have to disagree with your assessment that “one person” is the ideal number of people for predictive coding. The problem is that people can contradict each other (differences in relevance judgments; see Voorhees, 1999), and many predictive coding teams don’t know how to handle this.

    If two reviewers only agree 40-50% of the time, then given two or three equally motivated and equally educated reviewers, which one do you go with? Your solution seems to be – it doesn’t matter, as long as you go with one instead of both or all three.

    Therefore, rather than trying to reconcile different tagging by different reviewers, your recommendation simply avoids the problem. This, however, creates a new problem.

    The new problem is the uncertainty that you chose reviewer #1, but reviewer #2 might have made better decisions. If not on all of the documents, then certainly on some of the documents.

    If you could solve the inconsistency issue, however, then you COULD use two, or three, or four, or more reviewers and thereby your training could go faster, subsequent privilege review could go faster, without compromising accuracy. In fact, you could improve accuracy. You could get closer to the ephemeral “gold standard” of “the truth”.

    Well, fortunately there is a way to reconcile the decisions of multiple reviewers in a way that resolves collisions and reinforces correctness. We developed it and it works. Therefore you no longer have to avoid the problem of relevancy-decision collisions, and you can effectively work with larger TAR teams.
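    The specific reconciliation method mentioned above is not described in this thread. As a generic stand-in only, the simplest possible approach to resolving a collision is a per-document majority vote across reviewers; real systems use far more sophisticated techniques, but this sketch shows the basic shape of the problem:

    ```python
    from collections import Counter

    def majority_vote(tags):
        """Resolve one document's conflicting relevance tags by simple majority.

        `tags` is a list of 0/1 judgments, one per reviewer. Ties go to 1
        (relevant) here so borderline documents stay in the training set --
        an arbitrary illustrative choice, not a recommendation.
        """
        counts = Counter(tags)
        return 1 if counts[1] >= counts[0] else 0

    # Three reviewers disagree on a document: two say relevant, one says not.
    print(majority_vote([1, 1, 0]))  # -> 1
    ```

    Even this naive rule illustrates the trade-off Dan raises: aggregating several reviewers can wash out individual idiosyncrasies, but only if the collisions are reconciled deliberately rather than ignored.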

    Human review is not bad. Moreover, every technology currently in use relies upon it. The management of human review, however, has been historically bad. That’s the problem that needs to be solved.

    Just a little food for thought,


  6. […] More: When it comes to predictive coding training, the “fewer reviewers the better” – Parts One, Two, and […]

  7. […] to a successful ranking process. Indeed, Ralph Losey made that case eloquently in his blog post: “Less Is More: When it comes to predictive coding training, the ‘fewer reviewers the better’ – ….” He argues that one SME is the “gold standard” for the […]

  8. […] to a successful ranking process. Indeed, Ralph Losey made that case eloquently in his blog post: “Less Is More: When it comes to predictive coding training, the ‘fewer reviewers the better’ – ….” He argues that one SME is the “gold standard” for the […]

  9. […] is part-three of a three-part blog, so please read Part One and Part […]

  10. […] like to know what the effect of these inconsistencies might be. Ralph Losey advocates the use of a single subject matter expert for training, while Esuli and Sebastiani report alarming falls in classifier effectiveness with even mild […]

  11. […] prove how inconsistent human reviewers are, even when using search experts. See Less Is More, parts One, Two and Three. They still try to fix the old methods, and try to use human reviewers to measure […]

  12. […] More: When it comes to predictive coding training, the “fewer reviewers the better” – Parts One, Two, and Three; and, Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be […]

  13. […] More: When it comes to predictive coding training, the “fewer reviewers the better” – Parts One, Two, […]

  14. […] to a successful ranking process. Indeed, Ralph Losey made that case eloquently in his blog post: “Less Is More: When it comes to predictive coding training, the ‘fewer reviewers the better’ – ….” He argues that one SME is the “gold standard” for the […]

Leave a Reply