Concept Drift and Consistency: Two Keys To Document Review Quality – Part Three

This is Part Three of this blog. Please read Part One and Part Two first.

Mitigating Factors to Human Inconsistency

Bob_DylanWhen you consider all of the classifications of documents, both relevant and irrelevant, my consistency rate in the two ENRON reviews jumps to about 99% (01% inconsistent). Compare this with the Grossman Cormack study of the 2009 TREC experiments, where agreement on all non-relevant adjudications, assuming all non-appealed decisions were correct, was 97.4 percent (2.6% inconsistent). My guess is that most well run CAR review projects today are in fact attaining overall high consistency rates. The existing technologies for duplication, similarity, concept and predictive ranking are very good, especially when all used together. When you consider both relevant and irrelevant coding, it should be in the 90s for sure, probably the high nineties. Hopefully, by using todays’ improved software and the latest, fairly simple 8-step methods, we can reduce the relevance inconsistency problem even further. Further scientific research is, however, needed test these hopes and suppositions. My results in the Enron studies could be black swan, but I doubt it. I think my inconsistency is consistent.

ei-recall_sphereEven though overall inconsistencies may be small, the much higher inconsistencies in relevance calls alone remains a continuing problem. It is a fact of life of all human document review as Voorhees showed years ago. The inconsistency problem must continue to be addressed by a variety of ongoing quality controls, including the use of predictive ranking, and including post hoc quality assurance tests such as ei-Recall. The research to date shows that duplicate, similarity and predictive coding ranking searches can help mitigate the inconsistency problem (the overlap has increased from the 30% range, to the 70% range), but not eliminate them entirely. By 2012 I was able to use these features to get the relevant-only disagreement rates down to 23%, and even then, the 63 inconsistently coded relevant documents were all unimportant. I suspect, but do not know, that my rates are now lower with improved quality controls, but do not know that. Again, further research is required before any blanket statements like that can be made authoritatively.

Our quest for quality legal search requires that we keep the natural human weakness of inconsistency front and center. Only computers are perfectly consistent. To help keep the human reviewers as consistent as possible, and so mitigate any damages that inconsistent coding may cause, a whole panoply of quality control and quality assurance methods should be used, not just improved search methods. See eg: ZeroErrorNumerics.com.

ZenB_transparent

The Zero Error Numerics (ZEN) quality methods include:

  • UpSide_down_champagne_glasspredictive coding analytics, a type of artificial intelligence, actively managed by skilled human analysts in a hybrid approach;
  • data visualizations with metrics to monitor progress;
  • flow-state of human reviewer concentration and interaction with AI processes;
  • quiet, uninterrupted, single-minded focus (dual tasking during review is prohibited);
  • disciplined adherence to a scientifically proven set of search and review methods including linear, keyword, similarity, concept, and predictive coding;
  • repeated tests for errors, especially retrieval omissions;
  • objective measurements of recall, precision and accuracy ranges;
  • judgmental and random sampling and analysis such as ei-Recall;
  • active project management and review-lawyer supervision;
  • small team approach with AI leverage, instead of large numbers of reviewers;
  • quality_trianglerecognition that mere relevant is irrelevant;
  • recognition of the importance of simplicity under the 7±2 rule;
  • multiple fail-safe systems for error detection of all kinds, including reviewer inconsistencies;
  • use of only the highest quality, tested e-discovery software and vendor teamsunder close supervision and teamwork;
  • use of only experienced, knowledgeable Subject Matter Experts for relevancy guidance, either directly or by close consultation;
  • extreme care taken to protect client confidentiality; and,
  • high ethics – our goal is to find and disclose the truth in compliance with local laws, not win a particular case.

That is my quality play book. No doubt others have come up with their own methods.

Conclusion

quality_compassHigh quality effective legal search depends in part on recognition of the common document review phenomena of concept shift and inconsistent classifications. Although you want to avoid inconsistencies, concept drift is a good thing. It should appear in all complex review projects. Think Bob Dylan – He not busy being born is busy dying. Moreover, you should have a standard protocol in place to both encourage and efficiently deal with such changes in relevance conception. If coding does not evolve, if relevance conceptions do not shift by conversations and analysis, there could be a quality issue. It is a warning flag and you should at least investigate.

race_car_warning_flagVery few projects go in a straight line known from the beginning. Most reviews are not like a simple drag race. There are many curves. If you do not see a curve in the road, and you keep going straight, a spectacular wreck can result. You could fly off the track. This can happen all too easily if the SME in charge of defining relevance has lost track of what the reviewers are doing. You have to keep your eyes on the road and your hands on the wheel.

NASCAR-Driver that looks like Losey

Good drivers of CARs – Computer Assisted Reviews – can see the curves. They expect them, even when driving a new course. When they come to a curve, they are not surprised, they know how to speed through the curves. They can do a power drift through any corner. Change in relevance should not be a speed-bump. It should be an opportunity to do a controlled skid, an exciting drift with tires burning. Race_car_drift_cornerSpeed drifts help keep a document review interesting, even fun, much like a race track. If you are not having a good time with large scale document review, then you are obviously doing something wrong. You may be driving an old car using the wrong methods. See: Why I Love Predictive Coding: Making document review fun with Mr. EDR and Predictive Coding 3.0.

quality_diceConcept shift makes it harder than ever to maintain consistency. When the contours of relevance are changing, at least somewhat, as they should, then you have to be careful and be sure all of your prior codings are redone and made consistent with the latest understanding. Your third step of a baseline random sample should, for instance, be constantly revisited. All of the prior codings should be corrected to be consistent with the latest thinking. Otherwise your prevalence estimate could be way off, and with it all of your rough estimates of recall. The concern with consistency may slow you down a bit, and make the project cost a little more, but the benefits in quality are well worth it.

If you are foolish enough to still use secret control sets, you will not be able to make these changes at all. When the drift hits, as it almost always does, your recall and precision reports based on this control set will be completely worthless. Worse, if the driver does not know this, they will be mislead by the software reports of precision and recall based on the secret control set. That is one reason I am so adamantly opposed to the use of secret control set and have called for all software manufacturers to remove them. See Predictive Coding 3.0 article, part one.

If you do not go back and correct for changes in conception, then you risk withholding a relevant document that you initially coded irrelevant. It could be an important document. There is also the chance that the inconsistent classifications can impact the active machine learning by confusing the algorithmic classifier. Good predictive coding software can handle some errors, but you may slow things down, or if it is extreme, mess them up entirely. Quality controls of all kinds are needed to prevent that.

Less_More_RalphAll types of quality controls are needed to address the inevitability of errors in reviewer classifications. Humans, even lawyers, will make some mistakes from time to time. We should expect that and allow for it in the process. Use of duplicate and near-duplicate guides, email strings, and other similarity searches, concept searches and probability rankings can mitigate against that fact that no human will ever attain perfect machine like consistency. So too can a variety of additional quality control measures, primary among them being the use of as few human reviewers as possible. This is in accord with the general review principle that I call less is more. See: Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part One and Part Two. That is not a problem if you are driving a good CAR, one with the latest predictive coding search engines. More than a couple of reviewers in a CAR like that will just slow you down. But it’s alright, Ma, it’s life, and life only.

________________

Since I invoked the great Bob Dylan and It’s Alright, Ma earlier in this blog, I thought I owed it to you share the full lyrics, plus a video of young Bob’s performance. It could be his all time best song-poem. What do you think? Feeling very creative, leave a poem below that paraphrases Dylan to make one of the points in this blog

______________________

 “It’s Alright, Ma (I’m Only Bleeding)”

Bob Dylan as a young man

Bob Dylan

Darkness at the break of noon
Shadows even the silver spoon
The handmade blade, the child’s balloon
Eclipses both the sun and moon
To understand you know too soon
There is no sense in trying.
Pointed threats, they bluff with scorn
Suicide remarks are torn
From the fools gold mouthpiece
The hollow horn plays wasted words
Proved to warn
That he not busy being born
Is busy dying.
Temptation’s page flies out the door
You follow, find yourself at war
Watch waterfalls of pity roar
You feel to moan but unlike before
You discover
That you’d just be
One more person crying.
So don’t fear if you hear
A foreign sound to you ear
It’s alright, Ma, I’m only sighing.
As some warn victory, some downfall
Private reasons great or small
Can be seen in the eyes of those that call
To make all that should be killed to crawl
While others say don’t hate nothing at all
Except hatred.
Disillusioned words like bullets bark
As human gods aim for their marks
Made everything from toy guns that sparks
To flesh-colored Christs that glow in the dark
It’s easy to see without looking too far
That not much
Is really sacred.
While preachers preach of evil fates
Teachers teach that knowledge waits
Can lead to hundred-dollar plates
Goodness hides behind its gates
But even the President of the United States
Sometimes must have
To stand naked.
An’ though the rules of the road have been lodged
It’s only people’s games that you got to dodge
And it’s alright, Ma, I can make it.
Advertising signs that con you
Into thinking you’re the one
That can do what’s never been done
That can win what’s never been won
Meantime life outside goes on
All around you.
You loose yourself, you reappear
You suddenly find you got nothing to fear
Alone you stand without nobody near
When a trembling distant voice, unclear
Startles your sleeping ears to hear
That somebody thinks
They really found you.
A question in your nerves is lit
Yet you know there is no answer fit to satisfy
Insure you not to quit
To keep it in your mind and not forget
That it is not he or she or them or it
That you belong to.
Although the masters make the rules
For the wise men and the fools
I got nothing, Ma, to live up to.
For them that must obey authority
That they do not respect in any degree
Who despite their jobs, their destinies
Speak jealously of them that are free
Cultivate their flowers to be
Nothing more than something
They invest in.
While some on principles baptized
To strict party platforms ties
Social clubs in drag disguise
Outsiders they can freely criticize
Tell nothing except who to idolize
And then say God Bless him.
While one who sings with his tongue on fire
Gargles in the rat race choir
Bent out of shape from society’s pliers
Cares not to come up any higher
But rather get you down in the hole
That he’s in.
But I mean no harm nor put fault
On anyone that lives in a vault
But it’s alright, Ma, if I can’t please him.
Old lady judges, watch people in pairs
Limited in sex, they dare
To push fake morals, insult and stare
While money doesn’t talk, it swears
Obscenity, who really cares
Propaganda, all is phony.
While them that defend what they cannot see
With a killer’s pride, security
It blows the minds most bitterly
For them that think death’s honesty
Won’t fall upon them naturally
Life sometimes
Must get lonely.
My eyes collide head-on with stuffed graveyards
False gods, I scuff
At pettiness which plays so rough
Walk upside-down inside handcuffs
Kick my legs to crash it off
Say okay, I have had enough
What else can you show me?
And if my thought-dreams could been seen
They’d probably put my head in a guillotine
But it’s alright, Ma, it’s life, and life only.

 

6 Responses to Concept Drift and Consistency: Two Keys To Document Review Quality – Part Three

  1. Ralph Losey says:

    My eyes collide head-on with emails, texts
    One by one document relevance I code
    Grey area words masking false nodes
    No AI cause ancient lawyers in charge eschew
    Leaving me a life of reviewer drudgery
    Enough already with batching and linear review
    What else can you show me?

  2. Ralph Losey says:

    Come on friends, surely you can do better than I have at invoking Dylan.

  3. Tony Reichenberger says:

    The e-discovery world it is exploding
    Volumes rising, data loading
    Frustrations arise, images not downloading
    You don’t believe in TAR but the reviewers’ coding
    Are causing you to feel like this project’s imploding

    But you tell me
    Over and over and over again my friend
    Ah, you don’t believe
    That TAR can help your production

    Don’t you understand what I’m tryin’ to say
    Can’t you feel the answers I’m tellin’ today?
    When a button is pushed, it’s all child’s play
    All the docs that you crave are all on the way
    Take a look around you- OK, and take back your Friday

    But you tell me
    Over and over and over again my friend
    Ah, you don’t believe
    That TAR can help your production

    –With all apologies to Bob Dylan and Barry McGuire.

  4. […] What is the target? What is the information need? What documents are relevant? What would a hot document look like? A common understanding of relevance by a review team, of what you are looking for, requires a lot of talk. Silent review projects are doomed to failure. They tend to stagnate and do not enjoy the benefits of Concept Drift, where a team’s understanding of relevance is refined and evolves as the review progresses. Yes, the target may move, and that is a good thing. See: Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three. […]

  5. […] What is the target? What is the information need? What documents are relevant? What would a hot document look like? A common understanding of relevance by a review team, of what you are looking for, requires a lot of talk. Silent review projects are doomed to failure. They tend to stagnate and do not enjoy the benefits of Concept Drift, where a team’s understanding of relevance is refined and evolves as the review progresses. Yes, the target may move, and that is a good thing. See: Concept Drift and Consistency: Two Keys To Document Review Quality – Parts One, Two and Three. […]

%d bloggers like this: