A Buddhist AI – “The DUDE” – Explains the Eight Steps of Hybrid Multimodal Document Review with Help from a Human Lawyer

January 19, 2023

Written by Chat GPT 3.5, partly in “Comic Hipster Style” with Ralph Losey

Losey to Chat: “Use a comedic hipster writing style to explain my eight steps of electronic document review.” Chat GPT 3.5 replies:

“Yo dude, imagine like, you’re on this journey of enlightenment and you’re trying to find your inner peace and all that jazz. But instead of meditating in a cave, you’re sorting through mountains of documents, trying to find the ones that will bring you to legal nirvana.

It’s like the Eightfold Doc Review Path, man.

First, you have to have a little chit-chat with your homies about what’s relevant and what’s not. (“right speech”)

Then, you use all the fancy tech tools at your disposal to sift through the documents like a boss. (“right livelihood”)

You take a random sample, just to get a feel for the lay of the land. (“right mindfulness”)

Then you train that machine learning algorithm like a Zen master (“right resolve”), until it’s ranking documents like a pro. (“right efforts”)

Then, you let the attorneys have a look-see, and make sure everything’s on point. (“right view”)

Then, you apply some Zen-like QC to make sure there’s no errors or inconsistencies. (“right concentration”)

And finally, you produce those relevant documents like a true compliance ninja. (“right action”)

It’s like achieving legal enlightenment, but better, because you don’t have to sit on a hard floor for hours.

That’s Right, Man.

The Dude Bowls Over Losey with his Doc Review Summary

You don’t have to sit on a hard floor for hours!

Image is GPT Generated


Leaving the Hipster-Style Humor Behind and Asking Chat GPT to Further Explain the Eight Steps in an Authoritative Writing Style, with Ralph’s Graphics.

The Eight Steps of Electronic Document Review

1. Right Talk (Step 1 – Relevance Dialogues): Talk is the first step in the electronic document review process. It involves having relevance dialogues between the trial counsel and other parties to understand the scope of the review and to identify the relevant documents.

Communication with the ESI requesting party’s attorney is required by Rule 26(f) and Rule 1 to: a) discuss and clarify the scope of relevance; b) provide a brief overview of past and proposed e-discovery efforts, i.e., the Discovery Plan; c) advise of the intent to use a multimodal search process for computer-assisted review and proportional, phased productions; d) discuss the plan to review or ignore any non-text documents; and e) where appropriate, invite input from the requesting party.

2. Right ECA (Step 2 – Early Case Assessment using all methods): Early Case Assessment (ECA) is the second step in the process, in which all available search methods are used to identify and classify documents. This includes using active machine learning, keyword search, focused linear search, and other methods to identify potentially relevant documents.

In this critical second step you primarily use your professional judgment to begin to search for key documents, ones that you hope may be fairly easy to find, aka the “low hanging fruit.” This is called judgmental sampling. It is not a random sample. It is a selection of sample documents for review by use of attorney judgment, case knowledge, and various types of searches in a hybrid multimodal process. Coding of documents begins in this step, including any documents already known to be relevant.

The four types of search here are:

Active Machine Learning (aka Predictive Coding): Active Machine Learning is a method of using computer algorithms to automatically identify and classify documents based on their relevance to a legal matter. It involves training a machine learning model using a set of manually coded documents, and then using that model to classify the remaining documents. It is a form of active learning because the machine actively selects additional documents for attorney review in order to improve its classifications.

Concept & Similarity Searches (aka Passive Learning): Concept & Similarity Searches are another form of machine learning used to identify and classify documents. Unlike Active Machine Learning, which is trained on a set of manually coded documents, Concept & Similarity Searches rely on the machine to identify similarities between documents and group them together based on those similarities. This is considered a passive form of machine learning because the machine groups the documents on its own, without attorney training input (see the code sketch after this list).

Keyword Search (Tested, Boolean, Parametric): Keyword search is a method of searching for documents that contain specific words or phrases. It can be done using tested, Boolean, or parametric search methods. Tested keywords are ones whose effectiveness has been verified by sampling the documents they return, rather than simply guessed at; Boolean search combines keywords with logical operators (AND, OR, NOT); and parametric search restricts hits by metadata parameters such as date, custodian, or file type. This method is useful when searching for a specific set of documents, but it can be less effective if the keywords used are not specific enough.

Focused Linear Search (Key Dates & People): Focused Linear Search is a method of searching for documents by looking at specific dates or people related to the case. It can be used to narrow down the search to a specific time period or group of people, which can help to identify relevant documents more quickly.
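
For readers who want to see the machinery, here is a minimal sketch, in Python with scikit-learn, of the similarity-search idea in the Concept & Similarity item above: score every document in the corpus against one already coded as relevant and surface the closest matches. This is a generic illustration, not any vendor’s concept-search engine, and the document texts are hypothetical.

```python
# Similarity search sketch: rank corpus documents by cosine similarity
# to a document the attorneys have already coded as relevant.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [                                            # hypothetical corpus texts
    "Please review the draft merger agreement before Friday.",
    "Lunch at noon? The usual place.",
    "Attached are the revised purchase price terms for the merger.",
]
known_relevant = "Draft merger agreement with revised purchase terms."

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform([known_relevant])

scores = cosine_similarity(query_vec, doc_matrix)[0]
top_matches = scores.argsort()[::-1][:10]           # indices of the most similar documents
print([docs[i] for i in top_matches])
```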


Here I asked the hipster Dude style of Chat GPT writing to come back. I wanted to hear The Dude’s explanation of these four types of search.

The Dude Explains Search
Image is GPT Generated

Alright, “The Dude” explains in a cool, music-related, hipster way:

  1. Active Machine Learning, or Predictive Coding, is like getting a personalized playlist from your cool music app. It’s like, you know, “The Dude’s got a good taste in music, I trust him to know what I’ll like.”
  2. Concept & Similarity Searches, or Passive Learning, is like browsing the record store for new music. It’s like, “The Dude’s just casually browsing, but he’s still finding some hidden gems.”
  3. Keyword Search is like going to the record store with a specific album in mind. It’s like, “The Dude knows exactly what he’s looking for, and he’s not leaving until he finds it.”
  4. Focused Linear Search is like thumbing through your collection of vinyls one by one. It’s like, “The Dude’s taking his time, going through every single album he’s got, making sure he doesn’t miss a thing.”

So, in summary, The Dude’s got different methods to find that perfect legal document, whether it’s by trusting the algorithm, casually browsing, searching for something specific or by taking his time going through everything.

End Dude. Putting Chat GPT 3.5 back to normal authoritative style.

3. Right Random (Step 3 – Prevalence Range Estimate, not control sets): In this step, a random sample of the documents is chosen to provide an estimate of the prevalence of relevant documents within the entire set. This is done by extracting a random sample of documents and reviewing them to estimate the percentage of relevant documents in the entire set, aka the Prevalence. This Prevalence percentage is always a range of numbers based on a statistical calculation of probability. It is an important quality control metric and review guide. The random sample Prevalence calculation also provides evidence of proportional efforts and Recall. The quality and persuasive value of this evidence depends in part on the sample size. A sample size of 1,534 documents, which creates a 2.5% Confidence Interval at a 95% Confidence Level, is used in most cases. A larger size is only rarely needed. In smaller value cases, a sample of only 783 documents could be sufficient to give you an idea of Prevalence, but it has a higher error rate, with a 3.5% Interval.
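
For those who want to see the arithmetic behind those sample sizes, here is a minimal sketch in Python of the standard formulas. The 1,534 and 783 figures quoted above likely reflect rounding or a finite-population adjustment, so the results below are close but not identical; the count of relevant documents passed to the interval function is hypothetical.

```python
import math

def sample_size(margin, z=1.96, p=0.5):
    """Worst-case (p = 0.5) sample size for a given margin of error at 95% confidence."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

def prevalence_range(relevant_found, n, z=1.96):
    """Simple 95% confidence interval around an observed prevalence."""
    p = relevant_found / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), p + half_width

print(sample_size(0.025))          # roughly 1,537 documents for a 2.5% interval
print(sample_size(0.035))          # roughly 784 documents for a 3.5% interval
print(prevalence_range(150, 1537)) # hypothetical: 150 relevant documents found in the sample
```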

4. Right Select (Step 4 – Choose Documents for Training Machine): In this step, the documents that are selected from the random sample and ECA are used to train the machine learning algorithm. This allows the machine to learn and improve its classification abilities. The AI Trainer works in an iterative process to train the machine on Relevance (or any binary decision, including privilege). This allows the machine intelligence to rank the probable relevance of all text documents to be reviewed. (Remember – Predictive Coding only searches alphanumeric text, or images whose text has been extracted during the vendor’s discovery processing.) Your role as the human attorney in charge of the project is to act as the ultimate authority on relevance, the “Subject Matter Expert” or “SME.” Yes. This will be on the test.

5. Right AI Rank (Step 5 – Machine Ranks Documents According to Probabilities): In this step, the machine learning algorithm ranks the documents based on their probability of being relevant. This ranking is done based on the information obtained during the training step. In the Rank step the AI does its work, performs a logistic regression based statistical analysis of the whole database, and ranks all documents as to probable relevance. Multiple rounds of training and ranking are typically required before the machine obtains a good understanding of the intent behind Relevance. It can be as few as three rounds for simple issues and databases, to as high as thirty rounds of training in a large, complex matter (very rare).
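
To make steps four and five concrete, here is a minimal sketch, in Python with scikit-learn, of one train-and-rank round. It is a generic illustration, not the author’s Predictive Coding 4.0 method or any vendor’s software; the document texts and coding labels are hypothetical.

```python
# One train-and-rank round: fit a classifier on attorney-coded documents,
# rank the uncoded documents by probable relevance, and surface the most
# uncertain ones as candidates for the next round of training.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

coded_texts = [                     # hypothetical attorney-coded seed documents
    "Board approved the merger terms at yesterday's meeting.",
    "Who is bringing snacks to the fantasy football draft?",
]
coded_labels = [1, 0]               # 1 = relevant, 0 = not relevant

uncoded_texts = [                   # hypothetical documents awaiting review
    "Draft merger agreement attached for your comments.",
    "Reminder: the parking garage closes early on Friday.",
]

vectorizer = TfidfVectorizer()
X_coded = vectorizer.fit_transform(coded_texts)
X_uncoded = vectorizer.transform(uncoded_texts)

model = LogisticRegression(max_iter=1000)
model.fit(X_coded, coded_labels)

probs = model.predict_proba(X_uncoded)[:, 1]                  # probable relevance scores
ranking = sorted(zip(probs, uncoded_texts), reverse=True)     # step five: the AI ranking
uncertain = sorted(zip(abs(probs - 0.5), uncoded_texts))[:10] # least confident, for the next round
```

In practice the highest-ranked and most uncertain documents go back to the attorneys for coding, and the cycle repeats until the reviewing attorney decides that training is complete.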

6. Right Review (Step 6 – Attorneys Review and Code Documents): In this step, the attorneys review the documents that were ranked highly by the machine learning algorithm and code them as relevant or non-relevant. This step is important to ensure the accuracy of the machine learning algorithm and to further refine the list of relevant documents. In step six, Review, the bulk of the work is performed. The graphic at right shows the primary roles in a complex ESI search and review project. This review stage can include all other types of search, not just predictive coding, based on new relevant documents found. For instance, based on documents found, you may want to add new keywords to search, or run similarity searches on any Highly Relevant documents found, and, it is important to remember, the Add Families type of search for all new relevant documents found.

7. Right Zen QC (Step 7 – Zero Error Numerics Quality Control procedures): Zero Error Numerics Quality Control (Zen QC) is a set of procedures that are used to ensure the accuracy and completeness of the electronic document review process. This includes checking for errors, inconsistencies, and missing documents, and making sure that the process is conducted in accordance with the relevant laws and regulations. There are four elements of quality control especially designed for use with Predictive Coding.

Balanced Hybrid (Man-Machine Balance with IST): Balanced Hybrid is a method of electronic document review that combines the use of technology with human review. It involves using machine learning algorithms to identify and classify documents, and then having human reviewers check the documents that the machine is unsure about, with Intelligently Spaced Training (IST) keeping the training cycle in balance. This approach is used to achieve a balance between technology and human review, efficiency and accuracy: Man and machine working together without over-delegation, trust and verify, in a two-way street.

This is an important concept, so I asked The Dude mode of Chat GPT 3.5 to come out and explain it:

Image of Self was GPT Generated

Balanced Hybrid, or Man-Machine Balance with IST, is like getting a good cup of coffee. You know, like, you’ve got your fancy automatic pour-over machine that can make a mean cup of joe, but you still want that human touch, so you’ve got your barista to check and adjust the temperature and the water flow to make sure it’s just right.

In the same way, the machine learning algorithms work like the automatic pour-over machine, they can identify and classify documents quickly and efficiently, but sometimes they might miss something or be unsure about a document, that’s where the IST comes in, like the barista, to check and adjust the machine’s work, to make sure that the legal documents are accurate and nothing is missed.

So, in summary, Balanced Hybrid is like having the best of both worlds, the efficiency of technology and the accuracy of human review, ensuring that The Dude’s legal document review is the best cup of coffee.

End Dude. Back to normal.

SME (Subject Matter Expert, typically trial counsel): Subject Matter Expert (SME) is a person who has specialized knowledge or experience in a specific area. In the context of electronic document review, SME is typically trial counsel, who is responsible for reviewing the documents and making decisions about their relevance to the case.

Method (for electronic document review): Method refers to the specific procedures and techniques that are used to conduct an electronic document review. It can include a combination of different methods, such as active machine learning, keyword search, and human review.

Software (for electronic document review): Software refers to the computer programs that are used to conduct an electronic document review. These can include machine learning algorithms, search engines, and document management systems.

8. Right Produce (Step 8 – Production of Relevant, Non-Privileged Documents): In this step, the relevant, non-privileged documents are produced to the other parties involved in the legal matter. This includes the relevant documents that have been identified and reviewed during the electronic document review process. A final search for privileged content should be run on all documents that have been identified for production to again verify that they are not privileged. All ESI produced should be triple checked before it goes out the door. Production these days is usually by FTP, so be very careful about what files you put where. This last step should never be rushed.

How All of the 8-Steps Work Together

Ralph Losey 2022 AI enhanced

The process of electronic document review involves the combination of all the steps discussed to ensure that all relevant and non-privileged documents are identified and produced. The process starts with relevance dialogues between the trial counsel and other parties to understand the scope of the review and to identify the relevant documents. (“a little chit-chat with your homies about what’s relevant and what’s not.”) After that, various methods such as active machine learning, keyword search, focused linear search, and other methods are used to identify potentially relevant documents in the ECA step. (“use all the fancy tech tools at your disposal to sift through the documents like a boss.”)

In the next step, a random sample of the documents is chosen to provide an estimate of the prevalence of relevant documents within the entire set; this gives an idea of the amount of work that needs to be done. (“You take a random sample, just to get a feel for the lay of the land.”) After that, the documents selected from the random sample and ECA are used to train the machine learning algorithm, which allows the machine to learn and improve its classification abilities. Then, the machine learning algorithm ranks the documents based on their probability of being relevant. (“You train that machine learning algorithm like a Zen master, until it’s ranking documents like a pro.”)

The next step is to have the attorneys review the documents that were ranked highly by the machine learning algorithm, and they code them as relevant or non-relevant. (“you let the attorneys have a look-see, and make sure everything’s on point.”) This step is important to ensure the accuracy of the machine learning algorithm, and to further refine the list of relevant documents.

After that, Zero Error Numerics Quality Control (Zen QC) is applied: a set of procedures to ensure the accuracy and completeness of the electronic document review process. This includes checking for errors, inconsistencies, and missing documents, and making sure that the process is conducted in accordance with the relevant laws and regulations. (“you apply some Zen-like QC to make sure there’s no errors or inconsistencies.”)

ZEN QC Factors

Finally, the relevant, non-privileged documents are produced to the other parties involved in the legal matter. (“you produce those relevant documents like a true compliance ninja.”) This includes the relevant documents that have been identified and reviewed during the electronic document review process.

All of these steps come together to ensure that justice is done by identifying and producing all relevant and non-privileged documents, which is essential for fair and accurate legal decision-making. The use of technology such as machine learning algorithms and quality control procedures helps to increase the efficiency and accuracy of the document review process, while the involvement of human reviewers ensures that the process is conducted in a fair and impartial manner. (“It’s like achieving legal enlightenment, but better, because you don’t have to sit on a hard floor for hours.”)

Conclusion

Image is GPT Generated

The eight steps in the electronic document review process are: relevance dialogues, early case assessment using various methods, a prevalence range estimate, selecting documents to train the machine, machine ranking of the documents, attorney review and coding of documents, Zero Error Numerics Quality Control procedures, and production of relevant, non-privileged documents.

All of these steps come together to ensure that justice is done by identifying and producing all relevant and non-privileged documents, which is essential for fair and accurate legal decision-making. It’s like achieving legal enlightenment, but better, because you get paid good money for doing it.

Finally, a note from Ralph: these eight steps for document review have nothing to do with the Noble Eightfold Path of Buddhism. This Path is a fundamental precept of Buddhism, and it seems to me disrespectful not to at least mention its elements: Right View, Right Resolve, Right Speech, Right Action, Right Livelihood, Right Effort, Right Mindfulness, and Right Concentration. I’ll give The Dude version of Chat GPT 3.5 a chance to summarize.

Image 8 is GPT Generated with Ralph Prompt

The noble eightfold path of Buddhism is like the ultimate guide to being a chill and enlightened human being.

You know, like, first, you’ve got the Right View, that’s like understanding the nature of reality and not getting caught up in illusions. Then you’ve got the Right Resolve, that’s like setting your intentions to be a good person and not just doing things for yourself.

Next up is Right Speech, that’s like choosing your words wisely and not being a jerk to people. And Right Action, that’s like living your life according to your intentions and not causing harm to others.

Right Livelihood is like having a job that aligns with your values and not exploiting others. Right Effort is like putting in the work to improve yourself.

Right Mindfulness is like being present in the moment and not getting lost in your thoughts. And last but not least, Right Concentration, that’s like focusing your mind and achieving a state of inner peace.

So, in summary, The Dude’s eightfold path is like a road map to living a good and enlightened life, taking the right actions, having the right intentions and the right mindset, to achieve inner peace, and The Dude’s definitely down for that.


Do TAR the Right Way with “Hybrid Multimodal Predictive Coding 4.0”

October 8, 2018

The term “TAR” – Technology Assisted Review – as we use it means document review enhanced by active machine learning. Active machine learning is an important tool of specialized Artificial Intelligence. It is now widely used in many industries, including Law. The method of AI-enhanced document review we developed is called Hybrid Multimodal Predictive Coding 4.0. Interestingly, reading these words in the new Sans Forgetica font will help you to remember them.

We have developed an online instructional program to teach our TAR methods and AI-infused concepts to all kinds of legal professionals. We use words, case law, diagrams, math, statistics, scientific studies, test results and appeals to reason to teach the methods. To balance that out, we also make extensive use of photos and videos. We use right-brain tools of all kinds, even subliminals, special fonts, hypnotic images and loads of hyperlinks. We use emotion as another teaching tool. Logic and Emotion. Sorry Spock, but this multimodal, holistic approach is more effective with humans than the all-text, reason-only approach of Vulcan law schools.

We even try to use humor and promote student creativity with our homework assignments. Please remember, however, this is not an accredited law school class, so do not expect professorial interaction. Did we mention the TAR Course is free?

By the end of study of the TAR Course you will know and remember exactly what Hybrid Multimodal means. You will understand the importance of using all varieties of legal search, for instance: keywords, similarity searches, concept searches and AI driven probable relevance document ranking. That is the Multimodal part. We use all of the search tools that our KL Discovery document review software provides.

The Hybrid part refers to the partnership with technology, the reliance of the searcher on advanced algorithmic tools. It is important that Man and Machine work together, but that Man remain in charge of justice. The predictive coding algorithms and software are used to enhance the abilities of lawyers, paralegals and law techs, not replace them.

By the end of the TAR Course you will also know what IST means: Intelligently Spaced Training. It is our specialty technique of AI training in which you keep training the Machine until first-pass relevance review is completed. This is a type of Continuous Active Learning, or as Grossman and Cormack call it, CAL. By the end of the TAR Course you should also know what a Stop Decision is. It is a critical point of the document review process. When do you stop the active machine teaching process? When is enough review enough? This involves legal proportionality issues, to be sure, but it also involves technological processes, procedures and measurements. What is good enough Recall under the circumstances with the data at hand? When should you stop the machine training?

We can teach you the concepts, but this kind of deep knowledge of timing requires substantial experience. In fact, refining the Stop Decision was one of the main tasks we set for ourselves in the e-Discovery Team experiments in the Total Recall Track of the National Institute of Standards and Technology Text Retrieval Conference in 2015 and 2016. We learned a lot in our two years. I do not think anyone has spent more time studying this in both scientific and commercial projects than we have. Kudos again to KL Discovery for helping to sponsor this kind of important research by the e-Discovery Team.

Working with AI like this for evidence gathering is a newly emerging art. Take the TAR Course and learn the latest methods. We divide the Predictive Coding workflow into eight steps. Master these steps and related concepts to do TAR the right way.

Pop Quiz: What is one of the most important considerations in deciding when to train again?

One Possible Correct Answer: The schedule of the humans involved. Logistics and project planning are always important for efficiency. Flexibility is easy to attain with the IST method. You can easily accommodate schedule changes and make it as easy as possible for humans and “robots” to work together. We do not literally mean robots, but rather refer to the advanced software, and the AI that arises from the machine training, as an imaginary robot.

Reinventing the Wheel: My Discovery of Scientific Support for “Hybrid Multimodal” Search

April 21, 2013

Getting predictive coding software is just part of the answer to the high cost of legal review. Much more important is how you use it, which in turn depends, at least in part, on which software you get. That is why I have been focusing on methods for using the new technologies. I have been advocating for what I call the hybrid multimodal method. I created this method on my own over many years of legal discovery. As it turns out, I was merely reinventing the wheel. These methods are already well-established in the scientific information retrieval community. (Thanks to information scientist Jeremy Pickens, an expert in collaborative search, who helped me to find the prior art.)

In this blog I will share some of the classic information science research that supports hybrid multimodal. It includes the work of  Gary Marchionini, Professor and Dean of the School of Information and Library Sciences of U.N.C. at Chapel Hill, and UCLA Professor Marcia J. Bates who has advocated for a multimodal approach to search since 1989. Study of their writings has enabled me to better understand and refine my methods. I hope you will also explore with me the literature in this field. I provide links to some of the books and literature in this area for your further study.

Advanced CARs Require Completely New Driving Methods

First I need to set the stage for this discussion by use of the eight-step diagram shown below. This is one of the charts I created to teach the workflow I use in a typical computer assisted review (CAR) project. You have seen it here many times before. For a full description of the eight steps see the Electronic Discovery Best Practices page on predictive coding.

predictive coding work flow

The iterated steps four and five in this work-flow are unique to predictive coding review. They are where active learning takes place. The Grossman-Cormack Glossary defines active learning as:

An Iterative Training regimen in which the Training Set is repeatedly augmented by additional Documents chosen by the Machine Learning Algorithm, and coded by one or more Subject Matter Expert(s).

The Grossman-Cormack Glossary of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 7 (2013).

Beware of any so-called advanced review software that does not include these steps; it is not a bona fide predictive coding search engine. My preferred active learning process is threefold (a short sketch of how these three selection sources combine into a review batch follows the list):

1.  The computer selects documents for review where the software classifier is uncertain of the correct classification. This helps the classifier algorithms to learn by adding diversity to the documents presented for review. This in turn helps to locate outliers of a type your initial judgmental searches in step two (and  five) of the above diagram have missed. This is machine-selected sampling, and, according to a basic text in information retrieval engineering, a process is not a bona fide active learning search without this ability. Manning, Raghavan and Schutze, Introduction to Information Retrieval, (Cambridge, 2008) at pg. 309.

2.  Some reasonable percentage of the documents presented for human review in step five are selected at random. This again helps maximize recall and guard against premature focus on the type of relevant documents initially retrieved.

3.  Other relevant documents are found by a skilled reviewer using a variety of search techniques. This is called judgmental sampling. After the first round of training, a/k/a the seed set, judgmental sampling by a variety of search methods is used, based on the machine-selected or randomly selected documents presented for review. Sometimes the subject matter expert (“SME”) human reviewer may follow a new search idea unrelated to the documents presented. Any kind of search can be used for judgmental sampling, which is why I call it a multimodal search. This may include some linear review of selected custodians or dates, parametric Boolean keyword searches, similarity searches of all kinds, concept searches, as well as several unique predictive coding probability searches.
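
Here is a minimal sketch, in Python, of how these three selection sources might be blended into a single review batch. The batch size and the ten percent random share are illustrative assumptions, not a prescription from the method, and the document identifiers are hypothetical.

```python
import random

def build_review_batch(uncertain_docs, corpus, judgmental_hits,
                       batch_size=500, random_share=0.10):
    """Blend the three selection sources into one review batch:
    attorney-found (judgmental) documents, a random slice of the corpus
    for diversity, and the classifier's most uncertain documents."""
    n_random = int(batch_size * random_share)
    batch = list(judgmental_hits)                  # multimodal attorney searches
    batch += random.sample(corpus, n_random)       # random selection to protect recall
    remaining = max(0, batch_size - len(batch))
    batch += list(uncertain_docs)[:remaining]      # machine-selected (uncertainty) sampling
    return batch

# Hypothetical inputs: document identifiers from each source.
batch = build_review_batch(uncertain_docs=["doc_17", "doc_42"],
                           corpus=["doc_%d" % i for i in range(10_000)],
                           judgmental_hits=["doc_3", "doc_99"])
```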

The initial seed set generation, step two in the chart, should also use some random samples, plus judgmental multimodal searches. Steps three and six in the chart always use pure random samples and rely on statistical analysis. For more on the three types of sampling see my blog, Three-Cylinder Multimodal Approach To Predictive Coding.

My insistence on the use of multimodal judgmental sampling in steps two and five to locate relevant documents follows the consensus view of information scientists specializing in information retrieval, but is not followed by several prominent predictive coding vendors. They instead rely entirely on machine-selected documents for training, or even worse, rely entirely on randomly selected documents to train the software. In my writings I call these processes the Borg approach, after the infamous villains in Star Trek, the Borg, an alien race that assimilates people. (I further differentiate between three types of Borg in Three-Cylinder Multimodal Approach To Predictive Coding.) Like the Borg, these approaches unnecessarily minimize the role of individuals, the SMEs. They exclude other types of search to supplement an active learning process. I advocate the use of all types of search, not just predictive coding.

Hybrid Human Computer Information Retrieval


In contradistinction to Borg approaches, where the machine controls the learning process, I advocate a hybrid approach where Man and Machine work together. In my hybrid CARs the expert reviewer remains in control of the process, and their expertise is leveraged for greater accuracy and speed. The human intelligence of the SME is a key part of the search process. In the scholarly literature of information science this hybrid approach is known as Human–computer information retrieval (HCIR).

The classic text in the area of HCIR, which I endorse, is Information Seeking in Electronic Environments (Cambridge 1995) by Gary Marchionini, Professor and Dean of the School of Information and Library Sciences of U.N.C. at Chapel Hill. Professor Marchionini speaks of three types of expertise needed for a successful information seeker:

1.  Domain Expertise. This is equivalent to what we now call SME, subject matter expertise. It refers to a domain of knowledge. In the context of law the domain would refer to particular types of lawsuits or legal investigations, such as antitrust, patent, ERISA, discrimination, trade-secrets, breach of contract, Qui Tam, etc. The knowledge of the SME on the particular search goal is extrapolated by the software algorithms to guide the search. If the SME also has System Expertise and Information Seeking Expertise, they can drive the CAR themselves. Otherwise, they will need a chauffeur with such expertise, one who is capable of learning enough from the SME to recognize the relevant documents.

2.  System Expertise. This refers to expertise in the technology system used for the search. A system expert in predictive coding would have a deep and detailed knowledge of the software they are using, including the ability to customize the software and use all of its features. In computer circles a person with such skills is often called a power-user. Ideally a power-user would have expertise in several different software systems.

3.  Information Seeking Expertise. This is a skill that is often overlooked in legal search. It refers to a general cognitive skill related to information seeking. It is based on both experience and innate talents. For instance, “capabilities such as superior memory and visual scanning abilities interact to support broader and more purposive examination of text.” Professor Marchionini goes on to say that: “One goal of human-computer interaction research is to apply computing power to amplify and augment these human abilities.” Some lawyers seem to have a gift for search, which they refine with experience, broaden with knowledge of different tools, and enhance with technologies. Others do not, or the gift is limited to interviews and depositions.

Id. at pgs.66-69, with the quotes from pg. 69.

All three of these skills are required for an attorney to attain expertise in legal search today, which is one reason I find this new area of legal practice so challenging. It is difficult, but, unlike this Penrose triangle, not impossible.


It is not enough to be an SME, or a power-user, or have a special knack for search. You have to be able to do it all. However, studies have shown that of the three skill-sets, System Expertise, which in legal search primarily means mastery of the particular software used, is the least important. Id. at 67. The SMEs are more important, those  who have mastered a domain of knowledge. In Professor Marchionini’s words:

Thus, experts in a domain have greater facility and experience related to information-seeking factors specific to the domain and are able to execute the subprocesses of information seeking with speed, confidence, and accuracy.

Id. That is one reason that the Grossman Cormack glossary builds in the role of SMEs as part of their base definition of computer assisted review:

A process for Prioritizing or Coding a Collection of electronic Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection.

Grossman-Cormack Glossary at pg. 21 defining TAR.

According to Marchionini, Information Seeking Expertise, much like Subject Matter Expertise, is also more important than specific software mastery. Id. This may seem counter-intuitive in the age of Google, where an illusion of simplicity is created by typing in words to find websites. But legal search of user-created data is a completely different type of search task than looking for information from popular websites. In the search for evidence in a litigation, or as part of a legal investigation, special expertise in information seeking is critical, including especially knowledge of multiple search techniques and methods. Again quoting Professor Marchionini:

Expert information seekers possess substantial knowledge related to the factors of information seeking, have developed distinct patterns of searching, and use a variety of strategies, tactics and moves.

Id. at 70.

In the field of law this kind of information seeking expertise includes the ability to understand and clarify what the information need is, in other words, to know what you are looking for, and articulate the need into specific search topics. This important step precedes the actual search, but should thereafter continue as an integral part of the process. As one of the basic texts on information retrieval written by Gordon Cormack, et al, explains:

Before conducting a search, a user has an information need, which underlies and drives the search process. We sometimes refer to this information need as a topic …

Büttcher, Clarke & Cormack, Information Retrieval: Implementing and Evaluating Search Engines (MIT Press, 2010) at pg. 5.

The importance of pre-search refining of the information need is stressed in the first step of the above diagram of my methods, ESI Discovery Communications. It seems very basic, but is often underappreciated, or overlooked entirely, in the litigation context. In legal discovery information needs are often vague and ill-defined, lost in overly long requests for production and adversarial hostility. In addition to concerted activity up front to define relevance, the issue of information need should be kept in mind throughout the project. Typically our understanding of relevance evolves as our understanding of what really happened in a dispute emerges and grows.

At the start of an e-discovery project we are almost never searching for specific known documents. We never know for sure what information we will discover. That is why the phrase information seeking is actually more appropriate for legal search than information retrieval. Retrieval implies that particular facts exist and are already known; we just need to look them up. Legal search is not like that at all. It is a process of seeking and discovery. Again quoting Professor Marchionini:

The term information seeking is preferred to information retrieval because it is more human oriented and open ended. Retrieval implies that the object must have been “known” at some point; most often, those people who “knew” it organized it for later “knowing” by themselves or someone else. Seeking connotes the process of acquiring knowledge; it is more problem oriented as the solution may or may not be found.

Information Seeking in Electronic Environments, supra at 5-6.

Legal search is a process of seeking information, not retrieving information. It is a process of discovery, not simple look-up of known facts. More often than not in legal search you find the unexpected, and your search evolves as it progresses. Concept shift happens. Or you find nothing at all. You discover that the requesting party has sent you hunting for Unicorns, for evidence that simply does not exist. For example, the plaintiff alleges discrimination, but a search through tens of thousands of defendant’s emails shows no signs of it.

Information scientists have been talking about the distinction between machine oriented retrieval and human oriented seeking for decades. The type of discovery search that lawyers do is referred to in the literature (without any specific mention of law or legal search) as exploratory search. See: White & Roth, Exploratory Search: Beyond the Query-Response Paradigm (Morgan & Claypool, 2009). Ryen W. White, Ph.D., a senior researcher at Microsoft Research, builds on the work of Marchionini and gives this formal definition of exploratory search:

Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision-making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal.

Id. at 6. He could easily have added legal discovery to this list, but like most information scientists, seems unacquainted with the law and legal search.

White and Roth point out that exploratory search typically uses a multimodal (berrypicking) approach to information needs that begin as vague notions. A many-methods-approach helps the information need to evolve and become more distinct and meaningful over time. They contend that the information-seeking strategies need to be supported by system features and user interface designs, bringing humans more actively into the search process. Id. at 15. That is exactly what I mean by a hybrid process where lawyers are actively involved in the search process.

The fully Borg approach has it all wrong. They use a look-up approach to legal search that relies as much as possible on fully automated systems. The user interface for this type of information retrieval software is designed to keep humans out of the search, all in the name of ease of use and impartiality. The software designers of these programs, typically engineers working without adequate input from lawyers, erroneously assume that e-discovery is just a retrieval task. They erroneously assume that predictive coding always starts with well-defined information needs that do not evolve with time. Some engineers and lit-support techs may fall for this myth, but all practicing lawyers know better. They know that legal discovery is an open-ended, persistent, and multi-faceted process of seeking.

Hybrid Multimodal Computer Assisted Review

Professor Marchionini notes that information seeking experts develop their own search strategies, tactics and moves. The descriptive name for the strategies, tactics and moves that I have developed for legal search is Hybrid Multimodal Computer Assisted Review Bottom Line Driven Proportional Strategy. See, e.g., Bottom Line Driven Proportional Review (2013). For a recent federal opinion approving this type of hybrid multimodal search and review see In Re: Biomet M2a Magnum Hip Implant Products Liability Litigation (MDL 2391), Case No. 3:12-MD-2391 (N.D. Ind., April 18, 2013); also see Indiana District Court Approves Multimodal Computer Assisted Review.

I refer to this method as multimodal because, although the predictive coding type of searches predominate (shown on the below diagram as Intelligent Review or IR), other modes of search are also employed. As described, I do not rely entirely on random documents or computer-selected documents. The other types of methods used in a multimodal process are shown in this search pyramid.

Pyramid Search diagram

Most information scientists I have spoken to agree that it makes sense to use multiple methods in legal search and not just rely on any single method. UCLA Professor Marcia J. Bates first advocated for using multiple search methods back in 1989, which she called berrypicking. Bates, Marcia J., The Design of Browsing and Berrypicking Techniques for the Online Search Interface, Online Review 13 (October 1989): 407-424. As Professor Bates explained in 2011 in Quora:

An important thing we learned early on is that successful searching requires what I called “berrypicking.” … Berrypicking involves 1) searching many different places/sources, 2) using different search techniques in different places, and 3) changing your search goal as you go along and learn things along the way. This may seem fairly obvious when stated this way, but, in fact, many searchers erroneously think they will find everything they want in just one place, and second, many information systems have been designed to permit only one kind of searching, and inhibit the searcher from using the more effective berrypicking technique.

This berrypicking approach, combined with HCIR exploratory search, is what I have found from practical experience works best with legal search. They are the Hybrid Multimodal aspects of my Computer Assisted Review Bottom Line Driven Method.

Conclusion

Now that we have shown that courts are very open to predictive coding, we need to move on to a different, more sophisticated discussion. We need to focus on analysis of different predictive coding search methods, the strategies, tactics and moves. We also need to understand and discuss what skill-sets and personnel are required to do it properly. Finally, we need to begin to discuss the different types of predictive coding software.

There is much more to discuss concerning the use of predictive coding than whether or not to make disclosure of seed sets or irrelevant training documents, although that, and court approval, are the only things most expert panels have talked about so far. The discussion on disclosure and work-product should continue, but let us also discuss the methods and skills, and, yes, even the competing software.

We cannot look to vendors alone for the discussion and analysis of predictive coding software and competing methods of use. Obviously they must focus on their own software. This is where independent practitioners have an important role to play in the advancement of this powerful new technology.

Join with me in this discussion by your comments below or send me ideas for proposed guest blogs. Vendors are of course welcome to join in the discussion, and they make great hosts for search SME forums. Vendors are an important part of any successful e-discovery team. You cannot do predictive coding review without their predictive coding software, and, as with any other IT product, some software is much better than others.


An Elusive Dialogue on Legal Search: Part Two – Hunger Games and Hybrid Multimodal Quality Controls

September 3, 2012

This is a continuation of last week’s blog, An Elusive Dialogue on Legal Search: Part One where the Search Quadrant is Explained. The quadrant and random sampling are not as elusive as Peeta Mellark in The Hunger Games, shown at right, but almost. Indeed, as most of us lawyers did not major in math or information science, these new techniques can be hard to grasp. Still, to survive in the vicious games often played these days in litigation, we need to find a way. If we do, we can not only survive, we can win, even if we are from District 12 and the whole world is watching our every motion.

The emphasis in the second part of this essay is on quality controls and how such efforts, like search itself, must be multimodal and hybrid. We must use a variety of quality assurance methods – we must be multimodal. To use the Hunger Games analogy, we must use both bow and rope, and camouflage too. And we must employ both our skilled human legal intelligence and our computer intelligence – we must be hybrid; Man and machine, working together in perfect harmony, but with Man in charge. That is the only way to survive the Hunger Games of litigation in the 21st Century. The only way the odds will be ever in your favor.

Recall and Elusion

But enough fun with Hunger Games, Search Quadrant terminology, nothingness, and math, and back to Herb Roitblat’s long comment on my earlier blog, Day Nine of a Predictive Coding Narrative.

Recall and Precision are the two most commonly used measures, but they are not the only ones. The right measure to use is determined by the question that you are trying to answer and by the ease of asking that question.

Recall and Elusion are both designed to answer the question of how complete we were at retrieving all of the responsive documents. Recall explicitly asks “of all of the responsive documents in the collection, what proportion (percentage) did we retrieve?” Elusion explicitly asks “What proportion (percentage) of the rejected documents were truly responsive?” As recall goes up, we find more of the responsive documents, elusion, then, necessarily goes down; there are fewer responsive documents to find in the reject pile. For a given prevalence or richness as the YY count goes up (raising Recall), the YN count has to go down (lowering Elusion). As the conversation around Ralph’s report of his efforts shows, it is often a challenge to measure recall.

This last comment was referring to prior comments made in my same Day Nine Narrative blog by two other information scientists William Webber and Gordon Cormack. I am flattered that they all seem to read my blog, and make so many comments, although I suspect they may be master game-makers of sorts like we saw in Hunger Games.

The earlier comments of Webber and Cormack pertained to point projection of yield and the lower and upper intervals derived from random samples. All things I was discussing in Day Nine. Gordon’s comments focused on the high-end of possible interval error and said you cannot know anything for sure about recall unless you assume the worst case scenario high-end of the confidence interval. This is true mathematically and scientifically, I suppose (to be honest, I do not really know if it is true or not, but I learned long ago not to argue science with a scientist, and they do not seem to be quibbling amongst themselves, yet.) But it certainly is not true legally, where reasonability and acceptable doubt (a kind of level of confidence), such as a preponderance of the evidence, are always the standard, not perfection and certainty. It is not true in manufacturing quality controls either.

But back to Herb’s comment, where he picks up on their math points and elaborates concerning the Elusion test that I used for quality control.

Measuring recall requires you to know or estimate the total number of responsive documents. In the situation that Ralph describes, responsive documents were quite rare, estimated at around 0.13% prevalence. One method that Ralph used was to relate the number of documents his process retrieved with his estimated prevalence. He would take as his estimate of Recall, the proportion of the estimated number of responsive documents in the collection as determined by an initial random sample.

Unfortunately, there is considerable variability around that prevalence estimate. I’ll return to that in a minute. He also used Elusion when he examined the frequency of responsive documents among those rejected by his process. As I argued above, Elusion and Recall are closely related, so knowing one tells us a lot about the other.

One way to use Elusion is as an accept-on-zero quality assurance test. You specify the maximum acceptable level of Elusion, as perhaps some reasonable proportion of prevalence. Then you feed that value into a simple formula to calculate the sample size you need (published in my article the Sedona Conference Journal, 2007). If none of the documents in that sample comes up responsive, then you can say with a specified level of confidence that responsive documents did not occur in the reject set at a higher rate than was specified. As Gordon noted, the absence of a responsive document does not prove the absence of responsive documents in the collection.

The Sedona Conference Journal article Herb referenced here is called Search & Information Retrieval Science. Also, please recall that my narrative states, without using the exact same language, that my accept-on-zero quality assurance test pertained to Highly Relevant documents, not relevant documents. I decided in advance that if my random sample of excluded documents included any Highly Relevant documents, then I would consider the test a failure and initiate another round of predictive coding. My standard for merely relevant documents was secondary and more malleable, depending on the probative value and uniqueness of any such false negatives. False negatives are what Herb calls YN, which we also now know are called D in the Search Quadrant, with totals shown again below.
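
For readers who want the quadrant arithmetic spelled out, here is a minimal sketch in Python using Herb’s YY/YN/NN labels. The counts are hypothetical, and the accept-on-zero sample size uses the standard zero-acceptance formula, since the exact formula from Herb’s Sedona article is not reproduced in the text.

```python
import math

def recall(yy, yn):
    """Of all truly responsive documents, the share retrieved: YY / (YY + YN)."""
    return yy / (yy + yn)

def elusion(yn, nn):
    """Of all documents rejected by the process, the share that is truly responsive: YN / (YN + NN)."""
    return yn / (yn + nn)

def accept_on_zero_sample_size(max_elusion, confidence=0.95):
    """Documents to sample from the reject pile so that finding zero responsive
    (or zero Highly Relevant) documents supports, at the stated confidence,
    an elusion rate no higher than the specified maximum."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_elusion))

# Hypothetical counts: 80 responsive retrieved (YY), 20 missed (YN), 99,900 correctly rejected (NN).
print(recall(80, 20))                     # 0.8
print(round(elusion(20, 99_900), 5))      # 0.0002
print(accept_on_zero_sample_size(0.001))  # about 2,995 documents for a 0.1% elusion ceiling
```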

Back to Herb’s comment. Herb, by the way, looks a bit like President Snow, don’t you think? He is now going to start talking about Recall, which as we now know is A/G, a measurement of accuracy that I did not directly make or claim.

If you want to directly calculate the recall rate after your process, then you need to draw a large enough random sample of documents to get a statistically useful sample of responsive documents. Recall is the proportion of responsive documents that have been identified by the process. The 95% confidence range around an estimate is determined by the size of the sample set. For example, you need about 400 responsive documents to know that you have measured recall with a 95% confidence level and a 5% confidence interval. If only 1% of the documents are responsive, then you need to work pretty hard to find the required number of responsive documents. The difficulty of doing consistent review only adds to the problem. You can avoid that problem by using Elusion to indirectly estimate Recall.

The Fuzzy Lens Problem Again

The reference to the difficulty of doing consistent review refers to the well documented inconsistency of classification among human reviewers. That is what I called, in Secrets of Search, Part One, the fuzzy lens problem that makes recall such an ambiguous measure in legal search. It is ambiguous because when large data sets are involved the value for G (total relevant) is dependent upon human reviewers. The inconsistency studies show that the gold standard of measurement by human review is actually just dull lead.

Let me explain again in shorthand, and please feel free to refer to the Secrets of Search trilogy and the original studies for the full story. Roitblat’s own well-known study of a large-scale document review showed that human reviewers only agreed with each other an average of 28% of the time. Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010. An earlier study by one of the leading information scientists in the world, Ellen M. Voorhees, found a 40% agreement rate between human reviewers. Variations in relevance judgments and the measurement of retrieval effectiveness, 36:5 Information Processing & Management 697, 701 (2000). Voorhees concluded that with 40% agreement rates it was not possible to measure recall any higher than 65%. Information scientist William Webber calculated that with a 28% agreement rate a recall rate cannot be reliably measured above 44%. Herb Roitblat and I have dialogued about this issue before, most recently in Reply to an Information Scientist’s Critique of My “Secrets of Search” Article.

I prepared the graphics below to illustrate this problem of measurement and the futility of recall calculations when the measurements are made by inconsistent reviewers.

Until we can crack the inconsistent reviewer problem, we can only measure recall vaguely, as we see on the left, or at best the center, and can only make educated guesses as to the reality on the right. The existence of the error has been proven, but as Maura Grossman and Gordon Cormack point out, there is a dispute as to the cause of the error. In one analysis that they did of TREC results they concluded that the inconsistencies were caused by human error, not a difference of opinion on what was relevant or not. Inconsistent Responsiveness Determination in Document Review: Difference of Opinion or Human Error? But, regardless of the cause, the error remains.

Back to Herb’s Comment.

One way to assess what Ralph did is to compare the prevalence of responsive documents in the set before doing predictive coding with their prevalence after using predictive coding to remove as many of the responsive documents as possible. Is there a difference? An ideal process will have removed all of the responsive documents, so there will be none left to find in the reject pile.

That question of whether there is a difference leads me to my second point. When we use a sample to estimate a value, the size of the sample dictates the size of the confidence interval. We can say with 95% confidence that the true score lies within the range specified by the confidence interval, but not all values are equally likely. A casual reader might be led to believe that there is complete uncertainty about scores within the range, but values very near to the observed score are much more likely than values near the end of the confidence interval. The most likely value, in fact, is the center of that range, the value we estimated in the first place. The likelihood of scores within the confidence interval corresponds to a bell shaped curve.

This is a critical point. It means that the point projections, a/k/a the spot projections, can be reliably used. It means that even though you must always qualify any findings based upon random sampling by stating the applicable confidence interval, the possible range of error, you may still reliably use the observed score of the sample in most data sets, if a large enough sample size is used to create low confidence interval ranges. Back to Herb’s Comment.

Moreover, we have two proportions to compare, which affects how we use the confidence interval. We have the proportion of responsive documents before doing predictive coding. The confidence interval around that score depends on the sample size (1507) from which it was estimated. We have the proportion of responsive documents after predictive coding. The confidence interval around that score depends on its sample size (1065). Assuming that these are independent random samples, we can combine the confidence intervals (consult a basic statistics book for a two sample z or t test or http://facstaff.unca.edu/dohse/Online/Stat185e/Unit3/St3_7_TestTwoP_L.htm), and determine whether these two proportions are different from one another (0.133% vs. 0.095%). When we do this test, even with the improved confidence interval, we find that the two scores are not significantly different at the 95% confidence level. (try it for yourself here: http://www.mccallum-layton.co.uk/stats/ZTestTwoTailSampleValues.aspx.). In other words, the predictive coding done here did not significantly reduce the number of responsive documents remaining in the collection. The initial proportion 2/1507 was not significantly higher than 1/1065. The number of responsive documents we are dealing with in our estimates is so small, however, that a failure to find a significant difference is hardly surprising.
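
Herb’s before-and-after comparison is a pooled two-proportion z-test. Here is a minimal sketch of it in Python; the online calculators he links to may implement a slightly different variant.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns the z statistic."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(2, 1507, 1, 1065)  # the before and after samples discussed above
print(round(z, 2))                      # about 0.28, well below 1.96, so not significant at 95%
```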

This paragraph of Herb’s appears to me to have assumed that my final quality control test was a test for Recall, and to use the upper limit, the worst case scenario, as the defining measurement. Again, as I said in the narrative and in replies to other comments, I was testing for Elusion, not Recall. Further, the Elusion test (D/F) here was for Highly Relevant documents, not merely relevant ones, and none were found: 0%. None were found in the first random sample at the beginning of the project, and none were found in the second random sample at the end. The yields referred to by Herb are for relevant documents, not Highly Relevant. The value of D, the False Negatives, in the Elusion test was thus zero. As we have discussed, when the numerator of a fraction is zero, the result of the division is always zero, which, in an Elusion test, is exactly what you are looking for. You are looking for nothing and happy to find it.

The final sentence in that last quoted paragraph of Herb’s is key to understanding his comment: The number of responsive documents we are dealing with in our estimates is so small, however, that a failure to find a significant difference is hardly surprising. It points to the inherent difficulty of using random sampling to measure recall in low yield document sets, where prevalence is low. But there is still some usefulness for random sampling in these situations, as the conclusion of his Comment shows.

Still, there is other information that we can glean from this result. The difference in the two proportions is approximately 28%. Predictive coding reduced by 28% the number of responsive documents unidentified in the collection. Recall, therefore, is also estimated to be 28%. Further, we can use the information we have to compute the precision of this process as approximately 22%. We can use the total number of documents in the collection, prevalence estimates, and elusion to estimate the entire 2 x 2 decision matrix.

For eDiscovery to be considered successful we do not have to guarantee that there are no unidentified responsive documents, only that we have done a reasonable job searching for them. The observed proportions do have some confidence interval around them, but they remain as our best estimate of the true percentage of responsive documents both before predictive coding and after. We can use this information and a little basic algebra to estimate Precision and Recall without the huge burden of measuring Recall directly.
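
Also as an aside, here is a rough Python sketch of the little basic algebra Herb mentions for backing into an estimated decision matrix, and thus Recall and Precision, from the collection size, a prevalence estimate, an elusion estimate, and the size of the production set. The prevalence and elusion fractions below come from the samples discussed above, but the production-set size is a placeholder I made up for illustration, so the printed figures will not match Herb’s 28% and 22% estimates exactly.

    # Rough sketch: estimate the 2 x 2 decision matrix, and from it Recall and
    # Precision, using point estimates of prevalence and elusion. The production
    # size below is a made-up placeholder, not the actual figure from the project.

    def estimate_recall_precision(collection_size, prevalence, elusion, produced):
        total_responsive = prevalence * collection_size       # all relevant docs (G)
        discard_pile = collection_size - produced             # docs not produced (F)
        false_negatives = elusion * discard_pile               # relevant docs missed (D)
        true_positives = max(total_responsive - false_negatives, 0.0)  # relevant docs found (A)
        recall = true_positives / total_responsive if total_responsive else 0.0
        precision = true_positives / produced if produced else 0.0
        return recall, precision

    recall, precision = estimate_recall_precision(
        collection_size=699_082,   # the collection discussed in this project
        prevalence=2 / 1507,       # point estimate from the first random sample
        elusion=1 / 1065,          # point estimate from the final random sample
        produced=1_500,            # hypothetical production size, for illustration only
    )
    print(f"estimated recall = {recall:.0%}, estimated precision = {precision:.0%}")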

These are great points made by Herb Roitblat in the last paragraph of his Comment regarding reasonability. It shows how lawyer-like he has become after working with our kind for so many years, rather than with professor types like my brother, as he did in the first half of his career. Herb now well understands the difference between law and science and what this means to legal search.

Law Is Not a Science, and Neither Is Legal Search

To understand the numbers and the need for reasonable efforts that accept high margins of error, we must understand the futility of increasing sample sizes to try to cure the upper limit of the confidence interval. William Webber, in his Comment of August 6, 2012 at 10:28 pm, said that “it is, unfortunately, very difficult to place a reassuring upper bound on a very rare event using random sampling.” (emphasis added) Dr. Webber goes on to explain that to attain even a 50% confidence interval would require a final quality control sample of 100,000 documents. Remember, there were only 699,082 documents to begin with, so that is obviously no solution at all. It is about as reassuring as the Hunger Games slogan, “may the odds be ever in your favor,” when we all know that all but one of the 24 contestants must die.
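
To see why Dr. Webber’s point bites so hard, consider the simplest case, where the final quality control sample finds zero relevant documents. The exact one-sided upper confidence bound on the true prevalence then depends only on the sample size, and it shrinks painfully slowly as the sample grows. The Python sketch below shows the exact bound alongside the familiar “rule of three” approximation; the sample sizes are only illustrations and are not meant to reproduce Dr. Webber’s own calculation.

    # When a random sample of size n turns up zero relevant documents, the exact
    # one-sided upper bound on the true prevalence at confidence (1 - alpha) is
    #     p_upper = 1 - alpha ** (1 / n)
    # which the well-known "rule of three" approximates as 3/n at 95% confidence.
    # The sample sizes below are illustrative only.

    def upper_bound_zero_hits(n, alpha=0.05):
        """Exact upper confidence bound when 0 of n sampled documents are relevant."""
        return 1 - alpha ** (1 / n)

    for n in (400, 1_065, 10_000, 100_000):
        print(f"n = {n:>7,}: exact 95% upper bound = {upper_bound_zero_hits(n):.4%}, "
              f"rule of three = {3 / n:.4%}")
    # Even a 100,000-document sample only pushes the bound down to roughly three
    # in a hundred thousand; tightening a "reassuring" bound on a rare event by
    # another factor of ten costs another factor of ten in sample size.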

Aside from the practical cost and time issues, the fuzzy lens problem of poor human judgments also makes the quest for reassuring bounds of error a fool’s errand. Such perfection is illusory. It cannot be attained, or, more correctly put, even if you do attain high recall in a large data set, you will never be able to prove it. Do not be fooled by the slogans and the flashy, facile analysis.

Fortunately, the law has long recognized the frailty of all human endeavors. The law necessarily has different standards for acceptable error and risk than do math and science. The less-than-divine standards also apply to manufacturing quality control, where small sample sizes have long been employed at acceptable levels of risk. There too, as in a legal search for relevance, the prevalence of the defective items sampled for is typically very low.

Math and science demand perfection. But the law does not. We demand reasonability and good faith, not perfection. Some scientists may think that we are settling, but it is more like practical realism, and it is certainly far better than unreasonable and bad faith efforts. Unlike science and math, the law is used to uncertainties. Lawyers and judges are comfortable with that. For example, we are reassured enough to allow civil judgments of liability when a judge or jury decides that it is more likely than not that the defendant is at fault, a 51% standard of proof. Law and justice demand reasonable efforts, not perfection.

I know Herb Roitblat agrees with me because this is the fundamental thesis of the fine paper he wrote with two lawyers, Patrick Oot and Anne Kershaw, entitled Mandating Reasonableness in a Reasonable Inquiry. At pages 557-558 they sum up, saying (footnote omitted):

We do not suggest limiting the court system’s ability to discover truth. We simply anticipate that judges will deploy more reasonable and efficient standards to determine whether a litigant met his Rule 26(g) reasonable inquiry obligations. Indeed, both the Victor Stanley and William A. Gross Construction decisions provide a primer for the multi-factor analysis that litigants should invoke to determine the reasonableness of a selected search and review process to meet the reasonable inquiry standard of Rule 26(f): 1. Explain how what was done was sufficient; 2. Show that it was reasonable and why; 3. Set forth the qualifications of the persons selected to design the search; 4. Carefully craft the appropriate keywords with input from the ESI’s custodians as to the words and abbreviations they use; and 5. Use quality control tests on the methodology to assure accuracy in retrieval and the elimination of false positives.

As to the fifth criterion, the one we are discussing here, of quality control tests, Roitblat, Oot and Kershaw assert in their article at page 551 that: “A litigant should sample at least 400 results of both responsive and non-responsive data.” This is the approximate sample size when using a 95% confidence level and a 5% confidence interval. (Note that in my sampling I used less than a 3% confidence interval, with a much larger sample size of 1,065 documents.) To support the assertion that a sample size of 400 documents is reasonable, the authors in footnote 77 refer to an email they have on file from Maura Grossman regarding legal search of data sets in excess of 100,000 documents, which concluded with the statement:

Therefore, it seemed to me that, for the average matter with a large amount of ESI, and one which did not warrant hiring a statistician for a more careful analysis, a sample size of 400 to 600 documents should give you a reasonable view into your data collection, assuming the sample is truly randomly drawn.

Personally, I think a larger sample size than 400-600 documents is needed for quality control tests in large cases. The efficacy of this small calculated sample size, using a 5% confidence interval, assumes a prevalence of 50%, in other words, that half of the documents sampled are relevant. This is an obvious fiction in all legal search, just as it is in all sampling for defective manufactured goods. That is why I sampled 1,065 documents using a 3% confidence interval. Still, in smaller cases, it may be very appropriate to just sample 400-600 documents using a 5% interval. It all depends, as I will elaborate further in the conclusion.
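
For anyone who wants to check these sample sizes themselves, here is a minimal Python sketch of the standard sample size formula for estimating a proportion, with the usual worst-case assumption of 50% prevalence and an optional finite population correction. The collection size in the example is the 699,082-document collection discussed above; the rest is generic textbook arithmetic, and different calculators will round the results slightly differently.

    # Standard sample size formula for estimating a proportion:
    #     n0 = z**2 * p * (1 - p) / E**2
    # optionally adjusted with the finite population correction
    #     n  = n0 / (1 + (n0 - 1) / N)
    # The default p = 0.5 is the conservative worst case discussed above.
    import math

    def sample_size(interval, z=1.96, p=0.5, population=None):
        """Sample size for a +/- interval at 95% confidence (z = 1.96)."""
        n0 = (z ** 2) * p * (1 - p) / (interval ** 2)
        if population:
            n0 = n0 / (1 + (n0 - 1) / population)  # finite population correction
        return math.ceil(n0)

    N = 699_082  # collection size used in this project
    for interval in (0.05, 0.03, 0.015):
        print(f"+/- {interval:.1%} interval: n = {sample_size(interval, population=N):,}")
    # Note how halving the interval (3% down to 1.5%) roughly quadruples the
    # sample size: the "2 = 4" rule of thumb discussed further below.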

But regardless, all of these scholars of legal search make the valid point that only reasonable efforts are required in quality control sampling, not perfection. We have to accept the limited usefulness of random sampling alone as a quality assurance tool because of the margins of error inherent in sampling of the low prevalence data sets common in legal search. Fortunately, random sampling is not our only quality assurance tool. We have many other methods to show reasonable search efforts.

Going Beyond Reliance on Random Sampling Alone to a Multimodal Approach

Random sampling is not a magic cure-all that guarantees quality, or definitively establishes the reasonability of a search, but it helps. In low yield datasets, where there is a low percentage of relevant documents in the total collection, the value of random sampling for measuring Recall is especially suspect. The comments of our scientist friends have shown that. There are inherent limitations to random sampling.

Ever increasing sample sizes are not the solution, even if that were affordable and proportionate. Confidence intervals of less than two or three percent are generally a waste of time and money in sampling. (Remember the sampling statistics rule of thumb of 2=4 that I have explained before, wherein halving the confidence interval error rate, say from 3% to 1.5%, requires a quadrupling of the sample size.) Three or four percent confidence intervals are more appropriate in most legal search projects, perhaps even the 5% interval used in the Mandating Reasonableness article by Roitblat, Oot and Kershaw. Depending on the data set itself, prevalence, other quality control measures, the complexity of the case, and the amount at issue, say less than $1,000,000, the small, five-percent-based sample size of approximately 400 documents could well be adequate and reasonable. As usual in the law, it all depends on many circumstances and variables.

The issue of inconsistent reviews between reviewers, the fuzzy lens problem, necessarily limits the effectiveness of all large-scale human reviews. The sample sizes required to make a difference are extremely large. No such reviews can be practically done without multiple reviewers, and thus low agreement rates. The gold standard for review of large samples like this is made of lead, not gold. Therefore, even if cost were not a factor, large sample sizes would still be a waste of time.

Moreover, in the real world of legal review projects, there is always a strong component of vagary in relevance. Maybe that was not true in the 2009 TREC experiment, as Grossman and Cormack’s study suggests, but it has been true in the thousands of messy real-world lawsuits that I have handled in the past 32 years. All trial lawyers I have spoken with on the subject agree.

Relevance can be, and usually is, a fluid and variable target depending on a host of factors, including changing legal theories, changing strategies, changing demands, new data, and court rulings. The only real gold standard in law is a judge ruling on specific documents. Even then, judges can change their minds, or make mistakes. A single person, even a judge, can be inconsistent from one document to another. See Grossman & Cormack, Inconsistent Responsiveness Determination at pgs. 17-18, where a 2009 TREC Topic Authority contradicted herself 50% of the time when re-examining the same ten documents.

We must realize that random sampling is just one tool among many. We must also realize that even when random sampling is used, Recall is just one measure of accuracy among many. We must utilize the entire 2 x 2 decision matrix.

We must consider the possible applicability of all of the measurements that the search quadrant makes possible, not just recall; a short sketch computing each of them follows the list below.

  • Recall = A/G
  • Precision = A/C
  • Elusion = D/F
  • Fallout = B/H
  • Agreement = (A+E)/I
  • Prevalence = G/I
  • Miss Rate = D/G
  • False Alarm Rate = B/C
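
As a convenience, here is a small Python sketch that computes each of these measures from the four cells of the decision matrix, using the same letter scheme as the chart (A = true positives, B = false positives, D = false negatives, E = true negatives, with C, F, G, H and I as the corresponding totals). The counts in the example are made up solely to exercise the function, not taken from any real review.

    # Search quadrant measures computed from the four cells of the 2 x 2 matrix:
    # A = true positives, B = false positives, D = false negatives, E = true
    # negatives; C = A+B, F = D+E, G = A+D, H = B+E, I = A+B+D+E.
    # The example counts at the bottom are hypothetical.

    def quadrant_measures(A, B, D, E):
        C, F = A + B, D + E        # identified as relevant / identified as irrelevant
        G, H = A + D, B + E        # truly relevant / truly irrelevant
        I = A + B + D + E          # total collection
        return {
            "Recall": A / G,
            "Precision": A / C,
            "Elusion": D / F,
            "Fallout": B / H,
            "Agreement": (A + E) / I,
            "Prevalence": G / I,
            "Miss Rate": D / G,
            "False Alarm Rate": B / C,
        }

    for name, value in quadrant_measures(A=800, B=2_200, D=200, E=696_800).items():
        print(f"{name:>17}: {value:.3%}")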

No doubt we will develop other quality control tests, for instance using Prevalence as a guide or target for relevant search, as I described in my seven-part Search Narrative. Just as we must use multimodal search efforts for effective search of large-scale data sets, so too must we use multiple quality control methods when evaluating the reasonability of search efforts. Random sampling is just one tool among many, and, based on the math, maybe not the best method at that, regardless of whether it is used for recall, or elusion, or any other binary search quadrant measure.

Just as keyword search must be supplemented by the computer intelligence of predictive coding, so too must random-sample-based quality analysis be supplemented by skilled legal intelligence. That is what I call a Hybrid approach. The best measure of quality is to be found in the process itself, coupled with the people and software involved. A judge called upon to review the reasonability of a search should look at a variety of factors, such as:

  • What was done and by whom?
  • What were their qualifications?
  • What rules and disciplined procedures were followed?
  • What measures were taken to avoid inconsistent calls?
  • What training was involved?
  • What happened during the review?
  • Which search methods were used?
  • Was it multimodal?
  • Was it hybrid, using both human and artificial intelligence?
  • How long did it take?
  • What did it cost?
  • What software was used?
  • Who developed the software?
  • How long has the software been used?

Conclusion

These are just a few questions that occur to me off the top of my head. There are surely more. Last year in Part Two of Secrets of Search I suggested nine characteristics of what I hope would become an accepted best practice for legal review. I invited peer review and comments on what I may have left out, or any challenges to what I put in, but so far this list of nine remains unchallenged. We need to build on this to create standards so that quality control is not subject to so many uncertainties.

Jason R. Baron, William Webber, and others, myself included, keep saying this over and over, and yet the Hunger Games of standardless discovery goes on. Without these standards we may all fall prey at any time to a vicious sneak attack by another contestant in the litigation games, a contest that all too often feels like a fight to the death rather than a cooperative pursuit of truth and justice. It has become so bad now that many lawyers snicker just to read such a phrase.

The point here is that you have to look at the entire process, and not just focus on taking random samples, especially ones that claim to measure recall in low yield collections. By the way, I submit that almost all legal search involves low yield collections, not just employment law matters, as some have suggested. Those who think the contrary have too broad a concept of relevance, and little or no understanding of actual trials, cumulative evidence, and the modern big data koan that “relevant is irrelevant.” Even though random sampling is not The Answer we once thought it was, it should still be part of the process. For instance, a random sample elusion test that finds no Highly Relevant documents should remain an important component of that process.

The no-holds-barred Hunger Games approach to litigation must end now. If we all join together, this will end in victory, not defeat. It will end with alliances and standards. Whatever district you hail from, join us in this noble quest. Turn away from the commercial greed of winning-at-all-costs. Keep your integrity. Keep the faith. Renounce the vicious games, both hide-the-ball and extortion. The world is watching. But we are up for it. We are prepared. We are trained. The odds are ever in our favor. Salute all your colleagues who turn from the games and the leadership of greed and oppression. Salute all who join with us in the rebellion for truth and justice.
