Seventh Class: Keyword and Linear Review
This class covers Tested, Parametric Boolean Keyword Search and Linear Review.
Introduction to Keyword Search
As an attorney in private practice specializing in e-discovery with a large national law firm I know first hand how electronic document review is conducted in U.S. District Courts. My knowledge is especially deep in small and medium sized employment law cases. With fifty offices and eight hundred lawyers to supervise in e-discovery, I have seen it all.
We deal everyday with opposing counsel who are sometimes mired in keywords, thinking it is the end-all and be-all of legal search. Moreover, they usually want to go about doing it without any testing. Instead, they think they are geniuses who can just dream up good searches out of thin air. They think because they know what their legal complaint is about, they know what keywords will be used by the witnesses in all relevant documents. I cannot tell you how many times I see the word “complaint” in their keyword list. The guessing involved reminds me of the child’s game of Go Fish.
I wrote about this in 2009 and the phrase caught on after Judge Peck and others started citing to this article, which later became a chapter in my book, Adventures in Electronic Discovery, 209-211 (West 2011). The Go Fish analogy appears to be the third most popular reference in predictive coding case-law, after the huge, Da Silva Moore case in 2012 that Judge Peck and I are best known for.
E-discovery Team members employed by Kroll also see thousands of document reviews for other law firms and corporate clients. They see them from all around the world. There is no doubt in our minds that keyword search is still the dominant search method used by most attorneys. It is especially true in small to medium-sized firms, but also in larger firms that have no e-discovery search expertise. Many attorneys and paralegals who use a sophisticated, full featured document review platforms such as Kroll Ontrack’s EDR, still only use keyword search. They do not use the many other powerful search techniques of EDR, even though they are readily available to them. The Search Pyramid to them looks more like this, which I call a Dunce Hat.
The AI at the top, standing for Predictive Coding, is, for average lawyers today, still just a far off remote mountain top; something they have heard about, but never tried. Even though this is my speciality, I am not worried about this. I am confident that this will all change soon. Our new, easier to use methods will help with that, so too will ever improving software by the few vendors left standing. The judges are already doing their part. No judge has ever disapproved the use of predictive coding, although they do refuse to require it (so far). Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP), 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016).
Plus, high-tech propagation is an inevitable result of the next generation of lawyers assuming leadership positions in law firms and legal departments. The old-timey paper lawyers around the world are finally retiring in droves. The aging out of current leadership is a good thing. Their over-reliance on untested keyword search to find evidence is holding back our whole justice system. The law must keep up with technology and lawyers must not fear math, science and AI. They must learn to keep up with technology. This is what will allow the legal profession to remain a bedrock of contemporary culture. It will happen. Positive disruptive change is just under the horizon and will soon rise.
In the meantime we encounter opposing counsel everyday who think e-discovery means to dream up keywords and demand that every document that contains their keywords be produced. The more sophisticated of this confederacy of dunces understand that we do not have to produce them, that they might not all be per se relevant, but they demand that we review them all and produce the relevant ones. Fortunately we have the revised rules to protect our clients from these kind of disproportionate, unskilled demands. All too often this is nothing more than discovery as abuse.
This still dominant approach to litigation is really nothing more than an artifact of the old-timey paper lawyers’ use of discovery as a weapon. Let me speak plainly. This is nothing more than adversarial bullshit discovery with no real intent by the requesting party to find out what really happened. They just want to make the process as expensive and difficult as possible for the responding party because, well, that’s what they were trained to do. That is what they think smart, adversarial discovery is all about. Just another tool in their negotiate and settle, extortion approach to litigation. It is the opposite of the modern cooperative approach.
I cannot wait until these dinosaurs retire so we can get back to the original intent of discovery, a cooperative pursuit of the facts. Fortunately, a growing number of our opposing counsel do get it. We are able to work very well with them to get things done quickly and effectively. That is what discovery is all about. Both sides save their powder for when it really matters, for the battles over the meaning of the facts, the governing law, and how the facts apply to this law for the result desired.
Tested, Parametric Boolean Keyword Search
The biggest surprise from our 2016 TREC research was just how well good keyword search can perform under the right circumstances. I’m talking about hands-on, tested keyword search based on human document review and file scanning, sampling, and also based on strong keyword search software. When keyword search is done with skill and is based on the evidence seen, typically in a refined series of keyword searches, very high levels of Precision, Recall and F1 are sometimes attainable. Again, the dataset and other conditions must be just right for it to be that effective, as explained in the diagram: simple data, clear target and good SME. Sometimes keywords are the best way to find clear targets like names and dates.
In those circumstances the other search forms may not be needed to find the relevant documents, or at least to find almost all of the relevant documents. These are cases where the hybrid balance is tipped heavily towards the 400 pound man hacking away at the computer. All the AI does in these circumstances, when the human using keyword search is on a roll, is double-check and verify that it agrees that all relevant documents have been located. It is always nice to get a free second opinion from Mr. EDR. This is an excellent quality control and quality assurance application from our legal robot friends.
We are not going to try to go through all of the ins and outs of keyword search. There are many variables and features available in most document review platforms today to make it easy to construct effective keyword searches and otherwise find similar documents. This is the kind of thing that Kroll and I teach to the e-discovery liaisons in my firm and other attorneys and paralegals handing electronic document reviews. The passive learning software features can be especially helpful, so too can simple indexing and clustering. Most software programs have important features to improve keyword search and make it more effective. All lawyers should learn the basic tested, keyword search skills.
There is far more to effective keyword search than a simple Google approach. (Google is concerned with finding websites, not recall of relevant evidence.) Still, in the right case, with the right data and easy targets, keywords can open the door to both high recall and precision. Keyword search, even tested and sophisticated, does not work well in complex cases or with dirty data. It certainly has its limits and there is a significant danger in over reliance on keyword search. It is typically very imprecise and can all to easily miss unexpected word usage and misspellings. That is one reason that the e-Discovery Team always supplements keyword search with a variety of other search methods, including predictive coding.
Focused Linear Search – Key Dates & People
In Abraham Lincoln’s day all a lawyer had to do to prepare for a trial was talk to some witnesses, talk to his client and review all of the documents the clients had that could possibly be relevant. All of them. One right after the other. In a big case that might take an hour. Flash forward one hundred years to the post-photocopier era of the 1960s and document review, linear style reviewing them all, might take a day. By the 1990s it might take weeks. With the data volume of today such a review would take years.
All document review was linear up until the 1990s. Until that time almost all documents and evidence were paper, not electronic. The records were filed in accordance with an organization wide filing system. They were combinations of chronological files and alphabetical ordering. If the filing was by subject then the linear review conducted by the attorney would be by subject, usually in alphabetical order. Otherwise, without subject files, you would probably take the data and read it in chronological order. You would certainly do this with the correspondence file. This was done by lawyers for centuries to look for a coherent story for the case. If you found no evidence of value in the papers, then you would smile knowing that your client’s testimony could not be contradicted by letters, contracts and other paperwork.
This kind of investigative, linear review still goes on today. But with today’s electronic document volumes the task is carried out in warehouses by relatively low paid, document review contract lawyers. By itself it is a fool’s errand, but it is still an important part of a multimodal approach.
There is nothing wrong with Focused Linear Search when used in moderation. And there is nothing wrong with document review contract-lawyers, except that they are underpaid for their services, especially the really good ones. I am a big fan of document review specialists.
Large linear review projects can be expensive and difficult to manage. Moreover, it typically has only limited use. It breaks down entirely when large teams are used because human review is so inconsistent in document analysis. Losey, R., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” (parts One, Two and three) (December 8, 2013, e-Discovery Team). When review of large numbers of documents are involved the consistency rate among multiple human reviewers is dismal. Also see: Roitblat, Predictive Coding with Multiple Reviewers Can Be Problematic: And What You Can Do About It (4/12/16).
Still, linear review can be very helpful in limited time spans and in reconstruction of a quick series of events, especially communications. Knowing what happened one day in the life of a key custodian can sometimes give you a great defense or great problem. Either are rare. Most of the time Expert Manual Review is helpful, but not critical. That is why Expert Manual Review is at the base of the Search Pyramid that illustrates our multimodal approach.
An attorney’s knowledge, wisdom and skill are the foundation of all that we do, with or without AI. The information that an attorney holds is also of value, especially information about the latest technology, but the human information roles are diminishing. Instead the trend is to delegate mere information level services to automated systems. The legal robots would not be permitted to go beyond information fulfillment roles and provide legal advice based on human knowledge and wisdom. Their function would be constrained to Information processing and reports. The metrics and technology tools they provide can make it easier for the human attorneys to build a solid evidentiary foundation for trial.
e-Discovery Team LLC COPYRIGHT 2017
ALL RIGHTS RESERVED